{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2b2f9113",
   "metadata": {},
   "source": [
    "## Lecture 22\n",
    "\n",
    "- Lecture exercise out today, due on saturday at midnight\n",
    "- Hope to release another optional lecture exercise, timing is unknown\n",
    "- Hope to release Hw#6 before the week ends\n",
    "- Exam#2 grading to be completed very soon (though probably not today)\n",
    "- Hw#5 grading is next\n",
    "\n",
    "## B-tree indices and index scan\n",
    "\n",
    "- Review of search/insertion/deletion\n",
    "- Handling indices with multiple attributes\n",
    "- Handling indices with duplicate key values\n",
    "- Index search for answering select queries\n",
    "- Generalizing B-tree indices\n",
    "\n",
    "![x](lecture22_img.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1afbd17a",
   "metadata": {},
   "source": [
    "Suppose we have the following query:\n",
    "\n",
    "`SELECT A,B,C,D FROM R WHERE R.A=10 AND R.B>5 AND R.B<10 AND R.C > 20;`  \n",
    "And indices:\n",
    "\n",
    "I1 on R(A)  \n",
    "I2 on R(B)  \n",
    "I3 on R(B,A,C)  \n",
    "I4 on R(A,B,C,D)  \n",
    "\n",
    "Possible access paths:\n",
    "\n",
    "- Sequential scan: read all pages of R\n",
    "- Use index I1: \n",
    "  - find all tuples with A=10, \n",
    "  - read the matching tuples from disk and check on B and C conditions, \n",
    "  - output A,B,C,D\n",
    "- Use index I2:\n",
    "  - find all tuples with B<10, \n",
    "  - read the matching tuples from disk and check on A and C conditions, \n",
    "  - output A,B,C,D\n",
    "- Use index I1, find all tuples with A=10\n",
    "  - Use index I2, find all tuples with B<10, \n",
    "  - Find the intersection of tuples to find those that satisfy both\n",
    "  - read the matching tuples from disk and check on C condition, \n",
    "  - output A,B,C,D\n",
    "- Use index I3, on R(B,A,C)  : \n",
    "  - Start the search at A=10 and B>5, end the scan at B>=10 (more or less same as the search for B>5 and B<10).  \n",
    "  - Find all tuples that satisfy conditions on A,B and C\n",
    "  - Read the tuples to returnt the missing D value\n",
    "- Use index I4 on R(A,B,C,D) \n",
    "  - Scan the index for A and B conditions \n",
    "  - Return the A,B,C,D values of the found tuples that satisfy A,B,C conditions \n",
    "  - **This is called an index only scan**\n",
    "  \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "35a73a4f",
   "metadata": {},
   "source": [
    "Assume a page has 8000 bytes of addressable space, a tuple pointer is 10 bytes\n",
    "\n",
    "movieroles(actorid, movieid, role, info1, info2) \n",
    "\n",
    "TUPLES(movieroles) = 265K \n",
    "PAGES(movieroles) = 2048 \n",
    "\n",
    "`select actorid, role from movieroles where actorid = 123;'\n",
    "\n",
    "- Cost = 2048\n",
    "\n",
    "- Index I1 on mr(actorid)\n",
    "  - 4+10= 14 bytes per entry \n",
    "  - Index can store 570 entries per page. (8000/14)\n",
    "  - Leaf level has 465 nodes (if all nodes are 100% full) -- the tree will have 2 levels (root and leaf)\n",
    "  - Leaf level has 665 nodes (if all nodes are 70% full) -- the tree will have 3 levels (root, internal, leaf)\n",
    "- Index I2 on mr(movieid)\n",
    "  - Same as I1\n",
    "- Index I3 on mr(movieid, actorid)\n",
    "  - 4+4+10= 18 bytes per entry \n",
    "  - Index can store 445 entries per page. (8000/18)\n",
    "  - Leaf level has 851 nodes  (if all nodes are 70% full) -- the tree will have 3 levels (root, internal, leaf)\n",
    "- Index I4 on mr(actorid, movieid, role)  \n",
    "  - 4+4+15+10=23 bytes per entry\n",
    "  - Index can store 347 entries per page\n",
    "  - 1091 leaf nodes at 70% full -- the tree will have 3 levels (root, internal, leaf)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa55e854",
   "metadata": {},
   "source": [
    "Assume indices are 70% full:\n",
    "\n",
    "`select movieid, role from movieroles where actorid = 1234;'\n",
    "\n",
    "- Sequential scan: Cost = 2048\n",
    "\n",
    "- Index I1, (assume about 5 movies per actor), 1 or 2 leaf nodes to scan\n",
    "  - Index scan: 1 root, 1 internal, 1 or 2 (worst case) leaf nodes = 4 disk pages\n",
    "  - Read the matching 5 tuples from disk (worst case 5 different pages)\n",
    "  - Total cost = 4 (index scan) + 5 (reading the tuples) = 9 pages\n",
    "  \n",
    "- I2 not usable for this query\n",
    "- I3:\n",
    "  - Index scan:Scan root, internal and then all leaf:  2+ 851 nodes\n",
    "  - Read the matching 5 tuples from disk for the \"role\" attribute (worst case 5 different pages)\n",
    "  - Total cost: 853+5= 858\n",
    "- I4:\n",
    "  - Index scan: 1 root, 1 internal, 1 or 2 (worst case) leaf nodes = 4 disk pages\n",
    "  - Return movieid, role directly from the index \n",
    "  - Total cost = 4 (index scan) = 4 pages\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "d6cc2ad2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4.491560312885961"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "1091/(347*.70)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d2169e06",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
