{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0dbce42b",
   "metadata": {},
   "source": [
    "## Lecture 21 - B-tree indices\n",
    "\n",
    "### Announcements \n",
    "\n",
    "- Homework 5 due tonight    \n",
    "- Lecture Exercise 21 to be posted at 2pm today, due on wednesday midnight\n",
    "- Initial (unofficial) version of the grade db is up\n",
    "\n",
    "\n",
    "### Today's lecture \n",
    "\n",
    "- Secondary access methods\n",
    "  - Tree indices\n",
    "- B-tree indices\n",
    "- Insertion/deletion to B-trees\n",
    "- Searching in B-trees\n",
    "\n",
    "Two variables:\n",
    "\n",
    "- Pages (R)\n",
    "- Tuples(R)\n",
    "\n",
    "- So if X = Tuples(R)/Pages(R) then X is the number of tuples per disk page\n",
    "- Note that X depends on how large the tuples are\n",
    "\n",
    "- Primary access method:read every page of a relation to answer a query\n",
    "  - Sequential scan\n",
    "  - Cost: Pages(R)\n",
    "\n",
    "\n",
    "- Secondary access method: an additional access method for a query\n",
    "  - An index on a number of attributes\n",
    "\n",
    "- Each node in a **dense index** contains an entry for each tuple\n",
    "  - An entry is the key value (if I index on R(A), then A is the key) and a pointer to the tuple\n",
    "  \n",
    "- Each node in a **sparse index** stores an entry for a range of values. Often the upper levels of tree indices are sparse.  \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9da67124",
   "metadata": {},
   "outputs": [],
   "source": [
    "load_ext sql"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3e6414a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span style=\"None\">displaylimit: Value None will be treated as 0 (no limit)</span>"
      ],
      "text/plain": [
       "displaylimit: Value None will be treated as 0 (no limit)"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<span style=\"None\">Connecting and switching to connection &#x27;parks&#x27;</span>"
      ],
      "text/plain": [
       "Connecting and switching to connection 'parks'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%config SqlMagic.displaylimit = None\n",
    "%sql --section parks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "9e34ed9e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span style=\"None\">Running query in &#x27;parks&#x27;</span>"
      ],
      "text/plain": [
       "Running query in 'parks'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<span style=\"color: green\">22 rows affected.</span>"
      ],
      "text/plain": [
       "22 rows affected."
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>tablename</th>\n",
       "            <th>reltuples</th>\n",
       "            <th>relpages</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>places</td>\n",
       "            <td>531.0</td>\n",
       "            <td>112</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>place_tags</td>\n",
       "            <td>3261.0</td>\n",
       "            <td>35</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>people</td>\n",
       "            <td>222.0</td>\n",
       "            <td>32</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>people_facts</td>\n",
       "            <td>1130.0</td>\n",
       "            <td>21</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>place_relatedparks</td>\n",
       "            <td>565.0</td>\n",
       "            <td>19</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>people_tags</td>\n",
       "            <td>1593.0</td>\n",
       "            <td>17</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>people_relatedparks</td>\n",
       "            <td>249.0</td>\n",
       "            <td>12</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_topics</td>\n",
       "            <td>541.0</td>\n",
       "            <td>9</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>sql_features</td>\n",
       "            <td>755.0</td>\n",
       "            <td>8</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>people_images</td>\n",
       "            <td>149.0</td>\n",
       "            <td>7</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_images</td>\n",
       "            <td>151.0</td>\n",
       "            <td>7</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>parks</td>\n",
       "            <td>32.0</td>\n",
       "            <td>6</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_activities</td>\n",
       "            <td>394.0</td>\n",
       "            <td>5</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_hours</td>\n",
       "            <td>137.0</td>\n",
       "            <td>2</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_emails</td>\n",
       "            <td>32.0</td>\n",
       "            <td>1</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_phoneno</td>\n",
       "            <td>41.0</td>\n",
       "            <td>1</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_addresses</td>\n",
       "            <td>64.0</td>\n",
       "            <td>1</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>sql_implementation_info</td>\n",
       "            <td>12.0</td>\n",
       "            <td>1</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>sql_sizing</td>\n",
       "            <td>23.0</td>\n",
       "            <td>1</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>sql_parts</td>\n",
       "            <td>11.0</td>\n",
       "            <td>1</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>park_states</td>\n",
       "            <td>75.0</td>\n",
       "            <td>1</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>output_lines</td>\n",
       "            <td>0.0</td>\n",
       "            <td>0</td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "+-------------------------+-----------+----------+\n",
       "|        tablename        | reltuples | relpages |\n",
       "+-------------------------+-----------+----------+\n",
       "|          places         |   531.0   |   112    |\n",
       "|        place_tags       |   3261.0  |    35    |\n",
       "|          people         |   222.0   |    32    |\n",
       "|       people_facts      |   1130.0  |    21    |\n",
       "|    place_relatedparks   |   565.0   |    19    |\n",
       "|       people_tags       |   1593.0  |    17    |\n",
       "|   people_relatedparks   |   249.0   |    12    |\n",
       "|       park_topics       |   541.0   |    9     |\n",
       "|       sql_features      |   755.0   |    8     |\n",
       "|      people_images      |   149.0   |    7     |\n",
       "|       park_images       |   151.0   |    7     |\n",
       "|          parks          |    32.0   |    6     |\n",
       "|     park_activities     |   394.0   |    5     |\n",
       "|        park_hours       |   137.0   |    2     |\n",
       "|       park_emails       |    32.0   |    1     |\n",
       "|       park_phoneno      |    41.0   |    1     |\n",
       "|      park_addresses     |    64.0   |    1     |\n",
       "| sql_implementation_info |    12.0   |    1     |\n",
       "|        sql_sizing       |    23.0   |    1     |\n",
       "|        sql_parts        |    11.0   |    1     |\n",
       "|       park_states       |    75.0   |    1     |\n",
       "|       output_lines      |    0.0    |    0     |\n",
       "+-------------------------+-----------+----------+"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%sql \n",
    "\n",
    "SELECT \n",
    "   pt.tablename\n",
    "    , pc.reltuples\n",
    "    , pc.relpages\n",
    "FROM   \n",
    "    pg_class pc\n",
    "    , pg_tables pt\n",
    "WHERE  \n",
    "    pt.tableowner=user\n",
    "    and pt.tablename not like 'pg%'\n",
    "    and pt.tablename = pc.relname\n",
    "order by pc.relpages desc;\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9bf05582",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n",
    "\n",
    "\n",
    "select \n",
    "    relpages\n",
    "    , relname \n",
    "from \n",
    "    pg_class \n",
    "where \n",
    "    relname not like 'pg%' \n",
    "    and relpages>0;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fb6b1381",
   "metadata": {},
   "source": [
    "Suppose I create an index on places(placeid)\n",
    "\n",
    "where placeid is 36 bytes\n",
    "a tuple pointer is 10 bytes\n",
    "\n",
    "For each tuple, I need 46 bytes\n",
    "\n",
    "TUPLES(places) = 531\n",
    "\n",
    "A disk page is 8000 bytes (assume)\n",
    "\n",
    "8000/(46) approx= 170 tuples per page\n",
    "\n",
    "531/170 approx= 4 pages of index \n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d72ac39",
   "metadata": {},
   "source": [
    "### B-tree indices\n",
    "\n",
    "- Secondary access method on a key value, R(A) or R(A,B)\n",
    "\n",
    "- Each node in the B-tree is a disk page, stored on disk\n",
    "- Leaf nodes of the B-tree are dense, leaf nodes store on entry per tuple\n",
    "    - An entry: a key value + a disk page pointer (points to a tuple)\n",
    "- Internal nodes of the B-tree are sparse, they point to the B-tree nodes in the level below. \n",
    "    - Each entry is a key value and a disk page pointer (points to a B-tree node page)\n",
    "    \n",
    "- Leaf nodes have an extra pointer to the next node in the leaf to the right (sibling pointers)\n",
    "\n",
    "- Each node in the B-tree has a maximum capacity, which we call n: maximum number of key value and pointer pairs I can store in a page\n",
    "\n",
    "- Each node (except root) has a minimum capacity, floor((n+1)/2) entries\n",
    "\n",
    "### Searching  in B-trees\n",
    "\n",
    "**Searching key = value**\n",
    "\n",
    "- Start from root, find the internal node in the next level until you reach the leaf level\n",
    "- If the value is found, return the key value and tuple address\n",
    "\n",
    "**Searching key in range [A,B]**\n",
    "\n",
    "- Start from root searching for the first leaf node >=A, find the internal node in the next level until you reaach  the leaf level.\n",
    "- Scan the leaf level using sibling pointers until you find the first leaf node greater than B. \n",
    "- Output all found key values and associated tuple pointers\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e59fb86",
   "metadata": {},
   "source": [
    "![x](lecture21_figures/page1.png)\n",
    "![x](lecture21_figures/page2.png)\n",
    "![x](lecture21_figures/page3.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a16c3731",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
