{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "d5638aac",
   "metadata": {},
   "source": [
    "Sel(Cond) = percentage of tuples that will pass this condition\n",
    "\n",
    "High selectivity, low Sel(Cond) value\n",
    "\n",
    "0.001 is more selective than 0.1\n",
    "\n",
    "\n",
    "- Q1:\n",
    "  - Sel(A in range condition) =  (between 1K-11K), 10/12\n",
    "  - Sel(C=8) = 1/50\n",
    "  - Sel(D=50) = 1/1000\n",
    "  \n",
    "  - Attribute D is the most selective index, R(A)\n",
    "  - Sel(Q1= Cond1 and Cond2 and Cond3) = (10/12 * 1/50 * 1/1000)\n",
    "  - Exp(Q1) = Sel(Q1) * Tuples(R)\n",
    "            = (10/12) * (1/50) * (1/1000) * 100000\n",
    "            \n",
    "  - PAGES(Q1) = PAGES(R) * Sel(Q1)\n",
    "  - The best index for an index only search: R(CDAB) or R(DCAB)\n",
    "  \n",
    "  - Suppose I have indices\n",
    "    - I1 on R(A)\n",
    "      - Scan 2+ leaf nodes for A condition: Cost = 2+ (10/12) * #leafnodes(I1)\n",
    "      - Read the matching tuples to check C,D conditions and return the B attribute:  100,000*(10/12)\n",
    "    - I2 on R(A,C)\n",
    "      - Index scan cost = 2+ #leaf(I2) * (10/12) (A condition only)\n",
    "      - After scanning, we find 100000*(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.\n",
    "    - I3 on R(D)\n",
    "      - Index scan cost: Scan 2+ #leaf(I3)/1000 leaf nodes\n",
    "      - Relation cost: 100000/1000=100 matching tuples to read from disk\n",
    "    - I4 on R(B,C,D,A) \n",
    "      - Scan all leaf nodes and no relation tuples\n",
    "      - Cost = 2+ #leaf(I4)\n",
    "    - I5 on R(C,D,A,B)\n",
    "      - Scan for C,D and A conditions\n",
    "      - Cost = 2 + #leaf(I5) * Sel(Q1) + 0 (relation cost) because all attributes are in the relation\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27c94bc8",
   "metadata": {},
   "source": [
    "Tuples(R) = 100,000  \n",
    "Pages(R) = 4,000  \n",
    "\n",
    "Index I5, #Leaf(I5) = 2,000  \n",
    "Index I2, #Leaf(I2) = 1,000  \n",
    "\n",
    "\n",
    "R has 100000/4000=25 tuples per page  \n",
    "\n",
    "I5 has 100000/2000= 50 tuples per page  \n",
    "I5 has 100000/2000= 100 tuples per page  \n",
    "\n",
    "- Scan for C,D and A conditions  \n",
    "  Exp(Q1) = 100000 * (10/12) * (1/50) * (1/1000) = 2 tuples  \n",
    "  \n",
    "  - How many leaf nodes? 2\n",
    "  \n",
    "  \n",
    "I2 on R(A,C)    \n",
    "      - Index scan for A condition:  \n",
    "      Exp(A range) = 100000*(10/12) = 83,334 tuples  \n",
    "      For I2, 83,334/100 = 834 pages  \n",
    "      - #leaf(I2) = 1000, 10/12 of it is scanned: 1000*10/12  \n",
    "      - After scanning, we find 100000*(10/12)*(1/50)=1666 tuples that satisfy A and C conditions, read tuples from disk. Read 1666 tuples from 1666 pages in the worst case.  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1f7158a",
   "metadata": {},
   "source": [
    "- Block nested loop join (M number of memory blocks available):\n",
    "  - R join S (read left relation once (R), and the right relation a number of times?  \n",
    "  \n",
    "  - PAGES(R)= 1,000\n",
    "  - PAGES(S)= 500\n",
    "  - M=1001, Cost = 1,000+ 500 (Read R once and Read S once)\n",
    "  - M=101, Cost = 1,000 + 500 * (1000/100) = 6,000\n",
    "  - M=51, Cost = 1,000 + 500 * (1000/50) = 11,000\n",
    "  \n",
    "  - S join R\n",
    "  - M=1001, Cost = 1,500\n",
    "  - M=101, Cost = 500 + (500/100) * 1,000 = 5,500\n",
    "  - M=51, Cost = 500 + (500/50)*1,000 = 10,500\n",
    "  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38f05505",
   "metadata": {},
   "source": [
    "Lecture exercise 24:\n",
    "    \n",
    "PAGES(R)= 1,000\n",
    "\n",
    "PAGES(S)= 500\n",
    "\n",
    "PAGES(SELECT(R))= 200 (size after selection)\n",
    "\n",
    "M=101\n",
    "\n",
    "- Cost of reading R, 1,000\n",
    "- Cost of reading S: 500* (200/100) = 1,000\n",
    "- Cost of this join: 2,000\n",
    "\n",
    "- Sort: \n",
    "\n",
    " PAGES(SELECT(R JOIN S))= 4,000 (size after join and selection)\n",
    "- Sorting,  M=100\n",
    "  - Step 1: Read T in 100 pages at a time, sort and write\n",
    "  - Cost = Read once (0 cost since it is output by the join), write once = 4,000 (produce 4000/100=40 sorted groups)\n",
    "  - Step 2: Merge 40 groups and output (Cost = read T once, 4,000)\n",
    "  - Total cost = 2*4,000\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5aaec768",
   "metadata": {},
   "source": [
    "- Sorting\n",
    "  - PAGES(T) = 4000, M=100\n",
    "  - Step 1: Read T in 100 pages at a time, sort and write\n",
    "  - Cost = Read once, write once = 8,000 (produce 4000/100=40 sorted groups)\n",
    "  - Step 2: Merge 40 groups and output (Cost = read T once, 4,000)\n",
    "  - Total cost = 3*4,000\n",
    "  \n",
    "- PAGES(T)=4,000, M= 50\n",
    "  - Step 1: Cost = 8,000 pages, produce 4,000/50=80 sorted groups\n",
    "  - Step 2, part 1: Read once,write once, reduce 80 sorted groups to 2\n",
    "  - Cost = 8,000\n",
    "  - Step 2, part 2: Read once, merge 2 groups, Cost = 4,000\n",
    "  - Total cost= 4,000*5"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
