{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0e991024",
   "metadata": {},
   "source": [
    "## Lecture 24 \n",
    "\n",
    "### Today's topics: \n",
    "- External Sort\n",
    "- Query Processing and Pipelining\n",
    "- Query Optimization\n",
    "\n",
    "### Announcements\n",
    "\n",
    "- Remaining lectures:\n",
    "  - Today: Query Processing\n",
    "  - Next week: Concurrency in Transactions and Database Tuning \n",
    "  \n",
    "- Remaining assignments:\n",
    "  - Hw6: Due on monday midnight\n",
    "  - Lecture Exercise 23: already out, due on Friday 12/5\n",
    "  - Lecture Exercise 24: out today, due on Friday 12/12\n",
    "  - Optional Lecture Exercise 3: out today, due on Friday 12/12\n",
    "  - Lecture Exercise 25: out on monday 12/8, due on Friday 12/12\n",
    "\n",
    "- We will drop the lowest 5 lecture exercises (despite the weird naming convention I used!). So two of these exercises can be treated as optional. \n",
    "\n",
    "- Use lecture exercises to study! Just because we drop some does not mean the material in them is optional. They are there for you to study. \n",
    "\n",
    "- Final Exam is on 12/18, 11:30AM-2:30PM on DCC 308\n",
    "  - If you need extra time, expect to take the exam between 11:30AM-4PM\n",
    "  - Notify me of any conflicts if you have not done so "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5377129b",
   "metadata": {},
   "source": [
    "### External Sort\n",
    "\n",
    "\n",
    "Suppose PAGES(R)=1,000 and M=10,000 blocks available to the query, \n",
    "\n",
    "Cost of external sort: 1,000 reads\n",
    "\n",
    "\n",
    "Suppose PAGES(R)=1,000 and M=1,000 blocks available to the query, \n",
    "\n",
    "Cost of external sort: 1,000 reads\n",
    "\n",
    "\n",
    "Suppose PAGES(R)=1,000 and M=100 blocks available to the query\n",
    "\n",
    "- External sort, step 1:\n",
    "  - Repeat as long as more pages of R to read\n",
    "    - Read 100 pages of R, sort and **write** to disk in a temporary location\n",
    "  - Cost of step 1: 2*PAGES(R) = 2,000 pages\n",
    "  - Resulting in 10 sorted groups of 100 pages each\n",
    "  \n",
    "- Step 2: sort-merge phase\n",
    "  - Read page with the smallest values from each group in memory, until all pages from all sorted groups are\n",
    "  - Merge tuples in sorted order and output\n",
    "  - When a page becomes empty, read the next page from the same group\n",
    "  - I need to be able to read one page from each sorted group, for which I need 10 blocks (M=100)\n",
    "  - Cost= PAGES(R) = 1,000\n",
    "  \n",
    "- Total cost = 3,000  \n",
    "  \n",
    "\n",
    "Suppose PAGES(R)=1,000 and M=20 blocks available to the query\n",
    "\n",
    "- Step 1: Read/sort 20 blocks of data at a time\n",
    "  - Cost = 2,000\n",
    "  - Number of sorted groups = 1000/20= 50 groups\n",
    "  \n",
    "- Step 2: (Read and merge 20 groups and write to disk)*(Repeat 3 times)\n",
    "  - Cost = 2,000\n",
    "  - Resulted in 3 sorted groups. (400, 400, 200 pages)\n",
    "\n",
    "- Step 2: Read and merge the 3 blocks (need 3 blocks), and output\n",
    "  - Cost = 1,000\n",
    "  \n",
    "- Total cost = 5,000  \n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c561746",
   "metadata": {},
   "source": [
    "\n",
    "PAGES(R)=1,000  \n",
    "PAGES(S)=5,000  \n",
    "\n",
    "\n",
    "- Cost of block-nested-loop join, M=19\n",
    "  - R join S (18 blocks for R, 1 block for S): \n",
    "    - Read R once (18 blocks at a time)\n",
    "    - Read S, ceil(1,000/18) times\n",
    "    - Total cost = 1,000 + 56*5,000 = 281,000 pages\n",
    "  - S join R (18 blocks of R, 1 block of S)\n",
    "    - Read S once \n",
    "    - Read R 5000/18 = 278 times\n",
    "    - Total cost = 5000 + 278*1000 = 283,000\n",
    "\n",
    "Cost of Plan A-1:\n",
    "\n",
    "PAGES( PROJECT_(R.A,R.B,R.C) (R) = 500 \n",
    "- Cost of block-nested-loop join, M=19\n",
    "  - R join S (18 blocks for R after the projection, 1 block for S): \n",
    "    - Read R once (18 blocks at a time)\n",
    "    - Read S, ceil(500/18) times\n",
    "    - Total cost = 1,000 + 28*5,000 = 141,000 pages\n",
    "\n",
    "No additional cost to select, so = 141,000 pages\n",
    "\n",
    "Cost of Plan B-1:\n",
    "\n",
    "PAGES( PROJECT_(R.B,R.C) (SELECT_(R.A=10) R) = 15\n",
    "\n",
    "Index on R(A):  200 left nodes \n",
    "Selectivity (R.A=10) = 1/100\n",
    "TUPLES(R) = 20,000\n",
    "\n",
    "\n",
    "Select on R.A=10 using index on R(A)\n",
    "\n",
    "- Scan index 200/100 leaf nodes (for A=10) plus 2 internal\n",
    "- Expect: 20000/100 = 200 matching tuples in 200 pages in the worst case\n",
    "- Cost = 200+4 = 204\n",
    "  - Takes up 15 pages according to the given statistics\n",
    "  \n",
    "- Block-nested-loop join (18 blocks for R after selection and 1 block for S)\n",
    "  - Can fit all matching tuples from the index scan in memory (15<18)\n",
    "  - Read S once\n",
    "  - Cost = 5,000 blocks (read R is in the index search)\n",
    "  \n",
    "- Total cost = 5,204  \n",
    "  \n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "2f9d916d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "277.77777777777777"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "9ede61ab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "141000"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78cf454b",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
