{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "56d438ef",
   "metadata": {},
   "source": [
    "## Lecture 20 - Secondary Storage Management\n",
    "\n",
    "### Announcements\n",
    "\n",
    "- Hw#5 is due tonight, I will extend the deadline to monday night. We will grade it over the Thanksgiving break.\n",
    "\n",
    "- No new hw or lecture exercise today\n",
    "\n",
    "- We need to discuss the next hw assignment.\n",
    "\n",
    "### Secondary Storage\n",
    "\n",
    "- Data is stored on disk for stable storage.\n",
    "\n",
    "- Query execution, to run a query\n",
    "  - Parse query, validate, optimize and generate query plan\n",
    "  - Execute the query:\n",
    "    - Read the data from disk into memory\n",
    "    - Compute operations (such as joins, group by, sort, etc.)\n",
    "    - Output the results\n",
    "    \n",
    "- Reading data from disk is much much more expensive than doing operations in memory.\n",
    "\n",
    "- Unit of data to be read is a **disk page**, often 8K in size, read into a **memory block**. We will assume a disk page and a memory block are the same size.\n",
    "\n",
    "- For any relation, we will use PAGES(R) as the total number of pages the relation is stored in\n",
    "\n",
    "- A page stores many tuples, how many tuples depends on the size of a tuple\n",
    "\n",
    "\n",
    "### Types of storage\n",
    "\n",
    "- Magnetic disks: to read a page, we need to move a disk head (seek) and wait for the disk to spin to the correct place (rotational latency) and read the page (data transfer)\n",
    "\n",
    "  Cost of reading a page = seek time + rotational latency + transfer time\n",
    "  \n",
    "  Cost of reading 100 pages:\n",
    "  \n",
    "  100 pages in random locations: = 100 * (seek time + rotational latency + transfer time)\n",
    "\n",
    "  100 pages in a sequential location (same track)   \n",
    "            = seek time + rotational latency + 100* transfer time\n",
    "\n",
    "\n",
    "  There is a big difference in cost of random and sequential I/O\n",
    "  \n",
    "- Solid state disks: no movable parts and faster than random I/O for sure\n",
    "\n",
    "- Virtual machines (access disk through a virtualization layer that has additional costs)\n",
    "\n",
    "\n",
    "ioping . -warmup 5 -c 10 \n",
    "\n",
    "us -> microseconds, 1/1 000 000\n",
    "ms -> miliseconds, 1/1 000 \n",
    "\n",
    "Solid state: min/avg/max/mdev = 51 us / 56 us / 60 us / 3.52 us\n",
    "Magnetic disk: min/avg/max/mdev = 158.4 us / 170.9 us / 180.2 us / 8.82 us\n",
    "Virtual machine: min/avg/max/mdev = 472.6 us / 504.5 us / 588.1 us / 43.7 us\n",
    "\n",
    "\n",
    "- Cost of a query: Number of pages read from disk (or written to disk) to answer a query\n",
    "\n",
    "\n",
    "SELECT * \n",
    "FROM R \n",
    "WHERE R.A = 5 ; \n",
    "\n",
    "- What is the cost of this query?\n",
    "\n",
    "- Sequential scan:\n",
    "   - Read all the data pages into memory\n",
    "   - Read all the tuples and check if R.A=5\n",
    "   - If yes, output\n",
    "   \n",
    "- Cost of sequential scan: PAGES(R)   \n",
    "\n",
    "- The main cost in sequential scan is driven by how many pages the relation spans, not how many tuples it has\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
