.. role:: underline
    :class: underline


Introduction to Relational Databases
=====================================

Course Notes
-------------

- These notes are meant as a study guide. They will often be an
  outline of the material discussed in class.

- The notes are meant to complement the book, not replace.


Terminology
------------

- *DBMS (Database Management System)*
  is a software tool for storing and managing large amounts of data.

  A database server is a specific installation of a DBMS.

- *Database*
  is a collection of data organized for a specific
  application, often stored in a DBMS.

- *Database application*
  is a software product that uses DBMSs to
  store one or more databases to accomplish a specific purpose.

What makes something a DBMS?
-----------------------------

- Can we call any tool for storing data a DBMS?

  - Is Excel a DBMS?
  - Is your file system a DBMS?

- What are desired properties of a DBMS?

  - what type of data can be stored (data model)
  - store massive amounts of data 
  - allow access (read/write/update) to stored data easily (query language)
  - allow durable data storage, even when there are power failures (durability)
  - allow multiple users to read and write the same data without compromising data integrity (concurrent access)


DBMS Components
----------------

A DBMS is a complex software system with many components:

- Storage manager

  - Index/file manager

- Database language tools
  
  - Data query/manipulation language compiler (DML)
  - Data definition language compiler (DDL)

- Query execution engine
  
  - Buffer manager

- Transaction manager
  
  - Logging and recovery
  - Concurrency control

- *A database administrator*
  is responsible for designing the 
  data model. 

- *A database programmer*
  is responsible for writing application
  software that stores the database.

- *A DBMS systems administrator*
  is responsible for installation 
  and tuning of the DBMS system.

A C I D
-------

A program that changes data is called a *transaction*.

A DBMS is *generally* expected to support ACID properties for
transactions that can be implemented on them:

- Atomicity: transactions must either complete fully or leave no
  effect in the database.
- Consistency: DMBS must not allow the programmers to violate the
  consistency rules for the database schema.
- Isolation: multiple transactions executing at the same time should
  result in a database state identical to transactions executing one
  at a time.
- Durability: when a transaction completes, DBMS must guarantee its
  results are recorded and not lost. 

There may be different ways to define these properties in a DBMS as we
will see.


Databases
----------

- *Database*
   is given by rules regarding data (data schema or data model) and
   the data (database instance)

- *Database schema (or data model)* 
  describes what types of data are valid to store

- *Database instance* 
  is the 
  actual data that satisfies the rules of the database schema.

- We often use the term database to refer to the database instance
  assuming the data model is encapsulated within the data

- Relational data model is the most popular way to describe data
  schema, but many others are possible: RDF, graph data,
  object-oriented, object-relational
 
Data Model
----------

- Logical data model

  - Relations and attributes
  - Constraints: what is valid data and what is not

- Physical data model

  - Where to store the data: which file systems, distributed, replicated
  - How to store the data: which indices to create

- Application logic

  - Built on top of database queries: declarative, write once and 
    optimize on top of the logical data model

Relational Data Model
-----------------------

- *Relations (or tables)*
  store information about the world

- *Attribute (or column)* 
  is a property of a specific object represented by a relation

- *Tuple (or row)* is a specific object stored in a relation.

- *Domain* 
  is a set of valid values. 

  - Simple domains are integers, strings.
  - More complex domains can be defined with restrictions over these
    domains: a RIN is an 8 digit integer, starts with 6.

- *Schema* 
  for a relation defined the names of the attributes and the
  domain for the attributes.

- Note: logical vs. physical names are used interchangeably, but
  remember the distinction:

  - Logical terms refer to the mathematical definition of the relational 
    data model: based on set semantics.

  - Physical terms refer to the storage/implementation of the same
    data model. Sometimes the implementation is not identical to the
    logical model. 

  For now, we care about the logical model.


Relational Data Model
-----------------------

- A database is given by a set of relations. 

- Each relation has a name and stores a set of tuples.

- Each relation schema consists of a set of attributes, the ordering
  of the attributes is not relevant.

- Each attribute has a domain, the set of valid values. 

Relation Instance
-----------------

- A relation contains a set of tuples

- In a valid relation instance each tuple contains values 
  for all the attributes in the relation
  schema that are drawn from the domain of that attribute.

- We can represent a relation in one of many ways:

  - A table:

   **Hero**

   ==============  ===============
   Alias           Name
   ==============  ===============
   Black Panther   T'Challa
   Flash           Barry Allen
   Jessica Jones   Jessica Jones
   ==============  ===============

  - A logical representation of tuples using predicates where
    the attributes are arguments of the predicate. Each tuple is a
    fact about the world.

    ::

     Hero('Black Panther', 'T''Challa')
     Hero('Flash', 'Barry Allen')
     Hero('Jessica Jones', 'Jessica Jones')


  - A set representation:

    ::

      Hero = { <'Black Panther':Alias, 'T''Challa':Name>,
               <'Flash':Alias, 'Barry Allen':Name>,
               <'Jessica Jones':Alias, 'Jessica Jones':Name> }

  - All representations are equivalent after we agree on the convention.

Example relation: Avengers
---------------------------

==========================  ============  =======  =======  =========  ========================================================
Name/Alias                  Appearances   Gender   Year     NumYears   URL
==========================  ============  =======  =======  =========  ========================================================
Peter Benjamin Parker       4333          MALE     1990     25         http://marvel.wikia.com/Peter_Parker_(Earth-616)#
Steven Rogers               3458          MALE     1964     51         http://marvel.wikia.com/Steven_Rogers_(Earth-616)
James "Logan" Howlett       3130          MALE     2005     10         http://marvel.wikia.com/James_Howlett_(Earth-616)#
Anthony "Tony" Stark        3068          MALE     1963     52         http://marvel.wikia.com/Anthony_Stark_(Earth-616)
Thor Odinson                2402          MALE     1963     52         http://marvel.wikia.com/Thor_Odinson_(Earth-616)
Reed Richards               2125          MALE     1989     26         http://marvel.wikia.com/Reed_Richards_(Earth-616)#
Robert Bruce Banner         2089          MALE     1963     52         http://marvel.wikia.com/Robert_Bruce_Banner_(Earth-616)
Clinton Francis Barton      1456          MALE     1965     50         http://marvel.wikia.com/Clint_Barton_(Earth-616)
Henry Jonathan "Hank" Pym   1269          MALE     1963     52         http://marvel.wikia.com/Henry_Pym_(Earth-616)
Natalia Alianovna Romanova  1112          FEMALE   1973     42         http://marvel.wikia.com/Natalia_Romanova_(Earth-616)#
Victor Shade (alias)        1036          MALE     1968     47         http://marvel.wikia.com/Vision_(Earth-616)
Carol Susan Jane Danvers    935           FEMALE   1979     36         http://marvel.wikia.com/Carol_Danvers_(Earth-616)#
Jennifer Walters            933           FEMALE   1982     33         http://marvel.wikia.com/Jennifer_Walters_(Earth-616)#
Jessica Miriam Drew         525           FEMALE   2008     7          http://marvel.wikia.com/Jessica_Drew_(Earth-616)#
Roberto da Costa            491           MALE     2013     2          http://marvel.wikia.com/Roberto_da_Costa_(Earth-616)#
Maria Hill                  359           FEMALE   2010     5          http://marvel.wikia.com/Maria_Hill_(Earth-616)#
Jessica Jones               205           FEMALE   2010     5          http://marvel.wikia.com/Jessica_Jones_(Earth-616)#
Black Panther                             MALE     1966                https://marvel.fandom.com/wiki/T'Challa_(Earth-616)
==========================  ============  =======  =======  =========  ========================================================

Rules of Relational Data Model
-------------------------------

- The domain of attributes have to be simple: integer, float, decimal, 
  string, boolean,
  date, time, timestample or restrictions of these (9 digit integer).

  - This restriction is called the first normal form (1NF): attributes are
    indivisible pieces of information (so no sets or lists for example).!

  - It says that relations are simple flat pieces of information.


- Each relation contains a set of attributes: the ordering of attributes is not important for the meaning of the relation.

- Each relation instance contains a set of tuples. No two tuples can repeat, because we are making a logical statement:

  - Jessica Jones is an avenger. This does not change even if we repeat this value multiple times.

- All relations have at least one key. 


Key
---

- A key is a set of attributes in a relation such that:

  - no two different tuples may have the same value for the attributes in the key,
  - and no subset of these attributes is a key (i.e. it is minimal).  

- In different terms: a key is a way to identify a specific tuple. 

- Keys define the meaning of the relation.

- All relations have a key. Some relations may have multiple keys.

- We will discuss some basic relations: student, class, section, book

  Student(rin, name, major, year)   Key: rin

  Movies(title, year, studio, boxofficevalue)  Key: title, year

- We generally underline the key attributes. See below:

  .. math::

     Student(\underline{rin}, name, major, year)
     
     Movies(\underline{title, year}, studio, boxofficevalue)

- The key really depends on what the specific data model and its attributes
  are representing.

  - In the above example, we are modeling an RPI student for whom RIN is unique.
  - For movies, we are assuming no two different movies with the same title will come out
    in the same year, because that would be confusing.
    But, if our model was about the specific release of a movie, then the box office value
    would change from country to country. So, our key would need to reflect that:

  .. math::

     MovieReleases(\underline{title, year, country}, studio, boxofficevalue)
    
Defining relations in SQL
-------------------------

- To store a relation, we create a Table in a relational database system.

- Examples of attribute types are following:

  - CHAR(n), VARCHAR(n), BIT(n)
  - BOOLEAN
  - INT
  - FLOAT, DOUBLE   ; floating point precision
  - NUMERIC(n,p)      ; fixed point precision
  - DATE

-  Create table command
   
   ::
      
      CREATE TABLE tablename
      (
          attribute1  datatype
	  , attribute2  datatype

	  , ...

	  , attributen  datatype
	  constraints
      ) ;
      
-  Example

   ::
      
      CREATE TABLE student
      (
        id int
        , name  varchar(255)
        , major  char(4)
        , enrolledDate   date
        , constraint student_pk primary key (id)
	--  student_pk is the name we have given to the primary key constraint
      )
       

