Introduction to Relational Databases¶
Course Notes¶
- These notes are meant as a study guide. They will often be an outline of the material discussed in class.
- The notes are meant to complement the book, not replace.
Terminology¶
- DBMS (Database Management System) is a software tool for storing and managing large amounts of data.
- Database is a collection of data organized for a specific application, often stored in a DBMS.
- Database application is a software product that uses DBMSs to store one or more databases to accomplish a specific purpose.
What makes something a DBMS?¶
- Can we call any tool for storing data a DBMS?
- Is Excel a DBMS?
- Is your file system a DBMS?
- What are desired properties of a DBMS?
- what type of data can be stored (data model)
- find and change stored data easily (query language)
- store massive amounts of data
- allow durable data storage, even when there are power failures (durability)
- allow multiple users to read and write the same data (consistency)
DBMS Components¶
A DBMS is a complex software system with many components:
- Storage manager
- Index/file manager
- Database language tools
- Data query/manipulation language compiler (DML)
- Data definition language compiler (DDL)
- Query execution engine
- Buffer manager
- Transaction manager
- Logging and recovery
- Concurrency control
- A database administrator is responsible for designing the data model.
- A database programmer is responsible for writing application software that stores the database.
- A DBMS systems administrator is responsible for installation and tuning of the DBMS system.
A C I D¶
A program that changes data is called a transaction.
A DBMS is generally expected to support ACID properties for transactions that can be implemented on them:
- Atomicity: transactions must either complete fully or leave no effect in the database.
- Consistency: DMBS must not allow the programmers to violate the consistency rules for the database schema.
- Isolation: multiple transactions executing at the same time should result in a database state identical to transactions executing one at a time.
- Durability: when a transaction completes, DBMS must guarantee its results are recorded and not lost.
There may be different ways to define these properties in a DBMS as we will see.
Databases¶
- Database
- is given by rules regarding data (data schema or data model) and the data (database instance)
- Database schema (or data model) describes what types of data are valid to store
- Database instance is the actual data that satisfies the rules of the database schema.
- We often use the term database to refer to the database instance assuming the data model is encapsulated within the data
- Relational data model is the most popular way to describe data schema, but many others are possible: RDF, graph data, object-oriented, object-relational
Data Model¶
- Logical data model
- Relations and attributes
- Constraints: what is valid data and what is not
- Physical data model
- Where to store the data: which file systems, distributed, replicated
- How to store the data: which indices to create
- Application logic
- Built on top of database queries: declarative, write once and optimize on top of the logical data model
Relational Data Model¶
Relations (or tables) store information about the world
Attribute (or column) is a property of a specific object represented by a relation
Tuple (or row) is a specific object stored in a relation.
Domain is a set of valid values.
- Simple domains are integers, strings.
- More complex domains can be defined with restrictions over these domains: a RIN is an 8 digit integer, starts with 6.
Schema for a relation defined the names of the attributes and the domain for the attributes.
Note: logical vs. physical names are used interchangeably, but remember the distinction:
- Logical terms refer to the mathematical definition of the relational data model: based on set semantics.
- Physical terms refer to the storage/implementation of the same data model. Sometimes the implementation is not identical to the logical model.
For now, we care about the logical model.
Relational Data Model¶
- A database is given by a set of relations.
- Each relation has a name and stores a set of tuples.
- Each relation schema consists of a set of attributes, the ordering of the attributes is not relevant.
- Each attribute has a domain, the set of valid values.
Relation Instance¶
A relation contains a set of tuples
In a valid relation instance each tuple contains values for all the attributes in the relation schema that are drawn from the domain of that attribute.
We can represent a relation in one of many ways:
- A table:
Hero
Alias Name Flash Barry Allen Arrow Oliver Queen Jessica Jones Jessica Jones A logical representation of tuples using predicates where the attributes are arguments of the predicate. Each tuple is a fact about the world.
Hero('Flash', 'Barry Allen') Hero('Arrow', 'Oliver Queen') Hero('Jessica Jones', 'Jessica Jones')
A set representation:
Hero = { <'Flash':Alias, 'Barry Allen':Name>, <'Arrow':Alias, 'Oliver Queen':Name>, <'Jessica Jones':Alias, 'Jessica Jones':Name> }
All representations are equivalent after we agree on the convention.
Example relation: Avengers¶
Rules of Relational Data Model¶
- The domain of attributes have to be simple: integer, float, decimal,
string, boolean,
date, time, timestample or restrictions of these (9 digit integer).
- This restriction is called the first normal form (1NF): attributes are indivisible pieces of information.!
- It says that relations are simple flat pieces of information.
- Each relation contains a set of attributes: the ordering of attributes is not important for the meaning of the relation.
- Each relation instance contains a set of tuples. No two tuples can repeat, because we are making a logical statement:
- Jessica Jones is an avenger. This does not change even if we repeat this value multiple times.
- All relations have at least one key.
Key¶
A key is a set of attributes in a relation such that no two different tuples may have the same value for the attributes in the key.
In different terms: a key is a way to identify a specific tuple.
Keys define the meaning of the relation.
All relations have a key. Some relations may have multiple keys.
We will discuss some basic relations: student, class, section, book
Student(rin, name, major, year) Key: rin
Movies(title, year, studio, boxofficevalue) Key: title, year
We generally underline the key attributes. See below:
Defining relations in SQL¶
To store a relation, we create a Table in a relational database system.
Examples of attribute types are following:
- CHAR(n), VARCHAR(n), BIT(n)
- BOOLEAN
- INT
- FLOAT, DOUBLE ; floating point precision
- NUMERIC(n,p) ; fixed point precision
- DATE
Create table command
CREATE TABLE tablename ( attribute1 datatype , attribute2 datatype , ... , attributen datatype constraints ) ;
Example
CREATE TABLE student ( id int , name varchar(255) , major char(4) , enrolledDate date , constraint student_pk primary key (id) -- student_pk is the name we have given to the primary key constraint )
SQL for changing tables¶
Delete a table (with all the tuples in it):
DROP TABLE tablename ;
Add a new attribute to a table:
ALTER TABLE tableName add attributeName attributeType ;
Remove an attribute from a table:
ALTER TABLE tableName drop attributeName ;