Introduction to Relational Databases
Course Notes
These notes are meant as a study guide. They will often be an outline of the material discussed in class.
The notes are meant to complement the book, not replace.
Terminology
DBMS (Database Management System) is a software tool for storing and managing large amounts of data.
A database server is a specific installation of a DBMS.
Database is a collection of data organized for a specific application, often stored in a DBMS.
Database application is a software product that uses DBMSs to store one or more databases to accomplish a specific purpose.
What makes something a DBMS?
Can we call any tool for storing data a DBMS?
Is Excel a DBMS?
Is your file system a DBMS?
What are desired properties of a DBMS?
what type of data can be stored (data model)
store massive amounts of data
allow access (read/write/update) to stored data easily (query language)
allow durable data storage, even when there are power failures (durability)
allow multiple users to read and write the same data without compromising data integrity (concurrent access)
DBMS Components
A DBMS is a complex software system with many components:
Storage manager
Index/file manager
Database language tools
Data query/manipulation language compiler (DML)
Data definition language compiler (DDL)
Query execution engine
Buffer manager
Transaction manager
Logging and recovery
Concurrency control
A database administrator is responsible for designing the data model.
A database programmer is responsible for writing application software that stores the database.
A DBMS systems administrator is responsible for installation and tuning of the DBMS system.
A C I D
A program that changes data is called a transaction.
A DBMS is generally expected to support ACID properties for transactions that can be implemented on them:
Atomicity: transactions must either complete fully or leave no effect in the database.
Consistency: DMBS must not allow the programmers to violate the consistency rules for the database schema.
Isolation: multiple transactions executing at the same time should result in a database state identical to transactions executing one at a time.
Durability: when a transaction completes, DBMS must guarantee its results are recorded and not lost.
There may be different ways to define these properties in a DBMS as we will see.
Databases
- Database
is given by rules regarding data (data schema or data model) and the data (database instance)
Database schema (or data model) describes what types of data are valid to store
Database instance is the actual data that satisfies the rules of the database schema.
We often use the term database to refer to the database instance assuming the data model is encapsulated within the data
Relational data model is the most popular way to describe data schema, but many others are possible: RDF, graph data, object-oriented, object-relational
Data Model
Logical data model
Relations and attributes
Constraints: what is valid data and what is not
Physical data model
Where to store the data: which file systems, distributed, replicated
How to store the data: which indices to create
Application logic
Built on top of database queries: declarative, write once and optimize on top of the logical data model
Relational Data Model
Relations (or tables) store information about the world
Attribute (or column) is a property of a specific object represented by a relation
Tuple (or row) is a specific object stored in a relation.
Domain is a set of valid values.
Simple domains are integers, strings.
More complex domains can be defined with restrictions over these domains: a RIN is an 8 digit integer, starts with 6.
Schema for a relation defined the names of the attributes and the domain for the attributes.
Note: logical vs. physical names are used interchangeably, but remember the distinction:
Logical terms refer to the mathematical definition of the relational data model: based on set semantics.
Physical terms refer to the storage/implementation of the same data model. Sometimes the implementation is not identical to the logical model.
For now, we care about the logical model.
Relational Data Model
A database is given by a set of relations.
Each relation has a name and stores a set of tuples.
Each relation schema consists of a set of attributes, the ordering of the attributes is not relevant.
Each attribute has a domain, the set of valid values.
Relation Instance
A relation contains a set of tuples
In a valid relation instance each tuple contains values for all the attributes in the relation schema that are drawn from the domain of that attribute.
We can represent a relation in one of many ways:
A table:
Hero
Alias
Name
Black Panther
T’Challa
Flash
Barry Allen
Jessica Jones
Jessica Jones
A logical representation of tuples using predicates where the attributes are arguments of the predicate. Each tuple is a fact about the world.
Hero('Black Panther', 'T''Challa') Hero('Flash', 'Barry Allen') Hero('Jessica Jones', 'Jessica Jones')
A set representation:
Hero = { <'Black Panther':Alias, 'T''Challa':Name>, <'Flash':Alias, 'Barry Allen':Name>, <'Jessica Jones':Alias, 'Jessica Jones':Name> }
All representations are equivalent after we agree on the convention.
Example relation: Avengers
Name/Alias |
Appearances |
Gender |
Year |
NumYears |
URL |
---|---|---|---|---|---|
Peter Benjamin Parker |
4333 |
MALE |
1990 |
25 |
|
Steven Rogers |
3458 |
MALE |
1964 |
51 |
|
James “Logan” Howlett |
3130 |
MALE |
2005 |
10 |
|
Anthony “Tony” Stark |
3068 |
MALE |
1963 |
52 |
|
Thor Odinson |
2402 |
MALE |
1963 |
52 |
|
Reed Richards |
2125 |
MALE |
1989 |
26 |
|
Robert Bruce Banner |
2089 |
MALE |
1963 |
52 |
|
Clinton Francis Barton |
1456 |
MALE |
1965 |
50 |
|
Henry Jonathan “Hank” Pym |
1269 |
MALE |
1963 |
52 |
|
Natalia Alianovna Romanova |
1112 |
FEMALE |
1973 |
42 |
|
Victor Shade (alias) |
1036 |
MALE |
1968 |
47 |
|
Carol Susan Jane Danvers |
935 |
FEMALE |
1979 |
36 |
|
Jennifer Walters |
933 |
FEMALE |
1982 |
33 |
|
Jessica Miriam Drew |
525 |
FEMALE |
2008 |
7 |
|
Roberto da Costa |
491 |
MALE |
2013 |
2 |
|
Maria Hill |
359 |
FEMALE |
2010 |
5 |
|
Jessica Jones |
205 |
FEMALE |
2010 |
5 |
|
Black Panther |
MALE |
1966 |
Rules of Relational Data Model
The domain of attributes have to be simple: integer, float, decimal, string, boolean, date, time, timestample or restrictions of these (9 digit integer).
This restriction is called the first normal form (1NF): attributes are indivisible pieces of information (so no sets or lists for example).!
It says that relations are simple flat pieces of information.
Each relation contains a set of attributes: the ordering of attributes is not important for the meaning of the relation.
Each relation instance contains a set of tuples. No two tuples can repeat, because we are making a logical statement:
Jessica Jones is an avenger. This does not change even if we repeat this value multiple times.
All relations have at least one key.
Key
A key is a set of attributes in a relation such that:
no two different tuples may have the same value for the attributes in the key,
and no subset of these attributes is a key (i.e. it is minimal).
In different terms: a key is a way to identify a specific tuple.
Keys define the meaning of the relation.
All relations have a key. Some relations may have multiple keys.
We will discuss some basic relations: student, class, section, book
Student(rin, name, major, year) Key: rin
Movies(title, year, studio, boxofficevalue) Key: title, year
We generally underline the key attributes. See below:
The key really depends on what the specific data model and its attributes are representing.
In the above example, we are modeling an RPI student for whom RIN is unique.
For movies, we are assuming no two different movies with the same title will come out in the same year, because that would be confusing. But, if our model was about the specific release of a movie, then the box office value would change from country to country. So, our key would need to reflect that:
Defining relations in SQL
To store a relation, we create a Table in a relational database system.
Examples of attribute types are following:
CHAR(n), VARCHAR(n), BIT(n)
BOOLEAN
INT
FLOAT, DOUBLE ; floating point precision
NUMERIC(n,p) ; fixed point precision
DATE
Create table command
CREATE TABLE tablename ( attribute1 datatype , attribute2 datatype , ... , attributen datatype constraints ) ;
Example
CREATE TABLE student ( id int , name varchar(255) , major char(4) , enrolledDate date , constraint student_pk primary key (id) -- student_pk is the name we have given to the primary key constraint )