SQL - Part 3: Data Definition and Manipulation

A lot of the mechanics for manipulating data build on SQL SELECT syntax. It is crucial to understand SELECT well before learning the material in this section.

Overview

Many data manipulation statements refer to a single table only: you can only change one table at a time.
- If your query needs to access data from another table Y while you are modifying table X, you must use a subquery.
- Review the syntax and examples below carefully to see how this is done.
Whenever you run a statement that will change the database contents, you are running a transaction.
- If the transaction violates some constraint in the database, it will not succeed and your statement will fail.
- Unsuccessful statements should not change the database. See more about transactions below.

Transactions

A transaction is a series of database operations executed as a logical unit. It may involve zero, one, or more operations that change the data.

Example:

Withdraw 100 dollars from account X:
```
Read the balance of X into memory
If the balance is at least 100 then
    subtract 100 from X
    write X back to disk
    commit
else
    abort
```
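The withdrawal above can be sketched with Python's built-in `sqlite3` module, using an in-memory SQLite database as a stand-in for PostgreSQL. The `accounts` table, its columns and the `withdraw` helper are hypothetical. The sketch only shows that the program itself decides whether to commit or roll back.

```python
import sqlite3

# In-memory SQLite database standing in for the bank database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES ('X', 150)")
conn.commit()

def withdraw(conn, account, amount):
    """Withdraw amount from account; commit on success, roll back otherwise."""
    # Read the balance into memory.
    (balance,) = conn.execute(
        "SELECT balance FROM accounts WHERE id = ?", (account,)
    ).fetchone()
    if balance >= amount:
        # sqlite3 implicitly opens a transaction before this UPDATE.
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, account)
        )
        conn.commit()    # make the change permanent
        return True
    conn.rollback()      # abort: leave the database unchanged
    return False

first = withdraw(conn, "X", 100)   # 150 -> 50, committed
second = withdraw(conn, "X", 100)  # only 50 left, aborted
final = conn.execute("SELECT balance FROM accounts WHERE id = 'X'").fetchone()[0]
print(first, second, final)  # True False 50
```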

Transactions have several properties that are highly desirable in their implementations. We will talk about two in this section:

Atomicity: the all-or-nothing approach to transactions. Either all steps of the transaction succeed, or the transaction leaves the database unchanged.

To accomplish atomicity, the database transaction management system keeps track of all changes made. It either makes them permanent if the transaction succeeds (commit) or rolls them back if the transaction fails or aborts (rollback).

Concurrency (isolation): multiple transactions executing at the same time should not interfere with each other. Programmers can write each program as if it were the only program executing.

Transactions - Atomicity

Define a transaction start point with START TRANSACTION or BEGIN.
A transaction either ends successfully with a COMMIT or is aborted completely with a ROLLBACK.
A transaction may involve multiple statements and multiple tuples.
- Whenever an update/delete/insert statement is executed, all tuples changed by the statement are part of the same transaction.
- Sometimes an update statement triggers other operations, for example due to constraints attached to foreign keys (see below). All these operations become part of the same transaction.
- If any part of a transaction fails, the whole transaction fails.
Without an explicit BEGIN, each statement runs as its own transaction, as described in the overview. Explicit BEGIN (or BEGIN TRANSACTION) and COMMIT/ROLLBACK statements let you group several statements into a single transaction.
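The following is a minimal sketch of atomicity across statements. It again uses SQLite through Python's `sqlite3` module rather than PostgreSQL. The hypothetical `accounts` table has a CHECK constraint, so the second UPDATE fails, and the ROLLBACK also undoes the first UPDATE.

```python
import sqlite3

# isolation_level=None: sqlite3 does not open transactions implicitly,
# so the explicit BEGIN / COMMIT / ROLLBACK below are sent as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.execute("INSERT INTO accounts VALUES ('X', 50), ('Y', 0)")

conn.execute("BEGIN")
try:
    # Statement 1 succeeds: Y is credited.
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 'Y'")
    # Statement 2 violates the CHECK constraint (50 - 100 < 0) and fails.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 'X'")
    conn.execute("COMMIT")
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK")  # undo statement 1 as well

balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(balances)  # {'X': 50, 'Y': 0}
```

There is one difference between the two systems. After an error inside a transaction, PostgreSQL marks the whole transaction as aborted and rejects further statements until ROLLBACK. SQLite only undoes the failing statement and leaves the transaction open, which is why the sketch issues ROLLBACK itself.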

Insert statement

Insert a single tuple into a table:

```
INSERT INTO results
VALUES (10, 'Rahul', 'winner');
```

Values must appear in the VALUES clause in the same order as the attributes appear in the CREATE TABLE statement.
Every attribute must be given a value, even if it is NULL.

If you want to populate only some attributes, you must list them:

```
INSERT INTO episodes(id, title, signature, technical, showstopper)
VALUES(11,'The Great Christmas Bake Off','12 Iced Biscuits','6 Laufabrauð','Hidden Design Christmas Present Cake') ;
```

Any attribute that is not listed gets its default value, which is NULL unless the table defines a DEFAULT for it.
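Both forms of INSERT can be tried out with SQLite through Python's `sqlite3` module. The tables below are hypothetical cut-down versions of `results` and `episodes`. The `firstaired` column is borrowed from the DELETE example later on, and the baker 'Kim' is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down schemas; the real tables may have more attributes.
conn.execute("CREATE TABLE results (episodeid INTEGER, baker TEXT, result TEXT)")
conn.execute("""CREATE TABLE episodes (
    id INTEGER PRIMARY KEY, title TEXT, firstaired DATE,
    signature TEXT, technical TEXT, showstopper TEXT)""")

# Positional form: one value per attribute, in CREATE TABLE order.
conn.execute("INSERT INTO results VALUES (10, 'Rahul', 'winner')")

# Supplying too few values in the positional form is an error.
try:
    conn.execute("INSERT INTO results VALUES (12, 'Kim')")
    short_row_ok = True
except sqlite3.OperationalError:
    short_row_ok = False

# Column-list form: firstaired is not listed, so it gets its default (NULL).
conn.execute("""INSERT INTO episodes (id, title, signature, technical, showstopper)
    VALUES (11, 'The Great Christmas Bake Off', '12 Iced Biscuits',
            '6 Laufabrauð', 'Hidden Design Christmas Present Cake')""")

row = conn.execute("SELECT id, firstaired FROM episodes").fetchone()
print(short_row_ok, row)  # False (11, None)
```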

Insert results of a query to a table

You can also replace the VALUES part of an INSERT with a query. In this case, all the tuples returned by the query are inserted into the table.

```
CREATE TABLE eliminated (
    episodeid  int
    , baker    character varying(100)
    , PRIMARY KEY (episodeid, baker)
    , FOREIGN KEY (episodeid) REFERENCES episodes(id)
    , FOREIGN KEY (baker) REFERENCES bakers(baker)
) ;
```

```
INSERT INTO eliminated (episodeid, baker)
SELECT
   episodeid, baker
FROM
   results
WHERE
   result = 'eliminated';
```
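Here is a sketch of INSERT ... SELECT in SQLite, run through Python's `sqlite3`. The sample rows are made up, and the foreign keys of `eliminated` are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (episodeid INTEGER, baker TEXT, result TEXT)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    (1, 'Ann', 'star baker'),   # hypothetical sample rows
    (1, 'Bob', 'eliminated'),
    (2, 'Cy', 'eliminated'),
])
# Simplified version of the eliminated table (foreign keys omitted here).
conn.execute("""CREATE TABLE eliminated (
    episodeid INTEGER, baker TEXT, PRIMARY KEY (episodeid, baker))""")

# Every tuple returned by the SELECT is inserted into eliminated.
conn.execute("""INSERT INTO eliminated (episodeid, baker)
    SELECT episodeid, baker FROM results WHERE result = 'eliminated'""")

rows = conn.execute("SELECT * FROM eliminated ORDER BY episodeid").fetchall()
print(rows)  # [(1, 'Bob'), (2, 'Cy')]
```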

There is a shorthand that extends the above operation: it creates a new table by copying both the schema and the tuples of a query result:

```
CREATE TABLE eliminated AS
    SELECT episodeid, baker FROM results WHERE result = 'eliminated';
```

However, this statement does not create primary or foreign key constraints; these must be added explicitly:

```
ALTER TABLE eliminated ADD PRIMARY KEY(episodeid, baker) ;
ALTER TABLE eliminated ADD FOREIGN KEY (episodeid) REFERENCES episodes(id);
ALTER TABLE eliminated ADD FOREIGN KEY (baker) REFERENCES bakers(baker);
```
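SQLite also supports CREATE TABLE ... AS, so it is easy to check that the copy loses the constraints. SQLite cannot add a primary key with ALTER TABLE. This sketch (Python's `sqlite3`, hypothetical data) therefore only shows the missing constraint, not the fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE results (
    episodeid INTEGER, baker TEXT, result TEXT,
    PRIMARY KEY (episodeid, baker))""")
conn.execute("INSERT INTO results VALUES (1, 'Bob', 'eliminated')")

# The source table rejects a duplicate key ...
try:
    conn.execute("INSERT INTO results VALUES (1, 'Bob', 'eliminated')")
    source_accepts_dup = True
except sqlite3.IntegrityError:
    source_accepts_dup = False

# ... but a copy made with CREATE TABLE ... AS has no primary key.
conn.execute("""CREATE TABLE eliminated AS
    SELECT episodeid, baker FROM results WHERE result = 'eliminated'""")
conn.execute("INSERT INTO eliminated VALUES (1, 'Bob')")
copy_count = conn.execute("SELECT count(*) FROM eliminated").fetchone()[0]
print(source_accepts_dup, copy_count)  # False 2
```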

Delete statement

You can delete tuples from a table (but leave the table definition in the database) using the delete statement.
```
DELETE FROM episodes WHERE firstaired is NULL ;
```
In this example, the WHERE clause can refer only to the tuples of the table we are deleting from:
- For each tuple in the table, if the WHERE condition is true, the tuple is deleted.
With no WHERE clause, the following deletes all the tuples in the table:
```
DELETE FROM favorites ;
```
To remove the table completely, including its definition, use the DROP statement:
```
DROP TABLE favorites ;
```
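The three statements can be compared side by side in SQLite, using Python's `sqlite3`. The rows and the date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (id INTEGER PRIMARY KEY, firstaired DATE)")
conn.executemany("INSERT INTO episodes VALUES (?, ?)",
                 [(1, '2018-08-28'), (2, None), (3, None)])
conn.execute("CREATE TABLE favorites (episodeid INTEGER, baker TEXT)")
conn.execute("INSERT INTO favorites VALUES (1, 'Ann')")

# WHERE is evaluated per tuple; only tuples where it is true are deleted.
conn.execute("DELETE FROM episodes WHERE firstaired IS NULL")
left = conn.execute("SELECT id FROM episodes").fetchall()

# No WHERE clause: every tuple is deleted, but the table still exists.
conn.execute("DELETE FROM favorites")
empty = conn.execute("SELECT count(*) FROM favorites").fetchone()[0]

# DROP TABLE removes the table definition too.
conn.execute("DROP TABLE favorites")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(left, empty, tables)  # [(1,)] 0 ['episodes']
```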

To delete tuples based on a different table, you need a subquery:

```
DELETE FROM
    favorites f
WHERE
    EXISTS (SELECT 1
            FROM results r
            WHERE f.episodeid = r.episodeid
                  AND f.baker=r.baker
                  AND r.result = 'star baker');
```

The goal is for favorites to contain only bakers who were favorites but did not win star baker in that episode. This is a case where EXISTS is needed, because a DELETE statement cannot simply join favorites with results (standard SQL has no join syntax in DELETE).
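Here is a sketch of the EXISTS-based delete in SQLite via Python's `sqlite3`, with hypothetical rows. The sketch refers to the target table by its full name, `favorites`, instead of an alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (episodeid INTEGER, baker TEXT, result TEXT)")
conn.execute("CREATE TABLE favorites (episodeid INTEGER, baker TEXT)")
conn.executemany("INSERT INTO results VALUES (?, ?, ?)",
                 [(1, 'Ann', 'star baker'), (1, 'Bob', 'eliminated')])
conn.executemany("INSERT INTO favorites VALUES (?, ?)",
                 [(1, 'Ann'), (1, 'Bob'), (2, 'Ann')])

# Correlated subquery: for each favorites tuple, look for a matching
# star baker result for the same baker in the same episode.
conn.execute("""DELETE FROM favorites
    WHERE EXISTS (SELECT 1 FROM results r
                  WHERE favorites.episodeid = r.episodeid
                    AND favorites.baker = r.baker
                    AND r.result = 'star baker')""")

rows = conn.execute("SELECT * FROM favorites ORDER BY episodeid, baker").fetchall()
print(rows)  # [(1, 'Bob'), (2, 'Ann')]
```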

Update statement

The UPDATE statement is very similar to DELETE.

Update tuples by changing the value of one or more attributes.
The WHERE clause describes which tuples should be updated.

```
ALTER TABLE episodes ADD season int ;
ALTER TABLE episodes ADD year int ;

UPDATE episodes SET season = 9 ;  -- all tuples from season 9

UPDATE
    episodes
SET
    year = extract(year from firstaired)
WHERE
    firstaired is not null ; -- only for tuples with an airdate value
```

If there is no WHERE clause, then all tuples are updated.
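The UPDATE statements above can be run against SQLite through Python's `sqlite3`. SQLite has no `extract` function, so this sketch uses `strftime('%Y', ...)` in its place. The dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE episodes (id INTEGER PRIMARY KEY, firstaired DATE)")
conn.executemany("INSERT INTO episodes VALUES (?, ?)",
                 [(1, '2018-08-28'), (2, None)])
conn.execute("ALTER TABLE episodes ADD season INTEGER")
conn.execute("ALTER TABLE episodes ADD year INTEGER")

# No WHERE clause: every tuple is updated.
conn.execute("UPDATE episodes SET season = 9")

# Only tuples with an airdate get a year; strftime stands in for extract().
conn.execute("""UPDATE episodes
    SET year = CAST(strftime('%Y', firstaired) AS INTEGER)
    WHERE firstaired IS NOT NULL""")

rows = conn.execute("SELECT id, season, year FROM episodes ORDER BY id").fetchall()
print(rows)  # [(1, 9, 2018), (2, 9, None)]
```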

If you need to update based on a different table, you must use a subquery:

```
ALTER TABLE bakers ADD numwins int ;

UPDATE
    bakers b
SET
    numwins = (SELECT count(*)
               FROM results r
               WHERE r.baker = b.baker AND r.result='star baker') ;
```

For each baker, the number of wins is computed with a correlated subquery that counts that baker's star baker results.
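The correlated UPDATE can be checked on a tiny hypothetical data set in SQLite, using Python's `sqlite3`. Note that `count(*)` gives 0, not NULL, for a baker with no wins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bakers (baker TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE results (episodeid INTEGER, baker TEXT, result TEXT)")
conn.executemany("INSERT INTO bakers VALUES (?)", [('Ann',), ('Bob',)])
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", [
    (1, 'Ann', 'star baker'), (2, 'Ann', 'star baker'), (2, 'Bob', 'eliminated'),
])
conn.execute("ALTER TABLE bakers ADD numwins INTEGER")

# The subquery is re-evaluated for each bakers tuple.
conn.execute("""UPDATE bakers
    SET numwins = (SELECT count(*) FROM results r
                   WHERE r.baker = bakers.baker AND r.result = 'star baker')""")

rows = conn.execute("SELECT baker, numwins FROM bakers ORDER BY baker").fetchall()
print(rows)  # [('Ann', 2), ('Bob', 0)]
```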

Foreign keys

A foreign key is a referential integrity constraint:
- If R.A is a foreign key that references S.B, then every non-null value of R.A must appear in S.B.
- S.B must be a unique attribute or a primary key.

Example

```
CREATE TABLE ABC (
    X int
    , Y int
    , PRIMARY KEY(X,Y)
) ;
```

```
CREATE TABLE DEF (
    Z int
    , W int
    , Q int
    , PRIMARY KEY (Q)
    , FOREIGN KEY (Z,W)
      REFERENCES ABC(X,Y)
      ON DELETE CASCADE
      ON UPDATE SET NULL
) ;
```

This means that DEF(Z,W) can be null (as there is no NOT NULL constraint), but if they have a value, that value must exist in ABC(X,Y).
When a tuple from ABC is deleted, the tuples in DEF that reference it are also deleted (CASCADE).
If the primary key of a tuple in ABC is updated, then Z and W in the corresponding tuples of DEF are set to NULL (SET NULL).
If no ON DELETE or ON UPDATE action is given, the default behavior is NO ACTION, which works like RESTRICT: an update/delete on ABC fails if there are any tuples in DEF that reference the changed tuple. The only difference is that a NO ACTION check can be deferred (see below), while a RESTRICT check cannot.
All these cascade and set null events become part of the same transaction as the triggering update/delete/insert.
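The referential actions can be observed in SQLite through Python's `sqlite3`, which only enforces foreign keys after `PRAGMA foreign_keys = ON`. This sketch assumes Q is the primary key of DEF, so that Z and W can actually be set to NULL. It also adds a hypothetical table `GHI`, whose foreign key has no actions, to show the default behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE ABC (X INTEGER, Y INTEGER, PRIMARY KEY (X, Y))")
conn.execute("""CREATE TABLE DEF (
    Z INTEGER, W INTEGER, Q INTEGER PRIMARY KEY,
    FOREIGN KEY (Z, W) REFERENCES ABC (X, Y)
        ON DELETE CASCADE ON UPDATE SET NULL)""")
# Hypothetical second referencing table with no actions (default behavior).
conn.execute("""CREATE TABLE GHI (
    Z INTEGER, W INTEGER, FOREIGN KEY (Z, W) REFERENCES ABC (X, Y))""")

conn.executemany("INSERT INTO ABC VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
conn.executemany("INSERT INTO DEF VALUES (?, ?, ?)", [(1, 1, 10), (2, 2, 20)])
conn.execute("INSERT INTO GHI VALUES (3, 3)")

conn.execute("DELETE FROM ABC WHERE X = 1")       # CASCADE: DEF row Q=10 is deleted
conn.execute("UPDATE ABC SET X = 5 WHERE X = 2")  # SET NULL: DEF row Q=20 gets Z, W = NULL
try:
    conn.execute("DELETE FROM ABC WHERE X = 3")   # default: fails, GHI still references (3, 3)
    deleted = True
except sqlite3.IntegrityError:
    deleted = False

def_rows = conn.execute("SELECT Z, W, Q FROM DEF").fetchall()
print(def_rows, deleted)  # [(None, None, 20)] False
```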

Constraint checking

By default, constraints are checked immediately, i.e. as soon as a statement changes the relation they are attached to. A foreign key is checked when either the referencing or the referenced relation changes.
This is not desirable for cyclic constraints:

Example: every egg must refer to an existing chicken, and every chicken must refer to an existing egg. With immediate checking, neither the first chicken nor the first egg can ever be inserted.

You can defer checking of a constraint to the end of the transaction:

```
FOREIGN KEY (Z,W) REFERENCES ABC(X,Y)
DEFERRABLE INITIALLY DEFERRED
```
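The chicken-and-egg cycle can be sketched in SQLite, via Python's `sqlite3`, because SQLite supports DEFERRABLE INITIALLY DEFERRED foreign keys. The `chicken` and `egg` tables are hypothetical. The connection is opened with `isolation_level=None` so that the BEGIN and COMMIT statements are sent exactly as written.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""CREATE TABLE chicken (
    cid INTEGER PRIMARY KEY,
    eid INTEGER NOT NULL REFERENCES egg (eid) DEFERRABLE INITIALLY DEFERRED)""")
conn.execute("""CREATE TABLE egg (
    eid INTEGER PRIMARY KEY,
    cid INTEGER NOT NULL REFERENCES chicken (cid) DEFERRABLE INITIALLY DEFERRED)""")

# Inside one transaction the cycle can be created: the checks run at COMMIT.
conn.execute("BEGIN")
conn.execute("INSERT INTO chicken VALUES (1, 1)")  # egg 1 does not exist yet
conn.execute("INSERT INTO egg VALUES (1, 1)")
conn.execute("COMMIT")                             # both references now hold

# A dangling reference is still caught, just later.
conn.execute("BEGIN")
conn.execute("INSERT INTO chicken VALUES (2, 99)")  # egg 99 is never inserted
try:
    conn.execute("COMMIT")
    committed = True
except sqlite3.DatabaseError:
    committed = False
    if conn.in_transaction:       # the failed COMMIT may leave the transaction open
        conn.execute("ROLLBACK")

chickens = conn.execute("SELECT count(*) FROM chicken").fetchone()[0]
print(committed, chickens)  # False 1
```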

Other constraints

NOT NULL: checks that the values stored for an attribute are not null
CHECK: checks that the values of an attribute or a tuple satisfy a condition (anything that can be written in the WHERE clause of a query that refers only to the attributes of the given table)

Example

```
CREATE TABLE class (
   id int PRIMARY KEY
   , code  CHAR(4)
   , name VARCHAR(50) NOT NULL
   , semester VARCHAR(6) CHECK (semester in ('Fall', 'Spring','Summer'))
   , year INT CHECK (year > 1990)
   , CHECK (code IS NOT NULL OR name = 'MISC')
) ;
```

The constraint:
```
CHECK (code IS NOT NULL OR name = 'MISC')
```
is checked when a new tuple is inserted into class or when an existing tuple of class is updated.
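The constraints of `class` can be exercised in SQLite through Python's `sqlite3`. There are two caveats: SQLite does not enforce VARCHAR lengths, and the course names below are made up. The last row shows that a CHECK whose condition evaluates to NULL (unknown) is not treated as a violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE class (
    id INTEGER PRIMARY KEY,
    code CHAR(4),
    name VARCHAR(50) NOT NULL,
    semester VARCHAR(6) CHECK (semester IN ('Fall', 'Spring', 'Summer')),
    year INT CHECK (year > 1990),
    CHECK (code IS NOT NULL OR name = 'MISC'))""")

def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO class VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

outcomes = [
    try_insert((1, 'CSCI', 'Database Systems', 'Fall', 2024)),  # accepted
    try_insert((2, None, 'MISC', 'Spring', 2024)),              # accepted: name = 'MISC'
    try_insert((3, None, 'Algorithms', 'Fall', 2024)),          # rejected: table CHECK
    try_insert((4, 'CSCI', 'Compilers', 'Winter', 2024)),       # rejected: semester CHECK
    try_insert((5, 'CSCI', 'Networks', None, 2024)),            # accepted: NULL IN (...) is unknown
]
print(outcomes)  # [True, True, False, False, True]
```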

Assertions

General integrity constraints, including ones that span several tables, can be expressed in SQL using assertions.
```
CREATE ASSERTION assertionName CHECK  ( … )
```
Assertions are created for a database, i.e. for all tables in a schema. They are evaluated whenever a table in the schema is changed.
The CHECK clause of an assertion is a condition like a WHERE clause, but it is not attached to any table. There is no FROM clause, so relations can only be accessed through subqueries.
Whenever a change (INSERT/UPDATE/DELETE) violates an assertion, the transaction causing the change is aborted and all its changes are rolled back.
Assertions are part of the SQL standard, but they are not implemented in PostgreSQL.

Assertion Examples

The number of students enrolled in a class cannot be larger than the seating capacity of the classroom assigned to the class.

```
CREATE ASSERTION maxClassSize
CHECK ( NOT EXISTS (
    SELECT
        1
    FROM
        classes c
        , classrooms cr
    WHERE
        c.classroom_id = cr.id
        and cr.numseats <
            (SELECT
                count(*)
             FROM
                transcript t
             WHERE
                t.course_id = c.course_id
                and t.semester = c.semester
                and t.year = c.year
                and t.section = c.section
) ) ) ;
```
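Neither PostgreSQL nor SQLite implements CREATE ASSERTION. One way to see what `maxClassSize` tests is to run the query inside NOT EXISTS on its own: any row it returns is a violation. The following sketch does this in SQLite via Python's `sqlite3`, with hypothetical minimal tables and data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schemas with only the attributes the assertion mentions.
conn.executescript("""
CREATE TABLE classrooms (id INTEGER PRIMARY KEY, numseats INTEGER);
CREATE TABLE classes (course_id TEXT, semester TEXT, year INTEGER,
                      section INTEGER, classroom_id INTEGER);
CREATE TABLE transcript (student_id INTEGER, course_id TEXT, semester TEXT,
                         year INTEGER, section INTEGER);
INSERT INTO classrooms VALUES (1, 2);
INSERT INTO classes VALUES ('CSCI4380', 'Fall', 2024, 1, 1);
INSERT INTO transcript VALUES (100, 'CSCI4380', 'Fall', 2024, 1),
                              (101, 'CSCI4380', 'Fall', 2024, 1),
                              (102, 'CSCI4380', 'Fall', 2024, 1);
""")

# The subquery inside NOT EXISTS: each returned row is a violation.
violations = conn.execute("""
SELECT c.course_id, cr.numseats
FROM classes c, classrooms cr
WHERE c.classroom_id = cr.id
  AND cr.numseats < (SELECT count(*) FROM transcript t
                     WHERE t.course_id = c.course_id
                       AND t.semester = c.semester
                       AND t.year = c.year
                       AND t.section = c.section)
""").fetchall()
print(violations)  # [('CSCI4380', 2)]: 3 students, 2 seats
```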

Students cannot take a course without completing the prerequisites of that course.

```
CREATE ASSERTION mustHavePrereq
CHECK ( NOT EXISTS (
     SELECT
         *
     FROM
         transcript t1
         , requires r
     WHERE
         t1.course_id = r.course_id
         and NOT EXISTS (
             SELECT
                 *
             FROM
                 transcript t2
             WHERE
                 t2.course_id = r.prereq_id
                 and t2.student_id = t1.student_id
                 and t2.grade in ('A','B','C','D')
   ) ) ) ;
```

Triggers

Assertions may be costly for databases to implement, because they must be checked for every insert/update/delete statement, which can incur a considerable performance penalty.
Triggers instead check for violations only when specific actions occur, for example:
```
CREATE TRIGGER xyz AFTER INSERT ON transcript
```
Triggers can define programmatically what counts as a violation.
Furthermore, triggers may describe what must happen if there is a violation, instead of simply failing the transaction.
We will see triggers in detail later on.
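As a sketch of the trigger approach, the prerequisite rule from `mustHavePrereq` can be enforced in SQLite, via Python's `sqlite3`. The trigger fires only on inserts into `transcript` and uses SQLite's `RAISE(ABORT, ...)` to reject the row. The schemas, the trigger name and the data are hypothetical. PostgreSQL triggers are written differently, with a separate trigger function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE requires (course_id TEXT, prereq_id TEXT);
CREATE TABLE transcript (student_id INTEGER, course_id TEXT, grade TEXT);
INSERT INTO requires VALUES ('CSCI4380', 'CSCI2300');

-- Runs only for inserts into transcript, not for every change in the database.
CREATE TRIGGER must_have_prereq AFTER INSERT ON transcript
BEGIN
    SELECT RAISE(ABORT, 'missing prerequisite')
    WHERE EXISTS (
        SELECT 1 FROM requires r
        WHERE r.course_id = NEW.course_id
          AND NOT EXISTS (SELECT 1 FROM transcript t2
                          WHERE t2.course_id = r.prereq_id
                            AND t2.student_id = NEW.student_id
                            AND t2.grade IN ('A', 'B', 'C', 'D')));
END;
""")

def enroll(student, course, grade=None):
    """Insert a transcript row; return False if the trigger rejects it."""
    try:
        conn.execute("INSERT INTO transcript VALUES (?, ?, ?)", (student, course, grade))
        return True
    except sqlite3.DatabaseError:  # RAISE(ABORT, ...) undoes the insert
        return False

results = [
    enroll(1, 'CSCI4380'),       # rejected: no CSCI2300 yet
    enroll(1, 'CSCI2300', 'B'),  # accepted: no prerequisites
    enroll(1, 'CSCI4380'),       # accepted now
]
print(results)  # [False, True, True]
```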

Transactions - Isolation

The isolation principle says that if one transaction executes completely before the other, then the result is acceptable.
Hence, any serial ordering of transactions results in an acceptable database state.
There is an implicit assumption that transactions are complete units of execution and are implemented correctly.
Let us see an example of what can go wrong if the database does not guarantee serializability.

Consider two transactions, T1 and T2, that access the same data item x (a tuple or an attribute):
```
T1: read(x), x++, write(x), read(y), y--, write(y)
T2: read(x), x--, write(x)
```
Assume read/write are disk operations, reading/writing data to the database. The increment/decrement operations take place in memory.
Suppose x=10, y=10.
Under isolation, either serial order is acceptable: after T1->T2 or T2->T1, we have x=10, y=9. For example, running T1 first:

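| Time | T1 | T2 |
|------|----|----|
| 1 | read(x) | |
| 2 | x++ | |
| 3 | write(x) | |
| 4 | read(y) | |
| 5 | y-- | |
| 6 | write(y) | |
| 7 | | read(x) |
| 8 | | x-- |
| 9 | | write(x) |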

Which gives x=10, y=9.
Instead, assume the operations are interleaved in the following order:

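| Time | T1 | T2 |
|------|----|----|
| 1 | read(x) | |
| 2 | x++ | |
| 3 | | read(x) |
| 4 | | x-- |
| 5 | write(x) | |
| 6 | read(y) | |
| 7 | | write(x) |
| 8 | y-- | |
| 9 | write(y) | |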

Since T2 reads x before T1 writes it, T1 and T2 read the same value of x (10). T2's write at time 7 then overwrites the value T1 wrote at time 5, so T1's increment is lost.

The final result of this execution is x=9, y=9.

There is no equivalent serial execution that gives this result, which is a problem.
Let us see a different execution order:

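| Time | T1 | T2 |
|------|----|----|
| 1 | read(x) | |
| 2 | x++ | |
| 3 | write(x) | |
| 4 | | read(x) |
| 5 | | x-- |
| 6 | read(y) | |
| 7 | | write(x) |
| 8 | y-- | |
| 9 | write(y) | |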

This one gives the result x=10, y=9, which is equivalent to a serial execution.

More on Transactions - Serializability

Serializability: even though operations of different transactions may be interleaved, the resulting state must be equivalent to the result of some serial execution.
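The schedules above can be replayed with a small Python simulation. Each transaction keeps its own in-memory copy of the values it reads, as in the tables. The sketch compares the final state of an interleaving with the final states of the two serial orders, for this one starting state only. That illustrates the idea but is weaker than a general serializability test.

```python
# Each step is (transaction, operation, item). Every transaction has its own
# in-memory copy of the items it reads; "write" copies it back to the database.
def run(schedule, db):
    db = dict(db)
    local = {"T1": {}, "T2": {}}
    for txn, op, item in schedule:
        mem = local[txn]
        if op == "read":
            mem[item] = db[item]
        elif op == "++":
            mem[item] += 1
        elif op == "--":
            mem[item] -= 1
        elif op == "write":
            db[item] = mem[item]
    return db

T1 = [("T1", "read", "x"), ("T1", "++", "x"), ("T1", "write", "x"),
      ("T1", "read", "y"), ("T1", "--", "y"), ("T1", "write", "y")]
T2 = [("T2", "read", "x"), ("T2", "--", "x"), ("T2", "write", "x")]
start = {"x": 10, "y": 10}

# Final states of the two serial orders.
serial = [run(T1 + T2, start), run(T2 + T1, start)]

# The two interleavings from the tables above.
lost_update = [T1[0], T1[1], T2[0], T2[1], T1[2], T1[3], T2[2], T1[4], T1[5]]
ok = [T1[0], T1[1], T1[2], T2[0], T2[1], T1[3], T2[2], T1[4], T1[5]]

for name, sched in [("lost update", lost_update), ("ok", ok)]:
    final = run(sched, start)
    print(name, final, final in serial)
# lost update {'x': 9, 'y': 9} False
# ok {'x': 10, 'y': 9} True
```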

Dirty Read

Dirty read: a read of a value written by a transaction that has not yet committed.

| Time | T1 | T2 |
|------|----|----|
| 1 | read(x) | |
| 2 | x++ | |
| 3 | write(x) | |
| 4 | | read(x) |

Here, the value read by T2 was written by T1. As long as T1 has not committed, T2 must not be allowed to commit either.

If T1 aborts, then T2 must also be aborted (leading to cascading aborts).
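Cascading aborts follow the chain of dirty reads. For example, if T2 read from T1 and T3 read from T2, aborting T1 forces both T2 and T3 to abort. Below is a hypothetical helper that computes this set from a list of (reader, writer) pairs. T3 is an extra transaction added for illustration.

```python
from collections import defaultdict

def cascading_aborts(reads_from, aborted):
    """Return every transaction that must abort when `aborted` aborts.

    reads_from is a list of (reader, writer) pairs: reader read a value
    written by writer before writer committed (a dirty read).
    """
    readers = defaultdict(set)
    for reader, writer in reads_from:
        readers[writer].add(reader)
    must_abort, stack = {aborted}, [aborted]
    while stack:  # follow dirty reads transitively
        txn = stack.pop()
        for r in readers[txn]:
            if r not in must_abort:
                must_abort.add(r)
                stack.append(r)
    return must_abort

# T2 read x written by T1; T3 then read a value written by T2.
print(sorted(cascading_aborts([("T2", "T1"), ("T3", "T2")], "T1")))  # ['T1', 'T2', 'T3']
```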

SQL Levels of isolation

SQL defines four isolation levels; each one prevents a problem that may happen at the previous level:

| Isolation level | Potential problem |
|-----------------|-------------------|
| READ UNCOMMITTED | Dirty data read |
| READ COMMITTED | Repeated reads may give different results |
| REPEATABLE READ | Phantom update |
| SERIALIZABLE | None of the above |

As each level is more restrictive, fewer transactions may run concurrently.
Many databases do not allow READ UNCOMMITTED, or force a transaction at this level to be read-only; PostgreSQL, for example, treats it as READ COMMITTED.
A READ COMMITTED transaction reads only committed data, but other transactions may change an item after the transaction has read it. Hence, if the same item is read again, its value may be different.
REPEATABLE READ does not allow data that the transaction has read to be changed by another transaction.

However, tuples that are relevant to the transaction may still be inserted (or changed so that they become relevant) while the transaction is executing.

Example: Find the number of students in class CSCI-4380.

While counting, none of the existing students may drop the class.

But new students may be added. This is called a phantom update.
SERIALIZABLE does not allow phantom updates either. It is the most restrictive level, often requiring the system to monitor the tuples that may affect a query/insert/update condition.
The PostgreSQL implementation does not exactly correspond to these levels. We will talk about this in detail when we discuss transaction management.

Here is an example with transaction isolation levels:

```
START TRANSACTION ;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
INSERT INTO T1 SELECT NAME FROM T2 ;
INSERT INTO T1 SELECT NAME FROM T2 ;
COMMIT ;
```

Note that a transaction may end programmatically, with COMMIT or ROLLBACK, or it may be rolled back by external events. These include table constraint violations and other transaction management system events, such as the resolution of deadlocks or timeouts.

What is the result of the following transaction?
```
START TRANSACTION ;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
INSERT INTO T1 SELECT NAME FROM T2 ;
INSERT INTO T1 SELECT NAME FROM T2 ;
ROLLBACK ;
```
