SQL - Object-Relational Extensions¶
- Postgresql and many other databases actually have many extensions that go well beyond the relational data model.
- As these extensions violate relational data model, think about
what you are giving up and use them sparingly!
- Simplicity of data model and queries
- Optimizations may not be as easy to perform
- We will go through some of these here, using Postgresql as an example.
Semantic Hierarchies - Inheritance¶
Recall in E-R diagrams, we talked about ISA relationships.
- A isa B, meaning A inherits all the attributes of B (and adds some more)
Postgresql allows you to define class hierarchies:
See
example database to be used
.CREATE TABLE cities ( name text , population float , altitude int -- in feet ); CREATE TABLE capitals ( state char(2) ) INHERITS (cities);
Querying subtables:
SELECT name , altitude FROM cities WHERE altitude > 50;
Includes all cities, i.e. capitals as well.
SELECT name , altitude FROM ONLY cities WHERE altitude > 50;
Includes only cities, not capitals.
To find out which table a row comes from:
SELECT p.relname , c.name , c.altitude FROM cities c , pg_class p WHERE c.altitude > 50 and c.tableoid = p.oid; Output: relname | name | altitude ----------+-----------+---------- cities | Las Vegas | 2174 cities | Mariposa | 1953 capitals | Madison | 84
Semantic hierarchies about sets of objects and their relationship to each other.
- A type of object (capital) is a special type of city.
- All cities include the capitals.
Complex objects¶
You can create user defined types
create type phone_type as ( num varchar(12) , type varchar(50) ); create table person ( id int , name varchar(30) , phone phone_type ) ; insert into person values( 1 , 'Kara Danvers' , ('555-1234','work')::phone_type ) ; select * from person ; id | name | phone ----+--------------+----------------- 1 | Kara Danvers | (555-1234,work)
These complex types really go against the first normal form: that all values should be atomic. But, they allow multiple related values to be encapsulated.
You can access the types using dot notation
select * from person where (phone).type = 'work';
Technically you should store the both attributes for phone separately, but this way, you can tell that they belong together.
You can also define user defined types to be restricted domains of values and then use in multiple places.
Collection of Values¶
In addition to records (like the one above), you can also define collection of values.
Arrays:
CREATE TABLE tictactoe ( squares integer[3][3] ); INSERT INTO tictactoe VALUES('{{1,2,3},{4,5,6},{7,8,9}}'); SELECT squares[3][2] FROM tictactoe; --not zero indexed squares --------- 8 (1 row) CREATE TABLE messages ( msg text[] ) ; INSERT INTO messages VALUES ('{"hello", "world"}') ; INSERT INTO messages VALUES ('{"I", "feel", "so", "free"}') ; SELECT msg[2] FROM messages ; msg ------- world feel (2 rows) SELECT msg[2:3] FROM messages; --slicing, really? msg ----------- {world} {feel,so} (2 rows)
The best of use complex types is to write procedures/functions using pl/pgsql or a programming language like C.
Typed objects and methods¶
The main use of typed objects is to create extensions for handling specific types of data.
For each data type, there are specific methods that apply to them, like an object-oriented programming language!
Some really useful examples:
Geographic data: points (geo locations), polygons (state, city boundaries), line segments (roads, rivers)
Text data: vectors of words and weights for each word
JSON
SELECT '{"foo": {"bar": "baz"}}'::jsonb; jsonb ------------------------- {"foo": {"bar": "baz"}} SELECT '{"foo": {"bar": "baz"}}'::jsonb->'foo'; ?column? ---------------- {"bar": "baz"}
Geographic Data¶
PostGIS is an extension for supporting geographic data with many useful data types of functions.
First install postgis and create the extension from a superuser:
create extension postgis; create database geodb owner sibeladali template template_postgis ;
Now you can use all the data types and methods available in postgis.
CREATE TABLE bwithloc ( name VARCHAR(100) , location geography(POINT,4326) ) ; insert into bwithloc values('Rensselaer Polytechnic Institute', ST_GeographyFromText('SRID=4326;POINT(42.7308634 -73.6816793)')); insert into bwithloc values('Shalimar Restaurant', ST_GeographyFromText('SRID=4326;POINT(42.732293 -73.688473)')); insert into bwithloc values('The Placid Baker', ST_GeographyFromText('SRID=4326;POINT(42.7313916 -73.690868)'));
SRID shows the projection used to compute the latitude and longitude.
You can also enter polygons as arrays of points, line segments are arrays of lines, etc.
Many geography functions are available (distance is in meters):
SELECT b1.name , b2.name , ST_DISTANCE(b1.location, b2.location) FROM bwithloc b1 , bwithloc b2 WHERE b1.name < b2.name ;
Other examples:
- Check whether a point is inside a polygon (which city is this restaurant in)?
- Check the length of a line segment
Text Querying¶
The text queries we have seen so far very simplistic: find if the text contains a specific word.
More sophisticated approaches treat text as a collection of words or tokens.
- If you want to learn more, information retrieval is a field that studies this!
Postgresql supports text processing:
SELECT to_tsvector('fat cats ate fat rats'); to_tsvector ----------------------------------- 'ate':3 'cat':2 'fat':1,4 'rat':5
numbers show the location of the keyword in the text.
Text queries will consist of boolean connection of keywords, tokenized and stop words removed:
SELECT to_tsquery('english', 'The & Fat & Rats'); to_tsquery --------------- 'fat' & 'rat'
You can search a keyword query in a document by relevance. The number of times a word appears will increase the relevance of the text to the query.
We will use the Yelp database as an example:
SELECT b.name , ts_rank_cd(to_tsvector('english', r.review_text), query) AS rank FROM reviews r , businesses b , to_tsquery('pizza & (crust | sauce) & (delicious|tasty)') query WHERE b.business_id = r.business_id and to_tsvector('english', r.review_text) @@ query ORDER BY rank DESC LIMIT 10; name | rank ----------------------------+----------- DeFazio's Pizzeria | 0.05 Little Bites and More | 0.05 Notty Pine Tavern | 0.0366667 Red Front Restrnt & Tavern | 0.0285714 New York Style Pizza | 0.025 Milano Restaurant | 0.0218698 DeFazio's Pizzeria | 0.0202986 The Fresh Market | 0.02 Dante's Pizzeria | 0.0192982 Labella Pizza | 0.0155556
Summary¶
- Postgresql extensible with many new data types and associated methods.
- We will also see how it is possible to create the appropriate indices for these data types.