SQL - Object-Relational Extensions

  • Postgresql and many other databases actually have many extensions that go well beyond the relational data model.

  • As these extensions violate relational data model, think about what you are giving up and use them sparingly!

    • Simplicity of data model and queries

    • Optimizations may not be as easy to perform

  • We will go through some of these here, using Postgresql as an example.

Semantic Hierarchies - Inheritance

  • Recall in E-R diagrams, we talked about ISA relationships.

    • A isa B, meaning A inherits all the attributes of B (and adds some more)

  • Postgresql allows you to define class hierarchies:

    See example database to be used.

    CREATE TABLE cities (
         name            text
         , population      float
         , altitude        int     -- in feet
    );
    
    CREATE TABLE capitals (
         state           char(2)
    ) INHERITS (cities);
    
  • Querying subtables:

    SELECT
       name
       , altitude
    FROM
       cities
    WHERE
       altitude > 50;
    

    Includes all cities, i.e. capitals as well.

    SELECT
       name
       , altitude
    FROM ONLY
       cities
    WHERE
       altitude > 50;
    

    Includes only cities, not capitals.

  • To find out which table a row comes from:

    SELECT
        p.relname
        , c.name
        , c.altitude
    FROM
        cities c
        , pg_class p
    WHERE
        c.altitude > 50
        and c.tableoid = p.oid;
    
    Output:
    relname   |   name    | altitude
    ----------+-----------+----------
    cities    | Las Vegas |     2174
    cities    | Mariposa  |     1953
    capitals  | Madison   |      84
    
  • Semantic hierarchies about sets of objects and their relationship to each other.

    • A type of object (capital) is a special type of city.

    • All cities include the capitals.

Complex objects

  • You can create user defined types

    create type phone_type as (
         num varchar(12)
         , type varchar(50)
    );
    
    create table person (
         id int
         , name varchar(30)
         , phone phone_type
    ) ;
    
    insert into person values(
         1
         , 'Kara Danvers'
         , ('555-1234','work')::phone_type
    ) ;
    
    select * from person ;
    
     id |     name     |      phone
    ----+--------------+-----------------
     1 | Kara Danvers | (555-1234,work)
    
  • These complex types really go against the first normal form: that all values should be atomic. But, they allow multiple related values to be encapsulated.

  • You can access the types using dot notation

    select * from person where (phone).type = 'work';
    
  • Technically you should store the both attributes for phone separately, but this way, you can tell that they belong together.

  • You can also define user defined types to be restricted domains of values and then use in multiple places.

Collection of Values

  • In addition to records (like the one above), you can also define collection of values.

  • Arrays:

    CREATE TABLE tictactoe (
        squares   integer[3][3]
    );
    
    INSERT INTO tictactoe VALUES('{{1,2,3},{4,5,6},{7,8,9}}');
    
    SELECT squares[3][2] FROM tictactoe; --not zero indexed
    
    squares
    ---------
    8
    (1 row)
    
    
    CREATE TABLE messages (
         msg  text[]
    ) ;
    
    INSERT INTO messages VALUES ('{"hello", "world"}') ;
    INSERT INTO messages VALUES ('{"I", "feel", "so", "free"}') ;
    
    SELECT msg[2] FROM messages ;
    msg
    -------
    world
    feel
    (2 rows)
    
    SELECT msg[2:3] FROM messages; --slicing, really?
    
    msg
    -----------
    {world}
    {feel,so}
    (2 rows)
    
  • The best of use complex types is to write procedures/functions using pl/pgsql or a programming language like C.

Typed objects and methods

  • The main use of typed objects is to create extensions for handling specific types of data.

  • For each data type, there are specific methods that apply to them, like an object-oriented programming language!

  • Some really useful examples:

    • Geographic data: points (geo locations), polygons (state, city boundaries), line segments (roads, rivers)

    • Text data: vectors of words and weights for each word

    • JSON

      SELECT '{"foo": {"bar": "baz"}}'::jsonb;
      
      jsonb
      -------------------------
      {"foo": {"bar": "baz"}}
      
      SELECT '{"foo": {"bar": "baz"}}'::jsonb->'foo';
      
      ?column?
      ----------------
      {"bar": "baz"}
      

Geographic Data

  • PostGIS is an extension for supporting geographic data with many useful data types of functions.

  • First install postgis and create the extension from a superuser:

    create extension postgis;
    create database geodb owner sibeladali template template_postgis ;
    
  • Now you can use all the data types and methods available in postgis.

    CREATE TABLE bwithloc (
         name  VARCHAR(100)
         , location geography(POINT,4326)
    ) ;
    
    insert into bwithloc values('Rensselaer Polytechnic Institute',
         ST_GeographyFromText('SRID=4326;POINT(42.7308634 -73.6816793)'));
    
    insert into bwithloc values('Shalimar Restaurant',
         ST_GeographyFromText('SRID=4326;POINT(42.732293 -73.688473)'));
    
    insert into bwithloc values('The Placid Baker',
         ST_GeographyFromText('SRID=4326;POINT(42.7313916 -73.690868)'));
    
  • SRID shows the projection used to compute the latitude and longitude.

  • You can also enter polygons as arrays of points, line segments are arrays of lines, etc.

  • Many geography functions are available (distance is in meters):

    SELECT
       b1.name
       , b2.name
       , ST_DISTANCE(b1.location, b2.location)
    FROM
       bwithloc b1
       , bwithloc b2
    WHERE
       b1.name < b2.name ;
    
  • Other examples:

    • Check whether a point is inside a polygon (which city is this restaurant in)?

    • Check the length of a line segment

Text Querying

  • The text queries we have seen so far very simplistic: find if the text contains a specific word.

  • More sophisticated approaches treat text as a collection of words or tokens.

    • If you want to learn more, information retrieval is a field that studies this!

  • Postgresql supports text processing:

    SELECT to_tsvector('fat cats ate fat rats');
    
    to_tsvector
    -----------------------------------
    'ate':3 'cat':2 'fat':1,4 'rat':5
    

    numbers show the location of the keyword in the text.

  • Text queries will consist of boolean connection of keywords, tokenized and stop words removed:

    SELECT to_tsquery('english', 'The & Fat & Rats');
    to_tsquery
    ---------------
    'fat' & 'rat'
    
  • You can search a keyword query in a document by relevance. The number of times a word appears will increase the relevance of the text to the query.

    We will use the Yelp database as an example:

    SELECT
       b.name
       , ts_rank_cd(to_tsvector('english', r.review_text), query) AS rank
    FROM
       reviews r
       , businesses b
       , to_tsquery('pizza & (crust | sauce) & (delicious|tasty)') query
    WHERE
       b.business_id = r.business_id
       and to_tsvector('english', r.review_text) @@ query
    ORDER BY
       rank DESC
    LIMIT 10;
    
                name            |   rank
    ----------------------------+-----------
     DeFazio's Pizzeria         |      0.05
     Little Bites and More      |      0.05
     Notty Pine Tavern          | 0.0366667
     Red Front Restrnt & Tavern | 0.0285714
     New York Style Pizza       |     0.025
     Milano Restaurant          | 0.0218698
     DeFazio's Pizzeria         | 0.0202986
     The Fresh Market           |      0.02
     Dante's Pizzeria           | 0.0192982
     Labella Pizza              | 0.0155556
    

Summary

  • Postgresql extensible with many new data types and associated methods.

  • We will also see how it is possible to create the appropriate indices for these data types.