Lecture 8: SQL Queries

This was the final lecture on structured data and relational databases, and covered a few more features of SQL queries: UNION, INTERSECT and EXCEPT; arithmetic operations; and aggregates like SUM and MAX. Also, some discussion of travelling salesmen, NP-hard problems, and the ultimate physical limits of computation.

During the lecture, I used a handwritten example to show an SQL nested query. I’ve now reproduced that as a comment on the NB copy of the slides.

Homework

  1. Read one of these two articles.

    The first of these is a magazine article, the second a more technical presentation.

  2. Write some SQL by hand using one of these web demonstrators.

    • SQL Tryit Editor from w3schools. This comes with a sample database of eight small tables.

    • SQL Fiddle. For this you need to enter your own schema and table data, but you can then compare a dozen different database engines. It’s a serious tool for database developers to try out queries and explore problems. Read more about SQL Fiddle.

    • SkyServer. This is a large astronomy database in active use for original scientific research. The tables are big (the largest is 509 columns by 1.2 billion rows) but immediately open to access through various managed interfaces as well as raw SQL. I’ve included the queries used in the lecture at the bottom of this post; try these, and add your own.

      SkyServer and the Sloan Digital Sky Survey (SDSS) also provide the images for Galaxy Zoo, crowdsourcing the classification of a million galaxies.

References

The material on complexity, exponential growth, and physical limits to computation is not examinable in this course, but be aware that these are important aspects of computer science and likely to recur in your future study.

If you are interested in databases, their implementation and application, I recommend the third-year course Database Systems. For complexity and the limits of computation, try Algorithms and Complexity.

Sample SkyServer Queries

The following are queries for SkyServer that I used in the lecture. Try them yourself at the SkyServer SQL page.

  • First, to show a few rows of information about stars.

    select top 10 * from star

  • This counts the number of rows in that table — how many stars SkyServer has identified.

    select count(*) from star

    That might take a little while to run. This is not the largest table; that’s PhotoObjAll, and you can find out more about the database structure in its schema browser.

  • Find some stars near a particular spot in the sky (specifically, the spiral galaxy NGC 1055).

    SELECT top 10
    p.objId,
    p.run, p.rerun, p.camcol, p.field, p.obj,
    p.type, p.ra, p.dec
    FROM PhotoTag p, fGetNearbyObjEq(40.433,0.449,3) n
    WHERE n.objID=p.objID and p.type=3

  • Build those into a web page with links to view telescope images at the spot.

    SELECT TOP 10
    char(60) +
    'a href="http://skyserver.sdss3.org/public/en/tools/chart/navi.aspx?ra='
    + cast(p.ra as varchar(30)) + '&dec=' + cast(p.dec as varchar(30)) + '">'
    + cast(p.objId as varchar(30)) + char(60) + '/a>' as objID,
    p.run, p.rerun, p.camcol, p.field, p.obj,
    p.type, p.ra, p.dec
    FROM PhotoTag p, fGetNearbyObjEq(40.433,0.449,3) n
    WHERE n.objID=p.objID and p.type=3

  • The previous query put HTML into an SQL query, which a remote server then fed back as a web page. This is a rudimentary SQL injection attack, persuading a server to deliver something it wouldn’t ordinarily do. Here’s an example of the same thing actively distorting the astronomical correctness of SkyServer’s output.

    SELECT 'Pluto FROM planet'+char(60)+'br>'+char(60)+'img src="http://129.215.32.13/%73%74%61%72%6b/%70%2e%6a%70%67">'

    Of course, Pluto is not a planet. He’s a dog.

This entry was posted in Lecture log. Bookmark the permalink.