This was the final lecture on structured data and relational databases, and covered a few more features of SQL queries: UNION
, INTERSECT
and EXCEPT
; arithmetic operations; and aggregates like SUM
and MAX
. Also, some discussion of travelling salesmen, NP-hard problems, and the ultimate physical limits of computation.
During the lecture, I used a handwritten example to show an SQL nested query. I’ve now reproduced that as a comment on the NB copy of the slides.
Homework
-
Read one of these two articles.
-
Inside Google Spanner, The Largest Database on Earth. Cade Metz, Wired Enterprise, November 2012.
-
Spanner: Google’s Globally-Distributed Database. J. C. Corbett et al. In OSDI ’12: Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation, pages 251–264. The USENIX Association, 2012.
The first of these is a magazine article, the second a more technical presentation.
-
-
Write some SQL by hand using one of these web demonstrators.
-
SQL Tryit Editor from w3schools. This comes with a sample database of eight small tables.
-
SQL Fiddle. For this you need to enter your own schema and table data, but you can then compare a dozen different database engines. It’s a serious tool for database developers to try out queries and explore problems. Read more about SQL Fiddle.
-
SkyServer. This is a large astronomy database in active use for original scientific research. The tables are big (the largest is 509 columns by 1.2 billion rows) but immediately open to access through various managed interfaces as well as raw SQL. I’ve included the queries used in the lecture at the bottom of this post; try these, and add your own.
SkyServer and the Sloan Digital Sky Survey (SDSS) also provide the images for Galaxy Zoo, crowdsourcing the classification of a million galaxies.
-
References
The material on complexity, exponential growth, and physical limits to computation is not examinable in this course, but be aware that these are important aspects of computer science and likely to recur in your future study.
- Wikipedia has a good, if lengthy, article on the Travelling Salesman Problem.
- Wikipedia’s article on NP-completeness is a suitable introduction to the tangled web of complexity classes.
- In the event you prove P=NP or P≠NP, you’ll want to contact the Clay Mathematics Institute about that $1M prize.
- The laptop illustration was from a Nature article on the Ultimate Physical Limits to Computation (EASE login required).
- Wikipedia describes Bremermann’s Limit in its article on transcomputational problems. You can read Bremermann himself on this here:
- Optimization Through Evolution and Recombination. In Self-Organizing Systems, Yovits, et al. (eds). Spartan Books, 1962.
- Complexity and Transcomputability. In The Encyclopedia of Ignorance, Duncan and Weston-Smith (eds). Pergamon Press, 1977.
- Computronium is again on Wikipedia, and if you like that you’ll want to go on and read about Matrioshka Brains and other theoretical megastructures.
- Finally, Travelling Salesman, the movie. The maths in this is — reportedly — sound, and the desert/coin analogy in the trailer does capture the nature of P vs. NP. What’s less clear is whether showing P=NP would have quite the social and political impact suggested.
If you are interested in databases, their implementation and application, I recommend the third-year course Database Systems. For complexity and the limits of computation, try Algorithms and Complexity.
Sample SkyServer Queries
The following are queries for SkyServer that I used in the lecture. Try them yourself at the SkyServer SQL page.
-
First, to show a few rows of information about stars.
select top 10 * from star
-
This counts the number of rows in that table — how many stars SkyServer has identified.
select count(*) from star
That might take a little while to run. This is not the largest table; that’s
PhotoObjAll
, and you can find out more about the database structure in its schema browser. -
Find some stars near a particular spot in the sky (specifically, the spiral galaxy NGC 1055).
SELECT top 10
p.objId,
p.run, p.rerun, p.camcol, p.field, p.obj,
p.type, p.ra, p.dec
FROM PhotoTag p, fGetNearbyObjEq(40.433,0.449,3) n
WHERE n.objID=p.objID and p.type=3
-
Build those into a web page with links to view telescope images at the spot.
SELECT TOP 10
char(60) +
'a href="http://skyserver.sdss3.org/public/en/tools/chart/navi.aspx?ra='
+ cast(p.ra as varchar(30)) + '&dec=' + cast(p.dec as varchar(30)) + '">'
+ cast(p.objId as varchar(30)) + char(60) + '/a>' as objID,
p.run, p.rerun, p.camcol, p.field, p.obj,
p.type, p.ra, p.dec
FROM PhotoTag p, fGetNearbyObjEq(40.433,0.449,3) n
WHERE n.objID=p.objID and p.type=3
-
The previous query put HTML into an SQL query, which a remote server then fed back as a web page. This is a rudimentary SQL injection attack, persuading a server to deliver something it wouldn’t ordinarily do. Here’s an example of the same thing actively distorting the astronomical correctness of SkyServer’s output.
SELECT 'Pluto FROM planet'+char(60)+'br>'+char(60)+'img src="http://129.215.32.13/%73%74%61%72%6b/%70%2e%6a%70%67">'
Of course, Pluto is not a planet. He’s a dog.