
This morning’s lecture presented a mathematical language for slicing and dicing the structured tables of the relational model: selection, projection, renaming; union, intersection, difference; cross product, join, equijoin and natural join. A key feature of this relational algebra is that just six of these operations are enough to capture an extremely wide range of queries and transformations of data. Database implementors work hard to build highly efficient engines to carry out these operations, which can then support many different kinds of user application.
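As a rough illustration of how a handful of operations can express the rest, here is a minimal Python sketch. It assumes the six basic operations are selection, projection, renaming, union, difference and cross product (the usual textbook choice; the lecture slides are the authority on which six were meant), and it builds intersection and natural join out of them. The relation encoding, the function names and the `students` and `takes` tables are all invented for this example.

```python
from itertools import product

# A relation is modelled as a set of rows. Each row is a frozenset of
# (attribute, value) pairs, so duplicate rows collapse automatically.
def relation(*rows):
    return {frozenset(r.items()) for r in rows}

# --- The six basic operations (assumed set, see the note above) ---
def select(r, pred):
    return {row for row in r if pred(dict(row))}

def project(r, attrs):
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}

def rename(r, old, new):
    return {frozenset((new if a == old else a, v) for a, v in row) for row in r}

def union(r, s):
    return r | s

def difference(r, s):
    return r - s

def cross(r, s):
    # Assumes r and s have no attribute names in common.
    return {x | y for x, y in product(r, s)}

# --- Derived operations, built only from the six above ---
def intersection(r, s):
    return difference(r, difference(r, s))

def natural_join(r, s, shared):
    # Handles a single shared attribute; "_2" is a hypothetical temporary name.
    tmp = shared + "_2"
    matched = select(cross(r, rename(s, shared, tmp)),
                     lambda t: t[shared] == t[tmp])
    attrs = {a for row in matched for a, _ in row} - {tmp}
    return project(matched, attrs)

# Hypothetical example tables.
students = relation({"uun": "s1", "name": "Ada"},
                    {"uun": "s2", "name": "Bob"})
takes = relation({"uun": "s1", "course": "inf1-da"},
                 {"uun": "s2", "course": "math1"})

joined = natural_join(students, takes, "uun")
# Names of students taking inf1-da.
names = project(select(joined, lambda t: t["course"] == "inf1-da"), {"name"})
```

Here `names` contains a single row with `name` equal to `"Ada"`. A real database engine would not evaluate a join as a full cross product followed by a filter; the sketch only shows that the derived operations add convenience, not expressive power.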

The lecture also referred to increasingly wild estimates of how much data is created and processed worldwide each year: petabytes, exabytes and yottabytes of it.

“These numbers are impressive, but still miniscule compared to the order of magnitude at which nature handles information”

Martin Hilbert, quoted in Science Daily
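To put those prefixes on a common scale, the following sketch uses the decimal SI definitions (peta = 10^15, exa = 10^18, zetta = 10^21, yotta = 10^24). The table name `SI_EXPONENT` and the function `convert` are assumptions made for this example, not anything taken from the lecture.

```python
# Decimal SI prefixes as powers of ten.
SI_EXPONENT = {"kilo": 3, "mega": 6, "giga": 9, "tera": 12,
               "peta": 15, "exa": 18, "zetta": 21, "yotta": 24}

def convert(amount, from_prefix, to_prefix):
    """Convert an amount of bytes from one SI prefix to another."""
    shift = SI_EXPONENT[from_prefix] - SI_EXPONENT[to_prefix]
    if shift >= 0:
        # Moving to a smaller unit: stays an exact integer for integer input.
        return amount * 10 ** shift
    # Moving to a larger unit: result is a float.
    return amount / 10 ** (-shift)

petabytes_per_exabyte = convert(1, "exa", "peta")      # 1000
petabytes_per_yottabyte = convert(1, "yotta", "peta")  # 1000000000
```

Each step up the list (peta, exa, zetta, yotta) is a factor of one thousand, so a yottabyte is a billion petabytes.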

Links: Slides for Lecture 5; Recording of Lecture 5

Homework
1. Read This

Inside Google Spanner, the Largest Single Database on Earth. Wired, 2012-11-26.

“… a database designed to seamlessly operate across hundreds of data centers and millions of machines and trillions of rows of information.”

Optional: Google recently released a public version, Cloud Spanner. If you are interested in finding out more, start with this recent technical paper.

Spanner, TrueTime and the CAP Theorem. Eric Brewer, Google, February 2017. Download PDF

“Although the network can greatly reduce partitions, it cannot improve the speed of light.”

2. Do This

Work through Example 7 and Example 8 from Tutorial 1: do each example yourself and write out your answer; then look at the suggested solution. Post to one of the groups if you have questions.

References

These are the sources for the various estimates of data sizes referenced in the lecture. Follow the links, read the articles, and find the Sesame Street character.

Data Never Sleeps
How Much Data is Created Every Minute?

Image collating information on the rate of online activity of particular kinds.

Link: Domo blog article

Screenshot of NIST web page
SI Prefixes
International System of Units
US National Institute of Standards and Technology (NIST) Reference on Constants, Units, and Uncertainty

Links: NIST table of SI prefixes; Wikipedia

Screenshot of Cisco web page
How Much is That?
Cisco Visual Networking IP Traffic Chart

Table giving examples of various magnitudes of data, from petabytes to yottabytes.

Links: Cisco Traffic Chart; The Zettabyte Era

Screenshot of Science Daily web page
How Much Information is There in the World?
Science Daily 2011-02-11
Report on a study carried out at the University of Southern California

Links: Science Daily article; Research report

Picture of ARCHER machine cabinets; picture of disk cabinets
UK National Supercomputing Service
ARCHER

Hosted in Edinburgh, ARCHER is built around a Cray XC30 supercomputer with 300 TB of memory and 100k processor cores. The colocated UK Research Data Facility provides 70 PB of file storage.

Links: ARCHER; Research Data Facility

Photograph of NSA datacenter
Pictures of the NSA’s Utah Data Center
Business Insider 2013-06-07
“Here’s The $2 Billion Facility Where The NSA Stores And Analyzes Your Communications”

Links: Business Insider article; Wired

Picture of tape racks
Breaking Data Records Bit by Bit
Harriet Jarlett, CERN

In October 2017 the CERN data centre broke its own record for data storage when it collected 12.3 petabytes of data over a single month.

Links: Article; CERN Datacentre on Google Streetview

Screenshot of Mail Online web page
Mail Online: Information Overload
There could soon be no words to describe how much data is stored in the world

Pinpoints the nightmare scenario ahead. Brought to you by a brightly-coloured Google datacentre and The Count from Sesame Street.

Link: Mail Online article

Lecture 5: Relational Algebra