|Slides : Recording|
This morning’s lecture presented a mathematical language for slicing and dicing the structured tables of the relational model: selection, projection, renaming; union, intersection, difference; cross product, join, equijoin and natural join. A key feature of this relational algebra is that just six of these operations (selection, projection, renaming, union, difference and cross product) are enough to capture an extremely wide range of queries and transformations of data. Database implementors work hard to build highly efficient engines to carry out these operations, which can then support many different kinds of user application.
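To make the operators concrete, here is a minimal Python sketch of the six basic operations on small in-memory relations, with intersection and natural join then derived from them. It assumes set semantics (no duplicate rows, unlike SQL tables) and represents a relation as a tuple of attribute names plus a frozenset of rows. The names `Relation`, `select`, `natural_join` and so on, and the sample `students`/`takes` tables, are all hypothetical, invented for illustration rather than taken from the lecture or from any database library.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class Relation:
    """A relation under set semantics: a schema plus a set of rows (hypothetical helper)."""
    attrs: Tuple[str, ...]   # attribute names, in column order
    rows: FrozenSet[tuple]   # each row is a tuple aligned with attrs


def select(r: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    """Selection: keep the rows satisfying pred, which sees each row as {attr: value}."""
    return Relation(r.attrs, frozenset(row for row in r.rows if pred(dict(zip(r.attrs, row)))))


def project(r: Relation, names: Sequence[str]) -> Relation:
    """Projection: keep only the named columns; the frozenset drops duplicate rows."""
    idx = [r.attrs.index(n) for n in names]
    return Relation(tuple(names), frozenset(tuple(row[i] for i in idx) for row in r.rows))


def rename(r: Relation, mapping: Dict[str, str]) -> Relation:
    """Renaming: change attribute names; the rows are unchanged."""
    return Relation(tuple(mapping.get(a, a) for a in r.attrs), r.rows)


def _check_compatible(r: Relation, s: Relation) -> None:
    if r.attrs != s.attrs:
        raise ValueError(f"incompatible schemas {r.attrs} and {s.attrs}")


def union(r: Relation, s: Relation) -> Relation:
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows | s.rows)


def difference(r: Relation, s: Relation) -> Relation:
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows - s.rows)


def product(r: Relation, s: Relation) -> Relation:
    """Cross product: every row of r paired with every row of s."""
    if set(r.attrs) & set(s.attrs):
        raise ValueError("rename clashing attributes before taking a product")
    return Relation(r.attrs + s.attrs, frozenset(a + b for a in r.rows for b in s.rows))


# Derived operators, built only from the six above.
def intersection(r: Relation, s: Relation) -> Relation:
    return difference(r, difference(r, s))  # R intersect S = R - (R - S)


def natural_join(r: Relation, s: Relation) -> Relation:
    """Rename shared attributes in s, take the product, select equal values, project."""
    shared = [a for a in r.attrs if a in s.attrs]
    tmp = {a: "_" + a for a in shared}  # assumes names like "_mn" are not already in use
    s2 = rename(s, tmp)
    matched = select(product(r, s2), lambda t: all(t[a] == t[tmp[a]] for a in shared))
    return project(matched, list(r.attrs) + [a for a in s.attrs if a not in shared])


# Hypothetical example tables: students and the courses they take.
students = Relation(("mn", "name"), frozenset({("s1", "Ada"), ("s2", "Bo"), ("s3", "Cy")}))
takes = Relation(("mn", "code"), frozenset({("s1", "inf1"), ("s3", "inf1"), ("s3", "ads")}))

print(sorted(natural_join(students, takes).rows))
# [('s1', 'Ada', 'inf1'), ('s3', 'Cy', 'ads'), ('s3', 'Cy', 'inf1')]
print(sorted(difference(project(students, ["mn"]), project(takes, ["mn"])).rows))
# [('s2',)]  i.e. students taking no course
```

Here natural join is written as rename, product, select and project, which mirrors how it can be expressed in the algebra; a real database engine would normally avoid materialising the full cross product and use something like a hash join or sort-merge join instead.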
Also, there were some references to increasingly wild estimates of how much data is created and processed worldwide year-by-year: petabytes, exabytes and yottabytes of it.
“These numbers are impressive, but still minuscule compared to the order of magnitude at which nature handles information”
Martin Hilbert, quoted in Science Daily
1. Read This
Inside Google Spanner, the Largest Single Database on Earth. Wired, 2012-11-26.
“… a database designed to seamlessly operate across hundreds of data centers and millions of machines and trillions of rows of information.”
Optional: Google has recently released a public version, Cloud Spanner. If you would like more detail, start with this recent technical paper.
“Although the network can greatly reduce partitions, it cannot improve the speed of light.”
2. Do This
Work through Example 7 and Example 8 from Tutorial 1: do each example yourself and write out your answer; then look at the suggested solution. Post to one of the groups if you have questions.
These are the sources for the various estimates of data sizes referenced in the lecture. Follow the links, read the articles, and find the Sesame Street character.
|Data Never Sleeps
How Much Data is Created Every Minute?
Infographic collating how often various kinds of online activity happen every minute.
Link: Domo blog article
|International System of Units
US National Institute of Standards and Technology (NIST) Reference on Constants, Units, and Uncertainty
Links: NIST table of SI prefixes; Wikipedia
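The SI prefixes in these tables are decimal powers of 1000, so going from petabytes (10^15 bytes) to yottabytes (10^24 bytes) is a factor of 10^9. Below is a minimal Python sketch that formats a byte count with the largest SI prefix that fits. It assumes decimal SI prefixes, not the binary IEC ones (KiB, MiB, ..., based on 1024), and the function name `si_bytes` is hypothetical. The list stops at yotta, the largest prefix when these figures were compiled; in 2022 the SI added ronna (10^27) and quetta (10^30).

```python
# Decimal SI prefixes, each a factor of 1000 larger than the last (10^0 up to 10^24).
SI_PREFIXES = ["", "k", "M", "G", "T", "P", "E", "Z", "Y"]


def si_bytes(n: int) -> str:
    """Format a non-negative integer byte count with the largest SI prefix whose unit fits in n."""
    if n < 0:
        raise ValueError("byte counts must be non-negative")
    exp = 0
    # Exact integer comparisons avoid floating-point drift at boundaries such as 10**24.
    while exp < len(SI_PREFIXES) - 1 and n >= 1000 ** (exp + 1):
        exp += 1
    return f"{n / 1000 ** exp:.3g} {SI_PREFIXES[exp]}B"


print(si_bytes(12_300_000_000_000_000))  # 12.3 PB
print(si_bytes(300 * 10**12))           # 300 TB
print(si_bytes(10**24))                 # 1 YB
```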
|How Much is That?
Cisco Visual Networking IP Traffic Chart
Table giving examples of various magnitudes of data, from petabytes to yottabytes.
|How Much Information is there in the World?
Science Daily 2011-02-11
Report on a study carried out at the University of Southern California
Links: Science Daily article; research report
|UK National Supercomputing Service
Hosted in Edinburgh, ARCHER is built around a Cray XC30 supercomputer with 300 TB of memory and 100k processor cores. The co-located UK Research Data Facility provides 70 PB of file storage.
|Pictures of the NSA’s Utah Data Center
Business Insider 2013-06-07
“Here’s The $2 Billion Facility Where The NSA Stores And Analyzes Your Communications”
|Breaking Data Records Bit by Bit
Harriet Jarlett, CERN
In October 2017 the CERN data centre broke its own record for data storage when it collected 12.3 petabytes of data over a single month.
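For a rough sense of scale, spreading that month's 12.3 PB evenly over the 31 days of October (an averaging assumption; actual rates vary over the month) and reading a petabyte as the decimal 10^15 bytes gives an average ingest rate of:

```latex
\frac{12.3 \times 10^{15}\,\text{B}}{31 \times 86\,400\,\text{s}}
  = \frac{12.3 \times 10^{15}\,\text{B}}{2.6784 \times 10^{6}\,\text{s}}
  \approx 4.59 \times 10^{9}\,\text{B/s}
  \approx 4.6\,\text{GB/s} \;(\approx 37\,\text{Gbit/s})
```

That is about 397 TB per day.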
|Mail Online: Information Overload
There could soon be no words to describe how much data is stored in the world
Pinpoints the nightmare scenario ahead. Brought to you by a brightly-coloured Google datacentre and The Count from Sesame Street.
Link: Mail Online article