# Lecture 5: Relational Algebra

This morning’s lecture presented a mathematical language for slicing and dicing the structured tables of the relational model: selection, projection, renaming; union, intersection, difference; cross product, join, equijoin and natural join. A key feature of this relational algebra is that just six of these operations are enough to capture an extremely wide range of queries and transformations of data. Database implementors work hard to build highly efficient engines to carry out these operations, which can then support many different kinds of user application.
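As a rough illustration of the claim that six operations suffice, here is a minimal Python sketch. It assumes the six are selection, projection, renaming, union, difference and cross product, and it derives intersection and natural join from them. The representation (a relation as a set of rows, each row a frozenset of attribute/value pairs), the function names, and the `students` and `takes` tables are all assumptions made for this example; they are not taken from the lecture.

```python
# A relation is modelled as a set of rows; each row is a frozenset of
# (attribute, value) pairs, so rows are hashable and duplicates vanish.
# This representation is an assumption made for illustration only.

def relation(*rows):
    """Build a relation from dictionaries mapping attribute -> value."""
    return {frozenset(r.items()) for r in rows}

# --- The six primitive operations ---

def select(r, pred):
    """Keep the rows for which pred(row_as_dict) is true."""
    return {row for row in r if pred(dict(row))}

def project(r, attrs):
    """Keep only the named attributes of every row."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}

def rename(r, mapping):
    """Rename attributes according to mapping (old name -> new name)."""
    return {frozenset((mapping.get(a, a), v) for a, v in row) for row in r}

def union(r, s):
    return r | s

def difference(r, s):
    return r - s

def cross(r, s):
    """Cross product; assumes r and s have disjoint attribute names."""
    return {x | y for x in r for y in s}

# --- Operations derived from the primitives ---

def intersection(r, s):
    """r intersect s, written as r - (r - s)."""
    return difference(r, difference(r, s))

def natural_join(r, s, shared):
    """Natural join on the attributes in `shared`, using primitives only."""
    temp = {a: a + "_2" for a in shared}       # temporary names for s's copies
    product = cross(r, rename(s, temp))
    matched = select(product, lambda t: all(t[a] == t[temp[a]] for a in shared))
    keep = {a for row in matched for a, _ in row} - set(temp.values())
    return project(matched, keep)

# Hypothetical example data.
students = relation({"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"})
takes = relation(
    {"sid": 1, "course": "DB"},
    {"sid": 1, "course": "Logic"},
    {"sid": 2, "course": "Logic"},
)

# Names of students taking "DB": join, then select, then project.
joined = natural_join(students, takes, ["sid"])
db_students = project(select(joined, lambda t: t["course"] == "DB"), {"name"})
```

Because every operation takes relations and returns a relation, queries are built by nesting calls, which is the compositional property the algebra relies on.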

Also, there were some references to increasingly wild estimates of how much data is created and processed worldwide year-by-year: petabytes, exabytes and yottabytes of it.

“These numbers are impressive, but still miniscule compared to the order of magnitude at which nature handles information”

Martin Hilbert, quoted in Science Daily

#### Homework

##### 1. Read This

Inside Google Spanner, the Largest Single Database on Earth
Wired, 2012-11-26.

“… a database designed to seamlessly operate across hundreds of data centers and millions of machines and trillions of rows of information.”

##### 2. Do This

Work through Example 7 and Example 8 from Tutorial 1: do each example yourself and write out your answer; then look at the suggested solution. Post to one of the groups if you have questions.

#### References

These are the sources for the various estimates of data sizes referenced in the lecture. Follow the links, read the articles, and find the Sesame Street character.

Data Never Sleeps
How Much Data is Created Every Minute?
Image collating information on the rate of online activity of particular kinds.
Link: Domo blog article

SI Prefixes
International System of Units
US National Institute of Standards and Technology (NIST) Reference on Constants, Units, and Uncertainty
Links: NIST table of SI prefixes; Wikipedia

How Much is That?
Cisco Visual Networking IP Traffic Chart
Table giving examples of various magnitudes of data, from petabytes to yottabytes.

How Much Information is There in the World?
Science Daily 2011-02-11
Report on a study carried out at the University of Southern California

UK National Supercomputing Service ARCHER
Hosted in Edinburgh, ARCHER is built around a Cray XC30 supercomputer with 300TB of memory and 100k processor cores. The colocated UK Research Data Facility provides 70PB of file storage.
Links: ARCHER; Research Data Facility

Pictures of the NSA’s Utah Data Center
Business Insider 2013-06-07
“Here’s The \$2 Billion Facility Where The NSA Stores And Analyzes Your Communications”

Mail Online: Information Overload
There could soon be no words to describe how much data is stored in the world
Pinpoints the nightmare scenario ahead. Illustrated with a picture of The Count from Sesame Street. (Yes, really, go look.)
Link: Mail Online article