Today’s lecture presented various techniques to support effective information retrieval: term frequency–inverse document frequency (tf–idf); the big bag of words model; the vector space model; and cosine similarity for document ranking.
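As a rough sketch of the first of these (not code from the lecture), a tf–idf weight multiplies how often a term occurs in a document by the logarithm of how rare it is across the collection. The corpus below is invented for illustration:

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Weight of `term` in `doc`: term frequency scaled by
    inverse document frequency across the whole corpus."""
    tf = Counter(doc)[term]                       # occurrences in this document
    df = sum(1 for d in corpus if term in d)      # documents containing the term
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Hypothetical corpus: each document is a list of words.
corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["the", "cat", "ran"]]
print(tf_idf("cat", corpus[0], corpus))  # "cat" is in 2 of 3 documents
print(tf_idf("the", corpus[0], corpus))  # "the" is in every document, so weight 0
```

Note how a word like “the” that occurs in every document gets weight zero: frequency alone is not informative, which is the point of the idf factor.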
The vector space model for information retrieval treats documents as vectors in a very high-dimensional space: one dimension for every distinct word, with the vector coordinate being the number of times that word occurs in the document. In a collection of documents, these vectors combine to give a term–document matrix. We can assess whether a document matches a query by computing the angle between their vector representations. Evaluating cosine similarity on all documents gives a ranking of relevance to the query.
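A minimal sketch of this, assuming documents and query are plain whitespace-separated strings and using raw word counts as coordinates (the lecture's tf–idf weights could be substituted without changing the structure):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two sparse word-count vectors,
    represented as word -> count dictionaries."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = lambda x: math.sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

def rank(query, docs):
    """Rank documents by cosine similarity to the query, best first."""
    q = Counter(query.split())
    scored = [(cosine(q, Counter(d.split())), d) for d in docs]
    return sorted(scored, reverse=True)

# Hypothetical document collection for illustration.
docs = ["the cat sat on the mat", "the dog barked", "a cat and a dog"]
for score, d in rank("cat mat", docs):
    print(f"{score:.2f}  {d}")
```

Documents sharing no words with the query score zero; identical direction (the same words in the same proportions) would score one, regardless of document length.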
The notion of a model is a powerful one that occurs across the natural sciences, and is actively used in Software Engineering to manage the design and maintenance of large systems. A well-defined model gives a precise representation of some aspects of a system being studied; the model need not capture everything about the system, and indeed it’s often important to abstract from concrete details.
In this case the model allows us to describe ranking and similarity without fixating on implementation details, and potentially to compare different kinds of ranking algorithm sharing the same model.
Links: Slides for Lecture 16; Recording of Lecture 16
References
A Vector Space Model for Information Retrieval. Gerard Salton. Not in Comm. ACM, 1975; nor in J. Amer. Soc. for Inf. Science, 1975; nor indeed anywhere. As explained below, this apparently highly influential paper is a bibliography virus. Link: None. It’s a mirage.
The Most Influential Paper Gerard Salton Never Wrote. David Dubin. Library Trends 52(4): 748–764, 2004. Link: Copy in the Free Library

Karen Spärck Jones. Pioneer in information retrieval and natural language processing; Professor of Computers and Information. Link: Obituary