British National Corpus

In the most recent tutorial exercises you used the cqp tool to search a 3-million-word Dickens corpus. We also have the 96-million-word British National Corpus installed under cqp, which you can explore by selecting BNC at the command line.

$ cqp -e
[no corpus]> BNC;
BNC> AllWords = [word="[a-zA-Z].*"];
BNC> size AllWords;

Like the Dickens corpus, the BNC has part-of-speech and lemma information; it uses the CLAWS5 POS tag set. As this corpus is much larger, you will find queries take noticeably longer to execute.
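
To make the AllWords query above more concrete: cqp matches a regular expression against the whole token, so [word="[a-zA-Z].*"] selects every token that begins with an ASCII letter, and size reports how many were found. The Python sketch below imitates that behaviour on a tiny hand-made token list. The sample tokens and the function name count_word_tokens are illustrative assumptions only; they are not part of cqp or the BNC.

```python
import re

# Same pattern as in the cqp query: one ASCII letter, then anything.
WORD_PATTERN = re.compile(r"[a-zA-Z].*")


def count_word_tokens(tokens):
    """Count tokens that the pattern matches in full, mirroring cqp's whole-token matching."""
    return sum(1 for token in tokens if WORD_PATTERN.fullmatch(token))


# Hypothetical sample standing in for a tokenised corpus.
sample = ["It", "was", "the", "best", "of", "times", ",", "1859", "!", "o'clock"]
print(count_word_tokens(sample))
```

Punctuation and purely numeric tokens do not match, so the size of AllWords should come out smaller than the total number of corpus positions.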

I also recommend reading the following article on the design and creation of the BNC.

This includes information about text corpora in general, as well as specific details about how the BNC came about.

Tutorial Exercises: Information Retrieval

The tutorials web page now has the latest set of tutorial exercises. The coursework assignment has also been running for a week now, and is due in on Thursday next week. The Information Retrieval tutorial work is fairly brief, which will allow you to spend time in the tutorial discussing any questions you have about the assignment. Please do take advantage of this: attempt every question in the assignment before your tutorial, and note down any concerns you have.

New LEGO Computer Scientist Minifig: Margaret Hamilton

LEGO Women of NASA
LEGO have today announced the latest model to come out of their Ideas fan-design programme. It’s the “Women of NASA” set which includes Margaret Hamilton: computer scientist, software engineer, and leader of the programming team for the Apollo and Skylab space missions.

This complements LEGO’s existing computer programmer minifig. Sadly Lovelace and Babbage didn’t make it this time around.

Margaret Hamilton
Software Engineer

Director of Apollo Flight Computer Programming for the moon landings and other NASA missions. CEO, Hamilton Technologies.

