British National Corpus

In the most recent tutorial exercises you used the cqp tool to search a 3-million-word Dickens corpus. We also have the 96-million-word British National Corpus (BNC) installed under cqp, which you can explore by selecting BNC at the cqp prompt.

$ cqp -e
[no corpus]> BNC
BNC> AllWords = [word="[a-zA-Z].*"]
BNC> size AllWords

Like the Dickens corpus, the BNC is annotated with part-of-speech and lemma information; the POS tags come from the CLAWS5 (C5) tag set. Because the BNC is much larger, you will find that queries take noticeably longer to execute.
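
To see which tokens the AllWords query above is meant to pick out, the following is a minimal Python sketch of the same regular expression. It relies on the fact that cqp matches a regular expression against the whole token, which Python's re.fullmatch imitates. The helper name is_word and the sample tokens are made up for illustration and are not taken from the BNC.

```python
import re

# The pattern used in the AllWords query: a token that starts with an
# ASCII letter, followed by anything.
WORD_PATTERN = re.compile(r"[a-zA-Z].*")


def is_word(token: str) -> bool:
    """Return True if [word="[a-zA-Z].*"] would match this token.

    fullmatch is used because cqp requires the pattern to match the
    entire token, not just part of it.
    """
    return WORD_PATTERN.fullmatch(token) is not None


# Hypothetical sample tokens (illustration only, not BNC data).
sample_tokens = ["The", "cat", ",", "sat", "1984", "o'clock", ".", "3rd"]

# Keep only the tokens the query would count as words.
words = [t for t in sample_tokens if is_word(t)]
print(words)
print(len(words))
```

Punctuation and tokens that begin with a digit are left out, so size AllWords reports a count of word tokens rather than of all corpus positions. Note that a token beginning with an accented letter would also be left out by this pattern.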

I also recommend reading the following article on the design and creation of the BNC.

This includes information about text corpora in general, as well as specific details about how the BNC came about.