The language of Parliament
What work of Australian political history contains almost 500 million words contributed by over a thousand different authors?
Hansard is the record of what was spoken in the Commonwealth Parliament from 1901 to the present. Debates, speeches, questions, even interjections – it’s all recorded in Hansard.
You can search Hansard using the ParlInfo database, and explore recent years on the Open Australia site. But I’m particularly interested in what we might learn from the historic content.
Pre-digital editions of Hansard have been through a process known as Optical Character Recognition (OCR) to make the printed text available in electronic form. Within each file, the various parts that make up a sitting day in Parliament – like debates and questions – are clearly marked. This makes it easier to search the text, but it also means we can explore the language of Hansard in different ways.
We’re used to searching for particular documents within a collection, but what about searching for particular words or phrases across a collection – tracking their use, context, and variations through time? You can try this for yourself on the Museum’s Election Speeches site. There you can visualise the frequency of words across 89 election policy speeches.
Hansard is a lot bigger, containing something like 300,000 speeches and 150,000 questions. The first step is just getting all the text files together in a form that can be analysed. Last year I wrote a computer script to download all of Hansard from 1901 to 1980 from the ParlInfo database. There are 9,829 files, one for every sitting day in the House of Representatives and the Senate. I’ve saved them in a GitHub repository where anyone can easily grab their own copies.
Using these files I created Historic Hansard, a website dedicated to ‘lovers of political speech’. While ParlInfo is good for searching, Historic Hansard is meant for reading. Each day’s proceedings are presented on a single page, giving you a sense of the rhythm, flow, and structure of the day. But there are a few added extras. A bill index makes it easy to track the historical development of legislation, while a people index lists the contributions of each individual member.

But we can do more. Historic Hansard also provides some examples of what becomes possible when we make historical resources available in digital form. The site integrates Hypothes.is, a web-scale annotation tool. This means that anyone can enrich the content with notes, links, comments, and images. Imagine a classroom project where students annotate a year of Hansard with information about all the people, places, and events mentioned in Parliament.
Pages for each day and year in Historic Hansard also include a ‘View in Voyant’ button. This opens the text in Voyant Tools, a web-based text analysis workbench. Voyant treats text as data, enabling you to analyse the frequency, distribution, and context of individual words within those many thousands of speeches. A wide range of tools and visualisations is built into Voyant, and you can easily embed the results in your own website. If Voyant gets you hooked on the possibilities of text analysis, you can take things further by downloading multiple days or years from my Hansard repository and manually importing them into Voyant or another text analysis tool like AntConc.
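If you would rather experiment in code, the basic idea of treating text as data can be sketched in a few lines of Python. This is only an illustration, not how Voyant or Historic Hansard work internally. It assumes you have already loaded the text for each year into a dictionary; the function names (`tokenize`, `relative_frequency`) and the sample sentences are made up for the example and are not real Hansard content.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(texts_by_year, word):
    """Return {year: occurrences of `word` per 10,000 words}, in year order."""
    word = word.lower()
    result = {}
    for year, text in sorted(texts_by_year.items()):
        tokens = tokenize(text)
        counts = Counter(tokens)
        # Normalise by the year's total word count so long and short years
        # can be compared; a year with no text gets a frequency of 0.
        result[year] = 10000 * counts[word] / len(tokens) if tokens else 0.0
    return result


# Invented placeholder text standing in for a year's worth of proceedings.
sample = {
    1901: "the tariff question and the tariff bill",
    1950: "the budget and the defence bill",
}
freqs = relative_frequency(sample, "tariff")
for year, value in freqs.items():
    print(year, round(value, 1))
```

Dividing by the total number of words matters because some years contain far more text than others. Raw counts would mostly show how much Parliament talked, not how often a word was used.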

Hansard is a rich resource for exploration and research, but something is lost when the spoken word is transcribed into text. Are there ways in which we might use Hansard to recover some of the theatre, drama, humour, and personality of Parliament?
Amongst the more formal speeches and statements, Hansard records interjections – short comments or questions from across the chamber that interrupt the current speaker. Almost a million interjections from 1901 to 1980 are included in Hansard. Recently, as I was fiddling around with ways of displaying them, I started to see the interjections as something akin to tweets – quick, pithy, and pointed. What would happen, I wondered, if we reimagined interjections from a century ago in an age of social media?
Real Words :: Imagined Tweets displays randomly selected interjections as if they were tweets. The words, speakers, and dates are real. The Twitter handles, activity stats, and times are invented. As the ‘tweets’ load, conversations build up across the decades. You can filter the interjections using keywords, and you might even spot an emoji or two!
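The two operations described here, filtering by keyword and choosing interjections at random, are simple to sketch in Python. This is a rough illustration under my own assumptions, not the code behind Real Words :: Imagined Tweets. The record layout (`speaker`, `date`, `text`), the function names, and every sample record below are invented placeholders rather than real Hansard data.

```python
import random


def filter_interjections(interjections, keyword):
    """Keep only the interjections whose text contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [item for item in interjections if keyword in item["text"].lower()]


def pick_random(interjections, n, seed=None):
    """Choose up to n interjections at random, without repeats."""
    rng = random.Random(seed)  # pass a seed to get a repeatable selection
    return rng.sample(interjections, min(n, len(interjections)))


# Invented placeholder records, not real interjections, speakers, or dates.
sample_interjections = [
    {"speaker": "Member A", "date": "1905-01-01", "text": "What about the tariff?"},
    {"speaker": "Member B", "date": "1925-01-01", "text": "Hear, hear!"},
    {"speaker": "Member C", "date": "1950-01-01", "text": "The tariff again!"},
    {"speaker": "Member D", "date": "1975-01-01", "text": "Order!"},
]

matches = filter_interjections(sample_interjections, "tariff")
for item in pick_random(matches, 2, seed=1):
    print(item["date"], item["speaker"], "-", item["text"])
```

Because the choice is random, interjections from different decades end up side by side, which is what lets those unexpected conversations emerge.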
Instead of shelf after shelf of bound volumes of Hansard, we now have 500 million words, spoken by our elected representatives, available in digital form. Topics can be analysed, aggregated, tracked, and visualised. Contexts can be enriched and extended. We’ve only just started to explore these possibilities. Think about the new questions you can ask, the patterns you can find, the perspectives you can build. By learning the language of Parliament we can find new ways of understanding our political past.