The Annotated Archive


Derek Willis


May 20, 2005

Third in a series of essays humbly titled “Fixing Journalism”

I love archives. As a kid, my favorite books were reference books, the kind that had baseball stats for every team in every year since 1903. My mother, an English teacher, probably wondered what she had done wrong when I opted for lists over literature.

So I was a pretty happy fellow when I first walked into the news library at the Palm Beach Post in May 1995 as an intern. Here was a place loaded with archives, not just in the physical sense but also electronic, where I could search for the proverbial needle in a haystack – and sometimes find it.

So I love archives. But you know what I’d really love?

An annotated archive. An archive that doesn’t just display vertical depth going back years but can show relationships between archived items and the individuals and institutions named within them. An archive that can make connections easier to find and can help new or unfamiliar users get up to speed quickly.

Now this isn’t a knock on current archives; they are indispensable tools for research. But imagine if we could create and grow an archive that provided supplemental information and showed networks between topics, individuals and organizations.

For example, if you have a story on a controversial new development, the annotated archive could have links within the story to information researchers have about the company or its executives. A list of articles that mention a local businessman would be a single click away, along with other important information.

This isn’t impossible; in fact, the tools exist to build a rudimentary example of the idea right now. And unlike my previous essays, for this one I’ve tried to come up with a small example of what an annotated archive might look like.

The tool I chose is the MediaWiki software that powers Wikipedia. It’s free and open-source software, easy to install and administer. The reason I picked MediaWiki is that many folks have heard of Wikipedia and might be able to better envision how a newspaper archive with Wiki principles might work.

Now, on to my example archive. It’s not much – only a few articles that I wrote or co-wrote while I was at Congressional Quarterly. The concept is pretty simple: each article in the archive can contain one or more internal references (in addition to external references) to other pages, which may contain other stories, information about a person, company or organization, or other notes.

So, for example, a story includes an internal link to Tom DeLay, the House Majority Leader. That page currently has links to external Web sites, but it could contain internal-only notes, contact information or pointers to documents or other references.

On each page, there’s a link on the lower left called “What links here,” which pulls up each of the other pages in the archive that reference that page. With the proper tending and weeding, the usefulness of such a feature only grows, illuminating connections that even beat reporters may not have realized.
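The idea behind a “What links here” list can be sketched in a few lines of Python. This is only an illustration of the principle, not how MediaWiki implements it: it assumes pages are stored as plain text with wiki-style `[[Title]]` or `[[Title|label]]` links, and the page titles, sample text and function names (`extract_links`, `build_backlinks`) are all hypothetical.

```python
import re
from collections import defaultdict

# Matches wiki-style internal links: [[Target]], [[Target|label]], [[Target#Section]]
LINK_RE = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_links(wikitext):
    """Return the set of page titles that a page's text links to."""
    return {match.group(1).strip() for match in LINK_RE.finditer(wikitext)}


def build_backlinks(pages):
    """Map each linked title to a sorted list of the pages that link to it."""
    backlinks = defaultdict(set)
    for title, text in pages.items():
        for target in extract_links(text):
            if target != title:  # ignore a page linking to itself
                backlinks[target].add(title)
    return {target: sorted(sources) for target, sources in backlinks.items()}


# Hypothetical sample archive: two stories and one person page.
pages = {
    "Story A": "This story mentions [[Tom DeLay]] and [[Congressional Quarterly|CQ]].",
    "Story B": "This story also mentions [[Tom DeLay]].",
    "Tom DeLay": "Notes and external links. See also [[Story A]].",
}

what_links_here = build_backlinks(pages)
print(what_links_here["Tom DeLay"])  # pages that reference the Tom DeLay page
```

Because the index is rebuilt from the text of the pages themselves, every new internal link an editor adds shows up in the list without any extra bookkeeping.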

Of course, that’s a best-case scenario. Such an annotated archive requires regular maintenance by someone – an archive editor, basically – who must be a news generalist and yet recognize key people and organizations within the community. Those people are inside our newsrooms – we must tap their knowledge in a better way.

One of the best features of an annotated archive is its flexibility. It can grow to incorporate new topics and can be updated when a familiar subject suddenly takes on a new importance. This doesn’t have to be done with Wiki software, either; that’s just an easy way to demonstrate the principles. We have access to a range of technologies that can enable us to think about our information in new and better ways.

Users are already thinking about these issues. Consider The Annotated Times, which essentially tracks New York Times articles by topic and author, and even provides custom RSS feeds so that users can track articles by reporter. Or Chicago Crime, which took crime reports from the Chicago Police Department and made them available in ways that were interesting to readers. We need to treat our archives as our most valuable data, because they are.

Archives have long been both an internal resource and an external product for newspapers. They still are, but we don’t have to treat them as the same thing. The archive that newspapers provide to their employees should be more valuable than the one they sell to vendors, and the newsroom is the place that can and must make it happen.