Civic Data and Journalism


Derek Willis


June 6, 2015

A solid foundation of publicly available, consistent civic data - think of election results, voting information and other political records - is more important than ever for journalism. And while we’ve made a good start, we have a long way to go.

On June 5 I had the privilege of being on a panel at the Personal Democracy Forum in New York with Luciana Lopez of Reuters, Jonathan Capehart of The Washington Post and Jenn Topper of the Sunlight Foundation. The title was “How Civic Tech is Changing the Way Newsrooms Cover Elections,” and our discussion was wide-ranging, thanks to a) good panel selection and b) expansive definitions of civic tech.

My basic theme was this: newsrooms need access to a basic level of civic data and technology now more than ever, for two major reasons. First, the days when most newsrooms had the budget and staff to maintain their own versions of important civic data (like voter files or campaign contributions) are gone. Instead, the most common options are seeking out such data from third-party providers or not doing it at all. While there are some excellent third-party providers, including the folks at Sunlight and the Center for Responsive Politics, this is still a system that makes it harder for journalists to stumble across potential stories on their own.

Second, consensus on how to describe some political situation is harder to achieve when the number of publishers multiplies. Maybe we don’t want total consensus, but I’d argue that it’s desirable and important that we at least are able to have a common understanding of the key data that informs our civic life and how we report on it. Along with methodological transparency, it’s a way that we can be upfront with readers and each other.

Two projects have helped make these points clearer to me. The first, the United States project, is the result of conversations between Eric Mill (then at Sunlight), Joshua Tauberer of Govtrack and myself that came to a head in August 2012. We all agreed that it made no sense for each of us to maintain separate scrapers for legislative data, and that beyond duplicative work, it created the possibility that different choices made in isolation would result in different data. No one wanted a situation in which three providers of congressional data disagreed with each other - and we had a few cases of that, inadvertently.

The project grew from an initial focus on Congress to include executive branch data, including a central repository of inspector general reports. It now has a number of contributors, and decisions are made by rough consensus. The result is a stable foundation of legislative data. If, for example, The New York Times Congress API that I maintain were to be discontinued, something very similar to it could be re-created in a matter of weeks, if not days, using this data as its primary source. In less than three years, a group of volunteers has made it possible for virtually anyone seeking bulk legislative data to not have to spend weeks or months reinventing the wheel. That’s a huge accomplishment for civic technology, and for journalism.

The second project is OpenElections, which I’ve been working on along with Serdar Tumgoren and a host of volunteers since early 2013. This effort will go on for years, but its goal is to remove the high barriers to journalists and developers seeking to work with historic election results data. To make it easier for people to understand how elections have played out in their towns, counties and states over the past 15 years (and further if we can). Most newsrooms, never bastions of efficiency during good economic times, now face pressures that make it all but impossible to embark on months-long efforts to collect and parse data. If they were able to do so, OpenElections wouldn’t exist.

What makes these efforts worthwhile for me is the idea that someone, somewhere will use the data we’ve gathered and published to make something I’ve never considered. Or to look at issues or trends I’ll never look at. This is public data; it should have a public home and a public understanding. There’s still a huge role for journalism in civic technology, none bigger than providing context and understanding from our shared foundation. And there’s still a lot of work to do. We have too many silos of civic information about politicians and candidates. We duplicate each other’s work too often to make sense.

The United States project and OpenElections could have been owned by a single newsroom and turned out ok. But I don’t think that’s the best scenario, because then the data that provides a common foundation is subject to a single organization’s control. We literally, and figuratively, can’t afford that.

So here’s what you can do: if you’re interested in congressional or election results data, consider contributing to these projects. Or consider filling the other gaps we have, especially at lower levels of government. But whatever you do, try to ensure that your work doesn’t become someone else’s repetition. Our common foundation, and our ability to deliver the kind of journalism that our civic life demands, can’t afford it, either.