What They Say About Us – Derek Willis

As journalism buzzwords go these days, analytics has a lot going for it. The term is broad enough to encompass a wide range of ideas, from the “dark side,” where the application of analytics leads to bad and banal writing, to the use of analytics by reporters to unearth details of breaking news. Knowing who views our work and how they interact with it, while not a savior of the industry, seems like something we should be paying attention to and incorporating into newsroom processes.

But I’m also interested in another form of analytics: how other people cite, describe and otherwise use the information we publish. Yes, the number of Facebook likes or Twitter retweets that a story gets can tell you something about how popular it is. But what about the people who take the time to describe a story on social media in their own words, or pick their favorite quote? Aren’t they arguably more invested in it, maybe even personally involved?

A few weeks ago, my colleagues in Interactive News at The Times held a hack day focusing on analytics, and built some really interesting and useful stuff that will help improve our internal knowledge of our own applications and how users are interacting with them. My own contribution was much less technically impressive, and we’ll get to that in a bit. But it scratched an itch that had started several years ago when I saw linkypedia, a site built by my friend Ed Summers. Ed works at the Library of Congress, which means he cares a lot about not just preserving information but also making it easy to reference. He also, via his work on a historic newspaper preservation project called Chronicling America, knows a bit about the news industry.

The idea behind linkypedia is that links on Wikipedia aren’t just references; they help describe how digital collections are used on the Web, and encourage the spread of knowledge: “if organizations can see how their web content is being used in Wikipedia, they will be encouraged and emboldened to do more.” When I first saw it, I immediately thought about how New York Times content was being cited on Wikipedia. Because it’s an open source project, I was able to find out, and it turned out (at least back then) that many Civil War-era stories that had been digitized were linked to from the site. I had no idea, and wondered how many of my colleagues knew. Then I wondered what else we didn’t know about how our content is being used outside the friendly confines of nytimes.com.

That’s the thread that leads from linkypedia to TweetRewrite, my “analytics” hack that takes a nytimes.com URL and finds tweets that aren’t simply automatic retweets; it tries to filter out posts that contain the exact headline of the story, leaving what people say about it in their own words. It’s a pretty simple Ruby app that uses Sinatra, the Twitter and Bitly gems and a library I wrote to pull details about a story from the Times Newswire API.
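The filtering step can be sketched roughly as follows. This is not the TweetRewrite source, just a minimal illustration that assumes the tweets have already been fetched as plain strings and the headline is already known; the method names (normalize, original_comment?, filter_tweets) are made up for the example.

```ruby
# Illustrative sketch only: helper names are hypothetical, and fetching
# tweets and headlines from the Twitter and Times APIs is left out.

# Collapse case, punctuation and whitespace so small formatting
# differences do not hide a copied headline.
def normalize(text)
  text.downcase.gsub(/[^a-z0-9\s]/, " ").split.join(" ")
end

# True when a tweet looks like the author's own words: it is not an
# old-style "RT @user" retweet and does not repeat the exact headline.
def original_comment?(tweet_text, headline)
  return false if tweet_text.match?(/\ART\s+@/i)
  wanted = normalize(headline)
  return true if wanted.empty? # nothing to compare against
  !normalize(tweet_text).include?(wanted)
end

# Keep only the tweets that pass the check above.
def filter_tweets(tweets, headline)
  tweets.select { |text| original_comment?(text, headline) }
end

headline = "Senate Passes Budget Bill"
tweets = [
  "Senate Passes Budget Bill http://nyti.ms/abc",
  "RT @nytimes: Big news tonight http://nyti.ms/abc",
  "Finally, some movement on the budget. http://nyti.ms/abc"
]
puts filter_tweets(tweets, headline)
# prints only the third tweet
```

Matching on a normalized copy of the text is a judgment call: it catches headlines pasted with different capitalization or punctuation, at the cost of occasionally dropping a tweet that quotes the headline and then adds a comment of its own.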

Seems to me that the more we know about how our content is cited, the more we might be able to find out about our users or how they value the journalism we produce. Maybe there are product ideas in that knowledge: if people are relying on our stories for certain purposes, it’s possible that we could create custom packages and charge for them. At the very least, such insights on social media might give us an idea of how people spread stories, and maybe we can learn from that. Tracing links is just a start; we could do the same for entire passages, appearances by New York Times staff in other media and more. Another project of mine uses the Capitol Words API from the Sunlight Foundation to track mentions of The Times in the Congressional Record. If we’re lucky, people talk and write about us a lot. We should be listening and watching for what they say.