Last Friday I traveled to Philadelphia for Data-Crunched Democracy, a conference drawing together political consultants, data analysis and targeting professionals, academics and journalists to discuss the impacts of “big data” on elections. This cross-disciplinary approach made for some very interesting discussions and I met a number of fascinating people (including, as it happened, the prodigious Xenocrypt). Rasmus Kleis Nielsen has a good overview of the day, but I wanted to write a bit about the lessons for journalists covering campaigns that engage in the use of data for targeting and predictive analysis.
The Limits of Obama vs. Romney
Many of the moderator and audience questions at the conference were focused on last year’s presidential campaign, and with good reason, but journalists need to absorb the important lessons and move on. Here is a quick list of relatively unimportant questions about data use from that campaign:
- How many people worked on the data team?
- What percentage of the campaign budget was devoted to targeting? (If you want to be a wiseass, the correct answer is: as close to 100% as possible, depending on the definition of “targeting”)
- Which commercial datasets were the most useful to the campaign?
- Did you really track whether visitors to the campaign sites also visited pornography sites? (this still annoys the Romney folks)
- How often did you pull my Facebook graph and Twitter feed?
- How much data did you have?
I wished that the organizers of the conference - who did a very good job - had blocked out 30 minutes at the start for an “Ask Me Anything About the 2012 Campaign” session, and then prohibited 2012-specific questions for the rest of the day, in order to draw the discussion towards the implications of such use and the lessons for all of us. There were times when we had that discussion, but there were other times when it kept circling back to some version of “How great, in your opinion, was the Obama campaign’s data use” or “Why didn’t the Romney campaign do X?”. As Ethan Roeder, an Obama campaign data staffer, put it on Twitter: “Obama campaign wasn’t all that and a bag of chips, bro.”
The first thing we can do as journalists is to focus on the broader lessons and questions of the campaign, not rehash it.
Not Exactly Like Selling Orange Juice
Yes, there was some excellent journalism done about the campaigns’ use of data, particularly from Sasha Issenberg, Alexis Madrigal and Lois Beckett of ProPublica. And it’s logical that much of the focus from the media and academics in this area has been on the 2012 presidential race; it featured two campaigns with massive resources and plans to engage and target voters in sophisticated ways. There are some really innovative techniques that we learned about, including the use of data recorded by television set-top boxes. But even that Gawker piece seems to me to focus on the utility of such a tactic in the wrong way. As Carol Davidsen, who described the process at the conference, said, “behavior data is more interesting than consumer data,” and the campaign was looking to spend its television ad budget wisely. So running ads on a specific cable network during a specific time period becomes less about targeting specific individuals - Davidsen pointed out that while TV viewership hasn’t slacked off, people are doing other things while the TV is on - and more about getting a better value for the campaign’s ad budget. Yes, it’s targeting. But what I heard from the people who do campaign targeting is that a lot of what’s possible is fuzzy at best. It’s very hard to get down to targeting individuals precisely using data from social media or other sources beyond a voter file.
There are too many occasions when the press has conflated political data work with the most common use case: selling stuff to consumers. Elections are not like selling orange juice; rather, they are like selling orange juice to people who drink it infrequently, are often ambivalent about it and can change their minds based on what may seem like tangential issues. In other words, it’s hard, and that difficulty matters both to the process and to how journalists explain it. Campaigns have not simply taken corporate data mining practices and plugged them into politics, and they can’t; it’s a different process, one that actually has less data to work with.
Listen to Alex Lundry of TargetPoint Consulting, who worked on the Romney campaign: “What we’re doing is building data bridges, but to data islands. There are no individuals, only aggregates. There are bridges, but our ability to work on those islands is limited.” Some of the academics accused the targeting folks of playing down the effectiveness of their techniques, as if they were saying that this stuff doesn’t work. I don’t believe that’s the point they were making. The point, which journalists in particular need to understand, is that integrating data sets is a messy, unreliable and hard business. I usually tell my students that at least 75 percent of the time they spend with data goes to work that comes before the analysis: acquiring, cleaning, standardizing, figuring out its limits. Lundry put that number at north of 90 percent at the conference. So there are no magic bullets here, and the really valuable part is the voter file, not the other data. It would be convenient for journalists if the story of data use in the 2012 campaign turned on a single tactic or dataset, but it doesn’t. We need to be honest with ourselves and our readers about what campaigns are trying, but especially about what works, even if it’s not thrilling.
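To make the “messy, unreliable and hard” part concrete, here is a minimal sketch of the cleanup that has to happen before a commercial record can even be matched to a voter file. This is a toy example, not anything a campaign actually runs: the field names, records and matching rules are all invented, and real matching is far fuzzier than this exact join.

```python
# Toy illustration of why merging outside data onto a voter file is mostly cleanup.
# All records, field names and rules here are invented for the example.

import re

voter_file = [
    {"voter_id": "WI-0001", "name": "SMITH, JOHN A", "zip": "53202"},
    {"voter_id": "WI-0002", "name": "DOE, JANE", "zip": "53703"},
]

commercial_data = [
    {"name": "John Smith", "zip": "53202-1407", "magazine": "Field & Stream"},
    {"name": "Jon Smyth", "zip": "53202", "magazine": "The Economist"},  # same person? hard to say
]

def normalize_name(raw):
    """Crude standardization: uppercase, strip punctuation, drop middle initials,
    and force 'LAST FIRST' order regardless of the source's convention."""
    cleaned = re.sub(r"[^A-Za-z, ]", " ", raw).upper().strip()
    if "," in cleaned:                           # voter-file style: LAST, FIRST MIDDLE
        last, first = [p.strip() for p in cleaned.split(",", 1)]
    else:                                        # commercial style: FIRST MIDDLE LAST
        parts = cleaned.split()
        first, last = " ".join(parts[:-1]), parts[-1]
    first = first.split()[0] if first else ""    # drop middle names/initials
    return f"{last} {first}"

def normalize_zip(raw):
    return raw.strip()[:5]                       # ZIP+4 vs. five-digit ZIP

# Build a lookup keyed on the normalized fields, then try to attach commercial rows.
index = {(normalize_name(v["name"]), normalize_zip(v["zip"])): v for v in voter_file}

for row in commercial_data:
    key = (normalize_name(row["name"]), normalize_zip(row["zip"]))
    match = index.get(key)
    print(row["name"], "->", match["voter_id"] if match else "NO MATCH")
```

Even at this toy scale, one of the two commercial records fails to match; scale that up to millions of rows from dozens of sources and the 90 percent figure starts to look conservative.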
Keep Calm and Write Sensibly
There was some pushback from campaign staff against the media over its reporting on data and privacy: that we’ve gotten alarmist and are scaring readers. I think they have a point, in that the world we describe too often sounds like, as one panelist put it, “a Tom Clancy novel” where everyone assumes that they are being followed all the time. I’m not saying there is no danger of abuse or misuse of personal data - a campaign staffer could disclose personal information about voters in a number of ways - but I do think journalists need to recognize that most campaigns are built on a simple model: do what’s necessary to win. That often means that the big plans you and I might dream up for massive digital surveillance don’t happen, because campaigns don’t have the time, staff or incentive to conduct them. It doesn’t mean that campaigns are always good actors, however, and journalists need to understand the technology well enough to tell the difference.
During the presidential campaign, the Times (and I presume other outlets) was pitched a story by one side about the Internet practices of the other, the implication being that the other side wasn’t playing by rules barring coordination among separate committees. While it was technically possible that such skullduggery was afoot, there was no proof (or even very solid evidence). I imagine it would be tough for many reporters to judge such a situation, considering that it turned on how Web analytics were being used and identified. Ultimately, we did not publish a story, and when I asked one of the original sources of the pitch about it after the election, this person laughed in a fairly sheepish way. They tried: it was a campaign, after all. You want to win.
So yes, the media has its problems in accurately writing about data use, privacy and the Web components of modern political campaigns, but let’s not forget that the campaigns have an interest in painting each other as bad actors, too, and journalism has a role in sorting out such claims. Which means that journalists need to understand how sites track visitors. Your own analytics department would be a good place to start. And then we need to write calmly about the tactics and implications. This approach, perhaps best exemplified by ProPublica’s Message Machine work, tries to pull apart the campaigns’ techniques without making them sound like some science-fiction tale. Campaign emails are a great place to start, because lots of people receive email, and it’s so cheap to send that there’s no incentive for the campaigns not to experiment. When that approach becomes possible with television advertising (an area Lundry calls “ripe for innovation”), we need to be ready for it, as difficult as it may be to track.
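Campaign emails are also a place where the reporting can start small. Here is a minimal sketch, not ProPublica’s actual Message Machine code, of one naive way to compare reader-donated emails sent on the same day and surface what differs between them (subject lines, suggested donation amounts). The messages and field names are invented.

```python
# Naive variant-spotting across reader-donated campaign emails.
# The sample messages and fields are invented; this is not Message Machine's code.

from collections import defaultdict
from itertools import combinations

donated_emails = [
    {"recipient": "reader1", "date": "2012-10-03", "subject": "Dinner with Barack", "ask": "$5"},
    {"recipient": "reader2", "date": "2012-10-03", "subject": "Dinner with Barack", "ask": "$25"},
    {"recipient": "reader3", "date": "2012-10-03", "subject": "Hey", "ask": "$5"},
]

# Group messages by send date, then compare every pair within a group.
by_date = defaultdict(list)
for msg in donated_emails:
    by_date[msg["date"]].append(msg)

for date, msgs in by_date.items():
    print(f"--- {date}: {len(msgs)} emails ---")
    for a, b in combinations(msgs, 2):
        diffs = [field for field in ("subject", "ask") if a[field] != b[field]]
        if diffs:
            print(f"{a['recipient']} vs {b['recipient']}: differs on {', '.join(diffs)}")
```

The point isn’t the code; it’s that the raw material - what readers actually received - is something news organizations can collect and compare on their own.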
Focus on the Basics
Because a lot of this data stuff seems like rocket science to most journalists, there’s a temptation to believe that data science can do anything. But when it comes to campaign data, it all starts and ends with the voter files - registration and voter history information. That’s the core data asset, and everything else is built atop it. You can’t really start with, say, a file detailing magazine subscriptions and build an effective campaign. Lundry called the information gathered by volunteers or paid canvassers “solid gold” to a campaign, because it is collected by a person talking directly to potential voters.
From that point, the workings can get a little obscure, as campaigns try to integrate data from the voter files with information they have collected. Part of that is the commercial data, but only part, and its predictive value doesn’t always rival that of voter history files and partisan leanings (particularly in the past few elections). For journalists, the lesson is clear: we should at least have what campaigns start with in terms of voter data - voter registration, voter history and political geographies. In many states, these are freely available (although, as Kevin Collins notes, that’s not universally true). The campaigns were updating their voter data monthly, then weekly, then daily as the election neared. Are we doing anything similar?
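What would “doing something similar” look like in practice? At minimum, keeping a state’s registration and vote-history extracts in a form you can query. Here’s a minimal sketch that computes a crude turnout score per voter from past general elections - the basic ingredient campaigns build everything else on. The file layout is invented, since every state formats these extracts differently.

```python
# Crude per-voter turnout score from a vote-history extract.
# The column layout here is invented; real statewide files vary widely by state.

import csv, io

# Stand-in for a state vote-history file: one row per voter,
# one column per general election, "Y" if the voter participated.
raw = """voter_id,party,g2004,g2008,g2012
WI-0001,DEM,Y,Y,Y
WI-0002,REP,,Y,
WI-0003,UNA,,,Y
"""

elections = ["g2004", "g2008", "g2012"]

for row in csv.DictReader(io.StringIO(raw)):
    voted = [e for e in elections if row[e] == "Y"]
    score = len(voted) / len(elections)    # 1.0 means the voter showed up for all three generals
    print(row["voter_id"], row["party"], f"generals voted: {len(voted)}/3", f"turnout score: {score:.2f}")
```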
I hear the cry: “But what about social media?” Here’s what Rayid Ghani, the Obama campaign’s chief data scientist, said at the conference: “So far there hasn’t been a good way to connect that with the voter file.” The piles of Facebook Likes, the retweets and reblogged Tumblr posts - they are all interesting things, but how relevant are they if a sizable chunk of your social audience either lives outside the United States or isn’t eligible to vote for some other reason (age, for example)? The most interesting question from the 2012 election, to me, is how the Obama campaign managed to find additional voters and turn them out. Some of that is due to predictive modeling. But some of it is likely due to really diligent in-person field work. We should strive to understand all the parts of the effort.
One way to do that is to follow the people who did targeting work as they now go on to other things. Many are still involved in politics at the national level, including folks like Lundry. Others, like Obama’s former Wisconsin field operations lead Hallie Montoya Tansey, are looking down the ballot. She and a friend have founded The Target Labs to provide prediction services to campaigns. Why? Here’s what she says about working for Obama in Wisconsin:
“The reason we were reaching out to voters outside of the Democratic strongholds in Madison and Milwaukee is because we were using data analytics to build models that used all the information we had about voters, including demographics, geography, and past voting history, to make extremely accurate predictions about how likely they were to support us. These models made it possible to find ‘our voters’ on every block in every region of the state. In 2004, John Kerry won Wisconsin by 10,000 votes. In 2008, our margin of victory was 400,000. Although I’d worked on many campaigns before, this was my first experience using predictive modeling, and it convinced me completely of the potential of these techniques, adopted at scale, to change the way campaigns are run.”
News organizations should be doing post-mortems using 2004, 2008 and 2012 voter history and registration data to figure out where the “new” voters came from, who they might be and how solid they are in terms of voting behavior. Although no one at the conference was certain that micro-targeting would become a staple of every campaign anytime soon, I think we all acknowledged that it is coming to an election near you. That warning applies to journalists, too. The way to start is to value information about voters as much as campaigns do, and to try to emulate the data efforts of the campaigns, so that we can more accurately assess their work.
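As a postscript, here’s a rough sketch of what such a post-mortem could start from: a toy comparison of two registration-and-history snapshots (the voter IDs, fields and records are all invented) that flags people who appear in the 2012 file but not the 2008 file, and checks whether those “new” registrants actually turned out. A real version would have to handle re-registrations, moves and purges between snapshots, which is exactly the messy part.

```python
# Toy post-mortem: who is in the 2012 voter file but wasn't in 2008, and did they vote?
# Voter IDs, fields and records are invented; real files need handling for moves,
# re-registrations and purges between snapshots.

snapshot_2008 = {
    "WI-0001": {"voted_general": True},
    "WI-0002": {"voted_general": False},
}

snapshot_2012 = {
    "WI-0001": {"voted_general": True},
    "WI-0002": {"voted_general": True},
    "WI-0003": {"voted_general": True},   # not present in the 2008 file at all
    "WI-0004": {"voted_general": False},  # registered since 2008, but stayed home
}

new_registrants = set(snapshot_2012) - set(snapshot_2008)
new_who_voted = {v for v in new_registrants if snapshot_2012[v]["voted_general"]}

print(f"registered in 2012 but not 2008: {len(new_registrants)}")
print(f"...of whom voted in the 2012 general: {len(new_who_voted)}")

# Among voters present in both files, who went from non-voter to voter?
newly_mobilized = [v for v in snapshot_2008
                   if not snapshot_2008[v]["voted_general"]
                   and snapshot_2012.get(v, {}).get("voted_general")]
print("previously registered non-voters who turned out in 2012:", newly_mobilized)
```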