Hard Problems

Sep 14 2010

I wasn’t going to respond to Ellen Miller’s comments on my previous post, mostly because I thought I had said what I wanted to. But now that O’Reilly has picked up on things, I figure it might be worth one last attempt on my part. Your experience, of course, may argue against that.

Ellen writes:

Eventually, Derek, someone has to stand up and say “but he isn’t wearing any clothes,” just like that kid at the parade. And that’s what we’ve done about USASpending …

The Administration’s delivery on their transparency promises is pretty poor. Perhaps only worse is the quality of some of the data we are seeing. It’s not wrong to point that out and demand that it to be fixed. It’s the responsible thing to do.

Well, yes, it’s certainly fair to point out where the government has fallen short of its promises. No argument there. But I think a close examination of what Sunlight has said and done shows that things aren’t that simple. Let’s consider Data.gov, for example. By pushing for access to datasets, Sunlight got what it requested: a bunch of datasets. Assuming that they’d all contain pristine information would have been a foolish notion, and I’m sure there are scores of journalists who could have said as much. Quality has always been an issue.

But let’s take a look at what Sunlight said after Data.gov was announced but before it was released. Sunlight Labs asked for five things:

  1. Bulk access to data
  2. Accountability for Data Quality
  3. Clear, understandable language
  4. Service and developer friendly file formats
  5. Comprehensiveness

Assuming those were listed in rough order of importance, Data.gov has done quite a bit towards satisfying some of those conditions, but certainly nothing approaching perfection. There is a lot of data you can get in bulk. And you can rate the quality or usefulness of each dataset (although the public doesn’t seem to be much interested in doing that). But on quality specifically, the Sunlight post says: “In order to make this the most efficient process possible, Government should rely on the customers of its data to pinpoint where the problems are.” (emphasis mine)

This doesn’t excuse the publication of dirty or incomplete data, but asking for the release of what is known to be imperfect data and then saying that things aren’t working because the data isn’t good enough seems to be a bit … off. Or somewhat obvious. Or maybe reminiscent of Captain Renault.

Tom Lee’s post introducing ClearSpending – which is a responsible and considered response to the issue of data quality – acknowledges the complexity of things: “These are hard problems, and the people responsible for USASpending don’t really have the power necessary to get other agencies into line.” So knocking them for redesigning the site – something Sunlight considers important enough to do repeatedly as a public service – seems to me to fall a bit flat.

What bugs me about this situation is that there are times when the “open government” community suffers from similar issues – not exactly of data quality, but of a broader concern that Tom writes about:

Frankly, I’m worried about what happens when people start asking what concrete things the open government movement has accomplished. We need to make sure that the answer isn’t “accidentally misleading a lot of people.”

Show me how this Poligraft analysis of a typical NYT story about a Sunday talk show appearance deepens my understanding of the interconnections between the people mentioned in it. It doesn’t. That’s not because Poligraft is a bad idea; it’s because the goal it sets out to achieve is hard, really hard. Does that mean that Sunlight is failing to deliver on its promises? Or the Capitol Words project that’s currently on hiatus, is that a failure, too?

Of course not. But these are hard problems, and they require a lot more than some of the rhetoric I’ve seen of late. I’d love to see more stuff like ClearSpending, without some of the rhetoric that has accompanied it.