The Scoop

Six Reasons To Look Past Caspio

August 18th, 2008  |  Published in Journalism | Comments (4)

Mindy asks for some bullet points on why news organizations would do better not to use Caspio for their Web database needs. Feel free to add on:

  1. SEO. If you like building databases that are not indexed by Google and other search engines, then Caspio’s right for you. Go ahead, Google “Powered by Caspio”.
  2. Owning vs. Renting. You will never stop paying for Caspio unless you quit it entirely. And then you’ll still need to rewrite your apps. All you’ve gained is more work.
  3. You will need programming. Caspio says “no more programming,” but to do anything beyond basic search and display, you will need some. Oh, but you can’t get access to that functionality during a free trial.
  4. Like using Flash? Caspio doesn’t.
  5. Nickel and Dime. Zip code searching costs $150 to set up and $50 a month.
  6. As my boss and friend Aron says, “We can’t outsource our future.” By choosing Caspio, you’re dependent upon them to add features, and when they do, they add them for all users, too. So much for differentiation.
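
To put reason No. 5 in numbers: assuming the two prices quoted above never change (my assumption), the zip code add-on's running total is just the setup fee plus the monthly fee times the number of months you keep renting. A quick Python sketch:

```python
def zip_search_cost(months, setup=150, monthly=50):
    """Running total, in dollars, for the zip code search add-on.

    Assumes the $150 setup fee and $50/month rate quoted above stay fixed.
    """
    return setup + monthly * months

for months in (12, 24, 36):
    print(f"{months} months: ${zip_search_cost(months):,}")
```

That's $750 for the first year and another $600 every year after, for one feature, and the meter keeps running as long as you rent.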

And here’s a bonus quote from Jacob Kaplan-Moss, one of Django’s lead developers, who admittedly has a bias in this area. But still, it’s a very telling quote: “I’ve actually stopped being all that concerned about Caspio: each new Caspio customer is one more competitor my paper doesn’t have to worry about.”

Now, I’m sure there are six reasons to use Caspio, but I don’t think they stack up in the long term. I think Caspio leaves you with more work, not less, and with apps that you have to spend valuable time making look different from those of everybody else who uses it.

Fumblerooski

August 9th, 2008  |  Published in Sports, django | Comments (6)

For reference purposes, you may want to study this old commercial for Reese’s Peanut Butter Cups. Recommended, but not necessary, is this definition.

It’s August, which means that college football is just around the corner. College football is why I don’t volunteer to teach any classes in the fall. It’s why I occasionally compensate my better half for missed Saturday afternoons (although thanks to ESPN 360, I’m not nearly as bad as I was pre-child). So I love college ball, and I love data. That’s where Fumblerooski comes in.

Let me say from the get-go that this is not nearly a finished site. It’s not even halfway there. I’m posting about it now because I’d like to invite people with similar interests to help me build out a site that puts the numbers behind college football front and center. Yes, I have ideas - APIs, for example - but if I build it alone, Fumblerooski will only ever be so good, and certainly not good enough. That’s why the code behind the site is on github.

The basics: it’s running Django trunk (so, yes, that’s the 1.0 beta candidate right now) and uses MySQL as a backend. Right now I have game results dating back to 1987 for most major schools and spottier coverage for minor ones. In addition, the NCAA releases game-by-game statistics for players, and I have some scripts for processing that data, although there’s plenty of room for improvement. Folks who dive into the code may also notice that I started a recruiting dataset, but I think that area is well covered, so it’s not a priority for me at this time. At the moment, Fumblerooski is running on a half-gig Joyent Accelerator with nginx as the Web server.

Most of my work so far has gone toward building out team information. Take my alma mater, Pittsburgh: you can see the results of a given season, check out a series (you can reverse it if you’re one of those WVU fans) or see details of an individual game. The drive chart, which is a fairly recent NCAA addition, is dynamically fetched (and no, I can’t do anything about the colors).
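
For would-be contributors, here's roughly what a series page boils down to: tally one school's wins, losses and ties against another, and "reversing" the series just swaps the two schools. This is a hypothetical sketch in plain Python, not Fumblerooski's actual code (which uses Django models); the `Game` record, the team names and the scores are all made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Game:
    # Hypothetical game record; the real Fumblerooski models may differ.
    season: int
    home: str
    away: str
    home_score: int
    away_score: int

def series_record(games, team, opponent):
    """Return (wins, losses, ties) for `team` in games against `opponent`."""
    wins = losses = ties = 0
    for g in games:
        if {g.home, g.away} != {team, opponent}:
            continue  # not a meeting between these two schools
        # Score each game from `team`'s point of view, home or away.
        if g.home == team:
            ours, theirs = g.home_score, g.away_score
        else:
            ours, theirs = g.away_score, g.home_score
        if ours > theirs:
            wins += 1
        elif ours < theirs:
            losses += 1
        else:
            ties += 1
    return wins, losses, ties

# Made-up placeholder games, not real results.
games = [
    Game(2001, "Team A", "Team B", 24, 17),
    Game(2002, "Team B", "Team A", 21, 14),
    Game(2003, "Team A", "Team B", 28, 3),
]
print(series_record(games, "Team A", "Team B"))  # (2, 1, 0)
print(series_record(games, "Team B", "Team A"))  # reversed: (1, 2, 0)
```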

I envision at least two types of contributors: one would help on the coding side with new features (I have plans for aggregate player stuff, but want to wait to see what gets into Django). Another would help with information: fleshing out coaching details, for example. In my wildest dreams, Fumblerooski gets a severely needed makeover as well. Any takers? Feel free to sign up at github, or fork the code, or whatever. You can also contact me if you’d like to help in other ways.

Oh, and the name? It was the best football-only term available, but I also got the blessing of Nebraska alum Matt Waite.

The Birth of Quadruplets, or Understanding the Process

July 22nd, 2008  |  Published in Journalism | Comments (0)

My friend Dave Gulliver had a fascinating piece in his paper on Sunday about the birth of quadruplets in a Sarasota hospital. It’s a great story, but what makes it greater is that it was written by somebody with a certain amount of expertise on the subject of difficult premature multiple births. I hope Dave doesn’t mind, but I’d like to use that story as an example of why understanding the use of data is increasingly important for large swaths of journalism.

There’s a tendency among some folks in the industry to see CAR and other technological tools as just that - blunt instruments. Helpful, sure, but not ultimately necessary to the task of creating journalism. And for a segment of what journalism does, that’s probably OK. When we report on people and institutions that aren’t using technology to guide their decisions or actions, an understanding of how data or certain technologies are used isn’t a necessity.

I suppose a music critic needn’t understand much about databases, for example, but reporters covering government, business, college or professional sports, to name a few, should be able to assess their subjects the way that people inside those sectors do. And increasingly, that means understanding the use of data. Many local governments base their police staffing - who covers where - on a non-stop flow of crime data. Sports teams pore over tape, logging their opponents’ tendencies in preparation for upcoming games. Businesses are all about the numbers, too.

And then there’s politics. Winning elections these days is very often about putting together enough voters to crack 50%. There’s microtargeting based on consumer data and door-to-door canvassing so that volunteers can input demographic data into centralized servers. They’re not doing that just for fun - it’s valuable information. But if journalists can’t really grasp how organizations are using data, we’re liable to miss the effects, and thus miss some fuller explanations of events. Yes, we can rely on people to tell us what’s happening - and we should - but if data plays a big part in the life of an organization, the reporter covering it should have some basis to evaluate that role.

So how does that relate to Dave’s story about the quads? Well, after reading it, I noticed that there were some subtle bits of detail that I never would have thought to include or been able to describe as well - about how the NICU operates, the details of the births. That’s because Dave has been there with his twin boys. A parent of a child born without complications or a single person would have been hard-pressed to write as good a story. I sure wouldn’t have been able to do so.

It’s the same idea when it comes to understanding the basis for decisions that come from, at least in part, the collection and consumption of data. It can mean the difference between telling a story and telling a better story. I’m sure plenty of organizations that we cover would be happy to have reporters who are in the dark about these things. But that doesn’t help our readers any.

So, technology and data as a tool? Yes. But when the tools become a crucial part of the world we cover, understanding how they work and being able to use them makes us better journalists.

DjangoCon

July 20th, 2008  |  Published in django | Comments (0)

The first-ever DjangoCon will be held Sept. 6-7 at the Googleplex in Mountain View, Calif. The preliminary program looks incredible, and I’m sad to be missing it. I’ve traveled plenty this summer, and another West Coast trip, especially over a weekend, is a bit too much (there’s also the nagging point that I’d have to pay for it myself!). Matt Waite will be there, on a panel discussing Django in journalism, just one of the really strong sessions. If you’re a West Coast CAR person dabbling in frameworks, it’s worth checking out.

But I’m trying to do my part, beginning with a donation to the Django Software Foundation. Doing so will help pay for conferences like DjangoCon, sprints and other activities that help improve the framework, and it’s such a small thing to do considering the benefits I’ve realized from using Django. If you feel the same way, please think about supporting Django.

Caspio’s Lessons

June 29th, 2008  |  Published in Car Tools, Journalism | Comments (6)

It’s been a while since I wrote about Caspio, and since then they’ve only gained more media clients, which I suppose could be a lesson for me. But I think not. Rather, I hope what we’ll see in the months and years to come are the lessons that Matt Wynn offers from his experiences using Caspio. Here’s your nutgraf: “My conclusion on Caspio is that they do one thing very well. But other, cheaper alternatives do it just as well. Further, to learn to make it do otherwise seems pointless, especially seeing as we would be paying for the luxury of learning to hack it.” (The emphasis is mine.)

Caspio’s David Milliron spoke at this year’s Special Libraries Association conference on a panel organized by SLA’s News Division, which includes many newspaper and broadcast librarians. It’s easy to see why: a lot of these folks are being asked to do new things, to be more involved with their organization’s Web sites, and to do it with fewer people. Seems like a pretty good opportunity for Caspio, and I don’t fault them for recognizing that. The problem I have is that the promise of Caspio is in the short term; no matter how many features they add (my personal favorite being the Data Sheet Find and Replace one: “You no longer need to export your table outside of Caspio Bridge for this type data modification.”), you’ll never get the flexibility and control over your apps that you do when you build your own stuff. Despite what Milliron says, there are very real and serious differences between Caspio and Web application frameworks.

Maybe that’s the real lesson that journalism folks need to heed: that the costs of learning Caspio go beyond the monthly fees and the potential cost of switching to another tool (having to re-do your existing apps). Caspio is, as Matt says, good at doing some pretty basic stuff when it comes to putting data online. But if you want to go beyond Ye Olde Data Ghettoe, you’ll have to learn some programming anyway. So why learn something that can only be used on a closed system that you have to rent? Matt’s alternative happens to be PHP/MySQL based, but he’s not going to be paying for using either of those. And if suddenly MySQL decides to charge corporate users or something equally far-fetched, he can switch to Postgres or SQLite without starting from scratch.
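
That portability mostly comes from writing your queries against a standard interface instead of a vendor's point-and-click builder. Matt's app is PHP, but here's the same idea sketched in Python using the standard DB-API (PEP 249): the search function only ever sees a connection object, so moving between MySQL, Postgres and SQLite mostly means swapping the driver and the connection. The `employees` table and its rows are hypothetical. One real wrinkle: `sqlite3` uses `?` placeholders, while MySQL and Postgres drivers such as MySQLdb and psycopg2 use `%s`.

```python
import sqlite3

def search_by_last_name(conn, prefix):
    """Basic search-and-display query written against the Python DB-API.

    Only the placeholder style ("?" here, for sqlite3) is driver-specific.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT last_name, first_name, salary FROM employees "
        "WHERE last_name LIKE ? ORDER BY salary DESC",
        (prefix + "%",),
    )
    return cur.fetchall()

# Hypothetical table with made-up rows, held in memory for the demo.
# conn.execute() is a sqlite3 convenience shortcut, not part of the DB-API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (last_name TEXT, first_name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Smith", "Ann", 52000), ("Smithers", "Bob", 61000), ("Jones", "Cal", 48000)],
)
results = search_by_last_name(conn, "Smith")
for row in results:
    print(row)
```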

I realize many, many folks in newsrooms can say, “Um, pardon me, but we don’t have a Matt Wynn.” Or maybe you do, but he’s insanely busy all the time. That’s a very common situation. But the real long-term question is this: if your organization is never going to want to do anything more than put up isolated search pages serving up content that no search engine can reliably find, you’re still gonna pay every month for that privilege by using Caspio. And if you hope and plan on doing more someday, even if that’s not today, then you’ll have almost nothing to transfer to that effort by using Caspio, since one of their chief claims is that you don’t have to learn any programming to use it.

So if learning more is a part of your plan, why not spend the time learning a system that doesn’t charge you for that time? By adding Caspio experience to your resume, what real skills have you gained aside from the ability to point and click?

Previously

Jun 19, 2008
The Future of News Libraries

by Derek | 2 Comments

At the recently completed SLA conference in Seattle, Nora Paul led a session on the “future of news libraries” that asked the attendees to imagine 2012, when librarians (or news researchers, or whatever you want to call them) are recognized as leaders of innovation in newsrooms, and then to explain how that came to pass. It [...]


Jun 18, 2008
SLA Wrap-Up

by Derek | 1 Comment

This year’s Special Libraries Association conference was, as usual, a great experience. Lots of good sessions from the News Division and other divisions. Some of my highlights, in dump mode:

A session on controlled vocabularies in art museums featuring folks from the Getty in Los Angeles. Turns out they have several datasets that might be interesting [...]

About The Scoop

Derek Willis’ weblog on investigative and computer-assisted reporting.

Contributors

  • Derek
  • Matt

©2008 The Scoop