Voter File 2.0: Catalist, Democratic Tool
By Micah L. Sifry, 05/09/2008 - 4:11pm

I'm in a breakout session at the New Democratic Network's daylong conference on "New Tools, New Audiences," listening to Vijay Ravindran, the CTO of Catalist, talk about web 2.0 and Catalist's development of an "Enhanced Voter File." As usual, these are my rushed notes, and at best a good paraphrase of what was said, not direct quotation.

The traditional voter file, which is collected by state bodies, is just name, contact info, party registration and past voting behavior.

The enhanced voter file, something that Democrats, Republicans and sometimes other organizations build and maintain, contains commercial data, census data, historical information about your behavior, and specialized data (like lifestyle choices). (Vijay notes, later in the Q&A, that this kind of data is often a weak indicator of people's actual political preferences, and hypothesizes that someday under an OpenID framework, campaigns or organizations like his might be more interested in highly accurate information that individuals volunteer about themselves.)

Enhanced voter files are used for canvassing, and also for modeling campaigns.

Catalist is building on the lessons of 2004 (when Democrats had a database meltdown) and working to build a 50-state national database:

Catalist's voter file has the names of 180 million registered voters, plus 75 million unregistered people (for use by voter registration groups), enhanced with commercial data and specialty data (like who holds hunting licenses). It is integrated with the Democrats' VAN application and comes with a tool for subscribers to mine the data.

Catalist's goal is to be a permanent piece of progressive infrastructure. Vijay talks about Tim O'Reilly's "What is Web 2.0" paper as his "baseline in driving Catalist." So he goes through some of O'Reilly's key points about the development of web 2.0.

1. The web as a platform: Examples are Amazon, Facebook, Google. For Catalist, this means: if our data is inaccessible, it doesn't exist. (This is another reference to the Democratic data disaster of 2004.) A database administrator is a poor excuse for an interface where people can self-administer. From the very beginning, having a back end web interface was essential. We also created a web services API for progressive organizations with technical staff.

2. Harnessing collective intelligence. Examples are Wikipedia, Amazon, Flickr. For Catalist, this means storing, organizing and utilizing in perpetuity the collective personal data of its customers. It means removing the technical limitations around cooperation, building value-added meta-data that no one else can, and relying on its customers to make its data more correct. (This is sort of the open source model for bug fixing, but Catalist isn't an open product. But inside its ecosystem, it sounds like it's applying the same logic.)

3. Data is the next "Intel inside." For Catalist, this means instant access to information about nearly everyone over 18 in the US in a single format, easy upload of proprietary data for integrated data mining, giving back unique identifiers for each data point, and the creation of a proprietary matching system for the data. What this means is that you can combine field canvass lists, fundraising, membership, polling info and online engagement to get a 360-degree view of people and figure out more ways to engage them (or ask them for money, he notes, if you haven't already).

4. End of the software release cycle. Examples are eBay, Netflix, who make their fixes in real time online. The same can be true for politics. Data can be updated in critical election months; no more stale data. And new features and bug fixes need to be deployed rapidly. (No more 2004 horror shows for the Democratic side, in essence. He draws a parallel to Christmas season at Amazon.)

5. Lightweight Programming Models. Examples are Amazon Web Services, YouTube's embed feature. Catalist's approach is to not try to do everything. Their web service is designed to let others' creativity take advantage of it, like MoveOn's "VotePoke" application. Their formats are usable by microtargeters, and other groups can syndicate their data (like America Votes).

6. Software Above the Level of the Single Device. They're making application configurations for field, analytics, fundraising, strategy, pollsters, etc. Vijay mentions the need to make these more Blackberry friendly, given how many political staffers have them.

7. Rich user experiences. Catalist's Q tool has a professional UI design, maps, drag and drop crosstabs, inline updated counts and access control for organizations.
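To make point 3 a little more concrete, here is a minimal sketch of what "upload your list, get unique identifiers back, then combine everything" could look like. Catalist's matching system is proprietary and was not described in the talk, so this swaps in a deliberately naive exact match on a normalized name plus ZIP code. The master file, the person IDs, the field names and every function name below are hypothetical, made up purely for illustration.

```python
from collections import defaultdict


def normalize(name: str, zip_code: str) -> tuple[str, str]:
    """Crude match key: lowercased letters-only name plus 5-digit ZIP."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha() or ch == " ")
    return (" ".join(cleaned.split()), zip_code.strip()[:5])


# Hypothetical master voter file: person_id -> record.
MASTER = {
    "P001": {"name": "Ann Lee", "zip": "48104"},
    "P002": {"name": "Bob Ortiz", "zip": "48201"},
}


def build_index(master):
    """Map each normalized (name, zip) key to its person_id."""
    return {normalize(r["name"], r["zip"]): pid for pid, r in master.items()}


def match_list(index, rows, source):
    """Split an uploaded list into matched rows (tagged with person_id
    and source) and rows that found no match in the master file."""
    matched, unmatched = [], []
    for row in rows:
        pid = index.get(normalize(row["name"], row["zip"]))
        if pid is None:
            unmatched.append(row)
        else:
            matched.append({**row, "person_id": pid, "source": source})
    return matched, unmatched


def combined_view(matched_rows):
    """Group matched rows by person_id to see which lists each person is on."""
    view = defaultdict(set)
    for row in matched_rows:
        view[row["person_id"]].add(row["source"])
    return dict(view)


# Two hypothetical client lists with messy formatting.
donors = [{"name": "ann  LEE", "zip": "48104-1234", "amount": 50}]
volunteers = [
    {"name": "Ann Lee", "zip": "48104"},
    {"name": "Cy Young", "zip": "10001"},
]

index = build_index(MASTER)
donor_hits, donor_misses = match_list(index, donors, "donor")
vol_hits, vol_misses = match_list(index, volunteers, "volunteer")
view = combined_view(donor_hits + vol_hits)
print(view)
```

Once both lists carry the same person ID, seeing that one person is both a donor and a volunteer is a simple grouping step. A real matching system would presumably have to cope with nicknames, moves and typos, which this sketch ignores.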

Where is this going? With more data and more collaboration, and web services, more innovative applications will get built both by Catalist and others. He sees the semantic web coming, intelligent crowd sourcing, integrated web mining...and ultimately more progressive power.

Question time. I ask about their business model. Subscriptions to Catalist are $25K to $400K per year. Several hundred organizations are clients. About 40 people are on staff. The database is 15 terabytes in size. He analogizes it to an electric company, where no one org would ever have the wherewithal to build one, but it is essential infrastructure.

I also ask: "How do they ensure that they're only selling their services to progressives?" He says they haven't hit the interesting question yet of "what if Joe Lieberman wants our services?" It would be up to the board. AARP would be considered progressive, he thinks.

In terms of how information is shared internally: A lot of nonprofits that use Catalist's data, such as Women's Voices. Women Vote, one of their clients, release their own results back for others to use. By and large, donor data and membership info tend to be kept private by clients.

A very impressive presentation. Ravindran, who left Amazon to lead Catalist's technology team, has clearly brought the wisdom of Silicon Valley to the political infrastructure business. He's definitely someone to watch.

interesting

I just twittered this -

"@Mlsif These updates are like gold to me. Thanks! Catalist isn't the game changer though: VAN+huge distributed relational field ops is"

And Micah asked me to come here and expand, so here goes as best I can with the four brain cells I've got left on a nice Friday afternoon...

What Catalist is doing is helpful, but it's really a top-down offering that's off to the side of the really game-changing thing that's happening, which is that we're at the beginning of this transition to a much more participatory version of our democracy.

Person to person conversations are generally going to be a lot more valuable than knowing what magazines someone subscribes to or what kind of car they drive. It's a better investment to put decent data in the hands of lots and lots of activists, along with a tool they can use to improve it, than it is to try to collect as much expensive commercial data as possible and building targeting models off that.

Ideally, you'd have both, and in some states progressives have done just that. (Credit Granholm's win in MI in '06 to a great modeling and cluster analysis program delivered through the VAN.) But if you're really interested in long-term transformation, it's the person to person stuff that is going to reshape how our democracy functions. The tools you can use to augment that process are what's really interesting.

Put another way: data is a commodity, but relationships are electoral manna. I haven't seen anything in Catalist's long term vision that makes it clear to me that they get that. (maybe they'd be willing to come here and tell me how wrong I am, I'd be happy to be corrected!)

I don't know why the folks who are working on that problem are never invited to these things - get someone from VAN, or someone from one of the America Votes statewide organizing tables. Maybe it's less glitzy, or something?

interesting points

First, thanks Micah for the play by play.

And thanks Dan for getting your thoughts out there!

> Person to person conversations are generally going to be a lot more valuable than knowing what
> magazines someone subscribes to or what kind of car they drive.

I think you want both. You'd like to know more about who you are talking to, so those person to person conversations are more effective, and more importantly the results of the conversation really need to be captured for future uses -- whether it be to help with vote goals, recruiting new volunteers, etc. You seem to see this as either/or, and I see it as enable and prosper.

> It's a better investment to put decent data in the hands of lots and lots of activists,
> along with a tool they can use to improve it,
> than it is to try to collect as much expensive commercial data as possible and building targeting
> models off that.

Maybe, but I would argue that we should be able to have both. This actually speaks to the point I made about leveraging Web Services, at a technical level, to allow creativity and innovation to occur with the data beyond what one company can foster, and about how building a business model that allows for wide dispersal of data is beneficial to everyone. Ultimately, the organizations that use Catalist will make program decisions about whether to use activists, direct mail, TV, etc. If anything, being able to quantifiably capture the effect of programs will be the best way to get decision makers to let go of outdated techniques (whatever those are), and data is the critical basis for that.

We see our long-term vision at a lower infrastructural level than the application-level question of whether one method of electoral contact is superior to another. Whatever is used, there is a clear need for progressive organizations to have: 1) a low cost source to fulfill their data needs, 2) an easy to use data warehouse to match, store, organize, and integrate their person level data, and 3) a platform that allows for many flexible ways to access their data in conjunction with other basic and sometimes not-so-basic information.

Given the cost and complexity for progressive organizations to shoulder this on their own or through a multitude of vendors, I think this is a big advance. I will let others decide if that constitutes game changing or not. But I do know that when I see MoveOn do an application like VotePoke, which would have been prohibitively complex for the ROI without Catalist and its Web Services interface, I think we're on the right track. When I see organizations for the first time be able to know which of their donors are also volunteers and also attended an event, I see real progress. And when I see organizations for the first time have a unique person identifier, provided by Catalist, that they can reliably carry across internal systems, I get really excited about where this can go. Some of these concepts are not very marketable, but they are super important in the long term.

Please do check out the NDN video of my presentation, and maybe we can get into a deeper discussion sometime soon.

PS. I totally agree that America Votes and VAN deserve more pub -- they're both great organizations and valued partners of ours.



© 2008 Personal Democracy Forum | All Rights Reserved