Today was the first day I had a chance to use Datameer 2.0 on some actual data. I decided to use some of the FEC campaign contribution data to see how Datameer handles it.
The Federal Election Commission has made campaign contribution data publicly available for download here. The FEC has provided campaign finance maps on its home page. The Sunlight Foundation has created the Influence Explorer to provide similar analysis. For the most recent dataset, there are 5,257 candidates, 12,588 committees and 1,505,580 individual campaign contributions.
The data I am working with consists of a list of candidates, the committees supporting those candidates, and individual campaign contributions. Each of these files can be downloaded from the FEC website mentioned above. The FEC is changing the layout of the files from fixed-length fields to pipe-delimited. You can read more about that here. Since the candidate and committee files weren’t yet converted, I loaded them into Oracle and wrote a quick PHP script to output them in delimited format.
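For readers who want to skip the Oracle step, the fixed-length to pipe-delimited conversion can be sketched in a few lines. This is not the PHP script I used, just a minimal Python illustration. The field names and widths in LAYOUT and the sample record are made up for the example and are not the FEC's actual record layout, so check the FEC's published layout before using anything like this.

```python
# Hypothetical layout: (field name, width in characters).
# These names and widths are invented for illustration and do NOT
# match the FEC's real record layout.
LAYOUT = [("cand_id", 9), ("cand_name", 20), ("party", 3)]


def header(layout=LAYOUT, delimiter="|"):
    """Build a header row from the field names in the layout."""
    return delimiter.join(name for name, _width in layout)


def fixed_to_delimited(line, layout=LAYOUT, delimiter="|"):
    """Slice one fixed-width record into fields and join them with a delimiter."""
    fields = []
    pos = 0
    for _name, width in layout:
        # Take the next `width` characters and trim the padding.
        fields.append(line[pos:pos + width].strip())
        pos += width
    return delimiter.join(fields)


# A made-up record padded to the hypothetical widths above.
sample = "H0XX00001" + "DOE, JANE".ljust(20) + "IND"
converted = fixed_to_delimited(sample)

print(header())
print(converted)
```

Writing the header row first is handy, because the data link later asks whether the first row contains column names.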
One of the pluses of Datameer 2.0 is that it can access data in a variety of formats and locations. To test these capabilities, I uploaded the three data files to my Amazon S3 bucket. Connecting to S3 was a simple process. I selected S3 as the data store type and then filled in the connection details shown below.
After saving, the data store appears in the Datameer Browser tab. I simply called mine DataStore.
Once the DataStore was set up, I created a new data link to each of the three files that I had stored in S3. The data link steps the user through selecting the data store and the file type (log file, delimited file, fixed width, Twitter data, and XML are some of the file types). It then asks for the file name and whether the file has column header information in the first row.
One of the great features of Datameer 2.0 is that it makes smart choices about field names and data types based on the data itself. The user can rename columns and modify data types before the data is loaded. The interface is really easy to use.
Once all three data links were set up, they appeared in the Browser tab.
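I don't know how Datameer actually infers types, but the general idea of guessing a column's type from sample values can be sketched like this. The function name infer_type, the four type labels, and the MM/DD/YYYY date format are all assumptions made for the example.

```python
from datetime import datetime


def infer_type(values):
    """Guess a column type from sample strings: integer, float, date, or string."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        # Nothing to go on, so fall back to the most permissive type.
        return "string"

    def all_parse(parser):
        """Return True if every sample value is accepted by the parser."""
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    # Try the strictest interpretation first and relax from there.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "float"
    # Assumed date format for this sketch: MM/DD/YYYY.
    if all_parse(lambda v: datetime.strptime(v, "%m/%d/%Y")):
        return "date"
    return "string"


# Made-up sample columns, not real FEC records.
columns = {
    "amount": ["250", "1000", "500"],
    "rate": ["1.5", "2"],
    "date": ["01/15/2012", "02/01/2012"],
    "name": ["DOE, JANE", "SMITH, JOHN"],
}
inferred = {name: infer_type(vals) for name, vals in columns.items()}
print(inferred)
```

A guess like this can be wrong. A ZIP code column, for instance, would come back as an integer, which is why being able to override the data type before loading matters.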
Using the familiar spreadsheet interface, Datameer 2.0 lets the user do powerful analysis of the data. Even complex nested joins across a large number of datasets can be performed using an interactive dialog. You can mix and match analytics and data transformations in an unlimited number of data processing pipelines, leaving the raw data untouched. The image below is a sample of the candidates file.
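To make "nested join" concrete for this dataset, here is a rough plain-Python sketch of joining contributions to committees and committees to candidates. It has nothing to do with how Datameer runs joins internally. The column names (cand_id, cmte_id, amount) and all of the rows are hypothetical values chosen for the example.

```python
from collections import defaultdict

# Made-up rows with hypothetical column names, not real FEC records.
candidates = [
    {"cand_id": "C1", "cand_name": "DOE, JANE"},
    {"cand_id": "C2", "cand_name": "SMITH, JOHN"},
]
committees = [
    {"cmte_id": "M1", "cand_id": "C1"},
    {"cmte_id": "M2", "cand_id": "C1"},
    {"cmte_id": "M3", "cand_id": "C2"},
]
contributions = [
    {"cmte_id": "M1", "amount": 250},
    {"cmte_id": "M2", "amount": 500},
    {"cmte_id": "M3", "amount": 100},
    {"cmte_id": "M3", "amount": 50},
]


def total_by_candidate(candidates, committees, contributions):
    """Join contributions -> committees -> candidates and sum amounts per candidate."""
    # Lookup tables play the role of the join keys.
    cmte_to_cand = {c["cmte_id"]: c["cand_id"] for c in committees}
    names = {c["cand_id"]: c["cand_name"] for c in candidates}

    totals = defaultdict(int)
    for row in contributions:
        cand_id = cmte_to_cand.get(row["cmte_id"])
        # Inner-join behaviour: skip contributions with no matching candidate.
        if cand_id in names:
            totals[names[cand_id]] += row["amount"]
    return dict(totals)


totals = total_by_candidate(candidates, committees, contributions)
print(totals)
```

The input lists are only read, never modified, which mirrors the idea of building results on top of raw data that stays untouched.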
In the next post, we’ll look at doing some of the joins between the data and begin looking at some of the built-in graphical presentation capabilities of Datameer 2.0.