The Centers for Medicare and Medicaid Services made a huge announcement on Wednesday. By mid-next week (April 9th), they will be releasing a massive database on the payments made to 880,000 health care professionals serving seniors and the disabled in the Medicare program, officials said this afternoon. The data will cover doctors and other practitioners […]
Author: dave fauth
Analyzing HHS Data with Mortar
I’m starting a new series on analyzing publicly available large data sets using Mortar. In the series, I will walk through the steps of obtaining the data sets, writing some Pig scripts to join and massage the data sets, adding in some UDFs that perform statistical functions on the data and then plotting those results […]
Hadoop, Impala and Neo4J
Back in December, I wrote about some ways of moving data from Hadoop into Neo4J using Pig, Py2Neo and Neo4J. Overall, it was successful, although perhaps not at the scale I would have liked. So this is really attempt number two at using Hadoop technology to populate a Neo4J instance. In this post, I’ll use […]
Recently Released Public Data Sets
ProPublica announced the ProPublica Data Store, which makes available the data sets ProPublica has used to power its analysis. The data sets are grouped into three buckets: Premium, FOIA and external. Premium data sets have been cleaned up, categorized and enhanced with data from other sources. FOIA data is raw data from […]
Extracting Insight from FBO Data Using Mortar and Elasticsearch – Part 4
Earlier this year, the Sunlight Foundation filed a lawsuit under the Freedom of Information Act. The lawsuit requested solicitation and award notices from FBO.gov. In November, Sunlight received over a decade’s worth of information and posted it online for public download. I want to say a big thanks to Ginger McCall and Kaitlin Devine for […]
Extracting Insights from FBO.Gov data – Part 3
Earlier this year, the Sunlight Foundation filed a lawsuit under the Freedom of Information Act. The lawsuit requested solicitation and award notices from FBO.gov. In November, Sunlight received over a decade’s worth of information and posted it online for public download. I want to say a big thanks to Ginger McCall and Kaitlin Devine for […]
Extracting Insights from FBO.Gov data – Part 2
Earlier this year, the Sunlight Foundation filed a lawsuit under the Freedom of Information Act. The lawsuit requested solicitation and award notices from FBO.gov. In November, Sunlight received over a decade’s worth of information and posted it online for public download. I want to say a big thanks to Ginger McCall and Kaitlin Devine for […]
Extracting Insights from FBO.Gov data – Part 1
Earlier this year, the Sunlight Foundation filed a lawsuit under the Freedom of Information Act. The lawsuit requested solicitation and award notices from FBO.gov. In November, Sunlight received over a decade’s worth of information and posted it online for public download. I want to say a big […]
Hadoop to Neo4J
Leading up to GraphConnect NY, I was distracting myself from working on my talk by trying to determine whether there was any way to import data directly from Hadoop into a graph database, specifically Neo4j. Previously, I had written some Pig jobs to output the data into various files and then used the Neo4J batch inserter to load […]
Creating an Elasticsearch index of Congress Bills using Pig
Recently, Mortar worked to get Pig support for CPython committed to the Apache Pig trunk. This now allows users to take advantage of Hadoop with real Python: users can focus on just the logic they need, and streaming Python takes care of all the plumbing. Shortly thereafter, Elasticsearch announced integration with Hadoop. “Using Elasticsearch […]