Category: Uncategorized

Analyzing HHS Data with Mortar

Post author By dave fauth
Post date April 3, 2014
Categories In Uncategorized

I’m starting a new series on analyzing publicly available large data sets using Mortar. In the series, I will walk through the steps of obtaining the data sets, writing some Pig scripts to join and massage the data sets, adding in some UDFs that perform statistical functions on the data and then plotting those results […]

Hadoop, Impala and Neo4J

Post author By dave fauth
Post date March 12, 2014
Categories In Hadoop, Neo4J, opendata, Uncategorized

Back in December, I wrote about some ways of moving data from Hadoop into Neo4J using Pig, Py2Neo and Neo4J. Overall, it was successful, although maybe not at the scale I would have liked. So this is really attempt number two at using Hadoop technology to populate a Neo4J instance. In this post, I’ll use […]

Recently Released Public Data Sets

Post author By dave fauth
Post date March 10, 2014
Categories In Uncategorized

ProPublica announced the ProPublica Data Store. ProPublica is making available the data sets that they have used to power their analysis. The data sets are bucketed into Premium, FOIA data and external data sets. Premium data sets have been cleaned up, categorized and enhanced with data from other sources. FOIA data is raw data from […]

Extracting Insights from FBO.Gov data – Part 2

Post author By dave fauth
Post date January 5, 2014
Categories In Hadoop, Mortar, opendata, Uncategorized, Visualization

Earlier this year, the Sunlight Foundation filed a lawsuit under the Freedom of Information Act. The lawsuit requested solicitation and award notices from FBO.gov. In November, Sunlight received over a decade’s worth of information and posted it online for public download. I want to say a big thanks to Ginger McCall and Kaitlin Devine for […]

Hadoop to Neo4J

Leading up to GraphConnect NY, I was distracting myself from working on my talk by determining whether there was any way to import data directly from Hadoop into a graph database, specifically Neo4j. Previously, I had written some Pig jobs to output the data into various files and then used the Neo4J batchinserter to load […]

Creating an Elasticsearch index of Congress Bills using Pig

Post author By dave fauth
Post date October 24, 2013
Categories In Uncategorized

Recently, Mortar worked on CPython support for Pig and had it committed into the Apache Pig trunk. This now allows users to take advantage of Hadoop with real Python. Users get to focus just on the logic they need, and streaming Python takes care of all the plumbing. Shortly thereafter, Elasticsearch announced integration with Hadoop. “Using Elasticsearch […]

Health Insurance Marketplace Costs

Post author By dave fauth
Post date October 7, 2013
Categories In Uncategorized

Data.Healthcare.Gov released QHP cost information for various health care plans for states in the Federally-Facilitated and State-Partnership Marketplaces. The data is available in a variety of formats and lays out costs for various levels of health care plans (Gold, Silver, Bronze and Catastrophic) for different categories. Premium Information: Premium amounts do not include tax credits […]

Part 2 – Building an Enhanced DocGraph Dataset using Mortar (Hadoop) and Neo4J

Post author By dave fauth
Post date August 26, 2013
Categories In Uncategorized

In the last post, I talked about creating the enhanced DocGraph dataset using Mortar and Neo4J. Our data model looks like the following: Nodes: Organizations, Specialties, Providers, Locations, CountiesZip, Census. Relationships: * Organizations -[:PARENT_OF]- Providers -[:SPECIALTY]- Specialties * Providers -[:LOCATED_IN]- Locations * Providers -[:REFERRED]- Providers * Counties -[:INCOME_IN]- CountiesZip * Locations -[:LOCATED_IN]- Locations. Each of the […]

Building an Enhanced DocGraph Dataset using Mortar (Hadoop) and Neo4J

Post author By dave fauth
Post date August 19, 2013
Categories In Uncategorized

“The average doctor has likely never heard of Fred Trotter, but he has some provocative ideas about using physician data to change how healthcare gets delivered.” This was from a recent Gigaom article. You can read more details about DocGraph from Fred Trotter’s post. The basic data set is just three columns: two separate NPI […]

Recommender Tips, Mortar and DocGraph

Post author By dave fauth
Post date August 14, 2013
Categories In Uncategorized

Jonathan Packer wrote on Mortar’s blog about flexible recommender models. Jonathan highlights “from a business perspective the two most salient advantages of graph-based models: flexibility and simplicity.” Some of the salient points made in the article are: graph-based models are modular and transparent; a simple graph-based model will allow you to build a viable recommender system […]