I love coffee. I love finding new coffee shops and trying out great coffee roasters. Some of my favorites are Great Lakes Coffee, Stumptown Roasters in NYC’s Ace Hotel, Red Rooster (my choice for home delivery) and my go-to local coffee shop, The Grounds. The Grounds is owned by friends and you have to love […]
Category: Uncategorized
Neo4j – Uber H3 – Geospatial
We are going to take a slight detour with regards to the healthcare blog series and talk about Uber H3. H3 is a hexagonal hierarchical geospatial indexing system. It comes with an API for indexing coordinates into a global grid. The grid is fully global and you can choose your resolution. The advantages and disadvantages […]
Using Amazon CloudWatch to monitor Neo4j Logs
This post describes how to set up Amazon CloudWatch. Amazon CloudWatch Logs allows you to monitor, store, and access your Neo4j log files from Amazon EC2 instances, AWS CloudTrail, or other sources. You can then retrieve the associated log data from CloudWatch Logs using the Amazon CloudWatch console, the CloudWatch Logs commands in the AWS […]
Accessing Hive/Impala from Neo4j
Quite frequently, I get asked about how you could import data from Hadoop and bring it into Neo4j. More often than not, the request is about importing from Impala or Hive. Today, Neo4j isn’t able to directly access an HDFS file system so we have to use a different approach. For this example, we will […]
Neo4j – New Neo4j-import
Neo4j has recently announced the 2.2 Milestone 2 release. Among the exciting features is the improved and fully integrated “Superfast Batch Loader”. This utility (unsurprisingly) called neo4j-import, now supports large scale non-transactional initial loads (of 10M to 10B+ elements) with sustained throughputs around 1M records (node or relationship or property) per second. Neo4j-import is available […]
Reading and Writing Parquet files with Mortar
Using Mortar to read/write Parquet files You can load Parquet formatted files into Mortar, or tell Mortar to output data in Parquet format using the Hadoop connectors that can be built from here or downloaded from here. Last year, Cloudera, in collaboration with Twitter and others, released a new Apache Hadoop-friendly, binary, columnar file format […]
Medicare Payment Data set for release
The Centers for Medicare and Medicaid Services made a huge announcement on Wednesday. By mid-next week (April 9th), they will be releasing a massive database on the payments made to 880,000 health care professionals serving seniors and the disabled in the Medicare program, officials said this afternoon. The data will cover doctors and other practitioners […]
Analyzing HHS Data with Mortar
I’m starting a new series on analyzing publicly available large data sets using Mortar. In the series, I will walk through the steps of obtaining the data sets, writing some Pig scripts to join and massage the data sets, adding in some UDFs that perform statistical functions on the data and then plotting those results […]
Hadoop, Impala and Neo4J
Back in December, I wrote about some ways of moving data from Hadoop into Neo4J using Pig, Py2Neo and Neo4J. Overall, it was successful although maybe not at the scale I would have liked. So this is really attempt number two at using Hadoop technology to populate a Neo4J instance. In this post, I’ll use […]
Recently Released Public Data Sets
ProPublica announced the ProPublica Data Store. ProPublica is making available the data sets that they have used to power their analysis. The data sets are bucketed into Premium, FOIA data and external data sets. Premium data sets have been cleaned up, categorized and enhanced with data from other sources. FOIA data is raw data from […]