Update
This is a follow-up to the previous post.
Note: The data and Java code are available on GitHub.
The Data
I decided to use data from six Congresses (sessions 107-112). Since there is significant turnover in Congress (elections, sex scandals, resignations, and deaths, among other reasons), I had to expand the number of congressional members to 928. In addition, across those six sessions the bills now cover 5,870 unique topics or subjects (e.g. Arts, Boy Scouts, Clergy, Welfare, and Whaling, to mention a few).
Using the BatchInserter framework from Max De Marzi and Michael Hunger, the data can be loaded into Neo4j in less than 5 minutes. The resulting graph has 340,619 nodes, 846,697 properties, and 2,514,565 relationships.
Data Exploration
We’ll start simple with Cypher queries and build from there. Let’s see how many bills in each session discuss Whaling.
START n=node:subjects('subject:Whaling') MATCH n-[r:REFLECTS]-c RETURN c.Session, count(c.Session) ORDER BY c.Session DESC
Our output:
"110" 5
"109" 7
"108" 11
"107" 13
To see who is sponsoring or cosponsoring the most bills related to “whaling”, we run the following query:
START n=node:subjects('subject:Whaling') MATCH n-[r:REFLECTS]-c-[s:COSPONSORS|SPONSORS]-z RETURN z.firstname, z.lastname, count(z) As popCoSponsors ORDER BY popCoSponsors DESC LIMIT 15
Our output is:
z.firstname z.lastname popCoSponsors
"Joseph" "Lieberman" 9
"John" "Kerry" 8
"Susan" "Collins" 8
"Edward" "Kennedy" 8
"Barbara" "Boxer" 8
"Ron" "Wyden" 7
"Daniel" "Akaka" 7
"Christopher" "Dodd" 6
"Olympia" "Snowe" 6
"John" "Reed" 6
"Carl" "Levin" 6
"Russell" "Feingold" 6
"Wayne" "Gilchrest" 6
"John" "McCain" 6
"Joseph" "Biden" 5
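To make the aggregation in that query concrete, here is a minimal Java sketch of the same idea: count one hit per sponsorship edge, group by member, sort by count descending, and keep the top N. It runs on a plain in-memory list rather than on Neo4j, and the class name `CosponsorCount`, the `topSponsors` method, and the edge layout are all assumptions made for illustration; they are not taken from the code on GitHub. The tie-break on member name is also my addition, since the Cypher query leaves the order of ties unspecified.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical stand-in for the Cypher aggregation; not the post's actual code.
class CosponsorCount {
    // Each edge is {billId, memberName}: one SPONSORS or COSPONSORS relationship
    // on a bill that reflects the chosen subject.
    static List<Map.Entry<String, Long>> topSponsors(List<String[]> edges, int limit) {
        // Equivalent of count(z): number of edges per member.
        Map<String, Long> counts = edges.stream()
            .collect(Collectors.groupingBy(e -> e[1], Collectors.counting()));

        // Equivalent of ORDER BY popCoSponsors DESC LIMIT n.
        // Ties are broken by name here only to make the result deterministic.
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(limit)
            .collect(Collectors.toList());
    }
}
```

Calling `CosponsorCount.topSponsors(edges, 15)` on a list of edges exported from the graph would give the same ranking shape as the table above, though the database does this work for us directly in the query.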
If I want to see who usually co-sponsors Whaling bills sponsored by Olympia Snowe, I would run this query:
START n=node:subjects('subject:Whaling'), z=node:congress('lastName:Snowe') MATCH n-[r:REFLECTS]-c-[s:COSPONSORS]-x WHERE c-[:SPONSORS]-z RETURN x.firstname, x.lastname, count(x) As popCoSponsors ORDER BY popCoSponsors DESC LIMIT 10
The output for this is:
x.firstname x.lastname popCoSponsors
"Ron" "Wyden" 4
"Christopher" "Dodd" 4
"John" "Reed" 4
"Daniel" "Akaka" 4
"Susan" "Collins" 4
"Joseph" "Biden" 4
"Edward" "Kennedy" 4
"Barbara" "Boxer" 4
"John" "McCain" 4
"Joseph" "Lieberman" 4
Paths between Congressmen
To find the shortest path between two congressmen, we can run this query:
START ryanvp = node(6233), obama = node(6323) MATCH p=shortestPath(ryanvp-[*..10]-obama) return p;
This returns the following path:
Obama -> Bill SR 97 110 Session -> Topic of Congressional Tributes -> HR716 112 Session -> Paul Ryan
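For readers curious what a shortest-path search does conceptually, the sketch below runs a plain breadth-first search over a tiny in-memory graph in Java. It is not how Neo4j implements `shortestPath` internally, and it is not code from the GitHub repository; the class `PathSketch`, its methods, and the "Hypothetical" detour nodes are assumptions for illustration. The `maxHops` argument plays the role of the `*..10` bound in the query.

```java
import java.util.*;

// Hypothetical in-memory stand-in for the graph; illustration only.
class PathSketch {
    // Undirected adjacency list keyed by node name.
    private final Map<String, List<String>> adj = new HashMap<>();

    void link(String a, String b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Breadth-first search: nodes are visited in order of hop count, so the
    // first time we dequeue `to` we have a path with the fewest relationships.
    // Returns an empty list if `to` is not reachable within maxHops.
    List<String> shortestPath(String from, String to, int maxHops) {
        Map<String, String> parent = new HashMap<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(from, null);
        depth.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(to)) {
                // Walk the parent links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String n = cur; n != null; n = parent.get(n)) path.addFirst(n);
                return path;
            }
            if (depth.get(cur) >= maxHops) continue; // do not expand past the bound
            for (String next : adj.getOrDefault(cur, Collections.emptyList())) {
                if (!depth.containsKey(next)) {
                    parent.put(next, cur);
                    depth.put(next, depth.get(cur) + 1);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    // Sample graph: the path reported above plus a made-up longer detour.
    static PathSketch sample() {
        PathSketch g = new PathSketch();
        g.link("Obama", "SR 97 (110th)");
        g.link("SR 97 (110th)", "Congressional Tributes");
        g.link("Congressional Tributes", "HR 716 (112th)");
        g.link("HR 716 (112th)", "Paul Ryan");
        // Hypothetical detour, not real data.
        g.link("Obama", "Hypothetical Bill X");
        g.link("Hypothetical Bill X", "Hypothetical Member Y");
        g.link("Hypothetical Member Y", "Hypothetical Bill Z");
        g.link("Hypothetical Bill Z", "Hypothetical Topic W");
        g.link("Hypothetical Topic W", "HR 716 (112th)");
        return g;
    }
}
```

With the sample graph, `PathSketch.sample().shortestPath("Obama", "Paul Ryan", 10)` picks the four-hop route through the tributes topic over the six-hop detour, which mirrors the member, bill, topic, bill, member shape of the path returned by the query.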