Java SSL Certificate

This post is meant to remind me how to implement SSL certificates within Java. It was definitely a learning experience digging into truststores and keystores.

Installation of client certificates in a Java client environment

This section describes the steps required to install the provided certificates in a Java client environment. In general, you will create a new Java keystore and truststore using the files and password we have provided. Here are the steps to follow:

1. Make sure you have access to a Java 6 installation. You only need this for the keytool utility. The files you create with Java 6 are fully compatible with Java 5 but the keytool utility in Java 5 does not support importing PKCS #12 files.
2. Import the provided PKCS #12 file into a new keystore by issuing the following command (use the CLEAR Administrator-provided password for all password prompts):
keytool -importkeystore -v -srckeystore clientcert.p12 -srcstoretype PKCS12 -destkeystore newstore.ks
3. Next, create a truststore that includes the CA certificate (you can select your own password):
keytool -import -v -keystore newtrust.ks -file cacertfile.pem

4. Finally, use the following Java system properties when running your client to ensure that the proper certificate is selected during SSL negotiation (the password values are left blank here; fill in the ones you chose above):
-Djavax.net.ssl.keyStore=newstore.ks \
-Djavax.net.ssl.keyStorePassword= \
-Djavax.net.ssl.trustStore=newtrust.ks \
-Djavax.net.ssl.trustStorePassword=
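
Putting it together, a launch command might look like the following (the client class name and the angle-bracket password placeholders are mine; substitute your own values):

java -Djavax.net.ssl.keyStore=newstore.ks \
     -Djavax.net.ssl.keyStorePassword=<keystore password> \
     -Djavax.net.ssl.trustStore=newtrust.ks \
     -Djavax.net.ssl.trustStorePassword=<truststore password> \
     com.example.MyClient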

For keytool commands, I referred to this site: http://www.sslshopper.com/article-most-common-java-keytool-keystore-commands.html

A good site for troubleshooting is: http://confluence.atlassian.com/display/JIRA/Connecting+to+SSL+services

I ended up using the SSLPoke.java file from the Atlassian site to help troubleshoot the SSL connection. It really helped me understand connection issues.
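
If you don't want to pull down SSLPoke itself, a minimal handshake test along the same lines looks like this (a sketch, not Atlassian's exact source; the class name is mine):

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SslHandshakeTest {
    public static void main(String[] args) throws Exception {
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // Uses the default SSLSocketFactory, so run with
        // -Djavax.net.ssl.trustStore=newtrust.ks to exercise your truststore
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket(host, port);
        // Force the handshake now; an untrusted chain fails here with an SSLHandshakeException
        socket.startHandshake();
        System.out.println("Handshake succeeded with " + host + ":" + port);
        socket.close();
    }
}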

Sample code within Palantir

Within Palantir, I was able to use the following code to successfully connect to the SSL endpoint.

import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.security.KeyStore;
import java.security.SecureRandom;

import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;
import javax.xml.bind.DatatypeConverter;

public String fetchSecureUrl(String strURL) {
    StringBuffer sb = new StringBuffer();
    try {
        // Load the keystore that holds the client certificate
        KeyManagerFactory keyManagerFactory = KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
        InputStream keyInput = this.getClass().getResourceAsStream("/newstore.ks");
        keyStore.load(keyInput, "certificatepwd".toCharArray());
        keyInput.close();
        keyManagerFactory.init(keyStore, "certificatepwd".toCharArray());

        // Load the truststore that holds the CA certificate
        TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        InputStream trustInput = this.getClass().getResourceAsStream("/newtrust.ks");
        trustStore.load(trustInput, "certificatepwd".toCharArray());
        trustInput.close();
        trustManagerFactory.init(trustStore);

        // Build an SSLContext from the key and trust managers ("TLS" also works here)
        SSLContext sct = SSLContext.getInstance("SSL");
        sct.init(keyManagerFactory.getKeyManagers(), trustManagerFactory.getTrustManagers(), new SecureRandom());
        SSLContext.setDefault(sct);
        SSLSocketFactory sslsocketfactory = sct.getSocketFactory();

        // Basic authentication header, Base64-encoded via the public javax.xml.bind API
        String username = "username:password";
        String encoding = DatatypeConverter.printBase64Binary(username.getBytes());

        URL url = new URL(strURL);
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setRequestProperty("Authorization", "Basic " + encoding);
        conn.setRequestProperty("Content-Type", "application/xml");
        conn.setRequestMethod("GET");
        conn.setSSLSocketFactory(sslsocketfactory);

        // Read the response body line by line
        BufferedReader bufferedreader = new BufferedReader(new InputStreamReader(conn.getInputStream()));
        String line;
        while ((line = bufferedreader.readLine()) != null) {
            sb.append(line);
        }
        bufferedreader.close();
    } catch (Exception exception) {
        exception.printStackTrace();
    }
    return sb.toString();
}
 

As I mentioned earlier, this is mostly for my own use in future deployments. If someone else finds it useful, so much the better.

FEC Data – Further Analysis

In the previous post we showed how Federal Election Commission data could be loaded into Neo4J and manipulated using Gremlin. In this follow-up posting, we’ll modify the data structure and do some further analysis of the data.

The FEC Data Graph
The FEC data is represented in the following graph. Each committee supports a candidate. Some candidates may be independent of any committee. Individuals contribute one or more times to a committee. For this demonstration, we haven't separated out city/state/zip and created a common location.

A couple of notes on the data. Some of the committees did not have a treasurer, so I added a value of "No Treasurer". Some of the candidates referenced non-existent committees; in those cases, I created entries for the missing committees in order to load the data and create the links. Additionally, the individual contribution file uses overpunch characters for certain amounts and for negative amounts. Those values were adjusted in the database so the data could be loaded as integer values.

In this design, we see that a contributor (an individual making a contribution) can make several contributions over time. These contributions are given to a committee in support of a candidate. Additionally, we've added an extra data set that summarizes all contributions for which detailed donor reporting is not required because the donor has not given more than $200.

Just to give people an idea of the volume of contributions: when I downloaded the data in September, there were 437,726 contributions. When I downloaded the latest file on November 3, there were 598,306 contributions. That's roughly a 37 percent increase.

Let’s look at a candidate
We'll use the Gremlin language to perform some graph manipulation and analyze giving to particular candidates. For this case, I've decided to look at Mr. Herman Cain.

gremlin> v = g.idx(T.v)[[candName:'CAIN, HERMAN']] >> 1
==>v[24238]
gremlin> v.map()
==>candZip=30281
==>candState=GA
==>candId=P00003608
==>candStatus=N
==>type=Candidate
==>candParty=REP
==>candYear=12
==>candCity=STOCKBRIDGE
==>candDistrict=00
==>candName=CAIN, HERMAN

We'll use Gremlin pipes to walk the graph (for a great description of pipes, see this post). To check on the committee supporting Herman Cain, we run this command:

gremlin>  v.inE('supports').outV.name[0..20]
==>FRIENDS OF HERMAN CAIN INC
gremlin> v=g.idx(T.v)[[name:'FRIENDS OF HERMAN CAIN INC']] >> 1
==>v[8875]
gremlin> v.map()
==>zip=30281
==>treasurer=MARK J BLOCK
==>name=FRIENDS OF HERMAN CAIN INC
==>state=GA
==>party=REP
==>type=Committee
==>city=STOCKBRIDGE
==>committeeId=C00496067

In this instance, there is only a single committee, FRIENDS OF HERMAN CAIN INC, that is supporting Herman Cain.

Our pipe starts at the candidate vertex, follows the incoming 'supports' edges, and ends at the supporting committee vertex.

Campaign Contributions
Let’s take a look and see how many campaign contributions have been made to Herman Cain. The data is as of mid-September. I haven’t downloaded an updated data set.

gremlin> v.outE('receives').count()
==>1201

We see that FRIENDS OF HERMAN CAIN INC has received 1201 individual contributions. The average contribution is determined using the following Gremlin command:

gremlin> v.outE('receives').inV.amount.mean()
==>625.4771024146545

In the next analysis, we’ll use some filter steps to remove objects from the flow of computation. In this example, we see that there are three contributions above $4500.

gremlin>  v.out('receives').filter{it.amount>4500}.amount
==>5000
==>5000
==>5000

To see who those people are, we'll use a more complicated pipe that starts with the committee, filters for the contributions greater than $4500, and then passes those results as input to find out who made the contributions.

gremlin> v.out('receives').filter{it.amount>4500}.inE('makes').outV.contName[0..10]
==>Fox, Saul
==>Weidner, William
==>Jones, William
gremlin>

Our next analysis will be to see who is contributing multiple times to the Cain campaign. We use the following code to see who has contributed more than once. In order to remove the reflexive path, we add a filter step; without the filter, we would see double the number of contributions.

gremlin> m=[:]
gremlin>  v.out('receives').inE('makes').outV.filter{it !=v}.contName.groupCount
(m) >> -1
==>null
gremlin>  m.sort{a,b -> b.value <=> a.value}[0..39]
==>Waddle, Julie=6
==>Rogers, Michael=5
==>Anderson, Neil=4
==>Watkins, Walter=4
==>Tribble, James=4
==>Burton, James=4
==>laseau, mary=4
==>Russell, Daniel=4
==>Harris, Dudney L.=4
==>Fox, Saul=3
==>Weidner, William=3
==>Ratliff, Robert=3
==>Ellis, Marty=3
==>Bucciero, Kimberly=3
==>Kincaid, Elizabeth=3
==>Harkins, Gerry=3
==>Adams, Archie=3
==>Frankovitch, Joseph=3
==>Holten, James=3
==>Lindenfeld, Malaise=3
==>Richardson, Scott=3
==>Irvin, David L=3
==>Ward, Thomas=3
==>Buchanan, douglas=3
==>samuels, philip=3
==>Thompson, James=3
==>Koch, Tina=3
==>clements, john=3
==>Fowler, Jan=2
==>Blackwell, Diane=2
==>Koch, Richard=2
==>Gingrich, William=2
==>Robson, Roger=2
==>Anderson Jr, Taz=2
==>Hatfield, Edward=2
==>Ramey, Valerie=2
==>Parham, Charles=2
==>Shaw, Terry=2
==>Eidson, Robert=2
==>Keown, Karie=2
gremlin>

Who is Julie Waddle? Using additional commands, we find out she is a homemaker from Woodstock, Georgia, in Herman Cain's home state.

gremlin>   v = g.idx(T.v)[[contName:'Waddle, Julie']] >> 1
==>v[240805]
gremlin> v.map()
==>contState=GA
==>contCity=Woodstock
==>contName=Waddle, Julie
==>contOccupation=n/a/Homemaker
==>contZip=30188
==>contributorID=190665
==>type=Contributor
gremlin>

To show the contributions, we will run this command:

gremlin> v.outE('makes').inV.amount
==>500
==>300
==>300
==>400
==>250
==>250
gremlin>

Let's do a little more digging and see what the top reported occupations were among contributors to the Cain campaign. We'll use the following command; the results show that retirees, homemakers, physicians and lawyers are the top contributors:

gremlin> m=[:]
gremlin> v.out('receives').inE('makes').outV.filter{it !=v}.contOccupation.groupCount(m) >> -1
==>null
gremlin>  m.sort{a,b -> b.value <=> a.value}[0..39]
==>Retired/Retired=122
==>Retired=30
==>n/a/Homemaker=27
==>self/Attorney=11
==>self/Physician=9
==>n/a/Retired=8
==>self/self=7
==>none/Retired=7
==>self/Real Estate=7
==>Technical Director Custoemr Enginee=5
==>self/Sales=5
==>none/Homemaker=4
==>ApolloMD/Physician=4
==>self/investor=4
==>University of West GA/Professor=4
==>self/Human Resources=4
==>self/Dentist=4
==>Fox Paine & Co./Chief Exec.=3
==>GGAM LLC./CEO=3
==>self/small business owner=3
==>Homemaker=3
==>Self/Owner/Concert Merchandise Comp=3
==>Goodman Networks/Project Manger=3
==>James C Kincaid DDS/Secretary=3
==>Hybrid Concrete Structures/Construc=3
==>self employed/Consultant=3
==>self/BUSINESS OWNER=3
==>Kingsley Associates/Database Admini=3
==>Holten Meat Inc/CEO=3
==>self/Retired=3
==>n/a/BUSINESS OWNER=3
==>North  Georgia Foods Inc/Business O=3
==>renze display/President=3
==>universal sewing supply/Executive=3
==>GA Solar Lighting/contractor=3
==>Teradata Corp./VP=3
==>TRG Inc./President=3
==>self/Consultant=3
==>Amsell LLC/Sales=2
==>Smith Gambrell & Russell/Legal Secr=2
gremlin>

Next Steps
The next steps will be to reload the graph with updated data and look at different groupings of data (occupation, location, time series, etc).

Federal Election Commission Campaign Data Analysis

This post is inspired by Marko Rodriguez’ excellent post on a Graph-Based Movie Recommendation engine. I will use many of the same concepts that he describes in his post in order to load the data into Neo4J and then begin to analyze the data. This post will focus on the data loading. Follow-on posts will look at further analysis based on the relationships.

Background
The Federal Election Commission has made campaign contribution data publicly available for download here. The FEC has provided campaign finance maps on its home page. The Sunlight Foundation has created the Influence Explorer to provide similar analysis.

This post and follow-on posts will look at analyzing the campaign data using the graph database Neo4j and the graph traversal language Gremlin. This post walks through the data preparation, the data modeling, and the loading into Neo4j.

The FEC Data
The FEC data is available for download from the FEC website via FTP. It is composed of three main files: Campaign Committees, Campaign Candidates, and Individual Contributors. As of this post, there were approximately 10,875 committees, 3,600 candidates, and 455,000 unique contributions. Each of the data sets has a data description as well as frequency counts. The 2011-2012 data can be found here.

Gremlin and Neo4J
Gremlin 1.3 is available for download at this location. Neo4J 1.5M01 is available for download at this location. For this demonstration, we will be running the community edition of Neo4J in a Windows Virtual Machine.

Data Preparation
The FEC data comes in formatted, fixed-length fields, which makes it a little harder to prepare for import into Neo4j given my limited skills and abilities. To work around that, I loaded the data into Oracle using SQL*Loader and then wrote a simple PHP program to query the database and format the data into a delimited file. If you are interested in those files, feel free to contact me.
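
If you would rather skip the Oracle/PHP step, a small Java program can slice the fixed-width records directly into a ::-delimited file. This is just a sketch: the file names and the offsets below are placeholders, and the real field positions come from the FEC data description.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.PrintWriter;

public class FixedWidthToDelimited {
    // Placeholder (start, end) offsets -- substitute the positions from the FEC layout
    private static final int[][] FIELDS = { {0, 9}, {9, 99}, {99, 101} };

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader("cm.txt"));
        PrintWriter out = new PrintWriter("committee.dat");
        String line;
        while ((line = in.readLine()) != null) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < FIELDS.length; i++) {
                if (i > 0) sb.append("::");
                // Trim the padding that fixed-width formats use to fill each field
                sb.append(line.substring(FIELDS[i][0], FIELDS[i][1]).trim());
            }
            out.println(sb.toString());
        }
        in.close();
        out.close();
    }
}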

The FEC Data Graph
The FEC data is represented in the following graph. Each committee supports a candidate. Some candidates may be independent of any committee. Individuals contribute one or more times to a committee. For this demonstration, we haven't separated out city/state/zip and created a common location.

A couple of notes on the data. Some of the committees did not have a treasurer, so I added a value of "No Treasurer". Some of the candidates referenced non-existent committees; in those cases, I created entries for the missing committees in order to load the data and create the links. Additionally, the individual contribution file uses overpunch characters for certain amounts and for negative amounts. Those values were adjusted in the database so the data could be loaded as integer values.
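
For reference, here is a sketch of decoding a trailing overpunch sign in Java. It assumes the standard signed-overpunch convention ('{' is +0, 'A' through 'I' are +1 through +9, '}' is -0, 'J' through 'R' are -1 through -9); check your copy of the FEC data description before relying on it.

public static int decodeOverpunch(String field) {
    char last = field.charAt(field.length() - 1);
    String prefix = field.substring(0, field.length() - 1);
    boolean negative = false;
    int lastDigit;
    if (last == '{') {                        // +0
        lastDigit = 0;
    } else if (last >= 'A' && last <= 'I') {  // +1 .. +9
        lastDigit = last - 'A' + 1;
    } else if (last == '}') {                 // -0
        lastDigit = 0;
        negative = true;
    } else if (last >= 'J' && last <= 'R') {  // -1 .. -9
        lastDigit = last - 'J' + 1;
        negative = true;
    } else {
        return Integer.parseInt(field);       // plain digits, no overpunch present
    }
    int value = Integer.parseInt(prefix + lastDigit);
    return negative ? -value : value;
}

For example, decodeOverpunch("000050J") returns -501, while decodeOverpunch("0000500") returns 500.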

Loading Data
The data will be inserted into the graph database Neo4j. The Gremlin/Groovy code below creates a new Neo4j graph, removes an unneeded default edge index, and sets the transaction buffer to 2500 mutations per commit.

g = new Neo4jGraph('/tmp/FEC')
g.dropIndex("edges")
g.setMaxBufferSize(2500)

Loading Committee Data
The committee data contains information about the different election committees. In our case, it has seven columns.

C00000059::HALLMARK CARDS PAC::UNK::KANSAS CITY::MO::64108::GREG SWARENS
C00000422::AMERICAN MEDICAL ASSOCIATION POLITICAL ACTION COMMITTEE::::WASHINGTON::DC::20001::WALKER, KEVIN
C00000489::D R I V E POLITICAL FUND, TEAMSTERS LOCAL UNION CHAPTER 886::::OKLAHOMA CITY::OK::73107::TOM RITTER
C00000547::KANSAS MEDICAL SOCIETY POLITICAL ACTION COMMITTEE::UNK::TOPEKA::KS::66612::C. RICHARD BONEBRAKE, M.D.
C00000638::INDIANA STATE MEDICAL ASSOCIATION POLITICAL ACTION COMMITTEE::::INDIANAPOLIS::IN::46202::KORA, M.D., VIDYA
C00000729::AMERICAN DENTAL ASSOCIATION POLITICAL ACTION COMMITTEE::UNK::WASHINGTON::DC::20005::CONNOR, FRANCIS DR

The code needed to parse this data is below:

new File('committee.dat').eachLine {def line ->
  def components = line.split('::');
  def committeeVertex = g.addVertex(['type':'Committee','committeeId':components[0], 'name':components[1], 'party':components[2],'city':components[3],'state':components[4],'zip':components[5],'treasurer':components[6]]);
}

Parsing Candidate Data
The candidate data contains information about the various candidates. In our case, it has nine columns. A sample of the data is below:

H0AL00016::BOZEMAN, MARTHA RENEE::BIRMINGHAM::AL::35201::UNK::P::10::07
H0AL01030::GOUNARES, PETER HUNTER::ORANGE BEACH::AL::36561::REP::P::10::01
H0AL01048::WALTER, DAVID MARSH::FOLEY::AL::36535::CON::P::10::01
H0AL02087::ROBY, MARTHA::MONTGOMERY::AL::36106::REP::C::12::02
H0AL05049::CRAMER, ROBERT E "BUD" JR::HUNTSVILLE::AL::35804::DEM::P::08::05
H0AL05155::PHILLIP, LESTER S::MADISON::AL::35758::REP::P::10::05
H0AL05163::BROOKS, MO::HUNTSVILLE::AL::35802::REP::C::12::05
H0AL05189::SHEPARD, TAZEWELL::HUNTSVILLE::AL::35801::DEM::P::10::05
H0AL05197::RABY, STEPHEN WALKER::TOREY::AL::35773::DEM::P::10::05
H0AL06088::COOKE, STANLEY KYLE::KIMBERLY::AL::35091::REP::P::10::06
H0AL06096::LAMBERT, PAUL ANTHONY::MAYLENE::AL::35114::REP::N::10::06

The code to parse the candidate file is:

new File('candidate.dat').eachLine {def line ->
  def components = line.split('::');
  def candVertex = g.addVertex(['type':'Candidate','candId':components[0], 'candName':components[2], 'candCity':components[3], 'candState':components[4],'candZip':components[5],'candParty':components[6],'candStatus':components[7],'candYear':components[8],'candDistrict':components[9]]);
  def supportedEdge = g.addEdge(g.idx(T.v)[[committeeId:components[1]]].next(), candVertex, 'supports');
}

Loading the Individual Contributors File
The individual contributors file contains all of the contributions made to different committees.

The sample data is:

C00000422::0009951::Helm, Douglas Alan MD::PERINATAL ASSOCIATES/Physician::Fresno::CA::93701::01::11::11::20::0000500::M2
C00000422::0009952::Karasek, Dennis Edward MD::SELF-EMPLOYED/Physician::San Antonio::TX::78231::01::11::11::20::0002000::M2
C00000422::0009953::Kilgore, Shannon M MD::VA PALO ALTO HCS/Physician::Palo Alto::CA::94304::01::11::11::20::0000500::M2
C00000422::0009954::Matthews, George Philip MD::VISION QUEST/Physician::Arlington::TX::76006::01::11::11::20::0000500::M2
C00000422::0009955::Kimball, Daniel B Jr. MD::N/A/Retired Physician::Reading::PA::19611::01::15::11::20::0001000::M2
C00000422::0009956::Mehling, Brian Macdermott MD::MEHLING ORTHOPAEDIC/Physician::West Islip::NY::11795::01::14::11::20::0000291::M2

Given that there are about half a million contributors, parsing and loading this data will take a couple of minutes.

new File('indiv.dat').eachLine {def line ->
  def components = line.split('::');
  def indivVertex = g.addVertex(['type':'Individual','indivId':components[1], 'indivName':components[2], 'indivOccupation':components[3],'indivCity':components[4], 'indivState':components[5],'indivZip':components[6],'transDate':components[7] + components[8] +components[9],'amount':components[11],'transactionType':components[12]]);
}

To commit any data left over in the transaction buffer, stop the current transaction and mark it successful. The data is now persisted to disk. If you plan on leaving the Gremlin console, be sure to g.shutdown() the graph first.

g.stopTransaction(TransactionalGraph.Conclusion.SUCCESS)

Validating the Data

gremlin> g.V.count()
==>462915
gremlin> g.E.count()
==>441262
gremlin> g.V[[type:'Committee']].count()
==>21653
gremlin> g.V[[type:'Candidate']].count()
==>3536
gremlin> g.V[[type:'Individual']].count()
==>437726

Let’s look at some distributions
What is the distribution of contributions among states?

gremlin> m=[:]
gremlin> g.V[[type:'Individual']].indivState.groupCount(m) >> -1
==>null
gremlin> m.sort{a,b -> b.value<=>a.value}
==>CA=52767
==>NY=34742
==>TX=34521
==>FL=25758
==>VA=22660
==>IL=19075
==>PA=15494
==>MA=14134
==>DC=14108
==>OH=13425
==>MI=11938
==>MD=11647
==>NJ=11523
==>CT=11165
==>WA=9410
==>GA=9195
==>MN=8339
==>TN=8112
==>MO=7738
==>NC=7380
==>AZ=6921
==>CO=6876
==>IN=6529
==>WI=6002
==>LA=5020
==>NV=4118
==>NM=3956
==>KS=3862
==>OR=3758
==>IA=3593
==>AL=3383
==>SC=3310
==>OK=3227
==>KY=3218
==>NE=3147
==>MT=2417
==>UT=2388
==>AR=2269
==>NH=2051
==>MS=1866
==>RI=1840
==>HI=1629
==>ME=1605
==>ND=1578
==>WV=1492
==>SD=1442
==>DE=1288
==>VT=1133
==>ID=980
==>AK=965
==>WY=920
==>=800
==>PR=611
==>VI=160
==>ZZ=141
==>GU=83
==>MP=14
==>AS=3
gremlin>

What about the average contribution?

gremlin> g.V[[type:'Individual']].amount.mean()
==>1138.58

Are there any treasurers supporting multiple committees?

gremlin> m=[:]
gremlin> g.V[[type:'Committee']].treasurer.groupCount(m) >> -1
==>null
gremlin>  m.sort{a,b -> b.value <=> a.value}[0..19]
==>No Chair=1716
==>LAROSE, JOSUE=122
==>DURKEE, KINDE=90
==>KINDE DURKEE=84
==>LISA LISKER=80
==>JUDITH ZAMORE=68
==>KEITH A DAVIS=66
==>LISKER, LISA=62
==>CUSHMAN, NANCY=56
==>NANCY H WATKINS=56
==>NO TREASURER=43
==>CABELL HOBBS=40
==>KILGORE, PAUL=34
==>KELLY LAWLER=34
==>PAUL KILGORE=32
==>KELLEY, MEREDITH=30
==>WATKINS, NANCY H.=30
==>MACKENZIE, SCOTT B=28
==>ADRIANE RUMMEL=28
==>BAUER, DAVID=22

"No Chair" and "No Treasurer" indicate that the treasurer value was empty. Even so, there are clearly several treasurers supporting multiple committees.

Next Steps
The next steps will be to look at some of the relationships between contributors and committees and see if there are treasurers serving on multiple committees.

Additionally, because each contribution is counted individually, there are several duplicate donors/campaign contributors. To address that, I will separate out the donors and their addresses as a separate table and link them to the contributions.

If you have questions about this post, feel free to email me.

i2 Report File – Palantir Plugin (Update)

Since the initial posting, I've made some updates to the Palantir import helper that allow a user to select the report file and then import it.

Once the user clicks on Import, a list of i2 types are presented to the user (both links and entities). The user can map each of the i2 types to Palantir objects or links.

Finally, a summary of the number of entities and links processed is presented to the user, and the entities and links are added to the chart.

i2 Report File – Palantir Plugin

i2 ANB allows users to export chart information about entities, links, attributes and cards to a report. This is useful if you want to create a report containing the information in all or part of your chart. This report is created as a text file which can then be used in other applications.

i2 ANB allows users to define the items to include in a report using a report specification. A report specification is a series of settings that tell ANB what kind of report to create and what to include in it. Report specifications enable you to define the items, content and destination of your report.

For our usage, we've modified a default report template. For i2 ANB entities, we want access to the entity type, identity, label, description, date and the attributes. For the attributes, we print out the attribute name and attribute value, separated by tabs, and we use ]] as a delimiter between the attribute name and value.
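
For what it's worth, here is a sketch of how an attribute line in that layout might be parsed. It assumes a name, tab, ]], tab, value layout; it is not the actual plugin code, so adjust it to your report specification.

public static String[] parseAttributeLine(String line) {
    // Split on the ]] delimiter; trim() removes the surrounding tabs
    String[] parts = line.split("\\]\\]", 2);
    String name = parts[0].trim();
    String value = parts.length > 1 ? parts[1].trim() : "";
    return new String[] { name, value };
}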

For links, we print out the link type, label, link 1 and link 2, as well as the Ends. We've added the link 1 and link 2 values because it isn't always possible to parse the Ends value properly.

The report configuration is shown in the following two screen shots.

i2 Report Setup

i2 Report Screen Capture 2

As shown below, the Palantir import helper allows a user to select the report file and then import it. A summary of the number of entities and links processed is presented to the user, and the entities and links are added to the chart. Users can customize the mappings between i2 entity types and Palantir entity types by modifying an XML file that is part of the Palantir helpers.

Here’s the final import into a Palantir graph.

Palantir Graph

Questions about the Palantir helper can be sent to dsfauth_at_gmail.com

Quick Links on APIs

Some quick links that have popped up over the last few days:

Your API Sucks: Why Developers Hang Up and How to Stop That. An article from Apigee about how APIs don't need to suck for developers.

Get free admission to Strata and a chance to showcase to investors. Thanks to Pete Warden, here's a chance for big data startups to get to Strata and in front of VCs.

Enterprise 2.0 RESTful APIs made easy with PHP FRAPI. FRAPI is a high-level API framework that puts the "rest" back into RESTful. Use it to power your web apps, mobile services, and legacy systems.

Best Practices for API Development. The founder of the Lokad API, a sales forecasting service, recently summarized some of her tips for API design.

2011 Data Conferences

A few notable conferences for 2011.

Government Big Data Forum 2011 – Big data is a challenge not only in the commercial space but also in the Federal Government. In what should be an interesting forum held in Washington, DC, panels include whether ETL still works, de-duplication of data, and sensemaking of data. – Held January 26, 2011

O’Reilly Strata Conference – Making Data Work
Big Data is here. Turning data into decisions. This will be held February 1-3, 2011 in Santa Clara, CA.

Glue Conference – As the "cloud" becomes a common platform, web applications still live in a "stovepipe" world. It's not a question of "should we move to the cloud?" It's a question of once some, or most, or all of our web applications live in the cloud, how do we handle the problems of scalability, security, identity, storage, integration and interoperability? What was the problem of "enterprise application integration" in the late 90s is now a Cambrian explosion of web-based applications that will demand similar levels of integration. The problem, put simply, is how to "glue" all of these apps, data, people, work-flows, and networks together. – Held May 25-26 in Broomfield, CO

Defrag Conference. November 9-10, 2011 in Broomfield, CO.

Short Links

Taking a page from Pete Warden, I’ve decided to start off with some short links. In between, I’ll mix it up with some longer posts, but the intent of the short links is to highlight interesting pages/links/sites that I’ve found over the past few days.

Government Big Data Forum 2011 – Big data is a challenge not only in the commercial space but also in the Federal Government. In what should be an interesting forum held in Washington, DC, panels include whether ETL still works, de-duplication of data, and sensemaking of data.

Social Network Visualization – A great collection of papers related to social network visualization from UC Davis. Social networks are visual in nature. Visualization techniques have been applied in social analysis since the field began. We aim to develop interactive visual analytic tools for complex social networks.

RIM In Talks to acquire Gist – As a Gist user and not a Blackberry user, I’m closely watching this news.

Data Scientists – As more and more data is made available, people are needed to make sense of it. Companies such as bit.ly, LinkedIn and Foursquare are hiring. If I were going back to school, this is a career that I would target.