We got a nice find from the Bureau of Land Management, which has a system called LR2000 that provides a public interface to its land management records. It’s very much a work in progress and seems ripe for a mashup with non-government mapping data like Google Earth and OpenStreetMap, though Google Earth’s Pro product might already have the database. Something to explore going forward.
Governments work in the real world, which means that once Citizen Intelligence picks all the low-hanging fruit of already published government data streams, it’s going to have to go out into the real world both to generate new data and to confirm the truth of data already published. The new GPS III system should be just about deployed by then. The first two satellites were deployed in 2013. For Citizen Intelligence purposes, 1 yard accuracy is perfectly fine, and the increased signal power of GPS III will increase the utility of the signal in urban areas.
I just ran through a quick data survey: how many participants in data.gov are actually complying with the Obama administration’s open data directive to publish all their public data sets at <site name>/data.json? It turns out the answer is about 16%. It was a quick survey because I could quickly set up a web viewer with that URL and just iterate through a couple of hundred data.gov participants, marking the entries that came up with JSON. It took about a third of the time needed for the first run through.
I had theorized that as data is filled in, subsequent runs grow faster. It’s nice to see that in practice and that the effect is pretty large. I can see things working even faster in the future as I move from semi-automation to full automation.
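For anyone curious what the fully automated version of that data.json check might look like, here is a minimal sketch in Python. It is not the web viewer setup described above. The function names, the example sites, and the choice to count any response that parses as a JSON object or array as "complying" are all my own assumptions, and a real run would want a stricter check against the published schema.

```python
import json
import urllib.request


def data_json_url(site):
    """Build the <site name>/data.json URL for a participant's site."""
    return site.rstrip("/") + "/data.json"


def looks_like_json(body):
    """Return True if the body parses as a JSON object or array."""
    try:
        parsed = json.loads(body)
    except (ValueError, TypeError):
        # Covers HTML error pages, empty bodies, and a missing (None) body.
        return False
    return isinstance(parsed, (dict, list))


def fetch_body(url, timeout=10):
    """Fetch a URL and return its text, or None on network or URL errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        return None


def compliance_rate(sites, fetch=fetch_body):
    """Return (complying sites, fraction complying) for a list of sites.

    `fetch` is injectable so the survey logic can be tried without
    touching the network.
    """
    complying = [s for s in sites if looks_like_json(fetch(data_json_url(s)))]
    rate = len(complying) / len(sites) if sites else 0.0
    return complying, rate


if __name__ == "__main__":
    # Hypothetical participant list; substitute the real site names.
    participants = ["https://agency-one.example", "https://agency-two.example"]
    complying, rate = compliance_rate(participants)
    print(f"{len(complying)} of {len(participants)} complying ({rate:.0%})")
```

Keeping the fetch step separate from the counting step means the same tally can be rerun against saved responses, which is where most of the speedup on later passes would come from.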
Today is mostly data entry and building up data structures, plus re-estimating the cost to implement the non-money-making portions of Citizen Intelligence downward. That makes it a good day. About a fourth of data.gov sites seem to have APIs so far, but no doubt that’s going to rise as time goes on. It’s all thanks to President Obama’s Open Government Initiative. Hats off on this one, because over the next decade or so we’re likely going to be able to settle a number of policy arguments that the left and right have been going back and forth on for decades.
Running through this interesting Popular Mechanics slideshow on old infrastructure, I was struck by the note that a water pipe built in 1895 with a 100-120 year life expectancy burst a few weeks ago and that they are investigating the cause of the break. Doing the math, that water pipe has been on duty for 119 years. It was time. In a well planned government, as infrastructure gets near its expected lifespan, you would expect that taxes would start being set aside for replacement so that when the inevitable happens the government wouldn’t have to go into debt to pay for the replacement, costing the taxpayers extra money. While some of that does, in fact, go on, borrowing to pay for infrastructure replacement is pretty common. Personalize this and people get interested.
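The pipe arithmetic is simple enough to run across a whole infrastructure inventory. The sketch below is only an illustration: the function name and the three status labels are mine, and the current year of 2014 is an assumption that follows from the 119 years quoted above.

```python
def lifespan_status(built_year, current_year, min_life, max_life):
    """Classify an asset by its age against its expected life range in years."""
    age = current_year - built_year
    if age < min_life:
        status = "within expected life"
    elif age <= max_life:
        status = "in replacement window"
    else:
        status = "past expected life"
    return age, status


# The water pipe from the slideshow: built 1895, 100-120 year life expectancy.
# 2014 is assumed as the current year.
age, status = lifespan_status(1895, 2014, 100, 120)
print(age, status)
```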
Part of the Citizen Intelligence ecosystem will be convincing governments to publish their infrastructure inventories online as a matter of routine. It’s just the right thing to do. It’s not that expensive to do it, and you get all sorts of downstream uses for that data like lowering the overall cost of government by taxing to pay for infrastructure instead of constantly going to the bond market for utterly predictable expenses.
This is the start of a regular feature, Dataset Monday. Every Monday we’re going to be looking at publicly available datasets and speculating on applications that could be created from them.
First up: Data.gov, a topic that will be returned to frequently. The White House website describes the site as follows: “Data.gov is a citizen-friendly platform that provides access to Federal datasets. With a searchable data catalog, Data.gov helps the public find, access, and download non-sensitive Government data and tools in a variety of formats.” If you have the time, the final Concept of Operations is worth a read.
Data.gov was launched in May of 2009. As of January 2014, it can honestly be said that it has yet to fulfill its sales pitch. Almost 5 years into its existence, it demonstrably does not contain all the datasets that the government releases. It has 225 entities represented in its list of organizations. One hundred eighty-three are federal, of which 2 appear to be duplicates of other organizations. Nineteen state organizations, including a municipality, are included. Two interstate compacts are there, as is an Indian tribe. Thirteen public university entities are listed, as are three private ones. One organization (Arctic Landscape Conservation Cooperative) defies categorization, and one organization is actually labeled Test Organization 1.
Not only does it not contain all the datasets that the federal government releases, it doesn’t contain all the datasets that the entities listed on the site make publicly available. For instance, the Bureau of Labor Statistics has its CE series listed on data.gov. This is a useful series and one of 34 listed as available to the public. But the server that hosts that CE series has 58 entries, 24 more than are listed on data.gov for the entire entity. Clearly there is a lot of room for Citizen Intelligence to step in and improve things with some very low-hanging fruit. Finding the old-style repositories for each data.gov listing could yield multiple additional non-listed datasets.
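Finding the gap between a catalog listing and an old-style repository comes down to a set difference. Here is a rough sketch. The entry names are numbered placeholders, not real BLS series names, and it assumes for illustration that all 34 catalog entries also appear on the server, which the counts above do not by themselves guarantee.

```python
def unlisted_entries(server_entries, catalog_entries):
    """Return entries found on the server but missing from the catalog."""
    return sorted(set(server_entries) - set(catalog_entries))


# Placeholder names standing in for the 58 server entries and the 34
# listed on data.gov (hypothetical; real names would come from both sites).
server = [f"entry_{i:02d}" for i in range(58)]
catalog = server[:34]

missing = unlisted_entries(server, catalog)
print(len(missing), "entries not listed on data.gov")
```

Running this per entity, with real directory listings in place of the placeholders, would give a worklist of non-listed datasets to chase down.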