This is the start of a regular feature, Dataset Monday. Every Monday we're going to look at publicly available datasets and speculate on applications that could be built from them.

First up is Data.gov, a topic we will return to frequently. The White House website describes the site as follows: “Data.gov is a citizen-friendly platform that provides access to Federal datasets. With a searchable data catalog, Data.gov helps the public find, access, and download non-sensitive Government data and tools in a variety of formats.” If you have the time, the final Concept of Operations is worth a read.

Data.gov was launched in May 2009. As of January 2014, it can honestly be said that it has yet to live up to its sales pitch. Almost five years into its existence, it demonstrably does not contain all the datasets that the government releases. Its list of organizations represents 225 entities. One hundred eighty-three are federal, of which two appear to be duplicates of other organizations. Nineteen state organizations, including a municipality, are included. Two interstate compacts are there, as is an Indian tribe. Thirteen public university entities are listed, as are three private ones. One organization (Arctic Landscape Conservation Cooperative) defies categorization, and one is actually labeled Test Organization 1.

Not only does it not contain all the datasets that the federal government releases, it doesn't even contain all the datasets that the entities listed on the site make publicly available. For instance, the Bureau of Labor Statistics has its CE series listed on Data.gov. This is a useful series and one of 34 the bureau lists as available to the public. But the server that hosts that CE series contains 58 entries, 24 more than are listed for the entire bureau. Clearly there is plenty of room for Citizen Intelligence to step in and improve things, starting with some very low-hanging fruit: tracking down the old-style repository behind each Data.gov listing could turn up multiple additional unlisted datasets.
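Part of that repository check can be automated. The Python sketch below is a minimal illustration under stated assumptions: it assumes the old-style repository exposes a plain HTML directory index, that you have already fetched that page, and that you have separately compiled the names the catalog lists. It then reports the directory entries the catalog does not mention. The function names, the example.gov URL, and the sample listing are hypothetical placeholders, not real Data.gov or BLS data.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects every href target from an HTML page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def unlisted_entries(index_html, base_url, catalog_names):
    """Return directory entries under base_url that the catalog does not list.

    index_html: raw HTML of a server's directory index (assumed already fetched).
    base_url: URL of that directory; relative links are resolved against it.
    catalog_names: names the catalog already lists (assumed compiled by hand).
    """
    parser = LinkCollector()
    parser.feed(index_html)
    base = base_url if base_url.endswith("/") else base_url + "/"
    entries = set()
    for href in parser.hrefs:
        full = urljoin(base, href)
        # Skip parent-directory links, external links, and the directory itself.
        if not full.startswith(base) or full == base:
            continue
        # Keep only the first path component below the directory.
        name = full[len(base):].split("/", 1)[0]
        # Skip sort/query links such as "?C=N;O=D" that some servers emit.
        if name and not name.startswith(("?", "#")):
            entries.add(name)
    return sorted(entries - set(catalog_names))


if __name__ == "__main__":
    # Hypothetical directory index; real server layouts will differ.
    sample_index = (
        '<a href="/pub/">[To Parent Directory]</a>\n'
        '<a href="/pub/series/aa/">aa</a>\n'
        '<a href="/pub/series/bb/">bb</a>\n'
        '<a href="/pub/series/cc/">cc</a>\n'
    )
    print(unlisted_entries(sample_index, "https://example.gov/pub/series/", ["aa"]))
    # prints ['bb', 'cc']
```

Resolving each link against the directory URL, rather than string-matching raw hrefs, keeps absolute and relative links comparable and drops parent-directory links automatically. Fetching the page (for example with `urllib.request`) is left out so the comparison logic can be checked offline.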
