For starters, Amazon will publish various databases from the U.S. Census Bureau, 3-D chemical structures provided by Indiana University and an annotated form of the Human Genome from Ensembl. By melding research data with compute services such as EC2, Amazon is likely to garner interest from academia and businesses that crunch these datasets. In the future, Amazon will publish a bevy of economic data sets.
Under Amazon's service, called Public Data Sets on AWS, the company will make the information available for free. A company or researcher could then analyze the data on Amazon's EC2 service and pay for the compute time.
As a practical matter, these datasets can be unwieldy and take days to download, says Adam Selipsky, vice president for product management and developer relations at AWS. Selipsky noted that Amazon will publish additional data sets, but the process takes time given that "the downloading literally takes days."
In a statement, Amazon served up the technical details:
Select public data sets are hosted on Amazon EC2 for free as an Amazon Elastic Block Store (Amazon EBS) snapshot. Amazon EC2 customers can access this data by creating their own personal Amazon EBS volume, using a public data set snapshot as a starting point. They can then access, modify and perform computation on these data sets directly using an Amazon EC2 instance and just pay for the compute and storage resources that they use. If available, researchers can also use pre-configured Amazon Machine Images (AMIs) with tools like Inquiry by BioTeam to perform their analysis.
Here's the list of what's available today:
- A 3D Version of the PubChem Library provided by Rajarshi Guha at Indiana University;
- UGI Virtual Conformer Library provided by Rajarshi Guha at Indiana University;
- Various US Census Databases provided by The US Census Bureau (1980, 1990, 2000 and 2003-2006 economic data);
- Various Labor Statistics Databases provided by The Bureau of Labor Statistics.
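The workflow in Amazon's statement (create a personal EBS volume from a public snapshot, then work on it from an EC2 instance) might look roughly like the sketch below. It is an illustration rather than Amazon's own procedure: it assumes a boto3-style EC2 client is passed in, and the function name, the default device name and every ID in the usage comment are hypothetical placeholders.

```python
from typing import Any


def volume_from_public_snapshot(
    ec2: Any,
    snapshot_id: str,
    instance_id: str,
    availability_zone: str,
    device: str = "/dev/sdf",  # assumed device name; adjust for your instance
) -> str:
    """Create a personal EBS volume from a snapshot and attach it to an instance.

    `ec2` is assumed to behave like a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the newly created volume.
    """
    # The volume has to live in the same Availability Zone as the instance
    # that will use it.
    volume = ec2.create_volume(
        SnapshotId=snapshot_id,
        AvailabilityZone=availability_zone,
    )
    volume_id = volume["VolumeId"]

    # Wait until the volume is ready before trying to attach it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])

    # Attach the volume; the instance still has to mount the device itself.
    ec2.attach_volume(VolumeId=volume_id, InstanceId=instance_id, Device=device)
    return volume_id


# Hypothetical usage (placeholder IDs, requires boto3 and AWS credentials):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   volume_from_public_snapshot(ec2, "snap-EXAMPLE", "i-EXAMPLE", "us-east-1a")
```

Because the volume is a private copy seeded from the snapshot, any changes stay with the customer's own volume, and charges apply only to the storage and instance time actually used, as the statement describes.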