Big Data is as important to the U.S. economy as agricultural products, according to the team behind a new report about how the federal government can better use huge data collections. The report, from the TechAmerica Foundation, was released Wednesday.
Entitled Demystifying Big Data: A Practical Guide to Transforming the Business of Government, the report aims to define Big Data and provide policy guidance to the federal government.
The Five V's
In March, the Obama administration announced a new Big Data initiative, with more than $200 million in projects at six agencies designed to advance the technologies and develop the required workforce. Projects include a National Institutes of Health effort to make human genome data more accessible to the public.
Big Data, the report noted, is either structured information in relational databases or unstructured information such as e-mail, video, blogs, call-center conversations, and social media. Unstructured data constitutes about 85 percent of the data generated today and, the report noted, poses "challenges in deriving meaning with conventional business intelligence tools."
The characteristics of Big Data are its volume, velocity, variety and veracity. Volume is being driven by the growing number of data sources and higher-resolution sensors. Velocity -- how fast data is produced, changed, and processed -- is being driven by improved network throughput, the enhanced computing power of data-generating devices, and the proliferation of data sources.
Variety, the range of data types coming from sources both inside and outside the organization, is being pushed by social media, sensors, the rise of mobile devices and other factors. And veracity, or the quality of the data, is a key requirement for data-based decisions.
Values of Big Data
The report pointed to potential benefits of Big Data analysis for the federal government, including identifying the most effective medical outcomes across large populations, detecting health anomalies through sensors in hospitals and homes, delivering new levels of real-time traffic information, better understanding which online learning techniques are most effective, improving fraud detection, and sharpening weather predictions.
The report recommended that IT structures at the agencies evolve into massively scalable storage and network infrastructure designs, with planning for data protection, data sharing, data reuse, ongoing analysis, compliance, security and privacy issues, data retention and data availability.
Posted: 2012-10-04 @ 7:14am PT
Enjoyed the piece Barry. Great to see the industry finally adopting the "Vs" of Big Data that Gartner first introduced over 12 years ago. For future attribution, here's a link to the piece I wrote first publicly defining them in 2001: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/. Note that you mention 5Vs, but only discuss 4. ??? Anyway, we contend veracity isn't a defining characteristic of Big Data, just an aggregate measure of quality. Note that Gartner has now suggested and published on 12 dimensions of data. --Doug Laney, VP Research, Gartner, @doug_laney