White House announces $200 million in funding for big data research and development, hosts forum at AAAS

In 2012, making sense of big data through narrative and context, particularly unstructured data, is now a strategic imperative for leaders around the world, whether they serve in Washington, run media companies or trading floors in New York City or guide tech titans in Silicon Valley.

While big data carries the baggage of huge hype, the institutions of federal government are getting serious about its genuine promise. On Thursday morning, the Obama Administration announced a “Big Data Research and Development Initiative,” with more than $200 million in new commitments. (See the fact sheet provided by the White House Office of Science and Technology Policy at the bottom of this post.)

“In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security,” said Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy, in a prepared statement.

The research and development effort will focus on advancing “state-of-the-art core technologies” needed for big data, harnessing said technologies “to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning,” and “expand the workforce needed to develop and use Big Data technologies.”

In other words, the nation’s major research institutions will focus on improving the technology available to collect and use big data, applying it to science and national security, and looking for ways to train more data scientists.

“IBM views Big Data as organizations’ most valuable natural resource, and the ability to use technology to understand it holds enormous promise for society at large,” said David McQueeney, vice president of software, IBM Research, in a statement. “The Administration’s work to advance research and funding of big data projects, in partnership with the private sector, will help federal agencies accelerate innovations in science, engineering, education, business and government.”

While $200 million is a relatively small amount of funding, particularly in the context of the federal budget or as compared to investments that are (probably) being made by Google or other major tech players, specific support for training and subsequent application of big data within federal government is important and sorely needed. The job market for data scientists in the private sector is so hot that government may well need to build up its own internal expertise, much in the same way LivingSocial is training coders at the Hungry Academy.

“Big data is a big deal,” blogged Tom Kalil, deputy director for policy at White House OSTP, on the White House blog this morning.

We also want to challenge industry, research universities, and non-profits to join with the Administration to make the most of the opportunities created by Big Data. Clearly, the government can’t do this on its own. We need what the President calls an “all hands on deck” effort.

Some companies are already sponsoring Big Data-related competitions, and providing funding for university research. Universities are beginning to create new courses—and entire courses of study—to prepare the next generation of “data scientists.” Organizations like Data Without Borders are helping non-profits by providing pro bono data collection, analysis, and visualization. OSTP would be very interested in supporting the creation of a forum to highlight new public-private partnerships related to Big Data.

The White House is hosting a forum today in Washington to explore the challenges and opportunities of big data and discuss the investment. The event will be streamed online in a live webcast from the headquarters of the AAAS in Washington, DC. I’ll be in attendance and sharing what I learn.

“Researchers in a growing number of fields are generating extremely large and complicated data sets, commonly referred to as ‘big data,’” reads the invitation to the event from the White House Office of Science and Technology Policy. “A wealth of information may be found within these sets, with enormous potential to shed light on some of the toughest and most pressing challenges facing the nation. To capitalize on this unprecedented opportunity — to extract insights, discover new patterns and make new connections across disciplines — we need better tools to access, store, search, visualize, and analyze these data.”

Speakers:

  • John Holdren, Assistant to the President and Director, White House Office of Science and Technology Policy
  • Subra Suresh, Director, National Science Foundation
  • Francis Collins, Director, National Institutes of Health
  • William Brinkman, Director, Department of Energy Office of Science

Panel discussion:

  • Moderator: Steve Lohr, New York Times, author of “Big Data’s Impact in the World”
  • Alex Szalay, Johns Hopkins University
  • Lucila Ohno-Machado, UC San Diego
  • Daphne Koller, Stanford
  • James Manyika, McKinsey

What is big data?

Anyone planning to use big data for public good, or profit, through applied data science must first understand what big data is.

On that count, turn to my colleague Edd Dumbill, who posted a useful definition last year on the O’Reilly Radar in his introduction to the big data landscape:

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

The hot IT buzzword of 2012, big data has become viable as cost-effective approaches have emerged to tame the volume, velocity and variability of massive data. Within this data lie valuable patterns and information, previously hidden because of the amount of work required to extract them. To leading corporations, such as Walmart or Google, this power has been in reach for some time, but at fantastic cost. Today’s commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. Big data processing is eminently feasible for even the small garage startups, who can cheaply rent server time in the cloud.

Teams of data scientists are increasingly leveraging a powerful, growing set of common tools, whether they’re employed by government technologists opening cities, developers driving a revolution in healthcare or hacks and hackers defining the practice of data journalism.

To learn more about the growing ecosystem of big data tools, watch my interview with Cloudera architect Doug Cutting, embedded below. Cutting created Lucene and led the Hadoop project at Yahoo before he joined Cloudera. Apache Hadoop is an open source framework that allows distributed applications based upon the MapReduce paradigm to run on immense clusters of commodity hardware, which in turn enables the processing of massive amounts of data.
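For readers who have not met the MapReduce paradigm before, here is a minimal sketch of the idea as a word count in plain Python. It runs in a single process and does not use Hadoop or any of its APIs; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are my own illustrative choices, not anything from the Hadoop project. On a real cluster, the map and reduce steps would run in parallel across many machines, and the framework would handle the grouping step in between.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group every emitted value under its key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (illustrative, single process)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data is data"]))
```

The point of splitting the work this way is that each call to map_phase and reduce_phase depends only on its own input, which is what lets a framework spread those calls across commodity hardware.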

Details on the administration’s big data investments

A fact sheet released by the White House OSTP follows, verbatim:

National Science Foundation and the National Institutes of Health – Core Techniques and Technologies for Advancing Big Data Science & Engineering

“Big Data” is a new joint solicitation supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) that will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse data sets. This will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible. NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to health and disease.

National Science Foundation: In addition to funding the Big Data solicitation, and keeping with its focus on basic research, NSF is implementing a comprehensive, long-term strategy that includes new methods to derive knowledge from data; infrastructure to manage, curate, and serve data to communities; and new approaches to education and workforce development. Specifically, NSF is:

· Encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers;
· Funding a $10 million Expeditions in Computing project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information – machine learning, cloud computing, and crowd sourcing;
· Providing the first round of grants to support “EarthCube” – a system that will allow geoscientists to access, analyze and share information about our planet;
· Issuing a $2 million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data;
· Providing $1.4 million in support for a focused research group of statisticians and biologists to determine protein structures and biological pathways; and
· Convening researchers across disciplines to determine how Big Data can transform teaching and learning.

Department of Defense – Data to Decisions: The Department of Defense (DoD) is “placing a big bet on big data” investing approximately $250 million annually (with $60 million available for new research projects) across the Military Departments in a series of programs that will:

· Harness and utilize massive data in new ways and bring together sensing, perception and decision support to make truly autonomous systems that can maneuver and make decisions on their own.
· Improve situational awareness to help warfighters and analysts and provide increased support to operations. The Department is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe.

To accelerate innovation in Big Data that meets these and other requirements, DoD will announce a series of open prize competitions over the next several months.

In addition, the Defense Advanced Research Projects Agency (DARPA) is beginning the XDATA program, which intends to invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include:

· Developing scalable algorithms for processing imperfect data in distributed data stores; and
· Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.

The XDATA program will support open source software toolkits to enable flexible software development for users to process large volumes of data in timelines commensurate with mission workflows of targeted defense applications.

National Institutes of Health – 1000 Genomes Project Data Available on Cloud: The National Institutes of Health is announcing that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publicly available data set for free and researchers only will pay for the computing services that they use.

Department of Energy – Scientific Discovery Through Advanced Computing: The Department of Energy will provide $25 million in funding to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute. Led by the Energy Department’s Lawrence Berkeley National Laboratory, the SDAV Institute will bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the Department’s supercomputers, which will further streamline the processes that lead to discoveries made by scientists using the Department’s research facilities. The need for these new tools has grown as the simulations running on the Department’s supercomputers have increased in size and complexity.

US Geological Survey – Big Data for Earth System Science: USGS is announcing the latest awardees for grants it issues through its John Wesley Powell Center for Analysis and Synthesis. The Center catalyzes innovative thinking in Earth system science by providing scientists a place and time for in-depth analysis, state-of-the-art computing capabilities, and collaborative tools invaluable for making sense of huge data sets. These Big Data projects will improve our understanding of issues such as species response to climate change, earthquake recurrence rates, and the next generation of ecological indicators.

Further details about each department’s or agency’s commitments can be found at the following websites by 2 pm today:

NSF: http://www.nsf.gov/news/news_summ.jsp?cntn_id=123607
HHS/NIH: http://www.nih.gov/news/health/mar2012/nhgri-29.htm
DOE: http://science.energy.gov/news/
DOD: www.DefenseInnovationMarketplace.mil
DARPA: http://www.darpa.mil/NewsEvents/Releases/2012/03/29.aspx
USGS: http://powellcenter.usgs.gov

IBM infographic on big data

Big Data: The New Natural Resource

This post and headline have been updated as more information on the big data R&D initiative became available.

Celebrating science with the Geek in Chief at the White House Science Fair

Today in Washington, President Obama hosted the second annual White House Science Fair. Video of his comments is embedded below, along with a Storify of exhibits and students from the day.

“The young people I met today, the young people behind me — you guys inspire me. It’s young people like you that make me so confident that America’s best days are still to come. When you work and study and excel at what you’re doing in math and science, when you compete in something like this, you’re not just trying to win a prize today. You’re getting America in shape to win the future. You’re making sure we have the best, smartest, most skilled workers in the world, so that the jobs and industries of tomorrow take root right here. You’re making sure we’ll always be home to the most creative entrepreneurs, the most advanced science labs and universities. You’re making sure America will win the race to the future.

So as an American, I’m proud of you. As your President, I think we need to make sure your success stories are happening all across our country.

And that’s why when I took office, I called for an all-hands-on-deck approach to science, math, technology and engineering. Let’s train more teachers. Let’s get more kids studying these subjects. Let’s make sure these fields get the respect and attention that they deserve.

Now, in a lot of ways, today is a celebration of the new. But the belief that we belong on the cutting edge of innovation — that’s an idea as old as America itself. I mean, we’re a nation of tinkerers and dreamers and believers in a better tomorrow. You think about our Founding Fathers — they were all out there doing experiments — and folks like Benjamin Franklin and Thomas Jefferson, they were constantly curious about the world around them and trying to figure out how can we help shape that environment so that people’s lives are better.

It’s in our DNA. We know that innovation has helped each generation pass down that basic American promise, which is no matter who you are, no matter where you come from, you can make it if you try. So there’s nothing more important than keeping that promise alive for the next generation. There’s no priority I have that’s higher than President — as President than this.

And I can’t think of a better way to spend a morning than with the young people who are here doing their part and creating some unbelievable stuff in the process. So I’m proud of you. I want you to keep up your good work.” - President Barack Obama

Later in the day, Bill Nye, “The Science Guy,” Neil deGrasse Tyson and Tom Kalil participated in a live Twitter chat.

Beth Noveck on connecting the academy to open government R&D

Earlier this week, the White House convened an open government research and development summit at the National Archives. Columbia statistics professor Victoria Stodden captured some key themes from it on her blog, including smart disclosure of government data and open government at the VA. Stodden also documented the framing questions that federal CTO Aneesh Chopra asked the academic community to help answer:

1. big data: how strengthen capacity to understand massive data?
2. new products: what constitutes high value data?
3. open platforms: what are the policy implications of enabling 3rd party apps?
4. international collaboration: what models translate to strengthen democracy internationally?
5. digital norms: what works and what doesn’t work in public engagement?

In the video below, former White House deputy CTO for open government Beth Noveck reflects on the outcomes and results of the open government R&D summit at the end of its second day. If you’re interested in a report from one of the organizers, you’d be hard pressed to do any better.

The end of the beginning for open government?

The open government R&D summit has since come under criticism from one of its attendees, Expert Labs’ director of engagement Clay Johnson, for being formulaic, “self congratulatory” and not tackling the hard problems that face the country. He challenged the community to do better:

These events need to solicit public feedback from communities and organizations and we need to start telling the stories of Citizen X asked for Y to happen, we thought about it, produced it and the outcome was Z. This isn’t to say that these events aren’t helpful. It’s good to get the open government crowd together in the same room every once and awhile. But knowing the talents and brilliant minds in the room, and the energy that’s been put behind the Open Government Directive, I know we’re not tackling the problems that we could.

Noveck responded to his critique in a comment where she observed that “Hackathons don’t substitute for inviting researchers — who have never been addressed — to start studying what’s working and what’s not in order to free up people like you (and I hope me, too) to innovate and try great new experiments and to inform our work. But it’s not enough to have just the academics without the practitioners and vice versa.”

Justin Grimes, a Ph.D. student who has been engaged in research in this space, was reflective after reading Johnson’s critique. “In the past few years, I’ve seen far more open gov events geared towards citizens, [developers], & industry than toward academics,” he tweeted. “Open gov is a new topic in academia; few people even know it’s out there; lot of potential there but we need more outreach. [The] purpose was to get more academics involved in conversation. Basically, government saying ‘Hey, look at our problems. Do research. Help us.’”

Johnson spoke with me earlier this year about what else he sees as the key trends of Gov 2.0 and open government, including transparency as infrastructure, smarter citizenship and better platforms. Given the focus he has put on doing, as opposed to researching or, say, “blogging about it,” it will be interesting to see what comes out of Johnson and Expert Labs next.