Last month, I traveled to Moldova to speak at a “smart society” summit hosted by the Moldovan national e-government center and the World Bank. I talked about what I’ve been seeing and reporting on around the world and some broad principles for “smart government.” It was one of the first keynote talks I’ve ever given and, from what I gather, it went well: the Moldovan government asked me to give a reprise to their cabinet and prime minister the next day.
I’ve embedded the entirety of the morning session above, including my talk (which is about half an hour long). I was preceded by professor Beth Noveck, the former deputy CTO for open government at The White House. If you watch the entire program, you’ll hear from:
Victor Bodiu, General Secretary, Government of the Republic of Moldova, National Coordinator, Governance e-Transformation Agenda
Dona Scola, Deputy Minister, Ministry of Information Technology and Communication
Andrew Stott, UK Transparency Board, former UK Government Director for Transparency and Digital Engagement
Victor Bodiu, General Secretary, Government of the Republic of Moldova
Arcadie Barbarosie, Executive Director, Institute of Public Policy, Moldova
Without planning on it, I managed to deliver a one-liner that morning that’s worth rephrasing and reiterating here: Smart government should not just serve citizens with smartphones.
I look forward to your thoughts and comments, for those of you who make it through the whole keynote.
In 2012, making sense of big data through narrative and context, particularly unstructured data, is now a strategic imperative for leaders around the world, whether they serve in Washington, run media companies or trading floors in New York City or guide tech titans in Silicon Valley.
While big data carries the baggage of huge hype, the institutions of federal government are getting serious about its genuine promise. On Thursday morning, the Obama Administration announced a “Big Data Research and Development Initiative,” with more than $200 million in new commitments. (See fact sheet provided by the White House Office of Science and Technology Policy at the bottom of this post.)
“In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security,” said Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy, in a prepared statement.
The research and development effort will focus on advancing “state-of-the-art core technologies” needed for big data, harnessing said technologies “to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning,” and “expand the workforce needed to develop and use Big Data technologies.”
In other words, the nation’s major research institutions will focus on improving available technology to collect and use big data, applying it to science and national security, and looking for ways to train more data scientists.
“IBM views Big Data as organizations’ most valuable natural resource, and the ability to use technology to understand it holds enormous promise for society at large,” said David McQueeney, vice president of software, IBM Research, in a statement. “The Administration’s work to advance research and funding of big data projects, in partnership with the private sector, will help federal agencies accelerate innovations in science, engineering, education, business and government.”
While $200 million is a relatively small amount of funding, particularly in the context of the federal budget or as compared to investments that are (probably) being made by Google or other major tech players, specific support for training and subsequent application of big data within federal government is important and sorely needed. The job market for data scientists in the private sector is so hot that government may well need to build up its own internal expertise, much in the same way LivingSocial is training coders at the Hungry Academy.
“Big data is a big deal,” blogged Tom Kalil, deputy director for policy at White House OSTP, on the White House blog this morning.
We also want to challenge industry, research universities, and non-profits to join with the Administration to make the most of the opportunities created by Big Data. Clearly, the government can’t do this on its own. We need what the President calls an “all hands on deck” effort.
Some companies are already sponsoring Big Data-related competitions, and providing funding for university research. Universities are beginning to create new courses—and entire courses of study—to prepare the next generation of “data scientists.” Organizations like Data Without Borders are helping non-profits by providing pro bono data collection, analysis, and visualization. OSTP would be very interested in supporting the creation of a forum to highlight new public-private partnerships related to Big Data.
The White House is hosting a forum today in Washington to explore the challenges and opportunities of big data and discuss the investment. The event will be streamed online in a live webcast from the headquarters of the AAAS in Washington, DC. I’ll be in attendance and sharing what I learn.
“Researchers in a growing number of fields are generating extremely large and complicated data sets, commonly referred to as ‘big data,'” reads the invitation to the event from the White House Office of Science and Technology Policy. “A wealth of information may be found within these sets, with enormous potential to shed light on some of the toughest and most pressing challenges facing the nation. To capitalize on this unprecedented opportunity — to extract insights, discover new patterns and make new connections across disciplines — we need better tools to access, store, search, visualize, and analyze these data.”
John Holdren, Assistant to the President and Director, White House Office of Science and Technology Policy
Subra Suresh, Director, National Science Foundation
Francis Collins, Director, National Institutes of Health
William Brinkman, Director, Department of Energy Office of Science
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.
The hot IT buzzword of 2012, big data has become viable as cost-effective approaches have emerged to tame the volume, velocity and variability of massive data. Within this data lie valuable patterns and information, previously hidden because of the amount of work required to extract them. To leading corporations, such as Walmart or Google, this power has been in reach for some time, but at fantastic cost. Today’s commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. Big data processing is eminently feasible for even the small garage startups, who can cheaply rent server time in the cloud.
To learn more about the growing ecosystem of big data tools, watch my interview with Cloudera architect Doug Cutting, embedded below. Cutting created Lucene and led the Hadoop project at Yahoo before he joined Cloudera. Apache Hadoop is an open source framework that allows distributed applications based upon the MapReduce paradigm to run on immense clusters of commodity hardware, which in turn enables the processing of massive amounts of data.
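For readers who have not met the MapReduce paradigm before, here is a minimal single-machine sketch of the idea in plain Python. It does not use Hadoop or any of its APIs; the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical and chosen only for illustration. A real cluster would run the map and reduce steps in parallel across many machines, which this toy version does not attempt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group every emitted value under its key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    pairs = (pair for doc in documents for pair in map_words(doc))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "open data"]))
```

The appeal of the pattern is that the map step looks at one document at a time and the reduce step looks at one key at a time, so both can be spread across many cheap machines.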
Details on the administration’s big data investments
A fact sheet released by the White House OSTP follows, verbatim:
“National Science Foundation and the National Institutes of Health – Core Techniques and Technologies for Advancing Big Data Science & Engineering
“Big Data” is a new joint solicitation supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) that will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse data sets. This will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible. NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to health and disease.
National Science Foundation: In addition to funding the Big Data solicitation, and keeping with its focus on basic research, NSF is implementing a comprehensive, long-term strategy that includes new methods to derive knowledge from data; infrastructure to manage, curate, and serve data to communities; and new approaches to education and workforce development. Specifically, NSF is:
· Encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers;
· Funding a $10 million Expeditions in Computing project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information – machine learning, cloud computing, and crowd sourcing;
· Providing the first round of grants to support “EarthCube” – a system that will allow geoscientists to access, analyze and share information about our planet;
· Issuing a $2 million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data.
· Providing $1.4 million in support for a focused research group of statisticians and biologists to determine protein structures and biological pathways.
· Convening researchers across disciplines to determine how Big Data can transform teaching and learning.
Department of Defense – Data to Decisions: The Department of Defense (DoD) is “placing a big bet on big data” investing approximately $250 million annually (with $60 million available for new research projects) across the Military Departments in a series of programs that will:
*Harness and utilize massive data in new ways and bring together sensing, perception and decision support to make truly autonomous systems that can maneuver and make decisions on their own.
*Improve situational awareness to help warfighters and analysts and provide increased support to operations. The Department is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe.
To accelerate innovation in Big Data that meets these and other requirements, DoD will announce a series of open prize competitions over the next several months.
In addition, the Defense Advanced Research Projects Agency (DARPA) is beginning the XDATA program, which intends to invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include:
· Developing scalable algorithms for processing imperfect data in distributed data stores; and
· Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.
The XDATA program will support open source software toolkits to enable flexible software development for users to process large volumes of data in timelines commensurate with mission workflows of targeted defense applications.
National Institutes of Health – 1000 Genomes Project Data Available on Cloud: The National Institutes of Health is announcing that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publically available data set for free and researchers only will pay for the computing services that they use.
Department of Energy – Scientific Discovery Through Advanced Computing: The Department of Energy will provide $25 million in funding to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute. Led by the Energy Department’s Lawrence Berkeley National Laboratory, the SDAV Institute will bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the Department’s supercomputers, which will further streamline the processes that lead to discoveries made by scientists using the Department’s research facilities. The need for these new tools has grown as the simulations running on the Department’s supercomputers have increased in size and complexity.
US Geological Survey – Big Data for Earth System Science: USGS is announcing the latest awardees for grants it issues through its John Wesley Powell Center for Analysis and Synthesis. The Center catalyzes innovative thinking in Earth system science by providing scientists a place and time for in-depth analysis, state-of-the-art computing capabilities, and collaborative tools invaluable for making sense of huge data sets. These Big Data projects will improve our understanding of issues such as species response to climate change, earthquake recurrence rates, and the next generation of ecological indicators.”
Further details about each department’s or agency’s commitments can be found at the following websites by 2 pm today:
The Internet makes it likely that more campaigns will be self-directed from the grass roots. The tea party movement, for example, would have been impossible to organize and coordinate without email and the Web. Thus campaign managers will have to rely less on activity in centralized headquarters and more on volunteers—working at their pace and in their way—to reach voters on their laptops, tablets and smart phones.
Cutting-edge campaigns have quickly grasped how the Web makes it easier and less expensive to transmit information. But campaigns are only starting to understand how to use the Web and social-networking tools to make video and other data go viral—moving not just to those on a campaign’s email list but to the broader public.
It took decades for the changes inaugurated by the “We Like Ike” TV ads to fully take hold. It will likewise take time for political practitioners to figure out what works and what doesn’t work on the Internet. But we are seeing a version of Joseph Schumpeter’s “creative destruction” fundamentally alter the landscape of American politics. It will have huge implications on how campaigns are run, who we elect, and what kind of country we become.
A year later, we’re seeing that reality writ large upon the canvas of the 2012 elections. The portrait of the impact of the Internet and mobile devices upon the decisions that Saliterman painted through statistics offers a glimpse at where the future is trending. (Sources noted where provided.)
One third of voters learn from online-only sources. (Pew).
33% of likely voters don’t watch live TV. (Accenture)
70% of likely Republican voters in South Carolina went online before the primary.
2012 Primary voters viewed 14-20 sources before voting.
49% of people compared different candidates online.
In that context, Saliterman shared with the room of Washington politicos and media three ways that campaigns are using the Internet (or, more specifically, Google products) to reach voters and influence the political conversation:
Google search advertising, used for rapid response to the political news cycle, anticipating what people are searching for and putting a campaign or media’s story where it will be found.
Geotargeted advertising, where likely voters in a primary, municipal election or state election can be served contextual messages based upon the location from which they’re accessing a webpage
Promoted video ads on YouTube, the world’s biggest video platform
There’s also a directory of public data that contains information on countries far beyond the borders of the U.S. that will be of interest to journalists and researchers who are not engaged in electoral politics.
Data from a new study on the use of Twitter by U.S. Senators and Representatives by public relations giant Edelman strongly suggests that the Grand Old Party has opened up a grand old lead in its use of the popular microblogging platform in just about every metric.
On Twitter’s 6th birthday, there’s more political speech flowing through tweets than ever. Twitter data from the study, as provided by Simply Measured, showed that on Twitter, Republican lawmakers are mentioned more, reply more often, are retweeted more, share more links to rich content and webpages, and reference specific bills much more often. Republicans tweet about legislation 3.5 times more than Democrats.
There are also more Republicans on Twitter: while the 89 U.S. Senators who tweet are evenly split, with one more Republican Senator tipping the balance, in the U.S. House there are 67 more Republican Representatives expressing themselves in 140 characters or less.
At this point, it’s worth noting that one of Twitter’s government leads in DC estimated earlier this year that only 15-20% of Congressional Twitter accounts are actually being updated by the Congressmen themselves, but the imbalance stands.
While the ways that governments deal with social media cannot be measured by one platform alone nor the activity upon it, the data in the embedded study below will be of interest to many, particularly as the window for Congress to pass meaningful legislation narrows as the full election season looms this summer.
In the context of social media and election 2012, how well a Representative or Senator is tweeting could be assessed by whether they can use Twitter to build awareness of political platforms, respond to opposing campaigns or, perhaps most importantly for the purposes of the election, reach potential voters, help get them registered, and bring them to the polls.
Outreach and transparency are both valuable to a healthy democracy, and to some extent, it is re-assuring that Twitter use is motivated by both reasons. An interesting counter-factual situation would be if the Republicans were the majority party. We may therefore ask in that situation: Is the desire to reach out to (opposing) voters strongest for “losing” parties? Our study certainly hints that Republicans are not only motivated to use Twitter as a means to reach out to their own followers, but also to Democrats, as they are more likely to use Twitter in cases where their district was overwhelmingly in favor of President Barack Obama.
All-in-all, it would seem like Twitter is good for the whole Gov 2.0 idea. If Republicans are using Twitter as a means for outreach, then more bills may be passed (note: this has yet to be tested empirically, and still remains an open question for researchers). If Democrats are using Twitter as a means for transparency, then the public benefits from the stronger sense of accountability.
The future of cities was a hot topic this year at the SXSW Interactive Festival in Austin, Texas, with two different panels devoted to thinking about what’s next. I moderated one of them, on “shaping cities with mobile data.” Megan Schumann, a consultant at Deloitte, was present at both sessions and storified them. Her curation should give you a sense of the zeitgeist of ideas shared.
If the town square now includes public discourse online, democratic governments in the 21st century are finding that part of civic life now includes listening there. Given what we’ve seen in this young century, how governments deal with social media is now part of how they deal with civil liberties, press freedom, privacy and freedom of expression in general.
At the end of Social Media Week 2012, I moderated a discussion with Matt Lira, Lorelei Kelly and Clay Johnson at the U.S. National Archives. This conversation explored more than how social media is changing politics in Washington: we looked at its potential to help elected officials and other public servants make better policy decisions in the 21st century.
I hope you find it of interest; all three of the panelists gave thoughtful answers to the questions that I and the audience posed.
A new paper on “The New Ambiguity of ‘Open Government’” by Princeton scholars David Robinson and Harlan Yu is essential reading on the state of open government and open data in 2012. As the Cato Institute’s Jim Harper noted in a post about the new paper and open government data this morning, “paying close attention to language can reveal what’s going on in the world around you.”
“Open technologies involve sharing data over the Internet, and all kinds of governments can use them, for all kinds of reasons. Recent public policies have stretched the label “open government” to reach any public sector use of these technologies. Thus, “open government data” might refer to data that makes the government as a whole more open (that is, more transparent), but might equally well refer to politically neutral public sector disclosures that are easy to reuse, but that may have nothing to do with public accountability. Today a regime can call itself “open” if it builds the right kind of web site—even if it does not become more accountable or transparent. This shift in vocabulary makes it harder for policymakers and activists to articulate clear priorities and make cogent demands.
This essay proposes a more useful way for participants on all sides to frame the debate: We separate the politics of open government from the technologies of open data. Technology can make public information more adaptable, empowering third parties to contribute in exciting new ways across many aspects of civic life. But technological enhancements will not resolve debates about the best priorities for civic life, and enhancements to government services are no substitute for public accountability.”
Yu succinctly explained his thinking in two more tweets:
“Open” causes confusion: it describes both governments and data. Are we talking about “open (government data)” or “(open government) data”?
While it remains to be seen whether the Open Knowledge Foundation will be “open” to changing the “Open Data Handbook” to the “Adaptable Data Handbook,” Yu and Robinson are after something important here.
There’s good reason to be careful about celebrating the progress that cities, states and counties are making in standing up open government data platforms. Here’s an excerpt from a post on open government data on Radar last year:
Open government analysts like Nathaniel Heller have raised concerns about the role of open data in the Open Government Partnership, specifically that:
“… open data provides an easy way out for some governments to avoid the much harder, and likely more transformative, open government reforms that should probably be higher up on their lists. Instead of fetishizing open data portals for the sake of having open data portals, I’d rather see governments incorporating open data as a way to address more fundamental structural challenges around extractives (through maps and budget data), the political process (through real-time disclosure of campaign contributions), or budget priorities (through online publication of budget line-items).”
Similarly, Greg Michener has made a case for getting the legal and regulatory “plumbing” for open government right in Brazil, not “boutique Gov 2.0” projects that graft technology onto flawed governance systems. Michener warned that emulating the government 2.0 initiatives of advanced countries, including open data initiatives:
“… may be a premature strategy for emerging democracies. While advanced democracies are mostly tweaking and improving upon value-systems and infrastructure already in place, most countries within the OGP have only begun the adoption process.”
Michener and Heller both raise bedrock issues for open government in Brazil and beyond that no technology solution in and of itself will address. They’re both right: Simply opening up data is not a replacement for a Constitution that enforces a rule of law, free and fair elections, an effective judiciary, decent schools, basic regulatory bodies or civil society, particularly if the data does not relate to meaningful aspects of society.
Heller and Michener speak for an important part of the open government community and surely articulate concerns that exist for many people, particularly for a “good government” constituency whose long term, quiet work on government transparency and accountability may not be receiving the same attention as shinier technology initiatives.
Harper teased out something important on that count: “There’s nothing wrong with open government data, but the heart of the government transparency effort is getting information about the functioning of government. I think in terms of a subject-matter trio—deliberations, management, and results—data about which makes for a more open, more transparent government. Everything else, while entirely welcome, is just open government data.”
This new paper will go a long way to clarifying and teasing out those issues.
A new survey report on “the tone of life on social networking sites” from the Pew Research Center’s Internet & American Life Project found that 85% of American adults who use social media say people are “mostly kind” on those sites:
These attitudes will naturally be of great interest to practitioners in fields from open government to education who are considering the use of social media for public engagement, civic participation and deliberative democracy, and they offer grist for digital ethnographers of both the amateur and professional variety.
More stats, excerpted from the report:
“A nationally representative phone survey of American adults finds that:
*85% of SNS-using adults say that their experience on the sites is that people are mostly kind, compared with 5% who say people they observe on the sites are mostly unkind and another 5% who say their answer depends on the situation.
*68% of SNS users said they had an experience that made them feel good about themselves.
*61% had experiences that made them feel closer to another person. (Many said they had both experiences.)
*39% of SNS-using adults say they frequently see acts of generosity by other SNS users and another 36% say they sometimes see others behaving generously and helpfully. By comparison, 18% of SNS-using adults say they see helpful behavior “only once in a while” and 5% say they never see generosity exhibited by others on social networking sites.”
Today in Washington, the School without Walls was full of civic energy around open data, tech, community, bikes, smart cities, systems, efficiency, sustainability, accessibility, trains, buses, hacking, social networking, research, policy, crowdsourcing and more. Transportation Camp, an “unconference” generated by its attendees, featured dozens of sessions on all of those topics and more. As I’ve reported before, transit data is open government fuel for economic growth.
Below, the stories told in the tweets from the people show how much more there is to the world of transit than data alone. Their enthusiasm and knowledge made the 2012 iteration of Transportation Camp in the District a success.
As I wrote in December, one of the big unanswered questions about the Stop Online Piracy Act and its companion bill in the Senate, the PROTECT IP Act, is whether Internet companies would directly engage hundreds of millions of users to advocate against the bill for them in Washington, in the way that Tumblr did last November. To date, Facebook and Google have not committed to doing so.
Today, Twitter CEO Dick Costolo indicated to me that the California-based social media company that he leads will not be ‘shutting down’ on Wednesday, but that it would continue to be ‘very active.’ The Guardian has picked up on our exchange, publishing a story that mischaracterized Costolo’s response to a question about Twitter shutting down as him calling Wikipedia’s SOPA protest ‘silly.’ What Costolo made clear later when Wales asked him about the story, however, is that he was referring to Twitter making such a choice, not Wikipedia.
Following is a storify of the relevant tweets, along with some context for them.
My sense is that, of all of the major social media players (which in 2012 include Google, Facebook, LinkedIn, Yahoo, Tumblr and MySpace, amongst others), Twitter has been one of the leaders in the technology community for sticking up for its users where it can, particularly with respect to fighting to make public a subpoena from the U.S. Justice Department regarding Twitter user data.
Whether or not Twitter ultimately decides to “shut down” or “black out,” Twitter’s general counsel, Alex Macgillivray, deserves all due credit for that decision and others, along with the lucid blog post that explained how SOPA would affect ordinary, non-infringing users.
For a fuller explanation of why these issues matter, I highly recommend reading “Consent of the Networked,” a new must-read book on Internet freedom by former CNN journalist and co-founder of Global Voices Rebecca MacKinnon. These sorts of decisions and precedents are deeply important in the 21st century, when much of what people think of as speech in the new public square is hosted upon the servers of private services like Twitter, Facebook and Google.
This post has been updated to clarify Costolo’s position, with respect to how the Guardian framed his initial response.