big data

Election 2012: A #SocialElection Driven By The Data

Social media was a bigger part of the 2012 election season than ever before, from the enormous volume of Facebook updates and tweets, to memes during the Presidential debates, to awareness in popular culture of what the campaigns were doing on those platforms. Facebook may even have boosted President Obama’s vote tally.

While it’s too early to say if any of the plethora of platforms played any sort of determinative role in 2012, strong interest in what social media meant in this election season led me to participate in two panels in the past two weeks: one during DC Week 2012 and another at the National Press Club, earlier today. Storifies of the online conversations during each one are embedded below.



The big tech story of this campaign, however, was not social media. As Micah Sifry presciently observed last year, it wasn’t (just) about Facebook: “it’s the data, stupid.” And by building this re-election campaign like an Internet company, the Obama campaign’s team of engineers created digital infrastructure that helped deliver the 2012 election.

Do newspapers need to adopt data science to cover campaigns?

Last October, New York Times elections developer Derek Willis was worried about what we don’t know about elections:

While campaigns have a public presence that is mostly recorded and observed, the stuff that goes on behind the scenes is so much more sophisticated than it has been. In 2008 we were fascinated by the Obama campaign’s use of iPhones for data collection; now we’re entering an age where campaigns don’t just collect information by hand, but harvest it and learn from it. An “information arms race,” as GOP consultant Alex Gage puts it.

For most news organizations, the standard approach to campaign coverage is tantamount to bringing a knife to a gun fight. How many data scientists work for news organizations? We are falling behind, and we risk not being able to explain to our readers and users how their representatives get elected or defeated.

Writing for the New York Times today, Slate columnist Sasha Issenberg revisited that theme, arguing that campaign reporters are behind the curve in understanding, analyzing or capably replicating what political campaigns are now doing with data. Whether you’re new to the role big data played in this campaign or already fascinated by it, a recent online conference on the data-driven politics of 2012 will be of interest. I’ve embedded it below:

Issenberg’s post has stirred online debate amongst journalists, academics and at least one open government technologist. I’ve embedded a storify of them below.


Social citizenship: CNN and Facebook to partner on “I’m Voting” app in 2012 election

Two years ago, I wondered whether “social voting” on Foursquare would increase voter participation.

That experiment is about to be writ much larger. In a release today, first reported (as far as I can tell) by Mike Allen in Politico Playbook, CNN and Facebook announced that they will be partnering on an “I’m Voting” Facebook app that will display commitments to vote on timelines, newsfeeds and the “real-time ticker” in Facebook.

“Each campaign cycle brings new technologies that enhance the way that important connections between citizens and their elected representatives are made. Though the mediums have changed, the critical linkages between candidates and voters remain,” said Joel Kaplan, Facebook Vice President-U.S. Public Policy, in a prepared statement. “Innovations like Facebook can help transform this informational experience into a social one for the American people.”

“By allowing citizens to connect in an authentic and meaningful way with presidential candidates and discuss critical issues facing the country, we hope more voters than ever will get involved with issues that matter most to them,” said Joe Lockhart, Facebook Vice President Corporate Communications, in a prepared statement. “Facebook is pleased to partner with CNN on this uniquely participatory experience.”

“We fundamentally changed the way people consume live event coverage, setting a record for the most-watched live video event in Internet history, when we teamed up with Facebook for the 2009 Inauguration of President Obama,” said KC Estenson, SVP CNN Digital, in a prepared statement. “By again harnessing the power of the Facebook platform and coupling it with the best of our journalism, we will redefine how people engage in the democratic process and advance the way a news organization covers a national election.”

“This partnership doubles down on CNN’s mission to provide the most engaging coverage of the 2012 election season,” said Sam Feist, CNN Washington bureau chief, in a prepared statement. “CNN’s unparalleled political reporting combined with Facebook’s social connectivity will empower more American voters in this critical election season.”

What will ‘social citizenship’ mean?

There’s also a larger question about the effect of these technologies on society: Will social networks encouraging people to share their voting behavior lead to more engagement throughout the year? After all, people are citizens 365 days a year, not just every two years on election day. Will “social citizenship” play a role in Election 2012?

In 2010, Foursquare founder Dennis Crowley said yes. As has often been the case (Dodgeball, anyone?), Crowley may well have been ahead of his time.

“One of the things that we’re finding is that when people send their Foursquare checkins out to Twitter and to Facebook, it can drive behaviors,” said Crowley in 2010. “If I check into a coffee shop all the time, my friends are going to be like, hey, I want to go to that coffee shop. We’re thinking the same thing could happen en masse if you start checking into these polling stations, if you start broadcasting that you voted, it may encourage other friends to go out there and do something.”

The early evidence, at least from healthcare in 2010, was that social sharing can raise awareness and promote health. Whether civic health improves, at least as measured by voter participation, is another matter. How you voted used to be a question that each registered citizen could choose to keep to himself or herself. In 2012 and the age of social media, that social norm may be shifting.

One clear winner in Election 2012, however, will almost certainly be Facebook, which will be collecting a lot of data about users who participate in this app and associated surveys, and that data will be of great interest to political scientists and future campaigns alike.

“Since both CNN and [Facebook] are commercial entities, and since data collection/tracking practices in these apps are increasingly invasive, I am curious to see how these developments impact the evolution of the currently outdated US privacy regime,” commented Vivian Tero, an IDC analyst focused on governance, risk and compliance.

UPDATE: The Poynter Institute picked up this story and connected it in a tweet with a recent AdWeek interview with CNN digital senior vice president and general manager KC Estenson on “CNN’s digital power play.”

Estenson, whose network has been suffering from lower ratings of late, notes that online, CNN is now “regularly getting 60 million unique users,” with an “average 20 million minutes a month across the platforms” and CNN Digital generating 110 million video streams per month.

That kind of traffic could power a lot of Likes.

Full release by Facebook on U.S. Politics over on Facebook.

This post has been updated as more information became available, via Facebook spokesman Andrew Noyes.

U.S. cities form working group to share predictive data analytics skills

Yesterday, I published an interview with Michael Flowers, New York City’s director of analytics for the Office of Policy and Strategic Planning in Mayor Bloomberg’s office. In the interview, “Predictive data analytics is saving lives and taxpayer dollars in New York City,” Flowers talks about how his five-person team is applying data analysis on behalf of citizens to make processes more efficient and to detect crimes more effectively, from financial fraud to cigarette bootlegging.

After our interview, Flowers followed up over email to tell me about a new working group on data analytics between New York City, Boston, Chicago and Philadelphia. The working group, which recently launched a website at www.g-analytics.org, is sharing methodologies, ideas and strategies.

“Ultimately we want the group to grow and support as many cities interested in pursuing this approach as possible,” wrote Flowers. “It can get pretty lonely when you pursue something asymmetrical or untraditional in the government space, so we felt it was important to make it as simple as possible for like-minded cities to get started. There’s a great guy I work closely with out in Chicago on this effort – [Chicago chief data officer] Brett Goldstein; we talk at least twice a week.”

What is smart government?

Last month, I traveled to Moldova to speak at a “smart society” summit hosted by the Moldovan national e-government center and the World Bank. I talked about what I’ve been seeing and reporting on around the world and some broad principles for “smart government.” It was one of the first keynote talks I’ve ever given and, from what I gather, it went well: the Moldovan government asked me to give a reprise to their cabinet and prime minister the next day.

I’ve embedded the entirety of the morning session above, including my talk (which is about half an hour long). I was preceded by Professor Beth Noveck, the former deputy CTO for open government at The White House. If you watch the entire program, you’ll hear from:

  • Victor Bodiu, General Secretary, Government of the Republic of Moldova, National Coordinator, Governance e-Transformation Agenda
  • Dona Scola, Deputy Minister, Ministry of Information Technology and Communication
  • Andrew Stott, UK Transparency Board, former UK Government Director for Transparency and Digital Engagement
  • Arcadie Barbarosie, Executive Director, Institute of Public Policy, Moldova

Without planning on it, I managed to deliver a one-liner that morning that’s worth rephrasing and reiterating here: Smart government should not just serve citizens with smartphones.

I look forward to your thoughts and comments, for those of you who make it through the whole keynote.

Startup Weekend DC kickoff highlights open data, startups and disruptive innovation

On Friday night, a packed room of eager potential entrepreneurs, developers and curious citizens watched US CTO Todd Park and Bill Eggers kick off Startup Weekend DC in Microsoft’s offices in Chevy Chase, Maryland.

Park brought his customary energy and geeky humor to his short talk, pitching the assembled crowd on using open government data in their ideas.

Park wants to inject open data as a “fuel” into the economy. After talking about the success of the Health Data Initiative and the Health Datapalooza, he shared a series of websites where aspiring entrepreneurs could find data to use:

Park also made an “ask” of the attendees of Startup Weekend DC that I haven’t heard from many government officials: he asked them to let him know if they A) use the data and/or B) run into any trouble accessing it.

“If you had a hard time or found a particular restful API moving, let me know,” he said. “It helps us improve our performance.” And then he gave out his email address at the White House Executive Office of the President, as he did at SXSW Interactive in Austin in March of this year. Asking the public, particularly entrepreneurs and developers, for feedback on data quality and providing contact information to do so is, to put it bluntly, something every city and state official who has stood up an open data platform could and should be doing. In this context, the US CTO has set a notable example for the country.

Examples of startups, gap filling and civic innovation

Following Park, author and Deloitte consultant Bill Eggers talked about innovative startups and the public sector. I’ve embedded video of his talk below:

Eggers cited three different startups in his talk: Recycle Bank, Avego and Kaggle.

1) Recycle Bank’s use of gamification led to a 19-fold increase in recycling in some cities, said Eggers. The startup now has 3 million members and is setting its sights on New York City.

2) The real-time ridesharing provided by Avego holds the promise to hugely reduce traffic congestion, said Eggers. According to the stats he cited, 80% of people on the road are currently driving in cars by themselves. Avego has raised tens of millions of dollars to try to better optimize transportation.

3) With Kaggle, Anthony Goldbloom found a gap in the big data market, said Eggers: matching data challenges with data scientists. There are now some 19,000 registered data scientists in the Kaggle database.

Eggers cited the success of a competition on Kaggle to map dark matter, a problem on which millions had already been spent. The results of this open innovation were better than anything scientists had achieved before the competition. Kaggle has created a market out of writing better algorithms.

After Eggers spoke, the organizers of Startup Weekend explained how the rest of the weekend would proceed and asked attendees to pitch their ideas. One particular idea, for this correspondent, stood out, primarily because of the young fellows pitching it:

White House announces $200 million in funding for big data research and development, hosts forum at AAAS

In 2012, making sense of big data, particularly unstructured data, through narrative and context is a strategic imperative for leaders around the world, whether they serve in Washington, run media companies or trading floors in New York City, or guide tech titans in Silicon Valley.

While big data carries the baggage of huge hype, the institutions of the federal government are getting serious about its genuine promise. On Thursday morning, the Obama Administration announced a “Big Data Research and Development Initiative,” with more than $200 million in new commitments. (See the fact sheet provided by the White House Office of Science and Technology Policy at the bottom of this post.)

“In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security,” said Dr. John P. Holdren, Assistant to the President and Director of the White House Office of Science and Technology Policy, in a prepared statement.

The research and development effort will focus on advancing “state-of-the-art core technologies” needed for big data, harnessing said technologies “to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning,” and “expand the workforce needed to develop and use Big Data technologies.”

In other words, the nation’s major research institutions will focus on improving the technology available to collect and use big data, applying it to science and national security, and looking for ways to train more data scientists.

“IBM views Big Data as organizations’ most valuable natural resource, and the ability to use technology to understand it holds enormous promise for society at large,” said David McQueeney, vice president of software, IBM Research, in a statement. “The Administration’s work to advance research and funding of big data projects, in partnership with the private sector, will help federal agencies accelerate innovations in science, engineering, education, business and government.”

While $200 million is a relatively small amount of funding, particularly in the context of the federal budget or compared to the investments (probably) being made by Google or other major tech players, specific support for training and subsequent application of big data within the federal government is important and sorely needed. The job market for data scientists in the private sector is so hot that government may well need to build up its own internal expertise, much in the same way Living Social is training coders at the Hungry Academy.

“Big data is a big deal,” blogged Tom Kalil, deputy director for policy at White House OSTP, at the White House blog this morning:

We also want to challenge industry, research universities, and non-profits to join with the Administration to make the most of the opportunities created by Big Data. Clearly, the government can’t do this on its own. We need what the President calls an “all hands on deck” effort.

Some companies are already sponsoring Big Data-related competitions, and providing funding for university research. Universities are beginning to create new courses—and entire courses of study—to prepare the next generation of “data scientists.” Organizations like Data Without Borders are helping non-profits by providing pro bono data collection, analysis, and visualization. OSTP would be very interested in supporting the creation of a forum to highlight new public-private partnerships related to Big Data.

The White House is hosting a forum today in Washington to explore the challenges and opportunities of big data and discuss the investment. The event will be streamed online in a live webcast from the headquarters of the AAAS in Washington, DC. I’ll be in attendance and sharing what I learn.

“Researchers in a growing number of fields are generating extremely large and complicated data sets, commonly referred to as ‘big data,’” reads the invitation to the event from the White House Office of Science and Technology Policy. “A wealth of information may be found within these sets, with enormous potential to shed light on some of the toughest and most pressing challenges facing the nation. To capitalize on this unprecedented opportunity — to extract insights, discover new patterns and make new connections across disciplines — we need better tools to access, store, search, visualize, and analyze these data.”

Speakers:

  • John Holdren, Assistant to the President and Director, White House Office of Science and Technology Policy
  • Subra Suresh, Director, National Science Foundation
  • Francis Collins, Director, National Institutes of Health
  • William Brinkman, Director, Department of Energy Office of Science

Panel discussion:

  • Moderator: Steve Lohr, New York Times, author of “Big Data’s Impact in the World”
  • Alex Szalay, Johns Hopkins University
  • Lucila Ohno-Machado, UC San Diego
  • Daphne Koller, Stanford
  • James Manyika, McKinsey

What is big data?

Anyone planning to use big data for public good (or profit) through applied data science must first understand what big data is.

On that count, turn to my colleague Edd Dumbill, who posted a useful definition last year on the O’Reilly Radar in his introduction to the big data landscape:

Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.

The hot IT buzzword of 2012, big data has become viable as cost-effective approaches have emerged to tame the volume, velocity and variability of massive data. Within this data lie valuable patterns and information, previously hidden because of the amount of work required to extract them. To leading corporations, such as Walmart or Google, this power has been in reach for some time, but at fantastic cost. Today’s commodity hardware, cloud architectures and open source software bring big data processing into the reach of the less well-resourced. Big data processing is eminently feasible for even the small garage startups, who can cheaply rent server time in the cloud.

Teams of data scientists are increasingly drawing on a powerful, growing set of common tools, whether they work alongside government technologists opening up cities, developers driving a revolution in healthcare, or hacks and hackers defining the practice of data journalism.

To learn more about the growing ecosystem of big data tools, watch my interview with Cloudera architect Doug Cutting, embedded below. @Cutting created Lucene and led the Hadoop project at Yahoo before he joined Cloudera. Apache Hadoop is an open source framework that allows distributed applications based upon the MapReduce paradigm to run on immense clusters of commodity hardware, which in turn enables the processing of massive amounts of data.
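For readers who haven't met MapReduce before, here is a minimal, single-process Python sketch of the paradigm, using word counting, the customary toy example. It is not Hadoop code: Hadoop's own API is Java-based and spreads these phases across a cluster. The input chunks and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input.

    In a real cluster, many copies of this function run in parallel,
    each on a different slice of the data set.
    """
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(pairs):
    """Shuffle: group every intermediate value under its key.

    A framework like Hadoop performs this step between map and reduce,
    routing all pairs with the same key to the same reducer.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key (here, sum the counts)."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle and reduce in sequence on a list of text chunks."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input: each string stands in for a block of a much larger file.
    chunks = ["big data is a big deal", "data to decisions"]
    print(word_count(chunks))
```

The design choice that matters is that each map call sees only one chunk and each reduce call sees only one key. That independence is what lets a framework run thousands of copies at once on commodity hardware, leaving the grouping step in between to the framework.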

Details on the administration’s big data investments

A fact sheet released by the White House OSTP follows, verbatim:

National Science Foundation and the National Institutes of Health – Core Techniques and Technologies for Advancing Big Data Science & Engineering

“Big Data” is a new joint solicitation supported by the National Science Foundation (NSF) and the National Institutes of Health (NIH) that will advance the core scientific and technological means of managing, analyzing, visualizing, and extracting useful information from large and diverse data sets. This will accelerate scientific discovery and lead to new fields of inquiry that would otherwise not be possible. NIH is particularly interested in imaging, molecular, cellular, electrophysiological, chemical, behavioral, epidemiological, clinical, and other data sets related to health and disease.

National Science Foundation: In addition to funding the Big Data solicitation, and keeping with its focus on basic research, NSF is implementing a comprehensive, long-term strategy that includes new methods to derive knowledge from data; infrastructure to manage, curate, and serve data to communities; and new approaches to education and workforce development. Specifically, NSF is:

· Encouraging research universities to develop interdisciplinary graduate programs to prepare the next generation of data scientists and engineers;
· Funding a $10 million Expeditions in Computing project based at the University of California, Berkeley, that will integrate three powerful approaches for turning data into information – machine learning, cloud computing, and crowd sourcing;
· Providing the first round of grants to support “EarthCube” – a system that will allow geoscientists to access, analyze and share information about our planet;
· Issuing a $2 million award for a research training group to support training for undergraduates to use graphical and visualization techniques for complex data.
· Providing $1.4 million in support for a focused research group of statisticians and biologists to determine protein structures and biological pathways.
· Convening researchers across disciplines to determine how Big Data can transform teaching and learning.

Department of Defense – Data to Decisions: The Department of Defense (DoD) is “placing a big bet on big data” investing approximately $250 million annually (with $60 million available for new research projects) across the Military Departments in a series of programs that will:

· Harness and utilize massive data in new ways and bring together sensing, perception and decision support to make truly autonomous systems that can maneuver and make decisions on their own.
· Improve situational awareness to help warfighters and analysts and provide increased support to operations. The Department is seeking a 100-fold increase in the ability of analysts to extract information from texts in any language, and a similar increase in the number of objects, activities, and events that an analyst can observe.

To accelerate innovation in Big Data that meets these and other requirements, DoD will announce a series of open prize competitions over the next several months.

In addition, the Defense Advanced Research Projects Agency (DARPA) is beginning the XDATA program, which intends to invest approximately $25 million annually for four years to develop computational techniques and software tools for analyzing large volumes of data, both semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g., text documents, message traffic). Central challenges to be addressed include:

· Developing scalable algorithms for processing imperfect data in distributed data stores; and
· Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.

The XDATA program will support open source software toolkits to enable flexible software development for users to process large volumes of data in timelines commensurate with mission workflows of targeted defense applications.

National Institutes of Health – 1000 Genomes Project Data Available on Cloud: The National Institutes of Health is announcing that the world’s largest set of data on human genetic variation – produced by the international 1000 Genomes Project – is now freely available on the Amazon Web Services (AWS) cloud. At 200 terabytes – the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs – the current 1000 Genomes Project data set is a prime example of big data, where data sets become so massive that few researchers have the computing power to make best use of them. AWS is storing the 1000 Genomes Project as a publically available data set for free and researchers only will pay for the computing services that they use.

Department of Energy – Scientific Discovery Through Advanced Computing: The Department of Energy will provide $25 million in funding to establish the Scalable Data Management, Analysis and Visualization (SDAV) Institute. Led by the Energy Department’s Lawrence Berkeley National Laboratory, the SDAV Institute will bring together the expertise of six national laboratories and seven universities to develop new tools to help scientists manage and visualize data on the Department’s supercomputers, which will further streamline the processes that lead to discoveries made by scientists using the Department’s research facilities. The need for these new tools has grown as the simulations running on the Department’s supercomputers have increased in size and complexity.

US Geological Survey – Big Data for Earth System Science: USGS is announcing the latest awardees for grants it issues through its John Wesley Powell Center for Analysis and Synthesis. The Center catalyzes innovative thinking in Earth system science by providing scientists a place and time for in-depth analysis, state-of-the-art computing capabilities, and collaborative tools invaluable for making sense of huge data sets. These Big Data projects will improve our understanding of issues such as species response to climate change, earthquake recurrence rates, and the next generation of ecological indicators.”

Further details about each department’s or agency’s commitments can be found at the following websites by 2 pm today:

NSF: http://www.nsf.gov/news/news_summ.jsp?cntn_id=123607
HHS/NIH: http://www.nih.gov/news/health/mar2012/nhgri-29.htm
DOE: http://science.energy.gov/news/
DOD: www.DefenseInnovationMarketplace.mil
DARPA: http://www.darpa.mil/NewsEvents/Releases/2012/03/29.aspx
USGS: http://powellcenter.usgs.gov

IBM infographic on big data

Big Data: The New Natural Resource

This post and headline have been updated as more information on the big data R&D initiative became available.

Googling the 2012 election

Lunch with @stiles @ethanklapper @ginnyhunt et al to hear about new elections tech http://google.com/elections

The Internet will be a core component of the 2012 election cycle. Of course, if you follow technology and politics, you know that’s been increasingly true for years. Last week, speaking at a briefing in Google’s DC offices, Google’s Rob Saliterman cited a 3/10/2011 op-ed by Karl Rove in the Wall Street Journal, in which Rove wrote that the impact of the Internet on elections has only begun to be felt:

The Internet makes it likely that more campaigns will be self-directed from the grass roots. The tea party movement, for example, would have been impossible to organize and coordinate without email and the Web. Thus campaign managers will have to rely less on activity in centralized headquarters and more on volunteers—working at their pace and in their way—to reach voters on their laptops, tablets and smart phones.

Cutting-edge campaigns have quickly grasped how the Web makes it easier and less expensive to transmit information. But campaigns are only starting to understand how to use the Web and social-networking tools to make video and other data go viral—moving not just to those on a campaign’s email list but to the broader public.

It took decades for the changes inaugurated by the “We Like Ike” TV ads to fully take hold. It will likewise take time for political practitioners to figure out what works and what doesn’t work on the Internet. But we are seeing a version of Joseph Schumpeter’s “creative destruction” fundamentally alter the landscape of American politics. It will have huge implications on how campaigns are run, who we elect, and what kind of country we become.

A year later, we’re seeing that reality writ large upon the canvas of the 2012 elections. The statistical portrait Saliterman painted of how the Internet and mobile devices affect voters’ decisions offers a glimpse of where the future is trending. (Sources noted where provided.)

  • 83% of mobile phone owners are registered voters. (Nielsen Mobile)
  • One third of voters learn from online-only sources. (Pew)
  • 33% of likely voters don’t watch live TV. (Accenture)
  • 70% of likely Republican voters in South Carolina went online before the primary.
  • 2012 Primary voters viewed 14-20 sources before voting.
  • 49% of people compared different candidates online.

Political campaigns using geotargeted, contextual search ads for rapid response in primaries, says @robsaliterman

In that context, Saliterman shared with the room of Washington politicos and media three ways that campaigns are using the Internet (or, more specifically, Google products) to reach voters and influence the political conversation:

  1. Google search advertising, used for rapid response to the political news cycle: anticipating what people are searching for and putting a campaign’s or media outlet’s story where it will be found.
  2. Geotargeted advertising, where likely voters in a primary, municipal election or state election can be served contextual messages based upon the location from which they’re accessing a webpage.
  3. Promoted video ads on YouTube, the world’s biggest video platform.

More information on Google Elections is, naturally, available online, along with a toolkit.

There’s also a directory of public data that contains information on countries far beyond the borders of the U.S. that will be of interest to journalists and researchers who are not engaged in electoral politics.

Googling "unemployment" using public data http://www.google.com/publicdata/directory

Postscript: For an excellent discussion of where campaigns are going in search of the digital voter, read Amy Schatz in the Wall Street Journal.

Correction: A statistic provided by Google about the percentage of smartphone/tablet owners that are registered to vote was removed from this post after it could not be confirmed.