Scaling Big Data for Social Good: Weighing the Benefits and the Risks
Five hundred million tweets. Sixty-five billion WhatsApp messages. Over 290 billion emails. These are just a few of the estimates of the amount of data we produce daily. According to IBM, about 2.5 quintillion bytes of data are created every day; that’s enough to fill roughly 78 million 32 GB iPads. And these amounts are expected to continue to increase exponentially. In fact, the entire digital universe is expected to hit 44 zettabytes this year, which will mean there are 40 times more bytes of data than there are stars in the observable universe.
Although the use of big data has skyrocketed in recent years, one of the most notable big data sets available, Landsat, actually dates back to the 1970s. As the world’s longest-running Earth observations program, Landsat uses NASA’s satellites to provide images of the Earth at a 30-meter resolution for the entire globe, and the decades of experience we’ve had with the program make clear the benefits of unlocking big data for the public good. By sharing its data, Landsat has informed insights across wide-reaching sectors including health, urbanization, agriculture and environmental loss, resulting in a global economic benefit of billions of dollars.
The Potential – And Risks – of Data Sharing
Allowing data, particularly big data, to be shared between the public and private sectors has significant benefits. These include improved public services, transparency and citizen engagement – not to mention the potential for driving economic opportunities and business innovation. And the private sector has an important role to play, not only in leveraging the value of government open data, but in making some of its own data open or available to the public sector through collaboration. Many of these data-driven public-private partnerships have already proven successful. For instance, Uber released its traffic data to aid transportation planners and city officials; Statistics Canada has partnered with smart meter companies to access electricity consumption data to better understand consumption patterns; and the telecommunications company Airtel shared data with the World Health Organization to help combat tuberculosis in India.
Yet with great potential comes great risk. Big data sharing raises pressing questions about security, privacy and consent. These questions have become even more salient in recent years, as we’ve learned more about how our data has been handled by technology companies and others. For instance, in 2018 it was revealed that Facebook shared user data with other tech giants, including Amazon, Microsoft and Apple, despite already facing class-action lawsuits for having failed to protect the personal information of its users during the Cambridge Analytica scandal.
And these concerns extend beyond the private sector. For example, in 2017 the University of Chicago shared historical medical data records with Google as part of research into artificial intelligence to predict illnesses. But a former patient has sued the university, claiming that the data was not appropriately de-identified, and that the university sold their health data without obtaining consent. Similarly, a data sharing agreement signed by the governments of the United States and the United Kingdom has raised concerns within the legal community about privacy protections. While big data sharing can result in important opportunities and insights, it can also expose the parties involved to liability, and if handled improperly, it can jeopardize individuals’ privacy and leave them vulnerable to identity theft and other damages. And these concerns can be particularly acute in emerging markets, where individuals may have less awareness of and control over their data and its uses, and where the legal systems may be less prepared to protect people’s data rights.
Creating Frameworks for the Ethical Use of Big Data
So how do we ensure that big data is used safely and ethically? For one, when sharing or opening data to the public, it is essential that appropriate procedures be in place, including the right legal frameworks and policies. Fortunately, with more than 110 countries having freedom of information laws, there are already legal frameworks requiring governments to proactively disclose information and data. We need to continue to build on this work, to define ways of sharing data that ensure trust while protecting sensitive information. Moreover, we should consider how data sharing can be mobilized when privacy laws are weak, what risks and incentives influence sharing within the development sector, and how emerging technologies can be harnessed to maximize innovation.
At the Thematic Research Network on Data and Statistics (TReNDS), we study the ways that data can support sustainable development, in particular looking at how countries can strengthen the governance of both traditional and new data sources, like big data. In partnership with the World Economic Forum, NYU’s Governance Lab and the University of Washington, our project, Contracts for Data Collaboration, seeks to strengthen the trust, transparency and accountability of cross-sector data collaboratives. We do this by analyzing the ways that data sharing arrangements are organized and how the associated agreements are negotiated, as well as producing guidance materials to help lower the barrier to entry for these partnerships. Our research has uncovered a number of public-private partnerships that have successfully navigated the challenges of big data sharing to realize benefits that would not otherwise be possible.
For example, in 2018, UN Environment and Google formed a partnership to create a global indicator of surface water to help fill data gaps, which has now been incorporated into official Sustainable Development Goal reporting. Although the project was fundamentally made possible by a mutual spirit of cooperation and a clear work plan, the partners also signed a memorandum of understanding, which helped improve the external perception of the collaboration by assuring data users that the necessary due diligence was performed. This agreement also secured a long-term commitment to data sharing between Google and UN Environment by defining the intent of the collaboration, while still allowing for flexibility.
Additionally, when the government of Moldova established an aid management platform to collect data from different development actors across the country, including its own ministries and international NGOs, it created a data management plan that provided guidelines for data submissions. The plan was written as a flexible, living document, and it was supported by an accompanying law that increased the incentives for submitting data electronically and strengthened data collection. This platform has been in operation in Moldova since 2013, with development actors continuing to regularly submit data.
Novel Approaches to Data Sharing
One source of big data that has received wide attention is telecommunications data. By analyzing aggregated, anonymized call records, messages and other mobile network data, we can gain real-time insights into everything from population movement to disease spread. Spain’s Institute of Statistics, for instance, recently reached agreements with its three largest mobile network operators to access data for improving transportation, infrastructure and health services. Additionally, call data records have been used in over two dozen low- and middle-income countries to examine humanitarian aid delivery, track disease spread and predict violence. However, there are growing concerns about the lack of consent involved in some of these initiatives, fears about potential misuse, and suggestions that some of these data sharing activities might even be illegal.
There is a definite need for standard, legally acceptable methods for accessing telecommunications data. Fortunately, several thoughtful models have been tested and may offer ways forward. For instance, a three-way agreement in Ghana is allowing the government to access data from the mobile network operator Vodafone for sustainable development purposes – but this data is first shared with and processed by the NGO Flowminder before being delivered in aggregated form to the Ghana Statistical Service. Meanwhile, the OPAL project has taken a slightly different approach, by providing mobile network operators in Colombia and Senegal with code that allows them to analyze data and preserve privacy in-house. These sorts of new and innovative approaches to sharing data will enable us to leverage the value of these powerful big data sources, while also mitigating the risks.
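To make the idea of delivering data "in aggregated form" more concrete, here is a minimal sketch of one common safeguard: counting distinct subscribers per region and day, then suppressing any cell with too few people to be safely reported. This is an illustration only, not the method used by Flowminder, OPAL or any operator named above. The function name, the (subscriber, region, day) record layout and the threshold of 15 are all assumptions made for this example.

```python
from collections import defaultdict


def aggregate_subscriber_counts(records, min_count=15):
    """Count distinct subscribers per (region, day) and drop small cells.

    `records` is an iterable of (subscriber_id, region, day) tuples.
    This layout and the default threshold are hypothetical choices
    for illustration, not a published standard.
    """
    subscribers = defaultdict(set)
    for subscriber_id, region, day in records:
        # A set ensures each subscriber is counted once per cell,
        # no matter how many calls or messages they made.
        subscribers[(region, day)].add(subscriber_id)

    # Only cells meeting the minimum count are released; individual
    # identifiers never appear in the output.
    return {
        cell: len(ids)
        for cell, ids in subscribers.items()
        if len(ids) >= min_count
    }
```

Under an arrangement like the ones described above, only the dictionary of counts would leave the operator or intermediary; the raw records would stay where they are. Small-cell suppression on its own does not guarantee anonymity, so a real deployment would likely combine it with other legal and technical protections.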
The growth of big data is not slowing down anytime soon, and we have so much potential to use it for social good. Yet with data privacy laws still lacking throughout much of the world, and public trust in data sharing on the decline, we must carefully balance these opportunities with risks. We can strike this balance by establishing effective legal frameworks and policies to ensure that this data is used safely, securely and ethically.
Hayden Dahmm is an analyst with the Sustainable Development Solutions Network’s Thematic Research Network on Data and Statistics (SDSN TReNDS).
Photo courtesy of Internews Europe.