‘Are We There Yet?’: How to Deal with Delays – And Other Challenges – In Acquiring Financial Transaction Data
Editor’s note: This article is part of the NextBillion series “Big Data: Big Risks, Big Opportunities,” one of three special series we’re running this year. Learn more about NextBillion’s 2020 series here.
“Are we there yet?” It’s a familiar refrain to any parent who has taken a child on a long journey – you may even remember the stock replies your own parents supplied in similar situations. But “we’ll be there soon” probably didn’t correspond with your own perception of time, and left you wondering why it was taking so long.
As a program established to promote data-driven decision-making to extend financial inclusion, insight2impact (i2i) sometimes faces a similar situation when we attempt to secure financial transaction data from external organizations. We regularly hear the promise that “we’ll have the data very soon” – followed by a delay of several months.
Convincing financial service providers to share transaction data
The belief that there must be a different (better) way to measure financial inclusion provided the impetus to launch the i2i program. Five years ago, many countries still pursued targets based on the assumption that citizens were financially included if they had access to financial services – primarily bank accounts – irrespective of whether they actually used those accounts.
We were fairly confident that analyzing a combination of demand-side data (for example, responses from a bespoke financial inclusion survey) and financial transaction data would provide a clearer picture of people’s true financial lives. We hoped that the insights gained from analyzing these two forms of data (and ideally a third, merged dataset comprising the transaction data of respondents to the survey) would be useful for policymakers. In retrospect, we didn’t understand how challenging this ambition would prove to be.
In almost every case, our initial contact at the institutions we approached responded positively to our request for data. Things generally seemed promising at the outset but, despite good partner organizations and the best intentions, the time lag between our initial request for transaction data and actual access to that data was significant. An interval of a year between first contact with the data provider and receipt of the data is not unheard of.
Despite the wait, we have now secured access to (and have analyzed or are analyzing) transaction data from one of the largest banks in Mexico, a credit bureau in Zimbabwe, a payment switch that processes interbank payments in Nigeria and a regulatory authority in Rwanda, which provided access to mobile money transaction data. It’s likely that we will soon (ahem) have access to card data from two additional banks – one in Mexico and another in Nigeria.
Though it took longer to reach our destination than we’d initially thought, we learned some valuable lessons from our experience acquiring this financial transaction data, and using it to better understand whether formal financial services are meeting the needs of individuals. Below, we highlight a few of those lessons that may be of use to others working in the financial inclusion space.
Find and cultivate organizational champions (in the right departments)
If you want to get access to customer transaction data, it’s necessary to find someone within the organization who will champion your request (and your research objectives). Ideally, this will be an influential person at the organization – someone who wants to make your project work, and who has the clout to demand responses from the legal department (see below). This has proven to be significantly easier in the instances where we had pre-existing relationships. Our champions, however, tended to be “financial inclusion people” rather than “data people” – so they weren’t always able to communicate the more technical specifications of the request to their colleagues. To address this, try to ensure that a “data person” from each team (yours and the data provider’s) is present in meetings, alongside your usual contacts.
Remember: The legal team generally has the final word
The legal team is likely to be the final decision-maker in a request for data sharing, as it will need to ensure that no data privacy or protection regulations are contravened – even though the data you are given access to is likely to be anonymized. Due to the need to navigate these concerns, negotiating permission and establishing the terms and conditions of access can be a lengthy process. Factor this into your project timeline. In instances where the data-sharing entity is in a different country from your own, it may be worth appointing a local consultant who is more familiar with local regulations to negotiate with the data provider. We have also found it helpful to remain flexible about how we access data (see more detail below).
Carefully consider (and budget for) the practicalities of accessing and analyzing the data
The conditions around data access have been different for each partner organization we’ve worked with. In some cases, we could only access data on-site: We couldn’t take any of the data with us – not even electronic copies to analyze later – and had to sit in their offices and analyze the data on their computers. In other instances, we needed to meet with representatives from the central bank to explain our objectives before access was granted. In one case, a dataset we analyzed was uploaded to an external site, and we were provided with the decryption key and password. Being able to get the full extract and work on the data on our own server was a bonus.
Regardless of the organization you’re working with, it’s important to pay attention to data localization requirements (i.e., regulations regarding national data sovereignty). If the data needs to be analyzed on-site or in-country, this may increase your costs significantly. Unless you contract a team of data scientists in that location, it would be wise to factor in travel and accommodation costs for multiple trips (including initial trips to convince the data owners to part with this data – and, potentially, to allay their security concerns).
Even if you contract a local data team, remember that managing that team remotely could be time-consuming. You would also need to budget for the identification and contracting of suitable analysts, which may be trickier if you aren’t relying on an existing network of contacts.
Define your question and research parameters in advance
If you are required to specify the data parameters upfront, you should be specific about every aspect of the data you request: the fields and their definitions, the sampling methodology, the time range, etc. You should also check that the data provider isn’t excluding anything from the data (hidden fees, transfers within the bank, etc.) that they assume you don’t need. It’s a good idea to ask questions around the data to confirm that things will be adding up in the way you expect. For example, can you calculate the balance from the transactions that you observe, or are there hidden transactions that affect the balance you see?
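The balance question above can be turned into a quick check once a sample extract arrives. The following is a minimal sketch, not a description of how i2i ran its own analysis: it assumes each transaction row carries a signed amount and the balance reported after that transaction, and the field names `amount` and `balance_after`, the function name and the sample figures are all hypothetical.

```python
from decimal import Decimal


def find_balance_gaps(opening_balance, transactions):
    """Return (row index, expected balance, reported balance) for every row
    where the running total of observed amounts disagrees with the balance
    reported by the provider."""
    gaps = []
    running = Decimal(opening_balance)
    for index, tx in enumerate(transactions):
        running += Decimal(tx["amount"])          # signed: debits are negative
        reported = Decimal(tx["balance_after"])
        if running != reported:
            gaps.append((index, running, reported))
            # Resync to the reported balance so that a single hidden item
            # is flagged once rather than on every later row.
            running = reported
    return gaps


# Hypothetical sample: the third row's balance is 1.50 lower than the
# observed transactions explain, which would suggest an unreported fee.
sample = [
    {"amount": "-20.00", "balance_after": "80.00"},
    {"amount": "50.00", "balance_after": "130.00"},
    {"amount": "-10.00", "balance_after": "118.50"},
    {"amount": "-18.50", "balance_after": "100.00"},
]

gaps = find_balance_gaps("100.00", sample)
for index, expected, reported in gaps:
    print(f"row {index}: expected {expected}, reported {reported}")
```

Any row flagged this way is a prompt for a question to the data provider (for example, whether fees or internal transfers were excluded), rather than proof of an error in the data.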
In one of our projects, we were only able to make a one-time request for data: This proved tricky because we were unable to follow up on interesting hypotheses that emerged later, and which would have required additional data points to properly analyze.
It helps if your research question is clear from the outset – something we found out the hard way. Because the analysis of transaction data was a novel approach for us, we approached it with an exploratory mindset rather than an understanding of exactly what we were looking for. This made the analyst’s task more complicated, as it can be hard to spot interesting patterns if you aren’t exactly sure what you are looking for from the start.
But in reality, data projects are often a bit messy. Researchers are sometimes blind to the condition of the dataset they will be getting – how clean the data will be, or whether the data schema they receive is accurate. A clearly defined research question doesn’t guarantee a smooth project.
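One way to reduce that blindness is to compare a small sample extract against the documented schema before the full analysis starts. The sketch below is illustrative only and assumes the provider can share a sample as CSV; the expected field names, the function name `check_extract` and the sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical field list, standing in for whatever the data request specified.
EXPECTED_FIELDS = {"account_id", "timestamp", "amount", "balance_after"}


def check_extract(csv_text, expected_fields):
    """Compare an extract's header with the expected fields and count
    blank values per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    actual = set(reader.fieldnames or [])
    report = {
        "missing_fields": sorted(expected_fields - actual),
        "unexpected_fields": sorted(actual - expected_fields),
        "rows": 0,
        "blank_counts": {name: 0 for name in sorted(actual)},
    }
    for row in reader:
        report["rows"] += 1
        for name in actual:
            # Treat missing, empty and whitespace-only values as blank.
            if not (row.get(name) or "").strip():
                report["blank_counts"][name] += 1
    return report


# Hypothetical sample extract: one promised field is absent, one
# undocumented field is present, and two cells are blank.
sample_csv = """account_id,timestamp,amount,channel
A1,2019-01-05T10:00:00,-20.00,ATM
A1,2019-01-06T09:30:00,,POS
A2,2019-01-06T11:15:00,50.00,
"""

report = check_extract(sample_csv, EXPECTED_FIELDS)
print(report)
```

A report like this only covers structure and completeness. It says nothing about whether the values themselves are correct, so it complements rather than replaces conversations with the provider.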
Sometimes the best way to understand what is available, and how it is structured, is to visit the data provider. However, as mentioned above, this can be costly and difficult to arrange, as you will need to send an experienced data scientist with good data engineering knowledge to inspect the data on-site. The data scientist may need to visit the provider several times over the course of the project – and ideally, the first visit should be preceded by a series of calls with the data provider.
Make sure your project managers have relevant data skills
Having senior staff in your organization with data science or statistical skills available to direct the research is beneficial. Fairly open-ended exploration has been tricky for the analysts working on i2i projects, particularly where the project lead wasn’t completely familiar with the novel terminology and data science principles involved in transaction data processing.
Analyzing transaction data to understand people’s financial situations and their financial behavior has enabled i2i to unearth some useful insights – one of which is the huge gap between what people say they did with their money (in response to a survey question), and what they actually did with their money. And we couldn’t have generated these insights without customer data.
That’s why, despite the challenges we’ve outlined here, we’ll continue building upon our efforts to work with current and future partners to secure access to additional data. The lessons we’ve learned will help us in planning and scoping these future projects. And if nothing else, we’ll go into these projects knowing that, when a partner confidently claims that “the data is arriving soon,” it likely means we’ll need to extend our project timelines.
Photo credit: Tim Mossholder, via Pexels