Can data change lives in India?

A New Delhi start-up is helping to police social progress through analytics and visualisation

Perhaps it’s just perception. Anyone who has been a visitor to India will have wildly varying views and memories. Some will see the poverty and won’t be able to get beyond it. Some will see the colour of the clothes in Rajasthan and assume that is what all of India is like. Others will see the vivid and varying landscape, the excellent food and friendly, curious people, while others may just see auto rickshaws, traffic chaos and pollution. India, like any country, is a mix of good and bad and its social problems have, from the outside at least, always been perceived as problems relating to scale and isolation. Too many disconnected people living in villages, lacking literacy and numeracy and anchored by caste or sex, or both. Too many people drawn to cities with infrastructures unable to cope quickly enough with changing demands. Maybe just too many people.

Speaking at the Shri Mata Vaishno Devi University at Katra in Jammu and Kashmir in April, Indian Prime Minister Narendra Modi suggested that in fact there is power in numbers. India, he said, “will lead the 21st century” because this is the knowledge economy and India has “800 million in youth power, which is below 35 years.” The young, he suggested, will know how to rule through knowledge and technology, and India, with its growing population of 1.2 billion people, certainly has mass.

So how can perceptions of the country be challenged? How can governments make the changes needed to meet the demands of a fast-growing, transient workforce? How can they divert funds to tackle immediate problems in communities, no matter how remote? And how can the country empower its new generation, the 800 million under the age of 35?

“Data can be incredibly helpful in solving some of the world’s most critical problems, whether it’s using crime data to find the safest route home for a woman, targeting government schemes to the people who need them the most, measuring and identifying which schools are failing their students, or determining the most important priorities for a national health budget,” says Prukalpa Sankar, a co-founder of New Delhi-based start-up SocialCops who, aged 23 last year, made Forbes India’s 30 Under 30 list.

Prukalpa and co-founder Varun Banka fit the profile of Modi’s new India. Young and ambitious, they are driven to succeed but what is impressive is how their energies are being put to social good. The clue is in the name. SocialCops is a business using Big Data collection and analytics to support governments and non-government organisations, or in some cases to highlight issues that organisations need to address.

During the Chennai floods last November, SocialCops helped collate data via a mobile app (that doesn’t require internet access) to help volunteers map the worst-hit areas. It’s an example of how these bright minds are joining the dots, using the proliferation of mobile phones in India (there are more than one billion mobile subscribers now) and an understanding of data to solve problems.

So how do they do it?

“The question is not whether data can be used to solve these problems,” says Prukalpa. “After all, most governments in the world are using data at some level to make decisions, many philanthropic organizations tout themselves as data-driven, and nearly every grant requires quantifiable data on a nonprofit’s impact. At this point, using data is a given. The question is how data can be used effectively to solve these problems, which is exactly what we’re working to improve.”

SocialCops has built a data platform that can collect data through a mobile app. It can, says Prukalpa, even track GPS locations without internet access on low-cost devices, an important feature during the Chennai floods. All data sets, including freely available public ones, are cleaned using machine learning and stored in the SocialCops data repository. The result is a source of usable data, which the platform geocodes and uses to build data indices. Users can then visualise that data, layering data sets depending on their particular goal.

“A great example of how we’ve used our platform to solve important problems is our work with Oxfam India on RTE,” says Prukalpa. “India passed the Right to Education Act in 2009 which, among other things, sets certain infrastructure norms (male and female toilet facilities, boundary wall, drinking water, et cetera), and certain student-teacher and student-classroom ratios for schools.

“However, only eight per cent of schools currently comply with RTE norms. To increase awareness around this important problem, we used our platform to analyse data for 1.4 million schools in India and build a visual data index on RTE compliance. We gave each district a score on how well it’s complying with RTE. Users can either search the district name or click on the district on the map, making it engaging for people to find their own district scores. The district score was compared with the state score, as well as India’s overall score, to give an overall perspective.”
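The interview does not spell out how the district scores are calculated. Purely as an illustration, one simple approach would be to score each area by the share of its schools that meet every norm, then set the district figure beside the state and national ones. The sample records, field layout and scoring rule in this Python sketch are all assumptions, not SocialCops' actual method:

```python
from collections import defaultdict

# Hypothetical sample records: (state, district, meets_all_norms).
# Real data would come from a school-level dataset; these rows are made up.
SCHOOLS = [
    ("State A", "District 1", True),
    ("State A", "District 1", False),
    ("State A", "District 2", False),
    ("State A", "District 2", False),
    ("State B", "District 3", True),
]

def compliance_share(records):
    """Percentage of schools in `records` that meet every norm."""
    if not records:
        return 0.0
    compliant = sum(1 for _, _, ok in records if ok)
    return 100.0 * compliant / len(records)

def scores_by(records, key_index):
    """Group records by state (index 0) or district (index 1) and score each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec)
    return {name: compliance_share(group) for name, group in groups.items()}

state_scores = scores_by(SCHOOLS, 0)
district_scores = scores_by(SCHOOLS, 1)
national_score = compliance_share(SCHOOLS)

# Show a district next to its state and the national figure.
print(district_scores["District 1"], state_scores["State A"], national_score)
```

A real index would probably weight individual norms rather than treat compliance as all-or-nothing, but the side-by-side comparison works the same way.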

Air pollution

SocialCops uses a mix of government data, data collected through surveys and even independent data uploaded and validated through its machine learning data transformation tool. Prukalpa says that India poses some unique challenges. With less than a third of the population accessing the internet (and those who do are approximately 70 per cent male), it’s important to rely on data collected on the ground. The government is the main organisation that produces data from the ground, whether it’s the National Census, which covers almost every citizen, or the DISE dataset, which covers 1.4 million schools across India.

To get an idea of the breadth of data analytics coming out of SocialCops you only have to look at the regular email stream of visualisations covering everything from air pollution in Delhi through to women in work or mental healthcare in India. The air pollution one is a new project, called Peppered Moth. It’s an experiment using specially designed sensors fixed to five auto rickshaws. Each sensor device cost around $100 to make and each device takes two readings per minute. With five devices deployed, the pollution reading for each hour is an average of 600 data points, and the Air Quality Index for each day is calculated from almost 15,000 distinct readings.
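The arithmetic behind those figures is easy to check. Here is a minimal sketch in Python, assuming every device reports without gaps for all 24 hours; the names are illustrative and none of this is SocialCops code:

```python
# Figures quoted in the article.
DEVICES = 5
READINGS_PER_MINUTE = 2  # per device

# Assumption: no dropped readings and round-the-clock operation.
readings_per_hour = DEVICES * READINGS_PER_MINUTE * 60
readings_per_day = readings_per_hour * 24

def hourly_average(readings):
    """Mean of all readings pooled across devices for one hour."""
    return sum(readings) / len(readings)

print(readings_per_hour)  # 600
print(readings_per_day)   # 14400
```

That gives 600 readings an hour and 14,400 a day, which matches the “almost 15,000” quoted. How the daily Air Quality Index is derived from those readings is not described here, so the sketch stops at the counts and a plain average.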

The company admits it will not create a wholly accurate picture (it needs more rickshaws to increase the sample size), but it will provide a set of data that can at least start to show trends: where in Delhi air pollution is at its worst, for example. The plan is then to compare this with health data and provide a picture that the local government can refer to when making decisions on air quality improvements. Similar techniques can be used to tackle other issues such as water scarcity, waste management and energy use.

Of course Prukalpa is aware that building the picture through data is only one step in changing lives and communities; governments and agencies have to respond on the ground with real intervention and policy. So is this happening?

Reality not perception

“Everyone is excited about using data to confront the world’s most critical problems, so getting people on board has not been an issue,” she says. “Over 150 organisations across seven countries currently use our platform, and this number is just continuing to grow.”

She adds, though, that everyone (governments, NGOs and even SocialCops) is still learning, and it’s not easy to pinpoint where and when problems have been solved strictly due to data.

“Our other big learning is that big problems are always interconnected,” says Prukalpa. “For example, imagine you’re trying to solve the problem of low attendance rates at village schools. This is an education problem, yes, so it’s important to look at the school’s infrastructure, how often the teacher shows up, how far the school is from the students it serves, and so on. 

“However, low attendance is also related to sanitation, since girls are less able to come to school if there is no toilet for them to use. Low attendance is also linked to employment, since kids drop out of school to work if their family needs the income. It’s also linked to health, since a child that is perpetually sick cannot attend school. It’s linked to water, since poor water can make children sick and prevent them from coming to school. It’s linked to infrastructure, since a child is less able to come to school if there’s no bus or road to transport him there. And so on…

“These contexts — where everything is interlinked — is where data is crucial. It’s incredibly difficult to tease out the relationship between all of these factors and determine what is actually causing low attendance without data.”

And that’s the point. Data can, if collected and managed correctly, be the window on worlds not previously accessible with any level of accuracy. Decisions based on fact and not whim. Reality not perception. That’s the aim, not just for SocialCops but for all its partners and clients too.

