Big Data Is Filling Gender Data Gaps—And Pushing Us Closer to Gender Equality


Imagine you are a government official in Nairobi, working to deploy resources to close educational achievement gaps throughout Kenya. You believe that the literacy rate varies widely in your country, but the available survey data for Kenya doesn’t include enough data about the country’s northern regions. You want to know where to direct programmatic resources, and you know you need detailed information to drive your decisions.

But you face a major challenge—the information does not exist.

Decision-makers want to use good data to inform policy and programs, but in many scenarios, quality, complete data is not available. And though this is true for large swaths of people around the world, this lack of information acutely impacts girls and women, who are often overlooked in data collection even when traditional surveys count their households. If we do not increase the availability and use of gender data, policymakers will not be able to make headway on national and global development agendas.

Gender data gaps are multiple and intersectional, and although some are closing, many persist despite the simultaneous explosion of new data sources emerging from new technologies. So, what if there was a way to utilize these new data sources to count those women and girls, and men and boys, who are left out by traditional surveys and other conventional data collection methods?

Big Data Meets Gender Data
“Big data” refers to large amounts of data collected passively from digital interactions with great variety and at a high rate of velocity. Cell phone use, credit card transactions, and social media posts all generate big data, as does satellite imagery which captures geospatial data.

In recent years, researchers have been examining the potential of big data to complement traditional data sources. Data2X entered this space in 2014 because we observed that no one was investigating how big data could help increase the scope, scale, and quality of data about the lives of women and girls.

Data2X is a collaborative technical and advocacy platform that works with UN agencies, governments, civil society, academics, and the private sector to close gender data gaps, promote expanded and unbiased gender data collection, and use gender data to improve policies, strategies, and decision-making. We host partnerships which draw upon technical expertise, in-country knowledge, and advocacy insight to tackle and rectify gender data gaps. Across partnerships, this work necessitates experimental approaches.

And so, with this experimental approach in-hand, and with support from our funders, the William and Flora Hewlett Foundation and the Bill & Melinda Gates Foundation, Data2X launched four research pilots to build the evidence base for big data’s possible contributions to filling gender data gaps.

Think back to the hypothetical government official in Kenya trying to determine literacy rates in northern Kenya. This time, a researcher tells her that it’s possible – that by using satellite imagery to identify correlations between geospatial elements and well-being outcomes, the researcher can map the literacy rate for women across the entire country.

This is precisely what Flowminder Foundation, one of the four partner organizations in Data2X’s pilot research, was able to do. Researchers harnessed satellite imagery to fill data gaps, finding correlations between geospatial elements (such as accessibility, elevation, or distance to roads) and social and health outcomes for girls and women as reported in traditional surveys (such as literacy, access to contraception, and child stunting rates). Flowminder then mapped these phenomena, displaying continuous landscapes of gender inequality which can provide policymakers with timely information on the regions with the greatest inequality of outcomes and the highest need for resources.

This finding, and many others, are outlined in a new Data2X report, “Big Data and the Well-Being of Women and Girls,” which for the first time showcases how big data sources can fill gender data gaps and inform policy on girls’ and women’s lives. In addition to the individual pilot research findings outlined in the report, there are four high-level takeaways from this first phase of our work:

Country Context is Key: The report affirms that in developing and implementing approaches to filling gender data gaps, country context is paramount – and demands flexible experimentation. In the satellite imagery project, researchers’ success with models varied by country: models for modern contraceptive use performed strongly in Tanzania and Nigeria, whereas models for girls’ stunting rates were inadequate for all but one pilot country.

To Be Useful, Data Must Be Actionable: Even with effective data collection tools in place, data must be demand-driven and actionable for policymakers and in-country partners. Collaborating with National Statistics Offices, policymakers must articulate what information they need to make decisions and deploy resources to resolve gender inequalities, as well as their capacity to act on highly detailed data.

One Size Doesn’t Fit All: In filling gender data gaps, there is no one-size-fits-all solution. Researchers may find that in one setting, a combination of official census data and datasets made available through mobile operators sufficiently fills data gaps and provides information which meets policymakers’ needs. In another context, satellite imagery may be most effective at highlighting under-captured dimensions of girls’ and women’s lives in under-surveyed or resource-poor areas.

Ground Truth: Big data cannot stand alone. Researchers must “ground truth,” using conventional data sources to ensure that digital data enhances, but does not replace, information gathered from household surveys or official census reviews. We can never rely solely on data sources which carry implicit biases towards women and girls who experience fewer barriers to using technology and higher rates of literacy, leaving out populations with fewer resources.

Big data offers great promise to complement information captured in conventional data sources and provide new insights into potentially overlooked populations. There is significant potential for future, inventive applications of these data sources, opening up opportunities for researchers and data practitioners to apply big data to pressing gender-focused challenges.

When actionable, context-specific, and used in tandem with existing data, big data can strengthen policymakers’ evidence base for action, fill gender data gaps, and advance efforts to improve outcomes for girls and women.


Stanford sociologists encourage researchers to study human behavior with help of existing online communities, big data

A group of Stanford experts are encouraging more researchers who study social interaction to conduct studies that examine online environments and use big data.


The internet dominates our world and each one of us is leaving a larger digital footprint as more time passes. Those footprints are ripe for studying, experts say.


In a recently published paper, a group of Stanford sociology experts encourage other sociologists and social psychologists to focus on developing online research studies with the help of big data in order to advance the theories of social interaction and structure.

Companies have long used information they gather about their online customers to get insights into the performance of their products, often by comparing alternative versions in a process called A/B testing. Researchers in other fields, such as computer science, have also been taking advantage of the growing amount of data.

But the standard for many experiments on social interactions remains limited to face-to-face laboratory studies, said Paolo Parigi, a lead author of the study, titled “Online Field Experiments: Studying Social Interactions in Context.”

Parigi, along with co-authors Karen Cook, a professor of sociology, and Jessica Santana, a graduate student in sociology, is urging more sociology researchers to take advantage of the internet.

“What I think is exciting is that we now have data on interactions to a level of precision that was unthinkable 20 years ago,” said Parigi, who is also an adjunct professor in the Department of Civil and Environmental Engineering.

Online field experiments
In the new study, the researchers make a case for “online field experiments” that could be embedded within the structure of existing communities on the internet.

The researchers differentiate online field experiments from online lab experiments, which create a controlled online situation instead of using preexisting environments that have engaged participants.

“The internet is not just another mechanism for recruiting more subjects,” Parigi said. “There is now space for what we call computational social sciences that lies at the intersection of sociology, psychology, computer science and other technical sciences, through which we can try to understand human behavior as it is shaped and illuminated by online platforms.”

As part of this type of experiment, researchers would utilize online platforms to take advantage of big data and predictive algorithms. Recruiting and retaining participants for such field studies is more challenging and time-consuming, because it requires a close partnership with the platforms.

But online field experiments allow researchers to gain an enhanced look at certain human behaviors that cannot be replicated in a laboratory environment, the researchers said.

For example, theories about how and why people trust each other can be better examined in the online environments, the researchers said, because the context of different complex social relationships is recorded. In laboratory experiments, researchers can only isolate the type of trust that occurs between strangers, which is called “thin” trust.

Most recently, Cook and Parigi have used the field experiment design to research the development of trust in online sharing communities, such as Airbnb, a home and room rental service. The results of the study are scheduled to be published later this year. More information about that experiment is available at

“It’s a new social world out there,” Cook said, “and it keeps expanding.”

Ethics of studying internet behavior
Using big data does come with a greater need for ethical responsibility. In order for the online studies of social interactions to be as accurate as possible, researchers require access to private information for their participants.

One solution that protects participants’ privacy is linking their information, such as names or email addresses, to unique identifiers, which could be a set of letters or numbers assigned to each research subject. The administrators of the platform would then provide those identifiers to researchers without compromising privacy.
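The identifier scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the Stanford study: the keyed-hash approach, the function name make_pseudonymizer, the sample records and the 16-character ID length are all hypothetical choices made for the example.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(secret_key: bytes):
    """Return a function that maps a personal identifier to a stable opaque ID."""
    def pseudonymize(identifier: str) -> str:
        # Normalize so the same person always receives the same ID.
        normalized = identifier.strip().lower()
        digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]
    return pseudonymize


# Assumption: the key stays with the platform administrators and is never
# shared with researchers, so IDs cannot be recomputed from known emails.
platform_key = secrets.token_bytes(32)
pseudonymize = make_pseudonymizer(platform_key)

# Hypothetical platform records containing private information.
records = [
    {"email": "alice@example.org", "messages_sent": 12},
    {"email": "bob@example.org", "messages_sent": 3},
    {"email": "Alice@example.org ", "messages_sent": 5},
]

# The view handed to researchers carries only the opaque subject ID.
research_view = [
    {"subject_id": pseudonymize(r["email"]), "messages_sent": r["messages_sent"]}
    for r in records
]
```

Because the same person always maps to the same identifier, researchers can still follow one subject across many interactions without ever seeing a name or email address.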

It’s also important to make sure researchers acquire the permission of the online platforms’ participants. Transparency is key in those situations, Cook said.

The research was funded by the National Science Foundation.

Source: Stanford News

Big Data: Why NASA Can Now Visualize Its Lessons Learned


NASA’s Lessons Learned database is a vast, constantly updated collection of knowledge and experience from past missions, which it relies on for planning future projects and expeditions into space.

The database holds detailed information from every mission going back as far as the 1960s, and every record is reviewed and approved before inclusion. In addition to NASA staff, thousands of scientists, engineers, educators and analysts from private-sector and government organizations access the database every month.

As the database swelled in size, the interface used internally to query it – a keyword-based search built on a PageRank-style algorithm – became unwieldy. Chief Knowledge Architect David Meza told me recently that the move to the graph-based, open source Neo4J database management system has significantly cut down on the time engineers and mission planners spend combing through keyword-based search results.

Meza says, “This came to light when I had a young engineer come to me because he was trying to explore our Lessons Learned database – but sometimes it’s hard to find the information you want in that database.

“He had 23 key terms he was trying to search for across the database of nearly 10 million documents, and because it was based on a PageRank algorithm the records nearest the top of the results were there because they were most frequently accessed, not necessarily because they had the right information.”

The gist of the problem was that even after searching the database, the engineer was left with around 1,000 documents which would need to be read through individually to know if they held information he needed.

“I knew there had to be something better we could do,” Meza says. “I started looking at graph database technologies and came across Neo4J. What was really interesting was the way it made it easier to combine information and showcase it in a graph form.

“To me, that is more intuitive, and I know a lot of engineers feel that way. It makes it easier to see patterns and see how things connect.”

The engineer was trying to solve a problem involving corrosion of valves, of the sort used in numerous technologies in use at Johnson Space Center, Texas, including environmental systems, oxygen and fuel tanks.

Graph visualization quickly made it apparent that there was, unexpectedly, a high correlation between records involving this sort of corrosion and topics involving batteries.

“I couldn’t understand how these topics were related,” Meza says, “but when I started looking into the lessons within those topics I was quickly able to see that some of the condition where we had issues with lithium batteries leaking, and acid contaminating the tanks – we definitely had issues.

“So, if I’m concerned about the tanks and the valves within those tanks, I also have to be concerned about whether there are batteries close to them. Having this correlation built in allowed the engineer to find this out much faster.”

Correlating information graphically in this way makes it far quicker to spot links between potentially related information.
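The idea of linking records through shared topics can be illustrated with a small Python sketch. It is not NASA’s system and does not use Neo4J or its query language: the lesson names, topic tags and function names below are invented for the example, and the “graph” is simply a count of how often two topics appear in the same record.

```python
from collections import Counter
from itertools import combinations

# Hypothetical lessons, each tagged with a set of topics.
lessons = {
    "lesson-1": {"valves", "corrosion", "batteries"},
    "lesson-2": {"corrosion", "batteries", "tanks"},
    "lesson-3": {"valves", "corrosion", "tanks"},
    "lesson-4": {"software", "testing"},
    "lesson-5": {"batteries", "corrosion"},
}


def build_topic_graph(lessons):
    """Count how often each pair of topics appears in the same lesson."""
    edges = Counter()
    for topics in lessons.values():
        for a, b in combinations(sorted(topics), 2):
            edges[(a, b)] += 1
    return edges


def related_topics(edges, topic):
    """Rank the topics connected to `topic` by the number of shared lessons."""
    neighbours = Counter()
    for (a, b), weight in edges.items():
        if a == topic:
            neighbours[b] += weight
        elif b == topic:
            neighbours[a] += weight
    return neighbours.most_common()


edges = build_topic_graph(lessons)
ranked = related_topics(edges, "corrosion")
```

In this toy data, "batteries" ranks first among the topics connected to "corrosion", which is the kind of unexpected link a keyword search would not surface on its own.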

“To me, it’s a validation,” Meza says. “There are many different ways to look and search for information rather than just a keyword search. And I think utilizing new types of graph databases and other types of NoSQL databases really showcases this – often there are better ways than a traditional relational database management system.”

Neo4J is one of the most commonly used open source graph database management systems. It hit the headlines in 2016 when it was used as a primary tool by journalists working with the ICIJ to analyze the leaked, 2.6 terabyte Panama Papers for evidence of tax evasion, money-laundering and other criminal activity.

For an organization as data-rich as NASA, there are clear benefits to thinking beyond keyword and PageRank when it comes to accessing information. NASA’s experience serves as another reminder that when you’re undertaking a data-driven enterprise, volume of information isn’t always the deciding factor between success and failure, and in fact can sometimes be a hindrance. Often insights are just as likely to emerge from developing more efficient and innovative ways to query data, and clearer ways to communicate it to those who need it to do their jobs.

Source: Forbes

Coke Let People Make Any Flavor They Want, The People Demanded Cherry Sprite

Thanks to new machines that let customers flavor their drinks however they want, Coca-Cola discovered that what people really wanted was Cherry Sprite.


When it launched its design-your-own-flavor soda dispensers, Coca-Cola handed over the keys to its customers, letting them add a shot of flavor — say raspberry or vanilla or lemon — to any drink. In return, the touchscreen machines started sending Coca-Cola some very useful data on what its customers really want.

Now, eight years after introducing the machines, which have made their way into movie theaters and fast food outlets around the country, Coca-Cola is unveiling its first product created using all that data. People have spent years dialing their own flavor combinations into the machines, and the lesson was simple: The people demand Sprite Cherry.

To the casual soda drinker, Sprite Cherry may seem kind of predictable — it’s not a huge leap from Cherry Coke — and even a little disappointing considering the other options people could add to drinks, like strawberry, grape, peach, raspberry, orange, and vanilla. Serious Eats described Sprite Cherry as “kind of meh.”

But the people have spoken.

“There’s proven data that people actually love it,” said Bobby Oliver, director of Sprite & Citrus Brands for Coca-Cola North America. “It’s not just a survey where people say yes or no.”

Asked if cherry’s victory was a letdown, Oliver said, “We’re not disappointed at all.” Combining “lemon lime, with a twist of cherry flavor” allows the brand to “stay true to what Sprite is about,” he said.

Sprite has been an outlier in a shrinking soda business, with dollar sales up about 3.4% in 2016, according to Coca-Cola, citing Nielsen data. Meanwhile, the company’s overall revenues fell by 5% last year.

Introducing new products, especially beverages that aren’t soda, is part of Coca-Cola’s strategy. “We brought to market more than 500 new products, nearly 400 of which were tea, juices, coffees, waters or other still beverages,” CEO Muhtar Kent said to investors last week.

Sprite Cherry and Sprite Cherry Zero are the first Freestyle products to make it to Coca-Cola’s permanent lineup (there are other limited-time products like Sprite Cranberry), and are also the first new Sprite flavor since Sprite Zero was launched more than a decade ago. Coca-Cola announced Sprite Cherry in late 2016. Whether Sprite Cherry fans will find the bottled version as satisfying as the fountain soda remains to be seen.


MD Anderson Benches IBM Watson In Setback For Artificial Intelligence In Medicine

It was one of those amazing “we’re living in the future” moments. In an October 2013 press release, IBM declared that MD Anderson, the cancer center that is part of the University of Texas, “is using the IBM Watson cognitive computing system for its mission to eradicate cancer.”

Making Big Data User Friendly For Small Businesses


What do you think of when you think of “big data”?

If you’re like most of us, you probably think of large-scale IT projects. You might think of detailed analytics that are designed to make your head spin.

I mean, who needs to bother with all those annoying numbers, right?

Well, here’s the thing: big data isn’t just for big business. Big data is also important for small businesses.

If you’re not focusing enough on your analytics, you could be missing out on amazing growth opportunities for your business. You might be making decisions that hurt your business.

Big Data And Small Business
Big data is by no means a new concept for most people in the business world. For many small businesses, however, the use of data technology has been mostly out of reach due to budget constraints and a lack of in-house technical expertise.

If that’s the case for you and your business, you are a part of the 77 percent that don’t yet have a big data strategy. The emergence of self-service solutions, however, has been slowly opening the gates for small businesses and the opportunities to leverage internal data are growing.

Rita Sallam, VP of Research at Gartner, says that there are “approximately 70 percent of users in organizations that currently do not use BI tools or have statistical backgrounds.” Therefore, “New approaches have the potential to transform how and which users can derive insights from data discovery tools.”

If 70% of users were able to leverage big data insights without technical backgrounds, the impact on operations and revenue could be enormous. This is even more true for small businesses, as technical expertise is often siloed in IT departments.

That is why many startups are making data accessible to low-tech businesses. Uday Hegde is the CEO and Co-Founder of USEReady, a data analytics firm that helps businesses implement data solutions.

Hegde believes that self-service data is crucial for making business intelligence a reality for businesses of any size. “As self-service tools become more prevalent, non-technical employees can access data like never before. This helps executives at every level of the organization to conduct analysis and speed up the decision-making process.”

Making Data More User-Friendly
One of the biggest challenges to small businesses that are developing data analytics and business intelligence strategies is the way in which data insights are presented. Complicated Excel sheets and poorly designed dashboards make it virtually impossible for non-IT professionals to use their data.

Self-service solutions are working to use better design practices to help solve this problem. “By making data sets visual, business owners can start asking the right questions and making decisions based on hard facts rather than speculation,” Hegde explains. “The result is often better allocation of crucial technology, people, and resources.” The key is making data presentable so all stakeholders can use it.

A perfect example of how impactful data visualization techniques can be comes from statistician and TED speaker Hans Rosling.

Zeroing in On the Right Kind of Data
Self-service data solutions are opening up new opportunities for businesses to figure out which data sets are the most useful. The number of vendors looking to help is always growing. Using data tools like Tableau, or CRM software like HubSpot, enables organizations to identify more specific data points to help them evaluate business performance.

Web traffic is a great example. It’s one of the most important pieces of data that a business owner can have. But for most organizations, it fails to offer any actionable insights. When a business owner is able to understand which demographics and customer segments are spending the most time on her website, she can use this data to improve her marketing efforts.
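As a rough illustration of turning raw traffic into something actionable, the sketch below averages time on site by customer segment. The visit log, segment names and function name are made up for the example; a real analytics tool or CRM export would supply its own fields.

```python
from collections import defaultdict

# Hypothetical visit log: (customer segment, seconds spent on the site).
visits = [
    ("returning customers", 300),
    ("new visitors", 45),
    ("returning customers", 240),
    ("newsletter subscribers", 180),
    ("new visitors", 75),
]


def average_time_by_segment(visits):
    """Return the mean time on site, in seconds, for each segment."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [total seconds, visit count]
    for segment, seconds in visits:
        totals[segment][0] += seconds
        totals[segment][1] += 1
    return {segment: total / count for segment, (total, count) in totals.items()}


averages = average_time_by_segment(visits)
# The segment that spends the most time on the site, on average.
most_engaged = max(averages, key=averages.get)
```

A breakdown like this gives a business owner a specific audience to focus marketing efforts on, rather than a single overall traffic number.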

Tracking Year-Over-Year Data
It is not uncommon for small businesses to operate without large amounts of historical data. However, self-service tools are allowing them to collect information over much longer periods of time. This helps business owners create a better picture of long-term growth that goes deeper than traditional revenue or P&L numbers.

By tracking historical data, companies can begin to evaluate the success of key business decisions, in both the short and long term. Executives can avoid costly errors by drawing on information from previous initiatives that performed poorly. Additionally, they can identify which parts of the business are most profitable and find new ways to expand those services.
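Year-over-year tracking comes down to simple arithmetic: the change from one year to the next, divided by the earlier year’s value. The following sketch shows that calculation; the revenue figures and the function name are hypothetical and serve only to illustrate the formula.

```python
def year_over_year_growth(values_by_year):
    """Return the percentage change from each year to the following year."""
    growth = {}
    years = sorted(values_by_year)
    for previous, current in zip(years, years[1:]):
        base = values_by_year[previous]
        # Skip gaps in the record and years with a zero baseline.
        if current - previous != 1 or base == 0:
            continue
        growth[current] = (values_by_year[current] - base) * 100 / base
    return growth


# Hypothetical annual revenue figures, in dollars.
revenue = {2014: 200_000, 2015: 250_000, 2016: 240_000}
growth = year_over_year_growth(revenue)
```

With these illustrative figures, revenue grew 25 percent in 2015 and fell 4 percent in 2016, a dip that would prompt a closer look at the decisions made that year.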

Small businesses that successfully deploy self-service data solutions can enjoy increased profits and reduced risk by identifying problems sooner rather than later. Hegde asserts that “all businesses need a clear data strategy to create a competitive advantage.” As the technology continues to develop and the number of providers catering to businesses of all sizes increases, data can be expected to remain one of the most important assets an organization can have.

Final Thoughts
Most small business owners assume that “big data” is for “big business.” But it’s not true. If you are able to improve the way your business looks at its metrics, you can make better decisions. You can avoid taking actions that waste time and money. In the end, a better business intelligence strategy will make your company more effective.