The Big (Unstructured) Data Problem


The face of data breaches changed last year. The breach that marked that change for me was the one involving former Secretary of State Colin Powell’s Gmail account. In an attack aimed at exposing the Hillary Clinton campaign, Colin Powell’s emails were posted on DCLeaks.com for everyone to read. One of them had an attachment listing Salesforce’s acquisition targets and the details of its M&A strategy. Colin Powell, a member of Salesforce’s board, had access, through his personal email account, to sensitive information. When his personal email was hacked, all of that sensitive information was exposed — and blasted out in the headlines.

Corporations are trying to lock down sensitive information, most of it in structured systems and in data centers with a variety of security solutions. As it is getting harder for hackers to get to the data they want, they are finding the weakest path to that data and evolving their attack vector. Unstructured data is that new attack vector.

Most enterprises do not understand how much sensitive data they have, and when we consider how much unstructured data (emails, PDFs and other documents) a typical enterprise has under management, the red flags are clear and present. Analysts at Gartner estimate that upward of 80% of enterprise data today is unstructured. This is a big data problem, to say the least. As the level of unstructured data rises and hackers shift their focus to it, unstructured data is an issue that can no longer be placed on the enterprise IT back burner.

What Exactly Is Unstructured Data?

Unstructured data is any data that resides in emails, files, PDFs or documents. Sensitive unstructured data is usually data that was first created in a protected structured system, such as SAP Financials, and then exported into an Excel spreadsheet for easier consumption by audiences who are not SAP users.

Let me give you a very common example in any public company: every quarter, the PR department receives the final quarterly financial numbers via email ahead of the earnings announcement in order to prepare a press release. The draft release is shared via email among a select group within the company before being approved and distributed on the news wires. The moment that financial information is pulled from the ERP system — a system that usually lives behind the corporate firewall, with strong security and identity controls in place and with business owners who govern access to the systems and data within — that formerly safe data is shared freely by email as an Excel file.

A hacker could easily try to hack the credentials of a key employee rather than break into the network and tediously make his or her way to the ERP system. The path to getting the coveted earnings data can be easily shortened by focusing on its unstructured form shared via email or stored in files with limited security.

Right now, enterprises are woefully unprepared. Nearly 80% of enterprises have very little visibility into what’s happening across their unstructured data, let alone a plan for how to manage it. Enterprises are simply not ready to protect data in this form because they don’t understand just how much they have. Worse yet, they don’t even know what lies within those unstructured data files or who owns them. Based on a recent survey conducted by my company, as many as 71% of enterprises are struggling with how to manage and protect unstructured data.

This is especially concerning when we consider the looming General Data Protection Regulation (GDPR) deadline. When that regulation takes effect in May 2018, any consumer data living in these unmanaged files that is exposed during a breach would immediately open the organization up to incredibly steep penalties. While regulations like GDPR put fear into companies, it may be a while before they start to take action. Many companies are struggling to strike the right balance between focusing on reacting to security threats versus time spent evaluating the broader picture of proactively managing risk for their company.

The Path Forward

Enterprises simply cannot afford to ignore the big unstructured data problem any longer. They need an actionable plan, one that starts with this four-step process:

• Find your unstructured data. Sensitive data is most likely spread out across both structured systems (i.e., your ERP application) and unstructured data (i.e., an Excel spreadsheet with exported data from your ERP app) that lives in a file share or the numerous cloud storage systems companies use today for easier cross-company sharing and collaboration. (A minimal discovery sketch follows this list.)
• Classify and assign an owner to that data. Not all data has value, but even some stale data may still be of a sensitive nature. Take the time to review all data and classify it to help you focus only on the most sensitive areas. Then assign owners to the classified unstructured data. If you do not know whom it belongs to, ask the many consumers of that data; they almost always point in the same direction — its natural owner.
• Understand who has access to your data. It’s extremely important to understand who has access to all sensitive company information, so access controls need to be placed on both structured and unstructured data.
• Put parameters around your data. Sensitive data should be accessed on a “need to know” basis, meaning only a select few in the company should have regular access to your more sensitive files, the ones that could have serious consequences if they ended up in the wrong hands.
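As a rough illustration of the first two steps, and not something taken from the article, a first-pass discovery scan over a file share might look like the Python sketch below. The share path, file types, and sensitive-data patterns are placeholder assumptions; a real program would also need parsers for Excel and PDF files.

```python
# Minimal sketch: scan a file share for text-like files that appear to contain
# sensitive data, so they can be classified and assigned an owner.
# The share path, patterns, and report name are illustrative assumptions.
import csv
import re
from pathlib import Path

SHARE = Path("/mnt/finance_share")          # hypothetical mounted file share
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "salary_keyword": re.compile(r"salary|compensation|earnings", re.I),
}

def classify(path: Path) -> list[str]:
    """Return the names of the sensitive patterns found in a text-like file."""
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return []
    return [name for name, rx in PATTERNS.items() if rx.search(text)]

with open("unstructured_data_inventory.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["file", "owner_uid", "last_modified", "classifications"])
    for path in SHARE.rglob("*"):
        if path.suffix.lower() not in {".csv", ".txt", ".log"}:
            continue  # binary formats (xlsx, pdf) need dedicated parsers
        hits = classify(path)
        if hits:
            stat = path.stat()
            writer.writerow([str(path), stat.st_uid, stat.st_mtime, ";".join(hits)])
```

The resulting inventory is the starting point for the classification and ownership conversations described in the second step.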

With these steps in place, you can better prevent anyone within your company from having access to files they don’t need to do their jobs and ultimately minimize the risk of a breach. And although there are data access governance solutions that help corporations protect unstructured data, very few enterprises today have such a program in place. Ultimately, these solutions will need to find their way into enterprises as hackers once again change their attack vector to easier prey.

Source: Forbes

Using Cell Phone Data to Predict the Next Epidemic

Whom you call is linked to where you travel, which dictates how viruses spread.

Can big data about whom we call be used to predict how a viral epidemic will spread?

It seems unlikely. After all, viruses do not spread over a cell network; they need us to interact with people in person.

Yet, it turns out that the patterns in whom we call can be used to predict patterns in where we travel, according to new research from Kellogg’s Dashun Wang. This in turn can shed light on how an epidemic would spread.

Both phone calls and physical travel are highly influenced by geography. The further away a shopping mall or post office is from our home, after all, the less likely we are to visit it. Similarly, our friends who live in the neighborhood are a lot likelier to hear from us frequently than our extended family in Alberta.

But Wang and colleagues were able to take this a step further. By analyzing a huge amount of data on where people travel and whom they call, they were able to determine a mathematical formula that links how distance shapes these two very different activities. This understanding provides a framework for using data about long-distance interactions to predict physical ones—and vice versa.
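The article does not spell out the formula itself, so as a purely illustrative sketch, here is how one might compare distance decay in calling and travel by fitting simple power laws to frequency-by-distance data. The data and exponents below are synthetic, invented only to show the mechanics.

```python
# Illustrative only: the article does not give the paper's actual formula.
# One simple way to relate the two behaviors is to fit a power-law distance
# decay, f(d) ~ d**(-alpha), to each and compare the fitted exponents.
import numpy as np

rng = np.random.default_rng(0)
distance_km = np.logspace(0, 3, 30)               # distance bins, 1 km .. 1000 km

def decay(alpha, noise=0.1):
    # synthetic frequencies following d**(-alpha) with multiplicative noise
    return distance_km ** (-alpha) * np.exp(rng.normal(0, noise, distance_km.size))

calls_per_day = decay(alpha=1.5)    # hypothetical call frequencies by distance
trips_per_day = decay(alpha=2.0)    # hypothetical travel frequencies by distance

def fit_exponent(freq):
    # slope of log(freq) vs log(distance) gives the decay exponent
    slope, _ = np.polyfit(np.log(distance_km), np.log(freq), 1)
    return -slope

a_calls, a_trips = fit_exponent(calls_per_day), fit_exponent(trips_per_day)
print(f"call decay exponent: {a_calls:.2f}, travel decay exponent: {a_trips:.2f}")
# If the two exponents are related by a stable mapping, observed call patterns
# can be used to estimate travel patterns, and vice versa.
```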

As humans, we do not like to think that someone could anticipate our actions, says Wang, an associate professor of management and organizations. But his evidence says otherwise. “It’s just fascinating to see this kind of deep mathematical relationship in human behavior,” he says.

Wang’s conclusions were based on the analysis of three massive troves of cell phone data collected for billing purposes. The data, from three nations spanning two continents, included geographic information about where cell phone users traveled, as well as information about each phone call placed or received, and how far a user was from the person on the other end of the line.

The discovery of this underlying relationship between physical and nonphysical interactions has significant practical implications. For example, the researchers were able to model the spread of a hypothetical virus, which started in a few randomly selected people and then spread to others in the vicinity, using only the data about the flow of phone calls between various parties. Those predictions were remarkably similar to ones generated by actual information about where users traveled and thus where they would be likely to spread or contract a disease.
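As a toy illustration of that modeling idea, and not the researchers’ actual method, the sketch below seeds an infection in a few random people and lets it spread along a synthetic call network, with heavier call volume standing in for more frequent in-person contact.

```python
# Toy epidemic over a call network: the graph and parameters are synthetic
# stand-ins for the billing data described in the article.
import random
import networkx as nx

random.seed(1)
G = nx.barabasi_albert_graph(5000, 3)               # stand-in social/call graph
for u, v in G.edges:
    G.edges[u, v]["calls"] = random.randint(1, 20)  # weekly call volume (made up)

infected = set(random.sample(list(G.nodes), 5))     # a few random seed cases
beta = 0.01                                         # per-call transmission chance

for day in range(30):
    newly = set()
    for person in infected:
        for neighbor in G.neighbors(person):
            if neighbor in infected:
                continue
            calls = G.edges[person, neighbor]["calls"]
            # more calls ~ more face-to-face contact ~ higher infection chance
            if random.random() < 1 - (1 - beta) ** calls:
                newly.add(neighbor)
    infected |= newly
    print(f"day {day + 1}: {len(infected)} infected")
```

Comparing a run like this against one driven by actual travel data is, in spirit, the comparison the researchers report.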

“I think that’s a great example to illustrate the opportunities brought about by big data,” Wang says. “The paper represents a major step in our quantitative understanding of how geography governs the way in which we are connected. These insights can be particularly relevant in a business world that is becoming increasingly interconnected.”

Source: Kellogg Insight

Why Big Data Will Revolutionize B2B Marketing Strategies


B2B, or business-to-business marketing, involves selling a company’s services or products to another company. Consumer marketing and B2B marketing are really not that different. Basically, B2B uses the same principles to market its product, but the execution is a little different. B2B buyers make their purchases solely based on price and profit potential, while consumers make their purchases based on emotional triggers, status, popularity, and price. B2B is a large industry.

The fact that more than 50 percent of all economic activity in the United States is made up of purchases by institutions, government agencies, and businesses gives you a sense of the size of this industry. Technological advancements and the internet have given B2Bs new ways to make sense of their big data, learn about prospects, and improve their conversion rates. Innovations such as marketing automation platforms and marketing technology — sometimes referred to as ‘martech’ — will revolutionize the way B2B companies market their products. They will be able to deliver mass personalization and nurture leads through the buyer’s journey.

In the next few years, these firms will be spending 73% more on marketing analytics. What does this mean for B2B marketing? The effects of new technology on B2B marketing will be more pronounced in some key areas. These are:

Lead Generation

In the old days, businesses had to spend fortunes on industry reports and market research to find out how and to whom to market their products. They had to build their marketing efforts around what their existing customer base seemed to like. However, growing access to technology and analytics has made revenue attribution and lead nurturing a predictable, measurable, and more structured process. While demand generation is an abstraction or a form of art (depending largely on who you ask), lead generation is a repeatable, scientific process. This means less guesswork and more revenue.

Small Businesses

Thanks to the SaaS (software-as-a-service) revolution, technologies once only available to elite firms — revenue reporting, real-time web analytics, and marketing automation — are now accessible and affordable to businesses of all sizes. Instead of attempting to build economies of scale, smaller businesses are using the power of these innovations to give the bigger competition a run for its money. With SaaS, small businesses can now narrow their approaches and zero in on key accounts.

In the context of business to business marketing, this means that instead of trying to attract unqualified, uncommitted top-tier leads, these companies will go after matched stakeholders and accounts and earn their loyalty by providing exceptional customer experiences.

Data Analytics

A few years ago, data was the most underutilized asset in the hands of a marketer. That has since changed. Marketers are quickly coming to the realization that when it comes to their trade, big data is now more valuable than ever — for measuring results, targeting prospects, and improving campaigns — and are in search of more ways to exploit it. B2B marketing is laden with new tools that capitalize on data points. These firms use data scraping techniques and tools to customize their sites for their target audiences. Businesses can even use predictive lead scoring to gauge how leads will perform in the future, and platforms such as Apache Kafka provide distributed streaming infrastructure for building the real-time data pipelines and streaming applications that feed these tools.
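For readers curious what predictive lead scoring looks like in practice, here is a minimal, hypothetical sketch using scikit-learn: train a model on past leads and score new ones by their probability of converting. The features and labels are made up for illustration.

```python
# Minimal lead-scoring sketch: fit a classifier on historical leads (synthetic
# here) and rank new leads by predicted conversion probability.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
# hypothetical features: pages viewed, emails opened, log company size, demo requested
X = np.column_stack([
    rng.poisson(5, n),
    rng.poisson(3, n),
    rng.normal(4, 1, n),
    rng.integers(0, 2, n),
])
# synthetic "converted" label loosely driven by the features
logits = 0.2 * X[:, 0] + 0.4 * X[:, 1] + 0.3 * X[:, 2] + 1.5 * X[:, 3] - 4
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]          # conversion probability per lead
print("top lead scores:", np.round(np.sort(scores)[-5:], 2))
```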

Revenue

It has always been hard for firms to calculate their return on marketing investment (ROMI), but the integration of marketing automation and CRM has made it easier for B2Bs to track and measure marketing campaign efforts through revenue marketing.

Technological advancements have exciting implications for the B2B industry. In order to exploit this technology and gain a competitive edge, companies have to stay up to date. The risk involved is minimal, so these firms have little to lose by embracing it.

Source: Innovation Management

How To Build A Big Data Engineering Team


Companies are digitizing and pushing all their operational functions and workflows into IT systems that benefit from so-called ‘big data’ analytics. Using this approach, firms can start to analyze the massive firehose stream of data now being recorded by the Internet of Things (IoT) with its sensors and lasers designed to monitor physical equipment. They can also start to ingest and crunch through the data streams being produced in every corner of what is now a software-driven, data-driven business model.

All well and good, but who is going to do all this work? It looks like your company is going to have to establish a data engineering department.

Drinking from the data firehose
As a technologist, writer and speaker on software engineering, Aashu Virmani also holds the role of chief marketing officer at in-database analytics software company Fuzzy Logix (known for its DB Lytix product). Virmani claims that there’s gold in them thar data hills, if we know how to get at it. This is the point where firms start to realize that they need to invest in an ever larger army of data engineers and data scientists.

But who are these engineers and scientists? Are they engineers in the traditional sense with greasy spanners and overalls? Are they scientists in the traditional sense with bad hair and too many ballpoint pens in their jacket pockets? Not as such, obviously, because this is IT.

What’s the difference between a data scientist & a data engineer?
“First things first, let’s ensure we understand what the difference between a data scientist and a data engineer really is because, if we know this, then we know how best to direct them to drive value for the business. In the most simple of terms, data engineers worry about data infrastructure while data scientists are all about analysis,” explains Fuzzy Logix’s Virmani.

Boiling it down even more, one prototypes and the other deploys.

Is one role more important than the other? That’s a bit like asking whether a fork is more important than a knife. Both have their purposes and both can operate independently. But in truth, they really come into their own when used together.

What makes a good data scientist?
“They (the data scientist) may not have a ton of programming experience but their understanding of one or more analytics frameworks is essential. Put simply, they need to know which tool to use (and when) from the tool box available to them. Just as critically, they must be able to spot data quality issues because they understand how the algorithms work,” said Virmani.

He asserts that a large part of their role is hypothesis testing (confirming or denying a well-known thesis) but the data scientist that knows their stuff will impartially let the data tell its own story.

Virmani continued, “Visualizing the data is just as important as being a good statistician, so the effective data scientist will have knowledge of some visualization tools and frameworks to, again, help them tell a story with the data. Lastly, the best data scientists have a restless curiosity which compels them to try and fail in the process of knowledge discovery.”

What makes a good data engineer?
To be effective in this role, a data engineer needs to know the database technology. Cold. Teradata, IBM, Oracle, Hadoop are all ‘first base’ for the data engineer you want in your organization.

“In addition to knowing the database technology, the data engineer has an idea of the data schema and organization – how their company’s data is structured, so he or she can put together the right data sets from the right sources for the scientist to explore,” said Virmani.

The data engineer will be utterly comfortable with the ‘pre’ and ‘post’ tasks that surround the data science itself. The ‘pre’ tasks mostly deal with what we call ETL – Extract, Transform, Load.

Virmani continued, “Often it may be the case that the data science is happening not in the same platform, but an experimental copy of the database and often in a small subset of the data. It is also frequently the case that IT may own the operational database and may have strict rules on how/when the data can be accessed.  A data science team needs a ‘sandbox’ in which to play – either in the same DB environment, or in a new environment intended for data scientists. A data engineer makes that possible. Flawlessly.”
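To make the ‘pre’ side concrete, here is a bare-bones, hypothetical ETL sketch in Python: extract a subset of rows from a stand-in operational database, transform them, and load the prepared table into a sandbox the data science team can query. The schema, table names, and databases are invented for illustration.

```python
# Bare-bones ETL into a sandbox. An in-memory SQLite database stands in for the
# operational system so the sketch is self-contained; the schema is invented.
import sqlite3
import pandas as pd

# Stand-in "operational" database with a tiny orders table
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE orders (order_id INT, customer_id INT, amount REAL, order_ts TEXT);
    INSERT INTO orders VALUES (1, 10, 99.5, '2017-03-02'),
                              (2, 11, -5.0, '2017-03-03'),
                              (3, 10, 42.0, '2017-04-09');
""")
dst = sqlite3.connect("sandbox.db")       # stand-in analytics sandbox

# Extract: pull only the rows the scientists actually need
orders = pd.read_sql_query(
    "SELECT * FROM orders WHERE order_ts >= '2017-01-01'", src
)

# Transform: fix types, drop obviously bad rows, derive a reporting column
orders["order_ts"] = pd.to_datetime(orders["order_ts"])
orders = orders[orders["amount"] > 0]
orders["order_month"] = orders["order_ts"].dt.strftime("%Y-%m")

# Load: write the prepared data set into the sandbox for exploration
orders.to_sql("orders_prepared", dst, if_exists="replace", index=False)
print(pd.read_sql_query("SELECT * FROM orders_prepared", dst))
```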

Turning to ‘post’ tasks: once the data science happens (say, a predictive model is built that determines which credit card transactions are fraudulent), the process needs to be ‘operationalized’. This requires that the analytic model developed by the data scientists be moved from the ‘sandbox’ environment to the real production/operational database or transaction system. The data engineer is the role that can take the output of the data scientist and help put it into production. Without this role, there will be tons of insights (some proven, some unproven) but nothing in production to show whether the model is delivering business value in real time.

Ok, so you now understand what ‘good’ looks like in terms of data scientists and engineers but how do you set them up for success?

How to make your big data team work
The first and most important factor here is creating the right operational structure to allow both parties to work collaboratively and to gain value from each other. Both roles function best when supported by the other so create the right internal processes to allow this to happen.

Fuzzy Logix’s Virmani cautions that we should never let this become a tug of war between the CIO and the CDO (chief data officer) – where the CDO’s organization just wants to get on with the analysis/exploration, while the IT team wants to control access to every table/row there is (for what may be valid reasons).

“Next, invest in the right technologies to allow them to maximize their time and to focus on the right areas. For example, our approach at Fuzzy Logix is to embed analytics directly into applications and reporting tools, freeing data scientists up to work on high-value problems,” he said.

Don’t nickel & dime on talent

Speaking to a number of firms in the big data space trying to establish big data teams, one final truth appears to resonate — don’t nickel and dime on talent. These roles are new (comparatively) and if you pay for cheap labor then most likely you’re not going to get data engineering or data science gold.

Fuzzy Logix offers in-database and GPU-based analytics solutions built on libraries of over 600 mathematical, statistical, simulation, data mining, time series and financial models. The firm has an (arguably) non-corporate (relatively) realistic take on real world big data operations and this conversation hopefully sheds some light on the internal mechanics of a department that a lot of firms are now working to establish.

Source: Forbes

Big Data Is Filling Gender Data Gaps—And Pushing Us Closer to Gender Equality


Imagine you are a government official in Nairobi, working to deploy resources to close educational achievement gaps throughout Kenya. You believe that the literacy rate varies widely in your country, but the available survey data for Kenya doesn’t include enough data about the country’s northern regions. You want to know where to direct programmatic resources, and you know you need detailed information to drive your decisions.

But you face a major challenge—the information does not exist.

Decision-makers want to use good data to inform policy and programs, but in many scenarios, quality, complete data is not available. And though this is true for large swaths of people around the world, this lack of information acutely impacts girls and women, who are often overlooked in data collection even when traditional surveys count their households. If we do not increase the availability and use of gender data, policymakers will not be able to make headway on national and global development agendas.

Gender data gaps are multiple and intersectional, and although some are closing, many persist despite the simultaneous explosion of new data sources emerging from new technologies. So, what if there was a way to utilize these new data sources to count those women and girls, and men and boys, who are left out by traditional surveys and other conventional data collection methods?

Big Data Meets Gender Data
“Big data” refers to large amounts of data collected passively from digital interactions with great variety and at a high rate of velocity. Cell phone use, credit card transactions, and social media posts all generate big data, as does satellite imagery which captures geospatial data.

In recent years, researchers have been examining the potential of big data to complement traditional data sources, but Data2X entered this space in 2014 because we observed that no one was investigating how big data could help increase the scope, scale, and quality of data about the lives of women and girls.

Data2X is a collaborative technical and advocacy platform that works with UN agencies, governments, civil society, academics, and the private sector to close gender data gaps, promote expanded and unbiased gender data collection, and use gender data to improve policies, strategies, and decision-making. We host partnerships which draw upon technical expertise, in-country knowledge, and advocacy insight to tackle and rectify gender data gaps. Across partnerships, this work necessitates experimental approaches.

And so, with this experimental approach in-hand, and with support from our funders, the William and Flora Hewlett Foundation and the Bill & Melinda Gates Foundation, Data2X launched four research pilots to build the evidence base for big data’s possible contributions to filling gender data gaps.

Think back to the hypothetical government official in Kenya trying to determine literacy rates in northern Kenya. This time, a researcher tells her that it’s possible – that by using satellite imagery to identify correlations between geospatial elements and well-being outcomes, the researcher can map the literacy rate for women across the entire country.

This is precisely what Flowminder Foundation, one of the four partner organizations in Data2X’s pilot research, was able to do. Researchers harnessed satellite imagery to fill data gaps, finding correlations between geospatial elements – such as accessibility, elevation, or distance to roads – and social and health outcomes for girls and women (as reported in traditional surveys), such as literacy, access to contraception, and child stunting rates. Flowminder then mapped these phenomena, displaying continuous landscapes of gender inequality which can provide policymakers with timely information on regions with greatest inequality of outcomes and highest need for resources.
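To give a flavor of the approach, and not Flowminder’s actual model, the sketch below trains a regressor on synthetic geospatial covariates at surveyed locations and then predicts the outcome across a full grid of cells, which is the step that produces a continuous map.

```python
# Illustrative only: learn the link between geospatial covariates and a survey
# outcome at surveyed sites, then predict the outcome on a full grid.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

def covariates(n):
    # synthetic covariates: elevation (m), distance to road (km), travel time to town (h)
    return np.column_stack([
        rng.normal(1200, 300, n),
        rng.exponential(5, n),
        rng.exponential(2, n),
    ])

X_surveyed = covariates(400)                      # locations with survey clusters
# synthetic "female literacy rate" loosely tied to accessibility
lit = 0.9 - 0.02 * X_surveyed[:, 1] - 0.05 * X_surveyed[:, 2] + rng.normal(0, 0.05, 400)
y_surveyed = np.clip(lit, 0, 1)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_surveyed, y_surveyed)

X_grid = covariates(10_000)                       # covariates for every grid cell
literacy_map = model.predict(X_grid)              # continuous surface to map
print("predicted literacy, grid mean:", literacy_map.mean().round(3))
```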

This finding, and many others, are outlined in a new Data2X report, “Big Data and the Well-Being of Women and Girls,” which for the first time showcases how big data sources can fill gender data gaps and inform policy on girls’ and women’s lives. In addition to the individual pilot research findings outlined in the report, there are four high-level takeaways from this first phase of our work:

Country Context is Key: The report affirms that in developing and implementing approaches to filling gender gaps, country context is paramount – and demands flexible experimentation. In the satellite imagery project, researchers’ success with models varied by country: models for modern contraceptive use performed strongly in Tanzania and Nigeria, whereas models for girls’ stunting rates were inadequate for all but one pilot country.

To Be Useful, Data Must Be Actionable: Even with effective data collection tools in place, data must be demand-driven and actionable for policymakers and in-country partners. Collaborating with National Statistics Offices, policymakers must articulate what information they need to make decisions and deploy resources to resolve gender inequalities, as well as their capacity to act on highly detailed data.

One Size Doesn’t Fit All: In filling gender data gaps, there is no one-size-fits-all solution. Researchers may find that in one setting, a combination of official census data and datasets made available through mobile operators sufficiently fills data gaps and provides information which meets policymakers’ needs. In another context, satellite imagery may be most effective at highlighting under-captured dimensions of girls’ and women’s lives in under-surveyed or resource-poor areas.

Ground Truth: Big data cannot stand alone. Researchers must “ground truth,” using conventional data sources to ensure that digital data enhances, but does not replace, information gathered from household surveys or official census reviews. We can never rely solely on data sources which carry implicit biases towards women and girls who experience fewer barriers to using technology and higher rates of literacy, leaving out populations with fewer resources.

Big data offers great promise to complement information captured in conventional data sources and provide new insights into potentially overlooked populations. There is significant potential for future, inventive applications of these data sources, opening up opportunities for researchers and data practitioners to apply big data to pressing gender-focused challenges.

When actionable, context-specific, and used in tandem with existing data, big data can strengthen policymakers’ evidence base for action, fill gender data gaps, and advance efforts to improve outcomes for girls and women.

Source: cfr.org

Stanford sociologists encourage researchers to study human behavior with help of existing online communities, big data

A group of Stanford experts are encouraging more researchers who study social interaction to conduct studies that examine online environments and use big data.


The internet dominates our world and each one of us is leaving a larger digital footprint as more time passes. Those footprints are ripe for studying, experts say.

A new paper urges sociologists and social psychologists to focus on developing online research studies with the help of big data to advance theories of social interaction and structure. (Image credit: pixelfit / Getty Images)

In a recently published paper, a group of Stanford sociology experts encourage other sociologists and social psychologists to focus on developing online research studies with the help of big data in order to advance the theories of social interaction and structure.

Companies have long used information they gather about their online customers to compare how different versions of their products perform, a process called A/B testing. Researchers in other fields, such as computer science, have also been taking advantage of the growing amount of data.
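For readers unfamiliar with A/B testing, here is a minimal example with made-up numbers: compare conversion rates for two product variants and check whether the difference is statistically meaningful.

```python
# Minimal A/B test check: did variant B convert better than variant A?
# The counts are invented; a chi-square test on the 2x2 table gives a p-value.
from scipy.stats import chi2_contingency

conversions = {"A": (480, 10_000), "B": (545, 10_000)}   # (converted, shown)
table = [
    [conversions["A"][0], conversions["A"][1] - conversions["A"][0]],
    [conversions["B"][0], conversions["B"][1] - conversions["B"][0]],
]
chi2, p_value, _, _ = chi2_contingency(table)
print(f"A: {conversions['A'][0] / conversions['A'][1]:.2%}, "
      f"B: {conversions['B'][0] / conversions['B'][1]:.2%}, p = {p_value:.3f}")
```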

But the standard for many experiments on social interactions remains limited to face-to-face laboratory studies, said Paolo Parigi, a lead author of the study, titled “Online Field Experiments: Studying Social Interactions in Context.”

Parigi, along with co-authors Karen Cook, a professor of sociology, and Jessica Santana, a graduate student in sociology, are urging more sociology researchers to take advantage of the internet.

“What I think is exciting is that we now have data on interactions to a level of precision that was unthinkable 20 years ago,” said Parigi, who is also an adjunct professor in the Department of Civil and Environmental Engineering.

Online field experiments
In the new study, the researchers make a case for “online field experiments” that could be embedded within the structure of existing communities on the internet.

The researchers differentiate online field experiments from online lab experiments, which create a controlled online situation instead of using preexisting environments that have engaged participants.

“The internet is not just another mechanism for recruiting more subjects,” Parigi said. “There is now space for what we call computational social sciences that lies at the intersection of sociology, psychology, computer science and other technical sciences, through which we can try to understand human behavior as it is shaped and illuminated by online platforms.”

As part of this type of experiment, researchers would utilize online platforms to take advantage of big data and predictive algorithms. Recruiting and retaining participants for such field studies is therefore more challenging and time-consuming because of the need for a close partnership with the platforms.

But online field experiments allow researchers to gain an enhanced look at certain human behaviors that cannot be replicated in a laboratory environment, the researchers said.

For example, theories about how and why people trust each other can be better examined in online environments, the researchers said, because the context of different complex social relationships is recorded. In laboratory experiments, researchers can only isolate the type of trust that occurs between strangers, which is called “thin” trust.

Most recently, Cook and Parigi have used the field experiment design to research the development of trust in online sharing communities, such as Airbnb, a home and room rental service. The results of the study are scheduled to be published later this year. More information about that experiment is available at stanfordexchange.org.

“It’s a new social world out there,” Cook said, “and it keeps expanding.”

Ethics of studying internet behavior
Using big data does come with a greater need for ethical responsibility. In order for online studies of social interactions to be as accurate as possible, researchers require access to private information about their participants.

One solution that protects participants’ privacy is linking their information, such as names or email addresses, to unique identifiers, which could be a set of letters or numbers assigned to each research subject. The administrators of the platform would then provide those identifiers to researchers without compromising privacy.
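One common, hypothetical way to implement that approach is a keyed hash: the platform derives a stable pseudonymous ID from each email address and shares only the ID with researchers, keeping the secret key (and the identities) to itself. Key management and consent handling are omitted in this sketch.

```python
# The platform derives a stable pseudonymous ID from each email with a keyed
# hash (HMAC) and shares only the ID with researchers. The secret key stays on
# the platform; all values here are made up.
import hashlib
import hmac

PLATFORM_SECRET = b"keep-this-on-the-platform-only"   # never shared with researchers

def pseudonymous_id(email: str) -> str:
    digest = hmac.new(PLATFORM_SECRET, email.strip().lower().encode(), hashlib.sha256)
    return digest.hexdigest()[:16]                    # short, stable identifier

print(pseudonymous_id("alice@example.com"))
print(pseudonymous_id("Alice@Example.com "))          # same person -> same ID
```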

It’s also important to make sure researchers acquire the permission of the online platforms’ participants. Transparency is key in those situations, Cook said.

The research was funded by the National Science Foundation.

Source: Stanford News

Big Data: Why NASA Can Now Visualize Its Lessons Learned


NASA’s Lessons Learned database is a vast, constantly updated collection of knowledge and experience from past missions, which the agency relies on for planning future projects and expeditions into space.

With detailed information from every mission going back as far as the 1960s, every record is reviewed and approved before inclusion. As well as NASA staff, thousands of scientists, engineers, educators and analysts from private-sector and government organizations access the database every month.

As it has swollen in size, the interface used internally to query the dataset – a keyword-based search built on a PageRank-style algorithm – was becoming unwieldy. Chief Knowledge Architect David Meza spoke to me recently and told me that the move to the graph-based, open source Neo4J management system has significantly cut down on time engineers and mission planners spend combing through keyword-based search results.

Meza says, “This came to light when I had a young engineer come to me because he was trying to explore our Lessons Learned database – but sometimes it’s hard to find the information you want in that database.

“He had 23 key terms he was trying to search for across the database of nearly 10 million documents, and because it was based on a PageRank algorithm the records nearest the top of the results were there because they were most frequently accessed, not necessarily because they had the right information.”

The gist of the problem was that even after searching the database, the engineer was left with around 1,000 documents which would need to be read through individually to know if they held information he needed.

“I knew there had to be something better we could do,” Meza says. “I started looking at graph database technologies and came across Neo4J. What was really interesting was the way it made it easier to combine information and showcase it in a graph form.

“To me, that is more intuitive, and I know a lot of engineers feel that way. It makes it easier to see patterns and see how things connect.”

The engineer was trying to solve a problem involving corrosion of valves, of the sort used in numerous technologies in use at Johnson Space Center, Texas, including environmental systems, oxygen and fuel tanks.

Using graph visualization, it quickly became apparent that, for some reason, there was a high correlation between records involving this sort of corrosion and topics involving batteries.

“I couldn’t understand how these topics were related,” Meza says, “but when I started looking into the lessons within those topics I was quickly able to see some of the conditions where we had issues with lithium batteries leaking and acid contaminating the tanks – we definitely had issues.

“So, if I’m concerned about the tanks and the valves within those tanks, I also have to be concerned about whether there are batteries close to them. Having this correlation built in allowed the engineer to find this out much faster.”

Correlating information graphically in this way makes it far quicker to spot links between potentially related information.
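To illustrate the kind of query a graph database enables, here is a hypothetical sketch using the Neo4j Python driver. The node labels, relationship type, credentials, and topic names are assumptions for illustration, not NASA’s actual schema.

```python
# Sketch of a query that surfaces topics co-occurring with a given topic across
# lessons. The schema (Lesson and Topic nodes linked by MENTIONS), the URI, and
# the credentials are invented for this example.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

QUERY = """
MATCH (t1:Topic {name: $topic})<-[:MENTIONS]-(l:Lesson)-[:MENTIONS]->(t2:Topic)
WHERE t1 <> t2
RETURN t2.name AS related_topic, count(l) AS shared_lessons
ORDER BY shared_lessons DESC
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(QUERY, topic="valve corrosion"):
        print(record["related_topic"], record["shared_lessons"])

driver.close()
```

A query like this is what lets an engineer jump straight from “valve corrosion” to the battery-leak lessons instead of reading a thousand keyword hits.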

“To me, it’s a validation,” Meza says. “There are many different ways to look and search for information rather than just a keyword search. And I think utilizing new types of graph databases and other types of NoSQL databases really showcases this – often there are better ways than a traditional relational database management system.”

Neo4J is one of the most commonly used open source graph database management systems. It hit the headlines in 2016 when it was used as a primary tool by journalists working with the ICIJ to analyze the leaked, 2.6 terabyte Panama Papers for evidence of tax evasion, money-laundering and other criminal activity.

Obviously, to an organization as data-rich as NASA, there are clear benefits to thinking beyond keyword and PageRank when it comes to accessing information. NASA’s experience serves as another reminder that when you’re undertaking data-driven enterprise, volume of information isn’t always the deciding factor between success and failure, and in fact can sometimes be a hindrance. Often insights are just as likely to emerge from developing more efficient and innovative ways to query data, and clearer ways to communicate it to those who need it to do their jobs.

Source: Forbes