NASA’s Lessons Learned database is a vast, constantly updated collection of knowledge and experience from past missions, which the agency relies on when planning future projects and expeditions into space.
The database holds detailed information from every mission going back as far as the 1960s, and every record is reviewed and approved before inclusion. As well as NASA staff, thousands of scientists, engineers, educators and analysts from private-sector and government organizations access the database every month.
As the database swelled in size, the interface used internally to query it – a keyword-based search built on a PageRank-style algorithm – became unwieldy. Chief Knowledge Architect David Meza told me recently that the move to the graph-based, open source Neo4J database management system has significantly cut the time engineers and mission planners spend combing through keyword-based search results.
Meza says: “This came to light when I had a young engineer come to me because he was trying to explore our Lessons Learned database – but sometimes it’s hard to find the information you want in that database.
“He had 23 key terms he was trying to search for across the database of nearly 10 million documents, and because it was based on a PageRank algorithm the records nearest the top of the results were there because they were most frequently accessed, not necessarily because they had the right information.”
The gist of the problem was that even after searching the database, the engineer was left with around 1,000 documents, each of which would have to be read individually to find out whether it held the information he needed.
“I knew there had to be something better we could do,” Meza says. “I started looking at graph database technologies and came across Neo4J. What was really interesting was the way it made it easier to combine information and showcase it in a graph form.
“To me, that is more intuitive, and I know a lot of engineers feel that way. It makes it easier to see patterns and see how things connect.”
The engineer was trying to solve a problem involving corrosion of valves, of the sort used in numerous systems at Johnson Space Center, Texas, including environmental systems and oxygen and fuel tanks.
Using graph visualization, it quickly became apparent that, for some reason, there was a high correlation between records involving this sort of corrosion and topics involving batteries.
“I couldn’t understand how these topics were related,” Meza says, “but when I started looking into the lessons within those topics I was quickly able to see some of the conditions where we had issues with lithium batteries leaking, and acid contaminating the tanks – we definitely had issues.
“So, if I’m concerned about the tanks and the valves within those tanks, I also have to be concerned about whether there are batteries close to them. Having this correlation built in allowed the engineer to find this out much faster.”
Correlating information graphically in this way makes it far quicker to spot links between potentially related information.
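To make the idea concrete, here is a minimal sketch in plain Python of how links between topics can be surfaced by counting which topics appear together in the same record. It is not NASA's implementation and does not use Neo4J; the lesson records, topic tags and function names (`co_occurrence`, `related_topics`) are all invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical lesson records: each one is a set of topic tags.
# These tags are made up for illustration, not taken from NASA data.
lessons = [
    {"valves", "corrosion", "batteries"},
    {"corrosion", "batteries", "tanks"},
    {"corrosion", "batteries"},
    {"valves", "corrosion", "tanks"},
    {"batteries", "thermal"},
    {"software", "testing"},
]


def co_occurrence(records):
    """Count how many records contain each unordered pair of topics.

    Each pair is stored as an alphabetically sorted tuple, so it acts
    as an undirected, weighted edge between two topic nodes.
    """
    counts = Counter()
    for topics in records:
        for a, b in combinations(sorted(topics), 2):
            counts[(a, b)] += 1
    return counts


def related_topics(records, topic):
    """Return (neighbour, weight) pairs for `topic`, strongest link first."""
    linked = Counter()
    for (a, b), n in co_occurrence(records).items():
        if a == topic:
            linked[b] += n
        elif b == topic:
            linked[a] += n
    return linked.most_common()


edges = co_occurrence(lessons)
top_neighbour = related_topics(lessons, "corrosion")[0]
print(top_neighbour)
```

In this toy dataset, "batteries" comes out as the topic most strongly linked to "corrosion", the kind of unexpected connection a keyword search would not bring to the surface on its own. A graph database stores such relationships directly and lets them be queried and visualized, rather than recomputing them from the records each time.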
“To me, it’s a validation,” Meza says. “There are many different ways to look and search for information rather than just a keyword search. And I think utilizing new types of graph databases and other types of NoSQL databases really showcases this – often there are better ways than a traditional relational database management system.”
Neo4J is one of the most commonly used open source graph database management systems. It hit the headlines in 2016 when it was used as a primary tool by journalists working with the ICIJ to analyze the leaked, 2.6 terabyte Panama Papers for evidence of tax evasion, money-laundering and other criminal activity.
For an organization as data-rich as NASA, there are clear benefits to thinking beyond keyword search and PageRank when it comes to accessing information. NASA’s experience serves as another reminder that when you’re undertaking a data-driven enterprise, volume of information isn’t always the deciding factor between success and failure, and in fact can sometimes be a hindrance. Often insights are just as likely to emerge from developing more efficient and innovative ways to query data, and clearer ways to communicate it to those who need it to do their jobs.