The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
So wrote Wired’s Chris Anderson in 2008. It kicked up a little storm at the time, as Anderson, the magazine’s editor, undoubtedly intended. For example, an article in a journal of molecular biology asked, “…if we stop looking for models and hypotheses, are we still really doing science?” The answer clearly was supposed to be: “No.”
But today — not even a decade since Anderson’s article — the controversy sounds quaint. Advances in computer software, enabled by our newly capacious, networked hardware, are enabling computers not only to start without models — rule sets that express how the elements of a system affect one another — but to generate their own, albeit ones that may not look much like what humans would create. It’s even becoming a standard method, as any self-respecting tech company has now adopted a “machine-learning first” ethic.
We are increasingly relying on machines that derive conclusions from models that they themselves have created, models that are often beyond human comprehension, models that “think” about the world differently than we do.
But this comes with a price. This infusion of alien intelligence is bringing into question the assumptions embedded in our long Western tradition. We thought knowledge was about finding the order hidden in the chaos. We thought it was about simplifying the world. It looks like we were wrong. Knowing the world may require giving up on understanding it.
Models Beyond Understanding
In a series on machine learning, Adam Geitgey explains the basics, from which this new way of “thinking” is emerging:
[T]here are generic algorithms that can tell you something interesting about a set of data without you having to write any custom code specific to the problem. Instead of writing code, you feed data to the generic algorithm and it builds its own logic based on the data.”
For example, you give a machine learning system thousands of scans of sloppy, handwritten 8s and it will learn to identify 8s in a new scan. It does so, not by deriving a recognizable rule, such as “An 8 is two circles stacked vertically,” but by looking for complex patterns of darker and lighter pixels, expressed as matrices of numbers — a task that would stymie humans. In a recent agricultural example, the same technique of numerical patterns taught a computer how to sort cucumbers.
Then you can take machine learning further by creating an artificial neural network that models in software how the human brain processes signals. Nodes in an irregular mesh turn on or off depending on the data coming to them from the nodes connected to them; those connections have different weights, so some are more likely to flip their neighbors than others. Although artificial neural networks date back to the 1950s, they are truly coming into their own only now because of advances in computing power, storage, and mathematics. The results from this increasingly sophisticated branch of computer science can be deep learning that produces outcomes based on so many different variables under so many different conditions being transformed by so many layers of neural networks that humans simply cannot comprehend the model the computer has built for itself.
Yet it works. It’s how Google’s AlphaGo program came to defeat the third-highest ranked Go player in the world. Programming a machine to play Go is more than a little daunting than sorting cukes, given that the game has 10^350 possible moves; there are 10^123 possible moves in chess, and 10^80 atoms in the universe. Google’s hardware wasn’t even as ridiculously overpowered as it might have been: It had only 48 processors, plus eight graphics processors that happen to be well-suited for the required calculations.
AlphaGo was trained on thirty million board positions that occurred in 160,000 real-life games, noting the moves taken by actual players, along with an understanding of what constitutes a legal move and some other basics of play. Using deep learning techniques that refine the patterns recognized by the layer of the neural network above it, the system trained itself on which moves were most likely to succeed.
Although AlphaGo has proven itself to be a world class player, it can’t spit out practical maxims from which a human player can learn. The program works not by developing generalized rules of play — e.g., “Never have more than four sets of unconnected stones on the board” — but by analyzing which play has the best chance of succeeding given a precise board configuration. In contrast, Deep Blue, the dedicated IBM chess-playing computer, has been programmed with some general principles of good play. As Christof Koch writes in Scientific American, AlphaGo’s intelligence is in the weights of all those billions of connections among its simulated neurons. It creates a model that enables it to make decisions, but that model is ineffably complex and conditional. Nothing emerges from this mass of contingencies, except victory against humans.
As a consequence, if you, with your puny human brain, want to understand why AlphaGo chose a particular move, the “explanation” may well consist of the networks of weighted connections that then pass their outcomes to the next layer of the neural network. Your brain can’t remember all those weights, and even if it could, it couldn’t then perform the calculation that resulted in the next state of the neural network. And even if it could, you would have learned nothing about how to play Go, or, in truth, how AlphaGo plays Go—just as internalizing a schematic of the neural states of a human player would not constitute understanding how she came to make any particular move.
Go is just a game, so it may not seem to matter that we can’t follow AlphaGo’s decision path. But what do we say about the neural networks that are enabling us to analyze the interactions of genes in two-locus genetic diseases? How about the use of neural networks to discriminate the decay pattern of single and multiple particles at the Large Hadron Collider? How the use of machine learning to help identify which of the 20 climate change models tracked by the Intergovernmental Panel on Climate Change is most accurate at any point? Such machines give us good results — for example: “Congratulations! You just found a Higgs boson!” — but we cannot follow their “reasoning.”
Clearly our computers have surpassed us in their power to discriminate, find patterns, and draw conclusions. That’s one reason we use them. Rather than reducing phenomena to fit a relatively simple model, we can now let our computers make models as big as they need to. But this also seems to mean that what we know depends upon the output of machines the functioning of which we cannot follow, explain, or understand.
Since we first started carving notches in sticks, we have used things in the world to help us to know that world. But never before have we relied on things that did not mirror human patterns of reasoning — we knew what each notch represented — and that we could not later check to see how our non-sentient partners in knowing came up with those answers. If knowing has always entailed being able to explain and justify our true beliefs — Plato’s notion, which has persisted for over two thousand years — what are we to make of a new type of knowledge, in which that task of justification is not just difficult or daunting but impossible?