AlphaGo is a deep neural network – a program modeled after the human brain.
AlphaGo isn’t a single machine. It is distributed software running across a computer network of more than 170 GPUs and 1,200 CPUs. It’s a prime example of a program based on deep neural nets: hardware and software networks that loosely approximate the structure and function of the web of neurons in our brains.
It combines Monte Carlo tree search with two types of deep neural networks: a policy network and a value network. AlphaGo uses these networks to guide the tree search, which looks ahead by playing out the remainder of the game in its “mind.”
During each simulated game, the policy network suggests which moves are worth considering by predicting what a strong player would most likely play from the current position, while the value network evaluates the resulting position. Finally, AlphaGo selects the move that proved most successful across its simulations.
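To make the interplay between search, policy, and value more concrete, here is a minimal sketch of a policy- and value-guided tree search. It is not AlphaGo's code. It assumes a toy game instead of Go (a pile of stones, each player removes 1 to 3, and whoever takes the last stone wins), and the functions policy and value are hand-written stand-ins for the two neural networks. All names, and the exploration constant c_puct, are hypothetical choices made for illustration.

```python
import math

class Node:
    """One position in the search tree."""
    def __init__(self, prior):
        self.prior = prior    # stand-in policy's probability for the move leading here
        self.visits = 0
        self.value_sum = 0.0  # total value, as seen by the player who moved into this node
        self.children = {}    # move -> Node

    def mean_value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def policy(stones):
    """Stand-in for the policy network: a uniform prior over legal moves."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def value(stones):
    """Stand-in for the value network: outcome estimate in [-1, 1] for the player to move.
    Assumption: a hand-written rule for this toy game (multiples of 4 are lost positions)."""
    return -1.0 if stones % 4 == 0 else 1.0

def choose_move(root_stones, simulations=200, c_puct=1.5):
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, stones, path = root, root_stones, [root]
        # Selection: walk down the tree, balancing good results against the policy prior.
        while node.children:
            parent_visits = node.visits
            move, node = max(
                node.children.items(),
                key=lambda item: item[1].mean_value()
                + c_puct * item[1].prior * math.sqrt(parent_visits) / (1 + item[1].visits),
            )
            stones -= move
            path.append(node)
        # Expansion: ask the policy which moves to consider from the new position.
        for move, prior in policy(stones).items():
            node.children[move] = Node(prior)
        # Evaluation: ask the value function who is likely to win from here.
        estimate = value(stones)
        # Backup: flip the sign at each level because the players alternate.
        for visited in reversed(path):
            estimate = -estimate
            visited.visits += 1
            visited.value_sum += estimate
    # Play the move that was explored most often.
    return max(root.children, key=lambda m: root.children[m].visits)

print(choose_move(5))
```

With five stones on the pile, the search concentrates its simulations on taking one stone, which leaves the opponent with four stones, a lost position in this toy game. A real system would replace the two stand-in functions with trained networks and would run vastly more simulations over a far larger tree.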
These neural networks are a form of machine learning, a technique that enables a computer to “learn” without being fed explicit instructions for specific scenarios. Machine learning systems require huge amounts of data to improve, so DeepMind started by training the policy network on 30 million moves from games between expert human Go players.
DeepMind’s goal was to beat the best players, not merely ape them. To do that, AlphaGo had to discover new strategies for itself, so DeepMind set it to play thousands of games against itself, gradually improving its tactics through a trial-and-error process known as reinforcement learning. Ultimately, the value network became so capable that it could evaluate any Go position and estimate the eventual winner, a feat once thought to be impossible.
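The self-play idea can be illustrated on a much smaller scale. The sketch below is not DeepMind's training procedure: it swaps the deep networks for a plain lookup table and Go for the same kind of toy stone-taking game (remove 1 to 3 stones, last stone wins). The program plays both sides, and after each game it nudges its value estimate for every position it visited toward the actual result. The function names and the parameters alpha (learning rate) and epsilon (exploration rate) are assumptions made for this example.

```python
import random

def self_play_train(max_stones=10, games=5000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # value[s]: estimated outcome (+1 win, -1 loss) for the player about to move with s stones left.
    value = [0.0] * (max_stones + 1)
    value[0] = -1.0  # no stones left: the player to move has already lost
    for game in range(games):
        stones = 1 + game % max_stones  # cycle through the starting positions
        visited = []                    # positions in the order they were played
        while stones > 0:
            visited.append(stones)
            moves = [m for m in (1, 2, 3) if m <= stones]
            if rng.random() < epsilon:
                move = rng.choice(moves)  # trial: try something new
            else:
                # Exploit: leave the opponent in the position currently rated worst for them.
                move = min(moves, key=lambda m: value[stones - m])
            stones -= move
        # The player who moved last took the final stone and won.
        outcome = 1.0
        for position in reversed(visited):
            value[position] += alpha * (outcome - value[position])
            outcome = -outcome  # the previous position belonged to the other player
    return value

def greedy_move(value, stones):
    """Pick the move that leaves the opponent in the lowest-valued position."""
    moves = [m for m in (1, 2, 3) if m <= stones]
    return min(moves, key=lambda m: value[stones - m])

table = self_play_train()
print([round(v, 2) for v in table])
print(greedy_move(table, 3))
```

No human games are involved here: the table starts out knowing only that a player facing an empty pile has lost, and everything else comes from the outcomes of the program's own games. The learned table plays the role that the value network plays in the text, on a game small enough to fit in a list.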