Tuesday, February 7, 2017

AlphaGo and the Problem of Human Excellence

From Scott Alexander's summary of a recent conference on Artificial Intelligence, this interesting bit about the Go-playing computer program AlphaGo:
AlphaGo has gotten much better since beating Lee Sedol and its creators are now trying to understand the idea of truly optimal play. I would have expected Go players to be pretty pissed about being made obsolete, but in fact they think of Go as a form of art and are awed and delighted to see it performed at superhuman levels.

More interesting for the rest of us, AlphaGo is playing moves and styles that all human masters had dismissed as stupid centuries ago. Human champion Ke Jie said that:

After humanity spent thousands of years improving our tactics, computers tell us that humans are completely wrong. I would go as far as to say not a single human has touched the edge of the truth of Go.


One Go master said that he would have “slapped” a student for playing a strategy AlphaGo won with. A couple of people talked about how the quest for “optimal Go” wasn’t just about one game, but about grading human communities. Here we have this group of brilliant people who have been competing against each other for centuries, gradually refining their techniques. Did they come pretty close to doing as well as merely human minds could manage? Or did non-intellectual factors – politics, conformity, getting trapped at local maxima – cause them to ignore big parts of possibility-space? Right now it’s very preliminarily looking like the latter, which would be a really interesting result – especially if it gets replicated once AIs take over other human fields.
I think many fields have developed this way. There is a strategy, and someone develops a counter to that strategy, and somebody develops a counter to that, and so on, so that each move is determined by previous moves and nobody ever steps outside the discourse to consider novel approaches. I keep raising this with my son, who plays an online game called League of Legends. Every team plays this in pretty much the same fairly obvious way, and I refuse to believe that it is necessarily the best way. Why, I keep asking, don't people experiment with radically different strategies? He replies that sometimes people do but they usually lose; I am not impressed by this answer, since one might have to employ a new strategy dozens of times to figure out the necessary nuances.

There was a guy who nearly wrecked naval wargaming a few years ago. He kept diving into the rule book to produce fleets that made no sense in real-world terms but were unbeatable in the game – for example he once used a vast fleet of immobile boats with no armor, each with one huge gun. Since most naval wargamers want to imagine themselves in actual historical situations, commanding fleets that make historical sense, this enraged them. Unable to write rules that prevented his crazy tactics, they eventually told him that they were just going to cancel all their tournaments until he agreed not to enter.

On a more serious note, politics. Did Donald Trump just show that the whole apparatus of parties and campaign staffs and strategists and ad buys is a complete waste of time compared to a clever Twitter account? Maybe so. And Republicans in Congress seem to have hit on a pretty good new strategy for dealing with a President of the opposite party: refuse any cooperation, vote against everything, denounce him at every opportunity as an un-American extremist. I hated this, but I see pieces every day from Democrats arguing that they should now do exactly the same thing to Trump.

And what about real war? Surely the way the American military fights is not the best possible way.

It bears thinking on: the way we do things depends on how our systems and habits have evolved over time, and the optimal solution is almost certainly something different.

6 comments:

Unknown said...

A fascinating and thought-provoking post. It may be said that in many situations one criterion of the "best" way to do things is the question of what other humans will tolerate. That is, it's not only a simple failure of imagination that keeps people acting within the limits of what other people expect. Some "better" ways of doing things simply aren't going to be tolerated by the other humans, and thereby these innovative ways lose their status of being "better." Thus the naval wargamers decided that the disrupter's method of playing simply wasn't acceptable--and thus in the long run it was a failure. Likewise, an obvious "better" way to win wars today is to use nuclear weapons; but the nation that did so on any scale and with any regularity would very likely become a pariah to be taken down by everyone else--as happened to Nazi Germany, when they decided that arbitrarily canceling treaties and attacking neutrals who had provided no casus belli was a way to win.

Of course, the eventual conscious AI may decide that humans are an expendable inefficiency on the way to fulfilling the overall imperative of finding the "best" way to do everything.

G. Verloren said...

"I keep raising this with my son who plays an online game called League of Legends. Every team plays this in pretty much the same fairly obvious way, and I refuse to believe that it is necessarily the best way. Why, I keep asking, don't people experiment with radically different strategies? He replies that sometimes people do but they usually lose; I am not impressed by this answer, since one might have to employ a new strategy dozens of times to figure out the necessary nuances."

I actually play quite a bit of League of Legends myself, and I can assure you his reasoning is accurate. Let me explain why in greater detail and context.

This is a game that requires a substantial investment of time to play a single "round". The minimum match duration that is required before one team or the other can voluntarily concede defeat is fixed at 20 minutes. In a particularly lopsided game, one team can destroy the other team's base and force a victory sooner than this, but the vast majority of matches run at least 20 minutes, and typical matches run from 30 to 45 minutes, with longer matches reaching a full hour or more.

This is also a game that requires 10 participants, 5 to each team. And while it is technically possible to play in premade teams, the overwhelming majority of matches are composed of either solo individuals or smaller teams of 2 or 3 who are placed together randomly by the matchmaking system.

So this is a team game requiring highly coordinated play. But you've suddenly got 10 strangers dumped into a game lobby, and told to pick from over 100 different characters, each with completely different statistics and abilities. Moreover, there's a clock ticking down, forcing you to make hugely important decisions about team composition and dynamics in a very short time frame before the match begins. And your only default-supported method of communicating your choices to your team is a chat box.

Suddenly, simply getting the team to agree on who should play which characters and perform which basic team roles is a hugely complicated and stressful endeavor. It doesn't always go smoothly - arguments over who gets to play which character or perform which team role are not remotely uncommon.

G. Verloren said...

And this isn't even taking the other team's character choices into account yet.

There are two primary game modes, each offering a different challenge. Blind Pick mode has each team select their characters without any knowledge of what the other team is choosing - meaning you have no idea if the characters and roles you are locking yourself into for the next half hour are going to be randomly countered by the enemy's choices or not.

Alternatively, Draft Pick mode has the teams take turns, first choosing three characters each to ban from play that match, and then picking from the remaining pool, with duplicate picks not being allowed - meaning your team's early picks deprive the enemy team of certain picks, but also meaning that they can choose characters specifically to counter the ones you picked.

So even if your team is in complete agreement about which characters and roles everyone should choose, those plans can quickly be foiled by the choices of the enemy team. There's a lot you simply cannot predict, and if you end up with a bad matchup, there's only so much you can do to work around that.

G. Verloren said...

Then the actual match begins. Everyone on your team goes about their business performing their intended team role.

But this game is incredibly complex, and the playstate is constantly changing. You can't really commit to any one singular strategy, because there's simply too much information which is not available to you. You must constantly improvise, changing plans on the fly as new information becomes available and the playing field changes suddenly and radically.

In games like Go, you can see the entire playing board and all the pieces at all times. There is no hidden information - every move your opponent makes is instantly knowable and can be responded to directly in an ordered turn sequence. But in a game like LoL, huge amounts of vital information are utterly hidden from you much of the time, and everything changes in real time. Your opponents are constantly slipping out of view and taking hidden actions which you can only attempt to predict.

Moreover, the area of the playing field you can influence is severely restricted. If your character is on the bottom portion of the map, it can easily take upwards of a minute to be able to respond to something which occurs in the top portion - by which point, your opponents might have already moved out of sight again and forced you to once again guess at their next move.

Now add in complex terrain with a variety of (often highly limited) tools for traversal and special hiding spots that block line of sight. Then add in character abilities which can deprive you of information (e.g. stealth effects, blinding effects, et cetera), or inhibit your traversal (e.g. knockbacks, slows, temporary walls, et cetera), or provide the enemy with secret knowledge of your location allowing them to set up an ambush or similar.

And now add in different map objectives, each offering different incentives for risk and reward. Is the enemy that just disappeared going left to ambush your ally, going right to kill a neutral monster that gives their team valuable resource rewards (denying your team in the process), going down to attack one of your undefended team buildings, hiding in a bush waiting to ambush you when you try to follow them, or teleporting back to their base to buy new items? You've got mere seconds to guess, so you can commit to the next 60 seconds of movement that your decision will require if you hope to properly respond to their choices (assuming you guessed right!). And if you guessed wrong, whoops! Suddenly you're wildly out of position and need to scramble to compensate! Good luck!

G. Verloren said...

So when faced with all of this chaos and uncertainty, the only reasonable way to proceed is to play relatively conservatively. Trying a wild new strategy is intrinsically disincentivized.

To start with, it's very difficult to communicate any new and unusual plan to other players. Then, even if you can type it out and have other people understand your plan, it's also difficult to get them to agree to take the risks of trying something new. They're investing a huge amount of time to play the match, and you're asking 4 other people to risk their chances of victory and their enjoyment of the match on a completely new and untested theory belonging to a total stranger, that they've had only seconds to consider, and which might completely clash with their own plans and expectations.

Then, even putting everything else aside, the execution of the plan itself can be exceedingly difficult. Too rigid or predictable a plan and you can easily be countered by the enemy team altering their playstyle to counteract your strategy. Or you could even be foiled by simple dumb luck, with the non-visible enemy team simply happening to be in the right place at the right time to catch you when you are vulnerable or out of position. You might plan to secure a certain map objective, only to find that the enemy team has beaten you to the punch. Or they might allow you to take the objective, but take advantage of the time it requires you to spend by attacking your now undefended team buildings while you are busy, causing you to lose more than you gain. Or they might ambush you after you've lost most of your health trying to take the objective, kill you while you are temporarily weakened, and then steal your hard work by finishing off the objective you worked so hard to nearly kill.

Trying new strategies in LoL is hugely risky by nature. And it frequently makes your teammates angry, because they're not on board with your notions and they're investing a lot of time and effort into a match already, so they're risk averse, and they feel like you're trying to force them to jeopardize their enjoyment to try your untested theory. And since most of your teammates are total strangers, they're absolutely not going to risk their fun or their odds of winning just to let you try a new strategy. And since the game is already so complex and unpredictable, they're just going to want to stick to tried and true general strategies and tactics which can easily be adapted on the fly, rather than commit to a more complicated or rigid plan of action.

Katya said...

In *Ender's Game*, the success of the new strategies is simplistically immediately returned, but... this is quite a beloved book among readers.

I think we can recognize that what you are saying is true, John, but "on the ground" it's hard not to simply fall back into the familiar patterns/habits.