DeepMind AI beats Quake Capture The Flag

Started by Darren Dirt, July 19, 2018, 05:01:05 AM


Darren Dirt

Because figuring out GO and CHESS (without being fed rules or opening moves) isn't eye opening enough...


https://deepmind.com/blog/capture-the-flag/

"
...CTF in which the map layout changes from match to match. As a consequence, our agents are forced to acquire general strategies rather than memorising the map layout.

Additionally, to level the playing field, our learning agents experience the world of CTF in a similar way to humans: they *observe a stream of pixel images and issue actions through an emulated game controller*.

CTF is played on procedurally generated environments, such that agents must generalise to unseen maps.

Our agents must learn from scratch how to see, act, cooperate, and compete in unseen environments, all *from a single reinforcement signal per match: whether their team won or not*.
"

Geez.
_____________________

Strive for progress. Not perfection.
_____________________

Darren Dirt

#1
The article authors could read my mind...

"
The agents had very fast reaction times and were very accurate taggers, which could explain their performance. However, by artificially reducing this accuracy and reaction time we saw that this was only one factor in their success.
"

Okay... what other factors?

"
The agents are never told anything about the rules of the game, yet learn about fundamental game concepts and effectively develop an intuition for CTF.

Through unsupervised learning we established the prototypical behaviours of agents and humans to discover that agents in fact learn human-like behaviours, such as following teammates and camping in the opponent's base.
"

Dayum.


By the way...

Quote from: Darren Dirt on July 19, 2018, 05:01:05 AM
Because figuring out GO and CHESS (without being fed rules or opening moves) isn't eye opening enough...
Apparently with Chess (no rules, no opening moves) it learned to crush ... in just 4 real-life hours. O_O https://forumserver.twoplustwo.com/29/news-views-gossip/googles-alphazero-just-smashed-strongest-chess-engine-poker-next-1697744/

Current level of Machine Learning, you crazy!

Lazybones

Interesting.

Watching videos on vision systems for cars from NVIDIA and MarI/O videos on YouTube, I didn't think the learning process had matured that much.

Tom

They do have more sophisticated neural network algorithms I'm sure.. but I'd think they preload the network to get things going that fast.
<Zapata Prime> I smell Stanley... And he smells good!!!

Darren Dirt

#4
THAT is the thing that stands out.

No preloading as a starting point to jump off from.

Just "here is the data stream, here is how you measure success/fail. GO!"

It is called RL. "Reinforcement" learning. (Brings to mind the TED Talk I linked to actually!) It is a lot like how humans learn: instead of brute force trying all possibilities, it soon becomes automatic weighting of different paths. "Learning" soon follows... then super-human mastery. O_O
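
To make the "here is how you measure success/fail. GO!" idea concrete, here is a toy sketch in Python. It is NOT DeepMind's method, just a tiny REINFORCE-style learner under made-up assumptions: a "match" is 4 decisions with 3 options each, there is a hidden good sequence (SECRET) the agent never sees, and the only feedback is one win/loss number at the end of the match. All names and numbers here (N_STEPS, N_ACTIONS, SECRET, the learning rates) are hypothetical.

```python
import math
import random

N_STEPS = 4            # decisions per match (made-up number)
N_ACTIONS = 3          # options per decision (made-up number)
SECRET = [2, 0, 1, 2]  # hidden "good play"; the agent never sees this


def softmax(prefs):
    """Turn raw preferences into action probabilities."""
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def play_match(prefs, rng):
    """Play one match: sample an action per step, get back only win/loss."""
    actions = [rng.choices(range(N_ACTIONS), weights=softmax(row))[0]
               for row in prefs]
    good_moves = sum(a == s for a, s in zip(actions, SECRET))
    won = rng.random() < good_moves / N_STEPS  # better play wins more often
    return actions, (1.0 if won else -1.0)


def update(prefs, actions, advantage, lr=0.2):
    """REINFORCE-style step: shift weight toward (or away from) what was played."""
    for row, taken in zip(prefs, actions):
        probs = softmax(row)  # probabilities before this step's changes
        for a in range(N_ACTIONS):
            grad = (1.0 if a == taken else 0.0) - probs[a]
            row[a] += lr * advantage * grad


def train(matches=3000, seed=0):
    """Start from flat preferences and learn only from per-match outcomes."""
    rng = random.Random(seed)
    prefs = [[0.0] * N_ACTIONS for _ in range(N_STEPS)]
    baseline = 0.0  # running average outcome: "better than usual" is what counts
    for _ in range(matches):
        actions, outcome = play_match(prefs, rng)
        update(prefs, actions, outcome - baseline)
        baseline += 0.05 * (outcome - baseline)
    return prefs


if __name__ == "__main__":
    learned = train()
    # Most-preferred action at each step after training
    print([row.index(max(row)) for row in learned])
```

Nothing in there tells the agent which individual move was good. It only ever hears "won" or "lost", and the weights on the different paths drift toward whatever tends to come before a win. The real thing does this from raw pixels with big neural networks, which is a whole other scale.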

See the other links in the 2+2 thread. They are also beating DOTA2, the 5vs5 player game.

Mr. Analog

By Grabthar's Hammer