
An interesting example of OpenAI's AI agents learning the rules of hide and seek - what could have gone wrong?

The Two Minute Papers channel discusses fascinating results from an OpenAI project that built a hide-and-seek game for AI agents. The aim of the project was to pit two AI teams against each other and observe interesting emergent behaviors. For the first few million rounds, we witness chaos: everyone runs around aimlessly, which favors the seekers, who win most of the games. Over time, the hiders learn effective strategies, like blocking doors with boxes. What's surprising is that OpenAI's scientists deliberately designed the map so that the hiders can only succeed through collaboration. This necessity for teamwork forces them to learn how to act as a unit, which they did quite well. However, something unexpected happened: the seekers discovered that they could push a doorstop-like object against a wall and use it as a ramp, shifting the balance of power once again. This interplay between teams resembles an arms race, where each side must adapt to the other's newly invented strategies. In the later stages of training, the hiders learned to separate the ramps from the boxes and lock them away, showcasing their defensive strategies and planning abilities. Additionally, there were cases where the seekers exploited the physics system, for example by "surfing" on top of a box, which created even more interesting scenarios. Overall, this project not only showcases AI creativity but also encourages further experimentation. As of the time of writing this article, the video has amassed an impressive 10,523,570 views and 365,431 likes, reflecting significant interest in this topic.

Timeline summary

  • 00:00 Introduction to OpenAI's hide-and-seek game for AI agents.
  • 00:04 Goals of the project: observing emergent behaviors in AI.
  • 00:16 Initial chaos observed as AI teams begin the game.
  • 00:45 Semi-random early play favors the seekers, who win most games.
  • 01:01 Hiders learn to block seekers using boxes, strategizing collaboratively.
  • 01:31 Seekers discover that a doorstop-shaped object can be used as a ramp.
  • 02:19 Hiders steal the ramp and lock it away while the seekers are still frozen at the start of the game.
  • 02:45 Seekers use a ramp to climb onto a box and "surf" on it, surprising the scientists.
  • 02:58 Physics system exploits lead to unexpected player behaviors.
  • 03:24 Hiders develop a strong defense strategy against seekers.
  • 04:01 Other notable behaviors include hiders discarding ramps.
  • 04:26 Seekers learn to use physics to surprise hiders directly.
  • 04:34 Discussion on future possibilities and experiments.
  • 04:51 Appreciation for the innovative work conducted by OpenAI.
  • 05:01 Call for audience interaction and comments.
  • 05:08 Acknowledgment of support and anticipation for future discussions.

Transcription

OpenAI built a hide-and-seek game for their AI agents to play. While we look at the exact rules here, I will note that the goal of the project was to pit two AI teams against each other, and hopefully, see some interesting emergent behaviors. And boy, did they do some crazy stuff. The coolest part is that the two teams compete against each other, and whenever one team discovers a new strategy, the other one has to adapt. Kind of like an arms race situation, and it also resembles generative adversarial networks a little. And the results are magnificent, amusing, weird. You'll see in a moment. These agents learn from previous experiences, and to the surprise of no one, for the first few million rounds, we start out with pandemonium. Everyone just running around aimlessly. Without a proper strategy and with semi-random movements, the seekers are favored and hence win the majority of the games. Nothing to see here. Then, over time, the hiders learn to lock out the seekers by blocking the doors off with these boxes, and start winning consistently. I think the coolest part about this is that the map was deliberately designed by the OpenAI scientists in a way that the hiders can only succeed through collaboration. They cannot win alone, and hence, they are forced to learn to work together. Which they did, quite well. But then, something happened. Did you notice this pointy, doorstop-shaped object? Are you thinking what I am thinking? Well, probably, and not only that, but about 10 million rounds later, the AI also discovered that it can be pushed near a wall and be used as a ramp, and, ta-da! Got 'em! The seekers started winning more again. So the ball is now back on the court of the hiders. Can you defend this? If so, how? Well, these resourceful little critters learned that since there is a little time at the start of the game when the seekers are frozen, apparently, during this time, they cannot see them, so why not just sneak out, steal the ramp, and lock it away from them?
Absolutely incredible. Look at those happy eyes as they are carrying that ramp. And you think it all ends here? No, no, no. Not even close. It gets weirder. Much weirder. When playing a different map, the seeker has noticed that it can use a ramp to climb on the top of a box, and, this happens. Do you think couchsurfing is cool? Give me a break. This is boxsurfing. And the scientists were quite surprised by this move, as this was one of the first cases where the seeker AI seems to have broken the game. What happens here is that the physics system is coded in a way that they are able to move around by exerting force on themselves, but, there is no additional check whether they are on the floor or not, because who in their right mind would think about that? As a result, something that shouldn't ever happen, does happen here. And we are still not done yet, this paper just keeps on giving. A few hundred million rounds later, the hiders learned to separate all the ramps from the boxes. Dear Fellow Scholars, this is proper boxsurfing defense. Then, lock down the remaining tools, and build a shelter. Note how well rehearsed and executed this strategy is. There is not a second of time left until the seekers take off. I also love this cheeky move where they set up the shelter right next to the seekers, and I almost feel like they are saying, yeah, see this here, there is not a single thing you can do about it. In a few isolated cases, other interesting behaviors also emerge, for instance, the hiders learn to exploit the physics system and just chuck the ramp away. After that, the seekers go, what? What just happened? But don't despair, and at this point, I would also recommend that you hold on to your papers, because there was also a crazy case where a seeker also learned to abuse a similar physics issue and launch itself exactly onto the top of the hiders. Man, what a paper. 
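To illustrate the missing floor check described above, here is a minimal toy sketch in Python. It is not OpenAI's physics code: the `Agent` class, the `step` function, and the `check_floor` flag are hypothetical names invented for this example, and the model is a one-dimensional, unit-mass simplification. It only shows how an agent that may always apply force to itself can keep accelerating while standing on a box, and how a single guard would prevent that.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    x: float = 0.0         # horizontal position
    vx: float = 0.0        # horizontal velocity
    on_floor: bool = True  # False when the agent stands on top of a box


def step(agent: Agent, force: float, dt: float = 0.1,
         check_floor: bool = False) -> None:
    """Advance a unit-mass agent by one time step under a self-applied force."""
    if check_floor and not agent.on_floor:
        # Hypothetical guard: no self-propulsion without contact with the floor.
        force = 0.0
    agent.vx += force * dt
    agent.x += agent.vx * dt


def travel(check_floor: bool, steps: int = 10) -> float:
    """Distance covered by an agent that pushes itself while standing on a box."""
    rider = Agent(on_floor=False)
    for _ in range(steps):
        step(rider, force=1.0, check_floor=check_floor)
    return rider.x


if __name__ == "__main__":
    print("without floor check:", travel(check_floor=False))
    print("with floor check:   ", travel(check_floor=True))
```

Without the guard the agent moves even though it never touches the floor; with the guard it stays put. The real environment is a full 3D physics simulation, so this is only an analogy for the kind of oversight the transcript describes.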
This system can be extended and modded for many other tasks too, so expect to see more of these fun experiments in the future. We get to do this for a living, and we are even being paid for this. I can't believe it. In this series, my mission is to showcase beautiful works that light a fire in people. And this is, no doubt, one of those works. Great idea, interesting, unexpected results, crisp presentation. Bravo OpenAI. Love it. So, did you enjoy this? What do you think? Make sure to leave a comment below. Also, if you look at the paper, it contains comparisons to an earlier work we covered about intrinsic motivation, and shows how to implement circular convolutions for the agents. Thanks for watching and for your generous support, and I'll see you next time!
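The transcript mentions circular convolutions without explaining them. As a rough sketch of the general idea, and not the paper's implementation, the following pure-Python function slides a kernel over a sequence whose ends wrap around, so the last element is treated as a neighbor of the first. The function name `circular_conv1d` is hypothetical, and the suggestion that the input could be readings arranged in a ring around an agent is an assumption made for illustration. As is common in deep learning, the kernel is applied without flipping (strictly speaking, a cross-correlation).

```python
def circular_conv1d(signal, kernel):
    """Apply `kernel` at every position of `signal`, wrapping around at the ends.

    The kernel is centered on each element and is not flipped.
    """
    n = len(signal)
    half = len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # The modulo makes index -1 refer to the last element, and so on.
            acc += weight * signal[(i + j - half) % n]
        out.append(acc)
    return out


if __name__ == "__main__":
    # A single spike at position 0; the result at position 3 shows the wrap-around.
    print(circular_conv1d([1, 0, 0, 0], [1, 2, 3]))
```

Because of the wrap-around, shifting the input by one position simply shifts the output by one position, which is why this kind of layer suits data with no natural start or end.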