It took just a few years for generative AI to evolve from text to 2D imagery to 3D video. Today, it has taken the next step with Google’s Genie 2, which generates playable 3D game worlds on the fly, all from a simple text prompt.
Google’s Genie 2 is the latest iteration of its Generative Interactive Environments research, which uses AI to construct new, interactive environments on demand. Genie 1, which Google detailed in February, could construct 2D environments. Now, Genie 2, announced today, extends that into 3D space.
Google calls Genie 2 a “world model,” which means it can simulate virtual worlds, complete with animations, physics, and object interactions. It’s a two-step process: Genie 2 requires a prompt image to extrapolate into a world, but that image can itself be generated from an ordinary text prompt. Want a cyberpunk Western? Genie 2 will create it. A sailing simulation? That, too. You just need a reference image or a prompt to begin.
In its demos, Google used images generated by Imagen 3, as well as concept art hand-drawn by an artist. Within the world, the player, whether an AI or a human, can interact with the environment. Google’s demo showed a traditional WASD control scheme, with the arrow keys as an alternative.
The problem, however, is consistency. The model loses coherence after a short time, typically around 20 seconds or so. (The longest world Google has generated held together for about a minute.)
In part, that may be because the model can generate “counterfactuals,” the different paths and actions a player could take from a fixed starting point, such as turning left or right at a fork in the road. The model also has to account for a “long horizon”: what happens when a player turns away from a scene and then back toward it.
Google said that Genie 2 can accommodate different perspectives, such as an isometric view, a third-person driving view, or a first-person perspective. Water effects are taken into account, as are complex interactions with the environment. In one demonstration, a player was able to slash a balloon, which popped. Smoke, gravity, and reflections are all modeled, but Google isn’t saying to what level of detail or resolution its worlds are rendered, or how many polygons are calculated per frame.
The Genie 2 environments aren’t just for humans. AI “players” can also be modeled, either as NPCs or as the player character. Google showed how an AI agent could be instructed via a text prompt to go through a specific door, then recognize the command, understand what it meant in the rendered environment, and carry it out.
Google didn’t divulge what computing resources Genie 2 requires, whether it will be released publicly, or even whether it plans to commercialize it. But with AI slowly creeping into games via AI-generated dialogue, it appears that AI-simulated games could eventually be real, too. Just not right away.