For better or worse, generative AI has been a disruptive force in many industries, although its reception in video games has been lukewarm at best, with attempts at integrating AI-powered NPCs into games failing to impress most gamers. Now, Google DeepMind has a new model called Genie 2, which the company says can generate “action-controllable, playable, 3D environments for training and evaluating embodied agents.” According to Google, every environment generated by Genie 2 can be interacted with, whether by a human piloting a character with a mouse and keyboard or by an AI-controlled NPC. It’s unclear what the behind-the-scenes code and optimizations look like, though, and both will be key to any real-world applications of the tech. Google says worlds created by Genie 2 can simulate the consequences of actions in addition to the world itself, all in real time. This means that when a player interacts with a world generated by Genie 2, the AI responds with what its model suggests is the result of that action (like stepping on a leaf resulting in the destruction of said leaf). This extends to things like lighting, reflections, and physics, with Google showing off some impressively convincing water, volumetric effects, and gravity.
In a demo video, Google showed a number of different AI-generated worlds, each with its own interactive character, from a spaceship interior being explored by an astronaut to a robot taking a stroll in a futuristic cyberpunk urban environment, and even a sailboat sailing over water and a cowboy riding through some grassy plains on horseback. What’s perhaps most interesting about Genie 2’s generated environments is that the model has apparently given each world a different perspective and camera control scheme. Some of the examples shown are first-person, while others are third-person with the camera either locked to the character or free-floating around the character. Of course, being generative AI, there is some weirdness, and Google clearly chose its demo clips carefully to keep graphical anomalies from taking center stage. What’s more, at least a few clips very strongly resemble worlds from popular video games, including Assassin’s Creed, Red Dead Redemption, Sony’s Horizon franchise, and what appears to be a mix of various sci-fi games, including Warframe, Destiny, Mass Effect, and Subnautica. This isn’t surprising, since the worlds Google used to showcase the AI are all generated with an image and text prompt as inputs, and, given what Google says it used as training data, it seems likely that gaming clips from those games made it into the AI model’s training data.
In the first clip, the buttons the character interacts with look like they can’t decide whether they are circular or angular. In some other clips, there are strange blurry textures, and there are moments where character interactions seem a little unnatural. Google doesn’t shy away from these flaws, though, and has included a very funny collection of bloopers at the bottom of its announcement blog post.
Of course, any generative AI system is only as good as its training data, and Google says that Genie 2 was trained on a large-scale video dataset. That dataset likely contained a mix of different video game clips, some of which probably exhibited visual artifacts typical of streamed gameplay. Google imagines using Genie 2 for everything from rapidly prototyping game worlds to training and evaluating AI-powered NPCs in novel worlds and scenarios.
Google claims that its AI is developed “responsibly,” stating:
Genie 2 shows the potential of foundational world models for creating diverse 3D environments and accelerating agent research. This research direction is in its early stages and we look forward to continuing to improve Genie’s world generation capabilities in terms of generality and consistency.
As with SIMA, our research is building towards more general AI systems and agents that can understand and safely carry out a wide range of tasks in a way that is helpful to people online and in the real world.