Redefining Gameplay: Text-to-Game, LLM-NPCs, and Generative UGC
The currently popular mental model for AI in game development divides AI use cases into “weak form” and “strong form” implementations. This post is about strong form AI. Read my post about weak form AI here.
Strong form AI includes the use of AI to create novel gameplay experiences that would not be possible without generative AI. Strong form AI is much more “buzzy” than weak form AI because: (1) strong form AI has the potential to deliver meaningfully novel gameplay experiences, and (2) building a solution in a new category is much more exciting than building for an existing category that simply needs new additions to its tech stack.
The two main categories of strong form AI are text-to-game and generative AI-enabled in-game experiences.
Text-to-game
Text-to-game is the promise of being able to create a fully playable game from a simple text input. Some text-to-game product developers are focused on making fully playable games to be sold to gamers as end customers, while others are focused on removing the barriers to entry (e.g., knowing how to sculpt objects in Blender, knowing how to code) so aspiring or inexperienced game developers can publish their own games.
There are a number of different approaches to creating a text-to-game solution. Some of the more interesting approaches taken by companies I’ve spoken with include:
Creating a foundation model for gameplay by training a model on bots that run through and solve high-quality video games. This provides the foundation from which level design, gameplay mechanics, etc. can be generated.
Assembling a game using LLMs orchestrated in a workflow to generate the story and game systems, with Stable Diffusion (plus LoRAs and ControlNets) creating assets in a consistent style for a standardized product.
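To make the second approach concrete, here is a minimal sketch of what such an orchestration workflow might look like, assuming the OpenAI Python client for story/system generation and the Hugging Face diffusers library for asset generation. The model IDs, the LoRA path, and the shape of the generated spec are all hypothetical placeholders; a production pipeline would add validation, retries, and engine-specific export steps.

```python
import json
import torch
from openai import OpenAI
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_game_spec(player_prompt: str) -> dict:
    """Ask an LLM to turn a one-line game idea into a structured story + systems spec."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative choice; any capable chat model works
        messages=[
            {"role": "system", "content": (
                "You design match-3 games. Return JSON with keys "
                "'story', 'levels' (list of {name, difficulty}), and "
                "'assets' (list of {name, prompt})."
            )},
            {"role": "user", "content": player_prompt},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)


def build_asset_pipeline():
    """Load Stable Diffusion with a ControlNet and a style LoRA for on-brand assets."""
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
    ).to("cuda")
    pipe.load_lora_weights("studio/style-lora")  # hypothetical LoRA trained on the product's art style
    return pipe


def generate_assets(pipe, spec: dict, control_image):
    """Render each asset in the spec, constrained by a control image for layout consistency."""
    images = {}
    for asset in spec["assets"]:
        result = pipe(asset["prompt"], image=control_image, num_inference_steps=30)
        images[asset["name"]] = result.images[0]
    return images


if __name__ == "__main__":
    spec = generate_game_spec("A cozy match-3 game set in a haunted bakery")
    print(json.dumps(spec, indent=2))
```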
Both types of text-to-game products are either architected or trained to create specific types of games. For example, a product built on a foundation model trained on RPGs will create RPGs. A product built on LLM code-generation and media-generation workflows will likewise be geared toward a specific genre, whether that is a match-3 game, an endless runner, etc.
Text-to-game companies are pursuing two business models: developer-focused and consumer-focused.
Consumer-focused text-to-game companies produce fully baked games that are playable and can be sold directly to gamers as end customers.
Developer-focused text-to-game companies create fully baked games that are provided to casual developers who may not be able to make their own games from scratch; these developers can then modify the game as desired before publishing. The business models I’ve seen include licensing, consumption-based pricing, and revenue share on the success of the generated title.
It’s worth noting that some companies develop what might be considered “partial” generative text-to-game products, handling asset and character creation, story design, level design, bot mechanics, bot dialog, and content management. However, these companies are more focused on creativity and productivity enhancements for more sophisticated game developers, so they are better thought of as next-generation game development tools.
In-game experiences
Companies developing in-game experiences are leveraging generative AI to create a new experience for players. The primary forms of in-game experiences include LLM-powered NPCs and user generated content (UGC) generation.
LLM-powered NPCs
Companies that create products enabling LLM-powered NPCs let players speak with NPCs and have the NPCs respond with relevant conversation, rather than restricting players to pre-scripted conversation branches. The back-end workflows typically involve a character creation engine that lets developers define a description of the character, their knowledge bank, their desired actions, and their voice. The model workflow typically looks like this:
User voice input is received and turned into text with a speech-to-text model
Text corresponding to the user’s voice input is fed as a prompt to the character engine LLM, which is either fine-tuned or set up for RAG
The LLM returns text that will serve as the script that the NPC speaks back
A text-to-speech model turns the script into audio for the NPC’s response
A facial animation model takes the NPC’s spoken response and animates the NPC’s face to match the speech
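To illustrate, here is a minimal sketch of one conversational turn through that pipeline, using the OpenAI Python client as a stand-in for the speech-to-text, character LLM, and text-to-speech stages. The character definition, model names, and the animate_face hook are hypothetical placeholders; a real character engine would add retrieval over the character’s knowledge bank, action selection, and latency optimizations.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical character definition a developer would author in the character engine.
CHARACTER_PROMPT = (
    "You are Mira, a blacksmith in the town of Eldenmoor. "
    "You know local rumors about the mines and will offer the player a repair quest. "
    "Stay in character and keep replies under three sentences."
)


def npc_turn(audio_path: str) -> bytes:
    """One conversational turn: player audio in, NPC audio out."""
    # 1. Speech-to-text on the player's voice input.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. The character LLM (fine-tuned or RAG-backed in a real engine) drafts the reply.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",  # stand-in for the character engine's model
        messages=[
            {"role": "system", "content": CHARACTER_PROMPT},
            {"role": "user", "content": transcript.text},
        ],
    ).choices[0].message.content

    # 3. Text-to-speech turns the script into the NPC's voice line.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=reply)

    # 4. A facial animation model would consume the audio (or visemes) here;
    #    animate_face() is a placeholder for that integration.
    # animate_face(speech.content, reply)

    return speech.content  # raw audio bytes to play back in-engine
```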
These companies target game developers within game studios with the pitch that LLM-powered NPCs improve player retention and create differentiated, immersive game experiences. The ultimate promise is that game design itself will change once developers can build an immersive conversational experience with as many intelligent NPCs as possible into a game. User receptivity to LLM-powered NPCs has so far been high, though there have been very few high-profile rollouts.
While the greatest amount of buzz has been around a couple of companies that provide character engines that game studios can build with, some studios have started to simply integrate LLMs directly into their games. SteamDB has a list of all games that use AI in-game or as part of their development, many of which enable conversations with NPCs as part of an interactive fiction game, without using the major LLM-powered NPC / character engine products. There are also games distributed outside of Steam that do not use character engine products; I recently played a social deduction game in which you have to speak with NPCs to solve the mystery of your given objective, all within a fully playable game created via text-to-game.
UGC generation
Game studios are looking to implement UGC generation in order to give players the ability to create the experiences they want to have in their games. Most companies I have spoken with are taking one of two approaches: generative asset creation or generative gameplay.
Generative asset creation is an evolution of legacy UGC asset creation. This type of UGC is primarily related to cosmetic customization. For 2D UGC, a game developer can provide access to an image generation model, likely fine-tuned on their existing IP for style consistency, so players can create custom assets such as skins or avatars. Alternatively, they can integrate with third-party companies that offer specific functionality (e.g., Genie Labs enables skin generation and Ready Player Me enables avatar generation).
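As a rough illustration of how a studio might expose 2D skin generation to players, here is a sketch of a small server endpoint that wraps a style-constrained image model behind a prompt template. The base model, the brand-style LoRA weights, the endpoint name, and the prompt guardrails are all hypothetical; a live integration would add content moderation, rate limiting, and asset post-processing before anything reaches the game client.

```python
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from diffusers import StableDiffusionPipeline

app = FastAPI()

# Load a base model plus a hypothetical LoRA fine-tuned on the studio's existing IP
# so player-generated skins stay on-brand.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("studio/brand-style-lora")  # hypothetical weights


class SkinRequest(BaseModel):
    description: str  # free-form player input, e.g. "frost dragon armor"


@app.post("/generate-skin")
def generate_skin(req: SkinRequest):
    # Wrap the player's description in a template that enforces the game's art direction.
    prompt = f"character skin, {req.description}, in the official game art style, clean background"
    negative = "photorealistic, text, watermark"
    image = pipe(prompt, negative_prompt=negative, num_inference_steps=25).images[0]
    path = "generated_skin.png"
    image.save(path)
    return {"skin_path": path}
```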
While 3D models are still early, studios are beginning to develop gaming experiences in which users generate meshes that are directly incorporated into games. 3D generative asset creation begins to blur the line between a purely cosmetic experience and a text-to-game experience, depending on how generated meshes are incorporated into gameplay. For example, games that allow 3D UGC to be built within a system of gameplay mechanics enable a deeper interactive experience than purely cosmetic generation of props.
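For a sense of what generating a player-facing mesh could involve today, here is a minimal sketch using the Shap-E text-to-3D model through the diffusers library. The prompt and output handling are illustrative only; the quality and controllability limitations discussed below still apply, and a real pipeline would convert the mesh to the engine’s native format and add collision and LOD processing.

```python
import torch
from diffusers import ShapEPipeline
from diffusers.utils import export_to_ply

# Load OpenAI's Shap-E text-to-3D model via diffusers.
pipe = ShapEPipeline.from_pretrained("openai/shap-e", torch_dtype=torch.float16).to("cuda")

prompt = "a low-poly wooden watchtower"  # hypothetical player input
result = pipe(
    prompt,
    guidance_scale=15.0,
    num_inference_steps=64,
    frame_size=256,
    output_type="mesh",
)

# Export the generated mesh to a PLY file for downstream conversion.
ply_path = export_to_ply(result.images[0], "watchtower.ply")
print(f"Mesh written to {ply_path}")
```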
From an economic standpoint, generative asset creation arguably increases user retention. Enhanced creative expression by players, who can dream up more than was previously possible with legacy UGC creation tools, should increase engagement. Additionally, with the growing popularity of marketplace functionality within games, generative UGC promises more secondary economic activity on top of games.
Generative gameplay blurs the line between text-to-game and UGC in the sense that a component of the game is generated based on voice or text input from the player. So far, generative gameplay development is either very preliminary or speculative. The technological lift to accomplish generative gameplay, as most studios imagine it, is much higher than what is currently possible. The reason is that most studios want to use generative gameplay to extend existing, handmade gameplay experiences, and we are just not there yet in terms of 3D model quality relative to what can be made by hand. That is not to mention the many unsolved research challenges associated with controllability of 3D mesh generation, which you can read about here.
If you're at a game studio implementing generative AI, are a founder incorporating AI into your game or tools, or are a researcher working on generative media, please email me at colin.verdun.campbell@gmail.com. Thanks for reading!