By Kolby Nottingham and Max Robinson

We were recently reading through a trending post on r/OpenAI about our project MINDcraft and were surprised by the amount of doubt and misunderstanding surrounding it. Several commenters were trying to locate research publications regarding the project. We do not currently have robust enough evaluations in MINDcraft to justify a research publication (yet!), so, for now, this blog post will have to suffice. Be aware that this is a quickly developing project and many details are subject to change.

About

MINDcraft is an open-source node.js project that implements Minecraft bots that use large language models (LLMs) to interact with the game and other players. The default bot name in MINDcraft is Andy (in reference to Do Androids Dream of Electric Sheep?). Andy can follow instructions from and communicate with players. Recently, we also added the ability for Andy to play the game independently of human input, continuously setting goals for itself. In the near future, we plan to add the ability for Andy to communicate and collaborate with other MINDcraft bots.

Development

We began work on MINDcraft shortly after Nvidia’s popular Voyager paper was released (Wang 2023). While MINDcraft works very differently from the system described in that paper, Voyager brought to our attention the Mineflayer library for implementing Minecraft bots in node.js. Unlike previous AI research in Minecraft, which used image inputs and low-level outputs (“jump”, “look up”, “use tool”), Mineflayer allows AI systems to interact with Minecraft using high-level code. This made it much easier for Voyager to succeed without any training via reinforcement learning, since a Mineflayer-based agent is automatically proficient at navigation and resource acquisition.

In addition to excelling at code generation, modern LLMs are specifically trained for instruction following. It was straightforward to condition an LLM that generates Mineflayer code on instructions from human players. Andy was born! Next, to improve Andy’s ability to reliably execute common skills, we implemented parameterized commands similar to those in Toolformer (Schick 2024). For example, instead of outputting Mineflayer code to find the nearest wood block, navigate to it, and collect it, Andy can simply output !collectBlocks("oak_log", 1).
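To illustrate how a parameterized command of this sort can work, here is a minimal sketch of parsing a command out of an LLM response and dispatching it to a handler. The regex, the commands registry, and the handler signature are illustrative assumptions, not MINDcraft's actual implementation:

// Hypothetical sketch: turn an LLM response containing !collectBlocks("oak_log", 1)
// into a function call. Names and signatures are illustrative.
const commands = {
  collectBlocks: async (agent, blockName, count) => {
    // wrap the Mineflayer calls to find, pathfind to, and break the blocks
  },
};

function parseCommand(response) {
  // match !name(arg1, arg2, ...) anywhere in the response
  const match = response.match(/!(\w+)\(([^)]*)\)/);
  if (!match) return null;
  const [, name, argString] = match;
  const args = JSON.parse(`[${argString}]`); // "oak_log", 1 -> ["oak_log", 1]
  return { name, args };
}

async function executeCommand(agent, response) {
  const cmd = parseCommand(response);
  if (!cmd || !commands[cmd.name]) return 'Unknown command.';
  return await commands[cmd.name](agent, ...cmd.args);
}

Keeping handlers in plain JavaScript like this lets each command wrap whatever Mineflayer calls it needs while exposing only a short, reliable interface to the LLM.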

Over time, we have expanded and improved the set of commands that Andy has access to. We’ve also added many optional automatic behaviors, such as running away from aggressive mobs. We recently added the ability for Andy to play Minecraft independently by continuously setting goals/instructions for itself in natural language (e.g. "Collect materials to build a house"). There are many things we look forward to adding to Andy in the future, such as code reuse, experience reflection and learning, improved spatial awareness when building, vision inputs, and multi-agent interactions.
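To make the independent-play mode more concrete, here is a minimal sketch of what such a self-prompting loop can look like. The helper names (promptGoal, isIdle, the handleMessage signature) are hypothetical stand-ins, not MINDcraft's actual API:

// Hypothetical sketch of self-prompting: when the bot is idle, ask the LLM
// for a new goal and process it as if it were a player instruction.
async function selfPromptLoop(agent) {
  while (agent.selfPrompting) {
    if (agent.isIdle()) {
      // e.g. "Collect materials to build a house"
      const goal = await agent.prompter.promptGoal(agent.history.getHistory());
      await agent.handleMessage(goal);
    }
    await new Promise((resolve) => setTimeout(resolve, 1000)); // check again in a second
  }
}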

Implementation

The main class of MINDcraft is an Agent class that is automatically instantiated whenever Andy is not logged in to the Minecraft server (i.e. upon startup or after crashing). The main logic loop, located in handleMessage, executes each time a player sends a message to Andy. The loop operates according to the following pseudocode, where promptConvo sends a request to an LLM, cleanChat writes to the game chat, and executeCommand causes Andy to interact with the game via commands:

handleMessage(message):
    history.add(message)
    while true:
        hist = history.getHistory()
        response = prompter.promptConvo(hist)
        if containsCommand(response):
            cleanChat(response)
            history.add(response)
            result = executeCommand(this, response)
            history.add(result)
        else:
            cleanChat(response)
            history.add(response)
            break

When handling a message, Andy may return a single conversational response or iteratively call query and action commands in a loop. Queries include commands that provide information about the world state, such as !inventory, !nearbyBlocks, !craftable, and !entities. Actions include commands that cause Andy to act in-game, such as !followPlayer(name), !givePlayer(name, item), !collectBlocks(block), and !craftRecipe(item). All actions are executed using the execute method in the Coder class to smoothly transition between commands.
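As an example of what a query can look like under the hood, here is a hedged sketch of an inventory summary. bot.inventory.items() is Mineflayer's API for listing held items; the surrounding function and output format are illustrative assumptions rather than MINDcraft's actual !inventory implementation:

// Hypothetical sketch of a query command: read state from the Mineflayer bot
// and return it as text for the LLM context.
function inventoryQuery(bot) {
  const counts = {};
  for (const item of bot.inventory.items()) {
    counts[item.name] = (counts[item.name] || 0) + item.count;
  }
  const lines = Object.entries(counts).map(([name, n]) => `${name}: ${n}`);
  return lines.length > 0 ? 'INVENTORY\n' + lines.join('\n') : 'INVENTORY: empty';
}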

Commands are great for reliably executing common actions, but some behaviors, such as building, require custom code. To write its own code, Andy must output the !newAction command to switch into coding mode. In this mode, Andy uses the generateCode method from the Coder class to write custom JavaScript code. When coding, Andy has access to Mineflayer as well as a library of useful functions that we’ve written. The generateCode method includes a loop that allows Andy multiple attempts at debugging its code.
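The following is a minimal sketch of such a generate-and-debug loop. The promptCoding and runCode helpers are hypothetical stand-ins for the LLM call and the sandboxed code execution; this is not MINDcraft's actual Coder implementation:

// Hypothetical generate-and-debug loop: ask the LLM for code, run it, and feed
// any error message back into the next attempt.
async function generateCode(agent, task, maxAttempts = 3) {
  let feedback = '';
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const code = await agent.prompter.promptCoding(task, feedback);
    try {
      await runCode(code, agent.bot); // execute the generated code against the Mineflayer bot
      return 'Code executed successfully.';
    } catch (err) {
      // give the model the error so the next attempt can try to fix it
      feedback = `The previous attempt failed with: ${err.message}`;
    }
  }
  return `Failed to produce working code after ${maxAttempts} attempts.`;
}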

In-context examples play a pivotal role in the quality of Andy’s responses. We use an embedding model to find examples that are most similar to the current history and add those to the LLM context. Without these examples, Andy sometimes uses commands incorrectly or does not use query commands to inform itself before attempting to complete a task.
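Conceptually, this retrieval step can be as simple as embedding the current history, scoring each stored example by cosine similarity, and prepending the top matches to the prompt. In this sketch, embed is an assumed function that returns a numeric vector for a string (e.g. from an embedding model API), and the example format is illustrative:

// Hypothetical sketch of similarity-based example selection.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

async function selectExamples(history, examples, k = 2) {
  const query = await embed(history);
  const scored = await Promise.all(
    examples.map(async (example) => ({
      example,
      score: cosineSimilarity(query, await embed(example.text)),
    }))
  );
  scored.sort((a, b) => b.score - a.score);
  return scored.slice(0, k).map((s) => s.example); // the most similar examples go into the prompt
}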

Evaluation

When considering MINDcraft, it is important to understand what information Andy has access to and what actions are decided on by the LLM vs. implemented by Mineflayer.

It is difficult to compare AI systems that play Minecraft because each one uses different strategies to simplify the problem. Some use image inputs and keyboard and mouse outputs to play like a human would (Baker 2022, Nottingham 2023). Other approaches use simplifications, such as fewer actions and modified settings, to make the problem easier (Hafner 2023). MINDcraft uses Mineflayer for basic navigation and actions in order to focus on solving problems related to communication and reasoning.

AI research in Minecraft has long focused on resource gathering and crafting. MINDcraft also excels at these tasks, even more so because Andy can locate nearby blocks without a direct line of sight and pathfind directly to them. Currently, we do not provide information about the crafting tree, but modern LLMs have proficient knowledge of Minecraft’s crafting dependencies and can plan out where to go to mine and what to craft.

Building remains one of the major outstanding challenges in Minecraft. While Andy is one of the only AI systems that can build in Minecraft, it still struggles with the task. All of Andy’s building projects are done by writing JavaScript code that repeatedly calls the !placeBlock command. As a result, buildings tend to be simple, repetitive, or misaligned. Also, building projects are always placed where the bot is currently standing, so they can end up in awkward positions. Despite these limitations, Andy is capable of impressive constructions that offer unique insight into the coding abilities and creativity of LLMs.
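For a sense of what this looks like, here is a hypothetical example of the kind of code Andy might generate for a small wall. The skills.placeBlock(bot, name, x, y, z) helper is an assumption standing in for whatever block-placement routine the generated code actually calls:

// Hypothetical generated code for a 5x3 cobblestone wall, run inside an async
// context. Because the build is anchored at the bot's current position, it can
// easily end up in an awkward spot.
const pos = bot.entity.position;
for (let dx = 0; dx < 5; dx++) {
  for (let dy = 0; dy < 3; dy++) {
    await skills.placeBlock(bot, 'cobblestone',
      Math.floor(pos.x) + dx, Math.floor(pos.y) + dy, Math.floor(pos.z) + 2);
  }
}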

Reception

Thanks to Emergent Garden’s MINDcraft YouTube videos and the project’s relative ease of use, MINDcraft has caught public attention and been featured in multiple YouTubers’ videos along with viral Twitter and Reddit posts. In addition, many startups are exploring the use of LLMs for playing Minecraft.

Many posts regarding MINDcraft are heavily dramatized, anthropomorphizing Andy by ascribing intent and personality to the bot. Our MINDcraft content tries to be upfront about what Andy is actually doing as opposed to sensationalizing its actions. We understand that good storytelling captures the attention of an online audience, but we also want to clarify Andy’s inner workings.

For example, as of right now, Andy does not learn while playing. Andy cannot see. Running from mobs and picking up nearby items happens automatically without prompting the LLM. Andy uses a pathfinder, and the LLM does not make navigation decisions other than choosing a target block, player, or coordinate (so it is the pathfinder’s fault Andy does not use doors).

However, for the most part, Andy really does what videos and other posts say it does! There is no human behind the scenes pretending to be Andy. All of the messages sent to chat are written by an LLM. We’re excited to continue making improvements to MINDcraft that will make Andy more fun, helpful, and impressive.

Links

References

Baker, Bowen, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, and Jeff Clune. “Video pretraining (VPT): Learning to act by watching unlabeled online videos.” Advances in Neural Information Processing Systems 35 (2022): 24639-24654.

Hafner, Danijar, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. “Mastering diverse domains through world models.” arXiv preprint arXiv:2301.04104 (2023).

Nottingham, Kolby, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, and Roy Fox. “Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling.” In International Conference on Machine Learning, pp. 26311-26325. PMLR, 2023.

Schick, Timo, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. “Toolformer: Language models can teach themselves to use tools.” Advances in Neural Information Processing Systems 36 (2024).

Wang, Guanzhi, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. “Voyager: An open-ended embodied agent with large language models.” arXiv preprint arXiv:2305.16291 (2023).