| Download | View final version: Enhancing the planning capabilities of large language models by building external world models (PDF, 698 KiB) |
|---|
| Link | https://aclanthology.org/2025.agentscen-1.2/ |
|---|
| Author | Chen, Edwin; Li, Xiaoyan; Bellinger, Colin (ORCID: https://orcid.org/0000-0002-3567-7834); Wang, Yunli (ORCID: https://orcid.org/0000-0002-2320-954X) |
|---|
| Affiliation | National Research Council Canada. Digital Technologies |
|---|
| Format | Text, Article |
|---|
| Conference | 2nd Agent AI for Scenario Planning (AgentScen), August 16, 2025, Montreal, Canada |
|---|
| Abstract | Large Language Models (LLMs) possess a huge amount of knowledge but struggle with multi-step planning even in toy environments due to the limitations of their static internal world model. We introduce a novel approach where an LLM serves as a “world model builder”, constructing and iteratively refining an explicit, external world model. The core of our approach is a state transition function that is initially generated by the LLM and then refined using feedback from interactions with the environment. This refinement is made possible by accumulating test cases from past experiences, allowing us to treat the construction of the world model as a program synthesis problem. We demonstrate the efficacy of our method on the Blocksworld benchmark and introduce a novel ColorMixing dataset that is designed to evaluate multi-step reasoning and planning. Our experimental results show that our method, using GPT-4 and LLaMA3-70B, achieves perfect accuracy on Blocksworld tasks and significantly outperforms baseline methods, especially in terms of planning success and LLM queries. This paper presents a robust methodology for enhancing LLM planning via a learnable external world model and contributes a new benchmark for evaluating such capabilities. |
|---|
| Publication date | 2025-08-16 |
|---|
| Publisher | Association for Computational Linguistics |
|---|
| Language | English |
|---|
| Peer reviewed | Yes |
|---|
| Record identifier | af28d8ae-786d-4145-8754-827e2791b105 |
|---|
| Record created | 2025-09-18 |
|---|
| Record modified | 2025-09-19 |
|---|
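
The abstract describes treating the world model as a program that is checked against test cases accumulated from environment interaction. The following is a minimal Python sketch of that general idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the toy Blocksworld-style state encoding, the hypothetical `true_env` environment, and the two hand-written candidate functions, which stand in for successive LLM-generated drafts of the state transition function.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical encoding: a state is a sorted tuple of (block, support) pairs,
# where support is "table" or the name of another block.
State = Tuple[Tuple[str, str], ...]
Action = Tuple[str, str]                  # (block to move, destination)
TestCase = Tuple[State, Action, State]    # (state, action, observed next state)
Transition = Callable[[State, Action], State]


def true_env(state: State, action: Action) -> State:
    """Hypothetical ground-truth environment used to collect observations."""
    on = dict(state)
    block, dest = action
    covered = set(on.values())  # blocks that have something on top of them
    if block == dest or block in covered or (dest != "table" and dest in covered):
        return state  # illegal move: the state is unchanged
    on[block] = dest
    return tuple(sorted(on.items()))


def candidate_v1(state: State, action: Action) -> State:
    """First draft (stand-in for an LLM-written program): ignores legality."""
    on = dict(state)
    block, dest = action
    on[block] = dest
    return tuple(sorted(on.items()))


def candidate_v2(state: State, action: Action) -> State:
    """Revised draft: only moves a clear block onto a clear destination."""
    on = dict(state)
    block, dest = action
    covered = set(on.values())
    if block == dest or block in covered or (dest != "table" and dest in covered):
        return state
    on[block] = dest
    return tuple(sorted(on.items()))


def failing_cases(model: Transition, cases: List[TestCase]) -> List[TestCase]:
    """Return the accumulated test cases that the candidate model gets wrong."""
    return [c for c in cases if model(c[0], c[1]) != c[2]]


def refine(candidates: List[Transition], cases: List[TestCase]) -> Optional[Transition]:
    """Return the first candidate consistent with every accumulated test case."""
    for model in candidates:
        if not failing_cases(model, cases):
            return model
    return None


# Accumulate test cases by acting in the environment from a start state
# in which block B sits on block A.
start: State = (("A", "table"), ("B", "A"))
actions: List[Action] = [("B", "table"), ("A", "B")]
cases: List[TestCase] = [(start, a, true_env(start, a)) for a in actions]

chosen = refine([candidate_v1, candidate_v2], cases)
```

In this sketch the first draft fails the test case where a covered block is moved, so the refinement step falls through to the revised draft. In a real system the failing cases would presumably be fed back to the LLM as a prompt for the next draft rather than selecting from a fixed list.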