Planning in 16 Tokens: A Compact Discrete Tokenizer for Latent World Model • Jinsung Lee’s Personal Homepage

We present CompACT, a tokenizer that represents an image with only 16 discrete tokens. Although its extreme compression rules out exact reconstruction, it preserves planning-essential features such as object identities and positions, which makes CompACT an effective tokenizer for planning.
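To make "16 discrete tokens" concrete, here is a minimal sketch of generic nearest-neighbor vector quantization. It assumes the encoder emits 16 continuous latent vectors and that a learned codebook maps each one to an integer index. The codebook size, latent dimension, and the `quantize` helper are hypothetical, and the random vectors stand in for learned values; this illustrates discrete tokenization in general, not CompACT's actual architecture.

```python
import random

NUM_TOKENS = 16     # tokens per observation, as stated for CompACT
CODEBOOK_SIZE = 8   # hypothetical codebook size, for illustration only
LATENT_DIM = 4      # hypothetical latent dimension

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    Returns one integer token id per latent vector.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]

rng = random.Random(0)
# Random stand-ins for a learned codebook and for the encoder's output.
codebook = [[rng.gauss(0, 1) for _ in range(LATENT_DIM)] for _ in range(CODEBOOK_SIZE)]
latents = [[rng.gauss(0, 1) for _ in range(LATENT_DIM)] for _ in range(NUM_TOKENS)]

tokens = quantize(latents, codebook)
print(len(tokens), tokens)  # 16 integer ids, each in range(CODEBOOK_SIZE)
```

In a setup like this, each observation is stored as 16 small integers, and a world model would consume and predict those ids rather than pixels.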

Abstract

World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning or policy learning.
Recent approaches leverage world models as learned simulators, but their application to decision-time planning remains computationally prohibitive for real-time control. A key bottleneck lies in latent representations: conventional tokenizers encode each observation into hundreds of tokens, making planning both slow and resource-intensive. To address this, we propose CompACT, a discrete tokenizer that compresses each observation into just 16 tokens, drastically reducing computational cost while preserving the information essential for planning. An action-conditioned world model built on the CompACT tokenizer achieves competitive planning performance while planning orders of magnitude faster, offering a practical step toward the real-world deployment of world models.
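The following back-of-the-envelope sketch shows why the token count per observation dominates cost. It assumes a transformer world model whose full self-attention cost grows quadratically with sequence length, and a hypothetical baseline tokenizer that produces a 16×16 grid of 256 tokens per frame. The frame count and the `attention_pairs` helper are illustrative assumptions, not figures or code from the paper.

```python
def attention_pairs(tokens_per_frame: int, frames: int) -> int:
    """Number of query-key pairs in full self-attention over a rollout context."""
    seq_len = tokens_per_frame * frames
    return seq_len * seq_len

BASELINE_TOKENS = 256  # hypothetical conventional tokenizer: a 16x16 patch grid
COMPACT_TOKENS = 16    # tokens per observation with CompACT
FRAMES = 8             # hypothetical context length in frames

baseline = attention_pairs(BASELINE_TOKENS, FRAMES)
compact = attention_pairs(COMPACT_TOKENS, FRAMES)
print(baseline, compact, baseline // compact)  # → 4194304 16384 256
```

Under this quadratic assumption the ratio is (256 / 16)² = 256 regardless of the frame count. A planner that scores many candidate action sequences pays this cost once per rollout, so the savings compound during decision-time planning. Actual speedups depend on the architecture and planner used.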