
XPENG introduced X-Mind, a predictive world model framework featuring Thought Sketch, Recurrent Block Diffusion and Visual Chain-of-Thought to anticipate traffic scenarios and deliver low inference latency. Trained on hundreds of millions of driving frames, X-Mind improves trajectory prediction accuracy and complements X-World and X-Foresight in XPENG’s Physical AI roadmap.
XPENG unveiled X-Mind at a CVPR 2026 workshop, presenting it as a predictive world model framework for next-generation autonomous driving. Rather than mapping perception directly to action, X-Mind adds proactive reasoning by simulating future traffic scenarios before a driving decision is made.
Thought Sketch produces compact cognitive representations that combine bird's-eye-view (BEV) layouts, driving priors and road elements, which reduces computational complexity. Recurrent Block Diffusion generates high-quality future scenes in a single forward pass, while Visual Chain-of-Thought renders predicted obstacle movements and lane connectivity so that the model's reasoning can be inspected.
X-Mind was trained on hundreds of millions of real-world driving frames and, according to XPENG, predicts trajectories more accurately in complex long-tail scenarios. The company also says the framework's inference latency is low enough for in-vehicle deployment, so the added reasoning does not come at the cost of real-time performance.
X-Mind follows X-World and X-Foresight as the final component of XPENG’s Physical AI foundational model roadmap. With all three in place, the integrated stack lets vehicles not only act but also forecast how the world will evolve after each action.