Published on May 7, 2026
Recent research has unveiled a pressing gap in the capabilities of large language models (LLMs) regarding creative reasoning. While these models excel at reasoning tasks, their ability to repurpose tools creatively remains largely untested. The introduction of CreativityBench aims to address this deficiency, marking a significant shift in how AI creativity is evaluated.
CreativityBench sets out to benchmark affordance-based creativity through a comprehensive knowledge base. This resource features over 4,000 entities and more than 150,000 affordance annotations. From it, the project generates 14,000 tasks that challenge LLMs to find innovative uses for objects based on their physical properties rather than their traditional applications.
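To make the benchmark's structure concrete, the entity–part–affordance hierarchy described above might be modeled along these lines. This is a hypothetical sketch: the class names (`Entity`, `Part`, `Task`), the sample spoon entry, and the `check_solution` helper are illustrative assumptions, not CreativityBench's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of affordance-based task data; names and sample
# entries are assumptions, not the benchmark's real schema.

@dataclass
class Part:
    name: str
    affordances: list  # physical affordances this part offers, e.g. "pry", "scoop"

@dataclass
class Entity:
    name: str
    conventional_use: str
    parts: list = field(default_factory=list)

@dataclass
class Task:
    goal: str                 # the unconventional goal to accomplish
    solution_entity: str      # object that can be repurposed
    solution_part: str        # the specific part providing the affordance
    solution_affordance: str  # the physical property being exploited

# Toy entry: a metal spoon repurposed as a makeshift pry bar.
spoon = Entity(
    name="metal spoon",
    conventional_use="eating",
    parts=[
        Part("handle", ["pry", "stir"]),
        Part("bowl", ["scoop"]),
    ],
)

task = Task(
    goal="open a paint can without a screwdriver",
    solution_entity="metal spoon",
    solution_part="handle",
    solution_affordance="pry",
)

def check_solution(entity: Entity, task: Task) -> bool:
    """Verify that the proposed part of the entity actually carries the affordance."""
    for part in entity.parts:
        if part.name == task.solution_part and task.solution_affordance in part.affordances:
            return True
    return False

print(check_solution(spoon, task))  # True
```

A grader in this style rewards a model only when it names the right object, the right part, and the right physical affordance together, which matches the article's point that models often get the object right but miss the part and affordance.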
Initial evaluations across ten leading LLMs indicate that while models can occasionally identify plausible objects, they struggle to pinpoint the correct parts and their associated affordances. As a result, task-solving performance drops sharply. Notably, gains from model scaling appear to plateau quickly, and common strategies like Chain-of-Thought yield minimal improvements.
These findings underscore a critical hurdle in advancing AI creativity, even with state-of-the-art models. The establishment of CreativityBench not only sheds light on this vital aspect of intelligence but also has significant implications for future AI development. As researchers continue to explore these challenges, the potential for more versatile and innovative agents could reshape various applications.