The Difference Between Knowing More and Being Smarter
François Chollet’s “On the Measure of Intelligence” argues that intelligence should be measured by skill-acquisition efficiency, not task performance alone.
When an AI system beats a benchmark, it is tempting to say it has become more intelligent. But François Chollet’s “On the Measure of Intelligence” asks us to slow down. A benchmark score tells us what a system can do under specific conditions. It does not necessarily tell us how well the system can adapt, generalize, or acquire new skills. Chollet’s central move is to define intelligence not as raw task performance, but as skill-acquisition efficiency: the ability to learn new tasks with limited experience and prior information.
1. The Benchmark Problem
Artificial intelligence has a familiar success story: systems keep beating benchmarks. From defeating chess grandmasters to topping ImageNet and, now, acing standardized tests with large language models, AI keeps shattering records. Yet the meaning of these victories remains strangely unclear. Are these systems actually getting smarter, or are they simply getting better at the specific games we’ve designed for them? François Chollet’s 2019 paper “On the Measure of Intelligence” cuts through the hype. He argues that the AI community has been measuring the wrong thing for decades—and that this mismeasurement is quietly steering the entire field in the wrong direction.
2. Performance Is Not Intelligence
High performance on a single task can come from many sources: narrow specialization, brute-force memorization, massive scale, or simply having seen the test data during training. Chollet points out that skill at any given benchmark is heavily modulated by prior knowledge and experience. Give a system unlimited training data, unlimited compute, or hand-crafted rules tuned exactly to the benchmark, and it can achieve superhuman scores without possessing any general intelligence at all. This is why leaderboard-chasing has become a kind of sophisticated overfitting. Performance tells you what a system can do right now. It says almost nothing about how that system got there or what it will be able to do tomorrow on a truly new problem.
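The point can be made concrete with a toy sketch (my own illustration, not from the paper): a pure memorizer posts a perfect score on the exact benchmark it was trained on, yet has acquired nothing that transfers to a new input.

```python
# Toy illustration of skill without intelligence: a lookup-table "solver"
# for a benchmark whose hidden rule is "double the input". All names and
# data here are invented for the sketch.

def train_memorizer(examples):
    """Build a lookup table mapping training inputs to outputs verbatim."""
    return dict(examples)

def memorizer_predict(table, x):
    # Return the memorized answer, or a default guess for unseen inputs.
    return table.get(x, 0)

train_set = [(1, 2), (2, 4), (3, 6)]
table = train_memorizer(train_set)

# Evaluated on the same inputs it saw, the memorizer is flawless.
seen_score = sum(memorizer_predict(table, x) == y for x, y in train_set)
print(seen_score)                    # 3: a perfect benchmark score

# A genuinely new input exposes the absence of the underlying rule.
print(memorizer_predict(table, 10))  # 0, not 20
```

The memorizer's leaderboard score is perfect, and it is also meaningless: nothing about the doubling rule was ever learned.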
3. Chollet’s Alternative Definition
Chollet offers a radically different framing, grounded in algorithmic information theory. Intelligence, he proposes, is *skill-acquisition efficiency*. It is not the skills a system already possesses, but how quickly and flexibly it can acquire new skills when faced with novel tasks. Formally, the intelligence of a system is a measure of its skill-acquisition efficiency over a well-defined scope of tasks, with respect to its priors, the amount of experience it receives, and the generalization difficulty of those tasks. In plain language: given the same starting knowledge and the same limited practice time on problems it has never seen before, which system ends up more capable? The one that learns faster, generalizes farther, and wastes less data is the more intelligent one. Skill is the output. Intelligence is the efficiency of the process that produces it.
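As a rough sketch of this reading (a deliberate simplification; Chollet's formal definition is built on algorithmic information theory, not a simple ratio), one can score a system by the skill it attains per unit of prior knowledge and experience it consumes:

```python
# Simplified, illustrative efficiency score (NOT Chollet's actual formula):
# skill attained, divided by the total information the system was given.

def acquisition_efficiency(skill, priors_bits, experience_bits):
    """skill:            performance on held-out task instances, in [0, 1]
    priors_bits:      information content of the starting knowledge
    experience_bits:  information content of the training data consumed
    """
    return skill / (priors_bits + experience_bits)

# Two hypothetical systems reaching the same final skill at different cost:
big_data_model  = acquisition_efficiency(skill=0.9, priors_bits=100, experience_bits=900)
few_shot_learner = acquisition_efficiency(skill=0.9, priors_bits=100, experience_bits=100)
print(few_shot_learner > big_data_model)  # True: same skill, less data, higher intelligence
```

Both systems end up equally skilled, but under this lens only one of them is credited as more intelligent: the one that got there with less.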
4. Why Human Intelligence Feels Different
Humans routinely solve brand-new problems after seeing just a handful of examples. A child can grasp a novel puzzle, invent a rule, and apply it to variations they’ve never encountered. This feels like magic compared to today’s AI, which often needs millions of examples or explicit programming. Chollet explains the difference through abstraction, compositionality, and built-in cognitive structure. Humans do not start from a blank slate; we arrive with powerful core knowledge priors—objectness and intuitive physics, goal-directed agents, basic numeracy, elementary geometry—that let us rapidly build higher-level abstractions. Modern AI systems, by contrast, are typically trained to interpolate within vast datasets rather than to discover and compose new concepts on the fly. The result is impressive pattern-matching but brittle generalization outside the training distribution.
5. The Role of Prior Knowledge
This is the subtle and often misunderstood part of Chollet’s argument. He is not claiming that prior knowledge is bad or that intelligence should emerge from pure tabula rasa. Every intelligent system—biological or artificial—relies on priors. The key insight is that we must *account for* those priors when measuring intelligence. If two systems are given radically different starting knowledge or vastly different amounts of experience, comparing their final skill levels tells you nothing useful about which one is more intelligent. A fair benchmark must control for priors and experience, then measure how efficiently each system turns limited new data into broad new capabilities. Chollet’s definition makes this explicit: intelligence is always relative to the priors and experience budget you’re willing to grant.
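One way to picture such a controlled comparison is a hypothetical evaluation harness (the task and learner names below are invented for illustration) that grants every contender identical starting priors and an identical experience budget on unseen tasks, then measures held-out skill:

```python
# Sketch of a "fair" evaluation loop: same priors, same fixed experience
# budget per task, skill measured only on held-out instances.
import random

class AddKTask:
    """Toy task family with a hidden rule: add a task-specific constant k."""
    def __init__(self, k):
        self.k = k
    def sample_input(self, rng):
        return rng.randrange(100)
    def target(self, x):
        return x + self.k

class DiffLearner:
    """Tiny learner whose only prior is the rule family x -> x + k."""
    def __init__(self):
        self.k = None
    def observe(self, x, y):
        self.k = y - x            # one example pins down k exactly
    def predict(self, x):
        return x if self.k is None else x + self.k

def evaluate(learner_factory, tasks, budget, seed=0):
    """Average held-out skill after exactly `budget` examples per task."""
    rng = random.Random(seed)
    scores = []
    for task in tasks:
        learner = learner_factory()               # identical starting priors
        for _ in range(budget):                   # identical experience budget
            x = task.sample_input(rng)
            learner.observe(x, task.target(x))
        held_out = [task.sample_input(rng) for _ in range(50)]
        correct = sum(learner.predict(x) == task.target(x) for x in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

print(evaluate(DiffLearner, [AddKTask(3), AddKTask(7)], budget=1))  # 1.0
```

Because the budget is fixed in the harness itself, a high score here can only come from turning few examples into broad capability, not from having seen more data.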
6. What Better Benchmarks Should Measure
Traditional benchmarks reward final performance. Chollet argues that good intelligence benchmarks should instead test novelty, abstraction, transfer, sample efficiency, and robustness to unfamiliar tasks. They should present problems that cannot be solved by memorization or simple pattern matching. They should require the system to discover rules, synthesize solution procedures on the fly, and generalize from a few demonstrations. Most importantly, the tasks should sit outside the distribution of anything the system has been explicitly trained on. This is exactly what Chollet’s own Abstraction and Reasoning Corpus (ARC) attempts to do: a set of visual grid puzzles that humans solve quickly but that have proven extraordinarily difficult for even the largest language and vision models—precisely because they demand the kind of rapid, data-efficient abstraction that defines intelligence.
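To give a flavor of this task style, here is a toy, ARC-inspired example (the real ARC format and difficulty differ substantially): a few demonstration grid pairs, from which a solver must induce a rule (here, a simple color remapping) and apply it to a new input.

```python
# Toy ARC-flavored task: grids are lists of integer "colors". The solver
# below induces a cell-wise color substitution from demonstration pairs.

def infer_color_map(demos):
    """Learn a color substitution consistent with all (input, output) demos."""
    mapping = {}
    for inp, out in demos:
        for row_in, row_out in zip(inp, out):
            for a, b in zip(row_in, row_out):
                if mapping.setdefault(a, b) != b:
                    return None  # demos are not explained by pure recoloring
    return mapping

def apply_color_map(mapping, grid):
    """Recolor a grid, leaving unmapped colors unchanged."""
    return [[mapping.get(c, c) for c in row] for row in grid]

# Two demonstrations establish the rule "1 becomes 2, 0 stays 0".
demos = [
    ([[1, 0], [0, 1]], [[2, 0], [0, 2]]),
    ([[1, 1], [0, 0]], [[2, 2], [0, 0]]),
]
rule = infer_color_map(demos)
print(apply_color_map(rule, [[0, 1], [1, 1]]))  # [[0, 2], [2, 2]]
```

Even this trivial solver illustrates the shape of the challenge: the rule must be abstracted from two examples and applied to an input never seen before. Real ARC tasks compose many such concepts (symmetry, counting, objectness), which is what defeats memorization.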
7. Why This Matters Now
The timing of Chollet’s critique could hardly be more relevant. Today’s frontier models are astonishingly capable at interpolating within enormous training corpora, yet they still struggle with the kind of flexible, low-data generalization that humans take for granted. Leaderboards keep climbing, but the gap between “impressive on known tasks” and “generally intelligent” remains vast. As we pour ever more compute and data into scaling, Chollet’s framework offers a crucial corrective: raw scale alone is not a reliable path to intelligence if it simply buys more memorized skill without improving acquisition efficiency. Evaluating whether our systems are becoming generally intelligent requires moving beyond leaderboard scores to tests that isolate the core mechanisms of learning and abstraction.
8. Conclusion
Intelligence is not just what a system can do. It is how efficiently it can learn to do new things. François Chollet’s paper reframes the entire AI project around this insight. By shifting the goal from “achieve high scores on fixed tests” to “measure skill-acquisition efficiency under controlled priors and experience,” he gives the field a clearer target and a more honest yardstick. The next generation of benchmarks—and the next generation of AI—will be judged not by how well they perform today, but by how quickly and gracefully they adapt tomorrow. In that light, the real measure of progress is not how smart our machines already appear, but how intelligently they continue to grow.

