The perfect AI
At the Snowflake Summit this week, Sam Altman said that the perfect AI is “a very tiny model with superhuman reasoning, 1 trillion tokens of context, and access to every tool you can imagine.”
That is to say, it would be trained in how to think, it would be able to hold a great deal in mind at once, and it could call on tools like search and coding environments to gather information across sources or derive the values needed to reach a solution.
But it wouldn’t necessarily know a lot. One couldn’t assume it would be able to recite epic poems, obscure one-off facts, niche technical specs, or copyrighted material.
Which is probably fine, provided the model has been pretrained on enough high-quality data to bootstrap reasoning capabilities in the first place, and can tell when it needs to look things up rather than hallucinate.
This has straightforward upsides. A smaller model with less information memorized demands less memory on the device or data-center server rack that runs it. For the same resources per model instance, it leaves more room for context. It might mean less capex or electricity use per token generated. A smaller model can also start returning tokens sooner and return more tokens per second (which, for a reasoning model, means returning a better answer in the same amount of time). And so on.
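The weights-versus-context trade-off above can be sketched as back-of-envelope arithmetic. All the numbers here are illustrative assumptions (hypothetical model sizes, an 80 GB accelerator, a generic KV-cache formula), not specs of any real system:

```python
# Back-of-envelope memory budget: smaller weights leave more room for context.
# All figures are hypothetical assumptions for illustration.

def weight_memory_gb(params_billions: float, bytes_per_param: float = 1.0) -> float:
    """Memory for model weights, e.g. 1 byte/param for 8-bit quantization."""
    return params_billions * bytes_per_param

def kv_cache_gb(tokens: int, layers: int, kv_heads: int,
                head_dim: int, bytes_per_value: float = 2.0) -> float:
    """KV-cache size for a given context length (factor of 2 = keys + values)."""
    return 2 * tokens * layers * kv_heads * head_dim * bytes_per_value / 1e9

budget_gb = 80  # hypothetical accelerator memory

small = weight_memory_gb(7)    # ~7 GB: a 7B model at 8-bit
large = weight_memory_gb(70)   # ~70 GB: a 70B model at 8-bit

# A 100k-token context with assumed architecture numbers
# (32 layers, 8 KV heads, head dim 128, fp16 cache):
ctx = kv_cache_gb(100_000, layers=32, kv_heads=8, head_dim=128)

print(f"small model + 100k ctx: {small + ctx:.1f} GB of {budget_gb} GB")
print(f"large model + 100k ctx: {large + ctx:.1f} GB of {budget_gb} GB")
```

Under these assumed numbers the small model fits the long context comfortably while the large one blows the budget, which is the point: memorized knowledge and working context compete for the same memory.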
Technologies demand their complements
Technologies emerge from humans solving problems, engaging with their world, increasing their understanding, and capturing the benefit of that understanding with the creation of an artifact.
Then that artifact exists in the world. It becomes part of the status quo, embedded in processes.
Then the technology, present in our world, can demand and elicit complementary behavior from us. A personal computer wants a user who writes and edits words, manipulates data on spreadsheets, and organizes that information in files. The Internet sees humans like it sees computers, as endpoints of the network making queries or providing information or processing.
Tiny models with superhuman reasoning will not "want" people who offer a shallow facsimile of their own capabilities: the type of guy who is quick to pull up and show you random trivia on his phone, on the basis that it's novel.
I reckon the complement to these perfect AIs will be deep, deep internalized knowledge in some number of adjacent, partially overlapping domains. These models will be superintelligent in surprisingly spiky ways, and they’ll need partners to explore the frontier.