In this post I want to review my contrarian beliefs about the fallacies of modern AI: approaches that, while near term productive, will need to evolve to be long term productive.
Lack of North Star
A major problem in AI is that we don’t know where we are going. People throw around vague statements about “AGI” which may as well be “magic”. I frame it this way: The Goal is Machine Consciousness.
The Scaling Thesis is Wishful Thinking
OpenAI and most of the world are wrong about building smart systems that reach escape velocity: the scaling thesis is unlikely to be the answer, though I believe it will be productive. OpenAI defines AGI as being able to perform most economically valuable tasks better than humans, but the problem is that this is a moving target.
Ultimately, any machine consciousness needs to be able to do trial and error and form its own models of the world, so any form of training in advance is fundamentally flawed and will never lead to an “intelligence” as we know it.
There are four problems with humans force-feeding models:
- The throttling problem
- The calibration problem
- The tail problem
- The focus problem
The Throttling Problem
We throttle learning when machines must rely on waterfall-style transfers of data to learn from. These problems aren’t independent: you can scale training data and still have online learning, but the sheer inertia of so much knowledge created in a dead environment means the vast majority of those models will be badly calibrated, which leads to the next problem.
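To make the throttling concrete, here is a toy sketch of my own (the numbers and estimators are invented for illustration): one estimator is fit once on a frozen snapshot of a drifting world, the other keeps updating from live observations.

```python
import random

# Toy sketch with made-up numbers: a quantity that drifts over time.
# The waterfall estimator is fit once on a frozen snapshot of data; the online
# estimator keeps updating from live observations, so it stays calibrated as
# the world moves on.

def drifting_signal(t):
    return 0.01 * t + random.gauss(0, 0.1)  # the "world": slow drift plus noise

snapshot = [drifting_signal(t) for t in range(1000)]  # data collected up front
offline_estimate = sum(snapshot) / len(snapshot)      # frozen after "training" (~5.0)

online_estimate, alpha = 0.0, 0.05                    # exponential moving average
for t in range(1000, 2000):                           # deployment: the world keeps drifting
    online_estimate += alpha * (drifting_signal(t) - online_estimate)

print(f"true value now     ~ {0.01 * 2000:.1f}")       # ~20.0
print(f"waterfall estimate ~ {offline_estimate:.1f}")  # stuck near 5.0
print(f"online estimate    ~ {online_estimate:.1f}")   # tracks the drift, near 20.0
```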
The Calibration Problem
We effectively create a game of telephone where the data given is the lens through which learning happens, and there will be distortion. Consider whether there is a curriculum or dataset out there that encompasses the transfer of knowledge perfectly. There isn’t; that is why we must learn for ourselves and map new knowledge to our past experience in a unique way. We have to try things and see if they work. We have to apply our knowledge and test it against reality or other systems. The idea that we can set up those assessments for everything seems unlikely.
The Tail Problem
That said, if we go big enough and have truly infinite data then yes! It would work in that magic world where we also have infinite resources. This leads us to the tail problem. It’s obvious that the vast majority of use cases have sparse data sets. Therefore, AI will need to be able to bootstrap its own learning. And if it can bootstrap its own learning for an edge use case, then why does it need us to scale it in the first place? One way around this is to focus on generating the data to train the machine for that use case. This is viable, but do we desire a hungry beast on a chain? Most of the items on the tail will likely not be enough to satiate it to a useful degree.
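As a rough back-of-envelope, and assuming for illustration that use-case frequency follows something like a Zipf law (my assumption, not a measurement), even a billion examples spread across a long tail leaves each tail use case with almost nothing to train on:

```python
# Back-of-envelope sketch; the Zipf assumption and the totals are illustrative only.
TOTAL_EXAMPLES = 1_000_000_000
NUM_USE_CASES = 1_000_000

# Zipf weights: the use case at rank r occurs with frequency proportional to 1/r.
harmonic = sum(1 / r for r in range(1, NUM_USE_CASES + 1))

def examples_at_rank(r):
    return TOTAL_EXAMPLES * (1 / r) / harmonic

for rank in (1, 100, 10_000, 1_000_000):
    print(f"rank {rank:>9,}: ~{examples_at_rank(rank):,.0f} examples")
# Rank 1 gets tens of millions of examples; rank 1,000,000 gets roughly 70.
```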
The Focus Problem
The final problem is the focus problem. Focusing on the simple idea of linear scaling will draw attention away from the core loop of learning. We don’t need to go big, we need to go tiny. Make an amoeba of intelligence and evolve it upwards. What we need instead is the analogue of a universal Turing machine: a universal learning machine. Something simple, something elegant, that can self-learn efficiently. We should focus our efforts on building this.
Our Addiction to Big Data
More data is a pit trap that is addictive because it’s a clear, linear way to make progress. The comedown is that, as shown with self-driving cars, it will cost more and more effort to squeeze out incremental performance. The same is true of LLMs.
The scaling thesis drives our addiction, but chasing asymptotically diminishing returns with brute-force approaches like Tesla’s self-driving will be near term productive and a long term failure. I don’t mean they won’t lead to self-driving per se, but that they will get there in a way that won’t generalize to other use cases efficiently.
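To put rough numbers on those diminishing returns: if error falls as a power law in data (the commonly assumed form; the exponent below is a stand-in, not a measurement), each halving of error demands vastly more data than the last.

```python
# Illustrative only: assume error ~ a * D**(-alpha); alpha = 0.1 is a stand-in exponent.
alpha = 0.1

def data_needed(error, a=1.0):
    # Invert error = a * D**(-alpha) to get the dataset size D required.
    return (a / error) ** (1 / alpha)

for error in (0.5, 0.25, 0.125):
    print(f"error {error:.3f} -> data required ~ {data_needed(error):,.0f}")
# With alpha = 0.1, each halving of error multiplies the data needed by 2**10 = 1024x.
```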
Big Models Have no Moat
Training on open data sources is a race to “free” and won’t create a business even as utility goes to infinity.
Waterfall Style Development
Models are built waterfall style. Humans source data. Train models. Deploy the models. However, every active intelligence has the agency to learn for itself and the awareness to know where its gaps are, so our entire approach to intelligence in AI is fundamentally and irrevocably flawed.
Focus on the Internet
The internet is a bad place for intelligence to grow up. It will be short term productive to build big models on the internet and train workers on the internet but it will fail to generalize.
Focus on Human-like Intelligence
We are blinded by existing intelligence’s features and artifacts. Trying to squeeze human-like intelligence through the straw of ML is a mistake. We don’t want this, but we say we do, and it creates heat in the system; see Embrace Alien Machine Intelligence.
False Coupling
The brain works by translating senses down to universal intermediaries that the inner brain can understand, and yet we directly couple I/O in our construction of models. In other words, humans can install new senses and hands through tools. Our models cannot.
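A rough sketch of the alternative, with hypothetical names of my own: every sense adapts its raw input into one shared intermediate representation, so installing a new sense or tool never touches the inner model.

```python
from typing import List, Protocol

class Sense(Protocol):
    """Anything that can encode raw input into the shared intermediate space."""
    def encode(self, raw: List[float]) -> List[float]: ...

class Camera:
    def encode(self, raw: List[float]) -> List[float]:
        return [sum(raw) / max(len(raw), 1), 0.0]   # toy stand-in for a vision encoder

class Microphone:
    def encode(self, raw: List[float]) -> List[float]:
        return [0.0, max(raw, default=0.0)]         # toy stand-in for an audio encoder

class InnerModel:
    """Only ever sees the shared representation, never raw I/O."""
    def think(self, vectors: List[List[float]]) -> List[float]:
        return [sum(components) for components in zip(*vectors)]  # trivial fusion

senses: List[Sense] = [Camera(), Microphone()]      # installing a new sense = appending here
inner = InnerModel()
print(inner.think([s.encode([1.0, 2.0, 3.0]) for s in senses]))  # [2.0, 3.0]
```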
Models on a Leash
All training goes through humans; we are the consciousness of our AI at the moment. This is a problem for numerous reasons:
- they can’t self-correct
- we can never supply all the data they need
- therefore we throttle their learning
The fact that we force-feed models data means that they will always be limited by what we have chosen to feed them.