
Why the Scaling Thesis is Wrong

When you view intelligence as a model that predicts a world, selects future states that optimize some function (like reproduction), and then attempts to fulfill those selected states with its body, it becomes clear that the bottleneck for learning is:

  • the cost of updating your internal model
  • the number of sampled disproving experiences, i.e. experiences where a prediction goes wrong

Or, in other words, the speed of learning (and therefore of generalization) is:

Learning = f(costToRetrain, numDisprovingSamples)
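
To make the relation concrete, here is a minimal sketch in Python. The names and the particular shape chosen for f (a simple ratio) are illustrative assumptions, not a claim about the true function.

```python
# Illustrative sketch only: one assumed shape for f, where learning speed
# rises with the number of disproving samples (each one forces an update)
# and falls as each update becomes more expensive.

def learning_speed(cost_to_retrain: float, num_disproving_samples: int) -> float:
    if cost_to_retrain <= 0:
        raise ValueError("retraining always has some positive cost")
    return num_disproving_samples / cost_to_retrain

# A cheap-to-update model that keeps finding surprises learns quickly...
print(learning_speed(cost_to_retrain=1.0, num_disproving_samples=100))    # 100.0
# ...while an expensive-to-update model starved of surprises barely moves.
print(learning_speed(cost_to_retrain=1000.0, num_disproving_samples=5))   # 0.005
```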

Now we can ask two questions:

  • how do I lower the cost of retraining my learning machine?
  • how do I source more disproving samples?

The former is where we intersect with the Scaling Thesis.

Sourcing More Disproving Samples

So, to test your internal model (self-learning), you need to both:

  1. know what it is and
  2. select/optimize for disproving experiences (sketched below)

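As a sketch of the second step, one way to select for disproving experiences is to act where the current model is least sure of itself. The ensemble interface below (a list of models, each exposing predict) is a hypothetical stand-in, not a specific library API.

```python
# Hypothetical sketch: pick the next action where an ensemble of world
# models disagrees the most. High disagreement marks the places where a
# prediction is most likely to go wrong, i.e. where a disproving sample
# is most likely to be found by actually acting in the environment.

def select_disproving_experiment(ensemble, candidate_actions):
    def disagreement(action):
        # Assumes each model exposes predict(action) -> float (a scalar outcome).
        predictions = [model.predict(action) for model in ensemble]
        mean = sum(predictions) / len(predictions)
        return sum((p - mean) ** 2 for p in predictions) / len(predictions)

    # The action the models cannot agree on is the most informative test of
    # the internal model; run it and compare the prediction to the observation.
    return max(candidate_actions, key=disagreement)
```
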
We presume a model is necessary because we also presume that retraining costs are lower when you have a modular model of the world. Further, this model gives us a basis for comparing the predictions produced by a learning machine’s model of reality against the reality that is actually observed.
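
Here is a rough sketch of why modularity can lower retraining cost (the module interface is assumed for illustration, not an existing API): when a prediction fails, only the module that produced it is retrained, so the cost of an update scales with the module rather than with the whole model.

```python
# Hypothetical sketch: a world model split into independent modules, each
# with its own predict()/retrain(). Only the module whose prediction is
# disproved gets updated, keeping the cost of a single update small.

class ModularWorldModel:
    def __init__(self, modules):
        self.modules = modules  # e.g. {"contact": ..., "gravity": ...}

    def predict(self, domain, observation):
        return self.modules[domain].predict(observation)

    def update(self, domain, observation, outcome):
        prediction = self.predict(domain, observation)
        if prediction != outcome:                        # a disproving sample
            self.modules[domain].retrain(observation, outcome)
            return True                                  # the model changed
        return False                                     # nothing new to learn
```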

Without a way to interact with your environment, you cannot test your model of said environment.

Now, the Scaling Thesis effectively says that if we throw enough data at a thing, it will learn some meta-process for learning on its own. In theory, sure: with literally infinite data, literally infinite compute, and infinite time, this will definitely work. But there are numerous problems and questions.

What are the retraining costs, for example? High. Very high with the current SOTA. I don’t see any fundamental reason why this will change with current approaches.

As any computer scientist will say, garbage in = garbage out, and the vast majority of the data we train on is garbage.

So, in practice there are a few crippling flaws with the Scaling Thesis:

  • current machines cannot learn on their own through interaction with the environment, which means they are at the mercy of their datasets
  • retraining is expensive, and its cost grows without bound with the amount of data required
  • the incidence of disproving samples in a given static dataset is unknown and unknowable, because it depends on the (unknown) state of the model at a given point in its training

At Aliem we believe embodiment and data creation are foundational ingredients of autonomous machines that generalize over time, and we reject the Scaling Thesis.

We believe no-data learning and bootstrapped machine intelligence are the future.