
The original was posted on /r/singularity by /u/YouMissedNVDA on 2024-09-23 14:26:58+00:00.


I’ll keep it short:

Iteration 0: Pretrained-only GPTs - before RLHF.

Some of you might not be familiar with this version, hence calling it 0. This article covers it well. Essentially, it could continue text well, but it could not be asked a question with the expectation that it would answer. Instead, it was more likely to continue the text with several similar questions, since those were the most probable next tokens.

After witnessing these deficiencies, using RLHF to steer the model into an “instruct” version that responds instead of just continuing was found to be invaluable, and so it was added to the training regime.
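For anyone who wants a concrete picture: the core of RLHF starts with a reward model trained on human preference pairs. Below is a minimal, toy sketch of that preference step in PyTorch; the model, shapes, and embeddings are invented for illustration (this is not OpenAI's actual setup), and the later policy-optimization step (e.g. PPO against this reward) is omitted.

```python
# Toy sketch of the preference-modelling step behind RLHF.
# A reward model scores two candidate responses to the same prompt; the loss
# pushes the score of the human-preferred response above the rejected one
# (Bradley-Terry pairwise loss). All names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyRewardModel(nn.Module):
    def __init__(self, embed_dim: int = 64):
        super().__init__()
        # Stand-in for "embed the response with the base LM, then map to a scalar".
        self.scorer = nn.Linear(embed_dim, 1)

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(response_embedding).squeeze(-1)  # one scalar reward per response


reward_model = TinyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Pretend embeddings for a batch of (chosen, rejected) response pairs.
chosen = torch.randn(8, 64)    # human-preferred responses
rejected = torch.randn(8, 64)  # dispreferred responses

# Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()
```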

Iteration 1: ChatGPT.

We all know this one. It was pretty amazing at first, but then we started to recognize the hallucination problem. No matter how hard the question, it would respond immediately with plausible text. Easy questions would often be answered correctly, but harder ones would often result in a web of lies. If you imagine taking a test where you are graded on your stream-of-consciousness output, you might do the same thing. But, strangely, asking it to think step by step first, and to think out loud in tokens before giving an answer, would help it a lot.

After witnessing these deficiencies, using CoT to force multi-step reasoning, which produces higher-quality answers, was found to be invaluable, and so it was added to the training regime.
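If you've never tried it, the CoT trick is literally just a prompt transformation. Here's a minimal sketch; the question is a stock example, and you'd send either prompt to whatever completion API or model you use.

```python
# Minimal sketch of chain-of-thought prompting: the same question, asked
# directly and asked with an instruction to reason out loud before answering.
question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the "
    "ball. How much does the ball cost?"
)

# Direct prompt: models tend to blurt out the intuitive-but-wrong "10 cents".
direct_prompt = f"{question}\nAnswer:"

# CoT prompt: asking for the reasoning in tokens before the answer tends to
# surface the arithmetic that leads to the correct "5 cents".
cot_prompt = (
    f"{question}\n"
    "Let's think step by step. Write out each intermediate step, "
    "then give the final answer on its own line."
)

print(direct_prompt)
print(cot_prompt)
```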

Iteration 2: o1 (preview)

Most here are well aware. The model can now think dynamically, taking its time to continuously iterate on its answer before submitting a final one. It's not foolproof and still makes mistakes, but it achieves a new level of performance that was previously unavailable, just as iteration 1 did over iteration 0.
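OpenAI hasn't published how o1 actually spends its thinking time, so take this only as an open-literature approximation of "more inference-time compute helps": sample several independent reasoning chains and keep the majority answer (self-consistency). `sample_reasoning_chain` is a hypothetical stub, not a real API.

```python
# Hedged sketch of one known way to trade compute for accuracy at inference
# time: sample N reasoning chains and take a majority vote over the final
# answers (self-consistency). This is NOT a description of o1's internals.
from collections import Counter


def sample_reasoning_chain(question: str) -> tuple[str, str]:
    """Return (reasoning_text, final_answer) from one sampled chain."""
    raise NotImplementedError("plug in your own sampling call here")


def answer_with_more_thinking(question: str, n_samples: int = 8) -> str:
    answers = []
    for _ in range(n_samples):
        _reasoning, final_answer = sample_reasoning_chain(question)
        answers.append(final_answer.strip())
    # More samples -> more compute spent before answering -> usually better accuracy.
    return Counter(answers).most_common(1)[0][0]
```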

But what new weaknesses and deficiencies are we now squarely facing that we couldn’t quite see in the previous iteration? What new training paradigm might be possible, or worth considering, that wasn’t before?

What’s next?

Technological progress is an inherently compounding venture, with the next advances generally depending upon the previous ones. We should expect this space to be no different (and we can already see the pattern emerging).

In my opinion, what o1 allows us to do that was impossible before is world-model building. We all talk about “does it have an internal world model or not”, since an accurate world model is what underpins accurate predictions and understandings of the world.

Before o1, if the model answered wrong, it was very difficult to understand why. What internal mechanisms failed and resulted in the inaccuracy? All you had was the output and the activations to investigate - not very useful.

But now we can see the reasoning steps, in plain language, and we can pinpoint exactly where the logic goes off the rails. That means we can target exactly where to train the inaccuracy out of the model. And maybe that means we can train the next model to have an explicit world-model portion, just as o1 has a specific reasoning portion, so that at the end of training we can inspect not only the output and the reasoning chain, but also the underlying truths/lemmas the model holds.
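To make that concrete, here is a sketch of the kind of step-level inspection a visible reasoning chain enables, in the spirit of process-supervision work from the open literature. `grade_step` is a hypothetical verifier (a model, a checker program, or a human label), not anything o1 actually exposes.

```python
# Sketch of "pinpoint where the logic goes off the rails": walk a visible
# reasoning chain step by step and return the first step a verifier flags as
# wrong. That index is exactly where a step-level training signal could target.


def grade_step(question: str, steps_so_far: list[str], step: str) -> bool:
    """Return True if `step` is a valid continuation of the reasoning so far."""
    raise NotImplementedError("plug in a verifier model, checker program, or human label")


def first_bad_step(question: str, reasoning_steps: list[str]) -> int | None:
    """Index of the first faulty step, or None if every step checks out."""
    for i, step in enumerate(reasoning_steps):
        if not grade_step(question, reasoning_steps[:i], step):
            return i
    return None
```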

What do you think? Am I cuckoo, or is this a reasonable attempt at guessing what comes next? What do you think happens from here?

I still hold that AGI will be obvious when a model can become the best comedian in the world, selling out stadiums worldwide and leaving only happy attendees. I believe there is so much of our humanity encoded into a successful routine that succeeding in that space as thoroughly as AlphaZero did in its own would mean the model understands us better than we understand ourselves.