Oliver Gilan

We are not in fast takeoff

AI is not yet improving in a runaway fashion, but most jobs in the economy will see significant disruption from AI in the next few years.

There’s a palpable desperation in the air right now.

And with the release of Opus 4.5, even Anthropic researchers themselves are wondering what to do next.

As a devtool founder and engineer I’m always thinking about the future of software engineering, and I do not think we are in a fast takeoff (yet).

Here’s why.

Making sense of where we are

Millions of years of evolution have tuned the parameters of our mind and body to the gradient of our real, messy environment. The development of language and writing allowed us to project our understanding of that environment into a symbolic representation.

Pretraining was the first GPT-era paradigm and it leveraged all of that written knowledge of ours to fit models to the gradient of our projection. Thus, models are not grounded in the way a human is, but they understand our concepts and language; it turns out that if you train a model on everything humans have written about the world, it does a pretty good job of understanding that world!

Post-training was the next paradigm and it made the models useful by “forcing” or “teaching” them to prefer certain distributions of outputs for certain inputs. With a raw pre-trained model, if you asked it a question the model might:

  • continue the question
  • start a fake Q&A thread
  • produce something unhelpful

Post-training took these models and made them useful, but they still lacked flexibility and the ability to execute over longer tasks that contain ambiguity.
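To make that concrete, here’s a minimal sketch of what a raw, pre-training-only model does with a question, using GPT-2 through Hugging Face’s transformers pipeline (the exact text it produces will vary; the point is the failure mode described above):

```python
from transformers import pipeline

# GPT-2 is pre-training only: no instruction tuning, no RLHF.
generator = pipeline("text-generation", model="gpt2")

prompt = "What is the capital of France?"
result = generator(prompt, max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
# A base model tends to simply continue the text: it might append more
# questions, spin up a fake Q&A thread, or wander off entirely, rather
# than answer the way a post-trained assistant would.
```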

The RL era came next, built on verifiable rewards: suddenly it became possible to teach models to execute over multiple steps and achieve specific long-time-horizon outcomes rather than reproduce hardcoded input-output mappings. This allows the models to experiment and discover novel ways to achieve complex tasks.

In other words:

  • Pretraining = broad competence
  • Post-training = useful behavior
  • RL = long-time-horizon execution
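To make “verifiable rewards” concrete: in programming the reward can be computed mechanically, with no human grader, by running the model’s output against a test suite. Here’s a minimal sketch of that idea (my own illustration, assuming pytest is installed; no lab’s actual pipeline looks exactly like this):

```python
import os
import subprocess
import tempfile

def verifiable_reward(candidate_code: str, test_code: str) -> float:
    """Return 1.0 if the model's candidate code passes the tests, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "solution.py"), "w") as f:
            f.write(candidate_code)
        with open(os.path.join(tmp, "test_solution.py"), "w") as f:
            f.write(test_code)
        # The reward is checked mechanically by running the test suite.
        result = subprocess.run(
            ["python", "-m", "pytest", "-q", "test_solution.py"],
            cwd=tmp,
            capture_output=True,
            timeout=60,
        )
        return 1.0 if result.returncode == 0 else 0.0
```

An RL algorithm (PPO, GRPO, and the like) then optimizes the policy against this scalar signal rather than a fixed input-output mapping, which is what lets the model discover its own multi-step strategies for reaching the outcome.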

RL is the paradigm for the foreseeable future, and barring a breakthrough in model architecture or training regime, understanding the capabilities and limitations of RL is the best way to understand where we are headed.

The limitations of RL

Engineers are freaking out over Opus 4.5, but most people have been complaining for a while that new model releases aren’t bringing the same intelligence gains as previous ones, and many engineers still claim the models are functionally useless for them. I have to admit, Opus 4.5 doesn’t even feel like that big of a jump over Sonnet 4.5 or GPT-5, and people weren’t freaking out about those models. So what’s going on?

Reinforcement learning is making models better at completing longer-running tasks, but only for specific kinds of tasks. When it comes to software engineering, the training data is by far the most robust: the labs are built by engineers, programming is incredibly verifiable, and it’s easy to get expert engineers to generate high-quality data through normal work processes.

So models are getting better at programming, and their RL training sets are getting better and better, to the point where Opus 4.5 really is just better than entry-level engineers and potentially even the average programmer. But RL isn’t a silver bullet.

The RL paradigm is also disadvantageous to the foundation labs because it:

  • doesn’t scale as well as pretraining (building RL environments is hard)
  • doesn’t generalize by default as well as pretraining
  • empowers legacy businesses that have high-quality process knowledge

The combination of these limitations makes it harder to scale the capabilities of the models, which is why most labs are hyperfixating on the programming use case: they have process knowledge on programming, and it’s the most scalable skill set, since many other white-collar tasks that humans do manually can be represented in code in some way.

Opus 4.5 is still squarely in the category of a tool rather than an independent actor. It still doesn’t actually learn the way a human does; it has behaviors baked into it that are hard to change, and it can still miss the forest for the trees in a way a senior engineer wouldn’t. Opus is a massive accelerant in my own work, but I still need to look at the code it’s generating, and for anything more complex than basic CRUD I still need to hold the right mental models in my head to make sure Opus isn’t going off the rails.

What people are experiencing is a model trained with high-quality RL on common coding tasks. These models are better than the average engineer, but they are not replacing seniors. They amplify the talent of the engineer using them; they remain tools, not independent actors.

To be clear, this alone is enough to shift the economics of software development and upend industries, but it doesn’t constitute a fast takeoff, where we would see rapid, exponential capability growth across most domains simultaneously.

So what comes next?

I suspect that as long as we’re in this paradigm we’ll see a couple of things:

  • models will continue to get better at a wider variety of programming tasks.
  • models will continue to execute on longer-term tasks without human intervention.
  • models will continue to have the same limitations that prevent them from being independent actors. They will act as amplifiers: 10x engineers will become 100x engineers and the average engineer will become far worse as they delegate more decisions to the model and lose their grip on the codebase.
  • models will start to be RL’d on a wider variety of economic tasks beyond programming.

Personal software is going to become a real trend, most simple vertical SaaS that once required teams of average engineers will be built by one or two engineers, and agents will grow into a preferred interface over static dashboards and forms for getting things done. I’m quite confident that software engineering will continue to bifurcate as a profession (more on this soon).

Enterprise software will continue to consume the majority of engineering time and talent because the delivery and maintenance of software will remain a difficult problem where human judgement and experience are necessary in a way that cannot be RL’d. The interesting question is what happens to incumbents. LLMs that are better at programming than the average engineer dramatically change the economics of engineering, and I suspect nearly every workflow in the SDLC will need to be rethought. It’s very possible that the best engineers won’t stick around for boring legacy workflows, which could open an opportunity for startups that accumulate talent to disrupt enterprise software.

Outside of programming, we will see RL start to be applied in a serious way to other job functions and economic tasks. Many non-technical individuals in finance, law, and other professions are going to have their Claude Code moments, where they realize that AI doesn’t mean Microsoft Copilot and that the jobs they thought were secure are now at risk. I expect to see the same gradual yet widespread nihilism and anger that previously appeared in the arts and is now starting to show itself within tech.

But I’m not convinced this progress will be exponential. The RL progress in programming won’t translate directly to these other tasks, for which it may be much harder to find verifiable rewards and amass high-quality training environments.

Bets to take

It’s hard to know how best to position yourself for the future, but here’s how I think about it:

Prepare for a world where the RL paradigm is dominant for the foreseeable future, and look for the places where a new innovation could move us into fast takeoff. If continuous learning is solved, or there’s some breakthrough in context windows, or we see some innovation in model architecture, it’s entirely plausible that we rapidly enter fast takeoff. Researchers from Anthropic seem confident they’ll solve continuous learning in 2026, which is… interesting. I would take them seriously.

The rise of high-quality Chinese OSS models is fascinating because it points to a future where everyone has open models with the same base capabilities and the RL environments become the differentiator. As an investor I’m really interested in companies that can take advantage of these new OSS models and use ML talent arbitrage to grow rapidly. Right now the vast majority of ML talent is stuck inside the labs, and the labs can only focus on so many things at once. Companies building tools for Fortune 500 companies to translate their process knowledge into self-hosted models, like Mira Murati’s Thinking Machines, will do very well.

Similarly, I believe founding teams that combine ML expertise with specific industry expertise have an opportunity to build the frontier intelligence for industries not yet served by the labs. The hard part is finding the right industries, ones lucrative enough to be venture-scale yet not in the hot path of the labs, but they exist and are ripe for disruption.

Another area that I find interesting is robotics. It’s very possible that the best way to move models to the next phase of capabilities is to move them out of our projected world of words and into the real world of physics. I’m fascinated by the things that Physical Intelligence is working on as well as other world-model startups.

If you are building in the hot path of the labs, say in developer tools, then you have to think very carefully about your positioning and what game you’re playing. If your product’s value prop hinges on providing an agent that’s X% better than other agents, then you’re in direct competition with the labs, and unless you have rigorous evals, experience with RL environments, and probably even some plan to own your own models, you’re going to be destroyed long term. If you’re competing at the intelligence layer, you are competing with the labs and need to act accordingly.

Instead I would focus on either the infrastructure layer or the application layer. As agents become more competent, they will need tools to do their jobs just like humans do. I believe models will be able to use human tools like Excel as a stopgap, but most likely we’ll see a rise in spreadsheet software that’s designed for agents to use natively. Companies like Daytona are a good example of what “agent-first computers” might look like.

Similarly, just because agents are doing a lot of work doesn’t mean that humans won’t be in the loop. Whether it’s observability, delegation, auditing, or more traditional workflows, humans will still be involved, and there will be billion-dollar companies that provide very polished, well-designed user experiences at the application layer. I don’t care how good Opus 4.5 is; these LLMs still cannot build a UX as polished as something like Linear. Even the popular products from the labs that are proudly vibe-coded have very obvious UX problems. I believe human taste and judgement in product decisions and last-mile technical details will still be a major differentiator for a lot of software.


I’m building Mesa to rethink the fundamental infrastructure around versioning, code storage, and code collaboration in a world where agents and humans are co-authoring software. If you’re interested in building the next GitHub and you want to tackle problems both at the infrastructure layer (databases, API design, distributed compute, versioning) and at the application layer (pull requests / code review when you don’t need to actually review code, issue management and OSS maintenance, documentation, etc.), then please reach out!
