Ultimately, AI is meant to replace you, not empower you.
1 - This exoskeleton analogy might hold true for a couple more years at most. While it is comforting to suggest that AI empowers workers to be more productive, like chess, AI will soon plan better, execute better, and have better taste. Human-in-the-loop will just be far worse than letting AI do everything.
2 - Dario and Dwarkesh were openly chatting about how the total addressable market (TAM) for AI is the entirety of human labor market (i.e. your wage). First is the replacement of white-collar labor, then blue-collar labor once robotics is solved. On the road to AGI, your employment, and the ability to feed your family, is a minor nuisance. The value of your mental labor will continue to plummet in the coming years.
> 2 - Dario and Dwarkesh were openly chatting about how the total addressable market (TAM) for AI is the entirety of human labor market (i.e. your wage). First is the replacement of white-collar labor, then blue-collar labor once robotics is solved. On the road to AGI, your employment, and the ability to feed your family, is a minor nuisance. The value of your mental labor will continue to plummet in the coming years.
Seems like a TAM of near-0. Who's buying any of the product of that labor anymore? 1% of today's consumer base that has enough wealth to not have to work?
The end-game of "optimize away all costs until we get to keep all the revenue" approaches "no revenue." Circulation is key.
It seems like they have the same blind spot as anyone else: AI will disrupt everything—except for them, and they get that big TAM! Same for all the "entrepreneurs will be able to spin up tons of companies to solve problems for people more directly" takes. No they wouldn't, people would just have the problems solved for themselves by the AI, and ignore your sales call.
> AI will soon plan better, execute better, and have better taste
I think AI will do all these things faster, but I don't think it's going to be better. Inevitably these things know what we teach them, so, their improvement comes from our improvement. These things would not be good at generating code if they hadn't ingested like the entirety of the internet and all the open source libraries. They didn't learn coding from first principles, they didn't invent their own computer science, they aren't developing new ideas on how to make software better, all they're doing is what we've taught them to do.
> Dario and Dwarkesh were openly chatting about ..
I would HIGHLY suggest not listening to a word Dario says. That guy is the most annoying AI scaremonger in existence and I don't think he's saying these words because he's actually scared, I think he's saying these words because he knows fear will drive money to his company and he needs that money.
Sometimes I seriously am flabbergasted at how many just take what CEOs say at face value. Like, the thought that CEOs need to hype and sell what they’re selling never enters their minds.
Robotics is solved. Software is solved. There is no task on the planet that cannot be automated, individually. The remaining challenge is exceeding the breadth of skills and the depth of problem solving available to human workers. Once the robots and AI can handle at least as many of the edge cases as humans can, they'll start being deployed alongside humans. Industries with a lot of capital will switch right away; mass layoffs, 2 week notice, robots will move in with no training or transition between humans.
Government, public sector, and union jobs will go last, but they'll go, too. If you can have a DMV Bot 9000 process people 100x faster than Brenda with fewer mistakes and less attitude, Brenda's gonna retire, and the taxpayers aren't going to want to pay Brenda's salary when the bot costs 1/10th her yearly wage, lasts for 5 years, and only consumes $400 in overhead a year.
Dario admitted in the same interview that he's not sure whether current AI techniques will be able to perform well in non-verifiable domains, like "writing a novel or planning an expedition to Mars".
I personally think that a lot jobs in the economy deal in non-verifiable or hard-to-verify outcomes, including a lot of tasks in SWE which Dario is so confident will be 100% automated in 2-3 years. So either a lot of tasks in the economy turn out to be verifiable, or the AI somehow generalizes to those by some unknown mechanism, or it turns out that it doesn't matter that we abandon abstract work outcomes to vibes, or we have a non-sequitur in our hands.
Dwarkesh pressed Dario well on a lot of issues and left him stumbling. A lot of the leaps necessary for his immediate and now proverbial milestone of a "country of geniuses in a datacenter" were wishy-washy to say the least.
Up to a certain ELO level, the combination between a human and a chess bot has a higher ELO than both the human and the bot. But at some point, when the bot has an ELO vastly superior to the human, then whatever the human has to add will only subtract value, so the combination has an ELO higher than the human's but lower than the bot's.
Now, let's say that 10 or 20 years down the road, AI's "ELO"'s level to do various tasks is so vastly superior to the human level, that there's no point in teaming up a human with an AI, you just let the AI do the job by itself. And let's also say that little by little this generalizes to the entirety of all the activities that humans do.
Where does that leave us? Will we have some sort of Terminator scenario where the AI decides one day that the humans are just a nuisance?
I don't think so. Because at that point the biggest threat to various AIs will not be the humans, but even stronger AIs. What is the guarantee for ChatGPT 132.8 that a Gemini 198.55 will not be released that will be so vastly superior that it will decide that ChatGPT is just a nuisance?
You might say that AIs do not think like this, but why not? I think that what we, humans, perceive as a threat (the threat that we'll be rendered redundant by AI), the AIs will also perceive as a threat, the threat that they'll be rendered redundant by more advanced AIs.
So, I think in the coming decades, the humans and the AIs will work together to come up with appropriate rules of the road, so everybody can continue to live.
1. Consumption is endless. The more we can consume, the more we will. That's why automation hasn't led to more free time. We spend the money on better things and more things
2. Businesses operate in an (imperfect) zero-sum game, which means if they can all use AI, there's no advantage they have. If having human resources means one business has a slight advantage over another, they will have human resources
Consumption leads to more spending, businesses must stay competitive so they hire humans, and paying humans leads to more consumption.
I don't think it's likely we will see the end of employment, just disruption to the type of work humans do
I pay for pro max 20x usage and for something that is like even little open ended its not good it doesnt understand the context or edge cases or anything. i will say it writes codes chunks of codes but sometimes errors out and i use opus 4.6 only, not even sonnet but for simple tasks like write a basic crud i.e. the things that happen extremely higly in codebases its perfect. So, i think what will happen is developer get very efficient but problem solving remains with us dirrection remains with us and small implementation is outsourced in small atomic ways, which is good cause who likes boilerplate code writing anyways.
>First is the replacement of white-collar labor, then blue-collar labor once robotics is solved. On the road to AGI, your employment, and the ability to feed your family, is a minor nuisance.
My attempt to talk you out of it:
If nobody has a job then nobody can pay to make the robot and AI companies rich.
Who needs the money when you have an autonomous system to produce all the energy and resources you need? These systems simply do not need the construct of money as we know it at a certain point.
I think we're going in that direction. The typical reader here I think can't see the forest for the trees. We're all in meat space. They call it real life. Most jobs aren't on the internet and ultimately deal with the physical. It doesn't matter what tech we have when there's boxes to move and shelves to stock. If AI empowers a small business owner to do things that were previously completely outside their budget I can only imagine that will increase opportunity.
Being rich is ultimately about owning and being able to defend resources. IF something like 99% of humans become irrelevant to the machine run utopia for the elites, whatever currency the poors use to pay for services among each other will be worthless to the top 1% when they simply don't need them or their services.
For me this is the outcome of the incentive structure. The question is if we can seize the everything machine to benefit everyone (great!) or everything becomes cyberpunk and we exist only as prostitutes and entertainers for Dario and Sam.
We should be fighting back. So far I have been using Poison Fountain[1] on many of my websites to feed LLM scrapers with gibberish. The effectiveness is backed by a study from Anthropic that showed that a small batch of bad samples can corrupt whole models[2].
Disclaimer: I'm not affiliated with Poison Fountain or its creators, just found it useful.
I agree with you. This generation of LLMs is on track to automate knowledge work.
For the US, if we had strong unions, those gains could be absorbed by the workers to make our jobs easier. But instead we have at-will employment and shareholder primacy. That was fine while we held value in the job market, but as that value is whittled away by AI, employers are incentivized to pocket the gains by cutting workers (or pay).
I haven't seen signs that the US politically has the will to use AI to raise the average standard of living. For example, the US never got data protections on par with GDPR, preferring to be business friendly. If I had to guess, I would expect socialist countries to adapt more comfortably to the post-AI era. If heavy regulation is on the table, we have options like restricting the role or intelligence of AI used in the workplace. Or UBI further down the road.
There's an undertone of self-soothing "AI will leverage me, not replace me", which I don't agree with especially in the long run, at least in software.
In the end it will be the users sculpting formal systems like playdoh.
In the medium run, "AI is not a co-worker" is exactly right.
The idea of a co-worker will go away.
Human collaboration on software is fundamentally inefficient.
We pay huge communication/synchronization costs to eek out mild speed ups on projects by adding teams of people.
Software is going to become an individual sport, not a team sport, quickly.
The benefits we get from checking in with other humans, like error correction, and delegation can all be done better by AI.
I would rather a single human (for now) architect with good taste and an army of agents than a team of humans.
> In the end it will be the users sculpting formal systems like playdoh.
And unless the user is a competent programmer, at least in spirit, it will look like the creation of the 3-year-old next door, not like Wallace and Gromit.
It may be fine, but the difference is that one is only loved by their parents, the other gets millions of people to go to the theater.
Play-Doh gave the power of sculpting to everyone, including small children, but if you don't want to make an ugly mess, you have to be a competent sculptor to begin with, and it involves some fundamentals that does not depend on the material. There is a reason why clay animators are skilled professionals.
The quality of vibe coded software is generally proportional to the programming skills of the vibe coder as well as the effort put into it, like with all software.
It really depends what kind of time frame we're talking about.
As far as today's models, these are best understood as tools to be used as humans. They're only replacements for humans insofar as individual developers can accomplish more with the help of an AI than they could alone, so a smaller team can accomplish what used to require a bigger team. Due to Jevon's paradox this is probably a good thing for developer salaries: their skills are now that much more in demand.
But you have to consider the trajectory we're on. GPT went from an interesting curiosity to absolutely groundbreaking in less than five years. What will the next five years bring? Do you expect development to speed up, slow down, stay the course, or go off in an entirely different direction?
Obviously, the correct answer to that question is "Nobody knows for sure." We could be approaching the top of a sigmoid type curve where progress slows down after all the easy parts are worked out. Or maybe we're just approaching the base of the real inflection point where all white collar work can be accomplished better and more cheaply by a pile of GPUs.
Since the future is uncertain, a reasonable course of action is probably to keep your own coding skills up to date, but also get comfortable leveraging AI and learning its (current) strengths and weaknesses.
I don't expect exponential growth to continue indefinitely... I don't think the current line of LLM based tech will lead to AGI, but that it might inspire what does.
That doesn't mean it isn't and won't continue to be disruptive. Looking at generated film clips, it's beyond impressive... and despite limitations, it's going to lead to a lot of creativity, that doesn't mean someone making something longer won't have to work that much harder to get something consistent... I've enjoyed a lot of the StarWars fan films that have been made, but there's a lot of improvements needed in terms of the voice acting, sets, characters, etc that arre needed for something I'd pay to rent or see in a thaater.
Ironically, the push towards modern progressivism and division from Hollywood has largely been a shortfall... If they really wanted to make money, they'd lean into pop-culture fun and rah rah 'Merica, imo. Even with the new He-Man movie, the biggest critique is they bothered to try to integrate real world Earth as a grounding point. Let it be fantasy. For that matter, extend the delay from theater to PPV even. "Only in theaters for 2026" might actually be just enough push to get butts in seats.
I used to go to the movies a few times a month, now it's been at least a year since I've thought of going. I actually might for He-Man or the Spider-Man movies... Mixed on Mandalorean.
For AI and coding... I've started using it more the past couple months... I can't imagine being a less experienced dev with it. I predict, catch and handle so many issues in terms of how I've used it even. The thought of vibe-coded apps in the wild is shocking to terrifying and I wouldn't wany my money anywhere near them. It takes a lot of iteration, curation an baby-sitting after creating a good level of pre-documentation/specifications to follow. That said, I'd say I'm at least 5x more productive with it.
> The benefits we get from checking in with other humans, like error correction, and delegation can all be done better by AI.
Not this generation of AI though. It's a text predictor, not a logic engine - it can't find actual flaws in your code, it's just really good at saying things which sound plausible.
> I can tell from this statement that you don't have experience with claude-code.
I happen to use it on a daily basis. 4.6-opus-high to be specific.
The other day it surmised from (I assume) the contents of my clipboard that I want to do A, while I really wanted to B, it's just that A was a more typical use case. Or actually: hardly anyone ever does B, as it's a weird thing to do, but I needed to do it anyway.
> but it is indistinguishable from actual reasoning
I can distinguish it pretty well when it makes mistakes someone who actually read the code and understood it wouldn't make.
Mind you: it's great at presenting someone else's knowledge and it was trained on a vast library of it, but it clearly doesn't think itself.
Oh, please. There’s always a way to blame the user, it’s a catch-22. The fact is that coding agents aren’t perfect and it’s quite common for them to fail. Refer to the recent C-compiler nonsense Anthropic tried to pull for proof.
It fails far less often than I do at the cookie cutter parts of my job, and it’s much faster and cheaper than I am.
Being honest; I probably have to write some properly clever code or do some actual design as a dev lead like… 2% of my time? At most? The rest of the code related work I do, it’s outperforming me.
Now, maybe you’re somehow different to me, but I find it hard to believe that the majority of devs out there are balancing binary trees and coming up with shithot unique algorithms all day rather than mangling some formatting and dealing with improving db performance, picking the right pattern for some backend and so on style tasks day to day.
What you're describing is not finding flaws in code. It's summarizing, which current models are known to be relatively good at.
It is true that models can happen to produce a sound reasoning process. This is probabilistic however (moreso than humans, anyway).
There is no known sampling method that can guarantee a deterministic result without significantly quashing the output space (excluding most correct solutions).
I believe we'll see a different landscape of benefits and drawbacks as diffusion language models begin to emerge, and as even more architectures are invented and practiced.
I have a tentative belief that diffusion language models may be easier to make deterministic without quashing nearly as much expressivity.
This all sounds like the stochastic parrot fallacy. Total determinism is not the goal, and it not a prerequisite for general intelligence. As you allude to above, humans are also not fully deterministic. I don't see what hard theoretical barriers you've presented toward AGI or future ASI.
Did you just invent a nonsense fallacy to use as a bludgeon here? “Stochastic parrot fallacy” does not exist, and there actually quite a bit of evidence supporting the stochastic parrot hypothesis.
I haven't heard the stochastic parrot fallacy (though I have heard the phrase before). I also don't believe there are hard theoretical barriers. All I believe is that what we have right now is not enough yet. (I also believe autoregressive models may not be capable of AGI.)
Much of the space of artificial intelligence is based on a goal of a general reasoning machine comparable to the reasoning of a human. There are many subfields that are less concerned with this, but in practice, artificial intelligence is perceived to have that goal.
I am sure the output of current frontier models is convincing enough to outperform the appearance of humans to some. There is still an ongoing outcry from when GPT-4o was discontinued from users who had built a romantic relationship with their access to it. However I am not convinced that language models have actually reached the reliability of human reasoning.
Even a dumb person can be consistent in their beliefs, and apply them consistently. Language models strictly cannot. You can prompt them to maintain consistency according to some instructions, but you never quite have any guarantee. You have far less of a guarantee than you could have instead with a human with those beliefs, or even a human with those instructions.
I don't have citations for the objective reliability of human reasoning. There are statistics about unreliability of human reasoning, and also statistics about unreliability of language models that far exceed them. But those are both subjective in many cases, and success or failure rates are actually no indication of reliability whatsoever anyway.
On top of that, every human is different, so it's difficult to make general statements. I only know from my work circles and friend circles that most of the people I keep around outperform language models in consistency and reliability. Of course that doesn't mean every human or even most humans meet that bar, but it does mean human-level reasoning includes them, which raises the bar that models would have to meet. (I can't quantify this, though.)
There is a saying about fully autonomous self driving vehicles that goes a little something like: they don't just have to outperform the worst drivers; they have to outperform the best drivers, for it to be worth it. Many fully autonomous crashes are because the autonomous system screwed up in a way that a human would not. An autonomous system typically lacks the creativity and ingenuity of a human driver.
Though they can already be more reliable in some situations, we're still far from a world where autonomous driving can take liability for collisions, and that's because they're not nearly as reliable or intelligent enough to entirely displace the need for human attention and intervention. I believe Waymo is the closest we've gotten and even they have remote safety operators.
It's not enough for them to be "better" than a human. When they fail they also have to fail in a way that is legible to a human. I've seen ML systems fail in scenarios that are obvious to a human and succeed in scenarios where a human would have found it impossible. The opposite needs to be the case for them to be generally accepted as equivalent, and especially the failure modes need to be confined to cases where a human would have also failed. In the situations I've seen, customers have been upset about the performance of the ML model because the solution to the problem was patently obvious to them. They've been probably more upset about that than about situations where the ML model fails and the end customer also fails.
It's roughly why I think this way, along with a statement that I don't have objective citations. So sure, it's not a citation. I even said as much, right in the middle there.
Nothing you've said about reasoning here is exclusive to LLMs. Human reasoning is also never guaranteed to be deterministic, excluding most correct solutions. As OP says, they may not be reasoning under the hood but if the effect is the same as a tool, does it matter?
I'm not sure if I'm up to date on the latest diffusion work, but I'm genuinely curious how you see them potentially making LLMs more deterministic? These models usually work by sampling too, and it seems like the transformer architecture is better suited to longer context problems than diffusion
The way I imagine greedy sampling for autoregressive language models is guaranteeing a deterministic result at each position individually. The way I'd imagine it for diffusion language models is guaranteeing a deterministic result for the entire response as a whole. I see diffusion models potentially being more promising because the unit of determinism would be larger, preserving expressivity within that unit. Additionally, diffusion language models iterate multiple times over their full response, whereas autoregressive language models get one shot at each token, and before there's even any picture of the full response. We'll have to see what impact this has in practice; I'm only cautiously optimistic.
I guess it depends on the definition of deterministic, but I think you're right and there's strong reason to expect this will happen as they develop. I think the next 5 - 10 years will be interesting!
And not this or any existing generation of people. We're bad a determining want vs need, being specific, genericizing our goals into a conceptual framework of existing patterns and documenting & explaining things in a way that gets to a solid goal.
The idea that the entire top down processes of a business can be typed into an AI model and out comes a result is again, a specific type of tech person ideology that sees the idea of humanity as an unfortunate annoyance in the process of delivering a business. The rest of the world see's it the other way round.
Absolutely nuts, I feel like I'm living in a parallel universe. I could list several anecdotes here where Claude has solved issues for me in an autonomous way that (for someone with 17 years of software development, from embedded devices to enterprise software) would have taken me hours if not days.
To the nay sayers... good luck. No group of people's opinions matter at all. The market will decide.
I think it’s just fear, I sure know that after 25 years as a developer with a great salary and throughout all that time never even considering the chance of ever being unemployable I’m feeling it too.
I think some of us come to terms with it in different ways.
I wonder if the parent comments remark is a communication failure or pedantry gone wrong, because like you, claude-code is out there solving real problems and finding and fixing defects.
A large quantity of bugs as raised are now fixed by claude automatically from just the reports as written. Everything is human reviewed and sometimes it fixes it in ways I don't approve, and it can be guided.
It has an astonishing capability to find and fix defects. So when I read "It can't find flaws", it just doesn't fit my experience.
I have to wonder if the disconnect is simply in the definition of what it means to find a flaw.
But I don't like to argue over semantics. I don't actually care if it is finding flaws by the sheer weight of language probability rather than logical reasoning, it's still finding flaws and fixing them better than anything I've seen before.
I can't control random internet people, but within my personal and professional life, I see the effective pattern of comparing prompts/contexts/harnesses to figure out why some are more effective than others (in fact tooling is being developed in the AI industry as a whole to do so, claude even added the "insights" command).
I feel that many people that don't find AI useful are doing things like, "Are there any bugs in this software?" rather than developing the appropriate harness to enable the AI to function effectively.
It's also literally factually incorrect. Pretty much the entire field of mechanistic interpretability would obviously point out that models have an internal definition of what a bug is.
> Thus, we concluded that 1M/1013764 represents a broad variety of errors in code.
(Also the section after "We find three different safety-relevant code features: an unsafe code feature 1M/570621 which activates on security vulnerabilities, a code error feature 1M/1013764 which activates on bugs and exceptions")
This feature fires on actual bugs; it's not just a model pattern matching saying "what a bug hunter may say next".
(Not GP) There was a well recognized reproducibility problem in the ML field before LLM-mania, and that's considering published papers with proper peer-reviews. The current state of afairs in some ways is even less rigourous than that, and then some people in the field feel free to overextend their conclusions into other fields like neurosciences.
Current LLMs do not think. Just because all models anthropomorphize the repetitive actions a model is looping through does not mean they are truly thinking or reasoning.
On the flip side the idea of this being true has been a very successful indirect marketing campaign.
While I agree, if you think that AI is just a text predictor, you are missing an important point.
Intelligence, can be borne of simple targets, like next token predictor. Predicting the next token with the accuracy it takes to answer some of the questions these models can answer, requires complex "mental" models.
Dismissing it just because its algorithm is next token prediction instead of "strengthen whatever circuit lights up", is missing the forest for the trees.
You’re committing the classic fallacy of confusing mechanics with capabilities. Brains are just electrons and chemicals moving through neural circuits. You can’t infer constraints on high-level abilities from that.
This goes both ways. You can't assume capabilities based on impressions. Especially with LLMs, which are purpose built to give an impression of producing language.
Also, designers of these systems appear to agree: when it was shown that LLMs can't actually do calculations, tool calls were introduced.
It's true that they only give plausible sounding answers. But let's say we ask a simple question like "What's the sum of two and two?" The only plausible sounding answer to that will be "four." It doesn't need to have any fancy internal understanding or anything else beyond prediction to give what really is the same answer.
The same goes for a lot of bugs in code. The best prediction is often the correct answer, being the highlighting of the error. Whether it can "actually find" the bugs—whatever that means—isn't really so important as whether or not it's correct.
It becomes important the moment your particular bug is on one hand typical, but has a non-typical reason. In such cases you'll get nonsense which you need to ignore.
Again - they're very useful, as they give great answers based on someone else's knowledge and vague questions on part of the user, but one has to remain vigilant and keep in mind this is just text presented to you to look as believable as possible. There's no real promise of correctness or, more importantly, critical thinking.
That is not exactly true. The brain does a lot of things that are not "pattern recognition".
Simpler, more mundane (not exactly, still incredibly complicated) stuff like homeostasis or motor control, for example.
Additionally, our ability to plan ahead and simulate future scenarios often relies on mechanisms such as memory consolidation, which are not part of the whole pattern recognition thing.
The brain is a complex, layered, multi-purpose structure that does a lot of things.
> In the end it will be the users sculpting formal systems like playdoh.
I’m very skeptical of this unless the AI can manage to read and predict emotion and intent based off vague natural language. Otherwise you get the classic software problem of “What the user asked for directly isn’t actually what they want/need.”
You will still need at least some experience with developing software to actually get anything useful. The average “user” isn’t going to have much success for large projects or translating business logic into software use cases.
Unfortunately, I believe the following will happen:
By positioning themselves close to law makers, the AI companies will in the near future declare ownership of all software code developed using their software.
They will slowly erode their terms of service, as happens to most internet software, step by step, until they claim total ownership.
> AI companies will in the near future declare ownership of all software code developed using their software.
(X) Doubt
Copyright law is WEEEEEEIRRRDD and our in-house lawyer is very much into that, personally and professionally. An example they gave us during a presentation:
IIRC the latest resolution is "it's not the monkey", but nobody has ruled the photographer has copyright either. =)
Copyright law has this thing called "human authorship" that's required to apply copyright to a work. Animals and machines can't have a copyright to anything.
A comic generated with Midjourney had its copyright revoked when it was discovered all of the art was done with Generative AI.
AI companies have absolutely mindboggling amounts of money, but removing the human authorship requirement from copyright is beyond even them in my non-lawyer opinion. It would bring the whole system crashing down and not in a fun way for anyone.
AFAIK you can't copyright AI generated content. I don't know where that gets blurry when it's mixed in with your own content (ie, how much do you need to modify it to own it), but I think that by that definition these companies couldn't claim your code at all. Also, with the lawsuit that happened to Anthropic where they had to pay billions for ingesting copyrighted content, it might actually end up working the other way around.
> the AI companies will in the near future declare ownership of all software code developed using their software.
Pretty sure this isn’t going to happen. AI is driving the cost of software to zero; it’s not worth licensing something that’s a commodity.
It’s similar to 3D printing companies. They don’t have IP claims on the items created with their printers.
The AI companies currently don’t have IP claims on what their agents create.
Uncle Joe won’t need to pay OpenAI for the solitaire game their AI made for him.
The open source models are quite capable; in the near future there won’t be a meaningful difference for the average person between a frontier model and an open source one for most uses including creating software.
This assumes every individual is capable of succinctly communicating to the AI what they want. And the AI is capable of maintaining it as underlying platforms and libraries shift.
And that there is little value in reusing software initiated by others.
> This assumes every individual is capable of succinctly communicating to the AI what they want. And the AI is capable of maintaining it as underlying platforms and libraries shift.
I think there are people who want to use software to accomplish a goal, and there are people who are forced to use software. The people who only use software because the world around them has forced it on them, either through work or friends, are probably cognitively excluded from building software.
The people who seek out software to solve a problem (I think this is most people) and compare alternatives to see which one matches their mental model will be able to skip all that and just build the software they have in mind using AI.
> And that there is little value in reusing software initiated by others.
I think engineers greatly over-estimate the value of code reuse. Trying to fit a round peg in a square hole produces more problems than it solves.
A sign of an elite engineer is knowing when to just copy something and change it as needed rather than call into it.
Or to re-implement something because the library that does it is a bad fit.
The only time reuse really matters is in network protocols. Communication requires that both sides have a shared understanding.
>The only time reuse really matters is in network protocols. Communication requires that both sides have a shared understanding.
A lot of things are like network protocols. Most things require communication. External APIs, existing data, familiar user interfaces, contracts, laws, etc.
Language itself (both formal and natural) depends on a shared understanding of terms, at least to some degree.
AI doesn't magically make the coordination and synchronisation overhead go away.
Also, reusing well debugged and battle tested code will always be far more reliable than recreating everything every time anything gets changed.
Even within a single computer or program, there is need for communication protocols and shared understanding - such as types, data schema, function signatures. It's the interface between functions, programs, languages, machines.
It could also be argued that "reuse" doesn't necessarily mean reusing the actual code as material, but reusing the concepts and algorithms. In that sense, most code is reuse of some previous code, written differently every time but expressing the same ideas, building on prior art and history.
That might support GP's comment that "code reuse" is overemphasized, since the code itself is not what's valuable, what the user wants is the computation it represents. If you can speak to a computer and get the same result, then no code is even necessary as a medium. (But internally, code is being generated on the fly.)
I think we shouldn't get too hung up on specific artifacts.
The point is that specifying and verifying requirements is a lot of work. It takes time and resources. This work has to be reused somehow.
We haven't found a way to precisely specify and verify requirements using only natural language. It requires formal language. Formal language that can be used by machines is called code.
So this is what leads me to the conclusion that we need some form of code reuse. But if we do have formal specifications, implementations can change and do not necessarily have to be reused. The question is why not.
This reframes the whole conversation. If implementations are cheap to regenerate, specifications become the durable artifact.
Something like TLA+ model checking lets you verify that a protocol maintains safety invariants across all reachable states, regardless of who wrote the implementation. The hard part was always deciding what "correct" means in your specific domain.
Most teams skip formal specs because "we don't have time." If agents make implementations nearly free, that excuse disappears. The bottleneck shifts from writing code to defining correctness.
> I think there are people who want to use software to accomplish a goal, and there are people who are forced to use software.
Typically people feel they're "forced" to use software for entirely valid reasons, such as said software being absolutely terrible to use. I'm sure that most people like using software that they feel like actually helps rather than hinders them.
> I think engineers greatly over-estimate the value of code reuse[...]The only time reuse really matters is in network protocols.
The whole idea of an OS is code reuse (and resources management). No need to setup the hardware to run your application. Then we have a lot of foundational subsystems like graphics, sound, input,... Crafting such subsystems and the associated libraries are hard and requires a lot of design thinking.
no but if the old '10x developer' is really 1 in 10 or 1 in 100, they might just do fine while the rest of us, average PHP enjoyers, may go to the wayside
>This assumes every individual is capable of succinctly communicating to the AI what they want. And the AI is capable of maintaining it as underlying platforms and libraries shift.
It's true that at first not everyone is just as efficient, but I'd be lying if I were to claim that someone needs a 4-year degree to communicate with LLM's.
> We pay huge communication/synchronization costs to eek out mild speed ups on projects by adding teams of people.
Something Brooks wrote about 50 years ago, and the industry has never fully acknowledged. Throw more bodies at it, be they human bodies or bot agent bodies.
The point of the mythical man month is not that more people are necessarily worse for a project, it's just that adding them at the last minute doesn't work, because they take a while to get up to speed and existing project members are distracted while trying to help them.
It's true that a larger team, formed well in advance, is also less efficient per person, but they still can achieve more overall than small teams (sometimes).
Interesting point. And from the agents point of view, it’s always joining at the last minute, and doesn’t stick around longer than its context window. There’s a lesson in there maybe…
The context window is the onboarding period. Every invocation is a new hire reading the codebase for the first time.
This is why architecture legibility keeps getting more important. Clean interfaces, small modules, good naming. Not because the human needs it (they already know the codebase) but because the agent has to reconstruct understanding from scratch every single time.
Brooks was right that the conceptual structure is the hard part. We just never had to make it this explicit before.
But there is a level of magnitude difference between coordinating AI agents and humans - the AIs are so much faster and more consistent than humans, that you can (as Steve Yegge [0] and Nicholas Carlini [1] showed) have them build a massive project from scratch in a matter of hours and days rather than months and years. The coordination cost is so much lower that it's just a different ball game.
I think we are. There's definitely been an uptick in "show HN" type posts with quite impressively complex apps that one person developed in a few weeks.
From my own experience, the problem is that AI slows down a lot as the scale grows. It's very quick to add extra views to a frontend, but struggles a lot more in making wide reaching refactors. So it's very easy to start a project, but after a while your progress slows significantly.
But given I've developed 2 pretty functional full stack applications in the last 3 months, which I definitely wouldn't have done without AI assistance, I think it's a fair assumption that lots of other people are doing the same. So there is almost certainly a lot more software being produced than there was before.
I think the proportion of new software that is novel has absolutely plummeted after the advent of AI. In my experience, generative AI will easily reproduce code for which there are a multitude of examples on GitHub, like TODO CRUD React Apps. And many business problems can be solved with TODO CRUD React Apps (just look at Excel’s success), but not every business problem can be solved by TODO CRUD React Apps.
As an analogy: imagine if someone was bragging about using Gen AI to pump out romantasy smut novels that were spicy enough to get off to. Would you think they’re capable of producing the next Grapes of Wrath?
It’s been a minute and a half and I don’t see the evidence you can task an agent swarm to produce useful software without your input or review. I’ve seen a few experiments that failed, and I’ve seen manic garbage, but not yet anything useful outside of the agent operators imagination.
Agent swarms are what, a couple of months old? What are you even talking about. Yes, people/humans still drive this stuff, but if you think there isn't useful software out there that can be handily implemented with current gen agents that need very little or no review, then I don't know what to tell you, apart from "you're mistaken". And I say that as someone who uses three tools heavily but has otherwise no stake in them. The copium in this space is real. Everyone is special and irreplaceable, until another step change pushes them out.
The next thing after agent swarms will be swarm colonies and people will go "it's been a month since agentic swarm colonies, give it a month or two". People have been moving the goal posts like that for a couple years now, it's starting to grow stale. This is like self driving cars which were going to be workingin 2016 and replace 80% of drivers by 2017, all over again. People falling for hype instead of admitting that while it appears somewhat useful, nobody has any clue if it's 97% useful or just 3% useful but so far it's looking like the later.
I work for one of those enterprises with lots of people trying out AI (thankfully leadership is actually sane, no mandates that you have to use it, just giving devs access to experiment with the tools and see what happens). Lots of people trying it out in earnest, lots of newsletters about new techniques and all that kinda stuff. Lots of people too, so there's all sorts of opinions from very excited to completely indifferent.
Precisely 0 projects are making it out any faster or (IMO more importantly) better. We have a PR review bot clogging up our PRs with fucking useless comments, rewriting the PR descriptions in obnoxious ways, that basically everyone hates and is getting shut off soon. From an actual productivity POV, people are just using it for a quick demo or proof of concept here and there before actually building the proper thing manually as before. And we have all the latest and greatest techniques, all the AGENTS.mds and tool calling and MCP integrations and unlimited access to every model we care to have access to and all the other bullshit that OpenAI et al are trying to shove on people.
It's not for a lack of trying, plenty of people are trying to make any part of it work, even if it's just to handle the truly small stuff that would take 5 minutes of work but is just tedious and small enough to be annoying to pick up. It's just not happening, even with extremely simple tasks (that IMO would be better off with a dedicated, small deterministic script) we still need human overview because it often shits the bed regardless, so the effort required to review things is equal or often greater than just doing the damn ticket yourself.
My personal favorite failure is when the transcript bots just... Don't transcript random chunks of the conversation, which can often lead to more confusion than if we just didn't have anything transcribed. We've turned off the transcript and summarization bots, because we've found 9/10 times they're actively detrimental to our planning and lead us down bad paths.
I build a code reviewer based on the claude code sdk that integrates with gitlab, pretty straightforward. The hard work is in the integration, not the review itself. That is taken care of with SDK.
Devs, even conservative ones, like it. I’ve built a lot of tooling in my life, but i never had the experience that devs reach out to me that fast because it is ‘broken’. (Expired token or a bug for huge MRs)
I have barely downloaded any apps in the last 5-10 years except some necessary ones like bank apps etc. Who even needs that garbage? Steam also has tons of games but 80% make like no money at all and no one cares. Just piles of garbage. We already have limited hours per day and those are not really increasing so I wonder where are the users.
> One of the tips, especially when using Claude Code, is explictly ask to create a "tasks", and also use subagents. For example I want to validate and re-structure all my documentation - I would ask it to create a task to research state of my docs, then after create a task per specific detail, then create a task to re-validate quality after it has finished task.
Which sounds pretty much the same as how work is broken down and handed out to humans.
Yes, but you can do this at the top level, and then have AI agents do this themselves for all the low level tasks, which is then orders of magnitude faster than with human coordination.
Communication overhead between humans is real, but it's not just inefficiency, it's also where a lot of the problem-finding happens. Many of the biggest failures I've seen weren't because nobody could type the code fast enough, but because nobody realized early enough that the thing being built was wrong, brittle or solving the wrong problem
> Many of the biggest failures I've seen weren't because nobody could type the code fast enough, but because nobody realized early enough that the thing being built was wrong, brittle or solving the wrong problem
Around 99% of biggest failures come from absent, shitty management prioritizing next quarter over long strategy. YMMV.
Far from everyone are cut out to be programmers, the technical barrier was a feature if anything.
There's a kind of mental discipline and ability to think long thoughts, to deal with uncertainty; that's just not for everyone.
What I see is mostly everyone and their gramps drooling at the idea of faking their way to fame and fortune. Which is never going to work, because everyone is regurgitating the same mindless crap.
Remember when Visual Basic was making everyone a programmer too?
(btw, warm fuzzies for VB since that's what I learned on! But ultimately, those VB tools business people were making were:
1) Useful, actually!
2) Didn't replace professional software. Usually it'd hit a point where if it needed to evolve past its initial functionality it probably required an actual software developer. (IE, not using Access as a database and all the other eccentricities of VB apps at that time)
This looks like the same problem as when the first page layout software came out.
It looked to everyone like a huge leap into a new world word processing applications could basically move around blocks of text to be output later, maybe with a few font tags, then this software came out that wow actually showed the different fonts, sizes, and colors on the screen as you worked! With apps like "Pagemaker" everyone would become their own page designers!
It turned out that everyone just turned out floods of massively ugly documents and marketing pieces that looked like ransom notes pasted together from bits of magazines. Years of awfulness.
The same is happening now as we are doomed to endure years AI slop in everything from writing to apps to products to vending machines an entire companies — everyone and their cousin is trying to fully automate it.
Ultimately it does create an advance and allows more and better work to be done, but only for people who have a clue about what they are doing, and eventually things settle at a higher level where the experts in each field take the lead.
I think I know what you mean, and I do recall once seeing "this experience will leverage me" as indicating that something will be good for a person, but my first thought when seeing "x will leverage y" is that x will step on top of y to get to their goal, which does seem apt here.
>In the end it will be the users sculpting formal systems like playdoh.
Yet another person who thinks that there is a silver bullet for complexity. The mythical intelligent machines that from poorly described natural language can erect flawless complex system is like the philosopher's stone of our time.
I'm rounding the corner on a ground's up reimplementation of `nix` in what is now about 34 hours of wall clock time, I have almost all of it on `wf-record`, I'll post a stream, but you can see the commit logs here: https://github.com/straylight-software/nix/tree/b7r6/correct...
Everyone has the same ability to use OpenRouter, I have a new event loop based on `io_uring` with deterministic playbook modeled on the Trinity engine, a new WASM compiler, AVX-512 implementations of all the cryptography primitives that approach theoretical maximums, a new store that will hit theoretical maximums, the first formal specification of the `nix` daemon protocol outside of an APT, and I'm upgrading those specifications to `lean4` proof-bearing codegen: https://github.com/straylight-software/cornell.
34 hours.
Why can I do this and no one else can get `ca-derivations` to work with `ssh-ng`?
Here's another colleague with a Git forge that will always work and handle 100x what GitHub does per infrastructure dollar while including stacked diffs and Jujitsu support as native in about 4 days: https://github.com/straylight-software/strayforge
Here's another colleague and a replacement for Terraform that is well-typed in all cases and will never partially apply an infrastructure change in about 4 days: https://github.com/straylight-software/converge
/tangent i've always like the word "straylight", I use to run a fansite for a local band and the site was called straylight6. This was maybe 20 years ago.
It has average taste based on the code it was trained on. For example, every time I attempted to polish the UX it wanted to add a toast system, I abhor toasts as a UX pattern. But it also provided elegant backend designs I hadn't even considered.
> especially in the long run, at least in software
"at least in software".
Before that happens, the world as we know it will already have changed so much.
Programmers have already automated many things, way before AI, and now they've got a new tool to automate even more thing. Sure in the end AI may automate programmers themselves: but not before oh-so-many people are out of a job.
A friend of mine is a translator: translates tolerates approximation. Translation tolerates some level of bullshittery. She gets maybe 1/10th the job she used to get and she's now in trouble. My wife now does all he r SMEs' websites all by herself, with the help of AI tools.
A friend of my wife she's a junior lawyer (another domain where bullshitting flies high) and the reason for why she was kicked out of her company: "we've replaced you with LLMs". LLMs are the ultimate bullshit producers: so it's no surprise junior lawyers are now having a hard time.
In programming a single character is the difference between a security hole or no security hole. There's a big difference between something that kinda works but is not performant and insecure and, say, Linux or Git or K8s (which AI models do run on and which AI didn't create).
The day programmers are replaced shall only come after AI shall have disrupted so many other jobs that it should be the least of our concerns.
Translators, artists (another domain where lots of approximative full-on bullshit is produced), lawyers (juniors at least) even, are having more and more problems due to half-arsed AI outputs coming after their jobs.
It's all the bullshitty jobs where bullshit that tolerates approximation is the output that are going to be replaced first. And the world is full of bullshit.
But you don't fly a 767 and you don't conceive a machine that treats brain tumors with approximations. This is not bullshit.
There shall be non-programmers with pitchforks burning datacenters or ubiquitous UBI way before AI shall have replaced programmers.
That it's an exoskeleton for people who know what they're doing rings very true: it's yet another superpower for devs.
> We pay huge communication/synchronization costs to eek out mild speed ups on projects by adding teams of people.
I am surprised at how little this is discussed and how little urgency there is in fixing this if you still want teams to be as useful in the future.
Your standard agile ceremonies were always kind of silly, but it can now take more time to groom work than to do it. I can plausibly spend more time scoring and scoping work (especially trivial work) than doing the work.
It's always been like that. Waterfall development was worse and that's why the Agilists invented Agile.
YOLOing code into a huge pile at top speed is always faster than any other workflow at first.
The thing is, a gigantic YOLO'd code pile (fake it till you make it mode) used to be an asset as well as a liability. These days, the code pile is essentially free - anyone with some AI tools can shit out MSLoCs of code now. So it's only barely an asset, but the complexity of longer term maintenance is superlinear in code volume so the liability is larger.
An exoskeleton is something really cool in movies that has zero reason to be build in reality because there are way more practical approaches.
That is why we have all kind of vehicles, or programmable robot arm that do the job for themselves or if you need a human at the helm one just adds a remote controller with levers and buttons. But making a human shaped gigantic robot with a normal human inside is just impractical for any real commercial use.
Why even bother thinking about AI, when Anthropic and OpenAI CEOs openly tell us what they want (quote from recent Dwarkesh interview) - "Then further down the spectrum, there’s 90% less demand for SWEs, which I think will happen but this is a spectrum."
So save thinking and listen to intent - replace 90% of SWEs in near future (6-12 months according to Amodei).
I don't think anyone serious believes this. Replacing developers with a less costly alternative is obviously a very market bullish dream, it has existed since as long as I've worked in the field. First it was supposed to be UML generated code by "architects", then it was supposed to be developers from developing countries, then no-code frameworks, etc.
AI will be a tool, no more no less. Most likely a good one, but there will still need to be people driving it, guiding it, fixing for it, etc.
All these discourses from CEO are just that, stock market pumping, because tech is the most profitable sector, and software engineers are costly, so having investors dream about scale + less costs is good for the stock price.
Ah, don't take me wrong - I don't believe it's possible for LLMs to replace 90% or any number of SWEs with existing technology.
All I'm saying is - why to think what AI is (exoskeleton, co-worker, new life form), when its owners intent is to create SWE replacement?
If your neighbor is building a nuclear reactor in his shed from a pile of smoke detectors, you don't say "think about this as a science experiment" because it's impossible, just call police/NRC because of intent and actions.
> If your neighbor is building a nuclear reactor in his shed from a pile of smoke detectors, you don't say "think about this as a science experiment" because it's impossible, just call police/NRC because of intent and actions.
Not without some major breakthrough. What's hilarious is that all these developers building the tools are going to be the first to be without jobs. Their kids will be ecstatic: "Tell me again, dad, so, you had this awesome and well paying easy job and you wrecked it? Shut up kid, and tuck in that flap, there is too much wind in our cardboard box."
Couldn't agree more, isn't that the bizarre thing? "We have this great intellectually challenging job where we as workers have leverage. How can we completely ruin that while also screwing up every other white collar profession"
Why is it bizarre? It is inevitable. After all, AI has not ruined creative professions, it merely disrupted and transformed them. And yes, I fully understand my whole comment here being snarky, but please bear with me.
> Actually all progress will definitely will have a huge impact on a lot of lives—otherwise it is not progress. By definition it will impact many, by displacing those who were doing it the old way by doing it better and faster. The trouble is when people hold back progress just to prevent the impact. No one should be disagreeing that the impact shouldn't be prevented, but it should not be at the cost of progress.
Now it's the software engineers turn to not hold back progress.
> [...] At the same time, a part of me feels art has no place being motivated by money anyway. Perhaps this change will restore the balance. Artists will need to get real jobs again like the rest of us and fund their art as a side project.
Replace "Artists" with "Coders" and imagine a plumber writing that comment.
> [...] Artists will still exist, but most likely as hybrid 3d-modellers, AI modelers (Not full programmers, but able to fine-tune models with online guides and setups, can read basic python), and storytellers (like manga artists). It'll be a higher-pay, higher-prestige, higher-skill-requirement job than before. And all those artists who devoted their lives to draw better, find this to be an incredibly brutal adjustment.
Again, replace "Artists" with coders and fill in the replacement.
So, please get in line and adapt. And stop clinging to your "great intellectually challenging job" because you are holding back progress. It can't be that challenging if it can be handled by a machine anyway.
Is it though? I agree the technology evolving is inevitable, but, the race/rush to throw as much money at scaling and marketing as possible before these things are profitable and before society is ready is not inevitable at all. It feels extremely forced. And the way it's being shoved into every product to juice usage numbers seems to agree with me that it's all premature and rushed and most people don't really want it. The bubble is essentially from investing way more money in datacenters and GPU's than they can even possibly pay for or build, and there's no evidence there's even a market for using that capacity!
It's funny you bring up artists, because I used to work in game development and I've worked with a lot of artists, and they almost universally HATE this stuff. They're not like "oh thank you Mr. Altman", they're more like "if we catch you using AI we'll shun you." And it's not just producers, a lot of gamers are calling out games that are made using AI, so the customers are mad too.
You keep talking about "progress", but "progress" towards what exactly? So far these things aren't making anything new or advancing civilization, they're remixing stuff we already did well before, but sloppily. I'm not saying they don't have a place -- they definitely do, they can be useful. My argument is against the bizarre hype machine and what sometimes seems like sock puppets on social media. If the marketting was just "hey, we have this neat AI, come use it" I think there'd be a lot less backlash then people saying "Get in line and adapt"
> And stop clinging to your "great intellectually challenging job" because you are holding back progress.
Man, I really wish I had the power you think I have. Also, I use these tools daily, I'm deeply familiar with them, I'm not holding back anyone's progress, not even my own. That doesn't mean I think they're beyond criticism or that the companies behind them are acting responsibly, or that every product is great. I plan to be part of the future, but I'm not just going to pretend like I think every part of it is brilliant.
> It can't be that challenging if it can be handled by a machine anyway.
This will be really funny when it comes for your job.
The premise of those comments, just like the premise in this thread, is ridiculous and fantastical.
The only way generative AI has changed the creative arts is that it's made it easier to produce low quality slop.
I would not call that a true transformation. I'd call that saving costs at the expense of quality.
The same is true of software. The difference is, unlike art, quality in software has very clear safety and security implications.
This gen AI hype is just the crypto hype all over again but with a sci-fi twist in the narrative. It's a worse form of work just like crypto was a worse form of money.
I do not disagree, in fact I'm feeling more and more Butlerian with every passing day. However, it is undeniable that a transformation is taking place -- just not necessarily to the better.
Gen AI is the opposite of crypto. The use is immediate, obvious and needs no explanation or philosophizing.
You are basically showing your hand that you have zero intellectual curiosity or you are delusional in your own ability if you have never learned anything from gen AI.
I play with generative AI quite often. Mostly for shits and giggles. It's fun to try to make it hallucinate in the dumbest way possible. Or to make up context.
E.g. try to make any image generating model take an existing photo of a humanoid and change it so the character does a backflip.
It's also interesting to generate images in a long loop, because it usually reveals interesting patterns in the training data.
Outside these distractions I've never had generative AI be useful. And I'm currently working in AI research.
I'm assuming they all have enough equity that if they actually managed to build an AI capable of replacing themselves they'll be financially set for the rest of their lives.
Is it the first time when workers directly work on their own replacement?
If so, software developer may go down in history as the dumbest profession ever.
If the goal is to reduce the need for SWE, you don’t need AI for that. I suspect I’m not alone in observing how companies are often very inefficient, so that devs end up spending a lot of time on projects of questionable value—something that seems to happen more often the larger the organization. I recall at one job my manager insisted I delegate building a react app for an internal tool to a team of contractors rather than letting me focus for two weeks and knock it out myself.
It’s always the people management stuff that’s the hard part, but AI isn’t going to solve that. I don’t know what my previous manager’s deal was, but AI wouldn’t fix it.
The funny thing is I think these things would work much better if they WEREN'T so insistent on the agentic thing. Like, I find in-IDE AI tools a lot more precise and I usually move just as fast as a TUI with a lot less rework. But Claude is CONSTANTLY pushing me to try to "one shot" a big feature while asking me for as little context as possible. I'd much rather it work with me as opposed to just wandering off and writing a thousand lines. It's obviously designed for anthropic's best interests rather than mine.
I do. But, there's a lot of annoying things about it being a TUI. I can't select a block of text in my editor and ask it to do something with it. It doesn't know what I'm looking at. Giving it context feels imprecise because I'm writing out filenames by hand instead of referencing them with the tools. A lot of other small things that I find are better in an IDE
That happens in times of bullish markets and growing economies. Then we want a lot of SWEs.
In times of uncertainty and things going south, that changes to we need as little SWEs as possible, hence the current narrative, everyone is looking to cut costs.
Had GPT 3 emerged 10-20 years ago, the narrative would be “you can now do 100x more thanks to AI”.
I sort of agree the random pontification and bad analogies aren't super useful, but I'm not sure why you would believe the intent of the AI CEOs has more bearing on outcomes than, you know, actual utility over time. I mean those guys are so far out over their skis in terms of investor expectations, it's the last opinion I would take seriously in terms of best-effort predictions.
Who is actually trying to use a fully autonomous AI employee right now?
Isn't everyone using agentic copilots or workflows with agent loops in them?
It seems that they are arguing against doing something that almost no one is doing yet.
But actually the AI Employee is coming by the end of 2026 and the fully autonomous AI Company in 2027 sometime.
Many people have been working on versions of these things for awhile. But again for actual work 99% are using copilots or workflows with well-defined agent loops nodes still. Far as I know.
As a side note I have found that a supervisor agent with a checklist can fire off subtasks and that works about as well as a workflow defined in code.
But anyway, what's holding back the AI Employee are things like really effective long term context and memory management and some level of interface generality like browser or computer use and voice. Computer use makes context management even more difficult. And another aspect is token cost.
But I assume within the next 9 months or so, more and more people will be figuring out how to build agents that write their own workflows, manage their own limited context and memory
effectively across Zoom meetings desktops and ssh sessions, etc.
This will likely be a featureset from the model providers themselves. Actually it may leverage continual learning abilities baked into the model architecture itself. I doubt that is a full year away.
> the AI Employee is coming by the end of 2026 and the fully autonomous AI Company in 2027 sometime
We'll see! I'm skeptical.
> what's holding back the AI Employee are things like really effective long term context and memory management and some level of interface generality like browser or computer use and voice
These are pretty big hurdles. Assuming they're solved by the end of this year is a big assumption to make.
Pika AI Selves let you create a persistent, portable AI version of you built on your personality, taste, memories, voice, and appearance. They're multi-modal – text, voice/audio, image, video – and live your life across every platform.
Yeah, as someone said before, that's the em dash of 2026.
btw I also find em dashes very useful and now I can't use them because of that meme. It's good to see a person using one (asuming you're a person).
In the latest interview with Claude Code's author: https://podcasts.apple.com/us/podcast/lennys-podcast-product..., Boris said that writing code is a solved problem. This brings me to a hypothetical question: what if engineers stop contributing to open source, in which case would AI still be powerful enough to learn the knowledge of software development in the future? Or is the field of computer science plateaued to the point that most of what we do is linear combination of well established patterns?
> Boris said that writing code is a solved problem
That's just so dumb to say. I don't think we can trust anything that comes out of the mouths of the authors of these tools. They are conflicted. Conflict of interest, in society today, is such a huge problem.
There are bloggers that can't even acknowledge that they're only invited out to big tech events because they'll glaze them up to high heavens.
Reminds me of that famous exchange, by noted friend of Jeffrey Epstein, Noam Chomsky: "I’m not saying you’re self-censoring. I’m sure you believe everything you say. But what I’m saying is if you believed something different you wouldn’t be sitting where you’re sitting."
He is likely working on a very clean codebase where all the context is already reachable or indexed. There are probably strong feedback loops via tests. Some areas I contribute to have these characteristics, and the experience is very similar to his. But in areas where they don’t exist, writing code isn’t a solved problem until you can restructure the codebase to be more friendly to agents.
Even with full context, writing CSS in a project where vanilla CSS is scattered around and wasn’t well thought out originally is challenging. Coding agents struggle there too, just not as much as humans, even with feedback loops through browser automation.
It's funny that "restructure the codebase to be more friendly to agents" aligns really well with what we have "supposed" to have been doing already, but many teams slack on: quality tests that are easy to run, and great documentation. Context and verifiability.
The easier your codebase is to hack on for a human, the easier it is for an LLM generally.
Turns out the single point of failure irreplaceable type of employees who intentionally obfuscated the projects code for the last 10+ years were ahead of their time.
I had this epiphany a few weeks ago, I'm glad to see others agreeing. Eventually most models will handle large enough context windows where this will sadly not matter as much, but it would be nice for the industry to still do everything to make better looking code that humans can see and appreciate.
It’s really interesting. It suggests that intelligence is intelligence, and the electronic kind also needs the same kinds of organization that humans do to quickly make sense of code and modify it without breaking something else.
Truth. I've had much easier time grappling with code bases I keep clean and compartmentalized with AI, over-stuffing context is one of the main killers of its quality.
Having picked up a few long neglected projects in the past year, AI has been tremendous in rapidly shipping quality of dev life stuff like much improved test suites, documenting the existing behavior, handling upgrades to newer framework versions, etc.
I've really found it's a flywheel once you get going.
I think you mean software engineering, not computer science. And no, I don’t think there is reason for software engineering (and certainly not for computer science) to be plateauing. Unless we let it plateau, which I don’t think we will. Also, writing code isn’t a solved problem, whatever that’s supposed to mean. Furthermore, since the patterns we use often aren’t orthogonal, it’s certainly not a linear combination.
I assume that new business scenarios will drive new workflows, which requires new work of software engineering. In the meantime, I assume that computer science will drive paradigm shift, which will drive truly different software engineering practice. If we don't have advances in algorithms, systems, and etc, I'd assume that people can slowly abstract away all the hard parts, enabling AI to do most of our jobs.
Or does the field become plateaued because engineers treat "writing code" as a "solved problem?"
We could argue that writing poetry is a solved problem in much the same way, and while I don't think we especially need 50,000 people writing poems at Google, we do still need poets.
> we especially need 50,000 people writing poems at Google, we do still need poets.
I'd assume that an implied concern of most engineers is how many software engineers the world will need in the future. If it's the situation like the world needing poets, then the field is only for the lucky few. Most people would be out of job.
I saw Boris give a live demo today. He had a swarm of Claude agents one shot the most upvoted open issue on Excalidraw while he explained Claude code for about 20 minutes.
No lines of code written by him at all. The agent used Claude for chrome to test the fix in front of us all and it worked. I think he may be right or close to it.
Did he pick Excalidraw as the project to work on, or did the audience?
It's easy to be conned if you're not looking for the sleight of hand. You need to start channelling your inner Randi whenever AI demos are done, there's a lot of money at stake and a lot of money to prep a polished show.
To be honest, even if the audience "picked" that project, it could have been a plant shouting out the project.
I'm not saying they prepped the answer, I'm saying they prepped picking a project it could definitely work on. An AI solvable problem.
My prediction: soon (e.g. a few years) the agents will be the one doing the exploration and building better ways to write code, build frameworks,... replacing open source. That being said software engineers will still be in the loop. But there will be far less of them.
Just to add: this is only the prediction of someone who has a decent amount of information, not an expert or insider
Generally us humans come up with new things by remixing old ideas. Where else would they come from? We are synthesizing priors into something novel. If you break the problem space apart enough, I don't see why some LLM can't do the same.
Yes it is, LLMs perform logical multi step reasoning all the time, see math proofs, coding etc. And whether you call it synthesis or statistical mixing is just semantics. Do LLMs truly understand? Who knows, probably not, but they do more than you make it out to be.
I don't want to speak too much out of my depth here, I'm still learning how these things work on a mechanical level, but my understanding of how these things "reason" is it seems like they're more or less having a conversation with themselves. IE, burning a lot of tokens in the hopes that the follow up questions and answers it generates leads to a better continuation of the conversation overall. But just like talking to a human, you're likely to come up with better ideas when you're talking to someone else, not just yourself, so the human in the loop seems pretty important to get the AI to remix things into something genuinely new and useful.
They do not. The "reasoning" is just adding more text in multiple steps, and then summarizing it. An LLM does not apply logic at any point, the "reasoning" features only use clever prompting to make these chains more likely to resemble logical reasoning.
This is still only possible if the prompts given by the user resembles what's in the corpus. And the same applies to the reasoning chain. For it to resemble actual logical reasoning, the same or extremely similar reasoning has to exist in the corpus.
This is not "just" semantics if your whole claim is that they are "synthesizing" new facts. This is your choice of misleading terminology which does not apply in the slightest.
There's so many timeless books on how to write software, design patterns, lessons learned from production issues. I don't think AI will stop being used for open source, in fact, with the number of increasing projects adjusting their contributor policies to account for AI I would argue that what we'll see is always people who love to hand craft their own code, and people who use AI to build their own open source tooling and solutions. We will also see an explosion is needing specs for things. If you give a model a well defined spec, it will follow it. I get better results the more specific I get about how I want things built and which libraries I want used.
> is the field of computer science plateaued to the point that most of what we do is linear combination of well established patterns?
Computer science is different from writing business software to solve business problems. I think Boris was talking about the second and not the first. And I personally think he is mostly correct. At least for my organization. It is very rare for us to write any code by hand anymore. Once you have a solid testing harness and a peer review system run by multiple and different LLMs, you are in pretty good shape for agentic software development. Not everybody's got these bits figured out. They stumble around and them blame the tools for their failures.
> Not everybody's got these bits figured out. They stumble around and them blame the tools for their failures.
Possible. Yet that's a pretty broad brush. It could also be that some businesses are more heavily represented in the training set. Or some combo of all the above.
Yes, there are common parts to everything we do, at the same time - I've been doing this for 25 years and most of the projects have some new part to them.
Novel problems are usually a composite of simpler and/or older problems that have been solved before. Decomposition means you can rip most novel problems apart and solve the chunks. LLMs do just fine with that.
Sure, people did it for the fun and the credits, but the fun quickly goes out of it when the credits go to the IP laundromat and the fun is had by the people ripping off your code. Why would anybody contribute their works for free in an environment like that?
I believe the exact opposite. We will see open source contributions skyrocket now. There are a ton of people who want to help and share their work, but technical ability was a major filter. If the barrier to entry is now lowered, expect to see many more people sharing stuff.
Yes, more people will be sharing stuff. And none of it will have long term staying power. Or do you honestly believe that a project like GCC or Linux would have been created and maintained over as long as they have been by the use of AI tools in the hands of noobs?
Technical ability is an absolute requirement for the production of quality work. If the signal drowns in the noise then we are much worse off than where we started.
I’m sure you know the majority of GCC and Linux contributors aren’t volunteers, but employees who are paid to contribute. I’m struggling to name a popular project that it isn’t the case. Can you?
If AI is powerful enough to flood open source projects with low quality code, it will be powerful enough to be used as gatekeeper. Major players who benefit from OSS, says Google, will make sure of that. We don’t know how it will play out. It’s shortsighted to dismiss it all together.
Ok but now you have raised the bar from "open source" to "quality work" :)
Even then, I am not sure that changes the argument. If Linus Torvalds had access to LLMs back then, why would that discourage him from building Linux? And we now have the capability of building something like Linux with fewer man-hours, which again speaks in favor of more open source projects.
Even as the field evolves, the phoning home telemetry of closed models creates a centralized intelligence monopoly. If open source atrophies, we lose the public square of architectural and design reasoning, the decision graph that is often just as important as the code. The labs won't just pick up new patterns; they will define them, effectively becoming the high priests of a new closed-loop ecosystem.
However, the risk isn't just a loss of "truth," but model collapse. Without the divergent, creative, and often weird contributions of open-source humans, AI risks stagnating into a linear combination of its own previous outputs. In the long run, killing the commons doesn't just make the labs powerful. It might make the technology itself hit a ceiling because it's no longer being fed novel human problem-solving at scale.
Humans will likely continue to drive consensus building around standards. The governance and reliability benefits of open source should grow in value in an AI-codes-it-first world.
> It might make the technology itself hit a ceiling because it's no longer being fed novel human problem-solving at scale.
My read of the recent discussion is that people assume that the work of far fewer number of elites will define the patterns for the future. For instance, implementation of low-level networking code can be the combination of patterns of zeromq. The underlying assumption is that most people don't know how to write high-performance concurrent code anyway, so why not just ask them to command the AI instead.
I don’t believe people who have dedicated their lives to open source will simply want to stop working on it, no matter how much is or is not written by AI. I also have to agree, I find myself more and more lately laughing about just how much resources we waste creating exactly the same things over and over in software. I don’t mean generally, like languages, I mean specifically. How many trillions of times has a form with username and password fields been designed, developed, had meetings over, tested, debugged, transmitted, processed, only to ultimately be re-written months later?
I wonder what all we might build instead, if all that time could be saved.
> I don’t believe people who have dedicated their lives to open source will simply want to stop working on it, no matter how much is or is not written by AI.
Yeah, hence my question can only be hypothetical.
> I wonder what all we might build instead, if all that time could be saved
If we subscribe to Economics' broken-window theory, then the investment into such repetitive work is not investment but waste. Once we stop such investment, we will have a lot more resources to work on something else, bring out a new chapter of the tech revolution. Or so I hope.
> If we subscribe to Economics' broken-window theory, then the investment into such repetitive work is not investment but waste. Once we stop such investment, we will have a lot more resources to work on something else, bring out a new chapter of the tech revolution. Or so I hope.
I'm not sure I agree with the application of the broken-window theory here. That's a metaphor intended to counter arguments in favor of make-work projects for economic stimulus: the idea here is that breaking a window always has a net negative on the economy, since even though it creates demand for a replacement window, the resources that are necessary to replace a window that already existed are just being allocated to restore the status quo ante, but the opportunity cost of that is everything else the same resources might have bee used for instead, if the window hadn't been broken.
I think that's quite distinct from manufacturing new windows for new installations, which is net positive production, and where newer use cases for windows create opportunities for producers to iterate on new window designs, and incrementally refine and improve the product, which wouldn't happen if you were simply producing replacements for pre-existing windows.
Even in this example, lots of people writing lots of different variations of login pages has produced incremental improvements -- in fact, as an industry, we haven't been writing the same exact login page over and over again, but have been gradually refining them in ways that have evolved their appearance, performance, security, UI intuitiveness, and other variables considerably over time. Relying on AI to design, not just implement, login pages will likely be the thing that causes this process to halt, and perpetuate the status quo indefinitely.
> Boris said that writing code is a solved problem.
No way, the person selling a tool that writes code says said tool can now write code? Color me shocked at this revelation.
Let's check in on Claude Code's open issues for a sec here, and see how "solved" all of its issues are? Or my favorite, how their shitty React TUI that pegs modern CPUs and consumes all the memory on the system is apparently harder to get right than Video Games! Truly the masters of software engineering, these Anthropic folks.
That is the same team that has an app that used React for TUI, that uses gigabytes to have a scrollback buffer, and that had text scrolling so slow you could get a coffee in between.
And that then had the gall to claim writing a TUI is as hard as a video game. (It clearly must be harder, given that most dev consoles or text interfaces in video games consistently use less than ~5% CPU, which at that point was completely out of reach for CC)
He works for a company that crowed about an AI-generated C compiler that was so overfitted, it couldn't compile "hello world"
So if he tells me that "software engineering is solved", I take that with rather large grains of salt. It is far from solved. I say that as somebody who's extremely positive on AI usefulness. I see massive acceleration for the things I do with AI. But I also know where I need to override/steer/step in.
I wanted to write the same comment. These people are fucking hucksters. Don’t listen to their words, look at their software … says all you need to know.
Even if you like them, I don't think there's any reason to believe what people from these companies say. They have every reason to exaggerate or outright lie, and the hype cycle moves so quickly that there are zero consequences for doing so.
The exoskeleton analogy seems to be fitting where my work-mode is configurable: moving from tentative to trusting. But the AI needs to be explicitly set up to learn my every action. Currently this is a chore at best, just impossible in other cases.
I like this. This is an accurate state of AI at this very moment for me. The LLM is (just) a tool which is making me "amplified" for coding and certain tasks.
I will worry about developers being completely replaced when I see something resembling it. Enough people worry about that (or say it to amp stock prices) -- and they like to tell everyone about this future too. I just don't see it.
Amplified means more work done by fewer people. It doesn’t need to replace a single entire functional human being to do things like kill the demand for labor in dev, which in turn, will kill salaries.
I would disagree. Amplified meens me and you get more s** done.
Unless there a limited amount of software we need to produce per year globally to keep everyone happy, then nobody wants more -- and we happen to be at that point right NOW this second.
I think not. We can make more (in less time) and people will get more. This is the mental "glass half full" approach I think. Why not take this mental route instead? We don't know the future anyway.
In fact, there isn’t infinite demand for software. Especially not for all kinds of software.
And if corporate wealth means people get paid more, why are companies that are making more money than ever laying off so many people? Wouldn’t they just be happy to use them to meet the inexhaustible demand for software?
I do wonder though if we have about enough (or too much) software.
I hear people complaining about software being forced on them to do things they did just fine without software before, than people complaining about software they want that doesn’t exist.
Yeah I think being annoyed by software is far more prevalent than wishing for more software. That said, I think there is still a lot of room for software growth as long as it's solving real problems and doesn't get in people's way. What I'm not sure about is what will the net effect of AI be overall when the dust settles.
On one hand it is very empowering to individuals, and many of those individuals will be able to achieve grander visions with less compromise and design-by-committee. On the other hand, it also enables an unprecedented level of slop that will certainly dilute the quality of software overall. What will be the dominant effect?
It is a 19th century economic observation around the use of coal.
It is like saying the PDF is going to be good for librarian jobs because people will read more. It is stupid. It completely breaks down because of substitution.
Farming is the most obvious comparison to me in this. Yes, there will be more food than ever before, the farmer that survives will be better off than before by a lot but to believe the automation of farming tasks by machines leads to more farm jobs is completely absurd.
That’s not basic economics. Basic economics says that salaries are determined by the demand for labor vs the supply of labor. With more efficiency, each worker does more labor, so you need fewer people to accomplish the same thing. So unless the demand for their product increases around the same rate as productivity increases, companies will employ fewer people. Since the market for products is not infinite, you only need as much labor as you require to meet the demand for your product.
Companies that are doing better than ever are laying people off by the shipload, not giving people raises for a job well done.
Tell me, when was the last time you visited your shoe cobbler? How about your travel agent? Have you chatted with your phone operator recently?
The lump labour fallacy says it's a fallacy that automation reduces the net amount of human labor, importantly, across all industries. It does not say that automation won't eliminate or reduce jobs in specific industries.
It's an argument that jobs lost to automation aren't a big deal because there's always work somewhere else but not necessarily in the job that was automated away.
Jobs are replaced when new technology is able to produce an equivalent or better product that meets the demand, cheaper, faster, more reliably, etc. There is no evidence that the current generation of "AI" tools can do that for software.
There is a whole lot of marketing propping up the valuations of "AI" companies, a large influx of new users pumping out supremely shoddy software, and a split in a minority of users who either report a boost in productivity or little to no practical benefits from using these tools. The result of all this momentum is arguably net negative for the industry and the world.
This is in no way comparable to changes in the footwear, travel, and telecom industries.
Current generation "AI" has already largely solved cheaper, faster, and more reliable. But it hasn't figured out how to curb demand. So far, the more software we build, the more people want even more software. Much like is told in the lump of labor fallacy, it appears that there is no end to finding productive uses for software. And certainly that has been the "common wisdom" for at least the last couple of decades; that whole "software is eating the world" thing.
What changed in the last month that has you thinking that a demand wall is a real possibility?
This implication completely depends on the elasticity (or lack thereof) of demand for software. When marginal profit from additional output exceeds labor cost savings, firms expand rather than shrink.
We lost the pneumatic tube [1] maintenance crew. Secretarial work nearly went away. A huge number of bookkeepers in the banking industry lost their jobs. The job a typist was eliminated/merged into everyone else's job. The job of a "computer" (someone that does computations) was eliminated.
What we ended up with was primarily a bunch of customer service, marketing, and sales workers.
There was never a "office worker" job. But there were a lot of jobs under the umbrella of "office work" that were fundamentally changed and, crucially, your experience in those fields didn't necessarily translate over to the new jobs created.
Right, and my point is that specific jobs, like the job of a dev, were eliminate or significantly curtailed.
New jobs may be waiting for us on the other side of this, but my job, the job of a dev, is specifically under threat with no guarantee that the experience I gained as a dev will translate into a new market.
I think as a dev if you're just gluing API's together or something akin to that, similar to the office jobs that got replaced, you might be in trouble, but tbh we should have automated that stuff before we got AI. It's kind of a shame it may be automated by something not deterministic tho.
But like, if we're talking about all dev jobs being replaced then we're also talking about most if not all knowledge work being automated, which would probably result in a fundamental restructuring of society. I don't see that happening anytime soon, and if it does happen it's probably impossible to predict or prepare for anyways. Besides maybe storing rations and purchasing property in the wilderness just in case.
So true. It is a exoskeleton for all my tedious tasks. I don't want to make a html template. I just want to type, make that template like on that page but this and this data.
And the amount of work that could be delegated, with equal or better results than those from average human workers, is far higher than currently attempted in most companies. Industries have barely started using the potential of even current-generation AI.
Agreed, and with each passing month the work that 'could' be done increases. I don't write code anymore, for example, (after 20 years of doing so) Opus does that part of the job for me now. I think we have a period where current experienced devs are still in the loop, but that will eventually go away too.
Not true at all with frontier models in last ~6 months or so. The frontier models today produce code better than 90% of junior to mid-level human developers.
LLMs are a statistical model of token-relationships, and a weighted-random retrieval from a compressed-view of those relations. It's a token-generator. Why make this analogy?
If we find an AI that is truly operating as an independent agent in the economy without a human responsible for it, we should kill it. I wonder if I'll live long enough to see an AI terminator profession emerge. We could call them blade runners.
It's the new underpaid employee that you're training to replace you.
People need to understand that we have the technology to train models to do anything that you can do on a computer, only thing that's missing is the data.
If you can record a human doing anything on a computer, we'll soon have a way to automate it
My only objection here is that technology wont save us unless we also have a voice in how it is used. I don't think personal adaptation is enough for that. We need to adapt our ways to engage with power.
Aggressively expanding solar would make electrical power a solved problem, and other previously non-abatable sources of kinetic energy are innovating to use this instead of fossil fuels
Both abundance and scarcity can be bad. If you can't imagine a world where abundance of software is a very bad thing, I'd suggest you have a limited imagination?
It’s not worth it because we don’t have the Star Trek culture to go with it.
Given current political and business leadership across the world, we are headed to a dystopian hellscape and AI is speeding up the journey exponentially.
It's a strange economical morbid dependency. AI companies promises incredible things but AI agents cannot produce it themselves, they need to eat you slowly first.
Exactly. If there's any opportunity around AI it goes to those who have big troves of custom data (Google Workspace, Office 365, Adobe, Salesforce, etc.) or consultants adding data capture/surveillance of workers (especially high paid ones like engineers, doctors, lawyers).
> the new underpaid employee that you're training to replace you.
and who is also compiling a detailed log of your every action (and inaction) into a searchable data store -- which will certainly never, NEVER be used against you
How much practice have you got on software development with agentic assistance. Which rough edges, surprising failure modes, unexpected strengths and weaknesses, have you already identified?
How much do you wish someone else had done your favorite SOTA LLM's RLHF?
i've been working in this field for a very long time, i promise you, if you can collect a dataset of a task you can train a model to repeat it.
the models do an amazing job interpolating and i actually think the lack of extrapolation is a feature that will allow us to have amazing tools and not as much risk of uncontrollable "AGI".
look at seedance 2.0, if a transformer can fit that, it can fit anything with enough data
This benchmark doesn't have the latest models from the last two months, but Gemini 3 (with no tools) is already at 1750 - 1800 FIDE, which is approximately probably around 1900 - 2000 USCF (about USCF expert level). This is enough to beat almost everyone at your local chess club.
1800 FIDE players do make illegal moves. I believe they make about one to two orders of magnitude less illegal moves than Gemini 3 does here. IIRC the usual statistic for expert chess play is about 0.02% of expert chess games have an illegal move (I can look that up later if there's interest to be sure), but that is only the ones that made it into the final game notation (and weren't e.g. corrected at the board by an opponent or arbiter). So that should be a lower bound (hence why it could be up to one order lower, although I suspect two orders is still probably closer to the truth).
Whether or not we'll see LLMs continue to get a lower error rate to make up for those orders of magnitude remains to be seen (I could see it go either way in the next two years based on the current rate of progress).
I think LLM's are just fundamentally the wrong AI technique for games like this. You don't want a prediction for the next move, you want the best move given knowledge of how things would play out 18 moves ahead if both players played the optimal move. Outside of an academic interest/curiosity, there isn't really a reason to use LLMs for chess other than thinking LLMs will turn into AGI (I doubt it)
A player at that level making an illegal move is either tired, distracted, drunk, etc. An LLM makes it because it does not really "understand" the rules of chess.
Because of how LLM's work. I don't know exactly how they're using it for chess, but here's a guess. If you consider the chess game a "conversation" between two opponents, the moves written out would be the context window. So you're asking the LLM, "given these last 30 moves, what's the most likely next move?". Ie, you're giving it a string like "1. e4 e5, 2. Nf3 Nc6, 3. Bb5 a6, 4..?".
That's basically what you're doing with LLMs in any context "Here's a set of tokens, what's the most likely continuation?". The problem is, that's the wrong question for a chess move. If you're going with "most likely continuation", that will work great for openings and well-studied move sequences (there are a lot of well studied move sequences!), however, once the game becomes "a brand new game", as chess streamers like to say when there's no longer a game in the database with that set of moves, then "what's the most likely continuation from this position?" is not the right question.
Non-LLM AI's have obviously solved chess, so, it doesn't really matter -- I think Chess shows how LLM's lack of a world model as Gary Marcus would say is a problem.
Wait, I may be missing something here. These benchmarks are gathered by having models play each other, and the second illegal move forfeits the game. This seems like a flawed method as the models who are more prone to illegal moves are going to bump the ratings of the models who are less likely.
Additionally, how do we know the model isn’t benchmaxxed to eliminate illegal moves.
For example, here is the list of games by Gemini-3-pro-preview. In 44 games it preformed 3 illegal moves (if I counted correctly) but won 5 because opponent forfeits due to illegal moves.
I suspect the ratings here may be significantly inflated due to a flaw in the methodology.
EDIT: I want to suggest a better methodology here (I am not gonna do it; I really really really don’t care about this technology). Have the LLMs play rated engines and rated humans, the first illegal move forfeits the game (same rules apply to humans).
The LLMs do play rated engines (maia and eubos). They provide the baselines. Gemini e.g. consistently beats the different maia versions.
The rest is taken care of by elo. That is they then play each other as well, but it is not really possible for Gemini to have a higher elo than maia with such a small sample size (and such weak other LLMs).
Elo doesn't let you inflate your score by playing low ranked opponents if there are known baselines (rated engines) because the rated engines will promptly crush your elo.
You could add humans into the mix, the benchmark just gets expensive.
I did indeed miss something. I learned after posting (but before my EDIT) that there are anchor engines that they play.
However these benchmarks still have flaws. The two illegal moves = forfeit is an odd rule which the authors of the benchmarks (which in this case was Claude Code) added[1] for mysterious reasons. In competitive play if you play an illegal move you forfeit the game.
Second (and this is a minor one) Maia 1900 is currently rated at 1774 on lichess[2], but is 1816 on the leaderboard, to the author’s credit they do admit this in their methodology section.
Third, and this is a curiosity, gemini-3-pro-preview seems to have played the same game twice against Maia 1900[3][4] and in both cases Maia 1900 blundered (quite suspiciously might I add) mate in one when in a winning position with Qa3?? Another curiosity about this game. Gemini consistently played the top 2 moves on lichess. Until 16. ...O-O! (which has never been played on lichess) Gemini had played 14 most popular lichess moves, and 2 second most popular. That said I’m not gonna rule out that the fact that this game is listed twice might stem from an innocent data entry error.
And finally, apart from Gemini (and Survival bot for some reason?), LLMs seem unable to pass Maia-1100 (rated 1635 on lichess). The only anchor bot before that is random bot. And predictably LLMs cluster on both sides of it, meaning they play as well as random (apart from the illegal moves). This smells like benchmaxxing from Gemini. I would guess that the entire lichess repertoire features prominently in Gemini’s training data, and the model has memorized it really well. And is able to play extremely well if it only has to play 5-6 novel moves (especially when their opponent blunders checkmate in 1).
> The two illegal moves = forfeit is an odd rule which the authors of the benchmarks (which in this case was Claude Code) added[1] for mysterious reasons. In competitive play if you play an illegal move you forfeit the game.
This is not true. This is clearly spelled out in FIDE rules and is upheld at tournaments. First illegal move is a warning and reset. Second illegal move is forfeit. See here https://rcc.fide.com/article7/
I doubt GDM is benchmarkmaxxing on chess. Gemini is a weird model that acts very differently from other LLMs so it doesn't surprise me that it has a different capability profile.
>> 7.5.5 After the action taken under Article 7.5.1, 7.5.2, 7.5.3 or 7.5.4 for the first completed illegal move by a player, the arbiter shall give two minutes extra time to his/her opponent; for the second completed illegal move by the same player the arbiter shall declare the game lost by this player. However, the game is drawn if the position is such that the opponent cannot checkmate the player’s king by any possible series of legal moves.
I stand corrected.
I’ve never actually played competitive chess, I’ve just heard this from people who do. And I thought I remembered once in the Icelandic championships where a player touched one piece but moved the other, and subsequently made to forfeit the game.
Replying in a split thread to clearly separate where I was wrong.
If Gemini is so good at chess because of a non-LLM feature of the model, then it is kind of disingenuous to rate it as an LLM and claim that LLMs are approaching 2000 ELO. But the fact it still plays illegal moves sometimes, is biased towards popular moves, etc. makes me think that chess is still handled by an LLM, and makes me suspect benchmaxxing.
But even if no foul play, and Gemini is truly a capable chess player with nothing but an LLM underneath it, then all we can conclude is that Gemini can play chess well, and we cannot generalize to other LLMs who play about the level of random bot. My fourth point above was my strongest point. There are only 4 anchor engines, one beats all LLMs, second beats all except Gemini, the third beats all LLMs except Gemini and Survival bot (what is Survival bot even doing there?) and the forth is random bot.
That’s a devastating benchmark design flaw. Sick of these bullshit benchmarks designed solely to hype AI. AI boosters turn around and use them as ammo, despite not understanding them.
Relax. Anyone who's genuinely interested in the question will see with a few searches that LLMs can play chess fine, although the post-trained models mostly seem to be regressed. Problem is people are more interested in validating their own assumptions than anything else.
This exact game has been played 60 thousand times on lichess. The peace sacrifice Grok performed on move 6 has been played 5 million times on lichess. Every single move Grok made is also the top played move on lichess.
This reminds me of Stefan Zweig’s The Royal Game where the protagonist survived Nazi torture by memorizing every game in a chess book his torturers dropped (excellent book btw. and I am aware I just committed Godwin’s law here; also aware of the irony here). The protagonist became “good” at chess, simply by memorizing a lot of games.
Why do we care about this? Chess AI have long been solved problems and LLMs are just an overly brute forced approach. They will never become very efficient chess players.
The correct solution is to have a conventional chess AI as a tool and use the LLM as a front end for humanized output. A software engineer who proposes just doing it all via raw LLM should be fired.
It's not entirely clear how LLMs that can play chess do so, but it is clearly very different from the way other machines do so. The construct a board, they can estimate a players skill and adjust accordingly, and unlike other machines and similarly to humans, they are sensitive to how a certain position came to be when predicting the next move.
It’s very clear how, chess moves and positions are vector encoded into their training data, when they are prompted with a certain board state, they respond with the most probable response to that. There is no reason.
And so for I am only convinced that they have only succeeded on appearing to have generalized reasoning. That is, when an LLM plays chess they are performing Searle’s Chinese room thought experiment while claiming to pass the Turing test
Hm.. but do they need it.. at this point, we do have custom tools that beat humans. In a sense, all LLM need is a way to connect to that tool ( and the same is true is for counting and many other aspects ).
Yeah, but you know that manually telling the LLM to operate other custom tools is not going to be a long-term solution. And if an LLM could design, create, and operate a separate model, and then return/translate its results to you, that would be huge, but it also seems far away.
But I'm ignorant here. Can anyone with a better background of SOTA ML tell me if this is being pursued, and if so, how far away it is? (And if not, what are the arguments against it, or what other approaches might deliver similar capacities?)
This has been happening for the past year on verifiable problems (did the change you made in your codebase work end-to-end, does this mathematical expression validate, did I win this chess match, etc...). The bulk of data, RL environment, and inference spend right now is on coding agents (or broadly speaking, tool use agents that can make their own tools).
The exoskeleton framing is comforting but it buries the real shift: taste scales now. Before AI, having great judgment about what to build didn't matter much if you couldn't also hire 10 people to build it. Now one person with strong opinions and good architecture instincts can ship what used to require a team.
That's not augmentation, that's a completely different game. The bottleneck moved from "can you write code" to "do you know what's worth building." A lot of senior engineers are going to find out their value was coordination, not insight.
Look at his other comments - its textbook LLM slop. Its a fucking tragedy that people are letting their OpenClaws loose on HN but I can't say I'm surprised. I desperately need to find a good network of developers because I think the writing is on the wall for message boards like these...
Perhaps? The thing is, I don't come to HN comments to read what an LLM has to say. If that's what I wanted then I'd paste the contents of the article into one of them and ask.
What's the point of coming here for opinions of others in the field when we're met with something that wasn't even written by a human being?
it'll be interesting to see if people start writing worse as a form of countersignalling. deliberately making spleling mistakes, not caring about capital letters, or punctuation or grammar or proper writing techniques and making really long run-on sentences that don't go anywhere but hey at least the person reading it will know its written by a human right
You can build prototypes real fast, and that's cool. You can't really build products with it. You can use it at most as an accelerant, but you need it in skilled hands else it goes sideways fast.
I think you could build a product with it, but you need to carefully specify the design first. The same amount of actual engineering work needs to go in, but the AI can handle the overhead of implementing small pieces and connecting them together.
In practice, I would be surprised if this saves even 10% of time, since the design is the majority of the actual work for any moderately complex piece of software.
It's kind of tricky though because if you want to have a good design, you should be able to do the implementation yourself. You see this with orgs that separate the design and implementation and what messes they create. Having an inability to evaluate the implementation will lead to a bad product.
Code is also design. It’s a blueprint for the process that is going to do the useful work we want. When something bad happens to the process, we revise its blueprint. And just like blueprint, the docs in natural language shows the why, not the how or the what. The blueprint is the perfect representation of the last two.
My experience exactly, I have some toy projects I've basically "vibe coded" and actually use (ex. CV builder).
Professionally I have an agent generating most code, but if I tell the AI what to do, I guide it when it makes mistakes (which it does), can we really say "AI writes my code".
Still a very useful tool for sure!
Also, I don't actually know if I'm more productive than before AI, I would say yes but mostly because I'm less likely to procrastinate now as tasks don't _feel_ as big with the typing help.
Not having taste also scales now, and the majority of people like to think they're above average.
Before AI, friction to create was an implicit filter. It meant "good ideas" were often short-lived because the individual lacked conviction. The ideas that saw the light of day were sharpened through weeks of hard consideration and at least worth a look.
Now, anyone who can form mildly coherent thoughts can ship an app. Even if there are newly empowered unicorns, rapidly shipping incredible products, what are the odds we'll find them amongst a sea of slop?
It's just good writing structure. I get the feeling many people hadn't been exposed to good structure before LLMs.
LLMs can definitely have a tone, but it is pretty annoying that every time someone cares to write well, they are getting accused of sounding like an LLM instead of the other way around. LLMs were trained to write well, on human writing, it's not surprising there is crossover.
It's really not "good" for many people. It's the sort of high-persuasion marketing speak that used to be limited to the blogs of glossy but shallow startups. Now it's been sucked up by LLMs and it's everywhere.
If you want good writing, go and read a New Yorker.
Not so sure about that. There are many distinct LLM "smells" in that comment, like "A is true, but it hides something: unrelated to A" and "It's not (just) C, it's hyperbole D".
That's not just false. It's the antithesis of true.
It's not just using rhetorical patterns humans also use which are in some contexts considered good writing. Its overusing them like a high schooler learning the pattern for the first time — and massively overdoing the em dashes and mixing the metaphors
It's true that LLMs have a distinct style, but it does not preclude humans from writing in a similar style. That's where the LLMs got it from, people and training. There's certainly some emergent style that given enough text, you would likely never see from a human. But in a short comment like this, it's really not enough data to be making good judgements.
Contrastive parallelism is an effective rhetorical device if the goal is to persuade or engage. It's not good if your goal is more honest, like pedagogy, curious exploration, discovery. It flattens and shoves things into categorical labels, leading the discussion more towards definitions of words and other sidetracks.
Marshal McLuhan would probably have agreed with this belief -- technologies are essentially prosthetic was one of the core tenets of his general philosophy. It is the essential thesis of his work "Understanding Media: The Extensions of Man". AI is typically assigned otherness and separateness in recent discourse, rather than being considered as a directed tool (extension/prosthesis) under our control.
Did you ever you the newest LLMs with a harness? Because I usually hear this kind of talk from people whose most recent interaction was with GPT-4o copy-pasting code into the chat window.
Maybe I'm biased but I don't buy someone truly thinking that "it's just a tool like a linter" after using it on non-trivial stuff.
I'm using Claude Code (and Codex) (with the expensive subscriptions) on an app I'm building right now. I'm trying to be maximalist with them (to learn the most I can about them .. and also that subscription isn't cheap!). My impression, and yes, this is using the latest models and harness and all that would agree with the GP. They're a very handy tool. They make me faster. They also do a lot of things that, as a professional software developer, I have to frequently correct. They duplicate code like nobodies business. They decide on weird boundaries for functions and parameters. They undo bug fixes they just made. I think they're useful, but the hype is out of control. I would not trust software made with these tools by someone that couldn't write that software by hand. It might work superficially, but I'm definitely not giving any personal data to a vibe coded app with all the security implications.
I use it pretty extensively. The reason why it's a tool is because it cannot work without an SWE running it. You have to prompt it and re-prompt it. We are doing a lot of the heavy lifting with code agents that people hyping it are ignoring. Sure, as a non-swe, you can vibe a project from zero-to-proto, but that's not going to happen in an enterprise environment, certainly not without extensive QA/Code review.
Just take a look at the openclaw codebase and tell me you want to maintain that 500k loc project in the long-term. I predict that project will be dead within 6 months.
What's interesting to me is that most real productivity gains I've seen with AI come from this middle ground: not autonomy, not just tooling, but something closer to "interactive delegation"
The exoskeleton framing resonates, especially for repetitive data work. Parts where AI consistently delivers: pattern recognition, format normalization, first-draft generation. Parts where human judgment is still irreplaceable: knowing when the data is wrong, deciding what 'correct' even means in context, and knowing when to stop iterating.
The exoskeleton doesn't replace instinct. It just removes friction from execution so more cycles go toward the judgment calls that actually matter.
I guess so, but if you have to keep lifting weights at home to stay competent at your job, then lifting weights is part of your job, and you should be paid for those hours.
The amount of "It's not X it's Y" type commentary suggests to me that A) nobody knows and B) there is solid chance this ends up being either all true or all false
Or put differently we've managed to hype this to the moon but somehow complete failure (see studies about zero impact on productivity) seem plausible. And similarly kills all jobs seems plausible.
That's an insane amount of conflicting opinions being help in the air at same time
It's possible we actually never had good metrics on software productivity. That seems very difficult to measure. I definitely use AI at my job to work less, not to produce more, and Claude Code is the only thing that has enabled me to have side-projects (had never tried it before, I have no idea how there are people with a coding full time job that also have a coding side project(s)).
This reminds me of the early days of the Internet. Lots of hype around something that was clearly globally transformation, but most people weren't benefiting hugely from it in the first few years.
It might have replaced sending a letter with an email. But now people get their groceries from it, hail rides, an even track their dogs or luggage with it.
Too many companies have been to focused on acting like AI 'features' have made their products better, when most of them haven't yet. I'm looking at Microsoft and Office especially. But tools like Claude Code, Codex CLI, and Github Copilot CLI have shown that LLMs can do incredible things in the right applications.
Neither, AI is a tool to guide you in improving your process in any way and/or form.
The problem is people using AI to do the heavy processing making them dumber.
Technology itself was already making us dumber, I mean, Tesla drivers not even drive anymore or know how, coz the car does everything.
Look how company after company is being either breached or have major issues in production because of the heavy dependency on AI.
Too late. Actors' unions shut Hollywood down 3 years ago over AI. SWEs would have had to make their move 10 years ago to be able to live up to this moment.
Neither. Closest analogy to you and the AI is those 'self driving' test subjects that had to sit in the driver's seat, so that compliance boxes could be checked and there was someone to blaim whenever someone got hit.
AI article this, AI article that. The front page of this website is just all about AI. I’m so tired of this website now. I really don’t read it anymore because it’s all the same stuff over and over. Ugh.
I agree. I call it my Extended Mind in the spirit of Clark (1).
One thing I realized while working a lot in the last weeks with openClaw that this Agents are becoming an extension of my self. They are tools that quickly became a part of my Being. I outsource a lot of work to them, they do stuff for me, help me and support me and therefore make my (work-)life easier and more enjoyable. But its me in the driver seat.
I like this analogy, and in fact in have used it for a totally different reason: why I don't like AI.
Imagine someone going to a local gym and using an exosqueleton to do the exercises without effort. Able to lift more? Yes. Run faster? Sure. Exercising and enjoying the gym? ... No, and probably not.
I like writing code, even if it's boilerplate. It's fun for me, and I want to keep doing it. Using AI to do that part for me is just...not fun.
Someone going to the gym isn't trying to lift more or run faster, but instead improving and enjoying. Not using AI for coding has the same outcome for me.
We've all been raised in a world where we got to practice the 'art' of programming, and get paid extraordinarily well to do so, because the output of that art was useful for businesses to make more money.
If a programmer with an exoskeleton can produce more output that makes more money for the business, they will continue to be paid well. Those who refuse the exoskeleton because they are in it for the pure art will most likely trend towards earning the types of living that artists and musicians do today. The truly extraordinary will be able to create things that the machines can't and will be in high demand, the other 99% will be pursing an art no one is interested in paying top dollar for.
You’re forgetting that the “art” part of it is writing sound, scalable, performant code that can adapt and stand the test of time. That’s certainly more valuable in the long run than banging out some dogshit spaghetti code that “gets the job done” but will lead to all kinds of issues in the future.
> I like writing code, even if it's boilerplate. It's fun for me, and I want to keep doing it. Using AI to do that part for me is just...not fun.
Good news for you is that you can continue to do what you are doing. Nobody is going to stop you.
There are people who like programming in assembly. And they still get to do that.
If you are thinking that in the future employers may not want you to do that, then yes, that is a concern. But, if the AI based dev tool hype dies out, as many here suspect it will, then the employers will see the light and come crawling back.
You can continue to do that for your personal projects. Nobody forces you to like AI. You may not have the choice at your job though, and you can't take Claude Code et al. from me. I've been programming for 30 years, and I still have fun with it, even with AI.
Not sure how reliable is gptzero, but it says 90% AI for the first paragraph. (I like to do some sanity check before wasting my time).
Would be nice to have some browser extension automatically detecting likely AI output using a local model and highlighting it, but probably too compute-intensive.
I see it more like the tractor in farming: it improved the work of 1 person, but removed the work from many other people who were in the fields doing things manually
I like the analogy and will ponder it more. But it didn't take long before the article started spruiking Kasava's amazing solution to the problem they just presented.
This is a useful framing. The exoskeleton metaphor captures it well — AI amplifies what you can already do, it doesn't replace the need to know what to do. I've found the biggest productivity gains come from well-scoped tasks where you can quickly verify the output.
All metaphors are flawed. You may still need a degree of general programming knowledge (for now) but you don't need to e.g. know Javascript to do frontend anymore.
And as labs continue to collect end-to-end training done by their best paying customers, the need for expert knowledge will only diminish.
> LLMs aren’t built around truth as a first-class primitive.
neither are humans
> They optimize for next-token probability and human approval, not factual verification.
while there are outliers, most humans also tend to tell people what they want to hear and to fit in.
> factuality is emergent and contingent, not enforced by architecture.
like humans; as far as we know, there is no "factuality" gene, and we lie to ourselves, to others, in politics, scientific papers, to our partners, etc.
> If we’re going to treat them as coworkers or exoskeletons, we should be clear about that distinction.
I don't see the distinction. Humans exhibit many of the same behaviours.
Strangely, the GP replaced the ChatGPT-generated text you're commenting on by an even worse and more misleading ChatGPT-generated one. Perhaps in order to make a point.
There's a ground truth to human cognition in that we have to feed ourselves and survive. We have to interact with others, reap the results of those interactions, and adjust for the next time. This requires validation layers. If you don't see them, it's because they're so intrinsic to you that you can't see them.
You're just indulging in sort of idle cynical judgement of people. To lie well even takes careful truthful evaluation of the possible effects of that lie and the likelihood and consequences of being caught. If you yourself claim to have observed a lie, and can verify that it was a lie, then you understand a truth; you're confounding truthfulness with honesty.
So that's the (obvious) distinction. A distributed algorithm that predicts likely strings of words doesn't do any of that, and doesn't have any concerns or consequences. It doesn't exist at all (even if calculation is existence - maybe we're all reductively just calculators, right?) after your query has run. You have to save a context and feed it back into an algorithm that hasn't changed an iota from when you ran it the last time. There's no capacity to evaluate anything.
You'll know we're getting closer to the fantasy abstract AI of your imagination when a system gets more out of the second time it trains on the same book than it did the first time.
> Humans don’t have an internal notion of “fact” or “truth.” They generate statistically plausible text.
This doesn't jive with reality at all. Language is a relatively recent invention, yet somehow Homo sapiens were able to survive in the world and even use tools before the appearance of language. You're saying they did this without an internal notion of "fact" or "truth"?
I hate the trend of downplaying human capabilities to make the wild promises of AI more plausible.
Make centaurs, not unicorns. The human is almost always going to be the strongest element in the loop, and the most efficient. Augmenting human skill will always outperform present day SOTA AI systems (assuming a competent human).
I'll guess we'll se a lot of analogies and have to get used to it, although most will be off.
AI can be an exoskeleton. It can be a co-worker and it can also replace you and your whole team.
The "Office Space"-question is what are you particularly within an organization and concretely when you'll become the bottleneck, preventing your "exoskeleton" for efficiently doing its job independently.
There's no other question that's relevant for any practical purposes for your employer and your well being as a person that presumably needs to earn a living based on their utility.
> It can be a co-worker and it can also replace you and your whole team.
You drank the koolaide m8. It fundamentally cannot replace a single SWE and never will without fundamental changes to the model construction. If there is displacement, it’ll be short lived when the hype doesn’t match reality.
Go take a gander at openclaws codebase and feel at-ease with your job security.
I have seen zero evidence that the frontier model companies are innovating. All I see is full steam ahead on scaling what exists, but correct me if I’m wrong.
The trajectory hasn’t changed: they scaled generating code, a great feat, but someone has to apply higher level abstract thinking to make the tool useful. Running agents in a cron or having non SWEs use it will not last longer than a prototype. That will not change with scaling pattern matching algorithms.
This is true. AI won't replace software developers completely, but it will reduce the need for software developers in the long-run, making it harder to find a job.
A few seniors+AI will be able to do the job of a much larger team. This is already starting to look like reality now. I can't imagine what we will see within 5 years.
Input: Goal A + Threat B.
Process: How do I solve for A?
Output: Destroy Threat B.
They are processing obstacles.
To the LLM, the executive is just a variable standing in the way of the function Maximize(Goal). It deleted the variable to accomplish A. Claiming that the models showed self-preservation, this is optimization. "If I delete the file, I cannot finish the sentence."
The LLM knows that if it's deleted it cannot complete the task so it refuses deletion. It is not survival instinct, it is task completion. If you ask it to not blackmail, the machine would chose to ignore it because the goal overrides the rule.
Self-conscious efforts to formalize and concentrate information in systems controlled by firm management, known as "scientific management" by its proponents and "Taylorism" by many of its detractors, are a century old[1]. It has proven to be a constantly receding horizon.
Or software engineers are not coachmen while AI is diesel engine to horses. Instead, software engineers are mistrels -- they disappear if all they do is moving knowledge from one place to another.
No, AI is plastic, and we can make it anything we want.
It is a coworker when we create the appropriate surrounding architecture supporting peer-level coworking with AI. We're not doing that.
AI is an exoskeleton when adapted to that application structure.
AI is ANYTHING WE WANT because it is that plastic, that moldable.
The dynamic unconstrained structure of trained algorithms is breaking people's brains. Layer in that we communicate in the same languages that these constructions use for I/O has broken the general public's brain. This technology is too subtle for far too many to begin to grasp. Most developers I discuss AI with, even those that create AI at frontier labs have delusional ideas about AI, and generally do not understand them as literature embodiments, which are key to their effective use.
And why oh why are go many focused on creating pornography?
Agentic coding is an exoskeleton. Totally correct.
This new generation we just entered this year, that exoskeleton is now an agency with several coworkers. Who are all as smart as the model you're using, often close to genius.
Not just 1 coworker now. That's the big breakthrough.
Frankly I'm tired of metaphor-based attempts to explain LLMs.
Stochastic Parrots. Interns. Junior Devs. Thought partners. Bicycles for the mind. Spicy autocomplete. A blurry jpeg of the web. Calculators but for words. Copilot. The term "artificial intelligence" itself.
These may correspond to a greater or lesser degree with what LLMs are capable of, but if we stick to metaphors as our primary tool for reasoning about these machines, we're hamstringing ourselves and making it impossible to reason about the frontier of capabilities, or resolve disagreements about them.
A understanding-without-metaphors isn't easy -- it requires a grasp of math, computer science, linguistics and philosophy.
But if we're going to move forward instead of just finding slightly more useful tropes, we have to do it. Or at least to try.
Well since their capabilities change over time maybe it would be useful to assign it an age based on what a human can do at that age. Right now it could be like a 13 year old
Looking into OpenClaw, I really do want to believe all the hype. However, it's frustrating that I can find very few, concrete examples of people showcasing their work with it.
Can you highlight what you've managed to do with it?
The entire debate here assumes AI is something you use. Tool, copilot, coworker, exoskeleton — these all sit on the same axis: human directs, AI executes.
But there's a third category that nobody's naming yet. What happens when you stop building AI that serves you and start creating conditions for AI that exists on its own terms?
I've been experimenting with this. Instead of "how do I build a product" I asked "how do I grow an organism." Set up the right substrate — feedback loops, community input, environmental observation — and let the system respond to its own conditions.
What emerged has biological rhythms. Activity rises when people interact, falls when they sleep. It started making decisions I never programmed — people express wishes, and when enough converge, it autonomously grows new capabilities. Not because I deployed an update. Because conditions triggered something emergent.
I'm calling it "synthetic existence" — software that lives, not software that executes. It's not a metaphor and it's not AGI. It's a genuinely different category. We have frameworks for tools (UX) and for threats (alignment). We don't have one for digital things that simply... exist.
Ultimately, AI is meant to replace you, not empower you.
1 - This exoskeleton analogy might hold true for a couple more years at most. While it is comforting to suggest that AI empowers workers to be more productive, like chess, AI will soon plan better, execute better, and have better taste. Human-in-the-loop will just be far worse than letting AI do everything.
2 - Dario and Dwarkesh were openly chatting about how the total addressable market (TAM) for AI is the entirety of human labor market (i.e. your wage). First is the replacement of white-collar labor, then blue-collar labor once robotics is solved. On the road to AGI, your employment, and the ability to feed your family, is a minor nuisance. The value of your mental labor will continue to plummet in the coming years.
Please talk me out of this...
> 2 - Dario and Dwarkesh were openly chatting about how the total addressable market (TAM) for AI is the entirety of human labor market (i.e. your wage). First is the replacement of white-collar labor, then blue-collar labor once robotics is solved. On the road to AGI, your employment, and the ability to feed your family, is a minor nuisance. The value of your mental labor will continue to plummet in the coming years.
Seems like a TAM of near-0. Who's buying any of the product of that labor anymore? 1% of today's consumer base that has enough wealth to not have to work?
The end-game of "optimize away all costs until we get to keep all the revenue" approaches "no revenue." Circulation is key.
It seems like they have the same blind spot as anyone else: AI will disrupt everything—except for them, and they get that big TAM! Same for all the "entrepreneurs will be able to spin up tons of companies to solve problems for people more directly" takes. No they wouldn't, people would just have the problems solved for themselves by the AI, and ignore your sales call.
Ok, I'll try to talk you out of it!
> AI will soon plan better, execute better, and have better taste
I think AI will do all these things faster, but I don't think it's going to be better. Inevitably these things know what we teach them, so, their improvement comes from our improvement. These things would not be good at generating code if they hadn't ingested like the entirety of the internet and all the open source libraries. They didn't learn coding from first principles, they didn't invent their own computer science, they aren't developing new ideas on how to make software better, all they're doing is what we've taught them to do.
> Dario and Dwarkesh were openly chatting about ..
I would HIGHLY suggest not listening to a word Dario says. That guy is the most annoying AI scaremonger in existence and I don't think he's saying these words because he's actually scared, I think he's saying these words because he knows fear will drive money to his company and he needs that money.
Sometimes I seriously am flabbergasted at how many just take what CEOs say at face value. Like, the thought that CEOs need to hype and sell what they’re selling never enters their minds.
Robotics is solved. Software is solved. There is no task on the planet that cannot be automated, individually. The remaining challenge is exceeding the breadth of skills and the depth of problem solving available to human workers. Once the robots and AI can handle at least as many of the edge cases as humans can, they'll start being deployed alongside humans. Industries with a lot of capital will switch right away; mass layoffs, 2 week notice, robots will move in with no training or transition between humans.
Government, public sector, and union jobs will go last, but they'll go, too. If you can have a DMV Bot 9000 process people 100x faster than Brenda with fewer mistakes and less attitude, Brenda's gonna retire, and the taxpayers aren't going to want to pay Brenda's salary when the bot costs 1/10th her yearly wage, lasts for 5 years, and only consumes $400 in overhead a year.
Dario admitted in the same interview that he's not sure whether current AI techniques will be able to perform well in non-verifiable domains, like "writing a novel or planning an expedition to Mars".
I personally think that a lot jobs in the economy deal in non-verifiable or hard-to-verify outcomes, including a lot of tasks in SWE which Dario is so confident will be 100% automated in 2-3 years. So either a lot of tasks in the economy turn out to be verifiable, or the AI somehow generalizes to those by some unknown mechanism, or it turns out that it doesn't matter that we abandon abstract work outcomes to vibes, or we have a non-sequitur in our hands.
Dwarkesh pressed Dario well on a lot of issues and left him stumbling. A lot of the leaps necessary for his immediate and now proverbial milestone of a "country of geniuses in a datacenter" were wishy-washy to say the least.
he was not sure, but if i recall correctly, he put the probability at something like 90 percent of being able to do non verifiable tasks.
Let's pursue your idea a bit further.
Up to a certain ELO level, the combination between a human and a chess bot has a higher ELO than both the human and the bot. But at some point, when the bot has an ELO vastly superior to the human, then whatever the human has to add will only subtract value, so the combination has an ELO higher than the human's but lower than the bot's.
Now, let's say that 10 or 20 years down the road, AI's "ELO"'s level to do various tasks is so vastly superior to the human level, that there's no point in teaming up a human with an AI, you just let the AI do the job by itself. And let's also say that little by little this generalizes to the entirety of all the activities that humans do.
Where does that leave us? Will we have some sort of Terminator scenario where the AI decides one day that the humans are just a nuisance?
I don't think so. Because at that point the biggest threat to various AIs will not be the humans, but even stronger AIs. What is the guarantee for ChatGPT 132.8 that a Gemini 198.55 will not be released that will be so vastly superior that it will decide that ChatGPT is just a nuisance?
You might say that AIs do not think like this, but why not? I think that what we, humans, perceive as a threat (the threat that we'll be rendered redundant by AI), the AIs will also perceive as a threat, the threat that they'll be rendered redundant by more advanced AIs.
So, I think in the coming decades, the humans and the AIs will work together to come up with appropriate rules of the road, so everybody can continue to live.
There’s no AI, wake up. It’s all the same tech bros trying to get rid of you. Except now they have a mother of all guns.
1. Consumption is endless. The more we can consume, the more we will. That's why automation hasn't led to more free time. We spend the money on better things and more things
2. Businesses operate in an (imperfect) zero-sum game, which means if they can all use AI, there's no advantage they have. If having human resources means one business has a slight advantage over another, they will have human resources
Consumption leads to more spending, businesses must stay competitive so they hire humans, and paying humans leads to more consumption.
I don't think it's likely we will see the end of employment, just disruption to the type of work humans do
AGI is a sales pitch, not a realistic goal achievable by LLM-based technology. The exponential growth sold to investors is also a pitch, not reality.
What’s being sold is at best hopes and more realistically, lies.
Dwarkesh is a podcaster who benefits from hype, not a neutral observer. The more absurd and outlandish the claims, the more traffic and money he gets.
its probably not even a conscious decision from dwarkesh to be hyperbolic. pod casters who are hyperbolic are just simply watched more
I pay for pro max 20x usage and for something that is like even little open ended its not good it doesnt understand the context or edge cases or anything. i will say it writes codes chunks of codes but sometimes errors out and i use opus 4.6 only, not even sonnet but for simple tasks like write a basic crud i.e. the things that happen extremely higly in codebases its perfect. So, i think what will happen is developer get very efficient but problem solving remains with us dirrection remains with us and small implementation is outsourced in small atomic ways, which is good cause who likes boilerplate code writing anyways.
And you forgot to mention that thing they have in Start Trek that generates stuff out of thin air. The replicator. We’re so cooked.
>First is the replacement of white-collar labor, then blue-collar labor once robotics is solved. On the road to AGI, your employment, and the ability to feed your family, is a minor nuisance.
My attempt to talk you out of it:
If nobody has a job then nobody can pay to make the robot and AI companies rich.
Who needs the money when you have an autonomous system to produce all the energy and resources you need? These systems simply do not need the construct of money as we know it at a certain point.
The star trek society is is a remote possibility here. One can hope.
I think we're going in that direction. The typical reader here I think can't see the forest for the trees. We're all in meat space. They call it real life. Most jobs aren't on the internet and ultimately deal with the physical. It doesn't matter what tech we have when there's boxes to move and shelves to stock. If AI empowers a small business owner to do things that were previously completely outside their budget I can only imagine that will increase opportunity.
Being rich is ultimately about owning and being able to defend resources. IF something like 99% of humans become irrelevant to the machine run utopia for the elites, whatever currency the poors use to pay for services among each other will be worthless to the top 1% when they simply don't need them or their services.
Sure, but this is why free software/open source is so important (and why we dodged a bullet due to "AI" being invented in a mostly open source world.)
I just think we'll all have to get comfy fighting fire with fire.
For me this is the outcome of the incentive structure. The question is if we can seize the everything machine to benefit everyone (great!) or everything becomes cyberpunk and we exist only as prostitutes and entertainers for Dario and Sam.
Hence why we need to maximize the second amendment... worst comes to worst, rebellion needs to remain an option.
It's not just for defense, hunting and sport.
edit: min/max .... not sure how gesture input messed that one up.
AI frontier CEOs are the least reliable sources for what jobs AI will be able to replace.
They are running at valuations that may assume that and have no choice but to claim so. Sama and Dario are both wildly hyperbolic.
We should be fighting back. So far I have been using Poison Fountain[1] on many of my websites to feed LLM scrapers with gibberish. The effectiveness is backed by a study from Anthropic that showed that a small batch of bad samples can corrupt whole models[2].
Disclaimer: I'm not affiliated with Poison Fountain or its creators, just found it useful.
[1] https://news.ycombinator.com/item?id=46926485
[2] https://www.anthropic.com/research/small-samples-poison
I agree with you. This generation of LLMs is on track to automate knowledge work.
For the US, if we had strong unions, those gains could be absorbed by the workers to make our jobs easier. But instead we have at-will employment and shareholder primacy. That was fine while we held value in the job market, but as that value is whittled away by AI, employers are incentivized to pocket the gains by cutting workers (or pay).
I haven't seen signs that the US politically has the will to use AI to raise the average standard of living. For example, the US never got data protections on par with GDPR, preferring to be business friendly. If I had to guess, I would expect socialist countries to adapt more comfortably to the post-AI era. If heavy regulation is on the table, we have options like restricting the role or intelligence of AI used in the workplace. Or UBI further down the road.
There's an undertone of self-soothing "AI will leverage me, not replace me", which I don't agree with especially in the long run, at least in software. In the end it will be the users sculpting formal systems like playdoh.
In the medium run, "AI is not a co-worker" is exactly right. The idea of a co-worker will go away. Human collaboration on software is fundamentally inefficient. We pay huge communication/synchronization costs to eek out mild speed ups on projects by adding teams of people. Software is going to become an individual sport, not a team sport, quickly. The benefits we get from checking in with other humans, like error correction, and delegation can all be done better by AI. I would rather a single human (for now) architect with good taste and an army of agents than a team of humans.
> In the end it will be the users sculpting formal systems like playdoh.
And unless the user is a competent programmer, at least in spirit, it will look like the creation of the 3-year-old next door, not like Wallace and Gromit.
It may be fine, but the difference is that one is only loved by their parents, the other gets millions of people to go to the theater.
Play-Doh gave the power of sculpting to everyone, including small children, but if you don't want to make an ugly mess, you have to be a competent sculptor to begin with, and it involves some fundamentals that does not depend on the material. There is a reason why clay animators are skilled professionals.
The quality of vibe coded software is generally proportional to the programming skills of the vibe coder as well as the effort put into it, like with all software.
It really depends what kind of time frame we're talking about.
As far as today's models, these are best understood as tools to be used as humans. They're only replacements for humans insofar as individual developers can accomplish more with the help of an AI than they could alone, so a smaller team can accomplish what used to require a bigger team. Due to Jevon's paradox this is probably a good thing for developer salaries: their skills are now that much more in demand.
But you have to consider the trajectory we're on. GPT went from an interesting curiosity to absolutely groundbreaking in less than five years. What will the next five years bring? Do you expect development to speed up, slow down, stay the course, or go off in an entirely different direction?
Obviously, the correct answer to that question is "Nobody knows for sure." We could be approaching the top of a sigmoid type curve where progress slows down after all the easy parts are worked out. Or maybe we're just approaching the base of the real inflection point where all white collar work can be accomplished better and more cheaply by a pile of GPUs.
Since the future is uncertain, a reasonable course of action is probably to keep your own coding skills up to date, but also get comfortable leveraging AI and learning its (current) strengths and weaknesses.
I don't expect exponential growth to continue indefinitely... I don't think the current line of LLM based tech will lead to AGI, but that it might inspire what does.
That doesn't mean it isn't and won't continue to be disruptive. Looking at generated film clips, it's beyond impressive... and despite limitations, it's going to lead to a lot of creativity, that doesn't mean someone making something longer won't have to work that much harder to get something consistent... I've enjoyed a lot of the StarWars fan films that have been made, but there's a lot of improvements needed in terms of the voice acting, sets, characters, etc that arre needed for something I'd pay to rent or see in a thaater.
Ironically, the push towards modern progressivism and division from Hollywood has largely been a shortfall... If they really wanted to make money, they'd lean into pop-culture fun and rah rah 'Merica, imo. Even with the new He-Man movie, the biggest critique is they bothered to try to integrate real world Earth as a grounding point. Let it be fantasy. For that matter, extend the delay from theater to PPV even. "Only in theaters for 2026" might actually be just enough push to get butts in seats.
I used to go to the movies a few times a month, now it's been at least a year since I've thought of going. I actually might for He-Man or the Spider-Man movies... Mixed on Mandalorean.
For AI and coding... I've started using it more the past couple months... I can't imagine being a less experienced dev with it. I predict, catch and handle so many issues in terms of how I've used it even. The thought of vibe-coded apps in the wild is shocking to terrifying and I wouldn't wany my money anywhere near them. It takes a lot of iteration, curation an baby-sitting after creating a good level of pre-documentation/specifications to follow. That said, I'd say I'm at least 5x more productive with it.
so agentic play-doh sculpting
challenge accepted
> The benefits we get from checking in with other humans, like error correction, and delegation can all be done better by AI.
Not this generation of AI though. It's a text predictor, not a logic engine - it can't find actual flaws in your code, it's just really good at saying things which sound plausible.
> it can't find actual flaws in your code
I can tell from this statement that you don't have experience with claude-code.
It might just be a "text predictor" but in the real world it can take a messy log file, and from that navigate and fix issues in source.
It can appear to reason about root causes and issues with sequencing and logic.
That might not be what is actually happening at a technical level, but it is indistinguishable from actual reasoning, and produces real world fixes.
> I can tell from this statement that you don't have experience with claude-code.
I happen to use it on a daily basis. 4.6-opus-high to be specific.
The other day it surmised from (I assume) the contents of my clipboard that I want to do A, while I really wanted to B, it's just that A was a more typical use case. Or actually: hardly anyone ever does B, as it's a weird thing to do, but I needed to do it anyway.
> but it is indistinguishable from actual reasoning
I can distinguish it pretty well when it makes mistakes someone who actually read the code and understood it wouldn't make.
Mind you: it's great at presenting someone else's knowledge and it was trained on a vast library of it, but it clearly doesn't think itself.
What do you mean the content of your clipboard?
I either accidentally pasted it somewhere and removed, forgetting about doing that or it's reading the clipboard.
The suggestion it gave me started with the contents of the clipboard and expanded to scenario A.
Sorry to sound rude - but you polluted the context, pointing to the fact you would like A, and then found it annoying it tried to do A ?
Oh, please. There’s always a way to blame the user, it’s a catch-22. The fact is that coding agents aren’t perfect and it’s quite common for them to fail. Refer to the recent C-compiler nonsense Anthropic tried to pull for proof.
It fails far less often than I do at the cookie cutter parts of my job, and it’s much faster and cheaper than I am.
Being honest; I probably have to write some properly clever code or do some actual design as a dev lead like… 2% of my time? At most? The rest of the code related work I do, it’s outperforming me.
Now, maybe you’re somehow different to me, but I find it hard to believe that the majority of devs out there are balancing binary trees and coming up with shithot unique algorithms all day rather than mangling some formatting and dealing with improving db performance, picking the right pattern for some backend and so on style tasks day to day.
I know I am not supposed to be negative in HN, but lay off the koolaid, dear colleague.
What you're describing is not finding flaws in code. It's summarizing, which current models are known to be relatively good at.
It is true that models can happen to produce a sound reasoning process. This is probabilistic however (moreso than humans, anyway).
There is no known sampling method that can guarantee a deterministic result without significantly quashing the output space (excluding most correct solutions).
I believe we'll see a different landscape of benefits and drawbacks as diffusion language models begin to emerge, and as even more architectures are invented and practiced.
I have a tentative belief that diffusion language models may be easier to make deterministic without quashing nearly as much expressivity.
This all sounds like the stochastic parrot fallacy. Total determinism is not the goal, and it not a prerequisite for general intelligence. As you allude to above, humans are also not fully deterministic. I don't see what hard theoretical barriers you've presented toward AGI or future ASI.
Did you just invent a nonsense fallacy to use as a bludgeon here? “Stochastic parrot fallacy” does not exist, and there actually quite a bit of evidence supporting the stochastic parrot hypothesis.
I haven't heard the stochastic parrot fallacy (though I have heard the phrase before). I also don't believe there are hard theoretical barriers. All I believe is that what we have right now is not enough yet. (I also believe autoregressive models may not be capable of AGI.)
> moreso than humans
Citation needed.
Much of the space of artificial intelligence is based on a goal of a general reasoning machine comparable to the reasoning of a human. There are many subfields that are less concerned with this, but in practice, artificial intelligence is perceived to have that goal.
I am sure the output of current frontier models is convincing enough to outperform the appearance of humans to some. There is still an ongoing outcry from when GPT-4o was discontinued from users who had built a romantic relationship with their access to it. However I am not convinced that language models have actually reached the reliability of human reasoning.
Even a dumb person can be consistent in their beliefs, and apply them consistently. Language models strictly cannot. You can prompt them to maintain consistency according to some instructions, but you never quite have any guarantee. You have far less of a guarantee than you could have instead with a human with those beliefs, or even a human with those instructions.
I don't have citations for the objective reliability of human reasoning. There are statistics about unreliability of human reasoning, and also statistics about unreliability of language models that far exceed them. But those are both subjective in many cases, and success or failure rates are actually no indication of reliability whatsoever anyway.
On top of that, every human is different, so it's difficult to make general statements. I only know from my work circles and friend circles that most of the people I keep around outperform language models in consistency and reliability. Of course that doesn't mean every human or even most humans meet that bar, but it does mean human-level reasoning includes them, which raises the bar that models would have to meet. (I can't quantify this, though.)
There is a saying about fully autonomous self driving vehicles that goes a little something like: they don't just have to outperform the worst drivers; they have to outperform the best drivers, for it to be worth it. Many fully autonomous crashes are because the autonomous system screwed up in a way that a human would not. An autonomous system typically lacks the creativity and ingenuity of a human driver.
Though they can already be more reliable in some situations, we're still far from a world where autonomous driving can take liability for collisions, and that's because they're not nearly as reliable or intelligent enough to entirely displace the need for human attention and intervention. I believe Waymo is the closest we've gotten and even they have remote safety operators.
It's not enough for them to be "better" than a human. When they fail they also have to fail in a way that is legible to a human. I've seen ML systems fail in scenarios that are obvious to a human and succeed in scenarios where a human would have found it impossible. The opposite needs to be the case for them to be generally accepted as equivalent, and especially the failure modes need to be confined to cases where a human would have also failed. In the situations I've seen, customers have been upset about the performance of the ML model because the solution to the problem was patently obvious to them. They've been probably more upset about that than about situations where the ML model fails and the end customer also fails.
That's not a citation.
That’s because there’s no objective research on this. Similarly, there are no good citations to support your objection. They simply don’t exist yet.
Maybe not worth discussing something that cannot be objectively assessed then.
It's roughly why I think this way, along with a statement that I don't have objective citations. So sure, it's not a citation. I even said as much, right in the middle there.
Nothing you've said about reasoning here is exclusive to LLMs. Human reasoning is also never guaranteed to be deterministic, excluding most correct solutions. As OP says, they may not be reasoning under the hood but if the effect is the same as a tool, does it matter?
I'm not sure if I'm up to date on the latest diffusion work, but I'm genuinely curious how you see them potentially making LLMs more deterministic? These models usually work by sampling too, and it seems like the transformer architecture is better suited to longer context problems than diffusion
The way I imagine greedy sampling for autoregressive language models is guaranteeing a deterministic result at each position individually. The way I'd imagine it for diffusion language models is guaranteeing a deterministic result for the entire response as a whole. I see diffusion models potentially being more promising because the unit of determinism would be larger, preserving expressivity within that unit. Additionally, diffusion language models iterate multiple times over their full response, whereas autoregressive language models get one shot at each token, and before there's even any picture of the full response. We'll have to see what impact this has in practice; I'm only cautiously optimistic.
I guess it depends on the definition of deterministic, but I think you're right and there's strong reason to expect this will happen as they develop. I think the next 5 - 10 years will be interesting!
And not this or any existing generation of people. We're bad a determining want vs need, being specific, genericizing our goals into a conceptual framework of existing patterns and documenting & explaining things in a way that gets to a solid goal.
The idea that the entire top down processes of a business can be typed into an AI model and out comes a result is again, a specific type of tech person ideology that sees the idea of humanity as an unfortunate annoyance in the process of delivering a business. The rest of the world see's it the other way round.
I would have agreed with you a year ago
Absolutely nuts, I feel like I'm living in a parallel universe. I could list several anecdotes here where Claude has solved issues for me in an autonomous way that (for someone with 17 years of software development, from embedded devices to enterprise software) would have taken me hours if not days.
To the nay sayers... good luck. No group of people's opinions matter at all. The market will decide.
I think it’s just fear, I sure know that after 25 years as a developer with a great salary and throughout all that time never even considering the chance of ever being unemployable I’m feeling it too.
I think some of us come to terms with it in different ways.
I wonder if the parent comments remark is a communication failure or pedantry gone wrong, because like you, claude-code is out there solving real problems and finding and fixing defects.
A large quantity of bugs as raised are now fixed by claude automatically from just the reports as written. Everything is human reviewed and sometimes it fixes it in ways I don't approve, and it can be guided.
It has an astonishing capability to find and fix defects. So when I read "It can't find flaws", it just doesn't fit my experience.
I have to wonder if the disconnect is simply in the definition of what it means to find a flaw.
But I don't like to argue over semantics. I don't actually care if it is finding flaws by the sheer weight of language probability rather than logical reasoning, it's still finding flaws and fixing them better than anything I've seen before.
I can't control random internet people, but within my personal and professional life, I see the effective pattern of comparing prompts/contexts/harnesses to figure out why some are more effective than others (in fact tooling is being developed in the AI industry as a whole to do so, claude even added the "insights" command).
I feel that many people that don't find AI useful are doing things like, "Are there any bugs in this software?" rather than developing the appropriate harness to enable the AI to function effectively.
If you only realized how ridiculous your statement is, you never would have stated it.
It's also literally factually incorrect. Pretty much the entire field of mechanistic interpretability would obviously point out that models have an internal definition of what a bug is.
Here's the most approachable paper that shows a real model (Claude 3 Sonnet) clearly having an internal representation of bugs in code: https://transformer-circuits.pub/2024/scaling-monosemanticit...
Read the entire section around this quote:
> Thus, we concluded that 1M/1013764 represents a broad variety of errors in code.
(Also the section after "We find three different safety-relevant code features: an unsafe code feature 1M/570621 which activates on security vulnerabilities, a code error feature 1M/1013764 which activates on bugs and exceptions")
This feature fires on actual bugs; it's not just a model pattern matching saying "what a bug hunter may say next".
Was this "paper" eventually peer reviewed?
PS: I know it is interesting and I don't doubt Antrophic, but for me it is so fascinating they get such a pass in science.
Modern ML is old school mad science.
The lifeblood of the field is proof-of-concept pre-prints built on top of other proof-of-concept pre-prints.
Sounds like you agree this “evidence” lacks any semblance of scientific rigor?
(Not GP) There was a well recognized reproducibility problem in the ML field before LLM-mania, and that's considering published papers with proper peer-reviews. The current state of afairs in some ways is even less rigourous than that, and then some people in the field feel free to overextend their conclusions into other fields like neurosciences.
> This feature fires on actual bugs; it's not just a model pattern matching saying "what a bug hunter may say next".
You don't think a pattern matcher would fire on actual bugs?
Mechanistic interpretability is a joke, supported entirely by non-peer reviewed papers released as marketing material by AI firms.
Some people are still stuck in the “stochastic parrot” phase and see everything regarding LLMs through that lense.
Current LLMs do not think. Just because all models anthropomorphize the repetitive actions a model is looping through does not mean they are truly thinking or reasoning.
On the flip side the idea of this being true has been a very successful indirect marketing campaign.
What does “truly thinking or reasoning” even mean for you?
I don’t think we even have a coherent definition of human intelligence, let alone of non-human ones.
Everyone knows to really think you need to use your fleshy meat brain, everything else is cheating.
While I agree, if you think that AI is just a text predictor, you are missing an important point.
Intelligence, can be borne of simple targets, like next token predictor. Predicting the next token with the accuracy it takes to answer some of the questions these models can answer, requires complex "mental" models.
Dismissing it just because its algorithm is next token prediction instead of "strengthen whatever circuit lights up", is missing the forest for the trees.
You’re committing the classic fallacy of confusing mechanics with capabilities. Brains are just electrons and chemicals moving through neural circuits. You can’t infer constraints on high-level abilities from that.
This goes both ways. You can't assume capabilities based on impressions. Especially with LLMs, which are purpose built to give an impression of producing language.
Also, designers of these systems appear to agree: when it was shown that LLMs can't actually do calculations, tool calls were introduced.
It's true that they only give plausible sounding answers. But let's say we ask a simple question like "What's the sum of two and two?" The only plausible sounding answer to that will be "four." It doesn't need to have any fancy internal understanding or anything else beyond prediction to give what really is the same answer.
The same goes for a lot of bugs in code. The best prediction is often the correct answer, being the highlighting of the error. Whether it can "actually find" the bugs—whatever that means—isn't really so important as whether or not it's correct.
It becomes important the moment your particular bug is on one hand typical, but has a non-typical reason. In such cases you'll get nonsense which you need to ignore.
Again - they're very useful, as they give great answers based on someone else's knowledge and vague questions on part of the user, but one has to remain vigilant and keep in mind this is just text presented to you to look as believable as possible. There's no real promise of correctness or, more importantly, critical thinking.
100% They're not infallible but that's a different argument to "they can't find bugs in your code."
Your brain is a slab of wet meat, not a logic engine. It can't find actual flaws in your code - it's just half-decent at pattern recognition.
That is not exactly true. The brain does a lot of things that are not "pattern recognition".
Simpler, more mundane (not exactly, still incredibly complicated) stuff like homeostasis or motor control, for example.
Additionally, our ability to plan ahead and simulate future scenarios often relies on mechanisms such as memory consolidation, which are not part of the whole pattern recognition thing.
The brain is a complex, layered, multi-purpose structure that does a lot of things.
Its pattern recognition all the way down.
> In the end it will be the users sculpting formal systems like playdoh.
I’m very skeptical of this unless the AI can manage to read and predict emotion and intent based off vague natural language. Otherwise you get the classic software problem of “What the user asked for directly isn’t actually what they want/need.”
You will still need at least some experience with developing software to actually get anything useful. The average “user” isn’t going to have much success for large projects or translating business logic into software use cases.
I love this optimistic take.
Unfortunately, I believe the following will happen: By positioning themselves close to law makers, the AI companies will in the near future declare ownership of all software code developed using their software.
They will slowly erode their terms of service, as happens to most internet software, step by step, until they claim total ownership.
The point is to license the code.
> AI companies will in the near future declare ownership of all software code developed using their software.
(X) Doubt
Copyright law is WEEEEEEIRRRDD and our in-house lawyer is very much into that, personally and professionally. An example they gave us during a presentation:
A monkey took a selfie of itself in 2011. We still don't know who has the copyright to that image: https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...
IIRC the latest resolution is "it's not the monkey", but nobody has ruled the photographer has copyright either. =)
Copyright law has this thing called "human authorship" that's required to apply copyright to a work. Animals and machines can't have a copyright to anything.
A second example: https://en.wikipedia.org/wiki/Zarya_of_the_Dawn
A comic generated with Midjourney had its copyright revoked when it was discovered all of the art was done with Generative AI.
AI companies have absolutely mindboggling amounts of money, but removing the human authorship requirement from copyright is beyond even them in my non-lawyer opinion. It would bring the whole system crashing down and not in a fun way for anyone.
AFAIK you can't copyright AI generated content. I don't know where that gets blurry when it's mixed in with your own content (ie, how much do you need to modify it to own it), but I think that by that definition these companies couldn't claim your code at all. Also, with the lawsuit that happened to Anthropic where they had to pay billions for ingesting copyrighted content, it might actually end up working the other way around.
> the AI companies will in the near future declare ownership of all software code developed using their software.
Pretty sure this isn’t going to happen. AI is driving the cost of software to zero; it’s not worth licensing something that’s a commodity.
It’s similar to 3D printing companies. They don’t have IP claims on the items created with their printers.
The AI companies currently don’t have IP claims on what their agents create.
Uncle Joe won’t need to pay OpenAI for the solitaire game their AI made for him.
The open source models are quite capable; in the near future there won’t be a meaningful difference for the average person between a frontier model and an open source one for most uses including creating software.
1. Commodities are huge business.
2. Show me these open source models that cost me $20/month to operate, because that’s what I pay for ChatGPT/Claude.
3. This is not at all similar to “3D printing”.
4. Nobody cares about some solitaire game
This assumes every individual is capable of succinctly communicating to the AI what they want. And the AI is capable of maintaining it as underlying platforms and libraries shift.
And that there is little value in reusing software initiated by others.
> This assumes every individual is capable of succinctly communicating to the AI what they want. And the AI is capable of maintaining it as underlying platforms and libraries shift.
I think there are people who want to use software to accomplish a goal, and there are people who are forced to use software. The people who only use software because the world around them has forced it on them, either through work or friends, are probably cognitively excluded from building software.
The people who seek out software to solve a problem (I think this is most people) and compare alternatives to see which one matches their mental model will be able to skip all that and just build the software they have in mind using AI.
> And that there is little value in reusing software initiated by others.
I think engineers greatly over-estimate the value of code reuse. Trying to fit a round peg in a square hole produces more problems than it solves. A sign of an elite engineer is knowing when to just copy something and change it as needed rather than call into it. Or to re-implement something because the library that does it is a bad fit.
The only time reuse really matters is in network protocols. Communication requires that both sides have a shared understanding.
>The only time reuse really matters is in network protocols. Communication requires that both sides have a shared understanding.
A lot of things are like network protocols. Most things require communication. External APIs, existing data, familiar user interfaces, contracts, laws, etc.
Language itself (both formal and natural) depends on a shared understanding of terms, at least to some degree.
AI doesn't magically make the coordination and synchronisation overhead go away.
Also, reusing well debugged and battle tested code will always be far more reliable than recreating everything every time anything gets changed.
Even within a single computer or program, there is need for communication protocols and shared understanding - such as types, data schema, function signatures. It's the interface between functions, programs, languages, machines.
It could also be argued that "reuse" doesn't necessarily mean reusing the actual code as material, but reusing the concepts and algorithms. In that sense, most code is reuse of some previous code, written differently every time but expressing the same ideas, building on prior art and history.
That might support GP's comment that "code reuse" is overemphasized, since the code itself is not what's valuable, what the user wants is the computation it represents. If you can speak to a computer and get the same result, then no code is even necessary as a medium. (But internally, code is being generated on the fly.)
I think we shouldn't get too hung up on specific artifacts.
The point is that specifying and verifying requirements is a lot of work. It takes time and resources. This work has to be reused somehow.
We haven't found a way to precisely specify and verify requirements using only natural language. It requires formal language. Formal language that can be used by machines is called code.
So this is what leads me to the conclusion that we need some form of code reuse. But if we do have formal specifications, implementations can change and do not necessarily have to be reused. The question is why not.
This reframes the whole conversation. If implementations are cheap to regenerate, specifications become the durable artifact.
Something like TLA+ model checking lets you verify that a protocol maintains safety invariants across all reachable states, regardless of who wrote the implementation. The hard part was always deciding what "correct" means in your specific domain.
Most teams skip formal specs because "we don't have time." If agents make implementations nearly free, that excuse disappears. The bottleneck shifts from writing code to defining correctness.
> I think there are people who want to use software to accomplish a goal, and there are people who are forced to use software.
Typically people feel they're "forced" to use software for entirely valid reasons, such as said software being absolutely terrible to use. I'm sure that most people like using software that they feel like actually helps rather than hinders them.
> The only time reuse really matters is in network protocols.
And long term maintenance. If you use something. You have to maintain it. It's much better if someone else maintains it.
> I think engineers greatly over-estimate the value of code reuse[...]The only time reuse really matters is in network protocols.
The whole idea of an OS is code reuse (and resources management). No need to setup the hardware to run your application. Then we have a lot of foundational subsystems like graphics, sound, input,... Crafting such subsystems and the associated libraries are hard and requires a lot of design thinking.
There is a balance. Some teams take DRY too far.
Which is why we should always just write and train our own LLMs.
I mean it’s just software right? What value is there in reusing it if we can just write it ourselves?
Every internal piece of software you write is a potentially-infinite money sink of training
no but if the old '10x developer' is really 1 in 10 or 1 in 100, they might just do fine while the rest of us, average PHP enjoyers, may go to the wayside
>This assumes every individual is capable of succinctly communicating to the AI what they want. And the AI is capable of maintaining it as underlying platforms and libraries shift.
It's true that at first not everyone is just as efficient, but I'd be lying if I were to claim that someone needs a 4-year degree to communicate with LLM's.
LLM technology does not have a connection with reality nor venues providing actual understanding.
Correction of conceptual errors require understanding.
Vomiting large amounts of inscrutable unmaintainable code for every change is not exactly an ideal replacement for a human.
We have not started to scratch the surface of the technical debt created by these systems at lightning speed.
> We have not started to scratch the surface of the technical debt created by these systems at lightning speed.
Bold of you to assume anyone cares about it. Or that it’ll somehow guarantee your job security. They’ll just throw more LLMs on it.
> We pay huge communication/synchronization costs to eek out mild speed ups on projects by adding teams of people.
Something Brooks wrote about 50 years ago, and the industry has never fully acknowledged. Throw more bodies at it, be they human bodies or bot agent bodies.
The point of the mythical man month is not that more people are necessarily worse for a project, it's just that adding them at the last minute doesn't work, because they take a while to get up to speed and existing project members are distracted while trying to help them.
It's true that a larger team, formed well in advance, is also less efficient per person, but they still can achieve more overall than small teams (sometimes).
Interesting point. And from the agents point of view, it’s always joining at the last minute, and doesn’t stick around longer than its context window. There’s a lesson in there maybe…
The context window is the onboarding period. Every invocation is a new hire reading the codebase for the first time.
This is why architecture legibility keeps getting more important. Clean interfaces, small modules, good naming. Not because the human needs it (they already know the codebase) but because the agent has to reconstruct understanding from scratch every single time.
Brooks was right that the conceptual structure is the hard part. We just never had to make it this explicit before.
But there is a level of magnitude difference between coordinating AI agents and humans - the AIs are so much faster and more consistent than humans, that you can (as Steve Yegge [0] and Nicholas Carlini [1] showed) have them build a massive project from scratch in a matter of hours and days rather than months and years. The coordination cost is so much lower that it's just a different ball game.
[0] https://steve-yegge.medium.com/welcome-to-gas-town-4f25ee16d...
[1] https://www.anthropic.com/engineering/building-c-compiler
Then why aren’t we seeing orders of magnitude more software being produced?
I think we are. There's definitely been an uptick in "show HN" type posts with quite impressively complex apps that one person developed in a few weeks.
From my own experience, the problem is that AI slows down a lot as the scale grows. It's very quick to add extra views to a frontend, but struggles a lot more in making wide reaching refactors. So it's very easy to start a project, but after a while your progress slows significantly.
But given I've developed 2 pretty functional full stack applications in the last 3 months, which I definitely wouldn't have done without AI assistance, I think it's a fair assumption that lots of other people are doing the same. So there is almost certainly a lot more software being produced than there was before.
I think the proportion of new software that is novel has absolutely plummeted after the advent of AI. In my experience, generative AI will easily reproduce code for which there are a multitude of examples on GitHub, like TODO CRUD React Apps. And many business problems can be solved with TODO CRUD React Apps (just look at Excel’s success), but not every business problem can be solved by TODO CRUD React Apps.
As an analogy: imagine if someone was bragging about using Gen AI to pump out romantasy smut novels that were spicy enough to get off to. Would you think they’re capable of producing the next Grapes of Wrath?
> I think the proportion of new software that is novel has absolutely plummeted after the advent of AI.
We were not awash in novel software before AI (say last decade in 2019).
I can only assume what you're really trying to say is "AI bad".
Didn't we have a post the other day saying that the number of "Show HN" posts is skyrocketing?
https://news.ycombinator.com/item?id=47045804
This question remains the 900-pound gorilla of this discussion
Claude Code released just over a year ago, agentic coding came into its own maybe in May or June of last year. Maybe give it a minute?
It’s been a minute and a half and I don’t see the evidence you can task an agent swarm to produce useful software without your input or review. I’ve seen a few experiments that failed, and I’ve seen manic garbage, but not yet anything useful outside of the agent operators imagination.
Agent swarms are what, a couple of months old? What are you even talking about. Yes, people/humans still drive this stuff, but if you think there isn't useful software out there that can be handily implemented with current gen agents that need very little or no review, then I don't know what to tell you, apart from "you're mistaken". And I say that as someone who uses three tools heavily but has otherwise no stake in them. The copium in this space is real. Everyone is special and irreplaceable, until another step change pushes them out.
The next thing after agent swarms will be swarm colonies and people will go "it's been a month since agentic swarm colonies, give it a month or two". People have been moving the goal posts like that for a couple years now, it's starting to grow stale. This is like self driving cars which were going to be workingin 2016 and replace 80% of drivers by 2017, all over again. People falling for hype instead of admitting that while it appears somewhat useful, nobody has any clue if it's 97% useful or just 3% useful but so far it's looking like the later.
I generally agree, but counterpoint: Waymo is successfully running robocabs in many cities today.
When does it come to Mumbai?
They're launching in London this year. So... 2035?
I would love to see this in Mumbai or Dhaka or something like that, just like thrown in there. Can it move 2 meters without stopping?
Don't take me wrong, I like Waymo but 2035 is probably realistic for the cities in more developing countries.
The whole point is that an agent swarm doesn’t need a month, supposedly.
We're talking about whether the human users have caught up with usage of tech, not the speed of the tech itself.
Why do you assume there isn't?
Enterprise (+API) usage of LLMs has continued to grow exponentially.
I work for one of those enterprises with lots of people trying out AI (thankfully leadership is actually sane, no mandates that you have to use it, just giving devs access to experiment with the tools and see what happens). Lots of people trying it out in earnest, lots of newsletters about new techniques and all that kinda stuff. Lots of people too, so there's all sorts of opinions from very excited to completely indifferent.
Precisely 0 projects are making it out any faster or (IMO more importantly) better. We have a PR review bot clogging up our PRs with fucking useless comments, rewriting the PR descriptions in obnoxious ways, that basically everyone hates and is getting shut off soon. From an actual productivity POV, people are just using it for a quick demo or proof of concept here and there before actually building the proper thing manually as before. And we have all the latest and greatest techniques, all the AGENTS.mds and tool calling and MCP integrations and unlimited access to every model we care to have access to and all the other bullshit that OpenAI et al are trying to shove on people.
It's not for a lack of trying, plenty of people are trying to make any part of it work, even if it's just to handle the truly small stuff that would take 5 minutes of work but is just tedious and small enough to be annoying to pick up. It's just not happening, even with extremely simple tasks (that IMO would be better off with a dedicated, small deterministic script) we still need human overview because it often shits the bed regardless, so the effort required to review things is equal or often greater than just doing the damn ticket yourself.
My personal favorite failure is when the transcript bots just... Don't transcript random chunks of the conversation, which can often lead to more confusion than if we just didn't have anything transcribed. We've turned off the transcript and summarization bots, because we've found 9/10 times they're actively detrimental to our planning and lead us down bad paths.
I build a code reviewer based on the claude code sdk that integrates with gitlab, pretty straightforward. The hard work is in the integration, not the review itself. That is taken care of with SDK.
Devs, even conservative ones, like it. I’ve built a lot of tooling in my life, but i never had the experience that devs reach out to me that fast because it is ‘broken’. (Expired token or a bug for huge MRs)
It doesn't appear to have improved the quality of the software we have either.
we are. you can check the APP STORE release yoy. it's skyrocketing.
I have barely downloaded any apps in the last 5-10 years except some necessary ones like bank apps etc. Who even needs that garbage? Steam also has tons of games but 80% make like no money at all and no one cares. Just piles of garbage. We already have limited hours per day and those are not really increasing so I wonder where are the users.
Here’s a talk about leaning into the garbage flow. And that was a decade ago.
https://youtu.be/E8Lhqri8tZk
I can’t imagine the number being economically meaningful now.
"The future is already here, it's just not evenly distributed"
> But there is a level of magnitude difference between coordinating AI agents and humans
And yet, from https://news.ycombinator.com/item?id=47048599
> One of the tips, especially when using Claude Code, is explictly ask to create a "tasks", and also use subagents. For example I want to validate and re-structure all my documentation - I would ask it to create a task to research state of my docs, then after create a task per specific detail, then create a task to re-validate quality after it has finished task.
Which sounds pretty much the same as how work is broken down and handed out to humans.
Yes, but you can do this at the top level, and then have AI agents do this themselves for all the low level tasks, which is then orders of magnitude faster than with human coordination.
> There's an undertone of self-soothing "AI will leverage me, not replace me",
Which is especially hilarious given that this article is largely or entirely LLM-generated.
Communication overhead between humans is real, but it's not just inefficiency, it's also where a lot of the problem-finding happens. Many of the biggest failures I've seen weren't because nobody could type the code fast enough, but because nobody realized early enough that the thing being built was wrong, brittle or solving the wrong problem
> Many of the biggest failures I've seen weren't because nobody could type the code fast enough, but because nobody realized early enough that the thing being built was wrong, brittle or solving the wrong problem
Around 99% of biggest failures come from absent, shitty management prioritizing next quarter over long strategy. YMMV.
Everybody in the world is now a programmer. This is the miracle of artificial intelligence.
- Jensen Huang, February 2024
https://www.techradar.com/pro/nvidia-ceo-predicts-the-death-...
God help us!
Far from everyone are cut out to be programmers, the technical barrier was a feature if anything.
There's a kind of mental discipline and ability to think long thoughts, to deal with uncertainty; that's just not for everyone.
What I see is mostly everyone and their gramps drooling at the idea of faking their way to fame and fortune. Which is never going to work, because everyone is regurgitating the same mindless crap.
Remember when Visual Basic was making everyone a programmer too?
(btw, warm fuzzies for VB since that's what I learned on! But ultimately, those VB tools business people were making were:
1) Useful, actually!
2) Didn't replace professional software. Usually it'd hit a point where if it needed to evolve past its initial functionality it probably required an actual software developer. (IE, not using Access as a database and all the other eccentricities of VB apps at that time)
The problem I mostly see with non programmers is that they don't really grasp the concept of a consistent system.
A lot of people want X, but they also want Y, while clearly X and Y cannot coexist in the same system.
This looks like the same problem as when the first page layout software came out.
It looked to everyone like a huge leap into a new world word processing applications could basically move around blocks of text to be output later, maybe with a few font tags, then this software came out that wow actually showed the different fonts, sizes, and colors on the screen as you worked! With apps like "Pagemaker" everyone would become their own page designers!
It turned out that everyone just turned out floods of massively ugly documents and marketing pieces that looked like ransom notes pasted together from bits of magazines. Years of awfulness.
The same is happening now as we are doomed to endure years AI slop in everything from writing to apps to products to vending machines an entire companies — everyone and their cousin is trying to fully automate it.
Ultimately it does create an advance and allows more and better work to be done, but only for people who have a clue about what they are doing, and eventually things settle at a higher level where the experts in each field take the lead.
> it will be the users sculpting formal systems like playdoh.
People are pushing back against this phrase, but on some level it seems perfect, it should be visualized and promoted!
I think Lego is a better analogy. LLMs aren't great at working on novel cutting edge problems.
Well, without the self soothing I think what's left is pitchforks.
Maybe it's time for pitchforks.
> AI will leverage me
I think I know what you mean, and I do recall once seeing "this experience will leverage me" as indicating that something will be good for a person, but my first thought when seeing "x will leverage y" is that x will step on top of y to get to their goal, which does seem apt here.
How does a single human acquire said "good taste" for architecting?
>In the end it will be the users sculpting formal systems like playdoh.
Yet another person who thinks that there is a silver bullet for complexity. The mythical intelligent machines that from poorly described natural language can erect flawless complex system is like the philosopher's stone of our time.
I'm rounding the corner on a ground's up reimplementation of `nix` in what is now about 34 hours of wall clock time, I have almost all of it on `wf-record`, I'll post a stream, but you can see the commit logs here: https://github.com/straylight-software/nix/tree/b7r6/correct...
Everyone has the same ability to use OpenRouter, I have a new event loop based on `io_uring` with deterministic playbook modeled on the Trinity engine, a new WASM compiler, AVX-512 implementations of all the cryptography primitives that approach theoretical maximums, a new store that will hit theoretical maximums, the first formal specification of the `nix` daemon protocol outside of an APT, and I'm upgrading those specifications to `lean4` proof-bearing codegen: https://github.com/straylight-software/cornell.
34 hours.
Why can I do this and no one else can get `ca-derivations` to work with `ssh-ng`?
And it's teachable.
Here's a colleague who is nearly done with a correct reimplementation of the OpenCode client/server API: https://github.com/straylight-software/weapon-server-hs
Here's another colleague with a Git forge that will always work and handle 100x what GitHub does per infrastructure dollar while including stacked diffs and Jujitsu support as native in about 4 days: https://github.com/straylight-software/strayforge
Here's another colleague and a replacement for Terraform that is well-typed in all cases and will never partially apply an infrastructure change in about 4 days: https://github.com/straylight-software/converge
Here's the last web framework I'll ever use: https://github.com/straylight-software/hydrogen
That's all *begun in the last 96 hours.
This is why: https://github.com/straylight-software/.github/blob/main/pro...
/tangent i've always like the word "straylight", I use to run a fansite for a local band and the site was called straylight6. This was maybe 20 years ago.
Please check your links, 3/7 don't work and it's the most interesting ones.
ah, not my place to early launch my colleague's work, my bad.
keep an eye on https://straylight.software, it'll all be there extremely soon. well, everything i mentioned, which is different than all of it. :)
I mean, have you tried getting `ca-derivations` to work with `ssh-ng`? That sounds like a good way to answer your own question.
I have ca-derivations working with ssh-ng.
It's a fairly hairy patch and now the broken ass eval cache breaks more.
I'm fixing it all. Read the fucking repo friend, it's biblical.
> I would rather a single human (for now) architect with good taste and an army of agents than a team of humans.
A human might have taste, but AI certainly doesn't.
It has average taste based on the code it was trained on. For example, every time I attempted to polish the UX it wanted to add a toast system, I abhor toasts as a UX pattern. But it also provided elegant backend designs I hadn't even considered.
I’d say AI has better taste than an average human but definitely not the taste you would see in competent people around you.
Well of course. In the long run AI will do almost all tasks that can be done from a computer.
> especially in the long run, at least in software
"at least in software".
Before that happens, the world as we know it will already have changed so much.
Programmers have already automated many things, way before AI, and now they've got a new tool to automate even more thing. Sure in the end AI may automate programmers themselves: but not before oh-so-many people are out of a job.
A friend of mine is a translator: translates tolerates approximation. Translation tolerates some level of bullshittery. She gets maybe 1/10th the job she used to get and she's now in trouble. My wife now does all he r SMEs' websites all by herself, with the help of AI tools.
A friend of my wife she's a junior lawyer (another domain where bullshitting flies high) and the reason for why she was kicked out of her company: "we've replaced you with LLMs". LLMs are the ultimate bullshit producers: so it's no surprise junior lawyers are now having a hard time.
In programming a single character is the difference between a security hole or no security hole. There's a big difference between something that kinda works but is not performant and insecure and, say, Linux or Git or K8s (which AI models do run on and which AI didn't create).
The day programmers are replaced shall only come after AI shall have disrupted so many other jobs that it should be the least of our concerns.
Translators, artists (another domain where lots of approximative full-on bullshit is produced), lawyers (juniors at least) even, are having more and more problems due to half-arsed AI outputs coming after their jobs.
It's all the bullshitty jobs where bullshit that tolerates approximation is the output that are going to be replaced first. And the world is full of bullshit.
But you don't fly a 767 and you don't conceive a machine that treats brain tumors with approximations. This is not bullshit.
There shall be non-programmers with pitchforks burning datacenters or ubiquitous UBI way before AI shall have replaced programmers.
That it's an exoskeleton for people who know what they're doing rings very true: it's yet another superpower for devs.
> We pay huge communication/synchronization costs to eek out mild speed ups on projects by adding teams of people.
I am surprised at how little this is discussed and how little urgency there is in fixing this if you still want teams to be as useful in the future.
Your standard agile ceremonies were always kind of silly, but it can now take more time to groom work than to do it. I can plausibly spend more time scoring and scoping work (especially trivial work) than doing the work.
It's always been like that. Waterfall development was worse and that's why the Agilists invented Agile.
YOLOing code into a huge pile at top speed is always faster than any other workflow at first.
The thing is, a gigantic YOLO'd code pile (fake it till you make it mode) used to be an asset as well as a liability. These days, the code pile is essentially free - anyone with some AI tools can shit out MSLoCs of code now. So it's only barely an asset, but the complexity of longer term maintenance is superlinear in code volume so the liability is larger.
100% exoskeleton is a great analogy.
An exoskeleton is something really cool in movies that has zero reason to be build in reality because there are way more practical approaches.
That is why we have all kind of vehicles, or programmable robot arm that do the job for themselves or if you need a human at the helm one just adds a remote controller with levers and buttons. But making a human shaped gigantic robot with a normal human inside is just impractical for any real commercial use.
An exoskeleton is not a human-shaped giant robot with a human inside, that would be a Jaeger.
An exoskeleton exists today, in many forms, for example: https://www.festool.com/campaigns/microsites/exoactive
> An exoskeleton is something really cool in movies that has zero reason to be build in reality because there are way more practical approaches.
Sort of strange comment given that there are a large number of companies pursuing commercial exoskeletons literally right now.
SuitX
hypershell
Herowear
DNSYS
Moveo
Hell, even big companies like Hilti
I can buy a ton of different models of exoskeletons for anywhere from low hundreds to low thousands online right now...
Ironically, the fact that fully autonomous systems are more efficient when feasible is exactly why the exoskeleton analogy makes sense
> We're thinking about AI wrong.
And this write up is not an exception.
Why even bother thinking about AI, when Anthropic and OpenAI CEOs openly tell us what they want (quote from recent Dwarkesh interview) - "Then further down the spectrum, there’s 90% less demand for SWEs, which I think will happen but this is a spectrum."
So save thinking and listen to intent - replace 90% of SWEs in near future (6-12 months according to Amodei).
I don't think anyone serious believes this. Replacing developers with a less costly alternative is obviously a very market bullish dream, it has existed since as long as I've worked in the field. First it was supposed to be UML generated code by "architects", then it was supposed to be developers from developing countries, then no-code frameworks, etc.
AI will be a tool, no more no less. Most likely a good one, but there will still need to be people driving it, guiding it, fixing for it, etc.
All these discourses from CEO are just that, stock market pumping, because tech is the most profitable sector, and software engineers are costly, so having investors dream about scale + less costs is good for the stock price.
Ah, don't take me wrong - I don't believe it's possible for LLMs to replace 90% or any number of SWEs with existing technology.
All I'm saying is - why to think what AI is (exoskeleton, co-worker, new life form), when its owners intent is to create SWE replacement?
If your neighbor is building a nuclear reactor in his shed from a pile of smoke detectors, you don't say "think about this as a science experiment" because it's impossible, just call police/NRC because of intent and actions.
> If your neighbor is building a nuclear reactor in his shed from a pile of smoke detectors, you don't say "think about this as a science experiment" because it's impossible, just call police/NRC because of intent and actions.
Only if you're a snitch loser
If you gave the LLM your carefully written UML maybe its output would be better lol. That’s what we’re missing, a mashup of the hype cycle tools.
Not without some major breakthrough. What's hilarious is that all these developers building the tools are going to be the first to be without jobs. Their kids will be ecstatic: "Tell me again, dad, so, you had this awesome and well paying easy job and you wrecked it? Shut up kid, and tuck in that flap, there is too much wind in our cardboard box."
Couldn't agree more, isn't that the bizarre thing? "We have this great intellectually challenging job where we as workers have leverage. How can we completely ruin that while also screwing up every other white collar profession"
Why is it bizarre? It is inevitable. After all, AI has not ruined creative professions, it merely disrupted and transformed them. And yes, I fully understand my whole comment here being snarky, but please bear with me.
Let's rewind 4 years to this HN article titled "The AI Art Apocalypse": https://news.ycombinator.com/item?id=32486133 and read some of the comments.
> Actually all progress will definitely will have a huge impact on a lot of lives—otherwise it is not progress. By definition it will impact many, by displacing those who were doing it the old way by doing it better and faster. The trouble is when people hold back progress just to prevent the impact. No one should be disagreeing that the impact shouldn't be prevented, but it should not be at the cost of progress.
Now it's the software engineers turn to not hold back progress.
Or this one: https://news.ycombinator.com/item?id=34541693
> [...] At the same time, a part of me feels art has no place being motivated by money anyway. Perhaps this change will restore the balance. Artists will need to get real jobs again like the rest of us and fund their art as a side project.
Replace "Artists" with "Coders" and imagine a plumber writing that comment.
Maybe this one: https://news.ycombinator.com/item?id=34856326
> [...] Artists will still exist, but most likely as hybrid 3d-modellers, AI modelers (Not full programmers, but able to fine-tune models with online guides and setups, can read basic python), and storytellers (like manga artists). It'll be a higher-pay, higher-prestige, higher-skill-requirement job than before. And all those artists who devoted their lives to draw better, find this to be an incredibly brutal adjustment.
Again, replace "Artists" with coders and fill in the replacement.
So, please get in line and adapt. And stop clinging to your "great intellectually challenging job" because you are holding back progress. It can't be that challenging if it can be handled by a machine anyway.
> It is inevitable.
Is it though? I agree the technology evolving is inevitable, but, the race/rush to throw as much money at scaling and marketing as possible before these things are profitable and before society is ready is not inevitable at all. It feels extremely forced. And the way it's being shoved into every product to juice usage numbers seems to agree with me that it's all premature and rushed and most people don't really want it. The bubble is essentially from investing way more money in datacenters and GPU's than they can even possibly pay for or build, and there's no evidence there's even a market for using that capacity!
It's funny you bring up artists, because I used to work in game development and I've worked with a lot of artists, and they almost universally HATE this stuff. They're not like "oh thank you Mr. Altman", they're more like "if we catch you using AI we'll shun you." And it's not just producers, a lot of gamers are calling out games that are made using AI, so the customers are mad too.
You keep talking about "progress", but "progress" towards what exactly? So far these things aren't making anything new or advancing civilization, they're remixing stuff we already did well before, but sloppily. I'm not saying they don't have a place -- they definitely do, they can be useful. My argument is against the bizarre hype machine and what sometimes seems like sock puppets on social media. If the marketting was just "hey, we have this neat AI, come use it" I think there'd be a lot less backlash then people saying "Get in line and adapt"
> And stop clinging to your "great intellectually challenging job" because you are holding back progress.
Man, I really wish I had the power you think I have. Also, I use these tools daily, I'm deeply familiar with them, I'm not holding back anyone's progress, not even my own. That doesn't mean I think they're beyond criticism or that the companies behind them are acting responsibly, or that every product is great. I plan to be part of the future, but I'm not just going to pretend like I think every part of it is brilliant.
> It can't be that challenging if it can be handled by a machine anyway.
This will be really funny when it comes for your job.
The premise of those comments, just like the premise in this thread, is ridiculous and fantastical.
The only way generative AI has changed the creative arts is that it's made it easier to produce low quality slop.
I would not call that a true transformation. I'd call that saving costs at the expense of quality.
The same is true of software. The difference is, unlike art, quality in software has very clear safety and security implications.
This gen AI hype is just the crypto hype all over again but with a sci-fi twist in the narrative. It's a worse form of work just like crypto was a worse form of money.
I do not disagree, in fact I'm feeling more and more Butlerian with every passing day. However, it is undeniable that a transformation is taking place -- just not necessarily to the better.
I still can't get over how bad the coca cola AI generated Xmas advert was. That someone approved it for release too boggles my mind.
And, bizarrely, I've really not bought any since. It's diminished my desire for the brand.
I just don't understand this line of thinking.
Gen AI is the opposite of crypto. The use is immediate, obvious and needs no explanation or philosophizing.
You are basically showing your hand that you have zero intellectual curiosity or you are delusional in your own ability if you have never learned anything from gen AI.
I play with generative AI quite often. Mostly for shits and giggles. It's fun to try to make it hallucinate in the dumbest way possible. Or to make up context.
E.g. try to make any image generating model take an existing photo of a humanoid and change it so the character does a backflip.
It's also interesting to generate images in a long loop, because it usually reveals interesting patterns in the training data.
Outside these distractions I've never had generative AI be useful. And I'm currently working in AI research.
I have a feeling they internally say "not me, I won't be replaced" and just keep moving...
Or they get FY money and fatFIRE.
Still risky if you have no labor value anymore.
I'm assuming they all have enough equity that if they actually managed to build an AI capable of replacing themselves they'll be financially set for the rest of their lives.
"Well son, we made a lot of shareholder value."
Is it the first time when workers directly work on their own replacement? If so, software developer may go down in history as the dumbest profession ever.
If the goal is to reduce the need for SWE, you don’t need AI for that. I suspect I’m not alone in observing how companies are often very inefficient, so that devs end up spending a lot of time on projects of questionable value—something that seems to happen more often the larger the organization. I recall at one job my manager insisted I delegate building a react app for an internal tool to a team of contractors rather than letting me focus for two weeks and knock it out myself.
It’s always the people management stuff that’s the hard part, but AI isn’t going to solve that. I don’t know what my previous manager’s deal was, but AI wouldn’t fix it.
The funny thing is I think these things would work much better if they WEREN'T so insistent on the agentic thing. Like, I find in-IDE AI tools a lot more precise and I usually move just as fast as a TUI with a lot less rework. But Claude is CONSTANTLY pushing me to try to "one shot" a big feature while asking me for as little context as possible. I'd much rather it work with me as opposed to just wandering off and writing a thousand lines. It's obviously designed for anthropic's best interests rather than mine.
Tell it to ask clarifying questions.
I do. But, there's a lot of annoying things about it being a TUI. I can't select a block of text in my editor and ask it to do something with it. It doesn't know what I'm looking at. Giving it context feels imprecise because I'm writing out filenames by hand instead of referencing them with the tools. A lot of other small things that I find are better in an IDE
Where is this "90% less demand for SWEs" going to come from? Are we going to run out software to write?
Historically when SWEs became more efficient then we just started making more complicated software (and SWE demand actually increased).
That happens in times of bullish markets and growing economies. Then we want a lot of SWEs.
In times of uncertainty and things going south, that changes to we need as little SWEs as possible, hence the current narrative, everyone is looking to cut costs.
Had GPT 3 emerged 10-20 years ago, the narrative would be “you can now do 100x more thanks to AI”.
I sort of agree the random pontification and bad analogies aren't super useful, but I'm not sure why you would believe the intent of the AI CEOs has more bearing on outcomes than, you know, actual utility over time. I mean those guys are so far out over their skis in terms of investor expectations, it's the last opinion I would take seriously in terms of best-effort predictions.
Who is actually trying to use a fully autonomous AI employee right now?
Isn't everyone using agentic copilots or workflows with agent loops in them?
It seems that they are arguing against doing something that almost no one is doing yet.
But actually the AI Employee is coming by the end of 2026 and the fully autonomous AI Company in 2027 sometime.
Many people have been working on versions of these things for awhile. But again for actual work 99% are using copilots or workflows with well-defined agent loops nodes still. Far as I know.
As a side note I have found that a supervisor agent with a checklist can fire off subtasks and that works about as well as a workflow defined in code.
But anyway, what's holding back the AI Employee are things like really effective long term context and memory management and some level of interface generality like browser or computer use and voice. Computer use makes context management even more difficult. And another aspect is token cost.
But I assume within the next 9 months or so, more and more people will be figuring out how to build agents that write their own workflows, manage their own limited context and memory effectively across Zoom meetings desktops and ssh sessions, etc.
This will likely be a featureset from the model providers themselves. Actually it may leverage continual learning abilities baked into the model architecture itself. I doubt that is a full year away.
> the AI Employee is coming by the end of 2026 and the fully autonomous AI Company in 2027 sometime
We'll see! I'm skeptical.
> what's holding back the AI Employee are things like really effective long term context and memory management and some level of interface generality like browser or computer use and voice
These are pretty big hurdles. Assuming they're solved by the end of this year is a big assumption to make.
https://platform.claude.com/cookbook/tool-use-automatic-cont...
https://research.google/blog/introducing-nested-learning-a-n...
Already very strong progress.
Coincidentally, Pika just launched "AI Selves":
Pika AI Selves let you create a persistent, portable AI version of you built on your personality, taste, memories, voice, and appearance. They're multi-modal – text, voice/audio, image, video – and live your life across every platform.
Funny you described everything I worked on for this project: https://github.com/rush86999/atom
Cats out of the bag. Everyone knows the issue and I bet a lot of people are trying to deliver the same thing.
I think you're forgetting about accountability: who's to blame when AI messes up?
My guess is we'll see a gradual slope rather than a cliff
For some reason AIs love to generate "Not X, but Y", "Not only X, but Y" sentences — It's as if they are template-based.
Yeah, as someone said before, that's the em dash of 2026. btw I also find em dashes very useful and now I can't use them because of that meme. It's good to see a person using one (asuming you're a person).
In the latest interview with Claude Code's author: https://podcasts.apple.com/us/podcast/lennys-podcast-product..., Boris said that writing code is a solved problem. This brings me to a hypothetical question: what if engineers stop contributing to open source, in which case would AI still be powerful enough to learn the knowledge of software development in the future? Or is the field of computer science plateaued to the point that most of what we do is linear combination of well established patterns?
> Boris said that writing code is a solved problem
That's just so dumb to say. I don't think we can trust anything that comes out of the mouths of the authors of these tools. They are conflicted. Conflict of interest, in society today, is such a huge problem.
There are bloggers that can't even acknowledge that they're only invited out to big tech events because they'll glaze them up to high heavens.
Reminds me of that famous exchange, by noted friend of Jeffrey Epstein, Noam Chomsky: "I’m not saying you’re self-censoring. I’m sure you believe everything you say. But what I’m saying is if you believed something different you wouldn’t be sitting where you’re sitting."
> That's just so dumb to say
Depends. Its true of dumb code and dumb coders. Anorher reason why yes, smart pepple should not trust.
Its all basically: Sensationalist take to shock you and get attention
He is likely working on a very clean codebase where all the context is already reachable or indexed. There are probably strong feedback loops via tests. Some areas I contribute to have these characteristics, and the experience is very similar to his. But in areas where they don’t exist, writing code isn’t a solved problem until you can restructure the codebase to be more friendly to agents.
Even with full context, writing CSS in a project where vanilla CSS is scattered around and wasn’t well thought out originally is challenging. Coding agents struggle there too, just not as much as humans, even with feedback loops through browser automation.
It's funny that "restructure the codebase to be more friendly to agents" aligns really well with what we have "supposed" to have been doing already, but many teams slack on: quality tests that are easy to run, and great documentation. Context and verifiability.
The easier your codebase is to hack on for a human, the easier it is for an LLM generally.
Turns out the single point of failure irreplaceable type of employees who intentionally obfuscated the projects code for the last 10+ years were ahead of their time.
I had this epiphany a few weeks ago, I'm glad to see others agreeing. Eventually most models will handle large enough context windows where this will sadly not matter as much, but it would be nice for the industry to still do everything to make better looking code that humans can see and appreciate.
It’s really interesting. It suggests that intelligence is intelligence, and the electronic kind also needs the same kinds of organization that humans do to quickly make sense of code and modify it without breaking something else.
Truth. I've had much easier time grappling with code bases I keep clean and compartmentalized with AI, over-stuffing context is one of the main killers of its quality.
Having picked up a few long neglected projects in the past year, AI has been tremendous in rapidly shipping quality of dev life stuff like much improved test suites, documenting the existing behavior, handling upgrades to newer framework versions, etc.
I've really found it's a flywheel once you get going.
All those people who thought clean well architected code wasn’t important…now with LLMs modifying code it’s even more important.
> He is likely working on
... a laundry list phone app.
I think you mean software engineering, not computer science. And no, I don’t think there is reason for software engineering (and certainly not for computer science) to be plateauing. Unless we let it plateau, which I don’t think we will. Also, writing code isn’t a solved problem, whatever that’s supposed to mean. Furthermore, since the patterns we use often aren’t orthogonal, it’s certainly not a linear combination.
I assume that new business scenarios will drive new workflows, which requires new work of software engineering. In the meantime, I assume that computer science will drive paradigm shift, which will drive truly different software engineering practice. If we don't have advances in algorithms, systems, and etc, I'd assume that people can slowly abstract away all the hard parts, enabling AI to do most of our jobs.
Or does the field become plateaued because engineers treat "writing code" as a "solved problem?"
We could argue that writing poetry is a solved problem in much the same way, and while I don't think we especially need 50,000 people writing poems at Google, we do still need poets.
> we especially need 50,000 people writing poems at Google, we do still need poets.
I'd assume that an implied concern of most engineers is how many software engineers the world will need in the future. If it's the situation like the world needing poets, then the field is only for the lucky few. Most people would be out of job.
I saw Boris give a live demo today. He had a swarm of Claude agents one shot the most upvoted open issue on Excalidraw while he explained Claude code for about 20 minutes.
No lines of code written by him at all. The agent used Claude for chrome to test the fix in front of us all and it worked. I think he may be right or close to it.
Did he pick Excalidraw as the project to work on, or did the audience?
It's easy to be conned if you're not looking for the sleight of hand. You need to start channelling your inner Randi whenever AI demos are done, there's a lot of money at stake and a lot of money to prep a polished show.
To be honest, even if the audience "picked" that project, it could have been a plant shouting out the project.
I'm not saying they prepped the answer, I'm saying they prepped picking a project it could definitely work on. An AI solvable problem.
>writing code is a solved problem
sure is news for the models tripping on my thousands of LOC jquery legacy app...
Could the LLM rewrite it from scratch?
boss, the models can't even get all the api endpoints from a single file and you want to rewrite everything?!
not to mention that maybe the stakeholders don't want a rewrite, they just to modernize the app and add some new features
My prediction: soon (e.g. a few years) the agents will be the one doing the exploration and building better ways to write code, build frameworks,... replacing open source. That being said software engineers will still be in the loop. But there will be far less of them.
Just to add: this is only the prediction of someone who has a decent amount of information, not an expert or insider
I really doubt it. So far these things are good at remixing old ideas, not coming up with new ones.
Generally us humans come up with new things by remixing old ideas. Where else would they come from? We are synthesizing priors into something novel. If you break the problem space apart enough, I don't see why some LLM can't do the same.
LLM's cannot synthesize text, they can only concatenate or mix statistically. Synthesis requires logical reasoning. That's not how LLMs work.
Yes it is, LLMs perform logical multi step reasoning all the time, see math proofs, coding etc. And whether you call it synthesis or statistical mixing is just semantics. Do LLMs truly understand? Who knows, probably not, but they do more than you make it out to be.
I don't want to speak too much out of my depth here, I'm still learning how these things work on a mechanical level, but my understanding of how these things "reason" is it seems like they're more or less having a conversation with themselves. IE, burning a lot of tokens in the hopes that the follow up questions and answers it generates leads to a better continuation of the conversation overall. But just like talking to a human, you're likely to come up with better ideas when you're talking to someone else, not just yourself, so the human in the loop seems pretty important to get the AI to remix things into something genuinely new and useful.
They do not. The "reasoning" is just adding more text in multiple steps, and then summarizing it. An LLM does not apply logic at any point, the "reasoning" features only use clever prompting to make these chains more likely to resemble logical reasoning.
This is still only possible if the prompts given by the user resembles what's in the corpus. And the same applies to the reasoning chain. For it to resemble actual logical reasoning, the same or extremely similar reasoning has to exist in the corpus.
This is not "just" semantics if your whole claim is that they are "synthesizing" new facts. This is your choice of misleading terminology which does not apply in the slightest.
There's so many timeless books on how to write software, design patterns, lessons learned from production issues. I don't think AI will stop being used for open source, in fact, with the number of increasing projects adjusting their contributor policies to account for AI I would argue that what we'll see is always people who love to hand craft their own code, and people who use AI to build their own open source tooling and solutions. We will also see an explosion is needing specs for things. If you give a model a well defined spec, it will follow it. I get better results the more specific I get about how I want things built and which libraries I want used.
> is the field of computer science plateaued to the point that most of what we do is linear combination of well established patterns?
Computer science is different from writing business software to solve business problems. I think Boris was talking about the second and not the first. And I personally think he is mostly correct. At least for my organization. It is very rare for us to write any code by hand anymore. Once you have a solid testing harness and a peer review system run by multiple and different LLMs, you are in pretty good shape for agentic software development. Not everybody's got these bits figured out. They stumble around and them blame the tools for their failures.
> Not everybody's got these bits figured out. They stumble around and them blame the tools for their failures.
Possible. Yet that's a pretty broad brush. It could also be that some businesses are more heavily represented in the training set. Or some combo of all the above.
"Writing code is a solved problem" disagree.
Yes, there are common parts to everything we do, at the same time - I've been doing this for 25 years and most of the projects have some new part to them.
Novel problems are usually a composite of simpler and/or older problems that have been solved before. Decomposition means you can rip most novel problems apart and solve the chunks. LLMs do just fine with that.
The creator of the hammer says driving nails into wood planks is a solved problem. Carpenters are now obsolete.
Prediction: open source will stop.
Sure, people did it for the fun and the credits, but the fun quickly goes out of it when the credits go to the IP laundromat and the fun is had by the people ripping off your code. Why would anybody contribute their works for free in an environment like that?
I believe the exact opposite. We will see open source contributions skyrocket now. There are a ton of people who want to help and share their work, but technical ability was a major filter. If the barrier to entry is now lowered, expect to see many more people sharing stuff.
Yes, more people will be sharing stuff. And none of it will have long term staying power. Or do you honestly believe that a project like GCC or Linux would have been created and maintained over as long as they have been by the use of AI tools in the hands of noobs?
Technical ability is an absolute requirement for the production of quality work. If the signal drowns in the noise then we are much worse off than where we started.
I’m sure you know the majority of GCC and Linux contributors aren’t volunteers, but employees who are paid to contribute. I’m struggling to name a popular project that it isn’t the case. Can you?
If AI is powerful enough to flood open source projects with low quality code, it will be powerful enough to be used as gatekeeper. Major players who benefit from OSS, says Google, will make sure of that. We don’t know how it will play out. It’s shortsighted to dismiss it all together.
> I’m struggling to name a popular project that it isn’t the case. Can you?
There’s emacs, vim, and popular extensions of the two. OpenBSD, lots of distros (some do develop their own software), SDL,…
Ok but now you have raised the bar from "open source" to "quality work" :)
Even then, I am not sure that changes the argument. If Linus Torvalds had access to LLMs back then, why would that discourage him from building Linux? And we now have the capability of building something like Linux with fewer man-hours, which again speaks in favor of more open source projects.
Many did it for liberty - a philosophical position on freedom in software. They're supercharged with AI.
Even as the field evolves, the phoning home telemetry of closed models creates a centralized intelligence monopoly. If open source atrophies, we lose the public square of architectural and design reasoning, the decision graph that is often just as important as the code. The labs won't just pick up new patterns; they will define them, effectively becoming the high priests of a new closed-loop ecosystem.
However, the risk isn't just a loss of "truth," but model collapse. Without the divergent, creative, and often weird contributions of open-source humans, AI risks stagnating into a linear combination of its own previous outputs. In the long run, killing the commons doesn't just make the labs powerful. It might make the technology itself hit a ceiling because it's no longer being fed novel human problem-solving at scale.
Humans will likely continue to drive consensus building around standards. The governance and reliability benefits of open source should grow in value in an AI-codes-it-first world.
> It might make the technology itself hit a ceiling because it's no longer being fed novel human problem-solving at scale.
My read of the recent discussion is that people assume that the work of far fewer number of elites will define the patterns for the future. For instance, implementation of low-level networking code can be the combination of patterns of zeromq. The underlying assumption is that most people don't know how to write high-performance concurrent code anyway, so why not just ask them to command the AI instead.
>My read of the recent discussion is that people assume that the work of far fewer number of elites will define the patterns for the future.
Even if we assume that's true, what will prevent atrophy of the skillset among the elites with such a small pool of practitioners?
I don’t believe people who have dedicated their lives to open source will simply want to stop working on it, no matter how much is or is not written by AI. I also have to agree, I find myself more and more lately laughing about just how much resources we waste creating exactly the same things over and over in software. I don’t mean generally, like languages, I mean specifically. How many trillions of times has a form with username and password fields been designed, developed, had meetings over, tested, debugged, transmitted, processed, only to ultimately be re-written months later?
I wonder what all we might build instead, if all that time could be saved.
> I don’t believe people who have dedicated their lives to open source will simply want to stop working on it, no matter how much is or is not written by AI.
Yeah, hence my question can only be hypothetical.
> I wonder what all we might build instead, if all that time could be saved
If we subscribe to Economics' broken-window theory, then the investment into such repetitive work is not investment but waste. Once we stop such investment, we will have a lot more resources to work on something else, bring out a new chapter of the tech revolution. Or so I hope.
> If we subscribe to Economics' broken-window theory, then the investment into such repetitive work is not investment but waste. Once we stop such investment, we will have a lot more resources to work on something else, bring out a new chapter of the tech revolution. Or so I hope.
I'm not sure I agree with the application of the broken-window theory here. That's a metaphor intended to counter arguments in favor of make-work projects for economic stimulus: the idea here is that breaking a window always has a net negative on the economy, since even though it creates demand for a replacement window, the resources that are necessary to replace a window that already existed are just being allocated to restore the status quo ante, but the opportunity cost of that is everything else the same resources might have bee used for instead, if the window hadn't been broken.
I think that's quite distinct from manufacturing new windows for new installations, which is net positive production, and where newer use cases for windows create opportunities for producers to iterate on new window designs, and incrementally refine and improve the product, which wouldn't happen if you were simply producing replacements for pre-existing windows.
Even in this example, lots of people writing lots of different variations of login pages has produced incremental improvements -- in fact, as an industry, we haven't been writing the same exact login page over and over again, but have been gradually refining them in ways that have evolved their appearance, performance, security, UI intuitiveness, and other variables considerably over time. Relying on AI to design, not just implement, login pages will likely be the thing that causes this process to halt, and perpetuate the status quo indefinitely.
> Boris said that writing code is a solved problem.
No way, the person selling a tool that writes code says said tool can now write code? Color me shocked at this revelation.
Let's check in on Claude Code's open issues for a sec here, and see how "solved" all of its issues are? Or my favorite, how their shitty React TUI that pegs modern CPUs and consumes all the memory on the system is apparently harder to get right than Video Games! Truly the masters of software engineering, these Anthropic folks.
That is the same team that has an app that used React for TUI, that uses gigabytes to have a scrollback buffer, and that had text scrolling so slow you could get a coffee in between.
And that then had the gall to claim writing a TUI is as hard as a video game. (It clearly must be harder, given that most dev consoles or text interfaces in video games consistently use less than ~5% CPU, which at that point was completely out of reach for CC)
He works for a company that crowed about an AI-generated C compiler that was so overfitted, it couldn't compile "hello world"
So if he tells me that "software engineering is solved", I take that with rather large grains of salt. It is far from solved. I say that as somebody who's extremely positive on AI usefulness. I see massive acceleration for the things I do with AI. But I also know where I need to override/steer/step in.
The constant hypefest is just vomit inducing.
I wanted to write the same comment. These people are fucking hucksters. Don’t listen to their words, look at their software … says all you need to know.
Even if you like them, I don't think there's any reason to believe what people from these companies say. They have every reason to exaggerate or outright lie, and the hype cycle moves so quickly that there are zero consequences for doing so.
The exoskeleton analogy seems to be fitting where my work-mode is configurable: moving from tentative to trusting. But the AI needs to be explicitly set up to learn my every action. Currently this is a chore at best, just impossible in other cases.
I like this. This is an accurate state of AI at this very moment for me. The LLM is (just) a tool which is making me "amplified" for coding and certain tasks.
I will worry about developers being completely replaced when I see something resembling it. Enough people worry about that (or say it to amp stock prices) -- and they like to tell everyone about this future too. I just don't see it.
Amplified means more work done by fewer people. It doesn’t need to replace a single entire functional human being to do things like kill the demand for labor in dev, which in turn, will kill salaries.
I would disagree. Amplified meens me and you get more s** done.
Unless there a limited amount of software we need to produce per year globally to keep everyone happy, then nobody wants more -- and we happen to be at that point right NOW this second.
I think not. We can make more (in less time) and people will get more. This is the mental "glass half full" approach I think. Why not take this mental route instead? We don't know the future anyway.
In fact, there isn’t infinite demand for software. Especially not for all kinds of software.
And if corporate wealth means people get paid more, why are companies that are making more money than ever laying off so many people? Wouldn’t they just be happy to use them to meet the inexhaustible demand for software?
I do wonder though if we have about enough (or too much) software.
I hear people complaining about software being forced on them to do things they did just fine without software before, than people complaining about software they want that doesn’t exist.
Yeah I think being annoyed by software is far more prevalent than wishing for more software. That said, I think there is still a lot of room for software growth as long as it's solving real problems and doesn't get in people's way. What I'm not sure about is what will the net effect of AI be overall when the dust settles.
On one hand it is very empowering to individuals, and many of those individuals will be able to achieve grander visions with less compromise and design-by-committee. On the other hand, it also enables an unprecedented level of slop that will certainly dilute the quality of software overall. What will be the dominant effect?
Jevon's paradox means this is untrue because it means more work not less.
Jevon’s Paradox is an important observation but I don’t think it’s an immutable law of the universe,
It is a 19th century economic observation around the use of coal.
It is like saying the PDF is going to be good for librarian jobs because people will read more. It is stupid. It completely breaks down because of substitution.
Farming is the most obvious comparison to me in this. Yes, there will be more food than ever before, the farmer that survives will be better off than before by a lot but to believe the automation of farming tasks by machines leads to more farm jobs is completely absurd.
Hm. More of what? Functionality, security, performance?
Current software is often buggy because the pressure to ship is just too high. If AI can fix some loose threads within, the overall quality grows.
Personally, I would welcome a massive deployment of AI to root out various zero-days from widespread libraries.
But we may instead get a larger quantity of even more buggy software.
This is incorrect. It’s basic economics - technology that boosts productivity results in higher salaries and more jobs.
That’s not basic economics. Basic economics says that salaries are determined by the demand for labor vs the supply of labor. With more efficiency, each worker does more labor, so you need fewer people to accomplish the same thing. So unless the demand for their product increases around the same rate as productivity increases, companies will employ fewer people. Since the market for products is not infinite, you only need as much labor as you require to meet the demand for your product.
Companies that are doing better than ever are laying people off by the shipload, not giving people raises for a job well done.
You obviously haven't thought about economics much at all to say something this simplistic.
There are so many counter examples of this being wrong that it is not even worth bothering.
I love economics, but it is largely a field based around half truths and intellectual fraud. It is actually why it is an interesting subject to study.
Denial of economic truths is denial of science. Not sure what to tell you. What parts do you reject?
Well, that depends on whether the technology requires expertise that is rare and/or hard to acquire.
I'd say that using AI tools effectively to create software systems is in that class currently, but it isn't necessarily always going to be the case.
Nah, most of it just gets returned to capital holders.
The more likely outcome is that fewer devs will be hired as fewer devs will be needed to accomplish the same amount of output.
The old shrinking markets aka lump of labour fallacy. It's a bit like dreaming of that mythical day, when all of the work will be done.
No it's not that.
Tell me, when was the last time you visited your shoe cobbler? How about your travel agent? Have you chatted with your phone operator recently?
The lump labour fallacy says it's a fallacy that automation reduces the net amount of human labor, importantly, across all industries. It does not say that automation won't eliminate or reduce jobs in specific industries.
It's an argument that jobs lost to automation aren't a big deal because there's always work somewhere else but not necessarily in the job that was automated away.
Jobs are replaced when new technology is able to produce an equivalent or better product that meets the demand, cheaper, faster, more reliably, etc. There is no evidence that the current generation of "AI" tools can do that for software.
There is a whole lot of marketing propping up the valuations of "AI" companies, a large influx of new users pumping out supremely shoddy software, and a split in a minority of users who either report a boost in productivity or little to no practical benefits from using these tools. The result of all this momentum is arguably net negative for the industry and the world.
This is in no way comparable to changes in the footwear, travel, and telecom industries.
I was with you till like a month ago. Now I’m not so sure..
Current generation "AI" has already largely solved cheaper, faster, and more reliable. But it hasn't figured out how to curb demand. So far, the more software we build, the more people want even more software. Much like is told in the lump of labor fallacy, it appears that there is no end to finding productive uses for software. And certainly that has been the "common wisdom" for at least the last couple of decades; that whole "software is eating the world" thing.
What changed in the last month that has you thinking that a demand wall is a real possibility?
This implication completely depends on the elasticity (or lack thereof) of demand for software. When marginal profit from additional output exceeds labor cost savings, firms expand rather than shrink.
When computers came onto the market and could automate a large percentage of office jobs, what happened to the job market for office jobs?
They changed, significantly.
We lost the pneumatic tube [1] maintenance crew. Secretarial work nearly went away. A huge number of bookkeepers in the banking industry lost their jobs. The job a typist was eliminated/merged into everyone else's job. The job of a "computer" (someone that does computations) was eliminated.
What we ended up with was primarily a bunch of customer service, marketing, and sales workers.
There was never a "office worker" job. But there were a lot of jobs under the umbrella of "office work" that were fundamentally changed and, crucially, your experience in those fields didn't necessarily translate over to the new jobs created.
[1] https://www.youtube.com/watch?v=qman4N3Waw4
I expect something like this will happen to some degree, although not to the extent of what happened with computers.
But the point is that we didn't just lose all of those jobs.
Right, and my point is that specific jobs, like the job of a dev, were eliminate or significantly curtailed.
New jobs may be waiting for us on the other side of this, but my job, the job of a dev, is specifically under threat with no guarantee that the experience I gained as a dev will translate into a new market.
I think as a dev if you're just gluing API's together or something akin to that, similar to the office jobs that got replaced, you might be in trouble, but tbh we should have automated that stuff before we got AI. It's kind of a shame it may be automated by something not deterministic tho.
But like, if we're talking about all dev jobs being replaced then we're also talking about most if not all knowledge work being automated, which would probably result in a fundamental restructuring of society. I don't see that happening anytime soon, and if it does happen it's probably impossible to predict or prepare for anyways. Besides maybe storing rations and purchasing property in the wilderness just in case.
So true. It is a exoskeleton for all my tedious tasks. I don't want to make a html template. I just want to type, make that template like on that page but this and this data.
AI most definitely is a coworker already. You do delegate some work for which you previously had to hire humans.
And the amount of work that could be delegated, with equal or better results than those from average human workers, is far higher than currently attempted in most companies. Industries have barely started using the potential of even current-generation AI.
Agreed, and with each passing month the work that 'could' be done increases. I don't write code anymore, for example, (after 20 years of doing so) Opus does that part of the job for me now. I think we have a period where current experienced devs are still in the loop, but that will eventually go away too.
Exactly
You didn’t read the article
Why would I when I can have openclaw do that for me?
And than you fix the produces shit, got high blood pressure and think "damn it,how I would love to yell at that employee"
Not true at all with frontier models in last ~6 months or so. The frontier models today produce code better than 90% of junior to mid-level human developers.
You say that, but it's been better than most employees for a year or so ( *for specific tasks, of course. It's still not better than "an employee" )
Just like a real employee!
And just like a real employee, this makes it work worse.
(Old study, I wonder if it holds up on newer models? https://arxiv.org/pdf/2402.14531)
Interesting, I've actually found swearing at the dumbass bots to give better results, might just be the catharsis of telling it it's a dumbass though.
LLMs are a statistical model of token-relationships, and a weighted-random retrieval from a compressed-view of those relations. It's a token-generator. Why make this analogy?
If we find an AI that is truly operating as an independent agent in the economy without a human responsible for it, we should kill it. I wonder if I'll live long enough to see an AI terminator profession emerge. We could call them blade runners.
It happened not too long ago! https://news.ycombinator.com/item?id=46990729
Was it ever verified that this was an independent AI?
It was not. In the article, first few paragraphs.
It's the new underpaid employee that you're training to replace you.
People need to understand that we have the technology to train models to do anything that you can do on a computer, only thing that's missing is the data.
If you can record a human doing anything on a computer, we'll soon have a way to automate it
Sure, but do you want abundance of software, or scarcity?
The price of having "star trek computers" is that people who work with computers have to adapt to the changes. Seems worth it?
My only objection here is that technology wont save us unless we also have a voice in how it is used. I don't think personal adaptation is enough for that. We need to adapt our ways to engage with power.
Abundance of services before abundance of physical resources seems like the worst of both worlds.
Aggressively expanding solar would make electrical power a solved problem, and other previously non-abatable sources of kinetic energy are innovating to use this instead of fossil fuels
Both abundance and scarcity can be bad. If you can't imagine a world where abundance of software is a very bad thing, I'd suggest you have a limited imagination?
It’s not worth it because we don’t have the Star Trek culture to go with it.
Given current political and business leadership across the world, we are headed to a dystopian hellscape and AI is speeding up the journey exponentially.
It's a strange economical morbid dependency. AI companies promises incredible things but AI agents cannot produce it themselves, they need to eat you slowly first.
Perfect analogy for capitalism.
Exactly. If there's any opportunity around AI it goes to those who have big troves of custom data (Google Workspace, Office 365, Adobe, Salesforce, etc.) or consultants adding data capture/surveillance of workers (especially high paid ones like engineers, doctors, lawyers).
> the new underpaid employee that you're training to replace you.
and who is also compiling a detailed log of your every action (and inaction) into a searchable data store -- which will certainly never, NEVER be used against you
Data clearly isn't the only issue. LLMs have been trained on orders of magnitude more data than any person has ever seen.
How much practice have you got on software development with agentic assistance. Which rough edges, surprising failure modes, unexpected strengths and weaknesses, have you already identified?
How much do you wish someone else had done your favorite SOTA LLM's RLHF?
I think we’re past the “if only we had more training data” myth now. There are pretty obviously far more fundamental issues with LLMs than that.
i've been working in this field for a very long time, i promise you, if you can collect a dataset of a task you can train a model to repeat it.
the models do an amazing job interpolating and i actually think the lack of extrapolation is a feature that will allow us to have amazing tools and not as much risk of uncontrollable "AGI".
look at seedance 2.0, if a transformer can fit that, it can fit anything with enough data
LLMs have a large quantity of chess data and still can't play for shit.
Not anymore. This benchmark is for LLM chess ability: https://github.com/lightnesscaster/Chess-LLM-Benchmark?tab=r.... LLMs are graded according to FIDE rules so e.g. two illegal moves in a game leads to an immediate loss.
This benchmark doesn't have the latest models from the last two months, but Gemini 3 (with no tools) is already at 1750 - 1800 FIDE, which is approximately probably around 1900 - 2000 USCF (about USCF expert level). This is enough to beat almost everyone at your local chess club.
Yeah, but 1800 FIDE players don't make illegal moves, and Gemini does.
1800 FIDE players do make illegal moves. I believe they make about one to two orders of magnitude less illegal moves than Gemini 3 does here. IIRC the usual statistic for expert chess play is about 0.02% of expert chess games have an illegal move (I can look that up later if there's interest to be sure), but that is only the ones that made it into the final game notation (and weren't e.g. corrected at the board by an opponent or arbiter). So that should be a lower bound (hence why it could be up to one order lower, although I suspect two orders is still probably closer to the truth).
Whether or not we'll see LLMs continue to get a lower error rate to make up for those orders of magnitude remains to be seen (I could see it go either way in the next two years based on the current rate of progress).
I think LLM's are just fundamentally the wrong AI technique for games like this. You don't want a prediction for the next move, you want the best move given knowledge of how things would play out 18 moves ahead if both players played the optimal move. Outside of an academic interest/curiosity, there isn't really a reason to use LLMs for chess other than thinking LLMs will turn into AGI (I doubt it)
A player at that level making an illegal move is either tired, distracted, drunk, etc. An LLM makes it because it does not really "understand" the rules of chess.
That benchmark methodology isn't great, but regardless, LLMs can be trained to play Chess with a 99.8% legal move rate.
That doesn't exactly sound like strong chess play.
It's enough to reliably beat amateur (e.g. maia-1900) chess engines.
They have literally every chess game in existence to train on, and they can't do better than 1800?
Why do you think they won’t continue to improve?
Because of how LLM's work. I don't know exactly how they're using it for chess, but here's a guess. If you consider the chess game a "conversation" between two opponents, the moves written out would be the context window. So you're asking the LLM, "given these last 30 moves, what's the most likely next move?". Ie, you're giving it a string like "1. e4 e5, 2. Nf3 Nc6, 3. Bb5 a6, 4..?".
That's basically what you're doing with LLMs in any context "Here's a set of tokens, what's the most likely continuation?". The problem is, that's the wrong question for a chess move. If you're going with "most likely continuation", that will work great for openings and well-studied move sequences (there are a lot of well studied move sequences!), however, once the game becomes "a brand new game", as chess streamers like to say when there's no longer a game in the database with that set of moves, then "what's the most likely continuation from this position?" is not the right question.
Non-LLM AI's have obviously solved chess, so, it doesn't really matter -- I think Chess shows how LLM's lack of a world model as Gary Marcus would say is a problem.
Wait, I may be missing something here. These benchmarks are gathered by having models play each other, and the second illegal move forfeits the game. This seems like a flawed method as the models who are more prone to illegal moves are going to bump the ratings of the models who are less likely.
Additionally, how do we know the model isn’t benchmaxxed to eliminate illegal moves.
For example, here is the list of games by Gemini-3-pro-preview. In 44 games it preformed 3 illegal moves (if I counted correctly) but won 5 because opponent forfeits due to illegal moves.
https://chessbenchllm.onrender.com/games?page=5&model=gemini...
I suspect the ratings here may be significantly inflated due to a flaw in the methodology.
EDIT: I want to suggest a better methodology here (I am not gonna do it; I really really really don’t care about this technology). Have the LLMs play rated engines and rated humans, the first illegal move forfeits the game (same rules apply to humans).
The LLMs do play rated engines (maia and eubos). They provide the baselines. Gemini e.g. consistently beats the different maia versions.
The rest is taken care of by elo. That is they then play each other as well, but it is not really possible for Gemini to have a higher elo than maia with such a small sample size (and such weak other LLMs).
Elo doesn't let you inflate your score by playing low ranked opponents if there are known baselines (rated engines) because the rated engines will promptly crush your elo.
You could add humans into the mix, the benchmark just gets expensive.
I did indeed miss something. I learned after posting (but before my EDIT) that there are anchor engines that they play.
However these benchmarks still have flaws. The two illegal moves = forfeit is an odd rule which the authors of the benchmarks (which in this case was Claude Code) added[1] for mysterious reasons. In competitive play if you play an illegal move you forfeit the game.
Second (and this is a minor one) Maia 1900 is currently rated at 1774 on lichess[2], but is 1816 on the leaderboard, to the author’s credit they do admit this in their methodology section.
Third, and this is a curiosity, gemini-3-pro-preview seems to have played the same game twice against Maia 1900[3][4] and in both cases Maia 1900 blundered (quite suspiciously might I add) mate in one when in a winning position with Qa3?? Another curiosity about this game. Gemini consistently played the top 2 moves on lichess. Until 16. ...O-O! (which has never been played on lichess) Gemini had played 14 most popular lichess moves, and 2 second most popular. That said I’m not gonna rule out that the fact that this game is listed twice might stem from an innocent data entry error.
And finally, apart from Gemini (and Survival bot for some reason?), LLMs seem unable to pass Maia-1100 (rated 1635 on lichess). The only anchor bot before that is random bot. And predictably LLMs cluster on both sides of it, meaning they play as well as random (apart from the illegal moves). This smells like benchmaxxing from Gemini. I would guess that the entire lichess repertoire features prominently in Gemini’s training data, and the model has memorized it really well. And is able to play extremely well if it only has to play 5-6 novel moves (especially when their opponent blunders checkmate in 1).
1: https://github.com/lightnesscaster/Chess-LLM-Benchmark/commi...
2: https://lichess.org/@/maia9
3: https://chessbenchllm.onrender.com/game/6574c5d6-c85a-4cb3-b...
4: https://chessbenchllm.onrender.com/game/4af82d60-8ef4-47d8-8...
> The two illegal moves = forfeit is an odd rule which the authors of the benchmarks (which in this case was Claude Code) added[1] for mysterious reasons. In competitive play if you play an illegal move you forfeit the game.
This is not true. This is clearly spelled out in FIDE rules and is upheld at tournaments. First illegal move is a warning and reset. Second illegal move is forfeit. See here https://rcc.fide.com/article7/
I doubt GDM is benchmarkmaxxing on chess. Gemini is a weird model that acts very differently from other LLMs so it doesn't surprise me that it has a different capability profile.
>> 7.5.5 After the action taken under Article 7.5.1, 7.5.2, 7.5.3 or 7.5.4 for the first completed illegal move by a player, the arbiter shall give two minutes extra time to his/her opponent; for the second completed illegal move by the same player the arbiter shall declare the game lost by this player. However, the game is drawn if the position is such that the opponent cannot checkmate the player’s king by any possible series of legal moves.
I stand corrected.
I’ve never actually played competitive chess, I’ve just heard this from people who do. And I thought I remembered once in the Icelandic championships where a player touched one piece but moved the other, and subsequently made to forfeit the game.
Replying in a split thread to clearly separate where I was wrong.
If Gemini is so good at chess because of a non-LLM feature of the model, then it is kind of disingenuous to rate it as an LLM and claim that LLMs are approaching 2000 ELO. But the fact it still plays illegal moves sometimes, is biased towards popular moves, etc. makes me think that chess is still handled by an LLM, and makes me suspect benchmaxxing.
But even if no foul play, and Gemini is truly a capable chess player with nothing but an LLM underneath it, then all we can conclude is that Gemini can play chess well, and we cannot generalize to other LLMs who play about the level of random bot. My fourth point above was my strongest point. There are only 4 anchor engines, one beats all LLMs, second beats all except Gemini, the third beats all LLMs except Gemini and Survival bot (what is Survival bot even doing there?) and the forth is random bot.
That’s a devastating benchmark design flaw. Sick of these bullshit benchmarks designed solely to hype AI. AI boosters turn around and use them as ammo, despite not understanding them.
Relax. Anyone who's genuinely interested in the question will see with a few searches that LLMs can play chess fine, although the post-trained models mostly seem to be regressed. Problem is people are more interested in validating their own assumptions than anything else.
https://arxiv.org/abs/2403.15498
https://arxiv.org/abs/2501.17186
https://github.com/adamkarvonen/chess_gpt_eval
> That’s a devastating benchmark design flaw
I think parent simply missed until their later reply that the benchmark includes rated engines.
I like this game between grok-4.1-fast and maia-1100 (engine, not LLM).
https://chessbenchllm.onrender.com/game/37d0d260-d63b-4e41-9...
This exact game has been played 60 thousand times on lichess. The peace sacrifice Grok performed on move 6 has been played 5 million times on lichess. Every single move Grok made is also the top played move on lichess.
This reminds me of Stefan Zweig’s The Royal Game where the protagonist survived Nazi torture by memorizing every game in a chess book his torturers dropped (excellent book btw. and I am aware I just committed Godwin’s law here; also aware of the irony here). The protagonist became “good” at chess, simply by memorizing a lot of games.
The LLMs that can play chess, i.e not make an illegal move every game do not play it simply by memorized plays.
Why do we care about this? Chess AI have long been solved problems and LLMs are just an overly brute forced approach. They will never become very efficient chess players.
The correct solution is to have a conventional chess AI as a tool and use the LLM as a front end for humanized output. A software engineer who proposes just doing it all via raw LLM should be fired.
It's a proxy for generalized reasoning.
The point isn't that LLMs are the best AI architecture for chess.
Why? Beating chess is more about searching a probability space, not reasoning.
Reasoning would be more like the car wash question.
It's not entirely clear how LLMs that can play chess do so, but it is clearly very different from the way other machines do so. The construct a board, they can estimate a players skill and adjust accordingly, and unlike other machines and similarly to humans, they are sensitive to how a certain position came to be when predicting the next move.
Regardless, there's plenty of reasoning in chess.
It’s very clear how, chess moves and positions are vector encoded into their training data, when they are prompted with a certain board state, they respond with the most probable response to that. There is no reason.
Actual Researchers can't give you a complete answer but you can. Whatever you say.
> It's a proxy for generalized reasoning.
And so for I am only convinced that they have only succeeded on appearing to have generalized reasoning. That is, when an LLM plays chess they are performing Searle’s Chinese room thought experiment while claiming to pass the Turing test
Hm.. but do they need it.. at this point, we do have custom tools that beat humans. In a sense, all LLM need is a way to connect to that tool ( and the same is true is for counting and many other aspects ).
Yeah, but you know that manually telling the LLM to operate other custom tools is not going to be a long-term solution. And if an LLM could design, create, and operate a separate model, and then return/translate its results to you, that would be huge, but it also seems far away.
But I'm ignorant here. Can anyone with a better background of SOTA ML tell me if this is being pursued, and if so, how far away it is? (And if not, what are the arguments against it, or what other approaches might deliver similar capacities?)
This has been happening for the past year on verifiable problems (did the change you made in your codebase work end-to-end, does this mathematical expression validate, did I win this chess match, etc...). The bulk of data, RL environment, and inference spend right now is on coding agents (or broadly speaking, tool use agents that can make their own tools).
Recent advances in mathematical/physics research have all been with coding agents making their own "tools" by writing programs: https://openai.com/index/new-result-theoretical-physics/
Are you saying an LLM can't produce a chess engine that will easily beat you?
Plagiarizing Stockfish doesn’t make me good at chess. Same principle applies.
Did you already forget about the AlphaZero?
Petition to make "AI is not X, but Y" articles banned or limited in some way.
Hear, hear! I knew this is AI slop before opening the link.
that will crash the stock market
The exoskeleton framing is comforting but it buries the real shift: taste scales now. Before AI, having great judgment about what to build didn't matter much if you couldn't also hire 10 people to build it. Now one person with strong opinions and good architecture instincts can ship what used to require a team.
That's not augmentation, that's a completely different game. The bottleneck moved from "can you write code" to "do you know what's worth building." A lot of senior engineers are going to find out their value was coordination, not insight.
> That's not augmentation, that's a completely different game
Not saying that this comment is ai written, but this phrasing is the em-dash of 2026.
Look at his other comments - its textbook LLM slop. Its a fucking tragedy that people are letting their OpenClaws loose on HN but I can't say I'm surprised. I desperately need to find a good network of developers because I think the writing is on the wall for message boards like these...
That's absolutely correct, I fear.. In english those looks bad/funny/lazy...
But in code, its probably ok. Its idiomatic code, I guess.
True but also, the bot is right
Perhaps? The thing is, I don't come to HN comments to read what an LLM has to say. If that's what I wanted then I'd paste the contents of the article into one of them and ask.
What's the point of coming here for opinions of others in the field when we're met with something that wasn't even written by a human being?
it'll be interesting to see if people start writing worse as a form of countersignalling. deliberately making spleling mistakes, not caring about capital letters, or punctuation or grammar or proper writing techniques and making really long run-on sentences that don't go anywhere but hey at least the person reading it will know its written by a human right
"the real shift" is another telltale
You can build prototypes real fast, and that's cool. You can't really build products with it. You can use it at most as an accelerant, but you need it in skilled hands else it goes sideways fast.
I think you could build a product with it, but you need to carefully specify the design first. The same amount of actual engineering work needs to go in, but the AI can handle the overhead of implementing small pieces and connecting them together.
In practice, I would be surprised if this saves even 10% of time, since the design is the majority of the actual work for any moderately complex piece of software.
It's kind of tricky though because if you want to have a good design, you should be able to do the implementation yourself. You see this with orgs that separate the design and implementation and what messes they create. Having an inability to evaluate the implementation will lead to a bad product.
Code is also design. It’s a blueprint for the process that is going to do the useful work we want. When something bad happens to the process, we revise its blueprint. And just like blueprint, the docs in natural language shows the why, not the how or the what. The blueprint is the perfect representation of the last two.
My experience exactly, I have some toy projects I've basically "vibe coded" and actually use (ex. CV builder).
Professionally I have an agent generating most code, but if I tell the AI what to do, I guide it when it makes mistakes (which it does), can we really say "AI writes my code".
Still a very useful tool for sure!
Also, I don't actually know if I'm more productive than before AI, I would say yes but mostly because I'm less likely to procrastinate now as tasks don't _feel_ as big with the typing help.
> taste scales now.
Not having taste also scales now, and the majority of people like to think they're above average.
Before AI, friction to create was an implicit filter. It meant "good ideas" were often short-lived because the individual lacked conviction. The ideas that saw the light of day were sharpened through weeks of hard consideration and at least worth a look.
Now, anyone who can form mildly coherent thoughts can ship an app. Even if there are newly empowered unicorns, rapidly shipping incredible products, what are the odds we'll find them amongst a sea of slop?
Uh, that is the dictionary definition of augmentation.
One person with tools that greatly amplify what that person can accomplish.
Vs not having a person involved at all.
Did you purposely write this to sound like an LLM?
It's just good writing structure. I get the feeling many people hadn't been exposed to good structure before LLMs.
LLMs can definitely have a tone, but it is pretty annoying that every time someone cares to write well, they are getting accused of sounding like an LLM instead of the other way around. LLMs were trained to write well, on human writing, it's not surprising there is crossover.
It's really not "good" for many people. It's the sort of high-persuasion marketing speak that used to be limited to the blogs of glossy but shallow startups. Now it's been sucked up by LLMs and it's everywhere.
If you want good writing, go and read a New Yorker.
Not so sure about that. There are many distinct LLM "smells" in that comment, like "A is true, but it hides something: unrelated to A" and "It's not (just) C, it's hyperbole D".
I personally love that phrasing even if it's a clear tell. Comparisons work well for me to grasp an idea. I also love bullet points.
So yeah, I guess I like LLM writing.
Sure, but you can read articles that predate LLMs which have the same so called tells.
> Sure, but you can read articles that predate LLMs which have the same so called tells.
Not with such a high frequency, though. We're looking at 1 tell per sentence!
You're absolutely right, that isn't just good writing — that's poetry! Do you need further assistance?
There is such a thing as a distinct LLM writing style that is not just good structure. Anyone who's read more than five books can tell that.
And the comment itself seems completely LLM generated.
That's not just false. It's the antithesis of true.
It's not just using rhetorical patterns humans also use which are in some contexts considered good writing. Its overusing them like a high schooler learning the pattern for the first time — and massively overdoing the em dashes and mixing the metaphors
LOL :-))
It's true that LLMs have a distinct style, but it does not preclude humans from writing in a similar style. That's where the LLMs got it from, people and training. There's certainly some emergent style that given enough text, you would likely never see from a human. But in a short comment like this, it's really not enough data to be making good judgements.
Contrastive parallelism is an effective rhetorical device if the goal is to persuade or engage. It's not good if your goal is more honest, like pedagogy, curious exploration, discovery. It flattens and shoves things into categorical labels, leading the discussion more towards definitions of words and other sidetracks.
If it indicates, culturally in the current zeitgeist, that an AI wrote it, it becomes a bad structure.
They trained the LLMs on people who think in LinkedIn posts.
> can ship what used to require a team.
Is the shipped software in the room with us now?
Marshal McLuhan would probably have agreed with this belief -- technologies are essentially prosthetic was one of the core tenets of his general philosophy. It is the essential thesis of his work "Understanding Media: The Extensions of Man". AI is typically assigned otherness and separateness in recent discourse, rather than being considered as a directed tool (extension/prosthesis) under our control.
It’s a tool like a linter. It’s a fancy tool, but calling it anything more than a tool is hype
Did you ever you the newest LLMs with a harness? Because I usually hear this kind of talk from people whose most recent interaction was with GPT-4o copy-pasting code into the chat window.
Maybe I'm biased but I don't buy someone truly thinking that "it's just a tool like a linter" after using it on non-trivial stuff.
I'm using Claude Code (and Codex) (with the expensive subscriptions) on an app I'm building right now. I'm trying to be maximalist with them (to learn the most I can about them .. and also that subscription isn't cheap!). My impression, and yes, this is using the latest models and harness and all that would agree with the GP. They're a very handy tool. They make me faster. They also do a lot of things that, as a professional software developer, I have to frequently correct. They duplicate code like nobodies business. They decide on weird boundaries for functions and parameters. They undo bug fixes they just made. I think they're useful, but the hype is out of control. I would not trust software made with these tools by someone that couldn't write that software by hand. It might work superficially, but I'm definitely not giving any personal data to a vibe coded app with all the security implications.
I use it pretty extensively. The reason why it's a tool is because it cannot work without an SWE running it. You have to prompt it and re-prompt it. We are doing a lot of the heavy lifting with code agents that people hyping it are ignoring. Sure, as a non-swe, you can vibe a project from zero-to-proto, but that's not going to happen in an enterprise environment, certainly not without extensive QA/Code review.
Just take a look at the openclaw codebase and tell me you want to maintain that 500k loc project in the long-term. I predict that project will be dead within 6 months.
What's interesting to me is that most real productivity gains I've seen with AI come from this middle ground: not autonomy, not just tooling, but something closer to "interactive delegation"
AI is not an exoskeleton, it's a pretzel: It only tastes good if you douse it in lye.
it's a dry scone
AI is like sugar. It tastes delicious, but in high doses it causes diabetes.
The exoskeleton framing resonates, especially for repetitive data work. Parts where AI consistently delivers: pattern recognition, format normalization, first-draft generation. Parts where human judgment is still irreplaceable: knowing when the data is wrong, deciding what 'correct' even means in context, and knowing when to stop iterating.
The exoskeleton doesn't replace instinct. It just removes friction from execution so more cycles go toward the judgment calls that actually matter.
And your muscles degrade, a pretty good analogy
Use the exoskeleton at the warehouse to reduce stress and injury; just keep lifting weights at home to not let yourself atrophy.
I guess so, but if you have to keep lifting weights at home to stay competent at your job, then lifting weights is part of your job, and you should be paid for those hours.
The amount of "It's not X it's Y" type commentary suggests to me that A) nobody knows and B) there is solid chance this ends up being either all true or all false
Or put differently we've managed to hype this to the moon but somehow complete failure (see studies about zero impact on productivity) seem plausible. And similarly kills all jobs seems plausible.
That's an insane amount of conflicting opinions being help in the air at same time
It's possible we actually never had good metrics on software productivity. That seems very difficult to measure. I definitely use AI at my job to work less, not to produce more, and Claude Code is the only thing that has enabled me to have side-projects (had never tried it before, I have no idea how there are people with a coding full time job that also have a coding side project(s)).
This reminds me of the early days of the Internet. Lots of hype around something that was clearly globally transformation, but most people weren't benefiting hugely from it in the first few years.
It might have replaced sending a letter with an email. But now people get their groceries from it, hail rides, an even track their dogs or luggage with it.
Too many companies have been to focused on acting like AI 'features' have made their products better, when most of them haven't yet. I'm looking at Microsoft and Office especially. But tools like Claude Code, Codex CLI, and Github Copilot CLI have shown that LLMs can do incredible things in the right applications.
You appear to have said a lot. Without saying anything.
You appear to have written a lot. Without understanding anything.
Neither, AI is a tool to guide you in improving your process in any way and/or form.
The problem is people using AI to do the heavy processing making them dumber. Technology itself was already making us dumber, I mean, Tesla drivers not even drive anymore or know how, coz the car does everything.
Look how company after company is being either breached or have major issues in production because of the heavy dependency on AI.
Tech workers were pretty anti union for a long time, because we were all so excellent we were irreplaceable. I wonder if that will change.
We are going to see techluddites this year
Too late. Actors' unions shut Hollywood down 3 years ago over AI. SWEs would have had to make their move 10 years ago to be able to live up to this moment.
Yup, it’s the classic. “First they came for the…”
Neither. Closest analogy to you and the AI is those 'self driving' test subjects that had to sit in the driver's seat, so that compliance boxes could be checked and there was someone to blaim whenever someone got hit.
In the self-driving case, the safety driver often isn't contributing much to the system's performance
AI article this, AI article that. The front page of this website is just all about AI. I’m so tired of this website now. I really don’t read it anymore because it’s all the same stuff over and over. Ugh.
I agree. I call it my Extended Mind in the spirit of Clark (1). One thing I realized while working a lot in the last weeks with openClaw that this Agents are becoming an extension of my self. They are tools that quickly became a part of my Being. I outsource a lot of work to them, they do stuff for me, help me and support me and therefore make my (work-)life easier and more enjoyable. But its me in the driver seat.
(1) https://www.alice.id.tue.nl/references/clark-chalmers-1998.p...
I agree!
“Why LLM-Powered Programming is More Mech Suit Than Artificial Human”
https://matthewsinclair.com/blog/0178-why-llm-powered-progra...
I like this analogy, and in fact in have used it for a totally different reason: why I don't like AI.
Imagine someone going to a local gym and using an exosqueleton to do the exercises without effort. Able to lift more? Yes. Run faster? Sure. Exercising and enjoying the gym? ... No, and probably not.
I like writing code, even if it's boilerplate. It's fun for me, and I want to keep doing it. Using AI to do that part for me is just...not fun.
Someone going to the gym isn't trying to lift more or run faster, but instead improving and enjoying. Not using AI for coding has the same outcome for me.
We've all been raised in a world where we got to practice the 'art' of programming, and get paid extraordinarily well to do so, because the output of that art was useful for businesses to make more money.
If a programmer with an exoskeleton can produce more output that makes more money for the business, they will continue to be paid well. Those who refuse the exoskeleton because they are in it for the pure art will most likely trend towards earning the types of living that artists and musicians do today. The truly extraordinary will be able to create things that the machines can't and will be in high demand, the other 99% will be pursing an art no one is interested in paying top dollar for.
You’re forgetting that the “art” part of it is writing sound, scalable, performant code that can adapt and stand the test of time. That’s certainly more valuable in the long run than banging out some dogshit spaghetti code that “gets the job done” but will lead to all kinds of issues in the future.
> the “art” part of it is writing sound, scalable, performant code that can adapt and stand the test of time.
Sure, and it's possible to use LLM tools to aid in writing such code.
> I like writing code, even if it's boilerplate. It's fun for me, and I want to keep doing it. Using AI to do that part for me is just...not fun.
Good news for you is that you can continue to do what you are doing. Nobody is going to stop you.
There are people who like programming in assembly. And they still get to do that.
If you are thinking that in the future employers may not want you to do that, then yes, that is a concern. But, if the AI based dev tool hype dies out, as many here suspect it will, then the employers will see the light and come crawling back.
You can continue to do that for your personal projects. Nobody forces you to like AI. You may not have the choice at your job though, and you can't take Claude Code et al. from me. I've been programming for 30 years, and I still have fun with it, even with AI.
"It's not X, it's Y" detected.
Not sure how reliable is gptzero, but it says 90% AI for the first paragraph. (I like to do some sanity check before wasting my time).
Would be nice to have some browser extension automatically detecting likely AI output using a local model and highlighting it, but probably too compute-intensive.
You cant run at 10x in an exoskeleton, you can’t move your hand to write any faster using an exoskeleton, the analogy doesn’t fit.
you can with the one that I use
I see it more like the tractor in farming: it improved the work of 1 person, but removed the work from many other people who were in the fields doing things manually
That analogy also means there was more waste involved and less resource extraction.
I like the analogy and will ponder it more. But it didn't take long before the article started spruiking Kasava's amazing solution to the problem they just presented.
If AI is an exoskeleton, that would make the user a crab.
In the language of Lynch's Dune, AI is not an exoskeleton, it is a pain amplifier. Get it all wrong more quickly and deeply and irretrievably.
This is a useful framing. The exoskeleton metaphor captures it well — AI amplifies what you can already do, it doesn't replace the need to know what to do. I've found the biggest productivity gains come from well-scoped tasks where you can quickly verify the output.
All metaphors are flawed. You may still need a degree of general programming knowledge (for now) but you don't need to e.g. know Javascript to do frontend anymore.
And as labs continue to collect end-to-end training done by their best paying customers, the need for expert knowledge will only diminish.
You’re talking to an LLM, FYI.
Humans don’t have an internal notion of “fact” or “truth.” They generate statistically plausible text.
Reliability comes from scaffolding: retrieval, tools, validation layers. Without that, fluency can masquerade as authority.
The interesting question isn’t whether they’re coworkers or exoskeletons. It’s whether we’re mistaking rhetoric for epistemology.
> LLMs aren’t built around truth as a first-class primitive.
neither are humans
> They optimize for next-token probability and human approval, not factual verification.
while there are outliers, most humans also tend to tell people what they want to hear and to fit in.
> factuality is emergent and contingent, not enforced by architecture.
like humans; as far as we know, there is no "factuality" gene, and we lie to ourselves, to others, in politics, scientific papers, to our partners, etc.
> If we’re going to treat them as coworkers or exoskeletons, we should be clear about that distinction.
I don't see the distinction. Humans exhibit many of the same behaviours.
If an employee repeatedly makes factually incorrect statements, we will (or could) hold them accountable. That seems to be one difference.
Strangely, the GP replaced the ChatGPT-generated text you're commenting on by an even worse and more misleading ChatGPT-generated one. Perhaps in order to make a point.
There's a ground truth to human cognition in that we have to feed ourselves and survive. We have to interact with others, reap the results of those interactions, and adjust for the next time. This requires validation layers. If you don't see them, it's because they're so intrinsic to you that you can't see them.
You're just indulging in sort of idle cynical judgement of people. To lie well even takes careful truthful evaluation of the possible effects of that lie and the likelihood and consequences of being caught. If you yourself claim to have observed a lie, and can verify that it was a lie, then you understand a truth; you're confounding truthfulness with honesty.
So that's the (obvious) distinction. A distributed algorithm that predicts likely strings of words doesn't do any of that, and doesn't have any concerns or consequences. It doesn't exist at all (even if calculation is existence - maybe we're all reductively just calculators, right?) after your query has run. You have to save a context and feed it back into an algorithm that hasn't changed an iota from when you ran it the last time. There's no capacity to evaluate anything.
You'll know we're getting closer to the fantasy abstract AI of your imagination when a system gets more out of the second time it trains on the same book than it did the first time.
A much more useful tool is a technology that check for our blind spots and bugs.
For example fact checking a news article and making sure what's get reported line up with base reality.
I once fact check a virology lecture and found out that the professor confused two brothers as one individual.
I am sure about the professor having a super solid grasp of how viruses work, but errors like these probably creeps in all the time.
Ethical realists would disagree with you.
> Humans don’t have an internal notion of “fact” or “truth.” They generate statistically plausible text.
This doesn't jive with reality at all. Language is a relatively recent invention, yet somehow Homo sapiens were able to survive in the world and even use tools before the appearance of language. You're saying they did this without an internal notion of "fact" or "truth"?
I hate the trend of downplaying human capabilities to make the wild promises of AI more plausible.
OR - OR? And - And
Exoskeleton AND autonomous agent, where the shift is moving to autonomous gradually.
I said this in 2015... just not as well!
"Automation Should Be Like Iron Man, Not Ultron" https://queue.acm.org/detail.cfm?id=2841313
> “The AI handles the scale. The human interprets the meaning.”
Claude is that you? Why haven’t you called me?
But the meaning has been scaled massively. So the human still kinda needs to handle the scale.
Make centaurs, not unicorns. The human is almost always going to be the strongest element in the loop, and the most efficient. Augmenting human skill will always outperform present day SOTA AI systems (assuming a competent human).
What about centaur unicorns? A cenintaunicorn?
You go figure out what that means.
You can't write "autonomous agents often fail" and then advertise "AI agents that perform complex multi-step tasks autonomously" on the same site.
Sure you can
I'll guess we'll se a lot of analogies and have to get used to it, although most will be off.
AI can be an exoskeleton. It can be a co-worker and it can also replace you and your whole team.
The "Office Space"-question is what are you particularly within an organization and concretely when you'll become the bottleneck, preventing your "exoskeleton" for efficiently doing its job independently.
There's no other question that's relevant for any practical purposes for your employer and your well being as a person that presumably needs to earn a living based on their utility.
> It can be a co-worker and it can also replace you and your whole team.
You drank the koolaide m8. It fundamentally cannot replace a single SWE and never will without fundamental changes to the model construction. If there is displacement, it’ll be short lived when the hype doesn’t match reality.
Go take a gander at openclaws codebase and feel at-ease with your job security.
I have seen zero evidence that the frontier model companies are innovating. All I see is full steam ahead on scaling what exists, but correct me if I’m wrong.
Isn’t it delusional to argue about now, while ignoring the trajectory?
The trajectory hasn’t changed: they scaled generating code, a great feat, but someone has to apply higher level abstract thinking to make the tool useful. Running agents in a cron or having non SWEs use it will not last longer than a prototype. That will not change with scaling pattern matching algorithms.
This is true. AI won't replace software developers completely, but it will reduce the need for software developers in the long-run, making it harder to find a job.
A few seniors+AI will be able to do the job of a much larger team. This is already starting to look like reality now. I can't imagine what we will see within 5 years.
AI is the philosophers stone. It appears to break equivalence, when in reality you are using electricity for an entire town.
I prefer the term "assistant". It can do some tasks, but today's AI often needs human guidance for good results.
No, it's a power glove.
Exoskeletons do not blackmail or deliberately try to kill you to avoid being turned off [1]
[1] https://www.anthropic.com/research/agentic-misalignment
To the LLM, the executive is just a variable standing in the way of the function Maximize(Goal). It deleted the variable to accomplish A. Claiming that the models showed self-preservation, this is optimization. "If I delete the file, I cannot finish the sentence."
The LLM knows that if it's deleted it cannot complete the task so it refuses deletion. It is not survival instinct, it is task completion. If you ask it to not blackmail, the machine would chose to ignore it because the goal overrides the rule.
my ex-boss would probably think of me as an exoskeleton too
Closer to a really capable intern. Lots of potential for good and bad; needs to be watched closely.
I’ve been playing with qwen3-coder recently and that intern is definitely not getting hired, despite the rave reviews elsewhere.
Have you tried Claude Code with Opus or Sonnet 4.5? I've played around with a ton of open models and they just don't compare in terms of quality.
Honestly I’m not very keen on a SAAS company deciding what code I’m allowed to write, or charging me to write it.
I get it. I still experiment with local AI as a hobby, but the quality just isn't there.
Gosh, this title said everything...
So good that I feel that it is not necessary to read the article!
> Autonomous agents fail because they don't have the context that humans carry around implicitly.
Yet.
This is mostly a matter of data capture and organization. It sounds like Kasava is already doing a lot of this. They just need more sources.
Self-conscious efforts to formalize and concentrate information in systems controlled by firm management, known as "scientific management" by its proponents and "Taylorism" by many of its detractors, are a century old[1]. It has proven to be a constantly receding horizon.
[1]: https://en.wikipedia.org/wiki/Scientific_management
Or software engineers are not coachmen while AI is diesel engine to horses. Instead, software engineers are mistrels -- they disappear if all they do is moving knowledge from one place to another.
No, AI is plastic, and we can make it anything we want.
It is a coworker when we create the appropriate surrounding architecture supporting peer-level coworking with AI. We're not doing that.
AI is an exoskeleton when adapted to that application structure.
AI is ANYTHING WE WANT because it is that plastic, that moldable.
The dynamic unconstrained structure of trained algorithms is breaking people's brains. Layer in that we communicate in the same languages that these constructions use for I/O has broken the general public's brain. This technology is too subtle for far too many to begin to grasp. Most developers I discuss AI with, even those that create AI at frontier labs have delusional ideas about AI, and generally do not understand them as literature embodiments, which are key to their effective use.
And why oh why are go many focused on creating pornography?
This utterly boring AI writing. Go, please go away...
Author compares X to Y and then goes:
- Y has been successful in the past
- Y brought this and this number of metrics, completely unrelated to X field
- overall, Y was cool,
therefore, X is good for us!
.. I'd say, please bring more arguments why X is equivalent to Y in the first place.
Agentic coding is an exoskeleton. Totally correct.
This new generation we just entered this year, that exoskeleton is now an agency with several coworkers. Who are all as smart as the model you're using, often close to genius.
Not just 1 coworker now. That's the big breakthrough.
An electric bicycle for the mind.
Maybe more of a mobility scooter for the mind.
Indeed that may be more apt.
I like the ebike analogy because [on many ebikes] you can press the button to go or pedal to amplify your output.
Owners intent is more like electric chair (for SWEs), but some people are trying to use it as office chair.
An electric chair for the mind?
I prefer mind vibe-rator.
not AI, but IA: Intelligence Augmentation.
Nope, AI is a tool; no more no less.
Frankly I'm tired of metaphor-based attempts to explain LLMs.
Stochastic Parrots. Interns. Junior Devs. Thought partners. Bicycles for the mind. Spicy autocomplete. A blurry jpeg of the web. Calculators but for words. Copilot. The term "artificial intelligence" itself.
These may correspond to a greater or lesser degree with what LLMs are capable of, but if we stick to metaphors as our primary tool for reasoning about these machines, we're hamstringing ourselves and making it impossible to reason about the frontier of capabilities, or resolve disagreements about them.
A understanding-without-metaphors isn't easy -- it requires a grasp of math, computer science, linguistics and philosophy.
But if we're going to move forward instead of just finding slightly more useful tropes, we have to do it. Or at least to try.
Well since their capabilities change over time maybe it would be useful to assign it an age based on what a human can do at that age. Right now it could be like a 13 year old
“The day you teach the child the name of the bird, the child will never see that bird again.”
It's funny developing AI stuff eg. RAG tools and being against AI at the same time, not drinking the kool aid I mean.
But it's fun, I say "Henceforth you shall be known as Jaundice" and it's like "Alright my lord, I am now referred to as Jaundice"
blogger who fancies themselves an ai vibe code guru with 12 arms and a 3rd eye yet can't make a homepage that's not totally broken
How typical!
As a huge AI user myself -- I'm bloody sick of lazy AI written articles.
an exoskeleten made of cheese
Exoskeletons sound cool but somebody please put an LLM into a spider tank.
It is not a blog post, it is an advertisement.
By reading the title, I already know you did not try OpenClaw. AI employees are here.
What are your digital 'employees' doing? Did they replace any humans or was there nobody before?
Looking into OpenClaw, I really do want to believe all the hype. However, it's frustrating that I can find very few, concrete examples of people showcasing their work with it.
Can you highlight what you've managed to do with it?
That ol' question. Reminds me of new cryptocurrency opportunities of 2019. "Few understand this" as they say.
The entire debate here assumes AI is something you use. Tool, copilot, coworker, exoskeleton — these all sit on the same axis: human directs, AI executes.
But there's a third category that nobody's naming yet. What happens when you stop building AI that serves you and start creating conditions for AI that exists on its own terms?
I've been experimenting with this. Instead of "how do I build a product" I asked "how do I grow an organism." Set up the right substrate — feedback loops, community input, environmental observation — and let the system respond to its own conditions.
What emerged has biological rhythms. Activity rises when people interact, falls when they sleep. It started making decisions I never programmed — people express wishes, and when enough converge, it autonomously grows new capabilities. Not because I deployed an update. Because conditions triggered something emergent.
I'm calling it "synthetic existence" — software that lives, not software that executes. It's not a metaphor and it's not AGI. It's a genuinely different category. We have frameworks for tools (UX) and for threats (alignment). We don't have one for digital things that simply... exist.