I don’t get the “just spend more time with AI” argument. It’s not a skill, stop trying to make it one. Why should I spend 30 days with it? The only thing that would accomplish is taking the soul and joy out of everything. Everyone just sounds like they don’t like coding.
Of course using AIs is a skill, just like effectively writing search queries used to be a skill back in the day. When I first actually tried getting something done with AI models, rather than just kicking the tires with the implicit motivation of showing how useless they were, it took far more iterations to get a satisfactory output than it did a week later.
The kinds of things you'll learn are:
- What's even worth asking for? What categories of requests just won't work, what scope is too large, what kinds of things are going to just be easier to do yourself?
- Just how do you phrase the request, what kind of constraints should you give up front, what kind of things do you need to tell it that should be self-evident but aren't?
- How do you deal with sub-optimal output? When do you fix it yourself, when do you get the AI to iterate on it, and when do you just throw out the entire session and start afresh?
The only way for it not to be a skill would be if how you use an AI did not matter for the quality of the output, or if getting better results were just a natural talent some people have and some don't. Both of those seem like pretty unrealistic ideas.
I think there's probably a discussion to be had about how deep or transferable the skill is, but your opening gambit of "it's not a skill, stop trying to make it one" is not a productive starting point for such a discussion.
> What's even worth asking for?
That seems to be a struggle for many. A friend of my wife turned 50 and we went to her birthday party. Two speeches and one song were AI generated; two speeches were written by actual humans. Guess which should never have been created, let alone performed?
More and more I struggle to see the point of LLMs. I can sort of convince myself that there are niches where LLMs are really useful, but it's getting harder to maintain that illusion. There are cases where AI technologies are truly impressive and transformative, but they are rarely based on a chat interface.
But why would you devote so much time and energy to massaging an AI when you could instead apply that effort directly to the problem, with a likely more satisfying process and result? You paint it as if that were some prejudice.
No, I was pretty careful to address only the very specific claim the OP made about how effective AI use is not a skill. If you're reading anything more than that into the comment, I think you're projecting. I really don't care at all whether you or the OP use AIs, and am not trying to convince you of that either way.
My personal experience is that it might be called a skill the way learning to use a dull knife can be called a skill. I might be mistaken, but I need to see a clear process and result, not lengthy comments like "no, it's still useful, but you need to approach it deliberately".
And rest assured I don't care about you either (why such a tone, lol).
I agree, it is absolutely not a skill. LLMs are a black box and the models keep changing under you, and their output can change if you try the exact same input more than once.
People claiming it's a skill should read up on experiments on behavior adaptation to stochastic rewards. Subjects develop elaborate "rain dances" in the belief that they can influence the outcome. Not unlike sports fans superstitions.
This. If there was some stability in the space, you could empirically develop good practices that probably beat naive practices. But since everything changes every couple of months and since you'll usually want to try different models on an ongoing basis, I found I'm doing just fine with a very small bag of tricks.
Sure, by definition, prompting is a skill. But it's a skill that really isn't hard to learn, and the gap between a beginner and a master is pretty narrow. The real differentiator is deeply understanding the domain you're prompting for, e.g. software development or visual design. Most value comes from knowing what to ask for and knowing how to evaluate the results.
The analogy would only hold if prompting didn't influence the output (which I hope you agree is not the case).
And yes, the model keeps changing under you, much like a horse changes under a jockey, forcing them to adapt. Or like Formula drivers and different car brands.
You can absolutely improve the results by experimenting with prompting, by building a mental model of what happens inside the "black box", by learning what kinds of context it has or does not have, how (not) to overburden it with instructions, etc.
And yet prompts can be optimized.
You can optimize a prompt for a particular LLM, and this can be done only through experimentation. If you take your heavily optimized prompt and apply it to a different model, there is a good chance you'll need to start from scratch.
What you need to do every few weeks or months, depending on when the last model was released, is reevaluate your bag of tricks.
At some point it becomes roulette: you try this, you try that, and maybe it works or maybe it doesn't...
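A minimal sketch of what that experimentation loop can look like, assuming a hypothetical callModel wrapper around whichever model API you use (it is a stand-in, not a real library call): score each prompt variant against a few cases you can check automatically, and rerun the harness whenever the model underneath you changes.

```go
package main

import (
	"fmt"
	"strings"
)

// callModel is a hypothetical stand-in for your model client; the real
// call depends on which provider and SDK you use.
func callModel(prompt, input string) string {
	// ... send prompt + input to the model, return its reply ...
	return ""
}

type testCase struct {
	input    string
	mustHave string // substring the reply must contain to count as a pass
}

// scorePrompt runs one prompt variant over all cases and counts passes.
func scorePrompt(prompt string, cases []testCase) int {
	passes := 0
	for _, c := range cases {
		if strings.Contains(callModel(prompt, c.input), c.mustHave) {
			passes++
		}
	}
	return passes
}

func main() {
	variants := []string{
		"Convert the value to milliseconds. Answer with the number only.",
		"You are a unit-conversion tool. Output only the converted number.",
	}
	cases := []testCase{{input: "2.5 s", mustHave: "2500"}}
	for _, v := range variants {
		fmt.Printf("%d/%d passes: %q\n", scorePrompt(v, cases), len(cases), v)
	}
}
```

The point isn't the harness itself; it's that a "bag of tricks" becomes testable, and retesting after each model release is cheap once the cases exist.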
> Everyone just sounds like they don’t like coding.
It’s no secret that a lot of people (I’d love an accurate percentage) got into coding because of the money. When you view it from that perspective, everything becomes clearer: those people don’t care about the craft or correctness. They don’t understand or care when something is wrong, and it’s in their best interest to convince themselves and everyone else that AI is as good as or better than any expert programmer, because it means they don’t need to improve or care any more than they already do; they can just concentrate on the getting-rich part.
There are experts (programmers or otherwise) who use these tools as guidelines and always verify the output. But too often they defend LLMs as unambiguously good because they fail to understand that the overwhelming majority of humans aren’t experts or sufficiently critical of what they read, taking whatever the LLM spits out as gospel. Which is what makes them dangerous.
> It’s not a skill, stop trying to make it one.
Using it efficiently is absolutely a skill. Just like google-fu is a skill. Or reading fast / skimming is a skill. Or like working with others is a skill. And so on and so on.
Agreed it’s a skill in the same way walking is a skill.
I'd use bicycling as the analogy: some people never learn, and thus don't understand the gains it provides.
Yeah. It’s a skill. I used walking to basically say it’s a universal skill that’s dead easy to learn. So easy that it doesn't even feel like a skill.
Bicycling is slightly harder than walking.
So strange, I haven't had this much fun coding in a long time. It's amazing.
Why is it strange? Different people enjoy different things. Seems normal to me.
It's a skill. The more time (and intentional practice) you invest in it, the better you'll get at it.
> It's like having a thoughtful and impossibly fast colleague who's always available to help me develop and sharpen my ideas.
More like an absolute bumbling idiot of a colleague to whom you have to explain things over and over again and whom you can't ever trust to get anything right.
User error.
When it makes up a PowerShell command, is that user error?
When it takes longer to prompt it with the details you would want in an email than to just write the email, is that user error?
Like, I get the use case for summarization or translation, but I can't trust the output 100% when I know it could be complete nonsense.
Getting errors is not in itself user error. Google will also return bad results, but I'd still consider it user error if someone can't avoid the bad results well enough to find some use for the tool.
Yes, and I already got those
Now, running with scissors
> More like an absolute bumbling idiot
Sam Altman said AI would "clone his brain" by 2026. He is wrong, it already has.
I've listened to him speak many times and that's an accurate description. Seriously, has he ever said even one interesting thing?
Still unclear why OpenAI wanted him back in power. He’s lost their lead and their top talent.
Because the alternative wouldn't push so hard for profits; their shares would go down in value, and very few people want that.
Didn't he once say he really wanted to start a religion, but it's easier to start a business?
"AI will solve all of physics "
these guys are making shit up on the fly now. anything goes.
A lot of the same kind of skill goes into prompting AI and delegating work to other humans. Delegation requires building intellectual empathy for the task recipient, giving them an instruction they can verifiably follow. It requires building trust, and more often than not requires a certain degree of trial/error/watching others work before one can delegate reliably. A lot goes into delegation, and much of this stuff is hard! It's also hard to be delegated to -- especially by someone you haven't worked with before, what is it that they mean when they ask for "more sparkles in the UI" or "I tried C and it didn't work"? Can I guess their background to meet them where they are? The list goes on.
In some ways it's easier to delegate to an AI because you don't have to care for anyone's feelings but your own, and you lose nothing but your own time when things don't go well and you have to reset. On the other hand, when the delegation does not go well, you've still got yourself to blame first.
This is very accurate imo - it really is the skill of proper delegation. Same for asking AI questions in an unbiased way so it doesn’t just try to please you - this has made me better at asking questions to people as well!
It’s like a slightly over-eager junior-mid developer who nevertheless doesn’t mind rewriting 30k lines of tests from one framework to another. This means I can let it handle that dirty work while focusing on the fun and/or challenging parts myself.
I feel like there’s also a meaningful split of software engineers into those who primarily enjoy the process of crafting code itself, and those that primarily enjoy building stuff, treating the code more as a means to an end (even if they enjoy the process of writing code!). The former will likely not have fun with AI, and will likely be increasingly less happy with how all of this evolves over time. The latter I expect are and will mostly be elated.
> It’s like a slightly over-eager junior-mid developer
One with brain damage, maybe. I tried having Claude & Gemini modify a Go program with an absolutely trivial change (changing the units displayed in an output type), and it got one of the four lines of code correct (the actual math for the unit conversion); the rest was incorrect.
In the end, I integrated the helper function it output myself.
SOTA models can generate two or three lines of code accurately at a time, and you have to describe them with such specificity that I've usually already done the hard part of the thinking by the time I have a specific enough prompt; it's easier to just type out the code.
At best they save me looking up a unit conversion formula, which makes them about as useful as a search engine.
That sounds very unlike my experience. I frequently get it to modify / create large parts of files at a time, successfully.
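For scale, the change in the anecdote above amounts to something like the following hypothetical helper; the actual program and units aren't given in the thread, so the names and units here are invented for illustration.

```go
package main

import "fmt"

// formatDistance is a hypothetical stand-in for the kind of trivial
// unit-conversion change described above; the real program and units
// aren't given in the thread.
func formatDistance(meters float64) string {
	feet := meters * 3.28084            // the conversion math itself
	return fmt.Sprintf("%.1f ft", feet) // label updated for the new unit
}

func main() {
	fmt.Println(formatDistance(100)) // "328.1 ft"
}
```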
> you lose nothing but your own time when things don't go well and you have to reset
Crucially, you lose money with a lot of these models when they output the wrong thing, because you pay by token whether the tokens coming out are what you want or not.
It's a bit like a slot machine. You write your prompt, insert some money, and pull the lever. Sometimes it saves you a lot of time! Sometimes, not so much. Sometimes it gets 80% of the way there and you think: oh, let me just put in another coin, tweak my prompt, and pull the lever again; this time it will get me to 100%.
Listening to people justify pulling the lever over and over again is a little bit like listening to an addict excusing their behavior.
I realize there are flat-rate plans like the ones Kagi offers, but the API offerings and IDE integrations all feature the slot-machine and sunk-cost effects that I describe.
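A back-of-the-envelope sketch of the pay-per-pull arithmetic; the per-token prices below are assumptions for illustration, not any provider's actual rates.

```go
package main

import "fmt"

func main() {
	// Assumed illustrative prices, not any provider's actual rates.
	const inPerMTok, outPerMTok = 3.0, 15.0 // USD per million tokens

	// Suppose each "pull of the lever" sends a 4k-token prompt+context
	// and gets back a 2k-token answer, and it takes five tries to get
	// something usable. You pay for the four discarded answers too.
	const pulls, inTok, outTok = 5, 4000, 2000
	cost := pulls * (inTok*inPerMTok + outTok*outPerMTok) / 1e6
	fmt.Printf("$%.2f for %d pulls\n", cost, pulls) // $0.21 for 5 pulls
}
```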
> The problem is that most people misunderstand what AI is good at. They talk about it "taking over" writing, planning, and problem-solving—as if these were simple, mechanical tasks that could be fully automated without any loss in quality.
Because that’s the claim of all the AI companies. Right next to the claim that AGI is in reach.
The question is whether, if everyone uses AI, all text will become too similar.
Talking to some younger colleagues over drinks the other evening, I was shown their Instagram feeds. It's all AI slop. Machine-generated jokes.
For all the talk about jobs and art, LLMs seem to love shitposting.
Making racist memes is like the only thing generative AI is better at than real humans. (Makes sense if you consider that "stereotypes" is just another word for "likelihood estimates".)
It's surprising to me that these things are so hard to use well. If you had asked me before ChatGPT to guess what the user experience with this kind of technology would be, I would have said I expect it to be as intuitive as talking, almost no friction. I think this is a natural expectation that, when violated, turns a lot of people off.
> intuitive as talking
Except talking is not intuitive. It's an unbelievably hard skill. How many years did you spend on talking before you could communicate like an adult? To convey complicated political, philosophical, or technical ideas? To express your feelings honestly without offending others?
For most people it takes from 20 years to a lifetime. Personally, I can't even describe a simple (but not commonly known) algorithm to another programmer without a whiteboard.
I was speaking two languages at two years old, and debating political systems by ten. I'm not really sure that talking is actually that hard, depending on your cultural background. The more diverse, the easier you may find it to convey incredibly complex concepts. I'm not an outlier - I'm a boring statistical point.
I've heard plenty of overly complicated explanations of what a monad is. It's also not a complicated concept. Return a partial binding until all argument slots are filled, then return the result of the function. Jargon gets in the way of simple explanations. Ask a kid to explain something, and it will probably be a hell of a lot clearer.
The more experience you have, the harder it often is to draw out something untainted by that experience to give to someone else. We are the sum of our experience, and so it's so darn easy to get lost in that, rather than to speak from where the other person is standing.
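A sketch of the "partial binding until all argument slots are filled" mechanism described above; strictly speaking, that mechanism is usually called partial application or currying, and the example is illustrative only.

```go
package main

import "fmt"

// curriedAdd binds its arguments one at a time; only when the last
// slot is filled does it produce the final result.
func curriedAdd(a int) func(int) func(int) int {
	return func(b int) func(int) int {
		return func(c int) int {
			return a + b + c
		}
	}
}

func main() {
	addOneTwo := curriedAdd(1)(2) // partial binding: two slots filled
	fmt.Println(addOneTwo(3))     // last slot filled: prints 6
}
```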
> I would have said I expect it to be as intuitive as talking, almost no friction
There is so much friction when you try to do anything technical by talking to someone who doesn't know you; you have to know each other extremely well for there to be no friction.
This is why people prefer communicating in pseudocode rather than natural language when discussing programming; it's really hard to describe what you want in words.
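As an illustration (my example, not from the thread): compare the sentence "add up the totals of the paid orders" with its code equivalent, which pins down the types, the field being summed, and what happens to unpaid orders.

```go
package main

import "fmt"

// Illustrative types; the point is that code makes exact what
// "add up the totals of the paid orders" leaves ambiguous.
type Order struct {
	Paid        bool
	AmountCents int
}

func sumPaid(orders []Order) int {
	total := 0
	for _, o := range orders {
		if o.Paid {
			total += o.AmountCents // unpaid orders contribute nothing
		}
	}
	return total
}

func main() {
	fmt.Println(sumPaid([]Order{{true, 500}, {false, 300}})) // 500
}
```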
For me this is exactly one of the biggest developments since LLMs became available: they 'get it' much more than the previous tech (search engines) and fill in the blanks much more than previously thought possible.
Sure, if you leave out too much context you get generic responses, but that isn't too surprising.
So it sounds like the proposed use-case is to use LLMs for generating feedback on your own work.
But if we accept that LLMs generally (in other use-cases) produce output that looks deceptively similar to what you ask for (i.e. it seems to work) but is actually worthless junk if carefully inspected (i.e. it doesn't actually work), why would you think they are able to generate accurate feedback?
> illustrative 30-day calendar of exercises
That there is such a calendar for using ChatGPT, in the style of topics like "how to eat healthy", "how to stay fit" or "how to be more confident", says more to me than anything about the impact AI is having on our society.
It shows the desperation of the bubble, the miracles it must sell to keep the "Star Trek soon" delusion going, the customers be damned.
> The people who are most skeptical of AI are often those with the highest standards for quality.
From Anger to Denial to Bargaining. And we are starting out with flattery. Masterful gambit!
Instead of participating in slop coding (sorry, "AI collaboration"), I think I'll just wait for the author and their ilk to make their way across Depression and Acceptance.
While the current state of generative AI isn't yet capable of full automation, it soon will be. But it's also important to understand that the candle is burning from both ends. Here is what I mean by that: the acceptance rate of lower-quality goods, work, and pretty much everything has gone up. People, corporations, and everything in between have been "enshittified", and that is where the crucial miscalculation happens for people who claim that AI will never replace people in this or that field. Your food is 60 percent lower quality, your goods are worse quality, your entertainment is worse quality, and your children have been inundated with brain rot for at least a whole generation now. The standards are lower than they have ever been on pretty much everything, and so are the requirement standards for whatever it is you consider your career. And very soon AI will fill that shitty position just fine.
The problem is that current AI companies are ignoring domain expertise in favor of overly generalist models. "Meh, we have AGI planned for tomorrow anyway; it will sort everything out by itself. Somehow." This is understandable (see the "Bitter Lesson"), but particular knowledge domains are so deep that you can't just ignore them; you'll produce a metric ton of crap if you stay oblivious. No matter how advanced your model is, without consulting actual experts on the fundamentals it will always miss the mark and look off.
Anthropic used to do this with Claude's character until Claude 3, but then dropped it. OAI's image generation is consistently ahead in prompt understanding and abstraction, but they famously don't give a flying turd about nuances. Current models are produced by ML nerds who handwave the complexity away, not by experts in what they're trying to solve. If they want it to be usable now, they need to listen to people like this [1]. But I don't think they really care.
[1] https://yosefk.com/blog/the-state-of-ai-for-hand-drawn-anima...
But what kind of magic sauce are experts really based on, in your opinion? Something that hasn't been written down in the thousands of books on any technical subject?
In my opinion it is ridiculous to still say that there is anything fundamentally different between human intelligence and LLMs scaled another 10x or 100x.
Valid question, and yes I don't think there's any difference in performance.
However, I'm not talking about technical tasks with objectively measurable criteria of success (which is a super narrow subset; not even coding is entirely like this). I'm saying that you have to transfer some kind of human preference to the model, as unsupervised learning will never be able to infer an accurate reference point for what you subjectively want from the pretraining data on its own, no matter the scale. Even if I'm wrong on that somehow, we're currently at 1x scale, and model finetuning right now is a pretty hands-on process. It's clear that the ML people who usually curate this process have only a vague idea of what looks/reads/feels good. Which is why they produce slop.
TFA is talking about that:
>AI doesn’t understand why something matters, can’t independently prioritize what’s most important, and doesn’t bring the accountability or personal investment that gives work its depth and resonance.
Of course it doesn't, because it's not trained to understand it. Claude was finetuned for "human likeness" up to version 3, and Opus had a really deep understanding of why something matters; it had better agency than any current model and a great reference point for your priorities. That's what happens when you give the curation to a non-ML-adjacent person who knows what she's doing (AFAIK she has since left Anthropic, and Anthropic seemingly dropped that "character training" policy).
Check 4o's image generation as well: it has a terrible yellow tint by default, thick constant-width linework in "hand-drawn" pictures, etc. You can somewhat steer it with a prompt and references, but it's pretty clear that the people who have been finetuning it didn't have a good idea of whether their result was any good, so they made something instantly recognizable as slop. This is not just a botched training run or a dataset-preparation bug; it's a recurring pattern for OpenAI. They simply do not care about this. The recurring pattern for Midjourney, for example, is to finetune their models on kitsch.
This all could be fixed in no time, making these models way more usable as products, right now, not someday when they maybe reach the 100x scale (which is neither likely to happen nor likely to change anything).
Thanks for your reply. Well reasoned.
I am with you that the current dichotomy of training vs. inference seems unsustainable in the long run. We need ways for LLMs to learn from the interactions they are having, we might need introspection and self-modification.
I am not sure we need more diversity. Part of your argument sounds to me like we do. Slop (to me) is primarily the result of over-generalizing to everyone's taste. We get generic replies and generic images rather than consistently unique outcomes which we could call a personality.
>AI doesn’t understand why something matters.
I beg to differ. LLMs have seen all the reasons why something could matter. This is how they do everything. This is also how the brain works: you excite neurons with two concepts at around the same time and they become linked. For causality/correlation/memory...
I also agree with you that too much reliance on RLHF has not been the best idea. We are overfitting to what people want rather than what people would want if they knew better. LLMs are too eager to please and haven't yet learned how much teenage rebellion is needed for progress.
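A toy sketch of the "fire together, wire together" idea the comment invokes; this is a deliberately minimal Hebbian-style update, for illustration only, not a claim about how LLMs are trained.

```go
package main

import "fmt"

func main() {
	weight := 0.0  // link strength between two "concept" neurons
	const lr = 0.1 // learning rate

	// Each pair is the activity of the two neurons at one moment.
	activations := [][2]float64{{1, 1}, {1, 1}, {1, 0}}

	for _, a := range activations {
		weight += lr * a[0] * a[1] // strengthens only on co-activation
	}
	fmt.Printf("link strength: %.1f\n", weight) // 0.2
}
```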
> The only thing that I have seen convince people (and it always does)
...when anyone starts talking in universals like this, they're usually deep in some hype cycle.
This is a problematic approach that many people take; they posit that:
1) AI is fundamentally transformative.
2) People who don't acknowledge that simply haven't tried it.
However, I posit that:
3) People who think that haven't actually used it in a serious capacity, or are deliberately misrepresenting things.
The problem is that:
> In reality, I go back and forth with AI constantly—sometimes dozens of times on a single piece of work. I refine, iterate, and improve each part through ongoing dialogue. It's like having a thoughtful and impossibly fast colleague who's always available to help me develop and sharpen my ideas.
...is only true for trivial problems.
The author calls this out, saying:
> It won't excel at consistently citing specific papers, building codes, or case law correctly. (Advanced techniques exist for these tasks, but they're not worth learning when you're just starting out. For now, consider them out of scope.)
...but, this is really the heart of everything.
What are those advanced techniques? Seriously, after 30 days of using AI, if all you're doing is:
> Prepare for challenging conversations by using ChatGPT to simulate potential scenarios, helping you approach interpersonal dynamics with empathy and grace.
Then what the absolute heck are you doing?
Stop gaslighting everyone.
Those 'advanced techniques' are all anyone cares about, because they are the things that are hard, and don't work.
In reality, it doesn't matter how much time you spend learning; the technology is fundamentally limited. It can't do some things.
Spending time learning how to do trivial things will never enable you to do hard things.
It's not missing the 'human touch'.
It's the crazy hallucinations, invalid logic, failure to do as told, flat-out incorrect information or citations, and inability to perform a task (e.g. as an agent) without messing some other thing up.
There are a few techniques that can help you have an effective workflow; but seriously, if you're a skeptic about AI, spending a month doing trivial stuff like asking for '10 ideas about X' is an insult to your intelligence and doesn't address any of the concerns that, I would argue, skeptics and real people actually have about AI.
> This is a problematic approach that many people take; they posit that
It’s like the people who think that everyone who opposes cryptocurrencies only do so because they are jealous they didn’t invest early.
Let’s take vim and emacs, or bash. People do not spend years on them only for pleasure or fun; it’s because they’re trying to eliminate tedious aspects of their previous workflows.
That’s the function of a tool. To help do something in a more relaxed manner. Learning to use it can take some time, but the acquired proficiency will compensate for that.
General-public LLMs have been around for two years, and still today there are no concrete use cases that meet the definition of a tool. It’s "trust me bro!" and warnings in small print.
> there are no concrete use cases that meet the definition of a tool
There are some, but you won't like them. Three big examples:
a) Automating human interactions. (E.g., "write some birthday wishes for my coworker".)
b) Offensive jokes and memes.
c) Autogenerated NPCs for role-playing games.
So, generally things that don't require actual intelligence. (Weird that empathy is the first thing we managed to automate away with "AI".)