I recently spent over an hour trying to get ChatGPT to give me some pretty simple rsync commands. It kept giving me command line parameters that didn't work on the version of rsync on my mac. With ~50% of the failures, it would go down troubleshooting rabbit holes and the rest of the time it would "realize" that it was giving incorrect version responses. I tell it to validate each parameter against my version moving forward and it clearly doesn't do that. I am sure I could have figured it out on my own in 5 mins, but I couldn't stop watching the trainwreck of this zeitgeist tech wasting my time doing a simple task.
I am not a coder (much), but I have to wonder if my experience is common in the coding world? I guess if you are writing code against the version that was the bulk of its training then you wouldn't face this specific issue. Maybe there are ways to avoid this (and other) pitfalls with prompting? As it is, I do not see at all how LLMs could really save time on programming tasks without also costing more time dealing with their quirks.
> I tell it to validate each parameter against my version moving forward and it clearly doesn't do that.
I would like an AI expert to weigh in on this point. I run into this a lot. It seems that LLMs, being language models and all, don't actually understand what I'm asking. Whenever I dive into the math, superficially, it kind of makes sense why they don't. But it also seems like transformers or some secret sauce is coded up specially for tasks like counting letters in a word so that AI doesn't seem embarrassing. Am I missing something?
LLMs are next-token prediction models. They "understand" in that, if the previous 1000 tokens are such-and-such, they emit a best guess at the 1001st token.
It "knows" what rsync is because it has a lot of material about rsync in the training data. However it has no idea about that particular version because it doesn't have much training data where the actual version is stated, and differences are elaborated.
You would probably get a much better result if you included the man page for the specific version you have on your system. Then you're not relying on the model having "memorized" the relationships among the specific tokens you are trying to get it to focus on; instead you're passing it all in as part of the input sequence to be completed.
It is absolutely astounding that LLMs work at all, but they're not magic, and some understanding of how they actually work can be helpful when it comes to using them effectively.
Our low-code expression language is not well represented in the pre-training data, so as a baseline we get lots of syntax errors and really bad-looking UIs. But we're getting much better results by setting up our design system documentation as an MCP server. Our docs include curated guidance and code samples, so when the LLM uses the server, it's able to more competently search for things and call the relevant tools. With this small but high-quality dataset, the results also look better than some of our experiments with fine-tuning. I imagine this could work for other docs use cases that are more dynamic (i.e., we're actively updating the docs, so having the LLM call APIs for what it needs seems more appropriate than a static RAG setup).
Words are split into tokens, so it can't really 'see' the letters inside a word; it just knows how the tokens link together.
did you feed the context a link or examples of the exact version of the documentation?
I am not an expert, but I assume that if it has any basic knowledge of the tool, it will try to use that, since it knows chunks of some old version. It won't likely decide to search for the newest docs on its own; you have to tell it to search for the docs for your exact version, or feed those docs into the context yourself.
I find context _sometimes_ works, but more often than not I rephrase the question a million times, try to break it down into smaller problems... whatever. It seems like it just doesn't "understand".
> I am not a coder (much), but I have to wonder if my experience is common in the coding world?
It is, yes. Surely someone will come and tell you it doesn’t happen to them, but all that tells you is that it ostensibly isn’t universal, but still common enough you’ll find no end of complaints.
> Maybe there are ways to avoid this (and other) pitfalls with prompting?
Prompting can’t help you with things not in the training set. For many languages, all LLMs absolutely suck. Even for simple CLI tools, telling an LLM you are on macOS or using the BSD version may not be enough to get them to stop giving you the GNU flags. Furthermore, the rsync change in macOS is fairly recent, so there’s even less data online about it.
https://derflounder.wordpress.com/2025/04/06/rsync-replaced-...
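To make the GNU/BSD divergence concrete (a classic example from sed rather than rsync, but the same failure mode): the in-place edit flag differs between the two, and models trained mostly on Linux answers tend to hand out the GNU form.
```
# GNU sed (most Linux distros): -i works with no argument
sed -i 's/foo/bar/' config.txt

# BSD sed (stock macOS): -i requires a suffix argument, even an empty one
sed -i '' 's/foo/bar/' config.txt
```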
> As it is, I do not see at all how LLMs could really save time on programming tasks without also costing more time dealing with their quirks.
And that’s the best case scenario. It also happens that people blindly commit LLM code and introduce bugs and security flaw they cannot understand or fix.
https://secondthoughts.ai/p/ai-coding-slowdown
https://arxiv.org/abs/2211.03622
Usually, in these edge cases, I go to the documentation page and dump all pages as Markdown into the AI tool (most often Gemini, due to its token count). This context engineering has helped a lot to get better answers. However, it also means I am sometimes consuming a million tokens on relatively simple problems. Like recently, when I needed to solve a relatively simple but specific MermaidJS issue.
Ask Gemini Pro 2.5 to build the rsync command and then give it the man page for your version of rsync. It should succeed the first time.
Here's a command to copy the man page to the clipboard that you can immediately paste into aistudio (on a Mac):
man rsync | col -b | pbcopy
As a general rule, if you would need to look something up to complete a task, the AI needs the same information you do—but it's your job to provide it.
So I paste the man page into the LLM and tell it to only give me parameters that are in that page? Even if it obeyed, it would still choke on how to exclude hidden macOS cruft files from the copy...
No, you don't need to "tell it to only give me parameters that are in that page."
Here's the entire prompt:
I need the rsync command to copy local files from `/foo/bar` to `~/baz/qux` on my `user@example.com` server.
Exclude macOS cruft like `.DS_Store`, etc. Here's the man page:
<paste man page for your rsync that you copied earlier, see above>
If you have trouble talking to an AI, how do you ever expect to merge with Neuromancer’s twin?
> Maybe there are ways to avoid this (and other) pitfalls with prompting?
Not sure about Codex, but in Claude Code you can run commands. So instead of letting it freestyle / guess, do a:
`! man rsync` or `! rsync --help`
This puts the output into context.
Yup even when you tell it your version it forgets pretty quickly. Or agrees it messed up and assures you this time it will give you the correct info for your version number then gives you the same command.
Javascript is a nightmare as they change everything constantly. PHP has backwards compatibility for everything so its not really an issue.
It also gives out dated info on salesforce, and im not just talking about the latest and greatest, it recommends stuff that was deprecated years ago.
Why involve an LLM at all, if you're looking up docs for a particular tool like rsync?
Lots of reasons! First off: where else do I go to learn this stuff? Man pages are a reference for people who work in the CLI all the time, not for virgin learners, as they are necessarily packed with the complete lexicon but with barely a thought to explaining real-world examples of common tasks. There are a million Linux websites with the same versioning issues and inadequate explanations. I guess I could buy an O'Reilly book and learn the topic end to end even though I will only need to know the syntax of a couple of commands.
With an LLM, I can get it to tell me what each parameter it suggests actually does and then I can ask it questions about that to further my understanding. That is a massive leg up over the knowledge spaghetti approach...
> There are a million Linux websites with the same versioning issues and inadequate explanations.
Where do you think the LLM is getting its data from? At least on the website you can see the surrounding discussion and have a chance of finding someone with the same problem as you, or learning something tangential which will help you further down the line.
> With an LLM, I can get it to tell me what each parameter it suggests actually does and then I can ask it questions about that to further my understanding.
If it’s giving you wrong flags, why do you assume the explanations it gives you are accurate? LLMs can make those up just as well.
What you should do is verify the given flags on the man page. Not only will it clarify if they exist, it will also clarify if they’re what you’re looking for, and will likely even point to other relevant options.
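A quick way to do that verification without reading the whole page (a small sketch; swap in whichever flag you're checking):
```
# Strip man's formatting and show the --exclude entry with a few lines of context
man rsync | col -b | grep -n -A 4 -- '--exclude'

# Or, inside man itself, type /--exclude to search and press n for the next match
```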
> There are a million Linux websites with the same versioning issues and inadequate explanations.
So, instead you ask a magic robot to recite a fuzzily 'remembered' version of those websites?
Bear in mind that an LLM does not _know_ anything, and that, despite some recent marketing, it is not 'reasoning'.
Learn to read documentation. It's really a very important skill, and many people were becoming _somewhat_ deficient in it even before our good friends the magic robots arrived, due to Stackoverflow et al.
Bah! Coders need to learn to write documentation. The tldr for rsync was hilarious! Actually, this is one place where LLMs are currently useful! Instead of getting LLMs to write code, get it to write documentation.
One pitfall is that the LLM hallucinates: it might sometimes seem to fulfill your requirements but subtly break down. The man pages can be used in conjunction to fact-check your understanding.
I like the tldr pages for learning the most common features and use cases of new command line tools! I think it's great, albeit a bit slow sometimes.
tldr pages are a great idea, but the execution is a total fail. Looking at the rsync entry, it fails to provide the most blatantly common requirements:
I just needed two commands: one to mirror a folder from one drive to another, updating only the changes (excluding all of the hidden macOS cruft), and another command to do a deep validation of the copy. These have to be two of the most commonly used commands, right???
In the end, I felt that messing around with precious data without being 100% certain of what I am doing just wasn't worth it so I got a GUI app that was intuitive.
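For what it's worth, here is a minimal sketch of the two commands described above. The paths and exclude patterns are placeholders, the flags assume a standard rsync 3.x, and anything involving --delete deserves a --dry-run first, so check everything against your local man page before pointing it at precious data.
```
# 1) Mirror source to backup, transferring only changes and skipping macOS cruft
#    (preview first; drop --dry-run once the output looks right)
rsync -av --delete --dry-run \
  --exclude '.DS_Store' --exclude '._*' \
  --exclude '.Spotlight-V100' --exclude '.Trashes' --exclude '.fseventsd' \
  /Volumes/Source/ /Volumes/Backup/

# 2) Deep validation: compare file contents by checksum and list any differences
rsync -avn --checksum --delete \
  --exclude '.DS_Store' --exclude '._*' \
  /Volumes/Source/ /Volumes/Backup/
# An empty file list between the header and the summary means the contents match.
```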
Not the op, but I sometimes find the official documents hard to parse. Not looking at rsync in this case. On the other hand I have the same experience with LLMs as the op.
Big thanks to all the doc writers who include vignettes right in the documents.
I’m using “aichat”, and have all my man pages in a RAG. It’s far faster to query that than it is for me to read through it manually.
I guess you never used ffmpeg, there is a whole industry of "how to do X with ffmpeg"
> I recently spent over an hour trying to get ChatGPT to give me some pretty simple rsync commands.
Try N times, adding more context about the environment and error messages along the way. If it doesn't work after those, try other models (Claude, Gemini, ...). If none of those work on whatever number of attempts you've chosen, then LLMs won't be able to help you well enough and you should save yourself some time and look elsewhere.
A good starting point is trying for 10-20 minutes, after which point an LLM might actually become slower than going the old-fashioned way of digging into docs and reading forum posts and such. There are also problems that are simply too complex for LLMs, and they'd just take you in circles no matter how long you try.
As a programmer I have noticed this problem much more with command help than with code. Maybe partly because the training data has way more, and more diverse, code examples than all the relevant permutations and use cases for command argument examples.
I'm a coder and I've never had your experience. It usually does an amazing job. I think that coders have an advantage because there are many questions I would never ask an LLM, because of my intuition about what would work well and what wouldn't. In your case I would have dumped the output of `rsync --help` into the context window once I saw it wasn't familiar with my particular version of rsync. That's the way these tools work.
I've recently been misled by ChatGPT a lot as well. I think it's the router. I'm on the free plan so I assume they're just being tight with the GPU cycles.
I am on a $20 plan and using the "thinking" version of 5.
LLMs are a very specific kind of beast. Using an `rsync --help` or getting any kind of specific documentation into context would have unblocked you.
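In practice that can be one pipeline before you ask the question; a sketch of the same pbcopy trick mentioned above, on a Mac:
```
# Put the exact version banner and the full option list on the clipboard,
# then paste it at the top of the chat before asking for a command
{ rsync --version; rsync --help; } 2>&1 | pbcopy
```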
All the time I find that if I know the stack and tools I am working in, it's faster to just write the code on my own, manually. If I want to learn, on the other hand, LLMs are quite useful, as long as you understand (or learn to understand) and validate the output.
Instead of just saying "rsync on my system is version 3.2", have you tried copy/pasting `rsync --help`? In my experience, that would be enough for the AI to figure out what it needs to do and which arguments to use. I don't treat AI like an oracle, I treat it like an eager CS grad. I must give it the right information for the task.
What is the cost (in tokens/$$$) of spending an hour restating questions to a chat bot vs typing `man rsync`?
Telling the agent to execute "man rsync" and synthesize the answer from there is probably the cheapest and most efficient option.
Letting some detached LLM fumble around for an hour is never the right way to go, and inversely, sifting through the man page of rsync or ffmpeg or (God forbid) jq to figure out some arcane syntax isn't exactly a great use of anyone's time either, all things considered.
Sifting through just means you don’t know how to use the man interface to search/grep (which, from a discoverability perspective, is fair). However, I think leaning on an LLM (using an agent or not) for a task that probably could take <10 mins at $0 demonstrates enthusiasts’ disregard for a good set of research and reading habits.
All of this is an attempt at circumventing RTFM because you’re privileged enough to afford it.
Just lay yourself down on the WALL-E floating bed and give up already.
The “fucking” manual is obtuse, overly verbose, and almost always lacking in super clear real-world examples. That you claim it is a <10 min problem demonstrates experts’ disregard for the degree of arcane crap a beginner needs to synthesize.
I am backing up and verifying critical data here. This is not a task that should be taken lightly. And as I learned, it is not a task that one can rely on an LLM for.
I disagree. The topic is rsync, which is well documented. The path forward might not work in every situation, but the manual has good examples too. This is probably true for what LLM users use it for 90% of the time: to hack together things that are well known and well documented into a result. LLMs arguably only work because these tools (ffmpeg, rsync, etc.) were already solving problems and were widely used and documented. So burning energy to have a computer look up commands because you couldn’t spend 10 minutes reading yourself could be a waste of time and money, whereas having to spend time researching is, at worst, only a waste of time.
> rsync which is well documented
Here is the man page entry for the --delete flag:
--delete is used. This option is mutually exclusive with --delete-during, --delete-delay, and --delete-after.
Hilarious!
Reading and understanding the rsync command would take much more than 10 mins and I am not a total newb here.
What command? Just to clarify, there is no example command we’re discussing here. You’re just cherry-picking results exclusively from the man page and then arguing that ChatGPT is better because it gets to use example documentation from the internet. Well, I get to use examples from the internet too.
A search query takes a matter of seconds to type in, select a result and read. No doubt still under 10 minutes.
But still, to my original point: it’s insanely more expensive to have ChatGPT look it up. This doesn’t bother you because you are privileged enough to waste money there. If time is money, then IMO the only valuable time I have with my money is when it’s gaining interest and not being spent.
You can abstract away all the “but I had to scroll down the page and click a different result” steps as “time savings” all you want, but no one was wasting a ton of time there for already well established tools. That is a deluded myth.
I’m not sure I even grasped your point. The delete flag is pretty self explanatory and gives you options for more granularity. Why does that take greater than 10 mins? What is the issue with that entry?
Here is what I get when I type `man rsync`:
```
--delete
       This tells rsync to delete extraneous files from the receiving
       side (ones that aren't on the sending side), but only for the
       directories that are being synchronized. You must have asked
       rsync to send the whole directory (e.g. "dir" or "dir/") without
       using a wildcard for the directory's contents (e.g. "dir/*")
       since the wildcard is expanded by the shell and rsync thus gets
       a request to transfer individual files, not the files' parent
       directory. Files that are excluded from the transfer are also
       excluded from being deleted unless you use the --delete-excluded
       option or mark the rules as only matching on the sending side
       (see the include/exclude modifiers in the FILTER RULES section).

       Prior to rsync 2.6.7, this option would have no effect unless
       --recursive was enabled. Beginning with 2.6.7, deletions will
       also occur when --dirs (-d) is enabled, but only for directories
       whose contents are being copied.

       This option can be dangerous if used incorrectly! It is a very
       good idea to first try a run using the --dry-run (-n) option to
       see what files are going to be deleted.

       If the sending side detects any I/O errors, then the deletion of
       any files at the destination will be automatically disabled.
       This is to prevent temporary filesystem failures (such as NFS
       errors) on the sending side from causing a massive deletion of
       files on the destination. You can override this with the
       --ignore-errors option.

       The --delete option may be combined with one of the --delete-WHEN
       options without conflict, as well as --delete-excluded. However,
       if none of the --delete-WHEN options are specified, rsync will
       choose the --delete-during algorithm when talking to rsync 3.0.0
       or newer, or the --delete-before algorithm when talking to an
       older rsync. See also --delete-delay and --delete-after.
```
I pasted the results of typing man rsync into my macbook's terminal. I looked up the --delete parameter and pasted the entry. Not sure why your entry was more useful - perhaps a version issue (which is at the root of the painful time I have spent trying to learn how to do something trivial).
Later in the man page, it gives examples and totally fails to explain them. And yes, someone who is going to be doing this frequently and professionally should understand it deeply and spend the hours required to become fluent in a command with a kitchen sink full of parameters. I, on the other hand, will be executing these commands maybe a few times in a year.
The more I think about it, the more I think the solution here is to use LLMs to write better documentation, with lenses for different types of users with different needs.
Lower than the cost of a meatbag-office with heating, cooling and coffee for the time spent reading the manual.
How is the cost of an office relevant? LLMs didn’t destroy offices. You can type `man rsync` without any of that.
Pretty much my experience, yes.
I can relate to a lot of this.
Where I find AI most useful is getting it to do tasks I already know how to do, but would take time.
If you understand the problem you are trying to solve well enough to explain it to the LLM, you can get good results, you can also eyeball the outputted code and know right away if it's what you are after.
Getting it to do things you don't know how to do is where it goes off the rails IMO
This post (aside from the title) is fairly nuanced, with the reality that "let an LLM do all the things" is going to be fraught with problems... but "let an LLM do some very specific things that saves me an hour or so from boilerplate code or tests" is very nice. Of course, I doubt a blog post titled "Sometimes AI is ok, sometimes it's not" would get the clicks.
Exactly. AI is your intern, not your contractor.
More precisely, AI is your intern who won't improve during the internship, ever.
Just to play devils advocate - why do you think this is so?
And if you have a compelling thesis, why hasn't this spread to the investing community?
> why do you think this is so?
Well, because nothing in the transformer architecture supports such learning ability. All AI researchers and most serious AI users are aware of this, so I'm not sure I understand the question.
> And if you have a compelling thesis, why hasn't this spread to the investing community?
The investing community believes that they can make money. That feels pretty much orthogonal to whether the metaphorical intern can learn, and much more related to whether clients can be made to buy the product, one way or another.
They mean it doesn't learn from experience/mistakes after spending time on your codebase. There are workarounds like documenting in CLAUDE.md/CODEX.md/.roorules, which get dumped back into the active context, but they are hit and miss in my experience. It's definitely better than nothing, but Claude still routinely ignores important directives whenever it's in the right mood.
And is also a pathological liar lol
Yes, that, too.
> There is never enough context. We learned quickly that the more context we provided and the smaller the issues, the better the results. However, no matter how much context we provided, the AI would still mess things up because it didn’t ask us for feedback. AI would just not understand if it didn’t have enough information to finish a task, it would assume, a lot, and fail.
Is it me, or does it feel like the genie-in-the-bottle thing? I remember a TV show where the guy and his friend sat down with the genie, like lawyers, to make sure every angle was covered (going to spare you the details here). That is what it feels like interacting with an LLM sometimes.
I think the AI acts like that shitty coworker who is super smart but never tells you what they are thinking. They are probably capable of doing whatever you want them to do, but working on a team is asking too much; they are apparently not capable of that. AI promises that you can interact with it like it is human, because of its chat capabilities, but it never ever says something like, "hey, I don't understand this part, can you tell me more about what you mean here?"
Well, except that (I think I know the scene you're referring to), it ultimately worked. The LLM, on the other hand, will feel no need to stick to its 'promises'.
(Really the genie is closer to the traditional sci-fi AI in that it's legalistic and rules-bound; the LLM very much isn't.)
I don't like vibe coding as much as actual coding, but the biggest improvement in my workflow was shifting left even more.
Now I dedicate at least one session to just writing a spec file, and have it ask me clarifying questions about my requirements, based on what it finds in the codebase and online. I ask it to also break down the implementation plan into phases, with a checklist for each phase.
I then start at least one new session per phase and make sure to nail down that phase before continuing.
The nice thing is if it gets annoying to vibe code it, I or someone on my team can just use the spec to implement things.
I decided to adopt AI-assisted coding for a recent project. Not sure what defines 'vibe coding', but the process I ended up with was an iterative interaction at a measured pace.
I used Gemini AI Studio for this and I was very pleased with the result, so I decided to open source it. I have completely captured and documented the development transcript. Personally, it has given me a considerable productivity boost. My only irritation was the unnecessary over-politeness that the AI adopts. My take is:
AI yields good ROI when you know exactly what you want at the end of the process and when you want to compare and contrast decision choices during the process.
I have used it for all artifacts of the project:
- Core code base
- Test cases
- Build scripts
- Documentation
- Sample apps
- Utilities
Transcript: https://gingerhome.github.io/gingee-docs/docs/ai-transcript/...
Project: https://github.com/gingerhome/gingee
Yes, this is how I use AI.
Indeed, self-invented abstractions are a bridge too far for AI.
You have to keep it close to the path already walked before by thousands of developers.
This makes AI more of a search engine on steroids than anything else.
ChatGPT is literally just a search engine that Google should've moved to, but they waited because they didn't want to touch their assets in place.
I agree that Google were too slow to move, but entirely disagree with the first part. ChatGPT is very much not a "search engine". Arguably it is an "Answer engine", but more so, it is a conversational partner - I almost never use ChatGPT to just get one response; the real benefit is being able to follow up with it until I'm satisfied. It's an entirely different medium of interaction as compared to search engines.
GPT is orders of magnitude more expensive to run, though.
My preferred approach in similar situations is to ask an LLM for an initial solution or code snippet, then take over manually - no endless prompt tweaking, just stop prompting and start coding. Finally (optionally), I let the LLM do a final pass to review my completed solution for bugs, optimizations, etc.
The key win is skipping the prompt refinement loop, which is (A) tedious and time-consuming, and (B) debilitating in the long run.
"We just don’t think we will incorporate AI to do more than that, given the current state of things. We will, however, keep an eye in case the technology changes fundamentally."
I wonder whether LLMs are capable of doing more; probably, we need another paradigm for that; still, they are very, very useful when used right
> I wonder whether LLMs are capable of doing more
I don't see how that is a question. I come up with new ideas to improve the LLM-based tools I'm using at least once a day, and the vast majority of these are plain engineering changes that I could do on my own if I wanted to put the effort into it. I think that even if God comes down from heaven to prevent us from further training the LLMs themselves (if God is listening to Yudkowsky's prayers), then we would still have a good few decades of extensively improving the capabilities of LLM-based tools to extract a massive amount of further productivity by just building better agentic wrappers and pipelines, applying proper software development and QA methodology.
True!
> However, no matter how much context we provided, the AI would still mess things up because it didn’t ask us for feedback.
The proceeding without clarifying or asking questions thing really grinds my gears.
Using AI to improve facebook ads... y'all are the breakers from the Dark Tower series.
Not asking for feedback is the killer for me. Even most junior developers will ask for more information if they don't have enough context/confidence to complete a task.
I often ask Claude to scan through the code first and then come back with questions related to the task. It sometimes comes back with useful questions, but most of the time it acts like a university student looking for participation marks in a tutorial, choosing questions to signal understanding rather than to be helpful.
I have taken to appending "DO NOT START WRITING CODE." to almost every prompt. I try to get it to analyze, ask questions, and summarize what it's going to do first, and even then it will sometimes ignore that and jump into writing (the wrong) code. A big part of the wrangling seems to be getting it to analyze or reason before charging down a wrong path.
If you use Claude Code you can go into plan mode, where it doesn't write code, you can back and forth.
Gemini is terrible for this
GitHub just released spec-kit, which I think attempts to get the human more involved in the spec/planning/task-building process. You basically instruct the LLM to generate these docs and you tweak them to flesh things out and fix mistakes. Then you tell the LLM to work on a single task at a time, reviewing in small chunks.
That's how everyone is already using Claude Code; it's not GitHub's idea. You go into plan mode, get it to iterate on the idea, then ask it to make (and save) a to-do list as a markdown file. Then you get it to run through the to-do list, checking tasks off as it goes.
This aligns very well with my experience and what I’ve commented on other posts!
"AI, but verify" -- Winston Churchill (alternate universe)
I don't understand why people take bad coding practices and just let AI run with it and then expect nothing but poor quality code. Nothing about the AI revolution here changes how good software has always been written. Write tests, use a typed language, review code. If you have good patterns, good procedures, AI fits right in and fills in the blanks perfectly. Poor AI results tend to be the pot calling the kettle black.
I think bad practices will always be around, as most code on GitHub was probably written with bad practices. The well is poisoned.
Out of curiosity, how do you think the model producers will/would attempt to discern what information on the web is of high quality vs not so high quality (i.e. poisonous)? Akin to clean/drinkable water vs dirty water/harmful water in the well.
I don’t think they will. The well will always have some level of poison, if all information has bias and intent. Bad software design is like bad grammar: it’s ubiquitous.
The real problem is the quality of knowledge and education of engineers. No amount of AI fixes that (until you completely displace labour as an input, that is).
I mean, it sounds like reviews and tests are already their standard practice, and explicitly part of their AI practice. So it should have worked, right?
Filler content.
> Our marketing director (that’d be me) said that if we don’t write something about it, we will be left behind...
Write when you have something to say. What was I supposed to learn here?