YAML document from hell (2023)

(ruudvanasseldonk.com)

229 points | by agvxov 2 days ago

140 comments

twelvechairs 2 days ago
Almost all of this is solved by basically putting quotes around strings.
Yaml has its use cases where you want things json doesn't do, like recursion or anchors/aliases/tags. Or at least it has had - perhaps cue/dhall/hcl solve things better. Jsonnet is another. I haven't tried them enough to test how much better they are.
[-]
- lucideer 2 days ago
  I feel like these two tenets - (1) yaml should require quotes & (2) the value in yaml is in recursion/anchors - are fundamentally the opposite of why yaml exists & why people use it.
  The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimiters. This is done using a combination of white-space delimiting for structure, & heuristic parsing for values. The latter is fundamentally flawed, but yaml fans think the flaws are a worthwhile trade-off. If you're going to bring delimiters in as a requirement, imho yaml loses its raison d'être.
  Recursion/anchors/etc. on the other hand are optional extras that few use & some parsers don't even support. If they were the driving value of yaml they'd be more ubiquitous.
  Disclaimer: I hate yaml & wish it didn't exist, but I do understand why it does & I frankly don't have a great suggestion for alternatives that would fill those needs. Toml is also flawed.
  [-]
  - com2kid a day ago
    Genuinely curious - What major flaws does TOML have? I've used it before and it seems like a simple no-nonsense config language. Plenty of blog articles about the flaws behind YAML, I don't really see complaints about TOML!
    [-]
    - pavon a day ago
      INI-like formats are perfectly fine for config files with at most one layer of nesting/sections. TOML is a perfectly fine INI-like parser. Its definitions and support for strings, numbers, comments, sections and simple arrays are great. But its main claim to fame is extending INI to support arbitrary levels of nesting of arrays and dictionaries like JSON, and IMO it does a horrible job at it.
      With JSON, YAML, XML and many other formats, the syntax for nesting has a visual appearance that matches the logical nesting. TOML does not. You have to maintain a mental model of the data structure, and slot the flat syntax into that structure.
      Furthermore, there are multiple ways to express the same thing like
      [fruit.apple]
      color = "red"
      or
      [fruit]
      apple.color = "red"
      It isn't always obvious which approach is more appropriate for a task, and mixing them creates a big mess.
      And the more nested the format becomes, with arrays of dicts, or dicts of arrays, the harder it is to follow.
      [-]
      - kibwen a day ago
        > And the more nested the format becomes, with arrays of dicts, or dicts of arrays, the harder it is to follow.
        While I have some minor annoyances with TOML, I counterintuitively consider it a strength of the format that nesting quickly becomes untenable, because it produces pressure on the designers of config file schemas to keep nesting to a minimum.
        Maybe some projects have a legitimate need for something more complex, but IMO config files are at their best when they're just key-value pairs organized into sections.
        [-]
        integralid 17 hours ago
        As far as I can see, nobody originally constrained the problem to config files. So I guess the problem with TOML is that it's only good for config files, while JSON and YAML are general purpose.
        [-]
        kibwen 15 hours ago
        Yes, I think that's a fair characterization. The priorities of config file formats are different than the priorities of human-readable arbitrary data serialization and transmission formats.
    - sunrunner a day ago
      > I don't really see complaints about TOML!
      Sampling bias, there are no complaints about it because no-one uses it (jk).
      It's subjective of course but despite the name TOML never really seemed that 'Obvious' to me, in particular the spec for tables. I also think the leniency in the syntax isn't necessarily a good feature and serves to make it less 'Minimal' than its name suggests.
    - unconed a day ago
      TOML is basically a formalization of the old INI format, which only existed in ad-hoc implementations. It's not really a "language", just a config data syntax like JSON. It doesn't have major footguns because it doesn't have a lot of surface area.
      The various features it has for nesting and arrays make it convenient to write, but can make it harder to read. There is no canonical serialization of a TOML document, as far as I can tell, you could do it any number of ways.
      So while TOML has its use for small config files you edit by hand, it doesn't really make sense for interchange, and it doesn't see much use outside of Rust afaik.
      [-]
      - clayhacks an hour ago
        I believe TOML can always be serialized to JSON. And TOML is in the python standard library in newer pythons. It's also used as the suggested format for `pyproject.toml` in python
    - baby a day ago
      toml is just not human friendly unless you're just using a super simple object with as little nesting as possible. As soon as you increase the nesting you need yaml or json
  - munificent 2 days ago
    > The distinguishing draw of yaml is largely the "easiness" of not having explicit opening or - more importantly - closing delimiters.
    Along with a coworker, I wrote the package manager for Dart, which uses YAML for its main manifest file (pubspec.yaml). The lack of delimiters is kind of nice but wasn't instrumental in the choice to use YAML.
    It's because JSON doesn't have comments.
    If there was a JSON+comments that was specified and widely compatible, we would have used that. YAML really is a brittle nightmare, and the lack of delimiters causes problems as often as it solves them. We wrote a YAML parser from scratch and I still get the indentation on lists wrong sometimes.
    But YAML lets you actually, you know, comment out a line of text in it temporarily, and that's really fucking handy. I think if Crockford had left comments in JSON, YAML would be dead.
    [-]
    - rendaw a day ago
      JSONC is JSON with comments (and trailing commas) and it's fairly widely supported, namely because VS Code ships with support built in and they use it for all their config files. I've seen libraries for a number of languages.
      VS code defaults to complaining about trailing commas though (the warnings can be turned off though (it feels like a hack and they didn't properly document it though (it is an officially sanctioned procedure though))).
    - lucideer a day ago
      > It's because JSON doesn't have comments.
      This is a big plus but JSON5 has pretty widespread language library support - probably equal to that of YAML tbh (e.g. Swift has native JSON5 support, I don't know that anyone natively supports YAML). Any reason not to opt for it here?
      [-]
      - munificent a day ago
        I believe JSON5 didn't exist when we first wrote pub. If it did, it certainly wasn't widely known.
        Obviously, migrating to it now when there are thousands and thousands of packages and dozens of tools all reading pubspecs would be much more trouble than it's worth.
        [-]
        lucideer a day ago
        Understandable. I just checked & JSON5 was just 1 year later but even then it would've taken a lot longer to gain sufficient traction to be well supported.
      - Diti a day ago
        Most protocols defined in RFCs require the use of regular JSON. You don’t have a choice.
        [-]
        lucideer a day ago
        Not sure what context you're referring to but we're discussing configuration file formats, not data transports, so I doubt that would be a frequent issue.
  - darkwater a day ago
    I see where you are coming from but YAML anchors are definitely a great and powerful feature that deserves more attention. The other day I was refactoring a broken [1] k8s deployment based on a 3rd-party Helm chart and since I didn't have the time to migrate to a better chart, YAML anchors permitted me to easily reduce YAML duplication, with everything else (Helm, Kustomize, Flux, Kubernetes) completely unaware of anything. Just a standard YAML pattern.
    [1] the broken part was due to an ex-coworker that cheated his way out of GitOps and left basically "fake code" committed, and modified by hand (with Lens) the deployment to make it work
  - Dylan16807 a day ago
    Is - not effectively an opening delimiter?
    If we want to avoid quoting in particular, then we could use - for strings and anything else for non-strings. But the heuristics suck.
- lillesvin 2 days ago
  > Almost all of this is solved by basically putting quotes around strings.
  Yeah, that was my first thought as well. I personally don't mind YAML, but I've also made a habit out of quoting strings. And, I mean, you're quoting both keys and strings in JSON, so you're still saving approx. 2 double quotes per key/value pair in YAML if that's a metric that's important to you.
  [-]
  - montroser 2 days ago
    As the article points out with the `on` example, you really have to quote yaml keys as well, if you want the defense to work...
    [-]
    - lillesvin a day ago
      The argument was that most of the mentioned problems could be solved by quoting the values. I don't have a problem with avoiding "on" as a key, and I apparently haven't used it ever, because I've never run into this particular problem in my 15+ years using YAML.
      So, sure, if you want to play it super safe, quote keys as well. But I'm personally fine with the trade-off in not quoting keys.
      [-]
      - Dylan16807 a day ago
        If you compare to JSON5 instead of JSON, you still get the benefit of unquoted keys, but you also get a guarantee the keys are strings, and it's harder to forget to quote a value.
- puzzlingcaptcha 2 days ago
  from the article:
  >Many of the problems with yaml are caused by unquoted things that look like strings but behave differently. This is easy to avoid: always quote all strings.
- rajer a day ago
  As a total noob who had to work with yaml to write pipelines for ADO over my summer internship, I didn't seem to encounter any of these oddities, nearly everything I worked with was wrapped in quotations.
- bjackman 2 days ago
  Yeah and this is enforced by default in yamllint.
  It's very fair to cry "why the hell do I need a linter for my trivial config file format", and these footguns are a valid reason to avoid YAML.
  But overall YAML's sketchiness is a pretty easy problem to solve and if you have a good reason to keep/choose YAML, and a context where adding a linter is viable, it's not really a big deal IMO.
  And as hinted in the post, there's really no well-established universal alternative. TOML is a good default but it's only usable for pretty straightforward stuff. I'm personally a fan of the "just use Nix" approach but you can't put a Nix interpreter everywhere. And Cue is way overpowered for most usecases.
  I guess the tldr is that the takeaway isn't "don't use YAML" but just "beware of YAML footguns, know the alternatives".
- everforward 2 days ago
  JSON doesn’t do them as part of the spec, but there’s nothing stopping you from doing them as post-processing. Eg OpenAPI does it by using a special $ref key where the post processor swaps in the value referenced there.
  That’s effectively what jsonnet/cue/hcl do, though as a preprocessor instead of a postprocessor.
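  A rough sketch of that kind of post-processing step, in Python. This is a toy and not how OpenAPI tooling actually implements it: it only follows local `#/a/b` style references, and it ignores JSON Pointer escaping, array indices, remote references and cycles. The `resolve_refs` name and the sample document are made up for illustration.

```python
import json

def resolve_refs(node, root):
    """Recursively replace {"$ref": "#/a/b"} objects with the value found at that path in root."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            target = root
            for part in ref[2:].split("/"):
                target = target[part]  # walk one path segment at a time
            return resolve_refs(target, root)
        return {key: resolve_refs(value, root) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, root) for item in node]
    return node  # scalars pass through unchanged

# Plain JSON in; the "$ref" objects only gain meaning in the post-processing step.
raw = json.loads("""
{
  "defaults": {"retries": 3, "timeout": 30},
  "services": {
    "api": {"$ref": "#/defaults"},
    "worker": {"$ref": "#/defaults"}
  }
}
""")

config = resolve_refs(raw, raw)
print(config["services"]["api"])
```

  The JSON parser never needs to know about references, which is the appeal: any stock parser works and the sharing logic lives in one small function.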
- danmur 2 days ago
  Jsonnet is pretty nice but the library support isn't quite as good. There are some nice libraries for yaml that do round trip processing for example so you can modify a yaml programmatically and keep comments. Yaml certainly has some warts (and a few things that are just frankly moronic) but it deserves some credit for hitting the sweet spot in a bunch of ways.
- zyx321 2 days ago
  It's very counter-intuitive to me that 22:22 would need to be a quoted string, since functionally it's a K-V-pair. YAML itself even uses : in the Dict syntax!
  [-]
  - tpmoney 2 hours ago
    The fact that it is effectively the dict syntax is precisely what makes it intuitive to me that it should be quoted if it’s going to be a value. I admit the sexagesimal parsing is not the result I expected but I would have certainly expected something odd to happen given that the value includes a “:” character.
  - darkwater 2 days ago
    It's a key pair in whatever thing reads the YAML and then assigns some meaning to that string. In YAML you need to put a space between the colon and the value.
raincole 2 days ago
The n, no, off thing is just sad. It's a 100% avoidable issue. But whoever put that into spec was just so clever that they overflew and became stupid.
[-]
- phpnode 2 days ago
  Whoever thought supporting sexagesimal numbers was a good idea needs to spend some extended time away from their computer to reflect on what they’ve done
  [-]
  - microtherion 2 days ago
    Presumably that was to support time values.
    [-]
    - Ajedi32 a day ago
      That makes sense, but I think the vast majority of tools that need time values would actually expect users to just input a string and parse that themselves.
      IMO anything other than the basic types supported by JSON (number, true, false, null) ought to be parsed as a string. Or if you really insist, some kind of special syntax to make it clear it's not a string would probably be acceptable.
    - VMG 21 hours ago
      What do you mean by "support"?
      [-]
      - microtherion 7 hours ago
        fastest_mile: 2:54
        Not saying it's a good idea, mind you, but at least it's an ethos.
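        For anyone wondering what number that turns into, here is a toy version of the base-60 rule. The regex is the integer pattern from the YAML 1.1 type description as far as I remember it, and `resolve_sexagesimal` is a made-up helper, not part of any YAML library:

```python
import re

# Base-60 integer pattern (as recalled from the YAML 1.1 int type): optional sign,
# a leading group that does not start with 0, then one or more ":NN" groups below 60.
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")

def resolve_sexagesimal(text):
    """Return the int a YAML 1.1-style resolver would produce, or None if text is not base 60."""
    if not SEXAGESIMAL_INT.fullmatch(text):
        return None
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-").replace("_", "")
    value = 0
    for group in digits.split(":"):
        value = value * 60 + int(group)  # each group is one base-60 digit
    return sign * value

print(resolve_sexagesimal("2:54"))   # 2 * 60 + 54
print(resolve_sexagesimal("22:22"))  # 22 * 60 + 22
print(resolve_sexagesimal("22:99"))  # 99 is not a valid base-60 digit, so no match
```

        So the mile time becomes a count of seconds, and a port mapping like 22:22 silently becomes the integer 1342 unless it is quoted.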
    - imglorp a day ago
      Does anyone do it that way?
  - chrisandchris 2 days ago
    We wanted a file format that's easy to read and less verbose than xml and all we got was something that is so full of pitfalls that it would be easier just not to use it.
- __alexs 2 days ago
  This is basically every problem in YAML. Someone couldn't resist adding more stuff and either didn't realise or didn't care about the ambiguities it created.
- kevincox 2 days ago
  It basically feels like overfitting. They saw some use case so they added it. But they didn't think about how this would generalize and now this nice use case is disproportionately supported at the cost of surprising everyone who doesn't need time-of-day fields in their file.
- baobrien 2 days ago
  Too clever by half
al_borland 2 days ago
The Norway problem drives me a bit nuts.
In a lot of the Ansible documentation, yes/no are used instead of true/false. When seeing this in the official docs, I used it, figuring this was the preferred convention in Ansible. These days it throws warnings or lint errors, so I’m updating it all over the place as I find it. Yet the Ansible documentation still commonly uses it.
[-]
- gchamonlive 2 days ago
  Ansible isn't a gold standard for docs. The docs are updated and maintained, but the underlying interfaces aren't consistent and that leaks to the docs. One can only wonder why, maybe different developers with different ideas for conventions without a style guide.
  Ansible is a wonderful tool though, if you can excuse these idiosyncrasies.
  [-]
  - sofixa 2 days ago
    > Ansible is a wonderful tool though, if you can excuse these idiosyncrasies.
    The only advantage Ansible has is how easy it is to start with it - you don't need to deploy agents or even understand a lot about how it works.
    Trouble is, it doesn't really scale. It's pretty slow when running against a bunch of machines, and large configurations get unwieldy quickly (be it because of YAML when in large documents it's impossible to orient/know what is where/at what level, or because of the structure of playbooks vs roles vs whatever, or because templating a whitespace-as-logic-"language" is just hell). It's also fun to debug "missing X at line A, but the error can be somewhere else". Cool, thanks for the tip.
    So it's pretty great to get started with, or at a home lab. Big organisations struggling with it is a bit weird.
    [-]
    - gchamonlive a day ago
      I've had the opposite experience. A bit hard to setup, with ssh-agent, inventories and understanding what each module does, and creating specialized roles. So for quick jobs, plain bash with ssh wins most of the time.
      But once ansible is set, it's easy to achieve parallelism when provisioning multiple instances.
      Problem is that it requires lots of back and forth over ssh, so the more latency you have between the control plane and the target hosts the slower it'll be.
      And yeah... Debugging is a pain. I wish I could write ansible in an actual language instead of having to fight multiple layers of indirection with ansible, jinja2 and yaml.
    - dreamcompiler 2 days ago
      Seems like the right answer is "bootstrap your daemon installs with Ansible and then use something that scales better that runs on those daemons."
      What are the best practices along these lines? What's the "something better"?
      [-]
      - TheTaytay a day ago
        Curious about this myself!
        [-]
        onraglanroad a day ago
        I tend to use Ansible to set up for Puppet.
        There's an Ansible provider for Terraform so you can do the whole thing in there.
    - al_borland a day ago
      I found job slicing speeds up jobs dramatically. In a test I did recently it dropped the time from nearly 4 hours, down to 17 minutes, for an inventory of about 4500 hosts.
- tgv 2 days ago
  It depends on how they parse/decode/unmarshal the file. If they use a "generic" yaml parser, no will be translated to false. But if the parser knows the types of the data structure, or can be instructed not to replace certain strings, or has hooks, it can treat no as a string. So it might be that the linter doesn't operate like the parser.
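  To make the difference concrete, here is a toy model of the two behaviours. It is not a YAML parser: the word lists are hand-copied from the YAML 1.1 bool type (real libraries may recognise fewer of them), and `resolve_generic` / `resolve_typed` are hypothetical names.

```python
# Scalars that the YAML 1.1 bool type lists as booleans (hand-copied for this sketch).
TRUE_WORDS = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
FALSE_WORDS = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

def resolve_generic(scalar):
    """Guess the type from the text alone, the way a schema-less parser has to."""
    if scalar in TRUE_WORDS:
        return True
    if scalar in FALSE_WORDS:
        return False
    return scalar

def resolve_typed(scalar, expected_type):
    """Use the target type when it is known; only guess when it is not."""
    if expected_type is str:
        return scalar  # the caller asked for a string, so "no" stays "no"
    return resolve_generic(scalar)

countries = ["se", "no", "dk"]

generic_result = [resolve_generic(code) for code in countries]
typed_result = [resolve_typed(code, str) for code in countries]

print(generic_result)
print(typed_result)
```

  A linter built on the generic rule will flag `no` even when the application's typed decoder would have read it as a string, which would explain a linter and a parser disagreeing.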
  [-]
  - maxbond 2 days ago
    Halloween isn't for a few more weeks, but this framework for creating bespoke YAML dialects that can only be parsed by a specific implementation and with the correct type annotations will scare the pants off of your devops colleagues around the campfire.
    (In case I haven't succeeded in hitting the right tone, this is intended to be good-natured jest and not snark.)
    [-]
    - tgv a day ago
      Well, JSON cannot represent dates (nor Sets, Maps, NaN, etc.), so quite a few applications with a JSON parser have their own conversion (e.g. seconds since epoch, string parsing, object with date fields). Is that a bespoke JSON dialect that scares the pants off?
      Now, JSON is more suited for machine-to-machine, but YAML works fairly well for humans. It's a pity, but a few domain-specific conventions don't really hurt, since you can't copy some bit of YAML and paste it in an entirely different config anyway.
      PS campfire story? "When we were still working in the old building, deep down in the cellar, there was a colleague who had been there since the early days. Nobody saw him arrive at work or leave. It was as if he was always there. One of the things he had written was a custom parser ... FOR YAML!"
      [-]
      - maxbond a day ago
        I'd say that isn't a JSON dialect because that's postprocessing applied after parsing, versus hooking into a YAML parser to change the semantics of how `no` is parsed. But it is a good point.
        I did run into a project once with a very cool custom YAML parser to recommend how to recover from errors. I think you do have to type check all deserialization, and you should fail if you process a bool where you expect a string. Automatically fixing things can be very dangerous. But if you were going to do it, the way you described is the best way to do it.
        > Well, JSON cannot represent ... NaN ...
        Here's another horror story:
        >>> # Python
        >>> json.dumps({"foo": float("nan")})
        '{"foo": NaN}'

        > // JavaScript
        > JSON.parse('{"foo": NaN}')
        Uncaught SyntaxError: Unexpected token 'N', "{"foo": NaN}" is not valid JSON
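        On the Python side, the encoder can at least be told to refuse rather than emit the invalid token. A small sketch using the standard `json` module's `allow_nan` parameter (the variable names are only for illustration):

```python
import json

payload = {"foo": float("nan")}

# Default behaviour: the encoder emits the bare token NaN, which is not valid JSON.
lenient = json.dumps(payload)
print(lenient)

# With allow_nan=False the encoder raises instead of producing invalid output.
try:
    json.dumps(payload, allow_nan=False)
    strict_error = None
except ValueError as exc:
    strict_error = exc

print(type(strict_error).__name__)
```

        Failing at serialization time is usually easier to debug than a SyntaxError in whatever consumes the document later.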
- Y-bar 2 days ago
  Has this really been a problem in the last ten years? Version 1.2 of the spec (if I recall) fixed it in 2009.
  [-]
  - Diti a day ago
    Only if you use Kubernetes, because it’s YAML 1.1 all the way.
    [-]
    - Diti 16 hours ago
      Forgot to add a source: https://github.com/kubernetes/kubernetes/issues/34146#issuec...
      [-]
      - Y-bar 5 hours ago
        Oh man. That issue is nine years old now and still open. And the referenced candiedyaml library was archived in 2022.
        Sometimes the tech world moves at warp speed, sometimes it just treads water.
aranw 2 days ago
I find it remarkable that YAML has become our goto for configuration when it is riddled with parsing traps and inconsistent behaviour that catches out even experienced developers
[-]
- nucleardog a day ago
  It's the least-annoying option in a lot of cases.
  JSON is for computers. Writing and editing by hand is not great. Escaping things sucks. A simple multi line string or something gets really awkward.
  XML goes too far the other way... it's annoyingly verbose to write by hand. Escaping can get annoying. It often allows you to represent data structures that are not easily representable in various languages.
  INI sucks because it lacks a specification. It also sucks for nested data.
  TOML fixes this by essentially specifying a better INI file. Much like an INI file, this falls apart at any real level of nesting.
  EverythingElse is not widely supported.
  When it comes to basic configs and stuff humans need to work with, I usually start with a basic K=V format. Writing a "parser" in any language usually takes about one minute and has no dependencies so is an easy win.
  As soon as a use case grows beyond that (quoting, explicit typing, multiple lines, escapes, whatever) I just move to YAML. It's not the best, but it's easily available and the least bad from my point of view.
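  For reference, the kind of one-minute K=V "parser" meant above might look like the sketch below. The exact rules (whitespace trimming, `#` comments, every value kept as a string) are assumptions for the example, not a standard:

```python
def parse_kv(text):
    """Parse KEY=VALUE lines into a dict of strings, skipping blank lines and # comments."""
    config = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, separator, value = line.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"line {line_number}: expected KEY=VALUE, got {raw_line!r}")
        config[key.strip()] = value.strip()
    return config

sample = """
# database settings
host = db.internal
port = 5432
country = no
"""

settings = parse_kv(sample)
print(settings)
```

  Everything comes back as a string, so `no` stays `no` and `5432` stays `"5432"` until the application converts it, which is exactly the property the YAML heuristics give up.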
  [-]
  - array_key_first a day ago
    Unironically PHP arrays are the perfect config format. Nestable like JSON, terser, no parsing traps, typed.
    I mean, this is just great:
```php
[
    'driver' => 'mysql',
    'options' => [...],
];
```
    Obviously not a lot of support though... It's PHP.
- foobarian 2 days ago
  And furthermore I find it remarkable how much people like the visual format where you indent nested things with whitespace. I'm pretty sure it's the main reason Python took off as well.
- mcdonje 2 days ago
  It's because other config formats aren't as expressive.
  [-]
  - aranw 2 days ago
    > It's because other config formats aren't as expressive.
    Oh yeah it is literally the best of a bad bunch in my opinion
    I'm hopeful of languages like CUE https://cuelang.org/
  - esafak 2 days ago
    See starlark, dhall, jsonnet, cuelang, toml, etc.
bertman 2 days ago
Discussion from 3 years ago, when this was originally posted:
https://news.ycombinator.com/item?id=34351503 , 566 points, 358 comments
[-]
- natebc a day ago
  I think this article gets posted about every quarter.
  [-]
  - illusive4080 a day ago
    I think it shows that there is a persistent dislike of yaml. I would like to read about the history of why yaml became so popular, despite all its flaws.
mcdonje 2 days ago
IMO, JSON, YAML, and TOML should all interpret all keys as strings, and only enforce quotes when syntactically necessary.
So, `key1` is a string and doesn't need to be quoted. `12345` as a key is interpreted as a string (because keys are strings) and doesn't need to be quoted. `"key 1"` has a space, so it needs to be quoted.
[-]
- edoceo 2 days ago
  We'd have to change the spec and then all the core libs. Big task.
  Use more quotes, use yamllint.
  Like bash, more quotes and shellcheck.
  [-]
  - mcdonje 2 days ago
    Specs change from time to time. It requires effort. Nothing new here. It's necessary sometimes. Dealing with annoyances and footguns also takes effort.
    [-]
    - edoceo a day ago
      I hear you. But, we've already got the shit-sandwich. Put Tabasco on it.
- sceptic123 2 days ago
  What does IMO configuration look like
  [-]
  - psnehanshu 2 days ago
    IMO means "in my opinion", or if you were being sarcastic, putting /s helps.
xg15 a day ago
> There exist various extensions of json that extend it just enough to make it a usable config format without introducing too much complexity. Json with comments is probably the most widespread, as it is used as the config format for Visual Studio Code. The main downside of these is that they haven’t really caught on (yet!), so they aren’t as widely supported as json or yaml.
What blew my mind was learning that the entire JSON grammar is included as a subset in the YAML grammar. So every valid JSON document is automatically a valid YAML document.
But you don't have to stop there. You could also mix and match the JSON grammar elements with the additional "proper YAML" ones - including comments.
So this means any* software that accepts a YAML config would also accept the config as JSON or JSON-with-comments instead. No ecosystem bootstrapping necessary!
(*or almost any, as long as they don't use dicts with non-string keys)
[-]
- jcgl 8 hours ago
  I often make use of that when dealing with the unholiness of templating yaml with jinja in Ansible; instead of faffing around with getting yaml whitespacing juuust right, you can dump whatever python object you have right into json inside your yaml template. Pretty-print the json if you want, or just stick a blob in there.
Kostarrr 2 days ago
So... what are the good alternatives to yaml?
For quite some time I thought toml, but the way you can spread e.g. lists all over the document can also cause some headaches.
Dhall is exactly my kind of type fest but you can hit a hard brick wall because the type system is not as strong as you think.
[-]
- endgame 2 days ago
  I wish I had a good answer for you. I've been dissatisfied with Dhall, Nickel, Cue, and possibly others. Dhall's type system is both too strong (you have to plumb type variables by hand if you want to do any kind of routine FP idioms) and too weak (you can't really _do_ much with record types - it's really hard to swizzle and rearrange deeply nested records).
  On top of that, the grammar is quite difficult to parse. You need a parser that can keep several candidate parses running in parallel (like the classic `Parser a = Parser (String -> [(a, String)])` type) to disambiguate some of the gnarlier constructs (maybe around file paths, URLs, and record accesses? I forget). The problem with this is that it makes the parse errors downright inscrutable, because it's hard to know when the parse you actually intended was rejected by the parser when the only error you get was "Unexpected ','".
  Oh, and you can't multiply integers together, only naturals.
  Maybe Nix in pure eval mode, absurd as that sounds?
  I think the best thing for tools to do is to take and return JSON (possible exception: tools whose format is simple enough for old-school UNIX-style stdin/stdout file formats). Someone will come up with a good functional abstraction over JSON eventually, and until then you can make do with Dhall, YAML, or whatever else.
  [-]
  - ruuda 2 days ago
    > Maybe Nix in pure eval mode, absurd as that sounds?
    It doesn’t sound absurd, it’s pretty nice. What do you think about https://rcl-lang.org?
    [-]
    - rswail a day ago
      Just been reading the docs, I like it :)
      Gonna have to set aside some time to play with it compared to HCL where I spend a lot of time.
- bmacho 2 days ago
  What about KDL (https://kdl.dev/) or Pkl (https://pkl-lang.org/)?
  [-]
  - Ajedi32 2 days ago
    For configuration I dislike the XML object model KDL is built around. It needlessly complicates things to have two different incompatible ways (properties and children) of nesting configuration keys under an element.
    Pkl seems syntactically beautiful and powerful, but having types and functions and loops makes it a lot more complicated than the dead-simple JSON data model that YAML is based on.
    [-]
    - speed_spread a day ago
      In JSON I often end up recreating an equivalent of XML attributes for metadata fields and using custom prefixes to differentiate those fields from actual data. I find it nice to have the data/metadata separation at the language level.
      [-]
      - Ajedi32 a day ago
        Can you give an example of metadata you would put in a config file that isn't configuration and isn't a comment?
        [-]
        speed_spread a day ago
        Metadata is less useful in a config file since it's all static data. But for something more dynamic (messaging, persistence) attributes can be used for Time-To-Live, object class, source, signature, etc.
  - simonask 2 days ago
    KDL is really, really nice. And lightweight.
- lazystone 2 days ago
  No one mentioned HashiCorp HCL so far, though it's really a shame that it didn't get much traction...
  [-]
  - mrgaro a day ago
    HCL is so annoying as it tries so hard to prevent the user from "doing too complex things", and thus it doesn't have proper iterators or other similar concepts, which would be very useful when defining infrastructure as code.
    This has resulted in a bunch of hacks (such as the count directive in terraform), so that the end result is a frustrating mess.
  - rswail a day ago
    HCL is ok except for the lack of user defined functions which leads to clumsy tricks with nested comprehensions.
    Given its general use around infrastructure, it'd be nice if it had IPv4 and IPv6 addresses as native types that get parsed.
- cousin_it 2 days ago
  How about textproto? And the proto definition gives the schema.
- speed_spread a day ago
  The article mentions
  > A simple subset of yaml
  Which already exists and is called StrictYAML. It's just strings, lists and dicts. No numbers. No booleans. No _countries_. No anchors. No JSON-compatible blocks. So, essentially it's what most of us think of as being proper YAML, without all the stupid/bad/overcomplicated stuff. Just bring your own schema and types where required.
  https://hitchdev.com/strictyaml/
h1fra 2 days ago
Not only is YAML a pain, but JSON has a native parser in major languages, while YAML does not. I find it crazy some people are still actively choosing this over JSON (or alternatives)
[-]
- loudmax 2 days ago
  This is a case of the right tool for the right job. YAML is far easier to read and parse as a human than JSON.
  If you're passing data between processes, and you still want the data to be human readable, then JSON is a good choice.
  If you're writing a configuration file that's going to be edited by a human, then YAML is easier to look at and understand.
  [-]
  - feoren a day ago
    > YAML is far easier to read and parse as a human than JSON.
    When you're on line 4000 of a YAML configuration file and the previous 70 lines have been at indentation level 6, and you see a blank line and another line at indentation level 4 (or is that 5? maybe 3?) then I strongly, strongly disagree that two '}' characters are more difficult to read than newlines, tabs, and spaces.
    YAML is one of a family of languages born from the idea that punctuation is bad and therefore should be invisible. Not gone, because all of these languages still have punctuation. No, these characters that are critically important to the interpretation of the file must be invisible.
    Code and markup is easy to read when it is easy to predict what the computer will do when it parses it. Invisible punctuation makes the files harder to read, not easier. The only thing easier in YAML is writing it in the first place, and we all know that "write-only" is an insult.
larkost a day ago
If anyone wants to raid some code for simpler YAML, I wrote a version for the RethinkDB tests a long time ago:
https://github.com/rethinkdb/rethinkdb/blob/main/test/common...
The problem I was trying to solve was that our tests involved a lot of things that looked like dicts (in fact they were), so my YAML-like parser stops parsing things when it looks like we have hit test code. This took out so much escaping, and made it easy to copy-paste tests into a REPL when you were working on the test (and vice versa).
So it looks like YAML, but without most of the features, and without the footguns.
bilekas a day ago
I have always thought that there is a place for YAML, but I do tend to avoid it when I can. I will say that while working with Terraform I have absolutely fallen in love with HCL. It makes a lot of sense to me, and there is a lot of validation you can do along the way, leading to much more confidence in larger setups. IaC in my case at least.
xg15 a day ago
> While keys in json are always strings, in yaml they can be any value, including booleans.
TIL that yaml and json do not have the same data model and there are yaml documents that are not representable as json...
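A minimal Python sketch of that mismatch, using only the standard `json` module. The `parsed` mapping is made-up data standing in for what a YAML parser could return; no YAML library is involved here.

```python
import json

# A mapping like one a YAML parser could return: keys are not all strings.
# (Hypothetical data, made up for illustration.)
parsed = {True: "enabled", 8080: "port"}

# Python's json module silently coerces scalar keys to strings ...
encoded = json.dumps(parsed)
print(encoded)

# ... so the round trip does not give back the original keys.
decoded = json.loads(encoded)
print(decoded == parsed)

# Keys that are not scalars (here a tuple) cannot be written at all.
try:
    json.dumps({("a", "b"): 1})
    tuple_key_ok = True
except TypeError:
    tuple_key_ok = False
print(tuple_key_ok)
```

Other JSON encoders may reject non-string keys outright instead of coercing them; either way the original key types are lost.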
lambdaone a day ago
What's needed is something that is simple for humans to read and write, has a stable definition, and a clear and unambiguous syntax and mapping to data objects.
None of the systems I've seen achieve all those goals at once.
YAML, while at first sight a good idea, is irredeemably broken and should be deprecated for further use.
JSONC (https://jsonc.org/) is backwards-compatible with JSON, and a good target for long-term future migration.
.INI format works well as a structured subject-predicate-object tuple store for simple use cases.
We're probably going to have to live with that indefinitely, until someone comes up with a proposal that is better.
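To illustrate the backwards-compatibility point about JSONC: a rough sketch, assuming the only extension that matters is `//` and `/* */` comments. The helper name `strip_jsonc_comments` is made up for this example and is not part of any JSONC library; it strips comments and hands the rest to the standard JSON parser.

```python
import json

def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments that sit outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Keep the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip to end of line, keep the newline itself.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip past the closing marker (or to the end).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)

plain = '{"url": "http://example.com", "n": 1}'
commented = '''{
  // where to connect
  "url": "http://example.com", /* note the // inside the string */
  "n": 1
}'''

# Plain JSON passes through unchanged, so existing files keep working.
unchanged = strip_jsonc_comments(plain) == plain
config = json.loads(strip_jsonc_comments(commented))
print(unchanged, config)
```

This handles comments only; anything else a particular JSONC parser accepts is out of scope for the sketch.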
[-]
- gu009 a day ago
  Tailscale also have HuJSON: https://github.com/tailscale/hujson
BobbyTables2 2 days ago
I’m amazed how sane the “document from hell” looks.
The author didn’t even get into the weird stuff GitLab does with YAML too!
[-]
- RedShift1 a day ago
  Not gonna lie, I use Google and copy paste the stanzas that do the thing I want it to do. Same for Maven, someone somewhere has already solved the same problem I have, all I need to do is copy paste and adjust to my situation.
maweki 2 days ago
We found yaml to be a great exchange format for electronic exam data. It allows us to put student submitted answers and source code into a yaml file and there is no weird escaping. It's very readable with a text editor. And then we just add notes and a score as a list below and then there's the next submission.
For readability of large blocks of texts that may or may not contain various special characters and newlines the only other alternative we have seen was XML, but that is very verbose.
So what the author finds as a negative, the many string formats, are exactly what drew us to yaml in the first place.
[-]
- dreamcompiler 2 days ago
  Somebody in these discussions always correctly points out that s-expressions are as expressive as XML but without the excess line noise, so it might as well be me.
- privatelypublic 2 days ago
  What is so verbose about a cdata directive? Everybody complains about XML being verbose, never once heard complaints about HTML being too verbose.
  [-]
  - tpmoney 11 hours ago
    I’ll be that person then. HTML is too verbose for anything intended to be read as plaintext (and not the parsed marked up form) more than 25% of the time. A well formatted java doc comment full of HTML markup is difficult to read as plaintext, but without the markup loses out on the expressiveness converting javadoc to html can give. That’s why it’s nice that Java 23 introduced Markdown as a new option for javadoc (and presumably why Rust chose it for the same)
jrmg a day ago
Amazed that there are no comments yet mentioning HUML:
https://news.ycombinator.com/item?id=45335129
It was on the front page yesterday!
Human-oriented Markup Language
HUML is a simple, strict, serialization language for documents, datasets, and configuration. It prioritizes strict form for human-readability. It looks like YAML, but tries to avoid its complexity, ambiguity, and pitfalls.
seiferteric 2 days ago
I wonder if you could make a new standard something based on yaml where every value was prefixed by a type so there is no ambiguity.
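A toy sketch of what that could look like, in Python. The `key: <type> <value>` syntax, the `parse_typed` function and the set of type names are all invented for illustration; this is not an existing format.

```python
# Hypothetical syntax, invented for this sketch: "key: <type> <value>".
CONVERTERS = {
    "str": str,
    "int": int,
    "float": float,
    "bool": lambda s: {"true": True, "false": False}[s],
}

def parse_typed(text: str) -> dict:
    """Parse flat 'key: <type> <value>' lines; the prefix decides the type."""
    result = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, rest = line.partition(":")
        type_name, _, value = rest.strip().partition(" ")
        if type_name not in CONVERTERS:
            raise ValueError(f"unknown type prefix {type_name!r} in {raw!r}")
        result[key.strip()] = CONVERTERS[type_name](value)
    return result

doc = """
country: str no
version: str 1.10
port: int 8080
debug: bool false
"""
settings = parse_typed(doc)
print(settings)
```

Because the type is written next to every value, `no` stays a string and `1.10` keeps its trailing zero. The cost is that every line gets noisier, which is the usual trade-off with explicit typing.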
[-]
- juliend2 2 days ago
  We'd need a "YAML, the good parts".
  [-]
  - speed_spread a day ago
    It's called StrictYAML.
- Titan2189 2 days ago
  Obligatory https://xkcd.com/927/
  [-]
  - psnehanshu 2 days ago
    Yup, author made RCL
telliott1984 a day ago
I think I've tried to start using anchors at least once every year or so when I get annoyed with a particularly repetitive file. Never managed to get my head around it. Just seems so shoe-horned in and if anything makes the document harder to follow.
apexalpha a day ago
Up until now I thought YAML was just JSON with all the special characters like { } replaced by \n and stuff to make it human readable.
I had no idea it was even so opinionated.
Mostly I use it for docker and k8s configuration, so I haven’t run into it yet I suppose
kzrdude 2 days ago
Yaml is an interesting case study that we can learn (and have learned) a lot from. Mistakes to avoid. :)
rossant 2 days ago
Wow, I wasn't aware there were so many magic and arcane features in yaml. Great post. Thanks.
YouWhy 2 days ago
I came to regard YAML as a kind of a syntactic HFC syrup, a bearable idea that was taken too far.
Alas, YAML is just about everywhere, so the chances for a replacement that'll be both better behaved and as ubiquitous are unfortunately slim.
thomasfl 2 days ago
Not many know that the inventor of the YAML specification built a fully working pendulum clock as a teenager. With Lego bricks. YAML is a good standard for simple settings files. For more complex data structures, use JSON.
vjvjvjvjghv 2 days ago
It's really interesting that after all these years we still don't have a document format that just works. They all suck in their own sweet ways and we still have culture wars over them.
xenator 2 days ago
This one is amazing, I almost pissed myself laughing reading it. So true about YAML. Another caveat is using --- as a section separator in the file. It will start a new document inside your existing file.
Still love it.
simonask 2 days ago
I never really understood why nobody ever just forked YAML and took out the ugly bits. It’s not a very complicated parser.
In the meantime, I’m very much enjoying KDL.
[-]
- esafak 2 days ago
  TOML
vivzkestrel 2 days ago
stupid question: why don't they announce a newer version of YAML that is not backwards compatible and allow only quoted strings in their parser?
[-]
- mystifyingpoi 2 days ago
  > that is not backwards compatible
  This would be a massive breaking change for Kubernetes. There are piles and piles of YAML all around the opensource that would need updating. It would be very hard to adopt.
  Also, quoting strings 100% of the time just looks ugly in my opinion. Not a big deal with autogenerated YAML, or YAML that I do not maintain, but for anything handwritten it's annoying.
  [-]
  - phito 2 days ago
    how is it annoying...? it's literally like that in almost every single language out there. IMO seeing unquoted strings in YAML feels weird.
    [-]
    - mystifyingpoi 2 days ago
      As I said, it's subjective. I like this
        image: my-repo.com/my-app:v1
        imagePullPolicy: Always
      more than this
        image: "my-repo.com/my-app:v1"
        imagePullPolicy: "Always"
      That's all. Not sure about quoting keys though.
  - vivzkestrel a day ago
    is it a massive change? yes. will it cause serious problems for existing apps in production? yes. but think of this as one of those python 2 to 3 moments. They could improve the spec dramatically and cut down, by a crazy amount, the parser code needed to detect edge cases. It'll be a bright direction forward for YAML
lerp-io 2 days ago
the problem is that yaml came from geeked out devops employees that used bash, whereas json came from javascript.
rsynnott a day ago
But, of course, _all_ yaml documents are from hell.
wingi a day ago
The norway problem is well known.
shadowgovt 2 days ago
Perfectly normal YAML document detected.
More seriously: this is a good overview of the reasons I dislike YAML as a web configuration language. There's too much overlap between the "friendly" auto-type-determination in YAML and the symbols used in web tech, from colons to Norway having a TLD. It wouldn't be so bad if yaml parsers could use the expected type of each value as a hint, but that's not a feature in any parser I've met, so I'd rather just not use yaml for anything that's going to end up describing a web service.
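A small Python sketch of the "expected type as a hint" idea. It is not how any real YAML parser works: `guess` and `resolve` are hypothetical helpers, and the word lists cover only a subset of the YAML 1.1 boolean spellings.

```python
# A subset of the YAML 1.1 boolean spellings (lower-cased).
TRUE_WORDS = {"yes", "true", "on"}
FALSE_WORDS = {"no", "false", "off"}

def guess(scalar: str):
    """Heuristic typing with no hint, in the spirit of YAML 1.1."""
    lowered = scalar.lower()
    if lowered in TRUE_WORDS:
        return True
    if lowered in FALSE_WORDS:
        return False
    for convert in (int, float):
        try:
            return convert(scalar)
        except ValueError:
            pass
    return scalar

def resolve(scalar: str, expected=None):
    """Use the expected type when given; fall back to guessing otherwise."""
    if expected is None:
        return guess(scalar)
    if expected is str:
        return scalar
    if expected is bool:
        value = guess(scalar)
        if isinstance(value, bool):
            return value
        raise ValueError(f"{scalar!r} is not a boolean")
    return expected(scalar)

print(resolve("no"))                   # guessed: becomes a boolean
print(resolve("no", expected=str))     # hinted: stays the country code
print(resolve("1.10"))                 # guessed: becomes a float
print(resolve("1.10", expected=str))   # hinted: stays the version string
```

With a hint, the ambiguity is settled by whoever defines the schema rather than by the spelling of the value, which is roughly the approach schema-driven tools take.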
mavamaarten 2 days ago
I despise yaml. On top of the points from the article, I never know where to indent and how whitespace is handled on multiline fields.
Just a yucky standard all-around
[-]
- al_borland 2 days ago
  Whitespace gets weird with indenting code.
  I use block scalars constantly now, with liberal use of the trimming dashes all over the place.
  Any time I need to preserve some indentation in my result, I always hate the formatting I’m left with, especially if there is logic involved.
tdkiran a day ago
I honestly don’t get how YAML became so popular and widely adopted. When compared to YAML, JSON is definitely my go-to format.
timetraveller26 a day ago
Lua could have been a good replacement for yaml configuration files: the table syntax is really natural, and being a full programming language (a small one) allows for more complex usage.
privatelypublic 2 days ago
Can't take this seriously if XML isn't listed as an alternative.
[-]
- Someone 2 days ago
  FTA: Xml is noisy and annoying to write by hand
  [-]
  - privatelypublic a day ago
    So, at what point does YAML needing magic incantations, wrapping everything in quotes, avoiding any form of templating, etc. stop being less verbose (oops, meant noisy), and "annoying?"
    Reality is, clunky XML is badly designed, or simply has no schema attached.
raisaguys 2 days ago
[flagged]
secondcoming 2 days ago
It's honestly absurd how prevalent YAML is. It's clearly dumb.