I'm not sure whether this is an artefact of translation, but things like this don't inspire confidence:
> The "Modern Data Stack" (MDS) is a hot concept in data engineering in recent years, referring to a cloud-native, modular, decoupled combination of data infrastructure
https://github.com/datascale-ai/data_engineering_book/blob/m...
Later parts are better and more to the point though: https://github.com/datascale-ai/data_engineering_book/blob/m...
Edit: perhaps I judged too early. The RAG section isn't bad either: https://github.com/datascale-ai/data_engineering_book/blob/m...
this is great and i bookmarked it so i can read it later. i’m just curious though, was the readme written by chatgpt? i can’t tell if i’m paranoid thinking everything is written by chatgpt
I think it was. It's a wall of information, lots of summary tables, fake warmth, and has that LLM smell to it. I'd be very surprised if this wasn't generated text.
Whether it's GPT or not, it needs rewriting.
I'd have titled the submission 'Data Engineering for LLMs...' as it is focused on that.
Parquet alone is not enough for modern data engineering. Delta and Iceberg should be on the list.
English version: https://github.com/datascale-ai/data_engineering_book/blob/m...
Oh thanks! I've switched the top URL to that now. Submitted URL was https://github.com/datascale-ai/data_engineering_book.
I hope xx123122 won't mind my mentioning that they emailed us about this post, which originally got caught in a spam filter. I invited them to post a comment giving the background to the project but they probably haven't seen my reply yet. Hopefully soon, given that the post struck a chord!
Edit: they did, and I've moved that post to the toptext.
The figures in the different chapters are in English (that's not the case for the image in README_en.md).
Thanks for the heads-up! We noticed that discrepancy as well and have just updated the README_en.md with the correct English diagram. It should be displaying correctly now.
If you are interested in (2026-)internet-scale data engineering challenges (e.g. processing tens to hundreds of petabytes) and pre-training/mid-training/post-training scale challenges, please send me an email at d+data@krea.ai!
Thank you.
How is it possible that a Chinese publication gets to the top of HN?
Thanks for the support! We believe that code and engineering challenges are universal languages.
We are pleasantly surprised by the warm reception. We know the project (and our English localization) is still a Work in Progress, but we are committed to improving it to meet the high standards of the HN community. We'll keep shipping updates!
Never mind.