Ah these poor fools. Having built this exact product (an OOXML-compatible editor in React) before, it took me all of two minutes to find a bug. The issue is that the OOXML spec is not in fact definitive - Word is, and trying to implement it from the spec will produce something that works maybe 80% of the time, then falls over completely when you hit one of hundreds of minor, undocumented edge cases. Assuming of course that CC did not just hallucinate something. And then there's the more fundamental problem that HTML/CSS has unresolvable incompatibilities with OOXML. This is why Google Docs, for instance, uses canvas for rendering.
Fair point, we know the editor isn't yet 1:1 with Word. When you built yours, was Word your source of truth (reverse-engineering sense), or did you stick to MS-OE376? And any recommended process for systematically uncovering those undocumented edge cases?
We went out and ran our editor against our own and our customers' documents. The Open part of OOXML makes as much sense as the Open in OpenAI. Microsoft made OOXML available to fend off an antitrust lawsuit; there is no incentive for them to make it actually easy to build competing editors off their specification.
FWIW the bug I found is that your comment parser assumes the w:date attribute represents a useful timestamp of when a comment was made. It does not - a bug in Word causes it to save the value as ISO 8601 local time but _without a timezone_, rendering it useless if users in more than one timezone edit the document. Instead, you need to cross-reference the comment with a newer comment part and find a dateUtc attribute. The above is, of course, completely undocumented.
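A minimal sketch of that fallback, for anyone who wants to see the shape of it. It assumes the XML parts have already been parsed into plain objects, and that comments can be joined to the newer part by a paragraph ID. The join key, the `RawComment` shape, and the `resolveCommentTimestamp` name are all hypothetical; the comment above only says that a cross-reference and a dateUtc attribute exist, not how the lookup works.

```typescript
// Hypothetical shape: one parsed comment from the main comments part.
interface RawComment {
  id: string;
  date?: string;   // raw w:date value, possibly a local wall-clock time with no offset
  paraId?: string; // ASSUMED join key into the newer comment part
}

type TimestampSource = "dateUtc" | "date-with-offset" | "ambiguous-local" | "missing";

interface ResolvedTimestamp {
  utc: Date | null;         // an unambiguous instant, when one can be established
  wallClock: string | null; // the raw w:date value, kept for display or debugging
  source: TimestampSource;
}

// True when an ISO 8601 date-time carries an explicit "Z" or "+hh:mm" / "-hh:mm" offset.
const HAS_OFFSET = /T.*(Z|[+-]\d{2}:\d{2})$/;

function parseInstant(value: string): Date | null {
  const parsed = new Date(value);
  return Number.isNaN(parsed.getTime()) ? null : parsed;
}

// dateUtcByParaId is ASSUMED to be built by the caller from the newer comment part.
function resolveCommentTimestamp(
  comment: RawComment,
  dateUtcByParaId: ReadonlyMap<string, string>,
): ResolvedTimestamp {
  const wallClock = comment.date ?? null;

  // Prefer dateUtc whenever the cross-reference succeeds.
  const dateUtc = comment.paraId ? dateUtcByParaId.get(comment.paraId) : undefined;
  if (dateUtc && HAS_OFFSET.test(dateUtc)) {
    const utc = parseInstant(dateUtc);
    if (utc) return { utc, wallClock, source: "dateUtc" };
  }

  // Trusting an explicit offset on w:date is an assumption about non-Word producers.
  if (wallClock && HAS_OFFSET.test(wallClock)) {
    const utc = parseInstant(wallClock);
    if (utc) return { utc, wallClock, source: "date-with-offset" };
  }

  // No offset anywhere: do not guess. new Date() would silently apply the
  // reader's timezone to the author's wall-clock time.
  return { utc: null, wallClock, source: wallClock ? "ambiguous-local" : "missing" };
}
```

The point of returning `utc: null` instead of parsing the bare value is that the UI can still show the wall-clock string, but anything that sorts or compares comments across authors knows it has no real instant to work with.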
Can you say the name of the editor you worked on?
I feel your pain. PDF applications have the same problem. The thousand-page PDF spec isn't actually the spec, Acrobat is the spec.
When you built this exact product, how long did it take you to reach 80% compatibility?
We don’t have a formal '% compatibility' metric yet, but it’s on our radar as a feedback loop mechanism for self-improvement.
For now, we mostly rely on testing with our own and customer docs. In practice, we were seeing solid results after a couple of days of keeping Claude working in the loop and giving lots of feedback: .docx files along with screenshots annotated to highlight what didn’t work.
Excellent work! To point out the importance of the project: as of today there aren't many Google Docs/Word Online alternatives that are completely open source.
I've yet to dig into the code to see how pagination is implemented, but if the page breaks mimic Word's, this is huge!
So it’s like Google Docs? What am I looking at exactly?
Yes, it's like Google Docs for .docx files. It's open-source, MIT-licensed, and runs fully in JS, so you can embed it in your app.
Every other JS DOCX editor I found was either abandoned or commercial. I couldn't find a solid MIT-licensed option.
More vibe coded slop on Show HN
Would be more impressive if this was done for something obscure like Microsoft Visio. There are countless OSS MS Word editors/libs Claude probably ripped off.
The schema Visio’s files save to is very complex.
There’s stuff like diagrams.net, but that’s not compatible.
Yup, that’s why I suggested it. The vsdx schema is notably complex and I don’t see a lot of code examples in the wild. I seriously doubt an LLM would be able to output working code for it. Docx is a common use case, and a quick Google search yields multiple popular libraries that understand the format. Anyway, cool that an LLM was able to output a functional docx editor, but that’s certainly not an impressive or groundbreaking feat by any means.
What's a Ralph loop?
https://ghuntley.com/loop/