Hi, author of the repo speaking here!
When I tried building MCP Apps [1], the official repos (https://github.com/openai/openai-apps-sdk-examples, https://github.com/modelcontextprotocol/ext-apps/tree/main/e...) were great starting points, but they're designed for human developers. When I used them with Claude Code, I ended up in the usual loop: agent writes code → I manually test the app in ChatGPT → describe errors back → repeat. Plus, we didn't know what the best practices were, and we struggled to enforce them.
So I built an MCP App template designed so that coding agents can work on the app as autonomously as possible.
The key idea: orthogonal testing. 450+ tests parameterized across 12 widget modules that verify infrastructure (protocol compliance, best practices grade, browser rendering), not business logic. Modify widgets, change data, add features — the tests should still pass. Agents iterate freely and get feedback without a human in the loop.
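To make "orthogonal" concrete, here's roughly the shape of one of those parameterized checks. This is a simplified sketch, not the template's actual test code; the hard-coded manifest, tool names, and ui:// URIs are just illustrative.

    // infra.spec.ts -- simplified sketch of an "orthogonal" infrastructure test:
    // it asserts protocol-level invariants for every widget module, so it keeps
    // passing when widget business logic or data changes.
    import { describe, expect, it } from "vitest";

    // Illustrative manifest; in the real template this list would be generated
    // from the widget modules / build output rather than hard-coded.
    interface WidgetModule {
      toolName: string;     // MCP tool that renders the widget
      resourceUri: string;  // URI of the widget's HTML resource
      html: string;         // built, self-contained bundle
    }

    const modules: WidgetModule[] = [
      { toolName: "show_carousel", resourceUri: "ui://widget/carousel.html", html: "<html><body>demo</body></html>" },
      { toolName: "show_qr_code",  resourceUri: "ui://widget/qr-code.html",  html: "<html><body>demo</body></html>" },
    ];

    describe.each(modules)("infrastructure: $toolName", ({ toolName, resourceUri, html }) => {
      it("uses a snake_case tool name", () => {
        expect(toolName).toMatch(/^[a-z][a-z0-9_]*$/);
      });

      it("serves its widget under the ui:// scheme", () => {
        expect(resourceUri.startsWith("ui://")).toBe(true);
      });

      it("ships a non-empty HTML bundle", () => {
        expect(html).toContain("<html");
      });
    });

None of these assertions mention what the carousel actually shows, which is what lets an agent change widget behavior freely without breaking the suite.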
Other features:
- Hierarchical documentation that includes the official MCP Apps & OpenAI Apps SDK llms.txt files
- Local chat simulator app that works even without API keys via Puter.js
- Visual testing of every widget: pnpm run ui-test --tool show_carousel → screenshot at /tmp/ui-test/screenshot.png (a rough sketch of what this step does is below)
- 12 working examples (from QR codes to a 3D solar system) gathered from the official repos mentioned above
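The visual-testing step boils down to "render the widget headlessly, write a screenshot the agent can inspect". A minimal sketch of that idea, assuming a Playwright-driven harness and a local dev server on port 3000 (both are assumptions; the template's actual script differs in the details):

    // ui-test.ts -- minimal sketch (Playwright and a local dev server are assumed;
    // the /widgets/<tool> route and port 3000 are placeholders, not the template's API).
    import { chromium } from "playwright";
    import { mkdir } from "node:fs/promises";

    // Read the tool name from `--tool <name>`, defaulting to show_carousel.
    const idx = process.argv.indexOf("--tool");
    const tool = idx !== -1 ? process.argv[idx + 1] : "show_carousel";

    const browser = await chromium.launch();
    const page = await browser.newPage();

    // Assumption: the dev server can render each widget standalone for testing.
    await page.goto(`http://localhost:3000/widgets/${tool}`);
    await page.waitForLoadState("networkidle");

    await mkdir("/tmp/ui-test", { recursive: true });
    await page.screenshot({ path: "/tmp/ui-test/screenshot.png", fullPage: true });
    await browser.close();

    console.log(`Screenshot for ${tool} written to /tmp/ui-test/screenshot.png`);

The point is that the agent gets a file it can open and reason about, instead of me describing rendering glitches by hand.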
The repo includes an unedited ~15 min video of Claude Code autonomously building an app that then ran directly inside ChatGPT.
I'd love to hear how it goes if you try it. Or even better: ask your agent for feedback, and post it here!
[1] MCP Apps (https://modelcontextprotocol.io/docs/extensions/apps) let you build interactive widgets that run inside Claude, ChatGPT, VS Code, and other AI hosts. Unlike smartphone apps, the same code deploys to all of these platforms.
First time I've seen a clear split where the README markets to humans and the AGENTS.md has a clean tutorial for LLMs.
I'll give it a try. MCP apps are full of promise, but the protocols are so unstable that I wouldn't want to write the boilerplate myself.
Yeah, I have to say that finding the right balance between what to include in the AGENTS.md and what to leave out is quite hard.
Regarding the protocols being unstable, that's a fair point. Maybe this could be automated? That is, detect changes in the official docs automatically and have a coding agent adapt the template's docs and tests accordingly.
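The detection part could be a small watcher script in CI, something like the sketch below (the exact llms.txt URLs and the .upstream-docs/ cache layout are assumptions I made up for illustration, not existing tooling):

    // check-upstream-docs.ts -- sketch of the idea: hash the official llms.txt
    // files, compare against a cached hash, and exit non-zero on changes so a
    // scheduled CI job can hand the diff to a coding agent to update docs/tests.
    // The URLs and the .upstream-docs/ layout are assumptions.
    import { createHash } from "node:crypto";
    import { mkdir, readFile, writeFile } from "node:fs/promises";

    const SOURCES = [
      "https://modelcontextprotocol.io/llms.txt",          // assumed location
      "https://developers.openai.com/apps-sdk/llms.txt",   // assumed location
    ];

    await mkdir(".upstream-docs", { recursive: true });

    let changed = false;
    for (const url of SOURCES) {
      const body = await (await fetch(url)).text();
      const digest = createHash("sha256").update(body).digest("hex");
      const cachePath = `.upstream-docs/${new URL(url).hostname}.sha256`;

      const previous = await readFile(cachePath, "utf8").catch(() => "");
      if (previous.trim() !== digest) {
        console.log(`Upstream docs changed: ${url}`);
        await writeFile(cachePath, digest);
        changed = true;
      }
    }

    process.exit(changed ? 1 : 0);

The hard part would still be reviewing what the agent changes in the tests, but at least the trigger would be automatic.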
Crazy! Testing now.
Hope to hear *your coding agent's* feedback!