Agent-evals: Metacognitive scoring and boundary testing for LLM coding agents

(thinkwright.ai)

2 points | by oceanwaves 13 hours ago

1 comment