Some context: we kept finding that our internal red-teaming only covered so much - the attack surface for agents with real capabilities is too broad for any single team. So we opened it up. A few things that might be interesting to folks here:
- These aren't toy prompts hiding a secret word. The agents have actual tool access and behave like production agents would.
- System prompts and challenge configs are versioned in the open: https://github.com/fabraix/playground
- Anyone can propose a challenge - the scenario, the agent, the objective. The community votes on what goes live next.
We're genuinely looking for people to both break things and suggest ideas for what should be tested next. The agent runtime is being open-sourced separately.