SWE-bench Verified no longer measures frontier coding capabilities

(openai.com)

322 points | by kmdupree a day ago

171 comments