DeepSeek: Inference-Time Scaling for Generalist Reward Modeling

(arxiv.org)

99 points | by tim_sw 17 hours ago

17 comments