Tree Search Distillation for Language Models Using PPO

(ayushtambde.com)

87 points | by at2005 4 days ago

14 comments