Show HN: Aion-Torch – Adaptive residual scaling for deep Transformers

(github.com)

2 points | by Rioverde 7 hours ago
