Long-Context Attention from Kernel Efficiency to Distributed Context Parallelism

(arxiv.org)

1 point | by PaulHoule 11 hours ago

No comments yet.