DSpark: Speculative decoding accelerates LLM inference [pdf]

(github.com)

669 points | by aurenvale 10 hours ago

259 comments