Continuing the journey of getting my hands dirty with voice UIs, I wrote down some user-perceived latency numbers I was seeing while building VUIs.
Key points:
- I used the 'pipeline' approach of STT + LLM + TTS (as opposed to the speech-to-speech approach, e.g. gpt-realtime) - a rough sketch of this pipeline is below
- This approach (with my specific setup) yielded latency well above the ~500ms target at which conversations feel "natural" and free of awkward silences
- With gpt-5-mini as the LLM I saw ~1.4s of latency, and with Llama 3.1-8b on Cerebras I saw ~1.1s
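
For reference, here is a minimal Python sketch of that pipeline with per-stage timing. The stage functions (`transcribe`, `generate_reply`, `synthesize`) are hypothetical placeholders rather than any specific provider's API, and the timing is sequential (non-streaming), so treat it as an illustration of where latency accumulates rather than my exact measurement setup.

```python
import time

# Placeholder stages: in a real pipeline these would call an STT, LLM,
# and TTS provider respectively. Names and signatures are hypothetical.
def transcribe(audio_in: bytes) -> str:
    return "what's the weather like today"

def generate_reply(transcript: str) -> str:
    return "It looks sunny for most of the afternoon."

def synthesize(reply: str) -> bytes:
    return b"\x00" * 1600  # stand-in for synthesized audio

def run_turn(audio_in: bytes) -> dict:
    """Run one conversational turn and record per-stage wall-clock latency."""
    timings = {}

    t0 = time.perf_counter()
    transcript = transcribe(audio_in)        # STT
    timings["stt_ms"] = (time.perf_counter() - t0) * 1000

    t1 = time.perf_counter()
    reply = generate_reply(transcript)       # LLM
    timings["llm_ms"] = (time.perf_counter() - t1) * 1000

    t2 = time.perf_counter()
    audio_out = synthesize(reply)            # TTS
    timings["tts_ms"] = (time.perf_counter() - t2) * 1000

    # User-perceived latency here = end of user speech -> reply audio ready.
    timings["total_ms"] = (time.perf_counter() - t0) * 1000
    return timings

if __name__ == "__main__":
    print(run_turn(b"\x00" * 16000))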