I wanted to verify this for myself, so I set up a small test harness on my production server. It ran 360 chat completions across a range of models, cancelling each request immediately after the first token was received. A rough sketch of the harness is shown below, followed by the resulting first-token latency measurements.
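For reference, here is a minimal sketch of how such a harness might look, assuming an OpenAI-compatible streaming endpoint accessed through the official Python SDK; the model names, prompt, and run counts are placeholders rather than the actual benchmark configuration:

```python
import time
from openai import OpenAI  # assumes an OpenAI-compatible endpoint

client = OpenAI()  # API key and base URL taken from the environment

def first_token_latency(model: str, prompt: str) -> float:
    """Seconds from sending the request until the first content token streams back."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # The first chunk carrying actual content marks the first token.
        if chunk.choices and chunk.choices[0].delta.content:
            latency = time.perf_counter() - start
            stream.close()  # cancel the rest of the completion
            return latency
    raise RuntimeError("stream ended before any content arrived")

# Placeholder models and prompt; 3 models x 120 runs = 360 completions.
models = ["model-a", "model-b", "model-c"]
results = {m: [first_token_latency(m, "Say hello.") for _ in range(120)] for m in models}
```

Closing the stream after the first content chunk is what cancels the remainder of the generation, so each run costs only a single token of output.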