Sam on Artificial Fintelligence

3 Comments

Apr 14, 2024

Thanks for writing.

I didn't see an explanation of HOW, in speculative decoding, the predictions from the smaller, faster model are incorporated into the predictions of the larger model, though.


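You make N predictions from the smaller model in serial: x_1, …, x_N. You then run all N predictions through the bigger model simultaneously as a batch. In the simplest (greedy) version, you keep the predictions up to the first one the bigger model disagrees with and throw away the rest; at that position you take the bigger model's own token instead.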
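To make that concrete, here is a minimal sketch of one step of the greedy variant. The names `speculative_step`, `draft`, and `target` are hypothetical, and each "model" is assumed to be a plain function from a token prefix to its next token. A real implementation would get all of the bigger model's predictions from one batched forward pass rather than a loop, and sampling-based versions use a probabilistic accept/reject rule instead of an exact match.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(
    prefix: List[int], draft: NextToken, target: NextToken, n: int
) -> List[int]:
    """Return the tokens accepted in one speculative step (greedy variant)."""
    # 1. Draft n tokens serially with the small model.
    drafted: List[int] = []
    for _ in range(n):
        drafted.append(draft(prefix + drafted))

    # 2. Get the big model's next token at each of the n + 1 positions.
    #    (Assumption: in practice this is a single batched forward pass;
    #    the loop here is only for readability.)
    verified = [target(prefix + drafted[:i]) for i in range(n + 1)]

    # 3. Keep drafted tokens until the first disagreement, then substitute
    #    the big model's token at that position and stop.
    accepted: List[int] = []
    for i in range(n):
        if drafted[i] != verified[i]:
            accepted.append(verified[i])
            return accepted
        accepted.append(drafted[i])

    # Every draft matched, so the big model's extra prediction is also usable.
    accepted.append(verified[n])
    return accepted
```

Either way the output is exactly what the bigger model would have produced on its own, and each step yields at least one token, so the draft model can only affect speed, not the result.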


Thanks! :)
