Discussion about this post

Sidhant Chadda

Thanks for the post,

Most model weights I have seen are floating-point values between -1 and 1. If we got rid of the exponent bits, wouldn't we be able to save ~31% of the model weight size for a 16-bit floating-point format?

Presumably this would require changes in the underlying hardware itself, in order to perform calculations with this new floating-point format.

But I still find it bizarre that ML models have all these useless bits lying around.
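
To make the ~31% figure concrete, here is a quick back-of-the-envelope check (a minimal sketch, assuming the standard IEEE 754 half-precision layout of 1 sign bit, 5 exponent bits, and 10 mantissa bits):

```python
# Back-of-the-envelope check of the ~31% figure, assuming the standard
# IEEE 754 half-precision (fp16) layout: 1 sign bit, 5 exponent bits,
# and 10 mantissa bits.
SIGN_BITS, EXPONENT_BITS, MANTISSA_BITS = 1, 5, 10
total_bits = SIGN_BITS + EXPONENT_BITS + MANTISSA_BITS  # 16

# Dropping the exponent entirely would leave a fixed-point-style format
# with only the sign and mantissa bits per weight.
savings = EXPONENT_BITS / total_bits
print(f"Bits saved per weight: {EXPONENT_BITS} of {total_bits} ({savings:.1%})")
# -> Bits saved per weight: 5 of 16 (31.2%)
```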

Nathan Lambert

The part about GPTQ is pretty bizarre - I would've thought quantization is just doing what you showed at scale. Maybe it works because it does that rounding operation in a vectorized way, rather than naive rounding, which is slower? That doesn't sound like I'm saying anything intelligent. A tad funny that we don't know exactly why quantization works.
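
For reference, a minimal sketch of what "naive rounding" could look like here: vectorized round-to-nearest int4 quantization of a weight matrix. The function name and the int4 choice are illustrative, not from the post, and this is only the baseline GPTQ is usually compared against (GPTQ itself additionally adjusts not-yet-quantized weights to compensate for rounding error rather than rounding each weight independently):

```python
import numpy as np

def round_to_nearest_int4(weights: np.ndarray):
    # Symmetric round-to-nearest quantization to int4, done as a single
    # vectorized operation over the whole weight matrix.
    n_levels = 2 ** (4 - 1) - 1                      # 7: symmetric range [-7, 7]
    scale = np.abs(weights).max() / n_levels         # one scale for the matrix
    q = np.clip(np.round(weights / scale), -n_levels, n_levels)
    return q.astype(np.int8), scale                  # ints + scale to dequantize

# Example: quantize and dequantize a small random weight matrix.
w = np.random.randn(4, 4).astype(np.float32) * 0.1
q, scale = round_to_nearest_int4(w)
w_hat = q * scale
print("max abs rounding error:", np.abs(w - w_hat).max())
```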
