21 Comments
Haydar:

Thank you for this insightful and visually engaging guide on quantization, Maarten—it's a fantastic resource!

Jongin Choi:

I noticed a small typo and wanted to let you know! Thank you for the great article—I really appreciate it.

Original:

In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into IN8.

Correction:

In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into INT8.

Thanks again for sharing this insightful piece! 😊

Romee Panchal:

Wow! What a beautiful and insightful article.

Eddy Giusepe:

Thank you very much, Maarten Grootendorst!

Your explanation is very good 🤗!

Tinias:

Cool! How do you create these visuals and how much time did it take you? They look very nice!

Filippa:

Thank you for a great explanation on quantization! I just have a question. You mention GGUF in a way that sounds like it is a quantization technique, but when I read about it elsewhere, it is usually described as a file format. I'm a little confused, is GGUF a quantization method, a file format, or both? Or are you referring to the quantization types stored in GGUF, like k-quants and i-quants?

Jeevesh Juneja:

In the third figure of the GPT-Q section, how does 0.5 get quantized to 0.33? If we are quantizing to int4, shouldn't the output be an integer?

Thanks for the amazing article btw! <3
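For readers puzzling over the same point: in plain round-to-nearest absmax quantization (a minimal sketch below, not necessarily the article's exact GPT-Q procedure), int4 outputs are indeed integers; fractional values like 0.33 typically appear as intermediate or dequantized values (quantized integer divided by the scale). The function name and example values here are illustrative, not from the article.

```python
def absmax_quantize_int4(values):
    # Symmetric absmax quantization to signed int4: scale so the largest
    # magnitude maps to 7, then round to the nearest integer.
    scale = 7 / max(abs(v) for v in values)
    quantized = [round(v * scale) for v in values]
    return quantized, scale

q, s = absmax_quantize_int4([0.5, -1.4, 2.1])
print(q)                   # integers in [-7, 7]
print([v / s for v in q])  # dequantized values are fractional again
```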

Yan Yong:

Very informative about quantization. I have bookmarked it. Thank you for your time and effort!

Oleksandr Rechynskyi:

Wow, this is a really extensive article. Thanks!

Jiwoo Park:

Hello. Thank you for the excellent material.

In the `Common Data Types` section of Part 2, the mantissa part in the `BF16` illustration is shown as `1001000`. I'm curious as to why it isn't `1001001`.

Valeriu Ohan:

You are correct. The BF16 mantissa should be `1001001`. Otherwise the encoded value is 3.125.
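The arithmetic in this exchange is easy to check by hand. Below is a small sketch (the helper function is my own, not from the article) that decodes a bfloat16 bit pattern: 1 sign bit, 8 exponent bits with bias 127, and 7 mantissa bits with an implicit leading 1.

```python
def bf16_value(sign, exponent_bits, mantissa_bits):
    # Decode a bfloat16 bit pattern (1 sign, 8 exponent, 7 mantissa bits).
    # Exponent bias is 127; the mantissa carries an implicit leading 1.
    exponent = int(exponent_bits, 2) - 127
    mantissa = 1 + int(mantissa_bits, 2) / 2**7
    return (-1) ** sign * mantissa * 2**exponent

print(bf16_value(0, "10000000", "1001000"))  # 3.125
print(bf16_value(0, "10000000", "1001001"))  # 3.140625, the closest BF16 to pi
```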

Sandeep:

I wish papers published on arXiv were as accessible as this, even for a subscription. The two-column PDF is too rigid to read on any screen.

Chuanliang Jiang:

Excellent blog. One quick question: what is the benefit of upgrading to a paid membership?

Maarten Grootendorst:

Thank you for reminding me that I should make a dedicated page for this!

To give part of the answer: at the moment there is no benefit to paid membership other than supporting me in creating these guides and keeping them freely available to everyone. Creating these guides takes a long time, between researching the topic and creating the visuals, so any support is highly appreciated.

I want to keep this content available to those who do not have the funds. I might change this in the future and keep part of the content free with the rest paid, but I prefer to keep it all free.

Seeing as this is merely a passion project thus far, I feel very hesitant to actively ask people to support me. In all honesty, just reading "Excellent blog" already makes me happy ;)

The Humans In The Loop:

These are absolutely excellent visualizations. Thank you for sharing!

kevin:

best!

Hongwei:

Thank you so much for the visual explanations. I benefited a lot from this.

Rico Pagliuca:

Incredible, thanks for this! I have read about all of these and have them mostly well understood but for me the visual aids are /incredibly/ assistive. Really appreciate it! Sharing widely.

<entitled> Addendum for HQQ when? </entitled> *ducks* :p

[ https://github.com/mobiusml/hqq ]
