24 Comments
Haydar

Thank you for this insightful and visually engaging guide on quantization, Maarten—it's a fantastic resource!

Jongin Choi

I noticed a small typo and wanted to let you know! Thank you for the great article—I really appreciate it.

Original:

In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into IN8.

Correction:

In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into INT8.

Thanks again for sharing this insightful piece! 😊
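
To illustrate the corrected sentence: the passage describes deriving the quantization range from the weights themselves rather than from the full FP32 range. A minimal sketch of one such mapping (absmax quantization; the function name and example weights are my own, not taken from the article):

```python
import numpy as np

def absmax_quantize(x: np.ndarray):
    """Map the observed range of x onto signed INT8 [-127, 127]."""
    scale = 127 / np.max(np.abs(x))          # scale comes from the data's own range
    q = np.round(x * scale).astype(np.int8)  # quantized INT8 weights
    return q, scale

weights = np.array([-0.9, -0.1, 0.0, 0.4, 0.8], dtype=np.float32)
q, scale = absmax_quantize(weights)
print(q)          # [-127  -14    0   56  113]
print(q / scale)  # dequantized approximation of the original weights
```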

Romee Panchal

Wow! What a beautiful and insightful article.

Eddy Giusepe

Thank you very much, MAARTEN GROOTENDORST!

Your explanation is very good 🤗!

Tinias

Cool! How do you create these visuals and how much time did it take you? They look very nice!

Satish Jasthi

Thanks a lot for such a lucid quantisation 101

Naina Chaturvedi

++ Good post. Also, start here: 500+ LLM, AI Agents, RAG, ML System Design Case Studies, 300+ Implemented Projects, Research papers in detail

https://open.substack.com/pub/naina0405/p/very-important-llm-system-design?r=14q3sp&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false

George Williams

Great blog. Under a caption you mention "Depending on the hardware, integer-based calculations might be faster than floating-point calculations but this isn’t always the case." Can you provide a link or paper substantiating this? I'd be curious to learn more about a use case where dedicated hardware on the same chip silicon is faster for floating point than integer. Thanks!

Filippa

Thank you for a great explanation on quantization! I just have a question. You mention GGUF in a way that sounds like it is a quantization technique, but when I read about it elsewhere, it is usually described as a file format. I'm a little confused, is GGUF a quantization method, a file format, or both? Or are you referring to the quantization types stored in GGUF, like k-quants and i-quants?

Jeevesh Juneja

In the third figure of the GPT-Q section, how does 0.5 get quantized to 0.33? If we are quantizing to int4, shouldn't the output be an integer?

Thanks for the amazing article btw! <3
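
One plausible reading (my assumption; the article does not spell this out for that figure) is that the figure shows the *dequantized* values: GPT-Q stores an int4 integer plus a float scale, and multiplying them back together yields non-integer values such as 0.33. A rough sketch:

```python
def quantize(x: float, scale: float) -> int:
    """Round a float weight to the signed int4 grid [-8, 7]."""
    return max(-8, min(7, round(x / scale)))

def dequantize(q: int, scale: float) -> float:
    """Map the stored int4 value back to a float approximation."""
    return q * scale

scale = 0.33                    # hypothetical per-group scale; the figure's is unknown
w = 0.5                         # original weight
q = quantize(w, scale)          # the integer that is actually stored
print(q, dequantize(q, scale))  # 2 0.66 -- the float a figure would display
```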

Yan Yong

Very informative about quantization. I have bookmarked it. Thank you for your time and effort!

Oleksandr Rechynskyi

Wow, this is a really extensive article. Thanks!

Jiwoo Park

Hello. Thank you for the excellent material.

In the `Common Data Types` section of Part 2, the mantissa part in the `BF16` illustration is shown as `1001000`. I'm curious as to why it isn't `1001001`.

Valeriu Ohan

You are correct. The BF16 mantissa should be `1001001`. Otherwise the encoded value is 3.125.
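
For anyone who wants to verify the arithmetic, here is a quick decode of both bit patterns (a sketch using the standard BF16 layout of 1 sign bit, 8 exponent bits, and 7 mantissa bits; it assumes the figure's example value is π, which the 3.125 reading suggests):

```python
def bf16_value(sign: int, exponent: int, mantissa: int) -> float:
    """Decode a BF16 bit pattern (normal numbers only)."""
    return (-1) ** sign * 2 ** (exponent - 127) * (1 + mantissa / 128)

# Pi needs sign = 0 and a biased exponent of 128 (i.e. a factor of 2^1):
print(bf16_value(0, 128, 0b1001000))  # 3.125    -- the mantissa shown in the figure
print(bf16_value(0, 128, 0b1001001))  # 3.140625 -- the corrected mantissa, ~pi
```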

Sandeep

I wish papers published on arXiv were as accessible as this -- even for a subscription. The two-column PDF is too rigid to read on any screen.

Chuanliang Jiang

Excellent blog. One quick question: what is the benefit of upgrading to a paid membership?

Maarten Grootendorst

Thank you for reminding me that I should make a dedicated page for this!

To give part of the answer: at the moment there is no benefit to paid membership other than supporting me in creating these guides and keeping them freely available to everyone. These guides take a long time to create, between researching the topic and making the visuals, so any support is highly appreciated.

I want to keep this content available to those who do not have the funds. I might change this in the future and keep part of the content free with the rest paid, but I prefer to keep it all free.

Seeing as this is merely a passion project thus far, I feel very hesitant to actively ask people to support me. In all honesty, just reading "Excellent blog" already makes me happy ;)

The Humans In The Loop

These are absolutely excellent visualizations. Thank you for sharing!