Thank you for this insightful and visually engaging guide on quantization, Maarten—it's a fantastic resource!
I noticed a small typo and wanted to let you know! Thank you for the great article—I really appreciate it.
Original:
In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into IN8.
Correction:
In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into INT8.
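For anyone who wants to see what that mapping looks like in practice, here is a minimal absmax-quantization sketch with made-up toy values (not taken from the article):

```python
import numpy as np

# Toy "weights" standing in for a model's parameters (made-up values).
weights = np.array([-1.8, -0.4, 0.0, 0.7, 2.3], dtype=np.float32)

# We only need to cover the range of our data, not the full FP32 range.
# Absmax quantization: scale by the largest absolute value so everything
# lands in the signed INT8 range [-127, 127].
scale = 127 / np.max(np.abs(weights))
quantized = np.round(weights * scale).astype(np.int8)

# Dequantize to see the (lossy) reconstruction.
dequantized = quantized.astype(np.float32) / scale

print(quantized)    # [-99 -22   0  39 127]
print(dequantized)  # close to, but not exactly, the original values
```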
Thanks again for sharing this insightful piece! 😊
Wow! What a beautiful and insightful article.
Thank you very much, Maarten Grootendorst!
Your explanation is very good 🤗!
Cool! How do you create these visuals and how much time did it take you? They look very nice!
It was easy to understand yet quite insightful. I will definitely check out the other articles. Thanks for writing such good content!
AutoRound Qwen3-VL & vLLM AWQ compatibility?
I’ve successfully quantized Qwen3-VL-8B using AutoRound (W4A16). The base model runs perfectly in vLLM (v0.13) with default settings:
✅ Working: https://huggingface.co/Vishva007/Qwen3-VL-8B-Instruct-W4A16-AutoRound
However, when I try to run the AWQ-exported variant or force `--quantization awq` in vLLM, it crashes with a `KeyError: 'merger.linear_fc1.weight'`. It seems vLLM's optimized loader expects the unquantized vision-merger layers at `merger.*`, but AutoRound saves them as `model.visual.merger.*`.
❌ Failing: https://huggingface.co/Vishva007/Qwen3-VL-8B-Instruct-W4A16-AutoRound-AWQ
Code used: https://github.com/vishvaRam/AutoRound-Quantaization/blob/main/auto_round_Qwen_3_VL_8B.ipynb
Has anyone bridged this naming mismatch for VLMs without manually patching the safetensors? I'm also struggling to reload the model via `AutoModel` without getting "Missing Keys" for all quantized weights.
Stack: Qwen3-VL, AutoRound, vLLM 0.13 (dev).
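For reference, the manual patch I'm trying to avoid would look roughly like this. It's a sketch only, untested end-to-end with vLLM, the prefix mapping is just my reading of the `KeyError` above, and a sharded checkpoint's `model.safetensors.index.json` would need the same renaming:

```python
from pathlib import Path
from safetensors.torch import load_file, save_file

src = Path("Qwen3-VL-8B-Instruct-W4A16-AutoRound-AWQ")
dst = Path("Qwen3-VL-8B-Instruct-W4A16-AutoRound-AWQ-patched")
dst.mkdir(exist_ok=True)

PREFIX = "model.visual.merger."  # what AutoRound appears to write
TARGET = "merger."               # what vLLM's loader seems to look up

for shard in src.glob("*.safetensors"):
    tensors = load_file(shard)
    renamed = {}
    for key, value in tensors.items():
        if key.startswith(PREFIX):
            key = TARGET + key[len(PREFIX):]
        renamed[key] = value
    save_file(renamed, dst / shard.name, metadata={"format": "pt"})
```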
Thanks a lot for such a lucid quantisation 101
++ Good post. Also, start here: 500+ LLM, AI Agents, RAG, ML System Design case studies, 300+ implemented projects, and research papers in detail:
https://open.substack.com/pub/naina0405/p/very-important-llm-system-design?r=14q3sp&utm_campaign=post&utm_medium=web&showWelcomeOnShare=false
Great blog. Under a caption you mention "Depending on the hardware, integer-based calculations might be faster than floating-point calculations but this isn’t always the case." Can you provide a link or paper substantiating this? I'd be curious to learn more about a use case where dedicated hardware on the same chip silicon is faster for floating point than integer. Thanks!
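In the meantime, a crude way to probe this on one's own CPU (a rough timing sketch, not a substitute for a proper reference, and it says nothing about GPUs or dedicated INT8 units):

```python
import timeit
import numpy as np

# Compare elementwise multiply throughput for int32 vs float32 on this CPU.
# Results depend heavily on SIMD support and library code paths.
n = 10_000_000
a_int = np.random.randint(-128, 128, size=n, dtype=np.int32)
b_int = np.random.randint(-128, 128, size=n, dtype=np.int32)
a_fp, b_fp = a_int.astype(np.float32), b_int.astype(np.float32)

print("int32:  ", timeit.timeit(lambda: a_int * b_int, number=50))
print("float32:", timeit.timeit(lambda: a_fp * b_fp, number=50))
```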
Thank you for a great explanation on quantization! I just have a question. You mention GGUF in a way that sounds like it is a quantization technique, but when I read about it elsewhere, it is usually described as a file format. I'm a little confused, is GGUF a quantization method, a file format, or both? Or are you referring to the quantization types stored in GGUF, like k-quants and i-quants?
In the third figure of the GPTQ section, how does 0.5 get quantized to 0.33? If we are quantizing to INT4, shouldn't the output be an integer?
Thanks for the amazing article btw! <3
Very informative about quantization. I have bookmarked it. Thank you for your time and effort!
Wow, this is a really extensive article. Thanks!
Hello. Thank you for the excellent material.
In the `Common Data Types` section of Part 2, the mantissa part in the `BF16` illustration is shown as `1001000`. I'm curious as to why it isn't `1001001`.
You are correct. The BF16 mantissa should be 1001001. Otherwise the encoded value is 3.125.
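To make the arithmetic explicit, here is a quick decode of both bit patterns (assuming the figure's sign bit is 0 and its exponent field is 10000000):

```python
# BF16 layout: 1 sign bit, 8 exponent bits (bias 127), 7 mantissa bits.
def bf16_value(sign: str, exponent: str, mantissa: str) -> float:
    e = int(exponent, 2) - 127        # remove the exponent bias
    m = 1 + int(mantissa, 2) / 2**7   # implicit leading 1 plus the fraction
    return (-1) ** int(sign, 2) * m * 2**e

print(bf16_value("0", "10000000", "1001000"))  # 3.125     (mantissa as illustrated)
print(bf16_value("0", "10000000", "1001001"))  # 3.140625  (the corrected mantissa)
```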
I wish papers published on arXiv were as accessible as this -- even for a subscription. The two-column PDF is too rigid to read on any screen.