Thank you for this insightful and visually engaging guide on quantization, Maarten—it's a fantastic resource!
I noticed a small typo and wanted to let you know! Thank you for the great article—I really appreciate it.
Original:
In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into IN8.
Correction:
In practice, we do not need to map the entire FP32 range [-3.4e38, 3.4e38] into INT8. We merely need to find a way to map the range of our data (the model’s parameters) into INT8.
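To make the corrected sentence concrete, here is a minimal absmax sketch (the values are made up; the point is that the scale comes from the data's own range rather than the full FP32 range):

```python
import numpy as np

# Absmax quantization: map the range of the data itself into INT8,
# not the full FP32 range. The parameter values are illustrative.
params = np.array([0.7, -1.9, 0.01, 1.2], dtype=np.float32)

scale = 127 / np.abs(params).max()              # scale from the data's own max
quantized = np.round(params * scale).astype(np.int8)
dequantized = quantized / scale                 # approximate reconstruction

print(quantized)    # [  47 -127    1   80]
print(dequantized)  # close to the originals, within rounding error
```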
Thanks again for sharing this insightful piece! 😊
Wow! What a beautiful and insightful article.
Thank you very much, Maarten Grootendorst!
Your explanation is very good 🤗!
Cool! How do you create these visuals and how much time did it take you? They look very nice!
Thank you for a great explanation on quantization! I just have a question. You mention GGUF in a way that sounds like it is a quantization technique, but when I read about it elsewhere, it is usually described as a file format. I'm a little confused, is GGUF a quantization method, a file format, or both? Or are you referring to the quantization types stored in GGUF, like k-quants and i-quants?
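(Not the author, but as far as I know: GGUF itself is a file format, the successor to GGML used by llama.cpp, while the k-quants and i-quants are quantization types whose output is stored inside a GGUF file. A small sketch of loading one, assuming the llama-cpp-python package and an illustrative file name:)

```python
# Sketch only: the quantization type (here a hypothetical Q4_K_M file)
# is baked into the GGUF file when it is created; GGUF is the container.
# Assumes the llama-cpp-python package is installed.
from llama_cpp import Llama

llm = Llama(model_path="model.Q4_K_M.gguf", n_ctx=2048)  # illustrative file name
output = llm("What is quantization?", max_tokens=64)
print(output["choices"][0]["text"])
```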
In the third figure of the GPT-Q section, how does 0.5 get quantized to 0.33? If we are quantizing to INT4, shouldn't the output be an integer?
Thanks for the amazing article btw! <3
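(A guess, since the figure is not reproduced here: the stored INT4 values are indeed integers, and the figure likely shows the dequantized values, i.e. each integer multiplied back by the scale, which need not be an integer. A minimal round trip with made-up numbers:)

```python
import numpy as np

# Symmetric INT4 quantize/dequantize round trip; values are made up,
# not taken from the article's figure.
weights = np.array([0.5, -0.9, 0.33, 1.2], dtype=np.float32)

scale = np.abs(weights).max() / 7                  # INT4 symmetric range [-8, 7]
q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
dequantized = q * scale                            # what a figure would display

print(q)            # [ 3 -5  2  7]  -> the stored values are integers
print(dequantized)  # [ 0.514 -0.857  0.343  1.2 ]  -> floats after dequantizing
```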
Very informative about quantization. I have bookmarked it. Thank you for your time and effort!
Wow, this is a really extensive article. Thanks!
Hello. Thank you for the excellent material.
In the `Common Data Types` section of Part 2, the mantissa part in the `BF16` illustration is shown as `1001000`. I'm curious as to why it isn't 1001001.
You are correct. The BF16 mantissa should be 1001001. Otherwise the encoded value is 3.125.
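(For anyone who wants to verify the arithmetic, a small sketch, assuming the exponent field in the figure is 10000000 and the value being encoded is 3.14:)

```python
# Decode a BF16 bit pattern by hand (1 sign, 8 exponent, 7 mantissa bits):
# value = (-1)^sign * (1 + mantissa / 2^7) * 2^(exponent - 127)
def bf16_value(sign: int, exponent_bits: str, mantissa_bits: str) -> float:
    exponent = int(exponent_bits, 2)
    mantissa = int(mantissa_bits, 2)
    return (-1) ** sign * (1 + mantissa / 2**7) * 2 ** (exponent - 127)

print(bf16_value(0, "10000000", "1001000"))  # 3.125     (mantissa in the figure)
print(bf16_value(0, "10000000", "1001001"))  # 3.140625  (closest BF16 to 3.14)
```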
I wish papers published on arXiv were as accessible as this -- even for a subscription. The two-column PDF is too rigid to read on any screen.
Excellent blog. One quick question: what is the benefit of upgrading to a paid membership?
Thank you for reminding me that I should make a dedicated page for this!
To give part of the answer: at the moment there is no benefit to being a paid member other than supporting me in creating these guides and keeping them freely available to everyone. Creating these guides takes a long time, from researching the topic to creating the visuals, so any support is highly appreciated.
I want to keep this content available to those who do not have the funds. I might change this in the future, keeping part of the content freely available and part of it paid, but for now I prefer to keep it all free.
Seeing as this is merely a passion project thus far, I feel very hesitant to actively ask people to support me. In all honesty, just reading "Excellent blog" already makes me happy ;)
These are absolutely excellent visualizations. Thank you for sharing!
best!
Thank you so much for the visual explanations. I benefited a lot from this.
Incredible, thanks for this! I have read about all of these and mostly understood them, but for me the visual aids are /incredibly/ assistive. Really appreciate it! Sharing widely.
<entitled> Addendum for HQQ when? </entitled> *ducks* :p
[ https://github.com/mobiusml/hqq ]