Great article! I am wondering if we can translate your blog into Chinese and post it on an AI community. We will keep the original link and state where it is translated from. Thank you.
Sure! As long as the source is shared, I'm all for it. When it is translated, could you share the link? I will then also add it to the beginning of the post.
The simplicity of the explanation was very helpful.
Thank you for creating this
Awesome article! Very insightful! Thank you for providing that!
Excellent blog. Very insightful!!!
Just one question: shouldn't the probabilities sum to 1 in the router diagram?
https://newsletter.maartengrootendorst.com/i/148217245/the-router
Also, in diagram 2, where the expert capacity is 3: for tokens 4 and 5, expert 2 has the highest probability. Shouldn't they be sent to expert 2 instead of expert 4?
https://newsletter.maartengrootendorst.com/i/148217245/expert-capacity
You are correct, thank you for sharing! It seems I completely missed those things. I updated the ones you mentioned and also updated a couple of others that needed minor updates.
Thanks a lot for confirming. Really appreciate the quality & effort put in both the book & the blog.