This is not a post, it is a lecture. Simply amazing, thank you.
loved it. thanks for writing it.
DL had a Turing Award. Sutton and Barto have one for RL too... one's coming for you and Jay. Keep inspiring!
This is very helpful. Thank you for writing it!
Thank you for the awesome content. I have a question: what tools do you use for visualization?
Thank you for the kind words! I'm using Figma to create the visualizations, but all the visuals (same with my other visual guides) could have been done with something like PowerPoint, Keynote, etc.
The article is awesome. Thanks for the great content, the simplicity of the explanations, and the flow. I haven't finished it yet, but I can't wait to finish it and read it again and again.
Thank you! Great content! One typo: the exploration and exploitation annotation in the illustration of the Monte Carlo tree search seems flipped.
Thanks for the feedback! I updated the visual.
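For anyone double-checking the corrected figure, the standard UCT selection rule (a generic textbook formulation, not taken from the post itself) makes the split explicit: the first term is the exploitation part and the bonus term is the exploration part.

$$\mathrm{UCT}(i) = \underbrace{\frac{w_i}{n_i}}_{\text{exploitation}} + \underbrace{c\,\sqrt{\frac{\ln N}{n_i}}}_{\text{exploration}}$$

where $w_i$ is the total value accumulated at node $i$, $n_i$ is its visit count, and $N$ is the parent's visit count.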
Thanks a lot for this. Loved the crisp explanation!
Great article. Thanks
Great post! A quick question about "In step 2, the resulting model was trained using a similar RL process as was used to train DeepSeek-V3-Zero." I looked up DeepSeek-V3-Zero and couldn't find any other mentions; could it be a typo for DeepSeek-R1-Zero?
Thanks! It was indeed a typo. Fixed :)
Thanks for providing such detailed content with brilliant explanations; it's easy to read. I did have some prior knowledge, but now I'm clear and confident on most of the topics I had been looking around the web for.
Thanks for the helpful article and nice visuals, but there is a point of confusion. What I understand from the report is that they directly fine-tune on the 800k samples for distillation rather than using logits from the bigger model. I think what they refer to as distillation is different from vanilla knowledge distillation.
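To make the distinction in the comment above concrete, here is a minimal PyTorch sketch contrasting the two notions of "distillation": vanilla knowledge distillation that matches the teacher's logit distribution, versus SFT-style distillation that simply fine-tunes on teacher-generated token ids (as the report describes for the 800k samples). The tensors are random stand-ins and this is illustrative only, not DeepSeek's code.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 32000, 8

# Random stand-ins for one sequence of per-token logits.
teacher_logits = torch.randn(seq_len, vocab_size)
student_logits = torch.randn(seq_len, vocab_size, requires_grad=True)

# Vanilla knowledge distillation: student matches the teacher's
# full probability distribution over the vocabulary (KL on logits).
kd_loss = F.kl_div(
    F.log_softmax(student_logits, dim=-1),
    F.softmax(teacher_logits, dim=-1),
    reduction="batchmean",
)

# SFT-style "distillation": the only supervision is the token ids of the
# teacher's generated outputs (e.g., the 800k reasoning samples), trained
# with ordinary next-token cross-entropy.
teacher_generated_ids = torch.randint(0, vocab_size, (seq_len,))
sft_loss = F.cross_entropy(student_logits, teacher_generated_ids)

print(kd_loss.item(), sft_loss.item())
```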
Very great, informative content. Thank you so much!
Great article! Did you use DeepSeek for any part?
OMG, I love it!
Great article!
Can you explain why samples for R1 are focused on coding tasks (e.g. "does it compile?")?
The technical report says "large-scale RL has not been applied extensively in software engineering tasks" and leaves this task for future versions of the R1 model.
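As a toy illustration of the kind of rule-based reward that "does it compile?" refers to, here is a minimal Python sketch. The function name and the check are hypothetical, not DeepSeek's actual reward implementation; the idea is just that code completions can be scored automatically without a learned reward model.

```python
def compile_reward(code: str) -> float:
    """Return 1.0 if the candidate Python code parses/compiles, else 0.0."""
    try:
        compile(code, "<candidate>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0

print(compile_reward("def add(a, b):\n    return a + b"))  # 1.0
print(compile_reward("def add(a, b) return a + b"))        # 0.0 (syntax error)
```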
This is an amazing post, thank you!