Discussion about this post

Mattia Proietti

This is not a post, it is a lecture. Simply amazing, thank you.

Manudev Jain

This is the finest post on DeepSeek and RL in general. I would like to know whether, in the 5th step, the “preference reward” was applied using RLHF or in some other way. Thank you for the post :)
