Training Reasoning Models: Notes as I Try to Understand GRPO and DAPO

Notes on GRPO, DAPO, and training reasoning capabilities in language models.