Thoughts — Raghav Nautiyal

April 26, 2026

On how models learn human preferences and optimize for them.

April 6, 2026

On mining repeated behaviors from traces to build higher-level actions for agents.

March 9, 2026

On what happens when models start reasoning about being evaluated.

February 16, 2026

Notes on GRPO, DAPO, and training reasoning capabilities in language models.