Thoughts

Learning Action Macros for Computer-Use Agents

On mining repeated behaviors from traces to build higher-level actions for agents.

Evaluation Awareness

On what happens when models start reasoning about being evaluated.

Training Reasoning Models: Notes as I Try to Understand GRPO and DAPO

Notes on GRPO, DAPO, and training reasoning capabilities in language models.