Learning Action Macros for Computer-Use Agents

As part of CDSS 94: Building Thoughtful AI Systems at Berkeley, my teammates and I were building a browser agent to automate online e-commerce tasks such as Amazon order fulfillment. While working with it, we noticed how often the agent repeated the same action sequences: searching, filtering, navigating pages, and adding items to cart.

That led us to a question: could we mine these repeated action chunks from traces and expose them as higher-level "macros"? Instead of reasoning step by step, a browser agent might call something like search(query) or open_product(id), learned directly from its (or other agents') past behavior.
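A minimal version of this mining step can be sketched as counting frequent contiguous action n-grams across traces. The trace schema here (a list of `{"action": name}` steps) and all thresholds are illustrative assumptions, not our actual implementation:

```python
from collections import Counter

def mine_macros(traces, min_len=2, max_len=4, min_count=3):
    """Count contiguous action n-grams across traces; frequent ones
    become macro candidates. Trace schema is a hypothetical example."""
    counts = Counter()
    for trace in traces:
        actions = [step["action"] for step in trace]
        for n in range(min_len, max_len + 1):
            for i in range(len(actions) - n + 1):
                counts[tuple(actions[i:i + n])] += 1
    # Keep only chunks seen at least min_count times
    return {ngram: c for ngram, c in counts.items() if c >= min_count}

# Toy traces: a real trace would also carry action arguments and page state.
traces = [
    [{"action": a} for a in ["search", "filter", "open_product", "add_to_cart"]],
    [{"action": a} for a in ["search", "filter", "open_product", "back"]],
    [{"action": a} for a in ["search", "filter", "open_product", "add_to_cart"]],
]
macros = mine_macros(traces)
# ("search", "filter", "open_product") appears in all three traces,
# so it survives the threshold and could be exposed as one macro.
```

A real system would also need to parameterize the macros (e.g. lifting the search query into an argument) and decide when arguments differ too much for two chunks to count as the same pattern.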

Our very preliminary experiments suggest this could reduce the number of decisions an agent has to make: each macro call stands in for several step-by-step choices. In principle, that means fewer model calls and shallower planning, though it is still unclear how much it would affect actual browser execution time.

It also seems likely that this depends heavily on the setting. In more structured workflows, like order fulfillment, form filling, or account management, repeated patterns may be easy to surface. On broader, open-ended web tasks, there may simply not be enough consistency for useful abstractions to emerge.

We're excited to keep exploring whether collecting traces across representative tasks can surface reusable macros, and whether giving agents access to them might make browser agents both cheaper and better, especially on more complex tasks where too much effort is spent just navigating the UI.