Logo SkillWeaver

Web Agents can Self-Improve by Discovering and Honing Skills


¹The Ohio State University  ²University of Virginia  ³Purdue University  ⁴Carnegie Mellon University  ⁵Cisco Research
{zheng.2372, su.809}@osu.edu
*Equal contribution.

SkillWeaver collects skills as Python functions as it learns to use websites, and then uses those collected skills to improve its performance on future tasks through a purely self-driven curriculum. We demonstrate strong performance improvements on WebArena and on live websites without requiring any parameter updates. Skills are initially generated as generalized versions of past successful task attempts and subsequently debugged through a trial-and-error process.

Updates

  • 2025/04/09: Website is live. The initial manuscript and results are available.

Overview

Main figure

In this work, we introduce SkillWeaver, a skill-centric framework enabling agents to self-improve by autonomously synthesizing reusable skills as APIs. Given a new website, the agent autonomously discovers skills, executes them for practice, and distills practice experiences into robust APIs. Iterative exploration continually expands a library of lightweight, plug-and-play APIs, significantly enhancing the agent's capabilities. Experiments on WebArena and real-world websites demonstrate the efficacy of SkillWeaver, achieving relative success rate improvements of 31.8% and 39.8%, respectively. Additionally, APIs synthesized by strong agents substantially enhance weaker agents through transferable skills, yielding improvements of up to 54.3% on WebArena.
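The idea of a growing library of plug-and-play skills can be sketched as a small registry of async Python functions. This is a minimal illustration, not the authors' implementation: `SkillLibrary`, `register`, `describe`, and `call` are hypothetical names, and the skill body is a placeholder that does not touch a browser.

```python
import asyncio
import inspect
from typing import Awaitable, Callable, Dict

# Hypothetical alias: a skill is an async function whose first argument is a page-like object.
Skill = Callable[..., Awaitable[object]]


class SkillLibrary:
    """Hypothetical in-memory registry of synthesized skills."""

    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, fn: Skill) -> Skill:
        """Add an async function to the library, keyed by its name."""
        if not inspect.iscoroutinefunction(fn):
            raise TypeError(f"{fn.__name__} must be an async function")
        self._skills[fn.__name__] = fn
        return fn

    def describe(self) -> str:
        """Render each signature and first docstring line, e.g. for a prompt."""
        lines = []
        for name, fn in sorted(self._skills.items()):
            doc = (inspect.getdoc(fn) or "").splitlines()
            summary = doc[0] if doc else ""
            lines.append(f"async def {name}{inspect.signature(fn)}: {summary}")
        return "\n".join(lines)

    async def call(self, name: str, page, *args, **kwargs):
        """Look up a skill by name and await it."""
        return await self._skills[name](page, *args, **kwargs)


library = SkillLibrary()


@library.register
async def express_checkout(page):
    """Perform an express checkout for the items currently in the cart."""
    # Placeholder body: a real skill would drive the browser through `page`.
    return "checked out"
```

Keeping skills as plain functions with docstrings means the same text can serve both as executable code and as documentation shown to the agent.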

WebArena Experiments

Adding synthesized skills raises the average WebArena success rate of GPT-4o from 22.6 to 29.8 and of GPT-4o-mini from 9.2 to 14.1, with gains on every website category.

WebArena
Method         Gitlab    Map       Shopping   CMS       Reddit    AVG.
WebArena       15.0      15.6      13.9       10.4      6.6       12.3
AutoEval       25.0      27.5      39.6       20.9      20.8      26.9
*SteP          32.0      30.0      37.0       24.0      59.0      33.0
SKILLWEAVER
GPT-4o         17.8      27.5      19.8       18.7      37.7      22.6
+ Skills       22.2      33.9      27.2       25.8      50.0      29.8
Δ              ↑ 25%     ↑ 23%     ↑ 38%      ↑ 38%     ↑ 33%     ↑ 32%
GPT-4o-mini    6.1       10.3      11.8       3.3       18.9      9.2
+ Skills       8.9       16.7      17.1       7.7       26.4      14.1
Δ              ↑ 46%     ↑ 62%     ↑ 46%      ↑ 133%    ↑ 40%     ↑ 45%

Live Website Experiments

Online-Mind2Web
Method      Drug      Flight     Cooking   Car      AVG.
Baseline    65.0      11.7       62.5      11.1     40.2
+ Skills    87.0      29.4       75.0      11.1     56.2
Δ           ↑ 34%     ↑ 151%     ↑ 20%     ↑ 0%     ↑ 40%
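The Δ rows report relative improvement, which appears to be computed as (with skills − baseline) / baseline. The short sketch below recomputes the Online-Mind2Web row from the success rates in the table; the function and variable names are illustrative only.

```python
def relative_improvement(baseline: float, with_skills: float) -> float:
    """Return the relative gain in percent: 100 * (with_skills - baseline) / baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (with_skills - baseline) / baseline


# (baseline, + skills) success rates copied from the Online-Mind2Web table.
online_mind2web = {
    "Drug": (65.0, 87.0),
    "Flight": (11.7, 29.4),
    "Cooking": (62.5, 75.0),
    "Car": (11.1, 11.1),
    "AVG.": (40.2, 56.2),
}

# Round to whole percentages, matching the precision shown in the table.
deltas = {
    site: round(relative_improvement(base, skilled))
    for site, (base, skilled) in online_mind2web.items()
}
```

Rounded to whole percentages, these values reproduce the Δ row of the Online-Mind2Web table.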

Case Studies

We compare success and failure cases of the generated APIs.

API Generation

Here, the agent is able to generate a useful API to manage the process of checking out all items in a cart.

async def express_checkout(page):
  """
  Perform an express checkout for the items currently in the cart.

  Args:
      page: The Playwright page object to perform actions on.

  Usage Log:
  - Successfully completed express checkout, resulting in an order confirmation page with order number 000000191.
  - Initial attempts failed due to a timeout error when clicking 'Proceed to Checkout'. Resolved by ensuring items were in the cart.

  Note:
  - Ensure that the cart is pre-filled with the desired items before calling this function.
  - The function assumes that the 'Proceed to Checkout' button is visible and clickable from the cart page.
  - The function navigates through the checkout process by clicking 'Next' on the Shipping page and 'Place Order' on the Review & Payments page.
  - If the 'Place Order' button is not immediately visible, a delay is included to allow dynamic elements to load.
  """
  import asyncio

  await page.goto("/")
  await page.get_by_role("link", name="My Cart").click()
  await asyncio.sleep(5)
  await page.get_by_role("button", name="Proceed to Checkout").click()
  await asyncio.sleep(5)
  await page.get_by_role("button", name="Next").click()
  await asyncio.sleep(5)
  await page.get_by_role("button", name="Place Order").click()
  await asyncio.sleep(5)

Images before (left) and after (right) API execution.

Skill Honing

Here, we demonstrate that the agent can fix DOM resolution errors by making targeted tweaks to the problematic sections of code. After the following trajectory, the agent patches the skill in its knowledge base.

Error Detection

Here is the trajectory in which the agent attempted to execute the API, encountered a runtime error, and then fixed the API.

State 0

URL: /byteblaze/accessible-html-content-patterns/-/merge_requests/14

Reasoning

  1. I need to delete merged branches in the specified project 'byteblaze/accessible-html-content-patterns'.
  2. I can utilize the delete_merged_branches function which is predefined to handle this operation effectively in the GitLab UI.
  3. Since the page indicates that I am already within the correct project context, I can proceed directly with invoking the function using the provided project path.

Action

async def act(page):
  # Call the function to delete merged branches
  await delete_merged_branches(page, "byteblaze/accessible-html-content-patterns")

Outcome

'Error': Locator.fill: Error: strict mode violation: get_by_role("textbox") resolved to 2 elements:
1) <input id="search" type="text" name="search" autocomplete="off" placeholder="Search GitLab" data-qa-selector="search box" class="form-control gl-form-input gl-search-box-by-type-input" />
aka get_by_placeholder("Search GitLab")
2) <input type="text" id="BVID158" autocomplete="off" aria-labelledby="input-label" data-qa-selector="delete merged branches input" class="gl-form-input gl-mt-2 form-control gl-form-input-sm" />
aka locator("[id=\"BVID158\"]")
Call log: ...

(Explanation: Here, there is a runtime error in the delete_merged_branches function. This means the skill is incorrectly or non-robustly implemented. In this case, the selector used in the function was ambiguous.)

State 1

URL: /byteblaze/accessible-html-content-patterns/-/branches

Reasoning

The page currently displays a dialog that asks for confirmation to delete all merged branches. It requires typing 'delete' into an input box. After typing the confirmation text, the "Delete merged branches" button can be clicked to proceed.

Action

async def act(page):
  # Fill the confirmation textbox with 'delete'
  await page.get_by_role("dialog", name="Delete all merged branches?").get_by_role("textbox").fill("delete")
  # Click the "Delete merged branches" button to confirm deletion
  await page.get_by_role("dialog", name="Delete all merged branches?").get_by_role("button", name="Delete merged branches").click()

Outcome

No runtime errors.
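The trajectory above follows a run, fail, recover pattern. One way to capture the failure so it can later inform a patch is sketched below. It is a hedged illustration rather than the authors' pipeline: `SkillRecord`, `run_and_log`, and `flaky_skill` are hypothetical names, and the stand-in skill simply raises an error resembling the one reported above.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Optional


@dataclass
class SkillRecord:
    """Hypothetical wrapper pairing a skill with its usage log."""
    fn: Callable[..., Awaitable[object]]
    usage_log: List[str] = field(default_factory=list)


async def run_and_log(record: SkillRecord, page, *args) -> Optional[str]:
    """Run a skill; return None on success or the error text on failure."""
    try:
        await record.fn(page, *args)
    except Exception as exc:  # a real system might catch only browser errors
        message = f"{type(exc).__name__}: {exc}"
        record.usage_log.append(f"Failed: {message}")
        return message
    record.usage_log.append("Succeeded")
    return None


async def flaky_skill(page, project_path):
    # Stand-in for a skill whose selector matches more than one element.
    raise RuntimeError('strict mode violation: get_by_role("textbox") resolved to 2 elements')


record = SkillRecord(fn=flaky_skill)
error = asyncio.run(run_and_log(record, None, "byteblaze/accessible-html-content-patterns"))
```

The returned error text and the usage log are the kind of evidence a repair step could use when rewriting the failing selector.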

Agent's Proposed Fix: To fix the runtime error in the generated API, the agent correctly identifies the source of the error and then makes the code more robust.

async def delete_merged_branches(page, project_path):
    """
    Automates the deletion of merged branches in a specified GitLab project.
    This function navigates to the branches page of the specified project and
    deletes all merged branches by interacting with the confirmation dialog
    that appears.

    Args:
        page: The Playwright page object.
        project_path: The path of the project in GitLab
            (e.g., 'byteblaze/accessible-html-content-patterns').

    Usage log:
    - Successfully used to delete merged branches in the
      'a11y-syntax-highlighting' project by navigating to the branches page,
      clicking the 'Delete merged branches' button, and confirming the action
      by typing 'delete'.
    - Ensure the project path is correctly specified to match the URL structure
      in GitLab.
    - The function assumes the user is logged in and has access to the
      specified project.
    - Resolved an issue with strict mode violation by specifying the correct
      textbox within the dialog.
    """
    import asyncio

    await page.goto(f"/{project_path}/-/branches")
    await page.get_by_role("button", name="Delete merged branches").click()
--  await page.get_by_role("textbox").fill("delete")
++  await page.get_by_role("dialog", name="Delete all merged branches?").get_by_role(
++      "textbox"
++  ).fill("delete")
    await page.get_by_role("dialog", name="Delete all merged branches?").get_by_role(
        "button", name="Delete merged branches"
    ).click()
    await asyncio.sleep(2)

Failure Cases

Failing to Use APIs

Sometimes, the agent fails to use an API that it has generated. For example, on the website cookpad.com, the agent fails to call search_recipes_by_cuisine_type('hamburger') to retrieve recipe details.

cookpad.com website

Using APIs Incorrectly

Sometimes, website features (like search bars) have mechanics that the agent fails to grasp, but which could potentially be encoded in the API's documentation. For example, here the agent calls search_recipes_by_ingredients('chocolate chip, -nuts') instead of search_recipes_by_ingredients('chocolate chip without nuts'). As a result, the search results include recipes containing nuts, because this website does not treat the "-" symbol in a search query as a negative keyword filter.
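One way such a mechanic could be encoded is to state it in the skill's docstring and normalize the query before it is used. The sketch below is purely illustrative: the `page` parameter, the docstring note, and the helper `rewrite_negative_terms` are assumptions, not part of the generated API.

```python
def rewrite_negative_terms(query: str) -> str:
    """Rewrite '-term' exclusions as 'without term' phrasing."""
    keep, exclude = [], []
    for part in query.split(","):
        part = part.strip()
        if not part:
            continue
        if part.startswith("-"):
            exclude.append(part[1:].strip())
        else:
            keep.append(part)
    result = ", ".join(keep)
    if exclude:
        result = f"{result} without {' or '.join(exclude)}".strip()
    return result


async def search_recipes_by_ingredients(page, ingredients: str):
    """Search recipes by ingredients (hypothetical signature).

    Note:
    - The site does not treat '-' as an exclusion operator; phrase
      exclusions as 'without <ingredient>' instead.
    """
    query = rewrite_negative_terms(ingredients)
    # A real skill would now type `query` into the search bar via `page`.
    return query
```

With this helper, the query 'chocolate chip, -nuts' is rewritten to 'chocolate chip without nuts' before it reaches the search bar.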

cookpad.com website, with the search bar containing 'chocolate chip, -nuts'

Limitations

While Python APIs for navigating a website are promising, and it is especially appealing for an agent to improve simply by gaining more experience, the approach has limitations. Our method (like many others) relies on the accessibility tree, derived from the DOM, to obtain a compressed, semantically meaningful text representation of the page. However, some websites do not parse into a natural accessibility tree representation, which may hinder performance.
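To make the idea of a compressed text representation concrete, the sketch below flattens a small accessibility-style tree into indented lines. The node format (`role`, `name`, `children`) and the function `render_tree` are assumptions for illustration, not the representation used in this work.

```python
from typing import Dict, List


def render_tree(node: Dict, depth: int = 0) -> List[str]:
    """Flatten a nested {'role', 'name', 'children'} node into indented text lines."""
    name = node.get("name", "")
    line = "  " * depth + node["role"] + (f' "{name}"' if name else "")
    lines = [line]
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines


# Hypothetical snapshot modeled on the GitLab confirmation dialog discussed earlier.
snapshot = {
    "role": "dialog",
    "name": "Delete all merged branches?",
    "children": [
        {"role": "textbox", "name": ""},
        {"role": "button", "name": "Delete merged branches"},
    ],
}
text = "\n".join(render_tree(snapshot))
```

Note that the unlabeled textbox renders as a bare role with no name, which hints at why pages with poor accessibility markup give the agent little to work with.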

BibTeX


@inproceedings{zheng2025skillweaver,
title={Web Agents can Self-Improve by Discovering and Honing Skills},
author={Boyuan Zheng and Michael Y. Fatemi and Xiaolong Jin and Zora Zhiruo Wang and Apurva Gandhi and Yueqi Song and Yu Gu and Jayanth Srinivasa and Gaowen Liu and Graham Neubig and Yu Su},
booktitle={Conference on Language Modeling (under review)},
year={2025},
url={https://arxiv.org/abs/2504.07079}
}