- Cross-Task Generalization: Generalization across tasks in the same environment, such as (a) to (c).
- Cross-Website Generalization: Generalization across websites under the same domain, such as (a) to (d).
- Cross-Domain Generalization: Generalization across tasks and environments, such as (e) through (i).
For each task, we provide the following information:
- **Task Description** that describes the task in natural language sentences.
- **Action Sequence** that describes the sequence of actions to be performed to complete the task.
  - Each action is a pair of (Operation, Target Element), where the Target Element is an element on the webpage that the user chooses to interact with, and the Operation is the action to be performed on that element.
  - We support four common operations: Click, Hover, Type, and Select Option.
- **Webpage Snapshots** that serve as the environment. We provide snapshots in different formats:
  - **MHTML** that contains the raw HTML code of the webpage.
  - **DOM Snapshot** that contains the snapshot with DOM, layout, and style information.
  - **Image** that contains the screenshot of the webpage.
  - **HAR** that contains all network traffic for replay.
  - **Trace** that contains the complete interaction trace for the annotation.
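As a concrete illustration of the (Operation, Target Element) action format described above, here is a minimal Python sketch. The field names, selectors, and the optional `value` field are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Illustrative sketch of the (Operation, Target Element) action pair.
# Field names and element selectors are assumptions, not the dataset's schema.
@dataclass
class Action:
    operation: str       # the operation performed, e.g. "CLICK" or "TYPE"
    target_element: str  # the webpage element the user interacts with
    value: str = ""      # optional input, e.g. text typed into a field

# A toy action sequence for a hypothetical flight-search task.
action_sequence = [
    Action("CLICK", "input#destination"),
    Action("TYPE", "input#destination", value="Tokyo"),
    Action("CLICK", "button#search"),
]

for step, action in enumerate(action_sequence):
    print(step, action.operation, action.target_element)
```

Each step in an annotated task can then be replayed or scored by comparing the predicted pair against the ground-truth pair.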
The data is collected through the Amazon Mechanical Turk (AMT) platform. The collection is done in three phases:
- Phase 1, Task Proposal: We first ask the worker to propose tasks that can be performed on a given website. We manually review the proposed tasks and select the ones that are feasible and interesting for annotation in Phase 2.
- Phase 2, Task Demonstration: We ask the worker to demonstrate how to perform the task on the website. We develop an annotation tool with Playwright that records the interaction trace and takes snapshots of the webpage at each step.
- Phase 3, Task Verification: The authors verify all tasks to make sure that all actions are clean and that the task description correctly reflects the annotated actions.
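The Phase 2 recording loop can be sketched with a stand-in recorder. The real tool is built on Playwright; the class and field names below are assumptions for illustration, not its actual API.

```python
# Toy stand-in for the Playwright-based annotation tool: each demonstrated
# action is appended to the trace together with a snapshot of the page
# state at that step. Names and fields here are illustrative assumptions.
class TraceRecorder:
    def __init__(self):
        self.trace = []

    def record(self, operation, target, page_snapshot):
        self.trace.append({
            "step": len(self.trace),
            "operation": operation,
            "target": target,
            # in the real tool this would be an MHTML/DOM/image snapshot
            "snapshot": page_snapshot,
        })

recorder = TraceRecorder()
recorder.record("CLICK", "a#flights", page_snapshot="<html>step-0</html>")
recorder.record("TYPE", "input#from", page_snapshot="<html>step-1</html>")
```

Pairing every action with a per-step snapshot is what makes the demonstrations replayable and verifiable in Phase 3.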