- Cross Task Generalization: Generalization across tasks in the same environment, such as (a) to (c).
- Cross Website Generalization: Generalization across websites under the same domain, such as (a) to (d).
- Cross Domain Generalization: Generalization across tasks and environments, such as (e) through (i).
For each task, we provide the following information:
- **Task Description** that describes the task in natural language sentences.
- **Action Sequence** that describes the sequence of actions to be performed to complete the task.
  - Each action is a pair of (Operation, Target Element), where the Target Element is an element in the web page that the user chooses to interact with, and the Operation is the action to be performed on that element.
  - We support four common operations: Click, Hover, Type, and Select.
- **Webpage Snapshots** that serve as the environment. We provide snapshots in different formats:
  - **MHTML** that contains the raw HTML code of the webpage.
  - **DOM Snapshot** that contains the snapshot with DOM, layout, and style information.
  - **Image** that contains a screenshot of the webpage.
  - **HAR** that contains all network traffic for replay.
  - **Trace** that contains the complete interaction trace for the annotation.
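To make the record structure above concrete, here is a minimal hypothetical sketch of one task; the field names, file paths, and task content are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical sketch of a single task record. Field names and paths are
# illustrative only; they are not the dataset's real schema.
task = {
    "task_description": "Find a one-way flight from New York to Boston",
    "action_sequence": [
        # Each action is an (Operation, Target Element) pair.
        {"operation": "Click", "target_element": "One-way radio button"},
        {"operation": "Type", "target_element": "Departure city input"},
        {"operation": "Select", "target_element": "Departure date dropdown"},
    ],
    "snapshots": {  # one entry per snapshot format
        "mhtml": "task_0/page.mhtml",
        "dom_snapshot": "task_0/dom.json",
        "image": "task_0/page.png",
        "har": "task_0/network.har",
        "trace": "task_0/trace.zip",
    },
}

# Every operation must be one of the four supported kinds.
SUPPORTED_OPERATIONS = {"Click", "Hover", "Type", "Select"}
assert all(
    action["operation"] in SUPPORTED_OPERATIONS
    for action in task["action_sequence"]
)
```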
The data is collected through the Amazon Mechanical Turk (AMT) platform. The collection is done in three phases:
- Phase 1, Task Proposal: We first ask the worker to propose tasks that can be performed on a given website. We manually review the proposed tasks and select the ones that are feasible and interesting for annotation in Phase 2.
- Phase 2, Task Demonstration: We ask the worker to demonstrate how to perform the task on the website. We develop an annotation tool with Playwright which records the interaction trace and takes snapshots of the webpage at each step.
- Phase 3, Task Verification: The authors verified all the tasks to make sure that all actions are clean and that the task description correctly reflects the annotated actions.
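The Phase 3 check can be thought of as a consistency pass over the annotated actions. The helper below is an illustrative sketch of such a pass, assuming each action is an (operation, target element) pair; it is not the authors' actual verification code:

```python
# Operations the dataset supports, per the task description format.
SUPPORTED_OPERATIONS = {"Click", "Hover", "Type", "Select"}


def find_action_errors(action_sequence):
    """Return a list of problems found in an annotated action sequence.

    Each action is assumed to be an (operation, target_element) pair.
    A clean sequence yields an empty list.
    """
    errors = []
    for step, (operation, target_element) in enumerate(action_sequence):
        if operation not in SUPPORTED_OPERATIONS:
            errors.append(f"step {step}: unsupported operation {operation!r}")
        if not target_element:
            errors.append(f"step {step}: empty target element")
    return errors


# A clean sequence produces no errors; a malformed one is flagged.
assert find_action_errors([("Click", "Search button")]) == []
assert find_action_errors([("Drag", "Slider")]) == [
    "step 0: unsupported operation 'Drag'"
]
```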