Action Space
An action is specified by an action type (e.g., CLICK_COORDS)
and the necessary fields for that action type (e.g., coords=[30, 60]).
Supported Action Types
MiniWoB++ environments support the following action types:
| Name | Description |
|---|---|
| NONE | Do nothing for the current step. |
| MOVE_COORDS | Move the cursor to the specified coordinates. |
| CLICK_COORDS | Click on the specified coordinates. |
| DBLCLICK_COORDS | Double-click on the specified coordinates. |
| MOUSEDOWN_COORDS | Start dragging on the specified coordinates. |
| MOUSEUP_COORDS | Stop dragging on the specified coordinates. |
| SCROLL_UP_COORDS | Scroll up on the mouse wheel at the specified coordinates. |
| SCROLL_DOWN_COORDS | Scroll down on the mouse wheel at the specified coordinates. |
| CLICK_ELEMENT | Click on the specified element using JavaScript. |
| PRESS_KEY | Press the specified key or key combination. |
| TYPE_TEXT | Type the specified string. |
| TYPE_FIELD | Type the value of the specified task field. |
| FOCUS_ELEMENT_AND_TYPE_TEXT | Click on the specified element using JavaScript, and then type the specified string. |
| FOCUS_ELEMENT_AND_TYPE_FIELD | Click on the specified element using JavaScript, and then type the value of the specified task field. |
Some action types perform similar actions (e.g., CLICK_COORDS and CLICK_ELEMENT both click).
A common practice is to select, in the config, only the subset of action types that the agent may use, as described below.
Action Configs
The list of selected action types, along with other configurations, can be customized
by passing a miniwob.action.ActionSpaceConfig object to the action_space_config argument
during environment construction.
An ActionSpaceConfig object has the following fields:
| Key | Type | Description |
|---|---|---|
| action_types | Sequence[ActionTypes] | An ordered sequence of action types to include. |
| screen_width | float | Screen width. Will be overridden by the environment constructor. |
| screen_height | float | Screen height. Will be overridden by the environment constructor. |
| coord_bins | tuple[int, int] | If specified, bin the x and y coordinates to these numbers of bins. Mouse actions will be executed at the middle of the specified partition. |
| scroll_amount | int | The amount to scroll for scroll actions. |
| scroll_time | int | Time in milliseconds to wait for scroll action animation. |
| allowed_keys | Sequence[str] | An ordered sequence of allowed keys and key combinations for the PRESS_KEY action. |
| text_max_len | int | Maximum text length for the TYPE_TEXT action. |
| text_charset | str or Set[str] | Character set for the TYPE_TEXT action. |
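To make the coordinate-binning option concrete, the following sketch maps a pair of bin indices to the pixel position at the middle of the corresponding partition. The function name bin_to_pixel, the equal-width partitioning, and the 160 x 210 screen size in the usage line are assumptions made for illustration; this is not the library's implementation.

```python
from typing import Tuple


def bin_to_pixel(
    bin_index: Tuple[int, int],
    coord_bins: Tuple[int, int],
    screen_width: float,
    screen_height: float,
) -> Tuple[float, float]:
    """Return the pixel at the middle of the partition addressed by bin_index.

    Assumes equal-width bins along each axis (an illustrative assumption).
    """
    x_bin, y_bin = bin_index
    x_bins, y_bins = coord_bins
    if not (0 <= x_bin < x_bins and 0 <= y_bin < y_bins):
        raise ValueError(f"bin index {bin_index} is outside {coord_bins}")
    bin_width = screen_width / x_bins
    bin_height = screen_height / y_bins
    # Adding 0.5 moves from the partition's left/top edge to its middle.
    return ((x_bin + 0.5) * bin_width, (y_bin + 0.5) * bin_height)


# Hypothetical 160 x 210 screen split into 8 x 10 bins.
print(bin_to_pixel((2, 3), (8, 10), 160, 210))  # (50.0, 73.5)
```

With binning, the agent chooses among a finite grid of positions instead of arbitrary pixel coordinates, which trades precision for a smaller action space.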
Presets
The following preset names can be specified in place of the ActionSpaceConfig object:
"all_supported": Select all supported actions, including redundant ones."shi17": The action space from (Shi et al., 2017) World of Bits: An Open-Domain Platform for Web-Based Agents."liu18": The action space from (Liu et al., 2018) Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration."humphreys22": The action space from (Humphreys et al., 2022) A data-driven approach for learning to control computers.
Adding "_mac_os" to the preset name will change the key modifiers in allowed_keys
from Control to Meta.
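The effect of the "_mac_os" suffix on a single key combination can be pictured with the sketch below, which rewrites leading Control prefixes ("C-") to Meta ("M-") and leaves everything else untouched. The helper name to_mac_os is hypothetical, and the code only illustrates the described mapping; it is not taken from the library.

```python
MODIFIER_PREFIXES = ("C-", "S-", "A-", "M-")


def to_mac_os(combo: str) -> str:
    """Replace Control modifiers with Meta in one key combination string."""
    prefixes = []
    rest = combo
    # Consume modifier prefixes from the left; whatever remains is the key.
    while len(rest) > 2 and rest[:2] in MODIFIER_PREFIXES:
        prefixes.append("M-" if rest[:2] == "C-" else rest[:2])
        rest = rest[2:]
    return "".join(prefixes) + rest


print(to_mac_os("C-c"))              # M-c
print(to_mac_os("C-S-<ArrowLeft>"))  # M-S-<ArrowLeft>
print(to_mac_os("<Enter>"))          # <Enter>
```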
Key combinations
The PRESS_KEY action type issues a key combination via Selenium.
Each key combination in the allowed_keys config follows these rules:
Modifiers are specified using prefixes “C-” (Control), “S-” (Shift), “A-” (Alternate), or “M-” (Meta).
Printable character keys (a, 1, etc.) are specified directly. Shifted characters (A, !, etc.) are equivalent to “S-” + the non-shifted counterpart.
Special keys are enclosed in “<…>”. The list of valid names is specified in
miniwob.constants.WEBDRIVER_SPECIAL_KEYS.
Examples of valid key combinations: "7", "<Enter>", "C-S-<ArrowLeft>".
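The rules above can be sketched as a small parser that splits a key combination into its modifiers and its base key. This is a minimal illustration, not the library's code: the function name parse_key_combination is hypothetical, SPECIAL_KEYS is a two-entry stand-in for miniwob.constants.WEBDRIVER_SPECIAL_KEYS, and SHIFTED_SYMBOLS covers only a few symbols under an assumed US keyboard layout.

```python
from typing import FrozenSet, Tuple

MODIFIERS = {"C": "Control", "S": "Shift", "A": "Alternate", "M": "Meta"}
# Stand-in for miniwob.constants.WEBDRIVER_SPECIAL_KEYS (illustrative subset).
SPECIAL_KEYS = {"Enter", "ArrowLeft"}
# Assumed US-layout mapping from shifted symbols to their base keys (subset).
SHIFTED_SYMBOLS = {"!": "1", "@": "2", "#": "3"}


def parse_key_combination(combo: str) -> Tuple[FrozenSet[str], str]:
    """Split a combination such as "C-S-<ArrowLeft>" into (modifiers, key)."""
    modifiers = set()
    rest = combo
    # Rule 1: modifiers are one-letter prefixes followed by "-".
    while len(rest) > 2 and rest[1] == "-" and rest[0] in MODIFIERS:
        modifiers.add(MODIFIERS[rest[0]])
        rest = rest[2:]
    # Rule 3: special keys are enclosed in <...>.
    if len(rest) > 2 and rest.startswith("<") and rest.endswith(">"):
        name = rest[1:-1]
        if name not in SPECIAL_KEYS:
            raise ValueError(f"unknown special key: {name!r}")
        return frozenset(modifiers), name
    if len(rest) != 1:
        raise ValueError(f"invalid key combination: {combo!r}")
    # Rule 2: a shifted character equals "S-" plus its non-shifted counterpart.
    if rest.isalpha() and rest.isupper():
        modifiers.add("Shift")
        rest = rest.lower()
    elif rest in SHIFTED_SYMBOLS:
        modifiers.add("Shift")
        rest = SHIFTED_SYMBOLS[rest]
    return frozenset(modifiers), rest


print(parse_key_combination("C-S-<ArrowLeft>"))
print(parse_key_combination("A") == parse_key_combination("S-a"))  # True
```

Normalizing shifted characters this way means "A" and "S-a" compare equal after parsing, which mirrors the equivalence stated in the rules.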
Action Object
The action passed to the step method
should be a dict whose field inclusion depends on the selected action types in the config:
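| Key | Type | Description | Inclusion |
|---|---|---|---|
| action_type | int | Action type index from the action_types list in the config. | Always. |
| coords | np.ndarray | Left and top coordinates. Depending on the coord_bins config, these are either pixel coordinates or bin indices. | When any action type ending in _COORDS is selected. |
| ref | int | Element ref ID. | When any action type containing ELEMENT is selected. |
| key | int | Key index from the allowed_keys list in the config. | When the PRESS_KEY action type is selected. |
| text | str | Text to type. | When any action type ending in TEXT is selected. |
| field | int | Index from the task field list. | When any action type ending in FIELD is selected. |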
For instance, if the config only contains action types CLICK_COORDS and PRESS_KEY,
the action object can be:
import numpy as np

action = {
    "action_type": 0,  # Index of CLICK_COORDS in the configured action types
    "coords": np.array([100, 50]),
    "key": 0,  # Ignored when the action type is CLICK_COORDS
}
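As a rough sketch of the inclusion rules, the helper below lists which keys an action dict needs for a given selection of action types. The function name required_action_keys is hypothetical, the key names ref, text, and field and the name-matching rules are assumptions made for this illustration, and action types are written as plain strings rather than library enum members.

```python
from typing import List, Sequence


def required_action_keys(action_types: Sequence[str]) -> List[str]:
    """List the keys an action dict needs for the selected action types."""
    keys = ["action_type"]  # Always included.
    if any(name.endswith("_COORDS") for name in action_types):
        keys.append("coords")
    if any("ELEMENT" in name for name in action_types):
        keys.append("ref")  # Assumed key name for the element reference.
    if "PRESS_KEY" in action_types:
        keys.append("key")
    if any(name.endswith("TEXT") for name in action_types):
        keys.append("text")  # Assumed key name for the string to type.
    if any(name.endswith("FIELD") for name in action_types):
        keys.append("field")  # Assumed key name for the task field index.
    return keys


print(required_action_keys(["CLICK_COORDS", "PRESS_KEY"]))
# ['action_type', 'coords', 'key']
```

Every listed key must be present in each action, even when the chosen action type ignores it, which is why the example above still supplies "key" for a CLICK_COORDS action.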