Action Space
An action is specified by an action type (e.g., CLICK_COORDS)
and the necessary fields for that action type (e.g., coords=[30, 60]).
Supported Action Types
MiniWoB++ environments support the following action types:
| Name | Description |
|---|---|
| | Do nothing for the current step. |
| | Move the cursor to the specified coordinates. |
| CLICK_COORDS | Click on the specified coordinates. |
| | Double-click on the specified coordinates. |
| | Start dragging on the specified coordinates. |
| | Stop dragging on the specified coordinates. |
| | Scroll up on the mouse wheel at the specified coordinates. |
| | Scroll down on the mouse wheel at the specified coordinates. |
| CLICK_ELEMENT | Click on the specified element using JavaScript. |
| PRESS_KEY | Press the specified key or key combination. |
| | Type the specified string. |
| | Type the value of the specified task field. |
| | Click on the specified element using JavaScript, and then type the specified string. |
| | Click on the specified element using JavaScript, and then type the value of the specified task field. |
Some action types overlap in what they do (e.g., CLICK_COORDS and CLICK_ELEMENT both click).
A common practice is to restrict the agent to a subset of action types in the config, as described below.
Action Configs
The list of selected action types, along with other configurations, can be customized
by passing a miniwob.action.ActionSpaceConfig object to the action_space_config argument
during environment construction.
An ActionSpaceConfig object has the following fields:
| Key | Type | Description |
|---|---|---|
| | | An ordered sequence of action types to include. |
| | | Screen width. Will be overridden by the environment constructor. |
| | | Screen height. Will be overridden by the environment constructor. |
| | | If specified, bin the x and y coordinates to these numbers of bins. Mouse actions will be executed at the middle of the specified partition. |
| | | The amount to scroll for scroll actions. |
| | | Time in milliseconds to wait for scroll action animation. |
| allowed_keys | | An ordered sequence of allowed keys and key combinations for the PRESS_KEY action type. |
| | | Maximum text length for the |
| | | Character set for the |
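The coordinate-binning option above can be illustrated with a small sketch. It assumes that each axis is split into equal-width bins and that "the middle of the specified partition" means the center of the chosen bin; the function names `bin_center` and `binned_to_screen` are hypothetical and not part of MiniWoB++, and the screen size in the usage lines is only an example value.

```python
def bin_center(bin_index, num_bins, screen_size):
    """Return the coordinate at the middle of one bin along a single axis.

    Assumption: bins are equal-width and indexed from 0.
    """
    if not 0 <= bin_index < num_bins:
        raise ValueError(f"bin index {bin_index} is outside [0, {num_bins})")
    bin_size = screen_size / num_bins
    return (bin_index + 0.5) * bin_size


def binned_to_screen(bins_xy, num_bins_xy, screen_wh):
    """Map (x_bin, y_bin) to (left, top) screen coordinates (hypothetical helper)."""
    return tuple(
        bin_center(b, n, s) for b, n, s in zip(bins_xy, num_bins_xy, screen_wh)
    )


# Example: a 160x210 screen split into 10 bins per axis.
first = binned_to_screen((0, 0), (10, 10), (160, 210))
last = binned_to_screen((9, 9), (10, 10), (160, 210))
```

With binning enabled, the agent chooses among a small grid of positions instead of every pixel, which shrinks the action space at the cost of click precision.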
Presets
The following preset names can be specified in place of the ActionSpaceConfig object:
"all_supported": Select all supported actions, including redundant ones.
"shi17": The action space from (Shi et al., 2017) World of Bits: An Open-Domain Platform for Web-Based Agents.
"liu18": The action space from (Liu et al., 2018) Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration.
"humphreys22": The action space from (Humphreys et al., 2022) A data-driven approach for learning to control computers.
Adding "_mac_os" to the preset name will change the key modifiers in allowed_keys
from Control to Meta.
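As a rough illustration of what the "_mac_os" suffix implies for a key list, the sketch below rewrites the Control prefix "C-" to the Meta prefix "M-", using the modifier prefixes described under Key combinations. The helper name `control_to_meta` is hypothetical, and how the library performs the substitution internally is an assumption here.

```python
MODIFIER_PREFIXES = ("C-", "S-", "A-", "M-")


def control_to_meta(combo):
    """Swap the Control modifier for Meta in one key combination string.

    Only leading modifier prefixes are touched, so a base key such as
    "C" or "-" is left alone.
    """
    prefixes = []
    rest = combo
    # Peel off modifier prefixes, always leaving at least one character of key.
    while len(rest) > 2 and rest[:2] in MODIFIER_PREFIXES:
        prefixes.append(rest[:2])
        rest = rest[2:]
    swapped = ["M-" if p == "C-" else p for p in prefixes]
    return "".join(swapped) + rest


# Example key list (illustrative values, not a real preset).
keys = ["<Enter>", "C-a", "C-S-<ArrowLeft>"]
mac_keys = [control_to_meta(k) for k in keys]
```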
Key combinations
The PRESS_KEY action type issues a key combination via Selenium.
Each key combination in the allowed_keys config follows these rules:
Modifiers are specified using prefixes “C-” (Control), “S-” (Shift), “A-” (Alternate), or “M-” (Meta).
Printable character keys (a, 1, etc.) are specified directly. Shifted characters (A, !, etc.) are equivalent to “S-” + non-shifted counterpart.
Special keys are enclosed in “<…>”. The list of valid names is specified in
miniwob.constants.WEBDRIVER_SPECIAL_KEYS.
Example valid key combinations: "7", "<Enter>", "C-S-<ArrowLeft>".
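The rules above can be sketched as a small parser. This is an illustrative reading of the rules, not the library's implementation: `parse_key_combination` is a hypothetical name, `SPECIAL_KEYS` is a tiny stand-in for miniwob.constants.WEBDRIVER_SPECIAL_KEYS (whose contents are not listed here), and `SHIFTED_SYMBOLS` assumes a US keyboard layout and is deliberately partial.

```python
MODIFIERS = {"C": "Control", "S": "Shift", "A": "Alternate", "M": "Meta"}
SPECIAL_KEYS = {"Enter", "ArrowLeft", "Tab", "Backspace"}  # hypothetical subset
SHIFTED_SYMBOLS = {"!": "1", "@": "2", "#": "3"}  # assumption: US layout, partial


def parse_key_combination(combo):
    """Split a combination into (modifiers, key), normalizing shifted characters."""
    modifiers = set()
    rest = combo
    # Rule 1: strip leading "C-", "S-", "A-", "M-" prefixes.
    while len(rest) > 2 and rest[1] == "-" and rest[0] in MODIFIERS:
        modifiers.add(MODIFIERS[rest[0]])
        rest = rest[2:]
    # Rule 3: special keys are written as "<Name>".
    if len(rest) > 2 and rest.startswith("<") and rest.endswith(">"):
        if rest[1:-1] not in SPECIAL_KEYS:
            raise ValueError(f"unknown special key: {rest}")
        return frozenset(modifiers), rest
    if len(rest) != 1:
        raise ValueError(f"invalid key: {rest!r}")
    # Rule 2: a shifted character equals "S-" plus its non-shifted counterpart.
    if rest in SHIFTED_SYMBOLS:
        modifiers.add("Shift")
        rest = SHIFTED_SYMBOLS[rest]
    elif rest.isalpha() and rest.isupper():
        modifiers.add("Shift")
        rest = rest.lower()
    return frozenset(modifiers), rest


parsed = parse_key_combination("C-S-<ArrowLeft>")
```

Normalizing "A" and "S-a" to the same pair makes it easy to detect duplicate entries in a key list.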
Action Object
The action passed to the step method should be a dict.
Which fields the dict must include depends on the action types selected in the config:
| Key | Type | Description | Inclusion |
|---|---|---|---|
| action_type | | Action type index from the | Always. |
| coords | | Left and top coordinates. Depending on the | When any |
| | | Element | When any |
| key | | Key index from the allowed_keys list in the config. | When the PRESS_KEY action type is selected. |
| | | Text to type. | When any |
| | | Index from the task field list | When any |
For instance, if the config only contains the action types CLICK_COORDS and PRESS_KEY,
the action object can be:
import numpy as np

action = {
    "action_type": 0,  # Index 0 in the config's action types: CLICK_COORDS
    "coords": np.array([100, 50]),  # Left and top coordinates
    "key": 0,  # Ignored when the action type is CLICK_COORDS
}
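To make the index lookup explicit, here is a minimal sketch that builds such a dict from the ordered list of selected action types. The helper `make_action` is hypothetical, action types are represented by plain strings for illustration, and a plain list stands in for the NumPy array so the sketch has no dependencies; an actual environment may expect an array as in the example above.

```python
def make_action(selected_types, action_type, coords=(0, 0), key=0):
    """Build an action dict for a config with coordinate and key actions.

    selected_types: ordered action type names, as listed in the config.
    action_type: the name to execute; it is stored as its index in that list.
    Fields not used by the chosen action type are still present and are ignored.
    """
    if action_type not in selected_types:
        raise ValueError(f"{action_type} is not selected in the config")
    return {
        "action_type": selected_types.index(action_type),
        "coords": list(coords),
        "key": key,
    }


selected = ["CLICK_COORDS", "PRESS_KEY"]
click = make_action(selected, "CLICK_COORDS", coords=(100, 50))
press = make_action(selected, "PRESS_KEY", key=2)
```

Because the dict always carries every field required by the config, its shape stays fixed from step to step regardless of which action type is chosen.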