Computer Use
Enable AI agents to control your computer like a human - clicking, typing, running commands, and editing files.
Transform your AI agent into an autonomous computer operator. This built-in extension enables agents to see your screen, control the mouse and keyboard, execute shell commands, and edit files - just like a human user sitting at the computer.
Based on Anthropic's computer use tools, it brings full desktop automation capabilities to your AI workflows.
Using Computer Use Tools
The built-in Computer Use tools are listed under the computer_use category and can be executed directly from the Tools page, which also shows the definitions of all available Computer Use tools.
Selecting Tools
Enable or disable individual tools based on your workflow needs.
Why Computer Use?
Traditional AI tools operate through APIs and structured data. Computer Use breaks this barrier by letting agents interact with any application - web browsers, desktop apps, terminals, IDEs - exactly as you would. This unlocks automation for:
- Legacy Applications: Automate software that lacks APIs
- Visual Verification: Confirm that code actually renders correctly in a browser
- End-to-End Workflows: Chain together multiple applications in a single task
- Interactive Testing: Navigate UIs, fill forms, and verify results visually
Key Capabilities
- Visual Perception: Capture screenshots to see what's on screen, zoom into specific regions for detailed inspection
- Mouse Control: Move, click (single/double/triple), drag, and scroll anywhere on screen
- Keyboard Input: Type text, press key combinations, execute shortcuts
- Shell Execution: Run any command in a persistent bash session with preserved state
- File Operations: View, create, edit, and undo changes to files with precision
Use Cases
Web Development with Visual Verification
Ask an agent to build a web application, and it can write the code, launch a server, open the browser, and take a screenshot to show that it works.
The agent combines all three tools seamlessly:
- bash: Create project directories and start a local server
- edit: Write HTML, CSS, and JavaScript files
- computer: Open the browser and capture the final result
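As a rough illustration of the workflow above, the sketch below lays out the sequence of tool calls an agent might make. The tool and parameter names (run_bash, edit, computer, open, command, path, file_text, action, duration, target) come from the signatures on this page; the `{"tool": ..., "args": ...}` record format, the `build_web_demo_plan` helper, the project path, and the port are assumptions made for illustration only, not part of the extension.

```python
from typing import Any


def build_web_demo_plan(project_dir: str, port: int = 8000) -> list[dict[str, Any]]:
    """Return an ordered list of hypothetical tool-call records.

    The record layout is illustrative; it is not the extension's wire format.
    """
    index_path = f"{project_dir}/index.html"
    url = f"http://localhost:{port}/index.html"
    return [
        # bash: create the project directory
        {"tool": "run_bash", "args": {"command": f"mkdir -p {project_dir}"}},
        # edit: write the page
        {"tool": "edit", "args": {"command": "create", "path": index_path,
                                  "file_text": "<h1>Hello</h1>\n"}},
        # bash: serve the directory in the background (Python's built-in server)
        {"tool": "run_bash", "args": {
            "command": f"python3 -m http.server {port} --directory {project_dir} &"}},
        # open the page with the system default browser
        {"tool": "open", "args": {"target": url}},
        # computer: give the page a moment to load, then capture the result
        {"tool": "computer", "args": {"action": "wait", "duration": 2.0}},
        {"tool": "computer", "args": {"action": "screenshot"}},
    ]


if __name__ == "__main__":
    for step in build_web_demo_plan("/tmp/demo-site"):
        print(step["tool"], step["args"])
```

In practice the agent decides the order and arguments itself; the list only makes the division of labor between the tools concrete.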
Desktop Application Automation
Agents can operate any GUI application:
- Open applications, navigate menus, click buttons
- Fill out forms and dialog boxes
- Extract information from visual interfaces
- Automate repetitive desktop workflows
System Administration
Execute and verify system operations:
- Run diagnostic commands and interpret output
- Edit configuration files with undo capability
- Verify changes by taking screenshots of system state
- Chain complex multi-step operations in a persistent shell
Testing and QA
Validate visually that code works as expected:
- Navigate to specific URLs and verify page content
- Interact with UI elements and confirm behavior
- Capture screenshots for documentation or bug reports
- Test across different screen regions with zoom
Tools
Computer
Interact with the screen, mouse, and keyboard to control your desktop environment.
| Action | Description |
|---|---|
| screenshot | Capture the current screen state |
| mouse_move | Move cursor to specific coordinates |
| left_click, right_click, middle_click | Click at current or specified position |
| double_click, triple_click | Multi-click actions for text selection |
| left_click_drag | Click and drag to a target position |
| left_mouse_down, left_mouse_up | Press/release mouse button for complex interactions |
| scroll | Scroll in any direction by a specified amount |
| type | Type text at the current cursor position |
| key | Press key combinations (e.g., Control+c, Return) |
| hold_key | Hold a key down for a specified duration |
| wait | Pause execution for a specified duration |
| cursor_position | Get current cursor coordinates |
| zoom | Zoom into a specific screen region for detail |
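To make the action table concrete, here is a minimal sketch that assembles argument dictionaries for the computer tool and rejects action or parameter names that the tool's signature does not list. The helpers `computer_args` and `fill_field` are hypothetical, and which parameters each action actually requires is not specified on this page, so the sketch checks names only.

```python
from typing import Any

# Action names taken from the table above.
VALID_ACTIONS = {
    "screenshot", "mouse_move", "left_click", "right_click", "middle_click",
    "double_click", "triple_click", "left_click_drag", "left_mouse_down",
    "left_mouse_up", "scroll", "type", "key", "hold_key", "wait",
    "cursor_position", "zoom",
}

# Optional parameter names taken from the computer tool's signature.
VALID_PARAMS = {
    "text", "coordinate", "scroll_direction", "scroll_amount",
    "duration", "key", "region",
}


def computer_args(action: str, **params: Any) -> dict[str, Any]:
    """Build an argument dict for one computer call, validating names only."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    unknown = set(params) - VALID_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"action": action, **params}


def fill_field(x: int, y: int, value: str) -> list[dict[str, Any]]:
    """Click a field, type into it, then capture the screen to verify."""
    return [
        computer_args("left_click", coordinate=(x, y)),
        computer_args("type", text=value),
        computer_args("screenshot"),
    ]


if __name__ == "__main__":
    for call in fill_field(400, 300, "hello"):
        print(call)
```

Ending a sequence with a screenshot mirrors how an agent would confirm that the click and the typing landed where intended.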
async def computer(
    action: Literal["key", "type", "mouse_move", "left_click", "left_click_drag", "right_click",
                    "middle_click", "double_click", "left_mouse_down", "left_mouse_up", "scroll",
                    "hold_key", "wait", "triple_click", "screenshot", "cursor_position", "zoom"],
    text: Annotated[str | None, "The text to type or the key to press"] = None,
    coordinate: Annotated[
        tuple[int, int] | None,
        "(x, y): The x and y coordinates to move the mouse to"
    ] = None,
    scroll_direction: Literal["up", "down", "left", "right"] | None = None,
    scroll_amount: Annotated[int | None, "The number of lines to scroll"] = None,
    duration: Annotated[float | None, "Duration in seconds"] = None,
    key: Annotated[str | None, "The key sequence to press"] = None,
    region: Annotated[str | None, "(x0, y0, x1, y1): The region to zoom into"] = None,
) -> list[dict[str, Any]]
Edit
A precision file editor designed for AI agents with string-based operations that avoid line-number ambiguity.
| Command | Description |
|---|---|
| view | Read file contents or list directory contents |
| create | Create a new file with specified content |
| str_replace | Replace a unique string with new content |
| insert | Insert text after a specific line number |
| undo_edit | Revert the last edit to a file |
View, create, and modify files with undo support.
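The str_replace command targets a unique string rather than a line number. The sketch below illustrates that idea on an in-memory string; it is not the extension's implementation, and the error behavior for missing or ambiguous matches is an assumption. The helper name `str_replace_once` is hypothetical.

```python
def str_replace_once(content: str, old_str: str, new_str: str) -> str:
    """Replace old_str with new_str only if old_str occurs exactly once.

    Illustrative only: refusing ambiguous matches is what makes a
    string-based edit unambiguous without relying on line numbers.
    """
    if not old_str:
        raise ValueError("old_str must not be empty")
    count = content.count(old_str)
    if count == 0:
        raise ValueError("old_str not found")
    if count > 1:
        raise ValueError(
            f"old_str matches {count} times; include more context to make it unique"
        )
    return content.replace(old_str, new_str, 1)


if __name__ == "__main__":
    source = "port = 8000\nhost = 'localhost'\n"
    print(str_replace_once(source, "port = 8000", "port = 9000"))
```

If a short string such as `8000` appeared twice in the file, widening old_str to include neighboring text would be the way to disambiguate it.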
async def edit(
    command: Literal["view", "create", "str_replace", "insert", "undo_edit"],
    path: Annotated[str, "The absolute path to the file or directory"],
    file_text: Annotated[
        str | None,
        "The content to write to the file (required for create)"
    ] = None,
    view_range: Annotated[
        list[int] | None,
        "The range of lines to view (e.g. [1, 10])"
    ] = None,
    old_str: Annotated[
        str | None,
        "The string to replace (required for str_replace)"
    ] = None,
    new_str: Annotated[
        str | None,
        "The replacement string (required for str_replace and insert)"
    ] = None,
    insert_line: Annotated[
        int | None,
        "The line number after which to insert (required for insert)"
    ] = None,
) -> list[dict[str, Any]]
Bash
Execute commands in a persistent shell session where environment variables, working directory, and state are preserved between calls.
| Feature | Description |
|---|---|
| Command execution | Run any bash command |
| Persistent state | Working directory and variables persist |
| Session restart | Reset the shell environment when needed |
| Cross-platform open | Launch files/URLs with system default handler |
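To show what a persistent session means in practice, the following sketch keeps one bash process alive across calls, so a `cd` or `export` in one call is still in effect in the next. It is one common way to build such a session and is not claimed to be how the extension does it; the `PersistentBash` class and its marker-based output framing are assumptions. It requires `bash` on the PATH and does not handle commands that read from stdin.

```python
import subprocess
import uuid


class PersistentBash:
    """A single long-lived bash process that runs commands one at a time."""

    def __init__(self) -> None:
        self._start()

    def _start(self) -> None:
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            bufsize=1,
        )

    def run(self, command: str) -> str:
        """Run a command and return its output, read up to a unique marker."""
        marker = f"__done_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{command}\necho {marker}\n")
        self.proc.stdin.flush()
        chunks = []
        while True:
            line = self.proc.stdout.readline()
            if not line:  # the shell exited unexpectedly
                break
            if marker in line:
                # Keep any output that preceded the marker on the same line.
                chunks.append(line.split(marker)[0])
                break
            chunks.append(line)
        return "".join(chunks).rstrip("\n")

    def restart(self) -> None:
        """Discard all shell state by starting a fresh process."""
        self.close()
        self._start()

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.wait()


if __name__ == "__main__":
    shell = PersistentBash()
    shell.run("cd / && export DEMO_SESSION_VAR=kept")
    print(shell.run("pwd; echo $DEMO_SESSION_VAR"))
    shell.restart()
    print(shell.run("echo ${DEMO_SESSION_VAR:-unset}"))
    shell.close()
```

The restart method corresponds to the session restart feature in the table: after it, variables and the working directory set earlier are gone.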
async def run_bash(
    command: Annotated[str | None, "Command to run"],
    restart: Annotated[bool, "Restart the bash session"] = False,
) -> list[dict[str, Any]]
Open File or URL
Open a URL or file using the system's default handler (xdg-open on Linux, open on macOS, start on Windows).
async def open(
    target: Annotated[str, "URL or file path to open"]
) -> list[dict[str, Any]]
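The per-platform handler selection described above can be sketched as a small lookup. This is an illustration under assumptions, not the extension's code: the `opener_command` helper is hypothetical, and on Windows `start` is a `cmd` built-in, so the sketch wraps it in `cmd /c` with an empty window-title argument.

```python
from __future__ import annotations

import platform


def opener_command(target: str, system: str | None = None) -> list[str]:
    """Return the argv that would open target with the default handler.

    system defaults to platform.system(), which reports "Linux",
    "Darwin" (macOS), or "Windows".
    """
    system = system or platform.system()
    if system == "Linux":
        return ["xdg-open", target]
    if system == "Darwin":
        return ["open", target]
    if system == "Windows":
        # The empty string is the window title expected by `start`.
        return ["cmd", "/c", "start", "", target]
    raise OSError(f"unsupported platform: {system}")


if __name__ == "__main__":
    print(opener_command("https://example.com"))
```

The returned list could be passed to `subprocess.run` to actually launch the handler; the sketch stops short of that so it has no side effects.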