llms.py
Extensions

Computer Use

Enable AI agents to control your computer the way a human does: clicking, typing, running commands, and editing files.

Transform your AI agent into an autonomous computer operator. This built-in extension enables agents to see your screen, control the mouse and keyboard, execute shell commands, and edit files, just like a human user sitting at the computer.

Based on Anthropic's computer use tools, it brings full desktop automation capabilities to your AI workflows.

Using Computer Use Tools

The built-in Computer Use tools are listed under the computer_use category on the Tools page, where they can be executed directly. The Tools page also shows the definitions of all the other available Computer Use tools.

Selecting Tools

Enable or disable individual tools to match your workflow.

Why Computer Use?

Traditional AI tools operate through APIs and structured data. Computer Use removes that limitation by letting agents interact with any application (web browsers, desktop apps, terminals, IDEs) exactly as you would. This unlocks automation for:

  • Legacy Applications: Automate software that lacks APIs
  • Visual Verification: Confirm that code actually renders correctly in a browser
  • End-to-End Workflows: Chain together multiple applications in a single task
  • Interactive Testing: Navigate UIs, fill forms, and verify results visually

Key Capabilities

  • Visual Perception: Capture screenshots to see what's on screen, zoom into specific regions for detailed inspection
  • Mouse Control: Move, click (single/double/triple), drag, and scroll anywhere on screen
  • Keyboard Input: Type text, press key combinations, execute shortcuts
  • Shell Execution: Run any command in a persistent bash session with preserved state
  • File Operations: View, create, edit, and undo changes to files with precision

Use Cases

Web Development with Visual Verification

Ask an agent to build a web application, and it can write the code, launch a server, open the browser, and take a screenshot to confirm that it works.

The agent combines all three tools seamlessly:

  1. bash: Create project directories and start a local server
  2. edit: Write HTML, CSS, and JavaScript files
  3. computer: Open the browser and capture the final result
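The three steps can be pictured as a sequence of tool calls. The sketch below is hypothetical. The call format (a tool name plus an argument dict), the /tmp/demo-site path, and port 8000 are all assumptions. The argument names follow the signatures documented under Tools. Because the bash session is persistent, the cd in the first call still applies when the server is started later. The GUI actions that open the browser are elided.

```python
# Hypothetical tool-call sequence for the web-development workflow above.
# Paths, port, and the (tool, args) format are assumptions for illustration.
plan: list[tuple[str, dict]] = [
    # bash: create the project directory; the cd persists in the shell session.
    ("bash", {"command": "mkdir -p /tmp/demo-site && cd /tmp/demo-site"}),
    # edit: write the page (the edit tool takes an absolute path).
    ("edit", {
        "command": "create",
        "path": "/tmp/demo-site/index.html",
        "file_text": "<!doctype html>\n<h1>Hello from the agent</h1>\n",
    }),
    # bash: serve the persisted working directory in the background.
    ("bash", {"command": "python3 -m http.server 8000 &"}),
    # computer: after opening http://localhost:8000 in the browser (steps elided),
    # capture the final result.
    ("computer", {"action": "screenshot"}),
]

for tool, args in plan:
    print(f"{tool}: {args}")
```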

Desktop Application Automation

Agents can operate any GUI application:

  • Open applications, navigate menus, click buttons
  • Fill out forms and dialog boxes
  • Extract information from visual interfaces
  • Automate repetitive desktop workflows

System Administration

Execute and verify system operations:

  • Run diagnostic commands and interpret output
  • Edit configuration files with undo capability
  • Verify changes by taking screenshots of system state
  • Chain complex multi-step operations in a persistent shell

Testing and QA

Visual validation that code works as expected:

  • Navigate to specific URLs and verify page content
  • Interact with UI elements and confirm behavior
  • Capture screenshots for documentation or bug reports
  • Test across different screen regions with zoom

Tools

Computer

Interact with the screen, mouse, and keyboard to control your desktop environment.

| Action | Description |
|---|---|
| screenshot | Capture the current screen state |
| mouse_move | Move cursor to specific coordinates |
| left_click, right_click, middle_click | Click at current or specified position |
| double_click, triple_click | Multi-click actions for text selection |
| left_click_drag | Click and drag to a target position |
| left_mouse_down, left_mouse_up | Press/release mouse button for complex interactions |
| scroll | Scroll in any direction by a specified amount |
| type | Type text at the current cursor position |
| key | Press key combinations (e.g., Control+c, Return) |
| hold_key | Hold a key down for a specified duration |
| wait | Pause execution for a specified duration |
| cursor_position | Get current cursor coordinates |
| zoom | Zoom into a specific screen region for detail |
```python
from typing import Annotated, Any, Literal

# Signature only; the body is provided by the extension.
async def computer(
    action: Literal["key", "type", "mouse_move", "left_click", "left_click_drag", "right_click",
        "middle_click", "double_click", "left_mouse_down", "left_mouse_up", "scroll",
        "hold_key", "wait", "triple_click", "screenshot", "cursor_position", "zoom"],
    text: Annotated[str | None, "The text to type or the key to press"] = None,
    coordinate: Annotated[
        tuple[int, int] | None,
        "(x, y): The x and y coordinates to move the mouse to"
    ] = None,
    scroll_direction: Literal["up", "down", "left", "right"] | None = None,
    scroll_amount: Annotated[int | None, "The number of lines to scroll"] = None,
    duration: Annotated[float | None, "Duration in seconds"] = None,
    key: Annotated[str | None, "The key sequence to press"] = None,
    region: Annotated[str | None, "(x0, y0, x1, y1): The region to zoom into"] = None,
) -> list[dict[str, Any]]: ...
```
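One way to sanity-check the argument dictionaries an agent sends to computer() is to bind them against a stub that has the same parameter names. The sketch below does this with inspect.signature().bind() and typing.get_args(). The names computer_stub and is_valid_call, and the example coordinates, are hypothetical. The check covers only parameter names and the action value. It does not check which parameters a given action requires. Passing a key combination through text rather than key is also an assumption, because the signature describes both.

```python
import inspect
from typing import Any, Literal, get_args

# Action names copied from the computer() signature.
ComputerAction = Literal[
    "key", "type", "mouse_move", "left_click", "left_click_drag", "right_click",
    "middle_click", "double_click", "left_mouse_down", "left_mouse_up", "scroll",
    "hold_key", "wait", "triple_click", "screenshot", "cursor_position", "zoom",
]


def computer_stub(action, text=None, coordinate=None, scroll_direction=None,
                  scroll_amount=None, duration=None, key=None, region=None):
    """Hypothetical stand-in with the same parameter names as computer(); does nothing."""


def is_valid_call(args: dict[str, Any]) -> bool:
    """Return True if args uses only known parameter names and a known action."""
    try:
        # bind() raises TypeError for unknown keywords or a missing `action`.
        inspect.signature(computer_stub).bind(**args)
    except TypeError:
        return False
    return args["action"] in get_args(ComputerAction)


# Example calls an agent might send (coordinates are made up).
calls = [
    {"action": "screenshot"},
    {"action": "mouse_move", "coordinate": (640, 360)},
    {"action": "left_click", "coordinate": (640, 360)},
    {"action": "type", "text": "hello world"},
    {"action": "key", "text": "Control+c"},  # assumption: key name passed via `text`
    {"action": "scroll", "scroll_direction": "down", "scroll_amount": 3},
    {"action": "wait", "duration": 1.0},
]
```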

Edit

A precision file editor designed for AI agents with string-based operations that avoid line-number ambiguity.

| Command | Description |
|---|---|
| view | Read file contents or list directory contents |
| create | Create a new file with specified content |
| str_replace | Replace a unique string with new content |
| insert | Insert text after a specific line number |
| undo_edit | Revert the last edit to a file |

View, create, and modify files with undo support.

```python
from typing import Annotated, Any, Literal

# Signature only; the body is provided by the extension.
async def edit(
    command: Literal["view", "create", "str_replace", "insert", "undo_edit"],
    path: Annotated[str, "The absolute path to the file or directory"],
    file_text: Annotated[
        str | None,
        "The content to write to the file (required for create)"
    ] = None,
    view_range: Annotated[
        list[int] | None,
        "The range of lines to view (e.g. [1, 10])"
    ] = None,
    old_str: Annotated[
        str | None,
        "The string to replace (required for str_replace)"
    ] = None,
    new_str: Annotated[
        str | None,
        "The replacement string (required for str_replace and insert)"
    ] = None,
    insert_line: Annotated[
        int | None,
        "The line number after which to insert (required for insert)"
    ] = None,
) -> list[dict[str, Any]]: ...
```

Bash

Execute commands in a persistent shell session where environment variables, working directory, and state are preserved between calls.

| Feature | Description |
|---|---|
| Command execution | Run any bash command |
| Persistent state | Working directory and variables persist between calls |
| Session restart | Reset the shell environment when needed |
| Cross-platform open | Launch files or URLs with the system's default handler |
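```python
from typing import Annotated, Any

# Signature only; the body is provided by the extension.
async def run_bash(
    command: Annotated[str | None, "Command to run"],
    restart: Annotated[bool, "Restart the bash session"] = False,
) -> list[dict[str, Any]]: ...
```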
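A persistent session like this can be illustrated with one long-running bash process. Each command is written to the same process's stdin and is followed by an echo of a unique marker, which signals where that command's output ends. The PersistentBash class below is a hypothetical sketch built on Python's subprocess module, not llms.py's run_bash implementation. It does not handle timeouts or commands that read from stdin.

```python
import subprocess
import uuid


class PersistentBash:
    """Hypothetical sketch of a persistent bash session (not llms.py's code)."""

    def __init__(self) -> None:
        self._start()

    def _start(self) -> None:
        # One long-lived bash process; stderr is merged into stdout.
        self.proc = subprocess.Popen(
            ["bash"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )

    def run(self, command: str) -> str:
        # Echo a unique marker after the command so we know where its output ends.
        marker = f"__done_{uuid.uuid4().hex}__"
        self.proc.stdin.write(f"{command}\necho {marker}\n")
        self.proc.stdin.flush()
        output = []
        while True:
            line = self.proc.stdout.readline()
            if not line:  # bash exited (e.g. the command ran `exit`)
                break
            if marker in line:
                # Keep any output the command printed without a trailing newline.
                output.append(line.split(marker, 1)[0])
                break
            output.append(line)
        return "".join(output)

    def restart(self) -> None:
        # Discard all state (working directory, variables) by replacing the process.
        self.close()
        self._start()

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.terminate()
        self.proc.wait()
        self.proc.stdout.close()
```

Because every call goes to the same process, a cd or export in one call is still in effect in the next, and restart() is the only thing that clears it.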

Open File or URL

Open a URL or file using the system's default handler (xdg-open on Linux, open on macOS, start on Windows).

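```python
from typing import Annotated, Any

# Signature only; the body is provided by the extension.
async def open(
    target: Annotated[str, "URL or file path to open"]
) -> list[dict[str, Any]]: ...
```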
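The platform mapping above amounts to choosing one command per operating system. The open_command() function below is a hypothetical helper, not part of llms.py. It only builds the argument list, so a caller would pass the result to something like subprocess.Popen. On Windows, start is a cmd.exe built-in rather than an executable, so it is run through cmd /c. The empty string fills start's optional window-title argument, which keeps a quoted target from being mistaken for a title.

```python
import sys


def open_command(target: str, platform: str = sys.platform) -> list[str]:
    """Hypothetical helper: build the argv that opens `target` with the default handler."""
    if platform.startswith("linux"):
        return ["xdg-open", target]
    if platform == "darwin":
        return ["open", target]
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in; "" is its (empty) window-title argument.
        return ["cmd", "/c", "start", "", target]
    raise ValueError(f"unsupported platform: {platform}")
```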