
Server Extensions

This guide provides a walkthrough of the LLM Server Extensions API.

To keep the core lightweight while enabling limitless enhancements, llms.py includes a flexible Extensions system (inspired by ComfyUI Custom Nodes) that can be used to add features, register new provider implementations, and extend, replace, or customize the UI with your own components.

Installation

Extensions can be installed from GitHub or by creating a local folder:

  • Local: Simply create a folder at ~/.llms/extensions/my_extension
  • GitHub: Clone extension repos into ~/.llms/extensions, e.g.:
git clone https://github.com/user/my_extension ~/.llms/extensions/my_extension

How it Works (Server)

Extensions are Python modules that plug into the server lifecycle using hooks defined in their __init__.py:

Hook                 Purpose
__parser__(parser)   Add custom CLI arguments
__install__(ctx)     Enhance the server instance (routes, providers, filters, etc.)
__run__(ctx)         Execute custom logic in CLI mode (implements the __parser__ args)
  • The ctx parameter provides access to the ExtensionContext (see the Server Extensions API reference below)
  • The __install__ hook runs after providers are configured but before the server is run
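A minimal __init__.py wiring up all three hooks might look like this (a sketch: the --greet flag is illustrative, and an argparse-style parser is assumed):

# __init__.py
def add_args(parser):
    # Add a custom CLI argument (illustrative)
    parser.add_argument("--greet", help="Print a greeting and exit")

def install(ctx):
    # Runs after providers are configured, before the server starts
    ctx.log("my_extension installed")

def run(ctx):
    # Runs in CLI mode to implement the arguments added in add_args()
    ...

__parser__ = add_args
__install__ = install
__run__ = run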

How it Works (UI)

Extensions can also include frontend components:

  1. Placement: Add a ui folder within your extension directory
  2. Access: Static files in this folder are automatically served at /ext/<extension_name>/*
  3. Integration: Create a ui/index.mjs file. This is the entry point and must export an install function:
// Vue 3 Component
const MyComponent = {
    template: `...`
}

// ui/index.mjs
export default {
    install(ctx) {
        // Register or replace components, add routes, etc.
        ctx.components({ MyComponent })
    }
}
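The entry point can also be registered explicitly from the server-side __install__ hook via register_ui_extension (documented below); a sketch, assuming the path is given relative to the extension directory:

def install(ctx):
    # Register the browser entry point for this extension's UI
    ctx.register_ui_extension("ui/index.mjs")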

Register GET and POST API Endpoints

Custom server API endpoints are registered at /ext/<extension_name>/*

xmas __init__.py:

Example of creating a simple extension that adds a greet endpoint to the server:

import json
from pathlib import Path

from aiohttp import web

def install(ctx):
    # Load greetings from extension's `./ui/greetings.json`
    greetings_path = Path(__file__).parent / 'ui' / 'greetings.json'
    if greetings_path.exists():
        with open(greetings_path) as f:
            greetings = json.load(f)
    else:
        greetings = ["Merry Christmas!"]

    count = 0
    async def greet(request):
        nonlocal count
        name = request.query.get('name')
        if not name:
            data = await request.post()
            name = data.get('name')

        if not name:
            name = 'Stranger'

        greeting = greetings[count % len(greetings)]
        count += 1
        return web.json_response({"result":f"Hello {name}, {greeting}"})

    # Extension endpoints registered at /ext/xmas/*
    ctx.add_get("greet", greet)   # GET  /ext/xmas/greet
    ctx.add_post("greet", greet)  # POST /ext/xmas/greet

The xmas/ui/greetings.json static file is automatically available from /ext/xmas/greetings.json.
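To call the endpoint from Python (a sketch; adjust the base URL to wherever your server is listening):

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # GET /ext/xmas/greet?name=Alice
        async with session.get("http://localhost:8000/ext/xmas/greet",
                               params={"name": "Alice"}) as r:
            print(await r.json())

asyncio.run(main())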

Example of making Chat Completion requests inside an endpoint:

    async def story(request):
        model = request.query.get("model")
        chat = ctx.chat_request(template="chat", model=model, text="Tell me a short Christmas tale")
        response = await ctx.chat_completion(chat)
        return web.json_response(response)

    ctx.add_get("story", story)  # GET /ext/xmas/story

The chat_request helper uses the chat templates configured in your llms.json defaults:

  • chat - Chat template for text requests (default)
  • image - Chat template for image input requests
  • audio - Chat template for audio input requests
  • file - Chat template for file input requests (e.g. PDFs)
  • out:image - Chat template for image generation requests
  • out:audio - Chat template for audio generation requests
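For example, an image generation endpoint inside the same install hook could use the out:image template (a sketch; the prompt text and route name are illustrative):

    async def card(request):
        model = request.query.get("model")
        chat = ctx.chat_request(template="out:image", model=model, text="A snowy cabin at night")
        response = await ctx.chat_completion(chat)
        return web.json_response(response)

    ctx.add_get("card", card)  # GET /ext/xmas/card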

Finally, register the install function as the extension's __install__ hook so the server runs it at startup:

# register install extension handler
__install__ = install

Example endpoint returning user-specific data

system_prompts __init__.py:

This extension first checks whether the user is signed in (GitHub OAuth). If so, it returns that user's prompts when they exist, otherwise it falls back to the default prompts shared by all users, and finally to the default prompts bundled with the extension.

import json
import os
from pathlib import Path

from aiohttp import web

# In-code fallback if no prompts.json is found on disk (shape illustrative)
default_prompts = []

def install(ctx):
    def get_user_prompts(request):
        candidate_paths = []
        username = ctx.get_username(request) # check if user is signed in
        if username:
            # if signed in (Github OAuth), return the prompts for this user if exists
            candidate_paths.append(os.path.join(Path.home(), ".llms", "user", username, "system_prompts", "prompts.json"))
        # return default prompts for all users if exists
        candidate_paths.append(os.path.join(Path.home(), ".llms", "user", "default", "system_prompts", "prompts.json"))
        # otherwise return the default prompts from this repo
        candidate_paths.append(os.path.join(ctx.path, "ui", "prompts.json"))

        # iterate all candidate paths and when exists return its json
        for path in candidate_paths:
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    txt = f.read()
                    return json.loads(txt)
        return default_prompts

    # API Handler to get prompts
    async def get_prompts(request):
        prompts_json = get_user_prompts(request)
        return web.json_response(prompts_json)

    # Extension endpoint registered at /ext/system_prompts/prompts.json
    ctx.add_get("prompts.json", get_prompts)

Register a custom tool with a 3rd Party dependency

Example of creating a tool extension that adds a web search tool to the server, using requirements.txt to install the 3rd Party ddgs dependency.

duckduckgo __init__.py:

from ddgs import DDGS
from typing import Any, Dict

def web_search(query: str, max_results: int | None = 10, page: int = 1) -> Dict[str, Any]:
    """
    Perform a web search using DuckDuckGo.
    """

    try:
        results = []
        with DDGS() as ddgs:
            # text() returns an iterator
            for r in ddgs.text(query, max_results=max_results, page=page):
                results.append(r)
        return {"query": query, "results": results}
    except Exception as e:
        return {"query": query, "error": str(e)}

def install(ctx):
    ctx.register_tool(web_search)

__install__ = install

See Tool Support Docs and core_tools implementation for more examples.

Custom Provider Implementation example

providers/openrouter.py

Example of creating a custom provider that extends the GeneratorBase class to add support for image generation in OpenRouter.

import json
import time

import aiohttp

def install(ctx):
    from llms.main import GeneratorBase

    # https://openrouter.ai/docs/guides/overview/multimodal/image-generation
    class OpenRouterGenerator(GeneratorBase):
        sdk = "openrouter/image"

        def __init__(self, **kwargs):
            super().__init__(**kwargs)

        def to_response(self, response, chat, started_at):
            # go through all image responses and save them to cache
            for choice in response["choices"]:
                if "message" in choice and "images" in choice["message"]:
                    for image in choice["message"]["images"]:
                        if choice["message"]["content"] == "":
                            choice["message"]["content"] = self.default_content
                        if "image_url" in image:
                            data_uri = image["image_url"]["url"]
                            if data_uri.startswith("data:"):
                                parts = data_uri.split(",", 1)
                                ext = parts[0].split(";")[0].split("/")[1]
                                base64_data = parts[1]
                                model = chat["model"].split("/")[-1]
                                filename = f"{model}-{choice['index']}.{ext}"
                                info = {
                                    "model": model,
                                    "prompt": ctx.last_user_prompt(chat),
                                }
                                relative_url, info = ctx.save_image_to_cache(base64_data, filename, info)
                                image["image_url"]["url"] = relative_url

            return response

        async def chat(self, chat, provider=None):
            headers = self.get_headers(provider, chat)
            if provider is not None:
                chat["model"] = provider.provider_model(chat["model"]) or chat["model"]

            started_at = time.time()
            if ctx.MOCK:
                print("Mocking OpenRouterGenerator")
                text = ctx.text_from_file(f"{ctx.MOCK_DIR}/openrouter-image.json")
                return ctx.log_json(self.to_response(json.loads(text), chat, started_at))
            else:
                chat_url = provider.chat_url
                chat = await self.process_chat(chat, provider_id=self.id)
                ctx.log(f"POST {chat_url}")
                ctx.log(provider.chat_summary(chat))
                # remove metadata if any (conflicts with some providers, e.g. Z.ai)
                chat.pop("metadata", None)

                async with aiohttp.ClientSession() as session, session.post(
                    chat_url,
                    headers=headers,
                    data=json.dumps(chat),
                    timeout=aiohttp.ClientTimeout(total=300),
                ) as response:
                    return ctx.log_json(self.to_response(await self.response_json(response), chat, started_at))

    ctx.add_provider(OpenRouterGenerator)


__install__ = install

This new implementation can be used by registering it as the image modality whose npm value matches the provider's sdk in llms.json, e.g.:

{
    "openrouter": {
        "enabled": true,
        "id": "openrouter",
        "modalities": {
            "image": {
                "name": "OpenRouter Image",
                "npm": "openrouter/image"
            }
        }
    }
}

Find more Provider implementations in the providers extension.


Server Extensions API

Server Extensions allow you to extend the functionality of the LLM Server by registering new providers, UI extensions, HTTP routes, and hooking into the chat pipeline.

The Public API surface is exposed via the ExtensionContext class in main.py which provides access to the Server's functionality.

# __init__.py
def install(ctx):
    # ctx is an instance of ExtensionContext
    ...

Logging & Debugging

Methods for logging information to the console with the extension name prefix.

log(message)

Log a message to stdout if verbose mode is enabled.

  • message: str - The message to log.

log_json(obj)

Log a JSON object to stdout if verbose mode is enabled.

  • obj: Any - The object to serialize and log.

dbg(message)

Log a debug message to stdout if debug mode is enabled.

  • message: str - The debug message.

err(message, e)

Log an error message and exception trace.

  • message: str - The error description.
  • e: Exception - The exception object.

Registration & Configuration

Methods to register various extension components.

add_provider(provider)

Register a new LLM provider.

  • provider: class - The provider class to register.

register_ui_extension(index)

Register a UI extension that will be loaded in the browser.

  • index: str - Relative path to the index file (e.g. "index.html" or "app.mjs") within the extension directory.

register_tool(func, tool_def=None)

Register a function as a tool that can be used by LLMs.

  • func: callable - The Python function to register.
  • tool_def: dict (Optional) - Manual tool definition. If None, it's generated from the function signature.

add_static_files(ext_dir)

Serve static files from a directory.

  • ext_dir: str - Absolute path to the directory containing static files.

register_shutdown_handler(handler)

Register a callback to be called when the server shuts down.

  • handler: callable - The function to call on shutdown.
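A typical install hook might combine several of these registrations (a sketch; the assets folder is illustrative, and the no-argument shutdown callback is an assumption):

import os

def install(ctx):
    # Serve ./assets from this extension's directory
    ctx.add_static_files(os.path.join(ctx.path, "assets"))
    # Run cleanup when the server shuts down
    ctx.register_shutdown_handler(lambda: ctx.log("shutting down"))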

Index Page

Modify the main index.html page served to the browser.

add_importmaps(dict)

Add entries to the browser's import map, allowing you to map package names to URLs.

  • dict: dict - A dictionary of import map entries (e.g. {"vue": "/ui/lib/vue.mjs"}).

add_index_header(html)

Inject HTML into the <head> section of the main index page.

  • html: str - The HTML string to inject.

add_index_footer(html)

Inject HTML into the end of the <body> section of the main index page.

  • html: str - The HTML string to inject.
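For example (a sketch; the package name and URLs are illustrative):

def install(ctx):
    # Map a bare module specifier to a file served by this extension
    ctx.add_importmaps({"chart.js": "/ext/my_extension/lib/chart.mjs"})
    # Inject a stylesheet into the <head> of the index page
    ctx.add_index_header('<link rel="stylesheet" href="/ext/my_extension/styles.css">')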

HTTP Routes

Register custom HTTP endpoints. All paths are prefixed with /ext/{extension_name}.

add_get(path, handler, **kwargs)

Register a GET route.

  • path: str - The sub-path for the route.
  • handler: callable - Async function taking request and returning web.Response.

add_post(path, handler, **kwargs)

Register a POST route.

  • path: str - The sub-path for the route.
  • handler: callable - Async function taking request and returning web.Response.

add_put(path, handler, **kwargs)

Register a PUT route.

add_delete(path, handler, **kwargs)

Register a DELETE route.

add_patch(path, handler, **kwargs)

Register a PATCH route.

Chat & LLM Interaction

Methods to interact with the LLM chat pipeline.

chat_request(template=None, text=None, model=None, system_prompt=None)

Create a new chat request object, typically to be sent to chat_completion.

  • template: str (Optional) - Template ID to use.
  • text: str (Optional) - User message text.
  • model: str (Optional) - Model identifier.
  • system_prompt: str (Optional) - System prompt to use.

chat_completion(chat, context=None)

Execute a chat completion request against the configured LLM.

  • chat: dict - The chat request object.
  • context: dict (Optional) - Execution context.
  • Returns: ChatResponse - The LLM's response.

chat_to_prompt(chat)

Convert a chat object to a prompt string (depends on configured prompts).

chat_to_system_prompt(chat)

Extract or generate the system prompt from a chat object.

chat_response_to_message(response)

Convert a provider's raw response to a standard message format.

last_user_prompt(chat)

Get the last user message from a chat history.

Filters

Hooks to intercept and modify the chat lifecycle.

register_chat_request_filter(handler)

Register a filter to modify chat requests before they are processed.

  • handler: callable(request)

register_chat_tool_filter(handler)

Register a filter to modify or restrict tools available to the LLM.

  • handler: callable(tools, context)

register_chat_response_filter(handler)

Register a filter to modify chat responses before they are returned to the client.

  • handler: callable(response, context)

register_chat_error_filter(handler)

Register a filter to handle or transform exceptions during chat.

  • handler: callable(error, context)

register_cache_saved_filter(handler)

Register a filter called when a response is saved to cache.

  • handler: callable(context)
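For example, a request filter can annotate chats before they are processed (a sketch; in-place mutation is an assumption, filters may instead return the modified object):

def install(ctx):
    def tag_request(request):
        # Mark requests that passed through this extension
        request.setdefault("metadata", {})["source"] = "my_extension"

    ctx.register_chat_request_filter(tag_request)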

Authentication & User Context

Access user session and authentication information.

check_auth(request)

Check if the request is authenticated.

  • Returns: (bool, dict) - Tuple of (is_authenticated, user_data).

get_session(request)

Get the session data for the current request.

  • Returns: dict or None.

get_username(request)

Get the authenticated username from the request.

  • Returns: str or None.

get_user_path(username=None)

Get the absolute path to a user's data directory.

  • username: str (Optional) - Specific username, otherwise uses current context or default.
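These can be combined to build user-aware endpoints (a sketch; the whoami route is illustrative):

from aiohttp import web

def install(ctx):
    async def whoami(request):
        authenticated, user = ctx.check_auth(request)
        if not authenticated:
            return web.json_response({"error": "unauthorized"}, status=401)
        return web.json_response({"username": ctx.get_username(request)})

    ctx.add_get("whoami", whoami)  # GET /ext/<extension_name>/whoami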

Files & Storage

Utilities for file handling and caching.

text_from_file(path)

Read text content from a file.

save_image_to_cache(base64_data, filename, image_info)

Save a base64 encoded image to the media cache.

save_bytes_to_cache(bytes_data, filename, file_info)

Save raw bytes to the media cache.

get_cache_path(path="")

Get the absolute path to the global cache directory.

to_file_info(chat, info=None, response=None)

Helper to create file metadata info from a chat context.

cache_message_inline_data(message)

Process a message to extract inline data (like images) to cache and replace with URLs.

Utilities

get_config()

Get the global server configuration object.

get_providers()

Get a list of all registered provider handlers.

get_provider(name)

Get a specific provider instance by name.

to_content(result)

Convert a result object (e.g. from a tool) into a standard content string format.

error_response(e, stacktrace=False)

Create a standardized error HTTP response from an exception.

should_cancel_thread(context)

Check if the current processing thread has been flagged for cancellation.
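Together these support consistent error handling inside endpoint handlers (a sketch):

    async def handler(request):
        try:
            ...  # call providers, tools, etc.
        except Exception as e:
            ctx.err("handler failed", e)
            return ctx.error_response(e)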
