
Server Extensions

This guide provides a walkthrough of the LLM Server Extensions API.

To keep the core lightweight while enabling limitless enhancements, llms.py includes a flexible Extensions system (inspired by ComfyUI Custom Nodes) that can be used to add features, register new provider implementations, and extend, replace, or customize the UI with your own components.

Installation

Extensions can be installed from GitHub or by creating a local folder:

  • Local: Simply create a folder at ~/.llms/extensions/my_extension
  • GitHub: Clone extension repos into ~/.llms/extensions, e.g.:
git clone https://github.com/user/my_extension ~/.llms/extensions/my_extension

How it Works (Server)

Extensions are Python modules that plug into the server lifecycle using hooks defined in their __init__.py:

Hook                 Purpose
__parser__(parser)   Add custom CLI arguments
__install__(ctx)     Enhance the server instance (routes, providers, filters, etc.)
__run__(ctx)         Execute custom logic in CLI mode (implements the __parser__ args)
  • The ctx parameter provides access to the ExtensionContext (see the Server Extensions API reference below)
  • The __install__ hook runs after providers are configured but before the server is run
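A minimal __init__.py wiring up all three hooks might look like this (a sketch: the --greet flag is illustrative, and an argparse-style parser is assumed):

# __init__.py
def add_args(parser):
    # Add a custom CLI argument (illustrative)
    parser.add_argument("--greet", help="Print a greeting and exit")

def install(ctx):
    # Runs after providers are configured, before the server starts
    ctx.log("my_extension installed")

def run(ctx):
    # Runs in CLI mode to implement the arguments added in add_args()
    ...

__parser__ = add_args
__install__ = install
__run__ = run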

How it Works (UI)

Extensions can also include frontend components:

  1. Placement: Add a ui folder within your extension directory
  2. Access: Static files in this folder are automatically served at /ext/<extension_name>/*
  3. Integration: Create a ui/index.mjs file. This is the entry point and must export an install function:
// Vue 3 Component
const MyComponent = {
    template: `...`
}

// ui/index.mjs
export default {
    install(ctx) {
        // Register or replace components, add routes, etc.
        ctx.components({ MyComponent })
    }
}
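The entry point can also be registered explicitly from the server-side __install__ hook via register_ui_extension (documented below); a sketch, assuming the path is given relative to the extension directory:

def install(ctx):
    # Register the browser entry point for this extension's UI
    ctx.register_ui_extension("ui/index.mjs")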

Register GET and POST API Endpoints

Custom server API endpoints are registered at /ext/<extension_name>/*

xmas __init__.py:

Example of creating a simple extension that adds a greet endpoint to the server:

import json
from pathlib import Path

from aiohttp import web

def install(ctx):
    # Load greetings from extension's `./ui/greetings.json`
    greetings_path = Path(__file__).parent / 'ui' / 'greetings.json'
    if greetings_path.exists():
        with open(greetings_path) as f:
            greetings = json.load(f)
    else:
        greetings = ["Merry Christmas!"]

    count = 0
    async def greet(request):
        nonlocal count
        name = request.query.get('name')
        if not name:
            data = await request.post()
            name = data.get('name')

        if not name:
            name = 'Stranger'

        greeting = greetings[count % len(greetings)]
        count += 1
        return web.json_response({"result":f"Hello {name}, {greeting}"})

    # Extension endpoints registered at /ext/xmas/*
    ctx.add_get("greet", greet)   # GET  /ext/xmas/greet
    ctx.add_post("greet", greet)  # POST /ext/xmas/greet

The xmas/ui/greetings.json static file is automatically available from /ext/xmas/greetings.json.
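To call the endpoint from Python (a sketch; adjust the base URL to wherever your server is listening):

import asyncio
import aiohttp

async def main():
    async with aiohttp.ClientSession() as session:
        # GET /ext/xmas/greet?name=Alice
        async with session.get("http://localhost:8000/ext/xmas/greet",
                               params={"name": "Alice"}) as r:
            print(await r.json())

asyncio.run(main())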

Example of making Chat Completion requests inside an endpoint:

    async def story(request):
        model = request.query.get("model")
        chat = ctx.chat_request(template="chat", model=model, text="Tell me a short Christmas tale")
        response = await ctx.chat_completion(chat)
        return web.json_response(response)

    ctx.add_get("story", story)  # GET /ext/xmas/story

The chat_request helper uses the chat templates configured in your llms.json defaults:

  • chat - Chat template for text requests (default)
  • image - Chat template for image input requests
  • audio - Chat template for audio input requests
  • file - Chat template for file input requests (e.g. PDFs)
  • out:image - Chat template for image generation requests
  • out:audio - Chat template for audio generation requests
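For example, an image generation endpoint inside the same install hook could use the out:image template (a sketch; the prompt text and route name are illustrative):

    async def card(request):
        model = request.query.get("model")
        chat = ctx.chat_request(template="out:image", model=model, text="A snowy cabin at night")
        response = await ctx.chat_completion(chat)
        return web.json_response(response)

    ctx.add_get("card", card)  # GET /ext/xmas/card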

Finally, register the install function as the extension's __install__ hook so the server runs it at startup:

# register install extension handler
__install__ = install

Example endpoint returning user-specific data

system_prompts __init__.py:

This extension first checks whether the user is signed in (GitHub OAuth). If so, it returns that user's prompts when they exist, otherwise it falls back to the default prompts shared by all users, and finally to the default prompts bundled with the extension.

import json
import os
from pathlib import Path

from aiohttp import web

# In-code fallback if no prompts.json is found on disk (shape illustrative)
default_prompts = []

def install(ctx):
    def get_user_prompts(request):
        candidate_paths = []
        username = ctx.get_username(request) # check if user is signed in
        if username:
            # if signed in (Github OAuth), return the prompts for this user if exists
            candidate_paths.append(os.path.join(Path.home(), ".llms", "user", username, "system_prompts", "prompts.json"))
        # return default prompts for all users if exists
        candidate_paths.append(os.path.join(Path.home(), ".llms", "user", "default", "system_prompts", "prompts.json"))
        # otherwise return the default prompts from this repo
        candidate_paths.append(os.path.join(ctx.path, "ui", "prompts.json"))

        # iterate all candidate paths and when exists return its json
        for path in candidate_paths:
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    txt = f.read()
                    return json.loads(txt)
        return default_prompts

    # API Handler to get prompts
    async def get_prompts(request):
        prompts_json = get_user_prompts(request)
        return web.json_response(prompts_json)

    # Extension endpoint registered at /ext/system_prompts/prompts.json
    ctx.add_get("prompts.json", get_prompts)

Register a custom tool with a 3rd Party dependency

Example of creating a tool extension that adds a web search tool to the server, using requirements.txt to install the 3rd Party ddgs dependency.

duckduckgo __init__.py:

from ddgs import DDGS
from typing import Any, Dict

def web_search(query: str, max_results: int | None = 10, page: int = 1) -> Dict[str, Any]:
    """
    Perform a web search using DuckDuckGo.
    """

    try:
        results = []
        with DDGS() as ddgs:
            # text() returns an iterator
            for r in ddgs.text(query, max_results=max_results, page=page):
                results.append(r)
        return {"query": query, "results": results}
    except Exception as e:
        return {"query": query, "error": str(e)}

def install(ctx):
    ctx.register_tool(web_search)

__install__ = install

See Tool Support Docs and core_tools implementation for more examples.

Custom Provider Implementation example

providers/openrouter.py

Example of creating a custom provider that extends the GeneratorBase class to add support for image generation in OpenRouter.

import json
import time

import aiohttp

def install(ctx):
    from llms.main import GeneratorBase

    # https://openrouter.ai/docs/guides/overview/multimodal/image-generation
    class OpenRouterGenerator(GeneratorBase):
        sdk = "openrouter/image"

        def __init__(self, **kwargs):
            super().__init__(**kwargs)

        def to_response(self, response, chat, started_at):
            # go through all image responses and save them to cache
            for choice in response["choices"]:
                if "message" in choice and "images" in choice["message"]:
                    for image in choice["message"]["images"]:
                        if choice["message"]["content"] == "":
                            choice["message"]["content"] = self.default_content
                        if "image_url" in image:
                            data_uri = image["image_url"]["url"]
                            if data_uri.startswith("data:"):
                                parts = data_uri.split(",", 1)
                                ext = parts[0].split(";")[0].split("/")[1]
                                base64_data = parts[1]
                                model = chat["model"].split("/")[-1]
                                filename = f"{model}-{choice['index']}.{ext}"
                                info = {
                                    "model": model,
                                    "prompt": ctx.last_user_prompt(chat),
                                }
                                relative_url, info = ctx.save_image_to_cache(base64_data, filename, info)
                                image["image_url"]["url"] = relative_url

            return response

        async def chat(self, chat, provider=None):
            headers = self.get_headers(provider, chat)
            if provider is not None:
                chat["model"] = provider.provider_model(chat["model"]) or chat["model"]

            started_at = time.time()
            if ctx.MOCK:
                print("Mocking OpenRouterGenerator")
                text = ctx.text_from_file(f"{ctx.MOCK_DIR}/openrouter-image.json")
                return ctx.log_json(self.to_response(json.loads(text), chat, started_at))
            else:
                chat_url = provider.chat_url
                chat = await self.process_chat(chat, provider_id=self.id)
                ctx.log(f"POST {chat_url}")
                ctx.log(provider.chat_summary(chat))
                # remove metadata if any (conflicts with some providers, e.g. Z.ai)
                chat.pop("metadata", None)

                async with aiohttp.ClientSession() as session, session.post(
                    chat_url,
                    headers=headers,
                    data=json.dumps(chat),
                    timeout=aiohttp.ClientTimeout(total=300),
                ) as response:
                    return ctx.log_json(self.to_response(await self.response_json(response), chat, started_at))

    ctx.add_provider(OpenRouterGenerator)


__install__ = install

This new implementation can be used by registering it as the image modality whose npm value matches the provider's sdk in llms.json, e.g.:

{
    "openrouter": {
        "enabled": true,
        "id": "openrouter",
        "modalities": {
            "image": {
                "name": "OpenRouter Image",
                "npm": "openrouter/image"
            }
        }
    }
}

Find more Provider implementations in the providers extension.


Server Extensions API

Server Extensions allow you to extend the functionality of the LLM Server by registering new providers, UI extensions, HTTP routes, and hooking into the chat pipeline.

The Public API surface is exposed via the ExtensionContext class in main.py which provides access to the Server's functionality.

# __init__.py
def install(ctx):
    # ctx is an instance of ExtensionContext
    ...

Logging & Debugging

Methods for logging information to the console with the extension name prefix.

log(message)

Log a message to stdout if verbose mode is enabled.

  • message: str - The message to log.

log_json(obj)

Log a JSON object to stdout if verbose mode is enabled.

  • obj: Any - The object to serialize and log.

dbg(message)

Log a debug message to stdout if debug mode is enabled.

  • message: str - The debug message.

err(message, e)

Log an error message and exception trace.

  • message: str - The error description.
  • e: Exception - The exception object.

Registration & Configuration

Methods to register various extension components.

add_provider(provider)

Register a new LLM provider.

  • provider: class - The provider class to register.

register_ui_extension(index)

Register a UI extension that will be loaded in the browser.

  • index: str - Relative path to the index file (e.g. "index.html" or "app.mjs") within the extension directory.

register_tool(func, tool_def=None)

Register a function as a tool that can be used by LLMs.

  • func: callable - The Python function to register.
  • tool_def: dict (Optional) - Manual tool definition. If None, it's generated from the function signature.

add_static_files(ext_dir)

Serve static files from a directory.

  • ext_dir: str - Absolute path to the directory containing static files.

register_shutdown_handler(handler)

Register a callback to be called when the server shuts down.

  • handler: callable - The function to call on shutdown.
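A typical install hook might combine several of these registrations (a sketch; the assets folder is illustrative, and the no-argument shutdown callback is an assumption):

import os

def install(ctx):
    # Serve ./assets from this extension's directory
    ctx.add_static_files(os.path.join(ctx.path, "assets"))
    # Run cleanup when the server shuts down
    ctx.register_shutdown_handler(lambda: ctx.log("shutting down"))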

Index Page

Modify the main index.html page served to the browser.

add_importmaps(dict)

Add entries to the browser's import map, allowing you to map package names to URLs.

  • dict: dict - A dictionary of import map entries (e.g. {"vue": "/ui/lib/vue.mjs"}).

add_index_header(html)

Inject HTML into the <head> section of the main index page.

  • html: str - The HTML string to inject.

add_index_footer(html)

Inject HTML into the end of the <body> section of the main index page.

  • html: str - The HTML string to inject.
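For example (a sketch; the package name and URLs are illustrative):

def install(ctx):
    # Map a bare module specifier to a file served by this extension
    ctx.add_importmaps({"chart.js": "/ext/my_extension/lib/chart.mjs"})
    # Inject a stylesheet into the <head> of the index page
    ctx.add_index_header('<link rel="stylesheet" href="/ext/my_extension/styles.css">')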

HTTP Routes

Register custom HTTP endpoints. All paths are prefixed with /ext/{extension_name}.

add_get(path, handler, **kwargs)

Register a GET route.

  • path: str - The sub-path for the route.
  • handler: callable - Async function taking request and returning web.Response.

add_post(path, handler, **kwargs)

Register a POST route.

  • path: str - The sub-path for the route.
  • handler: callable - Async function taking request and returning web.Response.

add_put(path, handler, **kwargs)

Register a PUT route.

add_delete(path, handler, **kwargs)

Register a DELETE route.

add_patch(path, handler, **kwargs)

Register a PATCH route.

Chat & LLM Interaction

Methods to interact with the LLM chat pipeline.

chat_request(template=None, text=None, model=None, system_prompt=None)

Create a new chat request object, typically to be sent to chat_completion.

  • template: str (Optional) - Template ID to use.
  • text: str (Optional) - User message text.
  • model: str (Optional) - Model identifier.
  • system_prompt: str (Optional) - System prompt to use.

chat_completion(chat, context=None)

Execute a chat completion request against the configured LLM.

  • chat: dict - The chat request object.
  • context: dict (Optional) - Execution context.
  • Returns: ChatResponse - The LLM's response.

chat_to_prompt(chat)

Convert a chat object to a prompt string (depends on configured prompts).

chat_to_system_prompt(chat)

Extract or generate the system prompt from a chat object.

chat_response_to_message(response)

Convert a provider's raw response to a standard message format.

last_user_prompt(chat)

Get the last user message from a chat history.

Filters

Hooks to intercept and modify the chat lifecycle.

register_chat_request_filter(handler)

Register a filter to modify chat requests before they are processed.

  • handler: callable(request)

register_chat_tool_filter(handler)

Register a filter to modify or restrict tools available to the LLM.

  • handler: callable(tools, context)

register_chat_response_filter(handler)

Register a filter to modify chat responses before they are returned to the client.

  • handler: callable(response, context)

register_chat_error_filter(handler)

Register a filter to handle or transform exceptions during chat.

  • handler: callable(error, context)

register_cache_saved_filter(handler)

Register a filter called when a response is saved to cache.

  • handler: callable(context)
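For example, a request filter can annotate chats before they are processed (a sketch; in-place mutation is an assumption, filters may instead return the modified object):

def install(ctx):
    def tag_request(request):
        # Mark requests that passed through this extension
        request.setdefault("metadata", {})["source"] = "my_extension"

    ctx.register_chat_request_filter(tag_request)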

Authentication & User Context

Access user session and authentication information.

check_auth(request)

Check if the request is authenticated.

  • Returns: (bool, dict) - Tuple of (is_authenticated, user_data).

get_session(request)

Get the session data for the current request.

  • Returns: dict or None.

get_username(request)

Get the authenticated username from the request.

  • Returns: str or None.

get_user_path(username=None)

Get the absolute path to a user's data directory.

  • username: str (Optional) - Specific username, otherwise uses current context or default.
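These can be combined to build user-aware endpoints (a sketch; the whoami route is illustrative):

from aiohttp import web

def install(ctx):
    async def whoami(request):
        authenticated, user = ctx.check_auth(request)
        if not authenticated:
            return web.json_response({"error": "unauthorized"}, status=401)
        return web.json_response({"username": ctx.get_username(request)})

    ctx.add_get("whoami", whoami)  # GET /ext/<extension_name>/whoami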

Files & Storage

Utilities for file handling and caching.

text_from_file(path)

Read text content from a file.

save_image_to_cache(base64_data, filename, image_info)

Save a base64 encoded image to the media cache.

save_bytes_to_cache(bytes_data, filename, file_info)

Save raw bytes to the media cache.

get_cache_path(path="")

Get the absolute path to the global cache directory.

to_file_info(chat, info=None, response=None)

Helper to create file metadata info from a chat context.

cache_message_inline_data(message)

Process a message to extract inline data (like images) to cache and replace with URLs.

Utilities

get_config()

Get the global server configuration object.

get_providers()

Get a list of all registered provider handlers.

get_provider(name)

Get a specific provider instance by name.

to_content(result)

Convert a result object (e.g. from a tool) into a standard content string format.

error_response(e, stacktrace=False)

Create a standardized error HTTP response from an exception.

should_cancel_thread(context)

Check if the current processing thread has been flagged for cancellation.
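Together these support consistent error handling inside endpoint handlers (a sketch):

    async def handler(request):
        try:
            ...  # call providers, tools, etc.
        except Exception as e:
            ctx.err("handler failed", e)
            return ctx.error_response(e)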
