Server Extensions
This guide provides a walkthrough of the LLM Server Extensions API.
To keep the core lightweight while enabling limitless enhancements, the flexible Extensions system (inspired by ComfyUI Custom Nodes) lets you add features, register new provider implementations, and extend, replace, or customize the UI with your own components.
Installation
Extensions can be installed from GitHub or by creating a local folder:

- Local: Simply create a folder in `~/.llms/extensions/my_extension`
- GitHub: Clone extension repos from GitHub into `~/.llms/extensions`, e.g:

```sh
git clone https://github.com/user/my_extension ~/.llms/extensions/my_extension
```

How it Works (Server)
Extensions are Python modules that plug into the server lifecycle using hooks defined in their `__init__.py`:

| Hook | Purpose |
|---|---|
| `__parser__(parser)` | Add custom CLI arguments |
| `__install__(ctx)` | Enhance the server instance (routes, providers, filters, etc.) |
| `__load__(ctx)` | Load data or perform async tasks before the server starts |
| `__run__(ctx)` | Execute custom logic when running in CLI mode |

- The `ctx` parameter provides access to the `ExtensionContext`
- The `__install__` hook runs after providers are configured but before the server is run
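Putting the hooks together, a minimal `__init__.py` might look like the following sketch (the handler, argument, and log messages are illustrative, and `__load__` is assumed to be awaitable since it can perform async tasks):

```python
# my_extension/__init__.py — a minimal sketch wiring up the lifecycle hooks
from aiohttp import web

def parser(parser):
    # Custom CLI argument, readable later via ctx.cli_args
    parser.add_argument("--my-extension-flag", default=None)

def install(ctx):
    async def status(request):
        return web.json_response({"extension": ctx.name})
    ctx.add_get("status", status)  # GET /ext/my_extension/status

async def load(ctx):
    ctx.log("Loading data before the server starts...")

__parser__ = parser
__install__ = install
__load__ = load
```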
How it Works (UI)
Extensions can also include frontend components:
- Placement: Add a `ui` folder within your extension directory
- Access: Static files in this folder are automatically served at `/ext/<extension_name>/*`
- Integration: Create a `ui/index.mjs` file. This is the entry point and must export an `install` function:
```js
// Vue 3 Component
const MyComponent = {
    template: `...`
}

// ui/index.mjs
export default {
    install(ctx) {
        // Register or replace components, add routes, etc.
        ctx.components({ MyComponent })
    }
}
```

Register GET and POST API Endpoints
Custom server API endpoints are registered at `/ext/<extension_name>/*`.
Example of creating a simple extension that adds a `greet` endpoint to the server:
```python
import json
from pathlib import Path
from aiohttp import web

def install(ctx):
    # Load greetings from extension's `./ui/greetings.json`
    greetings_path = Path(__file__).parent / 'ui' / 'greetings.json'
    if greetings_path.exists():
        with open(greetings_path) as f:
            greetings = json.load(f)
    else:
        greetings = ["Merry Christmas!"]

    count = 0
    async def greet(request):
        nonlocal count
        name = request.query.get('name')
        if not name:
            data = await request.post()
            name = data.get('name')
        if not name:
            name = 'Stranger'
        greeting = greetings[count % len(greetings)]
        count += 1
        return web.json_response({"result": f"Hello {name}, {greeting}"})

    # Extension endpoints registered at /ext/xmas/*
    ctx.add_get("greet", greet)   # GET  /ext/xmas/greet
    ctx.add_post("greet", greet)  # POST /ext/xmas/greet
```

The `xmas/ui/greetings.json` static file is automatically available from `/ext/xmas/greetings.json`.
Example of making Chat Completion requests inside an endpoint:

```python
async def story(request):
    model = request.query.get("model")
    chat = ctx.chat_request(template="text", model=model, text="Tell me a short Christmas tale")
    response = await ctx.chat_completion(chat)
    return web.json_response(response)

ctx.add_get("story", story)  # GET /ext/xmas/story
```

The `chat_request` uses chat templates configured in your `llms.json` defaults configuration:
- `chat` - Chat template for text requests (default)
- `image` - Chat template for image input requests
- `audio` - Chat template for audio input requests
- `file` - Chat template for file input requests (e.g. PDFs)
- `out:image` - Chat template for image generation requests
- `out:audio` - Chat template for audio generation requests
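For example, an endpoint handling image input could build its request from the `image` template instead of the default; a sketch (how the image content is attached depends on the template configured in your `llms.json`):

```python
chat = ctx.chat_request(template="image", model=model,
                        text="Describe what's in this image")
response = await ctx.chat_completion(chat)
```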
Finally, wire it up to the `__install__` hook so the extension can run:

```python
# register install extension handler
__install__ = install
```

Example endpoint returning user-specific data
This extension first checks whether the user is signed in (GitHub OAuth) and returns that user's prompts if they exist, otherwise it falls back to the default prompts shared by all users, or the default prompts bundled with this extension.
```python
import json
import os
from pathlib import Path
from aiohttp import web

default_prompts = []  # final fallback if no prompts file exists

def install(ctx):
    def get_user_prompts(request):
        candidate_paths = []
        username = ctx.get_username(request)  # check if user is signed in
        if username:
            # if signed in (GitHub OAuth), return the prompts for this user if exists
            candidate_paths.append(os.path.join(Path.home(), ".llms", "user", username, "system_prompts", "prompts.json"))
        # return default prompts for all users if exists
        candidate_paths.append(os.path.join(Path.home(), ".llms", "user", "default", "system_prompts", "prompts.json"))
        # otherwise return the default prompts from this repo
        candidate_paths.append(os.path.join(ctx.path, "ui", "prompts.json"))
        # iterate all candidate paths and when exists return its json
        for path in candidate_paths:
            if os.path.exists(path):
                with open(path, encoding="utf-8") as f:
                    txt = f.read()
                return json.loads(txt)
        return default_prompts

    # API Handler to get prompts
    async def get_prompts(request):
        prompts_json = get_user_prompts(request)
        return web.json_response(prompts_json)

    # Extension endpoint registered at /ext/system_prompts/prompts.json
    ctx.add_get("prompts.json", get_prompts)
```

Register custom tool with 3rd Party dependency
Example of creating a tool extension that adds a web search tool to the server.
It uses `requirements.txt` to install the 3rd Party `ddgs` dependency.
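The extension folder might be laid out like this (a hypothetical layout; the folder name is arbitrary):

```
~/.llms/extensions/web_search/
├── __init__.py        # the extension code below
└── requirements.txt   # contains: ddgs
```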
```python
from ddgs import DDGS
from typing import Any, Dict

def web_search(query: str, max_results: int | None = 10, page: int = 1) -> Dict[str, Any]:
    """
    Perform a web search using DuckDuckGo.
    """
    try:
        results = []
        with DDGS() as ddgs:
            # text() returns an iterator
            for r in ddgs.text(query, max_results=max_results):
                results.append(r)
        return {"query": query, "results": results}
    except Exception as e:
        return {"query": query, "error": str(e)}

def install(ctx):
    ctx.register_tool(web_search)

__install__ = install
```

See Tool Support Docs and the `core_tools` implementation for more examples.
Custom Provider Implementation example
Example of creating a custom provider that extends the `GeneratorBase` class to add support for image generation in OpenRouter.
```python
import json
import time
import aiohttp

def install(ctx):
    from llms.main import GeneratorBase

    # https://openrouter.ai/docs/guides/overview/multimodal/image-generation
    class OpenRouterGenerator(GeneratorBase):
        sdk = "openrouter/image"

        def __init__(self, **kwargs):
            super().__init__(**kwargs)

        def to_response(self, response, chat, started_at):
            # go through all image responses and save them to cache
            for choice in response["choices"]:
                if "message" in choice and "images" in choice["message"]:
                    for image in choice["message"]["images"]:
                        if choice["message"]["content"] == "":
                            choice["message"]["content"] = self.default_content
                        if "image_url" in image:
                            data_uri = image["image_url"]["url"]
                            if data_uri.startswith("data:"):
                                parts = data_uri.split(",", 1)
                                ext = parts[0].split(";")[0].split("/")[1]
                                base64_data = parts[1]
                                model = chat["model"].split("/")[-1]
                                filename = f"{model}-{choice['index']}.{ext}"
                                info = {
                                    "model": model,
                                    "prompt": ctx.last_user_prompt(chat),
                                }
                                relative_url, info = ctx.save_image_to_cache(base64_data, filename, info)
                                image["image_url"]["url"] = relative_url
            return response

        async def chat(self, chat, provider=None):
            headers = self.get_headers(provider, chat)
            if provider is not None:
                chat["model"] = provider.provider_model(chat["model"]) or chat["model"]
            started_at = time.time()
            if ctx.MOCK:
                print("Mocking OpenRouterGenerator")
                text = ctx.text_from_file(f"{ctx.MOCK_DIR}/openrouter-image.json")
                return ctx.log_json(self.to_response(json.loads(text), chat, started_at))
            else:
                chat_url = provider.chat_url
                chat = await self.process_chat(chat, provider_id=self.id)
                ctx.log(f"POST {chat_url}")
                ctx.log(provider.chat_summary(chat))
                # remove metadata if any (conflicts with some providers, e.g. Z.ai)
                chat.pop("metadata", None)
                async with aiohttp.ClientSession() as session, session.post(
                    chat_url,
                    headers=headers,
                    data=json.dumps(chat),
                    timeout=aiohttp.ClientTimeout(total=300),
                ) as response:
                    return ctx.log_json(self.to_response(await self.response_json(response), chat, started_at))

    ctx.add_provider(OpenRouterGenerator)

__install__ = install
```

This new implementation can be used by registering it as the `image` modality whose `npm` matches the provider's `sdk` in `llms.json`, e.g:
```json
{
    "openrouter": {
        "enabled": true,
        "id": "openrouter",
        "modalities": {
            "image": {
                "name": "OpenRouter Image",
                "npm": "openrouter/image"
            }
        }
    }
}
```

Find more Provider implementations in the providers extension.
Server Extensions APIs
This document covers the public APIs available to server-side extensions via the `ExtensionContext` and `AppExtensions` classes.
Overview
When creating a server extension, you work with an `ExtensionContext` instance that provides access to all extension capabilities. The `ExtensionContext` is passed to your extension's `init(ctx)` function and serves as the primary interface for registering handlers, routes, tools, and accessing server functionality.
```python
def init(ctx: ExtensionContext):
    # Your extension initialization code here
    ctx.register_tool(my_tool_function, group="my_tools")
    ctx.add_get("status", handle_status)
```

Example Extension
Here's a complete example demonstrating common extension patterns:
```python
import os
from aiohttp import web

def init(ctx):
    """Initialize the example extension."""

    # Register a custom tool
    def greet_user(name: str, formal: bool = False) -> str:
        """Greet a user by name.

        Args:
            name: The user's name
            formal: Whether to use formal greeting

        Returns:
            A greeting message
        """
        if formal:
            return f"Good day, {name}. How may I assist you?"
        return f"Hey {name}! What's up?"

    ctx.register_tool(greet_user, group="social")

    # Register API routes
    async def get_status(request):
        is_auth, user = ctx.check_auth(request)
        return web.json_response({
            "extension": ctx.name,
            "authenticated": is_auth,
            "user": user.get("userName") if user else None
        })

    ctx.add_get("status", get_status)

    async def create_item(request):
        is_auth, _ = ctx.check_auth(request)
        if not is_auth:
            return ctx.error_auth_required
        try:
            data = await request.json()
            # Process data...
            return web.json_response({"success": True, "id": "123"})
        except Exception as e:
            return web.json_response(ctx.error_response(e), status=500)

    ctx.add_post("items", create_item)

    # Register filters
    async def log_requests(chat, context):
        ctx.log(f"Chat request for thread: {context.get('threadId')}")

    ctx.register_chat_request_filter(log_requests)

    # Cleanup on shutdown
    def cleanup():
        ctx.log("Extension shutting down, cleaning up...")

    ctx.register_shutdown_handler(cleanup)

    ctx.log("Example extension initialized!")
```

ExtensionContext
The `ExtensionContext` class is the main interface for extensions. It provides access to the extension's configuration, logging, and registration methods.
Properties
| Property | Type | Description |
|---|---|---|
| `app` | `AppExtensions` | Reference to the parent application extensions manager |
| `cli_args` | `argparse.Namespace` | Command-line arguments passed to the server |
| `extra_args` | `Dict[str, Any]` | Additional arguments from extensions |
| `error_auth_required` | `Dict[str, Any]` | Pre-built authentication required error response |
| `path` | `str` | File path of the extension |
| `name` | `str` | Name of the extension (derived from filename) |
| `ext_prefix` | `str` | URL prefix for extension routes (e.g., `/ext/myext`) |
| `debug` | `bool` | Whether debug mode is enabled |
| `verbose` | `bool` | Whether verbose logging is enabled |
| `aspect_ratios` | `Dict[str, str]` | Available image aspect ratios (e.g., `"1:1": "1024×1024"`) |
| `request_args` | `Dict[str, type]` | Supported chat request arguments with their types |
| `disabled` | `bool` | Whether the extension is disabled |
Logging Methods
`log(message: Any) -> Any`

Log a message when verbose mode is enabled. Returns the message for chaining.

```python
ctx.log("Processing request...")
```

`log_json(obj: Any) -> Any`

Log an object as formatted JSON when verbose mode is enabled. Returns the object for chaining.

```python
ctx.log_json({"status": "ok", "count": 42})
```

`dbg(message: Any)`

Log a debug message when debug mode is enabled.

```python
ctx.dbg("Entering handler with params: ...")
```

`err(message: str, e: Exception)`

Log an error with exception details. Prints stack trace in verbose mode.

```python
try:
    process_data()
except Exception as e:
    ctx.err("Failed to process data", e)
```

Route Registration
Routes are automatically prefixed with `/ext/{extension_name}`.
`add_get(path: str, handler: Callable, **kwargs: Any)`

Register a GET route handler.

```python
async def handle_status(request):
    return web.json_response({"status": "ok"})

ctx.add_get("status", handle_status)  # Available at /ext/myext/status
```

`add_post(path: str, handler: Callable, **kwargs: Any)`

Register a POST route handler.

```python
async def handle_create(request):
    data = await request.json()
    return web.json_response({"created": True})

ctx.add_post("create", handle_create)
```

`add_put(path: str, handler: Callable, **kwargs: Any)`

Register a PUT route handler.

`add_delete(path: str, handler: Callable, **kwargs: Any)`

Register a DELETE route handler.

`add_patch(path: str, handler: Callable, **kwargs: Any)`

Register a PATCH route handler.
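The remaining verbs follow the same pattern as `add_get`/`add_post`; a short sketch (handler bodies are illustrative):

```python
async def update_item(request):
    data = await request.json()
    return web.json_response({"updated": True})

async def delete_item(request):
    return web.json_response({"deleted": True})

ctx.add_put("items", update_item)     # PUT    /ext/myext/items
ctx.add_delete("items", delete_item)  # DELETE /ext/myext/items
```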
`add_static_files(ext_dir: str)`

Serve static files from a directory under the extension's URL prefix.

```python
# Serve files from ./ui directory at /ext/myext/*
ext_dir = os.path.join(os.path.dirname(__file__), "ui")
ctx.add_static_files(ext_dir)
```

`web_path(method: str, path: str) -> str`

Get the full URL path for a route (internal helper).
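Assuming it mirrors the route prefixing shown above, it would resolve paths like this sketch:

```python
path = ctx.web_path("GET", "status")  # e.g. "/ext/myext/status" (assumed output)
```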
Filter Registration
Filters intercept and can modify requests/responses at various stages.
`register_chat_request_filter(handler: Callable)`

Register a filter that runs before chat requests are processed.

```python
async def filter_request(chat: Dict, context: Dict):
    # Modify chat request before processing
    chat["metadata"] = {"source": "extension"}

ctx.register_chat_request_filter(filter_request)
```

`register_chat_tool_filter(handler: Callable)`

Register a filter that runs when tools are invoked.

```python
async def on_tool_call(chat: Dict, context: Dict):
    ctx.log(f"Tool called in thread: {context.get('threadId')}")

ctx.register_chat_tool_filter(on_tool_call)
```

`register_chat_response_filter(handler: Callable)`

Register a filter that runs after chat responses are generated.

```python
async def filter_response(response: Dict, context: Dict):
    # Modify or log response
    pass

ctx.register_chat_response_filter(filter_response)
```

`register_chat_error_filter(handler: Callable)`

Register a filter that runs when chat errors occur.

```python
async def on_error(e: Exception, context: Dict):
    ctx.log(f"Error: {e}, Stack: {context.get('stackTrace')}")

ctx.register_chat_error_filter(on_error)
```

`register_cache_saved_filter(handler: Callable)`

Register a filter that runs when responses are saved to cache.

```python
def on_cache_saved(context: Dict):
    ctx.log(f"Cached: {context['url']}")

ctx.register_cache_saved_filter(on_cache_saved)
```

`register_shutdown_handler(handler: Callable)`

Register a handler to run when the server shuts down.

```python
def cleanup():
    ctx.log("Extension shutting down...")

ctx.register_shutdown_handler(cleanup)
```

Tool Registration
`register_tool(func: Callable, tool_def: Optional[Dict] = None, group: Optional[str] = None)`

Register a tool function that LLMs can invoke.

```python
def search_database(query: str, limit: int = 10) -> Dict[str, Any]:
    """Search the database for matching records.

    Args:
        query: Search query string
        limit: Maximum number of results to return

    Returns:
        Dictionary containing search results
    """
    results = do_search(query, limit)
    return {"results": results}

ctx.register_tool(search_database, group="database")
```

- If `tool_def` is not provided, it's automatically generated from the function signature and docstring
- `group` categorizes the tool for UI display (defaults to `"custom"`)

`get_tool_definition(name: str) -> Optional[Dict[str, Any]]`

Retrieve the tool definition for a registered tool.

```python
tool_def = ctx.get_tool_definition("search_database")
```

`sanitize_tool_def(tool_def: Dict[str, Any]) -> Dict[str, Any]`

Process a tool definition to inline `$defs` references.
Tool Execution
`async exec_tool(name: str, args: Dict[str, Any]) -> Tuple[Optional[str], List[Dict[str, Any]]]`

Execute a registered tool by name.

```python
text, resources = await ctx.exec_tool("search_database", {"query": "test"})
```

The `text` captures the tool's text response that can be embedded in AI Messages and further passed to other AI Requests and tools.
The `resources` contains a list of artifacts extracted from the tool results, in the same structure as OpenAI content types.
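For instance, a handler might run a tool and inspect any returned artifacts, grouping them with `group_resources` (documented under Configuration & Utilities); the logging is illustrative:

```python
text, resources = await ctx.exec_tool("search_database", {"query": "test"})
for category, items in ctx.group_resources(resources).items():
    ctx.log(f"{category}: {len(items)} artifact(s)")
```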
`tool_result(result: Any, function_name: Optional[str] = None, function_args: Optional[Dict] = None) -> Dict[str, Any]`

Format a tool execution result for return to the LLM.

`tool_result_part(result: Dict, function_name: Optional[str] = None, function_args: Optional[Dict] = None) -> Dict[str, Any]`

Format a partial tool result.

`to_content(result: Any) -> str`

Convert a result to string content.
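A sketch of using these formatters when handling a tool call manually (the exact shape of the returned dicts is defined by the server):

```python
result = {"results": ["a", "b"]}
# Format the full result as a message to return to the LLM
tool_msg = ctx.tool_result(result, function_name="search_database",
                           function_args={"query": "test"})
# Or flatten any result to plain string content
content = ctx.to_content(result)
```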
Chat Utilities
`chat_request(template: Optional[str] = None, text: Optional[str] = None, model: Optional[str] = None, system_prompt: Optional[str] = None) -> Dict[str, Any]`

Create a chat request object.

```python
chat = ctx.chat_request(
    text="Summarize this document",
    model="gpt-4o",
    system_prompt="You are a helpful assistant"
)
```

`async chat_completion(chat: Dict[str, Any], context: Optional[Dict] = None) -> Any`
Send a chat completion request.
chat = ctx.chat_request(text="Hello, world!")
response = await ctx.chat_completion(chat)create_chat_with_tools(chat: Dict[str, Any], use_tools: str = "all") -> Dict[str, Any]
Create a chat request with tools injected.
chat = ctx.chat_request(text="Search for recent news")
chat_with_tools = ctx.create_chat_with_tools(chat, use_tools="search_web,fetch_page")chat_to_prompt(chat: Dict[str, Any]) -> str
Extract the user prompt from a chat object.
chat_to_system_prompt(chat: Dict[str, Any]) -> str
Extract the system prompt from a chat object.
last_user_prompt(chat: Dict[str, Any]) -> str
Get the last user message from a chat.
chat_response_to_message(response: Dict[str, Any]) -> Dict[str, Any]
Convert a chat response to a message format.
chat_to_aspect_ratio(chat: Dict[str, Any]) -> str
Extract aspect ratio from chat request.
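These helpers are handy inside filters and endpoints; a small sketch (the log output is illustrative):

```python
async def log_prompts(chat, context):
    prompt = ctx.last_user_prompt(chat)
    system = ctx.chat_to_system_prompt(chat)
    ctx.log(f"prompt={prompt!r} system={system!r}")

ctx.register_chat_request_filter(log_prompts)
```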
File Utilities
`text_from_file(path: str) -> str`

Read text content from a file.

```python
content = ctx.text_from_file("/path/to/file.txt")
```

`json_from_file(path: str) -> Any`

Read and parse JSON from a file.

```python
data = ctx.json_from_file("/path/to/config.json")
```

`download_file(url: str) -> Tuple[bytes, Dict[str, Any]]`

Download a file from a URL. Returns bytes and metadata.

`session_download_file(session: aiohttp.ClientSession, url: str) -> Tuple[bytes, Dict[str, Any]]`

Download a file using an existing aiohttp session.

`read_binary_file(url: str) -> Tuple[bytes, Dict[str, Any]]`

Read binary file content from a URL or path.
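A sketch combining a download with the cache utilities below (the URL is illustrative, and `download_file` is assumed to be called synchronously per its signature):

```python
data, info = ctx.download_file("https://example.org/picture.png")
path, info = ctx.save_bytes_to_cache(data, "picture.png", info)
ctx.log(f"Saved to cache at {path}")
```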
Cache Utilities
`get_cache_path(path: str = "") -> str`

Get the full path to a cache location.

```python
cache_file = ctx.get_cache_path("my_extension/data.json")
```

`save_image_to_cache(base64_data: Union[str, bytes], filename: str, image_info: Dict[str, Any], ignore_info: bool = False) -> Tuple[str, Optional[Dict[str, Any]]]`

Save image data to the cache. Returns the cache path and info.

```python
path, info = ctx.save_image_to_cache(b64_data, "output.png", {"prompt": "..."})
```

`save_bytes_to_cache(bytes_data: Union[str, bytes], filename: str, file_info: Optional[Dict[str, Any]]) -> Tuple[str, Optional[Dict[str, Any]]]`

Save binary data to the cache.

`cache_message_inline_data(message: Dict[str, Any])`

Cache inline data (e.g., base64 images) from a message.
Provider Access
`get_providers() -> Dict[str, Any]`

Get all registered LLM providers.

```python
providers = ctx.get_providers()
for name, provider in providers.items():
    ctx.log(f"Provider: {name}")
```

`get_provider(name: str) -> Optional[Any]`

Get a specific provider by name.

```python
openai = ctx.get_provider("openai")
```

`add_provider(provider: Any)`

Register a new LLM provider class.

```python
from llms.main import OpenAiCompatible

class MyProvider(OpenAiCompatible):
    name = "my-provider"
    # ...

ctx.add_provider(MyProvider)
```

Authentication & Sessions
`check_auth(request: web.Request) -> Tuple[bool, Optional[Dict[str, Any]]]`

Check if a request is authenticated. Returns `(is_authenticated, user_data)`.

```python
async def protected_route(request):
    is_auth, user = ctx.check_auth(request)
    if not is_auth:
        return ctx.error_auth_required
    return web.json_response({"user": user})
```

`get_session(request: web.Request) -> Optional[Dict[str, Any]]`

Get the session data for a request.

```python
session = ctx.get_session(request)
if session:
    ctx.log(f"User: {session.get('userName')}")
```

`get_username(request: web.Request) -> Optional[str]`

Get the username from a request's session.

`get_user_path(username: Optional[str] = None) -> str`

Get the filesystem path for user-specific data.

```python
user_dir = ctx.get_user_path("john")  # ~/.llms/user/john
```

`context_to_username(context: Optional[Dict[str, Any]]) -> Optional[str]`

Extract username from a context dictionary containing a request.
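For handlers that receive a context dict (e.g., filters), the user's data directory can be resolved like this sketch (the `"anonymous"` fallback is illustrative):

```python
async def filter_request(chat, context):
    username = ctx.context_to_username(context) or "anonymous"
    user_dir = ctx.get_user_path(username)  # e.g. ~/.llms/user/<username>
    ctx.log(f"Serving request for {user_dir}")
```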
Configuration & Utilities
`get_config() -> Optional[Dict[str, Any]]`

Get the current server configuration.

```python
config = ctx.get_config()
api_key = config.get("auth", {}).get("api_key")
```

`get_file_mime_type(filename: str) -> str`

Get the MIME type for a filename.

```python
mime = ctx.get_file_mime_type("image.png")  # "image/png"
```

`to_file_info(chat: Dict[str, Any], info: Optional[Dict] = None, response: Optional[Dict] = None) -> Dict[str, Any]`

Create file info metadata from chat/response data.

`group_resources(resources: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]`

Group a list of resources by category.

`should_cancel_thread(context: Dict[str, Any]) -> bool`

Check if the current thread/request should be cancelled.
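A long-running handler can check this cooperatively; a sketch where `do_step` is a hypothetical unit of work:

```python
async def long_running(chat, context):
    for step in range(100):
        if ctx.should_cancel_thread(context):
            ctx.log("Request cancelled, stopping early")
            break
        await do_step(step)  # hypothetical unit of work
```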
Error Handling
`error_message(e: Exception) -> str`

Extract a user-friendly error message from an exception.

```python
try:
    risky_operation()
except Exception as e:
    msg = ctx.error_message(e)
    return web.json_response({"error": msg}, status=500)
```

`error_response(e: Exception, stacktrace: bool = False) -> Dict[str, Any]`

Create an error response dictionary from an exception.

```python
try:
    process()
except Exception as e:
    return web.json_response(ctx.error_response(e, stacktrace=True), status=500)
```

UI Registration
`register_ui_extension(index: str)`

Register a UI extension entry point.

```python
ctx.register_ui_extension("index.mjs")  # Registers /ext/myext/index.mjs
```

`add_importmaps(dict: Dict[str, str])`

Add JavaScript import map entries.

```python
ctx.add_importmaps({
    "my-lib": "/ext/myext/lib/my-lib.mjs"
})
```

`add_index_header(html: str)`

Add HTML to the main page header.

```python
ctx.add_index_header('<link rel="stylesheet" href="/ext/myext/styles.css">')
```

`add_index_footer(html: str)`

Add HTML to the main page footer.

```python
ctx.add_index_footer('<script src="/ext/myext/analytics.js"></script>')
```

AppExtensions
The `AppExtensions` class manages all registered extensions and provides shared state. While extensions primarily interact through `ExtensionContext`, some `AppExtensions` properties are accessible.
Properties
| Property | Type | Description |
|---|---|---|
| `cli_args` | `argparse.Namespace` | Command-line arguments |
| `extra_args` | `Dict[str, Any]` | Additional extension arguments |
| `config` | `Dict[str, Any]` | Server configuration |
| `auth_enabled` | `bool` | Whether authentication is enabled |
| `ui_extensions` | `List[Dict]` | Registered UI extensions |
| `tools` | `Dict[str, Callable]` | Registered tool functions by name |
| `tool_definitions` | `List[Dict]` | Tool definitions for LLM consumption |
| `tool_groups` | `Dict[str, List[str]]` | Tool names grouped by category |
| `all_providers` | `List[type]` | All registered provider classes |
| `import_maps` | `Dict[str, str]` | JavaScript import map entries |
| `index_headers` | `List[str]` | HTML headers for main page |
| `index_footers` | `List[str]` | HTML footers for main page |
| `aspect_ratios` | `Dict[str, str]` | Image aspect ratio mappings |
| `request_args` | `Dict[str, type]` | Supported request argument types |
Default Request Arguments
The `request_args` dictionary defines supported chat request parameters:
```python
{
    "image_config": dict,            # e.g., {"aspect_ratio": "1:1"}
    "temperature": float,            # e.g., 0.7
    "max_completion_tokens": int,    # e.g., 2048
    "seed": int,                     # e.g., 42
    "top_p": float,                  # e.g., 0.9
    "frequency_penalty": float,      # e.g., 0.5
    "presence_penalty": float,       # e.g., 0.5
    "stop": list,                    # e.g., ["Stop"]
    "reasoning_effort": str,         # e.g., "minimal", "low", "medium", "high"
    "verbosity": str,                # e.g., "low", "medium", "high"
    "service_tier": str,             # e.g., "auto", "default"
    "top_logprobs": int,
    "safety_identifier": str,
    "store": bool,
    "enable_thinking": bool,
}
```

Default Aspect Ratios
```python
{
    "1:1": "1024×1024",
    "2:3": "832×1248",
    "3:2": "1248×832",
    "3:4": "864×1184",
    "4:3": "1184×864",
    "4:5": "896×1152",
    "5:4": "1152×896",
    "9:16": "768×1344",
    "16:9": "1344×768",
    "21:9": "1536×672",
}
```

Default Import Maps
```python
{
    "vue-prod": "/ui/lib/vue.min.mjs",
    "vue": "/ui/lib/vue.mjs",
    "vue-router": "/ui/lib/vue-router.min.mjs",
    "@servicestack/client": "/ui/lib/servicestack-client.mjs",
    "@servicestack/vue": "/ui/lib/servicestack-vue.mjs",
    "idb": "/ui/lib/idb.min.mjs",
    "marked": "/ui/lib/marked.min.mjs",
    "highlight.js": "/ui/lib/highlight.min.mjs",
    "chart.js": "/ui/lib/chart.js",
    "color.js": "/ui/lib/color.js",
    "ctx.mjs": "/ui/ctx.mjs",
}
```

UI Extensions
This guide provides a walkthrough of the LLM UI Extensions API, which allows you to customize the UI, add new pages, modify the layout, and intercept chat functionality.
Tool Support
Support for Python function calling (Tools), allowing LLMs to interact with your local environment and custom logic.