From Single-Model to Provider-Agnostic: How the Claude Runner Became a Self-Hosted AI Agent Engine
In a previous article, we introduced the Claude Runner — an open-source scheduling platform that executes AI tasks on a cron schedule, responds to webhooks, and builds custom MCP tools from natural language. That version worked, and it proved the concept: AI can move from reactive chat to persistent, goal-directed automation.
But it was Claude-only, single-user, and had a flat tool surface where every job could access every tool. For personal use and simple automation, that was fine. For anything resembling production use — multiple users, sensitive data, cost controls, local model requirements — it needed fundamental changes.
This article covers what changed. Everything here is new material; nothing overlaps with the previous post.
What Changed
The runner evolved from a Claude-specific scheduler into a provider-agnostic agent engine. The core scheduling and webhook infrastructure remains, but the execution layer was rebuilt around four principles:
- Any AI provider can execute jobs, not just Claude
- Tools are scoped to workspaces, not globally available
- Users interact through their own workspace-scoped chat, not the admin interface
- Every API call, tool invocation, and token spend is logged and queryable
The codebase was reorganized into a modular sflow/ package with clear separation between providers, execution, API routes, and MCP tooling.
Interactive Notifications — Human-in-the-Loop
One of the most compelling additions for business use is the ask_user mechanism. During job execution, the AI can pause and ask the user a question.
When a provider encounters a situation that requires human judgment — an ambiguous instruction, a confirmation before a destructive action, a choice between alternatives — it invokes the ask_user tool. This creates an interactive notification in the user's dashboard with the question, answer type (yes/no, multiple choice, or free text), and options.
The job execution pauses. The user sees the notification, provides their answer, and the job resumes with the answer injected into the conversation context.
Human-in-the-Loop by Design
This bridges the gap between full autonomy and full manual control. The AI handles the routine parts of a task and escalates to a human when it encounters genuine ambiguity. The notification system supports priorities (low, medium, high) and tracks whether each notification has been read and answered.
In practice, this enables workflows like: "Process all incoming invoices. If any amount exceeds 10,000 EUR, ask me before approving." The AI processes the routine cases autonomously and escalates the exceptions.
Multi-Provider Architecture
The most visible change is that the runner no longer depends on Claude. Four providers are supported, each implementing a common interface:
- Claude: complex reasoning and multi-step tool use via Anthropic's Agent SDK. The strongest option for tasks that require nuanced judgment and extended tool-calling chains.
- OpenAI: GPT-4o and related models via the chat completions API.
- Ollama: local models such as Llama 3.1 and Mistral, with zero API cost and no data leaving your infrastructure.
- Python script: direct script execution with workspace tool access, no LLM involved.
Every provider implements a shared abstract base class (BaseProvider) that defines a contract for job execution, pricing, availability checks, and optional chat support. The scheduler, run processor, and chat interface do not know or care which provider is behind a given job — they interact exclusively through this contract.
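A minimal sketch of what such a contract might look like, assuming method names (`execute_job`, `estimate_cost`, `is_available`, `supports_chat`) that are illustrative rather than the actual sflow API:

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Shared provider contract (illustrative names, not the sflow API)."""
    name: str = "base"

    @abstractmethod
    def execute_job(self, prompt: str, tools: list[dict]) -> dict:
        """Run a job and return its result payload."""

    @abstractmethod
    def estimate_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the estimated cost for a given token count."""

    @abstractmethod
    def is_available(self) -> bool:
        """Check whether the SDK and credentials are present."""

    def supports_chat(self) -> bool:
        """Optional capability; providers override this when they support chat."""
        return False

class EchoProvider(BaseProvider):
    """Trivial concrete provider, included only to show the contract in use."""
    name = "echo"

    def execute_job(self, prompt, tools):
        return {"output": prompt,
                "tokens": {"input": len(prompt), "output": len(prompt)}}

    def estimate_cost(self, input_tokens, output_tokens):
        return 0.0

    def is_available(self):
        return True
```

Because callers depend only on the abstract surface, swapping Claude for Ollama (or a dummy provider in tests) requires no changes to the scheduler or chat code.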
Plug-and-Play Providers
The provider registry checks at startup which SDKs are installed. If the OpenAI package is not present, the OpenAI provider is simply not registered — no errors, no configuration required. This makes the runner deployable in environments where only local models are available or where only one commercial API is authorized.
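The startup check can be done without importing anything, by probing whether each SDK is installed. The package names below are the real PyPI distributions; the registry structure itself is an assumption:

```python
import importlib.util

# Map each provider to the SDK package it needs (None = no SDK required).
PROVIDER_SDKS = {
    "claude": "anthropic",
    "openai": "openai",
    "ollama": "ollama",
    "python-script": None,
}

def build_registry() -> dict[str, bool]:
    """Register only the providers whose SDK is importable in this environment."""
    registry = {}
    for provider, sdk in PROVIDER_SDKS.items():
        if sdk is None or importlib.util.find_spec(sdk) is not None:
            registry[provider] = True   # a real registry would store an instance
    return registry

registry = build_registry()
# "python-script" is always present; the others depend on installed SDKs.
```

`find_spec` only checks importability, so a missing SDK produces a silently absent provider rather than an import error at startup.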
MCP tools and OpenAI-style function calling use different schemas — the runner converts between them at execution time, handling format differences and wiring tool invocations back through the MCP callable registry transparently.
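The core of that conversion is re-nesting the schema: an MCP tool carries its JSON Schema under `inputSchema`, while OpenAI-style function calling expects it under `function.parameters`. The wrapper shape below follows those two formats; the surrounding wiring in the runner is not shown:

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Convert an MCP tool definition to an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, which is exactly
            # what OpenAI expects as the function's parameters.
            "parameters": tool.get("inputSchema",
                                   {"type": "object", "properties": {}}),
        },
    }

mcp_tool = {
    "name": "get_order_status",
    "description": "Look up an order by id",
    "inputSchema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}
openai_tool = mcp_tool_to_openai(mcp_tool)
```

The reverse direction (routing the model's tool-call arguments back to the MCP callable) is the part the run processor handles transparently.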
The Python Script Provider
Not every job needs an LLM. The Python script provider executes scripts directly — either inline code from the job prompt or a file path — with full access to workspace MCP tools via an auto-generated sflow_tools module. Scripts can list available tools and call them directly, bridging the gap between "we need AI reasoning" and "we just need to run a script that calls some tools." Zero API cost, instant execution.
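A hypothetical sketch of what such a script might look like. The `sflow_tools` module is auto-generated per workspace, so the function names assumed below (`list_tools`, `call_tool`) are guesses at its shape, and a stub stands in for the generated module to keep the sketch runnable:

```python
# In a real script-provider job, something like this would be available:
# import sflow_tools

def run(tools) -> str:
    """Call a workspace tool directly, with no LLM in the loop."""
    available = tools.list_tools()
    if "get_gateway_status" not in available:
        return "gateway tool not enabled in this workspace"
    status = tools.call_tool("get_gateway_status", {})
    return f"gateway: {status}"

class FakeTools:
    """Stub standing in for the auto-generated sflow_tools module."""
    def list_tools(self):
        return ["get_gateway_status", "send_email"]
    def call_tool(self, name, args):
        return "RUNNING"

print(run(FakeTools()))   # gateway: RUNNING
```

The script only ever sees the tools enabled in its workspace, so the same governance model applies whether a job is driven by an LLM or by plain Python.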
Workspace Isolation and Tool Governance
In the original runner, every job had access to every registered MCP server. This is a non-starter for multi-user or multi-project deployments where different contexts need different tool surfaces — and where some tools should be restricted.
Workspaces
A workspace is a named isolation boundary. Each workspace defines:
- Which MCP servers are enabled
- Which specific tools from those servers are available
- What access level each tool has
- A default system prompt that is prepended to all jobs in the workspace
- A default AI provider for all jobs and conversations in the workspace
- Credentials isolated through server enablement — only enabled servers' credentials are available during execution
Jobs and user chats are associated with a workspace. When a job executes, it only sees the tools enabled in its workspace — nothing else exists from its perspective.
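The workspace record and the filtered tool view can be sketched as follows. Field names here are assumptions that mirror the bullet list above, not the runner's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Workspace:
    """Named isolation boundary (illustrative fields)."""
    name: str
    enabled_servers: set[str]
    enabled_tools: dict[str, str]      # tool name -> access level
    system_prompt: str = ""
    default_provider: str = "claude"

def visible_tools(ws: Workspace, all_tools: dict[str, str]) -> dict[str, str]:
    """A job only sees tools that are both registered and workspace-enabled."""
    return {t: lvl for t, lvl in ws.enabled_tools.items() if t in all_tools}

monitoring = Workspace(
    name="Production Monitoring",
    enabled_servers={"ignition-mcp", "email"},
    enabled_tools={"get_gateway_status": "read",
                   "list_devices": "read",
                   "send_email": "write"},
)

# Global registry: tool name -> owning MCP server.
registry = {"get_gateway_status": "ignition-mcp",
            "list_devices": "ignition-mcp",
            "send_email": "email",
            "delete_database": "ignition-mcp"}

# delete_database exists on the server but is invisible to this workspace.
tools = visible_tools(monitoring, registry)
```

From the job's perspective there is no "disabled tool" state: anything outside the workspace simply does not exist.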
Tool Access Levels
Every tool in a workspace is classified into one of four access levels:
| Level | Description | Example |
|---|---|---|
| read | Query data, no side effects | list_customers, get_order_status |
| write | Create or modify data | create_invoice, update_record |
| admin | System administration | manage_users, configure_server |
| dangerous | Destructive or irreversible | delete_database, send_bulk_email |
Access levels are assigned per tool, per workspace. The same MCP server might expose all its tools in an admin workspace but only read-level tools in a production read-only workspace.
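One plausible enforcement of these levels is an ordered comparison at invocation time: a workspace caps the maximum level it permits, and any tool classified above that cap is refused. The ordering comes from the table above; the guard itself is an illustrative assumption, not the runner's documented mechanism:

```python
# Access levels in ascending order of risk, per the table above.
LEVELS = ["read", "write", "admin", "dangerous"]

def allowed(tool_level: str, workspace_max: str) -> bool:
    """Permit a call only if the tool's level is within the workspace cap."""
    return LEVELS.index(tool_level) <= LEVELS.index(workspace_max)

can_read = allowed("read", "write")        # a read tool in a write workspace
can_nuke = allowed("dangerous", "admin")   # a dangerous tool in an admin workspace
```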
Why This Matters for Governance
This is a practical answer to a real enterprise concern: "How do we let our team use AI tools without worrying that a misconfigured job will delete production data?" You configure a workspace with only the tools and access levels appropriate for that context. The AI cannot access what it cannot see.
Enterprise Example
Consider a manufacturing company running the runner with the Ignition CLI connected as an MCP tool — alongside CRM and email:
Workspace: "Production Monitoring"
MCP Servers: ignition-mcp (read only), email
Tools: get_gateway_status, list_devices, send_email
Access: read, read, write
Workspace: "Admin Operations"
MCP Servers: ignition-mcp (full), crm, email
Tools: all tools enabled
Access: read, write, admin as appropriate
Workspace: "Report Generation"
MCP Servers: crm (read only), file-system (read only)
Tools: lookup_customer, search_contacts, read_file
Access: read, read, read
Different teams get different workspaces. The same underlying MCP servers are shared, but the tool surface is shaped to each team's needs and risk profile.
Admin Chat and System Configuration
The runner has two distinct chat interfaces, each with a different tool surface.
Admin chat has access to system management tools — creating jobs, managing webhooks, configuring MCP servers, viewing system status. This is the interface for the person operating the runner. Critically, administrators can configure and create new connections to external systems directly through the admin chat — or through any MCP-capable AI client connected to the runner. This means onboarding a new data source or tool server is a conversational action, not a manual configuration task.
User chat is workspace-scoped. A user selects a workspace and starts a conversation. They only see the tools enabled in that workspace, and every interaction is logged against their user identity. The conversation persists across sessions, with full token tracking and cost attribution.
The user chat supports multi-turn conversations with provider selection — a user can choose to chat with Claude, GPT-4o, or a local Ollama model, depending on the workspace configuration and their preference. Provider selection can be configured per workspace (default provider) or overridden per conversation.
Governance, Auditability, and Cost Tracking
For enterprise deployments, knowing what happened, who triggered it, what it cost, and what it touched is not optional — it is a requirement. The runner treats observability and governance as first-class features, not afterthoughts.
Full Audit Trail
Every API call and every tool invocation is logged separately, creating a complete audit trail that can be queried, exported, and reviewed.
API logs capture the provider, model, token counts (input and output), cost, latency, and a source reference that links back to the specific run or conversation that generated the call.
Tool logs capture which MCP server and tool were called, the arguments, result (truncated for storage), error messages, and execution duration. This is essential for debugging failed runs, understanding where time is spent, and demonstrating compliance with data access policies.
Source Tracking
Every log entry is tagged with a source_type (run, admin-chat, user-chat) and a source_id. This makes it straightforward to reconstruct the complete trace of any job or conversation — which tools were invoked, in what order, with what arguments, and at what cost. For regulated industries, this provides the auditability that generic AI chat tools lack.
Both log tables support auto-pruning with configurable retention (default 30 days), preventing unbounded storage growth while maintaining a rolling audit window.
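The rolling window can be implemented as a single timestamp-bounded delete. The sketch below assumes a SQLite log table with a `created_at` ISO-8601 column; table and column names are illustrative:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def prune_logs(conn: sqlite3.Connection, table: str,
               retention_days: int = 30) -> int:
    """Delete log rows older than the retention window; return the count."""
    cutoff = (datetime.now(timezone.utc)
              - timedelta(days=retention_days)).isoformat()
    # ISO-8601 timestamps in a consistent format compare lexicographically.
    cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
old = (datetime.now(timezone.utc) - timedelta(days=45)).isoformat()
new = datetime.now(timezone.utc).isoformat()
conn.executemany("INSERT INTO api_logs (created_at) VALUES (?)", [(old,), (new,)])

deleted = prune_logs(conn, "api_logs")   # removes only the 45-day-old row
```

Running this on a schedule keeps storage bounded while the most recent 30 days stay queryable.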
Cost Visibility
The dashboard surfaces logs as trends — token consumption over time, cost per workspace, tool error rates, and provider utilization. This gives operators and finance teams the visibility they need to manage AI spend and identify issues before they compound. A typical workspace running 5 daily scheduled jobs costs less than €5/month in API calls — and switching a daily report from Claude Opus to Haiku can reduce that by 90%.
Access Control Summary
The governance model spans multiple layers:
| Layer | Mechanism |
|---|---|
| Authentication | OAuth 2.1 with PKCE for MCP clients, JWT for dashboard and user API |
| Workspace isolation | Tools, servers, and credentials scoped per workspace |
| Tool classification | Four access levels (read, write, admin, dangerous) per tool per workspace |
| Credential isolation | Only enabled servers' credentials available during execution, guarded by threading locks |
| Rate limiting | Sliding-window rate limits on webhook endpoints |
| Input validation | Enforced limits on prompt length, payload size, and messages per conversation |
| Audit logging | Every API call and tool invocation logged with source tracking |
| Human-in-the-loop | Interactive notifications for high-stakes decisions |
This Is What Sets Self-Hosted Apart
When AI runs on infrastructure you control, with tools scoped to what each team needs, full audit trails, and human escalation for sensitive decisions — you get the productivity benefits of AI agents without the governance risks of sending data to uncontrolled third-party services.
Reliability
Graceful shutdown ensures that running jobs complete or are cleanly terminated when the server stops, preventing orphaned processes and data corruption.
Concurrent execution is bounded by a configurable semaphore (default 10 concurrent runs), and each run is atomically claimed to prevent duplicate execution across scheduler cycles.
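The combination of a semaphore and an atomic claim can be sketched as follows. The claim is a conditional `UPDATE` that only succeeds if the row is still pending, so two scheduler cycles can never both execute the same run. The schema and function names are illustrative:

```python
import asyncio
import sqlite3

MAX_CONCURRENT = 10
sem = asyncio.Semaphore(MAX_CONCURRENT)   # bounds in-flight runs

def claim_run(conn: sqlite3.Connection, run_id: int) -> bool:
    """Atomically claim a pending run; exactly one claimer wins."""
    cur = conn.execute(
        "UPDATE runs SET status = 'running' WHERE id = ? AND status = 'pending'",
        (run_id,),
    )
    conn.commit()
    return cur.rowcount == 1

async def execute_run(conn, run_id: int):
    if not claim_run(conn, run_id):
        return                  # another scheduler cycle already claimed it
    async with sem:             # at most MAX_CONCURRENT runs at once
        ...                     # provider execution would happen here

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO runs (status) VALUES ('pending')")

first = claim_run(conn, 1)    # wins the claim
second = claim_run(conn, 1)   # loses: the row is no longer pending
```

The `rowcount` check is what makes the claim atomic: the database, not application code, decides which caller flipped the row.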
Auto-retry is configurable per job. When a run fails and the job has retries configured, the system automatically creates a new pending run with an incremented retry count. The original error is preserved. Notifications distinguish between retried failures (informational) and final failures (actionable), so operators are not overwhelmed by transient errors that resolve on retry.
What We Learned
Provider abstraction was the hardest part. Claude's Agent SDK, OpenAI's chat completions API, and Ollama's local API all have different approaches to tool calling, streaming, and error handling. Building a clean abstraction that handles all three without leaking provider-specific behavior into the rest of the system took more iteration than any other component.
Local models are surprisingly capable for structured tasks. Ollama running Llama 3.1 or Mistral handles tool-calling jobs well when the tools are well-defined and the task is structured. For free-form reasoning and complex multi-step tasks, the commercial APIs still have a clear edge. But for scheduled jobs that follow a predictable pattern — fetch data, transform, send notification — local models work well. And beyond cost, running models locally means sensitive data never leaves your infrastructure. For organizations with strict data residency or confidentiality requirements, this is often the deciding factor.
Workspace design is a product decision, not a technical one. The technical implementation of workspaces is straightforward. Deciding how to partition workspaces — by team, by project, by risk level, by client — is a product and organizational question that depends entirely on the deployment context.
Cost tracking changes behavior. Once you can see exactly what each job costs per run, you start optimizing. Switching a daily report job from Claude Opus to Haiku reduced costs by 90% with no quality impact. Moving a data transformation from an LLM job to a Python script eliminated the API cost entirely. Visibility drives efficiency.
Governance is the real differentiator. The technical capabilities — multi-provider support, tool calling, scheduling — are table stakes. What makes a self-hosted platform viable for enterprise use is the governance layer: who can access what, what happened when, what it cost, and whether a human was in the loop for sensitive decisions. Without that, it is just another AI tool that IT and compliance cannot approve.
Open Source
The runner remains open source under the MIT license: SFLOW-AIRunner-MCP-PRD on GitHub.
The repository includes deployment configurations, systemd service files, and documentation for setting up the complete platform.
