From Single-Model to Provider-Agnostic: How the Claude Runner Became a Self-Hosted AI Agent Engine
In a previous article, we introduced the Claude Runner — an open-source scheduling platform that executes AI tasks on a cron schedule, responds to webhooks, and builds custom MCP tools from natural language. That version worked, and it proved the concept: AI can move from reactive chat to persistent, goal-directed automation.
But it was Claude-only, single-user, and had a flat tool surface where every job could access every tool. For personal use and simple automation, that was fine. For anything resembling production use — multiple users, sensitive data, cost controls, local model requirements — it needed fundamental changes.
This article covers what changed. Everything here is new material; nothing overlaps with the previous post.
What Changed
The runner evolved from a Claude-specific scheduler into a provider-agnostic agent engine. The core scheduling and webhook infrastructure remains, but the execution layer was rebuilt around four principles:
- Any AI provider can execute jobs, not just Claude
- Tools are scoped to workspaces, not globally available
- Users interact through their own workspace-scoped chat, not the admin interface
- Every API call, tool invocation, and token spend is logged and queryable
The codebase was reorganized into a modular sflow/ package with clear separation between providers, execution, API routes, and MCP tooling.
Interactive Notifications — Human-in-the-Loop
One of the most compelling additions for business use is the ask_user mechanism. During job execution, the AI can pause and ask the user a question.
When a provider encounters a situation that requires human judgment — an ambiguous instruction, a confirmation before a destructive action, a choice between alternatives — it invokes the ask_user tool. This creates an interactive notification in the user's dashboard with the question, answer type (yes/no, multiple choice, or free text), and options.
The job execution pauses. The user sees the notification, provides their answer, and the job resumes with the answer injected into the conversation context.
Human-in-the-Loop by Design
This bridges the gap between full autonomy and full manual control. The AI handles the routine parts of a task and escalates to a human when it encounters genuine ambiguity. The notification system supports priorities (low, medium, high) and tracks whether each notification has been read and answered.
In practice, this enables workflows like: "Process all incoming invoices. If any amount exceeds 10,000 EUR, ask me before approving." The AI processes the routine cases autonomously and escalates the exceptions.
Multi-Provider Architecture
The most visible change is that the runner no longer depends on Claude. Four providers are supported, each implementing a common interface:
- Claude: complex reasoning and multi-step tool use via Anthropic's Agent SDK. The strongest option for tasks that require nuanced judgment and extended tool-calling chains.
- OpenAI: GPT-4o and related models via the chat completions API.
- Ollama: local models such as Llama 3.1 and Mistral, with zero API cost and no data leaving your infrastructure.
- Python script: direct script execution with workspace tool access, no LLM involved.
Every provider implements a shared abstract base class (BaseProvider) that defines a contract for job execution, pricing, availability checks, and optional chat support. The scheduler, run processor, and chat interface do not know or care which provider is behind a given job — they interact exclusively through this contract.
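A minimal sketch of what such a contract might look like, assuming method names (`execute_job`, `estimate_cost`, `is_available`, `supports_chat`) that are illustrative rather than the actual sflow API:

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Shared provider contract (illustrative names, not the sflow API)."""
    name: str = "base"

    @abstractmethod
    def execute_job(self, prompt: str, tools: list[dict]) -> dict:
        """Run a job and return its result payload."""

    @abstractmethod
    def estimate_cost(self, input_tokens: int, output_tokens: int) -> float:
        """Return the estimated cost for a given token count."""

    @abstractmethod
    def is_available(self) -> bool:
        """Check whether the SDK and credentials are present."""

    def supports_chat(self) -> bool:
        """Optional capability; providers override this when they support chat."""
        return False

class EchoProvider(BaseProvider):
    """Trivial concrete provider, included only to show the contract in use."""
    name = "echo"

    def execute_job(self, prompt, tools):
        return {"output": prompt,
                "tokens": {"input": len(prompt), "output": len(prompt)}}

    def estimate_cost(self, input_tokens, output_tokens):
        return 0.0

    def is_available(self):
        return True
```

Because callers depend only on the abstract surface, swapping Claude for Ollama (or a dummy provider in tests) requires no changes to the scheduler or chat code.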
Plug-and-Play Providers
The provider registry checks at startup which SDKs are installed. If the OpenAI package is not present, the OpenAI provider is simply not registered — no errors, no configuration required. This makes the runner deployable in environments where only local models are available or where only one commercial API is authorized.
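The startup check can be done without importing anything, by probing whether each SDK is installed. The package names below are the real PyPI distributions; the registry structure itself is an assumption:

```python
import importlib.util

# Map each provider to the SDK package it needs (None = no SDK required).
PROVIDER_SDKS = {
    "claude": "anthropic",
    "openai": "openai",
    "ollama": "ollama",
    "python-script": None,
}

def build_registry() -> dict[str, bool]:
    """Register only the providers whose SDK is importable in this environment."""
    registry = {}
    for provider, sdk in PROVIDER_SDKS.items():
        if sdk is None or importlib.util.find_spec(sdk) is not None:
            registry[provider] = True   # a real registry would store an instance
    return registry

registry = build_registry()
# "python-script" is always present; the others depend on installed SDKs.
```

`find_spec` only checks importability, so a missing SDK produces a silently absent provider rather than an import error at startup.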
MCP tools and OpenAI-style function calling use different schemas — the runner converts between them at execution time, handling format differences and wiring tool invocations back through the MCP callable registry transparently.
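The core of that conversion is re-nesting the schema: an MCP tool carries its JSON Schema under `inputSchema`, while OpenAI-style function calling expects it under `function.parameters`. The wrapper shape below follows those two formats; the surrounding wiring in the runner is not shown:

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Convert an MCP tool definition to an OpenAI-style function tool."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP's inputSchema is already JSON Schema, which is exactly
            # what OpenAI expects as the function's parameters.
            "parameters": tool.get("inputSchema",
                                   {"type": "object", "properties": {}}),
        },
    }

mcp_tool = {
    "name": "get_order_status",
    "description": "Look up an order by id",
    "inputSchema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}
openai_tool = mcp_tool_to_openai(mcp_tool)
```

The reverse direction (routing the model's tool-call arguments back to the MCP callable) is the part the run processor handles transparently.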
The Python Script Provider
Not every job needs an LLM. The Python script provider executes scripts directly — either inline code from the job prompt or a file path — with full access to workspace MCP tools via an auto-generated sflow_tools module. Scripts can list available tools and call them directly, bridging the gap between "we need AI reasoning" and "we just need to run a script that calls some tools." Zero API cost, instant execution.
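A hypothetical sketch of what such a script might look like. The `sflow_tools` module is auto-generated per workspace, so the function names assumed below (`list_tools`, `call_tool`) are guesses at its shape, and a stub stands in for the generated module to keep the sketch runnable:

```python
# In a real script-provider job, something like this would be available:
# import sflow_tools

def run(tools) -> str:
    """Call a workspace tool directly, with no LLM in the loop."""
    available = tools.list_tools()
    if "get_gateway_status" not in available:
        return "gateway tool not enabled in this workspace"
    status = tools.call_tool("get_gateway_status", {})
    return f"gateway: {status}"

class FakeTools:
    """Stub standing in for the auto-generated sflow_tools module."""
    def list_tools(self):
        return ["get_gateway_status", "send_email"]
    def call_tool(self, name, args):
        return "RUNNING"

print(run(FakeTools()))   # gateway: RUNNING
```

The script only ever sees the tools enabled in its workspace, so the same governance model applies whether a job is driven by an LLM or by plain Python.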
Workspace Isolation and Tool Governance
In the original runner, every job had access to every registered MCP server. This is a non-starter for multi-user or multi-project deployments where different contexts need different tool surfaces — and where some tools should be restricted.
Workspaces
A workspace is a named isolation boundary. Each workspace defines:
- Which MCP servers are enabled
- Which specific tools from those servers are available
- What access level each tool has
- A default system prompt that is prepended to all jobs in the workspace
- A default AI provider for all jobs and conversations in the workspace
- Credentials isolated through server enablement — only enabled servers' credentials are available during execution
Jobs and user chats are associated with a workspace. When a job executes, it only sees the tools enabled in its workspace — nothing else exists from its perspective.
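The workspace record and the filtered tool view can be sketched as follows. Field names here are assumptions that mirror the bullet list above, not the runner's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Workspace:
    """Named isolation boundary (illustrative fields)."""
    name: str
    enabled_servers: set[str]
    enabled_tools: dict[str, str]      # tool name -> access level
    system_prompt: str = ""
    default_provider: str = "claude"

def visible_tools(ws: Workspace, all_tools: dict[str, str]) -> dict[str, str]:
    """A job only sees tools that are both registered and workspace-enabled."""
    return {t: lvl for t, lvl in ws.enabled_tools.items() if t in all_tools}

monitoring = Workspace(
    name="Production Monitoring",
    enabled_servers={"ignition-mcp", "email"},
    enabled_tools={"get_gateway_status": "read",
                   "list_devices": "read",
                   "send_email": "write"},
)

# Global registry: tool name -> owning MCP server.
registry = {"get_gateway_status": "ignition-mcp",
            "list_devices": "ignition-mcp",
            "send_email": "email",
            "delete_database": "ignition-mcp"}

# delete_database exists on the server but is invisible to this workspace.
tools = visible_tools(monitoring, registry)
```

From the job's perspective there is no "disabled tool" state: anything outside the workspace simply does not exist.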
Tool Access Levels
Every tool in a workspace is classified into one of four access levels:
| Level | Description | Example |
|---|---|---|
| read | Query data, no side effects | list_customers, get_order_status |
| write | Create or modify data | create_invoice, update_record |
| admin | System administration | manage_users, configure_server |
| dangerous | Destructive or irreversible | delete_database, send_bulk_email |
Access levels are assigned per tool, per workspace. The same MCP server might expose all its tools in an admin workspace but only read-level tools in a production read-only workspace.
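One plausible enforcement of these levels is an ordered comparison at invocation time: a workspace caps the maximum level it permits, and any tool classified above that cap is refused. The ordering comes from the table above; the guard itself is an illustrative assumption, not the runner's documented mechanism:

```python
# Access levels in ascending order of risk, per the table above.
LEVELS = ["read", "write", "admin", "dangerous"]

def allowed(tool_level: str, workspace_max: str) -> bool:
    """Permit a call only if the tool's level is within the workspace cap."""
    return LEVELS.index(tool_level) <= LEVELS.index(workspace_max)

can_read = allowed("read", "write")        # a read tool in a write workspace
can_nuke = allowed("dangerous", "admin")   # a dangerous tool in an admin workspace
```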
Why This Matters for Governance
This is a practical answer to a real enterprise concern: "How do we let our team use AI tools without worrying that a misconfigured job will delete production data?" You configure a workspace with only the tools and access levels appropriate for that context. The AI cannot access what it cannot see.
Enterprise Example
Consider a manufacturing company running the runner with the Ignition CLI connected as an MCP tool — alongside CRM and email:
Workspace: "Production Monitoring"
MCP Servers: ignition-mcp (read only), email
Tools: get_gateway_status, list_devices, send_email
Access: read, read, write
Workspace: "Admin Operations"
MCP Servers: ignition-mcp (full), crm, email
Tools: all tools enabled
Access: read, write, admin as appropriate
Workspace: "Report Generation"
MCP Servers: crm (read only), file-system (read only)
Tools: lookup_customer, search_contacts, read_file
Access: read, read, read
Different teams get different workspaces. The same underlying MCP servers are shared, but the tool surface is shaped to each team's needs and risk profile.
Admin Chat and System Configuration
The runner has two distinct chat interfaces, each with a different tool surface.
Admin chat has access to system management tools — creating jobs, managing webhooks, configuring MCP servers, viewing system status. This is the interface for the person operating the runner. Critically, administrators can configure and create new connections to external systems directly through the admin chat — or through any MCP-capable AI client connected to the runner. This means onboarding a new data source or tool server is a conversational action, not a manual configuration task.
User chat is workspace-scoped. A user selects a workspace and starts a conversation. They only see the tools enabled in that workspace, and every interaction is logged against their user identity. The conversation persists across sessions, with full token tracking and cost attribution.
The user chat supports multi-turn conversations with provider selection — a user can choose to chat with Claude, GPT-4o, or a local Ollama model, depending on the workspace configuration and their preference. Provider selection can be configured per workspace (default provider) or overridden per conversation.
Governance, Auditability, and Cost Tracking
For enterprise deployments, knowing what happened, who triggered it, what it cost, and what it touched is not optional — it is a requirement. The runner treats observability and governance as first-class features, not afterthoughts.
Full Audit Trail
Every API call and every tool invocation is logged separately, creating a complete audit trail that can be queried, exported, and reviewed.
API logs capture the provider, model, token counts (input and output), cost, latency, and a source reference that links back to the specific run or conversation that generated the call.
Tool logs capture which MCP server and tool were called, the arguments, result (truncated for storage), error messages, and execution duration. This is essential for debugging failed runs, understanding where time is spent, and demonstrating compliance with data access policies.
Source Tracking
Every log entry is tagged with a source_type (run, admin-chat, user-chat) and a source_id. This makes it straightforward to reconstruct the complete trace of any job or conversation — which tools were invoked, in what order, with what arguments, and at what cost. For regulated industries, this provides the auditability that generic AI chat tools lack.
Both log tables support auto-pruning with configurable retention (default 30 days), preventing unbounded storage growth while maintaining a rolling audit window.
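The rolling window can be implemented as a single timestamp-bounded delete. The sketch below assumes a SQLite log table with a `created_at` ISO-8601 column; table and column names are illustrative:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

def prune_logs(conn: sqlite3.Connection, table: str,
               retention_days: int = 30) -> int:
    """Delete log rows older than the retention window; return the count."""
    cutoff = (datetime.now(timezone.utc)
              - timedelta(days=retention_days)).isoformat()
    # ISO-8601 timestamps in a consistent format compare lexicographically.
    cur = conn.execute(f"DELETE FROM {table} WHERE created_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE api_logs (id INTEGER PRIMARY KEY, created_at TEXT)")
old = (datetime.now(timezone.utc) - timedelta(days=45)).isoformat()
new = datetime.now(timezone.utc).isoformat()
conn.executemany("INSERT INTO api_logs (created_at) VALUES (?)", [(old,), (new,)])

deleted = prune_logs(conn, "api_logs")   # removes only the 45-day-old row
```

Running this on a schedule keeps storage bounded while the most recent 30 days stay queryable.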
Cost Visibility
The dashboard surfaces logs as trends — token consumption over time, cost per workspace, tool error rates, and provider utilization. This gives operators and finance teams the visibility they need to manage AI spend and identify issues before they compound. A typical workspace running 5 daily scheduled jobs costs less than €5/month in API calls — and switching a daily report from Claude Opus to Haiku can reduce that by 90%.
Access Control Summary
The governance model spans multiple layers:
| Layer | Mechanism |
|---|---|
| Authentication | OAuth 2.1 with PKCE for MCP clients, JWT for dashboard and user API |
| Workspace isolation | Tools, servers, and credentials scoped per workspace |
| Tool classification | Four access levels (read, write, admin, dangerous) per tool per workspace |
| Credential isolation | Only enabled servers' credentials available during execution, guarded by threading locks |
| Rate limiting | Sliding-window rate limits on webhook endpoints |
| Input validation | Enforced limits on prompt length, payload size, and messages per conversation |
| Audit logging | Every API call and tool invocation logged with source tracking |
| Human-in-the-loop | Interactive notifications for high-stakes decisions |
This Is What Sets Self-Hosted Apart
When AI runs on infrastructure you control, with tools scoped to what each team needs, full audit trails, and human escalation for sensitive decisions — you get the productivity benefits of AI agents without the governance risks of sending data to uncontrolled third-party services.
Reliability
Graceful shutdown ensures that running jobs complete or are cleanly terminated when the server stops, preventing orphaned processes and data corruption.
Concurrent execution is bounded by a configurable semaphore (default 10 concurrent runs), and each run is atomically claimed to prevent duplicate execution across scheduler cycles.
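The combination of a semaphore and an atomic claim can be sketched as follows. The claim is a conditional `UPDATE` that only succeeds if the row is still pending, so two scheduler cycles can never both execute the same run. The schema and function names are illustrative:

```python
import asyncio
import sqlite3

MAX_CONCURRENT = 10
sem = asyncio.Semaphore(MAX_CONCURRENT)   # bounds in-flight runs

def claim_run(conn: sqlite3.Connection, run_id: int) -> bool:
    """Atomically claim a pending run; exactly one claimer wins."""
    cur = conn.execute(
        "UPDATE runs SET status = 'running' WHERE id = ? AND status = 'pending'",
        (run_id,),
    )
    conn.commit()
    return cur.rowcount == 1

async def execute_run(conn, run_id: int):
    if not claim_run(conn, run_id):
        return                  # another scheduler cycle already claimed it
    async with sem:             # at most MAX_CONCURRENT runs at once
        ...                     # provider execution would happen here

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO runs (status) VALUES ('pending')")

first = claim_run(conn, 1)    # wins the claim
second = claim_run(conn, 1)   # loses: the row is no longer pending
```

The `rowcount` check is what makes the claim atomic: the database, not application code, decides which caller flipped the row.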
Auto-retry is configurable per job. When a run fails and the job has retries configured, the system automatically creates a new pending run with an incremented retry count. The original error is preserved. Notifications distinguish between retried failures (informational) and final failures (actionable), so operators are not overwhelmed by transient errors that resolve on retry.
What We Learned
Provider abstraction was the hardest part. Claude's Agent SDK, OpenAI's chat completions API, and Ollama's local API all have different approaches to tool calling, streaming, and error handling. Building a clean abstraction that handles all three without leaking provider-specific behavior into the rest of the system took more iteration than any other component.
Local models are surprisingly capable for structured tasks. Ollama running Llama 3.1 or Mistral handles tool-calling jobs well when the tools are well-defined and the task is structured. For free-form reasoning and complex multi-step tasks, the commercial APIs still have a clear edge. But for scheduled jobs that follow a predictable pattern — fetch data, transform, send notification — local models work well. And beyond cost, running models locally means sensitive data never leaves your infrastructure. For organizations with strict data residency or confidentiality requirements, this is often the deciding factor.
Workspace design is a product decision, not a technical one. The technical implementation of workspaces is straightforward. Deciding how to partition workspaces — by team, by project, by risk level, by client — is a product and organizational question that depends entirely on the deployment context.
Cost tracking changes behavior. Once you can see exactly what each job costs per run, you start optimizing. Switching a daily report job from Claude Opus to Haiku reduced costs by 90% with no quality impact. Moving a data transformation from an LLM job to a Python script eliminated the API cost entirely. Visibility drives efficiency.
Governance is the real differentiator. The technical capabilities — multi-provider support, tool calling, scheduling — are table stakes. What makes a self-hosted platform viable for enterprise use is the governance layer: who can access what, what happened when, what it cost, and whether a human was in the loop for sensitive decisions. Without that, it is just another AI tool that IT and compliance cannot approve.
Open Source
The runner remains open source under the MIT license: SFLOW-AIRunner-MCP-PRD on GitHub.
The repository includes deployment configurations, systemd service files, and documentation for setting up the complete platform.
