Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

Running AI agents in a local script is straightforward. Running them reliably in production across teams, across restarts, with isolated environments per context is a different problem entirely. BerriAI, the company behind the LiteLLM AI Gateway, is now open-sourcing a purpose-built answer to that problem: the LiteLLM Agent Platform. The platform is described as a simple, self-hosted infrastructure platform for running multiple agents in production.

What Problem Does it Solve?

It helps to understand what happens when you try to scale agents beyond a single process. Agents are stateful: they carry session history, tool call results, and intermediate reasoning across turns. If the container running your agent crashes, restarts, or gets replaced during a deployment, that session state is gone unless something is explicitly managing it. At the same time, different teams often need different runtime environments, different tools, different secrets, different access scopes which means you cannot throw all agents into one shared container.

The platform manages two things: per-team and per-context sandboxes, and session continuity across pod restarts and upgrades. These two capabilities are the core infrastructure primitives the platform provides.

Architecture and Technical Stack

The platform is a standalone Next.js dashboard for LiteLLM v2 managed agents, covering sessions chat, agent CRUD, and live status. The codebase is primarily TypeScript (92.8%), with Shell scripts for provisioning, a Dockerfile for containerization, and CSS for the dashboard UI.

The architecture separates concerns cleanly. A web process runs on port 3000 and serves the Next.js dashboard. A worker process handles async agent tasks. Postgres is used as the persistent backing store, and a schema migration runs as an init container on startup — so the database is always in the correct state before the application boots.

For the sandbox layer — the isolated runtime environment where agents actually execute — sandboxes run on Kubernetes via the kubernetes-sigs/agent-sandbox CRD. Local development uses kind. If you are not already familiar with it: kind (Kubernetes in Docker) lets you spin up a full Kubernetes cluster locally using Docker containers as nodes, without needing a cloud provider. The agent-sandbox CRD (Custom Resource Definition) is a Kubernetes extension from kubernetes-sigs that the platform installs to manage the lifecycle of individual sandbox environments.

The platform also includes a harness system under harnesses/opencode, which contains the configuration for running coding agents — such as Claude Code or OpenAI Codex — inside isolated sandboxes with a vault proxy for credential management. BerriAI team also maintains a separate litellm-agent-runtime repository, described as a coding-agent runtime that runs inside per-session VMs provisioned by a LiteLLM proxy, generic by design, with customization happening via harness configuration or a hydrate payload.

One practical detail worth noting is how environment variables are handled across sandbox containers. Anything in .env prefixed with CONTAINER_ENV_ is injected into every sandbox container with the prefix stripped — for example, CONTAINER_ENV_GITHUB_TOKEN=ghp_... means the container sees GITHUB_TOKEN=ghp_... This gives teams a clean way to pass secrets into sandboxed agent sessions without modifying container images.

https://github.com/BerriAI/litellm-agent-platform

Getting Started

The prerequisites for local development are Docker Desktop, kind, kubectl, helm, and a LiteLLM gateway. No cloud credentials are required to get started locally. The quickstart is two commands:

Copy CodeCopiedUse a different Browser

bin/kind-up.sh
docker compose up

bin/kind-up.sh is idempotent — it provisions a kind cluster named agent-sbx, installs the agent-sandbox controller, and loads the harness image. docker compose up boots Postgres, runs the schema migration, and starts the web process on port 3000 along with the worker.

For production deployment, the recommended path is AWS EKS for the sandbox cluster and Render for the web and worker processes. bin/eks-up.sh provisions the EKS cluster, and a Render Blueprint provides a one-click deployment option.

Relationship to the LiteLLM Gateway

The Agent Platform is a layer on top of the existing LiteLLM ecosystem, not a replacement for it. LiteLLM’s core is a Python SDK and Proxy Server — an AI Gateway — that calls 100+ LLM APIs in OpenAI format, with cost tracking, guardrails, load balancing, and logging, supporting providers including Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, SageMaker, HuggingFace, vLLM, and NVIDIA NIM. The Agent Platform consumes a running LiteLLM gateway as a dependency and builds agent orchestration and session management infrastructure on top of it. Model routing, cost tracking, and rate limiting remain in the gateway layer. Sandbox isolation, session continuity, and the management dashboard are handled by the Agent Platform.

Marktechpost’s Visual Explainer

LiteLLM Agent Platform

Self-Hosted Agent Infrastructure Guide

Alpha

Overview

Concepts

Architecture

Prerequisites

Quickstart

Production

01 / 06

What is LiteLLM Agent Platform?

BerriAI open-sourced this platform on May 8, 2026. It is a self-hosted infrastructure layer for running multiple AI agents in production, built on top of the LiteLLM AI Gateway.

Self-Hosted

Runs entirely on your own infrastructure. No data leaves your environment. Suited for regulated industries and teams with data residency requirements.

Multi-Agent

Designed to run multiple agents in parallel, with full isolation between teams and contexts using per-session sandboxes.

Session Continuity

Agent sessions persist across pod restarts and upgrades, so stateful work is not lost when containers are replaced.

Open Source (MIT)

Fully open source under the MIT license. Repo: github.com/BerriAI/litellm-agent-platform. File issues and contribute directly.

Prerequisite Knowledge

This guide assumes familiarity with Docker, basic command-line usage, and a general understanding of what an AI agent is (a model that calls tools and runs multi-step tasks). Kubernetes experience helps but is not required to follow along.

02 / 06

Key Concepts to Know First

Before running the platform, understand these four building blocks. They appear throughout the setup and configuration.

A

LiteLLM Gateway

The underlying AI Gateway that the Agent Platform depends on. It routes requests to 100+ LLM providers (OpenAI, Anthropic, Bedrock, VertexAI, etc.) using a unified OpenAI-format API. The Agent Platform does not include the gateway, you must have one running separately and point the platform at it.

B

Sandbox

An isolated container environment where a single agent session executes. Each sandbox is independent, meaning one agent cannot access the filesystem, secrets, or state of another. Sandboxes are provisioned and torn down per session using the kubernetes-sigs/agent-sandbox CRD (Custom Resource Definition).

C

Harness

A configuration layer that defines how a specific type of coding agent (such as Claude Code or OpenAI Codex) runs inside a sandbox. The platform ships with an opencode harness under harnesses/opencode/. The harness image is loaded into the kind cluster during setup.

D

CRD (Custom Resource Definition)

A Kubernetes extension that lets you define new resource types. The platform uses the kubernetes-sigs/agent-sandbox CRD to teach your Kubernetes cluster how to manage agent sandboxes as first-class resources, the same way it manages pods or deployments.

03 / 06

How the Platform Is Structured

The platform has four main components. Understanding how they connect helps when debugging or deploying to production.

Component	What It Does	Tech
web (:3000)	Next.js dashboard. Provides the UI for sessions chat, agent CRUD operations, and live status monitoring.	Next.js, TypeScript
worker	Background process that handles async agent tasks, decoupled from the web server.	TypeScript
postgres	Persistent backing store for session state, agent configs, and metadata. Schema migration runs automatically as an init container on startup.	PostgreSQL
sandbox cluster	Kubernetes cluster where individual agent sandboxes run, managed via the agent-sandbox CRD controller. Locally: kind. In production: AWS EKS.	Kubernetes (kind / EKS)

Separation of Concerns

The LiteLLM gateway handles model routing, cost tracking, rate limiting, and guardrails. The Agent Platform handles sandbox lifecycle, session management, and the management dashboard. They run as separate services and the Agent Platform consumes the gateway as a dependency.

04 / 06

Prerequisites Before You Start

Install and verify these tools before running any setup commands. The quickstart will not work without all five.

1
Docker Desktop

Required to build and run containers, and to power kind (which runs Kubernetes nodes as Docker containers). Download from docker.com/products/docker-desktop. Verify with:
```
docker --version
```

2
kind (Kubernetes in Docker)

Used to provision a local Kubernetes cluster for running sandboxes. Install via Homebrew on macOS (brew install kind) or from kind.sigs.k8s.io. Verify with:
```
kind --version
```

3
kubectl

The Kubernetes command-line tool. Used by the setup scripts to interact with the kind cluster. Install from kubernetes.io/docs/tasks/tools. Verify with:
```
kubectl version --client
```

4
helm

The Kubernetes package manager. Used to install the agent-sandbox controller into the kind cluster. Install from helm.sh/docs/intro/install. Verify with:
```
helm version
```

5

A Running LiteLLM Gateway

The Agent Platform requires a LiteLLM gateway URL to route model calls. If you do not have one running, start with the official LiteLLM quickstart at docs.litellm.ai. You will point the Agent Platform at this URL during configuration.

05 / 06

Local Quickstart

Clone the repo and run two commands to get the full platform running locally. No cloud credentials needed for local development.

Clone the repository

Pull the repo from GitHub:

git clone https://github.com/BerriAI/litellm-agent-platform
cd litellm-agent-platform

2
Configure your .env file

Copy the example env file and fill in your LiteLLM gateway URL and any secrets:
```
cp .env.example .env
# Edit .env and set your LITELLM_GATEWAY_URL and other required values
```

3
Provision the local kind cluster

This script is idempotent, meaning safe to run multiple times. It provisions a kind cluster named agent-sbx, installs the agent-sandbox controller via helm, and loads the harness image:
```
bin/kind-up.sh
```

4
Start all services

Boots Postgres, runs the schema migration as an init container, and starts the web server on port 3000 and the worker process:
```
docker compose up
```

5

Open the dashboard

Navigate to http://localhost:3000 in your browser. You should see the LiteLLM Agent Platform dashboard with options to create agents, open sessions, and monitor live status.

Passing Secrets into Sandboxes

Any variable in .env prefixed with CONTAINER_ENV_ is automatically injected into every sandbox container with the prefix stripped. Example: CONTAINER_ENV_GITHUB_TOKEN=ghp_… means the sandbox sees GITHUB_TOKEN=ghp_… This is the correct way to pass credentials into agent sessions.

06 / 06

Production Deployment

The recommended production setup separates the sandbox cluster (AWS EKS) from the web and worker processes (Render). The repo ships scripts and a Blueprint for both.

1
Provision the EKS sandbox cluster

The bin/eks-up.sh script provisions an AWS EKS cluster configured to run agent sandboxes. This replaces kind as the sandbox backend. Requires AWS credentials in your environment:
```
bin/eks-up.sh
```

2

Deploy web and worker to Render

The repo includes a Render Blueprint under deploy/render/ that deploys the web and worker services to Render with one click. See deploy/render/README.md for the Blueprint URL and required environment variables.

3
Use the Developer API directly (optional)

You can interact with the platform programmatically via its REST API using curl or any HTTP client. The full API reference covering how to create an agent, open a session, send a message, and read the reply is at src/server/DEVELOPER.md in the repo.
```
# Example: create an agent session via curl
curl -X POST http://localhost:3000/api/sessions 
  -H "Content-Type: application/json" 
  -d '{"agent_id": "your-agent-id"}'
```

Architecture Summary for Production

AWS EKS runs the sandbox cluster where agent sessions execute in isolation. Render hosts the Next.js web dashboard and the async worker. Postgres (managed or self-hosted) persists session state. The LiteLLM gateway runs separately and handles all model API routing. These four components communicate over the network and can be scaled independently.

Platform is currently in alpha public preview. File issues at github.com/BerriAI/litellm-agent-platform. Architecture details at docs/k8s-backend.md in the repo.

1 / 6

Published by Marktechpost | AI/ML News and Research for Developers and Engineers

Key Takeaways

BerriAI open-sourced LiteLLM Agent Platform, a self-hosted infrastructure layer for running multiple AI agents in production with per-team sandbox isolation and session continuity across pod restarts.
Sandboxes run on Kubernetes via the kubernetes-sigs/agent-sandbox CRD — locally with kind, in production with AWS EKS — no cloud credentials needed to get started.
The platform sits on top of the existing LiteLLM Gateway, which handles model routing, cost tracking, and rate limiting across 100+ LLM providers in OpenAI format.
The quickstart is two commands: bin/kind-up.sh provisions the kind cluster and installs the sandbox controller; docker compose up boots Postgres, web (:3000), and worker.
Released under MIT license and currently in alpha public preview

Check out the GitHub Repo. Also, feel free to follow us on Twitter and don’t forget to join our 150k+ ML SubReddit and Subscribe to our Newsletter. Wait! are you on telegram? now you can join us on telegram as well.

Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.? Connect with us

The post Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production appeared first on MarkTechPost.

AI日报汇

AI日报汇

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

What Problem Does it Solve?

Architecture and Technical Stack

Getting Started

Relationship to the LiteLLM Gateway

Marktechpost’s Visual Explainer

Key Takeaways

admin

Related Posts

Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

发表回复取消回复

Other Story

Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

How to Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp

MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent

How we used Gemini to build Google I/O 2026

AI日报汇

AI日报汇

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

Meet LiteLLM Agent Platform: A Kubernetes-Based, Self-Hosted Infrastructure Layer for Isolated Agent Sandboxes and Persistent Session Management in Production

What Problem Does it Solve?

Architecture and Technical Stack

Getting Started

Relationship to the LiteLLM Gateway

Marktechpost’s Visual Explainer

Key Takeaways

admin

Related Posts

Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

发表回复 取消回复

Other Story

Alibaba’s Qwen Team Launches Qwen3.7-Plus, Adding Vision, Deep Reasoning, Tool Invocation, and Autonomous Iteration on the Bailian Platform

JetBrains Releases Mellum2: A 12B MoE Model for Fast, Specialized Tasks in Multi-Model AI Pipelines

How to Speed Up Transformer Training Using NVIDIA Apex (FusedAdam, FusedLayerNorm) and Native torch.amp

MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

Meet Memory OS: A 6-Layer Open-Source Memory Stack Built on Top of Hermes Agent

How we used Gemini to build Google I/O 2026

发表回复取消回复