DeerFlow Backend
DeerFlow is a LangGraph-based AI super agent with sandbox execution, persistent memory, and extensible tool integration. The backend enables AI agents to execute code, browse the web, manage files, delegate tasks to subagents, and retain context across conversations - all in isolated, per-thread environments.
Architecture
┌──────────────────────────────────────┐
│          Nginx (Port 2026)           │
│        Unified reverse proxy         │
└───────┬──────────────────┬───────────┘
        │                  │
        │ /api/langgraph/* │ /api/* (other)
        │ rewritten to     │
        │ /api/*           │
        ▼                  ▼
┌────────────────────────────────────────┐
│           Gateway API (8001)           │
│      FastAPI REST + agent runtime      │
│                                        │
│ Models, MCP, Skills, Memory, Uploads,  │
│ Artifacts, Threads, Runs, Streaming    │
│                                        │
│ ┌────────────────────────────────────┐ │
│ │             Lead Agent             │ │
│ │ Middleware Chain, Tools, Subagents │ │
│ └────────────────────────────────────┘ │
└────────────────────────────────────────┘
Request Routing (via Nginx):
- /api/langgraph/* → Gateway LangGraph-compatible API - agent interactions, threads, streaming
- /api/* (other) → Gateway API - models, MCP, skills, memory, artifacts, uploads, thread-local cleanup
- / (non-API) → Frontend - Next.js web interface
Core Components
Lead Agent
The single LangGraph agent (lead_agent) is the runtime entry point, created via make_lead_agent(config). It combines:
- Dynamic model selection with thinking and vision support
- Middleware chain for cross-cutting concerns (9 middlewares)
- Tool system with sandbox, MCP, community, and built-in tools
- Subagent delegation for parallel task execution
- System prompt with skills injection, memory context, and working directory guidance
Middleware Chain
Middlewares execute in strict order, each handling a specific concern:
| # | Middleware | Purpose |
|---|---|---|
| 1 | ThreadDataMiddleware | Creates per-thread isolated directories (workspace, uploads, outputs) |
| 2 | UploadsMiddleware | Injects newly uploaded files into conversation context |
| 3 | SandboxMiddleware | Acquires sandbox environment for code execution |
| 4 | SummarizationMiddleware | Reduces context when approaching token limits (optional) |
| 5 | TodoListMiddleware | Tracks multi-step tasks in plan mode (optional) |
| 6 | TitleMiddleware | Auto-generates conversation titles after first exchange |
| 7 | MemoryMiddleware | Queues conversations for async memory extraction |
| 8 | ViewImageMiddleware | Injects image data for vision-capable models (conditional) |
| 9 | ClarificationMiddleware | Intercepts clarification requests and interrupts execution (must be last) |
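A minimal sketch of why the order matters, assuming each middleware can be modelled as a plain function that takes and returns a state dict. The real middlewares are richer LangChain/LangGraph middleware classes; `run_chain` and the stand-in hooks below are hypothetical. Once the clarification step sets an interrupt flag nothing after it would run, which is why ClarificationMiddleware has to sit at the end:

```python
from collections.abc import Callable

# Hypothetical simplified types: state is a dict, a middleware is (name, hook).
State = dict[str, object]
Middleware = tuple[str, Callable[[State], State]]


def run_chain(chain: list[Middleware], state: State) -> tuple[State, list[str]]:
    """Run hooks in strict list order; return the final state and the names that ran."""
    names = [name for name, _ in chain]
    if "ClarificationMiddleware" in names and names[-1] != "ClarificationMiddleware":
        raise ValueError("ClarificationMiddleware must be the last middleware")
    ran: list[str] = []
    for name, hook in chain:
        state = hook(state)
        ran.append(name)
        if state.get("interrupted"):
            break  # an interrupt stops every middleware after this point
    return state, ran


def thread_data(state: State) -> State:
    # Stand-in for ThreadDataMiddleware: attach a per-thread directory.
    return {**state, "thread_dir": f"/threads/{state['thread_id']}"}


def title(state: State) -> State:
    # Stand-in for TitleMiddleware: set a title only once.
    return {**state, "title": state.get("title") or "Untitled"}


def clarification(state: State) -> State:
    # Stand-in for ClarificationMiddleware: interrupt when clarification is needed.
    return {**state, "interrupted": True} if state.get("needs_clarification") else state


chain = [
    ("ThreadDataMiddleware", thread_data),
    ("TitleMiddleware", title),
    ("ClarificationMiddleware", clarification),
]
final_state, ran = run_chain(chain, {"thread_id": "t1", "needs_clarification": True})
```

Validating the order up front turns a misconfigured chain into an immediate error instead of a silently skipped middleware.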
Sandbox System
Per-thread isolated execution with virtual path translation:
- Abstract interface: execute_command, read_file, write_file, list_dir
- Providers: LocalSandboxProvider (filesystem) and AioSandboxProvider (Docker, in community/). Async runtime paths use async sandbox lifecycle hooks so startup, readiness polling, and release do not block the event loop.
- Virtual paths: /mnt/user-data/{workspace,uploads,outputs} → thread-specific physical directories
- Skills path: /mnt/skills → deer-flow/skills/ directory
- Skills loading: Recursively discovers nested SKILL.md files under skills/{public,custom} and preserves nested container paths
- File-write safety: str_replace serializes read-modify-write per (sandbox.id, path) so isolated sandboxes keep concurrency even when virtual paths match
- Tools: bash, ls, read_file, write_file, str_replace (write_file overwrites by default and exposes append for end-of-file writes; bash is disabled by default when using LocalSandboxProvider; use AioSandboxProvider for isolated shell access)
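The two mechanisms above, virtual path translation and per-(sandbox.id, path) write serialization, can be sketched as follows. This is an illustration under stated assumptions, not DeerFlow's code: `to_physical`, `write_lock`, and the constants are hypothetical, and a plain `threading.Lock` stands in for whatever lock type the real async runtime uses:

```python
import threading
from pathlib import Path, PurePosixPath

# Hypothetical constants mirroring the virtual layout described above.
VIRTUAL_ROOT = PurePosixPath("/mnt/user-data")
ALLOWED_SUBDIRS = {"workspace", "uploads", "outputs"}


def to_physical(virtual_path: str, thread_root: Path) -> Path:
    """Map /mnt/user-data/<subdir>/... onto a thread-specific physical directory."""
    try:
        rel = PurePosixPath(virtual_path).relative_to(VIRTUAL_ROOT)
    except ValueError:
        raise ValueError(f"{virtual_path!r} is not under {VIRTUAL_ROOT}") from None
    if not rel.parts or rel.parts[0] not in ALLOWED_SUBDIRS:
        raise ValueError(f"{virtual_path!r} must start with one of {sorted(ALLOWED_SUBDIRS)}")
    if ".." in rel.parts:
        # Pure paths do not collapse "..", so reject it explicitly.
        raise ValueError(f"{virtual_path!r} must not contain '..'")
    return thread_root.joinpath(*rel.parts)


# One lock per (sandbox id, virtual path): the same path in two sandboxes never contends.
_locks: dict[tuple[str, str], threading.Lock] = {}
_locks_guard = threading.Lock()


def write_lock(sandbox_id: str, virtual_path: str) -> threading.Lock:
    """Return the lock that serializes read-modify-write for this sandbox and path."""
    with _locks_guard:
        return _locks.setdefault((sandbox_id, virtual_path), threading.Lock())
```

A str_replace-style edit would hold `write_lock(sandbox.id, path)` while it reads, replaces, and writes the file, so two edits to the same file in one sandbox cannot interleave.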
Subagent System
Async task delegation with concurrent execution:
- Built-in agents: general-purpose (full toolset) and bash (command specialist, exposed only when shell access is available)
- Concurrency: Max 3 subagents per turn, 15-minute timeout
- Execution: Background thread pools with status tracking and SSE events
- Flow: Agent calls task() tool → executor runs subagent in background → polls for completion → returns result
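The delegation flow can be sketched with a bounded thread pool and a polling loop. The limits come from the list above; the `task` signature, the status dict, and the poll interval are hypothetical, and SSE event emission is left out:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable

MAX_CONCURRENT_SUBAGENTS = 3  # "Max 3 subagents per turn"
SUBAGENT_TIMEOUT_S = 15 * 60  # "15-minute timeout"

# Hypothetical shared pool: at most three subagents run at the same time.
_pool = ThreadPoolExecutor(max_workers=MAX_CONCURRENT_SUBAGENTS)


def task(run_subagent: Callable[[str], Any], prompt: str,
         timeout_s: float = SUBAGENT_TIMEOUT_S, poll_interval_s: float = 0.05) -> dict[str, Any]:
    """Submit a subagent to the background pool and poll until it finishes or times out."""
    future = _pool.submit(run_subagent, prompt)
    deadline = time.monotonic() + timeout_s
    while not future.done():
        if time.monotonic() >= deadline:
            # cancel() only succeeds if the job has not started; a running thread keeps going.
            future.cancel()
            return {"status": "timed_out", "result": None}
        time.sleep(poll_interval_s)
    error = future.exception()
    if error is not None:
        return {"status": "failed", "result": str(error)}
    return {"status": "completed", "result": future.result()}
```

Python threads cannot be killed, so this timeout only stops the caller from waiting; a real executor would also need cooperative cancellation inside the subagent.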
Memory System
LLM-powered persistent context retention across conversations:
- Automatic extraction: Analyzes conversations for user context, facts, and preferences
- Structured storage: User context (work, personal, top-of-mind), history, and confidence-scored facts
- Debounced updates: Batches updates to minimize LLM calls (configurable wait time)
- System prompt injection: Top facts + context injected into agent prompts
- Storage: JSON file with mtime-based cache invalidation
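The mtime-based cache invalidation mentioned above can be sketched as a small wrapper that re-parses the JSON file only when its modification time changes. `MemoryFile` is a hypothetical name; the real storage layer and its schema may differ:

```python
import json
from pathlib import Path
from typing import Any


class MemoryFile:
    """JSON memory store that re-reads the file only when its mtime changes (hypothetical)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self._cached: dict[str, Any] | None = None
        self._mtime_ns: int | None = None

    def load(self) -> dict[str, Any]:
        mtime_ns = self.path.stat().st_mtime_ns
        if self._cached is None or mtime_ns != self._mtime_ns:
            # First load, or the file changed on disk: parse it again and remember the mtime.
            self._cached = json.loads(self.path.read_text(encoding="utf-8"))
            self._mtime_ns = mtime_ns
        return self._cached

    def save(self, data: dict[str, Any]) -> None:
        self.path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
        # Refresh the cache so our own write does not force a re-read.
        self._cached = data
        self._mtime_ns = self.path.stat().st_mtime_ns
```

Comparing `st_mtime_ns` keeps every `load()` down to one `stat()` call in the common case where nothing has changed.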
Tool Ecosystem
| Category | Tools |
|---|---|
| Sandbox | bash, ls, read_file, write_file, str_replace |
| Built-in | present_files, ask_clarification, view_image, task (subagent) |
| Community | Tavily (web search), Jina AI (web fetch), Firecrawl (scraping), DuckDuckGo (image search) |
| MCP | Any Model Context Protocol server (stdio, SSE, HTTP transports) |
| Skills | Domain-specific workflows injected via system prompt |
Gateway API
FastAPI application providing REST endpoints for frontend integration:
| Route | Purpose |
|---|---|
| GET /api/models | List available LLM models |
| GET/PUT /api/mcp/config | Manage MCP server configurations |
| GET/PUT /api/skills | List and manage skills |
| POST /api/skills/install | Install skill from .skill archive |
| GET /api/memory | Retrieve memory data |
| POST /api/memory/reload | Force memory reload |
| GET /api/memory/config | Memory configuration |
| GET /api/memory/status | Combined config + data |
| POST /api/threads/{id}/uploads | Upload files (auto-converts PDF/PPT/Excel/Word to Markdown, rejects directory paths, auto-renames duplicate filenames in one request) |
| GET /api/threads/{id}/uploads/list | List uploaded files |
| DELETE /api/threads/{id} | Delete DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
| GET /api/threads/{id}/artifacts/{path} | Serve generated artifacts |
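A minimal standard-library client sketch for the routes above. `build_request` and `list_models` are hypothetical helpers; the response body of GET /api/models is assumed to be JSON, and the artifact path format ("outputs/report.md") is only an example:

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

GATEWAY_URL = "http://localhost:8001"  # direct Gateway access; Nginx also serves /api/* on port 2026


def build_request(method: str, route: str, **path_params: str) -> Request:
    """Fill {placeholders} in a route from the table and build a urllib Request."""
    for name, value in path_params.items():
        # Keep "/" in artifact paths; escape everything else (spaces, "?", "#", ...).
        safe = "/" if name == "path" else ""
        route = route.replace("{" + name + "}", quote(value, safe=safe))
    return Request(GATEWAY_URL + route, method=method)


def list_models() -> object:
    # Assumption: the endpoint returns JSON; its exact schema is not documented here.
    with urlopen(build_request("GET", "/api/models"), timeout=10) as resp:
        return json.load(resp)
```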
IM Channels
The IM bridge supports Feishu, Slack, and Telegram. Slack and Telegram still use the final runs.wait() response path, while Feishu now streams through runs.stream(["messages-tuple", "values"]) and updates a single in-thread card in place.
For Feishu card updates, DeerFlow stores the running card's message_id per inbound message and patches that same card until the run finishes, preserving the existing OK / DONE reaction flow.
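The in-place card update pattern can be sketched independently of the Feishu SDK: remember the card's message_id for each inbound message, create the card on the first update, and patch it afterwards. `InPlaceCardStreamer` and the injected `create_card` / `patch_card` callables are hypothetical stand-ins for the real Feishu API calls:

```python
from collections.abc import Callable


class InPlaceCardStreamer:
    """Create one card per inbound message, then patch that same card on every update (hypothetical)."""

    def __init__(self, create_card: Callable[[str, str], str], patch_card: Callable[[str, str], None]) -> None:
        self._create_card = create_card  # (inbound message id, text) -> card message_id
        self._patch_card = patch_card  # (card message_id, text) -> None
        self._card_ids: dict[str, str] = {}

    def on_update(self, inbound_id: str, text: str) -> None:
        card_id = self._card_ids.get(inbound_id)
        if card_id is None:
            self._card_ids[inbound_id] = self._create_card(inbound_id, text)
        else:
            self._patch_card(card_id, text)

    def on_finished(self, inbound_id: str, final_text: str) -> None:
        self.on_update(inbound_id, final_text)
        self._card_ids.pop(inbound_id, None)  # run finished: stop tracking this card
```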
Quick Start
Prerequisites
- Python 3.12+
- uv package manager
- API keys for your chosen LLM provider
Installation
cd deer-flow
# Copy configuration files
cp config.example.yaml config.yaml
# Install backend dependencies
cd backend
make install
Configuration
Edit config.yaml in the project root:
models:
- name: gpt-4o
display_name: GPT-4o
use: langchain_openai:ChatOpenAI
model: gpt-4o
api_key: $OPENAI_API_KEY
supports_thinking: false
supports_vision: true
- name: gpt-5-responses
display_name: GPT-5 (Responses API)
use: langchain_openai:ChatOpenAI
model: gpt-5
api_key: $OPENAI_API_KEY
use_responses_api: true
output_version: responses/v1
supports_vision: true
Set your API keys:
export OPENAI_API_KEY="your-api-key-here"
Running
Full Application (from project root):
make dev # Starts Gateway + Frontend + Nginx
Access at: http://localhost:2026
Backend Only (from backend directory):
# Gateway API + embedded agent runtime
make dev
Direct access: Gateway at http://localhost:8001
Project Structure
backend/
├── src/
│ ├── agents/ # Agent system
│ │ ├── lead_agent/ # Main agent (factory, prompts)
│ │ ├── middlewares/ # 9 middleware components
│ │ ├── memory/ # Memory extraction & storage
│ │ └── thread_state.py # ThreadState schema
│ ├── gateway/ # FastAPI Gateway API
│ │ ├── app.py # Application setup
│ │ └── routers/ # 6 route modules
│ ├── sandbox/ # Sandbox execution
│ │ ├── local/ # Local filesystem provider
│ │ ├── sandbox.py # Abstract interface
│ │ ├── tools.py # bash, ls, read/write/str_replace
│ │ └── middleware.py # Sandbox lifecycle
│ ├── subagents/ # Subagent delegation
│ │ ├── builtins/ # general-purpose, bash agents
│ │ ├── executor.py # Background execution engine
│ │ └── registry.py # Agent registry
│ ├── tools/builtins/ # Built-in tools
│ ├── mcp/ # MCP protocol integration
│ ├── models/ # Model factory
│ ├── skills/ # Skill discovery & loading
│ ├── config/ # Configuration system
│ ├── community/ # Community tools & providers
│ ├── reflection/ # Dynamic module loading
│ └── utils/ # Utilities
├── docs/ # Documentation
├── tests/ # Test suite
├── langgraph.json # LangGraph graph registry for tooling/Studio compatibility
├── pyproject.toml # Python dependencies
├── Makefile # Development commands
└── Dockerfile # Container build
langgraph.json is not the default service entrypoint. The scripts and Docker
deployments run the Gateway embedded runtime; the file is kept for LangGraph
tooling, Studio, or direct LangGraph Server compatibility.
Configuration
Main Configuration (config.yaml)
Place in project root. Config values starting with $ resolve as environment variables.
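A sketch of the $-prefixed environment resolution, applied to an already-parsed config tree (dicts, lists, scalars). `resolve_env_vars` is hypothetical, and raising on an unset variable is an assumption; DeerFlow's actual behavior for missing variables may differ:

```python
import os
from typing import Any


def resolve_env_vars(value: Any) -> Any:
    """Recursively replace strings like "$OPENAI_API_KEY" with the environment variable's value."""
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            # Assumption: fail loudly instead of passing "$NAME" through to a provider.
            raise KeyError(f"config references ${name}, but it is not set")
        return os.environ[name]
    if isinstance(value, dict):
        return {key: resolve_env_vars(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_vars(item) for item in value]
    return value
```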
Key sections:
- models - LLM configurations with class paths, API keys, thinking/vision flags
- tools - Tool definitions with module paths and groups
- tool_groups - Logical tool groupings
- sandbox - Execution environment provider
- skills - Skills directory paths
- title - Auto-title generation settings
- summarization - Context summarization settings
- subagents - Subagent system (enabled/disabled)
- memory - Memory system settings (enabled, storage, debounce, facts limits)
Provider note:
- models[*].use references provider classes by module path (for example langchain_openai:ChatOpenAI).
- If a provider module is missing, DeerFlow now returns an actionable error with install guidance (for example uv add langchain-google-genai).
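Resolving a `module:Class` reference with an install hint might look like the sketch below. `resolve_provider` is hypothetical, and deriving the package name by replacing underscores with hyphens is a heuristic that happens to match the langchain-google-genai example but will not hold for every package:

```python
import importlib
from typing import Any


def resolve_provider(use: str) -> Any:
    """Resolve a 'module.path:ClassName' reference, with an install hint when the module is missing."""
    module_name, sep, attr = use.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:Class', got {use!r}")
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Heuristic (assumption): distribution name = top-level module with "_" replaced by "-".
        package = module_name.split(".")[0].replace("_", "-")
        raise ImportError(f"cannot import {module_name!r} for {use!r}; try: uv add {package}") from exc
    try:
        return getattr(module, attr)
    except AttributeError as exc:
        raise ImportError(f"module {module_name!r} has no attribute {attr!r}") from exc
```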
Extensions Configuration (extensions_config.json)
MCP servers and skill states in a single file:
{
"mcpServers": {
"github": {
"enabled": true,
"type": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {"GITHUB_TOKEN": "$GITHUB_TOKEN"}
},
"secure-http": {
"enabled": true,
"type": "http",
"url": "https://api.example.com/mcp",
"oauth": {
"enabled": true,
"token_url": "https://auth.example.com/oauth/token",
"grant_type": "client_credentials",
"client_id": "$MCP_OAUTH_CLIENT_ID",
"client_secret": "$MCP_OAUTH_CLIENT_SECRET"
}
}
},
"skills": {
"pdf-processing": {"enabled": true}
}
}
Environment Variables
- DEER_FLOW_CONFIG_PATH - Override config.yaml location
- DEER_FLOW_EXTENSIONS_CONFIG_PATH - Override extensions_config.json location
- Model API keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY, etc.
- Tool API keys: TAVILY_API_KEY, GITHUB_TOKEN, etc.
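The override variables follow a simple "environment first, default second" rule. A sketch, assuming the defaults are config.yaml and extensions_config.json relative to the project root (`resolve_config_path` is a hypothetical helper):

```python
import os
from pathlib import Path


def resolve_config_path(env_var: str, default: Path) -> Path:
    """Return the path named by env_var if it is set and non-empty, else the default."""
    override = os.environ.get(env_var, "").strip()
    return Path(override).expanduser() if override else default


# Assumed defaults: both files live in the project root.
config_path = resolve_config_path("DEER_FLOW_CONFIG_PATH", Path("config.yaml"))
extensions_path = resolve_config_path("DEER_FLOW_EXTENSIONS_CONFIG_PATH", Path("extensions_config.json"))
```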
LangSmith Tracing
DeerFlow has built-in LangSmith integration for observability. When enabled, all LLM calls, agent runs, tool executions, and middleware processing are traced and visible in the LangSmith dashboard.
Setup:
- Sign up at smith.langchain.com and create a project.
- Add the following to your .env file in the project root:
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY=lsv2_pt_xxxxxxxxxxxxxxxx
LANGSMITH_PROJECT=xxx
Legacy variables: The LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY, LANGCHAIN_PROJECT, and LANGCHAIN_ENDPOINT variables are also supported for backward compatibility. LANGSMITH_* variables take precedence when both are set.
Langfuse Tracing
DeerFlow also supports Langfuse observability for LangChain-compatible runs.
Add the following to your .env file:
LANGFUSE_TRACING=true
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_BASE_URL=https://cloud.langfuse.com
If you are using a self-hosted Langfuse deployment, set LANGFUSE_BASE_URL to your Langfuse host.
Dual Provider Behavior
If both LangSmith and Langfuse are enabled, DeerFlow initializes and attaches both callbacks so the same run data is reported to both systems.
If a provider is explicitly enabled but required credentials are missing, or the provider callback cannot be initialized, DeerFlow raises an error when tracing is initialized during model creation instead of silently disabling tracing.
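The fail-fast rule can be sketched as a check that runs before any callback is built. The required-credential lists are assumptions drawn from the variables shown above (LANGFUSE_BASE_URL is treated as optional, and legacy LANGCHAIN_* aliases are ignored for brevity); `enabled_tracing_providers` is hypothetical:

```python
from collections.abc import Mapping

# Assumed minimal credential requirements per provider.
TRACING_PROVIDERS = {
    "langsmith": ("LANGSMITH_TRACING", ("LANGSMITH_API_KEY",)),
    "langfuse": ("LANGFUSE_TRACING", ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")),
}


def enabled_tracing_providers(env: Mapping[str, str]) -> list[str]:
    """Return every enabled provider; raise instead of silently disabling a misconfigured one."""
    enabled = []
    for provider, (flag, required) in TRACING_PROVIDERS.items():
        if env.get(flag, "").strip().lower() != "true":
            continue
        missing = [key for key in required if not env.get(key)]
        if missing:
            raise RuntimeError(f"{provider} tracing is enabled but {', '.join(missing)} is not set")
        enabled.append(provider)
    return enabled
```

Returning a list rather than a single provider is what lets both callbacks be attached to the same run.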
Docker: In docker-compose.yaml, tracing is disabled by default (LANGSMITH_TRACING=false). Set LANGSMITH_TRACING=true and/or LANGFUSE_TRACING=true in your .env, together with the required credentials, to enable tracing in containerized deployments.
Development
Commands
make install # Install dependencies
make dev # Run Gateway API + embedded agent runtime (port 8001)
make gateway # Run Gateway API without reload (port 8001)
make lint # Run linter (ruff)
make format # Format code (ruff)
make detect-blocking-io # Inventory blocking IO that may block the backend event loop
Code Style
- Linter/Formatter: ruff
- Line length: 240 characters
- Python: 3.12+ with type hints
- Quotes: Double quotes
- Indentation: 4 spaces
Testing
uv run pytest
make detect-blocking-io statically scans backend business code for blocking
IO that may run on the backend event loop and is not test-coverage-bound. It
prints a concise summary for human review and writes complete JSON findings to
.deer-flow/blocking-io-findings.json at the repository root (regardless of
whether the target is invoked from the repo root or from backend/). JSON
findings include both broad IO category and review-oriented fields such as
priority, location, blocking_call, event_loop_exposure, reason, and
code. priority is a deterministic review ordering from the operation type,
not proof of a bug. Bare-name same-file calls are resolved by function name,
so duplicate helper names in one file can conservatively over-report async
reachability.
Technology Stack
- LangGraph (1.0.6+) - Agent framework and multi-agent orchestration
- LangChain (1.2.3+) - LLM abstractions and tool system
- FastAPI (0.115.0+) - Gateway REST API
- langchain-mcp-adapters - Model Context Protocol support
- agent-sandbox - Sandboxed code execution
- markitdown - Multi-format document conversion
- tavily-python / firecrawl-py - Web search and scraping
Documentation
- Configuration Guide
- Architecture Details
- API Reference
- File Upload
- Path Examples
- Context Summarization
- Plan Mode
- Setup Guide
License
See the LICENSE file in the project root.
Contributing
See CONTRIBUTING.md for contribution guidelines.