mirror of https://github.com/bytedance/deer-flow.git synced 2026-05-30 20:38:09 +00:00

feat(agent): add ToolOutputBudgetMiddleware for oversized tool output protection (#3303)

* feat(agent): add ToolOutputBudgetMiddleware for oversized tool output protection

Closes #3289. Adds a unified middleware that enforces per-result budgets
on ALL tool outputs (MCP, sandbox, community, custom), preventing
oversized external tool results from blowing the model context window.

Design informed by claude-code (persistToolResult), hermes-agent
(tool_result_storage), and pi (OutputAccumulator) — the three most
mature implementations in production coding-agent frameworks.

Key features:
- Disk externalization: oversized outputs written to thread-local
  .tool-results/ directory, replaced with compact preview + file
  reference. Model can read full output via read_file with offset/limit.
- Fallback truncation: head+tail truncation when disk is unavailable
  (no thread_data, write failure), ensuring the context is always
  protected.
- read_file exemption: prevents persist-read-persist infinite loops
  (independently discovered by claude-code, hermes-agent, and pi).
- Per-tool threshold overrides via config.
- Line-boundary-aware truncation (no partial lines in previews).
- Multimodal content passthrough (images/structured blocks skip budget).
- Historical ToolMessage patching in wrap_model_call for checkpoint
  recovery scenarios.

Related: #3222 (design RFC), #1844 (comprehensive context management),
#3137 (write_file args compaction), #1677 (sandbox tool truncation).

* test: add MCP content_and_artifact format coverage

Add 5 tests for MCP tool output format (list of content blocks):
- text content blocks are extracted and budgeted
- multiple text blocks are joined and budgeted
- image content blocks are skipped (multimodal passthrough)
- mixed text+image blocks are skipped
- small text blocks pass through unchanged

Total test count: 59 (was 54).

* fix(agent): address Codex review findings for ToolOutputBudgetMiddleware

Three issues identified by Codex code review, all fixed:

1. `enabled` config field was unused — middleware now checks
   `config.enabled` and skips all processing when disabled.

2. `_build_fallback` could exceed `fallback_max_chars` — the marker
   text itself (~139 chars) was not deducted from the budget. Now
   pre-computes marker overhead and falls back to hard slice when
   max_chars is smaller than the marker.

3. Sync file I/O in async path — `awrap_tool_call` now delegates
   `_patch_result` to `asyncio.to_thread` to avoid blocking the
   event loop during disk writes.

Tests updated to use realistic fallback_max_chars values (500+)
that can accommodate the marker overhead, plus two new tests:
- `test_result_never_exceeds_max_chars` (parametric across sizes)
- `test_very_small_max_chars_does_not_crash`
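The fallback budget fix in item 2 can be illustrated with a minimal sketch. It is not the middleware's actual `_build_fallback`; the function name `build_fallback`, the marker wording, and the even head/tail split are assumptions chosen for illustration. It shows the two properties described above: the marker overhead is deducted before slicing, and a hard slice is used when the limit is smaller than the marker.

```python
# Hypothetical marker wording; the real marker text is not shown in this commit message.
MARKER_TEMPLATE = "\n\n[... {omitted} chars omitted ...]\n\n"


def build_fallback(text: str, max_chars: int) -> str:
    """Head+tail truncate `text` so the result never exceeds `max_chars`."""
    if len(text) <= max_chars:
        return text

    # Reserve room for the marker, using the largest count it could ever show.
    marker_overhead = len(MARKER_TEMPLATE.format(omitted=len(text)))
    if max_chars <= marker_overhead:
        # The marker alone would not fit: fall back to a hard slice.
        return text[:max_chars]

    budget = max_chars - marker_overhead
    head_len = budget // 2
    tail_len = budget - head_len
    head = text[:head_len]
    tail = text[len(text) - tail_len:]

    # Line-boundary awareness: drop a partial last line from the head and a
    # partial first line from the tail when a newline is available.
    cut = head.rfind("\n")
    if cut > 0:
        head = head[:cut]
    cut = tail.find("\n")
    if cut != -1 and cut + 1 < len(tail):
        tail = tail[cut + 1:]

    omitted = len(text) - len(head) - len(tail)
    return head + MARKER_TEMPLATE.format(omitted=omitted) + tail
```

Trimming to line boundaries only ever shortens the head and tail, so the length guarantee holds regardless of where the newlines fall.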

* fix(agent): address Copilot review — path traversal, async perf, shared config

1. Path traversal defense: sanitize tool_name via _sanitize_tool_name()
   (strips separators, .., absolute paths), validate storage_subdir is
   relative, and verify resolved filepath stays inside storage_dir.

2. Async hot-path optimization: add _needs_budget() cheap check before
   asyncio.to_thread offload — small outputs (99% of calls) skip the
   thread overhead entirely.

3. Replace shared module-level _DEFAULT_CONFIG with _default_config()
   factory to prevent cross-instance mutation of mutable fields.

12 new tests: TestSanitizeToolName (5), TestExternalizePathTraversal (3),
TestNeedsBudget (4).
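The path traversal defense in item 1 can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `sanitize_tool_name` and `resolve_inside` are hypothetical stand-ins for `_sanitize_tool_name()` and the containment check, and the character whitelist is a guess at a reasonable policy.

```python
import re
from pathlib import Path


def sanitize_tool_name(name: str) -> str:
    """Reduce an arbitrary tool name to a safe single filename component."""
    # Replace everything outside a conservative whitelist. This removes
    # "/", "\\" and "." and therefore also ".." and absolute-path prefixes.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", name).strip("_")
    return cleaned or "tool"  # never return an empty component


def resolve_inside(storage_dir: Path, filename: str) -> Path:
    """Return storage_dir/filename, refusing any path that escapes storage_dir."""
    base = storage_dir.resolve()
    candidate = (base / filename).resolve()
    if not candidate.is_relative_to(base):  # available since Python 3.9
        raise ValueError(f"path escapes storage dir: {filename!r}")
    return candidate
```

Sanitizing the name and verifying the resolved path are independent layers: the second check still rejects an escape if a new naming scheme ever bypasses the first.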

* fix(agent): correct preview hint to match read_file actual API

read_file uses start_line/end_line (1-indexed line numbers), not
offset/limit. The previous wording was copied from hermes-agent
which has a different read_file interface.

* perf(agent): hoist hot-path imports, add model-call pre-scan (review #3303)

Address maintainer review feedback:

1. Hoist inline imports to module level — `import asyncio` (was in
   awrap_tool_call hot path) and `from dataclasses import replace`
   (was in _patch_result) now live at module top.

2. Add a cheap pre-scan to _patch_model_messages so the historical
   message list is not rebuilt on every model call when nothing is
   oversized (the common case once results are budgeted at tool-call
   time). Also adds the same _needs_budget gate to the sync
   wrap_tool_call for symmetry with awrap_tool_call.

The pre-scan is refactored into per-tool-aware helpers
(_effective_trigger / _tool_message_over_budget) that mirror the exact
trigger conditions in _budget_content — including tool_overrides — so
the fast-path can never produce a false negative (silently skipping
budgeting for a tool with a low per-tool threshold).

7 new regression tests lock the per-tool-override-through-pre-scan path
and the model-call early return.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>

2026-05-29 22:59:26 +08:00

README.md

DeerFlow Backend

DeerFlow is a LangGraph-based AI super agent with sandbox execution, persistent memory, and extensible tool integration. The backend enables AI agents to execute code, browse the web, manage files, delegate tasks to subagents, and retain context across conversations - all in isolated, per-thread environments.

Architecture

                        ┌──────────────────────────────────────┐
                        │          Nginx (Port 2026)           │
                        │      Unified reverse proxy           │
                        └───────┬──────────────────┬───────────┘
                                │
            /api/langgraph/*    │    /api/* (other)
            rewritten to /api/* │
                                ▼
               ┌────────────────────────────────────────┐
               │        Gateway API (8001)              │
               │        FastAPI REST + agent runtime    │
               │                                        │
               │ Models, MCP, Skills, Memory, Uploads,  │
               │ Artifacts, Threads, Runs, Streaming    │
               │                                        │
               │ ┌────────────────────────────────────┐ │
               │ │ Lead Agent                         │ │
               │ │ Middleware Chain, Tools, Subagents │ │
               │ └────────────────────────────────────┘ │
               └────────────────────────────────────────┘

Request Routing (via Nginx):

/api/langgraph/* → Gateway LangGraph-compatible API - agent interactions, threads, streaming
/api/* (other) → Gateway API - models, MCP, skills, memory, artifacts, uploads, thread-local cleanup
/ (non-API) → Frontend - Next.js web interface

Core Components

Lead Agent

The single LangGraph agent (lead_agent) is the runtime entry point, created via make_lead_agent(config). It combines:

Dynamic model selection with thinking and vision support
Middleware chain for cross-cutting concerns (9 middlewares)
Tool system with sandbox, MCP, community, and built-in tools
Subagent delegation for parallel task execution
System prompt with skills injection, memory context, and working directory guidance

Middleware Chain

Middlewares execute in strict order, each handling a specific concern:

#	Middleware	Purpose
1	ThreadDataMiddleware	Creates per-thread isolated directories (workspace, uploads, outputs)
2	UploadsMiddleware	Injects newly uploaded files into conversation context
3	SandboxMiddleware	Acquires sandbox environment for code execution
4	SummarizationMiddleware	Reduces context when approaching token limits (optional)
5	TodoListMiddleware	Tracks multi-step tasks in plan mode (optional)
6	TitleMiddleware	Auto-generates conversation titles after first exchange
7	MemoryMiddleware	Queues conversations for async memory extraction
8	ViewImageMiddleware	Injects image data for vision-capable models (conditional)
9	ClarificationMiddleware	Intercepts clarification requests and interrupts execution (must be last)

Sandbox System

Per-thread isolated execution with virtual path translation:

Abstract interface: execute_command, read_file, write_file, list_dir
Providers: LocalSandboxProvider (filesystem) and AioSandboxProvider (Docker, in community/). Async runtime paths use async sandbox lifecycle hooks so startup, readiness polling, and release do not block the event loop.
Virtual paths: /mnt/user-data/{workspace,uploads,outputs} → thread-specific physical directories
Skills path: /mnt/skills → deer-flow/skills/ directory
Skills loading: Recursively discovers nested SKILL.md files under skills/{public,custom} and preserves nested container paths
File-write safety: str_replace serializes read-modify-write per (sandbox.id, path) so isolated sandboxes keep concurrency even when virtual paths match
Tools: bash, ls, read_file, write_file, str_replace. The write_file tool overwrites by default and exposes append for end-of-file writes. The bash tool is disabled by default when using LocalSandboxProvider; use AioSandboxProvider for isolated shell access.
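The virtual path mapping above can be sketched as a small translation function. This is a minimal illustration, not DeerFlow's implementation: the function name `to_physical` and the assumption that each thread has one physical directory containing `workspace`, `uploads`, and `outputs` subdirectories are hypothetical.

```python
from pathlib import Path, PurePosixPath

VIRTUAL_ROOT = PurePosixPath("/mnt/user-data")
ALLOWED_TOP_LEVEL = {"workspace", "uploads", "outputs"}


def to_physical(virtual_path: str, thread_dir: Path) -> Path:
    """Map a /mnt/user-data/... virtual path into a thread's physical directory."""
    # relative_to raises ValueError when the path is outside the virtual root.
    rel = PurePosixPath(virtual_path).relative_to(VIRTUAL_ROOT)
    parts = rel.parts
    if not parts or parts[0] not in ALLOWED_TOP_LEVEL or ".." in parts:
        raise ValueError(f"unsupported virtual path: {virtual_path!r}")
    return thread_dir.joinpath(*parts)
```

For example, with a hypothetical thread directory `/data/threads/t1`, the virtual path `/mnt/user-data/workspace/a/b.txt` maps to `/data/threads/t1/workspace/a/b.txt`.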

Subagent System

Async task delegation with concurrent execution:

Built-in agents: general-purpose (full toolset) and bash (command specialist, exposed only when shell access is available)
Concurrency: Max 3 subagents per turn, 15-minute timeout
Execution: Background thread pools with status tracking and SSE events
Flow: Agent calls task() tool → executor runs subagent in background → polls for completion → returns result
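The concurrency limits above can be illustrated with a standard-library thread pool. This is a simplified sketch, not the DeerFlow executor: `run_subagents` is a hypothetical helper, subagents are modeled as plain callables, and the SSE events and polling loop are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Any, Callable

MAX_SUBAGENTS = 3          # per-turn limit stated above
TIMEOUT_SECONDS = 15 * 60  # 15-minute timeout stated above


def run_subagents(
    tasks: dict[str, Callable[[], Any]],
    timeout: float = TIMEOUT_SECONDS,
) -> dict[str, tuple[str, Any]]:
    """Run up to MAX_SUBAGENTS callables in background threads and collect statuses."""
    if len(tasks) > MAX_SUBAGENTS:
        raise ValueError(f"at most {MAX_SUBAGENTS} subagents per turn")

    results: dict[str, tuple[str, Any]] = {}
    # Note: leaving the `with` block waits for worker threads to finish,
    # so a real executor would need its own cancellation strategy.
    with ThreadPoolExecutor(max_workers=MAX_SUBAGENTS) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        for name, future in futures.items():
            try:
                # The timeout here applies to each wait, not to the whole batch.
                results[name] = ("completed", future.result(timeout=timeout))
            except FutureTimeout:
                results[name] = ("timed_out", None)
            except Exception as exc:  # surface subagent failures as a status
                results[name] = ("failed", str(exc))
    return results
```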

Memory System

LLM-powered persistent context retention across conversations:

Automatic extraction: Analyzes conversations for user context, facts, and preferences
Structured storage: User context (work, personal, top-of-mind), history, and confidence-scored facts
Debounced updates: Batches updates to minimize LLM calls (configurable wait time)
System prompt injection: Top facts + context injected into agent prompts
Storage: JSON file with mtime-based cache invalidation
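The mtime-based cache invalidation mentioned for storage can be sketched like this. The class name `MemoryFileCache` and the JSON shape are assumptions for illustration; only the idea of reloading when the file's modification time changes comes from the description above.

```python
import json
from pathlib import Path
from typing import Any, Optional


class MemoryFileCache:
    """Cache a JSON file in memory and reload it only when its mtime changes."""

    def __init__(self, path: Path) -> None:
        self._path = path
        self._mtime_ns: Optional[int] = None
        self._data: dict[str, Any] = {}

    def load(self) -> dict[str, Any]:
        try:
            mtime_ns = self._path.stat().st_mtime_ns
        except FileNotFoundError:
            # No file yet: behave as empty memory and forget any cached copy.
            self._mtime_ns, self._data = None, {}
            return self._data
        if mtime_ns != self._mtime_ns:
            # File is new or was modified since the last read: reparse it.
            self._data = json.loads(self._path.read_text(encoding="utf-8"))
            self._mtime_ns = mtime_ns
        return self._data
```

Repeated calls to `load()` cost one `stat()` each, and the JSON is reparsed only after the file has been rewritten.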

Tool Ecosystem

Category	Tools
Sandbox	`bash`, `ls`, `read_file`, `write_file`, `str_replace`
Built-in	`present_files`, `ask_clarification`, `view_image`, `task` (subagent)
Community	Tavily (web search), Jina AI (web fetch), Firecrawl (scraping), DuckDuckGo (image search)
MCP	Any Model Context Protocol server (stdio, SSE, HTTP transports)
Skills	Domain-specific workflows injected via system prompt

Gateway API

FastAPI application providing REST endpoints for frontend integration:

Route	Purpose
`GET /api/models`	List available LLM models
`GET/PUT /api/mcp/config`	Manage MCP server configurations
`GET/PUT /api/skills`	List and manage skills
`POST /api/skills/install`	Install skill from `.skill` archive
`GET /api/memory`	Retrieve memory data
`POST /api/memory/reload`	Force memory reload
`GET /api/memory/config`	Memory configuration
`GET /api/memory/status`	Combined config + data
`POST /api/threads/{id}/uploads`	Upload files (auto-converts PDF/PPT/Excel/Word to Markdown, rejects directory paths, auto-renames duplicate filenames in one request)
`GET /api/threads/{id}/uploads/list`	List uploaded files
`DELETE /api/threads/{id}`	Delete DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail
`GET /api/threads/{id}/artifacts/{path}`	Serve generated artifacts

IM Channels

The IM bridge supports Feishu, Slack, and Telegram. Slack and Telegram still use the final runs.wait() response path, while Feishu now streams through runs.stream(["messages-tuple", "values"]) and updates a single in-thread card in place.

For Feishu card updates, DeerFlow stores the running card's message_id per inbound message and patches that same card until the run finishes, preserving the existing OK / DONE reaction flow.

Quick Start

Prerequisites

Python 3.12+
uv package manager
API keys for your chosen LLM provider

Installation

cd deer-flow

# Copy configuration files
cp config.example.yaml config.yaml

# Install backend dependencies
cd backend
make install

Configuration

Edit config.yaml in the project root:

models:
  - name: gpt-4o
    display_name: GPT-4o
    use: langchain_openai:ChatOpenAI
    model: gpt-4o
    api_key: $OPENAI_API_KEY
    supports_thinking: false
    supports_vision: true

  - name: gpt-5-responses
    display_name: GPT-5 (Responses API)
    use: langchain_openai:ChatOpenAI
    model: gpt-5
    api_key: $OPENAI_API_KEY
    use_responses_api: true
    output_version: responses/v1
    supports_vision: true

Set your API keys:

export OPENAI_API_KEY="your-api-key-here"

Running

Full Application (from project root):

make dev  # Starts Gateway + Frontend + Nginx

Access at: http://localhost:2026

Backend Only (from backend directory):

# Gateway API + embedded agent runtime
make dev

Direct access: Gateway at http://localhost:8001

Project Structure

backend/
├── src/
│   ├── agents/                  # Agent system
│   │   ├── lead_agent/         # Main agent (factory, prompts)
│   │   ├── middlewares/        # 9 middleware components
│   │   ├── memory/             # Memory extraction & storage
│   │   └── thread_state.py    # ThreadState schema
│   ├── gateway/                # FastAPI Gateway API
│   │   ├── app.py             # Application setup
│   │   └── routers/           # 6 route modules
│   ├── sandbox/                # Sandbox execution
│   │   ├── local/             # Local filesystem provider
│   │   ├── sandbox.py         # Abstract interface
│   │   ├── tools.py           # bash, ls, read/write/str_replace
│   │   └── middleware.py      # Sandbox lifecycle
│   ├── subagents/              # Subagent delegation
│   │   ├── builtins/          # general-purpose, bash agents
│   │   ├── executor.py        # Background execution engine
│   │   └── registry.py        # Agent registry
│   ├── tools/builtins/         # Built-in tools
│   ├── mcp/                    # MCP protocol integration
│   ├── models/                 # Model factory
│   ├── skills/                 # Skill discovery & loading
│   ├── config/                 # Configuration system
│   ├── community/              # Community tools & providers
│   ├── reflection/             # Dynamic module loading
│   └── utils/                  # Utilities
├── docs/                       # Documentation
├── tests/                      # Test suite
├── langgraph.json              # LangGraph graph registry for tooling/Studio compatibility
├── pyproject.toml              # Python dependencies
├── Makefile                    # Development commands
└── Dockerfile                  # Container build

langgraph.json is not the default service entrypoint. The scripts and Docker deployments run the Gateway embedded runtime; the file is kept for LangGraph tooling, Studio, or direct LangGraph Server compatibility.

Configuration

Main Configuration (`config.yaml`)

Place in project root. Config values starting with $ resolve as environment variables.
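As a rough illustration of the `$` convention, the sketch below walks a parsed config structure and substitutes environment variables. The helper name `resolve_env` is hypothetical, and raising an error for an unset variable is an assumption; DeerFlow's actual behavior in that case is not stated here.

```python
import os
from typing import Any


def resolve_env(value: Any) -> Any:
    """Recursively replace strings such as "$OPENAI_API_KEY" with environment values."""
    if isinstance(value, str) and value.startswith("$"):
        name = value[1:]
        if name not in os.environ:
            # Assumption: fail loudly rather than pass the literal "$NAME" along.
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    if isinstance(value, dict):
        return {key: resolve_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env(item) for item in value]
    return value  # numbers, booleans, and plain strings pass through unchanged
```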

Key sections:

models - LLM configurations with class paths, API keys, thinking/vision flags
tools - Tool definitions with module paths and groups
tool_groups - Logical tool groupings
sandbox - Execution environment provider
skills - Skills directory paths
title - Auto-title generation settings
summarization - Context summarization settings
subagents - Subagent system (enabled/disabled)
memory - Memory system settings (enabled, storage, debounce, facts limits)

Provider note:

models[*].use references provider classes by module path (for example langchain_openai:ChatOpenAI).
If a provider module is missing, DeerFlow now returns an actionable error with install guidance (for example uv add langchain-google-genai).
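The `module:Class` convention and the actionable error can be sketched with `importlib`. The function `resolve_class` and the error wording are hypothetical; the sketch only shows how a missing provider module could be turned into an error that carries install guidance.

```python
import importlib
from typing import Any, Optional


def resolve_class(path: str, install_hint: Optional[str] = None) -> Any:
    """Resolve a "module.path:ClassName" reference to the object it names."""
    module_name, _, attr = path.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:Class', got {path!r}")
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Turn a bare import failure into guidance the user can act on.
        hint = f" Try: {install_hint}" if install_hint else ""
        raise ImportError(f"Cannot import provider module {module_name!r}.{hint}") from exc
    return getattr(module, attr)
```

For instance, `resolve_class("langchain_google_genai:ChatGoogleGenerativeAI", "uv add langchain-google-genai")` would raise an `ImportError` containing the install command when that package is absent.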

Extensions Configuration (`extensions_config.json`)

MCP servers and skill states in a single file:

{
  "mcpServers": {
    "github": {
      "enabled": true,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_TOKEN": "$GITHUB_TOKEN"}
    },
    "secure-http": {
      "enabled": true,
      "type": "http",
      "url": "https://api.example.com/mcp",
      "oauth": {
        "enabled": true,
        "token_url": "https://auth.example.com/oauth/token",
        "grant_type": "client_credentials",
        "client_id": "$MCP_OAUTH_CLIENT_ID",
        "client_secret": "$MCP_OAUTH_CLIENT_SECRET"
      }
    }
  },
  "skills": {
    "pdf-processing": {"enabled": true}
  }
}

Environment Variables

DEER_FLOW_CONFIG_PATH - Override config.yaml location
DEER_FLOW_EXTENSIONS_CONFIG_PATH - Override extensions_config.json location
Model API keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY, etc.
Tool API keys: TAVILY_API_KEY, GITHUB_TOKEN, etc.

LangSmith Tracing

DeerFlow has built-in LangSmith integration for observability. When enabled, all LLM calls, agent runs, tool executions, and middleware processing are traced and visible in the LangSmith dashboard.

Setup:

Sign up at smith.langchain.com and create a project.
Add the following to your .env file in the project root:

LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY=lsv2_pt_xxxxxxxxxxxxxxxx
LANGSMITH_PROJECT=xxx

Legacy variables: The LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY, LANGCHAIN_PROJECT, and LANGCHAIN_ENDPOINT variables are also supported for backward compatibility. LANGSMITH_* variables take precedence when both are set.
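The precedence rule can be expressed as a small lookup, shown here as a sketch rather than DeerFlow's actual code. The helper `tracing_setting` is hypothetical and takes the environment as a mapping so it can be exercised without touching real variables.

```python
from typing import Mapping, Optional

# Legacy LANGCHAIN_* names listed above, keyed by the LANGSMITH_* suffix.
LEGACY_NAMES = {
    "TRACING": "LANGCHAIN_TRACING_V2",
    "API_KEY": "LANGCHAIN_API_KEY",
    "PROJECT": "LANGCHAIN_PROJECT",
    "ENDPOINT": "LANGCHAIN_ENDPOINT",
}


def tracing_setting(env: Mapping[str, str], suffix: str) -> Optional[str]:
    """Return LANGSMITH_<suffix> if set, otherwise the legacy LANGCHAIN_* value."""
    primary = f"LANGSMITH_{suffix}"
    if primary in env:
        return env[primary]
    return env.get(LEGACY_NAMES[suffix])
```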

Langfuse Tracing

DeerFlow also supports Langfuse observability for LangChain-compatible runs.

Add the following to your .env file:

LANGFUSE_TRACING=true
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_SECRET_KEY=sk-lf-xxxxxxxxxxxxxxxx
LANGFUSE_BASE_URL=https://cloud.langfuse.com

If you are using a self-hosted Langfuse deployment, set LANGFUSE_BASE_URL to your Langfuse host.

Dual Provider Behavior

If both LangSmith and Langfuse are enabled, DeerFlow initializes and attaches both callbacks so the same run data is reported to both systems.

If a provider is explicitly enabled but required credentials are missing, or the provider callback cannot be initialized, DeerFlow raises an error when tracing is initialized during model creation instead of silently disabling tracing.

Docker: In docker-compose.yaml, tracing is disabled by default (LANGSMITH_TRACING=false). Set LANGSMITH_TRACING=true and/or LANGFUSE_TRACING=true in your .env, together with the required credentials, to enable tracing in containerized deployments.

Development

Commands

make install    # Install dependencies
make dev        # Run Gateway API + embedded agent runtime (port 8001)
make gateway    # Run Gateway API without reload (port 8001)
make lint       # Run linter (ruff)
make format     # Format code (ruff)
make detect-blocking-io  # Inventory blocking IO that may block the backend event loop

Code Style

Linter/Formatter: ruff
Line length: 240 characters
Python: 3.12+ with type hints
Quotes: Double quotes
Indentation: 4 spaces

Testing

uv run pytest

make detect-blocking-io statically scans backend business code for blocking IO that may run on the backend event loop. Because the scan is static, its findings are not limited to code paths exercised by tests. It prints a concise summary for human review and writes the complete JSON findings to .deer-flow/blocking-io-findings.json at the repository root, regardless of whether the target is invoked from the repo root or from backend/. JSON findings include both a broad IO category and review-oriented fields such as priority, location, blocking_call, event_loop_exposure, reason, and code. priority is a deterministic review ordering derived from the operation type, not proof of a bug. Bare-name same-file calls are resolved by function name, so duplicate helper names in one file can conservatively over-report async reachability.

Technology Stack

LangGraph (1.0.6+) - Agent framework and multi-agent orchestration
LangChain (1.2.3+) - LLM abstractions and tool system
FastAPI (0.115.0+) - Gateway REST API
langchain-mcp-adapters - Model Context Protocol support
agent-sandbox - Sandboxed code execution
markitdown - Multi-format document conversion
tavily-python / firecrawl-py - Web search and scraping
