rayhpeng 34e835bc33
feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli (#1403)
* feat(gateway): implement LangGraph Platform API in Gateway, replace langgraph-cli

Implement all core LangGraph Platform API endpoints in the Gateway,
allowing it to fully replace the langgraph-cli dev server for local
development. This eliminates a heavyweight dependency and simplifies
the development stack.

Changes:
- Add runs lifecycle endpoints (create, stream, wait, cancel, join)
- Add threads CRUD and search endpoints
- Add assistants compatibility endpoints (search, get, graph, schemas)
- Add StreamBridge (in-memory pub/sub for SSE) and async provider
- Add RunManager with atomic create_or_reject (eliminates TOCTOU race)
- Add worker with interrupt/rollback cancel actions and runtime context injection
- Route /api/langgraph/* to Gateway in nginx config
- Skip langgraph-cli startup by default (SKIP_LANGGRAPH_SERVER=0 to restore)
- Add unit tests for RunManager, SSE format, and StreamBridge

* fix: drain bridge queue on client disconnect to prevent backpressure

When on_disconnect=continue, keep consuming events from the bridge
without yielding, so the worker is not blocked by a full queue.
Only on_disconnect=cancel breaks out immediately.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: remove pytest import

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: Fix default stream_mode to ["values", "messages-tuple"]

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: Remove unused if_exists field from ThreadCreateRequest

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: address review comments on gateway LangGraph API

- Mount runs.py router in app.py (missing include_router)
- Normalize interrupt_before/after "*" to node list before run_agent()
- Use entry.id for SSE event ID instead of counter
- Drain bridge queue on disconnect when on_disconnect=continue
- Reuse serialization helper in wait_run() for consistent wire format
- Reject unsupported multitask_strategy with 400
- Remove SKIP_LANGGRAPH_SERVER fallback, always use Gateway

* feat: extract app.state access into deps.py

Encapsulate read/write operations for singleton objects (RunManager,
StreamBridge, checkpointer) held in app.state into a shared utility,
reducing repeated access patterns across router modules.

* feat: extract deerflow.runtime.serialization module with tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: replace duplicated serialization with deerflow.runtime.serialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: extract app/gateway/services.py with run lifecycle logic

Create a service layer that centralizes SSE formatting, input/config
normalization, and run lifecycle management. Router modules will delegate
to these functions instead of using private cross-imported helpers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: wire routers to use services layer, remove cross-module private imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply ruff formatting to refactored files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(runtime): support LangGraph dev server and add compat route

- Enable official LangGraph dev server for local development workflow
- Decouple runtime components from agents package for better separation
- Provide gateway-backed fallback route when dev server is skipped
- Simplify lifecycle management using context manager in gateway

* feat(runtime): add Store providers with auto-backend selection

- Add async_provider.py and provider.py under deerflow/runtime/store/
- Support memory, sqlite, postgres backends matching checkpointer config
- Integrate into FastAPI lifespan via AsyncExitStack in deps.py
- Replace hardcoded InMemoryStore with config-driven factory

* refactor(gateway): migrate thread management from checkpointer to Store and resolve multiple endpoint failures

- Add Store-backed CRUD helpers (_store_get, _store_put, _store_upsert)
- Replace checkpoint-scanning search with two-phase strategy:
  phase 1 reads Store (O(threads)), phase 2 backfills from checkpointer
  for legacy/LangGraph Server threads with lazy migration
- Extend Store record schema with values field for title persistence
- Sync thread title from checkpoint to Store after run completion
- Fix /threads/{id}/runs/{run_id}/stream 405 by accepting both
  GET and POST methods; POST handles interrupt/rollback actions
- Fix /threads/{id}/state 500 by separating read_config and
  write_config, adding checkpoint_ns to configurable, and
  shallow-copying checkpoint/metadata before mutation
- Sync title to Store on state update for immediate search reflection
- Move _upsert_thread_in_store into services.py, remove duplicate logic
- Add _sync_thread_title_after_run: await run task, read final
  checkpoint title, write back to Store record
- Spawn title sync as background task from start_run when Store exists

* refactor(runtime): deduplicate store and checkpointer provider logic

Extract _ensure_sqlite_parent_dir() helper into checkpointer/provider.py
and use it in all three places that previously inlined the same mkdir logic.
Consolidate duplicate error constants in store/async_provider.py by importing
from store/provider.py instead of redefining them.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(runtime): move SQLite helpers to runtime/store, checkpointer imports from store

_resolve_sqlite_conn_str and _ensure_sqlite_parent_dir now live in
runtime/store/provider.py. agents/checkpointer/provider and
agents/checkpointer/async_provider import from there, reversing the
previous dependency direction (store → checkpointer becomes
checkpointer → store).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(runtime): extract SQLite helpers into runtime/store/_sqlite_utils.py

Move resolve_sqlite_conn_str and ensure_sqlite_parent_dir out of
checkpointer/provider.py into a dedicated _sqlite_utils module.
Functions are now public (no underscore prefix), making cross-module
imports semantically correct. All four provider files import from
the single shared location.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gateway): use adelete_thread to fully remove thread checkpoints on delete

AsyncSqliteSaver has no adelete method — the previous hasattr check
always evaluated to False, silently leaving all checkpoint rows in the
database. Switch to adelete_thread(thread_id) which deletes every
checkpoint and pending-write row for the thread across all namespaces
(including sub-graph checkpoints).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(gateway): remove dead bridge_cm/ckpt_cm code and fix StrEnum lint

app.py had unreachable code after the async-with lifespan refactor:
bridge_cm and ckpt_cm were referenced but never defined (F821), and
the channel service startup/shutdown was outside the langgraph_runtime
block so it never ran. Move channel service lifecycle inside the
async-with block where it belongs.

Replace str+Enum inheritance in RunStatus and DisconnectMode with
StrEnum as suggested by UP042.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: format with ruff

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-03-30 16:02:23 +08:00
..
2026-01-14 09:57:52 +08:00

DeerFlow Backend

DeerFlow is a LangGraph-based AI super agent with sandbox execution, persistent memory, and extensible tool integration. The backend enables AI agents to execute code, browse the web, manage files, delegate tasks to subagents, and retain context across conversations - all in isolated, per-thread environments.


Architecture

                        ┌──────────────────────────────────────┐
                        │          Nginx (Port 2026)           │
                        │      Unified reverse proxy           │
                        └───────┬──────────────────┬───────────┘
                                │                  │
              /api/langgraph/*  │                  │  /api/* (other)
                                ▼                  ▼
               ┌────────────────────┐  ┌────────────────────────┐
               │ LangGraph Server   │  │   Gateway API (8001)   │
               │    (Port 2024)     │  │   FastAPI REST         │
               │                    │  │                        │
               │ ┌────────────────┐ │  │ Models, MCP, Skills,   │
               │ │  Lead Agent    │ │  │ Memory, Uploads,       │
               │ │  ┌──────────┐  │ │  │ Artifacts              │
               │ │  │Middleware│  │ │  └────────────────────────┘
               │ │  │  Chain   │  │ │
               │ │  └──────────┘  │ │
               │ │  ┌──────────┐  │ │
               │ │  │  Tools   │  │ │
               │ │  └──────────┘  │ │
               │ │  ┌──────────┐  │ │
               │ │  │Subagents │  │ │
               │ │  └──────────┘  │ │
               │ └────────────────┘ │
               └────────────────────┘

Request Routing (via Nginx):

  • /api/langgraph/* → LangGraph Server - agent interactions, threads, streaming
  • /api/* (other) → Gateway API - models, MCP, skills, memory, artifacts, uploads, thread-local cleanup
  • / (non-API) → Frontend - Next.js web interface

Core Components

Lead Agent

The single LangGraph agent (lead_agent) is the runtime entry point, created via make_lead_agent(config). It combines:

  • Dynamic model selection with thinking and vision support
  • Middleware chain for cross-cutting concerns (9 middlewares)
  • Tool system with sandbox, MCP, community, and built-in tools
  • Subagent delegation for parallel task execution
  • System prompt with skills injection, memory context, and working directory guidance

Middleware Chain

Middlewares execute in strict order, each handling a specific concern:

# Middleware Purpose
1 ThreadDataMiddleware Creates per-thread isolated directories (workspace, uploads, outputs)
2 UploadsMiddleware Injects newly uploaded files into conversation context
3 SandboxMiddleware Acquires sandbox environment for code execution
4 SummarizationMiddleware Reduces context when approaching token limits (optional)
5 TodoListMiddleware Tracks multi-step tasks in plan mode (optional)
6 TitleMiddleware Auto-generates conversation titles after first exchange
7 MemoryMiddleware Queues conversations for async memory extraction
8 ViewImageMiddleware Injects image data for vision-capable models (conditional)
9 ClarificationMiddleware Intercepts clarification requests and interrupts execution (must be last)

Sandbox System

Per-thread isolated execution with virtual path translation:

  • Abstract interface: execute_command, read_file, write_file, list_dir
  • Providers: LocalSandboxProvider (filesystem) and AioSandboxProvider (Docker, in community/)
  • Virtual paths: /mnt/user-data/{workspace,uploads,outputs} → thread-specific physical directories
  • Skills path: /mnt/skillsdeer-flow/skills/ directory
  • Skills loading: Recursively discovers nested SKILL.md files under skills/{public,custom} and preserves nested container paths
  • Tools: bash, ls, read_file, write_file, str_replace (bash is disabled by default when using LocalSandboxProvider; use AioSandboxProvider for isolated shell access)

Subagent System

Async task delegation with concurrent execution:

  • Built-in agents: general-purpose (full toolset) and bash (command specialist, exposed only when shell access is available)
  • Concurrency: Max 3 subagents per turn, 15-minute timeout
  • Execution: Background thread pools with status tracking and SSE events
  • Flow: Agent calls task() tool → executor runs subagent in background → polls for completion → returns result

Memory System

LLM-powered persistent context retention across conversations:

  • Automatic extraction: Analyzes conversations for user context, facts, and preferences
  • Structured storage: User context (work, personal, top-of-mind), history, and confidence-scored facts
  • Debounced updates: Batches updates to minimize LLM calls (configurable wait time)
  • System prompt injection: Top facts + context injected into agent prompts
  • Storage: JSON file with mtime-based cache invalidation

Tool Ecosystem

Category Tools
Sandbox bash, ls, read_file, write_file, str_replace
Built-in present_files, ask_clarification, view_image, task (subagent)
Community Tavily (web search), Jina AI (web fetch), Firecrawl (scraping), DuckDuckGo (image search)
MCP Any Model Context Protocol server (stdio, SSE, HTTP transports)
Skills Domain-specific workflows injected via system prompt

Gateway API

FastAPI application providing REST endpoints for frontend integration:

Route Purpose
GET /api/models List available LLM models
GET/PUT /api/mcp/config Manage MCP server configurations
GET/PUT /api/skills List and manage skills
POST /api/skills/install Install skill from .skill archive
GET /api/memory Retrieve memory data
POST /api/memory/reload Force memory reload
GET /api/memory/config Memory configuration
GET /api/memory/status Combined config + data
POST /api/threads/{id}/uploads Upload files (auto-converts PDF/PPT/Excel/Word to Markdown, rejects directory paths)
GET /api/threads/{id}/uploads/list List uploaded files
DELETE /api/threads/{id} Delete DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail
GET /api/threads/{id}/artifacts/{path} Serve generated artifacts

IM Channels

The IM bridge supports Feishu, Slack, and Telegram. Slack and Telegram still use the final runs.wait() response path, while Feishu now streams through runs.stream(["messages-tuple", "values"]) and updates a single in-thread card in place.

For Feishu card updates, DeerFlow stores the running card's message_id per inbound message and patches that same card until the run finishes, preserving the existing OK / DONE reaction flow.


Quick Start

Prerequisites

  • Python 3.12+
  • uv package manager
  • API keys for your chosen LLM provider

Installation

cd deer-flow

# Copy configuration files
cp config.example.yaml config.yaml

# Install backend dependencies
cd backend
make install

Configuration

Edit config.yaml in the project root:

models:
  - name: gpt-4o
    display_name: GPT-4o
    use: langchain_openai:ChatOpenAI
    model: gpt-4o
    api_key: $OPENAI_API_KEY
    supports_thinking: false
    supports_vision: true

  - name: gpt-5-responses
    display_name: GPT-5 (Responses API)
    use: langchain_openai:ChatOpenAI
    model: gpt-5
    api_key: $OPENAI_API_KEY
    use_responses_api: true
    output_version: responses/v1
    supports_vision: true

Set your API keys:

export OPENAI_API_KEY="your-api-key-here"

Running

Full Application (from project root):

make dev  # Starts LangGraph + Gateway + Frontend + Nginx

Access at: http://localhost:2026

Backend Only (from backend directory):

# Terminal 1: LangGraph server
make dev

# Terminal 2: Gateway API
make gateway

Direct access: LangGraph at http://localhost:2024, Gateway at http://localhost:8001


Project Structure

backend/
├── src/
│   ├── agents/                  # Agent system
│   │   ├── lead_agent/         # Main agent (factory, prompts)
│   │   ├── middlewares/        # 9 middleware components
│   │   ├── memory/             # Memory extraction & storage
│   │   └── thread_state.py    # ThreadState schema
│   ├── gateway/                # FastAPI Gateway API
│   │   ├── app.py             # Application setup
│   │   └── routers/           # 6 route modules
│   ├── sandbox/                # Sandbox execution
│   │   ├── local/             # Local filesystem provider
│   │   ├── sandbox.py         # Abstract interface
│   │   ├── tools.py           # bash, ls, read/write/str_replace
│   │   └── middleware.py      # Sandbox lifecycle
│   ├── subagents/              # Subagent delegation
│   │   ├── builtins/          # general-purpose, bash agents
│   │   ├── executor.py        # Background execution engine
│   │   └── registry.py        # Agent registry
│   ├── tools/builtins/         # Built-in tools
│   ├── mcp/                    # MCP protocol integration
│   ├── models/                 # Model factory
│   ├── skills/                 # Skill discovery & loading
│   ├── config/                 # Configuration system
│   ├── community/              # Community tools & providers
│   ├── reflection/             # Dynamic module loading
│   └── utils/                  # Utilities
├── docs/                       # Documentation
├── tests/                      # Test suite
├── langgraph.json              # LangGraph server configuration
├── pyproject.toml              # Python dependencies
├── Makefile                    # Development commands
└── Dockerfile                  # Container build

Configuration

Main Configuration (config.yaml)

Place in project root. Config values starting with $ resolve as environment variables.

Key sections:

  • models - LLM configurations with class paths, API keys, thinking/vision flags
  • tools - Tool definitions with module paths and groups
  • tool_groups - Logical tool groupings
  • sandbox - Execution environment provider
  • skills - Skills directory paths
  • title - Auto-title generation settings
  • summarization - Context summarization settings
  • subagents - Subagent system (enabled/disabled)
  • memory - Memory system settings (enabled, storage, debounce, facts limits)

Provider note:

  • models[*].use references provider classes by module path (for example langchain_openai:ChatOpenAI).
  • If a provider module is missing, DeerFlow now returns an actionable error with install guidance (for example uv add langchain-google-genai).

Extensions Configuration (extensions_config.json)

MCP servers and skill states in a single file:

{
  "mcpServers": {
    "github": {
      "enabled": true,
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_TOKEN": "$GITHUB_TOKEN"}
    },
    "secure-http": {
      "enabled": true,
      "type": "http",
      "url": "https://api.example.com/mcp",
      "oauth": {
        "enabled": true,
        "token_url": "https://auth.example.com/oauth/token",
        "grant_type": "client_credentials",
        "client_id": "$MCP_OAUTH_CLIENT_ID",
        "client_secret": "$MCP_OAUTH_CLIENT_SECRET"
      }
    }
  },
  "skills": {
    "pdf-processing": {"enabled": true}
  }
}

Environment Variables

  • DEER_FLOW_CONFIG_PATH - Override config.yaml location
  • DEER_FLOW_EXTENSIONS_CONFIG_PATH - Override extensions_config.json location
  • Model API keys: OPENAI_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY, etc.
  • Tool API keys: TAVILY_API_KEY, GITHUB_TOKEN, etc.

LangSmith Tracing

DeerFlow has built-in LangSmith integration for observability. When enabled, all LLM calls, agent runs, tool executions, and middleware processing are traced and visible in the LangSmith dashboard.

Setup:

  1. Sign up at smith.langchain.com and create a project.
  2. Add the following to your .env file in the project root:
LANGSMITH_TRACING=true
LANGSMITH_ENDPOINT=https://api.smith.langchain.com
LANGSMITH_API_KEY=lsv2_pt_xxxxxxxxxxxxxxxx
LANGSMITH_PROJECT=xxx

Legacy variables: The LANGCHAIN_TRACING_V2, LANGCHAIN_API_KEY, LANGCHAIN_PROJECT, and LANGCHAIN_ENDPOINT variables are also supported for backward compatibility. LANGSMITH_* variables take precedence when both are set.

Docker: In docker-compose.yaml, tracing is disabled by default (LANGSMITH_TRACING=false). Set LANGSMITH_TRACING=true and provide LANGSMITH_API_KEY in your .env to enable it in containerized deployments.


Development

Commands

make install    # Install dependencies
make dev        # Run LangGraph server (port 2024)
make gateway    # Run Gateway API (port 8001)
make lint       # Run linter (ruff)
make format     # Format code (ruff)

Code Style

  • Linter/Formatter: ruff
  • Line length: 240 characters
  • Python: 3.12+ with type hints
  • Quotes: Double quotes
  • Indentation: 4 spaces

Testing

uv run pytest

Technology Stack

  • LangGraph (1.0.6+) - Agent framework and multi-agent orchestration
  • LangChain (1.2.3+) - LLM abstractions and tool system
  • FastAPI (0.115.0+) - Gateway REST API
  • langchain-mcp-adapters - Model Context Protocol support
  • agent-sandbox - Sandboxed code execution
  • markitdown - Multi-format document conversion
  • tavily-python / firecrawl-py - Web search and scraping

Documentation


License

See the LICENSE file in the project root.

Contributing

See CONTRIBUTING.md for contribution guidelines.