* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)

  DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values", "custom"], which only delivers full-state snapshots at graph-node boundaries, so AI replies were dumped as a single messages-tuple event per node instead of streaming token-by-token. `client.stream("hello")` looked identical to `client.chat("hello")` — the bug reported in #1969.

  Subscribe to "messages" mode as well, forward AIMessageChunk deltas as messages-tuple events with delta semantics (consumers accumulate by id), and dedup the values-snapshot path so it does not re-synthesize AI text that was already streamed. Introduce a per-id usage_metadata counter so the final AIMessage in the values snapshot and the final "messages" chunk — which carry the same cumulative usage — are not double-counted.

  chat() now accumulates per-id deltas and returns the last message's full accumulated text. Non-streaming mock sources (single event per id) are a degenerate case of the same logic, keeping existing callers and tests backward compatible.

  Verified end-to-end against a real LLM: a 15-number count emits 35 messages-tuple events with BPE subword boundaries clearly visible ("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476 ms across the window, and end-event usage matches the values-snapshot usage exactly (not doubled). tests/test_client_live.py::TestLiveStreaming passes.

  New unit tests:
  - test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3 delta events with correct content/id/usage; the values snapshot does not duplicate them, and usage is counted once.
  - test_chat_accumulates_streamed_deltas: chat() rebuilds full text from deltas.
  - test_messages_mode_tool_message: a ToolMessage delivered via messages mode is not duplicated by the values-snapshot synthesis path.
  The stream() docstring now documents why this client does not reuse Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw LangChain objects vs serialized dicts, single caller vs HTTP fan-out).

  Fixes #1969

* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)

  Post-review cleanup for the token-level streaming fix. No behavior change for correct inputs; one efficiency regression fixed.

  Fix: chat() O(n²) accumulator
  -----------------------------
  `chat()` accumulated per-id text via `buffers[id] = buffers.get(id, "") + delta`, which is O(n) per concat → O(n²) total over a streamed response. At ~2 KB of cumulative text this becomes user-visible; at 50 KB / 5000 chunks it costs roughly 100-300 ms of pure copying. Switched to `dict[str, list[str]]` + a single `"".join()` at return.

  Cleanup
  -------
  - Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`, and `_tool_message_event` static helpers. The messages-mode and values-mode branches previously repeated four inline dict literals each; they now call the same builders.
  - `StreamEvent.type` is now typed as `Literal["values", "messages-tuple", "custom", "end"]` via a `StreamEventType` alias. Makes the closed set explicit and catches typos at type-check time.
  - Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`, `.tool_calls`, and `.id` all have default values on the base class, so the `getattr(..., None)` fallbacks were dead code. Removed from the hot path.
  - `_account_usage` parameter type loosened to `Any` so that LangChain's `UsageMetadata` TypedDict is accepted under strict type checking.
  - Trimmed narrating comments on `seen_ids` / `streamed_ids` / the values-synthesis skip block; kept the non-obvious ones that document the cross-mode dedup invariant.

  Net diff: -15 lines. All 132 unit tests + the harness boundary test still pass; ruff check and ruff format pass.
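The O(n²)-to-O(n) accumulator change described above can be sketched as follows. This is a minimal illustration: the event dict shape and the helper name are assumptions for the example, not DeerFlowClient's actual API.

```python
from collections import defaultdict


def accumulate_deltas(events: list[dict]) -> str:
    """Rebuild the last message's full text from streamed delta events.

    Appending each delta to a per-id list and joining once at the end is
    O(n) overall, unlike repeated string concatenation
    (buffer = buffer + delta), which is O(n^2) over a long stream.
    """
    buffers: dict[str, list[str]] = defaultdict(list)
    last_id: str | None = None
    for event in events:
        msg_id, delta = event["id"], event["content"]
        buffers[msg_id].append(delta)  # O(1) amortized per chunk
        last_id = msg_id
    return "".join(buffers[last_id]) if last_id else ""
```

A non-streaming source that emits one event per id is the degenerate case: the list holds a single element and the join is a no-op copy.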
* docs(backend): add STREAMING.md design note (#1969)

  Dedicated design document for the token-level streaming architecture, prompted by the bug investigation in #1969. Contents:

  - Why two parallel streaming paths exist (Gateway HTTP/async vs DeerFlowClient sync/in-process) and why they cannot be merged.
  - LangGraph's three-layer mode naming (Graph "messages" vs Platform SDK "messages-tuple" vs HTTP SSE) and why a shared string constant would be harmful.
  - Gateway path: run_agent + StreamBridge + sse_consumer, with a sequence diagram.
  - DeerFlowClient path: sync generator + direct yield, delta semantics, the chat() accumulator.
  - Why the three id sets (seen_ids / streamed_ids / counted_usage_ids) each carry an independent invariant and cannot be collapsed.
  - End-to-end sequence for a real conversation turn.
  - Lessons from #1969: why mock-based tests missed the bug, why BPE subword boundaries in live output are the strongest correctness signal, and the regression test that locks it in.
  - Source code location index.

  Also:
  - Link from backend/CLAUDE.md Embedded Client section.
  - Link from backend/docs/README.md under Feature Documentation.

* test(backend): add refactor regression guards for stream() (#1969)

  Three new tests in TestStream lock the contract introduced by PR #1974 so that any future refactor (sync→async migration, sharing a core with Gateway's run_agent, a dedup strategy change) cannot silently change behavior.

  - test_dedup_requires_messages_before_values_invariant: a canary that documents the order-dependence of cross-mode dedup. streamed_ids is populated only by the messages branch, so values-before-messages for the same id produces duplicate AI text events. Real LangGraph never inverts this order, but a refactor that does (or that makes dedup idempotent) must update this test deliberately.
  - test_messages_mode_golden_event_sequence: locks the *exact* event sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1 end) for a canonical streaming turn.
  List equality gives a clear diff on any drift in order, type, or payload shape.
  - test_chat_accumulates_in_linear_time: a perf canary for the O(n^2) fix in commit 1f11ba10. 10,000 single-char chunks must accumulate in under 1 s; the threshold is wide enough to pass on slow CI but tight enough to fail if `buffer = buffer + delta` is restored.

  All three tests pass alongside the existing 12 TestStream tests (15/15). ruff check + ruff format clean.

* docs(backend): clarify stream() docstring on JSON serialization (#1969)

  Replace the misleading "raw LangChain objects (AIMessage, usage_metadata as dataclasses), not dicts" claim in the "Why not reuse Gateway's run_agent?" section. The implementation already yields plain Python dicts (StreamEvent.data is a dict, and usage_metadata is a TypedDict), so the original wording suggested a richer return type than the API actually delivers.

  The corrected wording focuses on what is actually true and relevant: this client skips the JSON/SSE serialization layer that Gateway adds for HTTP wire transmission, and yields stream event payloads directly as Python data structures.

  Addresses Copilot review feedback on PR #1974.

* test(backend): document none-id messages dedup limitation (#1969)

  Add test_none_id_chunks_produce_duplicates_known_limitation to TestStream. It explicitly documents and asserts the current behavior when an LLM provider emits AIMessageChunk with id=None (vLLM, certain custom backends). The cross-mode dedup machinery cannot record a None id in streamed_ids (guarded by ``if msg_id:``), so the values snapshot's reassembled AIMessage with a real id falls through and synthesizes a duplicate AI text event. The test asserts len == 2 and locks this in as a known limitation rather than silently letting future contributors hit it without context.

  Why this is documented rather than fixed:
  * Falling back to ``metadata.get("id")`` does not help — LangGraph's messages-mode metadata never carries the message id.
  * Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the values snapshot uses the same fallback, which it does not.
  * A real fix requires provider cooperation (always emit chunk ids) or content-based dedup (false-positive risk), neither of which belongs in this PR.

  If a real fix lands, replace this test with a positive assertion that dedup works for None-id chunks.

  Addresses Copilot review feedback on PR #1974 (client.py:515).

* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)

  - Fix `font-norma` typo to `font-normal` in the message-list subtask count
  - Fix dark mode `--border` using a reddish hue (22.216) instead of neutral
  - Replace hardcoded `rgb(184,184,192)` in the hero with `text-muted-foreground`
  - Replace hardcoded `bg-[#a3a1a1]` in the streaming indicator with `bg-muted-foreground`
  - Add missing `font-sans` to the welcome description `<pre>` for consistency
  - Make case-study-section padding responsive (`px-4 md:px-20`)

  Closes #1940

* docs: clarify deployment sizing guidance (#1963)

* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)

  After history.replaceState updates the URL from /chats/new to /chats/{UUID}, Next.js useParams does not update because replaceState bypasses the router. The useEffect in useThreadChat would then set threadIdFromPath ('new') as the threadId, causing the LangGraph SDK to call POST /threads/new/history, which returns HTTP 422 (Invalid thread ID: must be a UUID).

  This fix adds a guard to skip the threadId update when threadIdFromPath is the literal string 'new', preserving the already-correct UUID that was set when the thread was created.
* fix(frontend): avoid using route new as thread id (#1967)

  Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>

* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)

  * Fix event loop conflict in SubagentExecutor.execute()

    When SubagentExecutor.execute() is called from within an already-running event loop (e.g., when the parent agent uses async/await), calling asyncio.run() creates a new event loop that conflicts with asyncio primitives (like httpx.AsyncClient) that were created in and bound to the parent loop.

    This fix detects whether we are already in a running event loop and, if so, runs the subagent in a separate thread with its own isolated event loop to avoid conflicts.

    Fixes: sub-task cards not appearing in Ultra mode when using async parent agents

    Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

  * fix(subagent): harden isolated event loop execution

* refactor(backend): remove dead getattr in _tool_message_event

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Project Overview
DeerFlow is a LangGraph-based AI super agent system with a full-stack architecture. The backend provides a "super agent" with sandbox execution, persistent memory, subagent delegation, and extensible tool integration - all operating in per-thread isolated environments.
Architecture:
- LangGraph Server (port 2024): Agent runtime and workflow execution
- Gateway API (port 8001): REST API for models, MCP, skills, memory, artifacts, uploads, and local thread cleanup
- Frontend (port 3000): Next.js web interface
- Nginx (port 2026): Unified reverse proxy entry point
- Provisioner (port 8002, optional in Docker dev): Started only when sandbox is configured for provisioner/Kubernetes mode
Runtime Modes:
- Standard mode (`make dev`): LangGraph Server handles agent execution as a separate process. 4 processes total.
- Gateway mode (`make dev-pro`, experimental): Agent runtime embedded in Gateway via `RunManager` + `run_agent()` + `StreamBridge` (`packages/harness/deerflow/runtime/`). The service manages its own concurrency via async tasks. 3 processes total, no LangGraph Server.
Project Structure:
deer-flow/
├── Makefile # Root commands (check, install, dev, stop)
├── config.yaml # Main application configuration
├── extensions_config.json # MCP servers and skills configuration
├── backend/ # Backend application (this directory)
│ ├── Makefile # Backend-only commands (dev, gateway, lint)
│ ├── langgraph.json # LangGraph server configuration
│ ├── packages/
│ │ └── harness/ # deerflow-harness package (import: deerflow.*)
│ │ ├── pyproject.toml
│ │ └── deerflow/
│ │ ├── agents/ # LangGraph agent system
│ │ │ ├── lead_agent/ # Main agent (factory + system prompt)
│ │ │ ├── middlewares/ # 10 middleware components
│ │ │ ├── memory/ # Memory extraction, queue, prompts
│ │ │ └── thread_state.py # ThreadState schema
│ │ ├── sandbox/ # Sandbox execution system
│ │ │ ├── local/ # Local filesystem provider
│ │ │ ├── sandbox.py # Abstract Sandbox interface
│ │ │ ├── tools.py # bash, ls, read/write/str_replace
│ │ │ └── middleware.py # Sandbox lifecycle management
│ │ ├── subagents/ # Subagent delegation system
│ │ │ ├── builtins/ # general-purpose, bash agents
│ │ │ ├── executor.py # Background execution engine
│ │ │ └── registry.py # Agent registry
│ │ ├── tools/builtins/ # Built-in tools (present_files, ask_clarification, view_image)
│ │ ├── mcp/ # MCP integration (tools, cache, client)
│ │ ├── models/ # Model factory with thinking/vision support
│ │ ├── skills/ # Skills discovery, loading, parsing
│ │ ├── config/ # Configuration system (app, model, sandbox, tool, etc.)
│ │ ├── community/ # Community tools (tavily, jina_ai, firecrawl, image_search, aio_sandbox)
│ │ ├── reflection/ # Dynamic module loading (resolve_variable, resolve_class)
│ │ ├── utils/ # Utilities (network, readability)
│ │ └── client.py # Embedded Python client (DeerFlowClient)
│ ├── app/ # Application layer (import: app.*)
│ │ ├── gateway/ # FastAPI Gateway API
│ │ │ ├── app.py # FastAPI application
│ │ │ └── routers/ # FastAPI route modules (models, mcp, memory, skills, uploads, threads, artifacts, agents, suggestions, channels)
│ │ └── channels/ # IM platform integrations
│ ├── tests/ # Test suite
│ └── docs/ # Documentation
├── frontend/ # Next.js frontend application
└── skills/ # Agent skills directory
├── public/ # Public skills (committed)
└── custom/ # Custom skills (gitignored)
Important Development Guidelines
Documentation Update Policy
CRITICAL: Always update README.md and CLAUDE.md after every code change
When making code changes, you MUST update the relevant documentation:
- Update `README.md` for user-facing changes (features, setup, usage instructions)
- Update `CLAUDE.md` for development changes (architecture, commands, workflows, internal systems)
- Keep documentation synchronized with the codebase at all times
- Ensure accuracy and timeliness of all documentation
Commands
Root directory (for full application):
make check # Check system requirements
make install # Install all dependencies (frontend + backend)
make dev # Start all services (LangGraph + Gateway + Frontend + Nginx), with config.yaml preflight
make dev-pro # Gateway mode (experimental): skip LangGraph, agent runtime embedded in Gateway
make start-pro # Production + Gateway mode (experimental)
make stop # Stop all services
Backend directory (for backend development only):
make install # Install backend dependencies
make dev # Run LangGraph server only (port 2024)
make gateway # Run Gateway API only (port 8001)
make test # Run all backend tests
make lint # Lint with ruff
make format # Format code with ruff
Regression tests related to Docker/provisioner behavior:
- `tests/test_docker_sandbox_mode_detection.py` (mode detection from `config.yaml`)
- `tests/test_provisioner_kubeconfig.py` (kubeconfig file/directory handling)
Boundary check (harness → app import firewall):
- `tests/test_harness_boundary.py` — ensures `packages/harness/deerflow/` never imports from `app.*`
CI runs these regression tests for every pull request via .github/workflows/backend-unit-tests.yml.
Architecture
Harness / App Split
The backend is split into two layers with a strict dependency direction:
- Harness (`packages/harness/deerflow/`): Publishable agent framework package (`deerflow-harness`). Import prefix: `deerflow.*`. Contains agent orchestration, tools, sandbox, models, MCP, skills, config — everything needed to build and run agents.
- App (`app/`): Unpublished application code. Import prefix: `app.*`. Contains the FastAPI Gateway API and IM channel integrations (Feishu, Slack, Telegram).
Dependency rule: App imports `deerflow`, but `deerflow` never imports `app`. This boundary is enforced by `tests/test_harness_boundary.py`, which runs in CI.
Import conventions:
# Harness internal
from deerflow.agents import make_lead_agent
from deerflow.models import create_chat_model
# App internal
from app.gateway.app import app
from app.channels.service import start_channel_service
# App → Harness (allowed)
from deerflow.config import get_app_config
# Harness → App (FORBIDDEN — enforced by test_harness_boundary.py)
# from app.gateway.routers.uploads import ... # ← will fail CI
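An import-firewall test of this kind can be sketched with the standard-library `ast` module. This is a hypothetical illustration of the technique, not the actual contents of `tests/test_harness_boundary.py`:

```python
import ast
from pathlib import Path

FORBIDDEN_PREFIX = "app"  # the package harness code must never import


def forbidden_imports(harness_root: str) -> list[str]:
    """Return every `import app...` / `from app... import` found under harness_root."""
    violations = []
    for py_file in Path(harness_root).rglob("*.py"):
        tree = ast.parse(py_file.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                names = [node.module or ""]  # relative imports have module=None
            else:
                continue
            for name in names:
                if name == FORBIDDEN_PREFIX or name.startswith(FORBIDDEN_PREFIX + "."):
                    violations.append(f"{py_file}: {name}")
    return violations
```

A CI test would then assert `forbidden_imports("packages/harness/deerflow") == []`.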
Agent System
Lead Agent (packages/harness/deerflow/agents/lead_agent/agent.py):
- Entry point: `make_lead_agent(config: RunnableConfig)` registered in `langgraph.json`
- Dynamic model selection via `create_chat_model()` with thinking/vision support
- Tools loaded via `get_available_tools()` — combines sandbox, built-in, MCP, community, and subagent tools
- System prompt generated by `apply_prompt_template()` with skills, memory, and subagent instructions
ThreadState (packages/harness/deerflow/agents/thread_state.py):
- Extends `AgentState` with: `sandbox`, `thread_data`, `title`, `artifacts`, `todos`, `uploaded_files`, `viewed_images`
- Uses custom reducers: `merge_artifacts` (deduplicate), `merge_viewed_images` (merge/clear)
Runtime Configuration (via config.configurable):
- `thinking_enabled` - Enable the model's extended thinking
- `model_name` - Select a specific LLM model
- `is_plan_mode` - Enable TodoList middleware
- `subagent_enabled` - Enable the task delegation tool
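The runtime keys above travel in `config.configurable`. A minimal illustration of such a config dict follows; the key names come from this document, but the invocation surface and the model name are assumptions:

```python
# Illustrative only: how the documented `config.configurable` keys might be
# assembled before invoking the lead agent. The model name is hypothetical.
runtime_config = {
    "configurable": {
        "thread_id": "demo-thread",
        "thinking_enabled": True,    # enable the model's extended thinking
        "model_name": "gpt-4o-mini", # hypothetical; select a configured model
        "is_plan_mode": True,        # enables the TodoList middleware
        "subagent_enabled": False,   # disables the task delegation tool
    }
}
```

This dict would be passed wherever a `RunnableConfig` is accepted (e.g., a graph invoke or the embedded client), per LangGraph's configurable-config convention.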
Middleware Chain
Middlewares execute in strict order in packages/harness/deerflow/agents/lead_agent/agent.py:
1. ThreadDataMiddleware - Creates per-thread directories (`backend/.deer-flow/threads/{thread_id}/user-data/{workspace,uploads,outputs}`); Web UI thread deletion now follows LangGraph thread removal with Gateway cleanup of the local `.deer-flow/threads/{thread_id}` directory
2. UploadsMiddleware - Tracks and injects newly uploaded files into the conversation
3. SandboxMiddleware - Acquires a sandbox, stores `sandbox_id` in state
4. DanglingToolCallMiddleware - Injects placeholder ToolMessages for AIMessage tool_calls that lack responses (e.g., due to user interruption)
5. GuardrailMiddleware - Pre-tool-call authorization via a pluggable `GuardrailProvider` protocol (optional, if `guardrails.enabled` in config). Evaluates each tool call and returns an error ToolMessage on deny. Three provider options: the built-in `AllowlistProvider` (zero deps), OAP policy providers (e.g. `aport-agent-guardrails`), or custom providers. See docs/GUARDRAILS.md for setup, usage, and how to implement a provider.
6. SummarizationMiddleware - Context reduction when approaching token limits (optional, if enabled)
7. TodoListMiddleware - Task tracking with the `write_todos` tool (optional, if plan_mode)
8. TitleMiddleware - Auto-generates the thread title after the first complete exchange and normalizes structured message content before prompting the title model
9. MemoryMiddleware - Queues conversations for async memory update (filters to user + final AI responses)
10. ViewImageMiddleware - Injects base64 image data before the LLM call (conditional on vision support)
11. SubagentLimitMiddleware - Truncates excess `task` tool calls from the model response to enforce the `MAX_CONCURRENT_SUBAGENTS` limit (optional, if subagent_enabled)
12. ClarificationMiddleware - Intercepts `ask_clarification` tool calls, interrupts via `Command(goto=END)` (must be last)
Configuration System
Main Configuration (config.yaml):
Setup: Copy config.example.yaml to config.yaml in the project root directory.
Config Versioning: config.example.yaml has a config_version field. On startup, AppConfig.from_file() compares user version vs example version and emits a warning if outdated. Missing config_version = version 0. Run make config-upgrade to auto-merge missing fields. When changing the config schema, bump config_version in config.example.yaml.
Config Caching: get_app_config() caches the parsed config, but automatically reloads it when the resolved config path changes or the file's mtime increases. This keeps Gateway and LangGraph reads aligned with config.yaml edits without requiring a manual process restart.
Configuration priority:
1. Explicit `config_path` argument
2. `DEER_FLOW_CONFIG_PATH` environment variable
3. `config.yaml` in the current directory (`backend/`)
4. `config.yaml` in the parent directory (project root - recommended location)
Config values starting with $ are resolved as environment variables (e.g., $OPENAI_API_KEY).
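The `$`-prefix convention above can be sketched as a small resolver. This mirrors the documented behavior only; the real resolver in the config system may handle more cases (nesting, missing-variable errors):

```python
import os


def resolve_config_value(value):
    """Resolve config values that start with `$` as environment variables.

    Example: "$OPENAI_API_KEY" -> os.environ["OPENAI_API_KEY"].
    Non-string and non-`$` values pass through unchanged.
    """
    if isinstance(value, str) and value.startswith("$"):
        return os.environ.get(value[1:], "")
    return value
```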
`ModelConfig` also declares `use_responses_api` and `output_version` so OpenAI `/v1/responses` can be enabled explicitly while still using `langchain_openai:ChatOpenAI`.
Extensions Configuration (extensions_config.json):
MCP servers and skills are configured together in extensions_config.json in project root:
Configuration priority:
1. Explicit `config_path` argument
2. `DEER_FLOW_EXTENSIONS_CONFIG_PATH` environment variable
3. `extensions_config.json` in the current directory (`backend/`)
4. `extensions_config.json` in the parent directory (project root - recommended location)
Gateway API (app/gateway/)
FastAPI application on port 8001 with health check at GET /health.
Routers:
| Router | Endpoints |
|---|---|
| Models (`/api/models`) | `GET /` - list models; `GET /{name}` - model details |
| MCP (`/api/mcp`) | `GET /config` - get config; `PUT /config` - update config (saves to `extensions_config.json`) |
| Skills (`/api/skills`) | `GET /` - list skills; `GET /{name}` - details; `PUT /{name}` - update enabled; `POST /install` - install from .skill archive (accepts standard optional frontmatter like version, author, compatibility) |
| Memory (`/api/memory`) | `GET /` - memory data; `POST /reload` - force reload; `GET /config` - config; `GET /status` - config + data |
| Uploads (`/api/threads/{id}/uploads`) | `POST /` - upload files (auto-converts PDF/PPT/Excel/Word); `GET /list` - list; `DELETE /(unknown)` - delete |
| Threads (`/api/threads/{id}`) | `DELETE /` - remove DeerFlow-managed local thread data after LangGraph thread deletion; unexpected failures are logged server-side and return a generic 500 detail |
| Artifacts (`/api/threads/{id}/artifacts`) | `GET /{path}` - serve artifacts; active content types (text/html, application/xhtml+xml, image/svg+xml) are always forced as download attachments to reduce XSS risk; `?download=true` still forces download for other file types |
| Suggestions (`/api/threads/{id}/suggestions`) | `POST /` - generate follow-up questions; rich list/block model content is normalized before JSON parsing |
Proxied through nginx: /api/langgraph/* → LangGraph, all other /api/* → Gateway.
Sandbox System (packages/harness/deerflow/sandbox/)
Interface: Abstract `Sandbox` with `execute_command`, `read_file`, `write_file`, `list_dir`
Provider Pattern: `SandboxProvider` with `acquire`, `get`, `release` lifecycle
Implementations:
- `LocalSandboxProvider` - Singleton local filesystem execution with path mappings
- `AioSandboxProvider` (`packages/harness/deerflow/community/`) - Docker-based isolation
Virtual Path System:
- Agent sees: `/mnt/user-data/{workspace,uploads,outputs}`, `/mnt/skills`
- Physical: `backend/.deer-flow/threads/{thread_id}/user-data/...`, `deer-flow/skills/`
- Translation: `replace_virtual_path()` / `replace_virtual_paths_in_command()`
- Detection: `is_local_sandbox()` checks `sandbox_id == "local"`
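The virtual-to-physical mapping above can be sketched as a prefix rewrite. This is an assumption-laden illustration, not the real `replace_virtual_path()` (which also handles uploads/outputs subpaths and command strings):

```python
# Illustrative mapping of the documented virtual roots to physical paths.
VIRTUAL_ROOTS = {
    "/mnt/user-data": "backend/.deer-flow/threads/{thread_id}/user-data",
    "/mnt/skills": "deer-flow/skills",
}


def replace_virtual_path(path: str, thread_id: str) -> str:
    """Rewrite a virtual sandbox path to its physical location on disk."""
    for virtual, physical in VIRTUAL_ROOTS.items():
        if path.startswith(virtual):
            return physical.format(thread_id=thread_id) + path[len(virtual):]
    return path  # non-virtual paths pass through untouched
```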
Sandbox Tools (in packages/harness/deerflow/sandbox/tools.py):
- `bash` - Execute commands with path translation and error handling
- `ls` - Directory listing (tree format, max 2 levels)
- `read_file` - Read file contents with an optional line range
- `write_file` - Write/append to files, creates directories
- `str_replace` - Substring replacement (single or all occurrences); same-path serialization is scoped to `(sandbox.id, path)` so isolated sandboxes do not contend on identical virtual paths inside one process
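The `(sandbox.id, path)` scoping can be sketched as a keyed lock registry: two sandboxes editing the same virtual path get different locks, while repeat edits in one sandbox serialize on the same lock. A hypothetical helper, not the real `tools.py` code:

```python
import threading
from collections import defaultdict

# One lock per (sandbox_id, path) key; the registry itself is guarded so
# concurrent first lookups of the same key cannot race.
_path_locks: dict[tuple[str, str], threading.Lock] = defaultdict(threading.Lock)
_registry_lock = threading.Lock()


def lock_for(sandbox_id: str, path: str) -> threading.Lock:
    """Return the lock that serializes str_replace-style edits for this key."""
    with _registry_lock:
        return _path_locks[(sandbox_id, path)]
```

Usage: `with lock_for(sandbox.id, path): ...` around the read-modify-write.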
Subagent System (packages/harness/deerflow/subagents/)
Built-in Agents: general-purpose (all tools except task) and bash (command specialist)
Execution: Dual thread pool - _scheduler_pool (3 workers) + _execution_pool (3 workers)
Concurrency: MAX_CONCURRENT_SUBAGENTS = 3 enforced by SubagentLimitMiddleware (truncates excess tool calls in after_model), 15-minute timeout
Flow: task() tool → SubagentExecutor → background thread → poll 5s → SSE events → result
Events: task_started, task_running, task_completed/task_failed/task_timed_out
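The executor's event-loop fix mentioned in the merge log (running the subagent in an isolated loop when the caller is already inside one) follows a common asyncio pattern. A minimal sketch of that pattern, not the actual `SubagentExecutor.execute()` code:

```python
import asyncio
import concurrent.futures


def run_coro_safely(coro):
    """Run a coroutine whether or not an event loop is already running.

    asyncio.run() inside a running loop would conflict with primitives
    bound to the parent loop (e.g. an httpx.AsyncClient), so in that case
    we execute in a dedicated thread with its own fresh loop instead.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop running in this thread: the normal, simple path.
        return asyncio.run(coro)
    # Already inside a loop: isolate the work in a separate thread + loop.
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

Note the trade-off: the isolated loop cannot share loop-bound resources with the parent, which is exactly why the conflict existed in the first place.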
Tool System (packages/harness/deerflow/tools/)
`get_available_tools(groups, include_mcp, model_name, subagent_enabled)` assembles:
- Config-defined tools - Resolved from `config.yaml` via `resolve_variable()`
- MCP tools - From enabled MCP servers (lazy initialized, cached with mtime invalidation)
- Built-in tools:
  - `present_files` - Make output files visible to the user (only `/mnt/user-data/outputs`)
  - `ask_clarification` - Request clarification (intercepted by ClarificationMiddleware → interrupts)
  - `view_image` - Read an image as base64 (added only if the model supports vision)
- Subagent tool (if enabled): `task` - Delegate to a subagent (description, prompt, subagent_type, max_turns)
Community tools (packages/harness/deerflow/community/):
- `tavily/` - Web search (5 results default) and web fetch (4 KB limit)
- `jina_ai/` - Web fetch via the Jina reader API with readability extraction
- `firecrawl/` - Web scraping via the Firecrawl API
ACP agent tools:
- `invoke_acp_agent` - Invokes external ACP-compatible agents from `config.yaml`
- ACP launchers must be real ACP adapters. The standard `codex` CLI is not ACP-compatible by itself; configure a wrapper such as `npx -y @zed-industries/codex-acp` or an installed `codex-acp` binary
- Missing ACP executables now return an actionable error message instead of a raw `[Errno 2]`
- Each ACP agent uses a per-thread workspace at `{base_dir}/threads/{thread_id}/acp-workspace/`. The workspace is accessible to the lead agent via the virtual path `/mnt/acp-workspace/` (read-only). In docker sandbox mode, the directory is volume-mounted into the container at `/mnt/acp-workspace` (read-only); in local sandbox mode, path translation is handled by `tools.py`
- `image_search/` - Image search via DuckDuckGo
MCP System (packages/harness/deerflow/mcp/)
- Uses `langchain-mcp-adapters` `MultiServerMCPClient` for multi-server management
- Lazy initialization: tools loaded on first use via `get_cached_mcp_tools()`
- Cache invalidation: detects config file changes via mtime comparison
- Transports: stdio (command-based), SSE, HTTP
- OAuth (HTTP/SSE): supports token endpoint flows (`client_credentials`, `refresh_token`) with automatic token refresh + Authorization header injection
- Runtime updates: Gateway API saves to `extensions_config.json`; LangGraph detects via mtime
Skills System (packages/harness/deerflow/skills/)
- Location: `deer-flow/skills/{public,custom}/`
- Format: directory with `SKILL.md` (YAML frontmatter: name, description, license, allowed-tools)
- Loading: `load_skills()` recursively scans `skills/{public,custom}` for `SKILL.md`, parses metadata, and reads enabled state from `extensions_config.json`
- Injection: enabled skills listed in the agent system prompt with container paths
- Installation: `POST /api/skills/install` extracts a .skill ZIP archive to the `custom/` directory
Model Factory (packages/harness/deerflow/models/factory.py)
- `create_chat_model(name, thinking_enabled)` instantiates an LLM from config via reflection
- Supports the `thinking_enabled` flag with per-model `when_thinking_enabled` overrides
- Supports vLLM-style thinking toggles via `when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking` for Qwen reasoning models, while normalizing legacy `thinking` configs for backward compatibility
- Supports the `supports_vision` flag for image understanding models
- Config values starting with `$` are resolved as environment variables
- Missing provider modules surface actionable install hints from the reflection resolvers (for example `uv add langchain-google-genai`)
vLLM Provider (packages/harness/deerflow/models/vllm_provider.py)
- `VllmChatModel` subclasses `langchain_openai:ChatOpenAI` for vLLM 0.19.0 OpenAI-compatible endpoints
- Preserves vLLM's non-standard assistant `reasoning` field on full responses, streaming deltas, and follow-up tool-call turns
- Designed for configs that enable thinking through `extra_body.chat_template_kwargs.enable_thinking` on vLLM 0.19.0 Qwen reasoning models, while accepting the older `thinking` alias
IM Channels System (app/channels/)
Bridges external messaging platforms (Feishu, Slack, Telegram) to the DeerFlow agent via the LangGraph Server.
Architecture: Channels communicate with the LangGraph Server through langgraph-sdk HTTP client (same as the frontend), ensuring threads are created and managed server-side.
Components:
- `message_bus.py` - Async pub/sub hub (`InboundMessage` → queue → dispatcher; `OutboundMessage` → callbacks → channels)
- `store.py` - JSON-file persistence mapping `channel_name:chat_id[:topic_id]` → `thread_id` (keys are `channel:chat` for root conversations and `channel:chat:topic` for threaded conversations)
- `manager.py` - Core dispatcher: creates threads via `client.threads.create()`, routes commands, keeps Slack/Telegram on `client.runs.wait()`, and uses `client.runs.stream(["messages-tuple", "values"])` for Feishu incremental outbound updates
- `base.py` - Abstract `Channel` base class (start/stop/send lifecycle)
- `service.py` - Manages the lifecycle of all configured channels from `config.yaml`
- `slack.py` / `feishu.py` / `telegram.py` - Platform-specific implementations (`feishu.py` tracks the running card `message_id` in memory and patches the same card in place)
Message Flow:
1. External platform -> channel impl -> `MessageBus.publish_inbound()`
2. `ChannelManager._dispatch_loop()` consumes from the queue
3. For chat: look up/create a thread on the LangGraph Server
4. Feishu chat: `runs.stream()` → accumulate AI text → publish multiple outbound updates (`is_final=False`) → publish the final outbound (`is_final=True`)
5. Slack/Telegram chat: `runs.wait()` → extract the final response → publish outbound
6. The Feishu channel sends one running reply card up front, then patches the same card for each outbound update (the card JSON sets `config.update_multi=true` for Feishu's patch API requirement)
7. For commands (`/new`, `/status`, `/models`, `/memory`, `/help`): handle locally or query the Gateway API
8. Outbound → channel callbacks → platform reply
Configuration (config.yaml -> channels):
- `langgraph_url` - LangGraph Server URL (default: `http://localhost:2024`)
- `gateway_url` - Gateway API URL for auxiliary commands (default: `http://localhost:8001`)
- In Docker Compose, IM channels run inside the `gateway` container, so `localhost` points back to that container. Use `http://langgraph:2024` / `http://gateway:8001`, or set `DEER_FLOW_CHANNELS_LANGGRAPH_URL` / `DEER_FLOW_CHANNELS_GATEWAY_URL`.
- Per-channel configs: `feishu` (app_id, app_secret), `slack` (bot_token, app_token), `telegram` (bot_token)
Memory System (packages/harness/deerflow/agents/memory/)
Components:
- `updater.py` - LLM-based memory updates with fact extraction, whitespace-normalized fact deduplication (trims leading/trailing whitespace before comparing), and atomic file I/O
- `queue.py` - Debounced update queue (per-thread deduplication, configurable wait time)
- `prompt.py` - Prompt templates for memory updates
Data Structure (stored in backend/.deer-flow/memory.json):
- User Context: `workContext`, `personalContext`, `topOfMind` (1-3 sentence summaries)
- History: `recentMonths`, `earlierContext`, `longTermBackground`
- Facts: discrete facts with `id`, `content`, `category` (preference/knowledge/context/behavior/goal), `confidence` (0-1), `createdAt`, `source`
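For orientation, the documented fields suggest a shape roughly like the following. The field names come from this document; the top-level grouping keys and all values are made up for illustration and may not match the real `memory.json` layout:

```python
# Hypothetical example of the memory store's shape; values are invented.
memory_data = {
    "userContext": {  # grouping key assumed, fields documented above
        "workContext": "Backend engineer focused on streaming infrastructure.",
        "personalContext": "Prefers concise answers.",
        "topOfMind": "Debugging token-level streaming.",
    },
    "history": {  # grouping key assumed
        "recentMonths": "Worked on DeerFlowClient streaming fixes.",
        "earlierContext": "Set up the Gateway deployment.",
        "longTermBackground": "Long-time LangGraph user.",
    },
    "facts": [
        {
            "id": "fact-001",
            "content": "User runs DeerFlow in Gateway mode.",
            "category": "context",  # preference/knowledge/context/behavior/goal
            "confidence": 0.9,      # 0-1
            "createdAt": "2025-01-01T00:00:00Z",
            "source": "thread-abc",
        }
    ],
}
```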
Workflow:
1. `MemoryMiddleware` filters messages (user inputs + final AI responses) and queues the conversation
2. The queue debounces (30 s default), batches updates, deduplicates per-thread
3. A background thread invokes the LLM to extract context updates and facts
4. Updates are applied atomically (temp file + rename) with cache invalidation, skipping duplicate fact content before append
5. The next interaction injects the top 15 facts + context into `<memory>` tags in the system prompt
Focused regression coverage for the updater lives in backend/tests/test_memory_updater.py.
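The debounce-and-batch behavior in the workflow above can be sketched as follows. This is a minimal illustration, not the actual `queue.py`; the class and method names are hypothetical.

```python
import threading
import time


class DebouncedQueue:
    """Minimal sketch of a per-thread debounced update queue.

    Hypothetical illustration of the behavior described above, not the
    actual queue.py: repeated enqueues for the same thread_id within the
    debounce window collapse into a single batched flush.
    """

    def __init__(self, debounce_seconds: float, flush):
        self.debounce_seconds = debounce_seconds
        self.flush = flush  # called with (thread_id, batched_messages)
        self._pending: dict[str, list] = {}
        self._timers: dict[str, threading.Timer] = {}
        self._lock = threading.Lock()

    def enqueue(self, thread_id: str, messages: list) -> None:
        with self._lock:
            self._pending.setdefault(thread_id, []).extend(messages)
            # Deduplicate per thread: restart the timer on every new enqueue,
            # so only one flush fires per quiet window.
            if timer := self._timers.get(thread_id):
                timer.cancel()
            timer = threading.Timer(self.debounce_seconds, self._fire, args=(thread_id,))
            self._timers[thread_id] = timer
            timer.start()

    def _fire(self, thread_id: str) -> None:
        with self._lock:
            batch = self._pending.pop(thread_id, [])
            self._timers.pop(thread_id, None)
        if batch:
            self.flush(thread_id, batch)
```

Two quick enqueues for the same thread produce one flush containing both messages after the window elapses.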
Configuration (config.yaml → memory):

- `enabled` / `injection_enabled` - Master switches
- `storage_path` - Path to memory.json
- `debounce_seconds` - Wait time before processing (default: 30)
- `model_name` - LLM for updates (null = default model)
- `max_facts` / `fact_confidence_threshold` - Fact storage limits (100 / 0.7)
- `max_injection_tokens` - Token limit for prompt injection (2000)
Reflection System (packages/harness/deerflow/reflection/)
- `resolve_variable(path)` - Import module and return variable (e.g., `module.path:variable_name`)
- `resolve_class(path, base_class)` - Import and validate class against base class
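A plausible sketch of `resolve_variable` under the documented `module.path:variable_name` convention. The real implementation in `deerflow/reflection/` may differ, particularly in error handling.

```python
import importlib


def resolve_variable(path: str):
    """Resolve a "module.path:variable_name" string to the named variable.

    Sketch of the behavior described above; not the actual DeerFlow code.
    """
    module_path, _, variable_name = path.partition(":")
    if not variable_name:
        raise ValueError(f"expected 'module.path:variable_name', got {path!r}")
    module = importlib.import_module(module_path)
    try:
        return getattr(module, variable_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path} has no attribute {variable_name!r}") from exc
```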
Config Schema
config.yaml key sections:

- `models[]` - LLM configs with `use` class path, `supports_thinking`, `supports_vision`, provider-specific fields
- vLLM reasoning models should use `deerflow.models.vllm_provider:VllmChatModel`; for Qwen-style parsers prefer `when_thinking_enabled.extra_body.chat_template_kwargs.enable_thinking`, and DeerFlow will also normalize the older `thinking` alias
- `tools[]` - Tool configs with `use` variable path and `group`
- `tool_groups[]` - Logical groupings for tools
- `sandbox.use` - Sandbox provider class path
- `skills.path` / `skills.container_path` - Host and container paths to skills directory
- `title` - Auto-title generation (enabled, max_words, max_chars, prompt_template)
- `summarization` - Context summarization (enabled, trigger conditions, keep policy)
- `subagents.enabled` - Master switch for subagent delegation
- `memory` - Memory system (enabled, storage_path, debounce_seconds, model_name, max_facts, fact_confidence_threshold, injection_enabled, max_injection_tokens)
extensions_config.json:

- `mcpServers` - Map of server name → config (enabled, type, command, args, env, url, headers, oauth, description)
- `skills` - Map of skill name → state (enabled)
Both can be modified at runtime via Gateway API endpoints or DeerFlowClient methods.
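A hypothetical example of the extensions_config.json shape, shown as a Python literal. Only the two top-level keys and the per-server field names come from the list above; every server name, command, and URL is made up for illustration.

```python
# Hypothetical extensions_config.json content; server and skill names
# are invented for illustration, only the key structure is documented.
extensions_config = {
    "mcpServers": {
        "example-stdio": {
            "enabled": True,
            "type": "stdio",
            "command": "npx",
            "args": ["-y", "some-mcp-server"],
            "env": {},
            "description": "Example stdio server",
        },
        "example-http": {
            "enabled": False,
            "type": "http",
            "url": "http://localhost:9000/mcp",
            "headers": {},
        },
    },
    "skills": {
        "example-skill": {"enabled": True},
    },
}
```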
Embedded Client (packages/harness/deerflow/client.py)
DeerFlowClient provides direct in-process access to all DeerFlow capabilities without HTTP services. All return types align with the Gateway API response schemas, so consumer code works identically in HTTP and embedded modes.
Architecture: Imports the same deerflow modules that LangGraph Server and Gateway API use. Shares the same config files and data directories. No FastAPI dependency.
Agent Conversation (replaces LangGraph Server):
- `chat(message, thread_id)` — synchronous; accumulates streaming deltas per message id and returns the final AI text
- `stream(message, thread_id)` — subscribes to LangGraph `stream_mode=["values", "messages", "custom"]` and yields `StreamEvent`:
  - `"values"` — full state snapshot (title, messages, artifacts); AI text already delivered via `messages` mode is not re-synthesized here, avoiding duplicate deliveries
  - `"messages-tuple"` — per-chunk update: for AI text this is a delta (concatenate per `id` to rebuild the full message); tool calls and tool results are emitted once each
  - `"custom"` — forwarded from `StreamWriter`
  - `"end"` — stream finished (carries cumulative `usage`, counted once per message id)
- Agent created lazily via `create_agent()` + `_build_middlewares()`, same as `make_lead_agent`
- Supports `checkpointer` parameter for state persistence across turns
- `reset_agent()` forces agent recreation (e.g. after memory or skill changes)
- See docs/STREAMING.md for the full design: why Gateway and DeerFlowClient are parallel paths, LangGraph's `stream_mode` semantics, the per-id dedup invariants, and the regression testing strategy
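The per-id delta accumulation that `chat()` performs over `"messages-tuple"` events can be sketched against simplified stand-in events. The dicts below are not the real `StreamEvent` type, just enough shape to show the logic, including the O(n) list-append/join-once accumulator.

```python
# Simplified stand-in events illustrating per-id delta accumulation;
# the real StreamEvent carries more fields than shown here.
events = [
    {"type": "messages-tuple", "id": "msg-1", "content": "Hel"},
    {"type": "messages-tuple", "id": "msg-1", "content": "lo, "},
    {"type": "messages-tuple", "id": "msg-1", "content": "world"},
    {"type": "end", "usage": {"output_tokens": 3}},
]

# O(n) accumulation: append parts per id, join once at the end
# (repeated string concatenation would be O(n^2) over long streams).
buffers: dict[str, list[str]] = {}
last_id = None
for event in events:
    if event["type"] == "messages-tuple":
        buffers.setdefault(event["id"], []).append(event["content"])
        last_id = event["id"]

final_text = "".join(buffers[last_id]) if last_id else ""
print(final_text)  # Hello, world
```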
Gateway Equivalent Methods (replaces Gateway API):
| Category | Methods | Return format |
|---|---|---|
| Models | `list_models()`, `get_model(name)` | `{"models": [...]}`, `{name, display_name, ...}` |
| MCP | `get_mcp_config()`, `update_mcp_config(servers)` | `{"mcp_servers": {...}}` |
| Skills | `list_skills()`, `get_skill(name)`, `update_skill(name, enabled)`, `install_skill(path)` | `{"skills": [...]}` |
| Memory | `get_memory()`, `reload_memory()`, `get_memory_config()`, `get_memory_status()` | `dict` |
| Uploads | `upload_files(thread_id, files)`, `list_uploads(thread_id)`, `delete_upload(thread_id, filename)` | `{"success": true, "files": [...]}`, `{"files": [...], "count": N}` |
| Artifacts | `get_artifact(thread_id, path)` | `(bytes, mime_type)` tuple |
Key differences from Gateway:

- `upload_files()` accepts local `Path` objects instead of HTTP `UploadFile`, rejects directory paths before copying, and reuses a single worker when document conversion must run inside an active event loop.
- `get_artifact()` returns `(bytes, mime_type)` instead of an HTTP `Response`.
- The new Gateway-only thread cleanup route deletes `.deer-flow/threads/{thread_id}` after LangGraph thread deletion; there is no matching DeerFlowClient method yet.
- `update_mcp_config()` and `update_skill()` automatically invalidate the cached agent.
Tests: `tests/test_client.py` (77 unit tests, including `TestGatewayConformance`) and `tests/test_client_live.py` (live integration tests; requires config.yaml)
Gateway Conformance Tests (TestGatewayConformance): Validate that every dict-returning client method conforms to the corresponding Gateway Pydantic response model. Each test parses the client output through the Gateway model — if Gateway adds a required field that the client doesn't provide, Pydantic raises ValidationError and CI catches the drift. Covers: ModelsListResponse, ModelResponse, SkillsListResponse, SkillResponse, SkillInstallResponse, McpConfigResponse, UploadResponse, MemoryConfigResponse, MemoryStatusResponse.
Development Workflow
Test-Driven Development (TDD) — MANDATORY
Every new feature or bug fix MUST be accompanied by unit tests. No exceptions.
- Write tests in `backend/tests/`, following the existing naming convention `test_<feature>.py`
- Run the full suite before and after your change: `make test`
- Tests must pass before a feature is considered complete
- For lightweight config/utility modules, prefer pure unit tests with no external dependencies
- If a module causes circular import issues in tests, add a `sys.modules` mock in `tests/conftest.py` (see the existing example for `deerflow.subagents.executor`)
```shell
# Run all tests
make test

# Run a specific test file
PYTHONPATH=. uv run pytest tests/test_<feature>.py -v
```
Running the Full Application
From the project root directory:
```shell
make dev
```
This starts all services and makes the application available at http://localhost:2026.
All startup modes:
| | Local Foreground | Local Daemon | Docker Dev | Docker Prod |
|---|---|---|---|---|
| Dev | `./scripts/serve.sh --dev`<br>`make dev` | `./scripts/serve.sh --dev --daemon`<br>`make dev-daemon` | `./scripts/docker.sh start`<br>`make docker-start` | — |
| Dev + Gateway | `./scripts/serve.sh --dev --gateway`<br>`make dev-pro` | `./scripts/serve.sh --dev --gateway --daemon`<br>`make dev-daemon-pro` | `./scripts/docker.sh start --gateway`<br>`make docker-start-pro` | — |
| Prod | `./scripts/serve.sh --prod`<br>`make start` | `./scripts/serve.sh --prod --daemon`<br>`make start-daemon` | — | `./scripts/deploy.sh`<br>`make up` |
| Prod + Gateway | `./scripts/serve.sh --prod --gateway`<br>`make start-pro` | `./scripts/serve.sh --prod --gateway --daemon`<br>`make start-daemon-pro` | — | `./scripts/deploy.sh --gateway`<br>`make up-pro` |
| Action | Local | Docker Dev | Docker Prod |
|---|---|---|---|
| Stop | `./scripts/serve.sh --stop`<br>`make stop` | `./scripts/docker.sh stop`<br>`make docker-stop` | `./scripts/deploy.sh down`<br>`make down` |
| Restart | `./scripts/serve.sh --restart [flags]` | `./scripts/docker.sh restart` | — |
Gateway mode embeds the agent runtime in Gateway; no separate LangGraph server runs.
Nginx routing:
- Standard mode: `/api/langgraph/*` → LangGraph Server (2024)
- Gateway mode: `/api/langgraph/*` → Gateway embedded runtime (8001) (via envsubst)
- `/api/*` (other) → Gateway API (8001)
- `/` (non-API) → Frontend (3000)
Running Backend Services Separately
From the backend directory:
```shell
# Terminal 1: LangGraph server
make dev

# Terminal 2: Gateway API
make gateway
```
Direct access (without nginx):
- LangGraph: http://localhost:2024
- Gateway: http://localhost:8001
Frontend Configuration
The frontend uses environment variables to connect to backend services:
- `NEXT_PUBLIC_LANGGRAPH_BASE_URL` - Defaults to `/api/langgraph` (through nginx)
- `NEXT_PUBLIC_BACKEND_BASE_URL` - Defaults to empty string (through nginx)
When using `make dev` from the project root, the frontend automatically connects through nginx.
Key Features
File Upload
Multi-file upload with automatic document conversion:
- Endpoint: `POST /api/threads/{thread_id}/uploads`
- Supports: PDF, PPT, Excel, Word documents (converted via `markitdown`)
- Rejects directory inputs before copying, so uploads stay all-or-nothing
- Reuses one conversion worker per request when called from an active event loop
- Files stored in thread-isolated directories
- Agent receives uploaded file list via `UploadsMiddleware`
See docs/FILE_UPLOAD.md for details.
Plan Mode
TodoList middleware for complex multi-step tasks:
- Controlled via runtime config: `config.configurable.is_plan_mode = True`
- Provides `write_todos` tool for task tracking
- One task `in_progress` at a time, with real-time updates
See docs/plan_mode_usage.md for details.
Context Summarization
Automatic conversation summarization when approaching token limits:
- Configured in `config.yaml` under the `summarization` key
- Trigger types: tokens, messages, or fraction of max input
- Keeps recent messages while summarizing older ones
See docs/summarization.md for details.
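The three trigger types above can be sketched as a small predicate. This helper and its config shape are hypothetical assumptions for illustration, not the actual summarization code.

```python
def should_summarize(token_count: int, message_count: int, max_input_tokens: int, trigger: dict) -> bool:
    """Hypothetical sketch of the three trigger types listed above.

    trigger is assumed to look like {"type": "tokens" | "messages" | "fraction",
    "value": <threshold>}; the real config shape may differ.
    """
    kind = trigger["type"]
    if kind == "tokens":
        # Summarize once the conversation exceeds an absolute token budget.
        return token_count >= trigger["value"]
    if kind == "messages":
        # Summarize once the message count exceeds a threshold.
        return message_count >= trigger["value"]
    if kind == "fraction":
        # Summarize once tokens exceed a fraction of the model's max input.
        return token_count >= trigger["value"] * max_input_tokens
    raise ValueError(f"unknown trigger type: {kind!r}")
```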
Vision Support
For models with `supports_vision: true`:

- `ViewImageMiddleware` processes images in conversation
- `view_image_tool` added to the agent's toolset
- Images automatically converted to base64 and injected into state
Code Style
- Uses `ruff` for linting and formatting
- Line length: 240 characters
- Python 3.12+ with type hints
- Double quotes, space indentation
Documentation
See docs/ directory for detailed documentation:
- CONFIGURATION.md - Configuration options
- ARCHITECTURE.md - Architecture details
- API.md - API reference
- SETUP.md - Setup guide
- FILE_UPLOAD.md - File upload feature
- PATH_EXAMPLES.md - Path types and usage
- summarization.md - Context summarization
- plan_mode_usage.md - Plan mode with TodoList