* feat(mcp): support custom tool interceptors via extensions_config.json
Add a generic extension point for registering custom MCP tool
interceptors through `extensions_config.json`. This allows downstream
projects to inject per-request header manipulation, auth context
propagation, or other cross-cutting concerns without modifying
DeerFlow source code.
Interceptors are declared as Python callable paths in a new
`mcpInterceptors` array field and loaded via the existing
`resolve_variable` reflection mechanism:
```json
{
"mcpInterceptors": [
"my_package.mcp.auth:build_auth_interceptor"
]
}
```
Each entry must resolve to a no-arg builder function that returns an
async interceptor compatible with `MultiServerMCPClient`'s
`tool_interceptors` interface.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(mcp): add unit tests for custom tool interceptors
Cover all branches of the mcpInterceptors loading logic:
- valid interceptor loaded and appended to tool_interceptors
- multiple interceptors loaded in declaration order
- builder returning None is skipped
- resolve_variable ImportError logged and skipped
- builder raising exception logged and skipped
- absent mcpInterceptors field is safe (no-op)
- custom interceptors coexist with OAuth interceptor
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(mcp): validate mcpInterceptors type and fix lint warnings
Address review feedback:
1. Validate mcpInterceptors config value before iterating:
- Accept a single string and normalize to [string]
- Ignore None silently
- Log warning and skip for non-list/non-string types
2. Fix ruff F841 lint errors in tests:
- Rename _make_mock_env to _make_patches, embed mock_client
- Remove unused `as mock_cls` bindings where not needed
- Extract _get_interceptors() helper to reduce repetition
3. Add two new test cases for type validation:
- test_mcp_interceptors_single_string_is_normalized
- test_mcp_interceptors_invalid_type_logs_warning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(mcp): validate interceptor return type and fix import mock path
Address review feedback:
1. Validate builder return type with callable() check:
- callable interceptor → append to tool_interceptors
- None → silently skip (builder opted out)
- non-callable → log warning with type name and skip
2. Fix test mock path: resolve_variable is a top-level import in
tools.py, so mock deerflow.mcp.tools.resolve_variable instead of
deerflow.reflection.resolve_variable to correctly intercept calls.
3. Add test_custom_interceptor_non_callable_return_logs_warning to
cover the new non-callable validation branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(mcp): add mcpInterceptors example and documentation
- Add mcpInterceptors field to extensions_config.example.json
- Add "Custom Tool Interceptors" section to MCP_SERVER.md with
configuration format, example interceptor code, and edge case
behavior notes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: IECspace <IECspace@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The tool is registered as `present_files` (plural) in present_file_tool.py,
but four references in documentation and prompt strings incorrectly used the
singular form `present_file`. This could cause confusion and potentially
lead to incorrect tool invocations.
Changed files:
- backend/docs/GUARDRAILS.md
- backend/docs/ARCHITECTURE.md
- backend/packages/harness/deerflow/agents/lead_agent/prompt.py (2 occurrences)
* docs: mark memory updater async migration as completed
- Update TODO.md to mark the replacement of sync model.invoke()
with async model.ainvoke() in title_middleware and memory updater
as completed using [x] format
Addresses #2131
* feat: switch memory updater to async LLM calls
- Add async aupdate_memory() method using await model.ainvoke()
- Convert sync update_memory() to use async wrapper
- Add _run_async_update_sync() for nested loop context handling
- Maintain backward compatibility with existing sync API
- Add ThreadPoolExecutor for async execution from sync contexts
Addresses #2131
* test: add tests for async memory updater
- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns
Addresses #2131
* fix: apply ruff formatting to memory updater
- Format multi-line expressions to single line
- Ensure code style consistency with project standards
- Fix lint issues caught by GitHub Actions
* test: add comprehensive tests for async memory updater
- Add test_async_update_memory_uses_ainvoke() to verify async path
- Convert existing tests to use AsyncMock and ainvoke assertions
- Add test_sync_update_memory_wrapper_works_in_running_loop()
- Update all model mocks to use async await patterns
- Ensure backward compatibility with sync API
* fix: satisfy ruff formatting in memory updater test
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
- Move time.sleep() -> asyncio.sleep() from Planned to Completed Features
- Clean up duplicate entries in TODO.md
Ensures completed async optimizations are properly tracked.
* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)
DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values",
"custom"] which only delivers full-state snapshots at graph-node
boundaries, so AI replies were dumped as a single messages-tuple event
per node instead of streaming token-by-token. `client.stream("hello")`
looked identical to `client.chat("hello")` — the bug reported in #1969.
Subscribe to "messages" mode as well, forward AIMessageChunk deltas as
messages-tuple events with delta semantics (consumers accumulate by id),
and dedup the values-snapshot path so it does not re-synthesize AI
text that was already streamed. Introduce a per-id usage_metadata
counter so the final AIMessage in the values snapshot and the final
"messages" chunk — which carry the same cumulative usage — are not
double-counted.
chat() now accumulates per-id deltas and returns the last message's
full accumulated text. Non-streaming mock sources (single event per id)
are a degenerate case of the same logic, keeping existing callers and
tests backward compatible.
Verified end-to-end against a real LLM: a 15-number count emits 35
messages-tuple events with BPE subword boundaries clearly visible
("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across
the window, end-event usage matches the values-snapshot usage exactly
(not doubled). tests/test_client_live.py::TestLiveStreaming passes.
New unit tests:
- test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3
delta events with correct content/id/usage, values-snapshot does not
duplicate, usage counted once.
- test_chat_accumulates_streamed_deltas: chat() rebuilds full text
from deltas.
- test_messages_mode_tool_message: ToolMessage delivered via messages
mode is not duplicated by the values-snapshot synthesis path.
The stream() docstring now documents why this client does not reuse
Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw
LangChain objects vs serialized dicts, single caller vs HTTP fan-out).
Fixes#1969
* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)
Post-review cleanup for the token-level streaming fix. No behavior
change for correct inputs; one efficiency regression fixed.
Fix: chat() O(n²) accumulator
-----------------------------
`chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`,
which is O(n) per concat → O(n²) total over a streamed response. At
~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks
it costs roughly 100-300 ms of pure copying. Switched to
`dict[str, list[str]]` + `"".join()` once at return.
Cleanup
-------
- Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`,
and `_tool_message_event` static helpers. The messages-mode and
values-mode branches previously repeated four inline dict literals each;
they now call the same builders.
- `StreamEvent.type` is now typed as `Literal["values", "messages-tuple",
"custom", "end"]` via a `StreamEventType` alias. Makes the closed set
explicit and catches typos at type-check time.
- Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`,
`.tool_calls`, `.id` all have default values on the base class, so the
`getattr(..., None)` fallbacks were dead code. Removed from the hot
path.
- `_account_usage` parameter type loosened to `Any` so that LangChain's
`UsageMetadata` TypedDict is accepted under strict type checking.
- Trimmed narrating comments on `seen_ids` / `streamed_ids` / the
values-synthesis skip block; kept the non-obvious ones that document
the cross-mode dedup invariant.
Net diff: -15 lines. All 132 unit tests + harness boundary test still
pass; ruff check and ruff format pass.
* docs(backend): add STREAMING.md design note (#1969)
Dedicated design document for the token-level streaming architecture,
prompted by the bug investigation in #1969.
Contents:
- Why two parallel streaming paths exist (Gateway HTTP/async vs
DeerFlowClient sync/in-process) and why they cannot be merged.
- LangGraph's three-layer mode naming (Graph "messages" vs Platform
SDK "messages-tuple" vs HTTP SSE) and why a shared string constant
would be harmful.
- Gateway path: run_agent + StreamBridge + sse_consumer with a
sequence diagram.
- DeerFlowClient path: sync generator + direct yield, delta semantics,
chat() accumulator.
- Why the three id sets (seen_ids / streamed_ids / counted_usage_ids)
each carry an independent invariant and cannot be collapsed.
- End-to-end sequence for a real conversation turn.
- Lessons from #1969: why mock-based tests missed the bug, why
BPE subword boundaries in live output are the strongest
correctness signal, and the regression test that locks it in.
- Source code location index.
Also:
- Link from backend/CLAUDE.md Embedded Client section.
- Link from backend/docs/README.md under Feature Documentation.
* test(backend): add refactor regression guards for stream() (#1969)
Three new tests in TestStream that lock the contract introduced by
PR #1974 so any future refactor (sync->async migration, sharing a
core with Gateway's run_agent, dedup strategy change) cannot
silently change behavior.
- test_dedup_requires_messages_before_values_invariant: canary that
documents the order-dependence of cross-mode dedup. streamed_ids
is populated only by the messages branch, so values-before-messages
for the same id produces duplicate AI text events. Real LangGraph
never inverts this order, but a refactor that does (or that makes
dedup idempotent) must update this test deliberately.
- test_messages_mode_golden_event_sequence: locks the *exact* event
sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1
end) for a canonical streaming turn. List equality gives a clear
diff on any drift in order, type, or payload shape.
- test_chat_accumulates_in_linear_time: perf canary for the O(n^2)
fix in commit 1f11ba10. 10,000 single-char chunks must accumulate
in under 1s; the threshold is wide enough to pass on slow CI but
tight enough to fail if buffer = buffer + delta is restored.
All three tests pass alongside the existing 12 TestStream tests
(15/15). ruff check + ruff format clean.
* docs(backend): clarify stream() docstring on JSON serialization (#1969)
Replace the misleading "raw LangChain objects (AIMessage,
usage_metadata as dataclasses), not dicts" claim in the
"Why not reuse Gateway's run_agent?" section. The implementation
already yields plain Python dicts (StreamEvent.data is dict, and
usage_metadata is a TypedDict), so the original wording suggested
a richer return type than the API actually delivers.
The corrected wording focuses on what is actually true and
relevant: this client skips the JSON/SSE serialization layer that
Gateway adds for HTTP wire transmission, and yields stream event
payloads directly as Python data structures.
Addresses Copilot review feedback on PR #1974.
* test(backend): document none-id messages dedup limitation (#1969)
Add test_none_id_chunks_produce_duplicates_known_limitation to
TestStream that explicitly documents and asserts the current
behavior when an LLM provider emits AIMessageChunk with id=None
(vLLM, certain custom backends).
The cross-mode dedup machinery cannot record a None id in
streamed_ids (guarded by ``if msg_id:``), so the values snapshot's
reassembled AIMessage with a real id falls through and synthesizes
a duplicate AI text event. The test asserts len == 2 and locks
this as a known limitation rather than silently letting future
contributors hit it without context.
Why this is documented rather than fixed:
* Falling back to ``metadata.get("id")`` does not help — LangGraph's
messages-mode metadata never carries the message id.
* Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the
values snapshot uses the same fallback, which it does not.
* A real fix requires provider cooperation (always emit chunk ids)
or content-based dedup (false-positive risk), neither of which
belongs in this PR.
If a real fix lands, replace this test with a positive assertion
that dedup works for None-id chunks.
Addresses Copilot review feedback on PR #1974 (client.py:515).
* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)
- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)
Closes#1940
* docs: clarify deployment sizing guidance (#1963)
* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)
After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).
This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.
* fix(frontend): avoid using route new as thread id (#1967)
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(backend): remove dead getattr in _tool_message_event
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The /api/langgraph/* endpoints proxy straight to the LangGraph server,
so clients inherit LangGraph's native recursion_limit default of 25
instead of the 100 that build_run_config sets for the Gateway and IM
channel paths. 25 is too low for plan-mode or subagent runs and
reliably triggers GraphRecursionError on the lead agent's final
synthesis step after subagents return.
Set recursion_limit: 100 in the Create Run example and the cURL
snippet, and add a short note explaining the discrepancy so users
following the docs don't hit the 25-step ceiling as a surprise.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Rename BACKEND_TODO.md to TODO.md in documentation
* Update MCP Setup Guide link in CONTRIBUTING.md
* Update reference to config.yaml path in documentation
* Fix config file path in TITLE_GENERATION_IMPLEMENTATION.md
Updated the path to the example config file in the documentation.
* feat(agent): 为AgentConfig添加skills字段并更新lead_agent系统提示
在AgentConfig中添加skills字段以支持配置agent可用技能
更新lead_agent的系统提示模板以包含可用技能信息
* fix: resolve agent skill configuration edge cases and add tests
* Update backend/packages/harness/deerflow/agents/lead_agent/prompt.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* refactor(agent): address PR review comments for skills configuration
- Add detailed docstring to `skills` field in `AgentConfig` to clarify the semantics of `None` vs `[]`.
- Add unit tests in `test_custom_agent.py` to verify `load_agent_config()` correctly parses omitted skills and explicit empty lists.
- Fix `test_make_lead_agent_empty_skills_passed_correctly` to include `agent_name` in the runtime config, ensuring it exercises the real code path.
* docs: 添加关于按代理过滤技能的配置说明
在配置示例文件和文档中添加说明,解释如何通过代理的config.yaml文件限制加载的技能
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix path for TitleMiddleware implementation
* Fix link to Provisioner Setup Guide in CONFIGURATION.md
* Update file path for TitleMiddleware implementation
* Update image paths in Leica photography article
* fix(security): disable host bash by default in local sandbox
* fix(security): address review feedback for local bash hardening
* fix(ci): sort live test imports for lint
* style: apply backend formatter
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: Add github PAT configs, allowing larger github API rates.
* Update comment to English for better clarity
* fix: Remove unused config lines in config.example.yaml and unreferenced declarations in app_config. Fix lint issues and update documentation.
* fix: Remove unused imports, and passed the ruff check.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180)
When using Gemini with thinking enabled through an OpenAI-compatible gateway,
the API requires that fields on thinking content blocks are
preserved and echoed back verbatim in subsequent requests. Standard
silently drops these signatures when serializing
messages, causing HTTP 400 errors:
Changes:
- Add PatchedChatOpenAI adapter that re-injects signed thinking blocks into
request payloads, preserving the signature chain across multi-turn
conversations with tool calls.
- Support two LangChain storage patterns: additional_kwargs.thinking_blocks
and content list.
- Add 11 unit tests covering signed/unsigned blocks, storage patterns, edge
cases, and precedence rules.
- Update config.example.yaml with Gemini + thinking gateway example.
- Update CONFIGURATION.md with detailed guidance and error explanation.
Fixes: #1180
* Updated the patched_openai.py with thought_signature of function call
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* docs: fix inaccurate thought_signature description in CONFIGURATION.md (#1220)
* Initial plan
* docs: fix CONFIGURATION.md wording for thought_signature - tool-call objects, not thinking blocks
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/360f5226-4631-48a7-a050-189094af8ffe
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
* refactor: extract shared skill installer and upload manager to harness
Move duplicated business logic from Gateway routers and Client into
shared harness modules, eliminating code duplication.
New shared modules:
- deerflow.skills.installer: 6 functions (zip security, extraction, install)
- deerflow.uploads.manager: 7 functions (normalize, deduplicate, validate,
list, delete, get_uploads_dir, ensure_uploads_dir)
Key improvements:
- SkillAlreadyExistsError replaces stringly-typed 409 status routing
- normalize_filename rejects backslash-containing filenames
- Read paths (list/delete) no longer mkdir via get_uploads_dir
- Write paths use ensure_uploads_dir for explicit directory creation
- list_files_in_dir does stat inside scandir context (no re-stat)
- install_skill_from_archive uses single is_file() check (one syscall)
- Fix agent config key not reset on update_mcp_config/update_skill
Tests: 42 new (22 installer + 20 upload manager) + client hardening
* refactor: centralize upload URL construction and clean up installer
- Extract upload_virtual_path(), upload_artifact_url(), enrich_file_listing()
into shared manager.py, eliminating 6 duplicated URL constructions across
Gateway router and Client
- Derive all upload URLs from VIRTUAL_PATH_PREFIX constant instead of
hardcoded "mnt/user-data/uploads" strings
- Eliminate TOCTOU pre-checks and double file read in installer — single
ZipFile() open with exception handling replaces is_file() + is_zipfile()
+ ZipFile() sequence
- Add missing re-exports: ensure_uploads_dir in uploads/__init__.py,
SkillAlreadyExistsError in skills/__init__.py
- Remove redundant .lower() on already-lowercase CONVERTIBLE_EXTENSIONS
- Hoist sandbox_uploads_dir(thread_id) before loop in uploads router
* fix: add input validation for thread_id and filename length
- Reject thread_id containing unsafe filesystem characters (only allow
alphanumeric, hyphens, underscores, dots) — prevents 500 on inputs
like <script> or shell metacharacters
- Reject filenames longer than 255 bytes (OS limit) in normalize_filename
- Gateway upload router maps ValueError to 400 for invalid thread_id
* fix: address PR review — symlink safety, input validation coverage, error ordering
- list_files_in_dir: use follow_symlinks=False to prevent symlink metadata
leakage; check is_dir() instead of exists() for non-directory paths
- install_skill_from_archive: restore is_file() pre-check before extension
validation so error messages match the documented exception contract
- validate_thread_id: move from ensure_uploads_dir to get_uploads_dir so
all entry points (upload/list/delete) are protected
- delete_uploaded_file: catch ValueError from thread_id validation (was 500)
- requires_llm marker: also skip when OPENAI_API_KEY is unset
- e2e fixture: update TitleMiddleware exclusion comment (kept filtering —
middleware triggers extra LLM calls that add non-determinism to tests)
* chore: revert uv.lock to main — no dependency changes in this PR
* fix: use monkeypatch for global config in e2e fixture to prevent test pollution
The e2e_env fixture was calling set_title_config() and
set_summarization_config() directly, which mutated global singletons
without automatic cleanup. When pytest ran test_client_e2e.py before
test_title_middleware_core_logic.py, the leaked enabled=False caused
5 title tests to fail in CI.
Switched to monkeypatch.setattr on the module-level private variables
so pytest restores the originals after each test.
* fix: address code review — URL encoding, API consistency, test isolation
- upload_artifact_url: percent-encode filename to handle spaces/#/?
- deduplicate_filename: mutate seen set in place (caller no longer
needs manual .add() — less error-prone API)
- list_files_in_dir: document that size is int, enrich stringifies
- e2e fixture: monkeypatch _app_config instead of set_app_config()
to prevent global singleton pollution (same pattern as title/summarization fix)
- _make_e2e_config: read LLM connection details from env vars so
external contributors can override defaults
- Update tests to match new deduplicate_filename contract
* docs: rewrite RFC in English and add alternatives/breaking changes sections
* fix: address code review feedback on PR #1202
- Rename deduplicate_filename to claim_unique_filename to make
the in-place set mutation explicit in the function name
- Replace PermissionError with PathTraversalError(ValueError) for
path traversal detection — malformed input is 400, not 403
* fix: set _app_config_is_custom in e2e test fixture to prevent config.yaml lookup in CI
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
* fix(threads): clean up local thread data after thread deletion
Delete DeerFlow-managed thread directories after the web UI removes a LangGraph thread.
This keeps local thread data in sync with conversation deletion and adds regression coverage for the cleanup flow.
* fix(threads): address thread cleanup review feedback
Encode thread cleanup URLs in the web client, keep cache updates explicit when no thread search data is cached, and return a generic 500 response from the cleanup endpoint while documenting the sanitized error behavior.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add GuardrailMiddleware that evaluates every tool call before execution.
Three provider options: built-in AllowlistProvider (zero deps), OAP passport
providers (open standard), or custom providers loaded by class path.
- GuardrailProvider protocol with GuardrailRequest/Decision dataclasses
- GuardrailMiddleware (AgentMiddleware, position 5 in chain)
- AllowlistProvider for simple deny/allow by tool name
- GuardrailsConfig (Pydantic singleton, loaded from config.yaml)
- 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp
- Comprehensive docs at backend/docs/GUARDRAILS.md
Closes#1213
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add Claude Code OAuth and Codex CLI providers
Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.\n\nCarries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review.
* fix: harden CLI credential loading
Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests.
* refactor: tighten codex auth typing
Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized.
* fix: load Claude Code OAuth from Keychain
Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path.
* fix: require explicit Claude OAuth handoff
* style: format thread hooks reasoning request
* docs: document CLI-backed auth providers
* fix: address provider review feedback
* fix: harden provider edge cases
* Fix deferred tools, Codex message normalization, and local sandbox paths
* chore: narrow PR scope to OAuth providers
* chore: remove unrelated frontend changes
* chore: reapply OAuth branch frontend scope cleanup
* fix: preserve upload guards with reasoning effort wiring
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(harness): normalize structured content for titles
Flatten structured LangChain message content before prompting the title model so list/block payloads don't leak Python reprs into generated thread titles.
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
* Apply suggestions from code review
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Add MiniMax as an OpenAI-compatible model provider
MiniMax offers high-performance LLMs (M2.5, M2.5-highspeed) with
204K context windows. This commit adds MiniMax as a selectable
provider in the configuration system.
Changes:
- Add MiniMax to SUPPORTED_MODELS with model definitions
- Add MiniMax provider configuration in conf/config.yaml
- Update documentation with MiniMax setup instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update README to remove MiniMax API details
Removed mention of MiniMax API usage and configuration examples.
---------
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat: add IM channels system for Feishu, Slack, and Telegram integration
Bridge external messaging platforms to DeerFlow via LangGraph Server with
async message bus, thread management, and per-channel configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address review comments on IM channels system
Fix topic_id handling in store remove/list_entries and manager commands,
correct Telegram reply threading, remove unused imports/variables, update
docstrings and docs to match implementation, and prevent config mutation.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* update skill creator
* fix im reply text
* fix comments
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
add oauth schema to MCP server config (extensions_config.json)
support client_credentials and refresh_token grants
implement token manager with caching and pre-expiry refresh
inject OAuth Authorization header for MCP tool discovery and tool calls
extend MCP gateway config models to read/write OAuth settings
update docs and examples for OAuth configuration
add unit tests for token fetch/cache and header injection
* feat: add Novita AI as optional LLM provider
Adds Novita AI (https://novita.ai) as an optional, OpenAI-compatible
LLM provider.
Changes:
- Added Novita model configuration example in config.example.yaml
- Added NOVITA_API_KEY to .env.example
Usage: Set NOVITA_API_KEY in your environment and use novita-gpt-4
as the model name.
* update correct model info
* Update README.md
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Add accurate token counting using tiktoken library and significantly enhance
memory update prompts with detailed section guidelines, multilingual support,
and improved fact extraction. Update deep-research skill to be more proactive
for research queries.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add native Apple Container support for better performance on macOS while
maintaining full Docker compatibility. Enhance documentation with memory system
details, development guidelines, and sandbox setup instructions. Improve dev
experience with container image pre-pulling and unified cleanup tools.
Key changes:
- Auto-detect and prefer Apple Container on macOS with Docker fallback
- Add APPLE_CONTAINER.md with complete usage and troubleshooting guide
- Document memory system architecture in CLAUDE.md
- Add make setup-sandbox for pre-pulling container images
- Create cleanup-containers.sh for cross-runtime container cleanup
- Update all related documentation (README, SETUP, config examples)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add README.md with project overview, quick start, and API reference
- Add CONTRIBUTING.md with development setup and contribution guidelines
- Add docs/ARCHITECTURE.md with detailed system architecture diagrams
- Add docs/API.md with complete API reference for LangGraph and Gateway
- Add docs/README.md as documentation index
- Update CLAUDE.md with improved structure and new features
- Update docs/TODO.md to reflect current status
- Update pyproject.toml description
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implement a skills framework that enables specialized workflows for
specific tasks (e.g., PDF processing, web page generation). Skills are
discovered from the skills/ directory and automatically mounted in
sandboxes with path mapping support.
- Add SkillsConfig for configuring skills path and container mount point
- Implement dynamic skill loading from SKILL.md files with YAML frontmatter
- Add path mapping in LocalSandbox to translate container paths to local paths
- Mount skills directory in AIO Docker sandbox containers
- Update lead agent prompt to dynamically inject available skills
- Add setup documentation and expand config.example.yaml
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add AioSandboxProvider for Docker-based sandbox execution with
configurable container lifecycle, volume mounts, and port management
- Add TitleMiddleware to auto-generate thread titles after first
user-assistant exchange using LLM
- Add Claude Code documentation (CLAUDE.md, AGENTS.md)
- Extend SandboxConfig with Docker-specific options (image, port, mounts)
- Fix hardcoded mount path to use expanduser
- Add agent-sandbox and dotenv dependencies
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>