302 Commits

Author SHA1 Message Date
ly-wang19
8db16bb3d8
fix(config): coerce null config.yaml list sections to empty list (#3434)
Copying config.example.yaml to config.yaml and starting DeerFlow crashed with `pydantic ValidationError: models — Input should be a valid list [input_value=None]`, because the example ships every entry under `models:` commented out, so PyYAML parses the key as null. Reported in #1444.

Add a field_validator(mode="before") on AppConfig that coerces null models/tools/tool_groups to [] (matching their default_factory=list), and emit an actionable warning from from_file when no models are configured (pointing to config.example.yaml / make setup). Adds regression tests.

Closes #1444

Co-authored-by: ly-wang19 <ly-wang19@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-09 15:45:28 +08:00
AochenShen99
0fb18e368c
refactor(lead-agent): make build_middlewares public to drop the last cross-module private import (#3458)
`client.py` imported the private `_build_middlewares` from `agent.py` across a
module boundary and called it as public API. Because the `_` name signals
"module-private, no external callers", any future rename or signature change
silently breaks the embedded `DeerFlowClient` path — and the test suite even
monkeypatched `deerflow.client._build_middlewares`, baking the leak in.

`DeerFlowClient` is a lead-agent variant that genuinely needs the lead agent's
full middleware composition, so make the dependency honest: promote the helper
to a documented public entry point `build_middlewares` and update every in-repo
caller. Found during #3341 review; #3341 already removed one such leak
(`_assemble_deferred` -> public `assemble_deferred_tools`) and left this one out
of scope on purpose.

- agent.py: rename def + both internal call sites; expand the docstring into a
  public-entry-point contract and document the previously-undocumented
  model_name / app_config / deferred_setup params
- client.py: import + call site now use the public name (removes the last
  cross-module private import)
- scripts/tool-error-degradation-detection.sh: update its import + call site
- tests (5 files): update monkeypatch/patch targets and direct calls
- docs (backend/CLAUDE.md, plan_mode_usage.md, middlewares.mdx): sync the live
  references that describe the symbol as current API

Pure mechanical rename, no behavior change. Historical design docs (rfc,
superpowers spec) intentionally keep the old name as point-in-time records.

Closes #3431
2026-06-09 11:56:28 +08:00
Ryker_Feng
f92a26d56f
fix(web_fetch): support proxy for Jina reader in restricted networks (#3418) (#3430)
* fix(web_fetch): support proxy for Jina reader in restricted networks

The web_fetch tool built a bare httpx.AsyncClient() with no proxy
awareness, so users behind a corporate proxy / in Docker / WSL could
not reach https://r.jina.ai and web_fetch timed out.

- Add optional `proxy` / `trust_env` params to JinaClient.crawl and
  wire them from the `web_fetch` tool config (with type coercion for
  YAML string values).
- Pass internal service hostnames through NO_PROXY in both compose
  files so proxy env inherited via env_file does not break in-cluster
  calls (gateway/provisioner/etc).
- Load proxy vars from .env into the shell in scripts/docker.sh so the
  NO_PROXY interpolation can merge user-provided values on `make` path.
- Document proxy/trust_env options in config.example.yaml.

Closes #3418

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-08 23:25:29 +08:00
AochenShen99
3b6dd0a4e3
feat(subagents): extend deferred MCP tool loading to subagents (#3432)
* feat(subagents): extend deferred MCP tool loading to subagents (#3341)

Subagents now reuse the lead agent's deferred-tool path: when
tool_search.enabled, MCP tool schemas are withheld from the model and
surfaced by name in <available-deferred-tools>, fetched on demand via the
generated tool_search helper. DeferredToolFilterMiddleware deterministically
rewrites request.tools to hide the deferred schemas (the prompt section is
discovery only, not enforcement).

Consolidates the assembly into deerflow.tools.builtins.tool_search, now the
single home for both assemble_deferred_tools (centralized fail-closed guard,
replacing the lead-only private _assemble_deferred) and the relocated
get_deferred_tools_prompt_section. Shared by every build path: lead agent,
embedded client, and subagent executor.

tool_search is appended after the subagent's name-level tool policy and is
treated as infrastructure: its catalog is built from the already
policy-filtered list, so it can never surface a tool the policy denied.

Follow-up to #3370. Fixes #3341.

* test(subagents): assert the real middleware builder emits a working deferred filter (#3341)

The existing recipe test hand-constructs DeferredToolFilterMiddleware, so it
cannot catch a regression in how build_subagent_runtime_middlewares (the call
executor._create_agent actually makes) wires the deferred setup into the
filter. Add a test that sources the filter from the real builder given a real
setup and runs it through a graph: a wrong catalog hash would silently stop
promotion, a dropped filter would stop hiding — both now caught.

Running the full real middleware stack is intentionally avoided (the other
runtime middlewares need sandbox/thread infra to execute, which would make the
test flaky); their attachment + ordering before Safety stays locked in
test_tool_error_handling_middleware.py.

* test(subagents): keep executor tests config-free in CI

* chore: trigger ci

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-08 23:17:22 +08:00
DanielWalnut
cd5bedaa74
feat: MiniMax provider for image/video/podcast skills + new music-generation skill (#3437)
* docs(spec): MiniMax integration for generation skills + new music skill

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(plan): MiniMax generation providers implementation plan

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(skills): add importlib loader + FakeResp for skill tests

* test(skills): register loaded module in sys.modules; raise requests.HTTPError in FakeResp

* feat(image-generation): add MiniMax provider with env auto-detect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(image-generation): guard unknown provider, derive ref MIME, strengthen tests

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(video-generation): add MiniMax provider with async poll/download

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(video-generation): surface base_resp errors while polling; add timeout test

* feat(podcast-generation): add MiniMax t2a_v2 provider with env auto-detect

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(podcast-generation): restore TTS credential guard; add volcengine + voice tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(music-generation): new MiniMax music skill via skill-creator

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(music-generation): treat empty lyrics as absent; test no-audio-data path

* refactor(skills): add request timeouts to MiniMax network calls

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Potential fix for pull request finding 'Explicit returns mixed with implicit (fall through) returns'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* fix(models): strip inconsistent user-message names for MiniMax chat

DeerFlow middlewares tag user messages with provenance names (user-input, summary, loop_warning); langchain serializes them into the OpenAI-compatible payload and MiniMax rejects mismatched user-message names with "user name must be consistent (2013)". PatchedChatMiniMax now drops the per-message name from user-role messages. Point the config.example MiniMax models at PatchedChatMiniMax so they also get reasoning_content mapping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(image-generation): MiniMax sends JSON prompt field, guard 1500-char limit

MiniMax image-01 takes one text string capped at 1500 chars, but the skill was sending the whole structured JSON. The MiniMax provider now extracts the JSON `prompt` field (relying on prompt_optimizer to expand it) and fails fast with a clear error before calling the API when that field exceeds 1500 chars. Authoring stays provider-agnostic; Gemini still receives the full JSON.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(podcast-generation): per-provider TTS concurrency and retry/backoff

Each TTS provider owns its concurrency internally — MiniMax runs single-threaded to reduce rate-limit failures, Volcengine keeps 4 workers — with automatic retry and backoff on transient HTTP and base_resp errors. No caller-facing concurrency knob.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(skills): address Copilot review comments on generation skills

- video: add raise_for_status + timeout to the Gemini download/POST/poll calls so non-2xx responses surface as clear HTTP errors instead of JSON/KeyError or hangs
- video: check the task Fail status before the generic base_resp check so the failure keeps its task_id context
- video/image: create the output file parent directory before writing (matching music-generation) so nested output paths do not raise FileNotFoundError
- music: require a non-empty prompt and fail fast with ValueError instead of sending an empty prompt to the API

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(scripts): reclaim dev ports across worktrees in make stop/dev

All deer-flow worktrees (main checkout + linked worktrees) hardcode the same dev ports (8001/3000/2026), so a service started from any worktree must be reclaimable from another. stop_all now resolves the set of worktree roots (DEERFLOW_ROOTS) and treats a process as deer-flow-owned when its open files live under any of them. It also force-kills survivors on 2026 alongside 8001/3000, fixing `make dev` aborting on the nginx port preflight when a prior nginx lingered on 2026.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(view-image): hide the injected image-context message from the UI

ViewImageMiddleware injects a HumanMessage (text + base64 images) so the vision model can see viewed images, but it was the only internal injector that set neither hide_from_ui nor a hidden name, so it leaked into the chat UI (and IM channels) as a user bubble reading "Here are the images you've viewed:". Mark it with additional_kwargs={"hide_from_ui": True}, matching todo/dynamic_context injections, which the frontend isHiddenFromUIMessage and the channel sender already honor. The model still receives the full content.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(minimax): mark M2.7 models as text-only (no vision)

MiniMax M2.7 / M2.7-highspeed do not support vision; only M3 does. The
provider config asserted vision support for M2.7 in four places.

- config.example.yaml: 4 M2.7 entries -> supports_vision: false
- backend/docs/CONFIGURATION.md: M2.7 + highspeed -> supports_vision: false
- wizard: add LLMProvider.model_vision_overrides + extra_config_for() so
  selecting an M2.7 model writes supports_vision: false while M3 (default)
  keeps vision; wire it through setup_wizard.py
- tests: M2.7-highspeed fixture -> supports_vision=False; add
  test_minimax_vision_is_per_model

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
2026-06-08 22:04:38 +08:00
Huixin615
64d923b0fd
fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes (#3417)
* fix(middleware): externalize oversized tool output into sandbox for non-mounted sandboxes

ToolOutputBudgetMiddleware persisted oversized tool results to the host
filesystem and returned a /mnt/user-data/outputs virtual path. For sandboxes
that do not use thread-data mounts (e.g. remote AIO sandbox), that virtual
path does not exist inside the sandbox, so the model's read_file tool could
not read it back and reported 'file not found'.

Branch on SandboxProvider.uses_thread_data_mounts:

- Mounted sandboxes (local Docker, AIO + LocalContainerBackend) keep the
  original host-disk path; the host outputs dir is bind-mounted to the same
  virtual path inside the sandbox, so behavior is unchanged.

- Non-mounted (remote) sandboxes externalize into the sandbox itself via
  execute_command('mkdir -p ...') + write_file + 'test -s' validation. The
  validation step is required because AIO sandbox execute_command returns
  'Error: ...' as a string on failure instead of raising, so a silent mkdir
  failure would otherwise leak through.

Any failure (rejected subdir, mkdir/write/validate error) falls back to the
existing inline head+tail truncation, so an unreadable path is never returned
to the model.

The sandbox resolver reads the sandbox_id that SandboxMiddleware already
writes into runtime.state['sandbox']; it never calls provider.acquire(),
keeping the tool-call hot path free of blocking I/O. Tools that do not use a
sandbox (web_search, MCP, ...) resolve to None and fall through to inline
truncation, which is the safe behavior for them.

Fixes #3416

* fix(middleware): address Copilot review feedback on sandbox externalization

- Make get_sandbox_provider() lookup best-effort in _budget_content: only
  query when outputs_path or sandbox is available, and fall back to inline
  truncation if provider initialization raises rather than propagating
  the error. A resolved sandbox instance is sufficient on its own to take
  the non-mounted externalization branch.
- Strict-match the sandbox post-write validation echo
  (check.strip() == 'OK') to avoid false positives if execute_command
  ever surfaces unrelated stdout/stderr containing 'OK' as a substring.

Refs: #3417

* test: fix flaky tests relying on /nonexistent/... path under container root

Two tests in this module (test_returns_none_on_invalid_path and
test_fallback_when_disk_write_fails) used paths like
'/nonexistent/impossible/path' to trigger _externalize's OSError
fallback. These paths are creatable when the test process runs as root
inside the CI container: os.makedirs(..., exist_ok=True) successfully
creates the entire chain under /, so the OSError branch is never hit
and the tests fail. Reproducible on main independently of this PR.

Switch to '/dev/null/cannot-mkdir-here'. /dev/null is a character
device on both Linux and macOS, so os.makedirs always fails with
NotADirectoryError regardless of privileges, reliably exercising the
OSError fallback.

* fix(tool-output-budget): only consult sandbox provider when a sandbox is resolved

The previous revision called get_sandbox_provider() whenever externalization
was triggered, including on the legacy host-disk path. Environments without
a configured sandbox -- in particular CI runners without a config.yaml --
would raise FileNotFoundError there, get caught, and silently fall back to
inline truncation. That defeated the host-disk externalization path that
predates this PR and was the root cause of the regressing legacy tests.

Restructure the branching so the provider is only consulted when a sandbox
has actually been resolved for the current tool call:

  - sandbox resolved + provider.uses_thread_data_mounts: host-disk write
    (bind-mounted into the sandbox, equivalent to a sandbox-side write).
  - sandbox resolved + non-mounted provider:             sandbox write (#3416).
  - no sandbox + outputs_path:                           host-disk write
    (legacy / non-sandbox tools, no provider call at all).
  - otherwise:                                           inline fallback.

No test changes; the legacy externalization tests are provider-agnostic by
construction and now pass without monkeypatching.

Refs: #3416

* test(tool-output-budget): assert legacy path does not call sandbox provider

Lock in the contract introduced by d6e2d25b: when no sandbox is resolved
for a tool call, _budget_content must externalize to the host outputs
directory without consulting get_sandbox_provider(). Regressing this would
re-break legacy / non-sandbox tools in environments without a configured
sandbox (e.g. CI without config.yaml), which is the failure mode #3416's
fix avoids.

The test injects a get_sandbox_provider that raises on call, so any
future refactor that moves the provider lookup out of the sandbox-only
branch will fail loudly.

Refs: #3416
2026-06-08 12:24:48 +08:00
Willem Jiang
519200728a
fix(middleware): offload memory injection off event loop to prevent tiktoken blocking (#3402) (#3411)
* fix(middleware): offload memory injection off event loop to prevent tiktoken blocking (#3402)

  DynamicContextMiddleware.abefore_agent() called _inject() synchronously
  on the asyncio event loop.  The first time memory is injected (second
  request), _inject() → format_memory_for_injection() → _count_tokens()
  → tiktoken.get_encoding("cl100k_base") needs to download the BPE data
  from openaipublic.blob.core.windows.net.  In network-restricted
  environments this download blocks until the OS TCP timeout (~26 min),
  starving ALL concurrent handlers including /api/v1/auth/me.

  Fix:
  - abefore_agent now uses asyncio.to_thread(self._inject, state) so
    file I/O and tiktoken never block the event loop.
  - Extract _get_tiktoken_encoding() with a module-level cache so
    tiktoken.get_encoding() is called at most once per encoding name.
  - Add warm_tiktoken_cache() startup helper; gateway lifespan pre-warms
    the cache via asyncio.to_thread so the first request never triggers a
    cold download.
  - _count_tokens falls back to len(text) // 4 on any encoding failure.

  Tests:
  - tests/test_tiktoken_cache_and_count_tokens.py (12 tests): cache
    hit/miss, fallback paths, warm-up helper.
  - tests/blocking_io/test_dynamic_context_middleware.py (2 tests):
    Blockbuster gate verifies abefore_agent does not block the event
    loop; async/sync parity check.

  Fixes #3402

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix the lint error

* fix(memory): use future annotations to avoid NameError when tiktoken is absent

Add `from __future__ import annotations` to prompt.py so that
tiktoken.Encoding type hints are never evaluated at runtime.  Without
this, environments where tiktoken is not installed could raise NameError
on the module-level cache and function return annotations.

Addresses Copilot review comment on PR #3411.

* fix(middleware): bound abefore_agent injection with timeout to prevent hung requests

Wrap the asyncio.to_thread(self._inject) offload in asyncio.wait_for()
with a 5-second cap.  If the startup warm-up failed silently (e.g.
network blip during deploy), a cold tiktoken BPE download on the first
request can block until the OS TCP timeout (~26 min).  The bounded
timeout ensures the request degrades gracefully (no memory/date context
for that turn) rather than hanging.

Adds test_abefore_agent_returns_none_on_timeout to the blocking-IO
regression anchors.

Addresses review feedback from xg-gh-25 on PR #3411.

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-08 12:21:55 +08:00
Nan Gao
f725a963d5
fix(runtime): protect sync singleton init and reset (#3413)
* fix(runtime): protect sync singleton init/reset with threading.Lock

* fix(runtime): serialize sync singleton init and reset

* make format

* test(runtime): assert store reset creates new singleton

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix(runtime): load config outside singleton locks

* fix(runtime): share checkpointer config loading helper

---------

Co-authored-by: GODDiao <diaoshengjia@gmail.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-08 08:38:36 +08:00
Nan Gao
10c1d9f417
fix(search): fix DDGS Wikipedia region handling (#3423) 2026-06-08 07:59:50 +08:00
Xinmin Zeng
8d2e55a05f
fix(subagent): structured subagent_status field over text parsing (#3146) (#3154)
* fix(subagent): structured subagent_status field over text parsing

Closes #3146.

## Why

The frontend used to derive subtask card state by string-matching the
leading text of the `task` tool's result. That contract surface was
fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases
where new backend wording (`Task cancelled by user.`,
`Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware`
exception wrappers) silently broke the card lifecycle. The frontend
fallback kept growing more prefixes; any future rewording would break
it again.

## Design

1. **Backend → frontend contract**: `ToolMessage.additional_kwargs`
   carries `subagent_status` (one of `completed | failed | cancelled |
   timed_out | polling_timed_out`) and an optional `subagent_error`
   blob. The frontend prefers it over parsing `content`.

2. **Centralised stamping, not 8 sprinkled stamps**: rather than have
   each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:`
   paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware`
   stamps the field after every task-tool call. Adding a new return
   path in `task_tool.py` cannot now skip the stamp.

3. **Cross-language contract fixture**: the prefix→status mapping is
   the one piece both sides must agree on. The shared fixture at
   `contracts/subagent_status_contract.json` lists every backend return
   string, the expected status, and what the error substring should
   contain. Backend test (`backend/tests/test_subagent_status_contract.py`)
   and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`)
   both load that fixture and assert the same cases. A wording drift on
   either side fails the matching language's test.

4. **Round-trip serialisation pinned**: the round-trip test asserts
   `ToolMessage.model_dump_json()` → `model_validate_json()` preserves
   `additional_kwargs.subagent_status`. Catches the case where a future
   LangChain or Pydantic upgrade silently strips unknown kwargs.

5. **Frontend status collapse documented**: the backend has five status
   values, the frontend card has three (`completed | failed |
   in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all
   collapse to `failed` with the original status preserved in `error`.
   `parseSubtaskResult` returns `in_progress` for unknown values so a
   backend that ships a new enum variant before the frontend upgrades
   degrades to the legacy prefix fallback instead of getting pinned.

## Changes

Backend:
- `deerflow.subagents.status_contract` — new module exporting
  `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`,
  `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and
  `make_subagent_additional_kwargs(status, error)`.
- `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status`
  helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call`
  stamp on the success path; `_build_error_message` stamps on the
  wrapper path (carrying `ExcClass: detail` into `subagent_error`).
  Non-task tools are untouched.
- New tests: `test_subagent_status_contract.py` (19 cases from the
  shared fixture + status-enum / blank-error / unknown-status
  rejection) and `test_tool_error_handling_subagent_stamp.py`
  (middleware integration: terminal-content stamps, non-terminal
  doesn't, non-task tools untouched, async path mirrors sync,
  existing additional_kwargs survive, JSON round-trip preserved).

Frontend:
- `parseSubtaskResult(text, additionalKwargs?)` — prefers the
  structured stamp; falls back to the legacy prefix matcher for
  historical threads / unknown future status values.
- `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse.
- `message-list.tsx` passes `message.additional_kwargs` through.
- `subtask-result.test.ts` adds a structured-status block + a
  fixture-driven contract block; legacy prefix tests stay green for
  the fallback path.

Contract:
- `contracts/subagent_status_contract.json` — single source of truth
  both languages load. Whitespace variants, varied N for polling
  timeouts, the 3 pre-execution `Error:` returns task_tool produces,
  and the middleware wrapper shape are all in there.

## Test plan
- `make lint` clean (backend + frontend).
- `pytest tests/test_subagent_status_contract.py
   tests/test_tool_error_handling_subagent_stamp.py` → 37 passed.
- `pnpm test --run` → 103 passed (was 76, +27 new).

## Migration / fallback retirement

The text-prefix fallback stays in place until backend telemetry shows
the frontend never hits it for newly produced messages. At that point
a follow-up PR can drop the prefix branches and keep only the
structured-status branch.

Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131
(prior prefix-only fix), #3146 (this issue).

* fix(subtask): back-fill result/error from text when structured status present

Three follow-ups on the PR #3154 review:

1. `readStructuredStatus` no longer short-circuits the prefix parse.
   The backend currently stamps only the `subagent_status` enum value;
   the human-facing `result` body and wrapped-error message still live
   in `ToolMessage.content`. Dropping the text parse meant successful
   tasks rendered empty completed pills and wrapped failures lost their
   diagnostic. Now both shapes get composed: structured status wins,
   `result`/`error` come from text when both sides agree, and a lying
   success body under a `failed` stamp is dropped instead of leaking.

2. Replace the ESM-incompatible `__dirname` fixture lookup in
   subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`.
   The frontend package is `"type": "module"`, so the previous path
   would have thrown at runtime if anything ever changed under the
   contract directory.

3. Drop the `$schema` reference from contracts/subagent_status_contract.json
   pointing at a file that doesn't exist in the tree.

Three new tests cover the structured + text composition: completed
back-fills the success body, failed back-fills the wrapper text, and
unrecognised content under a `failed` stamp stays empty rather than
echoing noise.
2026-06-07 22:49:55 +08:00
Ryker_Feng
d8b728f7cb
fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379) (#3392)
* fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379)

Adopt an owner-task lifecycle for pooled MCP ClientSessions so each
session is entered, initialized, and exited within a single asyncio task
on its owning event loop. This eliminates the anyio "Attempted to exit
cancel scope in a different task than it was entered in" RuntimeError
that surfaced when stdio MCP tools were used via the sync tool wrapper
(which spins up and tears down event loops across tasks).

Also harden the pool lifecycle:
- track in-flight session creation per (server, scope) to dedupe
  concurrent get_session() calls for the same key
- make close_scope/close_server/close_all/close_all_sync cover both
  established entries and in-flight creations so sessions cannot be
  resurrected or leaked after close
- handle cross-loop preemption of an in-flight creation by cancelling
  the stale owner task instead of only signalling it
- define close_all_sync() semantics for a running loop on the current
  thread (signal-only, async completion) and route reset_mcp_tools_cache
  through a deterministic async close in that case

* fix(mcp): avoid reset deadlock on running loop cache reset

* fix(mcp): address session pool review feedback
2026-06-07 21:37:30 +08:00
Xinmin Zeng
befe334f10
fix(config): make the reload boundary discoverable from code (#3144) (#3153)
* fix(config): make the reload boundary discoverable from code, not just docs

Closes #3144.

The hot-reload contract — per-run fields are resolved through
`get_app_config()` on every request, infrastructure fields snapshot at
gateway startup — landed in `backend/CLAUDE.md` as part of #3131. A
maintainer reading `get_config()` or an `AppConfig` field still had to
context-switch to that document to know which fields require a process
restart, and there was no enforcement that the prose list stayed in
sync with the code.

This commit moves the boundary to a machine-readable single source of
truth and surfaces it where the code lives:

- New `deerflow.config.reload_boundary` module owns the registry of
  restart-required fields (`STARTUP_ONLY_FIELDS`) and a tiny helper
  API (`is_startup_only_field`, `iter_startup_only_field_paths`,
  `format_field_description`). The standardised `"startup-only:"`
  prefix is exported as `STARTUP_ONLY_PREFIX` so future scanners /
  lint hooks / doc generators can pivot off it without re-parsing
  prose.
- `AppConfig`'s `database`, `checkpointer`, `run_events`,
  `stream_bridge`, `sandbox`, and `log_level` fields now build their
  `Field(description=...)` from `format_field_description(...)`. The
  same text shows up in IDE hover (Pydantic v2 exposes `description`
  via `model_fields[...]`).
- `channels` is restart-required too but lives outside the AppConfig
  Pydantic schema (the config section is consumed directly by
  `start_channel_service`). The registry owns it so the boundary is
  not split between two places.
- `get_config()` docstring points to the registry instead of leaving
  the reader to find `CLAUDE.md`. The `CLAUDE.md` table collapses to
  a one-liner pointing back at `reload_boundary.py` so the boundary
  has one canonical location, not two.

Drift coverage in `tests/test_reload_boundary.py`:

- Every registered field has a non-trivial reason.
- Iterator / membership helpers stay in sync with the dict.
- Every registry entry that maps to an `AppConfig` field also carries
  the `"startup-only:"` prefix in the schema (catches "forgot to
  update the schema").
- Reverse drift: any AppConfig field whose description starts with
  the prefix must be registered (catches "marked restart-required in
  the schema but forgot the registry").
- The runtime introspection that IDE hover depends on
  (`AppConfig.model_fields["database"].description`) is pinned, so a
  future Pydantic upgrade or schema swap that breaks the hover surface
  shows up as a test failure rather than a silent regression.

Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131
(prior boundary fix in prose form).

* fix(config): preserve field doc and correct log_level reload reason

Two follow-ups on the PR #3153 review:

1. The `log_level` STARTUP_ONLY_FIELDS reason previously claimed
   `apply_logging_level()` mutates the root logger level. It does not:
   only the `deerflow` / `app` logger levels are set, and root handler
   thresholds are conditionally lowered so messages from those loggers
   can propagate. Reword to match the actual behavior so operators
   reading IDE hover get accurate restart guidance.

2. `format_field_description(field_path)` was the sole `Field(description=)`
   for every restart-required field, which silently overwrote the
   original human-facing documentation — most visibly the `log_level`
   field that used to list debug/info/warning/error and clarify that
   third-party libraries are not affected. Extend the helper with a
   keyword-only `field_doc` parameter that composes the startup-only
   marker with the original prose so IDE hover documents both *why*
   the field is restart-required and *what* it actually accepts.
   Updated all six restart-required AppConfig fields (`log_level`,
   `database`, `sandbox`, `run_events`, `checkpointer`, `stream_bridge`)
   to pass their original descriptions through the helper.

Tests: two new cases in `test_reload_boundary.py` pin (a) the helper
composition and (b) every AppConfig restart-required field still
surfaces a recognisable substring of its original documentation.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-07 21:27:14 +08:00
Ryker_Feng
d133b1119a
fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) (#3378)
* fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503)

The SummarizationMiddleware runs its summary LLM call inside a before_model
hook. Without a nostream tag the summary tokens were captured by LangGraph's
messages-tuple stream callback and broadcast to the frontend as a phantom AI
message.

Generate a dedicated summary model copy tagged with "nostream" (merged on top
of any existing tags such as "middleware:summarize" so RunJournal attribution
is preserved) and override _create_summary / _acreate_summary to invoke it
directly. This avoids temporarily swapping the shared self.model, which would
otherwise leak the RunnableBinding across concurrent runs and break parent
logic that inspects the raw model (profile / _get_ls_params).

Add regression tests covering nostream tagging, concurrent-run isolation, raw
model preservation, and existing-tag merge.

* fix(summarization): address nostream review feedback
2026-06-07 17:55:04 +08:00
Huixin615
88e36d9686
fix(#3189): prevent write_file streaming timeout on long reports (#3195)
* fix(#3189): prevent write_file streaming timeout on long reports

Adds a layered defense against StreamChunkTimeoutError caused by oversized
single-shot write_file tool calls:

- factory: default stream_chunk_timeout to 240s for OpenAI-compatible
  clients (overridable via ModelConfig.stream_chunk_timeout in config.yaml)
- sandbox/tools: server-side 80 KB length guard on non-append write_file
  calls (configurable via DEERFLOW_WRITE_FILE_MAX_BYTES env var, 0 disables);
  rejects oversized payloads with a structured error pointing the model at
  str_replace or append=True
- middleware: classify StreamChunkTimeoutError as transient but cap retries
  at 1 via per-exception _RETRY_BUDGET_OVERRIDES (same-payload retry on a
  chunk-gap timeout buffers the same way upstream; full 3-attempt loop
  would stack 6-12 min of dead air)
- middleware: surface an actionable user-facing message for stream-drop
  exceptions instead of leaking the raw langchain stack
- prompts: add a routing-style File Editing Workflow hint to both lead_agent
  and general_purpose subagent prompts, pointing the model at str_replace
  for incremental edits (mirrors Claude Code's Edit / Codex's apply_patch)
- tests: behavioural coverage for size guard, retry budget override,
  stream-drop user message, factory default injection

Refs #3189

* fix(#3189): drop stream_chunk_timeout for non-OpenAI providers

Address CR feedback on PR #3195:

- factory: pop `stream_chunk_timeout` from kwargs for any model_use_path other than `langchain_openai:ChatOpenAI` instead of returning early. `ModelConfig.stream_chunk_timeout` is part of the shared schema, so a user-supplied value on a non-OpenAI provider would otherwise be forwarded to its constructor and raise `TypeError: unexpected keyword argument`.

- factory: rewrite docstring to describe the actual `exclude_none=True` behaviour (explicit null is excluded and falls back to the default) instead of the misleading "None falling out via exclude_none=True keeps its value".

- tests: add regression coverage asserting the kwarg is stripped before reaching a non-OpenAI provider's constructor.

Refs: bytedance#3189

* fix(#3189): restrict stream-drop user copy to StreamChunkTimeoutError only

Per CR on #3195: narrow _STREAM_DROP_EXCEPTIONS to StreamChunkTimeoutError. Generic httpx RemoteProtocolError / ReadError fall back to the standard 'temporarily unavailable' copy, since they routinely fire on transient network blips where the 'split the output' guidance is misleading. Retry/backoff classification is unchanged — both remain transient/retriable. Tests updated to reflect new copy, plus a symmetric regression test for ReadError.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-07 17:47:11 +08:00
Xinmin Zeng
268fdd6968
fix(gateway): drain in-flight runs before closing checkpointer on shutdown (#3381)
* fix(gateway): drain in-flight runs before closing checkpointer on shutdown

Chat runs execute in fire-and-forget background asyncio tasks that write
checkpoints through a shared checkpointer. On shutdown, langgraph_runtime's
AsyncExitStack tore down the checkpointer's postgres connection pool while
those run tasks were still mid-graph. langgraph's
AsyncPregelLoop._checkpointer_put_after_previous then ran its
`finally: await checkpointer.aput(...)` against the closed pool, raising
psycopg_pool.PoolClosed. Because that put runs in a langgraph-internal task
(not on run_agent's call stack), run_agent's try/except cannot catch it and it
surfaces as "unhandled exception during asyncio.run() shutdown".

Add RunManager.shutdown() to cancel and bounded-await all in-flight runs, and
call it from langgraph_runtime BEFORE the AsyncExitStack closes the
checkpointer, so the final checkpoint write lands while the pool is still open.
The drain is bounded by a timeout so a stuck run cannot hang worker shutdown,
and is shielded so a second shutdown signal cannot abandon it mid-drain and
reopen the race.

Closes #3373

* fix(gateway): address review — preserve completed-run status, bound drain persistence

Addresses Copilot review on #3381:

- RunManager.shutdown(): decide run status AFTER the drain. Under the lock it
  now only requests cancellation; after asyncio.wait it marks/persists
  `interrupted` only for runs still pending or ended cancelled. A run that
  completes (e.g. `success`) during the drain window keeps its real terminal
  status instead of being unconditionally overwritten.
- Bound the trailing status persistence within the timeout budget
  (deadline = loop.time()+timeout; gather wrapped in asyncio.wait_for) so a slow
  store backing off under DB pressure cannot push shutdown past the deadline.
- deps: use asyncio.create_task instead of asyncio.ensure_future.
- tests: wait deterministically for the run to be in-flight (poll the first
  checkpoint) instead of a fixed sleep; init shutdown_calls explicitly in the
  recovery test double; add regression test asserting a run completing during
  the drain keeps its status (in memory and in the store).

* fix(gateway): address maintainer review — surface failed drain persists, clarify timeout constant

Addresses @WillemJiang review on #3381:

- shutdown(): inspect the gather result of the trailing interrupted-status
  persistence. _persist_status is best-effort (it catches + logs its own
  failure with exc_info and returns False, so it never raises out of the
  gather), but the aggregate result was never checked — a partial failure had
  no shutdown-level visibility. Now any escaped Exception is logged, and any
  False (a persist that did not confirm) is logged with the run_id. Added
  regression test test_shutdown_surfaces_failed_interrupted_persist.
- deps: clarify the _RUN_DRAIN_TIMEOUT_SECONDS comment — state the actual value
  of _SHUTDOWN_HOOK_TIMEOUT_SECONDS (5.0s) and that both count toward the
  lifespan shutdown window. Kept as two separate constants (independent teardown
  steps that may diverge) rather than one shared "must match" value.
- Verified no other test fake needs the shutdown stub: _FakeRunManager in
  test_worker_langfuse_metadata.py is a run_agent() argument (worker path),
  never injected into langgraph_runtime, so it never receives shutdown().
2026-06-07 11:24:30 +08:00
Nan Gao
1aac408dd0
fix upload file size contract (#3408) 2026-06-06 15:12:17 +08:00
AochenShen99
2bbc7879fa
refactor(tool-search): consolidate MCP metadata tag and harden deferred-tool setup (#3370)
Follow-up to #3342 (deferred MCP tool loading). Maintainability cleanup plus
hardening of malformed/empty tool_search queries; no change to the deferral
mechanism or search ranking.

- Add deerflow/tools/mcp_metadata.py as the single source of truth for the
  "deerflow_mcp" tag (MCP_TOOL_METADATA_KEY + tag_mcp_tool + public
  is_mcp_tool). Removes the duplicated magic string and the private,
  cross-module _is_mcp_tool import.
- tool_search.search: never raise on model-generated input. Extract
  _compile_catalog_regex (shared compile-with-literal-fallback); return empty
  for empty/whitespace queries and a bare "+" instead of matching everything
  or raising IndexError.
- DeferredToolSetup: document the empty-vs-populated invariant.
- build_deferred_tool_setup: comment the two distinct empty-return branches.
- _assemble_deferred: add return type, rename local to deferred_setup, build
  the final list with an explicit append.
- Tests: use tag_mcp_tool instead of per-file tag helpers; cover empty and
  bare-"+" queries.
2026-06-05 15:21:41 +08:00
Eilen Shin
28b1da2172
fix(agents): harden update_agent null-like args (#3237)
* fix(agents): harden update_agent null-like args

* docs: mention undefined null-like update args

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-06-04 07:10:59 +08:00
Huixin615
89ae74d4f4
fix(skills): surface offending line and quoting hint on SKILL.md YAML… (#3335)
* fix(skills): surface offending line and quoting hint on SKILL.md YAML errors

When a SKILL.md front-matter fails to parse, the existing log only
echoes PyYAML's raw message, leaving authors to grep the file for the
offending line. This is especially painful for the very common
LLM-authored mistake of an unquoted scalar containing ': '
(e.g. 'description: foo: bar'), which fails with
'mapping values are not allowed here' and silently drops the skill.

Enrich the error log with:
  - the source line PyYAML pointed at via problem_mark
  - a targeted, copy-pasteable quoting hint when (and only when) the
    error is the well-known 'mapping values are not allowed' scanner
    error on an unquoted value

The skill is still rejected (no semantics are guessed or rewritten);
only the diagnostic is improved.

Fixes #3333

* improve(skills): address CR feedback on SKILL.md YAML error diagnostics

Per review on #3335:

- Log the file line number (mark.line + 2) instead of the
  front-matter-internal line number, so authors land on the right
  row in their editor.
- Use exc.problem == "mapping values are not allowed here" for a
  tighter match than substring-scanning str(exc).
- Preserve the offending key's leading whitespace in the quoting
  hint so nested mappings stay nested when authors paste the fix
  back.
- Rewrite the regression test to actually exercise the new
  behaviour: PyYAML's own message already echoes the offending
  line (and truncates it with "..."), so the old assertion
  passed on main. New assertions pin (a) the file-line number,
  (b) the full untruncated line, and (c) the copy-pasteable hint.
- Add a guard test for nested-key indentation so the
  partition()/strip() shape cannot regress silently.

Refs #3333, #3335

* fix(skills): escape backslashes in YAML quoting hint

The hint emitted by _format_yaml_error previously escaped only double
quotes, so values containing backslashes (e.g. Windows paths like
C:\Temp or regex escapes like \d) produced a suggested scalar that
was either invalid YAML or silently re-interpreted by PyYAML's
double-quoted escape rules when pasted back. Escape order matters:
backslashes first, then double quotes.

Adds two regression tests covering Windows-path and regex-style
backslashes.

Address Copilot CR feedback on PR #3335.
2026-06-03 21:53:52 +08:00
Ryker_Feng
8fca56cf43
fix(mcp): accept transport field as alias for type (#3238) (#3243)
The official MCP configuration schema uses `transport` to specify the
transport mechanism (stdio/sse/http), but `McpServerConfig` only honored
`type` and defaulted to `stdio`. Remote MCP servers configured with just
`transport: sse` were therefore misidentified as stdio and failed with
"with stdio transport requires 'command' field".

Add a model validator that promotes `transport` to `type` when only
`transport` is provided, while keeping `type` authoritative when both
are set. This matches the MCP-spec field name without breaking existing
configurations.

Fixes #3238
2026-06-03 18:11:38 +08:00
zhongli-sz
3ae82dc663
fix(mcp): add auth interceptor with channel user_id and keep header propagation to mcp tools (#3294)
* 修复channel中的user_id传递到interceptor中的bug, mcp可通过header传递user_id到mcp工具

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(channel,mcp,gateway): normalize channel user_id and add regression tests

Normalize external channel user ids into filesystem-safe runtime context while preserving raw channel_user_id, and document gateway user_id propagation semantics. Add regression coverage for channel user_id context mapping, gateway user_id precedence/internal-role behavior, and MCP interceptor header forwarding via meta.headers.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(auth,mcp): harden user id normalization and header handling

Increase sanitized user-id digest suffix to 16 hex chars, replace internal system role magic string with a shared constant, and harden MCP header forwarding with Mapping type checks. Add regression tests for empty channel user_id handling, unsupported header types, and updated digest length behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: zhongli <335302680@qq.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-03 15:48:19 +08:00
Ryker_Feng
5dc2d6cbf5
fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872) (#3245)
* fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872)

AioSandbox allocates a host-side agent_sandbox client (wrapping an
httpx.Client) in __init__, but AioSandboxProvider.release/destroy/shutdown
only popped provider state and tore down the backend container — the
client/transport owned by each cached AioSandbox was never explicitly
closed, accumulating unreclaimed sockets in long-running services.

- Add AioSandbox.close(): best-effort, idempotent close of the wrapped
  httpx_client (falls back to top-level client.close()); errors are
  logged but never raised so backend cleanup is never blocked.
- AioSandboxProvider.release()/destroy() now close the cached AioSandbox
  before dropping it; shutdown() inherits this via destroy().

* fix(sandbox): close the real httpx.Client owned by AioSandbox (#2872)

The previous close() only walked one level (wrapper.httpx_client), which resolves to the Fern-generated HttpClient wrapper that has no close(). The real socket-owning httpx.Client lives one level deeper at _client_wrapper.httpx_client.httpx_client, so the close path never fired and host-side sockets still leaked.

Resolve the real httpx.Client with graceful degradation; clear self._client under the lock for use-after-close and concurrent double-close safety; mark provider release()/destroy() try/except as defense-in-depth; rewrite TestClose against the real nested structure to lock down the original no-op bug.
2026-06-02 22:55:59 +08:00
AochenShen99
d9f4724950
fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) (#3342)
* feat(tool-search): add hash-scoped promoted state to ThreadState

* feat(tool-search): add immutable DeferredToolCatalog with stable hash

* feat(tool-search): add build_deferred_tool_setup + Command-writing tool_search

* refactor(tool-search): replace deferred-tool ContextVar with closures + graph state (#3272)

Build the deferred catalog + tool_search tool per agent from the policy-filtered
tool list (after skill allowed-tools), pass deferred_names + catalog_hash
explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions
in ThreadState.promoted (scoped by catalog_hash) via a Command-returning
tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so
deferral no longer depends on build/execute sharing an async context. MCP tools
are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way.

Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via
tool_search) and assembly is fail-closed. Migrate tests off the deleted registry
APIs; delete the obsolete ContextVar-based #2884 regression (re-covered by
state-based tests in a follow-up).

* test(tool-search): lock tool_search promotion into next model turn via graph state

* test(tool-search): cross-context, policy-leak, fail-closed, #2884 isolation regressions

* test(tool-search): align real-LLM e2e with closure-based deferred setup

* docs: update DeferredToolFilterMiddleware description for closure+state design

* style(tests): drop unused import in test_deferred_setup (ruff)

* test(tool-search): harden merge_promoted + replace tautological catalog test

From independent code review:
- merge_promoted: use existing.get("catalog_hash") so a forward-incompatible
  or externally-injected persisted promoted dict triggers a replace instead of
  a KeyError crash; add regression test for the malformed-existing case.
- test_deferred_catalog: replace the `== [] or True` tautology (a test that
  could never fail) with a deterministic invalid-regex->literal-fallback check
  (positive match on calc + negative empty match).
- DeferredToolCatalog: comment why frozen-without-slots is required for the
  cached_property hash/names fields (adding slots=True would break them).

* fix(tool-search): read tool_search.enabled from self._app_config in client

DeerFlowClient._ensure_agent called get_app_config() directly to read
tool_search.enabled, but the client already resolves and stores its config as
self._app_config at construction (and uses it everywhere else). The bare call
re-resolves config from disk at agent-build time, which raises FileNotFoundError
in environments without a config.yaml (CI) — test_client.py's fixture only
patches get_app_config during __init__, so the later call hit the real loader.
Use self._app_config, matching the rest of the client.

* test(tool-search): lock tool_search post-policy append ordering

tool_search is appended after skill-allowlist filtering, so the allowlist
can no longer deny it by name. Lock the intended contract: it only appears
when allowed MCP tools survive the filter, and its catalog (derived from the
already policy-filtered list) can never expose a denied tool. Addresses the
ordering observation from the Copilot review on #3342.
2026-06-02 22:43:22 +08:00
Willem Jiang
031d6fbcbe
fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (#3223) (#3226)
* fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (#3223)

  Replace AsyncPostgresSaver.from_conn_string() with an explicit
  AsyncConnectionPool that has check_connection enabled, so dead idle
  connections are detected and replaced on checkout instead of raising
  OperationalError.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Fixed the unit test error and lint error

* fix(checkpointer): add TCP keepalive to postgres connection pool (#3254)

  Enable TCP keepalive probes on the AsyncConnectionPool to prevent
  idle postgres connections from being dropped by the server or network
  middleware. Combined with the existing check_connection callback, this
  provides defense-in-depth against stale connection errors.

  Fixes #3254

* Changed the code as review suggestion

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-06-01 09:05:11 +08:00
Nan Gao
79cc227917
fix(middleware): fix LLM fallback run status (#3321)
* Fix LLM fallback run status

* optimize LLM fallback maker extraction in streaming path
2026-05-31 22:42:13 +08:00
AochenShen99
9f3be2a9fa
fix(agents): offload UploadsMiddleware uploads scan off the event loop (#3311)
UploadsMiddleware defines only the sync `before_agent` hook. LangChain wires a
sync-only hook as `RunnableCallable(before_agent, None)`, and LangGraph's
`ainvoke` runs it directly on the event loop when `afunc is None` — so the
per-message uploads-directory scan (`exists`/`iterdir`/`stat` plus reading
sibling `.md` outlines) blocks the asyncio event loop on every message that has
an uploads directory.

Add `abefore_agent` that offloads the scan to a worker thread via
`run_in_executor`; it copies the current context, preserving the `user_id`
contextvar read by `get_effective_user_id()`.

Add a runtime anchor under `tests/blocking_io/` that drives the real
`create_agent` graph via `ainvoke` under the strict Blockbuster gate, so a
regression back onto the event loop fails CI. Update blocking-IO docs.
2026-05-30 21:46:35 +08:00
AochenShen99
4093c83383
refactor(provider): share assistant payload replay matching (#3307)
* Share assistant payload replay matching

* fix(provider): recover assistant field when ordinal AI index is taken

The mismatch-length fallback in `_match_ai_message` only tried the exact
`fallback_ordinal` AI index. When serialization drops or reorders an
assistant message, a unique signature match can consume a non-ordinal
index, leaving a later ambiguous payload's ordinal already used — so its
provider field (e.g. `reasoning_content`) was silently dropped.

Scan forward from the ordinal for the next unused `AIMessage` (wrapping to
earlier indices) to preserve the positional bias while still recovering
the field. Forward scanning avoids a naive min-unused pick that could
restore the wrong field after a leading message is dropped.

Add a regression test for the dropped-leading-message case.

* fix(provider): avoid earlier assistant fallback replay
2026-05-29 23:05:59 +08:00
Xinmin Zeng
ca487578a4
feat(agent): add ToolOutputBudgetMiddleware for oversized tool output protection (#3303)
* feat(agent): add ToolOutputBudgetMiddleware for oversized tool output protection

Closes #3289. Adds a unified middleware that enforces per-result budgets
on ALL tool outputs (MCP, sandbox, community, custom), preventing
oversized external tool results from blowing the model context window.

Design informed by claude-code (persistToolResult), hermes-agent
(tool_result_storage), and pi (OutputAccumulator) — the three most
mature implementations in production coding-agent frameworks.

Key features:
- Disk externalization: oversized outputs written to thread-local
  .tool-results/ directory, replaced with compact preview + file
  reference. Model can read full output via read_file with offset/limit.
- Fallback truncation: head+tail truncation when disk is unavailable
  (no thread_data, write failure), ensuring the context is always
  protected.
- read_file exemption: prevents persist-read-persist infinite loops
  (independently discovered by claude-code, hermes-agent, and pi).
- Per-tool threshold overrides via config.
- Line-boundary-aware truncation (no partial lines in previews).
- Multimodal content passthrough (images/structured blocks skip budget).
- Historical ToolMessage patching in wrap_model_call for checkpoint
  recovery scenarios.

Related: #3222 (design RFC), #1844 (comprehensive context management),
#3137 (write_file args compaction), #1677 (sandbox tool truncation).

* test: add MCP content_and_artifact format coverage

Add 5 tests for MCP tool output format (list of content blocks):
- text content blocks are extracted and budgeted
- multiple text blocks are joined and budgeted
- image content blocks are skipped (multimodal passthrough)
- mixed text+image blocks are skipped
- small text blocks pass through unchanged

Total test count: 59 (was 54).

* fix(agent): address Codex review findings for ToolOutputBudgetMiddleware

Three issues identified by Codex code review, all fixed:

1. `enabled` config field was unused — middleware now checks
   `config.enabled` and skips all processing when disabled.

2. `_build_fallback` could exceed `fallback_max_chars` — the marker
   text itself (~139 chars) was not deducted from the budget. Now
   pre-computes marker overhead and falls back to hard slice when
   max_chars is smaller than the marker.

3. Sync file I/O in async path — `awrap_tool_call` now delegates
   `_patch_result` to `asyncio.to_thread` to avoid blocking the
   event loop during disk writes.

Tests updated to use realistic fallback_max_chars values (500+)
that can accommodate the marker overhead, plus two new tests:
- `test_result_never_exceeds_max_chars` (parametric across sizes)
- `test_very_small_max_chars_does_not_crash`

* fix(agent): address Copilot review — path traversal, async perf, shared config

1. Path traversal defense: sanitize tool_name via _sanitize_tool_name()
   (strips separators, .., absolute paths), validate storage_subdir is
   relative, and verify resolved filepath stays inside storage_dir.

2. Async hot-path optimization: add _needs_budget() cheap check before
   asyncio.to_thread offload — small outputs (99% of calls) skip the
   thread overhead entirely.

3. Replace shared module-level _DEFAULT_CONFIG with _default_config()
   factory to prevent cross-instance mutation of mutable fields.

12 new tests: TestSanitizeToolName (5), TestExternalizePathTraversal (3),
TestNeedsBudget (4).

* fix(agent): correct preview hint to match read_file actual API

read_file uses start_line/end_line (1-indexed line numbers), not
offset/limit. The previous wording was copied from hermes-agent
which has a different read_file interface.

* perf(agent): hoist hot-path imports, add model-call pre-scan (review #3303)

Address maintainer review feedback:

1. Hoist inline imports to module level — `import asyncio` (was in
   awrap_tool_call hot path) and `from dataclasses import replace`
   (was in _patch_result) now live at module top.

2. Add a cheap pre-scan to _patch_model_messages so the historical
   message list is not rebuilt on every model call when nothing is
   oversized (the common case once results are budgeted at tool-call
   time). Also adds the same _needs_budget gate to the sync
   wrap_tool_call for symmetry with awrap_tool_call.

The pre-scan is refactored into per-tool-aware helpers
(_effective_trigger / _tool_message_over_budget) that mirror the exact
trigger conditions in _budget_content — including tool_overrides — so
the fast-path can never produce a false negative (silently skipping
budgeting for a tool with a low per-tool threshold).

7 new regression tests lock the per-tool-override-through-pre-scan path
and the model-call early return.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-29 22:59:26 +08:00
Nan Gao
e683ed6a76
fix(runtime): guide malformed write_file recovery (#3040)
* fix(runtime): guide malformed write_file recovery

* fix(runtime): align write_file recovery guidance
2026-05-29 17:46:24 +08:00
Eilen Shin
872079b894
docs: clean standalone LangGraph server remnants (#3301) 2026-05-29 11:36:45 +08:00
john lee
cbf8b194e8
fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#3084)
* fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#2816)

- JsonlRunEventStore: offload all file I/O to asyncio.to_thread() so the
  event loop is never blocked; add per-thread asyncio.Lock to serialise
  concurrent puts and prevent interleaved JSONL lines
- Split _ensure_seq_loaded into a sync _compute_max_seq (runs in thread)
  and an async wrapper; seq counter is recovered from disk on fresh store init
- DbRunEventStore.put_batch: raise ValueError when events span multiple
  thread_ids (previously silently assumed same thread)
- Add test_jsonl_event_store_async_io.py: 12 tests covering lock reuse,
  concurrent seq monotonicity, disk recovery, and mixed-thread batch rejection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address Copilot review comments

- delete_by_thread: pop _write_locks after releasing the lock to prevent
  unbounded growth when threads are repeatedly created and deleted
- tests: add regression guard asserting asyncio.to_thread is called for
  _write_record in put(); assert _write_locks entry removed on delete

* fix(lint): move patch import to local scope to fix ruff I001

* fix(lint): apply ruff check+format fixes to test file

* fix(runtime): address review feedback for JSONL async I/O hardening (#2816)

Use setdefault for atomic lock init in _get_write_lock; pop _write_locks
inside the held lock scope in delete_by_thread; update test docstring
and assert lock entry also cleared on delete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: rayhpeng <rayhpeng@gmail.com>
2026-05-29 09:27:53 +08:00
AochenShen99
44677c5eb4
feat(provider) Add patched MiMo reasoning content support (#3298)
* Add patched MiMo reasoning content support

* Clarify MiMo patched model coverage

* Remove unused MiMo payload index

* Address MiMo review nits
2026-05-28 18:24:32 +08:00
AochenShen99
8decfd327e
Fix custom skill install permissions (#3241)
* Fix custom skill install permissions

* Fix skill upload test portability

* Keep custom skill writes sandbox readable

* Clear sandbox write bits on skill permissions

* Limit custom skill write permission updates
2026-05-28 15:48:32 +08:00
Lawrance_YXLiao
3cb75887c1
fix(memory): parse wrapped memory update json responses (#3252)
* fix(memory): parse wrapped memory update json responses

* test(memory): format wrapped response coverage

* fix(memory): guard malformed nested memory facts

* fix(memory): require full update object when parsing responses

* fix(memory): fail closed on unsafe partial removals

* style(memory): format updater tests
2026-05-28 07:46:44 +08:00
Willem Jiang
162fb2143e
fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyioRuntimeError (#3203) (#3224)
* fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyio RuntimeError (#3203)

  HTTP/SSE transports use anyio.TaskGroup internally for streamable
  connections. These task groups have cancel scopes bound to the async task
  that created them, so closing a pooled session from a different task
  raises RuntimeError. Restrict session pooling to stdio transports only.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* docs: clarify MCP pooling applies only to stdio tools

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2dd9881d-54c6-45fd-90bc-154a09e29841

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-27 08:32:57 +08:00
QY
92905e9e3e
fix(todo): reuse thread state schema (#3206)
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-26 23:58:08 +08:00
AochenShen99
e344be8d94
feat(tests): add Blockbuster runtime gate for event-loop blocking IO (#3229)
* feat(tests): add Blockbuster runtime gate for event-loop blocking IO

Adds a strict runtime gate that fails CI when sync blocking IO calls run
on the asyncio event loop thread through DeerFlow business code.

Components:
- backend/tests/support/detectors/blocking_io_runtime.py — Blockbuster
  context scoped to `app.*` and `deerflow.*` so test infrastructure,
  pytest internals, and third-party libraries stay silent.
- backend/tests/blocking_io/conftest.py — pytest_runtest_protocol
  hookwrapper that wraps every item (setup + call + teardown) with the
  strict context. Respects `@pytest.mark.allow_blocking_io` opt-out.
- backend/tests/blocking_io/test_skills_load.py — regression anchor for
  the #1917 fix (asyncio.to_thread offload around
  LocalSkillStorage.load_skills).
- backend/tests/blocking_io/test_sqlite_lifespan.py — regression anchor
  for the #1912 fix (asyncio.to_thread offload around
  ensure_sqlite_parent_dir).
- backend/tests/blocking_io/test_gate_smoke.py — meta-test asserting the
  gate actually catches unoffloaded blocking IO and that the
  `@pytest.mark.allow_blocking_io` opt-out works.
- backend/Makefile — `make test-blocking-io` target.
- .github/workflows/backend-blocking-io-tests.yml — hard-fail PR gate on
  ubuntu-latest. Windows matrix deferred to follow-up.

Dependencies:
- blockbuster>=1.5.26,<1.6 added to dev group.

Coverage boundary (called out in PR body): the gate only catches blocking
IO on code paths the test suite actually exercises. Static AST inventory
(separate, informational) is the complementary coverage tool. Three blind
spot categories — untested paths, mocked-away paths, env-mismatched paths
— are documented in the PR description.

Findings surfaced while authoring this PR:
- resolve_sqlite_conn_str in runtime/store/_sqlite_utils.py:19 does sync
  Path.resolve() -> os.path.abspath on the lifespan loop thread, ahead of
  the #1912 fix. Not addressed here; tracked as follow-up.

Tests: 4 passed locally (`make test-blocking-io`).
Lint/format: clean (`ruff check` and `ruff format --check`).

* fix(tests): scope Blockbuster gate to blocking-io suite

* fix(tests): harden Blockbuster runtime gate

* test(blocking-io): add project rule extension point

* test(blocking-io): address review cleanup
2026-05-26 23:03:49 +08:00
Willem Jiang
f9b7071304
fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127) (#3134)
* fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127)

  When using AIO sandbox with LocalContainerBackend, uploaded files are
  created with 0o600 (owner-only) permissions by the gateway process
  running as root. The sandbox process inside the Docker container runs
  as a non-root user and cannot read these bind-mounted files, causing
  a "Permission denied" error on read_file.

  Add `needs_upload_permission_adjustment` attribute to SandboxProvider
  (default True) to indicate that uploaded files need chmod adjustment.
  LocalSandboxProvider opts out (same user). A new `_make_file_sandbox_readable`
  function adds S_IRGRP | S_IROTH bits after files are written, changing
  permissions from 0o600 to 0o644 so the sandbox can read the uploads.

  fixes #3127

* fix(uploads): unconditionally adjust file permissions for sandbox access

  The conditional check  meant uploaded files retained 0o600
  permissions in some Docker sandbox configurations, preventing the
  sandbox process (UID 1000) from reading them. Always add group/other
  read bits so every sandbox setup can access uploaded content. Also add
  read bits to the sync-path writable helper as defense in depth.
2026-05-25 09:26:18 +08:00
Huixin615
8785658a2e
fix(agents): preserve todos state across node updates (#3180)
* fix(agents): preserve todos state across node updates

ThreadState.todos had no reducer, so any downstream node returning a
partial state without todos was implicitly setting it to None, which
LangGraph then used to overwrite the previously streamed value. This
caused the to-do list to render correctly during streaming but vanish
once streaming completed.

Add a merge_todos reducer that keeps the last non-None value, mirroring
the merge_artifacts pattern already used in the same file. An explicit
empty list is still respected so that 'user cleared todos' works.

Tests: 10 new unit tests in tests/test_thread_state_reducers.py covering
merge_todos plus regression coverage for merge_artifacts and
merge_viewed_images. All 69 thread-related tests pass locally.

Closes #3123

* test(agents): add annotation binding regression guard

Address Copilot review feedback on #3123:

- Add TestThreadStateAnnotations asserting that ThreadState.todos is
  Annotated with merge_todos. Without this guard, reverting the
  Annotated[list | None, merge_todos] binding would silently regress
  #3123 while all existing reducer unit tests continue to pass.

- Align test imports to 'from deerflow.agents.thread_state import ...'
  matching the rest of the backend test suite.
2026-05-23 23:25:38 +08:00
rayhpeng
0fb05825a2
fix(runtime): make run creation persistence atomic (#3152)
* fix runtime run creation persistence atomicity

* fix run creation cancellation rollback

* fix run manager test cleanup await

* clarify run creation rollback on cancellation

* document new run persistence rollback boundary

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-23 22:43:34 +08:00
AochenShen99
66d6a6a4e8
fix: harden run finalization persistence (#3155)
* fix: harden run finalization persistence

* style: format gateway recovery test

* fix: align run repository return types

* fix: harden completion recovery follow-up
2026-05-23 00:09:06 +08:00
Nan Gao
f0bae28636
fix(middleware): handle repeated tool call ids (#3143)
* fix(middleware): handle repeated tool call ids

* add tests

* refactor(middleware): rely on tool result queues
2026-05-22 21:44:05 +08:00
Lawrance_YXLiao
2eeb597985
fix(runs): expose active progress counters (#3148)
* fix(runs): expose active progress counters

* fix(runs): avoid delayed progress flush on completion

* fix(runs): tighten progress snapshot semantics

* fix(runs): preserve omitted progress fields

* chore(runs): remove duplicate journal initialization
2026-05-22 21:42:14 +08:00
Xinmin Zeng
be0eae9825
fix(runtime): suppress tool execution when provider safety-terminates with tool_calls (#3035)
* fix(runtime): suppress tool execution when provider safety-terminates with tool_calls

When a provider stops generation for safety reasons (OpenAI/Moonshot
finish_reason=content_filter, Anthropic stop_reason=refusal, Gemini
finish_reason=SAFETY/BLOCKLIST/PROHIBITED_CONTENT/SPII/RECITATION/
IMAGE_SAFETY/...), the response may still carry truncated tool_calls.
LangChain's tool router treats any non-empty tool_calls as executable,
so partial arguments (e.g. write_file with a half-finished markdown)
get dispatched and the agent loops on retry.

Add SafetyFinishReasonMiddleware at after_model: detect safety
termination via a pluggable detector registry, clear both structured
tool_calls and raw additional_kwargs.tool_calls / function_call,
preserve response_metadata.finish_reason for downstream observers,
stamp additional_kwargs.safety_termination for traces, append a
user-facing explanation to message content (list-aware for thinking
blocks), and emit a safety_termination custom stream event so SSE
consumers can reconcile any "tool starting..." UI.

Default detectors cover OpenAI-compatible content_filter, Anthropic
refusal, and Gemini safety enums (text + image). Custom providers are
added via reflection (same pattern as guardrails). Wired into both
lead-agent and subagent runtimes.

Closes #3028

* fix(runtime): persist safety_termination as a middleware audit event

Address review on #3035: the SSE custom event is great for live
consumers but invisible to post-run audit. RunEventStore should carry
its own row so operators can answer "which runs were safety-suppressed
today?" from a single SQL query without joining the message body.

Worker now exposes the run-scoped RunJournal via
runtime.context["__run_journal"] (sentinel key, internal channel).
SafetyFinishReasonMiddleware calls the previously-unused
RunJournal.record_middleware, which emits

  event_type = "middleware:safety_termination"
  category   = "middleware"
  content    = {name, hook, action, changes={
                  detector, reason_field, reason_value,
                  suppressed_tool_call_count,
                  suppressed_tool_call_names,
                  suppressed_tool_call_ids,
                  message_id, extras}}

Tool *arguments* are deliberately excluded — those are the very content
the provider filtered and persisting them would defeat the purpose of
the safety filter (per review note in #3035).

Graceful skips when journal is absent (subagent runtime, unit tests,
no-event-store local dev). Journal exceptions never propagate into the
agent loop.

Refs #3028

* fix(runtime): satisfy ruff format + address Copilot review

- ruff format on safety_finish_reason_config.py and e2e demo (CI lint
  failed on ruff format --check; backend Makefile lint target runs
  ruff check AND ruff format --check).
- Docstring on SafetyFinishReasonConfig now says resolve_variable to
  match the actual loader used in from_config (the wording was
  resolve_class previously; behavior is unchanged — resolve_variable
  mirrors how guardrails.provider is loaded).
- Switch the AIMessage type check in SafetyFinishReasonMiddleware._apply
  from getattr(last, "type") == "ai" to isinstance(last, AIMessage),
  matching TokenUsageMiddleware / TodoMiddleware / ViewImageMiddleware
  / SummarizationMiddleware which are the dominant pattern.

Refs #3028
2026-05-22 21:20:28 +08:00
Willem Jiang
c881d95898
fix(mcp): persist MCP sessions across tool calls for stateful servers (#3089)
* fix(mcp): persist MCP sessions across tool calls for stateful servers

  MCP tools loaded via langchain-mcp-adapters created a new session on
  every call, causing stateful servers like Playwright to lose browser
  state (pages, forms) between consecutive tool invocations within the
  same thread.

  Add MCPSessionPool that maintains persistent sessions scoped by
  (server_name, thread_id). Tool calls within the same thread now reuse
  the same MCP session, preserving server-side state. Sessions are evicted
  in LRU order (max 256) and cleaned up on cache invalidation.

  Fixes #3054

* fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127)

  When using AIO sandbox with LocalContainerBackend, uploaded files are
  created with 0o600 (owner-only) permissions by the gateway process
  running as root. The sandbox process inside the Docker container runs
  as a non-root user and cannot read these bind-mounted files, causing
  a "Permission denied" error on read_file.

  Add `needs_upload_permission_adjustment` attribute to SandboxProvider
  (default True) to indicate that uploaded files need chmod adjustment.
  LocalSandboxProvider opts out (same user). A new `_make_file_sandbox_readable`
  function adds S_IRGRP | S_IROTH bits after files are written, changing
  permissions from 0o600 to 0o644 so the sandbox can read the uploads.

* fix(mcp): address review comments on session pool and tools

- _extract_thread_id: return "default" instead of stringifying None
  when get_config() returns no thread_id
- call_with_persistent_session: fix **arguments annotation from
  dict[str,Any] to Any
- Replace private _convert_call_tool_result import with a local
  implementation that handles all MCP content block types
- _make_session_pool_tool: accept tool_interceptors and apply the
  configured interceptor chain on every call (preserving OAuth and
  custom interceptors)
- MCPSessionPool: replace asyncio.Lock with threading.Lock; restructure
  get/close methods to never await while holding the lock; add
  close_all_sync() that closes sessions on their owning event loops
- reset_mcp_tools_cache: use pool.close_all_sync() instead of
  asyncio.run-in-thread to close sessions deterministically
- test: add test_session_pool_tool_sync_wrapper_path_is_safe covering
  tool invocation via the sync wrapper (tool.func) path

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9e7f9e7f-1d2b-464a-b3b7-7f1649b74122

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* fix(mcp): extract SESSION_CLOSE_TIMEOUT to class constant

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9e7f9e7f-1d2b-464a-b3b7-7f1649b74122

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* Potential fix for pull request finding 'Empty except'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
2026-05-21 23:22:20 +08:00
Xinmin Zeng
e93f658472
fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107) (#3131)
* fix(task-tool): unwrap callback manager when locating usage recorder

`config["callbacks"]` may arrive as a `BaseCallbackManager` (e.g. the
`AsyncCallbackManager` LangChain hands to async tool runs), not just a plain
list. The previous `for cb in callbacks` loop raised
`TypeError: 'AsyncCallbackManager' object is not iterable`, which
`ToolErrorHandlingMiddleware` then converted into a failed `task` ToolMessage
even though the subagent had completed internally — Ultra mode lost subagent
results and the lead agent fell back to redoing the work.

Unwrap `BaseCallbackManager.handlers` before searching for the recorder.

Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): treat any task tool error as a terminal subtask failure

The subtask card status machine matched only three English prefixes (`Task
Succeeded. Result:`, `Task failed.`, `Task timed out`). Anything else fell
through to `in_progress`, so a `task` tool error wrapped by
`ToolErrorHandlingMiddleware` (`Error: Tool 'task' failed ...`) left the card
spinning forever even after the run had ended.

Extract the prefix logic into `parseSubtaskResult` and recognise any leading
`Error:` token as a terminal failure. The extracted function is unit-tested
against the legacy prefixes plus the `AsyncCallbackManager` regression
captured in the upstream issue.

Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(frontend): exclude hidden, reasoning, and tool payloads from chat export

`formatThreadAsMarkdown` / `formatThreadAsJSON` iterated raw messages without
running the UI-level `isHiddenFromUIMessage` filter. Exported transcripts
therefore included `hide_from_ui` system reminders, memory injections,
provider `reasoning_content`, tool calls, and tool result messages — content
that is intentionally hidden in the chat view.

Filter the export to the user-visible transcript by default and gate
reasoning / tool calls / tool messages / hidden messages behind explicit
`ExportOptions` flags so a future debug export can opt back in without
forking the formatter.

Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(gateway): route get_config through get_app_config for mtime hot reload

`get_config(request)` returned the `app.state.config` snapshot captured at
startup. The worker / lead-agent path then threaded that frozen `AppConfig`
through `RunContext` and `agent_factory`, so per-run fields edited in
`config.yaml` (notably `max_tokens`) were ignored until the gateway process
was restarted — even though `get_app_config()` already does mtime-based
reload at the bottom layer.

Route the request dependency through `get_app_config()` directly. Runtime
`ContextVar` overrides (`push_current_app_config`) and test-injected
singletons (`set_app_config`) keep working; `app.state.config` is now only
read at startup for one-shot bootstrap (logging level, IM channels,
`langgraph_runtime` engines).

`tests/test_gateway_deps_config.py` encoded the old snapshot contract and is
removed; `tests/test_gateway_config_freshness.py` replaces it with mtime,
ContextVar, and `set_app_config` coverage. `test_skills_custom_router.py` and
`test_uploads_router.py` now inject test configs via FastAPI
`dependency_overrides[get_config]` instead of mutating `app.state.config`.

Document the hot-reload boundary in `backend/CLAUDE.md` so reviewers know
which fields are picked up on the next request vs. which still require a
restart (`database`, `checkpointer`, `run_events`, `stream_bridge`,
`sandbox.use`, `log_level`, `channels.*`).

Refs: bytedance/deer-flow#3107 (BUG-001)

* fix(gateway): broaden get_config 503 to any config-load failure

Address review feedback on the previous commit:

1. Narrow exception catch removed. The old contract returned 503 whenever
   `app.state.config is None`. The first cut only mapped
   `FileNotFoundError`, leaving `PermissionError`, YAML parse errors, and
   pydantic `ValidationError` to bubble up as 500. At the request boundary
   we treat any inability to materialise the config as "configuration not
   available" (503) and log the original exception so the operator still
   has the stack.

2. Removed the unused `request: Request` parameter and the matching
   `# noqa: ARG001`. FastAPI's `Depends()` does not require the dependency
   to accept `Request`; the only call site uses the no-arg form.

3. `backend/CLAUDE.md` boundary now lists the *reason* each field is
   restart-required (engine binding, singleton caching, one-shot
   `apply_logging_level`, etc.), not just the field name, so reviewers do
   not have to reverse-engineer the boundary themselves.

Tests parametrise four exception classes (`FileNotFoundError`,
`PermissionError`, `ValueError`, `RuntimeError`) and assert 503 for each.

Refs: bytedance/deer-flow#3107 (BUG-001)

* fix(task-tool): defend _find_usage_recorder against non-list callbacks

Address review feedback. The previous commit handled the two common shapes
LangChain hands to async tool runs — a plain `list[BaseCallbackHandler]` and
a `BaseCallbackManager` subclass — but iterated any other shape directly,
which would still raise `TypeError` if e.g. a single handler instance leaked
through without a list wrapper.

Treat any non-list, non-manager `config["callbacks"]` value as "no recorder"
rather than crash. Docstring now lists all four shapes explicitly. New tests
cover the single-handler-object case, `runtime is None`, `callbacks is None`,
and `runtime.config` being a non-dict — all required to be silent no-ops.

Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): drop dead identity ternary and add opt-in export tests

Address review feedback on the previous export commit:

1. Removed the no-op `typeof msg.content === "string" ? msg.content : msg.content`
   expression in `formatThreadAsJSON`. Both branches returned the same value;
   the message content now flows through unchanged whether it is a string or
   the rich `MessageContent[]` shape (LangChain JSON-serialises the array
   structure correctly already).

2. Expanded the JSDoc on `ExportOptions` to make it clearer that the four
   flags are not currently wired to any UI control — callers wanting a debug
   export must build the options object explicitly. The default behaviour
   continues to match the explicit prescription in
   bytedance/deer-flow#3107 BUG-006.

3. Added opt-in coverage. The previous tests only exercised the
   `options = {}` default path; the new cases verify each flag flips the
   corresponding payload back into the export so a future debug-export
   surface does not silently break the contract.

Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(frontend): export subtask prefix constants and document fallback intent

Address review feedback on the previous BUG-007 commit:

1. `SUCCESS_PREFIX`, `FAILURE_PREFIX`, `TIMEOUT_PREFIX`, and the
   `ERROR_WRAPPER_PATTERN` regex are now exported. The JSDoc explicitly
   pins them as part of the backend↔frontend contract defined in
   `task_tool.py` and `tool_error_handling_middleware.py`, so any future
   structured-status migration (e.g. backend writing
   `additional_kwargs.subagent_status` instead of leading text) can
   reference these from one canonical place rather than redefine them.

2. The `in_progress` fallback now carries a docstring explaining the
   deliberate choice — LangChain only ever emits a `ToolMessage` once the
   tool itself has returned, so unrecognised content means the contract
   has drifted and "still running" is the right operator signal (eagerly
   marking it terminal-failed would mask the drift).

No behaviour change; this is documentation and an API export.

Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(gateway): drop app.state.config snapshot and freeze run_events_config

Address @ShenAC-SAC's BUG-001 review on #3131. The previous cut still
stored an ``AppConfig`` snapshot on ``app.state.config`` for startup
bootstrap. Two follow-on hazards from that:

1. Future code touching the gateway lifespan could accidentally start
   reading ``app.state.config`` again, silently regressing the request
   hot path back to a stale snapshot.
2. ``get_run_context()`` paired a freshly-reloaded ``AppConfig`` with the
   startup-bound ``event_store`` and a *live* ``run_events_config``
   field — so an operator who edited ``run_events.backend`` mid-flight
   would have produced a run context whose ``event_store`` and
   ``run_events_config`` referred to different backends.

Clean approach (aligned with the direction in PR #3128):

- ``lifespan()`` keeps a local ``startup_config`` variable and passes it
  explicitly into ``langgraph_runtime(app, startup_config)`` and into
  ``start_channel_service``. No ``app.state.config`` attribute is set at
  any point.
- ``langgraph_runtime`` now accepts ``startup_config`` as a required
  parameter, removing the ``getattr(app.state, "config", None)`` lookup
  and the "config not initialised" runtime error.
- The matching ``run_events_config`` is frozen onto ``app.state`` next
  to ``run_event_store`` so ``get_run_context`` reads the two from the
  same startup-time source. ``app_config`` continues to be resolved
  live via ``get_app_config()``.
- ``backend/CLAUDE.md`` boundary explanation updated to spell out the
  ``startup_config`` / ``get_app_config()`` split.

New regression test ``test_run_context_app_config_reflects_yaml_edit``
exercises the worker-feeding path: it asserts that ``ctx.app_config``
follows a mid-flight ``config.yaml`` edit while
``ctx.run_events_config`` stays frozen to the startup snapshot the
event store was built from.

Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): parse Task cancelled and polling timed out as terminal

Address @ShenAC-SAC's BUG-007 review on #3131. `task_tool.py` actually
emits five terminal strings:

- `Task Succeeded. Result: …`
- `Task failed. …`
- `Task timed out. …`
- `Task cancelled by user.`               ← previously matched none
- `Task polling timed out after N minutes …` ← previously matched none

The previous cut handled three; the last two fell through to the
"unknown content" branch and pushed the subtask card back to
`in_progress` even though the backend had already reached a terminal
state. Add explicit matches plus regression tests for both. The
`in_progress` fallback is now reserved for genuinely unrecognised
output (i.e. contract drift), as documented.

Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review

* fix(frontend): sanitize JSON export content via the Markdown content path

Address @ShenAC-SAC's BUG-006 review and the Copilot inline comment on
#3131. The previous cut filtered hidden/tool messages out of the JSON
export but still serialised `msg.content` verbatim, so:

- inline `<think>…</think>` wrappers stayed in the exported `content`
  even with `includeReasoning: false`,
- content-array thinking blocks leaked the `thinking` field,
- `<uploaded_files>…</uploaded_files>` markers leaked the workspace
  paths a user uploaded files to.

JSON now goes through the same sanitiser the Markdown path uses
(`extractContentFromMessage` + `stripUploadedFilesTag`). Reasoning and
tool_calls remain gated behind their `ExportOptions` flags. AI / human
rows that sanitise to empty content with no opted-in reasoning or tool
calls are dropped so the JSON matches the Markdown path's `continue`
on empty assistant fragments.

New regression tests cover the three leak shapes the reviewer called
out plus the empty-content-drop case.

Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(gateway): align lifespan stub with langgraph_runtime two-arg signature

Codex round-3 review of c0bc7a06 flagged this: changing
`langgraph_runtime` to require `startup_config` as a second positional
argument broke the one-arg stub `_noop_langgraph_runtime(_app)` in
`test_gateway_lifespan_shutdown.py`, which is patched into
`app.gateway.app.langgraph_runtime` by the lifespan shutdown bounded-timeout
regression. Lifespan would then call the stub with two args and raise
`TypeError` before the bounded-shutdown assertion ran.

Update the stub to match the new signature. The shutdown test itself is
unaffected — it only cares about the channel `stop_channel_service` hang
path.

Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): strip every known backend marker in export, not just uploads

Codex round-3 review of 258ca800 and the matching maintainer feedback on
PR #3131 made the same point: the JSON export now ran the
Markdown-side sanitiser, but that sanitiser only stripped
`<uploaded_files>`. The full set of payloads middleware embeds inside
message `content` is larger:

- `<uploaded_files>` — `UploadsMiddleware`
- `<system-reminder>` — `DynamicContextMiddleware`
- `<memory>` — `DynamicContextMiddleware` (nested inside system-reminder)
- `<current_date>` — `DynamicContextMiddleware`

The primary protection is still `isHiddenFromUIMessage`: the
`<system-reminder>` HumanMessage is marked `hide_from_ui: true` and never
reaches the formatter. This commit adds the second line of defence so a
regression that drops the `hide_from_ui` flag — or any future middleware
that injects the same tag vocabulary into a visible HumanMessage —
cannot leak the payload into the export file.

Concrete changes:

- New `INTERNAL_MARKER_TAGS` constant + `stripInternalMarkers(content)`
  helper in `core/messages/utils.ts`. The constant doubles as
  documentation for the backend↔frontend contract.
- `formatMessageContent` in `export.ts` now calls `stripInternalMarkers`
  instead of `stripUploadedFilesTag`. UI render paths
  (`message-list-item.tsx`) keep using the narrower function so a user
  legitimately typing `<memory>` in a meta-discussion is preserved.
- The "drop empty rows" guard in `buildJSONMessage` switched from
  `=== undefined` to truthy `!` checks. Codex spotted the asymmetry: when
  `extractReasoningContentFromMessage` returned the empty string (which it
  legitimately can), the JSON path emitted `{reasoning: ""}` while the
  Markdown path's `!reasoning` `continue` correctly dropped the row.

New regression tests cover the defence-in-depth strip with a
`<system-reminder><memory><current_date>` payload deliberately *not*
marked `hide_from_ui`; tool-message sanitization under
`includeToolMessages: true`; the mixed-content-array case
(`thinking + text + image_url`); and the opted-in empty-reasoning drop.

Live verification on a real Ultra-mode thread that uploaded a PDF
(`曾鑫民-薪资交易流水.pdf`): backend state's first HumanMessage carries the
`<uploaded_files>` block (with `/mnt/user-data/uploads/...` paths) as part
of a content-array. The Markdown and JSON export blobs both come back
free of `<uploaded_files>`, `<system-reminder>`, `<current_date>`,
`tool_calls`, and reasoning — while preserving the user's `这是什么 ?`
prompt and the assistant's visible answer.

Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(frontend): cover trim, varied N, and pre-execution Error: prefixes

Codex round-3 review of 50e2c257 flagged three coverage gaps in the
subtask-status parser:

1. `Task cancelled by user.` and `Task polling timed out` previously had
   no whitespace-trim coverage — the original trim test only exercised
   the success prefix. Streaming chunks can arrive with leading/trailing
   newlines; the regex needed an explicit assertion.
2. The polling-timeout case was tested only at one `N` (15 minutes). The
   backend interpolates the live `timeout_seconds // 60` value, so the
   matcher must hold for any positive integer. Now we run the case for
   1, 5, and 60 minutes.
3. `task_tool.py` also emits three `Error:` strings for pre-execution
   failures — unknown subagent type, host-bash disabled, and "task
   disappeared from background tasks". They are intentionally handled by
   `ERROR_WRAPPER_PATTERN` rather than dedicated prefixes (the wrapper
   already produces the right terminal-failed shape) but had no test
   coverage proving that wiring. Codex was right that a refactor splitting
   one of them off into its own prefix would silently break things.

The JSDoc on the constants block now spells the three pre-execution
errors out so the relationship between `task_tool.py` returns and the
prefix vocabulary is explicit.

No production code change beyond the docstring — this commit is pure
coverage hardening for the contract that already exists.

Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review
2026-05-21 21:18:10 +08:00
Lawrance_YXLiao
1c5c585741
fix(runtime): bound write_file execution-failure observations (#3133)
* fix(runtime): bound write_file execution-failure observations

* fix(runtime): preserve write_file error prefixes

* test(runtime): trim write_file prefix assertions

* refactor(runtime): drop redundant exception suffix for permission/directory write errors

Address Copilot review on #3133: the PermissionError and IsADirectoryError
branches now return self-contained, non-redundant messages (e.g.
"Error: Permission denied writing to file: /mnt/...") via direct
truncation, instead of going through _format_write_file_error which
appended a duplicate ": PermissionError: permission denied" suffix.

OSError, SandboxError and the generic Exception branches keep the
unified "Failed to write file '{path}': {ExceptionType}: {detail}"
format so the model still sees a stable, machine-readable error class.

Removes the now-unused message= parameter from _format_write_file_error,
keeping a single code path. Truncation contract (<= 2000 chars) and
host-path sanitization unchanged.

* fix(runtime): handle write_file sandbox init errors

Initialize the requested path before sandbox setup so early sandbox failures can still return a bounded write_file error.

Add a regression test for sandbox initialization failures.

* style(test): format sandbox security tests
2026-05-21 20:35:46 +08:00
Xinmin Zeng
df95154282
fix(tracing): propagate session_id and user_id into Langfuse traces (#2944)
* fix(tracing): propagate session_id and user_id into Langfuse traces

Adds Langfuse v4 reserved trace attributes (langfuse_session_id,
langfuse_user_id, langfuse_trace_name, langfuse_tags) to
RunnableConfig.metadata inside the run worker, so the langchain
CallbackHandler can lift them onto the root trace.

- New deerflow.tracing.metadata.build_langfuse_trace_metadata() returns
  the reserved keys when Langfuse is in the enabled providers, else {}.
- worker.run_agent merges them with setdefault so caller-supplied keys
  win, allowing per-request overrides from upstream metadata.
- session_id mirrors the LangGraph thread_id; user_id reads
  get_effective_user_id() (falls back to "default" in no-auth mode).
- trace_name defaults to "lead-agent"; tags carry env and model name
  when DEER_FLOW_ENV (or ENVIRONMENT) and a model name are present.

Closes #2930

* fix(tracing): attach Langfuse callback at graph root so metadata propagates

The first commit injected ``langfuse_session_id`` / ``langfuse_user_id`` /
``langfuse_trace_name`` / ``langfuse_tags`` into ``RunnableConfig.metadata``,
but on ``main`` the Langfuse callback is attached at *model* level
(``models/factory.py``). LangChain still threads ``parent_run_id`` through
the contextvar, so the handler sees the model as a nested observation and
``__on_llm_action`` strips the ``langfuse_*`` keys
(``keep_langfuse_trace_attributes=False``). The trace's top-level
``sessionId`` / ``userId`` therefore stayed empty in deer-flow's LangGraph
runtime — confirmed live against a real Langfuse instance.

This commit moves the callback to the **graph invocation root** so the
handler fires ``on_chain_start(parent_run_id=None)`` and runs the
``propagate_attributes`` path that actually lifts ``session_id`` /
``user_id`` onto the trace:

- ``models/factory.py``: add ``attach_tracing`` keyword (default ``True``)
  so standalone callers (``MemoryUpdater``, etc.) keep their direct
  model-level tracing.
- ``agents/lead_agent/agent.py``: call ``build_tracing_callbacks()`` once
  inside ``_make_lead_agent`` and append the result to
  ``config["callbacks"]``; the four in-graph ``create_chat_model`` sites
  (bootstrap, default agent, sync + async summarization) pass
  ``attach_tracing=False`` to avoid duplicate spans.
- ``agents/middlewares/title_middleware.py``: same ``attach_tracing=False``
  for the title-generation model, since it inherits the graph's
  RunnableConfig via ``_get_runnable_config``.

Test updates:

- ``tests/test_lead_agent_model_resolution.py`` and
  ``tests/test_title_middleware_core_logic.py``: extend the fake
  ``create_chat_model`` signatures / mock assertions to accept the new
  ``attach_tracing`` kwarg.
- ``tests/test_worker_langfuse_metadata.py``: switch the no-user fallback
  test from direct ContextVar mutation to ``monkeypatch.setattr`` on
  ``get_effective_user_id`` to avoid pollution across the langfuse OTel
  global tracer provider.
- ``tests/conftest.py``: add an autouse fixture that resets
  ``deerflow.config.title_config._title_config`` to its pristine default
  after every test. Any test that loads the real ``config.yaml`` (via
  ``get_app_config()``) calls ``load_title_config_from_dict`` and mutates
  the module-level singleton, which previously poisoned the
  title-middleware suite when run after, e.g., the new
  ``test_worker_langfuse_metadata.py`` cases. The fixture is independent
  of this PR's main change but unblocks the cross-file test run.

Live verification (same Langfuse instance as before):

- Drove ``worker.run_agent`` against the real ``make_lead_agent`` +
  ``gpt-4o-mini`` for three distinct ``user_context`` identities
  (``fancy-engineer``, ``alice-pm``, ``bob-designer``).
- Each run produced one ``lead-agent`` trace whose top-level
  ``sessionId`` / ``userId`` / ``tags`` carry the expected values, e.g.
  ``session=e2e-2930-8f347c-alice-pm user=alice-pm name='lead-agent'
  tags=['model:gpt-4o-mini']``.

Refs #2930.

* fix(tracing): extend root-callback + metadata injection to the embedded client

Addresses Copilot review on PR #2944.

Commit 2 disabled model-level tracing for ``TitleMiddleware`` and
``_create_summarization_middleware`` because ``_make_lead_agent`` now
attaches the tracing callbacks at the graph invocation root. But the
embedded ``DeerFlowClient`` does not call ``_make_lead_agent`` — it
calls ``_build_middlewares`` directly and never appends the tracing
handlers to its ``RunnableConfig``. So under the embedded path,
title-generation and summarization LLM calls were left untraced —
a regression introduced by this PR.

This commit mirrors the gateway worker's injection in
``DeerFlowClient.stream``:

- Append ``build_tracing_callbacks()`` to ``config["callbacks"]`` so
  the Langfuse handler sees ``on_chain_start(parent_run_id=None)`` at
  the graph root and runs the ``propagate_attributes`` path.
- Merge ``build_langfuse_trace_metadata(...)`` into
  ``config["metadata"]`` with ``setdefault`` so caller-supplied keys
  still win.
- ``_ensure_agent`` now creates its main model with
  ``attach_tracing=False`` to avoid duplicate spans now that the
  callback lives at the graph root.

Docs:
- ``backend/CLAUDE.md`` Tracing section rewritten to describe the
  graph-root attachment model (replacing the inaccurate
  "at model-creation time" wording).
- ``README.md`` Langfuse section now lists both injection points
  (worker + client) instead of only the worker path.

Tests:
- ``tests/test_client_langfuse_metadata.py`` (new, 3 cases):
  callbacks + metadata are injected when Langfuse is enabled,
  caller-supplied metadata overrides win via ``setdefault``, and the
  injection is inert when Langfuse is disabled.

Live verification on the real Langfuse instance:

  === user=fancy-client ===
    id=cbd22847..  session=client-2930-6b9491-fancy-client  user=fancy-client  name='lead-agent'
  === user=alice-client ===
    id=b4f6f576..  session=client-2930-6b9491-alice-client  user=alice-client  name='lead-agent'

Refs #2930.

* refactor(tracing): address maintainer review on PR #2944

Addresses @WillemJiang's 5 comments.

1. Duplicated metadata-injection code between worker.py and client.py
   New ``deerflow.tracing.inject_langfuse_metadata(config, ...)`` helper
   takes the 10-line build + merge + setdefault logic that was duplicated
   in ``runtime/runs/worker.py`` and ``client.py``. Both callers now share
   a single source of truth, so the two paths cannot drift.

2. Direct private-attribute mutation in conftest.py and tests
   Added public ``reset_tracing_config()`` / ``reset_title_config()``
   functions. ``tests/conftest.py`` and every test that previously did
   ``tracing_module._tracing_config = None`` or
   ``title_module._title_config = TitleConfig()`` now goes through the
   public API. A future internal rename will surface as an ImportError
   instead of a silent no-op.

3. client.py reading os.environ directly
   ``DeerFlowClient.__init__`` grows an optional ``environment`` parameter
   so programmatic callers can pass the deployment label explicitly.
   ``stream()`` consults ``self._environment`` first and only falls back
   to ``DEER_FLOW_ENV`` / ``ENVIRONMENT`` env vars when nothing was
   passed in. Backwards compatible — env-var behaviour preserved for
   callers that opt to keep using it.

4. build_tracing_callbacks() cached on hot path
   Not implemented. Inspected the langfuse v4 ``langchain.CallbackHandler``
   constructor: it only resolves the module-level singleton client via
   ``get_client()`` and initialises a few dicts (no I/O, no env parsing
   at construction time). The build is essentially free. Caching would
   trade a non-measurable speedup for two real risks: handler instances
   carry per-run state internally (``_run_states``, ``_root_run_states``,
   ``last_trace_id``), and tracing config can be reloaded by env-var
   changes between runs. Will revisit if profiling ever shows it as
   a hot spot.

5. attach_tracing=False easy to forget at new in-graph call sites
   - Module docstring at the top of ``lead_agent/agent.py`` documents
     the invariant ("every in-graph ``create_chat_model`` MUST pass
     ``attach_tracing=False``") and enumerates the current sites.
   - New regression test
     ``test_make_lead_agent_attaches_tracing_callbacks_at_graph_root`` in
     ``tests/test_lead_agent_model_resolution.py`` locks both halves of
     the invariant: ``config["callbacks"]`` carries the tracing handler
     after ``_make_lead_agent``, AND every ``create_chat_model`` call
     captured by the test passes ``attach_tracing=False``. A future
     in-graph site that forgets the flag will fail this test.

Lint clean. Full touched-suite bundle: 246 passed.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-21 16:49:31 +08:00
Xinmin Zeng
31513c2ccb
fix(persistence): emit tz-aware timestamps from SQLite-backed stores (#3130)
SQLAlchemy's DateTime(timezone=True) is a no-op on SQLite (the backend
has no native tz type), so values round-tripped through the DB come
back as naive datetimes. The four SQL _row_to_dict helpers were calling
.isoformat() directly on those naive values, shipping timezone-less
strings like "2026-05-20T06:10:22.970977" out of the API. The browser's
new Date(...) then parses them as local time, shifting recent threads
in /threads/search by the local UTC offset (about 8h in Asia/Shanghai).

Route the four call sites through coerce_iso() instead — it already
normalizes naive values as UTC and emits "+00:00" so the wire format
always carries tz. No data migration is needed; existing SQLite rows
read back via the corrected serializer.

PostgreSQL deployments are unaffected because timestamptz preserves
tzinfo end-to-end.

Closes #3120
2026-05-21 16:22:09 +08:00
Airene Fang
923f516deb
feat(trace):LangGraph -> lead_agent and set custom agent_name to run_name (#3101)
* feat(trace):LangGraph -> lead_agent and set user custom agent name to run_name

* feat(trace):follow github copilot suggest

* feat(trace):Refactor run_name resolution and improve test coverage
2026-05-21 14:48:28 +08:00