2179 Commits

Author SHA1 Message Date
john lee
cbf8b194e8
fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#3084)
* fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#2816)

- JsonlRunEventStore: offload all file I/O to asyncio.to_thread() so the
  event loop is never blocked; add per-thread asyncio.Lock to serialise
  concurrent puts and prevent interleaved JSONL lines
- Split _ensure_seq_loaded into a sync _compute_max_seq (runs in thread)
  and an async wrapper; seq counter is recovered from disk on fresh store init
- DbRunEventStore.put_batch: raise ValueError when events span multiple
  thread_ids (previously silently assumed same thread)
- Add test_jsonl_event_store_async_io.py: 12 tests covering lock reuse,
  concurrent seq monotonicity, disk recovery, and mixed-thread batch rejection

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address Copilot review comments

- delete_by_thread: pop _write_locks after releasing the lock to prevent
  unbounded growth when threads are repeatedly created and deleted
- tests: add regression guard asserting asyncio.to_thread is called for
  _write_record in put(); assert _write_locks entry removed on delete

* fix(lint): move patch import to local scope to fix ruff I001

* fix(lint): apply ruff check+format fixes to test file

* fix(runtime): address review feedback for JSONL async I/O hardening (#2816)

Use setdefault for atomic lock init in _get_write_lock; pop _write_locks
inside the held lock scope in delete_by_thread; update test docstring
and assert lock entry also cleared on delete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: rayhpeng <rayhpeng@gmail.com>
2026-05-29 09:27:53 +08:00
Nan Gao
d46a5779bc
fix(chat): preserve messages after summarization (#3280)
* fix(chat): preserve messages after summarization

* make format

* fix(chat): address summarization review comments
2026-05-29 08:24:47 +08:00
Xinmin Zeng
2ace78d1e5
fix(frontend): surface backend detail when agent name check fails (#3048)
* fix(frontend): surface backend detail when agent name check fails

The new-agent page caught AgentNameCheckError but only branched on
reason === "backend_unreachable". Everything else (notably the 422
"Invalid agent name '...'. Must match ^[A-Za-z0-9-]+$" response from
GET /api/agents/check when the user submits a name with disallowed
characters — trailing space, dot, Chinese, invisible whitespace from
copy-paste) fell through to the generic fallback "Could not verify
name availability — please try again", swallowing the detail that
already told the user exactly what to fix.

Add a request_failed branch that surfaces err.message (which
checkAgentName already populates from the backend's detail at
core/agents/api.ts). The disabled / backend_unreachable / unknown-
error paths are unchanged.

Pin the contract with unit tests covering: 200 success, fetch
rejection, 502/503/504 network errors, agents_api disabled detail,
422 validation detail carried verbatim, statusText fallback when
detail is absent, and a regression guard against misclassifying a
422 as agents_api disabled.

Closes #3041

* fix(frontend): localise the error prefix when surfacing backend detail

The previous commit surfaced the backend's raw `err.message` on the
new-agent page when the name check failed. The detail itself is
English (backend's `_validate_agent_name` text, any 5xx business
message, etc.) and dropping it bare into a zh-CN page produced a
jarring English-among-Chinese line that didn't match neighbouring
strings like "已存在同名智能体" / "无法验证名称可用性".

Add `nameStepCheckErrorWithDetail` as a templated string ("Name
check failed: {detail}" / "名称校验失败:{detail}"), mirroring the
existing `nameStepBootstrapMessage` `{name}` template pattern. The
page wraps `err.message` in it when present and falls back to the
plain `nameStepCheckError` when the detail is empty.

Rendered output (verified locally with a Console fetch mock that
returns 500 + detail):

  zh-CN: 名称校验失败:Database connection lost: SQLAlchemy connection
         pool exhausted (max 5 connections, all in use)
  en-US: Name check failed: Database connection lost: SQLAlchemy
         connection pool exhausted (max 5 connections, all in use)

The localised prefix tells the user *what operation* failed; the
raw detail tells them *why*. Translating the detail itself would
be lossy (any unbounded backend string would need a translation
table) and would break the debuggability the previous commit
delivered.

Refs #3041

* fix(frontend): distinguish backend detail from generated fallback in AgentNameCheckError

Addresses Copilot's review on #3048: the previous commits keyed off
`err.message`, but `checkAgentName` substitutes a generated fallback
string ("Failed to check agent name: ${statusText}") when the backend
sent no detail. That guaranteed `err.message` was always truthy, made
the `nameStepCheckError` fallback branch unreachable in practice, and
could surface awkward strings like "名称校验失败:Failed to check
agent name: Bad Gateway" in the UI.

Add an explicit `detail: string | null` field to AgentNameCheckError.
`checkAgentName` populates it only when the backend response actually
carried a string `detail` (defensive guard against the dict-shaped
detail that other deer-flow endpoints use for typed error codes).
The new-agent page now selects on `err.detail` instead of `err.message`
so the localised fallback wins when no real detail exists.

Also fix the prettier formatting that broke lint-frontend CI on the
previous push.

Test changes:
- The 422 carry-through test now asserts both `detail` and `message`
  hold the backend string verbatim.
- A new "falls back to statusText in message but leaves detail null"
  test pins the contract that no real detail ⇒ no UI surface leak.
- A new "treats non-string detail as null" test guards against future
  backend schema drift toward dict-shaped detail.

Refs #3041 #3048
2026-05-28 18:38:45 +08:00
AochenShen99
8330b244a9
docs: add blocking IO detection usage and maintenance (#3233)
* docs: add blocking IO detection usage and maintenance

* docs: address blocking io doc review feedback
2026-05-28 18:26:26 +08:00
AochenShen99
44677c5eb4
feat(provider) Add patched MiMo reasoning content support (#3298)
* Add patched MiMo reasoning content support

* Clarify MiMo patched model coverage

* Remove unused MiMo payload index

* Address MiMo review nits
2026-05-28 18:24:32 +08:00
Admire
2fdfff0db3
fix(frontend): fix Mermaid preview failure in historical messages (#3196)
* fix(frontend): render historical mermaid diagrams

* fix(frontend): address mermaid review feedback

* Stabilize cancel lifecycle test

* fix(frontend): handle mermaid fence variants

* fix(frontend): normalize mermaid arrow spacing

* fix(frontend): handle mermaid CRLF fences

* chore: keep mermaid fix frontend-scoped

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-28 18:20:02 +08:00
zgenu
737abc0e45
fix: ignore stale run reconnect conflicts (#3284)
* fix: ignore stale run reconnect conflicts

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix: ignore stale run reconnect conflicts

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-28 17:29:30 +08:00
AochenShen99
8decfd327e
Fix custom skill install permissions (#3241)
* Fix custom skill install permissions

* Fix skill upload test portability

* Keep custom skill writes sandbox readable

* Clear sandbox write bits on skill permissions

* Limit custom skill write permission updates
2026-05-28 15:48:32 +08:00
Xinmin Zeng
0287240728
fix(frontend): show new thread in sidebar immediately on creation (#3276) (#3283)
When a user starts a new conversation, the sidebar list did not display
it until the AI finished streaming and generated a title. This made it
impossible to switch back to an in-progress conversation when working
with multiple threads concurrently.

Optimistically insert the new thread into the TanStack Query cache
during the `onCreated` callback so the sidebar renders a placeholder
entry ("New chat") as soon as the backend acknowledges thread creation.
The existing `onUpdateEvent` title handler and `onFinish` query
invalidation then update the entry in-place with the real title.
2026-05-28 15:27:38 +08:00
Lucy Shen
37451500eb
fix(gateway): split stream_existing_run into per-method routes for unique OpenAPI operationIds (#3228)
* fix(gateway): split stream_existing_run into per-method routes for unique OpenAPI operationIds

`@router.api_route("/.../stream", methods=["GET", "POST"])` registers a
single FastAPI route that holds both methods. FastAPI's auto-generated
`operationId` is computed once per route from a single method picked out
of `route.methods`, so when OpenAPI generation iterates over every method
on that route both end up sharing the same `operationId`. That triggers
`UserWarning: Duplicate Operation ID stream_existing_run_..._stream_(get|post) for function stream_existing_run`
during `app.openapi()` and produces an invalid OpenAPI spec for SDK /
codegen consumers.

Register GET and POST as two separate routes on the same handler so each
method gets a distinct auto-generated `operationId` ("..._stream_get" and
"..._stream_post"). Behavior is otherwise unchanged: same handler, same
`require_permission` decoration, same response.

Add `tests/test_openapi_operation_ids.py` to lock in the invariant:
no duplicate-operationId warnings during spec generation, globally unique
operationIds across the spec, and distinct GET / POST operationIds on the
stream endpoint specifically. Reverted the source change locally and
confirmed all three tests fail before the fix.

* test(runtime): widen CancelledError catch in _ScriptedAgent to fix cancel-race flake

`_ScriptedAgent.astream()` previously only caught `asyncio.CancelledError`
inside the inner `if self.block_after_first_chunk:` while-loop. Cancellation
arriving during any earlier `await` in the same body
(`self.model.ainvoke`, `_write_checkpoint`, the `yield`) would propagate
without setting `controller.cancelled`, so callers waiting on
`controller.cancelled.wait(5)` after `POST /cancel` returned 204 could race
and time out.

`test_cancel_interrupt_stops_running_background_run` waits only for the
`started` event (set on the first line of `astream`) before issuing cancel,
so its race window spans all three pre-loop `await`s. On a clean `main`
checkout, stress-running the test 20× reproduces the failure 6/20
(~30%). `test_cancel_rollback_restores_pre_run_checkpoint`, which waits
for the later `checkpoint_written` event, passes 20/20 — confirming the
race lives entirely in the gap between `started.set()` and the
cancellation-aware block.

Widen the try/except to cover the entire `astream` body so any
`CancelledError` sets the controller event; the non-cancel path is
unchanged (no exception means no event set). After this change the
previously flaky test passes 50/50, the rollback test still passes 30/30,
and the full backend suite remains at 3649 passed / 19 skipped.

Test-only change — `backend/tests/test_runtime_lifecycle_e2e.py` is the
only file touched; the production cancel pipeline is unaffected.
2026-05-28 08:20:52 +08:00
Lawrance_YXLiao
3cb75887c1
fix(memory): parse wrapped memory update json responses (#3252)
* fix(memory): parse wrapped memory update json responses

* test(memory): format wrapped response coverage

* fix(memory): guard malformed nested memory facts

* fix(memory): require full update object when parsing responses

* fix(memory): fail closed on unsafe partial removals

* style(memory): format updater tests
2026-05-28 07:46:44 +08:00
AochenShen99
a5599c100c
fix(gateway): honour on_disconnect on /wait endpoints (#3267)
* fix(gateway): honour on_disconnect on /wait endpoints (#3265)

The non-streaming /threads/{tid}/runs/wait and /runs/wait handlers used
to await record.task directly with no disconnect handling and silently
swallow CancelledError. When a long tool call (e.g. pip install inside
a custom skill) kept the connection idle long enough for an
intermediate HTTP layer to time out, the handler would still read the
in-progress checkpoint and return it as if the run had completed
normally -- masking a half-finished run as a successful response.

Add wait_for_run_completion in app.gateway.services that mirrors
sse_consumer's bridge-consumption pattern: subscribe to the stream
bridge until END_SENTINEL, poll request.is_disconnected on every
wake-up, and on real client disconnect cancel the background run when
record.on_disconnect is "cancel". Wire it into both wait endpoints.

The streaming path was unaffected because sse_consumer already has
this loop; this just brings /wait to parity.

* fix(gateway): skip checkpoint serialization on /wait disconnect

Copilot review on #3267 caught a follow-on of the same #3265 bug: when
the client disconnects, wait_for_run_completion breaks out of the bridge
loop and cancels the run, but the /wait endpoint then continues to read
the checkpointer and serializes whatever partial checkpoint exists as a
normal 200 response.

Have the helper return a bool — True only when END_SENTINEL was observed
— and skip the checkpoint serialization path on False. Also reorder the
inner check so END_SENTINEL is honoured even when is_disconnected() flips
true in the same iteration; the run truly finished so the real final
checkpoint is still valid.
2026-05-28 07:22:39 +08:00
dependabot[bot]
9e332c594a
chore(deps): bump uuid from 10.0.0 to 14.0.0 in /frontend (#3281)
Bumps [uuid](https://github.com/uuidjs/uuid) from 10.0.0 to 14.0.0.
- [Release notes](https://github.com/uuidjs/uuid/releases)
- [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md)
- [Commits](https://github.com/uuidjs/uuid/compare/v10.0.0...v14.0.0)

---
updated-dependencies:
- dependency-name: uuid
  dependency-version: 14.0.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-28 07:14:44 +08:00
Willem Jiang
162fb2143e
fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyioRuntimeError (#3203) (#3224)
* fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyio RuntimeError (#3203)

  HTTP/SSE transports use anyio.TaskGroup internally for streamable
  connections. These task groups have cancel scopes bound to the async task
  that created them, so closing a pooled session from a different task
  raises RuntimeError. Restrict session pooling to stdio transports only.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* docs: clarify MCP pooling applies only to stdio tools

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2dd9881d-54c6-45fd-90bc-154a09e29841

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-27 08:32:57 +08:00
QY
92905e9e3e
fix(todo): reuse thread state schema (#3206)
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-26 23:58:08 +08:00
AochenShen99
da41701f87
Add static blocking IO inventory (#3208)
* feat(detectors): add static blocking IO inventory

* refactor(detectors): drop superseded runtime probe; clarify static report path

- Remove the #2924 custom runtime blocking IO probe entirely:
  backend/tests/support/detectors/blocking_io.py,
  backend/tests/test_blocking_io_detector.py,
  backend/tests/test_blocking_io_probe_integration.py, and the
  pytest_addoption / pytest_runtest_call / pytest_runtest_teardown /
  pytest_sessionfinish / pytest_terminal_summary hooks plus the
  blocking_io_detector fixture from backend/tests/conftest.py.
  Its narrow DEFAULT_BLOCKING_CALL_SPECS (time.sleep, requests, httpx,
  os.walk, Path.resolve, Path.read_text, Path.write_text) cannot serve
  as a CI gate; a Blockbuster-backed runtime detector will land in a
  separate follow-up PR. Leaving the half-coverage probe alongside
  the static inventory in this PR added a redundant detect path with
  no production value.
- Address Copilot review comments on backend/README.md and
  backend/CLAUDE.md by stating explicitly that the JSON report writes
  to .deer-flow/blocking-io-findings.json at the repository root,
  whether the target is invoked from the repo root or from backend/.

Verified: pytest tests/test_detect_blocking_io_static.py (18 passed),
ruff check + format on touched files (passed), make detect-blocking-io
from both repo root and backend/ produce the same 105-finding report
at <repo-root>/.deer-flow/blocking-io-findings.json.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-26 23:30:24 +08:00
Xinmin Zeng
e02801944a
chore: add a pull request template (#3259)
* chore: add pull request template

* fix: address Copilot review on PR template

- Reword the issue-link comment (plain #123 links; Fixes/Closes only auto-closes)
- Remove the standalone '-' bullets under Bug fix verification / Validation
- Align Validation commands with CI (frontend format + build with BETTER_AUTH_SECRET)
2026-05-26 23:25:29 +08:00
Stellar鱼
b00749a8a6
fix(auth): share internal gateway token across workers (#3184)
* fix(auth): share internal gateway token across workers

* fix: restore deploy script executable bit

* Update deploy.sh

to skip the auth_token setup for the down command

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-26 23:19:57 +08:00
AochenShen99
e344be8d94
feat(tests): add Blockbuster runtime gate for event-loop blocking IO (#3229)
* feat(tests): add Blockbuster runtime gate for event-loop blocking IO

Adds a strict runtime gate that fails CI when sync blocking IO calls run
on the asyncio event loop thread through DeerFlow business code.

Components:
- backend/tests/support/detectors/blocking_io_runtime.py — Blockbuster
  context scoped to `app.*` and `deerflow.*` so test infrastructure,
  pytest internals, and third-party libraries stay silent.
- backend/tests/blocking_io/conftest.py — pytest_runtest_protocol
  hookwrapper that wraps every item (setup + call + teardown) with the
  strict context. Respects `@pytest.mark.allow_blocking_io` opt-out.
- backend/tests/blocking_io/test_skills_load.py — regression anchor for
  the #1917 fix (asyncio.to_thread offload around
  LocalSkillStorage.load_skills).
- backend/tests/blocking_io/test_sqlite_lifespan.py — regression anchor
  for the #1912 fix (asyncio.to_thread offload around
  ensure_sqlite_parent_dir).
- backend/tests/blocking_io/test_gate_smoke.py — meta-test asserting the
  gate actually catches unoffloaded blocking IO and that the
  `@pytest.mark.allow_blocking_io` opt-out works.
- backend/Makefile — `make test-blocking-io` target.
- .github/workflows/backend-blocking-io-tests.yml — hard-fail PR gate on
  ubuntu-latest. Windows matrix deferred to follow-up.

Dependencies:
- blockbuster>=1.5.26,<1.6 added to dev group.

Coverage boundary (called out in PR body): the gate only catches blocking
IO on code paths the test suite actually exercises. Static AST inventory
(separate, informational) is the complementary coverage tool. Three blind
spot categories — untested paths, mocked-away paths, env-mismatched paths
— are documented in the PR description.

Findings surfaced while authoring this PR:
- resolve_sqlite_conn_str in runtime/store/_sqlite_utils.py:19 does sync
  Path.resolve() -> os.path.abspath on the lifespan loop thread, ahead of
  the #1912 fix. Not addressed here; tracked as follow-up.

Tests: 4 passed locally (`make test-blocking-io`).
Lint/format: clean (`ruff check` and `ruff format --check`).

* fix(tests): scope Blockbuster gate to blocking-io suite

* fix(tests): harden Blockbuster runtime gate

* test(blocking-io): add project rule extension point

* test(blocking-io): address review cleanup
2026-05-26 23:03:49 +08:00
Admire
f68bcb771c
fix(frontend): guard message copy clipboard access (#3211)
* fix(frontend): guard message copy clipboard access

* fix(frontend): reuse clipboard guard across copy actions
2026-05-26 09:37:51 +08:00
AochenShen99
11dd5b0683
fix(frontend): strip unclosed <think> tags from streaming AI content (#3218)
* fix(frontend): strip unclosed <think> tags from streaming AI content

During streaming, an opening <think> tag may arrive in one chunk
while the matching </think> arrives in a later chunk. The existing
splitInlineReasoning regex only matched fully closed pairs, so the
mid-flight reasoning was left in message.content and rendered into
the chat bubble via the markdown pipeline's rehypeRaw plugin until
the closing tag landed.

Extend splitInlineReasoning with a second pass: after stripping every
closed <think>...</think> pair, route any remaining content from a
lone opener to the reasoning slot and leave only the preceding
preamble in content. Closed-tag behavior is unchanged.

Covers every provider whose stream emits reasoning inline as <think>
tags (MiniMax streaming path, MindIE, PatchedChatOpenAI, and any
gateway-served DeepSeek/OpenAI-compatible model).

* style(frontend): apply prettier formatting to streaming reasoning tests

* fix(frontend): skip <think> split for literal think tags in inline code

Treats a `<think>` opener immediately preceded by a backtick as part of
markdown inline code rather than a streaming reasoning marker. Prevents
permanent content truncation when an AI message documents the `<think>`
tag literally (e.g. ``Use `<think>` markers``), where the streaming-safe
fallback would otherwise route the rest of the answer into the reasoning
panel because no `</think>` ever arrives.

Adds regression tests for both the post-stream and mid-stream cases.
2026-05-26 09:35:07 +08:00
Willem Jiang
f9b7071304
fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127) (#3134)
* fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127)

  When using AIO sandbox with LocalContainerBackend, uploaded files are
  created with 0o600 (owner-only) permissions by the gateway process
  running as root. The sandbox process inside the Docker container runs
  as a non-root user and cannot read these bind-mounted files, causing
  a "Permission denied" error on read_file.

  Add `needs_upload_permission_adjustment` attribute to SandboxProvider
  (default True) to indicate that uploaded files need chmod adjustment.
  LocalSandboxProvider opts out (same user). A new `_make_file_sandbox_readable`
  function adds S_IRGRP | S_IROTH bits after files are written, changing
  permissions from 0o600 to 0o644 so the sandbox can read the uploads.

  fixes #3127

* fix(uploads): unconditionally adjust file permissions for sandbox access

  The conditional check  meant uploaded files retained 0o600
  permissions in some Docker sandbox configurations, preventing the
  sandbox process (UID 1000) from reading them. Always add group/other
  read bits so every sandbox setup can access uploaded content. Also add
  read bits to the sync-path writable helper as defense in depth.
2026-05-25 09:26:18 +08:00
Admire
e7967a7fc3
fix(frontend): hide copy for streaming assistant turn (#3176) 2026-05-23 23:29:16 +08:00
Huixin615
8785658a2e
fix(agents): preserve todos state across node updates (#3180)
* fix(agents): preserve todos state across node updates

ThreadState.todos had no reducer, so any downstream node returning a
partial state without todos was implicitly setting it to None, which
LangGraph then used to overwrite the previously streamed value. This
caused the to-do list to render correctly during streaming but vanish
once streaming completed.

Add a merge_todos reducer that keeps the last non-None value, mirroring
the merge_artifacts pattern already used in the same file. An explicit
empty list is still respected so that 'user cleared todos' works.

Tests: 10 new unit tests in tests/test_thread_state_reducers.py covering
merge_todos plus regression coverage for merge_artifacts and
merge_viewed_images. All 69 thread-related tests pass locally.

Closes #3123

* test(agents): add annotation binding regression guard

Address Copilot review feedback on #3123:

- Add TestThreadStateAnnotations asserting that ThreadState.todos is
  Annotated with merge_todos. Without this guard, reverting the
  Annotated[list | None, merge_todos] binding would silently regress
  #3123 while all existing reducer unit tests continue to pass.

- Align test imports to 'from deerflow.agents.thread_state import ...'
  matching the rest of the backend test suite.
2026-05-23 23:25:38 +08:00
rayhpeng
0fb05825a2
fix(runtime): make run creation persistence atomic (#3152)
* fix runtime run creation persistence atomicity

* fix run creation cancellation rollback

* fix run manager test cleanup await

* clarify run creation rollback on cancellation

* document new run persistence rollback boundary

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-23 22:43:34 +08:00
Admire
d0fa37e71d
fix(frontend): avoid duplicate optimistic user message (#3002) 2026-05-23 17:02:23 +08:00
AochenShen99
604fcbb9d2
Stabilize write artifact previews (#3172) 2026-05-23 16:56:14 +08:00
Nan Gao
a64a39dbc0
config: raise default summarization trigger before v2.0-m1 (#3174)
* config: update summarization configuration

* docs: sync summarization trigger guidance
2026-05-23 15:38:25 +08:00
JeffJiang
b103d1a7f5
feat(frontend): support static website demo mode (#3170)
* feat(frontend): support static website demo mode

* fix(frontend): render html artifact previews from blob content

* chore(frontend): apply pre-commit formatting

* fix(frontend): address static demo PR review comments

* Update the release information of DeerFlow

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-23 00:10:56 +08:00
AochenShen99
66d6a6a4e8
fix: harden run finalization persistence (#3155)
* fix: harden run finalization persistence

* style: format gateway recovery test

* fix: align run repository return types

* fix: harden completion recovery follow-up
2026-05-23 00:09:06 +08:00
Nan Gao
f0bae28636
fix(middleware): handle repeated tool call ids (#3143)
* fix(middleware): handle repeated tool call ids

* add tests

* refactor(middleware): rely on tool result queues
2026-05-22 21:44:05 +08:00
Lawrance_YXLiao
2eeb597985
fix(runs): expose active progress counters (#3148)
* fix(runs): expose active progress counters

* fix(runs): avoid delayed progress flush on completion

* fix(runs): tighten progress snapshot semantics

* fix(runs): preserve omitted progress fields

* chore(runs): remove duplicate journal initialization
2026-05-22 21:42:14 +08:00
Nan Gao
914d6a4f1c
docs: add provider safety termination post (#3167) 2026-05-22 21:33:15 +08:00
Xinmin Zeng
be0eae9825
fix(runtime): suppress tool execution when provider safety-terminates with tool_calls (#3035)
* fix(runtime): suppress tool execution when provider safety-terminates with tool_calls

When a provider stops generation for safety reasons (OpenAI/Moonshot
finish_reason=content_filter, Anthropic stop_reason=refusal, Gemini
finish_reason=SAFETY/BLOCKLIST/PROHIBITED_CONTENT/SPII/RECITATION/
IMAGE_SAFETY/...), the response may still carry truncated tool_calls.
LangChain's tool router treats any non-empty tool_calls as executable,
so partial arguments (e.g. write_file with a half-finished markdown)
get dispatched and the agent loops on retry.

Add SafetyFinishReasonMiddleware at after_model: detect safety
termination via a pluggable detector registry, clear both structured
tool_calls and raw additional_kwargs.tool_calls / function_call,
preserve response_metadata.finish_reason for downstream observers,
stamp additional_kwargs.safety_termination for traces, append a
user-facing explanation to message content (list-aware for thinking
blocks), and emit a safety_termination custom stream event so SSE
consumers can reconcile any "tool starting..." UI.

Default detectors cover OpenAI-compatible content_filter, Anthropic
refusal, and Gemini safety enums (text + image). Custom providers are
added via reflection (same pattern as guardrails). Wired into both
lead-agent and subagent runtimes.

Closes #3028

* fix(runtime): persist safety_termination as a middleware audit event

Address review on #3035: the SSE custom event is great for live
consumers but invisible to post-run audit. RunEventStore should carry
its own row so operators can answer "which runs were safety-suppressed
today?" from a single SQL query without joining the message body.

Worker now exposes the run-scoped RunJournal via
runtime.context["__run_journal"] (sentinel key, internal channel).
SafetyFinishReasonMiddleware calls the previously-unused
RunJournal.record_middleware, which emits

  event_type = "middleware:safety_termination"
  category   = "middleware"
  content    = {name, hook, action, changes={
                  detector, reason_field, reason_value,
                  suppressed_tool_call_count,
                  suppressed_tool_call_names,
                  suppressed_tool_call_ids,
                  message_id, extras}}

Tool *arguments* are deliberately excluded — those are the very content
the provider filtered and persisting them would defeat the purpose of
the safety filter (per review note in #3035).

Graceful skips when journal is absent (subagent runtime, unit tests,
no-event-store local dev). Journal exceptions never propagate into the
agent loop.

Refs #3028

* fix(runtime): satisfy ruff format + address Copilot review

- ruff format on safety_finish_reason_config.py and e2e demo (CI lint
  failed on ruff format --check; backend Makefile lint target runs
  ruff check AND ruff format --check).
- Docstring on SafetyFinishReasonConfig now says resolve_variable to
  match the actual loader used in from_config (the wording was
  resolve_class previously; behavior is unchanged — resolve_variable
  mirrors how guardrails.provider is loaded).
- Switch the AIMessage type check in SafetyFinishReasonMiddleware._apply
  from getattr(last, "type") == "ai" to isinstance(last, AIMessage),
  matching TokenUsageMiddleware / TodoMiddleware / ViewImageMiddleware
  / SummarizationMiddleware which are the dominant pattern.

Refs #3028
2026-05-22 21:20:28 +08:00
Nan Gao
253542ea0d
docs: discourage MCP filesystem workspace config (#3141) 2026-05-22 09:19:23 +08:00
Willem Jiang
c881d95898
fix(mcp): persist MCP sessions across tool calls for stateful servers (#3089)
* fix(mcp): persist MCP sessions across tool calls for stateful servers

  MCP tools loaded via langchain-mcp-adapters created a new session on
  every call, causing stateful servers like Playwright to lose browser
  state (pages, forms) between consecutive tool invocations within the
  same thread.

  Add MCPSessionPool that maintains persistent sessions scoped by
  (server_name, thread_id). Tool calls within the same thread now reuse
  the same MCP session, preserving server-side state. Sessions are evicted
  in LRU order (max 256) and cleaned up on cache invalidation.

  Fixes #3054

* fix(sandbox): add group/other read permissions to uploaded files for Docker sandbox (#3127)

  When using AIO sandbox with LocalContainerBackend, uploaded files are
  created with 0o600 (owner-only) permissions by the gateway process
  running as root. The sandbox process inside the Docker container runs
  as a non-root user and cannot read these bind-mounted files, causing
  a "Permission denied" error on read_file.

  Add `needs_upload_permission_adjustment` attribute to SandboxProvider
  (default True) to indicate that uploaded files need chmod adjustment.
  LocalSandboxProvider opts out (same user). A new `_make_file_sandbox_readable`
  function adds S_IRGRP | S_IROTH bits after files are written, changing
  permissions from 0o600 to 0o644 so the sandbox can read the uploads.

* fix(mcp): address review comments on session pool and tools

- _extract_thread_id: return "default" instead of stringifying None
  when get_config() returns no thread_id
- call_with_persistent_session: fix **arguments annotation from
  dict[str,Any] to Any
- Replace private _convert_call_tool_result import with a local
  implementation that handles all MCP content block types
- _make_session_pool_tool: accept tool_interceptors and apply the
  configured interceptor chain on every call (preserving OAuth and
  custom interceptors)
- MCPSessionPool: replace asyncio.Lock with threading.Lock; restructure
  get/close methods to never await while holding the lock; add
  close_all_sync() that closes sessions on their owning event loops
- reset_mcp_tools_cache: use pool.close_all_sync() instead of
  asyncio.run-in-thread to close sessions deterministically
- test: add test_session_pool_tool_sync_wrapper_path_is_safe covering
  tool invocation via the sync wrapper (tool.func) path

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9e7f9e7f-1d2b-464a-b3b7-7f1649b74122

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* fix(mcp): extract SESSION_CLOSE_TIMEOUT to class constant

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/9e7f9e7f-1d2b-464a-b3b7-7f1649b74122

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* Potential fix for pull request finding 'Empty except'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
2026-05-21 23:22:20 +08:00
Xinmin Zeng
e93f658472
fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107) (#3131)
* fix(task-tool): unwrap callback manager when locating usage recorder

`config["callbacks"]` may arrive as a `BaseCallbackManager` (e.g. the
`AsyncCallbackManager` LangChain hands to async tool runs), not just a plain
list. The previous `for cb in callbacks` loop raised
`TypeError: 'AsyncCallbackManager' object is not iterable`, which
`ToolErrorHandlingMiddleware` then converted into a failed `task` ToolMessage
even though the subagent had completed internally — Ultra mode lost subagent
results and the lead agent fell back to redoing the work.

Unwrap `BaseCallbackManager.handlers` before searching for the recorder.

Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): treat any task tool error as a terminal subtask failure

The subtask card status machine matched only three English prefixes (`Task
Succeeded. Result:`, `Task failed.`, `Task timed out`). Anything else fell
through to `in_progress`, so a `task` tool error wrapped by
`ToolErrorHandlingMiddleware` (`Error: Tool 'task' failed ...`) left the card
spinning forever even after the run had ended.

Extract the prefix logic into `parseSubtaskResult` and recognise any leading
`Error:` token as a terminal failure. The extracted function is unit-tested
against the legacy prefixes plus the `AsyncCallbackManager` regression
captured in the upstream issue.

Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(frontend): exclude hidden, reasoning, and tool payloads from chat export

`formatThreadAsMarkdown` / `formatThreadAsJSON` iterated raw messages without
running the UI-level `isHiddenFromUIMessage` filter. Exported transcripts
therefore included `hide_from_ui` system reminders, memory injections,
provider `reasoning_content`, tool calls, and tool result messages — content
that is intentionally hidden in the chat view.

Filter the export to the user-visible transcript by default and gate
reasoning / tool calls / tool messages / hidden messages behind explicit
`ExportOptions` flags so a future debug export can opt back in without
forking the formatter.

Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(gateway): route get_config through get_app_config for mtime hot reload

`get_config(request)` returned the `app.state.config` snapshot captured at
startup. The worker / lead-agent path then threaded that frozen `AppConfig`
through `RunContext` and `agent_factory`, so per-run fields edited in
`config.yaml` (notably `max_tokens`) were ignored until the gateway process
was restarted — even though `get_app_config()` already does mtime-based
reload at the bottom layer.

Route the request dependency through `get_app_config()` directly. Runtime
`ContextVar` overrides (`push_current_app_config`) and test-injected
singletons (`set_app_config`) keep working; `app.state.config` is now only
read at startup for one-shot bootstrap (logging level, IM channels,
`langgraph_runtime` engines).

`tests/test_gateway_deps_config.py` encoded the old snapshot contract and is
removed; `tests/test_gateway_config_freshness.py` replaces it with mtime,
ContextVar, and `set_app_config` coverage. `test_skills_custom_router.py` and
`test_uploads_router.py` now inject test configs via FastAPI
`dependency_overrides[get_config]` instead of mutating `app.state.config`.

Document the hot-reload boundary in `backend/CLAUDE.md` so reviewers know
which fields are picked up on the next request vs. which still require a
restart (`database`, `checkpointer`, `run_events`, `stream_bridge`,
`sandbox.use`, `log_level`, `channels.*`).

Refs: bytedance/deer-flow#3107 (BUG-001)

* fix(gateway): broaden get_config 503 to any config-load failure

Address review feedback on the previous commit:

1. Narrow exception catch removed. The old contract returned 503 whenever
   `app.state.config is None`. The first cut only mapped
   `FileNotFoundError`, leaving `PermissionError`, YAML parse errors, and
   pydantic `ValidationError` to bubble up as 500. At the request boundary
   we treat any inability to materialise the config as "configuration not
   available" (503) and log the original exception so the operator still
   has the stack.

2. Removed the unused `request: Request` parameter and the matching
   `# noqa: ARG001`. FastAPI's `Depends()` does not require the dependency
   to accept `Request`; the only call site uses the no-arg form.

3. `backend/CLAUDE.md` boundary now lists the *reason* each field is
   restart-required (engine binding, singleton caching, one-shot
   `apply_logging_level`, etc.), not just the field name, so reviewers do
   not have to reverse-engineer the boundary themselves.

Tests parametrise four exception classes (`FileNotFoundError`,
`PermissionError`, `ValueError`, `RuntimeError`) and assert 503 for each.

Refs: bytedance/deer-flow#3107 (BUG-001)

* fix(task-tool): defend _find_usage_recorder against non-list callbacks

Address review feedback. The previous commit handled the two common shapes
LangChain hands to async tool runs — a plain `list[BaseCallbackHandler]` and
a `BaseCallbackManager` subclass — but iterated any other shape directly,
which would still raise `TypeError` if e.g. a single handler instance leaked
through without a list wrapper.

Treat any non-list, non-manager `config["callbacks"]` value as "no recorder"
rather than crash. Docstring now lists all four shapes explicitly. New tests
cover the single-handler-object case, `runtime is None`, `callbacks is None`,
and `runtime.config` being a non-dict — all required to be silent no-ops.

Refs: bytedance/deer-flow#3107 (BUG-002)

* fix(frontend): drop dead identity ternary and add opt-in export tests

Address review feedback on the previous export commit:

1. Removed the no-op `typeof msg.content === "string" ? msg.content : msg.content`
   expression in `formatThreadAsJSON`. Both branches returned the same value;
   the message content now flows through unchanged whether it is a string or
   the rich `MessageContent[]` shape (LangChain JSON-serialises the array
   structure correctly already).

2. Expanded the JSDoc on `ExportOptions` to make it clearer that the four
   flags are not currently wired to any UI control — callers wanting a debug
   export must build the options object explicitly. The default behaviour
   continues to match the explicit prescription in
   bytedance/deer-flow#3107 BUG-006.

3. Added opt-in coverage. The previous tests only exercised the
   `options = {}` default path; the new cases verify each flag flips the
   corresponding payload back into the export so a future debug-export
   surface does not silently break the contract.

Refs: bytedance/deer-flow#3107 (BUG-006)

* fix(frontend): export subtask prefix constants and document fallback intent

Address review feedback on the previous BUG-007 commit:

1. `SUCCESS_PREFIX`, `FAILURE_PREFIX`, `TIMEOUT_PREFIX`, and the
   `ERROR_WRAPPER_PATTERN` regex are now exported. The JSDoc explicitly
   pins them as part of the backend↔frontend contract defined in
   `task_tool.py` and `tool_error_handling_middleware.py`, so any future
   structured-status migration (e.g. backend writing
   `additional_kwargs.subagent_status` instead of leading text) can
   reference these from one canonical place rather than redefine them.

2. The `in_progress` fallback now carries a docstring explaining the
   deliberate choice — LangChain only ever emits a `ToolMessage` once the
   tool itself has returned, so unrecognised content means the contract
   has drifted and "still running" is the right operator signal (eagerly
   marking it terminal-failed would mask the drift).

No behaviour change; this is documentation and an API export.

Refs: bytedance/deer-flow#3107 (BUG-007)

* fix(gateway): drop app.state.config snapshot and freeze run_events_config

Address @ShenAC-SAC's BUG-001 review on #3131. The previous cut still
stored an ``AppConfig`` snapshot on ``app.state.config`` for startup
bootstrap. Two follow-on hazards from that:

1. Future code touching the gateway lifespan could accidentally start
   reading ``app.state.config`` again, silently regressing the request
   hot path back to a stale snapshot.
2. ``get_run_context()`` paired a freshly-reloaded ``AppConfig`` with the
   startup-bound ``event_store`` and a *live* ``run_events_config``
   field — so an operator who edited ``run_events.backend`` mid-flight
   would have produced a run context whose ``event_store`` and
   ``run_events_config`` referred to different backends.

Clean approach (aligned with the direction in PR #3128):

- ``lifespan()`` keeps a local ``startup_config`` variable and passes it
  explicitly into ``langgraph_runtime(app, startup_config)`` and into
  ``start_channel_service``. No ``app.state.config`` attribute is set at
  any point.
- ``langgraph_runtime`` now accepts ``startup_config`` as a required
  parameter, removing the ``getattr(app.state, "config", None)`` lookup
  and the "config not initialised" runtime error.
- The matching ``run_events_config`` is frozen onto ``app.state`` next
  to ``run_event_store`` so ``get_run_context`` reads the two from the
  same startup-time source. ``app_config`` continues to be resolved
  live via ``get_app_config()``.
- ``backend/CLAUDE.md`` boundary explanation updated to spell out the
  ``startup_config`` / ``get_app_config()`` split.

New regression test ``test_run_context_app_config_reflects_yaml_edit``
exercises the worker-feeding path: it asserts that ``ctx.app_config``
follows a mid-flight ``config.yaml`` edit while
``ctx.run_events_config`` stays frozen to the startup snapshot the
event store was built from.

Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): parse Task cancelled and polling timed out as terminal

Address @ShenAC-SAC's BUG-007 review on #3131. `task_tool.py` actually
emits five terminal strings:

- `Task Succeeded. Result: …`
- `Task failed. …`
- `Task timed out. …`
- `Task cancelled by user.`               ← previously matched none
- `Task polling timed out after N minutes …` ← previously matched none

The previous cut handled three; the last two fell through to the
"unknown content" branch and pushed the subtask card back to
`in_progress` even though the backend had already reached a terminal
state. Add explicit matches plus regression tests for both. The
`in_progress` fallback is now reserved for genuinely unrecognised
output (i.e. contract drift), as documented.

Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review

* fix(frontend): sanitize JSON export content via the Markdown content path

Address @ShenAC-SAC's BUG-006 review and the Copilot inline comment on
#3131. The previous cut filtered hidden/tool messages out of the JSON
export but still serialised `msg.content` verbatim, so:

- inline `<think>…</think>` wrappers stayed in the exported `content`
  even with `includeReasoning: false`,
- content-array thinking blocks leaked the `thinking` field,
- `<uploaded_files>…</uploaded_files>` markers leaked the workspace
  paths a user uploaded files to.

JSON now goes through the same sanitiser the Markdown path uses
(`extractContentFromMessage` + `stripUploadedFilesTag`). Reasoning and
tool_calls remain gated behind their `ExportOptions` flags. AI / human
rows that sanitise to empty content with no opted-in reasoning or tool
calls are dropped so the JSON matches the Markdown path's `continue`
on empty assistant fragments.

New regression tests cover the three leak shapes the reviewer called
out plus the empty-content-drop case.

Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(gateway): align lifespan stub with langgraph_runtime two-arg signature

Codex round-3 review of c0bc7a06 flagged this: changing
`langgraph_runtime` to require `startup_config` as a second positional
argument broke the one-arg stub `_noop_langgraph_runtime(_app)` in
`test_gateway_lifespan_shutdown.py`, which is patched into
`app.gateway.app.langgraph_runtime` by the lifespan shutdown bounded-timeout
regression. Lifespan would then call the stub with two args and raise
`TypeError` before the bounded-shutdown assertion ran.

Update the stub to match the new signature. The shutdown test itself is
unaffected — it only cares about the channel `stop_channel_service` hang
path.

Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review

* fix(frontend): strip every known backend marker in export, not just uploads

Codex round-3 review of 258ca800 and the matching maintainer feedback on
PR #3131 made the same point: the JSON export now ran the
Markdown-side sanitiser, but that sanitiser only stripped
`<uploaded_files>`. The full set of payloads middleware embeds inside
message `content` is larger:

- `<uploaded_files>` — `UploadsMiddleware`
- `<system-reminder>` — `DynamicContextMiddleware`
- `<memory>` — `DynamicContextMiddleware` (nested inside system-reminder)
- `<current_date>` — `DynamicContextMiddleware`

The primary protection is still `isHiddenFromUIMessage`: the
`<system-reminder>` HumanMessage is marked `hide_from_ui: true` and never
reaches the formatter. This commit adds the second line of defence so a
regression that drops the `hide_from_ui` flag — or any future middleware
that injects the same tag vocabulary into a visible HumanMessage —
cannot leak the payload into the export file.

Concrete changes:

- New `INTERNAL_MARKER_TAGS` constant + `stripInternalMarkers(content)`
  helper in `core/messages/utils.ts`. The constant doubles as
  documentation for the backend↔frontend contract.
- `formatMessageContent` in `export.ts` now calls `stripInternalMarkers`
  instead of `stripUploadedFilesTag`. UI render paths
  (`message-list-item.tsx`) keep using the narrower function so a user
  legitimately typing `<memory>` in a meta-discussion is preserved.
- The "drop empty rows" guard in `buildJSONMessage` switched from
  `=== undefined` to truthy `!` checks. Codex spotted the asymmetry: when
  `extractReasoningContentFromMessage` returned the empty string (which it
  legitimately can), the JSON path emitted `{reasoning: ""}` while the
  Markdown path's `!reasoning` `continue` correctly dropped the row.

New regression tests cover the defence-in-depth strip with a
`<system-reminder><memory><current_date>` payload deliberately *not*
marked `hide_from_ui`; tool-message sanitization under
`includeToolMessages: true`; the mixed-content-array case
(`thinking + text + image_url`); and the opted-in empty-reasoning drop.

Live verification on a real Ultra-mode thread that uploaded a PDF
(`曾鑫民-薪资交易流水.pdf`): backend state's first HumanMessage carries the
`<uploaded_files>` block (with `/mnt/user-data/uploads/...` paths) as part
of a content-array. The Markdown and JSON export blobs both come back
free of `<uploaded_files>`, `<system-reminder>`, `<current_date>`,
`tool_calls`, and reasoning — while preserving the user's `这是什么 ?`
prompt and the assistant's visible answer.

Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review

* test(frontend): cover trim, varied N, and pre-execution Error: prefixes

Codex round-3 review of 50e2c257 flagged three coverage gaps in the
subtask-status parser:

1. `Task cancelled by user.` and `Task polling timed out` previously had
   no whitespace-trim coverage — the original trim test only exercised
   the success prefix. Streaming chunks can arrive with leading/trailing
   newlines; the regex needed an explicit assertion.
2. The polling-timeout case was tested only at one `N` (15 minutes). The
   backend interpolates the live `timeout_seconds // 60` value, so the
   matcher must hold for any positive integer. Now we run the case for
   1, 5, and 60 minutes.
3. `task_tool.py` also emits three `Error:` strings for pre-execution
   failures — unknown subagent type, host-bash disabled, and "task
   disappeared from background tasks". They are intentionally handled by
   `ERROR_WRAPPER_PATTERN` rather than dedicated prefixes (the wrapper
   already produces the right terminal-failed shape) but had no test
   coverage proving that wiring. Codex was right that a refactor splitting
   one of them off into its own prefix would silently break things.

The JSDoc on the constants block now spells the three pre-execution
errors out so the relationship between `task_tool.py` returns and the
prefix vocabulary is explicit.

No production code change beyond the docstring — this commit is pure
coverage hardening for the contract that already exists.

Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review
2026-05-21 21:18:10 +08:00
john lee
4cb2a22400
docs(config.example): fix Claude thinking example — add supports_thinking and budget_tokens (#3068)
The commented Claude example used Claude 3.5 Sonnet with
when_thinking_enabled but lacked supports_thinking: true. Copying the
block and swapping to a Claude 4 model name would silently fall back to
non-thinking mode (agent.py line 380 suppresses the error and logs only
a warning).

A second trap: budget_tokens is required by the Anthropic API when
thinking.type == "enabled"; there is no server default. The old example
omitted it, so any user who did add supports_thinking: true would get an
API error on the first thinking request.

Replace with a Claude Sonnet 4 example that includes both fields and
inline comments explaining the constraints.

Closes #2336

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 21:13:24 +08:00
Xinmin Zeng
9c03a71a07
fix(gateway): preserve message additional_kwargs in normalize_input (#3132) (#3136)
* fix(gateway): preserve message additional_kwargs in normalize_input (#3132)

The gateway's hand-rolled dict→message coercion only forwarded `content`
and collapsed every role to `HumanMessage`, silently dropping the
frontend's `additional_kwargs.files` payload (along with `id`, `name`,
and ai/system/tool roles).

Effect on issue #3132:

- `UploadsMiddleware` saw no `files` on the last human message, so the
  just-uploaded file got bucketed under "previous messages" while the
  current turn was reported as `(empty)`.
- The persisted human message had no `files`, so the attachment chip on
  the message disappeared the moment the optimistic UI cleared.

Delegate the conversion to `langchain_core.messages.utils.convert_to_messages`
so `additional_kwargs`, `id`, `name`, and non-human roles round-trip
unchanged.

* fix(gateway): convert malformed-message ValueError into HTTP 400

normalize_input now sits at the request boundary, so a malformed
input.messages[N] dict (missing role/type/content, unsupported role,
etc.) should surface as 400 with the offending index — not bubble out
of FastAPI as 500.

Per Copilot review on #3136.
2026-05-21 21:06:19 +08:00
Lawrance_YXLiao
1c5c585741
fix(runtime): bound write_file execution-failure observations (#3133)
* fix(runtime): bound write_file execution-failure observations

* fix(runtime): preserve write_file error prefixes

* test(runtime): trim write_file prefix assertions

* refactor(runtime): drop redundant exception suffix for permission/directory write errors

Address Copilot review on #3133: the PermissionError and IsADirectoryError
branches now return self-contained, non-redundant messages (e.g.
"Error: Permission denied writing to file: /mnt/...") via direct
truncation, instead of going through _format_write_file_error which
appended a duplicate ": PermissionError: permission denied" suffix.

OSError, SandboxError and the generic Exception branches keep the
unified "Failed to write file '{path}': {ExceptionType}: {detail}"
format so the model still sees a stable, machine-readable error class.

Removes the now-unused message= parameter from _format_write_file_error,
keeping a single code path. Truncation contract (<= 2000 chars) and
host-path sanitization unchanged.

* fix(runtime): handle write_file sandbox init errors

Initialize the requested path before sandbox setup so early sandbox failures can still return a bounded write_file error.

Add a regression test for sandbox initialization failures.

* style(test): format sandbox security tests
2026-05-21 20:35:46 +08:00
Xinmin Zeng
df95154282
fix(tracing): propagate session_id and user_id into Langfuse traces (#2944)
* fix(tracing): propagate session_id and user_id into Langfuse traces

Adds Langfuse v4 reserved trace attributes (langfuse_session_id,
langfuse_user_id, langfuse_trace_name, langfuse_tags) to
RunnableConfig.metadata inside the run worker, so the langchain
CallbackHandler can lift them onto the root trace.

- New deerflow.tracing.metadata.build_langfuse_trace_metadata() returns
  the reserved keys when Langfuse is in the enabled providers, else {}.
- worker.run_agent merges them with setdefault so caller-supplied keys
  win, allowing per-request overrides from upstream metadata.
- session_id mirrors the LangGraph thread_id; user_id reads
  get_effective_user_id() (falls back to "default" in no-auth mode).
- trace_name defaults to "lead-agent"; tags carry env and model name
  when DEER_FLOW_ENV (or ENVIRONMENT) and a model name are present.

Closes #2930

* fix(tracing): attach Langfuse callback at graph root so metadata propagates

The first commit injected ``langfuse_session_id`` / ``langfuse_user_id`` /
``langfuse_trace_name`` / ``langfuse_tags`` into ``RunnableConfig.metadata``,
but on ``main`` the Langfuse callback is attached at *model* level
(``models/factory.py``). LangChain still threads ``parent_run_id`` through
the contextvar, so the handler sees the model as a nested observation and
``__on_llm_action`` strips the ``langfuse_*`` keys
(``keep_langfuse_trace_attributes=False``). The trace's top-level
``sessionId`` / ``userId`` therefore stayed empty in deer-flow's LangGraph
runtime — confirmed live against a real Langfuse instance.

This commit moves the callback to the **graph invocation root** so the
handler fires ``on_chain_start(parent_run_id=None)`` and runs the
``propagate_attributes`` path that actually lifts ``session_id`` /
``user_id`` onto the trace:

- ``models/factory.py``: add ``attach_tracing`` keyword (default ``True``)
  so standalone callers (``MemoryUpdater``, etc.) keep their direct
  model-level tracing.
- ``agents/lead_agent/agent.py``: call ``build_tracing_callbacks()`` once
  inside ``_make_lead_agent`` and append the result to
  ``config["callbacks"]``; the four in-graph ``create_chat_model`` sites
  (bootstrap, default agent, sync + async summarization) pass
  ``attach_tracing=False`` to avoid duplicate spans.
- ``agents/middlewares/title_middleware.py``: same ``attach_tracing=False``
  for the title-generation model, since it inherits the graph's
  RunnableConfig via ``_get_runnable_config``.

Test updates:

- ``tests/test_lead_agent_model_resolution.py`` and
  ``tests/test_title_middleware_core_logic.py``: extend the fake
  ``create_chat_model`` signatures / mock assertions to accept the new
  ``attach_tracing`` kwarg.
- ``tests/test_worker_langfuse_metadata.py``: switch the no-user fallback
  test from direct ContextVar mutation to ``monkeypatch.setattr`` on
  ``get_effective_user_id`` to avoid pollution across the langfuse OTel
  global tracer provider.
- ``tests/conftest.py``: add an autouse fixture that resets
  ``deerflow.config.title_config._title_config`` to its pristine default
  after every test. Any test that loads the real ``config.yaml`` (via
  ``get_app_config()``) calls ``load_title_config_from_dict`` and mutates
  the module-level singleton, which previously poisoned the
  title-middleware suite when run after, e.g., the new
  ``test_worker_langfuse_metadata.py`` cases. The fixture is independent
  of this PR's main change but unblocks the cross-file test run.

Live verification (same Langfuse instance as before):

- Drove ``worker.run_agent`` against the real ``make_lead_agent`` +
  ``gpt-4o-mini`` for three distinct ``user_context`` identities
  (``fancy-engineer``, ``alice-pm``, ``bob-designer``).
- Each run produced one ``lead-agent`` trace whose top-level
  ``sessionId`` / ``userId`` / ``tags`` carry the expected values, e.g.
  ``session=e2e-2930-8f347c-alice-pm user=alice-pm name='lead-agent'
  tags=['model:gpt-4o-mini']``.

Refs #2930.

* fix(tracing): extend root-callback + metadata injection to the embedded client

Addresses Copilot review on PR #2944.

Commit 2 disabled model-level tracing for ``TitleMiddleware`` and
``_create_summarization_middleware`` because ``_make_lead_agent`` now
attaches the tracing callbacks at the graph invocation root. But the
embedded ``DeerFlowClient`` does not call ``_make_lead_agent`` — it
calls ``_build_middlewares`` directly and never appends the tracing
handlers to its ``RunnableConfig``. So under the embedded path,
title-generation and summarization LLM calls were left untraced —
a regression introduced by this PR.

This commit mirrors the gateway worker's injection in
``DeerFlowClient.stream``:

- Append ``build_tracing_callbacks()`` to ``config["callbacks"]`` so
  the Langfuse handler sees ``on_chain_start(parent_run_id=None)`` at
  the graph root and runs the ``propagate_attributes`` path.
- Merge ``build_langfuse_trace_metadata(...)`` into
  ``config["metadata"]`` with ``setdefault`` so caller-supplied keys
  still win.
- ``_ensure_agent`` now creates its main model with
  ``attach_tracing=False`` to avoid duplicate spans now that the
  callback lives at the graph root.

Docs:
- ``backend/CLAUDE.md`` Tracing section rewritten to describe the
  graph-root attachment model (replacing the inaccurate
  "at model-creation time" wording).
- ``README.md`` Langfuse section now lists both injection points
  (worker + client) instead of only the worker path.

Tests:
- ``tests/test_client_langfuse_metadata.py`` (new, 3 cases):
  callbacks + metadata are injected when Langfuse is enabled,
  caller-supplied metadata overrides win via ``setdefault``, and the
  injection is inert when Langfuse is disabled.

Live verification on the real Langfuse instance:

  === user=fancy-client ===
    id=cbd22847..  session=client-2930-6b9491-fancy-client  user=fancy-client  name='lead-agent'
  === user=alice-client ===
    id=b4f6f576..  session=client-2930-6b9491-alice-client  user=alice-client  name='lead-agent'

Refs #2930.

* refactor(tracing): address maintainer review on PR #2944

Addresses @WillemJiang's 5 comments.

1. Duplicated metadata-injection code between worker.py and client.py
   New ``deerflow.tracing.inject_langfuse_metadata(config, ...)`` helper
   takes the 10-line build + merge + setdefault logic that was duplicated
   in ``runtime/runs/worker.py`` and ``client.py``. Both callers now share
   a single source of truth, so the two paths cannot drift.

2. Direct private-attribute mutation in conftest.py and tests
   Added public ``reset_tracing_config()`` / ``reset_title_config()``
   functions. ``tests/conftest.py`` and every test that previously did
   ``tracing_module._tracing_config = None`` or
   ``title_module._title_config = TitleConfig()`` now goes through the
   public API. A future internal rename will surface as an ImportError
   instead of a silent no-op.

3. client.py reading os.environ directly
   ``DeerFlowClient.__init__`` grows an optional ``environment`` parameter
   so programmatic callers can pass the deployment label explicitly.
   ``stream()`` consults ``self._environment`` first and only falls back
   to ``DEER_FLOW_ENV`` / ``ENVIRONMENT`` env vars when nothing was
   passed in. Backwards compatible — env-var behaviour preserved for
   callers that opt to keep using it.

4. build_tracing_callbacks() cached on hot path
   Not implemented. Inspected the langfuse v4 ``langchain.CallbackHandler``
   constructor: it only resolves the module-level singleton client via
   ``get_client()`` and initialises a few dicts (no I/O, no env parsing
   at construction time). The build is essentially free. Caching would
   trade a non-measurable speedup for two real risks: handler instances
   carry per-run state internally (``_run_states``, ``_root_run_states``,
   ``last_trace_id``), and tracing config can be reloaded by env-var
   changes between runs. Will revisit if profiling ever shows it as
   a hot spot.

5. attach_tracing=False easy to forget at new in-graph call sites
   - Module docstring at the top of ``lead_agent/agent.py`` documents
     the invariant ("every in-graph ``create_chat_model`` MUST pass
     ``attach_tracing=False``") and enumerates the current sites.
   - New regression test
     ``test_make_lead_agent_attaches_tracing_callbacks_at_graph_root`` in
     ``tests/test_lead_agent_model_resolution.py`` locks both halves of
     the invariant: ``config["callbacks"]`` carries the tracing handler
     after ``_make_lead_agent``, AND every ``create_chat_model`` call
     captured by the test passes ``attach_tracing=False``. A future
     in-graph site that forgets the flag will fail this test.

Lint clean. Full touched-suite bundle: 246 passed.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-21 16:49:31 +08:00
john lee
ca7042dec2
chore(windows): add PYTHONIOENCODING and PYTHONUTF8 to backend Makefile targets (#3069)
langgraph-api emits → and ⚠️ characters in version-check log lines.
On Windows with cp1252 as the default stream encoding, each such line
throws a UnicodeEncodeError inside the logging handler, littering
startup output with tracebacks (though the server still boots).

#1550 already fixed this for scripts/check.py via stream.reconfigure().
Apply the same treatment to the backend Makefile dev/gateway/test
targets by setting PYTHONIOENCODING=utf-8 and PYTHONUTF8=1 before each
uv run invocation. Both variables are no-ops on Linux/macOS where UTF-8
is already the default.

Closes #2337

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 16:42:26 +08:00
Xinmin Zeng
31513c2ccb
fix(persistence): emit tz-aware timestamps from SQLite-backed stores (#3130)
SQLAlchemy's DateTime(timezone=True) is a no-op on SQLite (the backend
has no native tz type), so values round-tripped through the DB come
back as naive datetimes. The four SQL _row_to_dict helpers were calling
.isoformat() directly on those naive values, shipping timezone-less
strings like "2026-05-20T06:10:22.970977" out of the API. The browser's
new Date(...) then parses them as local time, shifting recent threads
in /threads/search by the local UTC offset (about 8h in Asia/Shanghai).

Route the four call sites through coerce_iso() instead — it already
normalizes naive values as UTC and emits "+00:00" so the wire format
always carries tz. No data migration is needed; existing SQLite rows
read back via the corrected serializer.

PostgreSQL deployments are unaffected because timestamptz preserves
tzinfo end-to-end.

Closes #3120
2026-05-21 16:22:09 +08:00
Airene Fang
923f516deb
feat(trace):LangGraph -> lead_agent and set custom agent_name to run_name (#3101)
* feat(trace):LangGraph -> lead_agent and set user custom agent name to run_name

* feat(trace):follow github copilot suggest

* feat(trace):Refactor run_name resolution and improve test coverage
2026-05-21 14:48:28 +08:00
AochenShen99
8b697245eb
fix(sandbox): avoid blocking sandbox readiness polling (#2822)
* fix(sandbox): offload async sandbox acquisition

Run blocking sandbox provider acquisition through the async provider hook so eager sandbox setup does not stall the event loop.

* fix(sandbox): add async readiness polling

Introduce an async sandbox readiness poller using httpx and asyncio.sleep while preserving the existing synchronous API.

* test(sandbox): cover async readiness polling

Lock in non-blocking readiness behavior so the async helper does not regress to requests.get or time.sleep.

* fix(sandbox): allow anonymous backend creation

* fix(sandbox): use async readiness in provider acquisition

* fix(sandbox): use async acquisition for lazy tools

* test(sandbox): cover anonymous remote creation

* fix(sandbox): clamp async readiness timeout budget

* fix(sandbox): offload async lock file handling

* fix(sandbox): delegate async middleware fallthrough

* docs(sandbox): document async acquisition path

* fix(sandbox): offload async sandbox release

* docs(sandbox): mention async release hook

* fix(sandbox): address async lock review

Reduce duplicate sync/async sandbox acquisition state handling and move async thread-lock waits onto a dedicated executor with cancellation-safe cleanup.

* chore: retrigger ci

Retrigger GitHub Actions after upstream main fixed the stale PR merge lint failure.

* test(sandbox): sync backend unit fixtures

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-21 14:44:34 +08:00
Nan Gao
dcc6f1e678
feat(loop-detection): defer warning injection (#2752)
* fix(loop-detection): defer warn injection to wrap_model_call

The warn branch in LoopDetectionMiddleware injected a HumanMessage
into state from after_model. The tools node had not yet produced
ToolMessage responses to the previous AIMessage(tool_calls=...), so
the new HumanMessage landed *between* the assistant's tool_calls and
their responses. OpenAI/Moonshot reject the next request with
"tool_call_ids did not have response messages" because their
validators require tool_calls to be followed immediately by tool
messages.

Detection now runs in after_model as before, but only enqueues the
warning into a per-thread list. Injection happens in wrap_model_call,
where every prior ToolMessage is already present in request.messages.
The warning is appended at the end as HumanMessage(name="loop_warning")
— pairing intact, AIMessage semantics untouched, no SystemMessage
issues for Anthropic.

Closes #2029, addresses #2255 #2293 #2304 #2511.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(channels): remove loop warning display filter

* feat(loop-detection): scope pending warnings by run

* docs(loop-detection): update docs

* test(loop-detection): assert deferred warnings are queued

* fix(loop-detection): cap transient warning state

* docs: update docs

* add async awrap_model_call test coverage

* docs(loop-detection): document transient warnings

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-21 14:36:07 +08:00
sunsine
7ec8d3a6e7
fix(security): mask sensitive values in MCP config API responses (#2667)
* fix(security): mask sensitive values in MCP config API responses

GET /api/mcp/config previously returned plaintext secrets including
env dict values (API keys), headers (auth tokens), and OAuth
client_secret/refresh_token. Any authenticated user could read all
MCP service credentials.

This commit masks sensitive fields in GET/PUT responses while
preserving the key structure so the frontend round-trip (GET masked
→ toggle enabled → PUT) correctly preserves existing secrets.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(security): address Copilot review on MCP config masking

- Load raw JSON (un-resolved $VAR placeholders) as merge source instead
  of resolved config, preventing plaintext secrets from replacing
  $VAR placeholders on disk (Comment 2)
- Preserve all top-level keys (e.g. mcpInterceptors) in PUT, not just
  mcpServers/skills (Comment 1)
- Reject masked value '***' for new keys that don't exist in existing
  config, returning 400 with actionable error (Comment 3)
- Allow empty string '' to explicitly clear OAuth secrets, while None
  means 'preserve existing' for safe round-trip (Comment 4)
- Add 3 new tests for rejection, clearing, and edge cases (18 total)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-21 10:28:57 +08:00
InitBoy
e19bec1422
fix(task-tool): cancel and schedule deferred cleanup on polling safety timeout (#3097)
When the poll loop's safety-net timeout fires (poll_count > max_poll_count),
the background subagent task was abandoned without cancellation or cleanup,
leaving a stale entry in _background_tasks indefinitely.

The original code had a comment promising "the cleanup will happen when the
executor completes", but run_task() in executor.py never calls
cleanup_background_task after reaching a terminal state -- the promise was
never implemented.

This change mirrors the asyncio.CancelledError path: signal cooperative
cancellation via request_cancel_background_task and schedule
_deferred_cleanup_subagent_task to remove the entry once the background
thread reaches a terminal state.

Direct cleanup at poll-timeout time would introduce a race: run_task() could
remove the entry while the poll loop is still mid-iteration, causing a
spurious "Task disappeared" error. The deferred approach avoids this by
waiting for terminal state before removal.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-21 07:47:19 +08:00
Yuyi Ao
9afeaf66bc
Fix env resolution in MCP config lists (#2556)
* Fix env resolution in MCP config lists

* fix:unset env variable and consistent function

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-21 07:27:00 +08:00
Airene Fang
b6b3650e50
fix(trace):memory 中文 in trace info is unicode escape sequence. (#3104)
* fix(trace):memory 中文 in trace is unicode

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-20 22:34:10 +08:00