2064 Commits

Author SHA1 Message Date
dependabot[bot]
109490da25
chore(deps): bump python-multipart from 0.0.26 to 0.0.27 in /backend (#2799)
Bumps [python-multipart](https://github.com/Kludex/python-multipart) from 0.0.26 to 0.0.27.
- [Release notes](https://github.com/Kludex/python-multipart/releases)
- [Changelog](https://github.com/Kludex/python-multipart/blob/main/CHANGELOG.md)
- [Commits](https://github.com/Kludex/python-multipart/compare/0.0.26...0.0.27)

---
updated-dependencies:
- dependency-name: python-multipart
  dependency-version: 0.0.27
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-08 22:58:15 +08:00
dependabot[bot]
14c0a32ee6
chore(deps): bump mako from 1.3.11 to 1.3.12 in /backend (#2798)
Bumps [mako](https://github.com/sqlalchemy/mako) from 1.3.11 to 1.3.12.
- [Release notes](https://github.com/sqlalchemy/mako/releases)
- [Changelog](https://github.com/sqlalchemy/mako/blob/main/CHANGES)
- [Commits](https://github.com/sqlalchemy/mako/commits)

---
updated-dependencies:
- dependency-name: mako
  dependency-version: 1.3.12
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-08 22:57:48 +08:00
Willem Jiang
70737af7cd
fix(nignx):resolve CSRF auth failure on non-standard ports (#2796) 2026-05-08 22:40:38 +08:00
DanielWalnut
2b1fcb3e43
fix(task): remove max_turns parameter from task tool interface (#2783)
* fix(task): remove max_turns parameter from task tool interface

Subagents should always use their configured max_turns value. Exposing
this parameter allowed callers to override the admin-configured limit,
which is undesirable. The value is now exclusively driven by subagent
config (per-agent overrides and global defaults in config.yaml).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-08 15:05:24 +08:00
He Wang
7de9b5828b
fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning (#2774)
* fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning

Add deerflow/tools/types.py with:

    Runtime = ToolRuntime[dict[str, Any], ThreadState]

Replace every runtime: ToolRuntime[ContextT, ThreadState] and
runtime: ToolRuntime[dict[str, Any], ThreadState] annotation in
sandbox/tools.py, present_file_tool.py, task_tool.py, view_image_tool.py,
and skill_manage_tool.py with the new Runtime alias.

The unbound ContextT TypeVar (default None) caused
PydanticSerializationUnexpectedValue warnings on every tool call because
LangChain's BaseTool._parse_input calls model_dump() on the auto-generated
args_schema while DeerFlow passes a dict as runtime context.
Binding the context to dict[str, Any] aligns Pydantic's serialization
expectations with reality and removes the noise from all run modes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(tools): extend Runtime alias to setup_agent and update_agent tools

Replace bare ToolRuntime annotations in setup_agent_tool.py and
update_agent_tool.py with the shared Runtime alias introduced in the
previous commit, and add both tools to the Pydantic serialization
warning regression test (13 cases total).

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(tools): loosen Pydantic warning filter to avoid version-specific format

Replace the brittle "field_name='context'" substring check with a looser
"context" match so the assertion stays valid if Pydantic changes its
internal warning format across versions.

Co-authored-by: Cursor <cursoragent@cursor.com>

* test(tools): simplify warning filter and clean up docstring

Remove the "context" substring condition from the Pydantic warning
filter — asserting that no PydanticSerializationUnexpectedValue fires
at all is both simpler and more comprehensive, since the test payload
contains only the tool's own args plus runtime.

Also update the module docstring to remove the version-specific warning
format example that was inconsistent with the looser filter.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-08 14:50:33 +08:00
Eilen Shin
37db689349
fix(events): serialize structured db event content (#2762) 2026-05-08 10:17:17 +08:00
Eilen Shin
bd45cb2846
fix(sandbox): disable msys path conversion (#2766) 2026-05-08 10:13:11 +08:00
Eilen Shin
5fd0e6ac89
fix(middleware): sync raw tool call metadata (#2757) 2026-05-08 10:08:53 +08:00
YuJitang
530bda7107
fix: dedupe token usage aggregation by message id (#2770) 2026-05-08 09:54:20 +08:00
Willem Jiang
6c220a9aef
fix(chat): prevent first user message from being swallowed in new conversations (#2731)
* fix(chat): prevent first user message from being swallowed in new conversations

  The optimistic message clearing effect cleared too eagerly — any stream
  message (including AI messages from messages-tuple events) triggered the
  clear before the server's human message had arrived via values events.
  For new threads this caused the user's first prompt to disappear permanently.

  Only clear optimistic messages once the server's human message has been
  confirmed to arrive in thread.messages, not just when any message arrives.

  Fixes #2730

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-07 17:31:48 +08:00
Tao Liu
daa3ffc29b
feat(loop-detection): make loop detection configurable with per-tool frequency overrides (#2711)
* Make loop detection configurable

Expose LoopDetectionMiddleware thresholds through config.yaml while preserving existing defaults and allowing the middleware to be disabled.

Refs bytedance/deer-flow#2517

* feat(loop-detection): add per-tool tool_freq_overrides to Phase 1

Adds ToolFreqOverride model and tool_freq_overrides field to
LoopDetectionConfig, wires it through LoopDetectionMiddleware, and
documents the option in config.example.yaml.

Resolves the gap flagged in the #2586 review: without per-tool overrides,
users hit by #2510/#2511 (RNA-seq workflows exceeding the bash hard limit)
had no way to raise thresholds for one tool without loosening the global
limit for every tool.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* docs(loop-detection): document tool_freq_overrides in LoopDetectionMiddleware docstring

Add the missing Args entry for tool_freq_overrides, explaining the
(warn, hard_limit) tuple structure and how per-tool thresholds supersede
the global tool_freq_warn / tool_freq_hard_limit for named tools.
Also run ruff format on the three files flagged by the lint check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(loop-detection): validate LoopDetectionMiddleware __init__ params eagerly

Raise clear ValueError at construction time instead of crashing at
unpack-time inside _track_and_check when bad values are passed:
- tool_freq_overrides: must be 2-tuples of positive ints with hard_limit >= warn
- scalar thresholds: warn_threshold, hard_limit, tool_freq_warn,
  tool_freq_hard_limit must be >= 1 and hard limits must >= their warn pairs
- window_size, max_tracked_threads must be >= 1

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(test): isolate credential loader directory-path test from real ~/.claude

The test didn't monkeypatch HOME, so on any machine with real Claude Code
credentials at ~/.claude/.credentials.json the function fell through to
those credentials and the assertion failed. Adding HOME redirect ensures
the default credential path doesn't exist during the test.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style(test): add blank lines after import pytest in TestInitValidation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor(loop-detection): collapse dual validation to LoopDetectionConfig

Modifications
  - LoopDetectionMiddleware.__init__: stripped of all ValueError raises;
    becomes a plain field-assignment constructor.
  - LoopDetectionMiddleware.from_config: classmethod that builds the
    middleware from a Pydantic-validated LoopDetectionConfig and handles
    the ToolFreqOverride -> tuple[int, int] conversion.
  - agents/factory.py: SDK construction routed through
    LoopDetectionMiddleware.from_config(LoopDetectionConfig()) so the
    defaults path is Pydantic-validated too.
  - agents/lead_agent/agent.py: uses from_config instead of unpacking
    config fields by hand.
  - tests/test_loop_detection_middleware.py: deleted TestInitValidation
    (16 methods exercising the removed __init__ checks); added
    TestFromConfig (4 tests: scalar field mapping, override tuple
    conversion, empty overrides, behavioral smoke test).

Result: one validation layer (Pydantic), zero duplication, no __new__
hacks. Both production construction sites flow through LoopDetectionConfig.

Test results
  make test   -> 2977 passed, 18 skipped, 0 failed (137s)
  make format -> All checks passed; 411 files left unchanged

* feat(agents): make loop_detection configurable in create_deerflow_agent

Adds a `loop_detection: bool | AgentMiddleware = True` field to
RuntimeFeatures, mirroring the existing pattern used by `sandbox`,
`memory`, and `vision`. SDK users can now disable LoopDetectionMiddleware
or replace it with a custom instance built from their own
LoopDetectionConfig — e.g.
`LoopDetectionMiddleware.from_config(my_cfg)` — instead of being stuck
with the hardcoded defaults previously installed by the SDK factory.

The lead-agent path (which already reads AppConfig.loop_detection) is
unchanged, and the default `True` preserves prior always-on behavior for
all existing callers.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: knight0940 <631532668@qq.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Amorend <142649913+knight0940@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-07 16:15:15 +08:00
Xinmin Zeng
27559f3675
fix(frontend): defer thread id to onStart to avoid 404 on new chat (#2749)
* fix(frontend): defer thread id to onStart to avoid 404 on new chat

The LangGraph SDK's useStream eagerly fetches /threads/{id}/history the
moment it receives a thread id, and the local useThreadRuns issues
GET /threads/{id}/runs for the same reason. The chats page used to flip
isNewThread=false (and forward the client-generated thread id) inside
the synchronous onSend callback, before thread.submit had created the
thread on the backend. The two queries therefore raced ahead of
POST /runs/stream and returned 404 on the very first send.

Drop the onSend handler so isNewThread stays true until onStart fires
from useStream's onCreated — by then the backend has the thread, and
the SDK's submittingRef guard naturally suppresses the redundant
history fetch. The agent chat page already uses this pattern, so this
also unifies the two flows.

Adds an E2E regression that records request ordering and asserts
GET /history and GET /runs are never issued before POST /runs/stream
on the first send from /chats/new.

Closes #2746

* fix(frontend): split welcome layout from backend thread state

Removing onSend kept GET /history and GET /runs from racing ahead of
POST /runs/stream, but it also coupled the welcome layout (centered
input, hero, quick actions) to backend thread creation.  Until onCreated
returned, the user's optimistic message and the welcome hero rendered on
top of each other.

Introduce a dedicated `isWelcomeMode` UI flag, separate from
`isNewThread`:
- `isNewThread` still tracks "backend has no thread yet" and gates the
  thread id forwarded to useStream.
- `isWelcomeMode` drives the visual layout (header background, input
  box position, max width, hero, quick actions, autoFocus) and flips to
  false inside onSend so the layout animates immediately.

`isWelcomeMode` is kept in sync with `isNewThread` via an effect so
sidebar navigation and "new chat" still behave correctly.  All 15 E2E
tests pass, including the ordering regression added in the previous
commit.

* test(e2e): use monotonic sequence for thread-init ordering check

Date.now() is millisecond-resolution, so two requests emitted within
the same tick would share a timestamp and slip past the strict `<`
ordering assertions. Replace the timestamp with a monotonic counter
that increments on every observed request/requestfinished event so the
ordering check is robust regardless of scheduling.

Per PR #2749 review feedback from copilot-pull-request-reviewer.

* refactor(input-box): rename isNewThread prop to isWelcomeMode

Inside InputBox, the prop named `isNewThread` is only ever consulted
for visual layout decisions — gating follow-up suggestions, the bottom
background strip, and the welcome-mode quick-action SuggestionList. It
never reflects "the backend has created the thread", which after #2746
is tracked separately via `isNewThread` in the chat pages themselves.

Rename the prop to `isWelcomeMode` and update both call sites
(workspace chats page and agent chats page) so the prop name matches
its actual semantics. No behavior change.

Per PR #2749 review feedback from @WillemJiang.
2026-05-07 16:11:44 +08:00
AochenShen99
cef4224381
fix(skills): enforce allowed-tools metadata (#2626)
* fix(skills): parse allowed-tools frontmatter

* fix(skills): validate allowed-tools metadata

* fix(skills): add shared allowed-tools policy

* fix(subagents): enforce skill allowed-tools

* fix(agent): enforce skill allowed-tools

* refactor(skills): dedupe TypeVar and reuse cached enabled skills

- Drop redundant module-level TypeVar in tool_policy; rely on PEP 695 syntax.
- Expose get_cached_enabled_skills() and have the lead agent reuse it
  instead of synchronously rescanning skills on every request.

* fix(agent): expose config-scoped skill cache

* fix(subagents): pass filtered tools explicitly

* fix(skills): clean allowed-tools policy feedback
2026-05-07 08:34:43 +08:00
Hinotobi
2b0e62f679
[security] fix(auth): reject cross-site auth POSTs (#2740)
* fix(security): reject cross-site auth posts

* fix(auth): align secure cookie proxy scheme handling

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-07 07:58:06 +08:00
Eilen Shin
1336872b15
fix(channels): authenticate gateway command requests (#2742) 2026-05-06 15:27:34 +08:00
KiteEater
4ead2c6b19
fix(config): reset config-backed singletons on hot reload (#2588)
* Fix stale config singletons on reload

* fix(config): update checkpointer imports after runtime move

* Fix config reload singleton mutation on validation failure

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-06 10:17:55 +08:00
yangzheli
59c4a3f0a4
feat(agent): add custom-agent self-updates with user isolation (#2713)
* feat(agent): add update_agent tool for in-chat custom-agent self-updates (#2616)

Custom agents had no built-in way to persist updates to their own SOUL.md /
config.yaml from a normal chat — `setup_agent` was only bound during the
bootstrap flow, so when the user asked the agent to refine its description
or personality, the agent would shell out via bash/write_file and the edits
landed in a temporary sandbox/tool workspace instead of
`{base_dir}/agents/{agent_name}/`.

Changes:
- New `update_agent` builtin tool with partial-update semantics (only the
  fields you pass are written) and atomic temp-file + os.replace writes so
  a failed update never corrupts existing SOUL.md / config.yaml.
- Lead agent now binds `update_agent` in the non-bootstrap path whenever
  `agent_name` is set in the runtime context. Default agent (no
  agent_name) and bootstrap flow are unchanged.
- New `<self_update>` system-prompt section is injected for custom agents,
  instructing them to use `update_agent` — and explicitly NOT bash /
  write_file — to persist self-updates.
- Tests: 11 new cases in `tests/test_update_agent_tool.py` covering
  validation (missing/invalid agent_name, unknown agent, no fields),
  partial updates (soul-only, description-only, skills=[] vs omitted),
  no-op detection, atomic-write safety, and AgentConfig round-tripping;
  plus 2 new cases in `tests/test_lead_agent_prompt.py` covering the
  self-update prompt section.
- Docs: updated backend/CLAUDE.md builtin tools list and tools.mdx
  (en/zh) with the new tool description.

* feat(agent): isolate custom agents per user

Store custom agent definitions under the effective user, keep legacy agents readable until migration, and cover API/tool/migration behavior with tests.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat: consistent write/delete targets & add --user-id to migration

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-05 23:17:42 +08:00
Nan Gao
e8675f266d
fix(loop-detection): keep tool-call pairing on warn injection (#2724) (#2725)
* fix(loop-detection): keep tool-call pairing on warn injection (#2724)

* make format

* fix(loop-detection): avoid IMMessage leak to downstream consumer

* fix(channels): filter loop warning text from IM replies
2026-05-05 18:53:49 +08:00
Xun
680187ddc2
fix: Supplement list_running in RemoteSandboxBackend (#2716)
* fix: Supplement list_running in RemoteSandboxBackend

* fix

* except requests.RequestException as exc:

* fix
2026-05-05 18:53:10 +08:00
Xinmin Zeng
aded753de3
fix(frontend): restore localhost fallback for getGatewayConfig in prod mode (#2705) (#2718)
* fix(frontend): unify gateway-config localhost fallback for prod (#2705)

`getGatewayConfig()` only fell back to localhost defaults when
`NODE_ENV === "development"`, while `next.config.js` always falls back
to `127.0.0.1:8001`. Running `make start` (which sets NODE_ENV=production
via `next start`) without `DEER_FLOW_INTERNAL_GATEWAY_BASE_URL` /
`DEER_FLOW_TRUSTED_ORIGINS` therefore caused zod to throw inside SSR
layouts and surfaced as a 500.

Drop the NODE_ENV gating and use localhost defaults everywhere — the
"force explicit config in prod" intent should be enforced by deployment
templates (docker-compose already sets both vars), not by request-time
crashes. Document the two vars in both .env.example files and add unit
coverage for the dev/prod env-unset paths.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Update internalGatewayUrl in gateway config tests

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-05 16:27:29 +08:00
Willem Jiang
028493bfd8
fix(docker):force ngix to resolve upstream names at request time (#2717)
* fix(docker):force ngix to resolve upstream names at request time

* fix(docker): set resolver valid=0s to eliminate DNS cache window for request-time re-resolution

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/07bdb872-022f-4fd2-9fa8-d800a4ce34a7

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* Update DNS resolver valid time and add upstreams

* fix the unit test error

* Remove upstream server configurations from nginx.conf

Removed upstream server configurations for gateway and frontend.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-05 14:35:55 +08:00
Willem Jiang
8e48b7e85c
fix(channels): preserve clarification conversation history across follow-up turns (#2444)
* fix(channels): preserve clarification conversation history across follow-up turns

Pin channel-triggered runs to the root checkpoint namespace and ensure thread_id is always present in configurable run config so follow-up replies resume the same conversation state.

Add regression coverage to channel tests:

assert checkpoint_ns/thread_id are passed in wait and stream paths
add an integration-style clarification flow test that verifies the second user reply continues prior context instead of starting a new session
This addresses history loss after ask_clarification interruptions (issue #2425).

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix(channels): copy configurable dict before injecting run-scoped fields

  When configurable was already a plain dict, _resolve_run_params mutated
  it in place, leaking checkpoint_ns and thread_id back into the shared
  session config. Always copy via dict() before mutating to prevent
  cross-user or cross-channel config pollution.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-04 16:14:07 +08:00
Willem Jiang
af6e48ccaa
fix(i18n): add Chinese translations for account settings page (#2712)
The account settings page had all user-facing strings (profile labels,
  password form placeholders, validation messages, button text) hardcoded
  in English. Replace them with i18n translation keys so the page renders
  correctly when the locale is set to Chinese.

 Fixed #2710
v2.0-m1-rc0
2026-05-04 11:15:16 +08:00
Willem Jiang
b10eb7bafc
feat(github): Added container push workflow (#2709)
* feat(github):Added container push workflow

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-04 11:14:34 +08:00
YuJitang
d02f762ab0
feat: refine token usage display modes (#2329)
* feat: refine token usage display modes

* docs: clarify token usage accounting semantics

* fix: avoid duplicate subtask debug keys

* style: format token usage tests

* chore: address token attribution review feedback

* Update test_token_usage_middleware.py

* Update test_token_usage_middleware.py

* chore: simplify token attribution fallback

* fix token usage metadata follow-up handling

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-04 09:56:16 +08:00
Willem Jiang
82e7936d36
fix(docker): set UTF-8 locale to prevent ASCII encoding errors in minimal containers (#2707)
* fix(docker): set UTF-8 locale to prevent ASCII encoding errors in minimal containers

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-04 09:41:10 +08:00
Nan Gao
222a7773cb
fix(frontend): avoid misleading error message when agent api is disable (#2697) (#2698) 2026-05-04 09:38:05 +08:00
Nan Gao
f80ac961ec
fix(harness): restore legacy skills path fallback (#2694) (#2696)
* fix(harness): restore legacy skills path fallback (#2694)

* fix(format): make format

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-03 23:40:59 +08:00
wanxsb
44ab21fc44
feat(community): add Serper web search provider (#2630)
* feat(community): add Serper web search provider

Add a new community search provider backed by the Serper Google Search
API (https://serper.dev). Serper returns real-time Google results via a
simple JSON API and requires only an API key — no extra Python package.

Changes:
- backend/packages/harness/deerflow/community/serper/__init__.py
- backend/packages/harness/deerflow/community/serper/tools.py
  Implements web_search_tool using httpx (already a project dependency).
  API key is read from config.yaml `api_key` field or SERPER_API_KEY env var.
  Follows the same interface / output shape as the existing ddg_search provider.
  Exposes max_results parameter (default 5) with config override logic.
- backend/tests/test_serper_tools.py
  Unit tests covering API key resolution, config overrides, HTTP errors,
  empty results, and parameter passing.
- config.example.yaml: add commented-out Serper example alongside other providers
- .env.example: add SERPER_API_KEY placeholder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix the lint error

* Fix the lint error

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-02 16:22:35 +08:00
Hinotobi
e543bbf5d6
[security] fix(upload): reject symlinked upload destinations (#2623)
* fix: reject symlinked upload destinations

* test: harden upload destination checks

* fix: address PR feedback for #2623

* test: cover safe upload re-uploads

* fix: preserve upload limit checks after rebase

* fix(upload): stream safe HTTP upload writes
2026-05-02 15:19:28 +08:00
Xinmin Zeng
ca3332f8bf
fix(gateway): return ISO 8601 timestamps from threads endpoints (#2599)
* fix(gateway): return ISO 8601 timestamps from threads endpoints (#2594)

ThreadResponse documents created_at / updated_at as ISO timestamps,
matching the LangGraph Platform schema (langgraph_sdk.schema.Thread
exposes them as datetime, JSON-encoded as ISO 8601). The gateway
threads router was instead emitting str(time.time()) — unix-second
floats — breaking frontend new Date() parsing and producing a mixed
ISO/unix wire format that also corrupted the search sort order.

Centralize timestamp generation in deerflow.utils.time:
- now_iso()       — datetime.now(UTC).isoformat()
- coerce_iso(x)   — heals legacy unix-timestamp strings on read so the
                    store converges to ISO without a one-shot migration

threads.py: replace 6 time.time() call sites with now_iso(); wrap all
read paths and Phase-2 checkpoint metadata with coerce_iso(); _store_upsert
opportunistically heals legacy created_at on update; drop unused time import.

thread_runs.py: reuse now_iso() instead of a private duplicate _now_iso(),
preventing future drift between the two timestamp call sites.

Tests: 9 unit tests for the helper; 5 integration tests pinning the ISO
contract for create/get/patch/search and the legacy-healing path on the
internal store upsert. Full suite: 2144 passed, 15 skipped, 0 failed.

Closes #2594

* fix(gateway): coerce checkpoint metadata timestamps to ISO on read

After the merge with main, three additional read paths in ``threads.py``
were still emitting raw ``str(metadata.get("created_at", ""))`` —
``get_thread_state``, ``update_thread_state``, and ``get_thread_history``.

Same root cause as #2594: when the checkpoint metadata's ``created_at``
is a unix-second float (legacy data, or a checkpoint written by an older
Gateway version), ``str(float)`` produces ``"1777252410.411327"`` and the
frontend's ``new Date(...)`` returns ``Invalid Date``. The fix on the
``/threads/{id}`` GET path was already in place; these three sibling
endpoints needed the same treatment.

All four call sites now flow through ``coerce_iso``, so:
- legacy float metadata heals to ISO on the way out,
- ISO metadata passes through unchanged,
- ``datetime`` instances (which the new ``coerce_iso`` branch handles
  explicitly) emit with the ``T`` separator instead of falling through
  to the space-separated ``str(datetime)`` form.

Coverage added for the two endpoints not already pinned by the merge:
- ``test_get_thread_state_returns_iso_for_legacy_checkpoint_metadata``
- ``test_get_thread_history_returns_iso_for_legacy_checkpoint_metadata``

Both pre-seed a checkpoint whose metadata carries the literal float
from the issue body and assert the wire format is ISO.
2026-05-02 15:16:16 +08:00
Willem Jiang
bb8b234d85
chroe(2585): keep polishing the code of codex token usage (#2689) 2026-05-02 15:04:11 +08:00
KiteEater
17447fccbe
fix(runtime): make rollback restore checkpoint supersede newer checkpoints (#2582)
* Restore rollback checkpoints with fresh ids

* Tighten rollback checkpoint tests and imports

* Update test_run_worker_rollback.py

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-02 11:25:45 +08:00
KiteEater
866d1ca409
Populate Codex usage metadata for token accounting (#2585) 2026-05-02 11:16:03 +08:00
greatmengqi
8ba01dfd83
refactor: thread app_config through lead and subagent task path (#2666)
* refactor: thread app config through lead prompt

* fix: honor explicit app config across runtime paths

* style: format subagent executor tests

* fix: thread resolved app config and guard subagents-only fallback

Address two PR review findings:

1. _create_summarization_middleware passed the original (possibly None)
   app_config into create_chat_model, forcing the model factory back to
   ambient get_app_config() and risking config drift between the
   middleware's resolved view and the model's view. Pass the resolved
   AppConfig instance through end-to-end.

2. get_available_subagent_names accepted Any-typed config and forwarded
   it to is_host_bash_allowed, which reads ``.sandbox``. A
   SubagentsAppConfig (also accepted upstream as a sum-type input) has
   no ``.sandbox`` attribute and would be silently treated as "no
   sandbox configured", incorrectly disabling the bash subagent. Guard
   on hasattr and fall back to ambient lookup otherwise.

Adds regression tests for both paths.

* chore: simplify hasattr guard and tighten regression tests

- Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant.
- Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent).
- Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison.

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
2026-05-02 06:37:49 +08:00
Willem Jiang
189b82405c
fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination (#2685)
* fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination

  The agent_sandbox library's shell API defaults no_change_timeout to 120
  seconds. When AioSandbox.execute_command() called exec_command() without
  this parameter, commands producing no output for 120s would return with
  NO_CHANGE_TIMEOUT status even though the script was still running.

  Pass no_change_timeout=600 to all exec_command calls (matching the
  client-level HTTP timeout) so long-running commands are not cut short.

  Fixes #2668

* test(sandbox): add assertions for no_change_timeout in execute_command and list_dir

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2f37bc72-0826-4443-a6ba-e5b78c22fb5a

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-01 22:27:02 +08:00
Nan Gao
487c1d939f
fix(subagents): use model override for tools and middleware (#2641)
* fix(subagents): use model override for tools and middleware

* fix(config): resolve effective subagent model

* fix(subagents): defer app config loading

* fix(subagents): fully defer config.yaml load in executor __init__

The previous attempt only relocated the explicit get_app_config() call,
but left resolve_subagent_model_name(...) running eagerly in __init__.
That helper has its own internal get_app_config() fallback, which still
fired when both app_config and parent_model were None and
config.model == "inherit" — exactly the path unit tests hit, breaking
21 tests in CI with FileNotFoundError: config.yaml.

Skip the eager resolve in __init__ when it would require loading the
config file, and defer to _create_agent (which already has the
app_config or get_app_config() fallback).
2026-05-01 22:21:10 +08:00
Nan Gao
c09c334544
fix(harness): resolve runtime paths from project root (#2642)
* fix(harness): resolve runtime paths from project root

* docs(config): update

* fix(config): address runtime path review feedback

* test(config): fix skills path e2e root

* test(config): cover legacy config fallback when project root lacks config files

Verifies that when DEER_FLOW_PROJECT_ROOT is unset and cwd has no
config.yaml/extensions_config.json, AppConfig and ExtensionsConfig fall back
to the legacy backend/repo-root candidates — the backward-compat path
requested in PR #2642 review.

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-01 22:19:50 +08:00
KiteEater
8939ccaed2
fix(uploads): enforce streaming upload limits in gateway (#2589)
* fix: enforce gateway upload limits

* fix: acquire sandbox before upload writes

* Fix upload limit config wiring

* Sanitize upload size error filenames

* test: call upload routes unwrapped

* fix: guard upload limits endpoint

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-05-01 20:19:30 +08:00
JerryLee
83938cf35a
fix(subagents): propagate user context across threaded execution (#2676) 2026-05-01 16:27:18 +08:00
yangzheli
78633c69ac
fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2679)
* fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2677)

When creating a custom agent via the web UI, SOUL.md was always written
to the global base_dir/SOUL.md instead of agents/<name>/SOUL.md.

Root cause: the bootstrap flow sends agent_name via body.context, but
two layers were broken:

1. services.py only forwarded body.context keys into config["configurable"];
   config["context"] was never populated.
2. worker.py constructed the parent Runtime with a hard-coded
   {thread_id, run_id} context, ignoring config["context"] entirely.

After the langgraph >= 1.1.9 bump (#98a5b34f), ToolRuntime.context no
longer falls back to configurable, so setup_agent's
runtime.context.get("agent_name") returned None and the tool's silent
agent_name=None -> base_dir fallback kicked in, overwriting the global
SOUL.md.

Fix:
- services.py: extract merge_run_context_overrides() and write the
  whitelisted context keys into both configurable (legacy readers) and
  context (langgraph 1.1+ ToolRuntime consumers).
- worker.py: extract _build_runtime_context() and merge config["context"]
  into the Runtime's context (without letting callers override
  thread_id/run_id).

The base_dir fallback in setup_agent_tool.py is left in place because
the IM /bootstrap channel command depends on it. That code path can
be tightened in a follow-up.

Adds regression tests covering both helpers.

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-05-01 16:00:11 +08:00
greatmengqi
8b61c94e1d
fix: keep lead agent graph factory signature compatible (#2678)
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
2026-05-01 15:43:28 +08:00
Xun
1ad1420e31
refactor(skills): Unified skill storage capability (#2613) 2026-05-01 13:23:26 +08:00
He Wang
eba3b9e18d
fix(config): unify log_level from config.yaml across Gateway and debug entry points (#2601)
Centralize log level parsing in `logging_level_from_config()` and
application in `apply_logging_level()` within `deerflow.config.app_config`.

- Gateway lifespan applies configured log level on startup
- `debug.py` uses shared helpers instead of local duplicates
- `apply_logging_level()` targets only `deerflow`/`app` logger hierarchies
  so third-party library verbosity is not affected; root handler levels
  are only lowered (never raised) to allow configured loggers through
  without suppressing third-party output; root logger level is not modified
- Config field description updated to clarify scope
- Tests save/restore global logging state to avoid test pollution

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 22:27:14 +08:00
Willem Jiang
c0da278269
fix(memory): replace short-lived asyncio.run() with persistent event loop (#2627)
* fix(memory): replace short-lived asyncio.run() with persistent event loop to prevent zombie httpx connections

  The memory updater used asyncio.run() inside daemon threads, creating
  and destroying short-lived event loops on every update. Langchain
  providers (e.g. langchain-anthropic) cache httpx AsyncClient instances
  globally via @lru_cache, so SSL connections created on a loop that is
  subsequently destroyed become zombie connections in the shared pool.
  When the main agent's lead run later reuses one of these connections,
  httpx/anyio triggers RuntimeError: Event loop is closed during
  connection cleanup.

  Replace the ThreadPoolExecutor + asyncio.run() pattern with a
  _MemoryLoopRunner that maintains a single persistent event loop in a
  daemon thread for the process lifetime. Since the loop never closes,
  connections bound to it never become invalid. The _run_async_update_sync
  function now submits coroutines to this persistent loop via
  run_coroutine_threadsafe instead of creating throwaway loops.

* update the code to address the review comments

* Fix the review comments of 2615

 P1 — user_id forwarded through sync path: Added user_id parameter to _prepare_update_prompt, _finalize_update, and _do_update_memory_sync, and forwarded it to get_memory_data(agent_name, user_id=user_id) and
  save(..., user_id=user_id). The update_memory() entry point now passes user_id through both the executor.submit path and the direct call path. Added TestUserIdForwarding with two regression tests (sync + async)
   verifying get_memory_data and save receive the correct user_id.

  P2 — aupdate_memory() delegates to sync: Replaced the model.ainvoke() call with asyncio.to_thread(self._do_update_memory_sync, ...). This eliminates the unsafe async provider client path entirely — all memory
  updater entry points now use the isolated sync model.invoke() path. Updated the test from asserting ainvoke is awaited to asserting invoke is called and ainvoke is not.

  Nit — duplicate comment removed: Removed the duplicated # Matches sentences... comment on line 230.

* Chore(test): update the code of test_memory_updater

---------

Co-authored-by: rayhpeng <rayhpeng@gmail.com>
2026-04-30 17:59:57 +08:00
KiteEater
7dea1666ce
fix: avoid temporary event loops in async subagent execution (#2414)
* fix: avoid temporary event loops in async subagent execution

* Rename isolated subagent loop globals

* Harden isolated subagent loop shutdown and logging

* Sort subagent executor imports

* Format subagent executor

* Remove isolated loop pool from subagent executor

* Format subagent executor cleanup

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-30 15:29:17 +08:00
Chincherry93
88d47f677f
fix(nginx): add catch-all /api/ location for auth routes (#2657)
The recent refactor (7bf618de) removed the langgraph upstream and added
individual /api/* prefix locations for models, memory, mcp, skills,
agents, threads, and sandboxes. However, /api/v1/auth/* routes (login,
register, setup-status, etc.) were not covered by any explicit location
block, causing them to fall through to the frontend catch-all.

In Docker production builds, Next.js rewrites are baked at build time
with http://127.0.0.1:8001 (the gateway is unreachable from the
frontend container's localhost), resulting in ECONNREFUSED errors for
all auth operations.

Adding a catch-all `location /api/` block after the more specific
prefix/regex locations ensures all gateway API routes are properly
proxied. nginx's location matching priority means the more specific
locations above still take precedence for /api/langgraph/, /api/models,
/api/memory, /api/mcp, /api/skills, /api/agents, /api/threads/*, and
/api/sandboxes.

Co-authored-by: Chincherry93 <Chincherry93@users.noreply.github.com>
2026-04-30 15:21:22 +08:00
greatmengqi
38714b6ceb
refactor: thread app_config through middleware factories (#2652)
* refactor: thread app_config through middleware factories

Continues the incremental config-refactor sequence (#2611 root, #2612 lead
path) one layer deeper into the middleware factories. Two ambient lookups
inside _build_runtime_middlewares are eliminated and the LLMErrorHandling
band-aid removed:

- _build_runtime_middlewares / build_lead_runtime_middlewares /
  build_subagent_runtime_middlewares now require app_config: AppConfig.
- get_guardrails_config() inside the factory is replaced with
  app_config.guardrails (semantically identical — same default-factory
  GuardrailsConfig — verified by direct equality check).
- LLMErrorHandlingMiddleware.__init__ now requires app_config and reads
  circuit_breaker fields directly. The class-level
  circuit_failure_threshold / circuit_recovery_timeout_sec defaults are
  removed along with the try/except (FileNotFoundError, RuntimeError):
  pass band-aid — the let-it-crash invariant the rest of the refactor
  enforces.

Caller chain (already-resolved app_config sources):
- _build_middlewares in lead_agent/agent.py: reorder so
  resolved_app_config = app_config or get_app_config() is computed BEFORE
  build_lead_runtime_middlewares is called, then passed as kwarg.
- SubagentExecutor: optional app_config parameter (mirrors the lead-agent
  pattern); _create_agent does the same `or get_app_config()` fallback at
  agent-build time, so task_tool callers don't need to plumb app_config
  through yet (typed-context plumbing for tool runtimes is a separate
  refactor).

Tests:
- test_llm_error_handling_middleware: _make_app_config helper using
  AppConfig(sandbox=SandboxConfig(use="test")) — same minimal-config
  pattern conftest already uses. Three direct LLMErrorHandlingMiddleware()
  calls each followed by post-construction circuit_breaker mutation fold
  cleanly into _build_middleware(circuit_failure_threshold=...,
  circuit_recovery_timeout_sec=...).

Verification:
- tests/test_llm_error_handling_middleware.py — 14 passed
- tests/test_subagent_executor.py — 28 passed
- tests/test_tool_error_handling_middleware.py — 6 passed
- tests/test_task_tool_core_logic.py — 18 passed (verifies task_tool
  unchanged behavior)
- Full suite: 2697 passed, 3 skipped. The single intermittent failure in
  tests/test_client_e2e.py::test_tool_call_produces_events is pre-existing
  LLM flakiness (the test asserts the model decided to call a tool;
  reproduces 1/3 on unchanged main as well).

* fix: address middleware app config review comments

* fix: satisfy app config annotation lint

* test: cover explicit app config middleware wiring

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
2026-04-30 12:41:09 +08:00
Hinotobi
74081a85a6
[security] fix(sandbox): bind local Docker ports to loopback (#2633)
* fix(sandbox): bind local Docker ports to loopback

* fix(sandbox): preserve IPv6 loopback Docker binds

* fix(sandbox): log Docker bind host selection
2026-04-30 11:40:28 +08:00
Jsonz
24a5a00679
fix: avoid duplicate call to extractReasoningContentFromMessage (#2661)
In convertToSteps(), the extractReasoningContentFromMessage function was
called twice for the same message - once to check if reasoning exists and
again to assign it to the step object. Reuse the already-extracted value
from the local variable instead.
2026-04-30 11:33:49 +08:00