deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-07-27 08:28:00 +00:00

Author	SHA1	Message	Date
阿泽	1baa8ad696	feat(clarification): structured form fields for human-input cards (#4400 Phase 1) (#4406 ) * feat(clarification): structured form fields for human-input cards Add a request-side v2 `form` mode to the ask_clarification protocol so business flows (e.g. expense reimbursement) can collect several values in one card instead of sequential free-text questions: - `ask_clarification` gains a restricted `fields` parameter (text / textarea / number / select / multi_select / checkbox / date) - ClarificationMiddleware validates and normalizes fields explicitly (whitelisted types, unknown -> text, select-likes without options -> text, duplicate/invalid entries dropped, all-invalid falls back to the legacy modes) since the middleware short-circuits before tool execution; the plain-text fallback lists fields for IM channels - Form payloads carry `version: 2` so older frontends degrade to the text fallback; replies stay on the v1 response protocol — the card submits a readable summary as `response_kind: "text"`, so journal persistence and answered-card recovery are unchanged - Frontend renders typed field controls with required-field validation and compact multi-select chips Part of #4400 (scope narrowed per maintainer feedback: request-side only, no new response kinds, no top-level multi_choice). 
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(clarification): harden form protocol per review feedback Address the five review points on #4406: - Reject field names colliding with JS Object.prototype members on both sides; frontend reads form values via own-property access only, so `constructor`/`toString`-style names can no longer leak inherited members into required validation or the submitted summary - Close open requests answered through the legacy text fallback: a visible plain human reply (no response metadata) now marks every previously-opened request as answered, so upgrading to a v2-aware frontend cannot leave the composer locked on an already-answered card - Give checkbox fields deterministic boolean semantics: values are seeded to an explicit false ("no" in the summary) and `required` means must-agree/consent; documented in the tool schema - Make middleware field validation atomic: structurally broken entries (bad/duplicate/reserved names, over-cap field/option counts or text lengths) degrade the whole form instead of silently dropping fields; options are trimmed/deduped with blanks removed so the backend never emits payloads the frontend parser rejects - Associate form labels/controls (htmlFor/id), aria-required, aria-invalid, and error descriptions for accessibility Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * refactor(clarification): type the fields item schema via TypedDict Replace `fields: list[dict[str, Any]]` with `list[ClarificationFormField]` (a TypedDict with `name` required and the type whitelist as a Literal) so the provider-facing tool schema documents the item shape instead of an opaque object relying on the docstring. Runtime validation is unchanged and stays in ClarificationMiddleware, which intercepts the call before tool execution. Addresses the non-blocking review suggestion on #4406. 
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(frontend): drop unsupported aria-invalid from multi-select group jsx-a11y: role=group does not support aria-invalid; the error linkage stays via aria-describedby. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(clarification): coerce numeric required flags and normalize fields once - `_normalize_bool` now coerces 1/0 (some providers serialize booleans as integers), so `required: 1` no longer silently flips to optional - `_handle_clarification` normalizes `fields` once and passes the result to both the text fallback and the payload builder Addresses the non-blocking review nits on #4406. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(clarification): harden form protocol per contract review round 2 Backend: - Guard unhashable JSON in the intercept path: `type: []`/`{}` degrades the field to text and `clarification_type: []` coerces to str instead of raising TypeError (which, with return_direct, ended the turn with an error and no card or fallback) - Add a total budget over the serialized normalized fields (16KB UTF-8 bytes): per-item caps alone admitted forms whose IM text fallback exceeded channel delivery limits (Slack 40k chars, Feishu ~30KB card), silently truncating trailing fields; a boundary test proves any accepted form's fallback stays deliverable Frontend: - Submission value now appends a JSON block keyed by stable field names (readable summary alone is delimiter-ambiguous), with a collision regression test - Parser boundary tightened to match backend constraints: empty option values (Radix SelectItem crash), duplicate option ids/values, duplicate field names, and the form<->version-2 binding are rejected - Keep the error node mounted while any field is still invalid so aria-describedby never points at a removed element (happy-dom interaction test) - Required semantics are now accessible: native checkbox control (no HTML required attribute — it would intercept the custom 
submit path), visually-hidden localized "required" markers next to the aria-hidden asterisks - Legacy-fallback closure narrowed to the latest unanswered request: nothing guarantees a single outstanding clarification across runs, and closing all would silently swallow older decisions; an older request left open becomes the active card again Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(frontend): keep clarification selects controlled --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-07-27 14:05:31 +08:00
Ryker_Feng	cd9432bcc1	feat(tools): support GIF images in view_image (#4438 ) Add GIF to the view_image allowlist: map the .gif extension to image/gif and detect the GIF87a/GIF89a magic bytes so the existing extension/content cross-check accepts GIFs instead of rejecting them as an unsupported format. Covered by a new success test.	2026-07-24 13:12:43 +08:00
Huixin615	4a2ecd430e	fix(streaming): expose custom events to astream_events (#4403 ) * fix(streaming): expose custom events to astream_events * test(streaming): validate real custom event emitters --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-23 22:56:12 +08:00
Aari	7b330101d2	fix(tools): exclude injected runtime from list_uploaded_files schema (#4375 ) (#4376 ) Declaring the injected runtime arg as `Annotated[Runtime, InjectedToolArg] | None` made the top-level annotation a Union, so LangChain no longer treated it as injected. It leaked into the model-facing schema and pydantic raised PydanticInvalidForJsonSchema on the ToolRuntime dataclass the moment the tool was bound to a model. The tool is bound by default for the lead agent, so any default run on an OpenAI-compatible provider failed at tool-bind time. Declare runtime as a bare Runtime first param, matching every other built-in tool (present_files, view_image, task, ...), which LangChain auto-injects and auto-excludes from the schema. Add a schema regression test that binds the tool.	2026-07-23 08:22:15 +08:00
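The Union pitfall can be reproduced with the standard `typing` helpers. In the sketch below, `InjectedMarker` and `Runtime` are hypothetical stand-ins, and `is_injected` is a simplified rule that only inspects the top-level annotation. LangChain's actual detection logic is not shown here and may differ. `Optional[X]` is equivalent to `X | None`.

```python
from typing import Annotated, Optional, Union, get_args, get_origin


class InjectedMarker:
    """Hypothetical stand-in for a marker class such as InjectedToolArg."""


class Runtime:
    """Hypothetical stand-in for the runtime type."""


def is_injected(annotation: object) -> bool:
    """Simplified rule: only a top-level Annotated[...] carrying the marker counts."""
    if get_origin(annotation) is not Annotated:
        return False
    metadata = get_args(annotation)[1:]
    return any(m is InjectedMarker or isinstance(m, InjectedMarker) for m in metadata)


bare = Annotated[Runtime, InjectedMarker]
wrapped = Optional[Annotated[Runtime, InjectedMarker]]  # same as `... | None`

print(is_injected(bare))             # True
print(is_injected(wrapped))          # False: the top level is now a Union
print(get_origin(wrapped) is Union)  # True
```

Any check that looks only at the outermost annotation misses the marker once it is nested inside a Union. This is why the commit switches to a bare `Runtime` parameter instead.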
Aari	0d4d0cb17d	feat(agents): database-backed storage for custom agent definitions (#4359 ) * feat(agents): database-backed storage for custom agent definitions Add an agent_storage.backend switch (default file, behaviour-unchanged) with a db backend that stores each custom agent as a row in the shared SQL persistence layer, so a multi-instance deployment sees the same agents on every node (#4331, #4357). Introduces an AgentStore interface routing all read/write surfaces, an agents table + migration 0006, startup validation, and a file->db importer. Follows the thread_meta store / run_events backend-switch / 0003_scheduled_tasks migration patterns; no new dependency. * fix(agents): make db storage path production-ready (review round 1) Addresses review feedback on the db/sync agent-storage path: - sql.py: mirror the async engine's per-connection SQLite PRAGMAs on the sync engine (busy_timeout=30000, synchronous=NORMAL, foreign_keys=ON, WAL) so both engines behave identically against the shared DB; guard the engine cache with a lock (double-checked) so concurrent first-touch cannot build duplicate engines or register the connect listener twice. - routers/agents.py + routers/assistants_compat.py: offload the sync-store reads that ran on the event loop (list/get/check, update's pre-read + legacy guard + refresh, and assistants_compat's four list routes) via asyncio.to_thread — on db+postgres each was a network round trip stalling the loop. Writes were already offloaded. - file.py: translate the create() mkdir(exist_ok=False) race FileExistsError into AgentExistsError (router 409, matching SqlAgentStore's IntegrityError path); correct the _write docstring — per-file atomic replace, two commits sequential not transactional. Tests: sync-engine PRAGMA + engine-cache reuse assertions; file create-race -> AgentExistsError; strict Blockbuster anchor over the read endpoints so a regression back onto the loop fails CI. 
* fix(agents): address round-2 review on the db store path - update_agent tool: align the docstring/inline comment with FileAgentStore._write. Cross-field write atomicity is db-only; the file backend commits config then soul via two sequential os.replace (a crash between them can leave a fresh config.yaml beside a stale SOUL.md). The dropped partial-write reporting is an intentional tradeoff — the stage-then-replace safety is preserved (test_update_agent_soul_failure_does_not_replace_config still holds). - SqlAgentStore.update(): true upsert. Catch IntegrityError on the insert-on-missing branch, re-fetch and apply, so two concurrent first-time writes (e.g. two setup_agent handshakes) converge instead of surfacing a raw UNIQUE(user_id, name) violation as a 500. Symmetric with create(). - get_agent_store(): document the graph-subprocess config-resolution invariant (the except->file fallback is a genuine no-config path, not a mask for a misconfigured graph process) and pin it with two tests driving the real get_app_config() file resolution: db resolves from an on-disk config.yaml, file fallback when config is unresolvable. * test(agents): cover SqlAgentStore.update() write-race upsert recovery Mandatory-TDD test for the round-2 fix in 0680340a: two concurrent first-time update()s where the loser's insert hits UNIQUE(user_id, name). Deterministically forces the IntegrityError recovery path by making the first _row probe miss the committed winner, and asserts last-writer-wins instead of a surfaced 500.	2026-07-23 08:03:21 +08:00
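The insert-then-recover upsert from the round-2 fix can be illustrated with the standard-library `sqlite3` module. The table layout, `upsert_agent`, and the `_row_probe` hook are assumptions made for this illustration and are not SqlAgentStore's schema or API. The hook forces the lost-race path deterministically, much as the commit's test makes the first probe miss.

```python
import sqlite3
from typing import Callable, Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agents (user_id TEXT, name TEXT, config TEXT, UNIQUE (user_id, name))")
    return conn


def upsert_agent(
    conn: sqlite3.Connection,
    user_id: str,
    name: str,
    config: str,
    _row_probe: Optional[Callable[[], object]] = None,
) -> None:
    """Insert if missing; if a concurrent writer won the insert, update instead."""

    def default_probe() -> object:
        return conn.execute(
            "SELECT 1 FROM agents WHERE user_id = ? AND name = ?", (user_id, name)
        ).fetchone()

    probe = _row_probe or default_probe
    if probe() is None:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO agents (user_id, name, config) VALUES (?, ?, ?)",
                    (user_id, name, config),
                )
            return
        except sqlite3.IntegrityError:
            pass  # lost the race on UNIQUE(user_id, name): fall through to the update
    with conn:
        conn.execute(
            "UPDATE agents SET config = ? WHERE user_id = ? AND name = ?",
            (config, user_id, name),
        )
```

The loser of the race ends up applying its write as an update, so the result is last-writer-wins rather than a surfaced UNIQUE violation.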
Daoyuan Li	09d9cf53d2	fix(harness): add timeout to invoke_acp_agent to prevent indefinite hangs (#4238 ) invoke_acp_agent had no timeout anywhere in its call path, and ACPAgentConfig had no timeout field. If the ACP agent subprocess answers initialize/new_session correctly but then hangs inside prompt(), the tool call - and therefore the whole agent turn - blocks indefinitely, with the child process left running. MCP stdio servers already guard against this class of hang via tool_call_timeout; ACP agent invocations had no equivalent. Add ACPAgentConfig.timeout_seconds (default 1800, ge=1), mirroring the shape/default of subagents.timeout_seconds, and wrap the conn.prompt() call in asyncio.wait_for(). On TimeoutError, return a clear error instead of hanging; exiting the spawn_agent_process context block triggers the ACP library's own graceful-then-forceful subprocess cleanup, so the hung process is actually terminated.	2026-07-22 14:47:08 +08:00
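The timeout pattern amounts to wrapping the awaited call in `asyncio.wait_for` and turning the timeout into a returned error string. In this sketch, `invoke_with_timeout` and its message text are hypothetical. The subprocess cleanup that the real tool gets from leaving the spawn context is not modeled.

```python
import asyncio
from typing import Awaitable


async def invoke_with_timeout(prompt: Awaitable[str], timeout_seconds: float) -> str:
    """Await the agent's reply, but never longer than timeout_seconds."""
    try:
        return await asyncio.wait_for(prompt, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the pending call at this point.
        return f"Error: ACP agent did not respond within {timeout_seconds} seconds."
```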
Lee minjing	e225ad57d7	feat(uploads): lazy-load historical files via list_uploaded_files tool (#4174 ) * feat(uploads): lazy-load historical files via list_uploaded_files tool Replace per-turn injection of all historical upload metadata with on-demand discovery via a new `list_uploaded_files` built-in tool, following the same deferred-discovery pattern used by skills. - Rename <uploaded_files> block to <current_uploads> (current-run files only) - Add list_uploaded_files tool with include_outline: bool | list[str] - Extract outline helpers to shared deerflow/utils/file_outline.py - Update system prompt to reflect lazy-loading behaviour - Historical file scan removed from UploadsMiddleware.before_agent() Co-Authored-By: Claude <noreply@anthropic.com> * fix(uploads): clear uploaded_files state when no new files in current turn When before_agent() returns None on empty turns, the LastValue uploaded_files field retains the previous turn's filenames. list_uploaded_files then incorrectly excludes those files as "current-run" files, making them invisible until the next upload. Fix: return {"uploaded_files": []} instead of None to explicitly clear state. Add two-turn regression test covering the exact scenario from review feedback. 
Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve CI lint errors and stale test assertion from merge - Split long prompt line to fit 240-char limit - Add missing `Any` import in list_uploaded_files_tool - Remove unused `re` import in file_conversion (outline code moved) - Remove unused `os` import in middleware test - Fix test assertion: <uploaded_files> → <current_uploads> after main merge Co-Authored-By: Claude <noreply@anthropic.com> * fix: resolve CI lint errors and stale test assertion from merge - Split long prompt line to fit 240-char limit - Add missing `Any` import in list_uploaded_files_tool - Remove unused `re` import in file_conversion (outline code moved) - Remove unused `os` import in middleware test - Fix test assertion: <uploaded_files> → <current_uploads> after main merge Co-Authored-By: Claude <noreply@anthropic.com> * fix: add current_uploads to input sanitization exempt tags The lazy-loading PR renamed <uploaded_files> to <current_uploads>. The anti-drift guard scans all framework XML blocks and requires each to be either blocked or explicitly exempted. current_uploads wraps trusted server-generated file metadata, not user input, so it belongs in the exempt set. Co-Authored-By: Claude <noreply@anthropic.com> * test: regenerate replay golden after uploaded_files state change before_agent now returns {"uploaded_files": []} instead of None, adding uploaded_files to SSE values events. Regenerated via DEERFLOW_WRITE_GOLDEN=1. 
Co-Authored-By: Claude <noreply@anthropic.com> * fix: review feedback — memory pipeline, stale tags, state clearing, nits - Match both tags in memory stripping pipeline (uploaded_files|current_uploads) - Remove stale uploaded_files from _BLOCKED_TAG_NAMES - Clear uploaded_files on all before_agent early-return paths - Fix ponytail: stray word in file_conversion re-export comment - Remove dead total_omitted branch in _format_omitted_summary - ruff format fixes Co-Authored-By: Claude <noreply@anthropic.com> * fix: block current_uploads, sanitize only original user content Per review feedback: instead of exempting <current_uploads> (which allows user forgery), move it to _BLOCKED_TAG_NAMES and change InputSanitizationMiddleware._process_request to scan only the original user content (ORIGINAL_USER_CONTENT_KEY) when available. Server-injected trusted blocks are no longer checked against the blocked-tag denylist. Co-Authored-By: Claude <noreply@anthropic.com> * docs: clarify fallback reason in input sanitization comment Co-Authored-By: Claude <noreply@anthropic.com> * fix: third-round review feedback — state visibility, sanitization, regex, nits - list_uploaded_files_tool: logger.warning instead of silent try/except on runtime.state read failure (High) - input_sanitization_middleware: _extract_text_from_content skips empty text blocks to match message_content_to_text behaviour; rfind fallback path logs warning for observability (Medium) - memory pipeline regexes: backreference (?P<tag>)(?P=tag) in message_processing.py and prompt.py (Low) - file_conversion.py: re-export moved to top of file (Low) - Tests: middleware→tool state bridge test; integrated forged-tag + multimodal sanitization tests PR #4174 — Follow-up issues: #4212, #4213, #4214 Co-Authored-By: Claude <noreply@anthropic.com> * fix: 4th-round review — denylist, sanitization, scandir, nits - Add "uploaded_files" back to _BLOCKED_TAG_NAMES (old tag still processed by deermem; user forgery must be escaped) (consistency) - Fix inaccurate rfind-fallback comment: UploadsMiddleware keeps string as string, fallback is unreachable for strings (doc fix) - Distinguish "empty string key" (upload without text) from "non-string key" (caller forgery) so empty-text uploads never escape the server block (edge) - Merge dual os.scandir(uploads_dir) calls into one list re-use (minor) - Add comment on .md sibling skip known limitation: user-uploaded .md files whose stem collides with a converted doc are hidden (boundary, no code change) Co-Authored-By: Claude <noreply@anthropic.com> * fix: tighten rfind-failure fallback — distinguish server blocks from user blocks When _extract_text_from_content and message_content_to_text disagree on multimodal list content and rfind fails, use content[0] (server-injected <current_uploads> block) vs content[1:] (user blocks) to sanitize only user blocks. Raw strings and non-standard dict blocks that _extract_text_from_content misses are now also sanitized. Non-distinguishable paths (< 2 text blocks, non-list content) still degrade to full sanitization (safe — server block may be escaped but user forgery never leaks). All fallback paths log via logger.warning. 
Decision 18 / willem-bd 4th-round comment #3 Co-Authored-By: Claude <noreply@anthropic.com> * fix: correct comments referencing text_blocks → content in rfind fallback Co-Authored-By: Claude <noreply@anthropic.com> * fix: 5th-round review — dead code, subagent gating, integration test, perf, consistency - Delete unreachable ORIGINAL_USER_CONTENT_KEY guard in rfind fallback branch (original_user_content guaranteed non-empty str at that point) - Remove list_uploaded_files from BUILTIN_TOOLS; add include_upload_tool param to get_available_tools(), default True; task_tool.py passes False so subagents no longer receive a tool whose state exclusion is broken - Add integration test exercising real create_agent graph (not mocked runtime.state) to verify LangGraph propagates before_agent state writes into ToolRuntime.state during same-turn tool calls - Cache DirEntry.stat() st_size in candidates tuple to avoid second per-file syscall in the rendering loop - Make the upload-tag pre-check case-insensitive (content_str.lower()) to match _UPLOAD_BLOCK_RE re.IGNORECASE PR #4174 — willem-bd 5th-round review items #1-#5 Co-Authored-By: Claude <noreply@anthropic.com> * fix(channels): pass files metadata through _human_input_message() for IM uploads _human_input_message() was not passing additional_kwargs.files to the downstream message. UploadsMiddleware read no files, wrote uploaded_files=[], and list_uploaded_files reported same-run IM attachments as historical files (fancyboi999 repro). Fix: add files parameter to _human_input_message(), call site passes files=uploaded. Regression test locks the contract. Co-Authored-By: Claude <noreply@anthropic.com> * fix(channels): remove legacy <uploaded_files> manual prepend to fix double-injection regression Commit 8d86dbf6 added files= pass-through to UploadsMiddleware but left the manual _format_uploaded_files_block() prepend in place. 
Every IM attachment reached the model twice — once via the legacy <uploaded_files> block and once via <current_uploads>. This commit removes the manual prepend and the now-dead _format_uploaded_files_block() function. UploadsMiddleware is the sole upload-context producer for both IM and web paths. Reported-by: fancyboi999 (PR review) Co-Authored-By: Claude <noreply@anthropic.com> * docs: update #4212 issue body to reflect completed fixes and narrowed remaining scope * chore: remove temporary scratch file * fix(middleware): neutralize user-derived values inside <current_uploads> block Upload-derived filenames, paths, outline titles, and preview text are interpolated verbatim inside the trusted <current_uploads> wrapper, which InputSanitizationMiddleware exempts from sanitization. A crafted filename or document heading containing blocked authority tags would bypass the guardrail and enter model context as trusted framework data. Fix: call neutralize_untrusted_tags() on all four user-derived values inside _format_file_entry(), preserving the outer <current_uploads> wrapper untouched. Reported-by: fancyboi999 (P1 security review) Co-Authored-By: Claude <noreply@anthropic.com> * fix(middleware): neutralize extension labels in omitted-file summary Files exceeding the 10-item context cap bypass _format_file_entry(). Their extensions, derived from user-controlled filenames via _extension_label(), were interpolated verbatim into the trusted <current_uploads> wrapper — another path for blocked authority tags to escape the guardrail. Fix: neutralize extension values inside _extension_label(), the single extraction point for all extension labels. 
Reported-by: fancyboi999 (P1 security review) Co-Authored-By: Claude <noreply@anthropic.com> * fix(tools): neutralize user-derived values in list_uploaded_files tool result Apply neutralize_untrusted_tags() to every model-visible user-derived value returned by list_uploaded_files: filename, virtual path, extension, outline titles, outline preview lines, and omitted-file extension summary. This closes the last remaining injection bypass in the upload lazy-loading path - the <current_uploads> block and its omitted summary were already neutralized (previous commits), but the list_uploaded_files tool produced a second exit for the same attacker-controlled metadata that ToolResultSanitizationMiddleware did not cover. Co-Authored-By: Claude <noreply@anthropic.com> * fix(tests): add missing include_upload_tool=False to task_tool mock assertions PR #4174 added include_upload_tool parameter to get_available_tools(). task_tool.py correctly passes include_upload_tool=False for subagents but 5 existing tests' assert_called_once_with expectations were not updated, causing CI failures. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-07-22 14:02:56 +08:00
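The neutralization fixes follow one rule: escape every user-derived value that is interpolated into the trusted wrapper, and leave the wrapper itself untouched. In this sketch, `neutralize` is a hypothetical stand-in based on `html.escape`. The real neutralize_untrusted_tags() may transform text differently, and the entry format is invented.

```python
import html


def neutralize(value: str) -> str:
    """Hypothetical stand-in for neutralize_untrusted_tags(): escape &, < and >
    so user text can neither open nor close a framework tag."""
    return html.escape(value, quote=False)


def format_current_uploads(files: list[dict[str, str]]) -> str:
    lines = ["<current_uploads>"]  # trusted wrapper, emitted verbatim
    for f in files:
        # Filename and path are user-derived, so both are neutralized.
        lines.append(f"- {neutralize(f['filename'])} ({neutralize(f['path'])})")
    lines.append("</current_uploads>")
    return "\n".join(lines)
```

A filename such as `a</current_uploads><system>x.pdf` then appears only in escaped form, so the block still contains exactly one real closing tag.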
Ryker_Feng	20debf9cc7	feat(agents): per-agent model and generation settings (#4347 ) * feat(agents): per-agent model and generation settings Let each custom agent choose its own model and sampling settings (temperature, max_tokens) plus thinking / reasoning_effort defaults, so agents sharing a model profile are no longer stuck with one shared temperature and output length (#4336). AgentConfig gains optional model_settings / thinking_enabled / reasoning_effort (None = inherit). create_chat_model applies per-caller model_overrides on top of the profile before the thinking/Codex transforms; the lead agent resolves each knob with precedence request > agent config > profile/default. The /api/agents create/update routes persist the fields and reject an unknown model. The default lead agent path is unchanged (no agent config -> overrides None). The agent chat composer also stops force-overriding an agent's configured default model with models[0]. * fix(agents): tri-state thinking control and default-model capability gating The model-settings dialog seeded the thinking switch to false, so opening it to tweak temperature and saving silently disabled thinking (the runtime default is on) with no way back to inherit. It also hid the thinking / reasoning controls whenever the agent inherited the global default model, since `__default__` never resolved through `models.find`. Give thinking an explicit Inherit / On / Off tri-state so an untouched save is a no-op, and resolve `__default__` to the effective default (models[0]) for the capability check. Logic lives in the tested helpers module.	2026-07-22 13:44:55 +08:00
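The per-knob precedence (request > agent config > profile/default, with None meaning inherit) can be sketched as a first-non-None lookup over layers. `resolve` and the dict-based layers are assumptions. According to the commit, the real resolution happens in the lead agent and create_chat_model.

```python
from typing import Any


def resolve(key: str, *layers: dict[str, Any], default: Any = None) -> Any:
    """Return the first non-None value for key, scanning layers in priority order.

    None means "inherit", so an explicit False or 0 still wins.
    """
    for layer in layers:
        value = layer.get(key)
        if value is not None:
            return value
    return default


request: dict[str, Any] = {}
agent = {"temperature": 0.2, "thinking_enabled": False}
profile = {"temperature": 0.7, "max_tokens": 4096}

print(resolve("temperature", request, agent, profile))                     # 0.2
print(resolve("max_tokens", request, agent, profile))                      # 4096
print(resolve("thinking_enabled", request, agent, profile, default=True))  # False
```

Treating None as "inherit" is what makes the tri-state thinking control work: an explicit False stops the lookup, while an untouched None falls through to the next layer.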
Aari	cd34a1a504	fix(skills): don't attach model tracing to the in-graph skill security scan (#4252 ) * fix(skills): don't attach model tracing to the in-graph skill security scan * fix(skills): pass attach_tracing explicitly from the in-graph scan call site Follow the tracing INVARIANT's own convention rather than detecting the call context: scan_skill_content takes an attach_tracing flag, and _scan_or_raise -- the single in-graph choke point -- passes False. Standalone callers (Gateway skill routes, installer) keep the default True. The INVARIANT list named four sites and asks that new in-graph calls be added to it; record this fifth one so a future audit of that list finds it. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-20 08:18:23 +08:00
hataa	10890e10a8	feat(authz): propagate trusted authorization principal context (#4203 )	2026-07-17 14:49:51 +08:00
Huixin615	f9340c1f08	test(mcp): cover passive skill tool visibility (#4247 ) * test(mcp): cover passive skill tool visibility * test(mcp): tighten deferred discovery coverage	2026-07-17 14:39:35 +08:00
Aari	7df44f586c	fix(agents): refuse empty SOUL.md updates in update_agent (#4219 ) * fix(agents): refuse empty SOUL.md updates in update_agent setup_agent already rejects empty/whitespace soul (#3553). update_agent is the sibling write path and previously reported success while wiping a working SOUL.md. Mirror the same guard before staging. * fix(agents): guide the retry in the empty-SOUL update rejection Append "Omit the soul field if you do not want to change it." to the empty-soul error so the model self-corrects in one step instead of retrying with another null-like value, matching the "No fields provided" sibling message's helpfulness. Both regression tests assert the guidance.	2026-07-16 14:44:22 +08:00
Huixin615	65afc9b1d2	fix(skills): apply allowed-tools only to active skills (#4098 ) * fix(skills): scope allowed-tools to active skills * fix(skills): tolerate stale active skill paths * chore: retrigger CI * fix(skills): document policy activation limits * perf(skills): reuse per-step tool policy decisions * fix(skills): harden runtime tool policy contracts * fix(skills): redact cached policy decisions * fix(skills): make slash tool policy authoritative * fix(skills): preserve policy-safe discovery tools * test(skills): cover explicit task delegation policy	2026-07-16 14:12:02 +08:00
Daoyuan Li	0dd90ccfde	fix(agents): require config.yaml in update_agent's legacy-agent guard (#4166 ) update_agent (the harness tool) and PUT /api/agents/{name} (the same operation over HTTP) share an identical guard meant to block updates to an agent that only exists in the legacy shared layout. The guard checked bare directory existence: if not agent_dir.exists() and paths.agent_dir(name).exists(): When memory is enabled, the first time a user chats with a legacy shared agent, the memory writer creates a per-user directory containing only memory.json (no config.yaml). agent_dir.exists() is then true, so the guard never fires: the tool falls through to load_agent_config, which resolves through to the legacy shared config via the already-hardened resolve_agent_dir, and silently writes a brand-new config.yaml/SOUL.md into the memory-only directory. That forks the agent for just this user; every other user keeps reading the original shared config forever, with no error or warning. resolve_agent_dir itself was already hardened against exactly this failure mode: it requires config.yaml to exist, not just the directory. Mirror that condition at both call sites here.	2026-07-15 08:13:17 +08:00
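The hardened guard treats a per-user directory as a real agent only when it contains config.yaml. The sketch assumes the config.yaml requirement applies to the per-user side and that the legacy side keeps the original directory-existence check. `is_legacy_only` is a hypothetical helper, not the repository's function.

```python
from pathlib import Path


def is_legacy_only(user_agent_dir: Path, legacy_agent_dir: Path) -> bool:
    """True when an update must be refused because the agent lives only in the
    legacy shared layout.

    A per-user directory counts only if it holds config.yaml; one containing
    just memory.json (created by the memory writer) does not.
    """
    return not (user_agent_dir / "config.yaml").exists() and legacy_agent_dir.exists()
```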
qin-chenghan	713ee544b7	fix(agents): stop persisting base64 image data in checkpoint state (#4140 ) * fix(agents): stop persisting base64 image data in checkpoint state (#4138) The viewed_images state field stored full base64-encoded image data, which was duplicated across every subsequent checkpoint (O(n * steps) growth). A single 1MB image viewed early in a conversation would be re-stored in every checkpoint for the rest of the session. Changes: - ViewedImageData: replace base64 field with lightweight metadata (mime_type, size, actual_path) - view_image_tool: store only metadata in state, no base64 encoding - ViewImageMiddleware: read image files from disk on-demand in before_model and encode base64 temporarily for the model call - Update all tests to use the new metadata-only format This is the first step of #4138. The base64 data is no longer in persistent state, but the injected HumanMessage (with base64 content) still appears in the checkpoint for the step where it was injected. Checkpoint retention policies and large tool result dedup are separate follow-up items. * fix(agents): address review feedback on #4140 - view_image_tool: remove stale 'convert to base64' comment, replace with 'validate contents'; drop redundant image_size reassignment and add a TOCTOU guard that rejects files changed between stat() and read(). - view_image_middleware: extract _read_image_as_data_url helper that re-checks size against the recorded value AND the absolute cap (_MAX_IMAGE_BYTES). Document the trust assumption for actual_path (server-set, not client-settable) in the helper docstring. - view_image_middleware: abefore_model now runs the blocking read+encode via asyncio.to_thread to avoid stalling the event loop on up to 20MB images. - tests: add coverage for OSError during read, file-changed-since-view (TOCTOU), and size-exceeds-cap branches.	2026-07-14 23:02:26 +08:00
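The approach is to keep only metadata in state, then re-read and re-validate the file just before the model call. In the sketch below, `ViewedImageData` mirrors the three fields listed in the commit. The helper name, the 20MB cap constant, and returning None on any failure are assumptions.

```python
import base64
from pathlib import Path
from typing import Optional, TypedDict

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # assumed cap for this sketch


class ViewedImageData(TypedDict):
    """Metadata kept in state instead of the base64 payload."""
    mime_type: str
    size: int
    actual_path: str


def read_image_as_data_url(info: ViewedImageData) -> Optional[str]:
    """Re-read the file just before the model call; None if it is unusable."""
    path = Path(info["actual_path"])
    try:
        if path.stat().st_size > MAX_IMAGE_BYTES:
            return None
        data = path.read_bytes()
    except OSError:
        return None
    # Reject files that changed since view_image recorded their size.
    if len(data) != info["size"] or len(data) > MAX_IMAGE_BYTES:
        return None
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{info['mime_type']};base64,{encoded}"
```

In an async hook, the blocking read would be offloaded, for example with `await asyncio.to_thread(read_image_as_data_url, info)`, as the review follow-up describes for abefore_model.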
Aari	79cdd99fca	fix(mcp): validate MCP tool names at load so deferred prompts stay inert (#4154 ) * fix(tools): escape MCP tool names rendered into deferred prompts get_deferred_tools_prompt_section and get_mcp_routing_hints_prompt_section list MCP tool names into the <available-deferred-tools> and <mcp_routing_hints> system-prompt blocks without escaping, while the mirror get_skill_index_prompt_section (per its docstring) does escape. An MCP name is taken verbatim from an external server, so a crafted name could close the block and forge a framework tag. Escape names (and routing keywords) at render, mirroring the skill-index section. * fix(mcp): validate tool names at the load boundary Escaping at render only neutralizes < > &, so a tool name with newlines or markdown still injects free-form text into the deferred-tools prompt block. Deferred (tool_search) tools are never bound, so the provider's function-name check never runs on them. Drop any MCP tool whose name is not a valid identifier (^[A-Za-z0-9_-]+$) in get_mcp_tools() — the same charset the provider enforces at bind time — before it can enter the catalog or the prompt. Render-time html.escape stays as defense-in-depth. Mirrors the load-time skill-name validation in skills/storage/skill_storage.py.	2026-07-14 09:49:43 +08:00
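A load-time filter for the identifier charset might look like this. The function name is hypothetical, since the commit gives only the pattern. Using `re.fullmatch` matters here: with `re.match` and `^...$`, `$` also matches just before a trailing newline, so `"evil\n"` would pass.

```python
import re

# Same charset the commit names: ^[A-Za-z0-9_-]+$
_TOOL_NAME = re.compile(r"[A-Za-z0-9_-]+")


def filter_tool_names(names: list[str]) -> list[str]:
    """Keep only names made entirely of the allowed characters."""
    # fullmatch requires the whole string to match; unlike "$", it does not
    # accept a trailing newline.
    return [n for n in names if _TOOL_NAME.fullmatch(n)]


names = ["search_docs", "get-page", "bad name", "x</available-deferred-tools>", "evil\n"]
print(filter_tool_names(names))  # ['search_docs', 'get-page']
```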
Aari	1ebf59fe24	fix(tools): stop capping tool_search's select: at MAX_RESULTS (#4054 ) `select:` names its targets explicitly, so capping it silently drops schemas the model asked for by name -- and picks the survivors by catalog order, not request order. Nothing tells the model that the tool it asked for was not returned, so it then tries to call a tool that is still deferred. The rule is already stated three times in the repo, and this is the one place that breaks it: - backend/AGENTS.md:447 -- "select: returns all requested skills without a result cap; other modes cap at MAX_RESULTS=5" - skills/catalog.py:71 -- SkillCatalog.search returns select: uncapped; the ranked modes slice. DeferredToolCatalog shares its query grammar and its MAX_RESULTS = 5, and is capped. - tool_search's own docstring -- "select:Read,Edit -- fetch these exact tools by name" versus "notebook jupyter -- keyword search, up to max_results best matches". Only the ranked form promises a cap. The cap is applied twice: once inside `DeferredToolCatalog.search` and again in the tool closure. `search` already caps the ranked branches internally, so the closure's slice is redundant for them and is the only thing capping `select:` once the first is removed -- fixing one site alone changes nothing the model can observe. Its sibling closure, `skills/describe.py::describe_skill`, calls `catalog.search(name)` with no slice. Both slices are removed. The ranked modes keep their cap.	2026-07-12 00:17:11 +08:00
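The intended contract (explicit `select:` returns every requested tool in request order, while ranked modes are capped) can be sketched with a toy catalog. The keyword scoring here is a deliberately naive assumption and is not DeferredToolCatalog's ranking. Only the `select:` prefix and MAX_RESULTS = 5 come from the commit.

```python
MAX_RESULTS = 5


def search(catalog: dict[str, str], query: str) -> list[str]:
    """Toy catalog search: explicit select is uncapped, ranked modes are capped."""
    if query.startswith("select:"):
        wanted = [n.strip() for n in query[len("select:"):].split(",") if n.strip()]
        # Every requested tool that exists, in request order, with no cap.
        return [n for n in wanted if n in catalog]
    terms = query.lower().split()
    # Naive score: number of query terms found in the description.
    scored = [(sum(t in desc.lower() for t in terms), name) for name, desc in catalog.items()]
    ranked = [name for score, name in sorted(scored, key=lambda s: -s[0]) if score > 0]
    return ranked[:MAX_RESULTS]
```

With this structure, only the ranked branch applies the cap. No caller-side slice is needed, which is the situation the commit restores by removing both slices.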
Ryker_Feng	41658c5ff4	feat(skills): add skill review quality gate (#4037 ) * feat(skills): add skill review quality gate * fix(skills): skip review eval fixtures in CI * fix(skills): ignore review eval fixtures in bundled scans * fix(skill-review): harden review gate boundaries * fix(skills): address skill review gate feedback	2026-07-11 15:58:07 +08:00
Nan Gao	aafd5077b2	feat(subagents): show effective model and token usage on task cards (#4049 ) * feat(subagents): show runtime metadata on task cards * fix(subagents): stop task-card render loop and dedupe model fetches Address code review on the runtime-metadata cards: - P1 render loop: the terminal ToolMessage is re-parsed on every MessageList render and always carries modelName/usage, so the presence-based setTasks condition fired a fresh state object each render -> "Maximum update depth exceeded". computeNextSubtask now returns a value-compared `changed` flag and a pure subtaskNotification() routes terminal transitions through the deferred after-render path while skipping no-op re-parses. - Per-card useModels refetch: add staleTime: Infinity to the ["models"] query so every subtask card shares one /api/models fetch instead of refetching on each mount. * make format * refactor(subagents): dedupe token-usage validators + tidy event narrowing Address PR review follow-ups: - DRY: extract one shared token-usage validator per side. Backend status_contract.normalize_token_usage() now backs both the terminal ToolMessage metadata and the subagent.step/.end run events (step_events.py), and frontend messages/usage.normalizeTokenUsage() backs both the live task_running event (lifecycle.ts) and the terminal ToolMessage metadata (subtask-result.ts). This prevents the input/output/total_tokens validation from drifting across the four former copies. - Nit: onCustomEvent narrows event.type once instead of re-checking the object shape per branch; the redundant task_started early-return (already validated by taskEventToSubtaskUpdate) is dropped.	2026-07-11 15:41:57 +08:00
Ryker_Feng	ebc09ce130	feat(mcp): auto-promote deferred MCP tools from routing hints (#4019 ) * feat(mcp): auto-promote deferred MCP tools from routing hints When tool_search.enabled=true defers MCP tool schemas, PR1 routing hints still require the model to spend a tool_search discovery round trip before it can call the tool the routing metadata already points at. This adds a McpRoutingMiddleware that matches the latest user message against PR1 routing keywords and promotes the matching deferred schemas before the model call, removing that round trip. Design (soft routing, opt-in, additive): - Matches only the latest real HumanMessage (shared is_real_user_message helper, reused by SkillActivationMiddleware so the two cannot drift); case-insensitive substring match, no tokenizer dependency. - Ordering: priority desc, then tool name asc; capped by the new global tool_search.auto_promote_top_k (default 3, clamped 1..5). Does not add or consume a per-tool auto_promote_top_k (PR1 schema unchanged); a per-tool value is ignored with a DEBUG note. - Returns a plain {"promoted": ...} state update (not a Command) and relies on ThreadState.merge_promoted for union/dedupe, so auto-promote and a model-triggered tool_search converge on the same catalog hash. - Installed before DeferredToolFilterMiddleware on every deferred-tool path (lead agent, subagent, embedded client, webhook via shared builders); a construction-time assert rejects the reversed order. catalog_hash is None / no routing index is a complete no-op, so bootstrap and ACP skip it. - Privacy: never executes tools, never promotes policy-filtered tools, adds no routing keywords or matched tool names to trace metadata or INFO/WARN logs. No behavior change when tool_search.enabled=false. 
Tests: index construction, matching semantics, middleware state updates, same-cycle deferred-filter interaction, lead/subagent/embedded-client builder wiring + order invariant, config clamping, config.example.yaml parseability, and privacy assertions. * refactor(mcp): address auto-promote review nits - executor: access app_config.tool_search.auto_promote_top_k directly to match the lead-agent and embedded-client paths (drop the over-defensive getattr that masked missing config); update the subagent test mock to carry tool_search. - tool_search / mcp_routing_middleware: cross-reference the duplicated routing priority/keyword normalization between the builder and the middleware's defensive _normalize_index so they cannot silently drift. - MCP_SERVER.md: document that auto-promote keyword matching is a case-insensitive substring test (not word-boundary), advising distinctive keywords.	2026-07-10 07:54:36 +08:00
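The matching and ordering rules listed for McpRoutingMiddleware can be sketched like this. The `RoutingEntry` shape and both function names are assumptions made for illustration. What the sketch follows from the commit is a case-insensitive substring test on the latest user message, eligibility limited to deferred tools, ordering by priority descending and then tool name ascending, and a global top_k clamped to 1..5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingEntry:
    tool_name: str
    keywords: tuple[str, ...]
    priority: int = 0


def clamp_top_k(value: int) -> int:
    """Clamp the global auto_promote_top_k into 1..5."""
    return max(1, min(5, value))


def select_promotions(
    message: str, entries: list[RoutingEntry], deferred: set[str], top_k: int = 3
) -> list[str]:
    text = message.lower()
    matched = [
        entry
        for entry in entries
        # Only deferred (already policy-filtered) tools are eligible.
        if entry.tool_name in deferred
        # Case-insensitive substring test, not a word-boundary match.
        and any(kw and kw.lower() in text for kw in entry.keywords)
    ]
    # Priority descending, then tool name ascending.
    matched.sort(key=lambda entry: (-entry.priority, entry.tool_name))
    return [entry.tool_name for entry in matched[: clamp_top_k(top_k)]]
```

Because this is a substring test, a short keyword like "issue" also fires inside unrelated words. That is why the MCP_SERVER.md note recommends distinctive keywords.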
Ryker_Feng	5ba25b06ec	feat(mcp): add MCP routing hints (#4004 ) * feat: add MCP routing hints * test: isolate mcp routing prompt config * fix: address mcp routing review feedback	2026-07-09 16:26:31 +08:00
hataa	c9fb9768d4	fix(subagents): unify guardrail caps on additive stop_reason + add token_budget (#3875 Phase 2) (#3980 ) Phase 2 of #3875. Two guardrail axes can end a subagent run early — the turn budget (GraphRecursionError) and the token budget (TokenBudgetMiddleware) — and both now surface why through one additive `subagent_stop_reason` field instead of a status enum. This completes and course-corrects Phase 1 (#3949), which shipped the turn-budget cap as a `max_turns_reached` status enum. The agreed Phase 2 design replaces that enum with an optional `stop_reason` field (token_capped | turn_capped | loop_capped): a new enum value would break v1 consumers, while an additive field is ignored by older frontends and ledger readers. `max_turns_reached` and SubagentStatus.MAX_TURNS_REACHED are removed. - subagents.token_budget config (default enabled, 2,000,000 tokens, warn 0.7) with per-agent override; TokenBudgetMiddleware is now attached in build_subagent_runtime_middlewares so the cost-ceiling backstop engages for every subagent. The hard-stop does not raise — it strips tool_calls and lets the run finish with a final answer, recording the cap on a per-run consume_stop_reason() accessor. - executor.py: on normal completion it reads consume_stop_reason() and stamps completed + token_capped when the budget fired; on GraphRecursionError it recovers the last AIMessage partial (completed + turn_capped) or, if nothing usable survived, failed + turn_capped. SubagentResult gains stop_reason. - status_contract.py / contracts/subagent_status_contract.json (v2) / frontend subtask-result.ts: additive subagent_stop_reason field, pinned by test_status_values_match_contract / test_stop_reason_values_match_contract. - task_tool.py + delegation_ledger.py: drop the max_turns_reached paths; the ledger captures stop_reason and renders model-facing "capped" guidance so the lead reuses a capped completion knowingly.
The 2,000,000-token default is deliberately loose (tighten to taste) — it would have roughly halved the reported 4.4M burn while leaving legitimate deep-research runs (max_turns=150) room. Subagent summarization is a follow-up.	2026-07-08 22:26:06 +08:00
AochenShen99	658c39ccf7	feat(skills): Add native SkillScan phase 1 for skills (#3033 ) * Add phase 1 skill static scanning * Rework SkillScan phase 1 as native scanner * refactor(skillscan): align phase 1 with trimmed RFC contract - SecurityFinding: 7 fields (rule_id, severity, file, line, message, remediation, evidence); category/analyzer derive from the rule_id prefix, confidence/column/fingerprint/metadata removed - scan_archive_preflight()/scan_skill_dir() are pure functions: no ScanContext, no policy schema; CRITICAL-blocks is a code constant and skill_scan.enabled is applied by enforce_static_scan()/callers - secret-* evidence is redacted before findings leave the scanner - de-dup keys on (rule_id, file, line) so repeated occurrences keep distinct locations for agent self-correction - cloud-metadata detection consolidated into network-cloud-metadata - nested zip members get a one-level stdlib magic-byte peek; an executable member escalates package-nested-archive to CRITICAL - install metadata sidecar removed (Phase 7 decides if it is needed) - rule specs moved next to their analyzers; skillscan/rules/ removed - tests updated + new anchors: redaction, dedup lines, nested-zip escalation, single cloud-metadata rule, bundled-skill zero-CRITICAL * fix(skillscan): tighten reverse-shell/secret/archive scan rules from review Address PR #3033 review feedback on the native SkillScan analyzers: - Reverse-shell false positives: split shell detection by signal strength (/dev/tcp/, nc -e stay CRITICAL; bash -i, mkfifo -> new HIGH shell-reverse-shell-heuristic, warn->LLM). The Python check is now AST-anchored on real socket.socket/os.dup2/subprocess call sites instead of raw-text substring matching, so prose/docstrings no longer hard-block. - Secret evidence: _redact_secret_evidence returns [redacted] with no secret bytes (was value[:6], which leaked 2 real token bytes past the prefix). 
- Archive DoS: cap outer archive member count (_MAX_ARCHIVE_MEMBERS=4096); scan_archive_preflight early-aborts with a package-too-many-members CRITICAL finding (routes through the existing blocked->400 fail-closed path). - shell-destructive-command: broaden the rm -rf matcher to sensitive system roots (/home, /usr, /*, --no-preserve-root /) while leaving safe subpaths unflagged. - Dead code: collapse _decode_text_for_analysis to a single decode path and drop the unused _TEXT_SUFFIXES set and _has_text_shebang helper. - local_skill_storage: document why the host_path branch keeps app_config possibly-None (lazy kill-switch resolution; avoids eager get_app_config in config-free environments such as CI). Tests: new negative/positive coverage in test_skillscan_native.py. Full backend suite 6616 passed, 26 skipped.	2026-07-07 21:44:28 +08:00
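A member-count preflight like the one described for archives can be written with only the standard library. The function name and the three-key finding dict are hypothetical simplifications; the real SecurityFinding carries seven fields. Only the 4096 cap, the rule id, and the CRITICAL severity come from the commit message.

```python
import io
import zipfile

MAX_ARCHIVE_MEMBERS = 4096  # cap quoted in the commit message


def preflight_member_count(data: bytes, limit: int = MAX_ARCHIVE_MEMBERS) -> list[dict]:
    """Return a CRITICAL finding if the archive lists more members than allowed."""
    # Opening the archive and calling infolist() reads the central directory
    # only; no member is decompressed.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        count = len(archive.infolist())
    if count <= limit:
        return []
    return [
        {
            "rule_id": "package-too-many-members",
            "severity": "CRITICAL",
            "message": f"archive lists {count} members (limit {limit})",
        }
    ]
```

Checking the count before any per-member analysis lets a hostile archive with many tiny entries fail fast. A CRITICAL finding then goes through the same blocked, fail-closed path as other critical rules.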
hataa	0664ea2243	fix(subagents): surface turn-budget cap as MAX_TURNS_REACHED with partial result (#3875 Phase 2) (#3949 ) * fix(subagents): surface turn-budget cap as MAX_TURNS_REACHED with partial result (#3875) Phase 2 of #3875. When a subagent exhausts its turn budget (recursion_limit == max_turns), LangGraph raises GraphRecursionError from agent.astream. The generic except Exception in _aexecute misclassified it as FAILED and discarded the partial work already streamed into final_state, so the lead could not tell 'broken subagent' from 'out of budget' and got an empty failure. Catch GraphRecursionError specifically (before the generic handler) and set a distinct SubagentStatus.MAX_TURNS_REACHED terminal status, recovering the partial result from the last streamed chunk via a shared _extract_final_result helper (refactored out of the normal-completion path so both paths render content identically). Extend the cross-language status contract so the new value travels on additional_kwargs.subagent_status: a capped run is result-bearing, so make_subagent_additional_kwargs / read_subagent_result_metadata carry subagent_result_brief + subagent_result_sha256 (the recovered work, like completed) AND the cap notice on subagent_error -- the one status that carries both. task_tool.py returns it via the shared _task_result_command; the delegation ledger prefers the partial result_brief and renders model-facing guidance (reuse / retry tighter / raise max_turns). Frontend collapses max_turns_reached to the failed pill with the cap notice on error. No agent-loop, runner, or persistence behavior touched; default max_turns is unchanged. * refactor(subagents): consolidate content-stringify onto shared helper Address review feedback on #3949 (willem-bd, copilot-pull-request-reviewer): - executor.py: drop the private `_stringify_message_content` — a third near-duplicate of `utils/messages.py::message_content_to_text`. 
`_extract_final_result` now delegates to that canonical helper; the "No response generated" sentinel is pushed down to the consumer (the shared helper returns "" for no-text, matching every other call site). - task_tool.py: align the live `task_failed` event's error string with the canonical "Reached max_turns=N" used by the logger, the structured `error=`, and the executor (was "Reached max turns (N)"). Behavior for real AIMessage content is unchanged; only atypical edge inputs (consecutive bare-string list items; empty content) now match the canonical helper that every other call site already uses. `extract_response_text` is intentionally left as-is: it filters by OpenAI content-block `type`, a different shape with many callers and its own tests. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-07-06 07:55:32 +08:00
Zheng Feng	dcb2e687d5	feat(channels): add GitHub as a webhook-driven channel (#3754 ) * feat(channels): add GitHub event-driven agents (#3754) Add a webhook-driven GitHub channel with fail-closed webhook routing, deterministic per-agent PR/issue threads, mention-gated trigger fan-out, GitHub App token injection for sandboxed gh/git commands, and backend/AGENTS.md documentation. * fix(llm-middleware): classify bare IndexError as transient Upstream chat providers occasionally return 200 OK with an empty generations list (observed against Volces "coding" on ark.cn-beijing.volces.com). When that happens, langchain_core.language_models.chat_models.ainvoke raises ``IndexError: list index out of range`` at ``llm_result.generations[0][0].message`` and kills the run. Treat a bare IndexError reaching the middleware as a transient upstream-payload glitch and route it through the existing retry/backoff path instead of failing the whole agent run. The retry budget and backoff schedule are unchanged. Adds three regression tests covering the classifier and both the recover-on-retry and exhausted-retries paths. * fix(runtime): ignore stale LLM fallback markers from prior runs When a run on a thread ends with the LLM-error-handling middleware emitting a `deerflow_error_fallback`-marked AIMessage (e.g. after the IndexError empty-generations classification fix lands), that message is persisted to the thread's checkpoint as part of the messages channel. LangGraph replays the full message history in `stream_mode="values"` chunks, so every subsequent run on the same thread re-streams the stale fallback marker — and the worker's chunk scanner faithfully picks it up, flipping `RunStatus.success` to `RunStatus.error` for runs that themselves had no LLM failure at all. Snapshot the set of pre-existing message ids from the pre-run checkpoint and thread it through `_extract_llm_error_fallback_message` / `_try_extract_from_message` as a filter. 
Markers on history messages are ignored; markers on fresh messages produced during this run still trip the error path. Falls back to an empty set when the checkpointer is absent or the snapshot can't be captured, preserving the prior behavior on first-run / no-state paths. Adds unit tests for the new filter (helper-level and `_collect_pre_existing_message_ids`) plus an integration test exercising the full `run_agent` path with a stale history checkpointer. * fix(channels): make github channel fire-and-forget to avoid httpx.ReadTimeout on long runs GitHub agent runs (clone -> edit -> test -> push -> PR) routinely exceed the langgraph_sdk default 300s read deadline. The manager's runs.wait call kept an HTTP stream open for the entire run lifetime, so the long run blew up with httpx.ReadTimeout and the outer except branch then released the dedupe key and emitted a false 'internal error' outbound. The GitHub channel's outbound send is log-only by design: agents post to the issue/PR via the gh CLI in the sandbox when they choose to comment or create a PR. There is nothing for the manager to ferry back, so the long-poll was pure overhead. This change adds ChannelRunPolicy.fire_and_forget (default False) and sets it True for the github channel. When fire_and_forget is True, _handle_chat dispatches via client.runs.create (short POST, returns once the run is pending) instead of client.runs.wait, and skips the response-extraction + outbound-publish block. ConflictError on a busy thread still trips the standard THREAD_BUSY_MESSAGE path so behavior on the busy case is preserved for any future non-github fire-and-forget channel. Other (non-github) channels are unchanged: their policy defaults fire_and_forget=False and they continue to dispatch via runs.wait. Adds 6 regression tests in tests/test_channels.py::TestGithubFireAndForget: - Default ChannelRunPolicy.fire_and_forget is False. - The github policy registers fire_and_forget=True. 
- github inbound calls runs.create, not runs.wait, with the right kwargs. - github inbound publishes no outbound on success. - ConflictError from runs.create still emits THREAD_BUSY_MESSAGE. - Non-github channels (slack) still dispatch via runs.wait. * test(lead-agent): accept user_id kwarg in skill-policy test stubs The two GitHub-channel tests added in #3754 stubbed _load_enabled_skills_for_tool_policy with a lambda that only accepted `available_skills` and `app_config`, but the real function (and its call site in agent.py) also passes `user_id`. This raised TypeError on every run, failing backend-unit-tests. Add `user_id=None` to match the three sibling stubs in the same file. * refactor(gateway): disambiguate context-key set names The two frozensets _INTERNAL_ONLY_CONTEXT_KEYS and _CONTEXT_ONLY_KEYS shared a confusable "CONTEXT_ONLY" token in different orders, and the first broke the _CONTEXT_<X>_KEYS pattern of its sibling _CONTEXT_CONFIGURABLE_KEYS. Rename to make the distinct axes explicit: _CONTEXT_INTERNAL_CALLER_KEYS - WHO: internal callers (scheduler) only _CONTEXT_RUNTIME_ONLY_KEYS - WHERE: runtime context only, never configurable Pure rename, no behavior change.	2026-07-04 22:56:24 +08:00
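The stale-marker filter from the runtime fix can be pictured as a scan that skips every message id captured from the pre-run checkpoint. This sketch uses a stand-in `Message` dataclass instead of LangChain's AIMessage and assumes the marker lives in `additional_kwargs`. Both points are assumptions, as is the name `find_fresh_fallback`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

FALLBACK_MARKER = "deerflow_error_fallback"


@dataclass
class Message:
    id: str
    additional_kwargs: dict = field(default_factory=dict)


def find_fresh_fallback(messages: list[Message], pre_existing_ids: set[str]) -> Message | None:
    """Return the newest marked message produced during this run, if any."""
    for message in reversed(messages):
        if message.id in pre_existing_ids:
            continue  # replayed history from an earlier run: ignore its marker
        if message.additional_kwargs.get(FALLBACK_MARKER):
            return message
    return None
```

An empty `pre_existing_ids` set, as on a first run or without a checkpointer, makes the function behave like the previous unfiltered scan.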
Xinmin Zeng	576577bd32	feat(channels): expose IM channel_user_id to sandbox commands as DEERFLOW_CHANNEL_USER_ID (#3926 ) * feat(channels): expose channel_user_id to sandbox commands as DEERFLOW_CHANNEL_USER_ID IM-channel skills need the sender's platform identity (Feishu open_id, Slack Uxxx, ...). The channel manager already writes channel_user_id into body.context, but the Gateway whitelist dropped it. Forward it into the runtime context only (never configurable, which is checkpointed), and have bash_tool export it as a fixed env var through a shell-quoted command prefix. The identity deliberately does not ride execute_command(env=...): that channel is reserved for request-scoped secrets, and a non-empty env switches AioSandbox onto the bash.exec path (fresh session per call, image >= 1.9.3 required), which would have broken every IM bash command on older sandbox images and abandoned persistent-shell semantics on new ones. A command-string export keeps the legacy path, stays visible in audit logs (it is an identifier, not a secret), and gives per-call correctness in group chats where one thread and sandbox are shared by senders with different platform ids. Skipped on the Windows local sandbox, whose PowerShell/cmd.exe fallback has no POSIX export. Part of #3914 * feat(channels): propagate channel_user_id to subagents; cap value length Review findings from the pre-PR verification pass: - Subagent delegation dropped the sender identity: task_tool now captures channel_user_id from the parent runtime context and the executor forwards it into the subagent's context, mirroring the guardrail attribution fields (user_role/oauth_/run_id). Without this, bash commands delegated via task lost the group-chat sender's id. - body.context is client-writable on web requests, so values over 256 chars are ignored instead of bloating every command string sent to the sandbox. 
* fix(channels): set-or-unset channel_user_id so identity is per-call regardless of AIO session persistence Review (willem-bd): the identity export could leak across senders in a shared group-chat AIO sandbox. The AIO no-env path reuses a persistent shell session (the class-lock reason, #1433), and the 256-char/type guard made some commands carry no prefix — so a dropped-id command could resolve the id a previous sender exported. Make per-call correctness independent of session semantics: an IM-channel command (channel_user_id present in context) now always carries an explicit prefix — export VAR=<quoted> for a valid id, or unset VAR for an unusable one (empty / non-str / over the cap). Non-IM runs (no key) are untouched. A prefix unset has none of the '& ; unset' suffix hazard raised earlier. Verified on a real AIO 1.11.0 container: the no-id shell path auto-creates a session per call (does not persist today), but an explicit shared session DOES persist (export stale-A -> readback [stale-A]); the unset prefix clears it (-> []). So the fix holds even on an image whose no-id path persists. Regression tests cover the dropped-id group-chat window and the non-IM passthrough. Part of #3914 * test(channels): align channel_user_id task test with new Command return shape The merge from main changed task_tool to return a Command(update=...) instead of a plain string; update the assertion to extract the tool message via the existing _task_tool_message helper, matching the sibling tests. Fixes the CI backend-unit-tests failure introduced by the merge.	2026-07-04 21:18:11 +08:00
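The set-or-unset rule can be sketched as a prefix builder. The function names and the `"; "` separator are assumptions; the env var name, the 256-character cap, and the export/unset/untouched cases come from the two commits above. `shlex.quote` handles quoting, so an id containing spaces or quotes cannot break out of the export.

```python
import shlex

ENV_VAR = "DEERFLOW_CHANNEL_USER_ID"
MAX_ID_LEN = 256  # values over the cap are treated as unusable


def channel_identity_prefix(context: dict) -> str:
    """Build the per-call prefix: export a valid id, unset otherwise."""
    if "channel_user_id" not in context:
        return ""  # non-IM run: command passes through untouched
    value = context["channel_user_id"]
    if isinstance(value, str) and value and len(value) <= MAX_ID_LEN:
        return f"export {ENV_VAR}={shlex.quote(value)}; "
    # Empty, non-str, or over-long ids clear any value left by a prior sender.
    return f"unset {ENV_VAR}; "


def wrap_command(command: str, context: dict) -> str:
    return channel_identity_prefix(context) + command
```

Because every IM-channel command now starts with either `export` or `unset`, a persistent shell session cannot leak one sender's id into another sender's command.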
Zhipeng Zheng	53a80d3ad1	feat(skills): per-user custom skill isolation with sandbox mounting (#3889 ) * feat(skills): per-user skill isolation (#2905) Implement user-scoped skill storage that isolates custom skills between users while sharing public skills globally. Key changes: - Add UserScopedSkillStorage class for per-user custom skill directories - Introduce get_or_new_user_skill_storage() factory with user_id context - Auth middleware sets effective_user_id for request-scoped storage - Agent/prompt/middleware now use user-scoped storage and prompt cache - Sandbox mounts user-scoped skill directories for search/read tools - Add validate_skill_file_path() to SkillStorage for path security - Migration script supports --all-users bulk migration - Frontend: add editable field to Skill type, error check in enableSkill - All skill categories can be toggled (custom skills default to enabled) - Update skill-creator SKILL.md with isolation-aware instructions Tests: - Add test_user_scoped_skill_storage.py (new) - Update all existing skill tests for user-scoped storage - Update sandbox, client, and router tests * fix(skills): address second-round PR review feedback (#3889) - P1-1: restrict legacy skill mount to users without custom skills - P1-2: fail-closed for _is_disabled_skill_path (OSError → return True) - P2-1: AND-merge global extensions_config skill disabled state - P2-2: atomic write for _skill_states.json (mkstemp + replace) - P2-3: normalize X-DeerFlow-Owner-User-Id in trusted boundary - P2-4: LRU-bounded _enabled_skills_by_config_cache (OrderedDict, maxsize=256) - P2-5: clear global prompt cache on PUBLIC skill toggle - P2-6: invalidate skill caches on client.update_skill * fix(tests): correct tool policy test after merge * fix(skills): use DEFAULT_SKILLS_CONTAINER_PATH in UserScopedSkillStorage The "/mnt/skills" literal in UserScopedSkillStorage.__init__ triggers test_skill_container_path_defaults::test_mnt_skills_literal_is_owned_by_skill_constants_module on 
CI. Migrate the default to the existing deerflow.constants constant, matching the pattern already used by LocalSkillStorage, SkillStorage, and the durable/tool_error middlewares. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-04 13:54:04 +08:00
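Two of the review fixes above, the atomic write of `_skill_states.json` (P2-2) and the LRU-bounded config cache (P2-4), follow well-known standard-library patterns. The sketch below is a generic version under that assumption. `atomic_write_json` and `LRUCache` are hypothetical names, not DeerFlow's code.

```python
import json
import os
import tempfile
from collections import OrderedDict
from pathlib import Path


def atomic_write_json(path: Path, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then os.replace() it."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # never leave a half-written temp file behind
        raise


class LRUCache:
    """OrderedDict-backed cache that evicts the least recently used entry."""

    def __init__(self, maxsize: int = 256) -> None:
        self.maxsize = maxsize
        self._data: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # drop the oldest entry
```

The temp file is created next to the target because `os.replace` is only atomic within one filesystem. Readers therefore see either the old state file or the new one, never a truncated file.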
AochenShen99	66b9e7f212	feat: emit structured runtime metadata (follow-up#3887) (#3906 ) * feat: emit structured runtime metadata * fix: avoid subagent import cycle in replay gateway * fix: preserve legacy subtask result parsing * refactor: tighten runtime metadata contracts * fix(middleware): keep recovery hint on task exception wrapper content The structured-metadata stamp overwrote the wrapper text with the bare task-failure message, dropping the model-facing 'Continue with available context, or choose an alternative tool.' guidance that every other tool exception keeps. Append the shared hint after the formatted message. * fix(subagents): require lowercase hex for result_sha256 reader Length-only validation accepted any 64-char string; a faulty serializer or relaying wrapper could store a non-digest value in the delegation ledger. Enforce the producer's hexdigest shape with a fullmatch. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-04 11:27:19 +08:00
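The stricter `result_sha256` check amounts to matching the exact shape `hashlib`'s `hexdigest()` produces: 64 lowercase hex characters. A minimal sketch, with a hypothetical function name:

```python
import hashlib
import re

# hexdigest() always yields exactly 64 lowercase hex characters.
_SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def is_valid_result_sha256(value: object) -> bool:
    """Accept only strings shaped like hashlib.sha256(...).hexdigest()."""
    return isinstance(value, str) and _SHA256_HEX.fullmatch(value) is not None
```

A length-only check would accept 64 arbitrary characters. `fullmatch` rejects uppercase digests, non-hex characters, and strings of the wrong length.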
AnoobFeng	e3e5c73b03	feat(observability): add trace-id correlation and enhanced logging (#3902 ) * feat(observability): add trace-id correlation and enhanced logging - add opt-in gateway request trace correlation via X-Trace-Id - enhance logging with configurable trace_id-aware formatting - propagate deerflow_trace_id into runtime context and Langfuse metadata - keep enhanced logging disabled by default to preserve existing behavior * fix: harden trace correlation wiring - Make logging enhancement a restart-required startup snapshot and remove per-request config reads from TraceMiddleware - Restrict trace ids to printable ASCII before writing them to response headers, logs, and Langfuse metadata - Gate implicit DeerFlowClient trace-id creation behind logging.enhance.enabled while preserving explicit caller opt-in - Bind embedded client trace context per stream step to avoid generator ContextVar leaks and cross-context reset errors - Rebind memory update trace ids in Timer/executor worker paths so enhanced logs keep the captured correlation id - Remove unrelated __run_journal context overwrite from the trace-correlation change set * fix(gateway): avoid eager app construction on package import * fix(gateway): avoid config load during app import Keep Gateway app construction import-safe when config.yaml is absent by disabling TraceMiddleware only for that construction-time fallback path. Startup lifespan still performs strict config loading before serving.	2026-07-03 08:01:46 +08:00
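Restricting trace ids to printable ASCII before they reach response headers, logs, and Langfuse metadata could look like the sketch below. The function name and the 128-character length bound are assumptions, not values from the commit. Returning `None` leaves the fallback choice to the caller.

```python
from __future__ import annotations

MAX_TRACE_ID_LEN = 128  # hypothetical bound, not taken from the commit


def sanitize_trace_id(raw: object) -> str | None:
    """Return the trace id if it is non-empty printable ASCII, else None."""
    if not isinstance(raw, str):
        return None
    value = raw.strip()
    if not value or len(value) > MAX_TRACE_ID_LEN:
        return None
    # str.isprintable() would also accept non-ASCII letters, so check code points.
    if all(0x20 <= ord(ch) <= 0x7E for ch in value):
        return value
    return None
```

Rejecting CR and LF matters most here: an X-Trace-Id value echoed into a response header must not be able to inject a second header.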
Miracle778	5a699e24a1	feat(guardrails): expose authenticated runtime context in GuardrailRequest (#3665 ) * docs: guardrail runtime attribution spec * docs: guardrail request attribution implementation plan * feat(guardrails): add runtime user context and attribution fields to GuardrailRequest Extend GuardrailRequest with optional runtime attribution fields so that pluggable GuardrailProviders can access authenticated user context and tool-call-level attribution: - Gateway injects user_role, oauth_provider, oauth_id into runtime context alongside the existing user_id (server-authenticated only, client spoofing prevented) - GuardrailRequest gains: user_id, user_role, oauth_provider, oauth_id, run_id, tool_call_id (all optional, backward compatible) - GuardrailMiddleware reads these from ToolCallRequest.runtime.context - thread_id now actually populated from context (was always None before) - Tests: 15 new/expanded tests covering Gateway injection, runtime context reading, partial/missing fields, and client spoofing prevention - Docs: new Runtime Attribution section in GUARDRAILS.md with provider example and YAML policy illustration * fix(guardrails): propagate attribution to subagents * fix(guardrails): complete subagent attribution propagation --------- Co-authored-by: Miracle778 <miracle778@no-reply.com>	2026-06-21 16:08:25 +08:00
heart-scalpel	a72af8ea37	feat(subagents): attribute subagent spans to parent thread's Langfuse session (#3611 ) The subagent execution path did not call inject_langfuse_metadata(...) and built its model with attach_tracing=True, so subagent LLM/tool spans landed in Langfuse as isolated top-level traces carrying fresh session ids and the default user. They were findable in the unfiltered trace list but did not group under the parent thread's session card, and Langfuse cost attribution for subagent traffic did not line up with the parent conversation — even though DeerFlow's internal token accounting (SubagentTokenCollector) was already correct. Extend the lead-agent tracing wiring to the subagent path so a single subagent run produces one trace that shares the parent thread's session_id and user_id, with a subagent:<name> trace name: - subagents/executor.py: append build_tracing_callbacks() output to run_config["callbacks"] (preserving SubagentTokenCollector) and call inject_langfuse_metadata(...) with thread_id, user_id, and the normalized subagent:<name> trace name. Build the model with attach_tracing=False so model-level tracing does not double-count with the graph-root callbacks — the same pairing the lead agent uses. - tools/builtins/task_tool.py: resolve user_id via resolve_runtime_user_id(runtime) at the parent tool layer (before the background thread starts) and thread it through SubagentExecutor.__init__, because the _current_user contextvar is not guaranteed to survive the _execution_pool boundary. Trace topology is unchanged: subagent traces remain separate top-level traces in the same session, not nested as child spans under the lead trace (Plan B follow-up). Tests: tests/test_subagent_executor.py::TestSubagentTracingWiring covers the callback append, the session/user/trace-name injection, the disabled-langfuse no-op, the DEFAULT_USER_ID fallback, the empty-name trace-name fallback, and the env-tag emission. 
Existing test_create_agent_threads_explicit_app_config_to_model_and_middlewares now also asserts attach_tracing=False. Docs: CLAUDE.md Tracing System section documents subagents/executor.py as a third injection point alongside worker.py and client.py.	2026-06-17 14:36:09 +08:00
Huixin615	f43aa78107	fix(agents): sync agent_name across context/configurable and reject empty soul (#3549 ) (#3553 ) * fix(agents): sync agent_name across context/configurable and reject empty soul (#3549) Two independent issues caused custom agent creation to silently fail: 1. build_run_config only wrote agent_name into one container (configurable or context), so setup_agent — which reads ToolRuntime.context exclusively since LangGraph >=1.1.9 — saw agent_name=None and wrote SOUL.md to the global base_dir instead of users/{user_id}/agents/{name}/. Mirror the dual-write pattern already used by merge_run_context_overrides and naming.py so both containers always carry the same value. 2. setup_agent persisted whatever soul string it received, including empty or whitespace-only content, and still reported success. The frontend then surfaced an unusable agent and the global default SOUL.md could be silently overwritten with empty content. Reject empty soul before any filesystem operation so the model can retry. Tests: - test_gateway_services.py: dual-write regressions for both configurable and context entry paths, explicit-agent-name precedence on both sides, and a shape-parity test against merge_run_context_overrides. - test_setup_agent_tool.py: empty/whitespace soul rejection, plus no-overwrite guarantees for existing global and per-agent SOUL.md. * Update services.py	2026-06-14 10:40:16 +08:00
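The two fixes in this commit can be sketched as a dual-write helper and a guard that runs before any filesystem write. `sync_agent_name` and `validate_soul` are hypothetical names, and the sketch does not reproduce build_run_config's real signature. It keeps the stated behavior: both containers carry the same agent_name, an explicit name wins, and empty or whitespace-only soul content is rejected.

```python
from __future__ import annotations


def sync_agent_name(run_config: dict, agent_name: str | None = None) -> dict:
    """Write the same agent_name into both configurable and context."""
    configurable = dict(run_config.get("configurable") or {})
    context = dict(run_config.get("context") or {})
    # An explicit name wins; otherwise reuse whichever container already has one.
    name = agent_name or configurable.get("agent_name") or context.get("agent_name")
    if name:
        configurable["agent_name"] = name
        context["agent_name"] = name
    return {**run_config, "configurable": configurable, "context": context}


def validate_soul(soul: object) -> str:
    """Reject empty or whitespace-only SOUL.md content before anything is written."""
    if not isinstance(soul, str) or not soul.strip():
        raise ValueError("soul must not be empty; nothing was written, please retry")
    return soul
```

Raising before touching the filesystem means an empty soul can neither create an unusable agent nor overwrite the global default SOUL.md. The model also gets an error it can retry on.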
AochenShen99	3b6dd0a4e3	feat(subagents): extend deferred MCP tool loading to subagents (#3432 ) * feat(subagents): extend deferred MCP tool loading to subagents (#3341) Subagents now reuse the lead agent's deferred-tool path: when tool_search.enabled, MCP tool schemas are withheld from the model and surfaced by name in <available-deferred-tools>, fetched on demand via the generated tool_search helper. DeferredToolFilterMiddleware deterministically rewrites request.tools to hide the deferred schemas (the prompt section is discovery only, not enforcement). Consolidates the assembly into deerflow.tools.builtins.tool_search, now the single home for both assemble_deferred_tools (centralized fail-closed guard, replacing the lead-only private _assemble_deferred) and the relocated get_deferred_tools_prompt_section. Shared by every build path: lead agent, embedded client, and subagent executor. tool_search is appended after the subagent's name-level tool policy and is treated as infrastructure: its catalog is built from the already policy-filtered list, so it can never surface a tool the policy denied. Follow-up to #3370. Fixes #3341. * test(subagents): assert the real middleware builder emits a working deferred filter (#3341) The existing recipe test hand-constructs DeferredToolFilterMiddleware, so it cannot catch a regression in how build_subagent_runtime_middlewares (the call executor._create_agent actually makes) wires the deferred setup into the filter. Add a test that sources the filter from the real builder given a real setup and runs it through a graph: a wrong catalog hash would silently stop promotion, a dropped filter would stop hiding — both now caught. Running the full real middleware stack is intentionally avoided (the other runtime middlewares need sandbox/thread infra to execute, which would make the test flaky); their attachment + ordering before Safety stays locked in test_tool_error_handling_middleware.py. 
* test(subagents): keep executor tests config-free in CI * chore: trigger ci * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-08 23:17:22 +08:00
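The ordering described in this commit works like this: policy filter first, catalog built from the filtered list, `tool_search` appended as infrastructure, and a filter that hides deferred schemas. It can be sketched in Python. `ToolStub`, `assemble_deferred_sketch` and `filter_request_tools` are hypothetical names. This sketch does not reproduce the real `assemble_deferred_tools` or `DeferredToolFilterMiddleware`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolStub:
    """Stand-in for a tool object; only the name and an MCP flag matter here."""
    name: str
    is_mcp: bool = False


def assemble_deferred_sketch(tools: list[ToolStub], allowed: set[str]) -> tuple[list[ToolStub], frozenset[str]]:
    # 1. Apply the name-level tool policy first.
    permitted = [t for t in tools if t.name in allowed]
    # 2. Build the deferred catalog from the already-filtered list, so a denied tool never enters it.
    deferred = frozenset(t.name for t in permitted if t.is_mcp)
    # 3. Append tool_search as infrastructure, only when something is actually deferred.
    if deferred:
        permitted.append(ToolStub("tool_search"))
    return permitted, deferred


def filter_request_tools(tools: list[ToolStub], deferred: frozenset[str], promoted: set[str]) -> list[ToolStub]:
    """Deterministically hide deferred schemas unless tool_search has promoted them."""
    return [t for t in tools if t.name not in deferred or t.name in promoted]
```

The filter, not the `<available-deferred-tools>` prompt section, is what enforces hiding. The prompt only tells the model which names exist.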
AochenShen99	2bbc7879fa	refactor(tool-search): consolidate MCP metadata tag and harden deferred-tool setup (#3370 ) Follow-up to #3342 (deferred MCP tool loading). Maintainability cleanup plus hardening of malformed/empty tool_search queries; no change to the deferral mechanism or search ranking. - Add deerflow/tools/mcp_metadata.py as the single source of truth for the "deerflow_mcp" tag (MCP_TOOL_METADATA_KEY + tag_mcp_tool + public is_mcp_tool). Removes the duplicated magic string and the private, cross-module _is_mcp_tool import. - tool_search.search: never raise on model-generated input. Extract _compile_catalog_regex (shared compile-with-literal-fallback); return empty for empty/whitespace queries and a bare "+" instead of matching everything or raising IndexError. - DeferredToolSetup: document the empty-vs-populated invariant. - build_deferred_tool_setup: comment the two distinct empty-return branches. - _assemble_deferred: add return type, rename local to deferred_setup, build the final list with an explicit append. - Tests: use tag_mcp_tool instead of per-file tag helpers; cover empty and bare-"+" queries.	2026-06-05 15:21:41 +08:00
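The "never raise on model-generated input" rule for `tool_search.search` pairs a compile-with-literal-fallback helper with early returns for degenerate queries. A minimal sketch follows, assuming case-insensitive matching over bare tool names. `compile_catalog_regex` and `search_sketch` are hypothetical stand-ins for `_compile_catalog_regex` and `search`. The sketch does not reproduce the real query syntax beyond treating a bare `+` as empty.

```python
import re


def compile_catalog_regex(query: str) -> "re.Pattern[str]":
    """Compile the query as a regex; fall back to a literal match if it is invalid."""
    try:
        return re.compile(query, re.IGNORECASE)
    except re.error:
        return re.compile(re.escape(query), re.IGNORECASE)


def search_sketch(query: str, names: list[str]) -> list[str]:
    q = query.strip()
    # Empty, whitespace-only, or bare "+" queries match nothing instead of everything.
    if not q or q == "+":
        return []
    pattern = compile_catalog_regex(q)
    return [n for n in names if pattern.search(n)]
```

With this fallback, an unbalanced query such as `calc(` degrades to a literal search instead of surfacing a `re.error` to the agent.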
Eilen Shin	28b1da2172	fix(agents): harden update_agent null-like args (#3237 ) * fix(agents): harden update_agent null-like args * docs: mention undefined null-like update args --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-04 07:10:59 +08:00
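Hardening null-like arguments usually means treating sentinel strings a model may emit in place of a real null as "not provided". This sketch rests on an assumption: the exact sentinel set below (`null`, `none`, `undefined`) is illustrative and is not taken from the PR. `normalize_optional_arg` and `build_update` are hypothetical helpers.

```python
from typing import Any

# Assumption: illustrative sentinel set, not the PR's actual list.
NULL_LIKE_STRINGS = frozenset({"null", "none", "undefined"})


def normalize_optional_arg(value: Any) -> Any:
    """Map model-generated null-like values to None so they mean 'leave unchanged'."""
    if value is None:
        return None
    if isinstance(value, str) and value.strip().lower() in NULL_LIKE_STRINGS:
        return None
    return value


def build_update(**kwargs: Any) -> dict[str, Any]:
    """Keep only arguments that carry a real value."""
    cleaned = {k: normalize_optional_arg(v) for k, v in kwargs.items()}
    return {k: v for k, v in cleaned.items() if v is not None}
```

Falsy but meaningful values such as `0` pass through unchanged; only `None` and the sentinel strings are dropped.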
AochenShen99	d9f4724950	fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) (#3342 ) * feat(tool-search): add hash-scoped promoted state to ThreadState * feat(tool-search): add immutable DeferredToolCatalog with stable hash * feat(tool-search): add build_deferred_tool_setup + Command-writing tool_search * refactor(tool-search): replace deferred-tool ContextVar with closures + graph state (#3272) Build the deferred catalog + tool_search tool per agent from the policy-filtered tool list (after skill allowed-tools), pass deferred_names + catalog_hash explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions in ThreadState.promoted (scoped by catalog_hash) via a Command-returning tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so deferral no longer depends on build/execute sharing an async context. MCP tools are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way. Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via tool_search) and assembly is fail-closed. Migrate tests off the deleted registry APIs; delete the obsolete ContextVar-based #2884 regression (re-covered by state-based tests in a follow-up). 
* test(tool-search): lock tool_search promotion into next model turn via graph state * test(tool-search): cross-context, policy-leak, fail-closed, #2884 isolation regressions * test(tool-search): align real-LLM e2e with closure-based deferred setup * docs: update DeferredToolFilterMiddleware description for closure+state design * style(tests): drop unused import in test_deferred_setup (ruff) * test(tool-search): harden merge_promoted + replace tautological catalog test From independent code review: - merge_promoted: use existing.get("catalog_hash") so a forward-incompatible or externally-injected persisted promoted dict triggers a replace instead of a KeyError crash; add regression test for the malformed-existing case. - test_deferred_catalog: replace the `== [] or True` tautology (a test that could never fail) with a deterministic invalid-regex->literal-fallback check (positive match on calc + negative empty match). - DeferredToolCatalog: comment why frozen-without-slots is required for the cached_property hash/names fields (adding slots=True would break them). * fix(tool-search): read tool_search.enabled from self._app_config in client DeerFlowClient._ensure_agent called get_app_config() directly to read tool_search.enabled, but the client already resolves and stores its config as self._app_config at construction (and uses it everywhere else). The bare call re-resolves config from disk at agent-build time, which raises FileNotFoundError in environments without a config.yaml (CI) — test_client.py's fixture only patches get_app_config during __init__, so the later call hit the real loader. Use self._app_config, matching the rest of the client. * test(tool-search): lock tool_search post-policy append ordering tool_search is appended after skill-allowlist filtering, so the allowlist can no longer deny it by name. 
Lock the intended contract: it only appears when allowed MCP tools survive the filter, and its catalog (derived from the already policy-filtered list) can never expose a denied tool. Addresses the ordering observation from the Copilot review on #3342.	2026-06-02 22:43:22 +08:00
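Hash-scoped promotions in graph state behave like a reducer. Promotions merge when the catalog hash matches and are replaced when it changes. A malformed persisted value also triggers a replace, because the reducer uses `.get("catalog_hash")`. A minimal sketch follows, assuming the promoted value is a dict with hypothetical `catalog_hash` and `names` keys. The SHA-256 prefix hash is likewise an illustrative choice, not the actual `DeferredToolCatalog` hash.

```python
import hashlib
from typing import Any, Optional


def catalog_hash(names: list[str]) -> str:
    """Order-independent hash of the deferred tool names."""
    joined = "\n".join(sorted(set(names)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def merge_promoted(existing: Optional[Any], update: dict[str, Any]) -> dict[str, Any]:
    """Reducer sketch: union promotions under the same catalog hash, else replace."""
    new_hash = update.get("catalog_hash")
    new_names = set(update.get("names", []))
    # .get() instead of ["catalog_hash"]: a malformed persisted value falls through to replace.
    if not isinstance(existing, dict) or existing.get("catalog_hash") != new_hash:
        return {"catalog_hash": new_hash, "names": sorted(new_names)}
    merged = set(existing.get("names", [])) | new_names
    return {"catalog_hash": new_hash, "names": sorted(merged)}
```

Scoping by hash means promotions recorded against an older tool catalog never unlock tools in a different one.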
Xinmin Zeng	e93f658472	fix(stability): resolve P0 blockers from v2.0-m1-rc1 stability audit (#3107 ) (#3131 ) * fix(task-tool): unwrap callback manager when locating usage recorder `config["callbacks"]` may arrive as a `BaseCallbackManager` (e.g. the `AsyncCallbackManager` LangChain hands to async tool runs), not just a plain list. The previous `for cb in callbacks` loop raised `TypeError: 'AsyncCallbackManager' object is not iterable`, which `ToolErrorHandlingMiddleware` then converted into a failed `task` ToolMessage even though the subagent had completed internally — Ultra mode lost subagent results and the lead agent fell back to redoing the work. Unwrap `BaseCallbackManager.handlers` before searching for the recorder. Refs: bytedance/deer-flow#3107 (BUG-002) * fix(frontend): treat any task tool error as a terminal subtask failure The subtask card status machine matched only three English prefixes (`Task Succeeded. Result:`, `Task failed.`, `Task timed out`). Anything else fell through to `in_progress`, so a `task` tool error wrapped by `ToolErrorHandlingMiddleware` (`Error: Tool 'task' failed ...`) left the card spinning forever even after the run had ended. Extract the prefix logic into `parseSubtaskResult` and recognise any leading `Error:` token as a terminal failure. The extracted function is unit-tested against the legacy prefixes plus the `AsyncCallbackManager` regression captured in the upstream issue. Refs: bytedance/deer-flow#3107 (BUG-007) * fix(frontend): exclude hidden, reasoning, and tool payloads from chat export `formatThreadAsMarkdown` / `formatThreadAsJSON` iterated raw messages without running the UI-level `isHiddenFromUIMessage` filter. Exported transcripts therefore included `hide_from_ui` system reminders, memory injections, provider `reasoning_content`, tool calls, and tool result messages — content that is intentionally hidden in the chat view. 
Filter the export to the user-visible transcript by default and gate reasoning / tool calls / tool messages / hidden messages behind explicit `ExportOptions` flags so a future debug export can opt back in without forking the formatter. Refs: bytedance/deer-flow#3107 (BUG-006) * fix(gateway): route get_config through get_app_config for mtime hot reload `get_config(request)` returned the `app.state.config` snapshot captured at startup. The worker / lead-agent path then threaded that frozen `AppConfig` through `RunContext` and `agent_factory`, so per-run fields edited in `config.yaml` (notably `max_tokens`) were ignored until the gateway process was restarted — even though `get_app_config()` already does mtime-based reload at the bottom layer. Route the request dependency through `get_app_config()` directly. Runtime `ContextVar` overrides (`push_current_app_config`) and test-injected singletons (`set_app_config`) keep working; `app.state.config` is now only read at startup for one-shot bootstrap (logging level, IM channels, `langgraph_runtime` engines). `tests/test_gateway_deps_config.py` encoded the old snapshot contract and is removed; `tests/test_gateway_config_freshness.py` replaces it with mtime, ContextVar, and `set_app_config` coverage. `test_skills_custom_router.py` and `test_uploads_router.py` now inject test configs via FastAPI `dependency_overrides[get_config]` instead of mutating `app.state.config`. Document the hot-reload boundary in `backend/CLAUDE.md` so reviewers know which fields are picked up on the next request vs. which still require a restart (`database`, `checkpointer`, `run_events`, `stream_bridge`, `sandbox.use`, `log_level`, `channels.`). Refs: bytedance/deer-flow#3107 (BUG-001) * fix(gateway): broaden get_config 503 to any config-load failure Address review feedback on the previous commit: 1. Narrow exception catch removed. The old contract returned 503 whenever `app.state.config is None`.
The first cut only mapped `FileNotFoundError`, leaving `PermissionError`, YAML parse errors, and pydantic `ValidationError` to bubble up as 500. At the request boundary we treat any inability to materialise the config as "configuration not available" (503) and log the original exception so the operator still has the stack. 2. Removed the unused `request: Request` parameter and the matching `# noqa: ARG001`. FastAPI's `Depends()` does not require the dependency to accept `Request`; the only call site uses the no-arg form. 3. `backend/CLAUDE.md` boundary now lists the reason each field is restart-required (engine binding, singleton caching, one-shot `apply_logging_level`, etc.), not just the field name, so reviewers do not have to reverse-engineer the boundary themselves. Tests parametrise four exception classes (`FileNotFoundError`, `PermissionError`, `ValueError`, `RuntimeError`) and assert 503 for each. Refs: bytedance/deer-flow#3107 (BUG-001) * fix(task-tool): defend _find_usage_recorder against non-list callbacks Address review feedback. The previous commit handled the two common shapes LangChain hands to async tool runs — a plain `list[BaseCallbackHandler]` and a `BaseCallbackManager` subclass — but iterated any other shape directly, which would still raise `TypeError` if e.g. a single handler instance leaked through without a list wrapper. Treat any non-list, non-manager `config["callbacks"]` value as "no recorder" rather than crash. Docstring now lists all four shapes explicitly. New tests cover the single-handler-object case, `runtime is None`, `callbacks is None`, and `runtime.config` being a non-dict — all required to be silent no-ops. Refs: bytedance/deer-flow#3107 (BUG-002) * fix(frontend): drop dead identity ternary and add opt-in export tests Address review feedback on the previous export commit: 1. Removed the no-op `typeof msg.content === "string" ? msg.content : msg.content` expression in `formatThreadAsJSON`. 
Both branches returned the same value; the message content now flows through unchanged whether it is a string or the rich `MessageContent[]` shape (LangChain JSON-serialises the array structure correctly already). 2. Expanded the JSDoc on `ExportOptions` to make it clearer that the four flags are not currently wired to any UI control — callers wanting a debug export must build the options object explicitly. The default behaviour continues to match the explicit prescription in bytedance/deer-flow#3107 BUG-006. 3. Added opt-in coverage. The previous tests only exercised the `options = {}` default path; the new cases verify each flag flips the corresponding payload back into the export so a future debug-export surface does not silently break the contract. Refs: bytedance/deer-flow#3107 (BUG-006) * fix(frontend): export subtask prefix constants and document fallback intent Address review feedback on the previous BUG-007 commit: 1. `SUCCESS_PREFIX`, `FAILURE_PREFIX`, `TIMEOUT_PREFIX`, and the `ERROR_WRAPPER_PATTERN` regex are now exported. The JSDoc explicitly pins them as part of the backend↔frontend contract defined in `task_tool.py` and `tool_error_handling_middleware.py`, so any future structured-status migration (e.g. backend writing `additional_kwargs.subagent_status` instead of leading text) can reference these from one canonical place rather than redefine them. 2. The `in_progress` fallback now carries a docstring explaining the deliberate choice — LangChain only ever emits a `ToolMessage` once the tool itself has returned, so unrecognised content means the contract has drifted and "still running" is the right operator signal (eagerly marking it terminal-failed would mask the drift). No behaviour change; this is documentation and an API export. Refs: bytedance/deer-flow#3107 (BUG-007) * fix(gateway): drop app.state.config snapshot and freeze run_events_config Address @ShenAC-SAC's BUG-001 review on #3131. 
The previous cut still stored an ``AppConfig`` snapshot on ``app.state.config`` for startup bootstrap. Two follow-on hazards from that: 1. Future code touching the gateway lifespan could accidentally start reading ``app.state.config`` again, silently regressing the request hot path back to a stale snapshot. 2. ``get_run_context()`` paired a freshly-reloaded ``AppConfig`` with the startup-bound ``event_store`` and a live ``run_events_config`` field — so an operator who edited ``run_events.backend`` mid-flight would have produced a run context whose ``event_store`` and ``run_events_config`` referred to different backends. Clean approach (aligned with the direction in PR #3128): - ``lifespan()`` keeps a local ``startup_config`` variable and passes it explicitly into ``langgraph_runtime(app, startup_config)`` and into ``start_channel_service``. No ``app.state.config`` attribute is set at any point. - ``langgraph_runtime`` now accepts ``startup_config`` as a required parameter, removing the ``getattr(app.state, "config", None)`` lookup and the "config not initialised" runtime error. - The matching ``run_events_config`` is frozen onto ``app.state`` next to ``run_event_store`` so ``get_run_context`` reads the two from the same startup-time source. ``app_config`` continues to be resolved live via ``get_app_config()``. - ``backend/CLAUDE.md`` boundary explanation updated to spell out the ``startup_config`` / ``get_app_config()`` split. New regression test ``test_run_context_app_config_reflects_yaml_edit`` exercises the worker-feeding path: it asserts that ``ctx.app_config`` follows a mid-flight ``config.yaml`` edit while ``ctx.run_events_config`` stays frozen to the startup snapshot the event store was built from. Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review * fix(frontend): parse Task cancelled and polling timed out as terminal Address @ShenAC-SAC's BUG-007 review on #3131. `task_tool.py` actually emits five terminal strings: - `Task Succeeded. 
Result: …` - `Task failed. …` - `Task timed out. …` - `Task cancelled by user.` ← previously matched none - `Task polling timed out after N minutes …` ← previously matched none The previous cut handled three; the last two fell through to the "unknown content" branch and pushed the subtask card back to `in_progress` even though the backend had already reached a terminal state. Add explicit matches plus regression tests for both. The `in_progress` fallback is now reserved for genuinely unrecognised output (i.e. contract drift), as documented. Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review * fix(frontend): sanitize JSON export content via the Markdown content path Address @ShenAC-SAC's BUG-006 review and the Copilot inline comment on #3131. The previous cut filtered hidden/tool messages out of the JSON export but still serialised `msg.content` verbatim, so: - inline `<think>…</think>` wrappers stayed in the exported `content` even with `includeReasoning: false`, - content-array thinking blocks leaked the `thinking` field, - `<uploaded_files>…</uploaded_files>` markers leaked the workspace paths a user uploaded files to. JSON now goes through the same sanitiser the Markdown path uses (`extractContentFromMessage` + `stripUploadedFilesTag`). Reasoning and tool_calls remain gated behind their `ExportOptions` flags. AI / human rows that sanitise to empty content with no opted-in reasoning or tool calls are dropped so the JSON matches the Markdown path's `continue` on empty assistant fragments. New regression tests cover the three leak shapes the reviewer called out plus the empty-content-drop case. 
Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review * test(gateway): align lifespan stub with langgraph_runtime two-arg signature Codex round-3 review of c0bc7a06 flagged this: changing `langgraph_runtime` to require `startup_config` as a second positional argument broke the one-arg stub `_noop_langgraph_runtime(_app)` in `test_gateway_lifespan_shutdown.py`, which is patched into `app.gateway.app.langgraph_runtime` by the lifespan shutdown bounded-timeout regression. Lifespan would then call the stub with two args and raise `TypeError` before the bounded-shutdown assertion ran. Update the stub to match the new signature. The shutdown test itself is unaffected — it only cares about the channel `stop_channel_service` hang path. Refs: bytedance/deer-flow#3107 (BUG-001), bytedance/deer-flow#3131 review * fix(frontend): strip every known backend marker in export, not just uploads Codex round-3 review of 258ca800 and the matching maintainer feedback on PR #3131 made the same point: the JSON export now ran the Markdown-side sanitiser, but that sanitiser only stripped `<uploaded_files>`. The full set of payloads middleware embeds inside message `content` is larger: - `<uploaded_files>` — `UploadsMiddleware` - `<system-reminder>` — `DynamicContextMiddleware` - `<memory>` — `DynamicContextMiddleware` (nested inside system-reminder) - `<current_date>` — `DynamicContextMiddleware` The primary protection is still `isHiddenFromUIMessage`: the `<system-reminder>` HumanMessage is marked `hide_from_ui: true` and never reaches the formatter. This commit adds the second line of defence so a regression that drops the `hide_from_ui` flag — or any future middleware that injects the same tag vocabulary into a visible HumanMessage — cannot leak the payload into the export file. Concrete changes: - New `INTERNAL_MARKER_TAGS` constant + `stripInternalMarkers(content)` helper in `core/messages/utils.ts`. 
The constant doubles as documentation for the backend↔frontend contract. - `formatMessageContent` in `export.ts` now calls `stripInternalMarkers` instead of `stripUploadedFilesTag`. UI render paths (`message-list-item.tsx`) keep using the narrower function so a user legitimately typing `<memory>` in a meta-discussion is preserved. - The "drop empty rows" guard in `buildJSONMessage` switched from `=== undefined` to truthy `!` checks. Codex spotted the asymmetry: when `extractReasoningContentFromMessage` returned the empty string (which it legitimately can), the JSON path emitted `{reasoning: ""}` while the Markdown path's `!reasoning` `continue` correctly dropped the row. New regression tests cover the defence-in-depth strip with a `<system-reminder><memory><current_date>` payload deliberately not marked `hide_from_ui`; tool-message sanitization under `includeToolMessages: true`; the mixed-content-array case (`thinking + text + image_url`); and the opted-in empty-reasoning drop. Live verification on a real Ultra-mode thread that uploaded a PDF (`曾鑫民-薪资交易流水.pdf`, a salary transaction statement): backend state's first HumanMessage carries the `<uploaded_files>` block (with `/mnt/user-data/uploads/...` paths) as part of a content-array. The Markdown and JSON export blobs both come back free of `<uploaded_files>`, `<system-reminder>`, `<current_date>`, `tool_calls`, and reasoning — while preserving the user's `这是什么？` ("What is this?") prompt and the assistant's visible answer. Refs: bytedance/deer-flow#3107 (BUG-006), bytedance/deer-flow#3131 review * test(frontend): cover trim, varied N, and pre-execution Error: prefixes Codex round-3 review of 50e2c257 flagged three coverage gaps in the subtask-status parser: 1. `Task cancelled by user.` and `Task polling timed out` previously had no whitespace-trim coverage — the original trim test only exercised the success prefix. Streaming chunks can arrive with leading/trailing newlines; the regex needed an explicit assertion. 2.
The polling-timeout case was tested only at one `N` (15 minutes). The backend interpolates the live `timeout_seconds // 60` value, so the matcher must hold for any positive integer. Now we run the case for 1, 5, and 60 minutes. 3. `task_tool.py` also emits three `Error:` strings for pre-execution failures — unknown subagent type, host-bash disabled, and "task disappeared from background tasks". They are intentionally handled by `ERROR_WRAPPER_PATTERN` rather than dedicated prefixes (the wrapper already produces the right terminal-failed shape) but had no test coverage proving that wiring. Codex was right that a refactor splitting one of them off into its own prefix would silently break things. The JSDoc on the constants block now spells the three pre-execution errors out so the relationship between `task_tool.py` returns and the prefix vocabulary is explicit. No production code change beyond the docstring — this commit is pure coverage hardening for the contract that already exists. Refs: bytedance/deer-flow#3107 (BUG-007), bytedance/deer-flow#3131 review	2026-05-21 21:18:10 +08:00
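The subtask-status rules above cover five terminal strings from `task_tool.py`, a leading `Error:` wrapper treated as terminal failure, whitespace trimming, and an `in_progress` fallback reserved for contract drift. The real parser is TypeScript (`parseSubtaskResult`). This is only a Python transliteration of the idea: the status labels, `CANCELLED_PREFIX`, and both regexes are hypothetical and not the exported frontend constants.

```python
import re

SUCCESS_PREFIX = "Task Succeeded. Result:"
FAILURE_PREFIX = "Task failed."
TIMEOUT_PREFIX = "Task timed out"
CANCELLED_PREFIX = "Task cancelled by user."  # hypothetical constant name
POLLING_TIMEOUT_PATTERN = re.compile(r"^Task polling timed out after \d+ minutes")
ERROR_WRAPPER_PATTERN = re.compile(r"^Error:")  # assumed shape of the wrapper match


def parse_subtask_result(content: str) -> str:
    text = content.strip()  # streamed chunks may carry leading/trailing newlines
    if text.startswith(SUCCESS_PREFIX):
        return "completed"
    if text.startswith(FAILURE_PREFIX) or ERROR_WRAPPER_PATTERN.match(text):
        # Covers middleware-wrapped tool errors and pre-execution "Error:" returns.
        return "failed"
    if text.startswith(CANCELLED_PREFIX):
        return "cancelled"
    if text.startswith(TIMEOUT_PREFIX) or POLLING_TIMEOUT_PATTERN.match(text):
        return "timed_out"
    # Unrecognised output means the contract drifted; keep the "still running" signal.
    return "in_progress"
```

Matching `\d+` rather than a fixed number keeps the polling-timeout case valid for any interpolated `timeout_seconds // 60` value.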
InitBoy	e19bec1422	fix(task-tool): cancel and schedule deferred cleanup on polling safety timeout (#3097 ) When the poll loop's safety-net timeout fires (poll_count > max_poll_count), the background subagent task was abandoned without cancellation or cleanup, leaving a stale entry in _background_tasks indefinitely. The original code had a comment promising "the cleanup will happen when the executor completes", but run_task() in executor.py never calls cleanup_background_task after reaching a terminal state -- the promise was never implemented. This change mirrors the asyncio.CancelledError path: signal cooperative cancellation via request_cancel_background_task and schedule _deferred_cleanup_subagent_task to remove the entry once the background thread reaches a terminal state. Direct cleanup at poll-timeout time would introduce a race: run_task() could remove the entry while the poll loop is still mid-iteration, causing a spurious "Task disappeared" error. The deferred approach avoids this by waiting for terminal state before removal. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-21 07:47:19 +08:00
AochenShen99	3599b570a9	fix(harness): wrap all async-only tools for sync clients (#2935 )	2026-05-19 22:11:46 +08:00
YuJitang	eab7ae3d62	feat: stream subagent token usage to header via terminal task events (#2882 ) * feat: real-time subagent token usage display in header and per-turn Backend: - Persist subagent token usage to AIMessage.usage_metadata via TokenUsageMiddleware, so accumulateUsage() naturally includes subagent tokens without frontend state management - Cache subagent usage by tool_call_id in task_tool, write back to the dispatching AIMessage on next model response - Emit subagent token usage on all terminal task events (task_completed, task_failed, task_cancelled, task_timed_out) - Report subagent usage to parent RunJournal for API totals - Search backward from ToolMessage to find dispatching AIMessage for correct multi-tool-call attribution Frontend: - Remove subagentUsage state, custom event handling, and prop threading — subagent tokens are now embedded in message metadata - Simplify selectHeaderTokenUsage (no subagentUsage parameter) - Per-turn inline badges show turn-specific usage via message accumulation - Remove isLoading guard from MessageTokenUsageList for dynamic updates during streaming * fix: prevent header token double counting from baseline reset race onFinish, onError, and thread-switch useEffect all reset pendingUsageBaselineMessageIdsRef to an empty Set. If thread.isLoading is still true on the next render, all messages pass the getMessagesAfterBaseline filter and their tokens are added to backendUsage (which already includes them), causing the header to display up to 2× the actual token count. Capture current message IDs instead of using an empty Set so that getMessagesAfterBaseline correctly returns no pending messages even if thread.isLoading lags behind the stream end. 
* fix: write back subagent tokens for all concurrent task tool calls TokenUsageMiddleware only processed messages[-2], so when a single model response dispatched multiple task tool calls only the last ToolMessage had its cached subagent usage written back to the dispatch AIMessage.usage_metadata. Earlier tasks' usage stayed in _subagent_usage_cache indefinitely (leak) and never appeared in the per-turn inline token display. Walk backward through all consecutive ToolMessages before the new AIMessage, and accumulate updates targeting the same dispatch message into one state update so overlapping writes don't clobber each other. * fix: clean up subagent usage cache entry on task cancellation When a task_tool invocation is cancelled via CancelledError, any cached subagent usage entry leaked because the TokenUsageMiddleware writeback path never fires after cancellation. Pop the cache entry before re-raising to prevent unbounded growth of the module-level _subagent_usage_cache dict. * fix: address token usage review feedback * fix: handle missing config for subagent usage cache --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-13 23:52:19 +08:00
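The writeback fix walks all consecutive ToolMessages before the new AIMessage rather than only `messages[-2]`. It searches backward for each dispatching AIMessage and accumulates updates per target so concurrent task calls don't clobber each other. It can be sketched with plain dicts. The message shape (`type`, `tool_calls`, `tool_call_id`, `usage_metadata`) and the helper names are assumptions for illustration, not the actual `TokenUsageMiddleware` code.

```python
from typing import Any, Optional

Message = dict[str, Any]
USAGE_KEYS = ("input_tokens", "output_tokens", "total_tokens")


def find_dispatch_index(messages: list[Message], before: int, call_id: str) -> Optional[int]:
    """Search backward for the AIMessage whose tool_calls include call_id."""
    for j in range(before - 1, -1, -1):
        msg = messages[j]
        if msg["type"] == "ai" and any(tc["id"] == call_id for tc in msg.get("tool_calls", [])):
            return j
    return None


def collect_writebacks(messages: list[Message], usage_cache: dict[str, dict[str, int]]) -> dict[int, dict[str, int]]:
    """Walk every consecutive ToolMessage before the newest AIMessage (messages[-1])."""
    updates: dict[int, dict[str, int]] = {}
    i = len(messages) - 2
    while i >= 0 and messages[i]["type"] == "tool":
        call_id = messages[i]["tool_call_id"]
        usage = usage_cache.pop(call_id, None)  # pop so cache entries never leak
        if usage is not None:
            target = find_dispatch_index(messages, i, call_id)
            if target is not None:
                # Accumulate per dispatch message so overlapping writes merge instead of clobbering.
                acc = updates.setdefault(target, dict.fromkeys(USAGE_KEYS, 0))
                for key in USAGE_KEYS:
                    acc[key] += usage.get(key, 0)
        i -= 1
    return updates


def apply_writebacks(messages: list[Message], updates: dict[int, dict[str, int]]) -> None:
    """Add the accumulated subagent usage onto each dispatch message's usage_metadata."""
    for idx, extra in updates.items():
        usage = messages[idx].setdefault("usage_metadata", {})
        for key, value in extra.items():
            usage[key] = usage.get(key, 0) + value
```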
Xinmin Zeng	f1a0ab699a	fix(tools): preserve tool_search promotions across re-entrant get_available_tools (#2885 ) * fix(tools): preserve tool_search promotions across re-entrant get_available_tools Closes #2884. ``get_available_tools`` used to unconditionally call ``reset_deferred_registry()`` and rebuild a fresh ``DeferredToolRegistry`` on every invocation. That works for the first call of a request (the ContextVar starts at its default of ``None``), but any RE-ENTRANT call during the same async context — e.g. ``task_tool`` building a subagent's toolset, or a custom middleware that rebuilds tools mid-run — wiped any ``tool_search`` promotions the parent agent had already made. The ``DeferredToolFilterMiddleware`` would then re-hide those tools from the next model call, leaving the agent able to see a tool's name (via the prior ``tool_search`` result that's still in conversation history) but unable to invoke it. Fix: when the ContextVar already holds a registry, reuse it instead of rebuilding. Fresh requests still get a fresh registry because each new graph run starts in a new asyncio task with the ContextVar at ``None``. ## Verification - Unit-level reproduction (``test_get_available_tools_resets_registry_wiping_promotion``): promote a tool in the registry, call ``get_available_tools`` again, assert the promotion is preserved. Fails on main, passes on this branch. - Graph-execution reproduction (two tests): drive a real ``langchain.agents.create_agent`` graph with the real ``DeferredToolFilterMiddleware`` through two model turns, including one that issues a re-entrant ``get_available_tools`` call to simulate the task_tool subagent path. 
- Real-LLM end-to-end (``test_deferred_tool_promotion_real_llm.py``, opt-in via ``ONEAPI_E2E=1``): drives the same flow against a real OpenAI-compatible model (verified on GPT-5.4-mini through the one-api gateway), watches the model call the promoted ``fake_calculator`` through the deferred-filter middleware, and asserts the right arithmetic result. Passes against the fixed branch. - Companion update to ``test_tool_deduplication.py``: dropped the ``@patch("deerflow.tools.tools.reset_deferred_registry")`` decorators because the symbol is no longer imported there. - Test fixtures in the new files patch ``deerflow.tools.tools.get_app_config`` with a minimal ``model_construct``-ed ``AppConfig`` instead of calling the real loader, so they never trigger ``_apply_singleton_configs`` and never leak ``_memory_config``/``_title_config``/… mutations into the rest of the suite. Full backend suite: 3208 passed / 14 skipped / 0 failed. ruff check + format clean. * fix(tools): address Copilot review on #2885 - tools.py: rewrite the reuse-path comment to spell out (a) why we don't reconcile the registry against the current ``mcp_tools`` snapshot — the MCP cache doesn't refresh mid-graph-run, the lead agent's ``ToolNode`` is already bound to the previous tool set anyway, and ``promote()`` drops the entry so a naive re-sync misclassifies promotions as new tools — and (b) why the log uses ``max(0, …)`` to avoid negative counts when the cache shrinks between snapshots. - Replace direct ``ts_mod._registry_var.set(None)`` in test fixtures with the public ``reset_deferred_registry()`` helper so tests don't couple to module internals. - Correct the docstring path in ``test_deferred_tool_registry_promotion.py`` to match the actual monkeypatch target (``deerflow.mcp.cache.get_cached_mcp_tools``). 
- Rename ``test_get_available_tools_resets_registry_wiping_promotion`` to ``test_get_available_tools_preserves_promotions_across_reentrant_calls`` so the test name describes the contract being asserted, not the bug it originally reproduced. Full backend suite: 3208 passed / 14 skipped. Real-LLM e2e: 1 passed.	2026-05-13 23:45:47 +08:00
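The reuse-if-present rule for the ContextVar-held registry can be sketched as follows. A re-entrant call in the same context returns the existing registry, so promotions survive. A fresh context, like a new graph run, starts from `None` and builds a new one. `DeferredRegistrySketch` and `get_registry` are hypothetical. This ContextVar design was itself removed later by #3342, listed earlier in this log, in favour of closures plus graph state.

```python
import contextvars
from typing import Iterable, Optional


class DeferredRegistrySketch:
    """Hypothetical stand-in for the deferred-tool registry."""

    def __init__(self, names: Iterable[str]) -> None:
        self.deferred = set(names)

    def promote(self, name: str) -> None:
        # Promotion removes the name from the deferred set.
        self.deferred.discard(name)


_registry_var = contextvars.ContextVar("deferred_registry", default=None)


def get_registry(mcp_tool_names: Iterable[str]) -> DeferredRegistrySketch:
    existing: Optional[DeferredRegistrySketch] = _registry_var.get()
    if existing is not None:
        # Re-entrant call in the same context (e.g. building a subagent toolset): reuse it.
        return existing
    registry = DeferredRegistrySketch(mcp_tool_names)
    _registry_var.set(registry)
    return registry
```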
Xinmin Zeng	68d8caec1f	fix(agents): make update_agent honor runtime.context user_id like setup_agent (#2867 ) * fix(agents): make update_agent honor runtime.context user_id like setup_agent PR #2784 hardened setup_agent to prefer runtime.context["user_id"] (set by inject_authenticated_user_context from the auth-validated request) over the contextvar, so an agent created during the bootstrap flow always lands under users/<auth_uid>/agents/<name>. update_agent was left calling get_effective_user_id() unconditionally — the same class of bug that produced issues #2782 / #2862 still applies whenever the contextvar is not available on the executing task (background work, future cross-process drivers, checkpoint resume on a different task). In that regime update_agent silently routes writes to users/default/agents/<name>, corrupting the shared default bucket and losing the user's edit. Extract the resolution policy into a shared resolve_runtime_user_id helper on deerflow.runtime.user_context and route both setup_agent and update_agent through it so the two halves of the lifecycle stay in lockstep. Add load-bearing end-to-end tests that drive a real langchain.agents create_agent graph with a fake LLM, exercising the full pipeline: HTTP wire format -> app.gateway.services.start_run config-assembly -> deerflow.runtime.runs.worker._build_runtime_context -> langchain.agents create_agent graph -> ToolNode dispatch (sync + async + sub-graph + ContextThreadPoolExecutor) -> setup_agent / update_agent The negative-control tests intentionally land in users/default/ to prove the positive tests are actually load-bearing rather than vacuously passing. The new test_update_agent_e2e_user_isolation suite included a test that failed against main and now passes after this fix. 
* style: ruff format on new e2e tests * test(e2e): real-server HTTP test driving setup_agent through the full ASGI stack Adds tests/test_setup_agent_http_e2e_real_server.py — a single load-bearing test that drives the entire FastAPI gateway through starlette.testclient.TestClient with no mocks above the LLM: - lifespan boots (config, sqlite engine, LangGraph runtime, channels) - POST /api/v1/auth/register (real password hash, real sqlite write, issues access_token + csrf_token cookies) - POST /api/threads (real thread_meta + checkpoint creation) - POST /api/threads/{id}/runs/stream with the exact wire shape the React frontend sends (assistant_id + input + config + context with agent_name/is_bootstrap) - AuthMiddleware -> CSRFMiddleware -> require_permission -> start_run -> inject_authenticated_user_context -> asyncio.create_task(run_agent) -> worker._build_runtime_context -> Runtime injection -> ToolNode dispatch -> real setup_agent - Asserts SOUL.md is under users/<authenticated_uid>/agents/<name>/ and NOT under users/default/agents/<name>/. DEER_FLOW_HOME and the sqlite path are redirected into tmp_path so the test never touches the real .deer-flow directory or developer database. The only patch above the LLM boundary is replacing create_chat_model with a fake that emits a single setup_agent tool_call. This is the "真实验证" ("real verification") answer: it reproduces what curl-against-uvicorn would do, minus the network socket layer. * test: address Copilot review on user-isolation e2e tests - Drop "currently expected to FAIL" wording from update_agent e2e docstring and header (Copilot review): the fix is in this PR, the test pins the corrected behaviour rather than driving a future change. - Rephrase the assertion failure messages from "BUG:" to "REGRESSION:" to match the test's role on the fixed branch. - Bound _drain_stream with a wall-clock timeout, a max-bytes cap, and an early break on the "event: end" SSE frame (Copilot review).
Stops the test from hanging on a stuck run or runaway heartbeat loop. - Replace the misleading "patch both module aliases" comment with an explanation of why patching lead_agent.agent.create_chat_model is the only correct target (Copilot review): lead_agent rebinds the symbol into its own namespace at import time, so patching deerflow.models is too late. * test(refactor): address WillemJiang review on user-isolation e2e tests - Extract the duplicated FakeToolCallingModel (and a build_single_tool_call_model helper) into tests/_agent_e2e_helpers.py. All three e2e files now import from the shared module instead of redefining the shim locally. - Convert the manual p.start() / p.stop() try/finally blocks in test_update_agent_e2e_user_isolation.py to contextlib.ExitStack so patch lifecycle is Pythonic and exception-safe. - Lift the isolated_app fixture's private-attribute resets into a named _reset_process_singletons helper with a comment block explaining why each singleton has to be invalidated for true e2e isolation, and why raising=False is intentional. Makes the fragility visible and the intent self-documenting rather than leaving the resets inline as opaque monkeypatch calls. Net change: -59 lines (143 -> 84) across the three test files, with every assertion intact. Full suite remains 69 passed / lint clean. * test(e2e): make real-server test self-supply its config CI's actions/checkout only ships config.example.yaml (the real config.yaml is gitignored), so the production config-discovery search (./config.yaml -> ../config.yaml -> $DEER_FLOW_CONFIG_PATH) finds nothing and the test fails at lifespan boot with FileNotFoundError. The dev-machine run passed only because a local config.yaml happened to exist. Write a minimal AppConfig-valid yaml into tmp_path and pin DEER_FLOW_CONFIG_PATH to it. The yaml carries just what the schema requires (a single fake-test-model entry, LocalSandboxProvider, sqlite database). 
The LLM never gets instantiated because the test patches create_chat_model on the lead agent module, so the api_key/base_url stay placeholders. Verified by hiding the local config.yaml to mirror the CI checkout — the test now passes in both environments.	2026-05-12 23:18:54 +08:00
AochenShen99	bedbf2291e	fix(harness): wrap async-only config tools for sync client execution (#2878 ) * fix(harness): wrap async-only config tools for sync clients * refactor(tools): share async tool sync wrapper	2026-05-11 22:14:13 +08:00
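One common way to let a synchronous client call an async-only tool is a wrapper that drives the coroutine itself. The sketch below is an assumption about the general technique, not the shared wrapper from #2878: make_sync is a hypothetical name, and it uses only asyncio and concurrent.futures from the standard library.

```python
import asyncio
import concurrent.futures
import functools
from typing import Any, Callable, Coroutine, TypeVar

T = TypeVar("T")


def make_sync(coro_fn: Callable[..., Coroutine[Any, Any, T]]) -> Callable[..., T]:
    """Hypothetical helper: expose an async-only tool function as a sync callable."""

    @functools.wraps(coro_fn)
    def wrapper(*args: Any, **kwargs: Any) -> T:
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            # No event loop in this thread, so drive the coroutine directly.
            return asyncio.run(coro_fn(*args, **kwargs))
        # A loop is already running in this thread and asyncio.run() would raise,
        # so run the coroutine on a fresh loop in a worker thread and wait for it.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, coro_fn(*args, **kwargs)).result()

    return wrapper
```

The worker-thread branch blocks the calling loop while it waits, which is acceptable for a sync client but would be a poor choice inside hot async code paths.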
Maz Benoscar	30a5846219	fix(tools): make write_file append discoverable in model-facing schema (#2843 ) * fix: make tool argument behavior discoverable The write_file tool already supported append=false by default with append=true for end-of-file writes, but the parsed docstring did not describe append in the model-facing schema. This records the overwrite default and append path in the tool description, adds resilient schema regression coverage, and keeps backend sandbox docs aligned. The regression now also checks that every public parameter in the existing tool schema test matrix has a description. Enabling docstring parsing on setup_agent and update_agent fills the two existing gaps with their existing Args docs instead of duplicating descriptions elsewhere. Constraint: Issue #2831 asks for a small docstring/schema discoverability fix without changing runtime file-writing behavior Rejected: Changing write_file defaults | would alter existing overwrite semantics and broaden the fix beyond schema discoverability Rejected: Exact phrase assertions | too brittle for future docstring rewording while testing the same behavior Confidence: high Scope-risk: narrow Directive: Keep model-facing tool parameters documented through parsed docstrings or equivalent schema descriptions Tested: cd backend && uv run pytest tests/test_setup_agent_tool.py tests/test_update_agent_tool.py tests/test_tool_args_schema_no_pydantic_warning.py tests/test_sandbox_tools_security.py::test_str_replace_and_append_on_same_path_should_preserve_both_updates -q Tested: cd backend && uv run ruff check packages/harness/deerflow/sandbox/tools.py packages/harness/deerflow/tools/builtins/setup_agent_tool.py packages/harness/deerflow/tools/builtins/update_agent_tool.py tests/test_tool_args_schema_no_pydantic_warning.py Not-tested: Full backend test suite Co-authored-by: OmX <omx@oh-my-codex.dev> * Fix the lint error --------- Co-authored-by: OmX <omx@oh-my-codex.dev> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-10 23:09:03 +08:00
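The idea in #2843 is that the Args section of a tool's docstring becomes the model-facing parameter description, so an undocumented parameter such as append is invisible to the model. The sketch below is a simplified, standalone stand-in: the real write_file is a LangChain tool with a runtime argument, while this version and the undocumented_params checker (both hypothetical) use only inspect and re to show the overwrite/append semantics and a "every parameter has a description" regression check.

```python
import inspect
import re
from typing import Callable, List


def write_file(path: str, content: str, append: bool = False) -> None:
    """Write text content to a file.

    Args:
        path: Path of the file to write.
        content: Text to write.
        append: If False (the default), overwrite the file. If True, add the
            content to the end of the existing file instead.
    """
    with open(path, "a" if append else "w", encoding="utf-8") as f:
        f.write(content)


def undocumented_params(fn: Callable[..., object]) -> List[str]:
    """Hypothetical check: parameters of fn with no 'name:' entry under 'Args:'."""
    doc = inspect.getdoc(fn) or ""
    args_section = doc.split("Args:", 1)[1] if "Args:" in doc else ""
    # Entries look like "name: description" or "name (type): description".
    documented = set(re.findall(r"^\s*(\w+)\s*(?:\([^)]*\))?:", args_section, flags=re.MULTILINE))
    return [name for name in inspect.signature(fn).parameters if name not in documented]
```

Checking for the presence of a description, rather than exact phrasing, mirrors the commit's stated reason for rejecting exact phrase assertions.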
YuJitang	9892a7d468	fix: bucket subagent token usage into parent run totals (#2838 ) * fix: bucket subagent token usage into RunRow.subagent_tokens Add caller-bucketed token tracking to RunJournal so subagent and middleware LLM calls are written to the correct RunRow columns instead of all falling into lead_agent_tokens (default 0). - RunJournal: accumulate _lead_agent_tokens / _subagent_tokens / _middleware_tokens in on_llm_end, deduped by langchain run_id. Add record_external_llm_usage_records() for external sources (respects track_token_usage flag). Return caller buckets from get_completion_data(). - SubagentTokenCollector: new lightweight callback handler that collects LLM usage within subagent execution. - SubagentExecutor: wire collector into subagent run_config and sync records to SubagentResult on every chunk (timeout/cancel safe). - SubagentResult: add token_usage_records and usage_reported fields. - task_tool: report subagent usage to parent RunJournal on every terminal status (COMPLETED/FAILED/CANCELLED/TIMED_OUT), including the CancelledError path, guarded against double-reporting. No DB migration needed — RunRow columns already exist. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: address token usage review feedback * Address review follow-ups --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-10 22:47:30 +08:00
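The caller-bucketed accounting in #2838 (lead agent, subagent, and middleware totals, deduplicated by LangChain run_id) can be illustrated with a small accumulator. This is a hypothetical sketch, not RunJournal: the class name, the caller tags, and the choice to send unknown callers to the lead bucket are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical mapping from a caller tag to the RunRow-style column it feeds.
_COLUMN_BY_CALLER = {
    "lead_agent": "lead_agent_tokens",
    "subagent": "subagent_tokens",
    "middleware": "middleware_tokens",
}


@dataclass
class CallerTokenBuckets:
    lead_agent_tokens: int = 0
    subagent_tokens: int = 0
    middleware_tokens: int = 0
    _seen_run_ids: Set[str] = field(default_factory=set, repr=False)

    def record(self, run_id: str, caller: str, total_tokens: int) -> bool:
        """Add one LLM call's usage; return False if this run_id was already counted."""
        if run_id in self._seen_run_ids:
            return False  # same call reported twice (e.g. callback plus external record)
        self._seen_run_ids.add(run_id)
        # Assumption: unknown callers keep the old behaviour and land in the lead bucket.
        column = _COLUMN_BY_CALLER.get(caller, "lead_agent_tokens")
        setattr(self, column, getattr(self, column) + total_tokens)
        return True

    def completion_data(self) -> Dict[str, int]:
        return {
            "lead_agent_tokens": self.lead_agent_tokens,
            "subagent_tokens": self.subagent_tokens,
            "middleware_tokens": self.middleware_tokens,
        }
```

Deduplicating on run_id is what makes it safe to report subagent usage from several terminal paths (completed, failed, cancelled, timed out) without double counting.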
Eilen Shin	1c96a6afc8	fix: keep new agent bootstrap in user scope (#2784 )	2026-05-09 19:43:50 +08:00
DanielWalnut	2b1fcb3e43	fix(task): remove max_turns parameter from task tool interface (#2783 ) * fix(task): remove max_turns parameter from task tool interface Subagents should always use their configured max_turns value. Exposing this parameter allowed callers to override the admin-configured limit, which is undesirable. The value is now exclusively driven by subagent config (per-agent overrides and global defaults in config.yaml). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-08 15:05:24 +08:00
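After #2783 the task tool no longer accepts max_turns, so the limit comes only from configuration: a per-agent override if present, otherwise the global default. A minimal sketch of that lookup follows; the SubagentsConfig shape, its field names, and the default of 50 are hypothetical and do not reflect the real config.yaml schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SubagentsConfig:
    """Hypothetical shape for the subagent section of config.yaml."""

    default_max_turns: int = 50  # hypothetical global default
    per_agent_max_turns: Dict[str, int] = field(default_factory=dict)


def resolve_max_turns(config: SubagentsConfig, subagent_type: str) -> int:
    # Per-agent override first, then the global default. There is deliberately
    # no caller-supplied argument, so a tool call cannot raise the admin limit.
    override: Optional[int] = config.per_agent_max_turns.get(subagent_type)
    return override if override is not None else config.default_max_turns
```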
He Wang	7de9b5828b	fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning (#2774 ) * fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning Add deerflow/tools/types.py with: Runtime = ToolRuntime[dict[str, Any], ThreadState] Replace every runtime: ToolRuntime[ContextT, ThreadState] and runtime: ToolRuntime[dict[str, Any], ThreadState] annotation in sandbox/tools.py, present_file_tool.py, task_tool.py, view_image_tool.py, and skill_manage_tool.py with the new Runtime alias. The unbound ContextT TypeVar (default None) caused PydanticSerializationUnexpectedValue warnings on every tool call because LangChain's BaseTool._parse_input calls model_dump() on the auto-generated args_schema while DeerFlow passes a dict as runtime context. Binding the context to dict[str, Any] aligns Pydantic's serialization expectations with reality and removes the noise from all run modes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com> * fix(tools): extend Runtime alias to setup_agent and update_agent tools Replace bare ToolRuntime annotations in setup_agent_tool.py and update_agent_tool.py with the shared Runtime alias introduced in the previous commit, and add both tools to the Pydantic serialization warning regression test (13 cases total). Co-authored-by: Cursor <cursoragent@cursor.com> * test(tools): loosen Pydantic warning filter to avoid version-specific format Replace the brittle "field_name='context'" substring check with a looser "context" match so the assertion stays valid if Pydantic changes its internal warning format across versions. 
Co-authored-by: Cursor <cursoragent@cursor.com> * test(tools): simplify warning filter and clean up docstring Remove the "context" substring condition from the Pydantic warning filter — asserting that no PydanticSerializationUnexpectedValue fires at all is both simpler and more comprehensive, since the test payload contains only the tool's own args plus runtime. Also update the module docstring to remove the version-specific warning format example that was inconsistent with the looser filter. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-08 14:50:33 +08:00
yangzheli	59c4a3f0a4	feat(agent): add custom-agent self-updates with user isolation (#2713 ) * feat(agent): add update_agent tool for in-chat custom-agent self-updates (#2616) Custom agents had no built-in way to persist updates to their own SOUL.md / config.yaml from a normal chat — `setup_agent` was only bound during the bootstrap flow, so when the user asked the agent to refine its description or personality, the agent would shell out via bash/write_file and the edits landed in a temporary sandbox/tool workspace instead of `{base_dir}/agents/{agent_name}/`. Changes: - New `update_agent` builtin tool with partial-update semantics (only the fields you pass are written) and atomic temp-file + os.replace writes so a failed update never corrupts existing SOUL.md / config.yaml. - Lead agent now binds `update_agent` in the non-bootstrap path whenever `agent_name` is set in the runtime context. Default agent (no agent_name) and bootstrap flow are unchanged. - New `<self_update>` system-prompt section is injected for custom agents, instructing them to use `update_agent` — and explicitly NOT bash / write_file — to persist self-updates. - Tests: 11 new cases in `tests/test_update_agent_tool.py` covering validation (missing/invalid agent_name, unknown agent, no fields), partial updates (soul-only, description-only, skills=[] vs omitted), no-op detection, atomic-write safety, and AgentConfig round-tripping; plus 2 new cases in `tests/test_lead_agent_prompt.py` covering the self-update prompt section. - Docs: updated backend/CLAUDE.md builtin tools list and tools.mdx (en/zh) with the new tool description. * feat(agent): isolate custom agents per user Store custom agent definitions under the effective user, keep legacy agents readable until migration, and cover API/tool/migration behavior with tests. 
Co-authored-by: Cursor <cursoragent@cursor.com> * feat: consistent write/delete targets & add --user-id to migration --------- Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-05 23:17:42 +08:00
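The update_agent description in #2713 combines partial-update semantics (only passed fields are written) with temp-file + os.replace writes so a failed update never corrupts SOUL.md. A stdlib-only sketch of that pattern is below; atomic_write_text and update_soul are hypothetical names, and config.yaml handling is omitted.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def atomic_write_text(path: Path, text: str) -> None:
    """Write via a sibling temp file + os.replace so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory (same filesystem) so that
    # os.replace is a single rename rather than a copy.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # On any failure the original file is untouched; just drop the temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def update_soul(agent_dir: Path, soul: Optional[str] = None) -> bool:
    """Hypothetical partial update: None means 'leave SOUL.md as it is'."""
    if soul is None:
        return False
    target = agent_dir / "SOUL.md"
    if target.exists() and target.read_text(encoding="utf-8") == soul:
        return False  # no-op: content unchanged, so nothing is written
    atomic_write_text(target, soul)
    return True
```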
greatmengqi	8ba01dfd83	refactor: thread app_config through lead and subagent task path (#2666 ) * refactor: thread app config through lead prompt * fix: honor explicit app config across runtime paths * style: format subagent executor tests * fix: thread resolved app config and guard subagents-only fallback Address two PR review findings: 1. _create_summarization_middleware passed the original (possibly None) app_config into create_chat_model, forcing the model factory back to ambient get_app_config() and risking config drift between the middleware's resolved view and the model's view. Pass the resolved AppConfig instance through end-to-end. 2. get_available_subagent_names accepted Any-typed config and forwarded it to is_host_bash_allowed, which reads ``.sandbox``. A SubagentsAppConfig (also accepted upstream as a sum-type input) has no ``.sandbox`` attribute and would be silently treated as "no sandbox configured", incorrectly disabling the bash subagent. Guard on hasattr and fall back to ambient lookup otherwise. Adds regression tests for both paths. * chore: simplify hasattr guard and tighten regression tests - Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant. - Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent). - Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison. --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>	2026-05-02 06:37:49 +08:00
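The hasattr guard from #2666 can be illustrated with stand-in config classes: a subagents-only config (or None) has no .sandbox, so it falls back to the ambient config instead of being read as "no sandbox configured". Everything here is hypothetical (the dataclasses, the allow_host_bash flag, and the always-available "general-purpose" entry); only the guard shape follows the commit message.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class SandboxConfig:  # hypothetical stand-in
    allow_host_bash: bool = False


@dataclass
class AppConfig:  # hypothetical: the full config, which has a .sandbox section
    sandbox: Optional[SandboxConfig] = None


@dataclass
class SubagentsAppConfig:  # hypothetical: subagents-only config, no .sandbox
    max_turns: int = 50


def is_host_bash_allowed(config: AppConfig) -> bool:
    return config.sandbox is not None and config.sandbox.allow_host_bash


def get_available_subagent_names(config: Any, get_app_config: Callable[[], AppConfig]) -> List[str]:
    # hasattr(None, "sandbox") is False, so None and SubagentsAppConfig both
    # fall back to the ambient config rather than silently disabling bash.
    resolved = config if hasattr(config, "sandbox") else get_app_config()
    names = ["general-purpose"]  # hypothetical always-available subagent
    if is_host_bash_allowed(resolved):
        names.append("bash")
    return names
```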

80 Commits