deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-06-09 17:12:01 +00:00

Author	SHA1	Message	Date
Xinmin Zeng	7679f21edf	fix(frontend): truncate overflowing text in agent cards (#3391) * fix(frontend): truncate overflowing text in agent cards Long custom agent names, descriptions, skills and tool-group labels overflowed the agent card and broke its layout (issue #3389). The title already had `truncate`, but it never took effect: an ancestor flex container lacked `min-w-0`, so the flex item refused to shrink below its content width. - Restore the truncation chain by adding `min-w-0` to the title's flex ancestors so `truncate` can finally take effect. - Cap and ellipsize model / skill / tool-group badges via a small `TruncatedBadge` (`block max-w-full truncate`). - Reveal the full value on hover, but only when the text is actually clipped (`TruncatedTooltip`, width + height detection), so names, descriptions and labels stay readable without popping redundant tooltips on short cards. * fix(frontend): wrap unbreakable strings in agent card tooltips A long token with no break opportunity (no spaces or hyphens) could still overflow the tooltip horizontally. Add `break-words` next to the existing `text-wrap` so such strings wrap instead of overflowing. Addresses Copilot review feedback on tooltip wrapping robustness. * fix(frontend): show agent card tooltips instantly Drop the explicit `delayDuration` so card tooltips fall back to the provider's default 0ms delay. Instant feedback is better UX for revealing text that is already clipped, per maintainer review. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-07 23:29:59 +08:00
Xinmin Zeng	8d2e55a05f	fix(subagent): structured subagent_status field over text parsing (#3146 ) (#3154 ) * fix(subagent): structured subagent_status field over text parsing Closes #3146. ## Why The frontend used to derive subtask card state by string-matching the leading text of the `task` tool's result. That contract surface was fragile — `#3107` BUG-007 and the `#3131` review both surfaced cases where new backend wording (`Task cancelled by user.`, `Task polling timed out after N minutes`, `ToolErrorHandlingMiddleware` exception wrappers) silently broke the card lifecycle. The frontend fallback kept growing more prefixes; any future rewording would break it again. ## Design 1. Backend → frontend contract: `ToolMessage.additional_kwargs` carries `subagent_status` (one of `completed \| failed \| cancelled \| timed_out \| polling_timed_out`) and an optional `subagent_error` blob. The frontend prefers it over parsing `content`. 2. Centralised stamping, not 8 sprinkled stamps: rather than have each of `task_tool.py`'s 5 normal-return + 3 pre-execution `Error:` paths remember to set `additional_kwargs`, `ToolErrorHandlingMiddleware` stamps the field after every task-tool call. Adding a new return path in `task_tool.py` cannot now skip the stamp. 3. Cross-language contract fixture: the prefix→status mapping is the one piece both sides must agree on. The shared fixture at `contracts/subagent_status_contract.json` lists every backend return string, the expected status, and what the error substring should contain. Backend test (`backend/tests/test_subagent_status_contract.py`) and frontend test (`frontend/tests/unit/core/tasks/subtask-result.test.ts`) both load that fixture and assert the same cases. A wording drift on either side fails the matching language's test. 4. Round-trip serialisation pinned: the round-trip test asserts `ToolMessage.model_dump_json()` → `model_validate_json()` preserves `additional_kwargs.subagent_status`. 
Catches the case where a future LangChain or Pydantic upgrade silently strips unknown kwargs. 5. Frontend status collapse documented: the backend has five status values, the frontend card has three (`completed \| failed \| in_progress`). `cancelled` / `timed_out` / `polling_timed_out` all collapse to `failed` with the original status preserved in `error`. `parseSubtaskResult` returns `in_progress` for unknown values so a backend that ships a new enum variant before the frontend upgrades degrades to the legacy prefix fallback instead of getting pinned. ## Changes Backend: - `deerflow.subagents.status_contract` — new module exporting `SUBAGENT_STATUS_KEY`, `SUBAGENT_ERROR_KEY`, `SUBAGENT_STATUS_VALUES`, `extract_subagent_status(content)`, and `make_subagent_additional_kwargs(status, error)`. - `ToolErrorHandlingMiddleware`: new `_stamp_task_subagent_status` helper centralises the stamp; `wrap_tool_call` / `awrap_tool_call` stamp on the success path; `_build_error_message` stamps on the wrapper path (carrying `ExcClass: detail` into `subagent_error`). Non-task tools are untouched. - New tests: `test_subagent_status_contract.py` (19 cases from the shared fixture + status-enum / blank-error / unknown-status rejection) and `test_tool_error_handling_subagent_stamp.py` (middleware integration: terminal-content stamps, non-terminal doesn't, non-task tools untouched, async path mirrors sync, existing additional_kwargs survive, JSON round-trip preserved). Frontend: - `parseSubtaskResult(text, additionalKwargs?)` — prefers the structured stamp; falls back to the legacy prefix matcher for historical threads / unknown future status values. - `STRUCTURED_STATUS_TO_SUBTASK` documents the five→three collapse. - `message-list.tsx` passes `message.additional_kwargs` through. - `subtask-result.test.ts` adds a structured-status block + a fixture-driven contract block; legacy prefix tests stay green for the fallback path. 
Contract: - `contracts/subagent_status_contract.json` — single source of truth both languages load. Whitespace variants, varied N for polling timeouts, the 3 pre-execution `Error:` returns task_tool produces, and the middleware wrapper shape are all in there. ## Test plan - `make lint` clean (backend + frontend). - `pytest tests/test_subagent_status_contract.py tests/test_tool_error_handling_subagent_stamp.py` → 37 passed. - `pnpm test --run` → 103 passed (was 76, +27 new). ## Migration / fallback retirement The text-prefix fallback stays in place until backend telemetry shows the frontend never hits it for newly produced messages. At that point a follow-up PR can drop the prefix branches and keep only the structured-status branch. Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior prefix-only fix), #3146 (this issue). * fix(subtask): back-fill result/error from text when structured status present Three follow-ups on the PR #3154 review: 1. `readStructuredStatus` no longer short-circuits the prefix parse. The backend currently stamps only the `subagent_status` enum value; the human-facing `result` body and wrapped-error message still live in `ToolMessage.content`. Dropping the text parse meant successful tasks rendered empty completed pills and wrapped failures lost their diagnostic. Now both shapes get composed: structured status wins, `result`/`error` come from text when both sides agree, and a lying success body under a `failed` stamp is dropped instead of leaking. 2. Replace the ESM-incompatible `__dirname` fixture lookup in subtask-result.test.ts with `fileURLToPath(new URL(..., import.meta.url))`. The frontend package is `"type": "module"`, so the previous path would have thrown at runtime if anything ever changed under the contract directory. 3. Drop the `$schema` reference from contracts/subagent_status_contract.json pointing at a file that doesn't exist in the tree. 
Three new tests cover the structured + text composition: completed back-fills the success body, failed back-fills the wrapper text, and unrecognised content under a `failed` stamp stays empty rather than echoing noise.	2026-06-07 22:49:55 +08:00
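The five-to-three status collapse described in commit 8d2e55a05f can be sketched in Python. This is only an illustration: the real mapping lives in the TypeScript frontend (`STRUCTURED_STATUS_TO_SUBTASK`, `parseSubtaskResult`), and the helper name `collapse_status`, its return shape, and the choice to return `None` as the "fall back to prefix parsing" signal are assumptions made for this sketch.

```python
from typing import Optional

# Backend statuses (five) mapped onto the card statuses the commit describes.
STRUCTURED_STATUS_TO_SUBTASK = {
    "completed": "completed",
    "failed": "failed",
    "cancelled": "failed",
    "timed_out": "failed",
    "polling_timed_out": "failed",
}


def collapse_status(additional_kwargs: Optional[dict]) -> Optional[dict]:
    """Hypothetical helper: map a structured stamp to a card status.

    Returns None when the stamp is missing or unknown, which a caller
    would treat as "use the legacy text-prefix fallback".
    """
    kwargs = additional_kwargs or {}
    raw = kwargs.get("subagent_status")
    if not isinstance(raw, str):
        return None
    card = STRUCTURED_STATUS_TO_SUBTASK.get(raw)
    if card is None:
        return None  # unknown future enum value: do not pin the card
    error = kwargs.get("subagent_error")
    if card == "failed" and raw != "failed" and not error:
        # Keep the original status visible once it collapses to "failed".
        error = raw
    return {"status": card, "error": error}
```

With this shape, a `timed_out` stamp renders as a failed card whose error still says `timed_out`, while an unrecognised stamp leaves the decision to the text parser.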
Ryker_Feng	d8b728f7cb	fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379) (#3392) * fix(mcp): close stdio sessions on their owning loop to avoid cross-task cancel-scope error (#3379) Adopt an owner-task lifecycle for pooled MCP ClientSessions so each session is entered, initialized, and exited within a single asyncio task on its owning event loop. This eliminates the anyio "Attempted to exit cancel scope in a different task than it was entered in" RuntimeError that surfaced when stdio MCP tools were used via the sync tool wrapper (which spins up and tears down event loops across tasks). Also harden the pool lifecycle: - track in-flight session creation per (server, scope) to dedupe concurrent get_session() calls for the same key - make close_scope/close_server/close_all/close_all_sync cover both established entries and in-flight creations so sessions cannot be resurrected or leaked after close - handle cross-loop preemption of an in-flight creation by cancelling the stale owner task instead of only signalling it - define close_all_sync() semantics for a running loop on the current thread (signal-only, async completion) and route reset_mcp_tools_cache through a deterministic async close in that case * fix(mcp): avoid reset deadlock on running loop cache reset * fix(mcp): address session pool review feedback	2026-06-07 21:37:30 +08:00
Xinmin Zeng	befe334f10	fix(config): make the reload boundary discoverable from code (#3144 ) (#3153 ) * fix(config): make the reload boundary discoverable from code, not just docs Closes #3144. The hot-reload contract — per-run fields are resolved through `get_app_config()` on every request, infrastructure fields snapshot at gateway startup — landed in `backend/CLAUDE.md` as part of #3131. A maintainer reading `get_config()` or an `AppConfig` field still had to context-switch to that document to know which fields require a process restart, and there was no enforcement that the prose list stayed in sync with the code. This commit moves the boundary to a machine-readable single source of truth and surfaces it where the code lives: - New `deerflow.config.reload_boundary` module owns the registry of restart-required fields (`STARTUP_ONLY_FIELDS`) and a tiny helper API (`is_startup_only_field`, `iter_startup_only_field_paths`, `format_field_description`). The standardised `"startup-only:"` prefix is exported as `STARTUP_ONLY_PREFIX` so future scanners / lint hooks / doc generators can pivot off it without re-parsing prose. - `AppConfig`'s `database`, `checkpointer`, `run_events`, `stream_bridge`, `sandbox`, and `log_level` fields now build their `Field(description=...)` from `format_field_description(...)`. The same text shows up in IDE hover (Pydantic v2 exposes `description` via `model_fields[...]`). - `channels` is restart-required too but lives outside the AppConfig Pydantic schema (the config section is consumed directly by `start_channel_service`). The registry owns it so the boundary is not split between two places. - `get_config()` docstring points to the registry instead of leaving the reader to find `CLAUDE.md`. The `CLAUDE.md` table collapses to a one-liner pointing back at `reload_boundary.py` so the boundary has one canonical location, not two. Drift coverage in `tests/test_reload_boundary.py`: - Every registered field has a non-trivial reason. 
- Iterator / membership helpers stay in sync with the dict. - Every registry entry that maps to an `AppConfig` field also carries the `"startup-only:"` prefix in the schema (catches "forgot to update the schema"). - Reverse drift: any AppConfig field whose description starts with the prefix must be registered (catches "marked restart-required in the schema but forgot the registry"). - The runtime introspection that IDE hover depends on (`AppConfig.model_fields["database"].description`) is pinned, so a future Pydantic upgrade or schema swap that breaks the hover surface shows up as a test failure rather than a silent regression. Refs: bytedance/deer-flow#3138 (split summary), #3107 (origin), #3131 (prior boundary fix in prose form). * fix(config): preserve field doc and correct log_level reload reason Two follow-ups on the PR #3153 review: 1. The `log_level` STARTUP_ONLY_FIELDS reason previously claimed `apply_logging_level()` mutates the root logger level. It does not: only the `deerflow` / `app` logger levels are set, and root handler thresholds are conditionally lowered so messages from those loggers can propagate. Reword to match the actual behavior so operators reading IDE hover get accurate restart guidance. 2. `format_field_description(field_path)` was the sole `Field(description=)` for every restart-required field, which silently overwrote the original human-facing documentation — most visibly the `log_level` field that used to list debug/info/warning/error and clarify that third-party libraries are not affected. Extend the helper with a keyword-only `field_doc` parameter that composes the startup-only marker with the original prose so IDE hover documents both why the field is restart-required and what it actually accepts. Updated all six restart-required AppConfig fields (`log_level`, `database`, `sandbox`, `run_events`, `checkpointer`, `stream_bridge`) to pass their original descriptions through the helper. 
Tests: two new cases in `test_reload_boundary.py` pin (a) the helper composition and (b) every AppConfig restart-required field still surfaces a recognisable substring of its original documentation. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-07 21:27:14 +08:00
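A rough sketch of the registry idea from commit befe334f10, assuming a plain dict keyed by field path. The exported names match the commit message, but the reason strings below are placeholders and the exact way the marker and `field_doc` are joined is a guess, not the project's actual output format.

```python
from typing import Iterator, Optional

STARTUP_ONLY_PREFIX = "startup-only:"

# Field paths come from the commit message; the reason text is a placeholder.
_PLACEHOLDER_REASON = "resolved once at gateway startup (placeholder reason)"
STARTUP_ONLY_FIELDS = {
    name: _PLACEHOLDER_REASON
    for name in (
        "database", "checkpointer", "run_events", "stream_bridge",
        "sandbox", "log_level", "channels",
    )
}


def is_startup_only_field(field_path: str) -> bool:
    return field_path in STARTUP_ONLY_FIELDS


def iter_startup_only_field_paths() -> Iterator[str]:
    return iter(STARTUP_ONLY_FIELDS)


def format_field_description(field_path: str, *, field_doc: Optional[str] = None) -> str:
    """Compose the restart marker with the field's original documentation."""
    marker = f"{STARTUP_ONLY_PREFIX} {STARTUP_ONLY_FIELDS[field_path]}"
    return f"{marker} {field_doc}" if field_doc else marker
```

A schema field would then pass `format_field_description("log_level", field_doc="...")` as its description, so a scanner can key off the prefix while IDE hover still shows the original prose.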
Ryker_Feng	d133b1119a	fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) (#3378) * fix(summarization): tag summary LLM calls nostream to stop phantom stream messages (#2503) The SummarizationMiddleware runs its summary LLM call inside a before_model hook. Without a nostream tag the summary tokens were captured by LangGraph's messages-tuple stream callback and broadcast to the frontend as a phantom AI message. Generate a dedicated summary model copy tagged with "nostream" (merged on top of any existing tags such as "middleware:summarize" so RunJournal attribution is preserved) and override _create_summary / _acreate_summary to invoke it directly. This avoids temporarily swapping the shared self.model, which would otherwise leak the RunnableBinding across concurrent runs and break parent logic that inspects the raw model (profile / _get_ls_params). Add regression tests covering nostream tagging, concurrent-run isolation, raw model preservation, and existing-tag merge. * fix(summarization): address nostream review feedback	2026-06-07 17:55:04 +08:00
Huixin615	88e36d9686	fix(#3189 ): prevent write_file streaming timeout on long reports (#3195 ) * fix(#3189): prevent write_file streaming timeout on long reports Adds a layered defense against StreamChunkTimeoutError caused by oversized single-shot write_file tool calls: - factory: default stream_chunk_timeout to 240s for OpenAI-compatible clients (overridable via ModelConfig.stream_chunk_timeout in config.yaml) - sandbox/tools: server-side 80 KB length guard on non-append write_file calls (configurable via DEERFLOW_WRITE_FILE_MAX_BYTES env var, 0 disables); rejects oversized payloads with a structured error pointing the model at str_replace or append=True - middleware: classify StreamChunkTimeoutError as transient but cap retries at 1 via per-exception _RETRY_BUDGET_OVERRIDES (same-payload retry on a chunk-gap timeout buffers the same way upstream; full 3-attempt loop would stack 6-12 min of dead air) - middleware: surface an actionable user-facing message for stream-drop exceptions instead of leaking the raw langchain stack - prompts: add a routing-style File Editing Workflow hint to both lead_agent and general_purpose subagent prompts, pointing the model at str_replace for incremental edits (mirrors Claude Code's Edit / Codex's apply_patch) - tests: behavioural coverage for size guard, retry budget override, stream-drop user message, factory default injection Refs #3189 * fix(#3189): drop stream_chunk_timeout for non-OpenAI providers Address CR feedback on PR #3195: - factory: pop `stream_chunk_timeout` from kwargs for any model_use_path other than `langchain_openai:ChatOpenAI` instead of returning early. `ModelConfig.stream_chunk_timeout` is part of the shared schema, so a user-supplied value on a non-OpenAI provider would otherwise be forwarded to its constructor and raise `TypeError: unexpected keyword argument`. 
- factory: rewrite docstring to describe the actual `exclude_none=True` behaviour (explicit null is excluded and falls back to the default) instead of the misleading "None falling out via exclude_none=True keeps its value". - tests: add regression coverage asserting the kwarg is stripped before reaching a non-OpenAI provider's constructor. Refs: bytedance#3189 * fix(#3189): restrict stream-drop user copy to StreamChunkTimeoutError only Per CR on #3195: narrow _STREAM_DROP_EXCEPTIONS to StreamChunkTimeoutError. Generic httpx RemoteProtocolError / ReadError fall back to the standard 'temporarily unavailable' copy, since they routinely fire on transient network blips where the 'split the output' guidance is misleading. Retry/backoff classification is unchanged — both remain transient/retriable. Tests updated to reflect new copy, plus a symmetric regression test for ReadError. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-07 17:47:11 +08:00
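The server-side length guard from commit 88e36d9686 might look roughly like the following. The environment variable name, the 80 KB default, the "0 disables" rule and the append exemption come from the commit message; the function name `check_write_size`, the reading of 80 KB as 80 * 1024 bytes, and the error wording are assumptions.

```python
import os
from typing import Mapping, Optional

DEFAULT_MAX_BYTES = 80 * 1024  # assumption: "80 KB" read as 80 * 1024 bytes


def check_write_size(
    content: str,
    append: bool = False,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical guard: return an error string for oversized writes, else None."""
    limit = int(env.get("DEERFLOW_WRITE_FILE_MAX_BYTES", DEFAULT_MAX_BYTES))
    if append or limit == 0:
        return None  # append calls are exempt; 0 disables the guard
    size = len(content.encode("utf-8"))
    if size > limit:
        # Illustrative wording only; the real message is not quoted in the log.
        return (
            f"Error: write_file payload is {size} bytes (limit {limit}). "
            "Use str_replace for incremental edits or write in chunks with append=True."
        )
    return None
```

Measuring encoded bytes rather than characters keeps the limit meaningful for non-ASCII reports.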
Xinmin Zeng	268fdd6968	fix(gateway): drain in-flight runs before closing checkpointer on shutdown (#3381 ) * fix(gateway): drain in-flight runs before closing checkpointer on shutdown Chat runs execute in fire-and-forget background asyncio tasks that write checkpoints through a shared checkpointer. On shutdown, langgraph_runtime's AsyncExitStack tore down the checkpointer's postgres connection pool while those run tasks were still mid-graph. langgraph's AsyncPregelLoop._checkpointer_put_after_previous then ran its `finally: await checkpointer.aput(...)` against the closed pool, raising psycopg_pool.PoolClosed. Because that put runs in a langgraph-internal task (not on run_agent's call stack), run_agent's try/except cannot catch it and it surfaces as "unhandled exception during asyncio.run() shutdown". Add RunManager.shutdown() to cancel and bounded-await all in-flight runs, and call it from langgraph_runtime BEFORE the AsyncExitStack closes the checkpointer, so the final checkpoint write lands while the pool is still open. The drain is bounded by a timeout so a stuck run cannot hang worker shutdown, and is shielded so a second shutdown signal cannot abandon it mid-drain and reopen the race. Closes #3373 * fix(gateway): address review — preserve completed-run status, bound drain persistence Addresses Copilot review on #3381: - RunManager.shutdown(): decide run status AFTER the drain. Under the lock it now only requests cancellation; after asyncio.wait it marks/persists `interrupted` only for runs still pending or ended cancelled. A run that completes (e.g. `success`) during the drain window keeps its real terminal status instead of being unconditionally overwritten. - Bound the trailing status persistence within the timeout budget (deadline = loop.time()+timeout; gather wrapped in asyncio.wait_for) so a slow store backing off under DB pressure cannot push shutdown past the deadline. - deps: use asyncio.create_task instead of asyncio.ensure_future. 
- tests: wait deterministically for the run to be in-flight (poll the first checkpoint) instead of a fixed sleep; init shutdown_calls explicitly in the recovery test double; add regression test asserting a run completing during the drain keeps its status (in memory and in the store). * fix(gateway): address maintainer review — surface failed drain persists, clarify timeout constant Addresses @WillemJiang review on #3381: - shutdown(): inspect the gather result of the trailing interrupted-status persistence. _persist_status is best-effort (it catches + logs its own failure with exc_info and returns False, so it never raises out of the gather), but the aggregate result was never checked — a partial failure had no shutdown-level visibility. Now any escaped Exception is logged, and any False (a persist that did not confirm) is logged with the run_id. Added regression test test_shutdown_surfaces_failed_interrupted_persist. - deps: clarify the _RUN_DRAIN_TIMEOUT_SECONDS comment — state the actual value of _SHUTDOWN_HOOK_TIMEOUT_SECONDS (5.0s) and that both count toward the lifespan shutdown window. Kept as two separate constants (independent teardown steps that may diverge) rather than one shared "must match" value. - Verified no other test fake needs the shutdown stub: _FakeRunManager in test_worker_langfuse_metadata.py is a run_agent() argument (worker path), never injected into langgraph_runtime, so it never receives shutdown().	2026-06-07 11:24:30 +08:00
Nan Gao	9a5de8d6a5	fix(ux): remove Backspace shortcut for deleting prompt attachments (#3410) * Remove backspace attachment deletion * Fix the lint error --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-06 15:13:24 +08:00
Nan Gao	1aac408dd0	fix upload file size contract (#3408)	2026-06-06 15:12:17 +08:00
Xinmin Zeng	dd8f9bf5f0	chore: add AI assistance disclosure to PR template and CONTRIBUTING (#3398)	2026-06-05 22:08:24 +08:00
AochenShen99	2bbc7879fa	refactor(tool-search): consolidate MCP metadata tag and harden deferred-tool setup (#3370) Follow-up to #3342 (deferred MCP tool loading). Maintainability cleanup plus hardening of malformed/empty tool_search queries; no change to the deferral mechanism or search ranking. - Add deerflow/tools/mcp_metadata.py as the single source of truth for the "deerflow_mcp" tag (MCP_TOOL_METADATA_KEY + tag_mcp_tool + public is_mcp_tool). Removes the duplicated magic string and the private, cross-module _is_mcp_tool import. - tool_search.search: never raise on model-generated input. Extract _compile_catalog_regex (shared compile-with-literal-fallback); return empty for empty/whitespace queries and a bare "+" instead of matching everything or raising IndexError. - DeferredToolSetup: document the empty-vs-populated invariant. - build_deferred_tool_setup: comment the two distinct empty-return branches. - _assemble_deferred: add return type, rename local to deferred_setup, build the final list with an explicit append. - Tests: use tag_mcp_tool instead of per-file tag helpers; cover empty and bare-"+" queries.	2026-06-05 15:21:41 +08:00
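The compile-with-literal-fallback idea in commit 2bbc7879fa can be sketched with the standard `re` module. The function names and the flat list of tool names are assumptions for illustration; the real `tool_search.search` also ranks results and supports query forms not shown here.

```python
import re
from typing import List


def compile_catalog_regex(pattern: str) -> "re.Pattern[str]":
    """Compile `pattern` as a regex, falling back to a literal match if invalid."""
    try:
        return re.compile(pattern)
    except re.error:
        return re.compile(re.escape(pattern))


def search(query: str, tool_names: List[str]) -> List[str]:
    """Never raise on model-generated input; degenerate queries match nothing."""
    query = query.strip()
    if query in ("", "+"):
        # Empty, whitespace-only, or bare "+" queries return no results
        # instead of matching everything or raising.
        return []
    regex = compile_catalog_regex(query)
    return [name for name in tool_names if regex.search(name)]
```

For example, the malformed pattern `calc(` is treated as the literal text `calc(` rather than raising `re.error`.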
Eilen Shin	28b1da2172	fix(agents): harden update_agent null-like args (#3237) * fix(agents): harden update_agent null-like args * docs: mention undefined null-like update args --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-04 07:10:59 +08:00
Eilen Shin	3fddc24c5f	chore: remove stale LangGraph server runtime remnants (#3344) * chore: remove stale langgraph server runtime remnants * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-03 22:04:05 +08:00
Admire	0d0968a364	chore: add sandbox memory profiling tools (#3249) * chore: add sandbox memory profiling tools * chore: keep sandbox memory PR profiling-only * Format sandbox memory profiling script	2026-06-03 22:02:27 +08:00
Huixin615	89ae74d4f4	fix(skills): surface offending line and quoting hint on SKILL.md YAML… (#3335 ) * fix(skills): surface offending line and quoting hint on SKILL.md YAML errors When a SKILL.md front-matter fails to parse, the existing log only echoes PyYAML's raw message, leaving authors to grep the file for the offending line. This is especially painful for the very common LLM-authored mistake of an unquoted scalar containing ': ' (e.g. 'description: foo: bar'), which fails with 'mapping values are not allowed here' and silently drops the skill. Enrich the error log with: - the source line PyYAML pointed at via problem_mark - a targeted, copy-pasteable quoting hint when (and only when) the error is the well-known 'mapping values are not allowed' scanner error on an unquoted value The skill is still rejected (no semantics are guessed or rewritten); only the diagnostic is improved. Fixes #3333 * improve(skills): address CR feedback on SKILL.md YAML error diagnostics Per review on #3335: - Log the file line number (mark.line + 2) instead of the front-matter-internal line number, so authors land on the right row in their editor. - Use exc.problem == "mapping values are not allowed here" for a tighter match than substring-scanning str(exc). - Preserve the offending key's leading whitespace in the quoting hint so nested mappings stay nested when authors paste the fix back. - Rewrite the regression test to actually exercise the new behaviour: PyYAML's own message already echoes the offending line (and truncates it with "..."), so the old assertion passed on main. New assertions pin (a) the file-line number, (b) the full untruncated line, and (c) the copy-pasteable hint. - Add a guard test for nested-key indentation so the partition()/strip() shape cannot regress silently. Refs #3333, #3335 * fix(skills): escape backslashes in YAML quoting hint The hint emitted by _format_yaml_error previously escaped only double quotes, so values containing backslashes (e.g. 
Windows paths like C:\Temp or regex escapes like \d) produced a suggested scalar that was either invalid YAML or silently re-interpreted by PyYAML's double-quoted escape rules when pasted back. Escape order matters: backslashes first, then double quotes. Adds two regression tests covering Windows-path and regex-style backslashes. Address Copilot CR feedback on PR #3335.	2026-06-03 21:53:52 +08:00
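The escape-order point in commit 89ae74d4f4 (backslashes first, then double quotes) is easy to get wrong, so here is a small sketch. The helper names are hypothetical; the project's own logic lives in `_format_yaml_error`, whose exact code is not shown in this log.

```python
def quote_yaml_scalar(value: str) -> str:
    """Wrap `value` as a YAML double-quoted scalar.

    Backslashes are escaped first; doing quotes first would cause the
    backslashes added for the quotes to be escaped a second time.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def quoting_hint(line: str) -> str:
    """Suggest a quoted form of `key: value`, keeping the key's indentation."""
    key, sep, value = line.partition(": ")
    if not sep:
        return line  # nothing to quote
    return f"{key}: {quote_yaml_scalar(value.strip())}"
```

Applied to `description: foo: bar`, the hint becomes `description: "foo: bar"`, and an indented nested key keeps its leading whitespace because the key half of the partition is left untouched.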
Huixin615	9a53f9dfbb	fix(frontend): preserve chronological order of thread history after context compression (#3354) * fix(frontend): preserve chronological order of thread history after context compression Iterate runs from newest to match backend `list_by_thread` (newest-first) and the prepend semantics of the history loader, so refreshed history renders in A→B→C→D→E→F order. Fixes #3352 * fix(frontend): auto-continue loading runs with no visible messages after context compression	2026-06-03 21:51:48 +08:00
Ryker_Feng	8fca56cf43	fix(mcp): accept transport field as alias for type (#3238) (#3243) The official MCP configuration schema uses `transport` to specify the transport mechanism (stdio/sse/http), but `McpServerConfig` only honored `type` and defaulted to `stdio`. Remote MCP servers configured with just `transport: sse` were therefore misidentified as stdio and failed with "with stdio transport requires 'command' field". Add a model validator that promotes `transport` to `type` when only `transport` is provided, while keeping `type` authoritative when both are set. This matches the MCP-spec field name without breaking existing configurations. Fixes #3238	2026-06-03 18:11:38 +08:00
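The alias rule in commit 8fca56cf43 can be shown on a plain dict. The actual change is a model validator on `McpServerConfig`; the function below is a dependency-free stand-in, and its name and its handling of an explicit `None` are assumptions.

```python
def normalize_transport(raw: dict) -> dict:
    """Promote `transport` to `type` only when `type` is absent."""
    cfg = dict(raw)  # do not mutate the caller's config
    if cfg.get("type") is None and cfg.get("transport") is not None:
        cfg["type"] = cfg["transport"]
    if cfg.get("type") is None:
        cfg["type"] = "stdio"  # the existing default described in the commit
    return cfg
```

So `{"transport": "sse", "url": "..."}` resolves to `type == "sse"` instead of being misread as stdio, while a config that sets both fields keeps its `type`.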
Octopus	0ffa995fe9	feat: upgrade MiniMax default model to M3 (#3357) - Add MiniMax-M3 to model list and set as default - Keep MiniMax-M2.7 and MiniMax-M2.7-highspeed - Remove older models (M2.5) - Update related tests Co-authored-by: octo-patch <octo-patch@github.com>	2026-06-03 17:04:16 +08:00
Xinmin Zeng	f97b0c0f74	feat(issue-templates): add structured bug & feature issue forms (#3359) Replace the single runtime-information form with: - config.yml: disable blank issues, route Q&A/ideas to Discussions, link security policy - bug-report.yml: reproducible bug form (folds in the old runtime/environment fields + affected-area picker) - feature-request.yml: scoped proposal form Uses only default labels (bug/enhancement) so it is self-contained.	2026-06-03 16:42:07 +08:00
Xinmin Zeng	aca7acc105	feat(ci): PR/issue auto-labeling + declarative label sync (#3360) - .github/labels.yml: declarative source of truth (29 namespaced labels) - scripts/sync_labels.py + label-sync.yml: idempotent label sync (self-bootstraps on merge) - labeler.yml + pr-labeler.yml: area:* labels by changed path (actions/labeler) - pr-triage.yml: size/, risk:, needs-validation, first-time-contributor, reviewing - issue-triage.yml: needs-triage on new issues (self-healing) All PR workflows use pull_request_target but never check out or run PR code (read changed-file metadata via the API only).	2026-06-03 16:40:24 +08:00
zhongli-sz	3ae82dc663	fix(mcp): add auth interceptor with channel user_id and keep header propagation to mcp tools (#3294) * Fix the bug in passing the channel user_id through to the interceptor; MCP can now pass user_id to MCP tools via headers Co-authored-by: Cursor <cursoragent@cursor.com> * fix(channel,mcp,gateway): normalize channel user_id and add regression tests Normalize external channel user ids into filesystem-safe runtime context while preserving raw channel_user_id, and document gateway user_id propagation semantics. Add regression coverage for channel user_id context mapping, gateway user_id precedence/internal-role behavior, and MCP interceptor header forwarding via meta.headers. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(auth,mcp): harden user id normalization and header handling Increase sanitized user-id digest suffix to 16 hex chars, replace internal system role magic string with a shared constant, and harden MCP header forwarding with Mapping type checks. Add regression tests for empty channel user_id handling, unsupported header types, and updated digest length behavior. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: zhongli <335302680@qq.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-03 15:48:19 +08:00
Ryker_Feng	5dc2d6cbf5	fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872 ) (#3245 ) * fix(sandbox): close AioSandbox HTTP client during provider teardown (#2872) AioSandbox allocates a host-side agent_sandbox client (wrapping an httpx.Client) in __init__, but AioSandboxProvider.release/destroy/shutdown only popped provider state and tore down the backend container — the client/transport owned by each cached AioSandbox was never explicitly closed, accumulating unreclaimed sockets in long-running services. - Add AioSandbox.close(): best-effort, idempotent close of the wrapped httpx_client (falls back to top-level client.close()); errors are logged but never raised so backend cleanup is never blocked. - AioSandboxProvider.release()/destroy() now close the cached AioSandbox before dropping it; shutdown() inherits this via destroy(). * fix(sandbox): close the real httpx.Client owned by AioSandbox (#2872) The previous close() only walked one level (wrapper.httpx_client), which resolves to the Fern-generated HttpClient wrapper that has no close(). The real socket-owning httpx.Client lives one level deeper at _client_wrapper.httpx_client.httpx_client, so the close path never fired and host-side sockets still leaked. Resolve the real httpx.Client with graceful degradation; clear self._client under the lock for use-after-close and concurrent double-close safety; mark provider release()/destroy() try/except as defense-in-depth; rewrite TestClose against the real nested structure to lock down the original no-op bug.	2026-06-02 22:55:59 +08:00
AochenShen99	d9f4724950	fix(tool-search): reliably hide deferred MCP schemas by removing the ContextVar (closures + graph state) (#3342 ) * feat(tool-search): add hash-scoped promoted state to ThreadState * feat(tool-search): add immutable DeferredToolCatalog with stable hash * feat(tool-search): add build_deferred_tool_setup + Command-writing tool_search * refactor(tool-search): replace deferred-tool ContextVar with closures + graph state (#3272) Build the deferred catalog + tool_search tool per agent from the policy-filtered tool list (after skill allowed-tools), pass deferred_names + catalog_hash explicitly to DeferredToolFilterMiddleware and the prompt, and record promotions in ThreadState.promoted (scoped by catalog_hash) via a Command-returning tool_search. Removes DeferredToolRegistry and the _registry_var ContextVar so deferral no longer depends on build/execute sharing an async context. MCP tools are tagged with metadata[deerflow_mcp]; client.py assembles deferral the same way. Catalog is built AFTER tool-policy filtering (no policy-excluded tool can leak via tool_search) and assembly is fail-closed. Migrate tests off the deleted registry APIs; delete the obsolete ContextVar-based #2884 regression (re-covered by state-based tests in a follow-up). 
* test(tool-search): lock tool_search promotion into next model turn via graph state * test(tool-search): cross-context, policy-leak, fail-closed, #2884 isolation regressions * test(tool-search): align real-LLM e2e with closure-based deferred setup * docs: update DeferredToolFilterMiddleware description for closure+state design * style(tests): drop unused import in test_deferred_setup (ruff) * test(tool-search): harden merge_promoted + replace tautological catalog test From independent code review: - merge_promoted: use existing.get("catalog_hash") so a forward-incompatible or externally-injected persisted promoted dict triggers a replace instead of a KeyError crash; add regression test for the malformed-existing case. - test_deferred_catalog: replace the `== [] or True` tautology (a test that could never fail) with a deterministic invalid-regex->literal-fallback check (positive match on calc + negative empty match). - DeferredToolCatalog: comment why frozen-without-slots is required for the cached_property hash/names fields (adding slots=True would break them). * fix(tool-search): read tool_search.enabled from self._app_config in client DeerFlowClient._ensure_agent called get_app_config() directly to read tool_search.enabled, but the client already resolves and stores its config as self._app_config at construction (and uses it everywhere else). The bare call re-resolves config from disk at agent-build time, which raises FileNotFoundError in environments without a config.yaml (CI) — test_client.py's fixture only patches get_app_config during __init__, so the later call hit the real loader. Use self._app_config, matching the rest of the client. * test(tool-search): lock tool_search post-policy append ordering tool_search is appended after skill-allowlist filtering, so the allowlist can no longer deny it by name. 
Lock the intended contract: it only appears when allowed MCP tools survive the filter, and its catalog (derived from the already policy-filtered list) can never expose a denied tool. Addresses the ordering observation from the Copilot review on #3342.	2026-06-02 22:43:22 +08:00
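A minimal sketch of two ideas from the commit above: a frozen catalog with a cached, order-independent hash, and a hash-scoped merge of promoted tool names that tolerates a malformed persisted dict. The class name `Catalog`, the dict layout, and the hashing scheme are assumptions for illustration; they are not taken from the deer-flow source.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)  # no slots=True: cached_property needs the instance __dict__
class Catalog:
    """Hypothetical immutable catalog of deferred tool names."""

    tool_names: tuple[str, ...]

    @cached_property
    def catalog_hash(self) -> str:
        # Sort first so the hash does not depend on registration order.
        joined = "\n".join(sorted(self.tool_names))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


def merge_promoted(existing: dict | None, catalog_hash: str, names: list[str]) -> dict:
    """Merge newly promoted names, replacing state recorded for another catalog."""
    # .get() tolerates a persisted dict that has no "catalog_hash" key at all.
    if not existing or existing.get("catalog_hash") != catalog_hash:
        return {"catalog_hash": catalog_hash, "names": sorted(set(names))}
    merged = set(existing.get("names", [])) | set(names)
    return {"catalog_hash": catalog_hash, "names": sorted(merged)}


first = Catalog(("search", "calc"))
second = Catalog(("calc", "search"))
state = merge_promoted(None, first.catalog_hash, ["calc"])
state = merge_promoted(state, second.catalog_hash, ["search"])
```

Because promotions are keyed by the catalog hash, a state dict written for a different tool set is replaced rather than merged, which is the fail-closed behaviour the commit message describes.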
Eilen Shin	74e3e80cf6	docs: clean gateway runtime transition remnants (#3334)	2026-06-02 10:03:28 +08:00
Eilen Shin	019bd16a06	fix: load paginated run history messages (#3305)	2026-06-01 15:50:39 +08:00
Willem Jiang	031d6fbcbe	fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (#3223 ) (#3226 ) * fix(checkpointer): use AsyncConnectionPool for postgres to prevent stale connection errors (#3223) Replace AsyncPostgresSaver.from_conn_string() with an explicit AsyncConnectionPool that has check_connection enabled, so dead idle connections are detected and replaced on checkout instead of raising OperationalError. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Fixed the unit test error and lint error * fix(checkpointer): add TCP keepalive to postgres connection pool (#3254) Enable TCP keepalive probes on the AsyncConnectionPool to prevent idle postgres connections from being dropped by the server or network middleware. Combined with the existing check_connection callback, this provides defense-in-depth against stale connection errors. Fixes #3254 * Changed the code as review suggestion --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-06-01 09:05:11 +08:00
FallingSnowFlake	d6a604d5a1	fix(makefile): extract setup-sandbox inline bash to script for Windows compatibility (#3326)	2026-06-01 07:28:13 +08:00
kia	46ddc346ad	fix(channels): preserve Feishu clarification thread continuity (#3285) * fix(channels): preserve Feishu clarification thread continuity * fix(channels): address Feishu clarification review feedback --------- Co-authored-by: zzp1221 <zzp1221@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-31 22:43:07 +08:00
Nan Gao	79cc227917	fix(middleware): fix LLM fallback run status (#3321) * Fix LLM fallback run status * optimize LLM fallback maker extraction in streaming path	2026-05-31 22:42:13 +08:00
AochenShen99	9f3be2a9fa	fix(agents): offload UploadsMiddleware uploads scan off the event loop (#3311 ) UploadsMiddleware defines only the sync `before_agent` hook. LangChain wires a sync-only hook as `RunnableCallable(before_agent, None)`, and LangGraph's `ainvoke` runs it directly on the event loop when `afunc is None` — so the per-message uploads-directory scan (`exists`/`iterdir`/`stat` plus reading sibling `.md` outlines) blocks the asyncio event loop on every message that has an uploads directory. Add `abefore_agent` that offloads the scan to a worker thread via `run_in_executor`; it copies the current context, preserving the `user_id` contextvar read by `get_effective_user_id()`. Add a runtime anchor under `tests/blocking_io/` that drives the real `create_agent` graph via `ainvoke` under the strict Blockbuster gate, so a regression back onto the event loop fails CI. Update blocking-IO docs.	2026-05-30 21:46:35 +08:00
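The offload pattern named in the commit above (a blocking directory scan moved to a worker thread with `run_in_executor`, with the caller's context copied so a contextvar stays readable) might look like this in isolation. The names `user_id_var`, `scan_uploads`, and `scan_uploads_async` are hypothetical; the sketch does not reproduce UploadsMiddleware.

```python
import asyncio
import contextvars
import functools
import tempfile
from pathlib import Path

# Hypothetical contextvar standing in for the per-request user id.
user_id_var = contextvars.ContextVar("user_id", default="anonymous")


def scan_uploads(directory: Path) -> tuple[str, list[str]]:
    # Blocking filesystem calls; acceptable because this runs in a worker thread.
    names = sorted(p.name for p in directory.iterdir()) if directory.exists() else []
    return user_id_var.get(), names


async def scan_uploads_async(directory: Path) -> tuple[str, list[str]]:
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    # ctx.run makes the worker thread see the caller's context variables.
    call = functools.partial(ctx.run, scan_uploads, directory)
    return await loop.run_in_executor(None, call)


async def main(directory: Path) -> tuple[str, list[str]]:
    user_id_var.set("user-42")
    return await scan_uploads_async(directory)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "report.pdf").write_bytes(b"")
    result = asyncio.run(main(Path(tmp)))
```

Without the explicit `copy_context()`, a plain `run_in_executor` call would run the function in the worker thread's own context and the contextvar would fall back to its default.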
Ryker_Feng	e8e9edcb6e	fix(channels): ignore hidden control messages when extracting replies (#3219) (#3270)	2026-05-29 23:06:58 +08:00
AochenShen99	4093c83383	refactor(provider): share assistant payload replay matching (#3307 ) * Share assistant payload replay matching * fix(provider): recover assistant field when ordinal AI index is taken The mismatch-length fallback in `_match_ai_message` only tried the exact `fallback_ordinal` AI index. When serialization drops or reorders an assistant message, a unique signature match can consume a non-ordinal index, leaving a later ambiguous payload's ordinal already used — so its provider field (e.g. `reasoning_content`) was silently dropped. Scan forward from the ordinal for the next unused `AIMessage` (wrapping to earlier indices) to preserve the positional bias while still recovering the field. Forward scanning avoids a naive min-unused pick that could restore the wrong field after a leading message is dropped. Add a regression test for the dropped-leading-message case. * fix(provider): avoid earlier assistant fallback replay	2026-05-29 23:05:59 +08:00
AochenShen99	052b1e2102	test(runtime): add Blockbuster runtime anchor for JsonlRunEventStore async IO (#3313 ) * test(runtime): add Blockbuster runtime anchor for JsonlRunEventStore async IO #3084 offloaded `JsonlRunEventStore`'s file IO via `asyncio.to_thread` and added a mock-based offload assertion (`tests/test_jsonl_event_store_async_io.py`) that covers `put()` only. That guard is not part of the Blockbuster runtime gate (`tests/blocking_io/`) run by `backend-blocking-io-tests.yml`. Add a runtime anchor that drives the full async surface (`put`, `put_batch`, `list_messages`, `list_events`, `list_messages_by_run`, `count_messages`, `delete_by_run`, `delete_by_thread`) under the strict Blockbuster gate, so any blocking IO reintroduced on the event loop in any of these methods fails CI — not only removal of a specific `to_thread` call. Verified each offloaded method goes red when its offload is reverted. Test-only; no production change. * test(runtime): exercise list_events event_types filter branch Per review feedback: the anchor called list_events without event_types, so the filter branch never ran after _read_run_events' filesystem IO. Add a second list_events call with event_types=["message"] so the full read path -- including the filter branch -- executes under the gate.	2026-05-29 23:02:41 +08:00
Xinmin Zeng	ca487578a4	feat(agent): add ToolOutputBudgetMiddleware for oversized tool output protection (#3303 ) * feat(agent): add ToolOutputBudgetMiddleware for oversized tool output protection Closes #3289. Adds a unified middleware that enforces per-result budgets on ALL tool outputs (MCP, sandbox, community, custom), preventing oversized external tool results from blowing the model context window. Design informed by claude-code (persistToolResult), hermes-agent (tool_result_storage), and pi (OutputAccumulator) — the three most mature implementations in production coding-agent frameworks. Key features: - Disk externalization: oversized outputs written to thread-local .tool-results/ directory, replaced with compact preview + file reference. Model can read full output via read_file with offset/limit. - Fallback truncation: head+tail truncation when disk is unavailable (no thread_data, write failure), ensuring the context is always protected. - read_file exemption: prevents persist-read-persist infinite loops (independently discovered by claude-code, hermes-agent, and pi). - Per-tool threshold overrides via config. - Line-boundary-aware truncation (no partial lines in previews). - Multimodal content passthrough (images/structured blocks skip budget). - Historical ToolMessage patching in wrap_model_call for checkpoint recovery scenarios. Related: #3222 (design RFC), #1844 (comprehensive context management), #3137 (write_file args compaction), #1677 (sandbox tool truncation). * test: add MCP content_and_artifact format coverage Add 5 tests for MCP tool output format (list of content blocks): - text content blocks are extracted and budgeted - multiple text blocks are joined and budgeted - image content blocks are skipped (multimodal passthrough) - mixed text+image blocks are skipped - small text blocks pass through unchanged Total test count: 59 (was 54). 
* fix(agent): address Codex review findings for ToolOutputBudgetMiddleware Three issues identified by Codex code review, all fixed: 1. `enabled` config field was unused — middleware now checks `config.enabled` and skips all processing when disabled. 2. `_build_fallback` could exceed `fallback_max_chars` — the marker text itself (~139 chars) was not deducted from the budget. Now pre-computes marker overhead and falls back to hard slice when max_chars is smaller than the marker. 3. Sync file I/O in async path — `awrap_tool_call` now delegates `_patch_result` to `asyncio.to_thread` to avoid blocking the event loop during disk writes. Tests updated to use realistic fallback_max_chars values (500+) that can accommodate the marker overhead, plus two new tests: - `test_result_never_exceeds_max_chars` (parametric across sizes) - `test_very_small_max_chars_does_not_crash` * fix(agent): address Copilot review — path traversal, async perf, shared config 1. Path traversal defense: sanitize tool_name via _sanitize_tool_name() (strips separators, .., absolute paths), validate storage_subdir is relative, and verify resolved filepath stays inside storage_dir. 2. Async hot-path optimization: add _needs_budget() cheap check before asyncio.to_thread offload — small outputs (99% of calls) skip the thread overhead entirely. 3. Replace shared module-level _DEFAULT_CONFIG with _default_config() factory to prevent cross-instance mutation of mutable fields. 12 new tests: TestSanitizeToolName (5), TestExternalizePathTraversal (3), TestNeedsBudget (4). * fix(agent): correct preview hint to match read_file actual API read_file uses start_line/end_line (1-indexed line numbers), not offset/limit. The previous wording was copied from hermes-agent which has a different read_file interface. * perf(agent): hoist hot-path imports, add model-call pre-scan (review #3303) Address maintainer review feedback: 1. 
Hoist inline imports to module level — `import asyncio` (was in awrap_tool_call hot path) and `from dataclasses import replace` (was in _patch_result) now live at module top. 2. Add a cheap pre-scan to _patch_model_messages so the historical message list is not rebuilt on every model call when nothing is oversized (the common case once results are budgeted at tool-call time). Also adds the same _needs_budget gate to the sync wrap_tool_call for symmetry with awrap_tool_call. The pre-scan is refactored into per-tool-aware helpers (_effective_trigger / _tool_message_over_budget) that mirror the exact trigger conditions in _budget_content — including tool_overrides — so the fast-path can never produce a false negative (silently skipping budgeting for a tool with a low per-tool threshold). 7 new regression tests lock the per-tool-override-through-pre-scan path and the model-call early return. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-29 22:59:26 +08:00
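The fallback truncation described in the commit above (head plus tail, marker overhead deducted from the budget, hard slice when the budget is smaller than the marker, cuts snapped to line boundaries) can be sketched as below. The function name and marker wording are assumptions, not the middleware's actual `_build_fallback`.

```python
def truncate_head_tail(text: str, max_chars: int) -> str:
    """Keep the head and tail of text so the result never exceeds max_chars."""
    if len(text) <= max_chars:
        return text
    # Hypothetical marker text; its length is deducted from the budget up front.
    marker = f"\n[... {len(text)} chars total, middle omitted ...]\n"
    budget = max_chars - len(marker)
    if budget <= 0:
        return text[:max_chars]  # marker does not fit: fall back to a hard slice
    head_len = budget // 2
    tail_len = budget - head_len
    head = text[:head_len]
    tail = text[len(text) - tail_len:]
    # Snap both cuts to line boundaries where possible; this only shrinks them.
    cut = head.rfind("\n")
    if cut > 0:
        head = head[:cut]
    newline = tail.find("\n")
    if newline != -1 and newline + 1 < len(tail):
        tail = tail[newline + 1:]
    return head + marker + tail


sample = "\n".join(f"line {i}" for i in range(1000))
preview = truncate_head_tail(sample, 200)
```

Computing the marker before splitting the budget is the point of the fix mentioned in the commit: the marker counts against `max_chars` instead of being appended on top of it.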
Nan Gao	e683ed6a76	fix(runtime): guide malformed write_file recovery (#3040) * fix(runtime): guide malformed write_file recovery * fix(runtime): align write_file recovery guidance	2026-05-29 17:46:24 +08:00
Eilen Shin	872079b894	docs: clean standalone LangGraph server remnants (#3301)	2026-05-29 11:36:45 +08:00
john lee	cbf8b194e8	fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#3084 ) * fix(runtime): harden JSONL async I/O and DB put_batch thread validation (#2816) - JsonlRunEventStore: offload all file I/O to asyncio.to_thread() so the event loop is never blocked; add per-thread asyncio.Lock to serialise concurrent puts and prevent interleaved JSONL lines - Split _ensure_seq_loaded into a sync _compute_max_seq (runs in thread) and an async wrapper; seq counter is recovered from disk on fresh store init - DbRunEventStore.put_batch: raise ValueError when events span multiple thread_ids (previously silently assumed same thread) - Add test_jsonl_event_store_async_io.py: 12 tests covering lock reuse, concurrent seq monotonicity, disk recovery, and mixed-thread batch rejection Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: address Copilot review comments - delete_by_thread: pop _write_locks after releasing the lock to prevent unbounded growth when threads are repeatedly created and deleted - tests: add regression guard asserting asyncio.to_thread is called for _write_record in put(); assert _write_locks entry removed on delete * fix(lint): move patch import to local scope to fix ruff I001 * fix(lint): apply ruff check+format fixes to test file * fix(runtime): address review feedback for JSONL async I/O hardening (#2816) Use setdefault for atomic lock init in _get_write_lock; pop _write_locks inside the held lock scope in delete_by_thread; update test docstring and assert lock entry also cleared on delete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: rayhpeng <rayhpeng@gmail.com>	2026-05-29 09:27:53 +08:00
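A simplified sketch of the locking and offload scheme in the commit above: one `asyncio.Lock` per thread id, created with `setdefault`, and file appends pushed through `asyncio.to_thread`. `JsonlStore` and its file layout are hypothetical and much smaller than the real JsonlRunEventStore.

```python
import asyncio
import json
import tempfile
from pathlib import Path


class JsonlStore:
    """Hypothetical append-only store keyed by thread id."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._locks: dict[str, asyncio.Lock] = {}
        self._seq: dict[str, int] = {}

    def _lock_for(self, thread_id: str) -> asyncio.Lock:
        # setdefault keeps at most one lock per thread id.
        return self._locks.setdefault(thread_id, asyncio.Lock())

    def _append(self, thread_id: str, record: dict) -> None:
        # Blocking write; only ever called through asyncio.to_thread.
        path = self._root / f"{thread_id}.jsonl"
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps(record) + "\n")

    async def put(self, thread_id: str, payload: str) -> int:
        # The lock serialises sequence assignment and the write together,
        # so lines in the file appear in sequence order.
        async with self._lock_for(thread_id):
            seq = self._seq.get(thread_id, 0) + 1
            self._seq[thread_id] = seq
            record = {"seq": seq, "payload": payload}
            await asyncio.to_thread(self._append, thread_id, record)
            return seq


async def demo(root: Path) -> list[int]:
    store = JsonlStore(root)
    await asyncio.gather(*(store.put("t1", f"m{i}") for i in range(20)))
    lines = (root / "t1.jsonl").read_text(encoding="utf-8").splitlines()
    return [json.loads(line)["seq"] for line in lines]


with tempfile.TemporaryDirectory() as tmp:
    seqs = asyncio.run(demo(Path(tmp)))
```

Holding the lock across the `to_thread` call is what prevents interleaved JSONL lines; the event loop stays free to run other coroutines while the write happens in the worker thread.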
Nan Gao	d46a5779bc	fix(chat): preserve messages after summarization (#3280) * fix(chat): preserve messages after summarization * make format * fix(chat): address summarization review comments	2026-05-29 08:24:47 +08:00
Xinmin Zeng	2ace78d1e5	fix(frontend): surface backend detail when agent name check fails (#3048 ) * fix(frontend): surface backend detail when agent name check fails The new-agent page caught AgentNameCheckError but only branched on reason === "backend_unreachable". Everything else (notably the 422 "Invalid agent name '...'. Must match ^[A-Za-z0-9-]+$" response from GET /api/agents/check when the user submits a name with disallowed characters — trailing space, dot, Chinese, invisible whitespace from copy-paste) fell through to the generic fallback "Could not verify name availability — please try again", swallowing the detail that already told the user exactly what to fix. Add a request_failed branch that surfaces err.message (which checkAgentName already populates from the backend's detail at core/agents/api.ts). The disabled / backend_unreachable / unknown- error paths are unchanged. Pin the contract with unit tests covering: 200 success, fetch rejection, 502/503/504 network errors, agents_api disabled detail, 422 validation detail carried verbatim, statusText fallback when detail is absent, and a regression guard against misclassifying a 422 as agents_api disabled. Closes #3041 * fix(frontend): localise the error prefix when surfacing backend detail The previous commit surfaced the backend's raw `err.message` on the new-agent page when the name check failed. The detail itself is English (backend's `_validate_agent_name` text, any 5xx business message, etc.) and dropping it bare into a zh-CN page produced a jarring English-among-Chinese line that didn't match neighbouring strings like "已存在同名智能体" / "无法验证名称可用性". Add `nameStepCheckErrorWithDetail` as a templated string ("Name check failed: {detail}" / "名称校验失败：{detail}"), mirroring the existing `nameStepBootstrapMessage` `{name}` template pattern. The page wraps `err.message` in it when present and falls back to the plain `nameStepCheckError` when the detail is empty. 
Rendered output (verified locally with a Console fetch mock that returns 500 + detail): zh-CN: 名称校验失败：Database connection lost: SQLAlchemy connection pool exhausted (max 5 connections, all in use) en-US: Name check failed: Database connection lost: SQLAlchemy connection pool exhausted (max 5 connections, all in use) The localised prefix tells the user what operation failed; the raw detail tells them why. Translating the detail itself would be lossy (any unbounded backend string would need a translation table) and would break the debuggability the previous commit delivered. Refs #3041 * fix(frontend): distinguish backend detail from generated fallback in AgentNameCheckError Addresses Copilot's review on #3048: the previous commits keyed off `err.message`, but `checkAgentName` substitutes a generated fallback string ("Failed to check agent name: ${statusText}") when the backend sent no detail. That guaranteed `err.message` was always truthy, made the `nameStepCheckError` fallback branch unreachable in practice, and could surface awkward strings like "名称校验失败：Failed to check agent name: Bad Gateway" in the UI. Add an explicit `detail: string \| null` field to AgentNameCheckError. `checkAgentName` populates it only when the backend response actually carried a string `detail` (defensive guard against the dict-shaped detail that other deer-flow endpoints use for typed error codes). The new-agent page now selects on `err.detail` instead of `err.message` so the localised fallback wins when no real detail exists. Also fix the prettier formatting that broke lint-frontend CI on the previous push. Test changes: - The 422 carry-through test now asserts both `detail` and `message` hold the backend string verbatim. - A new "falls back to statusText in message but leaves detail null" test pins the contract that no real detail ⇒ no UI surface leak. - A new "treats non-string detail as null" test guards against future backend schema drift toward dict-shaped detail. 
Refs #3041 #3048	2026-05-28 18:38:45 +08:00
AochenShen99	8330b244a9	docs: add blocking IO detection usage and maintenance (#3233) * docs: add blocking IO detection usage and maintenance * docs: address blocking io doc review feedback	2026-05-28 18:26:26 +08:00
AochenShen99	44677c5eb4	feat(provider) Add patched MiMo reasoning content support (#3298) * Add patched MiMo reasoning content support * Clarify MiMo patched model coverage * Remove unused MiMo payload index * Address MiMo review nits	2026-05-28 18:24:32 +08:00
Admire	2fdfff0db3	fix(frontend): fix Mermaid preview failure in historical messages (#3196) * fix(frontend): render historical mermaid diagrams * fix(frontend): address mermaid review feedback * Stabilize cancel lifecycle test * fix(frontend): handle mermaid fence variants * fix(frontend): normalize mermaid arrow spacing * fix(frontend): handle mermaid CRLF fences * chore: keep mermaid fix frontend-scoped --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-28 18:20:02 +08:00
zgenu	737abc0e45	fix: ignore stale run reconnect conflicts (#3284) * fix: ignore stale run reconnect conflicts * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix: ignore stale run reconnect conflicts --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-28 17:29:30 +08:00
AochenShen99	8decfd327e	Fix custom skill install permissions (#3241) * Fix custom skill install permissions * Fix skill upload test portability * Keep custom skill writes sandbox readable * Clear sandbox write bits on skill permissions * Limit custom skill write permission updates	2026-05-28 15:48:32 +08:00
Xinmin Zeng	0287240728	fix(frontend): show new thread in sidebar immediately on creation (#3276 ) (#3283 ) When a user starts a new conversation, the sidebar list did not display it until the AI finished streaming and generated a title. This made it impossible to switch back to an in-progress conversation when working with multiple threads concurrently. Optimistically insert the new thread into the TanStack Query cache during the `onCreated` callback so the sidebar renders a placeholder entry ("New chat") as soon as the backend acknowledges thread creation. The existing `onUpdateEvent` title handler and `onFinish` query invalidation then update the entry in-place with the real title.	2026-05-28 15:27:38 +08:00
Lucy Shen	37451500eb	fix(gateway): split stream_existing_run into per-method routes for unique OpenAPI operationIds (#3228 ) * fix(gateway): split stream_existing_run into per-method routes for unique OpenAPI operationIds `@router.api_route("/.../stream", methods=["GET", "POST"])` registers a single FastAPI route that holds both methods. FastAPI's auto-generated `operationId` is computed once per route from a single method picked out of `route.methods`, so when OpenAPI generation iterates over every method on that route both end up sharing the same `operationId`. That triggers `UserWarning: Duplicate Operation ID stream_existing_run_..._stream_(get\|post) for function stream_existing_run` during `app.openapi()` and produces an invalid OpenAPI spec for SDK / codegen consumers. Register GET and POST as two separate routes on the same handler so each method gets a distinct auto-generated `operationId` ("..._stream_get" and "..._stream_post"). Behavior is otherwise unchanged: same handler, same `require_permission` decoration, same response. Add `tests/test_openapi_operation_ids.py` to lock in the invariant: no duplicate-operationId warnings during spec generation, globally unique operationIds across the spec, and distinct GET / POST operationIds on the stream endpoint specifically. Reverted the source change locally and confirmed all three tests fail before the fix. * test(runtime): widen CancelledError catch in _ScriptedAgent to fix cancel-race flake `_ScriptedAgent.astream()` previously only caught `asyncio.CancelledError` inside the inner `if self.block_after_first_chunk:` while-loop. Cancellation arriving during any earlier `await` in the same body (`self.model.ainvoke`, `_write_checkpoint`, the `yield`) would propagate without setting `controller.cancelled`, so callers waiting on `controller.cancelled.wait(5)` after `POST /cancel` returned 204 could race and time out. 
`test_cancel_interrupt_stops_running_background_run` waits only for the `started` event (set on the first line of `astream`) before issuing cancel, so its race window spans all three pre-loop `await`s. On a clean `main` checkout, stress-running the test 20× reproduces the failure 6/20 (~30%). `test_cancel_rollback_restores_pre_run_checkpoint`, which waits for the later `checkpoint_written` event, passes 20/20 — confirming the race lives entirely in the gap between `started.set()` and the cancellation-aware block. Widen the try/except to cover the entire `astream` body so any `CancelledError` sets the controller event; the non-cancel path is unchanged (no exception means no event set). After this change the previously flaky test passes 50/50, the rollback test still passes 30/30, and the full backend suite remains at 3649 passed / 19 skipped. Test-only change — `backend/tests/test_runtime_lifecycle_e2e.py` is the only file touched; the production cancel pipeline is unaffected.	2026-05-28 08:20:52 +08:00
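The cancel-race fix in the commit above comes down to where the `try/except asyncio.CancelledError` sits. A small sketch with hypothetical names (`Controller`, `run_agent`), assuming the same idea: the handler covers the whole body, so a cancel that lands on an early `await` still sets the event.

```python
import asyncio


class Controller:
    """Hypothetical pair of events observed by the test."""

    def __init__(self) -> None:
        self.started = asyncio.Event()
        self.cancelled = asyncio.Event()


async def run_agent(controller: Controller) -> None:
    # The try block wraps the entire body, not only the final blocking loop.
    try:
        controller.started.set()
        await asyncio.sleep(0.05)  # stands in for an early await before the loop
        while True:
            await asyncio.sleep(0.05)
    except asyncio.CancelledError:
        controller.cancelled.set()
        raise  # keep normal cancellation semantics


async def main() -> bool:
    controller = Controller()
    task = asyncio.create_task(run_agent(controller))
    await controller.started.wait()
    task.cancel()  # arrives while the task is still in the early await
    await asyncio.wait_for(controller.cancelled.wait(), timeout=1)
    try:
        await task
    except asyncio.CancelledError:
        pass
    return controller.cancelled.is_set()


observed = asyncio.run(main())
```

If the `try` covered only the `while` loop, the cancel in `main` would propagate out of the first `sleep` without setting `controller.cancelled`, which is the race the commit message describes.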
Lawrance_YXLiao	3cb75887c1	fix(memory): parse wrapped memory update json responses (#3252) * fix(memory): parse wrapped memory update json responses * test(memory): format wrapped response coverage * fix(memory): guard malformed nested memory facts * fix(memory): require full update object when parsing responses * fix(memory): fail closed on unsafe partial removals * style(memory): format updater tests	2026-05-28 07:46:44 +08:00
AochenShen99	a5599c100c	fix(gateway): honour on_disconnect on /wait endpoints (#3267 ) * fix(gateway): honour on_disconnect on /wait endpoints (#3265) The non-streaming /threads/{tid}/runs/wait and /runs/wait handlers used to await record.task directly with no disconnect handling and silently swallow CancelledError. When a long tool call (e.g. pip install inside a custom skill) kept the connection idle long enough for an intermediate HTTP layer to time out, the handler would still read the in-progress checkpoint and return it as if the run had completed normally -- masking a half-finished run as a successful response. Add wait_for_run_completion in app.gateway.services that mirrors sse_consumer's bridge-consumption pattern: subscribe to the stream bridge until END_SENTINEL, poll request.is_disconnected on every wake-up, and on real client disconnect cancel the background run when record.on_disconnect is "cancel". Wire it into both wait endpoints. The streaming path was unaffected because sse_consumer already has this loop; this just brings /wait to parity. * fix(gateway): skip checkpoint serialization on /wait disconnect Copilot review on #3267 caught a follow-on of the same #3265 bug: when the client disconnects, wait_for_run_completion breaks out of the bridge loop and cancels the run, but the /wait endpoint then continues to read the checkpointer and serializes whatever partial checkpoint exists as a normal 200 response. Have the helper return a bool — True only when END_SENTINEL was observed — and skip the checkpoint serialization path on False. Also reorder the inner check so END_SENTINEL is honoured even when is_disconnected() flips true in the same iteration; the run truly finished so the real final checkpoint is still valid.	2026-05-28 07:22:39 +08:00
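The wait loop described in the commit above can be approximated with an `asyncio.Queue` standing in for the stream bridge. Everything here is an assumption made for illustration (`wait_for_completion`, the queue, the `is_disconnected` callable); only the behaviour follows the commit text: return True only when the end sentinel was seen, and check the sentinel before the disconnect flag.

```python
import asyncio

END_SENTINEL = object()  # hypothetical end-of-run marker


async def wait_for_completion(queue, is_disconnected, poll_interval=0.01) -> bool:
    """Return True only if the end sentinel was observed before a disconnect."""
    while True:
        try:
            item = await asyncio.wait_for(queue.get(), timeout=poll_interval)
        except asyncio.TimeoutError:
            item = None  # woke up with nothing new: still poll the connection
        # Sentinel first: a finished run stays valid even if the client
        # disconnects in the same iteration.
        if item is END_SENTINEL:
            return True
        if await is_disconnected():
            return False


async def demo() -> tuple[bool, bool, bool]:
    async def connected() -> bool:
        return False

    async def gone() -> bool:
        return True

    finished = asyncio.Queue()
    finished.put_nowait("chunk")
    finished.put_nowait(END_SENTINEL)
    done = await wait_for_completion(finished, connected)

    both = asyncio.Queue()
    both.put_nowait(END_SENTINEL)
    same_iteration = await wait_for_completion(both, gone)

    dropped = await wait_for_completion(asyncio.Queue(), gone)
    return done, same_iteration, dropped


results = asyncio.run(demo())
```

A caller in this sketch would skip checkpoint serialization when the function returns False, and would separately cancel the background run if its disconnect policy asks for that.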
dependabot[bot]	9e332c594a	chore(deps): bump uuid from 10.0.0 to 14.0.0 in /frontend (#3281 ) Bumps [uuid](https://github.com/uuidjs/uuid) from 10.0.0 to 14.0.0. - [Release notes](https://github.com/uuidjs/uuid/releases) - [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md) - [Commits](https://github.com/uuidjs/uuid/compare/v10.0.0...v14.0.0) --- updated-dependencies: - dependency-name: uuid dependency-version: 14.0.0 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-05-28 07:14:44 +08:00
Willem Jiang	162fb2143e	fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyio RuntimeError (#3203) (#3224) * fix(mcp): skip session pooling for HTTP/SSE transports to avoid anyio RuntimeError (#3203) HTTP/SSE transports use anyio.TaskGroup internally for streamable connections. These task groups have cancel scopes bound to the async task that created them, so closing a pooled session from a different task raises RuntimeError. Restrict session pooling to stdio transports only. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * docs: clarify MCP pooling applies only to stdio tools Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2dd9881d-54c6-45fd-90bc-154a09e29841 Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>	2026-05-27 08:32:57 +08:00


2215 Commits