deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-07-27 16:37:55 +00:00

Author	SHA1	Message	Date
qin-chenghan	795af20a6b	feat(memory): built-in FTS5/BM25 retrieval adapter (#4360 ) * feat(memory): integrate FTS5 retrieval adapter * deps: add jieba as default dependency for Chinese tokenization Without jieba, FTS5 unicode61 tokenizer treats entire Chinese sentences as single tokens, making single-character or sub-phrase searches impossible (e.g. '吃' or '油泼面' returns 0 hits against '用户喜欢吃油泼面'). jieba segments Chinese text into meaningful tokens before indexing. * fix(memory): avoid treating hyphens as FTS5 operators * feat(memory): make Chinese tokenization optional * fix(memory): warm every requested retrieval scope * fix(memory): close retrieval resources on shutdown * fix(memory): close backend when shutdown flush fails * fix(memory): recreate corrupt retrieval index * fix(memory): tolerate partial retrieval rebuilds * fix(memory): warm retrieval index in background * fix(memory): preserve shutdown flush budget * fix(memory): stop retrying partial lazy rebuilds * fix(memory): close retrieval through storage * refactor(memory): simplify retrieval scope limit * docs(memory): clarify retrieval shutdown lifecycle --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-27 23:17:18 +08:00
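The hyphen fix above concerns FTS5 query syntax, where a bare term such as `deer-flow` is not read as plain text. A minimal sketch of one common workaround, using a hypothetical helper `to_fts5_match` (the name and the whitespace-splitting policy are assumptions, not taken from the commit): wrap every term in double quotes, doubling any embedded quote, so FTS5 treats it as a string.

```python
def to_fts5_match(text: str) -> str:
    """Quote each whitespace-separated term so FTS5 reads it as a string.

    Hypothetical helper for illustration; the adapter in the commit may
    escape user input differently.
    """
    terms = text.split()
    # Inside an FTS5 string, a double quote is escaped by doubling it.
    quoted = ['"' + term.replace('"', '""') + '"' for term in terms]
    # Strings separated by whitespace are implicitly AND-ed by FTS5.
    return " ".join(quoted)
```

The result is meant to be bound as the parameter of a `MATCH` expression; an empty input yields an empty string, which a caller would need to skip rather than pass to FTS5.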
Zhipeng Zheng	838037188e	feat(channels): share inbound webhook dedupe across pods via Postgres (#4210 ) * feat(channels): share inbound webhook dedupe across pods via Postgres (#4120) * ci: run cross-pod inbound dedupe integration tests in CI Expose the job Postgres service via DEDUPE_TEST_POSTGRES_URL so the integration tests (issue #4120) actually execute instead of silently skipping. Normalize the URL for asyncpg (postgresql:// -> +asyncpg, drop libpq-only sslmode) and await the now-async _is_duplicate_inbound in test_github_dispatcher.	2026-07-27 23:07:40 +08:00
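The URL normalization described in the CI step above (switch the scheme to the asyncpg driver, drop the libpq-only `sslmode` parameter) can be sketched with the standard library alone. The function name `normalize_for_asyncpg` is hypothetical and the real test helper may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_for_asyncpg(url: str) -> str:
    """Rewrite a postgresql:// URL for the asyncpg driver (illustrative only)."""
    parts = urlsplit(url)
    scheme = parts.scheme
    if scheme == "postgresql":
        scheme = "postgresql+asyncpg"
    # sslmode is a libpq option; remove it and keep every other parameter.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "sslmode"
    ]
    return urlunsplit(
        (scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )
```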
Ryker_Feng	fcbf0609b0	feat(chat): edit and rerun latest user turn (#4377) Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-27 22:46:51 +08:00
Vanzeren	6f53fd5e99	feat(runtime): enforce artifact delivery from workspace snapshots (#4494)	2026-07-27 22:27:16 +08:00
Diego Câmara	ac18f518c8	feat(sandbox): add Tenki cloud sandbox provider (#4382 ) * feat(sandbox): add Tenki cloud sandbox provider Adds deerflow.community.tenki, a SandboxProvider backed by Tenki cloud microVMs, alongside the existing e2b_sandbox / boxlite / aio_sandbox backends. Selected via `sandbox.use: deerflow.community.tenki:TenkiSandboxProvider` (resolved by class path, so the change is purely additive). The full Sandbox contract is implemented — execute_command plus read/write/update/download_file and list_dir/glob/grep — with file ops run as busybox-portable shell commands (cat / find / grep / chunked base64), reusing deerflow.sandbox.search, mirroring e2b_sandbox and boxlite. Tenki's SDK is synchronous, so unlike boxlite there is no event-loop bridge. Tenki sandboxes run as an unprivileged user with /mnt root-owned, so the /mnt/user-data virtual prefix is remapped under the writable home dir (like e2b_sandbox); the provider also best-effort sudo-symlinks /mnt/user-data to that home dir so agent shell commands using the literal path still work. Sandboxes are pooled per (user, thread) with warm reclaim, a replica cap, and an idle reaper via the shared WarmPoolLifecycleMixin. Transient transport blips get one bounded retry; terminal session errors evict and recreate. Only the stable Tenki surface is used (create/terminate + exec/shell/fs) — no volumes, snapshots, or template builds — so any stock base image works. The tenki-sandbox SDK is an optional extra (deerflow-harness[tenki]) and is imported lazily, so a default install and every other provider are unaffected. Tested: unit suite runs in CI without tenki-sandbox installed; a live integration test and full-surface e2e were verified against real Tenki sandboxes. * fix(sandbox): remove unsafe auto-retry from Tenki exec Pre-merge review caught that the transient-transport retry sat at the universal _exec layer, so it retried every operation — execute_command and base64 file-write chunks included. 
gRPC has no exactly-once guarantee: a "socket closed" ack-drop after the server already ran the op means the retry runs it twice, double-firing command side effects and duplicating a write chunk mid-file (silent binary corruption on multi-chunk writes). exec is not idempotent, so it must not be auto-retried. Reverts to the boxlite/e2b behavior: a transient error surfaces to the caller (returned as text by execute_command, raised by the file ops); a terminal session error still evicts the sandbox so the next acquire rebuilds it. Verified live end-to-end across 31 edge cases (empty/binary/unicode/chunk-boundary files, shell-metachar content, error paths, list/glob/grep, warm-pool reclaim, concurrency). * fix(sandbox): address Tenki provider review feedback - Use Tenki's native sandbox.fs API for all file transport (read_text, read_bytes, write_stream, mkdir) instead of cat/chunked-base64 over shell. Uploads stream in 1 MiB frames; append is read-modify-write because the write stream has no append mode (same approach as community/e2b_sandbox). - download_file streams via fs.read_stream and enforces the 100 MB cap on bytes actually received, closing the TOCTOU window between the old wc -c size probe and the read. - list_dir/glob/grep report paths back under /mnt/user-data instead of the sandbox-internal home dir, so results feed straight into the file APIs. - Create with wait=False and await wait_ready() here: create(wait=True) raises with the session handle still inside the SDK, leaking a running microVM this provider could never terminate. - Configure the sandbox lifetime (max_duration, default 4h) and expose sticky; without it Tenki reaps a reused thread's sandbox after ~30 min. - close() terminates before marking the adapter closed and re-raises real failures, so a failed termination stays retryable instead of silently leaking a billed microVM; an already-gone session still counts as closed. 
- Bump the optional extra to tenki-sandbox>=0.4.0 and commit backend/uv.lock. * fix(sandbox): scope tenki grep() glob filter to its directory prefix Mirrors #4168, which fixed the same defect in the E2B provider. The tenki adapter reduced a directory-scoped pattern like "src/.js" to its basename before filtering, so the search silently broadened to every matching-extension file in the tree. Post-filter grep's hits through path_matches() against the path relative to the search root, the same way glob() already does, so both agree on what a directory-scoped pattern means. fix(sandbox): address Tenki provider review — eviction, id width, write lock, grep -H Four fixes from the upstream review: download_file no longer swallows terminal transport errors. The broad `except OSError: raise` re-raised ConnectionError/BrokenPipeError/EOFError (all OSError subclasses that _is_terminal_failure treats as terminal) before _note_failure ran, so a session that died mid-download was never evicted. Only our own EFBIG size-cap now passes through without eviction. Sandbox id widened from 32 to 64 bits (`[:8]` to `[:16]`), matching community/e2b_sandbox. The warm pool is keyed by this id with no full-seed fallback, so a collision could let one user reclaim another's parked sandbox on a multi-tenant gateway. _fs_op now holds the lock across the op, not just the fs lookup, so concurrent calls on the same sandbox serialise over the SDK's shared connection. The eviction callback runs after the lock is released to avoid a lock-order deadlock with the provider. The append read-modify-write is serialised by a dedicated _write_lock so two concurrent appends can't clobber each other. grep passes -H so a search whose path resolves to a single file still prints the filename; without it the file:line:text unpack dropped every match. 
* fix(sandbox): address Tenki provider review round 2 - validate config `environment` at load time (_validate_extra_env) so a bad key fails fast instead of surfacing as an SDK error mid-command - document the deliberate lock decision in download_file: the instance lock is dropped before streaming so a 100 MB download can't block every other tool; terminal transport errors still evict via _note_failure - tighten the terminal-error comment to note ConnectionError/BrokenPipeError/EOFError are also treated terminal via isinstance - document TenkiSandboxProvider in backend/AGENTS.md (provider detail, warm-pool destroy hook, community provider list) - add a commented Tenki block to config.example.yaml for parity with AIO/BoxLite - tests: config env validation, grep -F/case-sensitive flags, glob include_dirs, list_dir max_depth, bootstrap-failure warning branch * fix(sandbox): make Tenki bootstrap non-interactive and time-bounded The create-time bootstrap runs under the per-scope acquire lock, so a hang would stall acquire for that scope indefinitely: - use `sudo -n` so a password-requiring sudoers entry fails fast (swallowed by the existing `|| true`) instead of blocking on a tty password prompt - pass a timeout to the bootstrap `remote.exec` so any other stall drops to the existing warning path rather than wedging acquire Best-effort by design; the file APIs still work via the home remap on failure. * test(sandbox): pin Tenki bootstrap timeout to its actual value Assert bootstrap["timeout"] == _BOOTSTRAP_TIMEOUT instead of `is not None`, so a regression to timeout=0 (treated as no timeout by some SDKs) or an unrelated value is caught rather than passing a weaker non-None check. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-27 22:20:07 +08:00
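The Tenki entry above enforces the download cap on bytes actually received rather than on a size probed beforehand. A minimal sketch of that idea, assuming the stream is exposed as an iterable of byte chunks; `read_capped` and `MAX_DOWNLOAD_BYTES` are hypothetical names, and reading "100 MB" as 100 MiB is an assumption.

```python
import errno
from typing import Iterable

# Assumption: the commit's "100 MB" is read here as 100 MiB.
MAX_DOWNLOAD_BYTES = 100 * 1024 * 1024


def read_capped(chunks: Iterable[bytes], limit: int = MAX_DOWNLOAD_BYTES) -> bytes:
    """Collect streamed chunks, failing as soon as the running total exceeds limit."""
    received = bytearray()
    for chunk in chunks:
        # Check against what has really arrived, so a file that grows after
        # any earlier size probe still cannot slip past the cap.
        if len(received) + len(chunk) > limit:
            raise OSError(errno.EFBIG, "download exceeds size cap")
        received.extend(chunk)
    return bytes(received)
```

Raising `OSError` with `EFBIG` mirrors the commit's note that only its own size-cap error passes through without evicting the sandbox; any other `OSError` from the transport would be handled separately.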
Daoyuan Li	5ce3cecf2a	Fix concurrent thread metadata merges (#4489)	2026-07-27 22:18:02 +08:00
March-77	b22f85c686	fix(sandbox): reconcile E2B sandboxes safely (#4443) * fix(sandbox): reconcile E2B sandboxes safely * fix(sandbox): clear failed E2B adoption intent	2026-07-27 14:10:24 +08:00
阿泽	1baa8ad696	feat(clarification): structured form fields for human-input cards (#4400 Phase 1) (#4406 ) * feat(clarification): structured form fields for human-input cards Add a request-side v2 `form` mode to the ask_clarification protocol so business flows (e.g. expense reimbursement) can collect several values in one card instead of sequential free-text questions: - `ask_clarification` gains a restricted `fields` parameter (text / textarea / number / select / multi_select / checkbox / date) - ClarificationMiddleware validates and normalizes fields explicitly (whitelisted types, unknown -> text, select-likes without options -> text, duplicate/invalid entries dropped, all-invalid falls back to the legacy modes) since the middleware short-circuits before tool execution; the plain-text fallback lists fields for IM channels - Form payloads carry `version: 2` so older frontends degrade to the text fallback; replies stay on the v1 response protocol — the card submits a readable summary as `response_kind: "text"`, so journal persistence and answered-card recovery are unchanged - Frontend renders typed field controls with required-field validation and compact multi-select chips Part of #4400 (scope narrowed per maintainer feedback: request-side only, no new response kinds, no top-level multi_choice). 
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(clarification): harden form protocol per review feedback Address the five review points on #4406: - Reject field names colliding with JS Object.prototype members on both sides; frontend reads form values via own-property access only, so `constructor`/`toString`-style names can no longer leak inherited members into required validation or the submitted summary - Close open requests answered through the legacy text fallback: a visible plain human reply (no response metadata) now marks every previously-opened request as answered, so upgrading to a v2-aware frontend cannot leave the composer locked on an already-answered card - Give checkbox fields deterministic boolean semantics: values are seeded to an explicit false ("no" in the summary) and `required` means must-agree/consent; documented in the tool schema - Make middleware field validation atomic: structurally broken entries (bad/duplicate/reserved names, over-cap field/option counts or text lengths) degrade the whole form instead of silently dropping fields; options are trimmed/deduped with blanks removed so the backend never emits payloads the frontend parser rejects - Associate form labels/controls (htmlFor/id), aria-required, aria-invalid, and error descriptions for accessibility Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * refactor(clarification): type the fields item schema via TypedDict Replace `fields: list[dict[str, Any]]` with `list[ClarificationFormField]` (a TypedDict with `name` required and the type whitelist as a Literal) so the provider-facing tool schema documents the item shape instead of an opaque object relying on the docstring. Runtime validation is unchanged and stays in ClarificationMiddleware, which intercepts the call before tool execution. Addresses the non-blocking review suggestion on #4406. 
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(frontend): drop unsupported aria-invalid from multi-select group jsx-a11y: role=group does not support aria-invalid; the error linkage stays via aria-describedby. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(clarification): coerce numeric required flags and normalize fields once - `_normalize_bool` now coerces 1/0 (some providers serialize booleans as integers), so `required: 1` no longer silently flips to optional - `_handle_clarification` normalizes `fields` once and passes the result to both the text fallback and the payload builder Addresses the non-blocking review nits on #4406. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(clarification): harden form protocol per contract review round 2 Backend: - Guard unhashable JSON in the intercept path: `type: []`/`{}` degrades the field to text and `clarification_type: []` coerces to str instead of raising TypeError (which, with return_direct, ended the turn with an error and no card or fallback) - Add a total budget over the serialized normalized fields (16KB UTF-8 bytes): per-item caps alone admitted forms whose IM text fallback exceeded channel delivery limits (Slack 40k chars, Feishu ~30KB card), silently truncating trailing fields; a boundary test proves any accepted form's fallback stays deliverable Frontend: - Submission value now appends a JSON block keyed by stable field names (readable summary alone is delimiter-ambiguous), with a collision regression test - Parser boundary tightened to match backend constraints: empty option values (Radix SelectItem crash), duplicate option ids/values, duplicate field names, and the form<->version-2 binding are rejected - Keep the error node mounted while any field is still invalid so aria-describedby never points at a removed element (happy-dom interaction test) - Required semantics are now accessible: native checkbox control (no HTML required attribute — it would intercept the custom 
submit path), visually-hidden localized "required" markers next to the aria-hidden asterisks - Legacy-fallback closure narrowed to the latest unanswered request: nothing guarantees a single outstanding clarification across runs, and closing all would silently swallow older decisions; an older request left open becomes the active card again Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * fix(frontend): keep clarification selects controlled --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-07-27 14:05:31 +08:00
Vanzeren	e01173d8b2	bench(checkpoint): production-shaped full/delta benchmark with configurable snapshot frequency (#4467 ) * feat(checkpoint): production-shaped full/delta benchmark with configurable snapshot frequency - Group benchmark scripts into per-family folders (checkpoint/, sandbox/) - Extract shared benchmark infrastructure into checkpoint_bench_common.py - Add checkpoint_delta_snapshot_frequency config (default 1000, process-frozen); freeze it in make_lead_agent and DeerFlowClient; key the state-schema adaptation cache by resolved frequency - New bench_production.py: per-case child processes run N ainvoke turns through the real lead-agent graph (scripted deterministic model, real AsyncSqliteSaver), then measure GET /state + POST /history through the real Gateway route stack in one event loop (httpx ASGITransport), cold/warm accessor-cache split, cross-mode digest gates - New summarize_production.py: delta/full ratios plus decision metrics (snapshot_write_spike, cache_effect_ms, checkpoint_write_share, auto-discovered history per-limit ratios) * fix(checkpoint): address production benchmark review	2026-07-27 11:47:49 +08:00
March-77	2e5c8da257	fix(sandbox): bypass proxies for local AIO traffic (#4444) * fix(sandbox): bypass proxies for local AIO traffic * fix(sandbox): classify public IPv6 proxy targets	2026-07-27 07:47:39 +08:00
Huixin615	090e80c1dd	fix(runtime): fail-stop runs when lease ownership cannot be confirmed (#4431) * fix(runtime): fail-stop runs after lease expiry * test(runtime): cover late successful lease renewal --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-27 07:25:34 +08:00
Huixin615	1cd5dea336	fix(streaming): signal replay history gaps (#4426) * fix(streaming): signal replay history gaps * fix(streaming): guard initial Redis replay window * fix(frontend): align inactive gap recovery --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-27 07:13:06 +08:00
Aari	244ce7739f	fix(runtime): linearize delta-mode checkpoint resume (#4460 ) * fix(runtime): linearize delta-mode checkpoint resume Resuming a run from an older checkpoint forks the lineage, and in delta mode that fork's state cannot be materialized correctly: the delta history walk collects every pending_writes entry stored on each on-path ancestor, but a shared parent also carries the writes of the sibling child that was abandoned. Those writes replay into the fork, so the run starts from a message list that still contains the answer it was meant to replace — regenerating in a branched thread surfaced this as the superseded assistant message reappearing beside the new one after a reload. All three saver implementations are affected, so write-to-child ownership is a gap in the upstream delta contract rather than one saver's slip. Rather than reimplement that walk, express the fork as what it means: materialize the requested checkpoint's state, write it as an Overwrite on the current head (which has no siblings), and run linearly. The abandoned turn stays in history as the rewritten head's ancestry. This runs after the rollback point is captured, so cancel-with-rollback still restores the real pre-run head, and fails closed — an unreadable resume checkpoint raises instead of falling back to the corrupt fork. Full mode keeps forking: its checkpoints carry complete channel_values and need no replay. * fix(runtime): restore complete delta resume state * fix(runtime): linearize delta rollback restoration * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(runtime): serialize delta resume preparation --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-07-26 21:59:19 +08:00
DanielWalnut	bb9f67aaf1	fix(runtime): close cancelled replacement admission (#4472)	2026-07-26 21:57:39 +08:00
Vanzeren	1c7531242c	feat(runtime): record terminal artifact delivery receipts (slice 1 of #4272) (#4365) * feat(runtime): record terminal artifact delivery receipts (#4272) * fix(runtime): persist delivery receipts across recovery * test(runtime): cover delivery receipt invariants * fix(runtime): preserve terminal status on receipt outages	2026-07-26 21:45:47 +08:00
lllyfff	8145d66a33	feat(memory): memory message processing (#4447 ) * feat(memory): signals-based update pipeline + always-on watermark/trivial filter Refactor the DeerMem memory update pipeline (message_processing -> queue -> updater) around a signals frozenset seam, replacing the (filtered, correction_detected, reinforcement_detected) 3-tuple with (filtered, signals: frozenset[str]) end to end. message_processing: - Externalize signal-detection patterns to YAML (message_patterns/.yaml). - Extend signals from correction/reinforcement to a 6-class set (correction/reinforcement/preference/identity/goal/decision); detect_signals returns a frozenset aligned with the fact category enum. - Pure-acknowledgment turns ("ok"/"好的"/...) are always filtered out before enqueue (whole-message fullmatch), saving an extraction LLM call. queue (core/queue.py): - In-memory list + debounce timer, with flush_sync (graceful-shutdown drain that joins an in-flight worker under a hard timeout) and queue_max_depth backpressure (signal-bearing updates always admitted; QueueFull otherwise). - Same-key updates coalesce with a signal union; per-batch success/fail summary. updater (core/updater.py): - head500+tail500 message truncation (replaces the 1000-char head chop). - Always-on per-thread watermark: feed only messages added since the last extraction. The watermark is in-memory and is not advanced on failure, so a failed/lost update is re-fed on the next conversation turn. - [MANUAL] prompt marker for user-authored facts (source.type="manual"). - Post-invoke extraction_callback (host-injected) emitting facts_extracted / facts_accepted / rejected_low_confidence; the host default logs metrics and flags >60% rejection. Confidence filtering remains in _apply_updates (the existing fact_confidence_threshold check); there is no separate write gate. Consolidation stays opt-in (lossy). The ABC add/add_nowait signature is unchanged, so the summarization flush hook and host are unaffected. 
Tests: add test_message_processing_signals, test_updater_truncation, test_updater_watermark; update queue/updater/consolidation/staleness/pluggable tests for the signals seam. Co-Authored-By: Claude <noreply@anthropic.com> fix(memory): harden update pipeline per PR review - Catch QueueFull in DeerMem.add/add_nowait so backpressure degrades to 'update skipped' instead of propagating into after_agent / summarization_hook and breaking the agent run (peer middlewares self-guard; MemoryMiddleware was the lone exception). Emergency (add_nowait) always admits under backpressure -- its data cannot be re-fed next turn. - Rewrite the watermark from index-based to content/identity-based (_message_identity + _feed_after_watermark) so it stays correct when summarization removes the conversation front -- an index watermark pointed at the wrong message and silently skipped un-extracted tail turns. The emergency flush bypasses the watermark (bypass_watermark on ConversationContext, threaded through update_memory) and coexists with (does not replace) a pending normal update, so a flush cannot drop a pending update's un-extracted tail. - Populate facts_accepted / rejected_low_confidence inside _apply_updates at the real confidence-filter site (passed_threshold) instead of re-deriving the threshold in _finalize_update -- eliminates metric drift. - Emit extraction metrics in a finally with an 'attempted' flag so exception failures (parse error, apply_changes raise after retry) are observable, not only the happy path. - Re-detect signals on the post-watermark feed for the extraction hint so it no longer references turns the LLM cannot see; admission-time signals still drive backpressure. - Move the post-batch reschedule inside the queue lock to close a non-atomic self._timer race with a concurrent add. 
Co-Authored-By: Claude <noreply@anthropic.com> * fix(memory): address follow-up review nits (LRU, metric name, docstring) - Bound the in-memory watermark cache with a configurable LRU (watermark_max_keys, default 4096, 0=unbounded). A dropped key re-extracts one batch on that thread's next turn (the documented restart behavior), so eviction is safe and preserves the content-identity watermark's front-removal guarantee. Adds _watermark_get/_watermark_set helpers and a bounded-LRU regression test. - Rename the extraction metric facts_accepted -> facts_passed_confidence so the name matches what the >60% rejection-rate warning assumes (a confidence-gate signal, not a persisted-fact count); drop the stale "historical semantics" justification. Brand-new callback, one consumer. - Fix the stale test_message_processing_signals module docstring: the signals seam is already swapped to frozenset, and a stale stage-numbering prefix is removed. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com>	2026-07-26 21:16:36 +08:00
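Two mechanisms from the memory entry above can be sketched briefly: the head500+tail500 truncation and the bounded LRU for the per-thread watermark (`watermark_max_keys`, default 4096, 0 = unbounded). The class and function names, the join marker, and the string-valued watermark are assumptions for illustration, not the updater's actual code.

```python
from collections import OrderedDict
from typing import Optional


def truncate_head_tail(text: str, head: int = 500, tail: int = 500,
                       marker: str = " [...] ") -> str:
    """Keep the first and last characters of an over-long message."""
    if len(text) <= head + tail:
        return text
    # The marker is an assumption; the commit does not say how halves are joined.
    return text[:head] + marker + text[-tail:]


class WatermarkCache:
    """Least-recently-used map from thread key to watermark identity."""

    def __init__(self, max_keys: int = 4096) -> None:
        self._max_keys = max_keys  # 0 means unbounded
        self._data = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key: str, value: str) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        if self._max_keys and len(self._data) > self._max_keys:
            self._data.popitem(last=False)  # evict the least recently used key
```

Eviction here only costs a re-extraction of one batch on that thread's next turn, which matches the behavior the commit describes for a dropped key.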
Ryker_Feng	68c0ffdac8	feat(frontend): pin recent chats (#4442 ) * feat(frontend): pin recent chats * fix(threads): address pin-chat review feedback - Stop bumping updated_at on metadata-only PATCH (pin/unpin) via a new update_metadata(touch=False) path so unpinning no longer jumps a chat to the top of the updated_at-sorted recent list. - Narrow patchThreadMetadata to a ThreadMetadataPatchResponse matching the Gateway's actual response (no values/context). - Namespace the pinned metadata key as deerflow_pinned for consistency with deerflow_sidecar / deerflow_branch. - Cover touch/touch=False behavior in repo + router tests; document the e2e mock's updated_at preservation now mirrors production. * style(frontend): format thread utils test * fix(threads): make pinned ordering server-side * test(frontend): keep infinite-scroll fixture order stable * test(frontend): stabilize lark reconnect e2e * docs: clarify thread pin metadata contract	2026-07-26 20:47:58 +08:00
Zhengcy05	2f60bee388	fix: surface length-capped model responses (#4309) * fix: surface length-capped model responses * fix: avoid the influence of the mid-turn * fix: correcting semantic annotations * fix: add ModelLengthTerminationDetector to compatible providers * fix:delete redundancy code * fix:supplementing log information improves observability * fix: align the document and complete the assertions. * fix: unit test * fix: revert AGENTS.md * fix: unit test * fix: add annotation and skip AIMessage has empty content	2026-07-26 14:43:08 +08:00
Aari	d1aeea2c3e	fix(checkpoint): unwrap Overwrite first writes into empty channels (#4383 ) * fix(gateway): stop persisting Overwrite wrappers into empty reducer channels on branch Thread branching (and POST /state on a never-written channel) wraps copied reducer values in Overwrite. Upstream BinaryOperatorAggregate.update seeds an empty (MISSING) channel with values[0] verbatim without unwrapping, so Union-typed channels (sandbox/goal/todos/promoted) stored the wrapper literally and the next consumer crashed with TypeError: 'Overwrite' object is not subscriptable (#4380). Patch the channel to unwrap the first write (mirroring DeltaChannel semantics), and stop copying thread-scoped channels (sandbox, thread_data) into branches: the parent's sandbox_id would bind the branch to the parent's workspace and release lifecycle. * refactor(checkpoint): drop private _get_overwrite import for a local Overwrite check Importing langgraph's underscored _get_overwrite at module top level meant an upstream refactor that drops it - plausibly the same release that fixes the bug - would fail this module's import and crash startup before the probe can stand the patch down. Replace it with a local helper on the public Overwrite type, and fix two test docstring nits. * refactor(checkpoint): write patch flags via their constants to avoid drift Both saver patches read their "already patched" idempotence flag through a module constant (_PATCH_FLAG / _BINOP_PATCH_FLAG) but wrote it as a hard-coded attribute literal, so renaming the constant would silently break the guard and double-apply the patch. Write via the same constant (setattr), dropping the now-unneeded attr-defined ignores. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-26 10:44:50 +08:00
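The failure mode in the checkpoint entry above can be reproduced with a toy model. Everything below is a stand-in written for illustration: `Overwrite`, `MISSING`, and `ReducerChannel` only imitate the behavior the commit message describes and are not langgraph's classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence

MISSING = object()  # stand-in sentinel for a never-written channel


@dataclass
class Overwrite:  # stand-in for the replace-style wrapper type
    value: Any


class ReducerChannel:
    """Toy reducer channel; unwrap_first_write toggles the patched behavior."""

    def __init__(self, reducer: Callable[[Any, Any], Any],
                 unwrap_first_write: bool) -> None:
        self.reducer = reducer
        self.unwrap_first_write = unwrap_first_write
        self.value: Any = MISSING

    def update(self, values: Sequence[Any]) -> None:
        pending = list(values)
        if pending and self.value is MISSING:
            first = pending.pop(0)
            # Unpatched: the first write seeds the channel verbatim,
            # so a wrapper is stored as if it were the state itself.
            if self.unwrap_first_write and isinstance(first, Overwrite):
                first = first.value
            self.value = first
        for item in pending:
            if isinstance(item, Overwrite):
                self.value = item.value
            else:
                self.value = self.reducer(self.value, item)
```

With `unwrap_first_write=False`, a later `channel.value["sandbox_id"]` fails because the stored object is the wrapper, which is the `'Overwrite' object is not subscriptable` crash quoted in the commit.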
Daoyuan Li	5d073991c2	fix(sandbox): widen boxlite/aio_sandbox tenant hash and verify identity on reclaim (#4171) * fix(sandbox): prevent truncated tenant ID reuse * fix(sandbox): handle late same-tenant box registration	2026-07-26 09:47:57 +08:00
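A rough sketch of the two safeguards named in the entry above: a wider truncated hash as the pool key, plus an identity check on reclaim so that a hash collision can never hand one tenant another tenant's parked box. The use of SHA-256, the 16-hex-character width (borrowed from the Tenki entry earlier in this log), and all names are assumptions.

```python
import hashlib
from typing import Dict, Optional, Tuple

Pool = Dict[str, Tuple[str, str]]  # short id -> (full tenant key, box handle)


def short_id(tenant_key: str, hex_chars: int = 16) -> str:
    """Truncated hex digest; 16 hex characters carry 64 bits."""
    return hashlib.sha256(tenant_key.encode("utf-8")).hexdigest()[:hex_chars]


def park(pool: Pool, tenant_key: str, box: str, hex_chars: int = 16) -> None:
    # Store the full key next to the box so reclaim can verify ownership.
    pool[short_id(tenant_key, hex_chars)] = (tenant_key, box)


def reclaim(pool: Pool, tenant_key: str, hex_chars: int = 16) -> Optional[str]:
    key = short_id(tenant_key, hex_chars)
    entry = pool.get(key)
    if entry is None:
        return None
    owner, box = entry
    if owner != tenant_key:
        return None  # same short id, different tenant: leave the box parked
    del pool[key]
    return box
```

Widening the hash makes collisions rarer; the ownership comparison is what makes them harmless.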
Yufeng He	6e6c078595	fix(sandbox): unwrap Overwrite-wrapped sandbox state in after_agent (#4381 ) * fix(sandbox): unwrap Overwrite-wrapped sandbox state in after_agent Fork-restored checkpoints can deliver the sandbox channel still wrapped in langgraph.types.Overwrite: the rollback restore applies replace-style writes through a state-mutation graph in delta checkpoint mode, and after_agent/aafter_agent then crash subscripting the wrapper ("TypeError: 'Overwrite' object is not subscriptable") on the next sandbox tool run in the forked conversation. Unwrap before reading the sandbox id, and pin both hooks against an Overwrite-wrapped state. Refs #4380 (bug 1 of 2; the history-loss half is a separate display path) Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> * fix(sandbox): don't release fork-restored parent sandboxes The Overwrite-wrapped value replays the parent thread's sandbox state, so releasing it from the forked run would evict the parent's warm sandbox. _unwrap_sandbox now reports the wrapped form, and both after_agent hooks skip the release for it while keeping the normal path unchanged. Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> --------- Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-26 09:02:40 +08:00
Ryker_Feng	7aa314b4c1	feat: add Lark CLI integration (#3971 ) * feat: add lark cli integration * fix: polish lark integration actions * feat: support lark incremental permissions * fix: detect lark authorization completion * fix: harden lark integration install * feat: expand lark auth scopes and reuse host auth in sandbox Default lark auth to least-privilege (recommend=false, base sign-in only) and expose the full set of lark-cli --domain business domains as native --domain grants instead of a 4-domain read-only mapping. Resolve the skill pack from the latest larksuite/cli GitHub release at install time with content-hash integrity, and surface version/runtime drift in status. Share the per-user lark-cli config/data profile between the Gateway Settings auth flow and agent conversations by mounting the integration dirs into the AIO sandbox and injecting the matching env for lark-cli commands, with an allowlisted extra_mounts path in the provisioner/K8s backend and traversal guards on integration paths. * style: fix lint issues from ruff and prettier Sort imports in the provisioner PVC test and re-wrap two long i18n description strings to satisfy backend ruff and frontend prettier CI. * fix(lark): address managed integration review feedback * fix(frontend): stabilize integrations settings e2e * test(sandbox): isolate remote backend legacy visibility check * test: fix backend unit failures after merge * Harden Lark integration review fixes * Format Lark integration E2E test * fix(lark): harden sandbox credential exposure and status disclosure Address willem_bd's security review on PR #3971: - Mount the per-user lark-cli config dir (long-lived appSecret) read-only into the AIO sandbox; only the refreshable-token data dir stays writable. - Redact host filesystem paths (install_path, cli.path) from GET /lark/status and the config/auth complete responses for non-admin callers, fail-closed on any auth error. 
- Document the npm postinstall trade-off (--ignore-scripts is not viable because @larksuite/cli fetches its platform binary in postinstall). - Document the sandbox credential trust boundary in AGENTS.md and README, pointing at the sidecar-broker follow-up (#4338). --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-26 08:09:17 +08:00
Aari	68797c5759	fix(gateway): scope branch history seed run ids per inherited turn (#4459 ) Branch creation seeds the new thread's run-event feed from its checkpoint so inherited history survives the first run (#4380). Every seeded row carried one shared run id, but run_id is a turn identity to the feed's consumers, not a provenance tag: regenerating the inherited answer resolves that row's run id as the superseded source, and GET /messages/page then drops every row carrying it. One shared id for the whole seed therefore deleted the complete inherited history on a branch's first regenerate, leaving only the regenerated turn. Group seeded rows into one synthetic run per inherited turn (branch-seed-{thread_id}-{n}), a new turn opening at each persisted human message — the same boundary a real run has, including the allowlisted hidden ask_clarification reply, which resumes as its own run. Supersession is then confined to the turn actually regenerated. Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-26 08:00:48 +08:00
Aari	37c343fe30	fix(summarization): summarize with the run model, fall back on summary-provider failure (#4361 ) * fix(summarization): own the run model for compaction; bound failure With summarization.model_name: null the summary model resolved to config.models[0] while the executing model is selected per run; when they differ and models[0]'s provider is broken (expired key, quota, outage) compaction silently failed every triggered turn and context grew unbounded until the main provider 400s the run (#3103's shape), even though the run's own model was healthy. Model ownership is now sourced from the builders, not re-derived at runtime: - The lead, subagent, and manual /compact builders each pass the resolved run model into create_summarization_middleware(run_model_name=...). The middleware no longer reads runtime.context / get_config(), which do not carry a custom agent's or a subagent's resolved model, so a custom-agent lead run and a distinct-model subagent now summarize with their own model, not models[0] / the parent's. Runtime re-resolution and the per-name model cache are removed. - model_name: null summarizes with the run's own model; an explicitly configured summary model generates and falls back to the run model on failure. The fallback is built lazily after the primary fails and its construction is guarded, so a broken fallback cannot skip a healthy primary or escape the automatic failure boundary. Failure is bounded and side-effect-safe: - An empty or whitespace-only response is treated as a generation failure, not a valid summary, so compaction never removes all history for an empty replacement. - compact_state/acompact_state take raise_on_failure independent of force: the manual /compact path always surfaces a generation failure (even force=false) and routes it to the existing ContextCompactionFailed path (HTTP 500 -> frontend error toast) instead of an unconsumed response reason. The automatic path leaves compaction state unchanged. 
- before_summarization hooks fire only after a replacement summary exists. SummarizationConfig.model_name, config.example.yaml, and docs/summarization.md document the final lead/subagent/manual ownership rules. Part of RFC #4346 (section A). Evaluating fraction/triggers against the run model's profile (profile ownership) is a separate follow-up. * fix(summarization): manual /compact model ownership + fail-open construct/parse Manual /compact carried only agent_name, so it derived the run model from the custom-agent model or config.models[0] and missed the request-selected model the run path uses (request -> custom-agent -> default). Carry model_name through ThreadCompactRequest and the frontend compact call, resolve with the same precedence, and move the custom-agent config read off the event loop (asyncio .to_thread) with user_id so the strict blocking-IO gate is not bypassed by the broad except. Make one summary attempt own its full lifecycle so the fail-open boundary covers construction and response parsing, not just invocation: build each candidate model lazily and guarded (a raising constructor falls through to the healthy run model instead of breaking agent construction), build the model_name:null primary from the run model rather than config.models[0], and run response text extraction inside the invocation try so a failing .text accessor falls back instead of escaping compaction. Adds factory-level constructor-failure, response-extraction-failure (sync/async), and route-path model-ownership tests.	2026-07-26 07:39:39 +08:00
MiaoRuidx	735f67a5b2	fix: guard pending run startup cancellation (#4450 ) * fix: guard pending run startup cancellation * fix(run): address startup review feedback * fix(run): narrow start_run store contract --------- Co-authored-by: MiaoRuidx <12540796+MiaoRuidx@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-25 23:50:21 +08:00
Huixin615	8af760fc30	fix(runtime): make orphan reconciliation lease-aware (#4427 )	2026-07-25 23:26:17 +08:00
Vanzeren	3c8b82c594	fix(runtime): serialize checkpoint writes with active runs (#4437 ) * fix(runtime): serialize checkpoint writes with active runs * fix(runtime): address checkpoint reservation reviews * fix(runtime): address reservation race reviews * fix(runtime): refine reservation conflict semantics	2026-07-25 23:18:34 +08:00
VectorPeak	07d8b98864	fix(mcp): ignore malformed path-like text (#4456 ) Co-authored-by: chatgpt-codex-connector[bot] <199175422+chatgpt-codex-connector[bot]@users.noreply.github.com>	2026-07-25 21:43:33 +08:00
Vanzeren	8c19a2eb36	perf(checkpoint): linearize message write merging (#4421 ) * perf(checkpoint): linearize message write merging * test(checkpoint): address message reducer review	2026-07-25 21:19:24 +08:00
luo jiyin	3b77a7401b	fix(sandbox): enforce E2B replica capacity limits (#4391 ) * fix(sandbox): enforce E2B replica capacity limits (in-process) Add SandboxCapacityExceededError with diagnostic fields. Add overflow_policy (wait/reject/burst), acquire_timeout, and burst_limit config options. Implement atomic capacity reservation with a four-slot model: reserved / active / warm / transitioning. Transitioning slots close the window where active-to-warm or warm-to-active transitions appear to have zero occupied slots, which would let concurrent acquires exceed the configured replica ceiling. Re-route release, reclaim, and evict through transitioning counters. Add shutdown guard: reject waiters, kill VMs created during shutdown. Add 14 tests: policy enforcement, release+acquire race, warm-reclaim race, shutdown-waiter interaction, shutdown-during-create, and concurrent different-thread capacity assertion. Related: #4339 * fix: harden e2b sandbox capacity lifecycle * fix: retain e2b capacity during uncertain eviction * fix: serialize e2b tombstone eviction * fix: retain capacity after uncertain e2b cleanup * fix: track e2b remote operations during shutdown * fix(sandbox): validate E2B capacity config * fix(sandbox): classify capacity errors * fix(sandbox): harden E2B capacity lifecycle * test(sandbox): cover E2B review findings * docs(changelog): note E2B capacity behavior * docs(readme): explain E2B overflow handling * docs(backend): record E2B lifecycle rules * docs(sandbox): clarify destructive E2B reset * fix(sandbox): close E2B capacity race gaps --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-25 10:54:14 +08:00
ShitK	0f0955bf7b	fix(client): preserve ToolMessage artifacts (#4422 ) Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-25 09:47:58 +08:00
黄云龙	126fc9ea81	fix(subagents): clamp subagent limit consistently with MIN_SUBAGENT_LIMIT (#4081 ) * fix(subagents): align prompt and middleware subagent limit; allow min of 1 SubagentLimitMiddleware clamped max_concurrent to [2, 4] internally, but agent.py and client.py fed the raw config value into the system prompt, so a user-configured 1 (or 5) produced a prompt that disagreed with the enforced middleware limit. Lower MIN_SUBAGENT_LIMIT to 1 and clamp the raw config value with _clamp_subagent_limit() at both the agent factory and the embedded client so the prompt and middleware see the same value. * fix: remove unused imports MAX_CONCURRENT_SUBAGENT_CALLS, MIN_CONCURRENT_SUBAGENT_CALLS, clamp_subagent_concurrency * fix: harmonize clamp range [1,4] across middleware, config, and prompt path; fix lint - Changed MIN_CONCURRENT_SUBAGENT_CALLS from 2 to 1 so prompt.py's clamp_subagent_concurrency and the middleware's _clamp_subagent_limit both clamp to [1,4] — eliminating the divergence where the prompt told the model 'max 2 task calls' but the middleware enforced 1. - Applied _clamp_subagent_limit at build_middlewares (agent.py:360) so all 3 construction sites (agent.py:360, agent.py:450, client.py:259) consistently clamp the config-resolved limit. - Derived MIN_SUBAGENT_LIMIT / MAX_SUBAGENT_LIMIT from MIN_CONCURRENT_SUBAGENT_CALLS / MAX_CONCURRENT_SUBAGENT_CALLS so the two module-level definitions stay in sync. - Added TestConfigParity.test_prompt_path_and_middleware_clamp_agree regression test. - Fixed lint. * fix(lint): add missing imports for MIN_CONCURRENT_SUBAGENT_CALLS and MAX_CONCURRENT_SUBAGENT_CALLS * docs+test: update AGENTS.md clamp range to 1-4; add prompt/middleware parity regression test - backend/AGENTS.md still documented the old [2,4] clamp in two places; updated to [1,4] to match MIN_CONCURRENT_SUBAGENT_CALLS = 1. 
- Added test_apply_prompt_template_single_subagent_limit_matches_middleware: renders the real system prompt with max_concurrent_subagents=1 and asserts the advertised HARD LIMITS value equals SubagentLimitMiddleware's enforced max_concurrent — the end-to-end check that would have caught the [1,4] vs [2,4] prompt-path divergence flagged in review. * refactor: simplify per review — restore clamp delegation, drop redundant call-site clamps Per willem-bd's review, reduce the PR to the one behavioral change plus docs/tests: - _clamp_subagent_limit delegates to clamp_subagent_concurrency again instead of inlining a byte-identical copy; with a single source of truth the TestConfigParity sync-check class is unnecessary — dropped. - Revert the call-site clamps in agent.py (build_middlewares, _make_lead_agent) and client.py (_ensure_agent) to main: both downstream consumers (SubagentLimitMiddleware.__init__ and the prompt path) already clamp internally, and the cross-module private import of _clamp_subagent_limit goes away with them. - Keep MIN_CONCURRENT_SUBAGENT_CALLS = 1 (the fix), the [1, 4] docstring updates, the AGENTS.md range corrections, and the end-to-end prompt/middleware parity test for single-subagent mode (docstring reworded: on main a configured 1 was bumped to 2 by both paths — there was no divergence to fix, just a silently raised floor). * test: fix stale comment referencing reverted agent.py/client.py call-site clamps --------- Co-authored-by: nankingjing <nankingjing@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-24 21:56:11 +08:00
Daoyuan Li	ca3e510b7d	fix(scheduler): close duplicate dispatch race (#4105 ) Enforce one queued or running scheduled-task run per task with a partial unique index. The migration resolves legacy duplicates before creating the index, and losing inserts use the existing conflict or skip outcomes.	2026-07-24 21:41:09 +08:00
Daoyuan Li	159b774944	fix(skills): handle non-string frontmatter keys (#4167 ) Normalize YAML frontmatter keys in the shared parser so validation and review report malformed fields instead of failing while sorting mixed key types.	2026-07-24 21:25:53 +08:00
H Haidong	c7538cfb35	fix(runs): terminate orphaned streams after lease recovery (#4420 ) * fix(runs): terminate orphaned streams after lease recovery * fix(runs): include recovered ids in callback warnings * fix(runs): harden orphan recovery lifecycle	2026-07-24 19:34:20 +08:00
ShitK	a4ede80deb	fix(runtime): reject unsupported run options and stream modes (#4430 ) * fix(runtime): reject unsupported run options * fix(runtime): align SDK run compatibility * fix(frontend): avoid unsupported events stream mode --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-24 19:24:24 +08:00
Ryker_Feng	cd9432bcc1	feat(tools): support GIF images in view_image (#4438 ) Add GIF to the view_image allowlist: map the .gif extension to image/gif and detect the GIF87a/GIF89a magic bytes so the existing extension/content cross-check accepts GIFs instead of rejecting them as an unsupported format. Covered by a new success test.	2026-07-24 13:12:43 +08:00
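MiaoRuidx	80c06414f8	fix: make orphan reconciliation lease-aware (#4434 ) Have startup/orphan run recovery atomically re-check the lease via claim_for_takeover before the final write, so that a run whose owner successfully renewed its lease after the scan is not mistakenly marked as error. Add a regression test for renewal after the scan, and migrate the reconciliation write-failure test to the takeover claim path. Co-authored-by: MiaoRuidx <12540796+MiaoRuidx@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-24 09:48:48 +08:00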
Aari	5f0108f56c	fix(runtime): stop subgraph stream frames impersonating root frames (#4407 ) * fix(runtime): stop subgraph stream frames impersonating root frames The web frontend always requested stream_subgraphs, and since delegated subagent graphs inherit the parent checkpoint namespace (#4215), their values snapshots and token chunks ride the parent stream. The worker's _unpack_stream_item dropped the namespace and published every subgraph frame under a bare event name, so a subagent's values snapshot replaced the whole thread view in SDK clients (#4399), its token chunks flooded the parent message stream, and a subagent's LLM error fallback could be mistaken for the parent run's. Publish subgraph frames under namespace-qualified SSE event names (mode|ns1|ns2, LangGraph Platform style) and keep root-only consumers (file-tool chunk batcher, subagent event persistence, error-fallback detection) on root frames only. Drop streamSubgraphs from the frontend submit paths: subtask progress arrives via root-namespace task_* custom events, so the flag only exposed the leak. * test(runtime): add production-shaped subgraph stream regression tests Address review: the namespace tests validated the publishing helpers with hand-fed namespaces, while the #4399 regression lived in the integration between LangGraph's delegation routing and the worker's stream loop. Add TestWorkerSubgraphStreamIntegration: a real parent graph delegates through the real SubagentExecutor and streams through run_agent into a real MemoryStreamBridge, locking both stream_subgraphs modes -- delegated frames arrive namespaced (never bare), a delegated error fallback cannot mark the parent run as errored, and without the flag delegated frames stay out while task_* custom events remain.	2026-07-23 23:32:06 +08:00
Huixin615	4a2ecd430e	fix(streaming): expose custom events to astream_events (#4403 ) * fix(streaming): expose custom events to astream_events * test(streaming): validate real custom event emitters --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-23 22:56:12 +08:00
hataa	7857fa0cce	feat(authz): enforce tool authorization at assembly and runtime (#4370 ) * feat(authz): enforce tool authorization at assembly and runtime * fix(middleware): guard deferred tool setup lookup (#4370) --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-23 22:51:35 +08:00
MiaoRuidx	f1632cc351	fix(run): add run event stream contract (#4342 ) * docs: document run event stream contract * fix(run): address event stream review feedback --------- Co-authored-by: MiaoRuidx <12540796+MiaoRuidx@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-23 21:33:57 +08:00
Aari	b7933d18e4	fix(safety): backfill empty content-filter responses so they don't poison the thread (#4394 ) An empty assistant message from a provider safety filter (content_filter with no content, no tool calls) was persisted into thread history and replayed to strict OpenAI-compatible providers, which reject it with HTTP 400 ("message ... with role 'assistant' must not be empty") — breaking every later turn until a new chat is started. SafetyFinishReasonMiddleware only handled the tool-call case (#3028) and TerminalResponseMiddleware only the post-tool case (#4027), so a plain empty content-filter response fell through both. Extend the safety middleware to backfill a user-facing explanation when a safety-terminated message is otherwise blank, so the persisted turn is non-empty (and the user sees why it was blocked). Fixes #4393	2026-07-23 16:59:34 +08:00
Aari	70fb91654d	fix(gateway): seed branch run-events so inherited history survives forking (#4385 ) * fix(gateway): seed branch run-events so inherited history survives (#4380) The thread feed (GET /messages, /messages/page) reads the run-event store, but branch creation only wrote checkpoint state - a fresh branch had no message rows, so the parent history vanished from the UI as soon as the branch's first run refreshed the feed. Seed the branch's run_events from the same checkpoint snapshot the branch was created from, mirroring RunJournal's message-event contract (event types, hidden-message rules, original-user-text restoration). Best-effort: a seeding failure degrades to the old behavior and is reported as history_seed_mode=failed. * docs(gateway): correct branch-seed docstring on RunJournal divergences The "consumers cannot tell a seeded row from a journaled one" claim was overstated for AI rows: seeded rows omit run-scoped enrichment (usage / latency_ms / llm_call_index) and stamp caller=lead_agent rather than the message's original caller, neither recoverable from a checkpoint message. Rewrite the docstring to state these divergences explicitly and note they are display-invisible today (no consumer indexes those keys; per-message caller drives no attribution). Also add a code comment marking the hide_from_ui filter as intentionally stricter than the live paths. * fix(gateway): seed dict-shaped checkpoint messages + persist hidden AI/tool rows Two review-driven fixes to build_branch_history_seed_events: 1. Checkpoint messages can arrive as model_dump()-shaped dicts (the branch-matching helpers in threads.py already handle both BaseMessage and dict). The seed only handled BaseMessage, so a dict-backed checkpoint seeded nothing and the branch reported skipped_empty while history existed. Coerce dicts back to BaseMessage via messages_from_dict (faithful: tool_calls / tool_call_id / additional_kwargs survive); unparseable dicts are dropped best-effort. 2. 
RunJournal.on_llm_end and _persist_tool_result_message persist hide_from_ui AI/tool rows unconditionally (the frontend hides them client-side); the hide check only gates the reconciliation pass. The seed dropped them, so a hidden turn vanished from a forked feed and seeded rows diverged from journaled ones. Match RunJournal and write them, restoring true row-level parity. Adds tests for dict deserialization, the unparseable-dict drop, and the hidden AI/tool persistence contract.	2026-07-23 13:57:32 +08:00
Admire	a38b1daec3	fix(streaming): keep large file generation responsive (#4354 ) * fix(streaming): keep large file generation responsive * fix(streaming): address follow-up review feedback * fix(streaming): address final review feedback	2026-07-23 08:51:14 +08:00
Aari	7b330101d2	fix(tools): exclude injected runtime from list_uploaded_files schema (#4375 ) (#4376 ) Declaring the injected runtime arg as `Annotated[Runtime, InjectedToolArg] | None` made the top-level annotation a Union, so LangChain no longer treated it as injected. It leaked into the model-facing schema and pydantic raised PydanticInvalidForJsonSchema on the ToolRuntime dataclass the moment the tool was bound to a model. The tool is bound by default for the lead agent, so any default run on an OpenAI-compatible provider failed at tool-bind time. Declare runtime as a bare Runtime first param, matching every other built-in tool (present_files, view_image, task, ...), which LangChain auto-injects and auto-excludes from the schema. Add a schema regression test that binds the tool.	2026-07-23 08:22:15 +08:00
Aari	0d4d0cb17d	feat(agents): database-backed storage for custom agent definitions (#4359 ) * feat(agents): database-backed storage for custom agent definitions Add an agent_storage.backend switch (default file, behaviour-unchanged) with a db backend that stores each custom agent as a row in the shared SQL persistence layer, so a multi-instance deployment sees the same agents on every node (#4331, #4357). Introduces an AgentStore interface routing all read/write surfaces, an agents table + migration 0006, startup validation, and a file->db importer. Follows the thread_meta store / run_events backend-switch / 0003_scheduled_tasks migration patterns; no new dependency. * fix(agents): make db storage path production-ready (review round 1) Addresses review feedback on the db/sync agent-storage path: - sql.py: mirror the async engine's per-connection SQLite PRAGMAs on the sync engine (busy_timeout=30000, synchronous=NORMAL, foreign_keys=ON, WAL) so both engines behave identically against the shared DB; guard the engine cache with a lock (double-checked) so concurrent first-touch cannot build duplicate engines or register the connect listener twice. - routers/agents.py + routers/assistants_compat.py: offload the sync-store reads that ran on the event loop (list/get/check, update's pre-read + legacy guard + refresh, and assistants_compat's four list routes) via asyncio.to_thread — on db+postgres each was a network round trip stalling the loop. Writes were already offloaded. - file.py: translate the create() mkdir(exist_ok=False) race FileExistsError into AgentExistsError (router 409, matching SqlAgentStore's IntegrityError path); correct the _write docstring — per-file atomic replace, two commits sequential not transactional. Tests: sync-engine PRAGMA + engine-cache reuse assertions; file create-race -> AgentExistsError; strict Blockbuster anchor over the read endpoints so a regression back onto the loop fails CI. 
* fix(agents): address round-2 review on the db store path - update_agent tool: align the docstring/inline comment with FileAgentStore._write. Cross-field write atomicity is db-only; the file backend commits config then soul via two sequential os.replace (a crash between them can leave a fresh config.yaml beside a stale SOUL.md). The dropped partial-write reporting is an intentional tradeoff — the stage-then-replace safety is preserved (test_update_agent_soul_failure_does_not_replace_config still holds). - SqlAgentStore.update(): true upsert. Catch IntegrityError on the insert-on-missing branch, re-fetch and apply, so two concurrent first-time writes (e.g. two setup_agent handshakes) converge instead of surfacing a raw UNIQUE(user_id, name) violation as a 500. Symmetric with create(). - get_agent_store(): document the graph-subprocess config-resolution invariant (the except->file fallback is a genuine no-config path, not a mask for a misconfigured graph process) and pin it with two tests driving the real get_app_config() file resolution: db resolves from an on-disk config.yaml, file fallback when config is unresolvable. * test(agents): cover SqlAgentStore.update() write-race upsert recovery Mandatory-TDD test for the round-2 fix in 0680340a: two concurrent first-time update()s where the loser's insert hits UNIQUE(user_id, name). Deterministically forces the IntegrityError recovery path by making the first _row probe miss the committed winner, and asserts last-writer-wins instead of a surfaced 500.	2026-07-23 08:03:21 +08:00
March7	4dd7cafef1	fix(sandbox): serialize E2B release transitions (#4355 )	2026-07-23 07:42:43 +08:00
Daoyuan Li	44990ff194	fix(mcp): use threading.Lock for OAuth token refresh to avoid cross-thread deadlock (#4240 ) * fix(mcp): use threading.Lock for OAuth token refresh to avoid cross-thread deadlock OAuthTokenManager created one asyncio.Lock per server for the process lifetime. The embedded/TUI sync tool-call path (DeerFlowClient.stream() -> LangGraph's ToolNode._func -> a ThreadPoolExecutor -> make_sync_tool_wrapper's per-call asyncio.run()) invokes get_authorization_header from a fresh event loop on a fresh OS thread for every concurrent tool call. asyncio.Lock binds to whichever loop first contends on it; when a caller on a different loop later releases or wakes a waiter, it does so without call_soon_threadsafe, so the waiting loop's selector is never woken and that caller hangs forever with no exception. A third concurrent caller instead raises a synchronous RuntimeError ("bound to a different event loop"). Either way, two concurrent OAuth-protected tool calls (including the very first cold-start token fetch) can freeze the entire agent turn. Gateway's async path (ToolNode._afunc) is unaffected. Replace the asyncio.Lock with a plain threading.Lock, acquired via asyncio.to_thread so the blocking wait never blocks the event loop, and released synchronously in a finally block. This keeps the single-fetch de-duplication the lock provided while making it safe across however many event loops/threads call into the same server's lock. Adds a regression test that runs three threads, each with its own event loop, calling get_authorization_header concurrently for the same server, and asserts (with a bounded join timeout so a regression fails fast instead of hanging the suite) that none hang or raise, and that only one real token fetch happens. 
* fix(mcp): make OAuth lock acquisition cancellation-safe get_authorization_header acquired the per-server threading.Lock via a bare `await asyncio.to_thread(lock.acquire)`, with the try/finally that guarantees release only starting after that await returned. Once the executor thread had actually started running lock.acquire(), cancelling the awaiting caller only stopped the caller -- Python cannot interrupt a running OS thread. CancelledError was still delivered to the caller immediately, but the thread kept blocking until the current holder released, then silently acquired the lock with nobody left to call release() for it. The lock stayed locked forever and every later OAuth token refresh for that server blocked permanently at the same line -- the exact cross-thread deadlock this lock was introduced to prevent, reintroduced via a different path under cancellation (e.g. a caller wrapped in asyncio.wait_for/asyncio.timeout, or task-group cancellation). Run the acquisition as an explicit asyncio.create_task, awaited via asyncio.shield() so cancelling the caller no longer cancels the underlying acquisition task. If the caller is cancelled, keep (re-)waiting on the still-shielded acquisition task -- tolerating further cancellation during this cleanup by simply retrying -- until it actually finishes, release the lock immediately, and only then re-raise. This guarantees the lock is released regardless of when or how many times the caller is cancelled: before the acquisition is even scheduled, while queued, or after it has already been silently granted. Adds a regression test that holds the per-server lock, starts a second caller that has to wait for it, cancels that caller while it is genuinely blocked in its executor thread, releases the original holder, and asserts a third caller completes within a bounded asyncio.wait_for and still performs exactly one token fetch. 
Every potentially-hanging await is bounded so a regression fails the test quickly instead of hanging the suite.	2026-07-22 19:58:43 +08:00
March7	8c78d1f41f	fix(subagents): load user-scoped skills (#4356 )	2026-07-22 14:59:33 +08:00

642 Commits