* feat(middleware): inject dynamic context via DynamicContextMiddleware
Move memory and current date out of the system prompt and into a
dedicated <system-reminder> HumanMessage injected once per session
(frozen-snapshot pattern) via a new DynamicContextMiddleware.
This keeps the system prompt byte-exact across all users and sessions,
enabling maximum Anthropic/Bedrock prefix-cache reuse.
Key design decisions:
- ID-swap technique: reminder takes the first HumanMessage's ID
(replacing it in-place via add_messages), original content gets a
derived `{id}__user` ID (appended after). Preserves correct ordering.
- hide_from_ui: True on reminder messages so frontend filters them out.
- Midnight crossing: date-update reminder injected before the current
turn's HumanMessage when the conversation spans midnight.
- INFO-level logging for production diagnostics.
Also adds prompt-caching breakpoint budget enforcement tests and
updates ClaudeChatModel docs to reference the new pattern.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(token-usage): log input/output token detail breakdown in middleware
Extend the LLM token usage log line to include input_token_details and
output_token_details (cache_creation, cache_read, reasoning, audio, etc.)
when present. Adds tests covering Anthropic cache detail logging from
both usage_metadata and response_metadata.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: fix nginx
* fix(middleware): always inject date; gate memory on injection_enabled
Date injection is now unconditional — it is part of the static system
prompt replacement and should always be present. Memory injection
remains gated by `memory.injection_enabled` in the app config.
Previously the entire DynamicContextMiddleware was skipped when
injection_enabled was False, which also suppressed the date.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(lint): format files and correct test assertions for token usage middleware
- ruff format dynamic_context_middleware.py and test_claude_provider_prompt_caching.py
- Remove unused pytest import from test_dynamic_context_middleware.py
- Fix two tests that asserted response_metadata fallback logic that
doesn't exist: replace with tests that match actual middleware behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(middleware): address Copilot review comments on DynamicContextMiddleware
- Use additional_kwargs flag for reminder detection instead of content
substring matching, so user messages containing '<system-reminder>'
are not mistakenly treated as injected reminders
- Generate stable UUID when original HumanMessage.id is None to prevent
ambiguous 'None__user' derived IDs and message collisions
- Downgrade per-turn no-op log to DEBUG; keep actual injection events at INFO
- Add two new tests: missing-id UUID fallback and user-text false-positive
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(task): remove max_turns parameter from task tool interface
Subagents should always use their configured max_turns value. Exposing
this parameter allowed callers to override the admin-configured limit,
which is undesirable. The value is now exclusively driven by subagent
config (per-agent overrides and global defaults in config.yaml).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(tools): introduce Runtime type alias to eliminate Pydantic serialization warning
Add deerflow/tools/types.py with:
Runtime = ToolRuntime[dict[str, Any], ThreadState]
Replace every runtime: ToolRuntime[ContextT, ThreadState] and
runtime: ToolRuntime[dict[str, Any], ThreadState] annotation in
sandbox/tools.py, present_file_tool.py, task_tool.py, view_image_tool.py,
and skill_manage_tool.py with the new Runtime alias.
The unbound ContextT TypeVar (default None) caused
PydanticSerializationUnexpectedValue warnings on every tool call because
LangChain's BaseTool._parse_input calls model_dump() on the auto-generated
args_schema while DeerFlow passes a dict as runtime context.
Binding the context to dict[str, Any] aligns Pydantic's serialization
expectations with reality and removes the noise from all run modes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(tools): extend Runtime alias to setup_agent and update_agent tools
Replace bare ToolRuntime annotations in setup_agent_tool.py and
update_agent_tool.py with the shared Runtime alias introduced in the
previous commit, and add both tools to the Pydantic serialization
warning regression test (13 cases total).
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(tools): loosen Pydantic warning filter to avoid version-specific format
Replace the brittle "field_name='context'" substring check with a looser
"context" match so the assertion stays valid if Pydantic changes its
internal warning format across versions.
Co-authored-by: Cursor <cursoragent@cursor.com>
* test(tools): simplify warning filter and clean up docstring
Remove the "context" substring condition from the Pydantic warning
filter — asserting that no PydanticSerializationUnexpectedValue fires
at all is both simpler and more comprehensive, since the test payload
contains only the tool's own args plus runtime.
Also update the module docstring to remove the version-specific warning
format example that was inconsistent with the looser filter.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
* Make loop detection configurable
Expose LoopDetectionMiddleware thresholds through config.yaml while preserving existing defaults and allowing the middleware to be disabled.
Refs bytedance/deer-flow#2517
* feat(loop-detection): add per-tool tool_freq_overrides to Phase 1
Adds ToolFreqOverride model and tool_freq_overrides field to
LoopDetectionConfig, wires it through LoopDetectionMiddleware, and
documents the option in config.example.yaml.
Resolves the gap flagged in the #2586 review: without per-tool overrides,
users hit by #2510/#2511 (RNA-seq workflows exceeding the bash hard limit)
had no way to raise thresholds for one tool without loosening the global
limit for every tool.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* docs(loop-detection): document tool_freq_overrides in LoopDetectionMiddleware docstring
Add the missing Args entry for tool_freq_overrides, explaining the
(warn, hard_limit) tuple structure and how per-tool thresholds supersede
the global tool_freq_warn / tool_freq_hard_limit for named tools.
Also run ruff format on the three files flagged by the lint check.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(loop-detection): validate LoopDetectionMiddleware __init__ params eagerly
Raise clear ValueError at construction time instead of crashing at
unpack-time inside _track_and_check when bad values are passed:
- tool_freq_overrides: must be 2-tuples of positive ints with hard_limit >= warn
- scalar thresholds: warn_threshold, hard_limit, tool_freq_warn,
tool_freq_hard_limit must be >= 1 and hard limits must >= their warn pairs
- window_size, max_tracked_threads must be >= 1
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(test): isolate credential loader directory-path test from real ~/.claude
The test didn't monkeypatch HOME, so on any machine with real Claude Code
credentials at ~/.claude/.credentials.json the function fell through to
those credentials and the assertion failed. Adding HOME redirect ensures
the default credential path doesn't exist during the test.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style(test): add blank lines after import pytest in TestInitValidation
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(loop-detection): collapse dual validation to LoopDetectionConfig
Modifications
- LoopDetectionMiddleware.__init__: stripped of all ValueError raises;
becomes a plain field-assignment constructor.
- LoopDetectionMiddleware.from_config: classmethod that builds the
middleware from a Pydantic-validated LoopDetectionConfig and handles
the ToolFreqOverride -> tuple[int, int] conversion.
- agents/factory.py: SDK construction routed through
LoopDetectionMiddleware.from_config(LoopDetectionConfig()) so the
defaults path is Pydantic-validated too.
- agents/lead_agent/agent.py: uses from_config instead of unpacking
config fields by hand.
- tests/test_loop_detection_middleware.py: deleted TestInitValidation
(16 methods exercising the removed __init__ checks); added
TestFromConfig (4 tests: scalar field mapping, override tuple
conversion, empty overrides, behavioral smoke test).
Result: one validation layer (Pydantic), zero duplication, no __new__
hacks. Both production construction sites flow through LoopDetectionConfig.
Test results
make test -> 2977 passed, 18 skipped, 0 failed (137s)
make format -> All checks passed; 411 files left unchanged
* feat(agents): make loop_detection configurable in create_deerflow_agent
Adds a `loop_detection: bool | AgentMiddleware = True` field to
RuntimeFeatures, mirroring the existing pattern used by `sandbox`,
`memory`, and `vision`. SDK users can now disable LoopDetectionMiddleware
or replace it with a custom instance built from their own
LoopDetectionConfig — e.g.
`LoopDetectionMiddleware.from_config(my_cfg)` — instead of being stuck
with the hardcoded defaults previously installed by the SDK factory.
The lead-agent path (which already reads AppConfig.loop_detection) is
unchanged, and the default `True` preserves prior always-on behavior for
all existing callers.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: knight0940 <631532668@qq.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Amorend <142649913+knight0940@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* feat(agent): add update_agent tool for in-chat custom-agent self-updates (#2616)
Custom agents had no built-in way to persist updates to their own SOUL.md /
config.yaml from a normal chat — `setup_agent` was only bound during the
bootstrap flow, so when the user asked the agent to refine its description
or personality, the agent would shell out via bash/write_file and the edits
landed in a temporary sandbox/tool workspace instead of
`{base_dir}/agents/{agent_name}/`.
Changes:
- New `update_agent` builtin tool with partial-update semantics (only the
fields you pass are written) and atomic temp-file + os.replace writes so
a failed update never corrupts existing SOUL.md / config.yaml.
- Lead agent now binds `update_agent` in the non-bootstrap path whenever
`agent_name` is set in the runtime context. Default agent (no
agent_name) and bootstrap flow are unchanged.
- New `<self_update>` system-prompt section is injected for custom agents,
instructing them to use `update_agent` — and explicitly NOT bash /
write_file — to persist self-updates.
- Tests: 11 new cases in `tests/test_update_agent_tool.py` covering
validation (missing/invalid agent_name, unknown agent, no fields),
partial updates (soul-only, description-only, skills=[] vs omitted),
no-op detection, atomic-write safety, and AgentConfig round-tripping;
plus 2 new cases in `tests/test_lead_agent_prompt.py` covering the
self-update prompt section.
- Docs: updated backend/CLAUDE.md builtin tools list and tools.mdx
(en/zh) with the new tool description.
* feat(agent): isolate custom agents per user
Store custom agent definitions under the effective user, keep legacy agents readable until migration, and cover API/tool/migration behavior with tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat: consistent write/delete targets & add --user-id to migration
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(loop-detection): keep tool-call pairing on warn injection (#2724)
* make format
* fix(loop-detection): avoid IMMessage leak to downstream consumer
* fix(channels): filter loop warning text from IM replies
* fix(harness): restore legacy skills path fallback (#2694)
* fix(format): make format
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* feat(community): add Serper web search provider
Add a new community search provider backed by the Serper Google Search
API (https://serper.dev). Serper returns real-time Google results via a
simple JSON API and requires only an API key — no extra Python package.
Changes:
- backend/packages/harness/deerflow/community/serper/__init__.py
- backend/packages/harness/deerflow/community/serper/tools.py
Implements web_search_tool using httpx (already a project dependency).
API key is read from config.yaml `api_key` field or SERPER_API_KEY env var.
Follows the same interface / output shape as the existing ddg_search provider.
Exposes max_results parameter (default 5) with config override logic.
- backend/tests/test_serper_tools.py
Unit tests covering API key resolution, config overrides, HTTP errors,
empty results, and parameter passing.
- config.example.yaml: add commented-out Serper example alongside other providers
- .env.example: add SERPER_API_KEY placeholder
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix the lint error
* Fix the lint error
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(gateway): return ISO 8601 timestamps from threads endpoints (#2594)
ThreadResponse documents created_at / updated_at as ISO timestamps,
matching the LangGraph Platform schema (langgraph_sdk.schema.Thread
exposes them as datetime, JSON-encoded as ISO 8601). The gateway
threads router was instead emitting str(time.time()) — unix-second
floats — breaking frontend new Date() parsing and producing a mixed
ISO/unix wire format that also corrupted the search sort order.
Centralize timestamp generation in deerflow.utils.time:
- now_iso() — datetime.now(UTC).isoformat()
- coerce_iso(x) — heals legacy unix-timestamp strings on read so the
store converges to ISO without a one-shot migration
threads.py: replace 6 time.time() call sites with now_iso(); wrap all
read paths and Phase-2 checkpoint metadata with coerce_iso(); _store_upsert
opportunistically heals legacy created_at on update; drop unused time import.
thread_runs.py: reuse now_iso() instead of a private duplicate _now_iso(),
preventing future drift between the two timestamp call sites.
Tests: 9 unit tests for the helper; 5 integration tests pinning the ISO
contract for create/get/patch/search and the legacy-healing path on the
internal store upsert. Full suite: 2144 passed, 15 skipped, 0 failed.
Closes#2594
* fix(gateway): coerce checkpoint metadata timestamps to ISO on read
After the merge with main, three additional read paths in ``threads.py``
were still emitting raw ``str(metadata.get("created_at", ""))`` —
``get_thread_state``, ``update_thread_state``, and ``get_thread_history``.
Same root cause as #2594: when the checkpoint metadata's ``created_at``
is a unix-second float (legacy data, or a checkpoint written by an older
Gateway version), ``str(float)`` produces ``"1777252410.411327"`` and the
frontend's ``new Date(...)`` returns ``Invalid Date``. The fix on the
``/threads/{id}`` GET path was already in place; these three sibling
endpoints needed the same treatment.
All four call sites now flow through ``coerce_iso``, so:
- legacy float metadata heals to ISO on the way out,
- ISO metadata passes through unchanged,
- ``datetime`` instances (which the new ``coerce_iso`` branch handles
explicitly) emit with the ``T`` separator instead of falling through
to the space-separated ``str(datetime)`` form.
Coverage added for the two endpoints not already pinned by the merge:
- ``test_get_thread_state_returns_iso_for_legacy_checkpoint_metadata``
- ``test_get_thread_history_returns_iso_for_legacy_checkpoint_metadata``
Both pre-seed a checkpoint whose metadata carries the literal float
from the issue body and assert the wire format is ISO.
* refactor: thread app config through lead prompt
* fix: honor explicit app config across runtime paths
* style: format subagent executor tests
* fix: thread resolved app config and guard subagents-only fallback
Address two PR review findings:
1. _create_summarization_middleware passed the original (possibly None)
app_config into create_chat_model, forcing the model factory back to
ambient get_app_config() and risking config drift between the
middleware's resolved view and the model's view. Pass the resolved
AppConfig instance through end-to-end.
2. get_available_subagent_names accepted Any-typed config and forwarded
it to is_host_bash_allowed, which reads ``.sandbox``. A
SubagentsAppConfig (also accepted upstream as a sum-type input) has
no ``.sandbox`` attribute and would be silently treated as "no
sandbox configured", incorrectly disabling the bash subagent. Guard
on hasattr and fall back to ambient lookup otherwise.
Adds regression tests for both paths.
* chore: simplify hasattr guard and tighten regression tests
- Collapse if/else into ternary in get_available_subagent_names; hasattr(None, ...) is False so the explicit None check was redundant.
- Drop comments that narrate the change rather than explain non-obvious WHY (test names already convey intent).
- Replace stringly-typed sentinel "no-arg" in regression test with direct args tuple comparison.
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* fix(sandbox): pass no_change_timeout to exec_command to prevent 120s premature termination
The agent_sandbox library's shell API defaults no_change_timeout to 120
seconds. When AioSandbox.execute_command() called exec_command() without
this parameter, commands producing no output for 120s would return with
NO_CHANGE_TIMEOUT status even though the script was still running.
Pass no_change_timeout=600 to all exec_command calls (matching the
client-level HTTP timeout) so long-running commands are not cut short.
Fixes#2668
* test(sandbox): add assertions for no_change_timeout in execute_command and list_dir
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/2f37bc72-0826-4443-a6ba-e5b78c22fb5a
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
* fix(subagents): use model override for tools and middleware
* fix(config): resolve effective subagent model
* fix(subagents): defer app config loading
* fix(subagents): fully defer config.yaml load in executor __init__
The previous attempt only relocated the explicit get_app_config() call,
but left resolve_subagent_model_name(...) running eagerly in __init__.
That helper has its own internal get_app_config() fallback, which still
fired when both app_config and parent_model were None and
config.model == "inherit" — exactly the path unit tests hit, breaking
21 tests in CI with FileNotFoundError: config.yaml.
Skip the eager resolve in __init__ when it would require loading the
config file, and defer to _create_agent (which already has the
app_config or get_app_config() fallback).
* fix(harness): resolve runtime paths from project root
* docs(config): update
* fix(config): address runtime path review feedback
* test(config): fix skills path e2e root
* test(config): cover legacy config fallback when project root lacks config files
Verifies that when DEER_FLOW_PROJECT_ROOT is unset and cwd has no
config.yaml/extensions_config.json, AppConfig and ExtensionsConfig fall back
to the legacy backend/repo-root candidates — the backward-compat path
requested in PR #2642 review.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(agents): propagate agent_name into ToolRuntime.context for setup_agent (#2677)
When creating a custom agent via the web UI, SOUL.md was always written
to the global base_dir/SOUL.md instead of agents/<name>/SOUL.md.
Root cause: the bootstrap flow sends agent_name via body.context, but
two layers were broken:
1. services.py only forwarded body.context keys into config["configurable"];
config["context"] was never populated.
2. worker.py constructed the parent Runtime with a hard-coded
{thread_id, run_id} context, ignoring config["context"] entirely.
After the langgraph >= 1.1.9 bump (#98a5b34f), ToolRuntime.context no
longer falls back to configurable, so setup_agent's
runtime.context.get("agent_name") returned None and the tool's silent
agent_name=None -> base_dir fallback kicked in, overwriting the global
SOUL.md.
Fix:
- services.py: extract merge_run_context_overrides() and write the
whitelisted context keys into both configurable (legacy readers) and
context (langgraph 1.1+ ToolRuntime consumers).
- worker.py: extract _build_runtime_context() and merge config["context"]
into the Runtime's context (without letting callers override
thread_id/run_id).
The base_dir fallback in setup_agent_tool.py is left in place because
the IM /bootstrap channel command depends on it. That code path can
be tightened in a follow-up.
Adds regression tests covering both helpers.
* Apply suggestions from code review
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Centralize log level parsing in `logging_level_from_config()` and
application in `apply_logging_level()` within `deerflow.config.app_config`.
- Gateway lifespan applies configured log level on startup
- `debug.py` uses shared helpers instead of local duplicates
- `apply_logging_level()` targets only `deerflow`/`app` logger hierarchies
so third-party library verbosity is not affected; root handler levels
are only lowered (never raised) to allow configured loggers through
without suppressing third-party output; root logger level is not modified
- Config field description updated to clarify scope
- Tests save/restore global logging state to avoid test pollution
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
* fix(memory): replace short-lived asyncio.run() with persistent event loop to prevent zombie httpx connections
The memory updater used asyncio.run() inside daemon threads, creating
and destroying short-lived event loops on every update. Langchain
providers (e.g. langchain-anthropic) cache httpx AsyncClient instances
globally via @lru_cache, so SSL connections created on a loop that is
subsequently destroyed become zombie connections in the shared pool.
When the main agent's lead run later reuses one of these connections,
httpx/anyio triggers RuntimeError: Event loop is closed during
connection cleanup.
Replace the ThreadPoolExecutor + asyncio.run() pattern with a
_MemoryLoopRunner that maintains a single persistent event loop in a
daemon thread for the process lifetime. Since the loop never closes,
connections bound to it never become invalid. The _run_async_update_sync
function now submits coroutines to this persistent loop via
run_coroutine_threadsafe instead of creating throwaway loops.
* update the code to address the review comments
* Fix the review comments of 2615
P1 — user_id forwarded through sync path: Added user_id parameter to _prepare_update_prompt, _finalize_update, and _do_update_memory_sync, and forwarded it to get_memory_data(agent_name, user_id=user_id) and
save(..., user_id=user_id). The update_memory() entry point now passes user_id through both the executor.submit path and the direct call path. Added TestUserIdForwarding with two regression tests (sync + async)
verifying get_memory_data and save receive the correct user_id.
P2 — aupdate_memory() delegates to sync: Replaced the model.ainvoke() call with asyncio.to_thread(self._do_update_memory_sync, ...). This eliminates the unsafe async provider client path entirely — all memory
updater entry points now use the isolated sync model.invoke() path. Updated the test from asserting ainvoke is awaited to asserting invoke is called and ainvoke is not.
Nit — duplicate comment removed: Removed the duplicated # Matches sentences... comment on line 230.
* Chore(test): update the code of test_memory_updater
---------
Co-authored-by: rayhpeng <rayhpeng@gmail.com>
* refactor: thread app_config through middleware factories
Continues the incremental config-refactor sequence (#2611 root, #2612 lead
path) one layer deeper into the middleware factories. Two ambient lookups
inside _build_runtime_middlewares are eliminated and the LLMErrorHandling
band-aid removed:
- _build_runtime_middlewares / build_lead_runtime_middlewares /
build_subagent_runtime_middlewares now require app_config: AppConfig.
- get_guardrails_config() inside the factory is replaced with
app_config.guardrails (semantically identical — same default-factory
GuardrailsConfig — verified by direct equality check).
- LLMErrorHandlingMiddleware.__init__ now requires app_config and reads
circuit_breaker fields directly. The class-level
circuit_failure_threshold / circuit_recovery_timeout_sec defaults are
removed along with the try/except (FileNotFoundError, RuntimeError):
pass band-aid — the let-it-crash invariant the rest of the refactor
enforces.
Caller chain (already-resolved app_config sources):
- _build_middlewares in lead_agent/agent.py: reorder so
resolved_app_config = app_config or get_app_config() is computed BEFORE
build_lead_runtime_middlewares is called, then passed as kwarg.
- SubagentExecutor: optional app_config parameter (mirrors the lead-agent
pattern); _create_agent does the same `or get_app_config()` fallback at
agent-build time, so task_tool callers don't need to plumb app_config
through yet (typed-context plumbing for tool runtimes is a separate
refactor).
Tests:
- test_llm_error_handling_middleware: _make_app_config helper using
AppConfig(sandbox=SandboxConfig(use="test")) — same minimal-config
pattern conftest already uses. Three direct LLMErrorHandlingMiddleware()
calls each followed by post-construction circuit_breaker mutation fold
cleanly into _build_middleware(circuit_failure_threshold=...,
circuit_recovery_timeout_sec=...).
Verification:
- tests/test_llm_error_handling_middleware.py — 14 passed
- tests/test_subagent_executor.py — 28 passed
- tests/test_tool_error_handling_middleware.py — 6 passed
- tests/test_task_tool_core_logic.py — 18 passed (verifies task_tool
unchanged behavior)
- Full suite: 2697 passed, 3 skipped. The single intermittent failure in
tests/test_client_e2e.py::test_tool_call_produces_events is pre-existing
LLM flakiness (the test asserts the model decided to call a tool;
reproduces 1/3 on unchanged main as well).
* fix: address middleware app config review comments
* fix: satisfy app config annotation lint
* test: cover explicit app config middleware wiring
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
* feat(models): 适配 MindIE引擎的模型
* test: add unit tests for MindIEChatModel adapter and fix PR review comments
* chore: update uv.lock with pytest-asyncio
* build: add pytest-asyncio to test dependencies
* fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex)
* fix(mindie): preserve string args without JSON quotes in XML tool call serialization
* fix(mindie): preserve string args without JSON quotes in XML tool call serialization
* test_mindie_provider:format
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(mindie): prevent nested tool_call params from leaking into outer args
* fixed by escaping XML entities in _fix_messages and unescaping during parse, with regression tests added.
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(sandbox): block host bash traversal escapes
Fixes#2535
* fix(sandbox): harden local bash path guards
* fix(sandbox): avoid bash cd argument false positives
* Fix the lint error
Add function to resolve and validate user data path.
* Fix the lint error
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(security): harden auth system and fix run journal logic bug
- Fix inverted condition in RunJournal.on_chat_model_start that prevented
first human message capture (not messages → messages)
- Pre-hash passwords with SHA-256 before bcrypt to avoid silent 72-byte
truncation vulnerability
- Move load_dotenv() from module scope into get_auth_config() to prevent
import-time os.environ mutation breaking test isolation
- Return generic ‘Invalid token’ instead of exposing specific error
variants (expired, malformed, invalid_signature) to clients
- Make @require_auth independently enforce 401 instead of silently
passing through when AuthMiddleware is absent
- Rate-limit /setup-status endpoint with per-IP cooldown to mitigate
initialization-state information leak
- Document in-process rate limiter limitation for multi-worker deployments
* fix(security): return 429+Retry-After on setup-status rate limit, bound cooldown dict
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/070d0be8-99a5-46c8-85bb-6b81b5284021
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(security): add versioned password hashes with auto-migration on login
The SHA-256 pre-hash change silently broke verification for any existing
bcrypt-only password hashes. Introduce a <N>$ prefix scheme so hashes
are self-describing:
- v2 (current): bcrypt(b64(sha256(password))) with $ prefix
- v1 (legacy): plain bcrypt, prefixed $ or bare (no prefix)
verify_password auto-detects the version and falls back to v1 for older
hashes. LocalAuthProvider.authenticate() now rehashes legacy hashes to v2
on successful login via needs_rehash(), so existing users upgrade
transparently without a dedicated migration step.
* fix(auth): harden verify_password, best-effort rehash, update require_auth docstring, downgrade journal logging
- password.py: wrap bcrypt.checkpw in try/except → return False for malformed/corrupt hashes instead of crashing
- local_provider.py: wrap auto-rehash update_user() in try/except so transient DB errors don't fail valid logins
- authz.py: update require_auth docstring to reflect independent 401 enforcement
- journal.py: downgrade on_chat_model_start from INFO to DEBUG, log only metadata (batch_count, message_counts) instead of full serialized/messages content
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/48c5cf31-a4ab-418a-982a-6343c37bb299
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
* fix(auth): address code review - narrow ValueError catch, add rehash warning log, rename num_batches
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/48c5cf31-a4ab-418a-982a-6343c37bb299
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>