deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-07-27 00:17:53 +00:00

Author	SHA1	Message	Date
Ryker_Feng	20debf9cc7	feat(agents): per-agent model and generation settings (#4347 ) * feat(agents): per-agent model and generation settings Let each custom agent choose its own model and sampling settings (temperature, max_tokens) plus thinking / reasoning_effort defaults, so agents sharing a model profile are no longer stuck with one shared temperature and output length (#4336). AgentConfig gains optional model_settings / thinking_enabled / reasoning_effort (None = inherit). create_chat_model applies per-caller model_overrides on top of the profile before the thinking/Codex transforms; the lead agent resolves each knob with precedence request > agent config > profile/default. The /api/agents create/update routes persist the fields and reject an unknown model. The default lead agent path is unchanged (no agent config -> overrides None). The agent chat composer also stops force-overriding an agent's configured default model with models[0]. * fix(agents): tri-state thinking control and default-model capability gating The model-settings dialog seeded the thinking switch to false, so opening it to tweak temperature and saving silently disabled thinking (the runtime default is on) with no way back to inherit. It also hid the thinking / reasoning controls whenever the agent inherited the global default model, since `__default__` never resolved through `models.find`. Give thinking an explicit Inherit / On / Off tri-state so an untouched save is a no-op, and resolve `__default__` to the effective default (models[0]) for the capability check. Logic lives in the tested helpers module.	2026-07-22 13:44:55 +08:00
Yufeng He	ae223199fd	fix(security): escape MindIE tool-response content against </tool_response> breakout (#4253 ) Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>	2026-07-17 14:23:44 +08:00
Aari	e2816eaa97	fix(models): scope the OpenAI-compat rules to BaseChatOpenAI, not a class-path allowlist (#4146 ) * fix(models): scope the OpenAI-compat rules to BaseChatOpenAI, not a class-path allowlist * address review: drop redundant stream_usage helper, close test matrix - Remove _enable_stream_usage_by_default and its now-unused _OPENAI_COMPAT_USE_PATHS tuple. The class-field stream_usage fallback already sets stream_usage=True for every BaseChatOpenAI subclass (they all declare the field), so the helper's use-path allowlist gated nothing real — verified a no-op in prod, and the two stream_usage tests stay green on main with the helper present. Those tests used a BaseChatModel stub that does not declare the field; point them at a real ChatOpenAI capturing class so they exercise the fallback they now depend on. - Give the non-OpenAI normalization-skip test an actual api_base value so it exercises the skip path (api_base passed through verbatim, never rewritten to base_url). - Add an Unreleased CHANGELOG entry for the api_base behavior change on the five affected subclasses. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-07-14 09:32:53 +08:00
Daoyuan Li	3e7baba39a	fix(models): apply stream_chunk_timeout default to all BaseChatOpenAI subclasses (#4102 ) * fix(models): apply stream_chunk_timeout default to all BaseChatOpenAI subclasses The 240s stream_chunk_timeout default (issue #3189, PR #3195) was scoped to a class-path allowlist of only ChatOpenAI and PatchedChatOpenAI. Every other OpenAI-compatible provider that subclasses BaseChatOpenAI — VllmChatModel, MindIEChatModel, PatchedChatDeepSeek, PatchedChatMiMo, PatchedChatStepFun and PatchedChatMiniMax — was excluded, so they kept langchain-openai's aggressive 120s built-in chunk-gap timeout and, worse, silently discarded a user's explicit stream_chunk_timeout override from config.yaml. Issue #3189 was itself reported on mimo-v2.5 (PatchedChatMiMo), the exact class the original fix left out. Gate the injection on issubclass(model_class, BaseChatOpenAI) instead of the string allowlist, so any OpenAI-compatible subclass inherits the default and honors an explicit override. Genuinely non-OpenAI clients (e.g. ChatAnthropic) stay excluded and still have the kwarg dropped before it reaches a constructor that would divert it into model_kwargs and fail at request time. * fix(models): address review nits on stream_chunk_timeout default Correct the module-level comment above _DEFAULT_STREAM_CHUNK_TIMEOUT_SECONDS: langchain-openai's built-in stream_chunk_timeout default is 120s, not 60s (BaseChatOpenAI.stream_chunk_timeout's default_factory reads LANGCHAIN_OPENAI_STREAM_CHUNK_TIMEOUT_S with a 120.0 fallback). Simplify the BaseChatOpenAI gate in _apply_stream_chunk_timeout_default from `isinstance(model_class, type) and issubclass(model_class, BaseChatOpenAI)` to just `issubclass(...)`. The sole caller passes model_class from resolve_class(), which already raises before returning anything that isn't a type, so the isinstance half can never be False there. Also soften the docstring's non-OpenAI-client bullet: ChatAnthropic declares extra="ignore" and silently drops an unrecognized kwarg rather than diverting it into model_kwargs and failing at request time (that failure mode is specific to other OpenAI-style clients).	2026-07-13 21:28:44 +08:00
Prax Lannister	f6a910dec9	fix(models): normalize api_base->base_url for ChatOpenAI + warn on unknown config keys (#3790 ) ModelConfig is `extra="allow"`, so a config key like `api_base` (which config.example.yaml uses for other model classes, e.g. PatchedChatDeepSeek and Moonshot) gets copied onto a `langchain_openai:ChatOpenAI` model by users. LangChain's OpenAI client does not reject the unknown kwarg — it transfers it into `model_kwargs` (with a UserWarning), which is then spread into every `Completions.create()` call and rejected by the OpenAI SDK at REQUEST time with an opaque `unexpected keyword argument 'api_base'` error. The endpoint override is also silently dropped, so the model targets the wrong base URL. Changes in factory.py, mirroring the existing OpenAI-compatible helpers: - `_normalize_openai_base_url`: renames `api_base` -> `base_url` for the OpenAI-compatible family (ChatOpenAI + PatchedChatOpenAI); when an endpoint key is already present, drops the alias with a warning. Runs before the stream_usage/stream_chunk_timeout heuristics so they see the canonical key. - `_warn_unknown_model_settings`: scoped to the same OpenAI-compatible family (where the model_kwargs divert-and-crash actually happens and the field/alias set is accurate), logs an actionable warning for unrecognized config keys. - The three OpenAI-compatible helpers now share `_OPENAI_COMPAT_USE_PATHS` instead of disagreeing on the literal class string. Adds a note to docs/CONFIGURATION.md and 9 tests covering normalization (ChatOpenAI + PatchedChatOpenAI, both-set precedence for base_url and openai_api_base, non-OpenAI class untouched, no-op when unset) and the unknown-key warning (fires on a typo, silent on a clean config and on non-OpenAI providers like ChatAnthropic).	2026-07-05 10:17:25 +08:00
codeingforcoffee	4669d3c089	feat(gateway): cache-aware cost accounting (#3920 ) * feat(gateway): cache-aware cost accounting + /api/console observability endpoints - Capture prompt-cache hits (usage_metadata.input_token_details.cache_read) in RunJournal and SubagentTokenCollector as a sparse cache_read_tokens key in token_usage_by_model (JSON field — no schema migration; legacy bucket shapes unchanged) - New read-only /api/console router: GET /stats (headline counters), GET /runs (cross-thread paginated history joined with thread titles), GET /usage (zero-filled daily token series + per-model breakdown); user-scoped, 503 on the memory database backend - Optional models[].pricing (currency, input_per_million, output_per_million, input_cache_hit_per_million) powers real spend estimation; cache-hit input tokens are billed at the hit price (omitted hit price falls back to the miss price as a conservative upper bound); unpriced models yield cost: null - create_chat_model strips the presentation-only pricing block so it never reaches the provider client (unknown kwargs are forwarded into the completion payload and break live calls) - Tests: console router SQLite round-trips, journal/collector cache capture incl. a DeepSeek raw-usage pin test, factory strip regression Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> refactor: address review feedback on cost sum and sparse cache_read_tokens - console.py: replace the walrus-in-generator total-cost sum with an explicit loop (review noted the multi-line form reads ambiguously) - token_collector.py: omit cache_read_tokens from usage records when the provider reported no cache hits, matching the journal's sparse per-model bucket shape; absent is treated as 0 downstream - add a regression test pinning the sparse record shape Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: coffeeFish <codeingforcoffee@users.noreply.github.com> Co-authored-by: Claude Fable 5 <noreply@anthropic.com>	2026-07-04 23:14:46 +08:00
hataa	37337b77f9	feat(models): add StepFun reasoning model adapter (#3461 ) Add PatchedChatStepFun adapter for StepFun reasoning models (step-3.7-flash, step-3.5-flash). Captures reasoning from both streaming and non-streaming responses and replays it on historical assistant messages for multi-turn tool-call conversations. - New: PatchedChatStepFun adapter with streaming/non-streaming reasoning capture - Support both reasoning and reasoning_content field names - 17 unit tests covering all response paths - Updated: config.example.yaml with StepFun configuration example	2026-06-09 18:01:43 +08:00
DanielWalnut	cd5bedaa74	feat: MiniMax provider for image/video/podcast skills + new music-generation skill (#3437 ) * docs(spec): MiniMax integration for generation skills + new music skill Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(plan): MiniMax generation providers implementation plan Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(skills): add importlib loader + FakeResp for skill tests * test(skills): register loaded module in sys.modules; raise requests.HTTPError in FakeResp * feat(image-generation): add MiniMax provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(image-generation): guard unknown provider, derive ref MIME, strengthen tests Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(video-generation): add MiniMax provider with async poll/download Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(video-generation): surface base_resp errors while polling; add timeout test * feat(podcast-generation): add MiniMax t2a_v2 provider with env auto-detect Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(podcast-generation): restore TTS credential guard; add volcengine + voice tests Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(music-generation): new MiniMax music skill via skill-creator Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(music-generation): treat empty lyrics as absent; test no-audio-data path * refactor(skills): add request timeouts to MiniMax network calls Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * Potential fix for pull request finding 'Explicit returns mixed with implicit (fall through) returns' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * fix(models): strip inconsistent user-message names for MiniMax chat DeerFlow middlewares tag user messages with provenance names (user-input, summary, loop_warning); langchain serializes them into the OpenAI-compatible payload and MiniMax rejects mismatched user-message names with "user name must be consistent (2013)". PatchedChatMiniMax now drops the per-message name from user-role messages. Point the config.example MiniMax models at PatchedChatMiniMax so they also get reasoning_content mapping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(image-generation): MiniMax sends JSON prompt field, guard 1500-char limit MiniMax image-01 takes one text string capped at 1500 chars, but the skill was sending the whole structured JSON. The MiniMax provider now extracts the JSON `prompt` field (relying on prompt_optimizer to expand it) and fails fast with a clear error before calling the API when that field exceeds 1500 chars. Authoring stays provider-agnostic; Gemini still receives the full JSON. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(podcast-generation): per-provider TTS concurrency and retry/backoff Each TTS provider owns its concurrency internally — MiniMax runs single-threaded to reduce rate-limit failures, Volcengine keeps 4 workers — with automatic retry and backoff on transient HTTP and base_resp errors. No caller-facing concurrency knob. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(skills): address Copilot review comments on generation skills - video: add raise_for_status + timeout to the Gemini download/POST/poll calls so non-2xx responses surface as clear HTTP errors instead of JSON/KeyError or hangs - video: check the task Fail status before the generic base_resp check so the failure keeps its task_id context - video/image: create the output file parent directory before writing (matching music-generation) so nested output paths do not raise FileNotFoundError - music: require a non-empty prompt and fail fast with ValueError instead of sending an empty prompt to the API Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(scripts): reclaim dev ports across worktrees in make stop/dev All deer-flow worktrees (main checkout + linked worktrees) hardcode the same dev ports (8001/3000/2026), so a service started from any worktree must be reclaimable from another. stop_all now resolves the set of worktree roots (DEERFLOW_ROOTS) and treats a process as deer-flow-owned when its open files live under any of them. It also force-kills survivors on 2026 alongside 8001/3000, fixing `make dev` aborting on the nginx port preflight when a prior nginx lingered on 2026. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(view-image): hide the injected image-context message from the UI ViewImageMiddleware injects a HumanMessage (text + base64 images) so the vision model can see viewed images, but it was the only internal injector that set neither hide_from_ui nor a hidden name, so it leaked into the chat UI (and IM channels) as a user bubble reading "Here are the images you've viewed:". Mark it with additional_kwargs={"hide_from_ui": True}, matching todo/dynamic_context injections, which the frontend isHiddenFromUIMessage and the channel sender already honor. The model still receives the full content. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(minimax): mark M2.7 models as text-only (no vision) MiniMax M2.7 / M2.7-highspeed do not support vision; only M3 does. The provider config asserted vision support for M2.7 in four places. - config.example.yaml: 4 M2.7 entries -> supports_vision: false - backend/docs/CONFIGURATION.md: M2.7 + highspeed -> supports_vision: false - wizard: add LLMProvider.model_vision_overrides + extra_config_for() so selecting an M2.7 model writes supports_vision: false while M3 (default) keeps vision; wire it through setup_wizard.py - tests: M2.7-highspeed fixture -> supports_vision=False; add test_minimax_vision_is_per_model Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>	2026-06-08 22:04:38 +08:00
Huixin615	88e36d9686	fix(#3189 ): prevent write_file streaming timeout on long reports (#3195 ) * fix(#3189): prevent write_file streaming timeout on long reports Adds a layered defense against StreamChunkTimeoutError caused by oversized single-shot write_file tool calls: - factory: default stream_chunk_timeout to 240s for OpenAI-compatible clients (overridable via ModelConfig.stream_chunk_timeout in config.yaml) - sandbox/tools: server-side 80 KB length guard on non-append write_file calls (configurable via DEERFLOW_WRITE_FILE_MAX_BYTES env var, 0 disables); rejects oversized payloads with a structured error pointing the model at str_replace or append=True - middleware: classify StreamChunkTimeoutError as transient but cap retries at 1 via per-exception _RETRY_BUDGET_OVERRIDES (same-payload retry on a chunk-gap timeout buffers the same way upstream; full 3-attempt loop would stack 6-12 min of dead air) - middleware: surface an actionable user-facing message for stream-drop exceptions instead of leaking the raw langchain stack - prompts: add a routing-style File Editing Workflow hint to both lead_agent and general_purpose subagent prompts, pointing the model at str_replace for incremental edits (mirrors Claude Code's Edit / Codex's apply_patch) - tests: behavioural coverage for size guard, retry budget override, stream-drop user message, factory default injection Refs #3189 * fix(#3189): drop stream_chunk_timeout for non-OpenAI providers Address CR feedback on PR #3195: - factory: pop `stream_chunk_timeout` from kwargs for any model_use_path other than `langchain_openai:ChatOpenAI` instead of returning early. `ModelConfig.stream_chunk_timeout` is part of the shared schema, so a user-supplied value on a non-OpenAI provider would otherwise be forwarded to its constructor and raise `TypeError: unexpected keyword argument`. - factory: rewrite docstring to describe the actual `exclude_none=True` behaviour (explicit null is excluded and falls back to the default) instead of the misleading "None falling out via exclude_none=True keeps its value". - tests: add regression coverage asserting the kwarg is stripped before reaching a non-OpenAI provider's constructor. Refs: bytedance#3189 * fix(#3189): restrict stream-drop user copy to StreamChunkTimeoutError only Per CR on #3195: narrow _STREAM_DROP_EXCEPTIONS to StreamChunkTimeoutError. Generic httpx RemoteProtocolError / ReadError fall back to the standard 'temporarily unavailable' copy, since they routinely fire on transient network blips where the 'split the output' guidance is misleading. Retry/backoff classification is unchanged — both remain transient/retriable. Tests updated to reflect new copy, plus a symmetric regression test for ReadError. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-06-07 17:47:11 +08:00
AochenShen99	4093c83383	refactor(provider): share assistant payload replay matching (#3307 ) * Share assistant payload replay matching * fix(provider): recover assistant field when ordinal AI index is taken The mismatch-length fallback in `_match_ai_message` only tried the exact `fallback_ordinal` AI index. When serialization drops or reorders an assistant message, a unique signature match can consume a non-ordinal index, leaving a later ambiguous payload's ordinal already used — so its provider field (e.g. `reasoning_content`) was silently dropped. Scan forward from the ordinal for the next unused `AIMessage` (wrapping to earlier indices) to preserve the positional bias while still recovering the field. Forward scanning avoids a naive min-unused pick that could restore the wrong field after a leading message is dropped. Add a regression test for the dropped-leading-message case. * fix(provider): avoid earlier assistant fallback replay	2026-05-29 23:05:59 +08:00
AochenShen99	44677c5eb4	feat(provider) Add patched MiMo reasoning content support (#3298 ) * Add patched MiMo reasoning content support * Clarify MiMo patched model coverage * Remove unused MiMo payload index * Address MiMo review nits	2026-05-28 18:24:32 +08:00
Xinmin Zeng	df95154282	fix(tracing): propagate session_id and user_id into Langfuse traces (#2944 ) * fix(tracing): propagate session_id and user_id into Langfuse traces Adds Langfuse v4 reserved trace attributes (langfuse_session_id, langfuse_user_id, langfuse_trace_name, langfuse_tags) to RunnableConfig.metadata inside the run worker, so the langchain CallbackHandler can lift them onto the root trace. - New deerflow.tracing.metadata.build_langfuse_trace_metadata() returns the reserved keys when Langfuse is in the enabled providers, else {}. - worker.run_agent merges them with setdefault so caller-supplied keys win, allowing per-request overrides from upstream metadata. - session_id mirrors the LangGraph thread_id; user_id reads get_effective_user_id() (falls back to "default" in no-auth mode). - trace_name defaults to "lead-agent"; tags carry env and model name when DEER_FLOW_ENV (or ENVIRONMENT) and a model name are present. Closes #2930 * fix(tracing): attach Langfuse callback at graph root so metadata propagates The first commit injected ``langfuse_session_id`` / ``langfuse_user_id`` / ``langfuse_trace_name`` / ``langfuse_tags`` into ``RunnableConfig.metadata``, but on ``main`` the Langfuse callback is attached at model level (``models/factory.py``). LangChain still threads ``parent_run_id`` through the contextvar, so the handler sees the model as a nested observation and ``__on_llm_action`` strips the ``langfuse_`` keys (``keep_langfuse_trace_attributes=False``). The trace's top-level ``sessionId`` / ``userId`` therefore stayed empty in deer-flow's LangGraph runtime — confirmed live against a real Langfuse instance. This commit moves the callback to the graph invocation root* so the handler fires ``on_chain_start(parent_run_id=None)`` and runs the ``propagate_attributes`` path that actually lifts ``session_id`` / ``user_id`` onto the trace: - ``models/factory.py``: add ``attach_tracing`` keyword (default ``True``) so standalone callers (``MemoryUpdater``, etc.) keep their direct model-level tracing. - ``agents/lead_agent/agent.py``: call ``build_tracing_callbacks()`` once inside ``_make_lead_agent`` and append the result to ``config["callbacks"]``; the four in-graph ``create_chat_model`` sites (bootstrap, default agent, sync + async summarization) pass ``attach_tracing=False`` to avoid duplicate spans. - ``agents/middlewares/title_middleware.py``: same ``attach_tracing=False`` for the title-generation model, since it inherits the graph's RunnableConfig via ``_get_runnable_config``. Test updates: - ``tests/test_lead_agent_model_resolution.py`` and ``tests/test_title_middleware_core_logic.py``: extend the fake ``create_chat_model`` signatures / mock assertions to accept the new ``attach_tracing`` kwarg. - ``tests/test_worker_langfuse_metadata.py``: switch the no-user fallback test from direct ContextVar mutation to ``monkeypatch.setattr`` on ``get_effective_user_id`` to avoid pollution across the langfuse OTel global tracer provider. - ``tests/conftest.py``: add an autouse fixture that resets ``deerflow.config.title_config._title_config`` to its pristine default after every test. Any test that loads the real ``config.yaml`` (via ``get_app_config()``) calls ``load_title_config_from_dict`` and mutates the module-level singleton, which previously poisoned the title-middleware suite when run after, e.g., the new ``test_worker_langfuse_metadata.py`` cases. The fixture is independent of this PR's main change but unblocks the cross-file test run. Live verification (same Langfuse instance as before): - Drove ``worker.run_agent`` against the real ``make_lead_agent`` + ``gpt-4o-mini`` for three distinct ``user_context`` identities (``fancy-engineer``, ``alice-pm``, ``bob-designer``). - Each run produced one ``lead-agent`` trace whose top-level ``sessionId`` / ``userId`` / ``tags`` carry the expected values, e.g. ``session=e2e-2930-8f347c-alice-pm user=alice-pm name='lead-agent' tags=['model:gpt-4o-mini']``. Refs #2930. * fix(tracing): extend root-callback + metadata injection to the embedded client Addresses Copilot review on PR #2944. Commit 2 disabled model-level tracing for ``TitleMiddleware`` and ``_create_summarization_middleware`` because ``_make_lead_agent`` now attaches the tracing callbacks at the graph invocation root. But the embedded ``DeerFlowClient`` does not call ``_make_lead_agent`` — it calls ``_build_middlewares`` directly and never appends the tracing handlers to its ``RunnableConfig``. So under the embedded path, title-generation and summarization LLM calls were left untraced — a regression introduced by this PR. This commit mirrors the gateway worker's injection in ``DeerFlowClient.stream``: - Append ``build_tracing_callbacks()`` to ``config["callbacks"]`` so the Langfuse handler sees ``on_chain_start(parent_run_id=None)`` at the graph root and runs the ``propagate_attributes`` path. - Merge ``build_langfuse_trace_metadata(...)`` into ``config["metadata"]`` with ``setdefault`` so caller-supplied keys still win. - ``_ensure_agent`` now creates its main model with ``attach_tracing=False`` to avoid duplicate spans now that the callback lives at the graph root. Docs: - ``backend/CLAUDE.md`` Tracing section rewritten to describe the graph-root attachment model (replacing the inaccurate "at model-creation time" wording). - ``README.md`` Langfuse section now lists both injection points (worker + client) instead of only the worker path. Tests: - ``tests/test_client_langfuse_metadata.py`` (new, 3 cases): callbacks + metadata are injected when Langfuse is enabled, caller-supplied metadata overrides win via ``setdefault``, and the injection is inert when Langfuse is disabled. Live verification on the real Langfuse instance: === user=fancy-client === id=cbd22847.. session=client-2930-6b9491-fancy-client user=fancy-client name='lead-agent' === user=alice-client === id=b4f6f576.. session=client-2930-6b9491-alice-client user=alice-client name='lead-agent' Refs #2930. * refactor(tracing): address maintainer review on PR #2944 Addresses @WillemJiang's 5 comments. 1. Duplicated metadata-injection code between worker.py and client.py New ``deerflow.tracing.inject_langfuse_metadata(config, ...)`` helper takes the 10-line build + merge + setdefault logic that was duplicated in ``runtime/runs/worker.py`` and ``client.py``. Both callers now share a single source of truth, so the two paths cannot drift. 2. Direct private-attribute mutation in conftest.py and tests Added public ``reset_tracing_config()`` / ``reset_title_config()`` functions. ``tests/conftest.py`` and every test that previously did ``tracing_module._tracing_config = None`` or ``title_module._title_config = TitleConfig()`` now goes through the public API. A future internal rename will surface as an ImportError instead of a silent no-op. 3. client.py reading os.environ directly ``DeerFlowClient.__init__`` grows an optional ``environment`` parameter so programmatic callers can pass the deployment label explicitly. ``stream()`` consults ``self._environment`` first and only falls back to ``DEER_FLOW_ENV`` / ``ENVIRONMENT`` env vars when nothing was passed in. Backwards compatible — env-var behaviour preserved for callers that opt to keep using it. 4. build_tracing_callbacks() cached on hot path Not implemented. Inspected the langfuse v4 ``langchain.CallbackHandler`` constructor: it only resolves the module-level singleton client via ``get_client()`` and initialises a few dicts (no I/O, no env parsing at construction time). The build is essentially free. Caching would trade a non-measurable speedup for two real risks: handler instances carry per-run state internally (``_run_states``, ``_root_run_states``, ``last_trace_id``), and tracing config can be reloaded by env-var changes between runs. Will revisit if profiling ever shows it as a hot spot. 5. attach_tracing=False easy to forget at new in-graph call sites - Module docstring at the top of ``lead_agent/agent.py`` documents the invariant ("every in-graph ``create_chat_model`` MUST pass ``attach_tracing=False``") and enumerates the current sites. - New regression test ``test_make_lead_agent_attaches_tracing_callbacks_at_graph_root`` in ``tests/test_lead_agent_model_resolution.py`` locks both halves of the invariant: ``config["callbacks"]`` carries the tracing handler after ``_make_lead_agent``, AND every ``create_chat_model`` call captured by the test passes ``attach_tracing=False``. A future in-graph site that forgets the flag will fail this test. Lint clean. Full touched-suite bundle: 246 passed. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-05-21 16:49:31 +08:00
DanielWalnut	c1b7f1d189	feat: static system prompt with DynamicContextMiddleware for prefix-cache optimization (#2801 ) * feat(middleware): inject dynamic context via DynamicContextMiddleware Move memory and current date out of the system prompt and into a dedicated <system-reminder> HumanMessage injected once per session (frozen-snapshot pattern) via a new DynamicContextMiddleware. This keeps the system prompt byte-exact across all users and sessions, enabling maximum Anthropic/Bedrock prefix-cache reuse. Key design decisions: - ID-swap technique: reminder takes the first HumanMessage's ID (replacing it in-place via add_messages), original content gets a derived `{id}__user` ID (appended after). Preserves correct ordering. - hide_from_ui: True on reminder messages so frontend filters them out. - Midnight crossing: date-update reminder injected before the current turn's HumanMessage when the conversation spans midnight. - INFO-level logging for production diagnostics. Also adds prompt-caching breakpoint budget enforcement tests and updates ClaudeChatModel docs to reference the new pattern. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(token-usage): log input/output token detail breakdown in middleware Extend the LLM token usage log line to include input_token_details and output_token_details (cache_creation, cache_read, reasoning, audio, etc.) when present. Adds tests covering Anthropic cache detail logging from both usage_metadata and response_metadata. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: fix nginx * fix(middleware): always inject date; gate memory on injection_enabled Date injection is now unconditional — it is part of the static system prompt replacement and should always be present. Memory injection remains gated by `memory.injection_enabled` in the app config. Previously the entire DynamicContextMiddleware was skipped when injection_enabled was False, which also suppressed the date. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): format files and correct test assertions for token usage middleware - ruff format dynamic_context_middleware.py and test_claude_provider_prompt_caching.py - Remove unused pytest import from test_dynamic_context_middleware.py - Fix two tests that asserted response_metadata fallback logic that doesn't exist: replace with tests that match actual middleware behavior Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(middleware): address Copilot review comments on DynamicContextMiddleware - Use additional_kwargs flag for reminder detection instead of content substring matching, so user messages containing '<system-reminder>' are not mistakenly treated as injected reminders - Generate stable UUID when original HumanMessage.id is None to prevent ambiguous 'None__user' derived IDs and message collisions - Downgrade per-turn no-op log to DEBUG; keep actual injection events at INFO - Add two new tests: missing-id UUID fallback and user-text false-positive Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-09 09:27:02 +08:00
Willem Jiang	bb8b234d85	chroe(2585): keep polishing the code of codex token usage (#2689 )	2026-05-02 15:04:11 +08:00
KiteEater	866d1ca409	Populate Codex usage metadata for token accounting (#2585 )	2026-05-02 11:16:03 +08:00
Willem Jiang	844ad8e528	Merge branch 'main' into release/2.0-rc	2026-04-28 15:44:02 +08:00
pyp0327	395c14357b	chore(adpator):Adapt MindIE engine model and improve testing and fixes (#2523 ) * feat(models): 适配 MindIE引擎的模型 * test: add unit tests for MindIEChatModel adapter and fix PR review comments * chore: update uv.lock with pytest-asyncio * build: add pytest-asyncio to test dependencies * fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex) * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * fix(mindie): preserve string args without JSON quotes in XML tool call serialization * test_mindie_provider:format * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(mindie): prevent nested tool_call params from leaking into outer args * fixed by escaping XML entities in _fix_messages and unescaping during parse, with regression tests added. --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-04-28 15:09:31 +08:00
greatmengqi	e82940c03d	refactor: thread release config through lead path (#2612 ) Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>	2026-04-28 14:53:18 +08:00
rayhpeng	d8ecaf46c9	feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930 ) * feat(persistence): add SQLAlchemy 2.0 async ORM scaffold Introduce a unified database configuration (DatabaseConfig) that controls both the LangGraph checkpointer and the DeerFlow application persistence layer from a single `database:` config section. New modules: - deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends - deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton - deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation Gateway integration initializes/tears down the persistence engine in the existing langgraph_runtime() context manager. Legacy checkpointer config is preserved for backward compatibility. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add RunEventStore ABC + MemoryRunEventStore Phase 2-A prerequisite for event storage: adds the unified run event stream interface (RunEventStore) with an in-memory implementation, RunEventsConfig, gateway integration, and comprehensive tests (27 cases). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints Phase 2-B: run persistence + event storage + token tracking. - ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow - RunRepository implements RunStore ABC via SQLAlchemy ORM - ThreadMetaRepository with owner access control - DbRunEventStore with trace content truncation and cursor pagination - JsonlRunEventStore with per-run files and seq recovery from disk - RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events, accumulates token usage by caller type, buffers and flushes to store - RunManager now accepts optional RunStore for persistent backing - Worker creates RunJournal, writes human_message, injects callbacks - Gateway deps use factory functions (RunRepository when DB available) - New endpoints: messages, run messages, run events, token-usage - ThreadCreateRequest gains assistant_id field - 92 tests pass (33 new), zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(persistence): add user feedback + follow-up run association Phase 2-C: feedback and follow-up tracking. - FeedbackRow ORM model (rating +1/-1, optional message_id, comment) - FeedbackRepository with CRUD, list_by_run/thread, aggregate stats - Feedback API endpoints: create, list, stats, delete - follow_up_to_run_id in RunCreateRequest (explicit or auto-detected from latest successful run on the thread) - Worker writes follow_up_to_run_id into human_message event metadata - Gateway deps: feedback_repo factory + getter - 17 new tests (14 FeedbackRepository + 3 follow-up association) - 109 total tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config - config.example.yaml: deprecate standalone checkpointer section, activate unified database:sqlite as default (drives both checkpointer + app data) - New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage including check_access owner logic, list_by_owner pagination - Extended test_run_repository.py (+4 tests) — completion preserves fields, list ordering desc, limit, owner_none returns all - Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false, middleware no ai_message, unknown caller tokens, convenience fields, tool_error, non-summarization custom event - Extended test_run_event_store.py (+7 tests) — DB batch seq continuity, make_run_event_store factory (memory/db/jsonl/fallback/unknown) - Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists, follow-up metadata, summarization in history, full DB-backed lifecycle - Fixed DB integration test to use proper fake objects (not MagicMock) for JSON-serializable metadata - 157 total Phase 2 tests pass, zero regressions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * config: move default sqlite_dir to .deer-flow/data Keep SQLite databases alongside other DeerFlow-managed data (threads, memory) under the .deer-flow/ directory instead of a top-level ./data folder. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now() - Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM models. Add json_serializer=json.dumps(ensure_ascii=False) to all create_async_engine calls so non-ASCII text (Chinese etc.) is stored as-is in both SQLite and Postgres. - Change ORM datetime defaults from datetime.now(UTC) to datetime.now(), remove UTC imports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): simplify deps.py with getter factory + inline repos - Replace 6 identical getter functions with _require() factory. - Inline 3 _make__repo() factories into langgraph_runtime(), call get_session_factory() once instead of 3 times. - Add thread_meta upsert in start_run (services.py). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> feat(docker): add UV_EXTRAS build arg for optional dependencies Support installing optional dependency groups (e.g. postgres) at Docker build time via UV_EXTRAS build arg: UV_EXTRAS=postgres docker compose build Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(journal): fix flush, token tracking, and consolidate tests RunJournal fixes: - _flush_sync: retain events in buffer when no event loop instead of dropping them; worker's finally block flushes via async flush(). - on_llm_end: add tool_calls filter and caller=="lead_agent" guard for ai_message events; mark message IDs for dedup with record_llm_usage. - worker.py: persist completion data (tokens, message count) to RunStore in finally block. Model factory: - Auto-inject stream_usage=True for BaseChatOpenAI subclasses with custom api_base, so usage_metadata is populated in streaming responses. Test consolidation: - Delete test_phase2b_integration.py (redundant with existing tests). - Move DB-backed lifecycle test into test_run_journal.py. - Add tests for stream_usage injection in test_model_factory.py. - Clean up executor/task_tool dead journal references. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): widen content type to str\|dict in all store backends Allow event content to be a dict (for structured OpenAI-format messages) in addition to plain strings. Dict values are JSON-serialized for the DB backend and deserialized on read; memory and JSONL backends handle dicts natively. Trace truncation now serializes dicts to JSON before measuring. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(events): use metadata flag instead of heuristic for dict content detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(converters): add LangChain-to-OpenAI message format converters Pure functions langchain_to_openai_message, langchain_to_openai_completion, langchain_messages_to_openai, and _infer_finish_reason for converting LangChain BaseMessage objects to OpenAI Chat Completions format, used by RunJournal for event storage. 15 unit tests added. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(converters): handle empty list content as null, clean up test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): human_message content uses OpenAI user message format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): ai_message uses OpenAI format, add ai_tool_call message event - ai_message content now uses {"role": "assistant", "content": "..."} format - New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls - ai_tool_call uses langchain_to_openai_message converter for consistent format - Both events include finish_reason in metadata ("stop" or "tool_calls") Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add tool_result message event with OpenAI tool message format Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end, then emit a tool_result message event (role=tool, tool_call_id, content) after each successful tool completion. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): summary content uses OpenAI system message format Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format Add on_chat_model_start to capture structured prompt messages as llm_request events. Replace llm_end trace events with llm_response using OpenAI Chat Completions format. Track llm_call_index to pair request/response events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(events): add record_middleware method for middleware trace events Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * test(events): add full run sequence integration test for OpenAI content format Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(events): align message events with checkpoint format and add middleware tag injection - Message events (ai_message, ai_tool_call, tool_result, human_message) now use BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages - on_tool_end extracts tool_call_id/name/status from ToolMessage objects - on_tool_error now emits tool_result message events with error status - record_middleware uses middleware:{tag} event_type and middleware category - Summarization custom events use middleware:summarize category - TitleMiddleware injects middleware:title tag via get_config() inheritance - SummarizationMiddleware model bound with middleware:summarize tag - Worker writes human_message using HumanMessage.model_dump() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): switch search endpoint to threads_meta table and sync title - POST /api/threads/search now queries threads_meta table directly, removing the two-phase Store + Checkpointer scan approach - Add ThreadMetaRepository.search() with metadata/status filters - Add ThreadMetaRepository.update_display_name() for title sync - Worker syncs checkpoint title to threads_meta.display_name on run completion - Map display_name to values.title in search response for API compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(threads): history endpoint reads messages from event store - POST /api/threads/{thread_id}/history now combines two data sources: checkpointer for checkpoint_id, metadata, title, thread_data; event store for messages (complete history, not truncated by summarization) - Strip internal LangGraph metadata keys from response - Remove full channel_values serialization in favor of selective fields Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove duplicate optional-dependencies header in pyproject.toml Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(middleware): pass tagged config to TitleMiddleware ainvoke call Without the config, the middleware:title tag was not injected, causing the LLM response to be recorded as a lead_agent ai_message in run_events. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: resolve merge conflict in .env.example Keep both DATABASE_URL (from persistence-scaffold) and WECOM credentials (from main) after the merge. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address review feedback on PR #1851 - Fix naive datetime.now() → datetime.now(UTC) in all ORM models - Fix seq race condition in DbRunEventStore.put() with FOR UPDATE and UNIQUE(thread_id, seq) constraint - Encapsulate _store access in RunManager.update_run_completion() - Deduplicate _store.put() logic in RunManager via _persist_to_store() - Add update_run_completion to RunStore ABC + MemoryRunStore - Wire follow_up_to_run_id through the full create path - Add error recovery to RunJournal._flush_sync() lost-event scenario - Add migration note for search_threads breaking change - Fix test_checkpointer_none_fix mock to set database=None Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update uv.lock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality Bug fixes: - Sanitize log params to prevent log injection (CodeQL) - Reset threads_meta.status to idle/error when run completes - Attach messages only to latest checkpoint in /history response - Write threads_meta on POST /threads so new threads appear in search Lint fixes: - Remove unused imports (journal.py, migrations/env.py, test_converters.py) - Convert lambda to named function (engine.py, Ruff E731) - Remove unused logger definitions in repos (Ruff F841) - Add logging to JSONL decode errors and empty except blocks - Separate assert side-effects in tests (CodeQL) - Remove unused local variables in tests (Ruff F841) - Fix max_trace_content truncation to use byte length, not char length Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * style: apply ruff format to persistence and runtime files Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Potential fix for pull request finding 'Statement has no effect' Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> * refactor(runtime): introduce RunContext to reduce run_agent parameter bloat Extract checkpointer, store, event_store, run_events_config, thread_meta_repo, and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context() in deps.py to build the base context from app.state singletons. start_run() uses dataclasses.replace() to enrich per-run fields before passing ctx to run_agent. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): move sanitize_log_param to app/gateway/utils.py Extract the log-injection sanitizer from routers/threads.py into a shared utils module and rename to sanitize_log_param (public API). Eliminates the reverse service → router import in services.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * perf: use SQL aggregation for feedback stats and thread token usage Replace Python-side counting in FeedbackRepository.aggregate_by_run with a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread abstract method with SQL GROUP BY implementation in RunRepository and Python fallback in MemoryRunStore. Simplify the thread_token_usage endpoint to delegate to the new method, eliminating the limit=10000 truncation risk. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: annotate DbRunEventStore.put() as low-frequency path Add docstring clarifying that put() opens a per-call transaction with FOR UPDATE and should only be used for infrequent writes (currently just the initial human_message event). High-throughput callers should use put_batch() instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(threads): fall back to Store search when ThreadMetaRepository is unavailable When database.backend=memory (default) or no SQL session factory is configured, search_threads now queries the LangGraph Store instead of returning 503. Returns empty list if neither Store nor repo is available. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata Add ThreadMetaStore abstract base class with create/get/search/update/delete interface. ThreadMetaRepository (SQL) now inherits from it. New MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments. deps.py now always provides a non-None thread_meta_repo, eliminating all `if thread_meta_repo is not None` guards in services.py, worker.py, and routers/threads.py. search_threads no longer needs a Store fallback branch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(history): read messages from checkpointer instead of RunEventStore The /history endpoint now reads messages directly from the checkpointer's channel_values (the authoritative source) instead of querying RunEventStore.list_messages(). The RunEventStore API is preserved for other consumers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(persistence): address new Copilot review comments - feedback.py: validate thread_id/run_id before deleting feedback - jsonl.py: add path traversal protection with ID validation - run_repo.py: parse `before` to datetime for PostgreSQL compat - thread_meta_repo.py: fix pagination when metadata filter is active - database_config.py: use resolve_path for sqlite_dir consistency Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Implement skill self-evolution and skill_manage flow (#1874) * chore: ignore .worktrees directory * Add skill_manage self-evolution flow * Fix CI regressions for skill_manage * Address PR review feedback for skill evolution * fix(skill-evolution): preserve history on delete * fix(skill-evolution): tighten scanner fallbacks * docs: add skill_manage e2e evidence screenshot * fix(skill-manage): avoid blocking fs ops in session runtime --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> * fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir resolve_path() resolves relative to Paths.base_dir (.deer-flow), which double-nested the path to .deer-flow/.deer-flow/data/app.db. Use Path.resolve() (CWD-relative) instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Feature/feishu receive file (#1608) * feat(feishu): add channel file materialization hook for inbound messages - Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op. - Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text. - Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files. - No impact on Slack/Telegram or other channels (they inherit the default no-op). * style(backend): format code with ruff for lint compliance - Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format` - Ensured both files conform to project linting standards - Fixes CI lint check failures caused by code style issues * fix(feishu): handle file write operation asynchronously to prevent blocking * fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code * test(feishu): add tests for receive_file method and placeholder replacement * fix(manager): remove unnecessary type casting for channel retrieval * fix(feishu): update logging messages to reflect resource handling instead of image * fix(feishu): sanitize filename by replacing invalid characters in file uploads * fix(feishu): improve filename sanitization and reorder image key handling in message processing * fix(feishu): add thread lock to prevent filename conflicts during file downloads * fix(test): correct bad merge in test_feishu_parser.py * chore: run ruff and apply formatting cleanup fix(feishu): preserve rich-text attachment order and improve fallback filename handling * fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915) Two production docker-compose.yaml bugs prevent `make up` from working: 1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH environment overrides. Added in fb2d99f (#1836) but accidentally reverted by ca2fb95 (#1847). Without them, gateway reads host paths from .env via env_file, causing FileNotFoundError inside the container. 2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default). Empty $${allow_blocking} inserts a bare space between flags, causing ' --no-reload' to be parsed as unexpected extra argument. Fix by building args string first and conditionally appending --allow-blocking. Co-authored-by: cooper <cooperfu@tencent.com> * fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904) * fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities Fix `<button>` inside `<a>` invalid HTML in artifact components and add missing `noopener,noreferrer` to `window.open` calls to prevent reverse tabnabbing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(frontend): address Copilot review on tabnabbing and double-tab-open Remove redundant parent onClick on web_fetch ChainOfThoughtStep to prevent opening two tabs on link click, and explicitly null out window.opener after window.open() for defensive tabnabbing hardening. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * refactor(persistence): organize entities into per-entity directories Restructure the persistence layer from horizontal "models/ + repositories/" split into vertical entity-aligned directories. Each entity (thread_meta, run, feedback) now owns its ORM model, abstract interface (where applicable), and concrete implementations under a single directory with an aggregating __init__.py for one-line imports. Layout: persistence/thread_meta/{base,model,sql,memory}.py persistence/run/{model,sql}.py persistence/feedback/{model,sql}.py models/__init__.py is kept as a facade so Alembic autogenerate continues to discover all ORM tables via Base.metadata. RunEventRow remains under models/run_event.py because its storage implementation lives in runtime/events/store/db.py and has no matching repository directory. The repositories/ directory is removed entirely. All call sites in gateway/deps.py and tests are updated to import from the new entity packages, e.g.: from deerflow.persistence.thread_meta import ThreadMetaRepository from deerflow.persistence.run import RunRepository from deerflow.persistence.feedback import FeedbackRepository Full test suite passes (1690 passed, 14 skipped). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(gateway): sync thread rename and delete through ThreadMetaStore The POST /threads/{id}/state endpoint previously synced title changes only to the LangGraph Store via _store_upsert. In sqlite mode the search endpoint reads from the ThreadMetaRepository SQL table, so renames never appeared in /threads/search until the next agent run completed (worker.py syncs title from checkpoint to thread_meta in its finally block). Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem, Store, and checkpointer but left the threads_meta row orphaned in sqlite, so deleted threads kept appearing in /threads/search. Fix both endpoints by routing through the ThreadMetaStore abstraction which already has the correct sqlite/memory implementations wired up by deps.py. The rename path now calls update_display_name() and the delete path calls delete() — both work uniformly across backends. Verified end-to-end with curl in gateway mode against sqlite backend. Existing test suite (1690 passed) and focused router/repo tests pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(gateway): route all thread metadata access through ThreadMetaStore Following the rename/delete bug fix in PR1, migrate the remaining direct LangGraph Store reads/writes in the threads router and services to the ThreadMetaStore abstraction so that the sqlite and memory backends behave identically and the legacy dual-write paths can be removed. Migrated endpoints (threads.py): - create_thread: idempotency check + write now use thread_meta_repo.get/create instead of dual-writing the LangGraph Store and the SQL row. - get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback for legacy threads is preserved. - patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata. - delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete already covers it. Removed dead code (services.py): - _upsert_thread_in_store — redundant with the immediately following thread_meta_repo.create() call. - _sync_thread_title_after_run — worker.py's finally block already syncs the title via thread_meta_repo.update_display_name() after each run. Removed dead code (threads.py): - _store_get / _store_put / _store_upsert helpers (no remaining callers). - THREADS_NS constant. - get_store import (router no longer touches the LangGraph Store directly). New abstract method: - ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into the thread's metadata field. Implemented in both ThreadMetaRepository (SQL, read-modify-write inside one session) and MemoryThreadMetaStore. Three new unit tests cover merge / empty / nonexistent behaviour. Net change: -134 lines. Full test suite: 1693 passed, 14 skipped. Verified end-to-end with curl in gateway mode against sqlite backend (create / patch / get / rename / search / delete). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com> Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: JilongSun <965640067@qq.com> Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com> Co-authored-by: cooper <cooperfu@tencent.com> Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>	2026-04-26 11:05:47 +08:00
Octopus	1f59e945af	fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (#2449 ) * fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (fixes #2448) The previous _apply_prompt_caching() attached cache_control to every text block in the system prompt, every content block in the last N messages, and the last tool definition. In multi-turn conversations with structured content blocks this easily exceeded the 4-breakpoint hard limit enforced by both the Anthropic API and AWS Bedrock, producing a 400 Bad Request (or a silent "No generations found in stream" when streaming). Fix: collect all candidate blocks in document order, then apply cache_control only to the last MAX_CACHE_BREAKPOINTS (4) of them. Later breakpoints cover a larger prefix and therefore yield better cache hit rates, making this the optimal placement strategy as well as the safe one. Adds 13 unit tests covering the budget cap, edge cases, and correct last-candidate placement. * docs: clarify _apply_prompt_caching docstring includes tool definitions Per Copilot review: the implementation also caches the last tool definition (see the candidates list at lines 202-205), so the docstring summary should explicitly mention tools alongside system and recent messages. * Fix the lint error * style: fix ruff format check for test_claude_provider_prompt_caching.py Add the missing blank line before the 'Edge cases' section comment so that ruff format --check passes in CI. --------- Co-authored-by: octo-patch <octo-patch@github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-25 19:40:06 +08:00
pyp0327	2bb1a2dfa2	feat(models): Provider for MindIE model engine (#2483 ) * feat(models): 适配 MindIE引擎的模型 * test: add unit tests for MindIEChatModel adapter and fix PR review comments * chore: update uv.lock with pytest-asyncio * build: add pytest-asyncio to test dependencies * fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex) --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-25 08:59:03 +08:00
Admire	c99865f53d	fix(token-usage): enable stream usage for openai-compatible models (#2217 ) * fix(token-usage): enable stream usage for openai-compatible models * fix(token-usage): narrow stream_usage default to ChatOpenAI	2026-04-19 22:42:55 +08:00
shivam johri	194bab4691	feat(config): add when_thinking_disabled support for model configs (#1970 ) * feat(config): add when_thinking_disabled support for model configs Allow users to explicitly configure what parameters are sent to the model when thinking is disabled, via a new `when_thinking_disabled` field in model config. This mirrors the existing `when_thinking_enabled` pattern and takes full precedence over the hardcoded disable behavior when set. Backwards compatible — existing configs work unchanged. Closes #1675 * fix(config): address copilot review — gate when_thinking_disabled independently - Switch truthiness check to `is not None` so empty dict overrides work - Restructure disable path so when_thinking_disabled is gated independently of has_thinking_settings, allowing it to work without when_thinking_enabled - Update test to reflect new behavior	2026-04-09 18:49:00 +08:00
Xun	1b74d84590	fix: resolve missing serialized kwargs in PatchedChatDeepSeek (#2025 ) * add tests * fix ci * fix ci	2026-04-09 16:07:16 +08:00
Octopus	616caa92b1	fix(models): resolve duplicate keyword argument error when reasoning_effort appears in both config and kwargs (#2017 ) When a model config includes `reasoning_effort` as an extra YAML field (ModelConfig uses `extra="allow"`), and the thinking-disabled code path also injects `reasoning_effort="minimal"` into kwargs, the previous `model_class(kwargs, model_settings_from_config)` call raises: TypeError: got multiple values for keyword argument 'reasoning_effort' Fix by merging the two dicts before instantiation, giving runtime kwargs precedence over config values: `{model_settings_from_config, kwargs}`. Fixes #1977 Co-authored-by: octo-patch <octo-patch@github.com>	2026-04-09 15:09:39 +08:00
Async23	0948c7a4e1	fix(provider): preserve streamed Codex output when response.completed.output is empty (#1928 ) * fix: preserve streamed Codex output items * fix: prefer completed Codex output over streamed placeholders	2026-04-07 18:21:22 +08:00
NmanQAQ	dd30e609f7	feat(models): add vLLM provider support (#1860 ) support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking. Co-authored-by: NmanQAQ <normangyao@qq.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-06 15:18:34 +08:00
totoyang	2d1f90d5dc	feat(tracing): add optional Langfuse support (#1717 ) * feat(tracing): add optional Langfuse support * Fix tracing fail-fast behavior for explicitly enabled providers * fix(lint)	2026-04-02 13:06:10 +08:00
Markus Corazzione	5ceb19f6f6	fix(oauth): Harden Claude OAuth cache-control handling (#1583 )	2026-03-30 07:41:18 +08:00
greatmengqi	084dc7e748	ci: enforce code formatting checks for backend and frontend (#1536 )	2026-03-29 15:34:38 +08:00
taka6745	43ef3691a5	fix(oauth): inject billing header for Claude oAuth Models (#1442 ) * fix(oauth): inject billing header for non-Haiku model access The Anthropic Messages API requires a billing identification block in the system prompt when using Claude Code OAuth tokens (sk-ant-oat) to access non-Haiku models (Opus, Sonnet). Without it, the API returns a generic 400 "Error" with no actionable detail. This was discovered by intercepting Claude Code CLI requests — the CLI injects an `x-anthropic-billing-header` text block as the first system prompt entry on every request. Third-party consumers of the same OAuth tokens must do the same. Changes: - Add `_apply_oauth_billing()` to `ClaudeChatModel` that prepends the billing header block to the system prompt when `_is_oauth` is True - Add `metadata.user_id` with device/session identifiers (required by the API alongside the billing header) - Called from `_get_request_payload()` before prompt caching runs Verified with Claude Max OAuth tokens against all three model tiers: - claude-opus-4-6: 200 OK - claude-sonnet-4-6: 200 OK - claude-haiku-4-5-20251001: 200 OK (was already working) Closes #1245 fix(oauth): address review feedback on billing header injection - Make OAUTH_BILLING_HEADER configurable via ANTHROPIC_BILLING_HEADER env var - Normalize billing block to always be first in system list (strip + reinsert) - Guard metadata with isinstance check for non-dict values - Replace os.uname() with socket.gethostname() for Windows compat - Fix docstrings to say "all OAuth requests" instead of "non-Haiku" - Move inline imports to module level (fixes ruff I001) - Add 9 unit tests for _apply_oauth_billing Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-28 08:49:34 +08:00
Admire	b9583f7204	Fix Windows backend test compatibility (#1384 ) * Fix Windows backend test compatibility * Preserve ACP path style on Windows * Fix installer import ordering * Address review comments for Windows fixes --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-26 17:39:16 +08:00
Willem Jiang	a087fe7bcc	fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180 ) (#1205 ) * fix(LLM): fixing Gemini thinking + tool calls via OpenAI gateway (#1180) When using Gemini with thinking enabled through an OpenAI-compatible gateway, the API requires that fields on thinking content blocks are preserved and echoed back verbatim in subsequent requests. Standard silently drops these signatures when serializing messages, causing HTTP 400 errors: Changes: - Add PatchedChatOpenAI adapter that re-injects signed thinking blocks into request payloads, preserving the signature chain across multi-turn conversations with tool calls. - Support two LangChain storage patterns: additional_kwargs.thinking_blocks and content list. - Add 11 unit tests covering signed/unsigned blocks, storage patterns, edge cases, and precedence rules. - Update config.example.yaml with Gemini + thinking gateway example. - Update CONFIGURATION.md with detailed guidance and error explanation. Fixes: #1180 * Updated the patched_openai.py with thought_signature of function call * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * docs: fix inaccurate thought_signature description in CONFIGURATION.md (#1220) * Initial plan * docs: fix CONFIGURATION.md wording for thought_signature - tool-call objects, not thinking blocks Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/360f5226-4631-48a7-a050-189094af8ffe --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>	2026-03-26 15:07:05 +08:00
DanielWalnut	d119214fee	feat(harness): integration ACP agent tool (#1344 ) * refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.) and app (app.) Physically split the monolithic backend/src/ package into two layers: - Harness (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - App* (`app/`): unpublished application code with import prefix `app.`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src. → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.` or `import app.` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow./app. after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(harness): add tool-first ACP agent invocation (#37) * feat(harness): add tool-first ACP agent invocation * build(harness): make ACP dependency required * fix(harness): address ACP review feedback * feat(harness): decouple ACP agent workspace from thread data ACP agents (codex, claude-code) previously used per-thread workspace directories, causing path resolution complexity and coupling task execution to DeerFlow's internal thread data layout. This change: - Replace _resolve_cwd() with a fixed _get_work_dir() that always uses {base_dir}/acp-workspace/, eliminating virtual path translation and thread_id lookups - Introduce /mnt/acp-workspace virtual path for lead agent read-only access to ACP agent output files (same pattern as /mnt/skills) - Add security guards: read-only validation, path traversal prevention, command path allowlisting, and output masking for acp-workspace - Update system prompt and tool description to guide LLM: send self-contained tasks to ACP agents, copy results via /mnt/acp-workspace - Add 11 new security tests for ACP workspace path handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(prompt): inject ACP section only when ACP agents are configured The ACP agent guidance in the system prompt is now conditionally built by _build_acp_section(), which checks get_acp_agents() and returns an empty string when no ACP agents are configured. This avoids polluting the prompt with irrelevant instructions for users who don't use ACP. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix lint * fix(harness): address Copilot review comments on sandbox path handling and ACP tool - local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/") and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching inside /mnt/skills-extra - local_sandbox_provider: replace print() with logger.warning(..., exc_info=True) - invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue; move full prompt from INFO to DEBUG level (truncated to 200 chars) - sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation; remove misleading "read-only" language from validate_local_bash_command_paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry P1.1 – ACP workspace thread isolation - Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths - `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses `{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to global workspace when thread_id is absent or invalid - `_invoke` extracts thread_id from `RunnableConfig` via `Annotated[RunnableConfig, InjectedToolArg]` - `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`, `_resolve_acp_workspace_path(path, thread_id)`, and all callers (`replace_virtual_paths_in_command`, `mask_local_paths_in_output`, `ls_tool`, `read_file_tool`) now resolve ACP paths per-thread P1.2 – ACP permission guardrail - New `auto_approve_permissions: bool = False` field in `ACPAgentConfig` - `_build_permission_response(options, , auto_approve: bool)` now defaults to deny; only approves when `auto_approve=True` - Document field in `config.example.yaml` P2 – Deferred tool registry race condition - Replace module-level `_registry` global with `contextvars.ContextVar` - Each asyncio request context gets its own registry; worker threads inherit the context automatically via `loop.run_in_executor` - Expose `get_deferred_registry` / `set_deferred_registry` / `reset_deferred_registry` helpers Tests: 831 pass (57 for affected modules, 3 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> fix(sandbox): mount /mnt/acp-workspace in docker sandbox container The AioSandboxProvider was not mounting the ACP workspace into the sandbox container, so /mnt/acp-workspace was inaccessible when the lead agent tried to read ACP results in docker mode. Changes: - `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so the directory exists before the sandbox container starts — required for Docker volume mounts - `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`) - Update stale CLAUDE.md description (was "fixed global workspace") Tests: `test_aio_sandbox_provider.py` (4 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): remove unused imports in test_aio_sandbox_provider Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix config --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 14:20:18 +08:00
Purricane	835ba041f8	feat: add Claude Code OAuth and Codex CLI as LLM providers (#1166 ) * feat: add Claude Code OAuth and Codex CLI providers Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.\n\nCarries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review. * fix: harden CLI credential loading Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests. * refactor: tighten codex auth typing Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized. * fix: load Claude Code OAuth from Keychain Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path. * fix: require explicit Claude OAuth handoff * style: format thread hooks reasoning request * docs: document CLI-backed auth providers * fix: address provider review feedback * fix: harden provider edge cases * Fix deferred tools, Codex message normalization, and local sandbox paths * chore: narrow PR scope to OAuth providers * chore: remove unrelated frontend changes * chore: reapply OAuth branch frontend scope cleanup * fix: preserve upload guards with reasoning effort wiring --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-22 22:39:50 +08:00
Simon Su	ceab7fac14	fix: improve MiniMax code plan integration (#1169 ) This PR improves MiniMax Code Plan integration in DeerFlow by fixing three issues in the current flow: stream errors were not clearly surfaced in the UI, the frontend could not display the actual provider model ID, and MiniMax reasoning output could leak into final assistant content as inline <think>...</think>. The change adds a MiniMax-specific adapter, exposes real model IDs end-to-end, and adds a frontend fallback for historical messages. Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-20 17:18:59 +08:00
DanielWalnut	76803b826f	refactor: split backend into harness (deerflow.) and app (app.) (#1131 ) * refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.) and app (app.) Physically split the monolithic backend/src/ package into two layers: - Harness (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - App* (`app/`): unpublished application code with import prefix `app.`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src. → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.` or `import app.` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow./app. after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 22:55:52 +08:00

37 Commits