deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-04-25 11:18:22 +00:00

Author	SHA1	Message	Date
肖	5db71cb68c	fix(middleware): repair dangling tool-call history after loop interru… (#2035) * fix(middleware): repair dangling tool-call history after loop interruption (#2029) * docs(backend): fix middleware chain ordering --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>	2026-04-12 19:11:22 +08:00
Jin	4d4ddb3d3f	feat(llm): introduce lightweight circuit breaker to prevent rate-limit bans and resource exhaustion (#2095)	2026-04-12 17:48:40 +08:00
ZHANG Ning	5b633449f8	fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware (#1988) * fix(middleware): add per-tool-type frequency detection to LoopDetectionMiddleware The existing hash-based loop detection only catches identical tool call sets. When the agent calls the same tool type (e.g. read_file) on many different files, each call produces a unique hash and bypasses detection. This causes the agent to exhaust recursion_limit, consuming 150K-225K tokens per failed run. Add a second detection layer that tracks cumulative call counts per tool type per thread. Warns at 30 calls (configurable) and forces stop at 50. The hard stop message now uses the actual returned message instead of a hardcoded constant, so both hash-based and frequency-based stops produce accurate diagnostics. Also fix _apply() to use the warning message returned by _track_and_check() for hard stops, instead of always using _HARD_STOP_MSG. Closes #1987 * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix(lint): remove unused imports and fix line length - Remove unused _TOOL_FREQ_HARD_STOP_MSG and _TOOL_FREQ_WARNING_MSG imports from test file (F401) - Break long _TOOL_FREQ_WARNING_MSG string to fit within 240 char limit (E501) * style: apply ruff format * test: add LRU eviction and per-thread reset coverage for frequency state Address review feedback from @WillemJiang: - Verify _tool_freq and _tool_freq_warned are cleaned on LRU eviction - Add test for reset(thread_id=...) clearing only the target thread's frequency state while leaving others intact * fix(makefile): route Windows shell-script targets through Git Bash (#2060) --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Asish Kumar <87874775+officialasishkumar@users.noreply.github.com>	2026-04-11 17:33:27 +08:00
yorick	02569136df	fix(sandbox): improve sandbox security and preserve multimodal content (#2114) * fix: improve sandbox security and preserve multimodal content * Add unit test modifications for test_injects_uploaded_files_tag_into_list_content * format updated_content * Add regression tests for multimodal upload content and host bash default safety	2026-04-11 16:52:10 +08:00
Xinmin Zeng	ad6d934a5f	fix(middleware): handle string-serialized options in ClarificationMiddleware (#1997) * fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995) Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON strings instead of native arrays. Add defensive type checking in _format_clarification_message() to deserialize string options before iteration, preventing per-character rendering. * fix(middleware): normalize options after JSON deserialization Address Copilot review feedback: - Add post-deserialization normalization so options is always a list (handles json.loads returning a scalar string, dict, or None) - Add test for JSON-encoded scalar string ("development") - Fix test_json_string_with_mixed_types to use actual mixed types	2026-04-08 21:04:20 +08:00
koppx	c3170f22da	fix(backend): make loop detection hash tool calls by stable keys (#1911) * fix(backend): make loop detection hash tool calls by stable keys The loop detection middleware previously hashed full tool call arguments, which made repeated calls look different when only non-essential argument details changed. In particular, `read_file` calls with nearby line ranges could bypass repetition detection even when the agent was effectively reading the same file region again and again. - Hash tool calls using stable keys instead of the full raw args payload - Bucket `read_file` line ranges so nearby reads map to the same region key - Prefer stable identifiers such as `path`, `url`, `query`, or `command` before falling back to JSON serialization of args - Keep hashing order-independent so the same tool call set produces the same hash regardless of call order Fixes #1905 * fix(backend): harden loop detection hash normalization - Normalize and parse stringified tool args defensively - Expand stable key derivation to include pattern, glob, and cmd - Normalize reversed read_file ranges before bucketing Fixes #1905 * fix(backend): harden loop detection tool format * exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged. --------- Co-authored-by: JeffJiang <for-eleven@hotmail.com>	2026-04-07 17:46:33 +08:00
KKK	3b3e8e1b0b	feat(sandbox): strengthen bash command auditing with compound splitting and expanded patterns (#1881) * fix(sandbox): strengthen regex coverage in SandboxAuditMiddleware Expand high-risk patterns from 6 to 13 and medium-risk from 4 to 6, closing several bypass vectors identified by cross-referencing Claude Code's BashSecurity validator chain against DeerFlow's threat model. High-risk additions: - Generalised pipe-to-sh (replaces narrow curl|sh rule) - Targeted command substitution ($() / backtick with dangerous executables) - base64 decode piped to execution - Overwrite system binaries (/usr/bin/, /bin/, /sbin/) - Overwrite shell startup files (~/.bashrc, ~/.profile, etc.) - /proc//environ leakage - LD_PRELOAD / LD_LIBRARY_PATH hijack - /dev/tcp/ bash built-in networking Medium-risk additions: - sudo/su (no-op under Docker root, warn only) - PATH= modification (long attack chain, warn only) Design decisions: - Command substitution uses targeted matching (curl/wget/bash/sh/python/ ruby/perl/base64) rather than blanket block to avoid false positives on safe usage like $(date) or `whoami`. - Skipped encoding/obfuscation checks (hex, octal, Unicode homoglyphs) as ROI is low in Docker sandbox — LLMs don't generate encoded commands and container isolation bounds the blast radius. - Merged pip/pip3 into single pip3? pattern. feat(sandbox): compound command splitting and fork bomb detection Split compound bash commands (&&, ||, ;) into sub-commands and classify each independently — prevents dangerous commands hidden after safe prefixes (e.g. "cd /workspace && rm -rf /") from bypassing detection. - Add _split_compound_command() with shlex quote-aware splitting - Add fork bomb detection patterns (classic and while-loop variants) - Most severe verdict wins; block short-circuits - 15 new tests covering compound commands, splitting, and fork bombs * test(sandbox): add async tests for fork bomb and compound commands Cover awrap_tool_call path for fork bomb detection (3 variants) and compound command splitting (block/warn/pass scenarios). * fix(sandbox): address Copilot review — no-whitespace operators, >>/etc/, whole-command scan - _split_compound_command: replace shlex-based implementation with a character-by-character quote/escape-aware scanner. shlex.split only separates '&&' / '||' / ';' when they are surrounded by whitespace, so payloads like 'rm -rf /&&echo ok' or 'safe;rm -rf /' bypassed the previous splitter and therefore the per-sub-command classifier. - _HIGH_RISK_PATTERNS: change r'>\s/etc/' to r'>+\s/etc/' so append redirection ('>>/etc/hosts') is also blocked. - _classify_command: run a whole-command high-risk scan before splitting. Structural attacks like 'while true; do bash & done' span multiple shell statements — splitting on ';' destroys the pattern context, so the raw command must be scanned first. - tests: add no-whitespace operator cases to TestSplitCompoundCommand and test_compound_command_classification to lock in the bypass fix.	2026-04-07 17:15:24 +08:00
KKK	055e4df049	fix(sandbox): add input sanitisation guard to SandboxAuditMiddleware (#1872) * fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware Add _validate_input() to reject malformed bash commands before regex classification: empty commands, oversized commands (>10 000 chars), and null bytes that could cause detection/execution layer inconsistency. * fix(sandbox): address Copilot review — type guard, log truncation, reject reason - Coerce None/non-string command to str before validation - Truncate oversized commands in audit logs to prevent log amplification - Propagate reject_reason through _pre_process() to block message - Remove L2 label from comments and test class names * fix(sandbox): isinstance type guard + async input sanitisation tests Address review comments: - Replace str() coercion with isinstance(raw_command, str) guard so non-string truthy values (0, [], False) fall back to empty string instead of passing validation as "0"/"[]"/"False". - Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests covering empty, null-byte, oversized, and None command via awrap_tool_call path.	2026-04-06 17:21:58 +08:00
Zhou	1ced6e977c	fix(backend): preserve viewed image reducer metadata (#1900) Fix concurrent viewed_images state updates for multi-image input by preserving the reducer metadata in the vision middleware state schema.	2026-04-06 16:47:19 +08:00
thefoolgy	8049785de6	fix(memory): case-insensitive fact deduplication and positive reinforcement detection (#1804) * fix(memory): case-insensitive fact deduplication and positive reinforcement detection Two fixes to the memory system: 1. _fact_content_key() now lowercases content before comparison, preventing semantically duplicate facts like "User prefers Python" and "user prefers python" from being stored separately. 2. Adds detect_reinforcement() to MemoryMiddleware (closes #1719), mirroring detect_correction(). When users signal approval ("yes exactly", "perfect", "完全正确", etc.), the memory updater now receives reinforcement_detected=True and injects a hint prompting the LLM to record confirmed preferences and behaviors with high confidence. Changes across the full signal path: - memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement() - queue.py: reinforcement_detected field in ConversationContext and add() - updater.py: reinforcement_detected param in update_memory() and update_memory_from_conversation(); builds reinforcement_hint alongside the existing correction_hint Tests: 11 new tests covering deduplication, hint injection, and signal detection (Chinese + English patterns, window boundary, conflict with correction). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(memory): address Copilot review comments on reinforcement detection - Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants - Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise - Use casefold() instead of lower() for Unicode-aware fact deduplication - Add missing test coverage for reinforcement_detected OR merge and forwarding in queue --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-05 16:23:00 +08:00
DanielWalnut	2a150f5d4a	fix: unblock concurrent threads and workspace hydration (#1839) * fix: unblock concurrent threads and workspace hydration * fix: restore async title generation * fix: address PR review feedback * style: format lead agent prompt	2026-04-04 21:19:35 +08:00
SHIYAO ZHANG	163121d327	fix(uploads): handle split-bold headings and artefacts in extract_outline (#1838) * feat(uploads): guide agent to use grep/glob/read_file for uploaded documents Add workflow guidance to the <uploaded_files> context block so the agent knows to use grep and glob (added in #1784) alongside read_file when working with uploaded documents, rather than falling back to web search. This is the final piece of the three-PR PDF agentic search pipeline: - PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings - PR2 (#1738): document outline injected into agent context with line numbers - PR3 (this): agent guided to use outline + grep + read_file workflow * feat(uploads): add file-first priority and fallback guidance to uploaded_files context * fix(uploads): handle split-bold headings and artefacts in extract_outline - Add _clean_bold_title() to merge adjacent bold spans ( ) produced by pymupdf4llm when bold text crosses span boundaries - Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise <num> <title> headings common in academic papers; excludes pure-number table headers and rows with more than 4 bold blocks - When outline is empty, read first 5 non-empty lines of the .md as a content preview and surface a grep hint in the agent context - Update _format_file_entry to render the preview + grep hint instead of silently omitting the outline section - Add 3 new extract_outline tests and 2 new middleware tests (65 total) * fix(uploads): address Copilot review comments on extract_outline regex - Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII titles (e.g. 1 概述); pure-numeric/punctuation blocks still excluded - Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input - Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex) - Log debug message with exc_info when preview extraction fails	2026-04-04 14:25:08 +08:00
SHIYAO ZHANG	bbd0866374	feat(uploads): guide agent using agentic search for uploaded documents (#1816) * feat(uploads): guide agent to use grep/glob/read_file for uploaded documents Add workflow guidance to the <uploaded_files> context block so the agent knows to use grep and glob (added in #1784) alongside read_file when working with uploaded documents, rather than falling back to web search. This is the final piece of the three-PR PDF agentic search pipeline: - PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings - PR2 (#1738): document outline injected into agent context with line numbers - PR3 (this): agent guided to use outline + grep + read_file workflow * feat(uploads): add file-first priority and fallback guidance to uploaded_files context	2026-04-04 11:08:31 +08:00
ppyt	db82b59254	fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware (#1823) * fix: inject longTermBackground into memory prompt The format_memory_for_injection function only processed recentMonths and earlierContext from the history section, silently dropping longTermBackground. The LLM writes longTermBackground correctly and it persists to memory.json, but it was never injected into the system prompt — making the user's long-term background invisible to the AI. Add the missing field handling and a regression test. * fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware LangChain AIMessage.content can be str | list. When using providers that return structured content blocks (e.g. Anthropic thinking mode, certain OpenAI-compatible gateways), content is a list of dicts like [{"type": "text", "text": "..."}]. The hard_limit branch in _apply() concatenated content with a string via (last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises TypeError when content is a non-empty list (list + str is invalid). Add _append_text() static method that: - Returns the text directly when content is None - Appends a {"type": "text"} block when content is a list - Falls back to string concatenation when content is a str This is consistent with how other modules in the project already handle list content (client.py._extract_text, memory_middleware, executor.py). * test(middleware): add unit tests for _append_text and list content hard stop Add regression tests to verify LoopDetectionMiddleware handles list-type AIMessage.content correctly during hard stop: - TestAppendText: unit tests for the new _append_text() static method covering None, str, list (including empty list) content types - TestHardStopWithListContent: integration tests verifying hard stop works correctly with list content (Anthropic thinking mode), None content, and str content Requested by reviewer in PR #1823. * fix(middleware): improve _append_text robustness and test isolation - Add explicit isinstance(content, str) check with fallback for unexpected types (coerce to str) to prevent TypeError on edge cases - Deep-copy list content in _make_state() test helper to prevent shared mutable references across test iterations - Add test_unexpected_type_coerced_to_str: verify fallback for non-str/list/None content types - Add test_list_content_not_mutated_in_place: verify _append_text does not modify the original list * style: fix ruff format whitespace in test file --------- Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>	2026-04-04 10:38:22 +08:00
SHIYAO ZHANG	5ff230eafd	feat(uploads): inject document outline into agent context for converted files (#1738) * feat(uploads): inject document outline into agent context for converted files Extract headings from converted .md files and inject them into the <uploaded_files> context block so the agent can navigate large documents by line number before reading. - Add `extract_outline()` to `file_conversion.py`: recognises standard Markdown headings (#/##/###) and SEC-style bold structural headings (ITEM N. BUSINESS, PART II); caps at 50 entries; excludes cover-page boilerplate (WASHINGTON DC, CURRENT REPORT, SIGNATURES) - Add `_extract_outline_for_file()` helper in `uploads_middleware.py`: looks for a sibling `.md` file produced by the conversion pipeline - Update `UploadsMiddleware._create_files_message()` to render the outline under each file entry with `L{line}: {title}` format and a `read_file` prompt for range-based reading - Tests: 10 new tests for `extract_outline()`, 4 new tests for outline injection in `UploadsMiddleware`; existing test updated for new `outline` field in `uploaded_files` state Partially addresses #1647 (agent ignores uploaded files). * fix(uploads): stream outline file reads and strip inline bold from heading titles - Switch extract_outline() from read_text().splitlines() to open()+line iteration so large converted documents are not loaded into memory on every agent turn; exits as soon as MAX_OUTLINE_ENTRIES is reached (Copilot suggestion) - Strip ... wrapper from standard Markdown heading titles before appending to outline so agent context stays clean (e.g. "## Overview" → "Overview") (Copilot suggestion) - Remove unused pathlib.Path import and fix import sort order in test_file_conversion.py to satisfy ruff CI lint * fix(uploads): show truncation hint when outline exceeds MAX_OUTLINE_ENTRIES When extract_outline() hits the cap it now appends a sentinel entry {"truncated": True} instead of silently dropping the rest of the headings. UploadsMiddleware reads the sentinel and renders a hint line: ... (showing first 50 headings; use `read_file` to explore further) Without this the agent had no way to know the outline was incomplete and would treat the first 50 headings as the full document structure. * fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id runtime.context does not always carry thread_id (depends on LangGraph invocation path). ThreadDataMiddleware already falls back to get_config().configurable.thread_id — apply the same pattern so UploadsMiddleware can resolve the uploads directory and attach outlines in all invocation paths. * style: apply ruff format --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-03 20:52:47 +08:00
SHIYAO ZHANG	46d0c329c1	fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id (#1814) * fix(uploads): fall back to configurable.thread_id when runtime.context lacks thread_id runtime.context does not always carry thread_id depending on the LangGraph invocation path. When absent, uploads_dir resolved to None and the entire outline/historical-files attachment was silently skipped. Apply the same fallback pattern already used by ThreadDataMiddleware: try get_config().configurable.thread_id, with a RuntimeError guard for test environments where get_config() is called outside a runnable context. Discovered via live integration testing (curl against local LangGraph). Unit tests inject uploads_dir directly and would not catch this. * style: apply ruff format to uploads_middleware.py	2026-04-03 20:26:21 +08:00
greatmengqi	8128a3bc57	fix: enable DanglingToolCallMiddleware for subagents (#1766)	2026-04-02 18:56:18 +08:00
肖	3a672b39c7	Fix/1681 llm call retry handling (#1683) * fix(runtime): handle llm call errors gracefully * fix(runtime): preserve graph control flow in llm retry middleware --------- Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>	2026-04-02 10:12:17 +08:00
AochenShen99	0cdecf7b30	feat(memory): structured reflection + correction detection in MemoryMiddleware (#1620) (#1668) * feat(memory): add structured reflection and correction detection * fix(memory): align sourceError schema and prompt guidance --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-04-01 16:45:29 +08:00
SHIYAO ZHANG	9aa3ff7c48	feat(sandbox): add SandboxAuditMiddleware for bash command security auditing (#1532) * feat(sandbox): add SandboxAuditMiddleware for bash command security auditing Addresses the LocalSandbox escape vector reported in #1224 where bash tool calls can execute destructive commands against the host filesystem. - Add SandboxAuditMiddleware with three-tier command classification: - High-risk (block): rm -rf /, curl|bash, dd if=, mkfs, /etc/shadow access - Medium-risk (warn): pip install, apt install, chmod 777 - Safe (pass): normal workspace operations - Register middleware after GuardrailMiddleware in _build_runtime_middlewares, applied to both lead agent and subagents - Structured audit log via standard logger (visible in langgraph.log) - Medium-risk commands execute but append a warning to the tool result, allowing the LLM to self-correct without blocking legitimate workflows - High-risk commands return an error ToolMessage without calling the handler, so the agent loop continues gracefully * fix(lint): sort imports in test_sandbox_audit_middleware * refactor(sandbox-audit): address Copilot review feedback (3/5/6) - Fix class docstring to match implementation: medium-risk commands are executed with a warning appended (not rejected), and cwd anchoring note removed (handled in a separate PR) - Remove capsys.disabled() from benchmark test to avoid CI log noise; keep assertions for recall/precision targets - Remove misleading 'cwd fix' from test module docstring * test(sandbox-audit): add async tests for awrap_tool_call * fix(sandbox-audit): address Copilot review feedback (1/2) - Narrow rm high-risk regex to only block truly destructive targets (/, /, ~, ~/, /home, /root); legitimate workspace paths like /mnt/user-data/ are no longer false-positived - Handle list-typed ToolMessage content in _append_warn_to_result; append a text block instead of str()-ing the list to avoid breaking structured content normalization * style: apply ruff format to sandbox_audit_middleware files * fix(sandbox-audit): update benchmark comment to match assert-based implementation --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-30 07:48:31 +08:00
greatmengqi	06a623f9c8	feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203)	2026-03-29 15:31:18 +08:00
Nan Gao	520c0352b5	fix(middleware): fall back to configurable thread_id in MemoryMiddleware (#1425) (#1426) * fix(middleware): fall back to configurable thread_id in MemoryMiddleware (#1425) * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-03-28 17:00:11 +08:00
moose-lab	03b144f9c9	fix: replace print() with logging across harness package (#1282) Replace all bare print() calls with proper logging using Python's standard logging module across the deerflow harness package. Changes across 8 files (16 print statements replaced): - agents/middlewares/clarification_middleware.py: use logger.info/debug - agents/middlewares/memory_middleware.py: use logger.debug - agents/middlewares/thread_data_middleware.py: use logger.debug - agents/middlewares/view_image_middleware.py: use logger.debug - agents/memory/queue.py: use logger.info/debug/warning/error - agents/lead_agent/prompt.py: use logger.error - skills/loader.py: use logger.warning - skills/parser.py: use logger.error Each file follows the established codebase convention: import logging logger = logging.getLogger(__name__) Log levels chosen based on message semantics: - debug: routine operational details (directory creation, timer resets) - info: significant state changes (memory queued, updates processed) - warning: recoverable issues (config load failures, skipped updates) - error: unexpected failures (parsing errors, memory update errors) Note: client.py is intentionally excluded as it uses print() for CLI output, which is the correct behavior for a command-line client. Co-authored-by: moose-lab <moose-lab@users.noreply.github.com>	2026-03-27 23:15:35 +08:00
Jason	4708700723	fix(middleware): return proper content format when no images viewed (#1454) - Fix OpenAI BadRequestError: 'No images have been viewed.' was returned as a plain string array instead of a properly formatted content block - The OpenAI API expects message content to be either a string or an array of objects with 'type' field, not an array of plain strings - Changed return from ['No images have been viewed.'] to [{'type': 'text', 'text': 'No images have been viewed.'}] Fixes #1441 Co-authored-by: JasonOA888 <noreply@github.com>	2026-03-27 17:33:17 +08:00
DanielWalnut	d119214fee	feat(harness): integration ACP agent tool (#1344) * refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.) and app (app.) Physically split the monolithic backend/src/ package into two layers: - Harness (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - App (`app/`): unpublished application code with import prefix `app.`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src. → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.` or `import app.` statement is found. This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow./app. after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(harness): add tool-first ACP agent invocation (#37) * feat(harness): add tool-first ACP agent invocation * build(harness): make ACP dependency required * fix(harness): address ACP review feedback * feat(harness): decouple ACP agent workspace from thread data ACP agents (codex, claude-code) previously used per-thread workspace directories, causing path resolution complexity and coupling task execution to DeerFlow's internal thread data layout. This change: - Replace _resolve_cwd() with a fixed _get_work_dir() that always uses {base_dir}/acp-workspace/, eliminating virtual path translation and thread_id lookups - Introduce /mnt/acp-workspace virtual path for lead agent read-only access to ACP agent output files (same pattern as /mnt/skills) - Add security guards: read-only validation, path traversal prevention, command path allowlisting, and output masking for acp-workspace - Update system prompt and tool description to guide LLM: send self-contained tasks to ACP agents, copy results via /mnt/acp-workspace - Add 11 new security tests for ACP workspace path handling Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor(prompt): inject ACP section only when ACP agents are configured The ACP agent guidance in the system prompt is now conditionally built by _build_acp_section(), which checks get_acp_agents() and returns an empty string when no ACP agents are configured. This avoids polluting the prompt with irrelevant instructions for users who don't use ACP. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix lint * fix(harness): address Copilot review comments on sandbox path handling and ACP tool - local_sandbox: fix path-segment boundary bug in _resolve_path (== or startswith +"/") and add lookahead in _resolve_paths_in_command regex to prevent /mnt/skills matching inside /mnt/skills-extra - local_sandbox_provider: replace print() with logger.warning(..., exc_info=True) - invoke_acp_agent_tool: guard getattr(option, "optionId") with None default + continue; move full prompt from INFO to DEBUG level (truncated to 200 chars) - sandbox/tools: fix _get_acp_workspace_host_path docstring to match implementation; remove misleading "read-only" language from validate_local_bash_command_paths Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(acp): thread-isolated workspaces, permission guardrail, and ContextVar registry P1.1 – ACP workspace thread isolation - Add `Paths.acp_workspace_dir(thread_id)` for per-thread paths - `_get_work_dir(thread_id)` in invoke_acp_agent_tool now uses `{base_dir}/threads/{thread_id}/acp-workspace/`; falls back to global workspace when thread_id is absent or invalid - `_invoke` extracts thread_id from `RunnableConfig` via `Annotated[RunnableConfig, InjectedToolArg]` - `sandbox/tools.py`: `_get_acp_workspace_host_path(thread_id)`, `_resolve_acp_workspace_path(path, thread_id)`, and all callers (`replace_virtual_paths_in_command`, `mask_local_paths_in_output`, `ls_tool`, `read_file_tool`) now resolve ACP paths per-thread P1.2 – ACP permission guardrail - New `auto_approve_permissions: bool = False` field in `ACPAgentConfig` - `_build_permission_response(options, , auto_approve: bool)` now defaults to deny; only approves when `auto_approve=True` - Document field in `config.example.yaml` P2 – Deferred tool registry race condition - Replace module-level `_registry` global with `contextvars.ContextVar` - Each asyncio request context gets its own registry; worker threads inherit the context automatically via `loop.run_in_executor` - Expose `get_deferred_registry` / `set_deferred_registry` / `reset_deferred_registry` helpers Tests: 831 pass (57 for affected modules, 3 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> fix(sandbox): mount /mnt/acp-workspace in docker sandbox container The AioSandboxProvider was not mounting the ACP workspace into the sandbox container, so /mnt/acp-workspace was inaccessible when the lead agent tried to read ACP results in docker mode. Changes: - `ensure_thread_dirs`: also create `acp-workspace/` (chmod 0o777) so the directory exists before the sandbox container starts — required for Docker volume mounts - `_get_thread_mounts`: add read-only `/mnt/acp-workspace` mount using the per-thread host path (`host_paths.acp_workspace_dir(thread_id)`) - Update stale CLAUDE.md description (was "fixed global workspace") Tests: `test_aio_sandbox_provider.py` (4 new tests) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(lint): remove unused imports in test_aio_sandbox_provider Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix config --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-26 14:20:18 +08:00
吴旭云	d7e510763d	fix: add null checks for runtime.context and tighten langgraph constraint (#1326 ) - Add null checks for runtime.context in uploads_middleware.py and sandbox/middleware.py to prevent NPE when langgraph runtime context is None - Tighten langgraph version constraint from >=1.0.6 to >=1.0.6,<1.0.10 to avoid context=None incompatibility with langgraph-api 0.7.x Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-25 21:01:10 +08:00
Matthew	2eca58bd86	fix: add null checks for runtime.context in middlewares and tools (#1269 ) Add defensive null checks before accessing runtime.context.get() to prevent AttributeError when runtime.context is None. This affects: - UploadsMiddleware - MemoryMiddleware - LoopDetectionMiddleware - SandboxMiddleware - sandbox tools - setup_agent_tool - present_file_tool - task_tool Also adds .env loading in serve.sh for environment variable support. Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-25 08:46:42 +08:00
greatmengqi	16ed797e0e	feat: add configurable log level and token usage tracking (#1301 ) * feat: add configurable log level and token usage tracking - Add `log_level` config to control deerflow module log level, synced to LangGraph Server via serve.sh `--server-log-level` - Add `token_usage.enabled` config with TokenUsageMiddleware that logs input/output/total tokens per LLM call from usage_metadata - Add .omc/ to .gitignore * fix: use info level for token usage logs since feature has its own toggle * fix: sort imports to pass lint check --------- Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-25 08:13:26 +08:00
d 🔹	77b8ef79ca	fix(middleware): use HumanMessage in LoopDetectionMiddleware for Anthropic compat (#1300 ) LoopDetectionMiddleware injected SystemMessage mid-conversation to warn about repetitive tool calls. This crashes Anthropic models because langchain_anthropic's _format_messages() requires system messages to appear only at the start of the conversation — interleaved system messages raise 'Received multiple non-consecutive system messages'. Switch the warning injection from SystemMessage to HumanMessage, which works with all providers (Anthropic, OpenAI, Google, etc.). Fixes #1299 Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>	2026-03-25 08:00:01 +08:00
Uchi Uchibeke	a29134d7c9	feat(guardrails): add pre-tool-call authorization middleware with pluggable providers (#1240 ) Add GuardrailMiddleware that evaluates every tool call before execution. Three provider options: built-in AllowlistProvider (zero deps), OAP passport providers (open standard), or custom providers loaded by class path. - GuardrailProvider protocol with GuardrailRequest/Decision dataclasses - GuardrailMiddleware (AgentMiddleware, position 5 in chain) - AllowlistProvider for simple deny/allow by tool name - GuardrailsConfig (Pydantic singleton, loaded from config.yaml) - 25 tests covering allow/deny, fail-closed/open, async, GraphBubbleUp - Comprehensive docs at backend/docs/GUARDRAILS.md Closes #1213 Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-23 18:07:33 +08:00
haoliangxu	e6c6770b70	fix(middleware): fallback to configurable thread_id in thread data middleware (#1237 ) Co-authored-by: Exploreunive <Exploreunive@users.noreply.github.com> Co-authored-by: Willem Jiang <willem.jiang@gmail.com>	2026-03-22 20:14:51 +08:00
greatmengqi	accf5b5f8e	fix: add sync after_model to TitleMiddleware (#1190 )	2026-03-19 15:46:31 +08:00
lhd	0091d9f071	feat(tools): add tool_search for deferred MCP tool loading (#1176 ) * feat(tools): add tool_search for deferred MCP tool loading When multiple MCP servers are enabled, total tool count can exceed 30-50, causing context bloat and degraded tool selection accuracy. This adds a deferred tool loading mechanism controlled by `tool_search.enabled` config. - Add ToolSearchConfig with single `enabled` field - Add DeferredToolRegistry with regex search (select:, +keyword, keyword) - Add tool_search tool returning OpenAI-compatible function JSON - Add DeferredToolFilterMiddleware to hide deferred schemas from bind_tools - Add <available-deferred-tools> section to system prompt - Enable MCP tool_name_prefix to prevent cross-server name collisions - Add 34 unit tests covering registry, tool, prompt, and middleware * fix: reset stale deferred registry and bump config_version - Reset deferred registry upfront in get_available_tools() to prevent stale tool entries when MCP servers are disabled between calls - Bump config_version to 2 for new tool_search config field Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(tests): mock get_app_config in prompt section tests for CI CI has no config.yaml, causing TestDeferredToolsPromptSection to fail with FileNotFoundError. Add autouse fixture to mock get_app_config. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-17 20:43:55 +08:00
Ryanba	b1913a1902	fix(harness): normalize structured content for titles (#1155 ) * fix(harness): normalize structured content for titles Flatten structured LangChain message content before prompting the title model so list/block payloads don't leak Python reprs into generated thread titles. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> * Apply suggestions from code review Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai> Co-authored-by: Willem Jiang <willem.jiang@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-17 09:10:09 +08:00
DanielWalnut	76803b826f	refactor: split backend into harness (deerflow.*) and app (app.*) (#1131 ) * refactor: extract shared utils to break harness→app cross-layer imports Move _validate_skill_frontmatter to src/skills/validation.py and CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py. This eliminates the two reverse dependencies from client.py (harness layer) into gateway/routers/ (app layer), preparing for the harness/app package split. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * refactor: split backend/src into harness (deerflow.*) and app (app.*) Physically split the monolithic backend/src/ package into two layers: - Harness (`packages/harness/deerflow/`): publishable agent framework package with import prefix `deerflow.*`. Contains agents, sandbox, tools, models, MCP, skills, config, and all core infrastructure. - App (`app/`): unpublished application code with import prefix `app.*`. Contains gateway (FastAPI REST API) and channels (IM integrations). Key changes: - Move 13 harness modules to packages/harness/deerflow/ via git mv - Move gateway + channels to app/ via git mv - Rename all imports: src.* → deerflow.* (harness) / app.* (app layer) - Set up uv workspace with deerflow-harness as workspace member - Update langgraph.json, config.example.yaml, all scripts, Docker files - Add build-system (hatchling) to harness pyproject.toml - Add PYTHONPATH=. to gateway startup commands for app.* resolution - Update ruff.toml with known-first-party for import sorting - Update all documentation to reflect new directory structure Boundary rule enforced: harness code never imports from app. All 429 tests pass. Lint clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * chore: add harness→app boundary check test and update docs Add test_harness_boundary.py that scans all Python files in packages/harness/deerflow/ and fails if any `from app.` or `import app.` statement is found. 
This enforces the architectural rule that the harness layer never depends on the app layer. Update CLAUDE.md to document the harness/app split architecture, import conventions, and the boundary enforcement test. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add config versioning with auto-upgrade on startup When config.example.yaml schema changes, developers' local config.yaml files can silently become outdated. This adds a config_version field and auto-upgrade mechanism so breaking changes (like src.* → deerflow.* renames) are applied automatically before services start. - Add config_version: 1 to config.example.yaml - Add startup version check warning in AppConfig.from_file() - Add scripts/config-upgrade.sh with migration registry for value replacements - Add `make config-upgrade` target - Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services - Add config error hints in service failure messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix comments * fix: update src.* import in test_sandbox_tools_security to deerflow.* Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle empty config and search parent dirs for config.example.yaml Address Copilot review comments on PR #1131: - Guard against yaml.safe_load() returning None for empty config files - Search parent directories for config.example.yaml instead of only looking next to config.yaml, fixing detection in common setups Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: correct skills root path depth and config_version type coercion - loader.py: fix get_skills_root_path() to use 5 parent levels (was 3) after harness split, file lives at packages/harness/deerflow/skills/ so parent×3 resolved to backend/packages/harness/ instead of backend/ - app_config.py: coerce config_version to int() before comparison in _check_config_version() to prevent TypeError when YAML stores value as string (e.g. 
config_version: "1") - tests: add regression tests for both fixes Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix: update test imports from src.* to deerflow.*/app.* after harness refactor Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-14 22:55:52 +08:00
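
The sandbox path-handling commit in the log above mentions a path-segment boundary fix ("== or startswith +"/"") and a regex lookahead that keeps /mnt/skills from matching inside /mnt/skills-extra. The sketch below illustrates that idea only; `is_within` and `replace_virtual_root` are hypothetical names, not the repository's `_resolve_path` or `_resolve_paths_in_command`, and the exact lookahead character class is an assumption.

```python
import re


def is_within(path: str, root: str) -> bool:
    """Return True if `path` is `root` itself or lies beneath it.

    Hypothetical helper: a bare startswith(root) would also accept
    sibling directories such as "/mnt/skills-extra".
    """
    root = root.rstrip("/")
    return path == root or path.startswith(root + "/")


def replace_virtual_root(command: str, virtual_root: str, host_root: str) -> str:
    """Rewrite `virtual_root` to `host_root` inside a shell command string.

    The lookahead (an assumed pattern) only accepts a match when the next
    character is "/", the end of the string, or a character that cannot
    continue a path segment name.
    """
    pattern = re.compile(re.escape(virtual_root) + r"(?=/|$|[^\w.\-])")
    # A function replacement avoids backslash processing in host_root.
    return pattern.sub(lambda _match: host_root, command)


rewritten = replace_virtual_root(
    "cat /mnt/skills/a.md /mnt/skills-extra/b.md",
    "/mnt/skills",
    "/host/skills",
)
```

Here only the first path is rewritten, because the second one continues the segment name with "-extra" and fails the lookahead.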

35 Commits
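
The ACP commit in the log above replaces a module-level `_registry` global with a `contextvars.ContextVar` so that each asyncio request context gets its own deferred tool registry. A minimal sketch of that pattern follows. The helper names are taken from the commit message, but their signatures, the registry's dict shape, and `handle_request` are assumptions made for illustration.

```python
import asyncio
import contextvars

# One ContextVar instead of a shared module-level global. Each asyncio
# task runs in its own copy of the context, so a set() made in one task
# is invisible to the others.
_registry_var = contextvars.ContextVar("deferred_registry", default=None)


def get_deferred_registry():
    """Return the registry bound to the current context, or None."""
    return _registry_var.get()


def set_deferred_registry(registry):
    """Bind a registry to the current context only."""
    _registry_var.set(registry)


def reset_deferred_registry():
    """Clear the registry for the current context."""
    _registry_var.set(None)


async def handle_request(owner, tools):
    # Hypothetical request handler: installs its own registry, yields to
    # the event loop, then reads the registry back.
    set_deferred_registry({"owner": owner, "tools": tools})
    await asyncio.sleep(0)  # let the other request run in between
    return get_deferred_registry()["owner"]


async def main():
    # gather() wraps each coroutine in a task with its own context copy.
    return await asyncio.gather(
        handle_request("a", ["tool_x"]),
        handle_request("b", ["tool_y"]),
    )


results = asyncio.run(main())
```

With a plain module-level global, the second request's registry would overwrite the first before it is read back. With the ContextVar, each request reads its own value.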