* feat(config): add when_thinking_disabled support for model configs
Allow users to explicitly configure what parameters are sent to the
model when thinking is disabled, via a new `when_thinking_disabled`
field in model config. This mirrors the existing `when_thinking_enabled`
pattern and takes full precedence over the hardcoded disable behavior
when set. Backwards compatible — existing configs work unchanged.
Closes #1675
* fix(config): address copilot review — gate when_thinking_disabled independently
- Switch truthiness check to `is not None` so empty dict overrides work
- Restructure disable path so when_thinking_disabled is gated independently
of has_thinking_settings, allowing it to work without when_thinking_enabled
- Update test to reflect new behavior
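The `is not None` gate can be sketched as follows. This is a minimal illustration, not the project's code: `resolve_disabled_params` and `HARDCODED_DISABLE_PARAMS` are hypothetical names, and the contents of the fallback dict are an assumption.

```python
from typing import Any, Optional

# Hypothetical stand-in for the hardcoded parameters sent when no override is configured.
HARDCODED_DISABLE_PARAMS: dict[str, Any] = {"thinking": {"type": "disabled"}}


def resolve_disabled_params(
    when_thinking_disabled: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Pick the parameters to send when thinking is disabled."""
    # `is not None` rather than truthiness: an explicit empty dict is a valid
    # override that means "send nothing extra".
    if when_thinking_disabled is not None:
        return dict(when_thinking_disabled)
    return dict(HARDCODED_DISABLE_PARAMS)
```

With a truthiness check, `resolve_disabled_params({})` would fall through to the hardcoded defaults; with the identity check it returns `{}`.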
* feat: implement full checkpoint rollback on user cancellation
- Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes
- Add _rollback_to_pre_run_checkpoint() function to restore thread state
- Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods
- Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes
- Remove obsolete TODO comment (Phase 2) as rollback is now complete
This resolves the TODO(Phase 2) comment and enables full thread state
restoration when a run is cancelled by the user.
* fix: address rollback review feedback
* fix: strengthen checkpoint rollback validation and error handling
- Validate restored_config structure and checkpoint_id before use
- Raise RuntimeError on malformed pending_writes instead of silent skip
- Normalize None checkpoint_ns to empty string instead of "None"
- Move delete_thread to only execute when pre_run_snapshot is None
- Add docstring noting non-atomic rollback as known limitation
This addresses review feedback on PR #1867 regarding data integrity
in the checkpoint rollback implementation.
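A rough sketch of the two validation rules above. The helper names are hypothetical, and the sketch assumes each pending write is a `(task_id, channel, value)` triple; the real snapshot layout may differ.

```python
from typing import Any


def normalize_checkpoint_ns(checkpoint_ns: Any) -> str:
    """Map a missing namespace to the root namespace ("") instead of the string "None"."""
    return "" if checkpoint_ns is None else str(checkpoint_ns)


def validate_pending_writes(pending_writes: list[Any]) -> list[tuple[str, str, Any]]:
    """Fail loudly on malformed entries instead of silently skipping them."""
    validated: list[tuple[str, str, Any]] = []
    for index, item in enumerate(pending_writes):
        # Assumed shape: (task_id, channel, value).
        if not isinstance(item, tuple) or len(item) != 3:
            raise RuntimeError(f"pending_writes[{index}] is not a 3-tuple: {item!r}")
        task_id, channel, value = item
        if not isinstance(channel, str):
            raise RuntimeError(f"pending_writes[{index}] has a non-string channel: {channel!r}")
        validated.append((str(task_id), channel, value))
    return validated
```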
* test: add comprehensive coverage for checkpoint rollback edge cases
- test_rollback_restores_snapshot_without_deleting_thread
- test_rollback_deletes_thread_when_no_snapshot_exists
- test_rollback_raises_when_restore_config_has_no_checkpoint_id
- test_rollback_normalizes_none_checkpoint_ns_to_root_namespace
- test_rollback_raises_on_malformed_pending_write_not_a_tuple
- test_rollback_raises_on_malformed_pending_write_non_string_channel
- test_rollback_propagates_aput_writes_failure
Covers all scenarios from PR #1867 review feedback.
* test: format rollback worker tests
* fix(sandbox): add startup reconciliation to prevent orphaned container leaks
Sandbox containers were never cleaned up when the managing process restarted,
because all lifecycle tracking lived in in-memory dictionaries. This adds
startup reconciliation that enumerates running containers via `docker ps` and
either destroys orphans (age > idle_timeout) or adopts them into the warm pool.
Closes #1972
* fix(sandbox): address Copilot review — adopt-all strategy, improved error handling
- Reconciliation now adopts all containers into warm pool unconditionally,
letting the idle checker decide cleanup. Avoids destroying containers
that another concurrent process may still be using.
- list_running() logs stderr on docker ps failure and catches
FileNotFoundError/OSError.
- Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP.
- E2E test docstring corrected to match actual coverage scope.
* fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene
- _reconcile_orphans(): merge check-and-insert into a single lock acquisition
per container to eliminate the TOCTOU window.
- list_running(): batch the per-container docker inspect into a single call.
Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect).
Parse port and created_at from the inspect JSON payload.
- Extract _parse_docker_timestamp() and _extract_host_port() as module-level
pure helpers and test them directly.
- Move datetime/json imports to module top level.
- _make_provider_for_reconciliation(): document the __new__ bypass and the
lockstep coupling to AioSandboxProvider.__init__.
- Add assertion that list_running() makes exactly ONE inspect call.
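The two pure helpers could look roughly like this. It is a sketch under assumptions: the function bodies are illustrative, the `8080/tcp` container port is a guess, and the input is taken to be one entry of the JSON array printed by `docker inspect`.

```python
import re
from datetime import datetime, timezone
from typing import Any, Optional

# Docker reports timestamps such as 2024-01-15T10:30:00.123456789Z (nanosecond precision).
_DOCKER_TS = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_docker_timestamp(raw: str) -> Optional[datetime]:
    """Parse a Docker UTC timestamp, truncating nanoseconds to microseconds."""
    match = _DOCKER_TS.match(raw.strip())
    if match is None:
        return None
    base, fraction = match.groups()
    microseconds = int((fraction or "0")[:6].ljust(6, "0"))
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=microseconds, tzinfo=timezone.utc)


def extract_host_port(entry: dict[str, Any], container_port: str = "8080/tcp") -> Optional[int]:
    """Return the first host port bound to `container_port`, if any."""
    ports = (entry.get("NetworkSettings") or {}).get("Ports") or {}
    for binding in ports.get(container_port) or []:
        host_port = binding.get("HostPort", "")
        if host_port.isdigit():
            return int(host_port)
    return None
```

Keeping both helpers free of subprocess calls is what makes them testable without a Docker daemon.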
* Fix HTML artifact preview rendering
* Add after screenshot for HTML preview fix
* Add before screenshot for HTML preview fix
* Update before screenshot for HTML preview fix
* Update after screenshot for HTML preview fix
* Update before screenshot to Tsinghua homepage repro
* Update after screenshot to Tsinghua homepage preview
* Address PR review on HTML artifact preview
* Harden HTML artifact preview isolation
Streamdown's streaming safeguard appends closing markers (e.g. `*`) to
text with unmatched markdown syntax. This causes user messages containing
literal `*` (such as `99 * 87`) to display with a spurious trailing
asterisk. Human messages are always complete, so the incomplete-markdown
pre-processing is unnecessary.
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docker): nginx fails to start on hosts without IPv6
- Detect IPv6 support at runtime and remove `listen [::]` directive
when unavailable, preventing nginx startup failure on non-IPv6 hosts
- Use `exec` to replace shell with nginx as PID 1 for proper signal
handling (graceful shutdown on SIGTERM)
- Reformat command from YAML folded scalar to block scalar (no
functional change)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docker): harden nginx startup script (Copilot review feedback)
Add `set -e` so envsubst failures exit immediately instead of starting
nginx with an incomplete config. Narrow the sed pattern to match only
the `listen [::]:2026;` directive to avoid accidentally removing future
lines containing [::].
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When a model config includes `reasoning_effort` as an extra YAML field
(ModelConfig uses `extra="allow"`), and the thinking-disabled code path
also injects `reasoning_effort="minimal"` into kwargs, the previous
`model_class(**kwargs, **model_settings_from_config)` call raises:
TypeError: got multiple values for keyword argument 'reasoning_effort'
Fix by merging the two dicts before instantiation, giving runtime kwargs
precedence over config values: `{**model_settings_from_config, **kwargs}`.
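The failure and the fix can be reproduced in isolation. `FakeModel` below is a hypothetical stand-in for the real model class, and the setting values are made up for illustration.

```python
class FakeModel:
    """Hypothetical model class that accepts arbitrary keyword settings."""

    def __init__(self, **settings):
        self.settings = settings


model_settings_from_config = {"reasoning_effort": "high", "temperature": 0.2}
kwargs = {"reasoning_effort": "minimal"}  # injected by the thinking-disabled path

# Old call shape: the same key arrives through both unpackings.
duplicate_error = None
try:
    FakeModel(**kwargs, **model_settings_from_config)
except TypeError as exc:
    duplicate_error = str(exc)

# New call shape: merge first; later entries win, so runtime kwargs take precedence.
merged = {**model_settings_from_config, **kwargs}
model = FakeModel(**merged)
```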
Fixes #1977
Co-authored-by: octo-patch <octo-patch@github.com>
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995)
Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON
strings instead of native arrays. Add defensive type checking in
_format_clarification_message() to deserialize string options before
iteration, preventing per-character rendering.
* fix(middleware): normalize options after JSON deserialization
Address Copilot review feedback:
- Add post-deserialization normalization so options is always a list
(handles json.loads returning a scalar string, dict, or None)
- Add test for JSON-encoded scalar string ("development")
- Fix test_json_string_with_mixed_types to use actual mixed types
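A minimal sketch of the deserialize-then-normalize step. `normalize_options` is a hypothetical name, and wrapping non-list values (including dicts and non-JSON strings) in a single-element list is an assumption about the desired behavior.

```python
import json
from typing import Any


def normalize_options(options: Any) -> list[Any]:
    """Always return a list, whatever shape the model produced."""
    if isinstance(options, str):
        try:
            options = json.loads(options)
        except json.JSONDecodeError:
            # Not JSON at all: treat the raw string as one option.
            options = [options]
    if options is None:
        return []
    if isinstance(options, list):
        return options
    # Scalar string, number, or dict: wrap so iteration never goes per character.
    return [options]
```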
* feat(community): add Exa search as community tool provider
Add Exa (exa.ai) as a new community search provider alongside Tavily,
Firecrawl, InfoQuest, and Jina AI. Exa is an AI-native search engine
with neural, keyword, and auto search types.
New files:
- community/exa/tools.py: web_search_tool and web_fetch_tool
- tests/test_exa_tools.py: 10 unit tests with mocked Exa client
Changes:
- pyproject.toml: add exa-py dependency
- config.example.yaml: add commented-out Exa configuration examples
Usage: set `use: deerflow.community.exa.tools:web_search_tool` in
config.yaml and provide EXA_API_KEY.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(community): address PR review comments for Exa tools
- Make _get_exa_client() accept tool_name param so web_fetch reads its own config
- Remove __init__.py to match namespace package pattern of other providers
- Add duplicate tool name warning in config.example.yaml
- Add regression tests for web_fetch config resolution
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update revision in uv.lock to 3
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(backend): use timezone-aware UTC in memory modules
Replace datetime.utcnow() with datetime.now(timezone.utc) and a shared
utc_now_iso_z() helper so persisted ISO timestamps keep the trailing Z
suffix without triggering Python 3.12+ deprecation warnings.
Made-with: Cursor
* refactor(backend): use removesuffix for utc_now_iso_z suffix
Makes the +00:00 -> Z transform explicit for the trailing offset only
(Copilot review on PR #1992).
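One way such a helper could be written; the body below is an illustration and not necessarily the project's implementation.

```python
from datetime import datetime, timezone


def utc_now_iso_z() -> str:
    """Return the current UTC time as ISO 8601 with a trailing Z."""
    stamp = datetime.now(timezone.utc).isoformat()
    # An aware UTC datetime serializes with a "+00:00" offset; swap only that
    # trailing offset for "Z" so the persisted format stays the same.
    return stamp.removesuffix("+00:00") + "Z"
```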
Made-with: Cursor
* style(backend): satisfy ruff UP017 with datetime.UTC in memory queue
Made-with: Cursor
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
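The detection logic can be sketched like this. `run_coroutine_blocking` is a hypothetical helper name; the actual executor may structure the thread handoff differently.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def run_coroutine_blocking(coro_factory):
    """Run a coroutine to completion from sync code, even inside a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro_factory())
    # A loop is already running here: give the coroutine its own loop in a
    # separate thread so it cannot touch the caller's loop.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(lambda: asyncio.run(coro_factory())).result()
```

Note that `.result()` blocks the calling thread until the worker finishes, so the parent loop makes no progress in the meantime.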
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).
This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.
- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)
Closes #1940
* fix(backend): make loop detection hash tool calls by stable keys
The loop detection middleware previously hashed full tool call arguments,
which made repeated calls look different when only non-essential argument
details changed. In particular, `read_file` calls with nearby line ranges
could bypass repetition detection even when the agent was effectively
reading the same file region again and again.
- Hash tool calls using stable keys instead of the full raw args payload
- Bucket `read_file` line ranges so nearby reads map to the same region key
- Prefer stable identifiers such as `path`, `url`, `query`, or `command`
before falling back to JSON serialization of args
- Keep hashing order-independent so the same tool call set produces the
same hash regardless of call order
Fixes #1905
* fix(backend): harden loop detection hash normalization
- Normalize and parse stringified tool args defensively
- Expand stable key derivation to include pattern, glob, and cmd
- Normalize reversed read_file ranges before bucketing
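A condensed sketch of stable-key hashing with range bucketing. The bucket size, the `start_line`/`end_line` argument names, and the helper names are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Any

BUCKET_SIZE = 200  # hypothetical bucket width, in lines
STABLE_FIELDS = ("path", "url", "query", "command", "pattern", "glob", "cmd")


def stable_key(name: str, args: dict[str, Any]) -> str:
    """Reduce a tool call's arguments to the part that identifies the target."""
    if name == "read_file":
        start = int(args.get("start_line") or 1)
        end = int(args.get("end_line") or start)
        if start > end:  # normalize reversed ranges before bucketing
            start, end = end, start
        return f"{args.get('path')}:{start // BUCKET_SIZE}-{end // BUCKET_SIZE}"
    for field in STABLE_FIELDS:
        value = args.get(field)
        if isinstance(value, str) and value:
            return f"{field}={value}"
    # Fallback: canonical JSON of the whole payload.
    return json.dumps(args, sort_keys=True, default=str)


def hash_tool_calls(calls: list[dict[str, Any]]) -> str:
    """Hash a set of tool calls independently of their order."""
    keys = sorted(f"{c['name']}|{stable_key(c['name'], c.get('args') or {})}" for c in calls)
    return hashlib.sha256("\n".join(keys).encode()).hexdigest()
```

With these assumptions, reading lines 1-50 and lines 10-60 of the same file lands in the same bucket and produces the same hash.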
Fixes #1905
* fix(backend): harden loop detection tool format
- Exclude write_file and str_replace from the stable-key path; writing different content to the same file shouldn't be flagged.
---------
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* fix(frontend): resolve layout flickering by migrating workspace sidebar state to cookie
* fix(frontend): unify local settings runtime state to fix state drift
* fix(frontend): only persist thread model on explicit context model updates
* fix(subagents): add cooperative cancellation for subagent threads
Subagent tasks run inside ThreadPoolExecutor threads with their own
event loop (asyncio.run). When a user clicks stop, RunManager cancels
the parent asyncio.Task, but Future.cancel() cannot terminate a running
thread and asyncio.Event does not propagate across event loops. This
causes subagent threads to keep executing (writing files, calling LLMs)
even after the user explicitly stops the run.
Fix: add a threading.Event (cancel_event) to SubagentResult and check
it cooperatively in _aexecute()'s astream iteration loop. On cancel,
request_cancel_background_task() sets the event, and the thread exits
at the next iteration boundary.
Changes:
- executor.py: Add cancel_event field to SubagentResult, check it in
_aexecute loop, set it on timeout, add request_cancel_background_task
- task_tool.py: Call request_cancel_background_task on CancelledError
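The cooperative check can be sketched as follows. The dataclass is a simplified stand-in for the real `SubagentResult`, the status strings are illustrative, and `stream` represents whatever async iterator `astream` returns.

```python
import asyncio
import threading
from dataclasses import dataclass, field
from typing import Any, AsyncIterator


@dataclass
class SubagentResult:
    """Simplified stand-in: only the fields needed for the sketch."""

    status: str = "RUNNING"
    cancel_event: threading.Event = field(default_factory=threading.Event)
    chunks: list[Any] = field(default_factory=list)


async def aexecute(result: SubagentResult, stream: AsyncIterator[Any]) -> SubagentResult:
    # threading.Event is visible across threads and event loops, unlike asyncio.Event.
    if result.cancel_event.is_set():  # pre-check before streaming starts
        result.status = "CANCELLED"
        return result
    async for chunk in stream:
        # Cancellation is only noticed here, at iteration boundaries.
        if result.cancel_event.is_set():
            result.status = "CANCELLED"
            return result
        result.chunks.append(chunk)
    result.status = "COMPLETED"
    return result
```

A long-running step inside a single iteration still runs to completion; the thread only exits when control returns to the loop.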
* fix(subagents): guard cancel status and add pre-check before astream
- Only overwrite status to FAILED when still RUNNING, preserving
TIMED_OUT set by the scheduler thread.
- Add cancel_event pre-check before entering the astream loop so
cancellation is detected immediately when already signalled.
* fix(subagents): guard status updates with lock to prevent race condition
Wrap the check-and-set on result.status in _aexecute with
_background_tasks_lock so the timeout handler in execute_async
cannot interleave between the read and write.
* fix(subagents): add dedicated CANCELLED status for user cancellation
Introduce SubagentStatus.CANCELLED to distinguish user-initiated
cancellation from actual execution failures. Update _aexecute,
task_tool polling, cleanup terminal-status sets, and test fixtures.
* test(subagents): add cancellation tests and fix timeout regression test
- Add dedicated TestCooperativeCancellation test class with 6 tests:
- Pre-set cancel_event prevents astream from starting
- Mid-stream cancel_event returns CANCELLED immediately
- request_cancel_background_task() sets cancel_event correctly
- request_cancel on nonexistent task is a no-op
- Real execute_async timeout does not overwrite CANCELLED (deterministic
threading.Event sync, no wall-clock sleeps)
- cleanup_background_task removes CANCELLED tasks
- Add task_tool cancellation coverage:
- test_cancellation_calls_request_cancel: assert CancelledError path
calls request_cancel_background_task(task_id)
- test_task_tool_returns_cancelled_message: assert CANCELLED polling
branch emits task_cancelled event and returns expected message
- Fix pre-existing test infrastructure issue: add deerflow.sandbox.security
to _MOCKED_MODULE_NAMES (fixes ModuleNotFoundError for all executor tests)
- Add RUNNING guard to timeout handler in executor.py to prevent
TIMED_OUT from overwriting CANCELLED status
- Add cooperative cancellation granularity comment documenting that
cancellation is only detected at astream iteration boundaries
---------
Co-authored-by: lulusiyuyu <lulusiyuyu@users.noreply.github.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as an unexpected extra argument. Fix by building
the args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
* fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware
Add _validate_input() to reject malformed bash commands before regex
classification: empty commands, oversized commands (>10 000 chars), and
null bytes that could cause detection/execution layer inconsistency.
* fix(sandbox): address Copilot review — type guard, log truncation, reject reason
- Coerce None/non-string command to str before validation
- Truncate oversized commands in audit logs to prevent log amplification
- Propagate reject_reason through _pre_process() to block message
- Remove L2 label from comments and test class names
* fix(sandbox): isinstance type guard + async input sanitisation tests
Address review comments:
- Replace str() coercion with isinstance(raw_command, str) guard so
non-string truthy values (0, [], False) fall back to empty string
instead of passing validation as "0"/"[]"/"False".
- Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests
covering empty, null-byte, oversized, and None command via
awrap_tool_call path.
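The checks described across these three commits could be combined roughly as below. The function name, the reject-reason strings, and returning `None` for "accepted" are assumptions, not the middleware's actual interface.

```python
from typing import Any, Optional

MAX_COMMAND_CHARS = 10_000


def validate_input(raw_command: Any) -> Optional[str]:
    """Return a reject reason, or None when the command may proceed to classification."""
    # isinstance guard instead of str(): 0, [] or False must not become "0", "[]" or "False".
    command = raw_command if isinstance(raw_command, str) else ""
    if not command.strip():
        return "empty command"
    if len(command) > MAX_COMMAND_CHARS:
        return "command too long"
    if "\x00" in command:
        return "command contains null byte"
    return None
```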
Add support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fix the Qwen reasoning toggle so flash mode can actually disable thinking.
Co-authored-by: NmanQAQ <normangyao@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
ls_tool was the only sandbox tool without output size limits, allowing
multi-MB results from large directories to blow up the model context
window. Add head-truncation (configurable via ls_output_max_chars,
default 20000) consistent with existing bash and read_file truncation.
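Head-truncation keeps the start of the output and drops the rest. A minimal sketch, in which the function name and the marker text are assumptions:

```python
def truncate_head(output: str, max_chars: int = 20000) -> str:
    """Keep the first `max_chars` characters and note how much was dropped."""
    if len(output) <= max_chars:
        return output
    omitted = len(output) - max_chars
    return output[:max_chars] + f"\n... [truncated {omitted} chars]"
```

The marker is appended after the cut, so the returned string is slightly longer than `max_chars`.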
Closes #1887
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Escape shell variables to prevent Docker Compose from attempting
substitution at parse time. Rename allow_blocking_flag to allow_blocking
for consistency with the dev version.
Fixes the 'allow_blocking_flag not set' warning and enables the
--allow-blocking flag to work correctly.