* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold
Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.
New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation
Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add RunEventStore ABC + MemoryRunEventStore
Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints
Phase 2-B: run persistence + event storage + token tracking.
- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(persistence): add user feedback + follow-up run association
Phase 2-C: feedback and follow-up tracking.
- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config
- config.example.yaml: deprecate standalone checkpointer section, activate
unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
middleware no ai_message, unknown caller tokens, convenience fields,
tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* config: move default sqlite_dir to .deer-flow/data
Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()
- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
models. Add json_serializer=json.dumps(ensure_ascii=False) to all
create_async_engine calls so non-ASCII text (Chinese etc.) is stored
as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
remove UTC imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): simplify deps.py with getter factory + inline repos
- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(docker): add UV_EXTRAS build arg for optional dependencies
Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
UV_EXTRAS=postgres docker compose build
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(journal): fix flush, token tracking, and consolidate tests
RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
in finally block.
Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
custom api_base, so usage_metadata is populated in streaming responses.
Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): widen content type to str|dict in all store backends
Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(events): use metadata flag instead of heuristic for dict content detection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(converters): add LangChain-to-OpenAI message format converters
Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(converters): handle empty list content as null, clean up test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): human_message content uses OpenAI user message format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): ai_message uses OpenAI format, add ai_tool_call message event
- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add tool_result message event with OpenAI tool message format
Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): summary content uses OpenAI system message format
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format
Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(events): add record_middleware method for middleware trace events
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(events): add full run sequence integration test for OpenAI content format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(events): align message events with checkpoint format and add middleware tag injection
- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): switch search endpoint to threads_meta table and sync title
- POST /api/threads/search now queries threads_meta table directly,
removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(threads): history endpoint reads messages from event store
- POST /api/threads/{thread_id}/history now combines two data sources:
checkpointer for checkpoint_id, metadata, title, thread_data;
event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: remove duplicate optional-dependencies header in pyproject.toml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(middleware): pass tagged config to TitleMiddleware ainvoke call
Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve merge conflict in .env.example
Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address review feedback on PR #1851
- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: update uv.lock
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality
Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search
Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* style: apply ruff format to persistence and runtime files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Potential fix for pull request finding 'Statement has no effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat
Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): move sanitize_log_param to app/gateway/utils.py
Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* perf: use SQL aggregation for feedback stats and thread token usage
Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs: annotate DbRunEventStore.put() as low-frequency path
Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable
When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata
Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.
deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(history): read messages from checkpointer instead of RunEventStore
The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(persistence): address new Copilot review comments
- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Implement skill self-evolution and skill_manage flow (#1874)
* chore: ignore .worktrees directory
* Add skill_manage self-evolution flow
* Fix CI regressions for skill_manage
* Address PR review feedback for skill evolution
* fix(skill-evolution): preserve history on delete
* fix(skill-evolution): tighten scanner fallbacks
* docs: add skill_manage e2e evidence screenshot
* fix(skill-manage): avoid blocking fs ops in session runtime
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir
resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Feature/feishu receive file (#1608)
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)
* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities
Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(frontend): address Copilot review on tabnabbing and double-tab-open
Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(persistence): organize entities into per-entity directories
Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.
Layout:
persistence/thread_meta/{base,model,sql,memory}.py
persistence/run/{model,sql}.py
persistence/feedback/{model,sql}.py
models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.
The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:
from deerflow.persistence.thread_meta import ThreadMetaRepository
from deerflow.persistence.run import RunRepository
from deerflow.persistence.feedback import FeedbackRepository
Full test suite passes (1690 passed, 14 skipped).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(gateway): sync thread rename and delete through ThreadMetaStore
The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).
Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.
Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.
Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(gateway): route all thread metadata access through ThreadMetaStore
Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.
Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
already covers it.
Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
the title via thread_meta_repo.update_display_name() after each run.
Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).
New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
read-modify-write inside one session) and MemoryThreadMetaStore. Three new
unit tests cover merge / empty / nonexistent behaviour.
Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
* feat(provisioner): add optional PVC support for sandbox volumes (#1978)
Add SKILLS_PVC_NAME and USERDATA_PVC_NAME env vars to allow sandbox
Pods to use PersistentVolumeClaims instead of hostPath volumes. This
prevents data loss in production when pods are rescheduled across nodes.
When USERDATA_PVC_NAME is set, a subPath of threads/{thread_id}/user-data
is used so a single PVC can serve multiple threads. Falls back to hostPath
when the new env vars are not set, preserving backward compatibility.
* add unit test for provisioner pvc volumes
* refactor: extract shared provisioner_module fixture to conftest.py
Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/e7ccf708-c6ba-40e4-844a-b526bdb249dd
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* fix(backend): stream DeerFlowClient AI text as token deltas (#1969)
DeerFlowClient.stream() subscribed to LangGraph stream_mode=["values",
"custom"] which only delivers full-state snapshots at graph-node
boundaries, so AI replies were dumped as a single messages-tuple event
per node instead of streaming token-by-token. `client.stream("hello")`
looked identical to `client.chat("hello")` — the bug reported in #1969.
Subscribe to "messages" mode as well, forward AIMessageChunk deltas as
messages-tuple events with delta semantics (consumers accumulate by id),
and dedup the values-snapshot path so it does not re-synthesize AI
text that was already streamed. Introduce a per-id usage_metadata
counter so the final AIMessage in the values snapshot and the final
"messages" chunk — which carry the same cumulative usage — are not
double-counted.
chat() now accumulates per-id deltas and returns the last message's
full accumulated text. Non-streaming mock sources (single event per id)
are a degenerate case of the same logic, keeping existing callers and
tests backward compatible.
Verified end-to-end against a real LLM: a 15-number count emits 35
messages-tuple events with BPE subword boundaries clearly visible
("eleven" -> "ele" / "ven", "twelve" -> "tw" / "elve"), 476ms across
the window, end-event usage matches the values-snapshot usage exactly
(not doubled). tests/test_client_live.py::TestLiveStreaming passes.
New unit tests:
- test_messages_mode_emits_token_deltas: 3 AIMessageChunks produce 3
delta events with correct content/id/usage, values-snapshot does not
duplicate, usage counted once.
- test_chat_accumulates_streamed_deltas: chat() rebuilds full text
from deltas.
- test_messages_mode_tool_message: ToolMessage delivered via messages
mode is not duplicated by the values-snapshot synthesis path.
The stream() docstring now documents why this client does not reuse
Gateway's run_agent() / StreamBridge pipeline (sync vs async, raw
LangChain objects vs serialized dicts, single caller vs HTTP fan-out).
Fixes#1969
* refactor(backend): simplify DeerFlowClient streaming helpers (#1969)
Post-review cleanup for the token-level streaming fix. No behavior
change for correct inputs; one efficiency regression fixed.
Fix: chat() O(n²) accumulator
-----------------------------
`chat()` accumulated per-id text via `buffers[id] = buffers.get(id,"") + delta`,
which is O(n) per concat → O(n²) total over a streamed response. At
~2 KB cumulative text this becomes user-visible; at 50 KB / 5000 chunks
it costs roughly 100-300 ms of pure copying. Switched to
`dict[str, list[str]]` + `"".join()` once at return.
Cleanup
-------
- Extract `_serialize_tool_calls`, `_ai_text_event`, `_ai_tool_calls_event`,
and `_tool_message_event` static helpers. The messages-mode and
values-mode branches previously repeated four inline dict literals each;
they now call the same builders.
- `StreamEvent.type` is now typed as `Literal["values", "messages-tuple",
"custom", "end"]` via a `StreamEventType` alias. Makes the closed set
explicit and catches typos at type-check time.
- Direct attribute access on `AIMessage`/`AIMessageChunk`: `.usage_metadata`,
`.tool_calls`, `.id` all have default values on the base class, so the
`getattr(..., None)` fallbacks were dead code. Removed from the hot
path.
- `_account_usage` parameter type loosened to `Any` so that LangChain's
`UsageMetadata` TypedDict is accepted under strict type checking.
- Trimmed narrating comments on `seen_ids` / `streamed_ids` / the
values-synthesis skip block; kept the non-obvious ones that document
the cross-mode dedup invariant.
Net diff: -15 lines. All 132 unit tests + harness boundary test still
pass; ruff check and ruff format pass.
* docs(backend): add STREAMING.md design note (#1969)
Dedicated design document for the token-level streaming architecture,
prompted by the bug investigation in #1969.
Contents:
- Why two parallel streaming paths exist (Gateway HTTP/async vs
DeerFlowClient sync/in-process) and why they cannot be merged.
- LangGraph's three-layer mode naming (Graph "messages" vs Platform
SDK "messages-tuple" vs HTTP SSE) and why a shared string constant
would be harmful.
- Gateway path: run_agent + StreamBridge + sse_consumer with a
sequence diagram.
- DeerFlowClient path: sync generator + direct yield, delta semantics,
chat() accumulator.
- Why the three id sets (seen_ids / streamed_ids / counted_usage_ids)
each carry an independent invariant and cannot be collapsed.
- End-to-end sequence for a real conversation turn.
- Lessons from #1969: why mock-based tests missed the bug, why
BPE subword boundaries in live output are the strongest
correctness signal, and the regression test that locks it in.
- Source code location index.
Also:
- Link from backend/CLAUDE.md Embedded Client section.
- Link from backend/docs/README.md under Feature Documentation.
* test(backend): add refactor regression guards for stream() (#1969)
Three new tests in TestStream that lock the contract introduced by
PR #1974 so any future refactor (sync->async migration, sharing a
core with Gateway's run_agent, dedup strategy change) cannot
silently change behavior.
- test_dedup_requires_messages_before_values_invariant: canary that
documents the order-dependence of cross-mode dedup. streamed_ids
is populated only by the messages branch, so values-before-messages
for the same id produces duplicate AI text events. Real LangGraph
never inverts this order, but a refactor that does (or that makes
dedup idempotent) must update this test deliberately.
- test_messages_mode_golden_event_sequence: locks the *exact* event
sequence (4 events: 2 messages-tuple deltas, 1 values snapshot, 1
end) for a canonical streaming turn. List equality gives a clear
diff on any drift in order, type, or payload shape.
- test_chat_accumulates_in_linear_time: perf canary for the O(n^2)
fix in commit 1f11ba10. 10,000 single-char chunks must accumulate
in under 1s; the threshold is wide enough to pass on slow CI but
tight enough to fail if buffer = buffer + delta is restored.
All three tests pass alongside the existing 12 TestStream tests
(15/15). ruff check + ruff format clean.
* docs(backend): clarify stream() docstring on JSON serialization (#1969)
Replace the misleading "raw LangChain objects (AIMessage,
usage_metadata as dataclasses), not dicts" claim in the
"Why not reuse Gateway's run_agent?" section. The implementation
already yields plain Python dicts (StreamEvent.data is dict, and
usage_metadata is a TypedDict), so the original wording suggested
a richer return type than the API actually delivers.
The corrected wording focuses on what is actually true and
relevant: this client skips the JSON/SSE serialization layer that
Gateway adds for HTTP wire transmission, and yields stream event
payloads directly as Python data structures.
Addresses Copilot review feedback on PR #1974.
* test(backend): document none-id messages dedup limitation (#1969)
Add test_none_id_chunks_produce_duplicates_known_limitation to
TestStream that explicitly documents and asserts the current
behavior when an LLM provider emits AIMessageChunk with id=None
(vLLM, certain custom backends).
The cross-mode dedup machinery cannot record a None id in
streamed_ids (guarded by ``if msg_id:``), so the values snapshot's
reassembled AIMessage with a real id falls through and synthesizes
a duplicate AI text event. The test asserts len == 2 and locks
this as a known limitation rather than silently letting future
contributors hit it without context.
Why this is documented rather than fixed:
* Falling back to ``metadata.get("id")`` does not help — LangGraph's
messages-mode metadata never carries the message id.
* Synthesizing ``f"_synth_{id(msg_chunk)}"`` only helps if the
values snapshot uses the same fallback, which it does not.
* A real fix requires provider cooperation (always emit chunk ids)
or content-based dedup (false-positive risk), neither of which
belongs in this PR.
If a real fix lands, replace this test with a positive assertion
that dedup works for None-id chunks.
Addresses Copilot review feedback on PR #1974 (client.py:515).
* fix(frontend): UI polish - fix CSS typo, dark mode border, and hardcoded colors (#1942)
- Fix `font-norma` typo to `font-normal` in message-list subtask count
- Fix dark mode `--border` using reddish hue (22.216) instead of neutral
- Replace hardcoded `rgb(184,184,192)` in hero with `text-muted-foreground`
- Replace hardcoded `bg-[#a3a1a1]` in streaming indicator with `bg-muted-foreground`
- Add missing `font-sans` to welcome description `<pre>` for consistency
- Make case-study-section padding responsive (`px-4 md:px-20`)
Closes#1940
* docs: clarify deployment sizing guidance (#1963)
* fix(frontend): prevent stale 'new' thread ID from triggering 422 history requests (#1960)
After history.replaceState updates the URL from /chats/new to
/chats/{UUID}, Next.js useParams does not update because replaceState
bypasses the router. The useEffect in useThreadChat would then set
threadIdFromPath ('new') as the threadId, causing the LangGraph SDK
to call POST /threads/new/history which returns HTTP 422 (Invalid
thread ID: must be a UUID).
This fix adds a guard to skip the threadId update when
threadIdFromPath is the literal string 'new', preserving the
already-correct UUID that was set when the thread was created.
* fix(frontend): avoid using route new as thread id (#1967)
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
* Fix(subagent): Event loop conflict in SubagentExecutor.execute() (#1965)
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* refactor(backend): remove dead getattr in _tool_message_event
---------
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
Co-authored-by: Xinmin Zeng <135568692+fancyboi999@users.noreply.github.com>
Co-authored-by: 13ernkastel <LennonCMJ@live.com>
Co-authored-by: siwuai <458372151@qq.com>
Co-authored-by: 肖 <168966994+luoxiao6645@users.noreply.github.com>
Co-authored-by: luoxiao6645 <luoxiao6645@gmail.com>
Co-authored-by: Saber <11769524+hawkli-1994@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
The /api/langgraph/* endpoints proxy straight to the LangGraph server,
so clients inherit LangGraph's native recursion_limit default of 25
instead of the 100 that build_run_config sets for the Gateway and IM
channel paths. 25 is too low for plan-mode or subagent runs and
reliably triggers GraphRecursionError on the lead agent's final
synthesis step after subagents return.
Set recursion_limit: 100 in the Create Run example and the cURL
snippet, and add a short note explaining the discrepancy so users
following the docs don't hit the 25-step ceiling as a surprise.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add langchain-ollama as an optional dependency and provide ChatOllama
config examples, enabling proper thinking/reasoning content preservation
for local Ollama models.
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(config): add when_thinking_disabled support for model configs
Allow users to explicitly configure what parameters are sent to the
model when thinking is disabled, via a new `when_thinking_disabled`
field in model config. This mirrors the existing `when_thinking_enabled`
pattern and takes full precedence over the hardcoded disable behavior
when set. Backwards compatible — existing configs work unchanged.
Closes#1675
* fix(config): address copilot review — gate when_thinking_disabled independently
- Switch truthiness check to `is not None` so empty dict overrides work
- Restructure disable path so when_thinking_disabled is gated independently
of has_thinking_settings, allowing it to work without when_thinking_enabled
- Update test to reflect new behavior
* feat: implement full checkpoint rollback on user cancellation
- Capture pre-run checkpoint snapshot including checkpoint state, metadata, and pending_writes
- Add _rollback_to_pre_run_checkpoint() function to restore thread state
- Implement _call_checkpointer_method() helper to support both async and sync checkpointer methods
- Rollback now properly restores checkpoint, metadata, channel_versions, and pending_writes
- Remove obsolete TODO comment (Phase 2) as rollback is now complete
This resolves the TODO(Phase 2) comment and enables full thread state
restoration when a run is cancelled by the user.
* fix: address rollback review feedback
* fix: strengthen checkpoint rollback validation and error handling
- Validate restored_config structure and checkpoint_id before use
- Raise RuntimeError on malformed pending_writes instead of silent skip
- Normalize None checkpoint_ns to empty string instead of "None"
- Move delete_thread to only execute when pre_run_snapshot is None
- Add docstring noting non-atomic rollback as known limitation
This addresses review feedback on PR #1867 regarding data integrity
in the checkpoint rollback implementation.
* test: add comprehensive coverage for checkpoint rollback edge cases
- test_rollback_restores_snapshot_without_deleting_thread
- test_rollback_deletes_thread_when_no_snapshot_exists
- test_rollback_raises_when_restore_config_has_no_checkpoint_id
- test_rollback_normalizes_none_checkpoint_ns_to_root_namespace
- test_rollback_raises_on_malformed_pending_write_not_a_tuple
- test_rollback_raises_on_malformed_pending_write_non_string_channel
- test_rollback_propagates_aput_writes_failure
Covers all scenarios from PR #1867 review feedback.
* test: format rollback worker tests
* fix(sandbox): add startup reconciliation to prevent orphaned container leaks
Sandbox containers were never cleaned up when the managing process restarted,
because all lifecycle tracking lived in in-memory dictionaries. This adds
startup reconciliation that enumerates running containers via `docker ps` and
either destroys orphans (age > idle_timeout) or adopts them into the warm pool.
Closes#1972
* fix(sandbox): address Copilot review — adopt-all strategy, improved error handling
- Reconciliation now adopts all containers into warm pool unconditionally,
letting the idle checker decide cleanup. Avoids destroying containers
that another concurrent process may still be using.
- list_running() logs stderr on docker ps failure and catches
FileNotFoundError/OSError.
- Signal handler test restores SIGTERM/SIGINT in addition to SIGHUP.
- E2E test docstring corrected to match actual coverage scope.
* fix(sandbox): address maintainer review — batch inspect, lock tightening, import hygiene
- _reconcile_orphans(): merge check-and-insert into a single lock acquisition
per container to eliminate the TOCTOU window.
- list_running(): batch the per-container docker inspect into a single call.
Total subprocess calls drop from 2N+1 to 2 (one ps + one batch inspect).
Parse port and created_at from the inspect JSON payload.
- Extract _parse_docker_timestamp() and _extract_host_port() as module-level
pure helpers and test them directly.
- Move datetime/json imports to module top level.
- _make_provider_for_reconciliation(): document the __new__ bypass and the
lockstep coupling to AioSandboxProvider.__init__.
- Add assertion that list_running() makes exactly ONE inspect call.
When a model config includes `reasoning_effort` as an extra YAML field
(ModelConfig uses `extra="allow"`), and the thinking-disabled code path
also injects `reasoning_effort="minimal"` into kwargs, the previous
`model_class(**kwargs, **model_settings_from_config)` call raises:
TypeError: got multiple values for keyword argument 'reasoning_effort'
Fix by merging the two dicts before instantiation, giving runtime kwargs
precedence over config values: `{**model_settings_from_config, **kwargs}`.
Fixes#1977
Co-authored-by: octo-patch <octo-patch@github.com>
* fix(middleware): handle string-serialized options in ClarificationMiddleware (#1995)
Some models (e.g. Qwen3-Max) serialize array tool parameters as JSON
strings instead of native arrays. Add defensive type checking in
_format_clarification_message() to deserialize string options before
iteration, preventing per-character rendering.
* fix(middleware): normalize options after JSON deserialization
Address Copilot review feedback:
- Add post-deserialization normalization so options is always a list
(handles json.loads returning a scalar string, dict, or None)
- Add test for JSON-encoded scalar string ("development")
- Fix test_json_string_with_mixed_types to use actual mixed types
* feat(community): add Exa search as community tool provider
Add Exa (exa.ai) as a new community search provider alongside Tavily,
Firecrawl, InfoQuest, and Jina AI. Exa is an AI-native search engine
with neural, keyword, and auto search types.
New files:
- community/exa/tools.py: web_search_tool and web_fetch_tool
- tests/test_exa_tools.py: 10 unit tests with mocked Exa client
Changes:
- pyproject.toml: add exa-py dependency
- config.example.yaml: add commented-out Exa configuration examples
Usage: set `use: deerflow.community.exa.tools:web_search_tool` in
config.yaml and provide EXA_API_KEY.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(community): address PR review comments for Exa tools
- Make _get_exa_client() accept tool_name param so web_fetch reads its own config
- Remove __init__.py to match namespace package pattern of other providers
- Add duplicate tool name warning in config.example.yaml
- Add regression tests for web_fetch config resolution
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update revision in uv.lock to 3
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix(backend): use timezone-aware UTC in memory modules
Replace datetime.utcnow() with datetime.now(timezone.utc) and a shared
utc_now_iso_z() helper so persisted ISO timestamps keep the trailing Z
suffix without triggering Python 3.12+ deprecation warnings.
Made-with: Cursor
* refactor(backend): use removesuffix for utc_now_iso_z suffix
Makes the +00:00 -> Z transform explicit for the trailing offset only
(Copilot review on PR #1992).
Made-with: Cursor
* style(backend): satisfy ruff UP017 with datetime.UTC in memory queue
Made-with: Cursor
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* Fix event loop conflict in SubagentExecutor.execute()
When SubagentExecutor.execute() is called from within an already-running
event loop (e.g., when the parent agent uses async/await), calling
asyncio.run() creates a new event loop that conflicts with asyncio
primitives (like httpx.AsyncClient) that were created in and bound to
the parent loop.
This fix detects if we're already in a running event loop, and if so,
runs the subagent in a separate thread with its own isolated event loop
to avoid conflicts.
Fixes: sub-task cards not appearing in Ultra mode when using async parent agents
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(subagent): harden isolated event loop execution
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(backend): make loop detection hash tool calls by stable keys
The loop detection middleware previously hashed full tool call arguments,
which made repeated calls look different when only non-essential argument
details changed. In particular, `read_file` calls with nearby line ranges
could bypass repetition detection even when the agent was effectively
reading the same file region again and again.
- Hash tool calls using stable keys instead of the full raw args payload
- Bucket `read_file` line ranges so nearby reads map to the same region key
- Prefer stable identifiers such as `path`, `url`, `query`, or `command`
before falling back to JSON serialization of args
- Keep hashing order-independent so the same tool call set produces the
same hash regardless of call order
Fixes#1905
* fix(backend): harden loop detection hash normalization
- Normalize and parse stringified tool args defensively
- Expand stable key derivation to include pattern, glob, and cmd
- Normalize reversed read_file ranges before bucketing
Fixes#1905
* fix(backend): harden loop detection tool format
* exclude write_file and str_replace from the stable-key path — writing different content to the same file shouldn't be flagged.
---------
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* fix(subagents): add cooperative cancellation for subagent threads
Subagent tasks run inside ThreadPoolExecutor threads with their own
event loop (asyncio.run). When a user clicks stop, RunManager cancels
the parent asyncio.Task, but Future.cancel() cannot terminate a running
thread and asyncio.Event does not propagate across event loops. This
causes subagent threads to keep executing (writing files, calling LLMs)
even after the user explicitly stops the run.
Fix: add a threading.Event (cancel_event) to SubagentResult and check
it cooperatively in _aexecute()'s astream iteration loop. On cancel,
request_cancel_background_task() sets the event, and the thread exits
at the next iteration boundary.
Changes:
- executor.py: Add cancel_event field to SubagentResult, check it in
_aexecute loop, set it on timeout, add request_cancel_background_task
- task_tool.py: Call request_cancel_background_task on CancelledError
* fix(subagents): guard cancel status and add pre-check before astream
- Only overwrite status to FAILED when still RUNNING, preserving
TIMED_OUT set by the scheduler thread.
- Add cancel_event pre-check before entering the astream loop so
cancellation is detected immediately when already signalled.
* fix(subagents): guard status updates with lock to prevent race condition
Wrap the check-and-set on result.status in _aexecute with
_background_tasks_lock so the timeout handler in execute_async
cannot interleave between the read and write.
* fix(subagents): add dedicated CANCELLED status for user cancellation
Introduce SubagentStatus.CANCELLED to distinguish user-initiated
cancellation from actual execution failures. Update _aexecute,
task_tool polling, cleanup terminal-status sets, and test fixtures.
* test(subagents): add cancellation tests and fix timeout regression test
- Add dedicated TestCooperativeCancellation test class with 6 tests:
- Pre-set cancel_event prevents astream from starting
- Mid-stream cancel_event returns CANCELLED immediately
- request_cancel_background_task() sets cancel_event correctly
- request_cancel on nonexistent task is a no-op
- Real execute_async timeout does not overwrite CANCELLED (deterministic
threading.Event sync, no wall-clock sleeps)
- cleanup_background_task removes CANCELLED tasks
- Add task_tool cancellation coverage:
- test_cancellation_calls_request_cancel: assert CancelledError path
calls request_cancel_background_task(task_id)
- test_task_tool_returns_cancelled_message: assert CANCELLED polling
branch emits task_cancelled event and returns expected message
- Fix pre-existing test infrastructure issue: add deerflow.sandbox.security
to _MOCKED_MODULE_NAMES (fixes ModuleNotFoundError for all executor tests)
- Add RUNNING guard to timeout handler in executor.py to prevent
TIMED_OUT from overwriting CANCELLED status
- Add cooperative cancellation granularity comment documenting that
cancellation is only detected at astream iteration boundaries
---------
Co-authored-by: lulusiyuyu <lulusiyuyu@users.noreply.github.com>
* feat(feishu): add channel file materialization hook for inbound messages
- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).
* style(backend): format code with ruff for lint compliance
- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues
* fix(feishu): handle file write operation asynchronously to prevent blocking
* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code
* test(feishu): add tests for receive_file method and placeholder replacement
* fix(manager): remove unnecessary type casting for channel retrieval
* fix(feishu): update logging messages to reflect resource handling instead of image
* fix(feishu): sanitize filename by replacing invalid characters in file uploads
* fix(feishu): improve filename sanitization and reorder image key handling in message processing
* fix(feishu): add thread lock to prevent filename conflicts during file downloads
* fix(test): correct bad merge in test_feishu_parser.py
* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling
* fix(sandbox): add L2 input sanitisation to SandboxAuditMiddleware
Add _validate_input() to reject malformed bash commands before regex
classification: empty commands, oversized commands (>10 000 chars), and
null bytes that could cause detection/execution layer inconsistency.
* fix(sandbox): address Copilot review — type guard, log truncation, reject reason
- Coerce None/non-string command to str before validation
- Truncate oversized commands in audit logs to prevent log amplification
- Propagate reject_reason through _pre_process() to block message
- Remove L2 label from comments and test class names
* fix(sandbox): isinstance type guard + async input sanitisation tests
Address review comments:
- Replace str() coercion with isinstance(raw_command, str) guard so
non-string truthy values (0, [], False) fall back to empty string
instead of passing validation as "0"/"[]"/"False".
- Add TestInputSanitisationBlocksInAwrapToolCall with 4 async tests
covering empty, null-byte, oversized, and None command via
awrap_tool_call path.
support for vLLM 0.19.0 OpenAI-compatible chat endpoints and fixes the Qwen reasoning toggle so flash mode can actually disable thinking.
Co-authored-by: NmanQAQ <normangyao@qq.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
ls_tool was the only sandbox tool without output size limits, allowing
multi-MB results from large directories to blow up the model context
window. Add head-truncation (configurable via ls_output_max_chars,
default 20000) consistent with existing bash and read_file truncation.
Closes#1887
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(memory): case-insensitive fact deduplication and positive reinforcement detection
Two fixes to the memory system:
1. _fact_content_key() now lowercases content before comparison, preventing
semantically duplicate facts like "User prefers Python" and "user prefers
python" from being stored separately.
2. Adds detect_reinforcement() to MemoryMiddleware (closes#1719), mirroring
detect_correction(). When users signal approval ("yes exactly", "perfect",
"完全正确", etc.), the memory updater now receives reinforcement_detected=True
and injects a hint prompting the LLM to record confirmed preferences and
behaviors with high confidence.
Changes across the full signal path:
- memory_middleware.py: _REINFORCEMENT_PATTERNS + detect_reinforcement()
- queue.py: reinforcement_detected field in ConversationContext and add()
- updater.py: reinforcement_detected param in update_memory() and
update_memory_from_conversation(); builds reinforcement_hint alongside
the existing correction_hint
Tests: 11 new tests covering deduplication, hint injection, and signal
detection (Chinese + English patterns, window boundary, conflict with correction).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(memory): address Copilot review comments on reinforcement detection
- Tighten _REINFORCEMENT_PATTERNS: remove 很好, require punctuation/end-of-string boundaries on remaining patterns, split this-is-good into stricter variants
- Suppress reinforcement_detected when correction_detected is true to avoid mixed-signal noise
- Use casefold() instead of lower() for Unicode-aware fact deduplication
- Add missing test coverage for reinforcement_detected OR merge and forwarding in queue
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Rename BACKEND_TODO.md to TODO.md in documentation
* Update MCP Setup Guide link in CONTRIBUTING.md
* Update reference to config.yaml path in documentation
* Fix config file path in TITLE_GENERATION_IMPLEMENTATION.md
Updated the path to the example config file in the documentation.
* fix(docker): use multi-stage build to remove build-essential from runtime image
The build-essential toolchain (~200 MB) was only needed for compiling
native Python extensions during `uv sync` but remained in the final
image, increasing size and attack surface. Split the Dockerfile into
a builder stage (with build-essential) and a clean runtime stage that
copies only the compiled artifacts, Node.js, Docker CLI, and uv.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docker): add dev stage and pin docker:cli per review feedback
Address Copilot review comments:
- Add a `dev` build stage (FROM builder) that retains build-essential
so startup-time `uv sync` in dev containers can compile from source
- Update docker-compose-dev.yaml to use `target: dev` for gateway and
langgraph services
- Keep the clean runtime stage (no build-essential) as the default
final stage for production builds
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
sandbox_from_runtime() and ensure_sandbox_initialized() write
sandbox_id into runtime.context after acquiring a sandbox. When
lazy_init=True and no context is supplied to the graph run,
runtime.context is None (the LangGraph default), causing a TypeError
on the assignment.
Add `if runtime.context is not None` guards at all three write sites.
Reads already had equivalent guards (e.g. `runtime.context.get(...) if
runtime.context else None`); this brings writes into line.
Previously, the list endpoint always returned soul=null because
_agent_config_to_response() was called without include_soul=True.
This caused confusion since PUT /api/agents/{name} and GET /api/agents/{name}
both returned the soul content, but the list endpoint silently omitted it.
Co-authored-by: octo-patch <octo-patch@users.noreply.github.com>
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline
- Add _clean_bold_title() to merge adjacent bold spans (** **) produced
by pymupdf4llm when bold text crosses span boundaries
- Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>**
headings common in academic papers; excludes pure-number table headers
and rows with more than 4 bold blocks
- When outline is empty, read first 5 non-empty lines of the .md as a
content preview and surface a grep hint in the agent context
- Update _format_file_entry to render the preview + grep hint instead of
silently omitting the outline section
- Add 3 new extract_outline tests and 2 new middleware tests (65 total)
* fix(uploads): address Copilot review comments on extract_outline regex
- Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII
titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded
- Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to
keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input
- Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex)
- Log debug message with exc_info when preview extraction fails
* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix: inject longTermBackground into memory prompt
The format_memory_for_injection function only processed recentMonths and
earlierContext from the history section, silently dropping longTermBackground.
The LLM writes longTermBackground correctly and it persists to memory.json,
but it was never injected into the system prompt — making the user's
long-term background invisible to the AI.
Add the missing field handling and a regression test.
* fix(middleware): handle list-type AIMessage.content in LoopDetectionMiddleware
LangChain AIMessage.content can be str | list. When using providers that
return structured content blocks (e.g. Anthropic thinking mode, certain
OpenAI-compatible gateways), content is a list of dicts like
[{"type": "text", "text": "..."}].
The hard_limit branch in _apply() concatenated content with a string via
(last_msg.content or "") + f"\n\n{_HARD_STOP_MSG}", which raises
TypeError when content is a non-empty list (list + str is invalid).
Add _append_text() static method that:
- Returns the text directly when content is None
- Appends a {"type": "text"} block when content is a list
- Falls back to string concatenation when content is a str
This is consistent with how other modules in the project already handle
list content (client.py._extract_text, memory_middleware, executor.py).
* test(middleware): add unit tests for _append_text and list content hard stop
Add regression tests to verify LoopDetectionMiddleware handles list-type
AIMessage.content correctly during hard stop:
- TestAppendText: unit tests for the new _append_text() static method
covering None, str, list (including empty list) content types
- TestHardStopWithListContent: integration tests verifying hard stop
works correctly with list content (Anthropic thinking mode), None
content, and str content
Requested by reviewer in PR #1823.
* fix(middleware): improve _append_text robustness and test isolation
- Add explicit isinstance(content, str) check with fallback for
unexpected types (coerce to str) to prevent TypeError on edge cases
- Deep-copy list content in _make_state() test helper to prevent
shared mutable references across test iterations
- Add test_unexpected_type_coerced_to_str: verify fallback for
non-str/list/None content types
- Add test_list_content_not_mutated_in_place: verify _append_text
does not modify the original list
* style: fix ruff format whitespace in test file
---------
Co-authored-by: ppyt <14163465+ppyt@users.noreply.github.com>
* feat(uploads): add pymupdf4llm PDF converter with auto-fallback and async offload
- Introduce pymupdf4llm as an optional PDF converter with better heading
detection and table preservation than MarkItDown
- Auto mode: prefer pymupdf4llm when installed; fall back to MarkItDown
when output is suspiciously sparse (image-based / scanned PDFs)
- Sparsity check uses chars-per-page (< 50 chars/page) rather than an
absolute threshold, correctly handling both short and long documents
- Large files (> 1 MB) are offloaded to asyncio.to_thread() to avoid
blocking the event loop (related: #1569)
- Add UploadsConfig with pdf_converter field (auto/pymupdf4llm/markitdown)
- Add pymupdf4llm as optional dependency: pip install deerflow-harness[pymupdf]
- Add 14 unit tests covering sparsity heuristic, routing logic, and async path
* fix(uploads): address Copilot review comments on PDF converter
- Fix docstring: MIN_CHARS_PYMUPDF -> _MIN_CHARS_PER_PAGE (typo)
- Fix file handle leak: wrap pymupdf.open in try/finally to ensure doc.close()
- Fix silent fallback gap: _convert_pdf_with_pymupdf4llm now catches all
conversion exceptions (not just ImportError), so encrypted/corrupt PDFs
fall back to MarkItDown instead of propagating
- Tighten type: pdf_converter field changed from str to Literal[auto|pymupdf4llm|markitdown]
- Normalize config value: _get_pdf_converter() strips and lowercases the raw
config string, warns and falls back to 'auto' on unknown values