1966 Commits

Author SHA1 Message Date
rayhpeng
d8ecaf46c9 feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)
* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold

Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.

New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation

Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add RunEventStore ABC + MemoryRunEventStore

Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints

Phase 2-B: run persistence + event storage + token tracking.

- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
  accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add user feedback + follow-up run association

Phase 2-C: feedback and follow-up tracking.

- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
  from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config

- config.example.yaml: deprecate standalone checkpointer section, activate
  unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
  including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
  list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
  middleware no ai_message, unknown caller tokens, convenience fields,
  tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
  make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
  follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
  for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* config: move default sqlite_dir to .deer-flow/data

Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()

- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
  models. Add json_serializer=json.dumps(ensure_ascii=False) to all
  create_async_engine calls so non-ASCII text (Chinese etc.) is stored
  as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
  remove UTC imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): simplify deps.py with getter factory + inline repos

- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
  get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(docker): add UV_EXTRAS build arg for optional dependencies

Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
  UV_EXTRAS=postgres docker compose build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(journal): fix flush, token tracking, and consolidate tests

RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
  dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
  ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
  in finally block.

Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
  custom api_base, so usage_metadata is populated in streaming responses.

Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): widen content type to str|dict in all store backends

Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(events): use metadata flag instead of heuristic for dict content detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(converters): add LangChain-to-OpenAI message format converters

Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(converters): handle empty list content as null, clean up test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): human_message content uses OpenAI user message format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): ai_message uses OpenAI format, add ai_tool_call message event

- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add tool_result message event with OpenAI tool message format

Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): summary content uses OpenAI system message format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format

Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add record_middleware method for middleware trace events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(events): add full run sequence integration test for OpenAI content format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): align message events with checkpoint format and add middleware tag injection

- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
  BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): switch search endpoint to threads_meta table and sync title

- POST /api/threads/search now queries threads_meta table directly,
  removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): history endpoint reads messages from event store

- POST /api/threads/{thread_id}/history now combines two data sources:
  checkpointer for checkpoint_id, metadata, title, thread_data;
  event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate optional-dependencies header in pyproject.toml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(middleware): pass tagged config to TitleMiddleware ainvoke call

Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve merge conflict in .env.example

Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address review feedback on PR #1851

- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
  and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update uv.lock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality

Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search

Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply ruff format to persistence and runtime files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding 'Statement has no effect'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat

Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): move sanitize_log_param to app/gateway/utils.py

Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use SQL aggregation for feedback stats and thread token usage

Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: annotate DbRunEventStore.put() as low-frequency path

Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable

When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata

Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.

deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(history): read messages from checkpointer instead of RunEventStore

The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address new Copilot review comments

- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Implement skill self-evolution and skill_manage flow (#1874)

* chore: ignore .worktrees directory

* Add skill_manage self-evolution flow

* Fix CI regressions for skill_manage

* Address PR review feedback for skill evolution

* fix(skill-evolution): preserve history on delete

* fix(skill-evolution): tighten scanner fallbacks

* docs: add skill_manage e2e evidence screenshot

* fix(skill-manage): avoid blocking fs ops in session runtime

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>

* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir

resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Feature/feishu receive file (#1608)

* feat(feishu): add channel file materialization hook for inbound messages

- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).

* style(backend): format code with ruff for lint compliance

- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues

* fix(feishu): handle file write operation asynchronously to prevent blocking

* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code

* test(feishu): add tests for receive_file method and placeholder replacement

* fix(manager): remove unnecessary type casting for channel retrieval

* fix(feishu): update logging messages to reflect resource handling instead of image

* fix(feishu): sanitize filename by replacing invalid characters in file uploads

* fix(feishu): improve filename sanitization and reorder image key handling in message processing

* fix(feishu): add thread lock to prevent filename conflicts during file downloads

* fix(test): correct bad merge in test_feishu_parser.py

* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling

* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)

Two production docker-compose.yaml bugs prevent `make up` from working:

1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
   environment overrides. Added in fb2d99f (#1836) but accidentally reverted
   by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
   env_file, causing FileNotFoundError inside the container.

2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
   Empty $${allow_blocking} inserts a bare space between flags, causing
   ' --no-reload' to be parsed as unexpected extra argument. Fix by building
   args string first and conditionally appending --allow-blocking.

Co-authored-by: cooper <cooperfu@tencent.com>

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities

Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(frontend): address Copilot review on tabnabbing and double-tab-open

Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(persistence): organize entities into per-entity directories

Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.

Layout:
  persistence/thread_meta/{base,model,sql,memory}.py
  persistence/run/{model,sql}.py
  persistence/feedback/{model,sql}.py

models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.

The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:

    from deerflow.persistence.thread_meta import ThreadMetaRepository
    from deerflow.persistence.run import RunRepository
    from deerflow.persistence.feedback import FeedbackRepository

Full test suite passes (1690 passed, 14 skipped).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gateway): sync thread rename and delete through ThreadMetaStore

The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).

Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.

Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.

Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): route all thread metadata access through ThreadMetaStore

Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.

Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
  instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
  for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
  already covers it.

Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
  thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
  the title via thread_meta_repo.update_display_name() after each run.

Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).

New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
  the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
  read-modify-write inside one session) and MemoryThreadMetaStore. Three new
  unit tests cover merge / empty / nonexistent behaviour.

Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
2026-04-26 11:05:47 +08:00
Willem Jiang
9dc25987e0
fix(channles):update the logger for the channel config (#2524)
* fix(channles):update the logger for the channel config

* fix(channels): normalize credential values and add tests for disabled-but-configured warning

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/dfc0a566-aa59-49f9-a74d-610292fb0a63

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* fix the backend lint error

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-26 10:09:55 +08:00
Willem Jiang
8a044142cb
feat(dev): add pre-commit hooks for ruff, eslint, and prettier (#2525)
* feat(dev): add pre-commit hooks for ruff, eslint, and prettier

* fix: use local uv-based ruff hooks and uv run for pre-commit install

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/a1e34cc5-0d4b-4400-9e6a-e687d964ff1e

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-04-26 09:40:17 +08:00
ming1523
410f0c48b5
fix(channels): accept single slack allowed user (#2481)
* fix(channels): accept single slack allowed user

* docs: address Slack allowed_users review notes

* ci: rerun backend unit tests

* docs: clarify Slack allowed_users config

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-25 19:40:52 +08:00
Octopus
1f59e945af
fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (#2449)
* fix: cap prompt caching breakpoints at 4 to prevent API 400 errors (fixes #2448)

The previous _apply_prompt_caching() attached cache_control to every text
block in the system prompt, every content block in the last N messages, and
the last tool definition. In multi-turn conversations with structured content
blocks this easily exceeded the 4-breakpoint hard limit enforced by both the
Anthropic API and AWS Bedrock, producing a 400 Bad Request (or a silent
"No generations found in stream" when streaming).

Fix: collect all candidate blocks in document order, then apply cache_control
only to the last MAX_CACHE_BREAKPOINTS (4) of them. Later breakpoints cover a
larger prefix and therefore yield better cache hit rates, making this the
optimal placement strategy as well as the safe one.

Adds 13 unit tests covering the budget cap, edge cases, and correct
last-candidate placement.

* docs: clarify _apply_prompt_caching docstring includes tool definitions

Per Copilot review: the implementation also caches the last tool definition
(see the candidates list at lines 202-205), so the docstring summary should
explicitly mention tools alongside system and recent messages.

* Fix the lint error

* style: fix ruff format check for test_claude_provider_prompt_caching.py

Add the missing blank line before the 'Edge cases' section comment so
that ruff format --check passes in CI.

---------

Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-25 19:40:06 +08:00
IECspace
f394c0d8c8
feat(mcp): support custom tool interceptors via extensions_config.json (#2451)
* feat(mcp): support custom tool interceptors via extensions_config.json

Add a generic extension point for registering custom MCP tool
interceptors through `extensions_config.json`. This allows downstream
projects to inject per-request header manipulation, auth context
propagation, or other cross-cutting concerns without modifying
DeerFlow source code.

Interceptors are declared as Python callable paths in a new
`mcpInterceptors` array field and loaded via the existing
`resolve_variable` reflection mechanism:

```json
{
  "mcpInterceptors": [
    "my_package.mcp.auth:build_auth_interceptor"
  ]
}
```

Each entry must resolve to a no-arg builder function that returns an
async interceptor compatible with `MultiServerMCPClient`'s
`tool_interceptors` interface.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(mcp): add unit tests for custom tool interceptors

Cover all branches of the mcpInterceptors loading logic:

- valid interceptor loaded and appended to tool_interceptors
- multiple interceptors loaded in declaration order
- builder returning None is skipped
- resolve_variable ImportError logged and skipped
- builder raising exception logged and skipped
- absent mcpInterceptors field is safe (no-op)
- custom interceptors coexist with OAuth interceptor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix(mcp): validate mcpInterceptors type and fix lint warnings

Address review feedback:

1. Validate mcpInterceptors config value before iterating:
   - Accept a single string and normalize to [string]
   - Ignore None silently
   - Log warning and skip for non-list/non-string types

2. Fix ruff F841 lint errors in tests:
   - Rename _make_mock_env to _make_patches, embed mock_client
   - Remove unused `as mock_cls` bindings where not needed
   - Extract _get_interceptors() helper to reduce repetition

3. Add two new test cases for type validation:
   - test_mcp_interceptors_single_string_is_normalized
   - test_mcp_interceptors_invalid_type_logs_warning

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(mcp): validate interceptor return type and fix import mock path

Address review feedback:

1. Validate builder return type with callable() check:
   - callable interceptor → append to tool_interceptors
   - None → silently skip (builder opted out)
   - non-callable → log warning with type name and skip

2. Fix test mock path: resolve_variable is a top-level import in
   tools.py, so mock deerflow.mcp.tools.resolve_variable instead of
   deerflow.reflection.resolve_variable to correctly intercept calls.

3. Add test_custom_interceptor_non_callable_return_logs_warning to
   cover the new non-callable validation branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(mcp): add mcpInterceptors example and documentation

- Add mcpInterceptors field to extensions_config.example.json
- Add "Custom Tool Interceptors" section to MCP_SERVER.md with
  configuration format, example interceptor code, and edge case
  behavior notes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: IECspace <IECspace@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-25 09:18:13 +08:00
orbisai0security
950821cb9b
fix: use subprocess instead of os.system in local_backend.py (#2494)
* fix: use subprocess instead of os.system in local_backend.py

The sandbox backend and skill evaluation scripts use subprocess

* fixing the failing test

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-25 08:59:31 +08:00
pyp0327
2bb1a2dfa2
feat(models): Provider for MindIE model engine (#2483)
* feat(models): 适配 MindIE引擎的模型

* test: add unit tests for MindIEChatModel adapter and fix PR review comments

* chore: update uv.lock with pytest-asyncio

* build: add pytest-asyncio to test dependencies

* fix: address PR review comments (lazy import, cache clients, safe newline escape, strict xml regex)

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-25 08:59:03 +08:00
DanielWalnut
b970993425
fix: read lead agent options from context (#2515)
* fix: read lead agent options from context

* fix: validate runtime context config
2026-04-24 22:46:51 +08:00
DanielWalnut
ec8a8cae38
fix: gate deferred MCP tool execution (#2513)
* fix: gate deferred MCP tool execution

* style: format deferred tool middleware

* fix: address deferred tool review feedback
2026-04-24 22:45:41 +08:00
DanielWalnut
d78ed5c8f2
fix: inherit subagent skill allowlists (#2514) 2026-04-24 21:24:42 +08:00
Nan Gao
f9ff3a698d
fix(middleware): avoid rescuing non-skill tool outputs during summarization (#2458)
* fix(middelware): narrow skill rescue to skill-related tool outputs

* fix(summarization): address skill rescue review feedback

* fix: wire summarization skill rescue config

* fix: remove dead skill tool helper

* fix(lint): fix format

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-24 21:19:46 +08:00
Admire
c2332bb790
fix memory settings layout overflow (#2420)
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-24 20:29:55 +08:00
He Wang
3a61126824
fix: keep debug.py interactive terminal free from background log noise (#2466)
* fix(debug): keep terminal clean by redirecting all logs to file

- Redirect all logs to debug.log file to prevent background task logs
  from interfering with interactive terminal prompts
- Honor AppConfig.log_level setting instead of hard-coding to INFO
- Make logging setup idempotent by clearing pre-existing handlers
- Defer deerflow imports until after logging is configured to ensure
  import-time side effects are captured in debug.log
- Display active log level in startup banner
- Add prompt_toolkit installation tip for enhanced readline support

Made-with: Cursor

* attaching the file handler before importing/calling get_app_config()

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-24 17:09:41 +08:00
Airene Fang
11f557a2c6
feat(trace):Add run_name to the trace info for system agents. (#2492)
* feat(trace): Add `run_name` to the trace info for suggestions and memory.

before(in langsmith):
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
memory_agent
lead_agent

feat(trace): Add `run_name` to the trace info for suggestions and memory.

before(in langsmith):
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
memory_agent
lead_agent

* feat(trace): Add `run_name` to the trace info for system agents.

before(in langsmith):
CodexChatModel
CodexChatModel
CodexChatModel
CodexChatModel
lead_agent
after:
suggest_agent
title_agent
security_agent
memory_agent
lead_agent

* chore(code format):code format

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-24 17:06:55 +08:00
d 🔹
e8572b9d0c
fix(jina): log transient failures at WARNING without traceback (#2484) (#2485)
The exception handler in JinaClient.crawl used logger.exception, which
emits an ERROR-level record with the full httpx/httpcore/anyio traceback
for every transient network failure (timeout, connection refused). Other
search/crawl providers in the project log the same class of recoverable
failures as a single line. One offline/slow-network session could produce
dozens of multi-frame ERROR stack traces, drowning out real problems.

Switch to logger.warning with a concise message that includes the
exception type and its str, matching the style used elsewhere for
recoverable transient failures (aio_sandbox, ddg, etc.). The exception
type now also surfaces into the returned "Error: ..." string so callers
retain diagnostic signal.

Adds a regression test that asserts the log record is WARNING, carries
no exc_info, and includes the exception class name.

Co-authored-by: voidborne-d <voidborne-d@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-24 16:00:14 +08:00
Willem Jiang
80a7446fd6 fix(backend): fix the unit test error in backend 2026-04-24 14:56:03 +08:00
Willem Jiang
cd12821134 fix(backend): Updated the uv.lock with new added dependency 2026-04-24 14:55:13 +08:00
Xinmin Zeng
30d619de08
feat(subagents): support per-subagent skill loading and custom subagent types (#2253)
* feat(subagents): support per-subagent skill loading and custom subagent types (#2230)

Add per-subagent skill configuration and custom subagent type registration,
aligned with Codex's role-based config layering and per-session skill injection.

Backend:
- SubagentConfig gains `skills` field (None=all, []=none, list=whitelist)
- New CustomSubagentConfig for user-defined subagent types in config.yaml
- SubagentsAppConfig gains `custom_agents` section and `get_skills_for()`
- Registry resolves custom agents with three-layer config precedence
- SubagentExecutor loads skills per-session as conversation items (Codex pattern)
- task_tool no longer appends skills to system_prompt
- Lead agent system prompt dynamically lists all registered subagent types
- setup_agent tool accepts optional skills parameter
- Gateway agents API transparently passes skills in CRUD operations

Frontend:
- Agent/CreateAgentRequest/UpdateAgentRequest types include skills field
- Agent card displays skills as badges alongside tool_groups

Config:
- config.example.yaml documents custom_agents and per-agent skills override

Tests:
- 40 new tests covering all skill config, custom agents, and registry logic
- Existing tests updated for new get_skills_prompt_section signature

Closes #2230

* fix: address review feedback on skills PR

- Remove stale get_skills_prompt_section monkeypatches from test_task_tool_core_logic.py
  (task_tool no longer imports this function after skill injection moved to executor)
- Add key prefixes (tg:/sk:) to agent-card badges to prevent React key collisions
  between tool_groups and skills

* fix(ci): resolve lint and test failures

- Format agent-card.tsx with prettier (lint-frontend)
- Remove stale "Skills Appendix" system_prompt assertion — skills are now
  loaded per-session by SubagentExecutor, not appended to system_prompt

* fix(ci): sort imports in test_subagent_skills_config.py (ruff I001)

* fix(ci): use nullish coalescing in agent-card badge condition (eslint)

* fix: address review feedback on skills PR

- Use model_fields_set in AgentUpdateRequest to distinguish "field omitted"
  from "explicitly set to null" — fixes skills=None ambiguity where None
  means "inherit all" but was treated as "don't change"
- Move lazy import of get_subagent_config outside loop in
  _build_available_subagents_description to avoid repeated import overhead

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-23 23:59:47 +08:00
JerryChaox
4e72410154
fix(gateway): bound lifespan shutdown hooks to prevent worker hang under uvicorn reload (#2331)
* fix(gateway): bound lifespan shutdown hooks to prevent worker hang

Gateway worker can hang indefinitely in `uvicorn --reload` mode with
the listening socket still bound — all /api/* requests return 504,
and SIGKILL is the only recovery.

Root cause (py-spy dump from a reproduction showed 16+ stacked frames
of signal_handler -> Event.set -> threading.Lock.__enter__ on the
main thread): CPython's `threading.Event` uses `Condition(Lock())`
where the inner Lock is non-reentrant. uvicorn's BaseReload signal
handler calls `should_exit.set()` directly from signal context; if a
second signal (SIGTERM/SIGHUP from the reload supervisor, or
watchfiles-triggered reload) arrives while the first handler holds
the Lock, the reentrant call deadlocks on itself.

The reload supervisor keeps sending those signals only when the
worker fails to exit promptly. DeerFlow's lifespan currently awaits
`stop_channel_service()` with no timeout; if a channel's `stop()`
stalls (e.g. Feishu/Slack WebSocket waiting for an ack), the worker
can't exit, the supervisor keeps signaling, and the deadlock becomes
reachable.

This is a defense-in-depth fix — it does not repair the upstream
uvicorn/CPython issue, but it ensures DeerFlow's lifespan exits
within a bounded window so the supervisor has no reason to keep
firing signals. No behavior change on the happy path.

Wraps the shutdown hook in `asyncio.wait_for(timeout=5.0)` and logs
a warning on timeout before proceeding to worker exit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Update backend/app/gateway/app.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* style: apply make format (ruff) to test assertions

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-23 19:41:26 +08:00
He Wang
c42ae3af79
feat: add optional prompt-toolkit support to debug.py (#2461)
* feat: add optional prompt-toolkit support to debug.py

Use PromptSession.prompt_async() for arrow-key navigation and input
history when prompt-toolkit is available, falling back to plain input()
with a helpful install tip otherwise.

Made-with: Cursor

* fix: handle EOFError gracefully in debug.py

Catch EOFError alongside KeyboardInterrupt so that Ctrl-D exits
cleanly instead of printing a traceback.

Made-with: Cursor
2026-04-23 17:49:18 +08:00
dependabot[bot]
bd35cd39aa
chore(deps): bump uuid from 13.0.0 to 14.0.0 in /frontend (#2467)
Bumps [uuid](https://github.com/uuidjs/uuid) from 13.0.0 to 14.0.0.
- [Release notes](https://github.com/uuidjs/uuid/releases)
- [Changelog](https://github.com/uuidjs/uuid/blob/main/CHANGELOG.md)
- [Commits](https://github.com/uuidjs/uuid/compare/v13.0.0...v14.0.0)

---
updated-dependencies:
- dependency-name: uuid
  dependency-version: 14.0.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-23 14:47:15 +08:00
d 🔹
b90f219bd1
fix(skills): validate bundled SKILL.md front-matter in CI (fixes #2443) (#2457)
* fix(skills): validate bundled SKILL.md front-matter in CI (fixes #2443)

Adds a parametrized backend test that runs `_validate_skill_frontmatter`
against every bundled SKILL.md under `skills/public/`, so a broken
front-matter fails CI with a per-skill error message instead of
surfacing as a runtime gateway-load warning.

The new test caught two pre-existing breakages on `main` and fixes them:

* `bootstrap/SKILL.md`: the unquoted description had a second `:` mid-line
  ("Also trigger for updates: ..."), which YAML parses as a nested mapping
  ("mapping values are not allowed here"). Rewrites the description as a
  folded scalar (`>-`), which preserves the original wording (including the
  embedded colon, double quotes, and apostrophes) without further escaping.
  This complements PR #2436 (single-file colon→hyphen patch) with a more
  general convention that survives future edits.

* `chart-visualization/SKILL.md`: used `dependency:` which is not in
  `ALLOWED_FRONTMATTER_PROPERTIES`. Renamed to `compatibility:`, the
  documented field for "Required tools, dependencies" per skill-creator.
  No code reads `dependency` (verified by grep across backend/).

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Fix the lint error

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-23 14:06:14 +08:00
dependabot[bot]
96d00f6073
chore(deps): bump dompurify from 3.3.1 to 3.4.1 in /frontend (#2462)
Bumps [dompurify](https://github.com/cure53/DOMPurify) from 3.3.1 to 3.4.1.
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.1...3.4.1)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-23 12:18:59 +08:00
He Wang
c43c803f66
fix: remove mismatched context param in debug.py to suppress Pydantic warning (#2446)
* fix: remove mismatched context param in debug.py to suppress Pydantic warning

The ainvoke call passed context={"thread_id": ...} but the agent graph
has no context_schema (ContextT defaults to None), causing a
PydanticSerializationUnexpectedValue warning on every invocation.

Align with the production run_agent path by injecting context via
Runtime into configurable["__pregel_runtime"] instead.

Closes #2445

Made-with: Cursor

* refactor: derive runtime thread_id from config to avoid duplication

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Made-with: Cursor

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-23 09:56:57 +08:00
dependabot[bot]
dbd777fe62
chore(deps): bump python-dotenv from 1.2.1 to 1.2.2 in /backend (#2440)
Bumps [python-dotenv](https://github.com/theskumar/python-dotenv) from 1.2.1 to 1.2.2.
- [Release notes](https://github.com/theskumar/python-dotenv/releases)
- [Changelog](https://github.com/theskumar/python-dotenv/blob/main/CHANGELOG.md)
- [Commits](https://github.com/theskumar/python-dotenv/compare/v1.2.1...v1.2.2)

---
updated-dependencies:
- dependency-name: python-dotenv
  dependency-version: 1.2.2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-22 16:48:09 +08:00
dependabot[bot]
1ca2621285
chore(deps): bump lxml from 6.0.2 to 6.1.0 in /backend (#2427)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.2 to 6.1.0.
- [Release notes](https://github.com/lxml/lxml/releases)
- [Changelog](https://github.com/lxml/lxml/blob/master/CHANGES.txt)
- [Commits](https://github.com/lxml/lxml/compare/lxml-6.0.2...lxml-6.1.0)

---
updated-dependencies:
- dependency-name: lxml
  dependency-version: 6.1.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-22 16:14:11 +08:00
Shawn Jasper
5ba1dacf25
fix: rename present_file to present_files in docs and prompts (#2393)
The tool is registered as `present_files` (plural) in present_file_tool.py,
but four references in documentation and prompt strings incorrectly used the
singular form `present_file`. This could cause confusion and potentially
lead to incorrect tool invocations.

Changed files:
- backend/docs/GUARDRAILS.md
- backend/docs/ARCHITECTURE.md
- backend/packages/harness/deerflow/agents/lead_agent/prompt.py (2 occurrences)
2026-04-21 16:10:14 +08:00
Reuben Bowlby
085c13edc7
fix: remove unnecessary f-string prefixes and unused import (#2352)
- Remove f-string prefix on 7 strings with no placeholders (F541)
  in analyze.py, aggregate_benchmark.py, run_loop.py, generate_review.py
- Remove unused `os` import in quick_validate.py (F401)

Found by ruff via HUMMBL Arbiter (https://hummbl.io/audit).
2026-04-21 09:53:18 +08:00
Copilot
ef04174194
Fix invalid HTML nesting in reasoning trigger during complex task rendering (#2382)
* Initial plan

* fix(frontend): avoid invalid paragraph nesting in reasoning trigger

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/4c9eb0c2-ff29-4629-a61c-4e33d736d918

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

* test(frontend): strengthen reasoning trigger DOM nesting assertion

Agent-Logs-Url: https://github.com/bytedance/deer-flow/sessions/4c9eb0c2-ff29-4629-a61c-4e33d736d918

Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: WillemJiang <219644+WillemJiang@users.noreply.github.com>
2026-04-21 09:41:28 +08:00
Ansel
6dce26a52e
fix: resolve tool duplication and skill parser YAML inconsistencies (#1803) (#2107)
* Refactor tests for SKILL.md parser

Updated tests for SKILL.md parser to handle quoted names and descriptions correctly. Added new tests for parsing plain and single-quoted names, and ensured multi-line descriptions are processed properly.

* Implement tool name validation and deduplication

Add tool name mismatch warning and deduplication logic

* Refactor skill file parsing and error handling

* Add tests for tool name deduplication

Added tests for tool name deduplication in get_available_tools(). Ensured that duplicates are not returned, the first occurrence is kept, and warnings are logged for skipped duplicates.

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Update minimal config to include tools list

* Update test for nonexistent skill file

Ensure the test for nonexistent files checks for None.

* Refactor tool loading and add skill management support

Refactor tool loading logic to include skill management tools based on configuration and clean up comments.

* Enhance code comments for tool loading logic

Added comments to clarify the purpose of various code sections related to tool loading and configuration.

* Fix assertion for duplicate tool name warning

* Fix indentation issues in tools.py

* Fix the lint error of test_tool_deduplication

* Fix the lint error of tools.py

* Fix the lint error

* Fix the lint error

* make format

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-04-20 20:25:03 +08:00
imhaoran
fc94e90f6c
fix(setup-agent): prevent data loss when setup fails on existing agen… (#2254)
* fix(setup-agent): prevent data loss when setup fails on existing agent directory

Record whether the agent directory pre-existed before mkdir, and only
run shutil.rmtree cleanup when the directory was newly created during
this call. Previously, any failure would delete the entire directory
including pre-existing SOUL.md and config.yaml.

* fix: address PR review — init variables before try, remove unused result

* style: fix ruff I001 import block formatting in test file

* style: add missing blank lines between top-level definitions in test file
2026-04-20 20:17:30 +08:00
Eilen Shin
f2013f47aa
fix command palette hydration mismatch (#2301)
* fix command palette hydration mismatch

* style: format command dialog description
2026-04-20 11:36:16 +08:00
KiteEater
4be857f64b
fix: use Apple Container image pull syntax (#2366) 2026-04-20 08:00:05 +08:00
Admire
c99865f53d
fix(token-usage): enable stream usage for openai-compatible models (#2217)
* fix(token-usage): enable stream usage for openai-compatible models

* fix(token-usage): narrow stream_usage default to ChatOpenAI
2026-04-19 22:42:55 +08:00
YYMa
05f1da03e5
fix(script): use portable locale for langgraph log pipeline on macOS (#2361) 2026-04-19 22:41:00 +08:00
Xun
a62ca5dd47
fix: Catch httpx.ReadError in the error handling (#2309)
* fix: Catch httpx.ReadError in the error handling

* fix
2026-04-19 22:30:22 +08:00
Nan Gao
f514e35a36
fix(backend): make clarification messages idempotent (#2350) (#2351) 2026-04-19 22:00:58 +08:00
Xun
7c87dc5bca
fix(reasoning): prevent LLM-hallucinated HTML tags from rendering as DOM elements (#2321)
* fix

* add test

* fix
2026-04-19 19:27:34 +08:00
Hinotobi
80e210f5bb
[security] fix(uploads): require explicit opt-in for host-side document conversion (#2332)
* fix: disable host-side upload conversion by default

* fix: address PR review comments on upload conversion gate
2026-04-18 22:47:42 +08:00
dependabot[bot]
5656f90792
chore(deps-dev): bump pytest from 9.0.2 to 9.0.3 in /backend (#2349)
Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.2 to 9.0.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-18 22:22:40 +08:00
Shawn Jasper
55474011c9
fix(subagent): inherit parent agent's tool_groups in task_tool (#2305)
* fix(subagent): inherit parent agent's tool_groups in task_tool

When a custom agent defines tool_groups (e.g. [file:read, file:write, bash]),
the restriction is correctly applied to the lead agent. However, when the lead
agent delegates work to a subagent via the task tool, get_available_tools() is
called without the groups parameter, causing the subagent to receive ALL tools
(including web_search, web_fetch, image_search, etc.) regardless of the parent
agent's configuration.

This fix propagates tool_groups through run metadata so that task_tool passes
the same group filter when building the subagent's tool set.

Changes:
- agent.py: include tool_groups in run metadata
- task_tool.py: read tool_groups from metadata and pass to get_available_tools()

* fix: initialize metadata before conditional block and update tests for tool_groups propagation

- Initialize metadata = {} before the 'if runtime is not None' block to
  avoid Ruff F821 (possibly-undefined variable) and simplify the
  parent_tool_groups expression.
- Update existing test assertion to expect groups=None in
  get_available_tools call signature.
- Add 3 new test cases:
  - test_task_tool_propagates_tool_groups_to_subagent
  - test_task_tool_no_tool_groups_passes_none
  - test_task_tool_runtime_none_passes_groups_none
2026-04-18 22:17:37 +08:00
imhaoran
24fe5fbd8c
fix(mcp): prevent RuntimeError from escaping except block in get_cach… (#2252)
* fix(mcp): prevent RuntimeError from escaping except block in get_cached_mcp_tools

When `asyncio.get_event_loop()` raises RuntimeError and the fallback
`asyncio.run()` also fails, the exception escapes unhandled because
Python does not route exceptions raised inside an `except` block to
sibling `except` clauses. Wrap the fallback call in its own try/except
so failures are logged and the function returns [] as intended.

* fix: use logger.exception to preserve stack traces on MCP init failure
2026-04-18 21:07:30 +08:00
Willem Jiang
be4663505a chroe(script): disable the color log of langgraph 2026-04-18 20:03:05 +08:00
dependabot[bot]
aa6098e6a4
chore(deps): bump langsmith from 0.6.4 to 0.7.31 in /backend (#2291)
Bumps [langsmith](https://github.com/langchain-ai/langsmith-sdk) from 0.6.4 to 0.7.31.
- [Release notes](https://github.com/langchain-ai/langsmith-sdk/releases)
- [Commits](https://github.com/langchain-ai/langsmith-sdk/compare/v0.6.4...v0.7.31)

---
updated-dependencies:
- dependency-name: langsmith
  dependency-version: 0.7.31
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-18 19:54:21 +08:00
Airene Fang
1221448029
fix(scripts): Cloud Provider Reports Security Issue(aliyun could) (#2323)
ATT&CK矩阵ID:T1059.004
数据来源:进程启动触发检测
告警原因:该进程的命令行显示出反弹shelI的特征
命令行:timeout 1 bash -c exec 3<>/dev/tcp/127.0.0.1/2024
进程路径:/usr/bin/timeout
进程链:-[337650] /usr/sbin/sshd -D
-[397971] /usr/sbin/sshd -D -R
-[397977]-bash
-[398903] make dev
-[398920] bash ./scripts/serve.sh --dev
-[399037]bash ./scripts/wait-for-port.sh 2024 60 LangGraph
2026-04-18 19:33:32 +08:00
Jason
3b91df2b18
fix(frontend): add catch-all API rewrite for gateway routes (#2335)
When NEXT_PUBLIC_BACKEND_BASE_URL is unset, the frontend proxies API
requests to the gateway. Only /api/agents and /api/skills had rewrite
rules, causing 404s for /api/models, /api/threads, /api/memory,
/api/mcp, /api/suggestions, /api/runs, etc.

Add a catch-all /api/:path* rewrite that proxies all remaining gateway
API routes. The existing /api/langgraph rewrite takes priority because
it is pushed to the array first (Next.js checks rewrites in order).

Fixes #2327

Co-authored-by: JasonOA888 <JasonOA888@users.noreply.github.com>
2026-04-18 11:35:19 +08:00
Shawn Jasper
ca1b7d5f48
fix(sandbox): add missing path masking in ls_tool output (#2317)
ls_tool was the only file-system tool that did not call
mask_local_paths_in_output() before returning its result, causing host
absolute paths (e.g. /Users/.../backend/.deer-flow/knowledge-base/...)
to leak to the LLM instead of the expected virtual paths
(/mnt/knowledge-base/...).

This patch:
- Adds the mask_local_paths_in_output() call to ls_tool, consistent
  with bash_tool, glob_tool and grep_tool.
- Initialises thread_data = None before the is_local_sandbox branch
  (same pattern as glob_tool) so the variable is always in scope.
- Adds three new tests covering user-data path masking, skills path
  masking and the empty-directory edge case.
2026-04-18 08:46:59 +08:00
yangzheli
c6b0423558
feat(frontend): add Playwright E2E tests with CI workflow (#2279)
* feat(frontend): add Playwright E2E tests with CI workflow

Add end-to-end testing infrastructure using Playwright (Chromium only).
14 tests across 5 spec files cover landing page, chat workspace,
thread history, sidebar navigation, and agent chat — all with mocked
LangGraph/Backend APIs via network interception (zero backend dependency).

New files:
- playwright.config.ts — Chromium, 30s timeout, auto-start Next.js
- tests/e2e/utils/mock-api.ts — shared API mocks & SSE stream helpers
- tests/e2e/{landing,chat,thread-history,sidebar,agent-chat}.spec.ts
- .github/workflows/e2e-tests.yml — push main + PR trigger, paths filter

Updated: package.json, Makefile, .gitignore, CONTRIBUTING.md,
frontend/CLAUDE.md, frontend/AGENTS.md, frontend/README.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: apply Copilot suggestions

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
2026-04-18 08:21:08 +08:00
DanielWalnut
898f4e8ac2
fix: Memory update system has cache corruption, data loss, and thread-safety bugs (#2251)
* fix(memory): cache corruption, thread-safety, and caller mutation bugs

Bug 1 (updater.py): deep-copy current_memory before passing to
_apply_updates() so a subsequent save() failure cannot leave a
partially-mutated object in the storage cache.

Bug 3 (storage.py): add _cache_lock (threading.Lock) to
FileMemoryStorage and acquire it around every read/write of
_memory_cache, fixing concurrent-access races between the background
timer thread and HTTP reload calls.

Bug 4 (storage.py): replace in-place mutation
  memory_data["lastUpdated"] = ...
with a shallow copy
  memory_data = {**memory_data, "lastUpdated": ...}
so save() no longer silently modifies the caller's dict.

Regression tests added for all three bugs in test_memory_storage.py
and test_memory_updater.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: format test_memory_updater.py with ruff

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: remove stale bug-number labels from code comments and docstrings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 12:00:31 +08:00