deer-flow/backend/tests/test_client.py
rayhpeng 7b9d224b3a
feat(persistence): per-user filesystem isolation, run-scoped APIs, and state/history simplification (#2153)
* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)

* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold

Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.

New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation

Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add RunEventStore ABC + MemoryRunEventStore

Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints

Phase 2-B: run persistence + event storage + token tracking.

- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
  accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add user feedback + follow-up run association

Phase 2-C: feedback and follow-up tracking.

- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
  from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config

- config.example.yaml: deprecate standalone checkpointer section, activate
  unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
  including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
  list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
  middleware no ai_message, unknown caller tokens, convenience fields,
  tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
  make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
  follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
  for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* config: move default sqlite_dir to .deer-flow/data

Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()

- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
  models. Add json_serializer=json.dumps(ensure_ascii=False) to all
  create_async_engine calls so non-ASCII text (Chinese etc.) is stored
  as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
  remove UTC imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): simplify deps.py with getter factory + inline repos

- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
  get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(docker): add UV_EXTRAS build arg for optional dependencies

Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
  UV_EXTRAS=postgres docker compose build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(journal): fix flush, token tracking, and consolidate tests

RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
  dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
  ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
  in finally block.

Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
  custom api_base, so usage_metadata is populated in streaming responses.

Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): widen content type to str|dict in all store backends

Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(events): use metadata flag instead of heuristic for dict content detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(converters): add LangChain-to-OpenAI message format converters

Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(converters): handle empty list content as null, clean up test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): human_message content uses OpenAI user message format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): ai_message uses OpenAI format, add ai_tool_call message event

- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add tool_result message event with OpenAI tool message format

Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): summary content uses OpenAI system message format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format

Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add record_middleware method for middleware trace events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(events): add full run sequence integration test for OpenAI content format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): align message events with checkpoint format and add middleware tag injection

- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
  BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): switch search endpoint to threads_meta table and sync title

- POST /api/threads/search now queries threads_meta table directly,
  removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): history endpoint reads messages from event store

- POST /api/threads/{thread_id}/history now combines two data sources:
  checkpointer for checkpoint_id, metadata, title, thread_data;
  event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate optional-dependencies header in pyproject.toml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(middleware): pass tagged config to TitleMiddleware ainvoke call

Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve merge conflict in .env.example

Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address review feedback on PR #1851

- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
  and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update uv.lock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality

Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search

Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply ruff format to persistence and runtime files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding 'Statement has no effect'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat

Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): move sanitize_log_param to app/gateway/utils.py

Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use SQL aggregation for feedback stats and thread token usage

Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: annotate DbRunEventStore.put() as low-frequency path

Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable

When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata

Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.

deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(history): read messages from checkpointer instead of RunEventStore

The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address new Copilot review comments

- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Implement skill self-evolution and skill_manage flow (#1874)

* chore: ignore .worktrees directory

* Add skill_manage self-evolution flow

* Fix CI regressions for skill_manage

* Address PR review feedback for skill evolution

* fix(skill-evolution): preserve history on delete

* fix(skill-evolution): tighten scanner fallbacks

* docs: add skill_manage e2e evidence screenshot

* fix(skill-manage): avoid blocking fs ops in session runtime

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>

* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir

resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Feature/feishu receive file (#1608)

* feat(feishu): add channel file materialization hook for inbound messages

- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).

* style(backend): format code with ruff for lint compliance

- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues

* fix(feishu): handle file write operation asynchronously to prevent blocking

* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code

* test(feishu): add tests for receive_file method and placeholder replacement

* fix(manager): remove unnecessary type casting for channel retrieval

* fix(feishu): update logging messages to reflect resource handling instead of image

* fix(feishu): sanitize filename by replacing invalid characters in file uploads

* fix(feishu): improve filename sanitization and reorder image key handling in message processing

* fix(feishu): add thread lock to prevent filename conflicts during file downloads

* fix(test): correct bad merge in test_feishu_parser.py

* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling

* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)

Two production docker-compose.yaml bugs prevent `make up` from working:

1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
   environment overrides. Added in fb2d99f (#1836) but accidentally reverted
   by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
   env_file, causing FileNotFoundError inside the container.

2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
   Empty $${allow_blocking} inserts a bare space between flags, causing
   ' --no-reload' to be parsed as unexpected extra argument. Fix by building
   args string first and conditionally appending --allow-blocking.

Co-authored-by: cooper <cooperfu@tencent.com>

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities

Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(frontend): address Copilot review on tabnabbing and double-tab-open

Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(persistence): organize entities into per-entity directories

Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.

Layout:
  persistence/thread_meta/{base,model,sql,memory}.py
  persistence/run/{model,sql}.py
  persistence/feedback/{model,sql}.py

models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.

The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:

    from deerflow.persistence.thread_meta import ThreadMetaRepository
    from deerflow.persistence.run import RunRepository
    from deerflow.persistence.feedback import FeedbackRepository

Full test suite passes (1690 passed, 14 skipped).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gateway): sync thread rename and delete through ThreadMetaStore

The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).

Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.

Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.

Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): route all thread metadata access through ThreadMetaStore

Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.

Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
  instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
  for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
  already covers it.

Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
  thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
  the title via thread_meta_repo.update_display_name() after each run.

Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).

New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
  the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
  read-modify-write inside one session) and MemoryThreadMetaStore. Three new
  unit tests cover merge / empty / nonexistent behaviour.

Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>

* feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)

* feat(auth): introduce backend auth module

Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)

Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index

Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).

Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.

* feat(auth): wire auth end-to-end (middleware + frontend replacement)

Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
  _ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
  register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
  get_optional_user_from_request; keep get_current_user as thin str|None
  adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
  (both metadata write and LangGraph filter dict) + test fixtures

Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
  BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
  proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
  getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic

Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
  test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS

After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.

* feat(auth): account settings page + i18n

- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
  rendered first in the section list
- Add i18n keys:
  - en-US/zh-CN: settings.sections.account ("Account" / "账号")
  - en-US/zh-CN: button.logout ("Log out" / "退出登录")
  - types.ts: matching type declarations

* feat(auth): enforce owner_id across 2.0-rc persistence layer

Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged
— isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.

Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
  - ContextVar[CurrentUser | None] with default None
  - runtime_checkable CurrentUser Protocol (structural subtype with .id)
  - set/reset/get/require helpers
  - AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
    three-state resolution: AUTO reads contextvar, explicit str
    overrides, explicit None bypasses the filter (for migration/CLI)

Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
  owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
  mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete
  gain owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
  count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
  kwarg. Write paths (put/put_batch) read contextvar softly: when a
  request-scoped user is available, owner_id is stamped; background
  worker writes without a user context pass None which is valid
  (orphan row to be bound by migration)

Schema
------
- persistence/models/run_event.py: RunEventRow.owner_id = Mapped[
  str | None] = mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
  picks up the new column automatically

Middleware
----------
- auth_middleware.py: after cookie check, call get_optional_user_from_
  request to load the real User, stamp it into request.state.user AND
  the contextvar via set_current_user, reset in a try/finally. Public
  paths and unauthenticated requests continue without contextvar, and
  @require_auth handles the strict 401 path

Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
  sets a default SimpleNamespace(id="test-user-autouse") on every test
  unless marked @pytest.mark.no_auto_user. Keeps existing 20+
  persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user
  marker so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
  Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=
  None explicitly where it was previously relying on the old default

Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo
  / test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
  integration tests unrelated to auth), 1 skipped

* feat(auth): extend orphan migration to 2.0-rc persistence tables

_ensure_admin_user now runs a three-step pipeline on every boot:

  Step 1 (fatal):     admin user exists / is created / password is reset
  Step 2 (non-fatal): LangGraph store orphan threads → admin
  Step 3 (non-fatal): SQL persistence tables → admin
    - threads_meta
    - runs
    - run_events
    - feedback

Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).

Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500):
  async generator that cursor-paginates across LangGraph store pages.
  Fixes PR #1728's hardcoded limit=1000 bug that would silently lose
  orphans beyond the first page.

- _migrate_orphaned_threads(store, admin_user_id):
  Rewritten to use _iter_store_items. Returns the migrated count so the
  caller can log it; raises only on unhandled exceptions.

- _migrate_orphan_sql_tables(admin_user_id):
  Imports the 4 ORM models lazily, grabs the shared session factory,
  runs one UPDATE per table in a single transaction, commits once.
  No-op when no persistence backend is configured (in-memory dev).

Tests: test_ensure_admin.py (8 passed)

* test(auth): port AUTH test plan docs + lint/format pass

- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
  (4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
  "str | None | _AutoSentinel" now that from __future__ import
  annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)

Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to
  be co-located with the repository changes to keep pre-existing
  persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred
  — enforcement is already proven by the 98-test repository suite
  via the autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed
  in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are
  manual-QA test cases rather than pytest code, and the spec-level
  coverage is already met by test_user_context.py + the 98-test
  repository suite.

Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
  test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
  test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean

* test(auth): add cross-user isolation test suite

10 tests exercising the storage-layer owner filter by manually
switching the user_context contextvar between two users. Verifies
the safety invariant:

  After a repository write with owner_id=A, a subsequent read with
  owner_id=B must not return the row, and vice versa.

Covers all 4 tables that own user-scoped data:

TC-API-17  threads_meta  — read, search, update, delete cross-user
TC-API-18  runs          — get, list_by_thread, delete cross-user
TC-API-19  run_events    — list_messages, list_events, count_messages,
                           delete_by_thread (CRITICAL: raw conversation
                           content leak vector)
TC-API-20  feedback      — get, list_by_run, delete cross-user

Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)

Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:

- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
  contextvar is set to different users

Together they prove the end-to-end safety property without the
ceremony of spinning up a full TestClient + in-memory DB for every
router endpoint.

Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean

* refactor(auth): migrate user repository to SQLAlchemy ORM

Move the users table into the shared persistence engine so auth
matches the pattern of threads_meta, runs, run_events, and feedback —
one engine, one session factory, one schema init codepath.

New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow
  ORM class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
  Base.metadata.create_all() picks it up

Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy,
  identical constructor pattern to the other four repositories
  (def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field — storage is configured
  through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with
  the shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to
  use the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
  spins up a scratch engine and resets deps._cached_* singletons

* refactor(auth): remove SQL orphan migration (unused in supported scenarios)

The _migrate_orphan_sql_tables helper existed to bind NULL owner_id
rows in threads_meta, runs, run_events, and feedback to the admin on
first boot. But in every supported upgrade path, it's a no-op:

  1. Fresh install: create_all builds fresh tables, no legacy rows
  2. No-auth → with-auth (no existing persistence DB): persistence
     tables are created fresh by create_all, no legacy rows
  3. No-auth → with-auth (has existing persistence DB from #1930):
     NOT a supported upgrade path — "有 DB 到有 DB" schema evolution
     is out of scope; users wipe DB or run manual ALTER

So the SQL orphan migration never has anything to do in the
supported matrix. Delete the function, simplify _ensure_admin_user
from a 3-step pipeline to a 2-step one (admin creation + LangGraph
store orphan migration only).

LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with
the newly-created admin's id.

Tests: 284 passed (auth + persistence + isolation)
Lint: clean

* security(auth): write initial admin password to 0600 file instead of logs

CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.

New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
  helper. Takes email, password, and a "initial"/"reset" label.
  Creates .deer-flow/ if missing, writes a header comment plus the
  email+password, chmods 0o600, returns the absolute Path.

Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation
  + needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM
  repo (SQLiteUserRepository with session_factory) and the
  credential_file helper. The previous implementation was broken
  after the earlier ORM refactor — it still imported _get_users_conn
  and constructed SQLiteUserRepository() without a session factory.

No tests changed — the three password-log sites are all exercised
via existing test_ensure_admin.py which checks that startup
succeeds, not that a specific string appears in logs.

CodeQL alerts 272, 283, 284: all resolved.

* security(auth): strict JWT validation in middleware (fix junk cookie bypass)

AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with
401. The previous middleware behaviour was "presence-only": check
that some access_token cookie exists, then pass through. In
combination with my Task-12 decision to skip @require_auth
decorators on routes, this created a gap where a request with any
cookie-shaped string (e.g. access_token=not-a-jwt) would bypass
authentication on routes that do not touch the repository
(/api/models, /api/mcp/config, /api/memory, /api/skills, …).

Fix: middleware now calls get_current_user_from_request() strictly
and catches the resulting HTTPException to render a 401 with the
proper fine-grained error code (token_invalid, token_expired,
user_not_found, …). On success it stamps request.state.user and
the contextvar so repository-layer owner filters work downstream.

The 4 old "_with_cookie_passes" tests in test_auth_middleware.py
were written for the presence-only behaviour; they asserted that
a junk cookie would make the handler return 200. They are renamed
to "_with_junk_cookie_rejected" and their assertions flipped to
401. The negative path (no cookie → 401 not_authenticated)
is unchanged.

Verified:
  no cookie       → 401 not_authenticated
  junk cookie     → 401 token_invalid     (the fixed bug)
  expired cookie  → 401 token_expired

Tests: 284 passed (auth + persistence + isolation)
Lint: clean

* security(auth): wire @require_permission(owner_check=True) on isolation routes

Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:

  Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
                            cookies and stamps request.state.user
  Layer 2 (@require_permission with owner_check=True): per-resource
                            ownership verification via
                            ThreadMetaStore.check_access — returns
                            404 if a different user owns the thread

The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the
LangGraph store path that PR #1728 used (_store_get / get_store
in routers/threads.py). The inject_record convenience is dropped
— no caller in 2.0 needs the LangGraph blob, and the SQL repo has
a different shape.

Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait,
  list_runs, get_run, cancel_run, join_run, stream_existing_run,
  list_thread_messages, list_run_messages, list_run_events,
  thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid
  conflict with FastAPI Request)

Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__
  (the unit tests cover parsing logic, not auth — no point spinning
  up a thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in
  the previous commit (745bf432)

Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean

* security(auth): defense-in-depth fixes from release validation pass

Eight findings caught while running the AUTH_TEST_PLAN end-to-end against
the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.

Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES
  whitelist (CIDR/IP allowlist). Without nginx in front, the previous code
  honored arbitrary X-Real-IP, letting an attacker rotate the header to
  fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
  field_validator on RegisterRequest + ChangePasswordRequest. The shared
  _validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
  server-reserved metadata keys (owner_id, user_id) via Pydantic
  field_validator so a forged value can never round-trip back to other
  clients reading the same thread. The actual ownership invariant stays
  on the threads_meta row; this closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a require_existing
  flag plumbed through check_access(require_existing=True). Destructive
  routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing
  thread_meta row as 404 instead of "untracked legacy thread, allow",
  closing the cross-user delete-idempotence gap where any user could
  successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
  on a vanished row instead of silently returning the input. Concurrent
  delete during password reset can no longer look like a successful update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
  str at the contextvar boundary so SQLAlchemy String(64) columns can
  bind it. The whole 2.0-rc isolation pipeline was previously broken
  end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL,
  synchronous=NORMAL, foreign_keys=ON on every new SQLite connection.
  TC-UPG-06 in the test plan expects WAL; previous code shipped with the
  default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
  @require_permission's short-circuit fires; previously every isolation
  request did a duplicate JWT decode + users SELECT. Also unifies the
  401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
  bug where 'password' was referenced outside the branch that defined it.
  New _announce_credentials helper absorbs the duplicate log block in
  the fresh-admin and reset-admin branches.

* fix(frontend+nginx): rollout CSRF on every state-changing client path

The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently
masked the next.

LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
  interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
  every state-changing /api/langgraph-compat/* call (runs/stream,
  threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and
  got 403 before reaching the auth check. UI symptom: empty thread page
  with no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
  csrf_token cookie per request. Reading the cookie per call (not at
  construction time) handles login / logout / password-change cookie
  rotation transparently. The SDK's prepareFetchOptions calls
  onRequest for both regular requests AND streaming/SSE/reconnect, so
  the same hook covers runs.stream and runs.joinStream.

Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
  account-settings change-password). The other 7 routed through raw
  fetch() with no header — suggestions, memory, agents, mcp, skills,
  uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
  POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
  Convert all 7 raw fetch() callers to fetchWithAuth so the contract
  is centrally enforced. api-client.ts and fetcher.ts share
  readCsrfCookie + STATE_CHANGING_METHODS to avoid drift.

nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
  explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs.
  The frontend's client-side fetches to /api/v1/auth/login/local 404'd
  from the Next.js side because nginx routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway.
  nginx longest-prefix matching keeps the explicit blocks (/api/models,
  /api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
  frontend `location /` block. Without it, nginx tries to spool large
  Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
  Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.

* test(auth): release-validation test infra and new coverage

Test fixtures and unit tests added during the validation pass.

Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
  middleware that stamps request.state.user + request.state.auth and a
  permissive thread_meta_repo mock. TestClient-based router tests
  (test_artifacts_router, test_threads_router) use it instead of bare
  FastAPI() so the new @require_permission(owner_check=True) decorators
  short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
  handler without going through the authz wrappers. Direct-call tests
  (test_uploads_router) use it. Typed with ParamSpec so the wrapped
  signature flows through.

Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no
  proxy / trusted proxy / untrusted peer / XFF rejection / invalid
  CIDR / no client). 5 tests for the password blocklist (literal,
  case-insensitive, strong password accepted, change-password binding,
  short-password length-check still fires before blocklist).
  test_update_user_raises_when_row_concurrently_deleted: closes a
  shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for check_access(require_existing=True)
  — strict missing-row denial, strict owner match, strict owner mismatch,
  strict null-owner still allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
  _iter_store_items pagination, covering the TC-UPG-02 upgrade story
  end-to-end via mock store. Closes the gap where the cursor pagination
  was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata
  (owner_id removal, user_id removal, safe-keys passthrough, empty
  input, both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
  Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist
  doesn't reject the test data.

* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap

- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation
  ("admin password visible in docker logs") was stale after the simplify
  pass that moved credentials to a 0600 file. The grep "Password:" check
  would have silently failed and given a false sense of coverage. New
  expectation matches the actual file-based path: 0600 file in
  DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
  asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed
  block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation
  host has no Docker daemon installed. The doc maps each Docker case
  to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
  TC-API-02 etc.) so the gap is auditable, and includes pre-flight
  reproduction steps for whoever has Docker available.

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>

* refactor(persistence): unify SQLite to single deerflow.db and move checkpointer to runtime

Merge checkpoints.db and app.db into a single deerflow.db file (WAL mode
handles concurrent access safely). Move checkpointer module from
agents/checkpointer to runtime/checkpointer to better reflect its role
as a runtime infrastructure concern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): rename owner_id to user_id and thread_meta_repo to thread_store

Rename owner_id to user_id across all persistence models, repositories,
stores, routers, and tests for clearer semantics. Rename thread_meta_repo
to thread_store for consistency with run_store/run_event_store naming.
Add ThreadMetaStore return type annotation to get_thread_store().

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): unify ThreadMetaStore interface with user isolation and factory

Add user_id parameter to all ThreadMetaStore abstract methods. Implement
owner isolation in MemoryThreadMetaStore with _get_owned_record helper.
Add check_access to base class and memory implementation. Add
make_thread_store factory to simplify deps.py initialization. Add
memory-backend isolation tests.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(feedback): add UNIQUE(thread_id, run_id, user_id) constraint

Add UNIQUE constraint to FeedbackRow to enforce one feedback per user per run,
enabling upsert behavior in Task 2. Update tests to use distinct user_ids for
multiple feedback records per run, and pass user_id=None to list_by_run for
admin-style queries that bypass user isolation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(feedback): add upsert() method with UNIQUE enforcement

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(feedback): add delete_by_run() and list_by_thread_grouped()

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(feedback): add PUT upsert and DELETE-by-run endpoints

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(feedback): enrich messages endpoint with per-run feedback data

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(feedback): add frontend feedback API client

Adds upsertFeedback and deleteFeedback API functions backed by
fetchWithAuth, targeting the /api/threads/{id}/runs/{id}/feedback
endpoint.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(feedback): wire feedback data into message rendering for history echo

Adds useThreadFeedback hook that fetches run-level feedback from the
messages API and builds a runId->FeedbackData map. MessageList now calls
this hook and passes feedback and runId to each MessageListItem so
previously-submitted thumbs are pre-filled when revisiting a thread.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(feedback): correct run_id mapping for feedback echo

The feedbackMap was keyed by run_id but looked up by LangGraph message ID.
Fixed by tracking AI message ordinal index to correlate event store
run_ids with LangGraph SDK messages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(feedback): use real threadId and refresh after stream

- Pass threadId prop to MessageListItem instead of reading "new" from URL params
- Invalidate thread-feedback query on stream finish so buttons appear immediately
- Show feedback buttons always visible, copy button on hover only

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(feedback): group copy and feedback buttons together on the left

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style(feedback): always show toolbar buttons without hover

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): stream hang when run_events.backend=db

DbRunEventStore._user_id_from_context() returned user.id without
coercing it to str. User.id is a Pydantic UUID, and aiosqlite cannot
bind a raw UUID object to a VARCHAR column, so the INSERT for the
initial human_message event silently rolled back and raised out of
the worker task. Because that put() sat outside the worker's try
block, the finally-clause that publishes end-of-stream never ran
and the SSE stream hung forever.

jsonl mode was unaffected because json.dumps(default=str) coerces
UUID objects transparently.

Fixes:
- db.py: coerce user.id to str at the context-read boundary (matches
  what resolve_user_id already does for the other repositories)
- worker.py: move RunJournal init + human_message put inside the try
  block so any failure flows through the finally/publish_end path
  instead of hanging the subscriber

Defense-in-depth:
- engine.py: add PRAGMA busy_timeout=5000 so checkpointer and event
  store wait for each other on the shared deerflow.db file instead
  of failing immediately under write-lock contention
- journal.py: skip fire-and-forget _flush_sync when a previous flush
  task is still in flight, to avoid piling up concurrent put_batch
  writes on the same SQLAlchemy engine during streaming; flush() now
  waits for pending tasks before draining the buffer
- database_config.py: doc-only update clarifying WAL + busy_timeout
  keep the unified deerflow.db safe for both workloads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(persistence): drop redundant busy_timeout PRAGMA

Python's sqlite3 driver defaults to a 5-second busy timeout via the
``timeout`` kwarg of ``sqlite3.connect``, and aiosqlite + SQLAlchemy's
aiosqlite dialect inherit that default. Setting ``PRAGMA busy_timeout=5000``
explicitly was a no-op — verified by reading back the PRAGMA on a fresh
connection (it already reports 5000ms without our PRAGMA).

Concurrent stress test (50 checkpoint writes + 20 event batches + 50
thread_meta updates on the same deerflow.db) still completes with zero
errors and 200/200 rows after removing the explicit PRAGMA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(journal): unwrap Command tool results in on_tool_end

Tools that update graph state (e.g. ``present_files``) return
``Command(update={'messages': [ToolMessage(...)], 'artifacts': [...]})``.
LangGraph later unwraps the inner ``ToolMessage`` into checkpoint state,
but ``RunJournal.on_tool_end`` was receiving the ``Command`` object
directly via the LangChain callback chain and storing
``str(Command(update={...}))`` as the tool_result content.

This produced a visible divergence between the event-store and the
checkpoint for any thread that used a Command-returning tool, blocking
the event-store-backed history fix in the follow-up commit. Concrete
example from thread ``6d30913e-dcd4-41c8-8941-f66c716cf359`` (seq=48):
checkpoint had ``'Successfully presented files'`` while event_store
stored the full Command repr.

The fix detects ``Command`` in ``on_tool_end``, extracts the first
``ToolMessage`` from ``update['messages']``, and lets the existing
ToolMessage branch handle the ``model_dump()`` path. Legacy rows still
containing the Command repr are separately cleaned up by the history
helper in the follow-up commit.

Tests:
- ``test_tool_end_unwraps_command_with_inner_tool_message`` — unit test
  of the unwrap branch with a constructed Command
- ``test_tool_invoke_end_to_end_unwraps_command`` — end-to-end via
  ``CallbackManager`` + ``tool.invoke`` to exercise the real LangChain
  dispatch path that production uses, matching the repro shape from
  ``present_files``
- Counter-proof: temporarily reverted the patch, both tests failed with
  the exact ``Command(update={...})`` repr that was stored in the
  production SQLite row at seq=48, confirming LangChain does pass the
  ``Command`` through callbacks (the unwrap is load-bearing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(threads): load history messages from event store, immune to summarize

``get_thread_history`` and ``get_thread_state`` in Gateway mode read
messages from ``checkpoint.channel_values["messages"]``. After
SummarizationMiddleware runs mid-run, that list is rewritten in-place:
pre-summarize messages are dropped and a synthetic summary-as-human
message takes position 0. The frontend then renders a chat history that
starts with ``"Here is a summary of the conversation to date:..."``
instead of the user's original query, and all earlier turns are gone.

The event store (``RunEventStore``) is append-only and never rewritten,
so it retains the full transcript. This commit adds a helper
``_get_event_store_messages`` that loads the event store's message
stream and overrides ``values["messages"]`` in both endpoints; the
checkpoint fallback kicks in only when the event store is unavailable.

Behavior contract of the helper:

- **Full pagination.** ``list_messages`` returns the newest ``limit``
  records when no cursor is given, so a fixed limit silently drops
  older messages on long threads. The helper sizes the read from
  ``count_messages()`` and pages forward with ``after_seq`` cursors.
- **Copy-on-read.** Each content dict is copied before ``id`` is
  patched so the live store object (``MemoryRunEventStore`` returns
  references) is never mutated.
- **Stable ids.** Messages with ``id=None`` (human + tool_result,
  which don't receive an id until checkpoint persistence) get a
  deterministic ``uuid5(NAMESPACE_URL, f"{thread_id}:{seq}")`` so
  React keys stay stable across requests. AI messages keep their
  LLM-assigned ``lc_run--*`` ids.
- **Legacy ``Command`` repr sanitization.** Rows captured before the
  ``journal.py`` ``on_tool_end`` fix (previous commit) stored
  ``str(Command(update={'messages': [ToolMessage(content='X', ...)]}))``
  as the tool_result content. ``_sanitize_legacy_command_repr``
  regex-extracts the inner text so old threads render cleanly.
- **Inline feedback.** When loading the stream, the helper also pulls
  ``feedback_repo.list_by_thread_grouped`` and attaches ``run_id`` to
  every message plus ``feedback`` to the final ``ai_message`` of each
  run. This removes the frontend's need to fetch a second endpoint
  and positional-index-map its way back to the right run. When the
  feedback subsystem is unavailable, the ``feedback`` field is left
  absent entirely so the frontend hides the button rather than
  rendering it over a broken write path.
- **User context.** ``DbRunEventStore`` is user-scoped by default via
  ``resolve_user_id(AUTO)``. The helper relies on the ``@require_permission``
  decorator having populated the user contextvar on both callers; the
  docstring documents this dependency explicitly so nobody wires it
  into a CLI or migration script without passing ``user_id=None``.

Real data verification against thread
``6d30913e-dcd4-41c8-8941-f66c716cf359``: checkpoint showed 12 messages
(summarize-corrupted), event store had 16. The original human message
``"最新伊美局势"`` was preserved as seq=1 in the event store and
correctly restored to position 0 in the helper output. Helper output
for AI messages was byte-identical to checkpoint for every overlapping
message; only tool_result ids differed (patched to uuid5) and the
legacy Command repr at seq=48 was sanitized.

Tests:
- ``test_thread_state_event_store.py`` — 18 tests covering
  ``_sanitize_legacy_command_repr`` (passthrough, single/double-quote
  extraction, unparseable fallback), helper happy path (all message
  types, stable uuid5, store non-mutation), multi-page pagination,
  summarize regression (recovers pre-summarize messages), feedback
  attachment (per-run, multi-run threads, repo failure graceful),
  and dependency failure fallback to ``None``.

Docs:
- ``docs/superpowers/plans/2026-04-10-event-store-history.md`` — the
  implementation plan this commit realizes, with Task 1 revised after
  the evaluation findings (pagination, copy-on-read, Command wrap
  already landed in journal.py, frontend feedback pagination in the
  follow-up commit, Standard-mode follow-up noted).
- ``docs/superpowers/specs/2026-04-11-runjournal-history-evaluation.md``
  — the Claude + second-opinion evaluation document that drove the
  plan revisions (pagination bug, dict-mutation bug, feedback hidden
  bug, Command bug).
- ``docs/superpowers/specs/2026-04-11-summarize-marker-design.md`` —
  design for a follow-up PR that visually marks summarize events in
  history, based on a verified ``adispatch_custom_event`` experiment
  (``trace=False`` middleware nodes can still forward the Pregel task
  config via explicit signature injection).

Scope: Gateway mode only (``make dev-pro``). Standard mode
(``make dev``) hits LangGraph Server directly and bypasses these
endpoints; the summarize symptom is still present there and is
tracked as a separate follow-up in the plan.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(feedback): inline feedback on history and drop positional mapping

The old ``useThreadFeedback`` hook loaded ``GET /api/threads/{id}/messages?limit=200``
and built two parallel lookup tables: ``runIdByAiIndex`` (an ordinal array of
run_ids for every ``ai_message``-typed event) and ``feedbackByRunId``. The render
loop in ``message-list.tsx`` walked the AI messages in order, incrementing
``aiMessageIndex`` on each non-human message, and used that ordinal to look up
the run_id and feedback.

This shape had three latent bugs we could observe on real threads:

1. **Fetch was capped at 200 messages.** Long or tool-heavy threads silently
   dropped earlier entries from the map, so feedback buttons could be missing
   on messages they should own.
2. **Ordinal mismatch.** The render loop counted every non-human message
   (including each intermediate ``ai_tool_call``), but ``runIdByAiIndex`` only
   pushed entries for ``event_type == "ai_message"``. A run with 3 tool_calls
   + 1 final AI message would push 1 entry while the render consumed 4
   positions, so buttons mapped to the wrong positions across multi-run
   threads.
3. **Two parallel data paths.** The ``/history`` render path and the
   ``/messages`` feedback-lookup path could drift in-between an
   ``invalidateQueries`` call and the next refetch, producing transient
   mismaps.

The previous commit moved the authoritative message source for history to
the event store and added ``run_id`` + ``feedback`` inline on each message
dict returned by ``_get_event_store_messages``. This commit aligns the
frontend with that contract:

- **Delete** ``useThreadFeedback``, ``ThreadFeedbackData``,
  ``runIdByAiIndex``, ``feedbackByRunId``, and ``fetchAllThreadMessages``.
- **Introduce** ``useThreadMessageEnrichment`` that fetches
  ``POST /history?limit=1`` once, indexes the returned messages by
  ``message.id`` into a ``Map<id, {run_id, feedback?}>``, and invalidates
  on stream completion (``onFinish`` in ``useThreadStream``). Keying by
  ``message.id`` is stable across runs, tool_call chains, and summarize.
- **Simplify** ``message-list.tsx`` to drop the ``aiMessageIndex``
  counter and read ``enrichment?.get(msg.id)`` at each render step.
- **Rewire** ``message-list-item.tsx`` so the feedback button renders
  when ``feedback !== undefined`` rather than when the message happens
  to be non-human. ``feedback`` is ``undefined`` for non-eligible
  messages (humans, non-final AI, tools), ``null`` for the final
  ai_message of an unrated run, and a ``FeedbackData`` object once
  rated — cleanly distinguishing "not eligible" from "eligible but
  unrated".

``/api/threads/{id}/messages`` is kept as a debug/export surface; no
frontend code calls it anymore but the backend router is untouched.

Validation:
- ``pnpm check`` clean (0 errors, 1 pre-existing unrelated warning)
- Live test on thread ``3d5dea4a`` after gateway restart confirmed the
  original user query is restored to position 0 and the feedback
  button behaves correctly on the final AI message.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(rebase): remove duplicate definitions and update stale module paths

Rebase left duplicate function blocks in worker.py (triple human_message
write causing 3x user messages in /history), deps.py, and prompt.py.
Also update checkpointer imports from the old deerflow.agents.checkpointer
path to deerflow.runtime.checkpointer, and clean up orphaned feedback
props in the frontend message components.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(rebase): restore FeedbackButtons component and enrichment lost during rebase

The FeedbackButtons component (defined inline in message-list-item.tsx)
was introduced in commit 95df8d13 but lost during rebase. The previous
rebase cleanup commit incorrectly removed the feedback/runId props and
enrichment hook as "orphaned code" instead of restoring the missing
component. This commit restores:

- FeedbackButtons component with thumbs up/down toggle and optimistic state
- FeedbackData/upsertFeedback/deleteFeedback imports
- feedback and runId props on MessageListItem
- useThreadMessageEnrichment hook and entry lookup in message-list.tsx

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(user-context): add DEFAULT_USER_ID and get_effective_user_id helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(paths): add user-aware path methods with optional user_id parameter

Add _validate_user_id(), user_dir(), user_memory_file(), user_agent_memory_file()
and optional keyword-only user_id parameter to all thread-related path methods.
When user_id is provided, paths resolve under users/{user_id}/threads/{thread_id}/;
when omitted, legacy layout is preserved for backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(memory): add user_id to MemoryStorage interface for per-user isolation

Thread user_id through MemoryStorage.load/reload/save abstract methods and
FileMemoryStorage, re-keying the in-memory cache from bare agent_name to a
(user_id, agent_name) tuple to prevent cross-user cache collisions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(memory): thread user_id through memory updater layer

Add `user_id` keyword-only parameter to all public updater functions
(_save_memory_to_file, get_memory_data, reload_memory_data, import_memory_data,
clear_memory_data, create/delete/update_memory_fact) and regular keyword param
to MemoryUpdater.update_memory + update_memory_from_conversation, propagating
it to every storage load/save/reload call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(memory): capture user_id at enqueue time for async-safe thread isolation

Add user_id field to ConversationContext and MemoryUpdateQueue.add() so the
user identity is stored explicitly at request time, before threading.Timer
fires on a different thread where ContextVar values do not propagate.
MemoryMiddleware.after_agent() now calls get_effective_user_id() at enqueue
time and passes the value through to updater.update_memory().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(isolation): wire user_id through all Paths and memory callsites

Pass user_id=get_effective_user_id() at every callsite that invokes
Paths methods or memory functions, enabling per-user filesystem isolation
throughout the harness and app layers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(migration): add idempotent script for per-user data migration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update CLAUDE.md and config docs for per-user isolation

* feat(events): add pagination to list_messages_by_run on all store backends

Replicates the existing before_seq/after_seq/limit cursor-pagination pattern
from list_messages onto list_messages_by_run across the abstract interface,
MemoryRunEventStore, JsonlRunEventStore, and DbRunEventStore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(api): add GET /api/runs/{run_id}/messages with cursor pagination

New endpoint resolves thread_id from the run record and delegates to
RunEventStore.list_messages_by_run for cursor-based pagination.
Ownership is enforced implicitly via RunStore.get() user filtering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(api): add GET /api/runs/{run_id}/feedback

Delegates to FeedbackRepository.list_by_run via the existing _resolve_run
helper; includes tests for success, 404, empty list, and 503 (no DB).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(api): retrofit cursor pagination onto GET /threads/{tid}/runs/{rid}/messages

Replace bare list[dict] response with {data: [...], has_more: bool} envelope,
forwarding limit/before_seq/after_seq query params to the event store.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add run-level API endpoints to CLAUDE.md routers table

* refactor(threads): remove event-store message loader and feedback from state/history endpoints

State and history endpoints now return messages purely from the
checkpointer's channel_values. The _get_event_store_messages helper
(which loaded the full event-store transcript with feedback attached)
is removed along with its tests. Frontend will use the dedicated
GET /api/runs/{run_id}/messages and /feedback endpoints instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add unified persistence layer with event store, token tracking, and feedback (#1930)

* feat(persistence): add SQLAlchemy 2.0 async ORM scaffold

Introduce a unified database configuration (DatabaseConfig) that
controls both the LangGraph checkpointer and the DeerFlow application
persistence layer from a single `database:` config section.

New modules:
- deerflow.config.database_config — Pydantic config with memory/sqlite/postgres backends
- deerflow.persistence — async engine lifecycle, DeclarativeBase with to_dict mixin, Alembic skeleton
- deerflow.runtime.runs.store — RunStore ABC + MemoryRunStore implementation

Gateway integration initializes/tears down the persistence engine in
the existing langgraph_runtime() context manager. Legacy checkpointer
config is preserved for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add RunEventStore ABC + MemoryRunEventStore

Phase 2-A prerequisite for event storage: adds the unified run event
stream interface (RunEventStore) with an in-memory implementation,
RunEventsConfig, gateway integration, and comprehensive tests (27 cases).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add ORM models, repositories, DB/JSONL event stores, RunJournal, and API endpoints

Phase 2-B: run persistence + event storage + token tracking.

- ORM models: RunRow (with token fields), ThreadMetaRow, RunEventRow
- RunRepository implements RunStore ABC via SQLAlchemy ORM
- ThreadMetaRepository with owner access control
- DbRunEventStore with trace content truncation and cursor pagination
- JsonlRunEventStore with per-run files and seq recovery from disk
- RunJournal (BaseCallbackHandler) captures LLM/tool/lifecycle events,
  accumulates token usage by caller type, buffers and flushes to store
- RunManager now accepts optional RunStore for persistent backing
- Worker creates RunJournal, writes human_message, injects callbacks
- Gateway deps use factory functions (RunRepository when DB available)
- New endpoints: messages, run messages, run events, token-usage
- ThreadCreateRequest gains assistant_id field
- 92 tests pass (33 new), zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(persistence): add user feedback + follow-up run association

Phase 2-C: feedback and follow-up tracking.

- FeedbackRow ORM model (rating +1/-1, optional message_id, comment)
- FeedbackRepository with CRUD, list_by_run/thread, aggregate stats
- Feedback API endpoints: create, list, stats, delete
- follow_up_to_run_id in RunCreateRequest (explicit or auto-detected
  from latest successful run on the thread)
- Worker writes follow_up_to_run_id into human_message event metadata
- Gateway deps: feedback_repo factory + getter
- 17 new tests (14 FeedbackRepository + 3 follow-up association)
- 109 total tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test+config: comprehensive Phase 2 test coverage + deprecate checkpointer config

- config.example.yaml: deprecate standalone checkpointer section, activate
  unified database:sqlite as default (drives both checkpointer + app data)
- New: test_thread_meta_repo.py (14 tests) — full ThreadMetaRepository coverage
  including check_access owner logic, list_by_owner pagination
- Extended test_run_repository.py (+4 tests) — completion preserves fields,
  list ordering desc, limit, owner_none returns all
- Extended test_run_journal.py (+8 tests) — on_chain_error, track_tokens=false,
  middleware no ai_message, unknown caller tokens, convenience fields,
  tool_error, non-summarization custom event
- Extended test_run_event_store.py (+7 tests) — DB batch seq continuity,
  make_run_event_store factory (memory/db/jsonl/fallback/unknown)
- Extended test_phase2b_integration.py (+4 tests) — create_or_reject persists,
  follow-up metadata, summarization in history, full DB-backed lifecycle
- Fixed DB integration test to use proper fake objects (not MagicMock)
  for JSON-serializable metadata
- 157 total Phase 2 tests pass, zero regressions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* config: move default sqlite_dir to .deer-flow/data

Keep SQLite databases alongside other DeerFlow-managed data
(threads, memory) under the .deer-flow/ directory instead of a
top-level ./data folder.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): remove UTFJSON, use engine-level json_serializer + datetime.now()

- Replace custom UTFJSON type with standard sqlalchemy.JSON in all ORM
  models. Add json_serializer=json.dumps(ensure_ascii=False) to all
  create_async_engine calls so non-ASCII text (Chinese etc.) is stored
  as-is in both SQLite and Postgres.
- Change ORM datetime defaults from datetime.now(UTC) to datetime.now(),
  remove UTC imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): simplify deps.py with getter factory + inline repos

- Replace 6 identical getter functions with _require() factory.
- Inline 3 _make_*_repo() factories into langgraph_runtime(), call
  get_session_factory() once instead of 3 times.
- Add thread_meta upsert in start_run (services.py).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(docker): add UV_EXTRAS build arg for optional dependencies

Support installing optional dependency groups (e.g. postgres) at
Docker build time via UV_EXTRAS build arg:
  UV_EXTRAS=postgres docker compose build

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(journal): fix flush, token tracking, and consolidate tests

RunJournal fixes:
- _flush_sync: retain events in buffer when no event loop instead of
  dropping them; worker's finally block flushes via async flush().
- on_llm_end: add tool_calls filter and caller=="lead_agent" guard for
  ai_message events; mark message IDs for dedup with record_llm_usage.
- worker.py: persist completion data (tokens, message count) to RunStore
  in finally block.

Model factory:
- Auto-inject stream_usage=True for BaseChatOpenAI subclasses with
  custom api_base, so usage_metadata is populated in streaming responses.

Test consolidation:
- Delete test_phase2b_integration.py (redundant with existing tests).
- Move DB-backed lifecycle test into test_run_journal.py.
- Add tests for stream_usage injection in test_model_factory.py.
- Clean up executor/task_tool dead journal references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): widen content type to str|dict in all store backends

Allow event content to be a dict (for structured OpenAI-format messages)
in addition to plain strings. Dict values are JSON-serialized for the DB
backend and deserialized on read; memory and JSONL backends handle dicts
natively. Trace truncation now serializes dicts to JSON before measuring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(events): use metadata flag instead of heuristic for dict content detection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(converters): add LangChain-to-OpenAI message format converters

Pure functions langchain_to_openai_message, langchain_to_openai_completion,
langchain_messages_to_openai, and _infer_finish_reason for converting
LangChain BaseMessage objects to OpenAI Chat Completions format, used by
RunJournal for event storage. 15 unit tests added.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(converters): handle empty list content as null, clean up test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): human_message content uses OpenAI user message format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): ai_message uses OpenAI format, add ai_tool_call message event

- ai_message content now uses {"role": "assistant", "content": "..."} format
- New ai_tool_call message event emitted when lead_agent LLM responds with tool_calls
- ai_tool_call uses langchain_to_openai_message converter for consistent format
- Both events include finish_reason in metadata ("stop" or "tool_calls")

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add tool_result message event with OpenAI tool message format

Cache tool_call_id from on_tool_start keyed by run_id as fallback for on_tool_end,
then emit a tool_result message event (role=tool, tool_call_id, content) after each
successful tool completion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): summary content uses OpenAI system message format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): replace llm_start/llm_end with llm_request/llm_response in OpenAI format

Add on_chat_model_start to capture structured prompt messages as llm_request events.
Replace llm_end trace events with llm_response using OpenAI Chat Completions format.
Track llm_call_index to pair request/response events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(events): add record_middleware method for middleware trace events

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(events): add full run sequence integration test for OpenAI content format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(events): align message events with checkpoint format and add middleware tag injection

- Message events (ai_message, ai_tool_call, tool_result, human_message) now use
  BaseMessage.model_dump() format, matching LangGraph checkpoint values.messages
- on_tool_end extracts tool_call_id/name/status from ToolMessage objects
- on_tool_error now emits tool_result message events with error status
- record_middleware uses middleware:{tag} event_type and middleware category
- Summarization custom events use middleware:summarize category
- TitleMiddleware injects middleware:title tag via get_config() inheritance
- SummarizationMiddleware model bound with middleware:summarize tag
- Worker writes human_message using HumanMessage.model_dump()

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): switch search endpoint to threads_meta table and sync title

- POST /api/threads/search now queries threads_meta table directly,
  removing the two-phase Store + Checkpointer scan approach
- Add ThreadMetaRepository.search() with metadata/status filters
- Add ThreadMetaRepository.update_display_name() for title sync
- Worker syncs checkpoint title to threads_meta.display_name on run completion
- Map display_name to values.title in search response for API compatibility

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(threads): history endpoint reads messages from event store

- POST /api/threads/{thread_id}/history now combines two data sources:
  checkpointer for checkpoint_id, metadata, title, thread_data;
  event store for messages (complete history, not truncated by summarization)
- Strip internal LangGraph metadata keys from response
- Remove full channel_values serialization in favor of selective fields

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: remove duplicate optional-dependencies header in pyproject.toml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(middleware): pass tagged config to TitleMiddleware ainvoke call

Without the config, the middleware:title tag was not injected,
causing the LLM response to be recorded as a lead_agent ai_message
in run_events.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve merge conflict in .env.example

Keep both DATABASE_URL (from persistence-scaffold) and WECOM
credentials (from main) after the merge.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address review feedback on PR #1851

- Fix naive datetime.now() → datetime.now(UTC) in all ORM models
- Fix seq race condition in DbRunEventStore.put() with FOR UPDATE
  and UNIQUE(thread_id, seq) constraint
- Encapsulate _store access in RunManager.update_run_completion()
- Deduplicate _store.put() logic in RunManager via _persist_to_store()
- Add update_run_completion to RunStore ABC + MemoryRunStore
- Wire follow_up_to_run_id through the full create path
- Add error recovery to RunJournal._flush_sync() lost-event scenario
- Add migration note for search_threads breaking change
- Fix test_checkpointer_none_fix mock to set database=None

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update uv.lock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address 22 review comments from CodeQL, Copilot, and Code Quality

Bug fixes:
- Sanitize log params to prevent log injection (CodeQL)
- Reset threads_meta.status to idle/error when run completes
- Attach messages only to latest checkpoint in /history response
- Write threads_meta on POST /threads so new threads appear in search

Lint fixes:
- Remove unused imports (journal.py, migrations/env.py, test_converters.py)
- Convert lambda to named function (engine.py, Ruff E731)
- Remove unused logger definitions in repos (Ruff F841)
- Add logging to JSONL decode errors and empty except blocks
- Separate assert side-effects in tests (CodeQL)
- Remove unused local variables in tests (Ruff F841)
- Fix max_trace_content truncation to use byte length, not char length

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: apply ruff format to persistence and runtime files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Potential fix for pull request finding 'Statement has no effect'

Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>

* refactor(runtime): introduce RunContext to reduce run_agent parameter bloat

Extract checkpointer, store, event_store, run_events_config, thread_meta_repo,
and follow_up_to_run_id into a frozen RunContext dataclass. Add get_run_context()
in deps.py to build the base context from app.state singletons. start_run() uses
dataclasses.replace() to enrich per-run fields before passing ctx to run_agent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): move sanitize_log_param to app/gateway/utils.py

Extract the log-injection sanitizer from routers/threads.py into a shared
utils module and rename to sanitize_log_param (public API). Eliminates the
reverse service → router import in services.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: use SQL aggregation for feedback stats and thread token usage

Replace Python-side counting in FeedbackRepository.aggregate_by_run with
a single SELECT COUNT/SUM query. Add RunStore.aggregate_tokens_by_thread
abstract method with SQL GROUP BY implementation in RunRepository and
Python fallback in MemoryRunStore. Simplify the thread_token_usage
endpoint to delegate to the new method, eliminating the limit=10000
truncation risk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: annotate DbRunEventStore.put() as low-frequency path

Add docstring clarifying that put() opens a per-call transaction with
FOR UPDATE and should only be used for infrequent writes (currently
just the initial human_message event). High-throughput callers should
use put_batch() instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(threads): fall back to Store search when ThreadMetaRepository is unavailable

When database.backend=memory (default) or no SQL session factory is
configured, search_threads now queries the LangGraph Store instead of
returning 503. Returns empty list if neither Store nor repo is available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(persistence): introduce ThreadMetaStore ABC for backend-agnostic thread metadata

Add ThreadMetaStore abstract base class with create/get/search/update/delete
interface. ThreadMetaRepository (SQL) now inherits from it. New
MemoryThreadMetaStore wraps LangGraph BaseStore for memory-mode deployments.

deps.py now always provides a non-None thread_meta_repo, eliminating all
`if thread_meta_repo is not None` guards in services.py, worker.py, and
routers/threads.py. search_threads no longer needs a Store fallback branch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(history): read messages from checkpointer instead of RunEventStore

The /history endpoint now reads messages directly from the
checkpointer's channel_values (the authoritative source) instead of
querying RunEventStore.list_messages(). The RunEventStore API is
preserved for other consumers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(persistence): address new Copilot review comments

- feedback.py: validate thread_id/run_id before deleting feedback
- jsonl.py: add path traversal protection with ID validation
- run_repo.py: parse `before` to datetime for PostgreSQL compat
- thread_meta_repo.py: fix pagination when metadata filter is active
- database_config.py: use resolve_path for sqlite_dir consistency

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Implement skill self-evolution and skill_manage flow (#1874)

* chore: ignore .worktrees directory

* Add skill_manage self-evolution flow

* Fix CI regressions for skill_manage

* Address PR review feedback for skill evolution

* fix(skill-evolution): preserve history on delete

* fix(skill-evolution): tighten scanner fallbacks

* docs: add skill_manage e2e evidence screenshot

* fix(skill-manage): avoid blocking fs ops in session runtime

---------

Co-authored-by: Willem Jiang <willem.jiang@gmail.com>

* fix(config): resolve sqlite_dir relative to CWD, not Paths.base_dir

resolve_path() resolves relative to Paths.base_dir (.deer-flow),
which double-nested the path to .deer-flow/.deer-flow/data/app.db.
Use Path.resolve() (CWD-relative) instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Feature/feishu receive file (#1608)

* feat(feishu): add channel file materialization hook for inbound messages

- Introduce Channel.receive_file(msg, thread_id) as a base method for file materialization; default is no-op.
- Implement FeishuChannel.receive_file to download files/images from Feishu messages, save to sandbox, and inject virtual paths into msg.text.
- Update ChannelManager to call receive_file for any channel if msg.files is present, enabling downstream model access to user-uploaded files.
- No impact on Slack/Telegram or other channels (they inherit the default no-op).

* style(backend): format code with ruff for lint compliance

- Auto-formatted packages/harness/deerflow/agents/factory.py and tests/test_create_deerflow_agent.py using `ruff format`
- Ensured both files conform to project linting standards
- Fixes CI lint check failures caused by code style issues

* fix(feishu): handle file write operation asynchronously to prevent blocking

* fix(feishu): rename GetMessageResourceRequest to _GetMessageResourceRequest and remove redundant code

* test(feishu): add tests for receive_file method and placeholder replacement

* fix(manager): remove unnecessary type casting for channel retrieval

* fix(feishu): update logging messages to reflect resource handling instead of image

* fix(feishu): sanitize filename by replacing invalid characters in file uploads

* fix(feishu): improve filename sanitization and reorder image key handling in message processing

* fix(feishu): add thread lock to prevent filename conflicts during file downloads

* fix(test): correct bad merge in test_feishu_parser.py

* chore: run ruff and apply formatting cleanup
fix(feishu): preserve rich-text attachment order and improve fallback filename handling

* fix(docker): restore gateway env vars and fix langgraph empty arg issue (#1915)

Two production docker-compose.yaml bugs prevent `make up` from working:

1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
   environment overrides. Added in fb2d99f (#1836) but accidentally reverted
   by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
   env_file, causing FileNotFoundError inside the container.

2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
   Empty $${allow_blocking} inserts a bare space between flags, causing
   ' --no-reload' to be parsed as unexpected extra argument. Fix by building
   args string first and conditionally appending --allow-blocking.

Co-authored-by: cooper <cooperfu@tencent.com>

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities (#1904)

* fix(frontend): resolve invalid HTML nesting and tabnabbing vulnerabilities

Fix `<button>` inside `<a>` invalid HTML in artifact components and add
missing `noopener,noreferrer` to `window.open` calls to prevent reverse
tabnabbing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(frontend): address Copilot review on tabnabbing and double-tab-open

Remove redundant parent onClick on web_fetch ChainOfThoughtStep to
prevent opening two tabs on link click, and explicitly null out
window.opener after window.open() for defensive tabnabbing hardening.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(persistence): organize entities into per-entity directories

Restructure the persistence layer from horizontal "models/ + repositories/"
split into vertical entity-aligned directories. Each entity (thread_meta,
run, feedback) now owns its ORM model, abstract interface (where applicable),
and concrete implementations under a single directory with an aggregating
__init__.py for one-line imports.

Layout:
  persistence/thread_meta/{base,model,sql,memory}.py
  persistence/run/{model,sql}.py
  persistence/feedback/{model,sql}.py

models/__init__.py is kept as a facade so Alembic autogenerate continues to
discover all ORM tables via Base.metadata. RunEventRow remains under
models/run_event.py because its storage implementation lives in
runtime/events/store/db.py and has no matching repository directory.

The repositories/ directory is removed entirely. All call sites in
gateway/deps.py and tests are updated to import from the new entity
packages, e.g.:

    from deerflow.persistence.thread_meta import ThreadMetaRepository
    from deerflow.persistence.run import RunRepository
    from deerflow.persistence.feedback import FeedbackRepository

Full test suite passes (1690 passed, 14 skipped).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gateway): sync thread rename and delete through ThreadMetaStore

The POST /threads/{id}/state endpoint previously synced title changes
only to the LangGraph Store via _store_upsert. In sqlite mode the search
endpoint reads from the ThreadMetaRepository SQL table, so renames never
appeared in /threads/search until the next agent run completed (worker.py
syncs title from checkpoint to thread_meta in its finally block).

Likewise the DELETE /threads/{id} endpoint cleaned up the filesystem,
Store, and checkpointer but left the threads_meta row orphaned in sqlite,
so deleted threads kept appearing in /threads/search.

Fix both endpoints by routing through the ThreadMetaStore abstraction
which already has the correct sqlite/memory implementations wired up by
deps.py. The rename path now calls update_display_name() and the delete
path calls delete() — both work uniformly across backends.

Verified end-to-end with curl in gateway mode against sqlite backend.
Existing test suite (1690 passed) and focused router/repo tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(gateway): route all thread metadata access through ThreadMetaStore

Following the rename/delete bug fix in PR1, migrate the remaining direct
LangGraph Store reads/writes in the threads router and services to the
ThreadMetaStore abstraction so that the sqlite and memory backends behave
identically and the legacy dual-write paths can be removed.

Migrated endpoints (threads.py):
- create_thread: idempotency check + write now use thread_meta_repo.get/create
  instead of dual-writing the LangGraph Store and the SQL row.
- get_thread: reads from thread_meta_repo.get; the checkpoint-only fallback
  for legacy threads is preserved.
- patch_thread: replaced _store_get/_store_put with thread_meta_repo.update_metadata.
- delete_thread_data: dropped the legacy store.adelete; thread_meta_repo.delete
  already covers it.

Removed dead code (services.py):
- _upsert_thread_in_store — redundant with the immediately following
  thread_meta_repo.create() call.
- _sync_thread_title_after_run — worker.py's finally block already syncs
  the title via thread_meta_repo.update_display_name() after each run.

Removed dead code (threads.py):
- _store_get / _store_put / _store_upsert helpers (no remaining callers).
- THREADS_NS constant.
- get_store import (router no longer touches the LangGraph Store directly).

New abstract method:
- ThreadMetaStore.update_metadata(thread_id, metadata) merges metadata into
  the thread's metadata field. Implemented in both ThreadMetaRepository (SQL,
  read-modify-write inside one session) and MemoryThreadMetaStore. Three new
  unit tests cover merge / empty / nonexistent behaviour.

Net change: -134 lines. Full test suite: 1693 passed, 14 skipped.
Verified end-to-end with curl in gateway mode against sqlite backend
(create / patch / get / rename / search / delete).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>

* feat(auth): release-validation pass for 2.0-rc — 12 blockers + simplify follow-ups (#2008)

* feat(auth): introduce backend auth module

Port RFC-001 authentication core from PR #1728:
- JWT token handling (create_access_token, decode_token, TokenPayload)
- Password hashing (bcrypt) with verify_password
- SQLite UserRepository with base interface
- Provider Factory pattern (LocalAuthProvider)
- CLI reset_admin tool
- Auth-specific errors (AuthErrorCode, TokenError, AuthErrorResponse)

Deps:
- bcrypt>=4.0.0
- pyjwt>=2.9.0
- email-validator>=2.0.0
- backend/uv.toml pins public PyPI index

Tests: 12 pure unit tests (test_auth_config.py, test_auth_errors.py).

Scope note: authz.py, test_auth.py, and test_auth_type_system.py are
deferred to commit 2 because they depend on middleware and deps wiring
that is not yet in place. Commit 1 stays "pure new files only" as the
spec mandates.

* feat(auth): wire auth end-to-end (middleware + frontend replacement)

Backend:
- Port auth_middleware, csrf_middleware, langgraph_auth, routers/auth
- Port authz decorator (owner_filter_key defaults to 'owner_id')
- Merge app.py: register AuthMiddleware + CSRFMiddleware + CORS, add
  _ensure_admin_user lifespan hook, _migrate_orphaned_threads helper,
  register auth router
- Merge deps.py: add get_local_provider, get_current_user_from_request,
  get_optional_user_from_request; keep get_current_user as thin str|None
  adapter for feedback router
- langgraph.json: add auth path pointing to langgraph_auth.py:auth
- Rename metadata['user_id'] -> metadata['owner_id'] in langgraph_auth
  (both metadata write and LangGraph filter dict) + test fixtures

Frontend:
- Delete better-auth library and api catch-all route
- Remove better-auth npm dependency and env vars (BETTER_AUTH_SECRET,
  BETTER_AUTH_GITHUB_*) from env.js
- Port frontend/src/core/auth/* (AuthProvider, gateway-config,
  proxy-policy, server-side getServerSideUser, types)
- Port frontend/src/core/api/fetcher.ts
- Port (auth)/layout, (auth)/login, (auth)/setup pages
- Rewrite workspace/layout.tsx as server component that calls
  getServerSideUser and wraps in AuthProvider
- Port workspace/workspace-content.tsx for the client-side sidebar logic

Tests:
- Port 5 auth test files (test_auth, test_auth_middleware,
  test_auth_type_system, test_ensure_admin, test_langgraph_auth)
- 176 auth tests PASS

After this commit: login/logout/registration flow works, but persistence
layer does not yet filter by owner_id. Commit 4 closes that gap.

* feat(auth): account settings page + i18n

- Port account-settings-page.tsx (change password, change email, logout)
- Wire into settings-dialog.tsx as new "account" section with UserIcon,
  rendered first in the section list
- Add i18n keys:
  - en-US/zh-CN: settings.sections.account ("Account" / "账号")
  - en-US/zh-CN: button.logout ("Log out" / "退出登录")
  - types.ts: matching type declarations

* feat(auth): enforce owner_id across 2.0-rc persistence layer

Add request-scoped contextvar-based owner filtering to threads_meta,
runs, run_events, and feedback repositories. Router code is unchanged
— isolation is enforced at the storage layer so that any caller that
forgets to pass owner_id still gets filtered results, and new routes
cannot accidentally leak data.

Core infrastructure
-------------------
- deerflow/runtime/user_context.py (new):
  - ContextVar[CurrentUser | None] with default None
  - runtime_checkable CurrentUser Protocol (structural subtype with .id)
  - set/reset/get/require helpers
  - AUTO sentinel + resolve_owner_id(value, method_name) for sentinel
    three-state resolution: AUTO reads contextvar, explicit str
    overrides, explicit None bypasses the filter (for migration/CLI)

Repository changes
------------------
- ThreadMetaRepository: create/get/search/update_*/delete gain
  owner_id=AUTO kwarg; read paths filter by owner, writes stamp it,
  mutations check ownership before applying
- RunRepository: put/get/list_by_thread/delete gain owner_id=AUTO kwarg
- FeedbackRepository: create/get/list_by_run/list_by_thread/delete
  gain owner_id=AUTO kwarg
- DbRunEventStore: list_messages/list_events/list_messages_by_run/
  count_messages/delete_by_thread/delete_by_run gain owner_id=AUTO
  kwarg. Write paths (put/put_batch) read contextvar softly: when a
  request-scoped user is available, owner_id is stamped; background
  worker writes without a user context pass None which is valid
  (orphan row to be bound by migration)

Schema
------
- persistence/models/run_event.py: RunEventRow.owner_id = Mapped[
  str | None] = mapped_column(String(64), nullable=True, index=True)
- No alembic migration needed: 2.0 ships fresh, Base.metadata.create_all
  picks up the new column automatically

Middleware
----------
- auth_middleware.py: after cookie check, call get_optional_user_from_
  request to load the real User, stamp it into request.state.user AND
  the contextvar via set_current_user, reset in a try/finally. Public
  paths and unauthenticated requests continue without contextvar, and
  @require_auth handles the strict 401 path

Test infrastructure
-------------------
- tests/conftest.py: @pytest.fixture(autouse=True) _auto_user_context
  sets a default SimpleNamespace(id="test-user-autouse") on every test
  unless marked @pytest.mark.no_auto_user. Keeps existing 20+
  persistence tests passing without modification
- pyproject.toml [tool.pytest.ini_options]: register no_auto_user
  marker so pytest does not emit warnings for opt-out tests
- tests/test_user_context.py: 6 tests covering three-state semantics,
  Protocol duck typing, and require/optional APIs
- tests/test_thread_meta_repo.py: one test updated to pass owner_id=
  None explicitly where it was previously relying on the old default

Test results
------------
- test_user_context.py: 6 passed
- test_auth*.py + test_langgraph_auth.py + test_ensure_admin.py: 127
- test_run_event_store / test_run_repository / test_thread_meta_repo
  / test_feedback: 92 passed
- Full backend suite: 1905 passed, 2 failed (both @requires_llm flaky
  integration tests unrelated to auth), 1 skipped

* feat(auth): extend orphan migration to 2.0-rc persistence tables

_ensure_admin_user now runs a three-step pipeline on every boot:

  Step 1 (fatal):     admin user exists / is created / password is reset
  Step 2 (non-fatal): LangGraph store orphan threads → admin
  Step 3 (non-fatal): SQL persistence tables → admin
    - threads_meta
    - runs
    - run_events
    - feedback

Each step is idempotent. The fatal/non-fatal split mirrors PR #1728's
original philosophy: admin creation failure blocks startup (the system
is unusable without an admin), whereas migration failures log a warning
and let the service proceed (a partial migration is recoverable; a
missing admin is not).

Key helpers
-----------
- _iter_store_items(store, namespace, *, page_size=500):
  async generator that cursor-paginates across LangGraph store pages.
  Fixes PR #1728's hardcoded limit=1000 bug that would silently lose
  orphans beyond the first page.

- _migrate_orphaned_threads(store, admin_user_id):
  Rewritten to use _iter_store_items. Returns the migrated count so the
  caller can log it; raises only on unhandled exceptions.

- _migrate_orphan_sql_tables(admin_user_id):
  Imports the 4 ORM models lazily, grabs the shared session factory,
  runs one UPDATE per table in a single transaction, commits once.
  No-op when no persistence backend is configured (in-memory dev).

Tests: test_ensure_admin.py (8 passed)

* test(auth): port AUTH test plan docs + lint/format pass

- Port backend/docs/AUTH_TEST_PLAN.md and AUTH_UPGRADE.md from PR #1728
- Rename metadata.user_id → metadata.owner_id in AUTH_TEST_PLAN.md
  (4 occurrences from the original PR doc)
- ruff auto-fix UP037 in sentinel type annotations: drop quotes around
  "str | None | _AutoSentinel" now that from __future__ import
  annotations makes them implicit string forms
- ruff format: 2 files (app/gateway/app.py, runtime/user_context.py)

Note on test coverage additions:
- conftest.py autouse fixture was already added in commit 4 (had to
  be co-located with the repository changes to keep pre-existing
  persistence tests passing)
- cross-user isolation E2E tests (test_owner_isolation.py) deferred
  — enforcement is already proven by the 98-test repository suite
  via the autouse fixture + explicit _AUTO sentinel exercises
- New test cases (TC-API-17..20, TC-ATK-13, TC-MIG-01..07) listed
  in AUTH_TEST_PLAN.md are deferred to a follow-up PR — they are
  manual-QA test cases rather than pytest code, and the spec-level
  coverage is already met by test_user_context.py + the 98-test
  repository suite.

Final test results:
- Auth suite (test_auth*, test_langgraph_auth, test_ensure_admin,
  test_user_context): 186 passed
- Persistence suite (test_run_event_store, test_run_repository,
  test_thread_meta_repo, test_feedback): 98 passed
- Lint: ruff check + ruff format both clean

* test(auth): add cross-user isolation test suite

10 tests exercising the storage-layer owner filter by manually
switching the user_context contextvar between two users. Verifies
the safety invariant:

  After a repository write with owner_id=A, a subsequent read with
  owner_id=B must not return the row, and vice versa.

Covers all 4 tables that own user-scoped data:

TC-API-17  threads_meta  — read, search, update, delete cross-user
TC-API-18  runs          — get, list_by_thread, delete cross-user
TC-API-19  run_events    — list_messages, list_events, count_messages,
                           delete_by_thread (CRITICAL: raw conversation
                           content leak vector)
TC-API-20  feedback      — get, list_by_run, delete cross-user

Plus two meta-tests verifying the sentinel pattern itself:
- AUTO + unset contextvar raises RuntimeError
- explicit owner_id=None bypasses the filter (migration escape hatch)

Architecture note
-----------------
These tests bypass the HTTP layer by design. The full chain
(cookie → middleware → contextvar → repository) is covered piecewise:

- test_auth_middleware.py: middleware sets contextvar from cookies
- test_owner_isolation.py: repositories enforce isolation when
  contextvar is set to different users

Together they prove the end-to-end safety property without the
ceremony of spinning up a full TestClient + in-memory DB for every
router endpoint.

Tests pass: 231 (full auth + persistence + isolation suite)
Lint: clean

* refactor(auth): migrate user repository to SQLAlchemy ORM

Move the users table into the shared persistence engine so auth
matches the pattern of threads_meta, runs, run_events, and feedback —
one engine, one session factory, one schema init codepath.

New files
---------
- persistence/user/__init__.py, persistence/user/model.py: UserRow
  ORM class with partial unique index on (oauth_provider, oauth_id)
- Registered in persistence/models/__init__.py so
  Base.metadata.create_all() picks it up

Modified
--------
- auth/repositories/sqlite.py: rewritten as async SQLAlchemy,
  identical constructor pattern to the other four repositories
  (def __init__(self, session_factory) + self._sf = session_factory)
- auth/config.py: drop users_db_path field — storage is configured
  through config.database like every other table
- deps.py/get_local_provider: construct SQLiteUserRepository with
  the shared session factory, fail fast if engine is not initialised
- tests/test_auth.py: rewrite test_sqlite_round_trip_new_fields to
  use the shared engine (init_engine + close_engine in a tempdir)
- tests/test_auth_type_system.py: add per-test autouse fixture that
  spins up a scratch engine and resets deps._cached_* singletons

* refactor(auth): remove SQL orphan migration (unused in supported scenarios)

The _migrate_orphan_sql_tables helper existed to bind NULL owner_id
rows in threads_meta, runs, run_events, and feedback to the admin on
first boot. But in every supported upgrade path, it's a no-op:

  1. Fresh install: create_all builds fresh tables, no legacy rows
  2. No-auth → with-auth (no existing persistence DB): persistence
     tables are created fresh by create_all, no legacy rows
  3. No-auth → with-auth (has existing persistence DB from #1930):
     NOT a supported upgrade path — "有 DB 到有 DB" schema evolution
     is out of scope; users wipe DB or run manual ALTER

So the SQL orphan migration never has anything to do in the
supported matrix. Delete the function, simplify _ensure_admin_user
from a 3-step pipeline to a 2-step one (admin creation + LangGraph
store orphan migration only).

LangGraph store orphan migration stays: it serves the real
"no-auth → with-auth" upgrade path where a user's existing LangGraph
thread metadata has no owner_id field and needs to be stamped with
the newly-created admin's id.

Tests: 284 passed (auth + persistence + isolation)
Lint: clean

* security(auth): write initial admin password to 0600 file instead of logs

CodeQL py/clear-text-logging-sensitive-data flagged 3 call sites that
logged the auto-generated admin password to stdout via logger.info().
Production log aggregators (ELK/Splunk/etc) would have captured those
cleartext secrets. Replace with a shared helper that writes to
.deer-flow/admin_initial_credentials.txt with mode 0600, and log only
the path.

New file
--------
- app/gateway/auth/credential_file.py: write_initial_credentials()
  helper. Takes email, password, and a "initial"/"reset" label.
  Creates .deer-flow/ if missing, writes a header comment plus the
  email+password, chmods 0o600, returns the absolute Path.

Modified
--------
- app/gateway/app.py: both _ensure_admin_user paths (fresh creation
  + needs_setup password reset) now write to file and log the path
- app/gateway/auth/reset_admin.py: rewritten to use the shared ORM
  repo (SQLiteUserRepository with session_factory) and the
  credential_file helper. The previous implementation was broken
  after the earlier ORM refactor — it still imported _get_users_conn
  and constructed SQLiteUserRepository() without a session factory.

No tests changed — the three password-log sites are all exercised
via existing test_ensure_admin.py which checks that startup
succeeds, not that a specific string appears in logs.

CodeQL alerts 272, 283, 284: all resolved.

* security(auth): strict JWT validation in middleware (fix junk cookie bypass)

AUTH_TEST_PLAN test 7.5.8 expects junk cookies to be rejected with
401. The previous middleware behaviour was "presence-only": check
that some access_token cookie exists, then pass through. In
combination with my Task-12 decision to skip @require_auth
decorators on routes, this created a gap where a request with any
cookie-shaped string (e.g. access_token=not-a-jwt) would bypass
authentication on routes that do not touch the repository
(/api/models, /api/mcp/config, /api/memory, /api/skills, …).

Fix: middleware now calls get_current_user_from_request() strictly
and catches the resulting HTTPException to render a 401 with the
proper fine-grained error code (token_invalid, token_expired,
user_not_found, …). On success it stamps request.state.user and
the contextvar so repository-layer owner filters work downstream.

The 4 old "_with_cookie_passes" tests in test_auth_middleware.py
were written for the presence-only behaviour; they asserted that
a junk cookie would make the handler return 200. They are renamed
to "_with_junk_cookie_rejected" and their assertions flipped to
401. The negative path (no cookie → 401 not_authenticated)
is unchanged.

Verified:
  no cookie       → 401 not_authenticated
  junk cookie     → 401 token_invalid     (the fixed bug)
  expired cookie  → 401 token_expired

Tests: 284 passed (auth + persistence + isolation)
Lint: clean

* security(auth): wire @require_permission(owner_check=True) on isolation routes

Apply the require_permission decorator to all 28 routes that take a
{thread_id} path parameter. Combined with the strict middleware
(previous commit), this gives the double-layer protection that
AUTH_TEST_PLAN test 7.5.9 documents:

  Layer 1 (AuthMiddleware): cookie + JWT validation, rejects junk
                            cookies and stamps request.state.user
  Layer 2 (@require_permission with owner_check=True): per-resource
                            ownership verification via
                            ThreadMetaStore.check_access — returns
                            404 if a different user owns the thread

The decorator's owner_check branch is rewritten to use the SQL
thread_meta_repo (the 2.0-rc persistence layer) instead of the
LangGraph store path that PR #1728 used (_store_get / get_store
in routers/threads.py). The inject_record convenience is dropped
— no caller in 2.0 needs the LangGraph blob, and the SQL repo has
a different shape.

Routes decorated (28 total):
- threads.py: delete, patch, get, get-state, post-state, post-history
- thread_runs.py: post-runs, post-runs-stream, post-runs-wait,
  list_runs, get_run, cancel_run, join_run, stream_existing_run,
  list_thread_messages, list_run_messages, list_run_events,
  thread_token_usage
- feedback.py: create, list, stats, delete
- uploads.py: upload (added Request param), list, delete
- artifacts.py: get_artifact
- suggestions.py: generate (renamed body parameter to avoid
  conflict with FastAPI Request)

Test fixes:
- test_suggestions_router.py: bypass the decorator via __wrapped__
  (the unit tests cover parsing logic, not auth — no point spinning
  up a thread_meta_repo just to test JSON unwrapping)
- test_auth_middleware.py 4 fake-cookie tests: already updated in
  the previous commit (745bf432)

Tests: 293 passed (auth + persistence + isolation + suggestions)
Lint: clean

* security(auth): defense-in-depth fixes from release validation pass

Eight findings caught while running the AUTH_TEST_PLAN end-to-end against
the deployed sg_dev stack. Each is a pre-condition for shipping
release/2.0-rc that the previous PRs missed.

Backend hardening
- routers/auth.py: rate limiter X-Real-IP now requires AUTH_TRUSTED_PROXIES
  whitelist (CIDR/IP allowlist). Without nginx in front, the previous code
  honored arbitrary X-Real-IP, letting an attacker rotate the header to
  fully bypass the per-IP login lockout.
- routers/auth.py: 36-entry common-password blocklist via Pydantic
  field_validator on RegisterRequest + ChangePasswordRequest. The shared
  _validate_strong_password helper keeps the constraint in one place.
- routers/threads.py: ThreadCreateRequest + ThreadPatchRequest strip
  server-reserved metadata keys (owner_id, user_id) via Pydantic
  field_validator so a forged value can never round-trip back to other
  clients reading the same thread. The actual ownership invariant stays
  on the threads_meta row; this closes the metadata-blob echo gap.
- authz.py + thread_meta/sql.py: require_permission gains a require_existing
  flag plumbed through check_access(require_existing=True). Destructive
  routes (DELETE/PATCH/state-update/runs/feedback) now treat a missing
  thread_meta row as 404 instead of "untracked legacy thread, allow",
  closing the cross-user delete-idempotence gap where any user could
  successfully DELETE another user's deleted thread.
- repositories/sqlite.py + base.py: update_user raises UserNotFoundError
  on a vanished row instead of silently returning the input. Concurrent
  delete during password reset can no longer look like a successful update.
- runtime/user_context.py: resolve_owner_id() coerces User.id (UUID) to
  str at the contextvar boundary so SQLAlchemy String(64) columns can
  bind it. The whole 2.0-rc isolation pipeline was previously broken
  end-to-end (POST /api/threads → 500 "type 'UUID' is not supported").
- persistence/engine.py: SQLAlchemy listener enables PRAGMA journal_mode=WAL,
  synchronous=NORMAL, foreign_keys=ON on every new SQLite connection.
  TC-UPG-06 in the test plan expects WAL; previous code shipped with the
  default 'delete' journal.
- auth_middleware.py: stamp request.state.auth = AuthContext(...) so
  @require_permission's short-circuit fires; previously every isolation
  request did a duplicate JWT decode + users SELECT. Also unifies the
  401 payload through AuthErrorResponse(...).model_dump().
- app.py: _ensure_admin_user restructure removes the noqa F821 scoping
  bug where 'password' was referenced outside the branch that defined it.
  New _announce_credentials helper absorbs the duplicate log block in
  the fresh-admin and reset-admin branches.

* fix(frontend+nginx): rollout CSRF on every state-changing client path

The frontend was 100% broken in gateway-pro mode for any user trying to
open a specific chat thread. Three cumulative bugs each silently
masked the next.

LangGraph SDK CSRF gap (api-client.ts)
- The Client constructor took only apiUrl, no defaultHeaders, no fetch
  interceptor. The SDK's internal fetch never sent X-CSRF-Token, so
  every state-changing /api/langgraph-compat/* call (runs/stream,
  threads/search, threads/{tid}/history, ...) hit CSRFMiddleware and
  got 403 before reaching the auth check. UI symptom: empty thread page
  with no error message; the SPA's hooks swallowed the rejection.
- Fix: pass an onRequest hook that injects X-CSRF-Token from the
  csrf_token cookie per request. Reading the cookie per call (not at
  construction time) handles login / logout / password-change cookie
  rotation transparently. The SDK's prepareFetchOptions calls
  onRequest for both regular requests AND streaming/SSE/reconnect, so
  the same hook covers runs.stream and runs.joinStream.

Raw fetch CSRF gap (7 files)
- Audit: 11 frontend fetch sites, only 2 included CSRF (login/setup +
  account-settings change-password). The other 7 routed through raw
  fetch() with no header — suggestions, memory, agents, mcp, skills,
  uploads, and the local thread cleanup hook all 403'd silently.
- Fix: enhance fetcher.ts:fetchWithAuth to auto-inject X-CSRF-Token on
  POST/PUT/DELETE/PATCH from a single shared readCsrfCookie() helper.
  Convert all 7 raw fetch() callers to fetchWithAuth so the contract
  is centrally enforced. api-client.ts and fetcher.ts share
  readCsrfCookie + STATE_CHANGING_METHODS to avoid drift.

nginx routing + buffering (nginx.local.conf)
- The auth feature shipped without updating the nginx config: per-API
  explicit location blocks but no /api/v1/auth/, /api/feedback, /api/runs.
  The frontend's client-side fetches to /api/v1/auth/login/local 404'd
  from the Next.js side because nginx routed /api/* to the frontend.
- Fix: add catch-all `location /api/` that proxies to the gateway.
  nginx longest-prefix matching keeps the explicit blocks (/api/models,
  /api/threads regex, /api/langgraph/, ...) winning for their paths.
- Fix: disable proxy_buffering + proxy_request_buffering for the
  frontend `location /` block. Without it, nginx tries to spool large
  Next.js chunks into /var/lib/nginx/proxy (root-owned) and fails with
  Permission denied → ERR_INCOMPLETE_CHUNKED_ENCODING → ChunkLoadError.

* test(auth): release-validation test infra and new coverage

Test fixtures and unit tests added during the validation pass.

Router test helpers (NEW: tests/_router_auth_helpers.py)
- make_authed_test_app(): builds a FastAPI test app with a stub
  middleware that stamps request.state.user + request.state.auth and a
  permissive thread_meta_repo mock. TestClient-based router tests
  (test_artifacts_router, test_threads_router) use it instead of bare
  FastAPI() so the new @require_permission(owner_check=True) decorators
  short-circuit cleanly.
- call_unwrapped(): walks the __wrapped__ chain to invoke the underlying
  handler without going through the authz wrappers. Direct-call tests
  (test_uploads_router) use it. Typed with ParamSpec so the wrapped
  signature flows through.

Backend test additions
- test_auth.py: 7 tests for the new _get_client_ip trust model (no
  proxy / trusted proxy / untrusted peer / XFF rejection / invalid
  CIDR / no client). 5 tests for the password blocklist (literal,
  case-insensitive, strong password accepted, change-password binding,
  short-password length-check still fires before blocklist).
  test_update_user_raises_when_row_concurrently_deleted: closes a
  shipped-without-coverage gap on the new UserNotFoundError contract.
- test_thread_meta_repo.py: 4 tests for check_access(require_existing=True)
  — strict missing-row denial, strict owner match, strict owner mismatch,
  strict null-owner still allowed (shared rows survive the tightening).
- test_ensure_admin.py: 3 tests for _migrate_orphaned_threads /
  _iter_store_items pagination, covering the TC-UPG-02 upgrade story
  end-to-end via mock store. Closes the gap where the cursor pagination
  was untested even though the previous PR rewrote it.
- test_threads_router.py: 5 tests for _strip_reserved_metadata
  (owner_id removal, user_id removal, safe-keys passthrough, empty
  input, both-stripped).
- test_auth_type_system.py: replace "password123" fixtures with
  Tr0ub4dor3a / AnotherStr0ngPwd! so the new password blocklist
  doesn't reject the test data.

* docs(auth): refresh TC-DOCKER-05 + document Docker validation gap

- AUTH_TEST_PLAN.md TC-DOCKER-05: the previous expectation
  ("admin password visible in docker logs") was stale after the simplify
  pass that moved credentials to a 0600 file. The grep "Password:" check
  would have silently failed and given a false sense of coverage. New
  expectation matches the actual file-based path: 0600 file in
  DEER_FLOW_HOME, log shows the path (not the secret), reverse-grep
  asserts no leaked password in container logs.
- NEW: docs/AUTH_TEST_DOCKER_GAP.md documents the only un-executed
  block in the test plan (TC-DOCKER-01..06). Reason: sg_dev validation
  host has no Docker daemon installed. The doc maps each Docker case
  to an already-validated bare-metal equivalent (TC-1.1, TC-REENT-01,
  TC-API-02 etc.) so the gap is auditable, and includes pre-flight
  reproduction steps for whoever has Docker available.

---------

Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>

* fix(persistence): stream hang when run_events.backend=db

DbRunEventStore._user_id_from_context() returned user.id without
coercing it to str. User.id is a Pydantic UUID, and aiosqlite cannot
bind a raw UUID object to a VARCHAR column, so the INSERT for the
initial human_message event silently rolled back and raised out of
the worker task. Because that put() sat outside the worker's try
block, the finally-clause that publishes end-of-stream never ran
and the SSE stream hung forever.

jsonl mode was unaffected because json.dumps(default=str) coerces
UUID objects transparently.

Fixes:
- db.py: coerce user.id to str at the context-read boundary (matches
  what resolve_user_id already does for the other repositories)
- worker.py: move RunJournal init + human_message put inside the try
  block so any failure flows through the finally/publish_end path
  instead of hanging the subscriber

Defense-in-depth:
- engine.py: add PRAGMA busy_timeout=5000 so checkpointer and event
  store wait for each other on the shared deerflow.db file instead
  of failing immediately under write-lock contention
- journal.py: skip fire-and-forget _flush_sync when a previous flush
  task is still in flight, to avoid piling up concurrent put_batch
  writes on the same SQLAlchemy engine during streaming; flush() now
  waits for pending tasks before draining the buffer
- database_config.py: doc-only update clarifying WAL + busy_timeout
  keep the unified deerflow.db safe for both workloads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore(persistence): drop redundant busy_timeout PRAGMA

Python's sqlite3 driver defaults to a 5-second busy timeout via the
``timeout`` kwarg of ``sqlite3.connect``, and aiosqlite + SQLAlchemy's
aiosqlite dialect inherit that default. Setting ``PRAGMA busy_timeout=5000``
explicitly was a no-op — verified by reading back the PRAGMA on a fresh
connection (it already reports 5000ms without our PRAGMA).

Concurrent stress test (50 checkpoint writes + 20 event batches + 50
thread_meta updates on the same deerflow.db) still completes with zero
errors and 200/200 rows after removing the explicit PRAGMA.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(rebase): remove duplicate definitions and update stale module paths

Rebase left duplicate function blocks in worker.py (triple human_message
write causing 3x user messages in /history), deps.py, and prompt.py.
Also update checkpointer imports from the old deerflow.agents.checkpointer
path to deerflow.runtime.checkpointer, and clean up orphaned feedback
props in the frontend message components.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(user-context): add DEFAULT_USER_ID and get_effective_user_id helper

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(paths): add user-aware path methods with optional user_id parameter

Add _validate_user_id(), user_dir(), user_memory_file(), user_agent_memory_file()
and optional keyword-only user_id parameter to all thread-related path methods.
When user_id is provided, paths resolve under users/{user_id}/threads/{thread_id}/;
when omitted, legacy layout is preserved for backward compatibility.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(memory): add user_id to MemoryStorage interface for per-user isolation

Thread user_id through MemoryStorage.load/reload/save abstract methods and
FileMemoryStorage, re-keying the in-memory cache from bare agent_name to a
(user_id, agent_name) tuple to prevent cross-user cache collisions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(memory): thread user_id through memory updater layer

Add `user_id` keyword-only parameter to all public updater functions
(_save_memory_to_file, get_memory_data, reload_memory_data, import_memory_data,
clear_memory_data, create/delete/update_memory_fact) and regular keyword param
to MemoryUpdater.update_memory + update_memory_from_conversation, propagating
it to every storage load/save/reload call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(memory): capture user_id at enqueue time for async-safe thread isolation

Add user_id field to ConversationContext and MemoryUpdateQueue.add() so the
user identity is stored explicitly at request time, before threading.Timer
fires on a different thread where ContextVar values do not propagate.
MemoryMiddleware.after_agent() now calls get_effective_user_id() at enqueue
time and passes the value through to updater.update_memory().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(isolation): wire user_id through all Paths and memory callsites

Pass user_id=get_effective_user_id() at every callsite that invokes
Paths methods or memory functions, enabling per-user filesystem isolation
throughout the harness and app layers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(migration): add idempotent script for per-user data migration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update CLAUDE.md and config docs for per-user isolation

* feat(events): add pagination to list_messages_by_run on all store backends

Replicates the existing before_seq/after_seq/limit cursor-pagination pattern
from list_messages onto list_messages_by_run across the abstract interface,
MemoryRunEventStore, JsonlRunEventStore, and DbRunEventStore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(api): add GET /api/runs/{run_id}/messages with cursor pagination

New endpoint resolves thread_id from the run record and delegates to
RunEventStore.list_messages_by_run for cursor-based pagination.
Ownership is enforced implicitly via RunStore.get() user filtering.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(api): add GET /api/runs/{run_id}/feedback

Delegates to FeedbackRepository.list_by_run via the existing _resolve_run
helper; includes tests for success, 404, empty list, and 503 (no DB).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(api): retrofit cursor pagination onto GET /threads/{tid}/runs/{rid}/messages

Replace bare list[dict] response with {data: [...], has_more: bool} envelope,
forwarding limit/before_seq/after_seq query params to the event store.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add run-level API endpoints to CLAUDE.md routers table

* refactor(threads): remove event-store message loader and feedback from state/history endpoints

State and history endpoints now return messages purely from the
checkpointer's channel_values. The _get_event_store_messages helper
(which loaded the full event-store transcript with feedback attached)
is removed along with its tests. Frontend will use the dedicated
GET /api/runs/{run_id}/messages and /feedback endpoints instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
Co-authored-by: JilongSun <965640067@qq.com>
Co-authored-by: jie <49781832+stan-fu@users.noreply.github.com>
Co-authored-by: cooper <cooperfu@tencent.com>
Co-authored-by: yangzheli <43645580+yangzheli@users.noreply.github.com>
Co-authored-by: greatmengqi <chenmengqi.0376@gmail.com>
Co-authored-by: greatmengqi <chenmengqi.0376@bytedance.com>
2026-04-12 20:04:32 +08:00

3114 lines
128 KiB
Python

"""Tests for DeerFlowClient."""
import asyncio
import concurrent.futures
import json
import tempfile
import zipfile
from enum import Enum
from pathlib import Path
from unittest.mock import MagicMock, patch
import pytest
from langchain_core.messages import AIMessage, AIMessageChunk, HumanMessage, SystemMessage, ToolMessage # noqa: F401
from app.gateway.routers.mcp import McpConfigResponse
from app.gateway.routers.memory import MemoryConfigResponse, MemoryStatusResponse
from app.gateway.routers.models import ModelResponse, ModelsListResponse
from app.gateway.routers.skills import SkillInstallResponse, SkillResponse, SkillsListResponse
from app.gateway.routers.uploads import UploadResponse
from deerflow.client import DeerFlowClient
from deerflow.config.paths import Paths
from deerflow.uploads.manager import PathTraversalError
# ---------------------------------------------------------------------------
# Fixtures
# ---------------------------------------------------------------------------
@pytest.fixture
def mock_app_config():
"""Provide a minimal AppConfig mock."""
model = MagicMock()
model.name = "test-model"
model.model = "test-model"
model.supports_thinking = False
model.supports_reasoning_effort = False
model.model_dump.return_value = {"name": "test-model", "use": "langchain_openai:ChatOpenAI"}
config = MagicMock()
config.models = [model]
return config
@pytest.fixture
def client(mock_app_config):
"""Create a DeerFlowClient with mocked config loading."""
with patch("deerflow.client.get_app_config", return_value=mock_app_config):
return DeerFlowClient()
# ---------------------------------------------------------------------------
# __init__
# ---------------------------------------------------------------------------
class TestClientInit:
def test_default_params(self, client):
assert client._model_name is None
assert client._thinking_enabled is True
assert client._subagent_enabled is False
assert client._plan_mode is False
assert client._agent_name is None
assert client._available_skills is None
assert client._checkpointer is None
assert client._agent is None
def test_custom_params(self, mock_app_config):
mock_middleware = MagicMock()
with patch("deerflow.client.get_app_config", return_value=mock_app_config):
c = DeerFlowClient(model_name="gpt-4", thinking_enabled=False, subagent_enabled=True, plan_mode=True, agent_name="test-agent", available_skills={"skill1", "skill2"}, middlewares=[mock_middleware])
assert c._model_name == "gpt-4"
assert c._thinking_enabled is False
assert c._subagent_enabled is True
assert c._plan_mode is True
assert c._agent_name == "test-agent"
assert c._available_skills == {"skill1", "skill2"}
assert c._middlewares == [mock_middleware]
def test_invalid_agent_name(self, mock_app_config):
with patch("deerflow.client.get_app_config", return_value=mock_app_config):
with pytest.raises(ValueError, match="Invalid agent name"):
DeerFlowClient(agent_name="invalid name with spaces!")
with pytest.raises(ValueError, match="Invalid agent name"):
DeerFlowClient(agent_name="../path/traversal")
def test_custom_config_path(self, mock_app_config):
with (
patch("deerflow.client.reload_app_config") as mock_reload,
patch("deerflow.client.get_app_config", return_value=mock_app_config),
):
DeerFlowClient(config_path="/tmp/custom.yaml")
mock_reload.assert_called_once_with("/tmp/custom.yaml")
def test_checkpointer_stored(self, mock_app_config):
cp = MagicMock()
with patch("deerflow.client.get_app_config", return_value=mock_app_config):
c = DeerFlowClient(checkpointer=cp)
assert c._checkpointer is cp
# ---------------------------------------------------------------------------
# list_models / list_skills / get_memory
# ---------------------------------------------------------------------------
class TestConfigQueries:
def test_list_models(self, client):
result = client.list_models()
assert "models" in result
assert len(result["models"]) == 1
assert result["models"][0]["name"] == "test-model"
# Verify Gateway-aligned fields are present
assert "model" in result["models"][0]
assert "display_name" in result["models"][0]
assert "supports_thinking" in result["models"][0]
def test_list_skills(self, client):
skill = MagicMock()
skill.name = "web-search"
skill.description = "Search the web"
skill.license = "MIT"
skill.category = "public"
skill.enabled = True
with patch("deerflow.skills.loader.load_skills", return_value=[skill]) as mock_load:
result = client.list_skills()
mock_load.assert_called_once_with(enabled_only=False)
assert "skills" in result
assert len(result["skills"]) == 1
assert result["skills"][0] == {
"name": "web-search",
"description": "Search the web",
"license": "MIT",
"category": "public",
"enabled": True,
}
def test_list_skills_enabled_only(self, client):
with patch("deerflow.skills.loader.load_skills", return_value=[]) as mock_load:
client.list_skills(enabled_only=True)
mock_load.assert_called_once_with(enabled_only=True)
def test_get_memory(self, client):
memory = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.get_memory_data", return_value=memory) as mock_mem:
result = client.get_memory()
mock_mem.assert_called_once()
assert result == memory
def test_export_memory(self, client):
memory = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.get_memory_data", return_value=memory) as mock_mem:
result = client.export_memory()
mock_mem.assert_called_once()
assert result == memory
# ---------------------------------------------------------------------------
# stream / chat
# ---------------------------------------------------------------------------
def _make_agent_mock(chunks: list[dict]):
"""Create a mock agent whose .stream() yields the given chunks."""
agent = MagicMock()
agent.stream.return_value = iter(chunks)
return agent
def _ai_events(events):
"""Filter messages-tuple events with type=ai and non-empty content."""
return [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and e.data.get("content")]
def _tool_call_events(events):
"""Filter messages-tuple events with type=ai and tool_calls."""
return [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and "tool_calls" in e.data]
def _tool_result_events(events):
"""Filter messages-tuple events with type=tool."""
return [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "tool"]
class TestStream:
def test_basic_message(self, client):
"""stream() emits messages-tuple + values + end for a simple AI reply."""
ai = AIMessage(content="Hello!", id="ai-1")
chunks = [
{"messages": [HumanMessage(content="hi", id="h-1")]},
{"messages": [HumanMessage(content="hi", id="h-1"), ai]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t1"))
types = [e.type for e in events]
assert "messages-tuple" in types
assert "values" in types
assert types[-1] == "end"
msg_events = _ai_events(events)
assert msg_events[0].data["content"] == "Hello!"
def test_custom_events_are_forwarded(self, client):
"""stream() forwards custom stream events alongside normal values output."""
ai = AIMessage(content="Hello!", id="ai-1")
agent = MagicMock()
agent.stream.return_value = iter(
[
("custom", {"type": "task_started", "task_id": "task-1"}),
("values", {"messages": [HumanMessage(content="hi", id="h-1"), ai]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-custom"))
agent.stream.assert_called_once()
call_kwargs = agent.stream.call_args.kwargs
# ``messages`` enables token-level streaming of AI text deltas;
# see DeerFlowClient.stream() docstring and GitHub issue #1969.
assert call_kwargs["stream_mode"] == ["values", "messages", "custom"]
assert events[0].type == "custom"
assert events[0].data == {"type": "task_started", "task_id": "task-1"}
assert any(event.type == "messages-tuple" and event.data["content"] == "Hello!" for event in events)
assert any(event.type == "values" for event in events)
assert events[-1].type == "end"
def test_context_propagation(self, client):
"""stream() passes agent_name to the context."""
agent = _make_agent_mock([{"messages": [AIMessage(content="ok", id="ai-1")]}])
client._agent_name = "test-agent-1"
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
list(client.stream("hi", thread_id="t1"))
# Verify context passed to agent.stream
agent.stream.assert_called_once()
call_kwargs = agent.stream.call_args.kwargs
assert call_kwargs["context"]["thread_id"] == "t1"
assert call_kwargs["context"]["agent_name"] == "test-agent-1"
def test_custom_mode_is_normalized_to_string(self, client):
"""stream() forwards custom events even when the mode is not a plain string."""
class StreamMode(Enum):
CUSTOM = "custom"
def __str__(self):
return self.value
agent = _make_agent_mock(
[
(StreamMode.CUSTOM, {"type": "task_started", "task_id": "task-1"}),
{"messages": [AIMessage(content="Hello!", id="ai-1")]},
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-custom-enum"))
assert events[0].type == "custom"
assert events[0].data == {"type": "task_started", "task_id": "task-1"}
assert any(event.type == "messages-tuple" and event.data["content"] == "Hello!" for event in events)
assert events[-1].type == "end"
def test_tool_call_and_result(self, client):
"""stream() emits messages-tuple events for tool calls and results."""
ai = AIMessage(content="", id="ai-1", tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}])
tool = ToolMessage(content="file.txt", id="tm-1", tool_call_id="tc-1", name="bash")
ai2 = AIMessage(content="Here are the files.", id="ai-2")
chunks = [
{"messages": [HumanMessage(content="list files", id="h-1"), ai]},
{"messages": [HumanMessage(content="list files", id="h-1"), ai, tool]},
{"messages": [HumanMessage(content="list files", id="h-1"), ai, tool, ai2]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("list files", thread_id="t2"))
assert len(_tool_call_events(events)) >= 1
assert len(_tool_result_events(events)) >= 1
assert len(_ai_events(events)) >= 1
assert events[-1].type == "end"
def test_values_event_with_title(self, client):
"""stream() emits values event containing title when present in state."""
ai = AIMessage(content="ok", id="ai-1")
chunks = [
{"messages": [HumanMessage(content="hi", id="h-1"), ai], "title": "Greeting"},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t3"))
values_events = [e for e in events if e.type == "values"]
assert len(values_events) >= 1
assert values_events[-1].data["title"] == "Greeting"
assert "messages" in values_events[-1].data
def test_deduplication(self, client):
"""Messages with the same id are not emitted twice."""
ai = AIMessage(content="Hello!", id="ai-1")
chunks = [
{"messages": [HumanMessage(content="hi", id="h-1"), ai]},
{"messages": [HumanMessage(content="hi", id="h-1"), ai]}, # duplicate
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t4"))
msg_events = _ai_events(events)
assert len(msg_events) == 1
def test_auto_thread_id(self, client):
"""stream() auto-generates a thread_id if not provided."""
agent = _make_agent_mock([{"messages": [AIMessage(content="ok", id="ai-1")]}])
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi"))
# Should not raise; end event proves it completed
assert events[-1].type == "end"
def test_messages_mode_emits_token_deltas(self, client):
"""stream() forwards LangGraph ``messages`` mode chunks as delta events.
Regression for bytedance/deer-flow#1969 — before the fix the client
only subscribed to ``values`` mode, so LLM output was delivered as
a single cumulative dump after each graph node finished instead of
token-by-token deltas as the model generated them.
"""
# Three AI chunks sharing the same id, followed by a terminal
# values snapshot with the fully assembled message — this matches
# the shape LangGraph emits when ``stream_mode`` includes both
# ``messages`` and ``values``.
assembled = AIMessage(content="Hel lo world!", id="ai-1", usage_metadata={"input_tokens": 3, "output_tokens": 4, "total_tokens": 7})
agent = MagicMock()
agent.stream.return_value = iter(
[
("messages", (AIMessageChunk(content="Hel", id="ai-1"), {})),
("messages", (AIMessageChunk(content=" lo ", id="ai-1"), {})),
(
"messages",
(
AIMessageChunk(
content="world!",
id="ai-1",
usage_metadata={"input_tokens": 3, "output_tokens": 4, "total_tokens": 7},
),
{},
),
),
("values", {"messages": [HumanMessage(content="hi", id="h-1"), assembled]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-stream"))
# Three delta messages-tuple events, all with the same id, each
# carrying only its own delta (not cumulative).
ai_text_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and e.data.get("content")]
assert [e.data["content"] for e in ai_text_events] == ["Hel", " lo ", "world!"]
assert all(e.data["id"] == "ai-1" for e in ai_text_events)
# The values snapshot MUST NOT re-synthesize an AI text event for
# the already-streamed id (otherwise consumers see duplicated text).
assert len(ai_text_events) == 3
# Usage metadata attached only to the chunk that actually carried
# it, and counted into cumulative usage exactly once (the values
# snapshot's duplicate usage on the assembled AIMessage must not
# be double-counted).
events_with_usage = [e for e in ai_text_events if "usage_metadata" in e.data]
assert len(events_with_usage) == 1
assert events_with_usage[0].data["usage_metadata"] == {"input_tokens": 3, "output_tokens": 4, "total_tokens": 7}
end_event = events[-1]
assert end_event.type == "end"
assert end_event.data["usage"] == {"input_tokens": 3, "output_tokens": 4, "total_tokens": 7}
# The values snapshot itself is still emitted.
assert any(e.type == "values" for e in events)
# stream_mode includes ``messages`` — the whole point of this fix.
call_kwargs = agent.stream.call_args.kwargs
assert "messages" in call_kwargs["stream_mode"]
def test_chat_accumulates_streamed_deltas(self, client):
"""chat() concatenates per-id deltas from messages mode."""
agent = MagicMock()
agent.stream.return_value = iter(
[
("messages", (AIMessageChunk(content="Hel", id="ai-1"), {})),
("messages", (AIMessageChunk(content="lo ", id="ai-1"), {})),
("messages", (AIMessageChunk(content="world!", id="ai-1"), {})),
("values", {"messages": [HumanMessage(content="hi", id="h-1"), AIMessage(content="Hello world!", id="ai-1")]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("hi", thread_id="t-chat-stream")
assert result == "Hello world!"
def test_messages_mode_tool_message(self, client):
"""stream() forwards ToolMessage chunks from messages mode."""
agent = MagicMock()
agent.stream.return_value = iter(
[
(
"messages",
(
ToolMessage(content="file.txt", id="tm-1", tool_call_id="tc-1", name="bash"),
{},
),
),
("values", {"messages": [HumanMessage(content="ls", id="h-1"), ToolMessage(content="file.txt", id="tm-1", tool_call_id="tc-1", name="bash")]}),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("ls", thread_id="t-tool-stream"))
tool_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "tool"]
# The tool result must be delivered exactly once (from messages
# mode), not duplicated by the values-snapshot synthesis path.
assert len(tool_events) == 1
assert tool_events[0].data["content"] == "file.txt"
assert tool_events[0].data["name"] == "bash"
assert tool_events[0].data["tool_call_id"] == "tc-1"
def test_list_content_blocks(self, client):
"""stream() handles AIMessage with list-of-blocks content."""
ai = AIMessage(
content=[
{"type": "thinking", "thinking": "hmm"},
{"type": "text", "text": "result"},
],
id="ai-1",
)
chunks = [{"messages": [ai]}]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t5"))
msg_events = _ai_events(events)
assert len(msg_events) == 1
assert msg_events[0].data["content"] == "result"
# ------------------------------------------------------------------
# Refactor regression guards (PR #1974 follow-up safety)
#
# The three tests below are not bug-fix tests — they exist to lock
# the *exact* contract of stream() so a future refactor (e.g. moving
# to ``agent.astream()``, sharing a core with Gateway's run_agent,
# changing the dedup strategy) cannot silently change behavior.
# ------------------------------------------------------------------
def test_dedup_requires_messages_before_values_invariant(self, client):
"""Canary: locks the order-dependence of cross-mode dedup.
``streamed_ids`` is populated only by the ``messages`` branch.
If a ``values`` snapshot arrives BEFORE its corresponding
``messages`` chunks for the same id, the values path falls
through and synthesizes its own AI text event, then the
messages chunk emits another delta — consumers see the same
id twice.
Under normal LangGraph operation this never happens (messages
chunks are emitted during LLM streaming, the values snapshot
after the node completes), so the implicit invariant is safe
in production. This test exists as a tripwire for refactors
that switch to ``agent.astream()`` or share a core with
Gateway: if the ordering ever changes, this test fails and
forces the refactor to either (a) preserve the ordering or
(b) deliberately re-baseline to a stronger order-independent
dedup contract — and document the new contract here.
"""
agent = MagicMock()
agent.stream.return_value = iter(
[
# values arrives FIRST — streamed_ids still empty.
("values", {"messages": [HumanMessage(content="hi", id="h-1"), AIMessage(content="Hello", id="ai-1")]}),
# messages chunk for the same id arrives SECOND.
("messages", (AIMessageChunk(content="Hello", id="ai-1"), {})),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-order-canary"))
ai_text_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and e.data.get("content")]
# Current behavior: 2 events (values synthesis + messages delta).
# If a refactor makes dedup order-independent, this becomes 1 —
# update the assertion AND the docstring above to record the
# new contract, do not silently fix this number.
assert len(ai_text_events) == 2
assert all(e.data["id"] == "ai-1" for e in ai_text_events)
assert [e.data["content"] for e in ai_text_events] == ["Hello", "Hello"]
def test_messages_mode_golden_event_sequence(self, client):
"""Locks the **exact** event sequence for a canonical streaming turn.
This is a strong regression guard: any future refactor that
changes the order, type, or shape of emitted events fails this
test with a clear list-equality diff, forcing either a
preserved sequence or a deliberate re-baseline.
Input shape:
messages chunk 1 — text "Hel", no usage
messages chunk 2 — text "lo", with cumulative usage
values snapshot — assembled AIMessage with same usage
Locked behavior:
* Two messages-tuple AI text events (one per chunk), each
carrying ONLY its own delta — not cumulative.
* ``usage_metadata`` attached only to the chunk that
delivered it (not the first chunk).
* The values event is still emitted, but its embedded
``messages`` list is the *serialized* form — no
synthesized messages-tuple events for the already-
streamed id.
* ``end`` event carries cumulative usage counted exactly
once across both modes.
"""
# Inline the usage literal at construction sites so Pyright can
# narrow ``dict[str, int]`` to ``UsageMetadata`` (TypedDict
# narrowing only works on literals, not on bound variables).
# The local ``usage`` is reused only for assertion comparisons
# below, where structural dict equality is sufficient.
usage = {"input_tokens": 3, "output_tokens": 2, "total_tokens": 5}
agent = MagicMock()
agent.stream.return_value = iter(
[
("messages", (AIMessageChunk(content="Hel", id="ai-1"), {})),
("messages", (AIMessageChunk(content="lo", id="ai-1", usage_metadata={"input_tokens": 3, "output_tokens": 2, "total_tokens": 5}), {})),
(
"values",
{
"messages": [
HumanMessage(content="hi", id="h-1"),
AIMessage(content="Hello", id="ai-1", usage_metadata={"input_tokens": 3, "output_tokens": 2, "total_tokens": 5}),
]
},
),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-golden"))
actual = [(e.type, e.data) for e in events]
expected = [
("messages-tuple", {"type": "ai", "content": "Hel", "id": "ai-1"}),
("messages-tuple", {"type": "ai", "content": "lo", "id": "ai-1", "usage_metadata": usage}),
(
"values",
{
"title": None,
"messages": [
{"type": "human", "content": "hi", "id": "h-1"},
{"type": "ai", "content": "Hello", "id": "ai-1", "usage_metadata": usage},
],
"artifacts": [],
},
),
("end", {"usage": usage}),
]
assert actual == expected
def test_chat_accumulates_in_linear_time(self, client):
"""``chat()`` must use a non-quadratic accumulation strategy.
PR #1974 commit 2 replaced ``buffer = buffer + delta`` with
``list[str].append`` + ``"".join`` to fix an O(n²) regression
introduced in commit 1. This test guards against a future
refactor accidentally restoring the quadratic path.
Threshold rationale (10,000 single-char chunks, 1 second):
* Current O(n) implementation: ~50-200 ms total, including
all mock + event yield overhead.
* O(n²) regression at n=10,000: chat accumulation alone
becomes ~500 ms-2 s (50 M character copies), reliably
over the bound on any reasonable CI.
If this test ever flakes on slow CI, do NOT raise the threshold
blindly — first confirm the implementation still uses
``"".join``, then consider whether the test should move to a
benchmark suite that excludes mock overhead.
"""
import time
n = 10_000
chunks: list = [("messages", (AIMessageChunk(content="x", id="ai-1"), {})) for _ in range(n)]
chunks.append(
(
"values",
{
"messages": [
HumanMessage(content="go", id="h-1"),
AIMessage(content="x" * n, id="ai-1"),
]
},
)
)
agent = MagicMock()
agent.stream.return_value = iter(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
start = time.monotonic()
result = client.chat("go", thread_id="t-perf")
elapsed = time.monotonic() - start
assert result == "x" * n
assert elapsed < 1.0, f"chat() took {elapsed:.3f}s for {n} chunks — possible O(n^2) regression (see PR #1974 commit 2 for the original fix)"
def test_none_id_chunks_produce_duplicates_known_limitation(self, client):
"""Documents a known dedup limitation: ``messages`` chunks with ``id=None``.
Some LLM providers (vLLM, certain custom backends) emit
``AIMessageChunk`` instances without an ``id``. In that case
the cross-mode dedup machinery cannot record the chunk in
``streamed_ids`` (the implementation guards on ``if msg_id``
before adding), and a subsequent ``values`` snapshot whose
reassembled ``AIMessage`` carries a real id will fall through
the dedup check and synthesize a second AI text event for the
same logical message — consumers see duplicated text.
Why this is documented rather than fixed
----------------------------------------
Falling back to ``metadata.get("id")`` does **not** help:
LangGraph's messages-mode metadata never carries the message
id (it carries ``langgraph_node`` / ``langgraph_step`` /
``checkpoint_ns`` / ``tags`` etc.). Synthesizing a fallback
like ``f"_synth_{id(msg_chunk)}"`` only helps if the values
snapshot uses the same fallback, which it does not. A real
fix requires either provider cooperation (always emit chunk
ids — out of scope for this PR) or content-based dedup (risks
false positives for two distinct short messages with identical
text).
This test makes the limitation **explicit and discoverable**
so a future contributor debugging "duplicate text in vLLM
streaming" finds the answer immediately. If a real fix lands,
replace this test with a positive assertion that dedup works
for the None-id case.
See PR #1974 Copilot review comment on ``client.py:515``.
"""
agent = MagicMock()
agent.stream.return_value = iter(
[
# Realistic shape: chunk has no id (provider didn't set one),
# values snapshot's reassembled AIMessage has a fresh id
# assigned somewhere downstream (langgraph or middleware).
("messages", (AIMessageChunk(content="Hello", id=None), {})),
(
"values",
{
"messages": [
HumanMessage(content="hi", id="h-1"),
AIMessage(content="Hello", id="ai-1"),
]
},
),
]
)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-none-id-limitation"))
ai_text_events = [e for e in events if e.type == "messages-tuple" and e.data.get("type") == "ai" and e.data.get("content")]
# KNOWN LIMITATION: 2 events for the same logical message.
# 1) from messages chunk (id=None, NOT added to streamed_ids
# because of ``if msg_id:`` guard at client.py line ~522)
# 2) from values-snapshot synthesis (ai-1 not in streamed_ids,
# so the skip-branch at line ~549 doesn't trigger)
# If this becomes 1, someone fixed the limitation — update this
# test to a positive assertion and document the fix.
assert len(ai_text_events) == 2
assert ai_text_events[0].data["id"] is None
assert ai_text_events[1].data["id"] == "ai-1"
assert all(e.data["content"] == "Hello" for e in ai_text_events)
class TestChat:
def test_returns_last_message(self, client):
"""chat() returns the last AI message text."""
ai1 = AIMessage(content="thinking...", id="ai-1")
ai2 = AIMessage(content="final answer", id="ai-2")
chunks = [
{"messages": [HumanMessage(content="q", id="h-1"), ai1]},
{"messages": [HumanMessage(content="q", id="h-1"), ai1, ai2]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("q", thread_id="t6")
assert result == "final answer"
def test_empty_response(self, client):
"""chat() returns empty string if no AI message produced."""
chunks = [{"messages": []}]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("q", thread_id="t7")
assert result == ""
# ---------------------------------------------------------------------------
# _extract_text
# ---------------------------------------------------------------------------
class TestExtractText:
def test_string(self):
assert DeerFlowClient._extract_text("hello") == "hello"
def test_list_text_blocks(self):
content = [
{"type": "text", "text": "first"},
{"type": "thinking", "thinking": "skip"},
{"type": "text", "text": "second"},
]
assert DeerFlowClient._extract_text(content) == "first\nsecond"
def test_list_plain_strings(self):
assert DeerFlowClient._extract_text(["a", "b"]) == "a\nb"
def test_empty_list(self):
assert DeerFlowClient._extract_text([]) == ""
def test_other_type(self):
assert DeerFlowClient._extract_text(42) == "42"
# ---------------------------------------------------------------------------
# _ensure_agent
# ---------------------------------------------------------------------------
class TestEnsureAgent:
def test_creates_agent(self, client):
"""_ensure_agent creates an agent on first call."""
mock_agent = MagicMock()
config = client._get_runnable_config("t1")
with (
patch("deerflow.client.create_chat_model"),
patch("deerflow.client.create_agent", return_value=mock_agent),
patch("deerflow.client._build_middlewares", return_value=[]) as mock_build_middlewares,
patch("deerflow.client.apply_prompt_template", return_value="prompt") as mock_apply_prompt,
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.runtime.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._agent_name = "custom-agent"
client._available_skills = {"test_skill"}
client._ensure_agent(config)
assert client._agent is mock_agent
# Verify agent_name propagation
mock_build_middlewares.assert_called_once()
assert mock_build_middlewares.call_args.kwargs.get("agent_name") == "custom-agent"
mock_apply_prompt.assert_called_once()
assert mock_apply_prompt.call_args.kwargs.get("agent_name") == "custom-agent"
assert mock_apply_prompt.call_args.kwargs.get("available_skills") == {"test_skill"}
def test_uses_default_checkpointer_when_available(self, client):
mock_agent = MagicMock()
mock_checkpointer = MagicMock()
config = client._get_runnable_config("t1")
with (
patch("deerflow.client.create_chat_model"),
patch("deerflow.client.create_agent", return_value=mock_agent) as mock_create_agent,
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.runtime.checkpointer.get_checkpointer", return_value=mock_checkpointer),
):
client._ensure_agent(config)
assert mock_create_agent.call_args.kwargs["checkpointer"] is mock_checkpointer
def test_injects_custom_middlewares(self, client):
mock_agent = MagicMock()
mock_custom_middleware = MagicMock()
client._middlewares = [mock_custom_middleware]
config = client._get_runnable_config("t1")
mock_clarification = MagicMock()
mock_clarification.__class__.__name__ = "ClarificationMiddleware"
def fake_build_middlewares(*args, **kwargs):
custom = kwargs.get("custom_middlewares") or []
return [MagicMock()] + custom + [mock_clarification]
with (
patch("deerflow.client.create_chat_model"),
patch("deerflow.client.create_agent", return_value=mock_agent) as mock_create_agent,
patch("deerflow.client._build_middlewares", side_effect=fake_build_middlewares),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.runtime.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config)
called_middlewares = mock_create_agent.call_args.kwargs["middleware"]
assert len(called_middlewares) == 3
assert called_middlewares[-2] is mock_custom_middleware
assert called_middlewares[-1] is mock_clarification
def test_skips_default_checkpointer_when_unconfigured(self, client):
mock_agent = MagicMock()
config = client._get_runnable_config("t1")
with (
patch("deerflow.client.create_chat_model"),
patch("deerflow.client.create_agent", return_value=mock_agent) as mock_create_agent,
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.runtime.checkpointer.get_checkpointer", return_value=None),
):
client._ensure_agent(config)
assert "checkpointer" not in mock_create_agent.call_args.kwargs
def test_reuses_agent_same_config(self, client):
"""_ensure_agent does not recreate if config key unchanged."""
mock_agent = MagicMock()
client._agent = mock_agent
client._agent_config_key = (None, True, False, False, None, None)
config = client._get_runnable_config("t1")
client._ensure_agent(config)
# Should still be the same mock — no recreation
assert client._agent is mock_agent
# ---------------------------------------------------------------------------
# get_model
# ---------------------------------------------------------------------------
class TestGetModel:
def test_found(self, client):
model_cfg = MagicMock()
model_cfg.name = "test-model"
model_cfg.model = "test-model"
model_cfg.display_name = "Test Model"
model_cfg.description = "A test model"
model_cfg.supports_thinking = True
model_cfg.supports_reasoning_effort = True
client._app_config.get_model_config.return_value = model_cfg
result = client.get_model("test-model")
assert result == {
"name": "test-model",
"model": "test-model",
"display_name": "Test Model",
"description": "A test model",
"supports_thinking": True,
"supports_reasoning_effort": True,
}
def test_not_found(self, client):
client._app_config.get_model_config.return_value = None
assert client.get_model("nonexistent") is None
# ---------------------------------------------------------------------------
# Thread Queries (list_threads / get_thread)
# ---------------------------------------------------------------------------
class TestThreadQueries:
def _make_mock_checkpoint_tuple(
self,
thread_id: str,
checkpoint_id: str,
ts: str,
title: str | None = None,
parent_id: str | None = None,
messages: list = None,
pending_writes: list = None,
):
cp = MagicMock()
cp.config = {"configurable": {"thread_id": thread_id, "checkpoint_id": checkpoint_id}}
channel_values = {}
if title is not None:
channel_values["title"] = title
if messages is not None:
channel_values["messages"] = messages
cp.checkpoint = {"ts": ts, "channel_values": channel_values}
cp.metadata = {"source": "test"}
if parent_id:
cp.parent_config = {"configurable": {"thread_id": thread_id, "checkpoint_id": parent_id}}
else:
cp.parent_config = {}
cp.pending_writes = pending_writes or []
return cp
def test_list_threads_empty(self, client):
mock_checkpointer = MagicMock()
mock_checkpointer.list.return_value = []
client._checkpointer = mock_checkpointer
result = client.list_threads()
assert result == {"thread_list": []}
mock_checkpointer.list.assert_called_once_with(config=None, limit=10)
def test_list_threads_basic(self, client):
mock_checkpointer = MagicMock()
client._checkpointer = mock_checkpointer
cp1 = self._make_mock_checkpoint_tuple("t1", "c1", "2023-01-01T10:00:00Z", title="Thread 1")
cp2 = self._make_mock_checkpoint_tuple("t1", "c2", "2023-01-01T10:05:00Z", title="Thread 1 Updated")
cp3 = self._make_mock_checkpoint_tuple("t2", "c3", "2023-01-02T10:00:00Z", title="Thread 2")
cp_empty = self._make_mock_checkpoint_tuple("", "c4", "2023-01-03T10:00:00Z", title="Thread Empty")
# Mock list returns out of order to test the timestamp sorting/comparison
# Also includes a checkpoint with an empty thread_id which should be skipped
mock_checkpointer.list.return_value = [cp2, cp1, cp_empty, cp3]
result = client.list_threads(limit=5)
mock_checkpointer.list.assert_called_once_with(config=None, limit=5)
threads = result["thread_list"]
assert len(threads) == 2
# t2 should be first because its created_at (2023-01-02) is newer than t1 (2023-01-01)
assert threads[0]["thread_id"] == "t2"
assert threads[0]["created_at"] == "2023-01-02T10:00:00Z"
assert threads[0]["title"] == "Thread 2"
assert threads[1]["thread_id"] == "t1"
assert threads[1]["created_at"] == "2023-01-01T10:00:00Z"
assert threads[1]["updated_at"] == "2023-01-01T10:05:00Z"
assert threads[1]["latest_checkpoint_id"] == "c2"
assert threads[1]["title"] == "Thread 1 Updated"
def test_list_threads_fallback_checkpointer(self, client):
mock_checkpointer = MagicMock()
mock_checkpointer.list.return_value = []
with patch("deerflow.runtime.checkpointer.provider.get_checkpointer", return_value=mock_checkpointer):
# No internal checkpointer, should fetch from provider
result = client.list_threads()
assert result == {"thread_list": []}
mock_checkpointer.list.assert_called_once()
def test_get_thread(self, client):
mock_checkpointer = MagicMock()
client._checkpointer = mock_checkpointer
msg1 = HumanMessage(content="Hello", id="m1")
msg2 = AIMessage(content="Hi there", id="m2")
cp1 = self._make_mock_checkpoint_tuple("t1", "c1", "2023-01-01T10:00:00Z", messages=[msg1])
cp2 = self._make_mock_checkpoint_tuple("t1", "c2", "2023-01-01T10:01:00Z", parent_id="c1", messages=[msg1, msg2], pending_writes=[("task_1", "messages", {"text": "pending"})])
cp3_no_ts = self._make_mock_checkpoint_tuple("t1", "c3", None)
# checkpointer.list yields in reverse time or random order, test sorting
mock_checkpointer.list.return_value = [cp2, cp1, cp3_no_ts]
result = client.get_thread("t1")
mock_checkpointer.list.assert_called_once_with({"configurable": {"thread_id": "t1"}})
assert result["thread_id"] == "t1"
checkpoints = result["checkpoints"]
assert len(checkpoints) == 3
# None timestamp remains None but is sorted first via a fallback key
assert checkpoints[0]["checkpoint_id"] == "c3"
assert checkpoints[0]["ts"] is None
# Should be sorted by timestamp globally
assert checkpoints[1]["checkpoint_id"] == "c1"
assert checkpoints[1]["ts"] == "2023-01-01T10:00:00Z"
assert len(checkpoints[1]["values"]["messages"]) == 1
assert checkpoints[2]["checkpoint_id"] == "c2"
assert checkpoints[2]["parent_checkpoint_id"] == "c1"
assert checkpoints[2]["ts"] == "2023-01-01T10:01:00Z"
assert len(checkpoints[2]["values"]["messages"]) == 2
# Verify message serialization
assert checkpoints[2]["values"]["messages"][1]["content"] == "Hi there"
# Verify pending writes
assert len(checkpoints[2]["pending_writes"]) == 1
assert checkpoints[2]["pending_writes"][0]["task_id"] == "task_1"
assert checkpoints[2]["pending_writes"][0]["channel"] == "messages"
def test_get_thread_fallback_checkpointer(self, client):
mock_checkpointer = MagicMock()
mock_checkpointer.list.return_value = []
with patch("deerflow.runtime.checkpointer.provider.get_checkpointer", return_value=mock_checkpointer):
result = client.get_thread("t99")
assert result["thread_id"] == "t99"
assert result["checkpoints"] == []
mock_checkpointer.list.assert_called_once_with({"configurable": {"thread_id": "t99"}})
# ---------------------------------------------------------------------------
# MCP config
# ---------------------------------------------------------------------------
class TestMcpConfig:
def test_get_mcp_config(self, client):
server = MagicMock()
server.model_dump.return_value = {"enabled": True, "type": "stdio"}
ext_config = MagicMock()
ext_config.mcp_servers = {"github": server}
with patch("deerflow.client.get_extensions_config", return_value=ext_config):
result = client.get_mcp_config()
assert "mcp_servers" in result
assert "github" in result["mcp_servers"]
assert result["mcp_servers"]["github"]["enabled"] is True
def test_update_mcp_config(self, client):
# Set up current config with skills
current_config = MagicMock()
current_config.skills = {}
reloaded_server = MagicMock()
reloaded_server.model_dump.return_value = {"enabled": True, "type": "sse"}
reloaded_config = MagicMock()
reloaded_config.mcp_servers = {"new-server": reloaded_server}
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
json.dump({}, f)
tmp_path = Path(f.name)
try:
# Pre-set agent to verify it gets invalidated
client._agent = MagicMock()
with (
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=tmp_path),
patch("deerflow.client.get_extensions_config", return_value=current_config),
patch("deerflow.client.reload_extensions_config", return_value=reloaded_config),
):
result = client.update_mcp_config({"new-server": {"enabled": True, "type": "sse"}})
assert "mcp_servers" in result
assert "new-server" in result["mcp_servers"]
assert client._agent is None # M2: agent invalidated
# Verify file was actually written
with open(tmp_path) as f:
saved = json.load(f)
assert "mcpServers" in saved
finally:
tmp_path.unlink()
# ---------------------------------------------------------------------------
# Skills management
# ---------------------------------------------------------------------------
class TestSkillsManagement:
def _make_skill(self, name="test-skill", enabled=True):
s = MagicMock()
s.name = name
s.description = "A test skill"
s.license = "MIT"
s.category = "public"
s.enabled = enabled
return s
def test_get_skill_found(self, client):
skill = self._make_skill()
with patch("deerflow.skills.loader.load_skills", return_value=[skill]):
result = client.get_skill("test-skill")
assert result is not None
assert result["name"] == "test-skill"
def test_get_skill_not_found(self, client):
with patch("deerflow.skills.loader.load_skills", return_value=[]):
result = client.get_skill("nonexistent")
assert result is None
def test_update_skill(self, client):
skill = self._make_skill(enabled=True)
updated_skill = self._make_skill(enabled=False)
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
with tempfile.NamedTemporaryFile(mode="w", suffix=".json", delete=False) as f:
json.dump({}, f)
tmp_path = Path(f.name)
try:
# Pre-set agent to verify it gets invalidated
client._agent = MagicMock()
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], [updated_skill]]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=tmp_path),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
result = client.update_skill("test-skill", enabled=False)
assert result["enabled"] is False
assert client._agent is None # M2: agent invalidated
finally:
tmp_path.unlink()
def test_update_skill_not_found(self, client):
with patch("deerflow.skills.loader.load_skills", return_value=[]):
with pytest.raises(ValueError, match="not found"):
client.update_skill("nonexistent", enabled=True)
def test_install_skill(self, client):
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
# Create a valid .skill archive
skill_dir = tmp_path / "my-skill"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: my-skill\ndescription: A skill\n---\nContent")
archive_path = tmp_path / "my-skill.skill"
with zipfile.ZipFile(archive_path, "w") as zf:
zf.write(skill_dir / "SKILL.md", "my-skill/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive_path)
assert result["success"] is True
assert result["skill_name"] == "my-skill"
assert (skills_root / "custom" / "my-skill").exists()
def test_install_skill_not_found(self, client):
with pytest.raises(FileNotFoundError):
client.install_skill("/nonexistent/path.skill")
def test_install_skill_bad_extension(self, client):
with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as f:
tmp_path = Path(f.name)
try:
with pytest.raises(ValueError, match=".skill extension"):
client.install_skill(tmp_path)
finally:
tmp_path.unlink()
# ---------------------------------------------------------------------------
# Memory management
# ---------------------------------------------------------------------------
class TestMemoryManagement:
def test_import_memory(self, client):
imported = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.import_memory_data", return_value=imported) as mock_import:
result = client.import_memory(imported)
assert mock_import.call_count == 1
call_args = mock_import.call_args
assert call_args.args == (imported,)
assert "user_id" in call_args.kwargs
assert result == imported
def test_reload_memory(self, client):
data = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.reload_memory_data", return_value=data):
result = client.reload_memory()
assert result == data
def test_clear_memory(self, client):
data = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.clear_memory_data", return_value=data):
result = client.clear_memory()
assert result == data
def test_create_memory_fact(self, client):
data = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.create_memory_fact", return_value=data) as create_fact:
result = client.create_memory_fact(
"User prefers concise code reviews.",
category="preference",
confidence=0.88,
)
create_fact.assert_called_once_with(
content="User prefers concise code reviews.",
category="preference",
confidence=0.88,
)
assert result == data
def test_delete_memory_fact(self, client):
data = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.delete_memory_fact", return_value=data) as delete_fact:
result = client.delete_memory_fact("fact_123")
delete_fact.assert_called_once_with("fact_123")
assert result == data
def test_update_memory_fact(self, client):
data = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.update_memory_fact", return_value=data) as update_fact:
result = client.update_memory_fact(
"fact_123",
"User prefers spaces",
category="workflow",
confidence=0.91,
)
update_fact.assert_called_once_with(
fact_id="fact_123",
content="User prefers spaces",
category="workflow",
confidence=0.91,
)
assert result == data
def test_update_memory_fact_preserves_omitted_fields(self, client):
data = {"version": "1.0", "facts": []}
with patch("deerflow.agents.memory.updater.update_memory_fact", return_value=data) as update_fact:
result = client.update_memory_fact(
"fact_123",
"User prefers spaces",
)
update_fact.assert_called_once_with(
fact_id="fact_123",
content="User prefers spaces",
category=None,
confidence=None,
)
assert result == data
def test_get_memory_config(self, client):
config = MagicMock()
config.enabled = True
config.storage_path = ".deer-flow/memory.json"
config.debounce_seconds = 30
config.max_facts = 100
config.fact_confidence_threshold = 0.7
config.injection_enabled = True
config.max_injection_tokens = 2000
with patch("deerflow.config.memory_config.get_memory_config", return_value=config):
result = client.get_memory_config()
assert result["enabled"] is True
assert result["max_facts"] == 100
def test_get_memory_status(self, client):
config = MagicMock()
config.enabled = True
config.storage_path = ".deer-flow/memory.json"
config.debounce_seconds = 30
config.max_facts = 100
config.fact_confidence_threshold = 0.7
config.injection_enabled = True
config.max_injection_tokens = 2000
data = {"version": "1.0", "facts": []}
with (
patch("deerflow.config.memory_config.get_memory_config", return_value=config),
patch("deerflow.agents.memory.updater.get_memory_data", return_value=data),
):
result = client.get_memory_status()
assert "config" in result
assert "data" in result
# ---------------------------------------------------------------------------
# Uploads
# ---------------------------------------------------------------------------
class TestUploads:
def test_upload_files(self, client):
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
# Create a source file
src_file = tmp_path / "test.txt"
src_file.write_text("hello")
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("thread-1", [src_file])
assert result["success"] is True
assert len(result["files"]) == 1
assert result["files"][0]["filename"] == "test.txt"
assert "artifact_url" in result["files"][0]
assert "message" in result
assert (uploads_dir / "test.txt").exists()
def test_upload_files_not_found(self, client):
with pytest.raises(FileNotFoundError):
client.upload_files("thread-1", ["/nonexistent/file.txt"])
def test_upload_files_rejects_directory_path(self, client):
with tempfile.TemporaryDirectory() as tmp:
with pytest.raises(ValueError, match="Path is not a file"):
client.upload_files("thread-1", [tmp])
def test_upload_files_reuses_single_executor_inside_event_loop(self, client):
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
first = tmp_path / "first.pdf"
second = tmp_path / "second.pdf"
first.write_bytes(b"%PDF-1.4 first")
second.write_bytes(b"%PDF-1.4 second")
created_executors = []
real_executor_cls = concurrent.futures.ThreadPoolExecutor
async def fake_convert(path: Path) -> Path:
md_path = path.with_suffix(".md")
md_path.write_text(f"converted {path.name}")
return md_path
class FakeExecutor:
def __init__(self, max_workers: int):
self.max_workers = max_workers
self.shutdown_calls = []
self._executor = real_executor_cls(max_workers=max_workers)
created_executors.append(self)
def submit(self, fn, *args, **kwargs):
return self._executor.submit(fn, *args, **kwargs)
def shutdown(self, wait: bool = True):
self.shutdown_calls.append(wait)
self._executor.shutdown(wait=wait)
async def call_upload() -> dict:
return client.upload_files("thread-async", [first, second])
with (
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=fake_convert),
patch("concurrent.futures.ThreadPoolExecutor", FakeExecutor),
):
result = asyncio.run(call_upload())
assert result["success"] is True
assert len(result["files"]) == 2
assert len(created_executors) == 1
assert created_executors[0].max_workers == 1
assert created_executors[0].shutdown_calls == [True]
assert result["files"][0]["markdown_file"] == "first.md"
assert result["files"][1]["markdown_file"] == "second.md"
def test_list_uploads(self, client):
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp)
(uploads_dir / "a.txt").write_text("a")
(uploads_dir / "b.txt").write_text("bb")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.list_uploads("thread-1")
assert result["count"] == 2
assert len(result["files"]) == 2
names = {f["filename"] for f in result["files"]}
assert names == {"a.txt", "b.txt"}
# Verify artifact_url is present
for f in result["files"]:
assert "artifact_url" in f
def test_delete_upload(self, client):
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp)
(uploads_dir / "delete-me.txt").write_text("gone")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.delete_upload("thread-1", "delete-me.txt")
assert result["success"] is True
assert "delete-me.txt" in result["message"]
assert not (uploads_dir / "delete-me.txt").exists()
def test_delete_upload_not_found(self, client):
with tempfile.TemporaryDirectory() as tmp:
with patch("deerflow.client.get_uploads_dir", return_value=Path(tmp)):
with pytest.raises(FileNotFoundError):
client.delete_upload("thread-1", "nope.txt")
def test_delete_upload_path_traversal(self, client):
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp)
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
with pytest.raises(PathTraversalError):
client.delete_upload("thread-1", "../../etc/passwd")
# ---------------------------------------------------------------------------
# Artifacts
# ---------------------------------------------------------------------------
class TestArtifacts:
def test_get_artifact(self, client):
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
user_id = get_effective_user_id()
outputs = paths.sandbox_outputs_dir("t1", user_id=user_id)
outputs.mkdir(parents=True)
(outputs / "result.txt").write_text("artifact content")
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t1", "mnt/user-data/outputs/result.txt")
assert content == b"artifact content"
assert "text" in mime
def test_get_artifact_not_found(self, client):
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
user_id = get_effective_user_id()
paths.sandbox_outputs_dir("t1", user_id=user_id).mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(FileNotFoundError):
client.get_artifact("t1", "mnt/user-data/outputs/nope.txt")
def test_get_artifact_bad_prefix(self, client):
with pytest.raises(ValueError, match="must start with"):
client.get_artifact("t1", "bad/path/file.txt")
def test_get_artifact_path_traversal(self, client):
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
user_id = get_effective_user_id()
paths.sandbox_outputs_dir("t1", user_id=user_id).mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(PathTraversalError):
client.get_artifact("t1", "mnt/user-data/../../../etc/passwd")
# ===========================================================================
# Scenario-based integration tests
# ===========================================================================
# These tests simulate realistic user workflows end-to-end, exercising
# multiple methods in sequence to verify they compose correctly.
class TestScenarioMultiTurnConversation:
"""Scenario: User has a multi-turn conversation within a single thread."""
def test_two_turn_conversation(self, client):
"""Two sequential chat() calls on the same thread_id produce
independent results (without checkpointer, each call is stateless)."""
ai1 = AIMessage(content="I'm a helpful assistant.", id="ai-1")
ai2 = AIMessage(content="Python is great!", id="ai-2")
agent = MagicMock()
agent.stream.side_effect = [
iter([{"messages": [HumanMessage(content="who are you?", id="h-1"), ai1]}]),
iter([{"messages": [HumanMessage(content="what language?", id="h-2"), ai2]}]),
]
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
r1 = client.chat("who are you?", thread_id="thread-multi")
r2 = client.chat("what language?", thread_id="thread-multi")
assert r1 == "I'm a helpful assistant."
assert r2 == "Python is great!"
assert agent.stream.call_count == 2
def test_stream_collects_all_event_types_across_turns(self, client):
"""A full turn emits messages-tuple (tool_call, tool_result, ai text) + values + end."""
ai_tc = AIMessage(
content="",
id="ai-1",
tool_calls=[
{"name": "web_search", "args": {"query": "LangGraph"}, "id": "tc-1"},
],
)
tool_r = ToolMessage(content="LangGraph is a framework...", id="tm-1", tool_call_id="tc-1", name="web_search")
ai_final = AIMessage(content="LangGraph is a framework for building agents.", id="ai-2")
chunks = [
{"messages": [HumanMessage(content="search", id="h-1"), ai_tc]},
{"messages": [HumanMessage(content="search", id="h-1"), ai_tc, tool_r]},
{"messages": [HumanMessage(content="search", id="h-1"), ai_tc, tool_r, ai_final], "title": "LangGraph Search"},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("search", thread_id="t-full"))
# Verify expected event types
types = set(e.type for e in events)
assert types == {"messages-tuple", "values", "end"}
assert events[-1].type == "end"
# Verify tool_call data
tc_events = _tool_call_events(events)
assert len(tc_events) == 1
assert tc_events[0].data["tool_calls"][0]["name"] == "web_search"
assert tc_events[0].data["tool_calls"][0]["args"] == {"query": "LangGraph"}
# Verify tool_result data
tr_events = _tool_result_events(events)
assert len(tr_events) == 1
assert tr_events[0].data["tool_call_id"] == "tc-1"
assert "LangGraph" in tr_events[0].data["content"]
# Verify AI text
msg_events = _ai_events(events)
assert any("framework" in e.data["content"] for e in msg_events)
# Verify values event contains title
values_events = [e for e in events if e.type == "values"]
assert any(e.data.get("title") == "LangGraph Search" for e in values_events)
class TestScenarioToolChain:
"""Scenario: Agent chains multiple tool calls in sequence."""
def test_multi_tool_chain(self, client):
"""Agent calls bash → reads output → calls write_file → responds."""
ai_bash = AIMessage(
content="",
id="ai-1",
tool_calls=[
{"name": "bash", "args": {"cmd": "ls /mnt/user-data/workspace"}, "id": "tc-1"},
],
)
bash_result = ToolMessage(content="README.md\nsrc/", id="tm-1", tool_call_id="tc-1", name="bash")
ai_write = AIMessage(
content="",
id="ai-2",
tool_calls=[
{"name": "write_file", "args": {"path": "/mnt/user-data/outputs/listing.txt", "content": "README.md\nsrc/"}, "id": "tc-2"},
],
)
write_result = ToolMessage(content="File written successfully.", id="tm-2", tool_call_id="tc-2", name="write_file")
ai_final = AIMessage(content="I listed the workspace and saved the output.", id="ai-3")
chunks = [
{"messages": [HumanMessage(content="list and save", id="h-1"), ai_bash]},
{"messages": [HumanMessage(content="list and save", id="h-1"), ai_bash, bash_result]},
{"messages": [HumanMessage(content="list and save", id="h-1"), ai_bash, bash_result, ai_write]},
{"messages": [HumanMessage(content="list and save", id="h-1"), ai_bash, bash_result, ai_write, write_result]},
{"messages": [HumanMessage(content="list and save", id="h-1"), ai_bash, bash_result, ai_write, write_result, ai_final]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("list and save", thread_id="t-chain"))
tool_calls = _tool_call_events(events)
tool_results = _tool_result_events(events)
messages = _ai_events(events)
assert len(tool_calls) == 2
assert tool_calls[0].data["tool_calls"][0]["name"] == "bash"
assert tool_calls[1].data["tool_calls"][0]["name"] == "write_file"
assert len(tool_results) == 2
assert len(messages) == 1
assert events[-1].type == "end"
class TestScenarioFileLifecycle:
"""Scenario: Upload files → list them → use in chat → download artifact."""
def test_upload_list_delete_lifecycle(self, client):
"""Upload → list → verify → delete → list again."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
# Create source files
(tmp_path / "report.txt").write_text("quarterly report data")
(tmp_path / "data.csv").write_text("a,b,c\n1,2,3")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
# Step 1: Upload
result = client.upload_files(
"t-lifecycle",
[
tmp_path / "report.txt",
tmp_path / "data.csv",
],
)
assert result["success"] is True
assert len(result["files"]) == 2
assert {f["filename"] for f in result["files"]} == {"report.txt", "data.csv"}
# Step 2: List
listed = client.list_uploads("t-lifecycle")
assert listed["count"] == 2
assert all("virtual_path" in f for f in listed["files"])
# Step 3: Delete one
del_result = client.delete_upload("t-lifecycle", "report.txt")
assert del_result["success"] is True
# Step 4: Verify deletion
listed = client.list_uploads("t-lifecycle")
assert listed["count"] == 1
assert listed["files"][0]["filename"] == "data.csv"
def test_upload_then_read_artifact(self, client):
"""Upload a file, simulate agent producing artifact, read it back."""
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
paths = Paths(base_dir=tmp_path)
user_id = get_effective_user_id()
outputs_dir = paths.sandbox_outputs_dir("t-artifact", user_id=user_id)
outputs_dir.mkdir(parents=True)
# Upload phase
src_file = tmp_path / "input.txt"
src_file.write_text("raw data to process")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
uploaded = client.upload_files("t-artifact", [src_file])
assert len(uploaded["files"]) == 1
# Simulate agent writing an artifact
(outputs_dir / "analysis.json").write_text('{"result": "processed"}')
# Retrieve artifact
with patch("deerflow.client.get_paths", return_value=paths):
content, mime = client.get_artifact("t-artifact", "mnt/user-data/outputs/analysis.json")
assert json.loads(content) == {"result": "processed"}
assert "json" in mime
class TestScenarioConfigManagement:
"""Scenario: Query and update configuration through a management session."""
def test_model_and_skill_discovery(self, client):
"""List models → get specific model → list skills → get specific skill."""
# List models
result = client.list_models()
assert len(result["models"]) >= 1
model_name = result["models"][0]["name"]
# Get specific model
model_cfg = MagicMock()
model_cfg.name = model_name
model_cfg.model = model_name
model_cfg.display_name = None
model_cfg.description = None
model_cfg.supports_thinking = False
model_cfg.supports_reasoning_effort = False
client._app_config.get_model_config.return_value = model_cfg
detail = client.get_model(model_name)
assert detail["name"] == model_name
# List skills
skill = MagicMock()
skill.name = "web-search"
skill.description = "Search the web"
skill.license = "MIT"
skill.category = "public"
skill.enabled = True
with patch("deerflow.skills.loader.load_skills", return_value=[skill]):
skills_result = client.list_skills()
assert len(skills_result["skills"]) == 1
# Get specific skill
with patch("deerflow.skills.loader.load_skills", return_value=[skill]):
detail = client.get_skill("web-search")
assert detail is not None
assert detail["enabled"] is True
def test_mcp_update_then_skill_toggle(self, client):
"""Update MCP config → toggle skill → verify both invalidate agent."""
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "extensions_config.json"
config_file.write_text("{}")
# --- MCP update ---
current_config = MagicMock()
current_config.skills = {}
reloaded_server = MagicMock()
reloaded_server.model_dump.return_value = {"enabled": True, "type": "sse"}
reloaded_config = MagicMock()
reloaded_config.mcp_servers = {"my-mcp": reloaded_server}
client._agent = MagicMock() # Simulate existing agent
with (
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=current_config),
patch("deerflow.client.reload_extensions_config", return_value=reloaded_config),
):
mcp_result = client.update_mcp_config({"my-mcp": {"enabled": True}})
assert "my-mcp" in mcp_result["mcp_servers"]
assert client._agent is None # Agent invalidated
# --- Skill toggle ---
skill = MagicMock()
skill.name = "code-gen"
skill.description = "Generate code"
skill.license = "MIT"
skill.category = "custom"
skill.enabled = True
toggled = MagicMock()
toggled.name = "code-gen"
toggled.description = "Generate code"
toggled.license = "MIT"
toggled.category = "custom"
toggled.enabled = False
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
client._agent = MagicMock() # Simulate re-created agent
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], [toggled]]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
skill_result = client.update_skill("code-gen", enabled=False)
assert skill_result["enabled"] is False
assert client._agent is None # Agent invalidated again
class TestScenarioAgentRecreation:
"""Scenario: Config changes trigger agent recreation at the right times."""
def test_different_model_triggers_rebuild(self, client):
"""Switching model_name between calls forces agent rebuild."""
agents_created = []
def fake_create_agent(**kwargs):
agent = MagicMock()
agents_created.append(agent)
return agent
config_a = client._get_runnable_config("t1", model_name="gpt-4")
config_b = client._get_runnable_config("t1", model_name="claude-3")
with (
patch("deerflow.client.create_chat_model"),
patch("deerflow.client.create_agent", side_effect=fake_create_agent),
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.runtime.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config_a)
first_agent = client._agent
client._ensure_agent(config_b)
second_agent = client._agent
assert len(agents_created) == 2
assert first_agent is not second_agent
def test_same_config_reuses_agent(self, client):
"""Repeated calls with identical config do not rebuild."""
agents_created = []
def fake_create_agent(**kwargs):
agent = MagicMock()
agents_created.append(agent)
return agent
config = client._get_runnable_config("t1", model_name="gpt-4")
with (
patch("deerflow.client.create_chat_model"),
patch("deerflow.client.create_agent", side_effect=fake_create_agent),
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.runtime.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config)
client._ensure_agent(config)
client._ensure_agent(config)
assert len(agents_created) == 1
def test_reset_agent_forces_rebuild(self, client):
"""reset_agent() clears cache, next call rebuilds."""
agents_created = []
def fake_create_agent(**kwargs):
agent = MagicMock()
agents_created.append(agent)
return agent
config = client._get_runnable_config("t1")
with (
patch("deerflow.client.create_chat_model"),
patch("deerflow.client.create_agent", side_effect=fake_create_agent),
patch("deerflow.client._build_middlewares", return_value=[]),
patch("deerflow.client.apply_prompt_template", return_value="prompt"),
patch.object(client, "_get_tools", return_value=[]),
patch("deerflow.runtime.checkpointer.get_checkpointer", return_value=MagicMock()),
):
client._ensure_agent(config)
client.reset_agent()
client._ensure_agent(config)
assert len(agents_created) == 2
def test_per_call_override_triggers_rebuild(self, client):
"""stream() with model_name override creates a different agent config."""
ai = AIMessage(content="ok", id="ai-1")
agent = _make_agent_mock([{"messages": [ai]}])
agents_created = []
def fake_ensure(config):
key = tuple(config.get("configurable", {}).get(k) for k in ["model_name", "thinking_enabled", "is_plan_mode", "subagent_enabled"])
agents_created.append(key)
client._agent = agent
with patch.object(client, "_ensure_agent", side_effect=fake_ensure):
list(client.stream("hi", thread_id="t1"))
list(client.stream("hi", thread_id="t1", model_name="other-model"))
# Two different config keys should have been created
assert len(agents_created) == 2
assert agents_created[0] != agents_created[1]
class TestScenarioThreadIsolation:
"""Scenario: Operations on different threads don't interfere."""
def test_uploads_isolated_per_thread(self, client):
"""Files uploaded to thread-A are not visible in thread-B."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_a = tmp_path / "thread-a" / "uploads"
uploads_b = tmp_path / "thread-b" / "uploads"
uploads_a.mkdir(parents=True)
uploads_b.mkdir(parents=True)
src_file = tmp_path / "secret.txt"
src_file.write_text("thread-a only")
def get_dir(thread_id):
return uploads_a if thread_id == "thread-a" else uploads_b
with patch("deerflow.client.get_uploads_dir", side_effect=get_dir), patch("deerflow.client.ensure_uploads_dir", side_effect=get_dir):
client.upload_files("thread-a", [src_file])
files_a = client.list_uploads("thread-a")
files_b = client.list_uploads("thread-b")
assert files_a["count"] == 1
assert files_b["count"] == 0
def test_artifacts_isolated_per_thread(self, client):
"""Artifacts in thread-A are not accessible from thread-B."""
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
user_id = get_effective_user_id()
outputs_a = paths.sandbox_outputs_dir("thread-a", user_id=user_id)
outputs_a.mkdir(parents=True)
paths.sandbox_outputs_dir("thread-b", user_id=user_id).mkdir(parents=True)
(outputs_a / "result.txt").write_text("thread-a artifact")
with patch("deerflow.client.get_paths", return_value=paths):
content, _ = client.get_artifact("thread-a", "mnt/user-data/outputs/result.txt")
assert content == b"thread-a artifact"
with pytest.raises(FileNotFoundError):
client.get_artifact("thread-b", "mnt/user-data/outputs/result.txt")
class TestScenarioMemoryWorkflow:
"""Scenario: Memory query → reload → status check."""
def test_memory_full_lifecycle(self, client):
"""get_memory → reload → get_status covers the full memory API."""
initial_data = {"version": "1.0", "facts": [{"id": "f1", "content": "User likes Python"}]}
updated_data = {
"version": "1.0",
"facts": [
{"id": "f1", "content": "User likes Python"},
{"id": "f2", "content": "User prefers dark mode"},
],
}
config = MagicMock()
config.enabled = True
config.storage_path = ".deer-flow/memory.json"
config.debounce_seconds = 30
config.max_facts = 100
config.fact_confidence_threshold = 0.7
config.injection_enabled = True
config.max_injection_tokens = 2000
with patch("deerflow.agents.memory.updater.get_memory_data", return_value=initial_data):
mem = client.get_memory()
assert len(mem["facts"]) == 1
with patch("deerflow.agents.memory.updater.reload_memory_data", return_value=updated_data):
refreshed = client.reload_memory()
assert len(refreshed["facts"]) == 2
with (
patch("deerflow.config.memory_config.get_memory_config", return_value=config),
patch("deerflow.agents.memory.updater.get_memory_data", return_value=updated_data),
):
status = client.get_memory_status()
assert status["config"]["enabled"] is True
assert len(status["data"]["facts"]) == 2
class TestScenarioSkillInstallAndUse:
"""Scenario: Install a skill → verify it appears → toggle it."""
def test_install_then_toggle(self, client):
"""Install .skill archive → list to verify → disable → verify disabled."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
# Create .skill archive
skill_src = tmp_path / "my-analyzer"
skill_src.mkdir()
(skill_src / "SKILL.md").write_text("---\nname: my-analyzer\ndescription: Analyze code\nlicense: MIT\n---\nAnalysis skill")
archive = tmp_path / "my-analyzer.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_src / "SKILL.md", "my-analyzer/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
# Step 1: Install
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)
assert result["success"] is True
assert (skills_root / "custom" / "my-analyzer" / "SKILL.md").exists()
# Step 2: List and find it
installed_skill = MagicMock()
installed_skill.name = "my-analyzer"
installed_skill.description = "Analyze code"
installed_skill.license = "MIT"
installed_skill.category = "custom"
installed_skill.enabled = True
with patch("deerflow.skills.loader.load_skills", return_value=[installed_skill]):
skills_result = client.list_skills()
assert any(s["name"] == "my-analyzer" for s in skills_result["skills"])
# Step 3: Disable it
disabled_skill = MagicMock()
disabled_skill.name = "my-analyzer"
disabled_skill.description = "Analyze code"
disabled_skill.license = "MIT"
disabled_skill.category = "custom"
disabled_skill.enabled = False
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
config_file = tmp_path / "extensions_config.json"
config_file.write_text("{}")
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[installed_skill], [disabled_skill]]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
toggled = client.update_skill("my-analyzer", enabled=False)
assert toggled["enabled"] is False
class TestScenarioEdgeCases:
"""Scenario: Edge cases and error boundaries in realistic workflows."""
def test_empty_stream_response(self, client):
"""Agent produces no messages — only values + end events."""
agent = _make_agent_mock([{"messages": []}])
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-empty"))
# values event (empty messages) + end
assert len(events) == 2
assert events[0].type == "values"
assert events[-1].type == "end"
def test_chat_on_empty_response(self, client):
"""chat() returns empty string for no-message response."""
agent = _make_agent_mock([{"messages": []}])
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("hi", thread_id="t-empty-chat")
assert result == ""
def test_multiple_title_changes(self, client):
"""Title changes are carried in values events."""
ai = AIMessage(content="ok", id="ai-1")
chunks = [
{"messages": [ai], "title": "First Title"},
{"messages": [], "title": "First Title"}, # same title repeated
{"messages": [], "title": "Second Title"}, # different title
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-titles"))
# Every chunk produces a values event with the title
values_events = [e for e in events if e.type == "values"]
assert len(values_events) == 3
assert values_events[0].data["title"] == "First Title"
assert values_events[1].data["title"] == "First Title"
assert values_events[2].data["title"] == "Second Title"
def test_concurrent_tool_calls_in_single_message(self, client):
"""Agent produces multiple tool_calls in one AIMessage — emitted as single messages-tuple."""
ai = AIMessage(
content="",
id="ai-1",
tool_calls=[
{"name": "web_search", "args": {"q": "a"}, "id": "tc-1"},
{"name": "web_search", "args": {"q": "b"}, "id": "tc-2"},
{"name": "bash", "args": {"cmd": "echo hi"}, "id": "tc-3"},
],
)
chunks = [{"messages": [ai]}]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("do things", thread_id="t-parallel"))
tc_events = _tool_call_events(events)
assert len(tc_events) == 1 # One messages-tuple event for the AIMessage
tool_calls = tc_events[0].data["tool_calls"]
assert len(tool_calls) == 3
assert {tc["id"] for tc in tool_calls} == {"tc-1", "tc-2", "tc-3"}
def test_upload_convertible_file_conversion_failure(self, client):
"""Upload a .pdf file where conversion fails — file still uploaded, no markdown."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
pdf_file = tmp_path / "doc.pdf"
pdf_file.write_bytes(b"%PDF-1.4 fake content")
with (
patch("deerflow.client.get_uploads_dir", return_value=uploads_dir),
patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir),
patch("deerflow.utils.file_conversion.CONVERTIBLE_EXTENSIONS", {".pdf"}),
patch("deerflow.utils.file_conversion.convert_file_to_markdown", side_effect=Exception("conversion failed")),
):
result = client.upload_files("t-pdf-fail", [pdf_file])
assert result["success"] is True
assert len(result["files"]) == 1
assert result["files"][0]["filename"] == "doc.pdf"
assert "markdown_file" not in result["files"][0] # Conversion failed gracefully
assert (uploads_dir / "doc.pdf").exists() # File still uploaded
# ---------------------------------------------------------------------------
# Gateway conformance — validate client output against Gateway Pydantic models
# ---------------------------------------------------------------------------
class TestGatewayConformance:
"""Validate that DeerFlowClient return dicts conform to Gateway Pydantic response models.
Each test calls a client method, then parses the result through the
corresponding Gateway response model. If the client drifts (missing or
wrong-typed fields), Pydantic raises ``ValidationError`` and CI catches it.
"""
def test_list_models(self, mock_app_config):
model = MagicMock()
model.name = "test-model"
model.model = "gpt-test"
model.display_name = "Test Model"
model.description = "A test model"
model.supports_thinking = False
mock_app_config.models = [model]
with patch("deerflow.client.get_app_config", return_value=mock_app_config):
client = DeerFlowClient()
result = client.list_models()
parsed = ModelsListResponse(**result)
assert len(parsed.models) == 1
assert parsed.models[0].name == "test-model"
assert parsed.models[0].model == "gpt-test"
def test_get_model(self, mock_app_config):
model = MagicMock()
model.name = "test-model"
model.model = "gpt-test"
model.display_name = "Test Model"
model.description = "A test model"
model.supports_thinking = True
mock_app_config.models = [model]
mock_app_config.get_model_config.return_value = model
with patch("deerflow.client.get_app_config", return_value=mock_app_config):
client = DeerFlowClient()
result = client.get_model("test-model")
assert result is not None
parsed = ModelResponse(**result)
assert parsed.name == "test-model"
assert parsed.model == "gpt-test"
def test_list_skills(self, client):
skill = MagicMock()
skill.name = "web-search"
skill.description = "Search the web"
skill.license = "MIT"
skill.category = "public"
skill.enabled = True
with patch("deerflow.skills.loader.load_skills", return_value=[skill]):
result = client.list_skills()
parsed = SkillsListResponse(**result)
assert len(parsed.skills) == 1
assert parsed.skills[0].name == "web-search"
def test_get_skill(self, client):
skill = MagicMock()
skill.name = "web-search"
skill.description = "Search the web"
skill.license = "MIT"
skill.category = "public"
skill.enabled = True
with patch("deerflow.skills.loader.load_skills", return_value=[skill]):
result = client.get_skill("web-search")
assert result is not None
parsed = SkillResponse(**result)
assert parsed.name == "web-search"
def test_install_skill(self, client, tmp_path):
skill_dir = tmp_path / "my-skill"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: my-skill\ndescription: A test skill\n---\nBody\n")
archive = tmp_path / "my-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "my-skill/SKILL.md")
with patch("deerflow.skills.installer.get_skills_root_path", return_value=tmp_path):
result = client.install_skill(archive)
parsed = SkillInstallResponse(**result)
assert parsed.success is True
assert parsed.skill_name == "my-skill"
def test_get_mcp_config(self, client):
server = MagicMock()
server.model_dump.return_value = {
"enabled": True,
"type": "stdio",
"command": "npx",
"args": ["-y", "server"],
"env": {},
"url": None,
"headers": {},
"description": "test server",
}
ext_config = MagicMock()
ext_config.mcp_servers = {"test": server}
with patch("deerflow.client.get_extensions_config", return_value=ext_config):
result = client.get_mcp_config()
parsed = McpConfigResponse(**result)
assert "test" in parsed.mcp_servers
def test_update_mcp_config(self, client, tmp_path):
server = MagicMock()
server.model_dump.return_value = {
"enabled": True,
"type": "stdio",
"command": "npx",
"args": [],
"env": {},
"url": None,
"headers": {},
"description": "",
}
ext_config = MagicMock()
ext_config.mcp_servers = {"srv": server}
ext_config.skills = {}
config_file = tmp_path / "extensions_config.json"
config_file.write_text("{}")
with (
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.reload_extensions_config", return_value=ext_config),
):
result = client.update_mcp_config({"srv": server.model_dump.return_value})
parsed = McpConfigResponse(**result)
assert "srv" in parsed.mcp_servers
def test_upload_files(self, client, tmp_path):
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
src_file = tmp_path / "hello.txt"
src_file.write_text("hello")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-conform", [src_file])
parsed = UploadResponse(**result)
assert parsed.success is True
assert len(parsed.files) == 1
def test_get_memory_config(self, client):
mem_cfg = MagicMock()
mem_cfg.enabled = True
mem_cfg.storage_path = ".deer-flow/memory.json"
mem_cfg.debounce_seconds = 30
mem_cfg.max_facts = 100
mem_cfg.fact_confidence_threshold = 0.7
mem_cfg.injection_enabled = True
mem_cfg.max_injection_tokens = 2000
with patch("deerflow.config.memory_config.get_memory_config", return_value=mem_cfg):
result = client.get_memory_config()
parsed = MemoryConfigResponse(**result)
assert parsed.enabled is True
assert parsed.max_facts == 100
def test_get_memory_status(self, client):
mem_cfg = MagicMock()
mem_cfg.enabled = True
mem_cfg.storage_path = ".deer-flow/memory.json"
mem_cfg.debounce_seconds = 30
mem_cfg.max_facts = 100
mem_cfg.fact_confidence_threshold = 0.7
mem_cfg.injection_enabled = True
mem_cfg.max_injection_tokens = 2000
memory_data = {
"version": "1.0",
"lastUpdated": "",
"user": {
"workContext": {"summary": "", "updatedAt": ""},
"personalContext": {"summary": "", "updatedAt": ""},
"topOfMind": {"summary": "", "updatedAt": ""},
},
"history": {
"recentMonths": {"summary": "", "updatedAt": ""},
"earlierContext": {"summary": "", "updatedAt": ""},
"longTermBackground": {"summary": "", "updatedAt": ""},
},
"facts": [],
}
with (
patch("deerflow.config.memory_config.get_memory_config", return_value=mem_cfg),
patch("deerflow.agents.memory.updater.get_memory_data", return_value=memory_data),
):
result = client.get_memory_status()
parsed = MemoryStatusResponse(**result)
assert parsed.config.enabled is True
assert parsed.data.version == "1.0"
# ===========================================================================
# Hardening — install_skill security gates
# ===========================================================================
class TestInstallSkillSecurity:
"""Every security gate in install_skill() must have a red-line test."""
def test_zip_bomb_rejected(self, client):
"""Archives whose extracted size exceeds the limit are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "bomb.skill"
# Create a small archive that claims huge uncompressed size.
# Write 200 bytes but the safe_extract checks cumulative file_size.
data = b"\x00" * 200
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
zf.writestr("big.bin", data)
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
# Patch max_total_size to a small value to trigger the bomb check.
from deerflow.skills import installer as _installer
orig = _installer.safe_extract_skill_archive
def patched_extract(zf, dest, max_total_size=100):
return orig(zf, dest, max_total_size=100)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer.safe_extract_skill_archive", side_effect=patched_extract),
):
with pytest.raises(ValueError, match="too large"):
client.install_skill(archive)
def test_absolute_path_in_archive_rejected(self, client):
"""ZIP entries with absolute paths are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "abs.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("/etc/passwd", "root:x:0:0")
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)
def test_dotdot_path_in_archive_rejected(self, client):
"""ZIP entries with '..' path components are rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "traversal.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("skill/../../../etc/shadow", "bad")
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="unsafe"):
client.install_skill(archive)
def test_symlinks_skipped_during_extraction(self, client):
"""Symlink entries in the archive are skipped (never written to disk)."""
import stat as stat_mod
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
archive = tmp_path / "sym-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.writestr("sym-skill/SKILL.md", "---\nname: sym-skill\ndescription: test\n---\nBody")
# Inject a symlink entry via ZipInfo with Unix symlink mode.
link_info = zipfile.ZipInfo("sym-skill/sneaky_link")
link_info.external_attr = (stat_mod.S_IFLNK | 0o777) << 16
zf.writestr(link_info, "/etc/passwd")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
result = client.install_skill(archive)
assert result["success"] is True
installed = skills_root / "custom" / "sym-skill"
assert (installed / "SKILL.md").exists()
assert not (installed / "sneaky_link").exists()
def test_invalid_skill_name_rejected(self, client):
"""Skill names containing special characters are rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "bad-name"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: ../evil\ndescription: test\n---\n")
archive = tmp_path / "bad.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-name/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "../evil")),
):
with pytest.raises(ValueError, match="Invalid skill name"):
client.install_skill(archive)
def test_existing_skill_rejected(self, client):
"""Installing a skill that already exists is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "dupe-skill"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("---\nname: dupe-skill\ndescription: test\n---\n")
archive = tmp_path / "dupe-skill.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "dupe-skill/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom" / "dupe-skill").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(True, "OK", "dupe-skill")),
):
with pytest.raises(ValueError, match="already exists"):
client.install_skill(archive)
def test_empty_archive_rejected(self, client):
"""An archive with no entries is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "empty.skill"
with zipfile.ZipFile(archive, "w"):
pass # empty archive
skills_root = Path(tmp) / "skills"
(skills_root / "custom").mkdir(parents=True)
with patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root):
with pytest.raises(ValueError, match="empty"):
client.install_skill(archive)
def test_invalid_frontmatter_rejected(self, client):
"""Archive with invalid SKILL.md frontmatter is rejected."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
skill_dir = tmp_path / "bad-meta"
skill_dir.mkdir()
(skill_dir / "SKILL.md").write_text("no frontmatter at all")
archive = tmp_path / "bad-meta.skill"
with zipfile.ZipFile(archive, "w") as zf:
zf.write(skill_dir / "SKILL.md", "bad-meta/SKILL.md")
skills_root = tmp_path / "skills"
(skills_root / "custom").mkdir(parents=True)
with (
patch("deerflow.skills.installer.get_skills_root_path", return_value=skills_root),
patch("deerflow.skills.installer._validate_skill_frontmatter", return_value=(False, "Missing name field", "")),
):
with pytest.raises(ValueError, match="Invalid skill"):
client.install_skill(archive)
def test_not_a_zip_rejected(self, client):
"""A .skill file that is not a valid ZIP is rejected."""
with tempfile.TemporaryDirectory() as tmp:
archive = Path(tmp) / "fake.skill"
archive.write_text("this is not a zip file")
with pytest.raises(ValueError, match="not a valid ZIP"):
client.install_skill(archive)
def test_directory_path_rejected(self, client):
"""Passing a directory instead of a file is rejected."""
with tempfile.TemporaryDirectory() as tmp:
with pytest.raises(ValueError, match="not a file"):
client.install_skill(tmp)
# ===========================================================================
# Hardening — _atomic_write_json error paths
# ===========================================================================
class TestAtomicWriteJson:
def test_temp_file_cleaned_on_serialization_failure(self):
"""If json.dump raises, the temp file is removed."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"
# An object that cannot be serialized to JSON.
bad_data = {"key": object()}
with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)
# Target should not have been created.
assert not target.exists()
# No stray .tmp files should remain.
tmp_files = list(Path(tmp).glob("*.tmp"))
assert tmp_files == []
def test_happy_path_writes_atomically(self):
"""Normal write produces correct JSON and no temp files."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "out.json"
data = {"key": "value", "nested": [1, 2, 3]}
DeerFlowClient._atomic_write_json(target, data)
assert target.exists()
with open(target) as f:
loaded = json.load(f)
assert loaded == data
# No temp files left behind.
assert list(Path(tmp).glob("*.tmp")) == []
def test_original_preserved_on_failure(self):
"""If write fails, the original file is not corrupted."""
with tempfile.TemporaryDirectory() as tmp:
target = Path(tmp) / "config.json"
target.write_text('{"original": true}')
bad_data = {"key": object()}
with pytest.raises(TypeError):
DeerFlowClient._atomic_write_json(target, bad_data)
# Original content must survive.
with open(target) as f:
assert json.load(f) == {"original": True}
# ===========================================================================
# Hardening — config update error paths
# ===========================================================================
class TestConfigUpdateErrors:
def test_update_mcp_config_no_config_file(self, client):
"""FileNotFoundError when extensions_config.json cannot be located."""
with patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None):
with pytest.raises(FileNotFoundError, match="Cannot locate"):
client.update_mcp_config({"server": {}})
def test_update_skill_no_config_file(self, client):
"""FileNotFoundError when extensions_config.json cannot be located."""
skill = MagicMock()
skill.name = "some-skill"
with (
patch("deerflow.skills.loader.load_skills", return_value=[skill]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=None),
):
with pytest.raises(FileNotFoundError, match="Cannot locate"):
client.update_skill("some-skill", enabled=False)
def test_update_skill_disappears_after_write(self, client):
"""RuntimeError when skill vanishes between write and re-read."""
skill = MagicMock()
skill.name = "ghost-skill"
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "extensions_config.json"
config_file.write_text("{}")
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], []]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
with pytest.raises(RuntimeError, match="disappeared"):
client.update_skill("ghost-skill", enabled=False)
# ===========================================================================
# Hardening — stream / chat edge cases
# ===========================================================================
class TestStreamHardening:
def test_agent_exception_propagates(self, client):
"""Exceptions from agent.stream() propagate to caller."""
agent = MagicMock()
agent.stream.side_effect = RuntimeError("model quota exceeded")
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
with pytest.raises(RuntimeError, match="model quota exceeded"):
list(client.stream("hi", thread_id="t-err"))
def test_messages_without_id(self, client):
"""Messages without id attribute are emitted without crashing."""
ai = AIMessage(content="no id here")
# Forcibly remove the id attribute to simulate edge case.
object.__setattr__(ai, "id", None)
chunks = [{"messages": [ai]}]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-noid"))
# Should produce events without error.
assert events[-1].type == "end"
ai_events = _ai_events(events)
assert len(ai_events) == 1
assert ai_events[0].data["content"] == "no id here"
def test_tool_calls_only_no_text(self, client):
"""chat() returns empty string when agent only emits tool calls."""
ai = AIMessage(
content="",
id="ai-1",
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
)
tool = ToolMessage(content="output", id="tm-1", tool_call_id="tc-1", name="bash")
chunks = [
{"messages": [ai]},
{"messages": [ai, tool]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
result = client.chat("do it", thread_id="t-tc-only")
assert result == ""
def test_duplicate_messages_without_id_not_deduplicated(self, client):
"""Messages with id=None are NOT deduplicated (each is emitted)."""
ai1 = AIMessage(content="first")
ai2 = AIMessage(content="second")
object.__setattr__(ai1, "id", None)
object.__setattr__(ai2, "id", None)
chunks = [
{"messages": [ai1]},
{"messages": [ai2]},
]
agent = _make_agent_mock(chunks)
with (
patch.object(client, "_ensure_agent"),
patch.object(client, "_agent", agent),
):
events = list(client.stream("hi", thread_id="t-dup-noid"))
ai_msgs = _ai_events(events)
assert len(ai_msgs) == 2
# ===========================================================================
# Hardening — _serialize_message coverage
# ===========================================================================
class TestSerializeMessage:
def test_system_message(self):
msg = SystemMessage(content="You are a helpful assistant.", id="sys-1")
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "system"
assert result["content"] == "You are a helpful assistant."
assert result["id"] == "sys-1"
def test_unknown_message_type(self):
"""Non-standard message types serialize as 'unknown'."""
msg = MagicMock()
msg.id = "unk-1"
msg.content = "something"
# Not an instance of AIMessage/ToolMessage/HumanMessage/SystemMessage
type(msg).__name__ = "CustomMessage"
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "unknown"
assert result["id"] == "unk-1"
def test_ai_message_with_tool_calls(self):
msg = AIMessage(
content="",
id="ai-tc",
tool_calls=[{"name": "bash", "args": {"cmd": "ls"}, "id": "tc-1"}],
)
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "ai"
assert len(result["tool_calls"]) == 1
assert result["tool_calls"][0]["name"] == "bash"
def test_tool_message_non_string_content(self):
msg = ToolMessage(content={"key": "value"}, id="tm-1", tool_call_id="tc-1", name="tool")
result = DeerFlowClient._serialize_message(msg)
assert result["type"] == "tool"
assert isinstance(result["content"], str)
# ===========================================================================
# Hardening — upload / delete symlink attack
# ===========================================================================
class TestUploadDeleteSymlink:
def test_delete_upload_symlink_outside_dir(self, client):
"""A symlink in uploads dir pointing outside is caught by path traversal check."""
with tempfile.TemporaryDirectory() as tmp:
uploads_dir = Path(tmp) / "uploads"
uploads_dir.mkdir()
# Create a target file outside uploads dir.
outside = Path(tmp) / "secret.txt"
outside.write_text("sensitive data")
# Create a symlink inside uploads dir pointing to outside file.
link = uploads_dir / "harmless.txt"
try:
link.symlink_to(outside)
except OSError as exc:
if getattr(exc, "winerror", None) == 1314:
pytest.skip("symlink creation requires Developer Mode or elevated privileges on Windows")
raise
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
# The resolved path of the symlink escapes uploads_dir,
# so path traversal check should catch it.
with pytest.raises(PathTraversalError):
client.delete_upload("thread-1", "harmless.txt")
# The outside file must NOT have been deleted.
assert outside.exists()
def test_upload_filename_with_spaces_and_unicode(self, client):
"""Files with spaces and unicode characters in names upload correctly."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
weird_name = "report 2024 数据.txt"
src_file = tmp_path / weird_name
src_file.write_text("data")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("thread-1", [src_file])
assert result["success"] is True
assert result["files"][0]["filename"] == weird_name
assert (uploads_dir / weird_name).exists()
# ===========================================================================
# Hardening — artifact edge cases
# ===========================================================================
class TestArtifactHardening:
def test_artifact_directory_rejected(self, client):
"""get_artifact rejects paths that resolve to a directory."""
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
user_id = get_effective_user_id()
subdir = paths.sandbox_outputs_dir("t1", user_id=user_id) / "subdir"
subdir.mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
with pytest.raises(ValueError, match="not a file"):
client.get_artifact("t1", "mnt/user-data/outputs/subdir")
def test_artifact_leading_slash_stripped(self, client):
"""Paths with leading slash are handled correctly."""
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
user_id = get_effective_user_id()
outputs = paths.sandbox_outputs_dir("t1", user_id=user_id)
outputs.mkdir(parents=True)
(outputs / "file.txt").write_text("content")
with patch("deerflow.client.get_paths", return_value=paths):
content, _mime = client.get_artifact("t1", "/mnt/user-data/outputs/file.txt")
assert content == b"content"
# ===========================================================================
# BUG DETECTION — tests that expose real bugs in client.py
# ===========================================================================
class TestUploadDuplicateFilenames:
"""Regression: upload_files must auto-rename duplicate basenames.
Previously it silently overwrote the first file with the second,
then reported both in the response while only one existed on disk.
Now duplicates are renamed (data.txt → data_1.txt) and the response
includes original_filename so the agent / caller can see what happened.
"""
def test_duplicate_filenames_auto_renamed(self, client):
"""Two files with same basename → second gets _1 suffix."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
dir_a = tmp_path / "a"
dir_b = tmp_path / "b"
dir_a.mkdir()
dir_b.mkdir()
(dir_a / "data.txt").write_text("version A")
(dir_b / "data.txt").write_text("version B")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-dup", [dir_a / "data.txt", dir_b / "data.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
# Both files exist on disk with distinct names.
disk_files = sorted(p.name for p in uploads_dir.iterdir())
assert disk_files == ["data.txt", "data_1.txt"]
# First keeps original name, second is renamed.
assert result["files"][0]["filename"] == "data.txt"
assert "original_filename" not in result["files"][0]
assert result["files"][1]["filename"] == "data_1.txt"
assert result["files"][1]["original_filename"] == "data.txt"
# Content preserved correctly.
assert (uploads_dir / "data.txt").read_text() == "version A"
assert (uploads_dir / "data_1.txt").read_text() == "version B"
def test_triple_duplicate_increments_counter(self, client):
"""Three files with same basename → _1, _2 suffixes."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
for name in ["x", "y", "z"]:
d = tmp_path / name
d.mkdir()
(d / "report.csv").write_text(f"from {name}")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files(
"t-triple",
[tmp_path / "x" / "report.csv", tmp_path / "y" / "report.csv", tmp_path / "z" / "report.csv"],
)
filenames = [f["filename"] for f in result["files"]]
assert filenames == ["report.csv", "report_1.csv", "report_2.csv"]
assert len(list(uploads_dir.iterdir())) == 3
def test_different_filenames_no_rename(self, client):
"""Non-duplicate filenames upload normally without rename."""
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
uploads_dir = tmp_path / "uploads"
uploads_dir.mkdir()
(tmp_path / "a.txt").write_text("aaa")
(tmp_path / "b.txt").write_text("bbb")
with patch("deerflow.client.get_uploads_dir", return_value=uploads_dir), patch("deerflow.client.ensure_uploads_dir", return_value=uploads_dir):
result = client.upload_files("t-ok", [tmp_path / "a.txt", tmp_path / "b.txt"])
assert result["success"] is True
assert len(result["files"]) == 2
assert all("original_filename" not in f for f in result["files"])
assert len(list(uploads_dir.iterdir())) == 2
class TestBugArtifactPrefixMatchTooLoose:
"""Regression: get_artifact must reject paths like ``mnt/user-data-evil/...``.
Previously ``startswith("mnt/user-data")`` matched ``"mnt/user-data-evil"``
because it was a string prefix, not a path-segment check.
"""
def test_non_canonical_prefix_rejected(self, client):
"""Paths that share a string prefix but differ at segment boundary are rejected."""
with pytest.raises(ValueError, match="must start with"):
client.get_artifact("t1", "mnt/user-data-evil/secret.txt")
def test_exact_prefix_without_subpath_accepted(self, client):
"""Bare 'mnt/user-data' is accepted (will later fail as directory, not at prefix)."""
from deerflow.runtime.user_context import get_effective_user_id
with tempfile.TemporaryDirectory() as tmp:
paths = Paths(base_dir=tmp)
user_id = get_effective_user_id()
paths.sandbox_outputs_dir("t1", user_id=user_id).mkdir(parents=True)
with patch("deerflow.client.get_paths", return_value=paths):
# Accepted at prefix check, but fails because it's a directory.
with pytest.raises(ValueError, match="not a file"):
client.get_artifact("t1", "mnt/user-data")
class TestBugListUploadsDeadCode:
"""Regression: list_uploads works even when called on a fresh thread
(directory does not exist yet — returns empty without creating it).
"""
def test_list_uploads_on_fresh_thread(self, client):
"""list_uploads on a thread that never had uploads returns empty list."""
with tempfile.TemporaryDirectory() as tmp:
non_existent = Path(tmp) / "does-not-exist" / "uploads"
assert not non_existent.exists()
mock_paths = MagicMock()
mock_paths.sandbox_uploads_dir.return_value = non_existent
with patch("deerflow.uploads.manager.get_paths", return_value=mock_paths):
result = client.list_uploads("thread-fresh")
# Read path should NOT create the directory
assert not non_existent.exists()
assert result == {"files": [], "count": 0}
class TestBugAgentInvalidationInconsistency:
"""Regression: update_skill and update_mcp_config must reset both
_agent and _agent_config_key, just like reset_agent() does.
"""
def test_update_mcp_resets_config_key(self, client):
"""After update_mcp_config, both _agent and _agent_config_key are None."""
client._agent = MagicMock()
client._agent_config_key = ("model", True, False, False)
current_config = MagicMock()
current_config.skills = {}
reloaded = MagicMock()
reloaded.mcp_servers = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "ext.json"
config_file.write_text("{}")
with (
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=current_config),
patch("deerflow.client.reload_extensions_config", return_value=reloaded),
):
client.update_mcp_config({})
assert client._agent is None
assert client._agent_config_key is None
def test_update_skill_resets_config_key(self, client):
"""After update_skill, both _agent and _agent_config_key are None."""
client._agent = MagicMock()
client._agent_config_key = ("model", True, False, False)
skill = MagicMock()
skill.name = "s1"
updated = MagicMock()
updated.name = "s1"
updated.description = "d"
updated.license = "MIT"
updated.category = "c"
updated.enabled = False
ext_config = MagicMock()
ext_config.mcp_servers = {}
ext_config.skills = {}
with tempfile.TemporaryDirectory() as tmp:
config_file = Path(tmp) / "ext.json"
config_file.write_text("{}")
with (
patch("deerflow.skills.loader.load_skills", side_effect=[[skill], [updated]]),
patch("deerflow.client.ExtensionsConfig.resolve_config_path", return_value=config_file),
patch("deerflow.client.get_extensions_config", return_value=ext_config),
patch("deerflow.client.reload_extensions_config"),
):
client.update_skill("s1", enabled=False)
assert client._agent is None
assert client._agent_config_key is None