Squashes 25 PR commits onto current main. AppConfig becomes a pure value
object with no ambient lookup. Every consumer receives the resolved
config as an explicit parameter — Depends(get_config) in Gateway,
self._app_config in DeerFlowClient, runtime.context.app_config in agent
runs, AppConfig.from_file() at the LangGraph Server registration
boundary.
Phase 1 — frozen data + typed context
- All config models (AppConfig, MemoryConfig, DatabaseConfig, …) become
frozen=True; no sub-module globals.
- AppConfig.from_file() is pure (no side-effect singleton loaders).
- Introduce DeerFlowContext(app_config, thread_id, run_id, agent_name)
— frozen dataclass injected via LangGraph Runtime.
- Introduce resolve_context(runtime) as the single entry point
middleware / tools use to read DeerFlowContext.
Phase 2 — pure explicit parameter passing
- Gateway: app.state.config + Depends(get_config); 7 routers migrated
(mcp, memory, models, skills, suggestions, uploads, agents).
- DeerFlowClient: __init__(config=...) captures config locally.
- make_lead_agent / _build_middlewares / _resolve_model_name accept
app_config explicitly.
- RunContext.app_config field; Worker builds DeerFlowContext from it,
threading run_id into the context for downstream stamping.
- Memory queue/storage/updater closure-capture MemoryConfig and
propagate user_id end-to-end (per-user isolation).
- Sandbox/skills/community/factories/tools thread app_config.
- resolve_context() rejects non-typed runtime.context.
- Test suite migrated off AppConfig.current() monkey-patches.
- AppConfig.current() classmethod deleted.
Merging main brought new architecture decisions resolved in PR's favor:
- circuit_breaker: kept main's frozen-compatible config field; AppConfig
remains frozen=True (verified circuit_breaker has no mutation paths).
- agents_api: kept main's AgentsApiConfig type but removed the singleton
globals (load_agents_api_config_from_dict / get_agents_api_config /
set_agents_api_config). 8 routes in agents.py now read via
Depends(get_config).
- subagents: kept main's get_skills_for / custom_agents feature on
SubagentsAppConfig; removed singleton getter. registry.py now reads
app_config.subagents directly.
- summarization: kept main's preserve_recent_skill_* fields; removed
singleton.
- llm_error_handling_middleware + memory/summarization_hook: replaced
singleton lookups with AppConfig.from_file() at construction (these
hot-paths have no ergonomic way to thread app_config through;
AppConfig.from_file is a pure load).
- worker.py + thread_data_middleware.py: DeerFlowContext.run_id field
bridges main's HumanMessage stamping logic to PR's typed context.
Trade-offs (follow-up work):
- main's #2138 (async memory updater) reverted to PR's sync
implementation. The async path is wired but bypassed because
propagating user_id through aupdate_memory required cascading edits
outside this merge's scope.
- tests/test_subagent_skills_config.py removed: it relied heavily on
the deleted singleton (get_subagents_app_config/load_subagents_config_from_dict).
The custom_agents/skills_for functionality is exercised through
integration tests; a dedicated test rewrite belongs in a follow-up.
Verification: backend test suite — 2560 passed, 4 skipped, 84 failures.
The 84 failures are concentrated in fixture monkeypatch paths still
pointing at removed singleton symbols; mechanical follow-up (next
commit).
Two production docker-compose.yaml bugs prevent `make up` from working:
1. Gateway missing DEER_FLOW_CONFIG_PATH and DEER_FLOW_EXTENSIONS_CONFIG_PATH
environment overrides. Added in fb2d99f (#1836) but accidentally reverted
by ca2fb95 (#1847). Without them, gateway reads host paths from .env via
env_file, causing FileNotFoundError inside the container.
2. Langgraph command fails when LANGGRAPH_ALLOW_BLOCKING is unset (default).
Empty $${allow_blocking} inserts a bare space between flags, causing
' --no-reload' to be parsed as unexpected extra argument. Fix by building
args string first and conditionally appending --allow-blocking.
Co-authored-by: cooper <cooperfu@tencent.com>
Escape shell variables to prevent Docker Compose from attempting
substitution at parse time. Rename allow_blocking_flag to allow_blocking
for consistency with dev version.
Fixes the 'allow_blocking_flag not set' warning and enables --allow-blocking
flag to work correctly.
The gateway service was missing these two environment variables that tell
it where to find the config files inside the container. Without them,
the gateway reads DEER_FLOW_CONFIG_PATH from the host's .env file (set
to a host filesystem path), which is not accessible inside the container,
causing FileNotFoundError on startup. The langgraph service already had
these variables set correctly.
The `environment` section in docker-compose.yaml set
`LANGSMITH_TRACING=${LANGSMITH_TRACING:-false}`, which always resolves
to `false` because Docker Compose evaluates `${}` substitutions from
the host shell environment, not from `env_file`.
Since `environment` entries take precedence over `env_file`, setting
`LANGSMITH_TRACING=true` in `.env` had no effect — tracing stayed
disabled despite following the documented instructions.
Remove the explicit `LANGSMITH_TRACING` from `environment` so the
value from `.env` (loaded via `env_file`) is used as intended.
`langgraph dev` defaults `n_jobs_per_worker` to 1 when the flag is not
explicitly passed (see langgraph_api/cli.py), even though the
`N_JOBS_PER_WORKER` env-var default is 10.
This causes the LangGraph server to run with a single background worker,
meaning all conversation runs are processed serially. When one run is
busy (e.g. summarization or long tool-calling chains), all other threads
are blocked until it finishes.
Add `--n-jobs-per-worker 10` to both production and dev Docker Compose
files to match the intended default concurrency.
Add LangSmith tracing setup instructions across the project:
- .env.example: add LANGSMITH_* env vars (commented out)
- README.md + translations (zh/ja/fr/ru): add LangSmith Tracing section
under Advanced with setup steps and env var reference
- backend/README.md: add detailed LangSmith Tracing section with setup,
env var table, how-it-works explanation, and Docker notes
- docker-compose.yaml: update LANGCHAIN_TRACING_V2 to LANGSMITH_TRACING
for naming consistency with the rest of the project
Made-with: Cursor
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* fix: add build-arg support for proxies and mirrors in Docker builds (#1260)
Pin Debian images to bookworm, make UV source image configurable,
and pass APT_MIRROR/NPM_REGISTRY/UV_IMAGE through docker-compose.
* fix: ensure build args use consistent defaults across compose and Dockerfiles
UV_IMAGE: ${UV_IMAGE:-} resolved to empty when unset, overriding the
Dockerfile ARG default and breaking `FROM ${UV_IMAGE}`. Also configure
COREPACK_NPM_REGISTRY before pnpm download and propagate NPM_REGISTRY
into the prod stage.
* fix: dearmor NodeSource GPG key to resolve signing error
Pipe the downloaded key through gpg --dearmor so apt can verify
the repository signature (fixes NO_PUBKEY 2F59B5F99B1BE0B4).
---------
Co-authored-by: JeffJiang <for-eleven@hotmail.com>
* feat: add Claude Code OAuth and Codex CLI providers
Port of bytedance/deer-flow#1136 from @solanian's feat/cli-oauth-providers branch.\n\nCarries the feature forward on top of current main without the original CLA-blocked commit metadata, while preserving attribution in the commit message for review.
* fix: harden CLI credential loading
Align Codex auth loading with the current ~/.codex/auth.json shape, make Docker credential mounts directory-based to avoid broken file binds on hosts without exported credential files, and add focused loader tests.
* refactor: tighten codex auth typing
Replace the temporary Any return type in CodexChatModel._load_codex_auth with the concrete CodexCliCredential type after the credential loader was stabilized.
* fix: load Claude Code OAuth from Keychain
Match Claude Code's macOS storage strategy more closely by checking the Keychain-backed credentials store before falling back to ~/.claude/.credentials.json. Keep explicit file overrides and add focused tests for the Keychain path.
* fix: require explicit Claude OAuth handoff
* style: format thread hooks reasoning request
* docs: document CLI-backed auth providers
* fix: address provider review feedback
* fix: harden provider edge cases
* Fix deferred tools, Codex message normalization, and local sandbox paths
* chore: narrow PR scope to OAuth providers
* chore: remove unrelated frontend changes
* chore: reapply OAuth branch frontend scope cleanup
* fix: preserve upload guards with reasoning effort wiring
---------
Co-authored-by: Willem Jiang <willem.jiang@gmail.com>
* refactor: extract shared utils to break harness→app cross-layer imports
Move _validate_skill_frontmatter to src/skills/validation.py and
CONVERTIBLE_EXTENSIONS + convert_file_to_markdown to src/utils/file_conversion.py.
This eliminates the two reverse dependencies from client.py (harness layer)
into gateway/routers/ (app layer), preparing for the harness/app package split.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: split backend/src into harness (deerflow.*) and app (app.*)
Physically split the monolithic backend/src/ package into two layers:
- **Harness** (`packages/harness/deerflow/`): publishable agent framework
package with import prefix `deerflow.*`. Contains agents, sandbox, tools,
models, MCP, skills, config, and all core infrastructure.
- **App** (`app/`): unpublished application code with import prefix `app.*`.
Contains gateway (FastAPI REST API) and channels (IM integrations).
Key changes:
- Move 13 harness modules to packages/harness/deerflow/ via git mv
- Move gateway + channels to app/ via git mv
- Rename all imports: src.* → deerflow.* (harness) / app.* (app layer)
- Set up uv workspace with deerflow-harness as workspace member
- Update langgraph.json, config.example.yaml, all scripts, Docker files
- Add build-system (hatchling) to harness pyproject.toml
- Add PYTHONPATH=. to gateway startup commands for app.* resolution
- Update ruff.toml with known-first-party for import sorting
- Update all documentation to reflect new directory structure
Boundary rule enforced: harness code never imports from app.
All 429 tests pass. Lint clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: add harness→app boundary check test and update docs
Add test_harness_boundary.py that scans all Python files in
packages/harness/deerflow/ and fails if any `from app.*` or
`import app.*` statement is found. This enforces the architectural
rule that the harness layer never depends on the app layer.
Update CLAUDE.md to document the harness/app split architecture,
import conventions, and the boundary enforcement test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add config versioning with auto-upgrade on startup
When config.example.yaml schema changes, developers' local config.yaml
files can silently become outdated. This adds a config_version field and
auto-upgrade mechanism so breaking changes (like src.* → deerflow.*
renames) are applied automatically before services start.
- Add config_version: 1 to config.example.yaml
- Add startup version check warning in AppConfig.from_file()
- Add scripts/config-upgrade.sh with migration registry for value replacements
- Add `make config-upgrade` target
- Auto-run config-upgrade in serve.sh and start-daemon.sh before starting services
- Add config error hints in service failure messages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix comments
* fix: update src.* import in test_sandbox_tools_security to deerflow.*
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle empty config and search parent dirs for config.example.yaml
Address Copilot review comments on PR #1131:
- Guard against yaml.safe_load() returning None for empty config files
- Search parent directories for config.example.yaml instead of only
looking next to config.yaml, fixing detection in common setups
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: correct skills root path depth and config_version type coercion
- loader.py: fix get_skills_root_path() to use 5 parent levels (was 3)
after harness split, file lives at packages/harness/deerflow/skills/
so parent×3 resolved to backend/packages/harness/ instead of backend/
- app_config.py: coerce config_version to int() before comparison in
_check_config_version() to prevent TypeError when YAML stores value
as string (e.g. config_version: "1")
- tests: add regression tests for both fixes
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: update test imports from src.* to deerflow.*/app.* after harness refactor
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>