mirror of https://github.com/bytedance/deer-flow.git synced 2026-06-09 17:12:01 +00:00

test(e2e): deterministic record/replay front-back contract verification (#3365)

* test(e2e): record/replay front-back contract verification

Guards the front-back contract with a deterministic, key-free record/replay
harness (mirrors open-design's golden-trace approach):

- ReplayChatModel (tests/replay_provider.py): replays recorded LLM turns by a
  normalized hash of the model input. Strips <system-reminder>/date/uuid/tmp-path
  so one fixture replays across days and from both the browser and direct-POST
  paths; a miss raises loudly (no silent divergence).
- Recording is record-through-browser (scripts/record_gateway.py +
  build_fixture_from_jsonl.py + frontend/tests/e2e-record): a real run is driven
  through the real frontend so captured inputs match exactly what the browser
  sends; fixtures contain no API key.
- Layer 1 — backend golden (tests/test_replay_golden.py): replay through the real
  gateway, assert the SSE event sequence == committed golden.
- Layer 2 — full-stack render (frontend/tests/e2e-real-backend): real Next.js +
  real gateway (replay model) + Chromium; assert the replayed auto-title and
  follow-up suggestions render. DOM assertions are the gate; visual regression is
  a local dev gate (CI uploads the render as an artifact).
- CI (.github/workflows/replay-e2e.yml): both layers, triggered on EITHER side of
  the contract (frontend/** or backend gateway/harness/fixtures).
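The normalized-hash lookup described for ReplayChatModel can be sketched roughly as follows. This is an illustrative sketch only, not the code in tests/replay_provider.py: the regex patterns, the message shape, and the names `normalize`, `input_key`, `ReplayStore`, and `ReplayMiss` are all assumptions made for the example.

```python
import hashlib
import json
import re

# Hypothetical normalization rules; the real harness may use other patterns.
_PATTERNS = [
    (re.compile(r"<system-reminder>.*?</system-reminder>", re.DOTALL), ""),
    (re.compile(r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b", re.I), "<uuid>"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "<date>"),
    (re.compile(r"/tmp/[^\s\"']+"), "<tmp>"),
]


def normalize(text):
    """Replace run-specific noise with stable placeholders."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def input_key(messages):
    """Hash the normalized (role, content) pairs of a model input."""
    canonical = json.dumps(
        [[m["role"], normalize(m["content"])] for m in messages],
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ReplayMiss(LookupError):
    """Raised when no recorded turn matches the normalized input."""


class ReplayStore:
    def __init__(self, fixture):
        # fixture maps input_key(...) -> recorded model output
        self._turns = dict(fixture)

    def replay(self, messages):
        key = input_key(messages)
        if key not in self._turns:
            # Fail loudly instead of silently diverging from the recording.
            raise ReplayMiss(f"no recorded turn for input hash {key[:12]}")
        return self._turns[key]
```

Because the key is computed after normalization, two inputs that differ only in date, UUID, temp path, or reminder text resolve to the same recorded turn, while any other difference raises.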

* test(e2e): multi-run render-order cross-stack scenario (#3352)

Guards the dangerous front-back class where a backend ordering change
silently breaks a frontend assumption while both sides' unit tests stay
green. Reproduces issue #3352: backend list_by_thread returns runs
newest-first (#2932) and the frontend prepended per-run pages, inverting
chronological order once the checkpoint no longer held the older messages.

- tests/seed_runs_router.py: test-only seeder, mounted on the replay
  gateway only when DEERFLOW_ENABLE_TEST_SEED=1 (never in the production
  app). Seeds a thread with >=2 runs + per-run message events and no
  checkpoint -- the #3352 precondition -- so the frontend per-run reload
  path is the sole source of truth and the prepend inversion is observable.
- frontend/tests/e2e-real-backend/multi-run-order.spec.ts: drives the real
  frontend against the real gateway, asserts the first run renders above
  the second. Reverting the #3354 fix turns it red.
- replay-e2e.yml: trigger on the new replay test-infra paths.
- docs: REPLAY_E2E.md cross-stack scenario section.
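The opt-in gating of the test-only seeder can be illustrated with a small stdlib-only sketch. Only the variable name DEERFLOW_ENABLE_TEST_SEED comes from the text above; the route paths and the `build_routes` helper are hypothetical and stand in for however the replay gateway actually mounts its routers.

```python
import os


def seed_routes_enabled(env=None):
    """True only when the test seeder is explicitly switched on."""
    env = os.environ if env is None else env
    return env.get("DEERFLOW_ENABLE_TEST_SEED") == "1"


def build_routes(env=None):
    """Return the route list; the seed route is opt-in, never default."""
    routes = ["/api/threads"]  # hypothetical production route
    if seed_routes_enabled(env):
        routes.append("/__test__/seed")  # hypothetical test-only route
    return routes
```

Comparing against the exact string "1" keeps values such as "0", "false", or an empty string from enabling the seeder by accident.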

* test(e2e): address Copilot review on the replay harness

- Fix stale recorder references (scripts/record_traces.py ->
  scripts/record_gateway.py + scripts/build_fixture_from_jsonl.py) in
  replay_provider.py, test_replay_golden.py, _replay_fixture.py.
- MODE_CONTEXT['ultra']: thinking_enabled False -> True, mirroring the
  frontend's `context.mode !== 'flash'` (hooks.ts). It did not affect the
  hashed input (Layer 1 golden still green), but the table now matches the
  real frontend context it claims to mirror.
- replay_provider.py docstring: stop claiming memory is recorded-enabled;
  the replay config disables memory/summarization for determinism (title
  stays, as an in-graph deterministic call).
- record_gateway.py / run_replay_gateway.py: override DEER_FLOW_HOME instead
  of setdefault, so an outer value can't leak into the hermetic harness.
- record_gateway.py: clear error when DEERFLOW_RECORD_OUT is unset (was a
  bare KeyError).
- playwright.record.config.ts: forward OPENAI_*/DEERFLOW_RECORD_OUT only when
  set, so the gateway raises a clear 'missing env' error instead of getting ''.
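The two environment fixes above (override rather than setdefault, and a clear error for an unset variable) can be sketched like this. The helper names `pin_harness_home` and `require_env` are hypothetical; only the variable names DEER_FLOW_HOME and DEERFLOW_RECORD_OUT are taken from the text.

```python
import os


def pin_harness_home(path, env=None):
    """Force DEER_FLOW_HOME to the harness directory.

    Plain assignment overrides any outer value; setdefault would keep it.
    """
    env = os.environ if env is None else env
    env["DEER_FLOW_HOME"] = path


def require_env(name, env=None):
    """Return a required variable or fail with a readable message."""
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:  # treats both unset and empty string as missing
        raise RuntimeError(f"{name} must be set before starting the recorder")
    return value
```

Treating an empty string as missing matches the concern in the last bullet: a variable forwarded as '' should produce the same clear error as one that was never set.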

* test(e2e): address Copilot review round 2

- seed_runs_router.py: constrain SeedMessage.role to Literal['human','ai']
  so a bad value is a clean 422 at the boundary instead of a 500
  (KeyError on _EVENT_TYPE).
- record-write-read-file.spec.ts: waitForCaptureStable now throws on
  timeout instead of returning the last count, so a truncated/partial
  recording can't pass silently.
- real-backend-render.spec.ts: guard the suggestions JSON.parse; a
  bracket-prefixed non-JSON turn falls back to '' so the existing
  not.toBe('') assertion fails clearly instead of a generic parse throw.
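The "throw on timeout" behaviour described for waitForCaptureStable can be shown with a Python analogue. The real helper is TypeScript and its signature is not given here, so the function name, parameters, and defaults below are assumptions for illustration.

```python
import time


def wait_for_stable(read_count, *, stable_polls=3, interval=0.05, timeout=2.0):
    """Poll read_count() until it stops changing, or raise on timeout.

    Returns the stable value after `stable_polls` consecutive unchanged
    reads. Raising instead of returning the last value keeps a truncated
    capture from being accepted as complete.
    """
    deadline = time.monotonic() + timeout
    last = read_count()
    unchanged = 0
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = read_count()
        if current == last:
            unchanged += 1
            if unchanged >= stable_polls:
                return current
        else:
            last, unchanged = current, 0
    raise TimeoutError(f"capture still changing after {timeout}s (last count {last})")
```

A caller that previously received the last count on timeout now has to handle the exception, so a partial recording surfaces as a test failure.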

2026-06-08 12:35:03 +08:00

File	Last commit	Date
API.md	fix(security): harden MCP config endpoint (#3425)	2026-06-08 12:21:02 +08:00
APPLE_CONTAINER.md	Fix command syntax for container image pull (#1349)	2026-03-26 00:14:08 +08:00
ARCHITECTURE.md	docs: clarify LangGraph compatibility entrypoints (#2914)	2026-05-12 23:15:11 +08:00
AUTH_DESIGN.md	docs: document auth design and user isolation (#2913)	2026-05-12 23:07:11 +08:00
AUTH_TEST_DOCKER_GAP.md	docs: clean gateway runtime transition remnants (#3334)	2026-06-02 10:03:28 +08:00
AUTH_TEST_PLAN.md	docs: clean standalone LangGraph server remnants (#3301)	2026-05-29 11:36:45 +08:00
AUTH_UPGRADE.md	docs: clean gateway runtime transition remnants (#3334)	2026-06-02 10:03:28 +08:00
AUTO_TITLE_GENERATION.md	docs: fix some broken links (#1864)	2026-04-05 15:35:42 +08:00
BLOCKING_IO_DETECTION.md	fix(agents): offload UploadsMiddleware uploads scan off the event loop (#3311)	2026-05-30 21:46:35 +08:00
CONFIGURATION.md	feat: upgrade MiniMax default model to M3 (#3357)	2026-06-03 17:04:16 +08:00
FILE_UPLOAD.md	fix(uploads): enforce streaming upload limits in gateway (#2589)	2026-05-01 20:19:30 +08:00
GUARDRAILS.md	fix: rename present_file to present_files in docs and prompts (#2393)	2026-04-21 16:10:14 +08:00
MCP_SERVER.md	docs: discourage MCP filesystem workspace config (#3141)	2026-05-22 09:19:23 +08:00
MEMORY_IMPROVEMENTS_SUMMARY.md	refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)	2026-03-14 22:55:52 +08:00
MEMORY_IMPROVEMENTS.md	fix(memory): inject stored facts into system prompt memory context (#1083)	2026-03-13 14:37:40 +08:00
MEMORY_SETTINGS_REVIEW.md	feat: support manual add and edit for memory facts (#1538)	2026-03-29 23:53:23 +08:00
memory-settings-sample.json	feat: support manual add and edit for memory facts (#1538)	2026-03-29 23:53:23 +08:00
middleware-execution-flow.md	feat(loop-detection): defer warning injection (#2752)	2026-05-21 14:36:07 +08:00
PATH_EXAMPLES.md	refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)	2026-03-14 22:55:52 +08:00
plan_mode_usage.md	refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)	2026-03-14 22:55:52 +08:00
README.md	chore: add sandbox memory profiling tools (#3249)	2026-06-03 22:02:27 +08:00
REPLAY_E2E.md	test(e2e): deterministic record/replay front-back contract verification (#3365)	2026-06-08 12:35:03 +08:00
rfc-create-deerflow-agent.md	feat: add create_deerflow_agent SDK entry point (Phase 1) (#1203)	2026-03-29 15:31:18 +08:00
rfc-extract-shared-modules.md	refactor: extract shared skill installer and upload manager to harness (#1202)	2026-03-25 16:28:33 +08:00
rfc-grep-glob-tools.md	feat(sandbox): add built-in grep and glob tools (#1784)	2026-04-03 16:03:06 +08:00
SANDBOX_MEMORY_PROFILING.md	chore: add sandbox memory profiling tools (#3249)	2026-06-03 22:02:27 +08:00
SETUP.md	fix(harness): resolve runtime paths from project root (#2642)	2026-05-01 22:19:50 +08:00
STREAMING.md	fix(backend): stream DeerFlowClient AI text as token deltas (#1969) (#1974)	2026-04-10 18:16:38 +08:00
summarization.md	fix(middleware): avoid rescuing non-skill tool outputs during summarization (#2458)	2026-04-24 21:19:46 +08:00
task_tool_improvements.md	refactor: split backend into harness (deerflow.*) and app (app.*) (#1131)	2026-03-14 22:55:52 +08:00
TITLE_GENERATION_IMPLEMENTATION.md	feat(persistence):Unified persistence layer with event store, feedback, and rebase cleanup (#2134)	2026-04-26 11:09:55 +08:00
TODO.md	docs: clean standalone LangGraph server remnants (#3301)	2026-05-29 11:36:45 +08:00

README.md

Documentation

This directory contains detailed documentation for the DeerFlow backend.

Quick Links

Document	Description
ARCHITECTURE.md	System architecture overview
API.md	Complete API reference
AUTH_DESIGN.md	User authentication, CSRF, and per-user isolation design
CONFIGURATION.md	Configuration options
SETUP.md	Quick setup guide

Feature Documentation

Document	Description
STREAMING.md	Token-level streaming design: Gateway vs DeerFlowClient paths, `stream_mode` semantics, per-id dedup
FILE_UPLOAD.md	File upload functionality
PATH_EXAMPLES.md	Path types and usage examples
SANDBOX_MEMORY_PROFILING.md	Sandbox memory baseline and runtime comparison guide
summarization.md	Context summarization feature
plan_mode_usage.md	Plan mode with TodoList
AUTO_TITLE_GENERATION.md	Automatic title generation

Development

Document	Description
TODO.md	Planned features and known issues

Getting Started

- New to DeerFlow? Start with SETUP.md for quick installation.
- Configuring the system? See CONFIGURATION.md.
- Understanding the architecture? Read ARCHITECTURE.md.
- Building integrations? Check API.md for the API reference.

Document Organization

docs/
├── README.md                  # This file
├── ARCHITECTURE.md            # System architecture
├── API.md                     # API reference
├── AUTH_DESIGN.md             # User authentication and isolation design
├── CONFIGURATION.md           # Configuration guide
├── SETUP.md                   # Setup instructions
├── FILE_UPLOAD.md             # File upload feature
├── PATH_EXAMPLES.md           # Path usage examples
├── summarization.md           # Summarization feature
├── plan_mode_usage.md         # Plan mode feature
├── STREAMING.md               # Token-level streaming design
├── AUTO_TITLE_GENERATION.md   # Title generation
├── TITLE_GENERATION_IMPLEMENTATION.md  # Title implementation details
└── TODO.md                    # Roadmap and issues