Xinmin Zeng 799bef6d9d
fix(replay-e2e): match by conversation, not the living system prompt (#3436)
* fix(replay-e2e): match by conversation, not the living system prompt

The model-replay match key hashed the full input including the lead-agent
system prompt. That prompt is edited frequently (e.g. #3195 added a "File
Editing Workflow" section), so the committed fixture went stale the moment
the prompt changed on main — turning the Layer-2 render gate RED on every
unrelated PR (#3430, #3432, ...). This was a self-inflicted false positive.

Root-cause fix:
- replay_provider._canonical_messages now EXCLUDES the system message from
  the hash. The conversation (human/ai/tool) is the stable contract that
  identifies a recorded turn; the system prompt is an internal detail not
  part of the front-back contract under test. (Mirrors how open-design keys
  its mock picker on the user prompt, not the system internals.) Proven
  robust: injecting a prompt edit no longer causes a replay miss.
- Layer-1 golden was BLIND to replay misses: the gateway swallows a miss
  into an assistant error message, so the shape-only golden stayed green on
  a stale fixture. It now inspects replay_provider.replay_misses() and fails
  loud. (Layer-2 already fails on a miss.)
- Re-recorded write_read_file.ultra fixture + regenerated golden under the
  new conversation-only hash.
- Layer-2 render spec: assert the in-graph auto-title (deterministic); the
  follow-up suggestion is fired async and depends on a clean JSON model
  output, so assert it only when the fixture captured one — never gate on
  its absence (recording flakiness must not block CI).
- docs: REPLAY_E2E.md updated.

Verified: Layer-1 golden green (no miss), Layer-2 both specs green,
CI=true make test 4033 passed / 0 failed, frontend pnpm check clean.

* test(replay-e2e): restore suggestions coverage with a reliable capture

Addresses review feedback (the suggestion path was dropped from Layer-2):

- record spec now waits for the `/suggestions` response before checking
  capture stability, so the recorded fixture reliably includes the
  frontend-fired suggestions turn (previously the stability window could
  return before suggestions fired, yielding a fixture without it).
- Re-recorded write_read_file.ultra: 5 turns (write_file, auto-title,
  read_file, answer, suggestions). Golden unchanged — suggestions is a
  separate /suggestions call, not part of the /runs/stream SSE sequence.
- Layer-2 spec: restore the hard `EXPECTED_SUGGESTION` assertion. With the
  record spec now waiting for /suggestions, a fixture missing the suggestion
  turn means a broken recording and must fail loud, not pass silently.

Verified: Layer-1 golden green (no miss), Layer-2 both specs green
(auto-title + suggestion render), frontend pnpm check clean.

* ci: re-trigger (flaky Docker Hub image pull in sandbox e2e, unrelated)

backend-unit-tests failed only in test_sandbox_orphan_reconciliation_e2e.py
with 'docker pull busybox:latest ... context deadline exceeded' — a CI-runner
network flake reaching Docker Hub, not related to this docs/tests-only change.
Empty commit to re-run CI.

---------

Co-authored-by: DanielWalnut <45447813+hetaoBackend@users.noreply.github.com>
2026-06-08 17:32:41 +08:00
..

Documentation

This directory contains detailed documentation for the DeerFlow backend.

Document Description
ARCHITECTURE.md System architecture overview
API.md Complete API reference
AUTH_DESIGN.md User authentication, CSRF, and per-user isolation design
CONFIGURATION.md Configuration options
SETUP.md Quick setup guide

Feature Documentation

Document Description
STREAMING.md Token-level streaming design: Gateway vs DeerFlowClient paths, stream_mode semantics, per-id dedup
FILE_UPLOAD.md File upload functionality
PATH_EXAMPLES.md Path types and usage examples
SANDBOX_MEMORY_PROFILING.md Sandbox memory baseline and runtime comparison guide
summarization.md Context summarization feature
plan_mode_usage.md Plan mode with TodoList
AUTO_TITLE_GENERATION.md Automatic title generation

Development

Document Description
TODO.md Planned features and known issues

Getting Started

  1. New to DeerFlow? Start with SETUP.md for quick installation
  2. Configuring the system? See CONFIGURATION.md
  3. Understanding the architecture? Read ARCHITECTURE.md
  4. Building integrations? Check API.md for API reference

Document Organization

docs/
├── README.md                  # This file
├── ARCHITECTURE.md            # System architecture
├── API.md                     # API reference
├── AUTH_DESIGN.md             # User authentication and isolation design
├── CONFIGURATION.md           # Configuration guide
├── SETUP.md                   # Setup instructions
├── FILE_UPLOAD.md             # File upload feature
├── PATH_EXAMPLES.md           # Path usage examples
├── summarization.md           # Summarization feature
├── plan_mode_usage.md         # Plan mode feature
├── STREAMING.md               # Token-level streaming design
├── AUTO_TITLE_GENERATION.md   # Title generation
├── TITLE_GENERATION_IMPLEMENTATION.md  # Title implementation details
└── TODO.md                    # Roadmap and issues