* feat(eval): add report quality evaluation module
Addresses issue #773 - How to evaluate generated report quality objectively.
This module provides two evaluation approaches:
1. Automated metrics (no LLM required):
- Citation count and source diversity
- Word count compliance per report style
- Section structure validation
- Image inclusion tracking
2. LLM-as-Judge evaluation:
- Factual accuracy scoring
- Completeness assessment
- Coherence evaluation
- Relevance and citation quality checks
The combined evaluator provides a final score (1-10) and letter grade (A+ to F).
Files added:
- src/eval/__init__.py
- src/eval/metrics.py
- src/eval/llm_judge.py
- src/eval/evaluator.py
- tests/unit/eval/test_metrics.py
- tests/unit/eval/test_evaluator.py
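For a rough sense of the final grading step, a minimal TypeScript sketch (the evaluator itself is Python; the cut-offs below are illustrative assumptions, not the module's actual thresholds):

```typescript
// Illustrative only: assumed cut-offs for mapping a 1-10 score to a grade.
const GRADE_CUTOFFS: Array<[number, string]> = [
  [9.5, "A+"], [9.0, "A"], [8.5, "A-"],
  [8.0, "B+"], [7.5, "B"], [7.0, "B-"],
  [6.5, "C+"], [6.0, "C"], [5.0, "D"],
];

function scoreToGrade(score: number): string {
  for (const [cutoff, grade] of GRADE_CUTOFFS) {
    if (score >= cutoff) return grade;
  }
  return "F"; // below every cutoff
}
```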
* feat(eval): integrate report evaluation with web UI
This commit adds the web UI integration for the evaluation module:
Backend:
- Add EvaluateReportRequest/Response models in src/server/eval_request.py
- Add /api/report/evaluate endpoint to src/server/app.py
Frontend:
- Add evaluateReport API function in web/src/core/api/evaluate.ts (sketched below)
- Create EvaluationDialog component with grade badge, metrics display,
and optional LLM deep evaluation
- Add evaluation button (graduation cap icon) to research-block.tsx toolbar
- Add i18n translations for English and Chinese
The evaluation UI allows users to:
1. View quick metrics-only evaluation (instant)
2. Optionally run deep LLM-based evaluation for detailed analysis
3. See grade (A+ to F), score (1-10), and metric breakdown
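A hedged TypeScript sketch of the client call; the field names are assumptions, not the actual EvaluateReportRequest/Response schema:

```typescript
// Hypothetical request/response shapes; the real models live in
// src/server/eval_request.py and may differ.
interface EvaluateReportRequest {
  report: string;         // generated report markdown
  query?: string;         // original research query, if known
  reportStyle?: string;   // style used for word-count targets
  useLlmJudge?: boolean;  // opt in to the deep LLM evaluation
}

interface EvaluateReportResponse {
  score: number;                    // final 1-10 score
  grade: string;                    // letter grade, A+ to F
  metrics: Record<string, number>;  // per-metric breakdown
}

async function evaluateReport(
  req: EvaluateReportRequest,
): Promise<EvaluateReportResponse> {
  // NOTE: a later commit routes this through resolveServiceURL().
  const res = await fetch("/api/report/evaluate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`Evaluation failed: ${res.status}`);
  return (await res.json()) as EvaluateReportResponse;
}
```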
* feat(eval): improve evaluation reliability and add LLM judge tests
- Extract MAX_REPORT_LENGTH constant in llm_judge.py for maintainability
- Add comprehensive unit tests for LLMJudge class (parse_response,
calculate_weighted_score, evaluate with mocked LLM)
- Pass reportStyle prop to EvaluationDialog for accurate evaluation criteria
- Add researchQueries store map to reliably associate queries with research
- Add getResearchQuery helper to retrieve a query by researchId (see the sketch after this list)
- Remove unused imports in test_metrics.py
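A rough sketch of the store additions, assuming a zustand-style store (the actual store shape may differ):

```typescript
import { create } from "zustand";

// Hypothetical slice: map researchId -> the query that produced it.
interface ResearchQueryState {
  researchQueries: Map<string, string>;
  setResearchQuery: (researchId: string, query: string) => void;
}

export const useResearchQueryStore = create<ResearchQueryState>((set) => ({
  researchQueries: new Map(),
  setResearchQuery: (researchId, query) =>
    set((state) => {
      const next = new Map(state.researchQueries);
      next.set(researchId, query);
      return { researchQueries: next };
    }),
}));

// Helper mirroring getResearchQuery from the commit.
export function getResearchQuery(researchId: string): string | undefined {
  return useResearchQueryStore.getState().researchQueries.get(researchId);
}
```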
* fix(eval): use resolveServiceURL for evaluate API endpoint
The evaluateReport function was using a relative URL '/api/report/evaluate'
which sent requests to the Next.js server instead of the FastAPI backend.
Changed it to use resolveServiceURL(), consistent with the other API functions.
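The corrected call, sketched; the import path and the exact path format resolveServiceURL expects are assumptions:

```typescript
import { resolveServiceURL } from "./resolve-service-url"; // import path assumed

// Before: fetch("/api/report/evaluate", ...) was resolved against the
// Next.js origin. After: build the URL against the FastAPI backend,
// consistent with the other API helpers.
async function postEvaluate(body: unknown) {
  return fetch(resolveServiceURL("report/evaluate"), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}
```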
* fix: improve type accuracy and React hooks in evaluation components
- Fix get_word_count_target return type from Optional[Dict] to Dict since it always returns a value via default fallback
- Fix useEffect dependency issue in EvaluationDialog using useRef to prevent unwanted re-evaluations (see the sketch after this list)
- Add aria-label to GradeBadge for screen reader accessibility
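A minimal sketch of the useRef guard; useEvaluateOnOpen and its parameters are invented for illustration:

```typescript
import { useEffect, useRef } from "react";

// Hypothetical helper: run the evaluation once per dialog open, instead of
// re-running whenever an unstable dependency changes identity.
function useEvaluateOnOpen(open: boolean, evaluate: () => void) {
  const hasRunRef = useRef(false);
  useEffect(() => {
    if (open && !hasRunRef.current) {
      hasRunRef.current = true;
      evaluate();
    } else if (!open) {
      hasRunRef.current = false; // reset so reopening evaluates again
    }
  }, [open, evaluate]);
}
```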
* fix(editor): stop tiptap-markdown from escaping LaTeX inside math spans
When editing reports, tiptap-markdown escapes special characters (*, _, [, ])
which corrupts LaTeX formulas. This fix:
1. Adds unescapeLatexInMath() function to reverse markdown escaping within
math delimiters ($...$ and $$...$$)
2. Applies the unescape function in the editor's onChange callback to clean
the markdown before storing it
3. Adds comprehensive tests covering edge cases and round-trip scenarios
The fix ensures formulas like $(f * g)[n]$ remain unescaped when editing,
preventing display errors after save/reload.
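A minimal sketch of what unescapeLatexInMath() might look like; the real function may cover more escape sequences:

```typescript
// Hypothetical sketch: undo markdown escaping of *, _, [, ] inside math spans.
export function unescapeLatexInMath(markdown: string): string {
  // Match $$...$$ blocks first, then inline $...$ spans.
  return markdown.replace(/\$\$[\s\S]+?\$\$|\$[^$\n]+\$/g, (span) =>
    span.replace(/\\([*_[\]])/g, "$1"),
  );
}
```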
* fix(editor): normalize math delimiters between the display component and the Tiptap editor
This fix addresses the issue where math formulas become corrupted or
incorrectly displayed after editing the generated report in the editor.
**Root Cause:**
The issue occurred due to incompatibility between markdown processing
in the display component and the Tiptap editor:
1. Display component used \[\] and \(\) LaTeX delimiters
2. Tiptap Mathematics extension expects $ and $$ delimiters
3. tiptap-markdown didn't have built-in math node serialization
4. Math syntax was lost/corrupted during editor save operations
**Solution Implemented:**
1. Created MathematicsWithMarkdown extension that adds markdown
serialization support to Tiptap's Mathematics nodes
2. Added math delimiter normalization functions:
   - normalizeMathForEditor(): Converts LaTeX delimiters to $/$$
   - normalizeMathForDisplay(): Standardizes all delimiters to $$ (both sketched after this list)
3. Updated Markdown component to use new normalization
4. Updated ReportEditor to normalize content before loading
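Minimal sketches of the two normalizers, assuming straightforward regex rewrites; the real helpers in web/src/core/utils/markdown.ts may handle more edge cases (escapes, code fences, currency dollars):

```typescript
// \[...\] -> $$...$$ and \(...\) -> $...$, the delimiters Tiptap expects.
export function normalizeMathForEditor(markdown: string): string {
  return markdown
    .replace(/\\\[([\s\S]+?)\\\]/g, (_m, body: string) => `$$${body}$$`)
    .replace(/\\\((.+?)\\\)/g, (_m, body: string) => `$${body}$`);
}

// Standardize every delimiter form to $$...$$ for the display component.
export function normalizeMathForDisplay(markdown: string): string {
  return normalizeMathForEditor(markdown).replace(
    /(?<!\$)\$(?!\$)([^$\n]+?)\$(?!\$)/g,
    (_m, body: string) => `$$${body}$$`,
  );
}
```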
**Changes:**
- web/src/components/editor/math-serializer.ts (new)
- web/src/components/editor/extensions.tsx
- web/src/components/editor/index.tsx
- web/src/components/deer-flow/markdown.tsx
- web/src/core/utils/markdown.ts
- web/tests/markdown-math-editor.test.ts (new)
- web/tests/markdown-katex.test.ts
**Testing:**
- Added 15 comprehensive tests for math normalization round-trip
- All tests passing (math editor + existing katex tests)
- Verified TypeScript compilation and linting
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>
* feat: implement backend for adjusting report style
* feat: add web UI for report style adjustment
* fix: fix test cases
* fix: fix typing
---------
Co-authored-by: Henry Li <henry1943@163.com>
* feat: display local search tool call results
* chore: add file copyright
* fix: missing edit-plan interrupt feedback
* feat: disable pasting html into input box