deer-flow

mirror of https://github.com/bytedance/deer-flow.git synced 2026-04-25 11:18:22 +00:00

test(skills): add evaluation + trigger analysis for systematic-literature-review (#2061)

* test(skills): add trigger eval set for systematic-literature-review skill

20 eval queries (10 should-trigger, 10 should-not-trigger) for use with
skill-creator's run_eval.py. Includes real-world SLR queries contributed
by @VANDRANKI (issue #1862 author) and edge cases for routing
disambiguation with academic-paper-review.

* test(skills): add grader expectations for SLR skill evaluation

5 eval cases with 39 expectations covering:
- Standard SLR flow (APA/BibTeX/IEEE format selection)
- Keyword extraction and search behavior
- Subagent dispatch for metadata extraction
- Report structure (themes, convergences, gaps, per-paper annotations)
- Negative case: single-paper routing to academic-paper-review
- Edge case: implicit SLR without explicit keywords

* refactor(skills): shorten SLR description for better trigger rate

Reduce description from 833 to 344 chars. Key changes:
- Lead with "systematic literature review" as primary trigger phrase
- Strengthen single-paper exclusion: "Not for single-paper tasks"
- Remove verbose example patterns that didn't improve routing

Tested with run_eval.py (10 runs/query):
- False positive "best paper on RL": 67% → 20% (improved)
- True positive explicit SLR query: ~30% (unchanged)

Low recall is a routing-layer limitation, not a description issue —
see PR description for full analysis.
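
As a rough illustration of how the per-query percentages above could be aggregated, here is a minimal sketch. It assumes each query is run a fixed number of times and that only a count of runs in which the skill triggered is recorded. The `QueryResult` type, the `summarize` helper, and the sample counts are hypothetical and are not taken from run_eval.py; the counts were chosen only to mirror the 20% and ~30% figures reported above.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Hypothetical record for one eval query (not run_eval.py's actual format)."""
    query: str
    should_trigger: bool   # label from the eval set
    triggered_runs: int    # runs in which the skill was selected
    total_runs: int        # e.g. 10 runs per query

    @property
    def trigger_rate(self) -> float:
        return self.triggered_runs / self.total_runs


def summarize(results: list[QueryResult]) -> dict[str, float]:
    """Average trigger rate over should-trigger and should-not-trigger queries."""
    pos = [r.trigger_rate for r in results if r.should_trigger]
    neg = [r.trigger_rate for r in results if not r.should_trigger]
    return {
        "recall": sum(pos) / len(pos) if pos else 0.0,
        "false_positive_rate": sum(neg) / len(neg) if neg else 0.0,
    }


# Illustrative counts only, picked to match the rates quoted in the commit message.
results = [
    QueryResult("explicit SLR query", True, triggered_runs=3, total_runs=10),
    QueryResult("best paper on RL", False, triggered_runs=2, total_runs=10),
]
summary = summarize(results)
print(summary)
```

Averaging per-query rates, rather than pooling all runs, keeps each query equally weighted even if run counts differ between queries.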

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

2026-04-10 18:02:45 +08:00

academic-paper-review

feat(skills): add academic-paper-review, code-documentation, and newsletter-generation skills (#1861)

2026-04-05 10:19:35 +08:00

bootstrap

feat(agent):Supports custom agent and chat experience with refactoring (#957)

2026-03-03 21:32:01 +08:00

chart-visualization

feat(agent):Supports custom agent and chat experience with refactoring (#957)

2026-03-03 21:32:01 +08:00

claude-to-deerflow

feat: add claude-to-deerflow skill for DeerFlow API integration (#1024)