* feat(uploads): guide agent to use grep/glob/read_file for uploaded documents
Add workflow guidance to the <uploaded_files> context block so the agent
knows to use grep and glob (added in #1784) alongside read_file when
working with uploaded documents, rather than falling back to web search.
This is the final piece of the three-PR PDF agentic search pipeline:
- PR1 (#1727): pymupdf4llm converter produces structured Markdown with headings
- PR2 (#1738): document outline injected into agent context with line numbers
- PR3 (this): agent guided to use outline + grep + read_file workflow
* feat(uploads): add file-first priority and fallback guidance to uploaded_files context
* fix(uploads): handle split-bold headings and ** ** artefacts in extract_outline
- Add _clean_bold_title() to merge adjacent bold spans (** **) produced
by pymupdf4llm when bold text crosses span boundaries
- Add _SPLIT_BOLD_HEADING_RE (Style 3) to recognise **<num>** **<title>**
headings common in academic papers; excludes pure-number table headers
and rows with more than 4 bold blocks
- When outline is empty, read first 5 non-empty lines of the .md as a
content preview and surface a grep hint in the agent context
- Update _format_file_entry to render the preview + grep hint instead of
silently omitting the outline section
- Add 3 new extract_outline tests and 2 new middleware tests (65 total)
* fix(uploads): address Copilot review comments on extract_outline regex
- Replace ASCII [A-Za-z] guard with negative lookahead to support non-ASCII
titles (e.g. **1** **概述**); pure-numeric/punctuation blocks still excluded
- Replace .+ with [^*]+ and cap repetition at {0,2} (four blocks total) to
keep _SPLIT_BOLD_HEADING_RE linear and avoid ReDoS on malformed input
- Remove now-redundant len(blocks) <= 4 code-level check (enforced by regex)
- Log debug message with exc_info when preview extraction fails