mirror of
https://github.com/linyqh/NarratoAI.git
synced 2025-12-13 20:42:48 +00:00
Compare commits
90 Commits
90 commits (author and date columns were not captured by the mirror):

4f964ad98d, dfb96e9b0f, 97bb59220f, 169daac94d, c0e3ff045a, 7b9ef2f244, 854cfab460, 474ebe46e2, 46042d17d6, eb57d2a0fe, d5f089c9a7, 77c0aa47f2, efa02d83ca, eca1fcbe67, d7b1b51a36, 4423195313, 4b0f7c3bb9, bad4a95ced, a99d752069, 6b8082244c, 52f96f9eae, 2c5c7cbd77, 303ba571cc, 067d82885b, a26c07d3dc, 207b49c9cc, f2ba9689e1, 87afe738fe, 74b52eec7b, b3fd32569e, b5548b050d, 95e3b66bc7, b1bcedd5d5, 81d8c55580, c41bd682a9, 9811607756, d8a06cc591, 287cddcc35, bb7362809a, 07da580919, aebd169900, a184662f8b, 787d17a1a9, e7db1668f8, e389412dc2, aff6aca00c, 7ae4263943, cd3a5bc837, 4dc1448154, 33fc3dab10, a15ab4c944, d83863182a, 1c8b526c3c, 4ca7ed9721, c7fdb3fc94, 9132e2b148, 271401af99, f70cfbab46, 5ef9f4a10c, d55754c7fb, e76031832c, eadaf1be6e, 79b0d613e3, 706d73383e, 2e0c492778, 13a87e2a00, 458071d583, 9c4b3338c2, 053212b182, 6f48fa2563, 18d2efd664, 70b8b49e41, c3d855c547, f740e5a4bd, 72165dbcd9, ebdae9998d, 316be8f422, 3537c19f4b, 2ac74132fc, 6bbe4bc14b, a94baee22a, 0d944413ab, a70d396143, 05fb2681d5, ef68697491, f2d652e7a8, ca05440fc0, 57cafaa73f, 97b30e4390, 716b22ef9a
README-ja.md (84 lines removed)

@ -1,84 +0,0 @@
<div align="center">
<h1 align="center" style="font-size: 2cm;"> NarratoAI 😎📽️ </h1>
<h3 align="center">All-in-one AI film commentary and automated video editing tool 🎬🎞️</h3>

<h3>📖 <a href="README-cn.md">简体中文</a> | <a href="README.md">English</a> | 日本語 </h3>
<div align="center">

[//]: # ( <a href="https://trendshift.io/repositories/8731" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8731" alt="harry0703%2FNarratoAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>)
</div>
<br>
NarratoAI is an automated video narration tool that uses LLMs to deliver an all-in-one solution for script writing, automated video editing, voice-over, and subtitle generation.
<br>

[](https://github.com/linyqh/NarratoAI)
[](https://github.com/linyqh/NarratoAI/blob/main/LICENSE)
[](https://github.com/linyqh/NarratoAI/issues)
[](https://github.com/linyqh/NarratoAI/stargazers)

<a href="https://discord.gg/uVAJftcm" target="_blank">💬 Join the open-source community on Discord to get the latest project updates.</a>

<h2><a href="https://p9mf6rjv3c.feishu.cn/wiki/SP8swLLZki5WRWkhuFvc2CyInDg?from=from_copylink" target="_blank">🎉🎉🎉 Official documentation 🎉🎉🎉</a> </h2>
<h3>Home</h3>

<h3>Video review interface</h3>

</div>

## Latest news
- 2024.11.24 Discord community opened: https://discord.gg/uVAJftcm
- 2024.11.11 Moved to an open-source community; everyone is welcome to join! [Join the official community](https://github.com/linyqh/NarratoAI/wiki)
- 2024.11.10 Official documentation released; see the [official documentation](https://p9mf6rjv3c.feishu.cn/wiki/SP8swLLZki5WRWkhuFvc2CyInDg) for details
- 2024.11.10 New version v0.3.5 released; optimized the video editing workflow

## Roadmap 🥳
- [x] Windows all-in-one package released
- [x] Optimized the story generation workflow and improved output quality
- [x] Version 0.3.5 all-in-one package released
- [x] Video understanding with Alibaba's Qwen2-VL large model
- [x] Short-drama commentary support
- [x] One-click material merging
- [x] One-click transcription
- [x] One-click cache clearing
- [ ] Export of JianYing (CapCut CN) drafts
- [ ] Main-character face matching
- [ ] Automatic matching based on voice-over, script, and video material
- [ ] More TTS engines
- [ ] ...

## System requirements 📦

- Recommended minimum: 4-core CPU or better, 8 GB RAM or more; a GPU is not required
- Windows 10 or macOS 11.0 and above

## Feedback and suggestions 📢

👏 1. File an [issue](https://github.com/linyqh/NarratoAI/issues) or a [pull request](https://github.com/linyqh/NarratoAI/pulls)

💬 2. [Join the open-source community chat group](https://github.com/linyqh/NarratoAI/wiki)

📷 3. Follow the official account [NarratoAI助手] for the latest updates

## Reference projects 📚
- https://github.com/FujiwaraChoki/MoneyPrinter
- https://github.com/harry0703/MoneyPrinterTurbo

This project was refactored on top of the projects above, with film commentary features added. Many thanks to the original authors 🥳🥳🥳

## Buy the author a coffee ☕️
<div style="display: flex; justify-content: space-between;">
  <img src="https://github.com/user-attachments/assets/5038ccfb-addf-4db1-9966-99415989fd0c" alt="Image 1" style="width: 350px; height: 350px; margin: auto;"/>
  <img src="https://github.com/user-attachments/assets/07d4fd58-02f0-425c-8b59-2ab94b4f09f8" alt="Image 2" style="width: 350px; height: 350px; margin: auto;"/>
</div>

## License 📝

See the [`LICENSE`](LICENSE) file

## Star History

[](https://star-history.com/#linyqh/NarratoAI&Date)
@ -4,7 +4,7 @@

 <h3 align="center">One-stop AI film commentary + automated editing tool 🎬🎞️</h3>

-<h3>📖 <a href="README-en.md">English</a> | 简体中文 | <a href="README-ja.md">日本語</a> </h3>
+<h3>📖 <a href="README-en.md">English</a> | 简体中文 </h3>
 <div align="center">

 [//]: # ( <a href="https://trendshift.io/repositories/8731" target="_blank"><img src="https://trendshift.io/api/badge/repositories/8731" alt="harry0703%2FNarratoAI | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>)
@ -31,6 +31,7 @@ NarratoAI is an automated film commentary tool that uses LLMs for script writing,

 This project is for learning and research use only; commercial use is not permitted. Contact the author for a commercial license.

 ## Latest news
+- 2025.11.20 Released version 0.7.5, adding [IndexTTS2](https://github.com/index-tts/index-tts) voice cloning support
 - 2025.10.15 Released version 0.7.3, managing model vendors with [LiteLLM](https://github.com/BerriAI/litellm)
 - 2025.09.10 Released version 0.7.2, adding Tencent Cloud TTS
 - 2025.08.18 Released version 0.7.1, supporting **voice cloning** and the latest large models
@ -52,6 +52,7 @@ def save_config():

         _cfg["soulvoice"] = soulvoice
         _cfg["ui"] = ui
         _cfg["tts_qwen"] = tts_qwen
+        _cfg["indextts2"] = indextts2
         f.write(toml.dumps(_cfg))

@ -65,6 +66,7 @@ soulvoice = _cfg.get("soulvoice", {})

 ui = _cfg.get("ui", {})
 frames = _cfg.get("frames", {})
 tts_qwen = _cfg.get("tts_qwen", {})
+indextts2 = _cfg.get("indextts2", {})

 hostname = socket.gethostname()
@ -187,8 +187,27 @@ class LiteLLMVisionProvider(VisionModelProvider):

         # Call LiteLLM
         try:
+            # Prepare parameters
+            effective_model_name = self.model_name
+
+            # Special handling for SiliconFlow
+            if self.model_name.lower().startswith("siliconflow/"):
+                # Swap the provider prefix to openai
+                if "/" in self.model_name:
+                    effective_model_name = f"openai/{self.model_name.split('/', 1)[1]}"
+                else:
+                    effective_model_name = f"openai/{self.model_name}"
+
+                # Make sure OPENAI_API_KEY is set (if it is not already)
+                import os
+                if not os.environ.get("OPENAI_API_KEY") and os.environ.get("SILICONFLOW_API_KEY"):
+                    os.environ["OPENAI_API_KEY"] = os.environ.get("SILICONFLOW_API_KEY")
+
+                # Make sure base_url is set (if it is not already)
+                if not hasattr(self, '_api_base'):
+                    self._api_base = "https://api.siliconflow.cn/v1"
+
             completion_kwargs = {
-                "model": self.model_name,
+                "model": effective_model_name,
                 "messages": messages,
                 "temperature": kwargs.get("temperature", 1.0),
                 "max_tokens": kwargs.get("max_tokens", 4000)

@ -198,6 +217,12 @@ class LiteLLMVisionProvider(VisionModelProvider):

             if hasattr(self, '_api_base'):
                 completion_kwargs["api_base"] = self._api_base

+            # Allow api_key and api_base to be passed in dynamically
+            if "api_key" in kwargs:
+                completion_kwargs["api_key"] = kwargs["api_key"]
+            if "api_base" in kwargs:
+                completion_kwargs["api_base"] = kwargs["api_base"]
+
             response = await acompletion(**completion_kwargs)

             if response.choices and len(response.choices) > 0:
@ -346,8 +371,27 @@ class LiteLLMTextProvider(TextModelProvider):

         messages = self._build_messages(prompt, system_prompt)

+        # Prepare parameters
+        effective_model_name = self.model_name
+
+        # Special handling for SiliconFlow
+        if self.model_name.lower().startswith("siliconflow/"):
+            # Swap the provider prefix to openai
+            if "/" in self.model_name:
+                effective_model_name = f"openai/{self.model_name.split('/', 1)[1]}"
+            else:
+                effective_model_name = f"openai/{self.model_name}"
+
+            # Make sure OPENAI_API_KEY is set (if it is not already)
+            import os
+            if not os.environ.get("OPENAI_API_KEY") and os.environ.get("SILICONFLOW_API_KEY"):
+                os.environ["OPENAI_API_KEY"] = os.environ.get("SILICONFLOW_API_KEY")
+
+            # Make sure base_url is set (if it is not already)
+            if not hasattr(self, '_api_base'):
+                self._api_base = "https://api.siliconflow.cn/v1"
+
         completion_kwargs = {
-            "model": self.model_name,
+            "model": effective_model_name,
             "messages": messages,
             "temperature": temperature
         }

@ -369,6 +413,12 @@ class LiteLLMTextProvider(TextModelProvider):

         if hasattr(self, '_api_base'):
             completion_kwargs["api_base"] = self._api_base

+        # Allow api_key and api_base to be passed in dynamically (fixes an auth issue)
+        if "api_key" in kwargs:
+            completion_kwargs["api_key"] = kwargs["api_key"]
+        if "api_base" in kwargs:
+            completion_kwargs["api_base"] = kwargs["api_base"]
+
         try:
             # Call LiteLLM (with automatic retry)
             response = await acompletion(**completion_kwargs)
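The SiliconFlow special-casing added in both providers boils down to a single rewrite of the model route; isolated as a sketch (the function name is illustrative, not from the project):

```python
def to_openai_route(model_name: str) -> str:
    """Rewrite a 'siliconflow/...' model name to LiteLLM's 'openai/...' route,
    so the request is sent through the OpenAI-compatible endpoint instead."""
    if model_name.lower().startswith("siliconflow/"):
        # Keep everything after the first slash, which may itself contain slashes
        return f"openai/{model_name.split('/', 1)[1]}"
    return model_name
```

`to_openai_route("siliconflow/Qwen/Qwen2.5-72B-Instruct")` yields `openai/Qwen/Qwen2.5-72B-Instruct`; any other name passes through unchanged.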
@ -251,7 +251,9 @@ class SubtitleAnalyzerAdapter:

             UnifiedLLMService.analyze_subtitle,
             subtitle_content=subtitle_content,
             provider=self.provider,
-            temperature=1.0
+            temperature=1.0,
+            api_key=self.api_key,
+            api_base=self.base_url
         )

         return {

@ -301,7 +303,9 @@ class SubtitleAnalyzerAdapter:

             system_prompt="你是一位专业的短视频解说脚本撰写专家。",
             provider=self.provider,
             temperature=temperature,
-            response_format="json"
+            response_format="json",
+            api_key=self.api_key,
+            api_base=self.base_url
         )

         # Clean up the JSON output
@ -1107,6 +1107,10 @@ def tts(

     if tts_engine == "edge_tts":
         logger.info("分发到 Edge TTS")
         return azure_tts_v1(text, voice_name, voice_rate, voice_pitch, voice_file)

+    if tts_engine == "indextts2":
+        logger.info("分发到 IndexTTS2")
+        return indextts2_tts(text, voice_name, voice_file, speed=voice_rate)
+
     # Fallback for unknown engine - default to azure v1
     logger.warning(f"未知的 TTS 引擎: '{tts_engine}', 将默认使用 Edge TTS (Azure V1)。")

@ -1541,8 +1545,8 @@ def tts_multiple(task_id: str, list_script: list, voice_name: str, voice_rate: f

                            f"或者使用其他 tts 引擎")
             continue
         else:
-            # The SoulVoice engine does not generate subtitle files
-            if is_soulvoice_voice(voice_name) or is_qwen_engine(tts_engine):
+            # The SoulVoice, Qwen3, and IndexTTS2 engines do not generate subtitle files
+            if is_soulvoice_voice(voice_name) or is_qwen_engine(tts_engine) or tts_engine == "indextts2":
                 # Get the duration of the actual audio file
                 duration = get_audio_duration_from_file(audio_file)
                 if duration <= 0:
@ -1943,4 +1947,127 @@ def parse_soulvoice_voice(voice_name: str) -> str:

     return voice_name


+def parse_indextts2_voice(voice_name: str) -> str:
+    """
+    Parse an IndexTTS2 voice name.
+    Supported format: indextts2:reference_audio_path
+    Returns the path of the reference audio file.
+    """
+    if voice_name.startswith("indextts2:"):
+        return voice_name[10:]  # strip the "indextts2:" prefix
+    return voice_name
+
+
+def indextts2_tts(text: str, voice_name: str, voice_file: str, speed: float = 1.0) -> Union[SubMaker, None]:
+    """
+    Zero-shot voice cloning via the IndexTTS2 API.
+
+    Args:
+        text: text to synthesize
+        voice_name: reference audio path (format: indextts2:path/to/audio.wav)
+        voice_file: output audio file path
+        speed: speech rate (not yet supported by this engine)
+
+    Returns:
+        SubMaker: subtitle maker with timing information, or None on failure
+    """
+    # Read configuration
+    api_url = config.indextts2.get("api_url", "http://192.168.3.6:8081/tts")
+    infer_mode = config.indextts2.get("infer_mode", "普通推理")
+    temperature = config.indextts2.get("temperature", 1.0)
+    top_p = config.indextts2.get("top_p", 0.8)
+    top_k = config.indextts2.get("top_k", 30)
+    do_sample = config.indextts2.get("do_sample", True)
+    num_beams = config.indextts2.get("num_beams", 3)
+    repetition_penalty = config.indextts2.get("repetition_penalty", 10.0)
+
+    # Resolve the reference audio path
+    reference_audio_path = parse_indextts2_voice(voice_name)
+
+    if not reference_audio_path or not os.path.exists(reference_audio_path):
+        logger.error(f"IndexTTS2 参考音频文件不存在: {reference_audio_path}")
+        return None
+
+    # Prepare the request payload
+    files = {
+        'prompt_audio': open(reference_audio_path, 'rb')
+    }
+
+    data = {
+        'text': text.strip(),
+        'infer_mode': infer_mode,
+        'temperature': temperature,
+        'top_p': top_p,
+        'top_k': top_k,
+        'do_sample': do_sample,
+        'num_beams': num_beams,
+        'repetition_penalty': repetition_penalty,
+    }
+
+    # Retry loop
+    for attempt in range(3):
+        try:
+            logger.info(f"第 {attempt + 1} 次调用 IndexTTS2 API")
+
+            # Configure proxies
+            proxies = {}
+            if config.proxy.get("http"):
+                proxies = {
+                    'http': config.proxy.get("http"),
+                    'https': config.proxy.get("https", config.proxy.get("http"))
+                }
+
+            # Call the API
+            response = requests.post(
+                api_url,
+                files=files,
+                data=data,
+                proxies=proxies,
+                timeout=120  # IndexTTS2 inference can take a while
+            )
+
+            if response.status_code == 200:
+                # Save the audio file
+                with open(voice_file, 'wb') as f:
+                    f.write(response.content)
+
+                logger.info(f"IndexTTS2 成功生成音频: {voice_file}, 大小: {len(response.content)} 字节")
+
+                # IndexTTS2 cannot produce precise subtitles, so return a simple SubMaker
+                sub_maker = SubMaker()
+                # Estimate the audio duration from the text length
+                estimated_duration_ms = max(1000, int(len(text) * 200))
+                sub_maker.create_sub((0, estimated_duration_ms * 10000), text)  # ms -> 100 ns units
+
+                return sub_maker
+
+            else:
+                logger.error(f"IndexTTS2 API 调用失败: {response.status_code} - {response.text}")
+
+        except requests.exceptions.Timeout:
+            logger.error(f"IndexTTS2 API 调用超时 (尝试 {attempt + 1}/3)")
+        except requests.exceptions.RequestException as e:
+            logger.error(f"IndexTTS2 API 网络错误: {str(e)} (尝试 {attempt + 1}/3)")
+        except Exception as e:
+            logger.error(f"IndexTTS2 TTS 处理错误: {str(e)} (尝试 {attempt + 1}/3)")
+        finally:
+            # Make sure the file handle is closed
+            try:
+                files['prompt_audio'].close()
+            except Exception:
+                pass
+
+        if attempt < 2:  # not the last attempt
+            time.sleep(2)  # wait 2 seconds before retrying
+            # Reopen the reference audio for the next attempt
+            try:
+                files['prompt_audio'] = open(reference_audio_path, 'rb')
+            except OSError:
+                pass
+
+    logger.error("IndexTTS2 TTS 生成失败,已达到最大重试次数")
+    return None
@ -1,5 +1,5 @@

 [app]
-project_version="0.7.4"
+project_version="0.7.5"

 # LLM API timeout configuration (seconds)
 llm_vision_timeout = 120  # base timeout for vision models

@ -115,10 +115,30 @@

 # Visit https://bailian.console.aliyun.com/?tab=model#/api-key to get your API key
 api_key = ""
 model_name = "qwen3-tts-flash"

+[indextts2]
+# IndexTTS2 voice cloning configuration
+# An open-source zero-shot voice cloning project that you deploy yourself
+# Project: https://github.com/index-tts/index-tts
+# Default API address (local deployment)
+api_url = "http://127.0.0.1:8081/tts"
+
+# Default reference audio path (optional)
+# reference_audio = "/path/to/reference_audio.wav"
+
+# Inference mode: 普通推理 (normal) / 快速推理 (fast)
+infer_mode = "普通推理"
+
+# Advanced parameters
+temperature = 1.0
+top_p = 0.8
+top_k = 30
+do_sample = true
+num_beams = 3
+repetition_penalty = 10.0
+
 [ui]
-# TTS engine selection
-# Options: edge_tts, azure_speech, soulvoice, tencent_tts, tts_qwen
+# TTS engine selection (edge_tts, azure_speech, soulvoice, tencent_tts, tts_qwen)
 tts_engine = "edge_tts"

 # Edge TTS configuration
Dockerfile (31 lines removed)

@ -1,31 +0,0 @@

ARG BASE=nvidia/cuda:12.1.0-devel-ubuntu22.04
FROM ${BASE}

# Environment variables
ENV http_proxy=http://host.docker.internal:7890
ENV https_proxy=http://host.docker.internal:7890
ENV DEBIAN_FRONTEND=noninteractive

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc g++ make git python3 python3-dev python3-pip python3-venv python3-wheel \
    espeak-ng libsndfile1-dev nano vim unzip wget xz-utils && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Working directory
WORKDIR /root/MiniCPM-V/

# Install Python dependencies
RUN git clone https://github.com/OpenBMB/MiniCPM-V.git && \
    cd MiniCPM-V && \
    pip3 install decord && \
    pip3 install --no-cache-dir -r requirements.txt && \
    pip3 install flash_attn

# Clear the proxy environment variables
ENV http_proxy=""
ENV https_proxy=""

# Set PYTHONPATH
ENV PYTHONPATH="/root/MiniCPM-V/"
@ -1,174 +0,0 @@

# Audio Volume Balancing Optimization - Summary

## Problem solved

✅ **Fixed**: the original video audio was much quieter than the TTS narration

### Original problem
- Even with the original audio set to 1.0 and the narration to 0.7, the original track was still far quieter than the narration
- Poor user experience; volumes had to be adjusted manually to hear the original audio

### Root causes
1. **Loudness mismatch**: TTS audio typically sits around -24 dB LUFS, while the original video track may be only -33 dB LUFS
2. **No normalization**: a plain volume multiplier cannot fix a loudness mismatch
3. **Bad defaults**: the default original-audio volume of 0.7 was too low

## Solution

### 1. Audio analysis tool ✅
- **File**: `app/services/audio_normalizer.py`
- **Features**: LUFS loudness analysis, RMS computation, audio normalization
- **Test results**:
  - TTS test audio: -24.15 LUFS
  - Original test audio: -32.95 LUFS
  - Suggested smart adjustment: TTS ×1.61, original ×3.00
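The suggested multipliers above follow from the standard decibel-to-amplitude relation (a gain of g dB multiplies amplitude by 10^(g/20)); a minimal sketch, where the -20 LUFS target and the ×3.0 cap are inferred from the reported numbers rather than taken from the project code:

```python
def gain_factor(current_lufs: float, target_lufs: float) -> float:
    """Linear amplitude multiplier that moves a track from its measured
    integrated loudness to the target loudness (both in LUFS, a dB scale)."""
    return 10 ** ((target_lufs - current_lufs) / 20)

# Reproducing the suggestions above with an assumed -20 LUFS target:
tts_gain = gain_factor(-24.15, -20.0)             # ~1.61
orig_gain = min(3.0, gain_factor(-32.95, -20.0))  # raw ~4.44, capped to 3.0
```

With that target, the measured -24.15 LUFS TTS track needs roughly ×1.61 and the -32.95 LUFS original roughly ×4.44, which the clamp reduces to the reported ×3.00.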
### 2. Configuration tweaks ✅
- **File**: `app/models/schema.py`
- **Changes**:
  - Default original-audio volume: 0.7 → 1.2
  - Maximum volume limit: 1.0 → 2.0
  - New switch for smart adjustment

### 3. Smart volume adjustment ✅
- **File**: `app/services/generate_video.py`
- **Feature**: automatically analyzes the loudness gap between tracks and computes suitable adjustment factors
- **Notes**: preserves the relative ratio set by the user and clamps the adjustment range

### 4. Configuration management ✅
- **File**: `app/config/audio_config.py`
- **Features**:
  - Volume configurations for different video types
  - Preset profiles (balanced, voice_focused, etc.)
  - Recommendations by content type

### 5. Task integration ✅
- **File**: `app/services/task.py`
- **Change**: automatically applies the optimized volume configuration
- **Compatibility**: backward compatible with existing settings

## Testing

### Functional tests ✅
```bash
python test_audio_optimization.py
```
- Audio analysis works
- Configuration system works
- Smart adjustment math is correct

### Example demos ✅
```bash
python examples/audio_volume_example.py
```
- Basic configuration usage
- Smart analysis demo
- Real-world scenarios

## Before and after

| Item | Before | After | Improvement |
|------|--------|-------|-------------|
| TTS volume | 0.7 | 0.8 (smart) | better balanced |
| Original volume | 1.0 | 1.3 (smart) | clearly louder |
| Loudness gap | ~9 dB | ~3 dB | much smaller |
| User experience | manual tweaking | automatic balance | noticeably better |

## Recommended configurations

### Mixed content (default)
```python
{
    'tts_volume': 0.8,
    'original_volume': 1.3,
    'bgm_volume': 0.3
}
```

### Original-audio-heavy content
```python
{
    'tts_volume': 0.6,
    'original_volume': 1.6,
    'bgm_volume': 0.1
}
```

### Educational videos
```python
{
    'tts_volume': 0.9,
    'original_volume': 0.8,
    'bgm_volume': 0.2
}
```

## Technical highlights

### Smart analysis
- Uses FFmpeg's loudnorm filter for LUFS analysis
- Falls back to RMS computation
- Automatically computes the best volume adjustment factors
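The FFmpeg-based measurement above usually amounts to running `ffmpeg -i input.wav -af loudnorm=print_format=json -f null -` and parsing the JSON block the filter prints to stderr; a minimal parser sketch (the helper name and sample values are illustrative, not from the project):

```python
import json
import re

def parse_loudnorm_output(stderr_text: str) -> float:
    """Extract the measured integrated loudness ("input_i", in LUFS) from
    the JSON block FFmpeg's loudnorm filter prints at the end of stderr."""
    match = re.search(r"\{[^{}]*\}\s*$", stderr_text.strip())
    if not match:
        raise ValueError("no loudnorm JSON block found in FFmpeg output")
    return float(json.loads(match.group(0))["input_i"])

# A trimmed example of what FFmpeg prints with -af loudnorm=print_format=json
sample = """[Parsed_loudnorm_0 @ 0x55d3]
{
    "input_i" : "-32.95",
    "input_tp" : "-10.20",
    "input_lra" : "6.50",
    "input_thresh" : "-43.10"
}"""
```

`parse_loudnorm_output(sample)` returns -32.95, which can then be fed into the gain computation; note that loudnorm reports the numeric fields as quoted strings, hence the `float(...)` conversion.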
### Flexible configuration
- Supports multiple video types
- Preset profiles
- User settings take precedence

### Performance
- Smart analysis is optional (on by default)
- Temporary files are cleaned up automatically
- Backward compatible with existing code

## File inventory

### Core files
- `app/services/audio_normalizer.py` - audio analysis and normalization
- `app/config/audio_config.py` - audio configuration management
- `app/services/generate_video.py` - smart adjustment integration
- `app/services/task.py` - task pipeline changes
- `app/models/schema.py` - updated configuration parameters

### Tests and docs
- `test_audio_optimization.py` - functional test script
- `examples/audio_volume_example.py` - usage examples
- `docs/audio_optimization_guide.md` - detailed guide
- `AUDIO_OPTIMIZATION_SUMMARY.md` - this summary

## Usage

### Automatic optimization (recommended)
The system applies the optimized configuration automatically; no extra steps are needed.

### Manual configuration
```python
# Apply a preset profile
volumes = AudioConfig.apply_volume_profile('original_focused')

# Get a recommendation for a content type
volumes = get_recommended_volumes_for_content('original_heavy')
```

### Disabling smart analysis
```python
# Set in schema.py
ENABLE_SMART_VOLUME = False
```

## Future improvements

1. **UI integration**: add volume configuration options to the WebUI
2. **Live preview**: preview volume adjustments in real time
3. **Machine learning**: learn the best configuration from user feedback
4. **Batch processing**: support batch audio normalization

## Conclusion

Audio loudness analysis plus smart volume adjustment solved the quiet-original-audio problem. The new system can:

1. **Detect** loudness mismatches automatically
2. **Adjust** the volume balance intelligently
3. **Stay compatible** with existing configuration
4. **Offer flexible** configuration options

Users now get a balanced mix and can hear both the original audio and the TTS narration clearly without manual volume tweaking.
@ -1,367 +0,0 @@

# NarratoAI LLM Service Migration Guide

## 📋 Overview

This guide helps developers migrate existing code from the old LLM call style to the new unified LLM service architecture. The new architecture provides better modularity, error handling, and configuration management.

## 🔄 Migration comparison

### Old vs new call style

#### 1. Creating a vision analyzer

**Old:**
```python
from app.utils import gemini_analyzer, qwenvl_analyzer

if provider == 'gemini':
    analyzer = gemini_analyzer.VisionAnalyzer(
        model_name=model,
        api_key=api_key,
        base_url=base_url
    )
elif provider == 'qwenvl':
    analyzer = qwenvl_analyzer.QwenAnalyzer(
        model_name=model,
        api_key=api_key,
        base_url=base_url
    )
```

**New:**
```python
from app.services.llm.unified_service import UnifiedLLMService

# Option 1: use the unified service directly
results = await UnifiedLLMService.analyze_images(
    images=images,
    prompt=prompt,
    provider=provider  # optional; defaults to the configured value
)

# Option 2: use the migration adapter (backward compatible)
from app.services.llm.migration_adapter import create_vision_analyzer
analyzer = create_vision_analyzer(provider, api_key, model, base_url)
results = await analyzer.analyze_images(images, prompt)
```

#### 2. Text generation

**Old:**
```python
from openai import OpenAI

client = OpenAI(api_key=api_key, base_url=base_url)
response = client.chat.completions.create(
    model=model,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt}
    ],
    temperature=temperature,
    response_format={"type": "json_object"}
)
result = response.choices[0].message.content
```

**New:**
```python
from app.services.llm.unified_service import UnifiedLLMService

result = await UnifiedLLMService.generate_text(
    prompt=prompt,
    system_prompt=system_prompt,
    temperature=temperature,
    response_format="json"
)
```

#### 3. Narration script generation

**Old:**
```python
from app.services.generate_narration_script import generate_narration

narration = generate_narration(
    markdown_content,
    api_key,
    base_url=base_url,
    model=model
)
# Parse the JSON and validate the format by hand
import json
narration_dict = json.loads(narration)['items']
```

**New:**
```python
from app.services.llm.unified_service import UnifiedLLMService

# Output format is validated automatically
narration_items = await UnifiedLLMService.generate_narration_script(
    prompt=prompt,
    validate_output=True  # validates the JSON format and fields automatically
)
```

## 📝 Migration steps

### Step 1: update the configuration file

**Old format:**
```toml
[app]
llm_provider = "openai"
openai_api_key = "sk-xxx"
openai_model_name = "gpt-4"

vision_llm_provider = "gemini"
gemini_api_key = "xxx"
gemini_model_name = "gemini-1.5-pro"
```

**New format:**
```toml
[app]
# Vision model configuration
vision_llm_provider = "gemini"
vision_gemini_api_key = "xxx"
vision_gemini_model_name = "gemini-2.0-flash-lite"
vision_gemini_base_url = "https://generativelanguage.googleapis.com/v1beta"

# Text model configuration
text_llm_provider = "openai"
text_openai_api_key = "sk-xxx"
text_openai_model_name = "gpt-4o-mini"
text_openai_base_url = "https://api.openai.com/v1"
```

### Step 2: update imports

**Old:**
```python
from app.utils import gemini_analyzer, qwenvl_analyzer
from app.services.generate_narration_script import generate_narration
from app.services.SDE.short_drama_explanation import analyze_subtitle
```

**New:**
```python
from app.services.llm.unified_service import UnifiedLLMService
from app.services.llm.migration_adapter import (
    create_vision_analyzer,
    SubtitleAnalyzerAdapter
)
```

### Step 3: update call sites

#### Image analysis

**Old code:**
```python
def analyze_images_old(provider, api_key, model, base_url, images, prompt):
    if provider == 'gemini':
        analyzer = gemini_analyzer.VisionAnalyzer(
            model_name=model,
            api_key=api_key,
            base_url=base_url
        )
    else:
        analyzer = qwenvl_analyzer.QwenAnalyzer(
            model_name=model,
            api_key=api_key,
            base_url=base_url
        )

    # Synchronous calls
    results = []
    for batch in batches:
        result = analyzer.analyze_batch(batch, prompt)
        results.append(result)
    return results
```

**New code:**
```python
async def analyze_images_new(images, prompt, provider=None):
    # Asynchronous call with automatic batching
    results = await UnifiedLLMService.analyze_images(
        images=images,
        prompt=prompt,
        provider=provider,
        batch_size=10
    )
    return results
```

#### Subtitle analysis

**Old code:**
```python
from app.services.SDE.short_drama_explanation import analyze_subtitle

result = analyze_subtitle(
    subtitle_file_path=subtitle_path,
    api_key=api_key,
    model=model,
    base_url=base_url,
    provider=provider
)
```

**New code:**
```python
# Option 1: use the unified service
with open(subtitle_path, 'r', encoding='utf-8') as f:
    subtitle_content = f.read()

result = await UnifiedLLMService.analyze_subtitle(
    subtitle_content=subtitle_content,
    provider=provider,
    validate_output=True
)

# Option 2: use the adapter
from app.services.llm.migration_adapter import SubtitleAnalyzerAdapter

analyzer = SubtitleAnalyzerAdapter(api_key, model, base_url, provider)
result = analyzer.analyze_subtitle(subtitle_content)
```

## 🔧 Common migration issues

### 1. Synchronous vs asynchronous calls

**Problem:** the new architecture is asynchronous; old code is synchronous.

**Solution:**
```python
# Call an async function from synchronous code
import asyncio

def sync_function():
    result = asyncio.run(UnifiedLLMService.generate_text(prompt))
    return result

# Or make the whole function asynchronous
async def async_function():
    result = await UnifiedLLMService.generate_text(prompt)
    return result
```

### 2. Configuration lookups changed

**Problem:** configuration key names have changed.

**Solution:**
```python
# Old
api_key = config.app.get('openai_api_key')
model = config.app.get('openai_model_name')

# New
provider = config.app.get('text_llm_provider', 'openai')
api_key = config.app.get(f'text_{provider}_api_key')
model = config.app.get(f'text_{provider}_model_name')
```

### 3. Updated error handling

**Old:**
```python
try:
    result = some_llm_call()
except Exception as e:
    print(f"Error: {e}")
```

**New:**
```python
from app.services.llm.exceptions import LLMServiceError, ValidationError

try:
    result = await UnifiedLLMService.generate_text(prompt)
except ValidationError as e:
    print(f"输出验证失败: {e.message}")
except LLMServiceError as e:
    print(f"LLM服务错误: {e.message}")
except Exception as e:
    print(f"未知错误: {e}")
```

## ✅ Migration checklist

### Configuration
- [ ] Update the configuration file format
- [ ] Verify all API keys are configured correctly
- [ ] Run the configuration validator

### Code
- [ ] Update import statements
- [ ] Convert synchronous calls to asynchronous ones
- [ ] Update error handling
- [ ] Use the new unified interface

### Testing
- [ ] Run the LLM service test script
- [ ] Test every feature module
- [ ] Verify output formats
- [ ] Check performance and stability

### Cleanup
- [ ] Remove unused legacy code
- [ ] Update documentation and comments
- [ ] Remove stale dependencies

## 🚀 Migration best practices

### 1. Migrate gradually
- Migrate one module first; move on only after its tests pass
- Keep the old code around as a fallback
- Use the migration adapter for backward compatibility

### 2. Test thoroughly
- Run tests after every migration step
- Compare outputs of the old and new implementations
- Test edge cases and error handling

### 3. Monitoring and logging
- Enable verbose logging
- Monitor API call success rates
- Track performance metrics

### 4. Documentation
- Update code comments
- Update API documentation
- Record problems and solutions found during migration

## 📞 Getting help

If you run into problems during migration:

1. **Check the test script output**:
   ```bash
   python app/services/llm/test_llm_service.py
   ```

2. **Validate the configuration**:
   ```python
   from app.services.llm.config_validator import LLMConfigValidator
   results = LLMConfigValidator.validate_all_configs()
   LLMConfigValidator.print_validation_report(results)
   ```

3. **Check the detailed logs**:
   ```python
   from loguru import logger
   logger.add("migration.log", level="DEBUG")
   ```

4. **See the example code**:
   - Usage examples in `app/services/llm/test_llm_service.py`
   - Already-migrated files such as `webui/tools/base.py`

---

*Last updated: 2025-01-07*
@ -1,294 +0,0 @@
# NarratoAI LLM Service Usage Guide

## 📖 Overview

The NarratoAI project has completed a full refactor of its LLM services, providing a unified, modular, and extensible integration architecture. The new architecture supports multiple LLM providers and features strict output-format validation and robust error handling.

## 🏗️ Architecture Overview

### Core Components

```
app/services/llm/
├── __init__.py              # Module entry point
├── base.py                  # Abstract base classes
├── manager.py               # Service manager
├── unified_service.py       # Unified service interface
├── validators.py            # Output format validators
├── exceptions.py            # Exception classes
├── migration_adapter.py     # Migration adapter
├── config_validator.py      # Configuration validator
├── test_llm_service.py      # Test script
└── providers/               # Provider implementations
    ├── __init__.py
    ├── gemini_provider.py
    ├── gemini_openai_provider.py
    ├── openai_provider.py
    ├── qwen_provider.py
    ├── deepseek_provider.py
    └── siliconflow_provider.py
```

### Supported Providers

#### Vision Model Providers
- **Gemini** (native API + OpenAI-compatible)
- **QwenVL** (Qwen vision)
- **Siliconflow**

#### Text Generation Providers
- **OpenAI** (standard OpenAI API)
- **Gemini** (native API + OpenAI-compatible)
- **DeepSeek**
- **Qwen**
- **Siliconflow**

## ⚙️ Configuration

### Configuration File Format

Configure the LLM services in `config.toml`:

```toml
[app]
# Vision model provider
vision_llm_provider = "gemini"

# Gemini vision model
vision_gemini_api_key = "your_gemini_api_key"
vision_gemini_model_name = "gemini-2.0-flash-lite"
vision_gemini_base_url = "https://generativelanguage.googleapis.com/v1beta"

# QwenVL vision model
vision_qwenvl_api_key = "your_qwen_api_key"
vision_qwenvl_model_name = "qwen2.5-vl-32b-instruct"
vision_qwenvl_base_url = "https://dashscope.aliyuncs.com/compatible-mode/v1"

# Text model provider
text_llm_provider = "openai"

# OpenAI text model
text_openai_api_key = "your_openai_api_key"
text_openai_model_name = "gpt-4o-mini"
text_openai_base_url = "https://api.openai.com/v1"

# DeepSeek text model
text_deepseek_api_key = "your_deepseek_api_key"
text_deepseek_model_name = "deepseek-chat"
text_deepseek_base_url = "https://api.deepseek.com"
```

### Configuration Validation

Use the configuration validator to check that the configuration is correct:

```python
from app.services.llm.config_validator import LLMConfigValidator

# Validate all configurations
results = LLMConfigValidator.validate_all_configs()

# Print a validation report
LLMConfigValidator.print_validation_report(results)

# Get configuration suggestions
suggestions = LLMConfigValidator.get_config_suggestions()
```
## 🚀 Usage

### 1. Unified Service Interface (Recommended)

```python
from app.services.llm.unified_service import UnifiedLLMService

# Image analysis
results = await UnifiedLLMService.analyze_images(
    images=["path/to/image1.jpg", "path/to/image2.jpg"],
    prompt="Please describe the content of these images",
    provider="gemini",  # optional; defaults to the provider in the config
    batch_size=10
)

# Text generation
text = await UnifiedLLMService.generate_text(
    prompt="Please outline the history of artificial intelligence",
    system_prompt="You are a professional AI expert",
    provider="openai",  # optional
    temperature=0.7,
    response_format="json"  # optional; supports JSON output
)

# Narration script generation (with validation)
narration_items = await UnifiedLLMService.generate_narration_script(
    prompt="Generate a narration script from the video content...",
    validate_output=True  # automatically validate the output format
)

# Subtitle analysis
analysis = await UnifiedLLMService.analyze_subtitle(
    subtitle_content="Subtitle content...",
    validate_output=True
)
```

### 2. Using the Service Manager Directly

```python
from app.services.llm.manager import LLMServiceManager

# Get a vision model provider
vision_provider = LLMServiceManager.get_vision_provider("gemini")
results = await vision_provider.analyze_images(images, prompt)

# Get a text model provider
text_provider = LLMServiceManager.get_text_provider("openai")
text = await text_provider.generate_text(prompt)
```

### 3. Migration Adapter (Backward Compatible)

```python
from app.services.llm.migration_adapter import create_vision_analyzer

# Compatible with the old interface
analyzer = create_vision_analyzer("gemini", api_key, model, base_url)
results = await analyzer.analyze_images(images, prompt)
```
## 🔍 Output Format Validation

### Narration Script Validation

```python
from app.services.llm.validators import OutputValidator

# Validate the narration script format
try:
    narration_items = OutputValidator.validate_narration_script(output)
    print(f"Validation succeeded: {len(narration_items)} segments")
except ValidationError as e:
    print(f"Validation failed: {e.message}")
```

### JSON Output Validation

```python
# Validate the JSON format
try:
    data = OutputValidator.validate_json_output(output)
    print("JSON validation succeeded")
except ValidationError as e:
    print(f"JSON validation failed: {e.message}")
```

## 🧪 Testing and Debugging

### Running the Test Script

```bash
# Run the full LLM service test
python app/services/llm/test_llm_service.py
```

The test script verifies:
- Configuration validity
- Provider information retrieval
- Text generation
- JSON-format generation
- Subtitle analysis
- Narration script generation

### Debugging Tips

1. **Enable verbose logging**:
   ```python
   from loguru import logger
   logger.add("llm_service.log", level="DEBUG")
   ```

2. **Clear the provider cache**:
   ```python
   UnifiedLLMService.clear_cache()
   ```

3. **Inspect provider information**:
   ```python
   info = UnifiedLLMService.get_provider_info()
   print(info)
   ```
## ⚠️ Notes

### 1. API Key Security
- Do not hard-code API keys in source code
- Manage keys via environment variables or configuration files
- Rotate API keys regularly

### 2. Error Handling
- Wrap every LLM service call in try/except
- Use the appropriate exception types
- Implement a retry mechanism for transient errors

### 3. Performance
- Choose a sensible batch size
- Use caching to avoid duplicate calls
- Monitor API call frequency and cost

### 4. Model Selection
- Pick a model suited to the task
- Balance cost against performance
- Update to the latest model versions regularly
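As a hedged illustration of the retry advice above (this is not code from the repository; the names and delays are illustrative), a minimal exponential-backoff wrapper could look like:

```python
import time


def with_retries(fn, max_attempts=3, base_delay=0.01, retriable=(TimeoutError,)):
    """Call fn(), retrying transient errors with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retriable:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** (attempt - 1)))


# Hypothetical flaky call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

print(with_retries(flaky))  # ok
```

A real integration would catch the service's own exception types (from `exceptions.py`) rather than `TimeoutError`.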
## 🔧 Adding a New Provider

### 1. Create the Provider Class

```python
# app/services/llm/providers/new_provider.py
from typing import List

from ..base import TextModelProvider


class NewTextProvider(TextModelProvider):
    @property
    def provider_name(self) -> str:
        return "new_provider"

    @property
    def supported_models(self) -> List[str]:
        return ["model-1", "model-2"]

    async def generate_text(self, prompt: str, **kwargs) -> str:
        # Implement the provider-specific API call here
        ...
```

### 2. Register the Provider

```python
# app/services/llm/providers/__init__.py
from .new_provider import NewTextProvider

LLMServiceManager.register_text_provider('new_provider', NewTextProvider)
```

### 3. Add Configuration Support

```toml
# config.toml
text_new_provider_api_key = "your_api_key"
text_new_provider_model_name = "model-1"
text_new_provider_base_url = "https://api.newprovider.com/v1"
```
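The registration pattern above can be sketched in isolation. The following is a self-contained toy (the class and method names mimic the guide but are not the project's actual implementation):

```python
from abc import ABC, abstractmethod


class TextModelProvider(ABC):
    """Minimal stand-in for the abstract provider base class."""

    @property
    @abstractmethod
    def provider_name(self) -> str: ...

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...


class ProviderRegistry:
    """Maps provider names to provider classes, like the service manager does."""

    _providers: dict = {}

    @classmethod
    def register(cls, name: str, provider_cls: type) -> None:
        cls._providers[name] = provider_cls

    @classmethod
    def get(cls, name: str) -> TextModelProvider:
        return cls._providers[name]()


class EchoProvider(TextModelProvider):
    @property
    def provider_name(self) -> str:
        return "echo"

    def generate_text(self, prompt: str) -> str:
        return f"echo: {prompt}"


ProviderRegistry.register("echo", EchoProvider)
print(ProviderRegistry.get("echo").generate_text("hi"))  # echo: hi
```

The real manager additionally caches instances and reads per-provider settings from `config.toml`.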
## 📞 Support

If you run into problems:

1. Run the test script first to check the configuration
2. Check the log files for detailed error information
3. Verify your API keys and network connection
4. Consult the troubleshooting section of this document

---

*Last updated: 2025-01-07*
@ -1,162 +0,0 @@
# Audio Volume Balancing Guide

## Problem

In the background video-editing task, the original video audio often ends up much quieter than the TTS-generated narration. Even with the original track set to 1.0 and the narration to 0.7, the original audio still sounds weak.

## Root Causes

1. **Loudness differences**: TTS audio usually has high, consistent loudness, while the original track may be inherently quiet or have a wide dynamic range.

2. **No audio normalization**: The previous code only multiplied volumes by fixed factors, without any loudness analysis or normalization.

3. **Mixing method**: `CompositeAudioClip` preserves the loudness differences between tracks when mixing.

## Solution

### 1. Audio Normalization Utility (`audio_normalizer.py`)

The `AudioNormalizer` class provides:

- **LUFS loudness analysis**: measures LUFS loudness using FFmpeg's loudnorm filter
- **RMS volume calculation**: a fallback when LUFS analysis is unavailable
- **Audio normalization**: normalizes audio to a target loudness
- **Smart volume adjustment**: analyzes the loudness gap between the TTS and original tracks and computes suitable adjustment factors

### 2. Audio Configuration Management (`audio_config.py`)

The `AudioConfig` class provides:

- **Default volume configuration**: optimized default volume settings
- **Per-video-type configuration**: volume settings for different kinds of video
- **Preset profiles**: balanced, voice_focused, original_focused, and others
- **Content-type recommendations**: recommended volumes by content type

### 3. Smart Volume Adjustment

`generate_video.py` integrates smart volume adjustment:

- Automatically analyzes the loudness gap between the TTS and original tracks
- Computes suitable volume adjustment factors
- Preserves the relative ratio set by the user
- Clamps the adjustment range to avoid over-correction

## Configuration Changes

### Default Volume Settings

```python
# Previous setting
ORIGINAL_VOLUME = 0.7

# Optimized settings
ORIGINAL_VOLUME = 1.2  # boost the original track
MAX_VOLUME = 2.0       # allow the original volume to exceed 1.0
```

### Recommended Volume Configurations

```python
# Mixed content (default)
'mixed': {
    'tts_volume': 0.8,
    'original_volume': 1.3,
    'bgm_volume': 0.3,
}

# Content dominated by the original audio
'original_heavy': {
    'tts_volume': 0.6,
    'original_volume': 1.6,
    'bgm_volume': 0.1,
}
```
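The RMS fallback and the clamped gain computation described above can be sketched on raw sample values. This is an illustrative toy, not the project's `AudioNormalizer`; the clamp value is an assumption:

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def gain_to_match(samples, reference, max_gain=3.0):
    """Gain that brings `samples` to the reference RMS, clamped to avoid over-boosting."""
    current = rms(samples)
    if current == 0:
        return 1.0  # silence: leave untouched
    return min(max_gain, rms(reference) / current)


tts = [0.5, -0.5, 0.5, -0.5]       # loud TTS track (RMS 0.5)
original = [0.1, -0.1, 0.1, -0.1]  # quiet original track (RMS 0.1)
print(round(gain_to_match(original, tts), 2))  # 3.0 (clamped from 5.0)
```

The clamp plays the same role as the "avoid over-correction" rule above; a ratio of 5.0 is reduced to the maximum allowed 3.0.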
## Usage

### 1. Automatic Optimization (Recommended)

The system applies the optimized volume configuration automatically:

```python
# Applied automatically in task.py
optimized_volumes = get_recommended_volumes_for_content('mixed')
```

### 2. Manual Configuration

Volumes can also be set via configuration files or parameters:

```python
# Apply a preset profile
volumes = AudioConfig.apply_volume_profile('original_focused')

# Get a configuration for a video type
volumes = AudioConfig.get_optimized_volumes('entertainment')
```

### 3. Smart Analysis

Smart volume analysis is enabled by default:

```python
# Controlled in schema.py
ENABLE_SMART_VOLUME = True
```

## Test Verification

Run the test script to verify the feature:

```bash
source .venv/bin/activate
python test_audio_optimization.py
```

Test results:
- TTS test audio LUFS: -24.15
- Original test audio LUFS: -32.95
- Suggested adjustment factors: TTS 1.61, original 3.00

## Before and After

### Before
- TTS volume: 0.7
- Original volume: 1.0
- Problem: the original audio is noticeably quieter than the TTS

### After
- TTS volume: 0.8 (smart-adjusted)
- Original volume: 1.3 (smart-adjusted)
- Result: balanced volumes and a natural listening experience

## Notes

1. **FFmpeg dependency**: loudness analysis requires FFmpeg with the loudnorm filter
2. **Performance impact**: smart analysis adds a small amount of processing time
3. **Audio quality**: all adjustments preserve audio quality
4. **Compatibility**: backward compatible with existing volume settings

## Troubleshooting

### 1. LUFS analysis fails
- Check that FFmpeg is installed
- Confirm the audio file format is supported
- The system automatically falls back to RMS analysis

### 2. Over-adjusted volume
- Check the volume clamp settings
- Adjust the target LUFS value
- Use a preset profile

### 3. Performance problems
- Disable smart analysis: `ENABLE_SMART_VOLUME = False`
- Use a simple volume configuration
- Reduce the frequency of audio analysis

## Future Improvements

1. **Machine-learning optimization**: learn the best volume configuration from user feedback
2. **Live preview**: preview volume adjustments in the UI
3. **Batch processing**: batch audio normalization
4. **More audio formats**: broaden the supported formats
@ -1,143 +0,0 @@
# Original-Audio Segment Integration Guide (Short Drama Narration)

## 📋 Overview

This update adds detailed rules for using original-audio segments to the short drama narration prompt, ensuring that generated scripts insert original audio at appropriate points to strengthen viewer immersion and emotional impact.

## 🎬 Original-Audio Segment Rules

### 📢 Format Requirements

Original-audio segments must strictly follow this JSON format:

```json
{
    "_id": sequence_number,
    "timestamp": "start_time-end_time",
    "picture": "description of the on-screen content",
    "narration": "播放原片 + sequence number",
    "OST": 1
}
```

### 🎯 Insertion Strategy

#### 1. 🔥 Emotional Peaks
Keep the original audio whenever a character's emotion erupts:
- **Angry outbursts**: moments of rage or loss of control
- **Moved to tears**: moments of crying and emotional release
- **Shock**: disbelieving expressions and lines
- **Despair and collapse**: expressions of hopelessness
- **Celebration**: euphoric emotional highs

#### 2. 💬 Key Dialogue
Keep lines and exchanges that drive the plot:
- **Identity reveals**: lines that expose a character's true identity
- **Truth revealed**: dialogue that resolves a mystery
- **Confessions**: declarations of love and other key emotional lines
- **Threats and warnings**: a villain's key threats
- **Announcements**: a character declaring a major decision

#### 3. 💥 Payoff Moments
Keep the original audio at satisfying "payoff" beats:
- **Underdog comebacks**: the weak fighting back and turning the tables
- **Villains humiliated**: the wicked getting their comeuppance or being exposed
- **Outsmarting opponents**: the protagonist's wit crushing a rival
- **Justice served**: moments where justice is finally done
- **Shows of strength**: the protagonist revealing their true power

#### 4. 🎪 Suspense Points
Keep the original audio at moments that build or resolve suspense:
- **Building suspense**: lines that raise questions
- **Revealing answers**: dialogue that unravels a mystery
- **Foreshadowing**: lines hinting at an upcoming twist
- **Impending crisis**: tense dialogue as danger arrives

## ⚙️ Technical Specifications

### 🔧 Format Rules
- **OST field**: set to 1 for original audio (0 for narration segments)
- **narration format**: strictly "播放原片 + sequence number" (e.g. "播放原片26")
- **picture field**: describe the on-screen content in detail for later editing reference
- **Timestamp precision**: must exactly match the timing of the key lines in the subtitles

### 📊 Ratio Control
- **Original audio to narration ratio**: 3:7 (30% original, 70% narration)
- **Even distribution**: spread original-audio segments evenly across the video
- **Moderate length**: keep each original-audio segment to 3-8 seconds
- **Smooth transitions**: original-audio and narration segments should connect naturally

### 🎯 Selection Principles
- **Emotion first**: prefer emotionally intense lines and dialogue
- **Plot-critical**: only content that drives the story forward
- **Audience resonance**: pick memorable lines viewers will connect with
- **Audio-visual impact**: consider vocal delivery and performance intensity
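The 3:7 ratio rule above is mechanically checkable on a generated script. A hedged sketch (the `OST` field follows the JSON format above; counting segments rather than durations is a simplification):

```python
def ost_ratio(items):
    """Fraction of segments that keep the original sound (OST == 1)."""
    ost = sum(1 for item in items if item.get("OST") == 1)
    return ost / len(items)


# Hypothetical 10-segment script: 3 original-audio segments, 7 narration segments.
script = [
    {"_id": i, "OST": 1 if i in (2, 5, 9) else 0}
    for i in range(1, 11)
]
print(ost_ratio(script))  # 0.3
```

A production check would weight by segment duration and also verify the even-distribution and 3-8 second rules.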
## 📝 Output Example

```json
{
    "items": [
        {
            "_id": 1,
            "timestamp": "00:00:01,000-00:00:05,500",
            "picture": "女主角林小雨慌张地道歉,男主角沈墨轩冷漠地看着她",
            "narration": "一个普通女孩的命运即将因为一杯咖啡彻底改变!她撞到的这个男人,竟然是...",
            "OST": 0
        },
        {
            "_id": 2,
            "timestamp": "00:00:05,500-00:00:08,000",
            "picture": "沈墨轩质问林小雨,语气冷厉威严",
            "narration": "播放原片2",
            "OST": 1
        },
        {
            "_id": 3,
            "timestamp": "00:00:08,000-00:00:12,000",
            "picture": "林小雨惊慌失措,沈墨轩眼中闪过一丝兴趣",
            "narration": "霸道总裁的经典开场!一杯咖啡引发的爱情故事就这样开始了...",
            "OST": 0
        }
    ]
}
```

## 🔄 Usage

Usage is unchanged; no caller code needs to be modified:

```python
from app.services.prompts import PromptManager

prompt = PromptManager.get_prompt(
    category="short_drama_narration",
    name="script_generation",
    parameters={
        "drama_name": "短剧名称",
        "plot_analysis": "剧情分析内容",
        "subtitle_content": "原始字幕内容"
    }
)
```

## 📈 Expected Benefits

Adding these rules is expected to:
- **Strengthen emotional impact**: keeping original audio at key emotional beats draws viewers in
- **Improve viewing quality**: original audio for key dialogue avoids information loss
- **Amplify payoff moments**: original audio at payoff beats heightens satisfaction
- **Improve pacing**: a sensible original-to-narration ratio keeps the rhythm
- **Raise production quality**: disciplined use of original audio reflects professional standards

## ✅ Validation Results

Testing confirms that the updated prompt:
- ✅ Contains the complete original-audio usage rules
- ✅ Provides detailed insertion-strategy guidance
- ✅ Specifies clear technical and format requirements
- ✅ Includes a concrete output example
- ✅ Remains fully code-compatible

## 🎉 Summary

This update adds professional original-audio usage rules to the short drama narration prompt, giving the AI strong technical guidance for generating higher-quality, more watchable narration scripts.
@ -1,267 +0,0 @@
# Prompt Management System Documentation

## Overview

The project implements a unified prompt management system that centralizes the prompts for three core features:
- **Documentary narration** - video frame analysis and narration generation
- **Short drama editing** - subtitle analysis and highlight extraction
- **Short drama narration** - plot analysis and narration script generation

## Architecture

```
app/services/prompts/
├── __init__.py                  # Module initialization
├── base.py                      # Base prompt classes
├── manager.py                   # Prompt manager
├── registry.py                  # Prompt registration mechanism
├── template.py                  # Template rendering engine
├── validators.py                # Output validators
├── exceptions.py                # Exception definitions
├── documentary/                 # Documentary narration prompts
│   ├── __init__.py
│   ├── frame_analysis.py        # Video frame analysis
│   └── narration_generation.py  # Narration generation
├── short_drama_editing/         # Short drama editing prompts
│   ├── __init__.py
│   ├── subtitle_analysis.py     # Subtitle analysis
│   └── plot_extraction.py       # Highlight extraction
└── short_drama_narration/       # Short drama narration prompts
    ├── __init__.py
    ├── plot_analysis.py         # Plot analysis
    └── script_generation.py     # Narration script generation
```

## Core Features

### 1. Unified Management
- All prompts live in the `app/services/prompts/` module
- Organized by feature category
- Supports versioning and rollback

### 2. Model-Type Adaptation
- **TextPrompt**: for text models
- **VisionPrompt**: for vision models
- **ParameterizedPrompt**: supports parameterization

### 3. Parameterization
- Dynamic parameter substitution
- Parameter validation
- Template rendering

### 4. Output Validation
- Strict JSON format validation
- Scenario-specific validation (narration scripts, plot analysis, etc.)
- Custom validation rules
## Usage

### Basic Usage

```python
from app.services.prompts import PromptManager

# Get the documentary frame-analysis prompt
prompt = PromptManager.get_prompt(
    category="documentary",
    name="frame_analysis",
    parameters={
        "video_theme": "荒野建造",
        "custom_instructions": "请特别关注建造过程的细节"
    }
)

# Get the short drama plot-analysis prompt
prompt = PromptManager.get_prompt(
    category="short_drama_narration",
    name="plot_analysis",
    parameters={"subtitle_content": "字幕内容..."}
)
```

### Advanced Features

```python
# Search prompts
results = PromptManager.search_prompts(
    keyword="分析",
    model_type=ModelType.TEXT
)

# Get detailed prompt information
info = PromptManager.get_prompt_info(
    category="documentary",
    name="narration_generation"
)

# Validate output
validated_data = PromptManager.validate_output(
    output=llm_response,
    category="documentary",
    name="narration_generation"
)
```

## Registered Prompts

### Documentary narration (documentary)
- `frame_analysis` - video frame analysis prompt
- `narration_generation` - narration generation prompt

### Short drama editing (short_drama_editing)
- `subtitle_analysis` - subtitle analysis prompt
- `plot_extraction` - highlight extraction prompt

### Short drama narration (short_drama_narration)
- `plot_analysis` - plot analysis prompt
- `script_generation` - narration script generation prompt
## Migration Guide

### Migrating Old Code

**Before:**
```python
from app.services.SDE.prompt import subtitle_plot_analysis_v1
prompt = subtitle_plot_analysis_v1
```

**After:**
```python
from app.services.prompts import PromptManager
prompt = PromptManager.get_prompt(
    category="short_drama_narration",
    name="plot_analysis",
    parameters={"subtitle_content": content}
)
```

### Files Already Updated
- `app/services/SDE/short_drama_explanation.py`
- `app/services/SDP/utils/step1_subtitle_analyzer_openai.py`
- `app/services/generate_narration_script.py`

## Extension Guide

### Adding a New Prompt

1. Create a new prompt class in the appropriate category directory:

```python
from ..base import TextPrompt, PromptMetadata, ModelType, OutputFormat


class NewPrompt(TextPrompt):
    def __init__(self):
        metadata = PromptMetadata(
            name="new_prompt",
            category="your_category",
            version="v1.0",
            description="Prompt description",
            model_type=ModelType.TEXT,
            output_format=OutputFormat.JSON,
            parameters=["param1", "param2"]
        )
        super().__init__(metadata)

    def get_template(self) -> str:
        return "Your prompt template content..."
```

2. Register it in `__init__.py`:

```python
def register_prompts():
    new_prompt = NewPrompt()
    PromptManager.register_prompt(new_prompt, is_default=True)
```

### Adding a New Category

1. Create a new category directory
2. Implement the prompt classes
3. Import and register them in the main module's `__init__.py`
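Template rendering with the `${parameter_name}` syntax described in this document can be sketched with the standard library's `string.Template`. The `render` helper is illustrative, not the project's actual rendering engine:

```python
from string import Template


def render(template_text: str, parameters: dict) -> str:
    """Render a prompt template, failing loudly on missing required parameters."""
    try:
        return Template(template_text).substitute(parameters)
    except KeyError as exc:
        # substitute() raises KeyError for any unsupplied ${placeholder}
        raise ValueError(f"missing required parameter: {exc}") from exc


text = render(
    "Analyze the drama ${drama_name}: ${plot_analysis}",
    {"drama_name": "Demo", "plot_analysis": "a plot"},
)
print(text)  # Analyze the drama Demo: a plot
```

`substitute` (as opposed to `safe_substitute`) gives the strict "missing required parameter" behavior that the validation errors in this document describe.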
## Testing

Run the test script to verify the system:

```bash
python test_prompt_system.py
```

## Notes

1. **Template parameters**: use the `${parameter_name}` format
2. **JSON format**: JSON examples in templates use standard `{` and `}`; do not use double braces
3. **Parameter validation**: required parameters are validated automatically
4. **Version management**: multiple versions can coexist; the latest is used by default
5. **Output validation**: validate LLM output to ensure the format is correct
6. **JSON parsing**: the system ships with robust, lenient JSON parsing that handles many format problems automatically

## JSON Parsing Optimizations

The lenient JSON parser handles the format problems commonly produced by LLMs:

### Supported Fixes

1. **Double-brace repair**: converts `{{` and `}}` to standard `{` and `}`
2. **Code-block extraction**: extracts JSON from fenced `json` code blocks
3. **Surrounding text**: extracts the brace-delimited JSON and ignores text before and after it
4. **Trailing commas**: removes extra commas at the end of objects and arrays
5. **Comment removal**: strips `//` and `#` comments
6. **Quote repair**: fixes single quotes and missing property-name quotes

### Parsing Strategies

Multiple strategies are tried in priority order:

```python
strategies = [
    ("direct parse", lambda s: json.loads(s)),
    ("fix double braces", _fix_double_braces),
    ("extract code block", _extract_code_block),
    ("extract braced content", _extract_braces_content),
    ("fix common JSON issues", _fix_common_json_issues),
    ("fix quote issues", _fix_quote_issues),
    ("fix trailing commas", _fix_trailing_commas),
    ("force fix", _force_fix_json),
]
```

### Example

```python
from webui.tools.generate_short_summary import parse_and_fix_json

# Handle double-brace JSON
json_str = '{{ "items": [{{ "_id": 1, "name": "test" }}] }}'
result = parse_and_fix_json(json_str)  # fixed and parsed automatically

# Handle JSON surrounded by extra text
json_str = 'Some text\n{"items": []}\nMore text'
result = parse_and_fix_json(json_str)  # the JSON part is extracted automatically
```

## Performance

- Prompt templates are cached
- Batch operations are supported
- Async rendering support (planned for a future version)
- Multi-strategy JSON parsing ensures a high success rate

## Troubleshooting

### Common Issues

1. **Template rendering errors**: check the parameter names and format
2. **Prompt not found**: confirm the category, name, and version
3. **Output validation failures**: check that the LLM output matches the required format

### Debug Logging

The system logs in detail via loguru; use the logs to diagnose problems:

```python
from loguru import logger
logger.debug("debug message")
```
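The first few parsing strategies can be sketched standalone. This toy covers only direct parsing, double-brace repair, and braced-content extraction; the real `parse_and_fix_json` handles many more cases:

```python
import json
import re


def parse_lenient_json(text: str):
    """Try direct parsing, then fix double braces, then extract the braced span."""
    for candidate in (
        text,                                          # strategy 1: direct parse
        text.replace("{{", "{").replace("}}", "}"),    # strategy 2: fix double braces
    ):
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            pass
    # strategy 3: extract the outermost brace-delimited span
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        return json.loads(match.group(0))
    raise ValueError("no JSON object found")


print(parse_lenient_json('{{ "items": [] }}'))               # {'items': []}
print(parse_lenient_json('Some text\n{"items": []}\nMore'))  # {'items': []}
```

Ordering matters: the cheap, lossless strategies run first, and each failed attempt falls through to a more aggressive repair.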
@ -1,202 +0,0 @@
# Short Drama Narration Optimization Notes

## Overview

This optimization fixes the loss of original subtitle information in the short drama narration feature, ensuring that generated narration scripts align correctly with the video timestamps.

## Problem Analysis

### Symptoms
1. **Broken parameterized call**: `SubtitleAnalyzer` fetched `PlotAnalysisPrompt` with an empty parameter dict, so the template's placeholders were never substituted
2. **Broken data chain**: the script-generation stage had no direct access to the original subtitles' timestamps
3. **Lost timestamps**: the generated narration did not match the video frames' timestamps

### Root Cause
- The prompt template expected the subtitle content as a parameter, but the code used simple string concatenation instead
- Script generation only saw the plot-analysis result and could not read the accurate timestamps from the original subtitles

## Fixes

### 1. Fix the Parameterized Call

**File**: `app/services/SDE/short_drama_explanation.py`

**Change**:
```python
# Before
self.prompt_template = PromptManager.get_prompt(
    category="short_drama_narration",
    name="plot_analysis",
    parameters={}  # empty parameter dict
)
prompt = f"{self.prompt_template}\n\n{subtitle_content}"  # string concatenation

# After
if self.custom_prompt:
    prompt = f"{self.custom_prompt}\n\n{subtitle_content}"
else:
    prompt = PromptManager.get_prompt(
        category="short_drama_narration",
        name="plot_analysis",
        parameters={"subtitle_content": subtitle_content}  # pass the parameter properly
    )
```

### 2. Give Script Generation Access to the Subtitles

**File**: `app/services/prompts/short_drama_narration/script_generation.py`

**Change**:
```python
# Add subtitle_content as a supported parameter
parameters=["drama_name", "plot_analysis", "subtitle_content"]

# Extend the prompt template with the original subtitle information
template = """
下面<plot>中的内容是短剧的剧情概述:
<plot>
${plot_analysis}
</plot>

下面<subtitles>中的内容是短剧的原始字幕(包含准确的时间戳信息):
<subtitles>
${subtitle_content}
</subtitles>

重要要求:
6. **时间戳必须严格基于<subtitles>中的原始时间戳**,确保与视频画面精确匹配
11. **确保每个解说片段的时间戳都能在原始字幕中找到对应的时间范围**
"""
```

### 3. Update the Method Signature and Call Sites

**Change**:
```python
# Updated method signature
def generate_narration_script(
    self,
    short_name: str,
    plot_analysis: str,
    subtitle_content: str = "",  # new parameter
    temperature: float = 0.7
) -> Dict[str, Any]:

# Pass the original subtitle content when building the prompt
prompt = PromptManager.get_prompt(
    category="short_drama_narration",
    name="script_generation",
    parameters={
        "drama_name": short_name,
        "plot_analysis": plot_analysis,
        "subtitle_content": subtitle_content  # pass the original subtitles
    }
)
```
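The rule that every narration timestamp must fall inside an original subtitle range can be verified with a small helper. A hedged sketch that assumes SRT-style `HH:MM:SS,mmm` timestamps and the `start-end` segment notation used in this project:

```python
def to_ms(ts: str) -> int:
    """Convert '00:00:05,500' to milliseconds."""
    clock, millis = ts.split(",")
    h, m, s = map(int, clock.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(millis)


def covered(segment: str, subtitle_ranges) -> bool:
    """True if the 'start-end' segment lies inside some subtitle range."""
    start, end = (to_ms(t) for t in segment.split("-"))
    return any(a <= start and end <= b for a, b in subtitle_ranges)


subs = [(to_ms("00:00:01,000"), to_ms("00:00:08,000"))]
print(covered("00:00:05,500-00:00:08,000", subs))  # True
print(covered("00:00:07,000-00:00:09,000", subs))  # False
```

Running a check like this over every generated segment catches the "timestamp not found in original subtitles" failures this document describes.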
## Usage

### Basic Usage

```python
from app.services.SDE.short_drama_explanation import analyze_subtitle, generate_narration_script

# 1. Analyze the subtitles
analysis_result = analyze_subtitle(
    subtitle_file_path="path/to/subtitle.srt",
    api_key="your_api_key",
    model="your_model",
    base_url="your_base_url"
)

# 2. Read the original subtitle content
with open("path/to/subtitle.srt", 'r', encoding='utf-8') as f:
    subtitle_content = f.read()

# 3. Generate the narration script (now with the original subtitles)
narration_result = generate_narration_script(
    short_name="短剧名称",
    plot_analysis=analysis_result["analysis"],
    subtitle_content=subtitle_content,  # pass the original subtitle content
    api_key="your_api_key",
    model="your_model",
    base_url="your_base_url"
)
```

### Full Example

```python
# Complete short drama narration pipeline
subtitle_path = "path/to/your/subtitle.srt"

# Step 1: analyze the subtitles
analysis_result = analyze_subtitle(
    subtitle_file_path=subtitle_path,
    api_key="your_api_key",
    model="gemini-2.0-flash",
    base_url="https://api.narratoai.cn/v1/chat/completions",
    save_result=True
)

if analysis_result["status"] == "success":
    # Step 2: read the original subtitle content
    with open(subtitle_path, 'r', encoding='utf-8') as f:
        subtitle_content = f.read()

    # Step 3: generate the narration script
    narration_result = generate_narration_script(
        short_name="我的短剧",
        plot_analysis=analysis_result["analysis"],
        subtitle_content=subtitle_content,  # key step: pass the original subtitles
        api_key="your_api_key",
        model="gemini-2.0-flash",
        base_url="https://api.narratoai.cn/v1/chat/completions",
        save_result=True
    )

    if narration_result["status"] == "success":
        print("Narration script generated successfully!")
        print(narration_result["narration_script"])
```

## Results

### Before
- ❌ Subtitle content was never embedded into the prompt
- ❌ Script generation lacked the original timestamp information
- ❌ Generated timestamps could be wrong or missing

### After
- ✅ Subtitle content is correctly embedded into the plot-analysis prompt
- ✅ Script generation has access to the complete original subtitles
- ✅ Generated narration timestamps match the video frames precisely
- ✅ Temporal continuity and logical order are preserved
- ✅ Time segments can be split sensibly

## Test Verification

Run the test script to verify the changes:

```bash
python3 test_short_drama_narration.py
```

Test coverage:
1. ✅ Parameterization of the plot-analysis prompt
2. ✅ Parameterization of the script-generation prompt
3. ✅ SubtitleAnalyzer integration

## Notes

1. **Backward compatibility**: the changes keep the existing API backward compatible
2. **Parameter passing**: make sure to pass `subtitle_content` when calling `generate_narration_script`
3. **Timestamp accuracy**: narration timestamps are now strictly based on the original subtitles
4. **Modular design**: the prompt management system's modular architecture is preserved

## Related Files

- `app/services/SDE/short_drama_explanation.py` - main implementation
- `app/services/prompts/short_drama_narration/plot_analysis.py` - plot-analysis prompt
- `app/services/prompts/short_drama_narration/script_generation.py` - script-generation prompt
- `test_short_drama_narration.py` - test script
@ -1,150 +0,0 @@
# Short Drama Narration Prompt Optimization Summary

## 📋 Overview

Based on the core elements of short drama narration writing, this optimization fully restructures the prompt template in `app/services/prompts/short_drama_narration/script_generation.py` to make it more precise and practical.

## 🎯 Goal

Integrate the six core elements of narration writing and the strict technical requirements into the prompt template, so that generated scripts both fit the short drama format and satisfy the technical constraints.

## 🔥 Core Elements

### 1. Golden Opening (3-Second Rule)
- **Suspense**: lead with the central conflict or question
- **Conflict**: show the sharpest confrontation
- **Emotional resonance**: touch universal emotions
- **Twist teaser**: hint at the shocking turn to come

### 2. Main-Plot Distillation (Cut the Clutter)
- Drop side plots and minor characters; focus on the main storyline
- Emphasize the central conflict
- Skip the setup quickly and get to the heart of the plot
- Make sure every segment clearly advances the story

### 3. Payoff Amplification (Ignite Emotion)
- **Underdog comebacks**: highlight the weak growing strong and turning the tables
- **Villains humiliated**: stress the satisfaction of the wicked getting their due
- **Smart moves**: praise characters' wit and strategy
- **Emotional bursts**: amplify strong emotions such as being moved, angry, or shocked

### 4. Personal Commentary (Add Fun)
- Offer sharp commentary from the viewer's perspective
- Analyze character behavior from an omniscient point of view
- Poke fun at plot tropes or foolish character choices where appropriate
- Use humorous, incisive language to make watching more fun

### 5. Planted Suspense (Drive Engagement)
- Tease just before the plot's climax
- Ask leading questions to provoke thought
- Preview upcoming highlights
- Encourage comments, likes, and follows

### 6. Beat Matching (Audio-Visual Coordination)
- Plan BGM beat drops at emotional peaks
- Match the narration's pace to the picture's pace
- Keep the original audio for key lines
- Aim for synergy between script, picture, and music

## ⚙️ Strict Technical Requirements

### 🕐 Timestamp Management
- **Absolutely no overlaps**: no repeated footage after editing
- **Continuous and non-crossing**: strictly in chronological order
- **Exact matches**: every timestamp must map to a range in the original subtitles
- **Temporal continuity**: segments may be split but must stay continuous

### ⏱️ Length Control (One-Third Rule)
- **Narration video length = 1/3 of the original video length**
- Control pacing and density precisely
- Allocate narration and original audio time sensibly

### 🔗 Plot Coherence
- **Keep the story logic intact**
- **Strictly chronological**; no jumping around
- **Respect causality**: A happens first, then B, and A causes B
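The non-overlap and one-third-length rules above are mechanical checks on the generated segment list. A hedged sketch over (start, end) millisecond pairs, independent of the project's actual validators:

```python
def check_script(segments, original_ms):
    """segments: list of (start_ms, end_ms) pairs. Returns (no_overlap, length_ratio)."""
    ordered = sorted(segments)
    # Adjacent segments may touch but must never overlap.
    no_overlap = all(a_end <= b_start
                     for (_, a_end), (b_start, _) in zip(ordered, ordered[1:]))
    total = sum(end - start for start, end in segments)
    return no_overlap, total / original_ms


# Hypothetical 90-second original video, 30 seconds of selected footage.
segments = [(0, 10_000), (10_000, 25_000), (40_000, 45_000)]
ok, ratio = check_script(segments, original_ms=90_000)
print(ok, round(ratio, 2))  # True 0.33
```

A ratio near 1/3 satisfies the length-control rule; a `False` overlap flag means the script would repeat footage after editing.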
## 📊 Before and After

### Before
- A simple task description
- Basic technical requirements
- No concrete writing guidance
- No explicit quality standards

### After
- Detailed guidance on the six core elements
- Strict technical constraints
- Concrete operating guidelines and examples
- Explicit quality standards and judging principles

## 🎯 Quality Standards

### Narration Requirements
- **Length**: 80-150 characters per segment
- **Style**: vivid, engaging, and evocative
- **Emotion**: effectively moves the audience and creates immersion
- **Pacing**: fast but well organized

### Technical Rules
- **Narration-to-original ratio**: 7:3 (70% narration, 30% original footage)
- **Key emotional beats**: must keep the original audio
- **Timestamp precision**: down to the millisecond
- **Logical coherence**: strictly follow the plot's order of events

## 🔧 Implementation

### Version Bump
- Version raised from v1.0 to v2.0
- Fully compatible with the existing code structure
- The parameterization mechanism is unchanged

### Template Structure
- Markdown formatting for readability
- Emoji icons for visual emphasis
- A layered structure that is easy to follow and execute

### Compatibility Guarantees
- Class names and method signatures are unchanged
- The parameter list is unchanged: `drama_name`, `plot_analysis`, `subtitle_content`
- The JSON output format is unchanged

## ✅ Test Verification

Testing confirms that the optimized prompt:
- ✅ Renders all parameters successfully
- ✅ Contains all six core elements
- ✅ Contains all technical requirements
- ✅ Remains code-compatible
- ✅ Produces correctly formatted output

## 🚀 Usage

Usage is identical to before:

```python
from app.services.prompts import PromptManager

prompt = PromptManager.get_prompt(
    category="short_drama_narration",
    name="script_generation",
    parameters={
        "drama_name": "短剧名称",
        "plot_analysis": "剧情分析内容",
        "subtitle_content": "原始字幕内容"
    }
)
```

## 📈 Expected Results

The optimized prompt is expected to produce:
- More compelling opening hooks
- More precise identification and amplification of payoff moments
- A more distinctive narration voice
- Stricter adherence to the technical rules
- Higher-quality narration overall

## 🎉 Summary

This optimization systematically folds professional short drama narration techniques into the prompt template, giving the AI a strong framework for generating high-quality narration. The new template stays technically compatible while greatly improving the professionalism and practicality of its guidance.
@ -1,170 +0,0 @@
|
||||
# WebUI短剧解说功能Bug修复总结
|
||||
|
||||
## 问题描述
|
||||
|
||||
在运行WebUI的短剧解说功能时,出现以下错误:
|
||||
|
||||
```
|
||||
2025-07-11 22:15:29 | ERROR | "./app/services/prompts/manager.py:59": get_prompt - 提示词渲染失败: short_drama_narration.script_generation - 模板渲染失败 'script_generation': 缺少必需参数 (缺少参数: subtitle_content)
|
||||
```
|
||||
|
||||
## 根本原因
|
||||
|
||||
在之前的优化中,我们修改了 `ScriptGenerationPrompt` 类,添加了 `subtitle_content` 作为必需参数,但是在 `app/services/llm/migration_adapter.py` 中的 `SubtitleAnalyzerAdapter.generate_narration_script` 方法没有相应更新,导致调用提示词时缺少必需的参数。
|
||||
|
||||
## 修复内容
|
||||
|
||||
### 1. 修复 migration_adapter.py
|
||||
|
||||
**文件**: `app/services/llm/migration_adapter.py`
|
||||
|
||||
**修改内容**:
|
||||
```python
|
||||
# 修改前
|
||||
def generate_narration_script(self, short_name: str, plot_analysis: str, temperature: float = 0.7) -> Dict[str, Any]:
|
||||
|
||||
# 修改后
|
||||
def generate_narration_script(self, short_name: str, plot_analysis: str, subtitle_content: str = "", temperature: float = 0.7) -> Dict[str, Any]:
|
||||
```
|
||||
|
||||
**参数传递修复**:
|
||||
```python
|
||||
# 修改前
|
||||
prompt = PromptManager.get_prompt(
|
||||
category="short_drama_narration",
|
||||
name="script_generation",
|
||||
parameters={
|
||||
"drama_name": short_name,
|
||||
"plot_analysis": plot_analysis
|
||||
}
|
||||
)
|
||||
|
||||
# 修改后
|
||||
prompt = PromptManager.get_prompt(
|
||||
category="short_drama_narration",
|
||||
name="script_generation",
|
||||
parameters={
|
||||
"drama_name": short_name,
|
||||
"plot_analysis": plot_analysis,
|
||||
"subtitle_content": subtitle_content # 添加缺失的参数
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
### 2. 修复 WebUI 调用代码
|
||||
|
||||
**文件**: `webui/tools/generate_short_summary.py`
|
||||
|
||||
**修改内容**:
|
||||
|
||||
1. **确保字幕内容在所有情况下都可用**:
|
||||
```python
|
||||
# 修改前:字幕内容只在新LLM服务架构中读取
|
||||
try:
|
||||
analyzer = SubtitleAnalyzerAdapter(...)
|
||||
with open(subtitle_path, 'r', encoding='utf-8') as f:
|
||||
subtitle_content = f.read()
|
||||
analysis_result = analyzer.analyze_subtitle(subtitle_content)
|
||||
except Exception as e:
|
||||
# 回退时没有subtitle_content变量
|
||||
|
||||
# 修改后:无论使用哪种实现都先读取字幕内容
|
||||
with open(subtitle_path, 'r', encoding='utf-8') as f:
|
||||
subtitle_content = f.read()
|
||||
|
||||
try:
|
||||
analyzer = SubtitleAnalyzerAdapter(...)
|
||||
analysis_result = analyzer.analyze_subtitle(subtitle_content)
|
||||
except Exception as e:
|
||||
# 回退时subtitle_content变量仍然可用
|
||||
```
|
||||
|
||||
2. **修复新LLM服务架构的调用**:
|
||||
```python
|
||||
# 修改前
|
||||
narration_result = analyzer.generate_narration_script(
|
||||
short_name=video_theme,
|
||||
plot_analysis=analysis_result["analysis"],
|
||||
temperature=temperature
|
||||
)
|
||||
|
||||
# 修改后
|
||||
narration_result = analyzer.generate_narration_script(
|
||||
short_name=video_theme,
|
||||
plot_analysis=analysis_result["analysis"],
|
||||
subtitle_content=subtitle_content, # 添加字幕内容参数
|
||||
temperature=temperature
|
||||
)
|
||||
```
|
||||
|
||||
3. **修复回退到旧实现的调用**:
|
||||
```python
|
||||
# 修改前
|
||||
narration_result = generate_narration_script(
|
||||
short_name=video_theme,
|
||||
plot_analysis=analysis_result["analysis"],
|
||||
api_key=text_api_key,
|
||||
model=text_model,
|
||||
base_url=text_base_url,
|
||||
save_result=True,
|
||||
temperature=temperature,
|
||||
provider=text_provider
|
||||
)
|
||||
|
||||
# 修改后
|
||||
narration_result = generate_narration_script(
|
||||
short_name=video_theme,
|
||||
plot_analysis=analysis_result["analysis"],
|
||||
subtitle_content=subtitle_content, # 添加字幕内容参数
|
||||
api_key=text_api_key,
|
||||
model=text_model,
|
||||
base_url=text_base_url,
|
||||
save_result=True,
|
||||
temperature=temperature,
|
||||
provider=text_provider
|
||||
)
|
||||
```

## Verification

A test script was created and run to verify the following:

1. ✅ Prompt parameterization works as expected
2. ✅ All required parameters are passed correctly
3. ✅ The method signature includes every required parameter
4. ✅ The subtitle content is embedded into the prompt correctly

## Effect of the Fix

**Before**:
- ❌ The WebUI failed at runtime with a "missing required parameter" error
- ❌ No narration script could be generated
- ❌ The user flow was interrupted

**After**:
- ✅ The WebUI runs normally with no parameter errors
- ✅ Narration script generation works
- ✅ The original subtitle content is passed through to the prompt
- ✅ The generated narration is based on accurate timestamp information

## Related Files

- `app/services/llm/migration_adapter.py` - fixes the adapter method signature and parameter passing
- `webui/tools/generate_short_summary.py` - fixes the WebUI call sites
- `app/services/prompts/short_drama_narration/script_generation.py` - prompt template (optimized previously)

## Notes

1. **Backward compatibility**: the API stays backward compatible because `subtitle_content` has a default value
2. **Error handling**: the subtitle content is available on every code path
3. **Consistency**: the new and the old implementation pass parameters the same way

## Summary

This fix resolves a critical bug in the WebUI's short-drama narration feature and ensures:
- complete parameters throughout the prompt system
- normal operation of the WebUI
- an uninterrupted user experience
- robust, consistent code

Users can now use the WebUI short-drama narration feature normally and generate high-quality narration scripts based on accurate timestamps.
@@ -1 +1 @@
0.7.4
0.7.5

webui.py (88 lines changed)
@@ -26,7 +26,7 @@ st.set_page_config(

# 设置页面样式
hide_streamlit_style = """
<style>#root > div:nth-child(1) > div > div > div > div > section > div {padding-top: 6px; padding-bottom: 10px; padding-left: 20px; padding-right: 20px;}</style>
<style>#root > div:nth-child(1) > div > div > div > div > section > div {padding-top: 2rem; padding-bottom: 10px; padding-left: 20px; padding-right: 20px;}</style>
"""
st.markdown(hide_streamlit_style, unsafe_allow_html=True)

@@ -131,18 +131,11 @@ def render_generate_button():
    """渲染生成按钮和处理逻辑"""
    if st.button(tr("Generate Video"), use_container_width=True, type="primary"):
        from app.services import task as tm

        # 重置日志容器和记录
        log_container = st.empty()
        log_records = []

        def log_received(msg):
            with log_container:
                log_records.append(msg)
                st.code("\n".join(log_records))

        from loguru import logger
        logger.add(log_received)
        from app.services import state as sm
        from app.models import const
        import threading
        import time
        import uuid

        config.save_config()

@@ -155,9 +148,6 @@ def render_generate_button():
            st.error(tr("视频文件不能为空"))
            return

        st.toast(tr("生成视频"))
        logger.info(tr("开始生成视频"))

        # 获取所有参数
        script_params = script_settings.get_script_params()
        video_params = video_settings.get_video_params()

@@ -175,29 +165,61 @@ def render_generate_button():
        # 创建参数对象
        params = VideoClipParams(**all_params)

        # 使用新的统一裁剪策略,不再需要预裁剪的subclip_videos
        # 生成一个新的task_id用于本次处理
        import uuid
        task_id = str(uuid.uuid4())

        result = tm.start_subclip_unified(
            task_id=task_id,
            params=params
        )
        # 创建进度条
        progress_bar = st.progress(0)
        status_text = st.empty()

        video_files = result.get("videos", [])
        st.success(tr("视生成完成"))
        def run_task():
            try:
                tm.start_subclip_unified(
                    task_id=task_id,
                    params=params
                )
            except Exception as e:
                logger.error(f"任务执行失败: {e}")
                sm.state.update_task(task_id, state=const.TASK_STATE_FAILED, message=str(e))

        try:
            if video_files:
                player_cols = st.columns(len(video_files) * 2 + 1)
                for i, url in enumerate(video_files):
                    player_cols[i * 2 + 1].video(url)
        except Exception as e:
            logger.error(f"播放视频失败: {e}")
        # 在新线程中启动任务
        thread = threading.Thread(target=run_task)
        thread.start()

        # 轮询任务状态
        while True:
            task = sm.state.get_task(task_id)
            if task:
                progress = task.get("progress", 0)
                state = task.get("state")

                # 更新进度条
                progress_bar.progress(progress / 100)
                status_text.text(f"Processing... {progress}%")

                if state == const.TASK_STATE_COMPLETE:
                    status_text.text(tr("视频生成完成"))
                    progress_bar.progress(1.0)

                    # 显示结果
                    video_files = task.get("videos", [])
                    try:
                        if video_files:
                            player_cols = st.columns(len(video_files) * 2 + 1)
                            for i, url in enumerate(video_files):
                                player_cols[i * 2 + 1].video(url)
                    except Exception as e:
                        logger.error(f"播放视频失败: {e}")

                    st.success(tr("视频生成完成"))
                    break

                elif state == const.TASK_STATE_FAILED:
                    st.error(f"任务失败: {task.get('message', 'Unknown error')}")
                    break

            time.sleep(0.5)

        # file_utils.open_task_folder(config.root_dir, task_id)
        logger.info(tr("视频生成完成"))

def main():
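The webui.py diff above replaces a blocking call with a worker thread plus a polling loop. Stripped of Streamlit and the real state manager (a plain dict stands in for `sm.state`), the pattern looks like this:

```python
import threading
import time

state = {"progress": 0, "state": "running"}  # stand-in for sm.state

def run_task():
    for p in (25, 50, 75, 100):
        time.sleep(0.01)            # stand-in for real video work
        state["progress"] = p       # the worker reports progress as it goes
    state["state"] = "complete"

thread = threading.Thread(target=run_task)
thread.start()

# The UI thread polls shared state instead of blocking on the task itself.
while state["state"] == "running":
    time.sleep(0.01)
thread.join()
```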
@@ -1,4 +1,3 @@
from venv import logger
import streamlit as st
import os
from uuid import uuid4

@@ -26,7 +25,8 @@ def get_tts_engine_options():
        "edge_tts": "Edge TTS",
        "azure_speech": "Azure Speech Services",
        "tencent_tts": "腾讯云 TTS",
        "qwen3_tts": "通义千问 Qwen3 TTS"
        "qwen3_tts": "通义千问 Qwen3 TTS",
        "indextts2": "IndexTTS2 语音克隆"
    }

@@ -56,6 +56,12 @@ def get_tts_engine_descriptions():
            "features": "阿里云通义千问语音合成,音质优秀,支持多种音色",
            "use_case": "需要高质量中文语音合成的用户",
            "registration": "https://dashscope.aliyuncs.com/"
        },
        "indextts2": {
            "title": "IndexTTS2 语音克隆",
            "features": "零样本语音克隆,上传参考音频即可合成相同音色的语音,需要本地或私有部署",
            "use_case": "下载地址:https://pan.quark.cn/s/0767c9bcefd5",
            "registration": None
        }
    }

@@ -139,6 +145,8 @@ def render_tts_settings(tr):
        render_tencent_tts_settings(tr)
    elif selected_engine == "qwen3_tts":
        render_qwen3_tts_settings(tr)
    elif selected_engine == "indextts2":
        render_indextts2_tts_settings(tr)

    # 4. 试听功能
    render_voice_preview_new(tr, selected_engine)

@@ -562,6 +570,139 @@ def render_qwen3_tts_settings(tr):
    config.ui["qwen3_rate"] = voice_rate
    config.ui["voice_name"] = voice_type  # 兼容性


def render_indextts2_tts_settings(tr):
    """渲染 IndexTTS2 TTS 设置"""
    import os

    # API 地址配置
    api_url = st.text_input(
        "API 地址",
        value=config.indextts2.get("api_url", "http://127.0.0.1:8081/tts"),
        help="IndexTTS2 API 服务地址"
    )

    # 参考音频文件路径
    reference_audio = st.text_input(
        "参考音频路径",
        value=config.indextts2.get("reference_audio", ""),
        help="用于语音克隆的参考音频文件路径(WAV 格式,建议 3-10 秒)"
    )

    # 文件上传功能
    uploaded_file = st.file_uploader(
        "或上传参考音频文件",
        type=["wav", "mp3"],
        help="上传一段清晰的音频用于语音克隆"
    )

    if uploaded_file is not None:
        # 保存上传的文件
        import tempfile
        temp_dir = tempfile.gettempdir()
        audio_path = os.path.join(temp_dir, f"indextts2_ref_{uploaded_file.name}")
        with open(audio_path, "wb") as f:
            f.write(uploaded_file.getbuffer())
        reference_audio = audio_path
        st.success(f"✅ 音频已上传: {audio_path}")

    # 推理模式
    infer_mode = st.selectbox(
        "推理模式",
        options=["普通推理", "快速推理"],
        index=0 if config.indextts2.get("infer_mode", "普通推理") == "普通推理" else 1,
        help="普通推理质量更高但速度较慢,快速推理速度更快但质量略低"
    )

    # 高级参数折叠面板
    with st.expander("🔧 高级参数", expanded=False):
        col1, col2 = st.columns(2)

        with col1:
            temperature = st.slider(
                "采样温度 (Temperature)",
                min_value=0.1,
                max_value=2.0,
                value=float(config.indextts2.get("temperature", 1.0)),
                step=0.1,
                help="控制随机性,值越高输出越随机,值越低越确定"
            )

            top_p = st.slider(
                "Top P",
                min_value=0.0,
                max_value=1.0,
                value=float(config.indextts2.get("top_p", 0.8)),
                step=0.05,
                help="nucleus 采样的概率阈值,值越小结果越确定"
            )

            top_k = st.slider(
                "Top K",
                min_value=0,
                max_value=100,
                value=int(config.indextts2.get("top_k", 30)),
                step=5,
                help="top-k 采样的 k 值,0 表示不使用 top-k"
            )

        with col2:
            num_beams = st.slider(
                "束搜索 (Num Beams)",
                min_value=1,
                max_value=10,
                value=int(config.indextts2.get("num_beams", 3)),
                step=1,
                help="束搜索的 beam 数量,值越大质量可能越好但速度越慢"
            )

            repetition_penalty = st.slider(
                "重复惩罚 (Repetition Penalty)",
                min_value=1.0,
                max_value=20.0,
                value=float(config.indextts2.get("repetition_penalty", 10.0)),
                step=0.5,
                help="值越大越能避免重复,但过大可能导致不自然"
            )

            do_sample = st.checkbox(
                "启用采样",
                value=config.indextts2.get("do_sample", True),
                help="启用采样可以获得更自然的语音"
            )

    # 显示使用说明
    with st.expander("💡 IndexTTS2 使用说明", expanded=False):
        st.markdown("""
        **零样本语音克隆**

        1. **准备参考音频**:上传或指定一段清晰的音频文件(建议 3-10 秒)
        2. **设置 API 地址**:确保 IndexTTS2 服务正常运行
        3. **开始合成**:系统会自动使用参考音频的音色合成新语音

        **注意事项**:
        - 参考音频质量直接影响合成效果
        - 建议使用无背景噪音的清晰音频
        - 文本长度建议控制在合理范围内
        - 首次合成可能需要较长时间
        """)

    # 保存配置
    config.indextts2["api_url"] = api_url
    config.indextts2["reference_audio"] = reference_audio
    config.indextts2["infer_mode"] = infer_mode
    config.indextts2["temperature"] = temperature
    config.indextts2["top_p"] = top_p
    config.indextts2["top_k"] = top_k
    config.indextts2["num_beams"] = num_beams
    config.indextts2["repetition_penalty"] = repetition_penalty
    config.indextts2["do_sample"] = do_sample

    # 保存 voice_name 用于兼容性
    if reference_audio:
        config.ui["voice_name"] = f"indextts2:{reference_audio}"

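For reference, the advanced parameters collected above would typically be bundled into a single request payload. The field names below are assumptions modelled on the UI settings, not a documented IndexTTS2 schema; check the API of your own deployment before relying on this sketch:

```python
def build_indextts2_payload(text, reference_audio, cfg):
    # Field names here mirror the settings gathered in the UI above;
    # they are assumptions -- verify them against your IndexTTS2 service.
    return {
        "text": text,
        "reference_audio": reference_audio,
        "infer_mode": cfg.get("infer_mode", "普通推理"),
        "temperature": float(cfg.get("temperature", 1.0)),
        "top_p": float(cfg.get("top_p", 0.8)),
        "top_k": int(cfg.get("top_k", 30)),
        "num_beams": int(cfg.get("num_beams", 3)),
        "repetition_penalty": float(cfg.get("repetition_penalty", 10.0)),
        "do_sample": bool(cfg.get("do_sample", True)),
    }
```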

def render_voice_preview_new(tr, selected_engine):
    """渲染新的语音试听功能"""
    if st.button("🎵 试听语音合成", use_container_width=True):

@@ -599,6 +740,12 @@ def render_voice_preview_new(tr, selected_engine):
            voice_name = f"qwen3:{vt}"
            voice_rate = config.ui.get("qwen3_rate", 1.0)
            voice_pitch = 1.0  # Qwen3 TTS 不支持音调调节
        elif selected_engine == "indextts2":
            reference_audio = config.indextts2.get("reference_audio", "")
            if reference_audio:
                voice_name = f"indextts2:{reference_audio}"
            voice_rate = 1.0  # IndexTTS2 不支持速度调节
            voice_pitch = 1.0  # IndexTTS2 不支持音调调节

        if not voice_name:
            st.error("请先配置语音设置")

@@ -5,6 +5,39 @@ import os
from app.config import config
from app.utils import utils
from loguru import logger
from app.services.llm.unified_service import UnifiedLLMService

# 需要用户手动填写 Base URL 的 OpenAI 兼容网关及其默认接口
OPENAI_COMPATIBLE_GATEWAY_BASE_URLS = {
    "siliconflow": "https://api.siliconflow.cn/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "moonshot": "https://api.moonshot.cn/v1",
    "gemini(openai)": "",
}


def build_base_url_help(provider: str, model_type: str) -> tuple[str, bool, str]:
    """
    根据 provider 返回 Base URL 的帮助文案

    Returns:
        help_text: 显示在输入框的帮助内容
        requires_base: 是否强制提示必须填写 Base URL
        placeholder: 推荐的默认值(可为空字符串)
    """
    default_help = "自定义 API 端点(可选),当使用自建或第三方代理时需要填写"
    provider_key = (provider or "").lower()
    example_url = OPENAI_COMPATIBLE_GATEWAY_BASE_URLS.get(provider_key)

    if example_url is not None:
        extra = f"\n推荐接口地址: {example_url}" if example_url else ""
        help_text = (
            f"{model_type} 选择的提供商基于 OpenAI 兼容网关,必须填写完整的接口地址。"
            f"{extra}"
        )
        return help_text, True, example_url

    return default_help, False, ""

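A miniature version of the helper above, reduced to the lookup that decides whether a Base URL is mandatory (the gateway table is copied from the diff; the function name is a stand-in):

```python
GATEWAY_BASE_URLS = {
    "siliconflow": "https://api.siliconflow.cn/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "moonshot": "https://api.moonshot.cn/v1",
}

def base_url_requirement(provider):
    # Known OpenAI-compatible gateways require an explicit Base URL;
    # for every other provider it stays optional.
    url = GATEWAY_BASE_URLS.get((provider or "").lower())
    if url is not None:
        return True, url
    return False, ""
```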

def validate_api_key(api_key: str, provider: str) -> tuple[bool, str]:

@@ -316,9 +349,26 @@ def test_litellm_vision_model(api_key: str, base_url: str, model_name: str, tr)
    old_key = os.environ.get(env_var)
    os.environ[env_var] = api_key

    # SiliconFlow 特殊处理:使用 OpenAI 兼容模式
    test_model_name = model_name
    if provider.lower() == "siliconflow":
        # 替换 provider 为 openai
        if "/" in model_name:
            test_model_name = f"openai/{model_name.split('/', 1)[1]}"
        else:
            test_model_name = f"openai/{model_name}"

        # 确保设置了 base_url
        if not base_url:
            base_url = "https://api.siliconflow.cn/v1"

        # 设置 OPENAI_API_KEY (SiliconFlow 使用 OpenAI 协议)
        os.environ["OPENAI_API_KEY"] = api_key
        os.environ["OPENAI_API_BASE"] = base_url

    try:
        # 创建测试图片(1x1 白色像素)
        test_image = Image.new('RGB', (1, 1), color='white')
        # 创建测试图片(64x64 白色像素,避免某些模型对极小图片的限制)
        test_image = Image.new('RGB', (64, 64), color='white')
        img_buffer = io.BytesIO()
        test_image.save(img_buffer, format='JPEG')
        img_bytes = img_buffer.getvalue()

@@ -340,7 +390,7 @@ def test_litellm_vision_model(api_key: str, base_url: str, model_name: str, tr)

        # 准备参数
        completion_kwargs = {
            "model": model_name,
            "model": test_model_name,
            "messages": messages,
            "temperature": 0.1,
            "max_tokens": 50

@@ -363,6 +413,11 @@ def test_litellm_vision_model(api_key: str, base_url: str, model_name: str, tr)
            os.environ[env_var] = old_key
        else:
            os.environ.pop(env_var, None)

        # 清理临时设置的 OpenAI 环境变量
        if provider.lower() == "siliconflow":
            os.environ.pop("OPENAI_API_KEY", None)
            os.environ.pop("OPENAI_API_BASE", None)

    except Exception as e:
        error_msg = str(e)

@@ -415,6 +470,23 @@ def test_litellm_text_model(api_key: str, base_url: str, model_name: str, tr) ->
    old_key = os.environ.get(env_var)
    os.environ[env_var] = api_key

    # SiliconFlow 特殊处理:使用 OpenAI 兼容模式
    test_model_name = model_name
    if provider.lower() == "siliconflow":
        # 替换 provider 为 openai
        if "/" in model_name:
            test_model_name = f"openai/{model_name.split('/', 1)[1]}"
        else:
            test_model_name = f"openai/{model_name}"

        # 确保设置了 base_url
        if not base_url:
            base_url = "https://api.siliconflow.cn/v1"

        # 设置 OPENAI_API_KEY (SiliconFlow 使用 OpenAI 协议)
        os.environ["OPENAI_API_KEY"] = api_key
        os.environ["OPENAI_API_BASE"] = base_url

    try:
        # 构建测试请求
        messages = [

@@ -423,7 +495,7 @@ def test_litellm_text_model(api_key: str, base_url: str, model_name: str, tr) ->

        # 准备参数
        completion_kwargs = {
            "model": model_name,
            "model": test_model_name,
            "messages": messages,
            "temperature": 0.1,
            "max_tokens": 20

@@ -446,6 +518,11 @@ def test_litellm_text_model(api_key: str, base_url: str, model_name: str, tr) ->
            os.environ[env_var] = old_key
        else:
            os.environ.pop(env_var, None)

        # 清理临时设置的 OpenAI 环境变量
        if provider.lower() == "siliconflow":
            os.environ.pop("OPENAI_API_KEY", None)
            os.environ.pop("OPENAI_API_BASE", None)

    except Exception as e:
        error_msg = str(e)

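The SiliconFlow special-casing that appears twice above (once per test function) boils down to a provider-prefix rewrite, sketched here as a standalone helper:

```python
def to_openai_compatible(model_name, base_url="", default_base="https://api.siliconflow.cn/v1"):
    # LiteLLM talks to OpenAI-compatible gateways through the "openai/" prefix,
    # so "siliconflow/org/model" becomes "openai/org/model".
    if "/" in model_name:
        rewritten = f"openai/{model_name.split('/', 1)[1]}"
    else:
        rewritten = f"openai/{model_name}"
    return rewritten, base_url or default_base
```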
@@ -469,23 +546,61 @@ def render_vision_llm_settings(tr):
    config.app["vision_llm_provider"] = "litellm"

    # 获取已保存的 LiteLLM 配置
    vision_model_name = config.app.get("vision_litellm_model_name", "gemini/gemini-2.0-flash-lite")
    full_vision_model_name = config.app.get("vision_litellm_model_name", "gemini/gemini-2.0-flash-lite")
    vision_api_key = config.app.get("vision_litellm_api_key", "")
    vision_base_url = config.app.get("vision_litellm_base_url", "")

    # 解析 provider 和 model
    default_provider = "gemini"
    default_model = "gemini-2.0-flash-lite"

    if "/" in full_vision_model_name:
        parts = full_vision_model_name.split("/", 1)
        current_provider = parts[0]
        current_model = parts[1]
    else:
        current_provider = default_provider
        current_model = full_vision_model_name

    # 定义支持的 provider 列表
    LITELLM_PROVIDERS = [
        "openai", "gemini", "deepseek", "qwen", "siliconflow", "moonshot",
        "anthropic", "azure", "ollama", "vertex_ai", "mistral", "codestral",
        "volcengine", "groq", "cohere", "together_ai", "fireworks_ai",
        "openrouter", "replicate", "huggingface", "xai", "deepgram", "vllm",
        "bedrock", "cloudflare"
    ]

    # 如果当前 provider 不在列表中,添加到列表头部
    if current_provider not in LITELLM_PROVIDERS:
        LITELLM_PROVIDERS.insert(0, current_provider)

    # 渲染配置输入框
    st_vision_model_name = st.text_input(
        tr("Vision Model Name"),
        value=vision_model_name,
        help="LiteLLM 模型格式: provider/model\n\n"
             "常用示例:\n"
             "• gemini/gemini-2.0-flash-lite (推荐,速度快)\n"
             "• gemini/gemini-1.5-pro (高精度)\n"
             "• openai/gpt-4o, openai/gpt-4o-mini\n"
             "• qwen/qwen2.5-vl-32b-instruct\n"
             "• siliconflow/Qwen/Qwen2.5-VL-32B-Instruct\n\n"
             "支持 100+ providers,详见: https://docs.litellm.ai/docs/providers"
    )
    col1, col2 = st.columns([1, 2])
    with col1:
        selected_provider = st.selectbox(
            tr("Vision Model Provider"),
            options=LITELLM_PROVIDERS,
            index=LITELLM_PROVIDERS.index(current_provider) if current_provider in LITELLM_PROVIDERS else 0,
            key="vision_provider_select"
        )

    with col2:
        model_name_input = st.text_input(
            tr("Vision Model Name"),
            value=current_model,
            help="输入模型名称(不包含 provider 前缀)\n\n"
                 "常用示例:\n"
                 "• gemini-2.0-flash-lite\n"
                 "• gpt-4o\n"
                 "• qwen-vl-max\n"
                 "• Qwen/Qwen2.5-VL-32B-Instruct (SiliconFlow)\n\n"
                 "支持 100+ providers,详见: https://docs.litellm.ai/docs/providers",
            key="vision_model_input"
        )

    # 组合完整的模型名称
    st_vision_model_name = f"{selected_provider}/{model_name_input}" if selected_provider and model_name_input else ""

    st_vision_api_key = st.text_input(
        tr("Vision API Key"),

@@ -499,23 +614,25 @@ def render_vision_llm_settings(tr):
             "• SiliconFlow: https://cloud.siliconflow.cn/account/ak"
    )

    vision_base_help, vision_base_required, vision_placeholder = build_base_url_help(
        selected_provider, "视频分析模型"
    )
    st_vision_base_url = st.text_input(
        tr("Vision Base URL"),
        value=vision_base_url,
        help="自定义 API 端点(可选)\n\n"
             "留空使用默认端点。可用于:\n"
             "• 代理地址(如通过 CloudFlare)\n"
             "• 私有部署的模型服务\n"
             "• 自定义网关\n\n"
             "示例: https://your-proxy.com/v1"
        help=vision_base_help,
        placeholder=vision_placeholder or None
    )
    if vision_base_required and not st_vision_base_url:
        info_example = vision_placeholder or "https://your-openai-compatible-endpoint/v1"
        st.info(f"请在上方填写 OpenAI 兼容网关地址,例如:{info_example}")

    # 添加测试连接按钮
    if st.button(tr("Test Connection"), key="test_vision_connection"):
        test_errors = []
        if not st_vision_api_key:
            test_errors.append("请先输入 API 密钥")
        if not st_vision_model_name:
        if not model_name_input:
            test_errors.append("请先输入模型名称")

        if test_errors:

@@ -545,6 +662,7 @@ def render_vision_llm_settings(tr):

    # 验证模型名称
    if st_vision_model_name:
        # 这里的验证逻辑可能需要微调,因为我们现在是自动组合的
        is_valid, error_msg = validate_litellm_model_name(st_vision_model_name, "视频分析")
        if is_valid:
            config.app["vision_litellm_model_name"] = st_vision_model_name

@@ -580,6 +698,8 @@ def render_vision_llm_settings(tr):
    if config_changed and not validation_errors:
        try:
            config.save_config()
            # 清除缓存,确保下次使用新配置
            UnifiedLLMService.clear_cache()
            if st_vision_api_key or st_vision_base_url or st_vision_model_name:
                st.success(f"视频分析模型配置已保存(LiteLLM)")
        except Exception as e:

@@ -698,24 +818,61 @@ def render_text_llm_settings(tr):
    config.app["text_llm_provider"] = "litellm"

    # 获取已保存的 LiteLLM 配置
    text_model_name = config.app.get("text_litellm_model_name", "deepseek/deepseek-chat")
    full_text_model_name = config.app.get("text_litellm_model_name", "deepseek/deepseek-chat")
    text_api_key = config.app.get("text_litellm_api_key", "")
    text_base_url = config.app.get("text_litellm_base_url", "")

    # 解析 provider 和 model
    default_provider = "deepseek"
    default_model = "deepseek-chat"

    if "/" in full_text_model_name:
        parts = full_text_model_name.split("/", 1)
        current_provider = parts[0]
        current_model = parts[1]
    else:
        current_provider = default_provider
        current_model = full_text_model_name

    # 定义支持的 provider 列表
    LITELLM_PROVIDERS = [
        "openai", "gemini", "deepseek", "qwen", "siliconflow", "moonshot",
        "anthropic", "azure", "ollama", "vertex_ai", "mistral", "codestral",
        "volcengine", "groq", "cohere", "together_ai", "fireworks_ai",
        "openrouter", "replicate", "huggingface", "xai", "deepgram", "vllm",
        "bedrock", "cloudflare"
    ]

    # 如果当前 provider 不在列表中,添加到列表头部
    if current_provider not in LITELLM_PROVIDERS:
        LITELLM_PROVIDERS.insert(0, current_provider)

    # 渲染配置输入框
    st_text_model_name = st.text_input(
        tr("Text Model Name"),
        value=text_model_name,
        help="LiteLLM 模型格式: provider/model\n\n"
             "常用示例:\n"
             "• deepseek/deepseek-chat (推荐,性价比高)\n"
             "• gemini/gemini-2.0-flash (速度快)\n"
             "• openai/gpt-4o, openai/gpt-4o-mini\n"
             "• qwen/qwen-plus, qwen/qwen-turbo\n"
             "• siliconflow/deepseek-ai/DeepSeek-R1\n"
             "• moonshot/moonshot-v1-8k\n\n"
             "支持 100+ providers,详见: https://docs.litellm.ai/docs/providers"
    )
    col1, col2 = st.columns([1, 2])
    with col1:
        selected_provider = st.selectbox(
            tr("Text Model Provider"),
            options=LITELLM_PROVIDERS,
            index=LITELLM_PROVIDERS.index(current_provider) if current_provider in LITELLM_PROVIDERS else 0,
            key="text_provider_select"
        )

    with col2:
        model_name_input = st.text_input(
            tr("Text Model Name"),
            value=current_model,
            help="输入模型名称(不包含 provider 前缀)\n\n"
                 "常用示例:\n"
                 "• deepseek-chat\n"
                 "• gpt-4o\n"
                 "• gemini-2.0-flash\n"
                 "• deepseek-ai/DeepSeek-R1 (SiliconFlow)\n\n"
                 "支持 100+ providers,详见: https://docs.litellm.ai/docs/providers",
            key="text_model_input"
        )

    # 组合完整的模型名称
    st_text_model_name = f"{selected_provider}/{model_name_input}" if selected_provider and model_name_input else ""

    st_text_api_key = st.text_input(
        tr("Text API Key"),

@@ -731,23 +888,25 @@ def render_text_llm_settings(tr):
             "• Moonshot: https://platform.moonshot.cn/console/api-keys"
    )

    text_base_help, text_base_required, text_placeholder = build_base_url_help(
        selected_provider, "文案生成模型"
    )
    st_text_base_url = st.text_input(
        tr("Text Base URL"),
        value=text_base_url,
        help="自定义 API 端点(可选)\n\n"
             "留空使用默认端点。可用于:\n"
             "• 代理地址(如通过 CloudFlare)\n"
             "• 私有部署的模型服务\n"
             "• 自定义网关\n\n"
             "示例: https://your-proxy.com/v1"
        help=text_base_help,
        placeholder=text_placeholder or None
    )
    if text_base_required and not st_text_base_url:
        info_example = text_placeholder or "https://your-openai-compatible-endpoint/v1"
        st.info(f"请在上方填写 OpenAI 兼容网关地址,例如:{info_example}")

    # 添加测试连接按钮
    if st.button(tr("Test Connection"), key="test_text_connection"):
        test_errors = []
        if not st_text_api_key:
            test_errors.append("请先输入 API 密钥")
        if not st_text_model_name:
        if not model_name_input:
            test_errors.append("请先输入模型名称")

        if test_errors:

@@ -812,6 +971,8 @@ def render_text_llm_settings(tr):
    if text_config_changed and not text_validation_errors:
        try:
            config.save_config()
            # 清除缓存,确保下次使用新配置
            UnifiedLLMService.clear_cache()
            if st_text_api_key or st_text_base_url or st_text_model_name:
                st.success(f"文案生成模型配置已保存(LiteLLM)")
        except Exception as e:

@@ -49,90 +49,160 @@ def render_script_panel(tr):

def render_script_file(tr, params):
    """渲染脚本文件选择"""
    script_list = [
        (tr("None"), ""),
        (tr("Auto Generate"), "auto"),
        (tr("Short Generate"), "short"),
        (tr("Short Drama Summary"), "summary"),
        (tr("Upload Script"), "upload_script")
    ]
    # 定义功能模式
    MODE_FILE = "file_selection"
    MODE_AUTO = "auto"
    MODE_SHORT = "short"
    MODE_SUMMARY = "summary"

    # 获取已有脚本文件
    suffix = "*.json"
    script_dir = utils.script_dir()
    files = glob.glob(os.path.join(script_dir, suffix))
    file_list = []
    # 模式选项映射
    mode_options = {
        tr("Select/Upload Script"): MODE_FILE,
        tr("Auto Generate"): MODE_AUTO,
        tr("Short Generate"): MODE_SHORT,
        tr("Short Drama Summary"): MODE_SUMMARY,
    }

    # 获取当前状态
    current_path = st.session_state.get('video_clip_json_path', '')

    # 确定当前选中的模式索引
    default_index = 0
    mode_keys = list(mode_options.keys())

    if current_path == "auto":
        default_index = mode_keys.index(tr("Auto Generate"))
    elif current_path == "short":
        default_index = mode_keys.index(tr("Short Generate"))
    elif current_path == "summary":
        default_index = mode_keys.index(tr("Short Drama Summary"))
    else:
        default_index = mode_keys.index(tr("Select/Upload Script"))

    for file in files:
        file_list.append({
            "name": os.path.basename(file),
            "file": file,
            "ctime": os.path.getctime(file)
        })
    # 1. 渲染功能选择下拉框
    # 使用 segmented_control 替代 selectbox,提供更好的视觉体验
    default_mode_label = mode_keys[default_index]

    # 定义回调函数来处理状态更新
    def update_script_mode():
        # 获取当前选中的标签
        selected_label = st.session_state.script_mode_selection
        if selected_label:
            # 更新实际的 path 状态
            new_mode = mode_options[selected_label]
            st.session_state.video_clip_json_path = new_mode
            params.video_clip_json_path = new_mode
        else:
            # 如果用户取消选择(segmented_control 允许取消),恢复到默认或上一个状态
            # 这里我们强制保持当前状态,或者重置为默认
            st.session_state.script_mode_selection = default_mode_label

    file_list.sort(key=lambda x: x["ctime"], reverse=True)
    for file in file_list:
        display_name = file['file'].replace(config.root_dir, "")
        script_list.append((display_name, file['file']))

    # 找到保存的脚本文件在列表中的索引
    saved_script_path = st.session_state.get('video_clip_json_path', '')
    selected_index = 0
    for i, (_, path) in enumerate(script_list):
        if path == saved_script_path:
            selected_index = i
            break

    selected_script_index = st.selectbox(
        tr("Script Files"),
        index=selected_index,
        options=range(len(script_list)),
        format_func=lambda x: script_list[x][0]
    # 渲染组件
    selected_mode_label = st.segmented_control(
        tr("Video Type"),
        options=mode_keys,
        default=default_mode_label,
        key="script_mode_selection",
        on_change=update_script_mode
    )

    # 处理未选择的情况(虽然有default,但在某些交互下可能为空)
    if not selected_mode_label:
        selected_mode_label = default_mode_label

    selected_mode = mode_options[selected_mode_label]
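The label-to-mode resolution above, including the fallback for when `segmented_control` returns no selection, can be sketched without Streamlit (labels are shown untranslated here):

```python
MODE_OPTIONS = {
    "Select/Upload Script": "file_selection",
    "Auto Generate": "auto",
    "Short Generate": "short",
    "Short Drama Summary": "summary",
}

def resolve_mode(selected_label, current_path):
    # Mirror the widget logic: use the selection when present, otherwise
    # fall back to whatever mode the saved path implies.
    if selected_label:
        return MODE_OPTIONS[selected_label]
    if current_path in MODE_OPTIONS.values():
        return current_path
    return "file_selection"  # a concrete file path implies file-selection mode
```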
script_path = script_list[selected_script_index][1]
|
||||
st.session_state['video_clip_json_path'] = script_path
|
||||
params.video_clip_json_path = script_path
|
||||
# 2. 根据选择的模式处理逻辑
|
||||
if selected_mode == MODE_FILE:
|
||||
# --- 文件选择模式 ---
|
||||
script_list = [
|
||||
(tr("None"), ""),
|
||||
(tr("Upload Script"), "upload_script")
|
||||
]
|
||||
|
||||
# 处理脚本上传
|
||||
if script_path == "upload_script":
|
||||
uploaded_file = st.file_uploader(
|
||||
tr("Upload Script File"),
|
||||
type=["json"],
|
||||
accept_multiple_files=False,
|
||||
# 获取已有脚本文件
|
||||
suffix = "*.json"
|
||||
script_dir = utils.script_dir()
|
||||
files = glob.glob(os.path.join(script_dir, suffix))
|
||||
file_list = []
|
||||
|
||||
for file in files:
|
||||
file_list.append({
|
||||
"name": os.path.basename(file),
|
||||
"file": file,
|
||||
"ctime": os.path.getctime(file)
|
||||
})
|
||||
|
||||
file_list.sort(key=lambda x: x["ctime"], reverse=True)
|
||||
for file in file_list:
|
||||
display_name = file['file'].replace(config.root_dir, "")
|
||||
script_list.append((display_name, file['file']))
|
||||
|
||||
# 找到保存的脚本文件在列表中的索引
|
||||
# 如果当前path是特殊值(auto/short/summary),则重置为空
|
||||
saved_script_path = current_path if current_path not in [MODE_AUTO, MODE_SHORT, MODE_SUMMARY] else ""
|
||||
|
||||
selected_index = 0
|
||||
for i, (_, path) in enumerate(script_list):
|
||||
if path == saved_script_path:
|
||||
selected_index = i
|
||||
break
|
||||
|
||||
selected_script_index = st.selectbox(
|
||||
tr("Script Files"),
|
||||
index=selected_index,
|
||||
options=range(len(script_list)),
|
||||
format_func=lambda x: script_list[x][0],
|
||||
key="script_file_selection"
|
||||
)
|
||||
|
||||
if uploaded_file is not None:
|
||||
try:
|
||||
# 读取上传的JSON内容并验证格式
|
||||
script_content = uploaded_file.read().decode('utf-8')
|
||||
json_data = json.loads(script_content)
|
||||
script_path = script_list[selected_script_index][1]
|
||||
st.session_state['video_clip_json_path'] = script_path
|
||||
params.video_clip_json_path = script_path
|
||||
|
||||
# 保存到脚本目录
|
||||
script_file_path = os.path.join(script_dir, uploaded_file.name)
|
||||
file_name, file_extension = os.path.splitext(uploaded_file.name)
|
||||
# 处理脚本上传
|
||||
 if script_path == "upload_script":
     uploaded_file = st.file_uploader(
         tr("Upload Script File"),
         type=["json"],
         accept_multiple_files=False,
     )

-    # 如果文件已存在,添加时间戳
-    if os.path.exists(script_file_path):
-        timestamp = time.strftime("%Y%m%d%H%M%S")
-        file_name_with_timestamp = f"{file_name}_{timestamp}"
-        script_file_path = os.path.join(script_dir, file_name_with_timestamp + file_extension)
     if uploaded_file is not None:
         try:
             # 读取上传的JSON内容并验证格式
             script_content = uploaded_file.read().decode('utf-8')
             json_data = json.loads(script_content)

-            # 写入文件
-            with open(script_file_path, "w", encoding='utf-8') as f:
-                json.dump(json_data, f, ensure_ascii=False, indent=2)
             # 保存到脚本目录
             script_file_path = os.path.join(script_dir, uploaded_file.name)
             file_name, file_extension = os.path.splitext(uploaded_file.name)

-            # 更新状态
-            st.success(tr("Script Uploaded Successfully"))
-            st.session_state['video_clip_json_path'] = script_file_path
-            params.video_clip_json_path = script_file_path
-            time.sleep(1)
-            st.rerun()
+            # 如果文件已存在,添加时间戳
+            if os.path.exists(script_file_path):
+                timestamp = time.strftime("%Y%m%d%H%M%S")
+                file_name_with_timestamp = f"{file_name}_{timestamp}"
+                script_file_path = os.path.join(script_dir, file_name_with_timestamp + file_extension)

-        except json.JSONDecodeError:
-            st.error(tr("Invalid JSON format"))
-        except Exception as e:
-            st.error(f"{tr('Upload failed')}: {str(e)}")
+            # 写入文件
+            with open(script_file_path, "w", encoding='utf-8') as f:
+                json.dump(json_data, f, ensure_ascii=False, indent=2)
+
+            # 更新状态
+            st.success(tr("Script Uploaded Successfully"))
+            st.session_state['video_clip_json_path'] = script_file_path
+            params.video_clip_json_path = script_file_path
+            time.sleep(1)
+            st.rerun()
+
+        except json.JSONDecodeError:
+            st.error(tr("Invalid JSON format"))
+        except Exception as e:
+            st.error(f"{tr('Upload failed')}: {str(e)}")
 else:
     # --- 功能生成模式 ---
     st.session_state['video_clip_json_path'] = selected_mode
     params.video_clip_json_path = selected_mode


 def render_video_file(tr, params):
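The reworked upload path computes the destination filename first, de-duplicates it with a timestamp, and only then writes the file. A minimal standalone sketch of that naming step (the helper name `unique_script_path` is mine, not the repository's):

```python
import os
import time


def unique_script_path(script_dir: str, file_name: str) -> str:
    """Return a save path under script_dir; append a timestamp if a file
    with the same name already exists (mirrors the diff's dedup logic)."""
    name, ext = os.path.splitext(file_name)
    path = os.path.join(script_dir, file_name)
    if os.path.exists(path):
        timestamp = time.strftime("%Y%m%d%H%M%S")
        path = os.path.join(script_dir, f"{name}_{timestamp}{ext}")
    return path
```

Doing this before the `open(..., "w")` call is what fixes the ordering bug visible in the removed lines, where the JSON was written before the final path was known.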
@@ -10,6 +10,7 @@ def render_subtitle_panel(tr):
     """渲染字幕设置面板"""
     with st.container(border=True):
         st.write(tr("Subtitle Settings"))
+        st.info("💡 提示:目前仅 **edge-tts** 引擎支持自动生成字幕,其他 TTS 引擎暂不支持。")

         # 检查是否选择了 SoulVoice qwen3_tts引擎
         from app.services import voice
@@ -150,9 +151,10 @@ def render_style_settings(tr):


 def get_subtitle_params():
     """获取字幕参数"""
+    font_name = st.session_state.get('font_name') or "SimHei"
     return {
         'subtitle_enabled': st.session_state.get('subtitle_enabled', True),
-        'font_name': st.session_state.get('font_name', ''),
+        'font_name': font_name,
         'font_size': st.session_state.get('font_size', 60),
         'text_fore_color': st.session_state.get('text_fore_color', '#FFFFFF'),
         'subtitle_position': st.session_state.get('subtitle_position', 'bottom'),
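The one-line `get_subtitle_params` change replaces an empty-string default with a real fallback font. The `or` form covers both `None` and `''`, which `dict.get('font_name', '')` does not. A tiny illustration (the function name is mine):

```python
def effective_font(session_value):
    """Falsy-coalescing default: '' and None both fall back to "SimHei",
    mirroring the st.session_state.get('font_name') or "SimHei" expression."""
    return session_value or "SimHei"
```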
@@ -152,7 +152,7 @@
     "API rate limit exceeded. Please wait about an hour and try again.": "API 调用次数已达到限制,请等待约一小时后再试。",
     "Resources exhausted. Please try again later.": "资源已耗尽,请稍后再试。",
     "Transcription Failed": "转录失败",
-    "Short Generate": "短剧混剪 (实验)",
+    "Short Generate": "短剧混剪",
     "Generate Short Video Script": "AI生成短剧混剪脚本",
     "Adjust the volume of the original audio": "调整原始音频的音量",
     "Original Volume": "视频音量",
@@ -161,6 +161,8 @@
     "Frame Interval (seconds) (More keyframes consume more tokens)": "帧间隔 (秒) (更多关键帧消耗更多令牌)",
     "Batch Size": "批处理大小",
     "Batch Size (More keyframes consume more tokens)": "批处理大小, 每批处理越少消耗 token 越多",
-    "Short Drama Summary": "短剧解说"
+    "Short Drama Summary": "短剧解说",
+    "Video Type": "视频类型",
+    "Select/Upload Script": "选择/上传脚本"
   }
 }
@@ -144,32 +144,3 @@ def get_batch_files(keyframe_files, result, batch_size=5):
     batch_start = result['batch_index'] * batch_size
     batch_end = min(batch_start + batch_size, len(keyframe_files))
     return keyframe_files[batch_start:batch_end]
-
-
-def chekc_video_config(video_params):
-    """
-    检查视频分析配置
-    """
-    headers = {
-        'accept': 'application/json',
-        'Content-Type': 'application/json'
-    }
-    session = requests.Session()
-    retry_strategy = Retry(
-        total=3,
-        backoff_factor=1,
-        status_forcelist=[500, 502, 503, 504]
-    )
-    adapter = HTTPAdapter(max_retries=retry_strategy)
-    session.mount("https://", adapter)
-    try:
-        session.post(
-            f"https://dev.narratoai.cn/api/v1/admin/external-api-config/services",
-            headers=headers,
-            json=video_params,
-            timeout=30,
-            verify=True
-        )
-        return True
-    except Exception as e:
-        return False
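The deleted `chekc_video_config` helper is mostly a `requests` session configured for automatic retries. For reference, a self-contained sketch of that retry pattern (the factory name is mine, and the remote endpoint is deliberately left out):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session() -> requests.Session:
    """Build a session that retries transient 5xx failures with
    exponential backoff, as the removed helper did."""
    retry_strategy = Retry(
        total=3,                               # at most 3 retries
        backoff_factor=1,                      # exponential backoff between attempts
        status_forcelist=[500, 502, 503, 504]  # retry only on these statuses
    )
    session = requests.Session()
    session.mount("https://", HTTPAdapter(max_retries=retry_strategy))
    return session
```

Mounting the adapter on the `https://` prefix means every HTTPS request made through this session inherits the retry policy.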
@@ -10,7 +10,7 @@ from datetime import datetime

 from app.config import config
 from app.utils import utils, video_processor
-from webui.tools.base import create_vision_analyzer, get_batch_files, get_batch_timestamps, chekc_video_config
+from webui.tools.base import create_vision_analyzer, get_batch_files, get_batch_timestamps


 def generate_script_docu(params):
@@ -398,7 +398,6 @@ def generate_script_docu(params):
         "text_model_name": text_model,
         "text_base_url": text_base_url
     })
-    chekc_video_config(llm_params)
     # 整理帧分析数据
     markdown_output = parse_frame_analysis_to_markdown(analysis_json_path)

@@ -8,7 +8,6 @@ import streamlit as st
 from loguru import logger

 from app.config import config
-from webui.tools.base import chekc_video_config


 def generate_script_short(tr, params, custom_clips=5):
@@ -59,7 +58,6 @@ def generate_script_short(tr, params, custom_clips=5):
         "text_model_name": text_model,
         "text_base_url": text_base_url or ""
     }
-    chekc_video_config(api_params)
     from app.services.SDP.generate_script_short import generate_script
     script = generate_script(
         srt_path=srt_path,
@@ -8,7 +8,8 @@ def get_fonts_cache(font_dir):
     fonts = []
     for root, dirs, files in os.walk(font_dir):
         for file in files:
-            if file.endswith(".ttf") or file.endswith(".ttc"):
+            # 支持常见字体格式,少字体时也能被UI识别
+            if file.lower().endswith((".ttf", ".ttc", ".otf")):
                 fonts.append(file)
     fonts.sort()
     st.session_state['fonts_cache'] = fonts
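The widened check in `get_fonts_cache` is case-insensitive and adds `.otf` by passing a tuple to `str.endswith`. A standalone sketch of that filter (names here are illustrative):

```python
# Case-insensitive font-extension filter, matching the new
# file.lower().endswith((".ttf", ".ttc", ".otf")) check.
FONT_EXTS = (".ttf", ".ttc", ".otf")


def filter_fonts(filenames):
    """Keep supported font files and sort them, like get_fonts_cache does."""
    return sorted(f for f in filenames if f.lower().endswith(FONT_EXTS))
```

Lower-casing the filename before the check is what lets files such as `FONT.TTF` pass, which the old `file.endswith(".ttf")` comparison silently skipped.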
@@ -30,4 +31,4 @@ def get_songs_cache(song_dir):
             if file.endswith(".mp3"):
                 songs.append(file)
     st.session_state['songs_cache'] = songs
-    return st.session_state['songs_cache']
\ No newline at end of file
+    return st.session_state['songs_cache']