Case Study · Nov 2025 to Feb 2026

Pre-Flight Agent

A pre-production intelligence system for the AIGC short drama industry. Before any video is generated, it completes the requirements, estimates cost, screens for compliance, and generates professional prompts.


pre-flight-agent · workflow preview

Input: user natural-language description
↓
Step 0 · Intent Classification · Qwen3-30B
    New Request → Reset Memory
    Supplement → Read · Merge · ERNIE-4.0
    Confirm → Done
↓
Step 1 · Smart Interceptor (3-tier gates) · DeepSeek-V3.2 · PASS / NEED_EDIT / REJECT
↓
Step 2.5 · Cost Estimator · ERNIE-4.0
Step 2.6 · Decision Router · Python code
Step 3 · Prompt Generator + RAG · ERNIE-4.0 + KB
↓
Step 4 · User Response Generator · ERNIE-Speed

Problem Background

The real cost of "blind-box production"

The short drama industry is adopting AIGC at scale, but nobody is gatekeeping before generation. Every production run is a blind box, and the cost of trial and error is very high.

✗ Vague requirements go to production
Creators start generating from a single line such as "I want a xianxia video," with no duration, resolution, or budget specified. The results often fail to match expectations.
⚠ Budget overruns with no warning
Without upfront cost estimation, users discover only after generation that the cost far exceeds their budget, and regenerating compounds the loss.
◈ Severe hallucination on domain terms
General-purpose LLMs do not know the correct English prompt wording for xianxia terms such as "御剑飞行" (riding a flying sword) or "丹炉" (alchemy furnace), so the output looks unprofessional and drifts off-style.
✗ Script breakdown is very slow
Writing storyboard prompts by hand requires familiarity with the English keyword vocabulary of AIGC tools. A single episode takes several hours, sometimes a full day.

Why an Agent?

This is not a problem a "generation tool" can solve. User requirements arrive incrementally, information is scattered, judgment is multi-dimensional, and optimization is iterative. Handling that requires multi-turn conversation memory, dynamic routing, and cross-node state management, which is what an Agent provides.

30%+

Projected reduction in wasted production cost

<15%

Cost estimation error

42s

Average response time

System Architecture

Multi-model routing, workflow orchestration

Different nodes use different models. The core principle is to put the right model on the right task rather than running the most expensive model end to end.
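The per-node assignment can be written down as plain configuration. This is a minimal sketch: the step keys are hypothetical names introduced here, the model names are the ones listed in the workflow, and `None` marks the routing node that runs as code rather than calling a model.

```python
from typing import Optional

# Step keys are hypothetical; model names are those listed in the workflow.
STEP_MODELS: dict[str, Optional[str]] = {
    "step_0_intent": "Qwen3-30B",
    "step_1_interceptor": "DeepSeek-V3.2",
    "step_2_5_cost": "ERNIE-4.0",
    "step_2_6_router": None,  # deterministic Python code, no model call
    "step_3_prompt_rag": "ERNIE-4.0",
    "step_4_reply": "ERNIE-Speed",
}


def uses_model(step: str) -> bool:
    """True when the step calls an LLM, False when it is a pure code node."""
    return STEP_MODELS[step] is not None
```

Keeping the mapping in one place makes it cheap to swap a single node's model without touching the rest of the workflow.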

Full Workflow · v0.3 Beta · gate outcomes: PASS / NEED_EDIT / REJECT

STEP 0 · Qwen3-30B

Why Qwen for intent classification?

Intent classification is a three-class task (new request, supplement, confirm). What matters is classification accuracy, not generation quality. In testing, Qwen3-30B was more stable on Chinese natural-language understanding and text classification, with a lower misclassification rate than the other candidate models.

STEP 1 · DeepSeek-V3.2

Why DeepSeek for compliance?

Compliance checking needs a deep understanding of semantic risk: "dark-style combat" and "graphic violence" are not the same thing. DeepSeek was more stable than the alternatives on safety alignment and structured JSON output, and it misjudged fewer cases.

STEP 2.5 + STEP 3 · ERNIE-4.0

Why ERNIE for both cost and prompts?

ERNIE-4.0 has a clear advantage in Chinese content understanding and generation, and it gave the best results when combined with the xianxia RAG knowledge base. It is also the most tightly integrated with Baidu's Qianfan workflow platform and has the lowest call latency there.

STEP 2.6 · Python Code

Why use code instead of a model for routing?

Decision routing is deterministic logic: missing fields → NEED_EDIT, violations → REJECT, over budget → append cost-reduction tips. Using an LLM for if-else logic is wasteful and unstable. A code node is fully predictable and adds almost no latency.
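The routing rules fit in a few lines of ordinary code. The sketch below is illustrative only: the function name, argument names, and return shape are assumptions, and treating an over-budget request as NEED_EDIT is also an assumption, since the source only says that cost-reduction tips are appended.

```python
from typing import Optional


def route(compliant: bool, missing: list[str],
          cost_upper: Optional[float], budget: Optional[float]) -> dict:
    """Pick a gate outcome with plain if-else; same input, same output."""
    if not compliant:
        # Hard reject: no tips, so the compliance gate cannot be worked around.
        return {"decision": "REJECT", "ask_for": [], "tips": []}
    if missing:
        return {"decision": "NEED_EDIT", "ask_for": missing, "tips": []}
    if budget is not None and cost_upper is not None and cost_upper > budget:
        # Assumption: an over-budget request is returned for editing with a tip.
        tip = f"Estimate up to ¥{cost_upper:.0f} exceeds budget ¥{budget:.0f}"
        return {"decision": "NEED_EDIT", "ask_for": [], "tips": [tip]}
    return {"decision": "PASS", "ask_for": [], "tips": []}
```

Checking compliance first means a violating request never receives follow-up questions or savings advice, which matches the hard-reject rule.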

Key Design Decisions

Why this design, not that?

Core Architecture Choice

Why not single-turn generation?

Single-turn generation suits cases where all the information is available upfront. In practice, short drama creators often do not know which parameters they need, do not understand the cost structure of AIGC tools, and cannot write professional prompt syntax. Multi-turn memory lets users add information incrementally while the Agent assembles the fragments into a complete request, and that is what actually lowers the cognitive barrier for users.

CONVERSATION TRACE

User · Turn 1
I want a xianxia fight video

Agent · Missing fields
Please provide the video duration, resolution, and budget.

User · Turn 2
15s · 1080p · ¥200

Agent · Memory merged ✓
Cost estimate: ¥276-374. Over budget; suggest shortening to 8s.
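The trace can be replayed with a small memory helper. This is a sketch under stated assumptions: the `Intent` labels, the field names in `REQUIRED`, and the dict-based memory are hypothetical and only mirror the three branches shown in the workflow (reset, read and merge, confirm).

```python
from enum import Enum

REQUIRED = ("duration_s", "resolution", "budget_cny")  # hypothetical field names


class Intent(Enum):
    NEW_REQUEST = "new_request"
    SUPPLEMENT = "supplement"
    CONFIRM = "confirm"


def update_memory(intent: Intent, memory: dict, extracted: dict) -> dict:
    """Apply one turn: reset on a new request, merge on a supplement."""
    # Ignore fields the extractor returned as None so they never erase known values.
    found = {k: v for k, v in extracted.items() if v is not None}
    if intent is Intent.NEW_REQUEST:
        return found
    if intent is Intent.SUPPLEMENT:
        return {**memory, **found}
    return dict(memory)  # CONFIRM leaves the stored request unchanged


def missing_fields(memory: dict) -> list[str]:
    """Required fields the Agent still has to ask for."""
    return [f for f in REQUIRED if memory.get(f) is None]


# Turn 1: a new request with only a topic, so all three parameters are missing.
memory = update_memory(Intent.NEW_REQUEST, {}, {"topic": "xianxia fight video"})
ask = missing_fields(memory)
# Turn 2: the user supplements the request and the fields are merged in.
memory = update_memory(
    Intent.SUPPLEMENT, memory,
    {"duration_s": 15, "resolution": "1080p", "budget_cny": 200},
)
```

After the second turn `missing_fields(memory)` is empty, which is the point at which cost estimation can run.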

Risk Control Design

Why three tiers of blocking instead of one?

Compliance, completeness, and budget are three fundamentally different kinds of problem, and each needs different handling. Compliance violations get a hard reject, with no optimization suggestions so the check cannot be worked around. Incomplete requests get follow-up questions. Budget overruns get concrete cost-reduction paths. Handling all three in one step would muddle the logic and fragment the user experience.

RAG Design

Why build a vertical knowledge base?

General LLM training data rarely covers the expert English prompt combinations for xianxia terms such as "御剑飞行" (riding a flying sword). Left to improvise, the model hallucinates phrases like "flying man on sword," and generation quality is very poor. The RAG knowledge base hard-binds each Chinese intent to a professional prompt, which removes this class of hallucination at the source, and uses negative prompts to prevent style drift.
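The hard-binding idea can be illustrated with an exact-match lookup standing in for retrieval. Everything here is an assumption made for illustration: the `KBEntry` shape, the `build_prompt` helper, and the single placeholder entry, whose wording is not taken from the actual knowledge base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KBEntry:
    prompt: str     # curated English prompt fragment
    negative: str   # negative prompt that holds the style in place


# Placeholder entry; the real knowledge base content is not shown in the source.
KB = {
    "御剑飞行": KBEntry(
        prompt="cultivator standing on a flying sword, flowing robes, sea of clouds",
        negative="western fantasy armor, photorealistic",
    ),
}


def build_prompt(terms: list[str]) -> tuple[str, str, list[str]]:
    """Return (prompt, negative_prompt, unmatched_terms) using only KB entries."""
    hits = [KB[t] for t in terms if t in KB]
    unmatched = [t for t in terms if t not in KB]
    prompt = ", ".join(e.prompt for e in hits)
    negative = ", ".join(e.negative for e in hits)
    return prompt, negative, unmatched
```

Terms without an entry are returned as unmatched instead of being translated freely, so gaps in the knowledge base stay visible rather than turning into improvised wording.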

Cost Estimation Design

Why cost ranges instead of exact figures?

AIGC generation costs depend on runtime factors (server load, model version, queue time), so an exact price cannot be guaranteed. A range (base estimate × 0.85 as the lower bound, × 1.15 as the upper bound) expresses that uncertainty honestly while leaving users enough room to make budget decisions, and it avoids the loss of trust that a precise but wrong number would cause.
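As a worked example of the range rule, the sketch below applies the 0.85 and 1.15 multipliers to a point estimate. The function name is hypothetical, and the base value of ¥325 is inferred rather than stated: it is the estimate that reproduces the ¥276-374 range shown in the conversation trace.

```python
def cost_range(base_cny: float, low: float = 0.85, high: float = 1.15) -> tuple[int, int]:
    """Turn a point estimate into a (lower, upper) range in whole yuan."""
    return round(base_cny * low), round(base_cny * high)


# 325 is inferred, not stated: it reproduces the ¥276-374 range in the trace.
lower, upper = cost_range(325)
```

With a ¥200 budget, both bounds sit above the budget, so the request is over budget whichever bound is used for the comparison.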

Iteration & Failure Analysis

From v0.1 to v0.3, every failure had a cost

v0.1

General Pre-Production Review · Baseline

Receive the request → basic review → generate a generic image prompt → output a verdict. It worked, but output quality was low and professional depth was badly lacking.

✗ Generated "big house, flying man" instead of professional xianxia terminology, so results from AIGC tools were very poor

✗ Single-turn only: users had to state every parameter in one message, which made for a poor experience

v0.2

RAG Knowledge Base Integration · Comic Drama Specialization

Built the first xianxia RAG knowledge base and upgraded the output to storyboard format. Professional quality improved substantially, but new problems surfaced in cost calculation and blocking logic.

⚠ High cost-calculation failure rate: duration extraction errors pushed the error to 70-95% in several cases (tests 001 and 014)

⚠ Interceptor too rigid: professional users who pasted in a storyboard script were wrongly blocked for "missing information"

⚠ Average response time of 70 seconds, a noticeable drop in user experience

Key learning: LLMs are suited to understanding tasks, not precise calculation. Parameter extraction needs a hard constraint ("must extract from the input; output null if not found") rather than relying on the model's own judgment.
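On the code side, the null constraint pairs naturally with a parser that refuses to fill in defaults. The following is a minimal sketch, not the project's implementation: the field names, the wording of `EXTRACTION_RULE`, and the assumption that the model replies with a JSON object are all hypothetical.

```python
import json

FIELDS = ("duration_s", "resolution", "budget_cny")  # hypothetical field names

# Hypothetical wording of the hard constraint added to the extraction prompt.
EXTRACTION_RULE = (
    "Extract each field from the user input only. "
    "If a field is not stated, output null. Never guess."
)


def parse_extraction(raw_json: str) -> tuple[dict, list[str]]:
    """Split model output into usable values and fields to ask the user for."""
    data = json.loads(raw_json)
    values, missing = {}, []
    for field in FIELDS:
        value = data.get(field)
        if value is None:
            missing.append(field)  # null or absent: ask, never fall back to a default
        else:
            values[field] = value
    return values, missing
```

For example, `parse_extraction('{"duration_s": null, "resolution": "1080p"}')` keeps the resolution and reports `duration_s` and `budget_cny` as missing, so the cost step never runs on a silently defaulted duration.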

v0.2.5

Director Mode · Visual-Audio Dual Track

Upgraded from "photographer" to "director": beyond visual descriptions, the system now also generates narration, sound effects, and key dialogue, forming a complete micro-screenplay. Fully localized in Chinese.

Breakthrough: user value moved from "getting a set of image keywords" to "getting a short script that is ready to shoot." Product positioning shifted from tool to director's assistant.

v0.3 Beta

Intelligence Upgrade · Engineering Quality

Targeted fixes for the core issues found in v0.2 testing: a VIP pass-through mechanism, timestamp parsing, complexity keyword mapping, and switching Step 4 to a faster model.

Result: response time fell from 70s to 42s (−40%), parameter extraction success rate improved markedly, compliance interception accuracy was 100% (2/2), and prompt key-element coverage was 90% (18/20).
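The source lists timestamp parsing among the v0.3 fixes but does not show the script format. As a hedged illustration only, the sketch below assumes storyboard scripts carry `MM:SS-MM:SS` ranges per shot and derives the duration from the latest end time; the pattern and function name are hypothetical.

```python
import re
from typing import Optional

# Assumed script format: a range such as "00:00-00:05" on each shot line.
RANGE = re.compile(r"(\d{1,2}):(\d{2})\s*[-~]\s*(\d{1,2}):(\d{2})")


def duration_from_timestamps(script: str) -> Optional[int]:
    """Return the latest end timestamp in seconds, or None if no range is found."""
    ends = [int(m) * 60 + int(s) for _, _, m, s in RANGE.findall(script)]
    return max(ends) if ends else None


script = "00:00-00:05 wide shot of the peak\n00:05-00:15 sword clash"
duration = duration_from_timestamps(script)
```

Returning `None` when no range is present keeps this consistent with the null rule for extraction: a missing duration is asked for, not guessed.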

Results & Validated Metrics

20 test cases, real numbers

90%

Requirement understanding accuracy (18/20 cases with complete key prompt elements)

<15%

Cost estimation error when parameter extraction succeeds (median error: 0%)

42s

Average response time after optimization (down from 70s, −40%)

100%

Compliance interception accuracy (2/2 violation cases handled correctly)

Before (v0.1) → After (v0.3)

Before: "big house, flying man": generic, unusable output
After: "floating islands, ancient pagodas, wuxia style": professional output grounded in the RAG knowledge base

Before: single-image description, with characters and scenery mixed together
After: storyboard structure, with environment, character, and action broken out separately

Before: no cost preview; going to production was a blind box
After: dynamic cost range plus specific cost-reduction recommendations

Before: script breakdown took hours per episode
After: under a minute (42 seconds on average)

Before: style drifted easily, with Chinese animation turning into Korean or Japanese styles
After: locked to Chinese animation and ink-wash style, with negative prompts to prevent drift

Honest data note

The test report shows an overall average cost estimation error of 22.7%, pulled up by a few outlier cases where parameter extraction failed (case 001: 70.2%, case 014: 94.8%). The root cause was that the prompt did not require null output on extraction failure, so the system silently calculated with default values.

In the cases where parameters were extracted correctly, formula-level error was below 15% and the median error was 0%. v0.3 includes a targeted fix for this.
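The gap between the mean and the median is what a handful of outliers produces. The sketch below shows the arithmetic on synthetic values chosen for illustration; they are not the test report's data, and the helper name is hypothetical.

```python
from statistics import mean, median


def pct_error(estimated: float, actual: float) -> float:
    """Absolute percentage error of an estimate against the actual cost."""
    return abs(estimated - actual) / actual * 100


# Synthetic values for illustration only; these are not the test report's data.
errors = [0.0, 0.0, 0.0, 5.0, 95.0]
summary = {"mean": mean(errors), "median": median(errors)}
```

Here one large error lifts the mean to 20% while the median stays at 0%, which is why reporting both figures, along with the outlier cases themselves, gives a fairer picture than either number alone.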

Reflection & Next Steps

If I continued iterating, here is what I would do

⚙

Cost Model Precision

Connect to the unit-price APIs of real AIGC tools such as Sora and Runway to replace the current estimation formula with dynamic calculation based on actual model pricing, and automatically compute the percentage saved for each recommendation.

🧠

Cross-session Memory Persistence

Store historical preferences across sessions, keyed by user ID, so the Agent remembers a user's usual resolution, budget range, and style preferences and goes from a stranger to a creative assistant that knows the user.

📊

Observability & Evaluation

Testing is currently a manual, case-by-case review. An automated evaluation pipeline is needed, covering parameter extraction accuracy, cost error distribution, and prompt quality scoring (possibly with an LLM-as-judge mechanism).

🔗

Production Execution API

Call the AIGC generation API directly after "confirm production," closing the loop from pre-review to generation and turning Pre-Flight from a pre-production assistant into the entry point of an end-to-end production platform.

📚

Knowledge Base Expansion

The knowledge base currently covers only xianxia scenes (200+ entries). The plan is to expand to thriller, urban, historical costume, and sci-fi genres, and to add a recommended-dialogue field (Script Hook) to each scene so that Director Mode can pull signature lines directly.

🧪

A/B Testing & Regression

Build a regression test set that runs every case automatically after each iteration and compares key metrics (cost error, response time, prompt quality), so that an optimization in one place does not break another.
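A regression check of this kind can start as a simple metric comparison between two versions. This is a sketch under assumptions: the function name, metric keys, and the `lower_is_better` convention are hypothetical, and only the 70s and 42s response times come from the figures reported earlier.

```python
def regressions(baseline: dict, candidate: dict, lower_is_better: set) -> list[str]:
    """Name every metric where the candidate is worse than the baseline."""
    worse = []
    for name, old in baseline.items():
        new = candidate[name]
        got_worse = new > old if name in lower_is_better else new < old
        if got_worse:
            worse.append(name)
    return worse


# Response times are the v0.2 and v0.3 figures reported earlier.
v02 = {"avg_response_s": 70}
v03 = {"avg_response_s": 42}
flagged = regressions(v02, v03, lower_is_better={"avg_response_s"})
```

An empty list means no tracked metric got worse; running the comparison in the other direction would flag `avg_response_s`.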

Pre-Flight Agent
One project. One methodology.
The real cost of blind production
Multi-model routing, workflow orchestration
Every node, one clear responsibility
Why this design, not that?
From v0.1 to v0.3, every failure had a cost
20 test cases, real numbers
Selection principle: right fit, not most expensive
If I continued, here's what's next
Have a project? Let's talk.
