Drafting Framework
The drafting framework implements multi-model generation strategies that trade off speed, quality, and cost. Strategies are defined in drafting/drafting_strategies.yaml.
Drafting is disabled by default (AgentConfig.enable_drafting = False).
Strategies
Speculative Decoding
A fast draft model generates tokens that a stronger target model verifies, accepting or rejecting each batch.
sequenceDiagram
  participant D as Draft Model (fast)
  participant T as Target Model (strong)
  loop Until done or max iterations
    D->>D: Generate N draft tokens
    D->>T: Send draft for verification
    T->>T: Score each token
    T-->>D: Accept/reject (threshold)
  end
| Config | Description |
|---|---|
| draft_model | Fast model (e.g., gpt-3.5-turbo, llama3.2) |
| target_model | Strong model (e.g., gpt-4-turbo, claude-3.5-sonnet) |
| draft_tokens | Tokens per draft batch (20–30) |
| acceptance_threshold | Minimum score to accept (0.7–0.8) |
| max_iterations | Maximum draft-verify cycles |
Pre-configured strategies: fast_accurate, local_cloud, claude_fast
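The draft-verify loop above can be sketched roughly as follows. This is a minimal illustration, not the framework's implementation: the `draft` and `score` callables, the `SpeculativeConfig` class, and the `is_done` predicate are hypothetical stand-ins for real model clients, and the prefix-acceptance rule (keep tokens up to the first score below the threshold) is an assumption about how accept/reject is applied.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical callables standing in for real model clients.
DraftFn = Callable[[List[str], int], List[str]]          # (context, n) -> draft tokens
ScoreFn = Callable[[List[str], List[str]], List[float]]  # (context, draft) -> per-token scores


@dataclass
class SpeculativeConfig:
    # Field names mirror the config table; the default values are assumptions.
    draft_tokens: int = 20
    acceptance_threshold: float = 0.7
    max_iterations: int = 5


def speculative_generate(
    draft: DraftFn,
    score: ScoreFn,
    cfg: SpeculativeConfig,
    is_done: Callable[[List[str]], bool],
) -> Tuple[List[str], int, int]:
    """Return (accepted tokens, total tokens drafted, total tokens accepted)."""
    output: List[str] = []
    drafted = 0
    accepted = 0
    for _ in range(cfg.max_iterations):
        batch = draft(output, cfg.draft_tokens)
        if not batch:
            break
        drafted += len(batch)
        scores = score(output, batch)
        # Assumed rule: keep the longest prefix whose scores all clear the threshold.
        keep = 0
        for s in scores:
            if s < cfg.acceptance_threshold:
                break
            keep += 1
        output.extend(batch[:keep])
        accepted += keep
        if is_done(output):
            break
    return output, drafted, accepted
```

In this sketch a rejected token simply ends the batch and the next iteration drafts again from the accepted prefix. A fuller implementation would likely have the target model supply a replacement token at the rejected position, which is omitted here.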
Pipeline
Multi-stage generation where each stage uses a different model with a specific role.
| Stage Role | Description |
|---|---|
| analyze / code | Initial generation |
| critique / review | Critical review |
| refine | Incorporate feedback |
| summarize | Final synthesis |
Each stage has its own model, system prompt, and temperature.
Pre-configured strategies: code_review (generate → review → refine), writing_pipeline (outline → draft → edit → polish), analysis_pipeline (decompose → research → synthesize)
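As a rough sketch of how a staged pipeline might be driven, the example below runs stages in order and feeds each stage the previous stage's output. The `Stage` class, the `call_model` signature, and the way the previous output is appended to the prompt are all assumptions made for illustration; the framework's actual stage schema may differ.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical model call: (model, system_prompt, temperature, user_text) -> reply
CallModel = Callable[[str, str, float, str], str]


@dataclass
class Stage:
    role: str            # e.g. "analyze", "critique", "refine", "summarize"
    model: str           # each stage has its own model...
    system_prompt: str   # ...its own system prompt...
    temperature: float   # ...and its own temperature


def run_pipeline(stages: List[Stage], task: str, call_model: CallModel) -> List[str]:
    """Run stages in order; each stage sees the task plus the previous output."""
    outputs: List[str] = []
    previous = ""
    for stage in stages:
        if previous:
            user_text = f"{task}\n\nPrevious stage output:\n{previous}"
        else:
            user_text = task
        previous = call_model(
            stage.model, stage.system_prompt, stage.temperature, user_text
        )
        outputs.append(previous)
    return outputs
```

The last element of the returned list is the final output, and the length of the list corresponds to the number of stages that ran.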
Candidate Generation
Generate multiple candidates and select the best using a scoring method.
| Scoring Method | Description |
|---|---|
| majority_vote | Most common answer wins |
| verifier | Separate model scores each candidate |
| length_preference | Prefer longer/shorter responses |
Pre-configured strategies: consensus (multi-model vote), best_of_n (N candidates + verifier), diverse_ensemble (varied models), self_consistency (same model, multiple samples)
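The three scoring methods can be illustrated with small selection functions. This is a sketch under stated assumptions: the function names, the whitespace and case normalization in the vote, the tie-breaking behavior, and the `score` callable standing in for a verifier model are all hypothetical rather than taken from the framework.

```python
from collections import Counter
from typing import Callable, List


def majority_vote(candidates: List[str]) -> str:
    """Most common answer wins; ties go to the answer seen first."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    # Assumed normalization so that " 42 " and "42" count as the same answer.
    normalized = [c.strip().lower() for c in candidates]
    winner, _count = Counter(normalized).most_common(1)[0]
    return candidates[normalized.index(winner)]


def verifier_select(candidates: List[str], score: Callable[[str], float]) -> str:
    """Pick the candidate the (hypothetical) verifier scores highest."""
    return max(candidates, key=score)


def length_preference(candidates: List[str], prefer_longer: bool = True) -> str:
    """Pick the longest or shortest candidate by character count."""
    if prefer_longer:
        return max(candidates, key=len)
    return min(candidates, key=len)
```

Majority voting suits tasks with short, comparable answers, while a verifier is more useful for free-form output where exact matches are rare.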
Result Structure
DraftResult contains:
| Field | Type | Description |
|---|---|---|
| content | string | Final output |
| strategy | string | Strategy name |
| status | DraftStatus | "complete" or "failed" |
| draft_tokens | int | Tokens drafted |
| accepted_tokens | int | Tokens accepted (speculative) |
| models_used | list[string] | All models involved |
| stages_completed | int | Pipeline stages run |
| candidates_generated | int | Candidates produced |
| estimated_cost | float | Estimated USD cost |
| total_time_ms | float | Elapsed time in milliseconds |
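For orientation, the fields above could be modeled along these lines. This is a stand-in built on a plain dataclass; the real `DraftResult` schema may use a different base class, and the default values and the `acceptance_rate` helper are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class DraftStatus(str, Enum):
    COMPLETE = "complete"
    FAILED = "failed"


@dataclass
class DraftResult:
    content: str
    strategy: str
    status: DraftStatus
    # Defaults below are assumptions; counters not used by a strategy stay at zero.
    draft_tokens: int = 0
    accepted_tokens: int = 0
    models_used: List[str] = field(default_factory=list)
    stages_completed: int = 0
    candidates_generated: int = 0
    estimated_cost: float = 0.0   # USD
    total_time_ms: float = 0.0


def acceptance_rate(result: DraftResult) -> float:
    """Hypothetical helper: fraction of drafted tokens that were accepted."""
    if result.draft_tokens == 0:
        return 0.0
    return result.accepted_tokens / result.draft_tokens
```

A derived metric such as the acceptance rate can help judge whether a draft model is a good match for its target in a speculative strategy.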
Task Defaults
The defaults section in drafting_strategies.yaml maps task types to strategies:
| Task | Strategy |
|---|---|
| general | fast_accurate |
| code | code_review |
| writing | writing_pipeline |
| analysis | analysis_pipeline |
| consensus | consensus |
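A lookup over this mapping might look like the sketch below. The dictionary mirrors the table; the `strategy_for` function name and the fallback to the general entry for unknown task types are assumptions, not documented behavior.

```python
# Mirrors the task-to-strategy table from the defaults section.
TASK_DEFAULTS = {
    "general": "fast_accurate",
    "code": "code_review",
    "writing": "writing_pipeline",
    "analysis": "analysis_pipeline",
    "consensus": "consensus",
}


def strategy_for(task_type: str) -> str:
    """Return the default strategy name for a task type.

    Falling back to the "general" entry for unknown task types is an assumption.
    """
    return TASK_DEFAULTS.get(task_type, TASK_DEFAULTS["general"])
```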
Related
- Providers — Model providers used by drafting
- API Models: DraftResult — Result schema
- Config file: api/agentx_ai/drafting/drafting_strategies.yaml