Automatic Prompt Engineer (APE) Framework Design
Overview
The Automatic Prompt Engineer (APE) framework implements advanced prompting techniques to automatically generate, evaluate, and optimize prompts for better performance across MCP ADR Analysis Server tools. This framework maintains the 100% prompt-driven architecture while providing intelligent prompt optimization capabilities.
Core Concept
The APE framework works through five stages (see the pipeline sketch after this list):
- Candidate Generation: Generate multiple prompt candidates for a given task
- Evaluation: Evaluate prompt effectiveness using scoring mechanisms
- Selection: Select the best-performing prompts based on evaluation results
- Optimization: Iteratively improve prompts through feedback loops
- Caching: Cache optimized prompts for reuse and performance
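Read end to end, these stages form a single optimization pass. The sketch below illustrates that flow under the assumption that each stage is its own function; `apePipeline`, `generateCandidates`, `scorePrompt`, and the in-memory `Map` cache are illustrative names, not the framework's actual API.

```typescript
// Illustrative single pass through the APE pipeline; all names here are assumptions.
async function apePipeline(basePrompt: string, cache: Map<string, string>): Promise<string> {
  const cached = cache.get(basePrompt);
  if (cached) return cached; // Caching: reuse a previously optimized prompt

  const candidates = await generateCandidates(basePrompt); // Candidate Generation
  const scored = await Promise.all(
    candidates.map(async c => ({ prompt: c, score: await scorePrompt(c) })) // Evaluation
  );
  scored.sort((a, b) => b.score - a.score); // Selection: highest score first
  const best = scored[0]?.prompt ?? basePrompt;

  cache.set(basePrompt, best); // Caching for later reuse
  return best;
}

// Stand-in implementations so the sketch is self-contained.
async function generateCandidates(prompt: string): Promise<string[]> {
  return [prompt, `${prompt}\n\nBe specific and list concrete steps.`];
}
async function scorePrompt(prompt: string): Promise<number> {
  return Math.min(1, prompt.length / 200); // placeholder heuristic, not a real evaluator
}
```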
Research Foundation
Based on Zhou et al. (2022) "Large Language Models Are Human-Level Prompt Engineers":
- Instruction Generation: Treat prompt creation as natural language synthesis
- Black-box Optimization: Use LLMs to generate and search candidate solutions
- Evaluation-Driven Selection: Select prompts based on computed evaluation scores
- Iterative Improvement: Continuously refine prompts through feedback
Architecture Integration
Existing Components Integration
- PromptObject Interface: APE generates optimized PromptObject instances
- Prompt Composition: Uses existing
combinePrompts()and composition utilities - Cache System: Leverages prompt-driven cache for storing optimized prompts
- MCP Tools: Integrates with existing tool structure for prompt optimization
Framework Components
```
APE Framework
├── Candidate Generator (Generate prompt variations)
├── Evaluation Engine (Score prompt effectiveness)
├── Selection Algorithm (Choose best prompts)
├── Optimization Loop (Iterative improvement)
├── Performance Tracker (Monitor optimization metrics)
└── Cache Manager (Store and retrieve optimized prompts)
```
Core APE Components
1. Prompt Candidate Generation
Purpose: Generate multiple prompt variations for optimization
Strategies (see the sketch after this list):
- Template-based Generation: Use predefined templates with variations
- Semantic Variation: Generate semantically similar but structurally different prompts
- Style Variation: Vary prompt style (formal, conversational, technical)
- Length Variation: Generate short, medium, and long prompt versions
- Structure Variation: Different prompt organization patterns
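A minimal sketch of how a candidate generator might fan a base prompt out across these strategies. `generateCandidates` and `applyStrategy` are hypothetical helpers, and the local `Candidate` shape is a simplified stand-in for the `PromptCandidate` interface defined later in this document.

```typescript
import { randomUUID } from 'crypto';

// Simplified local shapes; the full interfaces appear in the "Core APE Types" section below.
type Strategy =
  | 'template-variation'
  | 'semantic-variation'
  | 'style-variation'
  | 'length-variation'
  | 'structure-variation';

interface Candidate {
  id: string;
  prompt: string;
  generationStrategy: Strategy;
}

// Hypothetical generator: produce one batch of candidates per strategy.
function generateCandidates(basePrompt: string, strategies: Strategy[], perStrategy: number): Candidate[] {
  const candidates: Candidate[] = [];
  for (const strategy of strategies) {
    for (let i = 0; i < perStrategy; i++) {
      candidates.push({
        id: randomUUID(),
        prompt: applyStrategy(basePrompt, strategy, i),
        generationStrategy: strategy,
      });
    }
  }
  return candidates;
}

// Placeholder rewrite; a real implementation would use templates or an LLM call per strategy.
function applyStrategy(prompt: string, strategy: Strategy, variant: number): string {
  return `[${strategy}, variant ${variant}] ${prompt}`;
}
```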
2. Prompt Evaluation Engine
Purpose: Score prompt effectiveness using multiple criteria
Evaluation Criteria (a weighted-scoring sketch follows this list):
- Task Completion: How well the prompt achieves the intended task
- Clarity: How clear and unambiguous the prompt is
- Specificity: How specific and actionable the prompt is
- Robustness: How well the prompt handles edge cases
- Efficiency: How concise yet comprehensive the prompt is
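One plausible way to turn these criteria into a single score is a weighted sum, as sketched below. The weights and the `combineScores` helper are illustrative assumptions; how the per-criterion `scores` are produced is left out.

```typescript
// Simplified local types; the full EvaluationResult interface appears later in this document.
type Criterion = 'task-completion' | 'clarity' | 'specificity' | 'robustness' | 'efficiency';

interface ScoredCandidate {
  candidateId: string;
  scores: Record<Criterion, number>; // each criterion scored in [0, 1]
  overallScore: number;
}

// Illustrative weights; real weights would be tuned per tool or per task.
const WEIGHTS: Record<Criterion, number> = {
  'task-completion': 0.35,
  clarity: 0.2,
  specificity: 0.2,
  robustness: 0.15,
  efficiency: 0.1,
};

function combineScores(candidateId: string, scores: Record<Criterion, number>): ScoredCandidate {
  const overallScore = (Object.keys(WEIGHTS) as Criterion[]).reduce(
    (sum, criterion) => sum + WEIGHTS[criterion] * (scores[criterion] ?? 0),
    0
  );
  return { candidateId, scores, overallScore };
}
```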
3. Selection Algorithm
Purpose: Choose the best prompts from the generated candidates
Selection Methods (see the sketch after this list):
- Score-based Selection: Select highest-scoring prompts
- Multi-criteria Selection: Balance multiple evaluation criteria
- Ensemble Selection: Combine multiple good prompts
- Context-aware Selection: Choose prompts based on specific contexts
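The sketch below contrasts plain score-based selection with a simple multi-criteria variant that enforces a minimum score on every criterion. The `selectBest` helper and the 0.5 floor are illustrative, not part of the framework's API.

```typescript
// Assumes the scored-candidate shape sketched above (per-criterion scores plus an overall score).
interface ScoredCandidate {
  candidateId: string;
  scores: Record<string, number>;
  overallScore: number;
}

function selectBest(
  results: ScoredCandidate[],
  strategy: 'highest-score' | 'multi-criteria',
  minPerCriterion = 0.5
): ScoredCandidate | undefined {
  if (strategy === 'highest-score') {
    // Score-based selection: take the single highest overall score.
    return [...results].sort((a, b) => b.overallScore - a.overallScore)[0];
  }
  // Multi-criteria selection: require a floor on every criterion, then rank by overall score.
  return [...results]
    .filter(r => Object.values(r.scores).every(score => score >= minPerCriterion))
    .sort((a, b) => b.overallScore - a.overallScore)[0];
}
```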
4. Optimization Loop
Purpose: Iteratively improve prompts through feedback
Optimization Process (sketched in code after this list):
- Feedback Collection: Gather performance feedback from prompt usage
- Pattern Analysis: Identify successful prompt patterns
- Refinement: Generate improved prompt candidates
- Validation: Test refined prompts against evaluation criteria
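A hedged sketch of the loop: each round generates candidates seeded by the current best prompt, validates them against the evaluator, and keeps any improvement. `refinePrompt`, `generateAround`, and `evaluate` are stand-in names; the placeholder heuristic exists only to make the sketch self-contained.

```typescript
// Hypothetical refinement loop seeded by the current best prompt.
async function refinePrompt(basePrompt: string, rounds: number): Promise<string> {
  let best = basePrompt;
  let bestScore = await evaluate(best);

  for (let round = 0; round < rounds; round++) {
    const candidates = await generateAround(best); // Refinement: vary the current best prompt
    for (const candidate of candidates) {
      const score = await evaluate(candidate); // Validation against evaluation criteria
      if (score > bestScore) {
        best = candidate;
        bestScore = score;
      }
    }
    // Feedback collection and pattern analysis would bias the next round's generation.
  }
  return best;
}

// Stand-ins so the sketch compiles; real versions would call the generator and evaluator.
async function generateAround(seed: string): Promise<string[]> {
  return [`${seed} Respond with a numbered list.`, `${seed} Cite the relevant files.`];
}
async function evaluate(prompt: string): Promise<number> {
  return Math.min(1, prompt.length / 300); // placeholder heuristic
}
```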
APE Framework Interfaces
Core APE Types
```typescript
export interface APEConfig {
  candidateCount: number; // Number of candidates to generate
  evaluationCriteria: EvaluationCriterion[];
  optimizationRounds: number; // Number of optimization iterations
  selectionStrategy: SelectionStrategy;
  cacheEnabled: boolean;
  performanceTracking: boolean;
}

export interface PromptCandidate {
  id: string;
  prompt: string;
  instructions: string;
  context: any;
  generationStrategy: string;
  metadata: CandidateMetadata;
}

export interface EvaluationResult {
  candidateId: string;
  scores: Record<string, number>; // Criterion -> Score mapping
  overallScore: number;
  feedback: string[];
  evaluationTime: number;
}

export interface OptimizationResult {
  optimizedPrompt: PromptObject;
  originalPrompt: PromptObject;
  improvementScore: number;
  optimizationRounds: number;
  candidatesEvaluated: number;
  cacheKey: string;
  metadata: OptimizationMetadata;
}
```
Generation Strategies
```typescript
export type GenerationStrategy =
  | 'template-variation'
  | 'semantic-variation'
  | 'style-variation'
  | 'length-variation'
  | 'structure-variation'
  | 'hybrid-approach';

export type EvaluationCriterion =
  | 'task-completion'
  | 'clarity'
  | 'specificity'
  | 'robustness'
  | 'efficiency'
  | 'context-awareness';

export type SelectionStrategy =
  | 'highest-score'
  | 'multi-criteria'
  | 'ensemble'
  | 'context-aware'
  | 'balanced';
```
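Assuming the interfaces and types above are in scope, a concrete configuration might look like the following; the values are example defaults, not mandated settings.

```typescript
// Illustrative configuration; tune the values per tool and per project.
const apeConfig: APEConfig = {
  candidateCount: 5, // five candidates per generation round
  evaluationCriteria: ['task-completion', 'clarity', 'specificity'],
  optimizationRounds: 3, // three generate -> evaluate -> select iterations
  selectionStrategy: 'multi-criteria',
  cacheEnabled: true, // reuse optimized prompts across tool calls
  performanceTracking: true,
};
```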
Integration with MCP Tools
Tool-Specific Optimization
High-Priority Tools for APE Integration:
- generate_adrs_from_prd: Optimize PRD analysis and ADR generation prompts
- suggest_adrs: Optimize ADR suggestion prompts for different contexts
- analyze_project_ecosystem: Optimize analysis prompts for different tech stacks
- generate_research_questions: Optimize research question generation prompts
- incorporate_research: Optimize research integration prompts
Integration Pattern
```typescript
// Example: APE-enhanced tool
export async function generateOptimizedAdrSuggestions(context: any) {
  // Step 1: Get base prompt
  const basePrompt = createAdrSuggestionPrompt(context);

  // Step 2: Apply APE optimization
  const apeResult = await optimizePromptWithAPE(basePrompt, {
    candidateCount: 5,
    evaluationCriteria: ['task-completion', 'specificity', 'clarity'],
    optimizationRounds: 3,
    selectionStrategy: 'multi-criteria',
  });

  // Step 3: Return optimized prompt
  return {
    content: [
      {
        type: 'text',
        text: apeResult.optimizedPrompt.prompt,
      },
    ],
    metadata: {
      apeOptimization: apeResult.metadata,
      improvementScore: apeResult.improvementScore,
    },
  };
}
```
Prompt Candidate Generation Strategies
1. Template-based Variation
```typescript
const templateVariations = [
  "Please {action} the following {subject} by {method}...",
  "Your task is to {action} {subject} using {method}...",
  "I need you to {action} {subject}. Use {method} to...",
  "Can you {action} the {subject}? Apply {method} and...",
];
```
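These templates are expanded by substituting the `{action}`, `{subject}`, and `{method}` slots. The `fillTemplate` helper below is a hypothetical illustration of that step, not an existing utility in the server.

```typescript
// Hypothetical helper: expand a template by substituting its {placeholder} slots.
function fillTemplate(template: string, slots: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match, key: string) => slots[key] ?? match);
}

// Example: expand the first template above for an ADR-suggestion task.
const variant = fillTemplate('Please {action} the following {subject} by {method}...', {
  action: 'analyze',
  subject: 'project structure',
  method: 'reviewing the detected technologies',
});
// => "Please analyze the following project structure by reviewing the detected technologies..."
```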
2. Semantic Variation
- Synonym Replacement: Replace key terms with synonyms
- Phrase Restructuring: Reorganize sentence structure
- Perspective Shifting: Change from imperative to collaborative tone
- Detail Level Adjustment: Add or remove detail levels
3. Style Variation
- Formal Style: Professional, structured language
- Conversational Style: Friendly, approachable language
- Technical Style: Precise, domain-specific terminology
- Instructional Style: Step-by-step, educational approach
Evaluation Mechanisms
1. Automated Evaluation
Metrics (a computation sketch follows this list):
- Prompt Length: Optimal length for clarity vs completeness
- Complexity Score: Readability and comprehension difficulty
- Specificity Index: How specific and actionable the prompt is
- Keyword Density: Presence of important domain keywords
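These metrics are cheap to compute without an LLM call. The sketch below shows one way they might be derived from raw prompt text; the keyword list, verb list, and formulas are assumptions for illustration.

```typescript
// Hypothetical automated metrics; thresholds and word lists are illustrative assumptions.
interface AutomatedMetrics {
  wordCount: number;
  keywordDensity: number; // fraction of words that are domain keywords
  specificityIndex: number; // fraction of sentences containing an actionable verb
}

function computeAutomatedMetrics(prompt: string, domainKeywords: string[]): AutomatedMetrics {
  const words = prompt.toLowerCase().split(/\s+/).filter(Boolean);
  const sentences = prompt.split(/[.!?]/).filter(s => s.trim().length > 0);
  const actionVerbs = ['analyze', 'generate', 'list', 'compare', 'evaluate', 'suggest'];

  const keywordHits = words.filter(w => domainKeywords.includes(w)).length;
  const actionableSentences = sentences.filter(s =>
    actionVerbs.some(verb => s.toLowerCase().includes(verb))
  ).length;

  return {
    wordCount: words.length,
    keywordDensity: words.length ? keywordHits / words.length : 0,
    specificityIndex: sentences.length ? actionableSentences / sentences.length : 0,
  };
}
```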
2. Performance-based Evaluation
Criteria (a tracking sketch follows this list):
- Task Success Rate: How often the prompt achieves the intended outcome
- Response Quality: Quality of AI responses generated by the prompt
- Consistency: Consistency of results across multiple executions
- Error Rate: Frequency of errors or misunderstandings
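Performance-based signals accumulate across executions rather than from the prompt text alone, so they are naturally tracked by a small stateful component. A possible shape is sketched below; what counts as "success" or "error" would come from downstream tool feedback, not from this class itself.

```typescript
// Hypothetical running tally of performance signals for a single prompt.
class PromptPerformanceTracker {
  private runs = 0;
  private successes = 0;
  private errors = 0;

  record(outcome: { success: boolean; error?: boolean }): void {
    this.runs += 1;
    if (outcome.success) this.successes += 1;
    if (outcome.error) this.errors += 1;
  }

  get taskSuccessRate(): number {
    return this.runs ? this.successes / this.runs : 0;
  }

  get errorRate(): number {
    return this.runs ? this.errors / this.runs : 0;
  }
}
```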
3. Context-aware Evaluation
Factors:
- Domain Relevance: How well the prompt fits the architectural domain
- Technology Alignment: Alignment with detected technologies
- Project Context: Suitability for the specific project context
- User Preferences: Alignment with user or team preferences
Optimization Workflow
Phase 1: Candidate Generation (Parallel)
- Template-based Generation: Generate variations using templates
- Semantic Generation: Create semantically similar alternatives
- Style Generation: Produce different style variations
- Structure Generation: Create different organizational patterns (all four strategies can run in parallel, as sketched below)
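Because the four generators are independent, Phase 1 can fan out concurrently and merge the results. The sketch below assumes one function per strategy; the stub implementations are illustrative stand-ins for the real generators.

```typescript
// Hypothetical sketch: run the four Phase 1 strategies concurrently and merge the results.
async function generatePhaseOneCandidates(basePrompt: string): Promise<string[]> {
  const [templates, semantic, style, structure] = await Promise.all([
    generateTemplateVariations(basePrompt),
    generateSemanticVariations(basePrompt),
    generateStyleVariations(basePrompt),
    generateStructureVariations(basePrompt),
  ]);
  return [...templates, ...semantic, ...style, ...structure];
}

// Illustrative stubs; real versions would use templates or LLM calls.
async function generateTemplateVariations(p: string): Promise<string[]> {
  return [`Your task is to ${p}`];
}
async function generateSemanticVariations(p: string): Promise<string[]> {
  return [p.replace('analyze', 'examine')];
}
async function generateStyleVariations(p: string): Promise<string[]> {
  return [`Let's work through this together. ${p}`];
}
async function generateStructureVariations(p: string): Promise<string[]> {
  return [`${p}\n\nRespond in this order: context, options, recommendation.`];
}
```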