APE Framework Implementation Strategy
Overview
This document outlines the detailed implementation strategy for the Automatic Prompt Engineer (APE) framework, including prompt candidate generation algorithms, evaluation mechanisms, and optimization workflows.
Implementation Architecture
Core Components Implementation
1. Prompt Candidate Generator
Purpose: Generate diverse, high-quality prompt candidates using multiple strategies
Implementation Approach:
```typescript
// Pseudo-implementation structure
class PromptCandidateGenerator {
  async generateCandidates(
    basePrompt: PromptObject,
    strategies: GenerationStrategy[],
    count: number
  ): Promise<PromptCandidate[]> {
    // Fan out to each strategy in parallel, then flatten and trim to `count`
    const batches = await Promise.all(
      strategies.map((s) => this.applyStrategy(basePrompt, s, count))
    );
    return batches.flat().slice(0, count);
  }
}
```
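A usage sketch, assuming `GenerationStrategy` is a string union whose values name the strategies listed below (the specific strings and prompt text here are illustrative):
```typescript
// Hypothetical call site; strategy names and prompt fields are illustrative.
const basePrompt: PromptObject = {
  prompt: 'Analyze the codebase architecture.',
  instructions: 'Return a structured analysis.',
};

const generator = new PromptCandidateGenerator();
const candidates = await generator.generateCandidates(
  basePrompt,
  ['template-based', 'semantic', 'style'] as GenerationStrategy[],
  7 // target candidate count
);
```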
Generation Strategies:
Template-based Variation
- Strategy: Use predefined templates with variable substitution (see the sketch after this list)
- Templates:
- Imperative: "Please {action} the {subject} by {method}..."
- Collaborative: "Let's work together to {action} {subject}..."
- Instructional: "To {action} {subject}, follow these steps..."
- Question-based: "How can we {action} {subject} using {method}?"
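A minimal sketch of the substitution step, assuming templates mark slots with `{braces}` as above; the `fillTemplate` helper is illustrative, not part of the framework:
```typescript
// Illustrative slot substitution over the templates listed above.
function fillTemplate(template: string, slots: Record<string, string>): string {
  // Replace each {slot} with its value; unknown slots are left untouched.
  return template.replace(/\{(\w+)\}/g, (match, key: string) => slots[key] ?? match);
}

fillTemplate('Please {action} the {subject} by {method}...', {
  action: 'analyze',
  subject: 'codebase',
  method: 'reviewing module boundaries',
});
// => 'Please analyze the codebase by reviewing module boundaries...'
```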
Semantic Variation
- Synonym Replacement: Replace key terms with domain-appropriate synonyms
- Phrase Restructuring: Reorganize sentence structure while preserving meaning
- Perspective Shifting: Rewrite the prompt from different viewpoints (user, system, expert)
- Abstraction Level: Adjust between high-level and detailed instructions
Style Variation
- Formal Style: "Please conduct a comprehensive analysis..."
- Conversational Style: "Let's take a look at this project and see..."
- Technical Style: "Execute architectural analysis using established patterns..."
- Instructional Style: "Step 1: Analyze the codebase. Step 2: Identify patterns..."
2. Evaluation Engine
Purpose: Score prompt candidates using multiple evaluation criteria
Evaluation Criteria Implementation (a weighted-scoring sketch follows the list):
Task Completion (Weight: 30%)
- Metric: How well the prompt achieves the intended task
- Evaluation: Compare expected vs actual outcomes
- Scoring: 0-1 scale based on task success rate
Clarity (Weight: 25%)
- Metric: How clear and unambiguous the prompt is
- Evaluation: Analyze sentence structure, word choice, organization
- Scoring: Readability scores, ambiguity detection
Specificity (Weight: 20%)
- Metric: How specific and actionable the prompt is
- Evaluation: Count specific instructions, concrete examples
- Scoring: Ratio of specific to general statements
Robustness (Weight: 15%)
- Metric: How well the prompt handles edge cases
- Evaluation: Test with various input scenarios
- Scoring: Success rate across different contexts
Efficiency (Weight: 10%)
- Metric: How concise yet comprehensive the prompt is
- Evaluation: Information density, redundancy analysis
- Scoring: Information per token ratio
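Taken together, the weights above imply a composite score along these lines; a minimal sketch, assuming each candidate already carries a 0-1 score per criterion (names and weights copied from the list above):
```typescript
// 0-1 score per criterion, keyed by criterion name.
type EvaluationScores = Record<string, number>;

// Criterion weights from the list above; they sum to 1.0.
const CRITERION_WEIGHTS: EvaluationScores = {
  'task-completion': 0.3,
  clarity: 0.25,
  specificity: 0.2,
  robustness: 0.15,
  efficiency: 0.1,
};

// Weighted average across all five criteria; missing scores count as 0.
function compositeScore(scores: EvaluationScores): number {
  return Object.entries(CRITERION_WEIGHTS).reduce(
    (sum, [criterion, weight]) => sum + weight * (scores[criterion] ?? 0),
    0
  );
}
```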
3. Selection Algorithm
Purpose: Choose optimal prompts from evaluated candidates
Selection Strategies:
Multi-criteria Selection (Recommended)
```typescript
function selectOptimalPrompts(
  candidates: EvaluationResult[],
  criteria: EvaluationCriterion[],
  weights: Record<EvaluationCriterion, number>
): PromptCandidate[]
```
Algorithm (the Pareto step is sketched after this list):
- Weighted Scoring: Calculate weighted average of all criteria
- Pareto Optimization: Find candidates that are not dominated by others
- Diversity Filtering: Ensure selected prompts are sufficiently different
- Quality Threshold: Filter candidates below minimum quality threshold
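Weighted scoring reuses the composite-score sketch above; the Pareto step is the least obvious, so here is a minimal dominance check. It assumes each evaluated candidate exposes a `scores` map; the `Scored` shape is an assumption, not the framework's `EvaluationResult` type:
```typescript
// Assumed shape: evaluation attaches a 0-1 score per criterion.
interface Scored {
  scores: Record<string, number>;
}

// True if `a` dominates `b`: at least as good on every criterion
// and strictly better on at least one.
function dominates(
  a: Record<string, number>,
  b: Record<string, number>,
  criteria: string[]
): boolean {
  const geqAll = criteria.every((c) => (a[c] ?? 0) >= (b[c] ?? 0));
  const gtSome = criteria.some((c) => (a[c] ?? 0) > (b[c] ?? 0));
  return geqAll && gtSome;
}

// Keep only candidates that no other candidate dominates (the Pareto front).
function paretoFront<T extends Scored>(results: T[], criteria: string[]): T[] {
  return results.filter(
    (r) => !results.some((o) => o !== r && dominates(o.scores, r.scores, criteria))
  );
}
```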
Optimization Workflow
Phase 1: Initial Candidate Generation
Duration: 30-60 seconds
Process:
- Strategy Selection: Choose appropriate generation strategies based on task type
- Parallel Generation: Generate candidates using multiple strategies simultaneously
- Quality Filtering: Remove obviously poor candidates early
- Diversity Checking: Ensure sufficient diversity in the candidate pool (see the similarity sketch below)
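The diversity check can be as simple as a token-overlap comparison; a sketch, where the 0.8 cutoff is an illustrative choice rather than a framework constant:
```typescript
// Jaccard similarity over lowercase word sets, in [0, 1].
function similarity(a: string, b: string): number {
  const setA = new Set(a.toLowerCase().split(/\s+/));
  const setB = new Set(b.toLowerCase().split(/\s+/));
  const intersection = [...setA].filter((w) => setB.has(w)).length;
  const union = new Set([...setA, ...setB]).size;
  return union === 0 ? 0 : intersection / union;
}

// Greedily keep candidates that are not near-duplicates of those already kept.
function filterForDiversity(prompts: string[], maxSimilarity = 0.8): string[] {
  const kept: string[] = [];
  for (const p of prompts) {
    if (kept.every((k) => similarity(p, k) < maxSimilarity)) kept.push(p);
  }
  return kept;
}
```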
Phase 2: Comprehensive Evaluation
Duration: 60-120 seconds
Process:
- Automated Evaluation: Apply all evaluation criteria to each candidate
- Context Matching: Assess how well candidates fit the specific context
- Performance Prediction: Estimate likely performance of each candidate
- Bias Detection: Check for potential biases in prompts
Phase 3: Selection and Refinement
Duration: 30-60 seconds
Process:
- Multi-criteria Selection: Select top candidates using weighted criteria
- Ensemble Creation: Combine strengths of multiple good candidates
- Refinement Generation: Create refined versions of top candidates
- Final Validation: Validate selected prompts against requirements
Phase 4: Optimization Loop (Optional)
Duration: Variable (1-3 iterations)
Process:
- Performance Feedback: Collect feedback from initial prompt usage
- Pattern Analysis: Identify successful patterns in top-performing prompts
- Targeted Generation: Generate new candidates based on successful patterns
- Convergence Check: Determine if further optimization is beneficial (one possible check is sketched below)
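One way the convergence check could be expressed; the epsilon and round cap are illustrative values, not framework constants:
```typescript
// Continue only while the best composite score is still improving
// meaningfully and the round budget (1-3 iterations) remains.
function shouldContinueOptimizing(
  scoreHistory: number[], // best composite score after each round
  maxRounds = 3,
  epsilon = 0.02
): boolean {
  if (scoreHistory.length >= maxRounds) return false; // budget exhausted
  if (scoreHistory.length < 2) return true; // not enough data to judge
  const [prev, curr] = scoreHistory.slice(-2);
  return curr - prev >= epsilon; // stop once gains fall below epsilon
}
```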
Integration with MCP Tools
High-Priority Tool Optimizations
1. ADR Generation Tools
Target Tools: generate_adrs_from_prd, suggest_adrs
Optimization Focus:
- Context Analysis: Better understanding of PRD requirements
- ADR Structure: Optimal ADR format and content organization
- Decision Rationale: Clearer explanation of architectural decisions
APE Configuration:
```typescript
const adrOptimizationConfig: APEConfig = {
  candidateCount: 7,
  evaluationCriteria: ['task-completion', 'specificity', 'clarity', 'robustness'],
  optimizationRounds: 3,
  selectionStrategy: 'multi-criteria',
  cacheEnabled: true,
  performanceTracking: true,
  maxOptimizationTime: 180000, // 3 minutes
  qualityThreshold: 0.7,
  diversityWeight: 0.3
};
```
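For reference, the example above implies an `APEConfig` interface roughly like the following; the field list is taken from the example, but the exact types are an inference, not the authoritative definition:
```typescript
// Inferred from the example config above; not the authoritative definition.
type EvaluationCriterion =
  | 'task-completion'
  | 'clarity'
  | 'specificity'
  | 'robustness'
  | 'efficiency';

interface APEConfig {
  candidateCount: number;
  evaluationCriteria: EvaluationCriterion[];
  optimizationRounds: number;
  selectionStrategy: string; // e.g. 'multi-criteria'
  cacheEnabled: boolean;
  performanceTracking: boolean;
  maxOptimizationTime: number; // milliseconds
  qualityThreshold: number; // 0-1
  diversityWeight: number; // 0-1
}
```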
2. Analysis Tools
Target Tools: analyze_project_ecosystem, get_architectural_context
Optimization Focus:
- Technology Detection: Better identification of technologies and patterns
- Context Extraction: More comprehensive context analysis
- Insight Generation: Deeper architectural insights
3. Research Tools
Target Tools: generate_research_questions, incorporate_research
Optimization Focus:
- Question Quality: More targeted and valuable research questions
- Research Integration: Better integration of research findings
- Knowledge Synthesis: Improved synthesis of multiple research sources
Tool Integration Pattern
```typescript
// Example: APE-enhanced tool implementation
export async function generateOptimizedPrompt(
  toolName: string,
  basePrompt: PromptObject,
  context: any,
  config?: Partial<APEConfig>
): Promise<{ prompt: string; instructions: string; context: any }> {
  // Step 1: Get tool-specific APE configuration
  // (getToolAPEConfig merges per-tool defaults with caller overrides)
  const apeConfig = getToolAPEConfig(toolName, config);

  // Step 2: Generate optimization prompt for AI delegation
  // (getEvaluationDescription supplies per-criterion scoring guidance)
  const optimizationPrompt = `
# Automatic Prompt Engineering Request

Please optimize the following prompt using APE techniques for the ${toolName} tool.

## Original Prompt
\`\`\`
${basePrompt.prompt}
\`\`\`

## Original Instructions
\`\`\`
${basePrompt.instructions}
\`\`\`

## Context
${JSON.stringify(context, null, 2)}

## APE Configuration
- **Candidate Count**: ${apeConfig.candidateCount}
- **Evaluation Criteria**: ${apeConfig.evaluationCriteria.join(', ')}
- **Selection Strategy**: ${apeConfig.selectionStrategy}
- **Quality Threshold**: ${apeConfig.qualityThreshold}

## Optimization Tasks

### Step 1: Generate Prompt Candidates
Create ${apeConfig.candidateCount} prompt variations using these strategies:
1. **Template Variation**: Use different prompt templates
2. **Semantic Variation**: Rephrase while preserving meaning
3. **Style Variation**: Adjust tone and style
4. **Structure Variation**: Reorganize prompt structure
5. **Specificity Variation**: Adjust detail level

### Step 2: Evaluate Candidates
Evaluate each candidate on:
${apeConfig.evaluationCriteria.map(criterion => `- **${criterion}**: Score 0-1 based on ${getEvaluationDescription(criterion)}`).join('\n')}

### Step 3: Select Optimal Prompt
Use ${apeConfig.selectionStrategy} strategy to select the best prompt considering:
- Weighted evaluation scores
- Diversity requirements
- Quality threshold (${apeConfig.qualityThreshold})
- Context appropriateness

## Expected Output Format
\`\`\`json
{
  "optimizedPrompt": {
    "prompt": "optimized prompt text",
    "instructions": "optimized instructions",
    "context": { "optimization_metadata": "..." }
  },
  "optimization": {
    "candidatesGenerated": number,
    "candidatesEvaluated": number,
    "improvementScore": number,
    "optimizationReasoning": "explanation of improvements",
    "evaluationScores": {
      "task-completion": number,
      "clarity": number,
      "specificity": number
    }
  }
}
\`\`\`

## Quality Requirements
- Optimized prompt must score above ${apeConfig.qualityThreshold} on all criteria
- Must maintain original task objectives
- Should improve clarity and effectiveness
- Must be appropriate for the ${toolName} context
`;

  const instructions = `
# APE Optimization Instructions

You must:
1. **Generate Diverse Candidates**: Create varied prompt alternatives using multiple strategies
2. **Evaluate Systematically**: Score each candidate on all specified criteria
3. **Select Optimally**: Choose the best prompt using the specified selection strategy
4. **Validate Quality**: Ensure the optimized prompt meets quality thresholds
5. **Document Improvements**: Explain how the optimized prompt is better

## Success Criteria
- Optimized prompt scores higher than original on evaluation criteria
- Maintains task objectives and context appropriateness
- Provides clear improvement reasoning
- Follows exact JSON output format
`;

  return {
    prompt: optimizationPrompt,
    instructions,
    context: {
      operation: 'ape_optimization',
      toolName,
      originalPrompt: basePrompt,
      apeConfig,
      securityLevel: 'high',
      expectedFormat: 'json'
    }
  };
}
```
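A hypothetical call site, showing how a tool handler might use the function above; the tool name is one of the targets listed earlier, while the context payload and config override are illustrative:
```typescript
// Hypothetical usage; field values are illustrative.
const basePrompt: PromptObject = {
  prompt: 'Suggest ADRs for this project.',
  instructions: 'Return suggested ADRs with rationale for each.',
};

const { prompt, instructions, context } = await generateOptimizedPrompt(
  'suggest_adrs',
  basePrompt,
  { projectPath: './my-project' }, // assumed context shape
  { candidateCount: 5 } // override a single APEConfig field
);
```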
Performance Optimization
Caching Strategy
- Candidate Cache: Cache generated candidates by strategy and context (a key-derivation sketch follows this list)
- Evaluation Cache: Cache evaluation results for reuse
- Optimization Cache: Cache final optimized prompts
- Performance Cache: Cache performance metrics and feedback
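One straightforward way to key these caches; a sketch using Node's built-in `crypto` module, with the payload shape as an assumption (it also assumes the context serializes deterministically):
```typescript
import { createHash } from 'crypto';

// Derive a stable cache key from strategy, prompt, and context so that
// identical requests reuse previously generated candidates and evaluations.
function cacheKey(strategy: string, basePrompt: string, context: unknown): string {
  const payload = JSON.stringify({ strategy, basePrompt, context });
  return createHash('sha256').update(payload).digest('hex');
}
```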
Resource Management
- Parallel Processing: Generate and evaluate candidates in parallel
- Memory Limits: Implement memory usage limits for large optimization tasks
- Time Limits: Set maximum optimization time to prevent runaway processes
- Quality Gates: Stop optimization early if quality threshold is met
Monitoring and Metrics
- Optimization Success Rate: Track percentage of successful optimizations
- Performance Improvement: Measure average improvement scores
- Resource Usage: Monitor CPU, memory, and time usage
- User Satisfaction: Collect feedback on optimized prompts
This implementation strategy provides a comprehensive roadmap for building the APE framework while maintaining the 100% prompt-driven architecture and ensuring optimal performance across MCP tools.