APE Framework Implementation Strategy

Overview

This document outlines the detailed implementation strategy for the Automatic Prompt Engineer (APE) framework, including prompt candidate generation algorithms, evaluation mechanisms, and optimization workflows.

Implementation Architecture

Core Components Implementation

1. Prompt Candidate Generator

Purpose: Generate diverse, high-quality prompt candidates using multiple strategies

Implementation Approach:

// Pseudo-implementation structure
class PromptCandidateGenerator {
  async generateCandidates(
    basePrompt: PromptObject,
    strategies: GenerationStrategy[],
    count: number
  ): Promise<PromptCandidate[]>;
}

Generation Strategies (the template-based strategy is sketched after the list):

Template-based Variation
  • Strategy: Use predefined templates with variable substitution
  • Templates:
    • Imperative: "Please {action} the {subject} by {method}..."
    • Collaborative: "Let's work together to {action} {subject}..."
    • Instructional: "To {action} {subject}, follow these steps..."
    • Question-based: "How can we {action} {subject} using {method}?"
Semantic Variation
  • Synonym Replacement: Replace key terms with domain-appropriate synonyms
  • Phrase Restructuring: Reorganize sentence structure while preserving meaning
  • Perspective Shifting: Rewrite the prompt from a different viewpoint (user, system, or expert)
  • Abstraction Level: Adjust between high-level and detailed instructions
Style Variation
  • Formal Style: "Please conduct a comprehensive analysis..."
  • Conversational Style: "Let's take a look at this project and see..."
  • Technical Style: "Execute architectural analysis using established patterns..."
  • Instructional Style: "Step 1: Analyze the codebase. Step 2: Identify patterns..."
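
As a minimal sketch, the template-based strategy could be implemented with simple placeholder substitution. The template strings, the TemplateVars shape, and fillTemplate below are illustrative assumptions, not the framework's actual API:

// Illustrative sketch of template-based variation; templates and variable
// names are assumptions, not the framework's actual API.
type TemplateVars = { action: string; subject: string; method: string };

const TEMPLATES = [
  'Please {action} the {subject} by {method}.',
  "Let's work together to {action} the {subject}.",
  'To {action} the {subject}, follow these steps using {method}.',
  'How can we {action} the {subject} using {method}?',
];

function fillTemplate(template: string, vars: TemplateVars): string {
  // Substitute each {placeholder} with the matching variable value.
  return template.replace(/\{(\w+)\}/g, (_match, key: string) =>
    (vars as Record<string, string>)[key] ?? `{${key}}`
  );
}

function generateTemplateCandidates(vars: TemplateVars): string[] {
  return TEMPLATES.map((t) => fillTemplate(t, vars));
}

// Example: four candidate phrasings of the same analysis task.
const candidates = generateTemplateCandidates({
  action: 'analyze',
  subject: 'project architecture',
  method: 'established design patterns',
});

In keeping with the prompt-driven architecture, semantic and style variation would more likely be delegated to the AI itself rather than implemented as string manipulation.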

2. Evaluation Engine

Purpose: Score prompt candidates using multiple evaluation criteria

Evaluation Criteria Implementation (a weighted-scoring sketch follows the criteria):

Task Completion (Weight: 30%)
  • Metric: How well the prompt achieves the intended task
  • Evaluation: Compare expected vs actual outcomes
  • Scoring: 0-1 scale based on task success rate
Clarity (Weight: 25%)
  • Metric: How clear and unambiguous the prompt is
  • Evaluation: Analyze sentence structure, word choice, organization
  • Scoring: Readability scores, ambiguity detection
Specificity (Weight: 20%)
  • Metric: How specific and actionable the prompt is
  • Evaluation: Count specific instructions, concrete examples
  • Scoring: Ratio of specific to general statements
Robustness (Weight: 15%)
  • Metric: How well the prompt handles edge cases
  • Evaluation: Test with various input scenarios
  • Scoring: Success rate across different contexts
Efficiency (Weight: 10%)
  • Metric: How concise yet comprehensive the prompt is
  • Evaluation: Information density, redundancy analysis
  • Scoring: Information per token ratio
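
As a sketch, the weighted composite score implied by these criteria could be computed as follows. The Criterion type is an illustrative assumption; the weights are taken directly from the list above:

// Sketch of the weighted composite score; type names are illustrative.
type Criterion =
  | 'task-completion'
  | 'clarity'
  | 'specificity'
  | 'robustness'
  | 'efficiency';

// Weights mirror the criteria above and sum to 1.0.
const WEIGHTS: Record<Criterion, number> = {
  'task-completion': 0.3,
  clarity: 0.25,
  specificity: 0.2,
  robustness: 0.15,
  efficiency: 0.1,
};

// Each criterion is scored on a 0-1 scale, so the composite is also 0-1.
function compositeScore(scores: Record<Criterion, number>): number {
  return (Object.keys(WEIGHTS) as Criterion[]).reduce(
    (sum, c) => sum + WEIGHTS[c] * scores[c],
    0
  );
}

Because every criterion is scored on a 0-1 scale and the weights sum to 1.0, the composite score stays within 0-1, which makes it directly comparable against the quality threshold.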

3. Selection Algorithm

Purpose: Choose optimal prompts from evaluated candidates

Selection Strategies:

function selectOptimalPrompts(
  candidates: EvaluationResult[],
  criteria: EvaluationCriterion[],
  weights: Record<EvaluationCriterion, number>
): PromptCandidate[];

Algorithm (a sketch follows the list):

  1. Weighted Scoring: Calculate weighted average of all criteria
  2. Pareto Optimization: Find candidates that are not dominated by others
  3. Diversity Filtering: Ensure selected prompts are sufficiently different
  4. Quality Threshold: Filter candidates below minimum quality threshold
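
A runnable sketch of steps 1, 3, and 4 (selectTopPrompts simplifies the declared signature above; the composite weightedScore is assumed to be precomputed, the Jaccard word-overlap measure is a crude stand-in for a real semantic-diversity check, and Pareto optimization is omitted for brevity):

// Sketch of the selection pipeline; types and the similarity measure are
// illustrative assumptions, not the framework's actual implementation.
interface EvaluatedCandidate {
  text: string;
  weightedScore: number; // composite score from the evaluation engine, 0-1
}

// Crude lexical (Jaccard word-overlap) similarity as a stand-in for a
// real semantic-diversity measure.
function similarity(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/));
  const wordsB = new Set(b.toLowerCase().split(/\s+/));
  const shared = [...wordsA].filter((w) => wordsB.has(w)).length;
  return shared / new Set([...wordsA, ...wordsB]).size;
}

function selectTopPrompts(
  candidates: EvaluatedCandidate[],
  qualityThreshold = 0.7,
  maxSimilarity = 0.8,
  topK = 3
): EvaluatedCandidate[] {
  const ranked = candidates
    .filter((c) => c.weightedScore >= qualityThreshold) // step 4: quality gate
    .sort((a, b) => b.weightedScore - a.weightedScore); // step 1: best first

  const selected: EvaluatedCandidate[] = [];
  for (const candidate of ranked) {
    // Step 3: skip candidates too similar to an already-selected prompt.
    if (selected.every((s) => similarity(s.text, candidate.text) < maxSimilarity)) {
      selected.push(candidate);
    }
    if (selected.length === topK) break;
  }
  return selected;
}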

Optimization Workflow

Phase 1: Initial Candidate Generation

Duration: 30-60 seconds

Process:

  1. Strategy Selection: Choose appropriate generation strategies based on task type
  2. Parallel Generation: Generate candidates using multiple strategies simultaneously (see the sketch after this list)
  3. Quality Filtering: Remove obviously poor candidates early
  4. Diversity Checking: Ensure sufficient diversity in candidate pool
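
A sketch of the parallel-generation and early-filtering steps, assuming each strategy exposes an async generate function; the Strategy interface and the filter thresholds are illustrative:

// Sketch: run all generation strategies concurrently, then apply a cheap
// early quality filter. The Strategy shape is an assumption.
interface Strategy {
  name: string;
  generate(basePrompt: string, count: number): Promise<string[]>;
}

async function generateInitialPool(
  basePrompt: string,
  strategies: Strategy[],
  perStrategy = 3
): Promise<string[]> {
  // Parallel generation: one batch per strategy, all in flight at once.
  const batches = await Promise.all(
    strategies.map((s) => s.generate(basePrompt, perStrategy))
  );
  // Early quality filter: drop trivially short or unchanged candidates.
  return batches.flat().filter((c) => c.length > 20 && c !== basePrompt);
}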

Phase 2: Comprehensive Evaluation

Duration: 60-120 seconds

Process:

  1. Automated Evaluation: Apply all evaluation criteria to each candidate
  2. Context Matching: Assess how well candidates fit the specific context
  3. Performance Prediction: Estimate likely performance of each candidate
  4. Bias Detection: Check for potential biases in prompts

Phase 3: Selection and Refinement

Duration: 30-60 seconds

Process:

  1. Multi-criteria Selection: Select top candidates using weighted criteria
  2. Ensemble Creation: Combine strengths of multiple good candidates
  3. Refinement Generation: Create refined versions of top candidates
  4. Final Validation: Validate selected prompts against requirements

Phase 4: Optimization Loop (Optional)

Duration: Variable (1-3 iterations)

Process:

  1. Performance Feedback: Collect feedback from initial prompt usage
  2. Pattern Analysis: Identify successful patterns in top-performing prompts
  3. Targeted Generation: Generate new candidates based on successful patterns
  4. Convergence Check: Determine if further optimization is beneficial (see the sketch after this list)
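
The loop's convergence check might look like the following sketch, where optimizeOnce stands in for one full generate-evaluate-select pass; the improvement delta and iteration cap are illustrative choices, not framework constants:

// Sketch of the optional optimization loop with a convergence check.
async function optimizationLoop(
  optimizeOnce: (prompt: string) => Promise<{ prompt: string; score: number }>,
  initial: { prompt: string; score: number },
  maxIterations = 3,
  minImprovement = 0.02
): Promise<{ prompt: string; score: number }> {
  let best = initial;
  for (let i = 0; i < maxIterations; i++) {
    const next = await optimizeOnce(best.prompt);
    // Convergence check: stop when an iteration no longer pays for itself.
    if (next.score - best.score < minImprovement) break;
    best = next;
  }
  return best;
}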

Integration with MCP Tools

High-Priority Tool Optimizations

1. ADR Generation Tools

Target Tools: generate_adrs_from_prd, suggest_adrs

Optimization Focus:

  • Context Analysis: Better understanding of PRD requirements
  • ADR Structure: Optimal ADR format and content organization
  • Decision Rationale: Clearer explanation of architectural decisions

APE Configuration:

const adrOptimizationConfig: APEConfig = {
  candidateCount: 7,
  evaluationCriteria: ['task-completion', 'specificity', 'clarity', 'robustness'],
  optimizationRounds: 3,
  selectionStrategy: 'multi-criteria',
  cacheEnabled: true,
  performanceTracking: true,
  maxOptimizationTime: 180000, // 3 minutes
  qualityThreshold: 0.7,
  diversityWeight: 0.3,
};

2. Analysis Tools

Target Tools: analyze_project_ecosystem, get_architectural_context

Optimization Focus:

  • Technology Detection: Better identification of technologies and patterns
  • Context Extraction: More comprehensive context analysis
  • Insight Generation: Deeper architectural insights

3. Research Tools

Target Tools: generate_research_questions, incorporate_research

Optimization Focus:

  • Question Quality: More targeted and valuable research questions
  • Research Integration: Better integration of research findings
  • Knowledge Synthesis: Improved synthesis of multiple research sources

Tool Integration Pattern

// Example: APE-enhanced tool implementation
export async function generateOptimizedPrompt(
  toolName: string,
  basePrompt: PromptObject,
  context: any,
  config?: Partial<APEConfig>
): Promise<{ prompt: string; instructions: string; context: any }> {
  // Step 1: Get tool-specific APE configuration
  const apeConfig = getToolAPEConfig(toolName, config);

  // Step 2: Generate optimization prompt for AI delegation
  const optimizationPrompt = `
# Automatic Prompt Engineering Request

Please optimize the following prompt using APE techniques for the ${toolName} tool.

## Original Prompt
\`\`\`
${basePrompt.prompt}
\`\`\`

## Original Instructions
\`\`\`
${basePrompt.instructions}
\`\`\`

## Context
${JSON.stringify(context, null, 2)}

## APE Configuration
- **Candidate Count**: ${apeConfig.candidateCount}
- **Evaluation Criteria**: ${apeConfig.evaluationCriteria.join(', ')}
- **Selection Strategy**: ${apeConfig.selectionStrategy}
- **Quality Threshold**: ${apeConfig.qualityThreshold}

## Optimization Tasks

### Step 1: Generate Prompt Candidates
Create ${apeConfig.candidateCount} prompt variations using these strategies:
1. **Template Variation**: Use different prompt templates
2. **Semantic Variation**: Rephrase while preserving meaning
3. **Style Variation**: Adjust tone and style
4. **Structure Variation**: Reorganize prompt structure
5. **Specificity Variation**: Adjust detail level

### Step 2: Evaluate Candidates
Evaluate each candidate on:
${apeConfig.evaluationCriteria.map(criterion => `- **${criterion}**: Score 0-1 based on ${getEvaluationDescription(criterion)}`).join('\n')}

### Step 3: Select Optimal Prompt
Use ${apeConfig.selectionStrategy} strategy to select the best prompt considering:
- Weighted evaluation scores
- Diversity requirements
- Quality threshold (${apeConfig.qualityThreshold})
- Context appropriateness

## Expected Output Format
\`\`\`json
{
  "optimizedPrompt": {
    "prompt": "optimized prompt text",
    "instructions": "optimized instructions",
    "context": { "optimization_metadata": "..." }
  },
  "optimization": {
    "candidatesGenerated": number,
    "candidatesEvaluated": number,
    "improvementScore": number,
    "optimizationReasoning": "explanation of improvements",
    "evaluationScores": {
      "task-completion": number,
      "clarity": number,
      "specificity": number
    }
  }
}
\`\`\`

## Quality Requirements
- Optimized prompt must score above ${apeConfig.qualityThreshold} on all criteria
- Must maintain original task objectives
- Should improve clarity and effectiveness
- Must be appropriate for the ${toolName} context
`;

  const instructions = `
# APE Optimization Instructions

You must:
1. **Generate Diverse Candidates**: Create varied prompt alternatives using multiple strategies
2. **Evaluate Systematically**: Score each candidate on all specified criteria
3. **Select Optimally**: Choose the best prompt using the specified selection strategy
4. **Validate Quality**: Ensure the optimized prompt meets quality thresholds
5. **Document Improvements**: Explain how the optimized prompt is better

## Success Criteria
- Optimized prompt scores higher than original on evaluation criteria
- Maintains task objectives and context appropriateness
- Provides clear improvement reasoning
- Follows exact JSON output format
`;

  return {
    prompt: optimizationPrompt,
    instructions,
    context: {
      operation: 'ape_optimization',
      toolName,
      originalPrompt: basePrompt,
      apeConfig,
      securityLevel: 'high',
      expectedFormat: 'json',
    },
  };
}
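
A hypothetical call site for this pattern; the tool name, prompt text, and context values below are placeholders for illustration:

// Hypothetical usage; all values are placeholders.
const delegation = await generateOptimizedPrompt(
  'suggest_adrs',
  {
    prompt: 'Suggest ADRs for the detected architectural decisions.',
    instructions: 'Return one ADR suggestion per decision.',
  } as PromptObject,
  { projectType: 'node-service' },
  { candidateCount: 5, qualityThreshold: 0.75 }
);
// delegation.prompt is then handed to the AI, which performs the actual
// candidate generation, evaluation, and selection.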

Performance Optimization

Caching Strategy

  1. Candidate Cache: Cache generated candidates by strategy and context
  2. Evaluation Cache: Cache evaluation results for reuse
  3. Optimization Cache: Cache final optimized prompts (a minimal cache sketch follows the list)
  4. Performance Cache: Cache performance metrics and feedback
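
A minimal in-memory sketch of the optimization cache, assuming a Node runtime; the key derivation (tool name plus a SHA-256 digest of the prompt) and the TTL are illustrative choices:

import { createHash } from 'node:crypto';

// Minimal in-memory cache sketch; key scheme and TTL are illustrative.
const optimizationCache = new Map<string, { value: string; expiresAt: number }>();

function cacheKey(toolName: string, promptText: string): string {
  // Key by tool plus a digest of the prompt so unrelated prompts never collide.
  const digest = createHash('sha256').update(promptText).digest('hex');
  return `${toolName}:${digest}`;
}

function getCached(key: string): string | undefined {
  const entry = optimizationCache.get(key);
  if (!entry || Date.now() > entry.expiresAt) return undefined; // expired or absent
  return entry.value;
}

function setCached(key: string, value: string, ttlMs = 60 * 60 * 1000): void {
  optimizationCache.set(key, { value, expiresAt: Date.now() + ttlMs });
}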

Resource Management

  • Parallel Processing: Generate and evaluate candidates in parallel
  • Memory Limits: Implement memory usage limits for large optimization tasks
  • Time Limits: Set maximum optimization time to prevent runaway processes (sketched after this list)
  • Quality Gates: Stop optimization early if quality threshold is met
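
A sketch of the time-limit mechanism using Promise.race; the fallback behavior (returning the original prompt on timeout) is an assumption consistent with maxOptimizationTime in the ADR configuration above:

// Sketch of a hard time limit; timeout value and fallback are illustrative.
async function withTimeLimit<T>(
  work: Promise<T>,
  maxMs: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), maxMs);
  });
  // Whichever settles first wins; the slower branch is abandoned, not cancelled.
  const result = await Promise.race([work, timeout]);
  clearTimeout(timer);
  return result;
}

// Example: cap an optimization run at 3 minutes (matching the 180000 ms
// maxOptimizationTime above), falling back to the unoptimized prompt.
// const prompt = await withTimeLimit(optimize(basePrompt), 180000, basePrompt);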

Monitoring and Metrics

  • Optimization Success Rate: Track percentage of successful optimizations (a metrics-recorder sketch follows the list)
  • Performance Improvement: Measure average improvement scores
  • Resource Usage: Monitor CPU, memory, and time usage
  • User Satisfaction: Collect feedback on optimized prompts
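
A sketch of a simple recorder for the first three signals; the field names and derived rate are illustrative assumptions:

// Sketch of a metrics recorder; field names are illustrative.
interface OptimizationMetrics {
  attempts: number;
  successes: number;
  totalImprovement: number; // sum of (optimized - original) composite scores
  totalDurationMs: number;
}

const metrics: OptimizationMetrics = {
  attempts: 0,
  successes: 0,
  totalImprovement: 0,
  totalDurationMs: 0,
};

function recordRun(succeeded: boolean, improvement: number, durationMs: number): void {
  metrics.attempts += 1;
  metrics.totalDurationMs += durationMs;
  if (succeeded) {
    metrics.successes += 1;
    metrics.totalImprovement += improvement;
  }
}

function successRate(): number {
  return metrics.attempts === 0 ? 0 : metrics.successes / metrics.attempts;
}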

This implementation strategy provides a comprehensive roadmap for building the APE framework while maintaining the 100% prompt-driven architecture and ensuring optimal performance across MCP tools.