APE Framework Implementation Strategy

Overview

This document outlines the detailed implementation strategy for the Automatic Prompt Engineer (APE) framework, including prompt candidate generation algorithms, evaluation mechanisms, and optimization workflows.

Implementation Architecture

Core Components Implementation

1. Prompt Candidate Generator

Purpose: Generate diverse, high-quality prompt candidates using multiple strategies

Implementation Approach:

// Pseudo-implementation structure
class PromptCandidateGenerator {
  async generateCandidates(
    basePrompt: PromptObject,
    strategies: GenerationStrategy[],
    count: number
  ): Promise<PromptCandidate[]>;
}

Generation Strategies (the template-based strategy is sketched after the list):

Template-based Variation
  • Strategy: Use predefined templates with variable substitution
  • Templates:
    • Imperative: "Please {action} the {subject} by {method}..."
    • Collaborative: "Let's work together to {action} {subject}..."
    • Instructional: "To {action} {subject}, follow these steps..."
    • Question-based: "How can we {action} {subject} using {method}?"
Semantic Variation
  • Synonym Replacement: Replace key terms with domain-appropriate synonyms
  • Phrase Restructuring: Reorganize sentence structure while preserving meaning
  • Perspective Shifting: Rewrite the prompt from a different viewpoint (user, system, or expert)
  • Abstraction Level: Adjust between high-level and detailed instructions
Style Variation
  • Formal Style: "Please conduct a comprehensive analysis..."
  • Conversational Style: "Let's take a look at this project and see..."
  • Technical Style: "Execute architectural analysis using established patterns..."
  • Instructional Style: "Step 1: Analyze the codebase. Step 2: Identify patterns..."
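
As a minimal sketch, the template-based strategy could be implemented with simple placeholder substitution. The template strings, the TemplateVars shape, and fillTemplate below are illustrative assumptions, not the framework's actual API:

// Illustrative sketch of template-based variation; templates and variable
// names are assumptions, not the framework's actual API.
type TemplateVars = { action: string; subject: string; method: string };

const TEMPLATES = [
  'Please {action} the {subject} by {method}.',
  "Let's work together to {action} the {subject}.",
  'To {action} the {subject}, follow these steps using {method}.',
  'How can we {action} the {subject} using {method}?',
];

function fillTemplate(template: string, vars: TemplateVars): string {
  // Substitute each {placeholder} with the matching variable value.
  return template.replace(/\{(\w+)\}/g, (_match, key: string) =>
    (vars as Record<string, string>)[key] ?? `{${key}}`
  );
}

function generateTemplateCandidates(vars: TemplateVars): string[] {
  return TEMPLATES.map((t) => fillTemplate(t, vars));
}

// Example: four candidate phrasings of the same analysis task.
const candidates = generateTemplateCandidates({
  action: 'analyze',
  subject: 'project architecture',
  method: 'established design patterns',
});

In keeping with the prompt-driven architecture, semantic and style variation would more likely be delegated to the AI itself rather than implemented as string manipulation.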

2. Evaluation Engine

Purpose: Score prompt candidates using multiple evaluation criteria

Evaluation Criteria Implementation (a weighted-scoring sketch follows the criteria):

Task Completion (Weight: 30%)
  • Metric: How well the prompt achieves the intended task
  • Evaluation: Compare expected vs actual outcomes
  • Scoring: 0-1 scale based on task success rate
Clarity (Weight: 25%)
  • Metric: How clear and unambiguous the prompt is
  • Evaluation: Analyze sentence structure, word choice, organization
  • Scoring: Readability scores, ambiguity detection
Specificity (Weight: 20%)
  • Metric: How specific and actionable the prompt is
  • Evaluation: Count specific instructions, concrete examples
  • Scoring: Ratio of specific to general statements
Robustness (Weight: 15%)
  • Metric: How well the prompt handles edge cases
  • Evaluation: Test with various input scenarios
  • Scoring: Success rate across different contexts
Efficiency (Weight: 10%)
  • Metric: How concise yet comprehensive the prompt is
  • Evaluation: Information density, redundancy analysis
  • Scoring: Information per token ratio
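
As a sketch, the weighted composite score implied by these criteria could be computed as follows. The Criterion type is an illustrative assumption; the weights are taken directly from the list above:

// Sketch of the weighted composite score; type names are illustrative.
type Criterion =
  | 'task-completion'
  | 'clarity'
  | 'specificity'
  | 'robustness'
  | 'efficiency';

// Weights mirror the criteria above and sum to 1.0.
const WEIGHTS: Record<Criterion, number> = {
  'task-completion': 0.3,
  clarity: 0.25,
  specificity: 0.2,
  robustness: 0.15,
  efficiency: 0.1,
};

// Each criterion is scored on a 0-1 scale, so the composite is also 0-1.
function compositeScore(scores: Record<Criterion, number>): number {
  return (Object.keys(WEIGHTS) as Criterion[]).reduce(
    (sum, c) => sum + WEIGHTS[c] * scores[c],
    0
  );
}

Because every criterion is scored on a 0-1 scale and the weights sum to 1.0, the composite score stays within 0-1, which makes it directly comparable against the quality threshold.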

3. Selection Algorithm

Purpose: Choose optimal prompts from evaluated candidates

Selection Strategies:

function selectOptimalPrompts(
  candidates: EvaluationResult[],
  criteria: EvaluationCriterion[],
  weights: Record<EvaluationCriterion, number>
): PromptCandidate[];

Algorithm (a sketch follows the list):

  1. Weighted Scoring: Calculate weighted average of all criteria
  2. Pareto Optimization: Find candidates that are not dominated by others
  3. Diversity Filtering: Ensure selected prompts are sufficiently different
  4. Quality Threshold: Filter candidates below minimum quality threshold
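
A runnable sketch of steps 1, 3, and 4 (selectTopPrompts simplifies the declared signature above; the composite weightedScore is assumed to be precomputed, the Jaccard word-overlap measure is a crude stand-in for a real semantic-diversity check, and Pareto optimization is omitted for brevity):

// Sketch of the selection pipeline; types and the similarity measure are
// illustrative assumptions, not the framework's actual implementation.
interface EvaluatedCandidate {
  text: string;
  weightedScore: number; // composite score from the evaluation engine, 0-1
}

// Crude lexical (Jaccard word-overlap) similarity as a stand-in for a
// real semantic-diversity measure.
function similarity(a: string, b: string): number {
  const wordsA = new Set(a.toLowerCase().split(/\s+/));
  const wordsB = new Set(b.toLowerCase().split(/\s+/));
  const shared = [...wordsA].filter((w) => wordsB.has(w)).length;
  return shared / new Set([...wordsA, ...wordsB]).size;
}

function selectTopPrompts(
  candidates: EvaluatedCandidate[],
  qualityThreshold = 0.7,
  maxSimilarity = 0.8,
  topK = 3
): EvaluatedCandidate[] {
  const ranked = candidates
    .filter((c) => c.weightedScore >= qualityThreshold) // step 4: quality gate
    .sort((a, b) => b.weightedScore - a.weightedScore); // step 1: best first

  const selected: EvaluatedCandidate[] = [];
  for (const candidate of ranked) {
    // Step 3: skip candidates too similar to an already-selected prompt.
    if (selected.every((s) => similarity(s.text, candidate.text) < maxSimilarity)) {
      selected.push(candidate);
    }
    if (selected.length === topK) break;
  }
  return selected;
}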

Optimization Workflow

Phase 1: Initial Candidate Generation

Duration: 30-60 seconds

Process:

  1. Strategy Selection: Choose appropriate generation strategies based on task type
  2. Parallel Generation: Generate candidates using multiple strategies simultaneously (see the sketch after this list)
  3. Quality Filtering: Remove obviously poor candidates early
  4. Diversity Checking: Ensure sufficient diversity in candidate pool
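
A sketch of the parallel-generation and early-filtering steps, assuming each strategy exposes an async generate function; the Strategy interface and the filter thresholds are illustrative:

// Sketch: run all generation strategies concurrently, then apply a cheap
// early quality filter. The Strategy shape is an assumption.
interface Strategy {
  name: string;
  generate(basePrompt: string, count: number): Promise<string[]>;
}

async function generateInitialPool(
  basePrompt: string,
  strategies: Strategy[],
  perStrategy = 3
): Promise<string[]> {
  // Parallel generation: one batch per strategy, all in flight at once.
  const batches = await Promise.all(
    strategies.map((s) => s.generate(basePrompt, perStrategy))
  );
  // Early quality filter: drop trivially short or unchanged candidates.
  return batches.flat().filter((c) => c.length > 20 && c !== basePrompt);
}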

Phase 2: Comprehensive Evaluation

Duration: 60-120 seconds

Process:

  1. Automated Evaluation: Apply all evaluation criteria to each candidate
  2. Context Matching: Assess how well candidates fit the specific context
  3. Performance Prediction: Estimate likely performance of each candidate
  4. Bias Detection: Check for potential biases in prompts

Phase 3: Selection and Refinement

Duration: 30-60 seconds

Process:

  1. Multi-criteria Selection: Select top candidates using weighted criteria
  2. Ensemble Creation: Combine strengths of multiple good candidates
  3. Refinement Generation: Create refined versions of top candidates
  4. Final Validation: Validate selected prompts against requirements

Phase 4: Optimization Loop (Optional)

Duration: Variable (1-3 iterations)

Process:

  1. Performance Feedback: Collect feedback from initial prompt usage
  2. Pattern Analysis: Identify successful patterns in top-performing prompts
  3. Targeted Generation: Generate new candidates based on successful patterns
  4. Convergence Check: Determine if further optimization is beneficial (see the sketch after this list)
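
The loop's convergence check might look like the following sketch, where optimizeOnce stands in for one full generate-evaluate-select pass; the improvement delta and iteration cap are illustrative choices, not framework constants:

// Sketch of the optional optimization loop with a convergence check.
async function optimizationLoop(
  optimizeOnce: (prompt: string) => Promise<{ prompt: string; score: number }>,
  initial: { prompt: string; score: number },
  maxIterations = 3,
  minImprovement = 0.02
): Promise<{ prompt: string; score: number }> {
  let best = initial;
  for (let i = 0; i < maxIterations; i++) {
    const next = await optimizeOnce(best.prompt);
    // Convergence check: stop when an iteration no longer pays for itself.
    if (next.score - best.score < minImprovement) break;
    best = next;
  }
  return best;
}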

Integration with MCP Tools

High-Priority Tool Optimizations

1. ADR Generation Tools

Target Tools: generate_adrs_from_prd, suggest_adrs

Optimization Focus:

  • Context Analysis: Better understanding of PRD requirements
  • ADR Structure: Optimal ADR format and content organization
  • Decision Rationale: Clearer explanation of architectural decisions

APE Configuration:

const adrOptimizationConfig: APEConfig = {
  candidateCount: 7,
  evaluationCriteria: ['task-completion', 'specificity', 'clarity', 'robustness'],
  optimizationRounds: 3,
  selectionStrategy: 'multi-criteria',
  cacheEnabled: true,
  performanceTracking: true,
  maxOptimizationTime: 180000, // 3 minutes
  qualityThreshold: 0.7,
  diversityWeight: 0.3,
};

2. Analysis Tools

Target Tools: analyze_project_ecosystem, get_architectural_context

Optimization Focus:

  • Technology Detection: Better identification of technologies and patterns
  • Context Extraction: More comprehensive context analysis
  • Insight Generation: Deeper architectural insights

3. Research Tools

Target Tools: generate_research_questions, incorporate_research

Optimization Focus:

  • Question Quality: More targeted and valuable research questions
  • Research Integration: Better integration of research findings
  • Knowledge Synthesis: Improved synthesis of multiple research sources

Tool Integration Pattern

// Example: APE-enhanced tool implementation
export async function generateOptimizedPrompt(
  toolName: string,
  basePrompt: PromptObject,
  context: any,
  config?: Partial<APEConfig>
): Promise<{ prompt: string; instructions: string; context: any }> {
  // Step 1: Get tool-specific APE configuration
  const apeConfig = getToolAPEConfig(toolName, config);

  // Step 2: Generate optimization prompt for AI delegation
  const optimizationPrompt = `
# Automatic Prompt Engineering Request

Please optimize the following prompt using APE techniques for the ${toolName} tool.

## Original Prompt
\`\`\`
${basePrompt.prompt}
\`\`\`

## Original Instructions
\`\`\`
${basePrompt.instructions}
\`\`\`

## Context
${JSON.stringify(context, null, 2)}

## APE Configuration
- **Candidate Count**: ${apeConfig.candidateCount}
- **Evaluation Criteria**: ${apeConfig.evaluationCriteria.join(', ')}
- **Selection Strategy**: ${apeConfig.selectionStrategy}
- **Quality Threshold**: ${apeConfig.qualityThreshold}

## Optimization Tasks

### Step 1: Generate Prompt Candidates
Create ${apeConfig.candidateCount} prompt variations using these strategies:
1. **Template Variation**: Use different prompt templates
2. **Semantic Variation**: Rephrase while preserving meaning
3. **Style Variation**: Adjust tone and style
4. **Structure Variation**: Reorganize prompt structure
5. **Specificity Variation**: Adjust detail level

### Step 2: Evaluate Candidates
Evaluate each candidate on:
${apeConfig.evaluationCriteria.map(criterion => `- **${criterion}**: Score 0-1 based on ${getEvaluationDescription(criterion)}`).join('\n')}

### Step 3: Select Optimal Prompt
Use ${apeConfig.selectionStrategy} strategy to select the best prompt considering:
- Weighted evaluation scores
- Diversity requirements
- Quality threshold (${apeConfig.qualityThreshold})
- Context appropriateness

## Expected Output Format
\`\`\`json
{
  "optimizedPrompt": {
    "prompt": "optimized prompt text",
    "instructions": "optimized instructions",
    "context": { "optimization_metadata": "..." }
  },
  "optimization": {
    "candidatesGenerated": number,
    "candidatesEvaluated": number,
    "improvementScore": number,
    "optimizationReasoning": "explanation of improvements",
    "evaluationScores": {
      "task-completion": number,
      "clarity": number,
      "specificity": number
    }
  }
}
\`\`\`

## Quality Requirements
- Optimized prompt must score above ${apeConfig.qualityThreshold} on all criteria
- Must maintain original task objectives
- Should improve clarity and effectiveness
- Must be appropriate for the ${toolName} context
`;

  const instructions = `
# APE Optimization Instructions

You must:
1. **Generate Diverse Candidates**: Create varied prompt alternatives using multiple strategies
2. **Evaluate Systematically**: Score each candidate on all specified criteria
3. **Select Optimally**: Choose the best prompt using the specified selection strategy
4. **Validate Quality**: Ensure the optimized prompt meets quality thresholds
5. **Document Improvements**: Explain how the optimized prompt is better

## Success Criteria
- Optimized prompt scores higher than original on evaluation criteria
- Maintains task objectives and context appropriateness
- Provides clear improvement reasoning
- Follows exact JSON output format
`;

  return {
    prompt: optimizationPrompt,
    instructions,
    context: {
      operation: 'ape_optimization',
      toolName,
      originalPrompt: basePrompt,
      apeConfig,
      securityLevel: 'high',
      expectedFormat: 'json',
    },
  };
}
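
A hypothetical call site for this pattern; the tool name, prompt text, and context values below are placeholders for illustration:

// Hypothetical usage; all values are placeholders.
const delegation = await generateOptimizedPrompt(
  'suggest_adrs',
  {
    prompt: 'Suggest ADRs for the detected architectural decisions.',
    instructions: 'Return one ADR suggestion per decision.',
  } as PromptObject,
  { projectType: 'node-service' },
  { candidateCount: 5, qualityThreshold: 0.75 }
);
// delegation.prompt is then handed to the AI, which performs the actual
// candidate generation, evaluation, and selection.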

Performance Optimization

Caching Strategy

  1. Candidate Cache: Cache generated candidates by strategy and context
  2. Evaluation Cache: Cache evaluation results for reuse
  3. Optimization Cache: Cache final optimized prompts (a minimal cache sketch follows the list)
  4. Performance Cache: Cache performance metrics and feedback
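
A minimal in-memory sketch of the optimization cache, assuming a Node runtime; the key derivation (tool name plus a SHA-256 digest of the prompt) and the TTL are illustrative choices:

import { createHash } from 'node:crypto';

// Minimal in-memory cache sketch; key scheme and TTL are illustrative.
const optimizationCache = new Map<string, { value: string; expiresAt: number }>();

function cacheKey(toolName: string, promptText: string): string {
  // Key by tool plus a digest of the prompt so unrelated prompts never collide.
  const digest = createHash('sha256').update(promptText).digest('hex');
  return `${toolName}:${digest}`;
}

function getCached(key: string): string | undefined {
  const entry = optimizationCache.get(key);
  if (!entry || Date.now() > entry.expiresAt) return undefined; // expired or absent
  return entry.value;
}

function setCached(key: string, value: string, ttlMs = 60 * 60 * 1000): void {
  optimizationCache.set(key, { value, expiresAt: Date.now() + ttlMs });
}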

Resource Management

  • Parallel Processing: Generate and evaluate candidates in parallel
  • Memory Limits: Implement memory usage limits for large optimization tasks
  • Time Limits: Set maximum optimization time to prevent runaway processes (sketched after this list)
  • Quality Gates: Stop optimization early if quality threshold is met
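
A sketch of the time-limit mechanism using Promise.race; the fallback behavior (returning the original prompt on timeout) is an assumption consistent with maxOptimizationTime in the ADR configuration above:

// Sketch of a hard time limit; timeout value and fallback are illustrative.
async function withTimeLimit<T>(
  work: Promise<T>,
  maxMs: number,
  fallback: T
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), maxMs);
  });
  // Whichever settles first wins; the slower branch is abandoned, not cancelled.
  const result = await Promise.race([work, timeout]);
  clearTimeout(timer);
  return result;
}

// Example: cap an optimization run at 3 minutes (matching the 180000 ms
// maxOptimizationTime above), falling back to the unoptimized prompt.
// const prompt = await withTimeLimit(optimize(basePrompt), 180000, basePrompt);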

Monitoring and Metrics

  • Optimization Success Rate: Track percentage of successful optimizations (a metrics-recorder sketch follows the list)
  • Performance Improvement: Measure average improvement scores
  • Resource Usage: Monitor CPU, memory, and time usage
  • User Satisfaction: Collect feedback on optimized prompts
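
A sketch of a simple recorder for the first three signals; the field names and derived rate are illustrative assumptions:

// Sketch of a metrics recorder; field names are illustrative.
interface OptimizationMetrics {
  attempts: number;
  successes: number;
  totalImprovement: number; // sum of (optimized - original) composite scores
  totalDurationMs: number;
}

const metrics: OptimizationMetrics = {
  attempts: 0,
  successes: 0,
  totalImprovement: 0,
  totalDurationMs: 0,
};

function recordRun(succeeded: boolean, improvement: number, durationMs: number): void {
  metrics.attempts += 1;
  metrics.totalDurationMs += durationMs;
  if (succeeded) {
    metrics.successes += 1;
    metrics.totalImprovement += improvement;
  }
}

function successRate(): number {
  return metrics.attempts === 0 ? 0 : metrics.successes / metrics.attempts;
}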

This implementation strategy provides a comprehensive roadmap for building the APE framework while maintaining the 100% prompt-driven architecture and ensuring optimal performance across MCP tools.