
Reflexion Framework Design

Overview

The Reflexion framework implements the Actor-Evaluator-Self-Reflection pattern to enable MCP ADR Analysis Server tools to learn from mistakes through linguistic feedback and self-reflection. This framework maintains the 100% prompt-driven architecture while providing continuous learning and improvement capabilities.

Core Concept

The Reflexion framework is built from five cooperating parts:

  1. Actor: Executes tasks and generates outputs based on observations
  2. Evaluator: Scores and evaluates the Actor's performance
  3. Self-Reflection: Generates linguistic feedback for improvement
  4. Memory: Stores lessons learned for future reference
  5. Iteration: Continuously improves through feedback loops
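The five parts above can be sketched as a single loop. This is a minimal illustration, not the server's implementation: `runReflexionLoop`, `Evaluation`, `LoopResult`, `maxAttempts`, and `passScore` are hypothetical names, and the actor, evaluator, and reflector are passed in as plain synchronous functions to keep the sketch self-contained (in the server they would be prompt-driven and asynchronous).

```typescript
// Hypothetical shapes for this sketch; not the server's actual types.
interface Evaluation {
  score: number; // 0-1 scale
  feedback: string;
}

interface LoopResult {
  output: string;
  score: number;
  attempts: number;
  memory: string[];
}

function runReflexionLoop(
  act: (task: string, memory: readonly string[]) => string,
  evaluate: (task: string, output: string) => Evaluation,
  reflect: (output: string, evaluation: Evaluation) => string,
  task: string,
  maxAttempts = 3, // assumed default
  passScore = 0.8 // assumed default
): LoopResult {
  const memory: string[] = [];
  let output = '';
  let score = 0;
  let attempts = 0;

  while (attempts < maxAttempts) {
    attempts += 1;
    output = act(task, memory); // Actor: execute with lessons so far
    const evaluation = evaluate(task, output); // Evaluator: score the attempt
    score = evaluation.score;
    if (score >= passScore) break; // good enough, stop iterating
    memory.push(reflect(output, evaluation)); // Self-Reflection feeds Memory
  }

  return { output, score, attempts, memory };
}
```

The loop stops either when the evaluator's score reaches the threshold or when the attempt budget is spent, so a task that never passes still terminates with its accumulated reflections.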

Research Foundation

Based on Shinn et al. (2023) "Reflexion: Language Agents with Verbal Reinforcement Learning":

  • Verbal Reinforcement: Converts feedback into linguistic self-reflection
  • Episodic Memory: Stores experiences and lessons learned
  • Iterative Improvement: Rapidly learns from prior mistakes
  • No Fine-tuning Required: Uses existing LLM capabilities without model updates

Architecture Integration

Existing Components Integration

  • PromptObject Interface: Reflexion generates enhanced prompts with memory context
  • File System Utilities: Uses existing prompt-driven file operations for memory persistence
  • Research Integration: Leverages research utilities for feedback analysis
  • Cache System: Stores reflection memories and learning outcomes

Framework Components

Reflexion Framework
├── Actor (Task execution with memory context)
├── Evaluator (Performance assessment and scoring)
├── Self-Reflection (Linguistic feedback generation)
├── Memory Manager (Episodic and long-term memory)
├── Learning Tracker (Progress and improvement monitoring)
└── Integration Layer (MCP tool integration utilities)

Core Reflexion Components

1. Actor Component

Purpose: Execute tasks with memory-enhanced context

Responsibilities:

  • Task Execution: Perform assigned tasks using current knowledge
  • Memory Integration: Incorporate past lessons into current actions
  • Context Awareness: Consider previous failures and successes
  • Trajectory Generation: Create detailed execution paths for evaluation

2. Evaluator Component

Purpose: Assess Actor performance and provide feedback

Evaluation Criteria:

  • Task Success: Did the Actor achieve the intended outcome?
  • Quality Assessment: How well was the task executed?
  • Efficiency Analysis: Was the approach optimal?
  • Error Detection: What mistakes were made?
  • Improvement Potential: Where can performance be enhanced?

3. Self-Reflection Component

Purpose: Generate linguistic feedback for continuous improvement

Reflection Types:

  • Success Analysis: What worked well and why?
  • Failure Analysis: What went wrong and how to fix it?
  • Pattern Recognition: What patterns emerge from multiple attempts?
  • Strategy Refinement: How can approaches be improved?
  • Knowledge Gaps: What knowledge is missing or incomplete?

4. Memory Manager

Purpose: Store and retrieve lessons learned and experiences

Memory Types:

  • Episodic Memory: Specific task attempts and outcomes
  • Semantic Memory: General lessons and principles learned
  • Procedural Memory: Improved methods and approaches
  • Meta-Memory: Knowledge about what has been learned

Reflexion Framework Interfaces

Core Reflexion Types

export interface ReflexionConfig {
  memoryEnabled: boolean;
  maxMemoryEntries: number;
  reflectionDepth: 'basic' | 'detailed' | 'comprehensive';
  evaluationCriteria: EvaluationCriterion[];
  learningRate: number; // How quickly to adapt (0-1)
  memoryRetention: number; // How long to keep memories (days)
  feedbackIntegration: boolean; // Enable external feedback
}

export interface TaskAttempt {
  attemptId: string;
  taskType: string;
  context: any;
  action: string;
  outcome: TaskOutcome;
  evaluation: EvaluationResult;
  reflection: SelfReflection;
  timestamp: string;
  metadata: AttemptMetadata;
}

export interface TaskOutcome {
  success: boolean;
  result: any;
  errors: string[];
  warnings: string[];
  executionTime: number;
  resourcesUsed: ResourceUsage;
}

export interface EvaluationResult {
  overallScore: number; // 0-1 scale
  criteriaScores: Record<string, number>;
  feedback: string[];
  strengths: string[];
  weaknesses: string[];
  improvementAreas: string[];
  confidence: number; // 0-1 scale
}

export interface SelfReflection {
  reflectionText: string;
  lessonsLearned: string[];
  actionableInsights: string[];
  futureStrategies: string[];
  knowledgeGaps: string[];
  confidenceLevel: number; // 0-1 scale
  applicability: string[]; // Where these lessons apply
}
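The numeric fields of `ReflexionConfig` carry ranges only in comments, so a caller may want to check them before use. The following is a small sketch under stated assumptions: `ReflexionSettings` is a trimmed-down, hypothetical copy of the config (it omits `evaluationCriteria` and `feedbackIntegration`), `validateReflexionSettings` is not part of the framework, and the lower bounds on `maxMemoryEntries` and `memoryRetention` are guesses rather than documented rules.

```typescript
// Trimmed-down, hypothetical copy of ReflexionConfig for this sketch.
interface ReflexionSettings {
  memoryEnabled: boolean;
  maxMemoryEntries: number;
  reflectionDepth: 'basic' | 'detailed' | 'comprehensive';
  learningRate: number; // 0-1, per the interface comment
  memoryRetention: number; // days, per the interface comment
}

// Returns a list of human-readable problems; an empty list means valid.
function validateReflexionSettings(settings: ReflexionSettings): string[] {
  const problems: string[] = [];

  if (!(settings.learningRate >= 0 && settings.learningRate <= 1)) {
    problems.push('learningRate must be between 0 and 1');
  }
  // Assumption: at least one memory entry must be allowed.
  if (!Number.isInteger(settings.maxMemoryEntries) || settings.maxMemoryEntries < 1) {
    problems.push('maxMemoryEntries must be a positive integer');
  }
  // Assumption: retention is a strictly positive number of days.
  if (!(settings.memoryRetention > 0)) {
    problems.push('memoryRetention must be a positive number of days');
  }

  return problems;
}
```

Returning a list of problems instead of throwing lets a tool report every misconfigured field at once.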

Memory System Interfaces

export interface ReflexionMemory {
  memoryId: string;
  memoryType: MemoryType;
  content: MemoryContent;
  relevanceScore: number; // 0-1 scale
  accessCount: number;
  lastAccessed: string;
  createdAt: string;
  expiresAt?: string;
  tags: string[];
  metadata: MemoryMetadata;
}

export type MemoryType =
  | 'episodic' // Specific experiences
  | 'semantic' // General knowledge
  | 'procedural' // Methods and approaches
  | 'meta' // Learning about learning
  | 'feedback'; // External feedback

export interface MemoryContent {
  summary: string;
  details: string;
  context: any;
  lessons: string[];
  applicableScenarios: string[];
  relatedMemories: string[];
  evidence: string[];
}

export interface LearningProgress {
  taskType: string;
  totalAttempts: number;
  successRate: number; // 0-1 scale
  averageScore: number; // 0-1 scale
  improvementTrend: number; // -1 to 1 (declining to improving)
  lastImprovement: string;
  keyLessons: string[];
  persistentIssues: string[];
  nextFocusAreas: string[];
}
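To illustrate how the `relevanceScore`, `expiresAt`, and `tags` fields might work together during retrieval, here is a minimal in-memory sketch. `MemoryRecord` and `selectRelevantMemories` are hypothetical, and matching by tag overlap is an assumption; the framework itself delegates retrieval to prompts, and its actual matching logic is not specified here.

```typescript
// Hypothetical subset of ReflexionMemory used only for this sketch.
interface MemoryRecord {
  memoryId: string;
  relevanceScore: number; // 0-1 scale
  expiresAt?: string; // ISO timestamp
  tags: string[];
}

function selectRelevantMemories(
  memories: readonly MemoryRecord[],
  queryTags: readonly string[],
  limit: number,
  now: Date = new Date()
): MemoryRecord[] {
  const wanted = new Set(queryTags);
  return memories
    // Drop memories whose expiry has passed.
    .filter(m => m.expiresAt === undefined || new Date(m.expiresAt).getTime() > now.getTime())
    // Keep memories sharing at least one tag with the query (assumed rule).
    .filter(m => m.tags.some(tag => wanted.has(tag)))
    // Highest relevance first; filter() returned a copy, so the input is untouched.
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, limit);
}
```

Passing `now` explicitly keeps the function deterministic, which makes expiry behavior easy to test.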

Integration with MCP Tools

Tool-Specific Reflexion Patterns

1. ADR Generation Tools

Learning Focus:

  • Decision Quality: Learn from ADR adoption outcomes
  • Context Analysis: Improve understanding of project requirements
  • Stakeholder Alignment: Learn from feedback on ADR clarity and relevance

Reflexion Pattern:

// Example: ADR suggestion with reflexion
export async function suggestAdrsWithReflexion(context: any) {
  // Step 1: Retrieve relevant memories
  const relevantMemories = await retrieveRelevantMemories('adr-suggestion', context);

  // Step 2: Generate memory-enhanced prompt
  const enhancedPrompt = await enhancePromptWithMemory(
    createAdrSuggestionPrompt(context),
    relevantMemories
  );

  // Step 3: Execute task with memory context
  const result = await executeWithReflexion(enhancedPrompt, {
    taskType: 'adr-suggestion',
    context,
    evaluationCriteria: ['relevance', 'clarity', 'feasibility']
  });

  return result;
}
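The example above calls `enhancePromptWithMemory` without showing what it does. One plausible shape, offered only as a sketch, is to append deduplicated lessons to the base prompt text. `PromptLike`, `LessonSource`, and `enhancePromptWithLessons` are hypothetical names; the only detail taken from this document is that prompt objects expose a `prompt` string. The section heading inserted into the prompt is likewise an assumption.

```typescript
// Hypothetical minimal shapes; the real PromptObject and memory types are richer.
interface PromptLike {
  prompt: string;
}

interface LessonSource {
  lessons: string[];
}

function enhancePromptWithLessons(
  base: PromptLike,
  memories: readonly LessonSource[]
): PromptLike {
  // Collect lessons in first-seen order, skipping duplicates.
  const lessons: string[] = [];
  for (const memory of memories) {
    for (const lesson of memory.lessons) {
      if (lessons.indexOf(lesson) === -1) lessons.push(lesson);
    }
  }

  // Nothing learned yet: leave the prompt unchanged.
  if (lessons.length === 0) return base;

  const section = ['Lessons from previous attempts:']
    .concat(lessons.map(lesson => `- ${lesson}`))
    .join('\n');

  // Return a new object so the caller's base prompt is not mutated.
  return { prompt: `${base.prompt}\n\n${section}` };
}
```

Keeping this step a pure string transformation fits the prompt-driven design: the memory context travels inside the prompt rather than in separate runtime state.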

2. Analysis Tools

Learning Focus:

  • Pattern Recognition: Learn from successful technology detection patterns
  • Context Understanding: Improve project context analysis accuracy
  • Insight Generation: Learn from valuable vs. superficial insights

3. Research Tools

Learning Focus:

  • Question Quality: Learn from research question effectiveness
  • Source Evaluation: Improve research source quality assessment
  • Synthesis Skills: Learn from successful research integration patterns

Reflexion Workflow

Phase 1: Memory-Enhanced Execution

Duration: Variable (based on task complexity)

Process:

  1. Memory Retrieval: Find relevant past experiences and lessons
  2. Context Enhancement: Integrate memories into current task context
  3. Strategy Selection: Choose approach based on past learnings
  4. Execution: Perform task with memory-informed strategy

Phase 2: Performance Evaluation

Duration: 30-60 seconds

Process:

  1. Outcome Assessment: Evaluate task success and quality
  2. Criteria Scoring: Score performance against evaluation criteria
  3. Error Analysis: Identify specific mistakes and issues
  4. Strength Recognition: Acknowledge successful aspects
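The criteria-scoring step has to turn per-criterion scores into the single `overallScore` of `EvaluationResult`. How the framework combines them is not stated here, so the sketch below simply assumes an unweighted mean of scores clamped to the 0-1 range; `overallFromCriteria` is a hypothetical helper.

```typescript
// Assumption: overall score is the unweighted mean of clamped criterion scores.
function overallFromCriteria(criteriaScores: Record<string, number>): number {
  const values = Object.keys(criteriaScores).map(name => criteriaScores[name]);
  if (values.length === 0) return 0; // no criteria scored yet

  // Clamp each score into [0, 1] so one out-of-range value cannot skew the result.
  const clamped = values.map(value => Math.min(1, Math.max(0, value)));
  const total = clamped.reduce((sum, value) => sum + value, 0);
  return total / clamped.length;
}
```

A weighted variant would be a natural extension if some criteria, such as feasibility for ADR suggestions, matter more than others.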

Phase 3: Self-Reflection Generation

Duration: 60-120 seconds

Process:

  1. Experience Analysis: Analyze what happened and why
  2. Lesson Extraction: Extract actionable lessons learned
  3. Strategy Refinement: Identify improved approaches
  4. Knowledge Gap Identification: Recognize missing knowledge

Phase 4: Memory Integration

Duration: 30-60 seconds

Process:

  1. Memory Creation: Create new memory entries from experience
  2. Memory Linking: Connect to related existing memories
  3. Memory Consolidation: Strengthen important memories
  4. Memory Cleanup: Remove outdated or irrelevant memories
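The memory-creation step can be pictured as mapping a reflection onto a new memory entry whose expiry follows the `memoryRetention` setting (in days). The sketch below is illustrative only: `ReflectionInput`, `NewMemory`, and `createEpisodicMemory` are hypothetical, the `episodic-` id prefix is invented, and seeding `relevanceScore` from the reflection's confidence is an assumption.

```typescript
// Hypothetical subsets of SelfReflection and ReflexionMemory for this sketch.
interface ReflectionInput {
  reflectionText: string;
  lessonsLearned: string[];
  applicability: string[];
  confidenceLevel: number; // 0-1 scale
}

interface NewMemory {
  memoryId: string;
  memoryType: 'episodic';
  summary: string;
  lessons: string[];
  tags: string[];
  relevanceScore: number;
  createdAt: string;
  expiresAt: string;
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function createEpisodicMemory(
  attemptId: string,
  reflection: ReflectionInput,
  retentionDays: number,
  now: Date = new Date()
): NewMemory {
  const expires = new Date(now.getTime() + retentionDays * MS_PER_DAY);
  return {
    memoryId: `episodic-${attemptId}`, // invented id scheme
    memoryType: 'episodic',
    summary: reflection.reflectionText,
    lessons: reflection.lessonsLearned.slice(), // copy to avoid shared arrays
    tags: reflection.applicability.slice(),
    relevanceScore: reflection.confidenceLevel, // assumption: seed from confidence
    createdAt: now.toISOString(),
    expiresAt: expires.toISOString(),
  };
}
```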

Memory Persistence Strategy

File-Based Memory Storage

Using Existing File System Utilities:

// Memory storage using prompt-driven file operations
export async function persistReflexionMemory(memory: ReflexionMemory) {
  const memoryPrompt = await generateMemoryPersistencePrompt(memory);

  // Delegate to AI for file system operations
  return {
    content: [{
      type: 'text',
      text: memoryPrompt.prompt
    }],
    metadata: {
      operation: 'memory_persistence',
      memoryId: memory.memoryId,
      memoryType: memory.memoryType
    }
  };
}

Memory Organization

  • Directory Structure: ./reflexion-memory/
    • episodic/ - Specific task attempts and outcomes
    • semantic/ - General lessons and principles
    • procedural/ - Improved methods and approaches
    • meta/ - Learning about learning patterns
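A small helper can make the layout above concrete. This is a sketch under assumptions: `memoryFilePath` and `StoredMemoryType` are hypothetical, the one-file-per-memory `.json` naming is a guess based on the JSON format this design mentions, and only the four directories listed above are covered (no directory is listed for the `feedback` memory type, so it is left out here).

```typescript
// Only the four directories listed in the layout above.
type StoredMemoryType = 'episodic' | 'semantic' | 'procedural' | 'meta';

const MEMORY_ROOT = './reflexion-memory';

function memoryFilePath(type: StoredMemoryType, memoryId: string): string {
  // Reject ids containing path separators or dots so a memory id
  // cannot escape its directory (for example '../secrets').
  if (!/^[A-Za-z0-9_-]+$/.test(memoryId)) {
    throw new Error(`Unsafe memory id: ${memoryId}`);
  }
  return `${MEMORY_ROOT}/${type}/${memoryId}.json`; // assumed file naming
}
```

For example, `memoryFilePath('episodic', 'attempt-42')` yields `./reflexion-memory/episodic/attempt-42.json`.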

Memory Formats

  • JSON Format: Structured memory data for programmatic access
  • Markdown Format: Human-readable memory summaries
  • Index Files: Memory catalogs and search indices

Performance Optimization

Memory Management

  • Memory Limits: Maximum number of memories per type
  • Relevance Scoring: Prioritize most relevant memories
  • Automatic Cleanup: Remove outdated or low-value memories
  • Memory Compression: Consolidate similar memories

Learning Efficiency

  • Incremental Learning: Build on previous knowledge gradually
  • Transfer Learning: Apply lessons across similar tasks
  • Meta-Learning: Learn how to learn more effectively
  • Feedback Integration: Incorporate external feedback quickly

Security and Validation

Memory Security

  • Content Validation: Ensure memory content is safe and appropriate
  • Access Control: Control access to sensitive memories
  • Privacy Protection: Protect confidential information in memories
  • Audit Trail: Track memory access and modifications

Learning Validation

  • Progress Verification: Validate that learning is actually occurring
  • Quality Assurance: Ensure lessons learned are accurate and valuable
  • Bias Detection: Identify and correct learning biases
  • Performance Monitoring: Track learning effectiveness over time
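Progress verification needs some measurable signal. As a rough sketch, the numeric fields of `LearningProgress` could be derived from an attempt history as shown below. `ScoredAttempt`, `ProgressSummary`, and `summarizeProgress` are hypothetical, and defining `improvementTrend` as the mean score of the later half minus the mean score of the earlier half is an assumption chosen because it stays within the documented -1 to 1 range.

```typescript
// Hypothetical subsets of TaskAttempt and LearningProgress for this sketch.
interface ScoredAttempt {
  success: boolean;
  overallScore: number; // 0-1 scale
}

interface ProgressSummary {
  totalAttempts: number;
  successRate: number;
  averageScore: number;
  improvementTrend: number; // -1 (declining) to 1 (improving)
}

function mean(values: number[]): number {
  if (values.length === 0) return 0;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Attempts are expected in chronological order, oldest first.
function summarizeProgress(attempts: readonly ScoredAttempt[]): ProgressSummary {
  const scores = attempts.map(attempt => attempt.overallScore);
  const half = Math.floor(scores.length / 2);

  // Assumed trend: later-half mean minus earlier-half mean.
  // With an odd count the middle attempt is ignored; fewer than two attempts give 0.
  const trend =
    half === 0 ? 0 : mean(scores.slice(scores.length - half)) - mean(scores.slice(0, half));

  return {
    totalAttempts: attempts.length,
    successRate: mean(attempts.map(attempt => (attempt.success ? 1 : 0))),
    averageScore: mean(scores),
    improvementTrend: trend,
  };
}
```

A trend near zero over many attempts would suggest that stored lessons are not changing outcomes, which is the situation progress verification is meant to catch.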

Integration Patterns

Pattern 1: Reflexion-Enhanced Tool Execution

// Standard reflexion pattern for any MCP tool (signature only)
export declare function executeWithReflexion(
  basePrompt: PromptObject,
  reflexionConfig: ReflexionConfig
): Promise<ReflexionResult>;

Pattern 2: Memory-Informed Decision Making

// Use past experiences to inform current decisions (signature only)
export declare function makeMemoryInformedDecision(
  context: any,
  taskType: string
): Promise<DecisionResult>;

Pattern 3: Continuous Learning Loop

// Implement continuous learning across multiple task executions (signature only)
export declare function continuousLearningLoop(
  taskSequence: TaskDefinition[]
): Promise<LearningOutcome>;

This Reflexion framework design provides a comprehensive foundation for enabling MCP tools to learn from mistakes and continuously improve through linguistic feedback and self-reflection while maintaining the 100% prompt-driven architecture.