Reflexion Framework Design

Overview

The Reflexion framework implements the Actor-Evaluator-Self-Reflection pattern to enable MCP ADR Analysis Server tools to learn from mistakes through linguistic feedback and self-reflection. This framework maintains the 100% prompt-driven architecture while providing continuous learning and improvement capabilities.

Core Concept

The Reflexion framework works through five cooperating elements:

Actor: Executes tasks and generates outputs based on observations
Evaluator: Scores and evaluates the Actor's performance
Self-Reflection: Generates linguistic feedback for improvement
Memory: Stores lessons learned for future reference
Iteration: Continuously improves through feedback loops
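
The five elements above can be sketched as a single loop. This is a minimal illustration, not the server's implementation: the `Actor`, `Evaluator`, and `Reflector` function types, the `runReflexionLoop` name, and the `maxIterations` and `passThreshold` parameters are all assumptions made for the example. The callbacks are synchronous to keep the sketch short; real tool calls would likely be asynchronous.

```typescript
// Hypothetical types for illustration; not the server's actual API.
interface Attempt {
  output: string;
  score: number; // 0-1 scale, as produced by the evaluator
}

type Actor = (task: string, lessons: string[]) => string;
type Evaluator = (task: string, output: string) => number;
type Reflector = (task: string, output: string, score: number) => string;

function runReflexionLoop(
  task: string,
  actor: Actor,
  evaluator: Evaluator,
  reflector: Reflector,
  maxIterations = 3, // assumed default
  passThreshold = 0.8 // assumed default
): { attempts: Attempt[]; lessons: string[] } {
  const lessons: string[] = []; // Memory: lessons carried between iterations
  const attempts: Attempt[] = [];

  for (let i = 0; i < maxIterations; i++) {
    const output = actor(task, lessons); // Actor: act using lessons so far
    const score = evaluator(task, output); // Evaluator: score the output
    attempts.push({ output, score });
    if (score >= passThreshold) break; // stop once the attempt is good enough
    lessons.push(reflector(task, output, score)); // Self-Reflection feeds Memory
  }
  return { attempts, lessons };
}

// Toy usage: the actor only produces a passing draft once it has a lesson.
const loopResult = runReflexionLoop(
  'suggest-adr',
  (_task, lessons) => (lessons.length > 0 ? 'revised draft' : 'first draft'),
  (_task, output) => (output === 'revised draft' ? 0.9 : 0.4),
  (_task, _output, score) => `Score ${score} was below threshold; add more project context.`
);
```

In the toy run the first attempt fails, one lesson is recorded, and the second attempt passes, so the loop ends after two iterations.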

Research Foundation

Based on Shinn et al. (2023) "Reflexion: Language Agents with Verbal Reinforcement Learning":

Verbal Reinforcement: Converts feedback into linguistic self-reflection
Episodic Memory: Stores experiences and lessons learned
Iterative Improvement: Rapidly learns from prior mistakes
No Fine-tuning Required: Uses existing LLM capabilities without model updates

Architecture Integration

Existing Components Integration

PromptObject Interface: Reflexion generates enhanced prompts with memory context
File System Utilities: Uses existing prompt-driven file operations for memory persistence
Research Integration: Leverages research utilities for feedback analysis
Cache System: Stores reflection memories and learning outcomes

Framework Components

Reflexion Framework
├── Actor (Task execution with memory context)
├── Evaluator (Performance assessment and scoring)
├── Self-Reflection (Linguistic feedback generation)
├── Memory Manager (Episodic and long-term memory)
├── Learning Tracker (Progress and improvement monitoring)
└── Integration Layer (MCP tool integration utilities)

Core Reflexion Components

1. Actor Component

Purpose: Execute tasks with memory-enhanced context

Responsibilities:

Task Execution: Perform assigned tasks using current knowledge
Memory Integration: Incorporate past lessons into current actions
Context Awareness: Consider previous failures and successes
Trajectory Generation: Create detailed execution paths for evaluation

2. Evaluator Component

Purpose: Assess Actor performance and provide feedback

Evaluation Criteria:

Task Success: Did the Actor achieve the intended outcome?
Quality Assessment: How well was the task executed?
Efficiency Analysis: Was the approach optimal?
Error Detection: What mistakes were made?
Improvement Potential: Where can performance be enhanced?

3. Self-Reflection Component

Purpose: Generate linguistic feedback for continuous improvement

Reflection Types:

Success Analysis: What worked well and why?
Failure Analysis: What went wrong and how to fix it?
Pattern Recognition: What patterns emerge from multiple attempts?
Strategy Refinement: How can approaches be improved?
Knowledge Gaps: What knowledge is missing or incomplete?

4. Memory Manager

Purpose: Store and retrieve lessons learned and experiences

Memory Types:

Episodic Memory: Specific task attempts and outcomes
Semantic Memory: General lessons and principles learned
Procedural Memory: Improved methods and approaches
Meta-Memory: Knowledge about what has been learned

Reflexion Framework Interfaces

Core Reflexion Types

export interface ReflexionConfig {
  memoryEnabled: boolean;
  maxMemoryEntries: number;
  reflectionDepth: 'basic' | 'detailed' | 'comprehensive';
  evaluationCriteria: EvaluationCriterion[];
  learningRate: number;              // How quickly to adapt (0-1)
  memoryRetention: number;           // How long to keep memories (days)
  feedbackIntegration: boolean;      // Enable external feedback
}

export interface TaskAttempt {
  attemptId: string;
  taskType: string;
  context: any;
  action: string;
  outcome: TaskOutcome;
  evaluation: EvaluationResult;
  reflection: SelfReflection;
  timestamp: string;
  metadata: AttemptMetadata;
}

export interface TaskOutcome {
  success: boolean;
  result: any;
  errors: string[];
  warnings: string[];
  executionTime: number;
  resourcesUsed: ResourceUsage;
}

export interface EvaluationResult {
  overallScore: number;             // 0-1 scale
  criteriaScores: Record<string, number>;
  feedback: string[];
  strengths: string[];
  weaknesses: string[];
  improvementAreas: string[];
  confidence: number;               // 0-1 scale
}

export interface SelfReflection {
  reflectionText: string;
  lessonsLearned: string[];
  actionableInsights: string[];
  futureStrategies: string[];
  knowledgeGaps: string[];
  confidenceLevel: number;          // 0-1 scale
  applicability: string[];          // Where these lessons apply
}
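
As one possible way to populate `overallScore`, `strengths`, and `weaknesses` from `criteriaScores`, the sketch below takes a weighted mean of the per-criterion scores. The helper name `summarizeEvaluation`, the per-criterion `weights` map, and the `strengthThreshold` cut-off are assumptions for illustration; the design does not specify how the overall score is derived.

```typescript
// Subset of the EvaluationResult interface above, repeated so the sketch is self-contained.
interface EvaluationSummary {
  overallScore: number; // 0-1 scale
  criteriaScores: Record<string, number>;
  strengths: string[];
  weaknesses: string[];
}

// Hypothetical helper: weighted mean of per-criterion scores.
function summarizeEvaluation(
  criteriaScores: Record<string, number>,
  weights: Record<string, number> = {}, // missing criteria default to weight 1
  strengthThreshold = 0.7 // assumed cut-off, not taken from the design
): EvaluationSummary {
  let weightedSum = 0;
  let totalWeight = 0;
  const strengths: string[] = [];
  const weaknesses: string[] = [];

  for (const criterion of Object.keys(criteriaScores)) {
    const score = Math.min(1, Math.max(0, criteriaScores[criterion])); // clamp to 0-1
    const weight = weights[criterion] ?? 1;
    weightedSum += score * weight;
    totalWeight += weight;
    (score >= strengthThreshold ? strengths : weaknesses).push(criterion);
  }

  return {
    overallScore: totalWeight > 0 ? weightedSum / totalWeight : 0,
    criteriaScores,
    strengths,
    weaknesses,
  };
}

// Usage with the three criteria named in the ADR example; relevance counts double.
const adrEvaluation = summarizeEvaluation(
  { relevance: 0.9, clarity: 0.5, feasibility: 0.7 },
  { relevance: 2 }
);
```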

Memory System Interfaces

export interface ReflexionMemory {
  memoryId: string;
  memoryType: MemoryType;
  content: MemoryContent;
  relevanceScore: number;           // 0-1 scale
  accessCount: number;
  lastAccessed: string;
  createdAt: string;
  expiresAt?: string;
  tags: string[];
  metadata: MemoryMetadata;
}

export type MemoryType = 
  | 'episodic'                      // Specific experiences
  | 'semantic'                      // General knowledge
  | 'procedural'                    // Methods and approaches
  | 'meta'                          // Learning about learning
  | 'feedback';                     // External feedback

export interface MemoryContent {
  summary: string;
  details: string;
  context: any;
  lessons: string[];
  applicableScenarios: string[];
  relatedMemories: string[];
  evidence: string[];
}

export interface LearningProgress {
  taskType: string;
  totalAttempts: number;
  successRate: number;              // 0-1 scale
  averageScore: number;             // 0-1 scale
  improvementTrend: number;         // -1 to 1 (declining to improving)
  lastImprovement: string;
  keyLessons: string[];
  persistentIssues: string[];
  nextFocusAreas: string[];
}
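
The numeric fields of `LearningProgress` could be derived from a task type's attempt history along the following lines. This is a sketch under stated assumptions: the design does not define how `improvementTrend` is computed, so the example uses the mean score of the later half of attempts minus the mean of the earlier half, which stays within -1 to 1. The names `ScoredAttempt`, `ProgressSummary`, `meanOf`, and `summarizeProgress` are hypothetical.

```typescript
// Minimal view of a past attempt; hypothetical, derived from TaskOutcome and EvaluationResult.
interface ScoredAttempt {
  success: boolean;
  overallScore: number; // 0-1 scale
}

// Numeric subset of LearningProgress.
interface ProgressSummary {
  totalAttempts: number;
  successRate: number; // 0-1 scale
  averageScore: number; // 0-1 scale
  improvementTrend: number; // -1 to 1
}

function meanOf(values: number[]): number {
  return values.length === 0 ? 0 : values.reduce((sum, v) => sum + v, 0) / values.length;
}

function summarizeProgress(attempts: ScoredAttempt[]): ProgressSummary {
  const scores = attempts.map((a) => a.overallScore); // assumed chronological order
  const half = Math.floor(scores.length / 2);
  // Assumed trend definition: later-half mean minus earlier-half mean.
  const improvementTrend =
    half === 0 ? 0 : meanOf(scores.slice(scores.length - half)) - meanOf(scores.slice(0, half));

  return {
    totalAttempts: attempts.length,
    successRate:
      attempts.length === 0 ? 0 : attempts.filter((a) => a.success).length / attempts.length,
    averageScore: meanOf(scores),
    improvementTrend,
  };
}

const adrProgress = summarizeProgress([
  { success: false, overallScore: 0.25 },
  { success: false, overallScore: 0.5 },
  { success: true, overallScore: 0.75 },
  { success: true, overallScore: 1.0 },
]);
```

With an odd number of attempts the middle attempt is left out of the trend, so it does not count toward either half.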

Integration with MCP Tools

Tool-Specific Reflexion Patterns

1. ADR Generation Tools

Learning Focus:

Decision Quality: Learn from ADR adoption outcomes
Context Analysis: Improve understanding of project requirements
Stakeholder Alignment: Learn from feedback on ADR clarity and relevance

Reflexion Pattern:

// Example: ADR suggestion with reflexion
export async function suggestAdrsWithReflexion(context: any) {
  // Step 1: Retrieve relevant memories
  const relevantMemories = await retrieveRelevantMemories('adr-suggestion', context);
  
  // Step 2: Generate memory-enhanced prompt
  const enhancedPrompt = await enhancePromptWithMemory(
    createAdrSuggestionPrompt(context),
    relevantMemories
  );
  
  // Step 3: Execute task with memory context
  const result = await executeWithReflexion(enhancedPrompt, {
    taskType: 'adr-suggestion',
    context,
    evaluationCriteria: ['relevance', 'clarity', 'feasibility']
  });
  
  return result;
}
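
Step 2 of the example above relies on `enhancePromptWithMemory`, whose body is not shown. One way such a step might work is sketched below as a synchronous helper with a different name, `appendLessonsToPrompt`, so it is not mistaken for the real function. The `PromptLike` and `MemoryLike` shapes, the `maxMemories` limit, and the wording of the appended section are assumptions.

```typescript
// Assumed minimal shapes; the real PromptObject and ReflexionMemory carry more fields.
interface PromptLike {
  prompt: string;
}

interface MemoryLike {
  summary: string;
  lessons: string[];
  relevanceScore: number; // 0-1 scale
}

// Hypothetical helper: append lessons from the most relevant memories to a prompt.
function appendLessonsToPrompt(
  base: PromptLike,
  memories: MemoryLike[],
  maxMemories = 3 // assumed limit to keep the prompt short
): PromptLike {
  const selected = memories
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, maxMemories);

  const lines: string[] = [];
  for (const memory of selected) {
    for (const lesson of memory.lessons) {
      lines.push(`- ${lesson} (from: ${memory.summary})`);
    }
  }
  if (lines.length === 0) return base; // nothing to add

  return {
    ...base,
    prompt: `${base.prompt}\n\nLessons from previous attempts:\n${lines.join('\n')}`,
  };
}

// Usage: only the single most relevant memory is kept.
const enhancedAdrPrompt = appendLessonsToPrompt(
  { prompt: 'Suggest ADRs for this project.' },
  [
    { summary: 'attempt 1', lessons: ['State the decision drivers explicitly.'], relevanceScore: 0.4 },
    { summary: 'attempt 2', lessons: ['Check existing ADRs for overlap first.'], relevanceScore: 0.9 },
  ],
  1
);
```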

2. Analysis Tools

Learning Focus:

Pattern Recognition: Learn from successful technology detection patterns
Context Understanding: Improve project context analysis accuracy
Insight Generation: Learn from valuable vs. superficial insights

3. Research Tools

Learning Focus:

Question Quality: Learn from research question effectiveness
Source Evaluation: Improve research source quality assessment
Synthesis Skills: Learn from successful research integration patterns

Reflexion Workflow

Phase 1: Memory-Enhanced Execution

Duration: Variable (based on task complexity)

Process:

Memory Retrieval: Find relevant past experiences and lessons
Context Enhancement: Integrate memories into current task context
Strategy Selection: Choose approach based on past learnings
Execution: Perform task with memory-informed strategy

Phase 2: Performance Evaluation

Duration: 30-60 seconds

Process:

Outcome Assessment: Evaluate task success and quality
Criteria Scoring: Score performance against evaluation criteria
Error Analysis: Identify specific mistakes and issues
Strength Recognition: Acknowledge successful aspects

Phase 3: Self-Reflection Generation

Duration: 60-120 seconds

Process:

Experience Analysis: Analyze what happened and why
Lesson Extraction: Extract actionable lessons learned
Strategy Refinement: Identify improved approaches
Knowledge Gap Identification: Recognize missing knowledge

Phase 4: Memory Integration

Duration: 30-60 seconds

Process:

Memory Creation: Create new memory entries from experience
Memory Linking: Connect to related existing memories
Memory Consolidation: Strengthen important memories
Memory Cleanup: Remove outdated or irrelevant memories
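
The cleanup step in Phase 4 might look like the following sketch, which drops expired memories and then keeps the highest-relevance entries up to the `maxMemoryEntries` limit from `ReflexionConfig`. The `StoredMemory` shape is a subset of `ReflexionMemory`, and the `cleanupMemories` helper and its keep-by-relevance policy are assumptions; the design leaves the exact cleanup policy open.

```typescript
// Subset of ReflexionMemory needed for cleanup.
interface StoredMemory {
  memoryId: string;
  relevanceScore: number; // 0-1 scale
  expiresAt?: string; // ISO 8601 timestamp; absent means no expiry
}

// Hypothetical cleanup: remove expired entries, then cap the total by relevance.
function cleanupMemories(
  memories: StoredMemory[],
  maxMemoryEntries: number,
  now: Date = new Date()
): StoredMemory[] {
  const unexpired = memories.filter(
    (m) => m.expiresAt === undefined || new Date(m.expiresAt).getTime() > now.getTime()
  );
  // filter() returned a new array, so sorting here does not mutate the input.
  return unexpired
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, Math.max(0, maxMemoryEntries));
}

const keptMemories = cleanupMemories(
  [
    { memoryId: 'a', relevanceScore: 0.9, expiresAt: '2020-01-01T00:00:00Z' },
    { memoryId: 'b', relevanceScore: 0.4 },
    { memoryId: 'c', relevanceScore: 0.8, expiresAt: '2999-01-01T00:00:00Z' },
    { memoryId: 'd', relevanceScore: 0.6 },
  ],
  2,
  new Date('2024-01-01T00:00:00Z')
);
```

Here memory `a` is dropped despite its high relevance because it has expired, and `b` is dropped because the limit of two entries is reached first by `c` and `d`.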

Memory Persistence Strategy

File-Based Memory Storage

Using Existing File System Utilities:

// Memory storage using prompt-driven file operations
export async function persistReflexionMemory(memory: ReflexionMemory) {
  const memoryPrompt = await generateMemoryPersistencePrompt(memory);
  
  // Delegate to AI for file system operations
  return {
    content: [{
      type: 'text',
      text: memoryPrompt.prompt
    }],
    metadata: {
      operation: 'memory_persistence',
      memoryId: memory.memoryId,
      memoryType: memory.memoryType
    }
  };
}

Memory Organization

Directory Structure: ./reflexion-memory/
- episodic/ - Specific task attempts and outcomes
- semantic/ - General lessons and principles
- procedural/ - Improved methods and approaches
- meta/ - Learning about learning patterns
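
A small helper could map a memory to its location in this layout. The sketch below is illustrative only: the `memoryFilePath` name, the `<memoryId>.json` file naming, and the identifier sanitization are assumptions. The `feedback` memory type is left out because the directory structure above does not list a folder for it.

```typescript
// Only the four memory types that have a directory in the layout above.
type StoredMemoryType = 'episodic' | 'semantic' | 'procedural' | 'meta';

// Hypothetical helper: build the JSON file path for a memory entry.
function memoryFilePath(
  memoryType: StoredMemoryType,
  memoryId: string,
  root = './reflexion-memory'
): string {
  // Replace anything outside a conservative character set so the id
  // cannot introduce path separators or other special characters.
  const safeId = memoryId.replace(/[^A-Za-z0-9_-]/g, '_');
  return `${root}/${memoryType}/${safeId}.json`;
}

const episodicPath = memoryFilePath('episodic', 'adr 42/x');
```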

Memory Formats

JSON Format: Structured memory data for programmatic access
Markdown Format: Human-readable memory summaries
Index Files: Memory catalogs and search indices

Performance Optimization

Memory Management

Memory Limits: Maximum number of memories per type
Relevance Scoring: Prioritize most relevant memories
Automatic Cleanup: Remove outdated or low-value memories
Memory Compression: Consolidate similar memories

Learning Efficiency

Incremental Learning: Build on previous knowledge gradually
Transfer Learning: Apply lessons across similar tasks
Meta-Learning: Learn how to learn more effectively
Feedback Integration: Incorporate external feedback quickly

Security and Validation

Memory Security

Content Validation: Ensure memory content is safe and appropriate
Access Control: Control access to sensitive memories
Privacy Protection: Protect confidential information in memories
Audit Trail: Track memory access and modifications

Learning Validation

Progress Verification: Validate that learning is actually occurring
Quality Assurance: Ensure lessons learned are accurate and valuable
Bias Detection: Identify and correct learning biases
Performance Monitoring: Track learning effectiveness over time

Integration Patterns

Pattern 1: Reflexion-Enhanced Tool Execution

// Standard reflexion pattern for any MCP tool
// (signature only; the implementation is not specified in this design)
export declare function executeWithReflexion(
  basePrompt: PromptObject,
  reflexionConfig: ReflexionConfig
): Promise<ReflexionResult>;

Pattern 2: Memory-Informed Decision Making

// Use past experiences to inform current decisions
// (signature only; the implementation is not specified in this design)
export declare function makeMemoryInformedDecision(
  context: any,
  taskType: string
): Promise<DecisionResult>;
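
One simple form of memory-informed decision making is to pick the candidate strategy with the best average score in past attempts. The sketch below assumes a `PastAttempt` record that tags each attempt with the strategy used, and a neutral `defaultScore` for strategies that have never been tried; neither appears in the design, and `chooseStrategy` is a hypothetical name.

```typescript
// Hypothetical record linking a past attempt to the strategy it used.
interface PastAttempt {
  strategy: string;
  overallScore: number; // 0-1 scale
}

// Pick the candidate with the highest average past score.
function chooseStrategy(
  candidates: string[],
  history: PastAttempt[],
  defaultScore = 0.5 // assumed neutral prior for untried strategies
): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;

  for (const candidate of candidates) {
    const scores = history
      .filter((h) => h.strategy === candidate)
      .map((h) => h.overallScore);
    const average =
      scores.length === 0
        ? defaultScore
        : scores.reduce((sum, v) => sum + v, 0) / scores.length;
    if (average > bestScore) {
      best = candidate;
      bestScore = average;
    }
  }
  return best; // undefined only when there are no candidates
}

const chosenStrategy = chooseStrategy(
  ['template-first', 'context-first', 'untried'],
  [
    { strategy: 'template-first', overallScore: 0.25 },
    { strategy: 'template-first', overallScore: 0.5 },
    { strategy: 'context-first', overallScore: 0.75 },
  ]
);
```

Giving untried strategies a neutral default means they can still be chosen over strategies with a poor track record, which leaves some room for exploration.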

Pattern 3: Continuous Learning Loop

// Implement continuous learning across multiple task executions
// (signature only; the implementation is not specified in this design)
export declare function continuousLearningLoop(
  taskSequence: TaskDefinition[]
): Promise<LearningOutcome>;

This Reflexion framework design provides a comprehensive foundation for enabling MCP tools to learn from mistakes and continuously improve through linguistic feedback and self-reflection while maintaining the 100% prompt-driven architecture.
