Reflexion Framework Design
Overview
The Reflexion framework implements the Actor-Evaluator-Self-Reflection pattern to enable MCP ADR Analysis Server tools to learn from mistakes through linguistic feedback and self-reflection. This framework maintains the 100% prompt-driven architecture while providing continuous learning and improvement capabilities.
Core Concept
The Reflexion framework combines five elements:
- Actor: Executes tasks and generates outputs based on observations
- Evaluator: Scores and evaluates the Actor's performance
- Self-Reflection: Generates linguistic feedback for improvement
- Memory: Stores lessons learned for future reference
- Iteration: Continuously improves through feedback loops
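The five elements above can be read as one loop. The following is a minimal sketch, not the server's implementation: `runReflexionLoop`, the `Actor`, `Evaluator`, and `Reflector` signatures, and the `maxAttempts` and `passScore` parameters are all hypothetical names chosen for illustration, and the roles are synchronous only to keep the example short.

```typescript
// Hypothetical role signatures; in a prompt-driven server each role would
// produce or consume prompts rather than call plain functions.
type Actor = (task: string, lessons: string[]) => string;
type Evaluator = (task: string, output: string) => number; // score in [0, 1]
type Reflector = (task: string, output: string, score: number) => string;

interface LoopResult {
  output: string;
  score: number;
  attempts: number;
  lessons: string[];
}

function runReflexionLoop(
  task: string,
  actor: Actor,
  evaluator: Evaluator,
  reflector: Reflector,
  maxAttempts = 3,
  passScore = 0.8
): LoopResult {
  const lessons: string[] = []; // Memory: lessons carried into later attempts
  let output = "";
  let score = 0;
  let attempts = 0;

  while (attempts < maxAttempts) {
    attempts += 1;
    output = actor(task, lessons); // Actor: execute with lessons so far
    score = evaluator(task, output); // Evaluator: score the attempt
    if (score >= passScore) break; // stop once the attempt is good enough
    // Self-Reflection: turn the failed attempt into a verbal lesson
    lessons.push(reflector(task, output, score));
  }
  return { output, score, attempts, lessons };
}
```

Stopping on a score threshold is one possible termination rule; a fixed attempt budget alone would also fit the description.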
Research Foundation
Based on Shinn et al. (2023) "Reflexion: Language Agents with Verbal Reinforcement Learning":
- Verbal Reinforcement: Converts feedback into linguistic self-reflection
- Episodic Memory: Stores experiences and lessons learned
- Iterative Improvement: Rapidly learns from prior mistakes
- No Fine-tuning Required: Uses existing LLM capabilities without model updates
Architecture Integration
Existing Components Integration
- PromptObject Interface: Reflexion generates enhanced prompts with memory context
- File System Utilities: Uses existing prompt-driven file operations for memory persistence
- Research Integration: Leverages research utilities for feedback analysis
- Cache System: Stores reflection memories and learning outcomes
Framework Components
Reflexion Framework
├── Actor (Task execution with memory context)
├── Evaluator (Performance assessment and scoring)
├── Self-Reflection (Linguistic feedback generation)
├── Memory Manager (Episodic and long-term memory)
├── Learning Tracker (Progress and improvement monitoring)
└── Integration Layer (MCP tool integration utilities)
Core Reflexion Components
1. Actor Component
Purpose: Execute tasks with memory-enhanced context
Responsibilities:
- Task Execution: Perform assigned tasks using current knowledge
- Memory Integration: Incorporate past lessons into current actions
- Context Awareness: Consider previous failures and successes
- Trajectory Generation: Create detailed execution paths for evaluation
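Memory integration for the Actor can be as simple as appending past lessons to the base prompt. The sketch below assumes a minimal `PromptLike` shape with a single `prompt` field and a `LessonSource` shape carrying `lessons` and `relevanceScore`; both are illustrative stand-ins, and `appendLessons` is a hypothetical helper rather than part of the framework.

```typescript
// Assumed minimal shapes; the real PromptObject and memory types may differ.
interface PromptLike {
  prompt: string;
}

interface LessonSource {
  lessons: string[];
  relevanceScore: number; // 0-1 scale
}

function appendLessons(
  base: PromptLike,
  memories: LessonSource[],
  maxLessons = 5
): PromptLike {
  // Visit the most relevant memories first so their lessons win the cut-off.
  const ordered = memories.slice().sort((a, b) => b.relevanceScore - a.relevanceScore);
  const unique: string[] = [];
  for (const memory of ordered) {
    for (const lesson of memory.lessons) {
      if (unique.indexOf(lesson) === -1 && unique.length < maxLessons) {
        unique.push(lesson);
      }
    }
  }
  if (unique.length === 0) return base; // nothing to add

  const section = ["Lessons from previous attempts:"]
    .concat(unique.map((lesson, i) => `${i + 1}. ${lesson}`))
    .join("\n");
  return { prompt: `${base.prompt}\n\n${section}` };
}
```

Capping the number of lessons keeps the enhanced prompt from growing without bound as memory accumulates.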
2. Evaluator Component
Purpose: Assess Actor performance and provide feedback
Evaluation Criteria:
- Task Success: Did the Actor achieve the intended outcome?
- Quality Assessment: How well was the task executed?
- Efficiency Analysis: Was the approach optimal?
- Error Detection: What mistakes were made?
- Improvement Potential: Where can performance be enhanced?
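One way to turn per-criterion judgments into a single score is a weighted average. This is a sketch under assumptions: the document does not define `EvaluationCriterion`, so `WeightedCriterion`, `summarizeScores`, and the `weaknessThreshold` parameter are hypothetical.

```typescript
// Hypothetical criterion shape: a name plus a relative weight.
interface WeightedCriterion {
  name: string;
  weight: number;
}

interface ScoreSummary {
  overallScore: number; // 0-1 scale
  criteriaScores: Record<string, number>;
  weaknesses: string[];
}

function summarizeScores(
  criteria: WeightedCriterion[],
  rawScores: Record<string, number>,
  weaknessThreshold = 0.5
): ScoreSummary {
  const clamp = (x: number): number => Math.min(1, Math.max(0, x));
  const criteriaScores: Record<string, number> = {};
  const weaknesses: string[] = [];
  let weightedSum = 0;
  let totalWeight = 0;

  for (const criterion of criteria) {
    // A criterion with no reported score counts as 0.
    const score = clamp(rawScores[criterion.name] ?? 0);
    criteriaScores[criterion.name] = score;
    weightedSum += score * criterion.weight;
    totalWeight += criterion.weight;
    if (score < weaknessThreshold) weaknesses.push(criterion.name);
  }

  const overallScore = totalWeight > 0 ? weightedSum / totalWeight : 0;
  return { overallScore, criteriaScores, weaknesses };
}
```

The `weaknesses` list gives the self-reflection step something concrete to reason about.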
3. Self-Reflection Component
Purpose: Generate linguistic feedback for continuous improvement
Reflection Types:
- Success Analysis: What worked well and why?
- Failure Analysis: What went wrong and how to fix it?
- Pattern Recognition: What patterns emerge from multiple attempts?
- Strategy Refinement: How can approaches be improved?
- Knowledge Gaps: What knowledge is missing or incomplete?
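Because the architecture is prompt-driven, self-reflection amounts to building a prompt that asks the model the questions listed above. The sketch below is illustrative only: `ReflectionInput` and `buildReflectionPrompt` are hypothetical names, and the wording of the prompt is an assumption.

```typescript
// Hypothetical input: a trimmed-down view of one evaluated attempt.
interface ReflectionInput {
  taskType: string;
  action: string;
  success: boolean;
  weaknesses: string[];
}

function buildReflectionPrompt(input: ReflectionInput): string {
  // Choose success analysis or failure analysis based on the outcome.
  const focus = input.success
    ? "Explain what worked well and why."
    : "Explain what went wrong and how to fix it.";
  const weaknessLines =
    input.weaknesses.length > 0
      ? input.weaknesses.map((w) => `- ${w}`).join("\n")
      : "- (none reported)";

  return [
    `Task type: ${input.taskType}`,
    `Action taken: ${input.action}`,
    `Outcome: ${input.success ? "success" : "failure"}`,
    "Weaknesses reported by the evaluator:",
    weaknessLines,
    "",
    focus,
    "Then list lessons learned, strategy refinements, and knowledge gaps.",
  ].join("\n");
}
```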
4. Memory Manager
Purpose: Store and retrieve lessons learned and experiences
Memory Types:
- Episodic Memory: Specific task attempts and outcomes
- Semantic Memory: General lessons and principles learned
- Procedural Memory: Improved methods and approaches
- Meta-Memory: Knowledge about what has been learned
Reflexion Framework Interfaces
Core Reflexion Types
export interface ReflexionConfig {
  memoryEnabled: boolean;
  maxMemoryEntries: number;
  reflectionDepth: 'basic' | 'detailed' | 'comprehensive';
  evaluationCriteria: EvaluationCriterion[];
  learningRate: number; // How quickly to adapt (0-1)
  memoryRetention: number; // How long to keep memories (days)
  feedbackIntegration: boolean; // Enable external feedback
}

export interface TaskAttempt {
  attemptId: string;
  taskType: string;
  context: any;
  action: string;
  outcome: TaskOutcome;
  evaluation: EvaluationResult;
  reflection: SelfReflection;
  timestamp: string;
  metadata: AttemptMetadata;
}

export interface TaskOutcome {
  success: boolean;
  result: any;
  errors: string[];
  warnings: string[];
  executionTime: number;
  resourcesUsed: ResourceUsage;
}

export interface EvaluationResult {
  overallScore: number; // 0-1 scale
  criteriaScores: Record<string, number>;
  feedback: string[];
  strengths: string[];
  weaknesses: string[];
  improvementAreas: string[];
  confidence: number; // 0-1 scale
}

export interface SelfReflection {
  reflectionText: string;
  lessonsLearned: string[];
  actionableInsights: string[];
  futureStrategies: string[];
  knowledgeGaps: string[];
  confidenceLevel: number; // 0-1 scale
  applicability: string[]; // Where these lessons apply
}
Memory System Interfaces
export interface ReflexionMemory {
  memoryId: string;
  memoryType: MemoryType;
  content: MemoryContent;
  relevanceScore: number; // 0-1 scale
  accessCount: number;
  lastAccessed: string;
  createdAt: string;
  expiresAt?: string;
  tags: string[];
  metadata: MemoryMetadata;
}

export type MemoryType =
  | 'episodic' // Specific experiences
  | 'semantic' // General knowledge
  | 'procedural' // Methods and approaches
  | 'meta' // Learning about learning
  | 'feedback'; // External feedback

export interface MemoryContent {
  summary: string;
  details: string;
  context: any;
  lessons: string[];
  applicableScenarios: string[];
  relatedMemories: string[];
  evidence: string[];
}

export interface LearningProgress {
  taskType: string;
  totalAttempts: number;
  successRate: number; // 0-1 scale
  averageScore: number; // 0-1 scale
  improvementTrend: number; // -1 to 1 (declining to improving)
  lastImprovement: string;
  keyLessons: string[];
  persistentIssues: string[];
  nextFocusAreas: string[];
}
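The numeric fields of `LearningProgress` can be derived from a history of attempts. The sketch below shows one possible definition, not the framework's: `AttemptSummary`, `ProgressSnapshot`, and `computeProgress` are hypothetical, and the trend is assumed to be the mean score of the later half of attempts minus the mean of the earlier half, which stays within -1 to 1 when scores are on a 0-1 scale.

```typescript
// Hypothetical trimmed-down view of an attempt, in chronological order.
interface AttemptSummary {
  success: boolean;
  score: number; // evaluator's overall score, 0-1 scale
}

interface ProgressSnapshot {
  totalAttempts: number;
  successRate: number;
  averageScore: number;
  improvementTrend: number;
}

function mean(values: number[]): number {
  return values.length === 0
    ? 0
    : values.reduce((sum, v) => sum + v, 0) / values.length;
}

function computeProgress(attempts: AttemptSummary[]): ProgressSnapshot {
  const totalAttempts = attempts.length;
  const scores = attempts.map((a) => a.score);
  const successes = attempts.filter((a) => a.success).length;

  // Compare the later half against the earlier half; with an odd count the
  // middle attempt is left out of both halves.
  const half = Math.floor(totalAttempts / 2);
  const improvementTrend =
    half === 0
      ? 0
      : mean(scores.slice(totalAttempts - half)) - mean(scores.slice(0, half));

  return {
    totalAttempts,
    successRate: totalAttempts === 0 ? 0 : successes / totalAttempts,
    averageScore: mean(scores),
    improvementTrend,
  };
}
```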
Integration with MCP Tools
Tool-Specific Reflexion Patterns
1. ADR Generation Tools
Learning Focus:
- Decision Quality: Learn from ADR adoption outcomes
- Context Analysis: Improve understanding of project requirements
- Stakeholder Alignment: Learn from feedback on ADR clarity and relevance
Reflexion Pattern:
// Example: ADR suggestion with reflexion
export async function suggestAdrsWithReflexion(context: any) {
  // Step 1: Retrieve relevant memories
  const relevantMemories = await retrieveRelevantMemories('adr-suggestion', context);

  // Step 2: Generate memory-enhanced prompt
  const enhancedPrompt = await enhancePromptWithMemory(
    createAdrSuggestionPrompt(context),
    relevantMemories
  );

  // Step 3: Execute task with memory context
  const result = await executeWithReflexion(enhancedPrompt, {
    taskType: 'adr-suggestion',
    context,
    evaluationCriteria: ['relevance', 'clarity', 'feasibility']
  });

  return result;
}
2. Analysis Tools
Learning Focus:
- Pattern Recognition: Learn from successful technology detection patterns
- Context Understanding: Improve project context analysis accuracy
- Insight Generation: Learn from valuable vs. superficial insights
3. Research Tools
Learning Focus:
- Question Quality: Learn from research question effectiveness
- Source Evaluation: Improve research source quality assessment
- Synthesis Skills: Learn from successful research integration patterns
Reflexion Workflow
Phase 1: Memory-Enhanced Execution
Duration: Variable (based on task complexity)
Process:
- Memory Retrieval: Find relevant past experiences and lessons
- Context Enhancement: Integrate memories into current task context
- Strategy Selection: Choose approach based on past learnings
- Execution: Perform task with memory-informed strategy
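The memory retrieval step needs some notion of relevance. The following sketch ranks memories by tag overlap blended with their stored relevance score. It is an assumption-laden illustration: `TaggedMemory`, `tagOverlap`, `rankMemories`, and the 0.7/0.3 weighting are hypothetical, and the document does not specify how retrieval is actually scored.

```typescript
// Hypothetical subset of the memory record used for ranking.
interface TaggedMemory {
  memoryId: string;
  tags: string[];
  relevanceScore: number; // 0-1 scale
}

// Jaccard similarity between two tag lists: |shared| / |union|.
function tagOverlap(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((tag) => {
    if (setB.has(tag)) shared += 1;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

function rankMemories(
  memories: TaggedMemory[],
  queryTags: string[],
  limit = 3
): TaggedMemory[] {
  return memories
    .map((memory) => ({ memory, overlap: tagOverlap(memory.tags, queryTags) }))
    .filter((entry) => entry.overlap > 0) // drop memories with no shared tag
    .map((entry) => ({
      memory: entry.memory,
      // Assumed blend: mostly tag overlap, partly stored relevance.
      score: 0.7 * entry.overlap + 0.3 * entry.memory.relevanceScore,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((entry) => entry.memory);
}
```

A production version might use embeddings or an index file instead of tags; tag overlap is used here only because the memory interface already carries a `tags` field.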
Phase 2: Performance Evaluation
Duration: 30-60 seconds
Process:
- Outcome Assessment: Evaluate task success and quality
- Criteria Scoring: Score performance against evaluation criteria
- Error Analysis: Identify specific mistakes and issues
- Strength Recognition: Acknowledge successful aspects
Phase 3: Self-Reflection Generation
Duration: 60-120 seconds
Process:
- Experience Analysis: Analyze what happened and why
- Lesson Extraction: Extract actionable lessons learned
- Strategy Refinement: Identify improved approaches
- Knowledge Gap Identification: Recognize missing knowledge
Phase 4: Memory Integration
Duration: 30-60 seconds
Process:
- Memory Creation: Create new memory entries from experience
- Memory Linking: Connect to related existing memories
- Memory Consolidation: Strengthen important memories
- Memory Cleanup: Remove outdated or irrelevant memories
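The cleanup step can be sketched as dropping expired entries and then keeping only the most relevant ones up to a cap. `PrunableMemory` and `pruneMemories` are hypothetical names; the cap is assumed to correspond to a setting such as `maxMemoryEntries`, and ordering by `relevanceScore` is one possible retention policy among several.

```typescript
// Hypothetical subset of the memory record needed for cleanup.
interface PrunableMemory {
  memoryId: string;
  relevanceScore: number; // 0-1 scale
  expiresAt?: string; // ISO 8601 timestamp, absent means no expiry
}

function pruneMemories<T extends PrunableMemory>(
  memories: T[],
  maxEntries: number,
  now: Date = new Date()
): T[] {
  // Keep entries with no expiry or an expiry in the future. An unparseable
  // expiresAt yields NaN, fails the comparison, and is treated as expired.
  const live = memories.filter(
    (m) =>
      m.expiresAt === undefined ||
      new Date(m.expiresAt).getTime() > now.getTime()
  );
  // filter() returned a new array, so sorting does not mutate the input.
  return live
    .sort((a, b) => b.relevanceScore - a.relevanceScore)
    .slice(0, Math.max(0, maxEntries));
}
```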
Memory Persistence Strategy
File-Based Memory Storage
Using Existing File System Utilities:
// Memory storage using prompt-driven file operations
export async function persistReflexionMemory(memory: ReflexionMemory) {
  const memoryPrompt = await generateMemoryPersistencePrompt(memory);

  // Delegate to AI for file system operations
  return {
    content: [{
      type: 'text',
      text: memoryPrompt.prompt
    }],
    metadata: {
      operation: 'memory_persistence',
      memoryId: memory.memoryId,
      memoryType: memory.memoryType
    }
  };
}
Memory Organization
- Directory Structure:
docs/reflexion-memory/
  episodic/ - Specific task attempts and outcomes
  semantic/ - General lessons and principles
  procedural/ - Improved methods and approaches
  meta/ - Learning about learning patterns
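A small helper can map a memory to its location in this layout. This is a sketch under assumptions: `memoryFilePath` and `StoredMemoryType` are hypothetical, the one-file-per-memory `.json` naming is assumed rather than specified, and the 'feedback' memory type is left out because the layout lists no directory for it.

```typescript
// Only the four types that have a directory in the layout above.
type StoredMemoryType = "episodic" | "semantic" | "procedural" | "meta";

const MEMORY_ROOT = "docs/reflexion-memory";

function memoryFilePath(memoryType: StoredMemoryType, memoryId: string): string {
  // Replace anything outside a conservative character set so an ID such as
  // "../x" cannot escape the memory directory.
  const safeId = memoryId.replace(/[^A-Za-z0-9_-]/g, "_");
  return `${MEMORY_ROOT}/${memoryType}/${safeId}.json`;
}
```

For example, `memoryFilePath("episodic", "attempt-42")` yields `docs/reflexion-memory/episodic/attempt-42.json`.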
Memory Formats
- JSON Format: Structured memory data for programmatic access
- Markdown Format: Human-readable memory summaries
- Index Files: Memory catalogs and search indices
Performance Optimization
Memory Management
- Memory Limits: Maximum number of memories per type
- Relevance Scoring: Prioritize most relevant memories
- Automatic Cleanup: Remove outdated or low-value memories
- Memory Compression: Consolidate similar memories
Learning Efficiency
- Incremental Learning: Build on previous knowledge gradually
- Transfer Learning: Apply lessons across similar tasks
- Meta-Learning: Learn how to learn more effectively
- Feedback Integration: Incorporate external feedback quickly
Security and Validation
Memory Security
- Content Validation: Ensure memory content is safe and appropriate
- Access Control: Control access to sensitive memories
- Privacy Protection: Protect confidential information in memories
- Audit Trail: Track memory access and modifications
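Privacy protection can start with redacting obvious secrets before a memory is persisted. The sketch below is a deliberately simple illustration: `redactMemoryText` and `SECRET_PATTERNS` are hypothetical, and the two regular expressions (key/value style credentials and email addresses) are assumptions that would miss many real-world cases.

```typescript
// Illustrative patterns only; real deployments need a broader rule set.
const SECRET_PATTERNS: RegExp[] = [
  // key=value or key: value style credentials
  /\b(api[_-]?key|token|password|secret)\s*[:=]\s*\S+/gi,
  // email addresses
  /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
];

function redactMemoryText(text: string): { text: string; redactions: number } {
  let redactions = 0;
  let result = text;
  for (const pattern of SECRET_PATTERNS) {
    // Count every replacement so the caller can record it in an audit trail.
    result = result.replace(pattern, () => {
      redactions += 1;
      return "[REDACTED]";
    });
  }
  return { text: result, redactions };
}
```

Returning the redaction count alongside the text lets the caller log that a memory was modified without logging the secret itself.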
Learning Validation
- Progress Verification: Validate that learning is actually occurring
- Quality Assurance: Ensure lessons learned are accurate and valuable
- Bias Detection: Identify and correct learning biases
- Performance Monitoring: Track learning effectiveness over time
Integration Patterns
Pattern 1: Reflexion-Enhanced Tool Execution
// Standard reflexion pattern for any MCP tool (signature only)
export declare function executeWithReflexion(
  basePrompt: PromptObject,
  reflexionConfig: ReflexionConfig
): Promise<ReflexionResult>;
Pattern 2: Memory-Informed Decision Making
// Use past experiences to inform current decisions (signature only)
export declare function makeMemoryInformedDecision(
  context: any,
  taskType: string
): Promise<DecisionResult>;
Pattern 3: Continuous Learning Loop
// Implement continuous learning across multiple task executions (signature only)
export declare function continuousLearningLoop(
  taskSequence: TaskDefinition[]
): Promise<LearningOutcome>;
This design gives MCP tools a way to learn from mistakes and improve over time through linguistic feedback and self-reflection, while preserving the 100% prompt-driven architecture.