DocuMCP Architecture Overview
This document explains the architectural design of DocuMCP, providing insight into how the system works and why key design decisions were made.
High-Level Architecture
DocuMCP follows a modular, stateless architecture built on the Model Context Protocol (MCP) standard, designed to provide intelligent documentation deployment capabilities through AI assistant integration.
┌─────────────────────────────────────────────────────────────┐
│                     AI Assistant Layer                      │
│                 (Claude, GPT, Gemini, etc.)                 │
└─────────────────────┬───────────────────────────────────────┘
                      │ MCP Protocol
┌─────────────────────▼───────────────────────────────────────┐
│                     DocuMCP MCP Server                      │
│ ┌──────────────┐ ┌────────────────────┐ ┌─────────────────┐ │
│ │    Tools     │ │      Prompts       │ │    Resources    │ │
│ │    Layer     │ │       Layer        │ │      Layer      │ │
│ └──────────────┘ └────────────────────┘ └─────────────────┘ │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                      Core Engine Layer                      │
│ ┌──────────────┐ ┌────────────────────┐ ┌─────────────────┐ │
│ │  Repository  │ │ SSG Recommendation │ │  Memory System  │ │
│ │   Analysis   │ │       Engine       │ │                 │ │
│ └──────────────┘ └────────────────────┘ └─────────────────┘ │
│ ┌──────────────┐ ┌────────────────────┐ ┌─────────────────┐ │
│ │   Content    │ │     Deployment     │ │   Validation    │ │
│ │  Generation  │ │     Automation     │ │     Engine      │ │
│ └──────────────┘ └────────────────────┘ └─────────────────┘ │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│                 External Integrations Layer                 │
│ ┌──────────────┐ ┌────────────────────┐ ┌─────────────────┐ │
│ │    GitHub    │ │       Static       │ │   File System   │ │
│ │     API      │ │        Site        │ │   Operations    │ │
│ │              │ │     Generators     │ │                 │ │
│ └──────────────┘ └────────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Core Design Principles
1. Stateless Operation
DocuMCP operates as a stateless service where each tool invocation is independent:
- No persistent server state between requests
- Self-contained analysis for each repository
- Reproducible results given the same inputs
- Horizontal scalability without coordination
Benefits:
- Reliability and consistency
- Easy debugging and testing
- No complex state management
- Simple deployment model
2. Modular Architecture
Each component has a single, well-defined responsibility:
// Tool interface definition
interface MCPTool {
  name: string;
  description: string;
  inputSchema: ZodSchema;
  handler: (args: ToolArgs) => Promise<ToolResult>;
}

// Example tool implementation
export async function analyzeRepository(
  args: AnalysisArgs,
): Promise<AnalysisResult> {
  // Isolated business logic
  return performAnalysis(args);
}
Benefits:
- Easy to test and maintain
- Clear separation of concerns
- Extensible without breaking changes
- Independent component evolution
3. Progressive Complexity
Users can start simple and add sophistication as needed:
- Basic: Simple repository analysis
- Intermediate: SSG recommendations and configuration
- Advanced: Full deployment automation with optimization
- Expert: Memory-enhanced workflows with pattern learning
4. Security-First Design
All operations follow security best practices:
- Minimal permissions in generated workflows
- OIDC authentication for GitHub Actions
- Input validation using Zod schemas
- No secret exposure in logs or outputs
Component Architecture
MCP Server Core
The main server (src/index.ts) implements the MCP protocol specification:
import { Server } from "@modelcontextprotocol/sdk/server/index.js";

const server = new Server(
  {
    name: "documcp",
    version: packageJson.version,
  },
  {
    capabilities: {
      tools: {}, // 25+ documentation tools
      prompts: {
        // Guided workflow prompts
        listChanged: true,
      },
      resources: {
        // Generated content resources
        subscribe: true,
        listChanged: true,
      },
    },
  },
);
Key Features:
- Tool Registration: Dynamic tool discovery and registration
- Schema Validation: Zod-based input/output validation
- Resource Management: Automatic resource creation and storage
- Error Handling: Comprehensive error management and reporting
Repository Analysis Engine
The analysis engine examines projects from multiple perspectives:
interface RepositoryAnalysis {
  structure: ProjectStructure; // Files, languages, organization
  dependencies: DependencyAnalysis; // Package ecosystems, frameworks
  documentation: DocuAnalysis; // Existing docs, quality assessment
  recommendations: ProjectProfile; // Type, complexity, team size
}
Analysis Layers:
- File System Analysis: Language detection, structure mapping
- Dependency Analysis: Package manager integration, framework detection
- Documentation Assessment: README quality, existing docs evaluation
- Complexity Scoring: Project size, team collaboration patterns
Performance Characteristics:
- Sub-second analysis for small repositories (size tiers are listed under Performance Characteristics)
- Memory efficient with streaming file processing
- Extensible language and framework detection
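As an illustration of the file system layer, language detection can be sketched as counting known file extensions. This is a minimal sketch, not DocuMCP's actual detector: the extension table, the `LanguageStats` shape, and the `detectLanguages` name are all assumptions made for the example.

```typescript
// Hypothetical extension-to-language table; a real detector would cover far more.
const LANGUAGE_BY_EXTENSION = new Map<string, string>([
  ["ts", "TypeScript"],
  ["js", "JavaScript"],
  ["py", "Python"],
  ["rb", "Ruby"],
  ["go", "Go"],
  ["md", "Markdown"],
]);

interface LanguageStats {
  counts: Record<string, number>; // files per detected language
  primary: string | null; // most common non-Markdown language, if any
}

function detectLanguages(filePaths: string[]): LanguageStats {
  const counts: Record<string, number> = {};
  for (const filePath of filePaths) {
    const fileName = filePath.split("/").pop() ?? "";
    const dot = fileName.lastIndexOf(".");
    if (dot <= 0) continue; // no extension, or a dotfile such as ".gitignore"
    const extension = fileName.slice(dot + 1).toLowerCase();
    const language = LANGUAGE_BY_EXTENSION.get(extension);
    if (language !== undefined) {
      counts[language] = (counts[language] ?? 0) + 1;
    }
  }

  // Pick the language with the most files, ignoring documentation files.
  let primary: string | null = null;
  for (const [language, count] of Object.entries(counts)) {
    if (language === "Markdown") continue;
    if (primary === null || count > counts[primary]) primary = language;
  }
  return { counts, primary };
}
```

Because the function depends only on its input list, the same file paths always give the same result, which fits the stateless design described earlier.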
SSG Recommendation Engine
A data-driven system for selecting optimal static site generators:
interface SSGRecommendation {
  recommended: SSGType;
  confidence: number; // 0-1 confidence score
  reasoning: string[]; // Human-readable justifications
  alternatives: Alternative[]; // Other viable options
  scoring: ScoringBreakdown; // Detailed scoring matrix
}
Scoring Factors:
- Ecosystem Alignment: Language/framework compatibility
- Feature Requirements: Search, theming, plugins
- Complexity Match: Project size and team capacity
- Performance Needs: Build speed, site performance
- Maintenance Overhead: Learning curve, ongoing effort
Supported SSGs:
- Jekyll: Ruby-based, GitHub Pages native
- Hugo: Go-based, fast builds, extensive themes
- Docusaurus: React-based, modern features
- MkDocs: Python-based, simple and effective
- Eleventy: JavaScript-based, flexible and fast
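To make the scoring idea concrete, the sketch below combines a few of the factors into a point score and derives a confidence value from it. The weights (50, 30, 20), the `ProjectSignals` fields, and the function names are assumptions chosen for illustration; they are not DocuMCP's real scoring matrix.

```typescript
type SSGType = "jekyll" | "hugo" | "docusaurus" | "mkdocs" | "eleventy";

// Hypothetical, simplified project signals.
interface ProjectSignals {
  primaryLanguage: string;
  fileCount: number;
  needsVersionedDocs: boolean;
}

interface ScoredSSG {
  ssg: SSGType;
  score: number; // points out of 100
  reasoning: string[];
}

// Ecosystems each generator is most at home in (assumed mapping).
const ECOSYSTEMS: Record<SSGType, string[]> = {
  jekyll: ["Ruby"],
  hugo: ["Go"],
  docusaurus: ["JavaScript", "TypeScript"],
  mkdocs: ["Python"],
  eleventy: ["JavaScript", "TypeScript"],
};

function scoreSSG(ssg: SSGType, signals: ProjectSignals): ScoredSSG {
  let score = 0;
  const reasoning: string[] = [];
  if (ECOSYSTEMS[ssg].indexOf(signals.primaryLanguage) !== -1) {
    score += 50;
    reasoning.push(`Matches the project's ${signals.primaryLanguage} ecosystem`);
  }
  if (ssg === "docusaurus" && signals.needsVersionedDocs) {
    score += 30;
    reasoning.push("Supports versioned documentation");
  }
  if (ssg === "hugo" && signals.fileCount > 1000) {
    score += 20;
    reasoning.push("Fast builds suit large repositories");
  }
  if (ssg === "mkdocs" && signals.fileCount < 100) {
    score += 20;
    reasoning.push("Simple setup suits small projects");
  }
  return { ssg, score, reasoning };
}

function recommendSSG(signals: ProjectSignals) {
  const scored = (Object.keys(ECOSYSTEMS) as SSGType[])
    .map((ssg) => scoreSSG(ssg, signals))
    .sort((a, b) => b.score - a.score);
  const [best, ...rest] = scored;
  return {
    recommended: best.ssg,
    confidence: best.score / 100, // 0-1 confidence score
    reasoning: best.reasoning,
    alternatives: rest.filter((c) => c.score > 0).map((c) => c.ssg),
  };
}
```

Keeping the reasoning strings next to each score increment is one way to produce the human-readable justifications that the `SSGRecommendation` interface calls for.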
Memory System Architecture
An intelligent learning system that improves recommendations over time:
interface MemorySystem {
  storage: ProjectLocalStorage; // .documcp/memory/
  patterns: PatternRecognition; // Success pattern learning
  similarity: ProjectSimilarity; // Project comparison engine
  insights: HistoricalInsights; // Usage patterns and outcomes
}
Storage Architecture:
.documcp/memory/
├── analysis/                # Repository analysis results
│   ├── analysis_*.jsonl
│   └── metadata.json
├── recommendations/         # SSG recommendations
│   ├── recommendations_*.jsonl
│   └── patterns.json
├── deployments/             # Deployment outcomes
│   ├── deployments_*.jsonl
│   └── success_rates.json
└── system/                  # System metadata
    ├── config.json
    └── statistics.json
Learning Mechanisms:
- Pattern Recognition: Successful project-SSG combinations
- Similarity Matching: Find projects with similar characteristics
- Outcome Tracking: Monitor deployment success rates
- Feedback Integration: Learn from user choices and outcomes
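Similarity matching can be illustrated with a set-overlap measure. The sketch below uses Jaccard similarity over feature tags; this is one common choice and an assumption here, since the source does not say which metric DocuMCP uses. The `ProjectFingerprint` shape and the 0.5 default threshold are likewise hypothetical.

```typescript
// Hypothetical record of a previously analysed project.
interface ProjectFingerprint {
  id: string;
  features: string[]; // e.g. languages, package managers, frameworks
  ssg: string;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, ranging from 0 (disjoint) to 1 (identical).
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  if (setA.size === 0 && setB.size === 0) return 0;
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) shared += 1;
  });
  return shared / (setA.size + setB.size - shared);
}

// Return past projects at or above the threshold, most similar first.
function findSimilarProjects(
  targetFeatures: string[],
  history: ProjectFingerprint[],
  threshold = 0.5,
): { project: ProjectFingerprint; similarity: number }[] {
  return history
    .map((project) => ({
      project,
      similarity: jaccard(targetFeatures, project.features),
    }))
    .filter((match) => match.similarity >= threshold)
    .sort((x, y) => y.similarity - x.similarity);
}
```

The SSG choices of the closest matches could then feed into the recommendation engine as an additional scoring factor.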
Content Generation System
Automated content creation following the Diataxis framework:
Diataxis Framework Implementation:
┌─────────────────┐ ┌─────────────────┐
│    Tutorials    │ │  How-to Guides  │
│   (Learning)    │ │   (Problem-     │
│                 │ │    solving)     │
└─────────────────┘ └─────────────────┘
┌─────────────────┐ ┌─────────────────┐
│    Reference    │ │   Explanation   │
│  (Information)  │ │ (Understanding) │
└─────────────────┘ └─────────────────┘
Content Types Generated:
- Tutorials: Step-by-step learning guides
- How-to Guides: Task-oriented solutions
- Reference: API docs, configuration options
- Explanation: Architecture, design decisions
Deployment Automation
Automated GitHub Pages deployment with SSG-specific optimizations:
# Generated workflow characteristics
Security:
  - OIDC authentication (no long-lived tokens)
  - Minimal permissions (contents: read, pages: write, id-token: write)
  - Secret masking and secure handling
Performance:
  - Dependency caching (npm, gems, pip)
  - Parallel builds where possible
  - Optimized Docker images
Reliability:
  - Build verification before deployment
  - Rollback capabilities
  - Health monitoring
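A generator for such a workflow might look roughly like the sketch below, which emits a skeleton GitHub Actions file for a chosen SSG. The `BUILD_SETTINGS` table and the `generatePagesWorkflow` name are assumptions for illustration. The build commands and output directories are each tool's usual defaults, and toolchain setup and caching steps are deliberately left out. Note that OIDC-based Pages deployment with `actions/deploy-pages` requires `id-token: write` alongside `pages: write`.

```typescript
type SSGType = "jekyll" | "hugo" | "docusaurus" | "mkdocs" | "eleventy";

// Default build command and output directory per generator (illustrative defaults).
const BUILD_SETTINGS: Record<SSGType, { command: string; outputDir: string }> = {
  jekyll: { command: "bundle exec jekyll build", outputDir: "_site" },
  hugo: { command: "hugo --minify", outputDir: "public" },
  docusaurus: { command: "npm run build", outputDir: "build" },
  mkdocs: { command: "mkdocs build", outputDir: "site" },
  eleventy: { command: "npx @11ty/eleventy", outputDir: "_site" },
};

// Build a minimal workflow skeleton as YAML text.
function generatePagesWorkflow(ssg: SSGType): string {
  const { command, outputDir } = BUILD_SETTINGS[ssg];
  return [
    `name: Deploy ${ssg} site to GitHub Pages`,
    "on:",
    "  push:",
    "    branches: [main]",
    "permissions:",
    "  contents: read",
    "  pages: write",
    "  id-token: write",
    "jobs:",
    "  build:",
    "    runs-on: ubuntu-latest",
    "    steps:",
    "      - uses: actions/checkout@v4",
    "      # Toolchain setup and dependency caching for the chosen SSG go here",
    `      - run: ${command}`,
    "      - uses: actions/upload-pages-artifact@v3",
    "        with:",
    `          path: ${outputDir}`,
    "  deploy:",
    "    needs: build",
    "    runs-on: ubuntu-latest",
    "    environment:",
    "      name: github-pages",
    "    steps:",
    "      - uses: actions/deploy-pages@v4",
    "",
  ].join("\n");
}
```

Splitting build and deploy into separate jobs means a failed build never reaches the deployment step, which is one simple form of build verification before deployment.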
Validation Engine
Multi-layered validation for content quality assurance:
interface ValidationEngine {
  linkChecker: LinkValidation; // Internal/external link verification
  contentValidator: ContentValidation; // Diataxis compliance, accuracy
  codeValidator: CodeValidation; // Syntax checking, example testing
  seoValidator: SEOValidation; // Meta tags, performance, accessibility
}
Validation Levels:
- Syntax: Markdown, code block syntax
- Structure: Diataxis compliance, navigation
- Content: Accuracy, completeness, consistency
- Performance: Loading speed, mobile optimization
- SEO: Meta tags, structured data, accessibility
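As a small example of one validation layer, the sketch below checks internal Markdown links against the set of known pages. It is a simplified illustration: it assumes links are written as paths relative to the docs root and identical to the page keys, and the `findBrokenInternalLinks` name and `BrokenLink` shape are hypothetical.

```typescript
interface BrokenLink {
  source: string; // page containing the link
  target: string; // path that could not be found
}

// Extract link targets from inline Markdown links such as [text](target).
// Image links (![alt](src)) match as well, which is acceptable for this sketch.
function extractLinks(markdown: string): string[] {
  const pattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// `pages` maps a docs-relative path to its Markdown content.
function findBrokenInternalLinks(pages: Record<string, string>): BrokenLink[] {
  const known = new Set(Object.keys(pages));
  const broken: BrokenLink[] = [];
  for (const [source, markdown] of Object.entries(pages)) {
    for (const link of extractLinks(markdown)) {
      // Skip external URLs (anything with a scheme) and same-page anchors.
      if (/^[a-z][a-z0-9+.-]*:/i.test(link) || link.charAt(0) === "#") continue;
      const target = link.split("#")[0]; // drop any fragment
      if (!known.has(target)) broken.push({ source, target });
    }
  }
  return broken;
}
```

External links would need a network check, which is a natural candidate for the retry and parallel-processing strategies described elsewhere in this document.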
Data Flow Architecture
Request Processing Flow
1. MCP Client Request
   ↓
2. Schema Validation (Zod)
   ↓
3. Tool Handler Routing
   ↓
4. Business Logic Execution
   ↓
5. Result Processing
   ↓
6. Resource Storage
   ↓
7. Response Formatting
   ↓
8. MCP Protocol Response
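The core of this flow can be sketched as a single dispatch function. The example below is an assumption-laden illustration rather than DocuMCP's actual handler: a hand-written `validate` function stands in for the Zod schema, the `count_files` tool is invented, the tool lookup happens before validation because the schema belongs to the tool, and the resource storage step is omitted.

```typescript
// Shape of an MCP tool result: content items plus an optional error flag.
interface ToolResponse {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical tool definition; `validate` stands in for a Zod schema's parse step.
interface ToolDefinition<Args> {
  validate: (input: unknown) => Args; // throws on invalid input
  handler: (args: Args) => Promise<unknown>;
}

interface CountFilesArgs {
  paths: string[];
}

// Invented example tool used only to exercise the pipeline.
const countFiles: ToolDefinition<CountFilesArgs> = {
  validate(input) {
    const paths = (input as { paths?: unknown } | null)?.paths;
    if (!Array.isArray(paths) || !paths.every((p) => typeof p === "string")) {
      throw new Error("paths must be an array of strings");
    }
    return { paths: paths as string[] };
  },
  async handler(args) {
    return { fileCount: args.paths.length };
  },
};

const registry = new Map<string, ToolDefinition<any>>([
  ["count_files", countFiles],
]);

function toErrorResponse(message: string): ToolResponse {
  return { content: [{ type: "text", text: message }], isError: true };
}

async function handleToolCall(name: string, input: unknown): Promise<ToolResponse> {
  const tool = registry.get(name); // tool handler routing
  if (!tool) return toErrorResponse(`Unknown tool: ${name}`);
  try {
    const args = tool.validate(input); // schema validation
    const result = await tool.handler(args); // business logic execution
    // Result processing and response formatting
    return { content: [{ type: "text", text: JSON.stringify(result) }] };
  } catch (error) {
    return toErrorResponse(error instanceof Error ? error.message : String(error));
  }
}
```

Returning validation and handler failures as `isError` results, rather than throwing, lets the AI assistant see the message and adjust its next call.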
Memory System Flow
1. Tool Execution
   ↓
2. Result Analysis
   ↓
3. Pattern Extraction
   ↓
4. Similarity Matching
   ↓
5. Storage Update
   ↓
6. Learning Integration
   ↓
7. Future Recommendations Enhancement
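One step of this flow, turning stored deployment outcomes into success rates, can be sketched as follows. The record fields (`ssg`, `succeeded`) are assumptions; the source only states that deployments are stored as JSONL files, not what each line contains.

```typescript
// Hypothetical shape of one line in a deployments_*.jsonl file.
interface DeploymentRecord {
  ssg: string;
  succeeded: boolean;
}

// JSONL holds one JSON document per line; blank lines are ignored.
function parseJsonl(text: string): DeploymentRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as DeploymentRecord);
}

// Aggregate outcomes into a success rate (0-1) per SSG.
function successRates(records: DeploymentRecord[]): Record<string, number> {
  const totals = new Map<string, { ok: number; all: number }>();
  for (const record of records) {
    const entry = totals.get(record.ssg) ?? { ok: 0, all: 0 };
    entry.all += 1;
    if (record.succeeded) entry.ok += 1;
    totals.set(record.ssg, entry);
  }
  const rates: Record<string, number> = {};
  totals.forEach((entry, ssg) => {
    rates[ssg] = entry.ok / entry.all;
  });
  return rates;
}
```

An append-only format like JSONL suits this kind of log because new outcomes can be added without rewriting earlier records.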
Performance Architecture
Performance Characteristics
Analysis Performance:
- Small repos (<100 files): <500ms
- Medium repos (100-1000 files): <2s
- Large repos (1000+ files): <10s
Memory Efficiency:
- Streaming processing for large repositories
- Lazy loading of analysis components
- Garbage collection optimization
Scalability:
- Stateless design enables horizontal scaling
- No database dependencies for core functionality
- Process isolation for security and reliability
Optimization Strategies
- Caching:
  - File system metadata caching
  - Analysis result memoization
  - Pattern matching optimization
- Lazy Loading:
  - On-demand tool loading
  - Progressive analysis depth
  - Conditional feature activation
- Parallel Processing:
  - Concurrent file analysis
  - Parallel validation checks
  - Asynchronous resource generation
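Two of these strategies, memoization and concurrent file analysis, combine naturally. The sketch below is illustrative only: `memoizeAsync` and `analyzeFiles` are hypothetical names, and the cache is scoped to one call so that nothing persists between tool invocations.

```typescript
// Cache in-flight and completed results by key so repeated requests share one computation.
function memoizeAsync<T>(
  fn: (key: string) => Promise<T>,
): (key: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return (key) => {
    let pending = cache.get(key);
    if (pending === undefined) {
      // Evict failed entries so a later call can try again.
      pending = fn(key).catch((error) => {
        cache.delete(key);
        throw error;
      });
      cache.set(key, pending);
    }
    return pending;
  };
}

// Analyse all paths concurrently; duplicate paths are analysed only once.
async function analyzeFiles<T>(
  paths: string[],
  analyzeFile: (path: string) => Promise<T>,
): Promise<T[]> {
  const cached = memoizeAsync(analyzeFile);
  return Promise.all(paths.map((path) => cached(path)));
}
```

Caching the promise rather than the resolved value means two concurrent requests for the same file share a single analysis instead of racing.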
Error Handling Architecture
Error Categories
enum ErrorCategory {
  VALIDATION = "validation", // Input validation failures
  FILESYSTEM = "filesystem", // File system access issues
  NETWORK = "network", // GitHub API, external requests
  PROCESSING = "processing", // Business logic errors
  CONFIGURATION = "configuration", // Setup and config issues
}
Error Recovery Strategies
- Graceful Degradation: Partial results when possible
- Retry Logic: Exponential backoff for transient failures
- Fallback Options: Alternative approaches when primary fails
- User Guidance: Actionable error messages and solutions
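The retry strategy can be sketched as a small wrapper that doubles the delay after each failed attempt. The option names and the injectable `sleep` function are assumptions made so the example is easy to test; they are not taken from DocuMCP.

```typescript
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first
  baseDelayMs: number; // delay before the second attempt
  sleep?: (ms: number) => Promise<void>; // injectable for testing
  isTransient?: (error: unknown) => boolean; // decides whether to retry
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions,
): Promise<T> {
  const sleep = options.sleep ?? defaultSleep;
  const isTransient = options.isTransient ?? (() => true);
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      // Give up on the final attempt or on errors that will not go away.
      if (attempt === options.maxAttempts || !isTransient(error)) break;
      // Exponential backoff: base, 2x base, 4x base, ...
      await sleep(options.baseDelayMs * Math.pow(2, attempt - 1));
    }
  }
  throw lastError;
}
```

The `isTransient` hook is where the error categories matter: a network failure is usually worth retrying, while a validation failure is not.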
Security Architecture
Threat Model
Protected Assets:
- User repository contents
- Generated configuration files
- Deployment credentials
- Memory system data
Threat Vectors:
- Malicious repository content
- Compromised dependencies
- Network interception
- Privilege escalation
Security Measures
- Input Validation:
  - Zod schema validation for all inputs
  - Path traversal prevention
  - Content sanitization
- Execution Environment:
  - No arbitrary code execution
  - Sandboxed file operations
  - Limited network access
- Secrets Management:
  - OIDC token-based authentication
  - No long-lived credential storage
  - Environment variable isolation
- Generated Security:
  - Minimal permission workflows
  - Security header configuration
  - HTTPS enforcement
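Path traversal prevention, for instance, amounts to rejecting any requested path that would resolve outside the repository root. The sketch below does this with plain string handling so that it stays self-contained; `resolveInsideRoot` is a hypothetical name, and the sketch does not follow symbolic links, which a production check would also need to resolve (for example with Node's `path.resolve` and `fs.realpath`).

```typescript
// Resolve `requested` against `root`, throwing if the result would leave `root`.
function resolveInsideRoot(root: string, requested: string): string {
  // Reject POSIX absolute paths and Windows drive paths outright.
  if (requested.charAt(0) === "/" || /^[a-zA-Z]:[\\/]/.test(requested)) {
    throw new Error(`Absolute paths are not allowed: ${requested}`);
  }
  const segments: string[] = [];
  for (const segment of requested.split(/[\\/]+/)) {
    if (segment === "" || segment === ".") continue; // no-op segments
    if (segment === "..") {
      // Walking above the root is exactly what traversal attacks attempt.
      if (segments.length === 0) {
        throw new Error(`Path escapes the repository root: ${requested}`);
      }
      segments.pop();
    } else {
      segments.push(segment);
    }
  }
  return [root.replace(/\/+$/, ""), ...segments].join("/");
}
```

Normalizing before comparing matters: a check that only searches for a leading `..` would miss inputs such as `docs/../../secret`.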
Extension Architecture
Adding New Tools
// 1. Implement tool function
export async function newTool(args: NewToolArgs): Promise<NewToolResult> {
  // Business logic
}

// 2. Define schema
const newToolSchema = z.object({
  // Input validation
});

// 3. Register in server
const TOOLS = [
  // ... existing tools
  {
    name: 'new_tool',
    description: 'Tool description',
    inputSchema: newToolSchema,
  }
];

// 4. Add handler in CallToolRequestSchema
case 'new_tool': {
  const result = await newTool(args);
  return result;
}
Adding New SSG Support
// 1. Extend SSG enum
type SSGType =
  | "jekyll"
  | "hugo"
  | "docusaurus"
  | "mkdocs"
  | "eleventy"
  | "new-ssg";

// 2. Add scoring logic
function scoreNewSSG(analysis: Analysis): number {
  // Scoring implementation
}

// 3. Add configuration generator
function generateNewSSGConfig(args: ConfigArgs): ConfigResult {
  // Configuration generation
}

// 4. Add deployment workflow
function createNewSSGWorkflow(args: DeployArgs): WorkflowResult {
  // GitHub Actions workflow
}
Future Architecture Considerations
Planned Enhancements
- Distributed Memory: Shared learning across installations
- Plugin System: Third-party tool integration
- Real-time Monitoring: Live deployment health tracking
- Advanced Analytics: Usage patterns and optimization insights
Scalability Roadmap
- Phase 1: Current stateless design (Complete)
- Phase 2: Memory system optimization (In Progress)
- Phase 3: Distributed processing capabilities
- Phase 4: Cloud-native deployment options
This architecture provides a solid foundation for intelligent documentation deployment while maintaining simplicity, security, and extensibility.