DocuMCP Architecture Overview

This document explains the architectural design of DocuMCP, providing insight into how the system works and why key design decisions were made.

High-Level Architecture

DocuMCP follows a modular, stateless architecture built on the Model Context Protocol (MCP) standard. It exposes documentation analysis, generation, and deployment capabilities to AI assistants as MCP tools, prompts, and resources.

┌──────────────────────────────────────────────────────────────────────┐
│                          AI Assistant Layer                          │
│                     (Claude, GPT, Gemini, etc.)                      │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │ MCP Protocol
┌──────────────────────────────────▼───────────────────────────────────┐
│                          DocuMCP MCP Server                          │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Tools              │ │ Prompts            │ │ Resources          │ │
│ │ Layer              │ │ Layer              │ │ Layer              │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
┌──────────────────────────────────▼───────────────────────────────────┐
│                          Core Engine Layer                           │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Repository         │ │ SSG Recommendation │ │ Memory System      │ │
│ │ Analysis           │ │ Engine             │ │                    │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ Content            │ │ Deployment         │ │ Validation         │ │
│ │ Generation         │ │ Automation         │ │ Engine             │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
└──────────────────────────────────┬───────────────────────────────────┘
                                   │
┌──────────────────────────────────▼───────────────────────────────────┐
│                     External Integrations Layer                      │
│ ┌────────────────────┐ ┌────────────────────┐ ┌────────────────────┐ │
│ │ GitHub             │ │ Static             │ │ File System        │ │
│ │ API                │ │ Site               │ │ Operations         │ │
│ │                    │ │ Generators         │ │                    │ │
│ └────────────────────┘ └────────────────────┘ └────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘

Core Design Principles

1. Stateless Operation

DocuMCP operates as a stateless service where each tool invocation is independent:

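  • No persistent server state between requests (the memory system persists to project-local files under .documcp/memory/ rather than in the server process)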
  • Self-contained analysis for each repository
  • Reproducible results given the same inputs
  • Horizontal scalability without coordination

Benefits:

  • Reliability and consistency
  • Easy debugging and testing
  • No complex state management
  • Simple deployment model
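
As a minimal sketch of what statelessness means for a handler, the hypothetical `summarizeFiles` function below (not part of DocuMCP) derives its result from its arguments alone, so two calls with the same input yield the same output. The input and output shapes are assumptions made for illustration.

```typescript
// Hypothetical input and output shapes, for illustration only.
interface FileListInput {
  paths: string[];
}

interface FileSummary {
  totalFiles: number;
  markdownFiles: number;
}

// Stateless: reads nothing but its arguments and writes no module-level state.
function summarizeFiles(input: FileListInput): FileSummary {
  const markdownFiles = input.paths.filter((p) =>
    p.toLowerCase().endsWith(".md"),
  ).length;
  return { totalFiles: input.paths.length, markdownFiles };
}

const sampleInput: FileListInput = {
  paths: ["README.md", "src/index.ts", "docs/guide.md"],
};
const firstRun = summarizeFiles(sampleInput);
const secondRun = summarizeFiles(sampleInput);
// Both runs yield { totalFiles: 3, markdownFiles: 2 }.
```

Because nothing is shared between calls, such a handler can be unit-tested without any setup or teardown.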

2. Modular Architecture

Each component has a single, well-defined responsibility:

// Tool interface definition
interface MCPTool {
  name: string;
  description: string;
  inputSchema: ZodSchema;
  handler: (args: ToolArgs) => Promise<ToolResult>;
}

// Example tool implementation
export async function analyzeRepository(
  args: AnalysisArgs,
): Promise<AnalysisResult> {
  // Isolated business logic
  return performAnalysis(args);
}

Benefits:

  • Easy to test and maintain
  • Clear separation of concerns
  • Extensible without breaking changes
  • Independent component evolution

3. Progressive Complexity

Users can start simple and add sophistication as needed:

  1. Basic: Simple repository analysis
  2. Intermediate: SSG recommendations and configuration
  3. Advanced: Full deployment automation with optimization
  4. Expert: Memory-enhanced workflows with pattern learning

4. Security-First Design

All operations follow security best practices:

  • Minimal permissions in generated workflows
  • OIDC authentication for GitHub Actions
  • Input validation using Zod schemas
  • No secret exposure in logs or outputs

Component Architecture

MCP Server Core

The main server (src/index.ts) implements the MCP protocol specification:

const server = new Server(
  {
    name: "documcp",
    version: packageJson.version,
  },
  {
    capabilities: {
      tools: {}, // 25+ documentation tools
      prompts: {
        // Guided workflow prompts
        listChanged: true,
      },
      resources: {
        // Generated content resources
        subscribe: true,
        listChanged: true,
      },
    },
  },
);

Key Features:

  • Tool Registration: Dynamic tool discovery and registration
  • Schema Validation: Zod-based input/output validation
  • Resource Management: Automatic resource creation and storage
  • Error Handling: Comprehensive error management and reporting

Repository Analysis Engine

The analysis engine examines projects from multiple perspectives:

interface RepositoryAnalysis {
  structure: ProjectStructure; // Files, languages, organization
  dependencies: DependencyAnalysis; // Package ecosystems, frameworks
  documentation: DocuAnalysis; // Existing docs, quality assessment
  recommendations: ProjectProfile; // Type, complexity, team size
}

Analysis Layers:

  1. File System Analysis: Language detection, structure mapping
  2. Dependency Analysis: Package manager integration, framework detection
  3. Documentation Assessment: README quality, existing docs evaluation
  4. Complexity Scoring: Project size, team collaboration patterns
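
The first layer, language detection, can be sketched as a count of files per known extension. This is an illustration under stated assumptions, not DocuMCP's detector: the `EXTENSION_LANGUAGES` table and the `detectLanguages` function are hypothetical, and a real detector would cover far more languages and signals.

```typescript
// Hypothetical extension-to-language table; a real detector would cover far more.
const EXTENSION_LANGUAGES: Record<string, string> = {
  ts: "TypeScript",
  js: "JavaScript",
  py: "Python",
  go: "Go",
  rb: "Ruby",
};

// Count files per language and return the languages sorted by file count.
function detectLanguages(
  paths: string[],
): Array<{ language: string; files: number }> {
  const counts = new Map<string, number>();
  for (const filePath of paths) {
    const dot = filePath.lastIndexOf(".");
    if (dot < 0) continue; // no extension
    const language = EXTENSION_LANGUAGES[filePath.slice(dot + 1).toLowerCase()];
    if (language === undefined) continue; // unknown extension
    counts.set(language, (counts.get(language) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([language, files]) => ({ language, files }))
    .sort((a, b) => b.files - a.files || a.language.localeCompare(b.language));
}

const detected = detectLanguages([
  "src/a.ts",
  "src/b.ts",
  "scripts/build.py",
  "LICENSE",
]);
// detected[0] is { language: "TypeScript", files: 2 }
```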

Performance Characteristics:

  • Sub-second analysis for small repositories (under 100 files); under 2s for repositories of up to 1,000 files
  • Memory efficient with streaming file processing
  • Extensible language and framework detection

SSG Recommendation Engine

A data-driven system for selecting optimal static site generators:

interface SSGRecommendation {
  recommended: SSGType;
  confidence: number; // 0-1 confidence score
  reasoning: string[]; // Human-readable justifications
  alternatives: Alternative[]; // Other viable options
  scoring: ScoringBreakdown; // Detailed scoring matrix
}

Scoring Factors:

  • Ecosystem Alignment: Language/framework compatibility
  • Feature Requirements: Search, theming, plugins
  • Complexity Match: Project size and team capacity
  • Performance Needs: Build speed, site performance
  • Maintenance Overhead: Learning curve, ongoing effort
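
One way to combine these factors is a weighted sum. The sketch below assumes per-factor scores in the range 0 to 1 and uses made-up weights; the weights DocuMCP actually applies are not stated in this document, and treating the top score as the confidence value is likewise an assumption.

```typescript
type SsgName = "jekyll" | "hugo" | "docusaurus" | "mkdocs" | "eleventy";

// Per-factor scores in [0, 1]; factor names mirror the list above.
interface FactorScores {
  ecosystem: number;
  features: number;
  complexity: number;
  performance: number;
  maintenance: number;
}

// Hypothetical weights that sum to 1.
const FACTOR_WEIGHTS: FactorScores = {
  ecosystem: 0.3,
  features: 0.2,
  complexity: 0.2,
  performance: 0.15,
  maintenance: 0.15,
};

function weightedScore(scores: FactorScores): number {
  return (Object.keys(FACTOR_WEIGHTS) as Array<keyof FactorScores>).reduce(
    (total, factor) => total + FACTOR_WEIGHTS[factor] * scores[factor],
    0,
  );
}

// Rank candidates by weighted score; the top score is reused as the confidence.
function rankCandidates(
  candidates: Array<{ ssg: SsgName; scores: FactorScores }>,
) {
  const ranked = candidates
    .map(({ ssg, scores }) => ({ ssg, score: weightedScore(scores) }))
    .sort((a, b) => b.score - a.score);
  const [best, ...alternatives] = ranked;
  if (best === undefined) throw new Error("no candidates to rank");
  return { recommended: best.ssg, confidence: best.score, alternatives };
}

const ranking = rankCandidates([
  { ssg: "jekyll", scores: { ecosystem: 0.2, features: 0.6, complexity: 0.8, performance: 0.5, maintenance: 0.7 } },
  { ssg: "docusaurus", scores: { ecosystem: 1, features: 0.9, complexity: 0.7, performance: 0.6, maintenance: 0.6 } },
]);
// ranking.recommended is "docusaurus"
```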

Supported SSGs:

  • Jekyll: Ruby-based, GitHub Pages native
  • Hugo: Go-based, fast builds, extensive themes
  • Docusaurus: React-based, modern features
  • MkDocs: Python-based, simple and effective
  • Eleventy: JavaScript-based, flexible and fast

Memory System Architecture

An intelligent learning system that improves recommendations over time:

interface MemorySystem {
  storage: ProjectLocalStorage; // .documcp/memory/
  patterns: PatternRecognition; // Success pattern learning
  similarity: ProjectSimilarity; // Project comparison engine
  insights: HistoricalInsights; // Usage patterns and outcomes
}

Storage Architecture:

.documcp/memory/
├── analysis/                  # Repository analysis results
│   ├── analysis_*.jsonl
│   └── metadata.json
├── recommendations/           # SSG recommendations
│   ├── recommendations_*.jsonl
│   └── patterns.json
├── deployments/               # Deployment outcomes
│   ├── deployments_*.jsonl
│   └── success_rates.json
└── system/                    # System metadata
    ├── config.json
    └── statistics.json

Learning Mechanisms:

  • Pattern Recognition: Successful project-SSG combinations
  • Similarity Matching: Find projects with similar characteristics
  • Outcome Tracking: Monitor deployment success rates
  • Feedback Integration: Learn from user choices and outcomes
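
Similarity matching and outcome tracking can be illustrated together. The sketch below uses Jaccard similarity over feature tags, which is one plausible choice rather than the measure DocuMCP is known to use; the `DeploymentOutcome` shape, the function names, and the 0.5 threshold are all hypothetical.

```typescript
// Hypothetical record of a past deployment; field names are illustrative.
interface DeploymentOutcome {
  features: string[]; // e.g. languages and frameworks detected in the project
  ssg: string;
  succeeded: boolean;
}

// Jaccard similarity: size of the intersection divided by size of the union.
function jaccard(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((item) => {
    if (setB.has(item)) shared += 1;
  });
  const unionSize = setA.size + setB.size - shared;
  return unionSize === 0 ? 0 : shared / unionSize;
}

// Success rate per SSG among past projects at least `threshold` similar.
function successRates(
  features: string[],
  pastOutcomes: DeploymentOutcome[],
  threshold = 0.5,
): Map<string, number> {
  const tallies = new Map<string, { wins: number; total: number }>();
  for (const outcome of pastOutcomes) {
    if (jaccard(features, outcome.features) < threshold) continue;
    const tally = tallies.get(outcome.ssg) ?? { wins: 0, total: 0 };
    tally.total += 1;
    if (outcome.succeeded) tally.wins += 1;
    tallies.set(outcome.ssg, tally);
  }
  const rates = new Map<string, number>();
  tallies.forEach((tally, ssg) => rates.set(ssg, tally.wins / tally.total));
  return rates;
}
```

A recommendation step could then favor SSGs with a high success rate among similar projects, falling back to plain scoring when no similar project has been recorded.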

Content Generation System

Automated content creation following the Diataxis framework:

Diataxis Framework Implementation:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Tutorials β”‚ β”‚ How-to Guides β”‚
β”‚ (Learning) β”‚ β”‚ (Problem- β”‚
β”‚ β”‚ β”‚ solving) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Reference β”‚ β”‚ Explanation β”‚
β”‚ (Information) β”‚ β”‚ (Understanding) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Content Types Generated:

  • Tutorials: Step-by-step learning guides
  • How-to Guides: Task-oriented solutions
  • Reference: API docs, configuration options
  • Explanation: Architecture, design decisions
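
The four content types map naturally onto a directory layout. The following sketch plans one index page per Diataxis category; the directory names and the `planDocsLayout` helper are assumptions made for illustration, not the structure DocuMCP necessarily generates.

```typescript
type DiataxisCategory = "tutorials" | "how-to" | "reference" | "explanation";

// Purpose of each category, as described in the list above.
const CATEGORY_PURPOSE: Record<DiataxisCategory, string> = {
  tutorials: "Step-by-step learning guides",
  "how-to": "Task-oriented solutions",
  reference: "API docs, configuration options",
  explanation: "Architecture, design decisions",
};

// Build an index-page entry for every category under a docs root.
// The layout (one directory per category) is hypothetical.
function planDocsLayout(
  docsRoot: string,
): Array<{ path: string; title: string; purpose: string }> {
  return (Object.keys(CATEGORY_PURPOSE) as DiataxisCategory[]).map(
    (category) => ({
      path: `${docsRoot}/${category}/index.md`,
      title:
        category === "how-to"
          ? "How-to Guides"
          : category.charAt(0).toUpperCase() + category.slice(1),
      purpose: CATEGORY_PURPOSE[category],
    }),
  );
}

const docsLayout = planDocsLayout("docs");
// docsLayout[0].path is "docs/tutorials/index.md"
```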

Deployment Automation

Automated GitHub Pages deployment with SSG-specific optimizations:

# Generated workflow characteristics
Security:
  - OIDC authentication (no long-lived tokens)
  - Minimal permissions (contents: read, pages: write, id-token: write)
  - Secret masking and secure handling

Performance:
  - Dependency caching (npm, gems, pip)
  - Parallel builds where possible
  - Optimized Docker images

Reliability:
  - Build verification before deployment
  - Rollback capabilities
  - Health monitoring
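
To make the security characteristics concrete, here is a sketch of a generator that renders a GitHub Pages workflow with minimal permissions. The `renderPagesWorkflow` function and its options are hypothetical, and the workflow DocuMCP emits may differ. The actions referenced are the standard GitHub Pages actions; `id-token: write` is included because `actions/deploy-pages` requires it for OIDC-based deployment.

```typescript
// Hypothetical options; the build command and output directory depend on the SSG.
interface PagesWorkflowOptions {
  buildCommand: string;
  outputDir: string;
}

// Render a GitHub Actions workflow that deploys to GitHub Pages via OIDC.
function renderPagesWorkflow(options: PagesWorkflowOptions): string {
  return [
    "name: Deploy documentation",
    "on:",
    "  push:",
    "    branches: [main]",
    "permissions:",
    "  contents: read",
    "  pages: write",
    "  id-token: write # required by actions/deploy-pages (OIDC)",
    "jobs:",
    "  build:",
    "    runs-on: ubuntu-latest",
    "    steps:",
    "      - uses: actions/checkout@v4",
    `      - run: ${options.buildCommand}`,
    "      - uses: actions/upload-pages-artifact@v3",
    "        with:",
    `          path: ${options.outputDir}`,
    "  deploy:",
    "    needs: build",
    "    runs-on: ubuntu-latest",
    "    environment:",
    "      name: github-pages",
    "    steps:",
    "      - uses: actions/deploy-pages@v4",
    "",
  ].join("\n");
}

const workflowText = renderPagesWorkflow({
  buildCommand: "npm run build",
  outputDir: "build",
});
```

No personal access token appears anywhere in the output; the deploy job authenticates with a short-lived OIDC token instead.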

Validation Engine

Multi-layered validation for content quality assurance:

interface ValidationEngine {
  linkChecker: LinkValidation; // Internal/external link verification
  contentValidator: ContentValidation; // Diataxis compliance, accuracy
  codeValidator: CodeValidation; // Syntax checking, example testing
  seoValidator: SEOValidation; // Meta tags, performance, accessibility
}

Validation Levels:

  • Syntax: Markdown, code block syntax
  • Structure: Diataxis compliance, navigation
  • Content: Accuracy, completeness, consistency
  • Performance: Loading speed, mobile optimization
  • SEO: Meta tags, structured data, accessibility
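
As an example of one validation layer, the sketch below checks internal Markdown links against a set of known pages. It is a simplified illustration: the `findBrokenLinks` function is hypothetical, link targets are resolved against the docs root rather than the linking file, and external URLs are skipped instead of being fetched.

```typescript
// Find relative Markdown links whose target file is missing from the given set.
// Simplification: targets are resolved against the docs root, not the linking file.
function findBrokenLinks(
  pages: Record<string, string>,
): Array<{ page: string; target: string }> {
  const known = new Set(Object.keys(pages));
  const broken: Array<{ page: string; target: string }> = [];
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)\)/g;
  for (const page of Object.keys(pages)) {
    linkPattern.lastIndex = 0; // reset the global regex for each page
    let match: RegExpExecArray | null;
    while ((match = linkPattern.exec(pages[page])) !== null) {
      const target = match[1].split("#")[0]; // drop any anchor
      const isExternal = /^[a-z][a-z0-9+.-]*:/i.test(target);
      if (target === "" || isExternal) continue; // same-page anchor or external URL
      if (!known.has(target)) broken.push({ page, target });
    }
  }
  return broken;
}

const brokenLinks = findBrokenLinks({
  "index.md": "See [guide](guide.md#setup), [API](api.md), or [site](https://example.com).",
  "guide.md": "Back to [home](index.md) and [top](#top).",
});
// brokenLinks is [{ page: "index.md", target: "api.md" }]
```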

Data Flow Architecture

Request Processing Flow

1. MCP Client Request
↓
2. Schema Validation (Zod)
↓
3. Tool Handler Routing
↓
4. Business Logic Execution
↓
5. Result Processing
↓
6. Resource Storage
↓
7. Response Formatting
↓
8. MCP Protocol Response
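
The flow above can be sketched as a single dispatch function. This is an illustration under assumptions, not the server's actual code: `PipelineTool`, `pipelineTools`, and `handleToolCall` are hypothetical names, the `validate` function stands in for a Zod schema, and the resource storage step is omitted. The tool lookup happens before validation here because each schema belongs to a specific tool.

```typescript
// Hypothetical tool shape: `validate` stands in for a Zod schema's parse step.
interface PipelineTool {
  validate: (raw: unknown) => { repoPath: string };
  run: (args: { repoPath: string }) => { summary: string };
}

const pipelineTools: Record<string, PipelineTool> = {
  analyze_repository: {
    validate: (raw) => {
      const repoPath = (raw as { repoPath?: unknown } | null)?.repoPath;
      if (typeof repoPath !== "string") {
        throw new Error("repoPath must be a string");
      }
      return { repoPath };
    },
    run: (args) => ({ summary: `analyzed ${args.repoPath}` }),
  },
};

// Route to the tool, validate its input, execute it, and format the response.
function handleToolCall(
  toolName: string,
  rawArgs: unknown,
): { isError: boolean; text: string } {
  const tool = pipelineTools[toolName];
  if (tool === undefined) {
    return { isError: true, text: `unknown tool: ${toolName}` };
  }
  try {
    const args = tool.validate(rawArgs); // schema validation
    const result = tool.run(args); // business logic
    return { isError: false, text: JSON.stringify(result) }; // formatting
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { isError: true, text: message };
  }
}
```

Validation failures and business-logic errors both surface as an error response rather than an unhandled exception, which keeps the protocol layer predictable for the client.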

Memory System Flow

1. Tool Execution
↓
2. Result Analysis
↓
3. Pattern Extraction
↓
4. Similarity Matching
↓
5. Storage Update
↓
6. Learning Integration
↓
7. Future Recommendations Enhancement

Performance Architecture

Performance Characteristics

Analysis Performance:

  • Small repos (<100 files): <500ms
  • Medium repos (100-1000 files): <2s
  • Large repos (more than 1000 files): <10s

Memory Efficiency:

  • Streaming processing for large repositories
  • Lazy loading of analysis components
  • Garbage collection optimization

Scalability:

  • Stateless design enables horizontal scaling
  • No database dependencies for core functionality
  • Process isolation for security and reliability

Optimization Strategies

  1. Caching:

    • File system metadata caching
    • Analysis result memoization
    • Pattern matching optimization
  2. Lazy Loading:

    • On-demand tool loading
    • Progressive analysis depth
    • Conditional feature activation
  3. Parallel Processing:

    • Concurrent file analysis
    • Parallel validation checks
    • Asynchronous resource generation
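
Analysis result memoization can be sketched with a small keyed cache. The `memoizeByKey` helper and the cache key below are hypothetical. The sketch also assumes the cache lives only as long as the closure that owns it (for example, one tool invocation), which keeps it compatible with the stateless design.

```typescript
// Cache results by a caller-supplied key so repeated requests skip recomputation.
function memoizeByKey<A, R>(
  compute: (arg: A) => R,
  keyOf: (arg: A) => string,
): (arg: A) => R {
  const cache = new Map<string, R>();
  return (arg: A): R => {
    const key = keyOf(arg);
    if (cache.has(key)) return cache.get(key) as R;
    const value = compute(arg);
    cache.set(key, value);
    return value;
  };
}

let computeCalls = 0;
const countLines = memoizeByKey(
  (file: { path: string; content: string }) => {
    computeCalls += 1;
    return file.content.split("\n").length;
  },
  // Hypothetical cache key: path plus content length; a content hash would be safer.
  (file) => `${file.path}:${file.content.length}`,
);

const readmeFile = { path: "README.md", content: "# Title\n\nBody" };
countLines(readmeFile);
countLines(readmeFile);
// computeCalls is 1: the second call was served from the cache.
```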

Error Handling Architecture

Error Categories

enum ErrorCategory {
  VALIDATION = "validation", // Input validation failures
  FILESYSTEM = "filesystem", // File system access issues
  NETWORK = "network", // GitHub API, external requests
  PROCESSING = "processing", // Business logic errors
  CONFIGURATION = "configuration", // Setup and config issues
}

Error Recovery Strategies

  1. Graceful Degradation: Partial results when possible
  2. Retry Logic: Exponential backoff for transient failures
  3. Fallback Options: Alternative approaches when primary fails
  4. User Guidance: Actionable error messages and solutions
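
Retry with exponential backoff can be sketched as follows. The function names, the 200 ms base delay, and the 5 s cap are assumptions chosen for illustration. This version retries on any error; a production version would likely retry only failures it can classify as transient, such as network errors.

```typescript
// Delay before the retry that follows failed attempt `attempt` (0-based):
// baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 200, maxMs = 5000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry an async operation on failure; rethrow the last error once attempts run out.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        // Wait before the next attempt; no wait after the final failure.
        await new Promise<void>((resolve) =>
          setTimeout(resolve, backoffDelayMs(attempt, baseMs)),
        );
      }
    }
  }
  throw lastError;
}
```

With the defaults, the waits between attempts are 200 ms and then 400 ms, so a call that fails three times gives up after roughly 600 ms of waiting.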

Security Architecture

Threat Model

Protected Assets:

  • User repository contents
  • Generated configuration files
  • Deployment credentials
  • Memory system data

Threat Vectors:

  • Malicious repository content
  • Compromised dependencies
  • Network interception
  • Privilege escalation

Security Measures

  1. Input Validation:

    • Zod schema validation for all inputs
    • Path traversal prevention
    • Content sanitization
  2. Execution Environment:

    • No arbitrary code execution
    • Sandboxed file operations
    • Limited network access
  3. Secrets Management:

    • OIDC token-based authentication
    • No long-lived credential storage
    • Environment variable isolation
  4. Generated Security:

    • Minimal permission workflows
    • Security header configuration
    • HTTPS enforcement
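
Path traversal prevention, listed under input validation, can be illustrated with a resolver that rejects any path escaping the repository root. This is a sketch assuming POSIX-style `/` separators; the `resolveInsideRoot` function is hypothetical, and real code would more likely build on Node's `path` module and also handle backslashes and symbolic links.

```typescript
// Resolve a user-supplied relative path against a root, rejecting escapes.
// Assumes POSIX-style "/" separators.
function resolveInsideRoot(root: string, userPath: string): string {
  if (userPath.startsWith("/")) {
    throw new Error("absolute paths are not allowed");
  }
  const segments: string[] = [];
  for (const segment of userPath.split("/")) {
    if (segment === "" || segment === ".") continue; // no-op segments
    if (segment === "..") {
      if (segments.length === 0) {
        throw new Error("path escapes the repository root");
      }
      segments.pop(); // step up one level, still inside the root
    } else {
      segments.push(segment);
    }
  }
  // Strip trailing slashes from the root before joining.
  return [root.replace(/\/+$/, ""), ...segments].join("/");
}

const safePath = resolveInsideRoot("/repo", "docs/../README.md");
// safePath is "/repo/README.md"
```

A call such as `resolveInsideRoot("/repo", "../etc/passwd")` throws instead of returning a path outside the root.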

Extension Architecture

Adding New Tools

// 1. Implement tool function
export async function newTool(args: NewToolArgs): Promise<NewToolResult> {
  // Business logic
}

// 2. Define schema
const newToolSchema = z.object({
  // Input validation
});

// 3. Register in server
const TOOLS = [
  // ... existing tools
  {
    name: "new_tool",
    description: "Tool description",
    inputSchema: newToolSchema,
  },
];

// 4. Add handler in CallToolRequestSchema (a case inside the switch on the tool name)
case "new_tool": {
  const result = await newTool(args);
  return result;
}

Adding New SSG Support

// 1. Extend SSG enum
type SSGType =
  | "jekyll"
  | "hugo"
  | "docusaurus"
  | "mkdocs"
  | "eleventy"
  | "new-ssg";

// 2. Add scoring logic
function scoreNewSSG(analysis: Analysis): number {
  // Scoring implementation
}

// 3. Add configuration generator
function generateNewSSGConfig(args: ConfigArgs): ConfigResult {
  // Configuration generation
}

// 4. Add deployment workflow
function createNewSSGWorkflow(args: DeployArgs): WorkflowResult {
  // GitHub Actions workflow
}

Future Architecture Considerations

Planned Enhancements

  1. Distributed Memory: Shared learning across installations
  2. Plugin System: Third-party tool integration
  3. Real-time Monitoring: Live deployment health tracking
  4. Advanced Analytics: Usage patterns and optimization insights

Scalability Roadmap

  1. Phase 1: Current stateless design (✅ Complete)
  2. Phase 2: Memory system optimization (🔄 In Progress)
  3. Phase 3: Distributed processing capabilities
  4. Phase 4: Cloud-native deployment options

This architecture provides a solid foundation for intelligent documentation deployment while maintaining simplicity, security, and extensibility.