ADR-012: Priority Scoring System for Documentation Drift Detection
Status
Accepted
Date
2025-01-14
Context
As documentation drift detection scales to larger codebases, we need a systematic way to prioritize which drift issues require immediate attention versus those that can be deferred. Without prioritization, developers face information overload and may miss critical documentation updates while being overwhelmed by low-priority notifications.
Core Problem: Documentation drift detection can identify hundreds of potential issues in a large codebase, but not all issues have equal impact:
- High-Impact Issues: Frequently-used APIs with breaking changes that affect many users
- Medium-Impact Issues: Internal utilities with moderate usage and non-breaking changes
- Low-Impact Issues: Rarely-used code paths with minor changes or comprehensive existing documentation
Current State: The drift detection system (ADR-009) identifies drift issues but lacks prioritization, leading to:
- Developer fatigue from too many notifications
- Critical issues being missed among low-priority noise
- Inefficient documentation maintenance workflows
- Difficulty focusing on high-value documentation updates
Strategic Importance: Effective prioritization directly impacts:
- Developer productivity and focus
- Documentation quality and completeness
- User experience with up-to-date documentation
- Maintenance efficiency and resource allocation
Decision
We will implement a multi-factor priority scoring system for documentation drift that considers code complexity, usage frequency, change magnitude, documentation coverage, staleness, and user feedback. The system calculates an overall priority score (0-100) and provides actionable recommendations (critical/high/medium/low) to guide developers in addressing the most impactful drift issues first.
Priority Scoring Architecture:
1. Multi-Factor Scoring Model
Purpose: Comprehensive priority assessment using multiple data sources
Factors:
- Code Complexity (weight: 0.20): Measures cyclomatic complexity and code structure complexity
- Usage Frequency (weight: 0.25): Tracks function calls, class instantiations, and imports
- Change Magnitude (weight: 0.25): Assesses the scope and impact of code changes
- Documentation Coverage (weight: 0.15): Evaluates existing documentation completeness
- Staleness (weight: 0.10): Measures time since last documentation update
- User Feedback (weight: 0.05): Incorporates user-reported issues and feedback
2. Configurable Weight System
Purpose: Allow projects to tune priority scoring based on their specific needs
Capabilities:
- Default weights optimized for general use cases
- Project-specific weight configuration (see the sketch after this list)
- Dynamic weight adjustment based on project maturity
- A/B testing support for weight optimization
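A minimal sketch of project-specific weight configuration: merge a project's partial overrides into the defaults, then validate the result. The `resolveWeights` helper and the sum-to-1.0 check are assumptions layered on the `PriorityWeights` shape and `DEFAULT_WEIGHTS` shown under Implementation Details below, not the shipped implementation.

```typescript
// Hypothetical helper: merge a project's partial overrides into the defaults,
// then verify the result still forms a convex combination (weights sum to 1.0).
function resolveWeights(overrides: Partial<PriorityWeights>): PriorityWeights {
  const weights = { ...DEFAULT_WEIGHTS, ...overrides };
  const total = Object.values(weights).reduce((sum, w) => sum + w, 0);
  if (Math.abs(total - 1.0) > 1e-6) {
    throw new Error(`Priority weights must sum to 1.0, got ${total}`);
  }
  return weights;
}

// Example: an API-heavy project that emphasizes change magnitude and usage.
const apiHeavyWeights = resolveWeights({
  codeComplexity: 0.05, // down from 0.20
  usageFrequency: 0.30, // up from 0.25
  changeMagnitude: 0.35, // up from 0.25
});
```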
3. Actionable Recommendations
Purpose: Translate priority scores into clear action guidance
Recommendation Levels:
- Critical (90-100): Immediate attention required, breaking changes affecting many users
- High (70-89): Address within current sprint, significant impact on user experience
- Medium (40-69): Plan for next sprint, moderate impact
- Low (0-39): Backlog item, minimal immediate impact
Implementation Details:
Priority Scoring Interface
```typescript
interface DriftPriorityScore {
  overall: number; // 0-100
  factors: {
    codeComplexity: number; // 0-100
    usageFrequency: number; // 0-100
    changeMagnitude: number; // 0-100
    documentationCoverage: number; // 0-100
    staleness: number; // 0-100
    userFeedback: number; // 0-100
  };
  recommendation: "critical" | "high" | "medium" | "low";
  suggestedAction: string;
}

interface PriorityWeights {
  codeComplexity: number; // default: 0.20
  usageFrequency: number; // default: 0.25
  changeMagnitude: number; // default: 0.25
  documentationCoverage: number; // default: 0.15
  staleness: number; // default: 0.10
  userFeedback: number; // default: 0.05
}

interface PrioritizedDriftResult extends DriftDetectionResult {
  priorityScore?: DriftPriorityScore;
}

interface UsageMetadata {
  filePath: string;
  functionCalls: Map<string, number>; // function name -> call count
  classInstantiations: Map<string, number>; // class name -> instantiation count
  imports: Map<string, number>; // symbol -> import count
}
```
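The `PriorityScorer` class below also references `DEFAULT_WEIGHTS`, `FactorScores`, and `RecommendationLevel`, which the ADR does not define (`DriftDetectionResult` and `DocumentationDrift` presumably come from the drift detection system of ADR-009). Minimal definitions consistent with the interfaces above would be:

```typescript
// Assumed definitions (not spelled out in the ADR) used by PriorityScorer below.
const DEFAULT_WEIGHTS: PriorityWeights = {
  codeComplexity: 0.20,
  usageFrequency: 0.25,
  changeMagnitude: 0.25,
  documentationCoverage: 0.15,
  staleness: 0.10,
  userFeedback: 0.05,
};

type FactorScores = DriftPriorityScore["factors"];
type RecommendationLevel = DriftPriorityScore["recommendation"];
```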
Scoring Algorithm
```typescript
class PriorityScorer {
  calculatePriorityScore(
    drift: DocumentationDrift,
    usageMetadata: UsageMetadata,
    weights: PriorityWeights = DEFAULT_WEIGHTS,
  ): DriftPriorityScore {
    const factors = {
      codeComplexity: this.calculateComplexityScore(drift.codeChanges),
      usageFrequency: this.calculateUsageScore(usageMetadata),
      changeMagnitude: this.calculateMagnitudeScore(drift),
      documentationCoverage: this.calculateCoverageScore(drift.affectedDocs),
      staleness: this.calculateStalenessScore(drift.detectedAt),
      userFeedback: this.calculateFeedbackScore(drift),
    };
    const overall = this.weightedSum(factors, weights);
    return {
      overall,
      factors,
      recommendation: this.getRecommendation(overall),
      suggestedAction: this.generateAction(factors, overall),
    };
  }

  private weightedSum(factors: FactorScores, weights: PriorityWeights): number {
    return (
      factors.codeComplexity * weights.codeComplexity +
      factors.usageFrequency * weights.usageFrequency +
      factors.changeMagnitude * weights.changeMagnitude +
      factors.documentationCoverage * weights.documentationCoverage +
      factors.staleness * weights.staleness +
      factors.userFeedback * weights.userFeedback
    );
  }

  private getRecommendation(score: number): RecommendationLevel {
    if (score >= 90) return "critical";
    if (score >= 70) return "high";
    if (score >= 40) return "medium";
    return "low";
  }
}
```
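To make the weighting concrete, here is a worked example with the default weights. The factor values are invented for illustration; in practice they would come from the private factor helpers.

```typescript
// Invented factor scores for a heavily used API with a major change.
const factors: FactorScores = {
  codeComplexity: 60,
  usageFrequency: 90,
  changeMagnitude: 95,
  documentationCoverage: 40,
  staleness: 50,
  userFeedback: 20,
};

const overall =
  factors.codeComplexity * 0.20 + // 12
  factors.usageFrequency * 0.25 + // 22.5
  factors.changeMagnitude * 0.25 + // 23.75
  factors.documentationCoverage * 0.15 + // 6
  factors.staleness * 0.10 + // 5
  factors.userFeedback * 0.05; // 1
// overall = 70.25, which falls in the 70-89 band → recommendation "high"
```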
Alternatives Considered
Simple Severity-Based Prioritization
- Pros: Simple implementation, easy to understand, fast execution
- Cons: Too simplistic, ignores usage patterns and real-world impact, doesn't scale
- Decision: Rejected - fails to capture the complexity of real-world documentation maintenance
Manual Prioritization Only
- Pros: Full developer control, no algorithmic complexity, flexible
- Cons: Doesn't scale, high cognitive load, inconsistent prioritization, time-consuming
- Decision: Rejected - manual prioritization doesn't scale to large codebases
Single-Factor Scoring
- Pros: Simple to implement and understand, fast calculation
- Cons: Doesn't capture complexity of real-world scenarios, misses important factors
- Decision: Rejected - single-factor approaches are too narrow for comprehensive prioritization
ML-Based Prioritization
- Pros: Could learn from historical patterns, adaptive to project-specific needs
- Cons: Requires training data, model maintenance overhead, less transparent, harder to debug
- Decision: Deferred to future enhancement - current rule-based approach is sufficient and more transparent
Consequences
Positive
- Developer Focus: Developers can focus on high-impact drift issues first, reducing cognitive load
- Improved Efficiency: Data-driven prioritization improves documentation maintenance efficiency
- Prevents Critical Gaps: Helps prevent critical documentation gaps from being overlooked
- Scalability: System scales to large codebases without overwhelming developers
- Configurability: Project-specific weight configuration allows customization
- Actionable Guidance: Clear recommendations help developers make informed decisions
Negative
- Additional Complexity: Priority scoring adds complexity to the drift detection system
- Performance Impact: Usage metadata collection may impact analysis performance
- Algorithm Tuning: Scoring algorithm needs tuning based on real-world usage patterns
- Metadata Requirements: Requires usage tracking infrastructure
- Weight Configuration: Projects need to understand and configure weights appropriately
Risks and Mitigations
- Inaccurate Scoring: Validate scoring accuracy through user feedback and iteration
- Performance Impact: Implement efficient usage metadata collection with caching
- Weight Misconfiguration: Provide sensible defaults and clear documentation
- Over-Prioritization: Ensure low-priority issues aren't completely ignored
Integration Points
Drift Detection Integration (ADR-009)
- Priority scoring integrates seamlessly with existing drift detection
- Enhances drift detection results with actionable prioritization
- Uses drift detection data for factor calculation
Repository Analysis Integration (ADR-002)
- Leverages repository analysis for code complexity assessment
- Uses analysis data for usage frequency calculation
- Integrates with code structure analysis
Knowledge Graph Integration
- Stores priority scores in knowledge graph for historical tracking
- Enables trend analysis of priority patterns over time
- Supports learning from prioritization outcomes
Implementation Roadmap
Phase 1: Core Scoring System (Completed)
- Scoring logic lives in `src/utils/drift-detector.ts` and is exercised whenever `ChangeWatcher` builds a snapshot; it computes weighted scores for complexity, change magnitude, coverage, staleness, and the other factors listed above.
- Priority results are exposed via `src/utils/change-watcher.ts` and surfaced through the `change_watcher` tool (`src/tools/change-watcher.ts`), so drift runs produce prioritized summaries today.
- Recommendations and suggested actions derive directly from the `determineRecommendation`/`generateSuggestedAction` helpers in the same module, matching the ADR narrative.
Phase 2: Usage Metadata Collection (In Progress)
- Usage heuristics already depend on exports, documentation references, and optional `UsageMetadata` objects, but the latter is not yet populated from runtime telemetry.
- The current implementation defaults to counting exports and documentation references, so scoring is approximated but still deterministic and traceable.
- To reach the vision described in the ADR, we need to capture actual function call frequencies, class instantiations, and imports through instrumentation (e.g., a `UsageMetadataCollector` that parses repository call graphs or observes runtime traces); a static-analysis sketch follows this list.
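As an indication of what static collection could look like, here is a small sketch that counts imported symbols in one file using the TypeScript compiler API. The `countImports` function and its place inside a future `UsageMetadataCollector` are assumptions, not existing code.

```typescript
import * as ts from "typescript";
import { readFileSync } from "node:fs";

// Hypothetical building block for a UsageMetadataCollector: count how often
// each symbol appears in the import declarations of a single source file.
function countImports(filePath: string): Map<string, number> {
  const source = ts.createSourceFile(
    filePath,
    readFileSync(filePath, "utf-8"),
    ts.ScriptTarget.Latest,
    /* setParentNodes */ true,
  );
  const counts = new Map<string, number>();
  const bump = (name: string) => counts.set(name, (counts.get(name) ?? 0) + 1);

  source.forEachChild((node) => {
    if (ts.isImportDeclaration(node) && node.importClause) {
      const { name, namedBindings } = node.importClause;
      if (name) bump(name.text); // default import
      if (namedBindings && ts.isNamedImports(namedBindings)) {
        namedBindings.elements.forEach((el) => bump(el.name.text)); // named imports
      }
    }
  });
  return counts;
}
```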
Phase 3: Advanced Features (Planned)
- Machine learning-based weight optimization
- Historical trend analysis tracking priority evolution in the knowledge graph
- Project-specific weight recommendations driven by usage and feedback
- User feedback ingestion from issue trackers / developer surveys to tune the `userFeedback` factor
Quality Assurance
Scoring Accuracy Validation
describe("PriorityScorer", () => {
it("should calculate accurate priority scores");
it("should handle edge cases (no usage data, missing metadata)");
it("should respect weight configuration");
it("should generate appropriate recommendations");
it("should provide actionable suggestions");
});
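The specs above are stubs; one way to flesh out the weight-configuration case, assuming a Jest-style runner and hypothetical `sampleDrift`/`sampleUsage` fixtures:

```typescript
it("should respect weight configuration", () => {
  const scorer = new PriorityScorer();
  // Zero every weight except changeMagnitude, so the overall score
  // must collapse to that single factor.
  const magnitudeOnly: PriorityWeights = {
    codeComplexity: 0,
    usageFrequency: 0,
    changeMagnitude: 1.0,
    documentationCoverage: 0,
    staleness: 0,
    userFeedback: 0,
  };
  const score = scorer.calculatePriorityScore(sampleDrift, sampleUsage, magnitudeOnly);
  expect(score.overall).toBe(score.factors.changeMagnitude);
});
```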
Performance Testing
- Usage metadata collection performance benchmarks
- Scoring calculation performance under load
- Memory usage with large codebases
User Feedback Integration
- Collect developer feedback on prioritization accuracy
- Iterate on scoring algorithm based on real-world usage
- Validate recommendation effectiveness
Success Metrics
Prioritization Effectiveness
- Critical Issue Detection: 95%+ of critical issues correctly identified
- Developer Satisfaction: 80%+ developers find prioritization helpful
- Documentation Coverage: Improved coverage of high-priority areas
- Response Time: Faster response to critical drift issues
System Performance
- Scoring Performance: Under 100ms per drift issue scored
- Metadata Collection: Under 5% overhead on analysis performance
- Accuracy: Priority scores correlate with developer-assigned priorities
Future Enhancements
Advanced Prioritization
- Machine learning-based weight optimization
- Project-specific learning and adaptation
- User behavior analysis for usage frequency
- Integration with issue tracking systems
Enhanced Metadata
- Runtime usage tracking
- User feedback collection
- Documentation view analytics
- Search query analysis
Implementation Status
Status: ✅ Fully implemented (2025-12-13)
Evidence:
Phase 1: Core Scoring System ✅ Completed
- `src/utils/drift-detector.ts` contains the complete scoring interfaces, factor calculation helpers, recommendation generator, and the `getPrioritizedDriftResults` pipeline that attaches priority scores to every detected drift.
- `src/utils/change-watcher.ts` binds the scoring pipeline to filesystem/git triggers, ensuring every run of the `ChangeWatcher` produces prioritized drift summaries; the `change_watcher` tool exposes those results today (`src/tools/change-watcher.ts`).
- All six scoring factors implemented:
- ✅ Code Complexity (weight: 0.20) - Uses AST complexity metrics
- ✅ Usage Frequency (weight: 0.25) - Enhanced with call graph analysis
- ✅ Change Magnitude (weight: 0.25) - Based on breaking/major/minor changes
- ✅ Documentation Coverage (weight: 0.15) - Evaluates existing docs
- ✅ Staleness (weight: 0.10) - Time-based scoring
- ✅ User Feedback (weight: 0.05) - Issue tracker integration
Phase 2: Usage Metadata Collection ✅ Completed
- `src/utils/usage-metadata.ts` - Enhanced `UsageMetadataCollector` class:
- ✅ Call Graph Analysis: Builds call graphs for exported functions using `ASTAnalyzer.buildCallGraph()` to count actual function calls
- ✅ Class Instantiation Detection: Extracts class instantiations from AST dependencies
- ✅ Import Counting: Improved import counting with better accuracy
- ✅ Documentation References: Counts references from documentation sections
- ✅ Async/Sync Support: `collect()` method uses async call graph analysis, `collectSync()` provides fallback
- ✅ Performance Optimization: Limits call graph building to top 50 exported functions to maintain performance
Implementation Details:
- Call graphs built with `maxDepth: 2` for performance
- Graceful fallback to sync collection if analyzer unavailable
- Integration with `change-watcher.ts` updated to use async collection
Phase 3: User Feedback Integration ✅ Completed
- `src/utils/user-feedback-integration.ts` - New module for issue tracker integration:
- ✅ GitHub Issues API: Full implementation with file/symbol matching
- ✅ Issue Parsing: Extracts affected files and symbols from issue bodies
- ✅ Severity Detection: Determines severity from issue labels
- ✅ Scoring Algorithm: Calculates feedback scores based on the following point values (a worked sketch follows this list):
- Critical open issues (30 points each)
- Open issues (10 points each)
- Recent activity (5 points per recent issue)
- Symbol relevance matching (15 points per relevant issue)
- ✅ Caching: 5-minute TTL cache to reduce API calls
- ✅ Graceful Degradation: Returns 0 if integration not configured or API fails
- ⚠️ Placeholders: GitLab, Jira, Linear integrations prepared but not yet implemented
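A hypothetical sketch of that point-based feedback scoring, clamped to the 0-100 factor range. The `FeedbackSignals` field names are assumptions for illustration, not the module's actual types.

```typescript
// Assumed input shape: counts derived from matched issue-tracker data.
interface FeedbackSignals {
  criticalOpenIssues: number; // open issues labeled critical
  openIssues: number; // other open issues
  recentIssues: number; // issues with recent activity
  relevantIssues: number; // issues matched to the drifted file/symbols
}

// Point values from the list above, capped at the maximum factor score.
function feedbackScore(s: FeedbackSignals): number {
  const raw =
    s.criticalOpenIssues * 30 +
    s.openIssues * 10 +
    s.recentIssues * 5 +
    s.relevantIssues * 15;
  return Math.min(100, raw);
}

// e.g. 1 critical + 2 open + 3 recent + 1 relevant = 30 + 20 + 15 + 15 = 80
```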
Integration Points:
- `DriftDetector.setUserFeedbackIntegration()` - Allows injection of feedback integration
- `DriftDetector.calculatePriorityScoreAsync()` - Async version supporting user feedback
- `DriftDetector.detectDriftWithPriorityAsync()` - Async drift detection with feedback
Testing ✅ Completed
- `tests/utils/drift-detector-priority.test.ts` - Comprehensive test suite:
- ✅ Custom weights configuration
- ✅ Priority score calculation with all factors
- ✅ Recommendation thresholds (critical/high/medium/low)
- ✅ Edge cases (empty results, missing files)
- ✅ Score reproducibility
- ✅ Prioritized drift detection
- ✅ Result sorting by priority
- `tests/utils/usage-metadata.test.ts` - Usage metadata tests:
- ✅ Import and reference counting
- ✅ Async collection with call graph analysis
- ✅ Integration with drift detector scoring
Test Coverage: All recommendation thresholds validated:
- Breaking changes → Critical recommendation ✅
- Major changes → High recommendation ✅
- Minor changes → Medium recommendation ✅
- Patch changes → Low recommendation ✅
Validation
- ✅ `npm run release:dry-run` validates scoring execution
- ✅ Local change-watcher runs produce prioritized summaries
- ✅ All tests passing with comprehensive coverage
- ✅ Integration tests validate end-to-end priority scoring workflow
Confidence: 95% - Implementation is complete and tested. Remaining 5% accounts for:
- Future enhancements (ML-based weight optimization)
- Additional issue tracker integrations (GitLab, Jira, Linear)
- Runtime telemetry collection (beyond AST analysis)
Next Steps (Future Enhancements)
- Additional Issue Trackers: Implement GitLab, Jira, and Linear API integrations
- Runtime Telemetry: Add optional runtime instrumentation for actual call frequencies
- ML-Based Weight Optimization: Implement machine learning for adaptive weight tuning
- Historical Trend Analysis: Track priority evolution in knowledge graph for pattern recognition
- Performance Optimization: Further optimize call graph building for very large codebases
References
- ADR-002: Multi-Layered Repository Analysis Engine Design
- ADR-009: Content Accuracy and Validation Framework
- Commit: 40afe64 - feat: Add priority scoring system for documentation drift (#83)
- GitHub Issue: #83 - Priority scoring system for documentation drift