# Building Distributed Annotation Platforms

*Architecture patterns and lessons learned from building large-scale data labeling systems that power machine learning pipelines*
Data annotation is the unglamorous foundation of modern AI. Behind every impressive model is a system that efficiently collected, validated, and processed millions of human judgments. Here's how we build these systems to scale.
## The Core Architecture
A distributed annotation platform isn't just a task queue—it's a real-time collaboration system with strict consistency requirements. The fundamental tension: you need low latency for annotator experience but strong consistency for data integrity.
```typescript
// Simplified task routing architecture
interface TaskRouter {
  // Route task to appropriate annotator pool
  route(task: AnnotationTask): Promise<AnnotatorAssignment>;

  // Handle annotator going offline mid-task
  handleDisconnect(annotatorId: string): Promise<void>;

  // Rebalance when capacity changes
  rebalance(poolId: string): Promise<void>;
}

class SmartTaskRouter implements TaskRouter {
  async route(task: AnnotationTask): Promise<AnnotatorAssignment> {
    const candidates = await this.getQualifiedAnnotators(task);
    const scored = candidates.map(a => ({
      annotator: a,
      score: this.calculateFitScore(a, task)
    }));
    // Weighted random selection prevents hotspotting
    return this.weightedSelect(scored);
  }

  private calculateFitScore(annotator: Annotator, task: AnnotationTask): number {
    return (
      annotator.skillMatch(task.requiredSkills) * 0.4 +
      annotator.recentAccuracy * 0.3 +
      (1 - annotator.currentLoad) * 0.3
    );
  }
}
```

## Handling the CAP Theorem
In annotation systems, we can't sacrifice consistency—duplicate or conflicting labels corrupt training data. Our approach:
- Optimistic locking on task assignment with short TTLs
- Idempotent submissions so network retries are safe
- Event sourcing for complete audit trails
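The idempotent-submission bullet can be sketched as a submission log keyed by a client-generated request ID, so a network retry replays the original result instead of writing a duplicate label. A minimal sketch; the `SubmissionStore` name and `request_id` parameter are illustrative, not from the original, and a production store would be a database table with a unique constraint rather than a dict:

```python
class SubmissionStore:
    """Idempotent submissions: replaying a request ID is a no-op."""

    def __init__(self):
        self._by_request_id = {}  # request_id -> stored record

    def submit(self, request_id: str, task_id: str, annotation: dict) -> dict:
        # A retry with the same request_id returns the original record
        # instead of writing a second, conflicting label.
        if request_id in self._by_request_id:
            return self._by_request_id[request_id]
        record = {"task_id": task_id, "annotation": annotation}
        self._by_request_id[request_id] = record
        return record
```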
```python
from datetime import datetime, timedelta

class OptimisticLockException(Exception):
    pass

class TaskExpiredException(Exception):
    pass

class TaskAssignment:
    def __init__(self, task_id: str, annotator_id: str):
        self.task_id = task_id
        self.annotator_id = annotator_id
        self.version = 0
        self.expires_at = datetime.now() + timedelta(minutes=30)

    def submit(self, annotation: dict, expected_version: int) -> bool:
        if self.version != expected_version:
            raise OptimisticLockException("Task was modified")
        if datetime.now() > self.expires_at:
            raise TaskExpiredException("Assignment expired")
        # Atomic submission with version bump
        self.annotation = annotation
        self.version += 1
        self.submitted_at = datetime.now()
        return True
```

## Quality Control as a First-Class Citizen
Quality isn't a post-hoc check—it's embedded in the routing layer:
- Consensus routing: Critical tasks go to 3-5 annotators
- Expert escalation: Disagreements trigger senior review
- Confidence calibration: Annotators flag uncertain responses
The system tracks per-annotator, per-task-type accuracy and adjusts routing dynamically.
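One way to track per-annotator, per-task-type accuracy is an exponentially weighted moving average that feeds back into the routing score. A minimal sketch; the `AccuracyTracker` class, the 0.9 decay, and the 0.5 prior are assumptions, not from the original:

```python
class AccuracyTracker:
    """EWMA accuracy per (annotator, task_type), used to bias routing."""

    def __init__(self, decay: float = 0.9, default: float = 0.5):
        self.decay = decay
        self.default = default  # prior for pairs with no history yet
        self._acc = {}          # (annotator_id, task_type) -> ewma

    def record(self, annotator_id: str, task_type: str, correct: bool) -> None:
        key = (annotator_id, task_type)
        prev = self._acc.get(key, self.default)
        # Each new observation shifts the estimate by (1 - decay)
        self._acc[key] = self.decay * prev + (1 - self.decay) * (1.0 if correct else 0.0)

    def accuracy(self, annotator_id: str, task_type: str) -> float:
        return self._acc.get((annotator_id, task_type), self.default)
```

The decay controls how fast old history fades: a recent run of mistakes pulls the score down quickly, which lets the router deprioritize that annotator for that task type without a manual review.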
## Lessons from Production
Lesson 1: Annotators are users, not workers. UX matters enormously. A confusing interface doesn't just slow throughput—it introduces systematic errors. We A/B test UI changes like any product team.
Lesson 2: Build for partial failure. At scale, something is always broken. Tasks should gracefully timeout and reroute. Annotator sessions should survive brief network interruptions.
Lesson 3: Observability is survival. We track:
- Task completion latency percentiles
- Annotator session health
- Inter-annotator agreement in real-time
- Queue depths by task type
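Real-time agreement can be as simple as pairwise percent agreement over the labels collected for one task. A minimal sketch; the function name is an assumption, and production systems often use chance-corrected measures like Cohen's kappa or Krippendorff's alpha instead:

```python
from itertools import combinations

def pairwise_agreement(labels: list) -> float:
    """Fraction of annotator pairs that gave the same label for one task."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # zero or one label trivially agrees with itself
    return sum(a == b for a, b in pairs) / len(pairs)
```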
```typescript
// gauge() and histogram() come from your metrics client library
const metrics = {
  taskLatencyP99: gauge("task_latency_p99_ms"),
  annotatorSessionHealth: gauge("annotator_sessions_healthy"),
  agreementScore: histogram("inter_annotator_agreement"),
  queueDepth: gauge("task_queue_depth", ["task_type"])
};
```

Lesson 4: Plan for the data you don't have. Cold start is real. New task types have no golden sets. New annotators have no track record. Build systems that bootstrap quality signals quickly.
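One common bootstrap is to seed a new annotator's queue with a fixed ratio of golden (known-answer) tasks until enough track record accumulates. A hedged sketch; the `needs_golden_task` helper, the 50-task warmup, and the 20% ratio are assumptions, not from the original:

```python
def needs_golden_task(tasks_completed: int, golden_seen: int,
                      warmup: int = 50, golden_ratio: float = 0.2) -> bool:
    """Decide whether an annotator's next task should be a golden task.

    During the warmup window we keep the golden fraction near
    `golden_ratio` so a track record accumulates quickly; afterwards,
    consensus routing and occasional spot checks carry the load.
    """
    if tasks_completed >= warmup:
        return False  # past warmup: rely on consensus and spot checks
    expected_golden = tasks_completed * golden_ratio
    return golden_seen < expected_golden
```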
The best annotation platforms are invisible to the ML teams consuming the data. They just see clean, consistent, well-labeled datasets appearing on schedule. The complexity lives in the infrastructure.