# Building Distributed Annotation Platforms

*Architecture patterns and lessons learned from building large-scale data labeling systems that power machine learning pipelines*
Data annotation is the unglamorous foundation of modern AI. Behind every impressive model is a system that efficiently collected, validated, and processed millions of human judgments. Here's how we build these systems to scale.
## The Core Architecture
A distributed annotation platform isn't just a task queue—it's a real-time collaboration system with strict consistency requirements. The fundamental tension: you need low latency for annotator experience but strong consistency for data integrity.
```typescript
// Simplified task routing architecture
interface TaskRouter {
  // Route task to appropriate annotator pool
  route(task: AnnotationTask): Promise<AnnotatorAssignment>;

  // Handle annotator going offline mid-task
  handleDisconnect(annotatorId: string): Promise<void>;

  // Rebalance when capacity changes
  rebalance(poolId: string): Promise<void>;
}

class SmartTaskRouter implements TaskRouter {
  async route(task: AnnotationTask): Promise<AnnotatorAssignment> {
    const candidates = await this.getQualifiedAnnotators(task);
    const scored = candidates.map(a => ({
      annotator: a,
      score: this.calculateFitScore(a, task)
    }));
    // Weighted random selection prevents hotspotting
    return this.weightedSelect(scored);
  }

  private calculateFitScore(annotator: Annotator, task: AnnotationTask): number {
    return (
      annotator.skillMatch(task.requiredSkills) * 0.4 +
      annotator.recentAccuracy * 0.3 +
      (1 - annotator.currentLoad) * 0.3
    );
  }
}
```

## Handling the CAP Theorem
In annotation systems, we can't sacrifice consistency—duplicate or conflicting labels corrupt training data. Our approach:
- Optimistic locking on task assignment with short TTLs
- Idempotent submissions so network retries are safe
- Event sourcing for complete audit trails
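The idempotent-submission bullet can be sketched as a submission log keyed by a client-generated request ID, so a network retry replays the original result instead of writing a duplicate label. A minimal sketch; the `SubmissionStore` name and `request_id` parameter are illustrative, not from the original, and a production store would be a database table with a unique constraint rather than a dict:

```python
class SubmissionStore:
    """Idempotent submissions: replaying a request ID is a no-op."""

    def __init__(self):
        self._by_request_id = {}  # request_id -> stored record

    def submit(self, request_id: str, task_id: str, annotation: dict) -> dict:
        # A retry with the same request_id returns the original record
        # instead of writing a second, conflicting label.
        if request_id in self._by_request_id:
            return self._by_request_id[request_id]
        record = {"task_id": task_id, "annotation": annotation}
        self._by_request_id[request_id] = record
        return record
```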
```python
from datetime import datetime, timedelta

class OptimisticLockException(Exception):
    pass

class TaskExpiredException(Exception):
    pass

class TaskAssignment:
    def __init__(self, task_id: str, annotator_id: str):
        self.task_id = task_id
        self.annotator_id = annotator_id
        self.version = 0
        self.expires_at = datetime.now() + timedelta(minutes=30)

    def submit(self, annotation: dict, expected_version: int) -> bool:
        if self.version != expected_version:
            raise OptimisticLockException("Task was modified")
        if datetime.now() > self.expires_at:
            raise TaskExpiredException("Assignment expired")
        # Atomic submission with version bump
        self.annotation = annotation
        self.version += 1
        self.submitted_at = datetime.now()
        return True
```

## Quality Control as a First-Class Citizen
Quality isn't a post-hoc check—it's embedded in the routing layer:
- Consensus routing: Critical tasks go to 3-5 annotators
- Expert escalation: Disagreements trigger senior review
- Confidence calibration: Annotators flag uncertain responses
The system tracks per-annotator, per-task-type accuracy and adjusts routing dynamically.
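One way to track per-annotator, per-task-type accuracy is an exponentially weighted moving average that feeds back into the routing score. A minimal sketch; the `AccuracyTracker` class, the 0.9 decay, and the 0.5 prior are assumptions, not from the original:

```python
class AccuracyTracker:
    """EWMA accuracy per (annotator, task_type), used to bias routing."""

    def __init__(self, decay: float = 0.9, default: float = 0.5):
        self.decay = decay
        self.default = default  # prior for pairs with no history yet
        self._acc = {}          # (annotator_id, task_type) -> ewma

    def record(self, annotator_id: str, task_type: str, correct: bool) -> None:
        key = (annotator_id, task_type)
        prev = self._acc.get(key, self.default)
        # Each new observation shifts the estimate by (1 - decay)
        self._acc[key] = self.decay * prev + (1 - self.decay) * (1.0 if correct else 0.0)

    def accuracy(self, annotator_id: str, task_type: str) -> float:
        return self._acc.get((annotator_id, task_type), self.default)
```

The decay controls how fast old history fades: a recent run of mistakes pulls the score down quickly, which lets the router deprioritize that annotator for that task type without a manual review.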
## Lessons from Production
Lesson 1: Annotators are users, not workers. UX matters enormously. A confusing interface doesn't just slow throughput—it introduces systematic errors. We A/B test UI changes like any product team.
Lesson 2: Build for partial failure. At scale, something is always broken. Tasks should gracefully timeout and reroute. Annotator sessions should survive brief network interruptions.
Lesson 3: Observability is survival. We track:
- Task completion latency percentiles
- Annotator session health
- Inter-annotator agreement in real-time
- Queue depths by task type
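Real-time agreement can be as simple as pairwise percent agreement over the labels collected for one task. A minimal sketch; the function name is an assumption, and production systems often use chance-corrected measures like Cohen's kappa or Krippendorff's alpha instead:

```python
from itertools import combinations

def pairwise_agreement(labels: list) -> float:
    """Fraction of annotator pairs that gave the same label for one task."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        return 1.0  # zero or one label trivially agrees with itself
    return sum(a == b for a, b in pairs) / len(pairs)
```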
```typescript
// gauge() and histogram() come from your metrics client library
const metrics = {
  taskLatencyP99: gauge("task_latency_p99_ms"),
  annotatorSessionHealth: gauge("annotator_sessions_healthy"),
  agreementScore: histogram("inter_annotator_agreement"),
  queueDepth: gauge("task_queue_depth", ["task_type"])
};
```

Lesson 4: Plan for the data you don't have. Cold start is real. New task types have no golden sets. New annotators have no track record. Build systems that bootstrap quality signals quickly.
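One common bootstrap is to seed a new annotator's queue with a fixed ratio of golden (known-answer) tasks until enough track record accumulates. A hedged sketch; the `needs_golden_task` helper, the 50-task warmup, and the 20% ratio are assumptions, not from the original:

```python
def needs_golden_task(tasks_completed: int, golden_seen: int,
                      warmup: int = 50, golden_ratio: float = 0.2) -> bool:
    """Decide whether an annotator's next task should be a golden task.

    During the warmup window we keep the golden fraction near
    `golden_ratio` so a track record accumulates quickly; afterwards,
    consensus routing and occasional spot checks carry the load.
    """
    if tasks_completed >= warmup:
        return False  # past warmup: rely on consensus and spot checks
    expected_golden = tasks_completed * golden_ratio
    return golden_seen < expected_golden
```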
The best annotation platforms are invisible to the ML teams consuming the data. They just see clean, consistent, well-labeled datasets appearing on schedule. The complexity lives in the infrastructure.