Crowdsourcing Literature
Notes on crowdsourcing literature that looks at finding true labels given estimates from noisy workers.
Lots of theory-focussed work giving probabilistic bounds on the error in terms of the number of workers, their reliabilities, the number of items, and other parameters.
Addresses truth-tracking both in the sense of sample complexity and in the sense of asymptotics.
Basic framework
A lot of the work is set in the Dawid-Skene model [DS79]
Often for the specific case of binary labels, although multi-class work exists too
Model:
A number of workers
A number of items, each of which has a true label among a set of possibilities (often same set of labels for all items)
Confusion probabilities for each worker: \(\pi_{kl}^{(i)}\) is the probability that worker \(i\) provides label \(l\) when \(k\) is the true label
Sometimes workers do not provide labels for all items; a small simulation sketch of the model follows below
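A minimal simulation sketch of this generative model (the number of workers, items, and labels, and the worker accuracies below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_items, n_labels = 5, 100, 3

# True label for each item, drawn uniformly at random.
true_labels = rng.integers(n_labels, size=n_items)

# One confusion matrix per worker: pi[i, k, l] = P(worker i answers l | true label is k).
# Each worker is "mostly correct", with a worker-specific accuracy on the diagonal.
accuracies = rng.uniform(0.6, 0.9, size=n_workers)
pi = np.empty((n_workers, n_labels, n_labels))
for i in range(n_workers):
    off_diagonal = (1 - accuracies[i]) / (n_labels - 1)
    pi[i] = np.full((n_labels, n_labels), off_diagonal)
    np.fill_diagonal(pi[i], accuracies[i])

# Observed answers: answers[i, j] is worker i's label for item j.
# (To model workers who skip items, one could mask a random subset of entries.)
answers = np.empty((n_workers, n_items), dtype=int)
for i in range(n_workers):
    for j in range(n_items):
        answers[i, j] = rng.choice(n_labels, p=pi[i, true_labels[j]])
```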
Papers
Who moderates the moderators? (2011)
Binary aggregation
Algorithm for finding true labels using a spectral decomposition of a matrix associated with the input (a rough sketch of this style of method appears after these notes)
Probabilistic bounds on the fraction of errors made, parametrised by the number of workers, the number of items, and the average informativeness of the workers
Informativeness is \((2p_i - 1)^2\): algorithm works well even for workers with accuracy close to 0 (since it can flip their labels)
The catch: need to know the identity of one trustworthy worker (reliability greater than \(1/2\))
An online version of the algorithm estimates worker reliability levels, but the authors do not analyse how close these estimates are to the true levels
Basic look at robustness to adversarial workers
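A rough sketch of this flavour of spectral aggregation (not the paper's exact algorithm; the ±1 encoding and the sign-fixing step via the trusted worker are my reading of the idea): the leading right singular vector of the worker-item answer matrix gives label estimates up to a global sign flip, and the trusted worker resolves the flip.

```python
import numpy as np

def spectral_binary_aggregate(answers, trusted_worker):
    """answers: (n_workers, n_items) matrix with entries in {+1, -1}.
    trusted_worker: index of one worker known to have accuracy > 1/2."""
    # Leading right singular vector of the answer matrix (equivalently, the
    # top eigenvector of the item-item matrix answers.T @ answers).
    _, _, vt = np.linalg.svd(answers, full_matrices=False)
    labels = np.sign(vt[0])
    labels[labels == 0] = 1
    # The eigenvector only determines labels up to a global sign flip;
    # use the trusted worker to resolve the ambiguity.
    if np.mean(labels == answers[trusted_worker]) < 0.5:
        labels = -labels
    return labels
```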
Aggregating Crowdsourced Binary Ratings (2013)
Binary aggregation
Arbitrary worker-item answering graph
Spectral methods to produce estimates of both the true labels and the true worker reliability levels (a toy sketch of the reliability-estimation idea appears after these notes)
Probabilistic bounds on error for both of these estimates, in terms of average reliability (amongst other things)
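As a toy illustration of the reliability-estimation idea (my own simplification, not the paper's estimator): with ±1 labels, a single accuracy \(p_i\) per worker, and conditionally independent answers, the off-diagonal entries of the worker-worker agreement matrix concentrate around \(w_i w_j\) with \(w_i = 2p_i - 1\), so the matrix is close to rank one and its leading eigenvector recovers the \(w_i\) up to a global sign.

```python
import numpy as np

def estimate_reliabilities(answers):
    """answers: (n_workers, n_items) matrix with entries in {+1, -1}.
    Returns rough estimates of w_i = 2 * p_i - 1, up to a global sign."""
    n_workers, n_items = answers.shape
    # Empirical worker-worker agreement matrix; off-diagonal entries are
    # roughly w_i * w_j under the assumptions above.
    C = answers @ answers.T / n_items
    np.fill_diagonal(C, 0)            # the diagonal is always 1, so drop it
    eigvals, eigvecs = np.linalg.eigh(C)
    v = eigvecs[:, -1]                # eigenvector of the largest eigenvalue
    w_hat = v * np.sqrt(max(eigvals[-1], 0.0))
    # Fix the global sign by assuming most workers are better than random.
    if w_hat.sum() < 0:
        w_hat = -w_hat
    return w_hat
```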
Spectral methods meet EM: A provably optimal algorithm for crowdsourcing (2014)
Any finite number of labels
Uses spectral methods + the EM algorithm to estimate confusion matrices (basically worker reliability levels…); a bare-bones EM loop for the Dawid-Skene model is sketched after these notes
Results about theoretical accuracy of their algorithm
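For reference, a bare-bones EM loop for the Dawid-Skene model as defined above (this sketch starts from a soft majority-vote initialisation rather than the paper's spectral initialisation, assumes every worker labels every item, and adds a little smoothing to avoid zero probabilities):

```python
import numpy as np

def dawid_skene_em(answers, n_labels, n_iters=20):
    """answers: (n_workers, n_items) integer label matrix (no missing entries).
    Returns the posterior over true labels (n_items, n_labels) and the
    estimated confusion matrices pi (n_workers, n_labels, n_labels)."""
    n_workers, n_items = answers.shape
    # Initialise the posterior over true labels with a soft majority vote.
    q = np.zeros((n_items, n_labels))
    for l in range(n_labels):
        q[:, l] = (answers == l).sum(axis=0)
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # M-step: re-estimate the class prior and each worker's confusion matrix.
        prior = q.mean(axis=0) + 1e-6
        prior /= prior.sum()
        pi = np.zeros((n_workers, n_labels, n_labels))
        for i in range(n_workers):
            for l in range(n_labels):
                pi[i, :, l] = q[answers[i] == l].sum(axis=0)
            pi[i] += 1e-6
            pi[i] /= pi[i].sum(axis=1, keepdims=True)
        # E-step: posterior over true labels given the current parameters.
        log_q = np.tile(np.log(prior), (n_items, 1))
        for i in range(n_workers):
            log_q += np.log(pi[i][:, answers[i]]).T
        q = np.exp(log_q - log_q.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q, pi
```

Estimated labels are then `q.argmax(axis=1)`, which can be compared against `true_labels` from the simulation sketch in the framework section.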
Papers to read
Learning from crowdsourced labeled data: a survey. Loads of references and outlines different strands of the literature.
CDAS: A Crowdsourcing Data Analytics System. Seems to be more focussed on actual implementation of a crowdsourcing platform, but read this just to extract the bit about quality control.
Adaptive Task Assignment for Crowdsourced Classification (already in pl). Have had a skim of the first few pages and highlighted some further references. It also seems to contain some good background info on the area. Just noticed that Lemma 1 is basically a more general result (and, using their different but equivalent formalism, simpler to show) than the thing I was working on all weekend…
Efficient crowdsourcing for multi-class labeling, Karger et al.
Aggregating Ordinal Labels from Crowds by Minimax Conditional Entropy
Reputation-based Worker Filtering in Crowdsourcing. From the abstract it seems to look more generally at arbitrary (deterministic) adversarial strategies. So it might be free from the independent-judgements-with-same-accuracy-on-each-item weakness.
A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness
Exact Exponent in Optimal Rates for Crowdsourcing
Multicategory Crowdsourcing Accounting for Variable Task Difficulty, Worker Skill, and Worker Intention
Domain-Weighted Majority Voting for Crowdsourcing
Identifying unreliable and adversarial workers in crowdsourced labeling tasks. Models adversarial sources explicitly.
Iterative learning for reliable crowdsourcing systems
Crowdsourcing with Arbitrary Adversaries