In Crowdsourcing Satellite Imagery (part 1) I discussed the use of a parallel model to organise people for the analysis of satellite imagery, and of redundancy and meritocratic techniques to produce quality data. Here I discuss the qualitative and quantitative differences between two organisational models: the parallel model vs. the iterative (Wikipedia-style) model. This is part of a paper (pdf) written in March for the next GIScience 2012 conference.
This work draws on two domains: human computation, a paradigm for utilising human processing power to solve problems that computers cannot solve, and collective problem solving, which emphasises a collective process for solving problems.
The general idea is to apply well-known algorithmic processes from computer science in the context of human organisations: iterative vs. parallel processing of information.
Parallel vs. Iterative human organisations
In a parallel model, a set of volunteers independently performs the same task, and an aggregation function is used to generate a collective output.
In an iterative model, a chain of volunteers iteratively improves the work of previous workers (Wikipedia-style).
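The contrast between the two models can be sketched in code. This is a minimal, hypothetical illustration; the function names and the majority-vote aggregator are mine, not the paper's implementation:

```python
from collections import Counter

def parallel_model(task, volunteers, aggregate):
    """Each volunteer solves the task independently; an
    aggregation function merges the answers into one output."""
    answers = [volunteer(task) for volunteer in volunteers]
    return aggregate(answers)

def iterative_model(task, volunteers, improve):
    """Each volunteer receives the previous result and tries to
    improve it (Wikipedia-style chain of contributions)."""
    result = None  # the first volunteer starts from scratch
    for volunteer in volunteers:
        result = improve(volunteer, task, result)
    return result

def majority_vote(answers):
    """A simple aggregation function: keep the most frequent answer."""
    return Counter(answers).most_common(1)[0][0]

# Three volunteers label the same tile; two agree.
volunteers = [lambda t: "building", lambda t: "building", lambda t: "road"]
print(parallel_model("tile-42", volunteers, majority_vote))  # -> building
```

The key structural difference is visible in the signatures: the parallel model needs an explicit `aggregate` step, while the iterative model threads the partial result through the chain.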
On the properties of each model
- The nature of the problem and its divisibility can restrict the choice of one approach over the other. In the parallel model each participant solves the problem independently, and thus alone. A problem too complex to be solved by one person must therefore be divided into easier pieces.
- This constraint is absent in the iterative model: the whole complexity of the problem can be presented at once. One volunteer can start but not complete the problem, and subsequent participants improve the result.
Diffusion of information: the exploration/exploitation trade-off
- A common issue in collective problem solving is the exploration-exploitation trade-off emerging from the structure of the organisation. Networked organisations like iterative models can benefit from the experience of others via the diffusion of knowledge. But exploiting previously discovered solutions can lead to premature convergence on suboptimal solutions.
- In the parallel model, on the other hand, individuals are unable to copy one another, leading to a broader exploration of the search space and thus a greater diversity of solutions.
Mechanisms enforcing quality
- The concept of the wisdom of crowds rests on empirical evidence that the aggregation of diverse, independently-deciding individuals is likely to make certain types of decisions and predictions better than a few experts would. An unbiased approach like the parallel model therefore supports this property better than the iterative model. However, the critical question of how individual answers are aggregated remains to be considered.
- The iterative model integrates the notion of improvement more naturally, but it is very sensitive to vandalism (e.g. spamming in Wikipedia). Furthermore, as discussed above, social influence can negatively impact the collective output due to the path dependency effect: once past decisions have become sufficiently informative, later members simply copy those around them.
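The wisdom-of-crowds argument behind the parallel model can be illustrated with a small Monte-Carlo sketch. The accuracy values and the simulation itself are illustrative assumptions, not data from the paper:

```python
import random

random.seed(0)

def collective_accuracy(n_voters, p_correct, trials=10_000):
    """Monte-Carlo estimate of majority-vote accuracy when each of
    n_voters is independently correct with probability p_correct."""
    hits = 0
    for _ in range(trials):
        correct_votes = sum(random.random() < p_correct for _ in range(n_voters))
        if correct_votes > n_voters / 2:  # strict majority is correct
            hits += 1
    return hits / trials

# A single 65%-accurate annotator vs. an aggregated, independent crowd of 11.
print(collective_accuracy(1, 0.65))   # close to 0.65
print(collective_accuracy(11, 0.65))  # noticeably higher
```

The gain depends on the votes being independent, which is exactly what the parallel model enforces and the iterative model (through social influence) undermines.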
Task and effort
In the human computation field, Malone et al. categorise the nature of tasks into two types: the generation of information, and the evaluation and selection of information. In the parallel model the human effort (with potentially the task of aggregating annotations) is related to creation tasks, whereas the iterative model also enables reviewing. The effort required can thus differ: starting from scratch to produce an output requires, a priori, more effort than reviewing or improving a previous result.
Experiment with MT
Read the paper (pdf) for more information about the experiment with Mechanical Turk as a low-cost human simulator. I report only the output/findings here.
- Linus's Law, studied for OpenStreetMap (see this blog), an iterative model, has limited validity in the parallel model: beyond a given threshold, adding more volunteers does not change the representativeness of opinions and thus does not change the consensual output.
- Furthermore, we showed that varying the decision threshold in the voting process significantly impacts the global quality (F-measure). This threshold should be chosen carefully, especially with respect to any bias at the individual level. In our case, applying the majority rule produced sub-optimal performance due to such a common bias.
- We observed that the first iterations have a high impact on the final result due to a path dependency effect: a stronger commitment during the first steps is thus a primary concern when using such a model (e.g. asking expert/committed users to start).
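To see why the decision threshold matters, here is a toy sketch of threshold voting scored with the F-measure. The vote counts and the over-annotation bias are invented for illustration; in this toy crowd a threshold stricter than the majority rule scores best, mirroring the sub-optimality of the majority rule discussed above:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def evaluate(votes, truth, threshold):
    """Accept an annotation when at least `threshold` of the
    volunteers voted for it, then score it against ground truth."""
    accepted = {item for item, count in votes.items() if count >= threshold}
    true_positives = len(accepted & truth)
    precision = true_positives / len(accepted) if accepted else 0.0
    recall = true_positives / len(truth)
    return f_measure(precision, recall)

# Toy data: 5 volunteers with a bias towards over-annotating,
# so some false positives ("shadow", "tree") gather 3 votes each.
votes = {"b1": 5, "b2": 4, "b3": 4, "shadow": 3, "tree": 3}
truth = {"b1", "b2", "b3"}

for threshold in range(1, 6):
    print(threshold, round(evaluate(votes, truth, threshold), 2))
```

With this bias, the majority rule (threshold 3) accepts the false positives, while threshold 4 recovers a perfect F-measure; a pessimistic threshold compensates for a systematic individual bias.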
On the performance of each model
We investigated the quality of both organisational models according to two aspects: the accuracy (type I and type II errors) and the consistency of the results. We concluded the following:
Accuracy – type I errors
The parallel strategy, generating only consensual results, corrects type I errors (wrong annotations) more significantly than the iterative model. However, in difficult areas (e.g. map 3) it does not mitigate disagreements well. Thanks to the accumulation of knowledge, the iterative model is better suited to handling ambiguous cases, or problems that are hard to divide into smaller and easier tasks. The iterative model thus outperforms the parallel one in difficult/complex areas, but with a potential path dependency effect: mistakes can be propagated, generating type I errors more easily as the iterations proceed.
Accuracy – type II errors
We observed that the iterative model reduces type II errors (the spatial coverage) from one iteration to the next. It outperforms the parallel model thanks to the accumulation of knowledge, which enables the next users to focus their attention on `fresh' areas. The lower spatial coverage usually seen with the parallel model is due to the nature of the strategy: because the work is independent, the nth volunteer may well annotate the same obvious building for the nth time without bringing new information at the collective level. This wastes time for both the volunteer and the community.
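The coverage argument can be sketched with a toy simulation in which each volunteer can annotate a fixed number of map cells. The grid size, effort budget, and uniform-random choice of cells are simplifying assumptions of mine, not the experimental setup:

```python
import random

random.seed(1)

AREAS = set(range(100))  # map cells that contain something to annotate
EFFORT = 20              # cells each volunteer has time to annotate

def parallel_coverage(n_volunteers):
    """Each volunteer picks cells independently, so the same
    obvious cells get annotated repeatedly while others are missed."""
    covered = set()
    for _ in range(n_volunteers):
        covered |= set(random.sample(sorted(AREAS), EFFORT))
    return len(covered)

def iterative_coverage(n_volunteers):
    """Each volunteer sees the previous work and spends their
    effort on still-unannotated ('fresh') cells only."""
    covered = set()
    for _ in range(n_volunteers):
        fresh = AREAS - covered
        covered |= set(random.sample(sorted(fresh), min(EFFORT, len(fresh))))
    return len(covered)

print(parallel_coverage(5), iterative_coverage(5))
```

With five volunteers the iterative chain covers all 100 cells by construction, while the parallel crowd wastes part of its effort on redundant annotations and leaves gaps.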
About the consistency of the results
The parallel model provides an output that is more reliable than that of a basic iterative model, because the latter is sensitive to vandalism and knowledge destruction.
Other, more exotic organisations, e.g. human cellular automata, could be experimented with and compared to these models.
- David Lazer and Allan Friedman. The network structure of exploration and exploitation. Administrative Science Quarterly, 52(4):667–694, 2007.
- Christina Fang, Jeho Lee, and Melissa A. Schilling. Balancing exploration and exploitation through structural design: The isolation of subgroups and organization learning. Organization Science, 21(3):625–642, 2010.
- Jan Lorenz, Heiko Rauhut, Frank Schweitzer, and Dirk Helbing. How social influence can undermine the wisdom of crowd effect. Proceedings of the National Academy of Sciences of the United States of America, 108(22):9020–9025, 2011.
- Massimo Egidi and Alessandro Narduzzo. The emergence of path-dependent behaviors in cooperative contexts. International Journal of Industrial Organization, 15(6):677–709, 1997.
- Thomas W. Malone, Robert Laubacher, and Chrysanthos Dellarocas. Harnessing crowds: Mapping the genome of collective intelligence. MIT Center for Collective Intelligence, 2009.