A major challenge facing website operators is extracting information from natural-language text. To overcome this hurdle, they have used a combination of manual and automated strategies, with limited success. The most effective approach has proven to be crowdsourcing. Manual approaches, by comparison, rely on simple, repeatable steps together with a coding frame used to classify texts. Their advantage is that complex nuances and boundary cases can be addressed using human-interpretable heuristics. Automated approaches can address these boundary cases as well. However, manual approaches scale poorly, because they demand considerable effort from just a few expert analysts (Breaux & Schaub, 2014). The main goal here is to determine, for the sake of website operators, the key components of an effective information extraction framework, and crowdsourcing has provided a viable answer in this regard.
In particular, crowdsourcing is effective where automated methods do not work. For example, automated methods are ineffective at extracting information from noisy images, analyzing political sentiment, and translating text, areas in which crowdsourcing has proved efficient. Typically, a crowdsourcing task is divided into smaller, more manageable units (i.e., microtasks). The microtasks are distributed to the crowd, i.e., a large number of autonomous workers, each offering their services via a crowdsourcing platform. CrowdFlower and Amazon Mechanical Turk are examples of such platforms. The results of the microtasks are then combined to provide a solution to the larger task. Despite these advantages, complex crowdsourcing tasks pose challenges, such as designing a task workflow that is cost-effective and produces better-quality results (Breaux & Schaub, 2014).
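The split, distribute, and combine cycle described above can be illustrated with a minimal Python sketch. It is not the workflow used by Breaux and Schaub or by any particular platform. The function names, the three labels, the simulated workers, and the majority-vote aggregation rule are all assumptions made for illustration.

```python
from collections import Counter


def split_into_microtasks(text, size=2):
    """Split a document into microtasks of at most `size` sentences each."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def aggregate(labels):
    """Combine redundant worker answers by majority vote (an assumed rule).

    Returns None when there are no answers or the top two labels tie.
    """
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def run_crowd_task(text, workers, size=1):
    """Send every microtask to every worker and combine the answers."""
    results = []
    for task in split_into_microtasks(text, size):
        answers = [worker(task) for worker in workers]
        results.append((task, aggregate(answers)))
    return results


# Hypothetical simulated workers; real workers would answer via a platform.
def careful_worker(task):
    text = " ".join(task).lower()
    if "collect" in text:
        return "collection"
    if "share" in text:
        return "sharing"
    return "other"


def sloppy_worker(task):
    return "other"


policy = ("We collect your email address. We share data with partners. "
          "You may opt out at any time.")
workers = [careful_worker, careful_worker, sloppy_worker]
results = run_crowd_task(policy, workers)
labels = [label for _, label in results]
```

Assigning each microtask to several workers and voting is one simple way to trade cost against quality: more workers per microtask raise the cost but make a single careless answer less likely to decide the result.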
- Breaux, T.D., & Schaub, F. (2014). Scaling requirements extraction to the crowd:
experiments with privacy policies. IEEE
- Sadeh, N., Acquisti, A., Breaux, T.D., Cranor, L.F., McDonald, A.M., Reidenberg, J.R.,
- Zimmeck, S., & Bellovin, S.M. (2014). Privee: an architecture for automatically analyzing
web privacy policies. Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, August 20-22