With a large set of test packets in need of transcription, little time to do it, and even less desire to spend money on temporary workers, I needed some other way to get the job done. Luckily I found a tool from Amazon that proved relatively easy to implement: Mechanical Turk.
Data Collection Hassles
For the past few years the Assessment team has sent out a large batch of items in what we call pilot tests. These tests are used to gauge the effectiveness of assessment items. The format consists of an item followed by a set of questions about the item (Was there anything confusing?, Did the picture help?, Is answer A correct?, etc.). The questions have set answers (Yes, No, Not Sure) and/or a space for the student to provide a short written response.
In previous years we have used temporary workers to input the student responses into the Items Utility. This proved to be extremely slow and relatively expensive.
Last year we attempted a new method of data collection, using an automated system to capture the Yes/No/Not Sure responses. This system used optical mark recognition (OMR) and required the packets to be formatted consistently on each page. With the packets formatted this way, we could create a master file indicating how to judge answer selection. This proved fairly successful, though we did have to outsource the scanning to a third party. The scan results came back as Excel documents, and with a little PHP coding I was able to import the data into the Items Utility. The main drawback was that we still had to hire temps to enter the written responses.
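For anyone curious about that import step, here is a rough sketch of the kind of PHP involved, assuming the Excel results were first exported to CSV. The table and column names are placeholders, not the actual Items Utility schema.

```php
<?php
// Rough sketch of the import, assuming the Excel results were first exported
// to CSV. Table and column names are placeholders, not the real schema.
$pdo = new PDO('mysql:host=localhost;dbname=items_utility', 'user', 'pass');
$insert = $pdo->prepare(
    'INSERT INTO pilot_responses (packet_id, student_id, item_id, question, answer)
     VALUES (?, ?, ?, ?, ?)'
);

$handle = fopen('scan_results.csv', 'r');
fgetcsv($handle); // skip the header row
while (($row = fgetcsv($handle)) !== false) {
    // Each row: packet, student, item, question number, Yes/No/Not Sure value
    $insert->execute($row);
}
fclose($handle);
```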
This year we moved to a new automated scanning system for the Yes/No/Not Sure responses that proved even better than in previous years. This system uses an in-house scanner and design software. The researchers had to do a lot of prep work to set up their test packets, and we had to use specific printers to produce them, but I believe in the end we saved money, particularly since the majority of the legwork in figuring out how to import the data into the utility had already been done. The only snag was that we again had no way of capturing the written responses electronically.
However, one of the benefits of the new system is that, in addition to creating a data file for the Yes/No/Not Sure values, it can produce images of the scanned regions. At first I thought we might find some way of performing character recognition on those images, but there is simply no economical way to do that on handwritten text. So our only real option was, again, transcription by a human being. Past experience with the time and money required made that option unpalatable, and we were likely going to move ahead without the handwritten data.
Enter the Turk
Amazon recently released a web-based service, Mechanical Turk (MTurk), that provides access to a distributed workforce. The service provides a means of creating what Amazon calls “Human Intelligence Tasks” (HITs), tasks that are currently too complex for a computer but ideally suited for your average person. We decided to try the service out for transcription of the written responses.
Since MTurk is web-based, the HITs are set up as web pages. An online editor makes setting up HIT templates fairly simple, though it helps to know a bit of HTML. Each template consists of instructions and a form where the worker enters the requested information. It’s up to the requester to determine the amount paid for each HIT completed (starting as low as one cent). Amazon tacks on an overhead fee for each HIT that is completed by a worker and accepted by the requester.
The template uses variables to fill in each HIT with unique information. Once the template is set up you provide a data file that contains the variable data. MTurk then publishes your HITs on a public interface where anyone in the world with an internet connection can perform the work.
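To give a rough idea of what a template looks like, here is a simplified sketch of a transcription HIT with a single image slot. The variable and field names are illustrative, not the ones we actually used; MTurk replaces the ${...} placeholders with values from the data file, and the form fields become the worker's answers.

```html
<!-- Simplified HIT template sketch; variable and field names are illustrative. -->
<p>Please transcribe the handwritten text in the image below.
   If the response is blank, enter <strong>BLANK</strong>.</p>
<table>
  <tr>
    <td>
      <img src="${image_url_1}" alt="scanned student response" />
      <br /><a href="${full_image_url_1}" target="_blank">view full size</a>
    </td>
    <td>
      <textarea name="transcription_1" rows="4" cols="60"></textarea>
    </td>
  </tr>
  <!-- ...repeated for each image included in the HIT... -->
</table>
```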
Using MTurk
As I said, the program we used to scan the packets is able to save scanned regions as images. These images were organized such that we could easily determine the packet/student/item combination based solely on the file name and directory in which the image was located. This information is all we needed to insert data into the Items Utility, and thus provided an easy identifier to use with MTurk.
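As an illustration, a small helper along these lines can pull the three identifiers back out of an image path. The directory and file naming shown here is hypothetical; our actual convention differed, but it carried the same three values.

```php
<?php
// Sketch of pulling the identifiers out of a scanned image path. The layout
// shown (a packet directory with student_item file names) is hypothetical.
function parseImagePath(string $path): array
{
    // e.g. /scans/packet12/00345_17.png -> packet 12, student 00345, item 17
    if (!preg_match('#packet(\d+)/(\d+)_(\d+)\.\w+$#', $path, $m)) {
        throw new InvalidArgumentException("Unrecognized image path: $path");
    }
    return ['packet' => $m[1], 'student' => $m[2], 'item' => $m[3]];
}
```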
To minimize the time a worker spends on each HIT it's good to have all the data the worker needs on the screen. Since an MTurk HIT is a web page it can reference images located on external servers, so I used the URLs of the scanned images as the values for the template variables. The HIT displayed each image side by side with a text area where the worker could enter the transcribed text.
Initially I thought to just link to the original images. A quick test, however, showed that their size would be a problem: the images were larger than needed, and any bandwidth constraints on the worker's end would significantly increase the time per HIT. Smaller files were needed, and to save disk space I considered resizing the images dynamically with PHP. A sample run proved the folly of that idea. Our system resources are limited, and the processing power and memory required to do this for more than a few simultaneous workers caused the HITs to load slower than if we hadn't resized at all. In the end we resized all of the images ahead of time, something we had originally hoped to avoid because of storage constraints. Each HIT referenced the smaller image, with a link to the full size for cases where a zoomed view might be beneficial.
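The batch resize itself was straightforward. The sketch below, using PHP's GD functions, is roughly the sort of one-off script we ran; the directories and target width are illustrative values.

```php
<?php
// One-off batch resize of the scanned images using GD, roughly what we ran
// instead of resizing on the fly. Paths and target width are illustrative.
$sourceDir   = '/scans/full';
$targetDir   = '/scans/web';
$targetWidth = 600;

foreach (glob("$sourceDir/*/*.jpg") as $file) {
    $image = imagecreatefromjpeg($file);
    $small = imagescale($image, $targetWidth); // height scales automatically
    $dest  = $targetDir . substr($file, strlen($sourceDir));
    @mkdir(dirname($dest), 0775, true);        // mirror the directory layout
    imagejpeg($small, $dest, 80);              // moderate quality keeps files small
    imagedestroy($image);
    imagedestroy($small);
}
```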
Since Amazon tacks on a handling fee for each HIT, I tried to organize our HITs in a way that would work out best for both us and the workers. We needed HITs that could be completed quickly but that also collected enough data for the cost per HIT to balance out against the amount of work required. We were also concerned about providing equitable pay for the work being done. Based on our testing we determined how many images per HIT, and at what rate, would keep our fees acceptable while still paying an equitable wage.
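Building the batch input file is then just a matter of grouping the image URLs into rows, one row per HIT. The sketch below assumes roughly four images per HIT and column names matching the template variables; the grouping, host name, and column names are all illustrative.

```php
<?php
// Sketch of building the MTurk batch input file. Column names must match the
// ${...} variables in the HIT template; the host and the four-image grouping
// are illustrative.
$images = glob('/scans/web/*/*.jpg');
sort($images);

$out = fopen('mturk_input.csv', 'w');
fputcsv($out, ['image_url_1', 'image_url_2', 'image_url_3', 'image_url_4']);

foreach (array_chunk($images, 4) as $group) {
    $urls = array_map(function ($path) {
        // Placeholder host; the path keeps the packet directory and file name.
        return 'https://example.org/scans/' . basename(dirname($path)) . '/' . basename($path);
    }, $group);
    fputcsv($out, array_pad($urls, 4, '')); // the last row may have fewer images
}
fclose($out);
```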
Our due diligence paid off. We had 48,351 HITs comprising 193,401 images. The work was completed within a couple of days, which was beyond my wildest estimates. The amount paid and the time taken were significantly less than what it would have cost to hire traditional temporary workers to do the same.
Once the work was complete, MTurk provided a data file that included our original input data along with the workers' responses: everything we needed to import the results into the utility.
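Parsing that file is much like the earlier imports. The sketch below pairs each worker answer with the image it belongs to; the field names are illustrative, and it reuses the hypothetical parseImagePath() helper from earlier.

```php
<?php
// Sketch of walking the MTurk results file and pairing each answer with the
// image it belongs to. Field names are illustrative.
$handle = fopen('mturk_results.csv', 'r');
$header = fgetcsv($handle);

while (($row = fgetcsv($handle)) !== false) {
    $record = array_combine($header, $row);
    for ($i = 1; $i <= 4; $i++) {
        if (empty($record["Input.image_url_$i"])) {
            continue; // the last HIT in the batch may have fewer images
        }
        $ids  = parseImagePath($record["Input.image_url_$i"]); // from the earlier sketch
        $text = $record["Answer.transcription_$i"];
        // ...insert $ids plus $text into the Items Utility here...
    }
}
fclose($handle);
```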
Wrapping Up
While this was a bit of a learning process, using MTurk to transcribe this data proved successful, and I fully expect to use the system again in the future. This has been more of a review of the process than a technical walkthrough, but there were a few technical pieces involved. Rather than go into detail on those, I'm providing links to the files involved, which should give sufficient information (except perhaps for the import script).
One other thing to keep in mind is that paying well attracts a lot of workers; we try to pay what would equate to a decent hourly wage. A good summary of how to get quality results from Mechanical Turk can be found here: http://www.smartsheet.com/blog/brent-frei/getting-good-smartsourcing-results-amazon-mechanical-turk-best-practices