A Framework for Rigorous Comparisons of Human and Algorithm Performance on a One-Shot Learning Task
As the field of machine learning progresses, a central goal remains to create systems with human-like cognitive capabilities, including the ability to generalize knowledge from very few, noisy examples. The area of low-sample machine learning research, including one-shot learning, lacks consensus on proper methods and rigor in comparing human and machine performance. To measure progress toward the goal of human-like one-shot learning performance, we present a flexible, open-source framework that combines Amazon Mechanical Turk for obtaining task-related experimental data and Google Forms for collecting subjective and demographic information from human participants. This framework was used to collect data for a one-shot learning task in order to compare the performance of human participants against a baseline deep learning approach commonly used in the field. Preliminary results indicate that human accuracy exceeds algorithm accuracy on the one-shot class, motivating future research in this area. We have open-sourced our framework, including Mechanical Turk templates, stimuli, and analysis code, to lower the barrier to entry into human subjects testing in machine learning research, standardize the approach for evaluating human performance on machine learning tasks, and improve machine learning algorithms.