The rich data that Massive Open Online Course (MOOC) platforms collect on the behavior of millions of students provide a unique opportunity to study human learning and to develop artificial intelligence methods to support learners. However, the anonymity of open online environments also gives users ample opportunity to behave deceptively. In this study, we explore the detection of fake learners (a specific case of fraud in educational settings, better known as academic dishonesty or cheating) and its impact on learning analytics results. We describe the implementation of algorithms that detect two different patterns. The first is Cheating Using Multiple Accounts (CUMA), in which users create harvesting accounts to find correct answers that they later submit from their master account. The second relies on dissimilarity metrics to detect students who consistently submit their assignments very close together in time because they are collaborating without authorization to split the effort among peers. We then analyze the impact of this cohort of fake learners on MOOC sustainability and learning analytics research. We argue that, because these fake learners exhibit aberrant behaviors that do not represent genuine learning patterns, they can seriously and systematically bias the learning analytics results on MOOCs published so far. Following replication and sensitivity analysis methodologies, we replicate the analyses of two well-known learning analytics studies with and without fake learners and report on the robustness of those studies.
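To make the second pattern concrete, here is a minimal Python sketch of how pairs of students who submit very close together in time might be flagged. It is an illustration under stated assumptions, not the algorithm presented in the talk: the dissimilarity metric (mean absolute gap between submission timestamps), the function names, and the thresholds `max_gap_seconds` and `min_shared` are all hypothetical choices.

```python
from itertools import combinations


def mean_time_gap(a, b):
    """Mean absolute gap (seconds) over the assignments both students submitted.

    `a` and `b` map assignment id -> submission timestamp in seconds.
    Returns None when the two students share no assignments.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(abs(a[k] - b[k]) for k in shared) / len(shared)


def flag_close_submitters(submissions, max_gap_seconds=60.0, min_shared=3):
    """Return (student_u, student_v, gap) for pairs with suspiciously small gaps.

    `submissions` maps student id -> {assignment id: timestamp in seconds}.
    Both thresholds are illustrative assumptions, not values from the study.
    """
    flagged = []
    for u, v in combinations(sorted(submissions), 2):
        shared = submissions[u].keys() & submissions[v].keys()
        # Require enough shared assignments so one coincidence is not flagged.
        if len(shared) < min_shared:
            continue
        gap = mean_time_gap(submissions[u], submissions[v])
        if gap <= max_gap_seconds:
            flagged.append((u, v, gap))
    return flagged
```

A real detector would need to account for deadlines, which naturally cluster submissions in time, so a low gap alone is at most a signal for further review.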
About José A. Ruipérez-Valiente
José A. Ruipérez-Valiente completed his B.Eng. and M.Eng. in Telecommunications at Universidad Católica de San Antonio de Murcia (UCAM) and Universidad Carlos III of Madrid (UC3M) respectively, graduating in both cases with the best academic transcript of the class. Afterwards, he completed his M.Sc. and Ph.D. in Telematics at UC3M while conducting research at Institute IMDEA Networks in the area of learning analytics and educational data mining. During this time, he completed two research stays of three months each, the first at MIT and the second at the University of Edinburgh. He has received several academic and research awards and has authored more than 25 scientific publications in prestigious journals and conferences in his area of research. He has also held industry appointments at Vocento, Accenture and ExoClick, combining experience in academia, research institutions and industry. He is currently a postdoctoral associate in the CMS/W department at MIT, where he is part of the Teaching Systems Lab and also collaborates with the Education Arcade, applying data science to large-scale free online courses and game-based learning environments to improve our understanding of how we learn. He is passionate about how learning occurs, solving data-based problems, teaching and sharing knowledge, yoga, nature and photography.
This event will be conducted in English