Investigating the Feasibility of Automatic Assessment of Programming Tasks
This study aimed to investigate the feasibility of automatic assessment of programming tasks and to compare manual and automatic assessment in terms of their effect on students’ marks.
Manual assessment of programs written by students can be tedious. Automatic assessment methods might reduce the assessment burden, but their drawbacks may diminish the benefits. The paper reports on the experience of a lecturer trying to introduce automated grading. Students’ solutions to a practical Java programming test were assessed both manually and automatically, and the lecturer tied the experience to the unified theory of acceptance and use of technology (UTAUT).
The participants were 226 first-year students registered for a Java programming course. Of the tests the participants submitted, 214 were assessed both manually and automatically. Various statistical methods were used to compare the manual assessment of students’ solutions with the automatic assessment of the same solutions. A detailed investigation of reasons for differences was also carried out. A further data collection method was the lecturer’s reflection on the feasibility of automatic assessment of programming tasks based on the UTAUT.
This study enhances the knowledge regarding the benefits and drawbacks of automatic assessment of students’ programming tasks. The research contributes to the UTAUT by applying it in a context where it has hardly been used. Furthermore, the study confirms previous work stating that automatic assessment may be less reliable for students with lower marks but more trustworthy for high-achieving students.
An automatic assessment tool verifying functional correctness might be feasible for assessment of programs written during practical lab sessions but could be less useful for practical tests and exams where functional, conceptual and structural correctness should be evaluated. In addition, the researchers found that automatic assessment seemed to be more suitable for assessing high-achieving students.
This paper makes it clear that lecturers should know what assessment goals they want to achieve. The appropriate method of assessment should be chosen wisely. In addition, practitioners should be aware of the drawbacks of automatic assessment before choosing it.
This work serves as an example of how researchers can apply the UTAUT when conducting qualitative research in different contexts.
The study would be of interest to lecturers considering automated assessment. The two assessments used in the study are typical of the way grading takes place in practice and may help lecturers understand what could happen if they switch from manual to automatic assessment.
Future research could investigate the feasibility of automatic assessment of students’ programming tasks in a practical lab environment while accounting for structural, functional and conceptual assessment goals.