Evaluating the Consistency of Responses to Student AI Prompts to an Analytics Visualization [Abstract]

Benjamin Larson, Jeffrey A Bohler, Nandini Bolekar
InSITE 2025  •  2025  •  pp. 10
Aim/Purpose:
As industry and higher education move to incorporate generative AI into courses and training, there is a need to understand how varied the responses to a series of prompts are. This study's purpose was to evaluate the consistency of AI responses to prompts for teaching analytics using a single visualization across several sections of the same course over a week. The students will eventually use the data to create a linear regression model; while the instructor will provide all the necessary knowledge, the students will have the freedom to engineer new features, which may be identified using AI.

Background:
The use of generative AI is becoming essential for most students, but it can be difficult to know whether they are seeing comparable results from a series of complicated prompts, especially in the more advanced topics of a graduate program. Generative AI will usually provide a very generic answer when you upload a visualization, but as you begin to ask for information more aligned with your goals, the answers can vary with the level of personalization, the platform, and the level of service. Understanding how different the responses are is essential if you want to incorporate a series of prompts that act as a basis for exploring a figure for specific purposes, such as evaluating model assumptions or fit. These prompts can also serve as motivation to learn additional theory or to think critically about the response the AI has provided. It may also be essential to help students train their AI to provide the responses expected for the problem.

Methodology:
We collected information from a follow-along assignment that entailed creating a visualization from a publicly available Kaggle competition dataset pertaining to BMI, smoking, and insurance charges. The students were instructed to use ChatGPT and to upload the visualization, which in general should lead to a generic description of the visualization. The students were then asked to prompt about a trend that was notable at a BMI of 30, and then to ask ChatGPT for features that they should use in a regression model for calculating insurance charges. The responses were submitted and scored by two scorers working independently for, among other aspects, whether the original response noted a trend at a BMI of 30, whether the difference at a BMI of 30 was noted correctly in the second prompt, and the extent of the features provided in the third prompt. Specifically, we evaluated whether the AI noted that there should be an interaction between smoking and BMI, whether BMI should be categorized, and whether the model should explore an interaction between BMI as a category and smoking status. After the assignments were independently scored, the reviewers reconciled any differences.
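The three feature-engineering suggestions the scorers looked for can be sketched as follows. This is a minimal illustration on synthetic data, not the study's Kaggle dataset; the variable names and the BMI-of-30 threshold follow the abstract, while everything else is an assumption.

```python
import numpy as np

# Synthetic stand-ins for the two predictors discussed in the prompts
# (assumed distributions, not the Kaggle data).
rng = np.random.default_rng(0)
n = 500
bmi = rng.normal(30, 6, n)          # continuous BMI
smoker = rng.integers(0, 2, n)      # 0/1 smoker indicator

def engineer_features(bmi, smoker):
    """Build the candidate regression features evaluated in the study's prompts."""
    obese = (bmi >= 30).astype(float)       # BMI categorized at the 30 threshold
    return np.column_stack([
        bmi,                                # raw BMI
        smoker,                             # smoker indicator
        bmi * smoker,                       # prompt: BMI x smoking interaction
        obese,                              # prompt: BMI as a category
        obese * smoker,                     # prompt: category x smoking interaction
    ])

X = engineer_features(bmi, smoker)
print(X.shape)  # (500, 5)
```

The point of the sketch is only that the three prompted features are distinct columns, so a student whose AI suggested only the basic interaction would be missing two of them.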

Contribution:
This study shows a wide variety of differences in the responses, indicating that instructors need to take the time to evaluate how well students can use the technology, and how well their AI has been personalized to the task, if students are to be evaluated with the same available resources.

Findings:
The study indicated that most of the students received only a basic response to the original prompt, while a small percentage saw the trend identified before the second prompt. When specifically prompted, the AI correctly identified the trend; 94% of the responses suggested exploring a basic interaction between BMI and smoking, 43% suggested categorizing BMI, and only 8% advised exploring an interaction between the BMI categories and smoking status.
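Why the rarely suggested category-by-smoking interaction matters can be shown with a small simulation. The data below are assumed for illustration only: charges are generated with a jump for smokers once BMI crosses 30, and an ordinary least-squares fit with and without the interaction features is compared.

```python
import numpy as np

# Simulated charges (assumption, not the Kaggle data): smokers' charges
# jump once BMI crosses 30, the trend the second prompt asked about.
rng = np.random.default_rng(1)
n = 1000
bmi = rng.uniform(18, 45, n)
smoker = rng.integers(0, 2, n).astype(float)
obese = (bmi >= 30).astype(float)
charges = 2000 + 250 * bmi + 12000 * smoker * obese + rng.normal(0, 1000, n)

def fit_rss(X, y):
    """Least-squares fit with an intercept; return the residual sum of squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

base = fit_rss(np.column_stack([bmi, smoker]), charges)                    # raw features only
full = fit_rss(np.column_stack([bmi, smoker, obese, obese * smoker]), charges)
print(full < base)  # True: the interaction model fits markedly better
```

Under these assumed data, the model without the obese-by-smoker term cannot represent the jump at a BMI of 30, so its residual error stays large; a student whose AI never suggested that feature would plausibly stop at the poorer model.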

Recommendations for Practitioners:
Ethical use of AI and the value of prompt engineering are necessary in today's environment. We evaluated how consistent the results of an assignment given over two weeks would be. With the results suggesting a wide variety of responses, educators and trainers need to ensure that students can reach comparable results. This may require more prompting for some students and may also require taking additional time to help some students train their AI to provide results in the manner needed for the problem.

Recommendations for Researchers:
More work needs to be done to continue evaluating different AI platforms and levels of personalization to ensure that educational research on AI is performed in a comparable manner. There is a need to make sure that interventions or suggested uses of AI are conducted with students having equitable resources.

Impact on Society:
The use of AI is quickly becoming a requirement for many individuals to perform at work. Evaluating and determining how students with diverse backgrounds can obtain equitable results from the system is important.

Future Research:
More research needs to be done across platforms. More research is needed to evaluate the balance between generative AI use and the ability to retain material in memory and think critically. Studies should be undertaken to evaluate whether students' perceptions of theory, or of learning specific technologies or concepts, change as they are exposed to diverse ways to prompt AI and the responses generated.
Keywords: generative AI, business analytics, visualizations