What Can Holistic Assessment Still See When AI Enters the Learning Process?
By Aliya Assylbekova
Generative AI is changing education in ways that go beyond plagiarism debates. The issue is whether schools can still see how learning happens.
For a long time, learning depended on productive difficulty. Students reread, misunderstood, revised and slowly built understanding. Research on desirable difficulties and productive failure shows that effort and uncertainty can support deeper learning when they are guided carefully rather than left unsupported (Bjork & Bjork, 2011; Kapur, 2008).
AI enters exactly at this difficult point. Students use it to summarise texts, simplify language and organise arguments before fully working through the material themselves. This is understandable: students often experience AI as faster, emotionally easier and less intimidating than asking teachers or peers for help (Henderson et al., 2025; Bearman et al., 2026).
But this is where assessment becomes fragile.
Traditional assessment relied on a quiet assumption: student work reflected student thinking. That assumption is weaker now. A polished essay may show understanding, careful prompting, partial comprehension supported by AI, or all three. Recent assessment research therefore argues that schools should develop students’ evaluative judgement: the ability to judge AI outputs, their own work and the process that connects them (Bearman et al., 2024).
This does not mean every task should ban AI. A more realistic response is to decide when AI should be absent, limited or openly used, and to make those decisions visible. The AI Assessment Scale suggests that clarity about permitted AI use is more useful than vague rules or after-the-fact detection (Perkins et al., 2024; Furze et al., 2024).
The danger is not AI use itself. The danger is hidden AI use.
When students conceal how they used AI, schools lose the chance to guide them. Evidence on underreporting suggests that learners may hide AI use when they feel it is socially or academically unacceptable (Ling et al., 2025). Then teachers cannot see whether AI supported thinking or replaced it.
This is why holistic assessment becomes more urgent. If AI can help produce the visible product, assessment must capture the thinking behind it: interpretation, revision, verification, judgement and explanation. Research on AI-supported knowledge work shows that AI can improve performance on some tasks but weaken it on others, so students need to learn when to rely on AI, when to question it and when to work independently (Dell’Acqua et al., 2026).
So, yes, students may need schemes for documenting how they use AI, but not mechanical forms that kill curiosity. They need light structures that require them to show judgement. A useful AI-use scheme might ask: What did I try before AI? What did AI suggest? What did I accept, reject or revise?
Such schemes matter because AI can produce fluent but shallow answers. A recent review warns that superficial outputs can lead to superficial learning when students passively accept generated responses, over-rely on them and lose agency (Delikoura et al., 2025). Corbin et al. (2024) raise a similar concern about reading: if students meet a text first through an AI summary, they may gain quick access but bypass the slow interpretation through which understanding develops.
That is why schools need more than rules about AI use.
They need assessment practices that strengthen the habits AI can weaken: inquiry, interpretation, self-regulation and judgement. Finland and Singapore are useful examples here, not as ready-made AI solutions, but because their curricula emphasise transversal competences, inquiry and interpretation over memorisation alone (Finnish National Agency for Education, n.d.; Ministry of Education Singapore, 2026). Estonia makes the AI link more explicit: its definition of digital competence includes critical, responsible technology use (Education Estonia, n.d.).
Systems moving beyond answer-based assessment may be better placed to work with AI, because they ask students to make thinking, judgement and responsibility visible.
Teachers therefore need tasks where both thinking and AI use are visible: supervised drafting, oral explanation, process portfolios, concept maps, source-checking tasks and reflection logs. Students should not merely submit a final answer; they should show how the answer was built.
Parents and students also need a simple rule: AI may support learning after effort, not replace effort itself. Students should first read, attempt, question and draft; then use AI to compare or challenge their thinking. This fits research showing that effortful learning supports durable understanding (Bjork & Bjork, 2011) and that parents shape children’s digital habits at home (Livingstone & Blum-Ross, 2020).
The goal is not to ban AI or celebrate it blindly. The goal is to stop AI from turning learning into a shortcut around thinking.
The central question is: what happens when schools can see polished performance, but not the struggle and judgement through which understanding develops?
Maybe the most important part of learning was never the final answer, but the difficult process of arriving there. Assessment must learn how to value that process again.
References
Bearman, M., Fawns, T., Corbin, T., Henderson, M., Liang, Y., Oberg, G., Walton, J., & Matthews, K. E. (2026). Time, emotions and moral judgements: How university students position GenAI within their study. Higher Education Research & Development, 45(4), 884-898. https://doi.org/10.1080/07294360.2025.2580616
Bearman, M., Tai, J., Dawson, P., Boud, D., & Ajjawi, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education, 49(6), 893-905. https://doi.org/10.1080/02602938.2024.2335321
Bjork, R. A., & Bjork, E. L. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56-64). Worth Publishers.
Corbin, T., Liang, Y., Bearman, M., Fawns, T., Flenady, G., Formosa, P., McKnight, L., Reynolds, J., & Walton, J. (2024). Reading at university in the time of GenAI. Learning Letters, 3, Article 35. https://doi.org/10.59453/ll.v3.35
Delikoura, I., Fung, Y. R., & Hui, P. (2025). From superficial outputs to superficial learning: Risks of large language models in education. arXiv. https://arxiv.org/abs/2509.21972
Dell’Acqua, F., McFowland, E., III, Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K. C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2026). Navigating the jagged technological frontier: Field experimental evidence of the effects of artificial intelligence on knowledge worker productivity and quality. Organization Science, 37(2), 403-423. https://doi.org/10.1287/orsc.2025.21838
Education Estonia. (n.d.). Digital competence: Empowering teachers and students. https://www.educationestonia.org/innovation/digital-competence/
Finnish National Agency for Education. (n.d.). National core curriculum for primary and lower secondary education. https://www.oph.fi/en/education-and-qualifications/national-core-curriculum-primary-and-lower-secondary-basic-education
Furze, L., Perkins, M., Roe, J., & MacVaugh, J. (2024). The AI Assessment Scale (AIAS) in action: A pilot implementation of GenAI-supported assessment. Australasian Journal of Educational Technology, 40(4), 38-55. https://doi.org/10.14742/ajet.9434
Henderson, M., Bearman, M., Chung, J., Fawns, T., Buckingham Shum, S., Matthews, K. E., & de Mello Heredia, J. (2025). Comparing generative AI and teacher feedback: Student perceptions of usefulness and trustworthiness. Assessment & Evaluation in Higher Education. https://doi.org/10.1080/02602938.2025.2502582
Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379-424. https://doi.org/10.1080/07370000802212669
Ling, Y., Kale, A., & Imas, A. (2025). Underreporting of AI use: The role of social desirability bias. SSRN. https://doi.org/10.2139/ssrn.5464215
Livingstone, S., & Blum-Ross, A. (2020). Parenting for a digital future: How hopes and fears about technology shape children’s lives. Oxford University Press. https://doi.org/10.1093/oso/9780190874698.001.0001
Ministry of Education Singapore. (2026). 21st Century Competencies. https://www.moe.gov.sg/education-in-sg/21st-century-competencies
Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The Artificial Intelligence Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Journal of University Teaching and Learning Practice, 21(6). https://doi.org/10.53761/q3azde36
About the Author

Aliya Assylbekova
Aliya Assylbekova is a Senior Manager at the Center for Pedagogical Measurements, Nazarbayev Intellectual Schools, Kazakhstan. With 15 years of experience in education policy and accreditation, she specializes in quality assurance, multilingual assessment, and evidence-based monitoring of student achievement. She has served as a peer-review expert for universities and schools.
A Chevening Scholar, she holds a Master’s degree in Educational Leadership and Innovation from the University of Warwick and is currently pursuing a PhD at Gumilyov Eurasian National University. She is a member of the Steering Committee of the Holistic Assessment SIG of AEA-Europe.
