What Can Holistic Assessment Still See When AI Enters the Learning Process?

By Aliya Assylbekova

Generative AI is changing education in ways that go beyond plagiarism debates. The deeper issue is whether schools can still see how learning happens.

For a long time, learning depended on productive difficulty. Students reread, misunderstood, revised and slowly built understanding. Research on desirable difficulties and productive failure shows that effort and uncertainty can support deeper learning when they are guided carefully rather than left unsupported (Bjork & Bjork, 2011; Kapur, 2008).

AI enters exactly at this difficult point. Students use it to summarise texts, simplify language and organise arguments before fully working through the material themselves. This is understandable: students often experience AI as faster, emotionally easier and less intimidating than asking teachers or peers for help (Henderson et al., 2025; Bearman et al., 2026).

But this is where assessment becomes fragile.

Traditional assessment relied on a quiet assumption: student work reflected student thinking. That assumption is weaker now. A polished essay may show understanding, careful prompting, partial comprehension supported by AI, or all three. Recent assessment research therefore argues that schools should develop students’ evaluative judgement: the ability to judge AI outputs, their own work and the process that connects them (Bearman et al., 2024).

This does not mean every task should ban AI. A more realistic response is to decide when AI should be absent, limited or openly used, and to make those decisions visible. The AI Assessment Scale suggests that clarity about permitted AI use is more useful than vague rules or after-the-fact detection (Perkins et al., 2024; Furze et al., 2024).

The danger is not AI use itself. The danger is hidden AI use.

When students conceal how they used AI, schools lose the chance to guide them. Evidence on underreporting suggests that learners may hide AI use when they feel it is socially or academically unacceptable (Ling et al., 2025). Then teachers cannot see whether AI supported thinking or replaced it.

This is why holistic assessment becomes more urgent. If AI can help produce the visible product, assessment must capture the thinking behind it: interpretation, revision, verification, judgement and explanation. Research on AI-supported knowledge work shows that AI can improve performance on some tasks but weaken it on others, so students need to learn when to rely on AI, when to question it and when to work independently (Dell’Acqua et al., 2026).

So, yes, students may need schemes for recording how they use AI, but not mechanical forms that kill curiosity. They need light structures that require them to show judgement. A useful AI-use scheme might ask: What did I try before AI? What did AI suggest? What did I accept, reject or revise?

Such schemes matter because AI can produce fluent but shallow answers. A recent review warns that superficial outputs can lead to superficial learning when students passively accept generated responses, over-rely on them and lose agency (Delikoura et al., 2025). Corbin et al. (2024) raise a similar concern about reading: if students meet a text first through an AI summary, they may gain quick access but bypass the slow interpretation through which understanding develops.

That is why schools need more than rules about AI use.

They need assessment practices that strengthen the habits AI can weaken: inquiry, interpretation, self-regulation and judgement. Finland and Singapore are useful not as ready-made AI solutions, but because they emphasise transversal competences, inquiry and interpretation more than memorisation alone (Finnish National Agency for Education, n.d.; Ministry of Education Singapore, 2026). Estonia makes the AI link more explicit: digital competence includes critical, responsible technology use (Education Estonia, n.d.).

Systems moving beyond answer-based assessment may be better placed to work with AI, because they ask students to make thinking, judgement and responsibility visible.

Teachers therefore need tasks where AI use is visible: supervised drafting, oral explanation, process portfolios, concept maps, source-checking tasks and reflection logs. Students should not merely submit a final answer; they should show how the answer was built.

Parents and students also need a simple rule: AI may support learning after effort, not replace effort itself. Students should first read, attempt, question and draft; then use AI to compare or challenge their thinking. This fits research showing that effortful learning supports durable understanding, and research showing that parents shape children’s digital habits at home (Bjork & Bjork, 2011; Livingstone & Blum-Ross, 2020).

The goal is not to ban AI or celebrate it blindly. The goal is to stop AI from turning learning into a shortcut around thinking.

The central question is: what happens when schools can see polished performance, but not the struggle and judgement through which understanding develops?

Maybe the most important part of learning was never the final answer, but the difficult process of arriving there. Assessment must learn how to value that process again.

References

Bearman, M., Fawns, T., Corbin, T., Henderson, M., Liang, Y., Oberg, G., Walton, J., & Matthews, K. E. (2026). Time, emotions and moral judgements: How university students position GenAI within their study. Higher Education Research & Development, 45(4), 884-898. https://doi.org/10.1080/07294360.2025.2580616

Bearman, M., Tai, J., Dawson, P., Boud, D., & Ajjawi, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education, 49(6), 893-905. https://doi.org/10.1080/02602938.2024.2335321

Bjork, R. A., & Bjork, E. L. (2011). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher, R. W. Pew, L. M. Hough, & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56-64). Worth Publishers.

Corbin, T., Liang, Y., Bearman, M., Fawns, T., Flenady, G., Formosa, P., McKnight, L., Reynolds, J., & Walton, J. (2024). Reading at university in the time of GenAI. Learning Letters, 3, Article 35. https://doi.org/10.59453/ll.v3.35

Delikoura, I., Fung, Y. R., & Hui, P. (2025). From superficial outputs to superficial learning: Risks of large language models in education. arXiv. https://arxiv.org/abs/2509.21972

Dell’Acqua, F., McFowland, E., III, Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K. C., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2026). Navigating the jagged technological frontier: Field experimental evidence of the effects of artificial intelligence on knowledge worker productivity and quality. Organization Science, 37(2), 403-423. https://doi.org/10.1287/orsc.2025.21838

Education Estonia. (n.d.). Digital competence: Empowering teachers and students. https://www.educationestonia.org/innovation/digital-competence/

Finnish National Agency for Education. (n.d.). National core curriculum for primary and lower secondary education. https://www.oph.fi/en/education-and-qualifications/national-core-curriculum-primary-and-lower-secondary-basic-education

Furze, L., Perkins, M., Roe, J., & MacVaugh, J. (2024). The AI Assessment Scale (AIAS) in action: A pilot implementation of GenAI-supported assessment. Australasian Journal of Educational Technology, 40(4), 38-55. https://doi.org/10.14742/ajet.9434

Henderson, M., Bearman, M., Chung, J., Fawns, T., Buckingham Shum, S., Matthews, K. E., & de Mello Heredia, J. (2025). Comparing generative AI and teacher feedback: Student perceptions of usefulness and trustworthiness. Assessment & Evaluation in Higher Education. https://doi.org/10.1080/02602938.2025.2502582

Kapur, M. (2008). Productive failure. Cognition and Instruction, 26(3), 379-424. https://doi.org/10.1080/07370000802212669

Ling, Y., Kale, A., & Imas, A. (2025). Underreporting of AI use: The role of social desirability bias. SSRN. https://doi.org/10.2139/ssrn.5464215

Livingstone, S., & Blum-Ross, A. (2020). Parenting for a digital future: How hopes and fears about technology shape children’s lives. Oxford University Press. https://doi.org/10.1093/oso/9780190874698.001.0001

Ministry of Education Singapore. (2026). 21st Century Competencies. https://www.moe.gov.sg/education-in-sg/21st-century-competencies

Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The Artificial Intelligence Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Journal of University Teaching and Learning Practice, 21(6). https://doi.org/10.53761/q3azde36

About the Author

Aliya Assylbekova

Aliya Assylbekova is a Senior Manager at the Center for Pedagogical Measurements, Nazarbayev Intellectual Schools, Kazakhstan. With 15 years of experience in education policy and accreditation, she specialises in quality assurance, multilingual assessment, and evidence-based monitoring of student achievement. She has served as a peer-review expert for universities and schools.

A Chevening Scholar, she holds a Master’s degree in Educational Leadership and Innovation from the University of Warwick and is currently pursuing a PhD at Gumilyov Eurasian National University. She is a member of the Steering Committee of the Holistic Assessment SIG of AEA-Europe.
