Empowering medical students with AI writing co-pilots: design and validation of AI self-assessment toolkit
BMC Medical Education volume 25, Article number: 159 (2025)
Abstract
Background and objectives
Assessing and improving academic writing skills is a crucial component of higher education. To support students in this endeavor, a comprehensive self-assessment toolkit was developed to provide personalized feedback and guide their writing improvement. The current study aimed to rigorously evaluate the validity and reliability of this academic writing self-assessment toolkit.
Methods
The development and validation of the academic writing self-assessment toolkit involved several key steps. First, a thorough review of the literature was conducted to identify the essential criteria for authentic assessment. Next, an analysis of medical students' reflection papers was undertaken to gain insights into their experiences using AI-powered tools for writing feedback. Based on these initial steps, a preliminary version of the self-assessment toolkit was devised. An expert focus group discussion was then convened to refine the questions and content of the toolkit. To assess content validity, the toolkit was evaluated by a panel of 22 medical student participants. They were asked to review each item and provide feedback on the relevance and comprehensiveness of the toolkit for evaluating academic writing skills. Face validity was also examined, with the students assessing the clarity, wording, and appropriateness of the toolkit items.
Results
The content validity evaluation revealed that 95% of the toolkit items were rated as highly relevant, and 88% were deemed comprehensive in assessing key aspects of academic writing. Minor wording changes were suggested by the students to enhance clarity and interpretability. The face validity assessment found that 92% of the items were rated as unambiguous, with 90% considered appropriate and relevant for self-assessment. Feedback from the students led to the refinement of a few items to improve their clarity in the context of the Persian language. The robust reliability testing demonstrated the consistency and stability of the academic writing self-assessment toolkit in measuring students' writing skills over time.
Conclusion
The comprehensive evaluation process has established the academic writing self-assessment toolkit as a robust and credible instrument for supporting students' writing improvement. The toolkit's strong psychometric properties and user-centered design make it a valuable resource for enhancing academic writing skills in higher education.
Introduction
Writing is frequently regarded as the most challenging language skill due to genre, style, and language-related complexities [1]. This challenge is often greater for multilingual and second language (L2) learners, as they must navigate both linguistic and cognitive demands simultaneously [2]. Medical students, in particular, face unique challenges in academic writing due to the specialized vocabulary, precise language, and structured formats required in scientific writing [3]. For multilingual students, these challenges are further compounded by language-specific issues, such as limited vocabulary, grammatical difficulties, and the struggle to convey complex ideas clearly in a second language [4]. Multilingual learners often struggle to interpret feedback provided in academic English and may have difficulty distinguishing between appropriate and inappropriate uses of academic language [5]. Given these unique needs, traditional assessments and general-purpose automated tools often fail to provide the level of support required to effectively bridge these gaps.
Despite the introduction of mandatory academic writing classes in many medical schools to address these challenges, traditional instruction and assessment methods often prove insufficient in developing students' proficiency in scientific writing [6]. With the recent introduction of AI-based tools like ChatGPT, the landscape of academic writing has undergone a significant transformation [7]. AI-powered writing assistants offer instant feedback, coherent text generation, and vocabulary suggestions, allowing students to produce high-quality texts quickly and easily [8]. However, these tools also raise concerns regarding academic integrity and the authenticity of student work. Conventional evaluation methods may no longer be effective in distinguishing between genuine student efforts and AI-generated content, posing a challenge for educators in accurately assessing students' academic capabilities [8].
The implications of AI for traditional writing assessments necessitate a shift towards more authentic evaluation methods that capture students' developmental processes over time rather than static final products. Bridgeman [9] proposes three potential strategies: avoiding AI altogether, modifying assessments to outsmart AI, or adapting educational practices to incorporate AI's presence. Given the rapid evolution of AI technologies, avoiding or outrunning AI may only serve as a temporary solution [10]. Instead, educational practices must embrace AI integration and focus on process-based assessments that evaluate students' learning journeys, critical thinking, and problem-solving abilities [11].
AI tools hold transformative potential for academic and clinical writing for medical students by enhancing clarity, coherence, and precision—critical skills for effective patient care and professional communication [12]. AI-driven platforms, such as natural language processing tools, provide real-time grammar, syntax, and organization feedback, enabling students to refine their documentation practices and adhere to medical writing standards [13]. These tools can also assess clinical notes' logical flow and content accuracy, which is vital for creating clear and actionable patient records [14]. Moreover, AI technologies such as automated citation managers and literature summarization algorithms can streamline the process of academic writing, allowing students to efficiently integrate evidence-based research into essays, case reports, or research articles [15]. By cultivating these writing competencies, AI supports the development of structured and concise documentation, a cornerstone of effective patient care and interprofessional communication [16]. Additionally, the iterative feedback loop provided by AI fosters continuous improvement in medical writing, helping students identify recurring weaknesses and progressively build confidence in their skills [17]. This integration of AI into medical education ensures that future healthcare professionals are adept at producing high-quality written content that directly impacts clinical outcomes and the broader medical community [17].
For medical students, in particular, developing robust writing skills is crucial, as it directly impacts their ability to publish research, communicate clinical findings, and contribute to the broader medical field [18]. On the other hand, the integration of AI in academic and clinical settings is inevitable. To prepare medical students for real-world challenges and foster authentic learning, they must learn how to effectively leverage AI tools as part of their academic and professional development [19, 20]. This approach ensures that medical students are equipped not only with the technical proficiency to utilize AI but also the critical thinking and evaluative judgment necessary to discern when and how to use these tools responsibly in medical practice [5, 18]. AI tools, while beneficial in providing immediate feedback and suggesting structural improvements, risk fostering over-reliance if not balanced with traditional learning strategies [5]. Therefore, it is essential to design assessment frameworks that leverage the strengths of AI while promoting critical thinking, creativity, and ethical considerations. The current study addresses this need by developing and validating an AI-assisted self-assessment toolkit specifically tailored for medical students. This toolkit aims to enhance students' writing skills by providing structured feedback that encourages self-reflection, goal setting, and continuous improvement [21].
The imperative of recognizing the evolving nature of assessment
Traditional assessments have been criticized for being too standardized, giving all students the same tasks or questions regardless of their knowledge, abilities, experiences, and cultural backgrounds [22]. Furthermore, traditional assessments are reportedly inauthentic representations of student capabilities [23, 24]. Traditional writing assessments, for example, are often criticized for not being truly authentic reflections of a person's writing skills. In the real world, people can access many resources and tools to aid them in writing: they can browse the internet, conduct research, use the ideas of others, share drafts with colleagues or friends, receive feedback, and make revisions based on that feedback. However, in final exams or standardized closed-book tests such as the IELTS and TOEFL, test takers are often required to work in isolation, without access to materials or textbooks, and are not allowed to use any of the tools or resources that are naturally at their disposal in the real world [11]. This can be a significant disadvantage for many individuals, as such conditions do not accurately reflect their writing abilities in a real-world context. By limiting students to working in isolation without access to these tools, traditional writing assessments fail to capture students' true abilities. Whatever the differences between fields, memorization is clearly not the ultimate learning aim in courses such as academic or scientific writing. Additionally, the way evaluation and feedback are currently conducted positions students as passive consumers of input rather than teaching them how to recognize the specific criteria that apply in different contexts [25].
Moreover, traditional assessments are outdated as they evaluate skills that are becoming obsolete. For example, in a writing assessment, students might still be asked to write essays by hand, even though most writing in the real world is done on computers. In the past, academic writing was primarily done by hand or on typewriters, so criteria like legibility and indentation were important factors in determining the quality of a paper. However, with the prevalence of computers and word-processing software today, these criteria are no longer as relevant. Students now have access to tools like spell check, grammar check, and formatting options that can easily fix issues with legibility and indentation. Consequently, many teachers and professors now expect students to type their papers, rendering these outdated criteria unnecessary. By including criteria like legibility and indentation in academic writing rubrics, educators may inadvertently penalize students for issues that can be resolved with a simple command.
To address these issues, instructors should include modern writing practices that require students to research a topic, collaborate with peers, or revise their work based on feedback. By providing students with opportunities to demonstrate their writing skills in a more realistic setting, assessments can better reflect their true abilities and better prepare them for success in the real world.
However, even with access to modern tools like grammar checkers and automated writing evaluation systems, students may still face challenges in developing critical thinking and independent writing skills due to the limitations of these technologies.
Existing automated writing tools, such as Grammarly and other Automated Writing Evaluation (AWE) systems, primarily focus on surface-level corrections, including grammar, punctuation, and syntax errors [26]. While these tools can be useful for identifying common language issues, research has shown that they often encourage students to accept suggested changes without fully understanding the reasoning behind them [27]. This passive reliance can inhibit students’ ability to engage critically with their own work and hinder the development of independent writing skills [10]. Moreover, AWE tools may perpetuate errors due to the limitations of their algorithms, which are not always able to provide context-specific feedback or recognize discipline-specific conventions accurately [28].
For multilingual students, this lack of nuanced, context-aware feedback can be particularly detrimental, as it prevents them from addressing deeper issues related to language use, coherence, and academic argumentation [29]. Our AI-assisted self-assessment toolkit addresses these limitations by integrating structured reflection prompts and promoting active evaluation of AI-generated feedback. This approach encourages students to critically analyze the feedback they receive, compare it with their own understanding, and make informed decisions about its applicability to their writing. By fostering a more interactive and reflective process, the toolkit helps students develop their evaluative judgment and reduces the risk of passive acceptance of AI suggestions, ultimately supporting a deeper engagement with their writing.
Why is ‘Authentic Assessment’ truly indispensable in the era of AI writing co-pilots?
In education, AI tools pose challenges related to accuracy, reliability, and plagiarism [10, 30]. Issues such as biased data, limited knowledge updates, and the potential for generating incorrect information raise concerns about academic integrity [30]. There is also a risk of overreliance on AI, which can impact students' critical thinking skills and enable them to bypass plagiarism detection software [19]. The tool's ability to complete writing assignments with minimal effort may lead to learning loss and reduced creativity and critical thinking skills in students [5].
Furthermore, AI can create original content that is difficult to distinguish from students' work, presenting challenges to teachers in evaluating assignments [31]. The tool's use may hinder important goals of writing pedagogy, such as fostering creativity, critical thinking, and clear communication [32]. Teachers may struggle to discern between students' authentic work and AI-generated text, potentially impacting students' ability to effectively link written texts to real-life contexts and solve practical problems [33].
Another significant pitfall of overusing AI in academic settings is the potential for information and cognitive overload. Due to the tool's ability to quickly generate vast amounts of text, students and researchers may struggle to process and analyze the information effectively, leading to confusion and difficulty synthesizing complex ideas [34]. This overreliance on AI for text-generation tasks can hinder students' ability to develop their research and writing skills, ultimately limiting their academic growth and intellectual development [35]. Therefore, by implementing authentic assessment methods that prioritize original thinking, thorough research, and in-depth analysis, educators can help mitigate the negative impacts of overreliance on tools like ChatGPT in academic writing and foster academic integrity, critical thinking skills, and research quality in educational settings.
Authentic assessment is a method that focuses on evaluating students' understanding and abilities through real-world tasks, emphasizing the process rather than just the end product. Authentic assessment stands out for its process-oriented nature. Unlike traditional assessments that often prioritize rote memorization or regurgitation of facts, students are encouraged to participate in higher-order thinking activities, including analysis, synthesis, and evaluation through authentic assessment. Through such tasks, students gain a deeper understanding of the subject matter [36].
Incorporating authentic assessment in academic writing can involve tasks such as research papers with iterative feedback loops from peers or instructors. These collaborative writing projects require multiple drafts, revisions, or reflective journals where students document their writing processes. These methods assess the final written product and provide valuable insights into how students develop their ideas, structure their arguments, and incorporate evidence to support their claims. Educators can nurture essential skills beyond surface-level content generation by focusing on how students navigate complex writing tasks rather than just evaluating the final output.
In the wake of AI's evolution, incorporating authenticity into assessment has become more important than ever. AI systems often struggle to understand and integrate human qualities like cultural sensitivity, interpersonal skills, creativity, moral judgment, social responsibility, and decision-making abilities. Lodge, Yang [37] argue that while generative AI can produce specific artifacts effectively, it cannot replicate the unique human learning experience with its inherent challenges and discoveries. They suggest that although AI can simulate this learning journey, it falls short of fully replicating it, and they emphasize the importance of assessing learning processes to track this journey, ensuring that assessment remains relevant and trustworthy rather than focusing solely on outcomes.
Therefore, this research investigates the potential applications of authentic assessment in the academic writing self-evaluation checklist. The study is grounded in principles established by Villarroel, Bloxham [11], who identified 13 essential characteristics grouped into three categories: realism, cognitive challenge, and authentic evaluative judgment. According to Bosco and Ferns [38], realism—defined as depicting situations that might be encountered outside universities—is the first principle that distinguishes authentic assessment.
In a five-year study, O’Neill and Short [39] analyzed the free-text responses of 93,743 students to determine how institutions can improve engagement in learning. Although students did not describe their experiences using the term “authentic learning,” their feedback frequently referenced the theme of “work.” The themes of “real world,” “real life,” and “group” were closely connected to “work,” as students expressed a desire for instruction that was relevant, practical, and aligned with their careers. Authentic audiences play a crucial role in creating meaningful, engaging learning opportunities by bridging classroom knowledge and real-life application [40].
According to Thorburn [41], the second principle introduces a cognitive challenge that requires students to apply higher-order cognitive skills to transform knowledge into something new. This principle aligns with experiential learning, which emphasizes practical and reflective activities over rote memorization. This approach to learning fosters a more profound comprehension of academic subjects [42]. Salinas-Navarro, Vilalta-Perdomo [43] suggest that assessments evaluating higher-order cognitive skills—such as analysis, creation, and evaluation—have the potential to enable more meaningful learning experiences for students. Furthermore, these assessments can challenge AI tools, making it more difficult for them to handle such complex evaluations.
Graduates must also develop evaluative judgment in order to gauge the caliber of their own and other people's work. According to Tai, Ajjawi [25], evaluative judgment enables students to predict, monitor, and improve the quality of their own work, thereby promoting self-regulated learning. This skill is essential for lifelong learning and not limited to specific courses [25, 44]. Tai, Ajjawi [25] outline two interdependent parts of evaluative judgment: first, knowing what constitutes high-quality work by engaging with a standard, whether explicit or implicit, and second, applying this knowledge to assess one's own or others' work.
Egodawatte [45] states, “One defining characteristic of independent learners is their ability to self-assess,” which can be facilitated using a rubric. According to Broad [46], using rubrics as a scaffold has significantly improved writing assessment and teaching. He suggests that rubrics may have been more beneficial than any other concept or technology.
To sum up, authentic assessment is more effective than conventional approaches since it is dynamic and adaptable [47], qualities that are essential in the era of AI writing co-pilots. It presents students with intricate intellectual challenges grounded in various criteria and prioritizes their development toward mastery in realistic and context-specific scenarios.
Given the importance of realism, cognitive challenge, and evaluative judgment (outlined by Villarroel, Bloxham [11]) in fostering deeper comprehension and higher-order thinking skills, this study set out to develop and integrate a self-assessment toolkit into our compulsory academic writing courses.
The pedagogical philosophy behind self-assessment
In response to the earlier concerns about the value of process-oriented assessment, this study incorporated a self-assessment mechanism to further examine its pedagogic value for academic writing courses. This approach is based on the assumption that learning progresses through evaluating one’s and others’ performances against clear criteria [48].
In this context, self-assessment in this study refers to the process by which students reflect on their learning, performance, and progress, typically through reflective exercises or by evaluating their work. The transition from behaviorism to constructivism, from summative to formative assessment, and from a product-oriented to a process-oriented approach to writing has paved the way for exploring self-assessment as an effective strategy in higher education [49]. In the ESL classroom, self-assessment has been increasingly recognized as an alternative or non-traditional assessment form [50]. According to [51], self-assessment is integral to education, fostering students' ability to assess their outcomes and understand the impact on their learning. Nielsen [52] suggests that self-assessment promotes reflection and metacognition in ESL writing classrooms, transferring the responsibility from the teacher to the students. Emphasizing self-assessment is crucial in aligning classroom practices with the goals of higher education, which aim to produce independent and critical thinkers [53].
Research questions:
1. Which evaluative criteria should the AI self-assessment writing toolkit include?
2. To what extent do experts consider the questions in the self-assessment toolkit to be important and comprehensive for evaluating students' writing skills?
3. To what extent does the academic writing self-assessment toolkit demonstrate content validity, face validity, and reliability in measuring students' academic writing skills?
4. To what extent does the academic writing self-assessment toolkit influence students' self-evaluations of their writing abilities compared to self-assessments conducted without the toolkit?
Methodology
Design and method
The current study sought to devise and validate a toolkit for medical students to improve their writing skills by providing diverse self-assessment prompts and guidelines. This toolkit was developed and evaluated for reliability and validity using a mixed-method approach, incorporating both qualitative and quantitative methodologies. Data from medical students’ reflections and experts’ focus group discussions were analyzed qualitatively to inform the construction of the AI-assisted self-assessment toolkit. In the quantitative phase, content and face validity analyses were conducted to confirm the tool’s validity. Finally, the study employed a comparative design to evaluate the impact of the toolkit on students' self-evaluations of their writing abilities. This involved examining self-assessment outcomes both with and without the toolkit to determine its influence on students' reflective practices and perceptions of their writing skills. Quantitative measures were complemented by qualitative insights to comprehensively understand the toolkit’s effectiveness in enhancing self-regulated learning.
Population and sampling
The study employed a multi-stage sampling approach involving two distinct participant groups:
(1) Initial Participant Group: The initial group was recruited using a purposive sampling strategy. The researchers issued a call for participation to students enrolled in a compulsory academic writing course in Fall 2023. From this pool, 22 medical students, aged 20 to 22 years, volunteered to reflect on their experiences using AI for feedback on their writing assignments. All participants were third-year students at a medical university and English as a Foreign Language (EFL) learners with an intermediate level of English proficiency, as determined by their university entrance exam scores. Enrollment in the 3-unit academic writing course required prior completion of foundational English courses, including General English 1, General English 2, and an English for Specific Purposes (ESP) course tailored specifically for medical students. This prerequisite coursework ensured that all students possessed a baseline English proficiency level adequate for engaging with the advanced academic writing skills taught in the course.
The purposive sampling approach allowed researchers to target students who had direct experience using AI writing co-pilots, providing valuable insights into their perceptions, learning experiences, and the influence of AI-powered assistance on their writing.
During the final stage of the study, the same group of medical students was asked to evaluate the self-assessment writing toolkit. This process helped determine the toolkit's face validity, clarity, and effectiveness in enabling students to use AI writing co-pilots responsibly.
(2) Expert Panel: A panel of six university experts was also purposefully selected to explore the themes and patterns present in the student reflections and to help craft questions aligned with authentic assessment criteria. The experts were chosen based on their proficiency in the following areas:
a) AI applications in education and writing assessment.
b) Designing writing prompts and rubrics.
c) Implementing effective feedback practices to promote student learning and growth in writing skills.
d) Addressing the unique challenges and needs of medical students in developing their writing abilities.
The multi-faceted sampling approach ensured that the perspectives of both students and subject matter experts were thoroughly incorporated into the development and validation of the AI self-assessment writing toolkit.
Development process of the AI self-assessment writing toolkit
The multi-stage process of devising self-assessment questions for the medical students in the academic writing course followed a systematic approach that took into account various factors, such as the students' engagement with AI for feedback, their reflection on the feedback received, and improvement in writing skills, critical thinking, and analysis.
Stage 1: designing authentic writing prompts
This initial stage involved thorough literature research to determine the fundamentals of authentic assessment necessary to inject into the writing course [11]. In the rubric development process, defining the criteria and standards is a critical step [53]. Considering ‘realism,’ ‘cognitive challenge,’ and ‘authentic evaluative judgment,’ we designed our course activities and writing prompts to prepare our medical students for the demands of their future careers. We used easy real-world medical scenarios and case studies as writing prompts to immerse students in the practical aspects of medicine. For example, we asked students to write on a current medical issue, analyze the ethical implications of a particular medical procedure, or compare and contrast two disorders.
One of the prompts designed for our medical students was the following: "Eating disorders such as anorexia and bulimia can have devastating effects on teenagers, impacting both physical and mental health. Explore the key differences and similarities between anorexia and bulimia. Additionally, examine the role of social media in exacerbating these disorders among teenagers, discussing how unrealistic beauty standards can contribute to the development and perpetuation of these disorders.”
Evaluating Realism: To assess the realism of this writing prompt, it is crucial to assess the accuracy and depth of information provided about anorexia and bulimia. Student essays should comprehensively understand the key distinctions and similarities between these two eating disorders, including their symptoms, causes, and potential treatments. Realism also involves the utilization of appropriate terminology and evidence-based research to substantiate arguments. Cognitive Challenge: The cognitive challenge is determined by the complexity and depth of analysis required to explore the differences and similarities between anorexia and bulimia, as well as the influence of social media in exacerbating these disorders among teenagers. Students should be encouraged to critically evaluate the impact of unrealistic beauty standards propagated by social media and devise potential strategies for prevention and intervention.
Authentic Evaluative Judgment: Authentic evaluative judgment in the writing prompt necessitates critically assessing the information provided, drawing logical conclusions, and proposing insightful recommendations. Students must show a firm grasp of the complexities surrounding anorexia and bulimia, as well as the broader societal influences contributing to these disorders. Authentic evaluative judgment is also exemplified in the ability to suggest thoughtful solutions and interventions to mitigate social media's detrimental effects on teenagers' mental health and body image.
Stage 2: medical student reflections on AI feedback
After preparing eight writing prompts (see samples in Appendix A) covering genres such as argumentative, cause-and-effect, problem–solution, and analytical essays —genres aligned with their final exam—in stage 2, we instructed students to initially compose their short papers (with a maximum length of 400 words) without using AI assistance. They were then asked to engage with AI by inputting the following prompt into the dialogue box: "Please review and revise the following essay, addressing any necessary corrections related to grammar, punctuation, vocabulary, and coherence. Additionally, provide detailed rationales for each modification or suggestion to facilitate comprehension of the editing process." The purpose of this prompt was to underscore the significance of providing feedback to students that is not only timely but also accompanied by suitable preparation that enables students to use it effectively. According to Black and Wiliam [54], by providing detailed feedback on the rubric, students can evaluate their areas of strength and weakness, pinpoint areas in which they need to progress, and create their growth strategies. They argue that feedback is a continual process that calls for several opportunities for students to receive it and the ability to change their attitudes about themselves and their behavior rather than a one-time solution. Our medical students were particularly instructed to submit the following:
1) The original, unedited essay.
2) The AI-edited version of the essay.
3) A structured reflection that included:
- Describing (in Persian or English) the key changes and suggestions made by the AI system.
- Explaining their understanding of the AI's rationale for each edit or recommendation.
- Discussing whether they agreed or disagreed with the AI's feedback and why.
- Reflecting on how the AI's input influenced their writing process and the final essay.
- Identifying any ambiguities or areas of confusion in the AI's explanations.
This structured reflection exercise served two purposes: it enabled researchers to gauge the students' understanding of the AI feedback and their capacity to critically evaluate the suggestions. Additionally, it provided valuable insights into the students' perceptions, learning experiences, and the overall impact of AI-powered writing assistance.
At the end of the 17-week academic semester, the researchers gathered feedback from 22 medical students on the AI-generated feedback they received for their eight assignments. This amounted to a total of 176 reflection papers.
Stage 3: thematic analysis of student reflections
The student reflection papers were thoroughly analyzed to gain insights into their learning experiences, perceptions of the AI feedback, and the impact on their writing skills. By closely reviewing the 176 reflections (see the sample in Appendix B), the researchers identified recurring themes, patterns, and valuable insights shared by the students regarding their use of AI writing co-pilots. This qualitative data provided a window into the students' thought processes, challenges faced, and areas of growth throughout the writing assignments. The thematic analysis of the reflections allowed the researchers to understand the students’ perspectives in depth. Key themes and insights were extracted, which directly informed the development of the first draft of the self-assessment writing toolkit. This iterative process ensured that the toolkit effectively addressed the students' needs and concerns, empowering them to leverage AI tools responsibly while cultivating essential academic writing skills.
Stage 4: expert focus group
In this stage, six expert participants were chosen using a purposive sampling strategy. According to Colton and Covert [55], focus groups are a helpful way to get input while a new instrument is being developed. The literature states that the focus group's size is determined by the study's scope and the available resources [56]. Bloor [57] suggests that the ideal size for focus groups ranges between six and eight participants.
Two focus group discussion sessions, each lasting about three hours (with a 20-min break for refreshments), were conducted with a panel of experts to explore the themes and patterns identified in the student reflections and refine the toolkit questions according to authentic assessment criteria. At the conclusion of the sessions, the experts were assured that they would receive the transcriptions for final review and approval.
Stage 5: thematic analysis and expert validation
The next step involved transcribing the discussion sessions. According to Patrick, Burke [58], the participants must review and verify the accuracy of the transcribed content as a part of the validation process. Consequently, once the discussion was transcribed, it was shared with each participant for review. The experts carefully read the transcript, made the necessary corrections, and returned it to the researchers.
During this stage, the transcripts were closely read and indexed for easier interpretation. The researchers used the indexed data to refine the writing toolkit, resulting in an updated version. The revised evaluative criteria checklist of the self-assessment toolkit (available in the supplementary file) was then sent to the same experts who participated in the focus group sessions for their final evaluation. The experts were asked to rank the significance of each question in the toolkit on a scale of 0 to 5, with 5 indicating high significance and 0 denoting unimportance.
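To show how such rankings can be aggregated, the short Python sketch below computes the per-question mean significance score and the share of experts rating a question 4 or 5; the ratings in the sketch are hypothetical placeholders, since the actual expert data are reported only in aggregate in Table 4.

import numpy as np

# Hypothetical significance rankings from the six experts (rows) for the
# seven toolkit questions (columns), on the 0-5 scale described above.
expert_ratings = np.array([
    [5, 5, 4, 5, 5, 4, 5],
    [5, 4, 5, 5, 4, 5, 5],
    [4, 5, 5, 5, 5, 5, 4],
    [5, 5, 5, 4, 5, 5, 5],
    [5, 5, 4, 5, 5, 5, 5],
    [5, 4, 5, 5, 4, 5, 5],
])

# Mean significance score per question and the share of experts rating it 4 or 5.
mean_scores = expert_ratings.mean(axis=0)
share_high = (expert_ratings >= 4).mean(axis=0)

for q, (m, s) in enumerate(zip(mean_scores, share_high), start=1):
    print(f"Question {q}: mean = {m:.2f}, rated 4 or 5 by {s:.0%} of experts")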
Stage 6: student evaluation of the academic writing toolkit
In the final stage of development, the writing self-assessment toolkit was evaluated for validity and reliability by the students. The 22 medical student participants were asked to carefully review each item and question included in the academic writing toolkit. They were instructed to assess the relevance and comprehensiveness of the toolkit in evaluating key aspects of academic writing skills. The students provided feedback on whether the current items adequately covered the essential elements necessary for assessing writing proficiency. Where gaps were identified, the students suggested additional content that could be incorporated to enhance the overall scope and validity of the toolkit.
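As an illustration of how such panel ratings can be summarized, the following Python sketch computes an item-level content validity index (I-CVI), i.e., the proportion of raters judging an item relevant, together with a scale-level average. The 4-point relevance scale, the index itself, and the data are assumptions made for illustration, as the article reports percentage agreement without naming a specific index.

import numpy as np

# Hypothetical relevance ratings: 22 student raters x 7 toolkit items,
# on a 4-point scale (1 = not relevant ... 4 = highly relevant).
rng = np.random.default_rng(seed=42)
ratings = rng.integers(low=2, high=5, size=(22, 7))  # illustrative data only

# Item-level content validity index (I-CVI): proportion of raters who
# judged an item relevant (a rating of 3 or 4).
i_cvi = (ratings >= 3).mean(axis=0)

# Scale-level index (S-CVI/Ave): the mean of the item-level indices.
s_cvi_ave = i_cvi.mean()

for item_number, value in enumerate(i_cvi, start=1):
    print(f"Item {item_number}: I-CVI = {value:.2f}")
print(f"S-CVI/Ave = {s_cvi_ave:.2f}")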
In addition to reviewing content, the medical student participants evaluated each item's clarity, wording, and appropriateness in the academic writing toolkit. They noted whether the items were easy to understand and interpret as intended by the toolkit developers. The students identified any questions or statements that were ambiguous or confusing and provided recommendations for rewording or reformatting those problematic items. This feedback was crucial for improving the face validity of the self-assessment tool.
To assess the reliability of the academic writing toolkit, the medical student participants completed the full set of items twice, with a 2-week interval between the two administrations. They were instructed to carefully respond to each item to the best of their ability during both completions. After the second round, the students compared their responses between the two time points and noted any discrepancies. They also reflected on factors that may have influenced the consistency of their self-assessments, such as changes in their writing skills or interpretation of the items. This test–retest reliability data provided valuable insights for refining the toolkit and improving its consistency as a self-assessment instrument. The feedback and data gathered from the medical student participants were instrumental in evaluating the content validity, face validity, and reliability of the academic writing toolkit. Their input helped the research team identify areas for improvement and make necessary revisions to strengthen the overall quality and utility of the self-assessment toolkit.
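A simple way to quantify the consistency between the two administrations is to correlate each student's scores across the two time points, as in the minimal sketch below. The data and variable names are hypothetical, and the Pearson correlation is only one common test-retest estimate (an intraclass correlation coefficient would be another reasonable choice), since the article does not specify the statistic used.

import numpy as np
from scipy import stats

# Hypothetical total toolkit scores for the 22 students at the first
# administration (time1) and two weeks later (time2).
time1 = np.array([28, 31, 25, 33, 29, 27, 30, 32, 26, 34, 29,
                  31, 28, 30, 27, 33, 25, 29, 32, 26, 30, 28])
time2 = np.array([29, 30, 26, 34, 28, 27, 31, 31, 27, 33, 30,
                  32, 27, 29, 28, 32, 26, 30, 31, 27, 29, 29])

# Pearson correlation between the two administrations as a simple
# test-retest reliability estimate.
r, p_value = stats.pearsonr(time1, time2)
print(f"Test-retest reliability: r = {r:.2f} (p = {p_value:.4f})")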
Stage 7: comparative evaluation
To address the fourth research question, which examined the influence of the academic writing self-assessment toolkit on students’ self-evaluations of their writing abilities, a comparative evaluation was conducted with the same group of 22 medical students who participated in earlier stages of the study. This approach ensured consistency and enabled a robust paired analysis of their self-assessment scores.
The final stage involved two rounds of self-assessments:
Without the Toolkit: Students completed an initial self-assessment using a 5-point Likert scale to evaluate four criteria: Understanding Feedback, Usefulness of Feedback, Improvement in Writing Skills, and Goal Setting. This baseline assessment was conducted using writing prompts distinct from those used in the subsequent toolkit-based assessment to avoid practice effects. The questions provided in this assessment are outlined in Table 1.
These questions were designed to capture students’ perceptions of the feedback they received and their ability to act on it before interacting with the self-assessment toolkit. By focusing on their understanding of feedback, its usefulness, their perceived writing improvement, and their ability to set goals, this initial assessment establishes a baseline for evaluating the toolkit's influence.
In addition to scoring each question, students were given the option to provide comments. These qualitative responses offered deeper insights into their perceptions of feedback processes and highlighted any challenges or areas of uncertainty in their self-assessment practices prior to using the toolkit. A dual approach (quantitative scoring and qualitative elaboration) ensured a comprehensive understanding [59] of the students’ baseline self-assessment abilities.
With the Toolkit: After a sufficient interval to minimize memory effects, students completed a second self-assessment using the academic writing self-assessment toolkit (Appendix D). This phase employed a new writing prompt explicitly designed to align with the toolkit’s objectives, ensuring the observed differences reflected the toolkit’s influence rather than familiarity with the prompts.
Following the quantitative evaluation, students were interviewed to reflect on their experiences and compare their self-assessment scores from both phases. These interviews aimed to delve deeper into their perceptions and understand the impact of the toolkit on their self-evaluation practices. The reflections focused on:
a) The differences they observed between their self-assessments with and without the toolkit.
b) Which set of scores they felt better represented their writing abilities, and the reasons for their preference.
c) How the toolkit influenced their ability to critically evaluate their writing and set clear, actionable goals for improvement.
The mixed-method approach to data analysis provided a comprehensive evaluation of the toolkit’s effectiveness:
- Quantitative Analysis: A paired t-test was conducted to compare the self-assessment scores from the two rounds (pre- and post-toolkit), allowing for the identification of statistically significant differences attributable to the toolkit's influence. The paired t-test was selected due to the within-subject design, which required comparing scores from the same group of participants across the two conditions, as sketched below.
- Qualitative Analysis: The interview transcripts were subjected to thematic analysis to identify recurring patterns and insights. This qualitative data offered rich, nuanced perspectives on how the toolkit affected students' reflective practices, goal-setting, and overall approach to self-assessment.
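A minimal sketch of the paired comparison described above is given here, using hypothetical pre- and post-toolkit composite scores for the 22 students; the paired-samples effect size (Cohen's d) is an added illustration rather than a statistic reported in the article.

import numpy as np
from scipy import stats

# Hypothetical composite self-assessment scores (1-5 Likert, averaged over
# the four criteria) for the same 22 students, without and with the toolkit.
pre_toolkit = np.array([3.0, 3.5, 2.8, 3.2, 3.0, 3.8, 2.5, 3.4, 3.1, 3.6, 2.9,
                        3.3, 3.0, 3.7, 2.6, 3.2, 3.5, 3.0, 2.8, 3.4, 3.1, 3.3])
post_toolkit = np.array([3.8, 4.0, 3.5, 3.9, 3.6, 4.2, 3.1, 3.8, 3.7, 4.1, 3.4,
                         3.9, 3.6, 4.3, 3.2, 3.8, 4.0, 3.5, 3.4, 3.9, 3.7, 3.8])

# Paired (dependent-samples) t-test for the within-subject design.
t_stat, p_value = stats.ttest_rel(post_toolkit, pre_toolkit)

# Cohen's d for paired samples: mean difference / SD of the differences.
differences = post_toolkit - pre_toolkit
cohens_d = differences.mean() / differences.std(ddof=1)

print(f"t({len(differences) - 1}) = {t_stat:.2f}, p = {p_value:.4f}, d = {cohens_d:.2f}")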
By combining quantitative data with qualitative reflections, this stage provided a holistic understanding of the toolkit’s practical utility and its role in enhancing students' metacognitive skills and self-regulated learning. These insights were instrumental in assessing the toolkit’s broader educational value and guiding its refinement.
Results and discussion
As expected, the set of criteria underwent several revisions and adjustments. Some criteria were modified, some were removed, and new ones were added to the list. This section discusses each of these changes in detail.
Which evaluative criteria should the AI self-assessment writing toolkit include?
At the outset, we devised three major questions for ourselves, considering "realism," "cognitive challenge," and "authentic evaluative judgment" to ensure that the toolkit remains innovative, adaptable, and proactive in addressing the evolving needs and challenges our students face. Table 2 provides an overview of how each question aligns with these criteria.
Our toolkit was tailored to students' experience through a thorough analysis of their reflections on using AI, with the above-mentioned questions kept in mind at all times. For example, an analysis of medical students' reflections led to the formulation of the question, “Have you understood the feedback provided by AI?” This question emerged from difficulties that seven students, learning English as a foreign language, reported in comprehending certain feedback. One student, for instance, had written, “Some of the suggestions were too advanced for me to understand and implement fully. I wish there were a simpler explanation or more guidance provided.”
Feedback literacy, introduced by Sutton [60], refers to the ability to read, interpret, and utilize written feedback effectively. This skill enables students to use feedback to improve their work and develop their evaluative judgment [61]. Therefore, upon evaluating the question based on authentic assessment criteria, it was categorized under the criterion of 'realism,' as it encourages students to honestly assess their understanding of the AI-generated feedback.
Another reason for including this question is to emphasize the importance of academic integrity and the responsible use of AI tools. When students do not fully understand the feedback provided by AI and simply copy and paste the suggested text without critical evaluation, they risk committing plagiarism and academic misconduct. Encouraging students to reflect on whether they genuinely understand the feedback helps prevent this passive acceptance of AI suggestions, promoting ethical behavior in academic writing. According to Sefcik et al. [62], effective academic integrity education not only addresses the prevention of dishonest behaviors but also fosters the development of ethical decision-making skills. Integrating such reflective questions within the toolkit helps ensure that students engage with AI-generated content critically and use it to enhance their learning rather than replace their independent efforts.
By prompting students to critically assess AI-generated feedback and reflect on its application, the toolkit aims to build a strong foundation for ethical practices and the long-term development of writing skills. Guerrero-Dib et al. [63] emphasize that promoting academic integrity within educational settings has a positive influence on ethical behavior in professional contexts, making it crucial to incorporate such considerations into AI-supported learning tools.
Another key question designed for the toolkit was: ‘Was there any specific feedback that caught your attention and was particularly helpful in your life-long improvement process? Please use a highlighter marker to emphasize or mark them.’ This question emerged because 11 students used adjectives or adverbs like ‘interesting,’ ‘surprisingly,’ and ‘exciting’ when faced with feedback that caught their attention and helped improve their writing process. One student, for example, remarked, “The feedback on the organization of my essay was incredibly valuable in highlighting the need for a logical flow of ideas.” This question is grounded in Krashen’s input hypothesis and constructivist theory. The former suggests that language learning happens when students are exposed to comprehensible input slightly above their current proficiency level. Feedback that helps students improve their language skills by providing clear and easily understandable input can align with this theory [64]. Constructivist theory posits that learners actively construct their knowledge and understanding through interaction with the environment. Feedback that encourages students to reflect on their writing and language learning processes and take ownership of their learning can be viewed through this lens [65].
We also came up with the following question to enhance students’ critical thinking skills: “How has the feedback from AI helped you think more critically about your writing? Please explain.” This question was devised because some students (nine students) had mentioned remarks like “the feedback forced me to question my assumptions and gather more evidence” or “it pushed me to evaluate the evidence supporting my decisions and consider alternative approaches.” As noted in [66], writing at the university level differs greatly from writing in secondary school, since it requires students to write in a more critical academic style. Academic writing heavily relies on critical thinking and analysis, essential for evaluating information, questioning assumptions, analyzing arguments, synthesizing data, and constructing a coherent argument [67]. However, when students are not taught how to use AI tools and assess the information they provide, relying on them may reduce their creativity and critical thinking skills [5]. Therefore, we thought this question would help student writers develop a more nuanced perspective on their writing abilities and how technology can support their growth as writers.
We formulated another question: "Have you identified your weaknesses (e.g., any recurring mistakes or patterns) as a writer and taken measures to improve them?" For instance, one student pointed out, "I have a tendency to use complex medical jargon that may confuse my readers," and another wrote, "I have realized that I neglect proper transitions between paragraphs. Addressing this weakness has significantly improved my health-related essays' coherence and flow." This question was posed in light of a study by Kristensen, Torkildsen and Andersson [68], which found that students who do not pay attention to the feedback they receive are more likely to repeat their mistakes in the future. It is essential to break this cycle to ensure progress. Previous research has found links between error-monitoring efficiency and working memory [69], self-regulation skills [70], and overall academic performance [71]. Therefore, this component has been hypothesized to represent an improvement in cognitive control or the compensatory effort to prevent errors [72].
Regarding writing pedagogy and focusing on the positive side, we designed a specific question for our writing course: "Have you noticed any particular areas in which your writing has improved due to feedback from AI? Explain and provide specific examples." This question is particularly relevant because all 22 students have reported that certain areas of their writing improved when they received feedback from AI. For instance, one student mentioned, “ChatGPT has pointed out areas where I could be more precise in my language and eliminate unnecessary fluff.” Although almost all students reported improvements in their work, it is important to acknowledge that AI-generated feedback is direct and often lacks the nuanced, personalized insights provided by human evaluators. Consequently, students may become over-reliant on AI tools, which can hinder the development of critical and creative thinking skills over time [73]. According to Barrot [5], while tools like ChatGPT can help students complete writing assignments quickly, this ease of use raises concerns about potential learning loss, particularly in higher-order cognitive skills. As Kanungo, Gupta [74] emphasize, this dependency can weaken metacognitive skills by discouraging active reflection on the writing process. Excessive reliance on AI-generated feedback may hinder students' ability to engage in authentic self-reflection and self-assessment [35].
To counter these challenges, our toolkit incorporates structured reflection prompts specifically designed to mitigate over-reliance on AI. These prompts encourage students to evaluate their work critically and identify areas for improvement before consulting AI-generated feedback. For example, a prompt such as "How does the AI feedback challenge the assumptions you've made in your arguments, and how might alternative perspectives improve your analysis?" helps students engage with the underlying assumptions of their writing, fostering deeper reflection.
Over-reliance on AI, as Barrot [5] warns, can hinder the development of critical thinking and creativity by encouraging surface-level engagement with feedback rather than analytical and reflective interaction. For instance, students relying solely on AI for grammar corrections or structural suggestions may miss the opportunity to learn underlying principles, such as constructing coherent arguments or synthesizing diverse ideas, which are foundational for academic and professional success [75]. Our toolkit directly addresses these risks by embedding reflection questions that promote active engagement with AI feedback. For example, the prompt, “How does feedback about coherence and cohesion influence the way you connect ideas and structure paragraphs in future assignments?” helps students recognize areas requiring improvement, fostering long-term learning rather than reliance on immediate fixes. This active engagement aligns with Zimmerman's [76] framework of self-regulated learning, which highlights the importance of self-monitoring and iterative improvement in sustained academic growth.
The toolkit integrates AI feedback into the reflective process, encouraging students to engage with the feedback critically and independently. By using AI’s insights in conjunction with self-assessment prompts, students are guided to identify areas for improvement and develop a deeper understanding of their writing strengths and weaknesses. This approach not only fosters critical thinking and creativity but also nurtures metacognitive skills, empowering students to become more autonomous and reflective learners. As Denisova-Schmidt [77] emphasizes, fostering ethical and reflective engagement with technological tools in education supports the development of transferable skills critical for professional success.
The final two questions identified as essential for fulfilling authentic assessment objectives were related to strategic goal setting: “Have you set specific writing goals for yourself based on the feedback received from AI?” and “How do you plan to apply the feedback received on this assignment to future writing tasks to continue enhancing your language and critical thinking skills?” Answering these two questions involves creating specific, measurable, achievable, relevant, and time-bound (SMART) goals and devising a detailed plan that outlines the steps and actions needed to achieve them. According to goal-setting theory, planning and strategizing act as mediating factors [78] and can help students stay organized, focused, and on track toward reaching the desired outcomes [79]. According to Troia, Harbaugh [80], mastery goals focus on acquiring knowledge and skills and attaining a sense of competence, aligning with Ryan and Deci's [81] cognitive evaluation theory. Creating mastery goals can also lead to increased self-efficacy, self-regulation, and academic achievement. By setting mastery goals, students can view writing as purposeful and meaningful [82]. Teaching students to create writing objectives and track their progress toward those goals promotes creativity and engagement [83]. The list of questions from the first draft is provided below:
1) Have you understood the feedback AI has given to you? If not, circle the number of or underline the feedback you did not fully understand.
2) Was there any specific feedback that caught your attention and was particularly helpful in your life-long improvement process? Please mark them.
3) How has AI's feedback helped you think more critically about your writing? Please explain.
4) Have you noticed any particular areas in which your writing has improved due to AI feedback?
5) Have you been able to identify your weaknesses (e.g., any recurring mistakes or patterns, such as having more than one S-V agreement error) as a writer and take measures to improve?
6) Have you set specific writing goals for yourself based on the feedback received from AI?
7) How do you plan to apply the feedback received on this assignment to future writing tasks to continue enhancing your language and critical thinking skills?
Once the initial version of the self-assessment toolkit was created, the questions were presented to experts in the focus group discussion. At the same time, anonymous students' sample reflection papers (Appendix C) were also shared with the experts. This process allowed them to better understand the connection between the questions in the toolkit and the reflection papers created by the students.
As a result of this feedback, certain questions were modified, and new ones were added. For instance, two experts suggested that question 3 could be too difficult for some students. Therefore, they recommended that the question be rephrased or that critical thinking skills be broken down into specific factors to enable students to learn from them. The original question, "How has the feedback from AI helped you to think more critically about your writing?" was kept; however, the following factors were added to scaffold critical thinking: a) Critically evaluate the sources of information. b) Consider alternative perspectives and think more critically about the underlying assumptions. c) Analyze the arguments presented in writing more thoroughly. d) Provide more context for the data presented. e) Construct more coherent arguments in writing.
The experts made a similar suggestion for the question “Have you noticed any particular areas in which your writing has improved as a result of feedback from AI?”, recommending that students be helped to identify specific areas in their writing. Again, the question was kept, but the following items were added:
a) Grammar: Ensure sentences are well-structured and relevant; eliminate grammatical errors.
b) Sentence Structure: Vary sentence structure (simple, compound, complex, and compound-complex) to maintain reader interest and clarity.
c) Academic Vocabulary: Utilize appropriate and rich terminology and language specific to the field of study.
d) Coherence and Cohesion: Ensure that ideas are connected logically using conjunctions and sentence connectors and that transitions between paragraphs are smooth.
e) Clarity: Communicate ideas clearly and avoid using language that may confuse the reader. Utilize appropriate tone and register.
f) Mechanics: Correctly use mechanics of writing such as indentation, punctuation, capitalization, word endings, etc.
In addition, the experts believed that providing students with specific criteria can raise their awareness of areas for writing improvement. According to the cognitive theory of consciousness-raising, learners must be conscious of their language production to make progress [84]. By breaking writing down into grammar, sentence structure, academic vocabulary, coherence, cohesion, and clarity, students can more easily identify where to focus their attention to enhance their writing skills. This type of explicit feedback helps students develop metacognitive awareness of their writing abilities, ultimately improving their proficiency [85].
These subtopics are aligned with the principles of formative assessment, which highlight the significance of offering constructive feedback to students to guide their learning process [54]. By specifically pointing out areas for improvement and offering suggestions for enhancement, students can engage in self-reflection and actively work towards strengthening their writing skills [86]. This targeted feedback can also boost students' confidence in their abilities and motivate them to continue developing as writers [53].
The experts also suggested that each set of questions in the self-assessment toolkit be given an appropriate title so that students can easily understand them. For example, the first two questions (1 and 2) were titled ‘Understanding and Incorporation of Feedback.’ Questions 4 and 5 were titled ‘Improvement in Writing Skills’ and moved to the second set of questions in the toolkit. Likewise, question 3 was titled ‘Critical Thinking and Analysis,’ and questions 6 and 7 were titled ‘Strategic Goal Setting and Planning.’ As their final suggestion, the experts recommended providing worked examples for some questions, such as questions 6 and 7, to help students learn how to set detailed academic goals and plan strategically to achieve them. Özlem's [87] study shows that when learners receive advice on goal-setting, goal-achievement tactics, and goal reflection, personal goal-setting can support them in EFL writing situations.
The revised version of the self-assessment toolkit after experts’ suggestions is presented in Table 3.
To what extent do experts consider the questions in the self-assessment toolkit to be important and comprehensive for evaluating students' writing skills?
As shown in Table 4, the experts provided ratings for the seven questions included in the toolkit. The mean scores ranged from 4.33 to 5.00, with most questions receiving a mean score of 4.83 or higher. This result suggests that the experts collectively viewed the questions as highly important, with over 90% of the questions rated as significant or very significant. The detailed feedback from the experts offered valuable insights that were used to refine and finalize the academic writing toolkit, ensuring the instrument effectively addressed the key areas of concern identified in the earlier research stages.
Overall, the experts rated the toolkit questions highly, with most questions considered very important. This finding indicates that the toolkit is well designed to support students' self-assessment of their writing skills.
To what extent does the academic writing self-assessment toolkit demonstrate content validity, face validity, and reliability in measuring students' academic writing skills?
The 22 medical student participants reviewed the academic writing self-assessment toolkit and provided feedback on its content validity and face validity.
Content Validity: The students rated 95% of the toolkit items as highly relevant (score of 4 or 5 on a 5-point scale) for evaluating academic writing proficiency. Additionally, 88% of the items were deemed comprehensive, covering the essential elements necessary for a thorough assessment of writing abilities. However, the students recommended rephrasing a few items for clarity, such as:
Item 3a: "Ensure that sentences are properly structured and free of grammatical errors" could be changed to "Demonstrate proper sentence structure and grammatical accuracy." The latter phrasing was considered more natural and intuitive in the Persian language.
Item 3e: "Communicate ideas clearly and avoid using language that may not be very clear to the reader" could be revised to "Express ideas clearly and concisely, using language that is accessible to the reader." The suggested change was intended to better align with common Persian language usage and conventions.
These minor wording changes were incorporated to enhance the content validity of the toolkit. Content validity is an important aspect of assessment design, as it ensures the instrument measures what it intends to measure [88].
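For readers who wish to replicate this kind of item-level relevance analysis, the sketch below illustrates one common way to summarize such ratings in Python: the proportion of raters who score each item 4 or 5 (an item-level content validity index). It is an illustrative reconstruction only; the placeholder rating matrix and the 0.78 benchmark are assumptions, not the study's actual data or procedure.

```python
import numpy as np

# Hypothetical ratings: 22 raters (rows) x 7 toolkit items (columns), each on a 1-5 scale.
rng = np.random.default_rng(0)
ratings = rng.integers(3, 6, size=(22, 7))  # placeholder data for illustration only

# Item-level relevance: proportion of raters giving the item a 4 or 5.
item_cvi = (ratings >= 4).mean(axis=0)

# Share of items judged highly relevant by most raters; 0.78 is a commonly cited
# content-validity-index benchmark, used here purely as an example threshold.
highly_relevant_share = (item_cvi >= 0.78).mean()

print(np.round(item_cvi, 2), f"{highly_relevant_share:.0%} of items above threshold")
```

With real rating data substituted for the placeholders, the same two lines of aggregation yield the kind of item-level and toolkit-level relevance percentages reported above.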
Face Validity: Face validity refers to the extent to which an assessment appears to measure the intended construct at face value [89]. Establishing face validity is crucial for ensuring the acceptability and usability of an assessment tool from the perspective of the target population [88].
The students rated 92% of the toolkit items as highly clear and unambiguous (score of 4 or 5 on a 5-point scale). Furthermore, 90% of the items were deemed appropriate and relevant for a self-assessment of academic writing skills. A small number of items were identified as somewhat confusing or unclear, such as:
Item 5b: "Consider alternative perspectives and think more critically about the underlying assumptions" was rephrased as "Critically examine different viewpoints and underlying assumptions."
Item 7: The example "Enhance my academic vocabulary knowledge by using a dictionary and thesaurus more and studying root words and prefixes" was revised to "Expand my academic vocabulary by regularly consulting reference materials and studying word roots and affixes." These changes aimed to better align with common Persian language usage and study practices.
The research team incorporated this valuable student feedback to improve the clarity and interpretability of the self-assessment tool, enhancing its face validity. This student-centered evaluation of content validity and face validity was instrumental in strengthening the academic writing self-assessment toolkit, ensuring it was a comprehensive and user-friendly instrument for measuring writing proficiency.
Reliability: The toolkit’s reliability was rigorously evaluated using two methods. First, Cronbach's alpha coefficient—a measure of internal consistency—was calculated based on the responses of 20 participants who did not take part in the main phase of the study. The resulting coefficient of 0.91 indicates an excellent level of reliability, demonstrating that the toolkit consistently measures academic writing skills. Additionally, a test–retest reliability method was employed, in which the same toolkit was administered to the same participants at two different time points. The results of this assessment showed a high degree of reliability (r = 0.93), suggesting that the toolkit provides consistent results over time. This finding confirms the reliability of the academic writing self-assessment toolkit. The high Cronbach's alpha coefficient and Pearson's r values further attest to the toolkit's dependability, making it a valuable resource for students to assess their academic writing skills. The comprehensive evaluation process, including rigorous content validity, face validity, and reliability checks, contributes significantly to the credibility and utility of the academic writing self-assessment toolkit. The finalized English version of the reflective toolkit can be found in the supplementary material.
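As a minimal computational sketch of the two reliability checks described above, the Python snippet below shows how Cronbach's alpha and a test–retest Pearson correlation are typically computed from a respondents-by-items rating matrix. The data here are randomly generated placeholders, so the printed values will not match the study's coefficients (0.91 and 0.93); only the formulas are intended to be informative.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Internal consistency for a respondents-by-items matrix of ratings."""
    k = scores.shape[1]                          # number of toolkit items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical pilot data: 20 respondents rating 7 items at two time points.
rng = np.random.default_rng(1)
time1 = rng.integers(3, 6, size=(20, 7)).astype(float)
time2 = np.clip(time1 + rng.integers(-1, 2, size=time1.shape), 1, 5)

alpha = cronbach_alpha(time1)
retest_r = np.corrcoef(time1.sum(axis=1), time2.sum(axis=1))[0, 1]  # Pearson's r on total scores
print(f"Cronbach's alpha = {alpha:.2f}, test-retest r = {retest_r:.2f}")
```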
To what extent does the academic writing self-assessment toolkit influence students' self-evaluations of their writing abilities compared to self-assessments conducted without the toolkit?
To explore the influence of the academic writing self-assessment toolkit on students’ self-evaluations, a comparison was made between self-assessment scores from the pre-toolkit and post-toolkit phases. The findings (Table 5) revealed significant differences in students' scoring patterns, indicating that the toolkit fostered more critical and reflective evaluations of their writing abilities.
The comparison revealed intriguing trends that underscore the impact of the toolkit on students' self-assessment processes. Quantitative results indicated decreases in scores for certain areas after using the toolkit. For example, average scores for Understanding Feedback decreased from 4.1 to 3.5, and for Improvement in Writing Skills from 4.0 to 3.3. These decreases reflect a shift toward more critical self-evaluation, consistent with research suggesting that structured tools encourage students to reassess their performance more rigorously [90, 91].
Conversely, Goal Setting saw a notable increase from 3.9 to 4.4, suggesting that the toolkit’s structured approach to goal setting was effective in helping students create more specific and actionable writing objectives. Additionally, the increase in Usefulness of Feedback scores (from 4.3 to 4.8) highlights that students found the toolkit instrumental in using feedback more effectively.
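Because the raw pre- and post-toolkit scores are not published and the specific significance test is not named here, the following Python sketch only illustrates one conventional way such a matched pre/post comparison can be run: a paired-samples t-test with a paired effect size. The sample arrays are hypothetical and stand in for per-student mean ratings on a single toolkit dimension.

```python
import numpy as np
from scipy import stats

# Hypothetical per-student mean ratings (1-5 scale) for one dimension,
# e.g., "Understanding Feedback"; these are illustrative values only.
pre  = np.array([4.2, 4.0, 4.5, 3.8, 4.1, 4.3, 4.0, 4.2, 3.9, 4.0])
post = np.array([3.6, 3.4, 3.8, 3.2, 3.5, 3.7, 3.3, 3.6, 3.4, 3.5])

t, p = stats.ttest_rel(pre, post)                     # paired-samples t-test
d_z = (pre - post).mean() / (pre - post).std(ddof=1)  # effect size for the paired differences
print(f"pre mean = {pre.mean():.2f}, post mean = {post.mean():.2f}, "
      f"t = {t:.2f}, p = {p:.4f}, d_z = {d_z:.2f}")
```

A non-parametric alternative (e.g., the Wilcoxon signed-rank test) would follow the same paired structure if the rating distributions were markedly skewed.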
To provide additional context and deepen understanding, follow-up interviews were conducted with ten students, selected through purposive sampling to represent diverse perspectives while working toward data saturation. During these interviews, students were asked to compare their pre- and post-toolkit self-assessment scores and explain the rationale for the differences. Their writing samples were reviewed alongside the interviews so that scores could be checked against the actual quality of their work.
The qualitative data from these interviews highlighted three primary themes, which further explain the toolkit’s influence on students’ self-evaluations.
Increased awareness of strengths and weaknesses
The toolkit enhanced students’ ability to critically evaluate their writing, fostering a nuanced understanding of strengths and areas for improvement. Several students reflected on how their initial self-assessments lacked the nuance needed for meaningful improvement. As one student noted, “Before using the toolkit, I’d rate myself high in everything because I thought my essay was fine. But after going through the checklist, I realized my paragraphs don’t always flow well, and sometimes my examples are weak or unclear.” Another added, “I didn’t realize how important transitions were. Now, I see how crucial it is to make ideas flow smoothly.”
This deeper awareness was reflected in the drops in Understanding Feedback scores (4.1 to 3.5) and Improvement in Writing Skills scores (4.0 to 3.3). The toolkit prompted students to evaluate writing tasks more critically, especially in areas like argument structure and clarity. One student commented, “I thought my grammar was my main issue, but now I understand that organizing my ideas clearly is just as important.” However, some students also acknowledged the emotional discomfort of identifying weaknesses they had previously ignored. One student remarked, “It was hard to realize how much I was missing, but it gave me a direction to improve.” This reflects the toolkit’s dual role as both a developmental tool and a mirror for self-awareness.
Clearer goal setting
The toolkit not only made students more aware of their writing weaknesses, but also helped them set more specific and actionable goals for improvement. Many students noted that their pre-toolkit goals were vague, while post-toolkit goals were more targeted. One participant explained, “Before, I just said I wanted to improve my grammar, but I didn’t know where to start. The toolkit made me realize I need to focus on subject-verb agreement or comma usage.” Another added, “Now, I can break down goals like improving introductions or clarifying arguments.”
The toolkit also helped students think long-term about their improvement. One said, “Now I feel like the goals I set will help me in future assignments too.” This improvement aligns with the quantitative increase in Goal Setting scores (3.9 to 4.4), reflecting the greater clarity students gained in setting specific, measurable objectives. The toolkit’s structure made goal-setting a more focused and deliberate process, enhancing students' self-regulated learning.
Alignment with teacher feedback
Interviews also revealed closer alignment between post-toolkit self-assessments and teacher evaluations, suggesting that the toolkit enabled students to evaluate themselves more realistically. The teacher-interviewer noted, “Before the toolkit, students often overestimated their grammar and vocabulary. After using the toolkit, their self-assessments were much more aligned with what I saw in their writing.” Also, one participant stated, “After the toolkit, my self-assessment matched my teacher’s feedback more closely, which made me feel more confident that I was evaluating myself accurately.” However, some discrepancies remained, particularly in grammar and vocabulary, where students still tended to overrate their abilities. The teacher-interviewer commented, “Students still tend to rate their grammar high, despite noticeable errors. This area could benefit from more focused prompts.” Despite these gaps, students found the toolkit valuable in clarifying academic expectations. As one student noted, “The toolkit showed me what teachers are looking for—clarity and structure, not just fancy words.”
This qualitative data supports the conclusion that the toolkit prompts students to critically reflect on their abilities, set strategic goals, and align their self-assessments with academic standards, all of which support their writing development. This aligns with existing research on self-assessment, which suggests that structured tools can reduce overestimation and encourage more critical reflection [92]. Rickert [90] argues that students often overestimate their abilities due to a lack of structured evaluation guidelines, while Zhang’s research [91] found that students struggle to assess areas like coherence and grammar accurately.
Furthermore, students’ reflections showed how the toolkit encouraged them to address overlooked aspects of writing, such as logical flow and coherence, aligning with Nicol and Macfarlane-Dick’s [75] recommendation that self-assessment tools should guide students in breaking down complex tasks into manageable components for critical evaluation. The emotional discomfort some students expressed in recognizing gaps in their writing skills also highlights the toolkit’s role in fostering critical self-reflection, as noted by Boud and Falchikov [93], who stress that self-assessment can challenge students' perceptions, which is necessary for growth. For example, in nursing or medical education, the toolkit could be used to help students improve their reflective practices in clinical documentation by prompting them to assess the clarity, accuracy, and structure of patient notes. This would encourage them to critically reflect on their documentation skills and identify areas for improvement, much like how the toolkit helps students in this study evaluate their writing. Similarly, in pharmacy education, the toolkit could assist students in evaluating the quality of medication histories, prescription notes, and other written communications. Studies have shown that documentation errors in these fields—such as inaccurate medication records or incomplete patient histories—are common and can have serious consequences [94, 95]. The toolkit could help students in these fields develop stronger documentation skills by prompting more accurate and reflective assessments.
The closer alignment between students’ post-toolkit self-assessments and teacher-interviewer evaluations suggests that the toolkit improved the accuracy of students’ evaluations. This is particularly important given the findings of León, Panadero and García-Martínez [96], who note that self-assessments are most effective when they closely mirror external evaluations. The alignment observed in areas such as coherence and argumentation highlights the toolkit’s success in bridging the gap between student perceptions and academic standards [97].
Students also expressed increased confidence in their self-assessments post-toolkit. However, persistent discrepancies in grammar and vocabulary assessments suggest that these areas remain challenging for students to evaluate independently. This aligns with findings by Topping [98], who highlights the complexity of self-assessing technical aspects of writing, such as grammar, without additional scaffolding. The teacher-interviewer’s observation that students continued to overestimate their grammar skills underscores the need for further refinement of the toolkit. Adding more targeted prompts or detailed rubrics for technical aspects could enhance students’ ability to assess these areas accurately [91].
Conclusion and pedagogical implications
This study explored the effectiveness of integrating artificial intelligence (AI) tools into medical education to enhance the academic writing skills of medical students. Our findings reveal several important conclusions and pedagogical implications specifically relevant to the field of medical education:
1. Enhanced Writing Proficiency in Medical Education: Integrating AI technologies, such as automated essay grading and personalized feedback, can significantly improve medical students’ ability to communicate complex clinical and research findings. This is particularly relevant for future physicians, as strong writing skills are essential for disseminating research, contributing to scientific literature, and documenting patient care accurately.
2. Scalability in Medical Training Programs: AI-driven tools allow for scalable assessment and individualized feedback, which is critical in medical programs with large class sizes and diverse student needs. By automating routine tasks, such as grammar correction and initial content review, AI enables educators to focus on providing deeper, content-specific feedback that enhances students' learning outcomes.
3. Supporting Clinical Reasoning and Reflective Practice: The AI-assisted self-assessment toolkit encourages medical students to critically evaluate their own writing, fostering reflective practice, a key component of medical professionalism. By promoting metacognitive awareness, the toolkit helps students recognize areas of improvement, refine their clinical reasoning, and improve the clarity of their written communication.
4. Alignment with Authentic Assessment in Medical Education: The toolkit provides a process-oriented approach that shifts the focus from rote memorization to authentic assessment of writing skills. This aligns with best practices in medical education, where assessment is designed to reflect real-world applications, such as case report writing and the articulation of clinical findings.
5. Ethical and Responsible AI Integration: While AI offers substantial benefits, it is crucial to educate medical students on the ethical implications of AI use, particularly in relation to academic integrity. Training should emphasize the responsible use of AI in both academic and clinical contexts, helping students understand the importance of maintaining originality and avoiding plagiarism. This involves teaching students to critically evaluate AI-generated content rather than accepting it without reflection and ensuring that they use AI as a supplementary tool rather than a substitute for their own thinking. By integrating principles of academic integrity into AI training, students can learn to discern when to rely on AI support and when to exercise independent judgment, thus fostering a culture of ethical behavior and responsible decision-making.
6. Broader Applications and Adaptability: Beyond the context of medical education, the findings from this study demonstrate the potential adaptability of the AI-assisted self-assessment toolkit in other health and science-related disciplines. For instance, in nursing education, where clear and accurate patient documentation is essential, the toolkit could help students refine their ability to write precise clinical notes. Similarly, in pharmacy and public health programs, the toolkit could support students in structuring research reports and analyzing health policy documents with clarity and coherence. The toolkit’s versatility extends beyond health disciplines and can be adapted for use in non-medical fields such as law, engineering, and social sciences. In law education, for example, it could help students improve legal writing and analysis by providing structured feedback on case briefs or legal memos. In engineering, the toolkit could support students in writing clear and comprehensive technical reports or project proposals. In the social sciences, where analytical writing and argumentation are central, the toolkit could assist students in developing well-structured essays or research papers. The modular design of the toolkit allows for customization of feedback criteria, ensuring it meets the unique demands of these diverse academic fields, thereby broadening its applicability and relevance to a wider academic audience.
In summary, integrating AI tools in medical education has the potential to enhance students’ writing abilities and overall educational outcomes significantly. However, it requires thoughtful implementation, with a focus on promoting critical thinking, clinical relevance, and ethical practice. The adaptability of this toolkit across a range of educational contexts highlights its scalability and underscores its relevance to advancing interdisciplinary education and digital literacy. This broader impact provides a foundation for future research and innovative pedagogical strategies that can transform education in the digital age.
Practical guidelines for balancing AI support with traditional learning strategies
To ensure effective integration of AI tools in academic settings without compromising critical thinking and originality, we propose the following practical guidelines for educators:
1. Use AI Feedback as a Supplement, not a Replacement: Encourage students to review and analyze AI-generated feedback alongside instructor and peer feedback. For instance, educators could organize peer review sessions where students compare AI-generated feedback with human feedback to critically analyze differences in depth, tone, and specificity. This practice helps students recognize the complementary strengths of human and AI inputs, such as the AI’s ability to highlight technical issues like grammar errors and a human’s ability to provide contextualized and nuanced suggestions.
2. Incorporate Reflection Activities: Design assignments that require students to reflect on AI feedback and compare it with their understanding or peer input. For example, educators could ask students to write a short reflective essay after using AI tools, detailing how the feedback influenced their revisions, what they agreed or disagreed with, and why. These activities encourage students to think critically about AI suggestions and prevent over-reliance on automated input. For instance, in a public health course, students could reflect on how AI helped refine their policy brief drafts, ensuring that their analysis aligns with evidence-based practices.
3. Blend AI and Human-Centered Assessment Approaches: Include assignments that cannot be easily completed by AI alone, such as reflective essays, personal narratives, or projects that require complex argumentation. For example, students can use AI tools to refine technical language in their work, such as in a patient care report or policy analysis, before submitting it for human review. This approach enhances both technical accuracy and the students’ ability to engage critically with their writing. By blending AI with human feedback, educators can ensure clarity in technical writing while retaining students' reflective insights.
4. Create Collaborative Learning Opportunities: Use AI tools as part of group activities where students can discuss and evaluate AI-generated suggestions together, fostering collaboration and collective critical thinking. A practical application could involve small-group workshops where students collectively edit a draft using both AI and peer feedback, discussing why certain suggestions are adopted or rejected. Such activities help students develop critical evaluation skills while leveraging AI as a supportive tool. This method can also be applied in engineering or business courses, where collaboration and precision in communication are critical.
5. Promote Self-Assessment and Goal Setting: Guide students to set specific goals for their learning based on both AI feedback and traditional assessments. For instance, students could maintain a progress log where they record recurring weaknesses identified by AI (e.g., sentence structure, transitions) and outline specific actions to improve in those areas. Educators can support this process by periodically reviewing these logs and providing targeted advice. This approach empowers students to track their progress and encourages iterative learning, ensuring sustained improvement over time.
By incorporating these actionable strategies, educators can ensure that AI tools are used effectively as a complement to traditional methods. These approaches help balance the efficiency of AI with the depth and personalization of human-centered feedback, promoting originality, critical thinking, and metacognitive skills in learners.
Limitations and future research
Despite the potential advantages, our study also identified several limitations and areas for future research:
a) Bias and Fairness: AI algorithms may inadvertently perpetuate biases present in the training data. Investigating bias mitigation strategies and ensuring fairness in AI-driven assessments is essential to maintain equity in educational contexts. Future research should explore methods to enhance the fairness and inclusivity of AI tools, especially in diverse student populations.
b) Overreliance on AI: Students might become overly dependent on AI tools, which could result in the neglect of essential writing and critical thinking skills. Future research should investigate strategies to balance AI assistance with the promotion of independent thought and self-assessment skills, particularly in contexts where students must critically evaluate clinical scenarios or develop reflective practices.
c) Long-Term Impact: While this study focused on immediate outcomes, longitudinal studies are needed to assess the long-term impact of AI integration on student learning outcomes. For example, research could explore whether sustained exposure to AI tools enhances students’ ability to write comprehensive case reports or improves the quality of reflective journaling over time.
d) Customization and Adaptability: Tailoring AI tools to diverse student needs remains challenging. Future research should focus on personalized adaptations and flexibility in AI-based pedagogy to meet the specific requirements of various medical specialties. For instance, the toolkit could be adapted to help students in nursing programs develop patient communication documentation or assist public health students in articulating evidence-based policy recommendations.
e) Adaptability to Other Medical Contexts: While this study primarily focused on medical students, the toolkit’s underlying principles, such as promoting self-reflection, critical thinking, and ethical AI use, can be adapted to other areas within medical education. Future research could explore how the toolkit might be customized for use in nursing education, pharmacy training, or allied health professions to address discipline-specific documentation and communication skills. Such adaptations could broaden the toolkit’s applicability and relevance across different medical fields, thereby enhancing its overall utility.
f) Limitations of Generalization: While this study demonstrated the effectiveness of the toolkit in enhancing writing skills and reflective practices among medical students, its context-specific design may limit the broader applicability of the findings. The study was conducted with a specific sample of medical students from a single university, which may not fully represent other populations or disciplines. Additionally, the decision not to conduct subgroup analysis, due to participant homogeneity, may limit the ability to explore how demographic factors such as prior writing experience, educational background, or other individual characteristics could influence the outcomes. For example, students with more extensive writing experience may engage differently with the AI tool, potentially benefiting more in tasks like case report development, while those with less experience might encounter a steeper learning curve. Similarly, variations in writing proficiency across demographic groups could affect the tool’s impact on reflective practices. These factors may introduce nuances in the toolkit’s effectiveness across diverse populations, highlighting the need for future studies to explore how such variations influence outcomes.
In conclusion, while AI holds immense educational potential, addressing its limitations and refining its implementation will be crucial for maximizing its benefits. Expanding the toolkit’s application to various contexts within medical education, ensuring fairness and bias mitigation, and promoting responsible AI use are essential areas for future research and development.
Data availability
No datasets were generated or analysed during the current study.
References
Hamp-Lyons L, Heasley B. Study writing: A course in written English for academic purposes: Cambridge University Press; 2006.
Morton J, Storch N, Thompson C. What our students tell us: Perceptions of three multilingual students on their academic writing in first year. J Second Lang Writ. 2015;30:1–13.
Gollins A, Gentner D. A framework for a cognitive theory of writing. Cognitive processes in writing: Routledge; 2016. p. 51–72.
Mahmood R, Shah A, Alam I. The impact of l1 on l2 in academic english writing: a multilingual dilemma of pakistani students. Engl Specif Purp. 2020;16(5):67–80.
Barrot JS. Using ChatGPT for second language writing: Pitfalls and potentials. Assess Writ. 2023;57:100745.
Gerova G, Ivanova I. University students and instructors’ perceptions of challenges in academic writing. Lyuboslovie. 2023;(23).
Stokel-Walker C. ChatGPT listed as author on research papers: many scientists disapprove. Nature. 2023;613(7945):620–1.
Swiecki Z, Khosravi H, Chen G, Martinez-Maldonado R, Lodge JM, Milligan S, et al. Assessment in the age of artificial intelligence. Computers and Education: Artificial Intelligence. 2022;3.
Bridgeman DLaA. Did AI write your essay? How artificial intelligence is changing education. LinkedIn; 2023.
Zimmerman A. A ghostwriter for the masses: ChatGPT and the future of writing. Ann Surg Oncol. 2023;30(6):3170–3.
Villarroel V, Bloxham S, Bruna D, Bruna C, Herrera-Seda C. Authentic assessment: creating a blueprint for course design. Assess Eval High Educ. 2018;43(5):840–54.
Koo M. The importance of proper use of ChatGPT in medical writing. Radiology. 2023;307(3):e230312.
Houssein EH, Mohamed RE, Ali AA. Machine learning techniques for biomedical natural language processing: a comprehensive review. IEEE Access. 2021;9:140628–53.
Perkins SW, Muste JC, Alam T, Singh RP. Improving Clinical Documentation with Artificial Intelligence: A Systematic Review. Perspectives in Health Information Management. 2024;21(2).
Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
Balloch J, Sridharan S, Oldham G, Wray J, Gough P, Robinson R, et al. Use of an ambient artificial intelligence tool to improve quality of clinical documentation. Future Healthcare J. 2024;11(3):100157.
Bergstrand S, Heddle C, Sabaté M, Mas M. Embracing artificial intelligence in medical writing: a new era of efficiency and collaboration. Medical Writing. 2023;32(3):82–7.
Li J, Zong H, Wu E, Wu R, Peng Z, Zhao J, et al. Exploring the potential of artificial intelligence to enhance the writing of english academic papers by non-native english-speaking medical students-the educational application of ChatGPT. BMC Med Educ. 2024;24(1):736.
Rudolph J, Tan S, Tan S. ChatGPT: Bullshit spewer or the end of traditional assessments in higher education? J Appl Learn Teach. 2023;6(1):342–63.
Swiecki Z, Khosravi H, Chen G, Martinez-Maldonado R, Lodge JM, Milligan S, et al. Assessment in the age of artificial intelligence. Comput Educ Artif Intell. 2022;3:100075.
Boud D, Ajjawi R, Dawson P, Tai J. Developing evaluative judgement in higher education: Routledge London; 2018.
Gipps C, Stobart G. Fairness in assessment. Educational assessment in the 21st century: Connecting theory and practice: Springer; 2009. p. 105–18.
Agan Şİ, Deniz S. A Rubric Study for Assessing Paragraph Level Written Texts. Journal of Education and Training Studies. 2019;8(1).
Nejad AM, Pakdel F, Khansir AA. Interaction between Language Testing Research and Classroom Testing Practice. Educ Process Int J. 2019;8(1):59–71.
Tai J, Ajjawi R, Boud D, Dawson P, Panadero E. Developing evaluative judgement: enabling students to make decisions about the quality of work. High Educ. 2018;76:467–81.
Wei P, Wang X, Dong H. The impact of automated writing evaluation on second language writing skills of Chinese EFL learners: A randomized controlled trial. Front Psychol. 2023;14:1249991.
Wang P-l. Effects of an Automated Writing Evaluation Program: Student Experiences and Perceptions. Electronic Journal of Foreign Language Teaching. 2015;12(1).
Cotos E. Automated writing evaluation for non-native speaker English academic writing: The case of IADE and its formative feedback: Iowa State University; 2010.
Zou M, Huang L. To use or not to use? Understanding doctoral students’ acceptance of ChatGPT in writing through technology acceptance model. Front Psychol. 2023;14:1259531.
Michel-Villarreal R, Vilalta-Perdomo E, Salinas-Navarro DE, Thierry-Aguilera R, Gerardou FS. Challenges and Opportunities of Generative AI for Higher Education as Explained by ChatGPT. Education Sciences. 2023;13(9).
Lo CK. What is the impact of ChatGPT on education? A rapid review of the literature. Educ Sci. 2023;13(4):410.
Lim WM, Gunasekara A, Pallant JL, Pallant JI, Pechenkina E. Generative AI and the future of education: Ragnarök or reformation? A paradoxical perspective from management educators. Int J Manag Educ. 2023;21(2):100790.
Dwivedi YK, Kshetri N, Hughes L, Slade EL, Jeyaraj A, Kar AK, et al. “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int J Inf Manage. 2023;71:102642.
Rainie L, Anderson J. A New Age of Enlightenment? A New Threat to Humanity? Experts Imagine the Impact of Artificial Intelligence by 2040. 2024.
Washington J. The Impact of Generative Artificial Intelligence on Writer's Self-Efficacy: A Critical Literature Review. Available at SSRN 4538043. 2023.
Entwistle N. Student learning and academic understanding: a research perspective with implications for teaching. 2018.
Lodge JM, Yang S, Furze L, Dawson P. It’s not like a calculator, so what is the relationship between learners and generative artificial intelligence? Learning: Research and Practice. 2023;9(2):117–24.
Bosco AM, Ferns S. Embedding of authentic assessment in work-integrated learning curriculum. Asia Pac J Cooperative Educ. 2014;15(4):281–90.
O’Neill G, Short A. Relevant, practical and connected to the real world: what higher education students say engages them in the curriculum. Irish Educational Studies. 2023:1–18.
Jopp R. A case study of a technology enhanced learning initiative that supports authentic assessment. Teach High Educ. 2020;25(8):942–58.
Thorburn M. Articulating a Merleau-Pontian phenomenology of physical education: The quest for active student engagement and authentic assessment in high-stakes examination awards. Eur Phys Educ Rev. 2008;14(2):263–80.
Geerling W, Mateer GD, Wooten J, Damodaran N. ChatGPT has aced the test of understanding in college economics: Now what? Am Econ. 2023;68(2):233–45.
Salinas-Navarro DE, Vilalta-Perdomo E, Michel-Villarreal R, Montesinos L. Using generative artificial intelligence tools to explain and enhance experiential learning for authentic assessment. Educ Sci. 2024;14(1):83.
Boud D, Soler R. Sustainable assessment revisited. Assess Eval High Educ. 2016;41(3):400–13.
Egodawatte G. A Rubric to Self-Assess and Peer-Assess Mathematical Problem Solving Tasks of College Students. Acta didactica napocensia. 2010;3(1):75–88.
Broad B. What we really value: Beyond rubrics in teaching and assessing writing: University Press of Colorado; 2003.
Wiggins G. The case for authentic assessment. Practical assessment, research, and evaluation. 1990;2(1).
Alderson JC, McIntyre D. Implementing and evaluating a self-assessment mechanism for the Web-based language and style course. Lang Lit. 2006;15(3):291–306.
Mushtaq R, Taseer N, Ghori U. Effectiveness of Process-Oriented Approach in the Development of English Writing Skills of Undergraduate Students. Glob Educ Stud Rev VI. 2021;6:186–94.
Seow A. The writing process and process writing. Method Lang Teach An Anthology Curr Pract. 2002:315–20.
Oscarson M. Self-assessment in the classroom. Companion Lang Assessment. 2013;2:712–29.
Nielsen K. Self-assessment methods in writing instruction: a conceptual framework, successful practices and essential strategies. J Res Reading. 2014;37(1):1–16.
Ragupathi K, Lee A. Beyond fairness and consistency in grading: the role of rubrics in higher education. Diversity and Inclusion in Global Higher Education; 2020. p. 73–95.
Black P, Wiliam D. Assessment and classroom learning. Assessment Educ Princ Policy Pract. 1998;5(1):7–74.
Colton D, Covert RW. Designing and constructing instruments for social research and evaluation: John Wiley & Sons; 2007.
Greenbaum TL. The handbook for focus group research: Sage; 1998.
Bloor M. Focus groups in social research: Sage; 2001.
Patrick DL, Burke LB, Gwaltney CJ, Leidy NK, Martin ML, Molsen E, Ring L. Content validity—establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO Good Research Practices Task Force report: part 2—assessing respondent understanding. Value in Health. 2011;14(8):978–88.
Gibson CB. Elaboration, generalization, triangulation, and interpretation: On enhancing the value of mixed method research. Organ Res Methods. 2017;20(2):193–223.
Sutton P. Conceptualizing feedback literacy: Knowing, being, and acting. Innov Educ Teach Int. 2012;49(1):31–40.
Ferguson P. Student perceptions of quality feedback in teacher education. Assess Eval High Educ. 2011;36(1):51–62.
Sefcik L, Striepe M, Yorke J. Mapping the landscape of academic integrity education programs: what approaches are effective? Assess Eval High Educ. 2020.
Guerrero-Dib JG, Portales L, Heredia-Escorza Y. Impact of academic integrity on workplace ethical behaviour. Int J Educ Integr. 2020;16(1):2.
Patrick R. Comprehensible Input and Krashen’s theory. J Classics Teach. 2019;20(39):37–44.
Teixeira PJ, Marques MM, Silva MN, Brunet J, Duda JL, Haerens L, et al. A classification of motivation and behavior change techniques used in self-determination theory-based interventions in health contexts. Motiv Sci. 2020;6(4):438.
Daud NSBM. Developing Critical Thinking Skills in Tertiary Academic Writing Through the Use of an Instructional Rubric for Peer Evaluation: A Thesis Submitted in Fulfilment of the Requirements for the Degree of Doctor of Philosophy in the University of Canterbury: University of Canterbury; 2011.
Intja NS, Nahole M. Critical thinking in academic writing amongst Rukwangali students: lecturers’ perspectives. 2021;90(1):14.
Kristensen JK, Torkildsen JvK, Andersson B. Repeated mistakes in app-based language learning: Persistence and relation to learning gains. Computers & Education. 2024;210:104966.
Miller EK, Lundqvist M, Bastos AM. Working Memory 2.0. Neuron. 2018;100(2):463–75.
Unsworth N, Miller AL, Robison MK. The influence of working memory capacity and lapses of attention for variation in error monitoring. Cogn Affect Behav Neurosci. 2022;22(3):450–66.
Hirsh JB, Inzlicht M. Error-related negativity predicts academic performance. Psychophysiology. 2010;47(1):192–6.
Van Beuningen CG, De Jong NH, Kuiken F. Evidence on the effectiveness of comprehensive error correction in second language writing. Lang Learn. 2012;62(1):1–41.
Kurban CF, Şahin M. The Impact of ChatGPT on Higher Education: Exploring the AI Revolution: Emerald Publishing Limited; 2024.
Kanungo RP, Gupta S, Patel P, Prikshat V, Liu R. Digital consumption and socio-normative vulnerability. Technol Forecast Soc Chang. 2022;182:121808.
Nicol DJ, Macfarlane-Dick D. Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Stud High Educ. 2006;31(2):199–218.
Zimmerman BJ. Goal setting: A key proactive source of academic self-regulation. Motivation and self-regulated learning: Routledge; 2012. p. 267–95.
Denisova-Schmidt E. The challenges of academic integrity in higher education: Current trends and prospects. CIHE perspectives. 2017;5:1–27.
Latham GP, Arshoff AS. Planning: A mediator in goal-setting theory. The Psychology of Planning in Organizations: Routledge; 2015. p. 89–104.
Schippers MC, Morisano D, Locke EA, Scheepers AW, Latham GP, de Jong EM. Writing about personal goals and plans regardless of goal type boosts academic performance. Contemp Educ Psychol. 2020;60:101823.
Troia GA, Harbaugh AG, Shankland RK, Wolbers KA, Lawrence AM. Relationships between writing motivation, writing activity, and writing performance: Effects of grade, sex, and ability. Read Writ. 2013;26:17–44.
Ryan RM, Deci EL. Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. Am Psychol. 2000;55(1):68.
Alvarez Sainz M, Ferrero AM, Ugidos A. Time management: skills to learn and put into practice. Education+ Training. 2019;61(5):635–48.
Magnifico AM. Writing for whom? Cognition, motivation, and a writer’s audience. Educ Psychol. 2010;45(3):167–84.
Ellis R. Planning and task performance in a second language. 2005. p. 1–320.
Efklides A. Interactions of metacognition with motivation and affect in self-regulated learning: The MASRL model. Educ Psychol. 2011;46(1):6–25.
Dawson P. Assessment rubrics: towards clearer and more replicable design, research and practice. Assess Eval High Educ. 2017;42(3):347–60.
Özlem Ö. Personal goal-setting in an EFL writing class. Dil Dergisi. 2019;170(1):89–107.
Bolarinwa OA. Principles and methods of validity and reliability testing of questionnaires used in social and health science researches. Nig Postgrad Med J. 2015;22(4):195–201.
Cox TL, Bown J, Bell TR. In Advanced L2 Reading Proficiency Assessments, Should the Question Language Be in the L1 or the L2?: Does It Make a Difference? Foreign language proficiency in higher education. 2019:117–36.
Rickert H. Fostering Self-assessment Strategies as a Learning Tool for ELLs: Greensboro College; 2020.
Zhang X. Effects of using self-assessment on English-as-a-foreign-language (EFL) Students’ self-efficacy beliefs and writing improvement: ResearchSpace@ Auckland; 2021.
Chung HQ, Chen V, Olson CB. The impact of self-assessment, planning and goal setting, and reflection before and after revision on student self-efficacy and writing performance. Read Writ. 2021;34(7):1885–913.
Boud D, Falchikov N. Aligning assessment with long-term learning. Assess Eval High Educ. 2006;31(4):399–413.
Van Rosse F, de Bruijne M, Suurmond J, Essink-Bot M-L, Wagner C. Language barriers and patient safety risks in hospital care. A mixed methods study. International journal of nursing studies. 2016;54:45–53.
Wasserman M, Renfrew MR, Green AR, Lopez L, Tan-McGrory A, Brach C, Betancourt JR. Identifying and preventing medical errors in patients with limited English proficiency: key findings and tools for the field. J Healthcare Qual. 2014;36(3):5–16.
León SP, Panadero E, García-Martínez I. How accurate are our students? A meta-analytic systematic review on self-assessment scoring accuracy. Educ Psychol Rev. 2023;35(4):106.
Andrade H, Valtcheva A. Promoting learning and achievement through self-assessment. Theory Pract. 2009;48(1):12–9.
Topping K. Self and peer assessment in school and university: Reliability, validity and utility. Optimising new modes of assessment: In search of qualities and standards: Springer; 2003. p. 55–87.
Acknowledgements
Not applicable.
Funding
We confirm that no funding was received for this work.
Author information
Authors and Affiliations
Contributions
L.KH and F. P. devised the study concept, designed the study, supervised the intervention, data collection, and analysis, coordinated the research, and critically revised the manuscript. R.K. and J.M. participated in the study concept and revised the manuscript. All authors have read and approved the content of the manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
The study was approved by the local ethics council of Shiraz University of Medical Sciences (decree code: IR.SUMS.REC.1403.240). We made every effort to adhere to ethical principles in our research design and implementation:
Informed Consent: We secured written consent from all participants, ensuring they were fully informed about the study's objectives, methodology, and the intended use of the data collected.
Voluntary Participation: Participants were informed of their right to withdraw from the study at any time without any negative consequences, thereby ensuring their autonomy throughout the process.
Confidentiality and Anonymity: We maintained the anonymity and confidentiality of all participants. All data collected was de-identified prior to analysis to further protect participants' identities.
Data Security: Data was stored securely and was accessible only to the research team to guarantee the privacy of participants.
Minimal Risk: The study involved minimal risk to participants, as it focused exclusively on their feedback concerning the self-assessment toolkit, without introducing interventions or collecting sensitive personal data.
Beneficence: Our research aimed to enhance academic writing assessment and instruction, ultimately benefitting future students and educators through improved educational practices.
Justice: We ensured that all eligible participants had equal opportunities to contribute to the study, promoting equitable participation.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Khojasteh, L., Kafipour, R., Pakdel, F. et al. Empowering medical students with AI writing co-pilots: design and validation of AI self-assessment toolkit. BMC Med Educ 25, 159 (2025). https://doi.org/10.1186/s12909-025-06753-3
DOI: https://doi.org/10.1186/s12909-025-06753-3