A Comparative Analysis of Human and Generative AI-Based Evaluation of Scientific Abstracts
DOI:
https://doi.org/10.55040/q8sgtr65
Keywords:
comparative analysis, artificial intelligence, abstract, evaluation, educational research
Abstract
This study examines differences in the evaluation of abstracts submitted to the II Educational Research Congress COIE-CIEDU 2024, comparing the assessments of two subject-matter experts with those produced by a generative artificial intelligence (AI) system. A standardized evaluation rubric was used, and mean difference tests were applied to detect statistically significant discrepancies. No significant differences were found between the two human experts, whereas the human evaluations differed significantly from those of the generative AI system (p < 0.05). This finding suggests that, while human judgment remained consistent across experts, generative AI does not yet replicate the academic quality standards applied by expert reviewers. Generative AI may serve as a valuable support tool for technical or administrative tasks within the review process, but it is not ready to perform academic peer review autonomously. Its use is recommended as a complementary resource, under clear supervision protocols and continuous performance validation, to ensure fairness, rigor, and integrity in the evaluation of scientific content.
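The mean difference tests mentioned above can be illustrated with a minimal sketch. The abstract does not state which test was applied, so a paired t-test is assumed here on the grounds that every rater scores the same set of abstracts. The function name `paired_t_statistic`, the score lists, and the sample size are hypothetical and are not data from the study. The critical value 2.571 is the standard two-tailed t value for alpha = 0.05 with 5 degrees of freedom.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired mean difference test.

    Hypothetical helper: each position in the two lists is assumed to be
    the rubric score that two raters gave to the same abstract.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same abstracts")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("at least two paired scores are required")
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Illustrative rubric totals for six abstracts (NOT data from the study).
expert_1 = [18, 15, 20, 12, 17, 14]
expert_2 = [17, 16, 20, 13, 16, 14]
ai_system = [22, 20, 23, 18, 21, 19]

# Two-tailed critical value of Student's t for alpha = 0.05 and df = 5.
T_CRITICAL = 2.571

t_experts, df = paired_t_statistic(expert_1, expert_2)
t_ai, _ = paired_t_statistic(expert_1, ai_system)

for label, t in (("expert 1 vs expert 2", t_experts),
                 ("expert 1 vs AI system", t_ai)):
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{label}: t = {t:.2f}, df = {df}, {verdict} at alpha = 0.05")
```

With these made-up scores, the two experts do not differ significantly while the expert and the AI system do, which mirrors the pattern the abstract reports. A real analysis would use the congress's actual rubric scores and should first check whether a paired t-test or a non-parametric alternative suits the data.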
License
Copyright (c) 2025 Aura L. López de Ramos, Belka Bonnett-Bogallo, Dimas Concepción, Gustavo Quintero-Barreto, Jarles Durán, Nelly Meléndez, yuly

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
https://creativecommons.org/licenses/by-nc-nd/4.0