A Comparative Analysis of Human and Generative AI-Based Evaluation of Scientific Abstracts

Authors

  • Aura L. López de Ramos, Educational Research Center CIEDU AIP, https://orcid.org/0000-0002-8983-9704
  • Belka Bonnett-Bogallo, Interamerican University of Panama (UIP)
  • Dimas Concepción, Technological University of Panama
  • Gustavo Quintero-Barreto, Technological University of Panama
  • Jarles Durán, Libertador Experimental Pedagogical University (UPEL); International University of Science and Technology (UNICyT)
  • Nelly Meléndez, Monteávila University (UMA)
  • Yuly Esteves, Libertador Experimental Pedagogical University (UPEL); International University of Science and Technology (UNICyT)

DOI:

https://doi.org/10.55040/q8sgtr65

Keywords:

comparative analysis, artificial intelligence, abstract, evaluation, educational research

Abstract

This study analyzes differences in the evaluation of abstracts submitted to the II Educational Research Congress COIE-CIEDU 2024, comparing the assessments of two subject-matter experts with those produced by a generative artificial intelligence (AI) system. A standardized evaluation rubric was used, and tests of mean differences were applied to detect statistically significant discrepancies. The results show no significant differences between the two human experts, but statistically significant discrepancies between the human evaluations and those produced by the generative AI system (p < 0.05). This finding indicates that, while human judgment remains methodologically consistent, generative AI cannot yet replicate the academic quality standards applied by expert reviewers. The study concludes that, although generative AI may serve as a valuable support tool for technical or administrative tasks within the review process, it is not ready to perform academic peer review autonomously. Its use is recommended as a complementary resource, under clear supervision protocols and continuous performance validation, to ensure fairness, rigor, and integrity in the evaluation of scientific content.
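The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which test of mean differences was applied, so the paired t-test below is an assumption, chosen because every evaluator scores the same set of abstracts. The rubric totals are invented for illustration and are not the study's data; the function names (`paired_t_test`, `two_sided_p`, `t_pdf`) are hypothetical helpers, and the p-value is obtained by numerically integrating the Student's t density so that only the Python standard library is needed.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value, integrating the t density on [0, |t|] with Simpson's rule."""
    a = abs(t_stat)
    if a == 0:
        return 1.0
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1 - 2 * central)


def paired_t_test(scores_a, scores_b):
    """Return (t statistic, two-sided p-value) for paired scores."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with at least 2 items")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    t_stat = mean(diffs) / (sd / math.sqrt(n))
    return t_stat, two_sided_p(t_stat, n - 1)


# Hypothetical rubric totals for eight abstracts (illustrative only).
expert_1 = [14, 16, 12, 18, 15, 13, 17, 16]
expert_2 = [15, 15, 12, 17, 16, 13, 17, 15]
ai_system = [18, 19, 17, 19, 18, 17, 20, 19]

for label, other in [("expert 1 vs expert 2", expert_2),
                     ("expert 1 vs AI system", ai_system)]:
    t_stat, p_value = paired_t_test(expert_1, other)
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{label}: t = {t_stat:.2f}, p = {p_value:.4f} ({verdict} at 0.05)")
```

With these invented scores the two experts do not differ significantly while the expert and the AI system do, which mirrors the pattern the abstract reports. If the rubric scores were ordinal or clearly non-normal, a non-parametric alternative such as the Wilcoxon signed-rank test might be more appropriate.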

Author Biography

  • Aura L. López de Ramos, Educational Research Center CIEDU AIP

    Chemical engineer with a master's degree and a PhD in chemical engineering; she also holds a master's degree and a specialization in higher education. Her research covers ICT in education and transport phenomena in capillary regions (h-index of 14 on her Google Scholar profile, with over 1,000 citations). She is Academic Coordinator of the Educational Research Center (CIEDU AIP) and teaches courses on scientific, technological, and humanistic research methodology and on quality in higher education at the International University of Science and Technology (UNICyT). She is an active member of APANAC (Panamanian Association for the Advancement of Science) and a member of the National Governing Council (CDN) of the SNI (National Research System).

Published

2025-07-01

Section

Original papers

How to Cite

López de Ramos, A. L., Bonnett-Bogallo, B., Concepción, D., Quintero-Barreto, G., Durán, J., Meléndez, N., & Esteves, Y. (2025). A Comparative Analysis of Human and Generative AI-Based Evaluation of Scientific Abstracts. EDUCA. International Journal for Educational Quality, 5(2), 1-21. https://doi.org/10.55040/q8sgtr65
