By Dr. Liji Thomas, MD
Reviewed by Benedette Cuffari, M.Sc.
Sep 6 2024
A new AI tool outperforms traditional methods at spotting AI-generated content such as ChatGPT articles, helping to safeguard scientific research from plagiarism.
Study: Detection of ChatGPT fake science with the xFakeSci learning algorithm. Image Credit: dauf / Shutterstock.com
The growing use of generative artificial intelligence (AI) tools like ChatGPT has increased the risk of human-appearing content plagiarized from other sources. A new study published in Scientific Reports assesses the performance of xFakeSci in differentiating authentic scientific content from ChatGPT-generated content.
Threats posed to research by generative AI
AI generates content based on the supply of prompts or commands to direct its processing. Aided and abetted by social media, predatory journals have published fake scientific articles to lend authority to dubious viewpoints. This could be further exacerbated by publishing AI-generated content in actual scientific publications.
Previous research has emphasized the challenges associated with distinguishing AI-generated content from authentic scientific content. Thus, there remains an urgent need to develop accurate detection algorithms.
Aim and overview of the study
In the current study, researchers utilized xFakeSci, a novel learning algorithm that can differentiate AI-generated content from authentic scientific content. This network-driven label prediction algorithm encompasses both single and multi modes of operation that are trained using one and multiple types of resources, respectively.
During training, researchers used engineered prompts to identify fake documents and their distinctive traits with ChatGPT. Thereafter, xFakeSci was used to predict the document class and its genuineness.
Two types of network training models were based on ChatGPT-generated and human-written content obtained from PubMed abstracts. Both data sets were analyzed for articles on cancer, depression, and Alzheimer’s disease (AD).
Differences between two types of content
One of the striking differences between ChatGPT- and human-generated articles was the number of nodes and edges calculated from each type of content.
ChatGPT-generated content had significantly fewer nodes but a higher number of edges for a lower node-to-edge ratio. Moreover, AI-generated datasets had higher ratios for each of the k-Folds as compared to actual scientist-derived content on all three diseases.
Testing scores
After training and calibration, xFakeSci was tested on 100 articles for each disease, 50 each from PubMed and ChatGPT. F1 scores were calculated from the true positives, true negatives, false positives, and false negatives.
F1 scores of 80%, 91%, and 89% were obtained for articles on depression, cancer, and AD, respectively. Whereas all human-generated content was correctly identified by xFakeSci, only 25, 41, and 38 of the 50 ChatGPT-generated documents on these three diseases, respectively, were accurately identified. ChatGPT-generated content was more accurately identified when mixed with older authentic articles for analysis in a mixed class.
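The reported F1 scores can be approximately reproduced from the counts above. The following is a minimal sketch, not the authors' evaluation code: it assumes that PubMed is treated as the positive class, so that every ChatGPT document mislabeled as PubMed counts as a false positive. The function name `f1_score` and the `detected` dictionary are illustrative only.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# ChatGPT documents correctly flagged, out of 50 per disease, as reported above.
detected = {"depression": 25, "cancer": 41, "AD": 38}

for disease, caught in detected.items():
    tp = 50            # assumption: all 50 PubMed abstracts labeled as PubMed
    fn = 0             # no PubMed abstract labeled as ChatGPT
    fp = 50 - caught   # ChatGPT documents mislabeled as PubMed
    print(f"{disease}: F1 = {f1_score(tp, fp, fn):.3f}")
```

Under this assumption, the sketch gives roughly 0.80, 0.92, and 0.89, in line with the reported 80%, 91%, and 89% once rounding is allowed for. True negatives do not enter the F1 formula, which is why a perfect record on human-written abstracts can coexist with many missed ChatGPT documents.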
“ChatGPT is classified as PubMed with (FP (false positives) =25), indicating that 50% of the test documents are misclassified as real publications.”
Benchmarking xFakeSci
When benchmarked against conventional data mining algorithms regarded as among the top 10, such as Naïve Bayes, Support Vector Machine (SVM), Linear SVM, and Logistic Regression, xFakeSci scores remained between 80% and 91% for articles published between 2020 and 2024. In comparison, the other algorithms showed fluctuating performance, with scores ranging between 43% and 52%.
With earlier articles published between 2014-2019 and 2010-2014, the same disparity was observed for xFakeSci and other algorithms at 80%-94% and 38%-52%, respectively. Thus, xFakeSci outperforms the other algorithms across all time periods.
Conclusions
The xFakeSci algorithm is particularly appropriate for multi-mode classification to test a mixed test set and produce accurate labels for each type. The inclusion of a calibration step based on ratios and proximity distances improves the classification aspect of this algorithm; however, it precludes the addition of excessive sample quantities.
The multi-mode classification aspect of xFakeSci allowed this algorithm to accurately identify real articles, even when mixed with ChatGPT-generated articles. However, xFakeSci was not as successful in identifying all ChatGPT-generated content.
Networks generated from ChatGPT were associated with a lower node-to-edge ratio, thus indicating their higher connectedness, which was accompanied by an increased ratio of bigrams to total word count for each document.
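The node-to-edge ratio and the bigram-to-word ratio can be illustrated with a toy word network. The sketch below is not the xFakeSci implementation. It assumes that nodes are distinct words, that edges are distinct undirected pairs of adjacent words (bigrams), and that the bigram ratio is the number of distinct bigrams divided by the total word count. The function names `bigram_network` and `network_stats` are hypothetical.

```python
import re

def bigram_network(text):
    """Return (words, nodes, edges) for a simple word co-occurrence network.

    Assumption: nodes are distinct lowercase words and edges are distinct
    undirected pairs of adjacent words; self-loops are ignored.
    """
    words = re.findall(r"[a-z]+", text.lower())
    nodes = set(words)
    edges = {tuple(sorted(pair)) for pair in zip(words, words[1:])
             if pair[0] != pair[1]}
    return words, nodes, edges

def network_stats(text):
    """Summarize a document by node count, edge count, and two ratios."""
    words, nodes, edges = bigram_network(text)
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        # A lower value means more edges per node, i.e. a denser network.
        "node_to_edge": len(nodes) / len(edges) if edges else float("inf"),
        "bigram_to_word": len(edges) / len(words) if words else 0.0,
    }

stats = network_stats("the cat sat on the mat the cat ran")
print(stats)
```

In this toy sentence there are six distinct words and six distinct adjacent pairs, so the node-to-edge ratio is 1.0. A text that reuses a small vocabulary in many different combinations would push that ratio lower, which is the pattern the study reports for ChatGPT-generated content.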
Since ChatGPT was developed to produce human-like content by predicting the next word on the basis of statistical correlations, its objectives do not agree with the scientific goals of documenting hypothesis testing, experimentation, and observations.
The xFakeSci algorithm may have other applications, such as distinguishing potentially fake parts of ChatGPT-generated clinical notes, interventions, and summaries of clinical experiments. Nevertheless, ethical guidelines must be enforced to prevent the irresponsible use of generative AI tools, even while recognizing their benefits.
AI can provide simulated data, build segments of code for multiple programming applications, and assist in teaching, while helping to present scientific research in readable grammatical English for non-native speakers. However, AI-generated content may plagiarize research documents available online, which could interfere with scientific progress and learning. Thus, journal publishers have an important role in implementing detection algorithms and other technologies to identify counterfeit reports.
Future research could use knowledge graphs to cluster closely linked fields of publication to improve the accuracy of detection, training, and calibration, as well as test the performance of xFakeSci using multiple data sources.
Journal reference:
Hamed, A. A., & Wu, X. (2024). Detection of ChatGPT fake science with the xFakeSci learning algorithm. Scientific Reports. doi:10.1038/s41598-024-66784-6.