AI-generated scientific research is polluting the online academic information ecosystem, according to a worrying report published in the Harvard Kennedy School’s Misinformation Review.
A team of researchers investigated the prevalence of research articles with evidence of artificially generated text on Google Scholar, an academic search engine that makes it easy to search for research published historically in a wealth of academic journals.
The team specifically interrogated misuse of generative pre-trained transformers (or GPTs), the type of large language model (LLM) behind now-familiar software such as OpenAI’s ChatGPT. These models can rapidly interpret text inputs and generate responses in the form of figures, images, and long passages of text.
In the research, the team analyzed a sample of scientific papers found on Google Scholar that showed signs of GPT use. The selected papers contained one or two phrases commonly used by conversational agents (chatbots) built on LLMs. The researchers then investigated the extent to which those questionable papers were distributed and hosted across the internet.
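The phrase-based screening described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the researchers' actual tooling: the article does not list the phrases they searched for, so the two marker phrases below are placeholders, and the function names (`normalize`, `find_markers`, `flag_papers`) are hypothetical.

```python
import re

# Placeholder marker phrases, chosen for illustration only; the article
# does not say which phrases the researchers actually searched for.
MARKER_PHRASES = (
    "as of my last knowledge update",
    "i don't have access to real-time data",
)


def normalize(text):
    """Lowercase, straighten curly apostrophes, and collapse whitespace."""
    text = text.lower().replace("\u2019", "'")
    return re.sub(r"\s+", " ", text)


def find_markers(text, phrases=MARKER_PHRASES):
    """Return the marker phrases that occur in the text."""
    cleaned = normalize(text)
    return [phrase for phrase in phrases if phrase in cleaned]


def flag_papers(papers):
    """Map each paper title to the markers found in its text.

    Papers with no matching phrase are left out of the result.
    """
    flagged = {}
    for title, body in papers.items():
        hits = find_markers(body)
        if hits:
            flagged[title] = hits
    return flagged


if __name__ == "__main__":
    # Invented sample texts, not real papers.
    sample = {
        "Paper A": "As of my last knowledge update, no such survey exists.",
        "Paper B": "We measured soil salinity at twelve sites.",
    }
    print(flag_papers(sample))
```

A match of this kind is a signal for closer manual review rather than proof of fabrication, and text produced by a GPT without such telltale phrases would pass through unflagged.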
“The risk of what we call ‘evidence hacking’ increases significantly when AI-generated research is spread in search engines,” said Björn Ekström, a researcher at the Swedish School of Library and Information Science, and co-author of the paper, in a University of Borås release. “This can have tangible consequences as incorrect results can seep further into society and possibly also into more and more domains.”
The way Google Scholar pulls research from around the internet, according to the research team, does not screen out papers that have not been peer-reviewed or whose authors lack a scientific affiliation; the engine also pulls in academic bycatch (student papers, reports, preprints, and more) along with the research that has passed a higher bar of scrutiny.
The team found that two-thirds of the papers they studied were at least in part produced through undisclosed use of GPTs. Of the GPT-fabricated papers, the researchers found that 14.5% pertained to health, 19.5% pertained to the environment, and 23% pertained to computing.
“Most of these GPT-fabricated papers were found in non-indexed journals and working papers, but some cases included research published in mainstream scientific journals and conference proceedings,” the team wrote.
The researchers outlined two main risks brought about by this development. “First, the abundance of fabricated ‘studies’ seeping into all areas of the research infrastructure threatens to overwhelm the scholarly communication system and jeopardize the integrity of the scientific record,” the group wrote. “A second risk lies in the increased possibility that convincingly scientific-looking content was in fact deceitfully created with AI tools and is also optimized to be retrieved by publicly available academic search engines, particularly Google Scholar.”
Because Google Scholar is a search engine rather than an academic database, it is easy for the public to use when searching for scientific literature. That’s good. Unfortunately, it is harder for members of the public to tell reputable journals from disreputable ones; even the difference between a piece of peer-reviewed research and a working paper can be confusing. Besides, the AI-generated text was found in some peer-reviewed works as well as those less-scrutinized write-ups, indicating that the GPT-fabricated work is muddying the waters throughout the online academic information system, not just in the work that exists outside of most official channels.
“If we cannot trust that the research we read is genuine, we risk making decisions based on incorrect information,” said study co-author Jutta Haider, also a researcher at the Swedish School of Library and Information Science, in the same release. “But as much as this is a question of scientific misconduct, it is a question of media and information literacy.”
In recent years, publishers have failed to screen out a handful of scientific articles that were total nonsense. In 2021, Springer Nature was forced to retract over 40 papers in the Arabian Journal of Geosciences, which, despite the journal’s title, discussed varied topics including sports, air pollution, and children’s medicine. Besides being off-topic, the articles were poorly written, to the point of not making sense, and sentences often lacked a cogent line of thought.
Artificial intelligence is exacerbating the issue. In February 2024, the publisher Frontiers caught flak for publishing a paper in its journal Frontiers in Cell and Developmental Biology that included images generated by the AI software Midjourney; specifically, anatomically incorrect images of signaling pathways and rat genitalia. Frontiers retracted the paper several days after its publication.
AI models can be a boon to science; the systems can decode fragile texts from the Roman Empire, find previously unknown Nazca Lines, and reveal hidden details in dinosaur fossils. But AI’s impact can be as positive or negative as the human who wields it.
Peer-reviewed journals—and perhaps hosts and search engines for academic writing—need guardrails to ensure that the technology works in service of scientific discovery, not in opposition to it.