Hallucinated citations and phantom references
How generative AI undermines scientific accuracy and integrity
As a researcher who regularly publishes academic papers and explains the latest findings and their implications to a broad audience, I consider scientific accuracy and integrity essential.
When describing research procedures and observed results, whether my own or somebody else’s, I always aim to be as factual and transparent as possible.
The best and most direct approach for ensuring scientific accuracy and integrity when writing is appropriate citation and referencing. Whenever a scientific statement is made, the full bibliographic information for the underlying academic sources must be included.
There are several ways of doing so. A common way is to provide a link in the text that readers can click on, taking them directly to the website of the underlying study. Another way is to include a short citation - i.e., name of author(s) & year of publication - in the text and mention the full bibliographic information of the underlying study as a reference in a section at the end or in a footnote.
By providing appropriate citations and references, readers can verify the academic sources, i.e., not only see that they actually exist, but also confirm that the underlying research really examined and found what is being claimed. They can also form their own, independent impression of the suggested interpretation of the scientific evidence.
Accordingly, the same principles should be followed when reading published academic papers and explanations of the latest findings and their implications for a broad audience. In other words, readers should verify that this information likewise meets the standards of scientific accuracy and integrity, i.e., that it provides appropriate citations and references.
Unfortunately, particularly when it comes to explanations of the latest scientific findings and their implications for a broad audience, I have recently observed a strong increase in ‘hallucinated citations’ and ‘phantom references’, most likely linked to the widespread use of general-purpose chatbots (like ChatGPT, Gemini, etc.) to generate content.
What are hallucinated citations and phantom references and how are they generated by AI?
‘Hallucinated citations’ and ‘phantom references’ are citations and references invented by Large Language Models (LLMs) that look real but don’t exist. They’re a creative combination of author names, paper titles, journal names and publication years that mimic genuine publications but in reality are completely fictional.
Most current LLMs are optimized to be agreeable and fluent, not to be fact-checking engines. When users ask for ‘scientific evidence’, these models often treat the request as a creative writing prompt, assembling ‘scientific-sounding’ components rather than performing a strict database query. They prioritise ‘looking right’ over ‘being right’.
What’s more, for non-expert content creators and readers alike, ‘hallucinated citations’ and ‘phantom references’ can easily look genuine and credible. Because LLMs create them based on real publications, they usually contain the names of real academics who previously published in the respective research areas, plausible-sounding paper titles and real academic journal names.
However, the specific combination of these elements is completely made up - authors who never published together are merged, paper titles are altered by removing or adding words, and real journal names are randomly picked and added to this cocktail.
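To see why such recombinations look so convincing, consider the following toy sketch in Python. It is a deliberately crude illustration, not how an LLM actually works internally (LLMs predict text token by token rather than sampling from lists), and it simply reuses elements from the real example discussed below: every individual component is genuine, yet the assembled reference describes a paper that was never written.

```python
# Toy illustration: recombining fragments of real publications yields
# references that look plausible but do not exist. All fragments below
# are taken from the real example discussed in this article.
import random

real_authors = ["Vrtička, P.", "Bondolfi, G.", "Sander, D.", "Vuilleumier, P."]
real_title_stem = "Neural substrates of social emotion regulation: A fMRI study"
real_title_tails = [
    "on imitation and expressive suppression to dynamic facial signals",
    "on emotion reappraisal and suppression",
]
real_journals = ["Frontiers in Psychology", "Frontiers in Human Neuroscience",
                 "Social Neuroscience"]

def phantom_reference() -> str:
    # Merge authors who may never have published together in this combination,
    # splice title fragments, and pick a journal and year at random.
    authors = ", ".join(random.sample(real_authors, k=3))
    title = f"{real_title_stem} {random.choice(real_title_tails)}"
    journal = random.choice(real_journals)
    year = random.choice(range(2010, 2023))
    return f"{authors} ({year}). {title}. {journal}."

print(phantom_reference())
# Every element above is real; the combination is fiction.
```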
To give an example, I can refer to an instance in which - as far as I can tell - a chatbot created a ‘hallucinated citation’ and ‘phantom reference’ of my own work. Back in 2024, an academic colleague made me aware of a published paper that contained a citation and reference to one of my studies that read as follows:
Vrtička et al. (2019) - Vrtička, P., Bondolfi, G., and Sander, D. (2019). The neural substrates of social emotion regulation: A fMRI study on emotion reappraisal and suppression in individuals with and without social anxiety disorder. Frontiers in Human Neuroscience, 13.
At first glance, this reference looked familiar, even to me. However, upon closer inspection, I realised that it was flawed in several of its elements.
First, although I did publish with Bondolfi and Sander before, I never did so in a paper with only these three authors. Second, I had also published a paper with ‘Neural substrates of social emotion regulation: A fMRI study’ in its title, but never combined with ‘on emotion reappraisal and suppression in individuals with and without social anxiety disorder’. And third, while I had published in the journal Frontiers in Human Neuroscience before, I never did so in Volume 13 in 2019 (only in Volume 6 in 2012).
The two closest real references to papers I actually published are:
- Vrtička et al. (2013) - Vrtička, P., Simioni, S., Fornari, E., Schluep, M., Vuilleumier, P., & Sander, D. (2013). Neural substrates of social emotion regulation: A fMRI study on imitation and expressive suppression to dynamic facial signals. Frontiers in Psychology, 4, Article 95. https://doi.org/10.3389/fpsyg.2013.00095
- Vrtička et al. (2012) - Vrtička, P., Bondolfi, G., Sander, D., & Vuilleumier, P. (2012). The neural substrates of social emotion perception and regulation are modulated by adult attachment style. Social Neuroscience, 7(5), 473-493. https://doi.org/10.1080/17470919.2011.647410
It seems highly unlikely to me that such a specific combination of elements emerged as part of a simple error or oversight. The initial citation and reference to the non-existent paper have since been removed as part of a correction and replaced by the Vrtička et al. (2012) reference listed above (second bullet point).
What can be done to restore scientific accuracy and integrity when citing and referencing?
I’m encountering ‘hallucinated citations’ and ‘phantom references’ more and more often these days. They’re basically popping up everywhere. And I personally feel that this is a real problem that systematically undermines scientific accuracy and integrity.
If a scientific claim is made, it must be verifiable. Its associated citation and reference must not only exist; the research and results described in the cited and referenced work must also closely reflect what is reported as factual information in the claim.
Well then, what can be done to avoid generating more ‘hallucinated citations’ and ‘phantom references’, and to more readily spot them in existing content?
On the one hand, when creating content, I think that the first and foremost principle should be to not use general-purpose chatbots (like ChatGPT, Gemini, etc.) as a primary source for scientific evidence, especially in a generative way. These AI tools can be sensibly used for explaining complex concepts, summarizing scientific sources (if full texts are uploaded to dedicated AI tools in which context is limited to the sources provided), as well as for drafting and brainstorming.
However, when using them as a primary source for scientific evidence in a generative way, the risk of obtaining ‘hallucinated citations’ and ‘phantom references’ is simply too great a liability - especially because this liability is usually paired with AI’s aim to be conversational and ‘smooth’, and therefore to provide ‘polished’, oversimplified and/or factually incorrect accounts.
On the other hand, to promote scientific accuracy and transparency when consuming content, full citations and references should always be actively looked for. If they are absent, incomplete or inconsistent, more information should be requested from the content creators. And more generally, any scientific claims - especially strong ones - should be regarded with great caution and always independently verified.
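For readers comfortable with a bit of scripting, part of this verification can even be automated. The following is a minimal sketch in Python, assuming the widely used requests library is installed; it queries the public CrossRef REST API (which indexes most published scholarly works and requires no API key) with a free-form reference string and prints the closest matching real publications:

```python
# Minimal sketch: look up a free-form reference string via the public
# CrossRef REST API and print the closest matching real publications.
import requests

def check_reference(bibliographic: str, rows: int = 5) -> None:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": bibliographic, "rows": rows},
        timeout=10,
    )
    resp.raise_for_status()
    for item in resp.json()["message"]["items"]:
        authors = ", ".join(a.get("family", "?") for a in item.get("author", []))
        year = item.get("issued", {}).get("date-parts", [[None]])[0][0]
        title = item.get("title") or ["<no title>"]
        journal = item.get("container-title") or [""]
        print(f"{authors} ({year}). {title[0]}. {journal[0]}. "
              f"https://doi.org/{item['DOI']}")

# Example: checking the phantom reference from above. An exact 2019 match
# should be absent; the real 2012/2013 papers are the likely nearest hits.
check_reference(
    "Vrtička Bondolfi Sander (2019) The neural substrates of social emotion "
    "regulation Frontiers in Human Neuroscience"
)
```

Of course, such a lookup only establishes whether a cited work exists; whether it really examined and found what the claim says still requires reading it.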
The tricky bit when doing so is that the devil lies in the details. Very often, ‘hallucinated citations’ and ‘phantom references’ look just fine at first glance. Content consumers need close attention and sustained awareness to spot the sometimes subtle inconsistencies. And when creating content, it nowadays takes a conscious effort to resist the temptation of quantity over quality. Creators should always remember that scientific accuracy and integrity (i.e., properly researching and reporting scientific sources) are both a necessity and an ethical mandate.


