Introduction
In 2023, a lawyer made headlines for submitting a legal brief filled with fake case law generated by ChatGPT. He wasn't malicious; he just didn't understand how the technology worked. He thought he was using a search engine. He was actually using a text generator.
In medicine, the stakes are far higher. A "hallucination" (the industry term for when an AI confidently states a falsehood) can be dangerous. To use these tools safely, you must understand why they lie.
The Mechanism: Prediction vs. Knowledge
It Is Not a Bug
Large Language Models (LLMs) are probabilistic, not deterministic. They do not retrieve facts from a database; they predict the most likely next word based on patterns in their training data.
When you ask a question, the model calculates: "Given the sequence of words so far, what is the most likely next word?" When those patterns point toward a plausible but nonexistent paper, the model generates it with full confidence. It cannot distinguish real from fabricated.
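To make "most likely next word" concrete, here is a toy sketch of the idea; the candidate words and probabilities are invented for illustration and do not come from any real model:

```python
import random

# Invented probabilities a model might assign to the next word after
# "The treatment for bacterial pneumonia is" -- illustration only.
next_word_probs = {
    "antibiotics": 0.86,
    "supportive": 0.06,
    "amoxicillin": 0.05,
    "rest": 0.03,
}

# The model doesn't "know" the answer; it samples from a distribution like this.
words = list(next_word_probs)
weights = list(next_word_probs.values())
print(random.choices(words, weights=weights, k=1)[0])  # usually "antibiotics"
```

Most of the time the sampled word is the right one, which is exactly what makes the occasional wrong one so convincing.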
The "Stochastic Parrot"
Imagine a parrot that has heard every medical conversation in history. If you say "The treatment for bacterial pneumonia is...", the parrot will likely squawk "Antibiotics." It doesn't know what bacteria are. It just knows that word usually follows the others.
Now imagine asking: "What is the specific dosage of drug X for a patient with this rare genetic mutation?"
If the model hasn't seen that exact pattern enough times, it might follow the pattern of a standard dosage recommendation, filling in plausible-sounding numbers. It is completing the pattern of a medical answer, not retrieving the fact of the dosage.
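The sketch below makes that failure mode concrete with a toy bigram "parrot" built from an invented three-sentence corpus; nothing here is a real dosage or a real model:

```python
from collections import Counter, defaultdict

# The parrot has only ever "heard" these sentences (invented for illustration).
corpus = [
    "the standard dose is 500 mg twice daily",
    "the standard dose is 500 mg twice daily",
    "the standard dose is 250 mg once daily",
]

# Count which word tends to follow which (a bigram table).
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1

def complete(prompt, max_words=6):
    """Repeatedly append the most familiar next word: pure pattern completion."""
    words = prompt.split()
    for _ in range(max_words):
        options = follows.get(words[-1])
        if not options:
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

# A question about a rare case it has never seen still gets a confident,
# familiar-looking number -- a completed pattern, not a retrieved fact.
print(complete("for this rare mutation the standard dose is"))
```

The output reads like an authoritative answer, yet the "parrot" has no idea whether 500 mg is safe for this patient. Real LLMs are vastly more sophisticated, but the underlying move is the same.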
Types of Hallucinations
- Fact Fabrication: Inventing a paper, a citation, or a bio.
  - Example: Citing "Smith et al., JAMA 2024" for a study that doesn't exist.
- Instruction Contradiction: Ignoring a negative constraint.
  - Example: You ask for a summary without mentioning prognosis, and it mentions prognosis anyway.
- Logical Fallacy: Reasoning correctly through steps A and B, but failing at step C.
  - Example: Correctly identifying symptoms but concluding with a diagnosis that doesn't match them.
How to Detect Hallucinations
Hallucinations are treacherous because they sound plausible. They are often grammatically perfect and delivered with total confidence.
Red Flags:
- Vague Citations: "Recent studies suggest..." (Which studies?).
- Generic Numbers: Round numbers or standard doses that don't account for specific patient factors (e.g., renal function).
- Over-Compliance: The AI agrees with a false premise you put in the prompt.
- You: "Why is Vitamin C the standard of care for glioblastoma?"
- AI: "Vitamin C is considered by some to be a standard..." (It tries to be helpful rather than correcting you).
Strategies for Verification
1. The "Zero-Shot" Verification
Don't trust the AI's own memory. Instead, use a tool like Perplexity or Bing Copilot that searches the web.
- Bad: "What is the survival rate for..." (ChatGPT native).
- Good: "Search for the latest SEER data on survival rates for..." (Perplexity).
2. Request Quotes/Excerpts
If using a tool like NotebookLM or Claude with a PDF uploaded:
- Prompt: "Answer this question based ONLY on the text provided. Quote the sentence that supports your answer."
3. Cross-Examination
Ask the model to critique itself.
- Prompt: "Are you sure? Please check your previous answer for potential errors or hallucinations. If you are uncertain, state that you do not know."
Conclusion
Hallucinations are not a "bug" that will be easily fixed - a feature of how LLMs work (creativity and generation). While models are getting better (GPT-4 hallucinates far less than GPT-3.5), the risk never hits zero.
The Golden Rule: If the output affects patient care, verify it. If you can't verify it, don't use it.
