The Hammer Problem
When you first discover how capable general-purpose AI models are, it is tempting to use them for everything. Need a citation? Ask ChatGPT. Need to find a paper? Ask ChatGPT. Need to analyze data? Ask ChatGPT. Need to check if a drug interaction exists? Ask ChatGPT.
This is the equivalent of using a hammer for every task in the clinic. Sometimes you need a scalpel. Sometimes you need an Allen wrench.
The AI landscape is much bigger than one chatbot, and knowing which tool to reach for is the difference between frustration and efficiency.
What "General Purpose" Actually Means
ChatGPT, Claude, and Gemini are general-purpose language models. They were trained on broad swaths of the internet and can handle a remarkable range of tasks: writing, reasoning, coding, summarization, brainstorming, translation, and more.
But "general purpose" means exactly what it sounds like: good at many things, specialized in nothing.
It is the difference between an internist and a subspecialist. The internist can handle most problems competently, but for a complex brachytherapy plan, you want the disease-site expert.
The AI ecosystem now has specialists. Here is how to map your tasks to the right tool.
The Rule
If your task depends on real data (papers, web results, calculations), do not ask a general model to improvise. Use a tool that actually accesses that data.
If You Want Medical Evidence: Use Evidence Tools
This is where the 2023 citation disasters came from. People asked general models to retrieve specific medical literature, and the models - which cannot access PubMed or look up real papers - generated plausible-sounding but fake citations.
| Tool | What It Does | Best Use Case |
|---|---|---|
| OpenEvidence | AI answers grounded in medical literature and clinical guidelines | Evidence-based clinical questions with real citations |
| Consensus | Searches published scientific papers and synthesizes findings with AI | Questions like "What does the evidence say about X?" |
| Elicit | AI-powered research assistant for literature search and synthesis | Literature reviews, extracting data from papers |
| Semantic Scholar | AI-powered academic search from Allen Institute | Finding relevant papers, understanding citation networks |
The rule: If you need real citations to real papers, use a tool that searches real databases. Do not ask a general model to make them up for you.
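If you are comfortable with a few lines of code, you can also skip the middleman and query PubMed directly. Here is a minimal sketch using NCBI's public E-utilities API; the search term is illustrative, and heavy use should follow NCBI's rate-limit and API-key guidance.

```python
# A minimal sketch: query a real literature database (PubMed via NCBI's
# public E-utilities API) instead of asking a general model to recall
# citations from memory. The search term is just an example.
import requests

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def pubmed_search(term: str, retmax: int = 5) -> list[str]:
    """Return real PubMed IDs matching a search term."""
    r = requests.get(f"{BASE}/esearch.fcgi",
                     params={"db": "pubmed", "term": term,
                             "retmode": "json", "retmax": retmax})
    r.raise_for_status()
    return r.json()["esearchresult"]["idlist"]

def pubmed_titles(pmids: list[str]) -> dict[str, str]:
    """Fetch the article title for each PubMed ID."""
    r = requests.get(f"{BASE}/esummary.fcgi",
                     params={"db": "pubmed", "id": ",".join(pmids),
                             "retmode": "json"})
    r.raise_for_status()
    result = r.json()["result"]
    return {pmid: result[pmid]["title"] for pmid in pmids}

for pmid, title in pubmed_titles(pubmed_search("brachytherapy cervical cancer")).items():
    print(f"PMID {pmid}: {title}")
```

Every ID this returns corresponds to an actual indexed paper, which is exactly the guarantee a general model cannot give you.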
If You Want to Search the Web: Use Search-Grounded Tools
General models like base ChatGPT and Claude are trained on data with a cutoff date. They do not know what happened last week. If you ask about a trial that was just published, they will either confess ignorance or, worse, confabulate an answer.
For current information:
Gemini - Has built-in Google Search integration. When you ask a factual question, it can pull from live web results. This makes it substantially more reliable for recent information.
Perplexity - An AI-powered search engine that synthesizes results from multiple web sources and provides citations to the actual pages it found. Think of it as Google Search with an AI brain. For factual, current questions, this is often the best first stop.
ChatGPT with browsing - When enabled, ChatGPT can search the web and provide sourced answers. Not as seamless as Perplexity or Gemini's native integration, but functional.
The rule: If your question depends on current information, use a tool with web access. Do not trust training data for anything time-sensitive.
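Search grounding can also be scripted. Perplexity exposes an OpenAI-compatible API, so the sketch below reuses the standard openai client; the endpoint and the "sonar" model name are assumptions based on their documentation at one point in time, so verify both against current docs before relying on this.

```python
# A hedged sketch of a search-grounded query via Perplexity's
# OpenAI-compatible API. The base URL and model name ("sonar") are
# assumptions; confirm both against Perplexity's current documentation.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_PERPLEXITY_API_KEY",     # placeholder
    base_url="https://api.perplexity.ai",  # assumed endpoint
)

response = client.chat.completions.create(
    model="sonar",  # assumed model name
    messages=[
        {"role": "system",
         "content": "Answer with citations to the web pages you used."},
        {"role": "user",
         "content": "What randomized trials on this topic were published this month?"},
    ],
)
print(response.choices[0].message.content)
```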
If You Want to Write or Run Code: Use Code-Capable Tools
Some models can not only write code but actually execute it. This is enormously useful for data analysis.
ChatGPT with Code Interpreter - Upload a CSV or Excel file, ask it to analyze your data, and it writes and runs Python code to produce statistics, plots, and summaries. You do not need to know Python. You describe what you want in plain language, and it handles the code.
Claude with Artifacts - Claude can create interactive documents, charts, and code that you can preview directly in the interface. Particularly good for quick visualizations and exploratory analysis.
The rule: If you need to analyze data, generate plots, or run statistical tests, use a model with code execution capability. Asking a text-only model to "calculate the p-value" will get you a made-up number.
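To demystify what happens behind the scenes: when you ask a code-capable tool to compare two groups in an uploaded spreadsheet, it writes and executes something like the following. The file name and column names here are hypothetical stand-ins for your own data.

```python
# Roughly what a code-executing assistant runs for "compare these two groups".
# File and column names ("outcomes.csv", "group", "toxicity_score") are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("outcomes.csv")

# Summary statistics per group
print(df.groupby("group")["toxicity_score"].describe())

# Two-sample t-test: the p-value comes from an actual computation,
# not from a language model's guess.
a = df.loc[df["group"] == "A", "toxicity_score"]
b = df.loc[df["group"] == "B", "toxicity_score"]
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```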
If You Want to Chat With Your Own Documents: Use Grounded Tools
General models do not know the contents of your personal notes, your institution's protocols, or that PDF you downloaded yesterday. But some tools let you upload your own documents and chat exclusively with that content.
NotebookLM - Google's tool that lets you upload documents and ask questions grounded entirely in your uploaded content. We use this extensively and have a dedicated article on our workflow.
Claude Projects - You can upload files to a Claude Project and the model will reference them in its responses. Good for ongoing work where you need the model to know your specific context.
The rule: If you want AI to reference YOUR materials rather than its general training, use a tool designed for document-grounded conversation.
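Under the hood, these tools retrieve the passages most relevant to your question and hand only those to the model. Here is a deliberately minimal sketch of that retrieval step; real products use neural embeddings rather than TF-IDF, and the passages are hypothetical.

```python
# A minimal sketch of the retrieval step behind document-grounded chat.
# Real tools use neural embeddings; TF-IDF keeps this example self-contained.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical passages from documents you uploaded
passages = [
    "Institutional protocol: cervical cancer brachytherapy dose is 7 Gy x 4 fractions.",
    "Clinic workflow: new patient consults are scheduled on Tuesdays.",
    "QA checklist: verify applicator position on CT before each fraction.",
]

question = "What is our brachytherapy fractionation?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(passages)
query_vector = vectorizer.transform([question])

# Rank passages by similarity to the question; the best match becomes
# the context the model is told to answer from.
scores = cosine_similarity(query_vector, doc_vectors)[0]
print("Grounding passage:", passages[scores.argmax()])
```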
The Emerging Frontier
The tool landscape is expanding rapidly into new categories worth knowing about:
Agents are AI systems that can take multi-step actions autonomously. Instead of just generating text, an agent might search the web, read several papers, synthesize findings, and produce a report - all from a single request. Think of the difference between asking a colleague a question and handing a task to a research assistant. Agents do the latter.
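The core of an agent is a loop: the model chooses a tool, the system executes it, and the result informs the next choice. The toy sketch below makes that loop concrete; a hard-coded planner and stub tools stand in for the real model call and real tools.

```python
# A toy agent loop. The "planner" is a hard-coded stand-in for the model
# call that real agents make at each step; the tools are stubs.

def search_web(query: str) -> str:
    return f"(stub) search results for: {query}"

def summarize(text: str) -> str:
    return f"(stub) summary of: {text[:40]}..."

TOOLS = {"search_web": search_web, "summarize": summarize}

def plan_next_step(goal: str, history: list[str]) -> tuple[str, str] | None:
    """Stand-in for the LLM planning call: pick the next (tool, input), or stop."""
    if len(history) == 0:
        return ("search_web", goal)
    if len(history) == 1:
        return ("summarize", history[-1])
    return None  # goal satisfied

def run_agent(goal: str) -> list[str]:
    history: list[str] = []
    while (step := plan_next_step(goal, history)) is not None:
        tool, arg = step
        history.append(TOOLS[tool](arg))  # execute the chosen tool, record result
    return history

print(run_agent("summarize this week's oncology trial readouts"))
```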
Deep research tools take this further. Give them a complex question and they will spend minutes (not seconds) conducting a multi-step investigation: searching sources, reading content, synthesizing across documents, and producing a comprehensive report. OpenAI's Deep Research, Gemini's Deep Research, and Perplexity's research modes all offer versions of this.
Reasoning models (like OpenAI's o1 and o3) think step by step before producing an answer. They are slower but meaningfully better at complex problems that require careful logic - clinical scenarios with multiple variables, treatment planning decisions with competing priorities, or statistical reasoning.
These are not future concepts. They are available today across the major platforms, and they represent where the field is heading: AI that does not just answer questions but actually helps you complete complex workflows.
A Quick Reference Guide
| Task | Best Tool | Why Not General ChatGPT? |
|---|---|---|
| Find real medical citations | Elicit, Consensus, OpenEvidence | General models fabricate citations |
| Get current information | Perplexity, Gemini | Training data has a cutoff |
| Analyze a dataset | ChatGPT Code Interpreter, Claude Artifacts | Text-only models cannot run calculations |
| Chat with your documents | NotebookLM, Claude Projects | General models do not know your content |
| Complex multi-step research | Deep Research tools | Single-turn answers are too shallow |
| Careful clinical reasoning | Reasoning models (o1, o3) | Standard models answer too quickly |
The Bottom Line
ChatGPT is not the only AI tool, and treating it as if it were is like treating Google as the entire internet.
The landscape is broader, more specialized, and more capable than any single product. Build a toolkit. Know which tool to reach for.
And when someone says "AI cannot do X" - check whether they were just using the wrong tool for the job.
