
When Should I NOT Use ChatGPT? Matching AI Tools to Tasks

Ramez Kouzy, MD · 6 min read

What you'll learn

  • Why general-purpose models are not enough
  • Evidence tools: OpenEvidence, Consensus, Elicit
  • Search-grounded tools: Perplexity, Gemini
  • Code-capable tools: ChatGPT Code Interpreter, Claude Artifacts
  • The emerging frontier: agents, deep research, reasoning models

The Hammer Problem

When you first discover how capable general-purpose AI models are, it is tempting to use them for everything. Need a citation? Ask ChatGPT. Need to find a paper? Ask ChatGPT. Need to analyze data? Ask ChatGPT. Need to check if a drug interaction exists? Ask ChatGPT.

This is the equivalent of using a hammer for every task in the clinic. Sometimes you need a scalpel. Sometimes you need an Allen wrench.

The AI landscape is much bigger than one chatbot, and knowing which tool to reach for is the difference between frustration and efficiency.


What "General Purpose" Actually Means

ChatGPT, Claude, and Gemini are general-purpose language models. They were trained on broad swaths of the internet and can handle a remarkable range of tasks: writing, reasoning, coding, summarization, brainstorming, translation, and more.

But "general purpose" means exactly what it sounds like: good at many things, specialized in nothing.

It is the difference between an internist and a subspecialist. The internist can handle most problems competently, but for a complex brachytherapy plan, you want the disease-site expert.

The AI ecosystem now has specialists. Here is how to map your tasks to the right tool.

The Rule

If your task depends on real data (papers, web results, calculations), do not ask a general model to improvise. Use a tool that actually accesses that data.


If You Want Medical Evidence: Use Evidence Tools

This is where the 2023 citation disasters came from. People asked general models to retrieve specific medical literature, and the models - which do not have access to PubMed and cannot look up real papers - generated plausible-sounding but fake citations.

| Tool | What It Does | Best Use Case |
| --- | --- | --- |
| OpenEvidence | AI answers grounded in medical literature and clinical guidelines | Evidence-based clinical questions with real citations |
| Consensus | Searches published scientific papers and synthesizes findings with AI | Questions like "What does the evidence say about X?" |
| Elicit | AI-powered research assistant for literature search and synthesis | Literature reviews, extracting data from papers |
| Semantic Scholar | AI-powered academic search from the Allen Institute | Finding relevant papers, understanding citation networks |

The rule: If you need real citations to real papers, use a tool that searches real databases. Do not ask a general model to make them up for you.
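Most of these tools run in the browser, but some are scriptable too. Semantic Scholar, for instance, exposes a free public Graph API. Here is a minimal sketch of a programmatic paper search (the query string is just an example):

```python
import requests

# Semantic Scholar's public Graph API: keyword search over real papers.
# No API key is required for light use, though one raises rate limits.
resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={
        "query": "hypofractionated radiotherapy prostate cancer",
        "fields": "title,year,externalIds,citationCount",
        "limit": 5,
    },
    timeout=30,
)
resp.raise_for_status()

# Each result is a real, verifiable paper, not a generated citation.
for paper in resp.json().get("data", []):
    doi = paper.get("externalIds", {}).get("DOI")
    print(paper["year"], paper["title"], doi)
```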


If You Want to Search the Web: Use Search-Grounded Tools

General models like base ChatGPT and Claude are trained on data with a cutoff date. They do not know what happened last week. If you ask about a trial that was just published, they will either confess ignorance or, worse, confabulate an answer.

For current information:

Gemini - Has built-in Google Search integration. When you ask a factual question, it can pull from live web results. This makes it substantially more reliable for recent information.

Perplexity - An AI-powered search engine that synthesizes results from multiple web sources and provides citations to the actual pages it found. Think of it as Google Search with an AI brain. For factual, current questions, this is often the best first stop.

ChatGPT with browsing - When enabled, ChatGPT can search the web and provide sourced answers. Not as seamless as Perplexity or Gemini's native integration, but functional.

The rule: If your question depends on current information, use a tool with web access. Do not trust training data for anything time-sensitive.
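For those who script, Perplexity also exposes an API that follows the OpenAI chat-completions convention. A minimal sketch, with the caveat that the model name ("sonar") and the environment variable are assumptions to check against Perplexity's current documentation:

```python
import os
from openai import OpenAI  # pip install openai

# Perplexity's API follows the OpenAI chat-completions convention,
# so the standard client works with a swapped base URL.
# The model name and env var name are assumptions; verify in the docs.
client = OpenAI(
    api_key=os.environ["PPLX_API_KEY"],
    base_url="https://api.perplexity.ai",
)

resp = client.chat.completions.create(
    model="sonar",
    messages=[{
        "role": "user",
        "content": "What radiotherapy trials were published this month?",
    }],
)

# Unlike a base model, the answer is grounded in live web results.
print(resp.choices[0].message.content)
```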


If You Want to Write or Run Code: Use Code-Capable Tools

Some models can not only write code but actually execute it. This is enormously useful for data analysis.

ChatGPT with Code Interpreter - Upload a CSV or Excel file, ask it to analyze your data, and it writes and runs Python code to produce statistics, plots, and summaries. You do not need to know Python. You describe what you want in plain language, and it handles the code.

Claude with Artifacts - Claude can create interactive documents, charts, and code that you can preview directly in the interface. Particularly good for quick visualizations and exploratory analysis.

The rule: If you need to analyze data, generate plots, or run statistical tests, use a model with code execution capability. Asking a text-only model to "calculate the p-value" will get you a made-up number.
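To make "code execution" concrete: when you ask a code-capable tool for a p-value, behind the scenes it writes and runs something like the following (a minimal sketch; the file and column names are hypothetical):

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset: one row per patient, a treatment-arm column
# and a continuous outcome. File and column names are placeholders.
df = pd.read_csv("toxicity_scores.csv")
arm_a = df.loc[df["arm"] == "A", "toxicity_score"]
arm_b = df.loc[df["arm"] == "B", "toxicity_score"]

# An actual two-sample t-test, computed from the data,
# not a number pattern-matched from training text.
t_stat, p_value = stats.ttest_ind(arm_a, arm_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```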


If You Want to Chat With Your Own Documents: Use Grounded Tools

General models do not know the contents of your personal notes, your institution's protocols, or that PDF you downloaded yesterday. But some tools let you upload your own documents and chat exclusively with that content.

NotebookLM - Google's tool that lets you upload documents and ask questions grounded entirely in your uploaded content. We use this extensively and have a dedicated article on our workflow.

Claude Projects - You can upload files to a Claude Project and the model will reference them in its responses. Good for ongoing work where you need the model to know your specific context.

The rule: If you want AI to reference YOUR materials rather than its general training, use a tool designed for document-grounded conversation.
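Under the hood, these tools rely on retrieval: pull the passages from your documents that are most relevant to the question, then have the model answer only from those passages. A toy illustration of the retrieval step, with hypothetical snippets (real tools use semantic embeddings rather than simple word overlap):

```python
# Toy document-grounded retrieval: score each snippet of "your" documents
# by word overlap with the question, then keep the best matches.
# Real tools embed text as vectors; the grounding principle is the same.
protocol_snippets = [
    "Institutional protocol: hold anticoagulation 5 days before brachytherapy.",
    "Follow-up imaging is scheduled at 3 months post-treatment.",
    "Fiducial placement requires a pre-procedure platelet count.",
]

def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    scored = sorted(snippets, key=lambda s: -len(q_words & set(s.lower().split())))
    return scored[:k]

context = retrieve("When do we stop anticoagulation before brachytherapy?",
                   protocol_snippets)
# Only the retrieved context is passed to the model, so answers stay
# grounded in YOUR documents rather than in general training data.
print(context)
```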


The Emerging Frontier

The tool landscape is expanding rapidly into new categories worth knowing about:

Agents are AI systems that can take multi-step actions autonomously. Instead of just generating text, an agent might search the web, read several papers, synthesize findings, and produce a report - all from a single request. Think of the difference between asking a colleague a question and handing a task to a research assistant. Agents do the latter.
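Structurally, an agent is a loop: the model proposes an action, a tool executes it, and the result feeds back in until the task is done. A schematic sketch with a stubbed-in planner and tools (every name here is hypothetical):

```python
# Schematic agent loop. In a real agent, plan_next_step is a model call
# and the tools hit real services; here both are simple stand-ins.
def search_web(query: str) -> str:
    return f"[search results for: {query}]"

def read_paper(ref: str) -> str:
    return f"[full text of: {ref}]"

TOOLS = {"search_web": search_web, "read_paper": read_paper}

def plan_next_step(goal: str, history: list) -> tuple[str, str] | None:
    # Stand-in planner: a scripted plan instead of a model's decision.
    script = [("search_web", goal), ("read_paper", "top result")]
    return script[len(history)] if len(history) < len(script) else None

goal = "summarize recent evidence on X"
history: list[tuple[str, str]] = []
while (step := plan_next_step(goal, history)) is not None:
    tool, arg = step
    observation = TOOLS[tool](arg)  # act, then feed the result back in
    history.append((tool, observation))

print(f"Report assembled from {len(history)} tool calls.")
```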

Deep research tools take this further. Give them a complex question and they will spend minutes (not seconds) conducting a multi-step investigation: searching sources, reading content, synthesizing across documents, and producing a comprehensive report. OpenAI's Deep Research, Gemini's Deep Research, and Perplexity's research modes all offer versions of this.

Reasoning models (like OpenAI's o1 and o3) think step by step before producing an answer. They are slower but meaningfully better at complex problems that require careful logic - clinical scenarios with multiple variables, treatment planning decisions with competing priorities, or statistical reasoning.

These are not future concepts. They are available today across the major platforms, and they represent where the field is heading: AI that does not just answer questions but actually helps you complete complex workflows.


A Quick Reference Guide

| Task | Best Tool | Why Not General ChatGPT? |
| --- | --- | --- |
| Find real medical citations | Elicit, Consensus, OpenEvidence | General models fabricate citations |
| Get current information | Perplexity, Gemini | Training data has a cutoff |
| Analyze a dataset | ChatGPT Code Interpreter, Claude Artifacts | Text-only models cannot run calculations |
| Chat with your documents | NotebookLM, Claude Projects | General models do not know your content |
| Complex multi-step research | Deep Research tools | Single-turn answers are too shallow |
| Careful clinical reasoning | Reasoning models (o1, o3) | Standard models answer too quickly |

The Bottom Line

ChatGPT is not the only AI tool, and treating it that way is like treating Google as the entire internet.

The landscape is broader, more specialized, and more capable than any single product. Build a toolkit. Know which tool to reach for.

And when someone says "AI cannot do X" - check whether they were just using the wrong tool for the job.
