
When Should I NOT Use ChatGPT? Matching AI Tools to Tasks

Ramez Kouzy, MD · 6 min read

What you'll learn

  • Why general-purpose models are not enough
  • Evidence tools: OpenEvidence, Consensus, Elicit
  • Search-grounded tools: Perplexity, Gemini
  • Code-capable tools: ChatGPT Code Interpreter, Claude Artifacts
  • The emerging frontier: agents, deep research, reasoning models

The Hammer Problem

When you first discover how capable general-purpose AI models are, it is tempting to use them for everything. Need a citation? Ask ChatGPT. Need to find a paper? Ask ChatGPT. Need to analyze data? Ask ChatGPT. Need to check if a drug interaction exists? Ask ChatGPT.

This is the equivalent of using a hammer for every task in the clinic. Sometimes you need a scalpel. Sometimes you need an Allen wrench.

The AI landscape is much bigger than one chatbot, and knowing which tool to reach for is the difference between frustration and efficiency.


What "General Purpose" Actually Means

ChatGPT, Claude, and Gemini are general-purpose language models. They were trained on broad swaths of the internet and can handle a remarkable range of tasks: writing, reasoning, coding, summarization, brainstorming, translation, and more.

But "general purpose" means exactly what it sounds like: good at many things, specialized in nothing.

It is the difference between an internist and a subspecialist. The internist can handle most problems competently, but for a complex brachytherapy plan, you want the disease-site expert.

The AI ecosystem now has specialists. Here is how to map your tasks to the right tool.

The Rule

If your task depends on real data (papers, web results, calculations), do not ask a general model to improvise. Use a tool that actually accesses that data.


If You Want Medical Evidence: Use Evidence Tools

This is where the 2023 citation disasters came from. People asked general models to retrieve specific medical literature, and the models - which do not have access to PubMed and cannot look up real papers - generated plausible-sounding but fake citations.

| Tool | What It Does | Best Use Case |
| --- | --- | --- |
| OpenEvidence | AI answers grounded in medical literature and clinical guidelines | Evidence-based clinical questions with real citations |
| Consensus | Searches published scientific papers and synthesizes findings with AI | Questions like "What does the evidence say about X?" |
| Elicit | AI-powered research assistant for literature search and synthesis | Literature reviews, extracting data from papers |
| Semantic Scholar | AI-powered academic search from the Allen Institute | Finding relevant papers, understanding citation networks |

The rule: If you need real citations to real papers, use a tool that searches real databases. Do not ask a general model to make them up for you.
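Most of these tools run in the browser, but some are scriptable too. Semantic Scholar, for instance, exposes a free public Graph API. Here is a minimal sketch of a programmatic paper search (the query string is just an example):

```python
import requests

# Semantic Scholar's public Graph API: keyword search over real papers.
# No API key is required for light use, though one raises rate limits.
resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={
        "query": "hypofractionated radiotherapy prostate cancer",
        "fields": "title,year,externalIds,citationCount",
        "limit": 5,
    },
    timeout=30,
)
resp.raise_for_status()

# Each result is a real, verifiable paper, not a generated citation.
for paper in resp.json().get("data", []):
    doi = paper.get("externalIds", {}).get("DOI")
    print(paper["year"], paper["title"], doi)
```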


If You Want to Search the Web: Use Search-Grounded Tools

General models like base ChatGPT and Claude are trained on data with a cutoff date. They do not know what happened last week. If you ask about a trial that was just published, they will either confess ignorance or, worse, confabulate an answer.

For current information:

Gemini - Has built-in Google Search integration. When you ask a factual question, it can pull from live web results. This makes it substantially more reliable for recent information.

Perplexity - An AI-powered search engine that synthesizes results from multiple web sources and provides citations to the actual pages it found. Think of it as Google Search with an AI brain. For factual, current questions, this is often the best first stop.

ChatGPT with browsing - When enabled, ChatGPT can search the web and provide sourced answers. Not as seamless as Perplexity or Gemini's native integration, but functional.

The rule: If your question depends on current information, use a tool with web access. Do not trust training data for anything time-sensitive.
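For those who script, Perplexity also exposes an API that follows the OpenAI chat-completions convention. A minimal sketch, with the caveat that the model name ("sonar") and the environment variable are assumptions to check against Perplexity's current documentation:

```python
import os
from openai import OpenAI  # pip install openai

# Perplexity's API follows the OpenAI chat-completions convention,
# so the standard client works with a swapped base URL.
# The model name and env var name are assumptions; verify in the docs.
client = OpenAI(
    api_key=os.environ["PPLX_API_KEY"],
    base_url="https://api.perplexity.ai",
)

resp = client.chat.completions.create(
    model="sonar",
    messages=[{
        "role": "user",
        "content": "What radiotherapy trials were published this month?",
    }],
)

# Unlike a base model, the answer is grounded in live web results.
print(resp.choices[0].message.content)
```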


If You Want to Write or Run Code: Use Code-Capable Tools

Some models can not only write code but actually execute it. This is enormously useful for data analysis.

ChatGPT with Code Interpreter - Upload a CSV or Excel file, ask it to analyze your data, and it writes and runs Python code to produce statistics, plots, and summaries. You do not need to know Python. You describe what you want in plain language, and it handles the code.

Claude with Artifacts - Claude can create interactive documents, charts, and code that you can preview directly in the interface. Particularly good for quick visualizations and exploratory analysis.

The rule: If you need to analyze data, generate plots, or run statistical tests, use a model with code execution capability. Asking a text-only model to "calculate the p-value" will get you a made-up number.
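To make "code execution" concrete: when you ask a code-capable tool for a p-value, behind the scenes it writes and runs something like the following (a minimal sketch; the file and column names are hypothetical):

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset: one row per patient, a treatment-arm column
# and a continuous outcome. File and column names are placeholders.
df = pd.read_csv("toxicity_scores.csv")
arm_a = df.loc[df["arm"] == "A", "toxicity_score"]
arm_b = df.loc[df["arm"] == "B", "toxicity_score"]

# An actual two-sample t-test, computed from the data,
# not a number pattern-matched from training text.
t_stat, p_value = stats.ttest_ind(arm_a, arm_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```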


If You Want to Chat With Your Own Documents: Use Grounded Tools

General models do not know the contents of your personal notes, your institution's protocols, or that PDF you downloaded yesterday. But some tools let you upload your own documents and chat exclusively with that content.

NotebookLM - Google's tool that lets you upload documents and ask questions grounded entirely in your uploaded content. We use this extensively and have a dedicated article on our workflow.

Claude Projects - You can upload files to a Claude Project and the model will reference them in its responses. Good for ongoing work where you need the model to know your specific context.

The rule: If you want AI to reference YOUR materials rather than its general training, use a tool designed for document-grounded conversation.
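Under the hood, these tools rely on retrieval: pull the passages from your documents that are most relevant to the question, then have the model answer only from those passages. A toy illustration of the retrieval step, with hypothetical snippets (real tools use semantic embeddings rather than simple word overlap):

```python
# Toy document-grounded retrieval: score each snippet of "your" documents
# by word overlap with the question, then keep the best matches.
# Real tools embed text as vectors; the grounding principle is the same.
protocol_snippets = [
    "Institutional protocol: hold anticoagulation 5 days before brachytherapy.",
    "Follow-up imaging is scheduled at 3 months post-treatment.",
    "Fiducial placement requires a pre-procedure platelet count.",
]

def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    q_words = set(question.lower().split())
    scored = sorted(snippets, key=lambda s: -len(q_words & set(s.lower().split())))
    return scored[:k]

context = retrieve("When do we stop anticoagulation before brachytherapy?",
                   protocol_snippets)
# Only the retrieved context is passed to the model, so answers stay
# grounded in YOUR documents rather than in general training data.
print(context)
```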


The Emerging Frontier

The tool landscape is expanding rapidly into new categories worth knowing about:

Agents are AI systems that can take multi-step actions autonomously. Instead of just generating text, an agent might search the web, read several papers, synthesize findings, and produce a report - all from a single request. Think of the difference between asking a colleague a question and handing a task to a research assistant. Agents do the latter.
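Structurally, an agent is a loop: the model proposes an action, a tool executes it, and the result feeds back in until the task is done. A schematic sketch with a stubbed-in planner and tools (every name here is hypothetical):

```python
# Schematic agent loop. In a real agent, plan_next_step is a model call
# and the tools hit real services; here both are simple stand-ins.
def search_web(query: str) -> str:
    return f"[search results for: {query}]"

def read_paper(ref: str) -> str:
    return f"[full text of: {ref}]"

TOOLS = {"search_web": search_web, "read_paper": read_paper}

def plan_next_step(goal: str, history: list) -> tuple[str, str] | None:
    # Stand-in planner: a scripted plan instead of a model's decision.
    script = [("search_web", goal), ("read_paper", "top result")]
    return script[len(history)] if len(history) < len(script) else None

goal = "summarize recent evidence on X"
history: list[tuple[str, str]] = []
while (step := plan_next_step(goal, history)) is not None:
    tool, arg = step
    observation = TOOLS[tool](arg)  # act, then feed the result back in
    history.append((tool, observation))

print(f"Report assembled from {len(history)} tool calls.")
```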

Deep research tools take this further. Give them a complex question and they will spend minutes (not seconds) conducting a multi-step investigation: searching sources, reading content, synthesizing across documents, and producing a comprehensive report. OpenAI's Deep Research, Gemini's Deep Research, and Perplexity's research modes all offer versions of this.

Reasoning models (like OpenAI's o1 and o3) think step by step before producing an answer. They are slower but meaningfully better at complex problems that require careful logic - clinical scenarios with multiple variables, treatment planning decisions with competing priorities, or statistical reasoning.

These are not future concepts. They are available today across the major platforms, and they represent where the field is heading: AI that does not just answer questions but actually helps you complete complex workflows.


A Quick Reference Guide

| Task | Best Tool | Why Not General ChatGPT? |
| --- | --- | --- |
| Find real medical citations | Elicit, Consensus, OpenEvidence | General models fabricate citations |
| Get current information | Perplexity, Gemini | Training data has a cutoff |
| Analyze a dataset | ChatGPT Code Interpreter, Claude Artifacts | Text-only models cannot run calculations |
| Chat with your documents | NotebookLM, Claude Projects | General models do not know your content |
| Complex multi-step research | Deep Research tools | Single-turn answers are too shallow |
| Careful clinical reasoning | Reasoning models (o1, o3) | Standard models answer too quickly |

The Bottom Line

ChatGPT is not the only AI tool, and treating it that way is like treating Google as the entire internet.

The landscape is broader, more specialized, and more capable than any single product. Build a toolkit. Know which tool to reach for.

And when someone says "AI cannot do X" - check whether they were just using the wrong tool for the job.
