I'm not a programmer. I learned just enough R and Python to run basic statistics and make figures. For years, every analysis meant hours on Stack Overflow, copying code I barely understood, debugging cryptic errors.
AI changed this completely. Not because it writes perfect code — it doesn't. But because it writes working code that I can iterate on. The loop is now: describe what I want, get code, run it, see error, show error to AI, get fix, repeat.
This is transformative for researchers who aren't trained programmers but need to analyze data.
That said: AI makes statistical errors that sound correct. You need to understand the concepts even if you don't write the code yourself. AI is not a substitute for statistical literacy.
For a complete overview of using AI across the research lifecycle, see the LLM Research Guide.
When AI Helps (and When It Doesn't)
AI Is Excellent For:
1. Code generation — Turn plain English into working code
2. Debugging — Paste error messages, get fixes
3. Data wrangling — Cleaning, reshaping, merging datasets
4. Visualization — Creating publication-quality plots
5. Explaining code — Understanding what code does
6. Translating between languages — R to Python, etc.
AI Is Dangerous For:
1. Statistical advice without oversight — Suggests plausible but wrong analyses
2. Interpreting clinical significance — Can't judge what matters
3. Choosing appropriate tests — Makes assumptions you need to verify
4. Sample size calculations — Gets power analysis wrong
5. Causal inference — Doesn't understand confounding
Rule: Use AI to write code that implements analyses you understand. Don't use AI to decide what analysis to run.
Tool Selection for Data Analysis
ChatGPT Plus with Code Interpreter (Best Overall)
OpenAI's Code Interpreter (now called "Advanced Data Analysis") is the best tool for most research data analysis.
Why it's good:
- Executes Python code in a sandbox
- Can upload CSV/Excel files directly
- Shows code, output, and plots
- Iterates on errors automatically
- Handles datasets up to ~100MB
Cost: $20/month (ChatGPT Plus)
When to use it: Exploratory analysis, descriptive statistics, basic modeling, visualization
Claude with Artifacts (Best for R Users)
Claude can't execute code, but its Artifacts feature displays code in a readable side pane, and it generates better R code than ChatGPT.
Why it's good:
- Better R code generation
- Cleaner code output
- Good at explaining statistical concepts
- Can analyze methods from papers
Limitation: Can't run code; you copy-paste into RStudio
When to use it: R-based analysis, understanding statistics, learning new methods
Local Execution (Most Control)
For sensitive data or complex workflows, run AI-generated code locally.
Workflow:
- Ask AI for code (using ChatGPT or Claude)
- Copy code to your environment (RStudio, Jupyter)
- Run, debug, iterate
- Paste errors back to AI for fixes
When to use it: PHI/sensitive data, large datasets, production analyses
Practical Workflow: Descriptive Statistics
Let's walk through a real example: analyzing a clinical trial dataset.
Dataset: trial_data.csv with columns: patient_id, age, sex, treatment_arm, baseline_score, followup_score, adverse_event
Step 1: Upload and Explore
Prompt to ChatGPT Code Interpreter:
I have a clinical trial dataset. Please:
1. Load the CSV
2. Show the first 10 rows
3. Describe the dataset (number of rows, columns, data types)
4. Check for missing values
5. Provide summary statistics for all numeric columns
[Upload trial_data.csv]
ChatGPT will generate Python code like:
import pandas as pd
import numpy as np
# Load data
df = pd.read_csv('trial_data.csv')
# First 10 rows
print(df.head(10))
# Dataset info
print(f"\nDataset shape: {df.shape}")
print(f"\nData types:\n{df.dtypes}")
# Missing values
print(f"\nMissing values:\n{df.isnull().sum()}")
# Summary statistics
print(f"\nSummary statistics:\n{df.describe()}")
And execute it automatically.
Step 2: Create Table 1
Prompt:
Create a Table 1 comparing baseline characteristics by treatment arm:
- Age (mean ± SD)
- Sex (n, %)
- Baseline score (mean ± SD)
Include p-values (t-test for continuous, chi-square for categorical).
Format as a clean markdown table.
ChatGPT will generate code using scipy.stats for tests and format results. The output needs review — check that tests are appropriate for your data.
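For reference, a minimal sketch of the kind of Table 1 code it tends to produce. Column names follow the trial_data.csv structure above, and the unpaired t-test and chi-square choices assume two independent arms, so verify they fit your design:
import pandas as pd
from scipy import stats
# df comes from the earlier pd.read_csv('trial_data.csv') step
# Continuous variables: mean ± SD by arm, plus an unpaired t-test
# (assumes exactly two treatment arms)
arms = df['treatment_arm'].unique()
for col in ['age', 'baseline_score']:
    groups = [df.loc[df['treatment_arm'] == arm, col].dropna() for arm in arms]
    t, p = stats.ttest_ind(*groups)
    cells = ' vs '.join(f"{g.mean():.1f} ± {g.std():.1f}" for g in groups)
    print(f"{col}: {cells}, p = {p:.3f}")
# Categorical variable: counts by arm, chi-square test
ct = pd.crosstab(df['sex'], df['treatment_arm'])
chi2, p, dof, _ = stats.chi2_contingency(ct)
print(ct)
print(f"sex: chi-square p = {p:.3f}")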
Step 3: Primary Analysis
Prompt:
Analyze the primary outcome (change in score from baseline to follow-up)
by treatment arm.
1. Calculate change score (followup - baseline) for each patient
2. Compare change scores between arms using independent t-test
3. Report: mean change ± SD for each arm, difference, 95% CI, p-value
4. Create a box plot comparing change scores by arm
Critical: Verify the analysis plan matches your protocol. AI might suggest an independent t-test when you need a Mann-Whitney U test, or ignore paired structure.
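Here's a minimal sketch of that primary analysis so you know roughly what to expect back. It assumes exactly two arms and the column names above; swap in a paired or nonparametric test if your protocol calls for one:
from scipy import stats
import matplotlib.pyplot as plt
# Change score per patient
df['change'] = df['followup_score'] - df['baseline_score']
# Independent t-test between arms (assumes exactly two arms)
a, b = [df.loc[df['treatment_arm'] == arm, 'change'].dropna()
        for arm in df['treatment_arm'].unique()]
t, p = stats.ttest_ind(a, b)
print(f"Arm 1: {a.mean():.2f} ± {a.std():.2f}; Arm 2: {b.mean():.2f} ± {b.std():.2f}")
print(f"Difference = {a.mean() - b.mean():.2f}, t = {t:.2f}, p = {p:.4f}")
# Box plot of change scores by arm
df.boxplot(column='change', by='treatment_arm')
plt.ylabel('Change in score')
plt.show()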
Step 4: Handle Missing Data
Prompt:
I have missing follow-up scores for 15 patients.
Suggest appropriate approaches for handling missingness and implement
complete case analysis (excluding patients with missing follow-up).
Then perform sensitivity analysis using multiple imputation (5 imputations).
Warning: AI will generate code, but you must verify:
- Is missingness plausibly missing-at-random (MAR) rather than informative (MNAR)?
- Are complete case analysis assumptions met?
- Are imputation model specifications appropriate?
Don't blindly trust AI recommendations on missing data.
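To make the comparison concrete, here is a hedged sketch of a complete-case run plus one simple imputation-based sensitivity check. Note that I'm using scikit-learn's IterativeImputer as a single-imputation stand-in; a genuine multiple-imputation analysis (e.g. statsmodels' MICE, pooling estimates across imputations) is the right tool for a paper and deserves a statistician's review:
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
# Complete case analysis: drop patients missing the follow-up score
complete = df.dropna(subset=['followup_score'])
print(f"Complete cases: {len(complete)} of {len(df)} patients")
# Sensitivity check: single imputation of follow-up from numeric covariates
num_cols = ['age', 'baseline_score', 'followup_score']
imputed = df.copy()
imputed[num_cols] = IterativeImputer(random_state=0).fit_transform(df[num_cols])
# Compare the headline estimate under both approaches
for label, d in [('complete case', complete), ('imputed', imputed)]:
    change = d['followup_score'] - d['baseline_score']
    print(f"{label}: mean change = {change.mean():.2f}")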
Practical Workflow: Regression Analysis
Linear Regression
Prompt:
Run a multivariable linear regression predicting follow-up score from:
- Treatment arm (reference = control)
- Age (continuous)
- Sex (reference = male)
- Baseline score (continuous)
Report:
- Coefficient estimates with 95% CI
- P-values
- R-squared
- Check model assumptions (residual plots, normality)
ChatGPT will generate:
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Prepare data; dtype=float avoids boolean dummies that statsmodels rejects
X = pd.get_dummies(df[['treatment_arm', 'age', 'sex', 'baseline_score']],
                   drop_first=True, dtype=float)
y = df['followup_score']
# Fit model
model = sm.OLS(y, sm.add_constant(X)).fit()
# Results
print(model.summary())
# Assumption checks
fig, axes = plt.subplots(1, 2, figsize=(12, 5))
# Residual plot
axes[0].scatter(model.fittedvalues, model.resid)
axes[0].axhline(y=0, color='r', linestyle='--')
axes[0].set_xlabel('Fitted values')
axes[0].set_ylabel('Residuals')
# Q-Q plot
stats.probplot(model.resid, dist="norm", plot=axes[1])
plt.tight_layout()
plt.show()
You must verify:
- Reference categories are correct
- Model assumptions are met (check the plots)
- Coefficients make clinical sense
- Sample size is adequate for number of predictors
Logistic Regression
Prompt:
Run logistic regression predicting adverse events (binary outcome) from:
- Treatment arm
- Age
- Sex
- Baseline score
Report odds ratios with 95% CI and p-values.
AI handles this well, but verify:
- Sample size (at least 10 events per predictor)
- Convergence (check for warnings)
- Odds ratios are interpretable in your clinical context
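A minimal statsmodels sketch you can compare against whatever AI hands back. The dummy coding and reference levels here are my assumptions; confirm they match your intended comparisons:
import numpy as np
import pandas as pd
import statsmodels.api as sm
# Dummy-code categoricals; dtype=float keeps statsmodels happy
X = pd.get_dummies(df[['treatment_arm', 'age', 'sex', 'baseline_score']],
                   drop_first=True, dtype=float)
y = df['adverse_event']
logit = sm.Logit(y, sm.add_constant(X)).fit()
print(logit.summary())
# Exponentiate coefficients to get odds ratios with 95% CI
or_table = np.exp(logit.conf_int())
or_table.columns = ['2.5%', '97.5%']
or_table['OR'] = np.exp(logit.params)
print(or_table)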
Cox Proportional Hazards (Survival Analysis)
Prompt:
I have survival data with columns: patient_id, treatment_arm, age, sex,
time_to_event (months), event (1=death, 0=censored).
Run Cox proportional hazards regression predicting overall survival from
treatment arm, adjusting for age and sex.
Report:
- Hazard ratios with 95% CI
- P-values
- Check proportional hazards assumption
- Create Kaplan-Meier curves by treatment arm
ChatGPT will use lifelines library:
from lifelines import CoxPHFitter, KaplanMeierFitter
import matplotlib.pyplot as plt
# Cox regression; dummy-code categoricals so lifelines gets numeric inputs
surv = pd.get_dummies(
    df[['time_to_event', 'event', 'treatment_arm', 'age', 'sex']],
    columns=['treatment_arm', 'sex'], drop_first=True, dtype=float)
cph = CoxPHFitter()
cph.fit(surv, duration_col='time_to_event', event_col='event')
print(cph.summary)
# Check proportional hazards on the same data used to fit
cph.check_assumptions(surv, p_value_threshold=0.05)
# Kaplan-Meier curves
kmf = KaplanMeierFitter()
fig, ax = plt.subplots(figsize=(10, 6))
for arm in df['treatment_arm'].unique():
    mask = df['treatment_arm'] == arm
    kmf.fit(df.loc[mask, 'time_to_event'],
            df.loc[mask, 'event'],
            label=arm)
    kmf.plot_survival_function(ax=ax)
plt.xlabel('Time (months)')
plt.ylabel('Survival probability')
plt.title('Kaplan-Meier Curves by Treatment Arm')
plt.show()
You must verify:
- Proportional hazards assumption is met (check output)
- Censoring is appropriate
- Follow-up time is adequate
Data Visualization
AI excels at creating publication-quality figures.
Forest Plots
Prompt:
Create a forest plot showing hazard ratios from a Cox regression with these
results:
Variable | HR | 95% CI Lower | 95% CI Upper | P-value
---------|----|--------------|--------------|---------
Treatment | 0.65 | 0.45 | 0.93 | 0.018
Age (per year) | 1.02 | 1.00 | 1.04 | 0.042
Male sex | 1.15 | 0.82 | 1.61 | 0.420
Use matplotlib. Make it publication-ready (clear labels, appropriate size).
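A minimal matplotlib sketch of that forest plot, hard-coding the three rows above:
import matplotlib.pyplot as plt
labels = ['Treatment', 'Age (per year)', 'Male sex']
hr = [0.65, 1.02, 1.15]
ci_lo = [0.45, 1.00, 0.82]
ci_hi = [0.93, 1.04, 1.61]
fig, ax = plt.subplots(figsize=(7, 3))
ypos = list(range(len(labels)))[::-1]  # first variable on top
err = [[h - l for h, l in zip(hr, ci_lo)],   # distance down to lower CI
       [u - h for u, h in zip(ci_hi, hr)]]   # distance up to upper CI
ax.errorbar(hr, ypos, xerr=err, fmt='s', color='black', capsize=3)
ax.axvline(x=1, linestyle='--', color='gray')  # HR = 1: no effect
ax.set_yticks(ypos)
ax.set_yticklabels(labels)
ax.set_xlabel('Hazard ratio (95% CI)')
plt.tight_layout()
plt.show()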
Box Plots, Violin Plots, Scatter Plots
Prompt:
Create a figure with three subplots:
1. Box plot of change scores by treatment arm
2. Scatter plot of baseline vs follow-up scores (colored by arm)
3. Violin plot of age distribution by treatment arm
Use seaborn. Make it publication-ready.
AI generates clean, customizable code. You can iterate:
Make the font size larger, use colorblind-friendly palette,
add individual data points to the box plot.
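For orientation, a compact seaborn sketch of that three-panel figure, again assuming the trial dataset's column names:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_palette('colorblind')  # colorblind-friendly palette
df['change'] = df['followup_score'] - df['baseline_score']
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
sns.boxplot(data=df, x='treatment_arm', y='change', ax=axes[0])
sns.stripplot(data=df, x='treatment_arm', y='change', color='black',
              size=3, ax=axes[0])  # individual points over the boxes
sns.scatterplot(data=df, x='baseline_score', y='followup_score',
                hue='treatment_arm', ax=axes[1])
sns.violinplot(data=df, x='treatment_arm', y='age', ax=axes[2])
plt.tight_layout()
plt.show()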
Common Pitfalls (and How to Avoid Them)
Pitfall 1: AI Suggests the Wrong Test
Example: You have paired data (pre/post) but AI suggests independent t-test.
Solution: Explicitly state data structure in your prompt.
Better prompt:
I have paired data (baseline and follow-up for the same patients).
Run paired t-test comparing scores.
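The paired version is a single scipy call; knowing it makes it easy to catch AI reaching for the independent test:
from scipy import stats
# Paired t-test: baseline vs follow-up for the same patients
paired = df.dropna(subset=['baseline_score', 'followup_score'])
t, p = stats.ttest_rel(paired['baseline_score'], paired['followup_score'])
print(f"Paired t = {t:.2f}, p = {p:.4f}")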
Pitfall 2: AI Ignores Multiple Comparisons
Example: Running 20 t-tests, AI reports raw p-values without correction.
Solution: Ask for correction explicitly.
Prompt:
I'm comparing 15 biomarkers between groups. Run t-tests and
apply Benjamini-Hochberg FDR correction. Report adjusted p-values.
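statsmodels has the correction built in. A minimal sketch, with hypothetical biomarker columns and group labels standing in for your own:
from scipy import stats
from statsmodels.stats.multitest import multipletests
# Hypothetical biomarker columns and group labels; replace with your own
biomarkers = [f'biomarker_{i}' for i in range(1, 16)]
pvals = []
for col in biomarkers:
    a = df.loc[df['group'] == 'treatment', col].dropna()
    b = df.loc[df['group'] == 'control', col].dropna()
    pvals.append(stats.ttest_ind(a, b).pvalue)
# Benjamini-Hochberg FDR correction
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
for col, p, r in zip(biomarkers, p_adj, reject):
    print(f"{col}: adjusted p = {p:.3f}{' (significant)' if r else ''}")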
Pitfall 3: AI Misinterprets Clinical Context
Example: Reports statistically significant but clinically meaningless difference.
Solution: You interpret clinical significance. AI reports statistics.
Pitfall 4: Code Runs But Results Are Wrong
Example: AI uses wrong reference category, flips sign on coefficient.
Solution: Sanity-check results. If the treatment effect is negative when you expected positive, verify the code.
Pitfall 5: AI Doesn't Check Assumptions
Example: Runs linear regression without checking linearity, homoscedasticity, normality.
Solution: Explicitly request assumption checks in prompt.
R vs Python: Which LLM Handles Each Better?
Python
- ChatGPT Code Interpreter: Native Python, executes in a sandbox
- Best for: pandas data manipulation, scikit-learn ML, quick iteration
R
- Claude: Better R code generation than ChatGPT
- Best for: Classical statistics, publication-quality ggplot2 figures, survival analysis
My workflow:
- Exploratory analysis: ChatGPT Code Interpreter (Python)
- Final analysis for paper: Claude-generated R code → run in RStudio
- Complex stats (mixed models, Bayesian): Consult statistician, use AI for code
When to Consult a Real Statistician
AI cannot replace statistical expertise. Consult a statistician for:
- Study design and sample size calculation
- Complex analyses (mixed models, structural equation modeling, Bayesian methods)
- Missing data strategies
- Interpreting unusual results
- Causal inference questions
- When reviewers question your analysis
AI helps you implement analyses. Statisticians help you choose the right analysis.
Debugging Workflow
When code fails:
- Copy the full error message
- Paste back to AI with prompt:
This code produced an error:
[paste code]
Error message:
[paste error]
What's wrong and how do I fix it?
- AI usually fixes it immediately
- If fix fails, iterate 2-3 times
- If still failing, Google the error or ask a colleague
AI handles 80% of debugging instantly. The remaining 20% requires human problem-solving.
Learning from AI Code
Don't just run code — understand it.
Prompt for learning:
Explain this code line by line in simple terms. What is each function doing?
[paste code]
Over time, you'll recognize patterns and understand what code does without explanation.
Example: Complete Analysis Workflow
Research question: Does intervention reduce hospital readmissions compared to usual care?
Step 1: Exploratory analysis (ChatGPT Code Interpreter)
Upload hospital_readmissions.csv
Dataset has: patient_id, age, sex, comorbidities, treatment_group,
readmitted_30d (1=yes, 0=no)
1. Summarize patient characteristics
2. Check for missing data
3. Compare baseline characteristics by treatment group (Table 1)
Step 2: Primary analysis (ChatGPT or Claude)
Run logistic regression predicting 30-day readmission from treatment group,
adjusting for age, sex, and comorbidity count.
Report odds ratio for treatment effect with 95% CI and p-value.
Step 3: Sensitivity analyses
1. Repeat analysis excluding patients with missing comorbidity data
2. Repeat with propensity score matching instead of multivariable adjustment
3. Create subgroup analyses by age (<65 vs ≥65)
Step 4: Visualizations
Create:
1. Forest plot showing OR from main and subgroup analyses
2. Bar chart showing readmission rates by group with error bars
Step 5: Verify and refine
- Check that all code ran without errors
- Verify results make clinical sense
- Iterate on figures for publication quality
- Copy final code to R/Python script for reproducibility
Total time: 2-3 hours instead of 2-3 days
Reproducibility and Documentation
Critical for reproducibility:
- Save all AI-generated code in scripts
- Comment code explaining what each section does
- Document AI tool and version used
- Include prompts in lab notebook
- Store raw outputs before any manual editing
Example documentation:
# Analysis conducted using ChatGPT Code Interpreter (GPT-4, Jan 2026)
# Prompt: "Run logistic regression predicting readmission from
# treatment, age, sex, comorbidities"
# Code edited to change reference group from intervention to control
Limitations and Warnings
- AI suggests plausible but wrong analyses — verify appropriateness
- AI doesn't understand your data structure — specify paired, clustered, hierarchical explicitly
- AI makes statistical errors confidently — don't outsource judgment
- AI can't assess clinical significance — you interpret meaning
- Code may not be optimal — working doesn't mean efficient
- Results may not be reproducible — random seeds, package versions matter
Cost Summary
| Tool | Cost | Best For | Limitation |
|------|------|----------|------------|
| ChatGPT Code Interpreter | $20/mo | Python, executes code, visualization | Max file size ~100MB |
| Claude Pro | $20/mo | R code, explaining concepts | Doesn't execute code |
| Free ChatGPT | Free | Code generation (copy-paste) | Rate limits, less powerful |
| Free Claude | Free | R code, statistics questions | Rate limits |
For most researchers: ChatGPT Plus ($20/mo) is sufficient. Add Claude Pro if you use R heavily.
Key Takeaways
- AI writes working code from plain English — transformative for non-programmers
- ChatGPT Code Interpreter is best for data analysis — executes Python, iterates automatically
- Claude is better for R code generation — but can't execute, you copy-paste
- Use AI to implement analyses you understand — not to decide which analysis to run
- Always verify statistical assumptions — AI generates code but doesn't check validity
- AI makes plausible statistical errors — don't outsource judgment to LLMs
- Debugging is fast — paste errors, get fixes, iterate
- Consult statisticians for complex analyses — AI supplements, doesn't replace expertise
- Document AI use for reproducibility — save prompts and code versions
- Cost is minimal — $20/mo ChatGPT Plus handles most workflows
