
AI for Data Analysis and Coding

Ramez Kouzy

I'm not a programmer. I learned just enough R and Python to run basic statistics and make figures. For years, every analysis meant hours on Stack Overflow, copying code I barely understood, debugging cryptic errors.

AI changed this completely. Not because it writes perfect code — it doesn't. But because it writes working code that I can iterate on. The loop is now: describe what I want, get code, run it, see error, show error to AI, get fix, repeat.

This is transformative for researchers who aren't trained programmers but need to analyze data.

That said: AI makes statistical errors that sound correct. You need to understand the concepts even if you don't write the code yourself. AI is not a substitute for statistical literacy.

For a complete overview of using AI across the research lifecycle, see the LLM Research Guide.

When AI Helps (and When It Doesn't)

AI Is Excellent For:

  1. Code generation — Turn plain English into working code
  2. Debugging — Paste error messages, get fixes
  3. Data wrangling — Cleaning, reshaping, merging datasets
  4. Visualization — Creating publication-quality plots
  5. Explaining code — Understanding what code does
  6. Translating between languages — R to Python, etc.

AI Is Dangerous For:

  1. Statistical advice without oversight — Suggests plausible but wrong analyses
  2. Interpreting clinical significance — Can't judge what matters
  3. Choosing appropriate tests — Makes assumptions you need to verify
  4. Sample size calculations — Gets power analysis wrong
  5. Causal inference — Doesn't understand confounding

Rule: Use AI to write code that implements analyses you understand. Don't use AI to decide what analysis to run.

Tool Selection for Data Analysis

ChatGPT Plus with Code Interpreter (Best Overall)

OpenAI's Code Interpreter (now called "Advanced Data Analysis") is the best tool for most research data analysis.

Why it's good:

  • Executes Python code in a sandbox
  • Can upload CSV/Excel files directly
  • Shows code, output, and plots
  • Iterates on errors automatically
  • Handles datasets up to ~100MB

Cost: $20/month (ChatGPT Plus)

When to use it: Exploratory analysis, descriptive statistics, basic modeling, visualization

Claude with Artifacts (Best for R Users)

Claude can't execute code, but its Artifacts feature displays code in a readable pane, and it generates better R code than ChatGPT.

Why it's good:

  • Better R code generation
  • Cleaner code output
  • Good at explaining statistical concepts
  • Can analyze methods from papers

Limitation: Can't run code; you copy-paste into RStudio

When to use it: R-based analysis, understanding statistics, learning new methods

Local Execution (Most Control)

For sensitive data or complex workflows, run AI-generated code locally.

Workflow:

  1. Ask AI for code (using ChatGPT or Claude)
  2. Copy code to your environment (RStudio, Jupyter)
  3. Run, debug, iterate
  4. Paste errors back to AI for fixes

When to use it: PHI/sensitive data, large datasets, production analyses

Practical Workflow: Descriptive Statistics

Let's walk through a real example: analyzing a clinical trial dataset.

Dataset: trial_data.csv with columns: patient_id, age, sex, treatment_arm, baseline_score, followup_score, adverse_event

Step 1: Upload and Explore

Prompt to ChatGPT Code Interpreter:

I have a clinical trial dataset. Please:
1. Load the CSV
2. Show the first 10 rows
3. Describe the dataset (number of rows, columns, data types)
4. Check for missing values
5. Provide summary statistics for all numeric columns

[Upload trial_data.csv]

ChatGPT will generate Python code like:

import pandas as pd
import numpy as np

# Load data
df = pd.read_csv('trial_data.csv')

# First 10 rows
print(df.head(10))

# Dataset info
print(f"\nDataset shape: {df.shape}")
print(f"\nData types:\n{df.dtypes}")

# Missing values
print(f"\nMissing values:\n{df.isnull().sum()}")

# Summary statistics
print(f"\nSummary statistics:\n{df.describe()}")

And execute it automatically.

Step 2: Create Table 1

Prompt:

Create a Table 1 comparing baseline characteristics by treatment arm:
- Age (mean ± SD)
- Sex (n, %)
- Baseline score (mean ± SD)

Include p-values (t-test for continuous, chi-square for categorical).
Format as a clean markdown table.

ChatGPT will generate code using scipy.stats for tests and format results. The output needs review — check that tests are appropriate for your data.
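
A hedged sketch of what that generated code typically looks like, assuming exactly two treatment arms and the trial_data.csv columns above:

from scipy import stats
import pandas as pd

df = pd.read_csv('trial_data.csv')
arms = df['treatment_arm'].unique()  # assumes exactly two arms
a, b = df[df['treatment_arm'] == arms[0]], df[df['treatment_arm'] == arms[1]]

# Continuous variables: mean ± SD per arm, Welch's t-test
for col in ['age', 'baseline_score']:
    t, p = stats.ttest_ind(a[col].dropna(), b[col].dropna(), equal_var=False)
    print(f"{col}: {a[col].mean():.1f} ± {a[col].std():.1f} vs "
          f"{b[col].mean():.1f} ± {b[col].std():.1f}, p={p:.3f}")

# Categorical variable: counts and chi-square test
ct = pd.crosstab(df['sex'], df['treatment_arm'])
chi2, p, dof, _ = stats.chi2_contingency(ct)
print(ct, f"chi-square p={p:.3f}", sep="\n")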

Step 3: Primary Analysis

Prompt:

Analyze the primary outcome (change in score from baseline to follow-up) 
by treatment arm.

1. Calculate change score (followup - baseline) for each patient
2. Compare change scores between arms using independent t-test
3. Report: mean change ± SD for each arm, difference, 95% CI, p-value
4. Create a box plot comparing change scores by arm

Critical: Verify the analysis plan matches your protocol. AI might suggest a t-test when you need a Mann-Whitney test, or ignore the paired structure of your data.
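
A minimal sketch of the core computation, assuming two arms and the column names above (the confidence_interval call needs scipy 1.11 or later):

from scipy import stats
import matplotlib.pyplot as plt

df['change'] = df['followup_score'] - df['baseline_score']
g1, g2 = [g['change'].dropna() for _, g in df.groupby('treatment_arm')]

# Welch's t-test on change scores; confirm this matches your protocol
res = stats.ttest_ind(g1, g2, equal_var=False)
print(f"diff = {g1.mean() - g2.mean():.2f}, p = {res.pvalue:.3f}")
print(f"95% CI: {res.confidence_interval()}")  # scipy >= 1.11

# Box plot of change scores by arm
df.boxplot(column='change', by='treatment_arm')
plt.ylabel('Change in score')
plt.show()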

Step 4: Handle Missing Data

Prompt:

I have missing follow-up scores for 15 patients. 

Suggest appropriate approaches for handling missingness and implement 
complete case analysis (excluding patients with missing follow-up).

Then perform sensitivity analysis using multiple imputation (5 imputations).

Warning: AI will generate code, but you must verify:

  • Is missingness plausibly missing at random (MAR)?
  • Are complete case analysis assumptions met?
  • Are imputation model specifications appropriate?

Don't blindly trust AI recommendations on missing data.
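
A minimal sketch under those caveats, using statsmodels' MICE. MICEData needs numeric columns, and 'control' is a placeholder for whatever your reference arm is actually called:

import statsmodels.api as sm
from statsmodels.imputation import mice

# Complete case analysis: drop patients missing follow-up
cc = df.dropna(subset=['followup_score'])
print(f"Complete cases: {len(cc)} of {len(df)}")

# Sensitivity analysis: multiple imputation with chained equations.
# 'control' is a placeholder label; encode treatment numerically.
num = df[['followup_score', 'baseline_score', 'age']].copy()
num['treatment'] = (df['treatment_arm'] != 'control').astype(float)

imp = mice.MICEData(num)
mi = mice.MICE('followup_score ~ treatment + baseline_score + age',
               sm.OLS, imp)
results = mi.fit(5, 5)  # 5 burn-in cycles, 5 imputations
print(results.summary())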

Practical Workflow: Regression Analysis

Linear Regression

Prompt:

Run a multivariable linear regression predicting follow-up score from:
- Treatment arm (reference = control)
- Age (continuous)
- Sex (reference = male)
- Baseline score (continuous)

Report:
- Coefficient estimates with 95% CI
- P-values
- R-squared
- Check model assumptions (residual plots, normality)

ChatGPT will generate:

from scipy import stats
import statsmodels.api as sm

# Prepare data (dtype=float so statsmodels accepts the dummy columns)
X = pd.get_dummies(df[['treatment_arm', 'age', 'sex', 'baseline_score']],
                   drop_first=True, dtype=float)
y = df['followup_score']

# Fit model
model = sm.OLS(y, sm.add_constant(X)).fit()

# Results
print(model.summary())

# Assumption checks
import matplotlib.pyplot as plt
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Residual plot
axes[0].scatter(model.fittedvalues, model.resid)
axes[0].axhline(y=0, color='r', linestyle='--')
axes[0].set_xlabel('Fitted values')
axes[0].set_ylabel('Residuals')

# Q-Q plot
stats.probplot(model.resid, dist="norm", plot=axes[1])
plt.tight_layout()
plt.show()

You must verify:

  • Reference categories are correct
  • Model assumptions are met (check the plots)
  • Coefficients make clinical sense
  • Sample size is adequate for number of predictors

Logistic Regression

Prompt:

Run logistic regression predicting adverse events (binary outcome) from:
- Treatment arm
- Age
- Sex
- Baseline score

Report odds ratios with 95% CI and p-values.

AI handles this well, but verify:

  • Sample size (at least 10 events per predictor)
  • Convergence (check for warnings)
  • Odds ratios are interpretable in your clinical context
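
A hedged sketch of what the generated statsmodels code usually looks like (column names from the trial dataset above; Logit is one common choice, GLM with a binomial family is another):

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Encode categorical predictors; dtype=float keeps statsmodels happy
X = pd.get_dummies(df[['treatment_arm', 'age', 'sex', 'baseline_score']],
                   drop_first=True, dtype=float)

# Fit logistic regression; watch the console for convergence warnings
model = sm.Logit(df['adverse_event'], sm.add_constant(X)).fit()

# Exponentiate coefficients and CIs to report odds ratios
ci = np.exp(model.conf_int())
print(pd.DataFrame({'OR': np.exp(model.params),
                    'CI low': ci[0], 'CI high': ci[1],
                    'p': model.pvalues}))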

Cox Proportional Hazards (Survival Analysis)

Prompt:

I have survival data with columns: patient_id, treatment_arm, age, sex, 
time_to_event (months), event (1=death, 0=censored).

Run Cox proportional hazards regression predicting overall survival from 
treatment arm, adjusting for age and sex.

Report:
- Hazard ratios with 95% CI
- P-values
- Check proportional hazards assumption
- Create Kaplan-Meier curves by treatment arm

ChatGPT will use the lifelines library:

from lifelines import CoxPHFitter, KaplanMeierFitter
import matplotlib.pyplot as plt

# Cox regression (encode categorical covariates numerically first)
cox_df = pd.get_dummies(
    df[['time_to_event', 'event', 'treatment_arm', 'age', 'sex']],
    drop_first=True, dtype=float)

cph = CoxPHFitter()
cph.fit(cox_df, duration_col='time_to_event', event_col='event')

print(cph.summary)

# Check proportional hazards assumption
cph.check_assumptions(cox_df, p_value_threshold=0.05)

# Kaplan-Meier curves
kmf = KaplanMeierFitter()
fig, ax = plt.subplots(figsize=(10, 6))

for arm in df['treatment_arm'].unique():
    mask = df['treatment_arm'] == arm
    kmf.fit(df.loc[mask, 'time_to_event'], 
            df.loc[mask, 'event'], 
            label=arm)
    kmf.plot_survival_function(ax=ax)

plt.xlabel('Time (months)')
plt.ylabel('Survival probability')
plt.title('Kaplan-Meier Curves by Treatment Arm')
plt.show()

You must verify:

  • Proportional hazards assumption is met (check output)
  • Censoring is appropriate
  • Follow-up time is adequate

Data Visualization

AI excels at creating publication-quality figures.

Forest Plots

Prompt:

Create a forest plot showing hazard ratios from a Cox regression with these 
results:

Variable | HR | 95% CI Lower | 95% CI Upper | P-value
---------|----|--------------|--------------|---------
Treatment | 0.65 | 0.45 | 0.93 | 0.018
Age (per year) | 1.02 | 1.00 | 1.04 | 0.042
Male sex | 1.15 | 0.82 | 1.61 | 0.420

Use matplotlib. Make it publication-ready (clear labels, appropriate size).
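
A sketch of one way to draw this in matplotlib, using the numbers from the prompt:

import matplotlib.pyplot as plt
import numpy as np

labels = ['Treatment', 'Age (per year)', 'Male sex']
hr = np.array([0.65, 1.02, 1.15])
lo = np.array([0.45, 1.00, 0.82])
hi = np.array([0.93, 1.04, 1.61])

y = np.arange(len(labels))[::-1]  # top-to-bottom order
fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar(hr, y, xerr=[hr - lo, hi - hr], fmt='s', color='black', capsize=3)
ax.axvline(x=1, linestyle='--', color='grey')  # line of no effect
ax.set_yticks(y)
ax.set_yticklabels(labels)
ax.set_xscale('log')  # ratios belong on a log scale
ax.set_xlabel('Hazard ratio (95% CI)')
plt.tight_layout()
plt.show()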

Box Plots, Violin Plots, Scatter Plots

Prompt:

Create a figure with three subplots:
1. Box plot of change scores by treatment arm
2. Scatter plot of baseline vs follow-up scores (colored by arm)
3. Violin plot of age distribution by treatment arm

Use seaborn. Make it publication-ready.

AI generates clean, customizable code. You can iterate:

Make the font size larger, use colorblind-friendly palette, 
add individual data points to the box plot.
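
The result tends to resemble this sketch (assuming the trial_data.csv columns from earlier, with the colorblind palette and larger fonts from the iteration request):

import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme(style='whitegrid', palette='colorblind', font_scale=1.2)
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

df['change'] = df['followup_score'] - df['baseline_score']
sns.boxplot(data=df, x='treatment_arm', y='change', ax=axes[0])
sns.stripplot(data=df, x='treatment_arm', y='change',
              color='black', size=3, ax=axes[0])  # individual points
sns.scatterplot(data=df, x='baseline_score', y='followup_score',
                hue='treatment_arm', ax=axes[1])
sns.violinplot(data=df, x='treatment_arm', y='age', ax=axes[2])
plt.tight_layout()
plt.show()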

Common Pitfalls (and How to Avoid Them)

Pitfall 1: AI Suggests the Wrong Test

Example: You have paired data (pre/post) but AI suggests independent t-test.

Solution: Explicitly state data structure in your prompt.

Better prompt:

I have paired data (baseline and follow-up for the same patients). 
Run paired t-test comparing scores.
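
The paired version is a single scipy call (column names from the trial dataset above; rows must be complete pairs):

from scipy import stats

# Paired t-test: each patient is their own control
paired = df.dropna(subset=['baseline_score', 'followup_score'])
t, p = stats.ttest_rel(paired['baseline_score'], paired['followup_score'])
print(f"t = {t:.2f}, p = {p:.3f}")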

Pitfall 2: AI Ignores Multiple Comparisons

Example: Running 20 t-tests, AI reports raw p-values without correction.

Solution: Ask for correction explicitly.

Prompt:

I'm comparing 15 biomarkers between groups. Run t-tests and 
apply Benjamini-Hochberg FDR correction. Report adjusted p-values.
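
The correction itself is a single statsmodels call; here group_a, group_b, and biomarkers are hypothetical placeholders for your two group DataFrames and marker column names:

from scipy import stats
from statsmodels.stats.multitest import multipletests

# group_a, group_b, biomarkers are placeholders for your own data
pvals = [stats.ttest_ind(group_a[m], group_b[m]).pvalue for m in biomarkers]

# Benjamini-Hochberg FDR correction
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method='fdr_bh')
print(dict(zip(biomarkers, p_adj.round(3))))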

Pitfall 3: AI Misinterprets Clinical Context

Example: Reports statistically significant but clinically meaningless difference.

Solution: You interpret clinical significance. AI reports statistics.

Pitfall 4: Code Runs But Results Are Wrong

Example: AI uses wrong reference category, flips sign on coefficient.

Solution: Sanity-check results. If treatment effect is negative when you expected positive, verify code.

Pitfall 5: AI Doesn't Check Assumptions

Example: Runs linear regression without checking linearity, homoscedasticity, normality.

Solution: Explicitly request assumption checks in prompt.

R vs Python: Which Does AI Handle Better?

Python

  • ChatGPT Code Interpreter: Native Python, executes code in a sandbox
  • Best for: pandas data manipulation, scikit-learn ML, quick iteration

R

  • Claude: Better R code generation than ChatGPT
  • Best for: Classical statistics, publication-quality ggplot2 figures, survival analysis

My workflow:

  • Exploratory analysis: ChatGPT Code Interpreter (Python)
  • Final analysis for paper: Claude-generated R code → run in RStudio
  • Complex stats (mixed models, Bayesian): Consult statistician, use AI for code

When to Consult a Real Statistician

AI cannot replace statistical expertise. Consult a statistician for:

  • Study design and sample size calculation
  • Complex analyses (mixed models, structural equation modeling, Bayesian methods)
  • Missing data strategies
  • Interpreting unusual results
  • Causal inference questions
  • When reviewers question your analysis

AI helps you implement analyses. Statisticians help you choose the right analysis.

Debugging Workflow

When code fails:

  1. Copy the full error message
  2. Paste back to AI with prompt:
This code produced an error:

[paste code]

Error message:
[paste error]

What's wrong and how do I fix it?
  3. AI usually fixes it immediately
  4. If the fix fails, iterate 2-3 times
  5. If it's still failing, Google the error or ask a colleague

In my experience, AI handles about 80% of debugging instantly. The remaining 20% requires human problem-solving.

Learning from AI Code

Don't just run code — understand it.

Prompt for learning:

Explain this code line by line in simple terms. What is each function doing?

[paste code]

Over time, you'll recognize patterns and understand what code does without explanation.

Example: Complete Analysis Workflow

Research question: Does intervention reduce hospital readmissions compared to usual care?

Step 1: Exploratory analysis (ChatGPT Code Interpreter)

Upload hospital_readmissions.csv

Dataset has: patient_id, age, sex, comorbidities, treatment_group, 
readmitted_30d (1=yes, 0=no)

1. Summarize patient characteristics
2. Check for missing data
3. Compare baseline characteristics by treatment group (Table 1)

Step 2: Primary analysis (ChatGPT or Claude)

Run logistic regression predicting 30-day readmission from treatment group, 
adjusting for age, sex, and comorbidity count.

Report odds ratio for treatment effect with 95% CI and p-value.

Step 3: Sensitivity analyses

1. Repeat analysis excluding patients with missing comorbidity data
2. Repeat with propensity score matching instead of multivariable adjustment
3. Create subgroup analyses by age (<65 vs ≥65)

Step 4: Visualizations

Create:
1. Forest plot showing OR from main and subgroup analyses
2. Bar chart showing readmission rates by group with error bars

Step 5: Verify and refine

  • Check that all code ran without errors
  • Verify results make clinical sense
  • Iterate on figures for publication quality
  • Copy final code to R/Python script for reproducibility

Total time: 2-3 hours instead of 2-3 days

Reproducibility and Documentation

Critical for reproducibility:

  1. Save all AI-generated code in scripts
  2. Comment code explaining what each section does
  3. Document AI tool and version used
  4. Include prompts in lab notebook
  5. Store raw outputs before any manual editing

Example documentation:

# Analysis conducted using ChatGPT Code Interpreter (GPT-4, Jan 2026)
# Prompt: "Run logistic regression predicting readmission from 
# treatment, age, sex, comorbidities"
# Code edited to change reference group from intervention to control

Limitations and Warnings

  1. AI suggests plausible but wrong analyses — verify appropriateness
  2. AI doesn't understand your data structure — specify paired, clustered, hierarchical explicitly
  3. AI makes statistical errors confidently — don't outsource judgment
  4. AI can't assess clinical significance — you interpret meaning
  5. Code may not be optimal — working code isn't necessarily efficient
  6. Results may not be reproducible — random seeds, package versions matter
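
On that last point, a minimal habit worth adopting at the top of every analysis script:

import random

import numpy as np
import pandas as pd

# Fix seeds and record package versions alongside your results
random.seed(42)
np.random.seed(42)
print(f"numpy {np.__version__}, pandas {pd.__version__}")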

Cost Summary

| Tool | Cost | Best For | Limitation |
|------|------|----------|------------|
| ChatGPT Code Interpreter | $20/mo | Python, executes code, visualization | Max file size ~100MB |
| Claude Pro | $20/mo | R code, explaining concepts | Doesn't execute code |
| Free ChatGPT | Free | Code generation (copy-paste) | Rate limits, less powerful |
| Free Claude | Free | R code, statistics questions | Rate limits |

For most researchers: ChatGPT Plus ($20/mo) is sufficient. Add Claude Pro if you use R heavily.


Key Takeaways

  • AI writes working code from plain English — transformative for non-programmers
  • ChatGPT Code Interpreter is best for data analysis — executes Python, iterates automatically
  • Claude is better for R code generation — but can't execute, you copy-paste
  • Use AI to implement analyses you understand — not to decide which analysis to run
  • Always verify statistical assumptions — AI generates code but doesn't check validity
  • AI makes plausible statistical errors — don't outsource judgment to LLMs
  • Debugging is fast — paste errors, get fixes, iterate
  • Consult statisticians for complex analyses — AI supplements, doesn't replace expertise
  • Document AI use for reproducibility — save prompts and code versions
  • Cost is minimal — $20/mo ChatGPT Plus handles most workflows
