Focused Tutorials
Table of contents
- Overview
- Tutorial 1: Introduction to NLP in Medicine
- Tutorial 2: Generative AI in Medicine
- Tutorial 3: Medical Image Analysis
- Tutorial 4: Supervised Fine-Tuning and Reinforcement Learning for LLMs
- Tutorial 5: Radiology Report Generation with Multimodal LLMs
- Tutorial 6: Multimodal Learning with EHRs (Clinical Events + Notes, Optional Imaging)
Overview
In these tutorials, you will apply AI methods to healthcare problems using open-source datasets. You will build models, evaluate performance, and visualize results.
Each tutorial is designed to be easy to follow so you can start running the code quickly. The tutorials cover core areas such as natural language processing, medical image analysis, graph neural networks, generative models, LLMs, and biological and clinical foundation models. They complement the lecture material by showing how these methods behave on real data, including where they work well and where they fail.
Working with real-world datasets will strengthen your coding and data analysis skills and help you practice model evaluation. The tutorials also support the course project and provide a practical baseline you can reuse in future research.
Tutorial 1: Introduction to NLP in Medicine
Speaker
Datasets
- Clinical notes dataset (e.g., MIMIC-III or de-identified open-source clinical datasets).
Tasks
- Preprocessing Medical Text Data
- Tokenize text into sentences and words using NLP libraries like spaCy or NLTK.
- Normalize text by handling abbreviations, medical jargon, and common typos.
- Annotate datasets with relevant medical entities using Named Entity Recognition (NER).
- Fine-Tuning Pre-trained Transformers
- Use Hugging Face Transformers to fine-tune ClinicalBERT or similar models.
- Train the model on specific clinical tasks like disease classification or drug identification.
- Parameter-Efficient Fine-Tuning with LoRA (see the sketch after this task list)
- Apply Low-Rank Adaptation (LoRA) to reduce computational requirements.
- Compare model performance between full fine-tuning and LoRA.
- Text Generation with LLMs
- Use GPT-based models to summarize medical notes or match patients to clinical trials.
- Experiment with controlling output length and quality using temperature and sampling parameters.
- Advanced NLP Applications (Stretch Task)
- Implement a pipeline for de-identifying clinical notes using trained NER models.
- Use a GPT model to answer medical questions based on clinical scenarios.
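To ground the LoRA task, here is a minimal sketch, assuming Hugging Face's transformers and peft libraries; the checkpoint name, rank, and target modules are illustrative choices, not course requirements.

```python
# Minimal LoRA sketch: wrap a clinical BERT checkpoint with low-rank adapters
# for binary classification. Assumes: pip install transformers peft
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Bio_ClinicalBERT is one public clinical checkpoint; any BERT-style model works.
model = AutoModelForSequenceClassification.from_pretrained(
    "emilyalsentzer/Bio_ClinicalBERT", num_labels=2
)

lora = LoraConfig(
    r=8,                                # adapter rank (illustrative)
    lora_alpha=16,
    target_modules=["query", "value"],  # BERT attention projections
    lora_dropout=0.1,
    task_type="SEQ_CLS",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()      # prints the small trainable fraction
```

Comparing the printed trainable-parameter count against the full model is a quick way to quantify the efficiency gap the full fine-tuning vs. LoRA comparison asks about.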
Skills Developed
- Working knowledge of medical text preprocessing techniques.
- Practical experience fine-tuning transformers for clinical NLP tasks.
- Hands-on introduction to efficient NLP model training.
Tutorial 2: Generative AI in Medicine
Speaker
Datasets
- Medical imaging datasets (e.g., CheXpert or the Medical Segmentation Decathlon).
Tasks
- Building Variational Autoencoders (VAEs)
- Implement a VAE in PyTorch to generate synthetic medical images (see the sketch after this task list).
- Visualize the learned latent space to understand its representation of data variability.
- Building Generative Adversarial Networks (GANs)
- Train a GAN to generate synthetic medical images (e.g., X-rays or MRI scans).
- Evaluate the quality of generated images using Fréchet Inception Distance (FID).
- Exploring Generative Models for Text
- Use a GPT model to generate synthetic clinical records.
- Compare synthetic records to real notes and evaluate semantic coherence.
- Understanding Data Privacy in Generative AI
- Experiment with techniques to assess privacy risks, such as membership inference attacks.
- Discuss how synthetic data can mitigate privacy concerns in healthcare.
- Model Comparison (Stretch Task)
- Compare VAEs and GANs for their ability to generate realistic medical data.
- Explore how generated data can be used to augment training datasets for downstream tasks.
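To get started on the VAE task, here is a minimal sketch, assuming 64×64 single-channel images scaled to [0, 1]; the layer sizes and latent dimension are illustrative.

```python
# Minimal VAE sketch in PyTorch for single-channel 64x64 images in [0, 1].
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 512), nn.ReLU())
        self.mu = nn.Linear(512, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(512, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, 64 * 64), nn.Sigmoid())

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z).view_as(x), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # ELBO: reconstruction term plus KL divergence to the standard normal prior.
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld
```

Decoding a grid of latent z values (for example, sweeping two latent dimensions) is a simple way to visualize what the learned latent space encodes.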
Skills Developed
- Building and evaluating generative models for healthcare applications.
- Understanding privacy implications and synthetic data generation.
Tutorial 3: Medical Image Analysis
Speaker
Datasets
- Datasets from the Medical Segmentation Decathlon (e.g., liver or brain scans).
Tasks
- Image Preprocessing
- Load, resize, and normalize medical images using Python libraries like Pillow or OpenCV.
- Visualize medical images and perform basic exploratory analysis.
- Image Classification and Regression
- Train a CNN to classify medical images into diagnostic categories (e.g., tumor or non-tumor).
- Perform regression analysis to predict risk scores based on image features.
- Image Segmentation with U-Net
- Build and train a U-Net model for segmenting anatomical regions in medical images.
- Evaluate segmentation performance using the Dice coefficient and Jaccard index (see the sketch after this task list).
- Experimentation and Model Tracking
- Use Weights & Biases to log model performance and compare experiments.
- Optimize model performance through hyperparameter tuning.
- Advanced Segmentation Tasks (Stretch Task)
- Perform multi-class segmentation for complex medical images (e.g., segmenting organs and tumors).
- Explore using pre-trained models for transfer learning in segmentation tasks.
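For the segmentation-evaluation step, here is a minimal sketch of both metrics on binary masks; a real pipeline would average over classes and batch elements.

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # pred and target are binary {0, 1} masks of the same shape.
    intersection = (pred * target).sum()
    return (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

def jaccard_index(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    return (intersection + eps) / (union + eps)

# Sanity check: identical masks should give Dice = Jaccard = 1.
mask = torch.tensor([[0.0, 1.0], [1.0, 1.0]])
print(dice_coefficient(mask, mask).item(), jaccard_index(mask, mask).item())
```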
Skills Developed
- Understanding medical image preprocessing, classification, and segmentation.
- Hands-on experience with CNNs and U-Net architectures.
Tutorial 4: Supervised Fine-Tuning and Reinforcement Learning for LLMs
Speaker
Description
This tutorial introduces fine-tuning of large language models for medical applications through hands-on exercises. Students will learn how to fine-tune models using full-parameter approaches, including reinforcement learning methods such as GRPO, PPO, and DAPO, as well as parameter-efficient techniques such as LoRA. The tutorial covers both the underlying principles and practical implementation, with an emphasis on code-level workflows.
In addition to methods, the tutorial reviews recent research on large language models in medical AI and ends with a discussion of open challenges and directions for future work.
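As a taste of the code-level workflow, here is a minimal supervised fine-tuning sketch using the TRL library; the toy dataset, base model, and hyperparameters are placeholders, and argument names can differ between TRL versions.

```python
# Minimal SFT sketch with TRL. Assumes: pip install trl datasets
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

# A toy instruction-response pair stands in for a real medical SFT corpus.
train = Dataset.from_list([
    {"text": "Question: What does BP stand for? Answer: Blood pressure."},
])

trainer = SFTTrainer(
    model="gpt2",  # placeholder base model; a medical LLM would be used in practice
    train_dataset=train,
    args=SFTConfig(output_dir="sft-demo", max_steps=10),
)
trainer.train()
```

The reinforcement learning methods covered in the tutorial (PPO, GRPO, DAPO) typically start from a checkpoint produced by a step like this and then optimize against a reward signal.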
Tutorial 5: Radiology Report Generation with Multimodal LLMs
Speaker
Datasets
- Paired medical image–report datasets (e.g., chest X-ray datasets with free-text radiology reports).
Tasks
- Understanding the Report Generation Task
- Overview of radiology report structure (Findings vs. Impression)
- Common failure modes: omissions, hallucinations
- Why report generation differs from generic image captioning
- Baseline Models for Report Generation
- Implement an image-to-text model using a vision encoder and LLM-based decoder
- Using Hugging Face Transformers
- Learn the Hugging Face Transformers library for model loading, training, and inference
- Fine-tune pre-trained Transformer decoders for medical report generation
- Evaluation of Generated Reports
- Apply standard NLP metrics (BLEU, ROUGE, BERTScore; see the sketch after this task list)
- Analyze qualitative examples where automated metrics fail to reflect clinical quality
- Introduce impression-level and finding-level evaluation concepts
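Here is a minimal sketch of scoring generated reports with off-the-shelf metrics via the Hugging Face evaluate library; the example texts are invented, and BERTScore downloads a scoring model on first use.

```python
# Assumes: pip install evaluate rouge_score bert_score
import evaluate

predictions = ["No acute cardiopulmonary process."]
references = ["No acute cardiopulmonary abnormality."]

rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

bertscore = evaluate.load("bertscore")
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```

Near-identical surface forms score highly here whether or not the wording difference matters clinically, which is exactly the gap the qualitative-analysis step is meant to expose.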
Skills Developed
- Practical use of the Hugging Face Transformers library for multimodal modeling
- Building and training radiology report-generation systems
- Evaluating medical text generation beyond surface-level NLP metrics
- Understanding clinical constraints in medical AI applications
Tutorial 6: Multimodal Learning with EHRs (Clinical Events + Notes, Optional Imaging)
Speaker
Datasets
- MIMIC-IV (structured EHR + de-identified free-text clinical notes)
- Optional imaging add-on: MIMIC-CXR (chest X-rays with free-text radiology reports) or MIMIC-CXR-JPG
Tasks
- Cohort + Prediction Setup (Warm-up)
- Define a simple cohort (for example, adult ICU stays) and a prediction target (for example, in-hospital mortality or prolonged length of stay) using structured MIMIC-IV tables
- Create a clean train/val/test split at the patient level, so no patient appears in more than one split (avoiding leakage)
- Modality 1: Structured EHR Feature Pipeline
- Build time-aligned features from vitals, labs, and interventions (first 24 hours, summary stats, trend features)
- Establish a strong baseline model (logistic regression or gradient boosted trees)
- Modality 2: Clinical Notes Representations
- Basic text cleaning and note selection (which note types, what time window)
- Create note embeddings (simple baseline: TF-IDF; stronger baseline: pretrained clinical transformer embeddings)
- Train a notes-only baseline model
- Multimodal Fusion Models (Core)
- Late fusion baseline: concatenate structured features + note embeddings and train a single classifier (see the sketch after this task list)
- Attention or transformer-based fusion: fuse structured time series representations with note representations
- Reproduce the key idea behind a multimodal transformer that combines structured EHR and notes for outcome prediction
- Evaluation + What Actually Changes with Multimodality
- Compare structured-only vs notes-only vs fused models (AUROC, AUPRC, calibration)
- Do a short error analysis: cases where notes add signal, cases where they add noise
- Practical discussion: missingness, note availability, and real-world deployment tradeoffs
- Advanced (Stretch Task): Add Imaging
- Use pretrained image embeddings from MIMIC-CXR (or use provided embeddings to save time) and fuse as a third modality
- Ablate: structured + notes vs structured + notes + imaging, and interpret when imaging helps
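Here is a minimal late-fusion sketch with synthetic stand-ins for the structured features and notes; real MIMIC-IV access requires PhysioNet credentialing, and the feature sizes and note texts below are invented.

```python
# Late-fusion baseline: concatenate structured features with TF-IDF note
# embeddings and train one classifier. Assumes: pip install scikit-learn numpy
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)                      # toy outcome labels
structured = rng.normal(size=(200, 10))               # e.g., 24 h vital/lab summaries
notes = ["dyspnea and hypoxia on admission" if y else "routine admission, stable"
         for y in labels]                             # toy note text

text = TfidfVectorizer().fit_transform(notes).toarray()
X = np.hstack([structured, text])                     # late fusion: concatenate modalities

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("AUROC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

Swapping the TF-IDF step for pretrained clinical transformer embeddings gives the stronger notes baseline with the fusion and evaluation code unchanged; on a real cohort, the split should be done at the patient level as in the warm-up task.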
Skills Developed
- Building an end-to-end multimodal clinical ML pipeline from raw EHR tables and text
- Representation learning for clinical notes and principled fusion with structured data
- Strong experimental practice: leakage control, ablations, calibration, and failure-mode analysis
- Optional exposure to tri-modal modeling with imaging using MIMIC-CXR