Welcome to the Course Series

Bridging Medical Research with Data Science, AI, and Advanced Statistics

Course 1: Computational Methods and Machine Learning for Medical Research

This course focuses on advanced predictive modeling, from classical machine learning to deep learning, with a strong emphasis on medical applications.

  • Regression vs. Classification
  • Parametric, Non-Parametric, and Semi-Parametric Models
  • Bias-Variance Tradeoff & Model Fitting (Overfitting, Underfitting)
  • The Curse of Dimensionality
  • Model Training, Testing, and Cross-Validation Strategies
  • Model Accuracy Metrics (for classification and regression)
  • Bootstrapping and Permutation Testing (for t-statistics, z-statistics)
  • Regression Models: Linear/Multiple Linear, Polynomial/Splines, Regularization (Lasso, Ridge, Elastic Net)
  • Classification Models: Logistic Regression, Support Vector Machines (SVMs), Decision Trees & Random Forest, Generalized Additive Models (GAMs)
  • Dimensionality Reduction: Principal Component Analysis (PCA)
  • Data Visualization and Embedding: t-SNE, UMAP
  • Clustering Algorithms
  • Feature Selection and Engineering Techniques
  • Hyperparameter Tuning and Optimization
  • Model Selection and Comparison Strategies
  • Model Interpretation and Feature Importance (e.g., SHAP, LIME)
  • Building and Interpreting Nomograms
  • Deep Learning Basics: The Gradient Descent Algorithm and Optimizers
  • Introduction to PyTorch for Medical AI
  • Neural Network Architectures: FNNs, CNNs, RNNs & Transformers
  • Medical Imaging and Computer Vision: Image pre-processing and analysis
  • Natural Language Processing (NLP): Using Large Language Models (LLMs)
  • Specialized Applications: Neural Signal Decoding (EEG), Genetic Analysis
  • Using Pre-trained Models and Transfer Learning
  • Creating and Deploying Interactive Models with Shiny

Course 2: Scientific Research Methodology and Medical Statistics

This foundational course covers the entire research lifecycle, from formulating a question to publishing results, along with the statistical methods needed at each stage.

  • Research Philosophy and the History of Evidence-Based Medicine
  • Paradigms: Positivist, Interpretivist, Pragmatist, Critical
  • Hierarchy of Evidence and Quality Assessment
  • Quantitative, Qualitative, and Mixed-Methods Research Designs
  • Secondary Research Designs (Intro to Systematic Reviews, Meta-Analyses)
  • Genesis of a Strong Research Question: FINER Criteria
  • The PICO(T) Framework for Clinical Questions
  • Operationalizing Questions into Variables and Hypotheses
  • Literature Review and Identifying Research Gaps
  • Writing a Study Protocol: Aims, Methods, Timeline
  • Data Collection Planning & Data Management (HIPAA)
  • IRB/Ethical Approval and Informed Consent
  • Protocol Registration (e.g., ClinicalTrials.gov)
  • Sampling: Probability and Non-Probability Methods
  • Bias: Selection, Information, Confounding, and Observer Bias
  • Strategies to Avoid Bias: Randomization, Blinding, Matching
  • Sample Size Calculation: Power, Effect Size, Type I/II Errors
  • Descriptive Statistics & Normality Testing
  • Univariate Analysis (Parametric and Non-parametric tests)
  • Categorical Data Analysis (Chi-square, Fisher's Exact Test)
  • Multivariable Analysis (Multiple Linear & Logistic Regression)
  • Statistical Inference: P-values, Confidence Intervals, Hypothesis Testing
  • Causal Inference: Bradford Hill Criteria, DAGs, Propensity Scores
  • Clinical Trials: Phases (I-IV), Adaptive Designs
  • Quality Assessment: CONSORT, STROBE, GRADE System
  • Research Integrity: Identifying Bad Research, Scientific Misconduct
  • The Scientific Peer Review System
  • Scientific Writing: Manuscripts, Case Reports (CARE guidelines)
  • Grant Writing and Research Funding
  • AI as an Intelligent Research Assistant

Course 3: Systematic Review and Meta-analysis with AI

A specialized course on evidence synthesis, integrating modern AI tools to streamline the review process from search to publication.

  • Overview and History of Evidence Synthesis (Cochrane)
  • Types of Reviews: Intervention, Diagnostic, Prognostic
  • PRISMA 2020 Reporting Guidelines
  • Statistical Foundations: Fixed vs. Random Effects, Heterogeneity (I²), Forest Plots
  • Scientific Databases (PubMed, Embase, Cochrane) & Grey Literature
  • Advanced Search Strategy Development (MeSH, Boolean operators)
  • AI-Enhanced Search: Semantic search, automated query expansion
  • Screening Tools: Manual (Rayyan) and AI-Assisted (ASReview)
  • Managing the PRISMA Flow Diagram
  • Traditional vs. AI-Assisted Data Extraction (NLP, NER)
  • Risk of Bias Assessment Tools: RoB 2 (RCTs), ROBINS-I (Non-Randomized Studies of Interventions)
  • Plotting and Visualizing Risk of Bias (Traffic light plots)
  • Pairwise and Single-Arm Meta-Analysis
  • Network Meta-Analysis (NMA)
  • Diagnostic Test Accuracy Meta-Analysis (HSROC)
  • Meta-Regression, Subgroup Analysis, and Trial Sequential Analysis
  • Publication Bias Assessment (Funnel Plots, Egger's Test)
  • Sensitivity Analysis (Leave-one-out)
  • The GRADE Approach for Assessing Certainty of Evidence
  • Agentic Frameworks: Coding (R: meta, metafor; Python: PyMeta) vs. No-Code Platforms
  • Writing the Results and Preparing the Manuscript for Publication

Course 4: Survival Analysis and Reconstructed Individual Patient Data Meta-analysis

This course delves into time-to-event analysis and the advanced technique of reconstructing and pooling individual patient data (IPD) from published studies.

  • Time-to-Event Data, Censoring, Survival and Hazard Functions
  • Data Structure for Survival Analysis
  • Kaplan-Meier Curves and Survival Probability Estimation
  • Estimating Median Survival Time
  • Comparing Survival Between Groups: Log-Rank Test
  • Interpreting Hazard Ratios
  • Building Multivariable Cox Regression Models
  • Assessing the Proportional Hazards Assumption
  • Time-Dependent Covariates & Landmark Analysis
  • Competing Risks Analysis (Fine-Gray model)
  • Conditional Survival and Smooth Survival Plots
  • Parametric Survival Models (Weibull, etc.)
  • Advantages of IPD over Aggregate Data
  • One-stage vs. Two-stage IPD Meta-analysis
  • Digitizing Kaplan-Meier Curves
  • Reconstructing IPD from Digitized Curves using R/Python tools (Guyot Algorithm)
  • Validating and Assessing Quality of Reconstructed Data
  • Performing Meta-Analysis with Reconstructed IPD

Course 5: R and Python Programming for Medical Research

A practical, hands-on course designed to equip researchers with the fundamental programming skills needed for data analysis in R and Python.

  • R Basics: Data Types, Functions, Control Structures
  • Data Engineering with the Tidyverse (dplyr, tidyr)
  • Data Visualization with ggplot2
  • Python Basics for Data Science
  • Data Manipulation with Pandas and NumPy
  • Introduction to PyTorch for Deep Learning
  • Data Pre-processing and Handling Missing Data (Imputation)
  • Data Transformation and Encoding
  • Balancing Data: Upsampling, Downsampling, Bootstrapping
  • Descriptive Statistics and Normality Testing in R/Python
  • Implementing Parametric and Non-parametric Tests
  • Bootstrapping for Confidence Intervals
  • Permutation Tests
  • Using LLMs such as Gemini for Automated Data Extraction
  • Using LLMs such as BioMistral for Prognostic Tasks

Course 6: Retrospective Database Analysis in R and Python

This course provides a practical guide to working with large, real-world retrospective databases such as NSQIP, SEER, and NIS, covering the entire workflow from data extraction to advanced analysis.

  • Overview of Major Databases: NSQIP, SEER, NIS
  • Data Extraction Techniques and Data Dictionaries
  • Ethical and Methodological Challenges
  • Data Filtering and Subsetting Techniques
  • Advanced Data Cleaning Strategies
  • Dealing with Missing Data in Large Datasets
  • Developing a Statistical Analysis Plan (SAP)
  • Controlling for Confounding
  • Propensity Score Matching (PSM)
  • Propensity Score Weighting (PSW)
  • Assessing Covariate Balance After Matching
  • Applying Multiple Linear and Logistic Regression Models
  • Model Building and Selection in a High-Dimensional Setting
  • Interpreting Results from Observational Database Studies