Career Profile

I am a PhD student in Statistics at Texas A&M with hands-on experience in predictive modeling, uncertainty quantification, and Bayesian methods. My research focuses on developing and applying Bayesian approaches for prevalence estimation, diagnostic accuracy evaluation, and latent class modeling, with a growing interest in applications to clinical trial design and simulation-based inference.

Experiences

PhD Statistics and Data Science Internship

Summer 2025
Lubrizol Corporation, Wickliffe, OH
  • Developed and deployed credible intervals for predictive models, improving the reliability of forecasts.
  • Designed repeatability and reproducibility experiments, strengthening product validation.
  • Built predictive models linking chemical composition to transmission fluid performance to guide data-driven product development decisions.

Biostatistics Research Assistant

Aug 2021 – May 2023
Institute of Biosciences and Technology, Texas A&M University (Dr. Kurt Zhang Lab), Houston, TX
  • Applied high-dimensional regression and clustering methods to DNA methylation data, identifying biomarkers linked to disease pathways.
  • Analyzed NHANES data to evaluate dietary risk factors for hypertension in pregnancy, contributing to peer-reviewed publications.

Projects

Selected projects demonstrating applications of Bayesian modeling, causal inference, and reproducible methods in health and clinical trial contexts.

Bayesian Latent Severity - A Bayesian latent class model for disease diagnosis in the absence of a gold standard test, designed for field epidemiologic studies where only a limited number of assays are available.
Double Machine Learning in Clinical Trials - A simulated randomized controlled trial with non-compliance, analyzed with a partially linear instrumental variable model (PLIV), and an NHANES observational analysis using an interactive regression model (IRM).

Applied Statistical Modeling

Selected projects highlighting advanced modeling frameworks for complex data, including zero-inflated count regression and spatio-temporal clustering analyses.

Spatio-Temporal Clustering of NYC Taxi Demand - A data visualization and clustering analysis of NYC taxi traffic movement patterns to uncover spatial and temporal demand structures.
Zero-Inflated Negative Binomial Regression for Disease Counts - A regression model for overdispersed and zero-inflated infection counts, identifying key predictors.

Publications

Below is a selection of my published work in biological and statistical research. My contributions span areas such as epigenomics, metabolic disease, and quantitative methods for diagnostic accuracy and prevalence estimation. These publications reflect my interdisciplinary approach at the intersection of data science, biostatistics, and biomedical research.

  • Epigenome-wide analysis of aging effects on liver regeneration
    Wang, J., Zhang, W., Liu, X., Kim, M., Ke, Z., Tsai, R.
    BMC Biology, 21:30 (2023)
  • Maternal One-Carbon Supplement Reduced the Risk of Non-Alcoholic Fatty Liver Disease in Male Offspring
    Peng, H., Xu, H., Wu, J., Li, J., Wang, X., Liu, Z., Kim, M., Jeon, M.S., Zhang, K.K., Xie, L.
    Nutrients, 14(12):2545 (2022)

Skills & Proficiency

  • R (tidyverse, data.table, ggplot2)
  • Bayesian modeling (including Stan/rstan, brms, JAGS)
  • Reproducible analysis (R Markdown, Quarto, Git/GitHub)
  • Python (scientific stack basics)