Associate Professor of Biostatistics, Harvard T.H. Chan School of Public Health
Email: rmukherj@hsph.harvard.edu
I am an Associate Professor in the Department of Biostatistics at Harvard T.H. Chan School of Public Health. Previously, I was an Assistant Professor in the Division of Biostatistics at UC Berkeley following my time as a Stein Fellow in the Department of Statistics at Stanford University. I obtained my PhD in Biostatistics from Harvard University, advised by Prof. Xihong Lin.
I am broadly interested in causal inference for observational studies in modern data settings, with a focus on fundamental challenges in the statistical analysis of environmental mixtures and their effects on cognitive development in children and cognitive decline in aging populations. My research is also motivated by applications in large-scale genetic association studies, statistical methods for quantifying the effects of climate change on human health, and the effects of homelessness on human health.
Lee, S., Mukherjee, R., & Mukherjee, S. (2025). Inference on Gaussian mixture models with dependent labels. arXiv preprint https://arxiv.org/abs/2510.06501.
Chen, X., Liu, L., & Mukherjee, R. (2025). Method-of-Moments Inference for GLMs and Doubly Robust Functionals under Proportional Asymptotics. arXiv preprint https://arxiv.org/abs/2408.06103.
McGrath, S., & Mukherjee, R. (2025). Nuisance Function Tuning and Sample Splitting for Optimal Doubly Robust Estimation. arXiv preprint https://arxiv.org/abs/2212.14857.
Bhattacharya, S., Dey, R., & Mukherjee, R. (2024). PC Adjusted Testing for Low-Dimensional Parameters. arXiv preprint https://arxiv.org/abs/2209.10774.
Mukherjee, R., & Sen, S. (2020). On Minimax Exponents of Sparse Testing. arXiv preprint https://arxiv.org/abs/2003.00570.
Levis, A., Mukherjee, R., Wang, R., & Haneuse, S. (2025). Robust causal inference for point exposures with missing confounders. Canadian Journal of Statistics, 53(2), e11832.
Jiang, K., Mukherjee, R., Sen, S., & Sur, P. (2025). A New Central Limit Theorem for the Augmented IPW Estimator: Variance Inflation, Cross-Fit Covariance, and Beyond. The Annals of Statistics, 53(2), 647–675.
Sun, S., Haneuse, S., Levis, A., Lee, C., Arterburn, D., Fischer, H., Shortreed, S.M., & Mukherjee, R. (2025). Estimating weighted quantile treatment effects with missing outcome data by double sampling. Biometrics, 81(2), ujaf038.
Bhattacharya, S., Mukherjee, R., & Ray, G. (2025). Sharp Signal Detection Under Ferromagnetic Ising Models. IEEE Transactions on Information Theory.
Bhattacharya, S., Mukherjee, R., & Ogburn, B. (2025). Nonsense associations in Markov random fields with pairwise dependence. Biometrika, asaf041.
McGrath, S., Mukherjee, D., Mukherjee, R., & Wang, Z. (2025). Optimal Nuisance Function Tuning for Doubly Robust Functional Estimation under Proportional Asymptotics. NeurIPS (Spotlight).
Mukherjee, R. (2025). From Univariate Analysis to Global Inference. Harvard Data Science Review.
Wu, B., Vyas, C., Medina, A., Slopen, N., Mahalingaiah, S., Chavarro, J., Mukherjee, R., Weisskopf, M., & Roberts, A. (2025). Estimating the associations between women’s maltreatment in childhood and inflammatory biomarker levels prior to and during pregnancy. PLOS One.
Tang, I., Knekt, P., Rantakokko, P., Heliövaara, M., Rissanen, H., Ruokojärvi, P., Mukherjee, R., & Weisskopf, M. (2025). Pre-disease biomarkers of persistent organic pollutants (POPs) and amyotrophic lateral sclerosis (ALS) risk in Finland. Environmental Health Perspectives.
Benz, L., Mukherjee, R., Wang, R., Arterburn, D., Fischer, H., Lee, C., Shortreed, S.M., & Haneuse, S. (2024). Adjusting for Selection Bias Due to Missing Eligibility Criteria in Emulated Target Trials. American Journal of Epidemiology, kwae471.
Levis, A., Mukherjee, R., Wang, R., & Haneuse, S. (2024). Double sampling and semiparametric methods for informatively missing data. Statistics in Medicine, 43(30), 6086–6098.
Deb, N., Mukherjee, R., Mukherjee, S., & Yuan, M. (2024). Detecting Structured Signals in Ising Models. The Annals of Applied Probability, 34(1A), 1–45.
Laha, N., Sonabend, A., Mukherjee, R., & Cai, T. (2024). Finding the Optimal Dynamic Treatment Regime under Fisher Consistent Surrogate Loss. The Annals of Statistics, 52(2), 679–707.
Liu, L., Mukherjee, R., & Robins, J. (2024). Assumption-lean falsification tests of rate double-robustness of double-machine-learning estimators. Journal of Econometrics, 240(2), 105500.
Farmer, J., Specht, A., Pushon, T., Jackson, B., Bidlack, F., Bakalar, C., Mukherjee, R., Davis, M., Steadman, D., & Weisskopf, M. (2024). Lead exposure across the life course and age of death. Science of The Total Environment, 927, 171975.
Bhattacharya, B., & Mukherjee, R. (2024). Sparse Uniformity Testing. IEEE Transactions on Information Theory.
Chhor, J., Mukherjee, R., & Sen, S. (2023). Sparse Signal Detection in Heteroscedastic Gaussian Sequence Models: Sharp Minimax Rates. Bernoulli, 30(3), 2127–2153.
Sonabend, A., Laha, N., Cai, T., & Mukherjee, R. (2023). Semi-Supervised Off Policy non-Markovian Reinforcement Learning. Journal of Machine Learning Research, 24(323), 1–86.
McGrath, S., Mukherjee, R., Requia, W. J., & Lee, W. L. (2023). Wildfire exposure and academic performance in Brazil: a causal inference approach for spatiotemporal data. Science of The Total Environment, 905, 167625.
Laha, N., Huey, N., Coull, B., & Mukherjee, R. (2023). On Statistical Inference with High Dimensional Sparse CCA. Information and Inference, 12(4), 2818–2850.
Hou, J., Mukherjee, R., & Cai, T. (2023). Efficient and Robust Semi-supervised Estimation of ATE with Partially Annotated Treatment and Response. Journal of Machine Learning Research, 24(265), 1–58.
Ho, C.-H., Huang, Y.-J., Lai, Y.-J., Mukherjee, R., & Hsiao, C.-H. (2022). The misuse of distributional assumptions in functional class scoring gene-set and pathway analysis. G3: Genes|Genomes|Genetics, 12(1), jkab365.
Huang, Y.-J., Mukherjee, R., & Hsiao, C.-H. (2022). Probabilistic Edge Inference of Gene Networks with Bayesian Markov Random Field Modeling. Frontiers in Genetics, 13, 1034946.
Deng, W., Cocker, B., Mukherjee, R., Liu, J., & Coull, B. (2022). Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. NeurIPS, 35, 27636–27651.
Laha, N., & Mukherjee, R. (2022). On Support Recovery With Sparse CCA: Information Theoretic and Computational Limits. IEEE Transactions on Information Theory, 69(3), 1695–1738.
Khorasanizadeh, M., Maroufi, S. F., Mukherjee, R., Sankaranarayanan, M., & Moore, J. (2022). Middle Meningeal Artery Embolization in Adjunction to Surgical Evacuation for Treatment of Subdural Hematomas: a Nationwide Comparison of Outcomes with Isolated Surgical Evacuation. Neurosurgery.
Mukherjee, R., & Sen, S. (2021). Testing Degree Corrections in Stochastic Block Models. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 57(3), 1583–1635.
Mukherjee, R., & Ray, G. (2021). On Testing for Parameters in Ising Models. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 58(1), 164–187.
Liu, L., Mukherjee, R., & Robins, J. (2021). On Adaptive Estimation of Nonparametric Functionals. Journal of Machine Learning Research, 22(1), 4507–4572.
Requia, W. J., Amini, H., Mukherjee, R., Gold, D., & Schwartz, J. (2021). Health impacts of wildfire-related air pollution in Brazil: A nationwide study of more than 2 million hospital admissions between 2008 and 2018. Nature Communications, 12(1), 6555.
Requia, W. J., Papatheodorou, S., Koutrakis, P., Mukherjee, R., & Roig, H. L. (2021). Increased preterm birth following maternal wildfire smoke exposure in Brazil. International Journal of Hygiene and Environmental Health, 240, 113901.
Liu, L., Mukherjee, R., & Robins, J. M. (2020). On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning. Statistical Science, 35(3), 518–539.
Liu, L., Mukherjee, R., & Robins, J. M. (2020). Rejoinder to “On nearly assumption-free tests of nominal confidence interval coverage for causal parameters estimated by machine learning.” Statistical Science, 35(3), 518–539.
Han, Y., Jiao, J., & Mukherjee, R. (2020). On estimation of ℓp-norms in Gaussian white noise models. Probability Theory and Related Fields, 177, 1243–1294.
Mukherjee, R., & Sen, B. (2019). On Efficiency of Plug-In Principles for Estimating Smooth Integral Functionals of a Non-increasing Density. Electronic Journal of Statistics, 13(2), 4416–4448.
Mukherjee, R., Mukherjee, S., & Sen, S. (2018). Detection Thresholds for the β-Model on Sparse Graphs. The Annals of Statistics, 46(3), 1288–1317.
Mukherjee, R., Mukherjee, S., & Yuan, M. (2018). Global Testing against Sparse Alternatives under Ising Models. The Annals of Statistics, 46(5), 2062–2093.
Barnett, I., Mukherjee, R., & Lin, X. (2017). Generalized Higher Criticism for SNP sets in Genetic Association Testing. Journal of the American Statistical Association, 112(517), 64–76.
Basu, K., & Mukherjee, R. (2017). Asymptotic Normality of Scrambled Geometric Net Quadrature. The Annals of Statistics, 45(4), 1759–1788.
Robins, J. M., Li, L., Mukherjee, R., Tchetgen Tchetgen, E., & van der Vaart, A. (2017). Minimax Estimation of a Functional in a Structured High Dimensional Model. The Annals of Statistics, 45(5), 1951–1987.
Mukherjee, R., & Sen, S. (2017). Optimal Adaptive Inference in Random Design Binary Regression. Bernoulli, 24(1), 699–739.
Mukherjee, R., Pillai, N., & Lin, X. (2015). Hypothesis Testing for High-Dimensional Sparse Binary Regression. The Annals of Statistics, 43(1), 352–381.