Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies (2018)

Mark J. van der Laan and Sherri Rose

Targeted learning methods are critical tools within data science for answering complex statistical questions, including those involving estimands in networks and longitudinal data with time-dependent confounding. We present a scientific roadmap to translate real-world data science applications into formal statistical estimation problems. This is accomplished using the general template of targeted maximum likelihood estimators to construct algorithms that incorporate the state of the art in machine learning for estimation, while still providing valid inference. Standard tools are not currently equipped for these challenges. We include demonstrations with software packages and real data sets, as well as new methodological advances since the publication of the first targeted learning book.

Springer Book Page

Targeted Learning: Causal Inference for Observational and Experimental Data (2011)

Mark J. van der Laan and Sherri Rose

The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often comprising hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows (1) the full generalization and utilization of cross-validation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and (2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest.

The book is aimed at both statisticians and applied researchers interested in causal inference and general effect estimation for observational and experimental data. Part I is an accessible introduction to super learning and the targeted maximum likelihood estimator, including related concepts necessary to understand and apply these methods. Parts II-IX handle complex data structures and topics applied researchers will immediately recognize from their own research, including time-to-event outcomes, direct and indirect effects, positivity violations, case-control studies, censored data, longitudinal data, and genomic studies.

Springer Book Page