Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies (2018)

Mark J. van der Laan and Sherri Rose

Targeted learning methods are critical tools within data science for answering complex statistical questions, including estimands in networks and longitudinal data with time-dependent confounding. We present a scientific roadmap to translate real-world data science applications into formal statistical estimation problems. This is accomplished using the general template of targeted maximum likelihood estimators to construct algorithms that incorporate the state of the art in machine learning for estimation, while still providing valid inference. Standard tools are not currently equipped for these challenges. We include demonstrations with software packages and real data sets, as well as new methodological advances since the publication of the first targeted learning book.

Springer Book Page / Amazon Book Page

Targeted Learning: Causal Inference for Observational and Experimental Data (2011)

Mark J. van der Laan and Sherri Rose

The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often comprising hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows (1) the full generalization and utilization of cross-validation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and (2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest.

The book is aimed at both statisticians and applied researchers interested in causal inference and general effect estimation for observational and experimental data. Part I is an accessible introduction to super learning and the targeted maximum likelihood estimator, including related concepts necessary to understand and apply these methods. Parts II-IX handle complex data structures and topics applied researchers will immediately recognize from their own research, including time-to-event outcomes, direct and indirect effects, positivity violations, case-control studies, censored data, longitudinal data, and genomic studies.

Springer Book Page / Amazon Book Page

From the Back Cover

"Targeted Learning, by Mark J. van der Laan and Sherri Rose, fills a much needed gap in statistical and causal inference. It protects us from wasting computational, analytical, and data resources on irrelevant aspects of a problem and teaches us how to focus on what is relevant – answering questions that researchers truly care about." -Judea Pearl, author of "Causality" and professor of computer science at UCLA.

"In summary, this book should be on the shelf of every investigator who conducts observational research and randomized controlled trials. The concepts and methodology are foundational for causal inference and at the same time stay true to what the data at hand can say about the questions that motivate their collection." -Ira B. Tager, professor emeritus of epidemiology at UC Berkeley