Healthcare is rapidly becoming a data-intensive discipline, driven by increasing digitization of health data, novel measurement technologies, and new policy-based incentives. Critical decisions about whom and how to treat can be made more precisely by layering an individual’s data over that from a population. In this talk, I will begin by introducing the types of health data currently being collected and the challenges associated with learning models from these data. Next, I will describe new techniques that leverage probabilistic methods and counterfactual reasoning to tackle these challenges. Finally, I will introduce areas where statistical machine-learning techniques are leading to new classes of computational diagnostic and treatment-planning tools: tools that tease out subtle information from “messy” observational datasets and provide reliable inferences given detailed context about the individual patient.