Causal Inference in Python¶

Python Packages¶

  • DoWhy
    • most beginner friendly
    • 4-step-interface: model · identify · estimate · refute (test assumptions)
    • backed by Microsoft Research (now under PyWhy organisation, with AWS as a collaborator)
    • uses networkx for the graphs
  • CausalNex
    • not for beginners, more powerful
    • uses Bayesian networks, Do-calculus, causal discovery, ...
    • based on networkx (DiGraph)
  • PyMC (with Probabilistic Programming interface)
    • not for beginners, more powerful
    • general probabilistic programming library; the separate CausalPy package builds on it
      • causal inference in quasi-experiments, Bayesian model fitting
    • no graph import interface

R (packages not reviewed)¶

  • CausalImpact
  • CausalTree
  • tipr (Tipping Point Analyses)
  • R6causal

DoWhy - Effect Inference and Structural Causal Models¶

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions using structural causal models (SCMs).


Amit Sharma, Emre Kiciman. DoWhy: An End-to-End Library for Causal Inference. 2020. https://arxiv.org/abs/2011.04216

DoWhy - Causal Inference¶

DoWhy focuses on the full pipeline of causal analysis. It requires the user to:

  • explicitly identify the assumptions when building the causal model
    • DoWhy can test the stated assumptions using observed data
  • separate identification from estimation
    • Identification is the causal problem. Estimation is simply a statistical problem.
  • check for sensitivity and robustness (which itself can be automated by DoWhy)
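
The split between identification and estimation can be made concrete with the backdoor adjustment formula. As a sketch, assume a covariate set $W$ satisfies the backdoor criterion relative to treatment $T$ and outcome $Y$ (an assumption encoded in the causal graph, not something the data can establish on its own). Identification then rewrites the interventional quantity purely in terms of observational ones:

$$
\mathbb{E}[Y \mid do(T = t)] = \mathbb{E}_{W}\big[\, \mathbb{E}[Y \mid T = t, W] \,\big]
$$

Estimation is the remaining statistical task: fit $\mathbb{E}[Y \mid T, W]$ with any suitable model and average it over the observed distribution of $W$.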

DoWhy interoperates with other packages such as EconML (PyWhy) for machine-learning support, PyTorch for causal prediction, and CDT for causal discovery.
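
The four steps (model, identify, estimate, refute) could look roughly like the sketch below. It borrows the variable names of the case study later in this document. The helper names `EDGES`, `to_gml` and `four_step_estimate` are hypothetical, and the choice of a linear-regression estimator and a random-common-cause refuter is an assumption made for illustration, not necessarily what the notebook uses. The graph is passed as a single-line GML string, one of the formats `CausalModel` accepts.

```python
# Edges of the sodium / blood-pressure DAG described in the case study.
EDGES = [
    ("w_age", "t_sod"),
    ("w_age", "y_sbp"),
    ("t_sod", "y_sbp"),
    ("t_sod", "z_prot"),
    ("y_sbp", "z_prot"),
]


def to_gml(edges):
    """Serialise a list of directed edges as a single-line GML string."""
    nodes = sorted({node for edge in edges for node in edge})
    parts = ["graph [ directed 1"]
    parts += [f'node [ id "{n}" label "{n}" ]' for n in nodes]
    parts += [f'edge [ source "{s}" target "{t}" ]' for s, t in edges]
    parts.append("]")
    return " ".join(parts)


def four_step_estimate(df):
    """Run model -> identify -> estimate -> refute on a pandas DataFrame."""
    # Imported here so the helpers above stay usable without DoWhy installed.
    from dowhy import CausalModel

    # 1. Model: state the causal assumptions as a graph.
    model = CausalModel(
        data=df, treatment="t_sod", outcome="y_sbp", graph=to_gml(EDGES)
    )
    # 2. Identify: derive an estimand from the graph alone.
    estimand = model.identify_effect(proceed_when_unidentifiable=True)
    # 3. Estimate: a purely statistical step (assumed estimator).
    estimate = model.estimate_effect(
        estimand, method_name="backdoor.linear_regression"
    )
    # 4. Refute: check robustness against an added random common cause.
    refutation = model.refute_estimate(
        estimand, estimate, method_name="random_common_cause"
    )
    return estimate.value, refutation
```

With DoWhy and pandas installed, calling `four_step_estimate(df)` on a DataFrame that has these four columns should lead the identification step to adjust for `w_age` only, since `z_prot` is a descendant of the treatment.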

DoWhy Extension - Graphical Causal Models¶

Causal Queries:

  • identifying the root causes of outliers and distributional changes
  • causal structure learning
  • attributing causal influences like "what caused the anomalies in my data?"
  • ...

Examples:

  • quantify arrow strength and causal influence of an ancestor
  • simulate impact of interventions
  • compute counterfactuals
  • estimate the ATE/ACE (average treatment effect, also called average causal effect)
  • attribute distribution changes to causes
  • estimate confidence intervals (uses bootstrapping)
    • avoid retraining the causal graph; instead bootstrap only the causal query (use gcm.bootstrap_sampling())
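
The idea of bootstrapping a causal query without refitting the model can be illustrated with a hand-rolled NumPy sketch. It does not call DoWhy. It assumes a hypothetical two-node linear model X → Y with a true effect of 2.0, fits the mechanism once, and then repeatedly evaluates the query "ATE of do(X=1) versus do(X=0)" on fresh samples drawn from the fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observational data from X -> Y with true effect 2.0.
n = 2000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)

# Fit the causal mechanism of Y once (linear model plus empirical residuals).
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)


def ate_query(rng, n_samples=500):
    """Sample Y under do(X=1) and do(X=0) from the fitted model; return the ATE."""
    noise_1 = rng.choice(residuals, size=n_samples, replace=True)
    noise_0 = rng.choice(residuals, size=n_samples, replace=True)
    y_do_1 = slope * 1.0 + intercept + noise_1
    y_do_0 = slope * 0.0 + intercept + noise_0
    return y_do_1.mean() - y_do_0.mean()


# Bootstrap only the query: the fitted mechanism is reused in every run.
estimates = np.array([ate_query(rng) for _ in range(200)])
ci_low, ci_high = np.percentile(estimates, [2.5, 97.5])
print(f"ATE median={np.median(estimates):.2f}, 95% interval=[{ci_low:.2f}, {ci_high:.2f}]")
```

An interval obtained this way reflects only the sampling variability of the query, not the uncertainty from fitting the mechanisms. Capturing the latter requires refitting on resampled training data, which is the more expensive route.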

Patrick Blöbaum, Peter Götz, Kailash Budhathoki, Atalanti A. Mastakouri, Dominik Janzing. DoWhy-GCM: An extension of DoWhy for causal inference in graphical causal models. 2022. https://arxiv.org/abs/2206.06821


Patrick Blöbaum: Performing Root Cause Analysis with DoWhy, a Causal Machine-Learning Library


DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models

Case Study¶

Effect of Sodium Intake on Blood Pressure

Motivation:

  • 63% of Americans aged over 60 have high blood pressure (>= 140 mmHg)
  • 85% of Americans aged over 50 consume more than 2.3 g sodium/day
  • federal recommendation: less than 2.3 g sodium/day

Data:

  • (simulated) epidemiological example taken from Luque-Fernandez et al. (2018) with some corrections
  • Outcome Y: (systolic) blood pressure
  • Treatment T: sodium intake
  • Covariates
    • W age
    • Z amount of protein excreted in urine

Variables¶

| var | type | desc |
|---|---|---|
| w_age | covariate | Age (years) |
| z_prot | covariate | 24-hour excretion of urinary protein (proteinuria) (mg) |
| t_sod | treatment | 24-hour dietary sodium intake (g) |
| y_sbp | outcome | Systolic blood pressure (mmHg) |
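
A data set with these four variables can be simulated in a few lines. The sketch below is an assumption-laden illustration: the function name `simulate_sodium_data` is hypothetical, and the linear coefficients are illustrative values chosen so that the true effect of sodium on SBP is 1.05 mmHg per gram. They are not taken from this notebook, which applies its own corrections to the published simulation.

```python
import numpy as np


def simulate_sodium_data(n=5000, seed=0):
    """Draw n samples from an assumed linear SCM for the sodium / SBP example."""
    rng = np.random.default_rng(seed)
    # Age is exogenous.
    w_age = rng.normal(65.0, 5.0, size=n)
    # Sodium intake rises with age.
    t_sod = w_age / 18.0 + rng.normal(size=n)
    # SBP depends on sodium (true effect 1.05) and on age.
    y_sbp = 1.05 * t_sod + 2.0 * w_age + rng.normal(size=n)
    # Proteinuria is a common effect (collider) of SBP and sodium.
    z_prot = 2.0 * y_sbp + 2.8 * t_sod + rng.normal(size=n)
    return {"w_age": w_age, "z_prot": z_prot, "t_sod": t_sod, "y_sbp": y_sbp}


data = simulate_sodium_data()
for name, values in data.items():
    print(f"{name}: mean={values.mean():.2f}, sd={values.std():.2f}")
```

Each variable is generated only from its parents in the graph, so the order of the assignments mirrors the causal ordering age → sodium → SBP → proteinuria.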

Causal Mechanisms¶

  • z_prot$\leftarrow$y_sbp $\wedge$ z_prot$\leftarrow$t_sod:
    • "high levels of 24-h excretion of urinary protein (proteinuria) are caused by sustained high SBP and increased 24-h dietary sodium intake"
  • w_age$\rightarrow$y_sbp $\wedge$ w_age$\rightarrow$t_sod:
    • "age is a common cause of both high SBP and impaired sodium homeostasis"
  • we are interested in estimating the effect of 24-h dietary sodium intake (in grams) on SBP, adjusting for age.
  • in a realistic scenario, one might be tempted to control for proteinuria if:
    • the physiological factors influencing SBP are not completely understood
    • the relationships between the variables are not depicted in a DAG
    • proteinuria is mistakenly conceptualized as a confounder
  • but: proteinuria is a common effect of sodium intake and SBP, so controlling for it introduces collider bias
  • we show the paradoxical effect of 24-h dietary sodium intake on SBP after conditioning on a collider (proteinuria).
  • with DAGs in SCMs we can more easily distinguish between biases resulting from:
    • not conditioning on common causes of exposure and outcome (unadjusted confounding); or
    • conditioning on common effects (collider bias)
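
Both kinds of bias can be reproduced with ordinary least squares on simulated data. The sketch below is a NumPy-only illustration, not the notebook's DoWhy analysis. The data-generating coefficients are assumptions (true sodium effect 1.05 mmHg per gram), and `ols_first_coef` is a hypothetical helper that returns the coefficient of the first regressor.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Assumed linear SCM: age -> sodium, age -> SBP, sodium -> SBP,
# and proteinuria as a common effect of SBP and sodium.
w_age = rng.normal(65.0, 5.0, size=n)
t_sod = w_age / 18.0 + rng.normal(size=n)
y_sbp = 1.05 * t_sod + 2.0 * w_age + rng.normal(size=n)
z_prot = 2.0 * y_sbp + 2.8 * t_sod + rng.normal(size=n)


def ols_first_coef(outcome, *regressors):
    """Fit OLS with an intercept and return the coefficient of the first regressor."""
    design = np.column_stack([np.ones_like(outcome), *regressors])
    beta, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return beta[1]


unadjusted = ols_first_coef(y_sbp, t_sod)               # confounded by age
adjusted = ols_first_coef(y_sbp, t_sod, w_age)          # backdoor path blocked
collider = ols_first_coef(y_sbp, t_sod, w_age, z_prot)  # conditions on a collider

print(f"no adjustment:          {unadjusted:.2f}")
print(f"adjusted for age:       {adjusted:.2f}")
print(f"adjusted for age, prot: {collider:.2f}")
```

Under these assumptions, only the age-adjusted model recovers a value close to the true 1.05. Omitting age inflates the estimate, and adding proteinuria flips its sign, which is the paradoxical effect described above.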

Sources¶

  • code and text based on: https://www.bradyneal.com/causal-inference-course
  • R code (primary source): https://academic.oup.com/ije/article/48/2/640/5248195?login=true
  • recommendation: https://www.cdc.gov/salt/about/index.html
  • excess sodium intake (age groups...): https://www.cdc.gov/mmwr/preview/mmwrhtml/mm6452a1.htm
  • 63% high blood pressure (age >60y): https://www.cdc.gov/nchs/products/databriefs/db289.htm

Issues¶

  • the causal model is incomplete

  • potassium is missing, although it can lower blood pressure:

    Americans and Canadians are consuming only about half the recommended daily amount of potassium: 4.7 g. Potassium blunts the effects of salt, lowers blood pressure, and reduces the risk of kidney stones and bone loss.
  • calorie intake? (maybe mediated by sodium intake)

  • ...

Jupyter Notebook DoWhy Case Study¶