Modified Causal Forests#
Welcome to the documentation of mcf, the Python package implementing the Modified Causal Forest introduced by Lechner (2018). This package allows you to estimate heterogeneous treatment effects for binary and multiple treatments from experimental or observational data. Additionally, mcf offers the capability to learn optimal policy allocations.
If you’re new to the mcf package, we recommend following these steps:
Installation Guide: Learn how to install mcf on your system.
Usage Example: Explore a simple example to quickly understand how to apply mcf to your data.
Getting started: Dive into a more detailed example to get a better feel for working with mcf.
For those seeking further information:
The User Guide offers explanations on additional features of the mcf package and provides several example scripts.
Check out the Python API for details on interacting with the mcf package.
The Algorithm Reference provides a technical description of the methods used in the package.
Installation Guide#
The current version of mcf is compatible with Python 3.12. You can install mcf from PyPI using:
pip install mcf
For a smoother experience and to avoid conflicts with other packages, we strongly recommend using a virtual environment based on conda.
You can manage conda environments either via the command line or a graphical interface. The command line works the same way on all operating systems, which is why we recommend it. The graphical interface, however, is more user-friendly.
If you prefer to use the command line, first install conda as described here. Next, follow the steps below in your Anaconda Prompt (Windows) or terminal (macOS and Linux):
Set up and activate a conda environment named mcf-env:
conda create -n mcf-env
conda activate mcf-env
Install Python 3.12:
conda install Python="3.12"
Finally, install mcf in this environment using pip:
pip install mcf
If you prefer a graphical interface, you can:
Install the Anaconda distribution, which includes Anaconda Navigator, by downloading it here.
Set up an environment by following the guide here, and make sure you choose Python=3.12.5 for your environment.
Install the mcf package by using pip install in your IDE console:
pip install mcf
As an alternative to the step above, you can install the mcf package by following the guide here.
Note: It is recommended to prioritize conda install for package installations before using pip install. On a Windows machine, if you plan to use Spyder as your IDE, make sure to execute conda install spyder before proceeding with pip install mcf to reduce the risk of errors during installation.
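After installation, a quick check from within Python can confirm that the active environment matches the setup described above. The following is a minimal sketch that relies only on the standard library. The helper names python_matches and installed_version are illustrative and are not part of mcf.

```python
import sys
from importlib import metadata


def python_matches(required=(3, 12), version_info=None):
    """Return True if the major.minor version equals `required`."""
    # Fall back to the running interpreter when no version is passed in
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == tuple(required)


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python 3.12 active:", python_matches())
    print("mcf version:", installed_version("mcf"))
```

If the second line prints None, mcf is not installed in the environment that is currently active, which usually means the conda environment was not activated before running pip install mcf.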
Usage Example#
To demonstrate how to use mcf, we will use the example_data() function to generate synthetic datasets for training and prediction, and subsequently apply the ModifiedCausalForest.
import numpy as np
import pandas as pd
from mcf.example_data_functions import example_data
from mcf import ModifiedCausalForest
from mcf import OptimalPolicy
from mcf import McfOptPolReport
# Generate example data using the built-in function `example_data()`
training_df, prediction_df, name_dict = example_data()
# Create an instance of the Modified Causal Forest model
my_mcf = ModifiedCausalForest(
    var_y_name="outcome",  # Outcome variable
    var_d_name="treat",  # Treatment variable
    var_x_name_ord=["x_cont0", "x_cont1", "x_ord1"],  # Ordered covariates
    var_x_name_unord=["x_unord0"],  # Unordered covariate
    _int_show_plots=False  # Disable plots for faster performance
)
# Train the Modified Causal Forest on the training data
my_mcf.train(training_df)
# Predict treatment effects using the model on prediction data
results = my_mcf.predict(prediction_df)
# The `results` object is a tuple with two elements:
# 1. A dictionary containing all estimates
results[0]
# 2. A string with the path to the results location
results[1]
# Extract the dictionary of estimates
results_dict = results[0]
# Access the Average Treatment Effect (ATE)
ate_array = results_dict.get('ate')
print("Average Treatment Effect (ATE):\n", ate_array)
# Access the Standard Error of the ATE
ate_se_array = results_dict.get('ate_se')
print("\nStandard Error of ATE:\n", ate_se_array)
# Access the Individualized Treatment Effects (IATE)
iate_array = results_dict.get('iate')
print("\nIndividualized Treatment Effects (IATE):\n", iate_array)
# Access the DataFrame of Individualized Treatment Effects
iate_df = results_dict.get('iate_data_df')
print("\nDataFrame of Individualized Treatment Effects:\n", iate_df)
# Create an instance of the OptimalPolicy class:
my_optimal_policy = OptimalPolicy(
    var_d_name="treat",
    var_polscore_name=['y_pot0', 'y_pot1', 'y_pot2'],
    var_x_name_ord=["x_cont0", "x_cont1", "x_ord1"],
    var_x_name_unord=["x_unord0"]
)
# Learn an optimal policy rule using the predicted potential outcomes
alloc_train_df, _, _ = my_optimal_policy.solve(training_df, data_title='training')
# Evaluate the optimal policy rule on the training data:
results_eva_train, _ = my_optimal_policy.evaluate(alloc_train_df, training_df,
                                                  data_title='training')
# Allocate observations to treatment state using the prediction data
alloc_pred_df, _ = my_optimal_policy.allocate(prediction_df, data_title='prediction')
# Evaluate allocation with potential outcome data.
results_eva_pred, _ = my_optimal_policy.evaluate(alloc_pred_df, prediction_df,
                                                 data_title='prediction')
# Allocation DataFrame for the training set
print(alloc_train_df)
# Produce a PDF-report that summarises the results
my_report = McfOptPolReport(mcf=my_mcf,
                            optpol=my_optimal_policy,
                            outputfile='mcf_report')
my_report.report()
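The ate and ate_se entries retrieved in the example can be combined into confidence intervals. The sketch below assumes a normal approximation and uses made-up numbers in place of real mcf output. The helper normal_ci is hypothetical and not an mcf function.

```python
from statistics import NormalDist


def normal_ci(estimate, std_error, level=0.95):
    """Two-sided confidence interval under a normal approximation."""
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    # Critical value of the standard normal, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


# Illustrative values only (assumption: not produced by mcf)
example_ate, example_se = 1.5, 0.25
lower, upper = normal_ci(example_ate, example_se)
print(f"ATE = {example_ate:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With real output, the same function could be applied to each pair of values stored under 'ate' and 'ate_se'. Inspect the shape of those arrays first, since it depends on the number of treatment comparisons.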
For a more detailed example, see the Getting started section.
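As background for the OptimalPolicy step in the example above, the idea of allocating treatments from policy scores can be pictured with one very simple rule: give each observation the treatment with the highest score. The sketch below is a conceptual illustration on toy numbers, with a hypothetical helper best_score_allocation. It is not a description of what OptimalPolicy does internally.

```python
def best_score_allocation(policy_scores):
    """Assign each observation the treatment index with the highest score.

    `policy_scores` is a list of rows, one per observation, holding one
    score per treatment (for example, predicted potential outcomes).
    """
    allocation = []
    for row in policy_scores:
        # Index of the maximum score; ties go to the lowest treatment index
        allocation.append(max(range(len(row)), key=row.__getitem__))
    return allocation


# Toy scores for three treatments (columns play the role of y_pot0..y_pot2)
toy_scores = [
    [0.2, 0.9, 0.4],
    [1.1, 0.3, 0.8],
    [0.5, 0.5, 0.7],
]
print(best_score_allocation(toy_scores))
```

A rule of this kind ignores costs, capacity constraints, and interpretability, which is where the additional features covered in the User Guide come in.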
Source code and contributing#
The Python source code is available on GitHub. If you have questions, want to report bugs, or have feature requests, please use the issue tracker.
References#
Conceptual foundation#
Lechner M (2018). Modified Causal Forests for Estimating Heterogeneous Causal Effects. Read Paper
Lechner M, Mareckova J (2022). Modified Causal Forest. Read Paper
Algorithm demonstrations#
Bodory H, Busshoff H, Lechner M (2022). High Resolution Treatment Effects Estimation: Uncovering Effect Heterogeneities with the Modified Causal Forest. Entropy. 24(8):1039. Read Paper
Bodory H, Mascolo F, Lechner M (2024). Enabling Decision Making with the Modified Causal Forest: Policy Trees for Treatment Assignment. Algorithms. 17(7):318. Read Paper
Simulations#
Lechner M, Mareckova J (2024). Comprehensive Causal Machine Learning. Read Paper
Applications in diverse fields#
Audrino F, Chassot J, Huang C, Knaus M, Lechner M, Ortega JP (2024). How does post-earnings announcement sentiment affect firms’ dynamics? New evidence from causal machine learning. Journal of Financial Econometrics. 22(3), 575–604. Read paper
Burlat H (2024). Everybody’s got to learn sometime? A causal machine learning evaluation of training programmes for jobseekers in France. Labour Economics. In Press. Paper 102573. Read paper
Cockx B, Lechner M, Bollens J (2023). Priority to unemployed immigrants? A causal machine learning evaluation of training in Belgium. Labour Economics. 80(102306). Read paper
Handouyahia A, Rikhi T, Awad G, Aouli E (2024). Heterogeneous causal effects of labour market programs: A machine learning approach. Proceedings of Statistics Canada Symposium 2022. Read paper
Heiniger S, Koeniger W, Lechner M (2024). The heterogeneous response of real estate prices during the Covid-19 pandemic. Journal of the Royal Statistical Society Series A: Statistics in Society, 00, 1–24. Read paper
Hodler R, Lechner M, Raschky P (2023). Institutions and the Resource Curse: New Insights from Causal Machine Learning. PLoS ONE. 18(6): e0284968. Read paper
Zhu M (2023). The Effect of Political Participation of Chinese Citizens on Government Satisfaction: Based on Modified Causal Forest. Procedia Computer Science. 221, 1044–1051. Read paper