4. CBGATEs and BGATEs#

Balanced Group Average Treatment Effects (\(\textrm{BGATEs}\)) and Causal Balanced Group Average Treatment Effects (\(\textrm{CBGATEs}\)) were introduced by Bearth & Lechner (2024).

The \(\textrm{BGATE}\) can be used to estimate \(\textrm{ATEs}\) for different groups (\(\textrm{GATEs}\)) while accounting for differences in other covariates, i.e.

\[BGATE(m,l;z) = \mathbb{E} \bigg[ \mathbb{E} \big[ Y^m - Y^l \big\vert Z=z, W=w \big]\bigg]\]

Here, \(Z\) is a single feature of \(X\), and \(W\) denotes a subgroup of the features of \(X\) excluding \(Z\); \(z\) is a possible value of the heterogeneity variable of interest \(Z\), and \(m\) and \(l\) denote two treatment states whose potential outcomes \(Y^m\) and \(Y^l\) are compared. The outer expectation is taken over the distribution of the balancing variables \(W\).

The \(\textrm{BGATE}\) partially overcomes the attribution problem of a simple \(\textrm{GATE}\), where other relevant variables may confound the effect heterogeneity. Furthermore, the Causal Balanced Group Average Treatment Effect (\(\textrm{CBGATE}\)) allows a causal interpretation of the \(\textrm{BGATE}\) when all variables other than the heterogeneity variable \(Z\) are balanced and further assumptions discussed in Bearth & Lechner (2024) hold. Hence, both the \(\textrm{CBGATE}\) and the plain-vanilla \(\textrm{GATE}\) are limiting cases of the \(\textrm{BGATE}\).
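In the notation above, these two limiting cases can be sketched informally as follows (the precise definitions and conditions are given in Bearth & Lechner (2024)): with no balancing variables, the \(\textrm{BGATE}\) collapses to the plain \(\textrm{GATE}\); when all features other than \(Z\) are balanced, it becomes the \(\textrm{CBGATE}\):

\[GATE(m,l;z) = \mathbb{E} \big[ Y^m - Y^l \big\vert Z=z \big]\]

\[CBGATE(m,l;z) = \mathbb{E} \bigg[ \mathbb{E} \big[ Y^m - Y^l \big\vert Z=z, W=w \big]\bigg], \quad W = X \setminus \{Z\}\]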

Algorithmically, the \(\textrm{BGATE}\) and the \(\textrm{CBGATE}\) are implemented as follows:

  1. Draw a random sample from the prediction data.

  2. Keep the heterogeneity and balancing variables.

  3. Replicate the data from step 2 \(n_z\) times, where \(n_z\) denotes the number of distinct values of the heterogeneity variable of interest \(Z\). In each of the \(n_z\) folds, set \(Z\) to one of these values.

  4. For each observation in the replicated data, draw its nearest neighbour from the prediction data in terms of the balancing variables and the heterogeneity variable. If there is a tie, the algorithm chooses one of the tied observations at random.

  5. Form a new sample with all selected neighbours.

  6. Compute \(\textrm{GATEs}\) and their standard errors.

Note that this procedure takes place only in the prediction part, using the previously trained forest. This implementation differs from the estimation approach of Bearth & Lechner (2024), who use double/debiased machine learning to estimate the parameters of interest.
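To make the balancing procedure more concrete, the following sketch implements steps 1 to 5 in plain Python. It is only an illustration, not the mcf implementation: the function and argument names (build_bgate_sample, z_col, w_cols, sample_share, seed) are made up for this example, no scaling is applied before the nearest-neighbour search, and ties are not broken at random.

import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

def build_bgate_sample(prediction_df, z_col, w_cols, sample_share=0.5, seed=42):
    """Illustrative sketch of steps 1-5 above (not the mcf implementation)."""
    # Step 1: draw a random sample from the prediction data.
    sample = prediction_df.sample(frac=sample_share, random_state=seed)

    # Step 2: keep only the heterogeneity and the balancing variables.
    sample = sample[[z_col] + list(w_cols)]

    # Step 3: replicate the sample n_z times and fix Z to one value per fold.
    z_values = np.sort(prediction_df[z_col].unique())
    folds = []
    for z_val in z_values:
        fold = sample.copy()
        fold[z_col] = z_val
        folds.append(fold)
    replicated = pd.concat(folds, ignore_index=True)

    # Step 4: for each replicated observation, find its nearest neighbour in
    # the prediction data in terms of (Z, W).
    features = [z_col] + list(w_cols)
    nn = NearestNeighbors(n_neighbors=1)
    nn.fit(prediction_df[features].to_numpy())
    _, idx = nn.kneighbors(replicated[features].to_numpy())

    # Step 5: the new, balanced sample consists of all selected neighbours;
    # GATEs and their standard errors would then be computed on it (step 6).
    return prediction_df.iloc[idx.ravel()].reset_index(drop=True)

The mcf package performs this balancing internally during prediction; no user code is required.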

To turn on the \(\textrm{BGATE}\), set p_bgate to True. To turn on the \(\textrm{CBGATE}\), set p_cbgate to True. The balancing variables \(W\) have to be specified in var_bgate_name.

Below you find a list of the main parameters related to the \(\textrm{BGATEs}\) and \(\textrm{CBGATEs}\). Please consult the API for more details or additional parameters.

4.1. Parameters#

var_bgate_name
This parameter, which can be a string or a list of strings, specifies the variables on which the GATEs are balanced. It is only relevant if p_bgate is set to True. When a BGATE is computed, the distribution of these variables is held constant, so that the treatment effect is estimated in a balanced manner across them. This helps to control for potential confounding effects that these variables might have on the estimated effect heterogeneity. If set to None, the program defaults to balancing on the other heterogeneity variables (specified in var_z), i.e. the GATEs are balanced across the distribution of those heterogeneity variables.

p_bgate
Activates the estimation of a Balanced Group Average Treatment Effect (BGATE).

p_cbgate
Activates the estimation of a GATE that is balanced in all other features (CBGATE).

p_gate_no_evalu_points
Determines the number of evaluation points for discretized variables in (C)BGATE estimation. The default value is 50.

p_bgate_sample_share
Share of the data that is randomly sampled for (C)BGATE estimation. It is used to speed up the program, as (C)BGATE estimation is CPU intensive.

4.1.1. Examples#

from mcf.example_data_functions import example_data
from mcf.mcf_functions import ModifiedCausalForest

# Generate example data using the built-in function `example_data()`
training_df, prediction_df, name_dict = example_data()

my_mcf = ModifiedCausalForest(
    var_y_name="outcome",
    var_d_name="treat",
    var_x_name_ord=["x_cont0", "x_cont1", "x_ord1"],
    var_z_name_list=["x_cont0", "x_cont1"],
    # Variables to balance the GATEs on
    var_bgate_name=["x_cont0", "x_cont1"],
    # Estimate a balanced GATE in selected features
    p_bgate=True,
    # Share of data randomly sampled to speed up the programme (None = use the default)
    p_bgate_sample_share=None
)

my_mcf.train(training_df)
results, _ = my_mcf.predict(prediction_df)
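
# Second example: estimate a CBGATE, i.e. a GATE balanced in all other features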
my_mcf = ModifiedCausalForest(
    var_y_name="outcome",
    var_d_name="treat",
    var_x_name_ord=["x_unord0", "x_cont0", "x_ord1"],
    var_z_name_list=["x_cont0"],
    var_bgate_name=["x_cont0"],
    # Estimate a GATE that is balanced in all other features
    p_cbgate=True
)

my_mcf.train(training_df)
results, _ = my_mcf.predict(prediction_df)