3. Sampling weights and clustering
3.1. Sampling weights
You can provide sampling weights for each observation in your data set. To estimate a Modified Causal Forest with sampling weights, set the gen_weighted parameter to True and pass the name of the variable containing the sampling weights to the var_w_name parameter.
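To see what a sampling weight does, the sketch below compares an unweighted and a weighted mean of an outcome in a small toy data set. It only illustrates the concept with pandas and NumPy. It is not how ModifiedCausalForest uses the weights internally, and the data are invented; the column names simply mirror the ones used in the examples of this section.

```python
import numpy as np
import pandas as pd

# Invented toy data; 'outcome' and 'weight' mirror the column names
# used in the examples of this section.
df = pd.DataFrame({
    "outcome": [1.0, 2.0, 3.0, 4.0],
    "weight": [1.0, 1.0, 1.0, 5.0],  # the last unit stands for 5 population units
})

# Sampling weights should be non-negative and not all zero.
assert (df["weight"] >= 0).all() and df["weight"].sum() > 0

unweighted_mean = df["outcome"].mean()                           # (1+2+3+4)/4
weighted_mean = np.average(df["outcome"], weights=df["weight"])  # (1+2+3+20)/8
print(unweighted_mean, weighted_mean)  # 2.5 3.25
```

The heavily weighted last observation pulls the estimate towards its outcome, which is the intended effect when that unit represents more of the population.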
3.2. Clustering
If your data set contains clusters, you can provide the name of the variable containing the cluster identifier through the var_cluster_name parameter.
If your data have a panel structure, your data set is also clustered, namely at the level of the individual. In this case, you can provide the name of the variable containing the individual identifier through the var_cluster_name parameter.
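The sketch below shows what this looks like for a small, invented panel in long format, using pandas: every individual contributes several rows, so the individual identifier defines the clusters. The column names person_id, period and outcome are hypothetical and only serve the illustration.

```python
import pandas as pd

# Invented panel in long format: one row per individual and period.
# 'person_id', 'period' and 'outcome' are hypothetical column names.
panel_df = pd.DataFrame({
    "person_id": [101, 101, 101, 202, 202, 303],
    "period":    [1, 2, 3, 1, 2, 1],
    "outcome":   [0.5, 0.7, 0.6, 1.2, 1.1, 0.9],
})

# Rows sharing a 'person_id' are repeated observations of the same unit,
# so each individual forms one cluster.
cluster_sizes = panel_df.groupby("person_id").size()  # rows per individual
n_clusters = panel_df["person_id"].nunique()
print(n_clusters)  # 3
print(cluster_sizes.to_dict())
```

With data like this, person_id is the column whose name you would pass as var_cluster_name.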
By default, the clusters are used to draw the random samples when growing the forest. You can control this behaviour through the gen_panel_in_rf parameter. To compute clustered standard errors, you need to set the gen_panel_data parameter to True.
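Both ideas in this paragraph can be illustrated with a NumPy sketch: drawing a random subsample cluster by cluster, so that all rows of a cluster stay together, and a textbook cluster-robust standard error for a simple mean. The function names are hypothetical. The subsampling here draws clusters without replacement, and the standard error is the generic estimator for a mean with a G/(G-1) correction; neither is necessarily what mcf does internally.

```python
import numpy as np

def draw_cluster_subsample(cluster_ids, share, rng):
    """Boolean mask selecting all rows of a random subset of clusters (no replacement)."""
    unique_clusters = np.unique(cluster_ids)
    n_draw = max(1, int(round(share * unique_clusters.size)))
    chosen = rng.choice(unique_clusters, size=n_draw, replace=False)
    return np.isin(cluster_ids, chosen)

def clustered_se_of_mean(y, cluster_ids):
    """Cluster-robust standard error of the sample mean with a G/(G-1) correction."""
    y = np.asarray(y, dtype=float)
    n = y.size
    resid = y - y.mean()
    # Sum residuals within each cluster, then square and add up the cluster sums.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    cluster_sums = np.bincount(inverse.ravel(), weights=resid)
    g = cluster_sums.size
    return np.sqrt(g / (g - 1) * np.sum(cluster_sums**2) / n**2)

def iid_se_of_mean(y):
    """Conventional standard error of the mean, ignoring clustering."""
    y = np.asarray(y, dtype=float)
    return y.std(ddof=1) / np.sqrt(y.size)

# Simulated clustered data: units in the same cluster share a common shock.
rng = np.random.default_rng(0)
n_clusters, per_cluster = 50, 10
cluster_ids = np.repeat(np.arange(n_clusters), per_cluster)
y = rng.normal(size=n_clusters)[cluster_ids] + rng.normal(scale=0.5, size=cluster_ids.size)

mask = draw_cluster_subsample(cluster_ids, share=0.5, rng=rng)
print(mask.sum())  # 250: 25 complete clusters of 10 rows each
print(iid_se_of_mean(y), clustered_se_of_mean(y, cluster_ids))
```

Because observations within a cluster are correlated, the clustered standard error comes out noticeably larger than the conventional one here. When every observation is its own cluster, the two formulas coincide.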
3.3. Parameter overview
The following table summarizes the parameters of the class ModifiedCausalForest that relate to sampling weights and clustering:
| Parameter | Description |
|---|---|
| var_w_name | Name of the variable holding the sampling weight of each observation. |
| gen_weighted | If True, sampling weights from var_w_name are used. |
| var_cluster_name | Name of the variable holding the cluster identifier. |
| gen_panel_data | If True, clustered standard errors based on var_cluster_name are computed. |
| gen_panel_in_rf | If True, clusters are used to draw the random samples when building the forest. Default: True. Only relevant if gen_panel_data is True. |
Please consult the API reference for more details.
3.4. Examples
from mcf.example_data_functions import example_data
from mcf.mcf_functions import ModifiedCausalForest
# Generate example data using the built-in function `example_data()`
training_df, prediction_df, name_dict = example_data()
# Example 1: Modified Causal Forest with sampling weights
my_mcf = ModifiedCausalForest(
var_y_name="outcome",
var_d_name="treat",
var_x_name_ord=["x_cont0", "x_cont1", "x_ord1"],
# Parameters for sampling weights:
var_w_name="weight",
gen_weighted=True
)
# Example 2: Modified Causal Forest with clustering (e.g. panel data)
my_mcf = ModifiedCausalForest(
var_y_name="outcome",
var_d_name="treat",
var_x_name_ord=["x_cont0", "x_cont1", "x_ord1"],
# Parameters for clustering:
var_cluster_name="cluster",
gen_panel_data=True,
gen_panel_in_rf=True
)