Censoring event

When there are censoring events, the package provides the option to obtain inverse probability weighted (IPW) estimates for comparison with the g-formula estimates. Comparing the two estimates can help assess model misspecification of the g-formula [1]. To get the IPW estimate, users need to specify the name of the censoring variable in the input data and a model for the censoring variable, which is used to obtain the weights.
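
For intuition, the weights behind an IPW estimate can be sketched as below. This is the textbook construction (the inverse of the cumulative probability of remaining uncensored), not necessarily how the package computes them internally. The function `censoring_weights` and the array `p_hat` are hypothetical names used only for illustration; in practice the probabilities would come from the fitted censor model.

```python
import numpy as np

def censoring_weights(p_uncensored):
    """Cumulative inverse probability of censoring weights (illustrative only).

    p_uncensored has shape (n_subjects, n_times); entry [i, k] is the
    predicted probability that subject i remains uncensored at time k,
    given that subject's covariate and treatment history.
    """
    p = np.asarray(p_uncensored, dtype=float)
    # The weight at time k is 1 / product of the probabilities up to time k.
    return 1.0 / np.cumprod(p, axis=1)

# Hypothetical predicted probabilities for two subjects over three time points.
p_hat = np.array([[0.9, 0.8, 0.5],
                  [1.0, 0.5, 0.5]])
weights = censoring_weights(p_hat)
print(weights[1])  # second subject: [1. 2. 4.]
```

Subjects who were unlikely to remain uncensored receive larger weights, so that they also stand in for similar subjects who were censored.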

Note that the arguments ‘‘censor_name’’ and ‘‘censor_model’’ are needed only when users want to get the IPW estimate. When they are not specified, the package returns the nonparametric observed risk.

The arguments for censoring events:

Arguments	Description
censor_name	(Optional) A string specifying the name of the censoring variable in obs_data. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data.
censor_model	(Optional) A string specifying the model statement for the censoring variable. Only applicable when using inverse probability weights to estimate the natural course means / risk from the observed data.
ipw_cutoff_quantile	(Optional) Percentile value for truncation of the inverse probability weights.
ipw_cutoff_value	(Optional) Absolute value for truncation of the inverse probability weights.

Users can also specify a percentile value (in the argument ‘‘ipw_cutoff_quantile’’) or an absolute value (in the argument ‘‘ipw_cutoff_value’’) to truncate the inverse probability weights.
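
The effect of truncation can be sketched with plain NumPy. This is a minimal illustration of the idea, not the package's code: `truncate_weights` is a hypothetical helper, and it assumes the quantile is given as a fraction between 0 and 1, which may differ from the scale the package expects.

```python
import numpy as np

def truncate_weights(weights, cutoff_quantile=None, cutoff_value=None):
    """Cap weights at a quantile of their distribution and/or at a fixed value."""
    w = np.asarray(weights, dtype=float).copy()
    if cutoff_quantile is not None:
        # Weights above the chosen quantile are set to that quantile.
        w = np.minimum(w, np.quantile(w, cutoff_quantile))
    if cutoff_value is not None:
        # Weights above the absolute cutoff are set to the cutoff.
        w = np.minimum(w, cutoff_value)
    return w

w = np.array([1.0, 1.2, 1.5, 2.0, 50.0])
by_value = truncate_weights(w, cutoff_value=10.0)        # largest weight becomes 10.0
by_quantile = truncate_weights(w, cutoff_quantile=0.75)  # largest weight becomes 2.0
```

Truncation trades a small amount of bias for lower variance when a few subjects have extreme weights.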

Sample syntax:

censor_name = 'C'
censor_model = 'C ~ A + L'

g = ParametricGformula(..., censor_name = censor_name, censor_model = censor_model, ...)

Note

When the model statement of the censoring variable contains categorical covariates (which are marked with a ‘C’ symbol), please give the censoring variable a name other than ‘C’ to avoid name confusion.
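
One way to follow this advice is to rename the censoring column before building the model statement. The sketch below uses pandas on a small made-up data set; the column name `censor` and the toy values are assumptions chosen for illustration, and `C(L)` stands for a categorical covariate in the model statement.

```python
import pandas as pd

# Toy data in long format; 'C' is the censoring indicator, 'L' is categorical.
obs_data = pd.DataFrame({
    'id': [1, 1, 2, 2],
    't0': [0, 1, 0, 1],
    'L':  [0, 1, 2, 1],
    'A':  [0, 1, 0, 0],
    'C':  [0, 0, 0, 1],
})

# Rename the censoring variable so it cannot be confused with the C(...) symbol.
obs_data = obs_data.rename(columns={'C': 'censor'})

censor_name = 'censor'
censor_model = 'censor ~ A + C(L)'
```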

Running example:

import numpy as np
from pygformula import ParametricGformula
from pygformula.interventions import static
from pygformula.data import load_censor_data

obs_data = load_censor_data()
time_name = 't0'
id = 'id'

covnames = ['L', 'A']
covtypes = ['binary', 'normal']

covmodels = ['L ~ lag1_L + t0',
             'A ~ lag1_A + L + t0']

outcome_name = 'Y'
ymodel = 'Y ~ A + L'

censor_name = 'C'
censor_model = 'C ~ A + L'

time_points = np.max(np.unique(obs_data[time_name])) + 1
int_descript = ['Never treat', 'Always treat']

g = ParametricGformula(obs_data=obs_data, id=id, time_name=time_name,
    time_points=time_points,
    int_descript=int_descript,
    Intervention1_A=[static, np.zeros(time_points)],
    Intervention2_A=[static, np.ones(time_points)],
    censor_name=censor_name, censor_model=censor_model,
    covnames=covnames, covtypes=covtypes, covmodels=covmodels,
    outcome_name=outcome_name, ymodel=ymodel, outcome_type='survival')
g.fit()

Output: