Visit process

When the data are recorded not at regular intervals but every time a patient visits the clinic, the times at which the time-varying covariates are measured vary by subject. In this setting, the data are typically constructed such that (i) at a time when there is no visit/measurement, the last measured value of a covariate is carried forward, and (ii) a subject is censored after a certain number of consecutive times with no visit/measurement [1], [2].
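The data construction in (i) and (ii) can be illustrated with a small standalone sketch. It is not part of pygformula: the function name `build_interval_records`, the record layout, and the choice to censor once the run of missed intervals exceeds the cutoff are all assumptions made for illustration.

```python
def build_interval_records(measurements, n_intervals, max_missed):
    """Expand irregular measurements into one record per interval.

    measurements: dict mapping interval index -> measured value
                  (a measurement at interval 0 is assumed here).
    max_missed:   hypothetical cutoff; records stop once the run of
                  consecutive missed intervals exceeds it.
    """
    records = []
    last_value = None
    missed_run = 0
    for k in range(n_intervals):
        if k in measurements:
            # A visit: record the new value and reset the run of misses.
            visit, last_value, missed_run = 1, measurements[k], 0
        else:
            # No visit: the last measured value is carried forward.
            visit, missed_run = 0, missed_run + 1
        if missed_run > max_missed:
            break  # censor: too many consecutive intervals without a visit
        records.append({"k": k, "visit": visit, "value": last_value})
    return records


rows = build_interval_records({0: 510, 1: 495, 4: 430}, n_intervals=10, max_missed=3)
for row in rows:
    print(row)
```

In this toy history the subject is measured at intervals 0, 1 and 4. The value 495 is carried forward through intervals 2 and 3, and the records stop after interval 7 because interval 8 would be the fourth consecutive miss.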

In pygformula, the deterministic knowledge (i) and (ii) can be incorporated via the argument ‘‘visitprocess’’. Each vector in ‘‘visitprocess’’ contains three parameters that attach a visit process to one covariate. The first parameter is the name of a time-varying indicator in the input data set of whether a covariate was measured in each interval (1 means there is a visit/measurement, 0 means there is no visit/measurement). The second parameter is the name of the covariate. The third parameter is the maximum number s of missed measurements of this covariate allowed since the last measurement before a subject is censored.

For the visit indicator, in the fitting step, the probability of a visit is estimated using only records where the number of consecutive missed visits through the previous k-1 time points is less than the maximum number of consecutive missed visits s. Then, in the simulation step, if the number of consecutive missed visits through the previous k-1 time points is less than s, the visit indicator is simulated from a distribution based on this estimate; otherwise, the visit indicator is set to 1 so as to eliminate subjects with more than s consecutive missed visits.

For the covariate, in the fitting step, the conditional mean of the covariate is estimated using only data records where there is a current visit. In the simulation step, if the visit indicator equals 1, the value of the dependent covariate is generated from a distribution based on this estimate; otherwise, the last value is carried forward.
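The simulation rule described above can be sketched for a single subject in plain Python. This is a minimal illustration and not pygformula's internal code: `simulate_step`, the fixed visit probability `p_visit`, and the normal draw standing in for the fitted covariate model are all hypothetical.

```python
import random


def simulate_step(missed_run, last_value, p_visit, draw_covariate, max_missed, rng):
    """Simulate one interval of a visit indicator and its dependent covariate.

    missed_run:     consecutive missed visits through the previous interval
    p_visit:        visit probability from a fitted visit model (assumed given)
    draw_covariate: callable returning a draw from a fitted covariate model
    """
    if missed_run < max_missed:
        # Fewer than s consecutive misses so far: simulate the indicator.
        visit = 1 if rng.random() < p_visit else 0
    else:
        # s consecutive misses already: force a visit.
        visit = 1
    if visit == 1:
        value = draw_covariate(rng)  # new value from the covariate model
        missed_run = 0
    else:
        value = last_value  # carry the last value forward
        missed_run += 1
    return visit, value, missed_run


rng = random.Random(0)
missed_run, value = 0, 500.0
history = []
for k in range(200):
    visit, value, missed_run = simulate_step(
        missed_run, value, p_visit=0.3,
        draw_covariate=lambda r: r.gauss(500.0, 50.0),
        max_missed=3, rng=rng)
    history.append((visit, value, missed_run))

# The run of consecutive misses never exceeds max_missed.
print(max(run for _, _, run in history))
```

Because a visit is forced whenever the run of misses reaches `max_missed`, no simulated history contains more than `max_missed` consecutive missed visits, which mirrors the censoring rule used to construct the observed data.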

The argument for visit process:

| Arguments | Description |
|---|---|
| visitprocess | (Optional) List of lists. Each inner list contains, as its first entry, the name of the visit indicator covariate; as its second entry, the name of the covariate whose modeling depends on the visit process; and as its third entry, the maximum number of consecutive visits that can be missed before an individual is censored. |

covnames = ['visit_cd4', 'visit_rna', 'cd4_v', 'rna_v', 'everhaart']
covtypes = ['binary', 'binary', 'normal', 'normal', 'binary']
covmodels = ['visit_cd4 ~ lag1_everhaart + lag_cumavg1_cd4_v + sex + race + month',
             'visit_rna ~ lag1_everhaart + lag_cumavg1_rna_v + sex + race + month',
             'cd4_v ~ lag1_everhaart + lag_cumavg1_cd4_v + sex + race + month',
             'rna_v ~ lag1_everhaart + lag_cumavg1_rna_v + sex + race + month',
             'everhaart ~ lag1_everhaart + cd4_v + rna_v + sex + race + month']

visitprocess = [['visit_cd4', 'cd4_v', 3], ['visit_rna', 'rna_v', 3]]

g = ParametricGformula(..., covnames = covnames, covtypes = covtypes, covmodels = covmodels, visitprocess = visitprocess, ...)

This example comes from clinical cohorts of HIV-positive patients. ‘‘cd4_v’’ is a time-varying covariate holding the CD4 cell count measurement, and the visit indicator ‘‘visit_cd4’’ indicates whether a CD4 cell count measurement was taken in interval k. The value 3 means that the data are constructed such that subjects are censored once they have not had CD4 measured for 3 consecutive intervals. Note that the visit indicator ‘‘visit_cd4’’ should come before the dependent covariate ‘‘cd4_v’’ in ‘‘covnames’’ and be assigned the ‘‘binary’’ covariate type in ‘‘covtypes’’.
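The ordering and type requirements noted above are easy to get wrong when a specification is edited by hand. The following helper is a hypothetical pre-flight check, not a pygformula function; `check_visitprocess` and its messages are invented for illustration and only restate the rules described in this section.

```python
def check_visitprocess(visitprocess, covnames, covtypes):
    """Return a list of problems found in a visitprocess specification."""
    problems = []
    for visit_name, cov_name, max_missed in visitprocess:
        if visit_name not in covnames or cov_name not in covnames:
            problems.append(f"{visit_name!r} or {cov_name!r} is missing from covnames")
            continue
        visit_pos = covnames.index(visit_name)
        cov_pos = covnames.index(cov_name)
        # The visit indicator must precede its dependent covariate.
        if visit_pos > cov_pos:
            problems.append(f"{visit_name!r} must come before {cov_name!r} in covnames")
        # The visit indicator must be modeled as a binary covariate.
        if covtypes[visit_pos] != 'binary':
            problems.append(f"{visit_name!r} must have covariate type 'binary'")
        # The third entry is a count of allowed consecutive misses.
        if not isinstance(max_missed, int) or max_missed < 0:
            problems.append(f"maximum missed visits for {visit_name!r} must be a non-negative integer")
    return problems


covnames = ['visit_cd4', 'visit_rna', 'cd4_v', 'rna_v', 'everhaart']
covtypes = ['binary', 'binary', 'normal', 'normal', 'binary']
visitprocess = [['visit_cd4', 'cd4_v', 3], ['visit_rna', 'rna_v', 3]]

print(check_visitprocess(visitprocess, covnames, covtypes))
```

For the specification used in this section the check finds no problems. Moving ‘‘cd4_v’’ ahead of ‘‘visit_cd4’’ in ‘‘covnames’’ would be reported as an ordering problem.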

Running example:

import numpy as np
from pygformula import ParametricGformula
from pygformula.interventions import static
from pygformula.data import load_visit_process

obs_data = load_visit_process()
time_name = 'month'
id = 'id'

covnames = ['visit_cd4', 'visit_rna', 'cd4_v', 'rna_v', 'everhaart']
covtypes = ['binary', 'binary', 'normal', 'normal', 'binary']
covmodels = ['visit_cd4 ~ lag1_everhaart + lag_cumavg1_cd4_v + sex + race + month',
             'visit_rna ~ lag1_everhaart + lag_cumavg1_rna_v + sex + race + month',
             'cd4_v ~ lag1_everhaart + lag_cumavg1_cd4_v + sex + race + month',
             'rna_v ~ lag1_everhaart + lag_cumavg1_rna_v + sex + race + month',
             'everhaart ~ lag1_everhaart + cd4_v + rna_v + sex + race + month']

basecovs = ['sex', 'race', 'age']

visitprocess = [['visit_cd4', 'cd4_v', 3], ['visit_rna', 'rna_v', 3]]

outcome_name = 'event'
ymodel = 'event ~ cd4_v + rna_v + everhaart + sex + race + month'

time_points = np.max(np.unique(obs_data[time_name])) + 1

int_descript = ['Never treat', 'Always treat']

g = ParametricGformula(obs_data = obs_data, id = id,  time_name = time_name,
    visitprocess = visitprocess,
    int_descript = int_descript,
    Intervention1_everhaart = [static, np.zeros(time_points)],
    Intervention2_everhaart = [static, np.ones(time_points)],
    covnames=covnames, covtypes=covtypes,
    covmodels=covmodels, basecovs = basecovs,
    outcome_name=outcome_name, ymodel=ymodel, outcome_type='survival')
g.fit()

Output:

[Figure: output of the running example (visitprocess_example_output.png)]