A Simulation-based Decision Guide to Evaluate Missing Data in Clinical Studies
Comparative bioavailability (BA) and bioequivalence (BE) studies are among the most common clinical trials for both new drug innovators and generic manufacturers. As in other studies with pharmacokinetic (PK) endpoints, selecting an adequate and robust sampling schedule is essential for reliable estimation of PK parameters and accurate BE assessment.
Although the derived endpoints are only approximations of drug characteristics and some missing data points are inevitable in clinical trials, insufficient data undermine the precise determination of PK parameters. The impact of missing data on modelling approaches and analysis methods, and how to handle missing data in PK modelling, have been extensively studied. Less is known about the extent and consequences of incomplete PK data for BE assessment, as we discussed in the previous part of this series (refer to Part 1).
The most common approach to handling missing data that affect PK parameter estimation in BE studies is to exclude the impacted parameter(s) from the BE assessment. However, some argue that in certain cases including the affected data might benefit the BE assessment. Moreover, the criteria used to determine the severity of the impact are also controversial. How many missing data points are significant, and are some time points more important than others?
Therefore, we conducted simulations with various extents of randomly incomplete sampling and investigated the effects of the missing data on the BE results when the impacted parameter(s) were included or excluded. In this poster, we focus on measurements around the maximum concentration (Cmax), which is a critical parameter and often shows the highest variability in BE assessment.
We simulated three theoretical studies of Dasatinib tablet, a highly variable drug with a reported intra-subject variability (ISCV) ranging from 30% to 70%.
The simulated 2-treatment, 2-period crossover studies, each including 50 subjects, were generated in Python on Google Colab from pooled data of 280 subjects who provided reliable concentration data for all 16 pre-defined time points. These are denoted as the “complete” datasets. The “complete” datasets demonstrated bioequivalence between the two treatments, which exhibited similar concentration-time profiles, with an estimated ISCV of up to 36%.
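The sampling step can be sketched as follows. This is a minimal illustration, not the code used for the poster: the subject IDs, the seed, and the choice to draw without replacement within each study are assumptions. Only the pool size (280), the study size (50), and the number of studies (3) come from the text.

```python
import numpy as np

POOL_SIZE = 280   # subjects with complete profiles (from the text)
STUDY_SIZE = 50   # subjects per simulated study (from the text)
N_STUDIES = 3     # theoretical studies (from the text)


def draw_studies(pool_ids, n_studies=N_STUDIES, study_size=STUDY_SIZE, seed=0):
    """Draw the subject IDs that make up each simulated study.

    Assumption: subjects are drawn without replacement within a study,
    but the same subject may appear in more than one study.
    """
    rng = np.random.default_rng(seed)
    return [
        np.sort(rng.choice(pool_ids, size=study_size, replace=False))
        for _ in range(n_studies)
    ]


pool_ids = np.arange(1, POOL_SIZE + 1)  # hypothetical subject IDs 1..280
studies = draw_studies(pool_ids)
```

Each entry of `studies` can then be used to select the corresponding rows from a table of pooled concentration data.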
“Incomplete” profiles were simulated in Python (NumPy and pandas), with missing rates (i.e. the percentage of subjects who had at least one missing time point) ranging from low to high and the number of missing time points per subject varying from 1 to 7 within the reported Tmax range (0.5 to 3 hours).
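One way such incomplete profiles could be generated is sketched below. The function name, the sampling times, and the placeholder concentrations are hypothetical. The text specifies only the definition of missing rate, the 1 to 7 missing points per subject, and the 0.5 to 3 hour window. The matrix is assumed to hold the concentrations of one treatment period.

```python
import numpy as np


def mask_tmax_window(conc, times, missing_rate, n_missing,
                     window=(0.5, 3.0), seed=0):
    """Return a copy of `conc` (subjects x time points) with NaNs inserted.

    missing_rate: fraction of subjects with at least one missing time point.
    n_missing:    number of time points removed per affected subject.
    Removed points are chosen at random inside the Tmax window.
    """
    rng = np.random.default_rng(seed)
    conc = np.array(conc, dtype=float)  # np.array copies, input stays intact
    times = np.asarray(times, dtype=float)
    in_window = np.flatnonzero((times >= window[0]) & (times <= window[1]))
    if n_missing > in_window.size:
        raise ValueError("not enough sampling times inside the window")

    n_subjects = conc.shape[0]
    n_affected = int(round(missing_rate * n_subjects))
    affected = rng.choice(n_subjects, size=n_affected, replace=False)
    for subject in affected:
        cols = rng.choice(in_window, size=n_missing, replace=False)
        conc[subject, cols] = np.nan
    return conc


# Hypothetical schedule of 16 sampling times (hours), 8 of them in the window.
times = np.array([0, 0.25, 0.5, 0.75, 1, 1.25, 1.5, 2,
                  2.5, 3, 4, 6, 8, 12, 16, 24])
complete = np.ones((50, times.size))  # placeholder concentrations
incomplete = mask_tmax_window(complete, times, missing_rate=0.4, n_missing=7)
```

Returning a copy keeps the complete dataset available for the side-by-side comparison with each incomplete scenario.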
Non-compartmental analysis to estimate the PK parameters and the bioequivalence assessment were performed on the simulated data using PROC GLM in WinNonlin® Phoenix 8.3. For each simulation, two BE assessments were conducted: one including and one excluding the subjects with missing data in the BE dataset. Any scenario that did not meet the FDA-defined BE criteria of 80.00-125.00% for Cmax and/or AUCs was considered to have a critical impact on the BE results (Figure 1).
Figure 1 – Illustrated flowchart of the simulation procedure and representative scenarios of missing data
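The BE decision rule can be illustrated with a simplified calculation. The sketch below is not the model used in the study: it treats the crossover as paired data, ignores period and sequence effects, and uses a normal quantile instead of Student's t, so its interval is only an approximation. The function names and the Cmax values are hypothetical. Only the 90% confidence level and the 80.00-125.00% limits come from the text.

```python
import math
from statistics import NormalDist, mean, stdev


def be_ci_paired(test, ref, level=0.90):
    """Approximate CI (in %) for the geometric mean ratio test/reference.

    Simplification: paired log-differences with a normal quantile.
    Pairs with a missing value (None or NaN) are dropped.
    Returns (lower, point estimate, upper).
    """
    diffs = [
        math.log(t) - math.log(r)
        for t, r in zip(test, ref)
        if t is not None and r is not None
        and not math.isnan(t) and not math.isnan(r)
    ]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    se = stdev(diffs) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    centre = mean(diffs)
    return (100 * math.exp(centre - z * se),
            100 * math.exp(centre),
            100 * math.exp(centre + z * se))


def meets_be(lower, upper, limits=(80.00, 125.00)):
    """True when the whole interval lies inside the BE acceptance limits."""
    return lower >= limits[0] and upper <= limits[1]


# Hypothetical Cmax values (ng/mL) for six subjects.
cmax_test = [95.0, 110.0, 87.0, 102.0, 120.0, 99.0]
cmax_ref = [100.0, 105.0, 90.0, 98.0, 115.0, 104.0]
lower, ratio, upper = be_ci_paired(cmax_test, cmax_ref)
print(f"GMR {ratio:.2f}% (90% CI {lower:.2f}-{upper:.2f}%), "
      f"BE met: {meets_be(lower, upper)}")
```

Running the function once on all subjects and once on only the subjects with complete profiles mirrors the "including" and "excluding" assessments described above.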
Additionally, the “incomplete” concentration-time profiles were compared with the profile of the “complete” dataset to further evaluate the potential impact of the missing data on PK estimation.
Fourteen simulations and representative scenarios were generated for each dataset.
In brief, when subjects with missing data were included in the BE analysis, the outcomes remained unaltered until 40% of subjects had at least 7 missing time points within the same period. Missing rates beyond 40% also did not affect the BE results unless the extent of missing values was substantial (> 5 time points), which is not commonly seen in quality-controlled trials.
This suggests that the impact of missing data on the two-sided 90% CI could be mitigated by a robust sampling schedule around Tmax. In contrast, excluding subjects with missing data resulted in more cases of unmet bioequivalence, specifically at missing rates of 40% and above, most likely because of the reduced sample size (Figure 2).
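The sample-size effect can be made concrete with a back-of-the-envelope calculation. The sketch below assumes a balanced 2x2 crossover, for which the standard error of the log-scale treatment difference is sigma_w * sqrt(2/n) with sigma_w^2 = ln(CV^2 + 1). It uses a normal quantile rather than Student's t, so the intervals are slightly too narrow. The true ratio of 1.0 is a hypothetical choice. The ISCV of 36% and the 50 subjects come from the text.

```python
import math
from statistics import NormalDist


def expected_ci(gmr, cv_intra, n_subjects, level=0.90):
    """Approximate CI (in %) around a given geometric mean ratio.

    Assumes a balanced 2x2 crossover and a normal quantile (approximation).
    """
    sigma_w = math.sqrt(math.log(cv_intra ** 2 + 1))  # within-subject SD, log scale
    se = sigma_w * math.sqrt(2 / n_subjects)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 100 * gmr * math.exp(-z * se), 100 * gmr * math.exp(z * se)


# Effect of excluding a growing share of the 50 subjects (ISCV 36%).
for missing_rate in (0.0, 0.2, 0.4, 0.6):
    n_left = round(50 * (1 - missing_rate))
    lo, hi = expected_ci(gmr=1.0, cv_intra=0.36, n_subjects=n_left)
    print(f"missing rate {missing_rate:.0%}: n = {n_left}, "
          f"approx. 90% CI {lo:.2f}-{hi:.2f}%")
```

The interval widens in proportion to 1/sqrt(n), so a study whose point estimate is not close to 100% can drift outside 80.00-125.00% purely because subjects were removed.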
To further explore the higher tolerance of missing data when the impacted subjects were included, we also compared the descriptive statistics and the PK profiles between the two approaches. While incorporating subjects with missing data had minimal effect on the BE conclusion, the actual PK parameter values were skewed and the shape of the PK profile changed (Figure 3).
It is worth noting that the impact of missing data on PK parameter estimation and PK profiling depends on many factors, of which the sampling schedule is one of the most important. Because the study drug is highly variable and the sample collection schedule around Tmax is robust, the change may be less pronounced when the missing rate is not substantial. AUCt is less susceptible to this impact than Cmax.
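A small non-compartmental calculation shows why a sample missing near the peak affects Cmax more than AUCt. The sketch below uses the linear trapezoidal rule and simply skips missing samples, which is an assumption about how gaps are handled. The function name and the concentration profile are made up for illustration.

```python
import math


def nca(times, conc):
    """Cmax, Tmax and AUC(0-t) by the linear trapezoidal rule.

    Missing samples (None or NaN) are skipped, so the trapezoid spans the gap.
    """
    pts = [(t, c) for t, c in zip(times, conc)
           if c is not None and not math.isnan(c)]
    if len(pts) < 2:
        raise ValueError("need at least two observed samples")
    tmax, cmax = max(pts, key=lambda p: p[1])
    auct = sum((t2 - t1) * (c1 + c2) / 2
               for (t1, c1), (t2, c2) in zip(pts, pts[1:]))
    return {"cmax": cmax, "tmax": tmax, "auct": auct}


# Hypothetical profile (hours, ng/mL) with and without the 1 h sample.
times = [0, 0.5, 1, 2, 4, 8]
full = nca(times, [0, 40, 100, 60, 20, 5])
gap = nca(times, [0, 40, float("nan"), 60, 20, 5])
print(full)
print(gap)
```

In this made-up profile, losing the 1 h sample lowers Cmax from 100 to 60 (40%) and shifts Tmax from 1 h to 2 h, while AUCt falls from 255 to 215 (about 16%).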
Similar observations were noted for all three datasets. Notably, in one dataset, at a missing rate of 20% with 7 missing time points per subject, including the affected subjects in the BE dataset led to unmet BE. This further suggests that missing multiple time points per subject is more critical than missing fewer time points in more subjects.
While excluding potentially unreliable PK parameters from the BE analysis may lead to an insufficient sample size for BE assessment, including subjects with missed critical points is unlikely to change the BE conclusion unless the extent of missing data is substantial (40% missing rate with 7 time points around Tmax). However, including subjects with missing data introduces bias in the PK parameter estimation and compromises study reliability.
This simulation is not intended to suggest an acceptable missing rate; rather, it emphasizes the importance of balancing the statistical criteria against the practical value of the data obtained from the study. Moreover, multiple missing data points in a clinical trial cannot be recommended under any circumstances: not only might they impact data integrity, but they might also suggest potential concerns with study compliance and validation.
These findings were presented and discussed at the 2023 AAPS PharmSci 360. The poster is available for download here.
Look out for our future publications, which will include more discussion and simulations of other missing-data scenarios. If you want to discuss this or other scientific topics, or are looking for further cooperation with us, please visit our Contact page and reach out to our experts.
Written by: Sunny Le, Pharmacokinetics Manager