Robust Neighborhood Confidence Interval and Width to Evaluate the Outcome of a Binary Random Variable with Unequal Cluster Sizes
Abstract
Introduction: Confidence intervals (CIs) provide a more informative evaluation of observed outcomes than point estimates alone, especially when the risk of an event depends on cluster size. Although CIs are commonly used to express uncertainty about future data, this study focuses on their role in quantifying variability within currently observed outcomes. Specifically, the width of the confidence interval serves as an indicator of existing intra-cluster heterogeneity, reflecting the extent of variability across clusters of different sizes. We introduce a novel method for evaluating observed outcomes of a dichotomous random variable in datasets with unequal cluster sizes. By using a robust neighborhood CI width, the approach gives a more reliable and adaptive estimate of intra-cluster variability and supports a more accurate interpretation of the current data distribution.

Methods: We introduce a novel algorithm for constructing an intra-cluster robust neighborhood CI and its corresponding width for each cluster. The method ranks clusters by CI width, from narrowest to widest, providing a systematic way to quantify intra-cluster variability. Evaluating the observed values within these ranked intervals gives a more precise assessment of data heterogeneity. To illustrate the method, we present a simulation study assessing its finite-sample performance and a real-world application to antimicrobial resistance data with binary outcomes and unequal cluster sizes.

Results: The robust neighborhood intra-cluster CI width was successfully derived for interpreting binary outcome data with unequal cluster sizes. Narrow CIs indicated minimal random variation, suggesting higher reliability of the observed results, whereas wider intervals indicated greater intra-cluster variability.

Conclusion: The intra-cluster robust neighborhood CI and its corresponding width provide a useful tool for analyzing binary outcome data with unequal cluster sizes. By systematically quantifying variability within clusters, the method improves the interpretation of observed results and allows more reliable intra-cluster comparisons.
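The abstract does not specify how the robust neighborhood interval itself is constructed, so the sketch below illustrates only the ranking step described in Methods: compute an interval and its width for each cluster, then order clusters from narrowest to widest. As a plainly labeled stand-in, it uses the standard Wilson score interval for a binomial proportion, not the authors' robust neighborhood interval. The function names `wilson_interval` and `rank_clusters_by_width` and the cluster counts are hypothetical and for illustration only.

```python
import math
from statistics import NormalDist


def wilson_interval(events: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    STAND-IN ONLY: this is a standard interval, not the robust
    neighborhood interval described in the abstract.
    """
    if n <= 0:
        raise ValueError("cluster size must be positive")
    if not 0 <= events <= n:
        raise ValueError("events must lie in [0, n]")
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clip to [0, 1] to guard against floating-point spill at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


def rank_clusters_by_width(clusters: dict[str, tuple[int, int]], level: float = 0.95):
    """Return (cluster, lower, upper, width) tuples sorted narrowest to widest."""
    rows = []
    for name, (events, n) in clusters.items():
        lo, hi = wilson_interval(events, n, level)
        rows.append((name, lo, hi, hi - lo))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    # Hypothetical clusters: (number of positive outcomes, cluster size).
    demo = {"A": (12, 200), "B": (5, 20), "C": (40, 90)}
    for name, lo, hi, width in rank_clusters_by_width(demo):
        print(f"{name}: [{lo:.3f}, {hi:.3f}] width={width:.3f}")
```

Because the Wilson width shrinks roughly in proportion to 1/√n, larger clusters tend to rank as narrower, which matches the abstract's reading of narrow intervals as low random variation. Replacing `wilson_interval` with the robust neighborhood construction would leave the ranking logic unchanged.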