Dealing With Sparse Data Bias in Medical Sciences: Comprehensive Review of Methods and Applications

Mohammad Hossein Panahi; Kazem Mohammad; Razieh Bidhendi Yarandi; Fahimeh Ramezani Tehrani

doi:10.18502/acta.v58i11.5147

Keywords: Bayesian method; Complete/Quasi-complete separation; Data augmentation; Penalization methods; Sparse data bias

Abstract

This study illustrates the problem of complete and quasi-complete separation, which arises from sparse data patterns in medical data. We present the failure of traditional estimation methods and then give an overview of popular remedial approaches for reducing bias, using worked examples. Penalized maximum likelihood estimation and Bayesian methods are among the remedial tools introduced. Data from the Tehran Thyroid and Pregnancy Study, a two-phase cohort study conducted from September 2013 through February 2016, were used for illustration. The reduction in the bias of the estimates showed how effective these methods are compared with the traditional method. Extremely large measures of association, such as risk ratios, together with extraordinarily wide confidence intervals, showed that traditional estimation methods are futile for sparse data, although they are still widely applied and reported. In this review paper, we introduce some advanced methods, such as data augmentation, to obtain less biased estimates.
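
The separation problem described in the abstract can be seen in a single 2x2 table. The sketch below is illustrative only and is not taken from the paper: the counts are hypothetical (they are not Tehran Thyroid and Pregnancy Study data), the function name `odds_ratio` is our own, and it uses the odds ratio with the Haldane-Anscombe correction (adding 0.5 to every cell) as a simple stand-in for the augmentation and penalization methods the review covers.

```python
import math

def odds_ratio(a, b, c, d, pseudo=0.0):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a, b: exposed cases / exposed non-cases
    c, d: unexposed cases / unexposed non-cases
    pseudo: pseudo-count added to every cell (0.5 = Haldane-Anscombe)
    """
    a, b, c, d = (v + pseudo for v in (a, b, c, d))
    if min(a, b, c, d) == 0:
        # A zero cell: the maximum likelihood estimate is 0, infinite,
        # or undefined, and no finite Wald interval exists.
        if a * d == 0 and b * c == 0:
            est = math.nan
        elif b * c == 0:
            est = math.inf
        else:
            est = 0.0
        return est, (math.nan, math.nan)
    log_or = math.log(a * d / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return math.exp(log_or), ci

# Hypothetical sparse table: no cases among the unexposed (c = 0).
table = dict(a=6, b=14, c=0, d=20)

raw_or, raw_ci = odds_ratio(**table)                   # traditional estimate
adj_or, adj_ci = odds_ratio(**table, pseudo=0.5)       # augmented estimate

print("unadjusted:", raw_or, raw_ci)
print("augmented: ", round(adj_or, 2), tuple(round(v, 2) for v in adj_ci))
```

With the zero cell, the unadjusted estimate is infinite and has no usable interval, while the augmented table yields a finite estimate. The interval remains wide, which reflects how little information sparse data carry rather than a flaw in the correction.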