Bounded Multivariate Contaminated Normal Mixture Model with an Application in Skin Cancer Detection

  • Abbas Mahdavi, Department of Statistics, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran.
Keywords: ECME algorithm; Mixture model; Contaminated normal distribution; Bounded distribution.

Abstract

Introduction: Outliers are common in real-world datasets and can substantially reduce the accuracy and reliability of statistical analyses. Detecting them and developing models that remain robust in their presence is a central challenge in data analysis. Natural images, for instance, can have complex distributions of values because of environmental factors such as noise and illumination, producing objects with overlapping regions and non-trivial contours that Gaussian mixture models cannot describe accurately. In addition, in many real-life applications the observed data fall within a bounded support region, which motivates bounded support mixture models. Motivated by these observations, we introduce a bounded multivariate contaminated normal distribution for fitting data that are non-Gaussian, asymmetric, and of bounded support. Because rare observations are given less weight in the calculations, finite mixture models built on this distribution are more robust.

Methods: We introduce a family of finite mixtures of bounded multivariate contaminated normal distributions. Because the model is heavy-tailed and bounded, it is well suited to computer vision and pattern recognition problems and offers flexibility in modeling data that contain outliers. A feasible expectation-maximization algorithm that uses a selection mechanism is developed to compute the maximum likelihood estimates of the model parameters.
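
As a rough illustration of the kind of density involved, the sketch below evaluates a single bounded contaminated normal component. It is not the paper's implementation. It assumes the common parameterization alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov) with eta > 1, a rectangular support region, and a Monte Carlo estimate of the normalizing constant in place of whatever the paper uses. All function and parameter names are hypothetical.

```python
import numpy as np


def mvn_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    diff = x - mean
    # Squared Mahalanobis distances via a linear solve (no explicit inverse).
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return np.exp(-0.5 * (d * np.log(2.0 * np.pi) + logdet + maha))


def cn_pdf(x, mean, cov, alpha, eta):
    """Assumed form: alpha * N(mean, cov) + (1 - alpha) * N(mean, eta * cov)."""
    return alpha * mvn_pdf(x, mean, cov) + (1.0 - alpha) * mvn_pdf(x, mean, eta * cov)


def box_mass(mean, cov, alpha, eta, lower, upper, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the mass the unbounded density puts on the box."""
    rng = np.random.default_rng(seed)
    good = rng.random(n_draws) < alpha
    # Draws from the contaminating part have covariance eta * cov.
    scale = np.where(good, 1.0, np.sqrt(eta))[:, None]
    z = rng.multivariate_normal(np.zeros_like(mean), cov, size=n_draws)
    draws = mean + scale * z
    inside = np.all((draws >= lower) & (draws <= upper), axis=1)
    return inside.mean()


def bounded_cn_pdf(x, mean, cov, alpha, eta, lower, upper, mass):
    """Contaminated normal truncated to [lower, upper] and renormalized by mass."""
    x = np.atleast_2d(x)
    inside = np.all((x >= lower) & (x <= upper), axis=1)
    return np.where(inside, cn_pdf(x, mean, cov, alpha, eta) / mass, 0.0)


# Illustrative values only: two features scaled to the unit square.
mean = np.array([0.5, 0.5])
cov = np.array([[0.05, 0.01], [0.01, 0.04]])
lower, upper = np.zeros(2), np.ones(2)
alpha, eta = 0.9, 10.0

mass = box_mass(mean, cov, alpha, eta, lower, upper)
points = np.array([[0.5, 0.5], [1.2, 0.3]])  # second point lies outside the box
density = bounded_cn_pdf(points, mean, cov, alpha, eta, lower, upper, mass)
```

Truncation sets the density to zero outside the support and rescales it inside, so a point within the box receives a higher density than under the unbounded model. A finite mixture would combine several such components with mixing weights.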

Results: The proposed methodology is validated through experiments on simulated data and on two real natural images of skin cancer, with parameters estimated by the proposed expectation-maximization algorithm. The results show that the proposed model improves accuracy in segmenting skin lesions.

Conclusion: A reliable approach to model-based clustering using finite mixtures of bounded multivariate contaminated normal distributions is introduced. An expectation-maximization algorithm is developed to estimate the parameters, with closed-form expressions used at the E-step. Practical tests on images for skin cancer detection show improved accuracy in delineating skin lesions.

Published: 2024-12-08
Section: Articles