Empowering data analysis and machine learning to predict asthma intensity using air pollutants in conjunction with environmental factors

Priyanshi Kotlia; Janmejay Pant; Manoj Chandra Lohani

doi:10.18502/japh.v11i2.21873

Priyanshi Kotlia, School of Computing, Bhimtal Campus, Graphic Era Hill University, India
Janmejay Pant, School of Computing, Bhimtal Campus, Graphic Era Hill University, India
Manoj Chandra Lohani, Department of Centre for Promotion of Research, Graphic Era (Deemed to be) University, Dehradun, Uttarakhand, India

Keywords: Air pollutants; Machine learning; Logistic regression; Random Forest classifier; Gradient boosting (XGBoost)

Abstract

Introduction: Asthma is a respiratory disease whose severity is affected by air pollutants and environmental factors. Predicting asthma severity can help in disease monitoring and control. The objective of this research is to develop a machine learning model that predicts asthma severity from environmental and demographic factors.

Materials and methods: Data for different districts in Uttarakhand, India, were obtained from government sources. Asthma severity was the output (dependent) feature. The input (independent) features were air pollutants, namely particulate matter (PM2.5, PM10), nitrogen dioxide (NO₂), sulfur dioxide (SO₂), ozone (O₃) and carbon monoxide (CO); environmental factors (temperature, humidity, wind speed); socio-economic factors (age, gender); and a pollution index. Logistic Regression, Random Forest and XGBoost machine learning models were used for multi-class classification. Model performance was measured by accuracy, precision, recall and F1-score.
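As an illustration of the evaluation metrics named above, the following is a minimal sketch of how accuracy and per-class precision, recall, F1-score and support can be computed for a multi-class problem. The function name `per_class_report` and the toy labels are hypothetical and chosen only for this example; they are not the study's code or data. The toy labels are constructed so that one class has a single sample, which shows how a class with a support of 1 receives an F1-score of 0.00 when its only sample is misclassified.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return overall accuracy and per-class precision, recall, F1 and support."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)      # number of true samples per class
    pred_count = Counter(y_pred)   # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    report = {}
    for c in labels:
        # Guard against division by zero when a class is never predicted
        # or never occurs in the true labels.
        precision = true_pos[c] / pred_count[c] if pred_count[c] else 0.0
        recall = true_pos[c] / support[c] if support[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support[c],
        }

    accuracy = sum(true_pos.values()) / len(y_true)
    return accuracy, report


# Hypothetical toy labels (NOT the study's data): three severity classes,
# with class 2 represented by a single sample that is predicted as class 1.
y_true = [0] * 6 + [1] * 3 + [2]
y_pred = [0] * 6 + [1] * 3 + [1]

accuracy, report = per_class_report(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
for label, scores in report.items():
    print(label, {k: round(v, 2) for k, v in scores.items()})
```

In this toy case the overall accuracy stays high even though the minority class is never predicted, which is why per-class scores are reported alongside accuracy.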

Results: Logistic Regression had the highest accuracy (98%) compared to Random Forest and XGBoost (both 89%). It had good class performance, with F1-scores of 0.99 (class 0) and 0.96 (class 1). However, none of the models could predict the minority class (class 2), which had an F1-score of 0.00 (support=1).

Conclusion: Our findings show the potential of machine learning models, especially Logistic Regression, to predict asthma severity from environmental data. However, the study is limited by the exclusion of factors such as smoking, obesity, genetics, previous asthma, and medication.