A new imputation method for population mean in the presence of missing data based on a transformed variable with applications to air pollution data in Chiang Mai, Thailand

  • Natthapat Thongsak State Audit Office of the Kingdom of Thailand, Bangkok, Thailand
  • Nuanpan Lawson Department of Applied Statistics, Faculty of Applied Science, King Mongkut’s University of Technology North Bangkok, Bangkok, Thailand
Keywords: Imputation method; Missing data; Transformed variable; Air pollution data; Mean square error


Introduction: Chiang Mai has ranked first in the world for levels of fine particulate matter, which seriously damages human health. Fine particulate matter can enter the human body and bloodstream, damaging organ systems and increasing the risk of chronic disease and cancer, even in people who do not smoke and have no other morbidities. The Thai government must address this problem before it is too late, because dust levels above standard guidelines put the whole nation’s health at risk. Collecting pollution data can help in finding solutions and in preventing the situation from becoming hazardous. Unfortunately, some pollution data are missing, and the missing values must be dealt with before analysis to obtain accurate results.

Materials and methods: A new imputation method for estimating the population mean, based on a transformed variable, is proposed under simple random sampling without replacement and a uniform nonresponse mechanism. The bias and mean square error of the proposed estimator are investigated up to the first order of approximation. The performance of the proposed estimator is studied through applications to air pollution data from Chiang Mai, Thailand.
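
The sampling setup described above can be illustrated with a minimal simulation sketch. It does not implement the proposed transformed-variable estimator, whose form is not given in this abstract; instead it compares mean imputation with the classical ratio imputation method on a synthetic population. The function names, the population parameters, and the choice of a fixed number of respondents r to represent uniform nonresponse are all assumptions made for illustration.

```python
import numpy as np


def ratio_impute_mean(y_resp, x_resp, x_sample):
    """Sample mean of y after ratio imputation of the nonrespondents.

    Each missing y_i is replaced by (ybar_r / xbar_r) * x_i, so the
    resulting mean equals ybar_r * xbar_n / xbar_r.
    """
    slope = y_resp.mean() / x_resp.mean()
    x_missing_total = x_sample.sum() - x_resp.sum()
    return (y_resp.sum() + slope * x_missing_total) / len(x_sample)


def simulate(pop_x, pop_y, n, r, reps=2000, seed=0):
    """Monte Carlo bias, MSE and PRE for mean vs. ratio imputation."""
    rng = np.random.default_rng(seed)
    true_mean = pop_y.mean()
    est_mean = np.empty(reps)
    est_ratio = np.empty(reps)
    for k in range(reps):
        # SRSWOR: draw n distinct population indices.
        sample = rng.choice(len(pop_y), size=n, replace=False)
        # Uniform nonresponse (assumed form): r respondents drawn at
        # random from the sample, independent of x and y.
        resp = rng.choice(sample, size=r, replace=False)
        est_mean[k] = pop_y[resp].mean()
        est_ratio[k] = ratio_impute_mean(pop_y[resp], pop_x[resp], pop_x[sample])
    out = {}
    for name, est in (("mean", est_mean), ("ratio", est_ratio)):
        out[name] = {
            "bias": est.mean() - true_mean,
            "mse": np.mean((est - true_mean) ** 2),
        }
    # PRE of ratio imputation relative to mean imputation.
    out["pre_ratio_vs_mean"] = 100.0 * out["mean"]["mse"] / out["ratio"]["mse"]
    return out


# Synthetic population (hypothetical values, not the Chiang Mai data).
rng = np.random.default_rng(1)
pop_x = rng.gamma(shape=4.0, scale=10.0, size=500)
pop_y = 2.0 * pop_x + rng.normal(0.0, 5.0, size=500)
result = simulate(pop_x, pop_y, n=100, r=70)
print(result)
```

Because the synthetic study variable is nearly proportional to the auxiliary variable, ratio imputation should give a smaller mean square error than mean imputation here; with real data the ranking depends on how strongly the two variables are related.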

Results: The proposed estimator performs best, giving the smallest bias and mean square error at all sampling fractions. In the application estimating sulfur dioxide from Particulate Matter 2.5 (PM2.5), the Percentage Relative Efficiency (PRE) of the proposed estimator is higher than that of all existing estimators by at least 16%. When estimating PM2.5 from PM10, its PRE is higher than that of all existing estimators by at least 1600%, a very large gain, with estimates close to the real values.
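
The abstract does not state how the PRE is computed. A common convention in the imputation literature, assumed here, measures an estimator T against a baseline estimator T_0 (often the mean imputation estimator) through their mean square errors:

```latex
\mathrm{PRE}(T, T_0) = \frac{\mathrm{MSE}(T_0)}{\mathrm{MSE}(T)} \times 100
```

Under this convention a PRE above 100 means that T has a smaller mean square error than the baseline, and a larger PRE indicates a more efficient estimator.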

Conclusion: The proposed imputation technique based on the transformed auxiliary variable can be helpful for imputing missing values and improving the efficiency of the estimators.