The Critical Role of Programming Languages among Healthcare Data Scientists: A Systematic Review of Trends, Applications, and Future Directions
Abstract
Background: Artificial intelligence (AI) and data science have transformed healthcare by enabling advanced analytical techniques. AI-driven solutions rely on sophisticated algorithms that are implemented in specialized programming languages. Understanding which programming languages are most commonly used is essential for healthcare data scientists to navigate this domain effectively. This study explores the trends and applications of programming languages in healthcare data science, highlighting their roles in machine learning (ML) and related methodologies.
Methods: A systematic search was conducted in PubMed/MEDLINE, Scopus, and Web of Science (WoS) covering the period 2010–2023. Keyword combinations included artificial intelligence, machine learning, programming languages, healthcare, and medical informatics. After screening, 174 studies that explicitly mentioned programming languages in their abstracts were included for analysis.
Results: Public health accounted for 50.6% (n=88/174) of the reviewed studies, followed by medicine at 25.9% (n=45/174) and genomics at 14.4% (n=25/174). Python emerged as the most widely used programming language, appearing in 37.47% (n=65) of the articles, followed by R at 29.6% (n=51) and MATLAB at 17.8% (n=31). Machine learning methods were predominant in genomics and epidemiology. The temporal trend showed an increasing preference for Python, while MATLAB use declined in recent years.
Conclusion: The selection of programming languages in healthcare data science is influenced by technical needs, application-specific requirements, and collaboration dynamics. Python’s versatility has made it a dominant choice, while R’s statistical focus and MATLAB’s specialized toolkits remain significant in specific domains. The findings provide a framework for educational strategies, guiding data scientists in making informed decisions about language proficiency. Future research should evaluate the long-term implications of programming language adoption on healthcare analytics.
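The domain proportions reported in the Results can be reproduced from the stated counts. The following is a minimal sketch in Python, assuming the 174 included studies are the denominator for each domain; the variable and function names are illustrative and do not come from the study.

```python
# Counts taken from the Results paragraph; names are illustrative only.
TOTAL_STUDIES = 174
domain_counts = {"public health": 88, "medicine": 45, "genomics": 25}


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Percentage of included studies per application domain.
domain_shares = {name: share(n, TOTAL_STUDIES) for name, n in domain_counts.items()}

# Studies not covered by the three named domains (assumes one domain per study).
remaining = TOTAL_STUDIES - sum(domain_counts.values())

for name, pct in domain_shares.items():
    print(f"{name}: {pct}%")
print(f"other domains: {remaining} studies ({share(remaining, TOTAL_STUDIES)}%)")
```

Under the assumption that each study is assigned to a single domain, the three named domains cover 158 of the 174 studies, leaving 16 in other domains.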