A Comprehensive Analysis of NFHS-5 data for TB in India
Abstract
This study presents a comprehensive analysis of tuberculosis (TB) in India using data from the NFHS-5 (National Family Health Survey) program. The research begins by providing a thorough understanding of the DHS (Demographic and Health Survey) and NFHS programs, followed by an extensive literature review of TB-related studies and NFHS-related papers.

The findings from the literature review indicate that directing tuberculosis control initiatives toward the poorest 20% of the population may yield more successful outcomes compared to targeting the general population or the wealthiest 20%. Additionally, an examination of trends in TB incidence and mortality in India from 1990 to 2019, based on data from the Global Burden of Disease Study 2019, reveals significant insights into the country's TB burden.

One notable observation from the literature review is that a substantial proportion of TB patients, over 60%, have at least one comorbidity, with diabetes emerging as a prominent comorbidity. Furthermore, the study highlights a concerning lack of awareness regarding TB among Indian adults, with only 49.7% of participants reporting prior knowledge of the disease.

The research extensively utilizes complex and large-scale NFHS-5 data. A considerable amount of work is devoted to analyzing household-level data, which is categorized into three groups based on the Human Development Index, with each category representing five states. Python's CSV file processing capabilities are employed to handle and process a vast amount of data.

To identify the factors that most significantly affect TB, the study compares TB variables with 402 other variables. However, due to the limited number of TB cases within each category, the researchers calculate the number of TB and non-TB patients per 100,000 people for all variables.
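The per-100,000 normalization described above could be sketched roughly as follows, using Python's standard `csv` module. This is only an illustration under assumptions: the column names `has_tb` and `cooking_fuel`, the helper `rates_per_100k`, and the toy rows are all hypothetical and are not taken from the NFHS-5 files or from the dissertation's code. The sketch also assumes the rate is computed within each value of a variable, which the abstract does not spell out.

```python
import csv
import io
from collections import defaultdict


def rates_per_100k(rows, variable, tb_column="has_tb"):
    """Return {value: (tb_per_100k, non_tb_per_100k)} for one variable.

    Rates are computed within each value of `variable`, so the two
    numbers for a given value always sum to 100,000.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [tb_count, non_tb_count]
    for row in rows:
        is_tb = row[tb_column] == "1"
        counts[row[variable]][0 if is_tb else 1] += 1

    rates = {}
    for value, (tb, non_tb) in counts.items():
        total = tb + non_tb
        rates[value] = (tb / total * 100_000, non_tb / total * 100_000)
    return rates


# Toy rows invented purely for illustration (not NFHS-5 figures).
SAMPLE_CSV = """has_tb,cooking_fuel
1,wood
0,wood
0,wood
0,wood
0,lpg
0,lpg
0,lpg
0,lpg
1,lpg
0,lpg
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
rates = rates_per_100k(rows, "cooking_fuel")
for value, (tb_rate, non_tb_rate) in sorted(rates.items()):
    print(f"{value}: TB {tb_rate:.0f}, non-TB {non_tb_rate:.0f} per 100,000")
```

Expressing counts on a common per-100,000 scale makes groups of very different sizes comparable, which matters when only a handful of TB cases fall into any one category.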
This approach provides a better understanding of the relationships between variables and TB incidence.

The study goes beyond descriptive analysis by developing a model to predict an individual's likelihood of contracting TB. Notably, the data are heavily imbalanced, as 99.7% of the cases are non-TB patients. To address this imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to generate synthetic data for the minority class. The researchers then focus on the most influential features associated with TB, resulting in a prediction model that achieves an accuracy of over 70% in identifying individuals at risk of TB.

In summary, this comprehensive analysis of NFHS-5 data sheds light on the tuberculosis landscape in India. The findings emphasize the importance of targeting the most marginalized populations, highlight the prevalence of comorbidities such as diabetes among TB patients, underscore the need for increased public awareness, and showcase the potential of data-driven prediction models in improving TB control and prevention efforts.
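The oversampling idea behind SMOTE, mentioned in the abstract, can be illustrated with a minimal pure-Python sketch. It is a simplified stand-in and not the dissertation's implementation: the function name `smote_sketch`, the parameter defaults, and the toy feature vectors are hypothetical. The abstract does not say which implementation was used; in practice a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`) is a common choice.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority-class points by interpolation.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest neighbours of `base` within the minority class, excluding itself.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy feature vectors invented for illustration (not NFHS-5 data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
majority_size = 20

# Generate enough synthetic points to match the majority-class size.
synthetic = smote_sketch(minority, majority_size - len(minority))
balanced_minority = minority + synthetic
print(len(balanced_minority), "minority samples after oversampling")
```

Because every synthetic point is interpolated between two real minority samples, the new data stay inside the region already occupied by the minority class instead of simply duplicating existing records.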