Friday, February 23, 2024

Machine learning analysis for the association between breast feeding and metabolic syndrome in women – Scientific Reports

Study population

This study was based on the fifth (2010–2012), sixth (2013–2015), seventh (2016–2018), and eighth (2019) Korean National Health and Nutrition Examination Survey (KNHANES) surveys. The KNHANES is a nationally representative survey that draws samples annually using a stratified multistage cluster sampling design. It is conducted by a dedicated research team that visits four regions each week (192 regions annually). Each region is surveyed over 3 days: health surveys and medical examinations are conducted in mobile examination vehicles, whereas nutritional assessments are performed by a specialized team of nutritionists who visit households directly. These data are used to assess the health status, prevalence of chronic diseases, and nutritional intake of the South Korean population. From the KNHANES 2010–2019 data, men and participants younger than 20 years were excluded from the current analyses. Cases with missing data on the occurrence or diagnosis of hypertension, myocardial infarction, or angina, or on any of the factors used to diagnose metabolic syndrome, were also excluded, as was one outlier (a woman recorded as being over 80 years old before menarche).
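The sex, age, and missing-data exclusions above can be expressed as a record-level filter. The sketch below is a minimal illustration, assuming each participant is a Python dict with hypothetical field names (`sex`, `age`, `waist_cm`, and so on) rather than the actual KNHANES variable codes. The single outlier exclusion is left out because its exact operationalization is not specified.

```python
from typing import Any, Dict

# Hypothetical field names; the real KNHANES variable codes differ.
REQUIRED_FIELDS = (
    "hypertension", "myocardial_infarction", "angina",  # CVD components
    "waist_cm", "tg_mg_dl", "hdl_mg_dl",                # metabolic syndrome criteria
    "sbp_mmhg", "dbp_mmhg", "fasting_glucose_mg_dl",
)


def is_eligible(rec: Dict[str, Any]) -> bool:
    """Apply the sex, age, and missing-data exclusions to one survey record."""
    if rec.get("sex") != "F":          # men were excluded
        return False
    age = rec.get("age")
    if age is None or age < 20:        # participants under 20 were excluded
        return False
    # Records missing any CVD or metabolic-syndrome field were excluded.
    return all(rec.get(f) is not None for f in REQUIRED_FIELDS)


records = [
    {"sex": "F", "age": 52, **{f: 1 for f in REQUIRED_FIELDS}},
    {"sex": "M", "age": 52, **{f: 1 for f in REQUIRED_FIELDS}},
    {"sex": "F", "age": 18, **{f: 1 for f in REQUIRED_FIELDS}},
    {"sex": "F", "age": 40, "hypertension": 0},  # missing the other fields
]
eligible = [r for r in records if is_eligible(r)]
print(len(eligible))  # → 1
```

Only the first record survives: the others are excluded for sex, age, and missing criteria data, respectively.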

The data were publicly available and de-identified. The requirement for ethical approval was waived by the institutional review board of Korea University Anam Hospital. All methods were conducted in accordance with relevant institutional/ethical committee guidelines and regulations. The requirement for informed consent was waived because all participant information was de-identified and encrypted to protect privacy.


The variables included in this study are summarized in Supplementary Materials 1. The sociodemographic characteristics, including the age at enrollment, sex, body mass index (BMI), household income (represented as quartiles), marital status, the level of education (elementary school and below, middle school, high school, and college and above), areas of residence, economic activities, and occupations, were assessed using questionnaires.

Information on general obstetric characteristics, including gravidity, parity, breastfeeding (history of breastfeeding, number of children breastfed, and lifetime total breastfeeding duration), history of abortion, age at menarche, and menstrual status (menstruation, pregnancy, breastfeeding, menopause, and other), was also obtained from the questionnaires. The presence of the following diseases was determined by interview: (1) hypertension, (2) myocardial infarction, (3) angina, (4) stroke, (5) osteoarthritis, (6) rheumatoid arthritis, (7) pulmonary tuberculosis, (8) asthma, (9) thyroid-related disease, (10) major depressive disorder, (11) kidney failure, (12) hepatitis B, (13) hepatitis C, (14) liver cirrhosis, (15) cancers (gastric cancer, hepatic cancer, colorectal cancer, breast cancer, cervical cancer, and lung cancer), and (16) atopic dermatitis. Data on family histories of hypertension, hyperlipidemia, ischemic heart disease, stroke, and diabetes mellitus were also obtained from the questionnaires, as were data on the use of (1) antihypertensive drugs, (2) lipid-lowering agents, (3) oral hypoglycemic agents, and (4) insulin.

The blood pressure, waist circumference, and body mass index (BMI) of each participant were measured. Levels of total cholesterol, triglycerides (TG), low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, hemoglobin, hematocrit, blood urea nitrogen, and blood creatinine, as well as white blood cell and red blood cell counts, were also measured at the time of the survey.

The participants answered questions about their perceptions and habits related to their health. They were asked about their subjective body image, their weight-control goals, history of medical checkups in the past 2 years, smoking history, frequency of alcohol consumption (per year), and weekly weight-training routines. Data on mental health, including perceived stress and feelings of depression within the past year, were also collected. Health-related quality of life was assessed using the European Quality of Life-5 Dimensions (EQ-5D) scale30. The daily intake of energy (kcal), carbohydrates (g), protein (g), fat (g), sodium (mg), water (g), calcium (mg), phosphorus (mg), iron (mg), potassium (mg), and vitamin C (mg) was obtained from the nutrition survey.

A diagnosis of cardiovascular disease (CVD) required the presence of at least one of the following: (1) hypertension, (2) myocardial infarction, or (3) angina. Based on the modified National Cholesterol Education Program Adult Treatment Panel III criteria and the cutoff for central obesity in Korean adult women suggested by the Korean Endocrine Society, metabolic syndrome was defined as the presence of three or more of the following1,31: (1) central obesity (waist circumference ≥ 85 cm); (2) elevated TG (serum TG concentration ≥ 150 mg/dL); (3) low HDL cholesterol (serum HDL cholesterol concentration < 50 mg/dL); (4) elevated blood pressure (systolic blood pressure ≥ 130 mmHg or diastolic blood pressure ≥ 85 mmHg) or the prescription of antihypertensive drugs; and (5) elevated fasting glucose (fasting serum glucose ≥ 100 mg/dL) or the prescription of diabetes drugs. The variables corresponding to these diagnostic criteria (waist circumference, TG, HDL cholesterol, blood pressure measurements, and fasting glucose) were excluded from the independent variables.
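Both case definitions above are simple counting rules. The following sketch restates them in code as an illustration only, not the authors' implementation; the `Measures` fields and function names are hypothetical, and units are assumed to match the stated cutoffs (cm, mg/dL, mmHg).

```python
from dataclasses import dataclass


@dataclass
class Measures:
    waist_cm: float
    tg_mg_dl: float
    hdl_mg_dl: float
    sbp_mmhg: float
    dbp_mmhg: float
    fasting_glucose_mg_dl: float
    on_antihypertensives: bool = False
    on_diabetes_drugs: bool = False


def count_mets_criteria(m: Measures) -> int:
    """Number of the five metabolic syndrome criteria that are met."""
    criteria = [
        m.waist_cm >= 85,                      # central obesity (Korean women cutoff)
        m.tg_mg_dl >= 150,                     # elevated triglycerides
        m.hdl_mg_dl < 50,                      # low HDL cholesterol
        m.sbp_mmhg >= 130 or m.dbp_mmhg >= 85 or m.on_antihypertensives,
        m.fasting_glucose_mg_dl >= 100 or m.on_diabetes_drugs,
    ]
    return sum(criteria)  # True counts as 1, False as 0


def has_metabolic_syndrome(m: Measures) -> bool:
    """Metabolic syndrome: three or more criteria met."""
    return count_mets_criteria(m) >= 3


def has_cvd(hypertension: bool, myocardial_infarction: bool, angina: bool) -> bool:
    """CVD: at least one of the three conditions present."""
    return hypertension or myocardial_infarction or angina


two_criteria = Measures(88, 160, 55, 120, 80, 95)
three_criteria = Measures(88, 160, 45, 120, 80, 95)
print(has_metabolic_syndrome(two_criteria), has_metabolic_syndrome(three_criteria))
```

The second example differs from the first only in HDL cholesterol (45 rather than 55 mg/dL), which adds a third criterion and changes the classification.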

Statistical analysis

An artificial neural network, decision tree, logistic regression, naïve Bayes, random forest, and support vector machine were used to predict metabolic syndrome. The 30,204 observations with complete information were divided into training and validation sets in a 70:30 ratio (21,143 and 9061 observations, respectively). The area under the receiver operating characteristic curve (AUC) and accuracy (the proportion of correct predictions among the 9061 observations in the validation set) were used as the criteria for model validation. Random forest variable importance, the contribution of a given variable to random forest performance (accuracy), was used to identify the major predictors of metabolic syndrome. For example, if the random forest variable importance of CVD is 0.0453, the model's accuracy drops by 0.0453 (4.53 percentage points) when the values of CVD are randomly permuted (shuffled). The random split and analysis were repeated 50 times and the results averaged for external validation32,33,34. RStudio 1.3.959 (RStudio Inc.: Boston, United States) and Python 3.5.2 (CreateSpace: Scotts Valley, United States) were used for the analysis between February 1 and March 31, 2022.
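The repeated 70:30 validation and the permutation-based variable importance described above can be sketched in a model-agnostic way. This is a minimal standard-library Python illustration, not the authors' R/Python code: `train_stump` is a hypothetical toy classifier standing in for the random forest and the other models, the data are synthetic, and all function names are assumptions. Here importance is computed on the validation rows as the accuracy lost after shuffling one column, then averaged over the repeated splits.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Row = List[float]
Predictor = Callable[[Row], int]
Trainer = Callable[[Sequence[Row], Sequence[int]], Predictor]


def split_70_30(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and cut them into ~70% training and ~30% validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    n_train = round(0.7 * n)
    return idx[:n_train], idx[n_train:]


def accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int]) -> float:
    """Proportion of correct predictions."""
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)


def permutation_importance(predict: Predictor, X: Sequence[Row], y: Sequence[int],
                           j: int, rng: random.Random) -> float:
    """Accuracy lost when column j is shuffled across rows (other columns untouched)."""
    baseline = accuracy(predict, X, y)
    col = [row[j] for row in X]
    rng.shuffle(col)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
    return baseline - accuracy(predict, X_perm, y)


def repeated_evaluation(train: Trainer, X: Sequence[Row], y: Sequence[int],
                        n_repeats: int = 50, seed: int = 0) -> Tuple[float, List[float]]:
    """Mean validation accuracy and mean per-column importance over repeated 70:30 splits."""
    rng = random.Random(seed)
    n_cols = len(X[0])
    accs: List[float] = []
    imps: List[List[float]] = [[] for _ in range(n_cols)]
    for _ in range(n_repeats):
        tr, va = split_70_30(len(y), rng)
        predict = train([X[i] for i in tr], [y[i] for i in tr])
        X_va, y_va = [X[i] for i in va], [y[i] for i in va]
        accs.append(accuracy(predict, X_va, y_va))
        for j in range(n_cols):
            imps[j].append(permutation_importance(predict, X_va, y_va, j, rng))
    return mean(accs), [mean(v) for v in imps]


def train_stump(X: Sequence[Row], y: Sequence[int]) -> Predictor:
    """Hypothetical toy classifier: thresholds column 0 at its training mean (labels unused)."""
    t = mean(row[0] for row in X)
    return lambda row: int(row[0] > t)


# Synthetic example: the label depends on column 0 only; column 1 is noise.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(1000)]
y = [int(row[0] > 0.5) for row in X]
acc, importances = repeated_evaluation(train_stump, X, y, n_repeats=5)
print(f"accuracy={acc:.3f}, importances={[round(v, 3) for v in importances]}")
```

Shuffling the informative column costs roughly half of the accuracy, while shuffling the noise column costs nothing, which is the sense in which an importance of 0.0453 means a 0.0453 accuracy drop. With 30,204 rows, `split_70_30` returns 21,143 training and 9061 validation indices (0.7 × 30,204 = 21,142.8, rounded to 21,143).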
