

Upstage AI Lab Cohort 2 [Day015-022] EDA Team Project (Data Overview)

Dataset: Diabetes Health Indicators Dataset

Data source: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset/data

Dataset size: 229,474 people with 21 features each (no diabetes: 194,377 / diabetes: 35,097)

 

              no diabetes   diabetes     total
sample size       194,377     35,097   229,474
proportion         84.71%     15.29%      100%
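The proportion row is each class count divided by the total. The short Python sketch below (standard library only; the variable and function names are illustrative, not taken from the project code) reproduces it from the counts above. With pandas the same numbers would come from something like df["Diabetes_binary"].value_counts(normalize=True). It also makes the imbalance concrete: a trivial model that always answers "no diabetes" would already be about 84.71% accurate.

```python
# Minimal sketch: recompute the class-balance row from the counts in the table above.
counts = {"no diabetes": 194_377, "diabetes": 35_097}


def class_shares(counts):
    """Return (total, {label: percentage of total rounded to 2 decimals})."""
    total = sum(counts.values())
    shares = {label: round(100 * n / total, 2) for label, n in counts.items()}
    return total, shares


total, shares = class_shares(counts)
print(total)   # 229474
print(shares)  # {'no diabetes': 84.71, 'diabetes': 15.29}

# Accuracy (in %) of a trivial model that always predicts the majority class.
baseline_accuracy = max(shares.values())
print(baseline_accuracy)  # 84.71
```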

 

21 features

 

dependent variable
  • 'Diabetes_binary'

independent variables
binary
  • 'HighBP'
  • 'HighChol'
  • 'CholCheck'
  • 'Smoker'
  • 'Stroke'
  • 'HeartDiseaseorAttack'
  • 'PhysActivity'
  • 'Fruits'
  • 'Veggies'
  • 'HvyAlcoholConsump'
  • 'AnyHealthcare'
  • 'NoDocbcCost'
  • 'DiffWalk'
  • 'Sex'
categorical
  • 'GenHlth': 1 to 5 (1 = excellent / 5 = poor)
  • 'Age': 1 to 13 (1 = 18-24 / 9 = 60-64 / 13 = 80 or older)
  • 'Education': 1 to 6 (1 = Never attended school or only kindergarten / 6 = College graduate)
  • 'Income': 1 to 8 (1 = less than $10,000 / 5 = less than $35,000 / 8 = $75,000 or more)
numeric / discrete
  • 'BMI'
  • 'MentHlth'
  • 'PhysHlth'
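To keep these feature groups consistent across notebooks, they can be written down once as data. The Python sketch below is a hypothetical helper (BINARY, CATEGORICAL, NUMERIC and invalid_fields are names made up for illustration). It assumes the binary fields are coded 0/1, which is consistent with their means lying between 0 and 1 in the table below, and it only checks the code ranges stated in the list above.

```python
# Hypothetical sketch: the feature groups above, written down as data.
TARGET = "Diabetes_binary"
BINARY = [
    "HighBP", "HighChol", "CholCheck", "Smoker", "Stroke",
    "HeartDiseaseorAttack", "PhysActivity", "Fruits", "Veggies",
    "HvyAlcoholConsump", "AnyHealthcare", "NoDocbcCost", "DiffWalk", "Sex",
]
# Categorical codes and their valid (inclusive) ranges, as listed above.
CATEGORICAL = {"GenHlth": (1, 5), "Age": (1, 13), "Education": (1, 6), "Income": (1, 8)}
NUMERIC = ["BMI", "MentHlth", "PhysHlth"]


def invalid_fields(record):
    """Return names of binary/categorical fields whose value is missing or outside its codes.

    Assumes binary fields are coded 0/1; numeric fields are not checked.
    """
    bad = [name for name in BINARY if record.get(name) not in (0, 1)]
    for name, (lo, hi) in CATEGORICAL.items():
        value = record.get(name)
        if value is None or not lo <= value <= hi:
            bad.append(name)
    return bad


n_features = len(BINARY) + len(CATEGORICAL) + len(NUMERIC)
print(n_features)  # 21

# Usage: a made-up record with an out-of-range income code.
record = {name: 0 for name in BINARY}
record.update({"GenHlth": 3, "Age": 9, "Education": 6, "Income": 0})
print(invalid_fields(record))  # ['Income']
```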

 

means by group



                      no diabetes  diabetes     total
HighBP                     0.4005    0.7523    0.4543
HighChol                   0.4005    0.6695    0.4416
CholCheck                  0.9534    0.9931    0.9595
BMI                       28.0959   31.9642   28.6875
Smoker                     0.4562    0.5192    0.4658
Stroke                     0.0361    0.0931    0.0448
HeartDiseaseorAttack       0.0816    0.2238    0.1033
PhysActivity               0.7519    0.6285    0.7330
Fruits                     0.6178    0.5842    0.6127
Veggies                    0.8018    0.7549    0.7946
HvyAlcoholConsump          0.0675    0.0237    0.0608
AnyHealthcare              0.9436    0.9595    0.9460
NoDocbcCost                0.0904    0.1066    0.0929
GenHlth                    2.4765    3.2959    2.6018
MentHlth                   3.3323    4.4934    3.5099
PhysHlth                   4.0804    8.0085    4.6812
DiffWalk                   0.1518    0.3737    0.1858
Sex                        0.4322    0.4773    0.4391
Age                        7.8520    9.3760    8.0851
Education                  5.0231    4.7398    4.9797
Income                     6.0137    5.1958    5.8886
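Each value in the total column is the size-weighted average of the two group means, e.g. for HighBP: (194,377 × 0.4005 + 35,097 × 0.7523) / 229,474 ≈ 0.4543. The stdlib-only Python sketch below computes per-group means from raw records and checks that identity against the HighBP row. The helper names are hypothetical; with pandas, the same table would roughly be df.groupby("Diabetes_binary").mean(). Assuming 0/1 coding, the mean of a binary feature is simply the share of people answering 1, so the HighBP row reads as 40.05% vs 75.23%.

```python
# Minimal sketch (standard library only); helper names are hypothetical.
from statistics import fmean


def group_means(rows, feature, target="Diabetes_binary"):
    """Mean of `feature` for target == 0, target == 1, and for all rows."""
    no = [r[feature] for r in rows if r[target] == 0]
    yes = [r[feature] for r in rows if r[target] == 1]
    return fmean(no), fmean(yes), fmean(no + yes)


def pooled_mean(n0, m0, n1, m1):
    """Overall mean recovered from two group sizes and their group means."""
    return (n0 * m0 + n1 * m1) / (n0 + n1)


# Toy records, made up for illustration only (not from the dataset).
toy = [
    {"Diabetes_binary": 0, "HighBP": 0},
    {"Diabetes_binary": 0, "HighBP": 1},
    {"Diabetes_binary": 1, "HighBP": 1},
]
print(group_means(toy, "HighBP"))  # (0.5, 1.0, 0.6666666666666666)

# The HighBP row of the table: the total is the size-weighted group average.
print(round(pooled_mean(194_377, 0.4005, 35_097, 0.7523), 4))  # 0.4543
```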


to be continued...