Dataset: Diabetes Health Indicators Dataset
Data source: https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset/data
Dataset size: 21 features for 229,474 respondents in total (no diabetes: 194,377 / diabetes: 35,097)
| | no diabetes | diabetes | total |
|---|---|---|---|
| sample size | 194,377 | 35,097 | 229,474 |
| share | 84.71% | 15.29% | 100% |
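
As a quick sanity check on the class balance above, the shares can be recomputed from the two counts. This is a minimal sketch using only the numbers reported in this post; the variable names are made up for illustration.

```python
# Class counts as reported in the post.
counts = {"no diabetes": 194_377, "diabetes": 35_097}

total = sum(counts.values())

# Share of each class in percent, rounded to two decimals.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

print(total)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The two classes are clearly imbalanced, which is worth keeping in mind when comparing the group means further down.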
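21 features

| role | type | variable | values |
|---|---|---|---|
| dependent variable | | 'Diabetes_binary' | |
| independent variables | binary | 'HighBP', 'HighChol', 'CholCheck', 'Smoker', 'Stroke', 'HeartDiseaseorAttack', 'PhysActivity', 'Fruits', 'Veggies', 'HvyAlcoholConsump', 'AnyHealthcare', 'NoDocbcCost', 'DiffWalk', 'Sex' | |
| | categorical | 'GenHlth' | 1 to 5 (1 = excellent / 5 = poor) |
| | | 'Age' | 1 to 13 (1 = 18-24 / 9 = 60-64 / 13 = 80 or older) |
| | | 'Education' | 1 to 6 (1 = Never attended school or only kindergarten / 6 = College graduate) |
| | | 'Income' | 1 to 8 (1 = less than $10,000 / 5 = less than $35,000 / 8 = $75,000 or more) |
| | numeric / discrete | 'BMI' | |
| | | 'MentHlth' | |
| | | 'PhysHlth' | |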
means

| feature | no diabetes | diabetes | total |
|---|---|---|---|
| HighBP | 0.4005 | 0.7523 | 0.4543 |
| HighChol | 0.4005 | 0.6695 | 0.4416 |
| CholCheck | 0.9534 | 0.9931 | 0.9595 |
| BMI | 28.0959 | 31.9642 | 28.6875 |
| Smoker | 0.4562 | 0.5192 | 0.4658 |
| Stroke | 0.0361 | 0.0931 | 0.0448 |
| HeartDiseaseorAttack | 0.0816 | 0.2238 | 0.1033 |
| PhysActivity | 0.7519 | 0.6285 | 0.7330 |
| Fruits | 0.6178 | 0.5842 | 0.6127 |
| Veggies | 0.8018 | 0.7549 | 0.7946 |
| HvyAlcoholConsump | 0.0675 | 0.0237 | 0.0608 |
| AnyHealthcare | 0.9436 | 0.9595 | 0.9460 |
| NoDocbcCost | 0.0904 | 0.1066 | 0.0929 |
| GenHlth | 2.4765 | 3.2959 | 2.6018 |
| MentHlth | 3.3323 | 4.4934 | 3.5099 |
| PhysHlth | 4.0804 | 8.0085 | 4.6812 |
| DiffWalk | 0.1518 | 0.3737 | 0.1858 |
| Sex | 0.4322 | 0.4773 | 0.4391 |
| Age | 7.8520 | 9.3760 | 8.0851 |
| Education | 5.0231 | 4.7398 | 4.9797 |
| Income | 6.0137 | 5.1958 | 5.8886 |
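
A table of per-class means like the one above can be produced with a pandas `groupby`. The sketch below is one possible way to do it, not necessarily how the numbers in this post were computed. It runs on a tiny made-up DataFrame so that it is self-contained; the helper name `group_means` and the toy values are assumptions, and only the column name `Diabetes_binary` comes from the dataset.

```python
import pandas as pd

# Toy stand-in for the real data (values are invented for illustration).
# With the actual file, df would come from pd.read_csv(<path to the CSV>).
df = pd.DataFrame({
    "Diabetes_binary": [0, 0, 0, 1, 1],
    "HighBP":          [0, 1, 0, 1, 1],
    "BMI":             [24, 30, 27, 33, 35],
})

def group_means(frame, target="Diabetes_binary"):
    """Hypothetical helper: per-feature means by class plus the overall mean."""
    # Mean of every feature within each class, then transpose so that
    # features become rows and the classes (0, 1) become columns.
    by_class = frame.groupby(target).mean().T
    by_class.columns = ["no diabetes", "diabetes"]
    # Overall mean across all rows, aligned on the feature names.
    by_class["total"] = frame.drop(columns=target).mean()
    return by_class.round(4)

table = group_means(df)
print(table)
```

For a binary feature the mean is simply the share of respondents coded 1, so a row such as HighBP can be read directly as a proportion within each class.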
to be continued...