added lab 4
@@ -0,0 +1,38 @@
# exploratory data analysis and models on the epi dataset
date: 2025-10-13
## dataset and choices
- **file**: `epi_results_2024_pop_gdp_v2.csv`
- **region column**: `region`
- **response var**: `EPI.new`
- **regions**: `Sub-Saharan Africa` vs `Latin America & Caribbean`
## 1) variable distributions
### 1.1 boxplots and histograms (with density!)
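
a minimal sketch of the numbers behind these plots, assuming plain python rather than whatever tooling the lab actually used. a boxplot is drawn from the five-number summary, and a histogram "with density" rescales bar heights so the total bar area is 1, which lets a density curve share the same axis. the `scores` values are made up for illustration and are not taken from the epi dataset.

```python
import statistics


def five_number_summary(values):
    """Return (min, q1, median, q3, max), the values a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def density_histogram(values, bins):
    """Return (densities, bin_width); densities are scaled so total area is 1.

    Assumes the values are not all identical (bin width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the last bin, so clamp it.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return [c / (n * width) for c in counts], width


# Made-up scores, for illustration only.
scores = [30.0, 35.5, 41.2, 44.8, 47.1, 52.3, 58.9, 66.0]

summary = five_number_summary(scores)
densities, bin_width = density_histogram(scores, bins=4)
total_area = sum(d * bin_width for d in densities)
```
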
### 1.2 qq plot (two-sample)
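
a two-sample qq plot pairs the quantiles of one sample with the quantiles of the other at the same probabilities; points near the diagonal suggest similar distributions. the sketch below only computes the pairs (plotting is left out), and the function name `qq_pairs` and the sample values are hypothetical, not taken from the lab.

```python
import statistics


def qq_pairs(sample_a, sample_b, n_points=9):
    """Pair up quantiles of two samples at the same probability levels.

    The samples may have different sizes; each needs at least two values.
    """
    qa = statistics.quantiles(sample_a, n=n_points + 1, method="inclusive")
    qb = statistics.quantiles(sample_b, n=n_points + 1, method="inclusive")
    return list(zip(qa, qb))


# Made-up values: group_b is group_a shifted up by 5.
group_a = [31.0, 34.5, 38.2, 40.1, 42.7, 45.0, 49.3, 53.8]
group_b = [v + 5.0 for v in group_a]

pairs = qq_pairs(group_a, group_b)
```

in this toy case every pair sits 5 units above the diagonal, which is what a pure location shift between two groups looks like on a qq plot.
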
## 2) linear models
### full: EPI.new ~ gdp
### full: EPI.new ~ gdp + population
### 2.2 same models on one region (comparison)
on region `Sub-Saharan Africa`, the better of the two models is **EPI.new ~ gdp + population** (r² = 0.361, AIC = 265.4, BIC = 272.7).
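
the report does not say how "better" was decided, so the following is only a sketch of how r², aic and bic can be computed for a least-squares fit. it assumes gaussian errors and counts the error variance as one extra parameter; that convention, the helper names and the data are all assumptions, and the numbers are made up rather than taken from the epi dataset.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; returns fit and sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, rss, tss


def fit_metrics(rss, tss, n, n_coef):
    """Return (r2, aic, bic) from the residual and total sums of squares.

    Assumption: Gaussian log-likelihood, with the error variance counted
    as one parameter on top of the n_coef regression coefficients.
    """
    r2 = 1.0 - rss / tss
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = n_coef + 1
    aic = 2 * k - 2 * loglik
    bic = k * math.log(n) - 2 * loglik
    return r2, aic, bic


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, rss, tss = fit_simple_ols(x, y)
r2, aic, bic = fit_metrics(rss, tss, n=len(x), n_coef=2)  # intercept + slope
```

r² never decreases when a predictor is added to a least-squares model fitted on the same rows, so aic and bic, which penalise the extra parameter, are usually the more informative numbers when comparing `EPI.new ~ gdp` against `EPI.new ~ gdp + population`; lower is better for both.
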
## 3) classification (knn, label = region)
### model A
- **k**: 5 | **accuracy**: 0.5581 | **test n**: 43
variables: `c("AGR.new", "AIR.new", "APO.new")`
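
a minimal sketch of what a k = 5 nearest-neighbour classifier does with three numeric variables and a region label: find the five closest training rows and take the majority label. euclidean distance, the absence of feature scaling, the short labels `SSA` / `LAC` and all data values are assumptions for illustration; the report does not state these details.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, point, k=5):
    """Predict a label by majority vote among the k nearest training rows."""
    by_distance = sorted(
        (math.dist(row, point), label)
        for row, label in zip(train_rows, train_labels)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_rows, train_labels, test_rows, test_labels, k=5):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(
        knn_predict(train_rows, train_labels, row, k) == label
        for row, label in zip(test_rows, test_labels)
    )
    return hits / len(test_rows)


# Made-up rows with three features each, for illustration only.
train_rows = [
    (1, 2, 1), (2, 1, 2), (1, 1, 1), (2, 2, 2), (0, 1, 2), (1, 0, 1),
    (9, 8, 9), (8, 9, 8), (9, 9, 9), (8, 8, 8), (10, 9, 8), (9, 10, 9),
]
train_labels = ["SSA"] * 6 + ["LAC"] * 6
test_rows = [(1, 1, 2), (9, 9, 8), (2, 2, 1), (8, 9, 9)]
test_labels = ["SSA", "LAC", "SSA", "LAC"]

acc = accuracy(train_rows, train_labels, test_rows, test_labels, k=5)
```

because the vote depends on raw distances, variables on larger scales dominate unless the features are standardised first; with two classes and an odd k the vote cannot tie.
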
### model B
- **k**: 5 | **accuracy**: 0.5116 | **test n**: 43
variables: `c("BCA.new", "BDH.new", "CBP.new")`
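
with only 43 test rows, the two reported accuracies can be converted back into counts of correct predictions. the arithmetic below uses only the figures reported above for model A and model B.

```python
test_n = 43
reported = {"model A": 0.5581, "model B": 0.5116}

# Accuracy is correct / test_n, so the nearest whole count recovers "correct".
correct = {name: round(acc * test_n) for name, acc in reported.items()}

# Recompute the accuracies from the counts to check they match the report.
recomputed = {name: round(c / test_n, 4) for name, c in correct.items()}

gap = correct["model A"] - correct["model B"]
```

the counts come out to 24 and 22 correct out of 43, so the gap between the two models is two test rows.
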