This repository has been archived on 2026-05-09. You can view files and clone it. You cannot open issues or pull requests or push a commit.
# exploratory data analysis and models on the EPI dataset
date: 2025-10-13
## dataset and choices
- **file**: `epi_results_2024_pop_gdp_v2.csv`
- **region column**: `region`
- **response var**: `EPI.new`
- **regions**: `Sub-Saharan Africa` vs `Latin America & Caribbean`
## 1) variable distributions
### 1.1 boxplots and histograms (with density!)
![](figures/box_Sub-Saharan_Africa_EPI.new.png)
![](figures/box_Latin_America_Caribbean_EPI.new.png)
![](figures/hist_Sub-Saharan_Africa_EPI.new.png)
![](figures/hist_Latin_America_Caribbean_EPI.new.png)
### 1.2 qq plot (two-sample)
![](figures/qq_EPI.new_Sub-Saharan_Africa_vs_Latin_America_Caribbean.png)
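
A two-sample QQ plot pairs matching quantiles of the two groups; points close to the line y = x suggest similar distributions. A minimal Python sketch of that construction, assuming the usual approach of evaluating both samples at the same probability levels (the function name `qq_pairs` is hypothetical and the numbers are placeholders, not values from the dataset):

```python
from statistics import quantiles

def qq_pairs(sample_a, sample_b, n_points=None):
    """Return (quantile_a, quantile_b) pairs at common probability levels."""
    # Use as many quantile levels as the smaller sample allows.
    if n_points is None:
        n_points = min(len(sample_a), len(sample_b))
    # quantiles(..., n=m) returns m - 1 cut points at evenly spaced probabilities.
    qa = quantiles(sample_a, n=n_points + 1, method="inclusive")
    qb = quantiles(sample_b, n=n_points + 1, method="inclusive")
    return list(zip(qa, qb))

# Placeholder scores for illustration only (not values from the dataset).
group_a = [31.0, 35.5, 38.2, 40.1, 44.7, 47.3]
group_b = [42.0, 45.9, 48.4, 51.2, 55.0]

pairs = qq_pairs(group_a, group_b)
for a, b in pairs:
    print(f"{a:6.2f}  {b:6.2f}")
```

Plotting the first element of each pair against the second gives the QQ plot; when the two groups have different sizes, interpolated quantiles like these keep the pairs aligned.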
## 2) linear models
### 2.1 models on the full dataset
- `EPI.new ~ gdp`
- `EPI.new ~ gdp + population`
### 2.2 same models on one region (comparison)
on region `Sub-Saharan Africa`, the better of the two models is **EPI.new ~ gdp + population** (r² = 0.361, AIC = 265.4, BIC = 272.7).
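
For reference, r², AIC and BIC can all be derived from the residuals of a least-squares fit; lower AIC and BIC indicate a better trade-off between fit and complexity. The Python sketch below assumes a Gaussian likelihood and counts the error variance as one extra parameter (the convention R uses for `lm` fits; whether this report used it is an assumption). The helper name `fit_criteria` is hypothetical and the data are placeholders:

```python
import math

def fit_criteria(y, fitted, n_coef):
    """Return (r2, aic, bic) for a least-squares fit with n_coef coefficients."""
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    r2 = 1.0 - rss / tss
    # Gaussian log-likelihood evaluated at the ML variance estimate rss / n.
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    k = n_coef + 1  # assumption: coefficients plus the error variance
    aic = 2 * k - 2 * loglik
    bic = math.log(n) * k - 2 * loglik
    return r2, aic, bic

# Placeholder data (not from the dataset): one predictor plus an intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * xi for xi in x]

r2, aic, bic = fit_criteria(y, fitted, n_coef=2)
print(f"r2={r2:.3f} aic={aic:.1f} bic={bic:.1f}")
```

Because both criteria share the same log-likelihood term, they differ only in the penalty: `bic - aic` equals `k * (log(n) - 2)`.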
## 3) classification (knn, label = region)
### model A
- **k**: 5 | **accuracy**: 0.5581 | **test n**: 43
- **variables**: `AGR.new`, `AIR.new`, `APO.new`
![](figures/knn_confusion_model_A.png)
### model B
- **k**: 5 | **accuracy**: 0.5116 | **test n**: 43
- **variables**: `BCA.new`, `BDH.new`, `CBP.new`
![](figures/knn_confusion_model_B.png)
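
Both models classify a test row by majority vote among its k = 5 nearest training rows. A minimal Python sketch of that procedure, assuming Euclidean distance on three numeric features (the function names are hypothetical and the feature values are made-up placeholders, not rows from the dataset):

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=5):
    """Predict a label by majority vote among the k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Placeholder features for illustration only (not values from the dataset).
SSA, LAC = "Sub-Saharan Africa", "Latin America & Caribbean"
train_X = [(1.0, 1.2, 0.9), (0.8, 1.0, 1.1), (1.1, 0.9, 1.0),
           (0.9, 1.1, 1.2), (1.2, 1.0, 0.8),
           (3.0, 3.1, 2.9), (2.9, 3.0, 3.2), (3.1, 2.8, 3.0),
           (3.2, 3.0, 3.1), (2.8, 3.2, 3.0)]
train_y = [SSA] * 5 + [LAC] * 5
test_X = [(1.0, 1.0, 1.0), (3.0, 3.0, 3.0), (0.9, 1.2, 1.0)]
test_y = [SSA, LAC, SSA]

pred = [knn_predict(train_X, train_y, q, k=5) for q in test_X]
acc = accuracy(test_y, pred)
print(pred, acc)
```

With 43 test rows, the reported accuracies correspond to 24 correct predictions for model A (24/43 ≈ 0.5581) and 22 for model B (22/43 ≈ 0.5116). kNN is distance-based, so features on very different scales can dominate the vote; the report does not say whether the variables were standardized.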