
exploratory data analysis and models on the epi dataset

date: 2025-10-13

dataset and choices

  • file: epi_results_2024_pop_gdp_v2.csv
  • region column: region
  • response var: EPI.new
  • regions: Sub-Saharan Africa vs Latin America & Caribbean
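the choices above can be sketched as a small loading step. this is a python illustration, not the original analysis code (the `c(...)` output further down suggests that was written in R). the `country` column and the sample rows are made up; only the file name, `region`, `EPI.new` and the two region labels come from the notes.

```python
import csv
import io

# names taken from the notes above
REGION_COL = "region"
RESPONSE = "EPI.new"
REGIONS = ("Sub-Saharan Africa", "Latin America & Caribbean")


def load_regions(handle, regions=REGIONS):
    """Keep rows from the chosen regions whose response parses as a number."""
    rows = []
    for row in csv.DictReader(handle):
        if row[REGION_COL] not in regions:
            continue
        try:
            row[RESPONSE] = float(row[RESPONSE])
        except (TypeError, ValueError):
            continue  # skip missing values such as "NA" or ""
        rows.append(row)
    return rows


# made-up stand-in for open("epi_results_2024_pop_gdp_v2.csv", newline="")
demo = io.StringIO(
    "country,region,EPI.new\n"
    "A,Sub-Saharan Africa,40.0\n"
    "B,Latin America & Caribbean,50.5\n"
    "C,Eastern Europe,60.0\n"
    "D,Sub-Saharan Africa,NA\n"
)
subset = load_regions(demo)
```

rows from other regions and rows with a non-numeric response are dropped before any plotting or modelling.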

1) variable distributions

1.1 boxplots and histograms (with density!)
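a minimal sketch of the numbers behind these plots, assuming equal-width bins and at least two distinct values: the five-number summary a boxplot draws, and histogram heights on the density scale (bar areas sum to 1, so a density curve can be overlaid). the function names and the sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a boxplot draws."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


def density_histogram(values, bins):
    """Equal-width bins with heights scaled so that the bar areas sum to 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins  # assumes hi > lo
    counts = [0] * bins
    for v in values:
        # the maximum value is placed in the last bin
        i = min(int((v - lo) / width), bins - 1)
        counts[i] += 1
    n = len(values)
    return [c / (n * width) for c in counts], width


scores = [30.0, 32.0, 35.0, 38.0, 41.0, 44.0, 47.0, 50.0]  # made-up values
summary = five_number_summary(scores)
heights, width = density_histogram(scores, bins=4)
```

dividing each count by `n * width` is what puts the histogram on the same scale as a density estimate.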

1.2 qq plot (two-sample)
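a two-sample qq plot pairs the same quantiles of both samples; points near the line y = x mean the two distributions are similar. the sketch below computes those pairs with the standard library. `qq_points` is a hypothetical helper, the values are made up, and plotting tools may interpolate quantiles slightly differently.

```python
import statistics


def qq_points(sample_a, sample_b, n_quantiles=None):
    """Pair up matching quantiles of two samples (sizes may differ)."""
    if n_quantiles is None:
        n_quantiles = min(len(sample_a), len(sample_b))
    # quantiles(n=k) returns k - 1 cut points, so ask for one more
    qa = statistics.quantiles(sample_a, n=n_quantiles + 1, method="inclusive")
    qb = statistics.quantiles(sample_b, n=n_quantiles + 1, method="inclusive")
    return list(zip(qa, qb))


region_a = [1.0, 2.0, 3.0, 4.0, 5.0]        # made-up values
region_b = [11.0, 12.0, 13.0, 14.0, 15.0]   # same shape, shifted by 10
points = qq_points(region_a, region_b)
```

here every pair sits 10 units above y = x, which is what a pure shift in location looks like on a qq plot.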

2) linear models

2.1 models on the full dataset

full: EPI.new ~ gdp

full: EPI.new ~ gdp + population
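both formulas are ordinary least-squares fits with an intercept. a minimal sketch of such a fit, solving the normal equations directly; the data are made up, `ols` and `r_squared` are hypothetical helpers, and a real analysis would normally use a statistics library instead.

```python
def ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    # augmented system (X'X | X'y)
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted values."""
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss


# made-up data generated exactly as epi = 1 + 2 * gdp + 0.5 * population
gdp = [1.0, 2.0, 3.0, 4.0, 5.0]
population = [2.0, 1.0, 4.0, 3.0, 6.0]
epi = [1 + 2 * g + 0.5 * p for g, p in zip(gdp, population)]

X_small = [[g] for g in gdp]          # EPI.new ~ gdp
X_full = list(zip(gdp, population))   # EPI.new ~ gdp + population
beta_small, beta_full = ols(X_small, epi), ols(X_full, epi)
r2_small = r_squared(X_small, epi, beta_small)
r2_full = r_squared(X_full, epi, beta_full)
```

adding a predictor never lowers r², which is why the comparison between the two models also looks at aic and bic.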

2.2 same models on one region (comparison)

on region Sub-Saharan Africa, the better model is EPI.new ~ gdp + population (r²=0.361, aic=265.4, bic=272.7).
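for reference, a sketch of how aic and bic follow from a least-squares fit, assuming gaussian errors and counting the error variance as one extra parameter (the convention R uses for `lm` fits; the notes do not say how the values above were computed). the inputs in the example are made up.

```python
import math


def gaussian_ic(rss, n, n_coef):
    """AIC and BIC for a least-squares fit with Gaussian errors.

    n_coef counts the fitted coefficients including the intercept;
    one more parameter is added for the error variance.
    """
    k = n_coef + 1
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + k * math.log(n)
    return aic, bic


# made-up residual sum of squares; 3 coefficients = intercept + 2 slopes
aic, bic = gaussian_ic(rss=100.0, n=46, n_coef=3)
```

lower is better for both. the gap bic - aic equals k * (log(n) - 2), so it depends only on the sample size and the parameter count; under this convention the reported gap of 7.3 would correspond to roughly 46 observations, though the notes do not state n.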

3) classification (knn, label = region)

model A

  • k: 5 | accuracy: 0.5581 | test n: 43 | variables: c("AGR.new", "AIR.new", "APO.new")

model B

  • k: 5 | accuracy: 0.5116 | test n: 43 | variables: c("BCA.new", "BDH.new", "CBP.new")
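both models use the same procedure: a test point gets the majority region among its k = 5 nearest training points, and accuracy is the share of test points labelled correctly. a minimal sketch with euclidean distance and three features per point; the feature values are made up, the helper names are hypothetical, and no scaling or train/test split is shown.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Majority label among the k training points closest to `query`."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=5):
    """Fraction of test points whose predicted label matches the true one."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# made-up, well-separated clusters standing in for three indicator columns
SSA, LAC = "Sub-Saharan Africa", "Latin America & Caribbean"
train_X = [(i, i, i) for i in range(5)] + [(10 + i, 10 + i, 10 + i) for i in range(5)]
train_y = [SSA] * 5 + [LAC] * 5
test_X = [(1.5, 1.0, 2.0), (12.0, 11.5, 13.0)]
test_y = [SSA, LAC]
acc = accuracy(train_X, train_y, test_X, test_y, k=5)
```

with a test set of 43, the reported accuracies correspond to 24 correct predictions for model A (24/43 ≈ 0.5581) and 22 for model B (22/43 ≈ 0.5116).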