This repository has been archived on 2026-05-09.
2025-12-05 19:59:00 -05:00
model 1 linear regression
r2: 0.1566089012155698
rmse: 1.8625218879551908
coefficients:
SentimentTitle -0.383499
SentimentHeadline -0.064708
DaysSinceEpoch -0.000678
Topic_microsoft 0.101848
Topic_obama 1.779152
Topic_palestine 0.023738
dtype: float64
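The topic coefficients are most useful relative to one another. Assuming the `Topic_*` columns are 0/1 indicators (their names suggest one-hot encoding, but the source does not say so), the gap between two `Topic_*` coefficients is the model's predicted difference between those topics with every other feature held fixed; the intercept and any omitted reference topic cancel out. A minimal sketch of that arithmetic using the coefficients printed above; `predicted_gap` is a hypothetical helper, and the results are in whatever units the regression target uses, which the source does not state.

```python
# Coefficients copied from the model 1 output above.
coefficients = {
    "SentimentTitle": -0.383499,
    "SentimentHeadline": -0.064708,
    "DaysSinceEpoch": -0.000678,
    "Topic_microsoft": 0.101848,
    "Topic_obama": 1.779152,
    "Topic_palestine": 0.023738,
}


def predicted_gap(topic_from: str, topic_to: str) -> float:
    """Change in the linear prediction when only the topic indicator changes.

    Hypothetical helper: assumes 0/1 topic indicators, so the intercept and
    any omitted reference topic cancel and only two coefficients matter.
    """
    return coefficients[f"Topic_{topic_to}"] - coefficients[f"Topic_{topic_from}"]


# Moving an otherwise identical article from "microsoft" to "obama".
gap = round(predicted_gap("microsoft", "obama"), 6)

# Effect of a 365-day shift in DaysSinceEpoch, all other features fixed.
yearly_shift = round(coefficients["DaysSinceEpoch"] * 365, 6)

print(gap, yearly_shift)
```

By this reading, an "obama" article is predicted about 1.68 target units above an otherwise identical "microsoft" article, while a full year of `DaysSinceEpoch` moves the prediction by only about -0.25.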
model 2 random forest on raw ts
r2: 0.7441325592979975
rmse: 0.8661035218490399
top importances:
TS50 0.810814
SentimentHeadline 0.099992
SentimentTitle 0.067386
TS49 0.001883
TS48 0.000589
TS15 0.000503
TS18 0.000503
TS13 0.000498
TS24 0.000498
TS10 0.000480
dtype: float64
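When R^2 and RMSE are computed on the same test targets in the usual way (R^2 = 1 - SS_res / SS_tot, RMSE = sqrt(SS_res / n), as in scikit-learn's `r2_score` and the square root of `mean_squared_error`), RMSE^2 / (1 - R^2) recovers the population variance of the test targets. That gives a quick consistency check across models. This is a sketch under that assumption; the source does not say how the metrics were computed, and `implied_target_variance` is a hypothetical helper.

```python
import math


def implied_target_variance(r2: float, rmse: float) -> float:
    """Population variance of y_test implied by R^2 = 1 - MSE / Var(y_test)."""
    return rmse ** 2 / (1.0 - r2)


# (r2, rmse) pairs copied from the output above.
reported = {
    "model 1 linear regression": (0.1566089012155698, 1.8625218879551908),
    "model 2 random forest on raw ts": (0.7441325592979975, 0.8661035218490399),
}

variances = {name: implied_target_variance(r2, rmse) for name, (r2, rmse) in reported.items()}
for name, var in variances.items():
    print(f"{name}: Var(y_test) ~ {var:.3f}, SD ~ {math.sqrt(var):.3f}")
```

The implied variance is about 4.11 for model 1 but about 2.93 for model 2. Because the variance of the test targets does not depend on the model, this suggests the two models were not scored on identical test targets (for example a different split, sample, or target transform). Model 3's metrics also give about 2.93, matching model 2.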
model 3 random forest on pca(ts)
r2: 0.7442278904925559
rmse: 0.8659421602173341
pca variance explained (first 10): [9.38529911e-01 3.24317512e-02 1.76049987e-02 7.50439628e-03
1.90148973e-03 6.83679307e-04 3.57135169e-04 2.12058930e-04
1.33577763e-04 9.66846072e-05]
total variance explained: 0.9994556829781833
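The reported total is exactly the sum of the ten listed ratios, which is consistent with the PCA keeping ten components (the source does not state the component count). A short plain-Python sketch of the cumulative view, showing how quickly the leading components saturate; the 99% cutoff is an illustrative choice, not one taken from the source.

```python
from itertools import accumulate

# First ten explained-variance ratios, copied from the model 3 output above.
ratios = [
    9.38529911e-01, 3.24317512e-02, 1.76049987e-02, 7.50439628e-03,
    1.90148973e-03, 6.83679307e-04, 3.57135169e-04, 2.12058930e-04,
    1.33577763e-04, 9.66846072e-05,
]

# Running total of variance explained by the first k components.
cumulative = list(accumulate(ratios))

# Smallest number of leading components that keeps at least 99% of the variance.
n_for_99 = next(k for k, total in enumerate(cumulative, start=1) if total >= 0.99)

print(f"first component: {ratios[0]:.4f}")
print(f"components needed for 99%: {n_for_99}")
print(f"all ten components: {cumulative[-1]:.10f}")
```

The first component alone carries about 94% of the variance and four components pass 99%, which fits the near-identical scores of models 2 and 3.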
model 4 logistic regression (viral vs non-viral)
threshold (shares): 214.0
accuracy: 0.7287481626653601
f1 (positive class): 0.35709101466105386
roc auc: 0.7530964866530827
confusion matrix:
[[10669 4023]
[ 406 1230]]
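Reading the matrix with scikit-learn's layout (rows are true classes, columns are predicted classes, non-viral first) reproduces the reported accuracy and F1, so that layout appears to be the one used. The sketch below recomputes both from the four counts and adds precision, recall, the share of viral articles, and the accuracy of always predicting "non-viral".

```python
# Confusion matrix copied from the model 4 output above, read with the
# scikit-learn convention (assumed): rows = true class, columns = predicted,
# class 0 = non-viral, class 1 = viral.
(tn, fp), (fn, tp) = [[10669, 4023], [406, 1230]]
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)  # algebraically equal to 2PR / (P + R)

prevalence = (tp + fn) / total  # share of viral articles in the test set
majority_baseline = (tn + fp) / total  # accuracy of always predicting "non-viral"

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"prevalence={prevalence:.4f} majority-class accuracy={majority_baseline:.4f}")
```

With these counts recall is about 0.75 but precision only about 0.23, and roughly 10% of test articles are labeled viral. Always predicting non-viral would score about 0.90 accuracy, above the model's 0.73, so F1 and ROC AUC are the more informative numbers for this model.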
model 5 kmeans on ts shapes
silhouette score: 0.9732852082508215
         count         mean  median     max
cluster
0         4978    36.751708     3.0  7045.0
1            1  1886.000000  1886.0  1886.0
2           21  2477.761905  1291.0  8010.0
cluster centroid summary:
   cluster       avg_ts          ts1         ts10         ts25         ts50
0        0     8.317766     0.297710     2.959221     7.836079    17.221977
1        1  1885.920000  1885.000000  1886.000000  1886.000000  1886.000000
2        2   640.917143    22.761905   211.142857   579.047619  1387.619048
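The silhouette of a point i is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is its mean distance to the other members of its own cluster and b(i) is the smallest mean distance to the members of any other cluster; the reported score is the mean over all points. In the tables above, cluster 0 accounts for 4978 of the 5000 counted series, so a score near 1 likely reflects one dense, low-count group sitting far from a handful of high-count series rather than three balanced shapes. A minimal pure-Python sketch on hypothetical 1-D toy data (not the project's data), following the scikit-learn convention that points in singleton clusters score 0:

```python
from statistics import mean


def silhouette(points: list[float], labels: list[int]) -> float:
    """Mean silhouette coefficient for 1-D points with absolute distance."""
    clusters: dict[int, list[float]] = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)

    scores = []
    for x, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        # a: mean distance to the other members of the same cluster
        # (x itself contributes a distance of 0, hence the len(own) - 1 divisor).
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean(abs(x - y) for y in other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical toy data: two tight clusters and one far outlier.
points = [0.0, 0.1, 0.2, 100.0, 100.1, 500.0]
labels = [0, 0, 0, 1, 1, 2]
score = silhouette(points, labels)
print(round(score, 3))
```

Here the five clustered points each score about 0.999 and the singleton contributes 0, pulling the toy mean down to about 0.83. In the reported output the singleton cluster 1 holds a single series among 5000, so its zero carries little weight.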