model 1 – linear regression
r2: 0.1566089012155698
rmse: 1.8625218879551908
coefficients:
SentimentTitle      -0.383499
SentimentHeadline   -0.064708
DaysSinceEpoch      -0.000678
Topic_microsoft      0.101848
Topic_obama          1.779152
Topic_palestine      0.023738
dtype: float64
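
The r2 and rmse lines reported for models 1 to 3 follow the standard definitions: r2 = 1 - SS_res / SS_tot, and rmse is the square root of the mean squared residual, in the units of the target. Below is a minimal numpy sketch of both. It assumes `y_true` holds the held-out targets and `y_pred` holds a model's predictions; both names are illustrative and do not come from the original script.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy check: SS_res = 1 and SS_tot = 2, so r2 = 0.5 and rmse = sqrt(1/3)
print(r2([1, 2, 3], [1, 2, 4]))    # 0.5
print(rmse([1, 2, 3], [1, 2, 4]))  # about 0.577
```

An r2 of 0.157 therefore means the linear model removes only about 16% of the target's variance around its mean. By contrast, the random forest models below remove about 74%.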
model 2 – random forest on raw ts
r2: 0.7441325592979975
rmse: 0.8661035218490399
top importances:
TS50                 0.810814
SentimentHeadline    0.099992
SentimentTitle       0.067386
TS49                 0.001883
TS48                 0.000589
TS15                 0.000503
TS18                 0.000503
TS13                 0.000498
TS24                 0.000498
TS10                 0.000480
dtype: float64
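
The "top importances" block looks like a pandas Series built from a fitted forest's `feature_importances_` and sorted in descending order. The sketch below reproduces that shape with scikit-learn's `RandomForestRegressor` on synthetic data. The column names TS1 to TS50, SentimentTitle and SentimentHeadline mirror the log. The data, the cumulative form of the series, the target (last slice plus noise) and `n_estimators=100` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500

# Hypothetical cumulative time series: each TS column adds non-negative increments,
# so the last slices carry most of the information about the final value.
ts = np.cumsum(rng.poisson(lam=1.0, size=(n, 50)), axis=1)
X = pd.DataFrame(ts, columns=[f"TS{i}" for i in range(1, 51)])
X["SentimentTitle"] = rng.normal(size=n)     # pure noise in this sketch
X["SentimentHeadline"] = rng.normal(size=n)  # pure noise in this sketch

# Hypothetical target: the last slice plus Gaussian noise
y = X["TS50"] + rng.normal(scale=2.0, size=n)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Same layout as the log: importances indexed by column name, largest first
importances = pd.Series(model.feature_importances_, index=X.columns)
top = importances.sort_values(ascending=False).head(10)
print(top)
```

Impurity-based importances sum to 1 across all features. TS50 alone takes 0.81 in the log, so the remaining 51 features share about 0.19, and most of that goes to the two sentiment columns.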
model 3 – random forest on pca(ts)
r2: 0.7442278904925559
rmse: 0.8659421602173341
pca variance explained (first 10): [9.38529911e-01 3.24317512e-02 1.76049987e-02 7.50439628e-03
 1.90148973e-03 6.83679307e-04 3.57135169e-04 2.12058930e-04
 1.33577763e-04 9.66846072e-05]
total variance explained: 0.9994556829781833
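
The "total variance explained" figure equals the sum of the ten printed ratios, up to print rounding. This suggests the PCA kept exactly ten components, although the log infers this rather than stating it. The first component alone accounts for about 93.9% of the variance. The check below sums the ratios as printed and also finds how many components reach 99%:

```python
import numpy as np

# The ten explained-variance ratios exactly as printed for model 3
ratios = np.array([
    9.38529911e-01, 3.24317512e-02, 1.76049987e-02, 7.50439628e-03,
    1.90148973e-03, 6.83679307e-04, 3.57135169e-04, 2.12058930e-04,
    1.33577763e-04, 9.66846072e-05,
])

total = ratios.sum()
cumulative = np.cumsum(ratios)

# First position where the cumulative share reaches 99%, converted to a count
n_for_99 = int(np.argmax(cumulative >= 0.99)) + 1

print(f"{total:.8f}")  # 0.99945568, matching the logged total
print(n_for_99)        # 4
```

Ten components capture almost all of the variance in the raw series. This is consistent with model 3's r2 and rmse being nearly identical to model 2's.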
model 4 – logistic regression (viral vs non-viral)
threshold (shares): 214.0
accuracy: 0.7287481626653601
f1 (positive class): 0.35709101466105386
roc auc: 0.7530964866530827
confusion matrix:
[[10669  4023]
 [  406  1230]]
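
The accuracy and F1 lines can be recomputed from the confusion matrix. This assumes scikit-learn's layout: rows are true labels, columns are predictions, and non-viral is class 0. Under that layout, TN = 10669, FP = 4023, FN = 406 and TP = 1230.

Only 1,636 of the 16,328 test rows (about 10%) are viral. The log does not say how the 214-share threshold was chosen. With so few positives, accuracy is carried by the large non-viral class. F1 is pulled down by low precision: the model flags 5,253 rows as viral, and only 1,230 of them are.

```python
import numpy as np

# Confusion matrix as printed for model 4 (rows: true class, columns: predicted class;
# assumes scikit-learn's convention with label order [non-viral, viral])
cm = np.array([[10669, 4023],
               [406, 1230]])

tn, fp, fn, tp = cm.ravel()
total = cm.sum()

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
positive_rate = (tp + fn) / total  # share of test rows labelled viral

print(f"accuracy  {accuracy:.4f}")  # 0.7287, matching the log
print(f"precision {precision:.4f}")
print(f"recall    {recall:.4f}")
print(f"f1        {f1:.4f}")        # 0.3571, matching the log
print(f"positives {positive_rate:.3f}")
```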
model 5 – kmeans on ts shapes
silhouette score: 0.9732852082508215
         count         mean  median     max
cluster
0         4978    36.751708     3.0  7045.0
1            1  1886.000000  1886.0  1886.0
2           21  2477.761905  1291.0  8010.0
cluster centroid summary:
   cluster       avg_ts          ts1         ts10         ts25         ts50
0        0     8.317766     0.297710     2.959221     7.836079    17.221977
1        1  1885.920000  1885.000000  1886.000000  1886.000000  1886.000000
2        2   640.917143    22.761905   211.142857   579.047619  1387.619048
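
The per-cluster table has the shape of a pandas `groupby(...).agg(["count", "mean", "median", "max"])` result. The centroid summary takes a few slices (ts1, ts10, ts25, ts50) from the fitted centroids, plus a row average. The single-member cluster's avg_ts of 1885.92 is consistent with averaging fifty slices. The clusters are also very unbalanced (4,978, 1 and 21 members), so the high silhouette score likely reflects how far a few extreme series sit from the bulk.

The sketch below assumes scikit-learn's `KMeans` with k = 3 and uses synthetic series. The summarized `final` column is hypothetical, because the log does not name the variable behind count/mean/median/max. In the sketch, the last slice stands in for it.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Hypothetical cumulative series: a large low-volume group plus a few heavy hitters
low = np.cumsum(rng.poisson(0.3, size=(480, 50)), axis=1)
high = np.cumsum(rng.poisson(30.0, size=(20, 50)), axis=1)
ts = np.vstack([low, high]).astype(float)
cols = [f"ts{i}" for i in range(1, 51)]

km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(ts)
score = silhouette_score(ts, labels)
print("silhouette score:", score)

# Per-cluster summary of a hypothetical "final" value (here: the last slice)
df = pd.DataFrame(ts, columns=cols)
df["final"] = ts[:, -1]
df["cluster"] = labels
print(df.groupby("cluster")["final"].agg(["count", "mean", "median", "max"]))

# Centroid summary in the log's layout: cluster id, row average, selected slices
centroids = pd.DataFrame(km.cluster_centers_, columns=cols)
summary = centroids[["ts1", "ts10", "ts25", "ts50"]].copy()
summary.insert(0, "avg_ts", centroids.mean(axis=1))
summary.insert(0, "cluster", centroids.index)
print("cluster centroid summary:")
print(summary)
```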