model 1 – linear regression
r2: 0.1566089012155698
rmse: 1.8625218879551908
coefficients:
SentimentTitle      -0.383499
SentimentHeadline   -0.064708
DaysSinceEpoch      -0.000678
Topic_microsoft      0.101848
Topic_obama          1.779152
Topic_palestine      0.023738
dtype: float64

model 2 – random forest on raw ts
r2: 0.7441325592979975
rmse: 0.8661035218490399
top importances:
TS50                 0.810814
SentimentHeadline    0.099992
SentimentTitle       0.067386
TS49                 0.001883
TS48                 0.000589
TS15                 0.000503
TS18                 0.000503
TS13                 0.000498
TS24                 0.000498
TS10                 0.000480
dtype: float64

model 3 – random forest on pca(ts)
r2: 0.7442278904925559
rmse: 0.8659421602173341
pca variance explained (first 10):
[9.38529911e-01 3.24317512e-02 1.76049987e-02 7.50439628e-03 1.90148973e-03
 6.83679307e-04 3.57135169e-04 2.12058930e-04 1.33577763e-04 9.66846072e-05]
total variance explained: 0.9994556829781833

model 4 – logistic regression (viral vs non-viral)
threshold (shares): 214.0
accuracy: 0.7287481626653601
f1 (positive class): 0.35709101466105386
roc auc: 0.7530964866530827
confusion matrix:
[[10669  4023]
 [  406  1230]]

model 5 – kmeans on ts shapes
silhouette score: 0.9732852082508215
         count         mean  median     max
cluster
0         4978    36.751708     3.0  7045.0
1            1  1886.000000  1886.0  1886.0
2           21  2477.761905  1291.0  8010.0
cluster centroid summary:
   cluster       avg_ts          ts1         ts10         ts25         ts50
0        0     8.317766     0.297710     2.959221     7.836079    17.221977
1        1  1885.920000  1885.000000  1886.000000  1886.000000  1886.000000
2        2   640.917143    22.761905   211.142857   579.047619  1387.619048
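
The accuracy and F1 reported for model 4 can be cross-checked against its confusion matrix. The sketch below is not part of the original script. It assumes the matrix follows scikit-learn's layout (rows are true classes, columns are predicted classes, ordered non-viral then viral); the variable names are illustrative.

```python
# Confusion matrix from model 4.
# ASSUMPTION: scikit-learn layout, rows = true class, columns = predicted class,
# class order [non-viral, viral].
tn, fp = 10669, 4023   # true non-viral: predicted non-viral / predicted viral
fn, tp = 406, 1230     # true viral:     predicted non-viral / predicted viral

total = tn + fp + fn + tp

# Accuracy: share of all samples that were classified correctly.
accuracy = (tn + tp) / total

# Precision and recall for the positive (viral) class.
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# F1: harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy:  {accuracy:.6f}")
print(f"precision: {precision:.6f}")
print(f"recall:    {recall:.6f}")
print(f"f1:        {f1:.6f}")
```

Under that layout assumption, the recomputed accuracy and F1 agree with the reported values. Precision comes out at roughly 0.23 and recall at roughly 0.75, which accounts for the low F1 alongside an accuracy of about 0.73.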