[Running] Rscript "/home/ion606/Desktop/Homework/Data Analytics/Assignment IV/analysis.r"
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package to force all conflicts to become errors
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo
[1] "month"       "status"      "new_answers"
# A tibble: 6 × 3
  month      status      new_answers
1 2018-01-01 deleted              26
2 2018-01-01 non-deleted         159
3 2018-02-01 deleted              20
4 2018-02-01 non-deleted         175
5 2018-03-01 deleted              18
6 2018-03-01 non-deleted         193
Rows: 95
Columns: 11
$ month               2018-01-01, 2018-02-01, 2018-03-01, 2018-04-01, 2…
$ answers_total       185, 195, 211, 221, 227, 189, 149, 179, 198, 232, …
$ answers_non_deleted 159, 175, 193, 191, 203, 172, 133, 154, 170, 198, …
$ answers_deleted     26, 20, 18, 30, 24, 17, 16, 25, 28, 34, 20, 45, 33…
$ year                2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 20…
$ month_num           1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4,…
$ time_index          1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,…
$ post_chatgpt        FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ post_ai_policy      FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ during_mod_strike   FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F…
$ period              "pre_chatgpt", "pre_chatgpt", "pre_chatgpt", "pre_…
file already exists: data/stack-overflow-developer-survey-2023.zip
file already exists: data/stack-overflow-developer-survey-2024.zip
 [1] "ResponseId"                     "Q120"
 [3] "MainBranch"                     "Age"
 [5] "Employment"                     "RemoteWork"
 [7] "CodingActivities"               "EdLevel"
 [9] "LearnCode"                      "LearnCodeOnline"
[11] "LearnCodeCoursesCert"           "YearsCode"
[13] "YearsCodePro"                   "DevType"
[15] "OrgSize"                        "PurchaseInfluence"
[17] "TechList"                       "BuyNewTool"
[19] "Country"                        "Currency"
[21] "CompTotal"                      "LanguageHaveWorkedWith"
[23] "LanguageWantToWorkWith"         "DatabaseHaveWorkedWith"
[25] "DatabaseWantToWorkWith"         "PlatformHaveWorkedWith"
[27] "PlatformWantToWorkWith"         "WebframeHaveWorkedWith"
[29] "WebframeWantToWorkWith"         "MiscTechHaveWorkedWith"
[31] "MiscTechWantToWorkWith"         "ToolsTechHaveWorkedWith"
[33] "ToolsTechWantToWorkWith"        "NEWCollabToolsHaveWorkedWith"
[35] "NEWCollabToolsWantToWorkWith"   "OpSysPersonal use"
[37] "OpSysProfessional use"          "OfficeStackAsyncHaveWorkedWith"
[39] "OfficeStackAsyncWantToWorkWith" "OfficeStackSyncHaveWorkedWith"
[41] "OfficeStackSyncWantToWorkWith"  "AISearchHaveWorkedWith"
[43] "AISearchWantToWorkWith"         "AIDevHaveWorkedWith"
[45] "AIDevWantToWorkWith"            "NEWSOSites"
[47] "SOVisitFreq"                    "SOAccount"
[49] "SOPartFreq"                     "SOComm"
[51] "SOAI"                           "AISelect"
[53] "AISent"                         "AIAcc"
[55] "AIBen"                          "AIToolInterested in Using"
[57] "AIToolCurrently Using"          "AIToolNot interested in Using"
[59] "AINextVery different"           "AINextNeither different nor similar"
[61] "AINextSomewhat similar"         "AINextVery similar"
[63] "AINextSomewhat different"       "TBranch"
[65] "ICorPM"                         "WorkExp"
[67] "Knowledge_1"                    "Knowledge_2"
[69] "Knowledge_3"                    "Knowledge_4"
[71] "Knowledge_5"                    "Knowledge_6"
[73] "Knowledge_7"                    "Knowledge_8"
[75] "Frequency_1"                    "Frequency_2"
[77] "Frequency_3"                    "TimeSearching"
[79] "TimeAnswering"                  "ProfessionalTech"
2023 so visit col: SOVisitFreq
2023 ai col      : SOAI
2024 so visit col: SOVisitFreq
2024 ai col      : AISelect
Rows: 146,676
Columns: 10
$ year         2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 2023, 202…
$ main_branch  "I am a developer by profession", "I am a developer by pr…
$ country      "United States of America", "United States of America", "…
$ age          25, 45, 25, 25, 35, 35, 25, 45, 25, 25, 25, 25, 35, 25, 3…
$ gender       Unknown, Unknown, Unknown, Unknown, Unknown, Unknown, Unk…
$ so_visit     "Daily or almost daily", "A few times per month or weekly…
$ ai_select    "I don't think it's super necessary, but I think improvin…
$ frequent_so  1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, …
$ uses_chatgpt 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, …
$ age_group    25-34, 45+, 25-34, 25-34, 35-44, 35-44, 25-34, 45+, 25-34…
# A tibble: 2 × 7
  period n_months mean_answers median_answers sd_answers min_answers max_answers
1 post_…       36         90.5             88       38.0          11         157
2 pre_c…       59        193.             185       44.7         122         313
Warning message:
Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).
Warning message:
Removed 2 rows containing missing values or values outside the scale range
(`geom_line()`).
[1] -10.02227

Call:
lm(formula = answers_total ~ time + post_chatgpt + chatgpt_time,
    data = its_data)

Residuals:
    Min      1Q  Median      3Q     Max
-76.623 -22.914  -3.868  13.431 123.402

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      218.8013     9.3214  23.473  < 2e-16 ***
time              -0.8589     0.2702  -3.179 0.002022 **
post_chatgptTRUE -17.9635    15.0779  -1.191 0.236601
chatgpt_time      -2.3661     0.6282  -3.767 0.000293 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 35.35 on 91 degrees of freedom
Multiple R-squared:  0.717,  Adjusted R-squared:  0.7077
F-statistic: 76.86 on 3 and 91 DF,  p-value: < 2.2e-16

# A tibble: 4 × 5
  term             estimate std.error statistic  p.value
1 (Intercept)       219.        9.32      23.5  2.23e-40
2 time               -0.859     0.270     -3.18 2.02e- 3
3 post_chatgptTRUE  -18.0      15.1       -1.19 2.37e- 1
4 chatgpt_time       -2.37      0.628     -3.77 2.93e- 4
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
1     0.717         0.708  35.3      76.9 7.39e-25     3  -471.  953.  966.
# ℹ 3 more variables: deviance, df.residual, nobs

Call:
glm(formula = answers_total ~ time + post_chatgpt + chatgpt_time,
    family = poisson(link = "log"), data = its_data)

Coefficients:
                   Estimate Std. Error z value Pr(>|z|)
(Intercept)       5.3936301  0.0183909 293.277  < 2e-16 ***
time             -0.0044547  0.0005512  -8.082 6.38e-16 ***
post_chatgptTRUE -0.0187737  0.0365851  -0.513    0.608
chatgpt_time     -0.0322028  0.0018440 -17.464  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2879.9  on 94  degrees of freedom
Residual deviance:  713.8  on 91  degrees of freedom
AIC: 1363

Number of Fisher Scoring iterations: 4

# A tibble: 4 × 5
  term             estimate std.error statistic  p.value
1 (Intercept)       220.     0.0184     293.    0
2 time                0.996  0.000551    -8.08  6.38e-16
3 post_chatgptTRUE    0.981  0.0366      -0.513 6.08e- 1
4 chatgpt_time        0.968  0.00184    -17.5   2.71e-68
Series: train_ts
ARIMA(1,1,0)(1,0,0)[12]

Coefficients:
          ar1    sar1
      -0.3956  0.3016
s.e.   0.1360  0.1381

sigma^2 = 1142:  log likelihood = -281.17
AIC=568.34   AICc=568.8   BIC=574.47

Training set error measures:
                     ME     RMSE      MAE       MPE     MAPE      MASE
Training set -0.1691686 32.90678 26.65938 -1.989033 14.30025 0.5170032
                   ACF1
Training set 0.03124461
                      ME     RMSE      MAE         MPE      MAPE      MASE
Training set  -0.1691686 32.90678 26.65938   -1.989033  14.30025 0.5170032
Test set     -78.4100374 89.26691 79.26493 -171.518981 171.98870 1.5371782
                   ACF1 Theil's U
Training set 0.03124461        NA
Test set     0.73383075   7.11443
dropping predictors with <2 levels: gender
classification threshold (training frequent_so share): 0.384

Call:
glm(formula = logit_formula, family = binomial(link = "logit"),
    data = survey_train)

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)      -0.358743   0.013009 -27.577  < 2e-16 ***
uses_chatgpt     -0.006783   0.066977  -0.101  0.91933
age_group25-34    0.040677   0.015439   2.635  0.00842 **
age_group35-44   -0.207571   0.017478 -11.876  < 2e-16 ***
age_group45+     -0.345739   0.020289 -17.041  < 2e-16 ***
age_groupunknown -0.222739   0.096177  -2.316  0.02056 *
year2024         -0.082452   0.012319  -6.693 2.18e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 156271  on 117339  degrees of freedom
Residual deviance: 155647  on 117333  degrees of freedom
AIC: 155661

Number of Fisher Scoring iterations: 4

# A tibble: 7 × 7
  term             estimate std.error statistic   p.value conf.low conf.high
1 (Intercept)         0.699    0.0130   -27.6   2.10e-167    0.681     0.717
2 uses_chatgpt        0.993    0.0670    -0.101 9.19e-  1    0.870     1.13
3 age_group25-34      1.04     0.0154     2.63  8.42e-  3    1.01      1.07
4 age_group35-44      0.813    0.0175   -11.9   1.57e- 32    0.785     0.841
5 age_group45+        0.708    0.0203   -17.0   4.09e- 65    0.680     0.736
6 age_groupunknown    0.800    0.0962    -2.32  2.06e-  2    0.662     0.965
7 year2024            0.921    0.0123    -6.69  2.18e- 11    0.899     0.943
     pred
truth    0     1
    0 7560 10549
    1 4009  7218
$accuracy
[1] 0.5037497

$precision
[1] 0.4062588

$recall
[1] 0.6429144

[Done] exited with code=0 in 12.272 seconds