Slide 1 Title

Title: Did Stack Overflow Answers Increase After ChatGPT?

  • Changes in Stack Overflow answer activity post-ChatGPT launch
  • Impact of related policy events
  • Developer behavior balancing Stack Overflow vs. AI tools

Slide 2 Research Question

Research Questions:

  1. Volume of answers:
    • Did Stack Overflow answers change systematically after ChatGPT launched (late 2022)?
  2. Policy/event impact:
    • Did AI-answer policies and moderation events create additional shifts?
  3. Substitution effect:
    • Are heavy ChatGPT users visiting/answering less on Stack Overflow?

Approach:

  • Look for structural breaks in answer time series
  • Link site-level patterns to developer survey data

Slide 3 Data Sources

Dataset 1:

  • Monthly new answer counts (2018–2025)
  • Pulled from Stack Exchange Data Explorer
  • Includes deleted posts
  • Provides pre-ChatGPT baseline and post-event window

Dataset 2:

  • Microdata from Stack Overflow Developer Surveys (2023–2025)
  • Focus:
    • Visit frequency
    • Adoption of AI tools like ChatGPT

Exploratory Plots:

  • Raw time series
  • Pre/post comparisons
  • Seasonality
  • Moving averages
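The moving-average plot can be sketched with a centered 12-month window, which cancels the within-year cycle and exposes the trend. A minimal numpy sketch on simulated counts (all numbers are hypothetical, for illustration only):

```python
import numpy as np

# Simulated monthly counts: trend + within-year seasonality + noise
# (all numbers hypothetical, for illustration only)
rng = np.random.default_rng(4)
months = np.arange(84)                               # e.g. 2018-01..2024-12
seasonal = 5_000 * np.sin(2 * np.pi * months / 12)   # summer / year-end dips
y = 200_000 - 500 * months + seasonal + rng.normal(0, 2_000, 84)

# A 12-month moving average sums over one full seasonal cycle,
# so the seasonal component cancels and the trend remains
window = 12
ma = np.convolve(y, np.ones(window) / window, mode="valid")
```

Plotting `y` and `ma` together is the pre/post comparison shown on the next slide.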

Slide 4 Preliminary Patterns

Key Observations:

  • Long-run time series:
    • Downward drift in answers pre-2022
    • Sharper drop in level and slope post-ChatGPT launch
  • Pre/post comparison:
    • Post-ChatGPT period sits lower, even after accounting for seasonal dips (e.g., summer, year-end)
  • Seasonal plots:
    • 2018–2025 series share a consistent within-year rhythm
    • Confirms the changes aren't due to seasonality

Slide 5 Methodology

Modelling Strategies:

  1. Interrupted Time-Series Regression (ITS):
    • Predictors: time trend, level jump (ChatGPT launch), slope change
    • Optional indicators: policy/moderation periods
  2. Poisson/Negative-Binomial Count Models:
    • Predictors: same as ITS
    • Suitable for count data
    • Quantifies percentage changes per month
  3. ARIMA Model:
    • Trained on pre-ChatGPT data
    • Forecasts counterfactual trajectory
    • Compares observed vs. predicted post-event counts
  4. Survey Logistic Regression:
    • Predicts frequent Stack Overflow visits
    • Predictors: ChatGPT usage, demographics
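The ITS specification above can be sketched as an ordinary least-squares fit with a time trend, a level-jump indicator, and a slope-change term. A minimal numpy version on simulated monthly counts (the break index and all coefficients are hypothetical):

```python
import numpy as np

# Simulated monthly answer counts, e.g. 2018-01..2024-12 (hypothetical numbers)
rng = np.random.default_rng(0)
n, brk = 84, 58                      # brk ~ ChatGPT launch month (Nov 2022)
t = np.arange(n)
post = (t >= brk).astype(float)      # level-jump indicator
post_t = post * (t - brk)            # slope-change term
y = 200_000 - 500 * t - 30_000 * post - 2_000 * post_t + rng.normal(0, 3_000, n)

# ITS design matrix: intercept, pre-existing trend, level jump, slope change
X = np.column_stack([np.ones(n), t, post, post_t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, trend, level_shift, slope_change = beta
```

Negative `level_shift` and `slope_change` estimates are the "level jump" and "steeper decline" reported in the findings; policy/moderation indicators would enter as extra columns of `X`.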

Diagnostics:

  • Residual checks
  • Over-dispersion
  • Out-of-sample performance
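The count-model step and the over-dispersion diagnostic can be sketched together: a log-linear Poisson regression fit by iteratively reweighted least squares, followed by the Pearson dispersion statistic (simulated data; in practice a library such as statsmodels would be used):

```python
import numpy as np

def poisson_irls(X, y, n_iter=25):
    """Log-linear Poisson regression via iteratively reweighted least squares."""
    # Warm-start from least squares on log counts to keep IRLS stable
    beta, *_ = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                       # current fitted means
        z = X @ beta + (y - mu) / mu                # working response
        beta = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))
    return beta

rng = np.random.default_rng(1)
n, brk = 84, 58
t = np.arange(n)
post = (t >= brk).astype(float)
post_t = post * (t - brk)
# Hypothetical truth: ~0.5%/month decline pre-break, an extra ~2%/month after
y = rng.poisson(np.exp(12.0 - 0.005 * t - 0.25 * post - 0.02 * post_t))

X = np.column_stack([np.ones(n), t, post, post_t])
beta = poisson_irls(X, y.astype(float))
post_pct = (np.exp(beta[1] + beta[3]) - 1) * 100    # post-break monthly % change

# Over-dispersion check: Pearson chi-square / residual df (~1 if truly Poisson;
# much larger values would motivate the negative-binomial variant)
mu = np.exp(X @ beta)
dispersion = np.sum((y - mu) ** 2 / mu) / (n - X.shape[1])
```

The log link is what turns coefficients into the "percentage changes per month" quoted on the slide.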

Slide 6 Model Fits & Counterfactuals

Findings:

  • Interrupted Time-Series Regression:
    • Downward level shift post-2022
    • Steeper negative slope post-ChatGPT
    • Controls for pre-existing trend
  • Poisson Model:
    • Pre-ChatGPT: mild monthly contraction
    • Post-ChatGPT: steeper decline (compounds over time)
  • ARIMA Forecast:
    • Trained on pre-ChatGPT data
    • Post-2022 counts fall below 80% prediction interval
    • Observed counts never recover

Takeaway:

  • Structural break in answer supply following the ChatGPT launch and subsequent policy changes
  • Changes not explained by trend/seasonality alone
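The counterfactual logic can be illustrated with a deliberately simplified stand-in for the ARIMA step: fit a trend on pre-break data only, project it forward, and check whether observed post-break counts fall below an approximate 80% prediction interval (all numbers simulated and hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, brk = 84, 58
t = np.arange(n)
y = 200_000 - 500 * t + rng.normal(0, 3_000, n)     # hypothetical baseline
y[brk:] -= 30_000 + 2_000 * np.arange(n - brk)      # hypothetical post-launch drop

# Fit the trend on pre-break months only, then project it forward
X_pre = np.column_stack([np.ones(brk), t[:brk]])
beta, *_ = np.linalg.lstsq(X_pre, y[:brk], rcond=None)
resid_sd = np.std(y[:brk] - X_pre @ beta)

forecast = beta[0] + beta[1] * t[brk:]              # counterfactual trajectory
lower_80 = forecast - 1.28 * resid_sd               # approx. 80% lower bound
share_below = np.mean(y[brk:] < lower_80)
```

"Post-2022 counts fall below the 80% prediction interval" corresponds to `share_below` staying near 1; a full ARIMA would additionally model autocorrelation in the residuals.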

Slide 7 Survey Results

Key Insights:

  • ChatGPT Adoption (2023):
    • Widespread among developers, especially heavy coders
    • Daily use common in workflows
  • Visit Frequency (2023–2024):
    • 2023: Heavy ChatGPT users visit Stack Overflow at similar daily rates as non-users
    • 2024: Frequent visits drop more for heavy ChatGPT users
  • Logistic Regression:
    • ChatGPT usage alone: weak predictor of visit frequency (accuracy in the low 50% range)
    • Combined with cross-tabs: supports partial substitution (marginal questions shifted to ChatGPT)
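A sketch of the survey logistic regression, fit by Newton-Raphson on simulated responses; with a single weak binary predictor, accuracy in the low-50% range falls out naturally (variable names and effect sizes are hypothetical):

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression fit by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                            # IRLS weights
        beta = beta + np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

rng = np.random.default_rng(3)
n = 5_000
heavy_ai = rng.integers(0, 2, n).astype(float)       # hypothetical predictor
# Hypothetical weak effect: heavy AI users slightly less likely to visit daily
p_visit = 0.50 - 0.08 * heavy_ai
visits_daily = (rng.random(n) < p_visit).astype(float)

X = np.column_stack([np.ones(n), heavy_ai])
beta = fit_logistic(X, visits_daily)
pred = (1.0 / (1.0 + np.exp(-X @ beta))) > 0.5
accuracy = float(np.mean(pred == visits_daily))
```

The coefficient on `heavy_ai` comes out negative (consistent with partial substitution) even though classification accuracy stays barely above chance, which is why the cross-tabs are needed to interpret it.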

Slide 8 Key Findings

Summary:

  • Monthly answers on Stack Overflow:
    • Sharp drop post-ChatGPT release
    • Continued lower trend (even after controlling for pre-existing decline)
  • Policy/moderation events:
    • Additional dips align with governance decisions
    • Suggest amplification of ChatGPT effect
  • ARIMA counterfactuals:
    • Post-2022 counts outside expected range of pre-ChatGPT dynamics
  • Substitution effect:
    • Heavy ChatGPT users less likely to visit Stack Overflow daily over time

Slide 9 Limitations

Caveats:

  1. Causality:
    • Overlap of ChatGPT, AI policies, moderation strike
    • Broader economic/tooling trends also in play
  2. SEDE Data:
    • Doesn't capture moderation queues/private spaces
    • Some activity may be invisible
  3. Survey Data:
    • Self-reported
    • May under-represent active answerers or certain regions/roles

Interpretation:

  • Results are correlational evidence of shifts in answer supply/usage patterns
  • Not a precise causal estimate of “ChatGPT effect”

Slide 10 Implications & Future Work

Implications:

  • Answer supply sensitive to:
    • Assistance tooling
    • Governance decisions
  • Platforms should:
    • Carefully consider AI policies/moderation capacity
    • Explore integration with conversational assistants (e.g., structured answer APIs)

Future Work:

  • Tag-level/user-cohort analyses
  • Stronger quasi-experimental designs (e.g., synthetic controls)