added presentation
@@ -0,0 +1,171 @@
### Slide 1 – Title

**Title:** Did Stack Overflow Answers Increase After ChatGPT?
- Changes in Stack Overflow answer activity post-ChatGPT launch
- Impact of related policy events
- Developer behavior balancing Stack Overflow vs. AI tools

---

### Slide 2 – Research Question

**Research Questions:**
1. Volume of answers:
   - Did Stack Overflow answers change systematically after ChatGPT launched (late 2022)?
2. Policy/event impact:
   - Did AI-answer policies and moderation events create additional shifts?
3. Substitution effect:
   - Are heavy ChatGPT users visiting/answering less on Stack Overflow?

**Approach:**
- Look for structural breaks in answer time series
- Link site-level patterns to developer survey data

---

### Slide 3 – Data Sources

**Dataset 1:**
- Monthly new answer counts (2018–2025)
- Pulled from Stack Exchange Data Explorer
- Includes deleted posts
- Provides pre-ChatGPT baseline and post-event window

**Dataset 2:**
- Microdata from Stack Overflow Developer Surveys (2023–2025)
- Focus:
  - Visit frequency
  - Adoption of AI tools like ChatGPT

**Exploratory Plots:**
- Raw time series
- Pre/post comparisons
- Seasonality
- Moving averages
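
The moving-average plots listed above could be produced from a smoothing step like the one below. This is a minimal sketch, not the project's actual code: the function name `trailing_moving_average`, the window length, and the sample counts are all assumptions made for illustration.

```python
from statistics import mean


def trailing_moving_average(counts, window=12):
    """Average each month with the (window - 1) months before it.

    Positions that do not yet have a full window are returned as None.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    for i in range(len(counts)):
        if i + 1 < window:
            smoothed.append(None)  # not enough history yet
        else:
            smoothed.append(mean(counts[i + 1 - window : i + 1]))
    return smoothed


# Hypothetical monthly answer counts (illustrative only, not SEDE output).
monthly_answers = [100, 110, 90, 120, 80, 100]
smoothed = trailing_moving_average(monthly_answers, window=3)
```

A 12-month window is a common choice for monthly data because it averages over one full seasonal cycle; the shorter window here only keeps the example small.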

---

### Slide 4 – Preliminary Patterns

**Key Observations:**
- Long-run time series:
  - Downward drift in answers pre-2022
  - Sharper drop in level and slope post-ChatGPT launch
- Pre/post comparison:
  - Post-ChatGPT period sits lower, even after accounting for seasonal dips (e.g., summer, year-end)
- Seasonal plots:
  - Years 2018–2025 share a consistent within-year rhythm
  - Indicates the post-launch changes are not a seasonal artifact

---

### Slide 5 – Methodology

**Modelling Strategies:**
1. **Interrupted Time-Series Regression (ITS):**
   - Predictors: time trend, level jump (ChatGPT launch), slope change
   - Optional indicators: policy/moderation periods
2. **Poisson/Negative-Binomial Count Models:**
   - Predictors: same as ITS
   - Suitable for count data
   - Quantifies percentage changes per month
3. **ARIMA Model:**
   - Trained on pre-ChatGPT data
   - Forecasts counterfactual trajectory
   - Compares observed vs. predicted post-event counts
4. **Survey Logistic Regression:**
   - Predicts frequent Stack Overflow visits
   - Predictors: ChatGPT usage, demographics
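
The ITS predictors named in item 1 (time trend, level jump, slope change) can be sketched as a small design matrix fitted by ordinary least squares. This is an illustrative sketch under stated assumptions: the helper names, the launch index, and the noise-free synthetic series are hypothetical, and the presentation does not say which software was actually used for the fit.

```python
def its_design_row(t, launch_index):
    """Return [intercept, time, post, months_since_launch] for month index t."""
    post = 1.0 if t >= launch_index else 0.0
    return [1.0, float(t), post, post * (t - launch_index)]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_ols(rows, y):
    """Least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free series built from known coefficients (illustration only):
# baseline 1000, trend -5/month, level jump -100 at launch, slope change -10/month.
launch = 6  # hypothetical index of the launch month
true_beta = [1000.0, -5.0, -100.0, -10.0]
rows = [its_design_row(t, launch) for t in range(12)]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in rows]
beta = fit_ols(rows, y)  # recovers true_beta up to rounding error
```

In this parameterization the third coefficient is the level jump at launch and the fourth is the change in slope afterwards, so the post-launch slope is the sum of the second and fourth coefficients. Policy or moderation indicators would enter as additional 0/1 columns.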

**Diagnostics:**
- Residual checks
- Over-dispersion
- Out-of-sample performance
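
One common way to run the over-dispersion check is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below assumes fitted means from a Poisson model are already available; the function name and the numbers are hypothetical and not taken from the analysis.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Under a well-specified Poisson model the ratio is close to 1;
    values well above 1 point to over-dispersion.
    """
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / dof


# Hypothetical counts and fitted means (illustration only).
observed = [90, 130, 70, 110]
fitted = [100.0, 100.0, 100.0, 100.0]
ratio = pearson_dispersion(observed, fitted, n_params=1)
```

A ratio far above 1, as in this toy example, is the usual motivation for moving from a Poisson to a negative-binomial specification.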

---

### Slide 6 – Model Fits & Counterfactuals

**Findings:**
- **Interrupted Time-Series Regression:**
  - Downward level shift at the ChatGPT launch (late 2022)
  - Steeper negative slope post-ChatGPT
  - Controls for pre-existing trend
- **Poisson Model:**
  - Pre-ChatGPT: mild monthly contraction
  - Post-ChatGPT: steeper decline (compounds over time)
- **ARIMA Forecast:**
  - Trained on pre-ChatGPT data
  - Post-launch counts fall below the lower bound of the 80% prediction interval
  - Observed counts never recover
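
The Poisson bullets above describe a monthly contraction that compounds. In a log-link count model, a slope on the time variable converts to a percentage change per month as exp(slope) - 1. The sketch below shows that conversion; the slope values are hypothetical and are not the fitted estimates from this analysis.

```python
import math


def pct_change_per_month(log_rate_slope):
    """Convert a log-link slope into a percentage change per month."""
    return (math.exp(log_rate_slope) - 1.0) * 100.0


def cumulative_pct_change(log_rate_slope, months):
    """Percentage change after the slope compounds over `months` months."""
    return (math.exp(log_rate_slope * months) - 1.0) * 100.0


# Hypothetical slopes, chosen so the arithmetic is easy to follow.
pre_slope = math.log(0.995)   # assumed 0.5% monthly contraction before launch
post_slope = math.log(0.98)   # assumed 2% monthly contraction after launch

monthly_post = pct_change_per_month(post_slope)        # about -2 percent
yearly_post = cumulative_pct_change(post_slope, 12)    # about -21.5 percent
```

If the model uses an ITS-style specification, the post-launch slope is the sum of the baseline time coefficient and the slope-change coefficient, so that sum is what should be passed to these helpers.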

**Takeaway:**
- Structural break in answer supply post-ChatGPT and policy changes
- Changes not explained by trend/seasonality alone
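
One way to make "not explained by trend/seasonality alone" concrete is to tally the post-launch months in which the observed count falls below the lower bound of the counterfactual forecast interval. The sketch below assumes the interval bounds have already been produced by a forecasting model; the function name and all numbers are hypothetical.

```python
def months_below_interval(observed, lower_bounds):
    """Return indices of months whose observed count is under the lower bound."""
    if len(observed) != len(lower_bounds):
        raise ValueError("observed and lower_bounds must have the same length")
    return [
        i
        for i, (count, lower) in enumerate(zip(observed, lower_bounds))
        if count < lower
    ]


# Hypothetical post-launch counts and forecast lower bounds (illustration only).
observed_post = [95, 80, 70, 60]
forecast_lower = [90, 85, 82, 80]
below = months_below_interval(observed_post, forecast_lower)
share_below = len(below) / len(observed_post)
```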

---

### Slide 7 – Survey Results

**Key Insights:**
- **ChatGPT Adoption (2023):**
  - Widespread among developers, especially heavy coders
  - Daily use common in workflows
- **Visit Frequency (2023–2024):**
  - 2023: Heavy ChatGPT users visit Stack Overflow daily at rates similar to non-users
  - 2024: Frequent visits drop more for heavy ChatGPT users
- **Logistic Regression:**
  - ChatGPT usage alone: weak predictor of visit frequency (accuracy in the low 50% range)
  - Combined with cross-tabs: supports partial substitution (marginal questions shifted to ChatGPT)
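
Whether an accuracy in the low 50% range counts as weak depends on class balance, which the slide does not state. A simple check is to compare the classifier against a majority-class baseline. The sketch below uses hypothetical labels and predictions; it is not the survey data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most common label."""
    return max(Counter(y_true).values()) / len(y_true)


# Hypothetical labels: 1 = frequent Stack Overflow visitor, 0 = not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline_accuracy(y_true)
lift = model_acc - baseline_acc
```

A lift near zero would support reading the usage variable as a weak predictor on its own; a clearly positive lift would suggest it carries some signal.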

---

### Slide 8 – Key Findings

**Summary:**
- Monthly answers on Stack Overflow:
  - Sharp drop post-ChatGPT release
  - Continued lower trend (even after controlling for pre-existing decline)
- Policy/moderation events:
  - Additional dips align with governance decisions
  - Suggest amplification of ChatGPT effect
- ARIMA counterfactuals:
  - Post-2022 counts outside expected range of pre-ChatGPT dynamics
- Substitution effect:
  - Heavy ChatGPT users less likely to visit Stack Overflow daily over time

---

### Slide 9 – Limitations

**Caveats:**
1. **Causality:**
   - Overlap of ChatGPT, AI policies, moderation strike
   - Broader economic/tooling trends also in play
2. **SEDE Data:**
   - Doesn’t capture moderation queues/private spaces
   - Some activity may be invisible
3. **Survey Data:**
   - Self-reported
   - May under-represent active answerers or certain regions/roles

**Interpretation:**
- Results are **correlational evidence** of shifts in answer supply/usage patterns
- Not a precise causal estimate of “ChatGPT effect”

---

### Slide 10 – Implications & Future Work

**Implications:**
- Answer supply sensitive to:
  - Assistance tooling
  - Governance decisions
- Platforms should:
  - Carefully consider AI policies/moderation capacity
  - Explore integration with conversational assistants (e.g., structured answer APIs)

**Future Work:**
- Tag-level/user-cohort analyses
- Stronger quasi-experimental designs (e.g., synthetic controls)