added presentation

ION606
2025-12-08 17:38:01 -05:00
parent 091831c67c
commit a9f73e4314
@@ -0,0 +1,171 @@
### Slide 1 Title
**Title:** Did Stack Overflow Answers Increase After ChatGPT?
- Changes in Stack Overflow answer activity post-ChatGPT launch
- Impact of related policy events
- Developer behavior balancing Stack Overflow vs. AI tools
---
### Slide 2 Research Question
**Research Questions:**
1. Volume of answers:
- Did Stack Overflow answers change systematically after ChatGPT launched (late 2022)?
2. Policy/event impact:
- Did AI-answer policies and moderation events create additional shifts?
3. Substitution effect:
- Are heavy ChatGPT users visiting/answering less on Stack Overflow?
**Approach:**
- Look for structural breaks in answer time series
- Link site-level patterns to developer survey data
---
### Slide 3 Data Sources
**Dataset 1:**
- Monthly new answer counts (2018–2025)
- Pulled from Stack Exchange Data Explorer
- Includes deleted posts
- Provides pre-ChatGPT baseline and post-event window
**Dataset 2:**
- Microdata from Stack Overflow Developer Surveys (2023–2025)
- Focus:
- Visit frequency
- Adoption of AI tools like ChatGPT
**Exploratory Plots:**
- Raw time series
- Pre/post comparisons
- Seasonality
- Moving averages
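
The moving-average and seasonality views listed above can be sketched as follows. This is a minimal illustration, not the code behind the plots: the function names are hypothetical, the 12-month window is an assumption, and `counts` is synthetic placeholder data rather than real SEDE output.

```python
from statistics import mean


def moving_average(values, window=12):
    """Trailing moving average; positions without a full window are None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(mean(values[i + 1 - window : i + 1]))
    return out


def seasonal_index(values, period=12):
    """Mean of each within-year position divided by the overall mean."""
    overall = mean(values)
    return [mean(values[m::period]) / overall for m in range(period)]


# Synthetic placeholder series (24 months), NOT real answer counts:
# slightly higher in the first two months of each year, lower mid-year.
counts = [
    100 + (5 if m % 12 in (0, 1) else -5 if m % 12 in (6, 7) else 0)
    for m in range(24)
]

smoothed = moving_average(counts, window=12)
index = seasonal_index(counts)
```

A 12-month window averages over one full seasonal cycle, so what remains in `smoothed` is mostly trend; `index` values above or below 1 mark months that typically run high or low.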
---
### Slide 4 Preliminary Patterns
**Key Observations:**
- Long-run time series:
- Downward drift in answers pre-2022
- Sharper drop in level and slope post-ChatGPT launch
- Pre/post comparison:
- Post-ChatGPT period sits lower, even after accounting for seasonal dips (e.g., summer, year-end)
- Seasonal plots:
- 2018–2025 share a consistent within-year rhythm
- Indicates the changes aren't due to seasonality
---
### Slide 5 Methodology
**Modelling Strategies:**
1. **Interrupted Time-Series Regression (ITS):**
- Predictors: time trend, level jump (ChatGPT launch), slope change
- Optional indicators: policy/moderation periods
2. **Poisson/Negative-Binomial Count Models:**
- Predictors: same as ITS
- Suitable for count data
- Quantifies percentage changes per month
3. **ARIMA Model:**
- Trained on pre-ChatGPT data
- Forecasts counterfactual trajectory
- Compares observed vs. predicted post-event counts
4. **Survey Logistic Regression:**
- Predicts frequent Stack Overflow visits
- Predictors: ChatGPT usage, demographics
**Diagnostics:**
- Residual checks
- Over-dispersion
- Out-of-sample performance
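
The interrupted time-series setup (time trend, level jump, slope change) can be sketched as a segmented least-squares fit. This is a hedged illustration under stated assumptions: it uses plain NumPy rather than whatever package the analysis used, the function names are hypothetical, the break index of 59 assumes a monthly series starting January 2018 with December 2022 as the first post-launch month, and the data are synthetic.

```python
import numpy as np


def its_design(n_months, break_idx):
    """Design matrix: intercept, time trend, post-break level, post-break slope."""
    t = np.arange(n_months, dtype=float)
    post = (t >= break_idx).astype(float)                 # level-jump indicator
    since = np.where(t >= break_idx, t - break_idx, 0.0)  # months since break
    return np.column_stack([np.ones(n_months), t, post, since])


def fit_its(y, break_idx):
    """Ordinary least squares on the segmented design; returns named coefficients."""
    X = its_design(len(y), break_idx)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    names = ["intercept", "trend", "level_shift", "slope_change"]
    return dict(zip(names, coef))


# Synthetic series with a known break (illustrative values, NOT study estimates).
rng = np.random.default_rng(0)
n, brk = 96, 59
true_coef = np.array([1000.0, -2.0, -150.0, -5.0])
y = its_design(n, brk) @ true_coef + rng.normal(0.0, 5.0, n)

est = fit_its(y, brk)
```

Here `level_shift` is the immediate jump at the break and `slope_change` is the additional monthly change on top of the pre-existing `trend`. Indicators for policy or moderation periods would enter as extra columns in the design matrix.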
---
### Slide 6 Model Fits & Counterfactuals
**Findings:**
- **Interrupted Time-Series Regression:**
- Downward level shift post-2022
- Steeper negative slope post-ChatGPT
- Controls for pre-existing trend
- **Poisson Model:**
- Pre-ChatGPT: mild monthly contraction
- Post-ChatGPT: steeper decline (compounds over time)
- **ARIMA Forecast:**
- Trained on pre-ChatGPT data
- Post-2022 counts fall below the lower bound of the 80% prediction interval
- Observed counts never recover
**Takeaway:**
- Structural break in answer supply following the ChatGPT launch and subsequent policy changes
- Changes not explained by trend/seasonality alone
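
For the count models, a coefficient on the time trend lives on the log scale, so turning it into a "percentage change per month" and seeing how it compounds takes one exponentiation. The sketch below shows the arithmetic only; the coefficient values are hypothetical placeholders chosen for illustration and are not the fitted estimates from this analysis.

```python
import math


def monthly_pct_change(beta):
    """Percent change in expected count per month for a log-link slope beta."""
    return (math.exp(beta) - 1.0) * 100.0


def cumulative_pct_change(beta, months):
    """Compounded percent change after the given number of months."""
    return (math.exp(beta * months) - 1.0) * 100.0


# Hypothetical coefficients for illustration only (NOT study results).
beta_pre = -0.005         # pre-launch monthly slope on the log scale
beta_post_extra = -0.02   # additional post-launch slope change

pre = monthly_pct_change(beta_pre)
post = monthly_pct_change(beta_pre + beta_post_extra)
after_year = cumulative_pct_change(beta_pre + beta_post_extra, 12)
```

Because the decline is multiplicative, the 12-month figure is smaller in magnitude than twelve times the monthly figure, which is why the post-launch effect is described as compounding rather than adding up linearly.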
---
### Slide 7 Survey Results
**Key Insights:**
- **ChatGPT Adoption (2023):**
- Widespread among developers, especially heavy coders
- Daily use common in workflows
- **Visit Frequency (2023–2024):**
- 2023: Heavy ChatGPT users visit Stack Overflow daily at rates similar to non-users
- 2024: Frequent visits drop more for heavy ChatGPT users
- **Logistic Regression:**
- ChatGPT usage alone: weak predictor of visit frequency (classification accuracy in the low 50% range)
- Combined with cross-tabs: supports partial substitution (marginal questions shifted to ChatGPT)
---
### Slide 8 Key Findings
**Summary:**
- Monthly answers on Stack Overflow:
- Sharp drop post-ChatGPT release
- Continued lower trend (even after controlling for pre-existing decline)
- Policy/moderation events:
- Additional dips align with governance decisions
- Suggest amplification of ChatGPT effect
- ARIMA counterfactuals:
- Post-2022 counts outside expected range of pre-ChatGPT dynamics
- Substitution effect:
- Heavy ChatGPT users become less likely over time to visit Stack Overflow daily
---
### Slide 9 Limitations
**Caveats:**
1. **Causality:**
- Overlap of ChatGPT, AI policies, moderation strike
- Broader economic/tooling trends also in play
2. **SEDE Data:**
- Doesn't capture moderation queues/private spaces
- Some activity may be invisible
3. **Survey Data:**
- Self-reported
- May under-represent active answerers or certain regions/roles
**Interpretation:**
- Results are **correlational evidence** of shifts in answer supply/usage patterns
- Not a precise causal estimate of “ChatGPT effect”
---
### Slide 10 Implications & Future Work
**Implications:**
- Answer supply sensitive to:
- Assistance tooling
- Governance decisions
- Platforms should:
- Carefully consider AI policies/moderation capacity
- Explore integration with conversational assistants (e.g., structured answer APIs)
**Future Work:**
- Tag-level/user-cohort analyses
- Stronger quasi-experimental designs (e.g., synthetic controls)
-