Followup: 10 Factors Predicting Participation in the Reddit Blackout. Building Statistical Models of Online Behavior through Qualitative Research
Three weeks ago, I shared dataviz and statistical models predicting participation in the Reddit Blackout in July 2015. Since then, many moderators have offered feedback and new ideas for the data analysis, alongside their own stories. Earlier today, I shared this update with redditors.
UPDATE, Sept 16, 9pm ET: Redditors brilliantly spotted an important gap in my dataset and worked with me to resolve it. After taking the post down for two days, I am posting the corrected results. Thanks to their quick work, the graphics and findings in this post are more robust.
This July, moderators of 2,278 subreddits joined a “blackout,” demanding better communication and improved moderator tools. As part of my wider research on the work and position of moderators in online communities, I have also been asking the question: who joined the July blackout, and what made some moderators and subs more likely to participate?
Academic research on the work of moderators would expect that the most important predictor of blackout participation would be the workload, which creates common needs across subs. Aaron Shaw and Benjamin Mako Hill argue, based on evidence from Wikia, that as the work of moderating becomes more complex within a community, moderators grow in their own sense of common identity and common needs as distinct from their community (read Shaw and Hill’s Wikia paper here). Postigo argues something similar in terms of moderators’ relationship to a platform: when moderators feel like they’re doing huge amounts of work for a company that’s not treating them well, they can develop common interests and push back (read my summary of Postigo’s AOL paper here).
Testing Redditors’ Explanations of The Blackout
After posting an initial data analysis to reddit three weeks ago, dozens of moderators generously contacted me with comments and offers to let me interview them. In this post, I test hypotheses straight from redditors’ explanations of what led different subreddits to join the blackout. By putting all of these hypotheses into one model, we can see how important they were across reddit, beyond any single sub. (see my previous post) (learn more about my research ethics and my promises to redditors)
- Subs that shared mods with other blackout subs were more likely to join the blackout, but controlling for that:
- Default subs were more likely to join the blackout
- NSFW subs were more likely to join the blackout
- Subs with more moderators were slightly more likely to join the blackout
- More active subs were more likely to join the blackout
- More isolated subs were less likely to join the blackout
- Subs whose mods participate in metareddits were more likely to join the blackout
- Subs whose mods get and give help in moderator-specific subs were no more or less likely to join the blackout
In my research I have read over a thousand reddit threads, interviewed over a dozen moderators, archived discussions in hundreds of subreddits, and collected data from the reddit API— starting before the blackout. Special thanks to everyone who has spoken with me and shared data.
Improving the Blackout Dataset With Comment Data
Based on conversations with redditors, I collected more data:
- Instead of the top 20,000 subreddits by subscribers, I now focus on the top subreddits by number of comments in June 2015, thanks to a comment dataset collected by /u/Stuck_In_the_Matrix
- I updated my /u/GoldenSights amageddon dataset to include 400 additional subs, after feedback from redditors on /r/TheoryOfReddit
- I include “NSFW” subreddits intended for people over 18
- I account for more bots thanks to redditor feedback
- I account for changes in subreddit leadership (with some gaps for subreddits that have experienced substantial leadership changes since July)
In this dataset, half of the 10 most active subs joined the blackout, 24% of the 100 most active, 14.2% of the 1,000 most active, and 4.7% of the 20,000 most active subreddits.
To illustrate the data, here are two charts of the 52,745 most active subreddits as they would have stood at the end of June. The font size and node size are related to the log-transformed number of comments from June. Ties between subreddits represent shared moderators. The charts are laid out using the ForceAtlas2 layout on Gephi, which has separated out some of the more prominent subreddit networks, including the ImaginaryNetwork, the “SFW Porn” Network, and several NSFW networks (I’ve circled notable networks in the network graph at the top of this post).
Redditors’ Explanations Of Blackout Participation
With 2,278 subreddits joining the blackout, redditors have many theories for what experiences and factors led subs to join the blackout. In the following section, I share these theories and then test one big logistic regression model that accounts for all of the theories together. In these tests, I consider 52,745 subreddits that had at least one comment in June 2015. A total of 1,342 of these subreddits joined the blackout.
The idea of blacking out had come up before. According to one moderator, blacking out was first discussed by moderators three years ago as a way to protest Gawker’s choice to publish details unmasking a reddit moderator. Although some subs banned Gawker URLs from being posted to their communities, the blackout didn’t take off. While some individual subreddits have blacked out in the intervening years, this was the first time that many subs joined together.
I tested these hypotheses with the set of (Firth) logistic regression models shown below. The final model (on the right) offers the best fit of all the models by log likelihood, with a McFadden R² of 0.123, which is pretty good.
PREDICTING PARTICIPATION IN THE REDDIT BLACKOUT JULY 2015
Preliminary logistic regression results, J. Nathan Matias, Microsoft Research
Published on September 14, 2015
More info about this research: bit.ly/1V7c9i4
Contact: /u/natematias
N = top 52,745 subreddits in terms of June 2015 comments, including NSFW, for subreddits still available on July 2
Comment dataset: https://www.reddit.com/r/datasets/comments/3bxlg7/i_have_every_publicly_available_reddit_comment/
List of subreddits "going private": https://www.reddit.com/r/GoldTesting/wiki/amageddon
Moderator network queried in June 2015, with gap filling in July 2015 and September 2015
===============================================================================================================
                                                               Dependent variable:
                                  -----------------------------------------------------------------------------
                                                                    blackout
                                          (1)        (2)        (3)        (4)        (5)        (6)        (7)
---------------------------------------------------------------------------------------------------------------
default sub                          3.161***   1.065***   1.070***    0.814**    0.720**    0.693**    0.705**
                                      (0.294)    (0.305)    (0.317)    (0.336)    (0.337)    (0.337)    (0.339)
NSFW sub                               0.179*    0.235**   0.268***   0.291***   0.288***   0.314***   0.313***
                                      (0.098)    (0.099)    (0.099)    (0.101)    (0.101)    (0.102)    (0.102)
log(comments in june 2015)                      0.263***   0.268***   0.246***   0.258***   0.256***   0.257***
                                                 (0.009)    (0.010)    (0.011)    (0.011)    (0.011)    (0.011)
moderator count                                            0.066***   0.055***   0.053***   0.051***   0.051***
                                                            (0.007)    (0.008)    (0.008)    (0.008)    (0.008)
log(comments):moderator count                             -0.006***  -0.005***  -0.005***  -0.004***  -0.004***
                                                            (0.001)    (0.001)    (0.001)    (0.001)    (0.001)
log(mod roles in other subs)                                         -0.293***  -0.328***  -0.334***  -0.332***
                                                                       (0.033)    (0.033)    (0.033)    (0.033)
log(mod roles in blackout subs)                                       2.163***   2.134***   2.134***   2.133***
                                                                       (0.096)    (0.096)    (0.096)    (0.096)
log(mod roles in other subs):
  log(mod roles in blackout subs)                                    -0.255***  -0.248***  -0.254***  -0.254***
                                                                       (0.017)    (0.017)    (0.017)    (0.017)
log(sub isolation, by comments)                                                 -2.608***  -2.568***  -2.569***
                                                                                  (0.347)    (0.345)    (0.345)
log(metareddit participation
  per mod in june 2015)                                                                     0.100***   0.103***
                                                                                             (0.036)    (0.036)
log(mod-specific sub participation
  per mod in june 2015)                                                                                  -0.024
                                                                                                        (0.063)
Constant                            -3.608***  -4.517***  -4.677***  -4.655***  -4.467***  -4.469***  -4.469***
                                      (0.028)    (0.050)    (0.054)    (0.058)    (0.060)    (0.060)    (0.060)
---------------------------------------------------------------------------------------------------------------
Observations                           52,745     52,745     52,745     52,745     52,745     52,745     52,745
Log Likelihood                     -6,520.505 -6,171.874 -6,130.725 -5,861.099 -5,806.916 -5,803.188 -5,803.098
Akaike Inf. Crit.                  13,047.010 12,351.750 12,273.450 11,740.200 11,633.830 11,628.380 11,630.200
===============================================================================================================
Note: *p<0.1; **p<0.05; ***p<0.01
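As a quick consistency check on the regression table, the "Akaike Inf. Crit." row can be recomputed from each model's log likelihood using AIC = 2k - 2 * logLik. This is only a sketch and not part of the original analysis: the parameter counts `k` are an assumption, read off the table as the number of filled coefficient rows in each column plus the constant.

```python
# Log likelihoods are copied from the regression table.
# ASSUMPTION: k (number of estimated parameters) is the count of filled
# coefficient rows in each column, plus one for the constant.
MODELS = {
    1: (-6520.505, 3),
    2: (-6171.874, 4),
    3: (-6130.725, 6),
    4: (-5861.099, 9),
    5: (-5806.916, 10),
    6: (-5803.188, 11),
    7: (-5803.098, 12),
}


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 * log likelihood."""
    return 2 * n_params - 2 * log_likelihood


for model, (log_lik, k) in MODELS.items():
    print(f"model {model}: AIC = {aic(log_lik, k):,.2f}")
```

Under these assumptions the recomputed values line up with the table. Model 7 has the highest log likelihood, while model 6 has a marginally lower AIC, which fits the finding that the mod-specific participation term adds very little.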
The network of moderators who moderate blackout subs is the strongest predictor in this model. At a basic level, it makes sense that moderators who participated in the blackout in one subreddit might participate in another. Making sense of this kind of network relationship is a hard problem in network science, and this model doesn’t include time as a dimension, so we don’t consider which subs went dark before which others. If I had data on the time that subreddits went dark, it might be possible to better research this interesting question, like Bogdan State and Lada Adamic did with their paper on the Facebook equality meme.
Hypothesis 1: Default subs were more likely to join the blackout
In interviews, some moderators pointed out that “most of the conversation about the blackout first took place in the default mod irc channel.” Moderators of top subs described the blackout as mostly an issue concerning default or top subreddits.
This hypothesis is supported in the final model. For example, while a non-default subreddit with 4 million monthly comments had a 32.9% chance of joining the blackout (holding all else at its mean), a default subreddit of the same size had a 48.6% chance of joining the blackout, on average in the population of subs.
Hypothesis 2: Subs with more comment activity were more likely to join the blackout
Moderators of large, non-default subreddits also had plenty of reasons to join the blackout, either because they also shared the need for better moderating tools, or because they had more common contact and sympathy with other moderators as a group.
Even among subreddits that declined to join the blackout, many moderators described feeling obligated to make a decision one way or another. This surprised moderators of large subreddits, who saw it as an issue for larger groups. Size was a key issue in the hundreds of smaller groups that discussed the possibility, with many wondering if they had much in common with larger subs, or whether blacking out their smaller sub would make any kind of dent in reddit’s advertising revenue.
In the final model, larger subs were more likely to join the blackout, a logarithmic relationship that varies with the number of moderators (the two interact in the model). When we set everything else to its mean, we can observe how this looks for subs of different sizes. At the 50th percentile, subreddits with 6 comments per month had a 1.6% chance of joining the blackout, a small chance that adds up across so many small subs. At the 75th percentile, subs with 46 comments a month had a 2.5% chance of joining the blackout. Subs with 1,000 comments a month had a 5.4% chance of joining, while subs with 100,000 comments a month had a 15.8% chance of joining the blackout, on average in the population of subs, holding all else constant.
Hypothesis 3: NSFW subs were more likely to join the blackout
In interviews, some moderators said that they declined to join the blackout because they saw it as something associated with support for hate speech subreddits taken down by the company in June or other parts of reddit they preferred not to be associated with. Default moderators denied this flatly, describing the lengths they went to dissociate from hate speech communities and sentiment against then-CEO Ellen Pao. Nevertheless, many journalists drew this connection, and moderators were worried that they might become associated with those subs despite their efforts.
Another possibility is that NSFW subs have to do more work to maintain subs that offer high quality NSFW conversations without crossing lines set by reddit and the law. Perhaps NSFW subs just have more work, so they were more likely to see the need for better tools and support from reddit.
In the final model, NSFW subs were more likely to join the blackout than non-NSFW subs. For example, while a non-default, non-NSFW subreddit with 22,800 comments had an 11.4% chance of joining the blackout (holding all else at its mean), an NSFW subreddit of the same size had a 15.3% chance of joining the blackout, on average in the population of subs. Among less popular subs, a non-NSFW sub with 1,000 comments per month had a 5.4% chance of joining the blackout, while an NSFW sub of the same size had a 7.5% chance of joining, holding all else constant, on average in the population of subs.
Hypothesis 4: More isolated subs were less likely to join the blackout
In the interviews I conducted, as well as the 90 or so interviews I read on /r/subredditoftheday, moderators often contrasted their communities with “the rest of reddit.” When I asked one moderator of a support-oriented subreddit about the blackout, they mentioned that “a lot of the users didn’t really identify with the rest of reddit.” Subscribers voted against the blackout, describing it as “a movement we didn’t identify with,” this moderator said.
To test hypotheses about more isolated subs, I parsed all comments in every public subreddit in June 2015, generating an “in/out” ratio. This ratio consists of the total comments within the sub divided by the total comments made elsewhere by the sub’s commenters. A subreddit whose users stayed in one sub would have a ratio above 1, while a subreddit whose users commented widely would have a ratio below 1. I tested other measures, such as the average of per-user in/out ratios, but the overall in/out ratio seems the best.
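The overall in/out ratio described above can be sketched in a few lines of Python. This is an illustration rather than the code used for the analysis: the input format (a list of `(author, subreddit)` pairs, one per comment), the function name, and the choice to return `None` when a sub's commenters never comment elsewhere are all assumptions.

```python
from collections import Counter, defaultdict


def in_out_ratios(comments):
    """Per-subreddit in/out ratio from (author, subreddit) comment pairs.

    in  = total comments made within the sub
    out = total comments made elsewhere by that sub's commenters
    ASSUMPTION: when out == 0 the ratio is undefined, so None is returned.
    """
    per_user_sub = Counter(comments)  # (author, sub) -> comment count
    per_user = Counter()              # author -> total comments anywhere
    per_sub = Counter()               # sub -> total comments in the sub
    sub_users = defaultdict(set)      # sub -> set of its commenters

    for (author, sub), n in per_user_sub.items():
        per_user[author] += n
        per_sub[sub] += n
        sub_users[sub].add(author)

    ratios = {}
    for sub, users in sub_users.items():
        inside = per_sub[sub]
        outside = sum(per_user[u] - per_user_sub[(u, sub)] for u in users)
        ratios[sub] = inside / outside if outside else None
    return ratios


# Hypothetical toy data, not drawn from the reddit dataset.
sample = [
    ("alice", "a"), ("alice", "a"), ("alice", "b"),
    ("bob", "a"), ("bob", "c"), ("bob", "c"),
    ("carol", "b"), ("carol", "b"),
    ("dave", "d"),
]
print(in_out_ratios(sample))
```

In the toy data, sub `c` has one commenter who makes twice as many comments inside it as outside, so it comes out as the most isolated sub with a defined ratio.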
In the final model, more isolated subs were less likely to join the blackout, on a logarithmic scale. Most subreddits’ commenters participate actively elsewhere on reddit, at a mean in/out ratio of 0.24. That means that on average, a subreddit’s participants make about 4 times as many comments outside a sub as within it. At this level, holding everything else at its mean, a subreddit with 1,000 comments a month had a 4.0% chance of joining the blackout. A similarly-sized subreddit whose users made half of their comments within the sub (in/out ratio of 1.0) had just a 1% chance of joining the blackout. Very isolated subs whose users commented twice as much in-sub had a 0.3% chance of joining the blackout, on average in the population of subs, holding all else constant.
Hypothesis 5: Subs with more moderators were more likely to join the blackout
This one was my hypothesis, based on a variety of interview details. Subs with more moderators tend to have more complex arrangements for moderating and tend to encounter limitations in mod tools. Subs with more mods also have more people around, so their chances of spotting the blackout in time to participate were also probably higher. On the other hand, subs with more activity tend to have more moderators, so it’s important to control for the relationship between mod count and sub activity.
In the final model, this hypothesis is supported, though only weakly: subs with more moderators were slightly more likely to join the blackout. The relationship is very small, and it varies with the number of comments (the two interact in the model). For a sub with 1,000 comments per month, with everything else at its average, a subreddit with 3 moderators (the average) had a 5.4% chance of joining the blackout. A subreddit with 8 moderators had a 6% chance of joining the blackout, on average in the population of subs.
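The moderator-count figures above can be approximated from the rounded final-model coefficients in the regression table. The sketch below makes two assumptions: that `log()` in the table is the natural log, and that all other predictors stay fixed, so only the `moderator count` and `log(comments):moderator count` terms change. Because the published coefficients are rounded, the result is approximate.

```python
import math

# Rounded final-model (column 7) coefficients from the regression table.
B_MOD_COUNT = 0.051
B_INTERACTION = -0.004  # log(comments):moderator count


def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-x))


def log_odds_per_moderator(comments):
    """Change in log-odds for one extra moderator at a given comment volume.

    ASSUMPTION: the table's log() is the natural log.
    """
    return B_MOD_COUNT + B_INTERACTION * math.log(comments)


def shifted_probability(baseline_p, comments, extra_moderators):
    """Baseline chance after adding moderators, all else held fixed."""
    shift = extra_moderators * log_odds_per_moderator(comments)
    return inv_logit(logit(baseline_p) + shift)


# From 3 moderators (5.4% baseline) to 8 moderators at 1,000 comments a month.
p8 = shifted_probability(0.054, comments=1000, extra_moderators=5)
print(f"{p8:.3f}")
```

With these rounded values, the per-moderator effect is positive at 1,000 comments a month and only turns negative for very active subs, somewhere in the hundreds of thousands of comments a month.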
Hypothesis 6: Subs with admins as mods were more (or less) likely to join the blackout
I heard several theories about admins. During the blackout, some redditors claimed that admins were preventing subs from going private. In interviews, moderators tended to voice the opposite opinion. They argued that subs with admin contact were joining the blackout in order to send a message to the company, urging it to pay more attention to employees who advocated for moderator interests. Moderators at smaller subs said, “we felt 100% independent from admin assistance so it really wasn’t our fight.”
None of my hypothesis tests showed any statistically significant relationship between current or past admin roles as moderators and participation in the blackout, either way. For that reason, I omit it from my final model.
Hypothesis 7: Subs with moderators who moderated other subs were more likely to join the blackout
I’ve been wondering if moderators with multiple mod roles elsewhere on reddit would be more likely to join the blackout, perhaps because they had greater “solidarity” with other subreddits, or because they were more likely to find out about the blackout.
In the final model, the reverse is supported. Subs that shared moderators with other subs were actually less likely to join the blackout, a relationship that depends on the number of moderators who also modded blackout subs (the two interact in the model). Holding blackout sub participation constant, a sub of 1,000 comments per month and 3 moderator roles shared with other subs had a 5.7% chance of joining the blackout, while a more connected sub with 6 shared moderator roles (in the 4th quantile) had a 4.2% chance of joining the blackout, on average in the population of subs, holding all else constant.
Hypothesis 8: Subreddits with mods who also moderate other blackout subs were more likely to join the blackout
This hypothesis is also a carry-over from my previous analysis, where I found a statistically significant relationship. Note that making sense of this kind of network relationship is a hard problem in network science, and that we can’t use this to test “influence.”
In the final model, subreddits with mods with roles in other blackout subs were more likely to join the blackout, a relationship on a log scale that depends on the number of moderator roles shared with other subs more generally (the two interact in the model). 19% of subs in the sample share at least one moderator with a blackout sub, after removing moderator bots. A sub with 1,000 comments per month that didn’t have any overlapping moderators with blackout subs had a 3.2% chance of joining the blackout, while a sub with one overlapping moderator had an 11.1% chance of joining, and a sub with 2 overlapping moderators had a 21.1% chance of joining. A sub with 6 moderators overlapping with blackout subs had a 57.2% chance of joining the blackout.
I tend to see the network of co-moderation as a control variable. We can expect that moderators who joined the blackout would be likely to support it across the many subs they moderate. By accounting for this in the model, we get a clearer picture of the other factors that were important.
Hypothesis 9: Subs with moderators who participate in metareddits were more likely to join the blackout
In interviews, several moderators described learning about the blackout from “meta-reddits” which cover major events on the site, and which mostly stayed up during the blackout. Just like we might expect more isolated subs to stay out of the blackout, we might expect moderators who get involved in reddit-wide meta-discussion to join the blackout. I took my list of metareddits from this TheoryOfReddit wiki post.
In the final model, subs with moderators who participate in metareddits were more likely to join the blackout, on a logarithmic scale. Most moderators on the site do not participate in metareddits. A sub of 1,000 comments per month with no metareddit participation by its moderators had a 5.3% chance of joining the blackout, while a similar sub whose moderators made 5 comments on any metareddit per month had a 6.3% chance of joining the blackout.
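To see what "on a logarithmic scale" means for metareddit participation, the sketch below applies the rounded final-model coefficient (0.103) to the 5.3% baseline quoted above. It assumes the predictor was transformed as log(1 + comments per mod), which would let moderators with zero metareddit comments stay in the model; the post does not state the exact transform, so treat this as a guess. It also treats the "5 comments" in the example as the per-mod value.

```python
import math

# Rounded final-model coefficient for
# log(metareddit participation per mod in june 2015).
B_META = 0.103


def logit(p):
    """Probability -> log-odds."""
    return math.log(p / (1 - p))


def inv_logit(x):
    """Log-odds -> probability."""
    return 1 / (1 + math.exp(-x))


def chance_with_meta_comments(baseline_p, comments_per_mod):
    """Chance of joining given metareddit comments per mod, all else fixed.

    ASSUMPTION: the predictor is log(1 + comments), so zero comments maps
    to a zero shift from the baseline.
    """
    shift = B_META * math.log1p(comments_per_mod)
    return inv_logit(logit(baseline_p) + shift)


for n in (0, 1, 5, 20):
    print(n, round(chance_with_meta_comments(0.053, n), 3))
```

Under that assumption, the first few metareddit comments move the estimate the most, and each additional comment moves it less.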
Hypothesis 10: Subs with mods participating in moderator-focused subs were more likely to join the blackout
Although key moderator subs like /r/defaultmods and /r/modtalk are private and inaccessible to me, I could still test a “solidarity” theory. Perhaps moderators who participate in mod-specific subs, who have helped and been helped by other mods, would be more likely to join the blackout?
Although this predictor is significant in a single-covariate model, when you account for all of the other factors, mod participation in moderator-focused subs is not a significant predictor of participation in the blackout.
This surprises me. I wonder: since moderator-specific subs tend to have low volume, one month of comments may just not be enough to get a good sense of which moderators participate in those subs. Also, this dataset doesn’t include IRC discussions (nor will it ever), where moderators seem mostly to hang out with and help each other. But from the evidence I have, it looks like help from moderator-focused subs played no part in swaying moderators to join the blackout.
So, how DID solidarity develop in the blackout?
The question is still open, but from these statistical models, it seems clear that factors beyond moderator workload had a big role to play, even when controlling for mods of multiple subs that joined the blackout.
In further analysis in the next week, I’m hoping to include:
- Activity by mods in each sub (comments, deletions)
- Comment karma, as another measure of activity (still making sense of the numbers to see if they are useful here)
- The complexity of the subreddit, as measured by things in the sidebar (possibly)
Building Statistical Models of Online Behavior through Qualitative Research
The process of collaborating with redditors on my statistical models has been wonderful. As I continue this work, I’m starting to think more and more about the idea of participatory hypothesis testing, in parallel with work we do at MIT around Freire-inflected practices of “popular data”. If you’ve seen other examples of this kind of thing, do send them my way!