Decoding YouTube’s Collaboration Web

Introduction

In YouTube's digital landscape, collaborations between creators are more than just interactions; they are a force that shapes the platform's community of players. While viewers scroll endlessly through content, behind the scenes YouTubers join forces and create alliances that can propel channels into the spotlight. But what actually happens when creators collaborate? Does it lead to a rise in popularity, or is it just another strategy lost in the noise of the internet? This data story offers insight into what drives success in the ever-changing world of YouTube gaming.

Gaming: A Window into YouTube Collaborations

The data we used comes from the YouNiverse dataset. Because the full dataset is too large for our tools to process, we cannot give a realistic representation of the whole of YouTube, so we have limited our scope to the "Gaming" category. This category is a good subset for studying the impact of collaborations, as most Gaming YouTubers engage in collaborative gameplay, teamwork and friendly competition.

Collaborations within a video were identified by examining the video description and extracting the usual types of channel links mentioned there.
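As a rough illustration of this extraction step, the sketch below pulls channel links out of a description with a regular expression. The pattern, the function name `extract_collaborators` and the sample description are assumptions made for this example; the exact link types used in the study are not specified here.

```python
import re

# Hypothetical pattern: covers only youtube.com/channel/, /user/ and /c/ links.
# Real descriptions may contain other link shapes that this would miss.
CHANNEL_LINK = re.compile(
    r"(?:https?://)?(?:www\.)?youtube\.com/(channel|user|c)/([A-Za-z0-9_\-]+)",
    re.IGNORECASE,
)

def extract_collaborators(description, own_channel):
    """Return the channel identifiers linked in a description,
    excluding the uploader's own channel."""
    found = {match.group(2) for match in CHANNEL_LINK.finditer(description)}
    found.discard(own_channel)
    return found

# Made-up description, for illustration only.
description = (
    "Played with my friend today! "
    "His channel: https://www.youtube.com/channel/UCfriend123 "
    "Subscribe to me: https://www.youtube.com/channel/UCme456"
)
collaborators = extract_collaborators(description, own_channel="UCme456")
print(collaborators)
```

Discarding the uploader's own identifier matters because creators often link their own channel in the description, which would otherwise be counted as a collaboration.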

The approach of this article is to obtain an initial overview of network connectivity in the gaming category, then to carry out a general study of the impact of collaborations on the popularity of YouTubers and finally to examine more specifically the impact on a subset of channels.

The Connectivity Web: Mapping Collaborations in Gaming

In our exploration of the YouTube gaming universe, we aim to decode the vast network that exists between the various channels. Among the collaborations identified, our survey reveals a network of over 160,000 connections. This network represents a complex structure of relationships.

Through bootstrapping, we have distilled a representative sample of these interactions. The closer a channel (data point) is to the center, the more it collaborates and is collaborated with. These central nodes thus represent influential communities.

Image (input)

Since we will focus on the Gaming community, it is important to distinguish between collaborations among its members and collaborations with members of other communities. Indeed, Gaming channels participate in a broader network of influence and interaction. For instance, there is a large number of collaborations with Musicians, which suggests a strong relationship between these two communities. This result is not surprising, as Gaming videos often include music.

To better describe the world of gaming, we must thus evaluate its network by filtering out those “outsider” connections. This produces the following network, composed of interactions within the Gaming community only.
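A minimal sketch of this filtering step, assuming each channel carries a category label and collaborations are stored as pairs of channels. The names `channel_category`, `edges` and `within_category` are hypothetical, and the data is invented.

```python
# Hypothetical inputs: one category per channel and a list of collaboration edges.
channel_category = {"A": "Gaming", "B": "Gaming", "C": "Music", "D": "Gaming"}
edges = [("A", "B"), ("A", "C"), ("C", "D"), ("B", "D")]

def within_category(edges, channel_category, category="Gaming"):
    """Keep only the edges whose two endpoints both belong to `category`."""
    return [
        (source, target)
        for source, target in edges
        if channel_category.get(source) == category
        and channel_category.get(target) == category
    ]

gaming_edges = within_category(edges, channel_category)
print(gaming_edges)
```

Channels with no known category are dropped as well, since `dict.get` returns `None` for them.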

Image (input)

As we can see, this network differs somewhat from the one that includes connections with all categories. It is more connected, and its hubs are less concentrated. This shows that, within the gaming community, channels are better at connecting different parts of the network: it is easier to move around and get introduced to different subcommunities.

Despite that, it appears that the more connections a channel has (that is, the closer it is to the center), the more impactful it might be. To quantify this influence, we will perform an extended analysis of the network using centrality measures, which are critical metrics in network theory that help identify the most influential nodes within a network. Namely, we refer to:

  • Degree Centrality as the number of connections a node has. This offers a basic yet insightful view of a channel’s activity and potential influence through contributions.
  • Betweenness Centrality as the number of times a node acts as a bridge along the shortest path between two other nodes. This indicates its role as a crucial connector or 'gatekeeper' within the network.
  • Closeness Centrality as how close a node is to all other nodes in the network. This indicates how quickly a channel can access or spread information, and thus influence, through its network.

By isolating the channels with the highest betweenness centrality, we can determine the influence of the gatekeepers and compare them to randomly selected non-gatekeepers. This comparison will likely reveal stark differences in connectivity and influence, reinforcing the notion that not all channels wield the same level of influence within the network.
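The three measures can be sketched in plain Python on a toy undirected graph. This is an illustrative implementation under stated assumptions, not the code used for the study: degree is the raw number of neighbours, closeness is computed within the reachable component, and betweenness uses Brandes' algorithm, which credits a node with the fraction of shortest paths passing through it (unnormalized). The edge list is made up.

```python
from collections import deque

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree(adj):
    return {node: len(neighbours) for node, neighbours in adj.items()}

def closeness(adj):
    """(reachable nodes - 1) / sum of shortest-path distances, via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    score = {n: 0.0 for n in adj}
    for s in adj:
        stack, dist = [], {s: 0}
        preds = {n: [] for n in adj}
        sigma = {n: 0 for n in adj}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in adj}
        while stack:  # accumulate dependencies from the farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {n: value / 2 for n, value in score.items()}

# Toy network: a hub with three neighbours, one of which leads to a fourth node.
adj = build_adjacency([("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")])
deg, clo, btw = degree(adj), closeness(adj), betweenness(adj)
```

In this toy network, "hub" and "c" are the only nodes that lie on shortest paths between other nodes, which is the role the text assigns to gatekeepers.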

Image (input)

The red network graph shows that, without these "gatekeeping" channels, the gaming community would have few or no collaborations, making it a much more isolated place. Furthermore, it is easy to see that most of the gatekeepers are connected to one another, which reduces the influence any single channel can have but increases their influence as a group.

Finally, it is important to determine whether there is a link between a channel's popularity, measured by its number of subscribers, and the number of collaborations it makes. To answer this, we compute the average number of subscribers for the 100 channels with the highest degree centrality, and for a random sample drawn from the rest of the channels in the broader network.
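The comparison can be sketched as follows. The function name, the fixed random seed and all numbers are assumptions for illustration; the study uses the top 100 channels, while the toy data uses `k=2`.

```python
import random
from statistics import mean

def compare_top_vs_random(degree, subscribers, k, seed=0):
    """Mean subscribers of the k highest-degree channels versus a random
    sample of (at most) k channels drawn from the remaining ones."""
    ranked = sorted(degree, key=degree.get, reverse=True)
    top, rest = ranked[:k], ranked[k:]
    rng = random.Random(seed)  # seeded so the sample is reproducible
    sample = rng.sample(rest, min(k, len(rest)))
    return mean(subscribers[c] for c in top), mean(subscribers[c] for c in sample)

# Invented toy data.
degree = {"a": 5, "b": 4, "c": 1, "d": 1, "e": 0, "f": 0}
subscribers = {"a": 1000, "b": 800, "c": 100, "d": 200, "e": 50, "f": 150}

top_mean, random_mean = compare_top_vs_random(degree, subscribers, k=2)
```

Drawing the random sample only from channels outside the top group keeps the two groups disjoint, so the comparison is not diluted by overlap.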

Image (input)

As we can see, there appears to be a positive correlation between having a high number of collaborations and having a high subscriber count, making such channels more influential. This will be explored in more detail in the next part of the article.

To summarize this part of the study, we have analyzed the structure of collaborations throughout the Gaming community of YouTube. We find that this community is more interconnected than the broader network and easier to move around in through collaborations. It is less isolated, and the barrier to entry is more spread out, as individual gatekeepers have less influence. Finally, we notice a positive correlation between the number of collaborations and the number of subscribers, suggesting that collaborating might increase popularity.

Collaboration and Popularity: A Gaming Perspective

We now turn to the impact of collaborations on a channel's popularity. We previously considered the number of subscribers as a popularity metric. In this part we will use the mean number of views per video instead, as it provides a quantitative and easily interpretable measure of the channel's overall viewer engagement, capturing the average level of interest across its entire video content. The number of subscribers, by contrast, is less prone to decrease over time, even if a channel becomes less popular.

The following graph justifies this choice, as we can see that similar (mean) view counts can be obtained by channels spanning over a wide range of subscriber counts.

Image (input)

We can now study the impact of collaborations by plotting the average number of views versus the average number of collaborations per video.

Image (input)

We can see that most YouTubers don't do any collaborations, or only a few. A few vertical lines of points appear at whole numbers of collaborations. This can be interpreted in different ways:

  • Collaborations are not evenly distributed among YouTubers. Many creators may prefer to work independently or may find it logistically challenging to collaborate frequently.
  • The dataset studied may not contain enough videos for each channel.

The number of collaborations does not seem to have an impact on the number of views. By performing a Welch t-test to compare the views of channels that never collaborate with those of channels that do, we find that the difference is indeed not significant.
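For reference, Welch's test differs from the standard t-test in that it does not assume equal variances in the two groups. The sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom on made-up numbers; the variable names and data are illustrative only. Obtaining a p-value additionally requires the t-distribution, for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of each group mean
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented mean view counts (in thousands) for two small groups of channels.
never_collaborate = [1, 2, 3, 4, 5]
collaborate = [2, 4, 6, 8, 10]

t_stat, dof = welch_t(never_collaborate, collaborate)
print(round(t_stat, 3), round(dof, 3))
```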

However, we can assume that collaborations don't have the same impact on channels of different sizes. To investigate this hypothesis, we can separate the dataset into 10 groups of YouTubers, using their average number of views and perform the same analysis as before on these new groups. The results are represented below.
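One way to form such groups is to rank channels by average views and cut the ranking into ten equally sized bins. Whether the study used equal-frequency or equal-width bins is not stated, so the equal-frequency choice below is an assumption, as are the function name and the toy data.

```python
from collections import Counter

def assign_bins(avg_views, n_bins=10):
    """Map each channel to a bin index 0..n_bins-1 so that bins hold
    roughly the same number of channels, ordered by average views."""
    ranked = sorted(avg_views, key=avg_views.get)
    return {ch: i * n_bins // len(ranked) for i, ch in enumerate(ranked)}

# Invented data: 20 channels with average views 0, 100, ..., 1900.
avg_views = {f"ch{i}": i * 100 for i in range(20)}
bins = assign_bins(avg_views)
bin_sizes = Counter(bins.values())
```

The Welch t-test can then be repeated inside each bin, comparing collaborating and non-collaborating channels of similar size.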

Image (input)

Image (input)

Even if, overall, collaborations do not have an impact on the popularity of a YouTuber, we can see that this is not true for all bins. For most bins collaborations have no impact, but for small YouTubers (those who do not get many views) the result no longer holds. For those with fewer than 2420.333 views on average, the Welch t-test suggests that YouTubers who collaborate tend to have more views than those who don't.

Another result is that for really big YouTubers, collaborations also have an impact, but a negative one. One possible explanation is that when popular YouTubers collaborate with other popular channels, the community shared by both channels watches only one of the videos, which can result in fewer views for one of them. A more detailed investigation would be required to understand this effect, but it is outside the scope of our study.

Analyzing the Impact: Before and After Collaborating

To understand in more detail the impact of collaborations on a channel, we will look at indicators of channel success such as channel viewership and subscriber count. We can get an idea of viewership and subscriber growth trends from the sample of 9 channels plotted below. As we can see, for a few channels there are occasional periods of growth in views/subscribers followed by periods of milder growth or plateau. The numbers of views and subscribers are plotted on a log scale to accommodate viral channels.

Image (input)

Next we have graphs of the timeseries of the gain in views/subscribers per week for the sample channels.
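Assuming the underlying time series stores cumulative totals per week (an assumption, since the storage format is not described here), the weekly gain is simply the difference between consecutive observations. A minimal sketch with invented numbers:

```python
def weekly_gains(cumulative):
    """Difference between consecutive weekly totals."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]

# Invented cumulative view counts over four consecutive weeks.
cumulative_views = [100, 150, 150, 400]
gains = weekly_gains(cumulative_views)
print(gains)
```

A series of n weekly totals yields n - 1 gains, so the first week has no gain value of its own.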

Image (input)

Given the many fluctuations in the number of views obtained by a channel in a week, we can zoom in on how collaborations may influence weekly views. Below is a timeseries graph of the number of weekly views obtained by the channel “BrandonDoesEverythin…”, overlaid with the weeks in which the channel engaged in collaborations. As the channel collaborates at a fairly regular frequency, this graph does not provide much information about how collaborations may have influenced weekly views.

Image (input)

Thus, we plot the same kind of graph for a popular gaming channel: PewDiePie. There is a lack of channel weekly view data in our data set before 2017. However, we can see that PewDiePie collaborated rather frequently with other channels before the large spike in weekly views at the end of 2016. Afterwards, there was a long period between 2017 and 2018 during which PewDiePie did not collaborate with other channels. During this period, the weekly views on PewDiePie’s channel seem to be relatively steady. At the beginning of 2018, PewDiePie occasionally collaborated with other channels again. The number of weekly channel views also seems to increase after the channel begins collaborating again.

Image (input)

Next, we will examine the collaborations that the 9 sample channels engage in. From these graphs, there does not seem to be a very strong correlation between collaboration and the weekly viewership of a channel. Thus, the relationship between collaborations and weekly channel views warrants more investigation.

Image (input)

We proceed to plot the same type of graph for a random sample of the 10 most popular channels. These channels seem to engage in collaborations, although on a rather limited scale.

Image (input)

We now run a regression analysis to model the relationship between the number of collaborations a channel engages in and the number of subscribers it obtains. In the regression model below, the p-value of the number-of-collaborations variable is < 0.05, so the number of collaborations has a statistically significant association with the number of subscribers of a channel. However, the r-squared value of the model is only 0.008, which implies that the model explains very little of the variance in the number of channel subscribers. Thus, there is reason to believe that the number of subscribers is influenced by many factors other than the number of collaborations. According to the linear regression model, each additional collaboration is associated with an increase of 6.69 subscribers.
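To make the distinction between the slope and r-squared concrete, here is a small ordinary least squares sketch on invented data. It is not the model fitted in the study; the function name and numbers are illustrative. The example shows that a regression can have a clearly non-zero slope while explaining only a small share of the variance.

```python
from statistics import mean

def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares; return
    (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Invented data: x plays the role of collaborations, y of subscribers.
collabs = [0, 1, 2, 3]
subs = [1, 2, 1, 2]

slope, intercept, r_squared = simple_ols(collabs, subs)
print(slope, intercept, r_squared)
```

Here r-squared is one minus the ratio of residual to total variation, so a value near zero means the fitted line predicts little better than the mean of y.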

Image (input)

Next, we will examine the relationship between the lifespan of a YouTube channel and the collaborations it engages in.

First, we define a channel to be dead when the number of views it obtains in a month is less than 20% of its subscriber count. From there we can obtain the lifespan of a channel. One point to note is that, because the dataset only contains data up to July 2019, there is a limit to the maximum lifespan a channel may have. Below is a graph of the distribution of the lifespan of YouTube channels in the dataset. We can see that this distribution seems to be bimodal, with the two largest proportions of channels having a lifespan either close to 0 or around 35 to 40 months.
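The definition can be sketched as below. Two points are assumptions rather than statements of the study's method: the lifespan is counted as the number of months before the first "dead" month, and a channel that never dies within the observed window is assigned the full window length. The function name and data are invented.

```python
def lifespan_in_months(monthly_views, monthly_subscribers, threshold=0.2):
    """Months elapsed before the first month in which views fall below
    threshold * subscribers; the full window length if that never happens."""
    for month, (views, subs) in enumerate(zip(monthly_views, monthly_subscribers)):
        if views < threshold * subs:
            return month
    return len(monthly_views)

# Invented channel: views drop below 20% of subscribers in the third month.
views = [500, 400, 100, 50]
subs = [1000, 1000, 1000, 1000]
lifespan = lifespan_in_months(views, subs)
print(lifespan)
```

Under this convention a channel that is already below the threshold in its first observed month gets a lifespan of 0, and channels still alive at the end of the data are capped at the window length.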

Image (input)

Next, we run a regression analysis to model the relationship between the lifespan of a channel and the number of collaborations it engages in.

From the linear regression results below, the number of collaborations that a channel engages in has a statistically significant association with the lifespan of the channel at the 5% significance level, as it has a p-value < 0.05. However, this model, with the number of collaborations as the only explanatory variable, does not explain the variance in channel lifespan very well, since the r-squared value is only 0.06. Thus there is reason to believe that several other factors may influence the lifespan of the channel. In this model, a unit increase in the number of collaborations a channel engages in is associated, on average, with an increase of 6.88 * 10^-5 months in the lifespan of the channel.

Image (input)

A potential confounding factor in the relationship between the number of collaborations of a channel and its lifespan is the number of videos the channel uploads. A channel that uploads more videos may have invested more resources into its YouTube presence and is thus more likely to be willing to engage in a collaboration. When more videos are uploaded, it is also likely that the channel has been releasing new content over a longer period of time. New content may be able to sustain viewership with the aid of services like YouTube notifications. Thus, we will model the relationship between lifespan and number of collaborations again, with the number of videos a channel has uploaded as a confounding variable.

Image (input)

The new model is able to explain the variance in channel lifespan much better, as seen from the much higher r-squared value of 0.034. Both the number of videos uploaded by a channel and the interaction term between the number of videos uploaded and the number of collaborations are statistically significant at the 5% significance level, as the p-values of these variables are all < 0.05. Because the model includes an interaction term, the effect of one variable depends on the value of the other. For a channel going from zero to one collaboration and from zero to one video, the model predicts an increase of (6.77*10^-5 + 0.013 - 3.80*10^-9) months in the channel’s lifespan; at other starting values, the interaction term contributes a different amount.
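The role of the interaction term can be checked numerically. The sketch below uses the three coefficients quoted in the text; assigning 6.77*10^-5 to collaborations, 0.013 to videos and -3.80*10^-9 to the interaction is an assumption based on the order in which they are quoted, and the intercept is omitted because it cancels in a difference.

```python
# Coefficients quoted in the text; the mapping to terms is an assumption.
B_COLLABS = 6.77e-5
B_VIDEOS = 0.013
B_INTERACTION = -3.80e-9

def predicted_part(collabs, videos):
    """Part of the predicted lifespan (months) that depends on the inputs."""
    return B_COLLABS * collabs + B_VIDEOS * videos + B_INTERACTION * collabs * videos

def lifespan_change(collabs, videos):
    """Change in predicted lifespan when both inputs increase by one."""
    return predicted_part(collabs + 1, videos + 1) - predicted_part(collabs, videos)

from_zero = lifespan_change(0, 0)
from_larger_channel = lifespan_change(100, 500)
print(from_zero, from_larger_channel)
```

Algebraically, the change equals B_COLLABS + B_VIDEOS + B_INTERACTION * (collabs + videos + 1), so the simple sum of the three coefficients applies only when starting from zero collaborations and zero videos.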

Thus, from the above analysis, we can see that although the impact of collaborations on a channel’s popularity is statistically significant, it is only one of the many factors influencing channel growth.

Conclusion

In summary, the study looks at YouTube collaborations within the gaming community, revealing a complex network of over 160,000 connections. While collaborations correlate positively with subscriber numbers, nuanced effects are observed - smaller channels benefit, while larger channels may experience negative impacts. Regression models suggest that collaborations influence subscriber numbers and channel lifetimes, but are only one of many factors determining channel growth. The study highlights the complex role of collaborations in the dynamic landscape of the YouTube gaming community.

Limitations

However, this study is not without its limitations. The use of the YouNiverse dataset, with its inherent constraints, raises questions about the generalizability of the results to the entire YouTube gaming landscape. In addition, the temporal scope of the analysis may not capture recent changes in platform dynamics, requiring some caution in extrapolating results to today. The study recognizes that channel growth is a complex interplay of various factors, and that collaborations represent only one facet of this complex ecosystem.

Ethical risks

Ethical considerations are central in this exploration of YouTube collaborations. Although the analysis uses publicly available data to protect individual privacy, it is difficult to accurately attribute all collaborations based on video descriptions alone. The generalization of results must be approached with caution, given the diversity of content and creators. In addition, ongoing ethical vigilance is essential to ensure that the study remains respectful of creators' autonomy and avoids unintentional misrepresentation of their collaborative efforts on the platform.