Disillusion Avalon: Board Game Data Analysis

What would social deduction games look like with data?

15 min read · Mar 6, 2021

Introduction: Board Game!

Avalon, or The Resistance, is an exceptionally popular board game. Players are randomly and secretly assigned a role belonging to one of two possible forces (Good or Evil in Avalon; Resistance or Spy in the original Resistance game), and should act based on their affiliation in order to win.

Although the game can sometimes be quite intricate, the underlying concept is straightforward, enabling players to learn the rules and enjoy the gameplay in just a few minutes. Since I am also fascinated by the fun of social role-playing party games, I was drawn by the question: what results can we derive from analyzing data in a board game?

Note: Since the following article assumes readers have a basic understanding of Avalon rules and tactics, if you are not familiar with Avalon, please check the Wiki or this guide for reference.

With that thought, I found a very neat project called ProAvalon. The creator (Victor) of this open-source game has made a virtual version of Avalon that allows players around the globe to participate in a friendly setting. With Victor’s help, I was able to get ProAvalon’s log data for each game played, including some basic information and voting history. This allowed me to explore the game statistics and conduct a deeper analysis.

Data Description & Caveats

The original data has over 83,000 records. To simplify the analysis, I focused on games with the most popular setting, namely games that included 5 to 6 players and only 4 special characters (‘Merlin’, ‘Percival’, ‘Assassin’, ‘Morgana’). I also filtered out games with expansion content (e.g., Lady of the Lake). After basic cleaning, the final data set for analysis contained 55,040 records and 20+ features.
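As a rough illustration of the filtering described above, here is a minimal Python sketch. The field names (`numberOfPlayers`, `roles`, `cards`) and the toy records are hypothetical stand-ins; the real keys in the ProAvalon logs may differ.

```python
# Hypothetical record layout; the real ProAvalon log keys may differ.
STANDARD_ROLES = {"Merlin", "Percival", "Assassin", "Morgana"}


def is_standard_game(game):
    """Keep 5-6 player games with exactly the four standard roles and no expansion cards."""
    return (
        game["numberOfPlayers"] in (5, 6)
        and set(game["roles"]) == STANDARD_ROLES
        and not game["cards"]  # e.g. ["Lady of the Lake"] is filtered out
    )


# Invented toy records, only to exercise the filter.
games = [
    {"numberOfPlayers": 5, "roles": ["Merlin", "Percival", "Assassin", "Morgana"], "cards": []},
    {"numberOfPlayers": 7, "roles": ["Merlin", "Percival", "Assassin", "Morgana"], "cards": []},
    {"numberOfPlayers": 6, "roles": ["Merlin", "Assassin"], "cards": []},
    {"numberOfPlayers": 6, "roles": ["Merlin", "Percival", "Assassin", "Morgana"],
     "cards": ["Lady of the Lake"]},
]

kept = [g for g in games if is_standard_game(g)]
print(f"{len(kept)} of {len(games)} toy games kept")
```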

Note: The raw data is in JSON format. If you are interested in the preprocessing in Python, please read the details of the data cleaning and the pipeline on my GitHub. This article leaves out all of the technical processes.

The original JSON data format for a game

Before we jump into the analysis, I would like to mention a few caveats:

The data doesn’t include any of the text messages during the games, even though conversation plays a huge role in Avalon. This might sound like a deal-breaker for analysis, but other data such as voting history and win rate are still important and may provide interesting insights. We just have to keep in mind that we are making certain observations and inferences without knowing the discussion during the gameplay.
Although the data is very detailed in terms of the voting stage, we do not know who voted “success” or “fail” when 3 or 4 players were on a mission. However, this does not pose much concern, as we can use some methods to deduce whether the spy on the mission team actually voted “fail” or cunningly voted “success” to deceive others.
Since the game is built with an English interface and the data doesn’t contain any personal or demographic information, we can only assume that most of the players are English speakers.

If you are a newcomer, you can also watch this video to understand the basics of the game.

Basic Statistics: What does the data look like?

Alright, enough of the introduction, let’s jump into it.

Number of Games Played

The data was collected from 7/2018 to 1/2021. As we can see, there is a spike in the number of games played in mid-2020, possibly suggesting that the COVID-19 pandemic spurred an increase in gameplay.

The absence of data between 7/2019 and 10/2019 was due to a shutdown of the service.

Game Time: Huge variation in times!

To get a general idea of the gameplay, let’s first take a look at the most basic data, namely the time span. The average time for a single game is about 15 minutes. The time span varies greatly as the standard deviation is almost 10 minutes.

As the graph below shows, the distribution resembles a log-normal distribution more than a normal distribution. Interestingly, a considerable percentage of the games (11.7%) lasted less than 5 minutes! I suspect that many of those games were played by non-regular players, because in Avalon, approving everything without a second thought can result in a very short game. Another possible reason is imbalanced match-making, since the ProAvalon platform has no ranking or match-making system.

Mean: 15.57 mins; Standard Deviation: 9.99 mins
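For readers who want to reproduce this kind of summary, here is a minimal sketch using only Python’s standard library. The durations are made-up values, not the ProAvalon data; the comparison of mean and median on the raw and log scales is just one informal way to look for the right skew that a log-normal shape implies.

```python
import math
import statistics

# Made-up game durations in minutes, purely for illustration.
durations = [3.5, 4.2, 8.0, 11.5, 14.0, 16.5, 21.0, 27.5, 35.0, 62.0]

mean_minutes = statistics.mean(durations)
std_minutes = statistics.stdev(durations)  # sample standard deviation
share_under_5 = sum(d < 5 for d in durations) / len(durations)

# Right-skewed data has mean > median on the raw scale; if the data were
# roughly log-normal, the gap should shrink after taking logs.
raw_gap = mean_minutes - statistics.median(durations)
log_durations = [math.log(d) for d in durations]
log_gap = statistics.mean(log_durations) - statistics.median(log_durations)

print(f"mean={mean_minutes:.2f} min, std={std_minutes:.2f} min")
print(f"share under 5 min={share_under_5:.1%}")
print(f"mean-median gap: raw={raw_gap:.2f}, log={log_gap:.2f}")
```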

Note: There were some exceptionally long games that lasted over an hour; these were not caused by bugs or AFK players but rather by very active voting sessions.

Major Features: Who won, and how

Next, the most important information in games: who won. The win rates of the Good team and the Evil team are 43.5% and 56.5% respectively. This came as a bit of a surprise to me, since I had always thought the “good guys” would win most of the time. However, the numbers say otherwise.

To further break it down, when the Evil team won, over half of their wins (57%) were because, despite mission success for the Good team, the Assassin correctly identified Merlin. Meanwhile, missions failing 3 times accounted for about 39% of Evil team wins. Hammer rejecting (which means the assembling vote fails 5 times) only accounted for a tiny portion (about 4%).

Spy/Evil team has different win conditions

Win Rate & Time: Longer games, more balanced results

Next, an important question we can ask is: what is the relationship between game time and which team wins?

If we split the data into two groups based on which team wins, then we can see slightly different distributions. From this, we can make the following two observations:

When the span of the game is very short, the chance of the Good team winning is higher. This could result from games ending at the third quest because, by chance, only Good players were assigned to quests and the Evil players did not have enough clues about Merlin.
When the game time is over 30 minutes, the gap between the number of games won by the Good team and by the Evil team becomes smaller. Games that last longer could represent more competitive gameplay where it was not immediately apparent which team would win.

A clear distinction between which team wins the game.

Win Rate & Failed Vote: Dispute results in more losses for the good team?

Another metric we can look into is how many failed votes (the number of votes that rejected a mission team proposed by the mission leader) happened in one game. Let’s look at the standalone information about failed votes.

The histograms show some very interesting results if we split the data into two groups: games with 5 players and games with 6. For the 5-player games (left), many games have no failed votes, and there are peaks at 4 and 8 failed votes. This could suggest that, since reaching 5 failed votes for a single mission would immediately make the Good team lose, most players stop disputing at the 5th consecutive vote. In the data, 4 and 8 failed votes likely mean the game had one or two rounds where four consecutive votes rejected a mission team, forcing the Good team to accept the team formed at the 5th vote.

The 6-player games show a similar pattern, with spikes at 0, 8, 12, and 16 failed votes (although not at 4). In particular, there is a huge peak in the number of games with 12 failed votes. I would speculate that this stems from most games ending at round three or four (meaning missions have succeeded 3 times), so 12 is the magic number for very competitive games.

Based on these observations, I would also say that 6-player games appear far more competitive and more favored by seasoned players than 5-player games, because fewer of them end with 0 failed votes. 6-player games might also be more fun and balanced, given that more 6-player games were played than 5-player games.

The number of games with each count of failed votes for 5-player games (left) and 6-player games (right).

Now, how about failed votes with win rates? As shown in the graph below (y-axis represents the win rate of the Good team), we can see a trend where the Good team has a lower win rate when the number of failed votes increases.

Again, if we split the data, some numbers stand out. The Good team has its highest win rates when the number of failed votes reaches 4 in 5-player games and 8 in 6-player games. The logic behind this is not clear; perhaps the Good team in those games used as many failed votes as possible when the mission leader seemed suspicious and might have been Morgana or the Assassin, for example. These peaks could also reflect more sophisticated game tactics that correlate with such results, or they could even be related to issues in the ProAvalon game system.

The Good team’s win rates for 5-player games (left) and 6-player games (right), based on the number of failed votes in the game.
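The win rate per failed-vote count shown in the figure can be computed with a simple group-and-average. The sketch below uses invented records of the form (failed votes, winning team); it is not the article’s actual pipeline, only one way to do the aggregation.

```python
from collections import defaultdict

# Invented toy records: (number of failed votes in the game, winning team).
games = [
    (0, "Good"), (0, "Evil"),
    (4, "Good"), (4, "Good"), (4, "Evil"),
    (8, "Evil"), (8, "Evil"), (8, "Good"),
]


def good_win_rate_by_failed_votes(records):
    """Return {failed_vote_count: share of games won by the Good team}."""
    totals = defaultdict(int)
    good_wins = defaultdict(int)
    for failed_votes, winner in records:
        totals[failed_votes] += 1
        good_wins[failed_votes] += winner == "Good"  # True counts as 1
    return {k: good_wins[k] / totals[k] for k in sorted(totals)}


rates = good_win_rate_by_failed_votes(games)
for failed_votes, rate in rates.items():
    print(f"{failed_votes} failed votes: Good wins {rate:.0%}")
```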

Advanced Analysis: Win condition vs Voting and starting position

Up until now, we have only made some fundamental observations. Because we are interested in how various factors might influence who wins, let’s next dive into the meat of the data, which is voting information. In this section, I used the winning team (which team won; i.e., Good or Evil) and win condition (how the game was won; e.g., Mission Fails, Assassination Succeeds) as target variables to see how other factors potentially affected them.

Note: All of the tests below are done with Chi-squared analysis (assessing independence among categorical variables) under the 95% confidence level. Thus, a p-value of 0.05 is the threshold for the significance of the test.
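To make the test concrete, here is a minimal standard-library sketch of Pearson’s chi-squared test of independence on a small contingency table. The counts are invented, and this is not necessarily how the original analysis was run; in practice a library routine such as `scipy.stats.chi2_contingency` would be the usual choice. The helper `chi2_sf` uses the closed-form upper-tail probability for integer degrees of freedom.

```python
import math


def chi2_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) for a chi-squared variable, integer dof."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, dof // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    root = math.sqrt(x)
    term, total = root, 0.0
    for r in range(1, (dof - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2)) + math.sqrt(2 / math.pi) * math.exp(-half) * total


# Invented counts: rows = winning team, columns = three categories of some factor.
table = [[30, 20, 10],
         [20, 30, 40]]
stat, dof = chi2_statistic(table)
p_value = chi2_sf(stat, dof)
print(f"chi2={stat:.3f}, dof={dof}, p={p_value:.6f}, significant={p_value < 0.05}")
```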

First leader

Some may believe that the choice of who acts as the first “leader” may influence the result of the game.

The leader is the person who proposes the team, assigning a few players to do the quest. And since this is the role that appears to have more power and freedom to manipulate the game, people often think this is a decisive factor for which team would ultimately win.

However, a statistical analysis tells a different story. Although the simple statistics test (chi-squared analysis) shows there is a significant relationship (p-value = 0.00) between “who is the first leader” and these two targets (“which team wins” and “how the game was won”), the effect size calculated by adjusted Cramér’s V is actually pretty small (0.030 for winning team & 0.022 for win condition). Therefore, I would actually conclude that “which character is the first leader” alone wouldn’t substantially affect the game’s result.

A chi-squared analysis on “howTheGameWasWon”

Note 1: With very large samples (here, well over 10,000 games), chi-squared tests tend to produce very small p-values even for weak associations, so I also calculated the effect size, which represents the strength of the association between two variables. Cramér’s V varies from 0 (no association between the variables) to 1 (complete association).
Note 2: I also ran the test using only the 5-player games or only the 6-player games. Once the sample size goes down, the p-value of the chi-squared test goes up (0.158 and 0.318 respectively).
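The gap between significance and effect size is easy to see on a toy example. The sketch below computes Cramér’s V from a contingency table, with an optional bias correction. I am assuming here that “adjusted” refers to the commonly used Bergsma-style bias correction; the article does not say which adjustment was applied, and the counts are invented.

```python
import math


def chi2_and_n(table):
    """Pearson chi-squared statistic and total count for a 2-D table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(table, bias_corrected=True):
    """Cramér's V; the bias-corrected variant is an assumed reading of 'adjusted'."""
    stat, n = chi2_and_n(table)
    r, k = len(table), len(table[0])
    phi2 = stat / n
    if not bias_corrected:
        return math.sqrt(phi2 / min(r - 1, k - 1))
    phi2_corr = max(0.0, phi2 - (r - 1) * (k - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))


# Invented 2x2 table with 20,000 games: a 52/48 split is highly significant
# at this sample size (chi2 = 32), yet the association is tiny.
table = [[5200, 4800],
         [4800, 5200]]
plain_v = cramers_v(table, bias_corrected=False)
corrected_v = cramers_v(table)
print(f"plain V={plain_v:.4f}, bias-corrected V={corrected_v:.4f}")
```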

The sequence of leadership

If “who is the first leader” was not a decisive factor for the result of the game, then how about the more complicated “sequence of leadership”?

The sequence of leadership is the sequence of who proposes a team in each quest. It is reasonable to think that this ordering may be correlated to the result (for example, if Evil team members keep getting the chance to pick quest members, the Evil team could more easily sabotage the mission).

Wow, there are 5,791 unique sequences in this dataset!

However, a deeper analysis suggests that it is not so straightforward. If we replace “which role starts first” with “the sequence of leadership” in our chi-squared test, the result shows a significant relationship (p-value of 0.00 for “which team wins”). Despite that, the effect sizes for 5-player games and 6-player games are 0.125 and 0.144 respectively, which are still rather weak values. That being said, these still reflect stronger relationships than for the previous feature, and so I would consider the leadership sequence to be a meaningful factor.

One thing worth noticing is that, even in games with the sequence of [‘Resistance’, ‘Resistance’, ‘Resistance’, ‘Resistance’, ‘Resistance’] where the leader of each quest is from the Good team (this is among the top 10 most common sequences), the Evil team still wins most of the time (60% of the games). Under this scenario, although the Good team could win at the quest stage, the Evil team has an almost fifty-fifty chance to assassinate Merlin correctly. This shows that the game is so much more than just who gets to play the leader.

Top 10 common sequences in leaderships with “How the game was won”
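A table like this one can be built by treating each leadership sequence as a hashable tuple and counting. The sketch below uses invented games and hypothetical win-condition labels; it only illustrates the counting, not the exact fields in the dataset.

```python
from collections import Counter, defaultdict

# Invented toy data: (alignment of each quest leader in order, how the game was won).
# The win-condition labels are hypothetical.
games = [
    (("Resistance", "Resistance", "Resistance"), "Assassination"),
    (("Resistance", "Resistance", "Resistance"), "Missions succeeded"),
    (("Resistance", "Spy", "Resistance", "Spy"), "Missions failed"),
    (("Resistance", "Resistance", "Resistance"), "Assassination"),
    (("Spy", "Resistance", "Resistance", "Resistance"), "Missions succeeded"),
]

# How often does each leadership sequence occur?
sequence_counts = Counter(seq for seq, _ in games)

# For each sequence, how were those games won?
outcomes_by_sequence = defaultdict(Counter)
for seq, outcome in games:
    outcomes_by_sequence[seq][outcome] += 1

print(f"{len(sequence_counts)} unique sequences")
for seq, count in sequence_counts.most_common(2):
    print(seq, count, dict(outcomes_by_sequence[seq]))
```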

More Practical: Special Characters as the first leader

Player’s view: What would you do if you are a special character?

Until this point, our analysis still hasn’t quite provided very useful tips that players could use during gameplay. Especially for newcomers to strategy games like Avalon, it can sometimes be quite tough at the outset to get the hang of the game. By further analyzing the data, including choices for the first mission teams, I hoped to uncover some insights and provide a general guideline for players new to Avalon. As such, I also had this thought — what could players do to increase their win rate when they are a special character and also the first leader?

Some special characters (Merlin, Percival, and Morgana/Assassin) might have an advantage when choosing an opening team, given that they know something more. Thus, I decided to investigate whether games where the first mission leader is a special character could reveal additional insights that might be relevant for players during gameplay.

Morgana and Assassin: Does complicated play pay off?

On the Evil team, Morgana and the Assassin know who is on the opposing team. Let’s look at the summary statistics for the choice of the first team and how it may relate to the end results.

For the test with “Which team wins”, the p-value shows 0.00, rejecting the null hypothesis that two variables are independent. Yet again, the effect size shows only 0.0457, indicating a very small association. From this, we can only say that there is a slightly higher chance that the Evil team would win if an Evil team member picks two Evil members (Morgana and Assassin) as the first team members. Although this approach carries the risk that there could be two fail votes in the process (thereby giving a hint to the Good team on who Evil team affiliates are), the result is still better than picking one Evil member and one Good member. As an Evil team affiliate being the first mission leader, choosing two Good team members would generally not be worth considering (although on the other hand, this could be a cunning strategy to use, especially as the Evil still could have a good chance of winning).

If you are in the Evil team, what would you do?
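The comparison above boils down to labeling each first team by how many Evil members it contains and then comparing Evil win rates per label. Here is a minimal sketch of that step; the player names, alignments, and outcomes are invented, and for simplicity every toy game shares the same alignment map.

```python
from collections import defaultdict


def classify_first_team(team, alignment):
    """Label a two-player first team by its number of Evil members."""
    evil_count = sum(alignment[player] == "Evil" for player in team)
    return ("two Good", "one Evil, one Good", "two Evil")[evil_count]


# Invented 5-player setup and outcomes: (first team picked by an Evil leader, winner).
alignment = {"A": "Evil", "B": "Evil", "C": "Good", "D": "Good", "E": "Good"}
games = [
    (("A", "B"), "Evil"),
    (("A", "B"), "Good"),
    (("A", "C"), "Good"),
    (("C", "D"), "Evil"),
    (("A", "B"), "Evil"),
]

totals = defaultdict(int)
evil_wins = defaultdict(int)
for team, winner in games:
    label = classify_first_team(team, alignment)
    totals[label] += 1
    evil_wins[label] += winner == "Evil"

evil_win_rate = {label: evil_wins[label] / totals[label] for label in totals}
for label, rate in evil_win_rate.items():
    print(f"{label}: Evil wins {rate:.0%} of {totals[label]} games")
```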

Merlin: What should the player who knows everything do?

Merlin has the power of knowing who is who. But if Merlin is the mission leader in the first round, what is the best approach they could use?

Based on the statistics, the p-value once again shows 0.00, passing the significance test. Still, the effect size is only 0.0655, a very minor effect. According to the summary table, picking two Good team members is the best option, a move that likely aligns with our intuition. The other options are unnecessary (although some people might think that picking one Evil team member on purpose could be a clever trick to expose them).

Percival: Can the player who knows half the story influence game results?

Next, we move on to Percival, the character who knows which two players are Merlin and Morgana (but doesn’t know who’s who). I distinguished four meaningful choices Percival can make as the first mission leader creating a team: “Merlin and Morgana”, “Percival and 1 of those 2”, “Percival and others” (not including Merlin or Morgana), and “All the others” (meaning the mission team does not include Percival, Merlin, or Morgana). However, the result is not significant at all (p-value: 0.4863). Thus, it seems that by itself, what decision Percival makes is not that important to the final result.

Sadly, we can’t reject the hypothesis that Percival's choice doesn’t matter on its own.
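The four choices described above can be expressed as a small classification function. This is a sketch under my own assumptions about the input shape (a two-player team and a player-to-role map, both hypothetical); teams that pair Merlin or Morgana with a non-Percival player fall outside the four categories and are labeled separately here.

```python
def classify_percival_team(team, roles):
    """Label a two-player first team proposed by Percival.

    team: iterable of player names; roles: dict mapping player name to role.
    """
    team_roles = {roles[player] for player in team}
    seen = team_roles & {"Merlin", "Morgana"}  # the two players Percival can see
    has_percival = "Percival" in team_roles
    if seen == {"Merlin", "Morgana"}:
        return "Merlin and Morgana"
    if has_percival and seen:
        return "Percival and 1 of those 2"
    if has_percival:
        return "Percival and others"
    if not seen:
        return "All the others"
    # One of Merlin/Morgana plus a non-Percival player: not one of the four categories.
    return "Uncategorized"


# Invented 6-player role assignment for illustration.
roles = {
    "A": "Percival", "B": "Merlin", "C": "Morgana",
    "D": "Assassin", "E": "Resistance", "F": "Resistance",
}
for team in [("B", "C"), ("A", "B"), ("A", "E"), ("E", "F"), ("C", "D")]:
    print(team, "->", classify_percival_team(team, roles))
```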

Conclusion: A fair and complicated game!

We have spent quite some time analyzing many aspects of the data. Based on these findings, I would say that Avalon is a very smart and balanced board game; in other words, although there are certain tactics and strategies that general players can use to increase their chances of winning, ultimately there are no easy formulas that guarantee a win. I believe that if we had the discussion data, we could have explored the subject at an even deeper level.

That being said, we can still summarize a few practical findings here (tl;dr version):

Take more time to enjoy the game. Games with longer playtime tend to have a more balanced win rate. At least on the ProAvalon platform, once the game exceeded 20 minutes, the gap between win rates greatly decreased. Though there might be some additional factors for this result, taking more time to make a well-informed decision is always a good move.
If you are on the Evil team, there’s no need to worry. If you are the “bad guy”, don’t get too nervous about it. Even if you only have one other teammate and have to sabotage covertly, the numbers tell you that you still have a great chance of winning the game. Furthermore, even if you lose during the voting rounds, there is still a considerable chance for you to win if you assassinate Merlin. After all, over 50% of the Evil team’s wins come from this assassination. Subtly misleading discussion during voting sessions is also a good way to increase your chance of winning.
Don’t feel pressure to make sophisticated moves. If you are new to the game and wonder what you should do if you are the first mission leader, just make the most logical move, which is picking your allies as the quest members. We can’t say that this definitely applies to all levels of competition because of the limits of the data, but these analyses about picking first mission members do consistently show that if you want to confuse others or hide your intentions, it may be wiser to do so during discussion rather than during the choice of mission members.

Overall, we have been able to uncover several interesting points and see how statistical analysis could be applied to this subject. In general, it is hard to find this kind of gaming data about board games, so I again appreciate Victor’s efforts in making the online version of Avalon and offering the precious data. I would love to hear any comments and feedback on this subject.

Photo by Karthik Balakrishnan on Unsplash

What’s next?

This was a wild ride exploring various aspects of Avalon, and yet there are still many more details of Avalon’s fascinating gameplay to be explored!

If you are also interested, please feel free to contribute to the project as well!

To keep this article compact, I didn’t include everything I could think of in this post (e.g., assassination time, failed votes in each quest, modeling).

For the next steps, I would try to use exploratory machine learning techniques to uncover more insights from the data. I am optimistic that there are ways to model this and make it work (e.g., modeling with a Bayesian probability approach).

Thanks for reading the long post! I would like to give special thanks to Michelle for dedicated proofreading and suggestions. Again, if you are interested in the technical details about how to process the data and derive the analyses described above, please refer to my GitHub! I learned a lot from this fun project, so feel free to reply with any thoughts and opinions, or contact me through LinkedIn!

Jeff