We are building two models that recommend board games: one based on the features of the games and one based on the ratings users have previously given to games. We want to rank the data by how informative it is and remove the least informative portion. We plan to do this in stepwise increasing percentages, re-evaluating recommendation accuracy at each step, comparing the results across steps, and assessing the overall effect on the quality of the recommendations.
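The stepwise procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the project does not yet define "least informative", so the sketch assumes games with the fewest ratings are the least informative. The names `prune_least_rated` and `stepwise_evaluation`, the toy data, and the placeholder metric are all hypothetical.

```python
from collections import Counter


def prune_least_rated(ratings, fraction):
    """Drop the given fraction of games that have the fewest ratings.

    ASSUMPTION: rating count is used as a stand-in for informativeness;
    the real criterion is still to be decided.
    """
    counts = Counter(game for _, game, _ in ratings)
    # Order games from fewest to most ratings; break ties by name for determinism.
    ordered = sorted(counts, key=lambda g: (counts[g], g))
    n_drop = int(len(ordered) * fraction)
    dropped = set(ordered[:n_drop])
    return [r for r in ratings if r[1] not in dropped]


def stepwise_evaluation(ratings, fractions, evaluate):
    """Return {fraction: score}, always pruning from the original data."""
    return {f: evaluate(prune_least_rated(ratings, f)) for f in fractions}


# Toy data: (user, game, rating) triples, invented for illustration.
toy_ratings = [
    ("u1", "catan", 8.0), ("u2", "catan", 7.0), ("u3", "catan", 9.0),
    ("u1", "chess", 6.0), ("u2", "chess", 7.5),
    ("u3", "obscure", 4.0),
]

# Placeholder metric: the number of ratings left after pruning.
# A real run would train a recommender and return its accuracy instead.
results = stepwise_evaluation(toy_ratings, [0.0, 0.34, 0.67], len)
```

In a real experiment, `evaluate` would train one of the two recommenders on the pruned data and return an accuracy measure on held-out ratings, so the scores at each percentage can be compared directly.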
The dataset is a collection of user ratings of board games gathered by BoardGameGeek (BGG). It contains almost 19 million ratings, made by 411,374 users for 21,521 board games, together with many features that describe the games. There are 4 main classes of features: game category, game subcategory, game mechanics and game themes. The game category describes the type of board game; the dataset contains 8 categories: family, children, strategy, war, thematic, CGS, abstract and party. The subcategories will be used in conjunction with the category features to get a more detailed understanding of each game. For example, a game in the war category might have the subcategories territory building and card game. The dataset contains 11 subcategory features. The mechanics features refer to the rules that govern how the board game is played and how players interact with each other. The dataset contains a total of 158 mechanics features, such as dice rolling, hand management, roll/spin and move, and action points. The theme features refer to a general topic or subject related to the board game, such as adventure, fantasy or science fiction. The dataset contains 18 theme columns; the most popular themes were found to be fantasy, science fiction, fighting and economic games. Only a limited number of these features will be considered for the model; the features will be filtered during the dataset preprocessing phase.
To what extent does removing less informative data from the dataset affect the accuracy of a recommender system that uses latent-factor collaborative filtering and content-based filtering with Pearson-correlation distance?
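For readers unfamiliar with latent-factor collaborative filtering, a minimal sketch is given here. It assumes a plain matrix factorization trained by stochastic gradient descent, which is one common way to learn latent factors; the project does not state which factorization method will actually be used. The function names, hyperparameters and toy data are hypothetical.

```python
import random


def train_latent_factors(ratings, k=2, epochs=200, lr=0.01, reg=0.02, seed=0):
    """Learn k-dimensional user and game factors from (user, game, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    games = sorted({g for _, g, _ in ratings})
    # Small positive random initialisation of both factor tables.
    P = {u: [rng.uniform(0, 0.1) for _ in range(k)] for u in users}
    Q = {g: [rng.uniform(0, 0.1) for _ in range(k)] for g in games}
    for _ in range(epochs):
        for u, g, r in ratings:
            pu, qg = P[u], Q[g]
            err = r - sum(a * b for a, b in zip(pu, qg))
            for f in range(k):
                pu_f, qg_f = pu[f], qg[f]
                # Gradient step on squared error with L2 regularisation.
                pu[f] += lr * (err * qg_f - reg * pu_f)
                qg[f] += lr * (err * pu_f - reg * qg_f)
    return P, Q


def predict(P, Q, user, game):
    """Predicted rating: dot product of the user and game factor vectors."""
    return sum(a * b for a, b in zip(P[user], Q[game]))


def rmse(P, Q, ratings):
    """Root-mean-square error of the predictions on the given ratings."""
    total = sum((r - predict(P, Q, u, g)) ** 2 for u, g, r in ratings)
    return (total / len(ratings)) ** 0.5


# Toy data, invented for illustration.
toy_ratings = [
    ("u1", "catan", 8.0), ("u2", "catan", 7.0), ("u3", "catan", 9.0),
    ("u1", "chess", 6.0), ("u2", "chess", 7.5),
    ("u3", "obscure", 4.0),
]

P0, Q0 = train_latent_factors(toy_ratings, epochs=0)  # untrained baseline
P, Q = train_latent_factors(toy_ratings)              # trained factors
```

Comparing `rmse(P, Q, toy_ratings)` with `rmse(P0, Q0, toy_ratings)` shows how much the training improves the fit; a real evaluation would measure error on held-out ratings rather than on the training data.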
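Two algorithms will be used in the project: latent-factor collaborative filtering and content-based filtering with Pearson-correlation distance.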
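As an illustration of the content-based side, the sketch below computes a Pearson-correlation distance (one minus the Pearson correlation) between binary game feature vectors and ranks games by that distance. The feature columns, game names and the `most_similar` helper are hypothetical, and treating a constant vector as uncorrelated is an assumption made only to keep the example well defined.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # ASSUMPTION: correlation is undefined for a constant vector,
        # so treat the pair as uncorrelated (distance 1).
        return 1.0
    return 1.0 - cov / math.sqrt(var_x * var_y)


def most_similar(target, features):
    """Rank all other games by increasing Pearson distance to the target."""
    others = [g for g in features if g != target]
    return sorted(others, key=lambda g: pearson_distance(features[target], features[g]))


# Hypothetical binary feature vectors: [strategy, dice rolling, party, fantasy]
features = {
    "game_a": [1, 1, 0, 0],
    "game_b": [1, 1, 0, 1],
    "game_c": [0, 0, 1, 1],
}

ranking = most_similar("game_a", features)
```

Here `game_b` shares most of its features with `game_a` and is ranked first, while `game_c` has the opposite feature pattern and lands at the maximum distance of 2.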
Team 36
Student | ID |
---|---|
Ma-Ya McRae | 27536143 |
Fadi Albasha | 40087747 |
Vithya Nagamuthu | 40077465 |
Christian Pangia-Henneveld | 40034040 |