Improving collaborative filtering music recommender systems: a focus on user characterization from behavioral and contextual factors

Sánchez Moreno, Diego

Supervised by:

María Navelonga Moreno García Supervisor

University of defense: Universidad de Salamanca

Defense date: 30 October 2020

Examination committee:

Vivian Félix López Batista Chair
Montserrat Mateos Sánchez Secretary
Constantino Lopes Martins Member

Department:

INFORMÁTICA Y AUTOMÁTICA (Computer Science and Automation)

Type: Thesis

Teseo: 641274 DIALNET

Abstract

The popularization of digital distribution of multimedia content, known as streaming, allows more and more users to access almost all existing music from anywhere, without being limited by the storage capacity of their devices. This enormous availability, together with the great variety of service providers, makes it very difficult for users to find music that fits their tastes. Hence the great current interest in developing recommendation algorithms that help users filter and discover music matching their preferences among the enormous amount of content available in the digital space. Most platforms have search services, and some of them have recommendation mechanisms and offer personalized playlists, but many improvements are still required. The methods used in recommender systems are very diverse, although those based on collaborative filtering (CF) are among the most widespread. The recommendations they provide are based on the ratings that users assign to the items to be recommended, which in music recommender systems are songs or artists: the recommendations for a given user are derived from the ratings of other users with similar tastes. The results of this type of technique are quite good. However, explicit evaluations of items are difficult to obtain from users, so the number of ratings is often insufficient, and the resulting sparsity prevents or hinders the application of such methods. For this reason, implicit ways of obtaining this information are sometimes used, but they are usually complex and not always effective. Other problems caused by the scarcity of ratings are cold start and first-rater, associated with new users and new products in the system, respectively. In addition, it is difficult to provide reliable recommendations to users with unusual tastes (gray sheep users). A related drawback is popularity bias, which makes the most popular items more likely to be recommended.
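As a rough illustration of the neighborhood-based CF idea described above, the following sketch predicts a rating as a similarity-weighted average over the most similar users. It is not the method of the thesis: the function names, the cosine similarity measure, the neighborhood size `k`, and the toy data are all assumptions made for the example.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating dicts (item -> rating)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item, k=2):
    """Similarity-weighted average of the ratings given to `item`
    by the k users most similar to `user` (hypothetical helper)."""
    neighbors = [
        (cosine_similarity(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    # Keep only users who share some taste with the target user.
    neighbors = [(s, r) for s, r in neighbors if s > 0]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    if not top:
        return None  # sparsity: no similar user has rated this item
    return sum(s * r for s, r in top) / sum(s for s, _ in top)


# Toy data, invented for illustration only.
ratings = {
    "ana": {"artist_a": 5, "artist_b": 4},
    "ben": {"artist_a": 5, "artist_b": 4, "artist_c": 5},
    "eva": {"artist_a": 1, "artist_c": 2, "artist_d": 4},
}

print(predict_rating(ratings, "ana", "artist_c"))
```

Here "ben" is far more similar to "ana" than "eva" is, so the prediction for `artist_c` is pulled toward his rating. When no neighbor has rated the item the function returns `None`, which is the sparsity problem in miniature.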
Popularity bias is due to the power-law distribution of the play frequency of musical items (artists or songs): high play frequencies are concentrated in very few items, while the remaining items form the long tail of the curve. To address the above problems, content-based algorithms have been proposed as an alternative to CF methods. These methods can recommend any item on the basis of its characteristics, so that the user receives recommendations of items similar to others in which he or she has shown interest in the past. Most current recommender systems use hybrid techniques to take advantage of the benefits of both approaches and avoid their drawbacks. These methods usually make use of item and user attributes, as well as rating information. This work focuses on user characterization in order to increase the degree of personalization and thus improve the recommendations provided by collaborative filtering methods. The proposals presented, although they could be extended to other application domains, focus on music because the way music is consumed differs significantly from other products and, consequently, some aspects related to recommendations are also different. The approaches proposed to characterize the user have in common that they require only the information available on music streaming platforms, without the need for additional data such as user demographics or item attributes. Moreover, since there are no explicit ratings of the music items, ratings must be obtained implicitly from each user's plays of artists or songs. The first proposal addresses the gray sheep problem by characterizing the user according to the popularity of the music he or she listens to, which is closely related to the power-law distribution of item play frequency. This approach is applicable to both artist and song recommendations.
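To make the two ideas above more concrete (implicit ratings derived from plays, and a user profile based on item popularity), here is a minimal sketch. The abstract does not give the actual formulas, so the rank-based binning in `implicit_ratings`, the play-weighted `mainstreamness` score, and the sample data are assumptions chosen only for illustration.

```python
from collections import Counter


def implicit_ratings(plays, levels=5):
    """Map one user's play counts (item -> plays) to ratings in 1..levels
    according to each item's rank among that user's own items.
    Ties are broken by input order in this simplified version."""
    ranked = sorted(plays, key=lambda item: plays[item])  # least played first
    n = len(ranked)
    return {item: 1 + (rank * levels) // n for rank, item in enumerate(ranked)}


def item_popularity(all_plays):
    """Total number of plays per item across all users."""
    totals = Counter()
    for plays in all_plays.values():
        totals.update(plays)
    return totals


def mainstreamness(plays, popularity):
    """Play-weighted average popularity share of the items a user listens to.
    Low values suggest a taste for long-tail items (hypothetical measure)."""
    total_system = sum(popularity.values())
    total_user = sum(plays.values())
    weighted = sum(count * popularity[item] / total_system
                   for item, count in plays.items())
    return weighted / total_user


# Toy play counts, invented for illustration only.
all_plays = {
    "ana": {"hit": 90, "niche": 10},
    "ben": {"hit": 100},
    "eva": {"niche": 5, "rare": 45},
}
popularity = item_popularity(all_plays)

for user, plays in all_plays.items():
    print(user, implicit_ratings(plays), round(mainstreamness(plays, popularity), 3))
```

In this toy data "eva" listens almost exclusively to long-tail items and gets the lowest score, so a system could treat her differently from mainstream listeners when selecting neighbors.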
For song recommendations, results are further improved by considering the position of the songs in the user's sessions. Time is another important factor related to user behavior and habits. The proposal to improve recommendation methods with respect to this factor takes three user-centered perspectives: modeling the evolution of user preferences over time, modeling user listening habits over time, and using time as a contextual variable to generate context-aware recommendations. The preference evolution model is involved in the process of obtaining implicit ratings. Another way to characterize users is through their social context. Streaming music platforms do not hold much information of this kind; however, available data on friendship connections and social tags can be used for this purpose. In particular, this work uses friendship connections to model users' degree of influence through the properties of trust and homophily, and social tags to model their level of expertise. Although the methods presented are not specifically designed to address the cold start drawback, some of them have been tested in that scenario, showing that they also contribute to minimizing that problem.
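One common way to let the evolution of preferences influence implicit ratings, as mentioned above, is to weight each play by its age. The sketch below uses exponential decay with a half-life. This is an assumed technique for illustration, not the preference evolution model of the thesis, and the function name, the day-based timestamps, and the 30-day half-life are hypothetical.

```python
from math import exp, log


def decayed_play_count(play_days, now, half_life_days=30.0):
    """Sum of plays where each play loses half its weight every
    `half_life_days` days. `play_days` and `now` are day numbers."""
    decay = log(2) / half_life_days
    return sum(exp(-decay * (now - day)) for day in play_days)


# Ten plays a year ago versus three plays this week (invented data).
old_favorite = decayed_play_count([0] * 10, now=365)
recent_find = decayed_play_count([360, 362, 364], now=365)
print(old_favorite, recent_find)
```

With these numbers the three recent plays outweigh the ten old ones, so an implicit rating computed from the decayed counts would favor the user's current taste.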