Recommender Systems: Content-based, Social recommendations and Collaborative filtering

With the proliferation of video on-demand streaming services, viewers face a big challenge: finding content across multiple screens and apps. There may be quality information available online but it may be difficult to find. Traditionally, viewers resort to “app switching” which can be frustrating when it comes to finding quality content.

With the emergence of new technologies like AI, metadata, and machine learning, traditional content discovery approaches can’t cut the mustard anymore for content publishers. The solution is to integrate their catalogues and programming guides with a Content Discovery Platform. But what is a Discovery Platform, and how can it make it easier for users to find what they want? Discovery Platforms with metadata aggregation, AI/ML enrichments, search and recommendations are the new disruptors in Content Marketing. Today’s post focuses on just one of the pillars of Content Discovery: the recommendations engine.

The goal of a recommendations engine is to predict the degree to which a user will like or dislike a set of items such as movies or videos. With this technology, viewers are automatically advised of content that they might like without the need to search for specific items or browse through an online guide. Recommender systems allow viewers to watch shows at times that suit them, to access those shows conveniently in digital form, and to find shows using numerous indices. Indices include genre, actor, director, keyword and the probability that the viewer will like the show as predicted by a collaborative filtering system. This results in greater satisfaction and loyalty for the viewer, and higher revenues for the business.

1. Methods

Most recommender systems use a combination of different approaches, but broadly speaking there are three different methods that can be used:

Content-based analysis and extraction of common patterns
Social recommendations based on personal choices from other people
Collaborative filtering based on users’ behaviour, preferences and ratings

Each of these approaches can provide useful recommendations on its own, so most recommendation platforms take a hybrid approach, using information from each of these sources to decide which shows are recommended to users.
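As a rough illustration of what a hybrid approach can look like, the sketch below blends per-source scores with a weighted average. The weights, the score ranges and the show names are assumptions made up for this example; the post does not prescribe how the sources should be combined.

```python
# Hypothetical weights for each source of recommendations (assumed, not from the post).
WEIGHTS = {"content": 0.3, "social": 0.2, "collaborative": 0.5}


def hybrid_score(scores, weights=WEIGHTS):
    """Weighted average over the sources that produced a score in [0, 1]."""
    available = {
        name: value
        for name, value in scores.items()
        if name in weights and value is not None
    }
    if not available:
        return 0.0
    total_weight = sum(weights[name] for name in available)
    blended = sum(weights[name] * value for name, value in available.items())
    # Re-normalise so a show is not penalised when one source has no data.
    return blended / total_weight


def rank_shows(candidates, weights=WEIGHTS):
    """Return show titles ordered by blended score, highest first."""
    return sorted(
        candidates,
        key=lambda title: hybrid_score(candidates[title], weights),
        reverse=True,
    )


# Illustrative data: None means the source has no score for that show.
candidates = {
    "Show A": {"content": 0.9, "social": 0.1, "collaborative": 0.4},
    "Show B": {"content": 0.5, "social": None, "collaborative": 0.8},
    "Show C": {"content": 0.2, "social": 0.9, "collaborative": 0.3},
}
print(rank_shows(candidates))
```

Re-normalising by the weights of the available sources is one possible design choice; a platform could equally fall back to a default score for missing sources.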

1.1. Content-based

Content-based recommenders use features such as the genre, cast and age of the show as attributes for a learning system. However, such features are only weakly predictive of whether viewers will like the show. There are only a few hundred genres and they lack the specificity required for accurate prediction.

In the TV world, the only content-analysis technologies available to date rely on the metadata associated with the programmes. The recommendations are only as good as the metadata, and are typically recommendations within a certain genre or with a certain star.
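To make the idea concrete, here is a minimal sketch of a metadata-only recommender. It assumes each show carries genres, cast and a year (the catalogue below is invented), and it uses Jaccard similarity over those features, which is just one simple choice among many.

```python
def features(show):
    """Flatten a show's metadata into a set of 'field:value' tokens."""
    tokens = {f"genre:{genre}" for genre in show["genres"]}
    tokens |= {f"cast:{name}" for name in show["cast"]}
    # Bucket the year into a decade as a crude stand-in for the age of the show.
    tokens.add(f"decade:{show['year'] // 10 * 10}")
    return tokens


def jaccard(a, b):
    """Size of the overlap divided by the size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_shows(target, catalogue, top_n=2):
    """Titles most similar to `target`, judged by shared metadata only."""
    target_features = features(catalogue[target])
    scored = [
        (jaccard(target_features, features(meta)), title)
        for title, meta in catalogue.items()
        if title != target
    ]
    # Highest similarity first; ties are broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_n]]


# Hypothetical catalogue, for illustration only.
catalogue = {
    "Harbour Lights": {"genres": ["drama", "crime"], "cast": ["A. Actor", "B. Star"], "year": 2015},
    "Night Shift": {"genres": ["crime", "thriller"], "cast": ["B. Star"], "year": 2018},
    "Bake Along": {"genres": ["cooking"], "cast": ["C. Host"], "year": 2021},
    "Old Harbour": {"genres": ["drama"], "cast": ["D. Lead"], "year": 1994},
}
print(similar_shows("Harbour Lights", catalogue))
```

The results stay within the same genre or star as the target, which mirrors the limitation described above: the output can be no better than the metadata behind it.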

1.2. Social recommendations

Social-networking technologies allow for a new level of sophistication whereby users can easily receive recommendations based on the shows that other people within their social network have ranked highly, providing a more personal level of recommendations than can be achieved using a newspaper or web site.

A number of social networks dedicated to providing film recommendations have emerged over the last few years, the best known of these being imdb.com, which encourages users to like and review films, then applies a collaborative filtering algorithm to identify similar users and asks them for recommendations.

The advantage of social recommendations is that because they have a high degree of personal relevance they are typically well received, with the disadvantage being that the suggested shows tend to cluster around a few well known or cult-interest programmes.

1.3. Collaborative filtering

Collaborative filtering methods are based on collecting and analysing a large amount of information on users’ behaviour, activity or preferences and predicting what users will like based on their similarity to other users.

There are two types of filtering:

Passive filtering: Provides recommendations based on activity without explicitly asking the users’ permission (e.g. Google, Facebook). Passive filtering is less problematic when collecting the data, but requires substantial processing in order to make the data attributable to a single user. Any keywords that users have searched for within a site provide an excellent basis for passive filtering. The major disadvantage of passive filtering is that users cannot easily specify which information they want to have used for recommendations and which they don’t, so any information used for passive filtering must be carefully governed by a set of business rules to reduce the potential for inappropriate recommendations.

Active filtering: Uses the information provided by the user as the basis for recommendations (e.g. Netflix). The main issue with active collaborative filtering for TV shows is that viewers will only rate a show after watching it, and there has been limited success in getting users to build a sufficiently large database of information to provide solid recommendations.

Collaborative filtering systems can be categorised along the following major dimensions:

User-user or item-item systems: In user-user systems, correlations (or similarities or distances) are computed between users. In item-item systems metrics are computed between items (e.g. shows or movies).

Form of the learned model: Most collaborative filtering systems to date have used k-nearest neighbour models in user-user space. However, there has been work using other model forms such as Bayesian networks, decision trees, cluster models and factor analysis.

Similarity or distance function: Memory-based systems and some others need to define a distance metric between pairs of items or users. The most popular and one of the most effective measures used to date has been the simple and obvious Pearson product-moment correlation coefficient (PMCC). Other distance metrics used have included the cosine measure and extensions to the PMCC which correct for the possibility that one user may rate programmes more or less harshly than another user. Another extension gives higher weight to users that rate infrequently.

Combination function: Having defined a similarity metric between pairs of users or items, the system needs to make recommendations for the active user for an unrated item. Memory-based systems typically use the k-nearest neighbour formula.

Evaluation criteria: The accuracy of the collaborative filtering algorithm may be measured either by using mean absolute error (MAE) or a ranking metric. Mean absolute error is just an average, over the test set, of the absolute difference between the true rating of an item and its rating as predicted by the collaborative filtering system. Whereas MAE evaluates each prediction separately and then forms an average, the ranking metric approach directly evaluates the goodness of the entire ordered list of recommendations. This allows the ranking metric approach to, for instance, penalise a mistake at rank 1 more severely than a mistake further down the list.
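The dimensions above can be sketched in a few lines of code. The example below is a minimal user-user system under stated assumptions: the ratings are invented, Pearson correlation is computed over co-rated items, predictions use a mean-centred k-nearest-neighbour formula (one common way to correct for harsh or generous raters), and discounted cumulative gain (DCG) stands in for "a ranking metric", since the post does not name a specific one.

```python
from math import log2, sqrt


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = mean(ratings_a[i] for i in common)
    mean_b = mean(ratings_b[i] for i in common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(user, item, ratings, k=2):
    """Mean-centred k-nearest-neighbour prediction for an unrated item."""
    user_mean = mean(ratings[user].values())
    neighbours = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = pearson(ratings[user], other_ratings)
        if similarity > 0:  # keep positively correlated users only
            neighbours.append((similarity, other))
    top = sorted(neighbours, reverse=True)[:k]
    if not top:
        return user_mean  # no usable neighbours: fall back to the user's mean
    weighted = sum(
        sim * (ratings[other][item] - mean(ratings[other].values()))
        for sim, other in top
    )
    return user_mean + weighted / sum(sim for sim, _ in top)


def mean_absolute_error(pairs):
    """Average of |true - predicted| over (true, predicted) pairs."""
    return mean(abs(true - predicted) for true, predicted in pairs)


def dcg(relevances):
    """Discounted cumulative gain: hits near the top of the list count more."""
    return sum(rel / log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


# Invented ratings on a 1-5 scale, for illustration only.
ratings = {
    "ana": {"A": 5, "B": 3, "C": 4},
    "ben": {"A": 4, "B": 2, "C": 3, "D": 4},
    "cat": {"A": 1, "B": 5, "C": 2, "D": 1},
    "dan": {"A": 5, "B": 3, "C": 5, "D": 5},
}
print(predict("ana", "D", ratings))
print(mean_absolute_error([(4, 3.5), (2, 3.0), (5, 5.0)]))
print(dcg([0, 1, 1]), dcg([1, 1, 0]))
```

In this data, "ben" rates every show one point lower than "ana" yet correlates perfectly with her, which is why the prediction works with deviations from each user's mean rather than raw ratings. The two DCG calls show a miss at rank 1 costing more than a miss at rank 3.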

The tasks for which collaborative filtering is useful are:

Help me find new items I might like. In a world of information overload, I cannot evaluate all things. Present a few for me to choose from. This has been applied most commonly to consumer items (music, books, movies).

Advise me on a particular item. I have a particular item in mind; does the community know whether it is good or bad?

Help me find a user I might like. Sometimes, knowing who to focus on is as important as knowing what to focus on. This might help with forming discussion groups, matchmaking, or connecting users so that they can exchange recommendations socially.

Help our group find something new that we might like. CF can help groups of people find items that maximise value to a group as a whole. For example, a couple that wishes to see a movie together or a research group that wishes to read an appropriate paper.

Help me find a mixture of “new” and “old” items. I might wish a “balanced diet” of restaurants, including ones I have eaten in previously; or, I might wish to go to a restaurant with a group of people, even if some have already been there; or, I might wish to purchase some groceries that are appropriate for my shopping cart, even if I have already bought them before.

Help me with tasks that are specific to this domain. For example, a recommender for a movie and a restaurant might be designed to distinguish between recommendations for a first date versus a guys’ night out. To date, much research has focused on more abstract tasks (like “find new items”) while not probing deeply into the underlying user goals (like “find a movie for a first date”).

1.3.1. Time-based Collaborative Filtering with Implicit Feedback

Most collaborative filtering-based recommender systems use explicit feedback (ratings) that is collected directly from users. When users rate truthfully, using rating information is one of the best ways to quantify user preferences. However, many users assign arbitrary ratings that do not reflect their honest opinions. In some e-commerce environments, it is also difficult to ask users to give ratings. For instance, in a mobile e-commerce environment the service fee is dependent on the connection time, so users are reluctant to spend time rating items. In such cases, implicit feedback gathered from users’ behaviour offers an alternative to explicit ratings.

2. Accuracy

In the recommender systems community it is increasingly recognised that accuracy metrics such as mean absolute error (MAE), precision and recall can only partially evaluate a recommender system. User satisfaction, and derivatives thereof such as serendipity, diversity and trust, are increasingly seen as important. A system can make better recommendations using the following approaches:

Transparency. Explain how the system works. An explanation may clarify how a recommendation was chosen and isolate and correct misguided assumptions.

Scrutability. Allow users to tell the system it is wrong. Following transparency, a second step is to allow a user to correct reasoning, or make the system scrutable.

Trust. Increase users’ confidence in the system. Trust in the recommender system could also be dependent on the accuracy of the recommendation algorithm. A study of users’ trust suggests that users intend to return to recommender systems which they find trustworthy.

Persuasiveness. Convince users to try or buy. It has been shown that users can be manipulated to give a rating closer to the system’s prediction, whether this prediction is accurate or not.

Effectiveness. Help users make good decisions. Rather than simply persuading users to try or buy an item, an explanation may also assist users to make better decisions. Effectiveness is by definition highly dependent on the accuracy of the recommendation algorithm.

Satisfaction. Make the use of the system fun. Explanations may increase user satisfaction with the system, although poor explanations are likely to decrease a user’s interest, or acceptance of a system. The presence of longer descriptions of individual items has been found to be positively correlated with both the perceived usefulness and ease of use of the recommender system.

3. Relevance

Google’s PageRank mechanism is possible in the web because pages are linked to each other, but for video on-demand and streaming platforms we need to find another approach to relevance that will allow us to prioritise the most appropriate programming ahead of less relevant items. There are a number of potential elements that can be included and the best algorithms take into account each of these factors:

Platform: the platform that the content is on must be weighed against the scheduling.

Programme Information: the metadata provided with the programme typically includes information on the programme, cast details, and categorisation. Prioritisation can be made on the quality of the metadata.

Scheduling: when the content is going to be made available on a given platform. The viewer is typically looking for content that is more readily available than not, and the initial results in the list should reflect this.

Popularity: when searching for sports, topics, or actors, the algorithm must prioritise more popular content ahead of others. For example a search for Tennis during Wimbledon should bring up the best coverage for this tournament rather than a documentary on the origins of the sport, even though the documentary might be broadcast on a more popular platform.

Viewer behaviour: by building a relevance map of user viewing, it is possible to augment the metadata of a show with other metadata that is common amongst its nearest neighbours on the relevance map. In this way, content that has strong proximity to other content with a similar topic can be weighted as more relevant to this topic than content that’s standalone in the relevance map.
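One simple way to combine these factors is a weighted score per programme. The sketch below is only an illustration: the field names, the weights and the availability decay are assumptions chosen for the example, not values taken from any real platform.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    platform_weight: float        # 0..1, how prominent the platform is
    metadata_quality: float       # 0..1, e.g. share of metadata fields filled in
    hours_until_available: float  # 0 when the content can be watched now
    popularity: float             # 0..1, current popularity for the searched topic
    neighbour_topic_share: float  # 0..1, share of relevance-map neighbours on the topic


# Hypothetical weights; in practice these would be tuned per service.
WEIGHTS = {
    "platform": 0.15,
    "metadata": 0.15,
    "scheduling": 0.20,
    "popularity": 0.30,
    "behaviour": 0.20,
}


def availability(hours_until_available):
    """1.0 for content available now, decaying as the start time moves away."""
    return 1.0 / (1.0 + hours_until_available / 24.0)


def relevance(candidate, weights=WEIGHTS):
    """Weighted sum of the five relevance factors."""
    return (
        weights["platform"] * candidate.platform_weight
        + weights["metadata"] * candidate.metadata_quality
        + weights["scheduling"] * availability(candidate.hours_until_available)
        + weights["popularity"] * candidate.popularity
        + weights["behaviour"] * candidate.neighbour_topic_share
    )


def rank(candidates, weights=WEIGHTS):
    """Titles ordered from most to least relevant."""
    ordered = sorted(candidates, key=lambda c: relevance(c, weights), reverse=True)
    return [c.title for c in ordered]


# A search for "Tennis" during Wimbledon, with invented values.
results = [
    Candidate("Origins of tennis documentary", 0.9, 0.9, 48, 0.2, 0.4),
    Candidate("Wimbledon live coverage", 0.6, 0.8, 0, 0.95, 0.9),
]
print(rank(results))
```

With these illustrative weights the live coverage ranks first even though the documentary sits on the more prominent platform, which matches the Wimbledon example above.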

4. Challenges

The difficulty in implementing recommendations is that different users have different tastes and opinions about which content they prefer.

Quality: a substantial portion of the videos that are recommended to the user should be videos that they would like to watch, or at least might find interesting.

Transparency: it should be clear to the user why they have been recommended certain videos so that if they have been recommended a video they don’t like they can at least understand why.

User feedback: people are fanatical about their watching experience and if they are being recommended a video that they don’t like they should have an immediate way to say that they don’t like it and subsequently never have it recommended again.

Accuracy: use metrics to evaluate recommender systems, and identify the strengths and weaknesses of those metrics.

5. Research papers

A Survey of Explanations in Recommender Systems – Nava Tintarev, Judith Masthoff
A time-based approach to effective recommender systems using implicit feedback – T. Q. Lee, Y. Park
Evaluating collaborative filtering recommender systems – Jonathan L. Herlocker
Toward the Next Generation of Recommender Systems – Gediminas Adomavicius and Alexander Tuzhilin
