Published: 21 March 2024
Contributors: Jacob Murel Ph.D., Eda Kavlakoglu
Collaborative filtering is a type of recommender system. It groups users based on similar behavior, recommending new items according to group characteristics.
Collaborative filtering is an information retrieval method that recommends items to users based on how other users with similar preferences and behavior have interacted with that item. In other words, collaborative filtering algorithms group users based on behavior and use general group characteristics to recommend items to a target user. Collaborative recommender systems operate on the principle that similar users (behavior-wise) share similar interests and similar tastes.1
Collaborative filtering vs content-based filtering
Collaborative filtering is one of two primary types of recommender systems, the other being content-based recommenders. This latter method uses item features to recommend items similar to those with which a particular user has positively interacted in the past.2 While collaborative filtering focuses on user similarity to recommend items, content-based filtering recommends items exclusively according to item profile features. Content-based filtering targets recommendations to one specific user's preferences rather than a group or type as in collaborative filtering.
Both methods have witnessed many real-world applications in recent years, from e-commerce like Amazon to social media to streaming services. Together, collaborative and content-based systems form hybrid recommender systems. In fact, in 2009, Netflix adopted a hybrid recommender system through its Netflix prize competition.
How collaborative filtering works
Collaborative filtering uses a matrix to map user behavior for each item in its system. The system then draws values from this matrix to plot as data points in a vector space. Various metrics then measure the distance between points as a means of calculating user-user and item-item similarity.
User-item matrix
In a standard setting of collaborative filtering, we have a set of n users and a set of x items. Each user's individual preference for each item is displayed in a user-item matrix (sometimes called a user rating matrix). Here, users are represented in rows and items in columns. In the matrix R, the entry Rui represents the behavior of user u toward item i. These values may be continuous numbers provided by users (for example ratings) or binary values that signify whether a given user viewed or purchased the item. Here is an example user-item matrix for a bookshop website:
[Figure: example user-item matrix of user ratings for books]
This matrix displays user ratings for different books available. A collaborative filtering algorithm compares the ratings users have provided for each book. By identifying similar users or items based on those ratings, it predicts ratings for books a target user has not seen (represented by null in the matrix) and recommends (or does not recommend) those books to the target user accordingly.
The example matrix used here is full, given it's restricted to four users and four items. However, in real-world scenarios, known user preferences for items are often limited, leaving the user-item matrix sparse.3
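As a minimal sketch of the structure described above, the following builds a small user-item matrix from raw observations. The user names, book titles, ratings, and the helper name `build_user_item_matrix` are all invented for illustration; `None` stands in for the null entries mentioned earlier.

```python
# Hypothetical (user, book, rating) observations; every name and value is invented.
ratings = [
    ("ana", "Dune", 5), ("ana", "Emma", 3),
    ("ben", "Dune", 4), ("ben", "Ulysses", 2),
    ("cy", "Emma", 4), ("cy", "Ulysses", 5),
]


def build_user_item_matrix(observations):
    """Return (users, items, matrix) with users as rows and items as columns."""
    users = sorted({user for user, _, _ in observations})
    items = sorted({item for _, item, _ in observations})
    row = {user: index for index, user in enumerate(users)}
    col = {item: index for index, item in enumerate(items)}
    # None marks a user-item pair with no recorded behavior.
    matrix = [[None] * len(items) for _ in users]
    for user, item, rating in observations:
        matrix[row[user]][col[item]] = float(rating)
    return users, items, matrix


users, items, R = build_user_item_matrix(ratings)
for user, user_row in zip(users, R):
    print(user, user_row)
```

A list of lists is used here only for readability; a production system would more likely use a sparse matrix format, since most entries are missing.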
Similarity measures
How does a collaborative recommendation algorithm determine similarity between various users? As mentioned, proximity in vector space is a primary method. But the specific metrics used to determine that proximity may vary. Two such metrics are cosine similarity and Pearson correlation coefficient.
Cosine similarity
Cosine similarity measures the angle between two vectors. The compared vectors comprise a subset of ratings for a given user or item. The cosine similarity score can be any value between -1 and 1. The higher the cosine score, the more alike two items are considered. Some sources recommend this metric for high-dimensional feature spaces. In collaborative filtering, vector points are pulled directly from the user-item matrix. Cosine similarity is represented by this formula, where x and y signify two vectors in vector space:4
$$\cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_{i} x_i y_i}{\sqrt{\sum_{i} x_i^2} \, \sqrt{\sum_{i} y_i^2}}$$
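The cosine measure described above can be sketched in a few lines of plain Python. The function names are hypothetical, and two choices here are assumptions rather than anything the cited sources prescribe: the vectors are restricted to items both users rated, and an all-zero vector is given a similarity of 0.

```python
import math


def co_rated(row_a, row_b):
    """Keep only the positions where both rows hold a rating (not None)."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: a zero vector is treated as having no similarity
    return dot / (norm_x * norm_y)


# Two rows of a hypothetical user-item matrix.
x, y = co_rated([5, 3, None, 1], [4, None, 2, 1])
print(cosine_similarity(x, y))
```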
Pearson correlation coefficient (PCC)
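PCC helps measure similarity between items or users by computing the correlation between two users' or items' respective ratings. PCC ranges between -1 and 1, which signify perfect negative correlation and perfect positive correlation, respectively. Unlike cosine similarity, PCC uses all the ratings for a given user or item. For example, if calculating PCC between two users, we use this formula, in which a and b are different users, $r_{ai}$ and $r_{bi}$ are each user's rating for item i, and $\bar{r}_a$ and $\bar{r}_b$ are each user's mean rating:5
$$\mathrm{PCC}(a, b) = \frac{\sum_{i} (r_{ai} - \bar{r}_a)(r_{bi} - \bar{r}_b)}{\sqrt{\sum_{i} (r_{ai} - \bar{r}_a)^2} \, \sqrt{\sum_{i} (r_{bi} - \bar{r}_b)^2}}$$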
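A rough sketch of the Pearson correlation between two users' rating rows follows. It makes assumptions that implementations vary on: sums and means are taken over the items both users rated, and the function (a hypothetical name) returns 0 when fewer than two items overlap or when either user's ratings have zero variance.

```python
import math


def pearson(row_a, row_b):
    """Pearson correlation over the items both users rated (None = unrated)."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0  # assumption: too little overlap to estimate a correlation
    mean_a = sum(a for a, _ in pairs) / len(pairs)
    mean_b = sum(b for _, b in pairs) / len(pairs)
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: constant ratings carry no correlation signal
    return covariance / math.sqrt(var_a * var_b)


# Hypothetical rating rows for two users over four items.
print(pearson([1, 2, None, 3], [2, 4, 5, 6]))
```

Subtracting each user's mean is what distinguishes this from the cosine measure: a user who rates everything high and one who rates everything low can still correlate strongly if their relative preferences match.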
Types of collaborative recommender systems
There are two primary types of collaborative filtering systems: memory-based and model-based.
Memory-based
Memory-based recommender systems, or neighbor-based systems, are extensions of k-nearest neighbors classifiers because they attempt to predict a target user's behavior toward a given item based on similar users or a set of similar items. Memory-based systems can be divided into two sub-types:
- User-based filtering recommends items to a target user based on the preferences of similarly behaving users. The recommendation algorithm compares a target user's past behavior to that of other users. Specifically, the system assigns each user a weight representing their perceived similarity with the target user; the most similar users are the target user's neighbors. It then selects the n users with the highest weights and computes a prediction of the target user's behavior (e.g. movie rating, purchase, dislikes, etc.) from a weighted average of the selected neighbors' behavior. The system then recommends items to the target user based on this prediction. The principle is that, if the target user behaved similarly to this group in the past, they will behave similarly with unseen items. User-based similarity functions are computed between rows in the user-item matrix.6
- Item-based filtering recommends new items to a target user based on that user's behavior toward similar items. Note, however, that in comparing items, the collaborative system does not compare item features (as in content-based filtering) but instead how users interact with those items. For instance, in a movie recommendation system, the algorithm may identify similar movies based on correlations between all user ratings for each movie (correcting for each user's average rating). The system will then recommend a new movie to a target user based on correlated ratings. That is, if the target user rated movies a and b highly but has not seen movie c, and other users who rated the former two highly also rated movie c highly, the system will recommend movie c to the target user. In this way, item-based filtering calculates item similarity through user behavior. Item-based similarity functions are computed between columns in the user-item matrix.7
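The user-based procedure described above (weight every other user, keep the highest-weighted neighbors, take a weighted average) might be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: similarity is cosine over co-rated items, only neighbors with a positive weight are kept, and the function names and example matrix are invented.

```python
import math


def cosine_on_co_rated(row_u, row_v):
    """Cosine similarity restricted to items both users rated."""
    pairs = [(a, b) for a, b in zip(row_u, row_v) if a is not None and b is not None]
    if not pairs:
        return 0.0
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_user_based(matrix, target, item, n_neighbors=2):
    """Predict the target user's rating for item from the most similar users."""
    candidates = []
    for other, row in enumerate(matrix):
        if other == target or row[item] is None:
            continue  # skip the target and users who have not rated the item
        weight = cosine_on_co_rated(matrix[target], row)
        if weight > 0:
            candidates.append((weight, row[item]))
    # Keep the n_neighbors users with the highest similarity weights.
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:n_neighbors]
    total_weight = sum(weight for weight, _ in neighbors)
    if total_weight == 0:
        return None  # no usable neighbor rated this item
    return sum(weight * rating for weight, rating in neighbors) / total_weight


# Hypothetical matrix: rows are users, columns are items, None is unrated.
R = [
    [5, 4, None, 1],
    [5, 5, 4, 1],
    [1, 2, 2, 5],
    [4, 4, 5, None],
]
print(predict_user_based(R, target=0, item=2))
```

Running the same similarity function over the columns of the matrix instead of its rows would give a starting point for the item-based variant.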
Model-based
At times, literature describes memory-based methods as instance-based learning methods. This points to how user-based and item-based filtering make predictions specific to a given instance of user-item interaction, such as a target user's rating for an unseen movie.
By contrast, model-based methods create a predictive machine learning model of the data. The model uses present values in the user-item matrix as the training dataset and produces predictions for missing values with the resultant model. Model-based methods thus use data science techniques and machine learning algorithms such as decision trees, Bayes classifiers, and neural networks to recommend items to users.8
Matrix factorization is a widely discussed collaborative filtering method often classified as a type of latent factor model. As a latent factor model, matrix factorization assumes user-user or item-item similarity can be determined through a select number of features. For instance, a user's book rating may be predicted using only book genre and user age or gender. This lower-dimensional representation thereby aims to explain, for example, book ratings by characterizing items and users according to a few select features pulled from user feedback data.9 Because it reduces the features of a given vector space, matrix factorization also serves as a dimensionality reduction method.10
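One common way to fit such a latent factor model is stochastic gradient descent on the observed ratings only. The sketch below assumes that approach; the function names, hyperparameters (`n_factors`, `lr`, `reg`, `epochs`), and example matrix are illustrative choices, not values taken from the cited sources.

```python
import math
import random


def factorize(matrix, n_factors=2, epochs=500, lr=0.01, reg=0.02, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ matrix[u][i]."""
    rng = random.Random(seed)
    n_users, n_items = len(matrix), len(matrix[0])
    P = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    # Train only on observed entries; missing ones (None) are what we predict later.
    observed = [(u, i, r) for u, row in enumerate(matrix)
                for i, r in enumerate(row) if r is not None]
    for _ in range(epochs):
        rng.shuffle(observed)
        for u, i, r in observed:
            error = r - predict(P, Q, u, i)
            for f in range(n_factors):
                p_old = P[u][f]
                # Gradient step with L2 regularization on both factor vectors.
                P[u][f] += lr * (error * Q[i][f] - reg * p_old)
                Q[i][f] += lr * (error * p_old - reg * Q[i][f])
    return P, Q


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def rmse(matrix, P, Q):
    """Root mean squared error over the observed entries."""
    errors = [(r - predict(P, Q, u, i)) ** 2 for u, row in enumerate(matrix)
              for i, r in enumerate(row) if r is not None]
    return math.sqrt(sum(errors) / len(errors))


R = [
    [5, 4, None, 1],
    [5, 5, 4, 1],
    [1, 2, 2, 5],
    [4, 4, 5, None],
]
P, Q = factorize(R)
print("training RMSE:", rmse(R, P, Q))
print("predicted rating for user 0, item 2:", predict(P, Q, 0, 2))
```

Each user and item is reduced to `n_factors` numbers, which is the dimensionality reduction the paragraph above refers to.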
Advantages and disadvantages of collaborative filtering
Advantages
Compared to content-based systems, collaborative filtering is more effective at providing users with novel recommendations. Collaborative-based methods draw recommendations from a pool of users who share interests with one target user. For instance, if a user group liked the same set of items as the target user, but also liked an additional item unknown to the target user because it shares no features with the previous set of items, a collaborative filtering system recommends this novel item to the user. Collaborative filtering can recommend items that a target user may not have considered but that nevertheless appeal to their user type.11
Disadvantages
The cold start problem is perhaps the most widely cited disadvantage of collaborative filtering systems. It occurs when a new user (or even a new item) enters the system. That user's lack of item-interaction history prevents the system from being able to evaluate the new user's similarity or association with existing users. By contrast, content-based systems are more adept at handling new items, although they also struggle with recommendations for new users.12
Data sparsity is another chief problem that can plague collaborative recommendation systems. As mentioned, recommender systems typically lack data on user preferences for most items in the system. This means that most of the system's feature space is empty, a condition called data sparsity. As data sparsity increases, vector points become so dissimilar that predictive models become less effective at identifying explanatory patterns.13 This is a primary reason why matrix factorization, along with related latent factor methods such as singular value decomposition, is popular in collaborative filtering, as it alleviates data sparsity by reducing features. Other methods for resolving this issue may also involve users themselves assessing and providing information on their own interests, which the system can then use to filter recommendations.
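As a small illustration, sparsity is often reported as the fraction of user-item pairs with no recorded value. The helper below is a hypothetical sketch that assumes missing entries are stored as `None`.

```python
def sparsity(matrix):
    """Fraction of entries in a user-item matrix that are missing (None)."""
    total = sum(len(row) for row in matrix)
    if total == 0:
        raise ValueError("matrix has no entries")
    missing = sum(1 for row in matrix for value in row if value is None)
    return missing / total


# Hypothetical 2 x 2 matrix with a single known rating.
print(sparsity([[5, None], [None, None]]))
```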
Recent research
While past studies have approached recommendation as a prediction or classification problem, a substantive body of recent research argues that it is better understood as a sequential decision-making problem. In this paradigm, reinforcement learning might be more suitable for addressing recommendation. This approach argues that recommendation updates in real time according to user-item interaction; as the user skips, clicks, rates, or purchases suggested items, the model develops an optimal policy from this feedback to recommend new items.14 Recent studies propose a wide variety of reinforcement learning applications to address mutable, long-term user interests, which pose challenges for both content-based and collaborative filtering.15
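To give a flavor of learning from feedback, here is a deliberately simplified sketch: an epsilon-greedy bandit rather than the full sequential (MDP-style) formulations studied in the cited work. The class name, the reward encoding (1.0 for a click, 0.0 for a skip), and the item names are assumptions made for illustration.

```python
import random


class EpsilonGreedyRecommender:
    """Recommend the item with the best average reward, exploring occasionally."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {item: 0 for item in self.items}
        self.values = {item: 0.0 for item in self.items}

    def recommend(self):
        # With probability epsilon, explore a random item; otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(self.items, key=lambda item: self.values[item])

    def update(self, item, reward):
        # Incremental running mean of the rewards observed for this item.
        self.counts[item] += 1
        self.values[item] += (reward - self.values[item]) / self.counts[item]


recommender = EpsilonGreedyRecommender(["book_a", "book_b"], epsilon=0.1)
recommender.update("book_a", 1.0)  # the user clicked book_a
recommender.update("book_b", 0.0)  # the user skipped book_b
print(recommender.recommend())
```

Unlike this sketch, the approaches surveyed in the literature also model how a user's state changes across a session, which is what lets them target long-term interests.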
Footnotes
1 "Collaborative Filtering," Encyclopedia of Machine Learning and Data Mining, Springer, 2017. Mohamed Sarwat and Mohamed Mokbel, "Collaborative Filtering," Encyclopedia of Database Systems, Springer, 2018.
2 Prem Melville and Vikas Sindhwani, "Recommender Systems," Encyclopedia of Machine Learning and Data Mining, Springer, 2017.
3 Yue Shi, Martha Larson, and Alan Hanjalic, "Collaborative Filtering beyond the User-Item Matrix: A Survey of the State of the Art and Future Challenges," ACM Computing Surveys, Vol. 47, No. 1, 2014, https://dl.acm.org/doi/10.1145/2556270. Kim Falk, Practical Recommender Systems, Manning Publications, 2019.
4 Elsa Negre, Information and Recommender Systems, Vol. 4, Wiley-ISTE, 2015. Sachi Nandan Mohanty, Jyotir Moy Chatterjee, Sarika Jain, Ahmed A. Elngar, and Priya Gupta, Recommender System with Machine Learning and Artificial Intelligence, Wiley-Scrivener, 2020.
5 Kim Falk, Practical Recommender Systems, Manning Publications, 2019. J. Ben Schafer, Dan Frankowski, Jon Herlocker, and Shilad Sen, "Collaborative Filtering Recommender Systems," The Adaptive Web: Methods and Strategies of Web Personalization, Springer, 2007.
6 Charu Aggarwal, Recommender Systems: The Textbook, Springer, 2016. Prem Melville and Vikas Sindhwani, "Recommender Systems," Encyclopedia of Machine Learning and Data Mining, Springer, 2017.
7 Charu Aggarwal, Recommender Systems: The Textbook, Springer, 2016. Kim Falk, Practical Recommender Systems, Manning Publications, 2019.
8 Charu Aggarwal, Recommender Systems: The Textbook, Springer, 2016.
9 Prem Melville and Vikas Sindhwani, "Recommender Systems," Encyclopedia of Machine Learning and Data Mining, Springer, 2017. Yehuda Koren, Steffen Rendle, and Robert Bell, "Advances in Collaborative Filtering," Recommender Systems Handbook, 3rd edition, Springer, 2022.
10 Charu Aggarwal, Recommender Systems: The Textbook, Springer, 2016.
11 Sachi Nandan Mohanty, Jyotir Moy Chatterjee, Sarika Jain, Ahmed A. Elngar, and Priya Gupta, Recommender System with Machine Learning and Artificial Intelligence, Wiley-Scrivener, 2020. Charu Aggarwal, Recommender Systems: The Textbook, Springer, 2016.
12 Charu Aggarwal, Recommender Systems: The Textbook, Springer, 2016. Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
13 Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016.
14 Guy Shani, David Heckerman, and Ronen I. Brafman, "An MDP-Based Recommender System," Journal of Machine Learning Research, Vol. 6, No. 43, 2005, pp. 1265-1295, https://www.jmlr.org/papers/v6/shani05a.html. Yuanguo Lin, Yong Liu, Fan Lin, Lixin Zou, Pengcheng Wu, Wenhua Zeng, Huanhuan Chen, and Chunyan Miao, "A Survey on Reinforcement Learning for Recommender Systems," IEEE Transactions on Neural Networks and Learning Systems, 2023, https://ieeexplore.ieee.org/abstract/document/10144689. M. Mehdi Afsar, Trafford Crump, and Behrouz Far, "Reinforcement Learning based Recommender Systems: A Survey," ACM Computing Surveys, Vol. 55, No. 7, 2023, https://dl.acm.org/doi/abs/10.1145/3543846.
15 Xinshi Chen, Shuang Li, Hui Li, Shaohua Jiang, Yuan Qi, and Le Song, "Generative Adversarial User Model for Reinforcement Learning Based Recommendation System," Proceedings of the 36th International Conference on Machine Learning, PMLR, No. 97, 2019, pp. 1052-1061, http://proceedings.mlr.press/v97/chen19f.html. Liwei Huang, Mingsheng Fu, Fan Li, Hong Qu, Yangjun Liu, and Wenyu Chen, "A deep reinforcement learning based long-term recommender system," Knowledge-Based Systems, Vol. 213, 2021, https://www.sciencedirect.com/science/article/abs/pii/S0950705120308352.