Accuracy of models using LensKit

This notebook shows how to evaluate the recommendation accuracy of models using LensKit.

Setup

[1]:
from lenskit.datasets import MovieLens
from lenskit import batch, topn, util
from lenskit import crossfold as xf
from lenskit.algorithms import Recommender, als, item_knn, basic
import lenskit.metrics.predict as pm
import pandas as pd

Load data

[3]:
mlens = MovieLens('data/ml-latest-small')
ratings = mlens.ratings
ratings.head()
[3]:
   user  item  rating   timestamp
0     1    31     2.5  1260759144
1     1  1029     3.0  1260759179
2     1  1061     3.0  1260759182
3     1  1129     2.0  1260759185
4     1  1172     4.0  1260759205
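
The MovieLens helper also exposes the movie metadata, which is useful for mapping item ids back to titles (assuming the movies property of the LensKit version used here):

movies = mlens.movies
movies.head()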

Define algorithms

[4]:
biasedmf = als.BiasedMF(50)       # biased matrix factorization with 50 latent features
bias = basic.Bias()               # baseline user/item bias model
itemitem = item_knn.ItemItem(20)  # item-item k-NN with 20 neighbors
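
The constructor arguments are the main hyperparameters: 50 latent features for the ALS model and 20 neighbors for the item-item k-NN. Other options can be passed as keyword arguments; the sketch below is for illustration only (keyword names as in LensKit 0.x, check your version's documentation):

# illustration only, not run in this notebook (keyword names assume LensKit 0.x)
als.BiasedMF(50, iterations=20, reg=0.1)   # ALS iteration count and regularization strength
basic.Bias(damping=5)                      # damp user/item offsets toward the global mean
item_knn.ItemItem(20, min_nbrs=2)          # require at least 2 neighbors to score an item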

Evaluate recommendations

[5]:
def create_recs(name, algo, train, test):
    # clone the algorithm so each fold trains a fresh model
    fittable = util.clone(algo)
    # wrap in a Recommender so rating predictors can produce top-N lists
    fittable = Recommender.adapt(fittable)
    fittable.fit(train)
    users = test.user.unique()
    # generate 100 recommendations for each test user
    recs = batch.recommend(fittable, users, 100)
    # tag the results with the algorithm name for later analysis
    recs['Algorithm'] = name
    return recs

We loop over the cross-validation folds, generating recommendations with each of the defined algorithms.

[6]:
all_recs = []
test_data = []
for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], 5, xf.SampleFrac(0.2)):
    test_data.append(test)
    all_recs.append(create_recs('ItemItem', itemitem, train, test))
    all_recs.append(create_recs('BiasedMF', biasedmf, train, test))
    all_recs.append(create_recs('Bias', bias, train, test))
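
Here partition_users splits the users into 5 disjoint sets, and SampleFrac(0.2) holds out 20% of each test user's ratings as test data. A quick sketch (for illustration; the fold variables are arbitrary names) to inspect the fold sizes:

for i, (train, test) in enumerate(xf.partition_users(ratings[['user', 'item', 'rating']],
                                                     5, xf.SampleFrac(0.2))):
    print(f'fold {i}: {len(train)} train rows, {len(test)} test rows, {test.user.nunique()} test users')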

We combine the recommendations into a single data frame.

[7]:
all_recs = pd.concat(all_recs, ignore_index=True)
all_recs.head()
[7]:
     item     score  user  rank Algorithm
0    3171  5.366279     9     1  ItemItem
1  104283  5.279667     9     2  ItemItem
2   27803  5.105468     9     3  ItemItem
3    4338  5.037831     9     4  ItemItem
4   86000  4.991602     9     5  ItemItem

We also concatenate the test data.

[8]:
test_data = pd.concat(test_data, ignore_index=True)

Let’s analyse the recommendation lists. RecListAnalysis matches each recommendation list against the test data and computes a metric per list; here we use nDCG.

[9]:
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg)
results = rla.compute(all_recs, test_data)
results.head()
[9]:
                nrecs  ndcg
Algorithm user
Bias      1     100.0   0.0
          2     100.0   0.0
          3     100.0   0.0
          4     100.0   0.0
          5     100.0   0.0
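
nDCG is only one view of top-N quality. Other metrics from lenskit.topn can be added to the same analysis; a sketch for illustration (not run here), measuring precision and recall over the same 100-item lists:

# illustration only: precision and recall alongside nDCG
rla_multi = topn.RecListAnalysis()
rla_multi.add_metric(topn.ndcg)
rla_multi.add_metric(topn.precision)
rla_multi.add_metric(topn.recall)
rla_multi.compute(all_recs, test_data).groupby('Algorithm').mean()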

Let’s look at the mean nDCG for each algorithm.

[10]:
results.groupby('Algorithm').ndcg.mean()
[10]:
Algorithm
Bias        0.000309
BiasedMF    0.069957
ItemItem    0.005367
Name: ndcg, dtype: float64
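
A mean alone hides how much scores vary across users; plain pandas can report the standard error alongside the mean:

results.groupby('Algorithm').ndcg.agg(['mean', 'sem'])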
[11]:
results.groupby('Algorithm').ndcg.mean().plot.bar()
[11]:
[Bar chart: mean nDCG per algorithm]

Evaluate prediction accuracy

[12]:
def evaluate_predictions(name, algo, train, test):
    # clone so each fold gets a freshly trained model
    algo_cloned = util.clone(algo)
    algo_cloned.fit(train)
    # predict ratings for the held-out (user, item) pairs
    return test.assign(preds=algo_cloned.predict(test), algo=name)
[13]:
# note: each call below draws its own random partition, so the three
# algorithms are evaluated on different (but identically sized) splits
preds_bias = pd.concat(evaluate_predictions('bias', bias, train, test)
                       for (train, test) in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)))
preds_biasedmf = pd.concat(evaluate_predictions('biasedmf', biasedmf, train, test)
                           for (train, test) in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)))
preds_itemitem = pd.concat(evaluate_predictions('itemitem', itemitem, train, test)
                           for (train, test) in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)))

Bias

[14]:
print(f'MAE: {pm.mae(preds_bias.preds, preds_bias.rating)}')
print(f'RMSE: {pm.rmse(preds_bias.preds, preds_bias.rating)}')
MAE: 0.6950106667260073
RMSE: 0.9066546007561017
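
These aggregate errors can mask per-user differences; a plain-pandas sketch (the user_rmse name is just for illustration) of the per-user RMSE distribution:

user_rmse = preds_bias.groupby('user').apply(lambda df: pm.rmse(df.preds, df.rating))
user_rmse.describe()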

BiasedMF

[15]:
print(f'MAE: {pm.mae(preds_biasedmf.preds, preds_biasedmf.rating)}')
print(f'RMSE: {pm.rmse(preds_biasedmf.preds, preds_biasedmf.rating)}')
MAE: 0.6818618886318303
RMSE: 0.8911595961607526

ItemItem

[16]:
print(f'MAE: {pm.mae(preds_itemitem.preds, preds_itemitem.rating)}')
print(f'RMSE: {pm.rmse(preds_itemitem.preds, preds_itemitem.rating)}')
MAE: 0.6640965754633255
RMSE: 0.8730680515165724
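
To compare the three models side by side, the prediction frames can be combined and summarised with plain pandas (the all_preds name is just for illustration):

all_preds = pd.concat([preds_bias, preds_biasedmf, preds_itemitem], ignore_index=True)
all_preds.groupby('algo').apply(lambda df: pd.Series({
    'MAE': pm.mae(df.preds, df.rating),
    'RMSE': pm.rmse(df.preds, df.rating),
}))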