Accuracy of models using LensKit¶
This notebook shows how to evaluate the recommendation accuracy of models using LensKit.
Setup¶
[1]:
from lenskit.datasets import MovieLens
from lenskit import batch, topn, util
from lenskit import crossfold as xf
from lenskit.algorithms import Recommender, als, item_knn, basic
import lenskit.metrics.predict as pm
import pandas as pd
Load data¶
[3]:
mlens = MovieLens('data/ml-latest-small')
ratings = mlens.ratings
ratings.head()
[3]:
| | user | item | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |
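The ratings are a plain pandas DataFrame, so we can quickly check the size of the dataset (an illustrative cell, not part of the original run):
[ ]:
print(f'{len(ratings)} ratings from {ratings.user.nunique()} users on {ratings.item.nunique()} items')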
Define algorithms¶
[4]:
biasedmf = als.BiasedMF(50)
bias = basic.Bias()
itemitem = item_knn.ItemItem(20)
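The positional arguments are the main hyperparameters: 50 latent features for the ALS model and a maximum of 20 neighbors for the item-item model. For readability, the equivalent keyword form looks like this (a sketch; the keyword names are assumed from the LensKit 0.x API):
[ ]:
# equivalent keyword form (keyword names assumed from the LensKit 0.x API)
biasedmf = als.BiasedMF(features=50)    # matrix factorization with 50 latent features
itemitem = item_knn.ItemItem(nnbrs=20)  # item-item k-NN using up to 20 neighbors
bias = basic.Bias()                     # user/item bias baseline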
Evaluate recommendations¶
[5]:
def create_recs(name, algo, train, test):
    # clone so each fold trains a fresh copy of the algorithm
    fittable = util.clone(algo)
    # ensure the algorithm can produce top-N lists (wraps plain predictors)
    fittable = Recommender.adapt(fittable)
    fittable.fit(train)
    users = test.user.unique()
    # now we run the recommender, producing 100-item lists for the test users
    recs = batch.recommend(fittable, users, 100)
    # add the algorithm name for analyzability
    recs['Algorithm'] = name
    return recs
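Before running the full loop, we can sanity-check the helper on a single train/test split (an illustrative cell; partition_users yields train/test pairs):
[ ]:
# illustrative: try the helper on the first of five partitions
train, test = next(xf.partition_users(ratings[['user', 'item', 'rating']], 5, xf.SampleFrac(0.2)))
create_recs('ItemItem', itemitem, train, test).head()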
We loop over five cross-validation partitions, holding out 20% of each user's ratings as test data, and generate recommendations with each algorithm.
[6]:
all_recs = []
test_data = []
for train, test in xf.partition_users(ratings[['user', 'item', 'rating']], 5, xf.SampleFrac(0.2)):
    test_data.append(test)
    all_recs.append(create_recs('ItemItem', itemitem, train, test))
    all_recs.append(create_recs('BiasedMF', biasedmf, train, test))
    all_recs.append(create_recs('Bias', bias, train, test))
We combine the recommendations into a single data frame:
[7]:
all_recs = pd.concat(all_recs, ignore_index=True)
all_recs.head()
[7]:
| | item | score | user | rank | Algorithm |
|---|---|---|---|---|---|
| 0 | 3171 | 5.366279 | 9 | 1 | ItemItem |
| 1 | 104283 | 5.279667 | 9 | 2 | ItemItem |
| 2 | 27803 | 5.105468 | 9 | 3 | ItemItem |
| 3 | 4338 | 5.037831 | 9 | 4 | ItemItem |
| 4 | 86000 | 4.991602 | 9 | 5 | ItemItem |
We also concatenate the test data:
[8]:
test_data = pd.concat(test_data, ignore_index=True)
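A quick sanity check (illustrative, not part of the original notebook): each algorithm should have produced a recommendation list for every test user.
[ ]:
# illustrative: users covered by each algorithm vs. users in the test data
print(all_recs.groupby('Algorithm').user.nunique())
print(test_data.user.nunique())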
Let’s analyse the recommendation lists.
[9]:
rla = topn.RecListAnalysis()
rla.add_metric(topn.ndcg)
results = rla.compute(all_recs, test_data)
results.head()
[9]:
| Algorithm | user | nrecs | ndcg |
|---|---|---|---|
| Bias | 1 | 100.0 | 0.0 |
| | 2 | 100.0 | 0.0 |
| | 3 | 100.0 | 0.0 |
| | 4 | 100.0 | 0.0 |
| | 5 | 100.0 | 0.0 |
Let’s see the mean nDCG for each algorithm:
[10]:
results.groupby('Algorithm').ndcg.mean()
[10]:
Algorithm
Bias 0.000309
BiasedMF 0.069957
ItemItem 0.005367
Name: ndcg, dtype: float64
[11]:
results.groupby('Algorithm').ndcg.mean().plot.bar()
[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x12335b110>

Evaluate prediction accuracy¶
[12]:
def evaluate_predictions(name, algo, train, test):
    # clone and fit a fresh copy of the algorithm on the training data
    algo_cloned = util.clone(algo)
    algo_cloned.fit(train)
    # predict ratings for the held-out (user, item) pairs
    return test.assign(preds=algo_cloned.predict(test), algo=name)
[13]:
# note: each call below draws its own fresh 5-partition split of the data
preds_bias = pd.concat(evaluate_predictions('bias', bias, train, test)
                       for (train, test) in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)))
preds_biasedmf = pd.concat(evaluate_predictions('biasedmf', biasedmf, train, test)
                           for (train, test) in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)))
preds_itemitem = pd.concat(evaluate_predictions('itemitem', itemitem, train, test)
                           for (train, test) in xf.partition_users(ratings, 5, xf.SampleFrac(0.2)))
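Aggregate errors can hide per-user variation; a sketch computing the distribution of per-user RMSE for the bias model with plain pandas (not part of the original notebook):
[ ]:
# illustrative: distribution of per-user RMSE for the bias model
user_rmse = preds_bias.groupby('user').apply(lambda df: pm.rmse(df.preds, df.rating))
user_rmse.describe()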
Bias¶
[14]:
print(f'MAE: {pm.mae(preds_bias.preds, preds_bias.rating)}')
print(f'RMSE: {pm.rmse(preds_bias.preds, preds_bias.rating)}')
MAE: 0.6950106667260073
RMSE: 0.9066546007561017
BiasedMF¶
[15]:
print(f'MAE: {pm.mae(preds_biasedmf.preds, preds_biasedmf.rating)}')
print(f'RMSE: {pm.rmse(preds_biasedmf.preds, preds_biasedmf.rating)}')
MAE: 0.6818618886318303
RMSE: 0.8911595961607526
ItemItem¶
[16]:
print(f'MAE: {pm.mae(preds_itemitem.preds, preds_itemitem.rating)}')
print(f'RMSE: {pm.rmse(preds_itemitem.preds, preds_itemitem.rating)}')
MAE: 0.6640965754633255
RMSE: 0.8730680515165724
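For a side-by-side comparison, we can concatenate the three prediction frames and compute both metrics per algorithm (an illustrative cell using plain pandas):
[ ]:
# illustrative: MAE and RMSE for all three algorithms in one table
all_preds = pd.concat([preds_bias, preds_biasedmf, preds_itemitem], ignore_index=True)
all_preds.groupby('algo').apply(lambda df: pd.Series({'MAE': pm.mae(df.preds, df.rating),
                                                      'RMSE': pm.rmse(df.preds, df.rating)}))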