Given a set of user's rating on movies, how to recommend movies to a new user ?
Ideas from the book: Collective Intelligence chapter one
Given a set of ratings per user
[ person1 => {movieA => rating-1A, movieB => rating-1B},
person2 => {movieX => rating-2X, movieY => rating-2Y, movieA => rating-2A}
...
]
Determine similarity
person1 and person2 are similar if they have rate the same movies with similar ratings
person_distance =
square_root of sum of
-- for each movie_name in person1.ratings.keys
---- if (person2.ratings.keys contains movie_name)
------ square(person1.ratings[movie_name] - person1.ratings[movie_name])
Person similarity =
0 if no common movies in corresponding ratings
1 / (1 + person_distance) otherwise
How to find similar persons to personK ?
Calculate every other person's similarity to personK, and sorted by similarity.
How about movie similarity ?
Invert the set of rankings to ...
[ movieA => {person1 => rating-1A, person2 => rating-2A},
movieB => {person1 => rating-1B}
movieX => {person2 => rating-2X}
movieY => {person2 => rating-2Y}
...
]
movie_distance =
square_root of sum of
-- for each person_name in movieX.ratings.keys
---- if (movieX.ratings.keys contains person_name)
------ square(movieX.ratings[person_name] - movieY.ratings[person_name])
Movie similarity =
0 if no common persons in corresponding ratings
1 / (1 + movie_distance) otherwise
How to find similar movies to movieX ?
Calculate every other movie's similarity to movieX, and sorted by similarity.
Making recommendations
Lets say there is a new personK provide his ratings. How do we recommend movies that may interests him ?
User-based filtering
For each person in persons
-- similarity = person_similarity(personK, person)
-- For each movie_name in person.ratings.keys
---- weighted_ratings[movie_name] += (similarity * person.ratings[movie_name])
---- sum_similarity[movie_name] += similarity
For each movie_name in weighted_ratings.keys
-- weighted_ratings[movie_name] /= sum_similarity[movie_name]
return weighted_ratings.sort_by(:rating)
Item-based filtering
Pre-calculate the movie similarity
For each movie in movies
-- neighbors[movie] = rank_similarity(movie, no_of_close_neighbors)
{
movie1 => {movieA => similarity1A, movieB => similarity1B}
movie2 => {movieC => similarity2C, movieD => similarity2D}
}
At run time ...
personK => {movieX => rating-kX, movieY => rating-kY}
For each movie in personK.ratings.keys
-- for each close_movie in neighbors[movie]
---- weight_ratings[close_movie] += neighbors[movie][close_movie] * personK.ratings[movie]
---- similarity_sum[close_movie] += neighbors[movie][close_movie]
For each target_movie in weight_ratings.keys
-- weight_ratings[target_movie] /= similarity_sum[target_movie]
return weight_ratings.sort_by(:rating)
1 comment:
It's an old post but I read it with pleasure.
a little bugfix:
------ square(person2.ratings[movie_name] - person1.ratings[movie_name])
Post a Comment