Sunday, December 9, 2007

Collaborative Filtering

Given a set of user's rating on movies, how to recommend movies to a new user ?

Ideas from the book: Collective Intelligence chapter one

Given a set of ratings per user
[ person1 => {movieA => rating-1A, movieB => rating-1B},
person2 => {movieX => rating-2X, movieY => rating-2Y, movieA => rating-2A}
...
]

Determine similarity

person1 and person2 are similar if they have rate the same movies with similar ratings

person_distance =
square_root of sum of
-- for each movie_name in person1.ratings.keys
---- if (person2.ratings.keys contains movie_name)
------ square(person1.ratings[movie_name] - person1.ratings[movie_name])


Person similarity =
0 if no common movies in corresponding ratings
1 / (1 + person_distance) otherwise

How to find similar persons to personK ?
Calculate every other person's similarity to personK, and sorted by similarity.

How about movie similarity ?

Invert the set of rankings to ...
[ movieA => {person1 => rating-1A, person2 => rating-2A},
movieB => {person1 => rating-1B}
movieX => {person2 => rating-2X}
movieY => {person2 => rating-2Y}
...
]

movie_distance =
square_root of sum of
-- for each person_name in movieX.ratings.keys
---- if (movieX.ratings.keys contains person_name)
------ square(movieX.ratings[person_name] - movieY.ratings[person_name])


Movie similarity =
0 if no common persons in corresponding ratings
1 / (1 + movie_distance) otherwise

How to find similar movies to movieX ?
Calculate every other movie's similarity to movieX, and sorted by similarity.

Making recommendations

Lets say there is a new personK provide his ratings. How do we recommend movies that may interests him ?

User-based filtering

For each person in persons
-- similarity = person_similarity(personK, person)
-- For each movie_name in person.ratings.keys
---- weighted_ratings[movie_name] += (similarity * person.ratings[movie_name])
---- sum_similarity[movie_name] += similarity

For each movie_name in weighted_ratings.keys
-- weighted_ratings[movie_name] /= sum_similarity[movie_name]

return weighted_ratings.sort_by(:rating)


Item-based filtering

Pre-calculate the movie similarity

For each movie in movies
-- neighbors[movie] = rank_similarity(movie, no_of_close_neighbors)

{
movie1 => {movieA => similarity1A, movieB => similarity1B}
movie2 => {movieC => similarity2C, movieD => similarity2D}
}

At run time ...
personK => {movieX => rating-kX, movieY => rating-kY}

For each movie in personK.ratings.keys
-- for each close_movie in neighbors[movie]
---- weight_ratings[close_movie] += neighbors[movie][close_movie] * personK.ratings[movie]
---- similarity_sum[close_movie] += neighbors[movie][close_movie]

For each target_movie in weight_ratings.keys
-- weight_ratings[target_movie] /= similarity_sum[target_movie]

return weight_ratings.sort_by(:rating)

1 comment:

Pavel said...

It's an old post but I read it with pleasure.

a little bugfix:
------ square(person2.ratings[movie_name] - person1.ratings[movie_name])