Given a set of users' ratings of movies, how do we recommend movies to a new user?

Ideas from the book: Collective Intelligence chapter one

Given a set of ratings per user

[ person1 => {movieA => rating-1A, movieB => rating-1B},

person2 => {movieX => rating-2X, movieY => rating-2Y, movieA => rating-2A}

...

]

Determine similarity

person1 and person2 are similar if they have rated the same movies with similar ratings.

person_distance =

square_root of sum of

-- for each movie_name in person1.ratings.keys

---- if (person2.ratings.keys contains movie_name)

------ square(person1.ratings[movie_name] - person2.ratings[movie_name])

Person similarity =

0 if no common movies in corresponding ratings

1 / (1 + person_distance) otherwise
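The distance and similarity definitions above can be sketched in Python as follows. This is a minimal illustration, not the book's code: the function name `similarity`, the nested-dict layout, and the sample ratings are assumptions made up for this example.

```python
from math import sqrt

def similarity(ratings, a, b):
    """Return 1 / (1 + Euclidean distance) over the items both a and b rated."""
    common = [item for item in ratings[a] if item in ratings[b]]
    if not common:
        return 0.0  # nothing in common, so no evidence of similarity
    distance = sqrt(sum((ratings[a][item] - ratings[b][item]) ** 2
                        for item in common))
    return 1.0 / (1.0 + distance)

# Hypothetical sample data, invented for illustration only.
ratings = {
    "person1": {"movieA": 4.0, "movieB": 3.0},
    "person2": {"movieX": 5.0, "movieY": 2.0, "movieA": 1.0},
    "person3": {"movieX": 4.0},
}

print(similarity(ratings, "person1", "person2"))  # 0.25 (distance is |4 - 1| = 3)
print(similarity(ratings, "person1", "person3"))  # 0.0 (no movies in common)
```

Identical ratings on every shared movie give a distance of 0 and therefore the maximum similarity of 1.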

How to find similar persons to personK ?

Calculate every other person's similarity to personK, then sort by similarity in descending order.
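A small sketch of that ranking step, assuming the same nested-dict layout. The helper name `top_matches`, its parameter `n`, and the sample data are hypothetical.

```python
from math import sqrt

def similarity(ratings, a, b):
    """1 / (1 + Euclidean distance) over shared items, or 0 if none are shared."""
    common = [item for item in ratings[a] if item in ratings[b]]
    if not common:
        return 0.0
    distance = sqrt(sum((ratings[a][i] - ratings[b][i]) ** 2 for i in common))
    return 1.0 / (1.0 + distance)

def top_matches(ratings, target, n=3):
    """Score everyone except the target, most similar first, and keep n."""
    scored = [(other, similarity(ratings, target, other))
              for other in ratings if other != target]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical sample data, invented for illustration only.
ratings = {
    "personK": {"movieA": 4.0, "movieB": 2.0},
    "person1": {"movieA": 4.0, "movieB": 2.0, "movieX": 5.0},
    "person2": {"movieA": 1.0, "movieX": 3.0},
    "person3": {"movieY": 2.0},
}

print(top_matches(ratings, "personK"))
# [('person1', 1.0), ('person2', 0.25), ('person3', 0.0)]
```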

How about movie similarity ?

Invert the set of ratings to ...

[ movieA => {person1 => rating-1A, person2 => rating-2A},

movieB => {person1 => rating-1B}

movieX => {person2 => rating-2X}

movieY => {person2 => rating-2Y}

...

]
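The inversion itself is a short loop. A minimal sketch, assuming the ratings are held as a dict of dicts; the function name `invert` and the sample values are made up for illustration.

```python
def invert(ratings):
    """Turn {person: {movie: rating}} into {movie: {person: rating}}."""
    inverted = {}
    for person, movies in ratings.items():
        for movie, rating in movies.items():
            inverted.setdefault(movie, {})[person] = rating
    return inverted

# Hypothetical sample data, invented for illustration only.
ratings = {
    "person1": {"movieA": 4.0, "movieB": 3.0},
    "person2": {"movieX": 5.0, "movieY": 2.0, "movieA": 1.0},
}

by_movie = invert(ratings)
print(by_movie["movieA"])  # {'person1': 4.0, 'person2': 1.0}
print(by_movie["movieB"])  # {'person1': 3.0}
```

Inverting twice returns the original structure, so the same similarity code can serve both persons and movies.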

movie_distance =

square_root of sum of

-- for each person_name in movieX.ratings.keys

---- if (movieY.ratings.keys contains person_name)

------ square(movieX.ratings[person_name] - movieY.ratings[person_name])

Movie similarity =

0 if no common persons in corresponding ratings

1 / (1 + movie_distance) otherwise

How to find similar movies to movieX ?

Calculate every other movie's similarity to movieX, then sort by similarity in descending order.
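Because the movie distance has the same shape as the person distance, one similarity function can be reused on the inverted data. The sketch below assumes that approach; `similar_movies`, `invert`, `similarity`, and the sample ratings are illustrative names and values, not taken from the book.

```python
from math import sqrt

def invert(ratings):
    """Turn {person: {movie: rating}} into {movie: {person: rating}}."""
    inverted = {}
    for person, movies in ratings.items():
        for movie, rating in movies.items():
            inverted.setdefault(movie, {})[person] = rating
    return inverted

def similarity(ratings, a, b):
    """1 / (1 + Euclidean distance) over shared keys, or 0 if none are shared."""
    common = [k for k in ratings[a] if k in ratings[b]]
    if not common:
        return 0.0
    distance = sqrt(sum((ratings[a][k] - ratings[b][k]) ** 2 for k in common))
    return 1.0 / (1.0 + distance)

def similar_movies(ratings, movie, n=3):
    """Rank the other movies by how similarly the same persons rated them."""
    by_movie = invert(ratings)
    scored = [(other, similarity(by_movie, movie, other))
              for other in by_movie if other != movie]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical sample data, invented for illustration only.
ratings = {
    "person1": {"movieA": 4.0, "movieX": 4.0},
    "person2": {"movieA": 1.0, "movieX": 2.0, "movieY": 5.0},
}

print(similar_movies(ratings, "movieX"))
# [('movieA', 0.5), ('movieY', 0.25)]
```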

Making recommendations

Let's say a new personK provides his ratings. How do we recommend movies that may interest him?

User-based filtering

For each person in persons

-- similarity = person_similarity(personK, person)

-- For each movie_name in person.ratings.keys

---- weighted_ratings[movie_name] += (similarity * person.ratings[movie_name])

---- sum_similarity[movie_name] += similarity

For each movie_name in weighted_ratings.keys

-- weighted_ratings[movie_name] /= sum_similarity[movie_name]

return weighted_ratings.sort_by(:rating)
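The user-based loop might look like this in Python. It is a sketch under a few assumptions that the pseudocode does not state: personK is stored in the same dict as everyone else and is skipped, persons with zero similarity are ignored to avoid dividing by zero, and movies personK has already rated are left out of the result. The name `recommend_user_based` and the sample data are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def similarity(ratings, a, b):
    """1 / (1 + Euclidean distance) over shared movies, or 0 if none are shared."""
    common = [m for m in ratings[a] if m in ratings[b]]
    if not common:
        return 0.0
    distance = sqrt(sum((ratings[a][m] - ratings[b][m]) ** 2 for m in common))
    return 1.0 / (1.0 + distance)

def recommend_user_based(ratings, target):
    """Predict ratings as a similarity-weighted average over other persons."""
    weighted = defaultdict(float)
    sim_sum = defaultdict(float)
    for person, movies in ratings.items():
        if person == target:
            continue  # assumption: do not compare the target with itself
        sim = similarity(ratings, target, person)
        if sim <= 0:
            continue  # assumption: ignore persons with nothing in common
        for movie, rating in movies.items():
            if movie in ratings[target]:
                continue  # assumption: only recommend unseen movies
            weighted[movie] += sim * rating
            sim_sum[movie] += sim
    predicted = {m: weighted[m] / sim_sum[m] for m in weighted}
    return sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical sample data, invented for illustration only.
ratings = {
    "personK": {"movieA": 4.0, "movieB": 2.0},
    "person1": {"movieA": 4.0, "movieB": 2.0, "movieX": 5.0, "movieY": 1.0},
    "person2": {"movieA": 1.0, "movieX": 3.0},
    "person3": {"movieY": 2.0},
}

for movie, score in recommend_user_based(ratings, "personK"):
    print(movie, round(score, 2))
```

Here person1 (similarity 1.0) pulls the prediction for movieX toward 5 much more strongly than person2 (similarity 0.25) pulls it toward 3.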

Item-based filtering

Pre-calculate the movie similarity

For each movie in movies

-- neighbors[movie] = rank_similarity(movie, no_of_close_neighbors)

{

movie1 => {movieA => similarity1A, movieB => similarity1B}

movie2 => {movieC => similarity2C, movieD => similarity2D}

}

At run time ...

personK => {movieX => rating-kX, movieY => rating-kY}

For each movie in personK.ratings.keys

-- for each close_movie in neighbors[movie]

---- weight_ratings[close_movie] += neighbors[movie][close_movie] * personK.ratings[movie]

---- similarity_sum[close_movie] += neighbors[movie][close_movie]

For each target_movie in weight_ratings.keys

-- weight_ratings[target_movie] /= similarity_sum[target_movie]

return weight_ratings.sort_by(:rating)
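Both phases of item-based filtering can be sketched as below. The names `build_neighbors` and `recommend_item_based` are hypothetical stand-ins (the first plays the role of `rank_similarity` applied to every movie). As further assumptions, movies personK has already rated and neighbors with zero similarity are skipped.

```python
from collections import defaultdict
from math import sqrt

def invert(ratings):
    """Turn {person: {movie: rating}} into {movie: {person: rating}}."""
    inverted = {}
    for person, movies in ratings.items():
        for movie, rating in movies.items():
            inverted.setdefault(movie, {})[person] = rating
    return inverted

def similarity(ratings, a, b):
    """1 / (1 + Euclidean distance) over shared keys, or 0 if none are shared."""
    common = [k for k in ratings[a] if k in ratings[b]]
    if not common:
        return 0.0
    distance = sqrt(sum((ratings[a][k] - ratings[b][k]) ** 2 for k in common))
    return 1.0 / (1.0 + distance)

def build_neighbors(ratings, n=2):
    """Offline step: for each movie, keep its n most similar movies."""
    by_movie = invert(ratings)
    neighbors = {}
    for movie in by_movie:
        scored = [(other, similarity(by_movie, movie, other))
                  for other in by_movie if other != movie]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        neighbors[movie] = dict(scored[:n])
    return neighbors

def recommend_item_based(neighbors, user_ratings):
    """Run-time step: weight the user's own ratings by movie similarity."""
    weighted = defaultdict(float)
    sim_sum = defaultdict(float)
    for movie, rating in user_ratings.items():
        for close_movie, sim in neighbors.get(movie, {}).items():
            if close_movie in user_ratings or sim <= 0:
                continue  # assumption: skip seen movies and zero similarity
            weighted[close_movie] += sim * rating
            sim_sum[close_movie] += sim
    predicted = {m: weighted[m] / sim_sum[m] for m in weighted}
    return sorted(predicted.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical sample data, invented for illustration only.
ratings = {
    "person1": {"movieA": 4.0, "movieX": 4.0},
    "person2": {"movieA": 1.0, "movieX": 2.0, "movieY": 5.0},
}
neighbors = build_neighbors(ratings)
person_k = {"movieA": 5.0, "movieX": 1.0}
print(recommend_item_based(neighbors, person_k))
```

The neighbor table depends only on the existing ratings, so it can be rebuilt occasionally, while each recommendation request only touches the movies the new person has rated.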