The web today revolves around content recommendation, from major platforms like Amazon, where goods are recommended to you, to social media applications like Facebook, where friends are recommended to you. Recommendation systems have become the new normal for the web, and it is now rare to find a major website without some form of recommendation.
If you do not have previous knowledge of how recommender systems work, you might want to check out this Recommender Systems Course on Google.
For this article I will be building a Movie Recommendation App. The app takes an input (the name of a movie a user likes) and recommends movies that are related to it. The working logic here is that if you like a movie, then you should also like movies related to it.
You can view the resulting application here. Now let's get to work.
View the source code here.
The dataset used in building the recommender algorithm is the TMDB 5000 Movie Dataset available on Kaggle. The dataset contains 4803 entries. Let's go through the dataset very briefly so that we can focus on building the machine learning model. We load the two CSV files into the df1 and df2 data frames. Instead of handling both data frames, we merge them so that we only have to work with a single data frame.

Thankfully, the dataset does not have a large number of empty values. Let's handle them one by one. Here is an overview of all the columns. Looking at the id column, which is unique for each movie, we do not need it because it will not contribute to the recommendations. The tagline column should also be eliminated because most of the movies have an overview, so the tagline would mostly repeat similar context. Dropping these 2 columns results in a data frame with 21 attributes.

Several columns hold a string that contains embedded dictionaries (a stringified list of dictionaries). We can use literal_eval from the ast module to parse these strings and get the embedded data. So we use literal_eval for the cast, keywords, crew, and genres attributes. With these attributes parsed, we can extract important features such as the director's name, a very important factor for our recommender system. For the cast, keywords, and genres attributes, we return the top 3 names in each category as a list.

Now we can create a single column that combines these 4 attributes (director, cast, keywords, and genres), which are very dominant factors for our recommender system. Let's call this column “soup” (because it's like a soup/combination of 4 attributes).
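The parsing and "soup" steps described above can be sketched on a single row. This is a minimal illustration, not the article's actual preprocessing code: the row contents and the helper names `get_director`, `top_names`, `clean`, and `make_soup` are hypothetical, and the lowercase-and-strip-spaces step in `clean` is an assumption (it keeps multi-word names together as one token) that the article does not describe.

```python
from ast import literal_eval

# Hypothetical raw row: the list-valued columns arrive as strings.
raw_row = {
    "title": "Example Movie",
    "cast": '[{"name": "Actor A"}, {"name": "Actor B"}, {"name": "Actor C"}, {"name": "Actor D"}]',
    "crew": '[{"job": "Producer", "name": "Person P"}, {"job": "Director", "name": "Person D"}]',
    "keywords": '[{"name": "space travel"}, {"name": "future"}]',
    "genres": '[{"name": "Action"}, {"name": "Science Fiction"}]',
}


def get_director(crew):
    """Return the name of the first crew member whose job is Director."""
    for member in crew:
        if member.get("job") == "Director":
            return member["name"]
    return ""


def top_names(items, limit=3):
    """Return the names of the first `limit` entries."""
    return [item["name"] for item in items[:limit]]


def clean(text):
    """Assumed normalization: lowercase and remove spaces."""
    return text.lower().replace(" ", "")


def make_soup(row):
    """Parse the stringified columns and join the 4 attributes into one string."""
    parsed = {col: literal_eval(row[col]) for col in ("cast", "crew", "keywords", "genres")}
    tokens = (
        top_names(parsed["keywords"])
        + top_names(parsed["cast"])
        + [get_director(parsed["crew"])]
        + top_names(parsed["genres"])
    )
    return " ".join(clean(token) for token in tokens if token)


soup = make_soup(raw_row)
print(soup)
# spacetravel future actora actorb actorc persond action sciencefiction
```

Only the first three cast members make it into the soup, so "Actor D" is dropped. With a data frame, the same function could be applied row by row to build the soup column.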
To build our model, we first create a count matrix with the help of a count vectorizer. We create the count vectorizer with English stop words, then fit and transform it over the soup column we created in the previous section. Scikit-learn provides a handy function called cosine_similarity. Cosine similarity is simply a metric that measures how similar documents are, irrespective of their size. After building the cosine similarity matrix for our dataset, we can sort the results to find the top 10 most similar movies. We return the movie titles and release dates to the user.
    import difflib

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Load the prepared dataset, which already contains the "soup" column
    df2 = pd.read_csv('./model/tmdb.csv')

    # Build the count matrix and the pairwise cosine similarity matrix once
    count = CountVectorizer(stop_words='english')
    count_matrix = count.fit_transform(df2['soup'])
    cosine_sim2 = cosine_similarity(count_matrix, count_matrix)

    # Map each movie title to its row index
    df2 = df2.reset_index()
    indices = pd.Series(df2.index, index=df2['title'])
    all_titles = df2['title'].tolist()


    def get_recommendations(title):
        # Reuse the precomputed matrix instead of recomputing it on every call
        idx = indices[title]
        sim_scores = list(enumerate(cosine_sim2[idx]))
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
        # Skip the first entry (the movie itself) and keep the next 10
        sim_scores = sim_scores[1:11]
        movie_indices = [i[0] for i in sim_scores]
        return_df = pd.DataFrame(columns=['Title', 'Year'])
        return_df['Title'] = df2['title'].iloc[movie_indices]
        return_df['Year'] = df2['release_date'].iloc[movie_indices]
        return return_df
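To see why cosine similarity works "irrespective of size", it helps to compute it by hand on a few toy soups. The sketch below is a from-scratch illustration using only the standard library, not scikit-learn's implementation; the function names `count_vector` and `cosine` and the sample strings are made up for this example. It uses the usual definition: the dot product of two count vectors divided by the product of their lengths.

```python
import math
from collections import Counter


def count_vector(text):
    """Count how often each whitespace-separated token appears."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


short_doc = count_vector("action space hero")
long_doc = count_vector("action space hero " * 3)  # same words, three times longer
other_doc = count_vector("romance paris hero")

same_direction = cosine(short_doc, long_doc)  # approximately 1.0
partial_overlap = cosine(short_doc, other_doc)  # approximately 0.33 (one shared token out of three)
```

The longer document scores as essentially identical to the short one because only the direction of the count vector matters, not its magnitude. The two soups that share a single token out of three score about one third.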
Now that we have our algorithm for the recommender system, we want to create an interface where a user can input a movie and receive recommendations based on it. One of the easiest frameworks to use for this kind of task is Flask. If you have no previous knowledge of how the framework works, you can check out this article I found on Real Python here. After creating our HTML templates, we use the code below in our app.py to simply render our templates.
    import flask

    app = flask.Flask(__name__, template_folder='templates')


    # Set up the main route
    @app.route('/', methods=['GET', 'POST'])
    def main():
        if flask.request.method == 'GET':
            return flask.render_template('index.html')
Now that we have our index.html rendered, let's hope that the user enters a movie name. After entering it, the user clicks the submit button and the form is submitted. We now have the movie name that the user submitted in the form. Let's store this name in the m_name variable in Python. We accept the form submission using the POST method.
        # Continues inside main(): handle the submitted form
        if flask.request.method == 'POST':
            m_name = flask.request.form['movie_name']
            m_name = m_name.title()
            # Still inside the POST branch of main()
            if m_name not in all_titles:
                return flask.render_template('negative.html', name=m_name)
            else:
                result_final = get_recommendations(m_name)
                names = result_final['Title'].tolist()
                dates = result_final['Year'].tolist()
                return flask.render_template('positive.html', movie_names=names, movie_date=dates, search_name=m_name)
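The exact-match check on the title is strict: a small typo sends the user to the negative page, and str.title() can itself produce a mismatch (for example, "schindler's list".title() gives "Schindler'S List"). One possible refinement, which is not shown in the article, is to suggest close matches with difflib from the standard library. The sketch below is an assumption about how that could look; the function name `suggest_titles`, the sample titles, and the chosen cutoff are hypothetical.

```python
import difflib

# Hypothetical stand-in for the all_titles list built from the dataset
sample_titles = ["Avatar", "The Dark Knight", "Interstellar"]


def suggest_titles(name, titles, limit=3, cutoff=0.6):
    """Return up to `limit` titles that closely resemble `name`, best match first."""
    return difflib.get_close_matches(name, titles, n=limit, cutoff=cutoff)


suggestions = suggest_titles("Interstelar", sample_titles)
print(suggestions)
# ['Interstellar']
```

The suggestions could then be passed to the negative template so that the user can pick the intended movie instead of retyping it.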
With this, we have a functional recommendation engine where you can input a movie and get recommendations based on the movies available in our database.
You can view the resulting application here.
View the source code here.