Chapter 8 Answers

What problem does collaborative filtering solve?

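It predicts which items a user is likely to be interested in, based on the past behavior (ratings, views, purchases) of that user and of other users, without needing metadata about the items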
How does it solve it?

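Learn a vector of latent factors (an embedding) for each user and each item, and predict how much a user will like an item from the dot product of the two vectors. The factors are learned with gradient descent so that the predictions match the observed ratings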
Why might a collaborative filtering predictive model fail to be a very useful recommendation system?

If a positive feedback loop reinforces the tastes of a small group of very active users. For example, anime watchers watch and rate a lot of anime, so the recommender may become biased towards anime for everyone
What does a crosstab representation of collaborative filtering data look like?

Users run down the rows (y axis) and items run across the columns (x axis). Each cell holds the dependent variable, the rating that user gave that item, and is empty where the user has not rated the item
Write the code to create a crosstab representation of the MovieLens data (you might need to do some web searching!)

*** Not done
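
One possible starting point for this exercise, sketched with pandas. The tiny `ratings` frame is a made-up stand-in for the MovieLens ratings table (the column names `user`, `title`, and `rating` are assumptions), so swap in the real data once it is loaded.

```python
import pandas as pd

# Hypothetical toy data in the same long format as MovieLens:
# one row per (user, movie, rating) observation.
ratings = pd.DataFrame({
    "user":   [1, 1, 2, 2, 3],
    "title":  ["Alien", "Brazil", "Alien", "Casablanca", "Brazil"],
    "rating": [5, 3, 4, 5, 2],
})

# Rows are users, columns are movies, cells are ratings.
# Unrated (user, movie) pairs come out as NaN.
crosstab = pd.crosstab(ratings["user"], ratings["title"],
                       values=ratings["rating"], aggfunc="mean")
print(crosstab)
```

With the full dataset the table is very wide and mostly empty, so it is usually worth restricting it to the most-rated movies and most active users before printing.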
What is a latent factor? Why is it “latent”?

Latent factors are variables that are not directly observed but are inferred from other, observed variables (here, the ratings). "Latent" means hidden or concealed: nobody tells the model what the factors are, it learns them
What is a dot product? Calculate a dot product manually using pure python with lists.

The dot product multiplies two vectors element-wise and then sums all the products
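
The manual calculation the question asks for can be sketched in pure Python like this. The two example vectors are arbitrary.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# Multiply matching elements, then add up the products:
# 1*4 + 2*5 + 3*6 = 4 + 10 + 18 = 32
dot = sum(x * y for x, y in zip(a, b))
print(dot)  # 32
```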
What does pandas.DataFrame.merge do?

Joins two DataFrames into one by matching rows on one or more shared key columns (or on the index), like a SQL join
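
A minimal sketch of a merge, using two made-up frames that share a `movie` column (the names and values are illustrative, not the real MovieLens files):

```python
import pandas as pd

# Hypothetical ratings table keyed by a movie id
ratings = pd.DataFrame({"movie": [1, 2, 1], "rating": [5, 3, 4]})
# Hypothetical lookup table mapping movie id to title
movies = pd.DataFrame({"movie": [1, 2], "title": ["Alien", "Brazil"]})

# With no `on=` argument, merge joins on the columns the two frames share
# (here "movie") and performs an inner join by default.
merged = ratings.merge(movies)
print(merged)
```

Each rating row picks up the title of its movie, which is how the ratings and titles tables get combined in the chapter.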
What is an embedding matrix?

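An embedding matrix is the matrix that gets indexed into (or multiplied by a one-hot vector) to look up an embedding. It has one row per user or item and one column per latent factor, and its values are learned during training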
What is the relationship between an embedding and a matrix of one-hot encoded vectors?

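Multiplying a one-hot encoded vector by the embedding matrix selects a single row of that matrix, the embedding for that user or item. An embedding layer gets the same result by indexing into the matrix directly, so it behaves like the matrix product without ever building the one-hot vectors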
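
The equivalence between the one-hot product and a plain lookup can be checked with a few lines of pure Python. The matrix values and the chosen index are arbitrary.

```python
# 3 users, 2 latent factors each (arbitrary values)
embedding_matrix = [[0.1, 0.2],
                    [0.3, 0.4],
                    [0.5, 0.6]]
user_idx = 1

# One-hot vector for user 1: [0, 1, 0]
one_hot = [1 if i == user_idx else 0 for i in range(len(embedding_matrix))]

# Vector-matrix product: one_hot @ embedding_matrix
n_rows, n_cols = len(embedding_matrix), len(embedding_matrix[0])
via_matmul = [sum(one_hot[i] * embedding_matrix[i][j] for i in range(n_rows))
              for j in range(n_cols)]

# Direct lookup, which is what an embedding layer effectively does
via_lookup = embedding_matrix[user_idx]

print(via_matmul, via_lookup)
```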
Why do we need Embedding if we could use one-hot encoded vectors for the same thing?

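Embeddings are far more efficient in both memory and computation, especially when cardinality is high: the one-hot vectors are almost entirely zeros, so storing them and multiplying by them is wasted work compared with a direct index lookup. Embeddings also turn categories into continuous vectors that can be learned
What does an embedding contain before we start training (assuming we’re not using a pretrained model)?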

Randomly initialized numbers
Create a class (without peeking, if possible!) and use it.

*** Not done
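
A minimal sketch of what this exercise is after: a class with an `__init__` that stores state and one method that uses it. The class name, attribute, and strings are illustrative only.

```python
class Example:
    def __init__(self, a):
        # __init__ runs when the object is created; store the argument on self
        self.a = a

    def say(self, x):
        # Methods receive the instance as `self`, so they can read stored state
        return f"Hello {self.a}, {x}."

ex = Example("Sylvain")
greeting = ex.say("nice to meet you")
print(greeting)
```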
What does x[:,0] return?

Every row of column 0 (first column)
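
A quick check with a small PyTorch tensor (the values are arbitrary). In the DotProduct model this is how the user indices are pulled out of a batch whose columns are (user, movie).

```python
import torch

x = torch.tensor([[10, 100],
                  [11, 101],
                  [12, 102]])

first_col = x[:, 0]   # all rows, column 0
second_col = x[:, 1]  # all rows, column 1
print(first_col, second_col)
```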
Rewrite the DotProduct class (without peeking, if possible!) and train a model with it

*** Not done
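
A possible sketch for this exercise. It makes several assumptions that differ from the chapter: it uses plain PyTorch (`nn.Module`, `nn.Embedding`, a hand-written loop with Adam) rather than a fastai `Learner`, and it trains on random synthetic (user, movie, rating) data rather than MovieLens, so the numbers mean nothing beyond showing that the loss goes down.

```python
import torch
from torch import nn

class DotProduct(nn.Module):
    """Dot-product collaborative filtering model with a bounded output."""
    def __init__(self, n_users, n_movies, n_factors, y_range=(0, 5.5)):
        super().__init__()
        self.user_factors = nn.Embedding(n_users, n_factors)
        self.movie_factors = nn.Embedding(n_movies, n_factors)
        self.y_range = y_range

    def forward(self, x):
        users = self.user_factors(x[:, 0])    # column 0 holds user indices
        movies = self.movie_factors(x[:, 1])  # column 1 holds movie indices
        raw = (users * movies).sum(dim=1)     # dot product per row
        lo, hi = self.y_range
        return torch.sigmoid(raw) * (hi - lo) + lo  # squash into y_range

# Synthetic stand-in for MovieLens (assumption: random ids and ratings)
torch.manual_seed(0)
n_users, n_movies, n_rows = 20, 15, 200
x = torch.stack([torch.randint(0, n_users, (n_rows,)),
                 torch.randint(0, n_movies, (n_rows,))], dim=1)
y = torch.randint(1, 6, (n_rows,)).float()  # ratings 1..5

model = DotProduct(n_users, n_movies, n_factors=5)
opt = torch.optim.Adam(model.parameters(), lr=0.05)
loss_func = nn.MSELoss()

losses = []
for epoch in range(50):
    opt.zero_grad()
    loss = loss_func(model(x), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

With the real data, the same class can be used with the (user, movie) index batches and rating targets built from MovieLens.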
What is a good loss function to use for MovieLens? Why?

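Mean squared error, because the ratings are ordered numeric values (1 to 5). Predicting a rating is therefore a regression problem, and the loss should penalize a prediction by how far it is from the true rating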
What would happen if we used CrossEntropy loss with MovieLens? How would we need to change the model?

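Cross-entropy would treat the problem as classification, with each rating (1 to 5) as a separate class and no notion that a 4 is closer to a 5 than a 1 is. The model would need to output 5 activations, one per possible rating, instead of a single number, and the targets would be class indices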
What is the use of bias in a dot product model?

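The bias captures the fact that some movies are rated higher or lower than others overall, and some users rate higher or lower than others, regardless of how well the latent factors match. Each user and each movie gets one bias value, which is added to the dot product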
What is another name for weight decay?

L2 regularization
Write the equation for weight decay (without peeking!)

total_loss = loss + sum(wd*(w**2))
Write the equation for the gradient of weight decay. Why does it help reduce weights?

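The gradient of the penalty wd*(w**2) with respect to w is 2*wd*w, so:
grad = loss_grad + 2*wd*weight
weight = weight - lr*grad
weight = weight - lr*(loss_grad + 2*wd*weight)
weight = (1 - 2*lr*wd)*weight - lr*loss_grad
Every step multiplies the weight by (1 - 2*lr*wd), which is slightly less than 1, so each weight is pulled towards zero unless the loss gradient pushes it back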
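
A small numeric check of the weight decay update, assuming the penalty is wd*(w**2) so that its gradient is 2*wd*w. The values of `lr`, `wd`, `weight`, and `loss_grad` are arbitrary; `loss_grad` stands for the gradient of the loss before the penalty is added.

```python
lr, wd = 0.1, 0.01
weight, loss_grad = 2.0, 0.5

# Two-step form: add the penalty gradient, then take an SGD step
grad = loss_grad + 2 * wd * weight
stepped = weight - lr * grad

# Closed form: shrink the weight, then apply the plain loss gradient
closed = (1 - 2 * lr * wd) * weight - lr * loss_grad

print(stepped, closed)  # both are approximately 1.946
```

The shrink factor here is 1 - 2*0.1*0.01 = 0.998, so the weight loses 0.2% of its size on every step before the loss gradient is applied.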
Why does reducing weights lead to better generalization?

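Large weights let the model fit sharp, rapidly changing functions that can follow the noise in the training data (overfitting). Keeping the weights small forces a smoother function that is less tied to individual training points and therefore generalizes better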
What does argsort do in PyTorch?

argsort gives you the indices of the sorted values
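
A short illustration with a made-up bias tensor. The values are arbitrary; in the chapter the same idea is used to find the movies with the lowest and highest biases.

```python
import torch

biases = torch.tensor([0.3, -1.2, 0.8, 0.1])

# Indices that would sort the values in ascending order
lowest_first = biases.argsort()
# Same, but from largest to smallest
highest_first = biases.argsort(descending=True)

print(lowest_first.tolist())   # [1, 3, 0, 2]
print(highest_first.tolist())  # [2, 0, 3, 1]
```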
Does sorting the movie biases give the same result as averaging overall movie ratings by movie? Why / why not?

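No. A movie's bias reflects how it is rated after accounting for the latent factors, that is, for the kind of movie it is and the tastes of the people who watched it. A low bias means that even users who normally like that kind of movie tend not to like this one. The average rating just averages over whoever happened to rate the movie, without accounting for who they are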
How do you print the names and details of the layers in a model?

learn.model
What is the “bootstrapping problem” in collaborative filtering?

How to recommend things to new users who have no previous history. You can’t bootstrap them to previous knowledge, because you don’t have any
How could you deal with the bootstrapping problem for new users? For new movies?

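Start new users or movies at the average of the existing embeddings (or at the embedding of a chosen representative user), ask new users a few questions when they sign up, or use metadata about the user or movie to build an initial embedding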
How can feedback loops impact collaborative filtering systems?

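A small number of very active users can end up setting the recommendations for everyone. Those recommendations attract more users of the same kind, which biases the system further in the same direction, and the bias keeps amplifying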
When using a neural network in collaborative filtering, why can we have different number of factors for movie and user?

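Because the neural network does not take a dot product, the two embeddings no longer need to be the same size. The user and movie embeddings are concatenated and fed into the first linear layer, so each can have whatever number of factors suits its complexity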
Why is there a nn.Sequential in the CollabNN model?

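To build the small neural network that replaces the dot product: nn.Sequential chains the linear layers and nonlinearities so the concatenated embeddings pass through them in order
What kind of model should we use if we want to add metadata about users and items, or information such as date and time, to a collaborative filter model?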

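A tabular model (fastai's TabularModel). EmbeddingNN, the neural network model used for collaborative filtering, is itself a subclass of TabularModel with no continuous variables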