22 Apr 2016 | Abhijith Chandraprabhu

Recommender systems play a pivotal role in various business settings like e-commerce sites, social media platforms, and other platforms involving user interaction with other users or products. Recommender systems provide valuable insights to gain actionable intelligence on these users.

Large-scale recommender systems help unravel the latent information in the complex relational data between users and items. However, mapping the user space to the item space to predict an interaction is a challenge. Inferring actionable information from a variety of data sources, collected either implicitly (click patterns, browser history, etc.) or explicitly (ratings of books and movies), is what well-designed recommender systems do consistently well.

Depending on the source of information about the users and the items, there are a variety of techniques to build recommender systems, each with a unique mathematical approach. Linear algebra and matrix factorizations are important to certain types of recommenders where user ratings are available, and in such cases it is ideal to apply methods like `svd`.

In matrix factorization, the users and items are mapped onto a joint latent factor space of reduced dimension f, and the inner product of a user vector with an item vector gives the corresponding interaction. Dimensionality reduction gives a more compact representation of the large training data, obtained here by matrix factorization. We want to quantify the characteristics of the movies in terms of a certain number of aspects (factors), i.e., we are trying to summarize the information in the (independent and unrelated) ratings matrix in a concise and descriptive way.
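As a quick sketch of this idea, the best rank-f approximation of a ratings matrix can be obtained from a truncated SVD (the matrix values below are made up purely for illustration):

```julia
# Truncated-SVD sketch of low-rank approximation (illustrative values).
using LinearAlgebra

R = [5.0 1.0 4.0;
     4.0 2.0 4.0;
     1.0 5.0 2.0;
     2.0 4.0 1.0]          # 4 users × 3 movies
f = 2                      # number of latent factors
F = svd(R)
Rf = F.U[:, 1:f] * Diagonal(F.S[1:f]) * F.Vt[1:f, :]

# By the Eckart-Young theorem, the approximation error equals the
# norm of the discarded singular values.
norm(R - Rf) ≈ norm(F.S[f+1:end])   # true
```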

Example: Let us consider a simple example of how matrix factorization helps in predicting whether a user will like a movie. For the sake of brevity, we have a couple of users, Joe and Jane, and a couple of movies, Titanic and Troll 2. The users and the movies are characterized by a certain number of factors, as shown in the tables below.

Factors/Movies | Titanic | Troll 2 |
---|---|---|
Romance | 4 | 1 |
Comedy | 2 | 4 |
Box Office success | 5 | 2 |
Drama | 3 | 2 |
Horror | 1 | 4 |

Factors/Users | Joe | Jane |
---|---|---|
Romance | 4 | 1 |
Comedy | 3 | 5 |
Box Office success | 4 | 1 |
Drama | 3 | 3 |
Horror | 1 | 5 |

Consider Joe to be characterized by the vector `[4 3 4 3 1]`, which suggests that Joe likes romance and big-hit movies, and not so much horror or comedy. Similarly, Jane likes comedy and horror; she is not very particular about the box office success of a movie, nor is she a big fan of romance movies.

The movie Titanic is a popular romance movie, whereas Troll 2 is an unpopular horror comedy. It is intuitively obvious that Joe will end up liking Titanic and Jane will like Troll 2, based on how the users and movies score on the 5 factors. Using the *cosine similarity* between user and movie vectors, as shown in the table below, confirms this.

Movies/Users | Joe | Jane |
---|---|---|
Titanic | 0.98 | 0.57 |
Troll 2 | 0.74 | 0.97 |
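These similarity scores can be reproduced directly from the factor vectors in the tables above, using only Julia's standard library:

```julia
# Cosine similarity between user and movie factor vectors.
using LinearAlgebra

joe     = [4, 3, 4, 3, 1]
jane    = [1, 5, 1, 3, 5]
titanic = [4, 2, 5, 3, 1]
troll2  = [1, 4, 2, 2, 4]

cossim(a, b) = dot(a, b) / (norm(a) * norm(b))

round(cossim(joe,  titanic), digits=2)   # 0.98
round(cossim(jane, titanic), digits=2)   # 0.57
round(cossim(joe,  troll2),  digits=2)   # 0.74
round(cossim(jane, troll2),  digits=2)   # 0.97
```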

With a large ratings matrix, like the NETFLIX dataset with around 20 thousand movies and 0.5 million users, mapping all the users and the movies by hand in the above way is impossible. This is where matrix factorization helps: it factors the ratings matrix into a user matrix and a movie matrix.

Let $U=[u_i]$ be the user feature matrix, where $u_i \in \mathbb{R}^f$ and $i=1,\ldots,n_u$, and let $M=[m_j]$ be the item or movie feature matrix, where $m_j \in \mathbb{R}^f$ and $j=1,\ldots,n_m$. Here $f$ is the number of factors, i.e., the reduced dimension or the lower rank, which is determined by cross validation. The predictions can be calculated for any user-movie combination, $(i,j)$, as $r_{ij} = \langle u_i, m_j \rangle$.

Here we minimize a loss function of $U$ and $M$ in the iterative process of obtaining these matrices. Let us start by considering the loss due to a single prediction, in terms of squared error: \begin{equation} \mathcal{L}^2(r,u,m)=(r-\langle u,m \rangle)^2. \end{equation}

Generalizing the above equation over the whole dataset, the empirical total loss is: \begin{equation} \mathcal{L}^{emp}(R,U,M)=\frac{1}{n} \sum_{(i,j) \in I}\mathcal{L}^2(r_{ij},u_i,m_j), \end{equation} where $I$ is the index set of the known ratings and $n=|I|$ is the number of known ratings.
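Minimizing this loss alternately in the user factors (with the movie factors fixed) and in the movie factors (with the user factors fixed) reduces each step to a regularized least-squares solve; that is the idea behind ALS. A minimal, dense-matrix sketch of the technique (not the RecSys.jl implementation, and ignoring missing ratings) might look like:

```julia
# Toy alternating least squares on a fully observed ratings matrix.
using LinearAlgebra, Random

function als(R, f; iters=20, λ=0.1)
    nu, nm = size(R)
    rng = MersenneTwister(42)
    U = rand(rng, f, nu)            # user factors, one column per user
    M = rand(rng, f, nm)            # movie factors, one column per movie
    for _ in 1:iters
        for i in 1:nu               # fix M, solve for each user vector
            U[:, i] = (M * M' + λ * I) \ (M * R[i, :])
        end
        for j in 1:nm               # fix U, solve for each movie vector
            M[:, j] = (U * U' + λ * I) \ (U * R[:, j])
        end
    end
    return U, M
end

R = [5.0 1.0; 4.0 2.0; 1.0 5.0]    # 3 users × 2 movies, made-up ratings
U, M = als(R, 2)
norm(R - U' * M)                    # small residual after a few sweeps
```

Each inner solve is a small $f \times f$ linear system, which is why ALS scales well and parallelizes naturally over users and movies.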

RecSys.jl is a package for recommender systems in Julia; it can currently work with explicit ratings data. To prepare the input, create an object of the `ALSWR` type. This takes two parameters: the first is the input file location, and the second, optional, parameter `par` specifies the type of parallelism, i.e., how the data is shared or distributed across the processing units. With `par=ParShemm` the data is kept at one location and shared across the processing units; with `par=ParChunk` the data is distributed across the processing units as chunks. For this report only sequential timings were captured, i.e., with `nprocs=1`.

`rec=ALSWR("/location/to/input/file/File.delim", par=ParShemm)`

The file can be any tabular structured data, delimited by any character, which needs to be specified:

`inp=DlmFile(name::AbstractString; dlm::Char=Base.DataFmt.invalid_dlm(Char), header::Bool=false, quotes::Bool=true)`

The call to create a model is `train(rec, 10, 10)`, where the first 10 is the number of iterations to run and the second 10 is the number of factors.
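Putting the pieces above together, a complete modeling script might look like the following (the file path is a placeholder copied from above, and this assumes the RecSys package is installed):

```julia
# Hypothetical end-to-end script assembled from the calls described above.
using RecSys

rec = ALSWR("/location/to/input/file/File.delim", par=ParShemm)
train(rec, 10, 10)   # 10 iterations, 10 factors
```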

The sequential performance of the ALS algorithm was tested on Apache Spark and Julia. The Scala example code shown in the mentioned link was run with `rank = 10` and `iterations = 10`. The timing of the `ALS.train()` function was recorded in order to analyse only the core computational part. For the same parameters in Julia, the timing of the computationally intensive `train()` function was captured.

The algorithm took around 500 seconds to train on the NETFLIX dataset on a single processor, which is good for data as large as 100 million ratings.

The table below also summarises the single-processor performance on other datasets, Movielens and Last.fm.

Dataset | Size (No. of interactions) | Factorization time (in secs) |
---|---|---|
Movielens | 20 Million | 119 |
Last.fm | 0.5 Billion | 2913 |

The NETFLIX dataset is no longer publicly available, but the Movielens and Last.fm datasets can be downloaded. Please refer to the dataset-specific Julia example scripts in the examples/ directory for details on how to model a recommender system for the respective datasets.

Parallelism is possible in Julia in two main ways: (a) multiprocessing and (b) multithreading. Multithreading support is still under development, but multiprocessing-based parallelism in Julia is mature and is mainly built around `Tasks`, which are concurrent function calls. The implementation details are not covered here; the following graph summarises the performance of the parallel ALS implementations in Julia and Spark.

In the above graph, `Julia Distr` breaks up the problem and uses Julia's distributed computing capabilities, `Julia Shared` uses shared memory through mmap arrays, and `Julia MT` is the multithreaded version of the ALS. While multithreading in Julia is nascent, it already gives parallel speedups. There are several planned improvements to Julia's multithreading that we expect will make the multithreaded ALS faster than the other parallel implementations.
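As a toy illustration of the task-based concurrency mentioned above (assuming a recent Julia with `Threads.@spawn`; this is not the ALS code):

```julia
# Spawn one task per chunk of work and collect the results.
tasks   = [Threads.@spawn sum(abs2, 1:n) for n in 1:4]
results = fetch.(tasks)    # [1, 5, 14, 30]
```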

The experiments were conducted by invoking Spark with the flags `--master local[N]`, with N being the number of threads, on a 30-core Intel Xeon machine with 132 GB memory and 2 hyperthreads per core.

Credits to Tanmay KM for contributing towards the parallel implementation of the package.

Apart from methods to model the data and check its accuracy, there are also methods to make recommendations for users who have not yet interacted with items, by picking the items the user would most likely interact with. Hence in RecSys.jl we have a fast, scalable and accurate recommender system that can be used end to end. We are currently working on a demo of such a recommender system, with a UI also implemented in Julia.
