I won’t go into the details of the business here since they don’t matter that much for this post.
We started our project with some research and talked to a lot of potential clients to get to know their problems and evaluate whether we were on the right track. We used a method called the problem-centered interview.
One challenge of the system is that we cannot use collaborative filtering: we neither have a large amount of existing user data, nor do we think the domain is well suited for such recommendations. We are not Amazon, so we had to use something else.
We decided to go with a feature-based approach. We generate a vector characterizing the user’s preferences and also generate a vector for each item in our database. Then we compare those vectors and rank all items by their similarity to the user vector. Using this method we can create a list of recommended (and not recommended) items for each user.
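As a minimal sketch of that ranking step — assuming plain numeric feature vectors and cosine similarity (the actual feature extraction and similarity measure in our system may differ):

```typescript
// Hypothetical sketch: rank items by cosine similarity to a user vector.
interface Item {
  id: string;
  features: number[];
}

// Cosine similarity of two equal-length vectors, in [-1, 1].
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank all items by their similarity to the user's preference vector, best first.
function rankItems(userVector: number[], items: Item[]): Item[] {
  return items
    .map(item => ({ item, score: cosineSimilarity(userVector, item.features) }))
    .sort((a, b) => b.score - a.score)
    .map(entry => entry.item);
}
```

The items at the top of the list become the recommendations; the tail doubles as the “not recommended” list.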
The goal of our system is not only to present the best matches but also to provide the user with some serendipitous options. To read more about novel vs. serendipitous recommendations, see Chapter 3.4.2 of this book.
We also wanted to provide the user with bundles of items rather than single items.
The system we constructed works like a pipeline:
We wanted to create a very modular system where each component has a simple-to-use REST API. This has a lot of advantages: each component can easily be replaced, we can work platform-independently, the system scales better, and it is easier to maintain. The idea is to move in the direction of microservices from the start. This might be overkill for the project in its current state, but we wanted to try things out and get some experience - after all it’s a university project 😉.
Besides the existing user-facing web prototype, which is built elsewhere, we created three new components. To be honest, we should already split one of them into two or three parts to really satisfy the microservice definition and the single-responsibility principle.
- Part A: Reads data from the database (a MongoDB) and creates the user and feature vectors. This part also combines multiple items into a set (this should be its own service… 😬).
- Part B: Computes the scores for the items.
- Playground: A user interface to create user profiles to test the recommender.
Via the exposed REST APIs it is now hopefully easy to integrate this system into the user-facing web prototype.
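The actual routes aren’t shown here, but with a hypothetical `/recommendations` endpoint (the name and parameters are invented for illustration), integration from the prototype could be as simple as one HTTP call:

```typescript
// Hypothetical base URL and endpoint - the real API routes may differ.
const API_BASE = "http://localhost:3000";

// Build the request URL for a user's recommendations.
function recommendationsUrl(userId: string, limit: number = 10): string {
  return `${API_BASE}/recommendations?userId=${encodeURIComponent(userId)}&limit=${limit}`;
}

// Usage from the web prototype (fetch is available in modern browsers):
// fetch(recommendationsUrl("user-42"))
//   .then(res => res.json())
//   .then(items => console.log(items));
```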
I never really loved JS: its syntax is very verbose, it has some odd quirks, and you can easily run into bugs that are hard to debug but caused by simple mistakes. It also completely lacks a static type system and a compiler that could guide you and catch a lot of errors. Therefore you need to be very careful not to break things, and a JS code base without a large, good set of unit tests is unmaintainable. Of course you should use TDD for every project nowadays, but for JS and other scripting languages it’s far more important than for compiled languages like Swift or Java.
JS has some advantages, like a short deploy cycle and the fact that it can now run nearly everywhere (browsers, servers, and even client apps). You can also achieve small things with little effort, and there are a lot of good open-source libraries out there.
Back then at ImmobilienScout we used CoffeeScript, which compiles to JS. It adds a lot of syntactic sugar and introduces concepts like classes, which are otherwise a bit strange to create. However, it does not introduce a static type system.
So I started with node.js and CoffeeScript but soon realized that I missed the good, strict type system with generics that I had gotten used to in Swift. So I had a look at TypeScript!
It’s still not as strict as some truly compiled languages, because you can easily opt out of the type system if needed - e.g. when working with a third-party library for which no type definitions exist.
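As an illustration (the library and functions are made up), opting out usually means typing the import as `any` - and writing a minimal declaration later lets you opt back in:

```typescript
// "legacy-geo" is a made-up third-party library without type definitions.
// Typing it as `any` silences the compiler for every call into it:
//   const geo: any = require("legacy-geo");
//   geo.distnace(a, b); // typo still compiles - no safety at all!

// A minimal hand-written interface restores type checking at the boundary:
interface GeoLib {
  distance(a: [number, number], b: [number, number]): number;
}

// Cast once at the boundary (stubbed here with a local implementation);
// the rest of the code base stays fully type checked.
const geo = {
  distance: (a: [number, number], b: [number, number]) =>
    Math.hypot(a[0] - b[0], a[1] - b[1]),
} as GeoLib;
```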
So I started to migrate Part A to TypeScript, and it works really well. In combination with a set of unit tests you can make changes fast and with confidence that you won’t break everything. Refactorings are way easier, and the IDE (WebStorm in my case) can help you a lot.
After creating the REST APIs we saw that we needed an easy way to validate our recommendations, and that we would have to experiment a lot to optimize the feature vectors and the recommendation engine.
Therefore I built a small single-page interface which uses the existing REST APIs. With it you can easily create a user profile and see the generated recommendations.
I used Angular JS 1.x at ImmobilienScout, and it made the life of a frontend developer so much easier. Data binding, dependency injection, and all the other concepts that keep your app nicely structured and easily testable are very good.
Because of that, Angular JS was my first choice when searching for a frontend framework, and since Angular 2 is in the release-candidate phase, it seemed stable enough to give it a try and see what has changed. The changes and improved abstractions are a significant step forward. It’s also nice that it’s built with TypeScript in mind and uses a lot of the new ECMAScript features. It was easy to get into the concepts and to write small components that follow the single-responsibility principle.
There are only two things I have to criticize. The Angular 2 Material Design library is still at a very early stage, and a lot of familiar controls and effects are missing. The other thing that needs improvement is how to get your app from the dev stage to a real production stage. I didn’t put much effort and research into that, since the playground interface is not intended for end users. But from what I saw, it currently takes several nontrivial steps to get an Angular 2 app into production mode.
In the beginning we worked with a simple command-line script setup to run all the parts on our local machines (all Macs). But at some point I didn’t want to fiddle around with Python versions, node installations, and differing local configurations anymore. We also wanted to be able to deploy easily to some cloud service. That’s when I got my hands on Docker.
I once had a little tutorial on Docker at university, but a lot has changed since then! You need some time to get to know all the services and which tool to use for which task, but in the end it makes your work significantly easier.
One rule that I discovered too late, or dismissed as mere overhead, is: one container per service. If you have a node.js app and a database, put them in separate, dedicated containers! Then use Docker Compose to bundle all the services of your system together. This makes configuration and management much easier. You can then deploy your whole system with just one command - which is great for onboarding developers and for trying out any part of the system before knowing exactly how it works.
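A compose file for our setup could look roughly like this - the service names, images, and directory layout are invented for illustration, not our actual configuration:

```yaml
# Hypothetical docker-compose.yml - one container per service.
version: "2"
services:
  mongo:
    image: mongo:3.2       # the database gets its own container
  part-a:
    build: ./part-a        # vector generation service
    links:
      - mongo
  part-b:
    build: ./part-b        # scoring service
    links:
      - part-a
  playground:
    build: ./playground    # Angular 2 test UI
    ports:
      - "8080:80"
```

With something like this in place, `docker-compose up` builds and starts the whole system in one go.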
After I had Docker running locally, I wanted to deploy the whole system to a simple machine in the cloud. No complex scalability or distribution. Surprisingly, that’s not so easy if you start from the marketing pages for Amazon’s Container Service 🙄. It builds upon publishing your Docker images to a repository and requires a lot of configuration. That’s certainly great for a production-scale system at a larger company, but not for just trying things out 😝
Fortunately, Docker has some nice guides on its own homepage for deploying a Docker Compose setup via Docker Machine on AWS and Digital Ocean. I found Digital Ocean to be easier, and I now have an instance of my system (consisting of three Docker images combined via docker-compose) deployed in the cloud on a Digital Ocean instance. 🎉
The system still needs to be integrated into the main frontend prototype - not that easy, since there are no unit tests yet :/ . We also need more data and items to improve the recommender and validate our approach. And there needs to be a systematic test for different user profiles.
Hopefully I have some time to continue the work in my spare time 😀