Recursive Auto-encoders: Example in Theano

Okay, time to get our hands dirty with some code! I’ve written an example in Theano that encodes a stream of one-hot encodings, and this is the example I’ll run through with this post.

As a quick recap of what was covered in the previous post, here’s a diagram:
[Diagram: recursive auto-encoder, from the previous post]
Continue reading

Recursive Auto-encoders: An Introduction

I’ve talked a little bit about recursive auto-encoders a couple of posts ago. In deep learning lingo, an auto-encoder network usually refers to an architecture that takes in an input vector and, through a series of transformations, is trained to reproduce that input at its prediction layer. The reason for doing this is to extract features that describe the input. One might think of it as a form of compression: if the network is asked to reproduce an input after passing it through hidden layers with far fewer neurons than the input layer, then some sort of compression has to happen in order for it to produce a good reconstruction.

[Figure: an 8-3-8 auto-encoder network]

So let’s consider the above network: 8 inputs, 8 outputs, and 3 neurons in the hidden layer. If we feed the network a one-hot encoding of 1 to 8 (setting only the neuron corresponding to the input to 1), and insist that that input be reconstructed at the output layer, guess what happens? Continue reading
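A quick NumPy sketch of why 3 hidden units can be enough for 8 one-hot inputs (this is my illustration of the counting argument, not the network or code from the post): a binary code over 3 units already gives every input a distinct hidden representation.

```python
import numpy as np

# One-hot encodings of 1..8: the rows of an 8x8 identity matrix.
inputs = np.eye(8)

# If the hidden layer learns (roughly) a binary code, 3 units suffice
# to give each of the 8 inputs a distinct representation.
codes = np.array([[(i >> b) & 1 for b in range(3)] for i in range(8)],
                 dtype=float)

# Every input gets a unique 3-dimensional code...
assert len({tuple(c) for c in codes}) == 8
# ...so a decoder can, in principle, map each code back to its one-hot input.
```

In practice the trained hidden units won’t be cleanly binary, but the point stands: 3 real-valued neurons have more than enough capacity to separate 8 inputs.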

“It’s like Hinton diagrams, but for the terminal.”

Which of the two matrix representations below would you rather be looking at?

[Screenshot: a raw numeric matrix vs. a terminal Hinton-diagram rendering]

Hinton diagrams are often used for visualising the learnt weights of neural networks. I’ve often found myself trying to imagine what the weights look like. And fortunately for me today, I remembered this project by GitHub’s Zach Holman.

Turns out, overriding the way NumPy represents numbers wasn’t too hard, so I hacked myself a cool little solution. The code’s here for the time being, until I spin it off into its own little repo.
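The actual code is in the linked repo; as a rough illustration of the idea (not the original implementation), NumPy lets you pass a custom formatter to `set_printoptions`, so each float can be rendered as a “darkness” character instead of a number. The character ramp below is my own choice.

```python
import numpy as np

# Map a value in [0, 1] to a character of increasing visual weight.
chars = " .:-=+*#%@"

def shade(x):
    i = int(min(max(x, 0.0), 1.0) * (len(chars) - 1))
    return chars[i]

# Override how NumPy prints floats: every entry goes through shade().
np.set_printoptions(formatter={'float': shade})

W = np.random.rand(4, 6)
print(W)  # a 4x6 grid of shade characters instead of numbers
```

Because this only changes the printed representation, the array itself is untouched, so it’s safe to leave on while inspecting learnt weights.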


EDIT: The code is now in my theano_toolkit repository. Check it out!

Finding Maximum Dot (or Inner) Product

A problem that often arises in machine learning tasks is trying to find a row in a matrix that gives the highest dot product given a query vector. Some examples of such situations:

  • You’ve performed some kind of matrix factorisation for collaborative filtering in, say, a movie recommendation system, and now, given a new user, you want to pick out a couple of movies that your system predicts they would rate highly.
  • A neural network where the final softmax predictive layer is huge (but you managed to train it, somehow).

In both these cases, the problem boils down to trying to search a collection of vectors to find the one that gives the highest (or the $k$ highest) dot product(s).

A simple way to do this would be to perform a matrix multiplication, and then to find the best scoring vector by scanning through the values. This is effectively performing $N$ dot product computations for a matrix with $N$ rows. Can we do better?
Continue reading
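The baseline described above (one matrix-vector multiply, then a scan for the best scores) can be sketched in NumPy like this; the function and variable names are mine, not from the post.

```python
import numpy as np

def top_k_dot(M, q, k=1):
    """Return the indices of the k rows of M with the largest dot product with q."""
    scores = M @ q                              # all N dot products in one multiply
    idx = np.argpartition(-scores, k - 1)[:k]   # k best rows, in no particular order
    return idx[np.argsort(-scores[idx])]        # sort just those k by score

M = np.random.randn(1000, 16)   # N = 1000 candidate vectors
q = np.random.randn(16)         # the query vector
best = top_k_dot(M, q, k=3)
```

Using `argpartition` keeps the post-multiply scan at O(N) rather than the O(N log N) of a full sort, but the multiply itself is still O(N) dot products — which is exactly the cost the post asks whether we can beat.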

Remembering sequences (poorly) with RNNs

I’ve had a project going recently that aims to train a recurrent network to memorise and repeat sequences of characters. It’s here, and it hasn’t been going really well, but I thought I’d share a little bit of why I wanted to do this and why I thought it might work. Continue reading

March Madness with Theano

I’m not particularly familiar with the NCAA Men’s Division I Basketball Championship, but I’ve seen the March Machine Learning Madness challenge come up for a few years now, and I’ve decided to try my hand at it today.

I also haven’t tried a machine learning task quite like this one. At its simplest (assuming you don’t harvest more data about each team and their players), all you have is a set of game data: who won, who lost, and their respective scores. Intuitively, we should be able to look at tables like these and get a rough sense of who the better teams are. But how do we model it as a machine learning problem? Continue reading

Learning about reinforcement learning, with Tetris

For our final assignment for the NUS Introduction to Artificial Intelligence class (CS3243), we were asked to design a Tetris playing agent. The goal of the assignment was to get students to be familiar with the idea of heuristics and how they work, getting them to manually tune features to get a reasonably intelligent agent. However, the professor included this in the assignment folder, which made me think we had to implement the Least-squares Policy Iteration algorithm for the task.

I’ll probably discuss LSPI in more detail in another post, but for now, here are the useful features we found for anyone trying to do the same thing. Continue reading

Naive Bayes Categorisation (with some help from Elasticsearch)

Back in November, I gave a talk during one of the Friday Hackers and Painters sessions at Plug-in@Block 71, aptly titled “How I do categorisation and some naive bayes sh*t” by Calvin Cheng. I promised I’d write a follow-up blog post with the materials I presented during the talk, so here it is. Continue reading

My Quora Codesprint Submission

(this is x-posted on Quora)

I’ve had some experience in the past with machine learning, but I feel like I still don’t have a proper methodology. I’d like to hear what you guys think about what I’ve done here. Continue reading

Vectorisation and the logsumexp trick

Having played around with and written up some simple machine learning algorithms over the last couple of months, I’ve decided to write about some useful tricks of the trade that I’ve learnt. These are not specific to any algorithm, but are things I’ve found useful regardless of the kind of approach you intend to take. Many machine learning techniques can be performed as a series of matrix operations. As a result, it isn’t too uncommon to see the term “vectorisation” being thrown about in a machine learning paper. This essentially means reframing the problem as a series of matrix operations, which results in some performance gains.

There has been plenty of work done to speed up matrix operations which have been implemented as libraries, and bindings written in different languages to take full advantage of these libraries. Continue reading
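The logsumexp trick from the title can be sketched in a few lines of NumPy (the function below is the standard formulation of the trick, not code taken from the post): computing log(sum(exp(x))) naively overflows for large x, but subtracting the maximum first keeps every exp argument at or below zero.

```python
import numpy as np

def logsumexp(x):
    """Compute log(sum(exp(x))) stably by factoring out the maximum."""
    m = np.max(x)
    # exp(x - m) <= 1 everywhere, so the sum can never overflow;
    # adding m back recovers the true value: log(e^m * sum(e^(x-m))).
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([1000.0, 1000.0])
# Naively, np.exp(1000.0) overflows to inf; the stable version returns
# the exact answer log(2 * e^1000) = 1000 + log(2).
```

This shows up constantly when normalising log-probabilities (e.g. a softmax or an HMM forward pass), which is presumably why it earned a spot in the title.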