Neural Turing Machines – Implementation Hell

I’ve been struggling with the implementation of the NTM for the past week and a half now.

There are various problems I’ve been trying to deal with. The paper is relatively sparse on the details of the architecture, and even briefer when it comes to the training process. Alex Graves has trained RNNs extensively in his work, and it seems to me that some of the tricks used here may be scattered across his previous papers.


Neural Turing Machines – A First Look

Some time last week, a paper from Google DeepMind caught my attention.

The paper is of particular interest to me because I’ve been thinking about how a recurrent neural network could learn to access an external form of memory. The approach taken here is interesting: it balances content-based addressing (seeking by similarity of content) against location-based addressing (shifting from the current focus).
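As a rough sketch of what that balance means (my own reading of the paper, in NumPy; all names are my own): content addressing weights memory rows by cosine similarity to a key vector, sharpened by a key strength, and a location-based shift then rotates that weighting:

```python
import numpy as np

def content_addressing(memory, key, beta):
    """Weight memory rows by cosine similarity to a key, sharpened by beta."""
    # memory: (N, M) matrix of N memory rows; key: (M,) vector; beta: key strength
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    e = np.exp(beta * sim)
    return e / e.sum()          # softmax over memory locations

def shift_weighting(w, shift):
    """Rotate a weighting by circular convolution with a shift distribution."""
    # shift: dict mapping integer offsets to probabilities, e.g. {-1: 0.1, 0: 0.8, 1: 0.1}
    result = np.zeros_like(w)
    for offset, s in shift.items():
        result += s * np.roll(w, offset)
    return result
```

With a large key strength the content weighting concentrates on the best-matching row, and a sharply peaked shift distribution then moves that focus to a neighbouring location.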

My focus this time will be on some of the details needed for implementation. Some of these specifics are glossed over in the paper, so I’ll try to infer whatever I can and, perhaps in the next post, present code (in Theano, what else?).


Connectionist Temporal Classification (CTC) with Theano

This is the first time I’m presenting code I’ve written in an IPython notebook. The style’s different, but I think I’ll permanently switch to this method of presentation for code-intensive posts from now on. A nifty little tool that makes doing this so convenient is ipy2wp, which uses WordPress’ XML-RPC interface to post the HTML directly to the platform.

In any case, I’ve started working with the NUS School of Computing speech recognition group, which has been using deep neural networks to classify audio frames as phonemes. This requires a preprocessing step that aligns the audio frames to phonemes in order to reduce the task to a simple classification problem.

CTC describes a way to compute the probability of a sequence of phonemes for a sequence of audio frames, accounting for all possible alignments. We can then define an objective function to maximise the probability of the phoneme sequence given the audio frame sequence from training data.
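A minimal NumPy sketch of that computation, the forward pass of the CTC algorithm with blanks interleaved between labels, might look like this (function and variable names are my own, and this is an illustration rather than the group's actual code):

```python
import numpy as np

def ctc_prob(probs, labels, blank=0):
    """Probability of a label sequence given per-frame phoneme distributions,
    summed over all CTC alignments (the forward algorithm)."""
    # probs: (T, K) array, one distribution over K symbols per frame
    ext = [blank]
    for l in labels:
        ext += [l, blank]                 # interleave blanks: a,b -> _,a,_,b,_
    S, T = len(ext), len(probs)
    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, ext[0]]        # start in leading blank...
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]    # ...or the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]           # stay in the same state
            if s > 0:
                a += alpha[t - 1, s - 1]  # advance one state
            # skip a blank between two distinct non-blank labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # valid alignments end in the last label or the trailing blank
    return alpha[-1, -1] + (alpha[-1, -2] if S > 1 else 0.0)
```

A real implementation would work in log space to avoid underflow on long sequences, but the recursion is the same.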

Implementing AdaDelta

The end of this post (I don’t know where the article is now; I can’t find it) had a diagram showing the improvements of AdaDelta over standard SGD and AdaGrad, so I decided to look up what AdaDelta actually does. The details are written in the paper, including its “derivation”. It’s basically an improvement over AdaGrad: it uses rolling averages of squared gradients, and also multiplies by the RMS of a rolling average of past changes to the weights.
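A minimal sketch of the update rule in NumPy (names and defaults are my own choices; the paper suggests values around rho = 0.95 and a small epsilon):

```python
import numpy as np

def adadelta_update(grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta step: scale the gradient by the ratio of RMS values of
    accumulated past updates and accumulated past squared gradients."""
    state['g2'] = rho * state['g2'] + (1 - rho) * grad ** 2
    update = -np.sqrt(state['dx2'] + eps) / np.sqrt(state['g2'] + eps) * grad
    state['dx2'] = rho * state['dx2'] + (1 - rho) * update ** 2
    return update
```

Usage would be something like `state = {'g2': np.zeros_like(w), 'dx2': np.zeros_like(w)}` followed by `w += adadelta_update(grad, state)` each step; note there is no global learning rate to tune, which is the method’s main selling point.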

NLP with Neural Networks

I gave a presentation on neural networks at the NUS Web Information Retrieval and NLP Group (WING). The idea was mainly to concretise my understanding of the topic and to share some interesting concepts that have been introduced in neural network research on NLP, while giving me some sorely needed public speaking experience.

Not sure how much of that I achieved, but here are the slides anyway.

Dropout using Theano

A month ago I tried my hand at the Higgs Boson Challenge on Kaggle. I tried a neural network approach that got me pretty far initially, but other techniques seem to have won out.
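For reference, dropout itself is simple to sketch in NumPy (the post's actual implementation is in Theano): units are zeroed with probability p at training time, and activations are scaled by 1 − p at test time, following the original formulation. Names here are my own:

```python
import numpy as np

def dropout(activations, p, rng, train=True):
    """Zero each unit with probability p at train time; scale at test time
    so the expected activation matches the training-time average."""
    if train:
        mask = rng.binomial(1, 1 - p, size=activations.shape)
        return activations * mask
    return activations * (1 - p)
```

(A common modern variant, "inverted" dropout, instead divides by 1 − p at training time so the test-time pass needs no scaling.)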


Recursive Auto-encoders: Momentum

In the previous post, we wrote the code for the RAE using the Theano library, but it wasn’t successful at the simple task of reversing a randomised sequence of the digits 1 to 8. One of the tricks we can use when dealing with time sequence data is a small learning rate, along with momentum. I’ll discuss what momentum is, and show a simple way momentum can be implemented in Theano.
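As a sketch of the idea in plain NumPy (rather than the Theano version the post works through; names and hyperparameters are my own), classical momentum keeps a velocity vector that accumulates a decaying sum of past gradients, damping oscillations while still making progress with a small learning rate:

```python
import numpy as np

def sgd_momentum(w, grad_fn, lr=0.01, mu=0.9, steps=100):
    """Minimise a function with gradient descent plus classical momentum."""
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - lr * grad_fn(w)   # velocity: decaying sum of past gradients
        w = w + v
    return w
```

With mu = 0 this reduces to plain gradient descent; values of mu around 0.9 are a common starting point.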

Recursive Auto-encoders: Example in Theano

Okay, time to get our hands dirty with some code! I’ve written an example in Theano that encodes a stream of one-hot encodings, and this is the example I’ll run through in this post.

As a quick recap of what was covered in the previous post, here’s a diagram:



Recursive Auto-encoders: An Introduction

I’ve talked a little bit about recursive auto-encoders a couple of posts ago. In deep learning lingo, an auto-encoder network usually refers to an architecture that takes in an input vector and, through a series of transformations, is trained to reproduce that input at its prediction layer. The reason for doing this is to extract features that describe the input. One might think of it as a form of compression: if the network is asked to reproduce an input after passing it through hidden layers with far fewer neurons than the input layer, then some sort of compression has to happen in order for it to create a good reconstruction.

[Screenshot: an 8-3-8 auto-encoder network]

So let’s consider the above network: 8 inputs, 8 outputs, and 3 neurons in the hidden layer. If we feed the network a one-hot encoding of 1 to 8 (setting only the neuron corresponding to the input to 1), and insist that that input be reconstructed at the output layer, guess what happens?
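The 8-3-8 setup is small enough to try directly. Here’s a minimal NumPy sketch (not the Theano code from the posts; architecture from the text, but all names, initialisation, and hyperparameters are my own illustrative choices) of a one-hot auto-encoder trained by plain gradient descent on a cross-entropy loss:

```python
import numpy as np

rng = np.random.RandomState(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.eye(8)                      # the 8 one-hot inputs to reconstruct
W1 = rng.randn(8, 3) * 0.5         # encoder: 8 inputs -> 3 hidden units
b1 = np.zeros(3)
W2 = rng.randn(3, 8) * 0.5         # decoder: 3 hidden units -> 8 outputs
b2 = np.zeros(8)

def forward(X):
    H = sigmoid(X @ W1 + b1)       # hidden codes (the "compressed" form)
    Y = sigmoid(H @ W2 + b2)       # reconstruction of the input
    return H, Y

def loss(Y):
    # mean cross-entropy between input and reconstruction
    return -np.mean(X * np.log(Y + 1e-9) + (1 - X) * np.log(1 - Y + 1e-9))

lr = 2.0
losses = []
for epoch in range(3000):
    H, Y = forward(X)
    losses.append(loss(Y))
    dZ2 = (Y - X) / X.size         # grad of cross-entropy w.r.t. output pre-activation
    dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * H * (1 - H)
    dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```

If training succeeds, the 3 hidden units end up encoding something close to a 3-bit binary code for the 8 inputs, which is the punchline the question above is driving at.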

“It’s like Hinton diagrams, but for the terminal.”

Which of the two matrix representations below would you rather be looking at?

[Screenshot: side-by-side comparison of the two matrix representations]

Hinton diagrams are often used for visualising the learnt weights of neural networks. I’ve often found myself trying to imagine what the weights look like, and fortunately for me, today I remembered this project by GitHub’s Zach Holman.

Turns out, overriding the way NumPy represents numbers wasn’t too hard, so I hacked together a cool little solution. The code’s here for the time being, until I spin it off into its own little repo.
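The trick amounts to registering a custom float formatter with NumPy’s printing machinery. A minimal sketch (my own character scale and names, not the actual code from the repo):

```python
import numpy as np

CHARS = ' ·▪▮█'   # crude five-step magnitude scale, lightest to heaviest

def hinton_char(x, max_val=1.0):
    """Map a number to a block character whose visual weight reflects |x|."""
    idx = min(int(abs(x) / max_val * (len(CHARS) - 1) + 0.5), len(CHARS) - 1)
    return CHARS[idx]

# Override how NumPy prints floats, so whole arrays render as a
# terminal "Hinton diagram" instead of columns of digits.
np.set_printoptions(formatter={'float': hinton_char})
```

After this, `print(np.array([[0.9, 0.1], [0.05, 1.0]]))` draws the matrix as a grid of blocks; a fuller version would also distinguish sign, e.g. with colour.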


EDIT: The code is now in my theano_toolkit repository. Check it out!