Neural Turing Machines FAQ

There’s been some interest in the Neural Turing Machines paper, and I’ve been getting questions about my implementation via e-mail and in the comments section of this blog. I plan to make this a post that I’ll regularly come back to and update with answers to these questions as they come up, so do check back!
Continue reading

Learning Gaussian Feature Extractors

While playing around with the MNIST dataset and the example code, I tried to visualise the weights of the connections from the input layer to the hidden layer. These can be thought of as feature extractors of the input.

Continue reading

Neural Turing Machines – Copy Task

After much fiddling around with the instability of the training procedure, I still haven’t found a recipe that would get it to converge consistently.

I did find, though, that training it on shorter sequences first, before letting it see longer ones, avoids the huge gradients that make the parameters explode into NaNs. That is a huge help. It still does not guarantee convergence, though, and I only get a good model at random, like one I trained to copy a sequence of length 10.
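As a rough illustration of the shorter-sequences-first idea, here is a minimal sketch of a length schedule for copy-task data. The function names, the starting length, the stage size, and the bit width are all assumptions made for the example, not values from my training runs.

```python
import random


def max_length_at(step, start_len=2, final_len=10, steps_per_stage=500):
    """Upper bound on sequence length at a training step (hypothetical schedule)."""
    # Allow one extra item every `steps_per_stage` steps, capped at `final_len`.
    return min(final_len, start_len + step // steps_per_stage)


def sample_copy_sequence(step, width=8, rng=random):
    """Draw a random binary sequence whose length respects the schedule."""
    length = rng.randint(1, max_length_at(step))
    return [[rng.randint(0, 1) for _ in range(width)] for _ in range(length)]


# Early steps only produce short sequences; later steps may produce longer ones.
early = sample_copy_sequence(step=0)
late = sample_copy_sequence(step=10000)
```

The model never sees a long sequence until it has spent some time on short ones, which is the part that seemed to keep the gradients manageable.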


Continue reading

Neural Turing Machines – Implementation Hell

I’ve been struggling with the implementation of the NTM for the past week and a half now.

There are various problems that I’ve been trying to deal with. The paper is relatively sparse when it comes to details of the architecture, and even more brief when it comes to the training process. Alex Graves trains RNNs a lot in his work, and it seems to me that some of the tricks he has used here might be scattered across his previous papers.

Continue reading

Neural Turing Machines – A First Look

Some time last week, a paper from Google DeepMind caught my attention.

The paper is of particular interest to me because I’ve been thinking about how a recurrent neural network could learn to have access to an external form of memory. The approach taken here is interesting as it balances seeking a memory location by similarity of content with shifting away from that location by position.
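To make the content-versus-location idea concrete, here is a minimal plain-Python sketch of the two pieces: weights from cosine similarity between a key and each memory row, followed by a circular shift of those weights. The function names and the toy memory are my own assumptions, and the sketch leaves out the paper's interpolation gate and sharpening step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (small constant avoids 0/0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-8)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def content_weights(memory, key, beta):
    """Content addressing: softmax of key strength times cosine similarity."""
    return softmax([beta * cosine(row, key) for row in memory])


def shift_weights(weights, shift_dist):
    """Location addressing: circular convolution with a shift distribution.

    `shift_dist` maps an integer offset to its probability (hypothetical format).
    """
    n = len(weights)
    out = [0.0] * n
    for i in range(n):
        for offset, p in shift_dist.items():
            out[(i + offset) % n] += weights[i] * p
    return out


memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy memory with three rows
w = content_weights(memory, key=[1.0, 0.0], beta=10.0)
shifted = shift_weights(w, {1: 1.0})  # move all attention one slot forward
```

Here the content step focuses on the row most similar to the key, and the shift step then moves that focus to the neighbouring slot.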

My focus this time will be on some of the details needed for implementation. Some of these specifics are glossed over in the paper, and I’ll try to infer whatever I can and, perhaps in the next post, have code (in Theano, what else?) to present.

Continue reading

Connectionist Temporal Classification (CTC) with Theano

This will be the first time I’m trying to present code I’ve written in an IPython notebook. The style’s different, but I think I’ll permanently switch to this method of presentation for code-intensive posts from now on. A nifty little tool that makes doing this so convenient is ipy2wp. It uses WordPress’ XML-RPC interface to post the HTML directly to the platform.

In any case, I’ve started working with the NUS School of Computing speech recognition group, and they’ve been using deep neural networks for classification of audio frames to phonemes. This requires a preprocessing step that aligns the audio frames to phonemes in order to reduce this to a simple classification problem.

CTC describes a way to compute the probability of a sequence of phonemes for a sequence of audio frames, accounting for all possible alignments. We can then define an objective function to maximise the probability of the phoneme sequence given the audio frame sequence from training data.
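As an illustration of "accounting for all possible alignments", here is a minimal plain-Python sketch of the standard CTC forward recursion. It is not the Theano code from the notebook. The function name, the input layout (one list of label probabilities per frame) and the choice of index 0 for the blank symbol are assumptions made for the example, and a practical version would work in log space to avoid underflow.

```python
def ctc_probability(probs, labels, blank=0):
    """Probability of `labels` given per-frame label probabilities `probs`.

    probs[t][k] is the probability of symbol k at frame t (hypothetical layout).
    """
    # Extended sequence: a blank before, between, and after every label.
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # A valid alignment starts on the first blank or on the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid alignment ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


# Two frames, two symbols (blank and one phoneme), uniform probabilities.
uniform = [[0.5, 0.5], [0.5, 0.5]]
p_single = ctc_probability(uniform, [1])
```

With two frames, the alignments (1, 1), (1, blank) and (blank, 1) all collapse to the single label, so `p_single` sums three paths of probability 0.25 each.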
Continue reading

Implementing AdaDelta

The end of this post (I don’t know where the article is now. Can’t find it.) had a diagram showing the improvements of AdaDelta over standard SGD and AdaGrad, so I decided to look up what AdaDelta actually does. The details are written in the paper, including its “derivation”. It’s basically an improvement over AdaGrad, using rolling averages of the squared gradients and also multiplying by the RMS of the rolling average of changes to the weight.
Continue reading
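For reference, here is a minimal plain-Python sketch of a single AdaDelta update for one scalar parameter, following the update rule as I understand it from the paper. The function name, the state layout, and the default values of `rho` and `eps` are assumptions for the example, and this is not the Theano code from the post.

```python
import math


def adadelta_step(x, grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta update for a scalar parameter (illustrative sketch).

    `state` is a pair: (rolling average of squared gradients,
                        rolling average of squared updates).
    """
    avg_sq_grad, avg_sq_delta = state
    # Rolling average of squared gradients.
    avg_sq_grad = rho * avg_sq_grad + (1.0 - rho) * grad * grad
    # Scale the gradient by RMS of past updates over RMS of gradients.
    delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * grad
    # Rolling average of squared updates, used by the next step.
    avg_sq_delta = rho * avg_sq_delta + (1.0 - rho) * delta * delta
    return x + delta, (avg_sq_grad, avg_sq_delta)


# Minimise f(x) = x**2, whose gradient is 2 * x, starting from x = 1.
x, state = 1.0, (0.0, 0.0)
x_after_one, state_after_one = adadelta_step(x, 2.0 * x, state)
for _ in range(20):
    x, state = adadelta_step(x, 2.0 * x, state)
```

Note that there is no learning rate in the update: the step size comes from the ratio of the two rolling RMS values.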

NLP with Neural Networks

Gave a presentation on neural networks at the NUS Web Information retrieval and NLP Group (WING). Idea was mainly to concretise my understanding of the topic and also to share some interesting concepts that have been introduced in neural networks research on NLP, while giving me some sorely needed experience doing public speaking.

Not sure how much of that I achieved, but here are the slides anyway.

Dropout using Theano

A month ago I tried my hand at the Higgs Boson Challenge on Kaggle. I tried a neural network approach that got me pretty far initially, but other techniques seem to have won out.

Continue reading

Recursive Auto-encoders: Momentum

In the previous post, we wrote the code for RAE using the Theano library, but it wasn’t successful in performing the simple task of reversing a randomised sequence of 1 to 8. One of the tricks we can use for dealing with time sequence data is to use a small learning rate, along with momentum. I’ll be discussing what momentum is, and showing a simple way momentum can be implemented in Theano.
Continue reading
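As a quick illustration of the idea, here is a minimal plain-Python sketch of classical momentum on a single scalar parameter. It is not the Theano version from the post, and the function name and the default learning rate and momentum values are assumptions chosen for the example.

```python
def momentum_step(param, grad, velocity, learning_rate=0.01, momentum=0.9):
    """One gradient descent step with classical momentum (illustrative sketch)."""
    # The velocity keeps a decaying memory of previous updates.
    velocity = momentum * velocity - learning_rate * grad
    return param + velocity, velocity


# Minimise f(x) = x**2, whose gradient is 2 * x, starting from x = 1.
x, v = 1.0, 0.0
trajectory = []
for _ in range(2):
    x, v = momentum_step(x, 2.0 * x, v)
    trajectory.append(x)
```

The second step is larger than the first even though the gradient has shrunk slightly, because the velocity carries over part of the previous update.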