Below you will find pages that utilize the taxonomy term “Recurrent Neural Networks”
Deciding <u>W</u>hen <u>T</u>o <u>F</u>eedforward (or WTF gates)
Another paper of mine, titled “Towards Implicit Complexity Control using Variable-Depth DNNs for ASR Systems”, was accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016 in Shanghai, which took place not too long ago.
The idea behind this one comes from the intuition that in a classification task, some instances should be simpler to classify than others. Similarly, deciding when to stop in an RNN setting is an important problem. Take the bAbI tasks, for example, and go one step further by assuming the number of logical steps needed to arrive at the answer is not provided: you then need to know when the network is ‘ready’ to give an answer.
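The paper’s exact gating mechanism isn’t described in this excerpt, so the following is only a hypothetical sketch of the ‘answer early on easy instances’ intuition: each layer gets a small scalar gate that estimates whether the current hidden state is already good enough to classify from, and we stop feeding forward once that gate clears a threshold. All names and parameters here are made up for illustration.

```python
# Hypothetical sketch, not the paper's mechanism: a stack of layers where a
# scalar "ready" gate decides whether to emit a prediction early or keep
# feeding forward. Easy instances exit at shallow depth, hard ones use it all.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def variable_depth_forward(x, layers, gates, out_heads, threshold=0.5):
    """layers: list of (W, b); gates: per-layer (gw, gb); out_heads: per-layer (Wo, bo)."""
    h = x
    for (W, b), (gw, gb), (Wo, bo) in zip(layers, gates, out_heads):
        h = np.tanh(W @ h + b)            # one feedforward step
        ready = sigmoid(gw @ h + gb)      # confidence that h is already enough
        if ready > threshold:             # easy instance: answer now
            return softmax(Wo @ h + bo), ready
    return softmax(Wo @ h + bo), ready    # hard instance: used the full depth
```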
Learning to Transduce with Unbounded Memory – The Neural Stack
DeepMind released a paper in the past week proposing yet another approach to having a memory structure within a neural network. This time, they implement stack, queue and deque “data structures” within their models. While the idea is not necessarily new, it incorporates some of the broad ideas seen in the Neural Turing Machines, where the aim is a model that is end-to-end differentiable, rather than having the data structure decoupled from the training process. I have to admit I haven’t read any of those previous papers before, but they’re definitely on my to-read list.
In any case, this paper claims that using these memory structures beats an 8-layer LSTM network trained on the same task. If this is true, we may finally have some justification for these fancier models: simply throwing bigger networks at problems just isn’t as efficient.
I’ve spent some time puzzling out what exactly they’re doing with the neural stack. I suspect that once I’ve figured that out, the queue and deque will turn out to be pretty similar, so I don’t think I’ll go through them in the same detail.
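For what it’s worth, here is my rough reading of the continuous stack, sketched in numpy: every stored vector carries a scalar strength, popping with strength u eats away at strengths from the top downward, pushing appends the new vector with strength d, and the read vector is a blend of whatever sits within the top unit of strength. Treat it as a sketch of my understanding rather than a faithful reimplementation.

```python
# A rough numpy sketch of one step of the neural stack, as I understand it.
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def neural_stack_step(values, strengths, v, d, u):
    """values: list of vectors, strengths: list of floats,
    v: vector to push, d: push strength in [0,1], u: pop strength in [0,1]."""
    # Pop: remove strength u, starting from the top of the stack.
    new_strengths = []
    for i, s_i in enumerate(strengths):
        above = sum(strengths[i + 1:])          # strength sitting above item i
        new_strengths.append(relu(s_i - relu(u - above)))
    # Push: the new value goes on top with strength d.
    values = values + [v]
    new_strengths.append(d)
    # Read: blend the items whose strength fits within the top unit of strength.
    r = np.zeros_like(v)
    for i, (v_i, s_i) in enumerate(zip(values, new_strengths)):
        above = sum(new_strengths[i + 1:])
        r += min(s_i, relu(1.0 - above)) * v_i
    return values, new_strengths, r
```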
Generating Singlish with LSTMs
So in the last week, Andrej Karpathy wrote a post about the current state of RNNs, and proceeded to dump a whole bunch of different kinds of text data into them to see what they learn. Training language models and then sampling from them is lots of fun, and a character-level model is extra interesting because you see it come up with new words that sometimes actually kind of mean something. Even Geoffrey Hinton has some fun with it in this talk.
So after reading through Karpathy’s code, I picked up a few tips on how to do this properly, as I hadn’t been able to train a proper language model before.
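As an aside, the sampling loop itself is simple once a model is trained. Below is a minimal sketch of it; `predict_next`, `char_to_ix` and `ix_to_char` are hypothetical stand-ins for a trained model’s step function and its vocabulary mappings, not anything from Karpathy’s code.

```python
# Minimal character-level sampling sketch: warm up the hidden state on a seed
# string, then repeatedly sample the next character from the softmax output.
import numpy as np

def sample_text(predict_next, init_state, char_to_ix, ix_to_char,
                seed="the ", length=200, temperature=1.0):
    state = init_state
    out = list(seed)
    # Warm the hidden state up on the seed text; keep the last prediction.
    for ch in seed:
        probs, state = predict_next(state, char_to_ix[ch])
    for _ in range(length):
        # Temperature < 1 sharpens the distribution, > 1 flattens it.
        logits = np.log(probs + 1e-8) / temperature
        p = np.exp(logits - logits.max())
        p /= p.sum()
        ix = np.random.choice(len(p), p=p)
        out.append(ix_to_char[ix])
        probs, state = predict_next(state, ix)
    return "".join(out)
```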
The data is obtained from one of Singapore’s most active sub-forums on HardwareZone.com.sg, Eat-Drink-Man-Woman. I like to think it’s a… localised 4chan. So know what to expect going in. If you want to just see what the model generates, go here.
Long Short-Term Memory
There seems to have been a resurgence in the use of these units over the past year. They were first proposed in 1997 by Hochreiter and Schmidhuber but, along with most of the neural network literature, seemed to have been forgotten for a while, until neural networks made a comeback and the focus started shifting toward RNNs again. Some of the more interesting recent work using LSTMs has come from Schmidhuber’s student Alex Graves. Notice the spike here in 2009, when Graves first wrote about cursive handwriting recognition (and generation) using LSTMs.
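For reference, the formulation most commonly used today looks like the following (the 1997 paper did not have the forget gate; that was added later by Gers et al.):

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

The cell state $c_t$ is the part the gates protect, which is what lets the error signal survive over long spans of time.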
Neural Turing Machines – Copy Task
After much fiddling around with the instability of the training procedure, I still haven’t found a recipe that gets it to converge consistently.
I did find, though, that training it on shorter sequences first, before letting it see longer ones, avoids the huge gradients that make the parameters explode into NaNs. That is a huge help. Doing so still does not guarantee convergence, and I only get a good model occasionally, like this one I’ve trained here, copying a sequence of length 10.
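The curriculum itself is nothing fancy; roughly something like the sketch below, where `sample_batch` and `train_step` are hypothetical stand-ins for the data generator and the actual NTM update.

```python
# Sketch of a "short sequences first" curriculum for the copy task.
import numpy as np

def curriculum_train(train_step, sample_batch, max_len=20,
                     steps_per_stage=2000, start_len=2):
    """Gradually increase the copy-sequence length as training progresses."""
    for seq_len in range(start_len, max_len + 1):
        for _ in range(steps_per_stage):
            x, y = sample_batch(seq_len)   # batch of sequences at this length
            loss = train_step(x, y)
            if not np.isfinite(loss):      # bail out if the gradients blew up
                raise RuntimeError("loss went to NaN at length %d" % seq_len)
```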
Neural Turing Machines – Implementation Hell
I’ve been struggling with the implementation of the NTM for the past week and a half now.
There are various problems I’ve been trying to deal with. The paper is relatively sparse when it comes to details of the architecture, and even more brief when it comes to the training process. Alex Graves trains RNNs a lot in his work, and it seems to me that some of the tricks he used here may be scattered throughout his previous work.
Neural Turing Machines – A First Look
Some time last week, a paper from Google DeepMind caught my attention.
The paper is of particular interest to me because I’ve been thinking about how a recurrent neural network could learn to have access to an external form of memory. The approach taken here is interesting: it balances seeking memory locations by similarity of content with shifting away from them by relative location.
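As I currently read the addressing section, each head ends up with a weighting over memory rows roughly as follows. This is a numpy sketch of my understanding, not a drop-in implementation.

```python
# Sketch of NTM addressing: content weighting by cosine similarity, interpolation
# with the previous weighting, a circular shift, then sharpening.
import numpy as np

def ntm_addressing(M, w_prev, key, beta, g, shift, gamma):
    """M: (N, d) memory, w_prev: (N,) previous weights, key: (d,) query,
    beta: key strength, g: interpolation gate, shift: distribution over
    offsets (e.g. -1, 0, +1), gamma: sharpening exponent (>= 1)."""
    # 1. Content addressing: cosine similarity against every memory row.
    sim = M @ key / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + 1e-8)
    w_c = np.exp(beta * sim)
    w_c /= w_c.sum()
    # 2. Interpolate with the previous weighting.
    w_g = g * w_c + (1.0 - g) * w_prev
    # 3. Location-based shift via circular convolution.
    N = len(w_g)
    offsets = np.arange(len(shift)) - len(shift) // 2
    w_s = np.zeros(N)
    for i in range(N):
        for s_j, off in zip(shift, offsets):
            w_s[i] += w_g[(i - off) % N] * s_j
    # 4. Sharpen so the focus does not blur out over time.
    w = w_s ** gamma
    return w / w.sum()
```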
My focus this time will be on some of the details needed for implementation. Some of these specifics are glossed over in the paper, so I’ll try to infer whatever I can and, perhaps in the next post, have code (in Theano, what else?) to present.
NLP with Neural Networks
Recursive Auto-encoders: Momentum
In the previous post, we wrote the code for the RAE using the Theano library, but it wasn’t successful at the simple task of reversing a randomised sequence of 1 to 8. One of the tricks we can use when dealing with time-sequence data is a small learning rate along with momentum. I’ll be discussing what momentum is, and showing a simple way momentum can be implemented in Theano.
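As a preview, the classical momentum update is only a couple of lines; here it is in plain numpy (the Theano version expresses the same thing as a pair of shared-variable updates):

```python
# Classical momentum: keep a running velocity per parameter, so that gradient
# directions that persist across updates build up speed while noisy ones cancel.
import numpy as np

def momentum_step(w, grad, velocity, lr=0.001, mu=0.9):
    """One parameter update with classical momentum."""
    velocity = mu * velocity - lr * grad   # decaying accumulation of gradients
    w = w + velocity
    return w, velocity
```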
Recursive Auto-encoders: An Introduction
I’ve talked a little bit about recursive auto-encoders a couple of posts ago. In deep learning lingo, an auto-encoder network usually refers to an architecture that takes in an input vector and, through a series of transformations, is trained to reproduce that input at its prediction layer. The reason for doing this is to extract features that describe the input. One might think of it as a form of compression: if the network is asked to reproduce an input after passing it through hidden layers with far fewer neurons than the input layer, then some sort of compression has to happen for it to create a good reconstruction. So let’s consider the above network: 8 inputs, 8 outputs, and 3 neurons in the hidden layer. If we feed the network a one-hot encoding of 1 to 8 (setting only the neuron corresponding to the input to 1) and insist that that input be reconstructed at the output layer, guess what happens?
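Here is a tiny numpy sketch of that 8-3-8 setup, with made-up hyperparameters; with only three sigmoid units in the bottleneck for eight patterns, the hidden layer gets pushed towards something like a 3-bit code for the inputs (though on some random seeds it gets stuck).

```python
# Tiny 8-3-8 auto-encoder: one-hot inputs, 3-unit sigmoid bottleneck,
# softmax output trained to reproduce the input.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
W1, b1 = rng.randn(3, 8) * 0.5, np.zeros(3)
W2, b2 = rng.randn(8, 3) * 0.5, np.zeros(8)
X = np.eye(8)                               # one-hot encodings of 1..8
lr = 0.5

for epoch in range(10000):
    for x in X:
        h = sigmoid(W1 @ x + b1)            # 3-unit bottleneck
        logits = W2 @ h + b2
        p = np.exp(logits - logits.max())
        p /= p.sum()
        d_logits = p - x                    # softmax + cross-entropy gradient
        d_h = (W2.T @ d_logits) * h * (1 - h)
        W2 -= lr * np.outer(d_logits, h); b2 -= lr * d_logits
        W1 -= lr * np.outer(d_h, x);      b1 -= lr * d_h

# Inspect the learned hidden codes, one row per input.
print(np.round(sigmoid(W1 @ X.T + b1[:, None]).T, 2))
```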
Remembering sequences (poorly) with RNNs
I’ve recently started a project that aims to train a recurrent network to memorise and repeat sequences of characters. It’s here, and it hasn’t been going really well, but I thought I’d share a little bit of why I wanted to do this and why I thought it might work.