Generating Singlish with LSTMs

So in the last week, Andrej Karpathy wrote a post about the current state of RNNs, and proceeded to dump a whole bunch of different kinds of text data into them to see what they learn. Training language models and then sampling from them is lots of fun, and a character-level model is extra […]
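As a rough illustration of what sampling from such a model involves, here's a minimal sketch, assuming a trained model exposed as a next-character distribution. `predict_probs`, `char_to_idx`, and `idx_to_char` are hypothetical stand-ins, not anything from Karpathy's post.

```python
import numpy as np

# Minimal sketch of sampling from a character-level language model.
# `predict_probs` is a hypothetical stand-in for the trained network's
# next-character distribution; none of this is from the original post.
def sample(predict_probs, char_to_idx, idx_to_char, seed, length=200):
    indices = [char_to_idx[c] for c in seed]
    for _ in range(length):
        probs = predict_probs(indices)                   # P(next char | history)
        indices.append(np.random.choice(len(probs), p=probs))
    return "".join(idx_to_char[i] for i in indices)
```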

Long Short-Term Memory

There seems to have been a resurgence in the use of these units in the past year. They were first proposed in 1997 by Hochreiter and Schmidhuber but, along with most of the neural network literature, seemed to have been forgotten for a while, until neural networks made a comeback and focus started shifting toward RNNs again. Some […]
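For reference, one step of the standard (non-peephole) LSTM can be sketched in numpy as below. The stacked-gate layout and variable names are my own choices, not taken from the 1997 paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# One step of a standard (non-peephole) LSTM. W, U, and b hold the four
# gate transforms stacked together: input, forget, output, candidate.
def lstm_step(x, h_prev, c_prev, W, U, b):
    d = h_prev.shape[0]
    z = W @ x + U @ h_prev + b       # shape (4 * d,)
    i = sigmoid(z[0 * d:1 * d])      # input gate: how much to write
    f = sigmoid(z[1 * d:2 * d])      # forget gate: how much to erase
    o = sigmoid(z[2 * d:3 * d])      # output gate: how much to expose
    g = np.tanh(z[3 * d:4 * d])      # candidate cell contents
    c = f * c_prev + i * g           # cell state carries long-range memory
    h = o * np.tanh(c)               # hidden state is the gated readout
    return h, c
```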

Neural Turing Machines FAQ

There’s been some interest in the Neural Turing Machines paper, and I’ve been getting questions about my implementation via e-mail and in the comments section of this blog. I plan to make this a blog post that I’ll come back to regularly and update with answers to some of these questions as they come up, so do […]

Learning Gaussian Feature Extractors

While playing around with the MNIST dataset and the example code, I tried to visualise the weights of the connections from the inputs to the hidden layer. These can be thought of as feature extractors of the input.
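A rough sketch of that visualisation, assuming the input-to-hidden weight matrix `W` has shape (784, n_hidden) for 28×28 MNIST inputs; the function name and grid size here are illustrative, not from the example code.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch: render each hidden unit's incoming weights as a 28x28 image,
# so each panel shows the input pattern that unit responds to most.
# W is assumed to be the input-to-hidden weight matrix, shape (784, n_hidden).
def plot_feature_extractors(W, n_rows=4, n_cols=5):
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(n_cols, n_rows))
    for k, ax in enumerate(axes.flat):
        ax.imshow(W[:, k].reshape(28, 28), cmap="gray")
        ax.axis("off")
    plt.show()
```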

Neural Turing Machines – Copy Task

After much fiddling around with the instability of the training procedure, I still haven’t found a recipe that gets it to converge consistently. I did find, though, that training it on shorter sequences first, before letting it see longer ones, avoids the huge gradients that make the parameters explode into NaNs. And that is […]
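The curriculum amounts to something like the sketch below. `sample_copy_task` and `train_batch` are hypothetical stand-ins for the data generator and the gradient step, and the length schedule is illustrative rather than the one I actually used.

```python
import numpy as np

# Curriculum sketch: train on short copy sequences first, then let the
# model see progressively longer ones. The helpers passed in here are
# hypothetical stand-ins for the real data generator and training step.
def curriculum_train(sample_copy_task, train_batch, steps_per_stage=1000):
    for max_len in (2, 4, 8, 12, 16, 20):                # illustrative schedule
        for _ in range(steps_per_stage):
            length = np.random.randint(1, max_len + 1)   # lengths up to the cap
            train_batch(sample_copy_task(length))
```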

Neural Turing Machines – Implementation Hell

I’ve been struggling with the implementation of the NTM for the past week and a half now. There are various problems I’ve been trying to deal with. The paper is relatively sparse when it comes to details of the architecture, and even briefer when it comes to the training process. Alex Graves […]

Neural Turing Machines – A First Look

Sometime last week, a paper from Google DeepMind caught my attention. It’s of particular interest to me because I’ve been thinking about how a recurrent neural network could learn to have access to an external form of memory. The approach taken here is interesting as it makes use of a balance between […]

Connectionist Temporal Classification (CTC) with Theano

This is the first time I’m presenting code I’ve written in an IPython notebook. The style’s different, but I think I’ll switch permanently to this method of presentation for code-intensive posts. A nifty little tool that makes doing this convenient is ipy2wp. It uses WordPress’s XML-RPC to post […]

Implementing AdaDelta

The end of this post (I don’t know where the article is now; can’t find it) had a diagram showing the improvements of AdaDelta over standard SGD and AdaGrad, so I decided to look up what AdaDelta actually does. The details are written in the paper, including its “derivation”. It’s basically an improvement over AdaGrad, using […]
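For the curious, the core update from Zeiler's paper can be sketched as below. There is no global learning rate: the step size is the ratio of two exponentially decayed RMS estimates.

```python
import numpy as np

# Sketch of the AdaDelta update (Zeiler, 2012). Eg2 and Edx2 are the
# running averages of squared gradients and squared updates.
def adadelta_update(x, grad, Eg2, Edx2, rho=0.95, eps=1e-6):
    Eg2 = rho * Eg2 + (1 - rho) * grad ** 2              # accumulate E[g^2]
    dx = -np.sqrt(Edx2 + eps) / np.sqrt(Eg2 + eps) * grad
    Edx2 = rho * Edx2 + (1 - rho) * dx ** 2              # accumulate E[dx^2]
    return x + dx, Eg2, Edx2
```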

NLP with Neural Networks

Gave a presentation on neural networks at the NUS Web Information Retrieval and NLP Group (WING). The idea was mainly to concretise my understanding of the topic, and also to share some interesting concepts that neural network research has introduced to NLP, while giving me some sorely needed public-speaking experience. Not sure how much […]