Neural Turing Machines FAQ
There’s been some interest in the Neural Turing Machines paper, and I’ve been getting questions about my implementation via e-mail and in the comments section of this blog. I plan to come back and update this post regularly with answers as new questions come up, so do check back!
Neural Turing Machines – Copy Task
After much fiddling with the unstable training procedure, I still haven’t found a recipe that gets it to converge consistently.
I did find, though, that training it on shorter sequences first, before letting it see longer ones, avoids the huge gradients that make the parameters explode into NaNs, and that is a huge help. Even then convergence isn’t guaranteed, and I only get a good model by chance, like this one I’ve trained here, copying a sequence of length 10.
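The curriculum trick itself is tiny. Here is a minimal sketch in plain Python; `make_copy_batch` and `train_step` are hypothetical stand-ins for your copy-task generator and gradient-update function, not code from my actual implementation:

```python
import numpy as np

def curriculum_train(make_copy_batch, train_step,
                     lengths=(2, 4, 6, 8, 10), steps_per_length=2000):
    """Train on short sequences first, then progressively longer ones.

    make_copy_batch(length) and train_step(batch) are hypothetical:
    substitute whatever data generator and update function you have.
    """
    for length in lengths:
        for step in range(steps_per_length):
            loss = train_step(make_copy_batch(length))
            if np.isnan(loss):  # bail out if the parameters have blown up
                raise RuntimeError(
                    "NaN loss at length %d, step %d" % (length, step))
```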
Neural Turing Machines – Implementation Hell
I’ve been struggling with the implementation of the NTM for the past week and a half now.
There are various problems I’ve been trying to deal with. The paper is relatively sparse on the details of the architecture, and even briefer on the training process. Alex Graves trains RNNs a lot in his work, and it seems to me that some of the tricks he used here may be scattered across his previous papers.
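One candidate for such a trick, and this is my guess rather than anything the NTM paper states: Graves’s earlier RNN papers clip gradients to a fixed range to keep training from blowing up. A minimal sketch of element-wise clipping in plain Python, with an illustrative bound:

```python
import numpy as np

def clip_gradients(grads, bound=10.0):
    """Clip each gradient array element-wise into [-bound, +bound].

    The bound is illustrative; Graves's earlier work clips derivatives
    to a fixed range to keep RNN training stable.
    """
    return [np.clip(g, -bound, bound) for g in grads]
```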
Neural Turing Machines – A First Look
Some time last week, a paper from Google DeepMind caught my attention.
The paper is of particular interest to me because I’ve been thinking about how a recurrent neural network could learn to use an external memory. The approach taken here is interesting: it balances seeking by content similarity against shifting from that position by location.
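To make that balance concrete, here is a minimal NumPy sketch of the addressing mechanism as I read it from the paper. The parameter names (k, beta, g, s, gamma) follow the paper’s notation; the code itself is my own illustrative scaffolding, not a tested implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cosine(u, v, eps=1e-8):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

def address(M, w_prev, k, beta, g, s, gamma):
    """One step of NTM addressing (my reading of the paper).

    M      : (N, W) memory matrix
    w_prev : (N,)   weighting from the previous time step
    k      : (W,)   content key emitted by the controller
    beta   : scalar key strength (> 0)
    g      : scalar interpolation gate in [0, 1]
    s      : (3,)   shift distribution over offsets (-1, 0, +1)
    gamma  : scalar sharpening exponent (>= 1)
    """
    # 1. Content addressing: cosine similarity of the key against each
    #    memory row, sharpened by beta and normalised with a softmax.
    w_c = softmax(beta * np.array([cosine(row, k) for row in M]))
    # 2. Interpolation: gate between the content weighting and the
    #    previous weighting (g = 1 means a pure content lookup).
    w_g = g * w_c + (1.0 - g) * w_prev
    # 3. Location addressing: circular convolution with the shift
    #    distribution rotates the weighting along the memory.
    w_s = sum(s_j * np.roll(w_g, off) for off, s_j in zip((-1, 0, 1), s))
    # 4. Sharpening: raise to gamma and renormalise, to fight the
    #    blurring introduced by the convolution.
    w = w_s ** gamma
    return w / w.sum()
```

In the full model, all five of these parameters are emitted by the controller at every time step, one set per read or write head.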
My focus this time will be on some of the details needed for implementation. Some of these specifics are glossed over in the paper, so I’ll try to infer what I can and, perhaps in the next post, have code (in Theano, what else?) to present.