DeepMind released a paper this past week proposing yet another approach to building a memory structure into a neural network. This time, they implement stack, queue, and deque “data structures” within their models. While the idea is not necessarily new, it carries over one of the broad ideas behind Neural Turing Machines: the whole model, memory included, is end-to-end differentiable, rather than having the data structure decoupled from the training process. I have to admit I haven’t read any of those previous papers, but they’re definitely on my to-read list.
In any case, the paper claims that models using these memory structures beat an 8-layer LSTM network trained on the same task. If that’s true, we may finally have some justification for these fancier models: simply throwing bigger networks at problems just isn’t as efficient.
I’ve spent some time trying to puzzle out what exactly they’re trying to do here with the neural stack. I suspect once I’ve figured this out, the queue and deque will be pretty similar, so I don’t think I will go through them in the same detail.
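To make that a bit more concrete while I work through it, here’s a rough sketch of how I currently understand the stack’s behaviour, written as plain numpy rather than anything differentiable or trainable. In the paper, the push and pop strengths (d and u) come out of the controller as sigmoid outputs, and the updates are written with min/max operations so gradients can flow; everything else below, including the class, method names, and example numbers, is just my own illustration of my reading, not the authors’ implementation.

```python
# A plain-numpy sketch of how I currently read the paper's stack: values are
# pushed with a fractional strength d and popped with a fractional strength u,
# and a "read" is a weighted sum of whatever sits near the top. The class,
# method names and example numbers are mine, not the authors'.
import numpy as np

class NeuralStack:
    def __init__(self, dim):
        self.values = np.zeros((0, dim))  # one row per pushed vector, never overwritten
        self.strengths = np.zeros(0)      # how much of each row is "really" on the stack

    def step(self, v, d, u):
        """Pop with strength u, push v with strength d, then read the top of the stack."""
        # Pop: remove u worth of strength from the top down, only eating into a
        # lower entry once everything above it has been used up.
        new_strengths = np.empty_like(self.strengths)
        remaining = u
        for i in reversed(range(len(self.strengths))):
            removed = min(self.strengths[i], remaining)
            new_strengths[i] = self.strengths[i] - removed
            remaining -= removed
        # Push: append the new vector with strength d.
        self.values = np.vstack([self.values, v])
        self.strengths = np.append(new_strengths, d)
        # Read: a weighted sum of the topmost vectors whose strengths sum to (at most) 1.
        read = np.zeros(self.values.shape[1])
        budget = 1.0
        for i in reversed(range(len(self.strengths))):
            weight = min(self.strengths[i], budget)
            read += weight * self.values[i]
            budget -= weight
        return read

stack = NeuralStack(dim=3)
r1 = stack.step(np.array([1., 0., 0.]), d=0.9, u=0.0)  # mostly push "a"
r2 = stack.step(np.array([0., 1., 0.]), d=0.6, u=0.2)  # pop a little, then push "b"
```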