An Intrinsic Difference Between Vanilla RNNs and GRU Models
To perform well in practice, Recurrent Neural Networks (RNNs) require computationally heavy architectures, such as the Gated Recurrent Unit (GRU) or Long Short-Term Memory (LSTM). Indeed, the original Vanilla model fails to capture medium- and long-term sequential dependencies. The aim of this paper is to show that gradient training issues, which motivated the introduction of LSTM and GRU models, are not sufficient to explain the failure of the simplest RNN. Using the example of Reber's grammar, we propose an experimental measure for both Vanilla and GRU models, which suggests an intrinsic difference in their dynamics. A better mathematical understanding of this difference could lead to more efficient models without compromising performance.
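For readers unfamiliar with Reber's grammar, the following is a minimal sketch of the commonly used finite-state version of the grammar, with a generator and a membership check. The transition table, the names `REBER_TRANSITIONS`, `generate_reber_string`, and `is_reber_string`, and the uniform sampling of transitions are assumptions made for illustration; the paper's exact experimental setup may differ.

```python
import random

# Assumed textbook variant of Reber's grammar as a finite-state machine:
# each state maps to the (symbol, next_state) moves available from it.
REBER_TRANSITIONS = {
    0: [("B", 1)],
    1: [("T", 2), ("P", 3)],
    2: [("S", 2), ("X", 4)],
    3: [("T", 3), ("V", 5)],
    4: [("X", 3), ("S", 6)],
    5: [("P", 4), ("V", 6)],
    6: [("E", 7)],
}
FINAL_STATE = 7


def generate_reber_string(rng):
    """Sample one grammatical string by walking the machine at random."""
    state, symbols = 0, []
    while state != FINAL_STATE:
        symbol, state = rng.choice(REBER_TRANSITIONS[state])
        symbols.append(symbol)
    return "".join(symbols)


def is_reber_string(string):
    """Return True if the string is accepted by the machine above."""
    state = 0
    for symbol in string:
        moves = dict(REBER_TRANSITIONS.get(state, []))
        if symbol not in moves:
            return False
        state = moves[symbol]
    return state == FINAL_STATE
```

Strings sampled this way, for example with `generate_reber_string(random.Random(0))`, are typically used for next-symbol prediction: at each step the network must predict which symbols the grammar allows next, which requires tracking the current state of the machine.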
BibTeX
@inproceedings{StéFarGri201702,
author = {Tristan Stérin and Nicolas Farrugia and Vincent Gripon},
title = {An Intrinsic Difference Between Vanilla RNNs and GRU Models},
booktitle = {Proceedings of Cognitive},
year = {2017},
pages = {76--81},
month = {February},
}