Vincent Gripon's website

Blog about my research and teaching

An Intrinsic Difference Between Vanilla RNNs and GRU Models

T. Stérin, N. Farrugia and V. Gripon, "An Intrinsic Difference Between Vanilla RNNs and GRU Models," in Proceedings of Cognitive, pp. 76--81, February 2017.

In order to perform well in practice, Recurrent Neural Networks (RNNs) require computationally heavy architectures, such as Gated Recurrent Unit (GRU) or Long Short-Term Memory (LSTM). Indeed, the original Vanilla model fails to capture medium- and long-term sequential dependencies. The aim of this paper is to show that gradient training issues, which motivated the introduction of LSTM and GRU models, are not sufficient to explain the failure of the simplest RNN. Using the example of Reber's grammar, we propose an experimental measure on both Vanilla and GRU models, which suggests an intrinsic difference in their dynamics. A better mathematical understanding of this difference could lead to more efficient models without compromising performance.
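For readers unfamiliar with Reber's grammar, the sketch below generates and checks strings from the commonly used seven-state version of the grammar. It is only an illustration: the abstract does not specify which variant or data pipeline the paper uses, and the names `REBER`, `generate` and `is_valid` are hypothetical helpers introduced here.

```python
import random

# Transition table of the commonly cited Reber grammar.
# ASSUMPTION: the paper may use a different variant (e.g. the embedded one).
# Each state maps to its allowed (symbol, next_state) pairs; None means "stop".
REBER = {
    0: [("B", 1)],
    1: [("T", 2), ("P", 3)],
    2: [("S", 2), ("X", 4)],
    3: [("T", 3), ("V", 5)],
    4: [("X", 3), ("S", 6)],
    5: [("P", 4), ("V", 6)],
    6: [("E", None)],
}


def generate(rng):
    """Sample one grammatical string by walking the automaton at random."""
    state, symbols = 0, []
    while state is not None:
        symbol, state = rng.choice(REBER[state])
        symbols.append(symbol)
    return "".join(symbols)


def is_valid(string):
    """Return True if the string is accepted by the automaton above."""
    state = 0
    for symbol in string:
        if state is None:  # extra symbols after the final "E"
            return False
        transitions = dict(REBER[state])
        if symbol not in transitions:
            return False
        state = transitions[symbol]
    return state is None  # must end exactly on the final "E"


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(generate(rng))
```

A typical sequence task built on such strings asks a recurrent model to predict, at each position, which symbols may legally come next; whether this matches the paper's experimental measure is not stated in the abstract.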

Download the manuscript.

BibTeX
@inproceedings{StFarGri201702,
  author = {Tristan Stérin and Nicolas Farrugia and Vincent Gripon},
  title = {An Intrinsic Difference Between Vanilla RNNs and GRU Models},
  booktitle = {Proceedings of Cognitive},
  year = {2017},
  pages = {76--81},
  month = {February},
}




