Google DeepMind strikes again
http://csunplugged.org/the-turing-test/
Machine Translation
Subset of the WMT'2014 English to French Machine Translation track dataset
BLEU
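BLEU scores a candidate translation by its clipped n-gram overlap with reference translations, with a brevity penalty for short outputs. A minimal single-reference sketch in plain Python (uniform weights up to 4-grams; real evaluations use the standard scripts):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip candidate counts by the reference counts so repeating a
        # correct word is not rewarded.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat".split(),
           "the cat sat on the red mat".split()))   # ~0.67
```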
Example with Phrase-based Machine Translation
A typical pipeline has many steps
Mostly trained/tuned independently
Apply Neural Language Models to Machine Translation:
→ one simple model instead of many in previous work
http://www.walthampton.com/success/road-blocks-hula-hoops/
Fixed input size, fixed output size → usual feed forward networks work:
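As a sketch of this fixed-size case (hypothetical sizes, plain numpy, not the paper's model): the input and output dimensions are baked into the weight matrices, so the network cannot take a longer sentence without a new architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(10, 32)), np.zeros(32)   # 10 -> 32
W2, b2 = rng.normal(scale=0.1, size=(32, 5)), np.zeros(5)     # 32 -> 5

def forward(x):                       # x must have exactly 10 features
    h = np.tanh(x @ W1 + b1)          # hidden layer
    return h @ W2 + b2                # always exactly 5 outputs

print(forward(rng.normal(size=10)).shape)   # (5,): sizes are fixed by the weights
```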
Varying input size, output size = input size:
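A minimal sketch of the corresponding recurrent case (again hypothetical sizes): a plain RNN reads one token per step and emits one output per step, so it handles any input length but the output length is forced to match.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 8, 16, 8
Wx, Wh, Wo = (rng.normal(scale=0.1, size=s)
              for s in [(d_in, d_h), (d_h, d_h), (d_h, d_out)])

def rnn(xs):                              # xs: any number of (d_in,) vectors
    h, ys = np.zeros(d_h), []
    for x in xs:
        h = np.tanh(x @ Wx + h @ Wh)      # recurrent state update
        ys.append(h @ Wo)                 # exactly one output per input step
    return ys

print(len(rnn([rng.normal(size=d_in) for _ in range(7)])))   # 7 in -> 7 out
```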
Problems:
Key idea to model varying input and output lengths: use the encoder-decoder pattern (two networks)
Fixed input size, varying output size → Combination of feed forward and recurrent networks (2014):
Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." 2014.
Varying input size, varying output size → Combination of two recurrent networks (this paper):
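A compact sketch of this setup under simplifying assumptions (vanilla RNN cells and a toy vocabulary instead of the paper's deep LSTMs; all sizes and names are hypothetical): the encoder compresses a variable-length source into one fixed vector, and the decoder unrolls from that vector until it emits an end-of-sequence token, so input and output lengths are decoupled.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 6, 16                      # toy vocabulary and state sizes (assumptions)
EOS = 0                           # end-of-sequence token, also used to start decoding
E = rng.normal(scale=0.1, size=(V, d))                 # toy embeddings
Wex, Weh, Wdx, Wdh = (rng.normal(scale=0.1, size=(d, d)) for _ in range(4))
Wout = rng.normal(scale=0.1, size=(d, V))

def encode(src):                          # src: list of token ids, any length
    h = np.zeros(d)
    for tok in reversed(src):             # the paper feeds the source reversed
        h = np.tanh(E[tok] @ Wex + h @ Weh)
    return h                              # fixed-size summary of the source

def decode(h, max_len=20):                # greedy decoding for illustration;
    out, tok = [], EOS                    # the paper uses a small beam search
    for _ in range(max_len):
        h = np.tanh(E[tok] @ Wdx + h @ Wdh)
        tok = int(np.argmax(h @ Wout))    # most likely next token
        if tok == EOS:
            break
        out.append(tok)
    return out                            # output length chosen by the decoder

print(decode(encode([3, 1, 4, 1, 5])))    # untrained weights -> arbitrary tokens
```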
Model | Method | BLEU |
---|---|---|
1 model, normal input | generation | 26.17 |
1 model, normal input | rescoring | 35.61 |
1 model, reversed input | generation | 30.59 |
1 model, reversed input | rescoring | 35.85 |
5 models, reversed input | generation | 34.81 |
5 models, reversed input | rescoring | 36.5 |
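In the table above, "rescoring" means re-ranking an n-best list from the baseline phrase-based system with the neural model rather than generating from scratch. A minimal sketch of the idea (all names hypothetical; the exact combination with the baseline score is a modelling choice):

```python
def rescore(candidates, model_logprob, weight=0.5):
    """candidates: list of (translation, baseline_score) pairs."""
    def score(pair):
        translation, baseline_score = pair
        # Mix the neural model's log-probability with the baseline score.
        return weight * model_logprob(translation) + (1 - weight) * baseline_score
    return max(candidates, key=score)[0]

# Usage with a dummy scorer standing in for the trained LSTM:
best = rescore([("le chat est sur le tapis", -3.2),
                ("le chat tapis", -5.0)],
               model_logprob=lambda s: -0.5 * len(s.split()))
print(best)
```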
Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber. "LSTM: A Search Space Odyssey." Pre-print.
Alex Graves. "Supervised Sequence Labelling with Recurrent Neural Networks." Vol. 385. Heidelberg: Springer, 2012.
Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
The internal state tends to keep growing and saturate the output non-linearity → need a way to reset it.
Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
An LSTM with saturated memory is just a standard RNN cell (it can no longer remember anything).
Solution: add a forget gate
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to forget: Continual prediction with LSTM." Neural computation 12.10 (2000): 2451-2471.
→ Now LSTMs can learn when to decrease/flush the state
Gers, Felix A., Jürgen Schmidhuber, and Fred Cummins. "Learning to forget: Continual prediction with LSTM." Neural computation 12.10 (2000): 2451-2471.
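A minimal sketch of one LSTM step with the forget gate (plain numpy, hypothetical sizes, biases initialised to zero): the forget gate scales the previous cell state, so the cell can shrink or flush its memory instead of only accumulating into it.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8                                   # hypothetical sizes
Wi, Wf, Wo, Wc = (rng.normal(scale=0.1, size=(d_in + d_h, d_h)) for _ in range(4))
bi, bf, bo, bc = (np.zeros(d_h) for _ in range(4))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(z @ Wi + bi)           # input gate: how much new content to write
    f = sigmoid(z @ Wf + bf)           # forget gate: how much old state to keep
    o = sigmoid(z @ Wo + bo)           # output gate: how much state to expose
    c_tilde = np.tanh(z @ Wc + bc)     # candidate content
    c = f * c_prev + i * c_tilde       # without f, the cell could only accumulate
    h = o * np.tanh(c)                 # and would eventually saturate this tanh
    return h, c

h = c = np.zeros(d_h)
for _ in range(5):                     # a few steps on random inputs
    h, c = lstm_step(rng.normal(size=d_in), h, c)
print(h.shape, c.shape)                # (8,) (8,)
```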
No end-of-sequence markers
The gates make decisions with no knowledge of the cell state.
Sepp Hochreiter and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
Solution: add connections from the state to the gates (peephole connections).
Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber. "LSTM: A Search Space Odyssey." Pre-print.
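A sketch of the same step with peephole connections (hypothetical names; diagonal peephole weights, a common simplification): the input and forget gates also see the previous cell state and the output gate sees the updated one, so gating decisions can depend on what is actually stored.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 4, 8                                   # hypothetical sizes
Wi, Wf, Wo, Wc = (rng.normal(scale=0.1, size=(d_in + d_h, d_h)) for _ in range(4))
bi, bf, bo, bc = (np.zeros(d_h) for _ in range(4))
p_i, p_f, p_o = (rng.normal(scale=0.1, size=d_h) for _ in range(3))   # peepholes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def peephole_lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(z @ Wi + bi + p_i * c_prev)     # input gate peeks at the old state
    f = sigmoid(z @ Wf + bf + p_f * c_prev)     # forget gate peeks at the old state
    c = f * c_prev + i * np.tanh(z @ Wc + bc)
    o = sigmoid(z @ Wo + bo + p_o * c)          # output gate peeks at the new state
    return o * np.tanh(c), c

h = c = np.zeros(d_h)
h, c = peephole_lstm_step(rng.normal(size=d_in), h, c)
print(h.shape)                                  # (8,)
```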
Topic | Paper |
---|---|
Original paper | Long Short-Term Memory |
Forget gate | Learning to Forget: Continual Prediction with LSTM |
Peephole connections | Learning Precise Timing with LSTM Recurrent Networks |
Survey of many LSTM variants | LSTM: A Search Space Odyssey |
Image Captioning | Show and Tell: A Neural Image Caption Generator |
Machine Translation | Neural Machine Translation by Jointly Learning to Align and Translate |
Keywords Detection | Deep Sentence Embedding Using the Long Short Term Memory Network |
Speech Recognition | Towards End-to-End Speech Recognition with Recurrent Neural Networks |
Handwriting Generation | Generating Sequences With Recurrent Neural Networks |
Sentiment Analysis | Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks |
Neural Networks Overview | Supervised Sequence Labelling with Recurrent Neural Networks |
Regularization | Recurrent Neural Network Regularization |
Recurrent Networks Training | On the difficulty of training Recurrent Neural Networks |
CNN alternative | ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks |
Great LSTM blog post + code | The Unreasonable Effectiveness of Recurrent Neural Networks |