LSTM Basics: Understanding Long Short-Term Memory Networks (Lecture 12)
In this lecture, we will explore LSTM (Long Short-Term Memory) networks. Unlike simple RNNs that struggle with long-term dependencies, LSTMs use special gates to remember or forget information, making them powerful for NLP, speech recognition, and time-series prediction.
Table of Contents
{% toc %}
1) Why Do We Need LSTM?
Traditional RNNs suffer from the vanishing gradient problem, making it difficult to capture long-term context.
For example:
In the sentence, “Today the weather is sunny and I feel happy because I went to the park with my friend …”
A basic RNN may forget the earlier word “weather” by the time it processes the later words.
LSTMs solve this by introducing gates that regulate the flow of information.
2) LSTM Architecture Overview
LSTMs consist of three main gates:
- Forget Gate → decides what past information to discard.
- Input Gate → decides what new information to store.
- Output Gate → decides what information to pass on.
Together, they maintain a cell state (long-term memory) and a hidden state (short-term memory).
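One way to make this concrete: in a typical deep learning framework, all of these gates live inside a single LSTM layer as stacked weight blocks. The snippet below is a minimal sketch assuming TensorFlow/Keras (not part of the lecture code); it builds a layer and prints its weight shapes, where the factor of 4 corresponds to the input, forget, candidate, and output computations.

```python
import numpy as np
import tensorflow as tf

# Build a 32-unit LSTM layer by calling it on a dummy (batch, time, features) input
layer = tf.keras.layers.LSTM(units=32)
_ = layer(np.zeros((1, 5, 16), dtype="float32"))

kernel, recurrent_kernel, bias = layer.get_weights()
print(kernel.shape)            # (16, 128): input weights, 4 * units columns (one block per gate)
print(recurrent_kernel.shape)  # (32, 128): recurrent weights applied to the previous hidden state
print(bias.shape)              # (128,):    one bias per gate unit
```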
3) Mathematical Formulation (Simplified)
- Forget gate:

$$f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$$

- Input gate (and candidate cell state):

$$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$$
$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$

- Output gate (and hidden state):

$$o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$$
$$h_t = o_t \odot \tanh(C_t)$$

Where:
- $\sigma$ is the sigmoid function
- $\tanh$ is the hyperbolic tangent
- $C_t$ is the cell state
- $h_t$ is the hidden state
- $x_t$ is the input at time step $t$; $W$, $U$, and $b$ are learned weights and biases; $\odot$ denotes element-wise multiplication
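To connect the formulas to code, here is a minimal NumPy sketch of a single LSTM time step. It is an illustration, not a library implementation; the per-gate weight names `W`, `U`, `b` and the toy sizes are assumptions chosen for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b are dicts keyed by 'f', 'i', 'o', 'c'
    holding input weights, recurrent weights, and biases for each gate."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])      # input gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde                          # new cell state C_t
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                    # new hidden state h_t
    return h_t, c_t

# Toy usage: 3 input features, 4 hidden units
rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_in)) for k in "fioc"}
U = {k: rng.standard_normal((n_hid, n_hid)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h_t, c_t = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h_t.shape, c_t.shape)  # (4,) (4,)
```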
4) Hands-on Example: Sentiment Classification with LSTM
We’ll use the IMDB movie reviews dataset to classify reviews as positive or negative.
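Below is a minimal sketch of the experiment, assuming TensorFlow/Keras and its built-in IMDB dataset; the vocabulary size, sequence length, layer widths, and epoch count are illustrative choices, not necessarily the exact settings used in the lecture.

```python
import tensorflow as tf

vocab_size = 10000   # keep only the 10,000 most frequent words
max_len = 200        # truncate/pad every review to 200 tokens

# Load IMDB reviews already encoded as word indices, then pad to a fixed length
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

# Embedding -> LSTM -> sigmoid output for binary sentiment
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 128),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train,
          batch_size=64, epochs=3,
          validation_data=(x_test, y_test))
```

Restricting the vocabulary and padding every review to a fixed length keeps training fast; a single 64-unit LSTM over the embeddings is typically enough to land in the accuracy range quoted in the key takeaways.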
Example Output
After a few epochs of training, the model typically reaches a test accuracy of around 87%; exact numbers vary with hyperparameters and random initialization.
5) Key Takeaways
- LSTMs mitigate the long-term dependency (vanishing gradient) problem of simple RNNs.
- The gating mechanism lets the network selectively remember and forget information.
- In practice, LSTMs perform well on text, speech, and time-series data.
- Our IMDB example achieved ~87% accuracy.