GRU Basics: Simplifying Recurrent Neural Networks (Lecture 13)
In this lecture, we introduce GRU (Gated Recurrent Unit) networks, a simpler and faster variant of the LSTM. You will learn the theory behind GRU gates, compare GRUs with vanilla RNNs and LSTMs, and implement a sentiment analysis model on the IMDB dataset using TensorFlow/Keras.
Table of Contents
{% toc %}
1) Why GRU?
Traditional RNNs suffer from the vanishing gradient problem, making it difficult to learn long-term dependencies.
LSTMs solve this with a more complex structure but at the cost of slower training.
GRU Advantages:
- Fewer gates (2 instead of 3 in LSTM)
- Faster training
- Comparable accuracy in many tasks
2) GRU Architecture
GRUs rely on two gates:
- Reset Gate (r_t) – decides how much past information to forget when forming the new candidate state
- Update Gate (z_t) – decides how much new candidate information to add versus how much of the previous state to keep
Key Equations
- Update gate:

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$

- Reset gate:

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$

- Candidate hidden state:

$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$

- Final hidden state:

$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

Here $x_t$ is the input at time step $t$, $h_{t-1}$ is the previous hidden state, $\sigma$ is the sigmoid function, and $\odot$ denotes element-wise multiplication.
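To make the equations concrete, here is a minimal NumPy sketch of a single GRU step. The weight names (`W_z`, `U_z`, etc.) mirror the equations above; the toy dimensions and random initialization are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, p):
    """One GRU time step, following the equations above."""
    z_t = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])            # update gate
    r_t = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])            # reset gate
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r_t * h_prev) + p["b_h"])  # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                              # final hidden state

# Toy dimensions: 4-dimensional input, 3-dimensional hidden state
rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
shapes = {
    "W_z": (hidden_dim, input_dim), "U_z": (hidden_dim, hidden_dim), "b_z": (hidden_dim,),
    "W_r": (hidden_dim, input_dim), "U_r": (hidden_dim, hidden_dim), "b_r": (hidden_dim,),
    "W_h": (hidden_dim, input_dim), "U_h": (hidden_dim, hidden_dim), "b_h": (hidden_dim,),
}
params = {name: rng.normal(size=shape) for name, shape in shapes.items()}

h = np.zeros(hidden_dim)
for x in rng.normal(size=(5, input_dim)):  # a sequence of 5 time steps
    h = gru_step(x, h, params)
print(h.shape)  # (3,)
```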
3) Example: IMDB Sentiment Analysis with GRU
Let’s implement a GRU model to classify movie reviews as positive or negative.
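Below is a minimal sketch of such a model. The hyperparameters (vocabulary size 10,000, review length 200 tokens, 64 embedding dimensions, 64 GRU units, 3 epochs) are illustrative choices, not fixed settings from the lecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import imdb
from tensorflow.keras.utils import pad_sequences

# Hyperparameters (illustrative choices)
vocab_size = 10000   # keep the 10,000 most frequent words
max_len = 200        # truncate/pad every review to 200 tokens
embedding_dim = 64
gru_units = 64

# Load the IMDB reviews (already tokenized as integer word indices) and pad to a fixed length
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=vocab_size)
x_train = pad_sequences(x_train, maxlen=max_len)
x_test = pad_sequences(x_test, maxlen=max_len)

# Embedding -> GRU -> sigmoid output for binary sentiment
model = models.Sequential([
    layers.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embedding_dim),
    layers.GRU(gru_units),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()

model.fit(x_train, y_train,
          epochs=3,
          batch_size=128,
          validation_split=0.2)

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```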
Expected Output: after a few epochs of training, the model typically reaches around 87% test accuracy on IMDB; exact numbers vary with initialization and hyperparameters.
4) GRU vs LSTM
| Feature | LSTM | GRU |
|---|---|---|
| Gates | 3 (Forget, Input, Output) | 2 (Reset, Update) |
| Memory Units | Cell state + Hidden state | Hidden state only |
| Speed | Slower | Faster |
| Accuracy | Often higher | Comparable |
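One quick way to see the size difference in practice is to build an LSTM layer and a GRU layer with the same hidden width and compare parameter counts. The sketch below assumes arbitrary sizes (128-dimensional inputs, 64 units); the LSTM carries four weight blocks (input, forget, cell, output) while the GRU carries three (update, reset, candidate), so the GRU ends up with roughly three quarters of the LSTM's parameters.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def count_params(rnn_layer_cls, input_dim=128, units=64):
    """Wrap a recurrent layer in a tiny model and report its parameter count."""
    model = models.Sequential([
        layers.Input(shape=(None, input_dim)),  # variable-length sequences of 128-dim vectors
        rnn_layer_cls(units),
    ])
    return model.count_params()

# GRU has ~3/4 the parameters of an LSTM with the same width (3 weight blocks vs 4)
print("LSTM params:", count_params(layers.LSTM))
print("GRU params: ", count_params(layers.GRU))
```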
5) Key Takeaways
- GRU is a simpler alternative to LSTM.
- Uses reset and update gates to balance past and new information.
- Achieves ~87% accuracy on IMDB sentiment classification.