GRU Basics: Simplifying Recurrent Neural Networks (Lecture 13)
In this lecture, we introduce the GRU (Gated Recurrent Unit), a simpler and faster variant of the LSTM. You will learn the theory behind the GRU's gates, compare GRUs with vanilla RNNs and LSTMs, and implement a sentiment analysis model on the IMDB dataset using TensorFlow/Keras.
Table of Contents
{% toc %}
1) Why GRU?
Traditional RNNs suffer from the vanishing gradient problem, making it difficult to learn long-term dependencies.
LSTMs mitigate this with a gated cell structure, but at the cost of more parameters and slower training.
GRU Advantages:
- Fewer gates (2 instead of 3 in LSTM)
- Faster training
- Comparable accuracy in many tasks
2) GRU Architecture
GRUs rely on two gates:
- Reset Gate (r_t) – controls how much of the previous hidden state is used when forming the new candidate state
- Update Gate (z_t) – controls how much of the old hidden state is kept versus replaced by the new candidate state
Key Equations
- Update gate: $z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
- Reset gate: $r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$
- Candidate hidden state: $\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$
- Final hidden state: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$

Here $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, and $W$, $U$, $b$ are the learned weights and biases of each gate.
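To make the equations concrete, here is a minimal NumPy sketch of a single GRU time step. The weight names (`W_z`, `U_z`, `b_z`, and so on) mirror the symbols above; the toy dimensions and random initial values are assumptions purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h):
    """One GRU time step following the equations above."""
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)                # update gate
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)                # reset gate
    h_tilde = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)    # candidate state
    h_t = (1.0 - z_t) * h_prev + z_t * h_tilde                   # final hidden state
    return h_t

# Toy dimensions: 4-dimensional input, 3-dimensional hidden state
rng = np.random.default_rng(0)
d_in, d_h = 4, 3
params = [rng.standard_normal(s) for s in
          [(d_h, d_in), (d_h, d_h), (d_h,)] * 3]   # W, U, b for z, r, h
x_t = rng.standard_normal(d_in)
h_prev = np.zeros(d_h)
print(gru_step(x_t, h_prev, *params))
```

When z_t is close to 0 the unit simply carries h_{t-1} forward; when it is close to 1 the unit adopts the new candidate, which is how the GRU balances old and new information with a single state vector.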
3) Example: IMDB Sentiment Analysis with GRU
Let’s implement a GRU model to classify movie reviews as positive or negative.
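Below is a minimal Keras sketch of such a model. The hyperparameters (a 10,000-word vocabulary, reviews padded to 200 tokens, a 32-dimensional embedding, 64 GRU units, 3 training epochs) are illustrative assumptions rather than tuned settings, so feel free to adjust them.

```python
import tensorflow as tf

# Illustrative hyperparameters (assumptions, not tuned values)
vocab_size = 10000     # keep only the 10,000 most frequent words
max_len = 200          # pad/truncate every review to 200 tokens
embedding_dim = 32
gru_units = 64

# IMDB reviews come pre-encoded as sequences of word indices
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=vocab_size)
x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_len)
x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_len)

# Embedding -> GRU -> sigmoid classifier for binary sentiment
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GRU(gru_units),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train,
          epochs=3,
          batch_size=128,
          validation_split=0.2)

test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```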
Expected Output:
The exact numbers vary from run to run, but after a few epochs the model typically reaches a test accuracy of roughly 87%, the figure quoted in the takeaways below.
4) GRU vs LSTM
| Feature | LSTM | GRU |
|---|---|---|
| Gates | 3 (Forget, Input, Output) | 2 (Reset, Update) |
| Memory Units | Cell state + hidden state | Hidden state only |
| Speed | Slower | Faster |
| Accuracy | Often higher | Comparable |
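To see the size difference concretely, here is a small sketch (an illustrative comparison, not part of the original lecture) that builds an LSTM layer and a GRU layer with the same input and hidden sizes in Keras and prints their parameter counts; the GRU ends up roughly a quarter smaller because it has one fewer gate.

```python
import tensorflow as tf

units, features = 64, 128
inputs = tf.keras.Input(shape=(None, features))  # variable-length sequences

lstm = tf.keras.layers.LSTM(units)
gru = tf.keras.layers.GRU(units)

# Calling each layer on a symbolic input builds its weights
lstm(inputs)
gru(inputs)

# LSTM keeps 4 weight sets (forget, input, output, candidate cell);
# GRU keeps 3 (reset, update, candidate), so it is roughly 25% smaller.
print("LSTM parameters:", lstm.count_params())
print("GRU parameters: ", gru.count_params())
```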
5) Key Takeaways
- GRU is a simpler alternative to LSTM.
- Uses reset and update gates to balance past and new information.
- Achieves ~87% accuracy on IMDB sentiment classification.