GRU Basics: Simplifying Recurrent Neural Networks (Lecture 13)

In this lecture, we introduce the GRU (Gated Recurrent Unit), a simpler and faster variant of the LSTM. You will learn the theory behind the GRU's gates, compare it with vanilla RNNs and LSTMs, and implement a sentiment analysis model on the IMDB dataset using TensorFlow/Keras.


Table of Contents

{% toc %}


1) Why GRU?

Traditional RNNs suffer from the vanishing gradient problem, which makes it hard for them to learn long-term dependencies.
LSTMs solve this with gating, but their more complex structure (three gates plus a separate cell state) makes them slower to train.
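Why do gradients vanish? Backpropagation through time multiplies one local derivative per step; with a tanh activation each factor is at most the recurrent weight, so when that weight is below 1 (or the activation saturates) the gradient shrinks geometrically with sequence length. The toy scalar RNN below, with a made-up recurrent weight of 0.5, illustrates the effect:

```python
import numpy as np

# Toy scalar RNN: h_t = tanh(w * h_{t-1}).
# Each step's derivative w.r.t. h_{t-1} is w * (1 - h_t**2), so the
# gradient after T steps is a product of T such factors.
w, h, grad = 0.5, 1.0, 1.0
for _ in range(50):
    h = np.tanh(w * h)
    grad *= w * (1 - h ** 2)  # chain rule through one time step
print(grad)  # effectively zero: the long-range signal is lost
```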

GRU Advantages:

  • Fewer gates (2 instead of 3 in LSTM)
  • Faster training
  • Comparable accuracy in many tasks

2) GRU Architecture

GRUs rely on two gates:

  1. Reset Gate (r_t) – decides how much of the previous hidden state feeds into the new candidate state
  2. Update Gate (z_t) – decides how much of the candidate state replaces the previous hidden state

Key Equations

  • Update gate: z_t = σ(W_z · [h_{t-1}, x_t])
  • Reset gate: r_t = σ(W_r · [h_{t-1}, x_t])
  • Candidate hidden state: h̃_t = tanh(W · [r_t * h_{t-1}, x_t])
  • Final hidden state: h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
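To make the equations concrete, here is a minimal NumPy sketch of one GRU step. The weight shapes and toy sizes are made up for illustration, and biases are omitted; Keras's `GRU` layer adds biases and uses a different internal weight layout, so treat this as a conceptual reference rather than the library's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    """One GRU step following the equations above (biases omitted)."""
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    z = sigmoid(W_z @ concat)                     # update gate
    r = sigmoid(W_r @ concat)                     # reset gate
    concat_r = np.concatenate([r * h_prev, x_t])  # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(W_h @ concat_r)             # candidate hidden state
    return (1 - z) * h_prev + z * h_tilde         # final hidden state

# Toy sizes (hidden=4, input=3), random weights, 5-step sequence.
rng = np.random.default_rng(0)
hidden, inp = 4, 3
W_z, W_r, W_h = (rng.normal(size=(hidden, hidden + inp)) for _ in range(3))
h = np.zeros(hidden)
for x in rng.normal(size=(5, inp)):
    h = gru_step(h, x, W_z, W_r, W_h)
print(h)  # final hidden state after the sequence
```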

3) Example: IMDB Sentiment Analysis with GRU

Let’s implement a GRU model to classify movie reviews as positive or negative.

```python
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

# 1. Load dataset (keep only the 10,000 most frequent words)
max_features = 10000
maxlen = 200
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# 2. Pad/truncate every review to a fixed length of 200 tokens
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# 3. Build GRU model: embedding -> GRU -> sigmoid classifier
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(GRU(128))
model.add(Dense(1, activation='sigmoid'))

# 4. Compile and train
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.2)

# 5. Evaluate on the held-out test set
loss, acc = model.evaluate(x_test, y_test)
print("Test Accuracy:", acc)
```

Expected Output:

```
Epoch 1/3 - val_acc: ~0.83
Epoch 2/3 - val_acc: ~0.86
Epoch 3/3 - val_acc: ~0.87
Test Accuracy: ~0.87
```
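Once trained, the model can score individual reviews. A minimal sketch, assuming the variables from the script above (`model`, `x_test`) are still in scope:

```python
# Score one padded test review; the sigmoid output is P(positive).
sample = x_test[:1]                        # shape (1, maxlen)
prob = float(model.predict(sample)[0, 0])  # probability in [0, 1]
print("positive" if prob >= 0.5 else "negative", f"(p={prob:.2f})")
```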

4) GRU vs LSTM

| Feature | LSTM | GRU |
| --- | --- | --- |
| Gates | 3 (Forget, Input, Output) | 2 (Reset, Update) |
| Memory units | Cell state + hidden state | Hidden state only |
| Speed | Slower | Faster |
| Accuracy | Often higher | Comparable |
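The size difference is easy to check empirically, since Keras can report parameter counts for otherwise identical models. The sketch below reuses the layer sizes from the IMDB example; exact counts depend on Keras implementation details (e.g. the GRU's default `reset_after` bias arrangement), but the GRU stores three weight sets per unit versus the LSTM's four.

```python
from tensorflow.keras.layers import Embedding, GRU, LSTM, Dense
from tensorflow.keras.models import Sequential

def count_params(rnn_layer):
    # Same architecture as the IMDB example, with the RNN layer swapped in.
    m = Sequential([Embedding(10000, 128), rnn_layer, Dense(1, activation='sigmoid')])
    m.build(input_shape=(None, 200))
    return m.count_params()

print("GRU params: ", count_params(GRU(128)))
print("LSTM params:", count_params(LSTM(128)))
```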

5) Key Takeaways

  • GRU is a simpler alternative to LSTM.
  • Uses reset and update gates to balance past and new information.
  • Achieves ~87% accuracy on IMDB sentiment classification.