GRU Basics: Simplifying Recurrent Neural Networks (Lecture 13)

In this lecture, we introduce the GRU (Gated Recurrent Unit), a simpler and faster variant of the LSTM. You will learn the theory behind the GRU's gates, compare it with vanilla RNNs and LSTMs, and implement a sentiment analysis model on the IMDB dataset using TensorFlow/Keras.


Table of Contents

{% toc %}


1) Why GRU?

Traditional RNNs suffer from the vanishing gradient problem, which makes it hard for them to learn long-term dependencies.
LSTMs solve this with gating, but their more complex structure (three gates plus a separate cell state) makes them slower to train.
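Why do gradients vanish? Backpropagation through time multiplies one local derivative per step; with a tanh activation each factor is at most the recurrent weight, so when that weight is below 1 (or the activation saturates) the gradient shrinks geometrically with sequence length. The toy scalar RNN below, with a made-up recurrent weight of 0.5, illustrates the effect:

```python
import numpy as np

# Toy scalar RNN: h_t = tanh(w * h_{t-1}).
# Each step's derivative w.r.t. h_{t-1} is w * (1 - h_t**2), so the
# gradient after T steps is a product of T such factors.
w, h, grad = 0.5, 1.0, 1.0
for _ in range(50):
    h = np.tanh(w * h)
    grad *= w * (1 - h ** 2)  # chain rule through one time step
print(grad)  # effectively zero: the long-range signal is lost
```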

GRU Advantages:

  • Fewer gates (2 instead of 3 in LSTM)
  • Faster training
  • Comparable accuracy in many tasks

2) GRU Architecture

GRUs rely on two gates:

  1. Reset Gate (r_t) – decides how much of the previous hidden state feeds into the new candidate state
  2. Update Gate (z_t) – decides how much of the candidate state replaces the previous hidden state

Key Equations

  • Update gate: z_t = σ(W_z · [h_{t-1}, x_t])
  • Reset gate: r_t = σ(W_r · [h_{t-1}, x_t])
  • Candidate hidden state: h̃_t = tanh(W · [r_t * h_{t-1}, x_t])
  • Final hidden state: h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
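To make the equations concrete, here is a minimal NumPy sketch of one GRU step. The weight shapes and toy sizes are made up for illustration, and biases are omitted; Keras's `GRU` layer adds biases and uses a different internal weight layout, so treat this as a conceptual reference rather than the library's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    """One GRU step following the equations above (biases omitted)."""
    concat = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    z = sigmoid(W_z @ concat)                     # update gate
    r = sigmoid(W_r @ concat)                     # reset gate
    concat_r = np.concatenate([r * h_prev, x_t])  # [r_t * h_{t-1}, x_t]
    h_tilde = np.tanh(W_h @ concat_r)             # candidate hidden state
    return (1 - z) * h_prev + z * h_tilde         # final hidden state

# Toy sizes (hidden=4, input=3), random weights, 5-step sequence.
rng = np.random.default_rng(0)
hidden, inp = 4, 3
W_z, W_r, W_h = (rng.normal(size=(hidden, hidden + inp)) for _ in range(3))
h = np.zeros(hidden)
for x in rng.normal(size=(5, inp)):
    h = gru_step(h, x, W_z, W_r, W_h)
print(h)  # final hidden state after the sequence
```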

3) Example: IMDB Sentiment Analysis with GRU

Let’s implement a GRU model to classify movie reviews as positive or negative.

```python
import tensorflow as tf
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing import sequence
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, Dense

# 1. Load dataset (keep only the 10,000 most frequent words)
max_features = 10000
maxlen = 200
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

# 2. Pad/truncate every review to a fixed length of 200 tokens
x_train = sequence.pad_sequences(x_train, maxlen=maxlen)
x_test = sequence.pad_sequences(x_test, maxlen=maxlen)

# 3. Build GRU model: embedding -> GRU -> sigmoid classifier
model = Sequential()
model.add(Embedding(max_features, 128))
model.add(GRU(128))
model.add(Dense(1, activation='sigmoid'))

# 4. Compile and train
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=3, batch_size=64, validation_split=0.2)

# 5. Evaluate on the held-out test set
loss, acc = model.evaluate(x_test, y_test)
print("Test Accuracy:", acc)
```

Expected Output:

```
Epoch 1/3 - val_acc: ~0.83
Epoch 2/3 - val_acc: ~0.86
Epoch 3/3 - val_acc: ~0.87
Test Accuracy: ~0.87
```
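Once trained, the model can score individual reviews. A minimal sketch, assuming the variables from the script above (`model`, `x_test`) are still in scope:

```python
# Score one padded test review; the sigmoid output is P(positive).
sample = x_test[:1]                        # shape (1, maxlen)
prob = float(model.predict(sample)[0, 0])  # probability in [0, 1]
print("positive" if prob >= 0.5 else "negative", f"(p={prob:.2f})")
```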

4) GRU vs LSTM

| Feature | LSTM | GRU |
| --- | --- | --- |
| Gates | 3 (Forget, Input, Output) | 2 (Reset, Update) |
| Memory units | Cell state + hidden state | Hidden state only |
| Speed | Slower | Faster |
| Accuracy | Often higher | Comparable |
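The size difference is easy to check empirically, since Keras can report parameter counts for otherwise identical models. The sketch below reuses the layer sizes from the IMDB example; exact counts depend on Keras implementation details (e.g. the GRU's default `reset_after` bias arrangement), but the GRU stores three weight sets per unit versus the LSTM's four.

```python
from tensorflow.keras.layers import Embedding, GRU, LSTM, Dense
from tensorflow.keras.models import Sequential

def count_params(rnn_layer):
    # Same architecture as the IMDB example, with the RNN layer swapped in.
    m = Sequential([Embedding(10000, 128), rnn_layer, Dense(1, activation='sigmoid')])
    m.build(input_shape=(None, 200))
    return m.count_params()

print("GRU params: ", count_params(GRU(128)))
print("LSTM params:", count_params(LSTM(128)))
```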

5) Key Takeaways

  • GRU is a simpler alternative to LSTM.
  • Uses reset and update gates to balance past and new information.
  • Achieves ~87% accuracy on IMDB sentiment classification.