Adversarial training is a method for making a model more robust to adversarial attacks. In Keras, it can be implemented with the following steps:
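import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
# Import path for cleverhans 4.x; older 3.x releases used cleverhans.future.tf2.attacks
from cleverhans.tf2.attacks.projected_gradient_descent import projected_gradient_descent

# Build the CNN that will be adversarially trained (28x28 grayscale inputs, 10 classes)
model = tf.keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Define the PGD attack. The TF2 cleverhans API exposes PGD as a function, so a small wrapper provides generate().
# eps, eps_iter, nb_iter and the [0, 1] pixel range are illustrative assumptions; tune them for your data.
# Note: the default cleverhans attack loss expects logits, so with a softmax output the attack may be weaker.
class PGDAttack:
    def __init__(self, model, eps=0.3, eps_iter=0.01, nb_iter=40):
        self.model, self.eps, self.eps_iter, self.nb_iter = model, eps, eps_iter, nb_iter

    def generate(self, x, y=None):
        return projected_gradient_descent(
            self.model, x, self.eps, self.eps_iter, self.nb_iter, np.inf, clip_min=0.0, clip_max=1.0, y=y)

pgd_attack = PGDAttack(model)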
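# Adversarial training loop
# train_dataset is assumed to be a tf.data.Dataset yielding (images, labels) batches.
# The optimizer is not specified in the original; Adam is an assumed choice.
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
for images, labels in train_dataset:
    # Run the attack outside the tape so that the attack itself is not differentiated
    adv_images = pgd_attack.generate(images, y=labels)
    with tf.GradientTape() as tape:
        # Forward pass on the clean examples
        predictions = model(images, training=True)
        # Clean loss, averaged over the batch
        loss = loss_fn(labels, predictions)
        # Forward pass on the adversarial examples
        adv_predictions = model(adv_images, training=True)
        adv_loss = loss_fn(labels, adv_predictions)
        # Combine the clean and adversarial losses
        total_loss = loss + adv_loss
    # Backward pass and parameter update
    gradients = tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))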
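The code above uses a PGD attack to generate adversarial examples and trains the model on them inside the training loop. The total loss is the sum of the loss on the original images and the loss on the adversarial images.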
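To make the attack step less of a black box, here is a minimal sketch of what an L-infinity PGD attack does internally: repeated sign-gradient ascent on the loss, each step followed by a projection back into the eps-ball around the original input. It uses a hypothetical logistic-regression model in plain Python as a stand-in for the CNN, and the names `pgd_linf`, `input_gradient`, and `bce_loss`, as well as all numeric values, are assumptions made for illustration. Library implementations typically also support a random starting point inside the ball, which is omitted here.

```python
import math

def bce_loss(w, b, x, y):
    """Binary cross-entropy of a logistic model p = sigmoid(w.x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def input_gradient(w, b, x, y):
    """Gradient of the loss with respect to the input x, which is (p - y) * w."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * wi for wi in w]

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, eps=0.1, step=0.02, steps=10, lo=0.0, hi=1.0):
    """Iterated sign-gradient ascent, projected back into the eps-ball around x."""
    x_adv = list(x)
    for _ in range(steps):
        grad = input_gradient(w, b, x_adv, y)
        # Ascent step: move each feature along the sign of the loss gradient.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: stay within eps of the original and inside the valid range.
        x_adv = [min(max(xa, xi - eps, lo), xi + eps, hi)
                 for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical toy model and input, not taken from the CNN above.
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.5, 0.5, 0.5], 1
x_adv = pgd_linf(w, b, x, y)
clean_loss = bce_loss(w, b, x, y)
adv_loss = bce_loss(w, b, x_adv, y)
print("clean loss:", clean_loss, "adversarial loss:", adv_loss)
```

The perturbation never exceeds `eps` per feature, yet the loss on `x_adv` is higher than on `x`. That gap is what the adversarial loss term in the training loop is meant to shrink.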
# Evaluate accuracy under attack (test_dataset is assumed to yield (images, labels) batches)
adv_accuracy = tf.keras.metrics.SparseCategoricalAccuracy()
for images, labels in test_dataset:
    adv_images = pgd_attack.generate(images, y=labels)
    adv_predictions = model(adv_images, training=False)
    adv_accuracy.update_state(labels, adv_predictions)
print("Adversarial accuracy: ", adv_accuracy.result().numpy())
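Adversarial accuracy is easiest to interpret next to clean accuracy on the same data. The sketch below illustrates that comparison on a hypothetical two-feature linear classifier in plain Python, where a single sign-gradient step (FGSM-style, used here as a lightweight stand-in for PGD) already gives the worst-case L-infinity perturbation. The functions `predict`, `sign_attack`, and `accuracy`, along with the weights and data points, are assumptions made up for illustration, and no pixel-range clipping is applied.

```python
def predict(w, b, x):
    """Return the label in {0, 1} from a linear score w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def sign_attack(w, x, y, eps):
    """Shift each feature by eps in the direction that lowers the true class score."""
    direction = -1.0 if y == 1 else 1.0
    return [xi + eps * direction * ((wi > 0) - (wi < 0)) for wi, xi in zip(w, x)]

def accuracy(w, b, xs, ys, eps=0.0):
    """Fraction of points still classified correctly after the attack (eps=0 is clean)."""
    hits = sum(predict(w, b, sign_attack(w, x, y, eps)) == y for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical classifier and data: two points lie far from the boundary, two close to it.
w, b = [1.0, -1.0], 0.0
xs = [[0.9, 0.1], [0.6, 0.5], [0.2, 0.8], [0.45, 0.5]]
ys = [1, 1, 0, 0]

clean_acc = accuracy(w, b, xs, ys)
adv_acc = accuracy(w, b, xs, ys, eps=0.1)
print("clean accuracy:", clean_acc, "adversarial accuracy:", adv_acc)
```

Only the two points close to the decision boundary flip under the perturbation, so the drop from clean to adversarial accuracy indicates how much of the test set sits within `eps` of a misclassification.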
With these steps, you can implement adversarial training in Keras to improve the model's robustness.