[KERAS] Callback

2021. 10. 30. 11:27 | Deep Learning

Callbacks let you register functions to be called at specific points during training.
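
Keras also lets you define your own callback by subclassing tf.keras.callbacks.Callback and overriding hooks such as on_epoch_end; the built-in callbacks below are implemented on top of these same hooks. A minimal sketch (PrintValLoss is a hypothetical name used only for illustration):

import tensorflow as tf

class PrintValLoss(tf.keras.callbacks.Callback):
    # Toy callback: report val_loss at the end of every epoch.
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        print(f"epoch {epoch + 1}: val_loss={logs.get('val_loss')}")

# Passed to fit() like any built-in callback:
# model.fit(..., callbacks=[PrintValLoss()])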

 

1. ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', period=1)

  - Saves the model at intervals during training.

* filepath: can contain named formatting options, which are filled in with the epoch number and the keys of logs (as passed from on_epoch_end).
For example, if filepath is weights.{epoch:02d}-{val_loss:.2f}.hdf5, each checkpoint is saved with the epoch number and validation loss in the filename.
* monitor: the metric to monitor (loss or an evaluation metric)

  - val_loss: monitored for decrease

  - val_accuracy: monitored for increase
* save_best_only: whether to save only the best-performing model seen so far
* save_weights_only: whether to save only the weights

  - True: saves only the weight and bias values

  - False: also saves the layer structure, node counts, node details, activation settings, etc.

  - Note: True is recommended; pair it with model.save_weights() / model.load_weights() (see the sketch after this list)
* mode: one of {auto, min, max}. Use min if the monitored metric should decrease and max if it should increase; auto infers the direction from the monitor name.

  - min when monitor is val_loss

  - max when monitor is val_accuracy

* period: how often to check, in epochs: 1 checks every epoch, 3 checks every 3 epochs (recent TF versions deprecate period in favor of save_freq)
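
The save_weights_only flag determines how a checkpoint has to be loaded back. A minimal sketch of the two cases (each call assumes the file was saved with the matching setting; the path reuses the checkpoint name produced by the example below):

from tensorflow.keras.models import load_model

# save_weights_only=True: only the parameters are on disk, so the
# architecture must be rebuilt in code before loading the weights.
model = create_model()
model.load_weights('/kaggle/working/weights.03-0.38.hdf5')

# save_weights_only=False: the architecture is saved too, so the whole
# model can be restored in one call.
model = load_model('/kaggle/working/weights.03-0.38.hdf5')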

from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import Adam

model = create_model()
model.compile(optimizer=Adam(0.001), loss='categorical_crossentropy', metrics=['accuracy'])

mcp_cb = ModelCheckpoint(filepath='/kaggle/working/weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', 
                         save_best_only=True, save_weights_only=True, mode='min', period=3, verbose=1)
history = model.fit(x=tr_images, y=tr_oh_labels, batch_size=128, epochs=10, validation_data=(val_images, val_oh_labels),
                   callbacks=[mcp_cb])

> Result

Epoch 1/10
399/399 [==============================] - 3s 5ms/step - loss: 0.6208 - accuracy: 0.7874 - val_loss: 0.4639 - val_accuracy: 0.8360
Epoch 2/10
399/399 [==============================] - 2s 4ms/step - loss: 0.4271 - accuracy: 0.8485 - val_loss: 0.4034 - val_accuracy: 0.8553
Epoch 3/10
399/399 [==============================] - 2s 4ms/step - loss: 0.3787 - accuracy: 0.8646 - val_loss: 0.3800 - val_accuracy: 0.8649

Epoch 00003: val_loss improved from inf to 0.37997, saving model to /kaggle/working/weights.03-0.38.hdf5
Epoch 4/10
399/399 [==============================] - 2s 4ms/step - loss: 0.3516 - accuracy: 0.8738 - val_loss: 0.3673 - val_accuracy: 0.8676
Epoch 5/10
399/399 [==============================] - 2s 5ms/step - loss: 0.3331 - accuracy: 0.8798 - val_loss: 0.3420 - val_accuracy: 0.8778
Epoch 6/10
399/399 [==============================] - 2s 4ms/step - loss: 0.3127 - accuracy: 0.8871 - val_loss: 0.3651 - val_accuracy: 0.8716

Epoch 00006: val_loss improved from 0.37997 to 0.36512, saving model to /kaggle/working/weights.06-0.37.hdf5
Epoch 7/10
399/399 [==============================] - 2s 4ms/step - loss: 0.3005 - accuracy: 0.8905 - val_loss: 0.3428 - val_accuracy: 0.8756
Epoch 8/10
399/399 [==============================] - 2s 4ms/step - loss: 0.2852 - accuracy: 0.8963 - val_loss: 0.3354 - val_accuracy: 0.8769
Epoch 9/10
399/399 [==============================] - 2s 4ms/step - loss: 0.2810 - accuracy: 0.8963 - val_loss: 0.3265 - val_accuracy: 0.8813

Epoch 00009: val_loss improved from 0.36512 to 0.32655, saving model to /kaggle/working/weights.09-0.33.hdf5
Epoch 10/10
399/399 [==============================] - 2s 5ms/step - loss: 0.2694 - accuracy: 0.9007 - val_loss: 0.3181 - val_accuracy: 0.8841

 

*** Note: in a Jupyter notebook, shell commands can be run by prefixing them with an exclamation mark (!)

!ls -lia
#!rm -rf weight*
#!ls -lia

 

2. ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=0, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)
* Dynamically reduces the learning rate when performance has not improved for a given number of epochs
* monitor: the metric to monitor (loss or an evaluation metric)
* factor: factor by which the learning rate is reduced. new_lr = lr * factor
* patience: number of epochs with no improvement to wait before reducing the learning rate.
* mode: one of {auto, min, max}. Use min if the monitored metric should decrease and max if it should increase; auto infers the direction from the monitor name.

*** e.g., if val_loss has been decreasing but then plateaus or rises for patience consecutive epochs, the learning rate is multiplied by factor (the decision rule is sketched below)
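
For intuition, the core decision rule can be simulated in a few lines of plain Python. This is a simplified sketch, not the actual Keras implementation; min_delta and cooldown are ignored for clarity:

def simulate_reduce_lr(val_losses, lr=0.001, factor=0.3, patience=3, min_lr=0.0):
    best = float('inf')
    wait = 0                            # epochs since the last improvement
    for epoch, v in enumerate(val_losses, start=1):
        if v < best:
            best, wait = v, 0           # improvement: reset the counter
        else:
            wait += 1                   # no improvement this epoch
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
                print(f'epoch {epoch}: reducing lr to {lr:.6g}')
    return lr

# val_loss improves twice, then stalls for 3 epochs -> one reduction
simulate_reduce_lr([0.40, 0.38, 0.39, 0.385, 0.381])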

> Code

from tensorflow.keras.callbacks import ReduceLROnPlateau

model = create_model()
model.compile(optimizer=Adam(0.001), loss='categorical_crossentropy', metrics=['accuracy'])

rlr_cb = ReduceLROnPlateau(monitor='val_loss', factor=0.3, patience=3, mode='min', verbose=1)
history = model.fit(x=tr_images, y=tr_oh_labels, batch_size=128, epochs=30, validation_data=(val_images, val_oh_labels),
                   callbacks=[rlr_cb])

> Result (training runs for 30 epochs; whenever val_loss fails to decrease for 3 consecutive epochs, the learning rate is multiplied by 0.3)

Epoch 1/30
399/399 [==============================] - 3s 5ms/step - loss: 0.6246 - accuracy: 0.7862 - val_loss: 0.4564 - val_accuracy: 0.8412
Epoch 2/30
399/399 [==============================] - 2s 5ms/step - loss: 0.4167 - accuracy: 0.8547 - val_loss: 0.4136 - val_accuracy: 0.8520
Epoch 3/30
399/399 [==============================] - 2s 5ms/step - loss: 0.3772 - accuracy: 0.8668 - val_loss: 0.3806 - val_accuracy: 0.8647
Epoch 4/30
399/399 [==============================] - 2s 5ms/step - loss: 0.3506 - accuracy: 0.8757 - val_loss: 0.3913 - val_accuracy: 0.8581
Epoch 5/30
399/399 [==============================] - 2s 5ms/step - loss: 0.3294 - accuracy: 0.8816 - val_loss: 0.3576 - val_accuracy: 0.8702
Epoch 6/30
399/399 [==============================] - 2s 6ms/step - loss: 0.3130 - accuracy: 0.8857 - val_loss: 0.3492 - val_accuracy: 0.8758
Epoch 7/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2984 - accuracy: 0.8930 - val_loss: 0.3354 - val_accuracy: 0.8804
Epoch 8/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2905 - accuracy: 0.8931 - val_loss: 0.3288 - val_accuracy: 0.8813
Epoch 9/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2795 - accuracy: 0.8974 - val_loss: 0.3295 - val_accuracy: 0.8799
Epoch 10/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2694 - accuracy: 0.9016 - val_loss: 0.3254 - val_accuracy: 0.8834
Epoch 11/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2611 - accuracy: 0.9036 - val_loss: 0.3229 - val_accuracy: 0.8806
Epoch 12/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2530 - accuracy: 0.9064 - val_loss: 0.3449 - val_accuracy: 0.8756
Epoch 13/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2490 - accuracy: 0.9086 - val_loss: 0.3321 - val_accuracy: 0.8802
Epoch 14/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2432 - accuracy: 0.9095 - val_loss: 0.3236 - val_accuracy: 0.8847

Epoch 00014: ReduceLROnPlateau reducing learning rate to 0.0003000000142492354.
Epoch 15/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2120 - accuracy: 0.9214 - val_loss: 0.3065 - val_accuracy: 0.8904
Epoch 16/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2076 - accuracy: 0.9241 - val_loss: 0.3047 - val_accuracy: 0.8919
Epoch 17/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2047 - accuracy: 0.9248 - val_loss: 0.3078 - val_accuracy: 0.8909
Epoch 18/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2013 - accuracy: 0.9267 - val_loss: 0.3150 - val_accuracy: 0.8891
Epoch 19/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1986 - accuracy: 0.9277 - val_loss: 0.3108 - val_accuracy: 0.8904

Epoch 00019: ReduceLROnPlateau reducing learning rate to 9.000000427477062e-05.
Epoch 20/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1894 - accuracy: 0.9317 - val_loss: 0.3063 - val_accuracy: 0.8922
Epoch 21/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1876 - accuracy: 0.9318 - val_loss: 0.3069 - val_accuracy: 0.8938
Epoch 22/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1866 - accuracy: 0.9325 - val_loss: 0.3051 - val_accuracy: 0.8918

Epoch 00022: ReduceLROnPlateau reducing learning rate to 2.700000040931627e-05.
Epoch 23/30
399/399 [==============================] - 2s 6ms/step - loss: 0.1834 - accuracy: 0.9344 - val_loss: 0.3054 - val_accuracy: 0.8933
Epoch 24/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1829 - accuracy: 0.9338 - val_loss: 0.3052 - val_accuracy: 0.8932
Epoch 25/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1825 - accuracy: 0.9343 - val_loss: 0.3046 - val_accuracy: 0.8926

Epoch 00025: ReduceLROnPlateau reducing learning rate to 8.100000013655517e-06.
Epoch 26/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1815 - accuracy: 0.9350 - val_loss: 0.3053 - val_accuracy: 0.8923
Epoch 27/30
399/399 [==============================] - 2s 6ms/step - loss: 0.1813 - accuracy: 0.9349 - val_loss: 0.3053 - val_accuracy: 0.8923
Epoch 28/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1812 - accuracy: 0.9351 - val_loss: 0.3053 - val_accuracy: 0.8928

Epoch 00028: ReduceLROnPlateau reducing learning rate to 2.429999949526973e-06.
Epoch 29/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1808 - accuracy: 0.9351 - val_loss: 0.3054 - val_accuracy: 0.8923
Epoch 30/30
399/399 [==============================] - 2s 5ms/step - loss: 0.1808 - accuracy: 0.9349 - val_loss: 0.3054 - val_accuracy: 0.8924

 

3. EarlyStopping(monitor='val_loss', min_delta=0, patience=0, verbose=0, mode='auto', baseline=None, restore_best_weights=False)
Stops training early when performance has not improved for a given number of epochs
* monitor: the metric to monitor (loss or an evaluation metric)
* patience: number of epochs with no improvement to wait before stopping.
* mode: one of {auto, min, max}. Use min if the monitored metric should decrease and max if it should increase; auto infers the direction from the monitor name.

* For example, when the training loss keeps decreasing while val_loss starts to rise (i.e., overfitting)

from tensorflow.keras.callbacks import EarlyStopping

model = create_model()
model.compile(optimizer=Adam(0.001), loss='categorical_crossentropy', metrics=['accuracy'])

ely_cb = EarlyStopping(monitor='val_loss', patience=3, mode='min', verbose=1)
history = model.fit(x=tr_images, y=tr_oh_labels, batch_size=128, epochs=30, validation_data=(val_images, val_oh_labels),
                   callbacks=[ely_cb])

> Result: training stops when val_loss fails to decrease for 3 consecutive epochs

Epoch 1/30
399/399 [==============================] - 3s 6ms/step - loss: 0.5900 - accuracy: 0.7998 - val_loss: 0.4971 - val_accuracy: 0.8272
Epoch 2/30
399/399 [==============================] - 2s 5ms/step - loss: 0.4100 - accuracy: 0.8578 - val_loss: 0.4399 - val_accuracy: 0.8404
Epoch 3/30
399/399 [==============================] - 2s 5ms/step - loss: 0.3732 - accuracy: 0.8685 - val_loss: 0.4068 - val_accuracy: 0.8513
Epoch 4/30
399/399 [==============================] - 2s 5ms/step - loss: 0.3482 - accuracy: 0.8766 - val_loss: 0.3620 - val_accuracy: 0.8716
Epoch 5/30
399/399 [==============================] - 2s 6ms/step - loss: 0.3287 - accuracy: 0.8825 - val_loss: 0.3704 - val_accuracy: 0.8672
Epoch 6/30
399/399 [==============================] - 2s 5ms/step - loss: 0.3085 - accuracy: 0.8892 - val_loss: 0.3324 - val_accuracy: 0.8761
Epoch 7/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2992 - accuracy: 0.8906 - val_loss: 0.3522 - val_accuracy: 0.8734
Epoch 8/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2832 - accuracy: 0.8959 - val_loss: 0.3314 - val_accuracy: 0.8804
Epoch 9/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2745 - accuracy: 0.8997 - val_loss: 0.3185 - val_accuracy: 0.8823
Epoch 10/30
399/399 [==============================] - 2s 6ms/step - loss: 0.2674 - accuracy: 0.9015 - val_loss: 0.3168 - val_accuracy: 0.8850
Epoch 11/30
399/399 [==============================] - 2s 6ms/step - loss: 0.2581 - accuracy: 0.9038 - val_loss: 0.3485 - val_accuracy: 0.8728
Epoch 12/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2517 - accuracy: 0.9060 - val_loss: 0.3205 - val_accuracy: 0.8823
Epoch 13/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2423 - accuracy: 0.9096 - val_loss: 0.3110 - val_accuracy: 0.8883
Epoch 14/30
399/399 [==============================] - 2s 6ms/step - loss: 0.2386 - accuracy: 0.9115 - val_loss: 0.3262 - val_accuracy: 0.8877
Epoch 15/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2300 - accuracy: 0.9146 - val_loss: 0.3154 - val_accuracy: 0.8876
Epoch 16/30
399/399 [==============================] - 2s 5ms/step - loss: 0.2244 - accuracy: 0.9161 - val_loss: 0.3186 - val_accuracy: 0.8883
Epoch 00016: early stopping
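
Note that the stopped model keeps the weights from the final epoch (16), not from the best epoch (13, val_loss 0.3110). Setting restore_best_weights=True makes EarlyStopping roll the model back to the best monitored weights when it stops. This is simply the callback above with that flag enabled:

ely_cb = EarlyStopping(monitor='val_loss', patience=3, mode='min',
                       restore_best_weights=True, verbose=1)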

 

 

* Example usage

  1) Callback 1: save the model only when the validation loss improves

    - save_best_only: True

  2) Callback 2: if the loss does not improve within 5 epochs, Learning Rate = current LR * 0.2

  3) Callback 3: if the loss does not improve within 10 epochs, stop training

from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

model = create_model()
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

# Save the model only when the validation loss improves.
mcp_cb = ModelCheckpoint(filepath='/kaggle/working/weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', 
                         save_best_only=True, save_weights_only=True, mode='min', period=1, verbose=0)

# If the validation loss does not improve within 5 epochs, reduce the learning rate to current LR * 0.2.
rlr_cb = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, mode='min', verbose=1)
# If the validation loss does not improve within 10 epochs, stop training.
ely_cb = EarlyStopping(monitor='val_loss', patience=10, mode='min', verbose=1)


history = model.fit(x=tr_images, y=tr_oh_labels, batch_size=32, epochs=30, shuffle=True,
                    validation_data=(val_images, val_oh_labels),  
                    callbacks=[mcp_cb, rlr_cb, ely_cb] )

> Result (LR reduced after epoch 22; training stops early at epoch 27)

Epoch 1/30
2021-11-08 01:31:08.029196: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8005
1329/1329 [==============================] - 16s 7ms/step - loss: 1.5707 - accuracy: 0.4319 - val_loss: 1.5818 - val_accuracy: 0.4725
Epoch 2/30
1329/1329 [==============================] - 8s 6ms/step - loss: 1.1025 - accuracy: 0.6091 - val_loss: 0.9318 - val_accuracy: 0.6757
Epoch 3/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.9168 - accuracy: 0.6804 - val_loss: 0.8748 - val_accuracy: 0.7053
Epoch 4/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.8169 - accuracy: 0.7155 - val_loss: 0.8552 - val_accuracy: 0.7063
Epoch 5/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.7370 - accuracy: 0.7476 - val_loss: 0.7564 - val_accuracy: 0.7397
Epoch 6/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.6601 - accuracy: 0.7747 - val_loss: 0.6641 - val_accuracy: 0.7764
Epoch 7/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.5948 - accuracy: 0.7965 - val_loss: 0.7745 - val_accuracy: 0.7393
Epoch 8/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.5477 - accuracy: 0.8139 - val_loss: 0.6274 - val_accuracy: 0.7904
Epoch 9/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.4969 - accuracy: 0.8302 - val_loss: 0.5946 - val_accuracy: 0.7999
Epoch 10/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.4542 - accuracy: 0.8427 - val_loss: 0.7064 - val_accuracy: 0.7689
Epoch 11/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.4284 - accuracy: 0.8537 - val_loss: 0.6313 - val_accuracy: 0.7969
Epoch 12/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.3928 - accuracy: 0.8657 - val_loss: 0.6357 - val_accuracy: 0.7984
Epoch 13/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.3600 - accuracy: 0.8757 - val_loss: 0.8772 - val_accuracy: 0.7440
Epoch 14/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.3359 - accuracy: 0.8828 - val_loss: 0.5897 - val_accuracy: 0.8095
Epoch 15/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.3163 - accuracy: 0.8897 - val_loss: 0.5692 - val_accuracy: 0.8196
Epoch 16/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.2925 - accuracy: 0.8983 - val_loss: 0.5771 - val_accuracy: 0.8213
Epoch 17/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.2770 - accuracy: 0.9027 - val_loss: 0.5220 - val_accuracy: 0.8355
Epoch 18/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.2576 - accuracy: 0.9110 - val_loss: 0.8496 - val_accuracy: 0.7651
Epoch 19/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.2446 - accuracy: 0.9163 - val_loss: 0.5671 - val_accuracy: 0.8309
Epoch 20/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.2308 - accuracy: 0.9202 - val_loss: 0.6052 - val_accuracy: 0.8303
Epoch 21/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.2150 - accuracy: 0.9254 - val_loss: 0.6799 - val_accuracy: 0.8145
Epoch 22/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.2024 - accuracy: 0.9299 - val_loss: 0.6664 - val_accuracy: 0.8232

Epoch 00022: ReduceLROnPlateau reducing learning rate to 0.00020000000949949026.
Epoch 23/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.1253 - accuracy: 0.9566 - val_loss: 0.5437 - val_accuracy: 0.8607
Epoch 24/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.0989 - accuracy: 0.9652 - val_loss: 0.5878 - val_accuracy: 0.8515
Epoch 25/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.0834 - accuracy: 0.9700 - val_loss: 0.5921 - val_accuracy: 0.8587
Epoch 26/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.0784 - accuracy: 0.9725 - val_loss: 0.5951 - val_accuracy: 0.8569
Epoch 27/30
1329/1329 [==============================] - 8s 6ms/step - loss: 0.0718 - accuracy: 0.9741 - val_loss: 0.6518 - val_accuracy: 0.8548

Epoch 00027: ReduceLROnPlateau reducing learning rate to 4.0000001899898055e-05.
Epoch 00027: early stopping
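
After a run like this, the best checkpoint on disk is typically reloaded for evaluation. A minimal sketch reusing the notebook's own names (create_model, Adam, val_images, val_oh_labels); since the files were saved with save_weights_only=True, the architecture is rebuilt first:

import glob

# With save_best_only=True the most recently written file holds the best
# val_loss seen at save time (epoch numbers are zero-padded, so a
# lexicographic sort is chronological up to epoch 99).
best_path = sorted(glob.glob('/kaggle/working/weights.*.hdf5'))[-1]

model = create_model()
model.compile(optimizer=Adam(learning_rate=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])
model.load_weights(best_path)
model.evaluate(val_images, val_oh_labels)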