[문현서] Wide Residual Networks

Wide Residual Networks (2016)

https://arxiv.org/pdf/1605.07146

[Summary]

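- Deep residual networks can be scaled up to thousands of layers with improving performance, but each fraction of a percent of accuracy gained costs nearly double the number of layers -> training becomes very slow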

- Decrease the depth and increase the width of residual networks to improve performance (Wide Residual Networks (WRNs)) -> achieves SOTA

Ex) A simple 16-layer WRN outperforms all previous deep ResNets in both accuracy and efficiency
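
To make the width-versus-depth trade-off concrete, here is a rough parameter-count sketch. It assumes the WRN-n-k layout described later in the paper (three groups of basic two-conv blocks with widths 16k, 32k, 64k, and depth n = 6N + 4 for N blocks per group). The function names, the 3-channel input, the 10-class head, and the 1x1 projection on width-changing shortcuts are illustrative assumptions, and batch-norm parameters are ignored.

```python
def conv_params(c_in, c_out, kernel=3):
    """Weight count of a bias-free kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out


def wrn_param_count(depth, k, num_classes=10):
    """Approximate weight count of a WRN-depth-k with basic (two 3x3 conv) blocks.

    Hypothetical helper for illustration; ignores batch-norm parameters.
    """
    assert (depth - 4) % 6 == 0, "basic-block WRN assumes depth = 6N + 4"
    n_blocks = (depth - 4) // 6

    total = conv_params(3, 16)  # initial 3x3 conv on an RGB input
    c_in = 16
    for base_width in (16, 32, 64):
        c_out = base_width * k  # widening factor k scales every group
        for _ in range(n_blocks):
            total += conv_params(c_in, c_out) + conv_params(c_out, c_out)
            if c_in != c_out:
                # assumed 1x1 projection when the shortcut changes width
                total += conv_params(c_in, c_out, kernel=1)
            c_in = c_out
    total += c_in * num_classes + num_classes  # final linear classifier
    return total


if __name__ == "__main__":
    for depth, k in [(16, 4), (16, 8), (28, 4)]:
        print(f"WRN-{depth}-{k}: {wrn_param_count(depth, k):,} weights")
```

Under these assumptions, doubling k roughly quadruples the weight count, while doubling the number of blocks per group only roughly doubles it.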

[Introduction]

- increase in the number of layers in CNNs -> improvements in image recognition tasks

- but training deep networks has several difficulties: exploding/vanishing gradients and degradation

- up to this point, the study of residual networks focused mainly on the order of activations inside a ResNet block and the depth of residual blocks

-> Q. how do aspects other than the order of activations affect performance?

1. Width vs depth in residual networks

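- Focus: the identity shortcut that makes training very deep networks possible can also be a weakness of deep ResNets (because of the shortcut, the gradient is not forced to pass through the residual branch, so some blocks may not be trained sufficiently)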
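
A toy sketch of why the shortcut cuts both ways. It assumes scalar, linear blocks (y = x + w*x with a shortcut, y = w*x without), which is a simplification for illustration and not the paper's architecture; `input_gradient` is a hypothetical helper.

```python
def input_gradient(weights, shortcut):
    """d(output)/d(input) through a stack of scalar linear blocks.

    With a shortcut each block is y = x + w * x, so its local gradient is 1 + w.
    Without one each block is y = w * x, so its local gradient is w.
    """
    grad = 1.0
    for w in weights:
        grad *= (1.0 + w) if shortcut else w
    return grad


if __name__ == "__main__":
    depth = 50
    # Plain stack: the gradient shrinks geometrically with depth.
    print("plain, w=0.5:   ", input_gradient([0.5] * depth, shortcut=False))
    # Residual stack: the gradient survives even when every residual
    # branch contributes nothing (w = 0).
    print("residual, w=0.0:", input_gradient([0.0] * depth, shortcut=True))
```

The second case is the point of the note above: the gradient reaches the input intact even if no residual branch has learned anything, so nothing forces every block to become useful.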

2. Use of dropout in ResNet blocks

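- Dropout (zeroing some of a layer's outputs so the model does not rely too heavily on particular features -> regularizes training) has typically been replaced by batch normalization (normalizing the distribution of a layer's outputs -> stabilizes training), since the latter gave better accuracy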
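
The two techniques in the bullet above can be sketched on a plain list of activations. This is a minimal illustration using only the standard library: it assumes the "inverted" dropout variant (survivors are rescaled by 1 / (1 - p) at training time) and omits batch normalization's learnable scale and shift as well as its running statistics. Function names are hypothetical.

```python
import random
import statistics


def dropout(values, p, rng):
    """Inverted dropout: zero each value with probability p, rescale survivors."""
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def batch_norm(values, eps=1e-5):
    """Normalize a batch of activations to roughly zero mean and unit variance."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # population variance of the batch
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the dropout mask is reproducible
    activations = [1.0, 2.0, 3.0, 4.0]
    print("dropout:   ", dropout(activations, p=0.5, rng=rng))
    print("batch norm:", batch_norm(activations))
```

Dropout injects noise per activation, while batch normalization rescales using statistics of the whole batch; the two are not mutually exclusive.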