
Data Science LAB

ImageNet Classification with Deep Convolutional Neural Networks: Paper Review and Summary

📜 Paper review

ㅅ ㅜ ㅔ ㅇ 2023. 4. 24. 00:33

This post reviews ImageNet Classification with Deep Convolutional Neural Networks (AlexNet), a paper that can fairly be called one of the foundations of deep learning.

Link : https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

 

0. Abstract

  • Trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1000 different classes
  • Achieved top-1 and top-5 error rates of 37.5% and 17.0% on the test data, considerably better than the previous state of the art
  • Used a very efficient GPU implementation of the convolution operation to speed up training
  • Used 'dropout', a new regularization method at the time, to reduce overfitting
  • The network consists of 5 convolutional layers and 3 fully connected layers

 

 

1. Introduction

  • Until then, datasets were relatively small, on the order of tens of thousands of images, which is enough for simple recognition tasks (e.g. the MNIST dataset). Real-world objects vary far more than such data, so much larger datasets are needed → ImageNet appeared, a dataset of over 15 million high-resolution images in more than 22,000 categories
  • Handling a dataset this large requires a model with a larger learning capacity
  • The capacity of a CNN is controlled by varying its depth and breadth. Compared with a standard feedforward network with similarly sized layers, a CNN has far fewer connections and parameters and is therefore easier to train, although its theoretically best performance may be slightly worse
  • Contributions of this paper
    • Trained one of the largest CNNs to date on the ImageNet subsets used in the ILSVRC-2010 and ILSVRC-2012 competitions and achieved the best results reported on them
    • Released a highly optimized GPU implementation of 2D convolution and of the other operations needed to train CNNs
    • Used several techniques to prevent overfitting
    • The network uses 5 conv layers and 3 FC layers, and removing any one of the conv layers clearly degrades performance (every layer matters)

 

 

 

2. The Dataset

  • ImageNet is a dataset of over 15 million labeled high-resolution images belonging to roughly 22,000 categories
  • ILSVRC uses a subset of ImageNet with roughly 1000 images in each of 1000 categories: about 1.2 million training images, 50,000 validation images, and 150,000 test images in total
  • Two error rates are reported
    • top-1 : the fraction of test images for which the most probable predicted label is not the correct label
    • top-5 : the fraction of test images for which the correct label is not among the 5 labels the model considers most probable
  • ImageNet consists of variable-resolution images, so each image is down-sampled to a fixed resolution of 256 × 256: the shorter side is rescaled to 256, then the central 256 × 256 patch is cropped out (CenterCrop)
  • No other preprocessing is applied apart from centering each pixel, i.e. subtracting the mean activity over the training set
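
The two error rates can be made concrete with a small sketch. The function name `top_k_error` and the toy scores below are illustrative assumptions, not something taken from the paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 images, 6 classes (hypothetical scores).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true class 1: ranked first
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true class 4: ranked fifth
    [0.40, 0.30, 0.15, 0.10, 0.04, 0.01],  # true class 5: ranked last
]
labels = [1, 4, 5]
top1 = top_k_error(scores, labels, k=1)  # 2 of 3 images are wrong
top5 = top_k_error(scores, labels, k=5)  # 1 of 3 images is wrong
```

The second image shows why top-5 error is always at most top-1 error: the correct class is not the best guess, but it is still among the five best.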

 

 

 

3. The Architecture

The network consists of 8 layers in total: 5 convolutional layers and 3 fully-connected layers.

3.1 ReLU Nonlinearity

  • A neuron's output is usually passed through tanh(x) or a sigmoid → with gradient descent, these saturating activation functions slow training down considerably
  • ReLU is used as a non-saturating nonlinearity (faster than tanh)
  • f(x) = max(0,x)
  • Figure 1 of the paper shows that training is much faster with ReLU than with tanh: a four-layer CNN on CIFAR-10 reaches a 25% training error rate six times faster
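
A quick way to see what "saturating" means is to compare the derivatives of the two functions. This is only an illustrative sketch; the helper names are assumptions.

```python
import math


def relu(x):
    return max(0.0, x)


def relu_grad(x):
    # Derivative of max(0, x): 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    # d/dx tanh(x) = 1 - tanh(x)^2, which shrinks toward 0 as |x| grows.
    return 1.0 - math.tanh(x) ** 2


for x in (0.5, 2.0, 5.0):
    print(f"x={x}: tanh'={tanh_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

For large positive inputs the tanh gradient is close to zero, so little error signal flows back through the unit, while the ReLU gradient stays at 1.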

3.2 Training on Multiple GPUs

  • A single GTX 580 GPU has only 3GB of memory, which limits the size of the network that can be trained on it, so the network is trained across two GPUs
  • Current GPUs can read from and write to one another's memory directly, without going through host memory, which makes them particularly well suited to cross-GPU parallelization
  • The parallelization scheme puts half of the kernels on each GPU, with the additional trick that the two GPUs communicate only in certain layers. For example, the kernels of layer 3 take input from all kernel maps in layer 2, whereas the kernels of layer 4 take input only from the layer 3 kernel maps that reside on the same GPU
  • The two-GPU network takes slightly less time to train than a one-GPU network
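
The split explains the kernel depths that appear in the layer list of section 3.5. As a rough sketch (the `LAYERS` table and `kernel_depth` helper are illustrative names, and the channel counts are the ones listed in section 3.5):

```python
# (layer name, number of kernel maps in the previous layer, restricted to the same GPU?)
LAYERS = [
    ("conv2", 96, True),    # sees only the conv1 maps on its own GPU
    ("conv3", 256, False),  # cross-GPU: sees all conv2 maps
    ("conv4", 384, True),
    ("conv5", 384, True),
]


def kernel_depth(in_channels, same_gpu_only, n_gpus=2):
    """Number of input kernel maps a single kernel is connected to."""
    return in_channels // n_gpus if same_gpu_only else in_channels


depths = {name: kernel_depth(c, split) for name, c, split in LAYERS}
print(depths)  # {'conv2': 48, 'conv3': 256, 'conv4': 192, 'conv5': 192}
```

In modern frameworks the same connectivity is usually expressed as a grouped convolution with two groups rather than as two physical GPUs.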

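3.3 Local Response Normalization

  • ReLUs do not require input normalization to prevent them from saturating, but the following local normalization scheme still helps generalization
  • With a^i_{x,y} the activity of the neuron obtained by applying kernel i at position (x, y) followed by the ReLU, the response-normalized activity b^i_{x,y} is

    $$b^{i}_{x,y} = \frac{a^{i}_{x,y}}{\left(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left(a^{j}_{x,y}\right)^{2}\right)^{\beta}}$$

  • The sum runs over n "adjacent" kernel maps at the same spatial position, and N is the total number of kernels in the layer
  • The ordering of the kernel maps is arbitrary and is fixed before training begins
  • Response normalization implements a form of lateral inhibition: it creates competition between the outputs computed by different kernels
  • The constants k, n, α, β are hyperparameters; the paper uses k = 2, n = 5, α = 10^-4, β = 0.75
  • Normalization reduces the top-1 and top-5 error rates by 1.4% and 1.2%, respectively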
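
As a minimal sketch of the normalization at a single spatial position, assuming the half-width n/2 is rounded down to an integer (the paper does not spell this out) and using the hyperparameters quoted above as defaults:

```python
def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize the activities `a` of all N kernel maps at one spatial position."""
    N = len(a)
    half = n // 2  # assumption: integer half-width of the neighborhood
    b = []
    for i in range(N):
        lo = max(0, i - half)
        hi = min(N - 1, i + half)
        # Sum of squared activities over the adjacent kernel maps (inclusive range).
        sq_sum = sum(a[j] ** 2 for j in range(lo, hi + 1))
        b.append(a[i] / (k + alpha * sq_sum) ** beta)
    return b


activities = [0.0, 1.0, 2.0, 50.0, 2.0, 1.0, 0.0]  # hypothetical ReLU outputs
normalized = local_response_norm(activities)
```

The units next to the large activity (50.0) are divided by a larger denominator than they would be without it, which is the competition between kernels described above.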

3.4 Overlapping Pooling

  • A pooling layer in a CNN summarizes the outputs of neighboring groups of neurons in the same kernel map
  • Traditionally, the neighborhoods summarized by adjacent pooling units do not overlap
  • With z the pooling window size and s the stride, s = z gives the traditional non-overlapping pooling and s < z gives overlapping pooling
  • The paper uses z = 3 and s = 2, i.e. overlapping pooling, which made the network slightly less prone to overfitting
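
The difference between the two settings is easiest to see in one dimension. The helper below is an illustrative sketch, not the paper's implementation:

```python
def max_pool_1d(x, z, s):
    """Max-pool a 1D sequence with window size z and stride s (no padding)."""
    return [max(x[i:i + z]) for i in range(0, len(x) - z + 1, s)]


row = [1, 5, 2, 8, 9, 7, 4]
non_overlapping = max_pool_1d(row, z=2, s=2)  # windows share no inputs
overlapping = max_pool_1d(row, z=3, s=2)      # neighboring windows share one input
print(non_overlapping, overlapping)  # [5, 8, 9] [5, 9, 9]
```

Both settings produce an output of the same length here, but with z = 3 each input near a window boundary contributes to two outputs.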

3.5 Overall Architecture

  • The network contains 8 layers with weights: the first 5 are convolutional layers and the remaining 3 are fully-connected layers. The output of the last FC layer is fed to a 1000-way softmax, which produces a distribution over the 1000 class labels
  • The kernels of the 2nd, 4th, and 5th conv layers are connected only to the kernel maps of the previous layer that reside on the same GPU
  • The kernels of the 3rd conv layer are connected to all kernel maps of the 2nd conv layer
  • Response-normalization layers follow the 1st and 2nd conv layers
  • Max-pooling layers follow the 1st, 2nd, and 5th conv layers (after response normalization where it is present)
  • ReLU is applied to the output of every conv layer and FC layer
  • Layer by layer
  1. Conv layer1: filters the 224x224x3 input image with 96 kernels of size 11x11x3 with a stride of 4
  2. Conv layer2: takes the response-normalized and pooled output of conv layer1 and filters it with 256 kernels of size 5x5x48
  3. Conv layer3: takes the response-normalized and pooled output of conv layer2 and filters it with 384 kernels of size 3x3x256
  4. Conv layer4: filters the output of conv layer3 with 384 kernels of size 3x3x192
  5. Conv layer5: filters the output of conv layer4 with 256 kernels of size 3x3x192
  6. FC layer1: takes the pooled output of conv layer5 and produces 4096 outputs
  7. FC layer2: maps the output of fc layer1 to 4096 outputs
  8. FC layer3: maps the output of fc layer2 to 1000 outputs
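
The layer list can be checked with a little arithmetic. The sketch below rests on two assumptions that are not in the list above: a 227x227 input (with the stated 224x224 and no padding, the first convolution does not divide evenly) and padding of 2 for conv layer2 and 1 for conv layers 3 to 5. The kernel sizes and counts are the ones listed.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


size = 227                            # assumed input size (see note above)
size = conv_out(size, 11, stride=4)   # conv1 -> 55
size = conv_out(size, 3, stride=2)    # pool  -> 27
size = conv_out(size, 5, pad=2)       # conv2 -> 27 (assumed padding 2)
size = conv_out(size, 3, stride=2)    # pool  -> 13
for _ in range(3):                    # conv3, conv4, conv5 keep 13 (assumed padding 1)
    size = conv_out(size, 3, pad=1)
size = conv_out(size, 3, stride=2)    # pool  -> 6

# (kernel height, kernel width, kernel depth, number of kernels)
convs = [(11, 11, 3, 96), (5, 5, 48, 256), (3, 3, 256, 384),
         (3, 3, 192, 384), (3, 3, 192, 256)]
# (inputs, outputs) of the three fully-connected layers
fcs = [(size * size * 256, 4096), (4096, 4096), (4096, 1000)]

params = sum(h * w * d * n + n for h, w, d, n in convs)  # weights + biases
params += sum(i * o + o for i, o in fcs)
print(size, params)  # 6 60965224
```

Under these assumptions the total comes to about 61 million parameters, in line with the roughly 60 million the paper reports, and almost all of them sit in the fully-connected layers.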

 

 

 

4. Reducing Overfitting

The neural network has 60 million parameters → overfitting is a real risk

4.1 Data Augmentation

  • The dataset is artificially enlarged using label-preserving transformations
  • Two methods are used (neither requires storing the transformed images on disk)
    • Image translations and horizontal reflections : random 224x224 patches are cropped from the 256x256 images (a plain random crop without rescaling, i.e. RandomCrop rather than RandomResizedCrop) and RandomHorizontalFlip is applied → the training set grows by a factor of 2048
    • Altering the intensities of the RGB channels : p_i and λ_i are the i-th eigenvector and eigenvalue of the 3×3 covariance matrix of RGB pixel values, and α_i is a random variable drawn from a Gaussian with mean 0 and standard deviation 0.1. The following quantity is added to each RGB image pixel

      $$[\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3]\,[\alpha_1 \lambda_1,\ \alpha_2 \lambda_2,\ \alpha_3 \lambda_3]^{T}$$

  • The RGB intensity scheme reduces the top-1 error rate by over 1%
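
The first augmentation can be sketched in plain Python on a nested-list image. The function name and the synthetic image are illustrative assumptions. The factor of 2048 corresponds to 32 x 32 crop offsets times 2 reflections, while this sketch allows every valid offset (33 per axis).

```python
import random


def random_crop_and_flip(image, crop=224, rng=random):
    """Take a random crop x crop patch from a square image (list of rows), flipped half of the time."""
    size = len(image)
    top = rng.randrange(size - crop + 1)
    left = rng.randrange(size - crop + 1)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch


# Hypothetical 256x256 single-channel image whose pixel value encodes its position.
image = [[r * 256 + c for c in range(256)] for r in range(256)]
patch = random_crop_and_flip(image, rng=random.Random(0))

# 32 horizontal offsets x 32 vertical offsets x 2 reflections
variants = (256 - 224) ** 2 * 2
print(len(patch), len(patch[0]), variants)  # 224 224 2048
```

Because the patch is generated on the fly from the stored 256x256 image, nothing extra has to be written to disk, which matches the note above.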

4.2 Dropout

  • Dropout stops signals from passing through particular neurons, chosen according to a user-specified probability
  • It consists of setting the output of each hidden neuron to 0 with probability 0.5
  • Neurons whose output is dropped in this way do not take part in backpropagation either
  • A different set of neurons is active for every input, which reduces complex co-adaptations between neurons → each neuron is forced to learn features that remain useful in combination with many different subsets of the other neurons
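
A minimal sketch of dropout on one layer's outputs follows. The test-time branch is not covered in the bullets above: in the paper, all neurons are used at test time and their outputs are multiplied by 0.5. The function name and sample values are assumptions.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Zero units at random while training; scale all units at test time."""
    if training:
        # Each unit is independently set to 0 with probability p_drop.
        return [0.0 if rng.random() < p_drop else a for a in activations]
    # At test time every unit is kept and scaled by the keep probability.
    return [a * (1.0 - p_drop) for a in activations]


hidden = [0.8, 1.5, 0.2, 2.0, 1.1, 0.4]  # hypothetical hidden-layer outputs
train_out = dropout(hidden, training=True, rng=random.Random(42))
test_out = dropout(hidden, training=False)
```

Most current libraries implement the equivalent "inverted" variant, which rescales the surviving units during training instead of scaling at test time.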

 

 

 

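5. Details of learning

  • The model is trained with stochastic gradient descent with a batch size of 128, momentum 0.9, and weight decay 0.0005
  • Weight initialization: Gaussian distribution with μ=0, σ=0.01
  • Bias initialization: 2nd, 4th, and 5th conv layers and the hidden FC layers -> 1 / 1st and 3rd conv layers -> 0
  • Initial learning rate: 0.01
  • Epochs: 90
  • The weights are updated with the rule below, where i is the iteration index, v the momentum variable, ε the learning rate, and the last term the average over the i-th batch D_i of the gradient of the objective L with respect to w, evaluated at w_i

    $$v_{i+1} = 0.9\, v_i - 0.0005\, \epsilon\, w_i - \epsilon \left\langle \frac{\partial L}{\partial w}\Big|_{w_i} \right\rangle_{D_i}$$
    $$w_{i+1} = w_i + v_{i+1}$$

  • In the layers whose biases are initialized to 1, the ReLUs receive positive inputs, which accelerates the early stages of learning
  • The same learning rate is used for all layers; whenever the validation error stops improving at the current learning rate, the learning rate is divided by 10 → it was reduced 3 times before training ended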
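
The update rule and the learning-rate schedule can be sketched for a single scalar weight. The function names and the gradient value are illustrative assumptions; the constants are the ones listed above.

```python
def sgd_step(w, v, grad, lr, momentum=0.9, weight_decay=0.0005):
    """One momentum-SGD update of a single weight: returns the new (w, v)."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v


def decay_lr(lr, factor=10.0):
    """Divide the learning rate by 10 when the validation error stops improving."""
    return lr / factor


w, v, lr = 1.0, 0.0, 0.01
w, v = sgd_step(w, v, grad=2.0, lr=lr)  # grad: hypothetical batch-averaged gradient

# Three reductions over the course of training: 0.01 -> 0.00001
final_lr = lr
for _ in range(3):
    final_lr = decay_lr(final_lr)
```

After this one step v is -0.020005 and w is 0.979995: the gradient term dominates, and the weight-decay term contributes the small extra 0.000005.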

 

 

 

6. Results

  • The ILSVRC-2010 results (Table 1 in the paper) show that the CNN is more effective than the other approaches it is compared with
  • Producing the final classification from the averaged outputs of several models rather than a single model is called an ensemble; the ILSVRC-2012 results (Table 2 in the paper) confirm that ensembling improves performance
  • Several similar CNNs give better results than a single CNN. The entries marked 1 CNN* / 7 CNNs* in that table are models that were first pre-trained on a larger ImageNet release and then fine-tuned on ILSVRC-2012, i.e. they start from previously learned weights; they perform better than models whose weights are trained from scratch
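
Averaging the predictions of several models can be sketched as follows. The probabilities are made-up values for one image, and the helper names are assumptions.

```python
def average_predictions(model_probs):
    """Average the class-probability vectors produced by several models for one image."""
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical softmax outputs of three models over four classes.
model_probs = [
    [0.40, 0.35, 0.15, 0.10],
    [0.20, 0.60, 0.10, 0.10],
    [0.45, 0.40, 0.10, 0.05],
]
ensemble = average_predictions(model_probs)
prediction = argmax(ensemble)  # class 1
```

Two of the three models narrowly prefer class 0 on their own, but the averaged distribution favors class 1, which shows how an ensemble can override weakly held individual predictions.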