Data Science LAB
๐Ÿง  Deep Learning

[Deep Learning] Comparing Optimization Methods (SGD, Momentum, AdaGrad, Adam)

ㅅ ㅜ ㅔ ㅇ 2022. 11. 18. 00:20

1. SGD (Stochastic Gradient Descent)

  • Computes the gradient of each parameter and repeatedly updates the parameter values in the downhill direction, gradually approaching the optimum
  • Unlike plain gradient descent, it adjusts the weights using only a randomly sampled subset of the data

SGD์˜ ๋‹จ์ 
 : ๋น„๋“ฑ๋ฐฉ์„ฑ ํ•จ์ˆ˜(๊ธฐ์šธ๊ธฐ๊ฐ€ ๋‹ฌ๋ผ์ง€๋Š” ํ•จ์ˆ˜)์—์„œ๋Š” ํƒ์ƒ‰ ๊ฒฝ๋กœ๊ฐ€ ๋น„ํšจ์œจ์ ์ž„

 

 

- ํŒŒ์ด์ฌ ์ฝ”๋“œ ๊ตฌํ˜„

class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        # Move each parameter a small step against its gradient.
        for key in params.keys():
            params[key] -= self.lr * grads[key]
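A quick sanity check of the class above on a toy problem (the function and learning rate here are illustrative, not from the post):

```python
import numpy as np

# Self-contained copy of the SGD class above.
class SGD:
    def __init__(self, lr=0.01):
        self.lr = lr

    def update(self, params, grads):
        for key in params.keys():
            params[key] -= self.lr * grads[key]

# Toy problem: minimize f(x) = x^2, whose gradient is 2x.
params = {'x': np.array(10.0)}
optimizer = SGD(lr=0.1)
for _ in range(100):
    grads = {'x': 2 * params['x']}
    optimizer.update(params, grads)
# Each update multiplies x by (1 - 2 * lr) = 0.8, so x decays toward 0.
```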

2. Momentum

  • ๋ชจ๋ฉ˜ํ…€์€ '์šด๋™๋Ÿ‰'์„ ๋œปํ•˜๋Š” ๋‹จ์–ด๋กœ ๊ธฐ์šธ๊ธฐ ๋ฐฉํ–ฅ์œผ๋กœ ํž˜์„ ๋ฐ›์•„ ๋ฌผ์ฒด๊ฐ€ ๊ฐ€์†๋œ๋‹ค๋Š” ๋ฌผ๋ฆฌ ๋ฒ•์น™์„ ์ด์šฉํ•ด ๊ณต์ด ๊ทธ๋ฆ‡์˜ ๋ฐ”๋‹ฅ์„ ๊ตฌ๋ฅด๋Š” ๋“ฏํ•œ ์›€์ง์ž„์„ ๋ณด์—ฌ์คŒ
  • ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๋งค๋ฒˆ ๊ธฐ์šธ๊ธฐ๋ฅผ ๊ตฌํ•˜์ง€๋งŒ, ์ด์ „์˜ ๋ฐฉํ–ฅ์„ ์ฐธ๊ณ ํ•˜์—ฌ (+,-) ๊ฐ™์€ ๋ฐฉํ–ฅ์œผ๋กœ ์ผ์ • ๋น„์œจ๋งŒ์„ ์ˆ˜์ •๋˜๊ฒŒ ํ•˜๋Š” ๋ฐฉ๋ฒ•
  • +, - ๋ฐฉํ–ฅ์ด ๋ฒˆ๊ฐˆ์•„๊ฐ€๋ฉฐ ๋‚˜ํƒ€๋‚˜๋Š” ์ง€๊ทธ์žฌ๊ทธ ํ˜„์ƒ์„ ์ค„์ด๊ณ , ์ด์ „ ์ด๋™ ๊ฐ’์„ ๊ณ ๋ คํ•ด์„œ ์ผ์ • ๋น„์œจ๋งŒํผ ๋‹ค์Œ ๊ฐ’์„ ๊ฒฐ์ •ํ•˜๋ฏ€๋กœ ๊ด€์„ฑ์˜ ์„ฑ์งˆ์„ ์ด์šฉํ•  ์ˆ˜ ์žˆ์Œ

- ํŒŒ์ด์ฌ ์ฝ”๋“œ ๊ตฌํ˜„

import numpy as np

class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None  # velocity, created lazily to match parameter shapes

    def update(self, params, grads):
        if self.v is None:
            self.v = {}
            for key, val in params.items():
                self.v[key] = np.zeros_like(val)

        for key in params.keys():
            # Decay the previous velocity, then take a gradient step.
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]
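A minimal sketch of the inertia effect described above, on the anisotropic function f(x, y) = x²/20 + y² (the starting point and hyperparameters are illustrative assumptions):

```python
import numpy as np

# Self-contained copy of the Momentum class above.
class Momentum:
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr = lr
        self.momentum = momentum
        self.v = None

    def update(self, params, grads):
        if self.v is None:
            self.v = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.v[key] = self.momentum * self.v[key] - self.lr * grads[key]
            params[key] += self.v[key]

# Anisotropic toy function f(x, y) = x^2 / 20 + y^2.
def gradient(p):
    return {'x': p['x'] / 10.0, 'y': 2.0 * p['y']}

params = {'x': np.array(-7.0), 'y': np.array(2.0)}
optimizer = Momentum(lr=0.1, momentum=0.9)
for _ in range(300):
    optimizer.update(params, gradient(params))
# The accumulated velocity damps the zigzag along y and speeds up the flat x axis.
```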

3. AdaGrad

  • Adapts the learning rate of each individual parameter as training progresses
  • Adds a mechanism that scales the learning rate according to how much each parameter has been updated
  • Gives a large learning rate to parameters that have changed little, and a small one to parameters that have changed a lot (a parameter that has already changed a lot is assumed to be near its optimum, so it moves in small steps for fine adjustment; conversely, parameters that have changed little get a large learning rate to reduce the loss quickly)

- ํŒŒ์ด์ฌ ์ฝ”๋“œ ๊ตฌํ˜„

import numpy as np

class AdaGrad:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None  # running sum of squared gradients

    def update(self, params, grads):
        if self.h is None:
            self.h = {}
            for key, val in params.items():
                self.h[key] = np.zeros_like(val)

        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            # 1e-7 avoids division by zero while h is still 0.
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)
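A sketch of the per-parameter scaling on the same anisotropic function (the learning rate and starting point are illustrative assumptions):

```python
import numpy as np

# Self-contained copy of the AdaGrad class above.
class AdaGrad:
    def __init__(self, lr=0.01):
        self.lr = lr
        self.h = None

    def update(self, params, grads):
        if self.h is None:
            self.h = {key: np.zeros_like(val) for key, val in params.items()}
        for key in params.keys():
            self.h[key] += grads[key] * grads[key]
            params[key] -= self.lr * grads[key] / (np.sqrt(self.h[key]) + 1e-7)

# Anisotropic toy function f(x, y) = x^2 / 20 + y^2.
def gradient(p):
    return {'x': p['x'] / 10.0, 'y': 2.0 * p['y']}

params = {'x': np.array(-7.0), 'y': np.array(2.0)}
optimizer = AdaGrad(lr=1.5)  # illustrative learning rate
for _ in range(500):
    optimizer.update(params, gradient(params))
# The flat x direction keeps a large effective step while the steep y direction shrinks fast.
```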

4. Adam

  • Combines the strengths of Momentum and AdaGrad
  • Applies bias correction (to the moment estimates)

- ํŒŒ์ด์ฌ ์ฝ”๋“œ ๊ตฌํ˜„

import numpy as np

class Adam:
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999):
        self.lr = lr
        self.beta1 = beta1
        self.beta2 = beta2
        self.iter = 0
        self.m = None  # first moment (moving average of gradients)
        self.v = None  # second moment (moving average of squared gradients)

    def update(self, params, grads):
        if self.m is None:
            self.m, self.v = {}, {}
            for key, val in params.items():
                self.m[key] = np.zeros_like(val)
                self.v[key] = np.zeros_like(val)

        self.iter += 1
        # Bias-corrected learning rate for the current step.
        lr_t = self.lr * np.sqrt(1.0 - self.beta2**self.iter) / (1.0 - self.beta1**self.iter)

        for key in params.keys():
            self.m[key] += (1 - self.beta1) * (grads[key] - self.m[key])
            self.v[key] += (1 - self.beta2) * (grads[key] - self.v[key])

            params[key] -= lr_t * self.m[key] / (np.sqrt(self.v[key]) + 1e-7)

Comparison of the Four Optimization Methods

๋ฐ์ดํ„ฐ์˜ ์ข…๋ฅ˜์— ๋”ฐ๋ผ ๋งž์ถฐ ์„ ํƒํ•ด์•ผํ•จ

ํ•™์Šต ์ง„๋„ ๋ณ„ ๋น„๊ต

The figure shows the training progress when each optimization method is applied to MNIST, the best-known benchmark dataset.

MNIST์—๋Š” AdaGrad๊ฐ€ ๋ฏธ์„ธํ•˜๊ฒŒ ๊ฐ€์žฅ ์ข‹์•„๋ณด์ด๋Š” ๊ฒƒ์„ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค. 
