
What is: Stochastic Gradient Descent?

Year: 1951
Data Source: CC BY-SA - https://paperswithcode.com

Stochastic Gradient Descent is an iterative optimization technique that uses minibatches of data to form an estimate of the gradient, rather than computing the full gradient over all available data. That is, for weights $w$ and a loss function $L$ we have:

$$w_{t+1} = w_{t} - \eta\hat{\nabla}_{w}L(w_{t})$$

Where $\eta$ is the learning rate. SGD reduces redundancy compared to batch gradient descent - which recomputes gradients for similar examples before each parameter update - so it is usually much faster.
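The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the linear-regression loss, the data, and all variable names are our own assumptions, chosen only to show the minibatch gradient estimate $\hat{\nabla}_{w}L(w_t)$ standing in for the full gradient.

```python
import numpy as np

# Illustrative minibatch SGD on a least-squares problem (assumed setup).
# Loss: L(w) = (1/2n) * ||Xw - y||^2; a minibatch gradient estimates the full gradient.

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5])   # hypothetical ground-truth weights
y = X @ true_w

def minibatch_grad(w, X_b, y_b):
    # Gradient of the mean squared error on one minibatch:
    # grad_hat = X_b^T (X_b w - y_b) / batch_size
    return X_b.T @ (X_b @ w - y_b) / len(y_b)

w = np.zeros(d)
eta = 0.1          # learning rate (eta in the update rule)
batch_size = 16
for epoch in range(100):
    perm = rng.permutation(n)          # reshuffle so minibatches differ each epoch
    for start in range(0, n, batch_size):
        idx = perm[start:start + batch_size]
        # w_{t+1} = w_t - eta * grad_hat
        w -= eta * minibatch_grad(w, X[idx], y[idx])

print(np.round(w, 3))
```

Each update touches only `batch_size` examples, which is why SGD is cheaper per step than batch gradient descent; on this noise-free problem the iterates converge to `true_w`.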
