What does learning rate refer to in deep learning?

I only ever see values like 0.1 or 0.01.

2 Answer(s)
The learning rate, in deep learning and most learning algorithms, is a hyperparameter that determines the step size at each iteration while moving toward a minimum of a loss function. Essentially, it controls how much we adjust the weights with respect to the loss gradient. With learning rate values like the 0.1 and 0.01 you mentioned, the model adjusts the weights by 10% and 1% of the gradient, respectively. The learning rate is therefore directly related to how fast or slow a network trains.

However, it's important to strike a careful balance. If the learning rate is too large, the algorithm may overshoot the minimum and diverge. Conversely, if it's too small, the algorithm will need many more iterations to converge to the minimum, meaning training will be slow.

A common practice is to gradually decrease the learning rate during training, allowing the model to settle more comfortably into the global (or local) minimum. Procedures for changing the learning rate during training are called learning rate schedules; examples include time-based decay and drop-based decay. To find a good learning rate, consider a grid search or a learning rate finder tool, which tries multiple learning rates and reports which one yields the lowest loss.
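As an illustration, here is a minimal sketch in plain NumPy-free Python using a toy quadratic loss (the loss function, initial weight, and decay value are all made up for the example) showing how the learning rate scales each gradient step, and how a time-based decay schedule shrinks it over time:

```python
# Toy loss: L(w) = (w - 3)^2, so the gradient is dL/dw = 2 * (w - 3)
def grad(w):
    return 2.0 * (w - 3.0)

w = 0.0           # initial weight (arbitrary starting point)
initial_lr = 0.1  # the "0.1" you see in examples
decay = 0.01      # time-based decay factor (hypothetical value)

for step in range(100):
    lr = initial_lr / (1.0 + decay * step)  # time-based decay schedule
    w -= lr * grad(w)                       # update: step size = lr * gradient

print(w)  # converges toward the minimum at w = 3
```

With a larger learning rate (say 1.5 here), the same loop would overshoot w = 3 and oscillate or diverge; with a much smaller one (say 0.001), it would still be far from 3 after 100 steps.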
Answered on July 17, 2023.
In deep learning, the learning rate is a hyperparameter that determines how much we adjust the weights of our network with respect to the loss gradient. It directly impacts how quickly the model can converge to a local or global minimum (i.e., lower error).

The numbers you see (like 0.1, 0.01, etc.) are actual values of the learning rate, typically between 0 and 1; they define the step size during gradient descent. A smaller learning rate requires more training epochs, given the smaller updates made to the weights at each step, while a large learning rate may skip over optimal solutions or even fail to converge.

Choosing the right learning rate is crucial. Too high, and gradient descent may overshoot the minimum; too low, and it may converge slowly or get stuck in an undesirable local minimum. Often the learning rate is set via trial and error, or with more systematic approaches such as grid search, random search, or adaptive methods. Adaptive optimizers like AdaGrad, RMSProp, or Adam adjust the effective learning rate throughout training based on the gradients.
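To make the grid-search idea concrete, here is a hedged PyTorch sketch (the toy regression task, step count, and candidate learning rates are assumptions for illustration, not recommendations) that trains the same tiny model with the Adam adaptive optimizer at several learning rates and compares the final losses:

```python
import torch

# Hypothetical toy data: learn y = 2x with a single linear weight
x = torch.randn(64, 1)
y = 2.0 * x

def train(lr, steps=200):
    model = torch.nn.Linear(1, 1, bias=False)
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # adaptive optimizer
    loss_fn = torch.nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()

# Simple grid search: try several learning rates, keep the one with lowest loss
for lr in [1.0, 0.1, 0.01, 0.001]:
    print(f"lr={lr}: final loss={train(lr):.6f}")
```

The same pattern works with torch.optim.SGD, RMSprop, or Adagrad; only the optimizer constructor changes.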
Answered on July 26, 2023.
