RE: What does learning rate refer to in deep learning?
In deep learning, the learning rate is a hyperparameter which determines how much we are adjusting the weights of our network with respect the loss gradient. Learning rate directly impacts how quickly our model can converge to a local or global minima (i.e., lower error). The numbers you see (like 0.1, 0.01 etc.) are the actual values of the learning rate, typically between 0 and 1. These values define the step size during gradient descent. A smaller learning rate requires more training epochs given the smaller updates made to the weights each step, but with large learning rates we may skip over optimal solutions, or even fail to converge. Choosing the right learning rate is crucial. Too high, the gradient descent possibly overshoots the minimum. Too low, gradient descent may be slow to converge or get stuck in an undesirable local minimum. Oftentimes, the learning rate is set via trial-and-error or more systematic approaches such as Grid Search, Random Search or adaptive methods. Adaptive methods like AdaGrad, RMSProp, or Adam can adjust the learning rate throughout the training, based on the gradients.