## Gradient Descent

A simple algorithm that moves "downhill" along the negative gradient of the function. Algebraically:

$$w_{t+1} = w_t - \eta \nabla f(w_t)$$

where $\eta$ is called the learning rate or step size.

### Step Size

- $\eta$ too small: slow convergence.
- $\eta$ too large: the iterates bounce around the solution.

In practice:

- Set $\eta$ to a small constant.
- Use backtracking line search (works when $\nabla f$ is continuous), with parameters $\bar{\alpha}$, $c \in (0,1)$, $\rho \in (0,1)$; see the sketch below. ……
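Below is a minimal Python sketch of both ideas: plain gradient descent with a fixed step $\eta$, and a single backtracking step using the standard Armijo sufficient-decrease condition (the condition itself and the default values of $\bar{\alpha}$, $\rho$, $c$, as well as the quadratic test function, are illustrative assumptions, since the notes above are truncated).

```python
import numpy as np

def gradient_descent(f, grad_f, w0, eta=0.1, tol=1e-8, max_iter=1000):
    """Gradient descent with a fixed step size eta."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(w)
        if np.linalg.norm(g) < tol:        # stop when the gradient is (nearly) zero
            break
        w = w - eta * g                    # w_{t+1} = w_t - eta * grad f(w_t)
    return w

def backtracking_step(f, grad_f, w, alpha_bar=1.0, rho=0.5, c=1e-4):
    """One backtracking line search step (Armijo condition; parameter
    defaults are illustrative): start from alpha_bar and shrink by rho
    until f(w - alpha*g) <= f(w) - c * alpha * ||g||^2 holds."""
    g = grad_f(w)
    alpha = alpha_bar
    while f(w - alpha * g) > f(w) - c * alpha * np.dot(g, g):
        alpha *= rho                       # shrink the step by factor rho in (0,1)
    return w - alpha * g

# Example: minimize f(w) = ||w||^2, whose minimizer is the origin.
f = lambda w: np.dot(w, w)
grad_f = lambda w: 2.0 * w
print(gradient_descent(f, grad_f, w0=[3.0, -2.0], eta=0.1))
```

With a well-behaved objective like the quadratic above, the fixed-step version converges toward the origin, while the backtracking step adapts $\alpha$ automatically instead of requiring a hand-tuned $\eta$.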
