Zhengyuan Zhu

Gradient Descent in Machine Learning

Introduction

Gradient descent's place in machine learning: after backpropagation computes the gradient of the loss with respect to a network's parameters, an optimization algorithm uses that gradient to reduce the error. Among these optimization algorithms, gradient descent is the simplest and most common, and it is widely used in deep learning training.

Optimization Problem

An optimization problem asks for the extremum of a function, that is, its maximum or minimum values.

Derivative and Gradient

Here $\nabla$ is called the gradient operator; applied to a multivariate function, it yields the vector of partial derivatives. As an example, the gradient of a function can be computed as: $\nabla(x^2+xy-y^2)=(2x+y,\,x-2y)$
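As a sanity check, the analytic gradient above can be compared against central finite differences. This is an illustrative sketch (the function names `grad_f` and `numerical_grad` are my own, not from the post):

```python
# f(x, y) = x^2 + x*y - y^2, whose analytic gradient is (2x + y, x - 2y).
def f(x, y):
    return x**2 + x*y - y**2

def grad_f(x, y):
    # Analytic gradient from the formula above.
    return (2*x + y, x - 2*y)

def numerical_grad(f, x, y, h=1e-6):
    # Central finite differences in each coordinate.
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (gx, gy)

x, y = 1.5, -0.5
print(grad_f(x, y))             # analytic gradient: (2.5, 2.5)
print(numerical_grad(f, x, y))  # agrees to several decimal places
```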

How do we determine whether a stationary point is a maximum or a minimum? It depends on the second derivative, or in the multivariate case the Hessian matrix: if the Hessian at the stationary point is positive definite, the point is a local minimum; if negative definite, a local maximum; if indefinite, a saddle point; and if merely semidefinite, the test is inconclusive.
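The Hessian test can be carried out by inspecting eigenvalue signs. A small sketch (the `classify` helper is hypothetical), using the example function $x^2+xy-y^2$, whose Hessian is constant:

```python
import numpy as np

# For f(x, y) = x^2 + x*y - y^2 the Hessian is the constant matrix below.
H = np.array([[2.0, 1.0],
              [1.0, -2.0]])

def classify(H):
    # Classify a stationary point by the eigenvalue signs of the Hessian.
    eig = np.linalg.eigvalsh(H)  # eigenvalues of a symmetric matrix
    if np.all(eig > 0):
        return "local minimum"   # positive definite
    if np.all(eig < 0):
        return "local maximum"   # negative definite
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point"    # indefinite
    return "inconclusive"        # semidefinite: need a higher-order test

print(classify(H))  # the stationary point (0, 0) is a saddle point
```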

Why can't we simply compute the gradient, set it to zero, and solve the resulting equations? Answer: the equations may be very difficult to solve. Equations involving exponential, logarithmic, or trigonometric functions are called transcendental equations, for example $3x^2e^{xy}+x\cos(xy)=0$. Solving such an equation is no easier than finding the extremum directly.

Derivation Process
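One standard way to derive the update rule starts from the first-order Taylor expansion of $f$ around the current point (a sketch of the usual argument, not necessarily the post's original derivation):

```latex
% First-order Taylor expansion around x:
f(x + \Delta x) \approx f(x) + \nabla f(x)^{\top}\,\Delta x
% Choosing the step \Delta x = -\eta\,\nabla f(x) with a small \eta > 0 gives
f\bigl(x - \eta\,\nabla f(x)\bigr) \approx f(x) - \eta\,\lVert \nabla f(x) \rVert^{2} \le f(x)
% so the iteration
x_{k+1} = x_k - \eta\,\nabla f(x_k)
% decreases f whenever the gradient is nonzero and \eta is small enough.
```

The step size $\eta$ is the learning rate; the approximation, and hence the guarantee of decrease, only holds for sufficiently small $\eta$.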

Problems Faced

Supplement: a stationary point requires the first derivative to exist (and vanish), while an extremum point imposes no requirement on differentiability.
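Putting the update rule into a loop gives the basic algorithm. A minimal sketch on a convex quadratic (the test function and names here are my own choices for illustration):

```python
# Minimize f(w) = (w1 - 3)^2 + (w2 + 1)^2, whose minimum is at w = (3, -1).
def grad(w):
    # Gradient of the quadratic above.
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

def gradient_descent(w, lr=0.1, steps=200):
    # Repeatedly step against the gradient with a fixed learning rate.
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

w = gradient_descent([0.0, 0.0])
print(w)  # approaches (3, -1)
```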

Variants

Stochastic Gradient Descent

Stochastic gradient descent converges in the sense of mathematical expectation; it cannot guarantee that the function value decreases at every iteration.
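A minimal sketch of mini-batch SGD on least-squares linear regression (the problem setup, batch size, and learning rate here are illustrative assumptions): each step uses a random mini-batch, so the loss only decreases on average, not at every step.

```python
import numpy as np

# Synthetic regression data: y = X @ true_w + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([2.0, -3.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

w = np.zeros(2)
lr, batch = 0.05, 16
for step in range(500):
    idx = rng.integers(0, len(X), size=batch)  # sample a random mini-batch
    Xb, yb = X[idx], y[idx]
    g = 2 * Xb.T @ (Xb @ w - yb) / batch       # mini-batch gradient of MSE
    w -= lr * g

print(w)  # close to the true weights [2, -3]
```

Because the mini-batch gradient is a noisy but unbiased estimate of the full gradient, individual steps can increase the loss even though the iterates converge in expectation.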
