Logistic Regression
Uncertainty in Prediction
This setting is related to linear regression: the available features x do not contain enough information to predict y perfectly. For example:
- x = the medical record of a patient at risk for a disease
- y = whether the patient will contract the disease in the next 5 years
Model
We are still going to use a linear model for conditional probability estimation:
$$w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b = w \cdot x + b$$
We want $\Pr(y=1 \mid x)$ to:
- increase as the linear function grows
- equal 0.5 when the linear function is 0
This leads to the sigmoid function.
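Concretely, the sigmoid (logistic) function is

$$\sigma(z) = \frac{1}{1+e^{-z}}$$

which increases monotonically from 0 to 1 and satisfies $\sigma(0) = 0.5$, meeting both requirements.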
Logistic Regression Model
Let $y \in \{-1, 1\}$.
$$\Pr(y=1 \mid x) = \frac{1}{1+e^{-(w \cdot x + b)}}$$
and
$$\Pr(y=-1 \mid x) = 1 - \Pr(y=1 \mid x) = \frac{1}{1+e^{w \cdot x + b}}$$
Or concisely,
$$\Pr(y \mid x) = \frac{1}{1+e^{-y(w \cdot x + b)}}$$
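As a quick illustration, this model is a few lines of code. The sketch below is ours, not from any particular library; the names `sigmoid` and `predict_proba` are illustrative.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x, y=1):
    # Pr(y | x) = 1 / (1 + e^{-y (w·x + b)}), for labels y in {-1, +1}
    return sigmoid(y * (np.dot(w, x) + b))

# Example with two features
w = np.array([0.5, -0.25])
b = 0.1
x = np.array([1.0, 2.0])
print(predict_proba(w, b, x, y=1))   # Pr(y = +1 | x) ≈ 0.525
print(predict_proba(w, b, x, y=-1))  # Pr(y = -1 | x); the two sum to 1
```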
Maximum-likelihood
We want to choose $w, b$ to maximize the likelihood of the training data:
$$\prod_{i=1}^{n} \Pr_{w,b}(y^{(i)} \mid x^{(i)})$$
Loss function
Since $\ln$ is monotone, maximizing the likelihood is equivalent to minimizing its negative log. Taking the negative log of the likelihood gives the loss function
$$L(w,b) = -\sum_{i=1}^{n} \ln \Pr_{w,b}(y^{(i)} \mid x^{(i)}) = \sum_{i=1}^{n} \ln\left(1 + e^{-y^{(i)}(w \cdot x^{(i)} + b)}\right)$$
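A direct translation of this loss into code (a sketch under our own naming; `np.logaddexp(a, b)` computes $\ln(e^a + e^b)$ stably, so $\ln(1 + e^{-m}) = \mathrm{logaddexp}(0, -m)$):

```python
import numpy as np

def logistic_loss(w, b, X, y):
    # X: (n, d) feature matrix; y: (n,) labels in {-1, +1}
    margins = y * (X @ w + b)                    # y^(i) (w·x^(i) + b) for each i
    return np.sum(np.logaddexp(0.0, -margins))   # sum of ln(1 + e^{-margin})
```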
Solution
There is no closed-form solution for $w$, but $L(w,b)$ is convex.
Convexity is crucial because any local minimum is also a global minimum.
We therefore turn to a numerical method: gradient descent.
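Gradient descent needs the gradient of $L$. Differentiating term by term, using $\frac{d}{dz}\ln(1+e^{-z}) = -\frac{e^{-z}}{1+e^{-z}}$ and noting that $\frac{e^{-y(w \cdot x + b)}}{1+e^{-y(w \cdot x + b)}} = \Pr_{w,b}(-y \mid x)$, gives

$$\nabla_w L(w,b) = -\sum_{i=1}^{n} y^{(i)} x^{(i)} \Pr_{w,b}(-y^{(i)} \mid x^{(i)})$$

The update below simply steps in the direction of the negative gradient.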
Gradient Descent
- Set $w_0 = 0$ (the bias $b$ can be folded into $w$ by appending a constant feature 1 to each $x$)
- For $t = 0, 1, 2, \dots$ until convergence:
  - $w_{t+1} = w_t + \eta_t \sum_{i=1}^n y^{(i)} x^{(i)} \Pr_{w_t}(-y^{(i)} \mid x^{(i)})$, where $\eta_t$ is called the step size (learning rate)
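Putting it together, here is a minimal gradient-descent sketch. The fixed step size `eta`, the iteration cap, and the stopping rule are illustrative choices, not prescribed above; the bias is folded into `w` via a constant feature as noted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, eta=0.1, n_iters=1000):
    # X: (n, d) feature matrix with a constant 1-column appended, so b is part of w
    # y: (n,) labels in {-1, +1}
    n, d = X.shape
    w = np.zeros(d)                       # w_0 = 0
    for _ in range(n_iters):
        # Pr_w(-y^(i) | x^(i)) = sigmoid(-y^(i) (w·x^(i))) for each i
        p_wrong = sigmoid(-y * (X @ w))
        grad = -(X.T @ (y * p_wrong))     # gradient of L at the current w
        w = w - eta * grad                # i.e. w += eta * sum_i y^(i) x^(i) Pr(-y^(i) | x^(i))
        if np.linalg.norm(grad) < 1e-6:   # simple convergence check
            break
    return w
```

Each iteration weights every example by the probability the current model assigns to the *wrong* label, so confidently correct points contribute almost nothing to the update while misclassified points pull $w$ toward them.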