
Articles in statistical learning series

Constrained Optimization

Constrained Optimization $\min_{x \in \mathbb{R}^n} f(x)$ subject to: $c_i(x) = 0,\ i \in E$ (equality constraints); $c_i(x) \geq 0,\ i \in I$ (inequality constraints). Feasible set $\Omega = \{ x \mid c_i(x) = 0, i \in E \text{ and } c_i(x) \geq 0, i \in I \}$. Case 1: $\min_{x \in \mathbb{R}^n} f(x)$ subject to $c_1(x) = 0$. $x$ is a local minimum if, for every small step $s$, either $x + s \notin \Omega$ or $f(x+s) \geq f(x)$.……
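As a concrete illustration, here is a minimal quadratic-penalty sketch for an equality-constrained problem. The objective, constraint, step size, and penalty schedule are all my own choices for illustration, not from the article; the penalty method itself is just one standard way to handle $c_1(x) = 0$.

```python
import numpy as np

# Minimize f(x) = x1^2 + x2^2 subject to c1(x) = x1 + x2 - 1 = 0
# by adding a quadratic penalty (mu/2) * c1(x)^2 and increasing mu.
# The known solution is x = (0.5, 0.5).
def f(x):
    return x[0]**2 + x[1]**2

def c1(x):
    return x[0] + x[1] - 1.0

def penalty_grad(x, mu):
    # gradient of f(x) + (mu/2) * c1(x)^2
    g_f = 2.0 * x
    g_c = mu * c1(x) * np.ones(2)   # d/dx of the penalty term
    return g_f + g_c

x = np.zeros(2)
for mu in [1.0, 10.0, 100.0, 1000.0]:
    for _ in range(2000):
        x -= 5e-4 * penalty_grad(x, mu)
print(x)  # approaches (0.5, 0.5) as mu grows
```

As $\mu \to \infty$, the unconstrained penalized minimizer $\mu/(2 + 2\mu) \cdot (1, 1)$ converges to the constrained solution.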

Read More

Gradient Descent

Gradient Descent A simple algorithm that moves "downhill" against the gradient of the function. Algebraically: $w_{t+1} = w_t - \eta \nabla f(w_t)$, where $\eta$ is called the learning rate or step size. Step Size If $\eta$ is too small, convergence is slow; if $\eta$ is too large, the solution will bounce around. In practice: Set $\eta$ to a small constant Backtracking line search (works when $\nabla f$ is continuous) Parameters $\bar{\alpha} > 0$, $c \in (0,1)$, $\rho \in (0,1)$.……
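A minimal sketch of the update $w_{t+1} = w_t - \eta \nabla f(w_t)$ with backtracking line search (Armijo sufficient decrease), using the parameters $\bar{\alpha}$, $c$, $\rho$ mentioned above. The quadratic objective is my own illustrative choice, not from the article.

```python
import numpy as np

# Example objective: a quadratic with minimizer at (3, -1).
def f(w):
    return (w[0] - 3.0)**2 + 2.0 * (w[1] + 1.0)**2

def grad_f(w):
    return np.array([2.0 * (w[0] - 3.0), 4.0 * (w[1] + 1.0)])

def backtracking_step(w, alpha_bar=1.0, c=1e-4, rho=0.5):
    g = grad_f(w)
    alpha = alpha_bar
    # shrink alpha until the Armijo condition f(w - a*g) <= f(w) - c*a*||g||^2
    while f(w - alpha * g) > f(w) - c * alpha * (g @ g):
        alpha *= rho
    return w - alpha * g

w = np.zeros(2)
for _ in range(100):
    w = backtracking_step(w)
print(w)  # close to the minimizer (3, -1)
```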

Read More

Convex Optimization

Gradient & Hessian The gradient of a function $f$, a $d \times 1$ vector, can be represented as follows: $\nabla f(x) = \begin{bmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_d} \end{bmatrix}$ and the Hessian, a $d \times d$ matrix, can be represented as $\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_d \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_d^2} \end{bmatrix}$……
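These definitions can be checked numerically. The sketch below (my own example, not from the article) approximates the gradient and Hessian of $f(x) = x_1^2 + 3x_1x_2 + 2x_2^2$ by finite differences and compares them with the exact values.

```python
import numpy as np

def f(x):
    return x[0]**2 + 3.0 * x[0] * x[1] + 2.0 * x[1]**2

def numerical_gradient(f, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)  # central difference
    return g

def numerical_hessian(f, x, h=1e-4):
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            # second-order central difference for d^2 f / dx_i dx_j
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h * h)
    return H

x = np.array([1.0, 2.0])
print(numerical_gradient(f, x))  # exact: [2*1 + 3*2, 3*1 + 4*2] = [8, 11]
print(numerical_hessian(f, x))   # exact: [[2, 3], [3, 4]]
```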

Read More

Logistic Regression

Uncertainty in Prediction Related to linear regression. The available features $x$ do not contain enough information to perfectly predict $y$; for example, $x$ = medical record for a patient at risk for a disease, $y$ = whether he will contract the disease in the next 5 years. Model We are still going to use a linear model for conditional probability estimation: $w_1 x_1 + w_2 x_2 + \cdots + w_d x_d + b = w \cdot x + b$……
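Logistic regression squashes the linear score $w \cdot x + b$ through the sigmoid to get a probability. A minimal sketch, with weights and features chosen purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # maps any real score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(x, w, b):
    # conditional probability estimate P(y = 1 | x)
    return sigmoid(w @ x + b)

w = np.array([0.5, -0.25])
b = 0.1
x = np.array([2.0, 4.0])
p = predict_proba(x, w, b)  # score = 0.5*2 - 0.25*4 + 0.1 = 0.1
print(p)  # sigmoid(0.1) ≈ 0.525
```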

Read More

Linear Regression

Basic Idea Fit a line to a bunch of points. Example Without extra information, we will predict the mean, 2.47. Average squared error = $\mathbb{E} [(\text{studentGPA} - \text{predictedGPA})^2]$ = Variance. If we have SAT scores, then we can fit a line. Now if we predict based on this line, the MSE drops to 0.43. This is a regression problem with: Predictor variable: SAT score Response variable: College GPA Formula For $x \in \mathbb{R}$: $y = ax + b$……
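The fit $y = ax + b$ can be computed by least squares. The SAT/GPA numbers below are made up for illustration (they are not the article's data set); the point is that the fitted line's MSE is below the variance you get by always predicting the mean.

```python
import numpy as np

sat = np.array([1000.0, 1100.0, 1200.0, 1300.0, 1400.0])
gpa = np.array([2.0, 2.3, 2.5, 2.9, 3.1])

# design matrix with a column of ones for the intercept b
A = np.column_stack([sat, np.ones_like(sat)])
(a, b), *_ = np.linalg.lstsq(A, gpa, rcond=None)

pred = a * sat + b
mse = np.mean((gpa - pred)**2)
print(a, b, mse)
print(np.var(gpa))  # predicting the mean gives MSE = Var(gpa), which is larger
```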

Read More

Bayes Optimal Classifier

Background Marginal Distribution Three ways to sample from $P$: Draw $(x, y)$ directly Draw $y$ according to its marginal distribution, then $x$ according to the conditional distribution of $x \mid y$ Draw $x$ according to its marginal distribution, then $y$ according to the conditional distribution of $y \mid x$ Define: $\mu$: distribution on $X$ $\eta$: conditional distribution of $y \mid x$ Classifier Normal Classifier $h : X \rightarrow Y$ $R(h) = \Pr_{(x,y) \sim P} (h(x) \neq y)$, where $R$ = risk……
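The risk $R(h) = \Pr_{(x,y) \sim P}(h(x) \neq y)$ can be estimated by Monte Carlo using the second sampling scheme: draw $y$ from its marginal, then $x$ from $x \mid y$. The distribution below (equal class priors, unit Gaussians at $\pm 1$) is my own toy choice, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    y = rng.random(n) < 0.5                    # marginal: y = 1 w.p. 1/2
    x = rng.normal(np.where(y, 1.0, -1.0), 1.0)  # conditional x | y
    return x, y

def h(x):
    # threshold classifier: predict y = 1 when x > 0
    # (Bayes optimal here, since the priors are equal and the Gaussians symmetric)
    return x > 0.0

x, y = sample(100_000)
risk = np.mean(h(x) != y)
print(risk)  # ≈ Phi(-1) ≈ 0.159 for this setup
```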

Read More

lp norm

Families of Distance Functions The $l_p$ norm The most common one is the $l_2$ norm (Euclidean distance): $\|x - z\|_2 = \sqrt{\sum_{i=1}^m (x_i - z_i)^2}$ Note: sometimes the subscript 2 is dropped. For $p \geq 1$, the $l_p$ distance: $\|x - z\|_p = \left( \sum_{i=1}^m |x_i - z_i|^p \right)^{1/p}$ Special cases: $l_1$ distance: $\|x - z\|_1 = \sum_{i=1}^m |x_i - z_i|$ $l_\infty$ distance: $\|x - z\|_\infty = \max_i |x_i - z_i|$ Metric space Let $X$ be the space in which the data lie.……
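A quick sketch checking the three special cases on a concrete pair of vectors (my own numbers, for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 2.0, -1.0])
d = np.abs(x - z)  # coordinate-wise |x_i - z_i| = [3, 0, 4]

l1 = d.sum()                 # 3 + 0 + 4 = 7
l2 = np.sqrt((d**2).sum())   # sqrt(9 + 16) = 5
linf = d.max()               # max(3, 0, 4) = 4
print(l1, l2, linf)

# np.linalg.norm computes the same quantities
assert np.linalg.norm(x - z, 1) == l1
assert np.linalg.norm(x - z, 2) == l2
assert np.linalg.norm(x - z, np.inf) == linf
```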

Read More

Nearest Neighbor Classification

Nearest Neighbor Classification Procedures Assemble a data set (the training set). How to classify a new image $x$? Find its closest neighbor $y$ in the training set and give $x$ the same label. Notes: training set of 60000 images, test set of 10000 images. How do we determine whether two data points (images) are closest? With a 28 x 28 image, we can stretch it into a vector of 784 pixels.……
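The procedure above can be sketched in a few lines. The toy 2-D vectors stand in for the flattened 784-dimensional image vectors (the 60000-image training set itself is not reproduced here); the query is labeled with the label of its closest training point in $l_2$ distance.

```python
import numpy as np

# tiny stand-in training set: two points per class
train_x = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_y = np.array([0, 0, 1, 1])

def nearest_neighbor(x):
    # l2 distance from the query to every training point
    dists = np.linalg.norm(train_x - x, axis=1)
    return train_y[np.argmin(dists)]

print(nearest_neighbor(np.array([0.5, 0.2])))  # → 0
print(nearest_neighbor(np.array([5.4, 4.9])))  # → 1
```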

Read More

Margin of Error

Z-Score vs T-Score Z-Score Link Z-Score's formula: $z = \frac{X - \mu}{\sigma}$ where $X$ = sample mean, $\mu$ = population mean, $\sigma$ = population standard deviation. Also, we use the Z-score when the sample size is >= 30 or we know the population's mean and SD. Z Table T-Score T-Score's formula: $T = \frac{X - \mu}{s / \sqrt{n}}$ where $X$ = sample mean, $\mu$ = population mean, $s$ = sample standard deviation, and $n$ = sample size.……
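Both formulas in one short sketch, with made-up numbers: a sample mean of 105 against a population mean of 100, population SD 15 (for the z-score) and sample SD 14 with $n = 36$ (for the t-score).

```python
import math

X_bar, mu = 105.0, 100.0

# z-score: population standard deviation is known
sigma = 15.0
z = (X_bar - mu) / sigma
print(z)  # (105 - 100) / 15 ≈ 0.333

# t-score: only the sample standard deviation s is available
s, n = 14.0, 36
t = (X_bar - mu) / (s / math.sqrt(n))
print(t)  # 5 / (14 / 6) ≈ 2.143
```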

Read More