2019/1/31
Background Marginal Distribution
Three ways to sample from P
Draw (x,y) Draw y according to its marginal distribution, then x according to the conditional distribution of x | y Draw X according to its marginal distribution, then Y according to the conditional distribution of y | x Define:
$\mu$: distribution on $X$
$\eta$: conditional distribution y|x
Classifier Normal Classifier
$h : x \rightarrow y$
$R(h) = Pr_{(x,y) \in p} (h(x) \neq y)$, where R = risk……
Read More
2019/1/31
Marginal Distribution: It’s a function that gives the probability based on only subset of the variables.1. For example,
Or in mathematical way, for discrete
$$Pr(X = x) = \sum_y Pr(X = x, Y = y) = \sum_y Pr(X = x | Y = y) Pr(Y = y)$$
and for continuous
$$p_X(x) = \int_y p_{X,Y}(x,y) dy = \int_y p_{X|Y}(x|y)p_Y(y) dy$$
and it can also be written as Expected Vaue
$$p_X(x) = \int_y p_{X|Y} (x|y) p_Y(y) dy = \mathbb{E}_Y[p_{X|Y} (x|y)]$$……
Read More
2019/1/31
Families of Distance Function $l_p$ norm The most common one is $l_2$ norm (Euclidean distance):
$$||x - z||_2 = \sqrt{\sum_{i=1}^{m}(x_i - z_i)^2}$$
Notes: sometime 2 is dropped.
For $p \geq 1$, the $l_p$ distance:
$$||x - z||_p = (\sum_{i=1}^{m}(x_i - z_i)^p)^{1/p}$$
Special case:
$l_1$ distance: $$||x - z||_1 = \sum_{i=1}^{m}|x_i - z_i|$$
$l_\infty$ distance:
$$||x - z||_1 = max_i |x_i - z_i|$$
Metric space Let $X$ be the space that data lie.……
Read More
2019/1/31
Nearest Neighbor Classification Procedures Assemble a data set (training set) How to classify a new image x? find its closest neighbor y, and label it the same Notes:
training set of 60000 images test set of 10000 images How do we determine if two data (images) are closest? With 28 x 28 image, we can strech it to become a vector of 784.……
Read More
2019/1/30
Problem Given two vectors $a = (a_1, a_2, a_{n-1})$ and $b = (a_1, b_2, b_{n-1})$.
The convolution of a * b is a vector with 2n - 1 coordinates, where coordinate k is $\sum_{(i,j):i+j=k|i,j < n} a_ib_j$, which is can be written as
$$a ∗ b = (a_0b_0, a_0b_1 + a_1b_0, a_0b_2 + a_1b_1 + a_2b_0, … , a_{n−2}b_{n−1} + a_{n−1}b_{n−2}, a_{n−1}b_{n−1})$$
Or an n x n table, whose (i, j) entry is $a_ib_j$……
Read More
2019/1/30
Problem Wiki Explanation
Algorithm Let’s say we define $x = x_1 2^{n/2} + x_0$, then xy become \begin{align} xy &= (x_1 2^{n/2} + x_0)(y_1 2^{n/2} + y_0) \newline &= x_1 y_1 2^n + (x_1 y_0 + x_0 y_1)2^{n/2} + x_0 y_0 \end{align}
So we have $$T(n) \leq 4T(n/2) + cn$$ But this is essentially $$T(n) \leq O(n^{\log_2 q}) = O(n^2)$$
However, we can reduce the time by observing that $(x_1 + x_0)(y_1 + y_0) = x_1y_1 + x_1y_0 + x_0y_1 + x_0y_0$.……
Read More
2019/1/29
Problem Given n points in the plane, find the pair that is closest together.
Brute force takes $O(n^2)$.
Algorithm Let’s $d(p_i, p_j)$ = Euclidean distance.
In 1-d, we can simply sort points and compute the distance with the next point, we then have complexity of O(nlogn). In 2-d, we can’t applied the same thing.
We will use divide and conquer. We find the closest pair in the left and closest pair in the right, and hoping to get it in linear time.……
Read More
2019/1/29
Problem Application in ranking, also called corraborative filtering.
Comparing two rankings and decide how similar they are, or how many pairs are out of order.
To quantify this, we count the number of inversions. The inversion is defined as two indices i < j that $a_i > a_j$. Algorithm Brute-Force $T(n) = O(n^2)$
Modified Merge-Sort By leverage the merge process form merger-sort, we can count the number of inversion. Basically, when the element from A is appended, there is not inversion.……
Read More
2019/1/29
Problem Sort the elements
Abstract the behavior:
1. Divide the input into two pieces of equal size O(n) 1. solve the two subproblems on these pieces separately by recursion 1. combine the two results into an overall solution O(n)
Recurrence Time Complexity q = 2 T(n) ≤ 2T(n/2) + cn
To analyze the above recurrence relation, check below image. At level j, we have $2^j$ nodes with size $n/2^j$ Each node takes $cn/2^j$, so level j takes $cn/2^j$ x $2^j = cn$ There are logn levels, so T(n) = O(nlogn) General Case For q > 2 T(n) ≤ qT(n/2) + cn……
Read More
2019/1/27
Z-Score vs T-Score Z-Score Link
Z-Score’s formula $$z = \frac{X - \mu}{\sigma}$$ where X = sample mean, $\mu$ = population means, $\sigma$ = population standard deviation.
Also, we use Z Score when sample size >= 30 or we know the population’s mean dna SD.
Z Table
T-Score T-Score T-Score’s formula $$T = \frac{X - \mu}{s/ \sqrt{n}}$$ where X = sample mean, $\mu$ = population mean, s = sample standard deviation, and n = sample size.……
Read More
-
Prev
-
1
-
2
-
3
-
4
-
Next