Prototype Selection

Background kNN prototype selection Summary List

kNN has a few drawbacks: high storage requirements for the data, the computation needed for the decision boundary, and intolerance to noise. Several methods address these issues: a better similarity metric or distance function, k-d trees or R-trees as storage structures, and reduction techniques (prototype selection).

Prototype Selection

1. edition method - removes noise
2. condensation method - removes superfluous data
3. hybrid method - achieves elimination of noise and superfluous data at the same time……
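
To make the edition idea concrete, here is a minimal sketch of one well-known edition rule, Wilson's Edited Nearest Neighbor, which drops every point whose label disagrees with the majority of its k nearest neighbors. The excerpt does not name a specific algorithm, so the choice of this rule, the function name `edited_nearest_neighbor`, and the Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
import math


def edited_nearest_neighbor(points, labels, k=3):
    """Return the indices of the points that survive the edition step.

    A point is kept only if its own label matches the majority label
    among its k nearest neighbors (Euclidean distance, assumed here).
    """
    kept = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        neighbors = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        votes = Counter(labels[j] for _, j in neighbors[:k])
        majority_label = votes.most_common(1)[0][0]
        if majority_label == labels[i]:
            kept.append(i)
    return kept


# Two well-separated groups plus one mislabeled point inside the first group.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0.5, 0.5)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]
kept = edited_nearest_neighbor(points, labels, k=3)
```

In this toy data the last point carries label "b" but sits among "a" points, so the edition step removes it and keeps the rest.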

K Clustering

Problem

We have a set of objects $U = \{o_1, o_2, …\}$, and we want to split them into k clusters. The distance function satisfies the following properties:

$\forall_{i,j}\; dist(o_i, o_j) = dist(o_j, o_i)$

$\forall_{i}\; dist(o_i, o_i) = 0$

$\forall_{i \neq j}\; dist(o_i, o_j) > 0$

At the end, we should have a clustering $C = \{C_1, C_2, …, C_k\}$. Define the spacing of a clustering as the minimum distance between any two objects that lie in different clusters. Our goal is to find the k-clustering with maximum spacing.……
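
The definitions above can be sketched in code. The excerpt is cut off before it describes an algorithm, so the greedy approach below (repeatedly merge the two closest objects that are still in different clusters, as in Kruskal's algorithm, and stop when k clusters remain) is an assumption about where the post goes; the function names `spacing` and `max_spacing_clustering` are hypothetical.

```python
from itertools import combinations


def spacing(clusters, dist):
    """Minimum distance between two objects in different clusters (needs >= 2 clusters)."""
    return min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )


def max_spacing_clustering(objects, k, dist):
    """Greedy single-link clustering: merge closest pairs until k clusters remain."""
    parent = list(range(len(objects)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted(
        (dist(objects[i], objects[j]), i, j)
        for i, j in combinations(range(len(objects)), 2)
    )
    components = len(objects)
    for _, i, j in edges:
        if components == k:
            break
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            components -= 1

    groups = {}
    for i, obj in enumerate(objects):
        groups.setdefault(find(i), []).append(obj)
    return list(groups.values())


# Points on a line with absolute difference as the distance function.
objects = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
clusters = max_spacing_clustering(objects, 3, lambda a, b: abs(a - b))
result_spacing = spacing(clusters, lambda a, b: abs(a - b))
```

Here the three clusters are {0, 1, 2}, {10, 11}, and {20}, and the spacing is the gap between 2 and 10.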

Probability

Discrete Random Variables

A random variable is a number whose value depends on the outcome of a random experiment. For example, toss a coin 10 times and let $X$ be the number of heads. A discrete random variable $X$ has finitely or countably many values $x_i$, $i = 1, 2, …$, and $p(x_i) = P(X = x_i)$ is called the probability mass function. Probability mass functions have the following properties:

1. For all $i$, $p(x_i) > 0$
2. For any interval $B$, $P(X \in B) = \sum_{x_i \in B}p(x_i)$
3. $\sum_{i}p(x_i) = 1$

There are many types of discrete random variables……
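
As a quick check of these properties, the sketch below builds the probability mass function for the coin example, assuming a fair coin so that the number of heads in 10 tosses follows a binomial distribution with n = 10 and p = 0.5. The helper name `binomial_pmf` and the choice of the set B = {4, 5, 6} are illustrative assumptions, not taken from the post.

```python
from math import comb


def binomial_pmf(n, p):
    """Map each possible number of heads x to P(X = x) for n tosses with head probability p."""
    return {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}


pmf = binomial_pmf(10, 0.5)

# Property 1: every listed value has positive probability.
all_positive = all(prob > 0 for prob in pmf.values())

# Property 2: P(X in B) is the sum of p(x_i) over the x_i in B.
prob_B = sum(pmf[x] for x in (4, 5, 6))

# Property 3: the probabilities sum to 1.
total = sum(pmf.values())
```

With a fair coin, `prob_B` works out to 672/1024, since there are 210 + 252 + 210 ways to get 4, 5, or 6 heads out of 1024 equally likely outcomes.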
