2019/1/27
Background: kNN Prototype Selection Summary List
There are a couple of drawbacks to kNN:

- high storage cost for the data
- expensive computation of the decision boundary
- intolerance to noise

There are a couple of methods that address these issues:

- a better similarity metric or distance function
- k-d trees or R-trees
- storage reduction techniques (prototype selection)

Prototype Selection

1. edition method - removes noise
2. condensation method - removes superfluous data
3. hybrid method - achieves elimination of noise and superfluous data at the same time……
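The condensation idea above can be sketched with Hart's condensed nearest neighbor rule: keep only the training points needed for 1-NN to classify the rest correctly. This is a minimal sketch, not the post's own code; the helper name `condense` and the toy two-cluster data are assumptions for illustration.

```python
import numpy as np

def condense(X, y, seed=0):
    """One pass of Hart's condensed nearest neighbor (a condensation
    method): keep only the prototypes needed to classify the training
    set correctly with 1-NN. Hypothetical helper for illustration."""
    order = np.random.default_rng(seed).permutation(len(X))
    keep = [order[0]]                        # start with one arbitrary point
    for i in order[1:]:
        # classify X[i] with 1-NN over the current prototype set
        d = np.linalg.norm(X[keep] - X[i], axis=1)
        if y[keep[int(np.argmin(d))]] != y[i]:
            keep.append(i)                   # misclassified -> keep as prototype
    return np.sort(np.array(keep))

# two tight, well-separated classes: condensation keeps very few points
X = np.vstack([np.random.default_rng(0).normal(0, 0.1, (50, 2)),
               np.random.default_rng(1).normal(5, 0.1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
proto = condense(X, y)
print(len(proto))  # far fewer than the original 100 points
```

Classifying against the condensed set instead of the full data directly attacks the high-storage drawback, at the cost of a (bounded) change in the decision boundary.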
Read More
2019/1/25
Problem We have a set of objects $U = \{o_1, o_2, \dots\}$, and we want to split them into $k$ clusters.
We also have the following definition for the distance function:

- $\forall_{i,j}\ dist(p_i, p_j) = dist(p_j, p_i)$ (symmetry)
- $\forall_{i}\ dist(p_i, p_i) = 0$
- $\forall_{i \neq j}\ dist(p_i, p_j) > 0$

At the end, we should have $C = \{C_1, C_2, \dots, C_K\}$.
Let’s define the spacing to be the minimum $dist$ between any two clusters. Our goal is to find the k-clustering with maximum spacing.……
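The standard approach to this problem is greedy single-link clustering, i.e. Kruskal's algorithm stopped when $k$ components remain: repeatedly merge the two closest clusters, and the next merge that *would* have happened is exactly the spacing. A sketch under the assumption of Euclidean distance; the function name `max_spacing_clustering` and the sample points are hypothetical.

```python
import numpy as np
from itertools import combinations

def max_spacing_clustering(points, k):
    """Kruskal-style single-link clustering: merge the two closest
    clusters until k remain. Returns (labels, spacing), where spacing
    is the distance of the first merge that was skipped."""
    n = len(points)
    parent = list(range(n))

    def find(i):                         # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def d(i, j):
        return float(np.linalg.norm(np.array(points[i]) - np.array(points[j])))

    # all pairwise "edges", sorted by distance (Kruskal's edge order)
    edges = sorted(combinations(range(n), 2), key=lambda e: d(*e))
    clusters, spacing = n, float("inf")
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            if clusters == k:
                spacing = d(i, j)        # next would-be merge = spacing
                break
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(n)], spacing

pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
labels, spacing = max_spacing_clustering(pts, 3)
print(len(set(labels)), spacing)
```

The greedy choice is safe because any other k-clustering must cut one of the merged short edges, giving a spacing no larger than the one found here.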
Read More
2019/1/24
Discrete Random Variables A random variable is a number whose value depends upon the outcome of a random experiment, such as tossing a coin 10 times and letting X be the number of heads.
A discrete random variable X takes countably many values $x_i$, $i = 1, 2, \dots$, and $p(x_i) = P(X = x_i)$ is called its probability mass function.
Probability mass functions have the following properties:

- For all $i$, $p(x_i) > 0$
- For any set $B$, $P(X \in B) = \sum_{x_i \in B} p(x_i)$
- $\sum_{i} p(x_i) = 1$

There are many types of discrete random variables……
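The three properties can be checked numerically on the coin-toss example above, where X is the number of heads in 10 fair tosses, so $p(x_i) = \binom{10}{i} / 2^{10}$ for $i = 0, \dots, 10$. The choice of $B = \{0, \dots, 4\}$ below is just an illustrative event.

```python
from math import comb

# pmf of X = number of heads in 10 fair coin tosses (binomial)
n = 10
pmf = {i: comb(n, i) / 2 ** n for i in range(n + 1)}

# property 1: p(x_i) > 0 for every value X can take
assert all(p > 0 for p in pmf.values())
# property 3: the probabilities sum to 1
assert abs(sum(pmf.values()) - 1) < 1e-12
# property 2: P(X in B) = sum of p(x_i) over x_i in B, with B = {0,...,4}
p_at_most_4 = sum(pmf[i] for i in range(5))
print(p_at_most_4)  # 0.376953125
```

Here $P(X \le 4) = (1 + 10 + 45 + 120 + 210)/1024 = 386/1024$, matching the printed value.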
Read More
2019/1/24
Hello, this is the first post……
Read More