Bayes Optimal Classifier
Background
Three equivalent ways to sample from the joint distribution $P$ over $X \times Y$ (a small sketch follows the list):
- Draw $(x, y)$ jointly
- Draw $y$ according to its marginal distribution, then $x$ according to the conditional distribution of $x \mid y$
- Draw $x$ according to its marginal distribution, then $y$ according to the conditional distribution of $y \mid x$
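A minimal sketch of the second sampling scheme (draw $y$ from its marginal, then $x \mid y$); the class prior and per-class Gaussians here are made-up choices purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy joint distribution P over (x, y), chosen only for illustration:
# y ~ Bernoulli(p_y1), and x | y ~ N(mu[y], 1).
p_y1 = 0.3
mu = {0: -1.0, 1: +1.0}

def sample_y_then_x(n):
    """Second scheme: draw y from its marginal, then x from x | y."""
    y = rng.binomial(1, p_y1, size=n)
    x = rng.normal([mu[int(yi)] for yi in y], 1.0)
    return x, y

x, y = sample_y_then_x(5)
print(list(zip(np.round(x, 2), y)))
```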
Define:
$\mu$: the marginal distribution of $x$ on $X$
$\eta(x) = \Pr(y = 1 \mid x)$: the conditional distribution of $y$ given $x$
Classifier
A generic classifier
$h : X \rightarrow Y$
$R(h) = \Pr_{(x, y) \sim P}(h(x) \neq y)$, where $R$ denotes the risk
Bayes Optimal Classifier
$h^* : X \rightarrow Y$
\begin{equation} h^*(x)=\begin{cases} 1 & \text{if $\eta(x) \geq \frac{1}{2}$} \newline 0 & \text{otherwise}. \end{cases} \end{equation}
$h^*$ minimizes the risk, and $R^* = R(h^*) = \mathbb{E}_{x \sim \mu}[\min(\eta(x), 1 - \eta(x))]$
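To make the definitions concrete, here is a small sketch that computes $h^*$ and $R^*$ on a discretized $X = [0, 1]$; the particular $\eta$ below is an assumption chosen only for illustration:

```python
import numpy as np

# Discretize X = [0, 1] on a grid; the particular eta below is a made-up example.
xs = np.linspace(0, 1, 1001)
eta = 0.25 + 0.5 * xs                       # eta(x) = Pr(y = 1 | x), illustrative
weights = np.full_like(xs, 1 / len(xs))     # uniform mu, discretized

# Bayes optimal classifier: predict 1 wherever eta(x) >= 1/2
h_star = (eta >= 0.5).astype(int)

# Bayes risk: E_{x ~ mu}[ min(eta(x), 1 - eta(x)) ]
R_star = np.sum(weights * np.minimum(eta, 1 - eta))
print(R_star)
```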
Consistency
An algorithm is consistent if $R(h_n) \rightarrow R^*$ as $n \rightarrow \infty$, where $h_n$ is the classifier it learns from $n$ samples
Nearest Neighbor Classification
Draw $n$ points i.i.d. from $P = (\mu, \eta)$
Let $h_n$ be the 1-NN classifier based on these.
1-NN is not consistent.
e.g. consider the following distribution
- $X = [0, 1]$, $Y = \{0, 1\}$
- $\mu$ is uniform on X
- $\eta(x) = 1/4$ everywhere
Bayes optimal classifier
- $h^* = 0$
- $R^*$ = 1⁄4
For 1-NN classifier
- Pr(error) = Pr(two coins of bias 1/4 disagree) = 2 * 1/4 * 3/4 = 3/8 > $R^*$, since the test label and its nearest neighbor's label are two independent draws with bias 1/4
- In general, $R(h_n) \rightarrow 2R^* (1 - R^*)$ as $n \rightarrow \infty$ (simulated in the sketch below)
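A quick simulation of this counterexample (sample sizes are arbitrary); the 1-NN test error should land near $3/8$ rather than $R^* = 1/4$:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    x = rng.uniform(0, 1, n)       # mu: uniform on [0, 1]
    y = rng.binomial(1, 0.25, n)   # eta(x) = 1/4 everywhere
    return x, y

x_tr, y_tr = sample(2000)
x_te, y_te = sample(2000)

# 1-NN prediction: copy the label of the closest training point
nn = np.abs(x_te[:, None] - x_tr[None, :]).argmin(axis=1)
print(np.mean(y_tr[nn] != y_te))   # close to 2 * (1/4) * (3/4) = 0.375, not R* = 0.25
```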
The k-NN classifier is consistent ($R(h_{n,k}) \rightarrow R^*$) if
- $k$ is a growing function of $n$ with $k \rightarrow \infty$ and $\frac{k}{n} \rightarrow 0$
- $\eta$ is continuous
Proof sketch
- if $\eta(x) = \frac{1}{2}$, any prediction at $x$ errs with probability $1/2$, so both $h_{n,k}$ and $h^*$ have error $1/2$ there
- if $\eta(x) \lt 1/2$, consider a small ball $B$ around $x$; the probability that a sample point falls inside $B$ is $\mu(B)$
- As $n \rightarrow \infty$, the sample becomes denser, so the ball containing the $k$ nearest neighbors of $x$ shrinks
- Pr($k$ nearest neighbors of $x$ fall in $B$) $\rightarrow$ 1 as $\frac{k}{n} \rightarrow 0$
- Pr(majority vote of their labels is 0) $\rightarrow$ 1 as $k \rightarrow \infty$, since each label is 1 with probability $\eta \lt 1/2$ inside a small enough $B$
- for example, $k(n) = \log n$ satisfies both conditions (see the sketch below)
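A sketch of the k-NN fix on the same toy distribution, using $k(n) \approx \log n$ rounded to an odd integer (these choices are illustrative, not prescribed by the notes); the error drifts down toward $R^* = 1/4$ as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    x = rng.uniform(0, 1, n)       # mu: uniform on [0, 1]
    y = rng.binomial(1, 0.25, n)   # eta(x) = 1/4 everywhere
    return x, y

def knn_error(n, n_test=1000):
    k = max(1, int(np.log(n))) | 1                       # k ~ log n, forced odd to avoid ties
    x_tr, y_tr = sample(n)
    x_te, y_te = sample(n_test)
    dists = np.abs(x_te[:, None] - x_tr[None, :])
    idx = np.argpartition(dists, k - 1, axis=1)[:, :k]   # indices of the k nearest neighbors
    pred = (y_tr[idx].mean(axis=1) >= 0.5).astype(int)   # majority vote
    return np.mean(pred != y_te)

for n in [100, 1_000, 10_000]:
    print(n, knn_error(n))   # error approaches R* = 0.25 (slowly) as n grows
```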
Limitations
- All data are i.i.d. from a fixed unknown distribution over $X \times Y$
- Can't handle a shifting distribution
- $\mu$ changes, $\eta$ fixed (covariate shift): e.g., a different handwriting/speech distribution
- $\mu$ and $\eta$ both change: e.g., classifying articles into politics, sports as both drift over time