Sequence Alignment
Problem1
When you type “ocurrance”, the brower will will prompt “Perhaps you mean occurrence?”
How does the search engine knows?
How does it know the most similar word?
How do we determine the similarity between words?
The definition of similarity will be based on the optimal alignment between X and Y where X = $x_1x_2 … x_m$, Y = $y_1y_2 … y_n$.
- when there is gap (character not matched), we pay $\delta$
- when we match character p and q, we pay $\alpha_{pq}$, where $\alpha_{pq}$ = 0
- goal is to minimize the sum of cost.
In the optimal alignment
- (m,n) $\subset$ M
- the $m^{th}$ position of X is not matched
- the $n^{th}$ position of Y is not matched
Then we have following
- OPT(m, n) = $\alpha_{x_my_n}$ +OPT(m−1,n−1)
- OPT(m,n)= $\delta$ + OPT(m−1,n)
- OPT(m, n) = $\delta$ + OPT(m, n − 1).
Therefore we have
$$OPT(i,j)=\min[\alpha_{x_iy_j} +OPT(i−1,j−1),\delta + OPT(i−1,j),\delta+OPT(i,j−1)]$$
Algorithm
Alignment(X ,Y )
- Array A[0…m,0…n]
- Initialize A[i, 0]= i$\delta$ for each i
- Initialize A[0,j]=j$\delta$ for each j
- For j=1,…,n
- For i=1,…,m
- Use the recurrence (6.16) to compute A[i, j]
- Endfor
- For i=1,…,m
- Endfor
- Return A[m, n]
Analysis
Running time = O(mn). Space = O(mn)
- Algorithm Design by Jon Kleinber, Eva Tardos [return]