|
Problem 34: Worst Case of the Boyer-Moore Algorithm |
|
a Boyer-Moore string matching is based on a technique that leads to the fastest searching algorithms for fixed patterns. Its main feature is to scan the pattern backward when aligned with a factor of the searched text. A typical pattern preprocessing is shown in Problem 33.
When locating a fixed pattern $x$ of length $m$ in a text $y$ of length $n$, in the generic situation $x$ is aligned with a factor (the window) of $y$ ending at position $j$ (see picture). The algorithm computes the longest common suffix ($\lcs$) between $x$ and the factor of $y$; after possibly reporting an occurrence, it slides the window towards the end of $y$ based on the preprocessing and on information collected during the scan, without missing an occurrence of $x$. Algorithm BM implements the method with table $\tbsuff$ of Problem 33.
\begin{algorithmic}
\STATE $j\leftarrow m-1$
\WHILE{$j \lt n$}
\STATE $i\leftarrow m-1-|lcs(x,y[j-m+1..j])|$
\IF{$i \lt 0$}
\STATE report an occurrence of $x$ at position $j-m+1$ on $y$
\STATE $j\leftarrow j+per(x)$
\COMMENT{$per(x)=good\text{-}suff[0]$}
\ELSE
\STATE $j\leftarrow j+good\text{-}suff[i]$
\ENDIF
\ENDWHILE
\end{algorithmic}
After position $j$ on $y$ is treated, if an occurrence of $x$ is found the algorithm slides naturally the window at distance $\per(x)$. If no occurrence is found, the distance $\tbsuff[i]$ depends on the factor $bu$ of $y$ (it depends on $au$ of $x$ in Problem 33). Value $\per(x)$ and array $\tbsuff$ are preprocessed before the search.