Suffix Tree for a String T is Trie like data that represents the suffixes of T. For example, if your word were bibs, you would create the following tree.

Suffix trees answers the queries related to one string. There are many applications of suffix trees. Some of them are..

- String Matching and Pattern Matching algorithm
- Longest Repeated SubString
- Longest palindrome
- Longest common substring
- Longest Common prefix

Sample java Implementation of Suffix Tree:

###### String Search

Searching for a substring, `pat[1..m]`

, in `txt[1..n]`

, can be solved in O(m) time (after the suffix tree for `txt`

has been built in O(n) time).

###### Longest Repeated Substring

Add a special ``end of string'' character, e.g. `$', to `txt[1..n]`

and build a suffix tree; the *longest repeated substring* of `txt[1..n]`

is indicated by the *deepest* fork node in the suffix tree, where depth is measured by the number of *characters* traversed from the root, i.e., `issi' in the case of `mississippi'. The longest repeated substring can be found in O(n) time using a suffix tree.

###### Longest Common Substring

The *longest common substring* of two strings, T1 and T2, can be found by building a *generalized* suffix tree for T1 and T2 : Each node is *marked* to indicate if it represents a suffix of `T1 `

or T2 or both. The deepest node marked for both T1 and T2 represents the longest common substring.

Equivalently, one can build a (basic) suffix tree for the string T1$T2#, where `$' is a special terminator for T1 and `#' is a special terminator for T2. The longest common substring is indicated by the deepest fork node that has both `...$...' and `...#...' (no $) beneath it.

**Palindromes**

A palindrome is a string, P, such that P=reverse(P). e.g. `abba'=reverse(`abba'). e.g. `ississi' is the *longest* palindrome in `mississippi'.

The longest palindrome of T can be found in O(n) time, e.g. by building the suffix tree for T`$reverse(T)#`

or by building the generalized suffix tree for T and `reverse(T)`

.

## 2 comments:

Simple implementation. If I want to use it for example to check if a given string is a combination of dictionary words, how could I use this structure. For example:

String in1 = "TaskForSuffixTree"

I read this string into the tree and then check all possibilities against my dictionary (hashed into a Bloom Filter) and compare to determine: "True" because is a combination of 4 dictionary words.. But

String in2 = "tengrtstwdert"

Result = "False".

What would be your suggestion of its usage?

@lusim: I would suggest use Trie data structure for such scenarios instead of a suffix tree. I will post the solution in my next blog.

Post a Comment