Wednesday, March 16, 2011

Suffix Trees

Suffix Tree for a String T is Trie like data that represents the suffixes of T. For example, if your word were bibs, you would create the following tree.

Capture

Suffix trees answers the queries related to one string. There are many applications of suffix trees. Some of them are..

  1. String Matching and Pattern Matching algorithm
  2. Longest Repeated SubString
  3. Longest palindrome
  4. Longest common substring
  5. Longest Common prefix

Sample java Implementation of Suffix Tree:

String Search

Searching for a substring, pat[1..m], in txt[1..n], can be solved in O(m) time (after the suffix tree for txt has been built in O(n) time).

Longest Repeated Substring

Add a special ``end of string'' character, e.g. `$', to txt[1..n] and build a suffix tree; the longest repeated substring of txt[1..n] is indicated by the deepest fork node in the suffix tree, where depth is measured by the number of characters traversed from the root, i.e., `issi' in the case of `mississippi'. The longest repeated substring can be found in O(n) time using a suffix tree.

Longest Common Substring

The longest common substring of two strings, T1 and T2, can be found by building a generalized suffix tree for T1  and T2 : Each node is marked to indicate if it represents a suffix of T1 or T2 or both. The deepest node marked for both T1 and T2 represents the longest common substring.

Equivalently, one can build a (basic) suffix tree for the string T1$T2#, where `$' is a special terminator for T1 and `#' is a special terminator for T2. The longest common substring is indicated by the deepest fork node that has both `...$...' and `...#...' (no $) beneath it.

Palindromes

A palindrome is a string, P, such that P=reverse(P). e.g. `abba'=reverse(`abba'). e.g. `ississi' is the longest palindrome in `mississippi'.

The longest palindrome of T can be found in O(n) time, e.g. by building the suffix tree for T$reverse(T)# or by building the generalized suffix tree for T and reverse(T).

2 comments:

Luchito said...

Simple implementation. If I want to use it for example to check if a given string is a combination of dictionary words, how could I use this structure. For example:

String in1 = "TaskForSuffixTree"

I read this string into the tree and then check all possibilities against my dictionary (hashed into a Bloom Filter) and compare to determine: "True" because is a combination of 4 dictionary words.. But

String in2 = "tengrtstwdert"

Result = "False".

What would be your suggestion of its usage?

Ram said...

@lusim: I would suggest use Trie data structure for such scenarios instead of a suffix tree. I will post the solution in my next blog.