String Pattern Matching for a Deluge Survival Kit

Alberto Apostolico and Maxime Crochemore

String Pattern Matching concerns itself with algorithmic and combinatorial issues related to matching and searching on linearly arranged sequences of symbols, arguably the simplest possible discrete structures. As unprecedented volumes of sequence data are amassed, disseminated and shared at an increasing pace, effective access to, and manipulation of such data depend crucially on the efficiency with which strings are structured, compressed, transmitted, stored, searched and retrieved. This paper samples from this perspective, and with the authors' own bias, a rich arsenal of ideas and techniques developed in more than three decades of history.

Keywords: String matching, Searching, Indexing, Fingerprinting, Regular expressions, Filtration, Inference, Periodicities, Compression, encoding, Probabilistic automata, Markov chains, Antidictionaries, Association rules.