
# Graph swap matching

In this paper we present a streaming algorithm for Pattern Matching with Swaps (Swap Matching).

## Abstract

The pattern matching problem with swaps is to find all occurrences of a pattern in a text while allowing the pattern to swap adjacent symbols. The goal is to design a fast matching algorithm that takes advantage of the bit parallelism of bitwise machine instructions and has only streaming access to the input. We introduce a new approach to solve this problem based on the graph-theoretic model and compare its performance to previously known algorithms. We also show that an approach using deterministic finite automata cannot achieve similarly efficient algorithms. Furthermore, we describe a fatal flaw in some of the previously published algorithms based on the same model. Finally, we provide an experimental evaluation of our algorithm on real-world data.
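To make the problem statement concrete, here is a minimal brute-force sketch of what counts as an occurrence with swaps. It is not the algorithm from the paper. It assumes the usual reading of the definition, in which each pattern symbol takes part in at most one swap with a neighbour. The names `swap_match_at` and `naive_swap_matches` are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if text[start .. start+m) equals the pattern after swapping
// some disjoint pairs of adjacent pattern symbols (assumed definition).
bool swap_match_at(const std::string& pattern, const std::string& text,
                   std::size_t start) {
    const std::size_t m = pattern.size();
    if (start + m > text.size()) return false;
    // ok[j]: the first j pattern symbols can be aligned with
    // text[start .. start+j), with every started swap already completed.
    std::vector<bool> ok(m + 1, false);
    ok[0] = true;
    for (std::size_t j = 0; j < m; ++j) {
        if (!ok[j]) continue;
        // Symbol j matched in place.
        if (text[start + j] == pattern[j]) ok[j + 1] = true;
        // Symbols j and j+1 matched in swapped order.
        if (j + 1 < m && text[start + j] == pattern[j + 1] &&
            text[start + j + 1] == pattern[j]) {
            ok[j + 2] = true;
        }
    }
    return ok[m];
}

// Collects the 0-based start positions of all occurrences with swaps.
std::vector<std::size_t> naive_swap_matches(const std::string& pattern,
                                            const std::string& text) {
    std::vector<std::size_t> hits;
    if (pattern.empty() || pattern.size() > text.size()) return hits;
    for (std::size_t s = 0; s + pattern.size() <= text.size(); ++s) {
        if (swap_match_at(pattern, text, s)) hits.push_back(s);
    }
    return hits;
}
```

For the pattern `abc` and the text `xbacabcacb`, this reports the start positions 1, 4 and 7 (the windows `bac`, `abc` and `acb`). The window `cba` would not match, because it needs a swap of non-adjacent symbols. The check takes time proportional to the pattern length for every text position and needs random access to the text, which is the cost a bit-parallel streaming algorithm aims to avoid.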

## Availability

The PDF is available from arXiv.

## Code & Other

All related code: codes.tar.gz

Tested on:

- DNA ftp://ftp.ensembl.org/pub/release-90/fasta/homo_sapiens/dna/
- Proteins http://www.data-compression.info/Corpora/ProteinCorpus/
- Bible http://corpus.canterbury.ac.nz/descriptions/large/bible.html

Raw GSM code (used for testing):

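```
// bint (machine-word bit vector), SIGMA (alphabet size) and FOUND (match
// reporting) are not defined in this listing; they are expected from the
// surrounding test code. Assumes p is at most the bit width of bint.
void gsm(string &P, const char *T, int p, int t){
    // Row indices: d = first symbol of a swapped pair, u = second symbol
    // of a swapped pair, m = symbol matched in place.
    const int u = 0, m = 1, d = 2;
    // DMASK initialization: bit i of D[c] is set iff P[i] == c
    int i;
    bint F = 1;
    bint D[SIGMA];
    for(i = 0; i < SIGMA; ++i) D[i] = 0;
    for(i = 0; i < p; ++i){
        D[(unsigned char)P[i]] |= F; // cast keeps the index non-negative
        F = F << 1;
    }
    // GSM initialization
    bint tmp[3]; // temporary vectors
    bint r[3]; // result vectors
    r[0]=r[1]=r[2]=0;
    bint resCheck = ((bint)1) << (p - 1); // bit of the last pattern position
    // GSM execution: one text symbol per iteration
    for(i = 0; i < t; ++i){
        const bint c = D[(unsigned char)T[i]];
        tmp[u] = (r[d] << 1) | 1;               // a swap can only be finished after it was started
        tmp[m] = (r[m] << 1) | (r[u] << 1) | 1; // continue after an in-place match or a finished swap
        tmp[d] = tmp[m];                        // a swap can start wherever an in-place match can
        r[u] = tmp[u] & (c << 1);  // T[i] equals the previous pattern symbol
        r[m] = tmp[m] & c;         // T[i] equals the current pattern symbol
        r[d] = tmp[d] & (c >> 1);  // T[i] equals the next pattern symbol
        if(((r[m] | r[u]) & resCheck) != 0) {
            FOUND(i - p + 2); // 1-based start of the occurrence ending at T[i]
        }
    }
}
```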
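The listing above depends on `bint`, `SIGMA` and `FOUND`, which are not shown on this page. As a rough, self-contained sketch for experimenting with the same recurrences, the variant below fills them in with assumptions: `bint` is taken to be a 64-bit word, `SIGMA` a byte alphabet of 256 symbols, and the hypothetical wrapper `gsm_positions` collects the reported positions in a vector instead of calling `FOUND`. These choices are illustrative and may differ from the definitions in codes.tar.gz.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using bint = std::uint64_t;  // assumption: one 64-bit machine word
constexpr int SIGMA = 256;   // assumption: byte alphabet

// Hypothetical wrapper around the same update rules as gsm(); returns the
// 1-based start positions that gsm() would pass to FOUND.
std::vector<int> gsm_positions(const std::string& P, const std::string& T) {
    const int p = static_cast<int>(P.size());
    const int t = static_cast<int>(T.size());
    assert(p >= 1 && p <= 64);  // the pattern must fit in one word

    // Bit i of D[c] is set iff P[i] == c.
    bint D[SIGMA] = {0};
    for (int i = 0; i < p; ++i) {
        D[static_cast<unsigned char>(P[i])] |= bint(1) << i;
    }

    std::vector<int> hits;
    bint u = 0, m = 0, d = 0;  // second half of swap, in place, first half of swap
    const bint resCheck = bint(1) << (p - 1);
    for (int i = 0; i < t; ++i) {
        const bint c = D[static_cast<unsigned char>(T[i])];
        const bint afterSwapStart = (d << 1) | 1;
        const bint afterMatch = (m << 1) | (u << 1) | 1;
        u = afterSwapStart & (c << 1);  // T[i] equals the previous pattern symbol
        m = afterMatch & c;             // T[i] equals the current pattern symbol
        d = afterMatch & (c >> 1);      // T[i] equals the next pattern symbol
        if (((m | u) & resCheck) != 0) hits.push_back(i - p + 2);
    }
    return hits;
}
```

Calling `gsm_positions("abc", "xbacabcacb")` yields the 1-based positions 2, 5 and 8, which correspond to the windows `bac`, `abc` and `acb`. The loop reads each text symbol once and keeps only three words of state, which is what makes the approach suitable for streaming input.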