OpenFPGA Membership |
OpenFPGA Benchmarks Benchmark #1 Smith-Waterman The Smith-Waterman algorithm is a dynamic program used to find regions of similarity between two sequences. It is used to compute local alignment between nucleotide sequences, or between amino acid sequences. For this benchmark, we will assume that one of the inputs is the alphabet used is specified at run time (max 32 characters) and could be converted to a sequence of 5-bit integers as part of the preprocessing (in the CPU) before the data is passed to the FPGA. Also, one of these sequences, the reference sequence, is no longer than 10M characters long. The second sequence, the test sequence, is unbounded, and is expected to be streamed into the FPGA. The similarity metric allows not only replacement of characters, but also insertion and deletions (called gaps, since they represent a gap in one of the two sequences). It is usually considered less significant to extend a gap, once in a gap, than to initially enter the gap. Thus, two penalties are provided. The expected inputs for this benchmark are:
The expected output will be a file containing for each reported match, the start and end in each sequence, along with the match score. (Since the detailed match path is not required, this eliminates the need for backtracking (the start point can be carried as part of the state). The exact path can be computed off-line, in the rare case that a match is identified, and thus is not part of the benchmark.) Let R[i] be the i’th character in the reference sequence, and T[j] be the j’th character in the test sequence. Let S[i,j] be the state at position i,j. This state will contain the following information: The values in S[i,j] can be defined as follows: For i=0 or j=0: P[i,j] = G[i,j]=0 PS[i,j]= GS[i,j]=i,j Else: G[i,j] = max of 0 => GS[i,j]=i,j G[i-1,j]+GE=> GS[i,j]=GS[i-1,j] P[i-1,j]+GP=> GS[i,j]=PS[i-1,j] G[i,j-1]+GE=> GS[i,j]=GS[i,j-1] P[i,j-1]+GP=> GS[i,j]=PS[i,j-1] P[i,j] = max of G[i,j] => PS[i,j]=GS[i,j] P[i-1,j-1]+K[i,j]=> PS[i,j]=PS[i-1,j-1] Implementation note: Note that the calculation of S[i,j] and S[i+1,j-1] can be calculated at the same time, since they do not depend on each other. Thus the usual approach is to calculate at step t S[i,t-i] for all i. |