Up Next
Go up to Top
Go forward to Adaptors

Generic Algorithms

The C++ Standard Library specification of generic algorithms places requirements not only on the syntax (interfaces) and semantics (input/output relations) of the algorithms, but also on the computing time the algorithms are allowed to take. Most of the fundamental data processing algorithms specified have O(N) (linear time) or O(N logN) bounds, but in a few cases O(N2) (quadratic) worst case time behavior is permitted. Such bounds were specified wherever there were algorithms known, such as quicksort, that were significantly faster than other algorithms in the average case, and the cases that could cause quadratic behavior were rarely encountered in practice. It would be much more satisfactory, however, if these bounds could be tightened, disallowing quadratic behavior for these generic algorithms entirely. The following papers show how this goal can be achieved.

  • David R. Musser, Introspective Sorting and Selection Algorithms (draft April 1996, revised December 1996, final version published in Software--Practice and Experience, (8): 983-993 (1997)), presents a new, hybrid sorting algorithm, introsort (for introspective sort), that behaves almost exactly like median-of-3 quicksort for most inputs (and is just as fast) but which is capable of detecting when partitioning is tending toward quadratic behavior. By switching to heapsort in those situations, introsort achieves the same O(N logN) time bound as heapsort but is almost always faster than just using heapsort in the first place. Detailed performance measurements reported in the paper were conducted with the aid of the new counting components discussed in a later section.
  • The same paper describes a new selection algorithm, introselect (for selecting the k-th largest element of a sequence) based on an introspective approach similar to that of introsort. Its worst case computing time is O(N) instead of the O(N2) bound of Hoare's find algorithm.
  • David R. Musser and Gor V. Nishanov, A Fast Generic Sequence Matching Algorithm (December 1997; also available, without appendices, as TR 97-11), describes a string matching--and more generally, sequence matching--algorithm that has a linear worst-case computing time bound, a low worst-case bound on the number of comparisons (2N), and sublinear average-case behavior that is better than that of the fastest versions of the Boyer-Moore algorithm. The algorithm retains its efficiency advantages in a wide variety of sequence matching problems of practical interest, including traditional string matching; large-alphabet problems (as in Unicode strings); and small-alphabet, long-pattern problems (as in DNA searches). Since it is expressed as a generic algorithm for searching in sequences over an arbitrary type T, it is well suited for use in generic software libraries such as the C++ Standard Template Library. The algorithm was obtained by adding to the Knuth-Morris-Pratt algorithm one of the pattern-shifting techniques from the Boyer-Moore algorithm, with provision for use of hashing in this technique. In situations in which a hash function or random access to the sequences is not available, the algorithm falls back to an optimized version of the Knuth-Morris-Pratt algorithm. The source files for the algorithms, test programs, and test data are available.
Based on these results, quadratic computing time behavior could now be banished from the STL components! Unfortunately it is too late to change the C++ Standard, but perhaps compiler and library vendors will adopt the new algorithms voluntarily. Silicon Graphics has already adopted introsort instead of quicksort for the implementation of sort in SGI STL, and it has propagated from there to GNU C++ and to Boris Fomitchev's STLport (which makes SGI STL available on many different platforms).


 

Up Next