Strings in SGI STL
This is an attempt to answer some of the questions related to the use
of strings with SGI STL.
Why does the SGI STL not include a standard-conforming string package?
There is currently no approved standard for either C++ or its library.
There is a second committee draft, which has been made available for
public comment. Unfortunately, the string class interface is arguably
the least satisfactory part of the current draft. We do not consider
the basic_string template, as defined in the current draft, to
be usable. We expect the situation to improve as the standard is
finalized.
What's wrong with the string class defined by the standard?
There are several problems, but the most serious one is the
specification for lifetimes of references to characters in a string.
The current draft standard disallows the expression s[1] ==
s[2] where s is a nonconstant string. This is not simply
an oversight; current reference counted implementations may fail for
more complicated examples. They may fail even for s[1] ==
s[2] if the string s is simultaneously examined
(not necessarily modified) by another thread. It is hard to define
precisely what constitutes a correct use of one of the current
reference counted implementations. It currently appears that any
defensible solution involves a significant interface change.
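The following is a minimal sketch of how a reference lifetime problem
can arise in a sequential program. CowString is a hypothetical toy
class written for this illustration (it is not the draft standard's
basic_string, nor any vendor's implementation), and it uses a modern
shared pointer purely for brevity. It assumes the common strategy in
which copies share one buffer and a non-const operator[] unshares the
buffer before returning a reference.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical toy copy-on-write string, for illustration only.
class CowString {
public:
    explicit CowString(const char* text)
        : rep_(std::make_shared<std::string>(text)) {}

    // Copying shares the representation; only the count changes.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // Read-only access never copies.
    char operator[](std::size_t i) const { return (*rep_)[i]; }

    // Non-const access unshares first, then hands out a reference
    // into the (now private) buffer. Single-threaded sketch only.
    char& operator[](std::size_t i) {
        if (rep_.use_count() > 1)
            rep_ = std::make_shared<std::string>(*rep_);
        return (*rep_)[i];
    }

private:
    std::shared_ptr<std::string> rep_;
};

// Returns true if a write through a saved reference to s also
// changes a copy that was made after the reference was taken.
bool stale_reference_leaks() {
    CowString s("abc");
    char& r = s[1];           // s is unshared, so r points into s's buffer
    CowString t = s;          // t now shares that same buffer
    r = 'X';                  // meant to modify s only
    const CowString& ct = t;  // force the read-only operator[]
    return ct[1] == 'X';      // t was modified as well
}
```

In this toy, the reference r outlives the moment at which the buffer
became shared, so the write bypasses the unsharing check. A real
implementation would have to either forbid such uses or give up
sharing once a reference has escaped, which is why the lifetime rules
matter.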
I have been using a reference counted implementation, and it works fine.
Why haven't I seen problems?
The current implementations do work correctly, most of the time. But
there may be some sequential programs that fail. A program may no
longer work when ported between platforms that both claim to support
the same string interface. Multithreaded applications will fail with
very small probability, perhaps once every few months. But it is likely
that a large fraction of multithreaded clients will fail occasionally,
thus making such a library completely inappropriate for multithreaded
use.
So what should I use to represent strings?
There are several possible options, which are appropriate under
different circumstances:
-
Ropes
-
Use the rope
package provided by SGI STL. This provides all functionality that's
likely to be needed. Its interface is similar to the current draft
standard, but different enough to allow a correct and thread-safe
implementation. It should perform reasonably well for all applications
that do not require very frequent small updates to strings. It is the
only alternative that scales well to very long strings, i.e. one
that could easily be used to represent a mail message or a text file
as a single string.
The disadvantages are:
-
Single character replacements are slow. Consequently STL algorithms are
likely to be slow when updating ropes. (Insertions
near the beginning take roughly the same amount of time as single
character replacements, and much less time than corresponding
insertions for the other string alternatives.)
-
The rope
implementation stretches current compiler technology. Portability and
compilation time may be an issue in the short term. Pthread performance
will be an issue on non-SGI platforms for which machine-specific fast
reference counting code has not yet been written.
-
C strings
-
This is likely to be the most efficient way to represent a large
collection of very short strings. It is by far the most space efficient
alternative for small strings. For short strings, the C library
functions in <string.h> provide an efficient set of tools for
manipulating them. C strings also allow easy communication with the C
library. The primary disadvantages are:
-
Operations such as concatenation and substring extraction are much more
expensive than for ropes if the strings
are long. A C string is not a good representation for a text file in an
editor.
-
The user needs to be aware of sharing between string representations.
If strings are assigned by copying pointers, an update to one string
may affect another.
-
They provide no help in storage management. This may be a major issue,
though not if a garbage collector is used.
-
vector<char>
-
If a string is treated primarily as an array of characters, with
frequent in-place updates, it is reasonable to represent it as
vector<char> or vector<wchar_t>. The same is
true if it will be modified by STL container algorithms. Unlike C
strings, vectors handle internal storage management automatically, and
operations that modify the length of a string are generally more
convenient.
Disadvantages are:
-
Vector assignments are much more
expensive than C string pointer assignments; there is no possibility
for sharing string representations.
-
Most operations on entire strings (e.g. assignment,
concatenation) do not scale well to long strings.
-
A number of standard string operations (e.g.
concatenation and substring) are not provided with the usual syntax,
and must be expressed using generic STL algorithms. This is usually not
hard.
-
Conversion to C strings is currently slow, even for short strings. That
may change in future implementations.
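As a rough illustration of the C string and vector<char> options, the
sketch below expresses concatenation, substring extraction, and
conversion to a C string with generic container operations, and shows
the sharing pitfall of C string pointer assignment. The CharVec alias
and all helper function names are hypothetical, introduced only for
this example; only standard vector, <algorithm>, and <cstring>
facilities are used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

typedef std::vector<char> CharVec;  // hypothetical alias for this sketch

// Build a vector<char> from a C string (no terminating NUL is stored).
CharVec from_c_string(const char* s) {
    return CharVec(s, s + std::strlen(s));
}

// Concatenation has no dedicated syntax; insert() at end() does the job.
CharVec concat(const CharVec& a, const CharVec& b) {
    CharVec result(a);
    result.insert(result.end(), b.begin(), b.end());
    return result;
}

// Substring extraction via the iterator-range constructor.
// The caller must ensure pos + len <= v.size().
CharVec substring(const CharVec& v, std::size_t pos, std::size_t len) {
    return CharVec(v.begin() + pos, v.begin() + pos + len);
}

// In-place update with a generic STL algorithm.
void replace_char(CharVec& v, char from, char to) {
    std::replace(v.begin(), v.end(), from, to);
}

// Conversion to a C string copies every character and appends a NUL;
// &buf[0] of the result can then be passed where a const char* is needed.
CharVec to_c_string_buffer(const CharVec& v) {
    CharVec buf(v);
    buf.push_back('\0');
    return buf;
}

// C strings: assigning pointers shares the representation, so an
// update through one name is visible through the other.
bool pointer_assignment_aliases() {
    char a[] = "cat";
    char* b = a;   // no copy: a and b name the same storage
    b[0] = 'b';    // update through b
    return std::strcmp(a, "bat") == 0;
}
```

Every helper that returns a CharVec copies characters, which is the
cost the disadvantages above refer to. The pointer assignment in the
last function copies nothing, which is both why it is cheap and why
the update shows up under both names.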
What about mstring.h, as supplied with SGI's 7.1 compiler?
This package was a minimal adaptation of the freely available Modena
strings package. It was intended as a stopgap. We do not intend to
develop it further.
It shares some of the reference lifetime problems of other
implementations that try to conform to the draft standard. Its exact
semantics were never well-defined. Under rare conditions, it will have
unexpected semantics for single-threaded applications. We strongly
discourage use for multithreaded applications.
Copyright © 1996 Silicon Graphics, Inc. All Rights Reserved.