Silicon Graphics, Inc.

Strings in SGI STL

This is an attempt to answer some of the questions related to the use of strings with SGI STL.

Why does the SGI STL not include a standard-conforming string package?

There is currently no approved standard for either C++ or its library. There is a second committee draft, which has been made available for public comment. Unfortunately, the string class interface is arguably the least satisfactory part of the current draft. We do not consider the basic_string template, as defined in the current draft, to be usable. We expect the situation to improve as the standard is finalized.

What's wrong with the string class defined by the standard?

There are several problems, but the most serious one is the specification for lifetimes of references to characters in a string. The current draft standard disallows the expression s[1] == s[2] where s is a nonconstant string. This is not simply an oversight; current reference-counted implementations may fail for more complicated examples. They may fail even for s[1] == s[2] if the string s is simultaneously examined (not necessarily modified) by another thread. It is hard to define precisely what constitutes a correct use of one of the current reference-counted implementations. It currently appears that any defensible solution involves a significant interface change.
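One way the reference lifetime problem can surface, even in a single thread, is sketched below. CowString is a hypothetical toy copy-on-write class written only for this illustration; it is not the draft's basic_string or any shipping implementation, and real implementations differ in detail. The sketch assumes a modern compiler with std::shared_ptr standing in for a hand-written reference count.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy copy-on-write string, for illustration only.
class CowString {
public:
    explicit CowString(const char* s)
        : buf_(std::make_shared<std::vector<char> >()) {
        for (; *s != '\0'; ++s) buf_->push_back(*s);
    }
    // The implicit copy constructor shares the buffer and bumps the count.

    // Read-only access never unshares.
    char at(std::size_t i) const { return (*buf_)[i]; }

    // Non-const access unshares first if the buffer is shared, then
    // hands out a reference into the buffer.
    char& operator[](std::size_t i) {
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::vector<char> >(*buf_);
        return (*buf_)[i];
    }

private:
    std::shared_ptr<std::vector<char> > buf_;
};

// Returns the character that an apparently independent copy ends up holding.
char stale_reference_demo() {
    CowString s("abc");
    char& r = s[1];   // s is unshared here, so r points into s's buffer
    CowString t(s);   // t now shares that same buffer
    r = 'x';          // meant to change only s ...
    return t.at(1);   // ... but the write is visible through t as well
}
```

In this sketch the reference r outlives the point at which the buffer becomes shared, so a write through it changes both strings. Any rule that makes such code well defined has to say exactly when references like r stop being valid, which is the part of the interface in question.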

I have been using a reference counted implementation, and it works fine. Why haven't I seen problems?

The current implementations do work correctly, most of the time. But there may be some sequential programs that fail. A program may no longer work when ported between platforms that both claim to support the same string interface. Multithreaded applications will fail with very small probability, perhaps once every few months. But it is likely that a large fraction of multithreaded clients will fail occasionally, thus making such a library completely inappropriate for multithreaded use.

So what should I use to represent strings?

There are several possible options, which are appropriate under different circumstances:

Ropes
Use the rope package provided by SGI STL. This provides all functionality that's likely to be needed. Its interface is similar to the current draft standard, but different enough to allow a correct and thread-safe implementation. It should perform reasonably well for all applications that do not require very frequent small updates to strings. It is the only alternative that scales well to very long strings, i.e. that could easily be used to represent a mail message or a text file as a single string.
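A minimal sketch of rope usage follows. It assumes GCC's libstdc++, which ships a descendant of the SGI rope as an extension in <ext/rope> under the __gnu_cxx namespace; with the original SGI STL the header is <rope>, and other toolchains may not provide it at all. The function names build_message and insert_greeting are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ext/rope>  // assumption: libstdc++ extension header; SGI STL uses <rope>

using __gnu_cxx::crope;  // crope is rope<char>

// Build a longer text by concatenation. The result refers to the
// pieces rather than copying every character into one flat buffer.
crope build_message() {
    crope msg("Dear reader,\n");
    msg.append("The body of a mail message could go here.\n");
    crope signature("Regards\n");
    return msg + signature;
}

// Insert text in the middle of a rope. The argument is left untouched;
// the copy shares structure with it.
crope insert_greeting(const crope& text) {
    crope result(text);
    result.insert(4, "est");  // "Dear reader" becomes "Dearest reader"
    return result;
}
```

Concatenation, insertion, and substr are the operations a rope is built around; c_str() is available when a flat C string is needed, at the cost of flattening the rope.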

The disadvantages are:

C strings
This is likely to be the most efficient way to represent a large collection of very short strings. It is by far the most space efficient alternative for small strings. For short strings, the C library functions in <string.h> provide an efficient set of tools for manipulating such strings. They allow easy communication with the C library. The primary disadvantages are that

vector<char>
If a string is treated primarily as an array of characters, with frequent in-place updates, it is reasonable to represent it as vector<char> or vector<wchar_t>. The same is true if it will be modified by STL container algorithms. Unlike C strings, vectors handle internal storage management automatically, and operations that modify the length of a string are generally more convenient.
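As a rough illustration of this style, the sketch below treats a vector<char> as a mutable character array and applies generic algorithms to it. The alias CharVec and the helper names are hypothetical, introduced only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstring>
#include <vector>

typedef std::vector<char> CharVec;  // hypothetical alias for this sketch

// Copy the characters of a C string (without the terminator) into a vector.
CharVec from_c_string(const char* s) {
    return CharVec(s, s + std::strlen(s));
}

// Compare a vector's contents with a C string.
bool same_text(const CharVec& v, const char* s) {
    return v.size() == std::strlen(s) && std::equal(v.begin(), v.end(), s);
}

// Upper-case one character; the cast avoids passing a negative value to toupper.
char upper(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

CharVec edit_demo() {
    CharVec v = from_c_string("hello world");
    v[0] = 'j';                                            // in-place update: "jello world"
    std::reverse(v.begin(), v.begin() + 5);                // "ollej world"
    std::transform(v.begin(), v.end(), v.begin(), upper);  // "OLLEJ WORLD"
    const char* tail = "!!";
    v.insert(v.end(), tail, tail + 2);                     // grows automatically: "OLLEJ WORLD!!"
    return v;
}
```

Note that the vector holds no terminating null character, so a terminator has to be appended explicitly before the data is handed to a C library function.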

Disadvantages are:

What about mstring.h, as supplied with SGI's 7.1 compiler?

This package was a minimal adaptation of the freely available Modena strings package. It was intended as a stopgap. We do not intend to develop it further.

It shares some of the reference lifetime problems of other implementations that try to conform to the draft standard. Its exact semantics were never well-defined. Under rare conditions, it will have unexpected semantics for single-threaded applications. We strongly discourage use for multithreaded applications.


Copyright © 1996 Silicon Graphics, Inc. All Rights Reserved. Trademark Information