Linked Lists I
Advanced Programming---CSCI-6090
See also SGI's STL Programmer's Guide
description of lists; or Austern, Generic Programming and the STL,
Chapter 5 and Sections 9.1, 9.2,
and 16.1.2.
Linked lists are one of the most useful ways of representing
linear sequences of data. Linked lists are the sequence representation
of choice when many insertions and/or deletions are required in
interior positions and there is little need to jump around randomly
from one position to another. That's because
- Insertion at any position in a linked list is a
constant time operation, because it only involves changing a
constant number of pointers. With an array (or STL vector or deque)
representation, on the other hand, insertion requires moving all the
elements past the point of insertion over to make room for the new
element, which takes linear time. A similar situation holds
with deletion.
- On the other hand, accessing (getting or storing) the value at a
given position can be much slower for linked lists than for arrays.
Given a position p in an array, if we need to obtain the
value at some position p + k, it can be done in constant time
(i.e., independent of the size of k), but in a linked list
representation it takes linear time---linearly proportional to the
size of k. (With an array the address of the
(p + k)-th position can be calculated as the address of
position p plus k times the size of each value, so
just with one multiplication and one addition operation; but with a
linked list the only way to advance to position p + k from
p is step by step, following links k times.)
We now want to look beyond these basic facts and examine in
detail how linked-list manipulation can be implemented. The issues we
should try to understand about linked list implementations include:
- Advantages/disadvantages of major implementation design choices; e.g.,
- doubly-linked lists versus singly-linked
- circular versus noncircular lists
- inclusion or noninclusion of header nodes
- How iterators can be defined for linked lists, to provide
a convenient means of traversal.
- What special sequence operations are permitted by
linked lists. ("Special" in the sense that they can be implemented
very efficiently with linked lists, such as the splice
operations in the STL list class.)
- What special storage management techniques can be used to
improve performance.
- What special considerations are necessary for implementation of
generic list structures (i.e., where the element type is a template
parameter) so that they work correctly and efficiently no matter what
type is substituted.
We will
look at the Hewlett-Packard (HP) STL list class source
code, or actually a slightly reorganized and simplified version of
it. (HP STL was the basis for most STL implementations. Later
implementations of STL lists differ most significantly from HP's
in the way they handle storage management.)
First, here is an overview of how the HP STL list class
implementation stacks up in terms of the issues mentioned above:
- Lists are doubly-linked, circular, and have a header node.
- Iterators are provided with two nested classes, named
iterator and const_iterator, which satisfy the STL
requirements for bidirectional iterators (e.g., they define
both ++ and -- operators so that efficient traversal
is possible in either direction). The const_iterator class
provides operations analogous to const pointers (which really means
pointer-to-const), so that one gets the full benefit of C++ compiler
checks for attempts to modify constant objects. There are also
reverse iterators defined by typedefs using STL's
reverse_iterator adaptor.
- There are several special list operations provided, including
splicing (transferring all or part of one list to another in constant
time by relinking) and special merge and sort algorithms that take
advantage of the linked representation.
- Special storage management techniques are used so that
new and delete do not have to be called for each
node. The additional complication
can be dealt with in a separate memory management class.
- Throughout the code, implementation techniques have been chosen
so that no matter what type of elements are stored in the list the
operations work correctly and efficiently.
For this discussion we will refer to a modified version
of the Hewlett-Packard STL source code for the list class, contained in the
following files:
- list.h: The list class (with only the
interface, not the body, of most member functions).
- listiter.h: iterator and
const_iterator classes and typedefs
of reverse_iterator, and const_reverse_iterator.
These are nested inside list via an #include directive.
- list.C: The bodies of all member functions
not defined in list.h.
- memman.h: Defines an
auxiliary class to handle list memory management efficiently.
This code is based on the Hewlett-Packard STL distribution and is
functionally identical to it except that all dependence on
allocators has been removed in order to make the code more
self-contained. This change does not affect the efficiency of the
implementation; it only makes it a little less flexible (the use of
allocators in STL containers makes them easily adaptable to different
memory models such as those supported by Borland's older 16-bit
compilers). There are also some changes in organization of the code to
make it easier to read and understand, the main one being that the
special list memory management code has been moved out of the list
class into the separate list_memory_manager class in
memman.h.
STL List Node Representation
The list class definition begins with the definition of the
node structure used for the doubly-linked representation:
template
class list {
protected:
// Representation (doubly linked):
struct list_node {
list_node* next;
list_node* prev;
T data;
};
...
This struct is defined in a protected
part of the class; it is not public because it is better not
to give the list user direct access to the representation (information
hiding), and protected is better than private in
that any class derived from list will have access to the
representation.
Type definitions
The next part of the list class definition is a series of typedefs
of identifiers that are required to be defined by the STL specifications
for sequence containers (value_type, reference, const_reference,
size_type, and difference_type) or for internal use (pointer, link_type,
and list_memory_manager_type).
...
public:
// Types:
typedef T value_type;
typedef T& reference;
typedef const T& const_reference;
typedef size_t size_type;
typedef ptrdiff_t difference_type;
private:
typedef T* pointer;
typedef list_node* link_type;
typedef list_memory_manager list_memory_manager_type;
...
The last three types are intended for internal use only and thus are
put in the private part of the class definition.
Data members
Next we have the data members of list objects:
...
protected:
link_type node;
size_type length;
static size_type number_of_lists;
static list_memory_manager_type manager;
...
We will see that node is a pointer to the header node of the
list and length is maintained as the number of data elements
in the list. The static member number_of_list keeps track of
how many lists are in existence, so that when the number becomes zero
the storage manager can clean everything up. Finally, storage
management routines are accessed through the static member
manager. (Recall that a static member is shared by all
objects of the class, rather than one per object.)
Iterators
The next section consists of a nested class definition or a typedef
for each of the four kinds of list iterators.
...
public:
// Iterator types (defined by nested classes and typedefs)
// class iterator;
// class const_iterator;
// class reverse_iterator;
// class const_reverse_iterator;
#include "listiter.h"
...
We will come back to these definitions later.
Member functions
The rest of the class definition contains the interfaces of all
of the member functions, including constructors, destructor,
assignment and swap operations, queries (or accessors), insertion,
erasure, and special list functions (splicing, removal, merging,
and sorting). In a few cases the bodies are given inline, but
for most of the operations the bodies are given in the separate
file list.C.
Constructors
Turning to the file list.C, let's see how the
default constructor is implemented:
...
// Default constructor:
template
inline list::list() : length(0) {
++number_of_lists;
node = manager.get_node();
node->next = node;
node->prev = node;
}
...
This constructor initializes the list with 0 for its length
member, increments number_of_lists (which is initialized to
zero with a static member initialization at the end of list.C),
initializes node with a pointer to a new list_node returned
by manager.get_node, and sets the next and
prev members of the node to point to the node itself. For
the moment, think of manager.get_node() as being a call to
new list_node; we will go into how it really works later.
The other constructors are defined in terms of one of
the insert member functions, so let us turn now
to those functions.
Insertion
For understanding the code in the insert member functions we need to
know just a little about the representation of iterators, namely that
each iterator object maintains its current place in a list
with a data member called current holding
a link_type pointer to a node of the list.
template
inline list::iterator list::insert(iterator position, const T& x) {
link_type tmp = manager.get_node();
new (&(tmp->data)) T(x);
tmp->next = position.current;
tmp->prev = (position.current)->prev;
(position.current)->prev->next = tmp;
(position.current)->prev = tmp;
++length;
return tmp;
}
The line
new (&(tmp->data)) T(x);
is a little unusual and will be more easily understood
after we've gone over memory management.
For now let's just assume it is equivalent to
tmp->data = x;
The rest of the code contains a lot of pointer manipulation, but it's
easy to see what it does if we draw the nodes and pointers and see how
they change as each assignment is executed. The nice part about it
is that there are no special cases to check for; that's one of
the advantages of having a header node.
Exercise: Do this hand simulation of the insert operation on an
empty list (as initialized above with list()), T
equal to int, position equal to begin() and
x equal to 3.
The other STL list function implementations
are in file list.C.
Insert members
So far we've looked at the implementation of the basic list
insert member function. The list class also defines several
other insert members and implements them in terms of the
basic one:
template
void list::insert(iterator position, const T* first, const T* last) {
while (first != last) insert(position, *first++);
}
template
void list::insert(iterator position, const_iterator first,
const_iterator last) {
while (first != last) insert(position, *first++);
}
template
void list::insert(iterator position, size_type n, const T& x) {
while (n--) insert(position, x);
}
Other constructors
The other list constructors besides the default constructor
are defined in terms of the above insert members:
// Construct a list of n elements, each a copy of a given value:
template
//inline list::list(size_type n, const T& value = T()) : length(0) {
inline list::list(size_type n, const T& value) : length(0) {
++number_of_lists;
node = manager.get_node();
node->next = node;
node->prev = node;
insert(begin(), n, value);
}
// Construct a list initialized to a given range of values:
template
inline list::list(const T* first, const T* last) : length(0) {
++number_of_lists;
node = manager.get_node();
node->next = node;
node->prev = node;
insert(begin(), first, last);
}
// Copy constructor:
template
inline list::list(const list& x) : length(0) {
++number_of_lists;
node = manager.get_node();
node->next = node;
node->prev = node;
insert(begin(), x.begin(), x.end());
}
Before looking at the destructor, let's examine the erase
operations, since the destructor uses one of them.
Erasure
Erasure of an item pointed to by an iterator is done by
relinking the surrounding nodes to point to each other
instead of the node containing the item, then cleaning up that node
and returning it to the storage manager:
template
inline void list::erase(iterator position) {
(position.current)->prev->next = (position.current)->next;
(position.current)->next->prev = (position.current)->prev;
(position.current)->data.~T();
manager.put_node(position.current);
--length;
}
Note that if T is instantiated with a built in type such as
int, the destructor call is to ~int(), a function
that doesn't exist. This looks like a problem, but in fact the
compiler is supposed to ignore it and generate no code for the call.
(xlC issues a warning, but it shouldn't.)
Instead of the destructor call and the call of manager.put_node
the more usual storage management procedure would be to call
delete position.current;
This would first call (position.current)->~link_node(), and then
return the node to the standard storage deallocator. In struct
link_node (defined in list.h), we didn't
define ~link_node(), but the default destructor definition
supplied by the compiler calls the destructor of each member
that can have one. In this case, the data member is the only
one that can have a destructor (pointers cannot), so the definition
supplied by the compiler is equivalent to
~link_node() { data.~T(); }
So our explicit destructor call
(position.current)->data.~T();
takes care of this chore, and
manager.put_node(position.current);
disposes of the node itself, in a much more efficient way than the
standard storage deallocator would (as we will see when we look at
the list_memory_manager class).
Exercise: Interesting cases to consider are when one or both of
the surrounding nodes are the header (that is, the node being erased
is the first, last, or only data node in the list). Verify by hand
simulation that the code works perfectly well in these cases even
though it has no special checks for them. (This is another advantage
of the header node and also of circular linking.)
There is one other erase member function:
template
void list::erase(iterator first, iterator last) {
while (first != last) erase(first++);
}
Destructor
The list destructor uses the second erase member to
erase all elements of the list (thus calling ~T() on each
data element). It then returns the header node to the storage
allocator and decrements number_of_lists. If this was the
last list, the next line "turns out the lights" (deallocates all of
the space allocated for list nodes).
template
inline list::~list() {
erase(begin(), end());
manager.put_node(node);
if (--number_of_lists == 0)
manager.deallocate_buffers();
}
Exercise: Consider the case in which T is list<int>
, as in
list< list > list1;
Suppose several list<int> values have been created and
inserted in list1. What happens, in terms of erase and
destructor calls, when list1 goes out of scope?
Assignment and swap operations
The code for the assignment operation is interesting because
of the way it reuses list nodes that make up the left hand
operand of the assignment.
template
list& list::operator=(const list& x) {
if (this != &x) {
iterator first1 = begin();
iterator last1 = end();
const_iterator first2 = x.begin();
const_iterator last2 = x.end();
while (first1 != last1 && first2 != last2)
*first1++ = *first2++;
if (first2 == last2)
erase(first1, last1);
else
insert(last1, first2, last2);
}
return *this;
}
Note that the code first checks if the left and right operands of
operator= are the same object; if so, it does nothing.
This check is necessary because executing the rest of the code would
be incorrect if both operands were the same object.
The first part of the code, through the while loop, copies as much
of the right hand list (x) as will fit into the nodes of the
left hand list (this). If the end of x is reached
before the end of the left hand list, the remainder of the left hand
list is erased; otherwise, the remainder of x is copied onto
the end of the left hand list.
This technique minimizes the number of calls to the storage manager.
Swap operation
Swapping two lists could be done with the STL generic swap
function:
template
inline void swap(T& a, T& b) {
T tmp = a;
a = b;
b = tmp;
}
Note, though, that when T is list<U> this uses
three list<U> assignments, each of which involves
copying all of the U values in one list to another. It is
much more efficient just to swap the data members of the two lists:
template
inline void list::swap(list& x) {
::swap(node, x.node);
::swap(length, x.length);
}
The notation ::swap indicates that the globally defined
swap function should be used in these calls.
Auxiliary function, transfer, for implementing special list
operations
The transfer operation alters a list by inserting into it a
segment removed from another list. It is not one of the list
public members, but it is used in implementing the public
splice, merge, and reverse operations. It
is similar to insert but involves no copying of data or
allocation of nodes; instead it re-links a segment of nodes out of one list
and into another, modifying both lists (which might in fact be the same
list).
template
inline void list::transfer(iterator position,
iterator first, iterator last) {
(last.current)->prev->next = position.current;
(first.current)->prev->next = last.current;
(position.current)->prev->next = first.current;
link_type tmp = (position.current)->prev;
(position.current)->prev = (last.current)->prev;
(last.current)->prev = (first.current)->prev;
(first.current)->prev = tmp;
}
As with insert, there is a lot of pointer manipulation, but it is
not difficult to see how it works:
Exercise: Do a hand simulation of the transfer
operation on list1 containing 1 2 3 4 and list2
containing 11 12 13 14 15 16, with position pointing to (the
node containing) 2, first pointing to 12, and last
pointing to 15. After the transfer operation, list1
should contain 1 12 13 14 2 3 4 and list2 should contain 11 15
16. Note that the item pointed to by last (15) is not
part of the segment transfered.
The main thing to note about transfer is that since it only
does a constant number of pointer assignments, it is a constant time
operation. If first and last define a long segment
it is much more efficient to move that segment via transfer
than it would be to copy it, since copying would have to take linear
time. On the other hand, using transfer is inappropriate if
the original value of the source list is needed in later calculations.
In that case copying (with insert(position, first, last))
should be done instead.
Splicing
The splice operations, which are described and illustrated in
SGI's STL Programmer's Guide description of
lists, are easily implemented in terms of
transfer.
template
inline void list::splice(iterator position, list& x) {
if (!x.empty()) {
transfer(position, x.begin(), x.end());
length += x.length;
x.length = 0;
}
}
template
inline void list::splice(iterator position, list& x, iterator i) {
iterator j = i;
if (position == i || position == ++j) return;
transfer(position, i, j);
++length;
--x.length;
}
template
inline void list::splice(iterator position, list& x,
iterator first, iterator last) {
if (first != last) {
if (&x != this) {
difference_type n = 0;
distance(first, last, n);
x.length -= n;
length += n;
}
transfer(position, first, last);
}
}
In the last version of splice we have to count the number of
nodes from first to last in order to update the
lengths of the two lists. Note that this is necessary only if two
different lists are involved (determined with the test (&x !=
this)). In this case, splice takes linear, rather than
constant, time.
Sorting
Of the remaining operations the most algorithmically interesting is
sort. It is a form of merge sort, with the merging done
bottom-up, nonrecursively. The key idea of the algorithm is ensuring
that merges are done on two equal-length lists whenever possible, by
maintaining a number of lists separately according to their length
until they can be combined with other lists of the same length. The
lists are maintained in an array called counter, with
counter[i] either empty or holding a list of length 2i. This
array is so named because its changing state resembles the state of a
binary counter as it is being incremented. Merging of two equal-length
lists produces a "carry" into the next position. The carry either
occupies that position if it is currently empty, or it is merged with
the nonempty list in that position, producing another carry, and so
on.
template
void list::sort() {
if (size() < 2) return;
list carry;
list counter[64];
int fill = 0;
while (!empty()) {
carry.splice(carry.begin(), *this, begin());
int i = 0;
while(i < fill && !counter[i].empty()) {
counter[i].merge(carry);
carry.swap(counter[i++]);
}
carry.swap(counter[i]);
if (i == fill) ++fill;
}
for (int i = 1; i < fill; ++i) counter[i].merge(counter[i-1]);
swap(counter[fill-1]);
}
Exercise: Hand simulate this algorithm with a list containing
14 8 9 5 11 2 4 15 1 3 6 12 7. Show the counter array contents
after each iteration of the main while loop.
Exercise: The counter array used in this algorithm
has a fixed size of 64. Is there any danger of running off the
end of the array? Explain.
Exercise: Show that the computing time for this algorithm
is O(n log n).