Linked Lists II
Advanced Programming---CSCI-6090
In looking at the details of the Hewlett-Packard STL list
class source code, we've gone over the class data and function
members and some of the type definitions inside the class.
Now let's look at two other major features of this implementation:
- The iterator class definitions and typedefs that are nested
inside the list class. These are defined in file
listiter.h.
- The special storage management code in the
list_memory_manager class defined in file
memman.h.
List iterator types
Iterator types define a convenient and standardized means of
traversing the items in a sequence. STL requires that these
types present an interface with the same operators
one uses with ordinary C++ pointers; in particular, they
must define ++ as an operation for advancing to the next
position in a sequence and * for getting a reference to
the current position so it can be stored into or accessed.
We will now see exactly how this is done in the case of
iterators for traversing STL lists.
Class iterator
Here is the definition of the basic list iterator type,
slightly simplified:
class iterator {
protected:
link_type current;
public:
iterator() {}
bool operator==(const iterator& x) const {
return current == x.current;
}
bool operator!=(const iterator& x) const {
return current != x.current;
}
reference operator*() const { return current->data; }
iterator& operator++() { // prefix ++
current = current->next;
return *this;
}
iterator operator++(int) { // postfix ++
iterator tmp = *this;
++(*this);
return tmp;
}
iterator& operator--() { // prefix --
current = current->prev;
return *this;
}
iterator operator--(int) { // postfix --
iterator tmp = *this;
--(*this);
return tmp;
}
};
This class defines one data member,
link_type current;
Recall that link_type is defined in class list as
list_node*. Thus iterator objects are
represented by ordinary pointers, but instead of using
the predefined meanings of ++, *, etc.,
on these pointers, we define these operators on the iterator
objects differently.
The definition of * is
reference operator*() const { return current->data; }
where the return type reference is defined in class
list as T&. The operator thus returns a reference to
the data member of the list_node object to which
current points. We can use this reference to get the current
value from the data member in the node or to store a new value into
it. Consider, for example,
list list1;
list1.push_back('a');
list::iterator i = list1.begin();
// Now iterator i points to the list node containing 'a'
cout << *i;
*i = 'b';
cout << *i;
This prints
ab
There are two definitions of ++ on iterator objects:
iterator& operator++() { // prefix ++
current = current->next;
return *this;
}
iterator operator++(int) { // postfix ++
iterator tmp = *this;
++(*this);
return tmp;
}
The first of these functions is the one that is called when
++ is used as a prefix operator, and the second is one called
when ++ is used as a postfix operator. (The use of the
int parameter in the second definition is just an artificial
device in the language design to give the compiler a way to
distinguish the two definitions. In calls of these functions no
int argument is used; we just write ++i for the
prefix version or i++ for the postfix version.) In both
cases, the internal current pointer is advanced to the next
node in the list. With the prefix version, the iterator object
returned is the original one with its new internal pointer value. With
the postfix version, on the other hand, a copy is made of the original
iterator (*this), the original iterator's current pointer is
advanced (with ++(*this), which is a call of the prefix
version), and the copy is returned. This semantics for the return
values is consistent with the way prefix and postfix ++ are
defined on ordinary pointers.
Exercise: Draw before and after diagrams of a list and two
iterators i and j both pointing to the same node in
the list, showing the difference between p = ++i and q =
j++, where p and q are two other iterators.
Exercise: Note that in the prefix version the return type is
iterator& but in the postfix version the return type is
just iterator. Explain why these are the most appropriate
return types. Hint: assume we want
++(++i);
to advance i twice. Would
++(i++);
advance i twice? Would it if postfix ++ returned a reference?
For -- there are again two definitions, set up exactly
analogously to ++, but using the list node's prev
pointer instead of next.
The actual definition of class iterator in class
list has a few more flourishes at its beginning:
class const_iterator; // incomplete definition, for use in friend decl.
class iterator : public std::iterator {
friend class list;
friend class const_iterator;
protected:
link_type current;
iterator(link_type x) : current(x) {}
public:
... rest is the same as discussed above, with one exception
... discussed below
};
Here the iterator class definition is preceded by an incomplete
definition (that's exactly what it's called in C++ terminology):
class const_iterator;
The actual, complete definition of class const_iterator
follows that of class iterator. The incomplete definition is
given so that class const_iterator can be declared a friend
of class iterator, which is necessary because one of
const_iterator's constructors needs access to the current
internal data member of iterator.
We also see that class iterator is derived from struct
iterator:
class iterator : public std::iterator {
...
};
The latter class is defined in the standard header iterator
as follows:
template
struct iterator {
typedef _Category iterator_category;
typedef _Tp value_type;
typedef _Distance difference_type;
typedef _Pointer pointer;
typedef _Reference reference;
};
};
It is used as a kind of "hook" to hang information about the iterator
type so that it can be extracted for use in other declarations where
necessary. For example, the information is extracted by the
the reverse_iterator struct template used in adapting
reverse iterator types from the normal list iterator types.
Finally, note that there is one protected member function, a
constructor for converting a link_type pointer into an
iterator.
iterator(link_type x) : current(x) {}
Such a constructor is needed in a few places in class
list. Even though it is protected in class iterator,
class list has access to it because class iterator
grants friendship to class list.
Class const_iterator
The definition of this class almost identical to that of class
iterator. The main difference is that operator * in class
const_iterator returns a const_reference (const
T&) instead of a reference (T&):
typename const_iterator::reference
operator*() const { return current->data; }
This means that a list item pointed to by a const_iterator
cannot be changed by assignment (or any other means); the
const in const T& prevents any attempt at using
*i as an lvalue.
list list1;
list1.push_back('a');
list::const_iterator i = list1.begin();
// Now iterator i points to the list node containing 'a'
cout << *i;
*i = 'b'; // COMPILATION ERROR
cout << *i;
Exercise: Another way to prevent the use of *i as an lvalue
would be to define
T operator*() const { return current->data; }
so that a copy of the data member is returned instead of a reference.
What would be the disadvantage of this method in comparison with
returning const T&?
There is one function defined by class const_iterator with no
analog in class iterator, namely the constructor
const_iterator(const iterator& x) : current(x.current) {}
This converts an iterator into a const_iterator.
Thus for example if we have an iterator i we can use
it to initialize a const_iterator j for use in computations
where we do not need to change the values to which the iterator points
and we want to be prevented from accidentally doing so.
Exercise: Explain why there is no corresponding constructor in
class iterator for converting a const_iterator into
an iterator.
Exercise: Where in the definition of class
const_iterator is there any need for the friendship
privileges granted it by class iterator?
Exercise (advanced): The definitions of class iterator
and class const_iterator are so similar that one would like,
if possible, to find a way of avoiding the code repetition. Discuss
the pros and cons of using class inheritance and virtual functions to
solve this problem.
Reverse Iterators
There are two other iterator types defined within class list,
namely reverse_iterator and const_reverse_iterator.
These types provide for traversal of lists in the opposite of the
normal direction, by exchanging the roles of ++ and --.
When such iterators are combined with STL generic algorithms, the effect
is to make the algorithms "see the sequence in reverse".
Consider, for example, the generic find algorithm, which
normally returns the first occurrence of a value in a traversal
starting from the beginning of the sequence. When combined with
reverse iterators it finds the first occurrence in a backward
traversal starting from the end of the sequence, which would be the
last occurrence in a normal traversal from the beginning.
Another example, using the generic accumulate algorithm, is
discussed in Musser & Saini, Section 2.5 and Example
2-13.
Instead of nested class definitions, like those for iterator
and const_iterator, the reverse iterator types are defined
by typedefs that use one STL's iterator adaptors,
reverse_iterator:
// Define reverse iterators using reverse iterator adaptors:
typedef std::reverse_iterator
reverse_iterator;
typedef std::reverse_iterator
const_reverse_iterator;
The way the reverse_iterator adaptor is defined is a bit
complicated and we won't go into it here, but the basic idea of the
class is simple: it redefines ++ as -- and
-- as ++.
Special list memory management
Recall that we said that special storage management techniques are
used in the Hewlett-Packard implementation of STL lists so that
new and delete do not have to be called for each
node. The problem with calling new and delete too
frequently is that they are very general run-time facilities and are
not optimized for the rather special usage in the list class. (They
have to be able to respond to requests to allocate or deallocate
blocks of any size, but all of our list nodes are of a fixed size and
their allocation can be handled with much simpler and faster methods
than are necessary for the general allocation problem.)
As a user of STL lists you don't have to worry about this issue,
because it's all done automatically. Nevertheless it is instructive to see
how it is done, for at least two reasons:
- The way storage is obtained and released involves additional useful list
processing techniques.
- The way storage is initialized and de-initialized sheds light on
the role of the constructor and destructor members of C++ classes. It
turns out that one needs a finer degree of control over initialization
and de-initialization than is normally supplied by constructors and
destructors, but (as usual) C++ does provide the necessary degree of
control.
Overview
The basic idea of storage management for list processing is to maintain a
"free storage list" (or "free list"), containing nodes that are not
in use by any application-level list. When a node is needed for
insertion into an application-level list, one can be taken off of
the free list. When a node is deleted from an application-level list,
it can be inserted into the free list.
How is the free list created in the first place? One way is to
allocate a array, or block, of N nodes and step through them
one by one, linking each to the one that follows it (or the one that
precedes it), forming a linked list of N nodes. Later, as
application-level lists are formed and destroyed, each node in the
block belongs either to one of these lists or to the free list.
A more clever variation is to do this traversal only as required, by
maintaining a boundary marker between nodes that have been linked and
ones that haven't. Initially the boundary is at the beginning of the
block, but as nodes are needed it is moved forward one node at a time.
To the left of the boundary, each node belongs either
to one of the application-level lists or to the free list; and to
the right of the boundary, the nodes have not yet been linked.
Whenever a node is needed for insertion into an application-level list
and the free list is empty, the boundary is moved one position to the
right, making another node available for linking.
Here is a diagram of the second scheme in progress, in which
next_avail marks the boundary and free_list points
to the beginning of the list of free nodes (it would be NULL
if all nodes were in use):
With either scheme, if we use up all N nodes in
application-level lists, we somehow have to create free nodes in some
other block of storage. One textbook suggests using an
"array-doubling" scheme much like STL uses when it runs out of space
in a vector. But array-doubling involves the overhead of copying the
currently active data into another block of memory, and it is really
only required if one needs all storage for the data structure
in one block of contiguous memory to
enable random-access. For lists, random-access isn't possible anyway,
and we can avoid the overhead of copying: when we run out of storage
in one N-element block, we just allocate another one. The
new block's position relative to the existing one doesn't matter; a
list can have nodes in both blocks because they are connected by links
anyway.
Obviously, we can keep doing this, allocating as many blocks of size
N as we need (or until we run out of storage completely). These
blocks can be obtained as needed using a new expression, in the normal
way. (Or almost normal, see below.) All we
have to be careful about is to keep track of where all these blocks
are, so that we can later clean up all the storage, when all need for
it is past. How do we do this? By keeping track of the locations
of the memory blocks in a linked list!
Here is a diagram of how the blocks of storage for lists are organized:
Exercise. Where do we get the nodes for the top-level list of block
locations? (Try to answer this before looking at the code details,
discussed next.)
Coding details
The code for implementing this overall scheme is shorter than the
explanation.
Block allocation function
First, there is a function block_allocate for
allocating the blocks of memory. Basically this function is just a
call of the global operator new to get a block of storage big enough
to hold a given number of items of type T:
template
inline T* block_allocate(ptrdiff_t size, T*) {
return (T*)(::operator new((size_t)(size * sizeof(T))));
}
This looks like a rather complicated way of doing things, compared to
the normal way of using new to get an array of size
elements of type T:
new T[size];
The problem with this would be that the elements of the array would
all be initialized using the default constructor of type T.
What we need instead at this point is a block of "raw" (uninitialized)
memory. We will call the type T constructor later, when
part of the memory is made into a node to be inserted in some list.
The reason for this strategy is efficiency:
If we initialized the entire block with N calls to the
default constructor, when we later use one node in a list we would
have to use the T assignment operator to put the value we
really want in the node. The effort the default constructor went to to
initialize the storage would be wasted. Furthermore, when we return
the node to the available space list we would have to do another
assignment to replace the current value with the default constructor
value (in order to release as much storage as possible).
Since we delay initializing the storage until we need it, we have to
make sure we do initialize it properly when we do use it for a node in an
application-level list. Let's now look back at how that was done in
the insert member function in class list:
template
inline list::iterator list::insert(iterator position, const T& x) {
link_type tmp = manager.get_node();
new (&(tmp->data)) T(x);
tmp->next = position.current;
tmp->prev = (position.current)->prev;
(position.current)->prev->next = tmp;
(position.current)->prev = tmp;
++length;
return tmp;
}
We said earlier that we could think of the second line of the
body of this function as though it were written
tmp->data = x;
but in fact the actual meaning is different. The expression
&(tmp->data)) in the new expression is called a
new-placement, because it tells the location of already
allocated storage. This kind of new expression does not
allocate any storage; it just does a constructor call, T(x),
to initialize the storage to which the new-placement points.
Thus raw storage is converted into a properly initialized type
T object.
Exercise. Consider the case in which T is itself is a
list instance, as in
list< list > list1;
What is the difference in this case between
tmp->data = x;
and
new (&(tmp->data)) list(x);
List memory manager class
The rest of the allocation scheme is implemented in class
list_memory_manager. This class begins with a
struct, list_memory_manager, and some types and data
members for implementing the list of buffers (blocks) of
list_nodes:
template
class list_memory_manager {
typedef list_node* link_type;
struct list_node_buffer {
list_node_buffer* next_buffer;
list_node* buffer;
};
typedef list_node_buffer* buffer_pointer;
buffer_pointer buffer_list;
link_type free_list;
link_type next_avail;
link_type last;
ptrdiff_t buffer_size() { return 1024; }
...
There is also a private member function, add_new_buffer,
that takes care of allocating a new buffer when put_node
needs one. This is where block_allocate is called.
...
void add_new_buffer() {
buffer_pointer tmp = new list_node_buffer;
tmp->buffer = block_allocate(buffer_size(), (list_node*)0);
tmp->next_buffer = buffer_list;
buffer_list = tmp;
next_avail = buffer_list->buffer;
last = next_avail + buffer_size();
}
...
There is a corresponding function deallocate_buffers, which
is public because the list destructor needs to call it to get rid
of all of the buffers when the last list has been destroyed.
...
public:
void deallocate_buffers() {
while (buffer_list) {
buffer_pointer tmp = buffer_list;
buffer_list = (buffer_pointer)(buffer_list->next_buffer);
delete (tmp->buffer);
delete tmp;
}
free_list = 0;
next_avail = 0;
last = 0;
}
...
Finally, we have the main allocation member functions used in
the list class:
...
link_type get_node() {
link_type tmp = free_list;
if (free_list != NULL) {
free_list = (link_type)(free_list->next);
return tmp;
}
if (next_avail == last)
add_new_buffer();
return next_avail++;
}
void put_node(link_type p) {
p->next = free_list;
free_list = p;
}
};
Exercise. Hand execute a few calls to get_node,
considering the cases where the current buffer (block) does or does
not have space left in it.
Exercise. Why doesn't put_node ever deallocate a buffer?
This completes our discussion of the STL List class.