zvrba.net

Introduction

In addition to standard C malloc/realloc/free functions, C++ has new/delete operators for dynamic memory allocation. These are needed to properly handle object construction and destruction. Obviously missing is a C++ equivalent of realloc. Namely, when realloc can't grow memory block in-place, it needs to copy it to new location. This copying is actually memcpy() which might not make sense for C++ objects defining copy constructor. Such objects must be copied by invoking their copy-constructor.

It is not allowed to mix the two styles of memory allocation (although new/delete often just call malloc/free to obtain raw memory). So the only legal way to grow an array is to allocate the new (larger) array, invoke copy constructor on each existing object, and deallocate the old array. Much of this functionality is encapsulated in the default implementation-provided allocator class, which underneath usually calls placement new and delete operators.

Problem with `std::vector`

The vector STL container has to maintain elements contiguosly, and &v[0] must give an address of the first element. Problem arises when a vector is grown element by element, and the number of elements to be inserted is not known in advance (otherwise, space could be allocated in advance with the resize() method). When the vector's allocated size is exceeded, the internal storage array is grown as described above. This can amount to large copying overhead, esp. if the objects are large or have complex copy constructors. Some people have suggested use of the dequeue in such cases. Currently, this indeed seems to be the best idea.

Possible solution

Many programs are executing in a virtual address space. Thus, realloc doesn't really need to physically copy a chunk of memory, it just needs to find a large enough range of virtual addresses and remap the current chunk into the new range. This is how realloc() in Linux glibc is implemented. The amount of neccessary changes in system page tables is still proportional to the size of allocated memory, but on much smaller scale. For example, with a small page size (4kB), to grow 1-page chunk to 2 pages, one does not need to copy 4096 bytes. Only one page table entry has to be modified. That's a great saving. It gets even better with larger page sizes. (No, I haven't forgotten the time to locate the larger range of virtual addresses. This work has to be done in any case, copying or not). Unfortunately, the very useful mremap() system call that realloc internally is using to perform this remapping is present only on Linux. No other popular Unix-like OS (Solaris, or *BSD) implements it.

The deficiency of C++ standard allocator interface is that, although it is supposed to manage just raw memory, it doesn't offer any mechanism to resize the raw memory chunk. A temporary fix could be to partially specialize the std::vector template for PODs, in the following way: Make it use allocator which calls malloc/free instead of new/delete. This is allowed for POD-types. Override the resize() method to call realloc() Although conceptually simple, this solution won't work. After looking through some STL implementations, they directly manage memory instead of making a call to resize() method. I guess the C++ programmers are doomed to sub-optimal performance in certain cases until this issue is satisfactorily resolved and the mremap() call (with Linux's semantic) becomes widespread. The annoying thing is that these "certain cases" could have been opened to "clean" optimization (not relying on implementation details) if the standard provided some resizing methods in the allocator interface.

Introduction

Problem with std::vector

Possible solution

Problem with `std::vector`