ianhenderson.org / 2024 september 30

a size-generic pointer type for C

I recently read Type-erased generic functions for C: A modest non-proposal, which describes a way to add generics to C by binding type parameters and allowing metadata of bound type parameters (like the size and alignment of objects of the type) to be passed to functions implicitly. It reminded me of a somewhat-related idea (also inspired by Swift generics) that's been bouncing around my head since earlier this year, so I thought I might as well write it up here.

The idea is to add a new type of pointer, an "any-pointer", that encodes both the pointer itself and the size of the object being pointed to.

#define any _Any
int i;
any *a = &i;
printf("%p %p %zu\n", a, a + 1, sizeof(*a)); // (e.g.) 0x16d81755c 0x16d817560 4

A pointer to any complete type T can be converted (implicitly or explicitly) to an any-pointer—this conversion encodes sizeof(T) in the any-pointer automatically. The sizeof operator yields the encoded size when applied to an any-pointer dereference expression, and any-pointer arithmetic is based on the encoded size.

The benefit of an any-pointer type over a loose void * and size_t is the ease of initialization and access. It's a way of passing a size implicitly that fits C's casual style and avoids awkward brackets-and-type-parameters-everywhere notation.

some uses of any-pointers

vector push with realloc and memcpy

Calling functions like realloc and memcpy often involves multiplying the number of elements by the size of each element. Using any-pointers in these functions passes the element size implicitly and lets the functions themselves do the multiplication (like calloc), which avoids mistakes and missing overflow checks. Compare the following listings, with the proposed code on the left and the current code on the right. The functions with underscores take any * rather than void * and numbers of elements rather than numbers of bytes.

any *realloc_(any *ptr, size_t count);
any *memcpy_(any *restrict dst, const void *restrict src, size_t count);

struct vector {
    size_t length;
    size_t capacity;

    any *elements;
};

void push_(struct vector *v, const void *elem)
{
    if (v->length == v->capacity) {
        size_t capacity = MAX(v->capacity * 2, 8);
        v->elements = realloc_(v->elements, capacity);
        if (capacity < v->capacity || !v->elements)
            abort();
        v->capacity = capacity;
    }
    memcpy_(&v->elements[v->length++], elem, 1);
}
void *realloc(void *ptr, size_t size);
void *memcpy(void *restrict dst, const void *restrict src, size_t size);

struct vector {
    size_t length;
    size_t capacity;
    size_t element_size;
    void *elements;
};

void push(struct vector *v, const void *elem)
{
    if (v->length == v->capacity) {
        size_t capacity = MAX(v->capacity * 2, 8);
        v->elements = realloc(v->elements, capacity * v->element_size);
        if (capacity < v->capacity || !v->elements)
            abort();
        v->capacity = capacity;
    }
    memcpy((char *)v->elements + v->element_size * v->length++, elem, v->element_size);
}

size-generic sort

The standard qsort function already takes a size parameter—the any-pointer version conveniently infers its value:

void qsort_(any *base, size_t count,
    int (*cmp)(const any *, const void *));

#define N 500
int array[N];
void sort_array(int (*cmp)(const any *, const void *))
{
    qsort_(array, N, cmp);
}
void qsort(void *base, size_t count, size_t size,
    int (*cmp)(const void *, const void *));

#define N 500
int array[N];
void sort_array(int (*cmp)(const void *, const void *))
{
    qsort(array, N, sizeof(array[0]), cmp);
}

Also note the const any * parameter to the comparison function. Having the size of an element available in the comparison function lets you write truly "size-generic" comparison functions like the compare_bytes_lexicographically function in the following listing (presented here with a memcmp_ that, like the rest of these adjusted functions, works with an element count rather than a raw size):

void qsort_(any *base, size_t count,
    int (*cmp)(const any *, const void *));
int memcmp_(const any *a, const void *b, size_t count);

int compare_bytes_lexicographically(const any *a, const void *b)
{
    return memcmp_(a, b, 1);
}
void sort_elements_by_bytes_lexicographically(any *array, size_t count)
{
    qsort_(array, count, compare_bytes_lexicographically);
}

This is convenient, though it may not do what you're expecting if the element type is a struct with padding bytes. It also doesn't really depend on any-pointers, since if you're changing the qsort function anyway, you could easily add another size parameter.

working with void pointers to raw bytes

Converting a void * to an any-pointer is an error, since void isn't a complete type. It's straightforward to cast to char * if you want to deal with bytes, though, and it's arguably an improvement to see the explicit type information.

int memcmp_(const any *a, const void *b, size_t count);
int memcmp_classic(const void *a, const void *b, size_t size)
{
    return memcmp_((const char *)a, b, size);
}

explicitly constructing any-pointers

At some point (maybe you're implementing realloc_) you're going to want to construct an any-pointer that encodes an arbitrary size. There's a cheesy way to do this with VLAs:

any *make_any(void *ptr, size_t sz)
{
    return (char (*)[sz])ptr;
}

…but you'd probably just want a built-in make_any() that does it directly.