AVL comparison method, correct approach or not?

Question

AVL comparison method, correct approach or not?

I'm using a small AVL tree library and as you know, you need to define a comparison method for your AVL tree nodes. Libraries pass AVL keys to this method for sorting elements of tree. Here is the syntax of comparison method:

int compare(const void* l, const void* r)
{
}

This function should return a positive value when l>r, zero when l==r and negative value when l<r and this method's efficiency has a tremendous effect on AVL's efficiency.

Now assume that AVL tree keys are uint32_t. A simple method is using following compare function:

int compare(const void* l, const void* r)
{
    if (*(uint32_t*)l > *(uint32_t*)r)
        return 1;
    if (*(uint32_t*)l < *(uint32_t*)r)
        return -1;
    return 0;
}

But this method's amortized running time is a disaster on well balanced data because of high chance of jump prediction fails on if statements. I normally write this comparison as following:

int compare(const void* l, const void* r)
{
    return *(uint32_t*)l - *(uint32_t*)r;
    // if you want to overcome underflow in subtraction, you can write it like this:
    // return (int64_t)*(uint32_t*)l - (int64_t)*(uint32_t*)r;
    // But this will not solve problem with casting to int for returning result
}

Which is a huge efficiency improvement. But considering return type of int, I wonder if overflow on casting from uint32_t to int or underflow on subtraction (or any other bigger key data size in general case) may cause incorrect tree structure. If your key values take up to 31 bits, every thing works perfectly with this compare method. But if keys take up to 32 bits, things will get tricky. For example see following results:

`l` points to value | `r` points to value | result
--------------------------------------------------
2                   |1                    | 1
2                   |0xFFFFFFFF           | 3

This means this compare function searches both value in same side of tree, while in the first case l>r and second one is l<r considering both keys as unsigned values.

I wonder if using this compare method may not result in finding a node in AVL tree when node really exists. what do you think? If this method may fail to find nodes, what highly efficient compare method will be suitable for these situations?

Best Regards

c

algorithm

performance

avl-tree

asked on Stack Overflow Jul 24, 2018 by

Afshin • edited Jul 24, 2018 by

Afshin

1 Answer

A simple, UB-free method of comparison is as follows:

int compare (void* a, void* b)
{
    unsigned int aa = *(unsigned int*)a;
    unsigned int bb = *(unsigned int*)b;
    return (aa > bb) - (aa < bb);
}

On x86, gcc compiles it to this rather efficient code (with -O2):

  mov ecx, DWORD PTR [rdi]
  mov edx, DWORD PTR [rsi]
  xor eax, eax
  cmp ecx, edx
  seta al
  sbb eax, 0
  ret

answered on Stack Overflow Jul 24, 2018 by

n. 'pronouns' m. • edited Jul 24, 2018 by

n. 'pronouns' m.

User contributions licensed under CC BY-SA 3.0