A long time ago, I made a game in which a sort of connected-component labeling was required to implement the AI part. I unknowingly used the two-pass algorithm at the time.

Recently, I learned that I can make it faster using a bit-scan-based method instead. It takes 1-bit-per-pixel data as input instead of the typical bytes-per-pixel input, then finds every linear chunk in each scan-line using the BSF instruction. Please see the code below. `Cut`

is a struct which stores the information of one linear chunk of 1-bits in a scan-line.

```
Cut* get_cuts_in_row(const u32* bits, const u32* bit_final, Cut* cuts) {
    u32 working_bits = *bits;
    u32 basepos = 0;
    unsigned long bitpos = 0; // _BitScanForward (MSVC) requires unsigned long
    for (;; cuts++) {
        // find the starting position
        while (!_BitScanForward(&bitpos, working_bits)) {
            bits++, basepos += 32;
            if (bits == bit_final) {
                // no more 1s in this row: emit the sentinel cut
                cuts->start_pos = (u16)0xFFFF;
                cuts->end_pos = (u16)0xFFFF;
                return cuts + 1;
            }
            working_bits = *bits;
        }
        cuts->start_pos = u16(basepos + bitpos);
        // find the ending position: invert and mask so BSF finds the first 0
        working_bits = (~working_bits) & (0xFFFFFFFF << bitpos);
        // assumes the row is padded so every cut is terminated by a 0 bit
        // before bit_final
        while (!_BitScanForward(&bitpos, working_bits)) {
            bits++, basepos += 32;
            working_bits = ~(*bits);
        }
        // restore the original bits, masked past the end we just found
        working_bits = (~working_bits) & (0xFFFFFFFF << bitpos);
        cuts->end_pos = u16(basepos + bitpos);
    }
}
```

First, it uses the BSF instruction (via the `_BitScanForward` intrinsic) to find the position of the first 1 bit. Once that is found, it finds the position of the first 0 bit after it using bit inversion and bit masking, and then repeats the process.

After getting the starting and ending positions of all linear chunks of 1s (I prefer to call them 'cuts') in every scan-line, it assigns labels to them in the usual CCL manner. For the first row, every cut gets a different label.

For each cut in the remaining rows, it first checks whether any cuts in the row above are connected to it. If no upper cut is connected, it gets a new label. If exactly one upper cut is connected, it copies that cut's label. If several upper cuts are connected, their labels are merged and it gets the merged one. This can be done easily using two advancing pointers, one over the upper cuts and one over the lower cuts. Here is the full code doing that part.

```
Label* get_labels_8c(Cut* cuts, Cut* cuts_end, Label* label_next) {
    // Note: Cut's position fields must be unsigned 16-bit so that the
    // 0xFFFF sentinel comparisons below work as intended.
    Cut* cuts_up = cuts;
    // generate labels for the first row: every cut gets a fresh label
    for (; cuts->start_pos != 0xFFFF; cuts++) cuts->label = [GET NEW LABEL FROM THE POOL];
    cuts++; // skip the sentinel
    // generate labels for the remaining rows
    for (; cuts != cuts_end; cuts++) {
        Cut* cuts_save = cuts;
        for (;; cuts++) {
            u32 start_pos = cuts->start_pos;
            if (start_pos == 0xFFFF) break;
            // skip upper cuts that end before this cut starts
            for (; cuts_up->end_pos < start_pos; cuts_up++);
            u32 end_pos = cuts->end_pos;
            // no upper cut meets this one: take a fresh label
            if (cuts_up->start_pos > end_pos) {
                cuts->label = [GET NEW LABEL FROM THE POOL];
                continue;
            }
            Label* label = label_equiv_recursion(cuts_up->label);
            // the next upper cut cannot meet this one
            if (end_pos <= cuts_up->end_pos) {
                cuts->label = label;
                continue;
            }
            // find the remaining upper cuts that meet this one and merge
            for (; cuts_up->start_pos <= end_pos; cuts_up++) {
                Label* label_other = label_equiv_recursion(cuts_up->label);
                if (label != label_other) [MERGE TWO LABELS]
                if (end_pos <= cuts_up->end_pos) break;
            }
            cuts->label = label;
        }
        cuts_up = cuts_save;
    }
    return label_next;
}
```
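The bracketed placeholders assume a label pool with union-find style equivalence handling. A minimal sketch of what `label_equiv_recursion` and the merge step could look like (the names `find_root`, `merge_labels`, and `new_label` are my own, not the author's):

```cpp
#include <cassert>

// Hypothetical Label node: each label points at an equivalent label,
// or at itself when it is a set representative.
struct LabelNode {
    LabelNode* equiv;
};

// Follow the equivalence chain to the representative, with path compression.
// This is what a helper like label_equiv_recursion presumably does.
LabelNode* find_root(LabelNode* l) {
    if (l->equiv != l) l->equiv = find_root(l->equiv);
    return l->equiv;
}

// [MERGE TWO LABELS]: point one representative at the other.
void merge_labels(LabelNode* a, LabelNode* b) {
    LabelNode* ra = find_root(a);
    LabelNode* rb = find_root(b);
    if (ra != rb) rb->equiv = ra;
}

// [GET NEW LABEL FROM THE POOL]: hand out the next node of a
// preallocated pool; a fresh label is its own representative.
LabelNode* new_label(LabelNode*& pool_next) {
    LabelNode* l = pool_next++;
    l->equiv = l;
    return l;
}
```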

After this, one can use this per-scan-line information directly to build the array of labels, or any other output one wants.

I measured the execution time of this method and found that it's much faster than the two-scan method I previously used. Surprisingly, it turned out to be much faster than the two-scan one even when the input data is random. Apparently the bit-scanning algorithm is best suited for data with a relatively simple structure, where the chunks in each scan-line are large; it wasn't designed to be used on random images.

What baffled me was that literally nobody seems to talk about this method. Frankly speaking, it doesn't seem to be an idea that is hard to come up with. It's hard to believe that I'm the first one who tried it.

Perhaps, while my method is better than the primitive two-scan method, it's worse than more refined algorithms based on the two-scan idea, so it isn't worth mentioning anyway.

However, if the two-scan method can be improved, so can the bit-scan method. I myself found a nice improvement for 8-connectivity: it analyses two neighboring scan-lines at once by merging them with a bitwise OR. You can find the full code and a detailed explanation of how they work here.
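As a rough illustration of the OR-merge idea (my own reading of it, not the author's actual code): under 8-connectivity, cuts in two adjacent rows that overlap or touch diagonally become a single run in the bitwise OR of the two rows, so scanning the merged word groups them in one pass. A sketch using GCC/Clang's `__builtin_ctz`:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Find [start, end] runs of 1-bits in a single 32-bit word (inclusive ends).
// Applied to (row_a | row_b), each run covers one 8-connected group of cuts
// from the two rows.
std::vector<std::pair<int, int>> runs_in_word(uint32_t w) {
    std::vector<std::pair<int, int>> runs;
    while (w) {
        int start = __builtin_ctz(w);                 // first 1-bit
        uint32_t inv = ~w & (0xFFFFFFFFu << start);   // first 0-bit after it
        int end = inv ? __builtin_ctz(inv) - 1 : 31;
        runs.push_back({start, end});
        if (!inv) break;                              // run reached bit 31
        w &= 0xFFFFFFFFu << (end + 1);                // drop the finished run
    }
    return runs;
}
```

For example, a cut at bits 0-3 in one row and a cut at bits 4-7 in the next row only touch diagonally, yet their OR is a single run spanning bits 0-7, exactly the 8-connected group.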

I learned that there is a benchmark for CCL algorithms named YACCLAB. I'll test my algorithms there against the best CCL algorithms to see how good they really are. Before that, I want to ask a few things here.

My questions are:

Are these algorithms I found really new? It's still hard to believe that nobody has ever thought of a CCL algorithm based on bit-scanning. If it's already a known technique, why can't I find anyone mentioning it? Were bit-scan-based algorithms proven to be bad and forgotten?

If I really have found a new algorithm, what should I do next? Of course I'll test it in a more reliable system like YACCLAB. What I'm asking is what comes after that: what should I do to establish these algorithms as mine and spread them?

My reasoning was getting too long for a comment, so here we are. There is a lot to unpack. I like the question quite a bit even though it might be better suited for a computer science site.

The thing is, there are two layers to this question:

- Was a new algorithm discovered?
- What about the bit scanning part?

You are combining these two, so first I will explain why I would like to think about them separately:

The **algorithm** is a set of steps (in the more formal definition) that is language-agnostic. As such, it should work even without the bit scanning.

The **bit scanning**, on the other hand, I would consider to be an **optimization technique**: we are using a structure that the computer is comfortable with, which can bring us performance gains.

Unless we separate these two, the question gets a bit fuzzy since there are several possible scenarios that can be happening:

- The algorithm is new and improved and bit scanning makes it even faster. That would be awesome.
- The algorithm is just a new way of saying **"two pass"** or something similar. That would still be good if it beats the benchmarks. In this case it might be worth adding to a CCL library.
- The algorithm is a good fit for some cases but somehow fails in others (speed-wise, not correctness-wise). The bit scanning here makes the comparison difficult.
- The algorithm is a good fit for some cases but completely fails in others (produces an incorrect result). You just didn't find a counterexample yet.

Let us assume that 4 isn't the case and we want to decide between **1 to 3**. In each case, the bit scanning makes things fuzzy since it most likely speeds things up even more, so in some cases even a slower algorithm could outperform a better one.

**So first** I would try to remove the bit scanning and re-evaluate the performance. After a quick look, it seems that CCL algorithms have linear complexity in the image size: you need to check every pixel at least once. The rest is the fight to lower the constant as much as possible (number of passes, number of neighbors to check, etc.). I think it is safe to assume that you can't do better than linear, so the first question is: does your algorithm improve on the multiplicative constant? Since the algorithm is linear, that factor translates directly to performance, which is nice.

**Second question** would then be: Does bit scanning further improve the performance of the algorithm?

Also, since I already started thinking about it: what about a chessboard pattern with 4-connectivity? Or alternatively, a chessboard of 3x3 crosses for 8-connectivity.
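For what it's worth, such a pattern is easy to generate as bit masks (my own test sketch): a 1-pixel chessboard makes every set pixel its own cut, the densest possible input for the bit-scanner.

```cpp
#include <cassert>
#include <cstdint>

// One row of a 1-pixel chessboard: alternating bits, phase flipped per row.
// Even rows: ...0101, odd rows: ...1010.
uint32_t chessboard_row_word(int row) {
    return (row & 1) ? 0xAAAAAAAAu : 0x55555555u;
}

// Number of set bits, i.e. how many cuts one such 32-bit word produces
// under 4-connectivity: one cut per pixel.
int popcount32(uint32_t w) {
    int n = 0;
    for (; w; w &= w - 1) ++n; // clear the lowest set bit each iteration
    return n;
}
```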

answered on Stack Overflow Jan 21, 2021 by Shamis

User contributions licensed under CC BY-SA 3.0