NASM - The Netwide Assembler

NASM Forum => Programming with NASM => Topic started by: Ux on August 29, 2012, 01:29:13 PM

Title: Branch avoidance
Post by: Ux on August 29, 2012, 01:29:13 PM
Hi folks,
I have some code that does a bunch of these:

cmp CL, 29       ; has the counter reached 29?
jne .L2          ; no: skip the reset
mov AL, 1        ; yes: set the flag (or INC AL)
xor CL, CL       ;      and clear the counter
.L2:

So in other words, an 8-bit comparison against a counter and, if the branch is not taken, clearing the counter and setting a flag that indicates that a clear occurred.

My question is, how can I avoid the branch? I can see how I can avoid the MOV by replacing it with a CMC and ADC before the branch, but the branch is still there. My impression also is that CMOV takes a lot of time.

Thanks.
Title: Re: Branch avoidance
Post by: Bryant Keller on August 29, 2012, 09:53:13 PM
Are you thinking of something like this?

Code:
CMP CL, 29   ; Compare CL to 29
SETZ AL      ; AL = 1 if CL = 29, else AL = 0
SETNZ CL     ; CL = 0 if CL = 29, else CL = 1
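
One way to see what the SETcc pair does to both registers is to model it in C next to the original branching block. This is only a sketch: the struct `regs` and the function names are made up for illustration, and only AL and CL are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register pair for illustration: AL is the flag, CL the counter. */
typedef struct { uint8_t al, cl; } regs;

/* Original block: cmp cl,29 / jne .L2 / mov al,1 / xor cl,cl / .L2: */
static regs step_branch(regs r) {
    if (r.cl == 29) { r.al = 1; r.cl = 0; }
    return r;
}

/* SETcc pair: cmp cl,29 / setz al / setnz cl */
static regs step_setcc(regs r) {
    uint8_t zf = (r.cl == 29);  /* ZF after the compare */
    r.al = zf;                  /* SETZ writes 0 or 1 unconditionally */
    r.cl = (uint8_t)!zf;        /* SETNZ writes 0 or 1 unconditionally */
    return r;
}

/* Shows where the two versions agree and where they differ. */
static void demo_setcc(void) {
    regs hit = {0, 29}, miss = {1, 5};
    assert(step_setcc(hit).al == 1 && step_setcc(hit).cl == 0);     /* same as branch */
    assert(step_branch(miss).al == 1 && step_branch(miss).cl == 5); /* left untouched */
    assert(step_setcc(miss).al == 0 && step_setcc(miss).cl == 1);   /* both overwritten */
}
```

Calling demo_setcc() from main runs the checks. When CL is 29 the two versions agree; for any other value the SETcc version overwrites both the flag and the counter.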
Title: Re: Branch avoidance
Post by: Frank Kotler on August 30, 2012, 12:59:31 AM
Another possible way to avoid a branch would be to use a "look up table". Create an array of 256 bytes with zeros where you want al to be zero and ones where you want al to be one. Then use ecx as an index into the table, loading al "whether it needs it or not"...
Code:
movzx ecx, cl  ; if high bits not already clear
mov al, [table + ecx]
...

I don't see how that can be used to clear the counter, though. May not help... What other values could cl take, and what do you want to do with 'em?
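
As a rough sketch of the table idea in C, assuming the only value of interest is the 29 from the first post (the names `flag_table`, `init_flag_table` and `lookup_flag` are invented here):

```c
#include <assert.h>
#include <stdint.h>

enum { LIMIT = 29 };  /* compare value taken from the first post */

/* 256 entries so that any 8-bit counter value is a valid index. */
static uint8_t flag_table[256];

/* One where AL should become one, zero everywhere else. */
static void init_flag_table(void) {
    for (int i = 0; i < 256; i++)
        flag_table[i] = (uint8_t)(i == LIMIT);
}

/* Models: movzx ecx, cl / mov al, [table + ecx] */
static uint8_t lookup_flag(uint8_t cl) {
    return flag_table[cl];
}

/* Checks every index against the compare it replaces. */
static void check_flag_table(void) {
    init_flag_table();
    for (int i = 0; i < 256; i++)
        assert(lookup_flag((uint8_t)i) == (i == LIMIT));
}
```

The load replaces the compare and the branch for the flag only. AL is overwritten on every lookup, and the counter is not touched.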

Best,
Frank

Title: Re: Branch avoidance
Post by: Ux on August 30, 2012, 02:35:53 PM
Code:
SETNZ CL     ; If (CL=29) CL = 0

That would work, except I have a series of these blocks of code. If any one of them sets the flag, the others have to leave it set, but SETZ AL will clear AL whenever the compare fails. SETNZ CL also overwrites the counter with 1 whenever CL is not 29.

If the x86 had something like INCNZ, or increment-if-nonzero, then I'd be OK.

Title: Re: Branch avoidance
Post by: Ux on August 30, 2012, 02:40:50 PM
Quote from: Frank Kotler on August 30, 2012, 12:59:31 AM
Another possible way to avoid a branch would be to use a "look up table".
I don't see how that can be used to clear the counter, though. May not help... What other values could cl take, and what do you want to do with 'em?

I thought about that, but the delay in accessing memory (L1 cache) plus calculating the effective address would, I think, be too long.

What I ended up doing was a CMOV from a register that happens to be 0 in the lower 8 bits.

To update AL, I do a CMC then an ADC. With CMP CL, 29 the carry is clear when there is no borrow (CL >= 29), so after the CMC it is set and the ADC makes AL nonzero.

The speedup is very slight over using the branch.
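
The exact instructions are not shown above, so the following C model assumes the sequence is cmp cl,29 / cmove ecx,<register whose low byte is 0> / cmc / adc al,0, and compares it with the INC AL form of the branching block. The type and function names are hypothetical, and only the low bytes AL and CL are modelled.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t al, cl; } state;

/* Reference: cmp cl,29 / jne .L2 / inc al / xor cl,cl / .L2: */
static state block_branch(state s) {
    if (s.cl == 29) { s.al++; s.cl = 0; }
    return s;
}

/* Assumed sequence: cmp cl,29 / cmove ecx,<reg with low byte 0> / cmc / adc al,0 */
static state block_cmov(state s) {
    int zf = (s.cl == 29);        /* ZF from the compare */
    int cf = (s.cl < 29);         /* CF = borrow from CL - 29 */
    s.cl = zf ? 0 : s.cl;         /* CMOVE, low byte only; CMOV leaves flags alone */
    cf = !cf;                     /* CMC inverts CF and nothing else */
    s.al = (uint8_t)(s.al + cf);  /* ADC AL, 0 adds the carry */
    return s;
}

/* Exhaustive comparison over the values a counter reset at 29 can reach. */
static void check_cmov(void) {
    for (int al = 0; al < 256; al++)
        for (int cl = 0; cl <= 29; cl++) {
            state in = { (uint8_t)al, (uint8_t)cl };
            state a = block_branch(in), b = block_cmov(in);
            assert(a.al == b.al && a.cl == b.cl);
        }
}
```

Under these assumptions the two versions agree for every CL from 0 to 29. For CL above 29 the model increments AL without clearing CL, so the sequence relies on the counter never going past 29.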