Author Topic: counting bits in a dd (Read 23485 times)

beginthreadex · « **on:** August 01, 2014, 08:15:25 PM »

I'm trying to return the count of bits in a number using a lookup. While this is working for my commented "bitchecks" block it's not lookup for my actual lookups. I always get "0" as my return. My end result is to compare the performance between my "bit checks" routine and using a lookup.

Code: [Select]

global _start

section .data
	bitcount db	0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1 
		 db	2, 2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 1, 2 
		 db	2, 3, 2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3 
		 db	4, 3, 4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 1, 2, 2, 3 
		 db	2, 3, 3, 4, 2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3 
		 db	4, 4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4 
		 db	4, 5, 3, 4, 4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5 
		 db	6, 4, 5, 5, 6, 5, 6, 6, 7, 1, 2, 2, 3, 2, 3, 3, 4 
		 db	2, 3, 3, 4, 3, 4, 4, 5, 2, 3, 3, 4, 3, 4, 4, 5, 3 
		 db	4, 4, 5, 4, 5, 5, 6, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4 
		 db	4, 5, 4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5 
		 db	6, 5, 6, 6, 7, 2, 3, 3, 4, 3, 4, 4, 5, 3, 4, 4, 5 
		 db	4, 5, 5, 6, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5 
		 db	6, 6, 7, 3, 4, 4, 5, 4, 5, 5, 6, 4, 5, 5, 6, 5, 6 
		 db	6, 7, 4, 5, 5, 6, 5, 6, 6, 7, 5, 6, 6, 7, 6, 7, 7, 8

section .text
_start:
	mov ecx, 100000000				;test loop count
	mov edi, 00000000111111111111111111111111b	;number to get the bit count for
	;mov edi, 10000000000000000000000000000000b

	.testing:
		xor ebx, ebx				;clear out the return
		xor eax, edi				;keep a detroyable version of the number

		;.bitchecks:
		;	bsf esi, eax
		;	jz .done
		;	inc ebx
		;	btr eax, esi
		;	jmp .bitchecks

		xor edx, edx				;clear the register
		mov dl, al				;set the edx to just the lowest part of the number
		movzx ebx, byte [bitcount + edx]	;return the byte count via lookup
		mov dl, ah				;set the edx to the next part of the number
		movzx esi, byte [bitcount + edx]	;return the byte count via lookup
		add ebx, esi				;add the lookup value to the total count
		bswap eax				;swap the low and high parts of the target number
		mov dl, al				;set the edx to just the lowest part, next lookup
		movzx esi, byte [bitcount + edx]	;return the byte count via lookup
		add ebx, esi				;add the lookup value to the total count
		mov dl, ah				;set the edx to the next part of the number
		movzx esi, byte [bitcount + edx]	;return the bute count via lookup
		add ebx, esi				;add the final count to the return

		.done:
			dec ecx
			jnz .testing
	
	.exit:
	;the return should be in ebx as the return code
	mov eax, 1
	int 0x80

beginthreadex · « **Reply #1 on:** August 01, 2014, 08:52:13 PM »

I had an "xor eax, edi" that should have been a 'mov'. Duh.

nasm32 · « **Reply #2 on:** September 15, 2014, 09:04:39 AM »

I strongly discourage you from use lookup tables except in these cases:

* If you are working on large sets of data that needs to be converted all at once
* If your program is extremely small and uses few external references

Otherwise, stay away from lookup tables. Chances are that any API call or any other references will flush the buffer out of cache and the speed you gain, will be punished somewhere else in your program and the total gain will be negative.

Try to keep everything in code, except for the two points mentioned above and you'll be fine.

Will you loop through many times? (Use lookup table). Will you run it randomly and with only one call at the time (Use code)

Don't trust the timing when you measure the speed of a lookup table, that speed will be totally different in the final program, it depends on layout, it depends what code executed before it, it depends on total cache usage, size and how many other references are made. The speed you measure with a performance counter in a test program is useless. Here is what the timing does:

Test1[LookupTable]: Iteration#1: 1 microsecond, Iteration #2: 1 microsecond, Iteration #3-iteration#infinity: 500 nanoseconds.

The overall speed will show you that lookup tables perform better. But when you run this in your final program, the speed you get is the same as the first iteration, which is 1 microsecond, and not 500 nanoseconds. You can easily be fooled by the timing.

If you decide to use it still, use cpuid to find the size of cache on the target computer to see if the table fits good enough, and then align the data so that it doesn't conflict other data sets on "conflicting" data addresses in ram. It takes a bit of hand crafting to achieve that.

If you plan to put lookup tables in an object file and then reuse them across many programs, forget about it. All lookup tables must be specialized in each program and one final thing to remember about lookup tables is that it's not sustainable to keep programming several lookup tables, it will eventually fill up the cache and it's not good programming practice, it should be a rare resource, very very rare. The remaining "large" chunk of your program and whatever modules are embedded into that program has already claimed and filled up the cache, so it takes almost nothing to pollute it further.

Here is a useful tip for you. Your data cache is 32 KB and your code cache is also 32 KB. Which of these will be filled up first in a program? The resb (reserve bytes) tells me it is more likely to declare huge quantity of data in a program than it is to declare huge quantity of code. Code has to be written byte by byte, but data can be declared in enormous chunks at the time and there is a reason for that, because we have the need for more data than we have need for more code. So data will fill up more quickly than code will fill up. So keep things in code because that is the area of cache that is least likely to fill up first.

It is also worth noticing that ram speed varies in a much broader way on each computer than cpu speed. May be worth considering. Unless you want your program to perform like a bouncing ball on the computer "in the world" and consume up all the best cache memory they have, stick to code.

Lookup tables are mostly a myth from the dos days, but it did really work in the days of dos because in those days only one program ran at the time.

You'll be much happier if you do this instead:

Code: [Select]

; Straight forward code and not using too greedy alignment values to conserve expensive memory
align 4  ; This is enough for most purposes, you don't need to align on 16 or higher
mov eax,10
mov ecx,20

That is a happy and healthy code and you'll be happier as a programmer too

The keywords you should remember: Think "average". Think "streamlining". Think "normal code". These are boring terms, nobody wants to be average, but it's still going to benefit you if you streamline everything with an "average". If you specialize something, the remaining "average" is going to suffer from that. That is why it is better to stay average, and go crazy just on occasions when it is really necessary. "Necessary" should be not be taken lightly.

NASM - The Netwide Assembler

News:

Author Topic: counting bits in a dd (Read 23485 times)

beginthreadex

counting bits in a dd

beginthreadex

Re: counting bits in a dd

nasm32

Re: counting bits in a dd