Ok, I've started working on my own version of h2incn - a program that will parse C header files and convert them into Nasm compatible .inc files. After creating the initial program outline I am now ready to begin adding in the preprocessing support.
One of the things that I need is a fast way of performing lookups of defines and typedefs. I've settled on using hash maps and the hash function algorithm FNV-1 which, according to the authors, provides for a nice dispersion across a hash map. Thus I begin by defining a hash map to contain the key/value pairs and develop the hashing function.
I want the hashing function and the hash map to be capable of holding arbitrary types and data sizes. Thus the routines being developed do not depend on simple character strings (although you can indeed supply them to the functions). That way, the routines may be used to hold objects of all types for any purpose ( token processing, game world objects, in-memory databases, caching, etc. ).
The following is my Nasm modified version of the FNV-1 hash algorithm for use in both 32-bit and 64-bit systems. Note that the 32-bit version uses the standard C calling convention while the 64-bit version uses either Windows or Linux fastcall calling conventions. It may be optimized further (an exersize left for the reader) but I'm quite happy with it. I would love to know if others find use for it and what their collision findings are...
;
; FNV1HASH.ASM
;
; Copyright (C)2010 Rob Neff
; Source code is licensed under the new/simplified 2-clause BSD OSI license.
;
; This function implements the FNV-1 hash algorithm.
; This source file is formatted for Nasm compatibility although it
; is small enough to be easily converted into another assembler format.
;
; Example C/C++ call:
;
; #ifdef __cplusplus
; extern "C" {
; #endif
;
; unsigned int FNV1Hash(char *buffer, unsigned int len, unsigned int offset_basis);
;
; #ifdef __cplusplus
; }
; #endif
;
; int hash;
;
; /* obtain 32-bit FNV1 hash */
; hash = FNV1Hash(buffer, len, 2166136261);
;
; /* if desired - convert from a 32-bit to 16-bit hash */
; hash = ((hash >> 16) ^ (hash & 0xFFFF));
;
[section .text]
%ifidni __BITS__,32
;
; 32-bit C calling convention
;
%define buffer [ebp+8]
%define len [ebp+12]
%define offset_basis [ebp+16]
global _FNV1Hash
_FNV1Hash:
push ebp ; set up stack frame
mov ebp, esp
push esi ; save registers used
push edi
push ebx
push ecx
push edx
mov esi, buffer ;esi = ptr to buffer
mov ecx, len ;ecx = length of buffer (counter)
mov eax, offset_basis ;set to 2166136261 for FNV-1
mov edi, 1000193h ;FNV_32_PRIME = 16777619
xor ebx, ebx ;ebx = 0
nextbyte:
mul edi ;eax = eax * FNV_32_PRIME
mov bl, [esi] ;bl = byte from esi
xor eax, ebx ;al = al xor bl
inc esi ;esi = esi + 1 (buffer pos)
dec ecx ;ecx = ecx - 1 (counter)
jnz nextbyte ;if ecx != 0, jmp to NextByte
pop edx ; restore registers
pop ecx
pop ebx
pop edi
pop esi
mov esp, ebp ; restore stack frame
pop ebp
ret ; eax = fnv1 hash
%elifidni __BITS__,64
;
; 64-bit function
;
%ifidni __OUTPUT_FORMAT__,win64
;
; 64-bit Windows fastcall convention:
; ints/longs/ptrs: RCX, RDX, R8, R9
; floats/doubles: XMM0 to XMM3
;
global FNV1Hash
FNV1Hash:
xchg rcx, rdx ;rcx = length of buffer
xchg r8, rdx ;r8 = ptr to buffer
%elifidni __OUTPUT_FORMAT__,elf64
;
; 64-bit Linux fastcall convention
; ints/longs/ptrs: RDI, RSI, RDX, RCX, R8, R9
; floats/doubles: XMM0 to XMM7
;
global _FNV1Hash
_FNV1Hash:
mov rcx, rsi
mov r8, rdi
%endif
mov rax, rdx ;rax = offset_basis - set to 14695981039346656037 for FNV-1
mov r9, 100000001B3h ;r9 = FNV_64_PRIME = 1099511628211
mov r10, rbx ;r10 = saved copy of rbx
xor rbx, rbx ;rbx = 0
nextbyte:
mul r9 ;rax = rax * FNV_64_PRIME
mov bl, [r8] ;bl = byte from r8
xor rax, rbx ;al = al xor bl
inc r8 ;inc buffer pos
dec rcx ;rcx = rcx - 1 (counter)
jnz nextbyte ;if rcx != 0, jmp to nextbyte
mov rbx, r10 ;restore rbx
ret ;rax = fnv1 hash
%endif
Note the difference in housekeeping required in 32-bit code compared to it's 64-bit big brother. As a leaf function(a function which makes no other calls) this code becomes incredibly small and fast. The function prologue/epilogue can be made ridiculously easy on 64-bit systems using the fastcall convention.
Optimizing for register pressure is so much simpler since Windows and Linux allow us to trash the R10 and R11 registers along with the registers used for parameters.
In a nutshell: keep your assembly code small, your parameter list within the fastcall convention, create leaf functions if you can (unless calling your own functions), and write portable code.