NASM - The Netwide Assembler

NASM Forum => Other Discussion => Topic started by: Rob Neff on August 16, 2010, 04:18:41 PM

Title: h2incn
Post by: Rob Neff on August 16, 2010, 04:18:41 PM

I seem to recall seeing your name attached to this project. I've downloaded the code to begin testing but outside of simple headers it does not seem to want to complete properly. Do you happen to have a more recent version of the code or is v0.5 all there is to be found?

Title: Re: h2incn
Post by: Frank Kotler on August 16, 2010, 06:03:21 PM
Not me, Johannes Kroll. 0.5 is all I've got. I think Johannes mentioned that he was working on 0.6, but it wasn't working yet, last I heard from him - some time ago! I'm afraid I don't know how to get in touch with him, or where to find further versions, if any. Sorry.


Title: Re: h2incn
Post by: Rob Neff on August 21, 2010, 08:22:08 PM
Ok, I've started working on my own version of h2incn - a program that will parse C header files and convert them into Nasm compatible .inc files. After creating the initial program outline I am now ready to begin adding in the preprocessing support.

One of the things that I need is a fast way of performing lookups of defines and typedefs. I've settled on using hash maps and the hash function algorithm FNV-1 which, according to the authors, provides for a nice dispersion across a hash map. Thus I begin by defining a hash map to contain the key/value pairs and develop the hashing function.

I want the hashing function and the hash map to be capable of holding arbitrary types and data sizes. Thus the routines being developed do not depend on simple character strings (although you can indeed supply them to the functions). That way, the routines may be used to hold objects of all types for any purpose ( token processing, game world objects, in-memory databases, caching, etc. ).

The following is my Nasm modified version of the FNV-1 hash algorithm for use in both 32-bit and 64-bit systems. Note that the 32-bit version uses the standard C calling convention while the 64-bit version uses either Windows or Linux fastcall calling conventions. It may be optimized further (an exersize left for the reader) but I'm quite happy with it. I would love to know if others find use for it and what their collision findings are...

Code: [Select]
; Copyright (C)2010 Rob Neff
; Source code is licensed under the new/simplified 2-clause BSD OSI license.
; This function implements the FNV-1 hash algorithm.
; This source file is formatted for Nasm compatibility although it
; is small enough to be easily converted into another assembler format.
; Example C/C++ call:
; #ifdef __cplusplus
; extern "C" {
; #endif
; unsigned int FNV1Hash(char *buffer, unsigned int len, unsigned int offset_basis);
; #ifdef __cplusplus
; }
; #endif
; int hash;
; /* obtain 32-bit FNV1 hash */
; hash = FNV1Hash(buffer, len, 2166136261);
; /* if desired - convert from a 32-bit to 16-bit hash */
; hash = ((hash >> 16) ^ (hash & 0xFFFF));

[section .text]

%ifidni __BITS__,32
; 32-bit C calling convention

%define buffer [ebp+8]
%define len [ebp+12]
%define offset_basis [ebp+16]

global _FNV1Hash

   push ebp                    ; set up stack frame
   mov  ebp, esp
   push esi                    ; save registers used
   push edi
   push ebx
   push ecx
   push edx

   mov  esi, buffer            ;esi = ptr to buffer
   mov  ecx, len               ;ecx = length of buffer (counter)
   mov  eax, offset_basis      ;set to 2166136261 for FNV-1
   mov  edi, 1000193h          ;FNV_32_PRIME = 16777619
   xor  ebx, ebx               ;ebx = 0
   mul  edi                    ;eax = eax * FNV_32_PRIME
   mov  bl, [esi]              ;bl = byte from esi
   xor  eax, ebx               ;al = al xor bl
   inc  esi                    ;esi = esi + 1 (buffer pos)
   dec  ecx                    ;ecx = ecx - 1 (counter)
   jnz  nextbyte               ;if ecx != 0, jmp to NextByte

   pop  edx                    ; restore registers
   pop  ecx
   pop  ebx
   pop  edi
   pop  esi
   mov  esp, ebp               ; restore stack frame
   pop  ebp
   ret                         ; eax = fnv1 hash

%elifidni __BITS__,64
; 64-bit function

%ifidni __OUTPUT_FORMAT__,win64
; 64-bit Windows fastcall convention:
;    ints/longs/ptrs: RCX, RDX, R8, R9
;     floats/doubles: XMM0 to XMM3
global FNV1Hash

   xchg rcx, rdx               ;rcx = length of buffer
   xchg r8, rdx                ;r8 = ptr to buffer

%elifidni __OUTPUT_FORMAT__,elf64
; 64-bit Linux fastcall convention
;    ints/longs/ptrs: RDI, RSI, RDX, RCX, R8, R9
;     floats/doubles: XMM0 to XMM7
global _FNV1Hash

   mov  rcx, rsi
   mov  r8, rdi


   mov  rax, rdx               ;rax = offset_basis - set to 14695981039346656037 for FNV-1
   mov  r9, 100000001B3h       ;r9 = FNV_64_PRIME = 1099511628211
   mov  r10, rbx               ;r10 = saved copy of rbx
   xor  rbx, rbx               ;rbx = 0
   mul  r9                     ;rax = rax * FNV_64_PRIME
   mov  bl, [r8]               ;bl = byte from r8
   xor  rax, rbx               ;al = al xor bl
   inc  r8                     ;inc buffer pos
   dec  rcx                    ;rcx = rcx - 1 (counter)
   jnz  nextbyte               ;if rcx != 0, jmp to nextbyte
   mov  rbx, r10               ;restore rbx
   ret                         ;rax = fnv1 hash


Note the difference in housekeeping required in 32-bit code compared to it's 64-bit big brother. As a leaf function(a function which makes no other calls) this code becomes incredibly small and fast. The function prologue/epilogue can be made ridiculously easy on 64-bit systems using the fastcall convention.

Optimizing for register pressure is so much simpler since Windows and Linux allow us to trash the R10 and R11 registers along with the registers used for parameters.

In a nutshell: keep your assembly code small, your parameter list within the fastcall convention, create leaf functions if you can (unless calling your own functions), and write portable code. ;D
Title: Re: h2incn
Post by: Rob Neff on September 06, 2010, 04:38:16 AM

While developing h2incn I needed a way to insert and search for defined values quickly. Using my recently developed FNV-1 hash algorithm I've created a hash map (basically an array of pointers indexed by the hash value) with collisions now handled by a Binary Tree.

The files can be found at: (

The Binary Tree assembles in both 32-bit and 64-bit Linux and Windows.  It is not a balanced tree nor does it permit duplicates but it's perfect for this application. Node deletion is not completed yet but will be in the next few days.  All assembly files use Nasm syntax, of course ;)

Download the files bintree.asm and for the implementation.  Download hashmap.c and hashmap.h which shows it's usage.

This project relates well with the NASMX project. And yes, you can now download the tarball and compile and execute h2incn as it's slightly more stable although not feature complete yet.  I'm still trucking along with development...
Title: Binary Tree
Post by: Rob Neff on September 14, 2010, 12:55:18 AM
There is an update available to the Binary Tree routines in file bintree.asm available at:

A bug fix to the binarytree_insert_node function fixes a parent-child pointer issue and I've added the code for binarytree_delete_node() for 32 and 64-bit platforms.  So basically that particular package is now complete.  You can read the usage and assembling instructions as they are documented within the source file itself.

Don't be alarmed by the size of the actual source file itself (56KB).  Although there are over 2000 lines of code and comments remember that there are 3 different implementations within the same file: one for 32-bit using the C calling convention applicable to both Linux and Windows, one for Linux 64-bit fastcall, and one for Windows 64-bit fastcall.  Only the actual code for the target platform will be assembled.  For instance, the actual binary size of the assembled routines is only 877 bytes for a 32bit implementation using Nasm with optimizations enabled.

There are a few optimizations that can be made to the package such as iteration rather than recursion or shrinking the size of the _bst_node_t structure knowing that the key/ value pairs immediately follow it but I leave those exercises to the reader.  The source code itself provides a working x86 and x64 implementation compatible with Linux and Windows that is well documented, fast, and provides students of assembly an excellent example of programming without using more advanced techniques.  Of course, for a few pints of Guinness I can be persuaded to write a more optimized version ;)

One technique I do use that you may find appealing is the single allocation of memory that contains both the node structure itself as well as the variable length data that a node may contain. This enables the entire node to be fetched into the CPU cache during memory reads and, provided the lengths of the key/value pair are small enough, reduces thrashing.
Title: Re: h2incn
Post by: hmi222 on January 10, 2011, 12:32:45 PM
There's no download!   :-*
Title: Re: h2incn
Post by: Rob Neff on January 10, 2011, 04:44:32 PM
Wow, this reminds me I've got to get this project done too! 
Anyways, there is no download yet but you can grab a tarball of the current codebase if you're looking for the asm stuff: (