Fill memory
In essence, this procedure has one purpose: to fill a block of memory with a single byte value. To cover different use cases, it has three entry points: InitFrame, FillMem and ZeroMem.

InitFrame: Calculates the fill area from the difference between RSP and RBP. You can call this procedure anywhere in the caller, as long as there is a stack frame and no other registers have been pushed onto the stack.

Code:
```nasm
; =========================================================================================================
; This procedure has three entry points, each of which finally falls into the routine that fills memory.
; Iteration count is reduced by writing as many QWORDs as possible, but a region of any size will be
; filled correctly.

;       ENTRY:  AL  = Fill pattern (Bits 8 - 63 are irrelevant as they will be shifted out)
;               RCX = Size of area in bytes
;               RDI = Pointer to area to be written.

;       LEAVE:  RAX = Pattern extended through all 64 bits
;               RCX = Unchanged, except when InitFrame is called, then size of fill area.
;               RDI = Pointer to fill area.
;               DL  = Copy of fill byte (clobbered) when the pattern is 01h - FEh
; ---------------------------------------------------------------------------------------------------------

; InitFrame only requires AL be set, RCX & RDI are calculated

  InitFrame:    mov     rdi, rsp                ; Get pointer to base of fill area
                add     rdi, 8                  ; Bump past caller's return address
                mov     rcx, rbp
                sub     rcx, rdi                ; Get actual number of bytes to fill
```
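As a worked example of that arithmetic, here is a minimal sketch in C (not part of the original routine). The `fill_region` type and `init_frame_region` function are hypothetical names used only for illustration. It assumes the caller has set up a conventional `push rbp` / `mov rbp, rsp` frame, so that RSP points at the return address when InitFrame starts executing.

```c
#include <stdint.h>

/* Hypothetical model of InitFrame's arithmetic, for illustration only. */
typedef struct {
    uint64_t start;   /* value InitFrame leaves in RDI */
    uint64_t size;    /* value InitFrame leaves in RCX */
} fill_region;

/* rsp_at_entry: RSP as InitFrame sees it, pointing at the return address
   pushed by CALL.  rbp: the caller's frame pointer. */
fill_region init_frame_region(uint64_t rsp_at_entry, uint64_t rbp)
{
    fill_region r;
    r.start = rsp_at_entry + 8;   /* add rdi, 8: step over the return address */
    r.size  = rbp - r.start;      /* sub rcx, rdi: bytes up to the saved RBP  */
    return r;
}
```

For a caller with a 64-byte frame and RBP = 7FFFFFFFE000h, RSP on entry is 7FFFFFFFDFB8h, so the model gives a start of 7FFFFFFFDFC0h and a size of 64 bytes. Anything pushed after the frame was created would sit inside that range and be overwritten, which is why the restriction on pushed registers matters.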
FillMem has three parts:
1: AL = 0: simply bounce to ZeroMem, where RAX is set accordingly.
2: AL = -1 (FFh): simply set RAX to zero and decrement once.
3: AL = 01h through FEh (signed: 1 through 127, or -128 through -2): a little more involved, as the pattern has to be copied through RAX.
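The three cases can be summarised in a small C model. This is only a sketch for reasoning about the values: `extend_pattern` is a hypothetical name, `al` stands for the fill byte in AL, and the return value stands for RAX as it is handed to the fill code.

```c
#include <stdint.h>

/* Hypothetical C model of FillMem's three cases (not the routine itself). */
uint64_t extend_pattern(uint8_t al)
{
    if (al == 0x00)                  /* case 1: or al, al / jz ZeroMem        */
        return 0;

    if ((uint8_t)(al + 1) == 0)      /* case 2: inc al gives zero only for FFh */
        return UINT64_MAX;           /* xor rax, rax / dec rax                */

    uint64_t rax = al;               /* case 3: the shift loop                */
    for (int i = 0; i < 7; i++)
        rax = (rax << 8) | al;       /* shl rax, 8 / mov al, dl               */
    return rax;
}
```

Seven shifts are enough because each one moves the byte up one position, and whatever was in bits 8 - 63 beforehand has been shifted out by the end.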

InitFrame falls straight through into FillMem, which is how it determines what its pattern is going to be.

Code:
```nasm
        ; Test if we are supposed to be filling with nulls

    FillMem:    or      al, al                  ; Are we going to fill with nulls?
                jz      ZeroMem

        ; Test if we are supposed to be filling with -1's

                inc     al                      ; If AL = FF, bump to zero to set ZF
                jnz     .Shift                  ; ZF = 0 means we have to extend pattern

        ; More time-effective than using .Shift, and shorter than mov rax, -1.

                xor     rax, rax
                dec     rax
                jmp     ZeroMem + 3             ; Skip the 3-byte xor rax, rax at ZeroMem
```
Is there a better way of doing this?

Code:
```nasm
        ; Shift contents of AL through RAX

      .Shift:   push    rcx                     ; Save size of fill area
                xor     ecx, ecx                ; Trash all bits of RCX
                mov     cl, 7                   ; Seven shifts fill the upper seven bytes
                dec     al                      ; Adjust back to original value
                mov     dl, al                  ; Save a copy of fill byte

         .L0:   shl     rax, 8                  ; Shift in 8 zero bits
                mov     al, dl                  ; and copy fill byte into nullified space
                loop    .L0

                pop     rcx                     ; Retrieve buffer size
                jmp     $ + 5                   ; Bounce over next instruction (xor rax, rax)
```
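One possible alternative to the shift loop, offered as a sketch rather than as what the routine does: multiplying the zero-extended byte by 0101010101010101h copies it into every byte of the result, and it covers 00h and FFh without special cases. The C below only models the arithmetic, and `broadcast_byte` is a hypothetical name. In assembly this would presumably come down to zero-extending AL and doing a single 64-bit multiply by that constant.

```c
#include <stdint.h>

/* Replicate one byte into all eight byte positions of a 64-bit value.
   Each 01h byte of the constant contributes one copy of b, and because
   b is below 256 no carry crosses from one byte position into the next. */
uint64_t broadcast_byte(uint8_t b)
{
    return (uint64_t)b * 0x0101010101010101ULL;
}
```

Whether this beats the loop or the special cases on a given CPU is a separate question; the point is only that the result is the same with no branches.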
Finally, finish by filling the buffer:

Code:
```nasm
        ; This entry point simply nullifies RAX, as more often than not calling code would need
        ; to do this

     ZeroMem:   xor     rax, rax                ; Set fill pattern

        ; RAX = Fill pattern
        ; RCX = Size of buffer
        ; RDI = Pointer to area to be filled
        ; The direction flag is assumed to be clear, so each STOS advances RDI

        ; As the size may not be a multiple of 8, the preamble tests bits 0 - 2 of RCX, as each
        ; of those indicates whether a byte, word or dword needs to be written before the qwords

                push    rcx                     ; Preserve
                push    rdi

        ; If bit 0 is on, fill one byte

                shr     rcx, 1                  ; Shift bit 0 into CY
                jnc     $ + 3                   ; Skip the 1-byte stosb
                stosb

        ; If bit 1 was on, fill one word

                shr     rcx, 1                  ; Shift bit 1 into CY
                jnc     $ + 4                   ; Skip the 2-byte stosw
                stosw

        ; If bit 2 was on, fill one dword

                shr     rcx, 1                  ; Shift bit 2 into CY
                jnc     $ + 3                   ; Skip the 1-byte stosd
                stosd

        ; RCX now equals the number of qwords to fill

                rep     stosq                   ; Finish by writing RCX qwords.

                pop     rdi
                pop     rcx                     ; Recover

                ret
```
I've tested this fairly extensively, but not with buffers smaller than 8 bytes, or with an empty one for that matter. It doesn't seem reasonable that this would be used for an area that small.
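Sizes below 8 bytes, and a size of zero, can at least be reasoned about with a model of the store sequence. The following C sketch is a stand-in built on assumptions, not the routine itself: `fill_model` is a hypothetical name, `pattern` plays the role of RAX with the fill byte already replicated, and each `memcpy` stands for one STOS store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of the store sequence after ZeroMem.  Because every byte of
   `pattern` is the same, copying its first 1, 2, 4 or 8 bytes matches
   what STOSB, STOSW, STOSD and STOSQ would write.
   Returns the number of bytes written. */
size_t fill_model(unsigned char *dst, size_t size, uint64_t pattern)
{
    unsigned char *p = dst;      /* RDI */
    size_t count = size;         /* RCX */

    if (count & 1) { memcpy(p, &pattern, 1); p += 1; }   /* bit 0: stosb */
    count >>= 1;
    if (count & 1) { memcpy(p, &pattern, 2); p += 2; }   /* bit 1: stosw */
    count >>= 1;
    if (count & 1) { memcpy(p, &pattern, 4); p += 4; }   /* bit 2: stosd */
    count >>= 1;

    for (; count > 0; count--) {                         /* rep stosq */
        memcpy(p, &pattern, 8);
        p += 8;
    }
    return (size_t)(p - dst);
}
```

In this model a size of 0 writes nothing, and sizes 1 through 7 are handled entirely by the byte, word and dword stores. That suggests the assembly should behave the same way, provided RCX holds a sensible size and the direction flag is clear, though the model is no substitute for running the real code on those sizes.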

Re: Fill memory
Code:
`or rax, -1`
... might be a shorter way to fill rax with -1. Dunno how it would compare for speed...

Best,
Frank

Re: Fill memory

Good catch, Frank, and it is shorter by 2 bytes. I don't generally concern myself too much with cycles unless they are in a loop with a high iteration count. I try to get from point A to point B as efficiently as possible algorithm-wise, using the least amount of code, so that the logic stands right out at a glance. At least, that's the objective anyway.