In essence this procedure has but one purpose: to fill a block of memory with a single byte value. To cover different use cases, it has three entry points.
InitFrame: Calculates the fill area from the difference between RSP and RBP. You can call it anywhere in the caller, so long as a stack frame has been set up and nothing else has been pushed on the stack since.
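To make the arithmetic concrete, here is a small C model of what InitFrame computes. It is only a sketch, not the routine itself: `frame_fill_size` and its parameters are hypothetical names standing in for the register values on entry, assuming RSP points at the return address pushed by the call.

```c
#include <stdint.h>

/* Hypothetical model: rsp and rbp are the register values on entry to InitFrame.
   The 8 bytes at rsp hold the return address, so the fill area is [rsp + 8, rbp). */
static uint64_t frame_fill_size(uint64_t rsp, uint64_t rbp)
{
    uint64_t base = rsp + 8;   /* RDI: first byte past the return address */
    return rbp - base;         /* RCX: number of bytes to fill */
}
```

For example, with RSP = 0x1000 and RBP = 0x1048, the fill area starts at 0x1008 and is 0x40 (64) bytes long.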
; =========================================================================================================
; This procedure has three entry points, each of which finally falls into the routine that fills memory.
; Iteration count is reduced by writing as many QWORDs as possible, but a region of any size will be
; filled correctly.
; ENTRY: AL = Fill pattern (Bits 8 - 63 are irrelevant as they will be shifted out)
; RCX = Size of area in bytes
; RDI = Pointer to area to be written.
; LEAVE: RAX = Pattern extended through all 64 bits
; RCX = Unchanged, except when InitFrame is called; then it is the size of the fill area.
; RDI = Pointer to fill area.
; ---------------------------------------------------------------------------------------------------------
; InitFrame only requires AL be set, RCX & RDI are calculated
InitFrame: mov rdi, rsp ; Get pointer to base of fill area
add rdi, 8 ; Bump past caller's return address
mov rcx, rbp
sub rcx, rdi ; Get actual number of bytes to fill
FillMem has three parts:
1: AL = 0: simply bounce to ZeroMem, where RAX is set accordingly.
2: AL = -1 (FFh): simply set RAX to zero and decrement it once.
3: AL = 01h through FEh: a little more involved, as the pattern has to be copied through RAX.
InitFrame falls into this code, which is how it determines what its pattern is going to be.
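For reference, the value all three cases are meant to leave in RAX can be modelled in C. This is a sketch of the intended result, not a translation of the assembly. `extend_pattern` is a hypothetical name, and the multiply by 0x0101010101010101 is a swapped-in technique rather than what the routine does.

```c
#include <stdint.h>

/* Hypothetical helper: replicate one byte through all 64 bits.
   Each byte of the constant is 01h, so the product places a copy of b
   in every byte lane; no carries occur because b is below 256. */
static uint64_t extend_pattern(uint8_t b)
{
    return (uint64_t)b * 0x0101010101010101ULL;
}
```

With this model, 00h gives 0, FFh gives all ones (the same as -1), and a value such as A5h gives A5A5A5A5A5A5A5A5h.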
; Test if we are supposed to be filling with nulls
FillMem: or al, al ; Are we going to fill with nulls
jz ZeroMem
; Test if we are supposed to be filling with -1's
inc al ; if AL = FF, bump to NULL to set ZF
jnz .Shift ; ZF = 0 means we have to extend pattern
; More time effective than using .Shift and saves a few bytes over mov rax, -1.
xor rax, rax
dec rax
jmp ZeroMem + 3
Is there a better way of doing this?
; Shift contents of AL through RAX
.Shift: push rcx ; Save size of fill area
xor ecx, ecx ; Trash bits
mov cl, 7
dec al ; Adjust back to original value
mov dl, al ; Save a copy of fill byte
.L0: shl rax, 8 ; Shift in 8 zero bits
mov al, dl ; and copy fill byte into nullified space
loop .L0
pop rcx ; Retrieve buffer size
jmp $ + 5 ; Bounce over next instruction
And finally, finish by filling the buffer.
; This entry point simply nullifies RAX as, more often than not, calling code would need to
; do this
ZeroMem: xor rax, rax ; Set Fill pattern
; RAX = Fill pattern
; RCX = Size of buffer
; RDI = Pointer of area to be filled
; As the size may not be a multiple of 8, the preamble tests bits 0 - 2 of RCX; each of those indicates
; whether a byte, word or dword needs to be written so that a whole number of qwords remains
push rcx ; Preserve
push rdi
; If bit 0 is on, fill one byte
sar rcx, 1 ; Shift bit 0 into CY
jnc $ + 3
stosb
; Now we are word aligned and if bit 1 was on fill another word
sar rcx, 1 ; Shift bit 1 into CY
jnc $ + 4
stosw
; Now we are dword aligned and if bit 2 was on fill another dword
sar rcx, 1 ; Shift bit 2 into CY
jnc $ + 3
stosd
; RCX now equals the number of qwords to fill
rep stosq ; Finish by writing RCX qwords.
pop rdi
pop rcx ; Recover
ret
I've tested this fairly extensively, but not with buffers smaller than 8 bytes, or with a zero-length buffer for that matter. It doesn't seem reasonable that this would be used for an area that small.
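As a sanity check on the logic rather than a test of the assembly, the fill sequence can be modelled in C. This is a rough sketch under stated assumptions: `fill_model` is a hypothetical name, the pattern is assumed to be one byte replicated through all 64 bits (so byte order does not matter), and the writes follow the same order as the routine: byte, word, dword, then whole qwords.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the fill: writes `size` bytes taken from the
   64-bit pattern and returns the number of bytes actually written. */
static size_t fill_model(unsigned char *dst, size_t size, uint64_t pattern)
{
    unsigned char *p = dst;
    size_t n = size;

    if (n & 1) { memcpy(p, &pattern, 1); p += 1; }   /* bit 0 set: stosb */
    n >>= 1;
    if (n & 1) { memcpy(p, &pattern, 2); p += 2; }   /* bit 1 set: stosw */
    n >>= 1;
    if (n & 1) { memcpy(p, &pattern, 4); p += 4; }   /* bit 2 set: stosd */
    n >>= 1;
    while (n--) { memcpy(p, &pattern, 8); p += 8; }  /* remaining qwords */

    return (size_t)(p - dst);
}
```

In this model a size of 0 writes nothing, and a size of 5 writes one byte followed by one dword, so sizes below 8 come out right as long as the assembly follows the same steps.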