NASM - The Netwide Assembler
NASM Forum => Programming with NASM => Topic started by: munair on December 03, 2021, 08:08:06 PM
-
For the SharpBASIC compiler I've written a function called instr (in-string) with inline asm to find a substring from a specific start position. The function works, but I'm learning ASM as I go, so I wonder if the code is ok or if it could be optimized/shortened. Elsewhere I read that repe cmpsb isn't the fastest way to compare strings. Any suggestions are very welcome.
func _sb_instr(src: str, sbs: str, start: int): int
dim srcl, sbsl: int; ' source and sub-string length
do
asm do
push edi
push esi
mov eax, [ebp + 16] ; source string descriptor
call _sb_load_strdesc ; address in ebx, length in edx
mov esi, ebx ; string address in esi
test edx, edx ; non-zero is fixed-length
jnz .next1
call _sb_strzlen ; null-terminated string
.next1:
test edx, edx ; anything to search in?
jz .out
mov [ebp - 8], edx ; save source string length
mov eax, [ebp + 12] ; sub-string descriptor
call _sb_load_strdesc ; address in ebx, length in edx
test edx, edx ; non-zero is fixed-length
jnz .next2
call _sb_strzlen ; null-terminated string
.next2:
test edx, edx ; anything to search for?
jz .out
mov [ebp - 12], edx ; save sub-string length
mov ecx, [ebp + 8] ; start position
mov edx, [ebp - 8] ; source string length
sub edx, ecx ; length minus start position
mov eax, edx ; save length
sub eax, [ebp - 12] ; subtract sub-string length
cmp eax, 0
jl .out ; sub-string is longer
dec ecx ; 0-based
push esi ; save start address
add esi, ecx ; start address from start position
.repeat: ; compare loop
mov ecx, [ebp - 12] ; substring length is loop counter
mov edi, ebx ; sub-string address in edi
cld
mov eax, esi ; save esi
repe cmpsb ; compare strings (edi with esi)
jz .done
inc eax ; move up one byte
mov esi, eax ; restore modified esi
dec edx ; decrement source string length
and edx, edx ; non-zero means continue search
jnz .repeat
jmp .out
.done:
sub esi, [ebp-12] ; beginning of sub-string
pop eax ; restore start address
sub esi, eax ; start position in string
inc esi ; position 1-based
mov [ebp - 4], esi ; save result
.out:
pop esi
pop edi
end;
end;
-
Rabin-Karp algorithm (https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
Knuth-Morris-Pratt algorithm (https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
Boyer-Moore algorithm (https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm)
-
Rabin-Karp algorithm (https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
Knuth-Morris-Pratt algorithm (https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)
Boyer-Moore algorithm (https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm)
Examples in the links provided suggest these advanced search algorithms are primarily for higher level languages and larger pieces of texts. The basic InStr function is most often used for smaller sized strings. For this reason I also did not optimize the loop to test the remaining source string length against the sub-string length.
-
Examples in the links provided suggest these advanced search algorithms are primarily for higher level languages and larger pieces of texts. The basic InStr function is most often used for smaller sized strings. For this reason I also did not optimize the loop to test the remaining source string length against the sub-string length.
The more you learn about asm coding, the easier it is to look at an algorithm as pseudo code and translate it. repsb is the easiest to code, but easy/short code is not always the best. I'm going to take a better look at those algorithms to see how I can adapt them into routines I can use in asm apps. The only place I have done string searches so far in my limited coding is in cpuid code and my boot sector, but I do hope to get back into OS coding in the next year, with small code for AT class systems and faster code for 386 and later (I like working with old hardware).
-
Interesting. I also keep older computers and still use them if possible, such as the server for the website and forum, which is a 16 years old 32 bits system. I also still have a 80-386 SX 25MHz with 16MB RAM. It was my first DOS-Win31 system that I used A LOT for programming, although my first programming experience was on a ZX80 with 4KB RAM connected to a TV ;D.
Back on topic, the algorithms are indeed very interesting, but hashes on small strings IMO is overkill. They're more suited for search functions in browsers or word processors. For memo-stuff up to a few hundred words, I wonder which approach is faster.