NASM - The Netwide Assembler
NASM Forum => Programming with NASM => Topic started by: jedi on January 21, 2013, 03:02:12 PM
-
I'm looking for an algorithm to convince my program that the digits it has stored as a hexadecimal are really meant to be a decimal.
For example, '6789' is in memory as a hexadecimal even though in reality it is supposed to be a decimal number. I need to convince the computer that it is really 1A85h (which is the hexadecimal equivalent of 6789d).
I haven't figured out an algorithm to convert this. If you divide by 16 then you separate a '9' which is not useful because we need a '5'. If you divide by 10 then you'll get a '5' followed by a '0' ending up in the number '26505' which is the decimal representation of 6789h which is also not what we want.
Any ideas?
Thanks guys!
The following code probably won't help you but it basically takes unicode characters entered via console (assumed to be passed as a decimal) and then translates them out of unicode. At this point the computer has '6789' in memory but it thinks it is a hex.
SECTION .data
data: db '6789' ;************ for debugging
SECTION .bss
newdata: resq 100
SECTION .text
global uni_hex
extern print# ;************for debugging
extern ExitProcess ;************for debugging
uni_hex:
; ******************** values for debugging
mov r8, data
mov rax, 4
mov r10, newdata
; r8 = A pointer to the unicode data.
; rax = Length of string. Not a pointer.
; r10 = A pointer to the storage of the converted data.
; Housekeeping.
mov r11, rax ; make a copy
dec r11 ; want to point at last byte
mov r12, rax ; make another copy
mov r13, 0
loop1: ; Convert the unicode to decimal.
mov byte al, [r8+r11] ; load last byte of string
sub rax, 30h ; ASCII '0'..'9' (30h..39h) -> digit value 0..9
mov r9, rax ; this fills the lower half-byte of the r9 register.
dec r11 ; one less character
dec r12 ; one less character
jz oddfinish ; if it isn't an even number of characters then it is very important to end now or we will crash.
mov byte al, [r8+r11] ; load next to last byte of string
sub rax, 30h ; ASCII '0'..'9' (30h..39h) -> digit value 0..9
shl rax, 4 ; make it take the upper half-byte
add rax, r9 ; add the upper and lower half-bytes together ...
; ... rax now holds the two digits together -- one correct byte in decimal format ...
; ... although the computer thinks that this proper decimal number is actually hexadecimal ...
; ... and we'll need to fix that and turn it into hexadecimal later.
; Now we store it.
mov byte [r10+r13], al ; r13 begins at 0
dec r11 ; point at the previous byte next loop
inc r13
dec r12 ; one less digit to analyze
jnz loop1
jmp skipoddfinish ; you do not want to do label 'oddfinish' because it will double the final byte.
oddfinish: mov byte [r10+r13], al ; we store it
skipoddfinish:
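To make the effect of the routine above easier to see, here is a rough Python model of what loop1 appears to do: it walks the string from the last character backwards and packs two digits into each output byte. The function name pack_digits is made up for this illustration, and the model assumes the input contains only the ASCII characters '0' to '9'.

```python
def pack_digits(text: str) -> bytes:
    """Model of loop1: pack ASCII decimal digits two per byte, last digit first."""
    # '9' is stored as 39h, so subtracting 30h leaves the digit value 9.
    nibbles = [ord(ch) - 0x30 for ch in reversed(text)]
    out = bytearray()
    for i in range(0, len(nibbles), 2):
        low = nibbles[i]
        # With an odd number of digits the last byte holds a single digit
        # (this corresponds to the 'oddfinish' path).
        high = nibbles[i + 1] if i + 1 < len(nibbles) else 0
        out.append((high << 4) | low)
    return bytes(out)


# Bytes as they would sit at newdata, lowest address first.
print(pack_digits("6789").hex())
```

Read as a little-endian word, those two bytes are 6789h, which is why the result looks like the right digits but has the wrong numeric value.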
-
Your code is treating the input as hex - not decimal.
If you are truly expecting decimal only input then you need a loop that will multiply the current value by 10 for each digit read in and then add the current digit.
Assume that we're reading the string from left to right, here's pseudo-code:
value = 0 ; <-- must initialize to zero first
again:
read next digit
value = value * 10 + digit
loop again ;<-- if still have chars remaining
Hope that helps.
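The pseudo-code above can be sketched in Python as follows. This is only an illustration of the multiply-by-10-and-add loop, not NASM code; the name parse_decimal and the error handling are assumptions added for the example.

```python
def parse_decimal(text: str) -> int:
    """Left-to-right decimal parse: value = value * 10 + digit."""
    value = 0  # must start at zero
    for ch in text:
        digit = ord(ch) - ord("0")
        if not 0 <= digit <= 9:
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + digit
    return value


print(hex(parse_decimal("6789")))  # 0x1a85
```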
-
Thank you Rob.
In the past I made a function that does just that. It uses the algorithm you suggested. The limitation is that I could only get it to do four digits before overflow occurs. I read somewhere that there is a way to manipulate it to overcome the overflow but it was complicated and I thought that this idea was simpler and more efficient. It can handle unlimited characters and also does away with the multiplication making it multiple times faster/more efficient.
If I could trick the computer into thinking the hex is a decimal that would be ideal. However, I'm not sure if that is possible.
Jedi
-
Even with just a 32-bit register you could store a value up to 2^32 - 1 (i.e. an input string with a value no larger than '4294967295').
I think you should look harder at your decimal algorithm if you're overflowing at only 4 decimal digits. ;)
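As a rough check of those limits, the sketch below runs the same multiply-and-add loop but refuses any value that would not fit in a register of the given width. It is a Python model, not NASM; parse_decimal_fixed and its bits parameter are hypothetical names for this example.

```python
def parse_decimal_fixed(text: str, bits: int = 32) -> int:
    """Decimal parse that fails once the value no longer fits in `bits` bits."""
    limit = 2**bits - 1  # 4294967295 for 32 bits, 65535 for 16 bits
    value = 0
    for ch in text:
        value = value * 10 + (ord(ch) - 0x30)
        if value > limit:
            raise OverflowError(f"{text!r} does not fit in {bits} bits")
    return value


print(parse_decimal_fixed("4294967295"))     # largest 32-bit input
print(parse_decimal_fixed("65535", bits=16))  # largest 16-bit input
```

Python integers never overflow by themselves, so the explicit comparison stands in for the carry or overflow check an assembly version would need.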
-
How about this one?
This is the same as what Rob suggests, I guess :)
After all, we need to calculate 9 + 80 + 700 + 6000.
This program does that.
We get the last hex digit using "and ecx,0xF".
mov ebx, 0x6789 ;; Our input in hex
xor edi,edi
mov eax, 1
xor ecx,ecx
test ebx,ebx ;; If input = 0 , output = 0
jz EndProcess
StartProcess:
mov ecx,ebx
and ecx,0xF ;; Take the last digit
push eax
mul ecx
add edi,eax
pop eax
imul eax, 10 ;; Prepare the next place.
shr ebx,4
jz EndProcess
jmp StartProcess
EndProcess:
mov eax,edi
0x1A85 is stored in eax. (final output).
I have written it for 32 bit. But you should be able to change this for 64 bit.
by changing eax to rax, ebx to rbx etc.
64 bit (not tested)
mov rbx, 0x6789 ;; Our input in hex
xor rdi,rdi
mov rax, 1
xor rcx,rcx
test rbx,rbx ;; If input = 0 , output = 0
jz EndProcess
StartProcess:
mov rcx,rbx
and rcx,0xF ;; Take the last digit.
push rax
mul rcx
add rdi,rax
pop rax
imul rax, 10 ;; Prepare the next place.
shr rbx,4
jz EndProcess
jmp StartProcess
EndProcess:
mov rax,rdi
Output is stored in rax
EDIT : Previous method to find the last digit was inefficient. so changed it.
Regards,
Mathi.
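For anyone who wants to trace the loop above without an assembler, here is a small Python model of it. The comments name the registers each variable stands in for; the function name bcd_to_int is invented for this sketch, and it assumes every 4-bit nibble of the input holds a value from 0 to 9.

```python
def bcd_to_int(packed: int) -> int:
    """Model of the StartProcess loop: sum digit * place for each nibble."""
    total = 0  # edi
    place = 1  # eax: 1, 10, 100, ...
    while packed:  # ebx; a zero input falls straight through
        digit = packed & 0xF    # and ecx,0xF
        total += digit * place  # mul ecx / add edi,eax
        place *= 10             # imul eax,10
        packed >>= 4            # shr ebx,4
    return total


print(hex(bcd_to_int(0x6789)))  # 0x1a85
```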
-
I'm about as confused as I've ever been... lately, anyway... or maybe I've just now realized it...
What's the "specification" on this program, anyway? Jedi expresses it as "convincing" the computer that something is something it's not. This isn't likely to be possible. The computer doesn't "think" much. The computer works with bit patterns, we're the ones who "think" it's a "number" (signed or unsigned?) or an ascii or unicode "character" or something else (flags register, for example). We provide instructions to manipulate these bits to treat 'em as a "character" or a "number" or whatever. Providing instructions to force the computer to do what we want is going to work better than "convincing" it. :)
Jedi provides:
data: db '6789'
What this puts in memory (expressed in hex for our convenience) is 36h, 37h, 38h, 39h.
Mathi starts out with:
mov ebx, 0x6789
What gets stored in memory here (in hex again) is 89h, 67h, 00h, 00h. This may be useful for what Jedi wants, I'm not sure. In any case, I changed the input to 0x1001 and I don't seem to be getting the expected results. I think I downloaded it pre-edit, so it might be okay now, but I don't know about this one, Mathi...
Jedi mentions unicode. I don't know much about unicode. I understand it provides a standardized mapping between a number and the glyph it represents. I understand that the number in question can be encoded in different ways - UTF-8, UTF-16, UTF-32... others? ... but a certain number will always refer to the same glyph. The glyphs we use to represent numbers we learned from the Arabs - '0', '1' etc. If you need to be able to handle some language (Mayan calendar?) that used a different set of glyphs, you'd want unicode. I don't see anything in your code that's going to handle unicode. As I recall, in UTF-8 encoding at least, the ascii characters will work without any special treatment. I'd suggest "forget unicode" until/unless you really need it. I plan to! :)
If the "specification" is simply "input a string of characters representing a decimal number and print the hex equivalent", that isn't too difficult. Convert the text to the number it represents using the algorithm Rob shows, and then convert the number into text representing hex... and print it. The only difficulty I see here is that if you tell the pesky user "a decimal number" they'll, sure as shootin', put a decimal point in it. I don't think an attempt to convert it "digit by digit", either characters or numbers, will be easy - the "carry" comes in the wrong places.
Converting hex text <-> number involves multiplying/dividing by 16... which we can do with shifts. We also have "rol"/"ror", which can be handy. Converting decimal text <-> number, we get into that situation where we get the digits in the opposite order than we want to print 'em. For hex, a "rol" by four will put the high bits we want to print first in the low four bits - right where we want 'em to isolate 'em, convert 'em to character, and print 'em first... Just a thought...
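The "rol by four" idea can be sketched like this. It is a Python model of a 32-bit rotate, not NASM; rol32 and to_hex_text are names made up for the illustration, and the output is fixed at eight digits with leading zeros.

```python
def rol32(value: int, count: int) -> int:
    """Rotate a 32-bit value left by `count` bits (model of the ROL instruction)."""
    value &= 0xFFFFFFFF
    return ((value << count) | (value >> (32 - count))) & 0xFFFFFFFF


def to_hex_text(value: int) -> str:
    """Emit hex digits most significant first by rotating the top nibble down."""
    digits = "0123456789ABCDEF"
    out = []
    for _ in range(8):  # eight nibbles in 32 bits
        value = rol32(value, 4)  # top nibble lands in bits 0..3
        out.append(digits[value & 0xF])
    return "".join(out)


print(to_hex_text(6789))  # 00001A85
```

Because the rotate brings the highest nibble down first, the digits come out in printing order and no reversal step is needed, unlike the divide-by-10 approach for decimal.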
If we need to convert both dec->hex and hex->dec, can we make the user enter a hex value as "0x..." or "...h", assuming decimal otherwise, or do we want a "menu" where they can specify which way to convert? Can ya clarify the "specification" any, Jedi?
Best,
Frank
-
Frank,
One clarification :)
The routine i pasted was not a replacement for Jedi's code. It should be executed on the output of Jedi's routine.
I think his routine actually converts the char bytes ('6789')
From
36h, 37h, 38h, 39h
to 2bytes at [newdata]
89h 67h (which is the little endian representation of 0x6789)
But he wants these two bytes to be changed to (convince the program :) )
85h 1Ah - which is 0x1A85 = 6789 in decimal.
Well.. We can be sure only if Jedi replies back :D
My code should actually be.
xor ebx,ebx
mov bx,word [newdata] ;; Our input in hex this will load ebx = 0x6789
xor edi,edi
mov eax, 1
xor ecx,ecx
test ebx,ebx ;; If input = 0 , output = 0
jz EndProcess
StartProcess:
mov ecx,ebx
and ecx,0xF ;; Take the last digit
push eax
mul ecx
add edi,eax
pop eax
imul eax, 10 ;; Prepare the next place.
shr ebx,4
jz EndProcess
jmp StartProcess
EndProcess:
mov eax,edi
mov word[newdata], ax ;****NOTE****
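Even if it had been in big-endian format, we have byte-swapping instructions handy (BSWAP for 32/64-bit registers, XCHG AL,AH or ROR AX,8 for a 16-bit word). :)
Still, it cannot be applied to an unlimited number of chars.
The maximum input for the 64-bit version, I think, is sixteen 9's (one digit per 4-bit nibble); the 32-bit version tops out at eight.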
Regards,
Mathi.
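The digit limit mentioned above follows from one decimal digit per 4-bit nibble. A quick Python check, with the hypothetical helper max_packed_input, shows the largest input for a few register widths and confirms that the converted value still fits in the same width.

```python
def max_packed_input(bits: int) -> int:
    """Largest decimal value whose digits fit one per nibble in `bits` bits."""
    digit_count = bits // 4
    return int("9" * digit_count)


for bits in (16, 32, 64):
    largest = max_packed_input(bits)
    # The converted binary value is always smaller than 2**bits,
    # so the result register cannot overflow.
    print(bits, largest, largest < 2**bits)
```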
-
Thanks guys. Woke up to so many posts. ;D
Rob is right that my past implementation of his recommended algorithm was bad. I thought I was limited because I read the rule that the processor requires a register or dx:ax pair twice as big as the multiplicands. I was doubling the register size on each iteration of the loop, which quickly ate up my registers. Thank you Rob.
Frank, the reason I'm converting Unicode is because when I get data from console it arrives in Unicode format. So a 9 is represented as 39h. I need a basic function to be able to receive data from console. I'd like it to be able to handle decimal input and not only hex. I know such a function exists in libraries but I'm gaining a lot of experience from creating this from scratch. :)
The algorithm I uploaded changed the 39h to 09h then removed the prefixed 0 and squashed what was two bytes (38h + 39h) into one (89h). The problem is that this is an intended decimal number. Using Rob's algorithm I thought I was limited to measly 4 digits or 8 Unicode bytes. I'm wrong about that. 8)
Thanks so much Mathi. Your algorithm is perfect. Thanks so much for putting it together. I can see now how I could implement Rob's algorithm without running into a 4 digit limit. Thanks man.
Thanks everyone, this is a great community. It's quite enjoyable to program in assembly really. It's cool stuff.
Jedi
-
Jedi,
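"Frank, the reason I'm converting Unicode is because when I get data from console it arrives in Unicode format. So a 9 is represented as 39h."
A 9 represented by the single byte 39h is ASCII encoding.
Unicode text on Windows usually means UTF-16, which uses 16-bit code units, so '9' would arrive as the two bytes 39h 00h.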
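The difference is easy to see by dumping the bytes. The short Python sketch below (encoded_bytes is a name invented here) prints how the character '9' is stored in ASCII, UTF-8 and little-endian UTF-16.

```python
def encoded_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated hex pairs."""
    return " ".join(f"{b:02x}" for b in text.encode(encoding))


print(encoded_bytes("9", "ascii"))      # 39
print(encoded_bytes("9", "utf-8"))      # 39
print(encoded_bytes("9", "utf-16-le"))  # 39 00
```

So a buffer with one byte per digit, as in db '6789', is ASCII (or equivalently UTF-8); a UTF-16 buffer would have a zero byte after every digit.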
If you are just trying to convert the text read from the console to an integer or vice versa,
you can check the macros below, which are part of the nagoa macro set.
You can invoke the macro as:
str2int data ;; after execution eax will hold 6789. but data should be null terminated like data db '6789',0
to convert integer to ascii (string)
int2str eax, buffer ;; after execution buffer will point to null terminated ascii string.("6789" in this case)
"It's quite enjoyable to program in assembly really. It's cool stuff."
All the best. :)
;;================================
;; MACROS str2int and int2str, written by mastercpp.
;;================================
%macro str2int 1
push ebx ;
push esi ;
push edi ;
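xor eax,eax ; running value = 0
mov ebx,0000000Ah ; multiplier = 10
mov esi,%1 ; address of the null-terminated string (NASM does not use MASM's "offset")
%%ConvertLoop:
movzx ecx,byte [esi] ; load the next character
test ecx,ecx
jz short %%ExitConvertLoop ; 0 => end of string, exit
inc esi
sub cl,30h ; ASCII '0'-'9' -> 0-9
mul ebx ; value * 10 (also clobbers edx)
add eax,ecx ; + next digit
jmp short %%ConvertLoop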
%%ExitConvertLoop:
pop edi
pop esi
pop ebx
%endmacro
%macro int2str 2
push ebx ;
push esi ;
push edi ;
%%start:
mov eax, %1
xor ecx, ecx
mov ebx, 000ah
%%DecConvert:
xor edx, edx
div ebx
add edx, 0030h
push edx
inc ecx
or eax, eax
jnz short %%DecConvert
mov edi, %2
mov edx,ecx
%%SortDec:
pop eax
stosb
loop %%SortDec
mov eax, 0h
stosb
pop edi
pop esi
pop ebx
%endmacro
;;================================
;; mastercpp end macros
;;================================
Regards,
Mathi.
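The int2str macro above divides by 10 repeatedly, pushes each remainder as an ASCII digit, and then pops them so they come out in printing order. Here is a rough Python model of that flow; the function name int_to_decimal_text is made up, and a Python list stands in for the CPU stack.

```python
def int_to_decimal_text(value: int) -> bytes:
    """Model of int2str: divide by 10, push remainders, pop them in reverse."""
    stack = []
    while True:
        value, remainder = divmod(value, 10)  # div ebx: quotient eax, remainder edx
        stack.append(remainder + 0x30)        # add edx,30h / push edx
        if value == 0:                        # or eax,eax / jnz
            break
    out = bytearray()
    while stack:
        out.append(stack.pop())               # pop eax / stosb
    out.append(0)                             # null terminator
    return bytes(out)


print(int_to_decimal_text(6789))  # b'6789\x00'
```

The remainders appear least significant digit first, which is why the macro needs the push/pop pair to reverse them before storing.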