Author Topic: Problem with itoa code (Read 32038 times)

Nicholas Westerhausen · « **on:** February 01, 2013, 07:06:53 PM »

Hello NASM forums,

( I just tried to post this and the post button wasn't working, so I am retyping it )

Basically, I am writing a program that needs to print integers to STDOUT. I have researched many ways of doing this: some used 16-bit registers, some were written in TASM/MASM, some used 64-bit registers, and some used 32-bit registers but were very opaque. I finally understood what was being done, and wrote a modified version of the itoa code from Bare Metal OS.

For trouble-shooting purposes, I stripped out the itoa code I wrote and put it into a separate .asm to try and figure out where I was going wrong. Here it is:

Code: [Select]

section .text
global _start
_start:

	mov	eax,	1231
	mov	ebx,	10
	xor	ecx,	ecx
	;; Now is the main loop of the conversion. It takes each
	;; digit off of the input starting at the ONES place and
	;; moving upward until it reaches the end of the number
godivide:
	xor	edx,	edx	; clear remainder
	div	ebx		; divide eax by 10
	push	edx		; put edx on stack
	inc	ecx		; increment number of digits
	cmp	eax,	0	; if eax
	jne	godivide	; is not equal to zero, divide again
	;; Next we have to go through each digit and add 0x30 ('0')
	;; to each digit so that it will print correctly
	;; For simplicity, each digit is written as it is converted
nextdigit:
	pop	eax		;pull off the stack
	add	eax,	30h	;add '0' to eax
	push	ecx		; store counter
	xor	ecx,	ecx
	;; write the value to STDOUT
	mov	edx,	1	; one byte
	mov	ecx,	eax	; char
	mov	ebx,	1	; stdout
	mov	eax,	4	; syswrite
	int	80h
	;; 
	pop	ecx		; retrieve counter
	dec	ecx		; move down through digits
	cmp	ecx,	0	; if ecx
	jg	nextdigit	; is greater than 0, jump nextdigit
	
	mov eax,	1
	mov	ebx,	0
	int	80h

Compiling with

Code: [Select]

$ nasm -g -f elf32 itoa-test.asm; ld -m elf_i386 -o itoa-test itoa-test.o

Before the method above, where I'm trying to print each character as I convert it, I was trying to use stosb and lodsb, but I don't understand exactly how they work when I'm not calling a function and getting a result back (and if I did that, I don't know how to get the char bytes back). I want to be able to call this as a macro or something similar, so that a single line in the code prints the integer to the screen. I have yet to get this to work.

I can watch the registers using gdb and see the correct conversion take place. What I do not see is the printing happen as it should. Does this have to do with forcing i386 architecture? My machine is x86_64 but the code must be 32-bit.

I'm new to NASM (just started last week) and I've spent what feels like ages (but is in reality only like 10-12 hours) researching how to do this. Any advice or tips on this are welcome.

Thanks for your time,
Nick

Frank Kotler · « **Reply #1 on:** February 01, 2013, 10:51:10 PM »

Don't ya hate it when that happens? I sure do! Lost a long reply over at AsmCommunity just the other day. Thanks for typing it in again!

Here's where things went wrong...

Code: [Select]

	;; write the value to STDOUT
	mov	edx,	1	; one byte
	mov	ecx,	eax	; char

For sys_write, ecx needs to point to where the character(s) live (in memory). Just putting the character in ecx won't do it. One place the character could "live" temporarily would be on the stack...

Code: [Select]

section .text
global _start
_start:

	mov	eax,	1231
	mov	ebx,	10
	xor	ecx,	ecx
	;; Now is the main loop of the conversion. It takes each
	;; digit off of the input starting at the ONES place and
	;; moving upward until it reaches the end of the number
godivide:
	xor	edx,	edx	; clear remainder
	div	ebx		; divide eax by 10
	push	edx		; put edx on stack
	inc	ecx		; increment number of digits
	cmp	eax,	0	; if eax
	jne	godivide	; is not equal to zero, divide again
	;; Next we have to go through each digit and add 0x30 ('0')
	;; to each digit so that it will print correctly
	;; For simplicity, each digit is written as it is converted
nextdigit:
	pop	eax		;pull off the stack
	add	eax,	30h	;add '0' to eax
	push	ecx		; store counter
;	xor	ecx,	ecx
	;; write the value to STDOUT
	mov	edx,	1	; one byte

;	mov	ecx,	eax	; char
; no, it has to "be" someplace, and ecx point to it

	push eax ; we can put it on the stack (again!)
	mov ecx, esp ; and point to it there...
	
	mov	ebx,	1	; stdout
	mov	eax,	4	; syswrite
	int	80h

	pop eax ; remove it from the stack (unused)
	;; 
	pop	ecx		; retrieve counter
	dec	ecx		; move down through digits
	cmp	ecx,	0	; if ecx
	jg	nextdigit	; is greater than 0, jump nextdigit
	
	mov eax,	1
	mov	ebx,	0
	int	80h

Obviously, that's got a little more pushin' and poppin' than is ideal, but it works. (I love Linux questions 'cause I can actually try it!) We could perhaps improve the situation somewhat by using some other register besides ecx to count digits. We could provide a buffer for the digit in "section .bss" or so... or all of 'em...
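A minimal sketch of the "all of 'em" buffer idea, assuming a 16-byte buffer called outbuf in section .bss (the name and size are made up for illustration). The digits still come off the stack in the right order, stosb stores each one at [edi] and advances edi, and a single sys_write prints the lot:

```nasm
; nasm -f elf32 itoa-buf.asm
; ld -m elf_i386 -o itoa-buf itoa-buf.o

section .bss
    outbuf  resb 16             ; hypothetical buffer; a 32-bit value needs at most 10 digits

section .text
global _start
_start:
    mov   eax, 1231             ; number to convert
    mov   ebx, 10               ; divisor
    xor   ecx, ecx              ; digit counter
godivide:
    xor   edx, edx              ; clear the high half of the dividend
    div   ebx                   ; eax = quotient, edx = remainder (one digit)
    push  edx                   ; digits go on the stack, ones place first
    inc   ecx
    test  eax, eax
    jnz   godivide

    mov   edx, ecx              ; sys_write wants the length in edx
    mov   edi, outbuf           ; stosb stores through edi
    cld                         ; direction flag clear: edi counts upward
store:
    pop   eax                   ; most significant digit comes off first
    add   al, '0'               ; digit to ASCII
    stosb                       ; [edi] = al, then edi = edi + 1
    dec   ecx
    jnz   store

    mov   ecx, outbuf           ; address of the characters
    mov   ebx, 1                ; stdout
    mov   eax, 4                ; sys_write
    int   80h

    mov   eax, 1                ; sys_exit
    xor   ebx, ebx
    int   80h
```

This trades the per-digit system call for one write at the end; the push/pop pair per digit is still there.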

As you know, "div" gives us the digits in the "wrong order". Pushing them on the stack and popping them off seems "easy" to me, and it's what I usually show to "beginners". There are better ways to do it. We could start at the right end of the buffer (put in a zero terminator... or maybe a linefeed "last" if you want) and work leftwards... until we run out of digits. You could remember this position and start printing there... or you could pad the rest of the buffer with spaces (right-justified numbers look nice in a column). Sys_write will need both the start position and how many characters. "Calling conventions" return only one thing, but there's no reason you can't return both if you want to! It is plausible to use ecx to point into your buffer throughout, and be ready for sys_write. Since "div" uses edx, we have to use some other register to count digits and swap it back to edx for the sys_write.
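As a rough sketch of the "fill from the right and return both" idea: the routine name utoa_right and the buffer numbuf are invented for this example, and esi stands in as the digit counter while "div" is busy with edx. It returns the start position in ecx and the length in edx, ready for sys_write:

```nasm
; nasm -f elf32 rjust.asm
; ld -m elf_i386 -o rjust rjust.o

section .bss
    numbuf  resb 12                 ; hypothetical buffer: 10 digits + linefeed + 1 spare

section .text
global _start
_start:
    mov   eax, 1231
    call  utoa_right                ; returns ecx = first character, edx = length
    mov   ebx, 1                    ; stdout
    mov   eax, 4                    ; sys_write
    int   80h

    mov   eax, 1                    ; sys_exit
    xor   ebx, ebx
    int   80h

; utoa_right (hypothetical name)
; in:       eax = unsigned value
; out:      ecx -> first character, edx = character count (linefeed included)
; clobbers: eax, ebx, esi
utoa_right:
    mov   ecx, numbuf + 11          ; last byte of the buffer
    mov   byte [ecx], 10            ; the linefeed goes in "last"
    mov   esi, 1                    ; one character so far
    mov   ebx, 10
.next:
    dec   ecx                       ; work leftwards
    xor   edx, edx
    div   ebx                       ; edx = next digit, ones place first
    add   dl, '0'
    mov   [ecx], dl
    inc   esi
    test  eax, eax
    jnz   .next
    mov   edx, esi                  ; div is finished with edx, so hand it the count
    ret
```

No stack traffic per digit is needed here, because the digits land in memory already in printing order.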

We can discuss examples if you want - I've got a pile of 'em - but let me post the solution to your immediate problem first. TTYL.

Best,
Frank

Nicholas Westerhausen · « **Reply #2 on:** February 05, 2013, 02:04:58 AM »

So, with your modified code, the itoa algorithm works very well, even when I make it a macro at the beginning of the file. The problem is that once I add any more data to the file (example below), it doesn't print the integer; instead, it prints some number and I have no idea where it comes from.

For example, it runs correctly when this is itoa.asm:

Code: [Select]

section .data
    num    dw    13579

%macro   iwrite    1
    mov        eax,      %1
    ...snip, same code from above...
%endmacro

section .text
global _start
_start:
    iwrite   [num]
    ...sys_exit call...

But as soon as I add another number up at the top in the data section

Code: [Select]

section .data
    num    dw    13579
    sum    dw    1932

it prints out something that I can't tell where it comes from (I'm not modifying the .text section at all). I'm not sure if this is because the reference point for num somehow gets messed up, or something. But I've fiddled with it a little bit. Posting here even though I'm continuing to mess with it.

Frank Kotler · « **Reply #3 on:** February 05, 2013, 03:02:42 AM »

The numbers should probably be "dd" not "dw".
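To spell out why, here is a small sketch using the two variables from the post above. "mov eax, [num]" always loads 4 bytes, so with "dw" it picks up num in the low word and whatever follows it (here sum) in the high word. With num alone, the two bytes after it most likely happened to be zero, which would explain why it appeared to work. The movzx line is an alternative not mentioned in the thread, shown only for comparison:

```nasm
; nasm -f elf32 dwdd.asm
; ld -m elf_i386 -o dwdd dwdd.o

section .data
    num    dw    13579              ; 2 bytes in memory: 0Bh 35h
    sum    dw    1932               ; 2 bytes: 8Ch 07h, directly after num

section .text
global _start
_start:
    mov    eax, [num]               ; 32-bit load reads all 4 bytes:
                                    ; eax = 078C350Bh = 13579 + 1932 * 65536 = 126629131
    movzx  eax, word [num]          ; 16-bit load, zero-extended: eax = 13579

    mov    eax, 1                   ; sys_exit
    xor    ebx, ebx
    int    80h
```

Declaring both values with "dd" makes each one 4 bytes wide, so the plain 32-bit load reads exactly one variable.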

Best,
Frank

Nicholas Westerhausen · « **Reply #4 on:** February 05, 2013, 04:32:03 AM »

That fixed it.

Thanks a lot.

Back to what you were saying about starting at a different place in memory and going in the "correct" order through the digits, I have seen example code like that, but when I tried to implement it myself, it got messy when turning it into a macro because I couldn't reference the address of the argument.

Frank Kotler · « **Reply #5 on:** February 05, 2013, 07:35:30 AM »

I'm not sure what you mean by "couldn't reference the address of the argument". The buffer?

Here's something I use for a quick and dirty "spit out the number". It creates a buffer on the stack, fills it "backwards", falls into sys_write, and makes it all go away. Intended more for "debugging" than nicely formatted numbers...

Code: [Select]

; nasm -f elf32 myprog.asm
; ld -o myprog myprog.o -m elf_i386

global _start

section .text
_start:
    mov eax, 0FFFFFFFFh
    call showeaxd
    
    mov eax, 1
    xor ebx, ebx
    int 80h

;---------------------------------
showeaxd:
    push eax
    push ebx
    push ecx
    push edx
    push esi
    
    sub esp, 10h
    lea ecx, [esp + 12]
    mov ebx, 10
    xor esi, esi
; want linefeed?
    mov byte [ecx], 10
    inc esi

.top:    
    dec ecx
    xor edx, edx
    div ebx
    add dl, '0'
    mov [ecx], dl
    inc esi
    or eax, eax
    jnz .top
    
    mov edx, esi
    mov ebx, 1
    mov eax, 4
    int 80h
    
    add esp, 10h

    pop esi
    pop edx
    pop ecx
    pop ebx
    pop eax

    ret
;---------------------------------

Not really suitable for a macro, but I guess it could be adapted if you wanted to...

Best,
Frank

Nicholas Westerhausen · « **Reply #6 on:** February 05, 2013, 05:21:01 PM »

So here's a question, is it common practice to define a subroutine rather than a macro? I lean towards using macros because they are what I see as functions. When I write code, I tend to abstract out the many lines the "doing" of something takes so that the source code looks clean.

When I look at your "debugging itoa" code, I think I understand what is happening. Most of my confusion comes from the "magic" you are doing with esi. I'm also confused about the stack pointer and which direction adding to it moves. A quick search says that adding to esp is like popping from the stack. So adding moves "down", and by subtracting you are moving "up" into unassigned memory?

Code: [Select]

showeaxd:
;; push all registers onto the stack
    push eax
    push ebx
    push ecx
    push edx
    push esi
;; move the instruction pointer 16 bits left ("down" the stack?)    
    sub esp, 10h
;; load effective address of one byte on the stack (to the right? "up"?)
    lea ecx, [esp + 12]
;; moving 10 to ebx to convert from decimal to ascii
    mov ebx, 10
;; clearing the esi register (used for string operations?)
    xor esi, esi
; want linefeed?
    mov byte [ecx], 10
    inc esi

.top:    
;; decrement ecx, clear edx, divide eax by 10
    dec ecx
    xor edx, edx
    div ebx
;; add a byte to the lower d register (for ascii conversion of remainder)
    add dl, '0'
;; put dl into the memory address stored in ecx
    mov [ecx], dl
;; move the string pointer??
    inc esi
;; binary 'or' comparison on eax
    or eax, eax
;; if eax || eax != 0 (so there is SOMETHING in eax), jump .top
    jnz .top
    
;; printing the string (number converted)
;; puting the string length in edx?
;; ecx has the address at the beginning of the string
    mov edx, esi
    mov ebx, 1
    mov eax, 4
    int 80h
    
;; move the pointer back to where it was (up 10?)
    add esp, 10h
;; put the original values back in all registers
    pop esi
    pop edx
    pop ecx
    pop ebx
    pop eax
;; return
    ret

Bryant Keller · « **Reply #7 on:** February 05, 2013, 11:04:12 PM »

Quote from: Nicholas Westerhausen on February 05, 2013, 05:21:01 PM

So here's a question, is it common practice to define a subroutine rather than a macro? I lean towards using macros because they are what I see as functions.

Macros should only really be used for small blocks of code that get reused frequently. Subroutines/Procedures are used to reduce and reuse code.

Let's use Frank's as an example:

Code: (showeaxd as Procedure) [Select]

global _start

section .text
_start:
    mov eax, 10
    call showeaxd

    mov eax, 11
    call showeaxd

    mov eax, 12
    call showeaxd

    mov eax, 1
    xor ebx, ebx
    int 80h

showeaxd:
    push eax
    push ebx
    push ecx
    push edx
    push esi
    
    sub esp, 10h
    lea ecx, [esp + 12]
    mov ebx, 10
    xor esi, esi
; want linefeed?
    mov byte [ecx], 10
    inc esi

.top:    
    dec ecx
    xor edx, edx
    div ebx
    add dl, '0'
    mov [ecx], dl
    inc esi
    or eax, eax
    jnz .top
    
    mov edx, esi
    mov ebx, 1
    mov eax, 4
    int 80h
    
    add esp, 10h

    pop esi
    pop edx
    pop ecx
    pop ebx
    pop eax

    ret

Code: (showeaxd as Macro) [Select]

%macro print_int 1
%push _print_int_
    mov eax, %1
    push eax
    push ebx
    push ecx
    push edx
    push esi
    
    sub esp, 10h
    lea ecx, [esp + 12]
    mov ebx, 10
    xor esi, esi
; want linefeed?
    mov byte [ecx], 10
    inc esi

%$top:    
    dec ecx
    xor edx, edx
    div ebx
    add dl, '0'
    mov [ecx], dl
    inc esi
    or eax, eax
    jnz %$top
    
    mov edx, esi
    mov ebx, 1
    mov eax, 4
    int 80h
    
    add esp, 10h

    pop esi
    pop edx
    pop ecx
    pop ebx
    pop eax
%pop
%endmacro

global _start

section .text
_start:
    print_int 10
    print_int 11
    print_int 12

    mov eax, 1
    xor ebx, ebx
    int 80h

Although these two might look similar and the second does "resemble" a high level function call, they are extremely different. In fact, once assembled, the second contains three complete copies of the conversion code, whereas the first contains only one.

When we run the first through NASM's preprocessor (nasm -E), we get the following code:

Code: [Select]

%line 1+1 main.asm
[global _start]

[section .text]
_start:
 mov eax, 10
 call showeaxd

 mov eax, 11
 call showeaxd

 mov eax, 12
 call showeaxd

 mov eax, 1
 xor ebx, ebx
 int 80h

showeaxd:
 push eax
 push ebx
 push ecx
 push edx
 push esi

 sub esp, 10h
 lea ecx, [esp + 12]
 mov ebx, 10
 xor esi, esi

 mov byte [ecx], 10
 inc esi

.top:
 dec ecx
 xor edx, edx
 div ebx
 add dl, '0'
 mov [ecx], dl
 inc esi
 or eax, eax
 jnz .top

 mov edx, esi
 mov ebx, 1
 mov eax, 4
 int 80h

 add esp, 10h

 pop esi
 pop edx
 pop ecx
 pop ebx
 pop eax

 ret

As you can see, it's practically the same as what we started with. However, when we preprocess the second, we get:

Code: [Select]

%line 42+1 main.asm

[global _start]

[section .text]
_start:
 mov eax, 10
%line 47+0 main.asm
 push eax
 push ebx
 push ecx
 push edx
 push esi

 sub esp, 10h
 lea ecx, [esp + 12]
 mov ebx, 10
 xor esi, esi

 mov byte [ecx], 10
 inc esi

..@3.top:
 dec ecx
 xor edx, edx
 div ebx
 add dl, '0'
 mov [ecx], dl
 inc esi
 or eax, eax
 jnz ..@3.top

 mov edx, esi
 mov ebx, 1
 mov eax, 4
 int 80h

 add esp, 10h

 pop esi
 pop edx
 pop ecx
 pop ebx
 pop eax
%line 48+1 main.asm
 mov eax, 11
%line 48+0 main.asm
 push eax
 push ebx
 push ecx
 push edx
 push esi

 sub esp, 10h
 lea ecx, [esp + 12]
 mov ebx, 10
 xor esi, esi

 mov byte [ecx], 10
 inc esi

..@5.top:
 dec ecx
 xor edx, edx
 div ebx
 add dl, '0'
 mov [ecx], dl
 inc esi
 or eax, eax
 jnz ..@5.top

 mov edx, esi
 mov ebx, 1
 mov eax, 4
 int 80h

 add esp, 10h

 pop esi
 pop edx
 pop ecx
 pop ebx
 pop eax
%line 49+1 main.asm
 mov eax, 12
%line 49+0 main.asm
 push eax
 push ebx
 push ecx
 push edx
 push esi

 sub esp, 10h
 lea ecx, [esp + 12]
 mov ebx, 10
 xor esi, esi

 mov byte [ecx], 10
 inc esi

..@7.top:
 dec ecx
 xor edx, edx
 div ebx
 add dl, '0'
 mov [ecx], dl
 inc esi
 or eax, eax
 jnz ..@7.top

 mov edx, esi
 mov ebx, 1
 mov eax, 4
 int 80h

 add esp, 10h

 pop esi
 pop edx
 pop ecx
 pop ebx
 pop eax
%line 50+1 main.asm

 mov eax, 1
 xor ebx, ebx
 int 80h

As you can see, the output for this second example is dramatically larger than for the first. That is the nature of macros: they don't invoke the code you reference, they place the referenced code directly into your source as though you had typed it yourself.
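One possible middle ground, sketched here as an assumption rather than something from Frank's code: keep showeaxd as a procedure and make print_int a thin macro that only loads eax and calls it. Each use then expands to four short instructions instead of a full copy of the routine, and the call site still reads as a single line:

```nasm
; nasm -f elf32 wrap.asm
; ld -m elf_i386 -o wrap wrap.o

%macro print_int 1
    push eax                ; keep the caller's eax
    mov  eax, %1
    call showeaxd           ; the one shared copy of the routine
    pop  eax
%endmacro

global _start

section .text
_start:
    print_int 10
    print_int 11
    print_int 12

    mov eax, 1
    xor ebx, ebx
    int 80h

showeaxd:                   ; same routine as above, unchanged
    push eax
    push ebx
    push ecx
    push edx
    push esi

    sub esp, 10h
    lea ecx, [esp + 12]
    mov ebx, 10
    xor esi, esi
    mov byte [ecx], 10
    inc esi
.top:
    dec ecx
    xor edx, edx
    div ebx
    add dl, '0'
    mov [ecx], dl
    inc esi
    or eax, eax
    jnz .top

    mov edx, esi
    mov ebx, 1
    mov eax, 4
    int 80h

    add esp, 10h

    pop esi
    pop edx
    pop ecx
    pop ebx
    pop eax
    ret
```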

Frank Kotler · « **Reply #8 on:** February 06, 2013, 05:48:25 AM »

Code: [Select]

;; move the instruction pointer 16 bits left ("down" the stack?)    
    sub esp, 10h

"stack pointer", not "instruction pointer"
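On the "down"/"up" question: the x86 stack grows toward lower addresses, so "sub esp, 10h" reserves 16 bytes (bytes, not bits) below the old top of the stack, and "add esp, 10h" hands them back. "lea ecx, [esp + 12]" then points 12 bytes above the new esp, inside that reserved block. A tiny sketch, separate from the code above, that makes the direction visible by returning the difference as the exit status:

```nasm
; nasm -f elf32 stackdir.asm
; ld -m elf_i386 -o stackdir stackdir.o

global _start

section .text
_start:
    mov   ebx, esp          ; ebx = stack pointer before reserving anything
    sub   esp, 10h          ; reserve 16 bytes; esp now holds a lower address
    sub   ebx, esp          ; ebx = old esp - new esp = 16
    add   esp, 10h          ; give the 16 bytes back, as showeaxd does before its pops

    mov   eax, 1            ; sys_exit, with the exit status taken from ebx
    int   80h
```

Running it and then typing "echo $?" in the shell should print 16.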

Code: [Select]

;; clearing the esi register (used for string operations?)
    xor esi, esi

Naw, just a digit counter. Since sys_write is going to want the "string pointer" in ecx, I use ecx throughout. I'd use edx as the counter (since that's where sys_write wants it), but "div" ties up edx so I had to use something else. I should have commented it better, but you seem to have figured it out...

As Bryant has demonstrated, "abstract away the lines" can also sometimes "hide the fact that your program is filled with repetitive blocks of code". That's not the end of the world. It might be exactly what you want to do. I personally would prefer to see that much code as a subroutine, included once and called from multiple places, but it'll work either way. It might run faster "in line", avoiding the call and ret... but you won't notice any difference. "Programmer's choice". It is good to be the programmer!

Best,
Frank

Nicholas Westerhausen · « **Reply #9 on:** February 07, 2013, 04:00:30 PM »

Thank you for the explanation about macros. It is really clear now. Macros should be used when you are going to type the same thing every time (like a write or read). Subroutines should be used when you have many lines that you want to reuse but not retype. Makes sense to me. I'm going to rewrite my code without macros I think and compare file sizes for myself.
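A sketch of the kind of small "write" macro meant here. The name write_str and its two parameters are made up for illustration; it is only a handful of instructions, so expanding it at every use costs little:

```nasm
; nasm -f elf32 wmacro.asm
; ld -m elf_i386 -o wmacro wmacro.o

%macro write_str 2          ; hypothetical macro: %1 = address, %2 = length
    mov   eax, 4            ; sys_write
    mov   ebx, 1            ; stdout
    mov   ecx, %1
    mov   edx, %2
    int   80h               ; note: eax, ebx, ecx and edx are all overwritten
%endmacro

section .data
    msg     db "hello", 10
    msglen  equ $ - msg

section .text
global _start
_start:
    write_str msg, msglen

    mov   eax, 1            ; sys_exit
    xor   ebx, ebx
    int   80h
```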

Nick
