NASM - The Netwide Assembler

NASM Forum => Programming with NASM => Topic started by: doublec122 on February 17, 2019, 02:13:52 PM

Title: Confusion about printing (an integer with multiple digits)
Post by: doublec122 on February 17, 2019, 02:13:52 PM: So I'm a beginner in assembly, and I'm currently working through some tutorials on YT. Now I'm on an exercise that is supposed to print an integer. I understood the algorithm behind this code, or more exactly, the process by which it actually prints the integer. The problem is that I'm not sure the code does *exactly* what I imagine it does.

For instance, I am given digitSp, which will hold the integer, and digitSpPos, which is supposed to act as a sort of index. Then I move the integer into RAX and call _printRAX. So far this seems quite clear to me. The confusion starts when I get to the _printRAX label: here, as far as I've understood, I'm storing a line feed value at the address pointed to by RCX, then we increment the value in RCX, and finally we store that value in digitSpPos. All of this is pretty confusing because I don't know exactly how it is supposed to work. My goal would be to add a line feed, increment the "index", and then pass the updated index into digitSpPos. But is that what happens? For instance, when I do <mov rcx, digitSp>, I am simply transferring 100 bytes into rcx, so when I do <inc rcx>, I should get 101? And when I do <mov [rcx], rbx>, that moves the line feed value to the address pointed to by RCX, while I thought digitSp would hold the whole string, which has already been passed to RCX as a value.

In the following two loops, again, the confusing part is only the section where I basically do the same thing as in _printRAX (I update my "index" and move the digits in one by one). All in all, the whole idea is to divide the integer by 10, take each remainder, and stick it into RCX until I recreate the number, but in reverse. Then I print everything from the end of RCX to the beginning (giving me the integer in the correct order, plus the line feed).

But my questions are: why do I use digitSp if RCX is going to hold the whole integer? Why do I store each digit as a value at the address pointed to by RCX? Does it have something to do with how the registers view the bytes assigned to them?

I apologize if my question is a bit weird, but I would really like to understand how everything works, since as a beginner that would help me grasp the ideas behind assembly programming better. I usually analyze every program and try to make sense of it, but now I've apparently run into a bit of a problem.

The code is as follows:

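Code:
section .bss
    digitSp     resb 100        ; digit buffer
    digitSpPos  resb 8          ; saved pointer into digitSp

section .text
    global _start

_start:
    mov rax, 12345
    call _printRAX

    mov rax, 60                 ; sys_exit
    mov rdi, 0
    syscall

_printRAX:
    mov rcx, digitSp            ; rcx = ADDRESS of the buffer
    mov rbx, 10
    mov [rcx], rbx              ; qword store: byte 0 = 10 (line feed)
    inc rcx
    mov [digitSpPos], rcx

_printRAXLoop:
    mov rdx, 0
    mov rbx, 10
    div rbx                     ; rax = quotient, rdx = remainder
    ;push rax
    add rdx, 48                 ; remainder -> ASCII digit
    ;mov rcx, [digitSpPos]
    mov [rcx], dl               ; store digit, least significant first
    inc rcx
    mov [digitSpPos], rcx
    ;pop rax
    cmp rax, 0
    jne _printRAXLoop

_printRAXLoop2:
    ;mov rcx, [digitSpPos]
    mov rax, 1                  ; sys_write
    mov rdi, 1                  ; stdout
    mov rsi, rcx                ; address of the byte to print (first pass: one past the last digit)
    mov rdx, 1                  ; length 1
    syscall                     ; syscall clobbers rcx and r11...
    mov rcx, [digitSpPos]       ; ...so reload the pointer
    dec rcx
    mov [digitSpPos], rcx
    cmp rcx, digitSp
    jge _printRAXLoop2
    ret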
There are a couple of lines which are commented out since they seemed redundant (I checked this by running the program with the changes made), but I might be wrong, so I left them in as comments. I apologize if my question is a bit long and weird.
Title: Re: Confusion about printing (an integer with multiple digits)
Post by: Frank Kotler on February 17, 2019, 10:10:33 PM: Hi doublec122,
Welcome to the forum.

Quote
For instance, when I do <mov rcx, digitSp>, I am simply transferring 100 bytes into rcx,
No. You are moving the address of your buffer into rcx. Your code doesn't use the number 100.

I gotta reboot. I'll get back to you...
Later,
Frank

Code:
global _start

section .bss
    buffer resb 20

section .text
_start:
    mov rax, -1
    call showeaxd

exit:
    mov rax, 60
    xor rdi, rdi
    syscall

;=====================
showeaxd:
    mov rbx, 10
    mov rsi, buffer + 19
    xor rdi, rdi
.top:
    xor rdx, rdx
    div rbx
    add dl, '0'
    mov [rsi], dl
    dec rsi
    inc rdi
    test rax, rax
    jnz .top

    mov rdx, rdi
    mov rdi, 1
    mov rax, 1
    syscall
    ret
Title: Re: Confusion about printing (an integer with multiple digits)
Post by: Frank Kotler on February 17, 2019, 11:48:28 PM: Where was I?

Oh yeah... It looks to me like your code was originally written for 32 bits. The 32-bit write call uses ecx to point to the buffer, which may account for some of the saving and restoring you no longer have to do.

Ignore the code I posted. In the first place I meant "showraxd" not "showeaxd". It probably has other errors.

Quote
But questions are, why do I use digitSp if RCX is going to hold the whole integer?

RCX is only 8 bytes. A 64-bit number written out in decimal can be longer than that (up to 20 digits), thus the 100-byte buffer, which is longer than it has to be. The bytes you're interested in are in dl. You add 48 (or 0x30, or '0', all the same value) to convert a number from 0 to 9 into the character representing it. Your code puts the digits in the buffer "forwards" and then prints the buffer "backwards". The code I posted attempts to start at the "end" of the buffer and work towards the "front"... but it isn't right.

I don't know if I've cleared up your questions or not. Ask again if not...

Best,
Frank
Title: Re: Confusion about printing (an integer with multiple digits)
Post by: doublec122 on February 18, 2019, 11:17:55 AM: Yes, that was one of the things I wanted to clear up, thank you. After reading this, I checked my program in gdb and saw that RCX held an address as its value (the beginning of digitSp), so by using [RCX] I'm able to access the value at that address (right?), basically dereferencing a pointer. Now that I've cleared that up, I can see how I actually printed the integer: filling each byte with a digit, taken backwards (plus the line feed), then doing pretty much the opposite in order to print (decrementing RCX back to the address at the beginning and printing the value at each address in the process). I hope I'm correct so far.

Now, there might be one more thing I don't get. When analyzing with GDB (also using peda), I noticed that in the _printRAXLoop2 label, while decrementing RCX, as my addresses got closer to the one at the beginning, I saw something like:

RCX: <address>->0x313233..->('4321')

That would mean that the address currently held in RCX contains the value 4321? To me, that seems improbable, because, as before, decreasing RCX gets me to a lower address that should hold another single digit from my integer, not the whole number in reverse (up to that current address). Looking at that following hex number after the address in RCX, I notice that those values are indeed ASCII codes for 1, 2 ,3 and so on, and are in order. So, could that be some sort of strange mistake by peda? I'm asking just to clarify things now that I'm at them, sorry if my question is too long or messy.

Lastly, a quick one: does assembly work the same way for all declared "variables"? To be precise, if I have any kind of data in section .data or .bss, whether I write <text db "Hello"> or, like here, <variable resb 100>, does assembly see these just as addresses in memory? So in order to do anything with them, I have to advance byte by byte along the "variable"'s span?

That would be all the things I had doubts about. Thank you for your help.
Title: Re: Confusion about printing (an integer with multiple digits)
Post by: fredericopissarra on February 18, 2019, 02:29:55 PM: I think this is the classic "pointer vs value" confusion almost everyone learning C or ASM runs into.
Pointers are integer values, and those values are memory addresses. In ASM, every label has its own memory address, and you can use this value for indirect access to data.

In your code you are reserving a buffer of 100 bytes and naming its initial address digitSp, so the linker can resolve this 'name' to a proper address. In NASM, when you do:

Code:
mov rcx,digitSp
You are telling the assembler to put the initial address of this buffer in RCX. Of course, this is different from:

Code:
mov dl,[digitSp]
where you are asking to read the data (here one byte, since the destination is DL) stored at the address named digitSp.

The way you work with variables in assembly is the same as in C or other compiled languages. It depends on the scope and initial value. If you put your "buffer" in the .bss segment, the loader allocates it when the program starts, but nothing for it is stored in the executable; on Linux it starts out filled with zeros. If you put it in .data, this segment is loaded from your executable image, and your "global" variables have the initial values you gave them. Variables in the .rodata segment behave like those in .data, but are READ-ONLY... And you can use the stack to hold "local" variables as well.

It is a good optimization to keep as many variables in registers as possible (while obeying a strict calling convention), but you can use memory-allocated variables if you want (it will probably slow your routines down).

A good trick is to write simple functions in C and compile them with GCC using the -O2 -S -masm=intel -fno-stack-protector options to see what the compiler did. Generally, this gives you pretty well-optimized routines (maybe you can make little tweaks to suit your needs!). Take a look at this:

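Code:
#include <stdint.h>
#include <stdio.h>

void printul64( uint64_t n )
{
  char buffer[21];          // Up to 20 digits for a uint64_t, plus the NUL.
  char *p, *q;
  uint32_t length;
  uint8_t rem;

  p = q = buffer + sizeof buffer - 1;
  *p-- = '\0';

  // Special case: if n == 0.
  if ( !n )
    *p-- = '0';
  else
    while ( n )
    {
      rem = n % 10;
      n /= 10;
      *p-- = rem + '0';     // Store digits from right to left.
    }

  length = q - ++p;         // p now points to the first digit.
  puts( p );
}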
The compiler will create something like this:

Code:
extern puts
global printul64

printul64:
    sub   rsp, 40                    ; Allocate space for 'buffer'.
    test  rdi, rdi
    mov   rsi, -3689348814741910323  ; Magic! :)
    mov   byte [rsp+20], 0           ; Put NUL char.
    lea   rcx, [rsp+19]
    je    .L8                        ; n == 0?
.loop:
    mov   rax, rdi
    sub   rcx, 1
    mul   rsi                        ; MUL is faster than DIV.
    shr   rdx, 3                     ; rdx = n / 10
    lea   rax, [rdx+rdx*4]
    add   rax, rax                   ; rax = (n / 10) * 10
    sub   rdi, rax                   ; rdi = n % 10
    mov   rax, rdi
    mov   rdi, rdx                   ; n = n / 10
    add   eax, '0'
    test  rdx, rdx                   ; n == 0?
    mov   byte [rcx+1], al
    jne   .loop
    lea   rdi, [rcx+1]
    call  puts wrt ..plt
    add   rsp, 40                    ; Reclaim allocated buffer.
    ret

; Special case when n == 0.
.L8:
    lea   rcx, [rsp+18]
    mov   byte [rsp+19], '0'
    lea   rdi, [rcx+1]
    call  puts wrt ..plt
    add   rsp, 40                    ; Reclaim allocated buffer.
    ret
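Here it uses the stack to hold the local array 'buffer'. Also notice that the DIV instruction is missing: instead, it multiplies by a "magic" constant with MUL, because MUL is much faster (around 6 times, depending on the CPU). You can replace the libc call puts() with your own printing routine (a write syscall, for example); keep in mind that puts() also appends a newline.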

Regards,
Fred