Topic: Confusion about printing (an integer with multiple digits)

doublec122

  • Jr. Member
  • Posts: 2
Confusion about printing (an integer with multiple digits)
« on: February 17, 2019, 02:13:52 PM »
So I'm a beginner in assembly, I'm currently going through some tutorials on YouTube, and now I'm working on an exercise that
is supposed to print an integer. I understand the algorithm behind this code, or more exactly, the process by which it
actually prints that integer. The problem is, I'm not sure it works *exactly* the way I imagine in terms of code.

For instance, I am given digitSp, which will hold the integer, and digitSpPos, which is supposed to act as a sort of index. Then I
move the integer into RAX and call _printRAX. So far that seems quite clear to me. The confusion starts when I get to the _printRAX
label: here, as far as I understand, I'm storing a line feed value at the address pointed to by RCX, then I increment the value
in RCX, and finally I store that value in digitSpPos. All of this is pretty confusing because I don't know exactly how it is
supposed to work. My goal would be to add a line feed, increment the "index", and then store the updated index in
digitSpPos. But is that what happens? For instance, when I do <mov rcx, digitSp>, I am simply transferring 100 bytes into rcx,
so when I do <inc rcx>, I should get 101? And when I do <mov [rcx], rbx>, that moves the line feed value to the address pointed to by
RCX, while I thought digitSp would hold the whole string, which has already been passed to RCX as a value.

In the following two loops, again, the only confusing part is the section where I basically do the same thing as in _printRAX
(I update my "index" and move the digits one by one). All in all, the idea is to divide the integer by 10, take
each remainder and stick it into RCX until I recreate the number, but in reverse. Then I print everything from the end of RCX
back to the beginning (giving me the integer in the correct order, plus the line feed).
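The procedure described above can be modeled in C as a rough sketch. It assumes a char array stands in for digitSp and an index variable `pos` stands in for RCX/digitSpPos; the function name `model_print_rax` and its `out` parameter are hypothetical, and the characters are collected into a string instead of being written one byte per syscall.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of _printRAX: digitSp is the scratch buffer and
   pos plays the role of RCX / digitSpPos. Writes the decimal text of n
   plus a line feed into out (at least 22 bytes) and returns the number
   of characters written, not counting the terminating '\0'. */
size_t model_print_rax(uint64_t n, char *out)
{
    char digitSp[100];
    size_t pos = 0;

    digitSp[pos++] = '\n';                  /* stored first, printed last */

    do {                                    /* least significant digit first */
        digitSp[pos++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);

    /* Walk back from the most significant digit down to digitSp[0],
       emitting one byte at a time like the write loop. */
    size_t len = 0;
    while (pos > 0)
        out[len++] = digitSp[--pos];
    out[len] = '\0';                        /* only for the C caller */
    return len;
}
```

Because the digits are stored forwards (lowest digit first), walking the buffer backwards is what puts them back in reading order, with the line feed coming out last.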

But the questions are: why do I use digitSp if RCX is going to hold the whole integer? Why do I store each digit as a value
at the address pointed to by RCX? Does it have something to do with how the registers view the bytes assigned to them?
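On the register question: the size of the register operand decides how many bytes a store writes. `mov [rcx], dl` writes one byte, while `mov [rcx], rbx` writes all eight bytes of RBX. The sketch below models both stores in C with memcpy, assuming a little-endian x86-64 target; the function name `store_widths` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Models "mov rbx, 10 / mov [rcx], rbx" followed by a one-byte
   "mov [rcx+1], dl" on a little-endian machine.
   buf must hold at least 8 bytes. */
void store_widths(unsigned char *buf)
{
    uint64_t rbx = 10;                 /* 64-bit register holding 10 */
    memcpy(buf, &rbx, sizeof rbx);     /* 8-byte store: 0A 00 00 00 00 00 00 00 */

    uint8_t dl = '5';                  /* 8-bit register holding one digit */
    memcpy(buf + 1, &dl, sizeof dl);   /* 1-byte store: only buf[1] changes */
}
```

So the line feed store in the code below also zeroes the seven bytes after it; the digit stores that follow then overwrite those bytes one at a time.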

I apologize if my question is a bit weird, but I would really like to understand how everything works, since as a beginner
that would help me grasp the idea behind assembly programming better. I usually analyze every program and try to make sense of
it, but this time I've apparently run into a bit of a problem.

The code is as follows:

Code:
section .bss
  digitSp resb 100
  digitSpPos resb 8

section .text
  global _start

_start:
  mov rax, 12345
  call _printRAX

  mov rax, 60
  mov rdi, 0
  syscall

_printRAX:
  mov rcx, digitSp
  mov rbx, 10
  mov [rcx], rbx
  inc rcx
  mov [digitSpPos], rcx

_printRAXLoop:
  mov rdx, 0
  mov rbx, 10
  div rbx
  ;push rax
  add rdx, 48

  ;mov rcx, [digitSpPos]
  mov [rcx], dl
  inc rcx
  mov [digitSpPos], rcx

  ;pop rax
  cmp rax, 0
  jne _printRAXLoop

_printRAXLoop2:
  ;mov rcx, [digitSpPos]

  mov rax, 1
  mov rdi, 1
  mov rsi, rcx
  mov rdx, 1
  syscall

  mov rcx, [digitSpPos]
  dec rcx
  mov [digitSpPos], rcx

  cmp rcx, digitSp
  jge _printRAXLoop2

  ret

A couple of lines are commented out since they seemed redundant (I checked this by running the program with those changes), but I might be wrong, so I left them in as comments. I apologize if my question is a bit long and weird.

Frank Kotler

  • NASM Developer
  • Hero Member
  • Posts: 2667
  • Country: us
Re: Confusion about printing (an integer with multiple digits)
« Reply #1 on: February 17, 2019, 10:10:33 PM »
Hi doublec122,
Welcome to the forum.

Quote
For instance, when I do <mov rcx, digitSp>, I am simply transferring 100 bytes into rcx,
No. You are moving the address of your buffer into rcx. Your code doesn't use the number 100; that is only the size reserved by resb.

I gotta reboot. I'll get back to you...
Later,
Frank

Code:
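global _start

section .bss
buffer resb 20          ; a 64-bit value has at most 20 decimal digits

section .text
_start:

mov rax, -1             ; largest unsigned 64-bit value, 18446744073709551615
call showeaxd

exit:
mov rax, 60             ; sys_exit
xor rdi, rdi            ; exit status 0
syscall

;=====================
showeaxd:

mov rbx, 10             ; divisor
mov rsi, buffer + 19    ; last byte of the buffer; fill towards the front
xor rdi, rdi            ; rdi counts the digits

.top:
xor rdx, rdx            ; div divides rdx:rax, so clear the high half
div rbx                 ; rax = quotient, rdx = remainder (0..9)
add dl, '0'             ; remainder -> ASCII digit
mov [rsi], dl
dec rsi
inc rdi
test rax, rax
jnz .top

                        ; note: rsi now points one byte below the first digit,
                        ; so this write is off by one (an inc rsi is missing)
mov rdx, rdi            ; length = number of digits
mov rdi, 1              ; fd 1 = stdout
mov rax, 1              ; sys_write, buffer address in rsi
syscall
ret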

Frank Kotler

  • NASM Developer
  • Hero Member
  • Posts: 2667
  • Country: us
Re: Confusion about printing (an integer with multiple digits)
« Reply #2 on: February 17, 2019, 11:48:28 PM »
Where was I?

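Oh yeah... It looks to me like your code was originally written for 32 bits. The 32-bit write call uses ecx to point to the buffer, which may account for some of the saving and restoring you no longer have to do. Note that in 64-bit code the syscall instruction itself overwrites rcx (and r11), so rcx still has to be reloaded from digitSpPos after each write.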

Ignore the code I posted. For a start, I meant "showraxd", not "showeaxd". It probably has other errors.

Quote
But the questions are: why do I use digitSp if RCX is going to hold the whole integer?

RCX is only 8 bytes. Written out as text, a 64-bit number can take up to 20 digits, one byte each, which is more than that. Hence the 100-byte buffer, which is longer than it has to be. The bytes you're interested in are in dl. You add 48 (0x30, or '0') to convert a number from 0 to 9 into the character representing it.
Your code puts the digits in the buffer "forwards" and then prints the buffer "backwards". The code I posted attempts to start at the "end" of the buffer and work towards the "front"... but it isn't right.
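The fill-from-the-end idea can be sketched in C, assuming a 20-byte buffer (the most decimal digits a 64-bit unsigned value can have). The detail that matters is that the pointer must end up on the first digit, not one byte before it. The function name `to_decimal_from_end` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Converts n to decimal by filling buf from its last byte towards the
   front. Returns a pointer to the first (most significant) digit and
   stores the digit count in *len. buf must hold 20 bytes. */
const char *to_decimal_from_end(uint64_t n, char buf[20], size_t *len)
{
    char *p = buf + 20;                /* one past the last byte */
    do {
        *--p = (char)('0' + n % 10);   /* remainder + '0' is the ASCII digit */
        n /= 10;
    } while (n != 0);
    *len = (size_t)(buf + 20 - p);
    return p;                          /* already on the first digit */
}
```

Decrementing before each store, rather than after it, leaves `p` on the first digit, so `p` and `*len` can go straight to a write call with no extra adjustment.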

I don't know if I've cleared up your questions or not. Ask again if not...

Best,
Frank


doublec122

  • Jr. Member
  • Posts: 2
Re: Confusion about printing (an integer with multiple digits)
« Reply #3 on: February 18, 2019, 11:17:55 AM »
Yes, that was one of the things I actually wanted to clear up, thank you. In fact, after reading this I checked my program in gdb, and I saw that RCX held an address as its value (the beginning of digitSp), so by using [RCX] I'm able to access the value at that address (right?), basically dereferencing a pointer. Now that I've cleared that up, I can pretty much see how I actually printed that integer: filling each byte with one digit, taken backwards (plus the line feed), then doing pretty much the opposite in order to print (decreasing RCX to get back to the address at the beginning, printing the value at each address in the process). I hope I'm correct so far.

Now, there might be one more thing I don't get. When analyzing with GDB (also using peda), I noticed that in the _printRAXLoop2 label, while RCX was being decreased and the addresses got closer to the one at the beginning, I saw something like:

RCX: <address>->0x313233..->('4321')

Would that mean that the address currently held in RCX contains the value 4321? To me that seems improbable, because, as before, decreasing RCX gets me to a lower address that should hold another single digit from my integer, not the whole number in reverse (up to that current address). Looking at the hex number that follows the address in RCX, I notice that those values are indeed ASCII codes for 1, 2, 3 and so on, and are in order. So, could that be some sort of strange mistake by peda? I'm asking just to clarify things while I'm at it; sorry if my question is too long or messy.
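One likely explanation for the '4321' display, assuming the debugger reads a full 8-byte word at the address in RCX and shows it both as a number and as a string: the buffer holds one digit per byte, and reading several consecutive bytes starting at one digit also picks up the digits stored after it. On a little-endian CPU the lowest-addressed byte becomes the least significant byte of the number, which is why the hex value reads 0x31323334 while the string reads '4321'. The sketch reproduces this with the buffer contents from the question (line feed, then 5, 4, 3, 2, 1); the helper name `qword_at` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns the 8 bytes starting at p as a 64-bit integer, the way a
   debugger shows the qword at an address. On a little-endian CPU the
   byte at p ends up as the lowest byte of the result. */
uint64_t qword_at(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);   /* avoids alignment and aliasing issues */
    return v;
}
```

So the address in RCX really does hold only the single byte '4'; the other characters are simply the bytes that follow it in memory.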

Lastly, just a quick one: does assembly work the same way for all declared "variables"? To be precise, if I have any type of data in section .data or .bss, whether I write <text db "Hello"> or, like here, <variable resb 100>, does assembly see these just as addresses in memory? So in order to do anything with them I have to advance byte by byte along the "variable"'s span?

That would be all the things I had doubts about. Thank you for your help.

fredericopissarra

  • Full Member
  • Posts: 373
  • Country: br
Re: Confusion about printing (an integer with multiple digits)
« Reply #4 on: February 18, 2019, 02:29:55 PM »
I think this is the classic "pointer vs. value" confusion that almost everyone learning C or ASM runs into.
Pointers are integer values. These values are memory addresses. In ASM, every label has its own memory address, and you can use this value for indirect access to data.

In your code you are reserving a buffer of 100 bytes and naming its initial address digitSp, so the linker can resolve this 'name' to a proper address. In NASM, when you do:

Code:
  mov rcx,digitSp
You are telling the assembler to put the initial address of this buffer in RCX. Of course, this is different from:

Code:
  mov dl,[digitSp]
where you are asking to read the data stored at the address named digitSp.
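As a rough C analogy, assuming digitSp is a global char array, the two NASM lines correspond to taking the array's address and reading one byte through it, and `inc rcx` corresponds to advancing a char pointer by one byte. The helper names `load_address` and `load_byte` are hypothetical.

```c
#include <stdint.h>

static char digitSp[100];   /* like "digitSp resb 100"; typically placed in .bss */

/* mov rcx, digitSp : RCX receives the address, not the 100 bytes */
char *load_address(void) { return digitSp; }

/* mov dl, [digitSp] : DL receives the single byte stored at that address */
char load_byte(void) { return digitSp[0]; }
```

The pointer is 8 bytes on x86-64 no matter how large the buffer it points to is, which is why RCX can hold the buffer's address but not its contents.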

The way you work with variables in assembly is the same as in C or other compiled languages. It depends on the scope and initial value. If you put your "buffer" in the .bss section, the loader allocates it and fills it with zeros, but it takes no space in the executable file. If you put it in .data, the section is loaded from your executable image and your "global" variables will have the initial values you gave the assembler. Variables in the .rodata section behave like those in .data, but are READ-ONLY... And you can use the stack to hold "local" variables as well.
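The same three cases can be sketched in C, assuming GCC or Clang on Linux/ELF, where the compiler typically places a zero-initialized static array in .bss, an initialized one in .data, and a const one in .rodata. In each case the name just denotes a starting address, and working with the data means stepping from that address. C also guarantees that static storage without an initializer starts out as zero. The function `count_zero_bytes` is a hypothetical helper.

```c
#include <stddef.h>

static char buffer[100];              /* .bss: no initializer, starts as all zeros */
static char text[] = "Hello";         /* .data: initial bytes come from the executable */
static const char msg[] = "Hello";    /* .rodata: same bytes, but read-only */

/* Walks buffer byte by byte from its start address and counts zeros. */
size_t count_zero_bytes(void)
{
    size_t zeros = 0;
    for (const char *p = buffer; p < buffer + sizeof buffer; p++)
        if (*p == 0)
            zeros++;
    return zeros;
}
```

Writing to `text` is fine, while writing to `msg` would be rejected by the compiler (and would fault at run time if forced through a cast).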

It is a good optimization to keep as many variables as possible in registers, obeying a strict calling convention, but you can use memory-allocated variables if you want (it will probably slow your routines down).

A good trick is to write simple functions in C and compile them with GCC using the -O2 -S -masm=intel -fno-stack-protector options to see what the compiler does. Generally, this gives you pretty well optimized routines (maybe you can make small tweaks to suit your needs!). Take a look at this:

Code:
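#include <stdint.h>
#include <stdio.h>

void printul64( uint64_t n )
{
  char buffer[21];   // up to 20 digits for 64 bits, plus the '\0'
  char *p, *q;
  uint32_t length;
  uint8_t rem;

  // Start at the last byte and fill the buffer backwards.
  p = q = buffer + sizeof buffer - 1;
  *p-- = '\0';

  // Special case: if n == 0.
  if ( !n )
    *p-- = '0';
  else
    while ( n )
    {
      rem = n % 10;
      n /= 10;
      *p-- = rem + '0';
    }

  // p ends one byte before the first digit; step forward onto it.
  // length (the digit count) is not needed by puts, so -O2 drops it.
  length = q - ++p;

  puts( p );   // puts() appends a newline
}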

The compiler will create something like this:

Code:
  extern puts
  global printul64
printul64:
  sub  rsp, 40    ; Allocate space for 'buffer'.
  test rdi, rdi
  mov rsi, -3689348814741910323 ; Magic! :) (0xCCCCCCCCCCCCCCCD)
  mov  byte [rsp+20], 0   ; Put NUL char.
  lea  rcx, [rsp+19]
  je  .L8   ; n == 0?

.loop:
  mov  rax, rdi
  sub  rcx, 1

  mul  rsi          ; MUL is faster than DIV.
  shr  rdx, 3       ; rdx = n / 10
  lea  rax, [rdx+rdx*4]
  add  rax, rax     ; rax = (n / 10) * 10
  sub  rdi, rax     ; rdi = n % 10
  mov  rax, rdi

  mov  rdi, rdx
  add  eax, '0'
  test rdx, rdx     ; n == 0?
  mov  byte [rcx+1], al
  jne  .loop

  lea  rdi, [rcx+1]
  call  puts wrt ..plt
  add  rsp, 40       ; Reclaim allocated buffer.
  ret

  ; Special case when n == 0.
.L8:
  lea  rcx, [rsp+18]
  mov  byte [rsp+19], '0'

  lea  rdi, [rcx+1]
  call  puts wrt ..plt
  add  rsp, 40       ; Reclaim allocated buffer.
  ret

Here it uses the stack to hold the local array 'buffer'. And notice that the DIV instruction is missing: instead, it uses MUL, which is several times faster than a 64-bit DIV on x86-64 processors. You can replace the libc call puts() with your own printing routine (a write syscall, for example).
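The multiply-instead-of-divide trick can be checked in C, assuming GCC or Clang, whose `unsigned __int128` extension gives the high 64 bits of a 64x64-bit product (the RDX half of MUL). The constant 0xCCCCCCCCCCCCCCCD is the listing's -3689348814741910323 read as unsigned; it equals 2^67 / 10 rounded up, so keeping the high 64 bits and then shifting right by 3 divides by 10. The names `div10` and `mod10` are hypothetical.

```c
#include <stdint.h>

/* n / 10 without a divide: multiply by ceil(2^67 / 10), keep the high
   64 bits of the 128-bit product (what MUL leaves in RDX), shift by 3. */
uint64_t div10(uint64_t n)
{
    const uint64_t magic = 0xCCCCCCCCCCCCCCCDu;
    uint64_t hi = (uint64_t)(((unsigned __int128)n * magic) >> 64);
    return hi >> 3;
}

/* Remainder as the listing computes it: n - q*10, with q*10 formed as
   (q + q*4) * 2, mirroring the lea/add pair. */
uint64_t mod10(uint64_t n)
{
    uint64_t q = div10(n);
    return n - (q + q * 4) * 2;
}
```

Compilers apply this transformation automatically whenever the divisor is a compile-time constant, which is why the C source can simply write `n % 10` and `n /= 10`.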

[]s
Fred
« Last Edit: February 18, 2019, 02:33:52 PM by fredericopissarra »