Author Topic: cat rewrite for x86-64 on Linux  (Read 22757 times)

Offline 0xFF

  • Jr. Member
  • *
  • Posts: 15
cat rewrite for x86-64 on Linux
« on: November 03, 2013, 12:31:18 PM »
Hi guys,

First time post & code review, so be gentle. :D

EDIT: I should add that at the moment I'm in two minds about this. I'm trying to avoid external C/C++ functions to keep my learning firmly in the bits and registers, with minimal abstraction, but at the same time I'm using gcc to link. When I used ld, my argc/argv setup didn't work, so one question for me is whether gcc and ld differ in how they provide arguments to programs under the x86-64 conventions. gcc does it how I would expect, in RDI and RSI, but ld fails me...?

I find a perennial problem in learning different languages, and programming in general, is finding some reason to code. One solution, I guess, is to find a program or utility you think you could replicate and rewrite it, so to that end I decided to rewrite the basic functionality of cat using x86-64 instructions. I wrote the basic functionality, then realised as I was about to run it that I could easily break the thing by giving it a bad filename; error checking and reporting were needed. I wrote a little library function to check the error number and spit out an appropriate message. I later rewrote it to use a little string printing function that makes printing strings simpler and C-like: print until a null. That saved all the hassle of defining an equ $-stringlabel length tailor-made for every string.

Anyway, any feedback would be good. Many thanks for your time:

The Cat64 source:

Code: [Select]
SECTION .data

SECTION .bss

ByteBuff resb 256

SECTION .text

extern ORWerrors
global main
main:
nop ;traditional nop
push rbp ;stack frame setup
mov rbp, rsp ;frame pointer, used at Exit to unwind the stack
push rbx ;rbx is callee-saved in the System V ABI, so preserve it for gcc's startup code
sub rsp, 8 ;keep rsp 16-byte aligned for the call to ORWerrors

cmp rdi, 2 ;check whether at least one argument was given at command line
mov rdi, 1 ;exit status if no file was given (mov leaves the flags from cmp intact)
jl Exit ;jump to exit if only 1 command argument

;opening the file
mov rax, 2 ;specify open syscall
mov rdi, [rsi+8] ;placing argv[1] in rdi for syscalls
mov rsi, 0 ;setting flags to 0, read only
mov rdx, 0 ;do I have to include this mode specification? (it only matters with O_CREAT)
syscall ;call the kernel
cmp rax, 0 ;check for errors on opening file
jl Error ;if less than zero, error
mov rbx, rax ;saving file descriptor into rbx where it is safe across calls

read:
mov rax, 0 ;read syscall number
mov rdi, rbx ;placing file descriptor into rdi
mov rsi, ByteBuff ;moving buffer address to rsi
mov rdx, 256 ;specify to read 256 bytes
syscall ;call the kernel
cmp rax, 0 ;check for EOF
jl Error ;if less than zero, error
jne print ;if not EOF, jump over to print

;close and jump to exit
mov rax, 3 ;specify close syscall
mov rdi, rbx ;specify file descriptor
syscall ;call the kernel
cmp rax, 0 ;check for errors
jl Error ;if less than zero, error
jmp NoError ;jump to exit

print:
mov rdx, rax ;mov number of bytes read to rdx
mov rax, 1 ;specify print (sys_write) syscall
mov rdi, 1 ;specify stdout file descriptor
mov rsi, ByteBuff ;mov buffer address to rsi to prepare for printing syscall
syscall ;call the kernel
cmp rax, 0 ;check for errors
jl Error ;if less than zero, error
jmp read ;jump back to read for another loop
Error:
call ORWerrors ;call the external error handler
mov rdi, rax ;move an error code to rdi
jmp Exit ;nothing more to do
NoError:
mov rdi, 0 ;if no errors occurred, will report 0 error
Exit:
mov rax, rdi ;main returns its exit status in rax, not rdi
mov rbx, [rbp-8] ;restore the caller's rbx
mov rsp, rbp ;clean up stack frame
pop rbp ;recover rbp for gcc init
ret ;return to gcc's handler

The error handler:

Code: [Select]
;USAGE: A calling program should simply check any open, read
; or write operation against the return value in RAX
; and then call this procedure immediately without
; modifying the value in RAX. The procedure will then
; determine and report the appropriate type of error,
; using my SysPrint procedure on stderr. When finished,
; the original error is restored in RAX and the function
; returns to the caller.


SECTION .data

;ERROR MESSAGE LIST -NULL TERMINATED STRINGS FOR ALL OPEN, READ AND WRITE ERRORS:
ORWEPERM: db "Error 1: Operation not permitted.",10,0
ORWENOENT: db "Error 2: No such file or directory.",10,0
ORWEINTR: db "Error 4: Interrupted system call.",10,0
ORWEIO: db "Error 5: I/O error.",10,0
ORWENXIO: db "Error 6: No such device or address.",10,0
ORWEBADF: db "Error 9: Bad file number.",10,0
ORWEAGAIN: db "Error 11: Try again error or operation would block error.",10,0
ORWENOMEM: db "Error 12: Out of memory.",10,0
ORWEACCES: db "Error 13: Access/permission error.",10,0
ORWEFAULT: db "Error 14: Bad address.",10,0
ORWEEXIST: db "Error 17: File already exists.",10,0
ORWENODEV: db "Error 19: No such device.",10,0
ORWENOTDIR: db "Error 20: Bad path or directory name.",10,0
ORWEISDIR: db "Error 21: Directory specified as file.",10,0
ORWEINVAL: db "Error 22: Invalid argument.",10,0
ORWENFILE: db "Error 23: File table overflow.",10,0
ORWEMFILE: db "Error 24: Too many files open.",10,0
ORWETXTBSY: db "Error 26: Text file busy.",10,0
ORWEFBIG: db "Error 27: File too large.",10,0
ORWENOSPC: db "Error 28: No space on device.",10,0
ORWEROFS: db "Error 30: Read only file system.",10,0
ORWEPIPE: db "Error 32: Broken pipe.",10,0
ORWENAMETOOLONG: db "Error 36: File name too long.",10,0
ORWELOOP: db "Error 40: Too many levels of symbolic links.",10,0
ORWEOVERFLOW: db "Error 75: Value too large for data type.",10,0
ORWEUNK: db "Unknown Error: That's all I know!",10,0

;Array of possible error codes and pointers to corresponding strings. This method is
;a silly hack to get around the fact that I've only included open, read and write errors,
;instead of a full list. Including a whole list probably wouldn't be any significant size
;addition, so may be a sensible additional project. This hack also allows me to avoid the
;alternative method of having a brutally long 'cmp','jne/load appropriate string' list
;for every error, which to me seems an ugly solution. Wish I had a heap!

ORWerrnum: dq 1,2,4,5,6,9,11,12,13,14,17,19,20,21,22,23,24,26,27,28,30,32,36,40,75
ORWerrno: dq ORWEPERM,ORWENOENT,ORWEINTR,ORWEIO,ORWENXIO,ORWEBADF,ORWEAGAIN,ORWENOMEM,ORWEACCES,ORWEFAULT,ORWEEXIST,ORWENODEV,ORWENOTDIR,ORWEISDIR,ORWEINVAL,ORWENFILE,ORWEMFILE,ORWETXTBSY,ORWEFBIG,ORWENOSPC,ORWEROFS,ORWEPIPE,ORWENAMETOOLONG,ORWELOOP,ORWEOVERFLOW,ORWEUNK


SECTION .text
extern SysPrint
GLOBAL ORWerrors
ORWerrors:
mov r8, rax ;saving error code to return to caller
;caller may wish to return error on exit etc
neg rax ;changing negative error value to positive code
mov rcx, 0 ;set loop count to zero (rcx is caller-saved, so the caller's rbx is left alone)
.loop: ;loop to compare various errnos (local label, cannot clash with the loop mnemonic)
cmp rax, [ORWerrnum+rcx*8] ;compare errno in rax to errno at loop number * qword size
je showerror ;if equal, jump to showerror
inc rcx ;otherwise, loop count++
cmp rcx, 25 ;check if all possible errors exhausted
je showerror ;if equal, errno is unknown, jump to showerror
jmp .loop ;jump back to another loop

showerror:
mov rsi, [ORWerrno+rcx*8] ;place address for appropriate error message in rsi
mov rdi, 2 ;specify write to stderr
call SysPrint ;call SysPrint function

mov rax, r8 ;returning original error code to caller
ret ;returning to caller, assuming they will handle exit

String printing procedure:

Code: [Select]
SECTION .text

global SysPrint
SysPrint:
;assuming that a pointer to the string to be printed is loaded into
;rsi, the file to print in rdi and that the string is null terminated
PrintLoop:
cmp byte [rsi], 0 ;is char at rsi null?
je end ;if char is null, go to end and return
mov rdx, 1 ;specify length: printing 1 byte.
mov rax, 1 ;specify print syscall
syscall ;call the kernel
cmp rax, 0 ;check for errors
jl end ;if any errors, jump to end and return
inc rsi ;increment rsi by one, points to next byte
jmp PrintLoop ;jump back to PrintLoop to print next byte
end:
ret ;return to the caller
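
For comparison, here is a minimal sketch of a variant that measures the string first and then issues a single write syscall instead of one per byte. The name SysPrintOnce, the Greeting string and the small _start test driver are assumptions made for illustration and are not part of the original library. It keeps the same register contract as SysPrint (rdi = file descriptor, rsi = pointer to a null-terminated string).

```nasm
;Sketch only: SysPrintOnce, Greeting and the _start driver are hypothetical names.
SECTION .data

Greeting: db "printed with one write",10,0

SECTION .text

global _start
_start:
    mov rdi, 1              ;stdout file descriptor
    mov rsi, Greeting       ;pointer to the null-terminated string
    call SysPrintOnce
    mov rax, 60             ;exit syscall
    xor rdi, rdi            ;exit status 0
    syscall

;in:  rdi = file descriptor, rsi = pointer to null-terminated string
;out: rax = number of bytes written, or a negative errno
SysPrintOnce:
    xor rdx, rdx            ;rdx counts the bytes before the null
.count:
    cmp byte [rsi+rdx], 0   ;reached the terminator?
    je .write               ;yes: rdx now holds the string length
    inc rdx                 ;no: count this byte
    jmp .count              ;and look at the next one
.write:
    mov rax, 1              ;write syscall
    syscall                 ;write(rdi, rsi, rdx) in a single call
    ret
```

Assuming the file is saved as SysPrintOnce.asm, it can be assembled with nasm -f elf64 and linked with ld on its own. A write can in principle return fewer bytes than requested, which this sketch does not retry.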

Cheers!
« Last Edit: November 03, 2013, 12:37:32 PM by 0xFF »

Offline encryptor256

  • Full Member
  • **
  • Posts: 250
  • Country: lv
  • Win64 .
    • On Youtube: encryptor256
Re: cat rewrite for x86-64 on Linux
« Reply #1 on: November 03, 2013, 01:20:32 PM »
Hi!

What is cat, some Linux specific thing?  ::)

Quote
gcc does it how I would expect, in RDI and RSI, but ld fails me...?

Sounds interesting!
gcc does a lot of things: it calls some subprograms with a kilometer of arguments, then finally calls that ld thing.
Maybe one of those arguments affects the calling convention.

Maybe someone would like to try this out, so it would be nice if you added how you built it (assembler and linker commands).

On global and local labels:
Naming them like this lets you tell, in the blink of an eye, which label is local and which is global.

Code: [Select]
global SysPrint
SysPrint:
...
SysPrint.PrintLoop:
...
je SysPrint.end
...
jl SysPrint.end
...
jmp SysPrint.PrintLoop
...
SysPrint.end:
ret

Or

Code: [Select]
global SysPrint
SysPrint:
...
.PrintLoop:
...
je .end
...
jl .end
...
jmp .PrintLoop
...
.end:
ret

Encryptor256!

EDIT:

What is cat, some Linux specific thing?  ::)

Well, I found it:

http://www.cyberciti.biz/faq/howto-use-cat-command-in-unix-linux-shell-script/

The cat command is considered as one of the most frequently used commands on Linux or UNIX like operating systems.

It can be used for the following purposes under UNIX or Linux:
  • Display text files on screen.
  • Copy text files.
  • Combine text files.
  • Create new text files.

Encryptor256, again!
« Last Edit: November 03, 2013, 01:43:05 PM by encryptor256 »
Encryptor256's Investigation \ Research Department.

Offline 0xFF

Re: cat rewrite for x86-64 on Linux
« Reply #2 on: November 03, 2013, 03:10:00 PM »
Hey man, thanks for the feedback.

Yep, cat's a little gem for when you're working in the shell. It means you don't have to bother going into a text editor or anything like that; you just run it, and it'll dump a file as chars on your terminal. That leads to some funky output when you try to cat a file that's not human readable (rather like opening up an executable in Notepad, if you're a Windows person). With output redirection you can dump one file into another, or do other interesting stuff, or use its sibling tac (cat backwards) to get the file's lines in reverse order. Handy... I guess sometimes you need LIFO files, maybe for parsing stuff? Cat's everyday usefulness in a shell is about on par with ls, or dir for you Windows folks. :) Trivia: the name comes (really inappropriately) from 'catenate', which is a synonym for concatenate, and which would be far easier to type when you're hacking together strings in Excel. :D

So, compiling instructions were
nasm -f elf64 -o SysPrint.o SysPrint.asm
nasm -f elf64 -o ORWerrors.o ORWerrors.asm
nasm -f elf64 -o Cat64.o Cat64.asm
gcc -o Cat64 Cat64.o ORWerrors.o SysPrint.o

I really should be using makefiles for this stuff, that would make life far easier.

Regarding gcc and arguments, what I mean is that after doing all the crap that I don't really need :P gcc hands me, on a silver platter, the number of arguments given to the program at the command line in rdi, and in rsi a pointer to an array of pointers (qword size, being 64-bit code) to the argv strings (like **argv in C-style code, so [rsi] is argv[0]). ld can do the same thing.
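
A minimal sketch of that hand-off, assuming the file is called ArgCount.asm (a made-up name) and is linked with gcc so that the C startup code calls main: argc arrives in rdi, and whatever main leaves in rax becomes the exit status.

```nasm
;Sketch only: ArgCount is a hypothetical example program.
;Build (assumed file names):
;  nasm -f elf64 -o ArgCount.o ArgCount.asm
;  gcc -o ArgCount ArgCount.o
SECTION .text

global main
main:
    mov rax, rdi            ;argc arrives in rdi (argv would be in rsi)
    dec rax                 ;do not count the program name itself
    ret                     ;gcc's startup code exits with the value in rax
```

Running ./ArgCount a b c and then echo $? should report 3, since the startup code passes main's return value to exit.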
« Last Edit: November 03, 2013, 03:16:23 PM by 0xFF »

Offline Gunner

  • Jr. Member
  • *
  • Posts: 74
  • Country: us
    • Gunners Software
Re: cat rewrite for x86-64 on Linux
« Reply #3 on: November 03, 2013, 08:30:51 PM »
You can sure use ld to link your code and get the command line parameters!  You just have to access them a different way.  When you link with gcc, it adds startup code that runs before your main proc; that code picks up the command line and then passes argc and argv to your program in the proper registers.  When using just ld, argc and argv are passed to your program on the stack...

argc = [rsp]
argv = [rsp + 8 * ARG_NUMBER] (where ARG_NUMBER is the 1 based index of the param to get)

./test hello there

[rsp + 8 * 1] == ./test
[rsp + 8 * 2] == hello
[rsp + 8 * 3] == there
etc...
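
The layout above can be sketched as a small stand-alone program that walks the stack at _start and prints each argument on its own line. The file name ArgDump.asm and the labels are assumptions for illustration; the loop relies on the kernel preserving r12 and r13 across syscalls (only rax, rcx and r11 are clobbered).

```nasm
;Sketch only: ArgDump and its labels are hypothetical names.
;Build (assumed file names):
;  nasm -f elf64 -o ArgDump.o ArgDump.asm
;  ld -o ArgDump ArgDump.o
SECTION .data

Newline: db 10

SECTION .text

global _start
_start:
    mov r12, [rsp]          ;r12 = argc
    lea r13, [rsp+8]        ;r13 = address of the argv[0] pointer
.next:
    test r12, r12           ;any arguments left?
    jz .done
    mov rsi, [r13]          ;rsi = pointer to the current argument string
    xor rdx, rdx            ;rdx = length counter
.len:
    cmp byte [rsi+rdx], 0   ;find the terminating null
    je .print
    inc rdx
    jmp .len
.print:
    mov rax, 1              ;write syscall
    mov rdi, 1              ;stdout
    syscall                 ;print the argument
    mov rax, 1              ;write syscall again
    mov rdi, 1              ;stdout
    mov rsi, Newline        ;one newline byte
    mov rdx, 1
    syscall
    add r13, 8              ;advance to the next argv pointer
    dec r12                 ;one fewer argument to go
    jmp .next
.done:
    mov rax, 60             ;exit syscall
    xor rdi, rdi            ;exit status 0
    syscall
```

Run as ./ArgDump hello there, this should print ./ArgDump, hello and there on separate lines.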
« Last Edit: November 03, 2013, 08:33:19 PM by Gunner »

Offline 0xFF

Re: cat rewrite for x86-64 on Linux
« Reply #4 on: November 04, 2013, 07:01:06 AM »
Ahh, sweet Gunner, thanks!

I'd figured that, where GCC was following the new x86-64 calling conventions of passing arguments in registers, maybe ld (or rather the kernel etc) was using the plain x86 convention of passing on the stack. I guess GCC also has them on the stack, as you say, and just gives me a pointer to argv[] and puts argc in the registers. That said, I've done a little debugging and reverse engineering (nothing impressive, mostly staring at disassembly before I was more confident and being massively bewildered by it all), and was amazed by the incredible amount of bloat that gcc throws into an ELF. I'll definitely rewrite this for ld only.

It's not necessarily practical, but the (sensible) change to greater register use in x86-64 in preference to the stack (passing args is so much simpler) makes me obsessed with keeping everything I need in the registers, leaving the stack alone. At some point I'm sure this will backfire on me when recursion destroys some vital data I need mid-runtime, leading to a frustrating hour of trawling through the various calls. :D

Offline Rob Neff

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 429
  • Country: us
Re: cat rewrite for x86-64 on Linux
« Reply #5 on: November 04, 2013, 05:23:44 PM »
As Gunner correctly states, gcc will add additional startup code to your program; ld will not.  This code handles much of the environment, time zone, exception handling, etc. that your initial code probably doesn't need at first.  However, once your programs grow to a significant size, you reach the point where having the extra code is more beneficial than reinventing that wheel when you realize you want or need that stuff.

Yes, x64 programming in Linux is very liberating compared to 32-bit x86.  For small functions you'll almost never need to use the stack given that you now have twice as many registers available for your use.  Also, the x64 calling convention on Linux passes more integer parameters in registers than the Windows one (six versus four).  However, the convention itself is trickier on Linux than Windows with regard to stack usage.
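
As a rough illustration of register-only parameter passing on Linux, the sketch below calls a function with six integer arguments in rdi, rsi, rdx, rcx, r8 and r9 and never touches the stack apart from the return address. Sum6 is a hypothetical function invented for this example.

```nasm
;Sketch only: Sum6 is a hypothetical function used to show the six argument registers.
SECTION .text

global _start

;in:  rdi, rsi, rdx, rcx, r8, r9 = six integer arguments
;out: rax = their sum
Sum6:
    lea rax, [rdi+rsi]      ;first + second argument
    add rax, rdx            ;third
    add rax, rcx            ;fourth
    add rax, r8             ;fifth
    add rax, r9             ;sixth
    ret

_start:
    mov rdi, 1              ;argument 1
    mov rsi, 2              ;argument 2
    mov rdx, 3              ;argument 3
    mov rcx, 4              ;argument 4 (syscalls use r10 here instead of rcx)
    mov r8, 5               ;argument 5
    mov r9, 6               ;argument 6
    call Sum6               ;rax = 21
    mov rdi, rax            ;use the sum as the exit status
    mov rax, 60             ;exit syscall
    syscall
```

Linked with ld and run, the program exits with status 21, which echo $? will show.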

Offline 0xFF

Re: cat rewrite for x86-64 on Linux
« Reply #6 on: November 05, 2013, 11:03:33 AM »
Thanks for all the feedback and info guys, it's cleared up some important questions I've had. I've rewritten the main program to be linked with ld. There's a massive drop in the file size, and I guess if this were something that was operating in large batches, it would make a speed difference too. Revised code below:

Code: [Select]
;PROGRAM: Cat64.asm
;AUTHOR: 0xFF
;DATE: 27/10/2013
;DESCRIPTION: This program is a simple replication of the
; cat coreutil from linux which takes input
; from a file and writes its contents into
; stdout. For error handling and reporting it
; must be linked with my file handling error
; function ORWerrors.asm.
;
;COMPILE INSTRUCTIONS:
; nasm -f elf64 -o Cat64.o Cat64.asm
; ld -o Cat64 Cat64.o ../ORWErrors/ORWerrors.o ../SysPrint/SysPrint.o



SECTION .data

SECTION .bss

ByteBuff resb 256

SECTION .text

extern ORWerrors
global _start
_start:
nop ;traditional nop
cmp qword [rsp], 2 ;check argc at rsp: was an argument given?
mov rdi, 1 ;exit status if no file was given (mov leaves the flags from cmp intact)
jl Exit ;jump to exit if only 1 command argument
;(if only 1, then no file specified)
;opening the file
mov rax, 2 ;specify open syscall
mov rdi, [rsp+16] ;placing argv[1] in rdi for syscalls
mov rsi, 0 ;setting flags to 0, read only
mov rdx, 0 ;do I have to include this mode specification? (it only matters with O_CREAT)
syscall ;call the kernel
cmp rax, 0 ;check for errors on opening file
jl Error ;if less than zero, error
mov rbx, rax ;saving file descriptor into rbx where it is safe across calls

read:
mov rax, 0 ;read syscall number
mov rdi, rbx ;placing file descriptor into rdi
mov rsi, ByteBuff ;moving buffer address to rsi
mov rdx, 256 ;specify to read 256 bytes
syscall ;call the kernel
cmp rax, 0 ;check for EOF
jl Error ;if less than zero, error
jne print ;if not EOF, jump over to print

;close file and jump to exit
mov rax, 3 ;specify close syscall
mov rdi, rbx ;specify file descriptor
syscall ;call the kernel
cmp rax, 0 ;check for errors
jl Error ;if less than zero, error
jmp NoError ;jump to exit

print:
mov rdx, rax ;mov number of bytes read to rdx
mov rax, 1 ;specify print (sys_write) syscall
mov rdi, 1 ;specify stdout file descriptor
mov rsi, ByteBuff ;mov buffer address to rsi to prepare for printing syscall
syscall ;call the kernel
cmp rax, 0 ;check for errors
jl Error ;if less than zero, error
jmp read ;jump back to read for another loop
Error:
call ORWerrors ;call the external error handler
mov rdi, rax ;move an error code to rdi
jmp Exit ;nothing more to do
NoError:
mov rdi, 0 ;if no errors occurred, will report 0 error
Exit:
mov rax, 60 ;specify exit syscall
syscall ;call the kernel

I guess other tasks are to add a little usage message and start using makefiles to simplify building, but then again I've probably learned all I will from this project... Hmm, what next, I wonder?
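
One possible shape for that usage message, as a minimal sketch: the wording of the message, the UsageMsg and UsageLen names and the stubbed-out file handling are all assumptions, and it uses an equ length rather than SysPrint only so that the example stands alone.

```nasm
;Sketch only: UsageMsg, UsageLen and the message text are hypothetical.
SECTION .data

UsageMsg: db "Usage: Cat64 FILE",10
UsageLen equ $-UsageMsg

SECTION .text

global _start
_start:
    cmp qword [rsp], 2      ;argc at rsp: was a file name given?
    jge .haveFile           ;yes: carry on with the normal path
    mov rax, 1              ;write syscall
    mov rdi, 2              ;stderr
    mov rsi, UsageMsg       ;message address
    mov rdx, UsageLen       ;message length
    syscall
    mov rdi, 1              ;non-zero exit status for the usage error
    jmp .exit
.haveFile:
    ;the open/read/write loop from Cat64 would go here
    xor rdi, rdi            ;exit status 0
.exit:
    mov rax, 60             ;exit syscall
    syscall
```

Run with no arguments, this prints the usage line on stderr and exits with status 1; with an argument it exits quietly with status 0.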

Offline encryptor256

Re: cat rewrite for x86-64 on Linux
« Reply #7 on: November 05, 2013, 11:29:02 AM »
Then it looks like you're done with this. The code looks nice, and well documented too, I think.
 
Quote
Hmm, what next I wonder?

Maybe a calculator.