Author Topic: Opengl/OpenAL game 100% NASM x86_64 Assembly  (Read 3660 times)

Offline Rodrigo Robles

  • Jr. Member
  • *
  • Posts: 10
Opengl/OpenAL game 100% NASM x86_64 Assembly
« on: June 28, 2023, 03:23:34 AM »
When I first saw x86_64 I was amazed. 16 general purpose 64-bit registers plus 16 128-bit floating point registers is much more than a guy raised with 6502 could imagine.

I thought that Assembly would be so easy to code that the effort would be close to writing C code. So some time ago I decided to write a little OpenGL/OpenAL game in 100% Assembly to measure the productivity and prove the viability of writing large programs in x86_64. In recent years I have made some little retro games for Android in JavaScript, so I could make a comparison.

I chose to make a revamp of the 1982 classic Attack of the Timelord. Here are the sources: https://gitlab.com/RodrigoRobles/trevaskas-2

Here is a screenshot of the game:


The graphics are quite simple because it is a retro game, but there is no obstacle to making larger games with fancy graphics in pure x86_64 Assembly.

As I expected, the productivity in hours/FP (for not heavily optimized code) was close to JavaScript, which proves that "basic" x86_64 is much easier to write than previous 8-bit or x86 architectures. (Of course, optimized modern multithreaded SIMD code costs much more than ordinary x86_64 code.)

It proves Randall Hyde's point of view:
"Software engineers estimate that developers spend only about thirty percent of their time coding a solution to a problem. Even if it took twice as much time to write a program in assembly versus some HLL, there would only be a fifteen percent difference in the total project completion time. In fact, good assembly language programmers do not need twice as much time to implement something in assembly language."

Being happy with the results, later I wrote a paper about the theme of large x86_64 Assembly programs: https://drive.google.com/uc?id=1_fKS97tb0UzWJ0RqZXpfTA8odrkCK5bE&export=download

I also created an itch.io page: https://rodrigo-robles.itch.io/trevaskas-ii

You can see a video of gameplay here: https://youtu.be/GzBffhLwkR4

Offline Deskman243

  • Jr. Member
  • *
  • Posts: 49
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #1 on: June 28, 2023, 11:51:54 AM »
If that library is closed source and based on C how is any of that possible?

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #2 on: June 28, 2023, 12:48:46 PM »
Just a couple of considerations on the source code (and your "paper")...

1. You don't need to align the stack pointer to DQWORD if you are not using the stack. sub rsp,8 and add rsp,8 as prolog and epilog aren't necessary all the time;

2. If you are loading a 32-bit value into a 64-bit register, use E?? instead of R??. The instruction will be smaller and faster (since there's no REX prefix if registers below R8 are used). For example, instead of xor rax,rax, use xor eax,eax.

3. There's no real gain in using assembly for C-like routines unless you are prepared to optimize the code in ways GCC can't, for example using SSE4.2 for string routines. GCC does a better job with integer divisions than simply using div/idiv (especially with literal divisors). I recommend considering freestanding routines written in C.

4. Using -fno-pie goes against the SysV ABI for x86-64; you should consider using RIP-relative effective addressing in your code.
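
A minimal sketch of points 1, 2 and 4 together. The routine names add_one and call_helper and the external function helper are hypothetical and not taken from the game's source. The leaf routine leaves rsp alone, while the routine that makes a call realigns the stack and goes through the PLT so the object can be linked as a PIE:

```nasm
bits 64
default rel                 ; make [label] references RIP-relative

extern helper               ; hypothetical external function, for illustration only

section .text
global add_one
global call_helper

; Leaf function: it calls nothing, so rsp can stay as it is on entry.
add_one:
    lea eax, [rdi + 1]      ; 32-bit destination: no REX prefix, upper half of rax is zeroed
    ret

; Non-leaf function: on entry rsp is 8 bytes off a 16-byte boundary
; (the caller's CALL pushed the return address), so realign before calling.
call_helper:
    sub rsp, 8
    call helper wrt ..plt   ; PLT call keeps the object usable in a PIE link
    add rsp, 8
    ret
```

The 16-byte rule matters at the point of a call, which is why only the second routine adjusts rsp.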

Overall the code is very good! Just for fun, I'm trying to optimize my way and show to you here, if there is interest in such a thing...

[]s
Fred

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #3 on: June 28, 2023, 04:28:32 PM »
Another thing... this:
Code: [Select]
  section .data
  ...
width:  dq 1
  ...
  section .text
  ...
  movq xmm0,[width]
  ...
Will not load 1.0 (double) into XMM0, but the QWORD integer 1 (0x0000000000000001). The correct approach is to convert the integer representation to double, as in:
Code: [Select]
  ; Casting necessary because you can use a dword reference as well...
  cvtsi2sd xmm0,qword [width]
The other way around as well:
Code: [Select]
  ; write the double as an integer
  cvtsd2si rax,xmm0  ; destination MUST be a register.
  mov [width],rax

And... the default for NASM is 32-bit code, so it is recommended that you tell the assembler, at the beginning, that your code is 64-bit and uses RIP-relative addressing:
Code: [Select]
  bits 64
  default rel
And all effective addresses loaded to registers should be done with LEA, like:
Code: [Select]
  mov eax,1
  mov edi,eax
  lea rsi,[msg]  ; this is a rip relative effective address.
  mov edx,msg_size
  syscall
...
msg: db `Hello\n`
msg_size equ ($ - msg)
« Last Edit: June 28, 2023, 04:33:18 PM by fredericopissarra »

Offline Rodrigo Robles

  • Jr. Member
  • *
  • Posts: 10
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #4 on: June 29, 2023, 09:35:44 PM »
If that library is closed source and based on C how is any of that possible?

Are you talking about OpenGL and OpenAL?
They are not part of the project; the game calls these libraries to render graphics and play sound. Theoretically one could call the Linux audio and video drivers directly, but that would be really uncommon.
By the way, most (or all?) Linux distros use open-source libraries for this (libopengl, libopenal, freeglut).

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #5 on: June 29, 2023, 11:14:40 PM »
Hi, Rodrigo. A question first: Are you Brazilian? (I am!)

Well... about your code, as I said before, there are some improvements you can make. First, avoid floating point altogether, since your projection is orthogonal. Second, use the lea instruction to load registers with effective addresses, using RIP-relative addressing mode. I believe you used -fno-pie when linking because you got tons of relocation errors with something like this:
Code: [Select]
  mov rsi,label ; lea rsi,[label] is better.
  ...
label: dq 0
Third, you are using OpenGL in compatibility mode, something pretty outdated nowadays. And fourth: yep, you are, in essence, writing a C program, but using assembly (and, most of the time, poorly -- sorry).

I'm not saying the game isn't well structured or is "wrong". No... it is good, but it can be way better.

For example, instead of doing some calculations with R?? registers, you could use E??, because the coordinates will never be longer than, let's say, 11 bits (2¹¹-1 = 2047). Keeping the majority of the code 32-bit will make it a lot smaller (and faster).

Trying to obey SysV ABI is another point...

And I didn't understand why you created your own mystrcmp when you are using glibc (linked to your code by GCC). And why are you using double precision, since all OpenGL functions dealing with floating point use single precision (float)? Also, take functions like glBindTexture, which takes 2 int arguments: your code passes the texture as a QWORD (via RSI) instead of a DWORD (ESI), and does the same for the enumeration, which just adds a REX prefix to the instructions!
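
A minimal sketch of the glBindTexture point. The variable texture_id and the routine bind_sprite_texture are hypothetical. GL_TEXTURE_2D is 0x0DE1 in the OpenGL headers, and both parameters (GLenum target, GLuint texture) are 32 bits wide, so EDI and ESI are enough:

```nasm
bits 64
default rel

extern glBindTexture

GL_TEXTURE_2D equ 0x0DE1    ; value from the OpenGL headers

section .bss
texture_id: resd 1          ; hypothetical: a GLuint filled in elsewhere by glGenTextures

section .text
global bind_sprite_texture  ; hypothetical routine name
bind_sprite_texture:
    sub rsp, 8              ; realign the stack before the call
    mov edi, GL_TEXTURE_2D  ; GLenum target: 32-bit immediate, no REX prefix
    mov esi, [texture_id]   ; GLuint texture: DWORD load, RIP-relative
    call glBindTexture wrt ..plt
    add rsp, 8
    ret
```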

Another thing is zeroing registers... Instead of xor rdx,rdx or, worse, mov rdx,0, you could use xor edx,edx, a 2-byte instruction... And, with floating point, something like xorps xmm0,xmm0 is faster and smaller than movq xmm0,[DQ_ZERO]. At the same time, instead of using movq or movd, assuming (correctly) that a double is a QWORD (and a float is a DWORD), using movss or movsd is clearer and carries no penalty. Ahhh... vzeroall is an AVX/AVX2 instruction, not available in all processors supporting x86-64 mode.
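
The zeroing and load idioms above can be sketched like this. The labels one_f, one_d and zero_demo are hypothetical names chosen for illustration:

```nasm
bits 64
default rel

section .data
one_f: dd 1.0               ; hypothetical single-precision constant
one_d: dq 1.0               ; hypothetical double-precision constant

section .text
global zero_demo            ; hypothetical label
zero_demo:
    xor   edx, edx          ; 31 D2: 2 bytes, and the upper 32 bits of rdx are cleared too
    xorps xmm0, xmm0        ; zero xmm0 without touching memory
    movss xmm1, [one_f]     ; scalar float load; the mnemonic states the type
    movsd xmm2, [one_d]     ; scalar double load
    ret
```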

Since you are using SSE2, I think some routines should make a better use of vectorization as well.

I partially agree with you about "libraries", but, since the idea is to create a game in assembly, I wouldn't use glut or OpenAL, but Xlib (or XCB) to create the fullscreen (or window) and ALSA for sound, leaving only OpenGL for graphics. Since that doesn't depend on glibc, you could create a more "pure" assembly program this way without having to deal with "drivers" directly.
« Last Edit: June 30, 2023, 08:17:42 PM by fredericopissarra »

Offline Deskman243

  • Jr. Member
  • *
  • Posts: 49
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #6 on: June 30, 2023, 07:55:11 PM »
I really like to provide a measured response whenever we have a specification for review. I think this presentation certainly has an admirable amount of yields from build. In particular I'd like to reflect on a few curious contentions of these.
You have here a certain amount of references to platforms outside of the standard NASM environment. I was intrigued by how there is even a reference to javascript right beside the build tools however I'm not clear on how this relates. Intriguingly this gives a contrast whereby the down turn section there is also a relation to a weaker tool sets however there's a claim of unmodified versions of the communities' source code.  Also the subject is referenced only remotely and could be more relevant if you actually posted these type of figures. It may appear difficult for an ordinary investigation however other than that the actual performance figures are in fact the type of details that conserves the status for a good review.

Good Job and Cheers!

Offline alCoPaUL

  • Jr. Member
  • *
  • Posts: 68
  • Country: ph
    • Webpage
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #7 on: July 01, 2023, 03:33:44 PM »
<wrong thread, lelz>
should be here https://forum.nasm.us/index.php?topic=3741.0
« Last Edit: July 01, 2023, 03:37:38 PM by alCoPaUL »

Offline Rodrigo Robles

  • Jr. Member
  • *
  • Posts: 10
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #8 on: July 01, 2023, 03:41:52 PM »
1. You don't need to align the stack pointer to DQWORD if you are not using the stack. sub rsp,8 and add rsp,8 as prolog and epilog aren't necessary all the time;

You're right. It's required only for some SIMD or FPU instructions. I'm doing this in all the functions as a defensive strategy to avoid random errors, but it surely can be removed from some functions. In the paper I pointed out that it is not always necessary.

2. If you are loading a 32-bit value into a 64-bit register, use E?? instead of R??. The instruction will be smaller and faster (since there's no REX prefix if registers below R8 are used). For example, instead of xor rax,rax, use xor eax,eax.

I was afraid of partial register stalls, but after your feedback I did some research and saw that there really is no penalty for accessing 32-bit registers; it happens only when accessing 8-bit or 16-bit registers. Now that I'm aware of this, I will surely use a lot more 32-bit data and code in my next x86_64 programs.

3. There's no real gain in using assembly for C-like routines unless you are prepared to optimize the code in ways GCC can't, for example using SSE4.2 for string routines. GCC does a better job with integer divisions than simply using div/idiv (especially with literal divisors). I recommend considering freestanding routines written in C.

Yes, from a performance standpoint there's no gain in writing assembly the way a C compiler would. In this particular program my goal was not to reach maximum optimization, but to test the viability of large 100% Assembly programs in terms of technical difficulty and cost. Anyway, I'm taking your feedback seriously and I will make better use of optimizations in the future.

4. Using -fno-pie goes against the SysV ABI for x86-64; you should consider using RIP-relative effective addressing in your code.

Thanks for this hint. I was not aware of the advantages of RIP-relative addressing and position-independent executables. Be sure I will use these features in my next projects.

Overall the code is very good! Just for fun, I'm trying to optimize my way and show to you here, if there is interest in such a thing...

Thank you. And of course I'm interested in your feedback about the optimizations.

Offline Rodrigo Robles

  • Jr. Member
  • *
  • Posts: 10
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #9 on: July 02, 2023, 03:19:57 PM »
Another thing... this:
Code: [Select]
  section .data
  ...
width:  dq 1
  ...
  section .text
  ...
  movq xmm0,[width]
  ...
Will not load 1.0 (double) into XMM0, but the QWORD integer 1 (0x0000000000000001). The correct approach is to convert the integer representation to double, as in:
Code: [Select]
  ; Casting necessary because you can use a dword reference as well...
  cvtsi2sd xmm0,qword [width]
The other way around as well:
Code: [Select]
  ; write the double as an integer
  cvtsd2si rax,xmm0  ; destination MUST be a register.
  mov [width],rax

I did not find the code above in this program ("width:  dq 1" or "movq xmm0,[width]"). The program has a width variable which is uninitialized, and it is used in some functions as an integer and in other functions as a float.

And... the default for NASM is 32-bit code, so it is recommended that you tell the assembler, at the beginning, that your code is 64-bit and uses RIP-relative addressing:
Code: [Select]
  bits 64
  default rel
And all effective addresses loaded to registers should be done with LEA, like:
Code: [Select]
  mov eax,1
  mov edi,eax
  lea rsi,[msg]  ; this is a rip relative effective address.
  mov edx,msg_size
  syscall
...
msg: db `Hello\n`
msg_size equ ($ - msg)

Looks like -felf64 already sets NASM to 64-bit mode; anyway, it is a good suggestion to use BITS 64 to ensure the mode independently of the command line used.

Now that I'm aware of the advantages of RIP-relative addressing, I will certainly use it in my next projects.

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #10 on: July 02, 2023, 07:00:39 PM »
Looks like -felf64 already sets NASM to 64-bit mode; anyway, it is a good suggestion to use BITS 64 to ensure the mode independently of the command line used.
Not quite. bits 64 tells NASM that the code is for x86-64 mode. This is important because INC/DEC instructions, for example, have different opcodes in 32-bit and 64-bit mode, while -f elf64 only tells NASM that an ELF x86-64 object file will be created.
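
A small sketch of the INC example, meant to be assembled as a flat binary (for instance with nasm -f bin; the file name is up to you) and inspected with a hex dump. The bytes in the comments are the standard encodings:

```nasm
bits 32
    inc eax                 ; 40
bits 64
    inc eax                 ; FF C0 (in 64-bit mode the byte 40 is a REX prefix)
```

The one-byte 40..4F INC/DEC forms were reused as REX prefixes in 64-bit mode, which is why the assembler has to know the target mode to pick the right encoding.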

Offline Rodrigo Robles

  • Jr. Member
  • *
  • Posts: 10
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #11 on: July 03, 2023, 02:43:09 AM »
Hi, Rodrigo. A question first: Are you Brazilian? (I am!)

Yes, I'm also Brazilian.

Well... about your code, as I said before, there are some improvements you can make. First, avoid floating point altogether, since your projection is orthogonal. Second, use the lea instruction to load registers with effective addresses, using RIP-relative addressing mode. I believe you used -fno-pie when linking because you got tons of relocation errors with something like this:
Code: [Select]
  mov rsi,label ; lea rsi,[label] is better.
  ...
label: dq 0

Now that I'm aware that accessing 32-bit registers incurs no penalty, I will probably use them a lot more in the future. The same goes for RIP-relative addressing.

Third, you are using OpenGL in compatibility mode, something pretty outdated nowadays. And fourth: yep, you are, in essence, writing a C program, but using assembly (and, most of the time, poorly -- sorry).

Compatibility mode really is very outdated. This was a cheap architecture choice I made, but I really want to move to a more modern OpenGL in the next project.
This "C accent" was unavoidable; I believe it should fade as I improve my x86_64 skills.

I'm not saying the game isn't well structured or is "wrong". No... it is good, but it can be way better.

For example, instead of doing some calculations with R?? registers, you could use E??, because the coordinates will never be longer than, let's say, 11 bits (2¹¹-1 = 2047). Keeping the majority of the code 32-bit will make it a lot smaller (and faster).

Ok, I'm already convinced of the advantages of 32-bit code.  :)

Trying to obey SysV ABI is another point...

And I didn't understand why you created your own mystrcmp when you are using glibc (linked to your code by GCC). And why are you using double precision, since all OpenGL functions dealing with floating point use single precision (float)? Also, take functions like glBindTexture, which takes 2 int arguments: your code passes the texture as a QWORD (via RSI) instead of a DWORD (ESI), and does the same for the enumeration, which just adds a REX prefix to the instructions!

I didn't call a single libc function; it's there as a dependency of opengl/openal/glut. I'm using OpenGL/OpenAL because they are almost mandatory to access video and sound, but I can successfully avoid any other libraries.
I was trying to use 64-bit in everything I could. I was afraid of getting some penalties for using 32-bit, but in the end I got penalized for using 64-bit, wasting memory and machine code where 32-bit should be used.

Another thing is zeroing registers... Instead of xor rdx,rdx or, worse, mov rdx,0, you could use xor edx,edx, a 2-byte instruction... And, with floating point, something like xorps xmm0,xmm0 is faster and smaller than movq xmm0,[DQ_ZERO]. At the same time, instead of using movq or movd, assuming (correctly) that a double is a QWORD (and a float is a DWORD), using movss or movsd is clearer and carries no penalty. Ahhh... vzeroall is an AVX/AVX2 instruction, not available in all processors supporting x86-64 mode.

Nice optimization hints; I should pay more attention to this.
About vzeroall, I'm considering AVX2 as a minimal requirement for this program.

Since you are using SSE2, I think some routines should make a better use of vectorization as well.

I avoided this level of optimization on purpose to get a faster delivery. But I would like to use more vectorization in the future.

I partially agree with you about "libraries", but, since the idea is to create a game in assembly, I wouldn't use glut or OpenAL, but Xlib (or XCB) to create the fullscreen (or window) and ALSA for sound, leaving only OpenGL for graphics. Since that doesn't depend on glibc, you could create a more "pure" assembly program this way without having to deal with "drivers" directly.

It looks like it will not be easy to get rid of libc: according to ldd, both libX11.so and libGL.so depend on it. Anyway, I can at least avoid it in my own code. I like the suggestion of using Xlib and ALSA, since they are lower level than glut and OpenAL.

I want to thank you for all these comments; it's the most valuable feedback I have received about this program so far.

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #12 on: July 03, 2023, 12:09:38 PM »
Yes, I'm also Brazilian.
I'm from Vitória-ES! ;)

Compatibility mode really is very outdated. This was a cheap architecture choice I made, but I really want to move to a more modern OpenGL in the next project.
This "C accent" was unavoidable; I believe it should fade as I improve my x86_64 skills.
Since you're using an orthogonal projection, the vertex shader will be very simple and you can ditch those matrix manipulation functions... ;)

I didn't call a single libc function; it's there as a dependency of opengl/openal/glut. I'm using OpenGL/OpenAL because they are almost mandatory to access video and sound, but I can successfully avoid any other libraries.
I was trying to use 64-bit in everything I could. I was afraid of getting some penalties for using 32-bit, but in the end I got penalized for using 64-bit, wasting memory and machine code where 32-bit should be used.
Yep, some libraries depend on libc, but your code doesn't need to include a libc dependency. libGL.so, libglut.so, etc. will load their own dependencies by themselves...

Nice optimization hints; I should pay more attention to this.
About vzeroall, I'm considering AVX2 as a minimal requirement for this program.
Thanks. Take note that, even in x86-64 mode, your processor is still a 32-bit one; 64-bit mode is an extension.
If you are considering using AVX2 (or SSE greater than 2; the same goes for FMA, BMI, AVX-512, ...) you should test whether the processor supports it. For example, I deal with some virtual machines which support AVX, but not AVX2. In x86-64 mode the only guarantee you have about SIMD is SSE and SSE2.
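
One way such a test could look, as a sketch only. The routine name has_avx2 is hypothetical. It uses the documented CPUID bits (leaf 1 ECX bit 27 OSXSAVE and bit 28 AVX, leaf 7 subleaf 0 EBX bit 5 AVX2) and XGETBV to confirm that the OS saves the YMM state:

```nasm
bits 64
default rel

section .text
global has_avx2             ; hypothetical name; C prototype would be: int has_avx2(void)

has_avx2:
    push rbx                ; cpuid overwrites rbx, which is callee-saved
    xor eax, eax
    cpuid                   ; eax = highest supported basic leaf
    cmp eax, 7
    jb .no
    mov eax, 1
    cpuid
    and ecx, (1 << 27) | (1 << 28)  ; OSXSAVE and AVX
    cmp ecx, (1 << 27) | (1 << 28)
    jne .no
    xor ecx, ecx
    xgetbv                  ; edx:eax = XCR0
    and eax, 6              ; bits 1 and 2: XMM and YMM state enabled by the OS
    cmp eax, 6
    jne .no
    mov eax, 7
    xor ecx, ecx
    cpuid
    shr ebx, 5              ; AVX2 flag is EBX bit 5
    and ebx, 1
    mov eax, ebx            ; return 1 if AVX2 is usable, 0 otherwise
    pop rbx
    ret
.no:
    xor eax, eax
    pop rbx
    ret
```

A program could call this once at startup and fall back to an SSE2 path, or exit with a message, when it returns 0.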

It looks like it will not be easy to get rid of libc: according to ldd, both libX11.so and libGL.so depend on it. Anyway, I can at least avoid it in my own code. I like the suggestion of using Xlib and ALSA, since they are lower level than glut and OpenAL.
As said before, yep, those libs could depend on libc, but not your program.

I want to thank you for all these comments; it's the most valuable feedback I have received about this program so far.
The pleasure is all mine!

Offline Rodrigo Robles

  • Jr. Member
  • *
  • Posts: 10
Re: Opengl/OpenAL game 100% NASM x86_64 Assembly
« Reply #13 on: July 06, 2023, 01:03:40 AM »
I really like to provide a measured response whenever we have a specification for review. I think this presentation certainly has an admirable amount of yields from build. In particular I'd like to reflect on a few curious contentions of these.
You have here a certain amount of references to platforms outside of the standard NASM environment. I was intrigued by how there is even a reference to javascript right beside the build tools however I'm not clear on how this relates. Intriguingly this gives a contrast whereby the down turn section there is also a relation to a weaker tool sets however there's a claim of unmodified versions of the communities' source code.  Also the subject is referenced only remotely and could be more relevant if you actually posted these type of figures. It may appear difficult for an ordinary investigation however other than that the actual performance figures are in fact the type of details that conserves the status for a good review.

Good Job and Cheers!

Thanks for the feedback!