Hi, Rodrigo. A question first: are you Brazilian? (I am!)
Well... about your code: as I said before, there are some improvements you can make. First, avoid floating point altogether, since your projection is orthogonal. Second, use the lea instruction to load registers with effective addresses, using RIP-relative addressing. I believe you used -fno-pie when linking because you got tons of relocation errors from something like this:
mov rsi,label ; lea rsi,[rel label] is better (RIP-relative).
...
label: dq 0
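A minimal sketch of what I mean, assuming NASM syntax (as in the snippet above). The routine name get_label_addr is just a placeholder of mine, not something from your code:

```nasm
default rel                 ; make every [symbol] reference RIP-relative by default

section .data
label:  dq 0

section .text
global get_label_addr       ; hypothetical routine name, for illustration only
get_label_addr:
    lea rax, [label]        ; address of label, computed relative to RIP
    mov rdx, [label]        ; value stored at label, also RIP-relative
    ret
```

With default rel at the top of the file you don't need to write rel on every operand, and the code no longer needs absolute relocations in the text section, so it should link as a position-independent executable.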
Third, you are using OpenGL in compatibility mode, which is pretty outdated nowadays. And fourth: yep, you are, in essence, writing a C program, but in assembly (and, most of the time, poorly, sorry).
I'm not saying the game is badly structured or "wrong". No... it is good, but it can be way better.
For example, instead of doing calculations with the R?? registers, you could use the E?? ones, because the coordinates will never need more than, let's say, 11 bits (2¹¹-1 = 2047). Keeping the majority of the code 32 bits avoids the REX prefix on most instructions, which makes the code smaller (and often faster).
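Something along these lines, as a sketch only. The names player_x, speed_x and move_player are hypothetical, I made them up for the example:

```nasm
default rel

section .data
player_x: dd 100            ; hypothetical 32-bit coordinate
speed_x:  dd 3              ; hypothetical 32-bit speed

section .text
global move_player          ; hypothetical routine name
move_player:
    mov eax, [player_x]     ; 32-bit load; the upper half of RAX is zeroed automatically
    add eax, [speed_x]      ; 32-bit add, no REX.W prefix needed
    mov [player_x], eax     ; 32-bit store
    ret
```

Each of these instructions is one byte shorter than its RAX counterpart, and the data itself takes half the space.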
Trying to obey SysV ABI is another point...
And I didn't understand why you created your own mystrcmp when you are using glibc (linked to your code by GCC). And why are you using double precision, since the OpenGL functions you call take single precision (float) arguments? AND take functions like glBindTexture, which takes two 32-bit arguments: you pass the texture as a QWORD (via RSI) instead of a DWORD (ESI), and the same goes for the enumeration, which just adds a REX prefix to the instructions!
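Here is a rough sketch of such a call following the SysV ABI with 32-bit arguments. I'm assuming NASM on Linux; tex_id and bind_sprite_texture are hypothetical names, and tex_id is assumed to have been filled in earlier by glGenTextures:

```nasm
default rel
extern glBindTexture

GL_TEXTURE_2D equ 0x0DE1        ; value from the OpenGL headers

section .bss
tex_id: resd 1                  ; hypothetical GLuint texture name (a DWORD)

section .text
global bind_sprite_texture      ; hypothetical routine name
bind_sprite_texture:
    sub  rsp, 8                 ; RSP must be 16-byte aligned at the call
    mov  edi, GL_TEXTURE_2D     ; 1st argument: GLenum target, 32 bits
    mov  esi, [tex_id]          ; 2nd argument: GLuint texture, 32 bits
    call glBindTexture wrt ..plt
    add  rsp, 8
    ret
```

The sub/add pair is there because, on entry to a function, RSP is 8 bytes off a 16-byte boundary (the return address was just pushed), and the ABI requires the stack to be aligned when you call another function.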
Another thing is zeroing registers... Instead of xor rdx,rdx or, worse, mov rdx,0, you could use xor edx,edx, a 2-byte instruction that still clears the whole of RDX, because writing a 32-bit register zeroes the upper 32 bits. And, with floating point, something like xorps xmm0,xmm0 is faster and smaller than movq xmm0,[DQ_ZERO]. At the same time, instead of using movq or movd, assuming (correctly) that a double is a QWORD (and a float is a DWORD), using movss or movsd is clearer and carries no penalty. Ahhh... vzeroall is an AVX instruction, not available on all processors supporting x86-64 mode.
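Side by side, as a small sketch (zero_examples is a made-up label, only there to hold the instructions):

```nasm
section .text
global zero_examples        ; hypothetical label, for illustration only
zero_examples:
    xor   edx, edx          ; 2 bytes; clears all 64 bits of RDX
    xor   rdx, rdx          ; same result, but 3 bytes because of the REX.W prefix
    xorps xmm0, xmm0        ; 3 bytes; zeroes XMM0 without touching memory
    ret
```

xorps is part of the original SSE set, so it is available on every x86-64 processor, unlike vzeroall.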
Since you are using SSE2, I think some routines could make better use of vectorization as well.
I partially agree with you about "libraries", but, since the idea is to create a game in assembly, I wouldn't use glut or OpenAL, but Xlib (or XCB) to create the fullscreen display (or window) and ALSA for sound, leaving only OpenGL to be used for graphics. Since it doesn't depend on glibc, you could create a more "pure" assembly code this way without having to deal with "drivers" directly.