1) I doubt that Nasm is issuing that error. Nasm accepts any symbol name, provided it uses only valid characters (see the manual). It's your linker, ld, that "knows" _start as the default entrypoint. You can override this with ld's "-e" switch...
global commence
section .text
commence:
    ; your code
    mov eax, 1      ; sys_exit (32-bit Linux)
    xor ebx, ebx    ; exit code 0
    int 0x80
nasm -f elf32 myfile.asm
ld -m elf_i386 -o myfile myfile.o -e commence
... should work fine. If ld does not find an entrypoint (either _start or one you specify), it issues a warning and uses the start of the .text section instead. That may not be where your code begins! C programmers, for some reason, love to put their subroutines first and main last. You can do the same...
global commence
section .text
my_subroutine:
    mov eax, ebx
    ret
commence:
    ; your code starts...
    mov eax, 1      ; sys_exit (32-bit Linux)
    xor ebx, ebx    ; exit code 0
    int 0x80
But you'd better tell ld where your entrypoint is, or it will start at my_subroutine, hit that "ret" with no return address on the stack, and crash (see question 7)!
If you're using gcc, it links in some "C startup code" (unless you tell it not to, e.g. with "-nostartfiles"). I think of this as "crt0.o", although on Linux with glibc the file is actually "crt1.o" - we don't need to know, gcc knows. In any case, this file contains the "_start" label, so attempting to define a second one will make ld complain. Nasm doesn't care, and I don't think gcc does either. This is ld's job, and it can't make an executable with two entrypoints!
Besides containing the "_start" label, this "C startup code" does some housekeeping (it rearranges the command line arguments and environment variables on the stack, among other things) and then calls "main". I don't think there's any way to change that name (but see question 6).
2) There's nothing "incorrect" - that's what you told it to do! Nasm uses the '$' character for several different things. In this context, it means approximately "here": the current address in the assembly. "msg1" is the address where that label occurs, so "$ - msg1" is the number of bytes between the two. Here that's 26 (count 'em yourself), and that's how many bytes sys_write writes. Since '$' means roughly "here", it obviously matters where you put it. Notice that the zeros are useless in this case - sys_write is not looking for a zero-terminated string (and sys_read won't return one!). "printf" (and other C functions) are looking for a zero-terminated string. They don't know the length by magic; they have to calculate it by looking for the zero, every time.
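The same distinction shows up in C, which may make it easier to see. This is a minimal sketch, assuming Linux/POSIX and a hypothetical message "Hello, World!\n" (not the one from your program); msg1, MSG1_LEN, write_msg1() and msg1_runtime_length() are made-up names. The length is either fixed when the program is built, like Nasm's "$ - msg1", and handed to write() (the C library's wrapper for sys_write), or found at run time by strlen() scanning for the zero.

```c
#include <string.h>   /* strlen */
#include <unistd.h>   /* write, STDOUT_FILENO, ssize_t */

/* Hypothetical message. C appends a zero byte to string literals. */
static const char msg1[] = "Hello, World!\n";

/* Build-time length, the C cousin of Nasm's "$ - msg1".
   sizeof counts the trailing zero too, so subtract 1 to leave it out. */
enum { MSG1_LEN = sizeof msg1 - 1 };

/* write() sends exactly the number of bytes it is told to;
   it never looks for a zero. */
ssize_t write_msg1(void) {
    return write(STDOUT_FILENO, msg1, MSG1_LEN);
}

/* strlen() has to walk the string until it finds the zero,
   every time it is called. */
size_t msg1_runtime_length(void) {
    return strlen(msg1);
}
```

Using plain "sizeof msg1" (without the "- 1") would send the terminating zero as well, which is the C version of counting the zeros in "$ - msg1".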
3 & 4) Nasm doesn't use "ptr" (or "PTR") for anything special. Following a size specifier (as in "dword ptr [ebx]"), it's a syntax error, but you could use "ptr" as a variable name, a label or a macro name with no problem. I guess it's "Intel syntax" - ask them what it does. Nasm's viewpoint is that it doesn't do anything, so we don't use it. When "translating" Masm/Tasm code to Nasm, you can do:
%idefine offset
%idefine ptr
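Being defined as "nothing", they effectively disappear from the code and Nasm won't choke on 'em. This can be deceptive - other changes may still be needed! For example, if myvar is a dword variable, Masm's "mov eax, myvar" loads its contents; in Nasm that's "mov eax, [myvar]". You didn't ask, but it's the same idea with "offset" - we don't use it. In Nasm, a label that's not in square brackets means its address, which is what Masm's "offset" gives you.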
5) Initial register values at "_start" vary with kernel version. They're known if you know the kernel version, but unless you're really desperate to save a few bytes, I wouldn't ASSume anything. I'd use "xor eax, eax" instead of "mov eax, 0" (the latter uses 4 bytes just to store the 0: 5 bytes in all, versus 2), but it doesn't make much difference. You don't need to zero a register before you load it with another value. If you're using partial registers (al, ah or ax), you may want the rest of the register in a known state... but you might not need to.
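To put numbers on the size difference, here are the two 32-bit encodings written out as C byte arrays, purely as a table of bytes for illustration (the array and function names are made up). "mov eax, 0" is the opcode B8 followed by a 4-byte immediate zero; "xor eax, eax" is 31 C0, which is what Nasm emits (33 C0 encodes the same instruction).

```c
#include <stddef.h>   /* size_t */

/* 32-bit machine code, one byte per element. */
static const unsigned char mov_eax_0[]   = { 0xB8, 0x00, 0x00, 0x00, 0x00 }; /* opcode + 4-byte 0 */
static const unsigned char xor_eax_eax[] = { 0x31, 0xC0 };                   /* opcode + ModRM   */

/* Bytes saved each time xor is used instead of mov to zero eax. */
size_t xor_saving(void) {
    return sizeof mov_eax_0 - sizeof xor_eax_eax;
}
```

Three bytes per zeroing, which is why it only matters if you're squeezing every byte.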
At the "main" label, I imagine it depends on the C implementation. The (32-bit!) calling convention requires that certain registers (ebp, ebx, esi and edi) be "preserved": if you change 'em, put 'em back the way they were. This is not always enforced, and it only matters if you exit with "ret". If you exit with sys_exit (or "exit()"), it doesn't matter what values are in the registers (except eax and ebx for sys_exit, and only bl really counts) or even whether the stack is trashed. You're "supposed" to return a meaningful exit code, but it doesn't break anything if you don't. You can see the last exit code with "echo $?" if you care to.
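Why "only bl really counts": the kernel keeps just the low 8 bits of the exit code, and that's what "echo $?" shows. A minimal sketch in C, assuming Linux/POSIX (fork(), _exit() and waitpid()); observed_exit_status() is a hypothetical helper name. The child passes a code to _exit(), which hands it to the kernel's exit service much like sys_exit with that value in ebx, and the parent reads back what the shell would see.

```c
#include <sys/types.h>   /* pid_t */
#include <sys/wait.h>    /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>      /* fork, _exit */

/* Runs a child that exits with 'code' and returns the status the parent
   sees (the number "echo $?" would print), or -1 on error. */
int observed_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);              /* child: straight to the kernel, no cleanup */
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);   /* only the low 8 bits survive */
}
```

So exiting with 300 looks like 44 to the shell (300 - 256).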
6) It's different for you because you're using GNU C on Linux (ELF), which doesn't add an underscore to C names (the "_start" label being an exception). Other implementations of C put a leading underscore on external (or global) names. OpenWatcom C uses a trailing underscore: "main_", "printf_", etc.
Nasm has a handy feature for this: you can leave the underscores out of your source and specify "--prefix _" on the command line to put an underscore in front of everything extern or global. Use "--postfix _" for OpenWatcom.
7) "ret" returns from a subroutine. It essentially "pops" the return address off the stack and jumps there. The return address was put on the stack by the "call" instruction. "main" is called; "_start" is not! The first thing on the stack at the "_start" label is "argc" - remember I said the "startup code" monkeyed with the stack? The important thing here is that it calls "main". You can't "ret" from the "_start" label because there's no return address there, so you have no choice but sys_exit (or "exit()"). From "main" you can "ret" (if you follow the rules) or sys_exit - your choice. With "ret", that "meaningful exit code" goes in eax; with sys_exit it goes in ebx (and for "ret", ebx is supposed to be preserved!). "ret" is an instruction, sys_exit is a kernel service, so they have to be used slightly differently.
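Here's that difference spelled out. On Linux, at "_start" the word at [esp] is argc, followed by the argv pointers, a zero, the environment pointers and another zero (more data, the auxiliary vector, follows; this sketch ignores it). There's no return address, so "ret" would pop argc and jump to it as if it were an address. At "main", by contrast, [esp] holds the return address pushed by "call", with argc, argv and envp above it as ordinary arguments. The C sketch below models the "_start" stack as an array of pointer-sized words and picks it apart the way startup code does; struct start_frame and parse_start_stack() are hypothetical names, and the layout is the Linux one just described.

```c
#include <stdint.h>   /* uintptr_t */

/* Hypothetical container for what startup code digs out of the stack. */
struct start_frame {
    int argc;
    char **argv;   /* argv[argc] is the 0 that ends the list */
    char **envp;   /* ends with a 0 as well */
};

/* sp points at the first word on the stack at _start ([esp] in 32-bit code).
   The stack is modelled as an array of pointer-sized words. */
struct start_frame parse_start_stack(char **sp) {
    struct start_frame f;
    f.argc = (int)(uintptr_t)sp[0];   /* first word is argc, not a return address */
    f.argv = sp + 1;                  /* argv pointers start at the next word */
    f.envp = sp + 1 + f.argc + 1;     /* skip argc pointers and their terminating 0 */
    return f;
}
```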
8) To be determined.
Best,
Frank