"In my opinion, if I specify [BITS 16] in my program, then the program will be compiled into 16-bit code, and surely we cannot use 32-bit registers in this mode."
Your opinion differs from that of the CPU... and the CPU wins.
As Keith suggests, RTFM:
http://www.nasm.us/doc/nasmdoc6.html#section-6.1
In particular, note the paragraph on the 0x66 and 0x67 prefixes. This works because Intel employed a "clever trick" or an "ugly kludge" (depending on how you look at it) when they introduced 32-bit code: they used the same opcodes! If the CPU is in 32-bit mode, as determined by a bit in the cs descriptor, it defaults to 32-bit operands and addresses. If this bit is clear (in protected mode), or if we're in real mode (determined by a bit in cr0), the default is 16-bit operands and addresses. These defaults can be toggled (on a per-instruction basis) by the operand size override prefix (0x66) and/or the address size override prefix (0x67).
In your proposed code (using 32-bit registers in 16-bit mode), Nasm will automatically insert the 0x66 prefix; specifying "bits 16" and using 32-bit registers tells it to do so. 32-bit addressing modes in 16-bit code will also work:
bits 16
mov al, [array + ecx + edx * 4]
There's a "gotcha" with this: the total offset must be within the "limit" prescribed for this segment, normally 0xFFFF for 16-bit code (although this can be changed in some circumstances... so-called "Flat Real Mode" or "Unreal Mode").
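To see the prefixes for yourself, here is a small sketch you could assemble. The file names are just placeholders, and the bytes in the comments are what I'd expect Nasm to emit for these register-only forms, so check them against the listing file rather than taking my word for it:

```nasm
; demo.asm (hypothetical file name)
; assemble with: nasm -f bin -l demo.lst demo.asm
; then read demo.lst to see the bytes emitted for each line

bits 16
    mov ax, bx          ; 89 D8       default operand size, no prefix
    mov eax, ebx        ; 66 89 D8    operand size override added by Nasm
    mov al, [bx]        ; 8A 07       default 16-bit addressing, no prefix
    mov al, [ebx]       ; 67 8A 03    address size override added by Nasm

bits 32
    mov eax, ebx        ; 89 D8       same opcode bytes, now the default is 32-bit
    mov ax, bx          ; 66 89 D8    here the prefix selects the 16-bit form
```

Note that "mov ax, bx" under "bits 16" and "mov eax, ebx" under "bits 32" should come out as the same two bytes. That is the shared-opcode trick: the CPU's current default decides which one it is, and the prefix flips it for a single instruction.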
Does that help clear it up?
Best,
Frank