Hi Philip,
Good question! (Your English is perfect, BTW... but the CPU doesn't speak English anyway.)
Bytes you don't really need to align (every address is correctly aligned for a byte), words you must put on even addresses, double words on multiples of 4, and quad words on multiples of 8. When you are using SSE instructions, you must align on 16-byte addresses.
This is generally true of variables anywhere in memory, not just the stack. For anything short of SSE, though, misalignment merely costs some performance - apparently "not too bad". In the case of SSE, there are instructions which will tolerate unaligned addresses (movups, for example), and faster ones (movaps) which will segfault if the address is not on a 16-byte boundary.
When it comes to the stack, my understanding is that it should (must?) be aligned at least on "stacksize" boundaries - 4 bytes for a 32-bit machine. So "sub esp, 10" would probably not be a good idea, even if all you need is 10 bytes - better to bump it up to "sub esp, 16". I observe that gcc seems always to align the stack to 16 bytes, regardless of the size of local variables, and regardless of whether SSE is actually being used.
There's some discussion of this issue on news:comp.lang.asm.x86, in a thread entitled "aligning to word boundaries" starting on 6/15/10, if you have access to newsgroups (groups.google.com if you don't). Apparently the "real" read/write size is a cache line (32 bytes? 64 bytes?).
Agner Fog has written a nice document on optimization, which should cover this issue, among others:
http://www.agner.org/optimize/optimizing_assembly.pdf

IMO, a beginner should worry about making it "work" first, and worry about optimization later - but it won't hurt to develop good habits from the beginning!
Best,
Frank