I think those questions were actually meant to be linked together.
In a 4-core processor, you really do have 4 sets of all the registers we know and love. You don't need a 256-bit register to do four 64-bit operations. That would actually be ugly, because every core would basically have to execute the same instruction, which really limits its usefulness.
You might be thinking of MMX/SSE, which use that tactic for several values that are kinda tied together. It works well for a lot of vector math, for instance adding two R,G,B,A colours together. Instead of doing 4 additions (Red + Red, Green + Green, Blue + Blue, Alpha + Alpha), you do one wide addition that handles all four at once.
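To make the "one wide addition" idea concrete, here is a rough sketch in Python. It is not real MMX/SSE (those use dedicated wide registers in hardware); it just packs four 8-bit channels into one 32-bit integer and adds them all with a single integer addition, keeping carries from spilling between channels. The function names are made up for illustration, and each channel wraps around at 256.

```python
def pack_rgba(r, g, b, a):
    """Pack four 8-bit channels into one 32-bit integer (hypothetical helper)."""
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(packed):
    """Split a packed 32-bit value back into its four 8-bit channels."""
    return ((packed >> 24) & 0xFF, (packed >> 16) & 0xFF,
            (packed >> 8) & 0xFF, packed & 0xFF)


def add_rgba_packed(x, y):
    """Add all four channels at once; each channel wraps modulo 256."""
    # Add only the low 7 bits of each channel, so no carry can cross
    # into the neighbouring channel.
    low = (x & 0x7F7F7F7F) + (y & 0x7F7F7F7F)
    # Fold the top bit of each channel back in with XOR.
    return low ^ ((x ^ y) & 0x80808080)


colour_a = pack_rgba(10, 20, 30, 40)
colour_b = pack_rgba(1, 2, 3, 4)
print(unpack_rgba(add_rgba_packed(colour_a, colour_b)))  # (11, 22, 33, 44)
```

The point is that all four channels go through the same operation in lockstep, which is great for colours and vectors but useless for running four unrelated programs.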
That packed-register trick is not how multicore processors work. Again, a quad-core CPU has the functional parts of a typical CPU duplicated 4 times on the same chip (plus some other stuff on the side, like shared cache, queues, etc.).
It really is almost entirely up to the OS (and the CPU itself) to determine which thread goes on which core at any given time. All you can do is say "Hey, I want this chunk of code and this chunk of code to run as separate threads". The OS then shares out CPU time based on priority settings, which program has focus, etc.
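As a minimal sketch of what "saying you want two chunks of code separate" looks like from the program's side, here is a Python example. The worker function and its names are hypothetical. Notice that nothing in it picks a core: the program only creates the threads, and the OS scheduler decides where and when they run. (One caveat: in standard CPython builds the interpreter lock keeps these threads from executing Python code truly simultaneously, so treat this as an illustration of the programming model rather than a speed-up.)

```python
import os
import threading


def sum_range(name, start, stop, results):
    """Hypothetical unit of work: sum the integers in [start, stop)."""
    results[name] = sum(range(start, stop))


results = {}

# Declare two independent pieces of work. We do not choose a core for either.
threads = [
    threading.Thread(target=sum_range, args=("first", 0, 1000, results)),
    threading.Thread(target=sum_range, args=("second", 1000, 2000, results)),
]

for t in threads:
    t.start()   # hand the thread to the OS scheduler
for t in threads:
    t.join()    # wait until both have finished

print("logical cores reported by the OS:", os.cpu_count())
print(results)
```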
If you look at the CPU die itself, you can actually see the cores mirrored across it.