Encryptor256 has posted some Windows C code here:
http://forum.nasm.us/index.php?topic=1725.0
You could convert it to "an ALP" by pushing the parameters and calling the APIs, if that's what you want (or by putting the parameters in registers, for 64-bit code).
I suspect the "low level" answer would best be found in the Intel and/or AMD manuals. I don't know the answer - I just graduated to two cores recently. I think it's a pretty complicated question. Above my pay grade. How would I know if I'd succeeded in running my tasks on separate cores? Feel free to post anything you learn!
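For what it's worth, one way to check on Windows might be to ask the OS which processor a thread is currently running on. Below is a rough sketch in C, not anything taken from Encryptor256's code. It assumes Windows Vista or later (for GetCurrentProcessorNumber) and a machine with no more than 64 logical processors (a single processor group). The name pin_and_report is made up for illustration, and on non-Windows builds it just returns -1.

```c
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: pin the calling thread to logical processor `core`,
 * then ask Windows which processor the thread is running on.
 * Returns the processor number, or -1 on failure or on non-Windows builds. */
long pin_and_report(unsigned core)
{
#ifdef _WIN32
    /* The affinity mask has one bit per logical processor in the group. */
    if (core >= sizeof(DWORD_PTR) * 8)
        return -1;

    /* SetThreadAffinityMask returns the previous mask, or 0 on failure. */
    if (SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)1 << core) == 0)
        return -1;

    /* Assumption: Vista or later, single processor group. */
    return (long)GetCurrentProcessorNumber();
#else
    (void)core;   /* not Windows: this sketch has nothing to report */
    return -1;
#endif
}
```

If each thread calls this with a different core number at startup and the returned values differ, the tasks were at least scheduled on separate logical processors. In assembly it would be the same three API calls, with the parameters pushed (or placed in registers for 64-bit code).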
Best,
Frank