It's, in part at least, a syntax issue. Masm requires, in some cases, a segment override (the 3E and 38 bytes) to indicate "[contents]". The override is "optimized away" by Masm. As you know, there is no need for it, but Nasm will put it there if we say so. Just leave it out of the Nasm code.
The 66 byte, however, is an operand size override. It flips the operand size for the one instruction it precedes: ax becomes eax in 16-bit code, and eax becomes ax in 32-bit code. Nasm will emit it if required... but Nasm needs to be told what we want to do. If we're using 32-bit registers in 16-bit code (or vice versa), it needs to be there. However, if we've told Nasm "bits 16" and the CPU is actually in 32-bit mode, the result will be totally wrong!
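To make the "totally wrong" part concrete, here is a rough Python sketch that decodes the same six bytes twice, once as a 16-bit CPU would and once as a 32-bit CPU would. The function name decode_mov_imm and the sample bytes are made up for illustration (they are not from your code), and it only understands "mov reg, imm" (opcodes B8..BF) with an optional 66 prefix, so it is nowhere near a real disassembler.

```python
REGS16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]

def decode_mov_imm(code: bytes, bits: int):
    """Decode one 'mov reg, imm' (opcodes B8..BF), with optional 66 prefix.

    'bits' is the mode the CPU is assumed to be in (16 or 32).
    Returns (text, bytes_consumed). Illustration only (hypothetical helper).
    """
    if bits not in (16, 32):
        raise ValueError("bits must be 16 or 32")
    pos = 0
    opsize = bits                     # default operand size follows the mode
    if code[pos] == 0x66:             # operand size override: flip 16 <-> 32
        opsize = 48 - bits
        pos += 1
    opcode = code[pos]
    if not 0xB8 <= opcode <= 0xBF:
        raise ValueError("only mov reg, imm is handled in this sketch")
    pos += 1
    nbytes = opsize // 8              # immediate is 2 or 4 bytes
    if len(code) < pos + nbytes:
        raise ValueError("truncated immediate")
    imm = int.from_bytes(code[pos:pos + nbytes], "little")
    reg = REGS16[opcode - 0xB8]
    if opsize == 32:
        reg = "e" + reg
    return f"mov {reg}, 0x{imm:x}", pos + nbytes

# What Nasm emits for "bits 16" / "mov eax, 0x1234"
code = bytes.fromhex("66b834120000")
print(decode_mov_imm(code, 16))   # ('mov eax, 0x1234', 6)
print(decode_mov_imm(code, 32))   # ('mov ax, 0x1234', 4)
```

In 16-bit mode all six bytes are one instruction. In 32-bit mode the CPU takes only four of them, and the leftover 00 00 gets executed as a separate instruction (add [eax], al), which is how things go off the rails.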
We probably need to know more about what you're trying to do before we can help much. For example, you don't know whether your "original" code is 16-bit or 32-bit. What's the header say? You should see the "MZ" signature as the first two bytes. If there's a "PE" signature further on (the dword at offset 3Ch in the header gives its file offset), it's 32-bit code (or possibly 64-bit). If the "MZ" is all, it's most likely 16-bit code. If Nasm is told incorrectly, the code will be wrong and there's not a chance that it will run. You can see the difference by disassembling twice: once with ndisasm's default of 16-bit, and once with "-b32" added to the command line so ndisasm will "see what the CPU sees" if it's in 32-bit mode. One should "make sense" and the other be "absolute garbage"... if you can tell the difference...
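If you'd rather not eyeball a hex dump, the header check can be sketched in a few lines of Python. This is only a rough guess at the file type, assuming the usual layout where the dword at offset 3Ch of an "MZ" header holds the file offset of the "PE" signature. The function name classify_exe and the file name in the comment are hypothetical.

```python
import struct

def classify_exe(data: bytes) -> str:
    """Rough guess at what kind of file this is, from its header bytes."""
    if data[:2] != b"MZ":
        return "no MZ"        # flat binary, .COM, bootsector, ...?
    if len(data) >= 0x40:
        # Little-endian dword at 3Ch: file offset of the newer header, if any
        (new_hdr,) = struct.unpack_from("<I", data, 0x3C)
        if data[new_hdr:new_hdr + 4] == b"PE\0\0":
            return "MZ + PE"  # 32-bit (or 64-bit) Windows executable
    return "MZ only"          # most likely a 16-bit DOS executable

# Synthetic headers, just to exercise the function
mz_only = b"MZ" + bytes(0x3E)
pe = bytearray(0x84)
pe[0:2] = b"MZ"
struct.pack_into("<I", pe, 0x3C, 0x80)
pe[0x80:0x84] = b"PE\0\0"

print(classify_exe(mz_only))    # MZ only
print(classify_exe(bytes(pe)))  # MZ + PE

# On a real file (hypothetical name):
# with open("original.exe", "rb") as f:
#     print(classify_exe(f.read()))
```

A "no MZ" answer doesn't tell us the bitness at all. In that case the two ndisasm runs are still the way to find out.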
I'm fond of Agner Fog's "objconv" as a disassembler. I don't think it'll do 16-bit code, but if what you've got is 32-bit code, you're in luck. It will put the instructions on the left and the addresses and bytes on the right, after a ';', so there's some hope it will assemble. It will even recognize and label "do nothing" code added for alignment padding. Nasm syntax, too!
I have to tell you that we don't discuss "reversing" or "cracking" here. For one thing, it could be illegal in some jurisdictions and we don't want to get the Forum in trouble. For another thing, it's difficult to do in a useful manner. Getting a disassembled binary to assemble unaltered is easy, but pointless - we had the binary! Getting something you can alter and assemble into something that runs properly is much tougher. It's often easier to write it from scratch. I don't know how much experience you've got, Fossil, but if you're new to asm, this may not be where you want to start.
I can't resist a wild-asmed guess: the offset in your first example would "make sense" in a bootsector. Is it a bootsector, do we know?
Best,
Frank