I have a single jmp instruction at the beginning of a 16-byte block, followed by an alignment directive:
%use smartalign
ALIGNMODE generic,16
BITS 64
section .text
jmp elsewhere
align 16
This assembles to:
0000000000000060 eb92 jmp xxxx
0000000000000062 66666690 nop
0000000000000066 66666690 nop
000000000000006a 66666690 nop
000000000000006e 6690 nop
Is it possible to tell align to combine those four nops into a single 14-byte nop? This is on Haswell, where it will in fact incur an LCP stall the first time it is decoded, but after that it will live in the micro-op cache for the rest of eternity.
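
For concreteness, here is a hand-encoded sketch of the padding I am after. It is only an illustration, not something align produces for me: the byte sequence (six 0x66 prefixes in front of the 8-byte `0f 1f 84 00` long-nop form) is my own assumption about what a single 14-byte nop could look like, the placement of the `elsewhere` label is a placeholder, and I wrote `jmp short` to pin the jump to its 2-byte form.

```nasm
BITS 64
section .text

start:
    jmp short elsewhere         ; 2 bytes: eb 0e (short form forced explicitly)

    ; Hand-written 14-byte nop (assumed encoding, NOT emitted by align):
    ; six 0x66 prefixes + long nop with a [rax+rax*1+0] memory operand
    times 6 db 0x66             ; 6 prefix bytes
    db 0x0f, 0x1f, 0x84, 0x00   ; opcode 0f 1f, ModRM 84, SIB 00
    dd 0                        ; 32-bit displacement, 4 bytes

elsewhere:                      ; placeholder target; 2 + 14 bytes puts it at offset 16
    ret
```

The obvious drawback is that the padding length has to be recomputed by hand whenever the code in front of it changes, which is exactly the job I was hoping align would do.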