NASM Forum > Programming with NASM

How do you refresh the instruction cache in 16-bit programs?

ben321:
I know that in 32-bit Windows (which runs in 32-bit protected mode) you call the Windows API function FlushInstructionCache. This forces the CPU to clear the instruction cache, and it should be called immediately after modifying any code that is about to be executed (self-modifying code, as in compressed EXE files). How do you do the same thing in 16-bit real mode (or, for that matter, in 16-bit protected mode)?
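For reference, the Windows call mentioned above looks roughly like the sketch below. `FlushInstructionCache` and `GetCurrentProcess` are the documented Win32 functions; `sync_code` and `patch_code` are hypothetical helper names made up for this illustration, and the GCC/Clang builtin is only included as a stand-in so the sketch builds outside Windows. It is not a claim about what any particular unpacker actually does.

```c
#include <stddef.h>
#include <string.h>

#if defined(_WIN32)
#include <windows.h>
#endif

/* Hypothetical helper (name is an assumption): make freshly written
   code bytes in [p, p+len) safe to execute. */
static int sync_code(void *p, size_t len)
{
#if defined(_WIN32)
    /* Documented Win32 call: process handle, base address, size. */
    return FlushInstructionCache(GetCurrentProcess(), p, len) != 0;
#elif defined(__GNUC__)
    /* GCC/Clang builtin serving the same purpose on other systems. */
    __builtin___clear_cache((char *)p, (char *)p + len);
    return 1;
#else
    (void)p; (void)len;
    return 0;   /* no known mechanism on this toolchain */
#endif
}

/* Stand-in for an unpacker step: copy "decompressed" bytes into place,
   then synchronize before anything jumps to them. */
static int patch_code(unsigned char *dst, const unsigned char *src, size_t len)
{
    memcpy(dst, src, len);
    return sync_code(dst, len);
}
```

The point is only the ordering: write the new bytes first, synchronize second, and jump to the code last.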

debs3759:
I didn't know you could do that. The only thing I can think of would be to execute NOPs for as many cycles as needed. I use self-modifying code as part of my CPUID code to differentiate between the 8088 and the 8086.

fredericopissarra:
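CLFLUSH is not a privileged instruction, so it can be executed at any privilege level (and if you are running MS-DOS you are using ring 0 anyway...).
Unless you are using a very old processor (one that predates SSE2, i.e. before the Pentium 4), CLFLUSH is available; CPUID leaf 1, EDX bit 19 (CLFSH) reports it.

BUT cleaning up only the L1I cache, I believe, isn't possible.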

ben321:

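--- Quote from: fredericopissarra on February 28, 2023, 11:16:02 PM ---CLFLUSH is not a privileged instruction, so it can be executed at any privilege level (and if you are running MS-DOS you are using ring 0 anyway...).
Unless you are using a very old processor (one that predates SSE2, i.e. before the Pentium 4), CLFLUSH is available; CPUID leaf 1, EDX bit 19 (CLFSH) reports it.

BUT cleaning up only the L1I cache, I believe, isn't possible.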

--- End quote ---

Interesting. I think that on most x86 CPUs CLFLUSH might be enough. How do most compressed EXEs handle this (such as those compressed with UPX)? Or do they just assume the cache was updated and not even call the FlushInstructionCache Windows API function?

fredericopissarra:
The "instruction" cache is the L1I cache. The first level (L1) is the one closest to the core, and it is divided into 2 blocks: L1I and L1D, typically 32 KiB each. CLFLUSH affects "lines" of the L1 cache (L1D or L1I), I believe, but I'm not sure.

L2 and L3 caches have no such division (there, everything is data). Since, on Intel processors with inclusive caches, everything in the L1 cache is necessarily also in the L2 cache (and everything in L2 is in L3), you are dealing with 'data', not instructions. That's why I said "not possible" before. But it is possible to flush L1I with CLFLUSH if you have the linear address of a block of instructions.

But you have to be careful... Nowadays L1I is 4-way set associative (I believe) and L1D is 8-way set associative. Associativity only says how many lines of a set a given address can be placed in; it does not mean that 4 or 8 lines are loaded or evicted together. Each line is 64 bytes long and CLFLUSH works on one line at a time, so flushing a block of code means one CLFLUSH per 64-byte line the block covers. The processor might distinguish code from data when flushing, but I think that is unlikely (especially in x86-64 mode), since the memory model used by modern applications is flat. The same linear address can point to data or to code (that's why you can do an indirect near jump without considering the segment selector). That's another reason I think "it is not possible" to target only the L1I...
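To make the line and set arithmetic concrete, here is a small C sketch. It assumes a 64-byte line, as in the figures above, and the helper names (`cache_sets`, `lines_touched`, `flush_range`) are made up for this example. `_mm_clflush` and `_mm_mfence` are the SSE2 intrinsics from `<emmintrin.h>`; when SSE2 is not enabled at compile time the sketch only counts lines and flushes nothing.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_clflush, _mm_mfence */
#endif

enum { LINE_SIZE = 64 };  /* assumed line size, as quoted in the thread */

/* Number of sets in a cache: size / (ways * line size). */
static unsigned cache_sets(unsigned size_bytes, unsigned ways, unsigned line)
{
    return size_bytes / (ways * line);
}

/* How many cache lines does the byte range [addr, addr+len) touch? */
static size_t lines_touched(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr & ~(uintptr_t)(LINE_SIZE - 1);
    uintptr_t last  = (addr + len - 1) & ~(uintptr_t)(LINE_SIZE - 1);
    return (size_t)((last - first) / LINE_SIZE) + 1;
}

/* Flush every line in the range, one CLFLUSH per line.
   Returns the number of lines covered. */
static size_t flush_range(const void *p, size_t len)
{
    uintptr_t base = (uintptr_t)p & ~(uintptr_t)(LINE_SIZE - 1);
    size_t n = lines_touched((uintptr_t)p, len);
    (void)base;  /* unused when SSE2 is not available */
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_clflush((const void *)(base + i * LINE_SIZE));
#endif
    }
#if defined(__SSE2__)
    _mm_mfence();  /* order the flushes against later accesses */
#endif
    return n;
}
```

With these numbers a 32 KiB, 8-way cache has 32768 / (8 * 64) = 64 sets, and a 4-way cache of the same size has 128. An 8-byte patch starting at offset 60 straddles two lines, so it needs two CLFLUSH instructions.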
