Author Topic: How do you refresh the instruction cache in 16bit programs? (Read 70704 times)

ben321 · « **on:** February 28, 2023, 09:33:35 PM »

I know that in 32-bit Windows (which runs in 32-bit protected mode) you call the Windows API function FlushInstructionCache. This forces the CPU to clear the instruction cache, and it should be called immediately after any code that is about to be executed gets modified (for self-modifying code, such as compressed EXE files). How do you do the same thing in 16-bit real mode (or, for that matter, in 16-bit protected mode)?

debs3759 · « **Reply #1 on:** February 28, 2023, 10:06:25 PM »

I didn't know you could do that. The only thing I can think of would be to execute NOPs for as many cycles as needed. I use self-modifying code as part of my CPUID code to differentiate between the 8088 and 8086.

fredericopissarra · « **Reply #2 on:** February 28, 2023, 11:16:02 PM »

CLFLUSH is a ring0 instruction. If you are running MS-DOS you are using ring0...
Unless you are using a very old processor (before Core2), CLFLUSH is available.

BUT cleaning up the L1I cache, I believe, isn't possible.

ben321 · « **Reply #3 on:** March 08, 2023, 10:54:44 AM »

Quote from: fredericopissarra on February 28, 2023, 11:16:02 PM

CLFLUSH is a ring0 instruction. If you are running MS-DOS you are using ring0...
Unless you are using a very old processor (before Core2), CLFLUSH is available.

BUT cleaning up the L1I cache, I believe, isn't possible.

Interesting. I think on most x86 CPUs that CLFLUSH might be enough. How do most compressed EXEs handle this (such as those compressed by the program UPX)? Or do they just assume the cache was updated and not even call the FlushInstructionCache Windows API function?

fredericopissarra · « **Reply #4 on:** March 08, 2023, 11:10:11 AM »

The "instruction" cache is the L1I cache. The first level (L1) is closest to the "processor" and is divided into two blocks, L1I and L1D, each 32 KiB in size. CLFLUSH will affect "lines" of the L1 cache (L1D or L1I), I believe, but I'm not sure.

L2 and L3 caches have no such division (there, everything is data). Since, on Intel processors, everything in the L1 cache is necessarily also in the L2 cache (and everything in L2 is in L3), you are dealing with 'data', not instructions. That's why I said "not possible" before. But it is possible to flush L1I with CLFLUSH if you have the linear address of a block of instructions.

But you have to control this carefully... Nowadays L1I is 4-way associative (I believe), and L1D is 8-way associative... This means 4 lines of L1I are loaded/unloaded at once, and 8 lines of L1D are dealt with the same way. Since each line is 64 bytes long, flushing L1D means flushing 512 bytes at once, and for L1I, 256. The processor may distinguish between the two, but I think it is unlikely (especially in x86-64 mode), since the memory model used by modern applications is flat. The same linear address points to data or code (that's why you can do an indirect near jump without considering the segment selector). That's another reason I think "it is not possible"...
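
As a side calculation for the figures quoted above (32 KiB caches, 64-byte lines, 4 or 8 ways), here is a minimal sketch of the cache geometry they imply. It assumes plain modulo set indexing, which real CPUs may not use, and the function names are made up for illustration. In the usual definition the number of ways is how many lines share one set, and Intel documents CLFLUSH as acting on the single cache line that contains the given address.

```c
#include <assert.h>
#include <stddef.h>

/* Number of sets in a set-associative cache:
   total size / (ways per set * bytes per line). */
size_t cache_sets(size_t size_bytes, size_t ways, size_t line_bytes)
{
    assert(ways != 0 && line_bytes != 0);
    return size_bytes / (ways * line_bytes);
}

/* Base address of the cache line that contains addr.
   This is the unit a single CLFLUSH operates on. */
size_t cache_line_base(size_t addr, size_t line_bytes)
{
    assert(line_bytes != 0);
    return addr - (addr % line_bytes);
}

/* Set that an address maps to, ASSUMING simple modulo indexing. */
size_t cache_set_index(size_t addr, size_t sets, size_t line_bytes)
{
    assert(sets != 0 && line_bytes != 0);
    return (addr / line_bytes) % sets;
}
```

With the numbers from the post, `cache_sets(32 * 1024, 8, 64)` gives 64 sets for L1D and `cache_sets(32 * 1024, 4, 64)` gives 128 sets for L1I.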

ben321 · « **Reply #5 on:** March 08, 2023, 09:48:06 PM »

Quote from: fredericopissarra on March 08, 2023, 11:10:11 AM

The "instruction" cache is the L1I cache. The first level (L1) is closest to the "processor" and is divided into two blocks, L1I and L1D, each 32 KiB in size. CLFLUSH will affect "lines" of the L1 cache (L1D or L1I), I believe, but I'm not sure.

L2 and L3 caches have no such division (there, everything is data). Since, on Intel processors, everything in the L1 cache is necessarily also in the L2 cache (and everything in L2 is in L3), you are dealing with 'data', not instructions. That's why I said "not possible" before. But it is possible to flush L1I with CLFLUSH if you have the linear address of a block of instructions.

But you have to control this carefully... Nowadays L1I is 4-way associative (I believe), and L1D is 8-way associative... This means 4 lines of L1I are loaded/unloaded at once, and 8 lines of L1D are dealt with the same way. Since each line is 64 bytes long, flushing L1D means flushing 512 bytes at once, and for L1I, 256. The processor may distinguish between the two, but I think it is unlikely (especially in x86-64 mode), since the memory model used by modern applications is flat. The same linear address points to data or code (that's why you can do an indirect near jump without considering the segment selector). That's another reason I think "it is not possible"...

Then how do UPX exes properly implement self modifying code if you can't flush the instruction cash? If clflush doesn't do it, then maybe it uses invd or wbinvd instead? These are intended to clear all the caches in the system. Invd clears the caches without completing any pending writes out of the caches back to memory. Wbinvd first completes all pending writes from the caches back to memory, and then clears the caches.

fredericopissarra · « **Reply #6 on:** March 09, 2023, 07:50:32 AM »

Quote from: ben321 on March 08, 2023, 09:48:06 PM

Then how do UPX exes properly implement self modifying code if you can't flush the instruction cash?

You don't need to clear the caches (not "cash") to make self modifying code work.
You don't need to invalidate the caches every time you write to memory, or are you saying that this:

Code: [Select]

  mov dword [ebx],0
  ; no clflush [ebx] here
  mov dword [ebx],1

Will only write 0 to [ebx] because there's no clflush [ebx] between them?

Quote from: ben321 on March 08, 2023, 09:48:06 PM

If clflush doesn't do it, then maybe it uses invd or wbinvd instead?

INVD and WBINVD are ring 0 instructions. A general protection fault (#GP) occurs if you try to use them in userland.
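
To make the point about not needing a flush concrete, here is a minimal user-mode sketch in C. It assumes an x86 or x86-64 Linux/POSIX system that permits a page mapped readable, writable and executable at once (hardened systems may refuse it). The helper name `smc_demo` is hypothetical. It writes a tiny routine into the page, calls it, patches one byte in place, and calls it again with no CLFLUSH, INVD or WBINVD in between.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: run a generated routine, patch it in place, and run
   it again without any explicit cache flush.
   Returns 0 on success, -1 if the RWX mapping is refused. */
int smc_demo(int *before, int *after)
{
    /* x86 machine code: B8 imm32 = mov eax, imm32 ; C3 = ret */
    static const unsigned char code[] = { 0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3 };
    unsigned char *buf;
    int (*fn)(void);

    assert(before != NULL && after != NULL);

    buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE | PROT_EXEC,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, code, sizeof code);
    memcpy(&fn, &buf, sizeof fn);  /* turn the data pointer into a code pointer */

    *before = fn();                /* executes: mov eax, 1 ; ret */
    buf[1] = 0x02;                 /* patch the immediate: mov eax, 2 */
    *after = fn();                 /* the modified bytes are what gets executed */

    munmap(buf, 4096);
    return 0;
}
```

On x86 the two calls return 1 and then 2, because the processor keeps instruction fetch coherent with stores made by the same core. Other architectures generally do need an explicit instruction-cache synchronisation step here.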

fredericopissarra · « **Reply #7 on:** March 09, 2023, 09:18:00 AM »

Another thing: It is highly improbable that UPX uses self-modifying code, because the .text section is READ-ONLY by default. You can test this with this function:

Code: [Select]

; test2.asm
  bits  64
  default rel

  segment .text

  global f

  align 4
f:
  xor ecx,ecx

.here:
  mov eax,1

  ; clflush [.here]         ; Makes no difference

  mov byte [.here],0xb9     ; Change 'mov eax,1' to 'mov ecx,1'.
                            ; This will cause a segfault
                            ; because this section is read-only.

  cmp eax,ecx
  jne .here
  mov eax,ecx
  ret

Code: [Select]

/* test.c */
#include <stdio.h>

extern int f( void );

int main( void ) { printf( "%d\n", f() ); }

Code: [Select]

# Makefile
CFLAGS=-O2

test: test.o test2.o

test.o: test.c

test2.o: test2.asm
	nasm -felf64 -o $@ $<

Code: [Select]

$ make
nasm -felf64 -o test2.o test2.asm
cc -O2   -c -o test.o test.c
cc   test.o test2.o   -o test

$ ./test
Segmentation fault (core dumped)

$ objdump -x test | sed -n '/Sections/,/SYMBOL/{ /.text/,+1p }'
 14 .text         00000135  0000000000001050  0000000000001050  00001050  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

fredericopissarra · « **Reply #8 on:** March 09, 2023, 04:23:44 PM »

To be clear... I'm NOT saying that self-modifying code isn't possible on modern systems... it is, once you declare your code in a different section.

.text, .data, .rodata, .bss are default sections with specific attributes (in elf32 or elf64 .text, for example, has alloc, progbits, exec, nowrite attributes). We can do the code in my previous post like this:

Code: [Select]

; test2.asm
  bits  64
  default rel

  ; A non-default section containing code (and data)...
  section myseg alloc progbits exec write align=16

  align 4
f_:
  xor ecx,ecx

.here:
  mov eax,1

  mov byte [.here],0xb9     ; Change 'mov eax,1' to 'mov ecx,1'.
                            ; No segfault this time,
                            ; because this section is writable.

  cmp eax,ecx
  jne .here
  mov eax,ecx
  ret

  ; this section is exec only (and read-only), by default.
  section .text

  global f

  align 4
f:
  jmp f_

And the code will work.
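
The NASM version above fixes the permissions at link time through section attributes. A runtime counterpart is sketched below, assuming x86 or x86-64, POSIX `mmap`/`mprotect`, a 4-byte `int`, and GCC or Clang for `__builtin___clear_cache`. The helper name `wx_call` and the `PAGE` constant are hypothetical. The page is writable while the code is generated and executable afterwards, never both at once.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE 4096                  /* assumed mapping size for this sketch */

/* Hypothetical helper: build `mov eax, value ; ret` in a page that is first
   writable, then executable, and call it.
   Returns 0 on success, -1 on failure. */
int wx_call(int value, int *result)
{
    unsigned char code[] = { 0xB8, 0x00, 0x00, 0x00, 0x00, 0xC3 };
    unsigned char *page;
    int (*fn)(void);

    assert(result != NULL);
    memcpy(code + 1, &value, sizeof value);      /* little-endian imm32 */

    page = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memcpy(page, code, sizeof code);             /* write while writable */
    if (mprotect(page, PAGE, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, PAGE);
        return -1;
    }

    /* Portable counterpart of FlushInstructionCache. It has no effect on
       targets that do not need instruction cache flushes, such as x86. */
    __builtin___clear_cache((char *)page, (char *)page + sizeof code);

    memcpy(&fn, &page, sizeof fn);               /* data pointer -> code pointer */
    *result = fn();

    munmap(page, PAGE);
    return 0;
}
```

Calling `wx_call(42, &r)` leaves 42 in `r`.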

ben321 · « **Reply #9 on:** March 12, 2023, 02:43:52 AM »

Quote from: fredericopissarra on March 09, 2023, 04:23:44 PM

To be clear... I'm NOT saying that self-modifying code isn't possible on modern systems... it is, once you declare your code in a different section.

.text, .data, .rodata, .bss are default sections with specific attributes (in elf32 or elf64 .text, for example, has alloc, progbits, exec, nowrite attributes). We can do the code in my previous post like this:

Code: [Select]

; test2.asm
  bits  64
  default rel

  ; A non-default section containing code (and data)...
  section myseg alloc progbits exec write align=16

  align 4
f_:
  xor ecx,ecx

.here:
  mov eax,1

  mov byte [.here],0xb9     ; Change 'mov eax,1' to 'mov ecx,1'.
                            ; No segfault this time,
                            ; because this section is writable.

  cmp eax,ecx
  jne .here
  mov eax,ecx
  ret

  ; this section is exec only (and read-only), by default.
  section .text

  global f

  align 4
f:
  jmp f_

And the code will work.

Actually, while that is the official configuration for those standard sections (so most linkers set them up this way by default), the real properties of each section in an EXE file are determined by the flags field in that section's header. Some linkers accept explicit instructions that override the default section attributes when building an EXE file. Some assemblers (though I don't think NASM can) may also be able to write flags into the section headers of the output COFF file, so that the linker simply carries those characteristics over into the EXE (though other linkers may override the COFF file for the standard sections and insist on making .text read-only and executable).
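
The same idea can be checked on ELF, the format shown in the objdump output earlier in the thread, rather than COFF/PE. The sketch below assumes Linux with `<elf.h>`, a 64-bit ELF file whose byte order matches the host, and a hypothetical helper name `elf64_section_flags`. It reads the section header table and returns the `sh_flags` word for a named section, which is where bits such as `SHF_WRITE` and `SHF_EXECINSTR` live.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: find section `name` in a 64-bit ELF file and store
   its sh_flags word. Returns 0 on success, -1 on error or if not found.
   Files using extended section numbering (e_shnum == 0) are not handled. */
int elf64_section_flags(const char *path, const char *name,
                        unsigned long long *flags)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    char *names = NULL;
    size_t i, names_len;
    int rc = -1;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, fp) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* Load the whole section header table. */
    sh = malloc(eh.e_shnum * sizeof *sh);
    if (sh == NULL || fseek(fp, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, fp) != eh.e_shnum)
        goto done;

    /* Load the section-name string table. */
    names_len = (size_t)sh[eh.e_shstrndx].sh_size;
    names = malloc(names_len + 1);
    if (names == NULL ||
        fseek(fp, (long)sh[eh.e_shstrndx].sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, names_len, fp) != names_len)
        goto done;
    names[names_len] = '\0';

    /* Compare each section's name against the requested one. */
    for (i = 0; i < eh.e_shnum; i++) {
        if (sh[i].sh_name < names_len &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            *flags = sh[i].sh_flags;
            rc = 0;
            break;
        }
    }

done:
    free(names);
    free(sh);
    fclose(fp);
    return rc;
}
```

For a section like the `.text` shown by objdump above (READONLY, CODE), the returned flags have `SHF_EXECINSTR` set and `SHF_WRITE` clear. A section declared with `exec write`, such as `myseg`, would be expected to have both bits set.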
