Topic: How do you refresh the instruction cache in 16bit programs?

Offline ben321

  • Full Member
  • **
  • Posts: 182
How do you refresh the instruction cache in 16bit programs?
« on: February 28, 2023, 09:33:35 PM »
I know that in 32bit Windows (which uses 32bit protected mode) you call the Windows API function FlushInstructionCache. This forces the CPU to clear the instruction cache, and should be called immediately after any code to be executed gets modified (for self modifying code, such as compressed EXE files). How do you do the same thing in 16bit real mode (or for that matter in 16bit protected mode)?

Offline debs3759

  • Global Moderator
  • Full Member
  • *****
  • Posts: 221
  • Country: gb
    • GPUZoo
Re: How do you refresh the instruction cache in 16bit programs?
« Reply #1 on: February 28, 2023, 10:06:25 PM »
I didn't know you could do that. The only thing I can think of would be to execute NOPs for as many cycles as needed. I use code modifiers as part of my CPUID code to differentiate between 8088 and 8086.
My graphics card database: www.gpuzoo.com

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Re: How do you refresh the instruction cache in 16bit programs?
« Reply #2 on: February 28, 2023, 11:16:02 PM »
CLFLUSH is not a privileged instruction; it works at any privilege level, and under MS-DOS you are effectively running in ring 0 anyway.
Unless you are using a very old processor (before the Pentium 4), CLFLUSH is available.

BUT flushing only the L1I cache, I believe, isn't possible.
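
A minimal sketch, in C, of how one might check for CLFLUSH before relying on it. As far as I know the feature flag is CPUID leaf 1, EDX bit 19, and EBX bits 15:8 give the flush line size in 8-byte units. The helper names are made up for illustration, and `__get_cpuid` from `<cpuid.h>` is a GCC/Clang facility that nobody in this thread mentioned; in a 16bit program you would issue CPUID yourself and test the same bits.

```c
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not from any library. */

/* CPUID.01H:EDX bit 19 is the CLFSH feature flag. */
int has_clflush(uint32_t edx)
{
    return (int)((edx >> 19) & 1u);
}

/* CPUID.01H:EBX bits 15:8 hold the CLFLUSH line size in 8-byte units. */
unsigned clflush_line_bytes(uint32_t ebx)
{
    return ((ebx >> 8) & 0xFFu) * 8u;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/* Returns the CLFLUSH line size in bytes, or 0 if CLFLUSH is not reported. */
unsigned query_clflush_line_bytes(void)
{
    unsigned eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                       /* CPUID leaf 1 not available */
    return has_clflush(edx) ? clflush_line_bytes(ebx) : 0;
}
#endif
```

The decoding is kept separate from the CPUID call so the bit tests can be reused with register values obtained some other way.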
« Last Edit: February 28, 2023, 11:20:16 PM by fredericopissarra »

Offline ben321

Re: How do you refresh the instruction cache in 16bit programs?
« Reply #3 on: March 08, 2023, 10:54:44 AM »
Interesting. I think that on most x86 CPUs CLFLUSH might be enough. How do most compressed EXEs handle this (such as those compressed by the program UPX)? Or do they just assume the cache was updated and never call the FlushInstructionCache Windows API function?

Offline fredericopissarra

Re: How do you refresh the instruction cache in 16bit programs?
« Reply #4 on: March 08, 2023, 11:10:11 AM »
The "instruction" cache is the L1I cache. The first level (L1) is close to the "processor" and it is divided into 2 blocks: L1I and L1D, typically 32 KiB each. CLFLUSH will affect "lines" of the L1 cache (L1D or L1I), I believe, but I'm not sure.

L2 and L3 caches have no such division (there, everything is data). Since, for Intel processors, everything in the L1 cache is necessarily also in the L2 cache (and everything in L2 is in L3), you are dealing with 'data', not instructions. That's why I said "not possible" before. But it is possible to flush L1I with CLFLUSH if you have the linear address of a block of instructions.

But you have to control this carefully... Nowadays L1I is 4-way associative (I believe), and L1D is 8-way associative... Associativity only says how many lines share one set; it does not make CLFLUSH work on 4 or 8 lines at once. Each line is 64 bytes long and CLFLUSH flushes the one line that contains the given address, so a block of instructions needs one CLFLUSH per 64 bytes. The processor could, in principle, distinguish code lines from data lines here, but I think that is unlikely (especially in x86-64 mode), since the memory model used for modern applications is flat. The same linear address points to data or code (that's why you can do an indirect near jump without considering the segment selector). That's another reason I think flushing only L1I "is not possible"...
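
The cache arithmetic can be sketched in C as below. This is a textbook model, not a description of any particular CPU: it assumes the set index is taken straight from the address, and the function names are hypothetical. As I understand it, associativity gives the number of lines per set, while CLFLUSH acts on the single line containing the address, so a code block needs one flush per line it touches.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of sets = total size / (ways * line size). */
size_t cache_sets(size_t cache_bytes, size_t ways, size_t line_bytes)
{
    return cache_bytes / (ways * line_bytes);
}

/* Set that an address maps to in this simplified model. */
size_t cache_set_index(uintptr_t addr, size_t line_bytes, size_t sets)
{
    return (size_t)((addr / line_bytes) % sets);
}

/* Cache lines touched by the byte range [addr, addr + len):
   this is how many CLFLUSH instructions the range would need. */
size_t lines_touched(uintptr_t addr, size_t len, size_t line_bytes)
{
    if (len == 0)
        return 0;
    return (size_t)((addr + len - 1) / line_bytes - addr / line_bytes + 1);
}
```

With the numbers from the post, `cache_sets(32 * 1024, 8, 64)` gives 64 sets and `cache_sets(32 * 1024, 4, 64)` gives 128. An 8-byte instruction sequence starting at 0x103C straddles two lines, so `lines_touched(0x103C, 8, 64)` is 2.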

Offline ben321

Re: How do you refresh the instruction cache in 16bit programs?
« Reply #5 on: March 08, 2023, 09:48:06 PM »
Then how do UPX exes properly implement self modifying code if you can't flush the instruction cash? If clflush doesn't do it, then maybe it uses invd or wbinvd instead? These are intended to clear all the caches in the system. Invd clears the caches without completing any pending writes out of the caches back to memory. Wbinvd first completes all pending writes from the caches back to memory, and then clears the caches.

Offline fredericopissarra

Re: How do you refresh the instruction cache in 16bit programs?
« Reply #6 on: March 09, 2023, 07:50:32 AM »
Then how do UPX exes properly implement self modifying code if you can't flush the instruction cash?
You don't need to clear the caches (not "cash") to make self modifying code work.
You don't need to invalidate the caches every time you write to memory, or are you saying this:
Code: [Select]
  mov dword [ebx],0
  ; no clflush [ebx] here
  mov dword [ebx],1
Will only write 0 to [ebx] because there's no clflush [ebx] between them?

If clflush doesn't do it, then maybe it uses invd or wbinvd instead? [...]
INVD and WBINVD are ring 0 instructions. A general protection fault (#GP) occurs if you try to use them in userland.
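
A minimal sketch of the same point in C, assuming an x86 or x86-64 POSIX system that still allows writable+executable anonymous mappings (some hardened systems refuse them). `smc_demo` is a hypothetical name. It runs `mov eax,1 / ret`, patches the immediate, and runs it again with no CLFLUSH, INVD or WBINVD in between; as far as I know x86 keeps the instruction cache coherent with stores, and the call itself is the branch that very old CPUs needed to refill their prefetch queue.

```c
#define _DEFAULT_SOURCE                 /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <string.h>

#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__unix__) || defined(__APPLE__))
#include <sys/mman.h>
#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif
#define SMC_DEMO_SUPPORTED 1
#else
#define SMC_DEMO_SUPPORTED 0
#endif

/* Returns 0 and fills *before / *after on success,
   -1 if the platform or the mapping is not available. */
int smc_demo(int *before, int *after)
{
#if SMC_DEMO_SUPPORTED
    static const unsigned char code[] = {
        0xB8, 0x01, 0x00, 0x00, 0x00,   /* mov eax, 1 */
        0xC3                            /* ret        */
    };
    unsigned char *buf = mmap(NULL, 4096,
                              PROT_READ | PROT_WRITE | PROT_EXEC,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;                      /* e.g. W+X mappings forbidden */

    memcpy(buf, code, sizeof code);
    int (*fn)(void) = (int (*)(void))buf;

    *before = fn();                     /* runs the original bytes */
    buf[1] = 0x02;                      /* patch immediate: mov eax, 2 */
    *after = fn();                      /* runs the patched bytes, no flush */

    munmap(buf, 4096);
    return 0;
#else
    (void)before;
    (void)after;
    return -1;
#endif
}
```

On other architectures (ARM, for example) an explicit cache maintenance step is needed after patching, which is why the sketch refuses to run there instead of guessing.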

« Last Edit: March 09, 2023, 08:39:26 AM by fredericopissarra »

Offline fredericopissarra

Re: How do you refresh the instruction cache in 16bit programs?
« Reply #7 on: March 09, 2023, 09:18:00 AM »
Another thing: It is highly improbable that UPX uses self-modifying code, because the .text section is READ-ONLY by default. You can test this with this function:
Code: [Select]
; test2.asm
  bits  64
  default rel

  segment .text

  global f

  align 4
f:
  xor ecx,ecx

.here:
  mov eax,1

  ; clflush [.here]         ; Makes no difference

  mov byte [.here],0xb9     ; Change 'mov eax,1' to 'mov ecx,1'.
                            ; This will cause a segfault
                            ; because this section is read-only.

  cmp eax,ecx
  jne .here
  mov eax,ecx
  ret
Code: [Select]
/* test.c */
#include <stdio.h>

extern int f( void );

int main( void ) { printf( "%d\n", f() ); }
Code: [Select]
# Makefile
CFLAGS=-O2

test: test.o test2.o

test.o: test.c

test2.o: test2.asm
	nasm -felf64 -o $@ $<
Code: [Select]
$ make
nasm -felf64 -o test2.o test2.asm
cc -O2   -c -o test.o test.c
cc   test.o test2.o   -o test

$ ./test
Segmentation fault (core dumped)

$ objdump -x test | sed -n '/Sections/,/SYMBOL/{ /.text/,+1p }'
 14 .text         00000135  0000000000001050  0000000000001050  00001050  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

Offline fredericopissarra

Re: How do you refresh the instruction cache in 16bit programs?
« Reply #8 on: March 09, 2023, 04:23:44 PM »
To be clear... I'm NOT saying that self modifying code isn't possible on modern systems... it is, once you put your code in a different, writable section.

.text, .data, .rodata, .bss are default sections with specific attributes (in elf32 or elf64, .text, for example, has the alloc, progbits, exec, nowrite attributes). We can rewrite the code from my previous post like this:
Code: [Select]
; test2.asm
  bits  64
  default rel

  ; A non-default section containing code (and data)...
  section myseg alloc progbits exec write align=16

  align 4
f_:
  xor ecx,ecx

.here:
  mov eax,1

  mov byte [.here],0xb9     ; Change 'mov eax,1' to 'mov ecx,1'.
                            ; This works here because myseg
                            ; is writable as well as executable.

  cmp eax,ecx
  jne .here
  mov eax,ecx
  ret

  ; this section is exec only (and read-only), by default.
  section .text

  global f

  align 4
f:
  jmp f_
And the code will work.
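
To see what the assembler actually wrote, you can look at the section's `sh_flags` (for example with `readelf -S`). Below is a small C sketch that turns those flags into readelf-style letters. The three constants are the standard ELF values for SHF_WRITE, SHF_ALLOC and SHF_EXECINSTR; the `DEMO_` names and the function itself are hypothetical.

```c
#include <stdint.h>

/* Standard ELF section flag values, under illustrative names. */
enum {
    DEMO_SHF_WRITE     = 0x1,
    DEMO_SHF_ALLOC     = 0x2,
    DEMO_SHF_EXECINSTR = 0x4
};

/* Writes up to three letters plus a terminating NUL into out (4 bytes). */
void elf_flags_letters(uint64_t sh_flags, char out[4])
{
    int n = 0;

    if (sh_flags & DEMO_SHF_WRITE)     out[n++] = 'W';
    if (sh_flags & DEMO_SHF_ALLOC)     out[n++] = 'A';
    if (sh_flags & DEMO_SHF_EXECINSTR) out[n++] = 'X';
    out[n] = '\0';
}
```

A default .text (alloc, exec, nowrite) has flags 0x6 and decodes to "AX"; a section declared with alloc, exec and write has 0x7 and decodes to "WAX".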
« Last Edit: March 09, 2023, 04:25:23 PM by fredericopissarra »

Offline ben321

Re: How do you refresh the instruction cache in 16bit programs?
« Reply #9 on: March 12, 2023, 02:43:52 AM »
Actually, while that is how those standard sections are officially supposed to be configured (so most linkers configure them this way by default), the real properties of each section in an EXE file are determined by the flags field in that section's header. Some linkers may even accept specific instructions to override the default behavior of the sections when making an EXE file, and some assemblers (though I don't think NASM can) may be able to write flags to the section headers in the output COFF file so that the linker just uses the characteristics from the object file when writing the EXE (though other linkers may override the COFF file for the standard sections and insist on making .text readonly+executable).
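
A rough C sketch of what that flags field looks like in a PE/COFF section header. The four values are the documented Characteristics bits (code, execute, read, write); the `DEMO_` names and both functions are hypothetical, and this only manipulates the 32-bit value, it does not read or patch a real file.

```c
#include <stdint.h>

/* PE/COFF section Characteristics bits, under illustrative names. */
#define DEMO_SCN_CNT_CODE    0x00000020u
#define DEMO_SCN_MEM_EXECUTE 0x20000000u
#define DEMO_SCN_MEM_READ    0x40000000u
#define DEMO_SCN_MEM_WRITE   0x80000000u

/* 1 if the section is both executable and writable, else 0. */
int pe_section_is_writable_code(uint32_t characteristics)
{
    return (characteristics & DEMO_SCN_MEM_EXECUTE) &&
           (characteristics & DEMO_SCN_MEM_WRITE);
}

/* Returns the characteristics with the write bit added. */
uint32_t pe_section_add_write(uint32_t characteristics)
{
    return characteristics | DEMO_SCN_MEM_WRITE;
}
```

A typical .text section has 0x60000020 (code + execute + read). Adding the write bit gives 0xE0000020, which is the kind of value you would expect to see in the section header of a packed EXE whose code section gets rewritten at run time.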