Author Topic: Instructions you should avoid...  (Read 2578 times)

Offline fredericopissarra

  • Full Member
  • **
  • Posts: 368
  • Country: br
Instructions you should avoid...
« on: June 18, 2023, 01:40:52 PM »
Old tutorials, especially those for the 8086, use LOOP, XLAT, and instructions like that. I recommend you don't use them. Why? Because they are SLOW on many modern processors. This loop:
Code: [Select]
  mov ecx,10
.loop:
  call doSomething
  loop .loop
Works, but is slower than this one:
Code: [Select]
  mov ecx,10
.loop:
  call doSomething
  dec ecx
  jne .loop
This holds even though some processors impose a penalty on DEC: it leaves CF untouched, so its result has to be merged into the existing EFLAGS (a read-modify-write of the flags). On those processors, this version is a little bit faster still:
Code: [Select]
  mov ecx,10
.loop:
  call doSomething
  sub ecx,1
  jne .loop
The same warning applies to XLAT and XLATB. For those not familiar with these instructions, they take BX (or EBX) as the base of an array and AL as the index (yep, 8 bits!), and load the selected byte into AL. If you want to get the 3rd byte from an array you could do:
Code: [Select]
  lea ebx,[array]
  mov al,2           ; remember, offsets start at 0!
  xlatb
But, of course, this is faster and takes less code space:
Code: [Select]
  mov al,[array+2]
That's why high-level compilers (like GCC) leave a lot of the available instructions unused: because they are slow or take a lot of space.
Ahhh... if you still want to use AL as index, you can always do something like this:
Code: [Select]
  movzx eax,al
  mov al,[array+eax]
Still faster than using XLATB (and you don't need EBX). And, yep, unless you are using a pre-386 processor, this is valid in real mode as well...
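For comparison, here is a minimal C sketch of the same byte-table lookup. The names table, init_table and lookup are made up for illustration, and the table contents are arbitrary. With optimizations enabled, a compiler like GCC will typically zero-extend the index and use a single indexed load, much like the MOVZX version, but the exact output depends on the compiler version and flags, so check the generated assembly yourself.

```c
#include <stdint.h>

/* Hypothetical 256-entry table, for illustration only: table[i] = 255 - i. */
static uint8_t table[256];

/* Fill the table once before any lookup. */
static void init_table(void)
{
    for (int i = 0; i < 256; i++)
        table[i] = (uint8_t)(255 - i);
}

/* Same job as XLATB: an 8-bit index selects one byte of the table. */
static uint8_t lookup(uint8_t index)
{
    return table[index];
}
```

Because the index is a uint8_t, it can never run past the 256-byte table, which is the same guarantee the 8-bit AL index gives XLATB.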
« Last Edit: June 18, 2023, 01:49:44 PM by fredericopissarra »

Offline Tobiasz Stamborski

  • Jr. Member
  • *
  • Posts: 7
Re: Instructions you should avoid...
« Reply #1 on: July 13, 2023, 05:21:44 PM »
Good to finally read about some practical optimizations in asm code. :) However, I already knew these, precisely from looking at the asm code produced by GCC.

If I could ask something: on how new or old a processor do these all run faster? Are these good optimizations for, let's say, a Pentium II?

Offline fredericopissarra

Re: Instructions you should avoid...
« Reply #2 on: July 13, 2023, 07:19:38 PM »
If I could ask something: on how new or old a processor do these all run faster? Are these good optimizations for, let's say, a Pentium II?
Intel publishes a manual about code optimization (alongside the Intel SDM). See here: Intel SDM site. There is also a good reference for instruction timings on some processors on this other site.

Offline fredericopissarra

Re: Instructions you should avoid...
« Reply #3 on: July 14, 2023, 03:40:34 PM »
About optimizations on old processors...

The old 386 didn't have an on-chip cache, and access to memory went through a bus of 16 bits (386SX) or 32 bits (386DX). To access unaligned data (or an unaligned instruction) the processor has to make 2 accesses. This is also true for older processors like the 8088, 8086, 80186 (yep, it existed) and 80286 (with a bus of 8 bits [8088] or 16 bits [8086 to 286]). The 80486 is essentially a 386 that incorporated the 80387 (in the DX models), gained a small on-chip cache and got a pipelined core.
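To make the "2 accesses" point concrete, here is a rough C sketch that counts how many bus-wide chunks a memory operand touches. It is a simplified model, not a cycle-accurate description of any of these processors: bus_accesses is a made-up helper, and it assumes a non-zero size and a bus that always transfers one naturally aligned chunk of bus_bytes bytes.

```c
#include <stdint.h>

/* Simplified model (an assumption, not real bus timing): the bus moves one
   naturally aligned chunk of bus_bytes per access. Returns how many chunks
   an operand of `size` bytes (size >= 1) starting at `addr` touches. */
static unsigned bus_accesses(uint32_t addr, uint32_t size, uint32_t bus_bytes)
{
    uint32_t first_chunk = addr / bus_bytes;              /* chunk holding the first byte */
    uint32_t last_chunk  = (addr + size - 1) / bus_bytes; /* chunk holding the last byte  */
    return (unsigned)(last_chunk - first_chunk + 1);
}
```

In this model a dword at address 0 on a 32-bit bus takes 1 access, while the same dword at address 1 takes 2, which is why those old tutorials insist on aligning data.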

The first Pentium is, essentially, a 486 with 2 execution pipelines in a superscalar design: on top of the usual pipelining (while reading an instruction from memory the processor is decoding the previous one and executing the one before that), it can issue two instructions at once. On the Pentium the two pipelines are called U and V, where the V pipe is a "sub 486" (capable of executing only some instructions)... The optimization here was to keep both pipes full... This stayed more or less the same until the P6 family (Pentium Pro, Pentium II, Pentium III) and the Pentium 4, where more units were added and the processor decides by itself which instructions will be executed in which unit, reordering them.

Find and read "Zen of Code Optimization" (by Michael Abrash). This is a good read for old processors.

So, there are many optimization rules for those processors (each one a little bit different from the other). That's why compilers like GCC have the -march= option: to apply the rules for a specific processor.

For instance, until the Sandy Bridge microarchitecture, INC/DEC was 1 cycle slower than the ADD/SUB equivalents, so GCC uses ADD/SUB unless you target a newer processor with -march=native or a specific architecture.

In ALL microarchitectures the DIV/IDIV instructions are slower than MUL/IMUL, so GCC prefers a trick that performs integer division by a constant divisor using a multiplication instead.
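As a small illustration of that trick, here is a C sketch of the unsigned 32-bit case for a divisor of 10. The function name div10 is made up, and this is the textbook multiply-and-shift form, not necessarily the exact sequence GCC emits for every target. The constant 0xCCCCCCCD is 2^35 / 10 rounded up, so multiplying by it and shifting right by 35 bits gives the same result as x / 10 for every 32-bit x.

```c
#include <stdint.h>

/* Unsigned 32-bit division by 10 without a DIV instruction.
   0xCCCCCCCD == ceil(2^35 / 10); taking the 64-bit product and
   shifting it right by 35 bits yields x / 10 for all 32-bit x. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

The multiplier and shift count depend on the divisor and on whether the operands are signed, so each constant divisor gets its own pair of values.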

But, granted: optimization, nowadays, is a hard task, and manually crafted assembly functions tend to be slower than the same routine written in C (not C++!) with compiler optimizations enabled.

[]s
Fred
« Last Edit: July 14, 2023, 03:46:20 PM by fredericopissarra »

Offline Tobiasz Stamborski

Re: Instructions you should avoid...
« Reply #4 on: July 14, 2023, 07:24:47 PM »
In Abrash's "Graphics Programming Black Book" I have found a few chapters about Pentium optimizations too. However, using both execution units simultaneously already seems tricky enough. I can't imagine how hard it must be to master optimization for today's processors.

Before reading it I thought that, in most cases, shorter code runs faster. Now I know that isn't always true, because sometimes it's possible to execute two instructions at the same time.

I know about memory banks and memory alignment. I have read about this in Randall Hyde's "The Art of Assembly Language".

Offline alCoPaUL

  • Jr. Member
  • *
  • Posts: 68
  • Country: ph
Re: Instructions you should avoid...
« Reply #5 on: July 15, 2023, 11:40:03 PM »
... (should be on the other thread)
« Last Edit: July 15, 2023, 11:41:38 PM by alCoPaUL »