Author Topic: Question for Cyrill, moved from C headers thread  (Read 19293 times)

Offline JoeCoder

  • Jr. Member
  • *
  • Posts: 72
  • Country: by
Question for Cyrill, moved from C headers thread
« on: July 06, 2011, 11:38:43 AM »
Hi Cyrill,

A C preprocessor parser is not a hard thing (the nasm preprocessor is way more powerful), so I suspect we might consider implementing some converter inside the nasm code directly, as time permits ;)

In that case, I wonder if you could add a feature to relate a gpr to a base register for some section of code.

For example in System Z assembler:

Code: [Select]
         LA    R5,MYAREA          LOAD REGISTER 5 WITH ADDRESS OF MYAREA
         USING MYAREA,R5          TELL ASSEMBLER IT SHOULD USE R5 FOR BASE ADDRESS OF MYAREA REFERENCES
*
*  NOW MYAREA IS IN "SCOPE" AND THE ASSEMBLER WILL RESOLVE REFERENCES BY
*  USING R5 AS THE BASE REGISTER IN INSTRUCTIONS REFERENCING NAMES IN MYAREA
*
         ST    R4,MYFIELDA        ALL LOAD AND STORE IN SYS Z IS: REGISTER,DISPLACEMENT(INDEX,BASE)
*                                 SO HERE THE BASE WILL BE SET TO R5 BY THE ASSEMBLER WHEN GENERATING
*                                 THE STORE INSTRUCTION
         DROP  R5                 MYAREA NOW GOES OUT OF ASSEMBLY SCOPE
*
*  THE ASSEMBLER CANNOT RESOLVE REFERENCES TO FIELDS IN MYAREA AND THE
*  FOLLOWING INSTRUCTION WILL PRODUCE AN ASSEMBLY ERROR
*
         ST    R4,MYFIELDA        ASSEMBLY ERROR
         .
         .
MYAREA   DSECT
MYFIELDA DS    F
         .

I would like to be able to accomplish a similar function in NASM. The purpose is to make the source code cleaner by not having to specify base address and offset in the form of [eax+symbolic_offset]. Instead, it would be nice to specify symbol_name or [symbol_name] without coding an explicit base register, depending on context. We find it especially useful when a control block is mapped by a structure (DSECT) and that structure is referenced repeatedly in a section of code.

Rob suggested a macro but if I understand correctly it would have to be done *in* the assembler itself, and could not be implemented as a macro.

Do you have any suggestions? Thank you.

Sorry about the alignment, the code tags don't seem to be preserving my spaces. Looks perfect in edit, bad when posted.
« Last Edit: July 06, 2011, 12:50:13 PM by JoeCoder »
If you can't code it in assembly, it can't be coded!

Offline Cyrill Gorcunov

  • NASM Developer
  • Full Member
  • *****
  • Posts: 179
  • Country: 00
Re: Question for Cyrill, moved from C headers thread
« Reply #1 on: July 06, 2011, 06:47:23 PM »
Preprocessing C headers is a different matter; in most cases it would simply substitute some macro symbols with their nasm equivalents (note I'm not talking about semi-functions, which could be done via do { ... } while (0) constructs).

Hardcoding into nasm things which could be done via macros goes against the nasm philosophy, I think, so I agree with Rob that such things are better implemented via macro helpers.

I suspect you could do something like the following:

Code: [Select]
%define __ra(reg,off) reg + off
%define __mem_v(reg,off) [__ra(reg,off)]
%define sym_v(name) __mem_v(eax,name)
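
As a minimal sketch of how the defines above might be used (the struc `mystruc` and its field names are only illustrative, and the base register stays hard-coded as eax inside `sym_v`), each `sym_v(...)` reference should expand to an ordinary base+offset operand:

```nasm
BITS 32

; The helpers from the post, repeated so the fragment stands on its own
%define __ra(reg,off) reg + off
%define __mem_v(reg,off) [__ra(reg,off)]
%define sym_v(name) __mem_v(eax,name)

; Illustrative structure; the leading dots make the field names
; mystruc.fielda and mystruc.fieldb
struc mystruc
  .fielda resd 1        ; offset 0
  .fieldb resd 1        ; offset 4
endstruc

section .text
  mov ebx, sym_v(mystruc.fielda)   ; expands to: mov ebx, [eax + mystruc.fielda]
  mov ecx, sym_v(mystruc.fieldb)   ; expands to: mov ecx, [eax + mystruc.fieldb]
```

Changing the base register then means editing only the `sym_v` definition, although the macro call still has to be written at every reference.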

Offline JoeCoder

Re: Question for Cyrill, moved from C headers thread
« Reply #2 on: July 06, 2011, 08:03:02 PM »
I'm not sure this can be done with macros at all. At least in the assembler I use, the assembler has to be aware of it, and it's done by issuing a directive to the assembler, not by macros. What I am trying to do is to be able to use the names of any fields defined in a structure without base register references.

Sorry to be dense, but I didn't understand your examples. Can you show an example of what you are recommending to handle something like this:

Code: [Select]
struc mystruc
  fielda resd 1
  fieldb resd 1
endstruc

.
.
  lea ebx,[mystruc]       ; load address of mystruc
  mov eax,[fielda]        ; ideal, how do we get this to work without using offsets for each field?
                          ; how do we get the assembler to use ebx for all references to mystruc fields?
  mov eax,[ebx+fielda]    ; what I am trying to avoid...is it possible?

Thanks.

Offline Cyrill Gorcunov

Re: Question for Cyrill, moved from C headers thread
« Reply #3 on: July 06, 2011, 09:30:19 PM »
Why not write it as

Code: [Select]
;
; %1 dest register
; %2 struct name
; %3 field
%macro __ld_field 3
    lea ebx, [%2]           ; ebx = base address of the structure
    mov %1, [ebx + %3]      ; load the field through the base register
%endmacro

so you would call it as

Code: [Select]
__ld_field eax, mystruc, fielda

You could wrap it even more if needed.

Rob, Frank, didn't we have some struct helpers? I'm probably missing something ;)

Offline Cyrill Gorcunov

Re: Question for Cyrill, moved from C headers thread
« Reply #4 on: July 06, 2011, 09:44:59 PM »
mov eax, [ebx + %3] of course

Offline Rob Neff

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 429
  • Country: us
Re: Question for Cyrill, moved from C headers thread
« Reply #5 on: July 06, 2011, 10:28:04 PM »
There are indeed various helper macros, but what Joe is asking for is a way for Nasm to automatically handle the fact that any reference to a field in a struc be implicitly associated with the base register established via some "USING" mechanism.  The idea is not new, but its implementation in Nasm would be.  I actually like the suggestion but struggle with its ramifications.

Very simple examples:

Code: [Select]
USING R11, MYSTRUC
  .
  mov R11, 0   ; <-- Should Nasm account for this or caveat emptor?
  .
  mov  RAX, [MyFieldA]

Or how about

Code: [Select]
USING R11, MYSTRUCA
USING R12, MYSTRUCB  ; <-- Allow multiples?
  .
  mov  RAX, MyFieldX  ; <-- Defined in both MYSTRUCA & MYSTRUCB
  .

Granted, both cases are easily handled with warnings/errors, but I'm sure there are other scenarios that need to be accounted for.
Regardless, the request should probably be posted to Suggestions or filed as a Bug Tracker Feature Request, as it deserves further consideration.

Offline Cyrill Gorcunov

Re: Question for Cyrill, moved from C headers thread
« Reply #6 on: July 07, 2011, 06:26:36 AM »
Yes, better to put it into a feature request, together with an explanation of what needs to be achieved in the end and a code example.

Offline JoeCoder

Re: Question for Cyrill, moved from C headers thread
« Reply #7 on: July 07, 2011, 08:16:44 AM »
Quote from: Rob Neff on July 06, 2011, 10:28:04 PM
There are indeed various helper macros, but what Joe is asking for is a way for Nasm to automatically handle the fact that any reference to a field in a struc be implicitly associated with the base register established via some "USING" mechanism.  The idea is not new, but its implementation in Nasm would be.  I actually like the suggestion but struggle with its ramifications.

Of course the idea is not new, it's at least from the 1960s!  ;)

Actually, the USING mechanism works for code as well as data, and it has many other options. I know I could personally use it for structures in NASM, even in the small amount of code I have written so far. I don't know enough about x86 or NASM to know where else it should be used. Maybe you can think of some places.

We have a very different environment. In some ways, as I understand it, it is more similar to DOS than to modern Win/Linux/Unix, because we can only reference 4K of code or data at a time (sort of like segmentation, but not really). USINGs are an essential part of life for us because we always have to specify the base address of code or data.

But it's also essential for making the code cleaner and maintenance easier when referring to control blocks. For example, in x86, suppose you have a large control block with many references to base+fieldname and you need to change the base for some reason. Let's say you're going from x86 to x64 and you decide to use one of the new registers. Now you have to manually change every line of code. That's prone to errors, either by missing a few lines or by changing lines that shouldn't be changed. If you had an assembly directive to base all of your structure references on in a specific section of code, that huge PITA change would become a one-liner, and it would be certain to work as it should, with none of the risk associated with changing line by line.

Note, one important aspect of the directive is that it has to have a scope. USING and DROP go together. Just as you need to be able to say "at this point in the source file, start assigning the base address by using *this* register", you also have to be able to say "at this point in the source file, stop using *this* register as a base register".

Specific reasons include needing to use the register for something else at that point in the code; it's no longer pointing to your structure, so it's unsafe to let the assembler stumble over a line that got left in and assemble it incorrectly. Bad code should get flagged. Since there is always a shortage of registers in x86, it's very likely we'll use a register to map a structure and then pretty soon stop doing that and reuse the register for some other purpose, or even to map a different structure. If we don't have a way to undo the prior USING, we'll be in bad shape.
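
To make the scope idea concrete, here is a rough sketch of how far plain preprocessor defines get in NASM today. It is not the requested feature: the base still has to be written at every reference. It does give the one-line register change and an explicit end of scope. The names `ctlblk`, `CB` and `update` are made up for illustration.

```nasm
BITS 32

struc ctlblk                      ; hypothetical control block layout
  .count resd 1
  .flags resd 1
endstruc

section .text
update:
%define CB ebx                    ; plays the role of USING: CB means ebx from here on
  mov eax, [CB + ctlblk.count]
  inc eax
  mov [CB + ctlblk.count], eax
  or  dword [CB + ctlblk.flags], 1
%undef CB                         ; plays the role of DROP: CB no longer expands to ebx
  ret

; A reference such as [CB + ctlblk.count] placed after the %undef would be
; treated as a reference to a symbol named CB, which normally fails to
; assemble as long as no label with that name exists.
```

Switching the whole section to another register means changing only the `%define` line.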

Quote from: Rob Neff on July 06, 2011, 10:28:04 PM
Very simple examples:

Code: [Select]
USING R11, MYSTRUC
  .
  mov R11, 0   ; <-- Should Nasm account for this or caveat emptor?
  .
  mov  RAX, [MyFieldA]

I'm not sure what this example shows... The USING should have the area name followed by the register name, in this case USING MYSTRUC,R11. If that's what you meant, then are you asking what happens if I code an instruction and the base address of the area I am mapping got set to zero by mistake? Then yes, caveat emptor. And this is an essential part of how it should work, because you may need to map areas from zero in some cases. I don't know about x86, but in MVS we do this all the time to navigate control blocks that are defined to exist at virtual address 0.

Anyway, the whole thing has to be based on the assembler helping you and not doing something to you by trying to outsmart you. The assembler should blindly accept whatever address you assign as the base, and it would be nice if base+offset or base-offset was also an option rather than just a pure base. That would allow you to use structures to map a stack frame, which would be a super nice cleanup of negative references in code that use magic numbers and are hard to maintain. With some kind of USING/DROP support, that code can be written cleanly, increasing readability and therefore code quality.

If you had a struct that includes the length of the struct as an equate, you should be able to load a gpr with the base address, subtract the struct length equate value, and then do a USING structname,gpr. Then you reference fields in the stack frame by name, with no risk of wrong offset values. You can do that now of course, but you still have to code base+fieldname rather than just fieldname.
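
The "you can do that now" version might look roughly like the sketch below. The names `frame` and `sum_demo` and the choice of edx as the mapping register are assumptions for illustration; `frame_size` is the length symbol NASM defines at `endstruc`.

```nasm
BITS 32

struc frame                       ; hypothetical layout of the local variables
  .counter resd 1
  .total   resd 1
endstruc                          ; NASM defines frame_size (8 here) at this point

section .text
sum_demo:
  push ebp
  mov  ebp, esp
  sub  esp, frame_size            ; reserve space for the locals
  lea  edx, [ebp - frame_size]    ; base address minus the struct length
  mov  dword [edx + frame.counter], 0
  mov  dword [edx + frame.total], 0
  mov  eax, [edx + frame.total]   ; fields referenced by name, no magic offsets
  mov  esp, ebp
  pop  ebp
  ret
```

The negative offsets are gone, but every reference still spells out the base register, which is the part a USING-style directive would remove.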

Quote from: Rob Neff on July 06, 2011, 10:28:04 PM
Or how about

Code: [Select]
USING R11, MYSTRUCA
USING R12, MYSTRUCB  ; <-- Allow multiples?
  .
  mov  RAX, MyFieldX  ; <-- Defined in both MYSTRUCA & MYSTRUCB
  .

USING MYSTRUCA,R11
USING MYSTRUCB,R12  ; <-- Allow multiples?

I would have to go look and see how this works today. I know originally all field names had to be unique in an assembly. So this problem couldn't happen. Today they may allow qualified names, but I don't use them because I think it can cause confusion and if not confusing, more typing, which kind of goes against coding in assembler. If NASM already supports qualified names then the simple answer is it should continue to work like it works now. If that reference is a qualified name and it would have to be qualified in code, it should have to be qualified in your example also. I don't know why there should be any difference. Did I miss something, or does that relate to what you are talking about?
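
On the qualified-name question: NASM's struc already allows the same field name in two structures when the fields are written as local labels with a leading dot, since each one then gets the structure name as a prefix. A small sketch, with structure and field names invented for illustration:

```nasm
BITS 32

struc mystruca
  .fieldx resd 1                  ; mystruca.fieldx = 0
  .fieldy resd 1                  ; mystruca.fieldy = 4
endstruc

struc mystrucb
  .fieldy resd 1                  ; mystrucb.fieldy = 0
  .fieldx resd 1                  ; mystrucb.fieldx = 4
endstruc

section .text
  mov eax, [esi + mystruca.fieldx]   ; displacement 0
  mov edx, [edi + mystrucb.fieldx]   ; displacement 4
```

Because the reference is already qualified by the structure name, the two `fieldx` definitions do not collide.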

Quote from: Rob Neff on July 06, 2011, 10:28:04 PM
Granted, both cases are easily handled with warnings/errors, but I'm sure there are other scenarios that need to be accounted for. Regardless, the request should probably be posted to Suggestions or filed as a Bug Tracker Feature Request, as it deserves further consideration.

I'm impressed with you guys and thanks for the helpful attitudes. I am obviously not qualified to put in a formal request since I don't know enough about how NASM works (which is why the original post was "can we do this" rather than "why don't you add a feature") and I would miss important details. But if you can help me formulate a request that encompasses what I am asking for on a functional level that makes sense in NASM that would be excellent.

Thanks Cyrill and Rob!
« Last Edit: July 07, 2011, 08:18:46 AM by JoeCoder »

Offline Bryant Keller

  • Forum Moderator
  • Full Member
  • *****
  • Posts: 360
  • Country: us
    • About Bryant Keller
Re: Question for Cyrill, moved from C headers thread
« Reply #8 on: July 08, 2011, 05:57:06 AM »
This should give NASM just a hint of that MASM "flair" with structures. ;)

Code: [Select]
BITS 32

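; ASSUME base, type    -> defines base.(field) as (base) + (type.field)
; ASSUME base, NOTHING -> removes that mapping again
%imacro ASSUME 2
%ifidni %2, NOTHING
%undef %{1}.
%else
%undef %{1}.
%define %{1}.(_x_) (%{1}) + (%{2} %+ . %+ _x_)
%endif
%endmacro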

STRUC mystruc
.valueA resd 1
.valueB resd 1
ENDSTRUC

STRUC mytype
.valueB resd 1
.valueA resd 1
ENDSTRUC

SECTION .data

msX:
ISTRUC mystruc
AT mystruc.valueA, DD 1
AT mystruc.valueB, DD 2
IEND

SECTION .text


GLOBAL _start
_start: nop
; works with labels
ASSUME msX, mystruc
mov eax, [msX.(valueA)]
mov ebx, [msX.(valueB)]
ASSUME msX, NOTHING
; and works with registers
mov edi, msX
ASSUME edi, mytype
mov [edi.(valueA)], eax
mov [edi.(valueB)], ebx
ASSUME edi, nothing
xor eax, eax
xor ebx, ebx
inc eax
int 0x80

About Bryant Keller
bkeller@about.me

Offline JoeCoder

Re: Question for Cyrill, moved from C headers thread
« Reply #9 on: July 12, 2011, 08:13:32 AM »
Thanks, Bryant. I'll try to understand what you did here. Sorry for the late reply, was on the road for work.