Topic: Why does NASM require 'extern'? (Read 23652 times)

nobody

  • Guest
Why does NASM require 'extern'?
« on: August 06, 2008, 09:01:52 PM »
Why does NASM require code that uses functions from other files to use 'extern'? Other assemblers, like GAS, don't. Shouldn't it only be an error when linking?

Frank Kotler

  • NASM Developer
« Reply #1 on: August 07, 2008, 12:39:13 AM »
Same reason Nasm has operand-order "backwards", I guess: Nasm is not Gas.

Why would you *want* to wait until link-time to discover you've made an error???

Best,
Frank

nobody

  • Guest
« Reply #2 on: August 08, 2008, 12:20:20 AM »
Is there an option to make 'extern' not required?

Frank Kotler

  • NASM Developer
« Reply #3 on: August 08, 2008, 01:01:15 AM »
Use (G)as, I guess...

Best,
Frank

alexfru
« Reply #4 on: February 09, 2014, 12:16:43 PM »
Actually, not requiring extern may be useful.

For example, I'm working on a simple C compiler (http://github.com/alexfru/SmallerC), and in C, as you know, either of the following two declarations should be sufficient to tell the compiler that a function foo() exists somewhere, later in the same C file or in a different one:

int foo(int bar);
extern int foo(int bar);

If foo() is in a different C file and I don't generate "extern" in the assembly code, the code won't assemble, because _foo will be undefined.

If foo() is later in the same C file and I do generate "extern" in the assembly code, the code won't assemble either, because _foo will be both declared extern and defined in the same file.

So NASM requires me either to choose carefully between "int foo(int bar);" and "extern int foo(int bar);" (currently they aren't equivalent in my compiler: the extern form emits "extern _foo" in the generated assembly code, while the non-extern form emits nothing), or to maintain an additional table within the compiler and append a batch of "extern _foo" lines for functions and objects not defined within the C file.

I think defaulting undefined symbols to external (for example, via an option) may be a useful feature.
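The "additional table" workaround mentioned above could look roughly like the C sketch below. It is a hypothetical illustration, not code from SmallerC: the function names, the fixed table sizes and the leading-underscore naming (borrowed from the `_foo` examples) are all assumptions. Every declaration, with or without `extern`, goes through the same call; a definition marks the entry; once the whole C file has been compiled, an `extern` line is produced only for names that were declared but never defined.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-file symbol table; names and limits are illustrative. */
#define MAX_SYMS 256
#define MAX_NAME 64

struct sym {
    char name[MAX_NAME];
    int defined;                /* set once a definition has been compiled */
};

static struct sym syms[MAX_SYMS];
static int nsyms;

/* Return the entry for NAME, creating it on first sight. */
static struct sym *lookup(const char *name)
{
    for (int i = 0; i < nsyms; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    assert(nsyms < MAX_SYMS && "symbol table full");
    struct sym *s = &syms[nsyms++];
    snprintf(s->name, sizeof s->name, "%s", name);
    s->defined = 0;
    return s;
}

/* Same call for "int foo(int bar);" and "extern int foo(int bar);". */
void note_declared(const char *name) { lookup(name); }

/* Called when a function body (or an object definition) is emitted. */
void note_defined(const char *name) { lookup(name)->defined = 1; }

/* After the whole C file is compiled, write one "extern _name" line for
   every symbol that was declared but never defined in this file.
   A real compiler would write straight to its output file; a buffer
   just keeps the sketch easy to check. */
void emit_externs(char *buf, size_t cap)
{
    size_t len = 0;
    assert(cap > 0);
    buf[0] = '\0';
    for (int i = 0; i < nsyms; i++) {
        if (syms[i].defined)
            continue;
        int n = snprintf(buf + len, cap - len, "extern _%s\n", syms[i].name);
        assert(n > 0 && (size_t)n < cap - len && "output buffer too small");
        len += (size_t)n;
    }
}
```

With this in place, `int foo(int bar);` and `extern int foo(int bar);` can be treated identically: the decision about whether `_foo` needs an `extern` is deferred until the end of the file, when it is known whether a definition appeared.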

encryptor256
« Reply #5 on: February 09, 2014, 12:57:42 PM »
> For example, I'm working on a simple C compiler (http://github.com/alexfru/SmallerC)

Nice, I hope you will finish it.

> I think defaulting undefined symbols to external (for example, via an option) may be a useful feature.

Yes, it might be a useful feature/option.

Rob Neff

  • Forum Moderator
« Reply #6 on: February 09, 2014, 05:15:37 PM »
If you do not define a variable/function as extern then the assembler, upon finding an undefined label, emits an error in the current source file.  This is perfectly valid reasoning for error checking. 

When developing a compiler you need to ask yourself this: How much memory space might be required? Was said label to be contained in the .bss, .text, .const, or .data section?  Is it 1, 2, 4, 8, or more bytes in size?  What machine opcodes are permissible?  Are segment overrides required?  Can it be optimized?

That extern directive is vital in obtaining the memory size, alignment requirements, and permissible opcode generation of that label reference.  Conversely, the global directive is the other half of the equation: to make known that the defined label can/will be referenced by other modules.  Thus, together, they define a standard procedure that helps the linker properly resolve addresses.  When using assembly languages it is your responsibility to tell the assembler all these things.  That is the point of using assembly: to have complete control of the generated machine instructions.

Sure, for function references, you could say that the address size and alignment are constant.  However, you can only detect this scenario when parsing a call instruction.  You can't know at assembly time that the instruction "mov eax, somelabel" is a function reference (i.e., moving a pointer to a function into a register) rather than a reference to a data variable if the label has not been defined either in the source file currently being parsed or as an external reference.  Thus, you need to use the extern directive.

Perhaps you made a typo?  You defined a variable named _My_Var but in the source file you typed my_var.  Assemblers and compilers catch this.  Do you really want to attempt to defer this to the linker?

Statically typed languages such as C and Pascal also require this information during the compile phase to ensure proper memory allocation, alignment, location, and opcode generation.  Dynamically typed languages such as Python or Ruby can infer these requirements during runtime.  However, there is a cost in execution speed when using dynamically typed languages, since they have to figure out what it is you are attempting to accomplish (and not always successfully or even optimally).

I hope that gives you some food for thought.

alexfru
« Reply #7 on: February 09, 2014, 06:33:51 PM »
> If you do not define a variable/function as extern then the assembler, upon finding an undefined label, emits an error in the current source file.  This is perfectly valid reasoning for error checking.

It is a possible and rather common behavior. However, it's not the only possible one. (g)as has already been mentioned as an example of where extern is not required or can be optionally omitted.

> When developing a compiler you need to ask yourself this: How much memory space might be required? Was said label to be contained in the .bss, .text, .const, or .data section?  Is it 1, 2, 4, 8, or more bytes in size?  What machine opcodes are permissible?  Are segment overrides required?  Can it be optimized?

A simple line like "extern _foo" does not provide me with any of the information you're talking about. In a simple case like mine, this line is merely a whim, nothing more.

> Perhaps you made a typo?  You defined a variable named _My_Var but in the source file you typed my_var.  Assemblers and compilers catch this.  Do you really want to attempt to defer this to the linker?

I do. I wish to be able to shoot all feet and what have you off within ten yards of here. In C, you get Undefined Behavior when, for example, you define foo to be a function in one translation unit but use it as some other type, say, an external array of char, in another. Extern can't prevent this. Extern can't prevent duplicate definitions of the same symbol in the entire program either. So, why bother? If there's a mistake, either the program won't link or it won't work. I'm fine with both outcomes.

> That is the point of using assembly: to have complete control of the generated machine instructions.

Assembly language programming also comes with responsibility. I'll fix the code if it doesn't assemble, link or work. I see no problem here.

> I hope that gives you some food for thought.

Not really, there's nothing I haven't thought of before, but thanks for your consideration.
« Last Edit: February 09, 2014, 06:45:51 PM by alexfru »

Rob Neff

  • Forum Moderator
« Reply #8 on: February 09, 2014, 07:33:39 PM »
As the compiler writer you are obviously free to make all the design choices you want.  Scratch your own itch, so to speak.  Do you have a target audience other than yourself for your compiler?

It sounds like your compiler will output assembly rather than object code, i.e., you have a dependency on an existing assembler as the back-end to create the object files.  If your output targets Nasm, then you play by its rules.  If you don't particularly like them, you can either modify its source to your liking or target gas instead, as seems to be your preference.

Personally, when developing assembly with Nasm, I like the global/extern mechanism.  The NASM-X macro set uses it extensively.  I want to know about undefined labels during assembly, not linkage. To each his own...

alexfru
« Reply #9 on: February 09, 2014, 09:12:49 PM »
There are several possible target audiences:
- MS/whatever-DOS lovers wanting a simple free open-source compiler (I'm extending the existing choice of Turbo/Borland C/C++, Open Watcom C/C++, DJGPP gcc, etc.; all of the mentioned compilers have their issues)
- x86 OS development hobbyists (again, some 16-bit code is needed there, and as popular and powerful as gcc may be, it just won't cut it; also, the compiler is very easy to recompile/port; I hope to eventually add enough stuff to make it self-sufficient, but I do depend on NASM now)
- RetroBSD folks (currently, mine is the best C compiler that can fit in the 96KB of the MIPS CPU's on-chip RAM; the only other, inferior, alternative is Small C by Cain/Hendrix, but it is way too limited and uses K&R syntax)
- etc

(g)as is not my preference. I strongly dislike it. But that's the kind of assembler available on RetroBSD (NASM doesn't target MIPS) and its approach to handling undefined/external symbols is about the only thing I'd like in NASM.

Rob Neff

  • Forum Moderator
« Reply #10 on: February 09, 2014, 10:19:00 PM »
Unless you are dead set on creating your own compiler, have you thought of hacking on PCC as an alternative?  I've not used PCC myself, but from my limited knowledge it was BSD-sponsored, and there appears to be support for your targeted platforms, although the docs imply that ARM, MIPS and PowerPC support is experimental.  Perhaps some effort there can move it up from experimental status as well as provide the additional feature(s) you require.

alexfru
« Reply #11 on: February 10, 2014, 12:04:31 AM »
You bet, I am. :) I want to keep developing mine and support in it as much of C89/ANSI C as possible, maybe a little more, maybe a little less. Just see how much is already done (the wiki page will tell you, if it hasn't yet). Struct/union support is nearly done, too. Most of the hard work is in. It's not long until it's usable and useful, heck, it compiles itself today. I don't want to drop it now. And I value the experience of this exercise (it's my 1st compiler, after all). But we are digressing.

Bryant Keller

  • Forum Moderator
« Reply #12 on: February 11, 2014, 06:36:07 PM »
Just to put this out there, NASM-X already removes the need to declare EXTERNs manually. That's been there since the first version. The NASM-X INVOKE macro checks whether a procedure name is already defined; if not, the macro inserts an EXTERN on the fly. The reason for this is the same as for assemblers like GAS and GoASM: using EXTERN bloats your object files. Okay, so that's not entirely true. When NASM sees an EXTERN, an entry is put into the symbol table, and that entry ends up in the object file. If the code never actually uses that external symbol, the entry is still there. This means that if I did something like:

Code: ("bloat.asm")
BITS 32
EXTERN a_proc
EXTERN b_proc
EXTERN c_proc
EXTERN d_proc
EXTERN e_proc
EXTERN f_proc
EXTERN g_proc
EXTERN h_proc
EXTERN i_proc
EXTERN j_proc
EXTERN k_proc
EXTERN l_proc
EXTERN m_proc
EXTERN n_proc
EXTERN o_proc
EXTERN p_proc
EXTERN q_proc
EXTERN r_proc
EXTERN s_proc
EXTERN t_proc
EXTERN u_proc
EXTERN v_proc
EXTERN w_proc
EXTERN x_proc
EXTERN y_proc
EXTERN z_proc

GLOBAL _start
SECTION .text
_start:
   mov eax, 1
   mov ebx, 0
   int 0x80

Code: ("slim.asm")
BITS 32

GLOBAL _start
SECTION .text
_start:
   mov eax, 1
   mov ebx, 0
   int 0x80

Now I assemble and link these exactly the same way.

Code: (xterm)
bryant@desktop:~/Projects/bloat$ ls
bloat.asm  slim.asm
bryant@desktop:~/Projects/bloat$ for i in *.asm ; do nasm -f elf $i; done
bryant@desktop:~/Projects/bloat$ for i in *.o ; do ld $i -o ${i%.*}; done
bryant@desktop:~/Projects/bloat$ ls -lh
total 24K
-rwxr-xr-x 1 bryant bryant 1.1K Feb 11 13:19 bloat
-rw-r--r-- 1 bryant bryant  450 Feb 11 13:12 bloat.asm
-rw-r--r-- 1 bryant bryant 1.0K Feb 11 13:19 bloat.o
-rwxr-xr-x 1 bryant bryant  497 Feb 11 13:19 slim
-rw-r--r-- 1 bryant bryant   84 Feb 11 13:12 slim.asm
-rw-r--r-- 1 bryant bryant  432 Feb 11 13:19 slim.o
bryant@desktop:~/Projects/bloat$

As you can see in the above example, the object file and executable built from bloat.asm are considerably larger than those built from slim.asm. This is the reason for EXTERNDEF in MASM. I had this conversation several years back with various members of the NASM team, and it was decided that such an extension wasn't worth the effort, especially when you can simulate it using NASM's macros. The problem is that you need to overload your instructions to insert the extern on the fly; that's why it became part of NASM-X instead of standard.mac. In fact, you could probably add something like:

Code:
%imacro externdef 1-*.nolist
   %rep %0
      %ifndef __defined_%{1}
         %define __defined_%{1}
      %endif
      %rotate 1
   %endrep
%endmacro

%imacro call 1.nolist
   %ifdef __defined_%{1}
      [extern %1]
   %endif
   call %{1}
%endmacro

To your own standard.mac file before compiling NASM, and this would simulate the MASM-style EXTERNDEF.

Code: (bloat.asm)
BITS 32

%imacro externdef 1-*.nolist
   %rep %0
      %ifndef __defined_%{1}
         %define __defined_%{1}
      %endif
      %rotate 1
   %endrep
%endmacro

%imacro call 1.nolist
   %ifdef __defined_%{1}
      [extern %1]
   %endif
   call %{1}
%endmacro

EXTERNDEF a_proc
EXTERNDEF b_proc
EXTERNDEF c_proc
EXTERNDEF d_proc
EXTERNDEF e_proc
EXTERNDEF f_proc
EXTERNDEF g_proc
EXTERNDEF h_proc
EXTERNDEF i_proc
EXTERNDEF j_proc
EXTERNDEF k_proc
EXTERNDEF l_proc
EXTERNDEF m_proc
EXTERNDEF n_proc
EXTERNDEF o_proc
EXTERNDEF p_proc
EXTERNDEF q_proc
EXTERNDEF r_proc
EXTERNDEF s_proc
EXTERNDEF t_proc
EXTERNDEF u_proc
EXTERNDEF v_proc
EXTERNDEF w_proc
EXTERNDEF x_proc
EXTERNDEF y_proc
EXTERNDEF z_proc
EXTERNDEF puts

GLOBAL _start
SECTION .text
_start:
   push dword message
   call puts
   add esp, 4

   mov eax, 1
   mov ebx, 0
   int 0x80

SECTION .data
message: db "Hello, World!", 0

Code: (slim.asm)
BITS 32
EXTERN puts
GLOBAL _start
SECTION .text
_start:
   push dword message
   call puts
   add esp, 4

   mov eax, 1
   mov ebx, 0
   int 0x80
SECTION .data
message: db "Hello, World!", 0

Code: (xterm)
bryant@desktop:~/Projects/bloat$ ls
bloat.asm  slim.asm
bryant@desktop:~/Projects/bloat$ for i in *.asm ; do nasm -f elf $i; done
bryant@desktop:~/Projects/bloat$ for i in *.o ; do gcc -nostartfiles $i -o ${i%.*}; done
bryant@desktop:~/Projects/bloat$ ls -lh
total 24K
-rwxr-xr-x 1 bryant bryant 2.3K Feb 11 13:34 bloat
-rw-r--r-- 1 bryant bryant  868 Feb 11 13:29 bloat.asm
-rw-r--r-- 1 bryant bryant  624 Feb 11 13:34 bloat.o
-rwxr-xr-x 1 bryant bryant 2.3K Feb 11 13:34 slim
-rw-r--r-- 1 bryant bryant  191 Feb 11 13:30 slim.asm
-rw-r--r-- 1 bryant bryant  624 Feb 11 13:34 slim.o
bryant@desktop:~/Projects/bloat$

As you can see, both files are now the same size. Both are considerably larger because we have introduced the C runtime, but now the unused EXTERNs aren't polluting the executable files' symbol tables. Keep in mind, all of this is stuff that NASM-X already does for you. The large amount of symbol management is probably one of the reasons NASM-X seems to have considerably slower build times, and those slower build times were the deal-breaker for this feature in NASM. As it is, NASM builds pretty quickly, and penalizing the users who have become accustomed to NASM's responsiveness simply to add a feature that most NASM users won't use would be a hard pill to swallow for the majority of the NASM user base.
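To make the bookkeeping behind the EXTERNDEF/CALL macro pair easier to follow, here is a small C model of it. It is hypothetical: it only imitates what those NASM macros do and is not part of NASM or NASM-X, and the `model_*` names are invented for the illustration. EXTERNDEF merely remembers a name, and a name is declared external only when a `call` to it is actually seen, which is why the twenty-six unused `*_proc` names never reach the symbol table while `puts` does. The macro issues `[extern ...]` at every call site; the model simply emits the declaration once.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C model of the EXTERNDEF/CALL macro pair. Names are stored
   as pointers, so callers must pass strings that stay alive
   (string literals are fine). */
#define MAX_NAMES 64

static const char *candidates[MAX_NAMES]; /* names passed to EXTERNDEF */
static int ncand;
static const char *declared[MAX_NAMES];   /* names that actually received an EXTERN */
static int ndecl;

static int contains(const char **set, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

/* EXTERNDEF name: only remember the name; nothing reaches the object file. */
void model_externdef(const char *name)
{
    if (contains(candidates, ncand, name))
        return;
    assert(ncand < MAX_NAMES && "too many EXTERNDEF names");
    candidates[ncand++] = name;
}

/* CALL name: a remembered name gets its EXTERN on first use, then the call.
   ndecl can never exceed ncand, so declared[] cannot overflow. */
void model_call(FILE *out, const char *name)
{
    if (contains(candidates, ncand, name) && !contains(declared, ndecl, name)) {
        declared[ndecl++] = name;
        fprintf(out, "extern %s\n", name);
    }
    fprintf(out, "call %s\n", name);
}
```

One consequence shows up in the model just as in the macros: only `call` is intercepted, so an EXTERNDEF'd symbol referenced some other way (for example `push dword puts` or `mov eax, puts`) would never receive its EXTERN.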

But that's just my two-cents...

Regards,
Bryant Keller

About Bryant Keller
bkeller@about.me

alexfru
« Reply #13 on: February 12, 2014, 02:31:45 AM »
> Just to put this out there, NASM-X already removes the need to declare EXTERNs manually. That's been there since the first version. The NASM-X INVOKE macro checks whether a procedure name is already defined; if not, the macro inserts an EXTERN on the fly. The reason for this is the same as for assemblers like GAS and GoASM: using EXTERN bloats your object files. Okay, so that's not entirely true. When NASM sees an EXTERN, an entry is put into the symbol table, and that entry ends up in the object file. If the code never actually uses that external symbol, the entry is still there.
> ...
> As you can see, both files are now the same size. Both are considerably larger because we have introduced the C runtime, but now the unused EXTERNs aren't polluting the executable files' symbol tables. Keep in mind, all of this is stuff that NASM-X already does for you. The large amount of symbol management is probably one of the reasons NASM-X seems to have considerably slower build times, and those slower build times were the deal-breaker for this feature in NASM. As it is, NASM builds pretty quickly, and penalizing the users who have become accustomed to NASM's responsiveness simply to add a feature that most NASM users won't use would be a hard pill to swallow for the majority of the NASM user base.

I don't think supporting this at the macro level is a good solution. It is a solution, though.

At any rate, if this is a controllable option and not an always-on feature, those who don't use it should not see any noticeable performance degradation.

Moreover, I think its users shouldn't see degradation either as there shouldn't be any difference between finding the needed label at the end of the file or not finding it there at all and automatically declaring it as extern as if "extern _foo" was actually written in the file. It's the same amount of parsing. There will be some perf implications linked to undefined labels, but their cost (whether positive or negative) should be proportional to the number of undefined labels, which is perfectly understandable.
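The cost argument above can be illustrated with a sketch of an end-of-file check in a hypothetical assembler. The `struct label` layout, the `auto_extern` switch and the function name are assumptions for illustration, not NASM internals, and the sketch assumes undefined labels are detected in one pass over the symbol table after the file has been parsed. Under that assumption the scan is the same either way; the option only changes what happens in the branch taken for undefined labels, so any extra work grows with their number.

```c
#include <stdio.h>

/* Hypothetical assembler symbol-table entry. */
struct label {
    const char *name;
    int defined;    /* a definition was seen in this source file */
    int external;   /* declared with EXTERN, or promoted below */
};

/* End-of-pass check over all labels. Without auto_extern, every label that
   is neither defined nor EXTERN is reported as an error. With auto_extern,
   the same entries are instead marked external, as if "extern name" had
   been written. Returns the number of errors reported. */
int resolve_undefined(struct label *tab, int n, int auto_extern, FILE *diag)
{
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (tab[i].defined || tab[i].external)
            continue;                   /* nothing to decide for this label */
        if (auto_extern) {
            tab[i].external = 1;        /* becomes an undefined external symbol */
        } else {
            fprintf(diag, "error: undefined symbol `%s'\n", tab[i].name);
            errors++;
        }
    }
    return errors;
}
```

Labels that are defined locally or already declared EXTERN take the `continue` path in both modes, which is the point made above: users who leave the option off, and defined labels in general, see no difference.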

Bryant Keller

  • Forum Moderator
« Reply #14 on: February 12, 2014, 04:14:18 AM »
> I don't think supporting this at the macro level is a good solution. It is a solution, though.

I don't particularly like the EXTERNDEF solution either. The problem I have with it is that it makes use of instruction overloading. I prefer assembler to be WYSIWYG, and overloading instructions to do things behind the scenes has always rubbed me the wrong way. This is actually where most of the performance issues occur: each time the CALL instruction is encountered, the overloaded instruction is executed, which forces the assembler to stop, search the symbol table for a defined symbol, and then respond accordingly.

> At any rate, if this is a controllable option and not an always-on feature, those who don't use it should not see any noticeable performance degradation.

> Moreover, I think its users shouldn't see degradation either as there shouldn't be any difference between finding the needed label at the end of the file or not finding it there at all and automatically declaring it as extern as if "extern _foo" was actually written in the file. It's the same amount of parsing. There will be some perf implications linked to undefined labels, but their cost (whether positive or negative) should be proportional to the number of undefined labels, which is perfectly understandable.

My remark on performance penalties referred to adding such a macro to standard.mac, the collection of macros (EXTERN, STRUC, etc.) that is compiled into NASM itself. I'm sorry if I didn't make that clear.
