Author Topic: Totally clueless noob  (Read 19259 times)

David Cooper
« on: July 17, 2011, 12:44:31 AM »
Hi,

I don't know how to begin. I've just downloaded nasm and have no idea what to do with the command line. I should explain first that I'm not at all new to machine code - I've actually written an operating system directly in machine code which you can see here: http://www.magicschoolbook.com/computing/os-project.

I want to learn to use assembler for two reasons: to help other people debug their code, and to try to find out what the hex values are of some exotic machine code instructions which I can't find listed anywhere.

So, here's my first question, and hopefully someone will have the time to answer it. How do I get a program into the assembler? More specifically, do I write it in Notepad and then load it into nasm with some command line instruction? The nasm manual seems to skip this stage.

Keith Kanios
« Reply #1 on: July 17, 2011, 01:02:27 AM »
Quote
So, here's my first question, and hopefully someone will have the time to answer it. How do I get a program into the assembler? More specifically, do I write it in Notepad and then load it into nasm with some command line instruction? The nasm manual seems to skip this stage.

Assemblers and compilers generally assume standard ASCII and/or Unicode text input, in the form of human-readable (but perhaps not intuitively understandable) instructions.

You can use any standard text editor. Notepad will work OK, but you may prefer to use Notepad++ or any other editor that has syntax highlighting.

As for assembling a source file, the NASM Manual does cover that part quite thoroughly in Chapter 2.

Alternatively, you may want to utilize an IDE, although I do not know of any that work adequately, out-of-the-box and in modern terms. You can try MJaoune's NASM IDE, but YMMV.
« Last Edit: July 17, 2011, 01:04:02 AM by Keith Kanios »

David Cooper
« Reply #2 on: July 17, 2011, 01:31:03 AM »
Thanks. I'm getting somewhere now that I know to use Notepad. I wrote a file with nothing more than "mov ah,4" in it (without the quotes), saved it in the nasm folder, used "nasm -f coff myfile.asm -l myfile.lst" at the command line and then, after a few failed attempts where I used -1 instead of -l, found in the nasm folder a new lst file with "1 00000000 B404     mov ah,4" in it. That's exactly the kind of thing I needed to be able to do.

Hopefully I'll manage now without further help. Thanks again.

Frank Kotler (NASM Developer)
« Reply #3 on: July 17, 2011, 10:35:39 AM »
Hi David,

You probably want to use "-f bin" rather than "-f coff", but you'll figure that out.
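For anyone following along, here is a minimal sketch of the flat-binary route, reusing the myfile.asm name from the earlier post. The `bits 16` line only spells out the default that `-f bin` already uses, and the output and listing file names are just examples.

```nasm
; myfile.asm
; Assemble with:  nasm -f bin myfile.asm -o myfile.bin -l myfile.lst
; -f bin writes the raw machine code bytes with no object-file wrapper.

        bits 16         ; the default for -f bin, stated here for clarity

        mov ah, 4       ; assembles to B4 04, matching the listing quoted above
```

With this input, myfile.bin should contain just those two bytes, which is the form most convenient for loading straight into another tool.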

I tried your OS. Doesn't work for me (funky old Hewlett-Packard P4). I get a blinking cursor at the top left, and a green letter 'A' at the bottom left. At this point, ctrl-alt-delete works, although the floppy drive motor stays on. If I hit 'esc' twice, the 'A' is replaced by numbers that increment to '4', and a "Loading OS" message appears. At this point, hitting 'esc' twice doesn't do anything, and ctrl-alt-delete no longer reboots. What's the secret?

I took a look at your code to see if I could figure out what the problem was. No joy yet. Some of the most "interesting" code I've seen! Good luck learning assembly - I sure wouldn't want to tackle it in decimal! :)

Best,
Frank


David Cooper
« Reply #4 on: July 17, 2011, 08:01:25 PM »
Hi Frank,

Quote
You probably want to use "-f bin" rather than "-f coff", but you'll figure that out.

I've no idea what the coff part means yet, but just copied a line from the manual: it enabled me to get the hex for the second byte of the instruction mov eax,cr4 which I couldn't find in any manual, so it's already been a great success - I'll never again need to try to guess the values of rare instructions. I expect the bin option will help me debug other people's code, because once I can get it into my OS I'll be able to read it much more comfortably in decimal form, and often the errors stand out much more clearly too once you can see the actual numbers.
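A small sketch of using the listing file as an opcode lookup, as described here. `bits 32` is an assumption made to match the 32-bit default of `-f coff`, and the file name lookup.asm is hypothetical.

```nasm
; lookup.asm (hypothetical file name)
; Assemble with:  nasm -f bin lookup.asm -o lookup.bin -l lookup.lst
; then read the encodings from the hex column of lookup.lst.

        bits 32

        mov eax, cr4    ; 0F 20 E0: opcode 0F 20, ModRM E0 = 11 100 000
                        ;   mod = 11 (register), reg = 100 (CR4), r/m = 000 (EAX)
        mov cr4, eax    ; 0F 22 E0: the reverse direction uses opcode 0F 22
```

Any instruction whose encoding is hard to find in a manual can be added to the same file and read off the listing in the same way.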

Quote
I tried your OS. Doesn't work for me (funky old Hewlett-Packard P4). I get a blinking cursor at the top left, and a green letter 'A' at the bottom left. At this point, ctrl-alt-delete works, although the floppy drive motor stays on. If I hit 'esc' twice, the 'A' is replaced by numbers that increment to '4', and a "Loading OS" message appears. At this point, hitting 'esc' twice doesn't do anything, and ctrl-alt-delete no longer reboots. What's the secret?

Thanks for trying it - it's always good to get feedback about any problems people have with it. I didn't expect anyone to risk trying to run it directly on their machine as you can experiment with it more safely within Bochs. The letters displayed in the bottom left corner are there to indicate how far it's got with loading in case it gets stuck. When it counts up from 0 to 4, that indicates which track it's reached. The OS modules are loaded in by the BIOS so that you can boot a laptop via a USB floppy drive, and that's important because I now do most of my work with it on a netbook. If there's a serious error, the load code will keep retrying a block of adjacent sectors indefinitely until you press Esc, at which point it moves on to the next block. Sometimes the errors don't actually mean the correct data hasn't been loaded, so pressing Esc repeatedly might enable the OS to boot successfully even if there are lots of faulty sectors on the disk.

If Ctrl+Alt+Del doesn't reboot, then that tells you it's stuck doing something in protected mode, so the problem may not be with reading the disk but rather with something more awkward. Then again, it may be because some earlier piece of code has failed to load properly due to disk errors. It may of course be that you just have a machine which objects to something in my code - the code enabling the A20 line (to open up all the memory above 1MB) is causing trouble on one of my old machines, so I'm going to have to add a big delay loop to cure it.
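On the A20 point, one common alternative to a fixed delay loop is to poll the keyboard controller's status port before each write. The sketch below is the classic 8042 sequence, not the code MSB-OS uses; the label names are made up for illustration, and some machines need a different method altogether (a BIOS call or port 0x92).

```nasm
        bits 16

enable_a20:
        call    wait_8042
        mov     al, 0xD1        ; command: write to the controller's output port
        out     0x64, al
        call    wait_8042
        mov     al, 0xDF        ; output port value with the A20 bit (bit 1) set
        out     0x60, al
        call    wait_8042
        ret

; Spin until the controller's input buffer is empty (status bit 1 clear),
; which replaces a fixed-length delay with a wait on the hardware itself.
wait_8042:
        in      al, 0x64
        test    al, 2
        jnz     wait_8042
        ret
```

Polling adapts to slow controllers automatically, whereas a fixed delay has to be tuned for the slowest machine on the list.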

Once the "Loading OS" message appears, that means the first two modules (the two machine code editors) have been loaded in from fixed locations near the start of the disk, and the next module is about to be loaded in from sectors which change every time the module is saved back to disk. This module is more fragmented across the disk as a result, and any problems with loading it will take more presses of Esc to get past. Even so, that can't be the direct problem as Ctrl+Alt+Del should be able to reboot the machine whenever the BIOS is in the act of loading sectors in. If you're getting any numbers or other chars beyond 4 displayed in the corner, they will indicate any progress through the tracks, but a little square should appear there for a moment whenever there's a disk error. If you can get it to the point where the third module has been loaded in, a message will tell you to press either Return to enter the OS, or Esc to go into repair mode - the latter allows you to run the OS while relying on less of its code, so it may be able to function even if many parts haven't loaded in properly.

Quote
I took a look at your code to see if I could figure out what the problem was. No joy yet.

If you don't normally have problems with that floppy drive and your disks, then it's probably something wrong with my code, and it would be very hard to find out where the problem is without modifying the code repeatedly and booting your machine over and over again with modified versions until the sticking point is isolated. I don't recommend that you do that to your machine. I'll just have to keep a list of machines that don't like my OS and see if any pattern emerges so that I can get hold of one of them and try to track down the fault. There are some weird things that happen sometimes, such as the BIOS on my netbook which crashes if you call a BIOS routine while BP holds a value radically different from the value in SP, so there are always going to be machines out there which cause unexpected problems.

Quote
Some of the most "interesting" code I've seen! Good luck learning assembly - I sure wouldn't want to tackle it in decimal! :)

It's pretty ordinary code, but it is written in a rather unorthodox way. I can assure you, however, that learning to work with machine code in decimal is no harder than learning to program any other way, whether that be with assembler, C or [insert programming language of your choice]. All of them look hard if you haven't learned them, and all of them probably seem easy enough once you have. What made direct machine code programming such a nightmare in the past was having to make thousands of changes every time a bit of code was moved out of place by an edit, but if you index the code, that can all be done automatically in an instant and it suddenly becomes a very practical way of working. I've also added a hex mode recently so that people can use it to learn to program directly in machine code in hex - I did this specifically for assembly language programmers who want to improve their machine code reading skills to make debugging easier.

Anyway, thanks again for letting me know, and I'd be very grateful if you could tell me the model name/number of your Hewlett-Packard P4 for my list of machines that have trouble loading my OS.

Best regards,

David

Frank Kotler (NASM Developer)
« Reply #5 on: July 18, 2011, 08:35:07 AM »
Okay: HP Pavilion 751n. Award Medallion BIOS, v 6.0, with an HP revision 3.0 (I think). Copyright dates are 2001. Somewhat different from your netbook! :)

I've never learned to run Bochs. Seen too many messages over the years, "My bootsector runs on Bochs, but crashes on real hardware." - or vice versa. If Bochs doesn't emulate real hardware exactly, what use is it? Well, for MSB-OS, since it's apparently intended as a learning environment, running in Bochs may be appropriate. Not my cup of tea.

I actually wrote an assembler that used decimal once, long ago, for an Atari 400 (6502 CPU, IIRC). I called it "myass.bas" (yes, in BASIC). If I hadn't been limited to 8.3 filenames I'd have called it "myhalfass.bas". It wasn't very good, but "kinda worked". Once I figured out why serious programmers used hex, I never looked back.

I haven't gotten too far with figuring out your code, but here's the bit that puts the 'A' on the screen, for example:

Code:
; write 'A' on screen
    mov di,0x8f00
    mov ax,0xb000
    mov es,ax
    mov ax,0xa41
    stosw
    xor ax,ax
    mov es,ax

You've loaded a video segment into es which would be correct for an old monochrome TTL card - unlikely to encounter one on a 386+. For EGA/CGA/VGA, 0xb800 is correct. But that's okay, you've used an offset that gets us up into the 0xB800 segment. Not what I'd call "ordinary" code - no offense intended. It works...
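To spell out the arithmetic: in real mode the physical address is segment * 16 + offset, so 0xB000:0x8F00 and 0xB800:0x0F00 both land on 0xB8F00. Here is a sketch of the same write using the 0xB800 segment, assuming the standard 80x25 colour text mode (160 bytes per row). The %if line just makes NASM check the aliasing at assembly time.

```nasm
        bits 16

; Physical address = segment * 16 + offset in real mode.
%if ((0xB000 << 4) + 0x8F00) != ((0xB800 << 4) + 0x0F00)
  %error "the two segment:offset pairs do not alias"
%endif

; write 'A' on screen, colour text segment version
        mov     ax, 0xb800
        mov     es, ax
        mov     di, 0x0f00      ; 24 rows * 160 bytes per row = 3840 = 0x0F00
        mov     ax, 0x0a41      ; AL = 0x41 ('A'), AH = 0x0A (bright green on black)
        stosw                   ; store AX at ES:DI
        xor     ax, ax
        mov     es, ax
```

Offset 0x0F00 is row 24, column 0, which is why the green 'A' shows up at the bottom left.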

Your idea of "indexed" code seems quite interesting! You've gotten around the "big" advantage of using an assembler (or have "reinvented the assembler", if you look at it that way). I'll be interested to see how you did that, if I get far enough into your code - probably won't. My interest in writing an OS has faded considerably, since I realized I don't have enough time left to do anything "good".

I hope you'll find Nasm useful. I have an idea you may not "like" it. Each to his own taste, as the French say. See how it works out...

Best,
Frank


David Cooper
« Reply #6 on: July 18, 2011, 08:16:48 PM »
Hi Frank,

Thanks for the details of your computer, and the BIOS (which I should have had the sense to ask about). I don't often work in Bochs myself - it's slow and forces you into a little box instead of using the whole screen, but it does have its uses, such as getting screenshots without having to use a camera. An OS which runs in Bochs is certainly not guaranteed to run directly on a computer, and an OS which works fine directly on a computer will often go wrong in Bochs, but getting it to work in Bochs does tend to help it work on a wider range of real hardware than it would otherwise. If you really want to explore my OS in more detail, the easiest way to do so is from the inside through its interactive manual and the documentation of the code rather than trying to disassemble it, and you can do all of that within Bochs. I usually use my OS on an Advent netbook, but I still check everything works on my old 486, and I check it too on a temperamental Compaq Presario 1200 whenever it feels like booting up (it doesn't like cold weather).

I recently added a hex mode to my OS so that you can do everything in hex instead of decimal, and I've just written some code to convert all the decimal numbers in the documentation to hex as well (or from hex back to decimal) so that it can be converted at the press of a key - there is no longer any need for anyone using my OS to work with decimals at all, other than in the interactive part of the manual that teaches the basics of machine code programming. That new code which does the hex-dec/dec-hex translation of the documentation is not in the version that you've downloaded as I've only just written it, but I'll upload it when I'm sure it's reliable. I'm going to stick to working with decimals myself for the simple reason that it's easier to read (the instructions are more distinctive), but people who have already done a lot of work with hex will undoubtedly not want to switch to decimals, and now they don't need to.

Quote
I haven't gotten too far with figuring out your code, but here's the bit that puts the 'A' on the screen, for example:

Code:
; write 'A' on screen
    mov di,0x8f00
    mov ax,0xb000
    mov es,ax
    mov ax,0xa41
    stosw
    xor ax,ax
    mov es,ax

You've loaded a video segment into es which would be correct for an old monochrome TTL card - unlikely to encounter one on a 386+. For EGA/CGA/VGA, 0xb800 is correct. But that's okay, you've used an offset that gets us up into the 0xB800 segment. Not what I'd call "ordinary" code - no offense intended. It works...

I'm not particularly comfortable using 16-bit mode, so I tend to set segments for 64K blocks of memory and then use addresses within those blocks to line up on things - the text screen memory is fully contained within that block and therefore fully accessible, and by setting the segment that way it makes the addresses look more like the ones used in 32-bit mode, making it easier to remember what they are.
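For comparison, a sketch of the same write from 32-bit protected mode, assuming a flat data segment with base 0 (an assumption, since the thread does not show the descriptor setup). The linear address 0xB8F00 is the 0x8F00 offset with the 0xB0000 block base in front of it, which appears to be the resemblance being described.

```nasm
        bits 32

        ; Assumes DS is a flat segment (base 0), so the operand is a linear address.
        mov     word [0xB8F00], 0x0A41  ; 'A' (0x41) with attribute 0x0A at row 24, column 0
```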

Quote
Your idea of "indexed" code seems quite interesting! You've gotten around the "big" advantage of using an assembler (or have "reinvented the assembler", if you look at it that way). I'll be interested to see how you did that, if I get far enough into your code - probably won't. My interest in writing an OS has faded considerably, since I realized I don't have enough time left to do anything "good".

What I've done is take blocks of memory which may be anything from 1KB to 64KB in size and turn them into independent code cells with an index running down from the top and the code growing from the base. If they meet in the middle, that cell is full. The index in a cell is typically between a half and two thirds the size of the code which it indexes, and it could be paged out to free up memory as it's only needed on loading or while modifying code. There are six sub-indexes within the index:

1. The name index: a list of named bytes (four bytes per name) and their addresses within the code cell (two bytes) - these may be variables, the first bytes of routines, or bytes in routines which can be modified to change the way the routines function.

2. The point index: a list of addresses of places in the code which point at variables, plus the addresses of the variables they point at.

3. The offset index: a list of addresses of places in the code which jump to other routines, plus the addresses of those routines.

4. The external point index: a list of addresses of places in the code which point at variables in other code cells, plus the names of the variables they point at.

5. The external offset index: a list of addresses of places in the code which jump to routines in other code cells, plus the names of the routines they jump to.

6. A short-dis jump index which is usually empty, used to hold the addresses of single-byte jumps in the vicinity of code being modified so that these short jumps can be automatically updated after an edit.
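As a rough picture of what the first two sub-indexes might look like as data, here is a sketch of a tiny code cell. Only the four-byte names and two-byte addresses of the name index come from the description above; the two-byte fields in the point index, the labels, and the placement of the index directly after the code are all assumptions made for illustration.

```nasm
; Hypothetical code cell: the layout is guessed from the description above,
; not taken from MSB-OS itself.
        bits 16
        org 0                   ; addresses are offsets from the start of the cell

; --- code and variables, growing up from the base ---
counter:
        dw 0                    ; a variable at cell offset 0
bump:
        inc word [counter]      ; FF 06 lo hi: the address operand sits at bump + 2
        ret

; --- index (for simplicity placed after the code, not at the top of the cell) ---
name_index:
        db 'CNTR'               ; four-byte name
        dw counter              ; two-byte address within the cell
        db 'BUMP'
        dw bump

point_index:
        dw bump + 2             ; where in the code the pointer lives
        dw counter              ; the variable it points at
```

Assembling this with `-f bin` and reading the listing shows the index entries as plain words, roughly as they would appear when scrolling through memory.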

You'd be very hard pushed to work that out by disassembling the code, but when you run the OS you can scroll through memory and see the indexes directly and observe how they change as you add more entries. Whenever an edit moves code (or variables) out of place, a program runs through the indexes and modifies all the affected indexed addresses and jump distances to ensure that the code will function correctly in its new location.
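The fix-up pass could be sketched along these lines for the point index alone. This is a guess at the general technique, not the routine MSB-OS actually runs: the register conventions, the label names, and the entry format (two 16-bit words per entry, pointer site then target) are all hypothetical.

```nasm
        bits 16

; In:  SI = first point-index entry, CX = number of entries,
;      BX = cell address where bytes were inserted, DX = number of bytes inserted.
; Assumes the code bytes have already been shifted up by DX and that DS
; addresses the cell. Clobbers AX, CX, SI and DI.
fix_point_index:
        jcxz    .done
.next:
        mov     di, [si]        ; where the pointer lives
        cmp     di, bx
        jb      .site_done      ; below the edit: this site did not move
        add     di, dx
        mov     [si], di        ; record the site's new address
.site_done:
        mov     ax, [si + 2]    ; the variable it points at
        cmp     ax, bx
        jb      .target_done    ; below the edit: the target did not move
        add     ax, dx
        mov     [si + 2], ax    ; record the target's new address
        mov     [di], ax        ; patch the operand in the code itself
.target_done:
        add     si, 4           ; next entry (two words per entry)
        loop    .next
.done:
        ret
```

The offset index would need a similar pass that recomputes relative jump distances instead of absolute addresses.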

Quote
I hope you'll find Nasm useful. I have an idea you may not "like" it. Each to his own taste, as the French say. See how it works out...

There is actually a lot to like about it, though I could never switch to using it for writing my own code as I would lose the freedom to edit memory directly - I can write a string of machine code instructions into memory and then run them straight away (particularly useful for holding conversations with hardware via ports when testing how they work), or modify an interrupt routine while it's actively running, and I just love that directness. But I still want to learn to use assembler - when I try to help other people debug their code I get lost in the pseudo opcodes and other strange stuff that they use, and that's where I'm currently rather clueless. I've tried using other assemblers in the past and got nowhere with them, but nasm seems much more straightforward and efficient. I'm sure it's going to be fun exploring it.

David