Lab 04 - Diving into 64-bit Assembly

After working with 6502 assembly, we now move on to the more complex 64-bit assembly used by the majority of computers today. In this lab we explore the assembly languages of two 64-bit architectures: AArch64 (ARM) and x86_64 (Intel/AMD). We will look at their differences and similarities, as well as the tradeoffs of using either.

The task for this lab was to develop a simple program that prints the following output:

Loop: 0
Loop: 1
Loop: 2
Loop: 3
Loop: 4
Loop: 5
Loop: 6
Loop: 7
Loop: 8
Loop: 9
Loop: 10
Loop: 11
Loop: 12
Loop: 13
Loop: 14
Loop: 15
Loop: 16
Loop: 17
Loop: 18
Loop: 19
Loop: 20
Loop: 21
Loop: 22
Loop: 23
Loop: 24
Loop: 25
Loop: 26
Loop: 27
Loop: 28
Loop: 29

Here is the program I developed to do this in AArch64 assembly.

.text
.globl _start

min = 0
max = 30

_start:
    mov     x19, min // Store to x19 the value of min (0)

loop:

    /* ... body of the loop ... do something useful here ... */

    // write() syscall takes 3 args; file descriptor, pointer to message buffer, and message size
    mov     x0, 1        // write: arg1; file descriptor: 1 is stdout
    adr     x1, msg      // write: arg2; message location (memory address)
    mov     x2, len      // write: arg3; message length (bytes)

    // modify msg to add the current loop counter at the end
    mov     x20, 10            // Store the divisor 10 in x20
    udiv    x21, x19, x20      // x21 = x19 (loop counter) / x20 (10); quotient only, the remainder is discarded
    msub    x22, x21, x20, x19 // x22 = x19 (loop counter) - (x21 [quotient] * x20 [10]); this is the remainder
    add     x22, x22, 48       // Add 48 to the value in x22; 48 is the ASCII code for '0'

    cmp     x21, 0
    b.eq    single_digit

    add     x21, x21, 48       // Add 48 to the value in x21; 48 is the ASCII code for '0'
    strb    w21, [x1, 6]       // store the low byte of x21 (tens digit) at msg address + 6
    strb    w22, [x1, 7]       // store the low byte of x22 (ones digit) at msg address + 7
    b       continue

single_digit:
    strb    w22, [x1, 6]       // store the low byte of x22 (the only digit) at msg address + 6

continue: 

    // to run a syscall, put the syscall number in x8, and invoke it by running `svc 0`
    mov     x8, 64     	 // write is syscall #64
    svc     0          	 // invoke syscall

    // check if we should continue looping
    add     x19, x19, 1     /* increment the loop counter */
    cmp     x19, max        /* see if we've hit the max */
    b.ne    loop            /* if not, then continue the loop */
    
    // Once the loop is done...
    mov     x0, 0           /* set exit status to 0 */
    mov     x8, 93          /* exit is syscall #93 */
    svc     0               /* invoke syscall */

.data
msg: 	.ascii      "Loop:   \n"
len= 	. - msg

And here is the program I developed in x86_64 assembly.

.text
.globl	_start

min = 0
max = 30

_start:
	mov     $min, %r15    /* loop index */

loop: 

	/* ... body of the loop ... do something useful here ... */

	/* write() syscall takes 3 args; file descriptor, pointer to message buffer, and message size */
  	mov	  $msg, %rsi			/* sys write: arg2; message location (mem address) */

  	mov   %r15, %rax      		/* move loop index to rax (low 64 bits of the dividend) */
  	mov   $10, %r14       		/* move 10 to r14 (divisor) */
  	xor   %rdx, %rdx      		/* clear rdx (high 64 bits of the dividend) */
  	div   %r14          	  	/* divide rdx:rax by 10; quotient in rax, remainder in rdx */
  	add   $48, %rdx     		/* add 48 to remainder; converts it to ASCII */

  	cmp   $0, %rax        		/* if quotient is 0, just print the remainder */
  	je    single_digit    

  	add   $48, %rax       		/* add 48 to quotient; converts it to ASCII */
  	movb  %al, 6(%rsi)    		/* store quotient (tens digit) at offset 6 of msg */
  	movb  %dl, 7(%rsi)    		/* store remainder (ones digit) at offset 7 of msg */
  	jmp   continue

single_digit:
  	movb  %dl, 6(%rsi)    		/* store remainder (the only digit) at offset 6 of msg */

continue:

  	/* to run a syscall, put the syscall number in rax, and invoke it by running `syscall` */
  	movq	$1, %rdi			/* sys write: arg1; file descriptor stdout */
  	movq	$len, %rdx			/* sys write: arg3; message length */
	movq	$1, %rax			/* syscall sys_write */
	syscall

  	/* check if we should continue looping */
	inc	%r15		          	/* increment the loop index */
	cmp	$max, %r15	      		/* see if we've hit the max */
	jne	loop		          	/* if not, then continue to loop */

  	/* Once the loop is done... */
	movq	$0, %rdi			/* exit status */
	movq	$60, %rax		   	/* syscall sys_exit */
	syscall

.section .data

msg:	.ascii      "Loop:   \n"
	len = . - msg

Thoughts

After working with 6502 assembly and 64-bit assembly, my thoughts are as follows: "Assembly is pain."

Those who think C is a low-level language have definitely never experienced what it is like to write in assembly. Not only are the opcodes cryptic, but you also have to write a lot of code just to carry out a simple task like printing a message to the screen. In C you only need one line, i.e., a call to printf().

Another painful thing about assembly is that the language is not even the same across architectures and assemblers. Different architectures do things differently, so their instruction sets and syntax differ greatly as well.

Knowledge of one assembly language does not translate directly to another. It is like comparing Java and C++: they share similar concepts, but the way they do things is quite different, so you still have to take the time to understand how each works. In assembly, you have to learn the registers and instructions available on a given architecture. For example, the 6502 has no divide instruction, so you do division through a series of simpler operations such as subtraction and bit-shifting. AArch64 has the udiv instruction, which does an unsigned division but discards the remainder, so you have to compute the remainder yourself if you need it (I used msub for this). x86_64, on the other hand, offers the div instruction, which does an unsigned division and stores the quotient in the %rax register and the remainder in the %rdx register, which is nice. Overall, the semantics of the assembly language you work with ultimately depend on the architecture you are on.

6502 assembly was a nice, simple language with a relatively small set of instructions. This made it quite easy to pick up and understand, but it came at the price of a reduced feature set and a limited set of registers. You have to write a lot of code just to do a simple task like multiplication or division.

AArch64 assembly is a huge step up from that: it has a larger set of registers and more features for doing arithmetic and working with memory. That is to be expected, given that it is a 64-bit architecture while the 6502 is an 8-bit processor.

Similarly, x86_64 boasts an even larger feature set than AArch64; however, it offers only about half as many general-purpose registers (16, compared to AArch64's 31).

After working with both, I currently have a slight preference for AArch64 over x86_64 because of its simplicity. The huge feature set of x86_64 made things feel very complex, while the simplicity of AArch64 made it easier to pick up and develop with. I do believe, however, that this may just be due to my lack of experience with x86_64. As I get better acquainted with the x86_64 instruction set, I may find myself leaning more in favor of it over AArch64. We'll see when I get there.

Though assembly is a very powerful tool that can help us optimize our programs, my general thoughts on and experience with it still stand: "Assembly is pain."
