Lab 04 - Diving into 64-bit Assembly
After working with 6502 assembly, we now move on to the more complex 64-bit assembly used by the majority of computers today. In this lab we explore the assembly languages of two 64-bit architectures, AArch64 (ARM) and x86_64 (Intel/AMD): their differences, their similarities, and the tradeoffs of using either.
The task for this lab was to develop a simple program that prints the following output:
    Loop: 0
    Loop: 1
    Loop: 2
    Loop: 3
    Loop: 4
    Loop: 5
    Loop: 6
    Loop: 7
    Loop: 8
    Loop: 9
    Loop: 10
    Loop: 11
    Loop: 12
    Loop: 13
    Loop: 14
    Loop: 15
    Loop: 16
    Loop: 17
    Loop: 18
    Loop: 19
    Loop: 20
    Loop: 21
    Loop: 22
    Loop: 23
    Loop: 24
    Loop: 25
    Loop: 26
    Loop: 27
    Loop: 28
    Loop: 29
Here is the program I developed to do this in AArch64 assembly.
    .text
    .globl _start

    min = 0
    max = 30

    _start:
        mov     x19, min            // x19 = loop counter, starting at min (0)

    loop:
        // write() syscall takes 3 args: file descriptor, pointer to message buffer, and message size
        mov     x0, 1               // write: arg1; file descriptor: 1 is stdout
        adr     x1, msg             // write: arg2; message location (memory address)
        mov     x2, len             // write: arg3; message length (bytes)

        // modify msg to add the current loop counter at the end
        mov     x20, 10             // x20 = 10 (divisor)
        udiv    x21, x19, x20       // x21 = x19 / x20 (quotient; udiv discards the remainder)
        msub    x22, x21, x20, x19  // x22 = x19 - (x21 * x20) (the remainder)
        add     x22, x22, 48        // add 48 to x22; 48 is the ASCII value for '0'
        cmp     x21, 0              // if the quotient is 0, print only the remainder
        b.eq    single_digit
        add     x21, x21, 48        // add 48 to x21; 48 is the ASCII value for '0'
        strb    w21, [x1, 6]        // store the tens digit (ASCII character) at msg address + 6
        strb    w22, [x1, 7]        // store the ones digit (ASCII character) at msg address + 7
        b       continue

    single_digit:
        strb    w22, [x1, 6]        // store the ones digit (ASCII character) at msg address + 6

    continue:
        // to run a syscall, put the syscall number in x8 and invoke it by running `svc 0`
        mov     x8, 64              // write is syscall #64
        svc     0                   // invoke syscall

        // check if we should continue looping
        add     x19, x19, 1         // increment the loop counter
        cmp     x19, max            // see if we've hit the max
        b.ne    loop                // if not, then continue the loop

        // once the loop is done...
        mov     x0, 0               // set exit status to 0
        mov     x8, 93              // exit is syscall #93
        svc     0                   // invoke syscall

    .data
    msg:    .ascii  "Loop:   \n"    // two placeholder bytes at offsets 6 and 7 hold the digits
    len =   . - msg
And here is the program I developed in x86_64 assembly.
    .text
    .globl _start

    min = 0
    max = 30

    _start:
        mov     $min, %r15          /* loop index */

    loop:
        /* write() syscall takes 3 args: file descriptor, pointer to message buffer, and message size */
        mov     $msg, %rsi          /* sys write: arg2; message location (mem address) */

        mov     %r15, %rax          /* move loop index to rax (dividend) */
        mov     $10, %r14           /* move 10 to r14 (divisor) */
        xor     %rdx, %rdx          /* clear rdx (upper half of the dividend; receives the remainder) */
        div     %r14                /* divide loop index by 10; quotient in rax, remainder in rdx */
        add     $48, %rdx           /* add 48 to remainder; converts it to ASCII */
        cmp     $0, %rax            /* if quotient is 0, just print the remainder */
        je      single_digit
        add     $48, %rax           /* add 48 to quotient; converts it to ASCII */
        movb    %al, 6(%rsi)        /* store quotient at index 6 of msg */
        movb    %dl, 7(%rsi)        /* store remainder at index 7 of msg */
        jmp     continue

    single_digit:
        movb    %dl, 6(%rsi)        /* store remainder at index 6 of msg */

    continue:
        /* to run a syscall, put the syscall number in rax and invoke it by running `syscall` */
        movq    $1, %rdi            /* sys write: arg1; file descriptor stdout */
        movq    $len, %rdx          /* sys write: arg3; message length */
        movq    $1, %rax            /* syscall sys_write */
        syscall

        /* check if we should continue looping */
        inc     %r15                /* increment the loop index */
        cmp     $max, %r15          /* see if we've hit the max */
        jne     loop                /* if not, then continue to loop */

        /* once the loop is done... */
        movq    $0, %rdi            /* exit status */
        movq    $60, %rax           /* syscall sys_exit */
        syscall

    .section .data
    msg:    .ascii  "Loop:   \n"    /* two placeholder bytes at offsets 6 and 7 hold the digits */
    len =   . - msg
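Both programs follow the same recipe: split the counter into a tens digit and a ones digit, convert each to ASCII by adding 48 ('0'), and patch the bytes into the message at offsets 6 and 7 before calling write. Here is a rough C sketch of just that patching step. The function name patch_counter is made up for illustration, and the sketch assumes the message template has two placeholder bytes after "Loop: " so that both offsets land inside the buffer.

```c
#include <assert.h>

/* Hypothetical helper mirroring the digit-patching logic of both programs.
 * Assumes msg is a writable buffer laid out like "Loop:   \n", i.e. with
 * two placeholder bytes at offsets 6 and 7. */
void patch_counter(char *msg, unsigned n)
{
    assert(n < 100);                        /* only two digits fit in the template */

    unsigned quotient  = n / 10;            /* what udiv / div leave as the quotient */
    unsigned remainder = n - quotient * 10; /* what msub computes on AArch64 */

    if (quotient == 0) {
        /* single digit: only the ones digit is written, at offset 6 */
        msg[6] = (char)('0' + remainder);
    } else {
        /* two digits: tens at offset 6, ones at offset 7 */
        msg[6] = (char)('0' + quotient);
        msg[7] = (char)('0' + remainder);
    }
}
```

Calling patch_counter(buf, 23) on a buffer holding "Loop:   \n" leaves "Loop: 23\n" in it, which is what the assembly then hands to the write syscall.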
Thoughts
After working with 6502 assembly and 64-bit assembly, my thoughts are as follows: "Assembly is pain."
Those who think C is a low-level language have definitely never experienced what it is like to write in assembly. Not only are the opcodes really cryptic, but you also have to write a lot of code just to execute a simple task like printing a message to the screen. In C you only need to write one line, i.e., a call to printf().
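For comparison, here is roughly what the whole lab task looks like in C. This is only a sketch: the function name print_loop and its return value are my own additions, made so the loop is easy to call and check.

```c
#include <stdio.h>

/* Hypothetical helper: prints "Loop: <i>" for every i in [min, max)
 * and returns the number of lines printed. */
int print_loop(int min, int max)
{
    int lines = 0;
    for (int i = min; i < max; i++) {
        printf("Loop: %d\n", i);   /* the one line that replaces the digit juggling */
        lines++;
    }
    return lines;
}
```

Calling print_loop(0, 30) prints the 30 lines of output shown at the top of this post. The number-to-text conversion that takes most of the assembly is hidden inside the %d format specifier.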
Another painful thing about assembly is that the language is not even the same across architectures and assemblers. Different architectures do things differently, so it is a given that their instruction sets, and the syntax used to write them, will differ greatly as well.
Knowledge of one type of assembly does not translate directly to other types. It is like comparing Java and C++: they share similar concepts, but the way they do things is quite different, so you still have to take the time to understand how each one works. In assembly, you have to take the time to understand the registers and instructions available to you on a given architecture. For example, the 6502 has no divide instruction, so you do division through a series of simpler operations such as subtraction and bit-shifting. In AArch64, you have the udiv instruction, which does an unsigned division but discards the remainder, so you have to compute the remainder yourself if you need it (the program above uses msub for this). x86_64, on the other hand, gives you the div instruction, which does an unsigned division and stores the quotient in the %rax register and the remainder in the %rdx register, which is nice. Overall, the semantics of the assembly language that you work with will ultimately depend on the system architecture that you're on.
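To make the difference concrete, here is a hedged C sketch of the two styles of division. The first function imitates the subtract-and-shift approach you need on a CPU without a divide instruction, using one common method, restoring division, which is not necessarily the routine used in any particular 6502 program. The second mirrors the AArch64 pairing of udiv and msub. The type divmod8 and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: quotient and remainder of an 8-bit division. */
typedef struct {
    uint8_t quotient;
    uint8_t remainder;
} divmod8;

/* Restoring (shift-and-subtract) division, one bit of the dividend per step. */
divmod8 divmod_shift_subtract(uint8_t dividend, uint8_t divisor)
{
    assert(divisor != 0);

    uint8_t  quotient  = 0;
    uint16_t remainder = 0;  /* wider than 8 bits so the shifted value cannot overflow */

    for (int bit = 7; bit >= 0; bit--) {
        /* shift the next dividend bit (most significant first) into the remainder */
        remainder = (uint16_t)((remainder << 1) | ((dividend >> bit) & 1u));
        if (remainder >= divisor) {
            remainder -= divisor;              /* the subtraction "fits" */
            quotient |= (uint8_t)(1u << bit);  /* so this quotient bit is 1 */
        }
    }

    divmod8 result = { quotient, (uint8_t)remainder };
    return result;
}

/* Divide first, then recover the remainder as dividend - quotient * divisor,
 * the same arithmetic that udiv followed by msub performs. */
divmod8 divmod_divide_then_msub(uint8_t dividend, uint8_t divisor)
{
    assert(divisor != 0);

    uint8_t quotient  = (uint8_t)(dividend / divisor);
    uint8_t remainder = (uint8_t)(dividend - quotient * divisor);

    divmod8 result = { quotient, remainder };
    return result;
}
```

Both functions give the same answer for any nonzero divisor; for example, 29 divided by 10 yields a quotient of 2 and a remainder of 9 either way. The first one simply spells out in a loop what the hardware divide instructions do for you.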
6502 assembly was a nice, simple language with a relatively small set of instructions. This made it quite easy to pick up and understand, but it came at the price of a reduced feature set and a limited set of registers. You have to write a lot of code just to do a simple task like multiplication or division.
AArch64 assembly is a huge step up from that because it has a larger set of registers and more features for doing arithmetic and working with memory. That is to be expected, given that it is a system capable of processing 64 bits at a time, while the 6502 can only process 8 bits at a time.
Similarly, x86_64 boasts a large feature set, even larger than AArch64's; however, it offers only about half as many general-purpose registers (16, versus 31 on AArch64).
After working with both, I currently have a slight preference for AArch64 over x86_64 because of its simplicity. The huge feature set of x86_64 made things feel very complex, while the simplicity of AArch64 made it easier to pick up and develop with. I do believe, however, that this may just be due to my lack of experience with x86_64. As I get better acquainted with the x86_64 instruction set, I may find myself leaning more in favor of it over AArch64. We'll see when I get there.
Though assembly is a very powerful tool that can help us optimize our programs, my general thoughts on and experience with it still stand: "Assembly is pain."