Since most of the resources, methods and tools for learning assembly are based for Windows. So here I will introduce some materials and tools for macOS
I will continuously update this article and also using it as a personal note.
Assembly is the “magic” of computers. You can be a fine “warrior” who only uses high-level languages, but if you “enchant” your “weapon” with assembly, your combat effectiveness will increase greatly (of course, there are also risks of “playing with fire and burning yourself”).
Today, there are several ways to use assembly:
Note 1: However, Apple later added a conversion that made the estimation very inaccurate, rendering this kind of app useless.
Note 2: Objective-C is a superset of the C language, just like C++ is also a superset of the C language. They all belong to the C language family. Before Apple use Swift, the development of Apple platforms relied entirely on Objective-C.
Now, assembly is not the threshold of programming, it’s a required course to become a master. However, novices are still advised to learn C language first before contacting assembly (“learned” means being able to understand various structures and concepts).
Today, there are many high-level and script languages, assembly language is becoming more and more unpopular because it is cumbersome, complex and has extremely high learning costs. For example, the Intel CPU development manual has 5,000 pages (and it will be updated every time a new gengeration is released).
The first recommendation is Apple’s official assembly guide: “Mac OS X Assembler Guide”, the download address is at http://personal.denison.edu/~bressoud/cs281-s07/Assembler.pdf.
This guide covers the use of Mac OS X assembler as
, assembly instructions, address modes and more. But it’s not only about Intel CPU, but also about earlier Power PC. Since the latest version of this guide is from 2005, Intel’s instructions have a huge update, such as AVX. And now Apple is already switching to Apple silicon(interestingly, the latest update time of manual of as
is 2020-6-23, which is the day of the M1 released).
The second recommendation is Intel’s “Introduction to x64 Assembly”, this is a short introduction to assembly. It’s friendly for newbies or beginner.
The third one is Intel’s development manual: Intel® 64 and IA-32 Architectures Software Developer Manuals.
Although this manual is very long, it is a very good information and is updated frequently. If you need to understand the new Intel instructions, it must be read.
You can directly Download the bound volume as an archive, but this is inconvenient to read. (Intel used to provide paper copies for free, as long as you sent an email and provided your address. Many people ordered, but most of them used to up monitor or left them in the dust. Now Intel no longer provides this benefit, which is a pity)
The Volume 1 is a general introduction, about the development history and differences of Intel CPU. If you are a beginner, you can take a look. You may be read Volume 2 frequently, it’s about instructions. The specific differences can be seen in the interface in the Intel® 64 and IA-32 Architectures Software Developer Manuals。
Note that although Intel is often said to updating slowly and liitle, Intel almost every generation will have instructions or registers updates and changes. So the document update frequency is relatively high, if you need to use the latest instructions, then please update your stored documents in time.
The fourth recommendation is a set of documents about ARM assembly. Because Mac is now switching to the self-developed M1 chip, which is based on ARM architecture and ARM does not currently provide entire assembly guidelines like Intel. The latest entire guidelines is 《ARM ® DeveloperSuite Assembler Guide Version 1.2》 in 2001, if you are a beginner, please read this first.
If you want to learn newest instruction, it’s here: https://developer.arm.com/architectures/instruction-sets
Assembler and compiler are different. If you don’t understand, simply think that the assembler converts assembly instructions into programs; the compiler converts high-level language codes (such as C/C++, Java) into programs.
Here is an introduction to the assembler and the utilities (including editors) that may be used.
The assembler as
for Mac OS X needs to be used in “Terminal”, similar to masm
in Windows. Supports assembly for Intel and Power PC processors. It’s in the /usr/bin/as
directory. There is an introduction to how to use it in the “Mac OS X Assembler Guide”.
ld
is a linker.
clang can be a assemble and preprocess C or other languages code to assembly code, you can see details in How to generate assembly code using Clang . Clang maybe make you use assembly easilier.
We don’t talk about the specific differences between gcc
and clang
. gcc
has flag -nostartfiles
ignore the linked standard library files and initialization behavior. Because sometimes you don’t need to connect the C standard library and just assemble it directly. I will make a demonstration at the end.
size
will output the size of sections, like:
otool
can check size and content of some section, like debug.exe
on Windows. For example, let’s check content of __TEXT,__text
:
You also can use otool
to check other sections, format is:
$ otool -v -s __TEXT __cstring a.out
a.out:
Contents of (__TEXT,__cstring) section
0000000000000025 hello world!\n
Xcode is Apple development IDE, it not only have GUI application, but also have some CLI utilities. It can write C/C++, Objective-C and Swift code, you also can develop app or programs with some assembly codes.
Many people may know some assembly, but probably Windows based. But the style and syntax in Mac OS X or C language is different. Because Intel and MASM use Intel syntax, and other use AT&T syntax.
The detail you can see in “「Updating」The syntax style of Apple’s as assembler”
I will give the complete code at the end, but now I explain it step by step.
At the beginning, create a file helloworld.s
. The suffix of the assembly code file is usually .asm
or .s
. Now you can write the first part.
First, you need to specify executable command:
.section __TEXT,__text,regular,pure_instructions
Then specify the target platform and version:
.build_version macos, 12, 0 sdk_version 12, 1
Create a new external symbol name _main
to be the entry of the program, just like a C language program must have a main()
function. Note that the underscore cannot be omitted.
.globl _main ## -- Begin function main
Use the align command to move the position counter to the next boundary 4, 0x90
(usually an address).
.p2align 4, 0x90
Finally, we can start writing the code in _main
part. It also means writing main()
function:
_main: ## @main
.cfi_startproc ##Indicates the beginning of the function. Will initialize some internal data structures and issue architecture-dependent initial CFI instructions
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $0, -4(%rbp)
leaq L_.str(%rip), %rdi
movb $0, %al
callq _printf
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc ##end function
There are some cfi instructions. You can find more informations in: https://web.mit.edu/rhel-doc/3/rhel-as-en-3/cfi-directives.html
In addition, CFA is the Canonical Frame Address. Related information can be found here: https://www.keil.com/support/man/docs/armasm/armasm_dom1361290010153.htm
Let’s start the second part The string contained in the executable command:
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "hello world!\n"
Here is the string to be output: hello world!\n
. If you use the .ascii
command, rather than .asciz
, then this line should be changed to .ascii "hello world!\n\0"
, because .asciz
automatically adds \0
at the end of the string. If write in a C program, use .asciz
.
At the end, you can add this line:
.subsections_via_symbols
This will tell the static link editor how many pieces this part of object file can be divided into. However, it doesn’t matter whether it is used here or not, it does not affect the results. You can add it for rigor.
The full source code is:
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 12, 0 sdk_version 12, 1
.globl _main ## -- Begin function main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc ## Indicates the beginning of the function. Will initialize some internal data structures and issue architecture-dependent initial CFI instructions
## %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
subq $16, %rsp
movl $0, -4(%rbp)
leaq L_.str(%rip), %rdi
movb $0, %al
callq _printf
xorl %eax, %eax
addq $16, %rsp
popq %rbp
retq
.cfi_endproc ## end function
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "hello world!\n"
.subsections_via_symbols
In Windows, the mainstream executable file format is PE (Portable Executable File Format), and the abbreviation of “Executable” is the EXE that many people are familiar with. In macOS, executable files are called “Mach-O executable files”.
If you use file
to check a Mach-O executable file, you can see someting like below. Here let’s check Xcode:
$ file /Applications/Xcode.app/Contents/MacOS/Xcode
Xcode: Mach-O universal binary with 2 architectures: [x86_64:Mach-O 64-bit executable x86_64] [arm64:Mach-O 64-bit executable arm64]
Xcode (for architecture x86_64): Mach-O 64-bit executable x86_64
Xcode (for architecture arm64): Mach-O 64-bit executable arm64
So the finally you need to convert hello.s
to a Mach-O executable file.
There are two methods:
Since this program is very simple and does not link to any special libraries, you can directly use the following command:
$ gcc -nostartfiles helloworld.s
$ chmod 755 a.out
$ file a.out
a.out: Mach-O 64-bit executable x86_64
Now you can see a.out
is our Mach-O executable file
needed。
Run it:
$ ./a.out
hello world!
Very nice.
This is common method. First use as
to assemble the code and modify the permissions:
$ as hello.s
$ chmod 755 a.out
You need to notice as
will generate Mach-O object file, rather than Mach-O executable file
. Mach-O executable file
can not run, it will warning cannot execute binary file
. On Windows, object file is named as COFF (Common Object File Format).
Then use the linker ld
to deal with object files. But if directly and simply use the ld
, you will see:
$ ld a.out
Undefined symbols for architecture x86_64:
"_printf", referenced from:
_main in a.out
ld: symbol(s) not found for architecture x86_64
Because it uses _printf
in C standard library, but ld
can’t find it directly. So, add library path is fine:
$ ld a.out -o hello -lSystem -L/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/lib
Common Mach-O executable file
doesn’t have any suffiex. The hello
is Mach-O executable file
we need. You can use file
to check it:
$ file hello
hello: Mach-O 64-bit executable x86_64
Run it:
$ ./hello
hello world!
Very Nice~
I hope these will help someone in need~