24/06/2022
Someone in a group I'm in asked the following:
What exactly is "compiling" as it relates to source code, and if a program or application has a single source code, why does it need to be compiled differently for different versions of Linux or going into Unix if it's supposed to be a "cross platform" app?
It's a great question.
MACHINE CODE
A processor is a machine. It's a bit more complex than a car engine, but essentially it does the same thing over and over again, billions of times a second. It takes a number stored in binary in memory and, depending on what that number is, interprets the next set of numbers in some hardwired way. That first number is called an instruction; the following numbers are the parameters for the instruction (if there are any for that particular instruction).
ASSEMBLY LANGUAGE
No one talks to computers in binary. Programmers aren't all that smart. Instead they use mnemonics to represent the numbers. So for example, on an Intel or AMD processor, the numbers 184-191 are represented by the mnemonic MOV, and what they say to the processor (in 16 bit mode) is that a 16 bit value, contained in the next two bytes, should be moved into a special variable called a register, which is hardwired into the processor.
The person writing assembly language would say MOV AX, 1024 and the assembler would work out which number from 184-191 is to be used based on which register (in this case AX) was referred to by the programmer. These specific MOV instructions cause the processor to interpret the next two bytes as a 16 bit word.
In machine code, the entire instruction is stored in memory looking like 184, 0, 4.
The last two bytes are in little endian order (low byte first) and equate to the hexadecimal value 400, which is 1024 in decimal.
I know, I know, that sounds like total gobbledygook, but that's what the source code of a high level language has to be turned into before a computer can run it. Computers understand machine code really, really well and process it very, very quickly, but we humans don't find it very intuitive, even with the mnemonics to save us typing "101110000000000000000100", which is what the instruction looks like as bits in memory.
This is where high level languages come in.
An instruction like MOV AX,1024 could be written in almost all high level languages as
AX=1024;
In a high level language this just sets a variable called AX. It doesn't necessarily change the AX register on the processor, and it is a pretty pointless instruction all by itself. An assembly language program will contain many instructions to do tasks which can be expressed much more simply in a more human readable format.
In the language C, using the stdio.h header, we can say "printf("hello");" in a single line inside a main function, and in 30 characters or fewer we have achieved output from the machine. In a language like BASIC, there is no need for a main function, so it's simply "print "hello"".
In assembly language this instruction would require setting several memory locations to the values for H, E, L, L and O, then setting a register to point to the location of that memory, then setting another register to either count through each letter or limit a subroutine to print only a certain number of letters. The short story is, it takes lots and lots of assembly language instructions to do things which can be done in a simple one line command in a high level language.
Now some high level languages such as BASIC, ASP, JavaScript or PHP are called interpreted languages. Their programs are translated into work for the processor while they are running, but that process of interpreting the high level language while the wheel spins slows down the wheel, and interpreted languages are generally considered less efficient than compiled languages such as C, C++, Pascal or COBOL.
Compilers for compiled languages translate each command into blocks of machine instructions. They compile those blocks together into object files, which can then be linked to a runtime block which keeps all the other blocks operating correctly and sets up the overall machine state within which they run.
If you have a Linux operating system, you may have heard the expression "compiling the kernel". It's maybe something you don't want to do outside of a virtual machine, but maybe you want to know exactly what this means.
The Linux kernel is called a monolithic kernel, but it is modular in the sense that you can add bits to it and take bits away from it so long as you know how to compile it. The kernel is basically a big machine code program (compiled mostly from C) that deals with things like switching between processes, providing file system functions and network functions. Process switching involves saving the state of all the registers for one process and replacing them with the values of the registers for the other process. It's something like task switching from the taskbar of your operating system GUI, except it's doing that with tiny little slices of running time, many times per second, to make it seem like the processes are all running concurrently.
In a nutshell, the kernel of an operating system is the thing which talks to the hardware, and all the software which runs on an operating system must be compiled for that specific operating system and that specific processor.
For example, on an Intel/AMD system running MS-DOS (which the early versions of Windows ran on top of), we call on the operating system to provide us with file system functions using the instruction INT 33 (which is usually written INT 21h since 21 hexadecimal is 33 in decimal).
On a Linux system using 32 bit Intel/AMD processors we would traditionally use the instruction INT 128 (which is INT 80h). Without a Windows compatibility layer like Wine or a virtual machine like VMware, Windows applications will not work on Linux, and the same is true of Windows being unable to execute Linux programs. The operating system needs to be spoken to in the way it is expecting to be talked to, otherwise it would be total chaos.
Linux works on a variety of different processors, not just Intel/AMD. It will work on almost anything that UNIX will work on: MIPS processors like those in SGI workstations, Alpha processors like those in DEC Alphas, SPARC processors like those in Sun workstations, ARM processors like those in Android phones and tablets, and it will even work on 68000 family processors in old machines like Amigas and Atari STs.
Adept users will note that most of the processors I am mentioning besides Intel/AMD are RISC processors (the 68000 family is the exception), and the assembly language involved in programming and compiling for these is a different paradigm to the CISC processors produced by Intel and AMD.
Every program written in a compiled language must be compiled for the specific operating system and for the specific processor. Compilers like those in the GNU Compiler Collection (GCC) will allow you to target the right platform for your program, but to test and debug your program you must either be on the platform you are targeting or have some way of emulating it.
This is where virtualisation comes in really handy, and it's worth noting that both Oracle and VMware provide free versions of their virtual machine software for both "Wintel" (as it's called) and Lintel (to coin a phrase). You can of course run your compiled Windows programs using Wine if you are on Linux, but if you want to target MacBooks or Android devices or some other platform, you will need to get a virtual system up and running to test on if you don't have physical hardware yourself, or if the process of transferring your program to it for debugging is too cumbersome to make your software development efficient.
If you have any questions, leave a comment. If you break anything, order spares at auranos.org