Which is a program that processes a program just before the program is compiled?

Compiling a C Program

  1. Compiling is the transformation of source code (human readable) into machine code (computer executable). A compiler is itself a program: it takes the recipe (code) for a new program, written in a high-level language, and translates it into machine language that the computer can execute directly. Machine language is very difficult for humans to read and understand (much less debug and maintain), hence the need for high-level languages such as C.

  2. The compiler also ensures that your program is TYPE correct. For example, you are not allowed to assign a string to an integer variable!

  3. The compiler also ensures that your program is syntactically correct. For example, "x * y" is valid, but "x @ y" is not.

  4. The compiler does not ensure that your program is logically correct.

  5. The compiler we use is the GNU ("GNU's Not Unix") open source compiler.

    G++ is the name of the compiler. (Note: G++ is the GNU C++ compiler, but since most C code is also valid C++, we can use it to compile our C programs.)

    To compile a program, you use the following command:

        % g++ -g -pedantic -Wall -o executable_file_name source_file_name.C
    	  

    This command can be typed at the Linux command line, or run through Emacs's compile command.


Parts of the Compile Command Syntax

Compilers provide many options and settings that you can use depending on what properties you want the compiled program to have (e.g., faster vs. easier to debug).

Again, remember we use the following command to compile a program:

    % g++ -g -pedantic -Wall -o executable_file_name source_file_name.C
      

The options we will use for g++ are:

  1. g++ : (the name of compiler)
  2. -g : (allow debugging)
  3. -pedantic : (warn about code that uses compiler-specific extensions instead of the strict language standard)
  4. -Wall : (enable most of the common warnings about possible mistakes)
  5. -o "X" : name the executable X
  6. file.C : the source code
  7. there are many (MANY) more options, but few that we will use in CS1000. These have to do with optimizations, specific computer architectures, etc.


A Note on the G++ (GNU) Compiler

There are many compilers for C, but we will focus on a free open source one from the GNU project. (Actually we will use the GNU C++ compiler, but most C programs compile using this compiler.)

The g++ compiler is open source, meaning you can use it for free on any project you want, including "for profit" projects. Further, if you so desire, you could extend the compiler to work better, fix bugs in the compiler, port the compiler to another operating system/computer architecture, etc.

G++ will compile not only C++ programs but most C programs as well, since it treats them as C++.

You can download G++ free of charge for your home machine. It will run under Linux or Windows. The most recent version of the compiler can be found on the GNU web page.

Additional documentation on the compiler is available at this location as well.

Compiling a C program is a multi-stage process. At a high level, the process can be split into four separate stages: preprocessing, compilation, assembly, and linking.

In this post, I’ll walk through each of the four stages of compiling the following C program:

/*
 * "Hello, World!": A classic.
 */

#include <stdio.h>

int
main(void)
{
	puts("Hello, World!");
	return 0;
}

Preprocessing

The first stage of compilation is called preprocessing. In this stage, lines starting with a # character are interpreted by the preprocessor as preprocessor commands. These commands form a simple macro language with its own syntax and semantics. This language is used to reduce repetition in source code by providing functionality to inline files, define macros, and to conditionally omit code.

Before interpreting commands, the preprocessor does some initial processing. This includes joining continued lines (lines ending with a \) and stripping comments.

To print the result of the preprocessing stage, pass the -E option to cc:

cc -E hello_world.c

Given the “Hello, World!” example above, the preprocessor will produce the contents of the stdio.h header file joined with the contents of the hello_world.c file, stripped free from its leading comment:

[lines omitted for brevity]

extern int __vsnprintf_chk (char * restrict, size_t,
       int, size_t, const char * restrict, va_list);
# 493 "/usr/include/stdio.h" 2 3 4
# 2 "hello_world.c" 2

int
main(void) {
 puts("Hello, World!");
 return 0;
}

Compilation

The second stage of compilation is, confusingly enough, called compilation. In this stage, the preprocessed code is translated to assembly instructions specific to the target processor architecture. These form an intermediate, human-readable language.

The existence of this step allows for C code to contain inline assembly instructions and for different assemblers to be used.

Some compilers also support the use of an integrated assembler, in which the compilation stage generates machine code directly, avoiding the overhead of generating the intermediate assembly instructions and invoking the assembler.

To save the result of the compilation stage, pass the -S option to cc:

cc -S hello_world.c

This will create a file named hello_world.s, containing the generated assembly instructions. On macOS 10.10.4, where cc is an alias for clang, the following output is generated:

    .section    __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 10
    .globl  _main
    .align  4, 0x90
_main:                                  ## @main
    .cfi_startproc
## BB#0:
    pushq   %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
    subq    $16, %rsp
    leaq    L_.str(%rip), %rdi
    movl    $0, -4(%rbp)
    callq   _puts
    xorl    %ecx, %ecx
    movl    %eax, -8(%rbp)          ## 4-byte Spill
    movl    %ecx, %eax
    addq    $16, %rsp
    popq    %rbp
    retq
    .cfi_endproc

    .section    __TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
    .asciz  "Hello, World!"


.subsections_via_symbols

Assembly

During this stage, an assembler is used to translate the assembly instructions to object code. The output consists of actual instructions to be run by the target processor.

To save the result of the assembly stage, pass the -c option to cc:

cc -c hello_world.c

Running the above command will create a file named hello_world.o, containing the object code of the program. The contents of this file are in a binary format and can be inspected using hexdump or od by running either one of the following commands:

hexdump hello_world.o
od -c hello_world.o

Linking

The object code generated in the assembly stage is composed of machine instructions that the processor understands, but some pieces of the program are out of order or missing. To produce an executable program, the existing pieces have to be rearranged and the missing ones filled in. This process is called linking.

The linker will arrange the pieces of object code so that functions in some pieces can successfully call functions in other ones. It will also add pieces containing the instructions for library functions used by the program. In the case of the “Hello, World!” program, the linker will add the object code for the puts function.

The result of this stage is the final executable program. When run without options, cc will name this file a.out. To name the file something else, pass the -o option to cc:

cc -o hello_world hello_world.c

What is the process called compiling?

The process of converting source code written in a programming language (usually a mid- or high-level language) into a machine-level language that the computer can understand is known as compilation. The software used for this conversion is known as a compiler.

What happens when a program is compiled?

A compiler takes the program code (source code) and converts the source code to a machine language module (called an object file). Another specialized program, called a linker, combines this object file with other previously compiled object files (in particular run-time modules) to create an executable file.

What is the preprocessor stage of compiling?

Preprocessing manipulates the text of a source file, usually as a first phase of translation that is initiated by a compiler invocation. Common tasks accomplished by preprocessing are macro substitution, testing for conditional compilation directives, and file inclusion.

What does it mean to compile and execute a program?

A compiler is an executable program that takes program source code (text) as input and translates it into an executable program (binary machine code) that it writes into a file as output. That executable program can then be run to process input data and generate output according to whatever we wrote our program to do.