
Standard development tools compile your program into an ELF (Executable and Linkable Format) file, optionally including debugging information. The format specification is publicly available. In addition, each architecture has its own supplement to the format, ARM being one example. Let's take a quick look at this format.
An executable file of the ELF format consists of the following parts:

1. Header (ELF Header)

Contains general information about the file and its main characteristics.

2. Program Header Table

This table maps file sections to memory segments; it tells the loader which memory area each section must be written to.

3. Sections

Sections contain all the information in the file (program, data, debug information, etc.)
Each section has a type, a name, and other parameters. The ".text" section usually stores the code, ".symtab" the table of program symbols (names of files, procedures and variables), ".strtab" the string table, and sections with the ".debug_" prefix hold debugging information. In addition, the file must have an empty section with index 0.

4. Section Header Table

This is a table containing an array of section headers.
The format is discussed in more detail in the "ELF Creation" section.

DWARF Overview

DWARF is a standardized debug information format. The standard can be downloaded from the official website. There's also a great overview of the format: Introduction to the DWARF Debugging Format (Michael J. Eager).
Why is debug information needed? It allows:

set breakpoints not on a physical address, but on a line number in a source code file or on a function name
display and change the values of global and local variables, as well as function parameters
display call stack (backtrace)
execute the program step by step not by one assembly instruction, but by lines of source code

This information is stored in a tree structure. Each tree node has a parent, may have children, and is called a Debugging Information Entry (DIE). Each node has its own tag (type) and a list of attributes (properties) that describe the node. Attributes can contain anything, such as data or links to other nodes. In addition, there is information stored outside the tree.
Nodes are divided into two main types: nodes that describe data and nodes that describe code.

Nodes describing the data:

Data types:
- Base data types (a node with type DW_TAG_base_type), such as the int type in C.
- Composite data types (pointers, etc.)
- Arrays
- Structures, classes, unions, interfaces
Data objects:
- constants
- function parameters
- variables
- etc.

Each data object has a DW_AT_location attribute that specifies how the address where the data resides is calculated. For example, a variable can have a fixed address, be in a register or on the stack, or be a member of a class or an object. This address can be calculated in a rather complicated way, so the standard provides so-called Location Expressions, which contain a sequence of operations for a special internal stack machine.
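To make the stack machine idea concrete, here is a minimal sketch of an evaluator for a very small subset of location expression operations. The opcode values are taken from the DWARF standard; the function name eval_location, the fixed 16-entry stack, and the 32-entry register array are assumptions made for this illustration only, and a real debugger supports many more operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard (only a tiny subset is handled). */
enum {
    DW_OP_const1u     = 0x08, /* push the next byte as an unsigned constant */
    DW_OP_plus        = 0x22, /* pop two values, push their sum */
    DW_OP_plus_uconst = 0x23, /* add a ULEB128 operand to the top of the stack */
    DW_OP_breg0       = 0x70, /* DW_OP_breg0..31: push register N + SLEB128 offset */
    DW_OP_breg31      = 0x8f
};

/* Decodes an unsigned LEB128 value, never reading past `end`. */
static uint32_t read_uleb128(const uint8_t **p, const uint8_t *end)
{
    uint32_t result = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t byte = *(*p)++;
        if (shift < 32) result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80)) break;
    }
    return result;
}

/* Decodes a signed LEB128 value, never reading past `end`. */
static int32_t read_sleb128(const uint8_t **p, const uint8_t *end)
{
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte = 0;
    while (*p < end) {
        byte = *(*p)++;
        if (shift < 32) result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80)) break;
    }
    if (shift < 32 && (byte & 0x40))
        result |= ~(uint32_t)0 << shift; /* sign-extend */
    return (int32_t)result;
}

/* Evaluates expr[0..len) against 32 register values. Returns 0 and stores the
   computed address in *out, or -1 on an unsupported or malformed expression. */
int eval_location(const uint8_t *expr, size_t len,
                  const uint32_t *regs, uint32_t *out)
{
    uint32_t stack[16];
    size_t sp = 0;
    const uint8_t *p = expr, *end = expr + len;

    assert(expr != NULL && regs != NULL && out != NULL);
    while (p < end) {
        uint8_t op = *p++;
        if (op == DW_OP_const1u) {
            if (p >= end || sp >= 16) return -1;
            stack[sp++] = *p++;
        } else if (op == DW_OP_plus) {
            if (sp < 2) return -1;
            stack[sp - 2] += stack[sp - 1];
            sp--;
        } else if (op == DW_OP_plus_uconst) {
            if (sp < 1) return -1;
            stack[sp - 1] += read_uleb128(&p, end);
        } else if (op >= DW_OP_breg0 && op <= DW_OP_breg31) {
            if (sp >= 16) return -1;
            stack[sp++] = regs[op - DW_OP_breg0] + (uint32_t)read_sleb128(&p, end);
        } else {
            return -1; /* operation outside this sketch's subset */
        }
    }
    if (sp == 0) return -1;
    *out = stack[sp - 1];
    return 0;
}
```

For instance, the two-byte expression 0x7d 0x08 (DW_OP_breg13 with offset 8) describes a variable located 8 bytes above the address held in r13.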

Nodes describing the code:

Procedures (functions) - nodes with the DW_TAG_subprogram tag. Descendant nodes can contain descriptions of variables - function parameters and function local variables.
Compilation Unit. Contains information about the program and is the parent of all other nodes.

The information described above is in the ".debug_info" and ".debug_abbrev" sections.

Other information:

Information about line numbers (".debug_line" section)
Macro info (".debug_macinfo" section)
Call Frame Information (section ".debug_frame")

ELF Creation

We will create ELF files using the libelf library from the elfutils package. There is a good article on the web about using libelf, LibELF by Example (unfortunately, it covers file creation only briefly), as well as the library documentation.
Creating a file consists of several steps:

libelf initialization
Creating a File Header (ELF Header)
Creating a Program Header Table
Create sections
Write file

Consider the steps in more detail

libelf initialization

First you will need to call the elf_version(EV_CURRENT) function and check the result. If it is equal to EV_NONE, an error has occurred and no further actions can be taken. Then we need to create the file we need on disk, get its descriptor and pass it to the elf_begin function:
Elf * elf_begin(int fd, Elf_Cmd cmd, Elf *elf)

fd - the file descriptor of the newly opened file
cmd - mode (ELF_C_READ for reading information, ELF_C_WRITE for writing or ELF_C_RDWR for reading/writing), it must match the mode of the open file (ELF_C_WRITE in our case)
elf - needed only for working with archive files (.a), in our case, you need to pass 0

The function returns a pointer to the generated handle that will be used in all libelf functions, 0 is returned on error.

Create a header

The new file header is created by the elf32_newehdr function:
Elf32_Ehdr * elf32_newehdr(Elf *elf);

elf - the handle returned by the elf_begin function

Returns 0 on error or a pointer to a structure - ELF file header:
#define EI_NIDENT 16
typedef struct {
    unsigned char e_ident[EI_NIDENT];
    Elf32_Half e_type;
    Elf32_Half e_machine;
    Elf32_Word e_version;
    Elf32_Addr e_entry;
    Elf32_Off e_phoff;
    Elf32_Off e_shoff;
    Elf32_Word e_flags;
    Elf32_Half e_ehsize;
    Elf32_Half e_phentsize;
    Elf32_Half e_phnum;
    Elf32_Half e_shentsize;
    Elf32_Half e_shnum;
    Elf32_Half e_shstrndx;
} Elf32_Ehdr;

Some of its fields are filled in in a standard way, some we need to fill in:

e_ident - byte array of identification, has the following indexes:
- EI_MAG0, EI_MAG1, EI_MAG2, EI_MAG3 - these 4 bytes should contain the characters 0x7f, "ELF", which the elf32_newehdr function has already done for us
- EI_DATA - indicates the type of data encoding in the file: ELFDATA2LSB or ELFDATA2MSB. You need to set ELFDATA2LSB like this: e_ident[EI_DATA] = ELFDATA2LSB
- EI_VERSION - file header version, already set for us
- EI_PAD - do not touch
e_type - file type, can be ET_NONE - no type, ET_REL - relocatable file, ET_EXEC - executable file, ET_DYN - shared object file, etc. We need to set the file type to ET_EXEC
e_machine - architecture required for this file, for example EM_386 - for Intel architecture, for ARM we need to write here EM_ARM (40) - see ELF for the ARM Architecture
e_version - file version, must be set to EV_CURRENT
e_entry - entry point address, not necessary for us
e_phoff - offset of the program header table in the file, e_shoff - offset of the section header table; do not fill them in
e_flags - processor specific flags, for our architecture (Cortex-M3) should be set to 0x05000000 (ABI version 5)
e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum - do not touch
e_shstrndx - contains the index of the section that holds the string table with section names. Since we do not have any sections yet, we will set this index later.
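The field assignments listed above can be sketched as a small helper. This is only an illustration: it relies on the type and constant definitions from the system header <elf.h> (available on Linux), the function name fill_cortex_m3_ehdr is made up for this example, and in the real libelf flow the pointer would be the one returned by elf32_newehdr.

```c
#include <assert.h>
#include <stddef.h>
#include <elf.h>

/* Fills in the header fields that the text says we must set ourselves.
   Everything else is left as the caller (or libelf) initialized it. */
void fill_cortex_m3_ehdr(Elf32_Ehdr *ehdr)
{
    assert(ehdr != NULL);
    ehdr->e_ident[EI_DATA] = ELFDATA2LSB; /* little-endian data encoding */
    ehdr->e_type    = ET_EXEC;            /* executable file */
    ehdr->e_machine = EM_ARM;             /* 40 */
    ehdr->e_version = EV_CURRENT;
    ehdr->e_flags   = 0x05000000;         /* ABI version 5, as stated in the text */
    /* e_shstrndx is set later, once the .shstrtab section exists. */
}
```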

Creating a Program Header

As already mentioned, the Program Header Table is a table of correspondence between file sections and memory segments, which tells the loader where to write each section. The header is created using the elf32_newphdr function:
Elf32_Phdr * elf32_newphdr(Elf *elf, size_t count);

elf - our handle
count - the number of table elements to create. Since we will have only one section (with program code), then count will be equal to 1.

Returns 0 on error, or a pointer to the program header.
Each element in the header table is described by the following structure:
typedef struct {
    Elf32_Word p_type;
    Elf32_Off p_offset;
    Elf32_Addr p_vaddr;
    Elf32_Addr p_paddr;
    Elf32_Word p_filesz;
    Elf32_Word p_memsz;
    Elf32_Word p_flags;
    Elf32_Word p_align;
} Elf32_Phdr;

p_type - type of segment (section), here we must specify PT_LOAD - loadable segment
p_offset - offset in the file at which the data of the section to be loaded into memory begins. Our .text section will be located immediately after the file header and the program header, so we can calculate the offset as the sum of the lengths of these headers. The length of any type can be obtained using the elf32_fsize function:
size_t elf32_fsize(Elf_Type type, size_t count, unsigned int version);
- type - an ELF_T_xxx constant; we will need the sizes of ELF_T_EHDR and ELF_T_PHDR
- count - the number of elements of the given type
- version - must be set to EV_CURRENT
p_vaddr, p_paddr - virtual and physical address where the contents of the section will be loaded. Since we do not have virtual addresses, we set it equal to the physical one, in the simplest case - 0, because this is where our program will be loaded.
p_filesz, p_memsz - section size in file and memory. We have them the same, but since there is no section with the program code yet, we will install them later
p_flags - permissions for the loaded memory segment. Can be PF_R - read, PF_W - write, PF_X - execute, or a combination of them. Set p_flags to PF_R + PF_X
p_align - segment alignment, we have 4
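As a rough sketch of the values described above, the helper below fills a single PT_LOAD entry. It uses the plain structure definitions from <elf.h> and, as a simplifying assumption, computes p_offset with sizeof rather than elf32_fsize; for these two 32-bit structures the in-memory and file sizes coincide (52 and 32 bytes). The name fill_text_phdr is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <elf.h>

/* Fills the only program header entry: a read/execute loadable segment that
   starts right after the ELF header and the one-entry program header table. */
void fill_text_phdr(Elf32_Phdr *phdr, Elf32_Addr load_addr)
{
    assert(phdr != NULL);
    phdr->p_type   = PT_LOAD;
    phdr->p_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr);
    phdr->p_vaddr  = load_addr;   /* no virtual memory: same as physical */
    phdr->p_paddr  = load_addr;
    phdr->p_filesz = 0;           /* set later, when the code size is known */
    phdr->p_memsz  = 0;
    phdr->p_flags  = PF_R | PF_X; /* same value as PF_R + PF_X */
    phdr->p_align  = 4;
}
```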

Create sections

After creating the headers, you can start creating sections. An empty section is created using the elf_newscn function:
Elf_Scn * elf_newscn(Elf *elf);

elf - the handle returned earlier by the elf_begin function

The function returns a section pointer or 0 on error.
After creating a section, you need to fill in the section header and create a section data descriptor.
We can get a pointer to the section header using the elf32_getshdr function:
Elf32_Shdr * elf32_getshdr(Elf_Scn *scn);

scn is the section pointer we got from the elf_newscn function.

The section header looks like this:
typedef struct {
    Elf32_Word sh_name;
    Elf32_Word sh_type;
    Elf32_Word sh_flags;
    Elf32_Addr sh_addr;
    Elf32_Off sh_offset;
    Elf32_Word sh_size;
    Elf32_Word sh_link;
    Elf32_Word sh_info;
    Elf32_Word sh_addralign;
    Elf32_Word sh_entsize;
} Elf32_Shdr;

sh_name - section name - offset in the string table of section names (the .shstrtab section) - see "String Tables" below
sh_type - section content type, set SHT_PROGBITS for section with program code, SHT_STRTAB for sections with string table, SHT_SYMTAB for symbol table
sh_flags - section flags that can be combined, and of which we only need three:
- SHF_ALLOC - means that the section will be loaded into memory
- SHF_EXECINSTR - section contains executable code
- SHF_STRINGS - section contains string table
Accordingly, for the .text section with the program, you need to set the SHF_ALLOC + SHF_EXECINSTR flags
sh_addr - address where the section will be loaded into memory
sh_offset - section offset in the file - do not touch, the library will install for us
sh_size - section size - do not touch
sh_link - contains the number of the linked section, needed to link the section with the corresponding string table (see below)
sh_info - additional information depending on section type, set to 0
sh_addralign - address alignment, do not touch
sh_entsize - if the section consists of several elements of the same length, indicates the length of such an element, do not touch
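For the .text section, the header fields above might be filled as in the following sketch. It uses the Elf32_Shdr definition from <elf.h>; the function name fill_text_shdr and its parameters are assumptions for illustration, and in the libelf flow the pointer would come from elf32_getshdr.

```c
#include <assert.h>
#include <stddef.h>
#include <elf.h>

/* Sets only the fields we are responsible for; sh_offset, sh_size,
   sh_addralign and sh_entsize are left for the library to fill in. */
void fill_text_shdr(Elf32_Shdr *shdr, Elf32_Word name_offset, Elf32_Addr load_addr)
{
    assert(shdr != NULL);
    shdr->sh_name  = name_offset;                /* offset of ".text" in .shstrtab */
    shdr->sh_type  = SHT_PROGBITS;               /* program code */
    shdr->sh_flags = SHF_ALLOC | SHF_EXECINSTR;  /* loaded into memory, executable */
    shdr->sh_addr  = load_addr;
    shdr->sh_info  = 0;
}
```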

After filling in the header, you need to create a section data descriptor using the elf_newdata function:
Elf_Data * elf_newdata(Elf_Scn *scn);

scn is the newly acquired pointer to the new section.

The function returns 0 on error, or a pointer to an Elf_Data structure to fill in:
typedef struct {
    void* d_buf;
    Elf_Type d_type;
    size_t d_size;
    off_t d_off;
    size_t d_align;
    unsigned d_version;
} Elf_Data;

d_buf - pointer to the data to be written to the section
d_type - data type, ELF_T_BYTE is suitable for us everywhere
d_size - data size
d_off - section offset, set to 0
d_align - alignment, can be set to 1 - no alignment
d_version - version, must be set to EV_CURRENT

Special sections

For our purposes, we will need to create the minimum necessary set of sections:

.text - section with program code
.symtab - symbol table of the file
.strtab - a table of strings containing symbol names from the .symtab section, since the latter does not store the names themselves, but their indices
.shstrtab - string table containing section names

All sections are created as described in the previous section, but each special section has its own characteristics.

Section .text

This section contains executable code, so you need to set sh_type to SHT_PROGBITS, sh_flags to SHF_EXECINSTR + SHF_ALLOC, sh_addr to the address where this code will be loaded

Section .symtab

The section contains a description of all the symbols (functions) of the program and the files in which they were described. It consists of the following elements, each 16 bytes long:
typedef struct ( Elf32_Word st_name; Elf32_Addr st_value; Elf32_Word st_size; unsigned char st_info; unsigned char st_other; Elf32_Half st_shndx; ) Elf32_Sym;

st_name - symbol name (index in the .strtab string table)
st_value - value (entry address for a function or 0 for a file). Since the Cortex-M3 uses the Thumb-2 instruction set, this address must be odd (real address + 1)
st_size - function code length (0 for file)
st_info - symbol type and its scope. There is a macro to define the value of this field
#define ELF32_ST_INFO(b,t) (((b)<<4)+((t)&0xf))
where b is the scope (binding) and t is the symbol type
The scope can be STB_LOCAL (the symbol is not visible from other object files) or STB_GLOBAL (visible). To simplify, we use STB_GLOBAL.
Symbol type - STT_FUNC for function, STT_FILE for file
st_other - set to 0
st_shndx - index of the section for which the symbol is defined (the index of the .text section), or SHN_ABS for the file.
The section index by its scn handle can be determined using elf_ndxscn:
size_t elf_ndxscn(Elf_Scn *scn);

This section is created in the usual way, only sh_type needs to be set to SHT_SYMTAB, and the index of the .strtab section needs to be written to the sh_link field, so that these sections become linked.
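The symbol fields described in this section can be sketched with two small constructors. They use Elf32_Sym and the ELF32_ST_INFO macro from <elf.h>; the function names and parameters are hypothetical, and STB_GLOBAL is used for both symbols only because the text chooses it for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <elf.h>

/* Symbol for a Thumb function located in the section with index text_index. */
Elf32_Sym make_thumb_func_sym(Elf32_Word name_offset, Elf32_Addr addr,
                              Elf32_Word size, Elf32_Half text_index)
{
    Elf32_Sym sym = {0};
    assert((addr & 1u) == 0);   /* the real code address is expected to be even */
    sym.st_name  = name_offset; /* offset of the name in .strtab */
    sym.st_value = addr | 1u;   /* real address + 1 marks Thumb code */
    sym.st_size  = size;
    sym.st_info  = ELF32_ST_INFO(STB_GLOBAL, STT_FUNC);
    sym.st_other = 0;
    sym.st_shndx = text_index;
    return sym;
}

/* Symbol for a source file: value and size are 0, section is SHN_ABS. */
Elf32_Sym make_file_sym(Elf32_Word name_offset)
{
    Elf32_Sym sym = {0};
    sym.st_name  = name_offset;
    sym.st_info  = ELF32_ST_INFO(STB_GLOBAL, STT_FILE);
    sym.st_shndx = SHN_ABS;
    return sym;
}
```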

Section .strtab

This section contains the names of all symbols from the .symtab section. Created like a regular section, but sh_type needs to be set to SHT_STRTAB, sh_flags to SHF_STRINGS, so this section becomes a string table.
The data for the section can be collected when passing through the source text into an array, the pointer to which is then written to the section data descriptor (d_buf).

Section .shstrtab

This section is a string table that contains the names of all sections of the file, including its own name. It is created in the same way as the .strtab section. After creation, its index must be written to the e_shstrndx field of the file header.

String tables

String tables contain consecutive strings, each ending in a null byte; the first byte of the table must also be 0. The index of a string in the table is simply its offset in bytes from the start of the table, so the first string "name" has index 1, and the next string "var" has index 6.
Index:   0  1  2  3  4  5  6  7  8  9
Content: \0 n  a  m  e  \0 v  a  r  \0
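A string table of this kind is easy to build incrementally. The sketch below is one possible way to do it; the StrTab type, the fixed 256-byte buffer, and the function names are assumptions for this example. The resulting buffer and length are what would be assigned to d_buf and d_size of the section data descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char   buf[256]; /* table contents */
    size_t len;      /* number of bytes used */
} StrTab;

/* Starts the table with the mandatory leading null byte. */
void strtab_init(StrTab *t)
{
    assert(t != NULL);
    t->buf[0] = '\0';
    t->len = 1;
}

/* Appends s with its terminating null byte and returns its offset in the
   table, or 0 if the string does not fit (offset 0 is the empty string). */
size_t strtab_add(StrTab *t, const char *s)
{
    size_t n, off;
    assert(t != NULL && s != NULL);
    n = strlen(s) + 1;
    if (n > sizeof t->buf - t->len)
        return 0;
    off = t->len;
    memcpy(t->buf + off, s, n);
    t->len += n;
    return off;
}
```

Adding "name" and then "var" to a freshly initialized table yields the offsets 1 and 6 from the example above.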

Write file

So, the headers and sections are already formed, now they need to be written to a file and finished with libelf. The write is done by the elf_update function:
off_t elf_update(Elf *elf, Elf_Cmd cmd);

elf - handle
cmd - command, must be equal to ELF_C_WRITE to write.

The function returns -1 on error. The error text can be obtained by calling elf_errmsg(-1), which returns a pointer to a string with the error message.
We finish working with the library by calling the elf_end function and passing it our handle. After that, it only remains to close the previously opened file.
However, our generated file does not contain debugging information, which we will add in the next section.

Creation of DWARF

We will create debugging information using the libdwarf library, which comes with a PDF file with documentation (libdwarf2p.1.pdf - A Producer Library Interface to DWARF).
Creating debug information consists of the following steps:

Creation of nodes (DIE - Debugging Information Entry)
Creating Node Attributes
Creating Data Types
Creation of procedures (functions)

Consider the steps in more detail

Initializing libdwarf producer

We will generate debugging information at compile time at the same time as creating symbols in the .symtab section, so library initialization should be done after initializing libelf, creating the ELF header and program header, before creating the sections.
For initialization, we will use the dwarf_producer_init_c function. The library has several more initialization functions (dwarf_producer_init, dwarf_producer_init_b), which differ in some nuances described in the documentation. In principle, any of them can be used.

Dwarf_P_Debug dwarf_producer_init_c(Dwarf_Unsigned flags, Dwarf_Callback_Func_c func, Dwarf_Handler errhand, Dwarf_Ptr errarg, void * user_data, Dwarf_Error *error)

flags - a bitwise OR of several constants that define some parameters, for example, the bit width of the information, byte order (little-endian, big-endian), and relocation format; of these we definitely need DW_DLC_WRITE and DW_DLC_SYMBOLIC_RELOCATIONS
func - callback function that will be called when creating ELF sections with debugging information. For more details, see the "Creating sections with debug information" section below.
errhand is a pointer to a function to be called when an error occurs. Can be passed 0
errarg - data that will be passed to the errhand function, can be set to 0
user_data - data that will be passed to the func function, can be set to 0
error - returned error code

The function returns Dwarf_P_Debug - a descriptor used in all subsequent functions, or -1 in case of an error, while error will contain an error code (you can get the text of an error message by its code using the dwarf_errmsg function, passing this code to it)

Node Creation (DIE - Debugging Information Entry)

As described above, debug information forms a tree structure. In order to create a node of this tree, you need:

create it with the dwarf_new_die function
add attributes to it (each attribute type is added by its own function, which will be described later)

The node is created using the dwarf_new_die function:
Dwarf_P_Die dwarf_new_die(Dwarf_P_Debug dbg, Dwarf_Tag new_tag, Dwarf_P_Die parent, Dwarf_P_Die child, Dwarf_P_Die left_sibling, Dwarf_P_Die right_sibling, Dwarf_Error *error)

new_tag - node tag (type) - constant DW_TAG_xxxx, which can be found in the libdwarf.h file
parent, child, left_sibling, right_sibling - respectively the parent, child, left and right neighbors of the node. It is not necessary to specify all these parameters, it is enough to specify one, instead of the rest put 0. If all parameters are 0, the node will be either root or isolated
error - will contain the error code when it occurs

The function returns DW_DLV_BADADDR on failure or a Dwarf_P_Die node handle on success

Creating Node Attributes

There is a whole family of dwarf_add_AT_xxxx functions to create node attributes. Sometimes it is problematic to determine which function needs to create the necessary attribute, so I even dug into the source code of the library several times. Some of the functions will be described here, some below - in the relevant sections. They all take an ownerdie parameter, a handle to the node to which the attribute will be added, and return an error code in the error parameter.
The dwarf_add_AT_name function adds the "name" attribute (DW_AT_name) to a node. Most nodes must have a name (for example, procedures, variables, constants), some may not have a name (for example, the Compilation Unit)
Dwarf_P_Attribute dwarf_add_AT_name(Dwarf_P_Die ownerdie, char *name, Dwarf_Error *error)

name - the actual value of the attribute (node name)

The functions dwarf_add_AT_signed_const, dwarf_add_AT_unsigned_const add the specified attribute and its signed (unsigned) value to the node. Signed and unsigned attributes are used to set constant values, sizes, line numbers, and so on. Function format:
Dwarf_P_Attribute dwarf_add_AT_(un)signed_const(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Signed value, Dwarf_Error *error)

dbg - Dwarf_P_Debug descriptor received during library initialization
attr - the attribute whose value is set - the DW_AT_xxxx constant, which can be found in the libdwarf.h file
value - the value of the attribute

Return DW_DLV_BADADDR on error, or an attribute handle on success.

Creating a Compilation Unit

Any tree must have a root - in our case it is a compilation unit that contains information about the program (for example, the name of the main file, the programming language used, the name of the compiler, the case sensitivity of identifiers (variables, functions), the main function of the program, the starting address, etc.). In principle, no attributes are required. As an example, let's add information about the main file and the compiler.

Information about the main file

The "name" attribute (DW_AT_name) is used to store information about the main file, use the dwarf_add_AT_name function as shown in the "Creating Node Attributes" section.

Compiler Information

We use the dwarf_add_AT_producer function:
Dwarf_P_Attribute dwarf_add_AT_producer(Dwarf_P_Die ownerdie, char *producer_string, Dwarf_Error *error)

producer_string - string with information text

Returns DW_DLV_BADADDR on error, or an attribute handle on success.

Creating a Common Information Entry

Usually, when a function (subroutine) is called, its parameters and return address are placed on the stack (although each compiler may do this in its own way); all of this is called a Call Frame. The debugger needs information about the frame format in order to correctly determine the return address from the function and build a backtrace - the chain of function calls that led us to the current function, and the parameters of these functions. The frame usually also holds the processor registers that are saved on the stack. The code that reserves space on the stack and saves the processor registers is called the function prologue; the code that restores the registers and the stack is called the epilogue.
This information is highly compiler dependent. For example, the prologue and epilogue need not be at the very beginning and end of the function; sometimes a frame is used, sometimes not; processor registers can be stored in other registers, and so on.
So, the debugger needs to know how processor registers change their value and where they will be stored when entering the procedure. This information is called Call Frame Information - frame format information. For each address in the program (containing code), the address of the frame in memory (Canonical Frame Address - CFA) and information about the processor registers are indicated, for example, you can specify that:

the register is not preserved in the procedure
the register does not change its value in the procedure
the register is stored on the stack at address CFA+n
the register is stored in another register
the register is stored in memory at some address, which can be calculated in a rather non-obvious way
etc.

Since the information must be specified for each address in the code, it is very voluminous and is stored in a compressed form in the .debug_frame section. Since it changes little from address to address, only its changes are encoded in the form of DW_CFA_xxxx instructions. Each instruction indicates one change, for example:

DW_CFA_set_loc - points to the current address in the program
DW_CFA_advance_loc - Advances the pointer by some number of bytes
DW_CFA_def_cfa - defines the CFA as the value of a processor register plus a numeric offset
DW_CFA_def_cfa_register - changes the register from which the CFA is computed, keeping the current offset
DW_CFA_def_cfa_expression - specifies how the stack frame address should be calculated
DW_CFA_same_value - indicates that the register is not changed
DW_CFA_register - indicates that the register is stored in another register
etc.

The elements of the .debug_frame section are entries that can be of two types: Common Information Entry (CIE) and Frame Description Entry (FDE). The CIE contains information that is common to many FDE entries, roughly speaking it describes a particular type of procedure. FDE describe each specific procedure. When entering a procedure, the debugger first executes instructions from CIE and then from FDE.
My compiler generates procedures where the CFA is in register sp (r13). Let's create CIE for all procedures. There is a function dwarf_add_frame_cie for this:
Dwarf_Unsigned dwarf_add_frame_cie(Dwarf_P_Debug dbg, char *augmenter, Dwarf_Small code_align, Dwarf_Small data_align, Dwarf_Small ret_addr_reg, Dwarf_Ptr init_bytes, Dwarf_Unsigned init_bytes_len, Dwarf_Error *error);

augmenter - UTF-8 encoded string, the presence of which indicates that there is additional platform-specific information for the CIE or FDE. Pass an empty string
code_align - code alignment in bytes (we have 2)
data_align - data alignment in the frame (set -4, which means all parameters take 4 bytes on the stack and it grows down in memory)
ret_addr_reg - register containing the return address from the procedure (we have 14)
init_bytes - array containing DW_CFA_xxxx instructions. Unfortunately, there is no convenient way to generate this array. You can build it by hand or look it up in an ELF file generated by a C compiler, which is what I did. For my case, it contains 3 bytes: 0x0C, 0x0D, 0, which stands for DW_CFA_def_cfa: r13 ofs 0 (CFA is in register r13, offset is 0)
init_bytes_len - length of init_bytes array

The function returns DW_DLV_NOCOUNT on error, or a CIE handle that should be used when creating an FDE for each procedure, which we will discuss later in the "Creating an FDE procedure" section.
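The three init_bytes mentioned above can also be produced programmatically. The sketch below encodes a DW_CFA_def_cfa instruction: the opcode 0x0c followed by the register number and the offset, both as unsigned LEB128 values, as defined in the DWARF standard. The function names are made up for this example and are not part of libdwarf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { DW_CFA_def_cfa = 0x0c }; /* opcode value from the DWARF standard */

/* Writes value as unsigned LEB128 (at most 5 bytes) and returns the length. */
static size_t write_uleb128(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = value & 0x7f;
        value >>= 7;
        if (value != 0)
            byte |= 0x80; /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Encodes "DW_CFA_def_cfa reg, offset" into out, which must have room for
   at least 11 bytes, and returns the number of bytes written. */
size_t encode_def_cfa(uint8_t *out, uint32_t reg, uint32_t offset)
{
    size_t n = 0;
    assert(out != NULL);
    out[n++] = DW_CFA_def_cfa;
    n += write_uleb128(out + n, reg);
    n += write_uleb128(out + n, offset);
    return n;
}
```

Calling encode_def_cfa with reg 13 and offset 0 produces the bytes 0x0C, 0x0D, 0 used as init_bytes above.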

Creating Data Types

Before creating procedures and variables, you must first create nodes corresponding to data types. There are many data types, but they are all based on basic types (elementary types like int, double, etc.), other types are built from basic ones.
The base type is the node with the DW_TAG_base_type tag. It must have the following attributes:

"name" (DW_AT_name)
"encoding" (DW_AT_encoding) - means exactly what data this base type describes (for example, DW_ATE_boolean - boolean, DW_ATE_float - floating point, DW_ATE_signed - signed integer, DW_ATE_unsigned - unsigned integer, etc.)
"size" (DW_AT_byte_size - size in bytes or DW_AT_bit_size - size in bits)

The node may also contain other optional attributes.
For example, to create a 32-bit integer signed base type "int", we will need to create a node with the DW_TAG_base_type tag and set its attributes DW_AT_name - "int", DW_AT_encoding - DW_ATE_signed, DW_AT_byte_size - 4.
After creating the base types, you can create derivatives from them. Such nodes must contain the attribute DW_AT_type - a reference to their base type. For example, a pointer to int - a node with the DW_TAG_pointer_type tag must contain a reference to the previously created "int" type in the DW_AT_type attribute.
An attribute with a reference to another node is created by the dwarf_add_AT_reference function:
Dwarf_P_Attribute dwarf_add_AT_reference(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Die otherdie, Dwarf_Error *error)

attr - attribute, in this case DW_AT_type
otherdie - a handle to the node of the type being referenced
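To illustrate how derived types refer to base types through DW_AT_type, here is a toy in-memory model of type nodes. It does not use libdwarf at all: the TypeDie structure and describe_type function are invented for this sketch, and only the tag and encoding values come from the DWARF standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Tag and encoding values from the DWARF standard. */
enum {
    DW_TAG_pointer_type = 0x0f,
    DW_TAG_base_type    = 0x24,
    DW_ATE_signed       = 0x05
};

/* Toy model of a type node holding only the attributes discussed above. */
typedef struct TypeDie {
    unsigned tag;
    const char *name;           /* DW_AT_name, NULL if absent */
    unsigned encoding;          /* DW_AT_encoding (base types only) */
    unsigned byte_size;         /* DW_AT_byte_size */
    const struct TypeDie *type; /* DW_AT_type reference, NULL if absent */
} TypeDie;

/* Writes a C-like spelling of the type into buf by following DW_AT_type
   references. Returns the length of the text, like snprintf. */
int describe_type(const TypeDie *die, char *buf, size_t size)
{
    assert(die != NULL && buf != NULL);
    if (die->tag == DW_TAG_pointer_type) {
        int n, m;
        if (die->type == NULL)
            return snprintf(buf, size, "void *");
        n = describe_type(die->type, buf, size);
        if (n < 0 || (size_t)n >= size)
            return n;
        m = snprintf(buf + n, size - (size_t)n, " *");
        return m < 0 ? m : n + m;
    }
    return snprintf(buf, size, "%s", die->name ? die->name : "?");
}
```

With an "int" base type node and a pointer node whose type field points at it, describe_type produces "int *" for the pointer node.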

Creating Procedures

To create procedures, I need to explain one more type of debugging information - Line Number Information. It serves to map each machine instruction to a specific line of the source code and also to enable line-by-line debugging of the program. This information is stored in the .debug_line section. If we had enough space, then it would be stored as a matrix, one row for each instruction with columns like this:

source file name
line number in this file
column number in the file
whether the instruction is the start of a statement or block of statements
etc.

Such a matrix would be very large, so it has to be compressed. Firstly, duplicate lines are removed, and secondly, not the lines themselves are saved, but only changes to them. These changes look like commands for a finite state machine, and the information itself is already considered a program that will be “executed” by this machine. The commands of this program look like this: DW_LNS_advance_pc - advance the program counter to some address, DW_LNS_set_file - set the file in which the procedure is defined, DW_LNS_const_add_pc - advance the program counter by a few bytes, etc.
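The idea of "executing" the line number program can be shown with a much reduced interpreter. The sketch below handles only four standard opcodes (values from the DWARF standard) and assumes that the minimum instruction length from the program header is 1, so DW_LNS_advance_pc operands are plain byte counts; it ignores the header, special opcodes, and extended opcodes entirely. LineRow and run_line_program are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard opcode values from the DWARF standard (small subset). */
enum {
    DW_LNS_copy         = 0x01, /* append the current state as a matrix row */
    DW_LNS_advance_pc   = 0x02, /* address += ULEB128 operand */
    DW_LNS_advance_line = 0x03, /* line += SLEB128 operand */
    DW_LNS_set_file     = 0x04  /* file = ULEB128 operand */
};

typedef struct { uint32_t address, file, line; } LineRow;

static uint32_t read_leb128(const uint8_t **p, const uint8_t *end, int is_signed)
{
    uint32_t result = 0;
    unsigned shift = 0;
    uint8_t byte = 0;
    while (*p < end) {
        byte = *(*p)++;
        if (shift < 32) result |= (uint32_t)(byte & 0x7f) << shift;
        shift += 7;
        if (!(byte & 0x80)) break;
    }
    if (is_signed && shift < 32 && (byte & 0x40))
        result |= ~(uint32_t)0 << shift; /* sign-extend (two's complement) */
    return result;
}

/* Runs the program and stores the emitted rows. Returns the number of rows,
   or -1 on an unsupported opcode or when rows[] is too small. */
int run_line_program(const uint8_t *prog, size_t len, LineRow *rows, int max_rows)
{
    LineRow state = { 0, 1, 1 }; /* initial state: address 0, file 1, line 1 */
    const uint8_t *p = prog, *end = prog + len;
    int count = 0;

    assert(prog != NULL && rows != NULL);
    while (p < end) {
        switch (*p++) {
        case DW_LNS_copy:
            if (count >= max_rows) return -1;
            rows[count++] = state;
            break;
        case DW_LNS_advance_pc:
            state.address += read_leb128(&p, end, 0);
            break;
        case DW_LNS_advance_line:
            state.line += read_leb128(&p, end, 1); /* wraps for negative deltas */
            break;
        case DW_LNS_set_file:
            state.file = read_leb128(&p, end, 0);
            break;
        default:
            return -1;
        }
    }
    return count;
}
```

Each DW_LNS_copy emits one row of the matrix, so a program only has to describe how the state changes between rows.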
It is difficult to create this information at such a low level, so the libdwarf library provides several functions to make this task easier.
It is expensive to store the file name for each instruction, so instead of the name, its index is stored in a special table. To create a file index, use the dwarf_add_file_decl function:
Dwarf_Unsigned dwarf_add_file_decl(Dwarf_P_Debug dbg, char *name, Dwarf_Unsigned dir_idx, Dwarf_Unsigned time_mod, Dwarf_Unsigned length, Dwarf_Error *error)

name - the name of the file
dir_idx - index of the folder where the file is located. The index can be obtained using the dwarf_add_directory_decl function. If full paths are used, you can set 0 as the folder index and not use dwarf_add_directory_decl at all
time_mod - file modification time, can be omitted (0)
length - file size, also optional (0)

The function will return the index of the file, or DW_DLV_NOCOUNT on error.
To create line number information, there are three functions dwarf_add_line_entry_b, dwarf_lne_set_address, dwarf_lne_end_sequence, which we will consider below.
Creating debugging information for a procedure goes through several steps:

creating a procedure symbol in the .symtab section
creating a procedure node with attributes
creating an FDE procedure
creating procedure parameters
generating line number information

Creating a procedure symbol

The procedure symbol is created as described above in the section on .symtab. In that section, the symbols of procedures are interspersed with the symbols of the files in which the source code of these procedures is located. First we create the file symbol, then the procedures. This makes the file current, and if the next procedure is in the current file, the file symbol does not need to be created again.

Creating a Procedure Node with Attributes

First, we create a node using the dwarf_new_die function (see the "Node Creation" section), specifying DW_TAG_subprogram as the tag, and the Compilation Unit (if this is a global procedure) or the corresponding DIE (if local) as the parent. Next, we create the attributes:

procedure name (function dwarf_add_AT_name, see "Creating Node Attributes")
line number in the file where the procedure code begins (attribute DW_AT_decl_line), function dwarf_add_AT_unsigned_const (see "Creating Node Attributes")
procedure start address (attribute DW_AT_low_pc), function dwarf_add_AT_targ_address_b, see below
end address of procedure (attribute DW_AT_high_pc), function dwarf_add_AT_targ_address_b, see below
the type of the result returned by the procedure (the DW_AT_type attribute is a link to a previously created type, see "Creating Data Types"). If the procedure does not return anything, this attribute does not need to be created.

The attributes DW_AT_low_pc and DW_AT_high_pc must be created using the dwarf_add_AT_targ_address_b function specially designed for this:
Dwarf_P_Attribute dwarf_add_AT_targ_address_b(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Unsigned pc_value, Dwarf_Unsigned sym_index, Dwarf_Error *error)

attr - attribute (DW_AT_low_pc or DW_AT_high_pc)
pc_value - address value
sym_index - index of the procedure symbol in the .symtab table. Optional, you can pass 0

The function will return DW_DLV_BADADDR on error.

Creating an FDE procedure

As mentioned above in the “Creating a Common Information Entry” section, for each procedure, you need to create a frame descriptor, which occurs in several stages:

creating a new FDE (see Creating a Common Information Entry)
attaching the created FDE to the general list
adding instructions to the generated FDE

You can create a new FDE with the dwarf_new_fde function:
Dwarf_P_Fde dwarf_new_fde(Dwarf_P_Debug dbg, Dwarf_Error *error)
The function will return a handle to the new FDE or DW_DLV_BADADDR on error.
You can add a new FDE to the list with dwarf_add_frame_fde :
Dwarf_Unsigned dwarf_add_frame_fde(Dwarf_P_Debug dbg, Dwarf_P_Fde fde, Dwarf_P_Die die, Dwarf_Unsigned cie, Dwarf_Addr virt_addr, Dwarf_Unsigned code_len, Dwarf_Unsigned sym_idx, Dwarf_Error* error)

fde - the handle just received
die - Procedure DIE (see Creating a Procedure Node with Attributes)
cie - CIE descriptor (see Creating a Common Information Entry)
virt_addr - the starting address of our procedure
code_len - procedure length in bytes

The function will return DW_DLV_NOCOUNT on error.
After all this, we can add DW_CFA_xxxx instructions to our FDE. This is done with the dwarf_add_fde_inst and dwarf_fde_cfa_offset functions. The first adds the given instruction to the list:
Dwarf_P_Fde dwarf_add_fde_inst(Dwarf_P_Fde fde, Dwarf_Small op, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error)

op - instruction code (DW_CFA_xxxx)
val1, val2 - instruction parameters (different for each instruction, see the Standard, section 6.4.2 Call Frame Instructions)

The dwarf_fde_cfa_offset function adds the DW_CFA_offset instruction:
Dwarf_P_Fde dwarf_fde_cfa_offset(Dwarf_P_Fde fde, Dwarf_Unsigned reg, Dwarf_Signed offset, Dwarf_Error *error)

fde - handle to the created FDE
reg - the register that is written to the frame
offset - its offset in the frame (not in bytes, but in frame elements, see Creating a Common Information Entry, data_align)

For example, suppose the compiler generates a procedure whose prologue saves register lr (r14) to the stack frame. First, add the DW_CFA_advance_loc instruction with its first parameter equal to 1, which advances the pc register by 2 bytes (1 * code_align, see Creating a Common Information Entry). Then add DW_CFA_def_cfa_offset with the parameter 4 (this sets the CFA offset to 4 bytes), and call dwarf_fde_cfa_offset with reg=14 and offset=1, which records that r14 is saved in the frame at an offset of -4 bytes from the CFA (1 * data_align).
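To make the arithmetic in this example concrete, here is a minimal sketch of the bytes these three instructions encode to in the frame section. It does not use libdwarf; the constant and function names (CFA_ADVANCE_LOC, put_uleb128, encode_lr_save_cfa) are my own for illustration, while the opcode values come from the DWARF standard. The factors code_align = 2 and data_align = -4 are the ones assumed in the example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard (names here are illustrative). */
#define CFA_ADVANCE_LOC    0x40u /* high 2 bits 01, low 6 bits: factored delta  */
#define CFA_OFFSET         0x80u /* high 2 bits 10, low 6 bits: register number */
#define CFA_DEF_CFA_OFFSET 0x0eu /* operand: ULEB128 offset in bytes            */

/* Append an unsigned LEB128 value to out; returns the number of bytes written. */
static size_t put_uleb128(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;            /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Encode the prologue from the example; out must hold at least 5 bytes. */
size_t encode_lr_save_cfa(uint8_t *out)
{
    size_t n = 0;
    out[n++] = CFA_ADVANCE_LOC | 1;  /* 1 * code_align (2) = 2 bytes of code  */
    out[n++] = CFA_DEF_CFA_OFFSET;
    n += put_uleb128(out + n, 4);    /* CFA offset is now 4 bytes             */
    out[n++] = CFA_OFFSET | 14;      /* r14 (lr) is saved in the frame...     */
    n += put_uleb128(out + n, 1);    /* ...at 1 * data_align (-4) from CFA    */
    return n;
}
```

Note that the operand of DW_CFA_offset is stored factored (1, not -4), which is why dwarf_fde_cfa_offset takes the offset in frame elements rather than in bytes.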

Creating Procedure Parameters

Creating procedure parameters is similar to creating ordinary variables, see "Creating Variables and Constants"

Generating line number information

This information is created like this:

at the beginning of the procedure, we begin the instruction block with the dwarf_lne_set_address function
for each line of code (or machine instruction) we create information about the source code (dwarf_add_line_entry_b)
at the end of the procedure, we complete the block of instructions with the dwarf_lne_end_sequence function

The dwarf_lne_set_address function sets the address where the instruction block starts:
Dwarf_Unsigned dwarf_lne_set_address(Dwarf_P_Debug dbg, Dwarf_Addr offs, Dwarf_Unsigned symidx, Dwarf_Error *error)

offs - procedure address (address of the first machine instruction)
symidx - symbol index (optional, you can specify 0)

The dwarf_add_line_entry_b function adds information about lines of source code to the .debug_line section. I call this function for every machine instruction:
Dwarf_Unsigned dwarf_add_line_entry_b (Dwarf_P_Debug dbg, Dwarf_Unsigned file_index, Dwarf_Addr code_offset, Dwarf_Unsigned lineno, Dwarf_Signed column_number, Dwarf_Bool is_source_stmt_begin, Dwarf_Bool is_basic_block_begin, Dwarf_Bool is_epilogue_begin, Dwarf_Bool is_prologue_end, Dwarf_Unsigned isa, Dwarf_Unsigned discriminator, Dwarf_Error * error)

file_index - index of the source code file obtained earlier by the dwarf_add_file_decl function (see "Creating Procedures")
code_offset - address of the current machine instruction
lineno - the line number in the source code file
column_number - column number in the source code file
is_source_stmt_begin - 1 if the current instruction is the first in the code in the lineno line (I always use 1)
is_basic_block_begin - 1 if the current instruction is the first in the statement block (I always use 0)
is_epilogue_begin - 1 if the current instruction is the first in the procedure epilogue (not necessary, I always have 0)
is_prologue_end - 1 if the current instruction is the last one in the prologue of the procedure (mandatory!)
isa - instruction set architecture. Be sure to specify DW_ISA_ARM_thumb for ARM Cortex M3!
discriminator. One position (file, line, column) of the source code can correspond to different machine instructions. In this case, different discriminators must be set for sets of such instructions. If there are no such cases, it should be 0

The function returns 0 (success) or DW_DLV_NOCOUNT (error).
Finally, the dwarf_lne_end_sequence function ends the procedure:
Dwarf_Unsigned dwarf_lne_end_sequence(Dwarf_P_Debug dbg, Dwarf_Addr address, Dwarf_Error *error)

address - address of the current machine instruction

Returns 0 (success) or DW_DLV_NOCOUNT (error).
This completes the creation of the procedure.

Creating Variables and Constants

In general, variables are pretty simple. They have a name, a memory location (or processor register) where their data resides, and the type of that data. If the variable is global, its parent should be the Compilation Unit; if it is local, the parent is the corresponding node (this matters especially for procedure parameters, whose parent must be the procedure itself). You can also specify the file, line, and column of the variable's declaration.
In the simplest case, the value of a variable is located at a fixed address, but many variables are created dynamically on the stack or in a register when the procedure is entered, and computing the address of the value can be quite non-trivial. The standard provides a mechanism for describing where the value of a variable is located: location expressions. A location expression (called an address expression below) is a sequence of instructions (DW_OP_xxxx constants) for a Forth-like stack machine; in effect it is a separate language with branches, procedures and arithmetic operations. We will not cover this language in full; only a few instructions are of interest here:

DW_OP_addr - specifies the address of a variable
DW_OP_fbreg - specifies the variable's offset from the frame base (usually the stack pointer)
DW_OP_reg0 ... DW_OP_reg31 - indicates that the variable is stored in the corresponding register
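To show what these three instructions look like once encoded, here is a small sketch that writes them into a byte buffer by hand. It does not call libdwarf; the helper names (loc_addr32, loc_fbreg, loc_reg, put_sleb128) are hypothetical, the opcode values are those of the DWARF standard, and a 4-byte little-endian target address is assumed, as on a 32-bit ARM.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard (names here are illustrative). */
enum { OP_ADDR = 0x03, OP_REG0 = 0x50, OP_FBREG = 0x91 };

/* Append a signed LEB128 value; returns the number of bytes written.
   Assumes >> on a negative value is an arithmetic shift, which holds
   on mainstream compilers. */
static size_t put_sleb128(uint8_t *out, int64_t value)
{
    size_t n = 0;
    int more = 1;
    while (more) {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if ((value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40)))
            more = 0;                /* sign bit of this byte is already right */
        else
            byte |= 0x80;            /* more bytes follow */
        out[n++] = byte;
    }
    return n;
}

/* DW_OP_addr: variable at a fixed address (4-byte little-endian assumed). */
size_t loc_addr32(uint8_t *out, uint32_t address)
{
    out[0] = OP_ADDR;
    out[1] = (uint8_t)(address & 0xff);
    out[2] = (uint8_t)((address >> 8) & 0xff);
    out[3] = (uint8_t)((address >> 16) & 0xff);
    out[4] = (uint8_t)((address >> 24) & 0xff);
    return 5;
}

/* DW_OP_fbreg: variable at a signed offset from the frame base. */
size_t loc_fbreg(uint8_t *out, int64_t offset)
{
    out[0] = OP_FBREG;
    return 1 + put_sleb128(out + 1, offset);
}

/* DW_OP_reg0..DW_OP_reg31: variable lives in a register; returns 0 if reg > 31. */
size_t loc_reg(uint8_t *out, unsigned reg)
{
    if (reg > 31)
        return 0;
    out[0] = (uint8_t)(OP_REG0 + reg);
    return 1;
}
```

For instance, a local variable stored 8 bytes below the frame base becomes the two bytes 0x91 0x78, and a variable kept in r3 becomes the single byte 0x53.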

To create a location expression, first create an empty expression (dwarf_new_expr), add instructions to it (dwarf_add_expr_addr, dwarf_add_expr_gen, etc.), and attach it to the node as the value of the DW_AT_location attribute (dwarf_add_AT_location_expr).
The function for creating an empty address expression returns its handle or 0 on error:
Dwarf_P_Expr dwarf_new_expr(Dwarf_P_Debug dbg, Dwarf_Error *error)
To add instructions to an expression, use the dwarf_add_expr_gen function:
Dwarf_Unsigned dwarf_add_expr_gen(Dwarf_P_Expr expr, Dwarf_Small opcode, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error)

opcode - operation code, constant DW_OP_xxxx
val1, val2 - instruction parameters (see standard)

To explicitly set the address of a variable, the dwarf_add_expr_addr function should be used instead of the previous one:
Dwarf_Unsigned dwarf_add_expr_addr(Dwarf_P_Expr expr, Dwarf_Unsigned address, Dwarf_Signed sym_index, Dwarf_Error *error)

expr - handle of the address expression to which the instruction is added
address - variable address
sym_index - symbol index in the .symtab table. Optional, you can pass 0

The function also returns DW_DLV_NOCOUNT on error.
And finally, you can add the created address expression to the node using the dwarf_add_AT_location_expr function:
Dwarf_P_Attribute dwarf_add_AT_location_expr(Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Expr loc_expr, Dwarf_Error *error)

ownerdie - the node to which the expression is added
attr - attribute (in our case DW_AT_location)
loc_expr - a handle to a previously created address expression

The function returns the attribute handle, or DW_DLV_BADADDR on error.
Variables, procedure parameters and constants are ordinary nodes with the tags DW_TAG_variable, DW_TAG_formal_parameter and DW_TAG_constant respectively. They need the following attributes:

variable/constant name (function dwarf_add_AT_name, see "Creating Node Attributes")
line number in the file where the variable is declared (attribute DW_AT_decl_line), function dwarf_add_AT_unsigned_const (see "Creating Node Attributes")
filename index (DW_AT_decl_file attribute), dwarf_add_AT_unsigned_const function (see "Creating Node Attributes")
variable/constant data type (the DW_AT_type attribute is a link to a previously created type, see "Creating Data Types")
address expression (see above) - needed for a variable or procedure parameter
or value - for a constant (attribute DW_AT_const_value, see "Creating Node Attributes")

Creating sections with debug information

After creating all nodes of the debug information tree, you can start forming elf sections with it. This happens in two stages:

first you need to call the dwarf_transform_to_disk_form function, which will call the function we wrote to create the desired elf sections once for each section
for each section, the dwarf_get_section_bytes function will return data to us, which will need to be written to the corresponding section

The function
Dwarf_Signed dwarf_transform_to_disk_form(Dwarf_P_Debug dbg, Dwarf_Error* error)
converts the debug information we created to binary form but does not write anything to disk. It returns the number of elf sections created, or DW_DLV_NOCOUNT on error. For each section, it calls the callback function that we passed to dwarf_producer_init_c when initializing the library. We have to write this callback ourselves. Its specification is:
typedef int (*Dwarf_Callback_Func_c)(char* name, int size, Dwarf_Unsigned type, Dwarf_Unsigned flags, Dwarf_Unsigned link, Dwarf_Unsigned info, Dwarf_Unsigned* sect_name_index, void * user_data, int* error)

name - the name of the elf section to be created
size - section size
type - section type
flags - section flags
link - section link field
info - section information field
sect_name_index - you need to return the index of the section with relocations (optional)
user_data - is passed to us in the same way as we set it in the library initialization function
error - here you can pass the error code

In this function, we must:

create a new section (function elf_newscn, see Creating Sections)
create section header (function elf32_getshdr, ibid.)
fill it out correctly (see ibid.). This is easy because the section header fields correspond to our function parameters. The missing fields sh_addr, sh_offset, sh_entsize will be set to 0, and sh_addralign to 1
return the index of the created section (function elf_ndxscn, see "Section.symtab") or -1 on error (by setting the error code to error)
also, in our case, skip the ".rel" sections by returning 0 from the function
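The decision logic of such a callback can be sketched without libelf. The version below is a simplified stand-in, not the real Dwarf_Callback_Func_c: it takes only the name and size, and instead of calling elf_newscn it records the request in a hypothetical section_list structure. All names in it are my own assumptions.

```c
#include <string.h>

#define MAX_SECTIONS 16

/* Hypothetical bookkeeping; real code would call elf_newscn and fill
   the section header from the callback parameters instead. */
struct section_list {
    const char *names[MAX_SECTIONS]; /* pointers are stored, not copied */
    int sizes[MAX_SECTIONS];
    int count;
};

/* Returns the section index (1 or greater), 0 for a skipped ".rel"
   section, or -1 on error with *error set. */
int on_section_request(const char *name, int size, void *user_data, int *error)
{
    struct section_list *list = user_data;

    if (strncmp(name, ".rel", 4) == 0)
        return 0;                    /* relocation sections are skipped */

    if (list->count >= MAX_SECTIONS) {
        *error = 1;                  /* hypothetical error code */
        return -1;
    }

    list->names[list->count] = name;
    list->sizes[list->count] = size;
    list->count++;

    /* Index 0 belongs to the mandatory empty section, so the first
       real section gets index 1. */
    return list->count;
}
```

In a real callback the returned value would come from elf_ndxscn, as described in the list above.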

Upon completion, the dwarf_transform_to_disk_form function will return the number of sections created. We then loop over the sections starting from 0, performing these steps for each:

create data to write to the section using the dwarf_get_section_bytes function:
Dwarf_Ptr dwarf_get_section_bytes(Dwarf_P_Debug dbg, Dwarf_Signed dwarf_section, Dwarf_Signed *elf_section_index, Dwarf_Unsigned *length, Dwarf_Error* error)
- dwarf_section - section number. Should be in the range 0..n-1, where n is the number returned to us by the dwarf_transform_to_disk_form function
- elf_section_index - returns the index of the section to write data to
- length - the length of this data
- error - not used
The function returns a pointer to the received data, or 0 when there are no more sections to create
create a data descriptor for the current section (function elf_newdata, see Creating Sections) and fill it (see ibid.) by setting:
- d_buf - pointer to the data we received from the previous function
- d_size - the size of this data (ibid.)

End of work with the library

After forming the sections, you can finish working with libdwarf with the dwarf_producer_finish function:
Dwarf_Unsigned dwarf_producer_finish(Dwarf_P_Debug dbg, Dwarf_Error* error)
The function returns DW_DLV_NOCOUNT on error.
Note that nothing is written to disk at this stage. Writing must be done using the functions from the section "Creating an ELF - Writing a file".

Conclusion

That's all.
I repeat, generating debug information is a very broad topic, and I have only scratched the surface here. Those who wish can dig as deep as they like.
If you have questions, I will try to answer them.

ELF format

The ELF format has several types of files that we have referred to differently so far, such as an executable file or an object file. However, the ELF standard distinguishes between the following types:

1. Relocatable file: stores instructions and data that can be linked with other object files. The result of such linking can be an executable file or a shared object file.

2. Shared object file: also contains instructions and data, but can be used in two ways. In the first case, it can be linked with other relocatable files and shared object files, resulting in a new object file. In the second case, when the program is launched, the operating system can dynamically link it with the program's executable file, producing an executable image of the program. In the latter case, we are talking about shared libraries.

3. Executable file: stores a complete description that allows the system to create an image of the process. It contains instructions, data, a description of the required shared object files, and the necessary symbolic and debugging information.

Fig. 2.4 shows the structure of an executable file, from which the operating system can create a program image and run the program.

Fig. 2.4. The structure of the executable file in ELF format

The header has a fixed location in the file. The rest of the components are placed according to the information stored in the header. Thus, the header contains a general description of the file structure, the location of individual components and their sizes.

Since the header of an ELF file defines its structure, let's consider it in more detail (Table 2.3).

Table 2.3. ELF Header Fields

Field	Description
e_ident	An array of bytes, each of which defines some general characteristic of the file: file format (ELF), version number, system architecture (32-bit or 64-bit), etc.
e_type	File type, since the ELF format supports several types
e_machine	The architecture of the hardware platform for which this file was created. Table 2.4 lists the possible values of this field
e_version	Version number of the ELF format. Usually defined as EV_CURRENT (current), which means the latest version
e_entry	Virtual address to which the system will transfer control after loading the program (entry point)
e_phoff	Location (offset from the beginning of the file) of the program header table
e_shoff	Location of the section header table
e_ehsize	Header size
e_phentsize	Size of each program header
e_phnum	Number of program headers
e_shentsize	Size of each section header
e_shnum	Number of section headers
e_shstrndx	Index of the section that contains the section name string table

Table 2.4. Values of the e_machine field of the ELF file header

Value	Hardware platform
EM_M32	AT&T WE 32100
EM_SPARC	Sun SPARC
EM_386	Intel 80386
EM_68K	Motorola 68000
EM_88K	Motorola 88000
EM_486	Intel 80486
EM_860	Intel i860
EM_MIPS	MIPS RS3000 Big Endian
EM_MIPS_RS3_LE	MIPS RS3000 Little Endian
EM_RS6000	RS6000
EM_PA_RISC	PA-RISC
EM_nCUBE	nCUBE
EM_VPP500	Fujitsu VPP500
EM_SPARC32PLUS	Sun SPARC 32+

The information contained in the program header table tells the kernel how to create a process image from the segments. Most segments are copied (mapped) to memory and represent the corresponding segments of a process when it is executed, such as code or data segments.

Each program segment header describes one segment and contains the following information:

Segment type and actions of the operating system with this segment

Location of the segment in the file

The starting address of the segment in the virtual memory of the process

Size of the segment in the file

Size of the segment in memory

Segment access flags (write, read, execute)

Some segments have the LOAD type, which instructs the kernel to create data structures corresponding to these segments, called areas, which define contiguous portions of the process's virtual memory and their associated attributes. The segment, whose location in the ELF file is indicated in the corresponding program header, will be mapped to the created area, the virtual start address of which is also indicated in the program header. Segments of this type include, for example, segments containing program instructions (code) and its data. If the segment size is smaller than the area size, the unused space may be filled with zeros. Such a mechanism is used in particular when creating uninitialized process data (BSS). We'll talk more about areas in Chapter 3.
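The zero-filling rule described here can be sketched in a few lines. The function below is only an illustration that works on an ordinary buffer rather than on a real memory area; the name fill_area and its parameters are my own, not part of any kernel interface.

```c
#include <stddef.h>
#include <string.h>

/* Copy file_size bytes of segment data into an area of mem_size bytes
   and zero the remainder (this is how BSS ends up zero-initialized).
   Returns 0 on success, -1 if the file part is larger than the area. */
int fill_area(unsigned char *area, size_t mem_size,
              const unsigned char *file_data, size_t file_size)
{
    if (file_size > mem_size)
        return -1;

    memcpy(area, file_data, file_size);
    memset(area + file_size, 0, mem_size - file_size);
    return 0;
}
```

A data segment with 3 bytes in the file and 8 bytes in memory thus gets 3 bytes of initialized data followed by 5 zero bytes.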

A segment of type INTERP stores the program interpreter. This segment type is used for programs that require dynamic linking. The essence of dynamic linking is that individual components of the executable (shared object files) are attached not at compile time but when the program is launched. The name of the file that serves as the dynamic link editor is stored in this segment. When the program is started, the kernel creates the process image using the specified linker. Thus it is the dynamic linker, not the original program, that is loaded into memory first. In the next step, the dynamic linker works with the UNIX kernel to create the complete executable image. It loads the necessary shared object files, whose names are stored in separate segments of the original executable file, and performs the required relocation and linking. Finally, control is transferred to the original program.

Finally, the file ends with the section header table. Sections define the parts of a file that are used for linking with other modules at compile time or during dynamic linking. Accordingly, the headers contain all the information needed to describe these sections. As a rule, sections contain more detailed information about segments. For example, a code segment can consist of several sections, such as a hash table for storing the indexes of symbols used in the program, a section for the program's initialization code, a linkage table used by the dynamic linker, and a section containing the actual program instructions.

We will return to the ELF format in Chapter 3 when we discuss the organization of process virtual memory, but for now let's move on to the next common format, COFF.

In this review, we will only talk about the 32-bit version of this format, because we do not need the 64-bit one yet.

Any ELF file (including object modules of this format) consists of the following parts:

ELF file header;
Program section table (may be absent in object modules);
Sections of the ELF file;
Section table (may be absent in an executable module).

For performance reasons, the ELF format does not use bit fields, and all structures are usually 4-byte aligned.

Now let's look at the types used in the headers of ELF files:
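The ELF32 specification defines five basic types for these headers, plus plain unsigned char. Below is a minimal sketch of them in terms of <stdint.h>. The mapping to stdint names is my assumption for illustration; system headers such as <elf.h> provide their own definitions.

```c
#include <stdint.h>

typedef uint32_t Elf32_Addr;  /* unsigned program address, 4 bytes */
typedef uint16_t Elf32_Half;  /* unsigned medium integer,  2 bytes */
typedef uint32_t Elf32_Off;   /* unsigned file offset,     4 bytes */
typedef int32_t  Elf32_Sword; /* signed large integer,     4 bytes */
typedef uint32_t Elf32_Word;  /* unsigned large integer,   4 bytes */
```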

Now consider the file header:

#define EI_NIDENT 16

struct elf32_hdr {
    unsigned char e_ident[EI_NIDENT];
    Elf32_Half e_type;
    Elf32_Half e_machine;
    Elf32_Word e_version;
    Elf32_Addr e_entry;      /* Entry point */
    Elf32_Off  e_phoff;
    Elf32_Off  e_shoff;
    Elf32_Word e_flags;
    Elf32_Half e_ehsize;
    Elf32_Half e_phentsize;
    Elf32_Half e_phnum;
    Elf32_Half e_shentsize;
    Elf32_Half e_shnum;
    Elf32_Half e_shstrndx;
};

The e_ident array contains information about the system and consists of several subfields.

struct {
    unsigned char ei_magic[4];
    unsigned char ei_class;
    unsigned char ei_data;
    unsigned char ei_version;
    unsigned char ei_pad[9];
};

ei_magic - constant value for all ELF files, equal to { 0x7f, 'E', 'L', 'F' }
ei_class - ELF file class (1 - 32 bits, 2 - 64 bits, which we don't consider)
ei_data - determines the byte order for this file. The order depends on the platform and can be little-endian (LSB, 1) or big-endian (MSB, 2). For Intel processors, only the value 1 is allowed.
ei_version - a rather useless field; if it is not equal to 1 (EV_CURRENT), the file is considered invalid.

The ei_pad field is where operating systems store their identification information. This field may be empty. It doesn't matter to us either.
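As a quick illustration of these subfields, here is a sketch of a check that accepts only 32-bit little-endian ELF files of version 1. The function name ident_is_elf32_lsb is hypothetical; the byte positions follow the layout just described (magic in bytes 0..3, then class, byte order and version).

```c
#include <stddef.h>

/* Returns 1 if ident describes a 32-bit little-endian ELF file of
   version 1, and 0 otherwise. len is the number of bytes available. */
int ident_is_elf32_lsb(const unsigned char *ident, size_t len)
{
    if (len < 16)                    /* e_ident is EI_NIDENT = 16 bytes */
        return 0;

    return ident[0] == 0x7f &&       /* ei_magic */
           ident[1] == 'E' &&
           ident[2] == 'L' &&
           ident[3] == 'F' &&
           ident[4] == 1 &&          /* ei_class: 32 bits */
           ident[5] == 1 &&          /* ei_data: LSB */
           ident[6] == 1;            /* ei_version: EV_CURRENT */
}
```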

The e_type header field can take several values; for executable files it must be ET_EXEC, which equals 2.

e_machine - determines the processor on which this executable file can run (for us, the value is EM_386, which equals 3).

The e_version field corresponds to the ei_version field from the header.

The e_entry field defines the starting address of the program, which is placed in eip before starting the program.

The e_phoff field specifies the offset from the beginning of the file where the program section table is located, used to load programs into memory.

I will not list the purpose of all fields, not all are needed for loading. I will describe only two more.

The e_phentsize field defines the size of the entry in the program section table.

And the e_phnum field specifies the number of entries in the program section table.

The section table (the non-program one) is used for linking programs; we will not consider it. We will also not consider dynamically linked modules: that topic is rather complicated and not suitable for a first acquaintance. :)

Now about the program sections. The format of the program section table entry is as follows:

struct elf32_phdr {
    Elf32_Word p_type;
    Elf32_Off  p_offset;
    Elf32_Addr p_vaddr;
    Elf32_Addr p_paddr;
    Elf32_Word p_filesz;
    Elf32_Word p_memsz;
    Elf32_Word p_flags;
    Elf32_Word p_align;
};

More about fields.

p_type - defines the type of the program section. It can take several values, but we are interested in only one: PT_LOAD (1). A section of this type is intended to be loaded into memory.
p_offset - determines the offset in the file at which this section begins.
p_vaddr - specifies the virtual address at which this section should be loaded into memory.
p_paddr - defines the physical address at which this section should be loaded. This field does not have to be used and is meaningful only on some platforms.
p_filesz - determines the size of the section in the file.
p_memsz - determines the size of the section in memory. This value may be greater than the previous one.
p_flags - defines the type of access to the section in memory. Some sections may be executed, some may be written to. In existing systems, all of them are readable.

Loading the ELF format.

We have now sorted out the header a little. Next I will give an algorithm for loading a binary file in the ELF format. The algorithm is schematic; do not treat it as a working program.

int LoadELF (unsigned char *bin)
{
    struct elf32_hdr *EH = (struct elf32_hdr *)bin;
    struct elf32_phdr *EPH;
    unsigned int i;

    if (EH->e_ident[0] != 0x7f ||          // check the magic number
        EH->e_ident[1] != 'E' ||
        EH->e_ident[2] != 'L' ||
        EH->e_ident[3] != 'F' ||
        EH->e_ident[4] != ELFCLASS32 ||    // check the class
        EH->e_ident[5] != ELFDATA2LSB ||   // byte order
        EH->e_ident[6] != EV_CURRENT ||    // version
        EH->e_type != ET_EXEC ||           // type
        EH->e_machine != EM_386 ||         // platform
        EH->e_version != EV_CURRENT)       // and the version again, just in case
        return ELF_WRONG;

    EPH = (struct elf32_phdr *)(bin + EH->e_phoff);

    // walk the program section table without modifying the header
    for (i = 0; i < EH->e_phnum; i++) {
        if (EPH->p_type == PT_LOAD)
            memcpy ((void *)EPH->p_vaddr, bin + EPH->p_offset, EPH->p_filesz);

        EPH = (struct elf32_phdr *)((unsigned char *)EPH + EH->e_phentsize);
    }

    return ELF_OK;
}

Strictly speaking, one should also analyze the EPH->p_flags field and set access rights on the corresponding pages, and a plain copy is not enough here. But that concerns memory allocation rather than the file format, so we will not discuss it now.
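As a first step toward analyzing p_flags, the flag bits can be decoded into the familiar "rwx" form. This is only a sketch: the function name segment_perms is hypothetical, and the bit values (execute = 1, write = 2, read = 4) are the PF_X, PF_W and PF_R constants from the ELF specification. Actually applying the rights to pages is left to the memory manager.

```c
/* p_flags bits from the ELF specification. */
#define SEG_EXEC  0x1ul   /* PF_X */
#define SEG_WRITE 0x2ul   /* PF_W */
#define SEG_READ  0x4ul   /* PF_R */

/* Write the access rights of a segment as a string such as "r-x". */
void segment_perms(unsigned long p_flags, char out[4])
{
    out[0] = (p_flags & SEG_READ)  ? 'r' : '-';
    out[1] = (p_flags & SEG_WRITE) ? 'w' : '-';
    out[2] = (p_flags & SEG_EXEC)  ? 'x' : '-';
    out[3] = '\0';
}
```

A typical code segment has p_flags equal to 5 (read and execute), and a data segment has 6 (read and write).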

PE format.

In many ways it is similar to the ELF format, which is not surprising: it too must contain sections that can be loaded.

Like everything in Microsoft :) the PE format is based on the EXE format. The file structure is:

00h - EXE header (I will not consider it; it is as old as DOS. :)
20h - OEM header (nothing significant in it);
3ch - offset of the real PE header in the file (dword);
stub relocation table;
stub;
PE header;
object table;
file objects.

The stub is a program that runs in real mode and does some preliminary work. It may be absent, but sometimes it is needed.
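Finding the PE header from the dword at offset 3ch can be sketched as follows. The function names are my own; the code reads the value byte by byte so that it does not depend on alignment or on the byte order of the host, and it checks the bounds of the buffer, which a real loader would also have to do.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value byte by byte. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the PE header, or 0 if the image is too
   small or the "PE\0\0" signature is not found at that offset. */
size_t find_pe_header(const unsigned char *bin, size_t len)
{
    uint32_t off;

    if (len < 0x40)                       /* the dword at 3ch must fit */
        return 0;

    off = read_le32(bin + 0x3c);
    if (off > len - 4)                    /* signature must fit too */
        return 0;

    if (read_le32(bin + off) != 0x00004550u)  /* 'P', 'E', 0, 0 */
        return 0;

    return off;
}
```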

We are interested in something else, the PE header.

Its structure is like this:

struct pe_hdr {
    unsigned long  pe_sign;
    unsigned short pe_cputype;
    unsigned short pe_objnum;
    unsigned long  pe_time;
    unsigned long  pe_cofftbl_off;
    unsigned long  pe_cofftbl_size;
    unsigned short pe_nthdr_size;
    unsigned short pe_flags;
    unsigned short pe_magic;
    unsigned short pe_link_ver;
    unsigned long  pe_code_size;
    unsigned long  pe_idata_size;
    unsigned long  pe_udata_size;
    unsigned long  pe_entry;
    unsigned long  pe_code_base;
    unsigned long  pe_data_base;
    unsigned long  pe_image_base;
    unsigned long  pe_obj_align;
    unsigned long  pe_file_align;
    // ... and a lot of other, unimportant things.
};

Lots of things are there. Suffice it to say that the size of this header is 248 bytes.

And the main thing is that most of these fields are not used. (Who builds like that?) Of course, they have a well-defined purpose, but my test program, for example, contains zeros in the fields pe_code_base, pe_code_size, etc., and it works fine. The conclusion suggests itself that the file is loaded based on the object table. That is what we will talk about.

The object table follows immediately after the PE header. The entries in this table have the following format:

struct pe_ohdr {
    unsigned char o_name[8];
    unsigned long o_vsize;
    unsigned long o_vaddr;
    unsigned long o_psize;
    unsigned long o_poff;
    unsigned char o_reserved[12];
    unsigned long o_flags;
};

o_name - section name; it is completely irrelevant for loading;
o_vsize - section size in memory;
o_vaddr - memory address relative to ImageBase;
o_psize - section size in the file;
o_poff - section offset in the file;
o_flags - section flags;

Here it is worth dwelling on the flags in more detail.

00000004h - used for code with 16-bit offsets
00000020h - code section
00000040h - initialized data section
00000080h - uninitialized data section
00000200h - comments or any other type of information
00000400h - overlay section
00000800h - will not be part of the program image
00001000h - common data
00500000h - default alignment unless otherwise specified
02000000h - can be discarded from memory
04000000h - not cached
08000000h - not pageable
10000000h - shared
20000000h - executable
40000000h - readable
80000000h - writable
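To show how a loader might use these flags, here is a small sketch that decodes the access rights and decides whether a section carries data that has to be copied from the file. The macro and function names are hypothetical; the bit values are taken from the list above.

```c
/* Flag values from the list above (names are illustrative). */
#define SCN_CODE    0x00000020ul
#define SCN_IDATA   0x00000040ul
#define SCN_UDATA   0x00000080ul
#define SCN_EXECUTE 0x20000000ul
#define SCN_READ    0x40000000ul
#define SCN_WRITE   0x80000000ul

/* Write the access rights of a section as a string such as "r-x". */
void section_perms(unsigned long flags, char out[4])
{
    out[0] = (flags & SCN_READ)    ? 'r' : '-';
    out[1] = (flags & SCN_WRITE)   ? 'w' : '-';
    out[2] = (flags & SCN_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* Returns 1 for code and initialized data, which must be copied from
   the file, and 0 otherwise (for example, uninitialized data). */
int section_has_file_data(unsigned long flags)
{
    return (flags & (SCN_CODE | SCN_IDATA)) != 0;
}
```

For example, flags of 60000020h describe a readable, executable code section that has to be copied, while C0000080h describes readable and writable uninitialized data with nothing to copy.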

Again, I will not deal with shared and overlay sections; we are interested in code, data and access rights.

In general, this information is already enough to load a binary file.

Loading PE format.

int LoadPE (unsigned char *bin)
{
    // The expression looks cryptic, but it just takes the dword at offset 0x3c
    // and computes the address of the PE header in the file image
    struct pe_hdr *PH = (struct pe_hdr *)(bin + *((unsigned long *)&bin[0x3c]));
    struct pe_ohdr *POH;
    unsigned int i;

    if (PH == NULL ||                    // check the pointer
        PH->pe_sign != 0x4550 ||         // PE signature ('P', 'E', 0, 0)
        PH->pe_cputype != 0x14c ||       // i386
        (PH->pe_flags & 2) == 0)         // the file cannot be run!
        return PE_WRONG;

    POH = (struct pe_ohdr *)((unsigned char *)PH + 0xf8);

    for (i = 0; i < PH->pe_objnum; i++) {
        if ((POH->o_flags & 0x60) != 0)  // either code or initialized data
            memcpy ((void *)(PH->pe_image_base + POH->o_vaddr), bin + POH->o_poff, POH->o_psize);

        POH = (struct pe_ohdr *)((unsigned char *)POH + sizeof (struct pe_ohdr));
    }

    return PE_OK;
}

This is again not a finished program, but a loading algorithm.

And again, many points are not covered, as they go beyond the topic.

But now it’s worth talking a little about the existing system features.

System features.

Despite the flexibility of the protection mechanisms available in processors (descriptor-table protection, segment-level protection, page-level protection), existing systems (both Windows and Unix) fully use only page protection, which can keep code from being written but cannot keep data from being executed. (Maybe this is the reason for the abundance of system vulnerabilities?)

All segments are addressed from linear address zero and extend to the end of linear memory. Process demarcation occurs only at the page table level.

For this reason, modules are not linked at the start of the segment but at a fairly large offset within it. On Windows the base address in the segment is 0x400000; on Unix (Linux or FreeBSD) it is 0x8048000.

Some features are also related to the paging of memory.

ELF files are linked in such a way that the boundaries and sizes of sections fall on 4 kilobyte blocks of the file.

In the PE format, although the format itself allows sections to be aligned to 512 bytes, 4 KB section alignment is used; Windows does not consider a smaller alignment correct.
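The alignment rule can be made concrete with the usual round-up computation. This is only a sketch: the function name is hypothetical, and it assumes the alignment is a power of two, as 512 and 4096 are.

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of align.
 * Assumes align is a non-zero power of two (e.g. 512 or 4096). */
uint32_t align_up(uint32_t value, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}
```

With 4 KB alignment, a section of 0x1234 bytes occupies 0x2000 bytes in the image, while a section of exactly 0x1000 bytes stays at 0x1000.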

If your computer has an antivirus program installed, it can scan all files on the computer as well as each file individually. You can scan any file by right-clicking it and selecting the option to scan the file for viruses.

For example, to scan the file my-file.elf, right-click it and select the option "scan with AVG" in the file menu. This opens AVG Antivirus and scans the file for viruses.

Sometimes an error results from an incorrect software installation, caused by a problem during the installation process. This can prevent your operating system from associating your ELF file with the correct application, affecting the so-called "file extension associations".

Sometimes simply reinstalling Dolphin (emulator) solves the problem by correctly associating ELF with Dolphin (emulator). In other cases, file association problems result from poor programming by the software developer, and you may need to contact the developer for further assistance.

Tip: Try updating Dolphin (emulator) to the latest version to make sure you have the latest patches and updates.

This may seem too obvious, but often the ELF file itself is the cause of the problem. If you received the file as an email attachment or downloaded it from a website and the download was interrupted (for example, by a power outage), the file may be corrupted. If possible, get a fresh copy of the ELF file and try to open it again.

Caution: A corrupted file can be collateral damage from previous or existing malware on your PC, so it is important to keep your antivirus up to date.

If your ELF file is associated with hardware on your computer, you may need to update the device drivers for that hardware in order to open the file.

This problem is usually associated with media file types, which depend on hardware inside the computer, such as a sound card or video card, to open successfully. For example, if you are trying to open an audio file but cannot, you may need to update the sound card drivers.

Tip: If you get a .SYS file related error message when you try to open an ELF file, the problem is probably caused by corrupted or outdated device drivers that need to be updated. Driver update software such as DriverDoc can make this process easier.

If the steps above did not solve the problem and you are still having trouble opening ELF files, this may be due to a lack of available system resources. Some versions of ELF files may require a significant amount of resources (e.g. memory/RAM, processing power) to open properly on your computer. This problem is quite common if you are using fairly old computer hardware together with a much newer operating system.

This problem can occur when the computer struggles to complete a task because the operating system (and other services running in the background) consume too many resources to open the ELF file. Try closing all applications on your PC before opening the Nintendo Wii Game File. Freeing up resources on your computer gives you the best conditions for opening the ELF file.

If you completed all the above steps and your ELF file still won't open, you may need a hardware upgrade. In most cases, even with older hardware, the processing power is still more than enough for most user applications (unless you do a lot of CPU-intensive work such as 3D rendering, financial/scientific modeling, or media-intensive work). It is therefore more likely that your computer does not have enough memory (commonly referred to as "RAM") to open the file.

A version of this answer with a good TOC and more content: http://www.cirosantilli.com/elf-hello-world (this answer hit the 30k character limit)

Standards

ELF is specified by the LSB:

core generic: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/elf-generic.html
core AMD64: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-AMD64/LSB-Core-AMD64/book1.html

LSB mostly refers to other standards with minor extensions, in particular:

generic (both by SCO):

System V ABI 4.1 (1997) http://www.sco.com/developers/devspecs/gabi41.pdf, no 64-bit, although a magic number is reserved for it. The same goes for core files.
System V ABI Update DRAFT 17 (2003) http://www.sco.com/developers/gabi/2003-12-17/contents.html adds 64-bit. It only updates chapters 4 and 5 of the previous document: the others remain valid and are still referenced.

architecture specific:
- IA-32: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-IA32/LSB-Core-IA32/elf-ia32.html , points mainly to http://www.sco.com/developers/devspecs/abi386-4.pdf
- AMD64: http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-AMD64/LSB-Core-AMD64/elf-amd64.html , points mainly to http://www.x86-64.org/documentation/abi.pdf

A handy summary can be found at:

Its structure can be examined in a human-readable way with tools such as readelf and objdump.

Generate the example

Let's break down a minimal Linux x86-64 executable example:

section .data
    hello_world db "Hello world!", 10
    hello_world_len equ $ - hello_world

section .text
    global _start
    _start:
        mov rax, 1
        mov rdi, 1
        mov rsi, hello_world
        mov rdx, hello_world_len
        syscall
        mov rax, 60
        mov rdi, 0
        syscall

Compiled with:

nasm -w+all -f elf64 -o "hello_world.o" "hello_world.asm"
ld -o "hello_world.out" "hello_world.o"

Versions:

NASM 2.10.09
Binutils version 2.24 (contains ld)
Ubuntu 14.04

We don't use a C program, as that would complicate the analysis; that will be level 2 :-)

Hexadecimal representations of the binaries:

hd hello_world.o
hd hello_world.out

Global file structure

An ELF file contains the following parts:

ELF header. Points to the position of the section header table and the program header table.

Section header table (optional in the executable). It has e_shnum section headers, each of which points to the position of a section.

N sections, with N <= e_shnum (optional in the executable)

Program header table (only in the executable). It has e_phnum program headers, each of which points to the position of a segment.

N segments, with N <= e_phnum (optional in the executable)

The order of these parts is not fixed: the only fixed thing is the ELF header, which must come first in the file.

ELF header

The easiest way to observe the header is:

readelf -h hello_world.o
readelf -h hello_world.out

Bytes in the object file:

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  01 00 3e 00 01 00 00 00  00 00 00 00 00 00 00 00  |..>.............|
00000020  00 00 00 00 00 00 00 00  40 00 00 00 00 00 00 00  |........@.......|
00000030  00 00 00 00 40 00 00 00  00 00 40 00 07 00 03 00  |....@.....@.....|

Bytes in the executable:

00000000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  |.ELF............|
00000010  02 00 3e 00 01 00 00 00  b0 00 40 00 00 00 00 00  |..>.......@.....|
00000020  40 00 00 00 00 00 00 00  10 01 00 00 00 00 00 00  |@...............|
00000030  00 00 00 00 40 00 38 00  02 00 40 00 06 00 03 00  |....@.8...@.....|

Structure represented:

typedef struct {
    unsigned char e_ident[EI_NIDENT];
    Elf64_Half    e_type;
    Elf64_Half    e_machine;
    Elf64_Word    e_version;
    Elf64_Addr    e_entry;
    Elf64_Off     e_phoff;
    Elf64_Off     e_shoff;
    Elf64_Word    e_flags;
    Elf64_Half    e_ehsize;
    Elf64_Half    e_phentsize;
    Elf64_Half    e_phnum;
    Elf64_Half    e_shentsize;
    Elf64_Half    e_shnum;
    Elf64_Half    e_shstrndx;
} Elf64_Ehdr;

Manual breakdown:

0 0: EI_MAG = 7f 45 4c 46 = 0x7f 'E', 'L', 'F': ELF magic number

0 4: EI_CLASS = 02 = ELFCLASS64: 64-bit ELF

0 5: EI_DATA = 01 = ELFDATA2LSB: little-endian data

0 6: EI_VERSION = 01: format version

0 7: EI_OSABI (only in the 2003 update) = 00 = ELFOSABI_NONE: no extensions.

0 8: EI_PAD = 8x 00: reserved bytes. Must be set to 0.

1 0: e_type = 01 00 = 1 (little endian) = ET_REL: relocatable format

In the executable it is 02 00, for ET_EXEC.

1 2: e_machine = 3e 00 = 62 = EM_X86_64: AMD64 architecture

1 4: e_version = 01 00 00 00: must be 1

1 8: e_entry = 8x 00: entry point address of execution, or 0 if not applicable, as for an object file, which has no entry point.

In the executable it is b0 00 40 00 00 00 00 00. TODO: what else can we set this to? The kernel seems to put the IP directly at this value; it is not hardcoded.

2 0: e_phoff = 8x 00: program header table offset, 0 if not present.

40 00 00 00 in the executable, meaning it starts right after the ELF header.

2 8: e_shoff = 40 7x 00 = 0x40: section header table file offset, 0 if not present.

3 0: e_flags = 00 00 00 00 TODO. Arch specific.

3 4: e_ehsize = 40 00: size of this ELF header. TODO: why does this field exist? How can it vary?

3 6: e_phentsize = 00 00: size of each program header, 0 if not present.

38 00 in the executable: each program header is 56 bytes long.

3 8: e_phnum = 00 00: number of program header entries, 0 if not present.

02 00 in the executable: there are 2 entries.

3 A: e_shentsize and e_shnum = 40 00 07 00: section header size and number of entries
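The manual breakdown can be reproduced with a few lines of C. This is a minimal sketch, not how readelf does it: the struct and function names are invented for illustration, the field offsets follow the Elf64_Ehdr layout, and the byte array is copied from the hello_world.o dump above (its last two bytes, 03 00, are e_shstrndx at offset 0x3E).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A few Elf64_Ehdr fields, decoded into host integers (illustrative type). */
struct ehdr_info {
    uint16_t type, machine, phentsize, phnum, shentsize, shnum, shstrndx;
    uint64_t entry, phoff, shoff;
};

/* Read an n-byte little-endian unsigned integer (1 <= n <= 8). */
static uint64_t rd_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    assert(n >= 1 && n <= 8);
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Returns 0 on success, -1 if buf does not start with a 64-bit
 * little-endian ELF header. */
int parse_ehdr64(const unsigned char *buf, size_t len, struct ehdr_info *out)
{
    if (len < 0x40 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;
    if (buf[4] != 2 || buf[5] != 1)   /* ELFCLASS64, ELFDATA2LSB */
        return -1;
    out->type      = (uint16_t)rd_le(buf + 0x10, 2);
    out->machine   = (uint16_t)rd_le(buf + 0x12, 2);
    out->entry     = rd_le(buf + 0x18, 8);
    out->phoff     = rd_le(buf + 0x20, 8);
    out->shoff     = rd_le(buf + 0x28, 8);
    out->phentsize = (uint16_t)rd_le(buf + 0x36, 2);
    out->phnum     = (uint16_t)rd_le(buf + 0x38, 2);
    out->shentsize = (uint16_t)rd_le(buf + 0x3a, 2);
    out->shnum     = (uint16_t)rd_le(buf + 0x3c, 2);
    out->shstrndx  = (uint16_t)rd_le(buf + 0x3e, 2);
    return 0;
}

/* First 0x40 bytes of hello_world.o, copied from the hd output. */
const unsigned char hello_o_ehdr[0x40] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0, 0, 0, 0, 0, 0, 0, 0,
    0x01, 0x00, 0x3e, 0x00, 0x01, 0x00, 0x00, 0x00, 0, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0x40, 0x00, 0x00, 0x00, 0, 0, 0, 0,
    0, 0, 0, 0, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x07, 0x00, 0x03, 0x00
};
```

Reading every field byte by byte avoids depending on the host's endianness or struct padding, which a plain cast to a struct would.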

Section header table

An array of Elf64_Shdr structures.

Each entry contains metadata about that section.

e_shoff of the ELF header gives the starting position here, 0x40.

e_shentsize and e_shnum from the ELF header say we have 7 entries, each 0x40 long.

So the table takes bytes from 0x40 to 0x40 + 7 * 0x40 - 1 = 0x1FF.
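The same arithmetic (entry count multiplied by entry size, added to the table offset) written as a small sketch; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* File offset of section header number idx. */
uint64_t shdr_offset(uint64_t e_shoff, uint16_t e_shentsize, uint16_t idx)
{
    return e_shoff + (uint64_t)e_shentsize * idx;
}

/* Offset of the last byte of the section header table (needs e_shnum > 0). */
uint64_t shdr_table_last_byte(uint64_t e_shoff, uint16_t e_shentsize,
                              uint16_t e_shnum)
{
    assert(e_shnum > 0);
    return e_shoff + (uint64_t)e_shentsize * e_shnum - 1;
}
```

For hello_world.o this gives 0x1FF for the last byte of the table and 0x80 for the header of section 1.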

Some section names are reserved for certain section types: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections e.g. .text requires type SHT_PROGBITS and the flags SHF_ALLOC + SHF_EXECINSTR.

readelf -S hello_world.o:

There are 7 section headers, starting at offset 0x40:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .data             PROGBITS         0000000000000000  00000200
       000000000000000d  0000000000000000  WA       0     0     4
  [ 2] .text             PROGBITS         0000000000000000  00000210
       0000000000000027  0000000000000000  AX       0     0     16
  [ 3] .shstrtab         STRTAB           0000000000000000  00000240
       0000000000000032  0000000000000000           0     0     1
  [ 4] .symtab           SYMTAB           0000000000000000  00000280
       00000000000000a8  0000000000000018           5     6     4
  [ 5] .strtab           STRTAB           0000000000000000  00000330
       0000000000000034  0000000000000000           0     0     1
  [ 6] .rela.text        RELA             0000000000000000  00000370
       0000000000000018  0000000000000018           4     2     4
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

The struct represented by each entry:

typedef struct {
    Elf64_Word  sh_name;
    Elf64_Word  sh_type;
    Elf64_Xword sh_flags;
    Elf64_Addr  sh_addr;
    Elf64_Off   sh_offset;
    Elf64_Xword sh_size;
    Elf64_Word  sh_link;
    Elf64_Word  sh_info;
    Elf64_Xword sh_addralign;
    Elf64_Xword sh_entsize;
} Elf64_Shdr;

Sections

Index 0 section

Contained in bytes 0x40 to 0x7F.

The first section is always magical: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html says:

If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), e_shnum has the value SHN_UNDEF (0) and the actual number of section header table entries is contained in the sh_size field of the section header at index 0 (otherwise, the sh_size member of the initial entry contains 0).

There are also other magical sections, detailed in Figure 4-7: Special Section Indexes.

At index 0, SHT_NULL is mandatory. Are there other uses for it? See: What is the use of the SHT_NULL section in ELF?

.data section

.data is section 1:

00000080  01 00 00 00 01 00 00 00  03 00 00 00 00 00 00 00  |................|
00000090  00 00 00 00 00 00 00 00  00 02 00 00 00 00 00 00  |................|
000000a0  0d 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000000b0  04 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

80 0: sh_name = 01 00 00 00: index 1 in the .shstrtab string table. Here, 1 says that the name of this section starts at character 1 of that section and ends at the first NUL character, making up the string .data.

.data is one of the section names that has a predefined meaning: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html

These sections hold initialized data that contribute to the program's memory image.

80 4: sh_type = 01 00 00 00: SHT_PROGBITS: the section contents are not specified by ELF, only by how the program interprets them. Normal, since this is a .data section.

80 8: sh_flags = 03 7x 00: SHF_WRITE and SHF_ALLOC (the WA shown by readelf): http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#sh_flags , as required for a .data section

90 0: sh_addr = 8x 00: the virtual address at which the section will be placed at runtime, 0 if it is not placed

90 8: sh_offset = 00 02 00 00 00 00 00 00 = 0x200: number of bytes from the beginning of the file to the first byte of this section

a0 0: sh_size = 0d 00 00 00 00 00 00 00

If we take 0xD bytes, starting at sh_offset 0x200, we see:
00000200 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a 00 |Hello world!.. |
AHA! So our "Hello world!" string is in the data section, just as we told NASM.

Once we graduate from hd, we can look this up with:
readelf -x .data hello_world.o
which outputs:
Hex dump of section ".data":
  0x00000000 48656c6c 6f20776f 726c6421 0a       Hello world!.
NASM sets decent properties for this section because it treats .data magically: http://www.nasm.us/doc/nasmdoc7.html#section-7.9.2

Also note that this was a bad section choice: a good C compiler would put the string in .rodata instead, because it is read-only, and that would allow further OS optimizations.

a0 8: sh_link and sh_info = 8x 0: do not apply to this section type. http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections

b0 0: sh_addralign = 04 = TODO: why is this alignment necessary? Is it only for sh_addr, or also for symbols inside sh_addr?

b0 8: sh_entsize = 00 = the section does not contain a table. If != 0, it means that the section contains a table of fixed-size entries. In this file, we can see from the readelf output that this is the case for the .symtab and .rela.text sections.
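The byte-by-byte reading of this section header can also be written down as code. A sketch under the same assumptions as the text: 64-bit little-endian layout, offsets taken from the Elf64_Shdr struct, and the 0x40 bytes at file offset 0x80 copied from the dump. The struct and function names are illustrative; only the three basic flag bits (SHF_WRITE = 1, SHF_ALLOC = 2, SHF_EXECINSTR = 4) are decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Elf64_Shdr fields decoded into host integers (illustrative type). */
struct shdr_info {
    uint32_t name, type, link, info;
    uint64_t flags, addr, offset, size, addralign, entsize;
};

/* Read an n-byte little-endian unsigned integer (1 <= n <= 8). */
static uint64_t rd_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode one 0x40-byte section header starting at p. */
void parse_shdr64(const unsigned char *p, struct shdr_info *out)
{
    out->name      = (uint32_t)rd_le(p + 0x00, 4);
    out->type      = (uint32_t)rd_le(p + 0x04, 4);
    out->flags     = rd_le(p + 0x08, 8);
    out->addr      = rd_le(p + 0x10, 8);
    out->offset    = rd_le(p + 0x18, 8);
    out->size      = rd_le(p + 0x20, 8);
    out->link      = (uint32_t)rd_le(p + 0x28, 4);
    out->info      = (uint32_t)rd_le(p + 0x2c, 4);
    out->addralign = rd_le(p + 0x30, 8);
    out->entsize   = rd_le(p + 0x38, 8);
}

/* readelf-style letters for SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR. */
void shdr_flag_letters(uint64_t flags, char out[4])
{
    size_t n = 0;
    if (flags & 0x1) out[n++] = 'W';
    if (flags & 0x2) out[n++] = 'A';
    if (flags & 0x4) out[n++] = 'X';
    out[n] = '\0';
}

/* Bytes 0x80..0xBF of hello_world.o: the .data section header. */
const unsigned char data_shdr[0x40] = {
    0x01, 0, 0, 0, 0x01, 0, 0, 0, 0x03, 0, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 0, 0, 0, 0, 0x00, 0x02, 0, 0, 0, 0, 0, 0,
    0x0d, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0x04, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
```

Decoding data_shdr yields the flag letters WA, matching the Flags column that readelf prints for .data.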

.text section

Now that we've done one section by hand, let's graduate and use the readelf -S output for the other sections.

  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 2] .text             PROGBITS         0000000000000000  00000210
       0000000000000027  0000000000000000  AX       0     0     16

.text is executable but not writable: if we try to write to it, Linux segfaults. Let's see if we really have code there:

objdump -d hello_world.o

hello_world.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <_start>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   bf 01 00 00 00          mov    $0x1,%edi
   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00
  14:   ba 0d 00 00 00          mov    $0xd,%edx
  19:   0f 05                   syscall
  1b:   b8 3c 00 00 00          mov    $0x3c,%eax
  20:   bf 00 00 00 00          mov    $0x0,%edi
  25:   0f 05                   syscall

If we grep for b8 01 00 00 in the hd output, we see that it only occurs at 00000210, which is what the section header says. And the size is 0x27, which matches as well. So we must be looking at the right section.

This looks like the right code: a write followed by an exit.

The most interesting part is line a, which does:

movabs $0x0,%rsi

to pass the address of the string to the system call. Currently, 0x0 is just a placeholder. After linking, it will be modified to contain:

4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi

This modification is possible because of the data in the .rela.text section.

SHT_STRTAB

Sections with sh_type == SHT_STRTAB are called string tables.

Such sections are used by other sections when string names are to be used. The using section says:

which string table it is using
at which index of the target string table the string starts

So, for example, we could have a string table containing (TODO: does it have to start with \0?):

Data:   \0  a  b  c  \0  d  e  f  \0
Index:   0  1  2  3   4  5  6  7   8

And if another section wants to use the string d e f, it has to point to index 5 of this section (the letter d).
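A string table lookup is therefore just pointer arithmetic plus a check that the string is terminated inside the table. A minimal sketch; strtab_get is a made-up name, not a library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the NUL-terminated string that starts at index idx of the table,
 * or NULL if idx is out of bounds or no terminator is found in the table. */
const char *strtab_get(const char *tab, size_t size, size_t idx)
{
    if (idx >= size)
        return NULL;
    if (memchr(tab + idx, '\0', size - idx) == NULL)
        return NULL;
    return tab + idx;
}
```

With the table from the example, index 5 gives def, index 1 gives abc, and index 0 gives the empty string.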

Known string tables:

.shstrtab
.strtab

.shstrtab

Section type: sh_type == SHT_STRTAB .

Common name: section header string table.

The section name .shstrtab is reserved. The standard says:

This section holds section names.

This section is pointed to by the e_shstrndx field of the ELF header itself.

The string indexes of this section are pointed to by the sh_name field of the section headers, which denote strings.

This section does not have SHF_ALLOC set, so it will not appear in the executing program.

readelf -x .shstrtab hello_world.o

Hex dump of section ".shstrtab":
  0x00000000 002e6461 7461002e 74657874 002e7368 ..data..text..sh
  0x00000010 73747274 6162002e 73796d74 6162002e strtab..symtab..
  0x00000020 73747274 6162002e 72656c61 2e746578 strtab..rela.tex
  0x00000030 7400                                t.

The data in this section has a fixed format: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html

If we look at the names of the other sections, we see that they all contain numbers, e.g. the .text section has number 7.

Then each string ends when the first NUL character is found, e.g. character 12 is the \0 right after .text.

.symtab

Section type: sh_type == SHT_SYMTAB .

Common name: symbol table.

Let's first note that:

sh_link = 5
sh_info = 6

For an SHT_SYMTAB section, these numbers mean that:

the strings that give the symbol names are in section 5, .strtab
symbol 6 is the first non-local symbol: sh_info is one greater than the symbol table index of the last local symbol

A good high-level tool to inspect this section is:

nm hello_world.o

which gives:

0000000000000000 T _start
0000000000000000 d hello_world
000000000000000d a hello_world_len

This is, however, a high-level view that omits some types of symbols and denotes the symbol types with single letters. A more detailed breakdown can be obtained with:

readelf -s hello_world.o

which gives:

Symbol table ".symtab" contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello_world.asm
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
     5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start

The binary format of the table is documented at http://www.sco.com/developers/gabi/2003-12-17/ch4.symtab.html

readelf -x .symtab hello_world.o

which gives:

Hex dump of section ".symtab":
  0x00000000 00000000 00000000 00000000 00000000 ................
  0x00000010 00000000 00000000 01000000 0400f1ff ................
  0x00000020 00000000 00000000 00000000 00000000 ................
  0x00000030 00000000 03000100 00000000 00000000 ................
  0x00000040 00000000 00000000 00000000 03000200 ................
  0x00000050 00000000 00000000 00000000 00000000 ................
  0x00000060 11000000 00000100 00000000 00000000 ................
  0x00000070 00000000 00000000 1d000000 0000f1ff ................
  0x00000080 0d000000 00000000 00000000 00000000 ................
  0x00000090 2d000000 10000200 00000000 00000000 -...............
  0x000000a0 00000000 00000000                   ........

The entries are of type:

typedef struct {
    Elf64_Word    st_name;
    unsigned char st_info;
    unsigned char st_other;
    Elf64_Half    st_shndx;
    Elf64_Addr    st_value;
    Elf64_Xword   st_size;
} Elf64_Sym;

As in the section table, the first entry is magical and is set to fixed meaningless values.

Entry 1 has ELF64_ST_TYPE == STT_FILE. ELF64_ST_TYPE is stored inside st_info.

Byte analysis:

10 8: st_name = 01000000 = character 1 in .strtab, which up to the next \0 gives hello_world.asm

This file name may be used by the linker to decide which segments the sections go into.

10 C: st_info = 04

Bits 0-3 = ELF64_ST_TYPE = Type = 4 = STT_FILE: the main purpose of this entry is to use st_name to give the name of the file that generated this object file.

Bits 4-7 = ELF64_ST_BIND = Binding = 0 = STB_LOCAL. Required value for STT_FILE.

10 E: st_shndx = Symbol Table Section Header Index = f1 ff = 0xFFF1 (little endian) = SHN_ABS. Required for STT_FILE.

20 0: st_value = 8x 00: required value for STT_FILE

20 8: st_size = 8x 00: no allocated size
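Splitting st_info into its two nibbles is simple enough to sketch. The macros below do the same split as ELF64_ST_BIND and ELF64_ST_TYPE from the gABI, but are renamed so the example does not depend on <elf.h>; the name tables are illustrative and cover only the first few standard values.

```c
#include <assert.h>
#include <string.h>

/* High nibble: binding. Low nibble: type. */
#define SYM_BIND(info) (((unsigned char)(info)) >> 4)
#define SYM_TYPE(info) ((info) & 0xf)

/* Names of the first standard symbol types (STT_NOTYPE = 0 ... STT_FILE = 4). */
const char *sym_type_name(unsigned type)
{
    static const char *const names[] = {
        "NOTYPE", "OBJECT", "FUNC", "SECTION", "FILE"
    };
    return type < sizeof names / sizeof names[0] ? names[type] : "?";
}

/* Names of the first standard bindings (STB_LOCAL = 0 ... STB_WEAK = 2). */
const char *sym_bind_name(unsigned bind)
{
    static const char *const names[] = { "LOCAL", "GLOBAL", "WEAK" };
    return bind < sizeof names / sizeof names[0] ? names[bind] : "?";
}
```

For the st_info byte 04 of entry 1 this gives FILE and LOCAL; a byte of 10, as stored for a global symbol without a type, gives NOTYPE and GLOBAL.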

Now, from the readelf output, we can interpret the others quickly.

STT_SECTION

There are two such entries, one pointing to .data and the other to .text (section indexes 1 and 2).

   Num:    Value          Size Type    Bind   Vis      Ndx Name
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2

TODO: what is their purpose?

STT_NOTYPE

Then come the most important symbols:

   Num:    Value          Size Type    Bind   Vis      Ndx Name
     4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
     5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start

hello_world is in the .data section (index 1). Its value is 0: it points to the first byte of that section.

_start is marked with GLOBAL binding, since we wrote:

global _start

in NASM. This is necessary since it must be seen as the entry point. Unlike in C, NASM labels are local by default.

hello_world_len points to the special st_shndx == SHN_ABS == 0xFFF1.

0xFFF1 is chosen so as not to conflict with other sections.

st_value == 0xD == 13, which is the value we stored there in the assembly: the length of the string Hello world!.

This means that relocation will not affect this value: it is a constant.

This is a small optimization that our assembler does for us and that ELF supports.

If we had used the address of hello_world_len anywhere, the assembler would not have been able to mark it as SHN_ABS, and the linker would have extra relocation work to do on it later.

SHT_SYMTAB in the executable

By default, a .symtab is placed in the executable as well.

It is only used for debugging. Without the symbols, we are completely blind and must reverse engineer everything.

You can remove it with objcopy, and the executable will still run. Such executables are called stripped executables.

.strtab

Holds the strings for the symbol table.

In this section, sh_type == SHT_STRTAB.

It is pointed to by sh_link == 5 of the .symtab section.

readelf -x .strtab hello_world.o

Hex dump of section ".strtab":
  0x00000000 0068656c 6c6f5f77 6f726c64 2e61736d .hello_world.asm
  0x00000010 0068656c 6c6f5f77 6f726c64 0068656c .hello_world.hel
  0x00000020 6c6f5f77 6f726c64 5f6c656e 005f7374 lo_world_len._st
  0x00000030 61727400                            art.

This implies that it is an ELF-level limitation that symbol names, such as those of global variables, cannot contain NUL characters.

.rela.text

Section type: sh_type == SHT_RELA .

Common name: relocation section.

.rela.text holds relocation data which says how addresses should be modified when the final executable is linked. It points to the bytes of the text area that must be modified at link time so that they point to the correct memory locations.

Basically, it converts the object text containing the placeholder address 0x0:

   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00

into the actual executable code containing the final 0x6000d8:

4000ba: 48 be d8 00 60 00 00    movabs $0x6000d8,%rsi
4000c1: 00 00 00

Its section header has sh_link = 4 (the .symtab it uses) and sh_info = 2 (the .text section it applies to).

readelf -r hello_world.o gives:

Relocation section ".rela.text" at offset 0x3b0 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000c  000200000001 R_X86_64_64       0000000000000000 .data + 0

The section does not exist in the executable.

Actual bytes:

00000370  0c 00 00 00 00 00 00 00  01 00 00 00 02 00 00 00  |................|
00000380  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|

Struct represented:

typedef struct {
    Elf64_Addr   r_offset;
    Elf64_Xword  r_info;
    Elf64_Sxword r_addend;
} Elf64_Rela;

370 0: r_offset = 0xC: address within .text of the bytes that this relocation will modify

370 8: r_info = 0x200000001. Contains 2 fields:

ELF64_R_TYPE = 0x1: the meaning depends on the exact architecture.
ELF64_R_SYM = 0x2: index of the symbol the address points to, here the section symbol for .data, which is at index 2 of .symtab.

The AMD64 ABI says that type 1 is called R_X86_64_64 and that it represents the operation S + A where:

S: the value of the symbol in the object file, here 0 because we point to the 00 00 00 00 00 00 00 00 of movabs $0x0,%rsi
A: the addend, present in the r_addend field

This address is added to the section on which the relocation operates.

This relocation operation acts on 8 bytes.

380 0: r_addend = 0

Thus, in our example, we conclude that the new address will be: S + A = .data + 0, and thus the first thing in the data section.
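To make the S + A operation concrete, here is a small sketch of what a linker conceptually does for this one relocation type. It is not how ld is implemented: the names are hypothetical, only R_X86_64_64-style 8-byte little-endian stores are handled, and the symbol value passed in is assumed to be the final address of .data (0x6000d8 in the linked executable).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split r_info the way ELF64_R_SYM / ELF64_R_TYPE do (renamed here). */
#define R_SYM(info)  ((uint32_t)((info) >> 32))
#define R_TYPE(info) ((uint32_t)((info) & 0xffffffffu))

/* Store S + A as 8 little-endian bytes at text[r_offset].
 * Returns 0 on success, -1 if the 8 bytes would not fit in the buffer. */
int apply_reloc64(unsigned char *text, size_t text_len, uint64_t r_offset,
                  uint64_t sym_value, int64_t addend)
{
    if (r_offset > text_len || text_len - r_offset < 8)
        return -1;
    uint64_t v = sym_value + (uint64_t)addend;
    for (int i = 0; i < 8; i++)
        text[r_offset + i] = (unsigned char)(v >> (8 * i));
    return 0;
}
```

Applied at offset 0xC of the object's .text bytes with S = 0x6000d8 and A = 0, the placeholder zeros become d8 00 60 00 00 00 00 00, which is the immediate seen in the linked movabs instruction.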

Program header table

Only appears in the executable.

Contains information about how the executable should be placed in the process's virtual memory.

The executable is generated from object files by the linker. The main tasks that the linker performs are:

determine which sections of the object files go into which segments of the executable.

In Binutils, this comes down to parsing a linker script and dealing with a lot of defaults.

You can get the linker script used with ld --verbose, and set a custom one with ld -T.

do relocation on the text sections. This depends on how the multiple sections are put into memory.

readelf -l hello_world.out gives:

Elf file type is EXEC (Executable file)
Entry point 0x4000b0
There are 2 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000d7 0x00000000000000d7  R E    200000
  LOAD           0x00000000000000d8 0x00000000006000d8 0x00000000006000d8
                 0x000000000000000d 0x000000000000000d  RW     200000

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01     .data

In the ELF header, e_phoff, e_phnum and e_phentsize told us that there are 2 program headers, which start at 0x40 and are 0x38 bytes long each, so they are:

00000040  01 00 00 00 05 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 40 00 00 00 00 00  00 00 40 00 00 00 00 00  |..@.......@.....|
00000060  d7 00 00 00 00 00 00 00  d7 00 00 00 00 00 00 00  |................|
00000070  00 00 20 00 00 00 00 00                           |.. .....        |

and:

00000070                           01 00 00 00 06 00 00 00  |        ........|
00000080  d8 00 00 00 00 00 00 00  d8 00 60 00 00 00 00 00  |..........`.....|
00000090  d8 00 60 00 00 00 00 00  0d 00 00 00 00 00 00 00  |..`.............|
000000a0  0d 00 00 00 00 00 00 00  00 00 20 00 00 00 00 00  |.......... .....|

Structure represented:

typedef struct {
    Elf64_Word  p_type;
    Elf64_Word  p_flags;
    Elf64_Off   p_offset;
    Elf64_Addr  p_vaddr;
    Elf64_Addr  p_paddr;
    Elf64_Xword p_filesz;
    Elf64_Xword p_memsz;
    Elf64_Xword p_align;
} Elf64_Phdr;

Breakdown of the first:

40 0: p_type = 01 00 00 00 = PT_LOAD: TODO. I think this means the segment will actually be loaded into memory. Other types may not necessarily be.
40 4: p_flags = 05 00 00 00 = execute and read permissions, no write. TODO
40 8: p_offset = 8x 00. TODO: what is this? It looks like the offset of the segment from the beginning of the file. But would that mean that some segments overlap? You can play with it a bit: gcc -Wl,-Ttext-segment=0x400030 hello_world.c
50 0: p_vaddr = 00 00 40 00 00 00 00 00: starting virtual memory address to load this segment into
50 8: p_paddr = 00 00 40 00 00 00 00 00: starting physical address to load into memory. Only matters for systems in which the program can set the physical address. Otherwise, as in System V like systems, it can be anything. NASM seems to just copy p_vaddr.
60 0: p_filesz = d7 00 00 00 00 00 00 00: TODO vs p_memsz
60 8: p_memsz = d7 00 00 00 00 00 00 00: TODO
70 0: p_align = 00 00 20 00 00 00 00 00: 0 or 1 mean no alignment is required. TODO: what does that mean? Otherwise redundant with other fields.

The second is similar.
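The first program header can be decoded the same way as the other structures. Again only a sketch: names are illustrative, the layout is the Elf64_Phdr struct above, the bytes are copied from the dump at 0x40, and the permission bits use the standard values PF_X = 1, PF_W = 2, PF_R = 4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Elf64_Phdr fields decoded into host integers (illustrative type). */
struct phdr_info {
    uint32_t type, flags;
    uint64_t offset, vaddr, paddr, filesz, memsz, align;
};

/* Read an n-byte little-endian unsigned integer (1 <= n <= 8). */
static uint64_t rd_le(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Decode one 0x38-byte program header starting at p. */
void parse_phdr64(const unsigned char *p, struct phdr_info *out)
{
    out->type   = (uint32_t)rd_le(p + 0x00, 4);
    out->flags  = (uint32_t)rd_le(p + 0x04, 4);
    out->offset = rd_le(p + 0x08, 8);
    out->vaddr  = rd_le(p + 0x10, 8);
    out->paddr  = rd_le(p + 0x18, 8);
    out->filesz = rd_le(p + 0x20, 8);
    out->memsz  = rd_le(p + 0x28, 8);
    out->align  = rd_le(p + 0x30, 8);
}

/* "rwx"-style string from p_flags (PF_R = 4, PF_W = 2, PF_X = 1). */
void phdr_perms(uint32_t flags, char out[4])
{
    out[0] = (flags & 4) ? 'r' : '-';
    out[1] = (flags & 2) ? 'w' : '-';
    out[2] = (flags & 1) ? 'x' : '-';
    out[3] = '\0';
}

/* Bytes 0x40..0x77 of hello_world.out: the first program header. */
const unsigned char text_phdr[0x38] = {
    0x01, 0, 0, 0, 0x05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    0x00, 0x00, 0x40, 0, 0, 0, 0, 0, 0x00, 0x00, 0x40, 0, 0, 0, 0, 0,
    0xd7, 0, 0, 0, 0, 0, 0, 0, 0xd7, 0, 0, 0, 0, 0, 0, 0,
    0x00, 0x00, 0x20, 0, 0, 0, 0, 0
};
```

The first header decodes to r-x at virtual address 0x400000; a p_flags value of 6, as in the second header, decodes to rw-.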

Section to Segment mapping:

this part of the readelf output tells us that:

0 is the .text segment. Aha, so that is why it is executable and not writable.
1 is the .data segment.