www.riscos.com Technical Support:
DDE (Desktop Development Environment)

 


Code file formats


This appendix defines three file formats used by the Desktop tools to store processed code and the format of debugging data used by DDT:

  • AOF - ARM Object Format
  • ALF - Acorn Library Format
  • AIF - ARM Image Format
  • ASD - ARM Symbolic Debugging Format.

Desktop tools language processors such as CC and ObjAsm generate processed code output as AOF files. An ALF file is a collection of AOF files, constructed by the LibFile tool. The Link tool accepts a set of AOF and ALF files as input and, by default, produces an executable program file in AIF as output.

Terminology

Throughout this appendix the terms byte, half word, word, and string are used to mean the following:

Byte: 8 bits, considered unsigned unless otherwise stated, usually used to store flag bits or characters.

Half word: 16 bits, or 2 bytes, usually unsigned. The least significant byte has the lowest address (DEC/Intel byte sex, sometimes called little endian). The address of a half word (i.e. of its least significant byte) must be divisible by 2.

Word: 32 bits, or 4 bytes, usually used to store a non-negative value. The least significant byte has the lowest address (DEC/Intel byte sex, sometimes called little endian). The address of a word (i.e. of its least significant byte) must be divisible by 4.

String: A sequence of bytes terminated by a NUL (0x00) byte. The NUL is part of the string but is not counted in the string's length. Strings may be aligned on any byte boundary.

Note: a word consists of 32 bits, 4-byte aligned; within a word, the least significant byte has the lowest address. This is DEC/Intel, or little endian, byte sex, not IBM/Motorola byte sex.

Byte Sex or Endian-ness

There are two sorts of AOF or ALF: little-endian and big-endian.

In little-endian AOF or ALF, the least significant byte of a word or half-word has the lowest address of any byte in the (half-)word. This byte sex is used by DEC, Intel and Acorn, amongst others.

In big-endian AOF or ALF, the most significant byte of a (half-)word has the lowest address. This byte sex is used by IBM, Motorola and Apple, amongst others.

For data in a file, address means 'offset from the start of the file'.

There is no guarantee that the endian-ness of an AOF or ALF file will be the same as the endian-ness of the system used to process it (the endian-ness of the file is always the same as the endian-ness of the target ARM system).

The two sorts of AOF or ALF cannot be mixed (the target system cannot have mixed endian-ness: it must have one or the other). Thus the ARM linker will accept inputs of either sex and produce an output of the same sex, but will reject inputs of mixed endian-ness.

Alignment

Strings and bytes may be aligned on any byte boundary.

AOF and ALF fields defined in this appendix make no use of half-words and align words on 4-byte boundaries.

Within the contents of an AOF or ALF file the alignment of words and half-words is defined by the use to which AOF or ALF is being put.

For all current ARM-based systems, words are aligned on 4-byte boundaries and half-words on 2-byte boundaries.

Undefined fields

Fields not explicitly defined by this appendix are implicitly reserved to Acorn. It is required that all such fields be zeroed. Acorn may ascribe meaning to such fields at any time, but will usually do so in a manner which gives no new meaning to zeroes.

AOF

ARM object format files are output by language processors such as CC and ObjAsm.

Chunk file format

A chunk is accessed via a header at the start of the file. The header contains the number, size, location and identity of each chunk in the file. The size of the header may vary between different chunk files but is fixed for each file. Not all entries in a header need be used, thus limited expansion of the number of chunks is permitted without a wholesale copy. A chunk file can be copied without knowledge of the contents of the individual chunks.

Graphically, the layout of a chunk file is as follows:

[Figure: layout of a chunk file (APPE-2.GIF)]

ChunkFileId marks the file as a chunk file. Its value is 0xC3CBC6C5. The endian-ness of the chunk file can be deduced from this value (if, when read as a word, it appears to be 0xC5C6CBC3 then each word value must be byte-reversed before use).
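The byte-order test described above can be sketched as follows. This is a minimal illustration, not part of the format definition: the function name detect_byte_order and the use of Python's struct prefixes '<' and '>' as return values are assumptions made for the example.

```python
import struct

CHUNK_FILE_ID = 0xC3CBC6C5          # value given in the text
CHUNK_FILE_ID_SWAPPED = 0xC5C6CBC3  # the same word seen with reversed bytes


def detect_byte_order(first_word: bytes) -> str:
    """Return the struct prefix ('<' or '>') matching the file's byte order.

    The first four bytes are read as a little-endian word, standing in for
    a little-endian host reading the file.
    """
    (value,) = struct.unpack("<I", first_word[:4])
    if value == CHUNK_FILE_ID:
        return "<"   # little-endian chunk file
    if value == CHUNK_FILE_ID_SWAPPED:
        return ">"   # big-endian chunk file: every word must be byte-reversed
    raise ValueError("not a chunk file")
```

The returned prefix can then be passed to later struct calls so that every word in the file is read with the correct byte order.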

The MaxChunks field defines the number of entries in the header, fixed when the file is created. The NumChunks field defines how many chunks are currently used in the file, which can vary from 0 to MaxChunks. The value of NumChunks is redundant as it can be found by scanning the entries.

Each entry in the header comprises four words in the following order:

chunkId is an 8-byte field identifying what data the chunk contains (note that this is an 8-byte field, not a 2-word field, so it has the same byte order independent of endian-ness).
fileOffset is a one-word field defining the byte offset within the file of the start of the chunk. All chunks are word-aligned, so it must be divisible by four. A value of zero indicates that the chunk entry is unused.
size is a one-word field defining the exact byte size of the chunk (which need not be a multiple of four).

The chunkId field provides a conventional way of identifying what type of data a chunk contains. It is split into two parts. The first four characters contain a unique name allocated by a central authority (Acorn). The remaining four characters can be used to identify component chunks within this domain. The 8 characters are stored in ascending address order, as if they formed part of a NUL-terminated string (which they do not), independently of endian-ness.
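As an illustration of the header description above, the following sketch reads the used entries of a chunk file header. It assumes that the header starts with three words in the order ChunkFileId, MaxChunks, NumChunks, immediately followed by the entries; the layout figure is not reproduced here, so that order is an assumption. The names ChunkEntry and read_chunk_header are hypothetical.

```python
import struct
from typing import NamedTuple


class ChunkEntry(NamedTuple):
    chunk_id: bytes    # 8 raw bytes in address order, e.g. b"OBJ_HEAD"
    file_offset: int   # byte offset of the chunk within the file
    size: int          # exact size of the chunk in bytes


def read_chunk_header(data: bytes, order: str = "<") -> list:
    """Return the used entries of a chunk file header.

    Assumed layout: ChunkFileId, MaxChunks, NumChunks (one word each),
    then MaxChunks entries of four words each.
    """
    chunk_file_id, max_chunks, _num_chunks = struct.unpack_from(order + "3I", data, 0)
    if chunk_file_id != 0xC3CBC6C5:
        raise ValueError("bad ChunkFileId for this byte order")
    entries = []
    for i in range(max_chunks):
        # chunkId is an 8-byte field, so it is never byte-swapped
        chunk_id, file_offset, size = struct.unpack_from(
            order + "8sII", data, 12 + 16 * i)
        if file_offset == 0:
            continue                      # unused entry
        if file_offset % 4:
            raise ValueError("chunk is not word-aligned")
        entries.append(ChunkEntry(chunk_id, file_offset, size))
    return entries
```

Scanning all MaxChunks entries, rather than trusting NumChunks, follows the remark that NumChunks is redundant.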

For AOF files, the first part of each chunk's name is OBJ_; the second components are defined later in this section.

Object file format

Each piece of an object file is stored in a separate, identifiable, chunk. AOF defines five chunks as follows:

Chunk            Chunk Name
Header           OBJ_HEAD
Areas            OBJ_AREA
Identification   OBJ_IDFN
Symbol Table     OBJ_SYMT
String Table     OBJ_STRT

Only the header and areas chunks must be present, but a typical object file will contain all five of the above chunks.

Each name in an object file is encoded as an offset into the string table, stored in the OBJ_STRT chunk (see String table chunk (OBJ_STRT)). This allows the variable-length nature of names to be factored out from primary data formats.

A feature of chunk file format is that chunks may appear in any order in the file. However, language processors which must also generate other object formats - such as Unix's a.out format - should use this flexibility cautiously.

A language translator or other system utility may add chunks to an object file, for example a language-specific symbol table or language-specific debugging data. It is therefore conventional to allow space in the chunk header for additional chunks: space for eight chunks is conventional when the AOF file is produced by a language processor which generates all five chunks described here.

The header chunk should not be confused with the chunk file's header.

Format of the AOF header chunk

The AOF header is logically in two parts, though these appear contiguously in the header chunk. The first part is of fixed size and describes the contents and nature of the object file. The second part is variable in length (specified in the fixed part) and is a sequence of area declarations defining the code and data areas within the OBJ_AREA chunk.

The AOF header chunk (OBJ_HEAD) has the following format:

[Figure: layout of the AOF header chunk (APPE-3.GIF)]

Object file type

0xC5E2D080 marks the file as being in relocatable object format (the usual output of compilers and assemblers and the usual input to the linker).

The endian-ness of the object code can be deduced from this value and shall be identical to the endian-ness of the containing chunk file.

Version ID

Encodes the version of AOF to which the object file complies: version 1.50 is denoted by decimal 150; version 2.00 by 200; version 3.10 by 310; and this version 3.11 by decimal 311 (0x137).

Number of areas

The code and data of the object file is presented as a number of separate areas, in the OBJ_AREA chunk, each with a name and some attributes (see below). Each area is declared in the (variable-length) part of the header which immediately follows the fixed part. The value of the Number of Areas field defines the number of areas in the file and consequently the number of area declarations which follow the fixed part of the header.

Number of symbols

If the object file contains a symbol table chunk OBJ_SYMT, then this field defines the number of symbols in the symbol table.

Entry address area / entry address offset

One of the areas in an object file may be designated as containing the start address of any program which is linked to include the file. If this is the case, the entry address is specified as an Entry Area Index, Entry Offset pair. Entry Area Index, in the range 1 to Number of Areas, gives the 1-origin index in the following array of area headers of the area containing the entry point. The entry address is defined to be the base address of this area plus Entry Offset.

A value of 0 for Entry Area Index signifies that no program entry address is defined by this AOF file.
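The fixed part of the header can be decoded along the following lines. This sketch assumes that the fixed part consists of six words in the order in which the fields are described above (Object file type, Version ID, Number of areas, Number of symbols, Entry Area Index, Entry Offset); the layout figure is not reproduced here, so that order is an assumption. The function name and the returned dictionary keys are hypothetical.

```python
import struct

AOF_RELOCATABLE = 0xC5E2D080   # relocatable object format marker


def parse_aof_fixed_header(head: bytes, order: str = "<") -> dict:
    """Decode the assumed six-word fixed part of an OBJ_HEAD chunk."""
    (file_type, version, n_areas, n_symbols,
     entry_area, entry_offset) = struct.unpack_from(order + "6I", head, 0)
    if file_type != AOF_RELOCATABLE:
        raise ValueError("not a relocatable AOF header")
    if entry_area > n_areas:
        raise ValueError("Entry Area Index out of range")
    return {
        # decimal 311 denotes version 3.11, 150 denotes 1.50, and so on
        "version": f"{version // 100}.{version % 100:02d}",
        "n_areas": n_areas,
        "n_symbols": n_symbols,
        # Entry Area Index is 1-origin; 0 means no entry address is defined.
        # The pair returned is (0-origin area index, offset within that area).
        "entry": None if entry_area == 0 else (entry_area - 1, entry_offset),
    }
```

The entry address itself is only known once the linker has assigned the area a base address: it is that base plus the returned offset.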

Format of area headers

The area headers follow the fixed part of the AOF header. Each area header has the following form:

[Figure: layout of an area header (APPE-4.GIF)]

Area name

Each area within an object file must be given a name which is unique amongst all the areas in the file. Area Name gives the offset of that name in the string table (stored in the OBJ_STRT chunk - see String table chunk (OBJ_STRT)).

Area size

This field gives the size of the area in bytes, which must be a multiple of 4. Unless the Uninitialised bit (bit 12) is set in the area attributes (see Attributes and Alignment), there must be this number of bytes for this area in the OBJ_AREA chunk. If the Uninitialised bit is set, then there shall be no initialising bytes for this area in the OBJ_AREA chunk.

Number of relocations

This word specifies the number of relocation directives which apply to this area, (equivalently: the number of relocation records following the area's contents in the OBJ_AREA chunk - see Format of the areas chunk).

Attributes and Alignment

Each area has a set of attributes encoded in the most-significant 24 bits of the Attributes + Alignment word. The least-significant 8 bits of this word encode the alignment of the start of the area as a power of 2 and shall have a value between 2 and 32 (this value denotes that the area should start at an address divisible by 2^alignment).
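The split between the two parts of the word can be illustrated with a short sketch. The function name split_attributes_word is hypothetical; the attribute bits are left in their original positions so that the bit numbers used in this section still apply.

```python
def split_attributes_word(word: int):
    """Split an Attributes + Alignment word into (attributes, byte boundary)."""
    alignment = word & 0xFF          # least-significant 8 bits: power of 2
    attributes = word & 0xFFFFFF00   # most-significant 24 bits, kept in place
    if not 2 <= alignment <= 32:
        raise ValueError("alignment must be between 2 and 32")
    return attributes, 1 << alignment
```

For example, the word 0x00002202 yields the attributes 0x2200 and a 4-byte boundary.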

The linker orders areas in a generated image first by attributes, then by the (case-significant) lexicographic order of area names, then by position of the containing object module in the link list. The position in the link list of an object module loaded from a library is not predictable.

The precise significance to the linker of area attributes depends on the output being generated.

Bit 8

Bit 8 encodes the absolute attribute and denotes that the area must be placed at its Base Address. This bit is not usually set by language processors.

Bit 9

Bit 9 encodes the code attribute: if set the area contains code; otherwise it contains data.

Bits 10 and 11

Bits 10, 11 encode the common block definition and common block reference attributes, respectively.

Bit 10 specifies that the area is a common block definition.

Bit 11 defines the area to be a reference to a common block, and precludes the area having initialising data (see Bit 12, below). In effect, bit 11 implies bit 12.

If both bits 10 and 11 are set, bit 11 is ignored.

Common areas with the same name are overlaid on each other by the linker. The Area Size field of a common definition area defines the size of a common block. All other references to this common block must specify a size which is less than or equal to the definition size. If, in a link step, there is more than one definition of an area with the common definition attribute (area of the given name with bit 10 set), then each of these areas must have exactly the same contents. If there is no definition of a common area, its size will be the size of the largest common reference to it.

Although common areas conventionally hold data, it is quite legal to use bit 10 in conjunction with bit 9 to define a common block containing code. This is most useful for defining a code area which must be generated in several compilation units but which should be included in the final image only once.

Bit 12

Bit 12 encodes the zero-initialised attribute, specifying that the area has no initialising data in this object file, and that the area contents are missing from the OBJ_AREA chunk. Typically, this attribute is given to large uninitialised data areas. When an uninitialised area is included in an image, the linker either includes a read-write area of binary zeroes of appropriate size, or maps a read-write area of appropriate size that will be zeroed at image start-up time. This attribute is incompatible with the read-only attribute (see Bit 13, below).

Whether or not a zero-initialised area is re-zeroed if the image is re-entered is a property of the relevant image format and/or the system on which it will be executed. The definition of AOF neither requires nor precludes re-zeroing.

To summarise, bits 10, 11 and 12 interact as follows:

12  11  10   Interaction
0   0   1    Initialised common definition
0   1   1    Initialised common definition
0   1   0    Uninitialised reference to common block
1   0   1    Uninitialised reference to common block
1   1   0    Uninitialised reference to common block
1   1   1    Uninitialised reference to common block
1   0   0    Zero-initialised (bss = unnamed common reference)

So, an initialised common definition is inferred if bit 10 is set and bit 12 is not; a zero-initialised area is inferred if bit 12 is set and both bits 10 and 11 are unset; all other combinations in which at least one of the three bits is set imply an uninitialised reference to a common block.
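The interaction table above can be expressed as a small classification function. This is a sketch that follows the rows of the table; the function name and the label "ordinary area" for the case where none of the three bits is set (which the table does not list) are assumptions.

```python
def classify_common_bits(attributes: int) -> str:
    """Classify an area from bits 10, 11 and 12 of its attributes."""
    common_def = bool(attributes & 0x0400)   # bit 10
    common_ref = bool(attributes & 0x0800)   # bit 11
    zero_init = bool(attributes & 0x1000)    # bit 12
    if common_def and not zero_init:
        return "initialised common definition"
    if zero_init and not common_def and not common_ref:
        return "zero-initialised"
    if common_def or common_ref or zero_init:
        return "uninitialised reference to common block"
    return "ordinary area"   # none of the three bits set (not in the table)
```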

Bit 13

Bit 13 encodes the read only attribute and denotes that the area will not be modified following relocation by the linker. The linker groups read-only areas together so that they may be write protected at run-time, hardware permitting. Code areas and debugging tables should have this bit set. The setting of this bit is incompatible with the setting of bit 12.

Bit 14

Bit 14 encodes the position independent (PI) attribute, usually only of significance for code areas. Any reference to a memory address from a PI area must be in the form of a link-time-fixed offset from a base register (e.g. a PC-relative branch offset).

Bit 15

Bit 15 encodes the debugging table attribute and denotes that the area contains symbolic debugging tables. The linker groups these areas together so they can be accessed as a single continuous chunk at or before run-time (usually, a debugger will extract its debugging tables from the image file prior to starting the debuggee).

Usually, debugging tables are read-only and, therefore, have bit 13 set also. In debugging table areas, bit 9 (the code attribute) is ignored.

Bits 16-19 encode additional attributes of code areas and shall be non-0 only if the area has the code attribute (bit 9 set).

Bit 16

Bit 16 encodes the 32-bit PC attribute, and denotes that code in this area complies with a 32-bit variant of the ARM Procedure Call Standard (APCS). For details, refer to '32-bit PC vs 26-bit PC'. Such code may be incompatible with code which complies with a 26-bit variant of the APCS.

Bit 17

Bit 17 encodes the reentrant attribute, and denotes that code in this area complies with a reentrant variant of the ARM Procedure Call Standard.

Bit 18

Bit 18, when set, denotes that code in this area uses the ARM's extended floating-point instruction set. Specifically, function entry and exit use the LFM and SFM floating-point save and restore instructions rather than multiple LDFEs and STFEs. Code with this attribute may not execute on older ARM-based systems.

Bit 19

Bit 19 encodes the No Software Stack Check attribute, denoting that code in this area complies with a variant of the ARM Procedure Call Standard without software stack-limit checking. Such code may be incompatible with code which complies with a limit-checked variant of the APCS.

Bits 20-27 encode additional attributes of data areas, and shall be non-0 only if the area does not have the code attribute (bit 9 unset).

Bit 20

Bit 20 encodes the based attribute, denoting that the area is addressed via link-time-fixed offsets from a base register (encoded in bits 24-27). Based areas have a special role in the construction of shared libraries and ROM-able code, and are treated specially by the linker.

Bit 21

Bit 21 encodes the Shared Library Stub Data attribute. In a link step involving layered shared libraries, there may be several copies of the stub data for any library not at the top level. In other respects, areas with this attribute are treated like data areas with the common definition (bit 10) attribute. Areas which also have the zero initialised attribute (bit 12) are treated much the same as areas with the common reference (bit 11) attribute.

This attribute is not usually set by language processors, but is set only by the linker.

Bits 22-23

Bits 22-23 are reserved and shall be set to 0.

Bits 24-27

Bits 24-27 encode the base register used to address a based area. If the area does not have the based attribute then these bits shall be set to 0.

Bits 28-31

Bits 28-31 are reserved and shall be set to 0.
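The attribute bits described above, and the constraints between them, might be checked as in the following sketch. The class AreaAttr, the function check_area_attributes and the wording of the messages are hypothetical; only the bit positions and the rules come from this section. The special treatment of bit 9 in debugging table areas is not modelled.

```python
from enum import IntFlag


class AreaAttr(IntFlag):
    ABSOLUTE = 0x00000100              # bit 8
    CODE = 0x00000200                  # bit 9
    COMMON_DEF = 0x00000400            # bit 10
    COMMON_REF = 0x00000800            # bit 11
    ZERO_INIT = 0x00001000             # bit 12
    READ_ONLY = 0x00002000             # bit 13
    POSITION_INDEPENDENT = 0x00004000  # bit 14
    DEBUG = 0x00008000                 # bit 15
    APCS_32BIT = 0x00010000            # bit 16
    REENTRANT = 0x00020000             # bit 17
    EXTENDED_FP = 0x00040000           # bit 18
    NO_STACK_CHECK = 0x00080000        # bit 19
    BASED = 0x00100000                 # bit 20
    STUB_DATA = 0x00200000             # bit 21


CODE_ONLY = 0x000F0000   # bits 16-19
DATA_ONLY = 0x0F300000   # bits 20-21 and 24-27
RESERVED = 0xF0C00000    # bits 22-23 and 28-31


def check_area_attributes(word: int) -> list:
    """Return a list of rule violations for an Attributes + Alignment word."""
    problems = []
    is_code = bool(word & AreaAttr.CODE)
    if word & RESERVED:
        problems.append("reserved bits set")
    if word & CODE_ONLY and not is_code:
        problems.append("code-only attribute on a data area")
    if word & DATA_ONLY and is_code:
        problems.append("data-only attribute on a code area")
    if word & AreaAttr.ZERO_INIT and word & AreaAttr.READ_ONLY:
        problems.append("zero-initialised is incompatible with read-only")
    if word & 0x0F000000 and not word & AreaAttr.BASED:
        problems.append("base register given without the based attribute")
    return problems
```

An empty list means that none of the rules checked here is broken; it does not prove that the word is valid in every respect.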

Area Attributes Summary

Bit     Mask         Attribute Description
8       0x00000100   Absolute attribute
9       0x00000200   Code attribute
10      0x00000400   Common block definition
11      0x00000800   Common block reference
12      0x00001000   Uninitialised (0-initialised)
13      0x00002000   Read only
14      0x00004000   Position independent
15      0x00008000   Debugging tables
Code areas only
16      0x00010000   Complies with the 32-bit APCS
17      0x00020000   Reentrant code
18      0x00040000   Uses extended FP inst set
19      0x00080000   No software stack checking
Data areas only
20      0x00100000   Based area
21      0x00200000   Shared library stub data
24-27   0x0F000000   Base register for based area

Format of the areas chunk

The areas chunk (ChunkId of OBJ_AREA) contains the actual areas (code, data, zero-initialised data, debugging data, etc.) plus any associated relocation information. Graphically, an area's layout is:

[Figure: layout of an area within the areas chunk (APPE-5.GIF)]

An area is simply a sequence of byte values. The endian-ness of the words and half-words within it shall agree with that of the containing AOF file.

An area is followed by its associated table of relocation directives (if any). An area is either completely initialised by the values from the file or is initialised to zero, as specified by bit 12 of its area attributes.

Both the area contents and the table of relocation directives are aligned to 4-byte boundaries.
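Given the area headers, the position of each area's contents and relocation table within the OBJ_AREA chunk can be computed as sketched below. The sketch assumes that each relocation directive occupies 8 bytes (two words); the directive layout figure is not reproduced here, so that size is an assumption, exposed as the parameter reloc_size. The function name locate_areas is hypothetical.

```python
def locate_areas(area_headers, reloc_size=8):
    """Return (contents_offset, relocs_offset) within OBJ_AREA for each area.

    area_headers is a sequence of (size, n_relocs, attributes) tuples taken
    from the area headers, in order.
    """
    offset = 0
    layout = []
    for size, n_relocs, attributes in area_headers:
        contents = offset
        if not attributes & 0x1000:   # bit 12 clear: contents are in the file
            offset += size            # Area Size is a multiple of 4
        relocs = offset               # relocation table follows the contents
        offset += n_relocs * reloc_size
        layout.append((contents, relocs))
    return layout
```

A zero-initialised area contributes no bytes of contents, so its contents offset and relocation offset coincide.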

Relocation directives

A relocation directive describes a value which is computed at link time or load time, but which cannot be fixed when the object module is created.

In the absence of applicable relocation directives, the value of a byte, halfword, word or instruction from the preceding area is exactly the value that will appear in the final image.

A field may be subject to more than one relocation.

Pictorially, a relocation directive looks like:

[Figure: layout of a relocation directive (APPE-6.GIF)]

Offset

Offset is the byte offset in the preceding area of the subject field to be relocated by a value calculated as described below.

SID (Subject Identification)

The interpretation of the 24-bit SID field depends on the A bit.

If A (bit 27) is 1, the subject field is relocated (as further described below) by the value of the symbol of which SID is the 0-origin index in the symbol table chunk.

If A (bit 27) is 0, the subject field is relocated (as further described below) by the base of the area of which SID is the 0-origin index in the array of areas, (or, equivalently, in the array of area headers).

FT (Field Type)

The 2-bit field type FT (bits 25, 24) describes the subject field:

00 the field to be relocated is a byte
01 the field to be relocated is a half-word (2 bytes)
10 the field to be relocated is a word (4 bytes)
11 the field to be relocated is an instruction or instruction sequence

Bytes, halfwords and instructions may only be relocated by values of suitably small size. Overflow is faulted by the linker.

An ARM branch, or branch-with-link instruction is always a suitable subject for a relocation directive of field type instruction.

II (Instruction Instruction)

If the subject field is an instruction sequence (FT = 11), then Offset addresses the first instruction of the sequence and the II field (bits 29 and 30) constrains how many instructions may be modified by this directive:

00 no constraint (the linker may modify as many contiguous instructions as it needs to)
01 the linker will modify at most 1 instruction
10 the linker will modify at most 2 instructions
11 the linker will modify at most 3 instructions

R (relocation type)

The way the relocation value is used to modify the subject field is determined by the R (PC-relative) bit, modified by the B (based) bit.

R (bit 26) = 1 and B (bit 28) = 0 specifies PC-relative relocation: the difference between the relocation value and the base of the area containing the subject field is added to the subject field. In pseudo C:

subject_field = subject_field + (relocation_value - base_of_area_containing(subject_field))

As a special case, if A is 0, and the relocation value is specified as the base of the area containing the subject field, then it is not added and:

subject_field = subject_field - base_of_area_containing(subject_field)

This caters for relocatable PC-relative branches to fixed target addresses.

If R is 1, B is usually 0. If B is 1 this is used to denote that the inter-link-unit value of a branch destination is to be used, rather than the more usual intra-link-unit value (this allows compilers to perform the tail-call optimisation on reentrant code).

R (bit 26) = 0 and B (bit 28) = 0, specifies plain additive relocation: the relocation value is added to the subject field. In pseudo C:

subject_field = subject_field + relocation_value

R (bit 26) = 0 and B (bit 28) = 1, specifies based area relocation. The relocation value must be an address within a based data area. The subject field is incremented by the difference between this value and the base address of the consolidated based area group (the linker consolidates all areas based on the same base register into a single, contiguous region of the output image). In pseudo C:

subject_field = subject_field + (relocation_value - base_of_area_group_containing(relocation_value))

For example, when generating reentrant code, the C compiler will place address constants in an adcon area based on register sb, and load them using sb relative LDRs. At link time, separate adcon areas will be merged and sb will no longer point where presumed at compile time. B type relocation of the LDR instructions corrects for this.
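The three documented combinations of R and B can be gathered into one function, as in the following sketch. It works on plain integers and omits field-width checks and the encoding of values into instructions, both of which a real linker must perform. The function and parameter names are hypothetical, and the R = 1, B = 1 case is deliberately left unmodelled because no formula is given for it above.

```python
def relocate(subject_field, relocation_value, r, b,
             area_base=0, group_base=0, targets_own_area=False):
    """Apply one relocation to a subject field, following the pseudo C above.

    area_base:        base of the area containing the subject field
    group_base:       base of the consolidated based area group
    targets_own_area: the special case where A is 0 and the relocation value
                      is the base of the area containing the subject field
    """
    if r and not b:                      # PC-relative relocation
        if targets_own_area:
            return subject_field - area_base
        return subject_field + (relocation_value - area_base)
    if not r and not b:                  # plain additive relocation
        return subject_field + relocation_value
    if not r and b:                      # based area relocation
        return subject_field + (relocation_value - group_base)
    raise NotImplementedError("R = 1 with B = 1 is not modelled here")
```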

Bits 29-31

Bit 31 of the relocation flags word shall be 1, and (unless FT bits are 11) bits 29 and 30 shall be 0.
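The flag bits of a relocation directive, as described in this section, can be unpacked as follows. The sketch assumes that the 24-bit SID occupies bits 0-23 of the flags word, which is implied by the positions given for FT, R, A, B and II but is not stated explicitly. The names RelocFlags and decode_reloc_flags are hypothetical.

```python
from typing import NamedTuple


class RelocFlags(NamedTuple):
    sid: int               # symbol index (A = 1) or area index (A = 0)
    field_type: int        # FT: 0 byte, 1 half-word, 2 word, 3 instruction(s)
    pc_relative: bool      # R, bit 26
    symbol: bool           # A, bit 27
    based: bool            # B, bit 28
    max_instructions: int  # II, bits 29-30; 0 means no constraint


def decode_reloc_flags(flags: int) -> RelocFlags:
    """Decode the flags word of a relocation directive."""
    if not flags & 0x80000000:
        raise ValueError("bit 31 must be 1")
    ft = (flags >> 24) & 0x3
    ii = (flags >> 29) & 0x3
    if ft != 0b11 and ii:
        raise ValueError("bits 29 and 30 must be 0 unless FT = 11")
    return RelocFlags(
        sid=flags & 0x00FFFFFF,
        field_type=ft,
        pc_relative=bool(flags & (1 << 26)),
        symbol=bool(flags & (1 << 27)),
        based=bool(flags & (1 << 28)),
        max_instructions=ii,
    )
```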

Format of the symbol table chunk

The Number of Symbols field in the fixed part of the AOF header (OBJ_HEAD) defines how many entries there are in the symbol table. Each symbol table entry has the following format:

[Figure: layout of a symbol table entry (APPE-7.GIF)]

Name

This value is an index into the string table (in chunk OBJ_STRT) and thus locates the character string representing the symbol.

Value

This is only meaningful if the symbol is a defining occurrence (bit 0 of Attributes set), or a common symbol (bit 6 of Attributes set):

  • if the symbol is absolute (bits 0,2 of Attributes set), this field contains the value of the symbol
  • if the symbol is a common symbol (bit 6 of Attributes set), this field contains the byte-length of the referenced common area
  • otherwise, Value is interpreted as an offset from the base address of the area named by Area Name, which must be an area defined in this object file.

Area Name

Area Name is meaningful only if the symbol is a non-absolute defining occurrence (bit 0 of Attributes set, bit 2 unset). In this case it gives the index into the string table for the name of the area in which the symbol is defined (which must be an area in this object file).

Symbol Attributes

The Symbol Attributes word is interpreted as follows:

  • Bit 0 denotes that the symbol is defined in this object file.
  • Bit 1 denotes that the symbol has global scope and can be matched by the linker to a similarly named symbol from another object file.

Specifically:

Bits 1 and 0

01 (bit 1 unset, bit 0 set) denotes that the symbol is defined in this object file and has scope limited to this object file (when resolving symbol references, the linker will only match this symbol to references from within the same object file).
10 (bit 1 set, bit 0 unset) denotes that the symbol is a reference to a symbol defined in another object file. If no defining instance of the symbol is found the linker attempts to match the name of the symbol to the names of common blocks. If a match is found it is as if there were defined an identically-named symbol of global scope, having as its value the base address of the common area.
11 denotes that the symbol is defined in this object file with global scope (when attempting to resolve unresolved references, the linker will match this definition to a reference from another object file).
00 Reserved by Acorn.

Bit 2

Bit 2 encodes the absolute attribute which is meaningful only if the symbol is a defining occurrence (bit 0 set). If set, it denotes that the symbol has an absolute value, for example, a constant. If unset, the symbol's value is relative to the base address of the area defined by the Area Name field of the symbol.

Bit 3

Bit 3 encodes the case insensitive reference attribute which is meaningful only if bit 0 is unset (that is, if the symbol is an external reference). If set, the linker will ignore the case of the symbol names it tries to match when attempting to resolve this reference.

Bit 4

Bit 4 encodes the weak attribute which is meaningful only if the symbol is an external reference (bits 1,0 = 10). It denotes that it is acceptable for the reference to remain unsatisfied and for any fields relocated via it to remain unrelocated. The linker ignores weak references when deciding which members to load from an object library.

Bit 5

Bit 5 encodes the strong attribute which is meaningful only if the symbol is an external defining occurrence (if bits 1,0 = 11). In turn, this attribute only has meaning if there is a non-strong, external definition of the same symbol in another object file. In this case, references to the symbol from outside the file containing the strong definition resolve to the strong definition, while those within the file containing the strong definition resolve to the non-strong definition.

This attribute allows a kind of link-time indirection to be enforced. Usually, a strong definition will be absolute, and will be used to implement an operating system's entry vector having the forever binary property.

Bit 6

Bit 6 encodes the common attribute, which is meaningful only if the symbol is an external reference (bits 1,0 = 10). If set, the symbol is a reference to a common area with the symbol's name. The length of the common area is given by the symbol's Value field (see above). The linker treats common symbols much as it treats areas having the Common Reference attribute - all symbols with the same name are assigned the same base address, and the length allocated is the maximum of all specified lengths.

If the name of a common symbol matches the name of a common area, then these are merged and the symbol identifies the base of the area.

All common symbols for which there is no matching common area (reference or definition) are collected into an anonymous, linker-created, pseudo-area.

Bit 7

Bit 7 is reserved and shall be set to 0.

Bits 8-11

Bits 8-11 encode additional attributes of symbols defined in code areas.

Bit 8 encodes the code datum attribute which is meaningful only if this symbol defines a location within an area having the Code attribute. It denotes that the symbol identifies a (usually read-only) datum, rather than an executable instruction.

Bit 9 encodes the floating-point arguments in floating-point registers attribute. This is meaningful only if the symbol identifies a function entry point. A symbolic reference with this attribute cannot be matched by the linker to a symbol definition which lacks the attribute.

Bit 10 is reserved and shall be set to 0.

Bit 11 is the simple leaf function attribute which is meaningful only if this symbol defines the entry point of a sufficiently simple leaf function (a leaf function is one which calls no other function). For a reentrant leaf function it denotes that the function's inter-link-unit entry point is the same as its intra-link-unit entry point.

Bits 12-31

Bits 12-31 are reserved and shall be set to 0.
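The rules for bits 0-6 can be summarised in code. The following sketch reports each attribute only where this section says it is meaningful; the function name, the scope labels and the dictionary keys are hypothetical, and the code-symbol bits 8, 9 and 11 are not decoded.

```python
def describe_symbol(attributes: int) -> dict:
    """Interpret bits 0-6 of a Symbol Attributes word."""
    scope_bits = attributes & 0b11
    scope = {
        0b01: "local definition",
        0b10: "external reference",
        0b11: "global definition",
    }.get(scope_bits)
    if scope is None:
        raise ValueError("bits 1,0 = 00 is reserved")
    if attributes & 0xFFFFF480:       # bit 7, bit 10 and bits 12-31
        raise ValueError("reserved bits set")
    defined = bool(attributes & 0x01)
    is_ref = scope_bits == 0b10
    is_global_def = scope_bits == 0b11
    return {
        "scope": scope,
        # each flag is reported only where the text says it is meaningful
        "absolute": defined and bool(attributes & 0x04),
        "case_insensitive": not defined and bool(attributes & 0x08),
        "weak": is_ref and bool(attributes & 0x10),
        "strong": is_global_def and bool(attributes & 0x20),
        "common": is_ref and bool(attributes & 0x40),
    }
```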

Symbol Attribute Summary

Bit   Mask         Attribute Description
0     0x00000001   Symbol is defined in this file
1     0x00000002   Symbol has global scope
2     0x00000004   Absolute attribute
3     0x00000008   Case-insensitive attribute
4     0x00000010   Weak attribute
5     0x00000020   Strong attribute
6     0x00000040   Common attribute
Code symbols only
8     0x00000100   Code area datum attribute
9     0x00000200   FP args in FP regs attribute
11    0x00000800   Simple leaf function attribute

String table chunk (OBJ_STRT)

The string table chunk contains all the print names referred to from the header and symbol table chunks. This separation is made to factor out the variable length characteristic of print names from the key data structures.

A print name is stored in the string table as a sequence of non-control characters (codes 32-126 and 160-255) terminated by a NUL (0) byte, and is identified by an offset from the start of the table. The first 4 bytes of the string table contain its length (including the length of its length word), so no valid offset into the table is less than 4, and no table has length less than 4.

The endian-ness of the length word shall be identical to the endian-ness of the AOF and chunk files containing it.
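Looking up a print name from its offset might be done as in the sketch below. The function name get_name is hypothetical, and decoding the bytes as Latin-1 is an assumption made so that codes 160-255 map to single characters; the format itself does not name a character set here.

```python
import struct


def get_name(strt: bytes, offset: int, order: str = "<") -> str:
    """Return the print name stored at the given offset in an OBJ_STRT chunk."""
    (length,) = struct.unpack_from(order + "I", strt, 0)
    # the length word occupies bytes 0-3, so no valid offset is below 4
    if not 4 <= offset < length:
        raise ValueError("offset outside the string table")
    end = strt.index(b"\0", offset)       # names are NUL-terminated
    return strt[offset:end].decode("latin-1")
```

For a table holding the names "main" and "C$$code" in that order, the offsets 4 and 9 would select them.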

Identification chunk (OBJ_IDFN)

This chunk should contain a string of printable characters (codes 10-13 and 32-126) terminated by a NUL (0) byte, which gives information about the name and version of the tool which generated the object file. Use of codes in the range 128-255 is discouraged, as the interpretation of these values is host dependent.

ALF

ALF is the format of linkable libraries (such as the C RISC OS Toolbox library toolboxlib).

Library file format

For library files, the first part of each chunk's name is 'LIB_'; for object libraries, the names of the additional two chunks begin with 'OFL_'.

Each piece of a library file is stored in a separate, identifiable chunk, named as follows:

Chunk          Chunk Name
Directory      LIB_DIRY
Time-stamp     LIB_TIME
Version        LIB_VSRN
Data           LIB_DATA
Symbol table   OFL_SYMT   (object code libraries only)
Time-stamp     OFL_TIME   (object code libraries only)

There may be many LIB_DATA chunks in a library, one for each library member. In all chunks, word values are stored with the same byte order as the target system; strings are stored in ascending address order, which is independent of target byte order.

LIB_DIRY

The LIB_DIRY chunk contains a directory of the modules in the library, each of which is stored in a LIB_DATA chunk. The directory size is fixed when the library is created. The directory consists of a sequence of variable length entries, each an integral number of words long. The number of directory entries is determined by the size of the LIB_DIRY chunk.

This is shown pictorially in the following diagram:

[Figure: layout of a LIB_DIRY entry (APPE-8.GIF)]

ChunkIndex

ChunkIndex is a word containing the 0-origin index within the chunk file header of the corresponding LIB_DATA chunk. Conventionally, the first 3 chunks of an OFL file are LIB_DIRY, LIB_TIME and LIB_VSRN, so ChunkIndex is at least 3. A ChunkIndex of 0 means the directory entry is unused.

The corresponding LIB_DATA chunk entry gives the offset and size of the library module in the library file.

EntryLength

EntryLength is a word containing the number of bytes in this LIB_DIRY entry, always a multiple of 4.

DataLength

DataLength is a word containing the number of bytes used in the data section of this LIB_DIRY entry, also a multiple of 4.

Data

The Data section consists of, in order:

  • a 0-terminated string (the name of the library member)
  • any other information relevant to the library module (often empty)
  • a 2-word, word-aligned time stamp.

Strings should contain only ISO-8859 non-control characters (codes [0-31], 127 and 128+[0-31] are excluded).

The string field is the name used to identify this library module. Typically it is the name of the file from which the library member was created.

The format of the time stamp is described in Time Stamps. Its value is an encoded version of the last-modified time of the file from which the library member was created.

To ensure maximum robustness with respect to earlier, now obsolete, versions of the ARM object library format:

  • Applications which create libraries or library members should ensure that the LIB_DIRY entries they create contain valid time stamps.
  • Applications which read LIB_DIRY entries should not rely on any data beyond the end of the name string being present, unless the difference between the DataLength field and the name-string length allows for it. Even then, the contents of a time stamp should be treated cautiously and not assumed to be sensible.

Applications which write LIB_DIRY or OFL_SYMT entries should ensure that padding is done with NUL (0) bytes; applications which read LIB_DIRY or OFL_SYMT entries should make no assumptions about the values of padding bytes beyond the first, string-terminating NUL byte.
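
As a rough illustration of the entry layout described above, the following Python sketch walks a LIB_DIRY chunk held in memory. The function name parse_lib_diry and its arguments are hypothetical. It assumes that EntryLength counts the whole entry including its three header words, which the text does not state explicitly. Bytes after the name (padding, optional information, time stamp) are returned uninterpreted, in line with the caution recommended for readers.

```python
import struct

def parse_lib_diry(chunk: bytes, little_endian: bool = True):
    """Return (chunk_index, name, trailing_bytes) for each used entry."""
    fmt = "<3I" if little_endian else ">3I"
    entries = []
    pos = 0
    while pos + 12 <= len(chunk):
        chunk_index, entry_len, data_len = struct.unpack_from(fmt, chunk, pos)
        # Assumption: EntryLength includes the 12 bytes of header words.
        if entry_len < 12 or entry_len % 4 or pos + entry_len > len(chunk):
            raise ValueError(f"bad EntryLength {entry_len} at offset {pos}")
        if data_len > entry_len - 12:
            raise ValueError(f"bad DataLength {data_len} at offset {pos}")
        if chunk_index != 0:  # a ChunkIndex of 0 marks an unused entry
            data = chunk[pos + 12 : pos + 12 + data_len]
            nul = data.find(b"\0")
            if nul < 0:
                raise ValueError(f"unterminated name at offset {pos}")
            name = data[:nul].decode("latin-1")
            entries.append((chunk_index, name, data[nul + 1 :]))
        pos += entry_len
    return entries
```

The trailing bytes can then be inspected for a time stamp only when they are long enough to hold one.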

Time Stamps

A library time stamp is a pair of words encoding the following:

  • a 6-byte count of centi-seconds since the start of the 20th century
  • a 2-byte count of microseconds since the last centi-second (usually 0).

[Figure APPE-9.GIF: layout of a library time stamp]

The first word stores the most significant 4 bytes of the 6-byte count; the least significant 2 bytes of the count are in the most significant half of the second word.

The least significant half of the second word contains the microsecond count and is usually 0.

Time stamp words are stored in target system byte order: they must have the same endian-ness as the containing chunk file.
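
The packing of the two words can be sketched in Python as follows. The function names are hypothetical, and the conversion to a calendar date assumes that "the start of the 20th century" means 00:00:00 on 1 January 1900, which the text does not spell out.

```python
import struct
from datetime import datetime, timedelta

# Assumption: the count starts at 00:00:00 on 1 January 1900.
EPOCH = datetime(1900, 1, 1)

def decode_time_stamp(raw: bytes, little_endian: bool = True):
    """Split an 8-byte time stamp into (centiseconds, microseconds)."""
    word0, word1 = struct.unpack("<2I" if little_endian else ">2I", raw)
    # Word 0 holds the top 4 bytes of the 6-byte count; the top half of
    # word 1 holds the bottom 2 bytes of the count.
    centiseconds = (word0 << 16) | (word1 >> 16)
    microseconds = word1 & 0xFFFF
    return centiseconds, microseconds

def to_datetime(centiseconds: int, microseconds: int = 0) -> datetime:
    """Convert a decoded time stamp to a datetime (sensible values only)."""
    return EPOCH + timedelta(milliseconds=10 * centiseconds,
                             microseconds=microseconds)
```

Because readers are advised not to assume a time stamp is sensible, a caller may wish to guard to_datetime against out-of-range values.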

LIB_TIME

The LIB_TIME chunk contains a 2-word time stamp recording when the library was last modified. It is, hence, 8 bytes long.

LIB_VSRN

The version chunk contains a single word whose value is 1.

LIB_DATA

A LIB_DATA chunk contains one of the library members indexed by the LIB_DIRY chunk. The endian-ness or byte order of this data is, by assumption, the same as the byte order of the containing library/chunk file.

No other interpretation is placed on the contents of a member by the library management tools. A member could itself be a file in chunk file format or even another library.

Object Code Libraries

An object code library is a library file whose members are files in ARM Object Format (see AOF for details).

An object code library contains two additional chunks: an external symbol table chunk named OFL_SYMT; and a time stamp chunk named OFL_TIME.

OFL_SYMT

The external symbol table contains an entry for each external symbol defined by members of the library, together with the index of the chunk containing the member defining that symbol.

The OFL_SYMT chunk has exactly the same format as the LIB_DIRY chunk except that the Data section of each entry contains only a string, the name of an external symbol, and between 1 and 4 bytes of NUL padding, as follows:

[Figure APPE-10.GIF: layout of an OFL_SYMT entry]

OFL_SYMT entries do not contain time stamps.
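
The padding rule (between 1 and 4 NUL bytes, the first of which terminates the string) can be illustrated with a small Python sketch. The function name build_ofl_symt_entry is hypothetical, and the sketch assumes that EntryLength includes the three header words.

```python
import struct

def build_ofl_symt_entry(chunk_index: int, symbol: str,
                         little_endian: bool = True) -> bytes:
    """Build one OFL_SYMT entry for an external symbol."""
    name = symbol.encode("latin-1")
    # 1..4 NULs: the first terminates the string, the rest pad to a word.
    pad = 4 - (len(name) % 4)
    data = name + b"\0" * pad
    # Assumption: EntryLength counts the three header words as well.
    entry_len = 12 + len(data)
    header = struct.pack("<3I" if little_endian else ">3I",
                         chunk_index, entry_len, len(data))
    return header + data
```

A 3-character name receives a single NUL, whereas a 4-character name receives four, so the Data section is always a whole number of words.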

OFL_TIME

The OFL_TIME chunk records when the OFL_SYMT chunk was last modified and has the same format as the LIB_TIME chunk (see Time Stamps).

AIF

ARM Image Format (AIF) is a simple format for ARM executable images. It consists of a 128-byte header, followed by the image's code, followed by the image's initialised static data.

Properties of AIF

Two variants of AIF exist:

  • Executable AIF (in which the header is part of the image itself) can be executed by entering the header at its first word. Code in the header ensures the image is properly prepared for execution before being entered at its entry address.
  • Non-executable AIF (in which the header is not part of the image, but merely describes it) is intended to be loaded by a program which interprets the header, and prepares the following image for execution.

The two flavours of AIF are distinguished as follows:

  • The fourth word of an executable AIF header is BL entrypoint. The most significant byte of this word (in the target byte order) is 0xEB.
  • The fourth word of a non-executable AIF image is the offset of its entry point from its base address. The most significant nibble of this word (in the target byte order) is 0x0.

The base address of an executable AIF image is the address at which its header should be loaded; its code starts at base + 0x80. The base address of a non-executable AIF image is the address at which its code should be loaded.
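
A loader or inspection tool might apply the test on the fourth header word along the following lines. This is only a sketch: the function name aif_flavour and the "unknown" result for words matching neither pattern are illustrative choices, not part of the format.

```python
import struct

def aif_flavour(header: bytes, little_endian: bool = True) -> str:
    """Classify an AIF header by inspecting its fourth word (offset 0x0C)."""
    (word3,) = struct.unpack_from("<I" if little_endian else ">I", header, 0x0C)
    if word3 >> 24 == 0xEB:      # BL entrypoint
        return "executable"
    if word3 >> 28 == 0x0:       # offset of entry point from base
        return "non-executable"
    return "unknown"             # illustrative fallback, not defined by AIF
```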

Executable AIF

The following remarks about executable AIF apply also to non-executable AIF, except that loader code must interpret the AIF header and perform any required decompression, relocation, and creation of zero-initialised data. Compression and relocation are, of course, optional: AIF is often used to describe very simple absolute images.

It is assumed that on entry to a program in ARM Image Format (AIF), the general registers contain nothing of value to the program (the program is expected to communicate with its operating environment using SWI instructions or by calling functions at known, fixed addresses).

A program image in ARM Image Format is loaded into memory at its load address, and entered at its first word. The load address may be:

  • an implicit property of the type of the file containing the image (as is usual with UNIX executable file types, Acorn Absolute file types, etc.)
  • read by the program loader from offset 0x28 in the file containing the AIF image
  • given by some other means, e.g. by instructing an operating system or debugger to load the image at a specified address in memory.

An AIF image may be compressed and can be self-decompressing (to support faster loading from slow peripherals, and better use of space in ROMs and delivery media such as floppy discs). An AIF image is compressed by a separate utility which adds self-decompression code and data tables to it.

If created with appropriate linker options, an AIF image may relocate itself at load time. Two kinds of self-relocation are supported:

  • relocate to load address (the image can be loaded anywhere and will execute where loaded)
  • self-move up memory, leaving a fixed amount of workspace above, and relocate to this address (the image is loaded at a low address and will move to the highest address which leaves the required workspace free before executing there).

The second kind of self-relocation can only be used if the target system supports an operating system or monitor call which returns the address of the top of available memory. The ARM linker provides a simple mechanism for using a modified version of the self-move code illustrated in Self-Move and Self-Relocation Code, allowing AIF to be easily tailored to new environments.

AIF images support being debugged by the Desktop debugging tool (DDT). Low-level and source-level support are orthogonal, and both, either, or neither kind of debugging support need be present in an AIF image.

For details of the format of the debugging tables see ASD.

References from debugging tables to code and data are in the form of relocatable addresses. After loading an image at its load address these values are effectively absolute. References between debugger table entries are in the form of offsets from the beginning of the debugging data area. Thus, following relocation of a whole image, the debugging data area itself is position independent and may be copied or moved by the debugger.

The Layout of AIF

The layout of a compressed AIF image is as follows:

[Figure APPE-11.GIF: layout of a compressed AIF image]

The header is small, fixed in size, and described below. In a compressed AIF image, the header is not compressed.

An uncompressed image has the following layout:

[Figure APPE-12.GIF: layout of an uncompressed AIF image]

Debugging data is absent unless the image has been linked using the linker's -d option and, in the case of source-level debugging, unless the components of the image have been compiled using the compiler's -g option.

The relocation list is a list of byte offsets from the beginning of the AIF header, of words to be relocated, followed by a word containing -1. The relocation of non-word values is not supported.

After the execution of the self-relocation code - or if the image is not self-relocating - the image has the following layout:

[Figure APPE-13.GIF: layout of an AIF image after self-relocation]

At this stage a debugger is expected to copy any debugging data to somewhere safe, otherwise it will be overwritten by the zero-initialised data and/or the heap/stack data of the program. A debugger can seize control at the appropriate moment by copying, then modifying, the third word of the AIF header (see AIF Header Layout).

AIF Header Layout

[Figure APPE-14.GIF: AIF header layout]

Notes

NOP is encoded as MOV r0, r0.

BL is used to make the header addressable via r14 in a position-independent manner, and to ensure that the header will be position-independent. Care is taken to ensure that the instruction sequences which compute addresses from these r14 values work in both 26-bit and 32-bit ARM modes.

Program Exit Instruction will usually be a SWI causing program termination. On systems which lack this, a branch-to-self is recommended. Applications are expected to exit directly and not to return to the AIF header, so this instruction should never be executed. The ARM linker sets this field to SWI 0x11 by default, but it may be set to any desired value by providing a template for the AIF header in an area called AIF_HDR in the first object file in the input list to Link.

Image ReadOnly Size includes the size of the AIF header only if the AIF type is executable (that is, if the header itself is part of the image).

An AIF image is re-startable if, and only if, the program it contains is re-startable (note: an AIF image is not reentrant). If an AIF image is to be re-started then, following its decompression, the first word of the header must be set to NOP. Similarly, following self-relocation, the second word of the header must be reset to NOP. This causes no additional problems with the read-only nature of the code segment: both decompression and relocation code must write to it. On systems with memory protection, both the decompression code and the self-relocation code must be bracketed by system calls to change the access status of the read-only section (first to writable, then back to read-only).

The image debug type has the following meaning:

0: No debugging data are present.
1: Low-level debugging data are present.
2: Source level (ASD) debugging data are present.
3: 1 and 2 are present together.

All other values of image debug type are reserved to ARM Ltd.

Debug Initialisation Instruction (if used) is expected to be a SWI instruction which alerts a resident debugger that a debuggable image is commencing execution. Of course, there are other possibilities within the AIF framework. The linker sets this field to NOP by default, but it can be customised by providing your own template for the AIF header in an area called AIF_HDR in the first object file in the input list to Link.

The Address mode word (at offset 0x30) is 0, or contains in its least significant byte (using the byte order appropriate to the target):

  • the value 26, indicating the image was linked for a 26-bit ARM mode, and may not execute correctly in a 32-bit mode
  • the value 32, indicating the image was linked for a 32-bit ARM mode, and may not execute correctly in a 26-bit mode.

A value of 0 indicates an old-style 26-bit AIF header.

If the Address mode word has bit 8 set ((address_mode & 0x100) != 0), then the image was linked with separate code and data bases (usually the data is placed immediately after the code). In this case, the word at offset 0x34 contains the base address of the image's data.
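
The interpretation of the Address mode word can be sketched in Python as below. The function name and the shape of the returned tuple are hypothetical; only the offsets 0x30 and 0x34 and the bit tests come from the description above.

```python
import struct

def decode_address_mode(header: bytes, little_endian: bool = True):
    """Return (mode_bits, old_style, data_base) from an AIF header.

    data_base is None unless bit 8 of the Address mode word is set.
    """
    fmt = "<2I" if little_endian else ">2I"
    address_mode, word_34 = struct.unpack_from(fmt, header, 0x30)
    old_style = address_mode == 0          # old-style 26-bit header
    bits = 26 if old_style else address_mode & 0xFF
    if bits not in (26, 32):
        raise ValueError(f"unexpected address mode byte {bits}")
    separate_data = (address_mode & 0x100) != 0
    return bits, old_style, (word_34 if separate_data else None)
```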

Zero-Initialisation Code

The Zero-initialisation code is as follows:

ZeroInit
    NOP                       ; or <Debug Init Instruction>
    SUB     ip, lr, pc        ; base+12+[PSR]-(ZeroInit+12+[PSR])
                              ; = base-ZeroInit
    ADD     ip, pc, ip        ; base-ZeroInit+ZeroInit+16
                              ; = base+16
    LDMIB   ip, {r0,r1,r2,r3} ; various sizes
    SUB     ip, ip, #16       ; image base
    LDR     r2, [ip, #48]     ; flags
    TST     r2, #256          ; separate data area?
    LDRNE   ip, [ip, #52]     ; Yes, so get it...
    ADDEQ   ip, ip, r0        ; No, so add + RO size
    ADD     ip, ip, r1        ; + RW size = base of 0-init area
    MOV     r0, #0
    CMPS    r3, #0
00  MOVLE   pc, lr            ; nothing left to do
    STR     r0, [ip],#4
    SUBS    r3, r3, #4
    B       %B00
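
For readers more comfortable with a high-level notation, the address arithmetic performed by the code above can be modelled in Python. This is a sketch rather than a loader: the function name is hypothetical, the header offsets are inferred from the LDMIB and LDR instructions (0x14 and 0x18 for the read-only and read-write sizes, 0x20 for the zero-init size), and the word at 0x1C is loaded by the assembly but not used, so it is left unnamed here.

```python
import struct

def zero_init_region(header: bytes, base: int, little_endian: bool = True):
    """Return (start_address, byte_count) of the area ZeroInit clears."""
    e = "<" if little_endian else ">"
    ro_size, rw_size, _word_1c, zi_size = struct.unpack_from(e + "3Ii", header, 0x14)
    address_mode, data_base = struct.unpack_from(e + "2I", header, 0x30)
    if address_mode & 0x100:
        # Separate data base: the 0-init area follows the RW data there.
        start = data_base + rw_size
    else:
        start = base + ro_size + rw_size
    # The loop stores whole words until the (signed) count is <= 0.
    length = (zi_size + 3) & ~3 if zi_size > 0 else 0
    return start, length
```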

Self-Move and Self-Relocation Code

This code is added to the end of an AIF image by the linker, immediately before the list of relocations (which is terminated by -1). Note that the code is entered via a BL from the second word of the AIF header so, on entry, r14 points to AIFHeader + 8. In 26-bit ARM modes, r14 also contains a copy of the PSR flags.

On entry, the relocation code calculates the address of the AIF header (in a CPU-independent fashion) and decides whether the image needs to be moved. If the image doesn't need to be moved, the code branches to RelocateOnly.

RelocCode
    NOP                   ; required by ensure_byte_order()
                          ; and used below.
    SUB    ip, lr, pc     ; base+8+[PSR]-(RelocCode+12+[PSR])
                          ; = base-4-RelocCode
    ADD    ip, pc, ip     ; base-4-RelocCode+RelocCode+16 = base+12
    SUB    ip, ip, #12    ; -> header address
    LDR    r0, RelocCode  ; NOP
    STR    r0, [ip, #4]   ; won't be called again on image re-entry
    LDR    r9, [ip, #&2C] ; min free space requirement
    CMPS   r9, #0         ; 0 => no move, just relocate
    BEQ    RelocateOnly

If the image needs to be moved up memory, then the top of memory has to be found. Here, a system service (SWI 0x10) is called to return the address of the top of memory in r1. This is, of course, system specific and should be replaced by whatever code sequence is appropriate to the environment.

    LDR    r0, [ip, #&20]    ; image zero-init size
    ADD    r9, r9, r0        ; space to leave = min free + zero init
    SWI    #&10              ; return top of memory in r1.

The following code calculates the length of the image inclusive of its relocation data, and decides whether a move up store is possible.

    ADR    r2, End           ; -> End
01  LDR    r0, [r2], #4      ; load relocation offset, increment r2
    CMNS   r0, #1            ; terminator?
    BNE    %B01              ; No, so loop again
    SUB    r3, r1, r9        ; MemLimit - freeSpace
    SUBS   r0, r3, r2        ; amount to move by
    BLE    RelocateOnly      ; not enough space to move...
    BIC    r0, r0, #15       ; a multiple of 16...
    ADD    r3, r2, r0        ; End + shift
    ADR    r8, %F02          ; intermediate limit for copy-up

Finally, the image copies itself four words at a time, being careful about the direction of copy, and jumping to the copied copy code as soon as it has copied itself.

02  LDMDB  r2!, {r4-r7}
    STMDB  r3!, {r4-r7}
    CMPS   r2, r8            ; copied the copy loop?
    BGT    %B02              ; not yet
    ADD    r4, pc, r0
    MOV    pc, r4            ; jump to copied copy code
03  LDMDB  r2!, {r4-r7}
    STMDB  r3!, {r4-r7}
    CMPS   r2, ip            ; copied everything?
    BGT    %B03              ; not yet
    ADD    ip, ip, r0        ; load address of code
    ADD    lr, lr, r0        ; relocated return address

Whether the image has moved itself or not, control eventually arrives here, where the list of locations to be relocated is processed. Each location is word sized and is relocated by the difference between the address the image was loaded at (the address of the AIF header) and the address the image was linked at (stored at offset 0x28 in the AIF header).

RelocateOnly
    LDR    r1, [ip, #&28]    ; header + 0x28 = code base set by Link
    SUBS   r1, ip, r1        ; relocation offset
    MOVEQ  pc, lr            ; relocate by 0 so nothing to do
    STR    ip, [ip, #&28]    ; new image base = actual load address
    ADR    r2, End           ; start of reloc list
04  LDR    r0, [r2], #4      ; offset of word to relocate
    CMNS   r0, #1            ; terminator?
    MOVEQ  pc, lr            ; yes => return
    LDR    r3, [ip, r0]      ; word to relocate
    ADD    r3, r3, r1        ; relocate it
    STR    r3, [ip, r0]      ; store it back
    B      %B04              ; and do the next one
End                          ; The list of offsets of locations to
                             ; relocate starts here, terminated by -1

You can customise the self-relocation and self-moving code generated by Link by providing your version of it in an area called AIF_RELOC in the first object file in Link's input list.
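
The effect of the RelocateOnly loop can be modelled in Python for an image held in a bytearray that starts at the AIF header. The function name relocate_image and the end_offset parameter (the offset of the relocation list from the header) are hypothetical; a real tool would have to locate the list itself.

```python
import struct

def relocate_image(image: bytearray, load_address: int, end_offset: int,
                   little_endian: bool = True) -> int:
    """Relocate the image in place; return the number of words adjusted."""
    fmt = "<I" if little_endian else ">I"
    (linked_base,) = struct.unpack_from(fmt, image, 0x28)
    delta = (load_address - linked_base) & 0xFFFFFFFF
    if delta == 0:
        return 0                                   # relocate by 0: nothing to do
    struct.pack_into(fmt, image, 0x28, load_address)  # new image base
    pos = end_offset
    count = 0
    while True:
        (offset,) = struct.unpack_from(fmt, image, pos)
        pos += 4
        if offset == 0xFFFFFFFF:                   # the -1 terminator
            return count
        (word,) = struct.unpack_from(fmt, image, offset)
        struct.pack_into(fmt, image, offset, (word + delta) & 0xFFFFFFFF)
        count += 1
```

As in the assembly, a second call at the same load address does nothing, because the word at offset 0x28 has been updated.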

ASD

Acknowledgement: This design is based on work originally done for Acorn Computers by Topexpress Ltd.

This section specifies the format of symbolic debugging data generated by ARM compilers, which is used by the Desktop debugging tool (DDT) to support high level language oriented, interactive debugging.

For each separate compilation unit (called a section) the compiler produces debugging data, and a special area in the object code (see AOF for an explanation of ARM Object Format, including areas and their attributes). Debugging data are position independent, containing only relative references to other debugging data within the same section, and relocatable references to other compiler-generated areas.

Debugging data areas are combined by the linker into a single contiguous section of a program image. For a description of the linker's principal output format see AIF.

Since the debugging section is position-independent, the debugger can move it to a safe location before the image starts executing. If the image is not executed under debugger control, the debugging data are simply overwritten.

The format of debugging data allows for a variable amount of detail. This potentially allows the user to trade off among memory used, disc space used, execution time, and debugging detail.

Assembly-language level debugging is also supported, though in this case the debugging tables are generated by the linker. If required, the assembler can generate debugging table entries relating code addresses to source lines. Low-level debugging tables appear in an extra section item, as if generated by an independent compilation (see Debugging Data Items in Detail). Low-level and high-level debugging are orthogonal facilities, though DDT allows the user to move smoothly between levels if both sets of debugging data are present in an image.

Order of Debugging Data

A debug data area consists of a series of items. The arrangement of these items mimics the structure of the high-level language program itself.

For each debug area, the first item is a section item, giving global information about the compilation, including a code identifying the language, and flags indicating the amount of detail included in the debugging tables.

Each datum, function, procedure, etc., definition in the source program has a corresponding debug data item; these items appear in an order corresponding to the order of definitions in the source. This means that any nested structure in the source program is preserved in the debugging data, and the debugger can use this structure to make deductions about the scope of various source-level objects. Of course, for procedure definitions, two debug items are needed: a procedure item to mark the definition itself, and an endproc item to mark the end of the procedure's body and the end of any nested definitions. If procedure definitions are nested then the procedure-endproc brackets are nested too. Variable and type definitions made at the outermost level, of course, appear outside of all procedure/endproc items.

Information about the relationship between the executable code and source files is collected together and appears as a fileinfo item, which is always the final item in a debugging area. Because of the C language's #include facility, the executable code produced from an outer-level source file may be separated into disjoint pieces interspersed with that produced from the included files. Therefore, source files are considered to be collections of 'fragments', each corresponding to a contiguous area of executable code, and the fileinfo item is a list with an entry for each file, each in turn containing a list with an entry for each fragment. The fileinfo field in the section item addresses the fileinfo item itself. In each procedure item there is a 'fileentry' field, which refers to the file-list entry for the source file containing the procedure's start; there is a separate one in the endproc item because the procedure's end may not be in the same source file as its start.

Endian-ness and the Encoding of Debugging Data

The ARM can be configured to use either a little-endian memory system (the least significant byte of each 4-byte word has the lowest address), or a big-endian memory system (the most significant byte of each 4-byte word has the lowest address).

In general, the code to be generated varies according to the endian-ness (or byte-sex) of the target. The linker has insufficient information to change an object file's byte sex, so object files are encoded using the byte order of the intended target, independently of the byte order of the host system on which the compiler or assembler runs. The linker accepts inputs having either byte order, but rejects mixed sex inputs, and generates its output using the same byte order.

This means that producers of debugging tables must be prepared to generate them in either byte order, as required. In turn, this requires definitions to be very clear about when a 4-byte word is being used (which will require reversal on output or input when cross-sex compiling or debugging), and when a sequence of bytes is being used (which requires no special treatment provided it is written and read as a sequence of bytes in address order).

Representation of Data Types

Several of the debugging data items (e.g. procedure and variable) have a type word field to identify their data type. This field contains, in the most significant 24 bits, a code to identify a base type, and in the least significant 8 bits, a pointer count:

0 to denote the type itself
1 to denote a pointer to the type
2 to denote a pointer to a pointer to...
etc.

For simple types the code is a non-negative integer, as follows (all codes are decimal):

void                    0
signed integers
  single byte          10
  half-word            11
  word                 12
unsigned integers
  single byte          20
  half-word            21
  word                 22
floating point
  float                30
  double               31
  long double          32
complex
  single complex       41
  double complex       42
functions
  function            100

For compound types (arrays, structures, etc.) there is a special kind of debug data item (array, struct, etc.) to give details such as array bounds and field types. The type code for compound types is negative, the negation of the (byte) offset of the debug item from the start of the debugging area.
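
A type word can be taken apart as in the following Python sketch. The function names are hypothetical. The base type code is treated as a signed 24-bit quantity so that compound types yield a negative code whose negation is the byte offset of the defining item.

```python
def decode_type_word(word: int):
    """Split a 32-bit type word into (type_code, pointer_count)."""
    pointer_count = word & 0xFF
    code = (word >> 8) & 0xFFFFFF
    if code & 0x800000:          # sign-extend the 24-bit code
        code -= 0x1000000
    return code, pointer_count

def encode_type_word(code: int, pointer_count: int = 0) -> int:
    """Build a 32-bit type word from a type code and a pointer count."""
    return ((code << 8) | (pointer_count & 0xFF)) & 0xFFFFFFFF
```

For example, a pointer to a signed word (code 12, pointer count 1) encodes as 0x00000C01, and a decoded code of -64 refers to the debug item at byte offset 64 in the debugging area.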

If a type has been given a name in a source program, it will give rise to a type debugging data item which contains the name and a type word as defined above. If necessary, there will also be a debugging data item, such as an array or struct item, to define the type itself. In that case, the type word will refer to this item.

Set types in Pascal are not treated in detail: the only information recorded for them is the total size occupied by the object in bytes. Neither are Pascal file variables supported by the debugger, since their behaviour under debugger control is unlikely to be helpful to the user.

FORTRAN character types are supported by special kinds of debugging data item, the format of which is specific to each FORTRAN compiler.

Representation of Source File Positions

Several of the debugging data items have a sourcepos field to identify a position in the source file. This field contains a line number and character position within the line packed into a single word. The most significant 10 bits encode the character offset (0-based) from the start of the line and the least-significant 22 bits give the line number.
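
The packing of a sourcepos word can be expressed as a short Python sketch (the function names are hypothetical):

```python
def encode_sourcepos(line: int, char: int) -> int:
    """Pack a line number and 0-based character offset into one word."""
    if not 0 <= line < (1 << 22):
        raise ValueError("line number does not fit in 22 bits")
    if not 0 <= char < (1 << 10):
        raise ValueError("character offset does not fit in 10 bits")
    return (char << 22) | line

def decode_sourcepos(word: int):
    """Return (line, char) from a packed sourcepos word."""
    return word & 0x3FFFFF, (word >> 22) & 0x3FF
```

Line 120, character offset 4 is therefore stored as 0x01000078.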

Debugging Data Items in Detail

The Code and Length Field

The first word of each debugging data item contains the byte length of the item (encoded in the most significant 16 bits), and a code identifying the kind of item (in the least significant 16 bits). The defined codes are:

1 section
2 procedure/function definition
3 endproc
4 variable
5 type
6 struct
7 array
8 subrange
9 set
10 fileinfo
11 contiguous enumeration
12 discontiguous enumeration
13 procedure/function declaration
14 begin naming scope
15 end naming scope

The meaning of the second and subsequent words of each item is defined below.

If a debugger encounters a code it does not recognise, it should use the length field to skip the item entirely. This discipline allows the debugging tables to be extended without invalidating existing debuggers.
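
The skipping discipline can be sketched in Python as follows. The function name walk_items is hypothetical, and the sketch assumes a buffer that contains only debugging data items laid end to end; it does not handle the symbol and string tables that follow a low-level section item.

```python
import struct

ITEM_NAMES = {
    1: "section", 2: "procedure", 3: "endproc", 4: "variable", 5: "type",
    6: "struct", 7: "array", 8: "subrange", 9: "set", 10: "fileinfo",
    11: "contiguous enumeration", 12: "discontiguous enumeration",
    13: "procedure declaration", 14: "begin naming scope",
    15: "end naming scope",
}

def walk_items(area: bytes, little_endian: bool = True):
    """Return (offset, kind, length) per item; kind is None if unrecognised."""
    fmt = "<I" if little_endian else ">I"
    items = []
    pos = 0
    while pos + 4 <= len(area):
        (first,) = struct.unpack_from(fmt, area, pos)
        length, code = first >> 16, first & 0xFFFF
        if length < 4 or pos + length > len(area):
            raise ValueError(f"bad item length {length} at offset {pos}")
        items.append((pos, ITEM_NAMES.get(code), length))
        pos += length            # unknown codes are skipped by length alone
    return items
```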

Text Names in Items

Where items include a string field, the string is packed into successive bytes beginning with a length byte, and padded at the end to a word boundary with 0 bytes. The length of a string is in the range [0..255] bytes.
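
A minimal Python sketch of this string encoding is given below; the function names are hypothetical.

```python
def pack_name(name: bytes) -> bytes:
    """Encode a name as a length byte, the characters, then 0 padding."""
    if len(name) > 255:
        raise ValueError("name longer than 255 bytes")
    raw = bytes([len(name)]) + name
    return raw + b"\0" * (-len(raw) % 4)   # pad to a word boundary

def unpack_name(buf: bytes, pos: int = 0):
    """Return (name, offset_of_next_field) for a string field at pos."""
    length = buf[pos]
    name = buf[pos + 1 : pos + 1 + length]
    next_pos = pos + ((1 + length + 3) & ~3)
    return name, next_pos
```

Note that a 3-character name needs no padding at all, since the length byte completes the word.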

Offsets in File and Addresses in Memory

Where an item contains a field giving an offset in the debugging data area (usually to address another item), this means a byte offset from the start of the debugging data for the whole section (in other words, from the start of the section item).

When the same structure is used to map debugging data in memory, an offset field may be used to hold a pointer to another debug item in memory, rather than the offset of it in the debug area.

Section Items

A section item is the first item of each section of the debugging data. After its code and length word it contains the fields listed below. First there are 4 flag bytes:

lang a byte identifying the source language
flags a byte describing the level of detail
unused  
asdversion a byte version number of the debugging data

The following language byte codes are defined:

LANG_NONE 0 Low-level debugging data only
LANG_C 1 C source level debugging data
LANG_PASCAL 2 Pascal source level debugging data
LANG_FORTRAN 3 FORTRAN-77 source level debugging data
LANG_ASM 4 ARM Assembler line number data

All other codes are reserved to ARM.

The flags byte uses the following mask values:

1 debugging data contains line-number information
2 debugging data contains information about top-level variables
3 both of the above

The asdversion byte should be set to 3, the version of this definition.

The flag bytes are followed by the following word-sized fields:

codestart address of first instruction in this section
datastart address of start of static data for this section
codesize byte size of executable code in this section
datasize byte size of the static data in this section
fileinfo offset in the debugging area of the fileinfo item for this section (0 if no fileinfo item present)
debugsize total byte length of debug data for this section
name or nsyms string or integer
(the first byte of the string is its length; the characters follow without a terminating NUL and are padded with NULs up to the next word boundary)

codestart and datastart are addresses, relocated by the linker. The fileinfo field, nominally an offset, is also used as a pointer when this structure is mapped in memory. The fileinfo field is 0 if no source file information is present.

The name field contains the program name for Pascal and FORTRAN programs. For C programs it contains a name derived by the compiler from the root file name (notionally a module name). In each case, the name is similar to a variable name in the source language. For a low-level debugging section (language = 0), the field is treated as a 4 byte integer giving the number of symbols following.

For linker-generated low-level debugging data, the fields have the following values:

language 0
codestart Image$$RO$$Base
datastart Image$$RW$$Base
codesize Image$$RO$$Limit - Image$$RO$$Base
datasize Image$$RW$$Limit - Image$$RW$$Base
fileinfo 0
nsyms number of symbols in the following debugging data
debugsize total size of the low-level debugging data including the size of this section item

For linker-generated low-level debugging data, the section item is followed by nsyms symbol items, each consisting of 2 words:

sym flags + byte offset in string table of symbol name
value the value of the symbol

sym encodes an index into the string table in the 24 least significant bits, and the following flag values in the 8 most significant bits:

ASD_ABSSYM     0             if the symbol is absolute
ASD_GLOBSYM    0x01000000L   if the symbol is global
ASD_TEXTSYM    0x02000000L   if the symbol names code
ASD_DATASYM    0x04000000L   if the symbol names data
ASD_ZINITSYM   0x06000000L   if the symbol names 0-initialised data

Note that the linker reduces all symbol values to absolute values, so that the flag values record the history, or origin, of the symbol in the image.

Immediately following the symbol table is the string table, in standard AOF format. It consists of:

  • a length word
  • the strings themselves, each terminated by a NUL (0).

The length word includes the size of the length word, so no offset into the string table is less than 4. The end of the string table is padded with NULs to the next word boundary (so the length is a multiple of 4).
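
Reading the symbol items and resolving their names might look like the following Python sketch. The function name read_symbols and the returned tuple are hypothetical. The sketch assumes that name offsets are measured from the start of the string table (that is, from its length word), which is consistent with no offset being less than 4, and that buf begins at the first symbol item.

```python
import struct

# Keyed on bits 25-26 of sym (the 0x02000000 and 0x04000000 flag values).
KINDS = {0x00: "absolute", 0x02: "code", 0x04: "data",
         0x06: "zero-initialised data"}

def read_symbols(buf: bytes, nsyms: int, little_endian: bool = True):
    """Return (name, is_global, kind, value) for each low-level symbol."""
    e = "<" if little_endian else ">"
    table_start = 8 * nsyms                 # string table follows the symbols
    (table_len,) = struct.unpack_from(e + "I", buf, table_start)
    strings = buf[table_start : table_start + table_len]
    symbols = []
    for i in range(nsyms):
        sym, value = struct.unpack_from(e + "2I", buf, 8 * i)
        offset = sym & 0x00FFFFFF           # index into the string table
        flags = sym >> 24
        end = strings.index(b"\0", offset)
        name = strings[offset:end].decode("latin-1")
        symbols.append((name, bool(flags & 0x01), KINDS[flags & 0x06], value))
    return symbols
```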

Procedure Items

A procedure item appears once for each procedure or function definition in the source program. Any definitions within the procedure have their related debugging data items between the procedure item and its matching endproc item. After its code and length field, a procedure item contains the following word-sized fields:

type the return type if this is a function, else 0
(see Representation of Data Types)
args the number of arguments
sourcepos the source position of the procedure's start
(see Representation of Source File Positions)
startaddr address of 1st instruction of procedure prologue
entry address of 1st instruction of the procedure body
(see note below)
endproc offset of the related endproc item (in file) or pointer to related endproc item (in memory)
fileentry offset of the file list entry for the source file (in file) or a pointer to it (in memory)
name string
(the first byte of the string is its length; the characters follow without a terminating NUL and are padded with NULs up to the next word boundary)

The entry field addresses the first instruction following the procedure prologue. That is, the first address at which a high-level breakpoint could sensibly be set. The startaddr field addresses the start of the prologue. That is, the instruction at which control arrives when the procedure is called.

Label Items

A label in a source program is represented by a special procedure item with no matching endproc (the endproc field is 0 to denote this). Pascal and FORTRAN numerical labels are converted by their respective compilers into strings prefixed by $n.

For FORTRAN-77, multiple entry points to the same procedure each give rise to a separate procedure item, all of which have the same endproc offset referring to the unique, matching endproc item.

Endproc Items

An endproc item marks the end of the debugging data items belonging to a particular procedure. It also contains information relating to the procedure's return. After its code and length field, an endproc item contains the following word-sized fields:

sourcepos position in the source file of the procedure's end
(see Representation of Source File Positions)
endpoint address of the code byte after the compiled code for the procedure
fileentry offset of the file-list entry for the procedure's end (in file) or a pointer to it (in memory)
nreturns number of procedure return points (may be 0)
retaddrs array of addresses of procedure return code

If the procedure body is an infinite loop, there will be no return point, so nreturns will be 0. Otherwise each member of retaddrs should point to a suitable location at which a breakpoint may be set 'at the exit of the procedure'. When execution reaches this point, the current stack frame should still be for this procedure.

Variable Items

A variable item contains debugging data relating to a source program variable, or a formal argument to a procedure (the first variable items in a procedure always describe its arguments). After its code and length field, a variable item contains the following word-sized fields:

type type of this variable
(see Representation of Data Types)
sourcepos the source position of the variable
(see Representation of Source File Positions)
storageclass a word encoding the variable's storage class
location see explanation below
name string
(the first byte of string is the string's length, followed by a non-NUL-terminated string of characters with NUL padding up to the next word boundary)

The following codes define the storage classes of variables:

1 external variables (or FORTRAN common)
2 static variables private to one section
3 automatic variables
4 register variables
5 Pascal 'var' arguments
6 FORTRAN arguments
7 FORTRAN character arguments

The meaning of the location field of a variable item depends on the storage class; it contains:

  • an absolute address for static and external variables (relocated by the linker)
  • a stack offset (an offset from the frame pointer) for automatic and var-type arguments
  • an offset into the argument list for FORTRAN arguments
  • a register number for register variables (the 8 floating point registers are numbered 16..23).

No account is taken of variables which ought to be addressed by positive offsets from the stack pointer rather than negative offsets from the frame pointer.
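
The dependence of the location field on the storage class can be sketched as follows. This is an illustrative Python fragment, not part of any debugger: the function name describe_location is hypothetical, stack offsets are assumed to be stored as signed 32-bit values, and FORTRAN character arguments (class 7) are assumed to be located in the same way as other FORTRAN arguments.

```python
STORAGE_CLASSES = {
    1: "external", 2: "static", 3: "automatic", 4: "register",
    5: "Pascal var argument", 6: "FORTRAN argument",
    7: "FORTRAN character argument",
}

def describe_location(storageclass, location):
    """Return a readable description of a variable item's location word."""
    if storageclass not in STORAGE_CLASSES:
        raise ValueError(f"unknown storage class {storageclass}")
    if storageclass in (1, 2):
        # Absolute address, already relocated by the linker.
        return f"absolute address 0x{location:08X}"
    if storageclass in (3, 5):
        # Offset from the frame pointer; treated here as signed (assumption).
        offset = location - (1 << 32) if location & 0x80000000 else location
        return f"frame pointer offset {offset}"
    if storageclass in (6, 7):
        # Class 7 is handled like class 6 here (assumption).
        return f"argument list offset {location}"
    # Storage class 4: floating point registers are numbered 16..23.
    if 16 <= location <= 23:
        return f"floating point register {location - 16}"
    return f"register {location}"
```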

The sourcepos field is used by the debugger to distinguish between different definitions having the same name (e.g. identically named variables in disjoint source-level naming scopes such as nested blocks in C).

Type Items

A type item is used to describe a named type in the source language (e.g. a typedef in C). After its code and length field, a type item contains two word-sized fields:

type a type word (see Representation of Data Types)
name string
(the first byte of string is the string's length, followed by a non-NUL-terminated string of characters with NUL padding up to the next word boundary)

Struct Items

A struct item is used to describe a structured data type (e.g. a struct in C or a record in Pascal). After its code and length field, a struct item contains the following word-sized fields:

fields the number of fields in the structure
size total byte size of the structure
fieldtable... an array of 'fields' struct field items

Each struct field item has the following word-sized fields:

offset byte offset of this field within the structure
type a type word (see Representation of Data Types)
name string
(the first byte of string is the string's length, followed by a non-NUL-terminated string of characters with NUL padding up to the next word boundary)

Union types are described by struct items in which all fields have 0 offsets.

C bit fields are not treated in full detail: a bit field is simply represented by an integer starting on the appropriate word boundary (so that the word contains the whole field).
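
A reader of struct items might walk the field table along the following lines. The Python sketch below is an assumption-laden illustration: parse_struct_body is a hypothetical name, the input is the item body after its code and length field, little-endian byte sex is assumed by default, and each name is assumed to be padded only as far as the next word boundary.

```python
import struct

def parse_struct_body(body, little_endian=True):
    """Return (size, fields, looks_like_union) for a struct item body."""
    order = "<" if little_endian else ">"
    nfields, size = struct.unpack_from(order + "2I", body, 0)
    pos = 8
    fields = []
    for _ in range(nfields):
        offset, type_word = struct.unpack_from(order + "2I", body, pos)
        pos += 8
        length = body[pos]                          # length byte of the name
        name = body[pos + 1:pos + 1 + length].decode("ascii")
        pos = (pos + 1 + length + 3) & ~3           # skip NUL padding
        fields.append((offset, type_word, name))
    # Unions are struct items in which every field has offset 0.
    looks_like_union = nfields > 0 and all(f[0] == 0 for f in fields)
    return size, fields, looks_like_union
```

Note that a structure with a single field also has all offsets equal to 0, so this test alone cannot tell such a structure from a union.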

Array Items

An array item is used to describe a one-dimensional array. Multi-dimensional arrays are described as 'arrays of arrays'. Which dimension comes first is dependent on the source language (which is different for C and FORTRAN). After its code and length field, an array item contains the following word-sized fields:

size total byte size of the array
flags see below
basetype a type word (see Representation of Data Types)
lowerbound constant value or location of variable
upperbound constant value or location of variable

If the size field is zero, debugger operations affecting the whole array, rather than individual elements of it, are forbidden.

The following mask values are defined for the flags field:

ARRAY_UNDEF_LBOUND 1 lower bound is undefined
ARRAY_CONST_LBOUND 2 lower bound is a constant
ARRAY_UNDEF_UBOUND 4 upper bound is undefined
ARRAY_CONST_UBOUND 8 upper bound is a constant
ARRAY_VAR_LBOUND 16 lower bound is a variable
ARRAY_VAR_UBOUND 32 upper bound is a variable

A bound is described as undefined when no information about it is available.

A bound is described as constant when its value is known at compile time. In this case, the corresponding bound field gives its value.

If a bound is described as variable, the corresponding bound field identifies a variable debug item describing the location containing the bound. In a debug area in an object file, the bound field contains the offset from the start of the debug area to the variable item; in memory it contains a pointer to the corresponding variable item. Note that a variable item may be used to describe a location known to the compiler, which need not correspond to a source language variable.
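
The flags masks can be tested as in the following Python sketch. The constant names and values are those listed above; the function classify_bounds and the fallback string "unspecified" (for a flags word with no bit set for a bound) are hypothetical additions for illustration.

```python
ARRAY_UNDEF_LBOUND = 1
ARRAY_CONST_LBOUND = 2
ARRAY_UNDEF_UBOUND = 4
ARRAY_CONST_UBOUND = 8
ARRAY_VAR_LBOUND = 16
ARRAY_VAR_UBOUND = 32

def classify_bounds(flags):
    """Return (lower, upper), each 'undefined', 'constant' or 'variable'."""
    def kind(undef, const, var):
        if flags & undef:
            return "undefined"
        if flags & const:
            return "constant"      # the bound field holds the value itself
        if flags & var:
            return "variable"      # the bound field refers to a variable item
        return "unspecified"       # no bit set: not covered by the text
    lower = kind(ARRAY_UNDEF_LBOUND, ARRAY_CONST_LBOUND, ARRAY_VAR_LBOUND)
    upper = kind(ARRAY_UNDEF_UBOUND, ARRAY_CONST_UBOUND, ARRAY_VAR_UBOUND)
    return lower, upper
```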

Subrange Items

A subrange item is used to describe a subrange type in Pascal. It also serves to describe enumerated types in C, and scalars in Pascal (in which case the base type is understood to be an unsigned integer of appropriate size). After its code and length field, a subrange item contains the following word-sized fields:

sizeandtype see below
lb low bound of subrange
hb high bound of subrange

The sizeandtype field encodes the byte size of the container for the subrange (1, 2 or 4) in its least significant 16 bits, and a simple type code (see Representation of Data Types) in its most significant 16 bits. The type code refers to the base type of the subrange.

For example, a subrange 256..511 of unsigned short might be held in 1 byte.
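
The packing of the sizeandtype word can be sketched in Python as follows. The function names are hypothetical, and the type code used in the usage line is an arbitrary placeholder rather than a real code from Representation of Data Types.

```python
def split_sizeandtype(word):
    """Return (container size in bytes, simple type code)."""
    size = word & 0xFFFF               # least significant 16 bits
    typecode = (word >> 16) & 0xFFFF   # most significant 16 bits
    return size, typecode

def make_sizeandtype(size, typecode):
    """Build a sizeandtype word; the container size must be 1, 2 or 4."""
    if size not in (1, 2, 4):
        raise ValueError("container size must be 1, 2 or 4 bytes")
    if not 0 <= typecode <= 0xFFFF:
        raise ValueError("type code must fit in 16 bits")
    return (typecode << 16) | size

# Placeholder type code 0x000B with a 1-byte container.
word = make_sizeandtype(1, 0x000B)
```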

Set Items

A set item is used to describe a Pascal set type. Currently, the description is only partial. After its code and length field, a set item consists of a single word:

size byte size of the object

Enumeration Items

An enumeration item describes a Pascal or C enumerated type. After its code and length word, the description of a 'contiguous enumeration' contains the following word-sized fields:

type a type word describing the type of the container for the enumeration (see Representation of Data Types)
count the cardinality of the enumeration
base the first (lowest) value (may be negative)
nametable a character array containing 'count' names
(see Text Names in Items)
(the first byte of name is the name's length, followed by a non-NUL-terminated string of characters with NUL padding up to the next word boundary)

The description of a discontiguous enumeration (such as the C enumeration enum bits {bit0=1, bit1=2, bit2=4, bit3=8, bit4=16}) contains the following fields after its code and length word:

type as above
count as above
nametable a table of count (value, name) pairs

Each nametable entry has the following format (which is variable in length):

val a word describing the enumerated value (1/2/4/8/16 in the example)
name the name of the enumerated element (may be several words long)
(the first byte of name is the name's length, followed by a non-NUL-terminated string of characters with NUL padding up to the next word boundary)

Function Declaration Items

After its code and length word, a function declaration item contains the following fields:

type a type word (see Representation of Data Types) describing the return type of the function or procedure
argcount the number of arguments to the function
args a sequence of argcount argument description items

Each argument description item contains the following:

type a type word (see Representation of Data Types) describing the type of the argument
name the name of the argument (may be several words)
(the first byte of name is the name's length, followed by a non-NUL-terminated string of characters with NUL padding up to the next word boundary)

An argument descriptor need not be named; in this case the length of the name is zero, and the name field is a single zero word.

Begin and End Naming Scope Items

These debug items are used to mark the beginning and end of a naming scope. They must be properly nested in the debug area.

In each case, after the code and length word, there is one word-sized field:

codeaddress address of the start/end of scope (determined by the code word)

Fileinfo Items

A fileinfo item appears once per section, after all other debugging data items. If the fileinfo item is too large for its length to be encoded in 16 bits, its length field must be written as 0 (since this is the last item in a section and the section header contains the length of the whole section, the length field is strictly redundant).

Each source file is described by a sequence of fragments. Each fragment describes a contiguous region of the file, within which the addresses of compiled code increase monotonically with source file position. The order in which fragments appear in the sequence is not necessarily related to the source file positions to which they refer.

Note that for compilations which make no use of the #include facility, the list of fragments may have only one entry, and all line-number information can be contiguous.

After its code and length word, the fileinfo item is a sequence of file entry items with the following format:

len length of this entry in bytes (including the length of the following fragments)
date date and time when the file was last modified (may be 0, indicating not available, or unused)
filename string (or "" if the name is not known)
(the first byte of string is the string's length, followed by a non-NUL-terminated string of characters with NUL padding up to the next word boundary)
fragment data see below

If present, the date field contains the number of seconds since the beginning of 1970 (the Unix date origin).

Following the final file entry item is a single 0 word marking the end of the sequence.

The fragment data is a word giving the number of following fragments followed by a sequence of fragment items:

n number of fragments following
fragments... n fragment items

Each fragment item consists of 5 words, followed by a sequence of byte pairs and half word pairs, formatted as follows:

size length of this fragment in bytes (including length of following lineinfo items)
firstline linenumber
lastline linenumber
codestart pointer to the start of the fragment's executable code
codesize byte size of the code in the fragment
lineinfo... a variable number of bytes matching line numbers to code addresses
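
Splitting one fragment item into its header and its lineinfo bytes might look like the Python sketch below. It rests on assumptions not stated in the text: that the size field counts the five header words as well as the lineinfo bytes, that the next fragment starts exactly size bytes after this one, and that the data is little-endian unless told otherwise. The name split_fragment is hypothetical.

```python
import struct

FRAGMENT_HEADER = ("size", "firstline", "lastline", "codestart", "codesize")

def split_fragment(buf, offset=0, little_endian=True):
    """Return (header dict, lineinfo bytes, offset of the next fragment)."""
    fmt = ("<" if little_endian else ">") + "5I"
    values = struct.unpack_from(fmt, buf, offset)
    header = dict(zip(FRAGMENT_HEADER, values))
    header_bytes = 5 * 4
    if header["size"] < header_bytes:
        raise ValueError("fragment size smaller than its own header")
    end = offset + header["size"]            # assumed to include the header
    lineinfo = bytes(buf[offset + header_bytes:end])
    return header, lineinfo, end
```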

Each lineinfo item describes a source statement and consists of a pair of (unsigned) bytes, possibly followed by two or three (unsigned) half words (each half word has the byte ordering appropriate to the target memory system's endian-ness or byte sex).

The short form (pair of bytes) lineinfo item is as follows:

codeinc # bytes of code generated by this statement
lineinc # source space occupied by this statement

lineinc describes how to calculate the source position (line, column) of the next statement from the source position of this one:

If 0 <= lineinc < 64, the new position is (line + lineinc, 1).
If lineinc >= 64, the new position is (line, column + lineinc - 64).

The number of bytes of code generated for a statement may be zero, provided the line increment is non-zero (such an item may describe a block end or block start, for example).

It is not possible to describe a statement which generates no code and no line number increment, as that encoding is used as an escape to the long form lineinfo items described below.

If codeinc is greater than 255, or lineinc is required to describe a line number change greater than 63 or a column change greater than 191, then both bytes are written to describe 0 increments, and the real values are given in the following two or three (unsigned) half words. (Note that there are two ways to describe 0 increments: 0 lines and 0 columns, which serves to discriminate between the two half word and three half word forms). If the starting column for the next statement is 1, the two half word form is used, which in effect is a triple of half words as follows:

zero 2 zero bytes
lineinc # source lines occupied by this statement
codeinc # bytes of code generated by this statement

Note that the order of the lineinc and codeinc half words is the reverse of the corresponding bytes.

If the starting column for the next statement is not 1, the three half word form is used, which in effect is a quadruple of half words, as follows:

  codeinc = 0, lineinc = 64
lineinc # source lines occupied by this statement
codeinc # bytes of code generated by this statement
newcol starting column for the next statement

Note as above that the order of the lineinc and codeinc half words is the reverse of the corresponding bytes. Note also that the column item here is the absolute column number for the next statement, and not an increment as in the two byte form.
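
Putting the short and long forms together, a decoder for the lineinfo bytes of one fragment might be sketched in Python as follows. This is an illustration under stated assumptions, not the algorithm used by DDT: the name decode_lineinfo is hypothetical, the first statement is assumed to start at (firstline, 1) and at address codestart, the data is assumed to be in the current format rather than version 2, and decoding simply stops if an escape pair is not followed by enough half words (for example, if the data ends in zero padding).

```python
import struct

def decode_lineinfo(data, firstline, codestart, little_endian=True):
    """Return a list of (address, line, column), one entry per statement."""
    half = "<H" if little_endian else ">H"
    addr, line, col = codestart, firstline, 1    # assumed starting position
    pos = 0
    statements = []
    while pos + 2 <= len(data):
        codeinc, lineinc = data[pos], data[pos + 1]
        pos += 2
        if codeinc == 0 and lineinc in (0, 64):
            # Escape to the long form: 2 half words (0) or 3 half words (64).
            count = 2 if lineinc == 0 else 3
            if pos + 2 * count > len(data):
                break                            # truncated data or padding
            halves = [struct.unpack_from(half, data, pos + 2 * i)[0]
                      for i in range(count)]
            pos += 2 * count
            statements.append((addr, line, col))
            line += halves[0]                    # lineinc comes first here
            addr += halves[1]                    # then codeinc
            col = halves[2] if count == 3 else 1 # newcol is absolute
        else:
            statements.append((addr, line, col))
            addr += codeinc
            if lineinc < 64:
                line, col = line + lineinc, 1    # move to a later line
            else:
                col += lineinc - 64              # move along the same line
    return statements
```

Each returned tuple gives the code address and source position at which a statement starts, which is the mapping a debugger needs when setting a breakpoint on a source line.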

(This encoding of lineinfo items is an incompatible change from the previous format (version 2): in that format, lineinc in a two byte lineinfo item always describes a line increment, and accordingly, there is no three half word form. Programs interpreting ASD tables should interpret lineinfo items differently according to the table format in the section item.)

This edition Copyright © 3QD Developments Ltd 2015
Last Edit: Tue,03 Nov 2015