SHELF Loading
SHELF Loading is a new type of ELF binary reflective loading that my colleague @Anonymous_ and I first documented on April 21st 2021.
This new ELF reflective loading
methodology enables the capability to generate compiler-based artifacts with properties that resemble those of shellcode
.
These compiler-based artifacts are ultimately a Hybrid ELF
file between conventional static and PIE binaries.
Had the pleasure to publish this research at Tmp0ut, a Linux VX zine. This was Volume #1, and its first release and their official site can be found at tmpout.sh.
Make sure to check other of the great articles published on this first volume if ELF mangling/golfing/fuckery is for you :)
Introduction⌗
Over the last several years there have been several enhancements in Linux offensive tooling in terms of sophistication and complexity. Linux malware has become increasingly more popular, given the higher number of public reports documenting Linux threats. These include government-backed Linux implants such as APT28’s VPNFilter, Drovorub or Winnti wide range of Linux Malware.
However, this increase in popularity does not seem to have had much of an impact in the totality of the sophistication of the current Linux threat landscape just yet. It’s a fairly young ecosystem, where cybercriminals have not been able to identify reliable targets for monetization apart from Cryptocurrency Mining, DDoS, and more recently, Ransomware operations.
In today’s Linux threat landscape, even the smallest refinement or introduction of complexity often results in AV evasion, and therefore Linux malware authors do not tend to invest unnecessary resources to sophisticate their implants. There are various reasons why this phenomenon occurs, and it is subject to ambiguity. The Linux ecosystem, in contrast to other popular spheres such as Windows and MacOS, is more dynamic and diverse, stemming from the many flavors of ELF files for different architectures, the fact that ELF binaries can be valid in many different forms, and that the visibility of Linux threats is often quite poor.
Due to these issues, AV vendors face a completely different set of challenges detecting these threats. Often times this disproportionate detection failure of simple/unsophisticated threats leaves an implicit impression that Linux malware is by nature not complex. This statement couldn’t be further from the truth, and those familiar with the ELF file format know that there is quite a lot of room for innovation with ELF files that other file formats are not able to provide due to their lack of flexibility, even if we have not seen it abused as much over the years.
In this article we are going to discuss a technique that achieves an uncommon functionality of file formats, which generically converts full executables to shellcode in a way that demonstrates, yet again, another example that ELF binaries can be manipulated to achieve offensive innovation that is hard or impossible to replicate in other file formats.
A Primer On ELF Reflective Loading⌗
In order to understand the technique, we must first give a contextual background on previously known ELF techniques upon which this one is based, with a comparison of the benefits and tradeoffs.
Most ELF packers, or any application implementing any form of ELF binary
loading, are primarily based on what’s known as User-Land-Exec
.
User-Land-Exec is a method first documented by @thegrugq, in which an ELF binary can be loaded without using any of the execve family of system calls, and hence its name.
For the sake of simplicity, the steps to implement an ordinary User-Land-Exec
with support of ET_EXEC
and ET_DYN
ELF binaries is illustrated in the following
diagram, showcasing an implementation of the UPX packer for ELF binaries:
As we can observe, this technique has the following requirements (by @thegrugq):
- Clean out the address space
- If the binary is dynamically linked, load the dynamic linker.
- Load the binary.
- Initialize the stack.
- Determine the entry point (i.e. the dynamic linker or the main executable).
- Transfer execution to the entry point.
On a more technical level, we come up with the following requirements:
- Setup the stack of the embedded executable with its correspondent Auxiliary Vector.
- Parse PHDR’s and identify if there is a PT_INTERP segment, denoting that the file is a dynamically linked executable.
- LOAD interpreter if PT_INTERP is present.
- LOAD target embedded executable.
- Pivot to mapped e_entry of target executable or interpreter accordingly, depending if the target executable is a dynamically linked binary.
For a more in-depth explanation, we suggest reading @thegrugq’s comprehensive paper on the matter [9].
One of the capabilities of conventional User-Land-Exec
are the evasion of an
execve footprint as previously mentioned, in contrast with other techniques
such as memfd_create/execveat
, which are also widely used to load end execute
a target ELF file. Since the loader maps and loads the target executable, the
embedded executable has the flexibility of having a non-conventional structure.
This has the side benefit of being useful for evasion and anti-forensics
purposes.
On the other hand, since there are a lot of critical artifacts involved in the
loading process, it can be easy to recognize by reverse-engineers, as well as
being somewhat fragile due to the fact that the technique is heavily dependent
on these components. For this reason, writing User-Land-Exec
based loaders have
been somewhat tedious. As more features get added to the ELF file format, this
technique has been inclined to mature over time and thereby increasing its
complexity.
The new technique that we will be covering in this paper relies on implementing
a generic User-Land-Exec
loader with a reduced set of constraints supporting a
hybrid PIE
and statically linked ELF binaries that to our knowledge have yet to
be reported.
We believe this technique represents a drastic improvement of previous versions
of User-Land-Exec loaders, since based on the lack of technical implementation
constraints and the nature of this new hybrid static/PIE ELF flavor, the extent
of capabilities it can provide is wider and more evasive than with previous
User-Land-Exec
variants.
Internals Of Static PIE Executable Generation⌗
Background⌗
In July of 2017 H. J. Lu patched a bug entry in GCC bugzilla named ‘Support
creating static PIE’. This patch mentioned the implementation of a statically
based PIE in his branch at glibc hjl/pie/static
, in which Lu documented that by
supplying –static
and –pie
flags to the linker along with PIE
versions of crt*.o
as input, static PIE ELF executables could be generated. It is important to
note, that at the time of this patch, generation of fully statically linked PIE
binaries was not possible.[1]
In August, Lu submitted a second patch[2] to the GCC driver, for adding the
–static
flag to support static PIE
files that he was able to demonstrate in his
previous patch. The patch was accepted in trunk[3], and this feature was
released in GCC v8.
Moreover, in December of 2017 a commit was made in glibc[4] adding the option
–enable-static-pie
. This patch made it possible to embed the needed parts of
ld.so to produce standalone static PIE executables.
The major change in glibc to allow static PIE was the addition of the function
_dl_relocate_static_pie
which gets called by __libc_start_main
. This function is
used to locate the run-time load address, read the dynamic segment, and perform
dynamic relocations before initialization, then transfer control flow of
execution to the subject application.
In order to know which flags and compilation/linking stages were needed in order
to generate static PIE
executables, we passed the flag –static-pie –v
to GCC.
However, we soon realized by doing this that the linker generated a plethora of
flags and calls to internal wrappers. As an example, the linking phase is
handled by the tool /usr/lib/gcc/x86_64-linux-gnu/9/collect2
and GCC itself is
wrapped by /usr/lib/gcc/x86_64-linux-gnu/9/cc1
. Nevertheless, we managed to
remove the irrelevant flags and we ended up with the following steps:
These steps are in fact the same provided by Lu, supplying the linker with input
files compiled with –fpie
, and –static, -pie, -z text, --no-dynamic-linker
.
In particular, the most relevant artifacts for static PIE creation are rcrt1.o
,
libc.a
, and our own supplied input file, test.o
. The rcrt1.o
object contains the
_start
code which has the code required to correctly load the application before
executing its entry point by calling the correspondent libc startup code
contained in __libc_start_main
:
As previously mentioned, __libc_start_main
will call the new added function
_dl_relocate_static_pie
(defined at elf/dl-reloc-static-pie.c
file of glibc
source). The primary steps performed by this function are commented in the
source:
With the help of these features, GCC is capable of generating static executables which can be loaded at any arbitrary address.
We can observe that _dl_relocate_static_pie
will handle the needed dynamic
relocations. One noticeable difference of rcrt1.o
from conventional crt1.o
is
that all contained code is position independent. Inspecting what the generated
binaries look like we see the following:
At first glance they seem to be common dynamically linked PIE executables, based
on the ET_DYN
executable type retrieved from the ELF header. However, upon
closer examination of the segments, we will observe the nonexistent PT_INTERP
segment usually denoting the path to the interpreter in dynamically linked
executables and the existence of a PT_TLS
segment, usually contained only in
statically linked executables.
If we check what the dynamic linker identifies the subject executable as, we will see it identifies the file type correctly:
In order to load this file, all we would need to do is map all the PT_LOAD
segments to memory, set up the process stack with the correspondent Auxiliary Vector
entries, and then pivot to the mapped executable’s entry point. We do
not need to be concerned about mapping the RTLD
since we don’t have any external
dependencies or link time address restrictions.
As we can observe, we have four loadable segments commonly seen in SCOP ELF binaries
. However, for the sake of easier deployment, it will be crucial if we
could merge all those segments into one as is usually done with ELF disk
injection into a foreign executable. We can do just this by using the –N
linker
flag to merge data and text within a single segment.
Non-compatibility of GCC’s -N and static-pie flags⌗
If we pass –static-pie
and –N
flags together to GCC we see that it generates the
following executable:
The first thing we noticed about the type of generated ELF when using
–static-pie
alone was that it had a type of ET_DYN
, and now together with –N
it
results in an ET_EXEC
.
In addition, if we take a closer look at the segment’s virtual addresses, we see that the generated binary is not a Position Independent Executable. This is due to the fact that the virtual addresses appear to be absolute addresses and not relative ones. To understand why our program is not being linked as expected, we inspected the linker script that was being used.
As we are using the ld linker from binutils, we took a look on how ld selected
the linker script; this is done in the ld/ldmain.c
code at line 345:
The ldfile_open_default_command_file
is in fact an indirect call to an
architecture independent function generated at compile time that contains a set
of internal linker scripts to be selected depending upon the flags passed to ld.
Because we are using the x86_64 architecture, the generated source will be
ld/elf_x86_64.c
, and the function which is called to select the script is
gldelf_x86_64_get_script
, which is simply a set of if-else-if statements to
select the internal linker script. The –N
option sets the config.text_read_only
variable to false, which forces the selection function to use an internal script
which does not produce PIC as can be seen below:
This way of selecting the default script makes the –static-pie
and –N
flags
non-compatible, because the forced test of selecting the script based on –N is
parsed before –static-pie
.
Circumvention via custom Linker Script⌗
The incompatibility between –N
, -static
, and –pie
flags led us to a dead end,
and we were forced to think of different ways to overcome this barrier. What we
attempted was to provide a custom script to drive the linker. As we essentially
needed to merge the behavior of two separate linker scripts, our approach was to
choose one of the scripts and adapt it to generate the desired outcome with
features of the remaining script.
We chose the default script of –static-pie
over the one used with –N
because in
our case it was easier to modify as opposed to changing the –N default script to
support PIE generation.
To accomplish this goal, we would need to change the definition of the segments,
which are controlled by the PHDRS
[5] field in the linker script. If the command
is not used the linker will provide program headers generated by default –
However, if we neglect this in the linker script, the linker will not create any
additional program headers and will strictly follow the guidelines defined in
the subject linker script.
Taking into account the details discussed above, we added a PHDRS command to the
default linker script, starting with all the original segments which are created
by default when using –static-pie
:
After this we need to know how each section maps to each segment – and for this we can use readelf as shown below:
With knowledge of the mappings, we just needed to change the section output definition in the linker script which adds the appropriate segment name at the end of each function definition, as shown in the following example:
Here, the .tdata and .tbss sections are being assigned to the segments that get mapped in the same order that we saw in the output of the readelf –l command. Eventually, we ended up having a working script precisely changing all mapped sections which were mapped in data to the text segment:
If we compile our subject test file with this linker script, we see the following generated executable:
We now have a static-pie with just one loadable segment. The same approach can be repeated to remove other irrelevant segments, keeping only critical segments necessary for the execution of the binary. As an example, the following is a static-pie executable instance with minimal program headers needed to run:
The following is the final output of our desired ELF structure – having only one PT_LOAD segment generated by a linker script with the PHDRS command configured as in the screenshot below:
SHELF Loading⌗
This generated ELF flavor gives us some interesting capabilities that other ELF types are not able to provide. For the sake of simplicity, we have labelled this type of ELF binary as SHELF, and we will be referencing it throughout the rest of this paper. The following is an updated diagram of the loading stages needed for SHELF loading:
As we can see in the diagram above, the process of loading SHELF files is highly reduced in complexity compared to conventional ELF loading schemes.
To illustrate the reduced set of constraints to load these types of files, a snippet of a minimalistic SHELF User-Land-Exec approach is as follows:
By using this approach, a subject SHELF file would look as follows in memory and on disk:
As we can observe, the ELF header and Program Headers are missing from the process image. This is a feature that this flavor of ELF enables us to implement and is discussed in the following section.
Anti-Forensic Capabilities⌗
This new approach to User-Land-Exec
has also two optional stages useful for
anti-forensic purposes. Since the dl_relocate_static_pie
function will obtain
all of the required fields for relocation from the Auxiliary Vector, this leaves
us room to play with how the subject SHELF
file structure may look in memory and
on disk.
Removing the ELF header will directly impact reconstruction capabilities, because most Linux-based scanners will scan process memory for existing ELF images by first identifying ELF headers. The ELF header will be parsed and will contain further information on where to locate the Program Header Table and consequently the rest of the mapped artifacts of the file.
Removal of the ELF header is trivial since this artifact is not really needed by
the loader – all required information in the subject file will be retrieved from
the Auxiliary Vector
as previously mentioned.
An additional artifact that can be hidden is the Program Header Table. This is a
slightly different case when compared with the ELF Header. The Auxiliary Vector
needs to locate the Program Header Table in order for the RTLD
to successfully
load the file by applying the needed runtime relocations. Regardless, there are
many approaches to obfuscating the PHT
. The simplest approach is to remove the
original Program Header Table
location, and relocate it somewhere in the file
that is only known by the Auxiliary Vector
.
We can precompute the location of each of the Auxiliary Vector
entries and
define each entry as a macro in an include file, tailoring our loader to every
subject SHELF file at compile-time. The following is an example of how these
macros can be generated:
As we can observe, we have parsed the subject SHELF
file for its e_entry and
e_phnum fields, creating corresponding macros to hold those values. We also have
to choose a random base image to load the file. Finally, we locate the PHT and
convert it to an array, then remove it from its original location. Applying
these modifications allows us to completely remove the ELF header and change the
default location of the subject SHELF
file PHT both on disk and in memory(!)
Without successful retrieval of the Program Header Table
, reconstruction
capabilities may be strictly limited and further heuristics will have to be
applied for successful process image reconstruction.
An additional approach to make the reconstruction of the Program Header Table much harder is by instrumenting the way glibc implements the resolution of the Auxiliary Vector fields.
Obscuring SHELF features by PT_TLS patching⌗
Even after modifying the default location of the Program Header Table
by
choosing a new arbitrary location when crafting the Auxiliary Vector
, the
Program Header Table would still reside in memory and could be found with some
effort. To obscure ourselves even further we can cover how the startup code
reads the Auxiliary Vector
fields.
The code that does this is in elf/dl_support.c
in the function _dl_aux_init
. In
abstract, the code iterates over all the auxv_t
entries, and each of these
entries initialize internal variables from glibc:
The only reason the Auxiliary Vector is required is to initialize internal _dl_*
variables. Knowing this, we can bypass the creation of the Auxiliary Vector
entirely and do the same job that _dl_aux_init
would do before passing control
of execution to the subject SHELF
file.
The only entries which are critical are AT_PHDR
, AT_PHNUM
, and AT_RANDOM
.
Therefore, we only need to patch the respective _dl_*
variables that depend on
these fields. As an example of how to retrieve these values, we can use the
following one-liner to generate an include file with precomputed macros holding
the offset to every dl_*
variable:
With the offset to these variables located, we only need to patch them in the
same way the original startup code would do so using the Auxiliary Vector
. As a
way to illustrate this technique, the following code will initialize the
addresses of the Program Headers
to new_address
, and the number of program
headers to the correct number:
At this point we have a working program without supplying the Auxiliary Vector
.
Because the subject binary is statically linked
, and the code that will load the
SHELF
file is our loader, we can neglect every other segment in the Auxiliary
Vector’s AT_PHDR
and AT_PHNUM
or dl_phdr
and dl_phnum
respectively. There is an
exception, which is the PT_TLS
segment which is the interface in which Thread
Local Storage is implemented in the ELF file format.
The following code which resides in csu/libc-tls.c
on function __libc_setup_tls
show the type of information that gets retrieved from the PT_TLS
segment:
In the code snippet above, we can see that TLS initialization relies on the
presence of the PT_TLS
segment. We have several approaches that can obfuscate
this artifact, such as patching the __libc_setup_tls
function to just return and
then initialize the TLS with our own code. Here, we’ll choose to implement a
quick patch to glibc instead as a PoC.
To avoid the need of the PT_TLS
Program Header we have added a global variable
to hold the values from PT_TLS
and set the values inside __libc_setup_tls
,
reading from our global variable instead of the subject SHELF
file Program
Header Table. With this small change we finally strip all the program headers:
Using the following script to generate _phdr.h
:
We can apply our patches in the following way after including _phdr.h
:
Applying the methodology shown above, we gain a high level of evasiveness by
loading and executing our SHELF
file without an ELF header, Program Header
Table, and Auxiliary Vector
– just as shellcode gets loaded. The following
diagram illustrates how straightforward the loading process of SHELF
files is:
Conclusion⌗
We have covered the internals of Reflective Loading of ELF files, explaining previous implementations of User-Land-Exec, along with its benefits and drawbacks. We then explained the latest patches in the GCC code base that implemented support for static-pie binaries, discussing our desired outcome, and the approaches we followed to achieve the generation of static-pie ELF files with one single PT_LOAD segment. Finally, we discussed the anti-forensic features that SHELF loading can provide, which we think to be a considerable enhancement when compared with previous versions of ELF Reflective Loading.
We think this could be the next generation of ELF Reflective Loading, and it may benefit readers to understand the extent of offensive capabilities that the ELF file format can provide. If you would like access to the source code, contact @sblip or @ulexec.
References⌗
- [1] (support static pie) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81498
- [2] (first patch gcc) https://gcc.gnu.org/ml/gcc-patches/2017-08/msg00638.html
- [3] (gcc patch) https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=252034
- [4] (glibc –enable-static-pie)
https://sourceware.org/git/?p=glibc.git;a=commit;
h=9d7a3741c9e59eba87fb3ca6b9f979befce07826 - [5] (ldscript doc) https://sourceware.org/binutils/docs/ld/PHDRS.html#PHDRS
- [6] https://sourceware.org/binutils/docs/ld/ Output-Section-Phdr.html#Output-Section-Phdr
- [7] https://www.akkadia.org/drepper/tls.pdf
- [8] (why ld doesn’t allow -static -pie -N)
https://sourceware.org/git
/gitweb.cgi?p=binutils-gdb.git;a=blob;f=ld/ldmain.c;
h=c4af10f4e9121949b1b66df6428e95e66ce3eed4;hb=HEAD#l345 - [9] (grugq ul_exec paper) https://grugq.github.io/docs/ul_exec.txt
- [10] (ELF UPX internals)
https://ulexec.github.io/ulexec.github.io/article
/2017/11/17/UnPacking_a_Linux_Tsunami_Sample.html