Do you have a mystery file? The Linux file command will quickly tell you what type of file it is. If it is a binary file, though, you can find out even more about it. file has a whole host of stablemates that will help you analyze it. We'll show you how to use some of these tools.
Identifying File Types
Files usually have characteristics that allow software packages to identify which type of file they are and what their data represents. It wouldn't make sense to try opening a PNG file in an MP3 music player, so it is both useful and pragmatic for a file to carry some form of ID.
This might be a few signature bytes at the very start of the file. This allows a file to be explicit about its format and content. Sometimes the file type is inferred from a distinctive aspect of the internal organization of the data itself, known as the file architecture.
Some operating systems, such as Windows, are guided entirely by a file's extension. Call it gullible or trusting, but Windows assumes that any file with the DOCX extension really is a DOCX word processing file. Linux is not like that, as you will soon see. It wants proof and looks inside the file to find it.
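The idea of looking inside a file rather than trusting its name can be sketched in a few lines of Python. This is only an illustration built on a tiny, hand-picked signature table. The names sniff_type, sniff_file and SIGNATURES are our own, and the real file command consults a far larger database of signatures.

```python
# A small, hand-picked table of well-known signatures (illustrative only).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"%PDF-", "PDF document"),
    (b"\x7fELF", "ELF binary"),
    (b"PK\x03\x04", "ZIP archive (container also used by ODT and DOCX)"),
]


def sniff_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring any file name."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first few bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_type(handle.read(16))
```

Because only the leading bytes are consulted, renaming a file has no effect on the result, which is the behavior the rest of this article demonstrates with file.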
The tools described here were already installed on the distributions we tested, including Manjaro 20 and Fedora 21.
Using the file Command
We have a collection of different file types in our current directory. They are a mix of document, source code, executable and text files.
The ls command will show us what's in the directory, and the -hl (human-readable sizes, long listing) option shows us the size of each file:
ls -hl
Let's try file on some of these and see what we get:
file COBOL_Report_Apr60.djvu
file build_instructions.odt
The three file formats are correctly identified. Where possible, file gives us some more information. The PDF file is reported to be in the version 1.5 format.
Even if we rename the ODT file to have an extension with the arbitrary value XYZ, the file is still correctly identified, both within the Files file browser and on the command line with file. Within the Files file browser, it is given the correct icon. On the command line, file ignores the extension and looks inside the file to determine its type:
file build_instructions.xyz
Using file on media, such as image and music files, usually yields information about their format, encoding, resolution, and so on:
file screenshot.png
file screenshot.jpg
file Pachelbel_Canon_In_D.mp3
Interestingly, even with plain text files, file doesn't judge the file by its extension. For example, if you have a file with the ".c" extension that contains ordinary plain text rather than source code, file doesn't mistake it for a genuine C source code file:
file function+headers.h
file makefile
file hello.c
file correctly identifies the header file (".h") as part of a C source code collection of files, and it knows that the makefile is a script.
Using file with binary files
Binary files are more of a "black box" than others. Image files can be viewed, sound files can be played and document files can be opened with the appropriate software package. Binary files are more of a challenge, however.
For example, the files "hello" and "wd" are binary executables. They are programs. The file named "wd.o" is an object file. When source code is compiled by a compiler, one or more object files are created. These contain the machine code that the computer will eventually execute when the finished program runs, along with information for the linker. The linker checks each object file for function calls into libraries. It links in any libraries the program uses. The result of this process is an executable file.
The "watch.exe" file is a binary executable that has been compiled to run on Windows:
file wd
file wd.o
file hello
file watch.exe
file tells us that the "watch.exe" file is a PE32+ executable console program for the x86 processor family on Microsoft Windows. PE stands for Portable Executable format, which has 32- and 64-bit versions. PE32 is the 32-bit version and PE32+ is the 64-bit version.
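The PE32 versus PE32+ distinction can be read straight from the file. The sketch below assumes the published PE layout: an "MZ" stub whose 32-bit field at offset 0x3C points to the "PE\0\0" signature, followed by a 20-byte COFF header and then the optional header, whose first 16-bit value is 0x10B for PE32 or 0x20B for PE32+. The function name pe_flavor is ours, and this is not how file itself is implemented.

```python
import struct


def pe_flavor(data: bytes) -> str:
    """Report whether a byte string looks like a PE32 or PE32+ image."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return "not a PE file"
    # The DOS stub stores the offset of the PE signature at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return "not a PE file"
    # Skip the 4-byte signature and the 20-byte COFF file header.
    optional_header = pe_offset + 24
    if len(data) < optional_header + 2:
        return "truncated"
    (magic,) = struct.unpack_from("<H", data, optional_header)
    return {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, "unknown")
```

Passing the contents of a Windows executable, for example open("watch.exe", "rb").read(), should report the same PE flavor that file prints.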
The other three files are all identified as Executable and Linkable Format (ELF) files. This is a standard for executable files and shared object files, such as libraries. We'll take a look at the ELF header format shortly.
What you may notice is that the two executables ("wd" and "hello") are identified as LSB shared objects, and the object file "wd.o" is identified as an LSB relocatable. In the output of file, LSB means that the least significant byte is stored first, in other words that the binary is little-endian. The word executable is conspicuous by its absence.
Object files are relocatable, which means the code inside them can be loaded into memory at any location. The executables are listed as shared objects because the linker has generated them from the object files in such a way that they inherit this capability.
This allows the Address Space Layout Randomization (ASLR) system to load the executables into memory at addresses of its choosing. Standard executables have a load address coded into their headers, which dictates where they are loaded into memory.
ASLR is a security technique. Loading executables into memory at predictable addresses makes them susceptible to attack, because their entry points and the locations of their functions will always be known to attackers. Position Independent Executables (PIE), which can be loaded at a randomized address, overcome this susceptibility.
If we compile our program with the gcc compiler and provide the -no-pie option, we'll generate a conventional executable.
With the -o (output file) option we can provide a name for our executable:
gcc -o hello -no-pie hello.c
We'll use file on the new executable and see what has changed:
file hello
The size of the executable is the same as before (17 KB):
ls -hl hello
The binary is now identified as a standard executable. We're doing this for demonstration purposes only. If you compile applications this way, you'll lose all the advantages of ASLR.
Why is an executable file so large?
Our example hello program is 17 KB, so it could hardly be called large, but then everything is relative. The source code is 120 bytes:
cat hello.c
What is bulking out the binary if all it does is print one string to the terminal window? We know there is an ELF header, but that is only 64 bytes long for a 64-bit binary. Plainly, it must be something else:
ls -hl hello
Let's scan the binary with the strings command as a simple first step to discover what's inside it. We'll pipe it into less:
strings hello | less
There are many strings inside the binary besides the "Hello, Geek world!" from our source code. Most of them are labels for regions within the binary, and the names and linking information of shared objects. These include the libraries, and the functions within those libraries, on which the binary depends.
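What strings does can be approximated in a few lines of Python: scan the raw bytes and keep every run of printable ASCII characters that is at least four characters long, which is the default minimum the strings command uses. The function name extract_strings is ours, and this sketch ignores the other encodings and options the real tool supports.

```python
import re


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return runs of printable ASCII of at least min_length characters."""
    # 0x20 to 0x7e covers the space character through the tilde.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Calling extract_strings(open("hello", "rb").read()) on our example binary would be expected to surface the same kind of labels and library names that strings hello shows.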
The ldd command shows us the shared object dependencies of a binary:
ldd hello
There are three entries in the output, and two of them include a directory path (the first does not):
- linux-vdso.so: Virtual Dynamic Shared Object (VDSO) is a kernel mechanism that allows a set of kernel space routines to be accessed by a user space binary. This avoids the overhead of a context switch from user mode to kernel mode. VDSO shared objects adhere to the Executable and Linkable Format (ELF), which allows them to be dynamically linked to the binary at runtime. The VDSO is allocated dynamically and takes advantage of ASLR. The VDSO capability is provided by the standard GNU C Library if the kernel supports the ASLR scheme.
- libc.so.6: The GNU C Library shared object.
- /lib64/ld-linux-x86-64.so.2: This is the dynamic linker the binary wants to use. The dynamic linker interrogates the binary to discover what dependencies it has. It loads those shared objects into memory. It prepares the binary to run and to be able to find and access the dependencies in memory. Then it launches the program.
The ELF header
We can examine and decode the ELF header using the readelf utility and the -h (file header) option:
readelf -h hello
The header is interpreted for us.
The first byte of all ELF binaries is set to hexadecimal value 0x7F. The next three bytes are set to 0x45, 0x4C and 0x46. The first byte is a flag that identifies the file as an ELF binary file. To make this clear, the following three bytes spell "ELF" in ASCII:
- Class: Indicates whether the binary is a 32- or 64-bit executable (1 = 32, 2 = 64).
- Data: Indicates the endianness in use. Endian encoding defines the way multibyte numbers are stored. In big-endian encoding, a number is stored with its most significant byte first. In little-endian encoding, the number is stored with its least significant byte first.
- Version: The version of ELF (currently it is 1).
- OS/ABI: Specifies the type of application binary interface in use. This defines the interface between two binary modules, such as a program and a shared library.
- ABI version: The version of the ABI.
- Type: The type of ELF binary. The common values are ET_REL for a relocatable resource (such as an object file), ET_EXEC for an executable compiled with the -no-pie option, and ET_DYN for an ASLR-aware executable.
- Machine: The Instruction Set Architecture. This indicates the target platform for which the binary file was created.
- Version: Always set to 1, for this version of ELF.
- Entry point address: The memory address within the binary at which execution begins.
The other entries are the size and number of regions and sections within the binary file so that their locations can be calculated.
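The fields listed above sit at fixed offsets, so they can be decoded by hand. The following Python sketch reads the identification bytes and the first few header fields, assuming the standard ELF layout (16 identification bytes, then a 16-bit type, a 16-bit machine, a 32-bit version, and an entry address that is 32 or 64 bits wide depending on the class). The function name parse_elf_header and the dictionary keys are our own choices, and readelf decodes far more than this.

```python
import struct

ELF_TYPES = {1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN"}


def parse_elf_header(data: bytes) -> dict:
    """Decode the leading fields of an ELF header from raw bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data, ei_version, ei_osabi, ei_abiversion = data[4:9]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unrecognized ELF class or data encoding")
    endian = "<" if ei_data == 1 else ">"      # 1 = little, 2 = big
    entry_format = "I" if ei_class == 1 else "Q"  # 32- or 64-bit address
    e_type, e_machine, e_version, e_entry = struct.unpack_from(
        endian + "HHI" + entry_format, data, 16
    )
    return {
        "class": "ELF32" if ei_class == 1 else "ELF64",
        "data": "little-endian" if ei_data == 1 else "big-endian",
        "ident_version": ei_version,
        "osabi": ei_osabi,
        "abi_version": ei_abiversion,
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "version": e_version,
        "entry": e_entry,
    }
```

Reading the first 64 bytes of a binary, for example open("hello", "rb").read(64), and passing them to parse_elf_header should give values that can be compared with the readelf -h output.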
A quick look at the first eight bytes of the binary with hexdump shows the signature byte and the "ELF" string in the first four bytes of the file. The -C (canonical) option gives us the ASCII representation of the bytes alongside their hexadecimal values, and the -n (number) option lets us specify how many bytes we want to see:
hexdump -C -n 8 hello
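The same kind of view can be produced with a few lines of Python. This is a rough sketch of a hex-plus-ASCII dump, not a reimplementation of hexdump: the function name hex_dump is ours, the column spacing only approximates the canonical layout, and non-printable bytes are shown as dots.

```python
def hex_dump(data: bytes, count: int = 8) -> str:
    """Format the first count bytes as offset, hex values and ASCII."""
    selected = data[:count]
    lines = []
    for offset in range(0, len(selected), 16):
        chunk = selected[offset:offset + 16]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        ascii_part = "".join(
            chr(byte) if 32 <= byte < 127 else "." for byte in chunk
        )
        # Pad the hex column to the width of sixteen bytes.
        lines.append(f"{offset:08x}  {hex_part:<47}  |{ascii_part}|")
    return "\n".join(lines)
```

For an ELF binary, print(hex_dump(open("hello", "rb").read(8))) shows 7f 45 4c 46 at the start of the line and ".ELF" at the start of the ASCII column.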
objdump and the Granular View
If you want to see the nitty-gritty detail, you can use the objdump command with the -d (disassemble) option:
objdump -d hello | less
This disassembles the executable machine code and displays it as hexadecimal bytes alongside the assembly language equivalent. The address of the first byte in each line is shown on the far left.
This is only useful if you can read assembly language, or if you are curious about what goes on behind the curtain. There is a lot of output, so we piped it into less.
Compile and Link
There are many ways to compile a binary. For example, the developer chooses whether to include debugging information. The way the binary is linked also plays a role in its content and size. If the binary references shared objects as external dependencies, it will be smaller than one into which the dependencies are statically linked.
Most developers already know the commands we've covered here. For others, however, they offer some easy ways to browse and see what's inside the binary black box.