File Signature Analysis and File Carving Techniques

Theory

File signatures are also referred to as magic numbers or, more loosely, as file headers.

Most file formats we encounter on our computers have a distinctive file signature. A signature is a short sequence of bytes, usually located at the beginning of a file (the header), that identifies the file type independently of the file name or extension.

For instance, when we open a ZIP file in an archive application (such as WinRAR or WinZip), the application checks for the presence of the ZIP signature (in hex: 50 4B 03 04) along with the file extension .zip. Below is a screenshot of a hex viewer application displaying the contents of a ZIP file, starting from the very beginning. As shown, the file begins with the ZIP local file header signature 50 4B 03 04 (the ASCII characters "PK" followed by the bytes 03 04), which is located at the start of an ordinary, non-empty ZIP file. An empty ZIP archive is an exception: it begins with 50 4B 05 06 instead.

[Screenshot: Header Corruption]
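The signature check described above for ZIP files can be sketched in a few lines of Python. This is a minimal illustration, not how any particular archive application is implemented; the function names (make_sample_zip, starts_with_zip_signature, describe_header) are hypothetical, and the sample archive is built in memory so the example does not depend on any file on disk. It requires Python 3.8 or later for bytes.hex with a separator.

```python
import io
import zipfile

# ZIP local file header signature: "PK" followed by 0x03 0x04.
ZIP_LOCAL_HEADER = bytes.fromhex("504B0304")


def make_sample_zip() -> bytes:
    """Build a small, non-empty ZIP archive in memory (illustrative helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


def starts_with_zip_signature(data: bytes) -> bool:
    """Return True if the buffer begins with the ZIP local file header."""
    return data[:4] == ZIP_LOCAL_HEADER


def describe_header(data: bytes, length: int = 4) -> str:
    """Render the first bytes the way a hex viewer would display them."""
    return data[:length].hex(" ").upper()


if __name__ == "__main__":
    archive = make_sample_zip()
    print(describe_header(archive), starts_with_zip_signature(archive))

    # Simulate header corruption by overwriting the first two bytes.
    corrupted = b"\x00\x00" + archive[2:]
    print(describe_header(corrupted), starts_with_zip_signature(corrupted))
```

A file whose first bytes have been overwritten fails this check even though the rest of the archive and its .zip extension are unchanged, which is the situation explored in the header corruption exercise.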

The length of a signature varies by file type, from as few as two bytes to a dozen or more. Extensive lists of known file signatures are publicly documented.

Using the file signature of a JPG image, for example, we can extract JPG images from sources such as traffic dumps, disk images, and other storage media. The same procedure can be applied to identify and extract other file types, provided the data is not encrypted.
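The extraction described above is commonly called file carving. The following sketch shows one simple way it might be done for JPG images, under two assumptions that are not stated in the text: that each image starts with FF D8 FF (the start-of-image marker followed by the first byte of the next marker) and ends with the end-of-image marker FF D9, and that each image is stored contiguously. The function name carve_jpegs is hypothetical.

```python
from typing import List

JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus next marker prefix
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(data: bytes) -> List[bytes]:
    """Return every header-to-footer byte range found in a raw buffer.

    Assumes contiguous, unencrypted data. Each candidate ends at the FIRST
    footer after its header, so images that embed thumbnails may be cut short.
    """
    carved = []
    pos = 0
    while True:
        start = data.find(JPEG_HEADER, pos)
        if start == -1:
            break
        end = data.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: truncated or fragmented image
        end += len(JPEG_FOOTER)
        carved.append(data[start:end])
        pos = end  # continue searching after the carved image
    return carved


if __name__ == "__main__":
    # Synthetic stand-ins for images; real input would be a disk image or dump.
    fake = b"\xff\xd8\xff\xe0" + b"AAAA" + b"\xff\xd9"
    blob = b"\x00" * 10 + fake + b"junk"
    for i, image in enumerate(carve_jpegs(blob)):
        print(i, len(image))
```

Carved candidates should still be validated (for example by opening them in an image viewer), since short signatures can match unrelated data by chance.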

File signatures vary across archive and compression formats such as ZIP, RAR, TAR, and 7z. For example, RAR archives (versions 1.5 through 4.x) begin with the signature 52 61 72 21 1A 07 00. POSIX TAR archives carry the signature 75 73 74 61 72 00 30 30 ("ustar", a null byte, and the version "00"); note that TAR is an uncompressed archive format and that its signature sits at offset 257 rather than at the very start of the file. Moreover, file signatures may change with the version of the file format: RAR 5 archives begin with 52 61 72 21 1A 07 01 00. Similarly named formats can also have entirely different signatures: a standard JPEG begins with FF D8, while JPEG 2000 uses 00 00 00 0C 6A 50 20 20 0D 0A 87 0A. Therefore, accurate knowledge of file signatures is crucial when recovering files from storage media or analyzing raw data in packet dumps, where file metadata may be missing or corrupted.
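Because signatures differ in length, version, and sometimes position, a lookup table needs to record an offset as well as the byte sequence. The sketch below is one possible way to organize this; the table layout and the function name identify are illustrative assumptions, and the table covers only a handful of formats. The RAR 5 entry and the TAR offset of 257 come from those formats' published layouts.

```python
from typing import Optional

# (label, offset of the signature within the file, signature bytes)
SIGNATURES = [
    ("RAR (v1.5 to v4.x)", 0, bytes.fromhex("526172211A0700")),
    ("RAR (v5+)", 0, bytes.fromhex("526172211A070100")),
    ("JPEG 2000 (JP2)", 0, bytes.fromhex("0000000C6A5020200D0A870A")),
    ("JPEG", 0, bytes.fromhex("FFD8")),
    # The ustar magic lives inside the first 512-byte TAR header block.
    ("TAR (POSIX ustar)", 257, bytes.fromhex("7573746172003030")),
]


def identify(data: bytes) -> Optional[str]:
    """Return the label of the first matching signature, or None."""
    for label, offset, magic in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


if __name__ == "__main__":
    rar4 = bytes.fromhex("526172211A0700") + b"\x00" * 8
    rar5 = bytes.fromhex("526172211A070100") + b"\x00" * 8
    print(identify(rar4))
    print(identify(rar5))
```

Only the first bytes of a file (here, at most 265) are needed, so in practice the function can be fed a small read from the start of each file rather than its full contents.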

We can also store known file signatures in a program (e.g., in a Python dictionary) and scan storage media with it to locate and extract known file types identified by their signatures.
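A dictionary-driven scan of this kind might look like the following sketch. The dictionary contents and the name scan_for_signatures are assumptions made for illustration; the JPG entry uses the three-byte prefix FF D8 FF rather than FF D8 alone to reduce chance matches, and the buffer is assumed to fit in memory.

```python
from typing import Dict, List, Tuple

# A small, deliberately incomplete signature dictionary.
KNOWN_SIGNATURES: Dict[str, bytes] = {
    "zip": bytes.fromhex("504B0304"),
    "rar": bytes.fromhex("526172211A0700"),
    "jpg": bytes.fromhex("FFD8FF"),
    "jp2": bytes.fromhex("0000000C6A5020200D0A870A"),
}


def scan_for_signatures(
    data: bytes, signatures: Dict[str, bytes] = KNOWN_SIGNATURES
) -> List[Tuple[int, str]]:
    """Return (offset, file type) for every signature occurrence in data."""
    hits = []
    for file_type, magic in signatures.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, file_type))
            pos = data.find(magic, pos + 1)
    return sorted(hits)  # order hits by their position in the buffer


if __name__ == "__main__":
    blob = (
        b"\x00" * 5
        + KNOWN_SIGNATURES["zip"] + b"data"
        + KNOWN_SIGNATURES["rar"] + b"xx"
    )
    for offset, file_type in scan_for_signatures(blob):
        print(f"{offset:#x}: possible {file_type} file")
```

The scan only reports candidate offsets. Extracting each file additionally requires knowing where it ends, for example from a footer signature or a length field in the format's header, and a large disk image would need to be read in overlapping chunks rather than loaded whole.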

File signatures are also used to detect files embedded within media cover files, for example an archive hidden inside an image.
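One common embedding trick is to append a second file after the end of a cover image. As a hedged sketch, the code below walks the chunks of a PNG file up to its final IEND chunk and returns whatever bytes follow it, which can then be checked against known signatures. The function name payload_after_png is hypothetical, and chunk CRCs are not verified.

```python
import struct

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def payload_after_png(data: bytes) -> bytes:
    """Return any bytes stored after the PNG IEND chunk (b"" if none)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4
        if chunk_type == b"IEND":
            return data[pos:]
    raise ValueError("IEND chunk not found")


if __name__ == "__main__":
    # Minimal synthetic PNG structure (not a viewable image; CRCs are dummies).
    ihdr = struct.pack(">I", 13) + b"IHDR" + b"\x00" * 13 + b"\x00" * 4
    iend = struct.pack(">I", 0) + b"IEND" + bytes.fromhex("AE426082")
    cover = PNG_SIGNATURE + ihdr + iend
    hidden = bytes.fromhex("504B0304") + b"secret"

    extra = payload_after_png(cover + hidden)
    if extra[:4] == bytes.fromhex("504B0304"):
        print("ZIP signature found after the PNG image data")
```

Image viewers generally stop reading at the IEND chunk, so the cover file still displays normally while the appended data remains recoverable by its signature.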

While file signatures can be used to extract files from memory dumps, a more formal and reliable approach in memory forensics involves constructing unique patterns based on the size and arrangement of data structure members of operating system objects. These patterns, derived from the internal layout of structures such as FILE_OBJECT, are then used to scan memory dumps, allowing for accurate identification and reconstruction of memory-mapped files.

This lab will introduce the concept of file signatures through exercises on file signature header corruption, and recovery of files from traffic dumps and container files such as images (PNG, JPG, GIF, etc.).