A Memory Dump Analyzer in Rust

By Luis Soares | October 29, 2024 | 10 min read | Original on Medium

Analyzing binary files and memory dumps is a common task in software development, especially in cybersecurity, reverse engineering, and low-level programming.

In this article, we will build a memory and hex dump analyzer in Rust that provides an interactive UI to view, navigate, and search through binary data.

By the end, you’ll have a tool that detects specific byte patterns and ASCII strings and displays them in an organized way. 🦀

1. Project Overview

Our Rust Dump Analyzer will allow us to:

Display a hex dump of binary files with addresses and ASCII string detections.
Detect common file patterns (e.g., PDF, JPEG) based on known byte headers.
Navigate through entries, view contextual byte data, and use search and jump-to-address functions.
Get an overview of key statistics (total entries, patterns found, and ASCII strings detected).

We’ll implement the tool with Rust’s crossterm and ratatui libraries to build an interactive command-line interface.

2. Setting Up the Project

Begin by creating a new Rust project:

cargo new rust-dump-analyzer
cd rust-dump-analyzer

Add the following dependencies to Cargo.toml:

[dependencies]
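crossterm = "0.28"  # the crossterm release that ratatui 0.29 builds against
memchr = "2.5"
ratatui = "0.29"  # for building the UI
rand = "0.8"  # only needed by the test-dump generator later in the article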

3. Implementing the Core Functionality

Our analyzer’s core functions will focus on:

Reading Binary Data: Loading the binary file’s contents into memory.
Detecting ASCII Strings and Patterns: Identifying readable text and known file signatures in the data.
Generating a Hex Dump: Displaying a formatted hex dump for easier analysis.

Let’s break down each of these components in detail.

Reading Binary Data

The first task is reading the binary data from a file. We’ll implement a function called read_dump_file that opens a file, reads its contents into a byte vector, and returns this data.

This function needs to:

Open the File: Use Rust’s File::open method to open the file.
Read to End: Use read_to_end to read the file's entire content into a byte vector.

Here’s the full implementation of the read_dump_file function:

use std::fs::File;
use std::io::{self, Read};

fn read_dump_file(filename: &str) -> io::Result<Vec<u8>> {
    let mut file = File::open(filename)?;
    let mut buffer = Vec::new();
    file.read_to_end(&mut buffer)?;
    Ok(buffer)
}

Error Handling: The ? operator is used to propagate errors, allowing the function to return an io::Result.
Buffer: The buffer is dynamically sized to accommodate the file’s contents, making it suitable for files of various sizes.

This function will be used to load binary files, providing raw data for further analysis in subsequent functions.
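For reference, the standard library also provides std::fs::read, which opens a file and reads it to the end in one call. The sketch below is an assumed alternative, not the article's implementation; the helper name read_dump_file_short and the temporary file name are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical shorter variant of read_dump_file built on std::fs::read.
fn read_dump_file_short(path: &Path) -> io::Result<Vec<u8>> {
    fs::read(path)
}

fn main() -> io::Result<()> {
    // Write a few bytes to a temporary file, then load them back.
    let path = std::env::temp_dir().join("dump_reader_example.bin");
    fs::write(&path, [0x25, 0x50, 0x44, 0x46])?; // the bytes of "%PDF"
    let data = read_dump_file_short(&path)?;
    println!("read {} bytes: {:02X?}", data.len(), data);
    fs::remove_file(&path)?;
    Ok(())
}
```

Either version loads the whole file into memory, which is fine for small dumps but worth keeping in mind for multi-gigabyte ones.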

Detecting ASCII Strings

In binary files, ASCII strings often represent readable text or meaningful data. We want to identify these strings and their positions.

Our find_ascii_strings function will:

Detect Printable Bytes: Iterate over the bytes and check whether each one is a printable ASCII character (a graphic character or a space).
Build Strings: Collect consecutive printable bytes into strings.
Minimum Length Filter: Only return strings that are at least a specified minimum length (e.g., 4 characters).

Here’s the complete implementation of find_ascii_strings:

fn find_ascii_strings(chunk: &[u8], chunk_offset: usize, min_length: usize) -> Vec<(String, usize)> {
    let mut result = Vec::new();
    let mut current_string = Vec::new();
    let mut start_index = 0;

    for (i, &byte) in chunk.iter().enumerate() {
        if byte.is_ascii_graphic() || byte == b' ' {
            if current_string.is_empty() {
                start_index = i;
            }
            current_string.push(byte);
        } else if current_string.len() >= min_length {
            result.push((
                String::from_utf8_lossy(&current_string).to_string(),
                chunk_offset + start_index,
            ));
            current_string.clear();
        } else {
            current_string.clear();
        }
    }
    if current_string.len() >= min_length {
        result.push((
            String::from_utf8_lossy(&current_string).to_string(),
            chunk_offset + start_index,
        ));
    }
    result
}

Iterating Over Bytes: We loop through each byte in chunk. The is_ascii_graphic method, combined with an explicit check for the space character, filters for printable characters.
String Building: We use current_string to collect contiguous printable bytes. When a non-printable byte is encountered, the accumulated bytes are stored if they meet the minimum length requirement and discarded otherwise.
Result: We return a vector of tuples, where each tuple contains an ASCII string and its starting position in the file.

This function will be used to detect readable text within binary data, which can often reveal metadata, file names, and other useful information.
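To make the behavior concrete, here is a small worked example. It is a sketch that uses a compact variant of the same idea (splitting on non-printable bytes) rather than the function above; the names is_printable and ascii_runs are hypothetical.

```rust
// Printable here means a graphic ASCII character or a space,
// the same check used in find_ascii_strings.
fn is_printable(byte: u8) -> bool {
    byte.is_ascii_graphic() || byte == b' '
}

// Hypothetical compact variant: split the data on non-printable bytes
// and keep the runs that reach min_length.
fn ascii_runs(data: &[u8], base_offset: usize, min_length: usize) -> Vec<(String, usize)> {
    let mut result = Vec::new();
    let mut offset = 0;
    for run in data.split(|&b| !is_printable(b)) {
        if run.len() >= min_length {
            result.push((String::from_utf8_lossy(run).into_owned(), base_offset + offset));
        }
        // Advance past this run and the separator byte that ended it.
        offset += run.len() + 1;
    }
    result
}

fn main() {
    // "ab" is shorter than the minimum length of 4, so it is dropped.
    let data = b"\x00\x01Hello\x00ab\xFFWorld!!";
    for (text, addr) in ascii_runs(data, 0x1000, 4) {
        println!("'{}' at 0x{:X}", text, addr);
    }
}
```

With a base offset of 0x1000, this reports 'Hello' at 0x1002 and 'World!!' at 0x100B.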

Detecting Known File Patterns

Many file formats have specific “magic numbers” — unique byte sequences at the beginning of the file. Detecting these patterns can help identify embedded files or known data structures within the binary dump.

Our detect_patterns function will:

Define Common Patterns: Accept a list of known byte patterns to search for, such as PDF, JPEG, ZIP, and PNG headers.
Search for Patterns: Use a slice-searching function to locate patterns within the binary data.
Store Results: Return a list of found patterns with their names and starting addresses.

Here’s the complete detect_patterns implementation:

use memchr::memmem;

#[derive(Debug, Clone)]
struct Pattern {
    name: &'static str,
    bytes: &'static [u8],
}
fn detect_patterns(chunk: &[u8], chunk_offset: usize, patterns: &[Pattern]) -> Vec<(String, usize)> {
    let mut results = Vec::new();
    for pattern in patterns {
        let mut start = 0;
        while let Some(pos) = memmem::find(&chunk[start..], pattern.bytes) {
            let actual_pos = chunk_offset + start + pos;
            results.push((pattern.name.to_string(), actual_pos));
            start += pos + 1;
        }
    }
    results
}

Pattern Struct: We define a Pattern struct to store the name and byte sequence for each known pattern.
Search Logic: For each pattern, we use memmem::find, a fast substring search, to locate occurrences of the pattern within the data.
Result Collection: Each time a pattern is found, its name and address are stored in the results vector.

With this function, our analyzer can identify embedded files and other known structures within the binary data.
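If you want to experiment without the memchr dependency, the same search can be sketched with the standard library alone. This version swaps memmem::find for a naive sliding-window comparison, which is slower on large inputs; the name detect_patterns_naive is hypothetical.

```rust
struct Pattern {
    name: &'static str,
    bytes: &'static [u8],
}

// Naive std-only search: compare every window of the pattern's length.
fn detect_patterns_naive(data: &[u8], base_offset: usize, patterns: &[Pattern]) -> Vec<(String, usize)> {
    let mut results = Vec::new();
    for pattern in patterns {
        if pattern.bytes.is_empty() {
            continue; // windows(0) would panic
        }
        for (pos, window) in data.windows(pattern.bytes.len()).enumerate() {
            if window == pattern.bytes {
                results.push((pattern.name.to_string(), base_offset + pos));
            }
        }
    }
    results
}

fn main() {
    let patterns = [
        Pattern { name: "PDF", bytes: b"%PDF" },
        Pattern { name: "ZIP", bytes: &[0x50, 0x4B, 0x03, 0x04] },
    ];
    // A PDF header at offset 0 and a ZIP local-file signature at offset 11.
    let data = b"%PDF-1.4\n\x00\x00PK\x03\x04\x00";
    for (name, addr) in detect_patterns_naive(data, 0, &patterns) {
        println!("Pattern '{}' found at 0x{:X}", name, addr);
    }
}
```

Results are grouped by pattern rather than sorted by address, which mirrors the loop order in detect_patterns.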

Generating a Hex Dump

A hex dump allows users to see the raw byte data in a readable format, often with corresponding ASCII characters. This is crucial for analyzing binary data because it presents both the raw hex values and readable characters side-by-side.

The hex_dump function will:

Print Addresses: Display the address offset for each row.
Format Hex Bytes: Show each byte in hexadecimal format.
Display ASCII Representation: Print ASCII characters next to each row to help identify readable text.

Here’s the complete hex_dump function:

fn hex_dump(chunk: &[u8], chunk_offset: usize, bytes_per_row: usize) {
    for (i, line) in chunk.chunks(bytes_per_row).enumerate() {
        // Print offset in hexadecimal
        print!("{:08X}  ", chunk_offset + i * bytes_per_row);

        // Print each byte in hexadecimal
        for byte in line {
            print!("{:02X} ", byte);
        }
        // Pad if row is incomplete
        if line.len() < bytes_per_row {
            print!("{:width$}", "", width = (bytes_per_row - line.len()) * 3);
        }
        // Print ASCII representation
        print!(" |");
        for &byte in line {
            if byte.is_ascii_graphic() || byte == b' ' {
                print!("{}", byte as char);
            } else {
                print!(".");
            }
        }
        println!("|");
    }
}

Address Display: Each row begins with the address offset, providing context for the displayed bytes.
Hexadecimal Bytes: Each byte in line is formatted as a two-digit hex value, separated by spaces.
ASCII Representation: After the hex bytes, we display their ASCII equivalents, using . for non-printable characters.

This function can be called to display chunks of data in hex format, enabling users to see the raw byte values alongside any readable text.
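Because hex_dump prints directly to stdout, it is awkward to reuse inside a UI or to check in a test. One possible refactoring, sketched here under that assumption, builds each row as a String instead. The helper name format_hex_row is hypothetical, and the sketch shows spaces as-is in the ASCII column.

```rust
// Hypothetical helper: format one hex-dump row as a String instead of printing it.
fn format_hex_row(offset: usize, row: &[u8], bytes_per_row: usize) -> String {
    let mut line = format!("{:08X}  ", offset);
    for byte in row {
        line.push_str(&format!("{:02X} ", byte));
    }
    // Pad short rows so the ASCII column stays aligned.
    for _ in row.len()..bytes_per_row {
        line.push_str("   ");
    }
    line.push_str(" |");
    for &byte in row {
        // Assumption: spaces are shown as-is; other non-graphic bytes become '.'.
        let shown = if byte.is_ascii_graphic() || byte == b' ' {
            byte as char
        } else {
            '.'
        };
        line.push(shown);
    }
    line.push('|');
    line
}

fn main() {
    let data = b"Rust dump\x00\x01";
    for (i, row) in data.chunks(8).enumerate() {
        println!("{}", format_hex_row(i * 8, row, 8));
    }
}
```

A caller can then decide whether to print the rows, collect them for a scrollable view, or compare them against expected output.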

Putting It All Together

With these core functions in place, our dump analyzer can read a binary file, detect ASCII strings and known patterns, and display data in a hex dump format. Here’s a summary of each function:

read_dump_file: Loads binary data from a file.
find_ascii_strings: Detects ASCII strings within binary data.
detect_patterns: Finds known byte sequences or file signatures.
hex_dump: Displays data in a formatted hex dump.

Next, we can use these functions within an interactive UI to create a powerful tool for analyzing memory dumps and binary files. Here’s an example of how they might be used together:

fn main() -> io::Result<()> {
    let filename = "test_dump.bin";
    let chunk_size = 1024;
    let min_string_length = 4;
    let patterns = vec![
        Pattern { name: "PDF", bytes: b"%PDF" },
        Pattern { name: "JPEG", bytes: &[0xFF, 0xD8, 0xFF, 0xE0] },
        Pattern { name: "ZIP", bytes: &[0x50, 0x4B, 0x03, 0x04] },
        Pattern { name: "PNG", bytes: &[0x89, 0x50, 0x4E, 0x47] },
    ];

    let data = read_dump_file(filename)?;
    for chunk_offset in (0..data.len()).step_by(chunk_size) {
        let chunk = &data[chunk_offset..chunk_offset + chunk_size.min(data.len() - chunk_offset)];

        // Display hex dump
        hex_dump(chunk, chunk_offset, 16);
        // Detect ASCII strings
        let ascii_strings = find_ascii_strings(chunk, chunk_offset, min_string_length);
        for (string, addr) in ascii_strings {
            println!("ASCII String '{}' found at 0x{:X}", string, addr);
        }
        // Detect patterns
        let detected_patterns = detect_patterns(chunk, chunk_offset, &patterns);
        for (pattern, addr) in detected_patterns {
            println!("Pattern '{}' found at 0x{:X}", pattern, addr);
        }
    }
    Ok(())
}

This code will process the file in chunks, displaying each section as a hex dump and reporting any ASCII strings or known patterns found.
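One caveat of scanning each chunk independently is that a signature or string that straddles a chunk boundary is split or missed. A possible remedy, sketched below with std-only code, is to let each chunk read a few bytes past its end (the pattern length minus one). The names find_all and find_in_chunks are hypothetical, and the naive window scan stands in for memmem::find.

```rust
// Return every position where needle occurs in haystack (naive scan).
fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    if needle.is_empty() {
        return Vec::new(); // windows(0) would panic
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, window)| *window == needle)
        .map(|(pos, _)| pos)
        .collect()
}

// Scan chunk by chunk, letting each chunk read needle.len() - 1 bytes
// past its end so a match that straddles a boundary is reported exactly once.
fn find_in_chunks(data: &[u8], needle: &[u8], chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let overlap = needle.len().saturating_sub(1);
    let mut hits = Vec::new();
    for start in (0..data.len()).step_by(chunk_size) {
        let end = (start + chunk_size + overlap).min(data.len());
        hits.extend(find_all(&data[start..end], needle).into_iter().map(|pos| start + pos));
    }
    hits
}

fn main() {
    let zip_sig: [u8; 4] = [0x50, 0x4B, 0x03, 0x04];
    // The signature occupies bytes 2..6, crossing the boundary at offset 4.
    let data: [u8; 8] = [0x00, 0x00, 0x50, 0x4B, 0x03, 0x04, 0x00, 0x00];
    let plain: Vec<usize> = data.chunks(4).flat_map(|c| find_all(c, &zip_sig)).collect();
    println!("independent chunks: {:?}", plain);
    println!("overlapping chunks: {:?}", find_in_chunks(&data, &zip_sig, 4));
}
```

Since the whole file is already in memory here, scanning the full buffer in one pass is an equally valid choice; the overlap approach matters more once data is streamed in pieces.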

Generating a dump file for testing

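Here’s a simple Rust program that creates a memory dump file with known patterns, ASCII strings, and random data for testing our implementation. It uses the rand crate (the thread_rng and gen calls below match the rand 0.8 API), so rand must be listed in Cargo.toml. Because the padding is random, the analyzer will also report short ASCII runs that occur in it by chance.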

use std::fs::File;
use std::io::{self, Write};
use rand::Rng;

fn main() -> io::Result<()> {
    let mut file = File::create("test_dump.bin")?;

    // Insert a PDF header signature at the beginning
    file.write_all(b"%PDF-1.4\n")?;

    // Fill with random data for padding
    let padding: Vec<u8> = (0..1024).map(|_| rand::thread_rng().gen()).collect();
    file.write_all(&padding)?;

    // Insert a JPEG signature just past the 1 KB mark (offset 1033: 9-byte header + 1024 bytes of padding)
    file.write_all(b"\xFF\xD8\xFF\xE0")?;

    // More padding
    let padding: Vec<u8> = (0..1024).map(|_| rand::thread_rng().gen()).collect();
    file.write_all(&padding)?;

    // Insert an ASCII string just past the 2 KB mark (offset 2061)
    file.write_all(b"Hello, this is a test ASCII string.")?;

    // Add more random data so the file ends up close to 1 MB
    let padding: Vec<u8> = (0..1024 * 1024 - 4096).map(|_| rand::thread_rng().gen()).collect();
    file.write_all(&padding)?;

    println!("Generated test_dump.bin with known patterns for testing.");
    Ok(())
}

You can also generate a dump file using your operating system tools, such as gcore or memdump on Linux.
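For repeatable tests, a deterministic variant can be handy. The sketch below is an assumption rather than part of the original tool: it replaces the random padding with zeros, needs no external crate, and returns the offsets at which the markers land. The function name build_test_dump and the output file name are hypothetical.

```rust
// Hypothetical deterministic test dump: zero padding instead of random bytes,
// so the only ASCII runs and signatures are the ones inserted on purpose.
fn build_test_dump() -> (Vec<u8>, usize, usize) {
    let mut data = Vec::new();
    data.extend_from_slice(b"%PDF-1.4\n"); // PDF header at offset 0 (9 bytes)
    data.resize(data.len() + 1024, 0); // 1024 bytes of zero padding
    let jpeg_offset = data.len();
    data.extend_from_slice(&[0xFF, 0xD8, 0xFF, 0xE0]); // JPEG signature
    data.resize(data.len() + 1024, 0);
    let text_offset = data.len();
    data.extend_from_slice(b"Hello, this is a test ASCII string.");
    (data, jpeg_offset, text_offset)
}

fn main() -> std::io::Result<()> {
    let (data, jpeg_offset, text_offset) = build_test_dump();
    std::fs::write("test_dump_zeros.bin", &data)?;
    println!(
        "JPEG signature at 0x{:X}, ASCII string at 0x{:X}",
        jpeg_offset, text_offset
    );
    Ok(())
}
```

With this layout the JPEG signature sits at 0x409 and the ASCII string at 0x80D, which gives fixed addresses to compare against the analyzer's output.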

Running the Analyzer

You can now run the Analyzer tool with:

cargo run --bin dump /your_dump_file.bin
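Note that the main function shown earlier hardcodes test_dump.bin, so that version ignores a path given on the command line. A minimal sketch of reading the path from the arguments, assuming a fallback to the test file name, could look like this; the helper name dump_path_from_args is hypothetical.

```rust
use std::env;

// Hypothetical helper: take the first command-line argument as the dump path,
// falling back to the test file name used in the article.
fn dump_path_from_args<I: Iterator<Item = String>>(mut args: I) -> String {
    // Index 0 is the program name, so the path is at index 1.
    args.nth(1).unwrap_or_else(|| "test_dump.bin".to_string())
}

fn main() {
    let filename = dump_path_from_args(env::args());
    println!("Analyzing {}", filename);
}
```

With cargo, arguments for the program go after a double dash, for example cargo run -- path/to/dump.bin.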

From here, you can build on this foundation by adding the interactive UI to improve usability.

The full implementation is available on my GitHub.
