KAF 2020–8Byte (Reversing manually a small VM) — part 1

Ilyar
Nov 21, 2020

I didn’t solve this one during the competition, but since 39 people did, I had the urge to try it as well.
It’s a 32-bit PE binary. I rarely reverse on Windows, so that was quite a challenge. To analyze the file I used IDA without a decompiler.

Running the binary in cmd requires a flag. This is obviously the flag we want. A brief look at the main function:

The interesting stuff happens in sub_402040, which contains a call to sub_4020C0, where all the magic happens.

Okay, so everything happens in sub_4020C0. Let’s look at it:

This part is where the real code is unpacked. When I first ran the program, it closed immediately. Therefore, I tried a different strategy: I know that the flag is compared with the bytecode, so at some point it will be checked. Thus, if I put a breakpoint on the flag, I might land in the code that checks it. This plan worked, and I landed inside a function in a very small section that is being modified all the time. To be precise, the breakpoint was triggered in the strlen() function (imported from ucrtbase.dll). From there, I followed the instructions until I landed in sub_40700. This function resides in the .awsm1 section and is being modified all the time. I will discuss how it is unpacked later. sub_40700:

And we can confirm that this section is indeed writable:
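
The same flag can also be checked outside IDA. Below is a minimal sketch, using only the Python standard library, that walks the PE section table and reports each section's characteristics. The function names `list_sections` and `is_writable` are made up for this illustration, and the sketch assumes a well-formed PE file.

```python
import struct

# Section characteristic flags from the PE format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000


def list_sections(pe_bytes):
    """Return a list of (name, characteristics) for every section."""
    # e_lfanew (offset 0x3C in the DOS header) points to the "PE\0\0" signature.
    (e_lfanew,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # The 20-byte COFF header follows the 4-byte signature.
    (num_sections,) = struct.unpack_from("<H", pe_bytes, e_lfanew + 6)
    (opt_size,) = struct.unpack_from("<H", pe_bytes, e_lfanew + 20)
    # The section table starts right after the optional header.
    table = e_lfanew + 24 + opt_size
    sections = []
    for i in range(num_sections):
        entry = table + 40 * i  # each section header is 40 bytes
        name = pe_bytes[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        (flags,) = struct.unpack_from("<I", pe_bytes, entry + 36)
        sections.append((name, flags))
    return sections


def is_writable(flags):
    """True if the section is mapped with write permission."""
    return bool(flags & IMAGE_SCN_MEM_WRITE)
```

To use it, read the challenge binary into memory with `open(path, "rb").read()`, pass the bytes to `list_sections`, and call `is_writable` on the flags of the section that holds the unpacked code.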

So up until now, we know that the input is checked in this function, after which we return to sub_4020C0. I followed the instructions and put a breakpoint at the return address of sub_40700. This helped me gain a better understanding of what is happening: the code that will be executed in that function is unpacked in sub_4020C0 and written into sub_40700. This technique is known as code virtualization. It makes life harder for researchers. Bottom line, we now have two options:
1. Continue analyzing the unpacked code instruction by instruction
2. Unpack the .awsm code and load it again in IDA, then reverse it

Obviously, the second method is the one intended by the author. To unpack the code, we first need to understand how it is being unpacked, and then we can write a routine that does the work for us. Back to sub_4020C0:

The bytecode is unpacked into 8 bytes pointed to by ebp. The first byte, [ebp-8], holds the number of bytes unpacked.

This is the core of the unpacking routine (excuse the graphics, none of my notes were saved :( ).

  1. dword_4060A0 points to the current position in .awsm to be unpacked. We unpack byte by byte, sending each ciphered byte to some_calc, which we will reverse later.
  2. Go to the next position in the 8 bytes pointed to by [ebp], where ecx is the index. This is where we will store our next unpacked byte.
  3. Increment the number of bytes unpacked by 1 and store it in [ebp-8], which we discussed earlier. Also move to the next byte in .awsm pointed to by dword_4060A0 and store that new position there.
  4. If the next two bytes are 0x1337, we stop and move to the next phase. Otherwise, we repeat the process.
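
The four steps above can be sketched in Python roughly as follows. This is an illustration under assumptions, not the binary's actual code: `some_calc` here is a hypothetical identity placeholder (the real transform is not covered yet), the terminator is assumed to be the raw bytes 0x13 0x37 in that order, and the check is assumed to be made on the still-ciphered bytes. All three can be swapped through the function's parameters.

```python
def some_calc(cipher_byte):
    """HYPOTHETICAL stand-in for the binary's some_calc.

    The real transform is not known at this point, so this placeholder
    simply returns the byte unchanged.
    """
    return cipher_byte


def unpack_chunk(blob, pos, decode=some_calc, terminator=b"\x13\x37", max_len=8):
    """Unpack one chunk of bytecode starting at blob[pos].

    Returns (chunk, new_pos), where new_pos points at the terminator.
    """
    chunk = bytearray()
    while True:
        if pos >= len(blob):
            raise ValueError("ran out of data before the terminator")
        # Steps 1 and 2: decode one ciphered byte and store it at the next index.
        chunk.append(decode(blob[pos]) & 0xFF)
        # Step 3: the byte count is len(chunk); advance the source pointer.
        pos += 1
        # Step 4: stop once the next two raw bytes match the terminator
        # (byte order is an assumption).
        if blob[pos:pos + 2] == terminator:
            return bytes(chunk), pos
        if len(chunk) >= max_len:
            raise ValueError("no terminator within max_len bytes")
```

Once the real some_calc is understood, it can be passed in as `decode` without touching the loop. If the terminator turns out to be stored as a little-endian word, pass `terminator=b"\x37\x13"` instead.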

This block takes the 8 bytes from the previous operation and memcpy’s them into the destination, which is sub_40700. sub_401E90 constructs the bytecode according to some specific logic. To be honest, I still don’t exactly understand its purpose. A decompiler would have been a great aid here. Afterwards, when we jump to the function, we will return to dword_4060B0, which is the beginning of the whole routine.

Okay. Now we have everything we need to know in order to get the flag. We will reverse some_calc() in part 2:
https://int3rsys.medium.com/kaf-2020-8byte-reversing-manually-a-small-vm-part-2-d016f1773788
