Saturday, December 24, 2011

Can you crack it ? A quite detailed solution (Part 1)

On the December 5 I saw an article about the site http://www.canyoucrackit.co.uk/. A little late since it was apparently up since November 2.
Canyoucrackit front page.

I decided to take the challenge and it took me 2 days to complete it. Now that I have some time to write articles I will walk you through it. It comes in 3 parts.
The challenge is over now but at time of writing all links are still working. I will provide you with local copies of all that is needed to complete the challenge anyway.

Prerequisites

First you need to know a little bit about x86 assembler and how the stack works. I guess the following wikipedia articles are a good start :
And if you are not quite at ease with it I would also recommend you to read the article about the endianness used on the x86 : Little endian.

I will assume that you have a GNU/Linux machine (or that you use cygwin for that matter) and that you know how to install missing packages.

Decoding the hexadecimal code

The hexadecimal code in the front page is actually a PNG image. The original image can still be found at this URL : http://www.canyoucrackit.co.uk/images/cyber.png

cyber.png

Let's translate this hexadecimal string to binary.
$ cat <<EOF | xxd -r -p > challengepart1.bin
EB 04 AF C2 BF A3 81 EC 00 01 00 00 31 C9 88 0C 0C FE C1 75 F9 31 C0 BA EF BE AD DE 02
04 0C 00 D0 C1 CA 08 8A 1C 0C 8A 3C 04 88 1C 04 88 3C 0C FE C1 75 E8 E9 5C 00 00 00 89
E3 81 C3 04 00 00 00 5C 58 3D 41 41 41 41 75 43 58 3D 42 42 42 42 75 3B 5A 89 D1 89 E6
89 DF 29 CF F3 A4 89 DE 89 D1 89 DF 29 CF 31 C0 31 DB 31 D2 FE C0 02 1C 06 8A 14 06 8A
34 1E 88 34 06 88 14 1E 00 F2 30 F6 8A 1C 16 8A 17 30 DA 88 17 47 49 75 DE 31 DB 89 D8
FE C0 CD 80 90 90 E8 9D FF FF FF 41 41 41 41
EOF

What can it be ? I have a good feeling for x86 machine code. So let's try and disassemble it to see what we get.
$ ndisasm -b32 challengepart1.bin
00000000  EB04              jmp short 0x6
00000002  AF                scasd
00000003  C2BFA3            ret 0xa3bf
00000006  81EC00010000      sub esp,0x100
0000000C  31C9              xor ecx,ecx
0000000E  880C0C            mov [esp+ecx],cl
00000011  FEC1              inc cl
00000013  75F9              jnz 0xe
00000015  31C0              xor eax,eax
00000017  BAEFBEADDE        mov edx,0xdeadbeef
0000001C  02040C            add al,[esp+ecx]
0000001F  00D0              add al,dl
00000021  C1CA08            ror edx,0x8
00000024  8A1C0C            mov bl,[esp+ecx]
00000027  8A3C04            mov bh,[esp+eax]
0000002A  881C04            mov [esp+eax],bl
0000002D  883C0C            mov [esp+ecx],bh
00000030  FEC1              inc cl
00000032  75E8              jnz 0x1c
00000034  E95C000000        jmp dword 0x95
00000039  89E3              mov ebx,esp
0000003B  81C304000000      add ebx,0x4
00000041  5C                pop esp
00000042  58                pop eax
00000043  3D41414141        cmp eax,0x41414141
00000048  7543              jnz 0x8d
0000004A  58                pop eax
0000004B  3D42424242        cmp eax,0x42424242
00000050  753B              jnz 0x8d
00000052  5A                pop edx
00000053  89D1              mov ecx,edx
00000055  89E6              mov esi,esp
00000057  89DF              mov edi,ebx
00000059  29CF              sub edi,ecx
0000005B  F3A4              rep movsb
0000005D  89DE              mov esi,ebx
0000005F  89D1              mov ecx,edx
00000061  89DF              mov edi,ebx
00000063  29CF              sub edi,ecx
00000065  31C0              xor eax,eax
00000067  31DB              xor ebx,ebx
00000069  31D2              xor edx,edx
0000006B  FEC0              inc al
0000006D  021C06            add bl,[esi+eax]
00000070  8A1406            mov dl,[esi+eax]
00000073  8A341E            mov dh,[esi+ebx]
00000076  883406            mov [esi+eax],dh
00000079  88141E            mov [esi+ebx],dl
0000007C  00F2              add dl,dh
0000007E  30F6              xor dh,dh
00000080  8A1C16            mov bl,[esi+edx]
00000083  8A17              mov dl,[edi]
00000085  30DA              xor dl,bl
00000087  8817              mov [edi],dl
00000089  47                inc edi
0000008A  49                dec ecx
0000008B  75DE              jnz 0x6b
0000008D  31DB              xor ebx,ebx
0000008F  89D8              mov eax,ebx
00000091  FEC0              inc al
00000093  CD80              int 0x80
00000095  90                nop
00000096  90                nop
00000097  E89DFFFFFF        call dword 0x39
0000009C  41                inc ecx
0000009D  41                inc ecx
0000009E  41                inc ecx
0000009F  41                inc ecx
It looks very good. Now we are going to analyze it a bit.

Analysis

The code begins with a jump that skips over the next 4 bytes. The 4 bytes skipped are 0xAFC2BFA3. They are not used for now but it is going to be very important for the last part of the challenge.
00000000  EB04              jmp short 0x6
00000002  AF                scasd          ;Unused byte
00000003  C2BFA3            ret 0xa3bf     ;Unused bytes

Then it allocates 0x100 bytes on the stack (in decimal that makes 256) and it loops through it. Basically it allocates an array just like this one :
unsigned char array[256] = {0,1,2,3, ... ,254,255};
(Note that register cl is 1 byte long and it can only store values from 0 to 255. It rolls over to 0 afterwards.)
00000006  81EC00010000      sub esp,0x100
0000000C  31C9              xor ecx,ecx
0000000E  880C0C            mov [esp+ecx],cl
00000011  FEC1              inc cl
00000013  75F9              jnz 0xe

Again another loop on the same 256 allocated bytes. I will save you the details but it uses 0xdeadbeef as a sort of key and rotates through its bytes to compute an index used to swap values of the array. To put it simple the array is "shuffled".
00000015  31C0              xor eax,eax
00000017  BAEFBEADDE        mov edx,0xdeadbeef
[...]
00000030  FEC1              inc cl
00000032  75E8              jnz 0x1c

Then there is a jump at address 0x95. If we follow the execution path we see that at address 0x95 there is a call to address 0x39, which is after the original jmp!
It seems useless, but this is a trick. The instruction call pushes on the stack the return address which is the address of the instruction following the call. In our case this is 0x0000009C.
00000034  E95C000000        jmp dword 0x95
--
00000095  90                nop
00000096  90                nop
00000097  E89DFFFFFF        call dword 0x39 ;Goes back just after the jmp
0000009C  41                inc ecx         ;The new stack will point there
0000009D  41                inc ecx
0000009E  41                inc ecx
0000009F  41                inc ecx

After that the original stack pointer (pointing to the 256 bytes array) is saved into ebx and the return address is popped into esp so that we point to a new stack beginning at 0x0000009C. This way the next value popped would be 0x41414141.
00000039  89E3              mov ebx,esp
0000003B  81C304000000      add ebx,0x4
00000041  5C                pop esp      ;esp contains the return address of the call

Then there is a kind of sanity check. To pass it the first value on the stack should be 0x41414141 and the next one 0x42424242. No problem for the first test, our stack contains the good value, but then we have a nothing else after that, this is the end of our code ! This is problematic.
Anyways we will not stop the analysis here. Let's note that and come back to it later.
(Note : instruction jnz 0x8d jumps to a bit of code that just exits the program. We will detail it later.)
00000042  58                pop eax
00000043  3D41414141        cmp eax,0x41414141
00000048  7543              jnz 0x8d
0000004A  58                pop eax
0000004B  3D42424242        cmp eax,0x42424242
00000050  753B              jnz 0x8d

Before we go on, a quick note on the instruction rep movsb:
  • movsb moves a byte from the address esi to the address edi and increments esi and edi.
  • rep makes the CPU do it ecx times.
So the following code pops another value from the stack that should be a length. It is saved into edx and ecx. Again the popped value is somewhere after our code. Let's note that and continue.
The source register esi is pointing in a zone after our code. The destination register edi is set to ebx which is the original stack pointer. ecx bytes are reserved on it for the data to be copied.
It looks like there should be a lot more data after our final 0x41414141 !
00000052  5A                pop edx     ;This is somewhere after the code
00000053  89D1              mov ecx,edx
00000055  89E6              mov esi,esp ;esi is somewhere further after the code
00000057  89DF              mov edi,ebx ;edi is the original esp
00000059  29CF              sub edi,ecx ;Reserve ecx bytes on the original stack
0000005B  F3A4              rep movsb   ;Copies data to the original stack

Then there is a part there is no need to dig into. Basically registers are initialized and there is a loop using the 256 bytes array and the data we have just copied. The data is XORed with the array in a strange way an the result put back in the data. It looks like a decryption of some sort.
0000005D  89DE              mov esi,ebx      ; Initialization
...
0000006B  FEC0              inc al           ; Start of the loop
...
00000085  30DA              xor dl,bl
...
0000008A  49                dec ecx
0000008B  75DE              jnz 0x6b

Finally this is the end of the program. int 0x80 with eax set to 1 is the system call sys_exit on Linux. The returned value is ebx which is set to zero.
0000008D  31DB              xor ebx,ebx
0000008F  89D8              mov eax,ebx
00000091  FEC0              inc al
00000093  CD80              int 0x80

Filling the missing part

From what we saw with the analysis we have a part of the code that is missing and it should be something like this.
000000A0  0x42424242 (value needed to pass the sanity check)
000000A4  0x???????? (length of the following data)
000000A8  0x???????? (data)
000000AC  0x???????? (data)
..........

The HTML source code of the page http://www.canyoucrackit.co.uk seems ok. (It is not available anymore).
All we have left is this PNG image. Let's see if it contains some indication.
$ strings cyber.png | head
IHDR
sRGB
        pHYs
tIME
]iTXtComment
QkJCQjIAAACR2PFtcCA6q2eaC8SR+8dmD/zNzLQC+td3tFQ4qx8O447TDeuZw5P+0SsbEcYR
78jKLw==2
IDATx
.^cwuW
n7-KtUwI
The highlighted part is very interesting stuff indeed ! It looks like it is Base64 encoded. We can tell that because it uses only the characters [A-Z], [a-z] + and / and it ends with '=' characters.

Let's trim the final '2' character and decode it.
$ cat <<EOF | base64 -d > pngcomment.bin
QkJCQjIAAACR2PFtcCA6q2eaC8SR+8dmD/zNzLQC+td3tFQ4qx8O447TDeuZw5P+0SsbEcYR
78jKLw==
EOF

$ od -Ax -tx1 pngcomment.bin
000000 42 42 42 42 32 00 00 00 91 d8 f1 6d 70 20 3a ab
000010 67 9a 0b c4 91 fb c7 66 0f fc cd cc b4 02 fa d7
000020 77 b4 54 38 ab 1f 0e e3 8e d3 0d eb 99 c3 93 fe
000030 d1 2b 1b 11 c6 11 ef c8 ca 2f
00003a
It looks very promising. We have the 0x42424242 we needed followed by 0x00000032 which is actually the length of the data (Total length : 0x3A - 0x8 bytes for length and initial 0x42424242).

So let's just append it to our binary file.
$ cat challengepart1.bin pngcomment.bin > challengepart1_full.bin

Time to run the code

Let's run the code. We will do it the way you traditionally do shellcode testing. xxd is very handy for that with the -i switch.
$ xxd -i challengepart1_full.bin
unsigned char challengepart1_full_bin[] = {
  0xeb, 0x04, 0xaf, 0xc2, 0xbf, 0xa3, 0x81, 0xec, 0x00, 0x01, 0x00, 0x00,
  0x31, 0xc9, 0x88, 0x0c, 0x0c, 0xfe, 0xc1, 0x75, 0xf9, 0x31, 0xc0, 0xba,
  0xef, 0xbe, 0xad, 0xde, 0x02, 0x04, 0x0c, 0x00, 0xd0, 0xc1, 0xca, 0x08,
  0x8a, 0x1c, 0x0c, 0x8a, 0x3c, 0x04, 0x88, 0x1c, 0x04, 0x88, 0x3c, 0x0c,
  0xfe, 0xc1, 0x75, 0xe8, 0xe9, 0x5c, 0x00, 0x00, 0x00, 0x89, 0xe3, 0x81,
  0xc3, 0x04, 0x00, 0x00, 0x00, 0x5c, 0x58, 0x3d, 0x41, 0x41, 0x41, 0x41,
  0x75, 0x43, 0x58, 0x3d, 0x42, 0x42, 0x42, 0x42, 0x75, 0x3b, 0x5a, 0x89,
  0xd1, 0x89, 0xe6, 0x89, 0xdf, 0x29, 0xcf, 0xf3, 0xa4, 0x89, 0xde, 0x89,
  0xd1, 0x89, 0xdf, 0x29, 0xcf, 0x31, 0xc0, 0x31, 0xdb, 0x31, 0xd2, 0xfe,
  0xc0, 0x02, 0x1c, 0x06, 0x8a, 0x14, 0x06, 0x8a, 0x34, 0x1e, 0x88, 0x34,
  0x06, 0x88, 0x14, 0x1e, 0x00, 0xf2, 0x30, 0xf6, 0x8a, 0x1c, 0x16, 0x8a,
  0x17, 0x30, 0xda, 0x88, 0x17, 0x47, 0x49, 0x75, 0xde, 0x31, 0xdb, 0x89,
  0xd8, 0xfe, 0xc0, 0xcd, 0x80, 0x90, 0x90, 0xe8, 0x9d, 0xff, 0xff, 0xff,
  0x41, 0x41, 0x41, 0x41, 0x42, 0x42, 0x42, 0x42, 0x32, 0x00, 0x00, 0x00,
  0x91, 0xd8, 0xf1, 0x6d, 0x70, 0x20, 0x3a, 0xab, 0x67, 0x9a, 0x0b, 0xc4,
  0x91, 0xfb, 0xc7, 0x66, 0x0f, 0xfc, 0xcd, 0xcc, 0xb4, 0x02, 0xfa, 0xd7,
  0x77, 0xb4, 0x54, 0x38, 0xab, 0x1f, 0x0e, 0xe3, 0x8e, 0xd3, 0x0d, 0xeb,
  0x99, 0xc3, 0x93, 0xfe, 0xd1, 0x2b, 0x1b, 0x11, 0xc6, 0x11, 0xef, 0xc8,
  0xca, 0x2f
};
unsigned int challengepart1_full_bin_len = 218;
All we need to add is the standard shellcode testing bit.
$ xxd -i challengepart1_full.bin > challengepart1_full.c

$ cat <<EOF >>challengepart1_full.c
int main(int argc, char **argv)
{
  int (*func)();
  func = (int (*)()) challengepart1_full_bin;
  (int)(*func)();
}
EOF

Let's then just compile it and run it through gdb.
$ gcc -o challengepart1_full challengepart1_full.c
$ gdb challengepart1_full
We just run the program until the end.
(gdb) r
With a little analysis we can see that at the end of the program the register edi points to the end of the decoded data. The data is 0x32 bytes long so let's dump it.
(gdb) info registers edi
edi            0x22cbdc 2280412
(gdb) p (char*) (0x22cbdc-0x32)
$1 = 0x22cbaa "GET /15b436de1f9107f3778aad525e5d0b20.js HTTP/1.1"
Here we go. A HTTP request.
You can check what it is for yourself : www.canyoucrackit.co.uk/15b436de1f9107f3778aad525e5d0b20.js

The challenge is not over yet. Part 2 and 3 are to come.

Final Note

It didn't come clear to me since I am note very acquainted with it but the code is in fact a RC4 cipher. The initialization of the 256 bytes array is very characteristic. 0xDEADBEEF is the key used to decrypt the data that is inside the PNG comment.

2 comments:

  1. Thanks for such a detailed writeup! I've tried to run your solution on a 32b linux distro (Linux trixie 3.0.0-12-generic-pae #20-Ubuntu SMP Fri Oct 7 16:37:17 UTC 2011 i686 i686 i386 GNU/Linux) and it segfaults on me? I've tried to debug a bit with gdb (http://pastebin.com/Dzan1FUA) but I've got a bit out of my depth.. Any ideas why it would be segfaulting? For reference the c file I'm using is http://pastebin.com/d41P6jq2 and I'm compiling/running it the same as yourself.

    ReplyDelete
    Replies
    1. To answer my own question gcc from Ubuntu 10.10 has the NX bit set on the stack. You can get around it by passing "-z execstack" to gcc.

      Delete