OverTheWireAdvent ChristmaSSE KeyGen

14 minute read

Tags: rev math
Points: 271
Solves: 45

I ran this program but it never finished… maybe my computer is too slow. Maybe yours is faster?

Author: hpmv

Solution

Summary

Reverse engineer to understand the algorithm
- the program uses a lot of SSE instructions, and we had to manually reverse it
- the algorithm found to be matrix multiplication with modulo
Rewrite the program with an efficient algorithm
- Power of matrix: efficiently compute when exponent is large

Details

Reverse engineer to understand the algorithm

When we try to run the program, the program does not terminate. This is because the program is computing something in an inefficient way. We cannot speed it up by simply optimising it, but we have to understand the meaning of the program and rewrite it with more efficient algorithm.

The program has a lot of SSE instructions, as the name of the challenge implies. First we tried to decompile it with Ghidra or other reverse engineering tools, but it doesn’t work well.

The program structure is shown below. There are outer loop and inner loop.

0000000000400570 <main>:
  400570:	66 0f 6f 05 08 0b 20 	movdqa xmm0,XMMWORD PTR [rip+0x200b08]        # 601080 <data+0x40>
  400577:	00 
  400578:	66 0f 70 d8 54       	pshufd xmm3,xmm0,0x54
  ...
  40058e:	48 b9 15 81 e9 7d f4 	movabs rcx,0x112210f47de98115
  400595:	10 22 11 

### outer loop
.outer_loop:
  4005a0:	0f ae f8             	sfence 
  4005a3:	66 44 0f 6f 0d 94 0a 	movdqa xmm9,XMMWORD PTR [rip+0x200a94]
  4005aa:	20 00 
  4005ac:	66 41 0f 70 f1 00    	pshufd xmm6,xmm9,0x0 # 0b00000000
  4005b2:	66 0f 38 40 f3       	pmulld xmm6,xmm3
  ...
  
.inner_loop:
  4006d0:	66 0f 6f f4          	movdqa xmm6,xmm4
  4006d4:	66 0f 66 f3          	pcmpgtd xmm6,xmm3
  ...
  400724:	83 c2 ff             	add    edx,0xffffffff
  400727:	75 a7                	jne    .inner_loop

  400729:	48 83 c0 01          	add    rax,0x1
  40072d:	48 39 c8             	cmp    rax,rcx
  400730:	0f 85 6a fe ff ff    	jne    .outer_loop


  400736:	48 83 ec 18          	sub    rsp,0x18
  ...

We ran the program with debugger to see internal state (xmm0, …, xmm3) for each round, in order to find out the recurrence formula. The main operation seems to be done in outer loop, and inner loop doesn’t seem to do anything, at least for the first several iterations. The output from outer loop looks like:

1st iteration:
 xmm3 = {0x1, 0x2, 0x3, 0x4}, 
 xmm2 = {0x5, 0x6, 0x7, 0x8}, 
 xmm1 = {0x9, 0xa, 0xb, 0xc}, 
 xmm0 = {0xd, 0xe, 0xf, 0x10}, 
2nd iteration:
 xmm3 = {0x5a, 0x64, 0x6e, 0x78}, 
 xmm2 = {0xca, 0xe4, 0xfe, 0x118}, 
 xmm1 = {0x13a, 0x164, 0x18e, 0x1b8}, 
 xmm0 = {0x1aa, 0x1e4, 0x21e, 0x258}, 
3rd iteration:
 xmm3 = {0xc44, 0xde8, 0xf8c, 0x1130}, 
 xmm2 = {0x1c64, 0x2028, 0x23ec, 0x27b0}, 
 xmm1 = {0x2c84, 0x3268, 0x384c, 0x3e30}, 
 xmm0 = {0x3ca4, 0x44a8, 0x4cac, 0x54b0}, 
4th iteration:
 xmm3 = {0x1bd28, 0x1f810, 0x232f8, 0x26de0}, 
 xmm2 = {0x40468, 0x48c90, 0x514b8, 0x59ce0}, 
 xmm1 = {0x64ba8, 0x72110, 0x7f678, 0x8cbe0}, 
 xmm0 = {0x892e8, 0x9b590, 0xad838, 0xbfae0}, 
...

Here we gave up finding the rule, and began to analyse the code.

movdqa: move data into xmm register
pshufd: select 32bit elements

pshufd is a bit complicated instruction, but we can understand the instruction this way:
If we have pshufd xmm1, xmm2, 0x36, then convert 0x36 to binary form 0b00110110, then separate it for each 2 bits (00 11 01 10), and use it as source register index as follows:

xmm1[0] := xmm2[0b00 = 0]
xmm1[1] := xmm2[0b11 = 3]
xmm1[2] := xmm2[0b01 = 1]
xmm1[3] := xmm2[0b10 = 2]

Then we annotate the meanings of each instruction as follows:

0000000000400570 <main>:
  400570:	66 0f 6f 05 08 0b 20 	movdqa xmm0,XMMWORD PTR [rip+0x200b08]        # 601080 <data+0x40>
  400577:	00 
xmm0 = {0x1, 0x0, 0x96433d, 0x0}
  400578:	66 0f 70 d8 54       	pshufd xmm3,xmm0,0x54 # 0x01010100
xmm3 = {xmm0[0], xmm0[1], xmm0[1], xmm0[1]} == {0x1, 0x0, 0x0, 0x0}
  40057d:	66 0f 70 d0 51       	pshufd xmm2,xmm0,0x51 # 0x01010001
xmm2 = {xmm0[1], xmm0[0], xmm0[1], xmm0[1]} == {0x0, 0x1, 0x0, 0x0}
  400582:	66 0f 70 c8 45       	pshufd xmm1,xmm0,0x45 # 0x01000101
xmm1 = {xmm0[1], xmm0[1], xmm0[0], xmm0[1]} == {0x0, 0x0, 0x1, 0x0}
  400587:	66 0f 70 c0 15       	pshufd xmm0,xmm0,0x15 # 0x00010101
xmm0 = {xmm0[1], xmm0[1], xmm0[1], xmm0[0]} == {0x0, 0x0, 0x0, 0x1}
  40058c:	31 c0                	xor    eax,eax
eax = 0
  40058e:	48 b9 15 81 e9 7d f4 	movabs rcx,0x112210f47de98115
  400595:	10 22 11 
rcx = 1234567890123456789
  400598:	0f 1f 84 00 00 00 00 	nop    DWORD PTR [rax+rax*1+0x0]
  40059f:	00 

### outer loop effectively computes "matrix multiplication"
L1:
  4005a0:	0f ae f8             	sfence 
  4005a3:	66 44 0f 6f 0d 94 0a 	movdqa xmm9,XMMWORD PTR [rip+0x200a94]        # 601040 <data>
  4005aa:	20 00 
xmm9 = {0x1, 0x2, 0x3, 0x4}
  4005ac:	66 41 0f 70 f1 00    	pshufd xmm6,xmm9,0x0 # 0b00000000
xmm6[i] = 1  # xmm9[0]
  4005b2:	66 0f 38 40 f3       	pmulld xmm6,xmm3
xmm6[i] = 1 * xmm3[i]
  4005b7:	66 45 0f 70 e1 55    	pshufd xmm12,xmm9,0x55 # 0b01010101
xmm12[i] = 2  # xmm9[1]
  4005bd:	66 44 0f 38 40 e2    	pmulld xmm12,xmm2
xmm12[i] = 2 * xmm2[i]
  4005c3:	66 44 0f fe e6       	paddd  xmm12,xmm6
xmm12[i] = 2 * xmm2[i] + 1 * xmm3[i]
  4005c8:	66 44 0f 6f 15 7f 0a 	movdqa xmm10,XMMWORD PTR [rip+0x200a7f]        # 601050 <data+0x10>
  4005cf:	20 00 
xmm10 = {0x5, 0x6, 0x7, 0x8}
  4005d1:	66 45 0f 70 c2 00    	pshufd xmm8,xmm10,0x0 # 0b00000000
xmm8[i] = 5  # xmm10[0]
  4005d7:	66 44 0f 38 40 c3    	pmulld xmm8,xmm3
xmm8[i] = 5 * xmm3[i] for i in range(0, 4)
  4005dd:	66 44 0f 6f 2d 7a 0a 	movdqa xmm13,XMMWORD PTR [rip+0x200a7a]        # 601060 <data+0x20>
  4005e4:	20 00 
xmm13 = {0x9, 0xa, 0xb, 0xc}
  4005e6:	66 45 0f 70 dd 00    	pshufd xmm11,xmm13,0x0 # 0b00000000
xmm11 = 9  # xmm13[0]
  4005ec:	66 44 0f 38 40 db    	pmulld xmm11,xmm3
xmm11[i] = 9 * xmm3[i]
  4005f2:	66 44 0f 6f 3d 75 0a 	movdqa xmm15,XMMWORD PTR [rip+0x200a75]        # 601070 <data+0x30>
  4005f9:	20 00 
xmm15 = {0xd, 0xe, 0xf, 0x10}
  4005fb:	66 45 0f 70 f7 00    	pshufd xmm14,xmm15,0x0 # 0b00000000
xmm14 = 13  # xmm15[0]
  400601:	66 44 0f 38 40 f3    	pmulld xmm14,xmm3
xmm14[i] = 13 * xmm3[i]
  400607:	66 41 0f 70 e9 aa    	pshufd xmm5,xmm9,0xaa # 0b10101010
xmm5 = 3  # xmm9[2]
  40060d:	66 0f 38 40 e9       	pmulld xmm5,xmm1
xmm5[i] = 3 * xmm1[i]
  400612:	66 41 0f 70 d9 ff    	pshufd xmm3,xmm9,0xff # 0b11111111
xmm3 = 4  # xmm9[3]
  400618:	66 0f 38 40 d8       	pmulld xmm3,xmm0
xmm3[i] = 4 * xmm0[i]
  40061d:	66 0f fe dd          	paddd  xmm3,xmm5
xmm3[i] = 4 * xmm0[i] + 3 * xmm1[i]
  400621:	66 41 0f fe dc       	paddd  xmm3,xmm12
xmm3[i] = 4 * xmm0[i] + 3 * xmm1[i] + 2 * xmm2[i] + 1 * xmm3[i]


  400626:	66 41 0f 70 ea 55    	pshufd xmm5,xmm10,0x55 # 0b01010101
xmm5 = {xmm10[1], xmm10[i], xmm10[i], xmm10[i]} == {0x6, 0x6, 0x6, 0x6}
  40062c:	66 0f 38 40 ea       	pmulld xmm5,xmm2
xmm5[i] = 6 * xmm2[i]
  400631:	66 41 0f fe e8       	paddd  xmm5,xmm8
xmm5[i] = 6 * xmm2[i] + 5 * xmm3[i]
  400636:	66 41 0f 70 fd 55    	pshufd xmm7,xmm13,0x55 # 0b01010101
xmm7[i] = 10  # xmm13[1]
  40063c:	66 0f 38 40 fa       	pmulld xmm7,xmm2
xmm7[i] = 10 * xmm2[i]
  400641:	66 41 0f 70 e7 55    	pshufd xmm4,xmm15,0x55 # 0b01010101
xmm4[i] = 14  # xmm15[i]
  400647:	66 0f 38 40 e2       	pmulld xmm4,xmm2
xmm4[i] = 14 * xmm2[i]
  40064c:	66 41 0f 70 f2 aa    	pshufd xmm6,xmm10,0xaa # 0b10101010
xmm6[i] = 7  # xmm10[2]
  400652:	66 0f 38 40 f1       	pmulld xmm6,xmm1
xmm6[i] = 7 * xmm1[i]
  400657:	66 41 0f 70 d2 ff    	pshufd xmm2,xmm10,0xff # 0b11111111
xmm2[i] = 8  # xmm10[3]
  40065d:	66 0f 38 40 d0       	pmulld xmm2,xmm0
xmm2[i] = 8 * xmm0[i]
  400662:	66 0f fe d6          	paddd  xmm2,xmm6
xmm2[i] = 8 * xmm0[i] + 7 * xmm1[i]
  400666:	66 0f fe d5          	paddd  xmm2,xmm5
xmm2[i] = 8 * xmm0[i] + 7 * xmm1[i] + 6 * xmm2[i] + 5 * xmm3[i]

  40066a:	66 41 0f fe fb       	paddd  xmm7,xmm11
xmm7[i] = 10 * xmm2[i] + 9 * xmm3[i]
  40066f:	66 41 0f 70 ed aa    	pshufd xmm5,xmm13,0xaa # 0b10101010
xmm5[i] = 11  # xmm13[2]
  400675:	66 0f 38 40 e9       	pmulld xmm5,xmm1
xmm5[i] = 11 * xmm1[i]
  40067a:	66 41 0f 70 f7 aa    	pshufd xmm6,xmm15,0xaa # 0b10101010
xmm6[i] = 15  # xmm15[2]
  400680:	66 0f 38 40 f1       	pmulld xmm6,xmm1
xmm6[i] = 15 * xmm1[i]
  400685:	66 41 0f 70 cd ff    	pshufd xmm1,xmm13,0xff # 0b11111111
xmm1[i] = 12  # xmm13[3]
  40068b:	66 0f 38 40 c8       	pmulld xmm1,xmm0
xmm1[i] = 12 * xmm0[i]
  400690:	66 0f fe cd          	paddd  xmm1,xmm5
xmm1[i] = 12 * xmm0[i] + 11 * xmm1[i]
  400694:	66 0f fe cf          	paddd  xmm1,xmm7
xmm1[i] = 12 * xmm0[i] + 11 * xmm1[i] + 10 * xmm2[i] + 9 * xmm3[i]

  400698:	66 41 0f fe e6       	paddd  xmm4,xmm14
xmm4[i] = 14 * xmm2[i] + 13 * xmm3[i]
  40069d:	66 41 0f 70 ff ff    	pshufd xmm7,xmm15,0xff # 0b11111111
xmm7[i] = 16  # xmm15[i]
  4006a3:	66 0f 38 40 f8       	pmulld xmm7,xmm0
xmm7[i] = 16 * xmm0[i]
  4006a8:	66 0f fe fe          	paddd  xmm7,xmm6
xmm7[i] = 16 * xmm0[i] + 15 * xmm1[i]
  4006ac:	66 0f fe fc          	paddd  xmm7,xmm4
xmm7[i] = 16 * xmm0[i] + 15 * xmm1[i] + 14 * xmm2[i] + 13 * xmm3[i]

  4006b0:	66 0f 6f 05 c8 09 20 	movdqa xmm0,XMMWORD PTR [rip+0x2009c8]        # 601080 <data+0x40>
  4006b7:	00 
xmm0 = {0x1, 0x0, 0x96433d, 0x0}
  4006b8:	66 0f 70 e0 aa       	pshufd xmm4,xmm0,0xaa # 0b10101010
xmm4[i] = 0x96433d  # xmm0[2]
  4006bd:	66 0f 70 e8 00       	pshufd xmm5,xmm0,0x0 # 0b00000000
xmm5[i] = 0x1  # xmm0[0]
  4006c2:	ba e8 03 00 00       	mov    edx,0x3e8
edx = 1000
  4006c7:	66 0f 6f c7          	movdqa xmm0,xmm7
xmm7[i] = 16 * xmm0[i] + 15 * xmm1[i] + 14 * xmm2[i] + 13 * xmm3[i]

  4006cb:	0f 1f 44 00 00       	nop    DWORD PTR [rax+rax*1+0x0]

### inner loop effectively computes "mod 0x96433d"
L2:
  4006d0:	66 0f 6f f4          	movdqa xmm6,xmm4
xmm6[i] = 0x96433d
  4006d4:	66 0f 66 f3          	pcmpgtd xmm6,xmm3
xmm6[i] = 0x96433d > xmm3[i] ? -1 : 0
  4006d8:	66 0f fe f5          	paddd  xmm6,xmm5
xmm6[i] = 0x96433d > xmm3[i] ? 0 : 1
  4006dc:	66 0f 38 40 f4       	pmulld xmm6,xmm4
xmm6[i] = 0x96433d > xmm3[i] ? 0 : 0x96433d
  4006e1:	66 0f fa de          	psubd  xmm3,xmm6
xmm3[i] = 0x96433d > xmm3[i] ? xmm3[i] : xmm3[i] - 0x96433d

  4006e5:	66 0f 6f f4          	movdqa xmm6,xmm4
  4006e9:	66 0f 66 f2          	pcmpgtd xmm6,xmm2
  4006ed:	66 0f fe f5          	paddd  xmm6,xmm5
  4006f1:	66 0f 38 40 f4       	pmulld xmm6,xmm4
  4006f6:	66 0f fa d6          	psubd  xmm2,xmm6
xmm2[i] = 0x96433d > xmm2[i] ? xmm2[i] : xmm2[i] - 0x96433d

  4006fa:	66 0f 6f f4          	movdqa xmm6,xmm4
  4006fe:	66 0f 66 f1          	pcmpgtd xmm6,xmm1
  400702:	66 0f fe f5          	paddd  xmm6,xmm5
  400706:	66 0f 38 40 f4       	pmulld xmm6,xmm4
  40070b:	66 0f fa ce          	psubd  xmm1,xmm6
xmm1[i] = 0x96433d > xmm1[i] ? xmm1[i] : xmm1[i] - 0x96433d

  40070f:	66 0f 6f f4          	movdqa xmm6,xmm4
  400713:	66 0f 66 f0          	pcmpgtd xmm6,xmm0
  400717:	66 0f fe f5          	paddd  xmm6,xmm5
  40071b:	66 0f 38 40 f4       	pmulld xmm6,xmm4
  400720:	66 0f fa c6          	psubd  xmm0,xmm6
xmm0[i] = 0x96433d > xmm0[i] ? xmm0[i] : xmm0[i] - 0x96433d

  400724:	83 c2 ff             	add    edx,0xffffffff
  400727:	75 a7                	jne    4006d0 <main+0x160>

if (--edx != 0) goto L2;

  400729:	48 83 c0 01          	add    rax,0x1
  40072d:	48 39 c8             	cmp    rax,rcx
  400730:	0f 85 6a fe ff ff    	jne    4005a0 <main+0x30>

if (++rax != rcx) goto L1;

  400736:	48 83 ec 18          	sub    rsp,0x18
  40073a:	66 0f 7f 1c 24       	movdqa XMMWORD PTR [rsp],xmm3
  40073f:	66 0f 7f 54 24 10    	movdqa XMMWORD PTR [rsp+0x10],xmm2
  400745:	66 0f 7f 4c 24 20    	movdqa XMMWORD PTR [rsp+0x20],xmm1
  40074b:	66 0f 7f 44 24 30    	movdqa XMMWORD PTR [rsp+0x30],xmm0
  400751:	31 c0                	xor    eax,eax

eax = 0

  400753:	66 2e 0f 1f 84 00 00 	nop    WORD PTR cs:[rax+rax*1+0x0]
  40075a:	00 00 00 
  40075d:	0f 1f 00             	nop    DWORD PTR [rax]

  400760:	0f b6 0c 44          	movzx  ecx,BYTE PTR [rsp+rax*2]
  400764:	30 88 90 10 60 00    	xor    BYTE PTR [rax+0x601090],cl
  40076a:	0f b6 4c 44 01       	movzx  ecx,BYTE PTR [rsp+rax*2+0x1]
  40076f:	30 88 91 10 60 00    	xor    BYTE PTR [rax+0x601091],cl
  400775:	48 83 c0 02          	add    rax,0x2
  400779:	48 83 f8 20          	cmp    rax,0x20
  40077d:	75 e1                	jne    400760 <main+0x1f0>

# xmm3, xmm2, xmm1, xmm0 is used to xor with the following bytes:
# fc14eb09 bcaee747 4fe37cc1 52a5028e
# 8971c88d 9623016d 71405aea fd461d23

  40077f:	be 54 08 40 00       	mov    esi,0x400854
  400784:	bf 01 00 00 00       	mov    edi,0x1
  400789:	ba 05 00 00 00       	mov    edx,0x5
  40078e:	31 c0                	xor    eax,eax
  400790:	e8 cb fc ff ff       	call   400460 <write@plt>
  400795:	be 90 10 60 00       	mov    esi,0x601090
  40079a:	bf 01 00 00 00       	mov    edi,0x1
  40079f:	ba 20 00 00 00       	mov    edx,0x20
  4007a4:	31 c0                	xor    eax,eax
  4007a6:	e8 b5 fc ff ff       	call   400460 <write@plt>
  4007ab:	be 5a 08 40 00       	mov    esi,0x40085a
  4007b0:	bf 01 00 00 00       	mov    edi,0x1
  4007b5:	ba 02 00 00 00       	mov    edx,0x2
  4007ba:	31 c0                	xor    eax,eax
  4007bc:	e8 9f fc ff ff       	call   400460 <write@plt>
  4007c1:	31 ff                	xor    edi,edi
  4007c3:	e8 a8 fc ff ff       	call   400470 <exit@plt>
  4007c8:	0f 1f 84 00 00 00 00 	nop    DWORD PTR [rax+rax*1+0x0]
  4007cf:	00 

Then we found out that the registers are initialised to:

xmm0 = {0x0, 0x0, 0x0, 0x1}
xmm1 = {0x0, 0x0, 0x1, 0x0}
xmm2 = {0x0, 0x1, 0x0, 0x0}
xmm3 = {0x1, 0x0, 0x0, 0x0}

The outer loop computes the following for 1234567890123456789 times:

xmm0[i] = 16 * xmm0[i] + 15 * xmm1[i] + 14 * xmm2[i] + 13 * xmm3[i]
xmm1[i] = 12 * xmm0[i] + 11 * xmm1[i] + 10 * xmm2[i] + 9 * xmm3[i]
xmm2[i] = 8 * xmm0[i] + 7 * xmm1[i] + 6 * xmm2[i] + 5 * xmm3[i]
xmm3[i] = 4 * xmm0[i] + 3 * xmm1[i] + 2 * xmm2[i] + 1 * xmm3[i]
inner loop

This effectively is a multiplication of matrix:

(\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix}) = (\begin{matrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{matrix}) (\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix})

The inner loop computes the following for 1000 times:

xmm0[i] = 0x96433d > xmm0[i] ? xmm0[i] : xmm0[i] - 0x96433d
xmm1[i] = 0x96433d > xmm1[i] ? xmm1[i] : xmm1[i] - 0x96433d
xmm2[i] = 0x96433d > xmm2[i] ? xmm2[i] : xmm2[i] - 0x96433d
xmm3[i] = 0x96433d > xmm3[i] ? xmm3[i] : xmm3[i] - 0x96433d

Since $0x96433d \times 1000 = 9847613000 > 2^{32}$ , it effectively computes “ $n mod 0x96433d$ ” for each element of xmm0..xmm3.

(\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix}) = (\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix}) mod 0x96433d

To summarise, one iteration of outer loop will compute:

(\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix}) = (\begin{matrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{matrix}) (\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix}) mod 0x96433d

And the final block will compute flag using the computed xmm0..xmm3 result, by concatenating xmm3, xmm2, xmm1, xmm0 in that order and exclusive-or’ing the value with the constant bytes fc14eb09 bcaee747 4fe37cc1 52a5028e 8971c88d 9623016d 71405aea fd461d23

Rewrite the program with an efficient algorithm

Now we understand the program as matrix multiplication and modulo. However, the outer loop is iterated for incredibly many times, it cannot be directly computed. We simplify the computation by using power of matrix.

Since the initial value of the xmm registers are a unit matrix, as shown below:

(\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix}) = (\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix})

The final value would be:

(\begin{matrix} xmm3 \\ xmm2 \\ xmm1 \\ xmm0 \end{matrix}) = {(\begin{matrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ 13 & 14 & 15 & 16 \end{matrix})}^{1234567890123456789} mod 0x96433d

We optimised the computation of power of matrix by using the following formula, to compute the power of matrix in $O(log N)$ time, where $N$ is the exponent:

A^{2 n} = {(A^{n})}^{2}

A^{2 n + 1} = {(A^{n})}^{2} \times A

chal10-solve.py:

import numpy
import struct
import sys

print(sys.getrecursionlimit())

# create 4x4 array
matrix = numpy.array(
    [[ 1,  2,  3,  4],
     [ 5,  6,  7,  8],
     [ 9, 10, 11, 12],
     [13, 14, 15, 16]])

def powmod(mat, n, m):
    if n == 1:
        return numpy.mod(mat, m)
    else:
        x = powmod(mat, n//2, m)
        x = numpy.mod(numpy.matmul(x, x), m)
        if n % 2 == 1:
            return numpy.mod(numpy.matmul(x, mat), m)
        else:
            return x

def powmod_slow(mat, n, m):
    x = mat
    for i in range(1, n):
        x = numpy.mod(numpy.matmul(x, mat), m)
    return x

#for i in range(1, 20):
#    x = powmod(matrix, i, 0x96433d)
#    for j in range(4):
#        print(' {0x%x, 0x%x, 0x%x, 0x%x}, ' % tuple(x[j]))

result = powmod(matrix, 0x112210f47de98115, 0x96433d)
result = result.astype('<u4')
print(result)

key = list(result.tobytes())
# print(key)

encrypted = list(bytes.fromhex('fc14eb09bcaee7474fe37cc152a5028e8971c88d9623016d71405aeafd461d23'))
# print(encrypted)

for i in range(0, 0x20, 2):
    encrypted[i] ^= key[i*2]
    encrypted[i+1] ^= key[i*2+1]

# print(encrypted)
print(bytes(encrypted))

$ python3 chal10-solve.py 
1000
[[7938225 4160415  382605 6452408]
 [8295223 6860620 5426017 3991414]
 [8652221 9560825  621816 1530420]
 [9009219 2413417 5665228 8917039]]
b'M4tr1x_3xp0n3nti4t1on_5728391723'

by Fukutomo

Twitter Facebook LinkedIn

Phish'n'Chips

OverTheWireAdvent ChristmaSSE KeyGen

Solution

Summary

Details

Reverse engineer to understand the algorithm

Rewrite the program with an efficient algorithm

You May Also Enjoy

OverTheWireAdvent Santa’s Signature

OverTheWireAdvent Impressive Sudoku

OverTheWireAdvent Snowflake Idle

OverTheWireAdvent Musical Stegano