Last month, when news broke of XcodeGhost, the iOS malware that infected apps on Apple's App Store, we retrospectively searched our haystack of retained network traffic for evidence of the malware across our customers. We quickly discovered that more than half of our customers had affected devices on their networks, with infections dating as far back as April 25th, 2015, much earlier than several news outlets had reported.
Network security tends to focus on detection. But what happens after the initial discovery?
Once a threat is discovered, the work has only just begun for security teams and incident responders, because interpreting malicious activity can be just as difficult as discovering it. Incident responders need to identify the affected devices and assess the impact of the compromise, and attackers employ numerous tactics to hinder this investigation. For example, encrypted malware communications often hide critical information from incident responders. The effort required to decrypt these communications varies greatly between malware families, so understanding how to decipher the traffic exchanged between victim devices and command and control (C2) servers is vital to any investigation.
This is certainly the case with XcodeGhost. Having full PCAP (in essence, a complete recording of network history) gave us the opportunity to decrypt the communications, estimate the number of unique devices affected, and confirm the malware's capabilities. Identifying a device that beaconed from a given IP address several months ago is difficult, but the details recovered by decoding the beacon traffic made it possible to track those devices down.
This blog post walks through the process of decrypting XcodeGhost communications and describes the forensic benefits of long-term full packet retention.
Decoding XcodeGhost Communications
Figure 1: XcodeGhost Beacon
Examining XcodeGhost traffic, we see that it communicates over HTTP and that the request and response payloads are encrypted (Figure 1). Visual inspection of the traffic in Wireshark or a hex editor can give a general idea of the routine used to obfuscate it. Repeated characters or low entropy in the data typically point to a simple bitwise operation, such as XOR with a fixed key. If a simple bitwise encoding is used, an analyst can brute force the key or guess it manually from the repeated characters. High entropy, by contrast, suggests a stronger cipher such as RC4 or DES. The encrypted XcodeGhost data shows high entropy, so manual guessing or brute forcing would likely be a waste of time. In this case, a sample of the malware or its source code is required. Source code is generally unavailable, so the encryption routine usually has to be recovered by disassembling the malware. Luckily, the XcodeGhost source was published, which made decryption much simpler.
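The entropy check described above is easy to script. The following is a minimal sketch and was not part of our original analysis. shannon_entropy and xor_repeating are hypothetical helper names, and any cutoff you apply to the result is a rule of thumb, not a fixed threshold.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key; applying it twice restores the input."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


if __name__ == "__main__":
    plaintext = b'{"country" : "US", "os" : "8.3"}' + b"\x00" * 16
    encoded = xor_repeating(plaintext, b"\x5a")
    # A single-byte XOR only relabels byte values, so the entropy is unchanged...
    print(round(shannon_entropy(plaintext), 2), round(shannon_entropy(encoded), 2))
    # ...and runs of zero bytes leak the key directly as repeated characters.
    print(xor_repeating(b"\x00" * 8, b"KEY"))  # b'KEYKEYKE'
```

Output from a strong cipher such as DES tends toward 8 bits per byte. Short samples cannot reach that maximum, though: a buffer of n bytes (n < 256) tops out at log2(n) bits per byte, so compare payloads of similar length.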
XcodeGhost's Payload Structure
Figure 2: Construction of the Payload
When interpreting network traffic, it is important to understand the payload structure of the communication in question. The structure matters for signature generation, and it can provide valuable information if new variants appear or the payloads cannot be fully decrypted. From the source, we learn that the payload is composed of four main parts. In the first three lines of Figure 2 (line 183), the variable bodyLen is assigned the length of encryptData plus 8. This value is stored in an int32_t and converted from host byte order to network byte order, so we should expect it in big-endian format over the wire. The converted value is then wrapped in an NSData object using the size of bodyLen. Because the variable is an int32_t, we should look for a DWORD (four bytes) holding its value, and that value should equal the length of encryptData plus 8.
Next, two more variables, cmdlen and verLen, are assigned and converted to network byte order in the same manner. NSData objects are then created from these values. This time, however, the variables are int16_t, so we are looking for WORDs (two bytes) to hold their values. The variable cmdlen is set to 101, which is 0x0065 in hexadecimal network byte order. The variable verLen is set to 10, which is 0x000a.
Lastly, all four parts are concatenated in the order bodyLen, cmdlen, verLen, encryptData. Since bodyLen equals the length of encryptData plus 8, and the first three components total eight bytes, the first value should be the total payload size in bytes. We then expect the next two bytes to contain 0x0065 and the following two bytes to contain 0x000a, with the rest of the payload being the encrypted data. Figure 3 confirms this (note: the image has been truncated, so the data length of 0x00000140 will not match what is shown).
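The layout can be checked programmatically with Python's struct module. This sketch mirrors the construction described above. build_payload and parse_payload are hypothetical names, and the signed format codes are used only to match the int32_t and int16_t types in the source.

```python
import struct

# Big-endian (network byte order): int32_t bodyLen, int16_t cmdlen, int16_t verLen
HEADER = struct.Struct(">ihh")
CMD_LEN = 0x0065  # 101
VER_LEN = 0x000A  # 10


def build_payload(encrypt_data: bytes) -> bytes:
    """Mirror the construction: bodyLen = len(encryptData) + 8, then the two
    static values, then the ciphertext."""
    return HEADER.pack(len(encrypt_data) + HEADER.size, CMD_LEN, VER_LEN) + encrypt_data


def parse_payload(payload: bytes) -> bytes:
    """Check the 8-byte header and return the encrypted portion."""
    if len(payload) < HEADER.size:
        raise ValueError("payload is shorter than the 8-byte header")
    body_len, cmd_len, ver_len = HEADER.unpack_from(payload)
    if (cmd_len, ver_len) != (CMD_LEN, VER_LEN):
        raise ValueError(f"unexpected header values {cmd_len:#06x} {ver_len:#06x}")
    if body_len != len(payload):
        raise ValueError(f"bodyLen is {body_len} but payload has {len(payload)} bytes")
    return payload[HEADER.size:]


print(build_payload(b"\x01" * 16)[:8].hex())  # 000000180065000a
```

As a worked check, a bodyLen of 0x00000140 is 320 bytes. That is the 8 header bytes plus 312 bytes of ciphertext, or 39 eight-byte blocks, which is consistent with an 8-byte block cipher. On a truncated capture, the length check will fail, as expected.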
Figure 3: XcodeGhost Payload Structure
XcodeGhost's Encryption Routine
The next step is to decrypt the encrypted portion of the payload. Looking through the source code, we find the encryption routine applied to encryptData (line 497).
Figure 4: XcodeGhost Encryption Routine
The last line in Figure 4 shows that the data is encrypted using DES. DES is a block cipher that takes a fixed-length 8-byte key (56 bits of which are effective) and encrypts data in fixed-size 8-byte blocks, so the input must be padded to a multiple of eight bytes. At the beginning of the code snippet, an NSString object named key is assigned the value "stringWithFormat". Interestingly enough, stringWithFormat is also the name of an NSString class method used to build strings. Because DES requires a key of fixed length, the next few lines of code pare "stringWithFormat" down to "stringWi", which is then used as the actual key. With this knowledge, we can write a Python routine to decrypt the encrypted data in the payload. Figure 5 shows example code that takes a hex stream and decrypts it.
Figure 5: XcodeGhost Decryption Routine
To use this Python routine, we copy the encrypted data from Wireshark (shown in Figure 6) and paste it into the hex_string variable in Figure 5. When doing this, remember to strip off the first eight bytes, which hold the payload size and the two static header values.
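The following is a minimal Python sketch of a routine along the lines of Figure 5, not a reproduction of it. It assumes the third-party pycryptodome package, which provides Crypto.Cipher.DES. It also assumes ECB mode, which this post does not state, so check the published source if the output looks like noise. The helper names are hypothetical. Unlike the manual step above, decrypt_payload takes the full payload and drops the 8-byte header itself.

```python
# Requires the third-party pycryptodome package (pip install pycryptodome).
# Assumption: DES in ECB mode (no IV); verify against the published source.

KEY = b"stringWi"  # "stringWithFormat" pared down to the 8-byte DES key
HEADER_LEN = 8     # bodyLen (4 bytes) + cmdlen (2) + verLen (2)
DES_BLOCK = 8      # DES operates on 8-byte blocks


def ciphertext_from_hex(hex_string: str) -> bytes:
    """Turn a hex stream copied from Wireshark into bytes and drop the
    8-byte header, leaving only the DES ciphertext."""
    raw = bytes.fromhex("".join(hex_string.split()))
    return raw[HEADER_LEN:]


def decrypt_payload(hex_string: str) -> bytes:
    """Decrypt a full payload (header included); padding is left attached."""
    # Imported here so the pure-Python helper above works without pycryptodome.
    from Crypto.Cipher import DES

    ciphertext = ciphertext_from_hex(hex_string)
    if not ciphertext or len(ciphertext) % DES_BLOCK:
        raise ValueError("ciphertext is not a whole number of 8-byte DES blocks")
    return DES.new(KEY, DES.MODE_ECB).decrypt(ciphertext)
```

Calling decrypt_payload(hex_string) returns the plaintext with its DES padding still attached. That is why extra bytes can appear after the decoded JSON.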
Figure 6: Copying payload data from Wireshark
Running the payload data through the Python routine yields a JSON object containing the device information that had been encrypted. Note that there will likely be extra bytes at the end of the JSON object, left over from the padding required by DES.
{
  "country" : "US",
  "os" : "8.3",
  "type" : "iPhone7,1",
  "app" : "LeCam_FR",
  "name" : "",
  "idfv" : "",
  "timestamp" : "1437427554",
  "bundle" : "com.arcsoft.closeli",
  "status" : "launch",
  "version" : "2.21.282",
  "language" : "en"
}
Figure 7: Decoded Result
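The leftover bytes at the end of the decrypted data are padding. The sketch below uses hypothetical helper names and assumes PKCS#7 padding, which this post does not confirm. If the padding does not validate, it falls back to trimming after the last closing brace. It also assumes the timestamp field holds Unix epoch seconds. Under that assumption, the sample value converts to a UTC time that can be lined up with other records from the same period.

```python
import json
from datetime import datetime, timezone


def strip_pkcs7(plaintext: bytes, block_size: int = 8) -> bytes:
    """Remove PKCS#7 padding, raising ValueError if it is not well-formed."""
    pad = plaintext[-1] if plaintext else 0
    if not 1 <= pad <= block_size or plaintext[-pad:] != bytes([pad]) * pad:
        raise ValueError("plaintext does not end in valid PKCS#7 padding")
    return plaintext[:-pad]


def decode_beacon(plaintext: bytes) -> dict:
    """Parse decrypted beacon bytes into a dict of device fields."""
    try:
        body = strip_pkcs7(plaintext)
    except ValueError:
        # Fallback if the padding scheme differs: keep everything up to the last '}'.
        body = plaintext[: plaintext.rfind(b"}") + 1]
    return json.loads(body.decode("utf-8"))


def beacon_time(timestamp: str) -> datetime:
    """Interpret the timestamp field as Unix epoch seconds (an assumption) in UTC."""
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc)


if __name__ == "__main__":
    sample = b'{"country" : "US", "timestamp" : "1437427554"}'
    pad = 8 - len(sample) % 8
    beacon = decode_beacon(sample + bytes([pad]) * pad)
    print(beacon["country"], beacon_time(beacon["timestamp"]).isoformat())
    # US 2015-07-20T21:25:54+00:00
```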
Comparing traffic from April 2015 with traffic from October 2015, we concluded that the payload structure and static values did not change. Knowing this, the routine above can be extended to read PCAP, search for the value 0x0065000a, parse out and decrypt the encrypted data, and repeat for the rest of the capture. Beyond the beacons we analyzed, the server responses used the same payload format and encryption routine.
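Because the header values never changed, the PCAP search can be sketched as a scan for the four bytes 00 65 00 0a. This minimal sketch assumes its input is already-reassembled stream data, for example a TCP conversation saved as raw bytes from Wireshark's Follow TCP Stream window. Reading PCAP files and reassembling segments are left to other tooling. find_payloads is a hypothetical name, and the length checks are our own sanity filters.

```python
import struct

MARKER = bytes.fromhex("0065000a")  # cmdlen 0x0065 followed by verLen 0x000a
HEADER_LEN = 8


def find_payloads(stream: bytes):
    """Yield (offset, payload) for each plausible XcodeGhost payload in stream.

    The marker sits four bytes into a payload, right after the big-endian
    bodyLen field, so every hit is sanity-checked using that length.
    """
    pos = 0
    while (hit := stream.find(MARKER, pos)) != -1:
        pos = hit + 1
        offset = hit - 4
        if offset < 0:
            continue
        (body_len,) = struct.unpack_from(">I", stream, offset)
        ciphertext_len = body_len - HEADER_LEN
        # Skip hits whose length field is implausible for a DES payload.
        if ciphertext_len <= 0 or ciphertext_len % 8 or offset + body_len > len(stream):
            continue
        yield offset, stream[offset:offset + body_len]
        pos = offset + body_len  # resume scanning after this payload
```

Each candidate can then be stripped of its 8-byte header and decrypted as described earlier. Server responses share the same format, so the same scan applies to both directions of a conversation.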
Retaining full PCAP provides valuable evidence for investigating threats that are discovered only after they have been active for some time, as XcodeGhost was. Decrypting malware communications can help identify infected hosts more quickly and determine what command and control activity took place. Network logs such as HTTP and DNS records can be very useful for determining whether malicious activity occurred, but full PCAP is often needed to understand the full extent of attacker activity.