The DGA of CoreBot

Disclaimer

These are just unpolished notes. The content likely lacks clarity and structure, and the results might be incomplete or not adequately verified.

DGArchive

The DGA in this blog post has been implemented by the DGArchive project.

Malpedia

For more information about the malware in this blog post see the Malpedia entry on Corebot.

Recently, IBM’s Security X-Force researchers analysed and reported a new banking trojan called CoreBot. They note that CoreBot features an inactive domain generation algorithm (DGA). The DGA has since been activated as observed by Kleissner & Associates who sinkholed some of the domains.

Since I couldn’t find a description of the DGA elsewhere, here is my short write-up of it. I looked at this sample provided by @benkow_ and referenced in a Tweet of @Bry_Campbell. Here are the first 10 DGA domains from the malwr report:

	lkhylm0mhyfuhg.ddns.net 	
	s63234wluv5v365bwp5.ddns.net 	
	afe6mfy23xcxgfa.ddns.net 	
	7rsl1f34sfq0oj3jwvmfa6c.ddns.net 	
	ir7l3po0gjy8ypqjm8o.ddns.net 	
	3lgrupwdivsfm2w4kng2iha.ddns.net 	
	i8a0q2wdu8otulkfylo2gdq.ddns.net 	
	kh1her76avy0qnelivijwd1.ddns.net 	
	ubgp1f1han7lu410eh5.ddns.net 	
	uliry8knadmpmdm4wti6oro.ddns.net 	

Edit 2015-09-28: The analysed sample turned out to be a debugging exemplar. I revised the post to highlight the difference.

Configuration

The DGA is configured with the following routine init_dga_config:

Call to init_dga_config

The meanings of these values are:

  • charset_len: This is the length of the charset array containing ASCII characters used for the DGA. The actual array is initialized later (see below).
  • r: this is the random number, initialized to the hardcoded seed 1DB98930. Other samples of CoreBot use a different hardcoded seed, see Section Samples in the Wild.
  • len_l: this is the inclusive lower bound on the length of the subdomains of ddns.net.
  • len_u: this is the exclusive upper bound on the length of the subdomains of ddns.net.

The set of characters for the domains is initialized as follows:

init_rand_and_chars

This code fills the charset array with “abcdefghijklmnopqrstuvwxy012345678”. Note that “z” and “9” are missing due to an off-by-one error. This bug seems to be widespread among VXers: Necurs, Ramnit, and Ranbyus all have similar errors that lead to a missing “z”. Edit 2015-09-17: Tinba, Geodo/Emotet, and Cryptolocker also have the missing “z” problem, thanks to Daniel Plohmann for pointing that out.
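
The off-by-one is easy to reproduce in Python, where the end of a range is exclusive, much like a loop that compares with “<” instead of “<=”. The following is only a minimal sketch of the effect, not a transcription of the disassembly; the function name build_charset is made up for illustration.

```python
def build_charset():
    """Rebuild the charset with exclusive upper bounds on both ranges."""
    # range() stops before its end value, so 'z' and '9' are never added
    letters = [chr(c) for c in range(ord('a'), ord('z'))]
    digits = [chr(c) for c in range(ord('0'), ord('9'))]
    return "".join(letters + digits)

charset = build_charset()
print(charset, len(charset))
```

Using ord('z') + 1 and ord('9') + 1 as the end values would give the full 36 characters.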

Seeding

The DGA is time dependent. The time is determined by making an HTTP request to www.google.com

Request to Google

… and querying the date and time with the WinHTTP function WinHttpQueryHeaders:

Systemtime from Google’s Response Header
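
To reproduce the date that the bot would see, one can parse the Date header of an HTTP response. The malware does this through WinHTTP; the sketch below only shows the equivalent parsing step with the Python standard library. The function name and the header value are made up for illustration.

```python
from email.utils import parsedate_to_datetime

def date_from_http_header(date_header):
    """Parse an HTTP Date header value and return (year, month, day)."""
    dt = parsedate_to_datetime(date_header)
    return dt.year, dt.month, dt.day

# Hypothetical header value, as it might appear in a response
print(date_from_http_header("Tue, 08 Sep 2015 12:34:56 GMT"))  # (2015, 9, 8)
```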

My sample later overwrites the day with 8. While this could be to reduce the granularity of the DGA from days to months, it is more likely a debugging measure:

Overwriting the day

The next screenshot shows another sample that doesn’t overwrite the day. Notice that the offsets nicely line up; the two samples are equal except for the removed “day ← 8” statement.

Without debug

Apart from the year, month, and day (set to 8), there is a fourth value used for seeding. This value is stored as a configuration value core.dga.group:

reading the group

In my sample the returned value was NULL, and the group was set to 1. I have yet to see a sample that uses the core.dga.group config value.

The year, month, day (set to 8) and the core.dga.group are then applied to the random number:

init_dga_config

The above disassembly boils down to:

	r = r + year + ((group << 16) + (month << 8) | day)
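
As a worked example, here is the seeding step for the debug seed 1DB98930, September 2015, day 8 and group 1. This is a sketch under the assumption that r is a 32-bit value (hence the mask); the function name seed_dga is made up for illustration.

```python
def seed_dga(seed, year, month, day, group=1):
    """Mix date and group into the hardcoded seed, truncated to 32 bits."""
    # '+' binds tighter than '|' in Python: ((group << 16) + (month << 8)) | day
    date_part = (group << 16) + (month << 8) | day
    return (seed + year + date_part) & 0xFFFFFFFF

r = seed_dga(0x1DB98930, 2015, 9, 8)
print(hex(r))  # 0x1dba9a17
```

Because the low eight bits of (group << 16) + (month << 8) are always zero and the day is at most 31, the “|” behaves exactly like another “+” here.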

The DGA

The DGA itself is very simple. It generates up to 40 subdomains (configurable with core.dga.domains_count) using the common linear congruential generator with multiplier 1664525 and increment 1013904223:

the dga

The disassembly decompiles to:

	r = (1664525*r + 1013904223) & 0xFFFFFFFF
	domain_len = len_l + r % (len_u - len_l)
	domain = ""
	for i in range(domain_len):
		r = (1664525*r + 1013904223) & 0xFFFFFFFF
		domain += charset[r % charset_len]
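
The same steps can be wrapped in a small generator that carries r over from one domain to the next. This is a minimal sketch: the bounds 0xC and 0x18 are the values used by the Python script in this post, the names lcg and domains are made up for illustration, and the starting value is the debug seed 1DB98930 after seeding with September 2015, day 8 and group 1.

```python
CHARSET = "abcdefghijklmnopqrstuvwxy012345678"

def lcg(r):
    """One step of the linear congruential generator, modulo 2**32."""
    return (1664525 * r + 1013904223) & 0xFFFFFFFF

def domains(r, count=40, len_l=0xC, len_u=0x18):
    """Yield `count` hostnames, starting from the seeded random number r."""
    for _ in range(count):
        r = lcg(r)
        # one draw picks the length: len_l inclusive, len_u exclusive
        length = len_l + r % (len_u - len_l)
        label = []
        for _ in range(length):
            r = lcg(r)
            label.append(CHARSET[r % len(CHARSET)])
        yield "".join(label) + ".ddns.net"

for d in domains(0x1DBA9A17, count=3):
    print(d)
```

Each domain consumes one draw for the length plus one draw per character, so the sequence is fully determined by the seeded value of r.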

Python Code

The following Python code generates the domains for any given date. It takes the following arguments:

  • -s, --seed: the seed as a hex string. If none is provided, the script uses 1DBA8930.
  • -d, --date: the date for which to generate the domains, in the format YYYY-MM-DD. If none is provided, the current date is used. To get the domains of the debug sample, also pass the next option --debug.
  • -t, --debug: overwrite the day with 8 like the debug sample in this blog post does.
  • -n, --nr: number of domains to generate, default 40.

You can also find the code in my GitHub repository:

	import argparse
	from datetime import datetime

	def init_rand_and_chars(year, month, day, nr_b, r):
	    # mix the date and the group (nr_b) into the hardcoded seed
	    r = (r + year + ((nr_b << 16) + (month << 8) | day)) & 0xFFFFFFFF
	    # the exclusive upper bounds reproduce the missing "z" and "9"
	    charset = [chr(x) for x in range(ord('a'), ord('z'))] +\
	              [chr(x) for x in range(ord('0'), ord('9'))]
	    return charset, r

	def generate_domain(charset, r):
	    len_l = 0xC   # inclusive lower bound of the subdomain length
	    len_u = 0x18  # exclusive upper bound of the subdomain length
	    r = (1664525*r + 1013904223) & 0xFFFFFFFF
	    domain_len = len_l + r % (len_u - len_l)
	    domain = ""
	    for i in range(domain_len, 0, -1):
	        r = ((1664525 * r) + 1013904223) & 0xFFFFFFFF
	        domain += charset[r % len(charset)]
	    domain += ".ddns.net"
	    print(domain)
	    # return the state so the next domain continues the sequence
	    return r

	if __name__=="__main__":
	    parser = argparse.ArgumentParser()
	    parser.add_argument("-s", "--seed", help="seed", default="1DBA8930")
	    parser.add_argument("-d", "--date", help="date for which to generate domains")
	    parser.add_argument("-t", "--debug", help="debug DGA (day set to 8)",
	            action="store_true")
	    parser.add_argument("-n", "--nr", help="nr of domains to generate",
	            type=int, default=40)
	    args = parser.parse_args()

	    d = datetime.strptime(args.date, "%Y-%m-%d") if args.date else datetime.now()
	    day = 8 if args.debug else d.day

	    # the group is fixed to 1, as observed in the analysed sample
	    charset, r = init_rand_and_chars(d.year, d.month, day, 1,
	            int(args.seed, 16))
	    for _ in range(args.nr):
	        r = generate_domain(charset, r)

Samples in the Wild

The sample in this blog post (first entry in the following table) turns out to be a special case: the day is set to 8 for debugging purposes, and the seed differs slightly from the one of the “production” samples. All other samples have the same seed.

MD5                                   seed      debug
cb345ee48e811219387ffcd0d76788f2      1DB98930  yes (1)
cc09ad01ce6785d287724f2f877a91f8      1DBA8930  no
2f46770e63abd90d24031ff88b6a46f5      1DBA8930  no
34f36f4ec445755d6e24203f81e562e8      1DBA8930  no
feea363fb52213c72e5876cc8b5f8831      1DBA8930  no
5c0b4d07949be6ed2035def1e8fcd85e      1DBA8930  no
09da58404d000cad3daea72a5782bb00      1DBA8930  no
c40a5db6c20ba4316edd64d612481c41 (2)  1DBA8930  unknown (3)
67a248a56380865c85f902729b0d9944      1DBA8930  no

(1) meaning the day is set to 8.
(2) md5 sum of the JavaScript that dropped CoreBot.
(3) the sample was submitted September 8th.

Characteristics

The following table summarizes the properties of Corebot’s DGA:

property                       value
seed                           magic number and current date
granularity                    1 day
domains per seed and day       40
sequence                       sequential
wait time between domains      none
top and second level domain    .ddns.net
third level characters         lower case letters except ‘z’, digits except ‘9’
third level length             12 to 23 characters
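
These properties translate into a simple pattern for pre-filtering DNS logs. The sketch below assumes the charset and length bounds described in this post; the names COREBOT_RE and looks_like_corebot are made up for illustration. Keep in mind that plenty of legitimate dynamic DNS hostnames will match as well, so this is only a coarse filter and not a detection rule.

```python
import re

# 12 to 23 characters from a-y and 0-8, directly under ddns.net
COREBOT_RE = re.compile(r"^[a-y0-8]{12,23}\.ddns\.net$")

def looks_like_corebot(domain):
    """Return True if the hostname has the shape of a CoreBot DGA domain."""
    return COREBOT_RE.match(domain.strip().lower()) is not None

print(looks_like_corebot("lkhylm0mhyfuhg.ddns.net"))  # True
print(looks_like_corebot("example.ddns.net"))         # False
```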