The DGA of CoreBot
These are just unpolished notes. The content likely lacks clarity and structure; and the results might not be adequately verified and/or incomplete.
The DGA in this blog post has been implemented by the DGArchive project.
For more information about the malware in this blog post see the Malpedia entry on Corebot.
Recently, IBM's Security X-Force researchers analysed and reported a new banking trojan called CoreBot. They note that CoreBot features an inactive domain generation algorithm (DGA). The DGA has since been activated as observed by Kleissner & Associates who sinkholed some of the domains.
Since I couldn't find a description of the DGA elsewhere, here is my short write-up. I looked at this sample provided by @benkow_ and referenced in a Tweet by @Bry_Campbell. Here are the first 10 DGA domains from the malwr report:
lkhylm0mhyfuhg.ddns.net
s63234wluv5v365bwp5.ddns.net
afe6mfy23xcxgfa.ddns.net
7rsl1f34sfq0oj3jwvmfa6c.ddns.net
ir7l3po0gjy8ypqjm8o.ddns.net
3lgrupwdivsfm2w4kng2iha.ddns.net
i8a0q2wdu8otulkfylo2gdq.ddns.net
kh1her76avy0qnelivijwd1.ddns.net
ubgp1f1han7lu410eh5.ddns.net
uliry8knadmpmdm4wti6oro.ddns.net
Edit 2015-09-28: The analysed sample turned out to be a debugging exemplar. I revised the post to highlight the difference.
The DGA is configured with the following routine init_dga_config:
These values have the following meaning:
- charset_len: This is the length of the charset array containing ASCII characters used for the DGA. The actual array is initialized later (see below).
- r: this is the random number, initialized to the hardcoded seed 1DB98930. Other samples of CoreBot use a different hardcoded seed, see Section Samples in the Wild.
- len_l: this is the inclusive lower bound on the length of the subdomains of ddns.net.
- len_u: this is the exclusive upper bound on the length of the subdomains of ddns.net.
The set of characters for the domains is initialized as follows:
This code fills the charset array with “abcdefghijklmnopqrstuvwxy012345678”. Note that “z” and “9” are missing due to an off-by-one error. This bug seems to be widespread among VXers: Necurs, Ramnit, and Ranbyus all have similar errors that lead to a missing “z”. Edit 2015-09-17: Tinba, Geodo/Emotet, and Cryptolocker also have the missing “z” problem, thanks to Daniel Plohmann for pointing that out.
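The off-by-one error is easy to reproduce in Python, because range() also excludes its upper bound. The following is only a minimal sketch of the effect, not the malware's code; the function name build_charset is my own.

```python
def build_charset():
    """Rebuild the DGA charset, including the off-by-one error."""
    # range() stops before its end value, so 'z' and '9' are never added
    letters = [chr(c) for c in range(ord("a"), ord("z"))]
    digits = [chr(c) for c in range(ord("0"), ord("9"))]
    return "".join(letters + digits)


charset = build_charset()
print(charset, len(charset))
```

Using ord("z") + 1 and ord("9") + 1 as the end values would give the full 36 characters.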
The DGA is time dependent. The time is determined by making an HTTP request to www.google.com…
… and querying the date and time with the WinHTTP function WinHttpQueryHeaders:
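To illustrate the idea of taking the date from a web server instead of the local clock, here is a small Python sketch. It assumes that the value of interest is the HTTP Date response header; the helper name date_from_http_header and the example header value are made up, and no request is actually sent.

```python
from email.utils import parsedate_to_datetime


def date_from_http_header(header_value):
    """Parse an HTTP Date header value and return (year, month, day)."""
    dt = parsedate_to_datetime(header_value)
    return dt.year, dt.month, dt.day


# Example header value as a web server might send it (made up for illustration)
print(date_from_http_header("Tue, 08 Sep 2015 12:00:00 GMT"))
```

Relying on a remote time source means the generated domains do not depend on a wrong or manipulated system clock on the infected machine.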
My sample later overwrites the day with 8. While this could be to reduce the granularity of the DGA from days to months, it is more likely a debugging measure:
The next screenshot shows another sample that doesn't overwrite the day. Notice that the offsets line up nicely; the two samples are equal except for the removed “day ← 8” statement.
Apart from the year, month, and day (set to 8), there is a fourth value used for seeding. This value is stored as a configuration value core.dga.group:
In my sample the returned value was NULL, and the group was set to 1. I have yet to see a sample that uses the core.dga.group config value.
The year, month, day (set to 8) and the core.dga.group are then applied to the random number:
The above disassembly boils down to:
r = r + year + ((group << 16) + (month << 8) | day)
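As a worked example, the following sketch applies the formula to the seed 1DB98930 of my sample. The date 2015-09-08 is only an example, the function name seed_dga is my own, and the 32-bit mask is an assumption taken from the Python script further down.

```python
def seed_dga(seed, year, month, day, group=1):
    """Mix the date and the group into the hardcoded seed."""
    # '+' binds tighter than '|' in Python, so the day is OR-ed in last
    date_part = (group << 16) + (month << 8) | day
    # keep the result within 32 bits
    return (seed + year + date_part) & 0xFFFFFFFF


r = seed_dga(0x1DB98930, 2015, 9, 8)
print(hex(r))  # 0x1dba9a17
```

Since the lowest byte of (group << 16) + (month << 8) is always zero and the day fits into one byte, the OR has the same effect as an addition here.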
The DGA itself is very simple. It generates up to 40 subdomains (configurable with core.dga.domains_count) using the common linear congruential generator with multiplier 1664525 and increment 1013904223:
The disassembly decompiles to:
```python
r = (1664525*r + 1013904223) & 0xFFFFFFFF
domain_len = len_l + r % (len_u - len_l)
domain = ""
for i in range(domain_len):
    r = ((1664525 * r) + 1013904223) & 0xFFFFFFFF
    domain += charset[r % charset_size]
```
The following Python code generates the domains for any given date. It takes the following arguments:
--seed: the seed as a hex string. If none is provided, the script uses 1DBA8930
--date: the date for which to generate the domains. If none is provided, the current date is used. To get the domains for the debug sample, use the next option.
--debug: overwrite the day with 8 like the debug sample in this blog post does.
--nr: number of domains to generate, default 40.
You can also find the code in my GitHub repository:
```python
import argparse
from datetime import datetime


def init_rand_and_chars(year, month, day, nr_b, r):
    # mix the date and the group (nr_b) into the hardcoded seed
    r = (r + year + ((nr_b << 16) + (month << 8) | day)) & 0xFFFFFFFF
    # range() excludes its end value, which reproduces the missing 'z' and '9'
    charset = [chr(x) for x in range(ord('a'), ord('z'))] +\
              [chr(x) for x in range(ord('0'), ord('9'))]
    return charset, r


def generate_domain(charset, r):
    len_l = 0xC   # inclusive lower bound of the subdomain length
    len_u = 0x18  # exclusive upper bound of the subdomain length
    r = (1664525*r + 1013904223) & 0xFFFFFFFF
    domain_len = len_l + r % (len_u - len_l)
    domain = ""
    for i in range(domain_len, 0, -1):
        r = ((1664525 * r) + 1013904223) & 0xFFFFFFFF
        domain += charset[r % len(charset)]
    domain += ".ddns.net"
    print(domain)
    return r


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--seed", help="seed", default="1DBA8930")
    parser.add_argument("-d", "--date", help="date for which to generate domains")
    parser.add_argument("-t", "--debug", help="debug DGA (day set to 8)",
                        action="store_true")
    parser.add_argument("-n", "--nr", help="nr of domains to generate",
                        type=int, default=40)
    args = parser.parse_args()

    d = datetime.strptime(args.date, "%Y-%m-%d") if args.date else datetime.now()
    day = 8 if args.debug else d.day
    charset, r = init_rand_and_chars(d.year, d.month, day, 1, int(args.seed, 16))
    for _ in range(args.nr):
        r = generate_domain(charset, r)
```
Samples in the Wild
The sample in this blog post (first entry in the following table) turns out to be a special case: the day is set to 8 for debugging purposes, and the seed differs slightly from that of the “productive” samples. All other samples share the same seed.
| c40a5db6c20ba4316edd64d612481c41 | 1DBA8930 | unknown |
The following table summarizes the properties of Corebot's DGA:
| Property | Value |
|---|---|
| seed | magic number and current date |
| domains per seed and day | 40 |
| wait time between domains | none |
| top and second level domain | ddns.net |
| third level characters | lower case letters except ‘z’ and digits except ‘9’ |
| third level length | 12 to 23 characters |
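These properties translate into a simple pattern for spotting candidate domains in logs. The sketch below is only a rough filter based on the charset (a to y, 0 to 8) and the length bounds described in this post: legitimate ddns.net hosts can match it as well, and the names COREBOT_PATTERN and could_be_corebot are my own.

```python
import re

# 12 to 23 characters from a-y and 0-8, followed by .ddns.net
COREBOT_PATTERN = re.compile(r"[a-y0-8]{12,23}\.ddns\.net")


def could_be_corebot(hostname):
    """Return True if the hostname has the shape of a CoreBot DGA domain."""
    return COREBOT_PATTERN.fullmatch(hostname.lower()) is not None


print(could_be_corebot("lkhylm0mhyfuhg.ddns.net"))  # True
print(could_be_corebot("example.ddns.net"))         # False
```

A match only means that a hostname is worth a closer look; to confirm it, compare it against the domains generated for the date in question.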