cover image for post 'The Defective Domain Generation Algorithm of BazarLoader'

The Defective Domain Generation Algorithm of BazarLoader

Disclaimer

These are just unpolished notes. The content likely lacks clarity and structure; and the results might not be adequately verified and/or incomplete.

Changes
  • 2020-07-19 16:33:00: link to article by Cybereason and renamed BazarBackdoor to BazarLoader
Aliases

The malware in this blog post is also known as BazarBackdoor, Team9Backdoor, BazDor, BazarLoader and BazaLoader

Malpedia

For more information about the malware in this blog post see the Malpedia entry on BazarBackdoor.

URLhaus

This page lists malware URLs that are tagged with BazarLoader.

MalwareBazaar

You can download bazaloader samples from this page.

Cover Image Image by TheUjulala from Pixabay (modified by me)

Edit 2020-07-19: Cybereason published an excellent article A Bazar of Tricks: Following Team9’s Development Cycles. The article shows that the DGA is part of Bazar Loader, which will try to download Bazar Backdoor. I therefore renamed most instances of BazarBackdoor to BazarLoader.

When I analyzed the domain generation algorithm of BazarLoader, I noticed a sample that generates bizarre “domains”:

^efggkzjhggm.bazaar
]`egkjzeggkl.bazaar
_`eigkzegigm.bazaar
^`ggilzeigin.bazaar
bceeijbhgeil.bazaar
_acgkjzfegkl.bazaar
a`gggkaeiggm.bazaar
`cehimzhghio.bazaar
``ceikzeeeim.bazaar
`edgjlzjfgjn.bazaar
_ccghjzheghl.bazaar
a`eijjaegijl.bazaar
^aegjkzfggjm.bazaar
a`geikaeieim.bazaar
_dghhkziihhm.bazaar

Two things are obviously wrong:

  1. There is no top level domain .bazaar. There is a Persian tld .بازار which translates to bazaar, but that won’t work of course.
  2. Some second level domains contain special characters which makes them invalid too.

The first error is easy to explain: the authors meant to use .bazar, which is a valid EmerDNS domain. The second mistake is more interesting. The authors must have noticed the occasional special characters too. But they probably couldn’t find the root cause and instead programmed a workaround that fixes some, but not all, characters.

Here is the sample with the broken DGA that I looked at:

MD5
18d635a8ca7caefb4f4513650a31efc9
SHA1
d555233122a277fb89797ab2293efbe2a0c75f7f
SHA256
2e99ed535a9f73bafab151ec409de04c953a0187cb8e4063317617befa09068d
Size
377 KB (386224 Bytes)
Compile Timestamp
2020-06-17 09:20:56 UTC
Links
VirusTotal
Filenames
DD45.exe, Preview_Report.exe (VirusTotal)
Detections
Virustotal: 41/76 as of 2020-07-09 13:47:45 - Trojan.Trickster.Gen (ALYac), Trojan.Win32.Mansabo.4!c (AegisLab), Trojan:Win32/Mansabo.e7acfbbd (Alibaba), Trojan/Win32.Mansabo (Antiy-AVL), Trojan.Mansabo (CAT-QuickHeal), Trojan.Win32.Mansabo.fef (Kaspersky), Trojan:Win32/Trickbot.A!Cert (Microsoft), TrojanSpy.Win64.TRICKBOT.ENJ (TrendMicro), TrojanSpy.Win64.TRICKBOT.ENJ (TrendMicro-HouseCall), Trojan.Mansabo (VBA32), Trojan.Win32.Mansabo.fef (ZoneAlarm)

The domain generation algorithm in this faulty version is the same as the one documented here. The only place that is different is shown in the following screenshot comparison. The faulty DGA is on the left, the fixed on the right. Can you spot the problem?

Step 1 of the DGA

The divisions by invariant multiplication are hard to read, but notice the right site being much shorter even tough the calculation is basically the same. This is because compiler optimization was able to strip some minor corrections that are only necessary for large numbers. Here the decompiled code after some renaming and cleaning up:

  j_1 = 0;
  i_1 = 0;
  do
  {
    r = 0;
    [...]
    bcrypt_BCryptGenRandom(0i64, &r, 4i64);
    offset_letter = i_1 + 'a';
    i_1 += 2;
    character = r % 25 / (j_1 + 6) + offset_letter;
    r = r % 25 / (j_1 + 6);
    j_2 = j_1++;
    *(szDomain + 2 * j_2) = character;
  } while ( i_1 < 12u );

This is the same code as for the fixed DGA, except for how the random numbers are generated:

  1. The faulty DGA generates 4 random bytes using a call to BCryptGenRandom.
  2. The fixed DGA generates a random value with a call to GetTickCount, and extracting the lowest 15 bits.

The problem with the first approach is, that the number will be 0x80000000 or larger in 50% of the cases. Since it is a signed number, it becomes negative. And the remainder of a negative number for a positive divisor is negative. The fixed version doesn’t have this problem, because the integer overflow does not happen. When extending the random number ranges to the negative, we get these character sets:

indexrandom number rangepotential characters
0-4–4]^_`abcde
1-3–3`abcdef
2-3–3bcdefgh
3-2–2efghi
4-2–2ghijk
5-2–2ijklm

The malware authors used the following patch instead of fixing the integer overflow.

l = 6i64;
do {
    c = *(&szSeedStr[-6] + wDomain - a2) + *(wDomain - 6) - '0';
    *wDomain = c;
    if ( c < 'a' )
      *wDomain = 'z';
    ++wDomain;
    --l;
} while ( l );

The patch is an if condition that replaces characters below “a” — that includes all special characters generated by the faulty DGA — with “z”. This resolves the problem for the last half of the second level domain (in particular, the 7th and 8th letter, the rest are not affected by the bug). However, the first half of the second level domain remains unmodified.

The following Python reimplementation generates all possible domains for a given date. Note that due to the extended random ranges, there are about 55000 domains per month instead of 2160 for the fixed version. So even if the correct tld would have been used, then the number of domains would have been a problem for the attackers — as they have no way of predicting which ones are used in what order.

import argparse
from datetime import datetime
from itertools import product


def dga(date):
    month = date.month
    year = date.year
    date_str = "{0:02d}{1:04d}".format(12-month, year-18)

    valid_chars = [
      "]^_`abcde",
      "`abcdef",
      "bcdefgh",
      "efghi",
      "ghijk",
      "ijklm"
    ]
    valid_chars = [list(_) for _ in valid_chars]
    for part1 in product(*valid_chars):
        domain = "".join(part1)
        for i, c in enumerate(part1):
            r = ord(c) + int(date_str[i])
            if r < ord('a'):
                domain += 'z'
            else:
                domain += chr(r)
        domain += ".bazaar"
        yield domain



if __name__=="__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-d", "--date", help="date when domains are generated, e.g., 2020-06-28")
    args = parser.parse_args()
    if args.date:
        d = datetime.strptime(args.date, "%Y-%m-%d")
    else:
        d = datetime.now()
    for domain in dga(d):
        print(domain)