# Next Version of the Bazar Loader DGA

#### Disclaimer

These are just unpolished notes. The content likely lacks clarity and structure, and the results might be incomplete or not adequately verified.

#### Aliases

The malware in this blog post is known as BazarBackdoor, Team9Backdoor, BazDor, BazarLoader and BazaLoader.

#### DGArchive

The DGA in this blog post has been implemented by the DGArchive project.

#### Cover Image

Image by falco from Pixabay

Last week, a new version of the Bazar Loader Domain Generation Algorithm (DGA) appeared. I already analyzed two previous versions, so I’m keeping this post short.

The DGA still uses the eponymous .bazar top-level domain, but the second-level domains are now 8 characters long instead of the 12 used by the previous versions:

    liybelac.bazar
    izryudew.bazar
    biymudqe.bazar
    fuicibem.bazar
    biykonem.bazar
    aqtielew.bazar
    yptaonem.bazar
    exyxtoca.bazar
    iqfisoew.bazar
    aguponew.bazar
    exogelqe.bazar
    exybonyw.bazar
    etymonac.bazar


I analyzed the following sample, which has little obfuscation. Many other samples add reverse-engineering countermeasures such as junk code, but a quick comparison revealed no functional differences.

| | |
|---|---|
| MD5 | c6502d4dd27a434167686bfa4d183e89 |
| SHA1 | bddbceefe4185693ef9015d0a535eb7e034b9ec3 |
| SHA256 | 35683ac5bbcc63eb33d552878d02ff44582161d1ea1ff969b14ea326083ea780 |
| Size | 336 KB (344576 Bytes) |
| Compile Timestamp | 2020-12-10 13:05:18 UTC |
| Links | MalwareBazaar, Malpedia, Cape, VirusTotal |
| Filenames | 1ld.3.v1.exe, 35683ac5bbcc63eb33d552878d02ff44582161d1ea1ff969b14ea326083ea780 (VirusTotal) |
| Detections | VirusTotal: 8/72 as of 2020-12-11 02:58:32 - Win64/Bazar.Y (ESET-NOD32), Backdoor.Win32.Bazdor.co (Kaspersky), Trojan.Win64.BAZALOADER.SMYAAJ-A (TrendMicro), Trojan.Win64.BAZALOADER.SMYAAJ-A (TrendMicro-HouseCall) |

Unpacking the sample leads to this:

| | |
|---|---|
| MD5 | e44cfd6ecc1ea0015c28a75964d19799 |
| SHA1 | cb294c79b5d48840382a06c4021bc2772fdbcf63 |
| SHA256 | 52e72513fe2a38707aa63fbc52dabd7c7d2c5809ed7e27f384315375426f57bf |
| Size | 96 KB (98816 Bytes) |
| Compile Timestamp | 2020-12-09 10:16:56 UTC |
| Links | MalwareBazaar, Malpedia, Cape, VirusTotal |
| Filenames | content.28641.20903.13470.9122.7127 (VirusTotal) |
| Detections | VirusTotal: 4/75 as of 2020-12-15 21:30:37 |

## Reverse Engineering

Apart from the common dynamic loading of Windows API functions and encrypted strings, Bazar Loader relies on arithmetic substitution via identities to obfuscate the code. The following identity is used particularly often:

$$a \oplus b = ({\sim}a \mathbin{\&} b) + (a \mathbin{\&} {\sim}b)$$
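A quick way to convince yourself of the identity is to test it, with the inner operator taken as bitwise AND and values limited to 32 bits. The following is a minimal sketch; the function name and the 32-bit mask are choices made for illustration and are not taken from the sample.

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit registers (an assumption for this sketch)


def xor_via_identity(a: int, b: int) -> int:
    """Compute a XOR b using only NOT, AND and addition."""
    not_a = ~a & MASK
    not_b = ~b & MASK
    # The two AND terms never share a set bit, so the addition cannot carry
    # and behaves exactly like a bitwise OR of the two terms.
    return ((not_a & b) + (a & not_b)) & MASK


rng = random.Random(0)
for _ in range(1000):
    a, b = rng.getrandbits(32), rng.getrandbits(32)
    assert xor_via_identity(a, b) == a ^ b
```

Because the addition never carries, a deobfuscator can safely rewrite such an expression back to a single XOR.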

The same obfuscation is also used by Zloader. It makes the code very hard to read. Here is a small snippet from the DGA:

The Hex-Rays decompiler also produces really messy code because the arithmetic identities are not simplified:

The DGA uses the current month and year as the seed. The seed is stored as a string, and its four ASCII characters are the basis for picking four character pairs. These four pairs are joined to form the 8 second level characters.
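As a small illustration of the seed, the sketch below follows the Python reimplementation later in this post, which formats the date as month plus four-digit year and reads the characters at positions 0, 1, 4 and 5. The helper name `seed_digits` is made up for this example.

```python
from datetime import date


def seed_digits(day: date) -> list[int]:
    """Return the four digits that drive the selection of the character pairs."""
    seed = day.strftime("%m%Y")  # e.g. "122020" for December 2020
    # Positions 0 and 1 hold the month, positions 4 and 5 the last two year digits.
    return [int(seed[i]) for i in (0, 1, 4, 5)]


print(seed_digits(date(2020, 12, 15)))  # [1, 2, 2, 0]
```

Every day of a month maps to the same four digits, which is why the set of domains only changes once per month.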

The list of character pairs is generated by calculating the Cartesian product of the consonants “bcdfghklmnpqrstvwxz” and vowels (with y) “aeiouy”. The product is calculated both ways, leading to 19·6·2 = 228 character pairs. These pairs are then concatenated, in an order determined by a hardcoded sequence of random numbers, into one large string of 456 characters:

    qeewcaacywemomedekwyuhidontoibeludsocuexvuuftyliaqydhuizuctuiqow
    agypetehfubitiaziceblaogolryykosuptaymodisahfiybyxcoleafkudarapu
    qoawyluxqagenanyoxcygyqugiutlyvegahepovyigqyqibaeqynyfkiobpeepby
    hoevmeburedeviihiravygkemywaerdonoyryqloammoseweesuvfopiriboikuz
    orruzemuulimyhceukoqiwfexuefgoycwiokitnuneroxepyanbekyixxiuqsias


The string is then encrypted using a random XOR key of the same length.
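The pair construction and the XOR step can be sketched as follows. This is only an illustration: the hardcoded shuffle order is not modelled, and the XOR key is a made-up value, not the one in the sample.

```python
from itertools import product

CONSONANTS = "bcdfghklmnpqrstvwxz"  # 19 letters, no "j"
VOWELS = "aeiouy"                   # 6 letters, "y" counted as a vowel


def all_pairs() -> list[str]:
    """Cartesian product in both orders: consonant+vowel and vowel+consonant."""
    cv = [c + v for c, v in product(CONSONANTS, VOWELS)]
    vc = [v + c for v, c in product(VOWELS, CONSONANTS)]
    return cv + vc


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR with a key of equal length; applying it twice restores the input."""
    if len(data) != len(key):
        raise ValueError("key must be as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


pairs = all_pairs()
print(len(pairs))  # 19 * 6 * 2 pairs

# The first 64 characters of the pair string quoted above.
head = "qeewcaacywemomedekwyuhidontoibeludsocuexvuuftyliaqydhuizuctuiqow"
chunks = [head[i:i + 2] for i in range(0, len(head), 2)]
print(all(chunk in pairs for chunk in chunks))

key = bytes((7 * i + 3) % 256 for i in range(len(head)))  # hypothetical key
blob = xor_bytes(head.encode(), key)
assert xor_bytes(blob, key) == head.encode()
```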

Apart from the date-based seed, the DGA also uses a standard linear congruential generator (LCG) to pick the four character pairs. The LCG is seeded with the current processor tick count and is thus unpredictable. For the first two character pairs, the random number is taken mod 19, and for the remaining two pairs mod 6. These numbers correspond to the lengths of the consonant and vowel arrays, but make no sense in this context. Because the random numbers are unpredictable, any of the 19·19·6·6 = 12996 possible combinations of character pairs could be picked. Bazar Loader generates 10,000 domains per run, but does not guarantee they are unique. On average, 6975 unique domains are expected:

$$\mathbb{E} = 12996\left( 1 - \left(\frac{12996-1}{12996}\right)^{10000} \right) \approx 6975$$
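The expectation can be checked numerically. The sketch below assumes that every domain is drawn uniformly and independently, which is the model behind the formula; the function name is chosen for illustration.

```python
def expected_unique(total: int, draws: int) -> float:
    """Expected number of distinct items after uniform draws with replacement."""
    # A given item is missed by one draw with probability (total - 1) / total.
    return total * (1 - ((total - 1) / total) ** draws)


total_domains = 19 * 19 * 6 * 6  # 12996 possible domains per month
print(expected_unique(total_domains, 10_000))
```

The result lies between 6975 and 6976, so a single run is expected to cover a bit more than half of the monthly domain set.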

Even with the short waiting time of 10 seconds between resolving domains, the malware needs to run a long time to get through the list: 10,000 domains take close to 28 hours.
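The selection step described in this section can be sketched as follows. The parameters are those of the Python reimplementation in this post. Python's `random.Random` is only a stand-in for the malware's LCG, whose constants are not given here, and the function names are made up for this example.

```python
import random

# (multiplier, modulus, seed position) for each of the four character pairs.
PARAMS = [(19, 19, 0), (19, 19, 1), (6, 6, 4), (6, 6, 5)]


def pick_indices(seed: str, rng: random.Random) -> list[int]:
    """Pick one pair index per slot: a seed-dependent base plus a random offset."""
    indices = []
    for mul, mod, pos in PARAMS:
        base = mul * int(seed[pos])
        offset = rng.getrandbits(32) % mod  # stand-in for the LCG output
        indices.append(base + offset)
    return indices


def count_unique(seed: str, draws: int, rng: random.Random) -> int:
    """Count the distinct index combinations among `draws` random picks."""
    return len({tuple(pick_indices(seed, rng)) for _ in range(draws)})


rng = random.Random()  # not seeded explicitly, like the tick-count seed
print(pick_indices("122020", rng))
print(count_unique("122020", 10_000, rng))
```

The count printed last varies from run to run but should stay close to the expected 6975.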

## Reimplementation in Python

The following Python code shows how the domains are generated:

    from itertools import product
    from datetime import datetime
    import argparse
    from collections import namedtuple

    # mul: pool entries reserved per seed digit, mod: candidates per digit,
    # idx: position of the digit in the seed string
    Param = namedtuple('Param', 'mul mod idx')

    # character pairs, two letters per entry
    pool = (
        "qeewcaacywemomedekwyuhidontoibeludsocuexvuuftyliaqydhuizuctuiqow"
        "agypetehfubitiaziceblaogolryykosuptaymodisahfiybyxcoleafkudarapu"
        "qoawyluxqagenanyoxcygyqugiutlyvegahepovyigqyqibaeqynyfkiobpeepby"
        "hoevmeburedeviihiravygkemywaerdonoyryqloammoseweesuvfopiriboikuz"
        "orruzemuulimyhceukoqiwfexuefgoycwiokitnuneroxepyanbekyixxiuqsias"
        "xoapaxmaohezwoildifaluzihipanizoecxyopguakdudyovhaumunuwsusyenko"
        "atugabiv"
    )

    def dga(date):
        # month and four-digit year, e.g. "122020" for December 2020
        seed = date.strftime("%m%Y")
        params = [
            Param(19, 19, 0),   # first digit of the month
            Param(19, 19, 1),   # second digit of the month
            Param(6, 6, 4),     # third digit of the year
            Param(6, 6, 5)      # fourth digit of the year
        ]
        ranges = []
        for p in params:
            s = int(seed[p.idx])
            lower = p.mul*s
            upper = lower + p.mod
            ranges.append(list(range(lower, upper)))

        # the malware picks randomly within each range, so enumerate
        # every possible combination
        for indices in product(*ranges):
            domain = ""
            for index in indices:
                domain += pool[index*2:index*2 + 2]
            domain += ".bazar"
            yield domain

    if __name__ == "__main__":
        parser = argparse.ArgumentParser()
        parser.add_argument(
            "-d", "--date", help="date used for seeding, e.g., 2020-06-28",
            default=datetime.now().strftime('%Y-%m-%d'))
        args = parser.parse_args()
        d = datetime.strptime(args.date, "%Y-%m-%d")
        for domain in dga(d):
            print(domain)


Here are all the domains for December 2020, January 2021, February 2021, and March 2021.

## Characteristics

The following table summarizes the properties of the new Bazar Loader DGA.

| property | value |
|---|---|
| type | TDD (time-dependent-deterministic) |
| generation scheme | arithmetic |
| seed | current date |
| domain change frequency | every month |
| unique domains per month | 12996 |
| sequence | random selection, might pick domains multiple times |
| wait time between domains | 10 seconds |
| top level domain | .bazar |
| second level characters | a-z, without j |
| regex | `[a-ik-z]{8}\.bazar` |
| second level domain length | 8 |
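The regular expression from the table can serve as a first-pass filter, for example when scanning DNS logs. The following is a minimal sketch; the function name and the sample inputs other than `liybelac.bazar` are made up for illustration.

```python
import re

# Pattern from the table: eight letters from a-z without "j", then ".bazar".
BAZAR_RE = re.compile(r"[a-ik-z]{8}\.bazar")


def looks_like_bazar_dga(domain: str) -> bool:
    """Return True if the whole name matches the pattern (case-insensitive)."""
    return BAZAR_RE.fullmatch(domain.lower()) is not None


for name in ("liybelac.bazar", "example.com", "jiybelac.bazar"):
    print(name, looks_like_bazar_dga(name))
```

The pattern is deliberately loose: it does not check the consonant and vowel pair structure or the month, so a match only means that the name is a candidate.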

## Domain to Seed

As with the previous version of the Bazar Loader DGA, the seed can be extracted from the domains. I updated my JavaScript form to also support the DGA in this blog post.
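Independently of that form, the inversion can be sketched in Python. This is not the code behind the form. It assumes the pool and parameters of the reimplementation above and that each pair's first occurrence in the pool is the relevant one; the function name is hypothetical.

```python
POOL = (
    "qeewcaacywemomedekwyuhidontoibeludsocuexvuuftyliaqydhuizuctuiqow"
    "agypetehfubitiaziceblaogolryykosuptaymodisahfiybyxcoleafkudarapu"
    "qoawyluxqagenanyoxcygyqugiutlyvegahepovyigqyqibaeqynyfkiobpeepby"
    "hoevmeburedeviihiravygkemywaerdonoyryqloammoseweesuvfopiriboikuz"
    "orruzemuulimyhceukoqiwfexuefgoycwiokitnuneroxepyanbekyixxiuqsias"
    "xoapaxmaohezwoildifaluzihipanizoecxyopguakdudyovhaumunuwsusyenko"
    "atugabiv"
)
PAIRS = [POOL[i:i + 2] for i in range(0, len(POOL), 2)]


def domain_to_seed(domain: str) -> tuple[str, str]:
    """Recover (month, last two digits of the year) from a generated domain."""
    label = domain.split(".")[0]
    if len(label) != 8:
        raise ValueError("expected an 8-character second-level domain")
    chunks = [label[i:i + 2] for i in range(0, 8, 2)]
    # Each index was built as multiplier * digit + offset with offset below
    # the multiplier, so integer division by the multiplier returns the digit.
    digits = [PAIRS.index(chunk) // mul
              for chunk, mul in zip(chunks, (19, 19, 6, 6))]
    return f"{digits[0]}{digits[1]}", f"{digits[2]}{digits[3]}"


print(domain_to_seed("liybelac.bazar"))  # ('12', '20')
```

Only the last two digits of the year are recoverable, because the first two do not influence the generated domains. A pair that is not in the pool raises `ValueError`.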