Diving into YarGen

Last time, we talked about how to detect malware using YARA, and how to find YARA rules to use online: Malware Detection Using YARA.

But if you can’t find YARA rules published online that suit your needs, you’ll need to write your own rules instead!

Intro to YarGen

YarGen is a tool that generates YARA rules from malware samples. It extracts the strings found in a malware file and filters out the ones that also appear in non-malicious files (“goodware”). To make this possible, YarGen ships with a large database of strings and opcodes that are known to appear in non-malicious files.
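To make the idea concrete, here is a minimal Python sketch of the approach described above: pull printable ASCII strings out of a sample, then drop any that also appear in a goodware set. The function names, the tiny GOODWARE set, and the sample bytes are all hypothetical stand-ins. YarGen's real database and filtering are far more elaborate than this.

```python
import re

# Hypothetical stand-in for YarGen's goodware string database.
GOODWARE = {"KERNEL32.dll", "GetProcAddress"}

def extract_strings(data, min_len=8):
    """Return printable ASCII runs of at least min_len characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]

def candidate_strings(data, goodware):
    """Keep only the strings that are not known from goodware."""
    return [s for s in extract_strings(data) if s not in goodware]

# Made-up sample bytes: two common import names and one suspicious URL.
sample = b"MZ\x00\x01KERNEL32.dll\x00GetProcAddress\x00http://evil.example/gate.php\x00"
print(candidate_strings(sample, GOODWARE))
```

Only the URL survives the filter here, because the two import names are in the goodware set and “MZ” is shorter than the minimum length.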

You can find YarGen on GitHub here: https://github.com/Neo23x0/yarGen

Installing YarGen

First, download the latest version of YarGen from the releases section of its GitHub page and unpack the archive. The source code is available as a zip file or a tarball.

Next, make sure you have all the dependencies installed. You can run these commands:

pip install pefile
pip install scandir lxml naiveBayesClassifier

Finally, cd into the YarGen directory and run the following command to download the built-in databases. The databases are saved into the ./dbs subdirectory.

python3 yarGen.py --update

Running YarGen

YarGen has many options for rule generation. To see the command line parameters, you can run:

python3 yarGen.py --help

To use the included database for rules generation, you can simply run the command:

python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY

This command will scan and create rules for the malware files under PATH_TO_MALWARE_DIRECTORY. A file named yargen_rules.yar will be created in the current directory, containing the rules generated.

Simple vs Super rules

A YarGen rule can be either a simple rule or a super rule.

If multiple sample files are used, YarGen will try to identify the similarities between the samples and combine the identified strings into a “super rule”.

Super rules can be identified by a line in the meta section of the rule:

super_rule = 1

The process of combining multiple rules into a single super rule does not remove the simple rules generated for each file. This means that there will be an overlap of rule strings between the simple rules and the super rule. To delete the simple rules that are covered by the super rule, you can use the --nosimple flag in your YarGen command:

python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY --nosimple

You can also suppress super rule creation by using the flag --nosuper:

python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY --nosuper
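If you are curious how the “overlap” behind super rules might work, here is a rough Python sketch. It compares the string sets of every pair of samples and keeps the pairs that share at least a minimum number of strings. The default of 5 mirrors the -w (superrule-overlap) default from the YarGen help page; the function name and the sample data are hypothetical, and YarGen's actual grouping logic may differ.

```python
from itertools import combinations

def shared_strings(samples, min_overlap=5):
    """Map each pair of sample names to the strings they share,
    keeping only pairs that share at least min_overlap strings."""
    result = {}
    for name_a, name_b in combinations(sorted(samples), 2):
        common = samples[name_a] & samples[name_b]
        if len(common) >= min_overlap:
            result[(name_a, name_b)] = common
    return result

# Made-up string sets for three samples.
samples = {
    "a.exe": {"s1", "s2", "s3"},
    "b.exe": {"s2", "s3", "s4"},
    "c.exe": {"s9"},
}
print(shared_strings(samples, min_overlap=2))
```

In this toy data only a.exe and b.exe share enough strings, so they would be the candidates for a combined rule.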

Rule creation flags

In addition to --nosimple and --nosuper, there are plenty of other flags that you can use to customize the behavior of YarGen! In particular, let’s look at the flags that are going to influence how YarGen approaches rule creation and output.

Here are all of them from the YarGen help page:

Rule Creation:
 -m M                  Path to scan for malware
 -y min-size           Minimum string length to consider (default=8)
 -z min-score          Minimum score to consider (default=0)
 -x high-scoring       Score required to set string as 'highly specific
                       string' (default: 30)
 -w superrule-overlap  Minimum number of strings that overlap to create a
                       super rule (default: 5)
 -s max-size           Maximum length to consider (default=128)
 -rc maxstrings        Maximum number of strings per rule (default=20,
                       intelligent filtering will be applied)
 --excludegood         Force the exclude all goodware strings

Rule Output:
 -o output_rule_file   Output rule file
 -e output_dir_strings
                       Output directory for string exports
 -a author             Author Name
 -r ref                Reference (can be string or text file)
 -l lic                License
 -p prefix             Prefix for the rule description
 -b identifier         Text file from which the identifier is read (default:
                       last folder name in the full path, e.g. "myRAT" if -m
                       points to /mnt/mal/myRAT)
 --score               Show the string scores as comments in the rules
 --strings             Show the string scores as comments in the rules
 --nosimple            Skip simple rule creation for files included in super
                       rules
 --nomagic             Don't include the magic header condition statement
 --nofilesize          Don't include the filesize condition statement
 -fm FM                Multiplier for the maximum 'filesize' condition value
                       (default: 3)
 --globalrule          Create global rules (improved rule set speed)
 --nosuper             Don't try to create super rules that match against various files

Specifically, let’s talk about --excludegood, --score, -rc, and -z.

YarGen gives each string a “score” based on its ability to indicate a malware file. The higher the score of a string, the higher the probability that files that contain it are malware files.

By default, YarGen does not completely remove goodware strings from rules; it includes them with a very low score. The --excludegood flag forces YarGen to exclude every string found in its goodware database.

By default, YarGen does not include these “scores” for each string in the resulting rule file. To see how each string is scored, use the --score flag to output the scores as comments in the rule file.

The -rc (maxstrings) flag specifies the maximum number of strings to include in each rule. The default number is 20, which means that each rule will include up to 20 of the highest scoring strings.

The -z (min-score) flag determines the minimum score that a string needs to have in order to be included in the rule.
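Here is a small Python sketch of how a minimum score and a maximum string count could work together. The defaults mirror the -z and -rc defaults from the help page, but the function name and the scores are made up for illustration. YarGen also applies its own “intelligent filtering”, which this sketch does not attempt to reproduce.

```python
def select_strings(scored, min_score=0, max_strings=20):
    """Return up to max_strings strings scoring at least min_score, highest first."""
    kept = [(s, score) for s, score in scored.items() if score >= min_score]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [s for s, _ in kept[:max_strings]]

# Hypothetical scores, not real YarGen output.
scored = {
    "Micorsoft Update Service": 42,
    "http://evil.example/gate.php": 35,
    "cmd.exe /c": 8,
    "GetProcAddress": -5,
}
print(select_strings(scored, min_score=5, max_strings=2))
```

Raising min_score trims weak strings first, and max_strings then caps how many of the remaining strings make it into the rule.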

Decoding the Output: yargen_rules.yar

Now that we’ve generated a few YARA rules using YarGen, let’s dive into the rules and learn how to read them!

Each YARA rule generated via YarGen is composed of three sections: meta, strings, and condition.

Example YarGen output: autogenerated ruleset for backdoor.exe

Image taken from YarGen documentation at https://github.com/Neo23x0/yarGen.

Meta Section

The “meta” section of a rule contains its description, author, reference, and date, along with the hash of the sample it was generated from. You can specify the author of the rule via the -a flag:

python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY -a "Vickie Li"

And you can specify the reference file or webpage of a rule via the -r flag:

python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY -r "https://github.com/Neo23x0/yarGen"
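The two flags can also be combined in a single run. If you script your rule generation, a small helper like the one below keeps the command tidy. This is only a sketch: build_yargen_command is a hypothetical name, and it uses only the -m, -a, and -r flags shown above.

```python
import shlex

def build_yargen_command(malware_dir, author=None, reference=None):
    """Assemble a YarGen command line as a list of arguments."""
    cmd = ["python3", "yarGen.py", "-m", malware_dir]
    if author:
        cmd += ["-a", author]
    if reference:
        cmd += ["-r", reference]
    return cmd

cmd = build_yargen_command(
    "PATH_TO_MALWARE_DIRECTORY",
    author="Vickie Li",
    reference="https://github.com/Neo23x0/yarGen",
)
# shlex.join quotes arguments with spaces so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Because the command is a list, you could also hand it to subprocess.run(cmd, check=True) from inside the YarGen directory instead of printing it.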

Strings Section

The “strings” section of a rule lists the strings that are used to identify that particular strain of malware. YarGen categorizes these strings by how likely they are to be indicators of malware. There are three categories, marked by $x, $s, and $z.

Strings that start with $x (“Highly Specific Strings”) are very specific strings that will not appear in legitimate software. These strings can include malicious server addresses, the names of hacking tools and malware, hacking tool outputs, and typos in common strings. For example, malware files sometimes contain misspelled words like “Micorsoft” or “Monnitor” when they try to masquerade as legitimate software.

Strings that start with $s (“Specific Strings”) are likely to be indicators of malware files, but might also appear in legitimate files.

Lastly, strings that start with $z are likely to be ordinary but are not currently included in the goodware string database.

Condition section

The condition of a YARA rule is a boolean expression that decides whether a file matches the rule.

Yara condition: uint16(0) == 0x5a4d and filesize < 3785KB and all of them

YarGen uses a combination of a magic header, file size, and strings for the condition section. For example, the conditions in the rule above specify that a file also needs to satisfy the following conditions to be classified as a “backdoor”:

  • The file has the magic header 0x5a4d: its first two bytes are “MZ”, the signature of a Windows executable.
  • The file is smaller than 3785 KB.
  • All the strings specified in the “strings” section must be present.
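To see what those three checks amount to, here is a rough Python equivalent of the condition. It is a sketch, not how YARA itself is implemented: matches_condition is a hypothetical helper, the sample bytes are made up, and real YARA string matching supports modifiers (such as fullword, ascii, and wide) that this plain byte search ignores.

```python
import struct

def matches_condition(data, strings, max_kb=3785):
    """Mimic: uint16(0) == 0x5a4d and filesize < 3785KB and all of them."""
    if len(data) < 2:
        return False
    # uint16(0) reads a little-endian 16-bit integer at offset 0.
    (magic,) = struct.unpack_from("<H", data, 0)
    has_mz_header = magic == 0x5A4D            # the bytes "MZ" on disk
    small_enough = len(data) < max_kb * 1024   # YARA's KB suffix means 1024 bytes
    all_present = all(s in data for s in strings)
    return has_mz_header and small_enough and all_present

# Made-up file content: an MZ header followed by one indicator string.
sample = b"MZ" + b"\x00" * 10 + b"evil-string"
print(matches_condition(sample, [b"evil-string"]))
```

Note that 0x5a4d looks reversed compared with “MZ” (0x4d, 0x5a) because the two bytes are read as a little-endian integer.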

To understand more types of conditions that can appear in YARA rules, please read the YARA documentation.

Good Luck!

You can also write YARA rules manually, but in doing that you risk writing rules that are either too specific or not specific enough. YarGen is a fast way of generating YARA rules that are both flexible and comprehensive.

Vickie Li