Photo by Taskin Ashiq on Unsplash
Last time, we talked about how to detect malware using YARA, and how to find YARA rules to use online: Malware Detection Using YARA.
But if you can’t find YARA rules published online that suit your needs, you’ll need to write your own rules instead!
Intro to YarGen
YarGen is a tool that generates YARA rules from malware samples. It extracts the strings found in a malware file and filters out the strings that also appear in non-malicious files. To do this, YarGen ships with a large database of strings and opcodes that are known to appear in non-malicious files.
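To make the idea concrete, here is a minimal Python sketch of "extract strings, then drop the known-good ones". It is not YarGen's actual code: the function names, the sample bytes, and the tiny goodware set are all made up for illustration, and it simply discards goodware strings, whereas YarGen by default keeps them with a low score.

```python
import re

def extract_strings(data, min_len=8):
    """Return the set of printable ASCII runs of at least min_len characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return {match.decode("ascii") for match in pattern.findall(data)}

def candidate_strings(sample, goodware, min_len=8):
    """Strings found in the sample that are not in the goodware set."""
    return extract_strings(sample, min_len) - goodware

# Hypothetical sample bytes and a hypothetical one-entry goodware database.
sample = b"\x00\x01GetProcAddress\x00\x02evil-c2.example.test/gate.php\x00short\x00"
goodware = {"GetProcAddress"}

print(candidate_strings(sample, goodware))
```

Only the string that is long enough and absent from the goodware set survives as a rule candidate.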
You can find YarGen on GitHub here: https://github.com/Neo23x0/yarGen
Installing YarGen
First, download the latest version of YarGen from the releases section of its GitHub page and unpack the archive. The source code is available as a zip file or a tarball.
Next, make sure you have all the dependencies installed. You can run these commands:
pip install pefile
pip install scandir lxml naiveBayesClassifier
Finally, cd into the YarGen directory and run the following command to download the built-in databases.
The databases are saved into the ./dbs subdirectory.
python3 yarGen.py --update
Running YarGen
YarGen has many options for rule generation. To see the command line parameters, you can run:
python3 yarGen.py --help
To use the included database for rules generation, you can simply run the command:
python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY
This command will scan and create rules for the malware files under PATH_TO_MALWARE_DIRECTORY.
A file named yargen_rules.yar will be created in the current directory, containing the generated rules.
Simple vs Super rules
A YarGen rule can be either a simple rule or a super rule.
If multiple sample files are used, YarGen will try to identify the similarities between the samples and combine the identified strings into a “super rule”.
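As a rough illustration of the idea, the sketch below compares the string sets of each pair of samples and reports a super rule candidate when enough strings overlap. This is a simplification and not YarGen's real algorithm: the pairwise comparison, the function name, and the sample data are assumptions. The threshold of 5 mirrors the default that YarGen's help lists for the -w option.

```python
from itertools import combinations

def super_rule_candidates(sample_strings, min_overlap=5):
    """Map each pair of sample names to their shared strings,
    keeping only pairs that share at least min_overlap strings."""
    result = {}
    for a, b in combinations(sorted(sample_strings), 2):
        shared = sample_strings[a] & sample_strings[b]
        if len(shared) >= min_overlap:
            result[(a, b)] = shared
    return result

# Hypothetical strings extracted from three hypothetical samples.
samples = {
    "a.exe": {"s1", "s2", "s3", "s4", "s5", "only_a"},
    "b.exe": {"s1", "s2", "s3", "s4", "s5", "only_b"},
    "c.exe": {"s1", "s2", "other"},
}

for pair, shared in super_rule_candidates(samples).items():
    print(pair, sorted(shared))
```

Here only a.exe and b.exe share enough strings to qualify; c.exe overlaps with them on just two strings.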
Super rules can be identified by a line in the meta section of the rule:
super_rule = 1
The process of combining multiple rules into a single super rule does not remove the simple rules generated for each file.
This means that there will be an overlap of rule strings between the simple rules and the super rule.
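Since simple rules and super rules end up side by side in the same output file, it can help to list which is which. The sketch below does that by looking for the super_rule = 1 line inside each rule. It is a line-based approximation, not a real YARA parser, and the rule text is a made-up example rather than real YarGen output.

```python
import re

RULE_START = re.compile(r"^\s*rule\s+(\w+)")
SUPER_FLAG = re.compile(r"^\s*super_rule\s*=\s*1\b")

def split_rule_names(rule_text):
    """Return (super_rules, simple_rules) as lists of rule names."""
    names, supers = [], set()
    current = None
    for line in rule_text.splitlines():
        start = RULE_START.match(line)
        if start:
            current = start.group(1)
            names.append(current)
        elif current and SUPER_FLAG.match(line):
            supers.add(current)
    super_rules = [n for n in names if n in supers]
    simple_rules = [n for n in names if n not in supers]
    return super_rules, simple_rules

# Hypothetical rule file contents.
example = """
rule sample_a {
   meta:
      description = "hypothetical simple rule"
   strings:
      $s1 = "placeholder" ascii
   condition:
      all of them
}

rule sample_a_sample_b {
   meta:
      description = "hypothetical super rule"
      super_rule = 1
   strings:
      $s1 = "placeholder" ascii
   condition:
      all of them
}
"""

print(split_rule_names(example))
```

To run it against real output, you could pass in the contents of yargen_rules.yar instead of the example string.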
To delete the simple rules that are covered by the super rule, you can use the --nosimple flag in your YarGen command:
python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY --nosimple
You can also suppress super rule creation by using the --nosuper flag:
python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY --nosuper
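If you run YarGen from a script, you might assemble these commands programmatically. Here is a small sketch of that. The helper name and its parameters are hypothetical, and it only uses the flags shown above (-m, --nosimple, and --nosuper).

```python
import shlex

def build_yargen_command(malware_dir, nosimple=False, nosuper=False, script="yarGen.py"):
    """Build the argument list for a YarGen run (hypothetical helper)."""
    cmd = ["python3", script, "-m", malware_dir]
    if nosimple:
        cmd.append("--nosimple")
    if nosuper:
        cmd.append("--nosuper")
    return cmd

cmd = build_yargen_command("PATH_TO_MALWARE_DIRECTORY", nosimple=True)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from inside the YarGen directory.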
Rule creation flags
In addition to --nosimple and --nosuper, there are plenty of other flags that you can use to customize the behavior of YarGen!
In particular, let’s look at the flags that are going to influence how YarGen approaches rule creation and output.
Here are all of them from the YarGen help page:
Rule Creation:
  -m M                  Path to scan for malware
  -y min-size           Minimum string length to consider (default=8)
  -z min-score          Minimum score to consider (default=0)
  -x high-scoring       Score required to set string as 'highly specific
                        string' (default: 30)
  -w superrule-overlap  Minimum number of strings that overlap to create a
                        super rule (default: 5)
  -s max-size           Maximum length to consider (default=128)
  -rc maxstrings        Maximum number of strings per rule (default=20,
                        intelligent filtering will be applied)
  --excludegood         Force the exclude all goodware strings

Rule Output:
  -o output_rule_file   Output rule file
  -e output_dir_strings
                        Output directory for string exports
  -a author             Author Name
  -r ref                Reference (can be string or text file)
  -l lic                License
  -p prefix             Prefix for the rule description
  -b identifier         Text file from which the identifier is read (default:
                        last folder name in the full path, e.g. "myRAT" if -m
                        points to /mnt/mal/myRAT)
  --score               Show the string scores as comments in the rules
  --strings             Show the string scores as comments in the rules
  --nosimple            Skip simple rule creation for files included in super
                        rules
  --nomagic             Don't include the magic header condition statement
  --nofilesize          Don't include the filesize condition statement
  -fm FM                Multiplier for the maximum 'filesize' condition value
                        (default: 3)
  --globalrule          Create global rules (improved rule set speed)
  --nosuper             Don't try to create super rules that match against
                        various files
Specifically, let’s talk about --excludegood, --score, -rc, and -z.
YarGen gives each string a “score” based on its ability to indicate a malware file. The higher the score of a string, the higher the probability that files that contain it are malware files.
YarGen also does not completely remove the goodware strings from rules but includes them with a very low score.
The --excludegood flag forces YarGen to exclude all of the goodware strings found in the YarGen database.
By default, YarGen does not include these “scores” for each string in the resulting rule file.
To see how each string is scored, use the --score flag to output the scores as comments in the rule file.
The -rc (maxstrings) flag specifies the maximum number of strings to include in each rule.
The default number is 20, which means that each rule will include up to 20 of the highest scoring strings.
The -z (min-score) flag determines the minimum score that a string needs to have in order to be included in the rule.
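Here is a minimal sketch of how a minimum score and a maximum string count could work together. It is only an approximation: YarGen's help mentions that "intelligent filtering" is applied, which this sketch does not attempt to reproduce. The function name, the scores, and the choice to treat the minimum as inclusive are assumptions.

```python
def select_strings(scored, min_score=0, max_strings=20):
    """Keep strings scoring at least min_score, then take the
    max_strings highest scoring ones (ties broken alphabetically)."""
    kept = [(s, score) for s, score in scored.items() if score >= min_score]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept[:max_strings]

# Hypothetical strings with hypothetical scores.
scored = {
    "evil-c2.example.test/gate.php": 45,
    "Micorsoft Update": 30,
    "GetProcAddress": -5,
    "kernel32.dll": -10,
}

for string, score in select_strings(scored, min_score=0, max_strings=2):
    print(score, string)
```

Raising min_score makes the rule stricter about what counts as an indicator, while lowering max_strings keeps each rule shorter.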
Decoding the Output: yargen_rules.yar
Now that we’ve generated a few YARA rules using YarGen, let’s dive into the rules and learn how to read them!
Each YARA rule generated via YarGen is composed of three sections: meta, strings, and condition.
Image taken from YarGen documentation at https://github.com/Neo23x0/yarGen.
Meta Section
The “meta” section of a rule contains the description, author, reference, date, and hash of the rule.
You can specify the author of the rule via the -a flag:
python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY -a "Vickie Li"
And you can specify the reference file or webpage of a rule via the -r flag:
python3 yarGen.py -m PATH_TO_MALWARE_DIRECTORY -r "https://github.com/Neo23x0/yarGen"
Strings Section
The “strings” section of a rule specifies the strings that are used to identify that particular strain of malware.
YarGen categorizes these strings based on how likely they are to be indicators of malware.
There are three categories of strings, marked by the prefixes $s, $x, and $z.
Strings that start with $s (“Highly Specific Strings”) are very specific strings that will not appear in legitimate software.
These strings can include malicious server addresses, the names of hacking tools and malware, hacking tool outputs, and typos in common strings.
For example, malware files sometimes contain misspelled words like “Micorsoft” or “Monnitor” when they try to masquerade as legitimate software.
Strings that start with $x (“Specific Strings”) are likely to be indicators of malware files, but might also appear in legitimate files.
Lastly, strings that start with $z are likely to be ordinary but are not currently included in the goodware string database.
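When reviewing a generated rule, it can be handy to group its strings by prefix. The sketch below does this with a simple regular expression. It is a rough line-based approach rather than a full YARA parser, and the strings section shown is a made-up example, not real YarGen output.

```python
import re
from collections import defaultdict

# Matches lines such as:   $s1 = "some string" fullword ascii
STRING_LINE = re.compile(r'^\s*\$([a-z]+)\d*\s*=\s*(.+?)\s*$')

def group_by_prefix(rule_text):
    """Group string definitions by their prefix ($s, $x, $z, ...)."""
    groups = defaultdict(list)
    for line in rule_text.splitlines():
        match = STRING_LINE.match(line)
        if match:
            groups["$" + match.group(1)].append(match.group(2))
    return dict(groups)

# Hypothetical strings section.
example_strings = """
   strings:
      $s1 = "Micorsoft Update Service" fullword ascii
      $s2 = "Monnitor" fullword ascii
      $x1 = "cmd.exe /c del" fullword ascii
      $z1 = "GetCurrentThreadId" fullword ascii
"""

for prefix, definitions in group_by_prefix(example_strings).items():
    print(prefix, len(definitions))
```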
Condition section
The condition section of a YARA rule is a boolean expression that determines whether a file matches the rule.
YarGen uses a combination of a magic header, file size, and strings for the condition section. For example, the conditions in the rule above specify that a file also needs to satisfy the following conditions to be classified as a “backdoor”:
- The file has the magic header of 0x5a4d.
- The file is smaller than 3785 kB.
- All the strings specified in the “strings” section must be present.
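To see what these three checks amount to, here is a rough Python equivalent. It is a sketch under a few assumptions: the magic header is read as a little-endian 16-bit value at offset 0 (so 0x5a4d corresponds to the bytes "MZ"), one kilobyte is taken as 1024 bytes as in YARA's KB suffix, and string modifiers such as fullword or wide are ignored. The function name and test data are made up.

```python
def matches_condition(data, strings, max_kb=3785):
    """Approximate a 'magic header + filesize + all strings' condition."""
    # Little-endian 16-bit value at offset 0; 0x5A4D is the bytes b"MZ".
    has_mz_header = int.from_bytes(data[:2], "little") == 0x5A4D
    small_enough = len(data) < max_kb * 1024
    all_present = all(s in data for s in strings)
    return has_mz_header and small_enough and all_present

# Hypothetical file contents and a hypothetical rule string.
pe_like = b"MZ" + b"\x00" * 16 + b"backdoor-marker"
zip_like = b"PK" + b"\x00" * 16 + b"backdoor-marker"

print(matches_condition(pe_like, [b"backdoor-marker"]))   # header, size, and string all match
print(matches_condition(zip_like, [b"backdoor-marker"]))  # wrong magic header
```

The second file contains the string but fails the magic header check, so it would not be classified by the rule.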
To understand more types of conditions that can appear in YARA rules, please read the YARA documentation.
Good Luck!
You can also write YARA rules manually, but in doing that you risk writing rules that are either too specific or not specific enough. YarGen is a fast way of generating YARA rules that are both flexible and comprehensive.