YARA – or “yet another regex alternative” – is a pattern matching tool with many uses, but it finds its most extensive application in malware analysis and alerting. The framework itself is simple, relatively easy to understand (especially for basic string matching), and incredibly flexible. Yet in application and advertised use, YARA is often limited to signature-like matching against very specific examples of malicious software. While this is not a “bad” use per se, it artificially circumscribes what YARA allows security researchers to do and hunt for within environments. The result is both overconfidence in YARA’s capability (as implemented by researchers) and derision from other researchers who see YARA as simply re-creating the signature analysis already found in antivirus engines.

In typical application, YARA serves as another signature-based detection method for malware. An excellent overview of YARA (both in general and for this application) can be found in a recent SANS paper written by Christopher Culling. A more specific example of precise application and its consequences – one which resulted in some heated discussion – comes from late last year surrounding the TRISIS/TRITON malware. At the time, FireEye researcher Nick Carr published a YARA signature as part of this event and stressed that it was active for months without registering false positives. From the standpoint of focused detection on a specific threat, this was a very effective, highly targeted rule to catch a very specific malware instance observed in a specific, concerning attack. Yet in the very process of designing an effective, “false positive-less” detection in this manner, the rule itself is deliberately limited to catching ONLY the specific instantiation of malware observed in the TRISIS event – in this case, accepting a potentially large number of false negatives for similarly targeted software with some variation. To paraphrase a concept briefed by Matthew Dunwoody and Daniel Bohannon (which I unfortunately have only heard second-hand but hope to correct in the future), the signature was effective – but not necessarily “resilient”. As a result, the detection method applied is very effective for the specific event as observed, but likely irrelevant for variations on the attack. In essence, a signature is created for just the observed implementation of a safety system attack framework.

This leads directly to the second criticism highlighted earlier, which I most often hear from my colleague Jimmy Wylie. His critique is that YARA simply recreates malware detection already – or soon to be – present within antivirus products. Such a stance is understandable given the application of YARA in many instances, such as that highlighted in the previous paragraph, but it also dramatically underestimates the versatility of YARA for detection and alerting purposes. When applied in the “traditional” fashion, YARA simply represents an open-source, flexible competitor to traditional antivirus. While this has its uses (e.g., identifying very specific malware items that might not make broad-based AV detections, such as the TRISIS example, or adding detections before an AV signature set can be updated), the ultimate functionality – and limitations – remain the same.

Both of the above examples falter on a narrow conception of what YARA can achieve. YARA can certainly be used for very specific detections against exact malware types, families, or even samples – but doing so completely ignores the flexibility YARA affords to detect suspicious – if not outright malicious – functionality within files. From a hunting perspective, we as defenders do not wish to pursue (at least not exclusively) definitively known-bad items – presumably, other security controls (such as commercial antivirus) will already handle these. Instead, threat hunting should look for indications of possible malicious activity by searching for items indicative of a potential attack requiring further research and analysis. From a YARA perspective, this means not hunting for an exact malware signature, but instead for indicators within a file that signify possible malicious activity. While there have been some posts and potential training on this subject previously, I have been unable to find any readily accessible resources that also focus on the “hunting” aspect (as defined above).

To provide an example of what I mean by hunting with YARA, take the following, very simple rule:

import "pe"
import "time"

rule odd_creation_date
{
    meta:
        description = "Identifying times either in the distant past or future for alerting."
        author = "Joe Slowik, Dragos Inc"
    condition:
        pe.timestamp > time.now() or pe.timestamp < 946684800
}

The above just looks for portable executable files with a compilation timestamp either in the future or earlier than a specified date (here, 946684800 – 1 January 2000 UTC). On its own, this is a detection point – indicative of something suspicious, but not necessarily malicious. But as a detection point paired with contextual information, this very simple rule can become quite powerful: for example, when paired with or applied as a search following a filter for “new” binaries in the monitored environment, or binaries downloaded from unknown or previously unseen websites. Even then, applications may be limited. But as a “hunt” hypothesis – new binaries with odd timestamps are likely malicious – an analyst can begin identifying items of interest that may evade other controls such as signature alerting.
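To illustrate the same check outside of YARA, the logic can be sketched with only the Python standard library. The function names are my own, and this is a simplified sketch rather than a substitute for YARA’s PE module – but the 946684800 cutoff and the “future or pre-2000” test mirror the rule above:

```python
import struct
import time

def pe_timestamp(data):
    """Return the PE TimeDateStamp, or None if this is not a PE file."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew at offset 0x3C points to the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    # COFF header after the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    (timestamp,) = struct.unpack_from("<I", data, e_lfanew + 8)
    return timestamp

def odd_creation_date(data, now=None):
    """Flag PEs claiming compilation in the future or before 2000-01-01 UTC."""
    now = int(time.time()) if now is None else now
    ts = pe_timestamp(data)
    return ts is not None and (ts > now or ts < 946684800)
```

As with the rule itself, a hit here is a starting point for analysis – legitimate build environments occasionally produce odd timestamps as well.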

Slightly more complex, the following looks for signs of programmatic obfuscation or packing:

import "pe"
import "math"

rule suspicious_pe_entropy
{
    meta:
        description = "Identify PE sections with high levels of entropy indicating encoding."
        author = "Joe Slowik, Dragos Inc."
    condition:
        uint16(0) == 0x5a4d and pe.number_of_sections > 2 and
        for any i in (0..pe.number_of_sections - 1): (
            math.entropy(pe.sections[i].raw_data_offset,
                         pe.sections[i].raw_data_size) >= 7.4)
}

In this case, PE section entropy becomes a proxy for identifying obfuscation or packing, with the rule iterating through each section to determine whether any meet a threshold. Some legitimate software exhibits these characteristics – to prevent reverse engineering or competitor analysis – but again, as an indicator combined with other observable items, the above can serve as an initial input for hunting (or monitoring) exercises to identify new, not-yet-known malicious activity.
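The entropy threshold itself is easy to reason about outside of YARA. A minimal standard-library Python sketch of Shannon entropy – the same quantity the rule computes over a section’s raw data – shows why packed or encrypted content tends toward 8 bits per byte while repetitive data sits near zero; the 7.4 threshold is simply carried over from the rule above:

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 (constant) up to 8.0 (uniform random)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_packed(section, threshold=7.4):
    """Mirror of the rule's per-section check: entropy at or above the threshold."""
    return shannon_entropy(section) >= threshold
```

A section containing every byte value equally often scores exactly 8.0, while a run of a single byte scores 0.0; real packed sections typically land between roughly 7.2 and 8.0, which is why 7.4 works as a coarse cutoff.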

Finally, document file formats are often leveraged to deliver malicious payloads. While macros are popular, embedded ActiveX objects – such as Flash objects or PE files – can similarly be used to initiate exploitation and infiltration.

rule embedded_activex
{
    meta:
        description = "Rule to identify embedded ActiveX objects in documents."
        author = "Joe Slowik, Dragos Inc."
    strings:
        $header1 = { D0 CF 11 E0 }
        $header2 = { 50 4B 03 04 }
        $flashHeader1 = { 46 57 53 }
        $flashHeader2 = { 43 57 53 }
        $mzHeader = { 4D 5A }
        $wmfExtension = ".wmf" nocase ascii wide
        $activex = "word/activex/activex" nocase wide ascii
        $activex_reg = /word\/activeX\/activeX[1-9]\.bin/ nocase wide ascii
        $active = "active" nocase ascii wide
    condition:
        1 of ($header*) and #activex > 0 and
        (1 of ($flashHeader*) or $mzHeader or $wmfExtension) and
        #active < 170 and $activex_reg
}

Again, this rule will catch “legitimate” items – but paired with other observations and implemented as a hunting (if not alerting) technique, it can be leveraged to identify malicious document files evading traditional controls.
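To make the rule’s logic concrete, the same checks can be approximated in standard-library Python. This is a deliberately loose sketch – the container magics, the `word/activex/` path marker, and the embedded-payload headers (MZ for PE, FWS/CWS for Flash) are taken from the rule above, but without YARA’s string counting and regex constraints it will over-match even more aggressively:

```python
def maybe_embedded_activex(data):
    """Rough triage check for Office documents carrying embedded ActiveX payloads."""
    # Container: OLE compound file or ZIP (OOXML) magic bytes at offset zero.
    is_doc = data.startswith(b"\xd0\xcf\x11\xe0") or data.startswith(b"PK\x03\x04")
    # Marker: the activeX storage path OOXML uses for embedded controls.
    has_activex = b"word/activex/activex" in data.lower()
    # Payload hints: PE (MZ) or Flash (FWS/CWS) headers anywhere in the file.
    # Note: a bare two-byte "MZ" check is extremely prone to false positives.
    has_payload = any(magic in data for magic in (b"MZ", b"FWS", b"CWS"))
    return is_doc and has_activex and has_payload
```

In practice a hit from a check like this feeds a triage queue, not an alert – the point is to surface documents worth pulling apart, not to render a verdict.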

There are two complementary ideas underlying the above examples: first, YARA can be utilized to look for indicators of suspicious behavior; second, YARA need not be confined to exact signatures for known malicious activity. The result is YARA signatures that are, to some degree, “fuzzy”: on their own, they may not mean much (and could lead to a large number of false positives depending on environment – so DO NOT USE THESE ITEMS IN ISOLATION). But in concert with other, enriching data points or analysis, such YARA detections can become quite effective as leading indicators for new, not previously observed malicious activity.

Yet there’s also a third case of leveraging YARA for hunting purposes, one which takes explicit advantage of human foibles and laziness. Adversaries – even advanced ones – are human, and are either prone or required to reuse environments, development materials, and codebase from prior operations. As detailed by a former colleague of mine, Micah Yates, in his APT3/Pirpi talk at ReCon 2017, an “advanced” adversary used nearly identical functions across multiple malware variants for almost 10 years. The functions demonstrating code reuse (as well as continued use of the same compilation environment) aligned to “utility” items such as timestomping – functionality common enough to be used across various versions, and where an institution (consisting of multiple, labor-saving individuals) is likely to simply reuse what already works. The result: hunting for attackers can leverage unique implementations of “common” functions (such as socket creation or data wiping) to detect future variants that would otherwise not be detected by other security solutions. For example, the registry manipulation item associated with the “wiper” functionality in CRASHOVERRIDE/INDUSTROYER:

rule crashoverride_wiperModuleRegistry
{
    meta:
        description = "Registry wiper functionality associated with CRASHOVERRIDE"
        author = "Joe Slowik, Dragos Inc"
    strings:
        $s0 = { 8d 85 a0 ?? ?? ?? 46 50 8d 85 a0 ?? ?? ?? 68 68 0d ?? ?? 50 }
        $s1 = { 6a 02 68 78 0b ?? ?? 6a 02 50 68 b4 0d ?? ?? ff b5 98 ?? ?? ?? ff 15 04 ?? ?? ?? }
        $s2 = { 68 00 02 00 00 8d 85 a0 ?? ?? ?? 50 56 ff b5 9c ?? ?? ?? ff 15 00 ?? ?? ?? 85 c0 }
    condition:
        uint16(0) == 0x5a4d and all of them
}

This very small signature on three byte-code sequences zeroes in on very specific functionality within the CRASHOVERRIDE wiper module. The signature tracks an observed malware instance and application, but also seeks to provide longer-term utility by focusing on a distinct function deployed within the malware: remapping Windows system service registry values to null. The hypothesis behind the rule is simple: since this functionality will likely be reused by the same (or similar) adversaries, developing a function-specific detection will catch new malware types that reuse the same underlying codebase and functionality. While signature-like, this rule is designed to look for other cases or samples that re-use or similarly implement the same underlying functionality – thus creating a broader base for detection than a classic, narrowly tailored signature. What was especially interesting about this detection is that the OlympicDestroyer malware leveraged a technique that, superficially, was very similar to CRASHOVERRIDE’s wiping function: remapping critical Windows system service registry values to prevent a successful, functional reboot. Yet when analyzing how this functionality was implemented, the two look quite different – which (along with other data points) allows a researcher to discount potential technical links between the two activities.
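The `??` wildcards in the byte patterns above are what make the rule tolerant of addresses that shift between builds. A small standard-library Python sketch shows the idea by translating a YARA-style hex string into a bytes regex – a simplification, since real YARA hex strings also support jumps and alternation, which this ignores:

```python
import re

def hex_pattern_to_regex(pattern):
    """Compile a YARA-style hex string (e.g. '6a 02 ?? 50') into a bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: match any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so the wildcard also matches 0x0A, which '.' would otherwise skip.
    return re.compile(b"".join(parts), re.DOTALL)
```

Searching a buffer with `hex_pattern_to_regex("6a 02 68 78 0b ?? ??")` fires regardless of what the two wildcard bytes contain – the same property that lets $s1 above match pushed addresses that differ between compilations.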

Based on the above, security teams can leverage tools such as YARA to create flexible, malleable mechanisms for threat hunting and analysis within their environments by developing signifiers of suspicious behavior (such as timestomping, encoding, or hiding active code). This contrasts with a signature-focused approach, which is very useful for detecting known-bad items but will likely be limited when trying to identify new variants or entirely new families of malicious software. Underpinning all of this is an understanding of suspicious behaviors that track with malicious intent – what observables within files (whether pure strings, byte-code instructions, or metadata surrounding the file itself) can serve as direct or indirect cues leading to the identification of behaviors indicative of malicious activity. More information on behavior-based threat detection and analysis can be found in previous posts.

Ultimately, the above is presented not to say that signature-based applications – whether YARA or antivirus – are wrong or bad, but rather that they have both capabilities and limitations. Understanding these limits is vital in order to recognize where visibility and detection gaps may exist. Similarly, this hunting approach – especially if done in isolation – can quickly overwhelm an analyst with suspicious items to investigate. Only by pairing this approach with other observables as part of a well-structured hunting hypothesis can analysts begin arriving at valuable, actionable data, which can then be leveraged to define follow-on detection items. Thus this post is meant to highlight a technique that appears to be often overlooked, while emphasizing in closing that robust, effective security organizations will apply both approaches in defending networks.


1 Comment

Harlan Carvey · 08/17/2018 at 07:44

Thanks for this, this is great stuff.

When it comes to hunting, there is value in both tactical indicators (looking for very specific things) and strategic indicators (sweeping up general things or contents of data sources), and I think that this sort of discussion is valuable.
