The idea of naming or labeling items has a fraught intellectual history. Broadly speaking, intellectual history traces a move from an Aristotelian approach, in which names and identifications of objects fundamentally mean something based on concrete descriptions, to a perspective rooted in Kantian transcendental idealism, in which names are merely a collection of observations built into a description divorced from the “thing in itself.” Enter Saul Kripke, who argued that names instead act as “rigid designators”: in all possible worlds, a name references the same “thing”, even in those cases where some of the descriptors of that “thing” no longer apply. It was quite a revolutionary approach, and this review of Kripke’s Naming and Necessity by Richard Rorty is a nice overview of the argument for those interested.
For those who have stuck around through that introduction, I bring this up in light of recent discussions around naming and attribution within the cyber threat intelligence space. My employer, Dragos, recently began listing all the activity groups we track. The reactions I have seen range from confusion (“Wait, is this another name for LAZARUS?”) to derision (“This is just a marketing ploy”). I am sympathetic to the former and dismissive of the latter, for related reasons.
First, what’s in a name? We use a name to designate something specifically, so that we can address it directly or signify it to others. A name must therefore, to some degree, communicate not only what something is but also, implicitly, what it is not. Referring to a “cat” means I have designated some entity as a “cat”, but I have also implicitly stated that it is not a “dog”. Names are thus both inclusive (all of these things belong under this label) and exclusive (by naming something, I have excluded it from every other label of the same type). The question is: do our names in cyber security work this way?
To take a very popular and readily recognized example, let’s look at “FancyBear” and “APT28”. A common conception is that these are equivalent names for the same “thing”: Russian-sponsored (or Russian-controlled) intrusion activity targeting a variety of organizations with a fairly standardized toolkit. The names would therefore appear equivalent, since both seem to refer to the same fundamental “thing” based on the accumulation of descriptions or observations of said “thing”. Yet, given how these names are created, I don’t think this is the case.
CrowdStrike and FireEye, in assigning their respective names to observed activity, operate with different sets of observations. In some cases, CrowdStrike has event and response data to add to its description where others lack that information; in other cases, FireEye possesses such data and others do not. As a result, as both parties build their conception of the “same” group, they do so with different data sets. There may be significant overlap between them, but unless all of the entity’s operations are exactly identical across time and space, the two vendors will have observed and incorporated different information at different times, leading to broadly similar but in some ways differing conclusions or references.
Looking at names as an accumulation of experiences, then, “FancyBear” and “APT28” are different, because they do not completely share the accumulation of observations that gave rise to them. The two may have many similarities (just as “cat” and “ocelot” do) but nonetheless remain fundamentally different. Conversely, do these names meet the threshold of “rigid designators”? One would like to think so: in all possible worlds where a “Russian GRU” (or whatever ultimate organization we believe responsible for the activity) exists and possesses a cyber espionage program, a “FancyBear”/“APT28” will exist. But is that really what these names refer to?
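To make the “accumulation of observations” reading concrete, here is a minimal Python sketch, assuming a vendor’s label can be modeled as nothing more than the set of observables in that vendor’s own corpus. The observable strings are hypothetical placeholders, not real FancyBear or APT28 indicators, and Jaccard similarity is simply one convenient overlap measure chosen for illustration, not something either vendor uses to define its names.

```python
# Hypothetical observables standing in for each vendor's private corpus.
# These strings are placeholders, not real indicators for either name.
VENDOR_A_LABEL = {"implant-x", "c2-domain-1", "lure-doc-1", "victim-sector-government"}
VENDOR_B_LABEL = {"implant-x", "c2-domain-1", "credential-page-1", "victim-sector-defense"}


def jaccard(a: set, b: set) -> float:
    """Fraction of all observables that both labels share (1.0 means identical sets)."""
    union = a | b
    if not union:
        return 1.0  # two empty labels are trivially identical
    return len(a & b) / len(union)


def compare_labels(a: set, b: set) -> dict:
    """Split two labels' observation sets into shared and vendor-specific parts."""
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "similarity": jaccard(a, b),
    }


report = compare_labels(VENDOR_A_LABEL, VENDOR_B_LABEL)
for key, value in report.items():
    print(f"{key}: {value}")
```

With these made-up inputs, the two labels share two of six observables (a similarity of about 0.33): related, but not the same set, which is the sense in which the two names differ.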
No. Despite assertions to the contrary by others, these names do not ultimately describe the underlying responsible, directing agency. Instead, they encapsulate a set of observations, direct or indirect, that appear to belong to a single body of activity. Treating such a fundamentally descriptive label as a strict identification therefore results in confusion and, potentially, failure, because it ascribes greater confidence and certainty than is actually warranted.
Which brings me not just to the general problem with threat naming, but also to why I contributed to, and defend, the use of a “new” set of names for seemingly already-existing entities in my current role. Much, if not all, of the impetus for this comes from the availability of data and visibility, a point previously addressed for attribution. Where data is imperfect and motivations are impossible to discern completely, traditional “whodunit” assignment not only makes little sense, since the evidence does not support it, but may even lead to mistakes. For example, if one ties a specific piece of malware to “FancyBear” and this malware appears in an intrusion, the next step will be to assign that intrusion to the actor based on the previously made correlation. But what if “FancyBear” didn’t do it? What if, instead, someone wanted the event to look like “FancyBear”? Or what if the primary tool developer for “FancyBear” took a new job, and as a result some new or previously untracked entity, with different targets and goals, is now lumped in with “FancyBear”?
Some might think these are “stretches”, but all of the above scenarios are not only plausible but have, to differing degrees, occurred in the past. Treating a name like “APT28” as a rigid designator, when it is really just a collection of observables within one’s own corpus of data, produces a term that means far less than it could while projecting more authority than is justified.
I have used the example above not to suggest that FireEye or CrowdStrike are uniquely bad in this space. Quite the contrary: both companies have driven the threat intelligence process forward and form much of the foundation for the discipline within the private sector. But catering to a fundamental human need to tie activity to an actor, combined with a sense of authority on the subjects reported on, leads to assertions that cannot bear the weight actually placed on them. Thus we wind up with “adversaries” such as “APT28” or “LAZARUS” built on a loosely associated grouping of overlapping technical observables combined with assumptions about the assigned sponsoring nation-state’s strategic interests. This certainly makes people “feel” better, but I think it loses value when one tries to act on the data.
So, why activity groups? As the name implies, the concept is grounded in observed activity and makes no pretense of going beyond this epistemic limit. The approach first and foremost claims to know no more than what is directly observed about a given activity, and it groups these observations into entities (activity groups) based on correlations in activity across operations. Thus a single organization, such as an “APT28” or a “LAZARUS”, observationally consists of multiple activity groups, whether distinguished by tooling, targeting, or infrastructure. Likewise, different observed activity groups, such as Dragos’ “ALLANITE” and “DYMALLOY”, may truly belong to the same overall organization. But the key point is that, from directly and immediately observable data, the activities themselves are distinct.
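As a rough illustration of grouping observations into activity groups, here is a minimal Python sketch, assuming the simplest possible correlation rule: two intrusions land in the same group if they share any directly observed attribute (tooling, infrastructure, or targeting). The intrusion records, attribute strings, and ACTIVITY-GROUP-n labels are all hypothetical, the IP addresses come from documentation ranges, and this single-link rule is a stand-in for illustration, not how Dragos or anyone else actually clusters activity.

```python
from collections import defaultdict

# Hypothetical intrusion records: each maps to a set of directly observed
# attributes. All values are illustrative placeholders.
INTRUSIONS = {
    "intrusion-1": {"tool:loader-a", "infra:203.0.113.10"},
    "intrusion-2": {"tool:loader-a", "target:electric-utility"},
    "intrusion-3": {"infra:198.51.100.7", "target:oil-and-gas"},
    "intrusion-4": {"infra:198.51.100.7", "tool:webshell-b"},
    "intrusion-5": {"tool:scanner-c"},
}


def group_activity(records: dict[str, set[str]]) -> list[set[str]]:
    """Group intrusions that share at least one attribute (connected components via union-find)."""
    parent = {name: name for name in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index which intrusions exhibit each attribute, then link them together.
    seen_by_attr = defaultdict(list)
    for name, attrs in records.items():
        for attr in attrs:
            seen_by_attr[attr].append(name)
    for names in seen_by_attr.values():
        for other in names[1:]:
            union(names[0], other)

    # Collect members under their root to form the groups.
    groups = defaultdict(set)
    for name in records:
        groups[find(name)].add(name)
    return list(groups.values())


for i, members in enumerate(group_activity(INTRUSIONS), start=1):
    # The generated labels are deliberately meaningless placeholders.
    print(f"ACTIVITY-GROUP-{i}: {sorted(members)}")
```

Note that the generated labels carry no country, animal, or target connotation; they only point at a bundle of observations. In practice a single shared indicator is a weak basis for linking intrusions, so real correlation would weigh evidence far more carefully than this sketch does.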
A follower of intellectual history, in light of this essay’s opening, may think I have just introduced a regression in how we conceive and build knowledge within this space. I would argue instead that I have simply highlighted fundamental weaknesses in our ability to observe and capture all elements of an intrusion, campaign, or other attack, and that, rather than hiding or running from this uncomfortable truth, I have embraced it to develop the best mechanism for our less-than-ideal circumstances. In this case, we can devise activity groups that act simply as categorical and organizational placeholders for observed events within available data. A claim such as “LAZARUS == COVELLITE” or “CHRYSENE == OilRig” is not necessarily incorrect; rather, given the apples-to-oranges nature of the comparison, it is simply nonsensical. COVELLITE may very well be part of some overarching monstrosity of combined actions we refer to as LAZARUS, but if you’re defending electric utility operations, do you want to focus on “LAZARUS”, or just on the parts of it that appear focused on penetrating your facility? CHRYSENE and OilRig feature significant technical overlap (with some key differences), but the targeting suggests, at least, that what is tracked as CHRYSENE is one team operating under a broader banner and sharing some tools and techniques.
Essentially, an activity group is a variable name. The name itself is inherently meaningless and, by design, should NOT invoke a correlation with a country, animal, or target. It is merely a logical construct under which to group observed items from an attack, campaign, or series of intrusions for purposes of reference. That’s it. As a result, one should also expect activity groups to be non-durable. Rather than having their meaning continually refined over time as observed actions, targets, or other aspects shift, as happens with conventional naming, activity groups simply disappear, and new ones rise in their place.
For national security decision-makers and similarly situated individuals who must make concrete decisions on responsibility and culpability, the activity group concept is unhelpful. However, the audience for activity groups is not the National Security Council or NATO, but on-keyboard defenders and stakeholders who need, above all else, an understanding of what they face in an attack. By avoiding the baggage and overconfidence of classical naming schemes (whether “funny adjective plus representative noun” or “acronym plus number”), defenders can push past the potential pitfalls of “who-based” assignment (e.g., “North Korea would never do that” or “I’ll never be a target for the USA”) to simple, actionable details: when THESE behaviors are observed, they correlate highly with THIS overall grouping of activity, and THEREFORE I should anticipate THESE potential follow-on or preparatory actions.
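The defender’s reasoning chain above can be sketched as a simple profile lookup in Python, assuming each activity group is represented by a hypothetical set of correlated behaviors plus a list of actions to anticipate. The group names, behaviors, anticipated actions, and the 50% coverage threshold are all invented for illustration and do not reflect any real Dragos activity group profile.

```python
# Hypothetical activity-group profiles: behaviors that correlate with each
# group, and follow-on or preparatory actions a defender might anticipate.
PROFILES = {
    "ACTIVITY-GROUP-1": {
        "behaviors": {"vpn-credential-spray", "ics-network-scanning", "hmi-screenshot"},
        "anticipate": ["lateral movement toward engineering workstations"],
    },
    "ACTIVITY-GROUP-2": {
        "behaviors": {"watering-hole", "webshell-on-dmz"},
        "anticipate": ["credential harvesting", "remote access via webshell"],
    },
}


def anticipate(observed: set[str], min_overlap: float = 0.5) -> list[tuple[str, list[str]]]:
    """Return (group, anticipated actions) for each group whose behavior profile
    is covered by the observed behaviors at or above min_overlap."""
    matches = []
    for group, profile in PROFILES.items():
        behaviors = profile["behaviors"]
        if not behaviors:
            continue  # an empty profile cannot be matched meaningfully
        overlap = len(observed & behaviors) / len(behaviors)
        if overlap >= min_overlap:
            matches.append((group, profile["anticipate"]))
    return matches


# THESE behaviors -> THIS grouping -> THESE actions to watch for.
print(anticipate({"vpn-credential-spray", "ics-network-scanning"}))
```

Nothing in this lookup asks who is behind the activity; it only maps observed behavior to a grouping and from there to what to watch for next, which is the point of the activity group approach.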
So, was Dragos (and, by extension, I) just trying to generate marketing “buzz” with some new goofy names? No. From my perspective, which is focused on the actual users and beneficiaries of the reporting, I am attempting to shift the narrative toward something more representative of the data we have on hand, more useful to those responsible for acting on events, and free of the baggage and distraction that come from rolling a set of attribution dice. The public reaction thus far (at least from the “threat intelligence mafia”) indicates there is a lot more messaging and convincing to do. But at the operator level, I really think we’re making gains, and I think those people, and the organizations they defend, will be better for it.