A common theme in metaphysics (and to some extent epistemology) since antiquity is understanding the relationship between “things” and how we perceive or observe them. Examples extend from Plato’s Allegory of the Cave and theory of forms through Kant’s Transcendental Idealism to modern scientific variations such as Heisenberg’s uncertainty principle. While these views differ markedly and are certainly (in many ways) incompatible with one another, the central, fundamental theme remains the same: a tension exists between objects in pure, unadulterated reality (the “thing in itself,” to use Kantian terminology) and objects filtered through our perception (the “thing as it appears”). The very act of perception and subsequent understanding places a barrier between an object as it exists in its pure, unadulterated form and the resulting impression filtered through human sensibility and cognition.
The above discussion and its potential constraints may seem deeply irrelevant or inconsequential to our everyday lives, but for the disciplines of intelligence and information security, they carry real meaning. The classic intelligence progression follows a refinement from raw data collection, to information, to finished intelligence. When considered from the perspective of intelligence analysis and information security, the emphasis typically resides on data: different (even if overlapping) datasets will produce different (if only slightly) conclusions. This observation seems so obvious and internally consistent as to merit no further discussion – if you would like to read more of my previous thoughts on this topic, you can reference this earlier post. As a result, little attention is paid to perceptual or procedural differences in how data is examined. Given that intelligence (and information security) often deals with artifacts several times removed from actual events, such differences are critical to the examination process. Short of looking over an attacker’s shoulder for the entirety of an operation, we will always chase shadows of activity as captured in logs, malware, or network traffic.
As a result, a far more interesting and less-explored discussion than comparing different initial data sets lies in the steps between collection and finalized intelligence production. Recently, I saw a comment in a public forum to the effect that there is no difference in “visibility” between entities if the source data is the same – such as a commercially-available dataset like VirusTotal. On its face this may seem obvious and true, but under even slight scrutiny this claim falls apart and displays a complete ignorance of intelligence operations in general, and the intelligence life-cycle (with an emphasis on analysis and production) in particular.
To divert for a moment to non-cyber examples, showing a large dataset to two different individuals will not necessarily result in the same interpretation or understanding of what that dataset means. From sociology to economics to the many scientific disciplines using meta-analysis across studies, every field of statistical and observational research (at least that I am familiar with) can produce different results from essentially the same observations or source data, so long as the methodologies underlying the analysis are not identical. The reason lies in the methods used to select, analyze, and interpret such data. While it is reasonable to conclude that only one interpretation can be correct, it does not follow that any existing interpretation must be that correct assessment. Thus the various fields of inquiry continue to examine what appears to be previously explored territory, but through continued refinement of methods and adjustment of assumptions.
Shifting back to information security, the same considerations (and epistemic limitations) hold. Among other shaping factors are focus or purpose (who is my customer, what am I trying to support?) and methodology (Kill Chain analysis, the Diamond Model, structured analytic techniques, etc.). Exposed to the same data (such as the VirusTotal repository), different organizations will have different visibility into that data based on these factors. To illustrate this concretely, my current organization (Dragos) focuses exclusively on ICS-related items using Diamond Model methodology. As a result, our search, sort, and analysis criteria will differ from those of an organization with either a different focus (e.g., the financial sector or government networks) or a different methodology (such as a Kill Chain approach). While the underlying data – the “thing in itself” – remains the same, the process of analysis and refinement alters what the final “product” or result will look like. Given these circumstances, those supposing that all such observations from a shared dataset reflect the same “visibility” are either ignorant of the intelligence process or worryingly arrogant concerning their ability to derive “ultimate truth” from such data.
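To make the point tangible, here is a minimal sketch in Python. Every record, tag, and selection criterion below is invented for illustration (this is not a real VirusTotal schema or API); the point is simply that identical input data, filtered through organization-specific criteria, yields different working sets before any deeper analysis even begins.

```python
# Hypothetical illustration: two organizations query the *same* shared dataset,
# but their focus and methodology shape what each one actually "sees".
# All records, tags, and criteria are invented for this example.

SHARED_DATASET = [
    {"sha256": "aaa", "tags": ["ics", "modbus"],     "first_seen": "2018-03-01"},
    {"sha256": "bbb", "tags": ["banking", "trojan"], "first_seen": "2018-03-02"},
    {"sha256": "ccc", "tags": ["ics", "dropper"],    "first_seen": "2018-03-04"},
    {"sha256": "ddd", "tags": ["adware"],            "first_seen": "2018-03-05"},
]

def ics_focused_view(records):
    """Selection criteria for an ICS-focused organization."""
    return [r for r in records if "ics" in r["tags"]]

def financial_focused_view(records):
    """Selection criteria for a financial-sector-focused organization."""
    return [r for r in records if "banking" in r["tags"]]

# Identical input, different working sets: everything downstream
# (clustering, attribution, reporting) diverges from this point onward.
print(len(ics_focused_view(SHARED_DATASET)))        # -> 2 records
print(len(financial_focused_view(SHARED_DATASET)))  # -> 1 record
```

Neither view is wrong; each is shaped by the analytic purpose behind it, which is precisely why “same data” does not imply “same visibility.”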
An uncharitable view of the resulting situation would call up the story of the blind men and the elephant – yet a relativistic approach in intelligence, information security, or the sciences is not only undesirable, it is unacceptable. However, short of gaining universal agreement on focus, methodology, and interpretation, the scenario described above (where different entities reach differing conclusions on identical source data) is impossible to avoid. Even then, such universal agreement would be based around one particular conception or process for analyzing and perceiving data, rather than coalescing around some universal, foundational mechanism for understanding observations “in themselves,” since such a view is impossible for indirect observers of events.
While blind tolerance is unacceptable in these circumstances, options exist. For one, demonstration and explanation of methodology seem to be at least a good preliminary step in ensuring that third parties understand why a certain conclusion was reached from seemingly similar (or identical) data. Clearly stating the underlying mechanisms that produced a given analysis alerts others to biases, points of emphasis, and other possible limitations, and can inform a consumer as to what may be missing, or why one conclusion (on similar data) differs from others. Another option, seen in the sciences for decades, is continual analysis and refinement of observation. Rather than simply letting initial analysis stand, practitioners should continue refining methodologies, techniques, and other elements (especially as new evidence becomes available) to refute past conclusions, refine existing ones, or arrive at new explanations entirely.
Irrespective of the precise mechanism, thinking that different entities (with non-identical mindsets and analysis techniques) can (or should) produce the same output (i.e., “have the same visibility”) either perpetuates ignorance surrounding the intelligence process or signifies a worrying overconfidence in one’s own methodology to the exclusion of all others. From an intelligence consumer’s perspective, understanding the differences not only in source data but also in the perceptions and potential biases of intelligence producers is critical to formulating actions based upon analysis.