The Attribution Problem – Using PAI to Improve Actor Attribution

May 16, 2019

By Brian Pate, SVP, Babel Street

Within the cyber community, conventional wisdom is that malicious actors can carry out attacks while hiding their true identities. Historically, analysts and investigators have predominantly focused attribution efforts on technical attack aspects, such as digital forensics, malware analysis, and signature analysis. That we’ve yet to fully develop our capabilities or focus efforts at the personal level makes sense, given the technical backgrounds of analysts, the traditional reliance on technical indicators of compromise, and the difficulty of analyzing the volume of publicly available information (PAI). But by applying advanced tools and analysis to PAI, including deep and dark web data, we can begin to deny malicious actors the cloak of anonymity. Moreover, as sophisticated actors increasingly “live off the land,” repurpose commodity malware, and use cloud infrastructure to continuously change IP addresses, technically focused attribution is becoming less effective. Therefore, it’s imperative that we build up our PAI capabilities now.

What is attribution?

Broadly speaking, the objective of attribution is to move from an attack’s technical observables or related digital personas to the true identity of an individual malicious actor, or actors, whether they be nation-state-sponsored actors, ideologically motivated actors, or criminals. But while the ultimate objective is a real name and location with a high degree of confidence, it’s useful to think of attribution along a spectrum of confidence, with confidence increasing as we gather identifiers that can be used to gain valuable insights about the threat.

At a basic level, PAI analysis finds and links known attack indicators, uncovers unknown indicators, and can yield locations, online handles, email addresses, offline aliases, and affiliations. These, in turn, can often be linked to quasi-identifiers (QIDs), such as gender, age, and date of birth, that are contained in social media metadata. Taken individually, none of these indicators is likely to return an identity with a high degree of confidence. But as operators unearth and analyze more leads from PAI sources, each identifier becomes a valuable marker on the road to attribution.

Over time, as operators compile multiple layers of PAI analysis, confidence in the attribution grows, as does the ability to develop insights that come from advanced attribution. Here, operators will begin to discover a malicious actor’s associates as well as their aliases. Creating something akin to a social graph, operators can see the context in which a threat operates. That context makes it possible to begin to determine what a malicious actor’s current activities might be, as well as their planned activities. Moreover, with context, we can make assessments regarding the malicious actor’s capabilities, skill level, strengths, and vulnerabilities. The more context we can create, the greater our degree of confidence.
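To make the social graph idea concrete, here is a minimal sketch in Python. It is not a description of any particular tool; every identifier, the `LINKS` data, and the function names are hypothetical and invented for illustration. It assumes each observation is simply a pair of identifiers seen together in some public source, and it uses a breadth-first search to list everything reachable from a starting identifier along with how many hops away it is.

```python
from collections import defaultdict, deque

# Hypothetical observations: each pair means two identifiers were seen
# together in the same public source. All values are made up.
LINKS = [
    ("email:actor@example.com", "handle:dark_fox"),
    ("handle:dark_fox", "forum:example-board"),
    ("handle:dark_fox", "alias:D. Fox"),
    ("email:other@example.org", "handle:quiet_owl"),
]


def build_graph(links):
    """Build an undirected adjacency map from identifier pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def linked_identifiers(graph, seed):
    """Breadth-first search: map each reachable identifier to its hop count."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


if __name__ == "__main__":
    graph = build_graph(LINKS)
    for identifier, distance in linked_identifiers(graph, "email:actor@example.com").items():
        print(distance, identifier)
```

In this toy data, the second email address and its handle never appear because nothing links them to the seed. Hop count is only a rough stand-in for confidence; a real analysis would also weigh how reliable each source is.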

Why is attribution important?

Anonymity is a tremendous asset for malicious actors because it gives them freedom of maneuver. Hostile nation states and malicious actors can mask their attacks or run false flag operations to discredit competitors. Meanwhile, criminals can steal millions of dollars, secure in the knowledge that they’ll never face justice.

By denying malicious actors their anonymity, we gain several advantages. We make it more difficult for state actors to carry out their own attacks or use proxies. Along similar lines, attribution can deny malicious actors the freedom of maneuver they need to operate in cyberspace. In the case of cybercriminals, attribution is the prerequisite to assigning legal liability and, where possible, mounting successful prosecutions.

Attribution, even partial attribution, also provides key operational advantages to the defender. The more defenders know about a threat, including the actor’s identity, the better their ability to mount an effective defense. For example, a defender who knows with a high degree of confidence that a malicious actor has the capability and interest to develop PHP exploits against e-commerce portals can target vulnerability identification efforts, expedite patching and configuration hardening, conduct deliberate scans for breaches, and generally act to mitigate the threat. This improves a CSO’s ability to manage risk and make more informed risk decisions.

By achieving attribution with a high level of confidence, properly authorized organizations can also take proactive countermeasures, where appropriate. One possible countermeasure might be infiltrating a forum to dox the malicious actor, burn their alias, or sow discord within the threat community. Another possible countermeasure might be to carry out a hack of the malicious actor’s system to either steal the tools they need to carry out their attack or insert malware in order to destroy or disrupt their system.

Finally, it’s important to point out that attribution brings much-needed transparency to the cyber ecosystem. Over time, that transparency can have a deterrent effect on state actors and criminals, forcing them to consider whether carrying out their malicious attacks is worth the price of compromising their identities. Certainly, some attacks will always be worth the cost for malicious attackers, but by using PAI to deny those malicious actors the certainty of anonymity, we can deter and disrupt attacks at scale.

The problem

While there are myriad methods for attribution, it’s useful to think in terms of two general categories. One category begins with a persona search seed, such as an email address, social media handle or username. The second category begins with a technical indicator, such as a code snippet, registry value, IP address or domain name. I’ll discuss each separately, but in a real-world scenario, both methods typically work in tandem.

A persona search seed can be a valuable lead for querying a variety of sources. Those sources can run the gamut from the open internet to the deep web, to the dark web. Forums, social media, and news sources can often yield artifacts that further the investigation. Ultimately, each artifact increases the investigator’s ability to correlate the information they find, allowing them to drive toward a malicious actor’s true identity with a high level of confidence.

Of course, using a persona search seed can feel a lot like looking for a needle in a haystack or, more accurately, multiple needles in seemingly unrelated haystacks. But even the most sophisticated malicious actors are susceptible to unmasking because they frequently must use public-facing personas, such as email addresses, to launch their attacks. Furthermore, while a sophisticated actor may practice strong tradecraft, their associates may not. By building context with PAI, investigators can expose the weakest link in the chain and then exploit that advantage to develop attribution of the primary threat actor. Finally, it’s important to note that many malicious actors, especially criminals, practice sloppy tradecraft. In many cases, criminals boast about their exploits, and those boasts are often made in time-stamped forums that allow investigators to infer their approximate location, intentions, associations, and patterns of life.

Of course, a technical indicator, such as a snippet of code, a malware sample, or an IP address, can also be a starting point that leads to attribution. Sometimes, a simple file name might provide an artifact that can be used to run a PAI query. Other times, paste bins and other publicly accessible documents that support technical collaboration also contain email addresses or handles that can be used in a PAI query. Increasingly, as malicious actors repurpose commodity malware coupled with novel, public-facing command and control infrastructures, they’re more likely to leave artifacts useful for attribution sprinkled throughout the attack framework. With the right tools and methods, analysts can follow these technical leads to improve attribution.
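As a rough illustration of pulling search seeds out of a technical artifact, the sketch below scans a block of text, such as the contents of a paste, for email addresses and IPv4-looking strings. It is a simplified example in Python rather than a production extractor: the regular expressions are deliberately loose, the `SAMPLE` text is invented, and the function name `extract_seeds` is hypothetical.

```python
import re

# Deliberately simple patterns; real-world extraction needs stricter rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches four dot-separated groups of 1-3 digits; octet ranges are not checked.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_seeds(text):
    """Return de-duplicated, sorted candidate seeds found in the text."""
    emails = sorted({match.lower() for match in EMAIL_RE.findall(text)})
    ips = sorted(set(IPV4_RE.findall(text)))
    return {"emails": emails, "ipv4": ips}


# Invented sample text using documentation-only domains and address ranges.
SAMPLE = """
# build notes, contact: Dev.Ops@example.com
C2 fallback 203.0.113.7, mirror at 198.51.100.23
questions -> dev.ops@example.com
"""

if __name__ == "__main__":
    print(extract_seeds(SAMPLE))
```

Each value returned this way is only a candidate lead. It still has to be checked against other sources before it says anything about who is behind an attack.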

Whether used separately or in tandem, both approaches can provide valuable starting points for PAI inquiries. In time, as investigators assemble more artifacts and build a context around their targets, pulling QIDs from associated metadata, the inquiry can reveal the location, associates, aliases, and true identities of malicious actors. Just as important, artifacts can also provide insight into past, present, and future activities.

Use case #1: Oleksandr Ieremenko

In 2016, the U.S. Department of Justice secured a guilty plea from a New Jersey man who was part of a complex insider trading scheme that exploited confidential information stolen from three separate business news wire services. Prosecutors alleged that the scheme netted tens of millions in illegal profits. The indictment also identified the still-at-large technical mastermind who hacked into the business wires: a Ukrainian citizen named Oleksandr Ieremenko.

Mining the indictment for names, email addresses, online handles and other identifiers, we were able to run a series of PAI queries just on Ieremenko. Obviously, we already had a true identity to go on, but the query was fruitful for several reasons. First, we learned a lot about Ieremenko’s associates. While these malicious actors weren’t indicted, learning who they were and where they operated gave us a better context for understanding the types of malware and tools Ieremenko typically sought to secure. In turn, that context told us a lot about Ieremenko’s skills, capabilities, and past targets. We were even able to assess, with a high degree of confidence, what sorts of targets and schemes Ieremenko was working on before, during and after the indictment.

Use case #2: The Iranian Professor

Using an email address associated with a spear-phishing campaign, we ran a PAI query. As it turned out, this hacker employed sloppy tradecraft by failing to more fully obfuscate QIDs associated with the creation of the email address. While sloppy tradecraft may sound like a lucky break, the key point is that hackers are human. They make human mistakes because they’re lazy, careless, poorly trained, pressed for time, etc. These mistakes leave behind artifacts that analysts and investigators can exploit.

In any case, our PAI query told us that the hacker was an Iranian professor. With her true identity, we were able to discover her location and associates and to develop information about her past, present, and future activities. Just as important, we were able to reduce the likelihood that we were meant to discover her true identity as part of a false flag operation.

Unfortunately, this hacker hasn’t been brought to justice and likely won’t be. Nevertheless, knowledge of her identity, area of operation, skill level, and modus operandi provides a powerful check on her operations going forward.

Conclusion

Attribution is an important component of full-spectrum cyber operations, and as attribution using purely technical methods becomes more difficult, we should improve our ability to use PAI to further the objectives of advanced attribution. While there is no single, silver-bullet solution for ending malicious cyber activity, by making life more difficult for malicious actors, we can more easily disrupt and deter the threats they pose. Just as important, as we learn the names of malicious hackers around the world and increase what we know about them, we shine a light into cyberspace that makes it easier to parse the signal from the noise. Knowing who poses the threat, where they’re attacking from, what they intend to do, and the extent of their capabilities can boost our defenses and countermeasures immediately, and in the long run, inform the framework we need to build to address a range of threats.

About the Author

Brian currently serves as the SVP for Babel Street’s Federal Civil business, where he is responsible for Babel Street customer engagements spanning the Executive and Legislative Branches, to include the Departments of Justice, State, and Homeland Security. Prior to this position, Brian served as the Current Operations officer at Marine Forces Cyberspace Command and Global Plans lead at Joint Task Force Ares, where he was responsible for planning and executing full spectrum, global cyberspace operations. In this capacity, he also ran several 24/7 operations centers responsible for crisis planning and crisis response. Brian is a graduate of Georgetown University’s School of Foreign Service and several advanced military courses. He is a member of the SANS GIAC Advisory Board and a board member of the Capitol Hill Community Foundation.