Trojan Source attack method allows hiding flaws in source code

November 3, 2021

Researchers devised a new attack method called ‘Trojan Source’ that allows attackers to hide vulnerabilities in the source code of a software project.

Trojan Source is a new attack technique demonstrated by a group of Cambridge researchers that can allow threat actors to hide vulnerabilities in the source code of a software project.

The technique could be exploited to inject stealthy malicious logic: the source code appears unchanged to a human reviewer, while the logic the compiler actually processes is different.

“We present a new type of attack in which source code is maliciously encoded so that it appears different to a compiler and to the human eye. This attack exploits subtleties in text-encoding standards such as Unicode to produce source code whose tokens are logically encoded in a different order from the one in which they are displayed, leading to vulnerabilities that cannot be perceived directly by human code reviewers,” reads the paper published by the experts.

Trojan Source attacks pose a severe risk to software organizations and could allow supply-chain attacks across the industry.

The researchers exploited two vulnerabilities, tracked as CVE-2021-42574 and CVE-2021-42694, that affect compilers of most popular programming languages, including C, C++, C#, Go, Java, JavaScript, Python, and Rust.

The researchers discovered ways of manipulating the encoding of source code files so that human viewers and compilers see different logic. One of these techniques leverages Unicode directionality override characters to display code as an anagram of its true logic.

The issue concerns Unicode’s bidirectional (or Bidi) algorithm, which supports text in both left-to-right (e.g., English) and right-to-left (e.g., Arabic or Hebrew) languages.
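As a minimal illustration of the directionality information the Bidi algorithm works with, the Python sketch below looks up the bidirectional class that the Unicode database assigns to a few characters. The helper name bidi_class is hypothetical; unicodedata.bidirectional is part of the Python standard library.

```python
import unicodedata


def bidi_class(ch):
    """Return the Unicode bidirectional class of a single character."""
    return unicodedata.bidirectional(ch)


# A few sample characters, written as escapes so nothing is hidden on screen.
samples = {
    "Latin letter a": "a",
    "Hebrew letter alef": "\u05d0",
    "Arabic letter alef": "\u0627",
    "Right-to-left override": "\u202e",
    "Left-to-right isolate": "\u2066",
}

for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X} has class {bidi_class(ch)}")
```

Ordinary letters carry a direction of their own, while the override and isolate characters are invisible controls that force a direction on the text around them.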

“Bidi overrides will typically cause a cursor to jump positions on a line when using arrow keys to click through tokens, or to highlight a line of text character-by-character. This is an artifact of the effect of the logical ordering of tokens on many operating systems and Unicode implementations. Such behavior, while producing no visible changes in text, may also be enough to alert some experienced developers,” continues the paper.

The researchers demonstrated that an attacker could embed control characters in comments and strings to reorder how the source code is displayed, so that the logic a reviewer sees differs from the logic the compiler processes.

“Bringing all this together, we arrive at a novel supply-chain attack on source code. By injecting Unicode Bidi override characters into comments and strings, an adversary can produce syntactically-valid source code in most modern languages for which the display order of characters presents logic that diverges from the real logic. In effect, we anagram program A into program B.” continues the paper. “Such an attack could be challenging for a human code reviewer to detect, as the rendered source code looks perfectly acceptable.”
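One way to surface such hidden characters is to scan source files for Bidi control code points before review. The Python sketch below is only an illustration under that assumption, not the researchers' own tooling; the function name find_bidi_controls and the sample string are hypothetical.

```python
import unicodedata

# Unicode Bidi control characters: embeddings, overrides, isolates
# and their terminators.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}


def find_bidi_controls(source):
    """Return (line, column, code point, name) for each Bidi control found."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch))
                )
    return findings


# Hypothetical snippet with controls hidden inside a string literal.
sample = 'access = "user\u202e \u2066# check admin\u2069 \u2066"\nprint(access)\n'
for finding in find_bidi_controls(sample):
    print(finding)
```

A check like this could run in a pre-commit hook or CI job. Legitimate right-to-left text in comments or strings may also contain these characters, so a finding is a prompt for closer inspection rather than proof of an attack.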

The experts also warn of variants of the Trojan Source attack that use homoglyphs. In these attacks, threat actors leverage characters that look the same, such as the Cyrillic letter ‘х’, which typically renders identically to the Latin letter ‘x’ used in English but occupies a different code point.

Attackers could use this trick to define a homoglyph function whose name looks identical to that of the original function but which actually contains malicious code.
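A rough heuristic for spotting such names is to flag identifiers that mix letters from more than one script. The Python sketch below approximates a character's script from the first word of its Unicode name, which is a simplification; the function names script_of and mixed_script_identifiers are hypothetical. It will not catch an identifier written entirely in a look-alike script, and it also inspects words inside comments and strings.

```python
import re
import unicodedata


def script_of(ch):
    """Approximate a character's script from the first word of its Unicode name."""
    try:
        return unicodedata.name(ch).split(" ")[0]
    except ValueError:
        # Character has no name in the Unicode database.
        return "UNKNOWN"


def mixed_script_identifiers(source):
    """Return (token, scripts) for identifier-like tokens that mix scripts."""
    suspicious = []
    # Word-like tokens that start with a letter or underscore.
    for token in re.findall(r"[^\W\d]\w*", source):
        scripts = {script_of(ch) for ch in token if ch.isalpha()}
        if len(scripts) > 1:
            suspicious.append((token, sorted(scripts)))
    return suspicious


# "ma\u0445" ends in the Cyrillic letter U+0445, which looks like Latin "x".
sample = "ma\u0445 = 1\nmax = 2\n"
print(mixed_script_identifiers(sample))
```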

“The fact that the Trojan Source vulnerability affects almost all computer languages makes it a rare opportunity for a system-wide and ecologically valid cross-platform and cross-vendor comparison of responses,” concludes the paper.


Pierluigi Paganini
International Editor-in-Chief
Cyber Defense Magazine