From Samsung to the Pentagon – Recent Stories Remind Us About the Importance of Sensitive Data Guardrails

Thomas Segura

August 15, 2023

By Thomas Segura, Cyber Security Expert, GitGuardian

The last few weeks have been very challenging for data protection experts. In a very short period of time, several stories have dealt a blow to the efforts of businesses, consumers, and governments to protect sensitive data.

From a 21-year-old leaking classified military intelligence on Discord, to Samsung employees leaking corporate secrets to ChatGPT, to third-party developers rushing to build OpenAI-based apps and leaking API keys in their code, these stories all carry the same lesson: the potential for sensitive data leaks is ever-present and calls for fundamental protective measures within organizations.

Sensitive Data Exposure

One common thread among these stories is the human factor. In the case of Samsung, employees reportedly uploaded sensitive information to ChatGPT on three different occasions just three weeks after the South Korean electronics giant allowed employees access to the generative AI tool.

Similarly, third-party developers rushing to create OpenAI-based apps were storing API keys in plaintext, as Cyril Zakka recently revealed in a Twitter thread.

As highlighted in the thread, storing API keys within an application package “makes it particularly easy to extract since they’re made available in plain string without requiring any fancy tooling or much effort.”
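To see why this is so easy, here is a minimal Python sketch in the spirit of the Unix `strings` utility: it pulls runs of printable characters out of a binary blob and keeps the ones that look like a key. The `sk-` prefix pattern, the function name, and the sample blob are assumptions made for illustration; real key formats vary, and this is not the method used in the thread.

```python
import re

# Runs of 6 or more printable ASCII bytes, similar in spirit to `strings`.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{6,}")

# Assumed key shape, for illustration only: "sk-" followed by 20 or more
# alphanumeric characters. Real key formats vary and change over time.
KEY_LIKE = re.compile(rb"sk-[A-Za-z0-9]{20,}")


def find_key_like_strings(blob: bytes) -> list[str]:
    """Return key-like tokens found inside printable runs of a binary blob."""
    hits = []
    for run in PRINTABLE_RUN.finditer(blob):
        for match in KEY_LIKE.finditer(run.group()):
            hits.append(match.group().decode("ascii"))
    return hits


if __name__ == "__main__":
    # Hypothetical stand-in for a compiled app package; the key is fake.
    fake_key = b"sk-" + b"A1b2C3d4" * 6
    blob = b"\x00\x01\xfe" + b"api_key=" + fake_key + b"\x00\xff\x10rest"
    print(find_key_like_strings(blob))
```

No decompiler or reverse-engineering skill is involved: a key embedded as a plain string survives compilation untouched, so a few lines of pattern matching are enough to recover it.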

This is exactly what GitGuardian’s State of Secrets Sprawl report has been monitoring for several years: the number of hard-coded credentials continues to grow at an accelerating rate. Its latest edition reported an alarming 67% year-over-year increase in the number of secrets found on public GitHub. GitGuardian’s detection engine scanned 1.027 billion new commits in 2022 and found 10 million secret occurrences.

If we zoom in on leaks of OpenAI API keys on GitHub specifically, the results speak for themselves.

In short, when it comes to measuring the popularity of a red-hot new technology, such as OpenAI’s GPT, one of the best metrics is the number of related secrets leaked on public GitHub.

What is fueling this rocket? In the IT world, digital authentication credentials, such as the API keys we’ve been talking about, but also certificates and tokens, are the glue between applications, services, and infrastructures. These components are much more numerous today than they were a few years ago. Stacked together, they form the large majority of today’s apps. For example, according to BetterCloud, the average number of software as a service (SaaS) applications used by organizations worldwide has increased 14-fold between 2015 and 2021.

Individuals working in the industry tend to casually insert sensitive information, such as secrets, directly into configuration files, scripts, source code, or even private messages for convenience. This practice of hard-coding secrets leads to a significant increase in what we call “secrets sprawl,” where these sensitive pieces of information spread across various code repositories as they are cloned or shared without proper protection.
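As a rough illustration of how hard-coded secrets can be spotted in source text, the Python sketch below flags quoted values assigned to suspiciously named variables when their Shannon entropy is high. The keyword list, the 3.5-bit threshold, and the function names are assumptions chosen for the example; this is a toy heuristic, not GitGuardian’s detection engine.

```python
import math
import re
from collections import Counter

# Matches assignments such as: api_key = "value" or TOKEN: 'value'.
# The variable-name keywords are an illustrative assumption.
ASSIGNMENT = re.compile(
    r"""(?i)\b\w*(?:key|secret|token|password)\w*\s*[:=]\s*["']([^"']{8,})["']"""
)


def shannon_entropy(text: str) -> float:
    """Average number of bits per character, based on character frequency."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_hardcoded_secrets(source: str, min_entropy: float = 3.5):
    """Return (line_number, value) pairs for random-looking quoted values."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in ASSIGNMENT.finditer(line):
            value = match.group(1)
            # Low-entropy values are more likely placeholders than secrets.
            if shannon_entropy(value) >= min_entropy:
                findings.append((line_no, value))
    return findings


if __name__ == "__main__":
    # Hypothetical config snippet; both values are made up.
    sample = 'api_key = "q7Lm2XvT9pRz4KdW8sYb"\npassword = "aaaaaaaaaaaa"\n'
    print(find_hardcoded_secrets(sample))
```

The entropy check is what separates a random-looking token from a placeholder: the repeated-character password in the sample is ignored, while the key-like value is reported.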

Although these credential leaks might not always represent an immediate threat, the growing number of secrets exposed on GitHub every year is a red flag highlighting the need for software-driven organizations to prioritize secure coding practices and keep sensitive information and secrets out of source code.
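One common way to keep a secret out of source code is to read it from the environment at runtime. The Python sketch below assumes a hypothetical variable name, `DEMO_API_KEY`, and hypothetical helper names; a dedicated secrets manager is another option, and this is only a minimal illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def load_secret(name: str) -> str:
    """Read a secret from an environment variable instead of source code."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value


def masked(secret: str, visible: int = 4) -> str:
    """Return a log-safe form showing only the last few characters."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


if __name__ == "__main__":
    # DEMO_API_KEY is a hypothetical name; set it in the shell beforehand.
    try:
        print("Loaded key:", masked(load_secret("DEMO_API_KEY")))
    except MissingSecretError as err:
        print(err)
```

Failing loudly when the variable is missing avoids silently falling back to a default value committed in the repository, and masking keeps the full secret out of logs.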

Increasingly Serious Consequences

Finally, as if we needed another reminder that sensitive data protection has real consequences, another recent event has shaken one of the most powerful institutions on the planet.

The “Pentagon leak,” a case under investigation, refers to a massive leak of top-secret military intelligence on a private server on Discord, a popular gaming chat platform, that spread across the web in early April. According to the press, the documents included some of the most sensitive information for the USA, such as Ukraine-Russia war prospects and thousands of intelligence reports. The incident is already having international repercussions, such as heightened suspicion among United States allies that they are being eavesdropped on.

Beyond raising questions about how advanced the technologies used to safeguard military secrets really are, the leaks are a bitter reminder that even the most robust security protocols can be compromised by human error or malicious intent.

Conclusion

The recent headlines about sensitive information leaks highlight the urgent need for organizations to prioritize protecting their sensitive data. From corporate trade secrets to classified government documents, no organization is immune to the risks of data leaks. The human factor is a common weak point in security protocols, making it crucial for organizations to prioritize employee training and secure coding practices.

In the field of software, programmatic credentials, or secrets, are among the most sensitive types of data. As recent breaches have illustrated, their compromise can lead to a full takeover of an organization’s IT systems.

GitGuardian, a cybersecurity company, specializes in identifying and preventing hard-coded secrets, providing organizations with the tools they need to keep their sensitive data safe. We believe that prevention is the best defense against data leaks. Using our platform, organizations can identify leaked secrets before they become a vulnerability, protecting their sensitive data and mitigating the risks of reputational damage, revenue loss, and legal liabilities.

Contact us today to learn more about how we can help protect your sensitive data and get a complimentary audit of your secret leaks on public GitHub.

About the Author

Thomas Segura, Cyber Security Expert, GitGuardian. Thomas has worked as both an analyst and a software engineer consultant for various large French companies. His passion for tech and open-source led him to join GitGuardian as a technical content writer. He now focuses on clarifying the transformative changes that cybersecurity and software are undergoing.

Thomas can be reached online through our website at https://www.gitguardian.com/ or on Twitter at https://twitter.com/GitGuardian and LinkedIn at https://www.linkedin.com/company/gitguardian.