By Jeremy Thomas, GitGuardian CEO
When we started working on GitGuardian’s detection algorithm and got the first detection results, we could not believe it. We were facing a very counterintuitive reality: secrets were actually hardcoded in source code and available for all to see on public GitHub. And these were not just developers’ personal secrets. Corporate secrets were also ending up in developers’ personal repositories, outside of corporate control.
After scanning billions of commits each year on public GitHub, we wanted to share our findings, so we issued our first State of Secrets Sprawl on public GitHub report. The report is based on GitGuardian’s constant monitoring of every single commit pushed to public GitHub. It shows an alarming 20% year-over-year growth in the number of secrets found. A growing volume of sensitive data, or secrets, ends up publicly exposed on GitHub. This includes API keys, private keys, certificates, usernames and passwords. It puts corporate security at risk, because the vast majority of organizations either ignore the problem or are poorly equipped to cope with it.
A major blind spot in application security
What companies most often overlook is that only 15% of leaks on GitHub occur within public repositories owned by organizations. The other 85% occur in developers’ personal repositories. The secrets in these repositories can be either personal or corporate, and this is where the risk lies for organizations. Some of their corporate secrets are exposed publicly through the personal repositories of their current or former developers.
GitHub is more than ever “The Place to Be” for developers when it comes to innovating, collaborating and networking. It brings together more than 50 million developers working on personal and/or professional projects. In a single year, 60 million repositories are created and nearly two billion contributions are added. At that scale, companies face risks even if they do not use GitHub or open-source their own code, because their developers do.
A growing issue linked to the componentization of applications
Architectures are moving to the cloud and relying more on components and applications. As a result, the growing number of commits and the wider use of digital authentication credentials have increased the number of secrets detected. Several factors compound the problem. Companies are pushing for shorter release cycles, and developers have many technologies to master. Enforcing good security practices also becomes more complex as an organization grows, along with its number of repositories, its number of developer teams and their geographical spread.
As Anne Hardy, CISO of Talend, puts it: “We launched an audit using GitGuardian, and several leaked secrets were brought to our attention. What was very interesting and what we didn’t anticipate was that most of the alerts came from the personal code repositories of our developers.”
Using our secrets detection engine, we found over 2 million secrets on public GitHub in 2020, about 20% more than the previous year. The secrets found include Google keys and keys for development tools, data storage, payment systems, cloud providers and so on.
Why is this happening?
Usually these leaks are unintentional, not malicious. They happen because developers typically have one GitHub account that they use for both personal and professional purposes, sometimes mixing the repositories. It is also easy to misconfigure git and push the wrong data. It is just as easy to forget that the entire git history remains publicly visible, even after sensitive data has been deleted from the current version of the source code.
A need for automated secrets detection
Companies cannot eliminate the risk of secrets exposure, even if they put centralized secrets management systems in place. These systems are typically not deployed across the whole perimeter. They are also not coercive: nothing prevents developers from hardcoding credentials that are stored in the vault.
Solutions are available to automate secrets detection and put proper remediation in place, but the market is far from mature on this subject. The reality is that most organizations are operating blind. Most leaks of organizations’ credentials on public GitHub occur in developers’ personal repositories. There, organizations often have no visibility, let alone the authority to enforce any kind of preventive security measures. Companies need to scan not only public repositories but also private ones, to prevent lateral movement by malicious actors.
Some best practices can be followed to limit the risk of secrets exposure or the impact of a leaked credential:
- Never store unencrypted secrets in .git repositories
- Don’t share your secrets unencrypted in messaging systems like Slack
- Store secrets safely
- Restrict API access and permissions
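As a minimal illustration of the “Store secrets safely” practice, the Python sketch below keeps the credential out of the source code entirely. It reads the credential at runtime from an environment variable, which a secrets manager or CI system would be expected to inject. The helper name `get_secret`, the `MissingSecretError` exception and the `PAYMENT_API_KEY` variable name are hypothetical, chosen only for this example.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been provided at runtime."""


def get_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    The value is injected at deploy time (e.g. by a vault or CI system),
    so neither the source code nor its git history ever contains it.
    """
    value = os.environ.get(name)
    if not value:
        # Fail loudly instead of falling back to a hardcoded default value.
        raise MissingSecretError(f"secret {name!r} is not set in the environment")
    return value
```

A caller writes `api_key = get_secret("PAYMENT_API_KEY")` instead of pasting the key as a string literal. Raising an error when the variable is missing removes the temptation to add a hardcoded fallback, which would put the secret back into the repository.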
Developer training programs should be put in place, although they do not eradicate the risk of leaked credentials.
Following best practices is not sufficient. Companies need to secure the software development life cycle (SDLC) with automated secrets detection.
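To make automated secrets detection more concrete, here is a deliberately small Python sketch of two techniques that scanners commonly combine. The first is provider-specific patterns: here, the `AKIA`/`ASIA` prefix format of AWS access key IDs and PEM private-key headers. The second is a Shannon-entropy check on values assigned to suspicious variable names.

The sketch scans only the added lines of a unified diff. Because of that, feeding it the output of `git log -p --all` also catches secrets that were later deleted from the current code but remain in the history.

The rule set, the 3.5 bits-per-character threshold and the names `scan_diff` and `Finding` are assumptions made for illustration. This is not GitGuardian’s detection engine. A production detector needs many more rules, plus validation, to keep false positives down.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Illustrative provider-specific rules only (an assumption for this sketch).
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

# Generic rule: a quoted value of 16+ chars assigned to a suspicious name.
ASSIGNMENT = re.compile(
    r"""(?i)\b(?:api[_-]?key|secret|token|passwd|password)\b\s*[:=]\s*['"]([^'"]{16,})['"]"""
)


def shannon_entropy(s: str) -> float:
    """Average bits of information per character of s."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


@dataclass
class Finding:
    line_no: int  # 1-based line number within the diff text
    rule: str
    text: str


def scan_diff(diff_text: str, entropy_threshold: float = 3.5) -> list[Finding]:
    """Flag likely secrets in the lines a diff adds."""
    findings = []
    for i, line in enumerate(diff_text.splitlines(), start=1):
        # Keep only added lines; "+++" is a file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        added = line[1:]
        for rule, pattern in PATTERNS.items():
            if pattern.search(added):
                findings.append(Finding(i, rule, added.strip()))
        # Random-looking values score high; placeholders like "xxxx" score low.
        match = ASSIGNMENT.search(added)
        if match and shannon_entropy(match.group(1)) >= entropy_threshold:
            findings.append(Finding(i, "high_entropy_assignment", added.strip()))
    return findings
```

In practice one could run it on the output of `git log -p --all` to audit a repository’s full history. It could also run on `git diff --cached` in a pre-commit hook, so that a secret is caught before it is ever committed.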
When choosing a secrets detection solution, companies need to take into account:
- The ability to monitor developers’ personal repositories
- Secrets detection performance: accuracy, precision and recall
- Real-time alerting
- Integration with remediation workflows
- Easy collaboration between Developers, Threat Response and Ops teams
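The precision and recall criterion in this list has a standard meaning. Count the true positives (TP, real secrets that are flagged), false positives (FP, harmless strings that are flagged) and false negatives (FN, real secrets that are missed). Precision and recall are then:

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
```

Low precision floods developers and security teams with false alerts. Low recall means leaked secrets go unnoticed. Plain accuracy is less informative on its own, because the overwhelming majority of scanned strings are not secrets.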
There are millions of commits per day on public GitHub. How can organizations cut through the noise and focus exclusively on the information that is of direct interest to them? How can they make sure their secrets do not end up in their developers’ personal repositories on GitHub? They cannot prevent developers from having personal repositories, so they need automated detection and efficient remediation tools.
In this State of Secrets Sprawl on GitHub analysis, we focused on secrets, but they are not the only sensitive information that can end up publicly exposed. Intellectual property, personal data and medical data are also at risk.
About the Author
Jérémy Thomas, co-founder of GitGuardian, is an engineer and entrepreneur. He graduated from Ecole Centrale in Paris. He first worked in finance, then began his entrepreneurial journey by founding Quantiops, a consulting company specializing in the analysis of large amounts of data. He went on to found GitGuardian in 2017. GitGuardian, a cybersecurity start-up co-founded with Eric Fourrier, has been on a strong growth trajectory since 2017. Its investors include Balderton Capital, BPI France, Scott Chacon (co-founder of GitHub) and Solomon Hykes (founder of Docker).
Holly Hagerman is the Contact