Aqua Blog

GitHub Dataset Research Reveals Millions Potentially Vulnerable to RepoJacking

Millions of GitHub repositories are potentially vulnerable to RepoJacking. New research by Aqua Nautilus sheds light on the extent of RepoJacking, which, if exploited, may lead to code execution in organizations’ internal environments or in their customers’ environments. As part of our research, we found an enormous source of data that allowed us to sample a dataset and find some highly popular targets.

Among the repositories found vulnerable to this attack we discovered organizations such as Google, Lyft and some that requested to remain anonymous. All were notified of this vulnerability and promptly mitigated the risks. In this blog we will show how an attacker can exploit this at scale and share the PoC we ran on popular repositories.

In contrast to past studies, our research emphasizes the security implications and severity of this database if exploited by attackers, many of whom can find within it numerous high-quality targets susceptible to RepoJacking. In this blog we delve deeper into the exploitation scenarios of this attack and illustrate each scenario with real-life examples.

What is RepoJacking?

To read more, you can find additional information in Appendix A.

RepoJacking Restrictions and Bypasses:

There are some restrictions on an attacker’s ability to register the old repository name (these restrictions are called retired names). However, they apply only to repositories that were popular before the rename, and researchers recently found many bypasses to these restrictions, allowing attackers to open any repository they want.

If you want to read more about the restrictions and bypasses, you can find more information in Appendix B.

As we learned from these bypasses, organizations should not depend on retired names as a security mechanism. In this research we therefore define a vulnerable repository as one whose URL gets redirected while the original organization name no longer exists.
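The working definition above can be sketched as a small check. This is a minimal illustration, not the tooling used in the research: the names `owner_and_repo` and `is_repojackable` are hypothetical, and the sketch assumes you have already looked up (for example with an HTTP client) where the repository URL finally lands and whether the original owner name still exists on GitHub.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def owner_and_repo(url: str) -> Optional[Tuple[str, str]]:
    """Return (owner, repo) from a github.com URL, lowercased, or None."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in ("github.com", "www.github.com"):
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    repo = parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return parts[0].lower(), repo.lower()


def is_repojackable(requested_url: str, final_url: str, old_owner_exists: bool) -> bool:
    """Apply the definition: the URL is redirected AND the old owner name is gone.

    `final_url` and `old_owner_exists` are assumed inputs that the caller
    has gathered beforehand; this function performs no network access.
    """
    requested = owner_and_repo(requested_url)
    final = owner_and_repo(final_url)
    if requested is None or final is None:
        return False
    redirected = requested != final
    return redirected and not old_owner_exists
```

Keeping the decision separate from the network lookups makes it easy to run over a large list of previously collected results.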

Are You Exposed to RepoJacking?

You may ask yourself: do I own repositories that are directly or indirectly vulnerable to RepoJacking?

The quick answer is that the possibilities of exposure are endless. There are a few basic questions you should ask if you think you may be exposed.

  • What do you know about your organization?
  • What are all the GitHub organization names you used before?
  • Were there any mergers and acquisitions your organization was involved in?
  • Are there any dependencies in your code that lead to a GitHub repository vulnerable to RepoJacking?
  • Is there guidance somewhere (documentation, guides, Stack Overflow answers, etc.) that suggests you should use a GitHub repository vulnerable to RepoJacking?

As noted above, the possibilities of exposure are endless, and depending on the answers to any of these questions you may find that your organization is vulnerable.

Compiling a Dataset for Our Research

Attackers don’t need to do all this hard work. They aren’t bound to a specific organization. They can scan the internet and pick any victim they’d like, and if they sense there’s profit behind the attack, they will continue until they maximize their gain. Websites such as the GHTorrent project provide invaluable data.

The GHTorrent project records any public event (commit, PR, etc.) that happens on GitHub and saves it in a database. Anyone can download a database dump of a specific timeframe. By utilizing this dataset, malicious actors can uncover the historical names of various organizations and broaden their potential attack surface.

In the image below you can see how easy it is to find a specific timeframe and download it.

 

[Image: GHTorrent download links]

Essentially, the entire history of usernames and organizations’ names on GitHub since 2012 is easily accessible to anyone.

It’s important to note that the website ghtorrent.org was available during the research. It is currently offline, but the dataset is still available at http://ghtorrent-downloads.ewi.tudelft.nl/mysql.

Our research started from a data sample we found on this website. We downloaded all the logs from a random month (June 2019) and compiled a list of 125 million unique repository names. Next, we sampled 1% (1.25 million repository names) and checked each one to see if it was vulnerable to RepoJacking.
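The compile-and-sample step can be sketched roughly as follows. This is an illustration only: the helper names are hypothetical, and it assumes the dump has already been reduced to an iterable of GitHub API repository URLs of the form `https://api.github.com/repos/<owner>/<repo>`, which may not match how the data was actually processed in the research.

```python
import random
from typing import Iterable, List, Set

# Assumed URL prefix of each input line; adjust it to the data you actually have.
API_PREFIX = "https://api.github.com/repos/"


def unique_repo_names(urls: Iterable[str]) -> Set[str]:
    """Collect distinct 'owner/repo' names (lowercased) from repository URLs."""
    names = set()
    for url in urls:
        url = url.strip()
        if not url.startswith(API_PREFIX):
            continue  # skip lines that are not repository URLs
        parts = url[len(API_PREFIX):].split("/")
        if len(parts) >= 2 and parts[0] and parts[1]:
            names.add(f"{parts[0]}/{parts[1]}".lower())
    return names


def sample_names(names: Set[str], fraction: float, seed: int = 0) -> List[str]:
    """Draw a reproducible random sample containing `fraction` of the names."""
    rng = random.Random(seed)
    k = int(len(names) * fraction)
    # Sort first so the same seed always yields the same sample.
    return rng.sample(sorted(names), k)
```

With `fraction=0.01`, a set of 125 million names yields a sample of 1.25 million, matching the proportions described above.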

We found that 36,983 repositories were vulnerable to RepoJacking! That is a hit rate of roughly 2.95%. If we extrapolate the result from this sample to GitHub’s entire repository base (over 300 million repositories according to GitHub publications), there are potentially millions of vulnerable repositories!
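The arithmetic behind this estimate is short enough to spell out. The sketch below uses the rounded figures from the text, so the rate comes out at about 2.96%, in line with the roughly 2.95% quoted. The extrapolation assumes the June 2019 sample is representative of all of GitHub, which is a simplifying assumption rather than something the data proves.

```python
sample_size = 1_250_000            # 1% sample of about 125 million names
vulnerable_in_sample = 36_983      # repositories found vulnerable in the sample
total_repositories = 300_000_000   # "over 300 million", used here as a lower bound

# Share of the sample that was vulnerable.
rate = vulnerable_in_sample / sample_size

# Scale the same share up to the whole repository base (integer arithmetic).
estimate = vulnerable_in_sample * total_repositories // sample_size

print(f"{rate:.2%}")      # 2.96%
print(f"{estimate:,}")    # 8,875,920
```

Even allowing for a large margin of error, the result stays in the millions.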

[Image: research funnel]

Exploitation Scenarios

Now that we know how widespread RepoJacking is, the remaining question is how an attacker can actually exploit a vulnerable repository.

The attacker can exploit it when there is a reference somewhere on the public internet to the previous name of the repository.

We divided the exploitation scenarios into 2 categories:

  1. An automated download from a RepoJacking vulnerable repository happens when the user doesn’t willingly or knowingly download resources from another GitHub repository. An example is a project that uses a component stored in a RepoJacking vulnerable GitHub repository, fetched as a resource or as a module (for example Go, Swift).
  2. A manual download from a RepoJacking vulnerable repository happens when the user actively uses the link to the RepoJackable repository. One example is a link that appears in an official installation guide, either in the vulnerable repository’s README.md or on the organization’s website. Another example is a link that appears elsewhere on the internet, for instance on Stack Overflow, a blog, or Reddit.

Below are three real-life examples of vulnerable repositories:

  1. Code Execution via installation scripts (automated)
  2. Code Execution via Readme/Build instructions (manual)
  3. Code Execution via repository releases (manual)

Code Execution via Installation Scripts:

In the image below you can see a screenshot from GitHub of the script install.sh from Lyft’s repository. This script is designed to download a zip from the repository https://github.com/YesGraph/Dominus which is vulnerable to RepoJacking! The script extracts a compressed file and executes the extracted shell script.
[Image: Lyft install.sh script, with boxes]

This means that a user who runs the install.sh script in Lyft’s repository will unknowingly fetch a file from another repository and run it. This is fine as long as the redirection from the old repository to the new one works. However, the old repository is susceptible to RepoJacking. An attacker can easily register the organization YesGraph, which is available (in the image below you can see that we control it), and create the repository Dominus.

[Image: the YesGraph organization under our control]

Once this is done, the redirection from the old repository to the new one no longer exists, and the zip file is downloaded from a repository controlled by the attacker. This leads to arbitrary code execution on the machines of the original repository’s users.

We responsibly disclosed our finding to Lyft, which replied that the repository was not currently in use. Additionally, they deprecated the repository within 2 days of the initial report.

Code Execution via Readme/Build Instructions

We found another example of a RepoJacking vulnerable repository in a Google repository.

[Image: Google repository]

In this repository we found the manual exploitation type, specifically the “official installation guide” variant. When reading the README.md in this repository, you see instructions to clone a project from another GitHub account.

 

[Image: README instructions in the Google repository]

As you can see, the instructions say to clone the project from the socraticorg organization (https://github.com/socraticorg/mathsteps) rather than the Google organization (https://github.com/google/mathsteps). A quick Google search reveals that Socratic Org is a subsidiary of Google. (They were founded in 2013, launched their app in 2016 under this name, and were acquired by Google in 2018.)

When you access https://github.com/socraticorg/mathsteps, you are redirected to https://github.com/google/mathsteps, so the user eventually fetches Google’s repository. However, because the socraticorg organization was available, an attacker could create the socraticorg/mathsteps repository, and users following Google’s instructions would clone the attacker’s repository instead. Because the instructions then run npm install, this would lead to arbitrary code execution on users’ machines.
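The reason npm install turns a hijacked clone into code execution is that npm runs lifecycle scripts such as `preinstall`, `install`, `postinstall`, and `prepare` declared in the project’s `package.json`. The sketch below, with a hypothetical helper name and a made-up `package.json`, lists such scripts so they can be reviewed before installing. It inspects only the top-level manifest, not the scripts of transitive dependencies, so it is a quick review aid rather than a safeguard.

```python
import json
from typing import Dict

# Lifecycle scripts that npm runs as part of a plain `npm install` in a project.
INSTALL_HOOKS = ("preinstall", "install", "postinstall", "prepare")


def install_time_scripts(package_json_text: str) -> Dict[str, str]:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    return {name: cmd for name, cmd in scripts.items() if name in INSTALL_HOOKS}


# Made-up manifest for illustration only.
example = '{"name": "demo", "scripts": {"test": "mocha", "postinstall": "node setup.js"}}'
print(install_time_scripts(example))  # {'postinstall': 'node setup.js'}
```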

We disclosed our findings to Google, which fixed the issue.

Code Execution via Repository Releases:

In this example we will show how RepoJacking can affect the releases in GitHub.

[Image: README of the VSCode extension repository]

Here we can see the README instructs us to download the extension.vsix file, which is a Visual Studio Code extension, from the GitHub releases of this repository.

[Image: release download link]

This link is vulnerable to RepoJacking (old_org is available to the attacker). As a result, once an attacker registers old_org and re-creates the repository, a user who accesses this URL will no longer get an HTTP redirect and will download the attacker’s VSCode extension instead.

If you want to learn more about the dangers of installing malicious VSCode extensions, you can read our blog regarding malicious VSCode extensions. In fact, this repository is what led us to research the dangers of installing a malicious VSCode extension and the flaws of the marketplace.

The PoC

To put theory into practice we created a PoC to illustrate how RepoJacking really works. We ran a PoC on several repositories that belong to popular organizations. We gathered basic metadata such as hostname, IP address, and DNS name servers to see who downloaded artifacts from the vulnerable repositories.

Our PoC was triggered a few times leading to code execution on environments related to some big companies. Below you can see an example of such a PoC test executed.

[Image: PoC output, blurred and cropped]

In the screenshot above you can see information about the user who downloaded the PoC: their username (blurred), installation directory, DNS servers (some blurred), and home directory (blurred).

Summary and Mitigations

Our goal in this blog was to shed light on the widespread nature of RepoJacking and the potential risks it poses to organizations and their users. Our analysis of a subset of the GHTorrent project’s database showed the potential risk to many organizations. Additionally, we presented various exploitation scenarios and provided real-life examples of repositories vulnerable to these scenarios.

To mitigate the risk, we recommend taking the following steps:

  • Regularly check your repositories for any links that may fetch resources from external GitHub repositories, since the projects they reference (a Go module, for example) can change their names at any time.
  • If you change your organization name, ensure that you still own the previous name as well, even as a placeholder, to prevent attackers from creating it.
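The first recommendation can be partly automated. The following is a minimal sketch, not a complete scanner: the regular expression and the function name are illustrative choices, and it only finds `github.com/<owner>/<repo>` references in text you pass in. Each reference found would still need to be checked for a redirect and for whether the owner name is still registered.

```python
import re
from typing import List, Tuple

# Matches github.com/<owner>/<repo> and the SSH form github.com:<owner>/<repo>.
GITHUB_REF = re.compile(
    r"github\.com[/:]([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)", re.IGNORECASE
)


def find_github_references(text: str) -> List[Tuple[str, str]]:
    """Return distinct (owner, repo) pairs referenced in the text, in order."""
    seen: List[Tuple[str, str]] = []
    for owner, repo in GITHUB_REF.findall(text):
        repo = repo.rstrip(".")        # drop sentence-ending punctuation
        if repo.endswith(".git"):
            repo = repo[:-4]           # normalize clone URLs
        ref = (owner.lower(), repo.lower())
        if ref not in seen:
            seen.append(ref)
    return seen
```

Running this over READMEs, installation scripts, and dependency manifests gives a list of external repositories to review whenever an upstream project is renamed or transferred.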

It’s important to note that our analysis only covered a fraction of the available data, meaning that there are many more vulnerable organizations, potentially including yours.

Appendix A:

What is RepoJacking?

GitHub RepoJacking (also known as dependency repository hijacking) is a type of supply chain attack that allows attackers to take over a GitHub project’s dependencies, or an entire project, in order to run malicious code on whoever uses these projects.

RepoJacking can occur when a GitHub user/organization changes its name. To avoid breaking code dependencies, GitHub creates a link from the old name to the new name (redirecting the old name to the new one). So, if my code is designed to use dependencies from another GitHub project (or the entire project) located at ‘github.com/username_A/repo_A’, and the owner has changed the account’s name so that the project is now at ‘github.com/username_B/repo_A’, GitHub links the dependencies to the new account (‘github.com/username_B/repo_A’) even if my code still points to ‘github.com/username_A/repo_A’.

So far, this is ideal for developers. Nevertheless, the old username becomes available, and anyone can register it. Once someone creates both ‘username_A’ and the repository ‘repo_A’, the link described above breaks, and any project that relies on ‘github.com/username_A/repo_A’ once again downloads dependencies from that repository, which is now owned and controlled by someone else. Attackers are aware of this and actively exploit it to conduct supply chain attacks.
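The mechanism can be illustrated with a toy model. This is purely an in-memory simulation, not GitHub’s implementation: the `ToyGitHub` class and its methods are hypothetical, and the only behavior modeled is the one described above, namely that a rename leaves a redirect which stops applying once someone re-creates the old owner and repository names.

```python
class ToyGitHub:
    """Tiny simulation of rename redirects; not how GitHub is implemented."""

    def __init__(self):
        self.repos = {}      # (owner, repo) -> who controls the content
        self.redirects = {}  # old (owner, repo) -> new (owner, repo)

    def create_repo(self, owner, repo, controlled_by):
        self.repos[(owner, repo)] = controlled_by

    def rename_owner(self, old_owner, new_owner):
        # Move every repository of the old owner and leave a redirect behind.
        for owner, repo in list(self.repos):
            if owner == old_owner:
                self.repos[(new_owner, repo)] = self.repos.pop((owner, repo))
                self.redirects[(old_owner, repo)] = (new_owner, repo)

    def resolve(self, owner, repo):
        key = (owner, repo)
        if key in self.repos:          # an existing repository wins over a redirect
            return self.repos[key]
        if key in self.redirects:      # otherwise follow the redirect
            return self.resolve(*self.redirects[key])
        return None


hub = ToyGitHub()
hub.create_repo("username_A", "repo_A", "original maintainer")
hub.rename_owner("username_A", "username_B")
print(hub.resolve("username_A", "repo_A"))  # original maintainer (via redirect)

# The old name is free again, so anyone can register it and re-create the repository.
hub.create_repo("username_A", "repo_A", "attacker")
print(hub.resolve("username_A", "repo_A"))  # attacker
```

Nothing changes on the consumer’s side between the two lookups, which is why the takeover can go unnoticed.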

We can suggest two plausible RepoJacking scenarios:

Username renamed: When a repository owner changes their username, a link is created between the old name and the new name for anyone who downloads dependencies from the old repository. However, it is possible for anyone to create the old username and break this link.

Mergers and Acquisitions: In this scenario, the repository ownership is transferred to another user due to mergers or acquisitions, and the original account is deleted. Anyone who downloads dependencies from the old repository will be redirected to the new account. Nevertheless, it is still possible for anyone to create the old username and disrupt this link.

Appendix B:

RepoJacking restrictions and bypasses:

Although GitHub has made attempts to block RepoJacking over the years, there are still some issues with these protections. They remain incomplete and can be bypassed by attackers.

One example of these protections being incomplete is that GitHub protects only repositories with a high volume of cloning (more than 100 clones in the week before the organization name was changed, as mentioned in GitHub’s documentation). This protection does not cover repositories that were not popular in the past but gained popularity after their ownership was transferred to a large organization.

Additionally, big organizations may utilize the vulnerable project affected by RepoJacking as a dependency in other projects, potentially leading to a supply chain attack on a highly popular project with many stars, even though the vulnerable repository itself may have a lower number of stars.

Furthermore, even if the repository was initially protected by GitHub, attackers have found ways to circumvent these protections. One such instance was recently discovered by Checkmarx.

Ilay Goldman
Ilay Goldman is a Security Researcher at Aqua's research team, Team Nautilus. He specializes in uncovering and analyzing novel security threats and attack vectors in cloud native environments, as well as in supply chain security and open-source vulnerabilities. Before joining Aqua, he gained experience as a red team member. Ilay has also been an active public speaker, presenting his expertise at major cybersecurity events such as Black Hat and RSA.
Yakir Kadkoda
Yakir Kadkoda is a Lead Security Researcher at Aqua's research team, Team Nautilus. He combines his expertise in vulnerability research with a focus on discovering and analyzing new security threats and attack vectors in cloud native environments, supply chain security, and CI/CD processes. Prior to joining Aqua, Yakir worked as a red teamer. Yakir has shared his deep cybersecurity insights at major industry events like Black Hat and RSA.