The 411 on Stack Overflow and open source license compliance

Phil Odence

Apr 14, 2021 / 3 min read

Many of the third-party components we find in audits have been pulled in their entirety from public software repositories (GitHub being the most popular these days). But we also fairly often come across snippets: lines of code that have been copied and pasted into source code. They might be a piece of a GitHub project, but they may also have been taken from a developer site like Stack Overflow or CodeGuru. We find that even people who have a pretty good idea of how to deal with code under open source licenses have questions about such findings. We are not lawyers, but we work with many, and this blog post covers what we’ve learned from them on the topic.

Because of its popularity, we often get licensing questions specific to Stack Overflow, a well-established site that actually does a good job of providing terms of service that address usage issues. If you are not familiar with it, the site is a great resource for developers, and its forum lets people ask their peers coding questions.

Understanding your code’s open source license

As you can imagine, answers to coding questions often include code. The Stack Overflow terms of service say, in a nutshell, that you can copy content, including code, for personal, noncommercial use only. Additionally, Stack Overflow periodically provides compilations of content that can be downloaded and used under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

However, the CC BY-SA license is nebulous with respect to software, because it was not designed for that purpose. The Creative Commons organization is explicit about this in its FAQ: “We recommend against using Creative Commons licenses for software.” As a result, lawyers are often unsure what is permissible for code under these licenses. In some situations, the CC BY-SA license can certainly be read as having a “viral” effect similar to that of the GPL, and it is therefore a concern.

All of the above assumes that the person who posted the code is the copyright holder and was therefore able to confer rights to Stack Overflow. However, it’s quite possible that the poster copied the code from somewhere else and did not have those rights. But is that your problem? There’s not much case law on this, but the Fantec case in Germany supports the view that if it’s in your code, you are responsible for the code’s provenance and for ensuring you are licensed by the proper copyright holder. See what your lawyer thinks, but most lawyers we talk to believe it’s important to know where the code originated. This is one reason organizations rely on our audits: we do the heavy lifting.

Coding with compliance in mind

At least with Stack Overflow, you know what you are dealing with. We also see code snippets from sites that have no terms of service at all, or whose terms don’t mention software. Acquirers or sellers preparing for M&A have to decide what to do (or not do) about such content in the software that is part of the deal. Our clients are by nature conservative in such matters, and most simply don’t want license issues in their code. But how to address each specific case is a calculation based on the risk, the importance of what the risky code does, and the work required for remediation.

In the scenario at hand, we are typically not talking about a lot of code. Content copied and pasted from a blog is likely to be 20 or 30 lines. As a consequence, rewriting is a realistic option. Having a developer rewrite a function “in a clean room,” i.e., without referencing the copied code, often takes less time than debating the issue.

When remediation would be more involved, some research and debate, and perhaps a legal opinion, are called for. Where exactly did the code come from? Who owns the copyright? Would they care? Would they come after us? One potentially easy solution is to get the copyright holder’s permission. In the case of Stack Overflow, contributors grant broad rights to the site but retain copyright themselves, so they are free to license their code as they wish. A friendly developer may be happy to have their code used, and the whole matter could be resolved with a short email exchange. “Yo, OK if we use this under the MIT license?” “Sure.” Matter closed. Again, though, you need to make sure that “Yo” is the copyright holder and can therefore grant you a license.

But come on, will anyone ever know? That’s a tough one, and mostly between you, your corporate conscience, and your attorneys. If we were auditing your divestiture, we’d find it. A disgruntled employee might rat you out; it’s happened. But maybe you are planning to rewrite that part of the code in the next release, so…

Most companies, particularly those in software, strive to respect the rights of software copyright holders. Do unto others. In doing so, they also eliminate the associated legal risk, but what to do about a particular problematic snippet is always an interesting discussion. Just make sure you are having the discussion.

Minimize open source, legal, security, and quality risks for M&A due diligence or internal reporting with Black Duck® Audits.