Erik Bernhardsson

Never attribute to stupidity that which is adequately explained by opportunity cost

Erik Bernhardsson

Hanlon's razor is a classic aphorism I'm sure you have heard before: Never attribute to malice that which can be adequately explained by stupidity. I've found that neither malice nor stupidity is the most common reason when you don't understand why something is a certain way. Instead, the root cause is probably just that the people responsible haven't had time yet. This happens all the time at startups (maybe a bit less at big companies, for reasons I'll get back to).

How to set compensation using commonsense principles

Erik Bernhardsson

Compensation has always been one of the most confusing parts of management to me. Getting it right is obviously extremely important. Compensation is what drives our entire economy, and you could look at the market for labor as one gigantic resource-allocating machine in the same way as people look at the stock market as a gigantic resource-allocating machine for investments.

How to hire smarter than the market: a toy model

Erik Bernhardsson

Let’s consider a toy model where you’re hiring for two things, and those two things are equally valuable. It’s not very important what they are, so let’s just call them “thing A” and “thing B” for now. For one set of abilities, the scatter plot looks like this: The assumption here is that A and B are drawn from a 2D Gaussian with a mild positive correlation.
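The toy model's assumption can be sketched in a few lines. This is a minimal illustration, not the post's own code: the function names, the correlation value of 0.3 (standing in for "mild"), and the unit variances are all assumptions.

```python
import math
import random


def sample_candidates(n, rho=0.3, seed=0):
    """Draw n (A, B) pairs from a standard 2D Gaussian with correlation rho.

    rho=0.3 is an assumed stand-in for a "mild positive correlation".
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        a = z1
        # Mixing z1 into B gives corr(A, B) = rho while keeping unit variance.
        b = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        pairs.append((a, b))
    return pairs


def correlation(pairs):
    """Sample Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_a * var_b)


candidates = sample_candidates(20000, rho=0.3)
print(round(correlation(candidates), 2))
```

With a large sample, the measured correlation should land close to the `rho` that was requested.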

Developer experience as a competitive advantage

Erik Bernhardsson

I spent a ton of time looking at different software providers, both as a CTO, and as a nerd “advanced” consumer who builds stuff in my spare time. In the last 10 years, there has been an order of magnitude more products that cater directly to developers, through APIs, SDKs, and tooling. I'm pretty psyched about this trend. As the cost of building software goes down, that drives up the demand for software engineers.

Why software projects take longer than you think – a statistical model

Erik Bernhardsson

Anyone who has built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about solving something. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact. I suspect devs are actually decent at estimating the *median* time to complete a task. Planning is hard because they suck at the *average*.
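The gap between median and average is easy to see in a small simulation. This sketch assumes the ratio of actual to estimated time follows a log-normal distribution with median 1 and sigma 1. Both are illustrative assumptions, not figures from the excerpt, and `simulate_blowup` is a hypothetical helper.

```python
import math
import random
import statistics


def simulate_blowup(n_tasks, sigma=1.0, seed=0):
    """Simulate actual/estimated time ratios as log-normal with median 1.

    sigma=1.0 is an assumed spread chosen only for illustration.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n_tasks)]


ratios = simulate_blowup(100000)
median_ratio = statistics.median(ratios)
mean_ratio = sum(ratios) / len(ratios)

# For a log-normal with mu = 0, the median is 1 and the mean is exp(sigma^2 / 2).
print(f"median blowup: {median_ratio:.2f}")
print(f"mean blowup:   {mean_ratio:.2f} (theory: {math.exp(0.5):.2f})")
```

Under these assumptions the typical task comes in on estimate, yet the total across many tasks overshoots, because a few tasks blow up by a large factor and pull the mean well above the median.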

Mortality statistics and Sweden's "dry tinder" effect

Erik Bernhardsson

We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicate it through data. Basically the claim is that Sweden had an exceptionally “good” year in 2019 in terms of influenza deaths, causing there to be more deaths “overdue” in 2020. This post is not an attempt to draw any scientific conclusions!

The hacker's guide to uncertainty estimates

Erik Bernhardsson

It started with a tweet: "New years resolution: every plot I make during 2018 will contain uncertainty estimates" (Erik Bernhardsson, @fulhack, January 7, 2018). Why? Because I’ve been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y. For almost any graph, quantifying the uncertainty seems useful, so I started trying.
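One common way to put an uncertainty estimate on a number like "monthly widgets" is a percentile bootstrap. The sketch below is an illustration under assumptions, not necessarily the method the post uses: the data is made up, and `bootstrap_mean_ci` is a hypothetical helper.

```python
import random


def bootstrap_mean_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[int((1.0 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical monthly widget counts, invented for this example.
monthly_widgets = [102, 98, 110, 95, 105, 99, 101, 108, 97, 103, 100, 106]
low, high = bootstrap_mean_ci(monthly_widgets)
mean = sum(monthly_widgets) / len(monthly_widgets)
print(f"mean = {mean:.1f}, 95% interval = ({low:.1f}, {high:.1f})")
```

If two months' intervals overlap heavily, the "is it going up or down" debate is probably not worth having.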

Data architecture vs backend architecture

Erik Bernhardsson

I don't want to learn your garbage query language

Erik Bernhardsson

This is a bit of a rant but I really don’t like software that invents its own query language. There’s a trillion different ORMs out there. Another trillion databases with their own query language. Another trillion SaaS products where the only way to query is to learn some random query DSL they made up. I just want my SQL back. It’s a language everyone understands, it’s been around since the seventies, and it’s reasonably standardized.

Interviewing is a noisy prediction problem

Erik Bernhardsson

I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence in my ability to assess people. Let me just throw a couple of algorithm questions at a candidate and then I’ll tell you if they are good or not! Over time I’ve come to the (slightly disappointing) realization that knowing who’s going to be good at their job is an extremely hard problem.

Miscellaneous unsolicited (and possibly biased) career advice

Erik Bernhardsson

No one asked for this, but I’m something like ~12 years into my career and have had my fair share of mistakes and luck so I thought I’d share some. Honestly, I feel like I’ve mostly benefitted from luck. Some of the things I did on a whim turned out to be excellent choices many years later. Some of the things were clear blind spots in hindsight. If I could give my 12 years younger self a bunch of career advice, here are some of those things. Choosing a company.

New benchmarks for approximate nearest neighbors

Erik Bernhardsson

UPDATE(2018-06-17): There is a later blog post with newer benchmarks! One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple. You have, say, 1M points in some high-dimensional space. Now given a query point, can you find the nearest points out of the 1M set? Doing this fast turns out to be tricky. I’m the author of Annoy, which has more than 3,000 stars on GitHub.
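For reference, the exact version of the problem is a full scan over every point. The sketch below is that brute-force baseline, not Annoy or any approximate method; the function name and the random test data are assumptions for illustration.

```python
import heapq
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_neighbors(points, query, k=10):
    """Return the indices of the k points closest to query, nearest first.

    This scans every point, so each query costs O(n * d) time.
    """
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: squared_distance(points[i], query)
    )


rng = random.Random(0)
dims = 20
points = [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(5000)]
query = [rng.gauss(0.0, 1.0) for _ in range(dims)]
print(nearest_neighbors(points, query, k=5))
```

Approximate libraries trade a little accuracy for avoiding this full scan, which is why benchmarks usually report recall against an exact baseline like this one.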

Missing the point about microservices – it's about testing and deploying independently

Erik Bernhardsson

Ok, so I have to first preface this whole blog post by a few things: I really struggle with the term microservices. I can’t put my finger on exactly why. Maybe because the term is hopelessly ill-defined, maybe because it’s gotten picked up by the hype train. Whatever. But I have to stick to some type of terminology so let’s just roll with it. This blog post might be mildly controversial, but I’m throwing it out there because I’ve had this itchy feeling for so long and I can’t get rid of it.

Business secrets from terrible people

Erik Bernhardsson

I get bored reading management books very easily and lately I’ve been reading about a wide range of almost arbitrary topics. One of the lenses I tend to read through is to see different management styles in different environments. It turns out that some truly f—ng horrific people have some smart management ideas. This is not maybe surprising. If you have some twisted goals, you can’t have incompetent leadership or you won’t get anywhere.

The eigenvector of "Why we moved from language X to language Y"

Erik Bernhardsson

I was reading yet another blog post titled “Why our team moved from <language X> to <language Y>” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N * N contingency table of moving from language X to language Y? "Someone should make a N*N contingency table of all engineering blog posts titled 'Why we moved from <language X> to <language Y>'" (Erik Bernhardsson, @fulhack, January 25, 2017). So I wrote a script for it.

Learning from users faster using machine learning

Erik Bernhardsson

I had an interesting idea a few weeks ago, best explained through an example. Let’s say you’re running an e-commerce site (I kind of do) and you want to optimize the number of purchases. Let’s also say we try to learn as much as we can from users, both using A/B tests but also using just basic slicing and dicing of the data. We are looking at how many people convert (buy our widgets) but a constant problem is there’s just too much uncertainty. How can we learn faster?

Language pitch

Erik Bernhardsson

Here’s a fun analysis that I did of the pitch (a.k.a. frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial differences between languages. Hertz (Hz) is the standard way to measure audio frequency. Typical human speech ranges between 50 Hz and 300 Hz. Most men typically range between 85 and 180 Hz, and most women between 165 and 255 Hz.

The half-life of code & the ship of Theseus

Erik Bernhardsson

As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a little thing to analyze Git projects, with help from the formidable GitPython project.

Optimizing for iteration speed

Erik Bernhardsson

I’ve written before about the importance of iterating quickly but I didn’t necessarily talk about some concrete things you can do. When I’ve built up the tech team at Better, I’ve intentionally optimized for fast iteration speed above almost everything else. What are some ways we did that? Continuous deployment. My dubious claim is that we might be the only financial institution in the world to deploy continuously. I actually ended up getting quoted in the Economist about this specifically.

Toxic meeting culture

Erik Bernhardsson

I spent six years at a company that went from 50 people to 1500 and one contributing factor leading to my departure was that I went from a “maker” to a person stuck in meetings every day. It wasn’t that I wanted to do that, but everyone else kept dragging me into meetings. There’s about 47 million blog posts about why meetings suck and I’m not going to pile more onto that heap. For the record – a well run meeting is great!

Conversion rates – you are (most likely) computing them wrong

Erik Bernhardsson

How hard can it be to compute conversion rate? Take the total number of users that converted and divide it by the total number of users. Except… it’s a lot more complicated when you have any sort of significant time lag. Prelude – a story. Fresh out of school I joined Spotify as the first data analyst. One of my first projects was to understand conversion rates. Conversion rate from the free service to Premium is tricky because there’s a huge time lag.
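To see why the time lag matters, compare the naive ratio with one that only counts users who have had a full window in which to convert. This is one simple correction, shown as an assumption-laden sketch and not necessarily the approach the post ends up recommending. The user records, the 30-day horizon, and both function names are invented for illustration.

```python
def naive_rate(users):
    """Converted users divided by all users, ignoring how recently they joined."""
    return sum(1 for _, converted_day in users if converted_day is not None) / len(users)


def matured_rate(users, today, horizon):
    """Conversion rate within horizon days, among users who joined at least horizon days ago."""
    matured = [(s, c) for s, c in users if today - s >= horizon]
    if not matured:
        raise ValueError("no user has been around for a full horizon yet")
    converted = sum(1 for s, c in matured if c is not None and c - s <= horizon)
    return converted / len(matured)


# Hypothetical (signup_day, conversion_day or None) records.
users = [
    (0, 20), (0, None), (10, 25), (10, None),   # older signups
    (50, None), (55, None), (58, None), (59, 60),  # recent signups
]
print(naive_rate(users))
print(matured_rate(users, today=60, horizon=30))
```

Recent signups that simply have not had time to convert drag the naive number down, so a growing user base can look like a falling conversion rate.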

The software engineering rule of 3

Erik Bernhardsson

Here’s a dumb extremely accurate rule I’m postulating* for software engineering projects: you need at least 3 examples before you solve the right problem. This is what I’ve noticed: Don’t factor out shared code between two classes. Wait until you have at least three. The two first attempts to solve a problem will fail because you misunderstood the problem. The third time it will work. Any attempt at being smart earlier will end up overfitting to coincidental patterns.

Waiting time, load factor, and queueing theory – why you need to cut your systems a bit of slack

Erik Bernhardsson

I’ve been reading up on operations research lately, including queueing theory. It started out as a way to understand the very complex mortgage process (I work at a mortgage startup) but it’s turned into my little hammer and now I see nails everywhere. One particular relationship that turns out to be somewhat more complex is the relationship between cycle time and throughput. Here are some examples of situations where this might apply: What’s a good CPU load for a database?
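The textbook starting point for this relationship is the M/M/1 queue, where the mean time in the system is 1 / (service rate - arrival rate). The sketch below uses that model as an assumption (Poisson arrivals, exponential service times, a single server). The excerpt does not say which model the post uses, and the function name is hypothetical.

```python
def mm1_time_in_system(service_rate, utilization):
    """Mean time a job spends in an M/M/1 system (waiting plus service).

    utilization is the load factor: arrival_rate / service_rate.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1) for a stable queue")
    arrival_rate = utilization * service_rate
    return 1.0 / (service_rate - arrival_rate)


# With a service rate of 1 job per time unit, an idle system takes 1 unit per job.
for load in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"load {load:.2f}: mean time in system = {mm1_time_in_system(1.0, load):.1f}")
```

Under this model, going from 50% to 90% load multiplies the mean time in the system by five, and the curve keeps steepening as load approaches 100%, which is the argument for leaving some slack.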

The number of letters in the word for each number

Erik Bernhardsson

Just for fun, I generated these graphs of the number of letters in the word for each number. I really spent about 10 minutes on this (ok…possibly also another 40 minutes tweaking the plots): More languages!! I love how Spanish has a few super compact words: “cien mil” for 100,000 for instance. Only eight letters, versus English “one hundred thousand” (20 letters). I don’t know much about French but I think they have some kind of weird system based on 20s. Which, by the way, Danish also has.

New approximate nearest neighbor benchmarks

Erik Bernhardsson

As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I’m the author of Annoy, a library with 3,500+ stars on GitHub as of today. It offers fast approximate search for nearest neighbors with the additional benefit that you can load data super fast from disk using mmap. I built it at Spotify to use for music recommendations where it’s still used to power millions (maybe billions) of music recommendations every day.

Pareto efficiency

Erik Bernhardsson

Pareto efficiency is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it let’s assume you only care about two factors: price and quality. We don’t know what you are willing to pay for quality, but we know that, all else being equal, the cheaper the better and the higher the quality the better. This means we can rule out some TVs immediately.
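The "rule out some TVs immediately" step can be sketched directly: a TV is ruled out if some other TV is at least as cheap and at least as good, and strictly better on one of the two. The TV list, its prices, and its quality scores below are made up for illustration, and `pareto_frontier` is a hypothetical helper.

```python
def pareto_frontier(items):
    """Return the items that no other item dominates.

    Each item is (name, price, quality). Lower price and higher quality are better.
    """
    frontier = []
    for name, price, quality in items:
        dominated = any(
            p <= price and q >= quality and (p < price or q > quality)
            for _, p, q in items
        )
        if not dominated:
            frontier.append((name, price, quality))
    return frontier


# Hypothetical TVs: (name, price in dollars, quality score out of 10).
tvs = [
    ("A", 300, 5),
    ("B", 400, 4),  # worse than A on both price and quality
    ("C", 500, 8),
    ("D", 700, 7),  # worse than C on both price and quality
    ("E", 900, 9),
]
print([name for name, _, _ in pareto_frontier(tvs)])
```

Here B and D drop out, leaving A, C, and E. Choosing among those three still depends on how much you are willing to pay for quality, which the frontier alone cannot tell you.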

When machine learning matters

Erik Bernhardsson

I joined Spotify in 2008 to focus on machine learning and music recommendations. It’s easy to forget, but Spotify’s key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive. The other key differentiator was licensing – until early 2009 Spotify basically just had all kinds of weird stuff that employees had uploaded. In 2009 after a crazy amount of negotiation the music labels agreed to try it out as an experiment.

NYC subway math

Erik Bernhardsson

Apparently MTA (the company running the NYC subway) has a real-time API. My fascination for the subway takes autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here’s some relevant code for how to use the API:

    from google.transit import gtfs_realtime_pb2
    import urllib

    for feed_id in [1, 2, 11]:
        feed = gtfs_realtime_pb2.FeedMessage()
        response = urllib.urlopen('[link]%s&feed_id=%d' % (os.
