Erik Bernhardsson

Software infrastructure 2.0: a wishlist

Erik Bernhardsson

Software infrastructure (by which I mean everything ending with *aaS, or anything remotely similar to it) is an exciting field, in particular because (despite what the neo-luddites may say) it keeps getting better every year! I love working with something that moves so quickly.

Giving more tools to software engineers: the reorganization of the factory

Erik Bernhardsson

It's a popular attitude among developers to rant about our tools and how broken things are. Maybe I'm an optimistic person, because my viewpoint is the complete opposite!

What's Erik up to?

Erik Bernhardsson

I joined Better in early 2015 because I thought the team was crazy enough to actually change one of the largest industries in the US. For six years, I ran the tech team, hiring 300+ people, probably doing 2,000+ interviews, and according to GitHub I added 646,941 lines of code and removed 339,164.

Never attribute to stupidity that which is adequately explained by opportunity cost

Erik Bernhardsson

Hanlon's razor is a classic aphorism I'm sure you have heard before: Never attribute to malice that which can be adequately explained by stupidity. I've found that neither malice nor stupidity is the most common reason when you don't understand why something is the way it is.

How to hire smarter than the market: a toy model

Erik Bernhardsson

Let’s consider a toy model where you’re hiring for two things that are equally valuable. It’s not very important what those are, so let’s just call them “thing A” and “thing B” for now.

Why software projects take longer than you think – a statistical model

Erik Bernhardsson

Anyone who has built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take, when fundamentally the work in itself is about solving something. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact. I suspect devs are actually decent at estimating the *median* time to complete a task. Planning is hard because they suck at the *average*.
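To make the median-versus-average gap concrete, here is a minimal sketch. It assumes the blow-up factor (actual time divided by estimated time) follows a log-normal distribution; that distribution and the parameter values are illustrative assumptions, not something stated in the excerpt above.

```python
import math


def lognormal_median_and_mean(mu: float, sigma: float) -> tuple[float, float]:
    """Median and mean of exp(X) where X ~ Normal(mu, sigma^2)."""
    median = math.exp(mu)
    mean = math.exp(mu + sigma ** 2 / 2)
    return median, mean


# Blow-up factor = actual time / estimated time (assumed log-normal).
# mu = 0 means the median task takes exactly as long as estimated;
# sigma = 1 is an arbitrary illustrative spread.
median, mean = lognormal_median_and_mean(mu=0.0, sigma=1.0)
print(f"median blow-up: {median:.2f}, mean blow-up: {mean:.2f}")
```

Under these assumptions the median estimate is spot on while the average task still overruns, because the long right tail pulls the mean above the median.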

Headcount goals, feature factories, and when to hire those mythical 10x people

Erik Bernhardsson

Since I started building up a tech team for Better, I made a very conscious decision to pay at the high end to get people. I thought this made more sense: they cost a bit more money to hire, but their output usually more than compensates for it. Some fellow CTOs went for the other side of the spectrum: bootcamps and campus recruiting are great recruiting grounds for them. This was a mystery to me, until it all made sense. What is output?

The hacker's guide to uncertainty estimates

Erik Bernhardsson

It started with a tweet: New years resolution: every plot I make during 2018 will contain uncertainty estimates — Erik Bernhardsson (@fulhack) January 7, 2018. Because I’ve been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y. For almost any graph, quantifying the uncertainty seems useful, so I started trying.
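One simple way to put an uncertainty estimate on a number like "average widgets per month" is a percentile bootstrap. The sketch below is only an illustration under that assumption; the function name `bootstrap_ci`, the sample data, and the choice of bootstrap are hypothetical and not taken from the post.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of values."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical monthly widget counts.
widgets = [12, 15, 9, 14, 11, 13, 10, 16]
lo, hi = bootstrap_ci(widgets)
print(f"mean {statistics.fmean(widgets):.1f}, 95% interval [{lo:.1f}, {hi:.1f}]")
```

Plotting the interval as an error bar or shaded band next to the point estimate is usually enough to show whether a month-over-month change is distinguishable from noise.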

I don't want to learn your garbage query language

Erik Bernhardsson

This is a bit of a rant but I really don’t like software that invents its own query language. There’s a trillion different ORMs out there. Another trillion databases with their own query language. Another trillion SaaS products where the only way to query is to learn some random query DSL they made up. I just want my SQL back. It’s a language everyone understands, it’s been around since the seventies, and it’s reasonably standardized.

Data architecture vs backend architecture

Erik Bernhardsson

Interviewing is a noisy prediction problem

Erik Bernhardsson

I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence in my ability to assess people. Let me just throw a couple of algorithm questions at a candidate and then I’ll tell you if they are good or not! Over time I’ve come to the (slightly disappointing) realization that knowing who’s going to be good at their job is an extremely hard problem.

Developer experience as a competitive advantage

Erik Bernhardsson

Mortality statistics and Sweden's "dry tinder" effect

Erik Bernhardsson

We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicate it through data. Basically the claim is that Sweden had an exceptionally “good” year in 2019 in terms of influenza deaths, causing there to be more deaths “overdue” in 2020. This post is not an attempt to draw any scientific conclusions!

New benchmarks for approximate nearest neighbors

Erik Bernhardsson

UPDATE(2018-06-17): There is a later blog post with newer benchmarks! One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple. You have say 1M points in some high-dimensional space. Now given a query point, can you find the nearest points out of the 1M set? Doing this fast turns out to be tricky. I’m the author of Annoy, which has more than 3,000 stars on GitHub.
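For reference, the exact version of the problem is easy to state in code. The sketch below is a brute-force baseline using only the standard library; it is not an approximate algorithm and not how Annoy works, and the function name and sample points are made up for illustration.

```python
import heapq
import math
from typing import Sequence

Point = Sequence[float]


def nearest_neighbors(points: list[Point], query: Point, k: int = 1) -> list[int]:
    """Indices of the k points closest to query (Euclidean), nearest first."""
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(points[i], query)
    )


points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
print(nearest_neighbors(points, query=(1.0, 1.0), k=2))  # [1, 3]
```

Every query scans all points, so the cost grows linearly with the size of the set. That linear scan is what approximate methods try to avoid, trading a little accuracy for speed.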

Missing the point about microservices – it's about testing and deploying independently

Erik Bernhardsson

Ok, so I have to first preface this whole blog post with a few things: I really struggle with the term microservices. I can’t put my finger on exactly why. Maybe because the term is hopelessly ill-defined, maybe because it’s gotten picked up by the hype train. Whatever. But I have to stick to some type of terminology so let’s just roll with it. This blog post might be mildly controversial, but I’m throwing it out there because I’ve had this itchy feeling for so long and I can’t get rid of it.

Miscellaneous unsolicited (and possibly biased) career advice

Erik Bernhardsson

No one asked for this, but I’m something like ~12 years into my career and have had my fair share of mistakes and luck so I thought I’d share some. Honestly, I feel like I’ve mostly benefitted from luck. Some of the things I did on a whim turned out to be excellent choices many years later. Some of the things were clear blind spots in hindsight. If I could give my 12 years younger self a bunch of career advice, here are some of those things. Choosing a company.

The eigenvector of "Why we moved from language X to language Y"

Erik Bernhardsson

I was reading yet another blog post titled “Why our team moved from <language X> to <language Y>” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N * N contingency table of moving from language X to language Y? Someone should make a N*N contingency table of all engineering blog posts titled "Why we moved from <language X> to <language Y>" — Erik Bernhardsson (@fulhack) January 25, 2017. So I wrote a script for it.

Language pitch

Erik Bernhardsson

Here’s a fun analysis that I did of the pitch (aka frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial differences between languages. Hertz (or Hz, or s⁻¹) is the standard way to measure audio frequency. Typical human speech ranges between 50 Hz and 300 Hz. Most men typically range between 85 and 180 Hz, and most women between 165 and 255 Hz.

The half-life of code & the ship of Theseus

Erik Bernhardsson

As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a little thing to analyze Git projects, with help from the formidable GitPython project.

Business secrets from terrible people

Erik Bernhardsson

I get bored reading management books very easily and lately I’ve been reading about a wide range of almost arbitrary topics. One of the lenses I tend to read through is to see different management styles in different environments. It turns out that some truly f—ng horrific people have some smart management ideas. This is maybe not surprising. If you have some twisted goals, you can’t have incompetent leadership or you won’t get anywhere.

Learning from users faster using machine learning

Erik Bernhardsson

I had an interesting idea a few weeks ago, best explained through an example. Let’s say you’re running an e-commerce site (I kind of do) and you want to optimize the number of purchases. Let’s also say we try to learn as much as we can from users, both using A/B tests but also using just basic slicing and dicing of the data. We are looking at how many people convert (buy our widgets) but a constant problem is there’s just too much uncertainty. How can we learn faster?

Optimizing for iteration speed

Erik Bernhardsson

I’ve written before about the importance of iterating quickly but I didn’t necessarily talk about some concrete things you can do. When I’ve built up the tech team at Better, I’ve intentionally optimized for fast iteration speed above almost everything else. What are some ways we did that? Continuous deployment. My dubious claim is that we might be the only financial institution in the world to deploy continuously. I actually ended up getting quoted in the Economist about this specifically.

Toxic meeting culture

Erik Bernhardsson

I spent six years at a company that went from 50 people to 1500 and one contributing factor leading to my departure was that I went from a “maker” to a person stuck in meetings every day. It wasn’t that I wanted to do that, but everyone else kept dragging me into meetings. There’s about 47 million blog posts about why meetings suck and I’m not going to pile more onto that heap. For the record – a well run meeting is great!

The software engineering rule of 3

Erik Bernhardsson

Here’s a dumb extremely accurate rule I’m postulating* for software engineering projects: you need at least 3 examples before you solve the right problem. This is what I’ve noticed: Don’t factor out shared code between two classes. Wait until you have at least three. The two first attempts to solve a problem will fail because you misunderstood the problem. The third time it will work. Any attempt at being smart earlier will end up overfitting to coincidental patterns.

Waiting time, load factor, and queueing theory – why you need to cut your systems a bit of slack

Erik Bernhardsson

I’ve been reading up on operations research lately, including queueing theory. It started out as a way to understand the very complex mortgage process (I work at a mortgage startup) but it’s turned into my little hammer and now I see nails everywhere. One particular relationship that turns out to be somewhat more complex is the relationship between cycle time and throughput. Here are some examples of situations where this might apply: What’s a good CPU load for a database?
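A quick way to see why a bit of slack matters is the textbook M/M/1 queue (one server, random arrivals, exponentially distributed service times). That model is an assumption made here for illustration and may not be the one the post uses. In it, the mean time a job spends in the system is the service time divided by (1 - load).

```python
def mm1_time_in_system(load: float, service_time: float = 1.0) -> float:
    """Mean time a job spends waiting plus being served in an M/M/1 queue.

    load is the utilization (arrival rate / service rate), in [0, 1).
    """
    if not 0 <= load < 1:
        raise ValueError("load must be in [0, 1) for the queue to be stable")
    return service_time / (1.0 - load)


for load in (0.5, 0.8, 0.9, 0.99):
    print(f"load {load:.2f}: {mm1_time_in_system(load):.0f}x the service time")
```

At 50% load a job takes about twice its bare service time; at 90% it takes about ten times as long, and the curve keeps steepening as the load approaches 100%.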

Conversion rates – you are (most likely) computing them wrong

Erik Bernhardsson

How hard can it be to compute conversion rate? Take the total number of users that converted and divide it by the total number of users. Except… it’s a lot more complicated when you have any sort of significant time lag. Prelude – a story. Fresh out of school I joined Spotify as the first data analyst. One of my first projects was to understand conversion rates. Conversion rate from the free service to Premium is tricky because there’s a huge time lag.
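Here is a small sketch of why the naive ratio misleads when there is a time lag. It compares the naive rate with a "converted within N days" rate computed only over users who have been observed for the full window. This fixed-window cohort approach is just one possible correction and is an assumption of this example, not necessarily what the post recommends; the data and function names are made up.

```python
from datetime import date, timedelta
from typing import Optional

User = tuple[date, Optional[date]]  # (signup date, conversion date or None)


def naive_rate(users: list[User]) -> float:
    """Converted users divided by all users, ignoring how long each was observed."""
    return sum(1 for _, converted in users if converted is not None) / len(users)


def rate_within(users: list[User], window_days: int, today: date) -> float:
    """Share converting within window_days of signup, counting only users
    who signed up at least window_days before today."""
    window = timedelta(days=window_days)
    mature = [(s, c) for s, c in users if today - s >= window]
    if not mature:
        raise ValueError("no user has been observed for the full window yet")
    converted = sum(1 for s, c in mature if c is not None and c - s <= window)
    return converted / len(mature)


today = date(2024, 3, 1)
users = [
    (date(2024, 1, 1), date(2024, 1, 10)),  # converted after 9 days
    (date(2024, 1, 5), None),               # observed 56 days, never converted
    (date(2024, 2, 20), None),              # signed up too recently to judge
    (date(2024, 2, 25), None),              # signed up too recently to judge
]
print(naive_rate(users))                # 0.25
print(rate_within(users, 30, today))    # 0.5
```

The naive number is dragged down by recent signups who simply have not had time to convert yet.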

The number of letters in the word for each number

Erik Bernhardsson

Just for fun, I generated these graphs of the number of letters in the word for each number. I really spent about 10 minutes on this (ok…possibly also another 40 minutes tweaking the plots): More languages!! I love how Spanish has a few super compact words: “cien mil” for 100,000 for instance. Only seven letters, versus English “one hundred thousand” (18 letters). I don’t know much about French but I think they have some kind of weird system based on 20s. Which, by the way, Danish also has.

New approximate nearest neighbor benchmarks

Erik Bernhardsson

As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I’m the author of Annoy, a library with 3,500+ stars on GitHub as of today. It offers fast approximate search for nearest neighbors with the additional benefit that you can load data super fast from disk using mmap. I built it at Spotify to use for music recommendations where it’s still used to power millions (maybe billions) of music recommendations every day.

Pareto efficiency

Erik Bernhardsson

Pareto efficiency is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it let’s assume you only care about two factors: price and quality. We don’t know what you are willing to pay for quality – but we know that, all else being equal: The cheaper the better. The higher quality the better. This means we can rule out some TVs immediately.
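The "rule out some TVs immediately" step can be sketched in a few lines. The TV names, prices, and quality scores below are invented for illustration: a TV is ruled out if some other TV is at least as cheap and at least as good, and strictly better on one of the two.

```python
def pareto_front(tvs: dict[str, tuple[float, float]]) -> list[str]:
    """tvs maps name -> (price, quality). Return the names no other TV dominates."""

    def dominates(a: tuple[float, float], b: tuple[float, float]) -> bool:
        # a is no more expensive and no worse than b, and strictly better on one axis
        return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

    return [
        name
        for name, tv in tvs.items()
        if not any(dominates(other, tv) for other in tvs.values())
    ]


# Hypothetical TVs: (price in dollars, quality score out of 10).
tvs = {"A": (300, 6), "B": (500, 8), "C": (550, 7), "D": (900, 9)}
print(pareto_front(tvs))  # ['A', 'B', 'D']
```

TV C drops out because B is both cheaper and better. Choosing among A, B, and D still depends on how much you are willing to pay for quality.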

Functional programming is the libertarianism of software engineering

Erik Bernhardsson

This is a pretty dumb post, in which I argue that functional programming has a lot of the bad parts of libertarianism and a lot of the good parts: Both ideologies strive to eliminate [the] state. (ok, ok, dumb dad joke). Both ideologies are driven by a set of dogmatic axioms rather than a practical goal: Libertarianism wants to reduce the government because any involvement distorts free markets.

When machine learning matters

Erik Bernhardsson

I joined Spotify in 2008 to focus on machine learning and music recommendations. It’s easy to forget, but Spotify’s key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive. The other key differentiator was licensing – until early 2009 Spotify basically just had all kinds of weird stuff that employees had uploaded. In 2009 after a crazy amount of negotiation the music labels agreed to try it out as an experiment.

NYC subway math

Erik Bernhardsson

Apparently MTA (the company running the NYC subway) has a real-time API. My fascination for the subway takes autistic proportions and so obviously I had to analyze some of the data. The documentation is somewhat terrible, but here’s some relevant code for how to use the API:

    from google.transit import gtfs_realtime_pb2
    import urllib

    for feed_id in [1, 2, 11]:
        feed = gtfs_realtime_pb2.FeedMessage()
        response = urllib.urlopen('[link]%s&feed_id=%d' % (os.
