Erik Bernhardsson

The data team: a short story

Erik Bernhardsson

I guess I should really call this a parable. The backdrop is: you have been brought in to grow a tiny data team (~4 people) at a mid-stage startup (~$10M annual revenue). It's a made-up story based on n-th hand experiences (for n ≤ 3), and quite opinionated.

Software infrastructure 2.0: a wishlist

Erik Bernhardsson

Software infrastructure (by which I include everything ending with *aaS, or anything remotely similar to it) is an exciting field, in particular because (despite what the neo-luddites may say) it keeps getting better every year! I love working with something that moves so quickly.

Giving more tools to software engineers: the reorganization of the factory

Erik Bernhardsson

It's a popular attitude among developers to rant about our tools and how broken things are. Maybe I'm an optimistic person, because my viewpoint is the complete opposite!

What is the right level of specialization? For data teams and anyone else.

Erik Bernhardsson

This isn't as much of a blog post as an elaboration of a tweet I posted the other day: "I think this specialization of data teams into 99 different roles (data scientist, data engineer, analytics engineer, ML engineer etc) is generally a bad thing driven by the fact that tools are bad and too hard to use" (Erik Bernhardsson, @fulhack, July 21, 2021). This seems to have resonated with a lot of people, but for whatever reason, it ended up being a lot more polarizing than I thought!

Never attribute to stupidity that which is adequately explained by opportunity cost

Erik Bernhardsson

Hanlon's razor is a classic aphorism I'm sure you have heard before: Never attribute to malice that which can be adequately explained by stupidity. I've found that neither malice nor stupidity is the most common reason when you don't understand why something is a certain way.

How to hire smarter than the market: a toy model

Erik Bernhardsson

Let’s consider a toy model where you’re hiring for two things and that those are equally valuable. It’s not very important what those are, so let’s just call them “thing A” and “thing B” for now.

How to set compensation using commonsense principles

Erik Bernhardsson

Compensation has always been one of the most confusing parts of management to me. Getting it right is obviously extremely important.

Mortality statistics and Sweden's "dry tinder" effect

Erik Bernhardsson

We live in a year of about 350,000 amateur epidemiologists and I have no desire to join that “club”. But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicate it through data. Basically the claim is that Sweden had an exceptionally “good” year in 2019 in terms of influenza deaths, causing there to be more deaths “overdue” in 2020. This post is not an attempt to draw any scientific conclusions!

Developer experience as a competitive advantage

Erik Bernhardsson

Why software projects take longer than you think – a statistical model

Erik Bernhardsson

Anyone who has built software for a while knows that estimating how long something is going to take is hard. It’s hard to come up with an unbiased estimate of how long something will take when, fundamentally, the work itself is about solving something. One pet theory I’ve had for a really long time is that some of this is really just a statistical artifact. I suspect devs are actually decent at estimating the *median* time to complete a task. Planning is hard because they suck at the *average*.
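
To make the median-versus-average gap concrete, here is a minimal simulation sketch. It assumes the ratio of actual to estimated time follows a log-normal distribution with median 1; both the distribution and the spread `sigma` are assumptions of this sketch, not something the excerpt specifies.

```python
import math
import random
import statistics

def simulate_blowup_factors(n_tasks, sigma, seed=0):
    """Draw actual/estimated time ratios from a log-normal distribution.

    mu = 0 means the median ratio is exp(0) = 1, i.e. the estimate
    matches the median outcome. sigma is an assumed spread.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(0.0, sigma) for _ in range(n_tasks)]

sigma = 1.0  # assumed spread, chosen only for illustration
factors = simulate_blowup_factors(n_tasks=100_000, sigma=sigma)

median_factor = statistics.median(factors)
mean_factor = statistics.fmean(factors)
# Closed form for the mean of a log-normal: exp(mu + sigma^2 / 2)
theoretical_mean = math.exp(0.0 + sigma ** 2 / 2)

print(f"median blowup: {median_factor:.2f}")
print(f"mean blowup:   {mean_factor:.2f} (theory: {theoretical_mean:.2f})")
```

Under these assumptions a typical task lands near its estimate, while the total over many tasks overshoots because a few tasks blow up badly.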

The hacker's guide to uncertainty estimates

Erik Bernhardsson

It started with a tweet: "New years resolution: every plot I make during 2018 will contain uncertainty estimates" (Erik Bernhardsson, @fulhack, January 7, 2018). Because I’ve been sitting in 100,000,000 meetings where people endlessly debate whether the monthly number of widgets is going up or down, or whether widget method X is more productive than widget method Y. For almost any graph, quantifying the uncertainty seems useful, so I started trying.
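
One generic way to attach an uncertainty estimate to a number like "average monthly widgets" is the bootstrap. The sketch below is only an illustration of that technique, not necessarily the method used in the post; the function name and the widget counts are made up.

```python
import random
import statistics

def bootstrap_mean_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_idx = int((1 - confidence) / 2 * n_resamples)
    upper_idx = int((1 + confidence) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Hypothetical monthly widget counts, made up for illustration.
widgets = [102, 98, 110, 95, 120, 101, 99, 130, 97, 105, 108, 93]
low, high = bootstrap_mean_ci(widgets)
print(f"mean = {statistics.fmean(widgets):.1f}, 95% CI = [{low:.1f}, {high:.1f}]")
```

If the intervals for two months overlap heavily, the debate about whether the number went up or down is probably not worth a meeting.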

I don't want to learn your garbage query language

Erik Bernhardsson

This is a bit of a rant but I really don’t like software that invents its own query language. There’s a trillion different ORMs out there. Another trillion databases with their own query language. Another trillion SaaS products where the only way to query is to learn some random query DSL they made up. I just want my SQL back. It’s a language everyone understands, it’s been around since the seventies, and it’s reasonably standardized.

Headcount goals, feature factories, and when to hire those mythical 10x people

Erik Bernhardsson

Since I started building up a tech team for Better, I made a very conscious decision to pay at the high end to get people. I thought this made more sense: they cost a bit more money to hire, but their output usually more than compensates for it. Some fellow CTOs went for the other side of the spectrum: bootcamps and campus recruiting are great recruiting grounds for them. This was a mystery to me until it all made sense. What is output?

Interviewing is a noisy prediction problem

Erik Bernhardsson

I have done roughly 2,000 interviews in my life. When I started recruiting, I had so much confidence in my ability to assess people. Let me just throw a couple of algorithm questions at a candidate and then I’ll tell you if they are good or not! Over time I’ve come to the (slightly disappointing) realization that knowing who’s going to be good at their job is an extremely hard problem.

Data architecture vs backend architecture

Erik Bernhardsson

The eigenvector of "Why we moved from language X to language Y"

Erik Bernhardsson

I was reading yet another blog post titled “Why our team moved from <language X> to <language Y>” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N × N contingency table of moving from language X to language Y? "Someone should make a N*N contingency table of all engineering blog posts titled 'Why we moved from <language X> to <language Y>'" (Erik Bernhardsson, @fulhack, January 25, 2017). So I wrote a script for it.
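
As a rough illustration of what such a table looks like, here is a minimal sketch that counts (from, to) pairs. The list of moves is invented for the example, and the function name is hypothetical; this is not the author's script.

```python
from collections import Counter

def contingency_table(moves):
    """Build an N x N table where table[i][j] counts moves from
    languages[i] to languages[j]."""
    counts = Counter(moves)
    languages = sorted({lang for pair in moves for lang in pair})
    table = [[counts[(src, dst)] for dst in languages] for src in languages]
    return languages, table

# Hypothetical (from, to) pairs, made up for illustration.
moves = [
    ("Ruby", "Go"),
    ("Python", "Go"),
    ("Ruby", "Go"),
    ("Java", "Scala"),
    ("Go", "Rust"),
]

languages, table = contingency_table(moves)
print("from/to".ljust(8) + "".join(lang.ljust(8) for lang in languages))
for src, row in zip(languages, table):
    print(src.ljust(8) + "".join(str(n).ljust(8) for n in row))
```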

Language pitch

Erik Bernhardsson

Here’s a fun analysis that I did of the pitch (aka frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial differences between languages. Hertz (or Hz, or s⁻¹) is the standard way to measure audio frequency. Typical human speech ranges between 50 Hz and 300 Hz. Most men typically range between 85-180 Hz, and most women between 165-255 Hz.

New benchmarks for approximate nearest neighbors

Erik Bernhardsson

UPDATE(2018-06-17): There is a later blog post with newer benchmarks! One of my super nerdy interests is approximate algorithms for nearest neighbors in high-dimensional spaces. The problem is simple. You have, say, 1M points in some high-dimensional space. Now given a query point, can you find the nearest points out of the 1M set? Doing this fast turns out to be tricky. I’m the author of Annoy, which has more than 3,000 stars on GitHub.
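
For reference, the exact version of the problem can be solved by scanning every point, which is the slow baseline that approximate methods try to beat. The sketch below shows only that brute-force baseline, not Annoy's algorithm; the point count and dimensionality are arbitrary choices for illustration.

```python
import heapq
import math
import random

def nearest_neighbors(points, query, k):
    """Exact k-nearest-neighbor search by scanning every point.

    Returns the indices of the k closest points by Euclidean distance.
    Each query costs O(n * dim), which is what makes this slow at scale.
    """
    return heapq.nsmallest(
        k, range(len(points)), key=lambda i: math.dist(points[i], query)
    )

rng = random.Random(0)
dim = 16  # arbitrary dimensionality for the example
points = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(10_000)]

query = points[42]
neighbors = nearest_neighbors(points, query, k=5)
print(neighbors)  # the first index is 42, the query point itself
```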

The half-life of code & the ship of Theseus

Erik Bernhardsson

As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a little thing to analyze Git projects, with help from the formidable GitPython project.

Missing the point about microservices – it's about testing and deploying independently

Erik Bernhardsson

Ok, so I have to first preface this whole blog post by a few things: I really struggle with the term microservices. I can’t put my finger on exactly why. Maybe because the term is hopelessly ill-defined, maybe because it’s gotten picked up by the hype train. Whatever. But I have to stick to some type of terminology so let’s just roll with it. This blog post might be mildly controversial, but I’m throwing it out there because I’ve had this itchy feeling for so long and I can’t get rid of it.

The software engineering rule of 3

Erik Bernhardsson

Here’s a dumb extremely accurate rule I’m postulating* for software engineering projects: you need at least 3 examples before you solve the right problem. This is what I’ve noticed: Don’t factor out shared code between two classes. Wait until you have at least three. The first two attempts to solve a problem will fail because you misunderstood the problem. The third time it will work. Any attempt at being smart earlier will end up overfitting to coincidental patterns.

Toxic meeting culture

Erik Bernhardsson

I spent six years at a company that went from 50 people to 1500 and one contributing factor leading to my departure was that I went from a “maker” to a person stuck in meetings every day. It wasn’t that I wanted to do that, but everyone else kept dragging me into meetings. There are about 47 million blog posts about why meetings suck and I’m not going to pile more onto that heap. For the record – a well-run meeting is great!

Miscellaneous unsolicited (and possibly biased) career advice

Erik Bernhardsson

No one asked for this, but I’m something like ~12 years into my career and have had my fair share of mistakes and luck so I thought I’d share some. Honestly, I feel like I’ve mostly benefitted from luck. Some of the things I did on a whim turned out to be excellent choices many years later. Some of the things were clear blind spots in hindsight. If I could give my 12 years younger self a bunch of career advice, here are some of those things. Choosing a company.

The number of letters in the word for each number

Erik Bernhardsson

Just for fun, I generated these graphs of the number of letters in the word for each number. I really spent about 10 minutes on this (ok…possibly also another 40 minutes tweaking the plots): More languages!! I love how Spanish has a few super compact words: “cien mil” for 100,000 for instance. Only eight letters, versus English “one hundred thousand” (20 letters). I don’t know much about French but I think they have some kind of weird system based on 20s. Which, by the way, Danish also has.
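
A quick way to check counts like these, as a minimal sketch (the helper names are made up for illustration). Note that the figures quoted above, 8 and 20, correspond to counting the spaces as well; counting letters only gives 7 and 18.

```python
def count_letters(phrase):
    """Count alphabetic characters only, ignoring spaces."""
    return sum(ch.isalpha() for ch in phrase)

def count_characters(phrase):
    """Count every character, including the spaces between words."""
    return len(phrase)

for phrase in ("cien mil", "one hundred thousand"):
    print(
        f"{phrase!r}: {count_letters(phrase)} letters, "
        f"{count_characters(phrase)} characters including spaces"
    )
```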

Learning from users faster using machine learning

Erik Bernhardsson

I had an interesting idea a few weeks ago, best explained through an example. Let’s say you’re running an e-commerce site (I kind of do) and you want to optimize the number of purchases. Let’s also say we try to learn as much as we can from users, both using A/B tests and using just basic slicing and dicing of the data. We are looking at how many people convert (buy our widgets) but a constant problem is that there’s just too much uncertainty. How can we learn faster?

Business secrets from terrible people

Erik Bernhardsson

I get bored reading management books very easily and lately I’ve been reading about a wide range of almost arbitrary topics. One of the lenses I tend to read through is to see different management styles in different environments. It turns out that some truly f—ng horrific people have some smart management ideas. This is maybe not surprising. If you have some twisted goals, you can’t have incompetent leadership or you won’t get anywhere.

Optimizing for iteration speed

Erik Bernhardsson

I’ve written before about the importance of iterating quickly but I didn’t necessarily talk about some concrete things you can do. When I’ve built up the tech team at Better, I’ve intentionally optimized for fast iteration speed above almost everything else. What are some ways we did that? Continuous deployment. My dubious claim is that we might be the only financial institution in the world to deploy continuously. I actually ended up getting quoted in the Economist about this specifically.

Conversion rates – you are (most likely) computing them wrong

Erik Bernhardsson

How hard can it be to compute conversion rate? Take the total number of users that converted and divide it by the total number of users. Except… it’s a lot more complicated when you have any sort of significant time lag. Prelude – a story. Fresh out of school I joined Spotify as the first data analyst. One of my first projects was to understand conversion rates. Conversion rate from the free service to Premium is tricky because there’s a huge time lag.
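
To see why the time lag matters, here is a minimal sketch comparing the naive ratio with one common alternative: conversion within a fixed window, computed only over users who have been observed for the whole window. This is an illustration of the problem, not necessarily the approach the post ends up recommending; the field names, dates, and 30-day window are all made up.

```python
from datetime import date, timedelta

def naive_conversion_rate(users):
    """Converted users divided by all users, ignoring how long each was observed."""
    converted = sum(1 for u in users if u["converted_on"] is not None)
    return converted / len(users)

def windowed_conversion_rate(users, window_days, as_of):
    """Share of users converting within `window_days` of signup, counting only
    users who signed up at least `window_days` before `as_of`."""
    window = timedelta(days=window_days)
    eligible = [u for u in users if as_of - u["signed_up_on"] >= window]
    if not eligible:
        return None
    converted = sum(
        1
        for u in eligible
        if u["converted_on"] is not None
        and u["converted_on"] - u["signed_up_on"] <= window
    )
    return converted / len(eligible)

# Hypothetical users, made up for illustration.
users = [
    {"signed_up_on": date(2024, 1, 1), "converted_on": date(2024, 1, 20)},
    {"signed_up_on": date(2024, 1, 5), "converted_on": None},
    {"signed_up_on": date(2024, 1, 10), "converted_on": date(2024, 3, 15)},
    {"signed_up_on": date(2024, 3, 20), "converted_on": None},  # too recent
    {"signed_up_on": date(2024, 3, 25), "converted_on": None},  # too recent
]

as_of = date(2024, 4, 1)
naive = naive_conversion_rate(users)
windowed = windowed_conversion_rate(users, window_days=30, as_of=as_of)
print(f"naive: {naive:.2f}, 30-day windowed: {windowed:.2f}")
```

The naive number mixes old users, who had months to convert, with recent ones who have barely had a chance, so it shifts whenever the signup mix changes.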

Waiting time, load factor, and queueing theory – why you need to cut your systems a bit of slack

Erik Bernhardsson

I’ve been reading up on operations research lately, including queueing theory. It started out as a way to understand the very complex mortgage process (I work at a mortgage startup) but it’s turned into my little hammer and now I see nails everywhere. One particular relationship that turns out to be somewhat more complex is the relationship between cycle time and throughput. Here are some examples of situations where this might apply: What’s a good CPU load for a database?
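
The relationship between load and waiting time can be sketched with the simplest textbook queue, M/M/1 (Poisson arrivals, exponential service times, one server). That model is an assumption of this sketch, not something the excerpt specifies, and the service rate below is hypothetical. For M/M/1, the mean time a job spends in the system is 1 / (service rate - arrival rate).

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Mean time a job spends in an M/M/1 system (waiting plus service)."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)

service_rate = 10.0  # jobs per second, hypothetical
for load in (0.5, 0.8, 0.9, 0.95, 0.99):
    latency = mm1_time_in_system(load * service_rate, service_rate)
    print(f"load {load:.0%}: mean time in system {latency:.2f} s")
```

Under this model, going from 50% to 90% load multiplies the mean time in system by five, and it keeps climbing steeply as load approaches 100%, which is the argument for leaving some slack.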

Pareto efficiency

Erik Bernhardsson

Pareto efficiency is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it let’s assume you only care about two factors: price and quality. We don’t know what you are willing to pay for quality, but we know that, everything else being equal: The cheaper the better. The higher quality the better. This means we can rule out some TVs immediately.
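
The "rule out some TVs" step can be sketched as follows. The TV names, prices, and quality scores are invented for illustration; a TV is ruled out when some other TV is at least as cheap and at least as good, and strictly better on one of the two.

```python
def pareto_frontier(items):
    """Return the items that no other item dominates.

    Each item is (name, price, quality). Lower price is better,
    higher quality is better.
    """
    def dominates(a, b):
        no_worse = a[1] <= b[1] and a[2] >= b[2]
        strictly_better = a[1] < b[1] or a[2] > b[2]
        return no_worse and strictly_better

    return [b for b in items if not any(dominates(a, b) for a in items)]

# Hypothetical TVs: (name, price in dollars, quality score).
tvs = [
    ("A", 300, 6),
    ("B", 500, 8),
    ("C", 450, 5),  # worse than A on both price and quality
    ("D", 700, 8),  # same quality as B but more expensive
    ("E", 900, 9),
]

frontier = pareto_frontier(tvs)
print([name for name, _, _ in frontier])
```

What remains is the set of reasonable choices; picking among them depends on how much you are willing to pay for quality.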

New approximate nearest neighbor benchmarks

Erik Bernhardsson

As some of you may know, one of my side interests is approximate nearest neighbor algorithms. I’m the author of Annoy, a library with 3,500+ stars on GitHub as of today. It offers fast approximate search for nearest neighbors with the additional benefit that you can load data super fast from disk using mmap. I built it at Spotify to use for music recommendations where it’s still used to power millions (maybe billions) of music recommendations every day.
