CTO Universe

Predicting solar eclipses with Python

Erik Bernhardsson

APRIL 7, 2024

As I am en route to see my first total solar eclipse, I was curious how hard it would be to compute eclipses in Python. It turns out, ignoring some minor coordinate system head-banging, I was able to get something half-decent working in a couple of hours. I didn't want to go deep on celestial mechanics, so I decided to leverage Python's fantastic ecosystem for everything.
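
The post uses Python's existing astronomy libraries for the hard part, which is computing where the Sun and Moon appear in the sky. What follows is a much cruder, hypothetical sketch of only the final geometric check. It assumes you already have the apparent right ascension and declination of both bodies as seen from the observer. An eclipse is visible when the angular separation of the two discs is smaller than the sum of their angular radii. It is total when the Moon's disc covers the Sun's entirely. The constants are approximate mean values, and every function name is made up for illustration. The Moon positions must be topocentric (as seen from the observer's location), since the Moon's parallax is roughly a degree.

```python
import math

# Illustrative, approximate constants: physical radii and mean distances in km.
SUN_RADIUS_KM = 696_000
SUN_DISTANCE_KM = 149_600_000
MOON_RADIUS_KM = 1_737
MOON_MEAN_DISTANCE_KM = 384_400


def angular_radius(radius_km: float, distance_km: float) -> float:
    """Apparent angular radius (radians) of a sphere seen from distance_km."""
    return math.asin(radius_km / distance_km)


def angular_separation(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle angle (radians) between two sky positions given in radians."""
    cos_sep = (math.sin(dec1) * math.sin(dec2)
               + math.cos(dec1) * math.cos(dec2) * math.cos(ra1 - ra2))
    # Clamp to [-1, 1] so rounding error can't push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_sep)))


def discs_overlap(sun, moon, moon_distance_km=MOON_MEAN_DISTANCE_KM) -> bool:
    """True if the Moon's disc touches the Sun's disc (some eclipse is visible)."""
    sep = angular_separation(*sun, *moon)
    r_sun = angular_radius(SUN_RADIUS_KM, SUN_DISTANCE_KM)
    r_moon = angular_radius(MOON_RADIUS_KM, moon_distance_km)
    return sep < r_sun + r_moon


def is_total(sun, moon, moon_distance_km=MOON_MEAN_DISTANCE_KM) -> bool:
    """True if the Moon's disc completely covers the Sun's disc."""
    sep = angular_separation(*sun, *moon)
    r_sun = angular_radius(SUN_RADIUS_KM, SUN_DISTANCE_KM)
    r_moon = angular_radius(MOON_RADIUS_KM, moon_distance_km)
    return sep + r_sun <= r_moon
```

At its mean distance, the Moon's disc is slightly smaller than the Sun's. As a result, `is_total` returns False even for perfect alignment (that case is an annular eclipse). Passing a closer Moon distance, such as 360,000 km, makes a total eclipse possible.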

Simple sabotage for software

Erik Bernhardsson

DECEMBER 12, 2023

The OSS (the CIA's predecessor) produced a fantastic book during the peak of World War 2 called Simple Sabotage. It laid out various ways for infiltrators to ruin the productivity of a company. Some of the advice is timeless, for instance the section about “General interference with Organizations and Production”: Insist on doing everything through “channels.” Never permit short-cuts to be taken in order to expedite decisions.

What I have been working on: Modal

Erik Bernhardsson

DECEMBER 6, 2022

Long story short: I'm working on a super cool tool called Modal. Please check it out — it lets you run things in the cloud without having to think about infrastructure. Scaling out, scheduling, containerization, using GPUs, setting up webhooks, and all kinds of other stuff. It's primarily meant for data teams. We aren't quite live, but you can sign up for our waitlist.

We are still early with the cloud

Erik Bernhardsson

OCTOBER 18, 2022

This is in many respects a successor to a blog post I wrote last year about what I want from software infrastructure, but the ideas morphed in my head into something wider. The genesis: I encountered AWS in 2006 or 2007 and remember thinking that it was crazy. Why would anyone want to put their stuff in someone else's data center? But only a couple of years later, I was running a bunch of stuff on top of AWS.

?-driven project management: when is the optimal time to give up?

Erik Bernhardsson

APRIL 4, 2022

Hi! It's your friendly project management theoretician. You might remember me from blog posts such as Why software projects take longer than you think, which I wrote a long time ago, positing that software project completion times follow a log-normal distribution. Here's a bit of a refresher in case you don't want to re-read that whole post. What does it mean that project completion time has a log-normal distribution?
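
A quantity is log-normal when its logarithm is normally distributed. The minimal simulation below assumes that the blowup factor (actual time divided by estimated time) is log-normal with μ = 0 and σ = 1. Both values are hypothetical. The median blowup lands near 1, so the typical project looks on time. The long right tail, however, pulls the mean up to roughly exp(σ²/2) ≈ 1.65.

```python
import random
import statistics


def simulate_blowups(mu: float = 0.0, sigma: float = 1.0, n: int = 100_000, seed: int = 0):
    """Hypothetical ratios of actual / estimated time, drawn from a log-normal."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


blowups = simulate_blowups()
median = statistics.median(blowups)  # near exp(mu) = 1.0: estimates look right "on median"
mean = statistics.fmean(blowups)     # near exp(mu + sigma**2 / 2), about 1.65
print(f"median blowup {median:.2f}, mean blowup {mean:.2f}")
```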

Storm in the stratosphere: how the cloud will be reshuffled

Erik Bernhardsson

NOVEMBER 30, 2021

Here's a theory I have about cloud vendors (AWS, Azure, GCP): Cloud vendors will increasingly focus on the lowest layers in the stack: basically leasing capacity in their data centers through an API. Other pure-software providers will build all the stuff on top of it. Databases, running code, you name it. We currently have cloud vendors that offer end-to-end solutions from the developer experience down to the hardware: What if cloud vendors focus on the lowest layer, and other (pure software)

What is the right level of specialization? For data teams and anyone else.

Erik Bernhardsson

JULY 22, 2021

This isn't as much of a blog post as an elaboration of a tweet I posted the other day: “I think this specialization of data teams into 99 different roles (data scientist, data engineer, analytics engineer, ML engineer etc.) is generally a bad thing driven by the fact that tools are bad and too hard to use.” (Erik Bernhardsson, @fulhack, July 21, 2021)

What's Erik up to?

Erik Bernhardsson

APRIL 1, 2021

I joined Better in early 2015 because I thought the team was crazy enough to actually change one of the largest industries in the US. For six years, I ran the tech team, hiring 300+ people, probably doing 2,000+ interviews, and according to GitHub I added 646,941 lines of code and removed 339,164. But I also got married, had two kids, bought an apartment and renovated it!

Giving more tools to software engineers: the reorganization of the factory

Erik Bernhardsson

DECEMBER 15, 2020

It's a popular attitude among developers to rant about our tools and how broken things are. Maybe I'm an optimistic person, because my viewpoint is the complete opposite! I had my first job as a software engineer in 1999, and in the last two decades I've seen software engineering change in ways that have made us orders of magnitude more productive.

Developer experience as a competitive advantage

Erik Bernhardsson

OCTOBER 5, 2020

Mortality statistics and Sweden's "dry tinder" effect

Erik Bernhardsson

SEPTEMBER 22, 2020

We live in a year of about 350,000 amateur epidemiologists, and I have no desire to join that “club.” But I read something about COVID-19 deaths that I thought was interesting and wanted to see if I could replicate it through data. Basically, the claim is that Sweden had an exceptionally “good” year in 2019 in terms of influenza deaths, causing more deaths to be “overdue” in 2020.

Never attribute to stupidity that which is adequately explained by opportunity cost

Erik Bernhardsson

MARCH 9, 2020

Hanlon's razor is a classic aphorism I'm sure you have heard before: Never attribute to malice that which can be adequately explained by stupidity. I've found that when you don't understand why something is the way it is, the most common reason is neither malice nor stupidity. Instead, the root cause is probably just that they didn't have time yet.

How to hire smarter than the market: a toy model

Erik Bernhardsson

JANUARY 12, 2020

Let’s consider a toy model where you’re hiring for two things and those are equally valuable. It’s not very important what those are, so let’s just call them “thing A” and “thing B” for now. The assumption here is that each candidate’s abilities A and B are drawn from a 2D Gaussian with a mild positive correlation.
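
Here is a minimal sketch of sampling this toy model's candidate pool. It assumes standard normal abilities and a correlation of ρ = 0.3 as a stand-in for "mild", since the post's actual parameters aren't given here. Setting B = ρA + √(1 − ρ²)·Z, with Z an independent standard normal, gives a pair with unit variances and correlation ρ. `sample_candidates` and `correlation` are hypothetical helper names.

```python
import math
import random


def sample_candidates(n: int, rho: float, seed: int = 0):
    """Hypothetical pool: (A, B) from a standard 2D Gaussian with correlation rho."""
    rng = random.Random(seed)
    pool = []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        # Mixing in independent noise keeps Var(B) = 1 and gives corr(A, B) = rho.
        b = rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pool.append((a, b))
    return pool


def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


pool = sample_candidates(20_000, rho=0.3)
print(f"sample correlation: {correlation(pool):.3f}")
```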

Hiring always means tradeoffs

Erik Bernhardsson

DECEMBER 30, 2019

The title of this blog post makes a claim that seems pretty obvious, but there are some real “statistical” reasons why it’s true. Let’s consider a toy model where you’re hiring for two things and those are equally valuable. It’s not very important what those are, so let’s just call them “thing A” and “thing B”. Candidates have abilities with those things that are drawn randomly and independently from a distribution, and to
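
This setup produces a tradeoff through a selection effect, often called Berkson's paradox. This is one way to see it, not necessarily the post's exact argument. Even when A and B are independent across all candidates, hiring only the candidates with the highest A + B makes A and B negatively correlated among the people you hire. The hiring rule, the pool size and the 5% hire rate below are all hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def hired_correlation(n: int = 50_000, hire_fraction: float = 0.05, seed: int = 1) -> float:
    """Correlation between A and B among candidates hired on total score A + B."""
    rng = random.Random(seed)
    # A and B are drawn independently, as in the toy model.
    pool = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    # Hypothetical hiring rule: take the top fraction by A + B.
    pool.sort(key=lambda ab: ab[0] + ab[1], reverse=True)
    hired = pool[: int(n * hire_fraction)]
    return pearson([a for a, _ in hired], [b for _, b in hired])


print(f"corr(A, B) among hires: {hired_correlation():.2f}")
```

With `hire_fraction=1.0` (hire everyone), the correlation stays near zero. The negative correlation appears only once you select.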

What can startups learn from Koch Industries?

Erik Bernhardsson

DECEMBER 18, 2019

I recently finished the excellent book Kochland. This isn’t my first interest in Koch: I read The Science of Success by Charles Koch himself a couple of years ago. Charles Koch inherited a tiny company in 1967 and turned it into one of the world’s largest. That’s impressive! A quick disclaimer, just to get it out of the way.

We're hiring at Better

Erik Bernhardsson

DECEMBER 8, 2019

Just a quick note that my team is always hiring at Better. A lot of new people have been joining the team here in NYC lately—the tech team has actually grown from 35 to 60 in just ~3 months. We’re looking for mostly senior software engineers (5+ years work experience, possibly having managed in the past), although we would love to talk if you have less experience too!

Miscellaneous unsolicited (and possibly biased) career advice

Erik Bernhardsson

SEPTEMBER 25, 2019

No one asked for this, but I’m roughly 12 years into my career and have had my fair share of mistakes and luck, so I thought I’d share some. Honestly, I feel like I’ve mostly benefited from luck. Some of the things I did on a whim turned out to be excellent choices many years later. Some of the things were clear blind spots in hindsight.

Luigi: complex pipelines of tasks in Python

Erik Bernhardsson

OCTOBER 20, 2012

I’m shamelessly promoting my first major open source project. Luigi is a Python module that helps you build complex pipelines of batch jobs, handle dependency resolution, and create visualizations to help manage multiple workflows. It also comes with Hadoop support built in (because that’s really where its strength becomes clear). We use Luigi internally at Spotify to run thousands of tasks every day, organized in complex dependency graphs.
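
Luigi's own API isn't reproduced here. Instead, this is a stdlib-only sketch of the dependency-resolution idea. It assumes a hypothetical pipeline described as a dict that maps each task name to the tasks it requires. The sketch walks back from the target task and stops at tasks whose output already exists (Luigi similarly skips tasks it considers complete). It then orders what remains with `graphlib.TopologicalSorter` (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: task name -> set of task names it requires.
PIPELINE = {
    "fetch_logs": set(),
    "fetch_users": set(),
    "join_logs_users": {"fetch_logs", "fetch_users"},
    "aggregate_daily": {"join_logs_users"},
    "report": {"aggregate_daily"},
}


def tasks_to_run(target, deps, is_complete):
    """Return the incomplete tasks needed for `target`, dependencies first."""
    needed = {}
    stack = [target]
    while stack:
        task = stack.pop()
        if task in needed or is_complete(task):
            continue  # already scheduled, or its output already exists
        needed[task] = deps[task]
        stack.extend(deps[task])
    # Keep only edges between tasks that still need to run, then sort them.
    graph = {task: {d for d in reqs if d in needed} for task, reqs in needed.items()}
    return list(TopologicalSorter(graph).static_order())


done = {"fetch_logs"}  # pretend this task's output already exists
print(tasks_to_run("report", PIPELINE, lambda task: task in done))
```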

A neat little trick with time decay

Erik Bernhardsson

OCTOBER 28, 2012

Something that pops up pretty frequently is implementing time decay, especially when you have recursive chains of jobs. For instance, say you want to keep track of a popularity score. You calculate today’s output by reading yesterday’s output, discounting it by a decay factor, and then adding some hit count for today. Typically you choose the decay rate based on a time scale of a day, or something like that.
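
Here is a minimal sketch of that recursive job. It assumes exponential decay: each day's output is yesterday's scores multiplied by a per-day factor exp(−λ), plus today's hit counts. Choosing λ from a half-life is one convenient, hypothetical parameterization, and the function and key names are made up for illustration.

```python
import math


def daily_decay_factor(half_life_days: float) -> float:
    """Per-day multiplier exp(-lam), with lam derived from a (hypothetical) half-life."""
    lam = math.log(2) / half_life_days
    return math.exp(-lam)


def update_scores(yesterday: dict, todays_hits: dict, factor: float) -> dict:
    """One run of the recursive daily job: discount yesterday's scores, add today's hits."""
    keys = set(yesterday) | set(todays_hits)
    return {k: factor * yesterday.get(k, 0.0) + todays_hits.get(k, 0) for k in keys}


factor = daily_decay_factor(half_life_days=1.0)  # 0.5 per day
scores = {}
for hits in [{"song_a": 10}, {"song_b": 10}, {}]:
    scores = update_scores(scores, hits, factor)
print(scores)  # song_a has decayed twice, song_b once
```

Each run reads only yesterday's output. Even so, every score equals the sum of past hit counts weighted by factor^age, so you never need to store the full history.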

Tumblr’s awesome project names

Erik Bernhardsson

NOVEMBER 17, 2012

Not sure how I managed to miss this, but I’m watching this Tumblr presentation and they talk about their projects named after Arrested Development topics: Gob, Parmesan, Buster, Jetpants, Oscar, George and Motherboy. Still, the best software project name is probably Apple’s BHA.

Calculating cosine similarities using dimensionality reduction

Erik Bernhardsson

DECEMBER 4, 2012

This was posted on the Twitter Engineering blog a few days ago: Dimension Independent Similarity Computation (DISCO). I just glanced at the paper, and there’s some cool stuff going on from a theoretical perspective. What I’m curious about is why they didn’t decide to use dimensionality reduction to solve such a big problem. The benefit of this approach is that it scales much better (linear in input data size) and produces much better results.
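
The excerpt doesn't say which dimensionality reduction technique the post has in mind. A Gaussian random projection is a stand-in that is easy to show in a few lines, though it is a different technique. It projects high-dimensional vectors down to k dimensions with a random matrix. Cosine similarity is then computed in the small space, where each pair costs O(k) instead of O(original dimension). The estimate is approximate, with an error that shrinks roughly like 1/√k. The dimensions and helper names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Exact cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))


def random_projection(dim_in: int, dim_out: int, seed: int = 0):
    """Gaussian random matrix, stored as dim_out rows of length dim_in."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_out)]


def project(matrix, v):
    """Map v into the low-dimensional space (matrix-vector product)."""
    return [sum(r * x for r, x in zip(row, v)) for row in matrix]


rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in range(2000)]
v = [x + rng.gauss(0.0, 1.0) for x in u]  # correlated with u, cosine around 0.7
P = random_projection(dim_in=2000, dim_out=300)
exact = cosine(u, v)
approx = cosine(project(P, u), project(P, v))
print(f"exact {exact:.3f}, after projection {approx:.3f}")
```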

Momentum and mean reversion might just be volatility bias

Erik Bernhardsson

JANUARY 12, 2013

The Economist just published an article called The best, the worst and the ugly. Looking at historical performance for mutual funds, they find strong support for momentum and mean reversion: picking the best or the worst fund over the previous five years gives great returns over the next five years. I think this is just confusion about the tradeoff between risk and reward.
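
To make the volatility argument concrete, here is a hypothetical simulation built on stated assumptions:

- Fund returns have no persistence at all.
- Each fund's volatility is uniform between 5% and 40% a year.
- Expected return is proportional to volatility (a Sharpe ratio of 0.5), so taking more risk earns a premium.

Ranking funds on five years of past returns, the best 5% and the worst 5% both skew toward volatile funds. Over the next five years, that group therefore beats the average fund, even though neither momentum nor mean reversion is built in.

```python
import math
import random


def simulate_funds(n_funds=10_000, years=5, sharpe=0.5, seed=0):
    """Hypothetical funds with no skill: a volatility-linked premium plus noise."""
    rng = random.Random(seed)
    funds = []
    for _ in range(n_funds):
        vol = rng.uniform(0.05, 0.40)  # annual volatility
        premium = sharpe * vol         # assumption: riskier funds earn more on average
        # Past and future are independent draws: no momentum, no mean reversion.
        past = rng.gauss(years * premium, math.sqrt(years) * vol)
        future = rng.gauss(years * premium, math.sqrt(years) * vol)
        funds.append((vol, past, future))
    return funds


def mean(values):
    values = list(values)
    return sum(values) / len(values)


funds = sorted(simulate_funds(), key=lambda f: f[1])  # rank by past return
k = len(funds) // 20                                  # worst 5% and best 5%
extremes = funds[:k] + funds[-k:]

future_all = mean(f[2] for f in funds)
future_extremes = mean(f[2] for f in extremes)
vol_all = mean(f[0] for f in funds)
vol_extremes = mean(f[0] for f in extremes)
print(f"future return: all {future_all:.3f}, best+worst {future_extremes:.3f}")
print(f"volatility:    all {vol_all:.3f}, best+worst {vol_extremes:.3f}")
```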

NYC Machine Learning meetup

Erik Bernhardsson

JANUARY 21, 2013

From the NYC Machine Learning talk I gave last week: I haven’t looked at it yet except briefly. Unfortunately, the quality isn’t the best.

Slides from NYC Machine Learning talk

Erik Bernhardsson

JANUARY 26, 2013

Slides from the talk, slightly edited because (a) some of the slides make little sense taken out of context and (b) Slideshare seems to have problems converting some of the stuff. Collaborative filtering at Spotify, from Erik Bernhardsson.

I’m featured in Mashable

Erik Bernhardsson

FEBRUARY 5, 2013

This article from today in Mashable describes some of the fun stuff I get to work with: Erik Bernhardsson is technical lead at Spotify, where he helped to build a music recommendation system based on large-scale machine learning algorithms, mainly matrix factorization of big matrices using Hadoop. He moved into this role after heading the Business Intelligence team, where he collected, aggregated and made sense of all the data at Spotify, whether that’s ad-hoc insights, A/B testing, visualizatio

ML at Twitter

Erik Bernhardsson

FEBRUARY 26, 2013

I recently came across this paper describing how they do ML at Twitter. TL;DR: everything is a Pig workflow, and they implement everything as UDFs. This approach seems pretty interesting. As long as your data can be expressed as small atomic machine learning functions, I’m sure it works great. But there’s so much more than that.

More Luigi!

Erik Bernhardsson

MARCH 21, 2013

Elias Freider just talked about Luigi at PyData 2013. His presentation is much better than the one I put together a few weeks ago.

Annoy

Erik Bernhardsson

APRIL 11, 2013

Annoy is a simple package to find approximate nearest neighbors (ANN) that I just put on Github. I’m not trying to compete with existing packages, but Annoy has a couple of features that make it pretty useful. Most importantly, it uses very little memory and can put everything in a contiguous blob that you can mmap from disk. This way multiple processes can share the same index.
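
Annoy's README describes, roughly, building trees by repeatedly splitting the points with a hyperplane equidistant from two randomly sampled points. Below is a minimal single-tree sketch of that idea in pure Python. Real Annoy builds a forest of such trees, searches several branches, supports several metrics and stores everything in a flat, mmap-able file, none of which is shown here. All names are hypothetical.

```python
import random


def build_tree(points, indices, leaf_size, rng):
    """Split `indices` recursively by the hyperplane equidistant from two random points."""
    if len(indices) <= leaf_size:
        return indices  # leaf: a plain list of point indices
    i, j = rng.sample(indices, 2)
    p, q = points[i], points[j]
    normal = [a - b for a, b in zip(p, q)]
    offset = sum(n * (a + b) / 2 for n, a, b in zip(normal, p, q))  # plane through midpoint
    left, right = [], []
    for k in indices:
        side = sum(n * x for n, x in zip(normal, points[k])) - offset
        (left if side > 0 else right).append(k)
    if not left or not right:
        return indices  # degenerate split (e.g. duplicate points): stop here
    return (normal, offset,
            build_tree(points, left, leaf_size, rng),
            build_tree(points, right, leaf_size, rng))


def query(tree, points, v, n):
    """Descend to the leaf containing v, then rank that leaf by squared distance."""
    while isinstance(tree, tuple):
        normal, offset, left, right = tree
        side = sum(a * x for a, x in zip(normal, v)) - offset
        tree = left if side > 0 else right
    return sorted(tree, key=lambda k: sum((a - b) ** 2 for a, b in zip(points[k], v)))[:n]


rng = random.Random(42)
points = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(1000)]
tree = build_tree(points, list(range(len(points))), leaf_size=10, rng=rng)
print(query(tree, points, points[7], n=3))  # point 7 is its own nearest neighbor
```

Because each query only looks inside one leaf, it can miss true neighbors that landed on the other side of a split. That is the approximation Annoy trades for speed, and Annoy offsets it by using many trees.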

Being data driven

Erik Bernhardsson

APRIL 12, 2013

I picked up an issue of Foreign Affairs while flying back to NYC from SFO. It features a long interview with U.S. General Stanley McChrystal, and I thought some of the similarities between fighting a war and developing software were pretty striking. On cycle time, and how important it is to learn and integrate quickly: In 2003, in many cases we’d go after someone, we might locate them and capture or kill them, and it would be weeks until we took the intelligence we learn

Presentation about Luigi

Erik Bernhardsson

APRIL 25, 2013

I like the editing!

Stuff that bothers me: “100x faster than Hadoop”

Erik Bernhardsson

APRIL 26, 2013

The simple way to get featured on big data blogs these days seems to be: Build something that does one thing super well but nothing else. Benchmark it against Hadoop. Publish stats showing that it’s 100x faster than Hadoop. $$$. Spark claims they’re 100x faster than Hadoop, and there are a lot of stats showing Redshift is 10x faster than Hadoop. There are a bunch of papers with similar claims.

Snakebite

Erik Bernhardsson

MAY 6, 2013

Just promoting Spotify stuff here: check out the Snakebite repo on Github, written by Wouter de Bie. It’s a super fast tool for accessing HDFS from the command line or Python, by talking to the namenode directly over sockets/protobuf. Spotify’s developer blog features a nice blog post outlining what it’s useful for. I think this kicks ass, and there will definitely be some kind of Luigi integration coming up at some point.

Fermat’s principle

Erik Bernhardsson

MAY 20, 2013

I was browsing around on the Internet and the physics geek in me started reading about Fermat’s principle. And suddenly something came back to me that I’ve been trying to suppress for many years: I never understood why there’s anything fundamental about the principle of least time. The principle of least time states that light will travel from A to B in such a way that the travel time is minimized.
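
Here is what the principle of least time buys you. Numerically minimizing travel time across a flat boundary between two media reproduces Snell's law, sin θ₁ / v₁ = sin θ₂ / v₂, with each angle measured from the normal. The geometry and speeds below are hypothetical. The travel time is a sum of convex functions of the crossing point, so a simple ternary search finds its minimum.

```python
import math


def travel_time(x, a, b, d, v1, v2):
    """Time from (0, a) to (d, -b) when crossing the boundary y = 0 at (x, 0)."""
    return math.hypot(x, a) / v1 + math.hypot(d - x, b) / v2


def least_time_crossing(a, b, d, v1, v2, iters=200):
    """Ternary search for the crossing point that minimizes travel time."""
    lo, hi = 0.0, d
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if travel_time(m1, a, b, d, v1, v2) < travel_time(m2, a, b, d, v1, v2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


a, b, d, v1, v2 = 1.0, 1.0, 2.0, 1.0, 0.75  # light is slower below the boundary
x = least_time_crossing(a, b, d, v1, v2)
sin1 = x / math.hypot(x, a)            # sine of the angle from the normal, medium 1
sin2 = (d - x) / math.hypot(d - x, b)  # sine of the angle from the normal, medium 2
print(sin1 / v1, sin2 / v2)  # Snell's law: these should match
```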

Spotify’s Discovery page

Erik Bernhardsson

MAY 30, 2013

The Discovery page, the new start page in Spotify, is finally out to a fairly significant percentage of all users. Really happy, since we have worked on it for the past six months. Some cool features: artist/album/track recommendations based on stuff you’ve listened to before, new release recommendations, and concert recommendations.

Wikiphilia

Erik Bernhardsson

JUNE 1, 2013

I’ve been obsessed with Wikipedia for the past ten years. Occasionally I find some good articles worth sharing, and that’s why I created the wikiphilia Twitter handle. Just a long stream of stuff that for one reason or another may be interesting. It’s also a bunch of friends posting links. Anyway, the tragedy is that there are 800 tweets but only 70 followers, so you should follow it now.

NoDoc

Erik Bernhardsson

JUNE 15, 2013

We had an unconference at Spotify last Thursday and I added a semi-trolling, semi-serious topic about abolishing documentation. Or NoDoc, as I’m going to call this movement. This was meant to be mostly a thought experiment, but I don’t see it as complete madness. To be clear, I’m not talking about comments in the code here. I think those are great, and you should probably write more of them than you already do.

hdfs2cass

Erik Bernhardsson

JUNE 18, 2013

I just open sourced hdfs2cass, a Hadoop job (written in Java) for efficient Cassandra bulk loading. The nice thing is that it queries Cassandra for its topology and uses that to partition the data so that each reducer can upload data directly to a Cassandra node. It also builds SSTables locally, etc. I’m not an expert on Cassandra, so I’ll stop describing those parts before I embarrass myself.
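
This sketch does not reproduce hdfs2cass's Java code or Cassandra's actual partitioners. It is a hypothetical Python sketch of the partitioning idea. Each node owns a range of a token ring, namely the range from the previous node's token up to and including its own. Hashing a row key to a token therefore tells you which node, and so which reducer, the row belongs to. The MD5-based token function, the node names and the tokens are assumptions. A real job would read the ring from the cluster and use the cluster's own partitioner.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hypothetical token function: first 8 bytes of the key's MD5 digest."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Each node owns the range (previous node's token, its own token]."""

    def __init__(self, node_tokens: dict):
        pairs = sorted(node_tokens.items(), key=lambda item: item[1])
        self.nodes = [node for node, _ in pairs]
        self.tokens = [tok for _, tok in pairs]

    def owner(self, key: str) -> str:
        i = bisect.bisect_left(self.tokens, token(key))  # first node token >= key token
        return self.nodes[i % len(self.nodes)]           # wrap around past the last node


def partition(keys, ring: TokenRing) -> dict:
    """Group keys by owning node, e.g. so that one reducer handles one node's data."""
    buckets = {node: [] for node in ring.nodes}
    for key in keys:
        buckets[ring.owner(key)].append(key)
    return buckets


ring = TokenRing({"node-a": 0, "node-b": 2**62, "node-c": 2**63})
buckets = partition([f"user-{i}" for i in range(1000)], ring)
print({node: len(keys) for node, keys in buckets.items()})
```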

More Luigi!

Erik Bernhardsson

JUNE 25, 2013

Continuing in the same spirit of shameless self-promotion, here’s some recent Luigi press: a Reddit thread; A Guide to Python Frameworks for Hadoop (slides from the NYC Hadoop User Group); and this presentation from the Open Analytics NYC meetup about how Foursquare uses Luigi. Luigi is also in the middle of a pretty massive refactoring of the visualizer. David Whiting at Spotify just ripped out the old visualizer (based on Graphviz) and replaced it with one based on D3.

Optimizing over multinomial distributions

Erik Bernhardsson

JULY 23, 2013

Sometimes you have to maximize some function $f(w_1, w_2, \ldots, w_n)$ where $\sum_i w_i = 1$ and $w_i \ge 0$. Usually, $f$ is concave and differentiable, so there’s one unique global maximum and you can solve it by applying gradient ascent. The presence of the constraint makes it a little tricky, but we can solve it using the method of Lagrange multipliers. In particular, since the surface $\sum_i w_i = 1$ has the normal $(1, 1, \ldots, 1)$, the following optimization procedure works: Go one step in the direction of the gradient.
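
Here is a minimal sketch of one such projected-gradient procedure, filling in steps the excerpt cuts off with assumptions. After each gradient step, the sketch removes the gradient's component along the normal (1, ..., 1) of the plane Σ wᵢ = 1; subtracting the gradient's mean does exactly that. It then clips negative weights and renormalizes. The clip-and-renormalize step, the step size and the iteration count are all assumptions. The test objective f(w) = Σ aᵢ log wᵢ is hypothetical too. Its maximum on the simplex is wᵢ = aᵢ / Σ a, which makes it easy to check.

```python
def maximize_on_simplex(grad_f, n, step=0.01, iters=5000):
    """Projected gradient ascent over {w : sum(w) == 1, w >= 0}."""
    w = [1.0 / n] * n  # start at the uniform distribution
    for _ in range(iters):
        g = grad_f(w)
        mean_g = sum(g) / n
        # Subtracting the mean removes the component along the normal (1, ..., 1),
        # so the step stays on the plane sum(w) == 1.
        w = [wi + step * (gi - mean_g) for wi, gi in zip(w, g)]
        # Assumed handling of the non-negativity constraint: clip, then renormalize.
        w = [max(wi, 1e-12) for wi in w]
        total = sum(w)
        w = [wi / total for wi in w]
    return w


# Hypothetical concave objective f(w) = sum(a_i * log(w_i)), with gradient a_i / w_i.
a = [1.0, 2.0, 3.0]
w = maximize_on_simplex(lambda w: [ai / wi for ai, wi in zip(a, w)], n=len(a))
print([round(wi, 4) for wi in w])  # should approach a_i / sum(a)
```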

More Luigi: Presentation from OSCON

Erik Bernhardsson

JULY 26, 2013

I was in Portland, OR for a few days hanging out at OSCON. It was fun. I also talked a bit about Luigi. Next week I’m presenting at the NYC Predictive Analytics meetup together with Blake Shaw from Foursquare. The topic is ML + Hadoop. Will be fun!

HubSpot’s Picture Shows how to Maintain Monocultures in the 21st Century

Erik Bernhardsson

JULY 27, 2013

I thought this article about the company culture at HubSpot was kind of funny: “HubSpot’s Awesome Presentation Shows how to Create a 21st Century Culture”. Just FYI: You’re not different. You’re a bunch of white hipsters aged 25-30 dressed up in the same theme. That’s not being different. On a more serious note, this represents one of the most challenging aspects of scaling a company culture.

ML+Hadoop at NYC Predictive Analytics

Erik Bernhardsson

AUGUST 2, 2013

I was just at the NYC Predictive Analytics meetup talking about how we build machine learning algorithms using Hadoop to power music recommendations. It was a great meetup with two speakers: me and Blake Shaw from Foursquare. Blake talked about how they use machine learning at Foursquare, using Hadoop (and Luigi), and he uploaded his slides. Here’s the full video for the talk (both mine and Blake’s).

Delivering Music Recommendations

Erik Bernhardsson

AUGUST 8, 2013

I’ve turned into a lazy bastard and I’m just posting presentations on this blog, but here’s one from Rohan Singh at Spotify talking about the backend infrastructure of the Discover page.

2D embedding of 5k artists = WIN

Erik Bernhardsson

AUGUST 10, 2013

I’m at KDD in Chicago for a few days. We have a Spotify booth tomorrow, and I wanted to put together some cool graphics to show. I’ve been thinking about doing a 2D embedding of the top artists ever since I read about t-SNE and other papers, so this was a perfect opportunity to spend some time on it. So I spent a couple of hours taking the lower-dimensional representation of all artists, plugging it into the C++ implementation they provide, then using matplotlib to render something cool.
