
In the early 2000s, Amazon faced an existential threat: Speed up their pace of software engineering, or risk losing eminence in the burgeoning e-commerce landscape.

Recalling the period, Tom Killalea, Amazon’s then-VP of infrastructure and distributed systems, later wrote: “…we turned to Matt Round, an engineering leader who was a most interesting squeaky wheel in that his team appeared to get more done than any other, yet he remained deeply impatient and complained loudly and with great clarity about how hard it was to get anything done.”

Matt Round was not singularly responsible for the cloud — no one was. But Matt was instrumental in innovations that led to the cloud, including proving the efficiency of distributed storage and computing and pushing for “two-pizza teams” (engineering teams that could be fed with two pizzas, but more importantly, could work quickly).

In an hour-long interview, Matt talked with CloudZero about all this and more. You can hear Matt across the first three episodes of “Cloud Atlas: How The Cloud Reshaped Human Life.”

Check out our full interview with Matt below.

Listen to Cloud Atlas, Episode One, and subscribe to the series wherever you get your podcasts.

Dustin Lowman: Why don’t you start by introducing yourself, and saying some of the roles you held at Amazon.

Matt Round: My name is Matt Round, and I joined Amazon as a software engineer. I then managed a small team of software engineers, and then I directed Amazon’s Personalization Department, and finally was responsible for setting up their software operation in Scotland.

DL: How would you describe the Personalization Department?

MR: The Personalization Department at Amazon was responsible for everything in the website, and in the wider Amazon ecosystem that tailored itself specifically to who you were as a customer.

We used to think about trying to build a store for every customer. I guess it’s one of the distinctive advantages of being an online operation. For a physical store, everyone has to walk into the same shop.

But with a digital store you can reconfigure the store for every customer. Each customer can have a completely customized experience. And so our endgame goal was to deliver a store for every customer. They will find exactly what they want right at the front of the store, leading to serendipitous discovery of products deep in the catalog. Particularly as Amazon’s catalog grew over time, and we weren’t just selling books, but selling everything, navigating that massive selection became an increasingly substantial problem.

DL: Yeah, it’s convenient to talk about that, because my understanding is that the goal in your era of Amazon was to simulate the flexibility of an in-person shopping experience online. But it sounds like what you’re really saying is that it wasn’t just to meet the flexibility of in-person, but to exceed the flexibility and optionality of shopping in person. Is that accurate?

MR: Yeah, I think that’s part of it. I mean, Jeff’s a genius. Part of what Jeff recognized was the potential and the opportunity to go beyond that which is possible in the physical environment, and to deliver something even more compelling and more helpful to our customers.

DL: So give us sort of a SparkNotes of your career — overall, but especially the trajectory from your first professional experience to Amazon.

MR: So I studied computer science at university because I presumed it would probably get me a job at the end. I was thinking about astrophysics, or planetary physics, but it wasn’t quite so clear that those would go anywhere.

At that time in the U.K., it was quite common for a large company to sponsor you through university, and they would give you a bit of money in return for working for them in the summer. So I was sponsored by British Telecommunications, meaning I would spend my years in university and my summers working for British Telecom.

They have a huge research laboratory where all their geeky people get together and do cool things, and I got to spend my summers there — looking at speech recognition, which was pretty cutting-edge at the time. I worked on systems for their custom hardware that was intended to enable speech recognition in phone answering.

I remember very clearly in my second summer with British Telecom one of my bosses showing me Mosaic, which was one of the very first web browsers, and telling me that one day this web thing was gonna be really significant. But this was at the stage where there was, you know, a website that listed every website that existed on the internet, so you could click on them all. That was all there was.

I next moved to London and worked for British Telecom again, working on satellite internet (a long time before Starlink), and was then recruited by a hedge fund called D.E. Shaw. Jeff Bezos used to work for D.E. Shaw as well. I was part of their currency trading operation. I built trading system visualizations, and I got to sit next to the really clever people who did the smart mathematics that made them all the money, and I got to sit next to the traders who picked up the phones and went on fancy dinners. I was kind of the middle glue that stuck the two together, so not particularly technically complex, but great people to be around in a fascinating company, doing lots of interesting things.

I moved with D.E. Shaw to their New York office after the Asian financial crisis caused some trouble for our London operation, which was handling a lot of Japanese equities and derivatives. After working in the New York office for a bit, I had decided I was gonna move on.

I was writing to my friends about moving on, and one of the friends I wrote to was a guy called John Overdeck. John Overdeck had worked with Jeff and gone on to Amazon, and he was beginning to tell people to come and explore Amazon with him as well. I took a last-minute trip out to Seattle just before accepting a separate hedge fund job, interviewed with Amazon, and decided it’d be worth taking a risk on this crazy thing called the internet, just to see whether there was anything in it at all.

I joined Amazon as a programmer when there were about 400 programmers in the company. It’s not the super early days of Amazon, with Jeff on his knees in his garage, you know, packaging up books. It’s quite a substantial company. But it’s still very early in its evolution, and I joined the Personalization team which at the time was about 12 people, run by a guy called Josh Peterson.

Josh had this amazing creativity and business orientation, and could put the two together to have really creative ideas for products. And, he had a lot of software engineers supporting him to make those ideas real. It was a great team to be a part of.

DL: There’s a lot I want to dive into there. Maybe the most interesting thing for me as an onlooker is that you’ve been part of a lot of technologies that have become integral to the way people live and work. Speech recognition, software, the internet — what was it like to see these things in beta? Did you sense, as you were working on these things, that they would become pillars of civilization?

MR: I think it’s funny. Having seen a bunch of these technologies that have become so significant in our world, I’ve probably got very poor judgment, because they didn’t look very promising to me at the stage that I interacted with them. I really didn’t recognize the potential that was in them at all. I mean, they were quite disappointing at this stage.

You have to remember at that stage in British Telecom, we had custom hardware to do speech recognition, and we could recognize whether somebody had said “Yes” or “No” with something like 70% accuracy, so long as they didn’t have a strong accent. So it was reasonably ineffective at this stage. The web in the early days really did not look like it would ever kind of amount to anything. So although I saw them, I didn’t see the potential in them.

The truth is that that’s happened to me lots of times. I didn’t think [Amazon] Web Services was ever gonna be a significant part of anything. I can remember deriding it early on in my time as just a great way of giving away our crown jewels, because at that stage, it was primarily about enabling syndication of Amazon content on other websites.

One of the things we would give away was the “personalization similarities” dataset. People who bought Web Services also bought that. And we believed, with good reason, that that was incredibly valuable data, and we just didn’t understand why you want to give that away to anyone else.

But as it turns out, Web Services is quite a good idea. So I’ve got a good track record of missing the boat on key technologies.

DL: Well, you’re certainly not alone in that. So, another thing that interests me is hearing about your early life as a software engineer. I sometimes think about people doing the same profession at different points in history — take musicians for example. If you’re a troubadour in the mid-1800s versus the 1960s. In the 1960s, you can become the king of the world. In the mid-1800s I don’t think you really can.

It seems there’s very much a parallel there with software engineering. Nowadays, I feel like whenever a friend of mine doesn’t know what to do with their life, they become a software engineer. I imagine it has not always been that way. So, I mean, you were a software engineer at a very different historical moment, and I’m curious to hear what differences you see between the two moments.

MR: When I started out working for this big telecommunications company, our job was very much to deliver what the business people and the marketing people had promised. So, software was very much a kind of service function — not a leading function or an ideas function.

One of the crucial shifts that has happened is that technology is now leading rather than following, and at the center, rather than just being a service function. Again, one of the amazing sweet spots with Amazon is that it was so much a technology company doing business rather than a business company that tried to bolt on some technology. Jeff gets the credit for that; he always made Amazon a place where technology was the leader rather than the servant.

DL: It just strikes me that every major innovation or business event is driven by some kind of central technological innovation. Maybe that’s always been the case, but it seems especially easy, or especially frictionless today.

MR: Yeah, yes, definitely. We live in an age of opportunity where there is so much power and potential available to anyone. Now it’s the ideas that are in short supply. You can do almost anything you can imagine. It’s whether you can imagine something worth doing.

DL: So talk to me about the time when you’re interviewing with Amazon. How do you conceptualize the company at that point? It’s not only a digital bookstore by then. It’s trying to become something more, and indeed, succeeding in certain early goals of becoming something more. But how do you see it from the outside? And why is it appealing to you?

MR: So before I interviewed with Amazon, I think I’d only had one interaction with the company. I wasn’t, you know, a committed mega-user or anything like that. We’d had one failed delivery. We contacted customer service, and were really surprised to get an engaging reply from what appeared to be something like a human that addressed my problem. So the only experience I had was a surprising level of customer service. I hadn’t been particularly taken by the website or impressed by it at all.

When I arrived at Amazon to interview, the smell of chaos was in the air. It definitely felt like this was a company in a radical growth phase, with all the excitement and energy and complexity and difficulty that comes with that. It was clear to us that it was a time of opportunity, but it wasn’t at all clear in 1999 that Amazon was going to survive. Not at all clear.

I was definitely thinking, “This is a very high-risk move, but I’m pretty young — if this whole thing goes up in smoke, it won’t really matter.” I joined when the stock price, I think, was $65. I rode it down to the bottom, which was something like $4 pre-split.

DL: What was it like from a software development perspective? How did you see the mountain before you that you would have to climb as a software developer?

MR: This was a stage where there were no building blocks in the software engineering side of the universe at all. So basically everything that we did at Amazon, we had to do from absolute scratch. Nowadays, nobody in their right mind would think about building a website without some sort of templating language, some sort of way to rapidly construct reusable parts of the page, build links, find your way back to them, things like that. But nothing like that existed.

So, all of it was totally homegrown. If you think about being able to search a catalog of a million products, well, nobody had a catalog of a million products that you could search a thousand times a second. So these were theoretical problems that nobody else had run into. There were academic approaches to some of these solutions. But there weren’t commercialized versions.

In Personalization, we were using a system called BookMatcher that I think was actually a university product. It was a so-called “collaborative filtering technology,” which would connect you to customers like you, scan what they’d buy that you hadn’t bought, and recommend that you buy those things.
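To make the idea concrete, here is a minimal sketch of collaborative filtering in Python. It is not BookMatcher's or Amazon's actual algorithm: the customer names, the titles, and the overlap-count scoring are all assumptions chosen for illustration.

```python
from collections import Counter


def recommend(target, purchases, top_n=3):
    """Recommend items bought by customers whose purchases overlap with `target`."""
    mine = purchases[target]
    scores = Counter()
    for customer, items in purchases.items():
        if customer == target:
            continue
        overlap = len(mine & items)  # shared purchases act as a crude similarity
        if overlap == 0:
            continue  # ignore customers with nothing in common
        for item in items - mine:  # only items the target has not bought
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical purchase histories, keyed by customer.
purchases = {
    "alice": {"dune", "neuromancer", "foundation"},
    "bob": {"dune", "neuromancer", "snow crash"},
    "carol": {"dune", "hyperion", "snow crash"},
    "dave": {"cookbook"},
}

print(recommend("alice", purchases))  # ['snow crash', 'hyperion']
```

Note that every request loops over every customer. That is fine for four customers and hopeless for millions of customers asking thousands of times a second, which is the scale problem described next.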

The big problem with BookMatcher was that it was way too slow. It just couldn’t handle the scale of the dataset, the number of customers, the number of purchases, the number of items, but also the frequency with which we were asking the question, “What should I show to this customer?”

So you had these academic solutions to some of the problems. But we didn’t have industrial solutions, and we were building the industrial solutions to a bunch of these problems.

I don’t want to overplay it, but it did feel a bit like an Industrial Revolution type of thing. There are theoretical understandings of gears and cogs, and you know, compressors and pneumatics. But nobody’s built a train. There’s a toy engine on somebody’s desk in a university. But nobody’s actually built a train that takes a hundred people from London to somewhere else, and that is definitely what it felt like in the land of the web at the time.

When I joined, there was a mountain of code, and because the company had been building very, very fast, it wasn’t particularly tidy or well-organized. They hadn’t had the chance to learn about repeatable patterns. There’s lots of duplication, lots of different ways of doing nearly the same thing.

What we needed was something like apprenticeship. We needed somebody to take you in and show you how to hit the horseshoe, so you can hit the horseshoe as well. When there was nobody showing you, and it had to get done, you just did what you could.

I will say, it was really cool to be able to do work, show it to a million people the next day, and find out that it failed in some catastrophic way, or it didn’t work very well for the business, and then iterate quickly and try something else.

DL: What it makes me think about is the division between theory and practice. Academia is often criticized on the basis that their work may be interesting, but it has no practical value. What it sounds like here is that Amazon was sort of like the academy and foundry all at once. You’re doing things at the leading edge of technology, and also testing them, day-of, to see if they have any actual practical value.

MR: To be fair, not that many things we were doing were so complicated that they hadn’t been done before or couldn’t be done before. Most of the challenge came from scale. Things that work fine when you’ve got a relatively small user base no longer work when you’ve got a million customers, a million hits a day, ten million items in your catalog — that’s where things broke down, more than, “Nobody has ever thought before about how to make recommendations to people.”

DL: That seems to lead naturally into the monolith-to-microservice transition, that enables you and others to operate at web scale or at, you know, a non-geographically-limited scale. Maybe now is the time to get into that central innovation, and what it took to make that happen.

MR: Yeah, okay. Forgive me — this is a while ago now, so I might not get all the details right.

At the time we had four of what we called “onlines.” That means four computers that served the website. We’d been through this phase where we were just trying to get big as fast as we could. So, we had bought faster computers rather than trying to make our software better. And we’d reached the end of that road.

We were running DEC Alphas, which were the biggest, fastest things that you could easily buy at the time. We had these four beastly machines that would serve the website, and behind them we had a couple of massive databases that were probably some of the largest operational databases in the world. We were reaching the stage where we couldn’t access enough information quickly enough on these four front web servers to do what we wanted to do.

The Personalization team had done some creative problem-solving to understand how we could rapidly make some sort of customer-specific recommendation for hundreds and even thousands of people a second. We’d done this by pushing pre-digested information all the way up to these front web servers and then accessing it from those front web servers, but we’d reached the limit of the amount of information that we could keep on those front servers. There just wasn’t any more room.

We had this project called Eight is Not Enough, whose goal was to make it so that we were recommending new purchases based on more than your eight most recent purchases. The reason for this is, if you’d bought nine items, and they were all in, say, the “Harry Potter” series, which is what everyone would do at the time, eight of your nine purchases was all we could manage to keep, because we couldn’t store any more in this forward-deployed dataset.

Then we would almost always recommend you the last book, which you’d already bought. We knew perfectly well you’d bought it. We could have avoided recommending it. We just couldn’t get that information close enough to the front, and so we were trying to think about and look at ways to make more information more readily accessible.
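The failure mode can be sketched in a few lines of Python. This is an illustration only, not Amazon's code: the nine-title series, the `next_in_series` helper, and the assumption that the oldest purchase is the one that falls out of the front-end copy are all hypothetical.

```python
# A hypothetical nine-book series, in publication order.
SERIES = [f"book-{n}" for n in range(1, 10)]


def next_in_series(known_purchases):
    """Recommend the first series title missing from the history we can see."""
    for title in SERIES:
        if title not in known_purchases:
            return title
    return None  # the customer appears to own the whole series


history = list(SERIES)         # the customer bought all nine, oldest first
front_end_view = history[-8:]  # only the eight most recent fit on the web server

print(next_in_series(front_end_view))  # book-1, which the customer already owns
print(next_in_series(history))         # None: nothing left to recommend
```

The recommender logic is identical in both calls. The only difference is how much of the purchase history is within reach when the page is built.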

So I was toying with different ideas and different solutions focused on in-machine work. I was thinking about these fabulously expensive computers, doing all of our work inside those machines and getting it done there, and I was just trialing the idea of putting information on another machine and getting it across the network.

I thought, “There’s no way this is going to be competitive with work on the machine. There’s no way it can be faster than going to the secondary storage on that first machine. But I want to jump in on that.”

DL: When you’re talking about the difference between doing work within a server versus communicating across different ones, was this a technology you were inventing? Or was this a technology that existed that you were redeploying in an Amazonian context?

MR: The basic ability to talk between computers, that’s how the web works — that’s as old as the hills. But the idea of moving information somewhere else, and accessing it from somewhere else was novel commercially. That didn’t exist at all.

It seemed really implausible to me that it would be faster to go to another computer and get something from its memory, than to go from your memory down to your disk and bring it back. But in my first test, the performance was much better than I’d anticipated. It really looked like it might work.

I can’t quite remember how the decision process worked, but we decided we would try and build not just a single-use remote service, but a framework for building services that would take some information and put it on a secondary machine.

And so at that point I worked with a small team, and the four of us built what we called the “Iquitos Service Framework.” Amazon loves to name things after Amazon-y things. The Amazon River has a point in it, apparently where it is the widest, and flows through many different channels, called Iquitos. And, it has a narrowest point, where it all gathers together, called Obidos. Obidos was the name for our front web servers, and so we designed Iquitos to enable you to service-ize remote small pieces of data easily.

Then, we realized that we could do previously impossible things with this. We suddenly found we were able to remember the last eight pages every customer had visited in real time. So we added a feature built around recently viewed items and recommendations that had previously been impossible.

The way it used to work is every night we would download all the things everyone had purchased. We built that into a kind of specialized database, and then we pushed that specialized database to the front-end machines. It always took a day, but with this new framework, we were able to reflect what you’d done immediately. You couldn’t have accessed that information fast enough before. It created new opportunities and enabled us to solve Eight is Not Enough.

The first constraint on these four key web servers was, they ran out of memory. So, we had remoted some data from them to effectively expand the storage.

The next thing was, they ran out of computational capacity, and so then you could remote computation instead. From that, the idea of moving processing around arose.

At that point, we as a company quite rapidly started dividing all the work that was going to get done into chunks. We started revisiting our whole framework for how you build webpages, presuming that lots of this was going to be done on lots of different machines in lots of different places in parallel. This is around about the point that we started running into challenges around getting enough hardware deployed quickly enough that we could launch features.

We had that conversation around, “Wouldn’t it be great if there was a way to programmatically get additional server space,” which turned out to be a very, very good idea.

DL: I’m comparing it to a human’s own random access memory, versus something that they have to have their memory jogged on. And what it sounds like is that what this was doing was, you know, effectively making everything random access. So if you could remember every moment of your life just like that — it gave Amazon a sort of photographic memory, where it remembered everything instantly, and could deliver it at the drop of a hat.

MR: Yeah, that’s right. I mean, yeah, in principle, it made that unlimited. In reality, at this stage, we were still very limited by the number of computers you could rack, and how much they cost, how much memory you could put in them, and how many things you could connect on a network to one another.

The other way to think about it is, you get to the stage where you don’t need to have encyclopedic knowledge anymore. You can remember everything about the Napoleonic War. But when somebody asks you about the Civil War, you’re like, “Let me ask an expert who knows all that stuff and can tell me in a flash.”

So specialization is what’s going on. Rather than having one computer that has to do everything, we suddenly create the ability to have a whole network of specialists that you can ask, and they’re ready at a moment’s notice to answer specialized questions, and then you can put together the specialized answers and give a more holistic response.
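As a rough sketch of that pattern in Python: a page builder asks several specialists in parallel and merges their answers. The two specialist functions and their return values are hypothetical stand-ins; in a real system each would be a network call to a separate service, and nothing here reflects how Iquitos itself was built.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical specialists. Each would normally live on its own machine.
def recently_viewed(customer_id):
    return ["item-42", "item-7"]


def recommendations(customer_id):
    return ["item-99"]


def build_page(customer_id):
    """Ask every specialist at once, then combine the answers into one page."""
    specialists = {
        "recently_viewed": recently_viewed,
        "recommendations": recommendations,
    }
    with ThreadPoolExecutor() as pool:
        # Submit all requests up front so they run concurrently.
        futures = {
            name: pool.submit(fn, customer_id)
            for name, fn in specialists.items()
        }
        return {name: future.result() for name, future in futures.items()}


print(build_page("customer-1"))
```

Because the requests overlap, the page waits roughly as long as the slowest specialist rather than the sum of all of them.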

DL: The way that I came across your name was because of the six-pager you wrote where you likened Amazon to a tectonic plate rather than an F-16, which is, from a language perspective, a great turn of phrase. Where in the timeline does that comment fall?

MR: I can’t quite remember. I think we built this initial service framework pretty early on in my time at Amazon. So I think this would be, you know, 2000-2001 that we needed a solution to the memory limitations. That would have been pretty early. What started as a solution for my team, the Personalization team, I guess we relatively quickly realized was a more generally useful tool, and began kind of offering it to others as something that they could use.

I probably wrote about our poor maneuverability later on — that was more associated with the tension between the large, corporate-wide projects versus small, independently operating SWAT teams with specific areas of focus.

I don’t know if you’ve heard the term “two-pizza team.” It comes from a couple engineering leaders in Amazon who had created a small-scale, cross-functional team which had design resources, in-house marketing, project management software, and presentation-layer technology all in one team. And what that meant was you could build products, front to back, at a pace — and you could feed them with just two pizzas.

Nobody else at the company could do that, because they had to get organizational agreement from fifteen different groups, who all had their own agendas and their own set projects. So that cross-functional, front-to-back ownership is part of what would allow greater maneuverability.

Jeff Bezos wanted to expand that pattern and see if we could create more teams who were able to execute independently of one another, rather than get bottlenecked all the time, or locked up on process. We were in the midst of a giant technology transition from one kind of website platform to another, which was tying up a lot of resources. And there’s a lot of complexity and competition within the platform. I was trying to argue that we should concentrate on doing fewer things, but doing them better, so that we would maintain our maneuverability and our adaptability rather than getting big and slow.

DL: It strikes me, researching AWS and talking to people who were building it, that Amazon had recognized the opportunity of the cloud and committed the resources to it long before anybody else did. Is that accurate in your view? And if so, why do you think Amazon was so clearly the first mover?

MR: I think that Jeff deserves the credit for identifying it as an opportunity, and choosing to invest heavily in it. Which was good, because, you know, he had the authority to make those sorts of heavy investments. I wasn’t privy to all the discussions, but my understanding is that Jeff was the one who decided this was a significant area of opportunity, the next billion-dollar area.

I don’t think at the time there was anything in the way of competition at all in the commercial space. So I think Amazon really was ahead in this, and I think the choice of the first few services to productionize and release was very wise.

A scalable storage service like S3 is a brilliant product, and a breakthrough product in a new category. It’s so simple. Anyone could understand what S3 does. I think that’s part of how it got such wide adoption so quickly.

DL: Did you envision the cloud having the seismic impact on the business world that it ultimately did?

MR: I would definitely have been in the cynic camp. I would not have bet on this at all. Jeff has definitely beat me — by a lot — on his calls on what’s gonna win and what isn’t going to win. I’m amazed by the penetration, and how often I end up seeing S3 or AWS in the URL for something where I would not have had any idea that they had any part in it, whatsoever.

What the cloud really has done is enabled people with ideas to realize those ideas in a way that was previously impossible. Previously, you would have needed so much capital, so much infrastructure, to be able to get to market or deliver a proof of concept. AWS makes it possible to deliver at web-scale with so much less invested. It gives you the capability to scale rapidly up and down in a way that really is exceptional.

I think even today most people don’t see most of what is running on AWS most of the time. You simply don’t know just how fundamental this is to so many of the things you take for granted. If we were to take apart ten interesting things you do, I’d be shocked if nine out of ten of them didn’t rely on AWS or some similar cloud service without us having any idea that it’s happening. It’s quite an invisible, yet remarkably enabling technology.

It’s about what you can do from your garden shed. That’s what AWS is really about. From your shed, you can build something that will look like it is an industrial-grade international behemoth, and it’ll just be you and a few lines of code.

DL: And the challenge will be getting it to scale, to be what it appears to be.

MR: The problem will be growing it to 1,000 customers. At Amazon, we were continuously living in fear of Q4. In Q4, traffic spikes, we run out of compute power, and the site gets slow. Things go badly. Now, that’s not an experience that anyone is going to have anymore. That problem has gone away. New entrepreneurs have a different problem: How do you get so much load that you could have had a problem in the first place?

DL: Do you see any of AWS’s competitors ever posing a serious, existential threat to AWS?

MR: I think fundamentally a lot of what AWS is doing is going to become commodity, and as it becomes commodity, you’re much more exposed to competition. Then, it’s going to be a competition of scale driving down pricing or innovation in provisioning these things behind the scenes.

Conceptually, these building blocks are quite simple. I don’t think it’s a foregone conclusion that AWS will remain at the forefront of these things. If there were to be any sort of national security issue around AWS suddenly, you could see quite a substantial portion of users opting for something more business-grade. Or maybe there’s some sort of ethical question that arises over somebody they allow on the platform or disallow from the platform.

You could see a competitor taking advantage of that sort of positioning because they’re essentially commodity blocks that you’re offering. It’s not like nobody else could make a storage service to store files and retrieve them. Everybody else can make that service.

I don’t think it’s an easy-to-defend position. Their main defense is pricing, and pricing is achieved through scale. But then, maybe some fundamental technological shift changes the game on pricing, for instance if somebody finds a more energy-efficient way, because energy is money.

DL: Yeah, I mean, it’s always interesting to me that when something or someone gets so big or so ubiquitous, it seems like strength and vulnerability all in one. How do you, the dominator, maintain your strength and minimize your vulnerabilities?

MR: The risk is that you get greedy or you get lazy, right? You get greedy, and you charge more than you should, and you’re gonna get undercut. Your customers will find another supplier. It’s hard to stay lean, and it’s hard to stay customer-focused.

DL: And humble. My reference point is always creatives, and you certainly see that with very successful artists, they can fall victim to the spoils of success and lose track of what enabled that success in the first place.

Well Matt, we’re a little bit over time. But this has been great — I really appreciate you taking the time.

MR: No worries — brilliant, Dustin. Good to meet you.
