Norvig’s claim that programming competitions correlate negatively with being good on the job

Erik Bernhardsson

I saw a bunch of tweets over the weekend about Peter Norvig claiming there’s a negative correlation between being good at programming competitions and being good at the job. There were some decent Hacker News comments on it. Norvig’s statement is obviously not true if we’re drawing samples from the general population – most people can’t code.

There Will Be Cyberwar: How The Move To Network-Centric War Fighting Has Set The Stage by Richard Stiennon


With new technology come new problems. Oftentimes, with the increasing demand for the latest and greatest tech, security is more of an afterthought. A consistent theme has been developing the next best technology first and only then figuring out how to protect it.

Common Probability Distributions: The Data Scientist’s Crib Sheet

Cloudera Engineering

Data scientists have hundreds of probability distributions from which to choose. Where to start? Data science, whatever it may be, remains a big deal. “A data scientist is better at statistics than any software engineer,” you may overhear a pundit say, at your local tech get-togethers and hackathons. The applied mathematicians have their revenge, because statistics hasn’t been this talked-about since the roaring 20s.

What’s Coming Next in Digital and Social in the Enterprise?

Dion Hinchcliffe's Web 2.0 Blog

I’ve been taking a close look at what’s over the enterprise horizon for much of the year as the pace of technology change continues to accelerate, as most experts have long predicted, a trend that will only continue.

The 5 Levels of Analytics Maturity

Pitfalls of Agile Transformations


“We are a conservative company, so we are just starting our agile transformation,” the manager told me. “But we expect big things from it: faster delivery, easier recruiting, happier customers.” “Interesting objectives,” I thought to myself. “Something I might have heard ten years ago.”

Leaving Spotify

Erik Bernhardsson

February 6 was my last day at Spotify. In total I spent more than six years at Spotify and it was an amazing experience. I joined Spotify in Stockholm in 2008, mainly because a bunch of friends from programming competitions had joined already. Their goal to change music consumption seemed ridiculous at that point, but six years later I think it’s safe to say they actually succeeded. Back in the early days, my job was to do almost anything related to data.

Better precision and faster index building in Annoy

Erik Bernhardsson

Sometimes you have these awesome insights. A few days ago I got an idea for how to improve index building in Annoy. For anyone who isn’t acquainted with Annoy – it’s a C++ library with Python bindings that provides fast high-dimensional nearest neighbor search. Annoy recursively builds up a tree given a set of points. The algorithm so far was: at every level, pick a random hyperplane out of all possible hyperplanes that intersect the convex hull given by the point set.
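The splitting step described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Annoy's actual code: the function name `random_hyperplane_split`, the Gaussian sampling of the normal, and the uniform choice of offset are all assumptions made for the example.

```python
import random


def random_hyperplane_split(points, rng):
    """Split points with a random hyperplane that crosses their convex hull.

    Hypothetical helper for illustration; not part of the Annoy API.
    """
    dim = len(points[0])
    # Assumption: draw the normal from a Gaussian to get a random orientation.
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Project every point onto the normal direction.
    projections = [sum(n * x for n, x in zip(normal, p)) for p in points]
    lo, hi = min(projections), max(projections)
    # Any offset between the extreme projections gives a hyperplane
    # that intersects the convex hull of the point set.
    offset = rng.uniform(lo, hi)
    left = [p for p, s in zip(points, projections) if s <= offset]
    right = [p for p, s in zip(points, projections) if s > offset]
    return normal, offset, left, right


rng = random.Random(0)
points = [[rng.random() for _ in range(5)] for _ in range(100)]
normal, offset, left, right = random_hyperplane_split(points, rng)
print(len(left), len(right))
```

Applying the same split recursively to `left` and `right` until each side is small would give a tree of the kind the post describes.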

Benchmark of Approximate Nearest Neighbor libraries

Erik Bernhardsson

Annoy is a library written by me that supports fast approximate nearest neighbor queries. Say you have a high (1-1000) dimensional space with points in it, and you want to find the nearest neighbors to some point. Annoy gives you a way to do this very quickly. It could be points on a map, but also word vectors in a latent semantic representation or latent item vectors in collaborative filtering.

List of Cyber Threat “Wake-Up Calls” Growing: Policy makers have been hitting the snooze button since 1970


The list below is an update to our reference of "Cyber Security Wake-Up Calls." What does it take to be on the list? Generally each of the events below was so significant policy makers were loudly proclaiming to all who would listen that they were a wake-up call.

Time To Spread The Word on Internet of Things Dangers: Read what FBI and DHS Cyber Centers Need Us All To Know


The DHS National Cybersecurity and Communications Integration Center (NCCIC) is playing an increasingly important role in collaborating across multiple sectors of the economy and across government in sharing important advisories and alerts.

Prepare for The Cyber Threat: What Executives Need to Know to Manage Risk


By Matt Southmayd. Cybersecurity is one of the most high-profile topics for organizations today and one of their biggest sources of risk. Numerous recent incidents have heightened awareness of and sensitivity to this risk, and have made it even more critical that they assess their cyber readiness.

Nearest neighbors and vector models – epilogue – curse of dimensionality

Erik Bernhardsson

This is another post based on my talk at NYC Machine Learning. The previous two parts covered most of the interesting parts, but there are still some topics left to be discussed. To go back and read the meaty stuff, check out Part 1 (What are vector models useful for?) and Part 2 (How to search in high dimensional spaces – algorithms and data structures). You should also check out the slides and the video if you’re interested. Anyway, let’s talk about the curse of dimensionality today.

Introduction to HDFS Erasure Coding in Apache Hadoop

Cloudera Engineering

Erasure coding, a new feature in HDFS, can reduce storage overhead by approximately 50% compared to replication while maintaining the same durability guarantees. This post explains how it works. HDFS by default replicates each block three times.
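A quick back-of-the-envelope check shows where a figure of roughly 50% can come from. The sketch below assumes a Reed-Solomon layout with 6 data blocks and 3 parity blocks; that layout is an assumption made for the example and is not stated in the excerpt. The helper name `raw_storage_factor` is likewise hypothetical.

```python
def raw_storage_factor(data_units, redundancy_units):
    """Raw bytes stored per byte of user data (hypothetical helper)."""
    return (data_units + redundancy_units) / data_units


# Three-way replication: one original block plus two extra copies.
replication = raw_storage_factor(1, 2)

# Assumed erasure-coding layout: 6 data blocks plus 3 parity blocks.
erasure = raw_storage_factor(6, 3)

# Fraction of raw storage saved by erasure coding relative to replication.
saving = 1 - erasure / replication

print(replication, erasure, saving)
```

Under these assumptions, replication stores three raw bytes per byte of data while the erasure-coded layout stores one and a half, which is half the raw space.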

How Organizations Can Address the Challenges of Modern Digital Collaboration

Dion Hinchcliffe's Web 2.0 Blog

It’s now clear to me that we must take bold new steps if we are to truly improve the state of workforce collaboration in most organizations. As the majority of us are doing it today, digital collaboration is largely stuck in the doldrums.



One third of the fuel that goes into a car is spent overcoming friction. By comparison, an electric car loses half as much energy - one sixth - to friction. Who knew electric cars had such an advantage?

Big Data Comes To The Call Center

The Accidental Successful CIO

Call centers are a great place to start to use your big data tools.

The Sony Hack in Context


By Chris Mellon. The good news for the moment is that the North Korean attack on Sony Pictures is in the headlines and has the nation discussing cyber security issues. The bad news is that neither the press nor the government is placing the Sony attack in context.

The President Speaks At Hadoop World: Introduces DJ Patil as Nation’s First Chief Data Scientist


By Bob Gourley. Data science history was made on 19 Feb 2015. For the first time in the history of Hadoop World, the President of the United States gave a keynote. Dwell on that for a bit. This is huge. No matter what your politics is, you really have to agree, this is huge.

Google, like Samsung, is eavesdropping on your private conversations


If you use Google Chrome, you could be subject to eavesdropping by Google. Similar to what Samsung's TVs are doing, the Chromium browser listens to conversations in the vicinity of your laptop, PC, or tablet, and transmits them back to Google.

Please Help Spread The Word: IEEE Seeks Papers On Bio-inspired Cyber Security


Friend and CTOvision reader Sean Moore of Centripetal Networks is a proven engineer with experience developing technologies and leading tech-focused businesses. He is highly regarded for his mastery of network cyber security, IP communications technology and TCP/IP networking.

The Technology Related Content of the President’s State of the Union Address (software developers/coders mentioned for first time in any SOTU)


By Bob Gourley. The 2015 State of the Union address was full of technology-related content. Here are key takeaways from the perspective of a CTO: The President showed respect for something many Americans may not have heard of, but you, dear readers, know very well: Coding!

The Wisdom Of Carl Sagan On Science, Government, and Even Enterprise IT and Digital Risk


Of course you know Carl Sagan, the distinguished astronomer and great explainer of science via best-selling books and the TV series Cosmos. One of his last interviews was conducted by Charlie Rose in May 1996.

Interview with a Data Scientist: Erik Bernhardsson

Erik Bernhardsson

I was featured in Peadar Coyle’s interview series interviewing various “data scientists” – which is kind of arguable since (a) all the other ppl in that series are much cooler than me (b) I’m not really a data scientist.

Architectural Patterns for Near Real-Time Data Processing with Apache Hadoop

Cloudera Engineering

Evaluating which streaming architectural pattern is the best match to your use case is a precondition for a successful production deployment. The Apache Hadoop ecosystem has become a preferred platform for enterprises seeking to process and understand large-scale data in real time.

How Digital Collaboration is Fragmenting, and Why It’s a Major Opportunity

Dion Hinchcliffe's Web 2.0 Blog

A significant issue has been developing in digital collaboration for the last several years, and it’s now starting to become somewhat acute. I’m referring here to the pronounced trend towards app, environment, and channel fragmentation.

Lean Software Development: The Backstory


We were in a conference room near the Waterfront in Cape Town. “I just lost a crown from one of my teeth,” my husband Tom declared just before I was scheduled to open the conference. Someone at our table responded, “You’re lucky, Cape Town has some of the best dentists in the world.”

CIOs Need To Know How To Build Bridges

The Accidental Successful CIO

You’ll need a bridge to move from providing a service to being a partner. So here’s a question for you: based on the importance of information technology, what is IT’s role in your company?

The Megatrend of Cloud Computing: An update for technology decision-makers


There are seven key megatrends driving the future of enterprise IT. You can remember them all with the helpful mnemonic acronym CAMBRIC, which stands for Cloud Computing, Artificial Intelligence, Mobility, Big Data, Robotics, Internet of Things, CyberSecurity.

