Testing in Production 101

A top priority for developers is knowing that their new feature code will work in production before their user base has access to it. That’s why testing in production has been gaining popularity in the software development industry.

Developers need to know that they can safely deploy their code to production and have the power to specifically target internal teammates, test the functionality and validate the design and performance of the feature. They also need the flexibility to fix any bugs or defects that emerge. After all of these boxes have been checked, it’s time to turn on the feature flag and give your user base access to the new feature—already knowing that it works in production.

Testing in production provides the fast results that developers expect, but how do you get started? This step-by-step guide provides best practices and order of operations for what to implement to successfully test in production.

Benefits of Testing in Production

You wouldn’t let your users log into your staging environment and use your software that way, so why would you use dummy environments to test features before they’re released? Answer: That’s just the way it’s always been done. For years, the process has been to deploy code to staging, have the QA team test in staging and then deploy to production after testing. But when your staging test results don’t match your production results, what do you do? How do you tell the QA engineer who tested the feature in staging that it broke in production? More importantly, what’s the point of all the effort that was put into testing in staging if it doesn’t give you the same test results as production?

Feature flags let you avoid those issues by testing in production and confirming the feature will successfully function in the environment that it will ultimately live in.

The bottom line is that customers don’t care how a feature performs in a staging environment. They care that it works when they need to use it, and the only way to guarantee it will work in production is to test it in production.

Implementation Plan

Still not sure where to get started? The following plan lays out step-by-step instructions for the first 30, 60, and 90 days of your implementation.

First 30 Days

During the first 30 days, focus on project alignment and on learning the feature flag management tool you have chosen. Spend this time hammering out the details that will make testing in production work for your team. One vital part of that is revisiting the team’s automation framework to make sure it is easy to use and implement. The framework needs to support end-to-end tests and provide adequate reporting so the team knows the details when a test fails. If you skip this step, your automation framework will become an obstacle to testing in production and more of an annoyance than a benefit. Do not skip this step!

For those onboarding a new tool to manage your feature flagging and experimentation, the next step is to configure the tool to your needs. This can include setting up SSO, permissions, user creation and user maintenance. When these are complete, it’s time to implement the appropriate SDK.

These first 30 days are also a good time to set baseline metrics for benchmarking. These metrics can include things such as time to release, page load time, percentage of bugs in production versus staging, percentage of bugs found before release versus after release, etc. With these baseline metrics, you have a standard to which you can compare any changes, and once a feature has been released, you can quickly measure its performance and make any necessary process improvements.
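As a rough illustration of how two of these baselines could be computed, here is a minimal Python sketch. The BugCounts fields and the metric names are hypothetical; in practice the counts would come from whatever issue tracker your team uses.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BugCounts:
    """Hypothetical bug tallies for one benchmarking period."""
    found_in_staging: int
    found_in_production: int
    found_before_release: int
    found_after_release: int


def percentage(part: int, total: int) -> float:
    """Return part as a percentage of total, or 0.0 when total is zero."""
    return 0.0 if total == 0 else 100.0 * part / total


def baseline(counts: BugCounts) -> dict[str, float]:
    """Compute two of the baseline metrics mentioned in the text."""
    by_environment = counts.found_in_staging + counts.found_in_production
    by_timing = counts.found_before_release + counts.found_after_release
    return {
        "pct_bugs_in_production": percentage(counts.found_in_production, by_environment),
        "pct_bugs_after_release": percentage(counts.found_after_release, by_timing),
    }


if __name__ == "__main__":
    print(baseline(BugCounts(30, 10, 32, 8)))
```

Recomputing the same numbers after each release gives you a like-for-like comparison against the baseline.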

Days 30 to 60

During the next 30 days, prioritize setting up your feature flag management tool to mirror your current environment setup. For example, if you currently have Dev, Test, QA, UAT and Prod in your software development life cycle (SDLC), each of those should be reflected in your tool. Eventually, you will narrow this down to only Dev and Prod to match a true testing in production setup, but for now, the environments in your tool should match the environments your application already uses.

With your environment set up, you can grant segments of teams access to each environment in the ‘Allowlist’ section of your feature flag configuration. For example, if your product team currently validates features after releasing to production, you can add a segment for the product team and allow them to have access to the feature in production. This ensures the team will still have access to the feature even when the feature flag is off.
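The allowlist behavior described above can be sketched in a few lines of Python. This is a simplified model and not the API of any particular feature flag tool; the FeatureFlag class, its fields, and the segment names are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """Hypothetical flag: a default rule plus allowlisted segments."""
    name: str
    default_on: bool = False
    allowlist_segments: set[str] = field(default_factory=set)


def is_enabled(flag: FeatureFlag, user_segments: set[str]) -> bool:
    """Allowlisted segments see the feature even when the default rule is off."""
    if flag.allowlist_segments & user_segments:
        return True
    return flag.default_on


if __name__ == "__main__":
    flag = FeatureFlag("new_checkout", allowlist_segments={"product_team"})
    print(is_enabled(flag, {"product_team"}))  # allowlisted segment
    print(is_enabled(flag, {"customers"}))     # falls through to the default rule
```

Checking the allowlist before the default rule is what keeps the product team's access independent of whether the flag is on for everyone else.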

The next step is to differentiate test data from real data in production. One way to do this is to add a boolean to test entities in production, such as “is_test_user = true” for test users, with “is_test_user = false” set automatically for real production users. If you use business intelligence (BI) or monitoring tools such as Looker and Datadog, store all test-user activity in a separate database. This lets you make business decisions based on real user data rather than on data from automated tests in production.
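One way to picture the separation is a small routing step that runs before activity reaches your reporting store. The sketch below is in Python and assumes each event is a dictionary carrying the is_test_user boolean; the event shape and the split_events name are hypothetical.

```python
from __future__ import annotations

from typing import Any

Event = dict[str, Any]


def split_events(events: list[Event]) -> tuple[list[Event], list[Event]]:
    """Split events into (real, test) using the is_test_user boolean.

    Events without the field are treated as real, matching the idea that
    is_test_user defaults to false for production users.
    """
    real: list[Event] = []
    test: list[Event] = []
    for event in events:
        if event.get("is_test_user", False):
            test.append(event)
        else:
            real.append(event)
    return real, test


if __name__ == "__main__":
    events = [
        {"user": "alice", "action": "checkout", "is_test_user": False},
        {"user": "qa-bot", "action": "checkout", "is_test_user": True},
        {"user": "bob", "action": "login"},
    ]
    real, test = split_events(events)
    print(len(real), "real events;", len(test), "test events")
```

The real list would feed the database your dashboards read from, and the test list would go to the separate store for test activity.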

For this phase of the process, it’s also crucial that your team is abiding by the same definition of done. Your team must agree that a feature cannot be labeled “done” until the tests are running in production and the flag has been turned on for 100% of your users.

Days 60 to 90

You’ve reached the home stretch, and now the fun begins: your first real test in production. To start, deploy your first feature to production with the default rule off for safety. This ensures that only the targeted users will have access to the feature. Next, run your automation scripts in production with targeted test users, and run the regression suite to confirm that previously released features are not affected by your changes. With the default rule off and only your targeted team members able to access the feature, you will officially be testing in production. This is the time to resolve any bugs and validate that the feature functions properly. Remember that because end users do not yet have access to your feature, they will not be impacted if anything does go wrong.

After you’ve resolved the issues that appeared in your first test and you’re confident the feature will work properly, it’s time to use a canary release to open up the feature to 1% of your user base. Spend the following days monitoring error logs and growing your confidence in the feature until you feel it’s appropriate to increase the percentage of users who can access it. Once you reach 100% of users and you know without a doubt that the feature works, it’s time to turn on the default rule for the feature.
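Feature flag tools handle percentage rollouts for you, but the underlying idea can be sketched briefly. The Python example below uses deterministic hashing to assign each user a stable bucket from 0 to 99. This is one common approach and an assumption on our part, not a description of how any specific tool works; the function names are hypothetical.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0-99 for a given flag."""
    key = f"{flag_name}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % 100


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """Return True when the user's bucket falls inside the rollout percentage."""
    return bucket(flag_name, user_id) < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for percent in (1, 10, 50, 100):
        exposed = sum(in_rollout("new_checkout", u, percent) for u in users)
        print(f"{percent}% rollout reaches {exposed} of {len(users)} users")
```

Because a user's bucket never changes, anyone included at 1% stays included as the percentage grows, so no one sees the feature appear and then disappear during the ramp-up.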

By the time you reach the 90-day mark, you should have a regular test cadence. The next step is to work with your product team to determine which tests will run at which cadence. For example, you could schedule your lower-priority test suite to run nightly, while your high-priority test suite runs hourly. Be sure to set up alerts so that you are notified automatically when a test fails and can analyze the failure immediately.
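In practice, a CI system or scheduler would own this cadence, but the logic is simple enough to sketch in Python. The suite names, the intervals, and the alert message format below are hypothetical and only mirror the hourly and nightly example above.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Hypothetical cadence: high-priority suite hourly, low-priority suite nightly.
CADENCE = {
    "high_priority": timedelta(hours=1),
    "low_priority": timedelta(days=1),
}


def due_suites(last_run: dict[str, datetime], now: datetime) -> list[str]:
    """Return the suites that have never run or whose interval has elapsed."""
    return sorted(
        name
        for name, interval in CADENCE.items()
        if name not in last_run or now - last_run[name] >= interval
    )


def failure_alerts(results: dict[str, bool]) -> list[str]:
    """Build one alert message per failed suite (True means the suite passed)."""
    return [
        f"ALERT: suite '{name}' failed"
        for name, passed in sorted(results.items())
        if not passed
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 2, 12, 0)
    last_run = {
        "high_priority": now - timedelta(hours=2),
        "low_priority": now - timedelta(hours=3),
    }
    print(due_suites(last_run, now))
    print(failure_alerts({"high_priority": False, "low_priority": True}))
```

The alert messages would then be handed to whatever notification channel your team already watches.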

With 90 days of testing under your belt, circle back with your team to reflect on what was successful during the process and where the team can make improvements. By aligning your team this way and tweaking the process to make it work best, you’re setting up your team to achieve optimal performance.

Let Your Fear Go

When testing in production, one of the biggest obstacles to overcome is fear. Teams are scared of negatively impacting their user base, negatively impacting data and generally creating a mess in production. While these fears are valid, the risks behind them can all be mitigated with feature flags. When feature flags are implemented correctly, they open up a world of possibilities for testing. So put your fears aside and trust in the abilities of feature flags!