A/B Testing: Can You Learn from a Mobile App Test Netflix Probably Didn’t Run?

    Shamir Duverseau

    Last week I heard a story on NPR revealing that Netflix has been reducing the speed and quality of its videos — a practice called throttling — for customers streaming them on AT&T and Verizon wireless networks.* A week after the carriers were accused of throttling video speeds on their networks, Netflix stepped forward to take the blame, saying the approach was carried out to prevent viewers from using up their data and thus help members avoid unplanned network fines for exceeding mobile data caps.

    Netflix now plans to shift some of that control to viewers themselves. In May, it expects to make a “data saver” feature for mobile apps available to some members, letting them choose either to stream more (but lower-quality) video if they have a smaller-capacity data plan or to increase video quality if they have a less-restrictive plan.

    At present, Netflix does allow its members to control video quality on other platforms, such as web browsers.

    Call me a nerd if you like, but the first question that crossed my mind was, did Netflix even test this? Whether they did or didn’t became irrelevant—my stream of consciousness was set in motion. I immediately began planning the test that Netflix could/should have run.

    putting throttle to the test

    When it comes to mobile app testing, powerful tools like Optimizely make it possible to run A/B tests, target different content to different audience segments, build experiments and make decisions based on the results. Using such a tool, here’s how a test might work.

    Optimizely allows you to test and roll out entire features, no matter how custom your app is built, without the need for app store updates.

    Of their more than 75 million total members, Netflix could select a small percentage of applicable members for the test (easily about 5% I’m guessing). Because they actually know who their members are and have information on each, they have the ability to choose a thoughtful sample set that spans geographies, demographics, mobile carriers, etc.

    This sample set can then be divided into three groups — A, B and C — with a different experience designed for each. Each of these variations assumes that the new “data saver” feature is in place, which allows members to opt in or out of the throttling.

    • Group A: This subset’s bandwidth is automatically reduced—the data saver is defaulted to “on.” They receive no communication about this change.
    • Group B: Their bandwidth is also automatically reduced; however, they receive a default message alerting them to the change and the reasoning for it (e.g., to ensure they don’t incur overage costs with their carrier).
    • Group C: Their bandwidth is not automatically reduced (the data saver is defaulted to “off”); however, they receive a default message alerting them to the new feature and the reason for it.
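
    As a rough illustration of how members might be placed into these groups, here is a minimal Python sketch. It is not how Netflix or Optimizely does it; the experiment name, the hashing scheme and the equal three-way split are all assumptions made for the example. It hashes each member ID so that the same member always lands in the same group, and it does not attempt the more thoughtful sampling across geographies, demographics and carriers described above.

```python
import hashlib

# Hypothetical experiment parameters. The 5% sample and the three groups follow
# the example in the text; the names and the hashing scheme are assumptions.
SAMPLE_RATE = 0.05
GROUPS = ("A", "B", "C")


def _bucket(member_id: str, salt: str) -> float:
    """Map a member ID to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{member_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 0x100000000


def assign_group(member_id: str, experiment: str = "data-saver-test"):
    """Return 'A', 'B' or 'C' for sampled members, or None for everyone else."""
    # First decide whether this member is in the test at all.
    if _bucket(member_id, experiment + ":sample") >= SAMPLE_RATE:
        return None  # not sampled; keeps the existing experience
    # Then split the sampled members evenly across the groups,
    # using a separate salt so the two decisions are independent.
    index = int(_bucket(member_id, experiment + ":group") * len(GROUPS))
    return GROUPS[index]


if __name__ == "__main__":
    for member in ("member-1", "member-2", "member-3"):
        print(member, assign_group(member))
```

    Because the assignment depends only on the member ID and the experiment name, a member sees the same experience on every device and every session without any lookup table.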

    measuring results — and satisfaction

    A test like the one above reveals how many users accepted the throttle (kept or switched the data saver “on”) and how many declined it (kept or switched the data saver “off”). Optimizely will keep calculating the results of such a test until a level of statistical confidence in the data is achieved, so you can trust the implied trends.
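
    To give a feel for what “statistical confidence” means here, the sketch below compares the share of members who kept the data saver on in two groups, using a textbook two-proportion z-test. This is a generic illustration and not a description of Optimizely’s own statistics; the function name and the counts in the usage example are invented.

```python
import math


def two_proportion_z_test(kept_a: int, n_a: int, kept_b: int, n_b: int):
    """Two-sided z-test for a difference between two opt-in rates.

    kept_x is the number of members in a group who kept the data saver on,
    n_x is the size of that group. Returns (z, p_value).
    """
    rate_a = kept_a / n_a
    rate_b = kept_b / n_b
    pooled = (kept_a + kept_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("rates are all 0% or all 100%; the test is undefined")
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Invented counts, purely for illustration.
    z, p = two_proportion_z_test(kept_a=600, n_a=1000, kept_b=500, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.6f}")
```

    A small p-value suggests the difference between the groups is unlikely to be chance alone; with three groups, each pair would be compared, ideally with some correction for running multiple comparisons.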

    While the above would reveal how users react to the changes, how do we know how they actually feel about them? Enter ForeSee, a robust tool for satisfaction measurement. At Smart Panda Labs, we work with ForeSee to help our clients measure customer satisfaction with multichannel customer experience analytics. ForeSee also assigns each of its customers an analyst who can help them trust and interpret the data that’s coming in. The tool scores the survey results against the ACSI (American Customer Satisfaction Index), a time-tested scientific model that stacks up your data against national customer satisfaction benchmarks.

    With the results of the test and surveys in hand, Netflix would have an informed idea of how to best roll out its changes to video speeds and/or data saver feature.


    ForeSee’s Executive Portal provides visibility into all measured touch points in a multichannel customer experience in one place, allowing decision-makers to easily gain insights into their customers’ mobile app experience.

    it’s what you say — and how you say it — that counts

    My streaming (of consciousness) doesn’t end there. Let’s go back to our test groups above and assume, for the sake of example, that the best adoption rate with the highest satisfaction scores came from B, the group that received messaging about the data saving feature, which was defaulted to “on.” Let’s say that based on these results, Netflix has decided to roll out its data saver to all its members in the same fashion. Before doing so, is there another test to run that could further ensure customer satisfaction and adoption? Of course, the answer is yes.

    A multivariate test is one that pits multiple combinations of multiple variables against each other. Netflix could test different messages about throttling using different means of delivery, such as communications sent via email against messages that occur within the app itself, perhaps as a push notification that appears on the user’s mobile device. Optimizely is well suited for these more complex tests, and with an ample sample size, the results would shed a lot of light on how best to communicate the throttling approach.
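
    To make the idea of “combinations of variables” concrete, here is a small Python sketch that lists every message and delivery channel pairing a multivariate test would need to cover. The message and channel names are made up for the example, and nothing here reflects how Optimizely represents variations internally.

```python
from itertools import product

# Hypothetical message variants and delivery channels, invented for illustration.
MESSAGES = ("save-your-data", "avoid-overage-fees")
CHANNELS = ("email", "push-notification")


def build_variations(messages, channels):
    """Return every message/channel combination as a list of dicts."""
    return [
        {"id": f"{message}|{channel}", "message": message, "channel": channel}
        for message, channel in product(messages, channels)
    ]


if __name__ == "__main__":
    for variation in build_variations(MESSAGES, CHANNELS):
        print(variation["id"])
```

    Two messages across two channels already yields four variations, and each added variable multiplies that count, so every variation receives a smaller share of the test audience. That is why sample size matters so much for this kind of test.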

    knowledge is power

    You know what’s really exciting about testing in an environment like Netflix? Because they have the ability to specifically identify the subset they are testing, including their account details, Netflix could analyze their test results by segment: for instance, members who mostly watch content via their Netflix mobile app, members who only occasionally stream via the app, and members who rarely stream on mobile or use mobile mostly in tandem with their desktop or TV. When viewed by these segments, the results may show that Netflix benefits most by rolling out its data saving practice in three different ways to these three different segments of its member base, which a tool like Optimizely can help them to do. How cool is that?
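
    A segment breakdown like this is simple to compute once each test result is tagged with the member’s segment. The sketch below is only an assumption about what such data might look like: the segment labels, the record layout and the sample records are all invented.

```python
from collections import defaultdict


def opt_in_rate_by_segment(records):
    """Share of members who kept the data saver on, per (segment, group).

    `records` is an iterable of (segment, group, kept_on) tuples, where
    kept_on is True if the member left the data saver switched on.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [kept_on_count, member_count]
    for segment, group, kept_on in records:
        counts = totals[(segment, group)]
        if kept_on:
            counts[0] += 1
        counts[1] += 1
    return {key: kept / count for key, (kept, count) in totals.items()}


if __name__ == "__main__":
    # Invented records, purely for illustration.
    sample = [
        ("mostly-mobile", "B", True),
        ("mostly-mobile", "B", False),
        ("occasional-mobile", "B", True),
        ("rarely-mobile", "C", False),
    ]
    for key, rate in sorted(opt_in_rate_by_segment(sample).items()):
        print(key, f"{rate:.0%}")
```

    If the rates differ sharply between segments for the same group, that is a hint that one rollout approach may not suit every member.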

    Ok Shamir, you say, we get it. But what if a company has no time for all this testing? That’s a fair question. Sometimes new features or policies need to be rolled out quickly, and building a series of tests that build on one another isn’t feasible. I would insist that a very basic test could still lend enormous value to the process. Netflix could have taken a portion of their members, say 10%, and throttled half of them down immediately, leaving the other half alone to act as the control group. They could then use the ForeSee survey, delivered to both groups, to at least gauge customer satisfaction. In other words, a simple A/B test can make sure you aren’t ticking people off. Is that too much to ask?

    So in reality, was this a missed opportunity for Netflix? Perhaps their decision was indeed informed by testing that we don’t know about. Or maybe they were in the middle of a test when the carriers caught wind of the throttling and decided to call them on it. All I know is that they have been downgrading video on their app for five years without communicating it to the general public, leading me to believe they missed a chance to build rapport and trust with their users.

    If your business is thinking about making a sweeping change to the products or services you provide, take a step back, consider your options, and make choices based on data, not just what seems like a good idea. Because not everyone may agree with your approach. And because customer satisfaction and loyalty are worth it.

    To learn more about A/B testing, download our free white paper. We also invite you to learn more about the testing + optimization services available to you and the marketing technology support services we offer for Optimizely, ForeSee Results and other leading tools. Contact us for your free consultation.

    *Not every carrier is getting this treatment from Netflix. T-Mobile and Sprint customers are exempt from the policy because those carriers throttle wireless speeds rather than hit their customers with extra costs when they exceed their data limits. You can learn more about the difference among carriers in this recent Wall Street Journal article.



    Shamir Duverseau

    Shamir leads the firm's strategic planning and business management. He has worked across a number of industries, from travel to entertainment to technology, working with brands like Southwest Airlines, The Walt Disney Company, and NBC Universal. During his last 20 years in marketing, Shamir has held leadership roles, overseeing everything from product management to digital strategy, including user experience design, web development, testing and web analytics. Prior to Smart Panda Labs, Shamir was the Senior Director of Digital Strategy and Services for Marriott International’s Vacation Club Division.
