You can't A/B test yourself to product transformation

Aug 11, 2022

As Seth points out, this is not a coherent use of colors. There are way too many shades of blue and gray, and many of them don’t work well together.

Google's redesigned Gmail is out, and a lot of people aren't happy with how haphazard and non-cohesive it looks.

How does a product as big as Gmail from a company as big as Google come out looking like this?

Google famously has a culture of A/B testing (including testing upwards of 50 different shades of blue against each other to see which yielded the most search clicks), and I believe this explains why the product feels like a collection of non-cohesive optimizations.

Before I go deeper, I want to drive one point home: you can't A/B test your way to a product transformation or to a new product. A/B testing cannot build a foundation. What it can do is help you go from good to great.

Many orgs focus on too many micro-decisions, often driven exclusively by A/B tests. The results of those tests often don't mesh well together, far too little UX research gets done, and micro tests end up replacing product vision.

A/B testing can be great! But you can't over-rely on it, and you have to ensure you are controlling for one variable at a time (and that you understand what the data is actually telling you). You still need a cohesive product vision. And any good research regime has both robust quantitative and qualitative research capabilities.
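As one small illustration of "understanding what the data is actually telling you", here is a minimal sketch of a standard two-proportion z-test, a common way to check whether a difference in click-through rate between two variants is bigger than random noise. This is not a description of how Google or anyone else runs experiments; the function name `two_proportion_z_test` and the counts are hypothetical, and the test assumes independent views and large samples.

```python
import math


def two_proportion_z_test(clicks_a: int, views_a: int,
                          clicks_b: int, views_b: int) -> tuple[float, float]:
    """Compare the click-through rates of hypothetical variants A and B.

    Returns (z, p_value), where p_value is two-sided. Assumes each view is an
    independent trial and that samples are large enough for the normal
    approximation to hold.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Under the null hypothesis both variants share one true rate, so pool them.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 20,000 views per variant, 5.00% vs 5.15% click-through.
z, p = two_proportion_z_test(1000, 20000, 1030, 20000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these made-up numbers, z comes out around 0.68 and p around 0.49, so the "winning" variant is indistinguishable from noise. That kind of result is easy to over-read when the variants differ only by a tiny tweak.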

A/B testing is best at optimizing a strong product vision and design. That strong product vision must come first.

A/B testing can also be a way to test out wildly different product designs to see the impact a new design direction will have. This is particularly true in conversion-heavy situations, like search listings or an online store, where you don’t want to risk taking a huge revenue hit on a new divergent design.

But when A/B testing is used as the guiding light for product development, the product design loses cohesion, particularly when several agile teams work independently on the same product and each runs its own micro experiments. The result is products that are hard to use and leave users unhappy.

Now, if you are Google, maybe you can get away with this. But if you're not Google, you cannot and should not do this.

A bunch of people at Google probably did realize that the new Gmail looked off, but the product development process let it ship anyway.

You don't get a product that looks like the new Gmail unless you want it to look like that. I bet no one at Google literally wanted the product to look like this, but products reflect the processes that create them. If your process has many disparate agile teams working with little oversight or coordination, inside a culture of A/B testing everything, you will end up with disjointed products.

I would also be extremely careful about A/B testing different shades of the same color against each other. Different displays render colors differently (and displays are becoming more accurate over time). Different lighting conditions change how people perceive color. And the conditions in which people use computing devices are changing over time.

Both Google and Bing A/B tested to find the “optimal” shade of blue for their search results. And yet, their tests yielded different results.

Don't mistake this post for a suggestion that you shouldn't have robust A/B testing capabilities. You should. But that's not nearly enough.

Better Designed
