Don’t Metricsmaxx

Tokenmaxxing and leaderboards have been under the microscope. In a nutshell, some companies have been encouraging AI adoption by tracking which employees use AI the most. This led to employees racking up massive AI bills with little business value to show for it as they ran meaningless AI queries in order to rise to the top of the leaderboard (or avoid being at the bottom). A lot has already been written about AI leaderboards and their failure to deliver positive results. Business output is what should have been tracked. I’m going to sidestep that topic and focus on another smell from the tokenmaxxing craze. Maximizing a metric is a smell that should be approached with suspicion.

In an effort to be data-driven, companies collect vast quantities of data in order to drive decisions. What’s working? Are we moving in the right direction? The data will tell us. This approach makes sense. Data can validate or invalidate our assumptions. But once you have data, it’s tempting to gamify metrics and drive them to the smallest or largest possible values. Following through on this temptation is problematic. Most metrics are healthiest when they are in a target range, not when they are maximized or minimized.

Consider a trip to the doctor. Your blood pressure, weight, and temperature all have healthy ranges. Both hypothermia and hyperthermia can kill you. Body temperature is an easy metric to collect. It’s also an important one. Trying to get it as low or high as possible is ill-advised.

Back in the engineering world, we all want to minimize defects. But that doesn’t mean we should celebrate when defect reports hit zero (although we may strive to get there). Zero defect reports should set off alarm bells that something is wrong. In my experience, very few defect reports often mean that nobody is using the product. If nobody uses the feature you shipped, then there won’t be any defects discovered. That’s not the only possibility. Low defect reporting rates could mean that defects aren’t being tracked because the reporting process is too difficult or someone is gaming the system. Even worse, you might not get reports from the field because customers have decided to move away from the product and it isn’t worth their time to report the problem. So if zero defect reports are bad, then should we aim to be the team with the most defect reports? Of course not. There’s a reasonable range for defect reports. The team wants to be at the low end of that range, but staying in the range is the goal.
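To make the idea of a range concrete, here is a minimal sketch of a check that flags a metric when it falls outside a healthy band in either direction. The `TargetRange` class, its `assess` method, and the bounds of 3 and 20 are all hypothetical, made up for illustration; a real team would set its band from its own history.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetRange:
    """A healthy band for a metric; both bounds are inclusive."""
    low: float
    high: float

    def assess(self, value: float) -> str:
        # Either extreme is a prompt to investigate, not a win or a loss.
        if value < self.low:
            return "below range: investigate"
        if value > self.high:
            return "above range: investigate"
        return "in range"


# Hypothetical band for defect reports per release; the numbers are made up.
defect_reports = TargetRange(low=3, high=20)

for count in (0, 5, 42):
    print(count, "->", defect_reports.assess(count))
```

The point of the sketch is that zero reports and 42 reports both trigger the same response: go find out why.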

What about maxing your sprint points? More productive, more better, right? Having a great sprint is great, but if you are blowing the estimate out of the water every time, then you should investigate what’s happening. Are you more productive or do you need to improve your estimates? Did you become more productive because you achieved an efficiency or are you reaching new productivity heights at the expense of customer response times?

Both examples point to a core problem with leaderboards and metricsmaxxing. They focus on a single dimension of a multi-dimensional space. You don’t build a skyscraper by focusing all of your attention on the height and ignoring the health of the foundation. When you create a leaderboard for height, you focus the entire crew’s attention away from the foundation. You end up with a tall building, but it may not survive a light breeze.

We should be wary of the metrics that we work toward. Not because they are unimportant, but because they are deeply important in a data-driven world. We need to be driven in the right direction by the right data. Metricsmaxxing is a smell that we may be on the wrong path.