
Statsmanship: Failure Through Analytics Sabotage

Ambitious analytics projects carry a tangible risk of failure, and uncertainty breeds anxiety. There are known techniques to lower the uncertainty: guarantee failure and shift the blame onto others. We outline a few proven methods of analytics sabotage and their application. In honor of Stephen Potter we call this activity “statsmanship,” which we define as pursuing the goal of making your analytics group cry.



  • Squander A/B testing bandwidth on bugfixes:

    A/B testing is the art of running two or more variations of a product in parallel to directly detect or measure an important difference between them. The idea is to lock in any positive changes and back out any negative changes. An A/B platform needs to manage a lot of measurements to get sample sizes large enough to return reliable results. A typical result of measuring a series of 10 attempts to raise revenue per customer might look like the following:


    [Figure ab1.png: measured revenue impacts of 10 attempted changes, some positive and some negative]

    Some of these changes are good and some are bad. However, if you don’t have the bandwidth to run the A/B tests, you don’t know which are which, and you are essentially forced to take all of the changes that “sounded good” (shown as the blue curve in the next chart). If you do have the measurements, you back out the bad changes and keep accumulating the good ones (the violet curve in the next chart). The difference is dramatic.


    [Figure ab2.png: cumulative revenue impact of taking all changes (blue) versus backing out the measured losers (violet)]

    As you can see: with enough useful experiments you can cherry-pick a bunch of risky ideas into accumulated improvements.
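    The accumulation effect can be illustrated with a small simulation (a hypothetical sketch; the per-change impacts below are invented for illustration):

    ```python
    import random

    random.seed(5)

    # Hypothetical per-change revenue impacts in percent: some good, some bad.
    impacts = [random.gauss(0, 2) for _ in range(10)]

    # No A/B bandwidth: forced to take every change that "sounded good".
    take_all = sum(impacts)

    # With measurement: back out the losers, keep accumulating the winners.
    keep_good = sum(x for x in impacts if x > 0)

    print(f"take everything:    {take_all:+.1f}%")
    print(f"keep measured wins: {keep_good:+.1f}%")
    ```

    The filtered total is never worse than the take-everything total, which is the whole value of having the testing bandwidth.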

    All of this depends on a stream of good ideas and having enough bandwidth (customers segregated into different treatment and control groups) to make all of these measurements as each of these variations is applied. A good way to lower the demand for new ideas is to clog up the A/B testing infrastructure.

    One way to clog up the A/B testing infrastructure is to reserve it for documenting that the group is meeting its goals. For example, collect a lot of statistics on a necessary bug fix. A policy saying the impact of all bug fixes (even those you have no choice but to implement) must be quantified can easily eat up all of your A/B bandwidth without testing any new ideas. If asked why you are doing this, say it is to ensure that bug fixing is meeting its ROI targets.

  • Encourage the A/B testing framework to sting itself to death:

    The hardest thing to measure statistically is a non-effect. This is because a non-effect (a change that does nothing) is identical to what statisticians call the “null hypothesis” (the hypothesis you are trying to reject). Any attempt to measure a non-effect will return a result that isn’t quite zero, but without quite enough data to show significance at the effect size currently being looked at. A repeated study with more data will get the same sort of equivocal result, just for a smaller effect size. This is why, when designing a study, you need to first establish a lower bound on effect sizes, or be willing to say something like “we see no change below x% as being clinically relevant.” Otherwise, if there really is no effect, you get a series of bad studies as you are tempted to misuse larger and larger sample sizes to study smaller and smaller effects. Statistics can never “prove zero”; it can only prove an effect is below a given bound.

    A famous example is attempting to test the difference between 41 shades of blue on a thin border. You know this can make no real difference, but the poor suckers running the test will only get equivocal results. You can then send them back to run larger tests (which will also fail to achieve statistical significance) because “at our scale even a very small effect is important.” Insist the statistician prove there is absolutely no effect (don’t let them get away with proving any effect is below a given size).
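    The trap can be made concrete with a standard power calculation: the sample size needed to resolve an effect grows like one over the effect size squared, so demanding proof of “absolutely no effect” means demanding infinite samples. A rough normal-approximation sketch (two-sided alpha of 0.05, power of 0.80; the constants are the usual z-values, not tied to any particular product):

    ```python
    def required_n(effect, sd=1.0, z_alpha=1.96, z_beta=0.84):
        """Approximate per-group sample size to detect a difference
        in means of size `effect` (normal approximation)."""
        return 2 * ((z_alpha + z_beta) * sd / effect) ** 2

    # Halving the smallest effect you care about quadruples the bill.
    for effect in [0.10, 0.05, 0.01]:
        print(f"effect {effect:.2f}: n per group about {required_n(effect):,.0f}")
    ```

    As the effect size of interest heads toward zero the required sample size diverges, which is why an honest study fixes a minimum relevant effect size up front.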

  • Don’t provide a domain expert or product manager:

    One of the more useful tenets of modern software development (in particular some of the variations of “agile”) is that for useful work to be done, a domain expert or product manager must be integral to the effort. Often this role is called “the customer,” and it is an individual in the company (not a real customer) who has the experience, intuition and authority to declare success or failure (in addition to supplying ideas and useful intermediate goals). Ambitious research is even riskier than development, so make sure the research group does not even meet good development practices (let alone good research practices).

    For example:
    Statistics/analytics is very good at testing and quantifying possibly profitable hunches, but (despite some of the broader claims attributed to data mining) it has no systematic way of generating non-trivial hunches. So you can slow down an analytics effort by not supplying any intuition. Insist that it all “come from the data.”

  • Insist on retrospective studies:

    Convince management that the market will not tolerate experimentation (customers will revolt, competitors will see our secret sauce, …). Then any proposed change can only be analyzed by attempting a retrospective study on older data. Instead of exposing new customers to variations on proposed improvements, have the analytics group sift through old data and model (guess at) the impacts these changes would have had, using machine learning or statistical modeling. Machine learning is particularly painful without training data, and statistics depends on meaningful measurements. Retrospective studies are very important, but they cannot be your only tool.

  • Insist on perfectly clean studies:

    If the retrospective trap doesn’t work, you are in a good position to push for “perfectly clean studies.” Only one variation can be tried at a time (else variations interfere), and you can’t even end the trial early on disaster (“could be a fluke; backing out the change now would give our data a censorship/stopping bias”). With enough procedures, and by insisting on sample sizes specified before anyone has any hint of the effect size being measured, you can completely crush analytics.

  • Self service analytics:

    If the “clean study” gambit doesn’t work then you are in a good position to advocate “self service analytics.” Push control of the A/B testing infrastructure to all of the engineers. Any engineer can request a fraction of the site traffic to try a variation on. Each customer might see many different variations from the many different engineers- “but hey, with a little linear algebra the stat guys can iron that out.”
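    The “little linear algebra” is real in principle: if the variations act additively and are assigned independently, ordinary least squares can pull the per-variation effects apart. A toy sketch (the effect sizes are invented; real overlapping traffic is rarely this well behaved):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n = 10_000

    # Three engineers each flip their own variation on a random half of customers.
    X = rng.integers(0, 2, size=(n, 3)).astype(float)

    # Hypothetical true additive effects of the three variations.
    true_effects = np.array([0.5, -0.3, 0.0])
    revenue = 10.0 + X @ true_effects + rng.normal(0, 1, n)

    # Recover the per-variation effects by least squares (intercept column first).
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, revenue, rcond=None)
    print("estimated effects:", np.round(coef[1:], 2))
    ```

    The catch is the assumptions: interacting variations or correlated assignment degrade the design matrix, and the estimates with it, which is exactly what the gambit counts on.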

  • Security:

    They can’t analyze the data if they can’t get to it. Partition the data into different areas of sensitivity, and build elaborate procedures and protocols so data from different sensitivity areas cannot be combined. Or just deny analysts access to all of the data. Your IT/networking department can do this for you with complicated chains of trusted clients, VPNs and approved builds.

  • Dining Philosophers:

    Make sure you don’t provision enough resources (machines, disk, memory, database nodes) for all of your analysts to work at the same time. Get them to turn on each other. This is sometimes called the datamart method.

  • Catch 22 ROI:

    Don’t budget a study until you know the expected ROI of the result and don’t accept an ROI estimate that isn’t backed up by a study.

  • Blue Ocean Strategy:

    Make sure your analyst has a “blue ocean opportunity.” Give them data that nobody has ever used or looked at before. Wait a while and then say “they are way too expensive to be running down these picayune data cleaning issues.”

  • Run before you can walk:

    Insist the analysis scale to “billions of records” on the first try. Or try the early-spec gambit: “this needs to go into development in parallel with the research.”

  • “I could have done that in Excel”:

    The dual to the run before you can walk strategy. Don’t allow any easy victories (like “we found all of the currently unprofitable accounts”). Insist on exotic models and above all “prediction” (“predict which accounts will become unprofitable”).

  • “Needs to be more explainable”:

    The entire analysis technique needs to fit onto a single PowerPoint slide, “for upper management.” This is the dual to the “I could have done that in Excel” strategy.

  • Insist on Excel:

    Insist on and enjoy the deadly dance of pivot tables, office data connections and plugin solvers.

  • Death by software engineering:

    Insist not on a result or procedure but a “dashboard” with “an intuitive UI.”

  • Postulate sub-populations of non-customers:

    Even with a product manager you can force failure by concentrating analytics on the wrong questions. Postulate three to five customer types that find your product lacking for different contradictory reasons (“too technical”, “not for power users”, …). Now you can squander effort on: characterizing the customer groups, estimating the size of the customer groups and estimating the improved uptake each incompatible change to your product would induce in each group. Instead of working on your product you are now working on psychology, demographics and many incompatible variations of your product.

In conclusion: if you can’t win this game against the analysts, you aren’t really trying. Or, for the non-tongue-in-cheek version: successful ambitious analytics requires a minimum amount of attention and flexibility. All of the “blockers” here are variations of valid concerns that only become blockers when there is no attention or adaptation.


Edit: I forgot to mention another inspiration for this article: Nicholas Vanserg, “Mathmanship,” American Scientist, 1958.
