
Adversarial machine learning

I just got back from a very good conference on adversarial machine learning. Please read on for my comments on part of one of the very good talks.

Classic machine learning (especially as it is taught in classes) emphasizes a nice, safe, static environment where you are given some unchanging data and are asked to produce a nice predictive model one time. It is formally easier than causal inference or statistical inference, as being right often is enough, no matter what the reason. It lives in an overly idealized world where one implicitly makes a number of simplifying assumptions.

Adversarial machine learning is the formal name for studying what happens when you concede even slightly more realistic alternatives to assumptions of these types (harmlessly called “relaxing assumptions”).

At the adversarial machine learning conference, Dr. Alyssa Frazee gave a good talk on her work at Stripe. One point she was particularly clear on: once you actually start using your model, in a sense you become an additional adversary.

Her example was denying payment requests. Suppose you have a model that for a transaction x returns an estimate pfraud(x), the estimated probability that a payment request is fraudulent. Further suppose you set up your business rules to refuse all transactions x where pfraud(x) ≥ T, where T is a chosen threshold. Then after running your system for a while you will no longer have any recent observations on the behavior of transactions where your model thinks pfraud(x) ≥ T (as you never let them through!). In particular, you can no longer assess your false-positive rate in a meaningful way, as you are no longer collecting outcome data on items your classifier thinks are in the fraud class.
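To make the censoring concrete, here is a minimal sketch in Python. It is my own toy illustration, not anything from the talk: the names `run_threshold_rule`, `flagged_with_outcomes`, `p_fraud`, and `is_fraud` are hypothetical, and the four transactions are made up.

```python
def run_threshold_rule(transactions, threshold):
    """Refuse every transaction scored at or above the threshold.

    Returns the transactions that were allowed through (the only ones whose
    outcomes we ever get to observe) and a count of the refused ones.
    """
    observed = []
    refused_count = 0
    for txn in transactions:
        if txn["p_fraud"] >= threshold:
            refused_count += 1  # refused: the true outcome is never recorded
        else:
            observed.append(txn)
    return observed, refused_count


def flagged_with_outcomes(observed, threshold):
    """Observed transactions that the model placed in the fraud class."""
    return [txn for txn in observed if txn["p_fraud"] >= threshold]


# Toy data: "is_fraud" is the true outcome, visible only inside this simulation.
transactions = [
    {"p_fraud": 0.05, "is_fraud": False},
    {"p_fraud": 0.40, "is_fraud": True},
    {"p_fraud": 0.85, "is_fraud": False},  # a false positive we never get to see
    {"p_fraud": 0.97, "is_fraud": True},
]

observed, refused_count = run_threshold_rule(transactions, threshold=0.8)
print(len(observed), refused_count, len(flagged_with_outcomes(observed, 0.8)))
# prints: 2 2 0
```

The final zero is the whole problem: there are no labeled examples left in the flagged region, so any false-positive rate computed from the observed data has nothing to count.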

I don’t want to try to explain the setup or derivation of the solution any further, as Alyssa Frazee developed it very well and very concretely, and I assume we will be hearing more of her speaking and writing in the future.

The solution suggested is standard, clever, simple and clear: intentionally let some of the pfraud(x) ≥ T cases through to see what happens (though, if possible, spend a bit on additional measures to mitigate potential losses on these), and then use inverse probability weighting to adjust the impact of these test cases. The idea is that if you let these “I should have rejected these” items through at a rate of 1 in 100 (instead of the full-rejection rate of 0 in 100), then each of these requests in fact represents a collection of 100 similar requests: so replicate each of them 100 times in your data and you have an estimate of what would have happened had you followed all of these cases through to the end.
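The weighting step can be sketched in a few lines of Python. This is my own toy version under stated assumptions, not the implementation from the talk: the function name `ipw_summary`, its arguments, and the example data are all hypothetical, and it assumes flagged transactions were let through independently at a known rate `pass_rate` while unflagged ones were always allowed.

```python
def ipw_summary(observed, threshold, pass_rate):
    """Inverse-probability-weighted counts from observed outcomes.

    observed:  (score, is_fraud) pairs for transactions that were let through.
    threshold: scores at or above this value would normally be refused.
    pass_rate: probability a flagged transaction was let through anyway.
    """
    est_total = 0.0          # estimated number of transactions overall
    est_fraud = 0.0          # estimated number of fraudulent transactions
    est_flagged = 0.0        # estimated number of flagged transactions
    est_flagged_legit = 0.0  # estimated number of flagged-but-legitimate ones
    for score, is_fraud in observed:
        flagged = score >= threshold
        # Each flagged transaction we see stands in for 1 / pass_rate of them.
        weight = 1.0 / pass_rate if flagged else 1.0
        est_total += weight
        if is_fraud:
            est_fraud += weight
        if flagged:
            est_flagged += weight
            if not is_fraud:
                est_flagged_legit += weight
    return {
        "est_total": est_total,
        "est_fraud": est_fraud,
        "est_flagged": est_flagged,
        "est_flagged_legit": est_flagged_legit,
    }


# Made-up example: three ordinary transactions plus two flagged ones that
# were let through at a rate of 1 in 100.
observed = [(0.10, False), (0.20, False), (0.30, True), (0.90, True), (0.95, False)]
summary = ipw_summary(observed, threshold=0.8, pass_rate=0.01)
print(summary["est_flagged_legit"] / summary["est_flagged"])
```

Summing a weight of 100 is the same arithmetic as replicating the row 100 times, just without building the larger data set. With only two flagged observations the estimate is of course very noisy; the point of the sketch is the bookkeeping, not the number.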

The above may sound “dangerous and expensive,” but I’ve never seen anything safer or cheaper that actually works reliably. And it is classic experimental design in disguise (the “accept even though I think I should reject” group can be thought of as having been marked as “control” before scoring).

There is a tempting (but very wrong) alternative of treating the data marked as potentially fraudulent as being confirmed fraudulent during re-training (something that can actually happen in semi-supervised learning if you are not careful). I wrote on the dangers of this (incorrect) alternative in my praise of a famous joke (DO NOT USE) method called the data enrichment method.

It is not surprising that the correct adjustment is already well known to statisticians; statistics is largely a field of trying to reliably extract meaningful summaries and inferences from a potentially hostile data environment. This distinction is why I say machine learning stands out from statistics in being a more optimistic (meaning more naive) field.