Pragmatic Programming Techniques: July 2011

Online electronic fraud has become increasingly problematic to many companies offering services on the web. Here I am trying to generalize a set of techniques that I found useful in the past.

To be effective in combating frauds, the first thing companies need to have is an overall top-down strategy to deal with frauds, including ...

Have a clearly defined security objective, a good understanding of the fraudsters' motivation, as well as the consequences of fraud.
Have an effective analytic method in place to detect fraud immediately when it happens
Have an responsive handling process in place to react immediately after fraud is detected
Have an preventive process in place to feedback newly discovered fraud patterns into the system

I will be focusing more in following discussion on the technical side of the analytic methods but I want to reiterate that the process side is equally (or even more) important in order for the whole effort of combating fraud to be effective.

Setting Objectives and Targets
Setting the objectives upfront is very important for guiding the subsequent design process of the technical mechanism, especially when making tradeoffs decisions between false positive and false negative. A high false negative rate means fraud goes through undetected while a high false positive rate will cause inconvenience to your existing customers as well as unnecessarily large manual investigation effort.

From another angle, some companies look at the fraud detection methods as an optimization mechanism of using existing resource for conducting manual investigation, which is usually the last resort to handle fraud. These companies usually has a constant team size of fraud investigators. If these people spend too much time in legitimate transactions, there will be less time left to investigate the real fraud transactions. Therefore, the analytical methods aim at guiding the manual investigation effort to those transaction with a higher chance of fraud.

Notice that fraud detection is a continuously-improvement-game. At each iteration, there is a baseline (usually the current best method) and an improvement threshold. The method at each iteration is supposed to provide at least an improvement over the baseline. In the first iteration, the baseline can be very low (e.g. simply random guess). At each iteration, the baseline will be raised until the companies' objectives and targets have been satisfactorily met.

Instrumenting Analytical Methods
Depends on the nature of business and the motivation of fraudster, the characteristics of fraud can be very different. It is very important to understand them before designing the best mechanism to combat them.

Here is a high level decision process to determine the correct method

a) Rule-base approach
If the attack pattern is well-defined (e.g. credit card fradulent transactions tend to have a higher-than-usual spending amount as well as higher-than-usual transaction rate). These attack pattern can usually be extracted from domain experts in the business. The best method in this case to implement a solution is to encode such knowledge as rules or even hard-wired into the application code for efficiency reasons.

Notice that rules need to maintain as new attack patterns are discovered or old attack patterns become obsoleted. Rule engine is a pretty common approach in order to keep such domain knowledge in a declarative form so it can be easily maintained.

b) Classification approach
If we have training examples for both normal case and fraud case, classification methods (based on machine learning) can perform very well. Such analytic methods includes logistic regression, decision trees (random forest), Support vector machine, Bayesian network (naive bayes), Neural network ... etc.

To compare the performance of different classification methods, confusion matrix is commonly used. It is a 2 by 2 matrix measuring the ratio of true positive, false positives, true negative and false negative. Based on the cost associated with false positive and false negative, we can determine a best method (or ensemble of multiple methods) to achieve a minimal cost.

c) One-Class Model approach
If we have just training examples for norm cases but no fraud examples, we still can learn a model based on normal data and then compute the distance between the transaction data and the model we learned. We flag the transaction as fraud if the distance exceed a domain-specific threshold. Here the distance function between the model and a data point needs to be defined ad commonly used ones include statistic methods where the model is the mean and standard deviation of the norm data and the P-value as the distance function. On the other hand, Euclidean distance, Jaccard distance and cosine distance are also commonly used.

d) Density based methods and clustering methods
If we know nothing about the fraud patterns and also don't have training examples for even norm cases, then we can make some assumptions about the distribution of data, such as fraud data is less dense than norm data, in other words, fraud transaction will have less neighbors within a certain radius. If this assumption is reasonable, then we can use density-based method to predict fraud transaction. For example, counting number of neighbors within radius r, or measuring the distance to the kth nearest neighbour. We can also use clustering method to learn clusters and flag transactions too distant from its cluster center as fraud.

Determining input signals
In my experience, determining the right signal is the most important part of the whole process. Sometimes we use raw input attributes as the signal while other times we need to combine multiple attributes to provide the signal.

For example, as we take raw measurements at different points in time, the input signal may involve computing the rate of change of these raw measurement over time. In other words, it is not adequate to just look at each data point in isolation and we need to aggregate raw measurement in a domain specific way.

In my past experience, a large portion of fraud detection cases is about how to deal with account takeover transactions (stolen identities and impersonation). Usually detecting sudden change of behavior (e.g. change point detection) is an effective approach to deal with this kind of frauds.

Time dimension
Instead of looking at each fraud in isolation, in many cases we need to look at the "context" under which fraud are evaluated. As we discussion above in detecting sudden change of behavior, it is quite common to use the past data of a user to build a norm model and evaluate the recent transactions against it to determine if it is fraud. In other words, we compare his/her current behavior with the past.

Besides the "time dimension", we can look into other context as well. For example, we can look at user's peer-group's behavior, observing the deviation of one person's behavior to its peer-group as an indication of a stolen identity.

Notice that the norm pattern may also evolve/change over time, nevertheless we usually don't expect such change to be sudden or rapid. To cater for such slow drift, the norm model need to be continuously adjusted as well. A pretty common technique is to compute a long-term behavioral signature based on a longer time span of transactional data (e.g. 6 months) and compute a short-term behavioral signature based on a shorter time span of data. Then the short-term signature is compared with the long-term signature using a distance function and fraud is flagged if it exceed a pre-defined threshold. It is also important to have an incremental update mechanism for the long-term signature rather than recomputing it from scratch at every update. A common approach is to use exponentially time-decay function such as ...
M[t+1] = a.M[t] + (1-a)S[t].
where 0 < a < 1
M[t] is model at time t
S[t] is the transaction at time t

The importance of Domain Experts
Although sophisticated machine learning algorithms has been pretty powerful in using a generalized solution for a broad scenarios of problems. From my past experience, I have yet seen much cases a sophisticated machine learning algorithm can beat domain expertise. In many projects, simple algorithm with deep domain expertise out-performs sophisticated analytical methods significantly. Therefore, the common pattern that I recommend is to build a rule-based solution at the core and augment it using machine learning analytical methods.

Pragmatic Programming Techniques

Saturday, July 9, 2011

Fraud Detection Methods