The traditional approach to fraud detection relies on expert analysis and heuristics. Experts establish rules that determine whether a claim should be flagged for further investigation. In other situations, basic statistical models determine the indicators and thresholds used to flag a claim after adjudication and pricing. These indicators and thresholds are evaluated statistically and recalibrated periodically. Given recent advances in machine learning, the claim adjudication business process can benefit from these techniques and algorithms.
The basic idea is to implement intelligently configured algorithms that deliver superior fraud prediction performance in real time. For healthcare payers, these algorithms reduce wasteful spending and minimize costly post-adjudication recovery processes. The approach described below has proven successful in preventing improper payments without affecting provider relations or compliance with prompt-pay requirements for public programs such as Medicaid and contractual prompt-payment terms in the commercial sector. Further, with all commercial health plans required to implement robust fraud prevention controls in their operations, a real-time approach to fraud prevention will go a long way toward addressing market needs.
Several machine learning algorithms are available that can predict and identify fraudulent claims. Algorithms ranging from basic logistic regression, multivariate Gaussian (MVG) models, and k-means clustering to neural networks are all candidates for implementing a fraud prevention model.
Sky Solutions data scientists and fraud experts have categorized these statistical models into two buckets:
a) Models/Statistical techniques used for Pattern Discovery: These are statistical techniques, such as link analysis and clustering, that can be run in the modeling environment and help identify patterns and their associated features.
b) High Performing Models: These are models that can be processed at scale in production and in real time. Logistic regression and Bayesian models scale effectively and return results in under a second, which makes them suitable for real-time processing.
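To illustrate why a model such as logistic regression can return a score in well under a second, the following is a minimal sketch of scoring a single claim against coefficients that were computed beforehand. The feature names, weights, and intercept are hypothetical and chosen only for illustration; they are not taken from any Sky Solutions model.

```python
import math

# Hypothetical coefficients; in practice these would be produced offline
# in the modeling environment.
INTERCEPT = -4.0
WEIGHTS = {
    "billed_to_allowed_ratio": 0.8,
    "claims_last_30_days": 0.05,
    "weekend_service": 0.6,
}


def fraud_probability(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Return the logistic-regression probability for one claim.

    Scoring is a weighted sum followed by a sigmoid, so the cost per claim
    grows only with the number of features.
    """
    z = intercept + sum(w * features.get(name, 0.0) for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


claim = {"billed_to_allowed_ratio": 3.0, "claims_last_30_days": 20, "weekend_service": 1}
score = fraud_probability(claim)
```

Because no training happens at scoring time, the run-time work is limited to a lookup of the stored weights and a handful of multiplications.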
The following is a high-level conceptual architecture of the real-time fraud prevention services working with a claim adjudication engine such as the Pega Smart Claims Engine.
The CFPE is a claims fraud probability estimator engine, exposed as an API to the external claims processing engine. The CFPE takes the incoming claims data through a RESTful API, evaluates all the configured patterns, and computes a fraud probability score for the claim. The results can then be passed on to the decision API, which evaluates the scores across the different patterns and provides a recommendation code for the external claims engine.
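The decision step described above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual decision API: the pattern names, the per-pattern thresholds, and the recommendation codes PAY, PEND, and INVESTIGATE are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternThresholds:
    """Hypothetical score cut-offs for one fraud pattern."""
    pend_at: float
    investigate_at: float


# Illustrative thresholds only; real values would be configured per pattern.
THRESHOLDS = {
    "kickbacks": PatternThresholds(pend_at=0.6, investigate_at=0.9),
    "gaming_the_system": PatternThresholds(pend_at=0.5, investigate_at=0.8),
}

SEVERITY = {"PAY": 0, "PEND": 1, "INVESTIGATE": 2}


def recommend(pattern_scores, thresholds=THRESHOLDS):
    """Return the most severe recommendation code across all scored patterns."""
    result = "PAY"
    for pattern, score in pattern_scores.items():
        limits = thresholds.get(pattern)
        if limits is None:
            continue  # ignore scores for patterns that are not configured
        if score >= limits.investigate_at:
            code = "INVESTIGATE"
        elif score >= limits.pend_at:
            code = "PEND"
        else:
            code = "PAY"
        if SEVERITY[code] > SEVERITY[result]:
            result = code
    return result


recommendation = recommend({"kickbacks": 0.65, "gaming_the_system": 0.30})
```

Taking the most severe code across patterns is only one possible design choice; a weighted combination of scores would be an equally plausible alternative.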
The different models and patterns are set up, and their associated coefficients calculated, in a separate environment. The final computed coefficients are then fed to the intelligent patterns knowledge base, which ensures optimum throughput at run time. Sky Solutions experts have taken a contextual approach to implementing these machine learning algorithms in a healthcare payer environment. The following are a couple of examples of this unique approach:
Sky Solutions recognizes that different claim types and scenarios need to be configured differently for the same general pattern. For example, configuring a ‘Gaming the System’ or ‘Kickbacks’ fraud pattern requires different feature attributes and different tuning for lab claims than for inpatient claims. Running the model against the available claims data for that site further refines these features.
Most fraud patterns are based primarily on provider profiling, with the provider as the anchor. Sky Solutions experts recognized the need to configure some fraud patterns with the member as the pivot, in order to assess patterns of overutilization and geo-location.
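As a rough sketch of what using the member as the pivot might look like, the snippet below aggregates recent claims per member rather than per provider. The field names (member_id, provider_id, service_zip, service_date), the 30-day window, and the chosen features are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def member_features(claims, as_of, window_days=30):
    """Aggregate recent claims per member.

    Returns, for each member, the claim count (a simple overutilization
    signal) and the number of distinct providers and service ZIP codes
    (a simple geo-location signal) within the window.
    """
    cutoff = as_of - timedelta(days=window_days)
    counts = defaultdict(int)
    providers = defaultdict(set)
    zips = defaultdict(set)
    for claim in claims:
        if cutoff < claim["service_date"] <= as_of:
            member = claim["member_id"]
            counts[member] += 1
            providers[member].add(claim["provider_id"])
            zips[member].add(claim["service_zip"])
    return {
        member: {
            "claim_count": counts[member],
            "distinct_providers": len(providers[member]),
            "distinct_zips": len(zips[member]),
        }
        for member in counts
    }


claims = [
    {"member_id": "M1", "provider_id": "P1", "service_zip": "20001", "service_date": date(2024, 3, 1)},
    {"member_id": "M1", "provider_id": "P2", "service_zip": "90210", "service_date": date(2024, 3, 2)},
    {"member_id": "M1", "provider_id": "P2", "service_zip": "90210", "service_date": date(2024, 1, 5)},
    {"member_id": "M2", "provider_id": "P1", "service_zip": "20001", "service_date": date(2024, 3, 3)},
]
features = member_features(claims, as_of=date(2024, 3, 10))
```

Features like these could then feed a member-anchored pattern in the same way provider-level aggregates feed a provider-anchored one.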
Using the real-time fraud prevention API requires an iterative approach because valid training data sets are lacking. This lack of training data was overcome by combining heuristics with running production claims through the CFPE without affecting the outcome of the claim.
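Running production claims through the scorer without affecting their outcome is often called shadow scoring. The following is a minimal sketch of that idea, assuming hypothetical adjudicate and score callables and an in-memory list standing in for whatever store would collect the results.

```python
def adjudicate_with_shadow_scoring(claim, adjudicate, score, shadow_log):
    """Adjudicate a claim as usual and record a fraud score on the side.

    The score is only logged for later analysis; it never changes the
    adjudication outcome that is returned.
    """
    outcome = adjudicate(claim)
    try:
        fraud_score = score(claim)
    except Exception:
        fraud_score = None  # a scoring failure must not affect the claim
    shadow_log.append(
        {"claim_id": claim["claim_id"], "score": fraud_score, "outcome": outcome}
    )
    return outcome


def failing_scorer(claim):
    raise RuntimeError("scorer unavailable")


log = []
outcome = adjudicate_with_shadow_scoring(
    {"claim_id": "C1", "billed": 1200.0},
    adjudicate=lambda c: "PAID",
    score=lambda c: 0.82,
    shadow_log=log,
)
outcome_on_failure = adjudicate_with_shadow_scoring(
    {"claim_id": "C2", "billed": 80.0},
    adjudicate=lambda c: "PAID",
    score=failing_scorer,
    shadow_log=log,
)
```

The logged scores, combined with heuristic flags and later payment feedback, could then serve as provisional training data for the next iteration.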
The models continued to improve in accuracy and precision before their effectiveness and efficiency leveled off. In addition, the established operational procedures refreshed the knowledge base weekly with revised coefficients and feature weights, based on feedback from the claims payment systems.
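A periodic refresh of this kind could be sketched as a versioned coefficient store whose contents are replaced as a whole. The class name, the use of a date as the version label, and the coefficient layout below are hypothetical and are not a description of the actual knowledge base.

```python
from datetime import date


class PatternKnowledgeBase:
    """Holds one versioned set of pattern coefficients for run-time scoring."""

    def __init__(self, coefficients, version):
        self._current = (version, dict(coefficients))

    def refresh(self, new_coefficients, version):
        """Replace the whole coefficient set with a newly computed one."""
        if not new_coefficients:
            raise ValueError("refusing to load an empty coefficient set")
        # Build the new snapshot first, then swap it in with one assignment,
        # so a reader never sees a half-updated set.
        self._current = (version, dict(new_coefficients))

    def snapshot(self):
        """Return the (version, coefficients) pair currently in use."""
        return self._current


kb = PatternKnowledgeBase({"kickbacks": {"intercept": -3.0}}, version=date(2024, 3, 4))
kb.refresh({"kickbacks": {"intercept": -2.7}}, version=date(2024, 3, 11))
version, coefficients = kb.snapshot()
```

Keeping the version alongside the coefficients makes it possible to trace which weekly set produced a given score.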