Introducing Machine Learning to Corporate Fraud Detection

Accounting fraud is a worldwide problem with potentially serious consequences, but it is often detected after the damage has been done. Hence, efficient and effective methods of detecting corporate accounting fraud would offer significant value to regulators, auditors, and investors.

In a new study, we develop a state-of-the-art fraud prediction model using machine learning. Following prior research in accounting fraud detection, we use as our sample the detected material accounting misstatements disclosed in the SEC’s Accounting and Auditing Enforcement Releases (AAERs). Our sample covers all publicly listed U.S. firms over the period 1991–2008. Although there are useful nonfinancial predictors of accounting fraud (e.g., an executive’s personal behavior), we use only readily available financial data, so that our model can be applied to any publicly traded firm at low cost and its performance can be easily compared with that of fraud prediction models from the extant literature.

We use two types of fraud prediction models from the extant literature as benchmarks. The first is ratio-based logistic regression, commonly used in the accounting literature. Such models typically use financial ratios as predictors; the ratios are often identified by human experts based on theories (e.g., the motivation-ability-opportunity framework from the criminology literature). The second benchmark is a fraud prediction model based on a support vector machine with a financial kernel (SVM-FK) that maps raw financial data into a broader set of ratios within the same year and changes in ratios across different years.
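To make the first benchmark concrete, here is a minimal sketch of a ratio-based logit model in Python, using synthetic stand-in data; in the actual study, the 14 columns would be the expert-identified financial ratios and the label an AAER-based fraud indicator. (The SVM-FK benchmark requires a custom financial kernel and is omitted here.)

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 14))            # stand-in for 14 expert-identified ratios
y_train = (rng.random(5000) < 0.01).astype(int)  # ~1% fraud firm-years, mirroring the imbalance

# Ratio-based logit benchmark: fit on earlier years, score later years out of sample
logit = LogisticRegression(max_iter=1000)
logit.fit(X_train, y_train)

X_test = rng.normal(size=(1000, 14))
fraud_prob = logit.predict_proba(X_test)[:, 1]   # predicted fraud probabilities, used for ranking
```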

Our proposed fraud prediction model differs from both of these benchmark models in two key ways.

First, we use ensemble learning, a state-of-the-art machine learning paradigm, to predict fraud. Unlike conventional machine learning methods, which usually generate a single estimator, ensemble learning methods combine the predictions of a set of base estimators to improve generalization ability and robustness. To address the class imbalance between fraudulent and nonfraudulent firms in our sample, we use RUSBoost, an ensemble learning method that combines boosting with random undersampling (RUS).
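As a minimal sketch of this step, the open-source imbalanced-learn library provides a RUSBoostClassifier; the synthetic data and hyperparameters below are illustrative, not the settings used in the study.

```python
import numpy as np
from imblearn.ensemble import RUSBoostClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 28))              # stand-in for 28 raw financial data items
y = (rng.random(5000) < 0.01).astype(int)    # rare fraud labels (~1% of firm-years)

# Each boosting round randomly undersamples the nonfraud majority class
# before fitting a weak learner; the weighted learners are then combined.
model = RUSBoostClassifier(n_estimators=300, learning_rate=0.1, random_state=0)
model.fit(X, y)
scores = model.predict_proba(X)[:, 1]        # fraud scores used to rank firm-years
```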

Second, our proposed model uses raw financial data taken directly from financial statements as fraud predictors. Because raw financial data are the most fundamental building blocks of the accounting system, it is interesting to explore whether they can be directly used in fraud prediction. Ex ante, it is unclear whether fraud prediction models based on raw financial data can outperform fraud prediction models based on human expert–identified financial ratios. Fraud prediction models based on financial ratios could be more powerful because the ratios identified by human experts are often grounded in theories that offer sharp predictions about when corporate managers have incentives to engage in fraud. However, existing theories about the drivers of accounting fraud may well be incomplete, as accounting fraud is, by definition, conducted in secrecy and designed to be difficult to detect. Accordingly, converting raw accounting data into a limited number of financial ratios based on potentially incomplete behavioral theories could mean the loss of useful predictive information. In contrast, fraud prediction models that make use of raw financial data could be more powerful because they do not impose any ex ante structure on the raw data, instead letting the data speak for themselves. In addition, with the rapid advance of machine learning methods in computer science, fraud prediction models based on raw data can take on more flexible and complex functional forms. We use 14 financial ratios identified by human experts and 28 raw financial data items associated with these ratios as inputs in our fraud prediction models.
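The contrast between the two input representations can be illustrated as follows; the column names are example Compustat-style items, not the study’s actual 28-item list.

```python
import pandas as pd

# Example Compustat-style raw items (illustrative only)
df = pd.DataFrame({
    "act": [120.0, 95.0],    # current assets
    "lct": [80.0, 100.0],    # current liabilities
    "at":  [500.0, 430.0],   # total assets
})

# Ratio-based input: raw items collapsed into an expert-defined ratio,
# which bakes in a fixed functional form chosen by theory.
ratio_features = pd.DataFrame({"current_ratio": df["act"] / df["lct"]})

# Raw-data input: the items themselves, letting the learner discover
# whatever (possibly nonlinear) combinations predict fraud.
raw_features = df[["act", "lct", "at"]]
```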

To compare the out-of-sample performance of different fraud prediction models, we adopt two distinct performance evaluation metrics.

First, we use the area under the Receiver Operating Characteristic (ROC) curve (AUC) as a performance evaluation metric. The AUC is equivalent to the probability that a randomly chosen fraud observation will be ranked higher by a classifier than will a randomly chosen nonfraud observation. The AUC for random guesses is 0.50. Therefore, any reasonable fraud prediction model must have an AUC higher than 0.50.
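As a quick illustration with made-up labels and scores, scikit-learn computes the AUC directly:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 0]                # 1 = actual fraud firm-year
y_score = [0.1, 0.3, 0.8, 0.2, 0.4, 0.6]   # model's predicted fraud probabilities
print(roc_auc_score(y_true, y_score))      # 0.875; random guessing would yield 0.50
```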

Second, we introduce an alternative performance evaluation metric commonly used for ranking problems, referred to as Normalized Discounted Cumulative Gain at position k (NDCG@k). Because of the infrequency with which accounting fraud is identified by the SEC’s AAERs, even the best performing fraud prediction model would produce a large number of false positives that far exceeds the number of true positives in a test period. Clearly, it is impractical for regulators or corporate monitors to investigate all predicted cases of fraud, given the limited resources available to fight such fraud. Naturally, then, regulators and other monitors seek to investigate the smallest number of observations with the highest predicted likelihood of fraud. Accordingly, we also evaluate the out-of-sample performance of the different fraud prediction models using NDCG@k. Intuitively, NDCG@k assesses the ability of a fraud prediction model to identify actual fraud by picking the top k observations in a test year that have the highest predicted probability of fraud. In our study, we pick a k that equals the top 1 percent of the observations. We select a cutoff of 1 percent because typically less than 1 percent of the firms in a year commit fraud per the SEC’s AAERs. The values of NDCG@k are bounded between 0 and 1.0, with a higher value representing better model performance.
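Below is a minimal sketch of NDCG@k with binary relevance (1 for an actual fraud firm-year, 0 otherwise) and the standard logarithmic discount; the study’s exact formulation may differ in detail.

```python
import numpy as np

def ndcg_at_k(y_true, y_score, k):
    """NDCG@k with binary relevance: 1 = actual fraud, 0 = nonfraud."""
    y_true = np.asarray(y_true)
    order = np.argsort(y_score)[::-1]     # rank by predicted fraud probability, descending
    rel = y_true[order][:k]               # relevance of the top-k ranked observations
    dcg = np.sum(rel / np.log2(np.arange(2, rel.size + 2)))
    ideal = min(int(y_true.sum()), k)     # best case: every actual fraud ranked on top
    idcg = np.sum(1.0 / np.log2(np.arange(2, ideal + 2)))
    return dcg / idcg if idcg > 0 else 0.0

# The study's cutoff: k equal to the top 1 percent of a test year's observations,
# e.g., ndcg_at_k(y_true, y_score, k=max(1, int(0.01 * len(y_true))))
```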

The table below compares the fraud prediction models’ performance, averaged over the test period 2003–08:

Out-of-Sample Performance Evaluation Metrics, Averaged over the Test Period 2003–2008

Input Variables                Method          AUC              NDCG@k
14 financial ratios            (1) Logit       0.672 (0.167)    0.028 (0.479)
28 raw financial data items    (2) SVM-FK      0.626 (0.012)    0.020 (0.171)
28 raw financial data items    (3) RUSBoost    0.725            0.049

Our study joins a small but growing accounting literature that uses financial statement data to predict accounting fraud out of sample. More importantly, it differs from these existing studies in several key respects.

First, we introduce ensemble learning, a state-of-the-art machine learning method. Our results suggest that ensemble learning, if properly used, is more powerful than logistic regression and SVM for the purposes of fraud prediction.

Second, ours is the first study to assess the usefulness of raw financial data, rather than ratios derived from those data, for the purposes of fraud prediction. Our empirical investigations provide preliminary evidence that it is possible to build more powerful fraud prediction models by carefully selecting, with the aid of theoretical guidance, a small set of raw financial data items and coupling them with a powerful machine learning method. Our results also raise the exciting possibility that fraud prediction can be further improved by using additional readily available raw financial data guided by new theory.

Third, we introduce to the fraud prediction literature a new performance evaluation metric, NDCG@k. Compared with the commonly used AUC, NDCG@k is more useful to regulators and other monitors, who often face significant resource constraints and can therefore investigate only a small number of alleged fraud cases. Because NDCG@k measures a model’s prediction performance by picking the top k firms with the highest predicted probability of fraud, it offers a simple decision rule for regulators and other monitors to identify the most suspicious firms for investigation.

The results from this study also have important implications for ongoing accounting research that compares the usefulness of textual data versus quantitative data in predicting accounting fraud. Our results raise the bar for this line of text-mining research because we show that the ratio-based logistic regression model in prior research significantly understates the value of financial data in fraud prediction.

This post comes to us from professors Yang Bao at Shanghai Jiao Tong University, Bin Ke at NUS Business School, Bin Li at Wuhan University, Y. Julia Yu at the University of Virginia, and Jie Zhang at Nanyang Technological University. It is based on their recent article, “Detecting Accounting Fraud in Publicly Traded U.S. Firms Using a Machine Learning Approach,” published in the Journal of Accounting Research and available here.