ARTIFICIAL INTELLIGENCE IN CREDIT RISK: A LITERATURE REVIEW

This study addresses the need of investors and industry players to use artificial intelligence (AI) to quantify credit risk in a forward-looking way, instead of relying on traditional methods that are not forward-looking. It is a literature review of nine studies on how applications of AI are used to provide better forecasting power, and on whether the results can be adequately understood by the analysts who must make decisions based on AI computation. We searched Google Scholar using the keywords "artificial intelligence", "machine learning", and "credit risk". Full texts were obtained from Web of Science when they were unavailable as open-access documents. The consensus is consistent and positive: AI can provide better forecasting power and, when used correctly, can widen access to credit for less privileged people, which is good for the overall economy. However, several key challenges remain in making this technology affordable, especially how to reduce its complexity so that more people can learn to configure, operate, and interpret AI computation results. The study seeks a consensus on how AI can support more accurate, forward-looking quantification of credit risk.


Introduction
Initially developed in 1956, Artificial Intelligence (AI) represents one of the most recent spheres of investigation within the domains of science and technology (Russell et al., 2016). However, AI is more expansive than merely being a combination of computer science and mathematics. Instead, it is a cross-disciplinary domain receiving significant inputs from other disciplines like economics, neuroscience, and psychology (Shalev-Shwartz and Ben-David, 2014; Taulli, 2019).
These days, AI is a broad term encompassing various technologies and methods for a wide range of tasks. One of the most often used groups of AI techniques is machine learning (ML). To solve problems, ML uses past data or precedents (Libbrecht and Noble, 2015). ML may be summed up as computational techniques that use past experience to enhance current performance or produce more precise forecasts. According to Mohri et al. (2018), an ML tool draws its expertise from the electronic data available to the system for analysis. Encouraged by the growing amount of data available and the development of information technology, AI approaches are now frequently used to handle complicated, non-linear problems. Compared with conventional evaluation techniques, AI algorithms can offer evaluations that are more precise and effective (Rampini and Cecconi, 2021).
Machine learning methodologies can be broadly classified into two categories based on their learning approaches: supervised and unsupervised learning (Bastanlar and Ozuysal, 2014). Even though there exist other forms of learning like semi-supervised and online learning (Mohri et al., 2018), the most utilized and favored methods continue to be supervised and unsupervised learning (Alloghani et al., 2020).
The primary distinction between supervised and unsupervised learning lies in the existence of labels within the training data (Alloghani et al., 2020). Supervised learning involves the usage of labeled examples and inputs for training the system (Raschka and Mirjalili, 2019). Broadly speaking, machine learning can be described as computational methods that leverage past experiences to enhance performance or refine predictions. Such experiences are derived from electronic data that can be processed by these systems (Mohri et al., 2018). However, the efficacy of machine learning and AI algorithms is reliant on the quality of their training data (Dong and Rekatsinas, 2018; Halevy et al., 2009). Thus, the quality and comprehensiveness of data used for training are critical factors (Goodfellow et al., 2016). Moreover, reinforcement learning, a different learning model, has been gaining more attention in recent AI research. Unlike supervised and unsupervised learning, reinforcement learning employs AI systems, or agents, which must learn certain behaviors to surmount a problem. This behavior is inculcated through trial-and-error interactions with the agent's environment (Kaelbling et al., 1996). Consequently, AI systems turn into their own educators, eliminating the necessity for human-provided data, guidance, or knowledge (Silver et al., 2017).
Neural networks are a frequently applied AI technology inspired by the human brain's structure. They comprise interconnected, miniaturized units known as artificial neurons (Kureljusic and Reisch, 2022). These artificial neurons are concise processing elements connected to each other, generating outputs based on the learning rules they follow and the inputs they receive. Hence, the goal of neural networks is to emulate the brain's functioning in humans or other organisms (Aggarwal, 2018). Within this context, deep learning represents a term signifying various types of intricate neural networks. The rise of large datasets and massive computational power has led to a significant expansion of deep learning in recent years (Goodfellow et al., 2016). A deep learning architecture is formed of multiple modules or artificial neurons organized in multiple layers. Each layer can transform the input data and can be trained. Deep learning has resulted in substantial advancements in fields like speech recognition, visual object recognition, and object detection (LeCun et al., 2015).
Accounting forecasting is a common and productive application field for AI-based algorithms (Kureljusic and Karger, 2023). Accounting data is typically rule-based and well-structured, making it suitable for automated valuation using AI models. Financial indicators are often interrelated and thus useful for pattern recognition (Soliman, 2008). Moreover, AI-based solutions may be better suited to recognizing complex relationships in accounting data and distinguishing between short-term and long-term developments (Cho et al., 2020). Various approaches can be used for prediction tasks, including classification, regression, ranking, and clustering algorithms. In classification, the problem is to identify the category of the item under study (Baharudin et al., 2010). A ranking task aims to order items based on one or more criteria (Gerdes et al., 2021). Clustering involves partitioning a set of elements into homogeneous subsets (Kansal et al., 2018). Finally, unlike the previous approaches, regression provides continuous values that can be compared with other observations (Mohri et al., 2018).
The assessment of credit risk is essential to contemporary economies. Historically, statistical techniques and manual auditing have been used to measure it. Recent developments in financial AI are the result of a new generation of machine learning (ML)-driven credit risk models that have drawn a lot of interest from both business and academia (Shi et al., 2022). Credit risk can be defined as 'the potential that a contractual party will fail to meet its obligations in accordance with the agreed terms'. Because firms enter into transactions of various kinds, credit risk and credit risk management are key issues for most of them. The possibility that a contractual arrangement is not adhered to equates to the risk of non-performance. This can damage a firm's aims, for example when a strategic plan is drawn up but cannot be carried out. Money can be lost if the customer does not pay, or if the financial institution in which money is deposited goes bankrupt. Companies with whom the firm has placed orders may themselves become insolvent and fail to deliver on their promises.
At present, to our understanding, there are limited applications of AI in credit risk forecasting that could establish a foundational understanding for future research in this area. The scarcity of research in this field is surprising considering AI's increasing importance in prediction models and the significant contributions it could make. To fill this gap, our objective is to present a contemporary summary of the research environment surrounding AI-based forecasting within the realm of financial accounting, through a systematic literature review. This summary includes examining the possibilities, current methodologies, and multiple use cases.
In line with the research questions, this study addresses the need of investors and industry players to use artificial intelligence (AI) to quantify credit risk in a forward-looking way, instead of relying on traditional methods that are not forward-looking. The contribution of this study is therefore to seek a consensus on how AI can support more accurate, forward-looking quantification of credit risk, synthesized through a literature review of nine studies on how applications of AI are used to provide better forecasting power, and on whether the results can be adequately understood by the analysts who must make decisions based on AI computation.

Methods
In this study, we seek to discern how AI-based algorithms are applied to tasks related to forecasting credit risk. Literature reviews make it possible to categorize and integrate relevant findings from various disciplines by systematically assembling research, which makes them a valuable method for creating a comprehensive synopsis of a research field; such an overview conveys introductory knowledge and serves as groundwork for subsequent research (Kureljusic and Karger, 2023). We first established the database from which the research information was gathered. This study uses a Systematic Literature Review (SLR) approach, with data drawn from published journals indexed by Scopus. Although many scientific publication indexes exist, we chose the Scopus index because it is one of the largest (Forliano et al., 2021). We applied the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) reviewing methodology (Fig. 1) in our paper (Shi et al., 2020).
As a first step, we used Google Scholar as the search platform for our investigation. Articles were collected using keywords related to the purpose of this research: "Artificial Intelligence", "Machine Learning", and "Credit Risk". The search returned 14,000 articles in total. In the next step, we restricted the time range to 2017 through 2023, which eliminated 1,400 records. We then filtered the remaining 12,600 records to review articles and sorted them by relevance, which left 816 articles recorded by the platform. We continued the data collection with a manual check and selected 24 published articles that we considered eligible for the purpose of our research. For further examination, we first investigated the articles' titles, then studied their keywords and abstracts, and finally examined their full texts. This process led to the elimination of a further 15 publications that were not indexed by Scopus, which we therefore excluded from our eligible data collection.
After excluding these 15 publications, our final literature collection comprises nine articles, selected for their relevance to the research topic, their publication time frame, and their conclusions. A summary of the results of these nine articles can be found in Table 8 (in the Attachment).
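The screening funnel described above can be tallied in a few lines. The following Python sketch is only an illustration: the counts are those reported in the text, while the stage labels and variable names are illustrative shorthand rather than official PRISMA terminology.

```python
# Screening stages and record counts as reported in the Methods text.
# The stage labels are illustrative shorthand, not official PRISMA terms.
stages = [
    ("Identified through keyword search", 14_000),
    ("After restricting to 2017 through 2023", 12_600),
    ("After filtering to review articles", 816),
    ("After manual eligibility check", 24),
    ("After excluding publications not indexed by Scopus", 9),
]

# Records removed at each step = previous count minus current count.
removed = [
    (label, previous - current)
    for (_, previous), (label, current) in zip(stages, stages[1:])
]

for label, count in removed:
    print(f"{label}: {count} removed")

final_count = stages[-1][1]
print(f"Included in the review: {final_count}")
```

Laying the counts out this way makes it easy to check that each reported elimination matches the difference between consecutive stages.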

Results and Discussion
Based on the literature reviewed using the method described above, only a limited number of articles met our search criteria in each year. The most frequent year is 2021, with three publications (Table 1). We found four articles published in quartile 1 journals and three articles in journals in quartiles 2 to 4. We decided to include two articles that were not indexed but were published by the Central Bank of Spain or funded by the European Union (Table 2). We also found that credit risk is not a concern of the financial services industry alone but of various industries, with two of the articles we reviewed also taking non-financial industries into account (Table 3). In terms of sample size, the articles we reviewed mostly use large data sets; three of them use samples larger than 100,000 (Table 4).

Table 4. Sample size

Size                         Count
< 1,000                      2
Between 1,000 and 100,000    4
> 100,000                    3
Total                        9

Source: various literatures reviewed (2023)

For the dependent variable, only one article uses a wide definition of default, e.g., a rating downgrade after one year of observation. The other articles use relatively simple binary definitions (default or non-default) (Table 5a).

Table 5a. Dependent variables

Variables                                                        Count
Non-binary (e.g., one-year forecast of default events such as    1
  bankruptcy or rating downgrade, as measured by several proxies)
Binary (e.g., default or non-default, solvent or non-solvent)    8

Source: various literatures reviewed (2023)

For the independent variables, three articles use financial inputs only, while six use both financial and non-financial inputs (Table 5b). Python is the most popular tool, with almost 50% of the reviewed articles using it to run machine learning algorithms, followed by R (Table 6). Each article uses two to six machine learning methods, the most popular being artificial/deep neural networks (Table 7a). Each article uses one to seven validation measures, the most popular being the area under the curve (AUC) (Table 7b). All articles agree that machine learning techniques provide more accurate predictions. However, not all of them specify which technique is superior. Specific recommendations are made by (a) Khemakem and Boujelbene (2017), in favor of SVM; (b) Xu (2019), in favor of a hybrid decision tree-ANN; (c) Abdullah (2021), in favor of an ensemble classifier; and (d) Alonso and Carbo (2021) and Liu (2022), in favor of XGBoost. We also note that XGBoost appears only in articles published after 2020. We found two special features in the selected articles, namely machine learning explainability and business value.
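Because AUC is the validation measure used most often in the reviewed articles, a small sketch of how it can be computed may help readers who are new to it. The function below is a minimal pure-Python illustration, assuming binary labels (1 = default, 0 = non-default) and one risk score per borrower; the function name and the toy data are hypothetical and are not taken from any of the reviewed studies.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the share of (default, non-default) pairs in which the
    defaulting borrower received the higher risk score; ties count half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one default and one non-default")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = default, 0 = non-default.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
example_auc = auc_score(labels, scores)
print(round(example_auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaulters from non-defaulters. In practice, libraries such as scikit-learn offer an equivalent `roc_auc_score` function that scales better to large samples.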

a. Machine learning explainability
Bussmann (2020) and Misheva (2021) are the two articles that take into account how to interpret and explain the resulting suggestions from ML models.Explainability in this context refers to the ability for a party with an interest to understand the primary factors influencing the outcome of a decision made by a model.The Financial Stability Board (2017) proposed that the inability to interpret and audit AI and Machine Learning technologies might escalate to a mass-scale risk.Similarly, Croxson et al. (2019) highlighted that there are certain instances where the legal system may require a level of explainability.
The EU's General Data Protection Regulation (GDPR) of 2016, whose Indonesian equivalent is Undang-Undang No. 27 Tahun 2022 tentang Perlindungan Data Pribadi (Law No. 27 of 2022 on Personal Data Protection), stipulates that "the existence of automated decision-making should include substantial information about the involved logic, as well as the impact and envisioned consequences of such processing on the data subject." Hence, under certain conditions, the GDPR grants the data subject the right to obtain significant information about the logic behind automated decision-making.
In our observation, explainability is one of the key obstacles faced by companies in using machine learning techniques to comply with accounting standards or regulators' requirements for measuring forward-looking credit risk. Alonso and Carbo (2021) are able to link the use of more accurate machine learning techniques to value creation for shareholders in the form of lower capital requirements.
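Shapley-value explanations of the kind reported in the explainability studies (see Table 8) attribute a model's prediction to its individual input variables. The sketch below is a minimal illustration of the exact Shapley computation for a toy model. It is not the implementation used by Bussmann (2020) or Misheva (2021), and the feature names, weights, and function names are hypothetical. It enumerates every coalition of the remaining features, which is why the run time grows exponentially with the number of features.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions of the other features.

    value_fn maps a frozenset of feature indices to the model payoff
    obtained when only those features are "switched on".
    """
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical additive scoring model over three illustrative features.
FEATURES = ["leverage", "ebitda", "trade_receivables"]
WEIGHTS = [0.5, 0.3, 0.2]


def toy_value(coalition):
    return sum(WEIGHTS[j] for j in coalition)


shapley = exact_shapley(toy_value, len(FEATURES))
for name, value in zip(FEATURES, shapley):
    print(f"{name}: {value:.3f}")
```

For an additive toy model like this one, each feature's Shapley value equals its own weight, which gives a simple sanity check. With n features the loop evaluates n times 2^(n-1) coalitions, so practical tools rely on approximations or model-specific shortcuts.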

b. Business value
One method for assessing the economic impact of improved predictions involves identifying the loans that could have been approved with a more accurate forecast model. This could be done either out-of-sample or retrospectively on a subset of the portfolio. The latter approach is employed by Khandani et al. (2010) and Albanesi and Vamossy (2019), who estimate the value added (VA) of using machine learning models by contrasting the profits made with and without the forecast.
In their model, the cost savings depend on the True Positive (TP) rate, which reflects correct decisions not to approve a loan. However, these savings are counterbalanced by the opportunity cost of the returns lost on loans that were rejected because the model incorrectly anticipated a default (False Positive, or FP). Consequently, one can calculate the VA in relative terms by comparing the savings achieved with the predictive model against those of a hypothetical strategy with perfect foresight.
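The logic of this calculation can be sketched in Python. The snippet below is a simplified illustration under stated assumptions, not the exact formula of the cited authors: it assumes a fixed loss per defaulted loan and a fixed margin per performing loan, and the function and parameter names are hypothetical.

```python
def value_added(tp, fp, fn, loss_per_default, margin_per_good_loan):
    """Relative value added of a default-prediction model (simplified sketch).

    tp: defaulting loans correctly rejected (true positives)
    fp: performing loans wrongly rejected (false positives)
    fn: defaulting loans wrongly approved (false negatives)
    loss_per_default and margin_per_good_loan are assumed constants.
    """
    # Losses avoided by correctly rejecting future defaulters.
    savings = tp * loss_per_default
    # Returns forgone on good loans the model rejected by mistake.
    opportunity_cost = fp * margin_per_good_loan
    # A perfect-foresight strategy would reject every defaulter
    # and no performing borrower.
    perfect_foresight_savings = (tp + fn) * loss_per_default
    if perfect_foresight_savings == 0:
        raise ValueError("No defaults in the sample; VA is undefined")
    return (savings - opportunity_cost) / perfect_foresight_savings


# Hypothetical portfolio: 100 eventual defaulters, 80 of them caught,
# and 50 good borrowers rejected by mistake.
example_va = value_added(
    tp=80, fp=50, fn=20, loss_per_default=1000.0, margin_per_good_loan=200.0
)
print(f"Value added relative to perfect foresight: {example_va:.2f}")
```

Under these assumptions, a value of 1 would match perfect foresight, while a value at or below 0 would mean that the forgone returns on wrongly rejected loans outweigh the avoided losses.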

Conclusions
Based on the systematic literature review of nine studies on AI-based forecasting within the realm of financial accounting, we draw three conclusions. First, credit risk assessment holds significant importance in today's business because issuing credit (e.g., loans or trade receivables) involves careful evaluation of potential returns. The consensus is consistent and positive that applying Artificial Intelligence (AI) and Machine Learning (ML) to alternative data sources beyond financial attributes can greatly facilitate comprehensive credit risk analysis. This enables lenders to assess customer behavior accurately and verify clients' loan repayment capabilities. AI can provide better forecasting power and, when used correctly, can widen access to credit for less privileged people, which is good for the overall economy. However, several key challenges remain in making this technology affordable, especially how to reduce its complexity so that more people can learn to configure, operate, and interpret AI computation results. Second, although there seems to be no consensus on which machine learning technique is the most accurate, this should not be an issue, because the most popular tool for running machine learning techniques (Python) is open source and all types of techniques are open for everyone to access and learn. Third, the concept of explainable AI/ML and the new techniques for quantifying the value created by AI/ML might increase the adoption of these techniques in practice. They might also be seen as a way to increase the bankability or financial inclusion of the many Indonesian people who are not yet "bankable." This is possible because AI/ML considers not only financial information but also non-financial information when providing suggestions to management. This is an area in which the regulator might want to give researchers or industry players incentives to work out which non-financial information could be used to reach this objective and realize financial inclusion for all Indonesian people. This study has a limitation, namely the possibility of having missed some important studies that could affect the conclusions, given the relatively short research time available. Future research is therefore expected to expand on the results of this study with the support of quantitative methods regarding the use of Artificial Intelligence (AI) and Machine Learning (ML) in credit risk.

Attachment
Table 8 summarizes the nine previous studies we analyzed.
The research has applied supervised learning and has been performed on 133,152 mortgage and credit card customers in prime, near prime and sub-prime lending segments of three European lenders across the UK and the Netherlands during the period January 2016 to July 2017.

Summary result
As candidate models, we chose neural nets and random forests, as they are the most popular supervised learning methods in credit risk for their benefit of applying both structured and unstructured data. The research describes three experiments that develop the AI probability of default models and compares the model quality with the quality of the traditional applied logistic probability of default (PD) models. In all experiments, AI models performed better than the traditional models. Scalable automated credit risk solutions can therefore build on AI in their risk scoring.
The need to leverage the high predictive accuracy brought by sophisticated machine learning models, making them interpretable, has an agnostic, post-processing methodology based on correlation network models. From a substantial viewpoint, the model can explain any prediction regarding the Shapley value contribution of each explanatory variable. Total assets to total liabilities (the leverage) is the most important variable, followed by the EBITDA, profit before taxes plus interest paid, measures of operational efficiency, and trade receivables related to solvency. Suggestion: Network-based explainable AI models can effectively advance understanding the determinants of financial and credit risks.
5 Donovan et al (2021) This study creates a summary measure for the borrower's credit risk by combining the estimates generated by each of the three methods using factor analysis for conference calls and MD&As.Using holdout samples, we verify that our credit risk measures explain a substantial portion of the borrower's credit risk as measured by CDS spreads.
In out-of-sample tests, our text-based measures based on conference calls and the MD&A predict across-firm variation in future interest rate spreads, credit rating downgrades, and bankruptcy filings.However, we only find that the credit risk measures based on conference calls predict within-firm variation in future interest rate spreads, credit rating downgrades, and bankruptcy filings.
6 Abdullah (2021) This study found that the artificial neural network classifier has an 88% accuracy and sensitivity rate, and the AUC for this model is 96%. However, the ensemble classifier outperforms all other models when log loss and other metrics are considered.
7 Alonso and Carbo (2021) ML models outperform Logit both in classification and in calibration, although more complex ML algorithms do not necessarily predict better. The study also estimates the savings in regulatory capital when using ML models instead of a simpler model like Lasso to compute the risk-weighted assets.
Implementing XGBoost could yield savings from 12.4% to 17% in terms of regulatory capital requirements under the IRB approach.This leads us to conclude that the potential benefits in economic terms for the institutions would be significant, which justifies further research to better understand all the risks embedded in ML models.
8 Misheva (2021) From an aggregate perspective, a wide adoption of AI-based solutions in credit risk management may benefit financial inclusion and financial system diversity. The robust SHAP values communicate each feature's importance over the model prediction. However, in the case of many features, it can take an extremely long time to generate these values, owing to its exponential run time. LIME, on the other hand, has certain limitations on model objects and the types of models that it can explain (probabilistic models only).
The lack of algorithmic transparency is one of the main barriers to the wider adoption of AI-based solutions in credit risk management. The greater the trust in AI, the more loan originators will deploy it, which in turn will enable them to foster innovation and move ahead in adopting next-generation capabilities.
9 Liu (2022) The results showed that XGBoost is an effective tool in data preprocessing for credit risk prediction. In the second stage, they employ forgeNet to handle the complex relationships between features and to produce the prediction results. The significance test results indicate that the advantages of the proposed two-stage hybrid model are mainly attributed to feature linearization and feature graph mining when utilizing DNN for credit risk prediction.
Source: previous research

Table 1. Year distribution

Table 2. Rank in SCImago Journal Rank

Table 3. Industry focus

Table 5b. Independent variables

Table 7b. Performance

Only two articles use tools to explain the suggestions resulting from the machine learning methodology; they were issued in 2020 and 2021 (Table 7c).

The 6th International Seminar on Business, Economics, Social Science, and Technology (ISBEST) 2023
This study indicates that the hybrid artificial intelligence (AI) model, specifically the decision tree-ANN model, improves the accuracy of credit risk prediction on e-commerce platforms.The model, trained using data from Taobao, shows the highest accuracy among the tested models and can facilitate healthy and efficient transactions between buyers and sellers.The study suggests that this model can contribute to the sustainable development of the e-commerce ecosystem.