Real-time graph databases help upgrade financial risk control

With the development of Internet finance, traditional financial institutions enjoy the efficiency gains and wider service reach that financial technology brings. At the same time, the attack methods of the “black industry” (organized online fraud rings) keep escalating, and the fraud that financial institutions encounter is becoming more and more complicated. Anti-fraud based on knowledge graphs has emerged in response.

Risk control is a history of co-evolving offense and defense

As science and technology advance, the “black industry” has also evolved, from individual hacking to large-scale attacks. Techniques such as IP pools are used to bypass risk-control rules, and the fraud that financial institutions face keeps growing more complex. There are four main changes:

First, specialization. Today’s “black industry” teams are highly professional, employing experienced risk-control specialists, skilled hackers, and even AI experts. Financial institutions and financial service organizations therefore cannot keep a comparative advantage unless their technology is more advanced than the attackers’.

Second, industrialization. Financial fraud has evolved from individual crime into organized gangs, and these gangs run large-scale attacks through large numbers of accounts to maximize profit. As a result, although their fraud patterns keep changing, their behavior shows inertia over short periods, which gives risk-control staff a chance to catch its traces.

Third, concealment. Cross-border crime by “black industry” groups is now very common. These groups use more subtle methods: they rotate identities through modem pools (ModemPOOL) and IP pools, stay dormant longer, and route transactions through more complicated chains. This places higher demands on data coverage and requires risk-control staff to dig deeper into the data.

Fourth, suddenness. Once a number used for fraud (a “black number”) enters the credit reporting system it can no longer be used to defraud anyone, so the “black industry” extracts maximum value from each black number in two ways. First, the number is used within a short time to apply for fraudulent loans on multiple platforms. Second, many accounts launch a large-scale surprise attack on a single vulnerability at the same time. Both kinds of sudden attack require the anti-fraud monitoring system to respond in real time.
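The real-time requirement for the first pattern (one number applying on many platforms in a short time) can be illustrated with a sliding-window velocity check. This is a minimal sketch under assumed parameters: the class name `VelocityChecker`, the window length, and the platform threshold are hypothetical, and the code assumes events arrive in timestamp order. It is not part of any product described in this article.

```python
from collections import defaultdict, deque


class VelocityChecker:
    """Hypothetical check: flag an identifier (e.g. a phone number) that
    applies on more than `max_platforms` distinct platforms within a
    sliding window of `window_seconds`. Assumes time-ordered events."""

    def __init__(self, window_seconds: float, max_platforms: int) -> None:
        self.window = window_seconds
        self.max_platforms = max_platforms
        # identifier -> deque of (timestamp, platform), oldest first
        self.events = defaultdict(deque)

    def observe(self, identifier: str, platform: str, ts: float) -> bool:
        """Record one application and return True if the identifier is now
        above the distinct-platform threshold inside the window."""
        q = self.events[identifier]
        q.append((ts, platform))
        # Evict events older than the window (window is (ts - window, ts])
        while q and q[0][0] <= ts - self.window:
            q.popleft()
        distinct_platforms = {p for _, p in q}
        return len(distinct_platforms) > self.max_platforms
```

For example, `VelocityChecker(window_seconds=3600, max_platforms=3)` flags a number on its fourth distinct platform within one hour. A production system would likely keep this state in a shared store rather than in process memory.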

Looking back at the development of the financial industry in recent years, it is clear that risk control is a history of co-evolving offensive and defensive technology.

Financial risk-control solutions based on graph relationships

Anti-fraud has so far gone through three stages, the most recent two being big-data-based anti-fraud and, now, anti-fraud based on knowledge graphs (relationship networks). The underlying technology of knowledge-graph-based anti-fraud is the graph database.

In the United States there is a game called “connect the dots”: you connect all the clues to get a complete picture of an event. The dots in Figure 1 correspond to the large amount of data now scattered across systems, and quickly connecting these scattered dots according to their underlying patterns is exactly what a real-time graph database needs to do.
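To make the “connect the dots” idea concrete, here is a minimal sketch (not TigerGraph’s implementation) that groups application records whenever they share an identifier such as a device ID, IP address, or phone number, either directly or through a chain of other records. It uses a union-find structure; the record format and the names `UnionFind` and `connect_the_dots` are assumptions made for illustration.

```python
class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent as we walk up
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def connect_the_dots(records):
    """records: list of (record_id, set_of_identifier_values).
    Returns groups (sorted lists) of records linked by shared values."""
    uf = UnionFind()
    first_owner = {}  # identifier value -> first record that used it
    for rid, attrs in records:
        uf.find(rid)  # register the record even if it shares nothing
        for value in attrs:
            if value in first_owner:
                uf.union(first_owner[value], rid)
            else:
                first_owner[value] = rid
    groups = {}
    for rid, _ in records:
        groups.setdefault(uf.find(rid), set()).add(rid)
    return sorted(sorted(g) for g in groups.values())
```

Two applications that never share an identifier directly still land in the same group if a third record links them, which is the kind of hidden connection a relationship network is meant to surface.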

The dots are user data, and user data is the foundation of a graph database. Collecting that data in compliance with laws, regulations, and regulatory requirements, and only with the user’s authorization, is therefore key to online anti-fraud. This article does not cover data collection.

By combining internal data with externally collected data, risk-control staff can match a customer’s behavior patterns against the customer’s social relationships, associations between transaction patterns, online behavior, mobile devices, and other data. An anti-fraud rules engine, assisted by machine learning, then estimates the likelihood that the customer is committing fraud.

Let’s take a look at the anti-fraud system architecture based on TigerGraph.

First, the TigerGraph real-time graph database can identify fraud before a payment is processed. Thanks to the flexible schema of a graph, TigerGraph can ingest data from different sources and use the relationships among the data to form a global graph that plays a role similar to a wide table.
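The schema-flexibility point can be sketched in plain Python: sources with different columns are merged into one adjacency map of typed vertices, and a customer’s neighbors can then be flattened into a wide-table style row. This is a hypothetical illustration, not TigerGraph’s loader; the `"customer"` key convention and the names `build_global_graph` and `wide_row` are assumptions.

```python
from collections import defaultdict


def build_global_graph(sources):
    """Merge rows from heterogeneous sources into one undirected adjacency map.

    Each source is a list of dicts that all contain a "customer" key; every
    other non-empty field becomes an edge to a typed vertex (field, value).
    Sources do not need to share columns.
    """
    adj = defaultdict(set)
    for rows in sources:
        for row in rows:
            cust = ("customer", row["customer"])
            for field, value in row.items():
                if field == "customer" or value is None:
                    continue
                attr = (field, value)
                adj[cust].add(attr)  # store the edge in both directions
                adj[attr].add(cust)
    return adj


def wide_row(adj, customer_id):
    """Flatten one customer's neighbors into a wide-table style row."""
    row = defaultdict(list)
    for field, value in adj.get(("customer", customer_id), ()):
        row[field].append(value)
    return {field: sorted(values) for field, values in row.items()}
```

Because a device vertex such as `("device", "d9")` is shared by every customer who used it, the merged graph also exposes cross-customer links that a per-source table would hide.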

Second, TigerGraph identifies fraud by combining machine learning with the graph database. Machine learning today suffers from having too few features, and features that are not effective enough. The TigerGraph graph database can model a user’s relationship features and screen them in real time, within milliseconds. For users already labeled in the system, the graph database can quickly use the characteristics of the groups they belong to in order to make a fraud determination. For users who are unlabeled or whose labels are outdated, TigerGraph can generate more than 100 relationship features in milliseconds and then classify them and analyze the data with a decision tree or logistic regression.
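A few relationship features of the kind described above can be computed with a depth-limited breadth-first search. The sketch below is a hypothetical illustration: the feature names (`degree`, `neighbours_hop{h}`, `flagged_hop{h}`) and the function `relationship_features` are assumptions, and a real feature set would be far larger. In a graph where accounts connect through shared attributes (phone, device), other accounts typically appear at hop 2.

```python
from collections import deque


def relationship_features(adj, start, flagged, max_hops=2):
    """Compute illustrative graph features for one vertex.

    adj: dict vertex -> iterable of neighbor vertices (undirected graph)
    flagged: set of vertices already labeled as fraudulent
    """
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand beyond the hop limit
        for n in adj.get(v, ()):
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)

    feats = {"degree": len(adj.get(start, ()))}
    for h in range(1, max_hops + 1):
        ring = [v for v, d in dist.items() if d == h]
        feats[f"neighbours_hop{h}"] = len(ring)
        feats[f"flagged_hop{h}"] = sum(1 for v in ring if v in flagged)
    return feats
```

The resulting dictionary can be fed to any tabular classifier, such as the decision tree or logistic regression mentioned above.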

Most importantly, judgments built on relationship features with decision trees or logistic regression are easy to understand. This gives companies a solution for “interpretable AI”.
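Why a logistic-regression judgment on relationship features is easy to explain can be shown directly: each feature’s contribution to the score is just its weight times its value. The weights and bias below are hypothetical placeholders, not a trained model, and `score_with_explanation` is an illustrative name.

```python
import math

# Hypothetical weights for illustration only; a real model would be trained.
WEIGHTS = {"flagged_hop2": 1.5, "neighbours_hop2": 0.1, "degree": -0.2}
BIAS = -2.0


def score_with_explanation(features):
    """Return (fraud probability, contributions ranked by absolute size)."""
    contributions = {name: w * features.get(name, 0) for name, w in WEIGHTS.items()}
    logit = BIAS + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))  # logistic (sigmoid) function
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return prob, ranked
```

An analyst can read the top entry of `ranked` (for instance, “two flagged accounts within two hops”) as the main reason for a high score.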

Anti-money-laundering applications of the TigerGraph real-time graph database

The anti-fraud example shows how the TigerGraph graph database combines with machine learning. Next, two anti-money-laundering (AML) scenarios illustrate another strength of TigerGraph: deep link analysis.

The first scenario uses the graph database to catch missed detections in anti-money laundering and improve detection accuracy. For example, if a new user has no prior financial transaction history, the system raises no warning and does not place the user in a high-risk category, so staff mark the transaction as low risk. Deep link analysis in the graph database, however, reveals that the user’s number (for example, a phone number) is shared with other people who have already triggered money-laundering warnings. The system’s decision on this transaction then changes from low risk to high risk.
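The low-risk-to-high-risk flip can be reproduced with a depth-limited search for the nearest account that already carries an alert. In this hypothetical sketch, a one-hop check finds nothing for the new user, while a two-hop check reaches an alerted account through the shared number. The function names and the default hop limit are assumptions, not TigerGraph’s API.

```python
from collections import deque


def nearest_alert_distance(adj, start, alerted, max_hops):
    """Hop count from start to the nearest alerted vertex, or None if no
    alerted vertex lies within max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        v, d = frontier.popleft()
        if v in alerted:
            return d
        if d == max_hops:
            continue  # hop budget exhausted on this branch
        for n in adj.get(v, ()):
            if n not in seen:
                seen.add(n)
                frontier.append((n, d + 1))
    return None


def risk_level(adj, user, alerted, max_hops=4):
    """'high' if any alerted vertex is reachable within max_hops, else 'low'."""
    return "high" if nearest_alert_distance(adj, user, alerted, max_hops) is not None else "low"
```

With a graph `new_user -> phone:555 -> acct_7` and `acct_7` alerted, `max_hops=1` yields `"low"` and `max_hops=2` yields `"high"`: only the deeper query sees the risk.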

Although fraudsters can forge some of their basic attributes and shallow link information, a deep relationship network cannot be forged in advance, or can be disguised only at high cost. Applications built on TigerGraph can readily extract such deep features and help companies make judgments.

The second scenario uses the graph database to track money laundering in a mixed economy of traditional currency and cryptocurrency. For example, the boxes in Figure 2 mark two suspicious transactions that have already been discovered. Starting from these two transactions, deep tracing of the fund flow upstream and downstream can expose the entire money-laundering network, but only if the database supports link queries more than 10 levels deep. TigerGraph is currently working with some customers to bring Bitcoin and other cryptocurrency transactions under supervision. Even though trading methods and transaction chains are more complicated than in the past, TigerGraph’s computing power makes it possible to trace the money-laundering network exhaustively.
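Tracing upstream and downstream from known suspicious transactions amounts to two depth-limited breadth-first searches over a directed transfer graph, one following money forward and one following it backward. The sketch below is a hypothetical illustration of that idea, not TigerGraph’s query engine; fiat accounts and crypto addresses are treated alike as vertex IDs, and the names are assumptions.

```python
from collections import defaultdict, deque


def trace_funds(transfers, seeds, max_depth=10):
    """Depth-limited tracing of fund flows around suspicious accounts.

    transfers: iterable of (payer, payee) pairs
    seeds: accounts involved in the already-flagged transactions
    Returns (upstream, downstream), each a dict account -> hop distance.
    """
    out_edges, in_edges = defaultdict(set), defaultdict(set)
    for payer, payee in transfers:
        out_edges[payer].add(payee)
        in_edges[payee].add(payer)

    def bfs(edges):
        depth = {s: 0 for s in seeds}
        queue = deque(depth)
        while queue:
            v = queue.popleft()
            if depth[v] == max_depth:
                continue  # do not expand past the depth limit
            for n in edges.get(v, ()):
                if n not in depth:
                    depth[n] = depth[v] + 1
                    queue.append(n)
        return depth

    # Backward along incoming transfers = where the money came from;
    # forward along outgoing transfers = where the money went.
    return bfs(in_edges), bfs(out_edges)
```

The `max_depth` parameter is where the “10 levels or more” requirement shows up: each extra level can pull many more accounts into the traced network.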

The current state of big data + graph technology applications

Given how well graph database technology performs in these scenarios, many companies have become interested in graph databases, and some forward-looking companies have already benefited from the technology and gained a competitive advantage.

In fact, graph technology has existed for many years, yet many companies still do not use it. What has held it back?

The first obstacle is that earlier graph databases could not scale out to multiple machines. As mentioned above, getting the most out of a knowledge graph depends heavily on the variety of data types and on how long a history of data is stored, but most earlier graph databases ran on a single machine, so that machine’s configuration severely limited how much data could be stored.

For example, one of our bank customers wanted to analyze the return of funds in an anti-cash-out scenario: an individual pays a merchant with a credit card, and the merchant then sends the money back to the individual’s savings (debit) card through other accounts.
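The return-of-funds pattern can be expressed as a bounded search for simple paths that leave a person’s credit card and come back to another account owned by the same person. The sketch below is hypothetical: the account naming, the `owner` mapping, and the four-hop limit are assumptions for illustration, and a production system would run this kind of query inside the graph database rather than in Python.

```python
from collections import defaultdict


def find_cash_back_paths(transfers, owner, credit_cards, max_hops=4):
    """Find simple transfer paths that start at a credit card and end at an
    account owned by the same person, within max_hops transfers.

    transfers: iterable of (from_account, to_account)
    owner: dict account -> person id (every start card must be present)
    credit_cards: accounts to start the search from
    """
    out_edges = defaultdict(list)
    for src, dst in transfers:
        out_edges[src].append(dst)

    results = []

    def dfs(path):
        last = path[-1]
        if len(path) > 1 and owner.get(last) == owner[path[0]]:
            results.append(list(path))  # money came back to the same person
            return
        if len(path) - 1 == max_hops:
            return  # hop budget exhausted
        for nxt in out_edges.get(last, ()):
            if nxt not in path:  # keep paths simple (no repeated accounts)
                path.append(nxt)
                dfs(path)
                path.pop()

    for card in credit_cards:
        dfs([card])
    return results
```

Intermediate accounts with unknown owners (for example, a mule account) are allowed on the path; only the endpoint must belong to the card holder.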

This scenario needs both debit card and credit card data. Even after the customer’s data was cleaned, 10 months of debit card data plus 1 month of credit card data still came to 5 TB.

Graph databases used to be unable to handle this data volume. With TigerGraph, we stored the current data on a 12-machine cluster and cut computation time from 3 to 4 days to between 1 and 30 minutes.

The customer later plans to analyze 13 months of debit card and credit card data, using the larger data set to cover cash-out rings more fully; this data size is not a problem for TigerGraph.

The second obstacle is query depth. The AML cases above show that each additional step in the analysis path can reveal more links and implied relationships, and real business often requires analysis of 3 to 10 steps or more. Yet in enterprise-scale scenarios, graph databases often time out or run out of memory on queries of only 2 to 3 degrees. Since fraudsters can forge even shallow relationship features, such performance does little to help fraud screening.
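A rough upper bound shows why each extra hop is expensive. Assuming, hypothetically, an average degree $d$ and ignoring overlap between paths, the frontier $F_k$ reached at hop $k$ can contain up to $d^k$ vertices; with $d = 50$ chosen purely for illustration:

```latex
|F_k| \le d^{k}, \qquad d = 50:\quad |F_3| \le 50^{3} = 125{,}000, \qquad |F_6| \le 50^{6} \approx 1.56 \times 10^{10}
```

Real graphs have overlapping neighborhoods, so actual counts are smaller, but the multiplicative growth per hop is what pushes single-machine systems into timeouts and memory overflows.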

The last obstacle is latency. Scenarios such as fraud have real-time requirements, while other databases struggle to answer queries in under a second while also supporting real-time updates.

Although regulators in many countries currently set loose timeliness requirements for areas such as anti-money laundering (a daily cycle), this is a compromise made because earlier technology could not compute any faster. In principle, every case in the financial field should be handled as quickly as possible.

We have built an anti-money-laundering system for a domestic payment institution that identifies these scenarios at the minute level. Each of the three obstacles above does have a workaround today: many companies achieve large data volumes with second-level response by pairing a graph database with a big data platform, but the high technical barrier makes such solutions hard to master.

A typical enterprise needs a simple, mature solution that meets all three requirements, and the real-time graph database TigerGraph meets all three well.

TigerGraph’s unique weapons

First, in terms of scalability and high performance, TigerGraph is a global company that has implemented native parallel graph database technology. Its underlying layer is a native parallel graph storage structure in which all data is stored as edges and vertices. Data is compressed as it enters TigerGraph, and combined with graph partitioning this achieves a storage scale 50 to 200 times that of other graph databases.
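The article does not describe TigerGraph’s partitioning scheme, so the following is only a generic illustration of what graph partitioning means: vertices are spread across machines by a stable hash of their ID, and edges whose endpoints land on different machines (“cut” edges) are the ones that require cross-machine communication. Plain hash partitioning is a swapped-in technique chosen for simplicity; the function names are hypothetical.

```python
import zlib
from collections import defaultdict


def hash_partition(vertices, num_parts):
    """Assign each vertex to a partition by a stable CRC32 hash of its id
    (Python's built-in hash() is randomized per process, so it is avoided)."""
    return {v: zlib.crc32(v.encode("utf-8")) % num_parts for v in vertices}


def partition_stats(edges, assignment, num_parts):
    """Return (vertex count per partition, number of cut edges)."""
    sizes = defaultdict(int)
    for part in assignment.values():
        sizes[part] += 1
    cut = sum(1 for a, b in edges if assignment[a] != assignment[b])
    return [sizes[p] for p in range(num_parts)], cut
```

Hashing balances vertex counts well but ignores graph structure; smarter partitioners try to reduce the cut-edge count so that multi-hop queries cross machines less often.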

Edges and vertices serve as both the storage model and the computation model. Every vertex is represented by an internal index so it can be located quickly, and the MPP (massively parallel processing) architecture supports large-scale parallel computation. On this basis, TigerGraph has handled hundreds of billions of vertices in a single project, supporting 2 billion queries and updates per day (about 23,000 per second on average) on terabytes of data. Deep-link queries can reach 6 to 10+ degrees.
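The idea of representing vertices by internal indexes can be illustrated with a compressed sparse row (CSR) layout: external IDs are mapped to dense integers, and each vertex’s neighbors occupy a contiguous slice of one array. This is a generic sketch of the technique, not TigerGraph’s actual storage format; `build_csr` and `neighbours` are hypothetical names.

```python
def build_csr(edges):
    """Map external ids to dense internal ints and build a directed CSR
    adjacency. Returns (ids, offsets, targets)."""
    ids = {}
    for a, b in edges:
        for v in (a, b):
            if v not in ids:
                ids[v] = len(ids)  # next dense internal id
    n = len(ids)

    # Count outgoing edges per vertex, then prefix-sum into offsets
    counts = [0] * n
    for a, _ in edges:
        counts[ids[a]] += 1
    offsets = [0] * (n + 1)
    for i in range(n):
        offsets[i + 1] = offsets[i] + counts[i]

    # Fill the targets array using a per-vertex write cursor
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing makes a copy
    for a, b in edges:
        src = ids[a]
        targets[cursor[src]] = ids[b]
        cursor[src] += 1
    return ids, offsets, targets


def neighbours(ids, offsets, targets, vertex):
    """Internal ids of a vertex's out-neighbors: one lookup plus one slice."""
    i = ids[vertex]
    return targets[offsets[i]:offsets[i + 1]]
```

Finding a vertex’s neighbors then costs one dictionary lookup and one contiguous slice, which is the property that makes index-based layouts fast for traversal.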

Second, in terms of ease of use, TigerGraph independently developed GSQL, a SQL-like graph query language. GSQL is Turing-complete and SQL-complete, meaning that any current SQL query or algorithm can be implemented in GSQL. This lets a PoC (proof of concept) show value to customers within days rather than weeks.

Third, TigerGraph developed GraphStudio, a browser-based visual development tool. Everything from graph schema design and relational-to-graph data mapping to data loading and querying can be done in this tool, which greatly lowers the barrier to use. Once the data is loaded, users can query N-degree neighbors, shortest paths, and so on; more complex queries can be written by technical staff and then run by business users.
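For readers unfamiliar with what a shortest-path query returns, here is a plain-Python breadth-first search that reconstructs one shortest path in an unweighted graph. It only illustrates the computation; it is not GraphStudio’s or GSQL’s API, and `shortest_path` is a hypothetical name.

```python
from collections import deque


def shortest_path(adj, source, target):
    """Unweighted shortest path by BFS; returns the vertex list or None."""
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        v = queue.popleft()
        if v == target:
            # Walk parent pointers back to the source, then reverse
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for n in adj.get(v, ()):
            if n not in parent:
                parent[n] = v
                queue.append(n)
    return None
```

For example, with `adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}`, the path from `"a"` to `"e"` is `["a", "b", "d", "e"]`.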