Why use Predicates?
Aerospike is a high-throughput, low-latency distributed NoSQL database. Using the most basic API functionality, records can be retrieved by primary key, by secondary index query, through a record UDF, or by a full set (table) or namespace scan. What if you require greater selectivity than those basic retrieval methods offer? The selected records can of course be filtered further by code in your client application. But Aerospike Predicates, an adjunct to the basic APIs, let you avoid unnecessary network traffic and take advantage of the CPUs available across the servers in the Aerospike cluster.
Reducing network traffic is particularly important if your client applications aren’t physically colocated with the database servers. For example, clients in a different Amazon AWS Availability Zone than the Aerospike DB servers are not colocated with them, and the cost of data traffic between Availability Zones can be significant.
The Aerospike feature that provides this additional selectivity for database API calls is called Predicates. Predicates are defined in postfix notation (in Aerospike versions up to and including 5.1). Predicate filtering is performed in parallel across all the servers in the cluster, providing the desired selectivity while minimizing network traffic and leaving less work for the individual client applications. This leads to lower latencies and higher throughput for the applications than filtering within the client applications.
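To make the postfix ordering concrete, here is a minimal sketch of a stack-based postfix evaluator in plain Java. It is not the Aerospike API and says nothing about how the server evaluates predicates internally; the class name `PostfixSketch`, the token strings, and the single variable `v` are all assumptions made for illustration. It only shows why operands are listed before the operator that consumes them.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustrative only: a tiny postfix (reverse Polish) evaluator, NOT the Aerospike API.
class PostfixSketch {

    /** Evaluates tokens such as ["v", "30000", ">"] with the variable v bound to the given value. */
    static boolean evaluate(List<String> tokens, long v) {
        Deque<Object> stack = new ArrayDeque<>();
        for (String token : tokens) {
            switch (token) {
                case "v":
                    stack.push(v); // operand: the current value of the variable
                    break;
                case ">": {
                    long right = (Long) stack.pop();
                    long left = (Long) stack.pop();
                    stack.push(left > right);
                    break;
                }
                case "<": {
                    long right = (Long) stack.pop();
                    long left = (Long) stack.pop();
                    stack.push(left < right);
                    break;
                }
                case "and": {
                    boolean right = (Boolean) stack.pop();
                    boolean left = (Boolean) stack.pop();
                    stack.push(left && right);
                    break;
                }
                default:
                    stack.push(Long.parseLong(token)); // operand: an integer literal
            }
        }
        return (Boolean) stack.pop();
    }

    public static void main(String[] args) {
        // Infix "v > 25000 and v < 50000" written in postfix order.
        List<String> between = List.of("v", "25000", ">", "v", "50000", "<", "and");
        System.out.println(evaluate(between, 25578)); // true
        System.out.println(evaluate(between, 52500)); // false
    }
}
```

Each operator pops the operands pushed just before it, which is the same ordering the Java predicate calls later in this article follow: variable, value, then comparison.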
Predicate expressions use record metadata and record bin values. They may be used in conjunction with secondary index queries, scans, and batch, read, write, and delete operations, as well as record UDFs. They can include the following data values in their expressions:
- Record storage size
- Record last update time
- Record expiration time
- Record digest modulo
- Integer, string and GeoJSON bin values
- List and map element values
The available operators include:
- Logical AND, OR and NOT expressions
- Integer <, <=, ==, !=, >=, and >
- String == and !=
- GeoJSON “within” and “contains” comparisons
- String regular expressions
Credit Card Transaction Example
As an example of using predicate filters, let’s assume that your Aerospike database contains credit card transactions. All transactions for an individual within a given month are stored in a Complex Data Type (Map or List) within a single record. The record’s key is formed by concatenating the unique ID for the user with the month.
When a transaction for a large amount, say $1500, is initiated on a credit card, you want to view all transactions over $300 made on that same card within the past 12 months. The goal is to determine whether the card owner regularly initiates large transactions, or whether this one is an anomaly that needs to be flagged for special attention and validation as potential fraud or theft of the credit card.
Assuming that your database contains one record per user per month, there are 12 potential records to review. Rather than having the client application retrieve all 12 records from the Aerospike DB and then scan through them, we take advantage of the distributed Aerospike DB servers to perform the desired filtering. A single batch read API call, augmented with the Predicate, is executed, and it returns only the records for the months that contain the desired transactions (those greater than $300).
In this example, the primary key for each record is a concatenation of the credit card number and the month, separated by a vertical bar (“|”). So for credit card #869387936, the primary key for the January record would be “869387936|Jan”. The bin within each record is a List holding the amounts of all the transactions against that credit card for that month, stored as integers in cents. We will only be looking at the records for 5 months. The records in the demonstration DB look like the following:
+-----------------+-----------------------------------+
| PK              | xacts                             |
+-----------------+-----------------------------------+
| "869387936|Jan" | LIST('[1995, 595, 52500]')        |
| "869387936|Feb" | LIST('[2150, 1995, 25578, 9829]') |
| "869387936|Mar" | LIST('[1750]')                    |
| "869387936|Apr" | LIST('[1995, 2578, 3650]')        |
| "869387936|May" | LIST('[26500, 1995, 5.75]')       |
+-----------------+-----------------------------------+
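The key layout described above can be sketched in a few lines of plain Java. The class and method names (`KeySketch`, `userKey`, `userKeys`, `toCents`) are hypothetical helpers, not part of the Aerospike client; in a real application the resulting strings would be passed to the client's `Key` constructor, as in the Java fragment in this article.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers illustrating the "<card number>|<month>" key layout.
class KeySketch {

    /** Builds the user key for one card and one month, e.g. "869387936|Jan". */
    static String userKey(String cardNumber, String month) {
        return cardNumber + "|" + month;
    }

    /** Builds one user key per month, in the order the months are given. */
    static List<String> userKeys(String cardNumber, String... months) {
        List<String> keys = new ArrayList<>();
        for (String month : months) {
            keys.add(userKey(cardNumber, month));
        }
        return keys;
    }

    /** Converts a dollar amount such as "300" or "19.95" to integer cents. */
    static long toCents(String dollars) {
        // longValueExact throws ArithmeticException if fractions of a cent remain.
        return new BigDecimal(dollars).movePointRight(2).longValueExact();
    }

    public static void main(String[] args) {
        System.out.println(userKeys("869387936", "Jan", "Feb", "Mar", "Apr", "May"));
        System.out.println(toCents("300"));   // 30000
        System.out.println(toCents("19.95")); // 1995
    }
}
```

Storing amounts as integer cents is what lets the predicate compare list elements against an integer threshold such as 30000 for $300.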
The Java logic fragment required to retrieve the desired records is as follows (predicate syntax up to and including Aerospike version 5.1):
    import java.util.List;
    import com.aerospike.client.Key;
    import com.aerospike.client.Record;
    import com.aerospike.client.policy.BatchPolicy;
    import com.aerospike.client.query.PredExp;

    // Assumes "client" is an already connected AerospikeClient.
    Key[] keys = new Key[5];
    // Let's assume that the credit card number is 869387936
    // Let's only look at 5 months

    // Create the appropriate keys to retrieve the records for
    // all the months with a SINGLE batch API call.
    keys[0] = new Key("test", "pd", "869387936|Jan");
    keys[1] = new Key("test", "pd", "869387936|Feb");
    keys[2] = new Key("test", "pd", "869387936|Mar");
    keys[3] = new Key("test", "pd", "869387936|Apr");
    keys[4] = new Key("test", "pd", "869387936|May");

    // Now we set up the Predicate filtering
    // Predicate filtering utilizes Postfix Notation
    BatchPolicy bPolicy = new BatchPolicy();
    bPolicy.setPredExp(
        // Compare each list element "v" against $300 (30000 cents)
        PredExp.integerVar("v"),
        PredExp.integerValue(30000),
        // Select values greater than $300
        PredExp.integerGreater(),
        // Look at the "xacts" bin (attribute)
        PredExp.listBin("xacts"),
        // Iterate over the entire "xacts" list; match if ANY element passes
        PredExp.listIterateOr("v")
    );

    // Now execute the BATCH get request.
    // The request is fanned out to the servers based on the
    // hash of the primary keys, so each server that owns at least
    // one of the keys reads its record(s) and executes the Predicate,
    // returning only the desired records.
    Record[] records = client.get(bPolicy, keys);

    // Iterate over the results; filtered-out records come back as null
    for (Record aRec : records) {
        if (aRec != null) {
            List<?> rList = (List<?>) aRec.getValue("xacts");
            System.out.println(rList);
        } else {
            System.out.println("No record for month");
        }
    }
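With the demonstration data above, only the January record contains a transaction greater than $300 (52500 cents), so the output is as follows:
[1995, 595, 52500]
No record for month
No record for month
No record for month
No record for month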
A Simple Variation
Let’s try something different: retrieving all records whose transaction List includes at least one value between $250 and $500. The predicate for that (syntax up to and including Aerospike version 5.1) may be defined as:
    bPolicy.setPredExp(
        // v > $250 (25000 cents)
        PredExp.integerVar("v"),
        PredExp.integerValue(25000),
        PredExp.integerGreater(),
        // v < $500 (50000 cents)
        PredExp.integerVar("v"),
        PredExp.integerValue(50000),
        PredExp.integerLess(),
        // Combine the two comparisons above with a logical AND
        PredExp.and(2),
        PredExp.listBin("xacts"),
        PredExp.listIterateOr("v")
    );
The output when using that predicate is:
No record for month
[2150, 1995, 25578, 9829]
No record for month
No record for month
[26500, 1995, 5.75]
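As a way to reason about which months each predicate keeps, the "match if any list element passes" behavior can be mimicked on the client in plain Java. This is only a model for checking expectations against the demonstration data: on a real cluster the filtering happens on the servers, and the names `ListIterateOrSketch`, `DEMO`, and `matchingKeys` are assumptions made for this sketch. Values are compared as doubles only so that every entry in the demonstration lists can be represented as written.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.DoublePredicate;

// Client-side model of "any element of the list satisfies the condition".
class ListIterateOrSketch {

    // Demonstration records from the article, keyed by user key (amounts in cents).
    static final Map<String, List<Number>> DEMO = new LinkedHashMap<>();
    static {
        DEMO.put("869387936|Jan", List.of(1995, 595, 52500));
        DEMO.put("869387936|Feb", List.of(2150, 1995, 25578, 9829));
        DEMO.put("869387936|Mar", List.of(1750));
        DEMO.put("869387936|Apr", List.of(1995, 2578, 3650));
        DEMO.put("869387936|May", List.of(26500, 1995, 5.75));
    }

    /** Returns the keys whose list has at least one element passing the test. */
    static List<String> matchingKeys(DoublePredicate perElement) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<Number>> entry : DEMO.entrySet()) {
            boolean any = entry.getValue().stream()
                    .anyMatch(n -> perElement.test(n.doubleValue()));
            if (any) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Greater than $300
        System.out.println(matchingKeys(v -> v > 30000));
        // prints [869387936|Jan]

        // Between $250 and $500
        System.out.println(matchingKeys(v -> v > 25000 && v < 50000));
        // prints [869387936|Feb, 869387936|May]
    }
}
```

Running a model like this against sample data before issuing the batch call is a cheap way to confirm that a threshold expressed in cents selects the months you expect.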
Documentation and future enhancements
As noted several times above, the Predicate syntax shown here is valid through at least version 5.1. A new and expanded Predicate syntax is coming in the future that will provide additional functionality.
The documentation regarding Aerospike Predicates can be found at: https://www.aerospike.com/docs/guide/predicate.html
Original article: https://medium.com/aerospike-developer-blog/the-power-of-aerospike-predicate-filters-79c9160469bd