Welcome to Stambia MDM.
This guide contains information about the SemQL language which is used across Stambia MDM.
Preface
Audience
If you want to learn about MDM or discover Stambia MDM, you can watch our tutorials. |
The Stambia MDM Documentation Library, including the development, administration and installation guides is available online. |
Document Conventions
This document uses the following formatting conventions:
Convention | Meaning |
---|---|
boldface |
Boldface type indicates graphical user interface elements associated with an action, or a product specific term or concept. |
italic |
Italic type indicates special emphasis or placeholder variable that you need to provide. |
|
Monospace type indicates code example, text or commands that you enter. |
Other Stambia Resources
In addition to the product manuals, Stambia provides other resources available on its web site: http://www.stambia.com.
Obtaining Help
There are many ways to access the Stambia Technical Support. You can call or email our global Technical Support Center ([email protected]). For more information, see http://www.stambia.com.
Feedback
We welcome your comments and suggestions on the quality and usefulness
of this documentation.
If you find any error or have any suggestion for improvement, please
mail [email protected] and indicate the title of the documentation
along with the chapter, section, and page number, if available. Please
let us know if you want a reply.
Overview
Using this guide, you will learn the SemQL language and its usage in Stambia MDM.
Introduction to SemQL
SemQL is a language to express declarative rules in Stambia MDM. It is used for example to define:
-
Enrichers, Matchers, Validations and Consolidators involved in the Data Certification Process.
-
Composite and transformed attributes in form and table views appearing in MDM Application.
-
Filters when browsing data in the MDM Hub.
It has the following main characteristics:
-
The syntax is close to the Oracle Database SQL language and most SemQL functions map to Oracle functions.
-
SemQL is converted on the fly and executed by the hub database.
-
SemQL is not a query language: It does not support Joins, Sub-queries, Aggregation, in-line Views and Set Operators.
-
SemQL uses Qualified Attribute Names instead of columns names. The code remains implementation-independent.
The following sections describe the main characteristics of the SemQL language.
SemQL is not a query language: SELECT, UPDATE or INSERT queries are not supported, as well as joins, sub-queries, aggregates, in-line views, set operators. |
SQL-Like Clauses
The SemQL Language allows users to define SQL-Like clauses. Depending on the context, these clauses may be one of the following:
-
Condition: A clause that returns a boolean result from the evaluation of expressions using operators. A condition can be used for example for filtering or validating data records (if the clause is false, then the record is filtered out or considered invalid).
-
Expression: A clause that returns a value. In the context of a SemQL Enricher for example, an expression transforms, standardizes and enriches source attributes.
-
Order By Clause: An expression used to sort records. In consolidators, such a clause is used to manage the consolidation conflicts. For example, consider a consolidation made by Most Frequent Value. When multiple values occur with equal frequency, then the SemQL in the Additional Order By Clause determines which value is used.
Functions differ from comparison operators as they
return a non-boolean value. They cannot be used as is in conditions
unless used with a comparison operator. For example,
TO_CHAR(IsValidCustomer) is a valid expression, but not a valid
condition. TO_CHAR(IsValidCustomer)='1' is a valid condition.
|
Not a Query Language
Although it support SQL-Like clauses, SemQL is not a query language. It does not support SQL’s Data Manipulation Language (DML) or Data Definition Language (DDL) statements such as SELECT, INSERT or CREATE TABLE. It does not support Joins, Sub-queries, Aggregation, in-line Views and Set Operators.
Users proficient with SQL should not be mistaken by the appearances. Even if SemQL looks like SQL, the SemQL expressions are entirely parsed and rewritten by the Stambia MDM platform into SQL before their execution by the hub database. A simple SemQL expression may result into a complex SQL statement. As a consequence, it is not recommended to try injecting SQL statements within SemQL expressions. |
Qualified Attribute Names
SemQL clauses manipulate attributes and variables defined in the Stambia MDM model. Attributes are accessed through an unambiguous Qualified Attribute Name. The Qualified Attribute Name is the path to an attribute from the entity being processed.
Built-in and Customized Functions
In expressions, conditions and order by clauses, it is possible to use the built-in SemQL functions. Most of these functions are functions built in the hub’s database and processed by it. Other functions (for example, matching functions) are specific to Stambia MDM.
You can also use in SemQL customized functions implemented using PL/SQL. You must declare these functions in the model to have them appear in the list of functions. See the Declaring PL/SQL Functions for SemQL section in the Stambia MDM Developer’s Guide for more information about declaring customized functions.
Functions that are not declared can still be used in SemQL, but will not be recognized by the SemQL parser and will cause validation warnings. |
The SemQL Editor lists all built-in SemQL functions, plus the customized functions that are declared in the model, with their syntax. |
SemQL Syntax
Language Elements
The SemQL syntax supports the equivalent of SQL Expressions, Conditions or Order By Clause, which are a combination of one or more Values, Operators, and Functions.
Values, Operators and Functions
Values, operators and functions are the tokens in the SemQL language.
-
Values are simple expressions. They may be literals, attributes or variables.
-
Operators modify or compare expressions. SemQL support most SQL operators, including arithmetic and character operators (
+
,-
,*
,/
,||
), comparison operators (=
,!=
,>
,>=
,<
,⇐
,IN
,BETWEEN
,LIKE
,REGEXP_LIKE
,IS NULL
) and logical operators (AND
,OR
,NOT
). -
Functions & Expression Constructs combine other tokens to create new expressions. They include most functions available in the hub’s database, plus the functions implemented by the user.
Operators and Functions are not case-sensitive. Values are case-sensitive.
|
Expressions, Conditions, Order By Clause
Expressions, Conditions and Order By Clauses are the phrases supported by the SemQL Language.
-
Expressions combine values, operators and functions to return a non-boolean value.
-
Conditions combine values, operators and functions to return a boolean value (true or false).
-
Order By Clauses are expressions used to sort data by ascending or descending values of the expression. The
ASC
orDESC
post-fix define the sort order. Default sort order is ascending.
Examples of Expressions
-
FirstName
is an attribute. -
'USA'
is a string literal. -
Initcap(FirstName)
is a function using an attribute. -
Initcap(FirstName) || ' ' || Initcap(LastName)
is a combination of operators, functions, attributes and literals.
Example of Conditions
-
1=1
compares two literals. -
Country='USA'
compares an attribute and a literal. -
Upper(Country) in ('USA', 'US', 'CANADA')
uses a function.
Example of Order By Clauses
-
Country
sorts by the Country attribute (ascending by default) -
Country DESC
sorts by the Country attribute (descending) -
CASE PublisherID WHEN 'MKT' THEN 1 WHEN 'CRM' THEN 2 ELSE 99 END ASC
sorts records where PublisherID equalsMKT
, thenCRM
, then the rest.
The following sections detail the elements of a SemQL clause.
Comments
Comments in SemQL are surrounded by /*
and */
.
An example of code with comments is provided below.
ANY Contacts HAVE (1=1) /* customer with contacts */ AND NOT ANY Contacts HAVE ( IsInfluencer = 1 ) /* and no contact is an influencer */
Values
The values in a SemQL expression may be literals, attributes or model variables:
-
literals are constant values. Numeric are provided as is, other literals must be surrounded by single quotes
'
.
Examples:
-
'John'
-
-42
-
'1994-11-07'
-
Attributes refer to attributes of the entities in the model.
-
Model Variables store values retrieved from remote servers (declared as Variable Value Providers). Variable values are local to each user session, and are refreshed when the user accesses Stambia MDM. Model variables can be used in SemQL filters and expressions created at design and run-time to adapt the experience to the connected user.
-
Search Parameters store the values entered in search form and submitted to the search condition attached to the search form. Search parameters are available only in their own search form’s condition.
For more information about Attributes and Model Variables, refer to the Attributes and Variables/Search Parameters section. |
Operators
Operators are used to:
-
Combine expressions to create new expressions (Arithmetic or Character Operators).
-
Evaluate expressions to return a boolean value. Such operators are used to create conditions.
This section details the operators supported in SemQL.
Arithmetic Operators
Operator | Description |
---|---|
|
Addition |
|
Subtraction |
|
Multiplication |
|
Division |
Character Operators
The ||
(double pipe) is used for string concatenation.
Comparison Operators
Operator | Description |
---|---|
|
Equality |
|
Inequality |
|
Greater than, greater than or equal |
|
Smaller than, smaller than or equal |
|
Compares a value with each value in the list, returns true if one value matches. |
|
Greater than or equal to value_1 and less than or equal to value_2 |
|
TRUE if value matches the pattern. Within the pattern, the character |
|
returns true if the string matches the regular expression pattern. The match parameter may contain one of more of the following options: |
|
Tests for nulls |
|
Condition that returns true if any of child records - in a one to many relationship - meet the given condition. For more information, see Using Related Entities’ Attributes. |
|
Condition that returns true if all of child records - in a one to many relationship - meet the given condition. For more information, see Using Related Entities’ Attributes. |
Logical Operators
Operator | Description |
---|---|
|
Return true if both conditions are true. |
|
Return true if one condition of the other is true. |
|
Returns true if the following condition is false. |
Functions & Expression Constructs
SemQL support functions and expression constructs, that is elements that return a value.
Functions differ from comparison operators as they
return a non-boolean value. They cannot be used as is in conditions
unless used with a comparison operator. For example,
TO_CHAR(IsValidCustomer) is a valid expression, but not a valid
condition. TO_CHAR(IsValidCustomer)='1' is a valid condition.
|
Built-in Functions
The functions available in Stambia MDM include functions in the following categories:
-
Strings
-
Comparison
-
Conversion
-
Date & Time
-
Matching
-
Miscellaneous
-
Null Management
-
Numeric
Useful & Noteworthy Functions
The following list contains noteworthy functions and expressions:
-
TO_CHAR
,TO_DATE
,TO_NUMBER
functions to perform conversion across data types. -
TRIM
,LTRIM
,RTRIM
,PAD
LPAD
,RPAD
to trip or pad with blanks. -
SUBSTR
to retrieve a part of a string. -
REPLACE
,REGEXP_REPLACE
to replace part of a strings. -
INSTR
to find the location of a substring in a string. -
NULLIF
,COALESCE
andNVL
to handle null values. -
GREATEST
andLEAST
to return the greatest and least of a list of expressions. -
SYSDATE
to retrieve the system date.
The complete set of built-in functions with their description is available in Appendix A
Functions for Matching
Certain functions are key in a fuzzy matching process.
Functions for normalizing of transforming values to reduce the noise during fuzzy matching:
-
UPPER
,LOWER
andINITCAP
absorb the case-sensitivity differences in strings. -
SOUNDEX
returns phonetic representations of strings, absorbing typos. -
SEM_NORMALIZE
returns a string with non-ASCII characters transformed to ASCII-equivalent or a blank.
Functions that implement fuzzy matching capabilities:
-
SEM_EDIT_DISTANCE
andSEM_EDIT_DISTANCE_SIMILARITY
respectively returns the distance and percentage of similarity between two strings according to the Levenshtein distance algorithm. -
SEM_JARO_WINKLER
andSEM_JARO_WINKLER_SIMILARITY
respectively return the distance and percentage of similarity between two strings according to the Jaro-Winkler distance algorithm. -
SEM_SEM_NGRAMS_SIMILARITY
returns the percentage of similarity of two strings according to the Dice’s coefficient similarity measure applied to the n-grams of the strings.
The *_SIMILARITY functions return a value between 0 (no match) and 100 (perfect match). If one or both strings are null, the returned value is 0.
|
Other Constructs
CASE Expression
The CASE
expression selects a result from one or more alternatives,
and returns this result.
This syntax returns the first result for which the expression matches the selector. If none match, it returns the default result.
CASE selector WHEN expression_1 THEN result_1 ... WHEN expression_n THEN result_n [ELSE default_result] END
This syntax returns the first result for which the condition is true. If none is true, it returns the default result.
CASE WHEN condition_1 THEN result_1 ... WHEN condition_n THEN result_n [ELSE default_result] END
The following example from an Enricher transforms the CustomerName attribute according to the Publisher of the record.
CASE PublisherID WHEN 'CRM' THEN Upper(CustomerName) WHEN 'MKT' THEN Upper(Replace(CustomerName, '-', ' ')) ELSE CustomerName END
The same example with the second syntax:
CASE WHEN PublisherID='CRM' THEN Upper(CustomerName) WHEN PublisherID='MKT' THEN Upper(Replace(CustomerName, '-', ' ')) ELSE CustomerName END
Customized Functions
SemQL allows you to access database functions implemented in the database instance hosting the hub.
Call these functions as regular functions by prefixing them
with their schema and (optionnally) their package name: <schema>.<package>.<function>
.
For example, to call a CUSTFUNC() function, stored in a CUST001 package, in a COMMON_TOOLS schema, the syntax is:
COMMON_TOOLS.CUST001.CUSTFUNC(CustomerName)
The database user of the schema hosting the hub must have sufficient privileges to execute the customized functions. |
Database functions process data with the database engine. For certain processing involving for example algorithms, libraries or services not easily implemented with the database capabilities, it is preferable to opt for the plugin option. See the Stambia MDM Plug-in Development Guide for more information. |
Attributes and Variables/Search Parameters
Variables
Variables store values that are integration job-specific or user session-specific. Variable are either Built-in Platform Variables, or Model Variables.
Built-in Platform Variables
Built-in Platform Variables are built-in the platform. They are described below:
Variable Name | Definition |
---|---|
|
For the certification process, the data branch number where the integration runs. For a user data session, the data branch number to which this user connects to. |
|
For the certification process, the data edition (version) number where the integration job runs. For a user data session, the data edition number to which this user connects to. |
|
For the certification process, name of the user who has submitted the integration job. For a user data session, the name of the connected user. |
|
ID of the external load that has submitted the integration job. This variable is not available in expressions used outside the certification process, such as enrichers or validations triggered in workflows. |
|
ID of the batch running the integration job. This variable is not available in expressions used outside the certification process, such as enrichers or validations triggered in workflows. |
Model Variables
Model Variables are available in user sessions, for example when the user accesses a data location via an application. They are derived from built-in platform variables.
Model variables are typically retrieved from remote servers (which are declared as Variable Value Providers). Variable values are local to each user session, and are refreshed when the user accesses Stambia MDM. Model variables can be used in SemQL filters and expressions created at design and run-time to adapt the experience to the connected user.
Using the built-in platform variables V_USERNAME
, it is
possible to query (via a Variable Value Provider definition) the
corporate LDAP directory and retrieve the email of the connected user,
and then store this value in a model variable called USER_EMAIL
.
For more information about Model Variables, see the Model Variables section in the Logical Modeling chapter of the Stambia MDM Developer’s Guide. For more information about Variable Value Providers, see the Configuring Variable Value Providers in the Stambia MDM Administration Guide. |
Using Variables
Variables are used with the following syntax: :<variable_name>
, for
example: :USER_EMAIL
.
Search Parameters
Search Parameters store the values entered into a search form and submitted to the search condition attached to that search form. They are available only for their own search form’s condition.
For more information about Search Parameters, see the Creating Search Forms section in the Stambia MDM Developer’s Guide. |
Using Search Parameters
Parameters are used using their defined Binding, using the following syntax: :<binding_name>
, for example: :SEARCHED_NAME
.
When editing the SemQL condition of a search form, the available search parameters are listed in the Variables section of the Expression Editor. |
Attribute Qualified Names
A Attribute Qualified Name is the path to an attribute from the current entity being processed. This path not only allows accessing the attributes of the entity. It also allows access to:
-
The attributes of the entities related to the current entity (parent and child entities).
-
The Lineage. For example, to access the attributes of the golden record a master record relates to.
In this section, Name always refers to the (internal) Name of an attribute, and not to the Label. The label may be translated in various languages, but the name is invariant. |
Attribute names are case sensitive. For example
customerName and CustomERName do not represent the same attribute.
|
Using Current Entity’s Attributes
A given SemQL clause is expressed for a given entity. For example, an enricher, a validation or a filter is performed on a pre-defined entity. Such a clause always has access to the attributes of this entity. These attributes may be simple of complex attributes.
Simple Attributes
Simple attributes can be accessed using their Name.
FirstName
returns the value of the FirstName simple
attribute of the current entity (Employee).
Complex Attributes
Display Name
Complex attributes can be accessed using their Attribute Name. This returns the value of the complex attribute in the format of the corresponding complex type’s Display Name.
The SimpleAddressType complex type is defined with a
display type that shows the Address, City and Country definition
attributes separated by a space. This type is used for the
InputAddress attribute of the Custome entity. The InputAddress
qualified name therefore returns a string containing
<Address> <City> <Country>
value for each Customer.
Definition Attribute
It is also possible to retrieve the value of each definition attribute of a complex type by prefixing this definition attribute name by the name of the complex attribute.
The SimpleAddressType complex type includes the Country
definition attribute. This type is used for the InputAddress attribute
of the Custome entity. The InputAddress.Country
qualified name
therefore returns the Country stored in the InputAddress complex
type value for each Customer.
Built-in Attributes
Built-in attributes are provided by Stambia MDM to support the certification process. They appear in addition to the attributes designed in the data model. The available attributes depending on the location of the SemQL expression.
For example, a clause that involves source data (for
example, an enricher on source data) will support built-in attributes such as the
SourceID
(ID of the source record) or the PublisherID
(Code of the
application that published the source record). On Golden Records - which
are not directly related to one source - these built-in attributes no
longer make sense.
For more information about the built-in attributes, refer to the Integration Job chapter of the Stambia MDM Integration Guide. |
Using Related Entities’ Attributes
Related entities may be either parent entities (for a given relation, the current entity has zero or one parent), or child entities (for a given relation, the current entity has zero or more children).
Parent Entities
It is possible to access attributes of a parent entity by prefixing this attribute by the Role of this parent in the relation.
The current Customer entity references the Employee
entity in a relation. In this relation, the Role of this second entity
is AccountManager. The AccountManager.FirstName
attribute refers
to the FirstName of the Employee that is the parent - in this
relation, the AccountManager - of the current Customer.
Referring to parent entities can follow a chain of relations.
AccountManager.CostCenter.CostCenterName
follows two
relations to return the CostCenterName of the CostCenter to which
the AcountManager of the current Customer reports to.
Child Entities
Accessing child entities is possible in conditions only using the
SemQL any
and all
syntax.
Any and All Syntax
The any
syntax is a condition that returns true if any of child
records meet the given condition.
any <child_entity_role> have ( <condition_on_child_entity> )
To filter Customers having at least one Contact named John
any Contacts have ( FirstName = 'John' )
The any
syntax is a condition that returns true if all the child
records meet the given condition.
all <child_entity_role> have ( <condition_on_child_entity> )
To filter Customers having all their Contacts with the IsInfluencer flag set to `0' :
all Contacts have ( isInfluencer = '0' )
Cascading References
It is possible to cascade through several relations’ roles.
To filter Employees managing Customers having one contact with the IsInfluencer flag set to 1
:
any Customers.Contacts have ( IsInfluencer = '1' )
ParentRecord
In the Any and All syntax, it is possible to access the direct parent’s
record from the condition on the child entity, through the
ParentRecord
reserved keyword or through the parent’s role name in the
relation.
The following condition returns all the customers having two contacts with a different ContactID but the same FirstName.
any Contacts have ( any ParentRecord.Contacts have ( ParentRecord.ContactID != ContactID and ParentRecord.FirstName = FirstName ) )
Using the Lineage
You can navigate records lineage using SemQL. Using this navigation, you can for example access the master and the golden records consolidated from a source record, or you can access all the attached master records from a golden record.
To understand the lineage structures, refer to the Data Certification section in the Stambia MDM Integration Guide. |
Lineage Parent Records
You can access the attributes of a parent record related to your current record in the lineage.
This navigation is possible using a pseudo-role name representing the parent relation in the lineage, such as GoldenRecord
(of a master record for example), MasterRecord
or SourceRecord
.
To filter out the master records that are singletons, you can access the golden record using the GoldenRecord
pseudo-role and then use its number of masters MastersCount
:
GoldenRecord.MastersCount > 1
To get the CustomerName
consolidated in the golden record that results from a given source record:
MasterRecord.GoldenRecord.CustomerName
The following lineage navigation patterns are available:
Navigation Source | Navigation Target | Pseudo-Role Name | Description |
---|---|---|---|
Master |
Golden |
|
Golden record into which a master record consolidates. |
Source |
Master |
|
Master record corresponding to a source record. |
Source |
Golden |
|
Golden record consolidated from a source record (ID Matching entities only). |
Source Data Entry |
Master |
|
Master record corresponding to a source record being edited (Data Entry workflows only). |
Source Data Entry |
Golden |
|
Golden record corresponding to a source record being edited. (ID Matching entities in Data Entry workflows only). |
Golden Integration |
Golden |
|
Golden record validated and enriched from the golden integration record. |
Source Error |
Source |
|
Source record that raised the error. |
Lineage Child Records
You can access the attributes of a child records related to your current record in the lineage.
This navigation is possible using a pseudo-role name representing the child records in the lineage relation, such as MasterRecords
(the master records attached to a golden record) or SourceRecords
(the source records attached to a master record).
You can use the lineage child records similarly to the Child Entities records, using the SemQL any
and all
syntax.
To filter golden records that have duplicates only from the CRM publisher:
MastersCount > 1 and all MasterRecords have ( PublisherID = 'CRM' )
To filter all master records created from source records older than 5 days.
all SourceRecords have (CreationDate < SYSDATE() - 5)
The following navigation patterns are available:
Navigation Source | Navigation Target | Pseudo-Role Name | Description |
---|---|---|---|
Golden |
Master |
|
Master data consolidated in the golden record |
Golden |
Source |
|
Source records with the same ID as the golden record (ID Matching entities only). |
Master |
Source |
|
History of the source records for the master record. |
Source |
Source Errors |
|
Errors detected for a given source record. |
Special Cases
Attributes of Duplicates
Certain SemQL expressions manipulate two similar records simultaneously:
-
The SemQL condition that defines the records matching rule in a Matcher.
-
The SemQL condition used to filter duplicates in a duplicate management workflow.
For these expressions, the two similar records are identified by the
RECORD1
and RECORD2
pseudo-record prefixes.
The following condition returns the duplicates with the same InputAddress.Address (complex type) but a different CustomerName
Record1.CustomerName <> Record2.CustomerName and Record1.InputAddress.Address = Record2.InputAddress.Address
Attributes in Reference Pickers Filters
In Business Object Views, SemQL conditions can filter the records selectable with reference pickers. These conditions usually manipulate two different records.
In this context, the records are identified using pseudo-record prefixes:
-
Record
represents the record under edition. It is the referencing record. -
Referenced
represents the selectable referenced record (the one filtered in the reference picker).
The following condition is set on a reference picker that enables selecting an Account Manager (Employee) from a Customer business object view. It reduces the selectable employee records to only those in the same country as the Customer being edited.
Record.Country = Referenced.Country
See the Advanced Business Object View Configuration section in the Working with Applications chapter in the Stambia MDM Developer’s Guide for more information.
Using SemQL in Stambia MDM
This section describe the various uses of SemQL in Stambia MDM, as well as the attributes available for these uses.
Using SemQL at Design-Time
At design-time, SemQL is used:
-
In the various rules defining the Certification Process, that is the enrichers, validations, matchers and consolidators.
-
In the Applications to define the filters applied to the lists in the business objects and the form/table view attributes.
-
In the conditions used for securing data Row-Based Filtering.
SemQL in the Certification Process
The certification process takes source records pushed to the MDM Hub by identified Publishers and creates enriched, validated, matched and consolidated golden records. SemQL is involved in the various phases of this process.
Certification Process Clauses
The following table describes the expressions, condition and order by clauses used in the consolidation process.
Certification Phase | Clause Type | Description |
---|---|---|
SemQL Enricher Filter |
Condition |
This condition filters the source records of the entity that must go through this enricher. |
SemQL Enricher Expressions |
Expression |
The result of the expressions load the enriched entity’s attributes. |
Plug-in Enricher Filter |
Condition |
This condition filters the source records of the entity that must go through this enricher. |
Plug-in Enricher Inputs |
Expression |
Each expression result is pushed to a plug-in input. The outputs of the plug-in load the entity’s attributes with enriched values. |
SemQL Validation Condition |
Condition |
This condition defines which source or golden records pass the validation. |
Plug-in Validation Input |
Expression |
Each expression result is pushed to a plug-in input. The plug-in outputs a boolean defining the records that pass the validation. |
Matcher Binning Expression |
Expression |
Master records having the same value for all binning expressions are in the same bin, and will be matched using the matching condition. |
Matcher Matching Condition |
Condition |
Master records in the same bin for which the matching condition returns true are considered as matches. |
Consolidator Custom Ranking |
Order By |
Consolidation uses values from the first of the records ordered by this expression. |
Consolidator Additional Order By |
Order By |
If the consolidation strategy returns two records with the same rank, the consolidation sorts them by this expression and uses the value of the first one. |
Available Attributes by Clause
In the certification process:
-
Attributes from the entity being processed are always available.
-
Attributes from related entities (parent or child entities as well as Lineage parent and child records) are available for certain clauses, as listed in the table below.
-
If the current entity references a parent entity, the Foreign ID representing this reference may be available in certain cases, as listed in the table below.
-
Certain built-in attributes are available for the clauses.
The list of attributes depending on the clause are listed below.
Certification Phase | Built-in Attributes | Parent ID | Parent Attributes | Child Attributes |
---|---|---|---|---|
SemQL Enricher Filter |
LoadID [3], PublisherID [3], SourceID [3], OriginalBatchID [3], Status [3], MastersCount [3], IsConfirmed [3], ConfidenceScore [3], ConfirmationStatus [3], HasSuggestedMerge [3], ClassName, Creation Date, UpdateDate, Creator, Updator |
Yes |
Yes |
Yes [2] |
SemQL Enricher Expressions |
LoadID [3], PublisherID [3], SourceID [3], OriginalBatchID [3], Status [3], MastersCount [3], IsConfirmed [3], ConfidenceScore [3], ConfirmationStatus [3], HasSuggestedMerge [3], ClassName, Creation Date, UpdateDate, Creator, Updator |
Yes |
Yes |
|
Plug-in Enricher Filter |
LoadID [3], PublisherID [3], SourceID [3], OriginalBatchID [3], Status [3], MastersCount [3], IsConfirmed [3], ConfidenceScore [3], ConfirmationStatus [3], HasSuggestedMerge [3], ClassName, Creation Date, UpdateDate, Creator, Updator |
Yes |
Yes |
Yes |
Plug-in Enricher Inputs |
LoadID [3], PublisherID [3], SourceID [3], OriginalBatchID [3], Status [3], MastersCount [3], IsConfirmed [3], ConfidenceScore [3], ConfirmationStatus [3], HasSuggestedMerge [3], ClassName, Creation Date, UpdateDate, Creator, Updator |
Yes |
Yes |
|
SemQL Validation Condition |
LoadID [3], PublisherID [3], SourceID [3], OriginalBatchID [3], Status [3], MastersCount [3], IsConfirmed [3], ConfidenceScore [3], ConfirmationStatus [3], HasSuggestedMerge [3],ClassName, Creation Date, UpdateDate, Creator, Updator |
No |
Yes |
|
Plug-in Validation Input |
LoadID [3], PublisherID [3], SourceID [3], OriginalBatchID [3], Status [3], MastersCount [3], IsConfirmed [3], ConfidenceScore [3], ConfirmationStatus [3], HasSuggestedMerge [3],ClassName, Creation Date, UpdateDate, Creator, Updator |
No |
Yes |
|
Matcher Binning Expression |
BatchID, PublisherID, SourceID, ClassName, Creation Date, UpdateDate, Creator, Updator |
Yes |
No |
No |
Matcher Matching Condition |
BatchID, PublisherID, SourceID, ClassName, Creation Date, UpdateDate, Creator, Updator |
Yes |
No |
No |
Consolidator Custom Ranking |
PublisherID, SourceID, Creation Date, UpdateDate, Creator, Updator |
Yes |
No |
No |
Consolidator Additional Order By |
PublisherID, SourceID, Creation Date, UpdateDate, Creator, Updator |
Yes |
No |
No |
[1] Children records are available only in the contexts of conditions clauses expressed within the expressions or conditions.
[2] Only the children existing in the current load are visible in source validations and enrichers. All children are visible for post consolidation validations.
[3] The availability of certain attributes depend on the validation scope (pre or post consolidation)
Certification Process SemQL Example
Enricher Expressions
Examples of enricher conditions:
-
FirstName:
InitCap(FirstName)
-
Name:
InitCap(FirstName) || Upper(FirstName)
-
City:
Replace(Upper(InputAddress.City),'CEDEX','')
-
EmployeeCostCenterName :
CostCenter.CostCenterName
: Current Employee entity references the CostCenter entity and this expression returns an employee’s cost center name
Validation Conditions
Checking the Customer’s InputAddress complex attribute validity:
InputAddress.Address is not null and ( InputAddress.PostalCode is not null or InputAddress.City is not null)
In this example, the IS NOT NULL
, AND
and OR
SemQL operators are
used to build the condition.
Matcher
-
Binning Expression to grouping customers by their Country/PostalCode:
InputAddress.Country || InputAddress.PostalCode
-
Matching Condition: Matching two customer records by name, address and city name similarity:
SEM_EDIT_DISTANCE_SIMILARITY( Record1.CustomerName, Record2.CustomerName ) > 65 and SEM_EDIT_DISTANCE_SIMILARITY( Record1.InputAddress.Address, Record2.InputAddress.Address ) > 65 and SEM_EDIT_DISTANCE_SIMILARITY( Record1.InputAddress.City, Record2.InputAddress.City ) > 65
In this second example, SEM_EDIT_DISTANCE_SIMILARITY
is a SemQL
function. Record1
and Record2
are predefined names for qualifying
the two record to match.
SemQL in Privilege Grants
Entity privileges support Row-Level Filtering, to apply privileges only
to a subset of the records of the entity. The subsets are defined using
SemQL filters.
Each entity privilege grant may be associated with a filtering
Condition.
This condition has access to all attributes of the entity, its related
entities and built-in attributes ClassName, BranchID, CreationDate,
UpdateDate, Creator and Updator.
To grant a given privilege to a user which email address
(stored in a USER_EMAIL
variable) appears in the current record in the
EmailAddress attribute, the filter is EmailAddress = :USER_EMAIL
.
SemQL in Applications
Business Object Filters
Each entity object of a business object supports a Condition that
filters the records displayed in this entity object.
This condition has access to all attributes of the entity, its related
entities and built-in attributes ClassName, BranchID, CreationDate,
UpdateDate, Creator and Updator.
To filter at the root of the hierarchy of cost centers only
those with no parent, the following filter is applied:
FID_ParentCostCenter is null
. FID_ParentCostCenter is the attribute
representing the relation between a cost center and its parent cost
center.
Form/Table View Attributes
Form and table view attributes support SemQL Expressions. These expressions help building composite attribute for display purposes. In form views used for data entry, such composite attributes appear as read-only.
This expression has access to all attributes of the entity, its related entities and built-in attributes.
To display a custom attribute called ContactsStatius on a Customer form view which displays whether this customer has contacts and whether these contacts have the IsInfluencer flag set to '1':
CASE WHEN ANY Contacts HAVE (IsInfluencer = 1 ) THEN 'One or More Influencers' WHEN NOT ANY Contacts HAVE (1=1) THEN 'No Contacts' ELSE 'No Influencer Contact' END
Predefined Sort
In Business Object Views, you can defined an Expression that is used to sort the records under the node in the business object view.
Customized Display Name
The default display name defined for an entity can be customized:
-
In a business object view transition referring to this entity, to change the string displayed for this entity in the hierarchy.
-
In a form view that refers to this entity, to customize the string displayed for the entity in the form.
Both these customizations are expressed as SemQL Expressions that use the attributes of the entity.
Reference Picker Filters
In Business Object Views, SemQL conditions can filter the records selectable with reference pickers. See Reference Pickers for more information.
Search Forms
In Search Forms, you define a SemQL condition to filter records based on the search parameters values entered in the form.
This condition has access to all attributes of the searched entity, its related entities and built-in attributes. Note that certain built-in attributes should be used with cautious as some do not apply to all the views the search form may apply to.
Using SemQL at Run-Time
Filters
All date view in an application support filters in the form of
Conditions.
This condition has access to all attributes of the entity, its related
entities and built-il attributes that depend on the view that is
accessed.
For example, on a Golden Record: BatchID, BranchID,FromEdition,
ToEdition, ClassName, CreationDate, UpdateDate, Creator and Updator
are available.
Filters using lineage navigation clauses only work only if lineage is enabled for the user and application. |
Duplicates Filters
When filtering duplicates, for example to filter records to checkout in
a duplicate management activity, it is possible to use Record1
and
Record2
as in a Matcher, except that the comparison
takes place on the pairs of records that have matched.
To show matching records, but with the CustomerName fields less similar by less than 95%:
SEM_EDIT_DISTANCE_SIMILARITY(Record1.CustomerName, Record2.CustomerName) < 95
The SemQL Editor
The SemQL editor can be called from the workbench when a SemQL expression, condition or clause needs to be built.
This editor is organized as follows:
-
Attributes available for the expression appear in left panel. Double-click an attribute to add it to the expression.
-
Functions declared in SemQL appear in the left bottom panel, grouped in function groups. Double-click a function to add it to the expression.
-
Variables available for the expression appear in the bottom center panel.
-
Messages appear in the right bottom panel, showing parsing errors and warnings.
-
Description for the selected function or attribute appear at the bottom of the editor.
-
The Toolbar allows to indent the code or hide/display the various panels of the editor and to undo/redo code edits.
Appendix A: SemQL Functions List
The following table lists the built-in functions available in SemQL.
Function | Description |
---|---|
|
Returns the absolute value of number. |
|
Returns the arc cosine of number. The argument number must be in the range of –1 to 1, and the function returns a value in the range of 0 to pi, expressed in radians. |
|
Returns the date |
|
Returns the decimal representation in the database character set of the first character of string. |
|
Takes as its argument a string, or an expression that resolves to a string, in any character set and returns an ASCII version of the string in the database character set. Non-ASCII characters are converted to the form xxxx, where xxxx represents a UTF-16 code unit. |
|
Returns the arc sine of number. The argument number must be in the range of –1 to 1, and the function returns a value in the range of -pi/2 to pi/2, expressed in radians. |
|
Returns the arc tangent of number. The argument number can be in an unbounded range and the function returns a value in the range of -pi/2 to pi/2, expressed in radians. |
|
Returns the arc tangent of number1 and number2. The argument number1 can be in an unbounded range and the function returns a value in the range of -pi to pi, depending on the signs of number1 and number2, expressed in radians. ATAN2(n1,n2) is the same as ATAN2(n1/n2). |
|
Converts a bit vector to its equivalent number. Each argument to this function represents a bit in the bit vector. This function takes as arguments any numeric data type, or any non-numeric data type that can be implicitly converted to a number. Each expr must evaluate to 0 or 1. |
|
Computes an AND operation on the bits of expr1 and expr2, both of which must resolve to nonnegative integers, and returns an integer. |
|
Returns the smallest integer greater than or equal to number. |
|
Returns the character having the binary equivalent to number as a string value in the database character set. |
|
Returns the first non-null expr in the expression list. At least one expr must not be the literal NULL. If all occurrences of expr evaluate to null, then the function returns null. |
|
Takes as its argument a string, or an expression that resolves to a string, in any data type, and returns a Unicode string in its fully normalized form in the same character set as the input. |
|
Returns string1 concatenated with string2. This function is equivalent to the concatenation operator. |
|
Converts a character string from one character set to another. |
|
Returns the cosine of number (an angle expressed in radians). |
|
Returns the hyperbolic cosine of number. |
|
Returns the current date in the session time zone, in a value in the Gregorian calendar. |
|
Returns the current date and time in the session time zone. If you omit precision, then the default is 6. |
|
Returns the value of the database time zone. The return type is a time zone offset a time zone region name. |
|
Compares expr to each search value one by one. If expr is equal to a search, then it returns the corresponding result. If no match is found, then it returns default. If default is omitted, then Oracle returns null. |
|
Takes as its argument a string in any data type and returns a Unicode string after decomposition in the same character set as the input. For example, an o-umlaut code point will be returned as the |
|
Returns e raised to the number-th power. The function returns a value of the same type as the argument. |
|
Extracts and returns the day from expr. expr must be a valid ANSI date. |
|
Extracts and returns the hour from expr. expr must be a valid ANSI datetime. |
|
Extracts and returns the minute from expr. expr must be a valid ANSI datetime. |
|
Extracts and returns the month from expr. expr must be a valid ANSI date. |
|
Extracts and returns the second from expr. expr must be a valid ANSI datetime. |
|
Extracts and returns the abbreviation of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
|
Extracts and returns the hour of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
|
Extracts and returns the minute of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
|
Extracts and returns the region of the timezone from expr. expr must be a valid ANSI datetime including a timezone. |
|
Extracts and returns the year from expr. expr must be a valid ANSI date. |
|
Returns largest integer equal to or less than number. |
|
Converts a timestamp value and a time zone to a TIMESTAMP WITH TIME ZONE value. Time_zone_value is a character string in the format |
|
Returns the greatest of the list of one or more expressions. |
|
Converts string containing hexadecimal digits to a raw value. |
|
Returns string, with the first letter of each word in uppercase, all other letters in lowercase. Words are delimited by white space or characters that are not alphanumeric. |
|
Searches string for substring using the input character set. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
|
Searches string for substring using UC2 code points. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
|
Searches string for substring using UC4 code points. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
|
Searches string for substring using bytes instead of character. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
|
Searches string for substring using Unicode complete characters. The function returns an integer indicating the position of the first occurrence. Optionally, you can indicate the starting position for the search and the occurrence to search for. |
|
Returns the date of the last day of the month that contains date. |
|
Returns the least of the list of expressions. |
|
Returns the length of string. Length is calculated using characters as defined by the input character set. |
|
Returns the length of string. Length is calculated using characters as defined by the UC2 code point. |
|
Returns the length of string. Length is calculated using characters as defined by the UC4 code point. |
|
Returns the length of string. Length is calculated using bytes instead of characters. |
|
Returns the length of string. Length is calculated using Unicode complete characters. |
|
Returns the natural logarithm of number, where number is greater than 0. |
|
Returns the current date and time in the session time zone. |
|
Returns the logarithm, base number2, of number1. The base number1 can be any positive value other than 0 or 1 and number2 can be any positive value. |
|
Returns char, with all letters lowercase. |
|
Returns expr1, left-padded to length number characters with the sequence of characters in expr2. If you do not specify expr2, then the default is a single blank. |
|
Removes from the left end of string all of the characters contained in set_of_chars. If you do not specify set_of_chars, it defaults to a single blank. |
|
Returns the remainder of number2 divided by number1. Returns number2 if number1 is 0. |
|
Returns number of months between dates date1 and date2. If date1 is later than date2, then the result is positive. If date1 is earlier than date2, then the result is negative. |
|
If number1 is not a number (NaN) then NANVL returns number2. Otherwise, it returns number1. |
|
Returns the character having the binary equivalent to number as a string value in the national character set. |
|
Returns the date and time (given in timezone 1) converted in time zone timezone2. |
|
Returns the date of the first weekday named by day_name that is later than the date date. |
|
Returns a collation key for string, that is a string of bytes used to sort strings. lsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. This function manages language-specific sorting. |
|
Returns string with the first letter of each word in uppercase and all other letters in lowercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. This function manages language-specific characters case changes. |
|
Returns string with all letters in lowercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. |
|
Returns string with all letters in uppercase. nlsparam is in the form NLS_SORT=sort where sort is either BINARY or a liguistic sort sequence. |
|
Compares expr1 and expr2. If they are equal, then the function returns null. If they are not equal, then the function returns expr1. You cannot specify the literal NULL for expr1. The NULLIF function is logically equivalent to the following CASE expression: CASE WHEN expr1 = expr 2 THEN NULL ELSE expr1 END |
|
Converts number to an INTERVAL DAY TO SECOND literal. The value for interval_unit specifies the unit of number and must resolve to one of the following string values: |
|
Converts number to an INTERVAL YEAR TO MONTH literal. The value for interval_unit specifies the unit of number and must resolve to one of the following string values: |
|
If expr1 is null, then NVL returns expr2. If expr1 is not null, then NVL returns expr1. |
|
If expr1 is not null, then NVL2 returns expr2. If expr1 is null, then NVL2 returns expr3. |
|
Computes a hash value for a given expression. The expr argument determines the data for which you want to compute a hash value. The optional max_bucket argument determines the maximum bucket value returned by the hash function. You can specify any value between 0 and 4294967295. The default is 4294967295. The optional seed_value argument produces many different results for the same set of data. |
|
Returns number2 raised to the number1 power. The base number2 and the exponent number1 can be any numbers, but if number2 is negative, then number1 must be an integer. |
|
Converts raw to a character value containing its hexadecimal equivalent. |
|
Extends the functionality of the REGEXP_INSTR function by returning the number of times a pattern occurs in a source string, starting at position. match_param contains one of more of the following values: i - Case-insensitive match, c - Case-sensitive match, n - Allow |
|
Extends the functionality of the INSTR function by letting you search a string for a regular expression pattern. It returns an integer indicating the beginning or ending position of the matched substring, depending on the value of the return_option argument. If no match is found, the function returns 0. |
|
Extends the functionality of the REPLACE function by letting you search a string for a regular expression pattern. By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string. |
|
Extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar to REGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself. This function is useful if you need the contents of a match string but not its position in the source string. |
|
Returns the remainder of number2 divided by number1. |
|
Returns string with every occurrence of search_string replaced with replacement_string. If replacement_string is omitted or null, then all occurrences of search_string are removed. If search_string is null, then string is returned. |
|
Returns date_or_number rounded to the unit specified by the format model fmt_or_integer. |
|
Returns expr1, right-padded to length number characters with expr2, replicated as many times as necessary. If you do not specify expr2, then it defaults to a single blank. |
|
Removes from the right end of string all of the characters that appear in set_of_chars. If you do not specify set_of_chars, then it defaults to a single blank. |
|
Calculates the number of insertions, deletions or substitutions required to transform string1 into string2. If one or both of the strings are null the distance will be the largest integer value (2147483647). |
|
Calculates the number of insertions, deletions or substitutions required to transform string1 into string2, and returns the Normalized value of the Edit Distance between two Strings. The value is between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. |
|
Calculates the measure of agreement between two strings using Jaro-Winkler method. The value is between 0 (no match) and 1 (perfect match). If one or both strings are null the result will be 0. |
|
Calculates the measure of agreement between two strings using Jaro-Winkler method, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. |
|
Returns a string with Latin (supplement, Extended-A and Extended-B) characters converted into their ASCII equivalents, and other (space and non-alphanumeric) characters eliminated. |
|
Calculates the measure of agreement between two strings using the Dice’’s coefficient similarity measure applied to the n-grams of the strings, and returns a score between 0 (no match) and 100 (perfect match). If one or both strings are null the result will be 0. The ngrams_length parameter defines the length of the n-grams (2 by default). |
|
Returns the sign of number. The sign is: –1 if n<0, 0 if n=0, 1 if n>0. |
|
Returns the sine of number (an angle expressed in radians). |
|
Returns the hyperbolic sine of number. |
|
Returns a character string containing the phonetic representation of string. This function lets you compare words that are spelled differently, but sound alike in English. |
|
Returns the square root of number. |
|
Computes a hash value for a given expression and returns it in a RAW value. The optional method lets you choose the hash algorithm (defaults to SHA1) in the following list: SHA1, SHA256, SHA384, SHA512 and MD5. This function requires Oracle version 12c or above. |
|
Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using characters as defined by the input character set. |
|
Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using UCS2 code points. |
|
Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using UCS4 code points. |
|
Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using bytes instead of characters. |
|
Returns a portion of string, beginning at character position, substring_length characters long. If position is 0, then it is treated as 1. If position is positive, then the function counts from the beginning of string to find the first character. If position is negative, then the function counts backward from the end of string. If substring_length is omitted, then the function returns all characters to the end of string. Length is calculated using Unicode complete characters. |
|
Returns the current date and time set for the operating system on which the database resides. |
|
Returns the system date, including fractional seconds and time zone, of the system on which the database resides. |
|
Returns the value of parameter associated with the context namespace. |
|
Extracts the UTC (Coordinated Universal Time—formerly Greenwich Mean Time) from a datetime value with time zone offset or time zone region name. |
|
Generates and returns a globally unique identifier (RAW value) made up of 16 bytes. |
|
Returns the tangent of number (an angle expressed in radians). |
|
Returns the hyperbolic tangent of number. |
|
Returns a double-precision floating-point number. |
|
Returns a single-precision floating-point number. |
|
Converts expr to its string representation optionally using fmt and nlsparam for the conversion. |
|
Converts expr to a CLOB (large string) |
|
Converts string to a date value. The fmt is a datetime model format specifying the format of string. If you omit fmt, then string must be in the default date format. If fmt is J, for Julian, then string must be an integer. |
|
Converts a character string to an INTERVAL DAY TO SECOND value. |
|
Returns string with all of its single-byte characters converted to their corresponding multibyte characters. |
|
Converts expr to a number value using the optional format model fmt and nlsparam. |
|
Returns string with all of its multibyte characters converted to their corresponding single-byte characters. |
|
Converts string a timestamp value. The optional fmt specifies the format of string. |
|
Converts string to a TIMESTAMP WITH TIME ZONE value. The optional fmt specifies the format of string. |
|
Converts string to an INTERVAL YEAR TO MONTH type. |
|
Returns expr with all occurrences of each character in from_string replaced by its corresponding character in to_string. Characters in expr that are not in from_string are not replaced. The argument from_string can contain more characters than to_string. In this case, the extra characters at the end of from_string have no corresponding characters in to_string. If these extra characters appear in string, then they are removed from the return value. |
|
Removes from the left and right ends of string all of the characters contained in set_of_chars. If you do not specify set_of_chars, it defaults to a single blank. |
|
When expr is a date, returns expr with the time portion of the day truncated to the unit specified by the format model fmt_or_number. If you omit fmt_or_number, then date is truncated to the nearest day. When expr is a number, returns expr truncated to fmt_or_number decimal places. If fmt_or_number is omitted, then expr is truncated to 0 places. |
|
Takes as its argument a text literal or an expression that resolves to character data and returns it in the national character set. The national character set of the database can be either AL16UTF16 or UTF8. UNISTR provides support for Unicode string literals by letting you specify the Unicode encoding value of characters in the string. |
|
Returns string with all letters uppercase. |
|
Lets you construct equiwidth histograms, in which the histogram range is divided into intervals that have identical size. For a given expression, the function returns the bucket number into which the value of this expression would fall after being evaluated. Expr must evaluate to a numeric or datetime. Min_value and max_value are expressions that resolve to the end points of the acceptable range for expr. Both of these expressions must also evaluate to numeric or datetime values, and neither can evaluate to null. |