Welcome to Stambia MDM.
This guide contains information about administering and monitoring Stambia MDM.

Preface

Audience

This document is intended for administrators managing and configuring Stambia MDM in an Enterprise Master Data Management Initiative.

If you want to learn about MDM or discover Stambia MDM, you can watch our tutorials.
The Stambia MDM Documentation Library, including the development, administration and installation guides, is available online.

Document Conventions

This document uses the following formatting conventions:

Convention   Meaning

boldface     Boldface type indicates graphical user interface elements associated with an action, or a product-specific term or concept.

italic       Italic type indicates special emphasis or a placeholder variable that you need to provide.

monospace    Monospace type indicates code examples, text or commands that you enter.

Other Stambia Resources

In addition to the product manuals, Stambia provides other resources available on its web site: http://www.stambia.com.

Obtaining Help

There are many ways to access the Stambia Technical Support. You can call or email our global Technical Support Center ([email protected]). For more information, see http://www.stambia.com.

Feedback

We welcome your comments and suggestions on the quality and usefulness of this documentation.
If you find any error or have any suggestion for improvement, please mail [email protected] and indicate the title of the documentation along with the chapter, section, and page number, if available. Please let us know if you want a reply.

Overview

Using this guide, you will:

  • Understand the Stambia MDM architecture and components.

  • Learn how to manage the various components of the architecture.

  • Learn how to manage run-time and troubleshoot errors.

  • Learn how to enable and manage a secure environment for Stambia MDM.

Introduction to Stambia MDM

What is Stambia MDM?

Stambia MDM is designed to support any kind of Enterprise Master Data Management initiative. It brings extreme flexibility for defining and implementing master data models and releasing them to production. The platform can be used as the target deployment point for all master data of your enterprise, or in conjunction with existing data hubs to contribute to data transparency and quality with federated governance processes. Its powerful and intuitive environment covers all use cases for setting up a successful master data governance strategy.

Stambia MDM is based on a coherent set of features for all Master Data Management projects.

The Stambia MDM Pulse component enables business users to collect metrics and measure - with dashboards and KPIs - the health of their Stambia MDM Hub.

Architecture Overview

The Stambia MDM architecture includes:

  • The Stambia MDM Application: This JEE application is deployed and runs in an application server, and stores its information in a repository.

  • The Repository that stores all the MDM projects metadata and execution logs. This repository is hosted in a schema within an Oracle Database instance. A given Stambia MDM Application is attached to a single repository, and users connecting to this application access only the content of this repository.

  • The Data Locations. Each data location contains a Stambia MDM hub. This hub contains the golden data and master data, the source data pushed by publishers in the hub, and the various stages of this data in the certification process. A data location is hosted in a schema of an Oracle instance. Several data locations can be attached to a single repository.

  • The Stambia MDM Workbench: This web 2.0 application is served from the Stambia MDM Application and runs in a web browser.

  • The Pulse Metrics Warehouse. This database schema stores the metrics gathered from the Stambia MDM Repository and Data Locations. A Pulse Metrics Warehouse is associated with a single Repository.

Architecture

This section details the various components of the Stambia MDM architecture and their interactions.

Stambia MDM Application

The Stambia MDM application is a Java EE application deployed and running in a supported application server.

This application provides several access methods:

  • Users access it via their web browser using the Stambia MDM Workbench user interface.

  • Applications access the platform services and MDM hub data via Web Services or APIs.

The Stambia MDM application stores its information in a repository. One application is always attached to a single repository, and connects to this repository using a JDBC datasource named SEMARCHY_REPOSITORY. This datasource is configured in the application server.

The Stambia MDM application is used at design-time to design and version control data models. At run-time, it is used to deploy the models in the form of MDM hubs and to manage the integration process that certifies golden data from source applications’ data in these hubs. It also exposes the Web Services used to access the golden data from the hubs.

Integration Process

Stambia MDM certifies golden data from data pushed by source applications (the Publishers), through an integration process.

The integration process is triggered as explained below:

  1. Publishers submit source data to the MDM hub as an External Load. Publishing the source data is a three-step operation:

    1. The publisher initializes (via a SQL command or a web service call) an External Load, and receives a unique Load ID from the platform.

    2. The publisher inserts source data in various landing tables of the MDM hub in the context of this external load (via SQL commands or web service calls).

    3. The publisher submits the external load (via a SQL command or a web service call), which is converted to an Integration Batch, identified by a unique Batch ID.

  2. The Integration Batch Poller polls integration batches at regular intervals. When a new batch is detected, the integration batch poller requests the Execution Engine to start the Integration Job associated with this batch. This Integration Job is created from a Job Definition. Data in the batch is passed through the different steps of the integration process, and golden data is certified from the source data.

The various steps of the integration process are detailed in the Integration Process Design chapter of the Stambia MDM Developer’s Guide.

The integration process involves the following components:

  • The Integration Batch Poller that polls new data batches submitted in the hub by the publishing applications.

  • The Execution Engine that orchestrates the certification process to generate the golden data.
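As an illustration only, the publishing steps and the polling described above can be modeled in a few lines of Python. This is a toy in-memory sketch, not the Stambia MDM SQL or web service API: the Hub class, its method names and the poll_once function are hypothetical names chosen for this example, and the real platform persists loads and batches in the database.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Hub:
    """Toy in-memory stand-in for an MDM hub (hypothetical, not the product API)."""
    _load_ids: count = field(default_factory=lambda: count(1))
    _batch_ids: count = field(default_factory=lambda: count(1))
    open_loads: dict = field(default_factory=dict)        # load_id -> landed rows
    pending_batches: list = field(default_factory=list)   # (batch_id, rows)

    def initialize_load(self) -> int:
        """Step 1: open an external load and return its unique Load ID."""
        load_id = next(self._load_ids)
        self.open_loads[load_id] = []
        return load_id

    def insert_source_data(self, load_id: int, table: str, row: dict) -> None:
        """Step 2: land a source row in the context of the external load."""
        self.open_loads[load_id].append((table, row))

    def submit_load(self, load_id: int) -> int:
        """Step 3: convert the external load into an integration batch."""
        rows = self.open_loads.pop(load_id)
        batch_id = next(self._batch_ids)
        self.pending_batches.append((batch_id, rows))
        return batch_id


def poll_once(hub: Hub, start_job) -> list:
    """One poller pass: hand every pending batch to the engine callback."""
    started = []
    while hub.pending_batches:
        batch_id, rows = hub.pending_batches.pop(0)
        start_job(batch_id, rows)  # stands in for the Execution Engine
        started.append(batch_id)
    return started
```

In this sketch the publisher never talks to the engine directly. It only receives a Load ID and later a Batch ID, and the poller is the only component that starts jobs, which mirrors the separation of roles described in this section.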

Repository

The repository contains the design-time and run-time information for a given Stambia MDM Application instance.

Repository Contents

The repository stores the following information:

  • The models: entities, attributes, etc.

  • The version control information: branches, editions, etc.

  • The configuration & security information: roles, privileges, notification servers, notification policies, preferences, etc.

  • Data locations and data editions: branches, deployed model and data editions, job definitions, etc.

  • Run-time information: queues, logs

A repository is stored in an Oracle database schema accessed from the application using a JDBC datasource named SEMARCHY_REPOSITORY.

The repository should never be accessed directly via SQL queries. Access to the Stambia MDM information must be performed through the Stambia MDM Workbench user interface provided by the application.

Repository Types

There are two types of repositories:

  • Design: All design-time and run-time operations are possible in this type of repository.

  • Deployment: With this repository type, you can only import closed model editions and cannot edit them.

The deployment repositories are suitable for production sites. Model transfer from design to deployment repositories is handled via incremental export/import of closed model editions. Refer to the Planning the Installation chapter in the Stambia MDM Installation Guide for examples of repository deployment patterns.

The repository type is selected at creation time and cannot be modified afterwards.

Data Locations

When an MDM hub must be available for run-time (for testing or production purposes), it is generated from a data model defined in the repository, and deployed in a data location. Data Locations contain the deployed hubs. Each hub contains the golden data, the source data pushed by publishers in the hub, and the various stages of this data in the certification process.

The data location content is hosted in an Oracle database schema and accessed via a JDBC datasource defined in the application server. A data location refers to the datasource via its JNDI URL.

Data Locations, Repositories and Models

A data location is attached to a repository: You can declare as many data locations as you want in a repository, but a data location is always attached to a single repository. It is not possible to have a data location attached to two repositories at the same time.

A data location can contain several editions of a single model from this repository. It is not possible to store two different models in the same data location.

Data Location Contents

A Data Location stores several Deployed Model Editions of the same model and several Data Editions:

  • A Deployed Model Edition is a model version deployed at a given time in a data location. As an MDM model evolves over time, for example to include new entities or functional areas, new model editions are created then deployed. Deployed Model Editions reflect this evolution in the structure of the MDM Hub.

  • Similarly, a Data Edition reflects the evolution of the data stored in the hub over time. You can perform snapshots (editions) of the master data at given points in time. Data Editions reflect the evolution in the content of the MDM Hub. A Data Edition is always attached to a given Deployed Model Edition.

Data Location Types

There are two types of data locations:

  • Development Data Locations: A data location of this type supports installing closed model editions (closed to changes) and installing/updating open model editions (subject to changes). This type of data location is suitable for testing models in development and quality assurance environments.

  • Production Data Location: A data location of this type supports only installing closed model editions. Updating is not allowed, and deploying an open model edition is not allowed. This type of data location is suitable for deploying MDM hubs in production environments.

The type is selected when the data location is created and cannot be changed afterwards.

Refer to the Planning the Installation chapter in the Stambia MDM Installation Guide for examples of data location deployment patterns.
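The rules for the two data location types can be summarized as a small decision function. The following Python sketch only restates the rules given above. The LocationType enum and the function names are hypothetical and do not correspond to any Stambia MDM API.

```python
from enum import Enum


class LocationType(Enum):
    DEVELOPMENT = "development"
    PRODUCTION = "production"


def can_install(location_type: LocationType, edition_is_closed: bool) -> bool:
    """Closed editions install anywhere; open editions only in development."""
    return edition_is_closed or location_type is LocationType.DEVELOPMENT


def can_update(location_type: LocationType, edition_is_closed: bool) -> bool:
    """Update re-deploys an open edition, and only in a development location."""
    return location_type is LocationType.DEVELOPMENT and not edition_is_closed
```

For example, can_install(LocationType.PRODUCTION, False) is False, because a production data location rejects open model editions.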

Pulse Metrics Warehouse

Stambia MDM Pulse enables business users to measure, with dashboards and KPIs, the health of their Stambia MDM Hub using the Pulse Metrics component.

The Pulse Metrics Warehouse is a data schema that stores the metrics gathered from the Stambia MDM Repository and Data Locations. A Pulse Metrics Warehouse is associated with a single Repository.

The Stambia MDM Application gathers the metrics from the Repository and its related Data Locations and loads the Pulse Metrics Warehouse. This collection of metrics is done by the Execution Engine on a customizable schedule.

Data Structures and Integration Processes

A deployed model edition is made of a Data Structure and an Integration Process.

  • The Data Structure of the MDM Hub is a set of tables stored in the database schema of the data location. This structure contains the landing tables for the loads pushed in the hub by the publishers, the golden records tables and the tables handled by the integration process to create golden records from the source records. The Data Structure is implemented to support all the successive editions of a given model deployed in the hub.

  • The Integration Process is a sequence of tasks stored in the repository. This integration process is attached to a given deployed model edition. When a model edition is deployed or updated, the integration process definition for this model edition is created or updated.

Data Structure Details

The data structure is implemented to support all the editions of the model deployed in the hub. It is created when the first model edition is deployed, and is changed when new model editions are deployed. Changes to the structure are incremental, and the same set of tables holds the data for the various data editions and deployed model editions.

For example, the GD_CUSTOMER table holds the data for all the data editions and all the editions of the Customer entity. If a new FaxNumber attribute is added to the entity and deployed in the model edition 1.1, a new FAX_NUMBER column is created and is taken into account in the data editions using model edition 1.1 and above. If the attribute Telex Number is removed in the model edition 1.2, the TELEX_NUMBER column for this attribute remains in the data structures. Data editions using model editions prior to 1.2 still use this column, but it is no longer used by data editions using the model editions 1.2 and above.
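The incremental behavior in this example can be sketched as follows. This is a simplified Python model written for this guide, not the way the platform stores its metadata: the Column type and both functions are hypothetical, and the edition in which the telex column first appeared (1.0) is an assumption made for the example.

```python
from typing import List, NamedTuple, Optional, Tuple

Edition = Tuple[int, int]  # (branch, edition), e.g. (1, 1) for 1.1


class Column(NamedTuple):
    name: str
    added_in: Edition
    removed_in: Optional[Edition] = None  # None: still part of the model


# Columns are only ever added to the table, never physically dropped.
GD_CUSTOMER = [
    Column("TELEX_NUMBER", added_in=(1, 0), removed_in=(1, 2)),  # 1.0 assumed
    Column("FAX_NUMBER", added_in=(1, 1)),
]


def physical_columns(table: List[Column]) -> List[str]:
    """Every column that exists in the database table."""
    return [c.name for c in table]


def columns_used(table: List[Column], model_edition: Edition) -> List[str]:
    """Columns taken into account by data editions on this model edition."""
    return [
        c.name
        for c in table
        if c.added_in <= model_edition
        and (c.removed_in is None or model_edition < c.removed_in)
    ]
```

With these definitions, columns_used(GD_CUSTOMER, (1, 2)) leaves out TELEX_NUMBER while physical_columns(GD_CUSTOMER) still lists it, which is the situation described above.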

The data structure is also implemented to reduce the storage of the various data editions. Data duplication is avoided as much as possible across data editions.

For example, if a golden record exists and remains unchanged for 5 successive editions, it exists only once in the data structures, and is flagged as existing in the 5 editions. If it is changed in the next edition, then new data is added to store this change while preserving the previous data editions’ content.
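One possible way to picture this storage principle is a store that keeps a single row per unchanged record and tracks the range of editions in which the row is valid. The Python sketch below uses an edition range instead of per-edition flags, which is a simplification chosen for the example. It is not a description of the actual physical tables, and all class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredVersion:
    record_id: str
    payload: dict
    from_edition: int
    to_edition: Optional[int] = None  # None: still valid in the open edition


class EditionStore:
    """Keeps one row per distinct state of a record across data editions."""

    def __init__(self) -> None:
        self.rows: List[StoredVersion] = []

    def _current(self, record_id: str) -> Optional[StoredVersion]:
        for row in self.rows:
            if row.record_id == record_id and row.to_edition is None:
                return row
        return None

    def write(self, record_id: str, payload: dict, edition: int) -> None:
        current = self._current(record_id)
        if current is not None:
            if current.payload == payload:
                return  # unchanged: the existing row also covers this edition
            current.to_edition = edition - 1  # preserve earlier editions
        self.rows.append(StoredVersion(record_id, payload, edition))

    def read(self, record_id: str, edition: int) -> Optional[dict]:
        for row in self.rows:
            if (row.record_id == record_id
                    and row.from_edition <= edition
                    and (row.to_edition is None or edition <= row.to_edition)):
                return row.payload
        return None
```

Writing the same payload for editions 0 to 4 leaves a single row in the store. A different payload written for edition 5 adds a second row, and reads against editions 0 to 4 still return the original content.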

Platform Components

The Stambia MDM platform contains several components described in the sections below.

Integration Batch Poller

The Integration Batch Poller polls the integration batches submitted to the platform on a defined schedule, and starts the Integration Jobs on the Execution Engine.

Execution Engine

The Execution Engine processes the Integration Jobs submitted by the Integration Batch Poller. It orchestrates the certification process to generate the golden data. This engine sequences the jobs in Clusters and Queues.

The engine can use user-created Plug-ins developed using the Stambia MDM Open Plug-in API. For more information about plug-in development, see the Stambia MDM Plug-in Development Guide.

The execution engine logs the activity of the platform and manages notification policies.

  • Job Logs are stored in the repository and trace the execution of the jobs submitted to the engine. These logs include full job description and statistics.

  • Job Notification Policies are configured per data location. These policies define the conditions upon which job notifications are issued, as well as the content of these notifications. These notifications use Notification Servers declared in the platform.

  • The Execution Console displays detailed execution activity, and can be used for troubleshooting purposes, for example when restarting a job.

  • Logging (trace) can be configured for debugging the platform behavior.

The execution engine is also in charge of collecting the metrics from the repository and various data locations to load the Pulse Metrics Warehouse.

Web Services

There are several types of web services available in the platform:

  • The Platform Services, which include:

    • The Platform Status Web Service that provides access to the platform status.

    • The Integration Load Web Service that allows loading data into the hub.

    • The Administration Service that exposes administrative features such as purges or data edition management.

    • The Metadata Web Service that provides read access to the model metadata. This web service is used by the Client API.

  • The Data Services that provide access to the data editions known to the application instance. These data services are generated per data edition, and their capabilities depend on the underlying deployed model edition. The data services for each data edition include:

    • The Data Access service, named after the model, that provides read access to the various views storing golden data, master data, errors detected, etc. The structure of this service depends on the model structure.

    • The Data Edition Integration Service that allows loading data in a given open data edition of the hub.

    • The Activity Service that allows SOA applications to manage instances of workflows defined in applications.

    • The Generic Data Service that allows applications to interact (read/write) with data in the hub in a generic way. Unlike the Data Access service, this web service exposes generic (model-independent) structures.

Security

The application uses role-based security for accessing Stambia MDM features. The users and roles used to connect to the application must be defined in the security realm as part of the application server configuration and then declared in Stambia MDM.

Role-based security is used in Stambia MDM to define the access privileges to the features of the platform (Platform-Level Security), as well as the privileges to access and modify data in the data editions (Model-Level Security).

Introduction to the Administration Perspectives

For an introduction to the Stambia MDM Workbench user interface, see the Introduction to the Stambia MDM Workbench chapter in the Stambia MDM Developer’s Guide.

The Stambia MDM Workbench provides three perspectives for administrators:

  • Administration Console: this perspective is used to administer the platform components and monitor run-time activity.

  • Model Administration: this perspective is used to manage model editions and branches.

  • Data Locations: this perspective is used to create data locations and manage the model and data editions in these locations.

Administration Console

In the Administration Console perspective, you can view and administer the following components:

  • Execution Engine: Start and stop the engine and manage the jobs, queues and clusters.

  • Integration Batch Poller: Start, stop and configure the behavior of this component.

  • Purge Scheduler: Schedule data location purges to prune the history of data changes according to the Data Retention Policies defined in the models.

  • Notification Servers: Add, remove and configure servers used to send job notifications and workflow emails.

  • Web Services Manager: Start/stop web services and their auto-start configuration.

  • Pulse Configuration: Reset or install the Pulse Metrics Warehouse, schedule or start metrics collection.

  • Plug-ins: View, add or update user-created plug-ins.

  • Executions: View the job log as well as the job definitions.

  • Logging Configuration: Configure the platform logging (trace) for debugging purposes.

  • Variable Value Providers: Configure the system queried by Stambia MDM to retrieve values for model variables.

  • Applications Configuration: Global parameters for all applications.

  • Roles: Declare the application server roles in Stambia MDM, and grant them platform-level privileges.

Model Administration

In the Model Administration perspective, you can manage the versions (editions) of the models in design-time as well as the model branches. Using this perspective, you can create and maintain several simultaneous branches of a model, and let developers work on these branches.

Data Locations

In the Data Locations perspective, you can manage the data locations, including:

  • Data Locations creation and deletion.

  • Deployed Model Editions: Install or update model editions in a data location, and view the integration job definitions attached to the deployed model editions.

  • Data Editions: Manage the various data editions in the data location. Create new editions and close old editions. Review the external loads submitted and integration processes executed for the data editions.

  • Job Notification Policies: Configure the job notifications issued on job success or failure.

Managing Repositories

The repository contains the design-time and run-time information for the Stambia MDM application.

Understanding Repositories

The repository is created when Stambia MDM is installed. The type of the repository (Design Repository or Deployment Repository) is also set at creation time, and the application always connects to a single repository.

The repository creation process is detailed in the Stambia MDM Installation Guide.

The type of a repository defines the capabilities of this repository:

  • A Design Repository allows you to perform all design-time and run-time operations.

  • A Deployment Repository only allows run-time operations. You can import closed model editions in such a repository but cannot edit them.

Typical Patterns for repository deployment are detailed in the Planning the Installation chapter of the Stambia MDM Installation Guide. Simple and advanced deployment tasks are explained in the Deployment chapter of the Stambia MDM Developer’s Guide.

Repository Administration Tasks

Purging Logs

Both the design and deployment repositories contain the execution logs of the integration jobs. These logs should be deleted regularly to reduce the repository space in the database.

See the Purging the Logs section in this guide for a description of this task.

Viewing the Repository and System Information

Stambia MDM exposes the repository and system details in the About dialog.

To view the repository and system details:

  1. In the Stambia MDM Workbench menu, select Help > About.

  2. In the About dialog:

    • The License Information link displays the current license information and allows for Updating the License Key.

    • The Repository Information link displays the repository details (including name and version).

    • The System Information link displays the platform system details and may be used for support purposes.

Updating the License Key

Stambia MDM stores in the repository the license information and the license key provided to you for evaluation or when you purchased the product. You can update an expired license key with a new one using the following procedure.

You must be logged in as a user having the semarchyAdmin role to perform license key update tasks. Without this privilege, you are only able to view the license key.

To update the license key:

  1. In the Stambia MDM Workbench menu, select Help > About.

  2. In the About dialog select the License Information link.

  3. In the License Key Information dialog, click the Upload License Key File… button.

  4. Use the Browse button to select the license key file.

  5. If the selected license key is recognized as valid, you can click the OK button to register the license key in the repository.

A temporary license key must be updated when it expires. When such a license key expires, the repository content is preserved as is, but the application is no longer accessible and a popup window prompts you for a new license key when you log in.

Managing Model and Data Editions

This chapter discusses administration considerations related to Model and Data Editions management.

Understanding Model and Data Editions

Stambia MDM manages two flows of changes:

  • Model Changes are handled using Model Editions. This version control mechanism allows you to freeze versions of a model (called Model Editions) then deploy them for data loading and integration processing in a Data Location.

  • Data Changes are handled using Data Editions. This data version control mechanism allows you to freeze versions of the data (called Data Editions) at any point of time.

A data edition is always based on a given model edition. This means that this data edition contains data organized according to the model structure in the given model edition, and that golden data in this data edition is processed and certified according to the rules of the model in the given model edition.

These two version control mechanisms can be used simultaneously and in parallel threads.

Version Numbers

Model and data editions are identified by a version number. This version number format is <branch>.<edition>. The branch and edition numbers start at zero and are automatically incremented as you create new branches or editions.
For example, the first model edition in the first branch has the version [0.0]. The fourth edition of the CustomerAndFinancialMDM model in the second branch is named CustomerAndFinancialMDM [1.3].
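Because both numbers start at zero, the Nth edition of the Mth branch carries the version (M-1).(N-1). The small Python helpers below illustrate this arithmetic. They are written for this guide only, and their names are hypothetical.

```python
from typing import Tuple


def edition_label(model: str, branch_ordinal: int, edition_ordinal: int) -> str:
    """Label of the Nth edition in the Mth branch (ordinals are 1-based)."""
    if branch_ordinal < 1 or edition_ordinal < 1:
        raise ValueError("ordinals start at 1")
    return f"{model} [{branch_ordinal - 1}.{edition_ordinal - 1}]"


def parse_version(version: str) -> Tuple[int, int]:
    """Split '<branch>.<edition>' into zero-based integers."""
    branch, edition = version.split(".")
    return int(branch), int(edition)
```

For instance, edition_label("CustomerAndFinancialMDM", 2, 4) returns "CustomerAndFinancialMDM [1.3]", matching the example above.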

Model Editions

A Model Edition reflects the state of the model at a given point in time.

Actions on Model Editions

Model Editions support the following actions:

  • Creating a New Model creates the first edition of the model.

  • Closing and Creating a New Edition of the model freezes the model edition in its current state, and opens a new edition of the model for modification.

  • Branching, to maintain several parallel branches of the model. You create a branch based on an existing closed model edition when you want to fork the project from this edition, or create a maintenance branch.

  • Deployment, to install a model edition in a data location. You can deploy several editions of the same model within the same data location.

  • Update, to re-deploy an open model edition on top of a previously deployed edition. This update process allows you to iteratively deploy a model and its integration process, then test it without creating successive editions of this model. This update option is only possible on development data locations.

  • Export and Import model editions, to transfer them between repositories.

Refer to the following chapters for more information about model editions management tasks:

  • Models Management chapter in the Stambia MDM Developer’s Guide.

  • Deployment chapter in the Stambia MDM Developer’s Guide.

Model Editions Lifecycle

The model edition lifecycle is described below.

  1. The project manager creates a new model and the first model edition.

  2. Developers edit the model metadata. They perform their logical modeling and integration process design activities.

  3. When the developers reach a level of completion in the project, they deploy the model edition for testing, and afterwards update the model edition while pursuing their developments and tests. Such actions are typically performed in a development data location. Sample data can be submitted to the data location for integration in the hub.

  4. When the first project milestone is reached, the project manager:

    1. Closes the current model edition and creates a new one.

    2. Deploys the closed model edition or exports the model edition for deployment on a remote repository.

  5. The project can proceed to the next iteration (go to step 2).

  6. When needed, the project manager creates a new branch starting from a closed edition. This may be needed, for example, when a feature or fix needs to be backported to a closed edition without taking all the changes done on later editions.

Considerations for Model Editions Management

The following points should be taken into account when managing the model editions lifecycle:

  • No Model Edition Deletion: It is not possible to delete old model editions. The entire history of the project is always preserved.

  • Update in Design-Time Only: Although update is a useful feature in development for quickly updating a model edition, it is not recommended to perform updates on data locations that host production data, and it is not recommended to use development data locations for production. The best practice is to have Production Data Locations that only allow deploying closed model editions for production data.

  • Import/Export for Remote Deployment: It is possible to export and import models from both deployment and design repositories. Importing a model edition is possible in a Deployment Repository if this edition is closed.

  • Avoid Skipping Editions: When importing successive model editions, it is not recommended to skip intermediate editions, as it is not possible to import them at a later time. For example, if importing edition 0.1 of a model, then importing edition 0.4, the intermediate editions (0.2 and 0.3) can no longer be imported in this repository.
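Before importing a model edition, it can help to check which intermediate editions the import would lock out. The following Python sketch expresses the rule from the last point above. The function name is hypothetical, editions are written as (branch, edition) pairs, and the case where no edition of the branch has been imported yet is treated as having nothing to skip, which is an assumption since this guide does not cover it.

```python
from typing import List, Tuple

Edition = Tuple[int, int]  # (branch, edition)


def skipped_editions(imported: List[Edition], incoming: Edition) -> List[Edition]:
    """Editions that could no longer be imported once `incoming` is imported."""
    branch, edition = incoming
    same_branch = [e for b, e in imported if b == branch]
    if not same_branch:
        return []  # assumption: nothing imported yet, nothing to skip
    latest = max(same_branch)
    return [(branch, e) for e in range(latest + 1, edition)]
```

With edition 0.1 already imported, skipped_editions([(0, 1)], (0, 4)) returns [(0, 2), (0, 3)], the two editions named in the example above.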

Data Editions

A Data Edition reflects the state of the data in a data location at a given point in time.

Actions on Data Editions

Data Editions support the following actions:

  • Creating a Root Branch creates the first edition of the data, based on a given deployed model edition.

  • Closing and Creating a New Edition of the data freezes the edition in its current state, and opens a new edition of the data for modification, on the same or a different deployed model edition.

  • Switching Model Edition changes the deployed model edition supporting a given data edition without closing the data edition.

  • Data Editions can be moved to a Maintenance status. This status prevents new external loads from being submitted in this data edition. This mode can be used before closing a data edition, or when switching it to a different deployed model edition.

Refer to the following chapters and guides for more information on data editions management:

  • Models Management chapter in the Stambia MDM Developer’s Guide.

  • Deployment chapter in the Stambia MDM Developer’s Guide.

Data Editions Lifecycle

A typical Data Edition lifecycle (in the context of a Data Location) is described below.

  1. The administrator creates the data location based on a given model.

  2. The project manager installs a first model edition (as described in the model editions lifecycle).

  3. The administrator creates a root branch and a first data edition. Integration batches now target this data edition.

  4. The administrator manages data editions:

    • When there is a data milestone, the administrator closes and opens a new data edition, based on the same deployed model edition.

    • When a new model edition is deployed by the project manager (as described in the model editions lifecycle), the administrator closes and opens a new data edition, based on this newly deployed model edition, or switches the model edition of the same data edition.

In production, after deploying a new model edition, remember to create a new data edition based on this model edition or switch the existing data edition to the new model edition. Otherwise, the data edition in place still uses its old model edition. In development environments, when you use the update capability, the existing model edition is overwritten, and as a consequence, the open data edition automatically benefits from the deployed updates.
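The point made above, that deploying a model edition does not by itself change the model edition used by the open data edition, can be illustrated with a toy model. The DataLocation class below is hypothetical and keeps a single open data edition for simplicity. It does not reflect the platform's internal structures.

```python
from typing import List, Optional


class DataLocation:
    """Toy model of a data location with one open data edition."""

    def __init__(self) -> None:
        self.deployed_model_editions: List[str] = []
        self.open_data_edition_model: Optional[str] = None

    def _require_deployed(self, model_edition: str) -> None:
        if model_edition not in self.deployed_model_editions:
            raise ValueError(f"model edition {model_edition} is not deployed")

    def deploy(self, model_edition: str) -> None:
        """Deploying adds a model edition; data editions are left untouched."""
        self.deployed_model_editions.append(model_edition)

    def open_data_edition(self, model_edition: str) -> None:
        """Open a data edition based on a deployed model edition."""
        self._require_deployed(model_edition)
        self.open_data_edition_model = model_edition

    def switch_model_edition(self, model_edition: str) -> None:
        """Re-point the open data edition without closing it."""
        self._require_deployed(model_edition)
        self.open_data_edition_model = model_edition
```

After deploy("0.1") on a location whose open data edition is based on 0.0, open_data_edition_model is still "0.0" until switch_model_edition("0.1") or open_data_edition("0.1") is called.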

Considerations for Data Editions Management

The following points should be taken into account when managing the data editions lifecycle:

  • Closed Data Editions are Read-only: When closing a data edition, make sure that no further changes are required on the data. It is better to first move the data edition to a Maintenance state, as this state can be reverted.

  • No Data Edition Deletion: It is not possible to delete old data editions. The entire history of the hub is always preserved.

Defining the Version Control Strategy

The model and data edition feature is a framework supporting the organization of your MDM project.

It is important to perform version control planning according to your needs. The following questions should help this planning exercise:

  • When will model editions be created? By whom?

  • When (at which frequency) will data editions be created? By whom? What are the data milestones?

For example, you may decide as part of the version control plan that:

  • For the Customer hub project:

    • An updated version of the Customer hub is scheduled every 6 months for the next 2 years, and model editions must be created when these milestones are reached.

    • At every milestone, a maintenance branch must be created to maintain the released version.

    • The data in the Customer hub can be version controlled once a year, and a yearly data edition is sufficient.

  • For the Product hub project:

    • The data in the Product hub needs to be version controlled every quarter when the product catalog is released to the public, regardless of the project’s milestones.

Managing the Platform

The platform consists of several components that can be managed from the Administration Console perspective.
These components include the Engine, the Integration Batch Poller, the Notification Servers and Notification Policies, the Plug-ins and the Web Services.

Managing the Execution Engine

Accessing the Execution Engine

To access the execution engine:

  1. In the Administration view, double-click the Execution Engine node.
    The Execution Engine editor opens.

The Execution Engine Editor

This editor displays the list of queues grouped by clusters. Queues currently pending on suspended jobs appear in red.

The list of queues and clusters displays the following information:

  • Cluster/Queue Name: the name of the cluster or queue.

  • Status: Status of the queue or cluster. A queue can be either READY, SUSPENDED or BLOCKED. A cluster may be in a BLOCKED or READY status.

  • Queued Jobs: For a queue, the number of jobs queued in this queue. For a cluster, the number of jobs queued in all the queues of this cluster.

  • Running Jobs: For a queue, the number of jobs running in this queue (1 or 0). For a cluster, the number of jobs running in all the queues of this cluster.

  • Suspend on Error: Defines the behavior of the queue on job error. See the Troubleshooting Errors section for more information.

From the Execution Engine editor, you can perform the following operations:

Stopping and Starting the Execution Engine

To stop and start the execution engine:

  1. In the Administration view, double-click the Execution Engine node. The Execution Engine editor opens.

  2. Use the Stop this component and Start this component buttons in the editor’s toolbar to stop and start the execution engine.

Stopping the execution engine does not kill running jobs. The engine stops after all running jobs are completed. In addition, the content of the queues is persisted. When the execution engine is restarted, the execution of queued jobs proceeds normally.

Managing the Integration Batch Poller

The Integration Batch Poller polls the integration batches submitted to the platform, and starts the integration jobs on the execution engine. The polling action is performed on a schedule configured in the batch poller.

Stopping and Starting the Integration Batch Poller

To stop and start the integration batch poller:

  1. In the Administration view, double-click the Integration Batch Poller node. The Integration Batch Poller editor opens.

  2. Use the Stop this component and Start this component buttons in the editor’s toolbar to stop and start the integration batch poller.

Stopping the batch poller does not kill running jobs, and does not prevent new batches from being submitted. When this component is stopped, the submitted batches are simply not taken into account and no job is queued on the execution engine until the batch poller is restarted.
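The behavior described above can be pictured with a toy model. This is only a sketch: the BatchPoller class, its attributes and the batch identifiers are hypothetical and do not reflect the product's internal implementation.

```python
from collections import deque


class BatchPoller:
    """Toy model of a poller that turns submitted batches into queued jobs."""

    def __init__(self):
        self.running = True
        self.submitted = deque()   # batches waiting to be picked up
        self.engine_queue = []     # jobs handed over to the execution engine

    def submit(self, batch_id):
        # Submission is always accepted, even when the poller is stopped.
        self.submitted.append(batch_id)

    def poll(self):
        # A stopped poller ignores the submitted batches.
        if not self.running:
            return 0
        picked = 0
        while self.submitted:
            self.engine_queue.append(self.submitted.popleft())
            picked += 1
        return picked


poller = BatchPoller()
poller.running = False
poller.submit(101)
poller.submit(102)
stopped_result = poller.poll()    # nothing is queued while stopped
poller.running = True
restarted_result = poller.poll()  # both batches are picked up after restart
```

In this model, batches submitted while the poller is stopped are kept and are picked up, in submission order, at the first poll after the restart.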

Configuring the Integration Batch Poller

The integration batch poller configuration determines the frequency at which submitted batches are picked up for processing.

To configure the integration batch poller:

  1. In the Administration view, double-click the Integration Batch Poller node.

  2. In the Integration Batch Poller editor, choose in the Configuration section the polling frequency:

    • Weekly at a given day and time.

    • Daily at a given time.

    • Hourly at a given time.

    • Every n seconds.

    • With a UNIX Cron syntax.

  3. Press CTRL+S to save the configuration.

It is not necessary to restart the integration batch poller to take into account the configuration changes.

In the Advanced section, optionally set the following logging parameters:

  • Job Log Level: Select Exclude Skipped Tasks to exclude from the job log the tasks that are skipped. Select Include All Tasks to log all tasks.

  • Execution Monitor Log Level: Logging level [1…3] for the execution console for all the queues.

  • Enable Conditional Execution: A task may be executed or skipped depending on a condition set on the task. For example, a task may be skipped depending on parameters passed to the job. Disabling this option prevents conditional executions and forces the engine to process all the tasks.

Configuring Notifications

Notifications tell users or applications when a job completes or when a workflow task is assigned to a role.

There are two types of notifications:

  • Job Notifications are issued under certain conditions when a certification job completes. These notifications are used for administration, monitoring, or integration automation. They are configured with Notification Policies in the data locations.

  • Workflow Notifications are emails sent to users that belong to a role when an activity (a workflow instance) is assigned to this role. These emails are automatically generated and do not require specific configuration.

Both families of Notifications are issued via Notification Servers.

Notification Server Types

Notification recipients may be users or systems. The type of notification sent as well as the recipient depends on the type of notification server configured.

Each notification server uses a Notification Plug-in that:

  • defines the configuration parameters for the notification server,

  • defines the configuration and form of the notification,

  • sends the notifications via the notification servers.

Stambia MDM is provided with several built-in notification plug-ins:

  • JavaMail: The notification is sent in the form of an email via a Mail Session server configured in the application server, and referenced in the notification server. For more information about configuring Mail Session, see the Stambia MDM Installation Guide.

  • SMTP: The notification is sent in the form of an email via an SMTP server entirely configured in the notification server.

  • File: The notification is issued as text in a file stored in a local directory or on an FTP/SFTP file server.

  • HTTP: The notification is issued as a GET or POST request sent to a remote HTTP server. Use this server type to call a web service with the notification information.

  • JMS: The notification is issued as a JMS message in a message queue.

It is possible to develop additional plug-ins to issue other types of notifications. See the Stambia MDM Plug-in Development Guide for more information about plug-in development.

A single notification server having either the JavaMail or SMTP type can be used to send Workflow Notifications. This server is flagged as the Workflow Notification Server.

Any server can be used to send Job Notifications. Each Job Notification Policy specifies the notification server it uses.

Configuring Notification Servers

This section explains how to create notification servers using the built-in notification plug-ins.

Creating a Notification Server

To create a notification server:

  1. In the Administration view, double-click the Notification Servers node. The Notification Servers editor opens.

  2. Select the Notification Servers list, right-click and select New Notification Server. The Create New Notification Server wizard opens.

  3. Enter the following workflow parameters:

    • Name: Internal name of the notification server.

    • Label: User-friendly label for the server.

    • Plug-in ID: Select one of the available notification server plug-ins.

    • Workflow Notification Server: Select this option to use this notification server by default in the human workflows. This option can be selected only if the Plug-in ID is JavaMail or SMTP.

  4. Click Next.

  5. In the second wizard page, enter the configuration information for your type of server:

    • JavaMail:

      • JNDI URL: JNDI URL of the Java Mail Session service available in the application server. This URL is typically java:comp/env/mail/Session if the Mail Session service is declared as mail/Session in the application server.

      • From User: Email address of the sender of the notifications from this server. This address is also used in the reply-to address for notification emails.

      • Password: If this server requires specific authentication, enter a password for this server.

    • SMTP:

      • SMTP Host Name and SMTP Port: Name or address, and port of the SMTP host.

      • From User: Email address of the sender of the notifications from this server. This address is also used in the reply-to address for notification emails.

      • Authentication Required: If this server requires specific authentication, select the Authentication Required option and enter a User Name and Password for this server, and indicate whether it uses TLS or SSL.

      • Additional SMTP Properties: Enter additional properties as property=value pairs.

    • File:

      • File System: Select the file system of the file server. FILE for a local server, FTP or SFTP for a remote server.

      • Host, Port, Login, Password are required to connect to an FTP or SFTP server.

      • Root Path: Provide the root path for storing the notification file.

    • HTTP:

      • Scheme: Specify whether the HTTP request should be done using HTTP or HTTPS.

      • Host, Port and optionally Login, Password are used to connect to the HTTP server.

      • Base Path: Root path appended after the host and port in the URL.

      • Use System Properties: Check this option to use the system-defined properties to configure the HTTP connection. This option allows using a proxy configuration defined in the Java parameters for the application server.

      • Proxy Host, Proxy Port, Proxy Login and Proxy Password are used to configure the connection through an HTTP proxy.

      • Headers: Enter additional HTTP headers as property=value pairs.

    • JMS:

      • Connection Factory URL: JNDI URL of the factory used to create a connection to the JMS destination. The URL is typically java:comp/env/jms/ConnectionFactory if the connection factory is declared as jms/ConnectionFactory in the application server.

      • Login and Password used when initiating the JMS connection.

  6. Press CTRL+S to save the configuration.
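As an illustration of how the HTTP server fields in step 5 fit together, the following sketch combines Scheme, Host, Port and Base Path into a base URL and parses property=value pairs. The function names, the host name and the one-pair-per-line layout are assumptions made for this example, not documented product behavior.

```python
from urllib.parse import urlunsplit


def build_base_url(scheme, host, port, base_path):
    """Combine the Scheme, Host, Port and Base Path fields into a URL."""
    path = "/" + base_path.strip("/") if base_path else ""
    return urlunsplit((scheme.lower(), f"{host}:{port}", path, "", ""))


def parse_pairs(text):
    """Parse property=value pairs, assuming one pair per line."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        pairs[name.strip()] = value.strip()
    return pairs


# Hypothetical values for the wizard fields.
base_url = build_base_url("HTTPS", "hooks.example.com", 8443, "mdm/notify")
headers = parse_pairs("X-Source=mdm\nAccept=application/json")
```

Checking the resulting URL by hand before saving the server is a cheap way to catch a missing or doubled slash in the Base Path.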

Testing a Notification Server

After configuring the notification server, it is recommended to test it.

To test a notification server:

  1. In the Notification Servers editor, select the notification server that you want to test, right-click and select Test Configuration.

  2. The next steps depend on the type of notification server:

    • File, HTTP, JMS: No further operation is needed. A connection attempt is made on the notification server.

    • JavaMail and SMTP: Provide a comma-separated list of email addresses and then click OK. An email is sent via the notification server to these recipients.

Configuring a Job Notification Policy

With a notification server configured, it is possible to create notification policies using this server.

To create a notification policy:

  1. Open the Data Locations perspective.

  2. In the Data Editions view, right-click the Job Notification Policies node and select New Job Notification Policy. The Create New Job Notification Policy wizard opens.

  3. In the first wizard page, enter the following information:

    • Name: Internal name of the notification policy.

    • Label: User-friendly label for the notification policy. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

    • Notification Server: Select the notification server that will be used to send these notifications.

    • Use Complex Condition: Check this option to use a freeform Groovy Condition. Leave it unchecked to define the condition using a form.

  4. Click Next.

  5. Define the job notification condition. This condition applies to a completing job.

    • If you have checked the Use Complex Condition option, enter the Groovy Condition that must be true to issue the notification. See Groovy Condition for more information.

    • If you have not checked the Use Complex Condition option, use the form to define the condition to issue the notification.

      • Job Name Pattern: Name of the job. Use the _ and % wildcards to represent one or any number of characters.

      • Notify on Failure: Select this option to send notification when a job fails or is suspended.

      • Notify on Success: Select this option to send notification when a job completes successfully.

      • … Count Threshold: Select the maximum number of errors, inserts, etc. allowed before a notification is sent.
        If you define a Job Name Pattern, Notify on Failure and a Threshold, a notification is sent if a job matching the pattern fails or reaches the threshold.

  6. Click Next.

  7. Define the job notification Payload. This payload is text content, but you can also use Groovy to generate it programmatically. See Groovy Template for more information.
    This payload has a different purpose depending on the type of notification:

    • JavaMail or SMTP: The body of the email.

    • File: The content written to the target file.

    • JMS: The payload of the JMS message.

    • HTTP: The content of a POST request.

  8. Click Next.

  9. Define the Notification Properties. These properties depend on the type of notification server:

    • JavaMail or SMTP:

      • Subject: Subject of the email. The subject may be a Groovy Template.

      • To, CC: List of recipients of this email. These recipients are roles. Each of these roles points to a list of email addresses.

      • Content Type: Email content type. For example: text/html, text/plain. This content type must correspond to the generated payload.

    • File:

      • Path: Path of the file in the file system. The path may be a Groovy Template.

      • Append: Check this option to append the payload to the file. Otherwise, the file is overwritten.

      • Charset: Charset used for writing the file. Typically UTF-8, UTF-16 or ISO-8859-1.

      • File Name: Name of the file to write. The file name may be a Groovy Template.

      • Root Path: Provide the root path for storing the notification file.

    • HTTP:

      • Method: HTTP request method (POST or GET).

      • Request Path: Path of the request in the HTTP server. The request path may be a Groovy Template.

      • Parameters: HTTP parameters passed to the request in the form of a list of property=value pairs separated by a & character. If no parameter is passed and the method is GET, all the notification properties are passed as parameters. The parameters may be a Groovy Template.

      • Headers: HTTP headers passed to the request as header=value pairs, with one header per line.

      • Content Type: Content type of the payload. For example: text/html, text/plain. This content type must correspond to the generated payload.

      • Failure Regexp: The HTTP server response is parsed with this regular expression. If the regular expression matches, then the notification is considered failed.

    • JMS:

      • JMS Destination: JNDI URL of the JMS topic or queue. The URL is typically java:comp/env/jms/queue/MyQueue if a queue factory is declared as jms/queue/MyQueue in the application server. The destination may be a Groovy Template.

      • Message Type: Type of JMS Message sent: TextMessage, MapMessage or Message. See Message Types for more information. When using a MapMessage, the payload is ignored and all properties are passed in the MapMessage.

      • Set Message Properties: Check this option to automatically set all notification properties as message properties. Passing properties in this form simplifies message filtering.

  10. Press CTRL+S to save the configuration.
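The form-based condition defined in step 5 can be sketched as follows. This is a minimal illustration, not the product's evaluation logic: the function names, the job dictionary layout, the ERROR and SUSPENDED status values, and the reading of "reaches the threshold" as "greater than or equal to" are all assumptions.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern using _ and % wildcards into a compiled regex."""
    parts = []
    for char in pattern:
        if char == "_":
            parts.append(".")      # exactly one character
        elif char == "%":
            parts.append(".*")     # any number of characters
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def should_notify(job, name_pattern, on_failure, on_success, error_threshold=None):
    """Return True when the hypothetical form-based condition is met."""
    if not like_to_regex(name_pattern).fullmatch(job["name"]):
        return False
    # Assumption: failed or suspended jobs carry these status values.
    if on_failure and job["status"] in ("ERROR", "SUSPENDED"):
        return True
    if on_success and job["status"] == "DONE":
        return True
    # Assumption: "reaches" means greater than or equal to the threshold.
    return error_threshold is not None and job["error_count"] >= error_threshold


job = {"name": "INTEGRATE_CUSTOMERS", "status": "DONE", "error_count": 12}
notified = should_notify(job, "INTEGRATE_%", True, False, error_threshold=10)
```

Note that the underscore in a pattern such as INTEGRATE_% is itself a wildcard that matches any single character, so the pattern also matches a name like INTEGRATE-CUSTOMERS.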

Using Groovy for Notifications

The Groovy scripting language is used to customize the notification. See http://groovy.codehaus.org for more information about this language.

Groovy Condition

When using a complex condition for triggering the notification, the condition is expressed in the form of a Groovy expression that returns true or false. If this condition is true, then the notification is triggered.

This condition may use properties of the job that completes. Each property is available as a Groovy variable.

Use the Edit Expression button to open the condition editor.
In the condition editor:

  • Double-click one of the Properties in the list to add it to the condition.

  • Click the Test button to test the condition against the notification properties provided in the Test Values tab.

  • In the Test Values tab, if you enter an existing Batch ID and click the > button, the properties from this batch are retrieved as test values.

Sample conditions are given below:

Trigger a notification if a job has errors.

    ErrorCount > 0

Trigger a notification for batches in status DONE, triggered by a workflow whose name contains "Product".

    BatchStatus == 'DONE' && WorkflowName.find("Product") != null

Trigger a notification if the batch has processed the "Customers" or "Contacts" entities. EntityNames is a list of the names of the entities processed by the job.

    EntityNames.contains("Customers") || EntityNames.contains("Contacts")

Groovy Template

You can use Groovy to customize some elements of the notification, such as the Payload, the subject or the name of the JMS destination of the notification.

In these cases, a Groovy Template is used to generate a string output from the notification properties.

In the template:

  • The notification properties are available using the $<property_name> syntax.

  • You can also use Groovy code surrounded with <% %> tags.

  • You can use the <%= %> syntax to output a string generated by Groovy.

Use the Edit Expression button to open the expression editor to modify a Groovy template. In the template editor:

  • Double-click one of the Properties in the list to add it to the template. It is added with the $<property_name> syntax.

  • Click the Test button to test the template against the notification properties provided in the Test Values tab.

  • In the Test Values tab, if you enter an existing Batch ID and click the > button, the properties from this batch are retrieved as test values.

Sample templates are given below:

Generates an email subject that contains the job name and batch status.

    Job ($JobName) is finished as: $BatchStatus.

Creates a message with the job name, and extra content if the batch status is not DONE.

    Job ($JobName) is complete.
    <% if (BatchStatus != 'DONE')  { %> Please review the completed batch: $BatchStatus. <% } %>

Generates HTML content with a formatted list of entities.

    <p>Job ($JobName) is complete.</p>
    <p>Entities:</p>
    <ul>
    <% EntityNames.each() { entityName-> %>
            <li>${entityName}</li>
    <% } %>
    </ul>
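To preview what a property substitution produces outside the template editor, the following sketch mimics the first two samples. It is an analogy only: it uses Python's string.Template, which happens to share the $name placeholder syntax, and not the Groovy template engine used by the product. The property values are hypothetical.

```python
from string import Template

# Hypothetical notification properties for a completed job.
properties = {"JobName": "INTEGRATE_CUSTOMERS", "BatchStatus": "ERROR"}

# Equivalent of the subject sample: plain placeholder substitution.
subject = Template("Job ($JobName) is finished as: $BatchStatus.").substitute(properties)

# Equivalent of the conditional sample: the extra sentence is added
# only when the batch status is not DONE.
body_lines = ["Job ($JobName) is complete."]
if properties["BatchStatus"] != "DONE":
    body_lines.append("Please review the completed batch: $BatchStatus.")
body = Template("\n".join(body_lines)).substitute(properties)
```

The Test button in the template editor remains the reference for how a real Groovy template renders.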

Configuring Variable Value Providers

Stambia MDM uses variables defined in models to enforce certain data governance policies for a user’s session.
For more information about model variables, see the Stambia MDM Developer’s Guide.

A Variable Value Provider is a system that can be queried by Stambia MDM to retrieve the values for these variables. Typically, this system is a server containing information about the user connected to Stambia MDM.

Two types of variable value providers are supported out-of-the-box:

  • Datasource Variable Provider: This variable value provider is a relational database that is accessed through a JDBC datasource. Stambia MDM can issue SQL statements against this database to retrieve variable values. For example, an employee database that can be queried to retrieve the country of the connected user.

  • LDAP Variable Provider: This variable value provider is a directory server that is accessed using the LDAP protocol. Stambia MDM can issue queries against this directory server to retrieve variable values. For example, an LDAP directory that can be used to retrieve the organizational unit of the connected user.
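The datasource case can be illustrated with a small lookup. This is a sketch under stated assumptions: an in-memory SQLite database stands in for the JDBC datasource, and the employee table, its columns and the logins are hypothetical.

```python
import sqlite3

# Hypothetical employee database standing in for the JDBC datasource.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE employee (login TEXT PRIMARY KEY, country TEXT)")
connection.executemany(
    "INSERT INTO employee (login, country) VALUES (?, ?)",
    [("jdoe", "FR"), ("asmith", "US")],
)


def lookup_country(conn, login):
    """Return the country of the connected user, or None if unknown."""
    row = conn.execute(
        "SELECT country FROM employee WHERE login = ?", (login,)
    ).fetchone()
    return row[0] if row else None


country = lookup_country(connection, "jdoe")
```

The query is parameterized on the user login, which is the typical shape of a statement that resolves a session variable for the connected user.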

Variable value providers are configured in the repository, and can be used by any model in this repository.

When working with a deployment repository, make sure to configure the variable value providers used in the models before importing or deploying them in this repository.

Creating a Variable Value Provider

To create a variable value provider:

  1. In the Administration view, double-click the Variable Value Providers node. The Variable Value Providers editor opens.

  2. Select the Variable Value Providers list, right-click and then select New Variable Value Provider. The Install Variable Value Provider wizard opens.

  3. Enter the following information:

    • Name: Internal name of the variable value provider.

    • Label: User-friendly label for the variable value provider. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

  4. Select the Plug-in ID corresponding to the variable value provider type: LDAP Variable Provider or Datasource Variable Provider.

  5. Click Next.

  6. Click the Edit Expression button.

  7. In the Variable Value Configuration dialog, enter the configuration information.
    This information differs depending on the selected Plug-In.

    • For a Datasource Variable Value Provider, enter the following information:

      • JNDI Data Source Name: Select a JDBC datasource in the list.
        This datasource must be defined in the application server. It is used to connect to the database acting as the variable value provider.

    • For an LDAP Variable Value Provider, enter the following information:

      • LDAP Host: Name or IP address of the LDAP server.

      • LDAP Port: Listening port of the LDAP server. The port is typically 389 for non-SSL connections, and 636 for SSL connections.

      • Use SSL: Check this option to use SSL to connect to the LDAP server.

      • User: Name of the user used to retrieve data from the LDAP Server. Note that this user should have read privileges to the LDAP structure.

      • Password: This user’s password.

  8. Click OK to close the Variable Value Configuration dialog.

  9. Click Finish.

The variable value provider is added to the list.

Testing the Variable Value Provider Configuration

After configuring a new variable value provider, it is recommended to test its configuration.

To test a variable value provider configuration:

  1. In the Variable Value Providers editor, select the variable value provider in the list.

  2. Right-click and select Test Configuration.

A message indicates whether the connection test was successful or not.

The configuration test only tests the connection information, but does not check the privileges granted to the user to retrieve the values from the provider.

Managing Plug-ins

Stambia MDM allows extending its capabilities using Java code and external APIs. Using the Open Plug-in Architecture, existing services or information systems can contribute to the master data processing and enrichment. You can extend the Enrichment and Validation capabilities in Stambia MDM through user-defined plug-ins.

For detailed information about plug-in development and packaging, see the Stambia MDM Plug-in Development Guide.

A Plug-in is delivered as a jar file bundle that must be deployed in each Stambia MDM application instance running integration jobs that use the plug-in. You do not need to restart the server to take new or updated bundles into account.

These bundles are tagged with a version number. As a consequence, updating an existing plug-in with a newer version of this plug-in will automatically make the platform work with the newer plug-in version. The deployment process installs a new plug-in or replaces an existing plug-in version with a new one.
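As an illustration of version-tagged bundles, the sketch below reads the plug-in identifier and version from a file name such as com.acme.phoneStandardizer_1.0.0.jar and decides whether a candidate file is an update. It assumes a `<plug-in id>_<version>.jar` naming convention with a purely numeric version. The platform may read the version from the bundle itself rather than from the file name, and both helper functions are hypothetical.

```python
def parse_bundle_name(file_name):
    """Split '<plug-in id>_<version>.jar' into an id and a version tuple."""
    stem = file_name[:-len(".jar")] if file_name.endswith(".jar") else file_name
    plugin_id, _, version = stem.rpartition("_")
    return plugin_id, tuple(int(part) for part in version.split("."))


def is_update(installed_file, candidate_file):
    """Return True when the candidate is a newer version of the same plug-in."""
    installed_id, installed_version = parse_bundle_name(installed_file)
    candidate_id, candidate_version = parse_bundle_name(candidate_file)
    return installed_id == candidate_id and candidate_version > installed_version


parsed = parse_bundle_name("com.acme.phoneStandardizer_1.0.0.jar")
newer = is_update(
    "com.acme.phoneStandardizer_1.0.0.jar",
    "com.acme.phoneStandardizer_1.2.0.jar",
)
```

Comparing versions as integer tuples avoids the string comparison pitfall where "1.10.0" would sort before "1.9.0".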

To deploy a plug-in:

  1. Open the Administration Console perspective.

  2. Double-click the Plug-ins node in the Administration view.

  3. Click the Install or Update Plug-in button in the upper right corner of the Plug-ins editor. The Install/Update Plug-ins dialog opens.

  4. Click the Browse button and select the plug-in binary file. For example: com.acme.phoneStandardizer_1.0.0.jar.

  5. Click OK. A Status window shows the number of plug-ins installed or updated.

  6. Your session is closed to take this new plug-in into account. Click the link to restart the session on the Overview perspective.

  7. Open the Administration Console perspective.

  8. Double-click the Plug-ins node from the Administration view.

The plug-in now appears in the list, and can be used in the models and the integration jobs.

Make sure to install the plug-ins required by the jobs of a model before creating a data edition using this model. If a job requires a plug-in that is not installed, then the job fails. The plug-in can be installed and the job resumed after the installation.

To uninstall a plug-in:

  1. Open the Administration Console perspective.

  2. Double-click the Plug-ins node in the Administration view.

  3. Select the plug-in in the list.

  4. Click the Uninstall Selected Plug-ins button in the editor’s toolbar.

Managing Web Services

The Web Services are not configured to start by default. It is possible to start them manually or configure them to start with the platform.

To access the Web Service Manager:

  1. Open the Administration Console perspective.

  2. Double-click the Web Services Manager node in the Administration view. The Web Services Manager Configuration editor opens.

This editor displays the various web services with their status, and allows managing web service startup:

  • Select the Auto-Start option to have a web service start automatically when the application starts. Press CTRL+S to save this configuration change.

  • Use the Start, Stop and Restart buttons in the editor’s toolbar to start, stop or restart the selected web service.

  • Use the Restart all Web Services button in the editor’s toolbar to stop all web services and restart those configured to auto-start.

  • When a web service is started, you can access its WSDL by double-clicking its WSDL URL in the services list. In the Web Service Details dialog that opens, click the WSDL URL link to view the WSDL content.

Managing Applications Configuration

The global behavior of the MDM Applications is configured from the Administration Console Perspective.

Changes performed in this configuration apply to all the applications started in the instance of Stambia MDM.
The semarchyAdmin role is required to configure the global application parameters.

To set the Applications Configuration:

  1. Open the Administration Console perspective.

  2. Double-click the Applications Configuration node in the Administration view. The Applications Configuration editor opens.

This editor displays the global application parameters:

  • Set the CSV Export Limit and Excel Export Limit to define the maximum export size allowed in each format. Note that generating export files is resource consuming on the server. It is recommended to test the scalability of the new export limits.

  • Change the Header Logo that appears for all applications and in the welcome page by uploading a 120x30px picture.

Configuring Pulse Metrics

The Pulse Metrics collection schedule for the Stambia MDM instance is defined as a Pulse Configuration.

To define the Pulse Configuration:

  1. Open the Administration Console perspective.

  2. Double-click the Pulse Configuration node in the Administration view. The Pulse Configuration editor opens.

  3. Select the Active option to make the schedule active.

  4. In the Cron Schedule field, enter a Cron expression defining the schedule. You can also use the Edit Expression button to edit a simple Cron expression using a dialog.

  5. Optionally enter a Description for the Pulse Configuration.

  6. Select the Data Locations for which you want to collect metrics.

  7. Press CTRL+S to save the editor.

From this editor, you can also review the Latest Logs for the metrics collection process, and perform the following actions:

  • Drop and Recreate Pulse Metrics Warehouse: This action entirely resets the warehouse content.

  • Load Pulse Metrics: This action forces an unscheduled collection of the metrics.

Managing Execution

The Execution Engine processes the jobs submitted by the integration batch poller. It orchestrates the certification process for golden data.

Understanding Jobs, Queues and Logs

Jobs are processing units that run in the execution engine. There are two main types of jobs running in the execution engine:

  • Integration Jobs that process incoming batches to perform golden data certification.

  • Deployment Jobs that deploy new model editions in data locations.

Jobs are processed in Queues. Queues work in First-In-First-Out (FIFO) mode. When a job runs in the queue, the next jobs are queued and wait for their turn to run. To run two jobs in parallel, it is necessary to distribute them into different queues.

Queues are grouped into Clusters. There is one cluster per Data Location, named after the data location.
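The queue and cluster model described above can be sketched as follows. This is a simplified illustration: the Queue class, the queue names Q1 and Q2, the CustomerHub cluster name and the job names are hypothetical.

```python
from collections import deque


class Queue:
    """FIFO queue that runs at most one job at a time."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.running = None

    def submit(self, job):
        self.pending.append(job)

    def start_next(self):
        # Start the oldest queued job only if nothing is running.
        if self.running is None and self.pending:
            self.running = self.pending.popleft()
        return self.running

    def finish(self):
        done, self.running = self.running, None
        return done


# One cluster per data location, named after it (names are hypothetical).
cluster = {"CustomerHub": {"Q1": Queue("Q1"), "Q2": Queue("Q2")}}
queues = cluster["CustomerHub"]
queues["Q1"].submit("job-A")
queues["Q1"].submit("job-B")
queues["Q2"].submit("job-C")

# job-A and job-C run in parallel; job-B waits behind job-A in Q1.
running = [q.start_next() for q in queues.values()]
```

In this model, job-B only starts once job-A has finished, whereas job-C runs at the same time as job-A because it was submitted to a different queue.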

System Queues and Clusters

Specific queues and a specific cluster exist for administrative jobs:

  • For each data location cluster, a specific System Queue called SEM_SYS_QUEUE is automatically created. This queue is used to run administrative operations for the data location. For example, this queue processes the deployment jobs updating the data structures in the data location.

  • A specific System Cluster called SEM_SYS_CLUSTER, which contains a single SEM_SYS_QUEUE queue, is used to run platform-level maintenance operations.

Job Priority

As a general rule, integration jobs are processed in their queues, and have the same priority.
There are two exceptions to this rule:

  • Jobs updating the data location, principally model edition Deployment Jobs.

  • Platform Maintenance Jobs that update the entire platform.

Model Edition Deployment Job

When a new model edition is deployed and requires data structure changes, DDL commands are issued as part of a job called DB Install<model edition name>. This job is launched in the SEM_SYS_QUEUE queue of the data location cluster.

This job modifies the tables used by the DML statements from the integration jobs. As a consequence, it needs to run while no integration job runs. This job takes precedence over all other queued jobs in the cluster, which means that:

  1. Jobs currently running in the cluster are completed normally.

  2. All the queues in the cluster, except the SEM_SYS_QUEUE, are moved to a BLOCKED status. Queued jobs remain in the queue and are no longer executed.

  3. The model edition deployment job is executed in the SEM_SYS_QUEUE.

  4. When this job is finished, the other queues return to the READY status and resume the processing of their queued jobs.

This execution model guarantees a minimum downtime of the integration activity while avoiding conflicts between integration jobs and model edition deployment.
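The four steps above can be sketched as a small simulation. This is not the engine's implementation: the run_deployment function, the dictionary layout of the cluster and the job names are hypothetical, and the steps run strictly one after the other for readability.

```python
def run_deployment(cluster, deployment_job):
    """Simulate a model edition deployment taking precedence in one cluster.

    `cluster` maps queue names to dicts with "status", "running" and
    "pending" keys (a hypothetical in-memory representation).
    """
    log = []
    # 1. Jobs currently running in the cluster complete normally.
    for name, queue in cluster.items():
        if queue["running"] is not None:
            log.append(f"completed {queue['running']} in {name}")
            queue["running"] = None
    # 2. Every queue except SEM_SYS_QUEUE is blocked; queued jobs stay put.
    for name, queue in cluster.items():
        if name != "SEM_SYS_QUEUE":
            queue["status"] = "BLOCKED"
    # 3. The deployment job runs in SEM_SYS_QUEUE.
    log.append(f"executed {deployment_job} in SEM_SYS_QUEUE")
    # 4. The other queues return to READY and resume their queued jobs.
    for name, queue in cluster.items():
        if name != "SEM_SYS_QUEUE":
            queue["status"] = "READY"
            if queue["pending"]:
                queue["running"] = queue["pending"].pop(0)
                log.append(f"resumed {name} with {queue['running']}")
    return log


cluster = {
    "SEM_SYS_QUEUE": {"status": "READY", "running": None, "pending": []},
    "Q1": {"status": "READY", "running": "job-A", "pending": ["job-B"]},
}
log = run_deployment(cluster, "deployment-job")
```

The log shows that job-A finishes before the deployment job runs, and that job-B, which was already queued, only starts once the queue is READY again.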

Platform Maintenance Job

If a job is queued in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE queue, it takes precedence over all other queued jobs in the execution engine.

This means that:

  1. Jobs currently running in all the clusters are completed.

  2. All the clusters and queues except the SEM_SYS_CLUSTER/SEM_SYS_QUEUE are moved to a BLOCKED status. Queued jobs are no longer executed in these queues/clusters.

  3. The job in the SEM_SYS_CLUSTER/SEM_SYS_QUEUE is executed.

  4. When this job is finished, the other queues are moved to the READY status and resume the processing of their queued jobs.

This execution model guarantees a minimal disruption of the platform activity while avoiding conflicts between the platform activity and maintenance operations.
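
The two precedence rules described above can be summarized as a small status computation. The following Python sketch is illustrative only and is not part of Stambia MDM: the queue_statuses function, the layout of the engine dictionary and the CustomerHub, ProductHub and Default names are assumptions made for this example. It only derives the READY and BLOCKED statuses, and ignores the SUSPENDED status and the completion of jobs that are already running.

```python
SYS_CLUSTER = "SEM_SYS_CLUSTER"
SYS_QUEUE = "SEM_SYS_QUEUE"


def queue_statuses(engine):
    """Derive the status of every queue.

    engine maps a cluster name to {queue name: list of queued or running jobs}.
    Returns {(cluster name, queue name): "READY" or "BLOCKED"}.
    """
    # A job in SEM_SYS_CLUSTER/SEM_SYS_QUEUE blocks everything else.
    platform_job = bool(engine.get(SYS_CLUSTER, {}).get(SYS_QUEUE))
    statuses = {}
    for cluster, queues in engine.items():
        # A job in the SEM_SYS_QUEUE of a cluster blocks the other queues
        # of that cluster only.
        deployment_job = bool(queues.get(SYS_QUEUE))
        for queue in queues:
            if (cluster, queue) == (SYS_CLUSTER, SYS_QUEUE):
                status = "READY"
            elif platform_job:
                status = "BLOCKED"
            elif deployment_job and queue != SYS_QUEUE:
                status = "BLOCKED"
            else:
                status = "READY"
            statuses[(cluster, queue)] = status
    return statuses


# Hypothetical engine content: a deployment job is queued for CustomerHub.
engine = {
    SYS_CLUSTER: {SYS_QUEUE: []},
    "CustomerHub": {SYS_QUEUE: ["DB Install"], "Default": ["INTEGRATE_1"]},
    "ProductHub": {SYS_QUEUE: [], "Default": ["INTEGRATE_2"]},
}

for key, status in sorted(queue_statuses(engine).items()):
    print(key, status)
```

In this example, only the Default queue of the CustomerHub cluster is reported as BLOCKED. Adding a job to SEM_SYS_CLUSTER/SEM_SYS_QUEUE would block every other queue in every cluster.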

Queue Behavior on Error

When a job running in a queue encounters a run-time error, it behaves differently depending on the queue configuration:

  • If the queue is configured to Suspend on Error, the job hangs on the error point, and blocks the rest of the queued jobs. This job can be resumed when the cause of the error is fixed, or can be cancelled by user choice.

  • If the queue is not configured to Suspend on Error, the job fails automatically and the next jobs in the queue are executed. The failed job cannot be restarted.

Integration jobs are processed in FIFO mode. A job that has failed automatically or has been cancelled by the user cannot be restarted. To resubmit the source data for certification, the external load needs to be resubmitted entirely as a new load.
The integration job performs a commit after each task. As a consequence, when a job fails or is suspended, already processed entities have their golden data certified and committed in the hub.
A user can explicitly choose to halt a running job by suspending it. When such a user operation is performed, the job is considered in error and can be restarted or cancelled.
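
The effect of the Suspend on Error option on a FIFO queue can be illustrated with a short simulation. This Python sketch is not Stambia MDM code: the run_queue function, the (name, succeeds) job representation and the "Done" label for successful jobs are assumptions made for this example. Only the Suspended and Error job statuses and the SUSPENDED and READY queue statuses come from this guide.

```python
from collections import deque


def run_queue(jobs, suspend_on_error):
    """Process jobs in FIFO order.

    jobs is a list of (name, succeeds) pairs.
    Returns (log, queue_status, remaining_job_names).
    """
    pending = deque(jobs)
    log = []
    while pending:
        name, succeeds = pending[0]
        if succeeds:
            log.append((name, "Done"))  # "Done" is a label chosen for this sketch
            pending.popleft()
        elif suspend_on_error:
            # The failed job stays at the head of the queue and blocks the rest.
            log.append((name, "Suspended"))
            return log, "SUSPENDED", [n for n, _ in pending]
        else:
            # The job fails for good and the next job is executed.
            log.append((name, "Error"))
            pending.popleft()
    return log, "READY", []


jobs = [("LOAD_1", True), ("LOAD_2", False), ("LOAD_3", True)]
print(run_queue(jobs, suspend_on_error=True))
print(run_queue(jobs, suspend_on_error=False))
```

With Suspend on Error enabled, LOAD_3 waits behind the suspended LOAD_2. With the option disabled, LOAD_2 ends in error and LOAD_3 is processed immediately.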

Suspending a job on error is the preferred configuration under the following assumptions:

  1. All the data in a batch needs to be integrated as one single atomic operation.
    For example, due to referential integrity, it is not possible to integrate contacts without customers and vice versa. Suspending the job guarantees that it can be continued - after fixing the cause of the error - with the data location preserved in the same state.

  2. Batches and integration jobs are submitted in a precise sequence that represents the changes in the source, and need to be processed in the order they were submitted.
    For example, missing a data value change in the suspended batch may impact the consolidation of future batches. Suspending the job guarantees that the jobs are processed in their exact submission sequence, and no batch is skipped without an explicit user choice.

There may be some cases when this behavior can be changed:

  • If the batches/jobs do not have strong integrity or sequencing requirements, then they can be skipped on error by default. These jobs can run in a queue where Suspend on Error is disabled.

  • If the integration velocity is critical for making golden data available as quickly as possible, it is possible to configure the queue running the integration job with Suspend on Error disabled.

Queue Status

A queue is in one of the following statuses:

  • READY: The queue is available for processing jobs.

  • SUSPENDED: The queue is blocked because a job has encountered an error or was suspended by the user. This job remains suspended. Queued jobs are not processed until the queue becomes READY again, either when the job is cancelled or finishes successfully. For more information, see the Troubleshooting Errors section.

  • BLOCKED: When a job is running in the SEM_SYS_QUEUE queue of the cluster, the other queues are moved to this status. Jobs cannot be executed in a blocked queue and remain queued until the queue becomes READY again.

A cluster can be in one of the following statuses:

  • READY: The cluster is not blocked by the SEM_SYS_CLUSTER cluster, and queues under this cluster can process jobs.

  • BLOCKED: The cluster is blocked when a job is running in the SEM_SYS_CLUSTER cluster. When a cluster is blocked, all its attached queues are also blocked.

Managing the Execution Engine and the Queues

Accessing the Execution Engine

To access the execution engine:

  1. In the Administration view, double-click the Execution Engine node.
    The Execution Engine editor opens.

The Execution Engine Editor

This editor displays the list of queues, grouped by clusters. If a queue is currently pending on a suspended job, it appears in red.

From the Execution Engine editor, you can perform the following operations:

Changing the Queue Behavior on Error

See the Troubleshooting Errors and the Queue Behavior on Error sections for more information about queue behavior on error and error management.

To change the queue behavior on error:

  1. In the Administration view, double-click the Execution Engine node. The Execution Engine editor opens.

  2. Select or de-select the Suspend on Error option for a queue to set its behavior on error or on a cluster to set the behavior of all queues in this cluster.

  3. Press CTRL+S to save the configuration. This configuration is immediately active.

Opening an Execution Console for a Queue

The execution console provides the details of the activity of a given queue. This information is useful to monitor the activity of jobs running in the queue, and to troubleshoot errors.

The content of the execution console is not persisted. Executions prior to opening the console are not displayed in this console. Besides, if the console is closed, its content is lost.

To open the execution console:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears.

  2. Select the queue, right-click and select Open Execution Console.
    The Console view for this queue opens. Note that it is possible to open multiple execution consoles to monitor the activity of multiple queues.

In the Console view toolbar you have access to the following operations:

  • The Close Console button closes the current console. The consoles for the other queues remain open.

  • The Clear Console button clears the content of the current console.

  • The Display Selected Log button allows you to select one of the execution consoles currently open.

Suspending a Job Running in a Queue

To suspend a job running in a queue:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears.

  2. Select the queue that contains one Running Job.

  3. Right-click and then select Suspend Job.
    The job is suspended and the queue switches to the SUSPENDED status.

Suspending a job is an operation that should be performed with care, as respecting the sequence of the submitted jobs has a strong impact on the consistency of the data in the hub.

Restarting a Suspended Job in a Queue

To restart a suspended job in a queue:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears. The suspended queue appears in red.

  2. Select the suspended queue.

  3. Right-click and then select Restart Job.
    The job restarts from the failed step. If the execution console for this queue is open, the details of the execution are shown in the Console.

Canceling a Suspended Job in a Queue

To cancel a suspended job in a queue:

  1. In the Administration view, double-click the Execution Engine node. The execution engine editor appears. The suspended queue appears in red.

  2. Select the suspended queue.

  3. Right-click and then select Cancel Job.
    The job is cancelled, the queue becomes READY and starts processing queued jobs.
    In the job logs, this job appears in Error status.

Managing Job Logs

The job logs display the jobs being executed or executed in the past by the execution engine. Reviewing the job logs allows you to monitor the activity of these jobs and troubleshoot execution errors.

Accessing the Job Logs

To access the logs:

  1. Open the Administration Console perspective.

  2. In the Administration View, double-click the Job Logs node.

  3. The Job Logs editor opens.

The Job Logs Editor

From this editor you can review the job execution logs and drill down into these logs.

The following actions are available from the Job Logs editor toolbar.

  • Use the Refresh button to refresh the view.

  • Use the Auto Fit Column Width button to adjust the size of the columns.

  • Use the Apply and Manage User Defined Filters button to filter the log. See the Filtering the Logs section for more information.

  • Use the Purge Selection button to delete the entries selected in the job logs table. See the Purging the Logs section for more information.

  • Use the Purge using a Filter button to purge logs using an existing or a new filter. See the Purging the Logs section for more information.

Drilling Down into the Logs

The Job Logs editor displays the log list. This view includes:

  • The Name, Start Date, End Date and Duration of the job as well as the name of its creator (Created By).

  • The Message returned by the job execution. This message is empty if the job is successful.

  • The rows statistics for the Job:

    • Select Count, Insert Count, Update Count, Deleted Count: number of rows selected, inserted, updated, deleted, merged as part of this job.

    • Row Count: Sum of all the Select, Insert, etc. metrics.

To drill down into the logs:

  1. Double-click on a log entry in the Job Logs editor.

  2. The Job Log editor opens. It displays all the information available in the job logs list, plus:

    • The Job Definition: This link opens the job definition for this log.

    • The Job Log Parameters: The startup parameters for this job. For example, the Batch ID and Load ID.

    • The Tasks: In this list, each entity is displayed with the statistics for this integration job instance.

  3. Double-click one entity in the Tasks list to drill down into the Task Group Log corresponding to this entity. The Task Group Log for the entity shows the details of the entity task, and the list of task groups performed for the entity. These task groups represent the successive operations performed for the given entity. For example: Enrich and Standardize, Validate Source Data, etc.

  4. Double-click one of the task groups in the Task Log list to drill down into this task group. Each task group may contain one or more tasks or child task groups. For example, the Enrich and Standardize task group contains the log of the enrichers executed for the given entity.

  5. Double-click one of the tasks in the task group to drill down into the Task Log.

  6. The task log shows the task statistics, and provides a link to the Task Definition.

By drilling down into the task groups down to the task, it is possible to monitor the activity of a job, and review in the definition the executed code or plug-in.

Filtering the Logs

To create a job log filter:

  1. In the Job Logs editor, click the Apply and Manage User Defined Filters button and then select Search. The Define Filter dialog opens.

  2. Provide the filtering criteria:

    • Job Name: Name of the job. Use the _ and % wildcards to represent one or any number of characters.

    • Created By: Name of the job creator. Use the _ and % wildcards to represent one or any number of characters.

    • Status: Select the list of job statuses included in the filter.

    • Only Include: Check this option to limit the filter to the logs before/after a certain number of executions or a certain point in time. Note that the time considered is the job start time.

  3. Click the Save as Preferred Filter option and enter a filter name to save this filter.

Saved filters appear when you click the Apply and Manage User Defined Filters button.
You can enable or disable a filter by marking it as active or inactive from this menu. You can also use the Apply All and Apply None options to enable/disable all saved filters.

Filters are saved in the user preferences and can be shared using preferences import/export.
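
The _ and % wildcards accepted by the Job Name and Created By criteria behave like single-character and any-length placeholders. The following Python sketch shows one way to reason about such patterns by translating them into regular expressions. It is an illustration, not the matching code used by the product: the like_to_regex and matches functions are hypothetical, the job names are made up, and case-sensitive matching is an assumption.

```python
import re


def like_to_regex(pattern):
    """Translate a pattern using the _ and % wildcards into a compiled regex."""
    parts = []
    for char in pattern:
        if char == "%":
            parts.append(".*")  # any number of characters, including none
        elif char == "_":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, value):
    """Return True if the whole value matches the pattern (case-sensitive)."""
    return like_to_regex(pattern).fullmatch(value) is not None


# Hypothetical job names used only to exercise the patterns.
for name in ["INTEGRATE_DATA", "DB Install CustomerHub", "JOB1", "JOB12"]:
    print(name, matches("%Install%", name), matches("JOB_", name))
```

Note that an underscore in a job name is itself matched by the _ wildcard, so a pattern such as INTEGRATE_DATA also matches a name like INTEGRATE-DATA.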

To manage job log filters:

  1. Click the Apply and Manage User Defined Filters button, then select Manage Filters. The Manage User Filters editor opens.

  2. From this editor, you can add, delete or edit a filter, and enable or disable filters for the current view.

  3. Click Finish to apply your changes.

Purging the Logs

You can purge selected job logs or all job logs returned by a filter.

To purge selected job logs:

  1. In the Job Logs editor, select the job logs that you want to purge. Press the CTRL key to select multiple lines or the SHIFT key to select a range of lines.

  2. Click the Purge Selection button.

  3. Click OK in the confirmation window.
    The selected job logs are deleted.

To purge filtered job logs:

  1. In the Job Logs editor, click the Purge using a Filter button.

    • To use an existing filter:

      1. Select the Use Existing Filter option.

      2. Select a filter from the list and then press Finish.

    • To create a new filter:

      1. Select the Define New Filter option and then click Next.

      2. Provide the filter parameters, as explained in the Filtering the Logs section and then click Finish.

    • To purge all logs (no filter):

      1. Select the Purge All Logs (No Filter) option and then click Finish.

The job logs are purged.

It is possible to trigger job logs purges through web services. The Administration Service exposes such operations.

Troubleshooting Errors

When a job fails, depending on the configuration of the queue in which this job runs, it is either in a Suspended or Error status.

The status of the job defines the possible actions on this job.

  • A job in Error cannot be continued or restarted. It can be reviewed for analysis, and possible fixes will only affect subsequent jobs.

  • A Suspended job blocks the entire queue, and can be restarted after fixing the problem, or cancelled.

Stambia MDM provides several capabilities to help you troubleshoot issues. You can drill down into the erroneous task to identify the issue, or restart the job with the Execution Console activated.

To troubleshoot an error:

  1. Open the Job Logs.

  2. Double-click the log entry marked as Suspended or in Error.

  3. Drill down into the Task Log, as explained in the Drilling Down into the Logs section.

  4. In the Task Log, review the Message.

  5. Click the Task Definition link to open the task definition and review the SQL Statements involved, or the plug-in called in this task.

Scheduling Data Purges

Data Purge helps you maintain a reasonable storage volume for the MDM hub and the repository by pruning the history of data changes and job logs.

Introduction to Data Purge

The MDM hub stores the lineage of the certified golden data, as well as the changes that led to this golden data.
Preserving the lineage and history is a master data governance requirement. It is key in a regulatory compliance focus. However, keeping this information may also create a large volume of data in the hub storage.

To make sure lineage and history are preserved according to the data governance and compliance requirements, model designers define Data Retention Policies in the model. To keep a reasonable volume of information, administrators have to schedule regular Purges for this data.

Purges are managed by the Purge Scheduler. This service manages purge schedules, and triggers the appropriate purge job on the execution engine to prune the lineage and history according to the Data Retention Policy.

The purges delete the following elements of the lineage and history:

  • Source data pushed in the hub via external loads

  • Data authored or modified in data entry workflows

  • Errors detected on the source data by the integration job

  • Errors detected on the candidate golden records by the integration job

  • Duplicate choices made in duplicate management workflows. The duplicate management decision still applies, but the time of the decision and the decision maker information are deleted.

Optionally, the job logs can also be deleted as part of the purge process.

Accessing the Purge Scheduler

To access the purge scheduler:

  1. Open the Administration Console perspective.

  2. In the Administration View, double-click the Purge Scheduler node. The Purge Scheduler editor opens.

This editor displays the scheduled purges. From the Purge Scheduler editor, you can stop, start or restart the Purge Scheduler service.

Creating a Purge Schedule

To create a purge schedule:

  1. In the Purge Scheduler editor toolbar, click the New Purge Schedule button. The Data Branch Purge Scheduling wizard opens.

  2. Select the data branches that you want to purge and then click the Add >> button.

  3. Click the Next button.

  4. Set the schedule for the purge with a purge frequency (Monthly, Weekly, Daily) or as a Cron Expression.

  5. Select the Purge Repository Logs option to prune the logs related to the purged history and lineage.

  6. Click Finish to close the wizard.

  7. Press CTRL+S to save the editor.

Regardless of the frequency of the purges scheduled by the administrator, the data history retained is as defined by the model designer in the data retention policies.
It is possible to trigger data purges through web services. The Administration Service exposes such operations.

Managing the Security

The application uses role-based security and privilege grants for accessing the Stambia MDM features as well as the data contained in the MDM hub.

Understanding the Security Model

Platform-Level and Model-Level Security

There are two levels of security in Stambia MDM:

  • Platform-Level Security defines access to the features of the platform. For example, access to the administrative features, or access to the design-time capabilities. Platform-level security sets platform users’ privileges (who can design models, monitor executions, manage security, etc.), and should be managed by the platform administrator.

  • Model-Level Security defines security privileges to access and modify data in the data editions. Defining these privileges is a data governance decision and should be defined as part of the data governance initiative. Defining Model Security is covered in the Securing Data chapter of the Stambia MDM Developer’s Guide.

Role-Based Security

Both levels of security are role-based:

  • The Privileges (platform level/model level) are granted to Roles in Stambia MDM.

  • These Roles are declared in Stambia MDM. The roles declared in Stambia MDM must map to roles that pre-exist in the application server. These application server roles are created and granted to application server users as part of the application server configuration.

  • Users logging in to Stambia MDM use their application server credentials. Users, passwords, groups and roles are not owned or stored in Stambia MDM.

Depending on the application server hosting the Stambia MDM application, the roles/user association may be made through a concept of group: A user belongs to a group and the role is granted to the group.
Depending on its configuration, the application server may delegate user authentication and management in general to a security provider (SSO, LDAP, etc.).

Note that roles are not only used for security purposes. They are also used as email aliases for email notifications.

Security Context

When you log in to the Stambia MDM Workbench:

  1. You enter the user and password in the Stambia MDM login window.

  2. This information is passed to the application server.

  3. The application server itself or its security provider (SSO, LDAP, etc.) authenticates the user, gets the list of roles associated to this user (possibly via groups) and returns this list of roles in the session’s Security Context.

  4. Stambia MDM starts a session with this security context, allowing:

    • Certain platform features depending on the Platform-Level Privileges granted to the roles.

    • Certain data access/modification capabilities depending on the Model-Level Privileges granted to the roles.

Stambia MDM enforces security at several layers in the application. Insufficient privileges for a user will reflect in the user interface as missing elements or disabled menus. For example, a user with no privileges on Data Location will not see any of the Data Location links in his Overview perspective.
Stambia MDM does not store the users, passwords and user/role associations. All this critical information remains in the application server or in the enterprise security provider.

Privilege Precedence

Privileges apply in order of precedence: Read/Write then Read then None. As a consequence, a user always has the best privileges associated to his roles.

For example: The user John has two user-defined roles granted to him:

  • ModelDesigner has Read privileges for Job and Job Log Administration and Read/Write for Model Design.

  • ProductionUser has Read/Write privileges for Job and Job Log Administration and None for Model Design.

The resulting privileges for John are Read/Write for both Job and Job Log Administration and Model Design.
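
The precedence rule can be expressed as taking, for each platform privilege, the highest level granted by any of the user's roles. The following Python sketch reproduces the example of John under that reading. It is an illustration only: the effective_privileges function and the dictionary layout are assumptions made for this example, not a Stambia MDM API.

```python
# Privilege levels in increasing order of precedence.
PRECEDENCE = {"None": 0, "Read": 1, "Read/Write": 2}


def effective_privileges(role_grants, user_roles):
    """Return the best privilege level per platform privilege.

    role_grants maps a role name to {platform privilege: level}.
    Roles that are not declared in role_grants grant nothing.
    """
    effective = {}
    for role in user_roles:
        for privilege, level in role_grants.get(role, {}).items():
            current = effective.get(privilege, "None")
            effective[privilege] = max(current, level, key=PRECEDENCE.get)
    return effective


# The two user-defined roles from the example above.
role_grants = {
    "ModelDesigner": {
        "Job and Job Log Administration": "Read",
        "Model Design": "Read/Write",
    },
    "ProductionUser": {
        "Job and Job Log Administration": "Read/Write",
        "Model Design": "None",
    },
}

john = effective_privileges(role_grants, ["ModelDesigner", "ProductionUser"])
print(john)
```

For John, both Job and Job Log Administration and Model Design resolve to Read/Write, as stated above.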

Privileges Description

The following table describes the platform privileges that you can grant to a role:

Platform Privilege Description

Data Location

Grants access to all components of the Data Locations perspective and to the Notification Servers and the Variable Value Providers in the Administration Console perspective. Write privileges are needed to deploy new model editions and create data editions. Write privileges are also required to create and modify variable value providers and notification servers.

Model Design

Grants access to all the components of the Model Administration (to manage model editions/version control) and Model Edition (to design models) perspectives. Write privileges are needed to modify models and create new model editions.

Execution Engine

Grants access to the Execution Engine, Integration Batch Poller and Web Services Manager components in the Administration Console Perspective. Write privileges are needed to start/stop and configure these components.

Job and Job Log Administration

Grants access to Job Logs and Job Definitions in the Administration Console Perspective. Write privileges are needed to purge the logs. Note that you need the Execution Engine privileges to restart jobs in queues.

Logging Configuration

Grants access to the Logging Configuration component in the Administration Console Perspective. Write privileges are needed to modify this configuration.

Plug-ins Administration

Grants access to the Plug-ins component in the Administration Console Perspective. Write privileges are needed to add new plug-ins.

Built-in Roles

The following roles are built in the platform:

  • semarchyConnect: This role must be granted for a user to log in. It should be granted by default to all users connecting to Stambia MDM.

  • semarchyAdmin: This role has full access to all the features of the platform. semarchyAdmin is the only role that gives you access to the Roles in the Administration Console perspective.

When creating a new model, a model-level privilege grant is automatically created for the semarchyAdmin role, giving this role full access to the data. By modifying this privilege grant, the model designer can reduce the privileges of the semarchyAdmin role on the data.
Be cautious when granting the semarchyAdmin role. This role defines a super user who can create roles, grant privileges and update the license information.

Managing Roles and Privileges

Creating the Roles and Users in the Application Server Security Realm

Before declaring a new role in Stambia MDM, make sure that this role is defined in the application server and that a user is granted this role and the semarchyConnect role to log in to Stambia MDM.

The role/user creation procedure depends on the application server hosting Stambia MDM. Please refer to your application server documentation for more information.

An example is given below for creating a role and a user for Apache Tomcat.

To configure a new role and user for Stambia MDM:

  1. Stop the Apache Tomcat server using <tomcat>/bin/shutdown.bat (Windows) or <tomcat>/bin/shutdown.sh (UNIX/Linux), where <tomcat> is the Apache Tomcat installation folder.

  2. Edit the <tomcat>/conf/tomcat-users.xml file.

  3. In the <tomcat-users> section, add the following lines (<password> is the password for this user):

     <role rolename="MDMDev"/>
     <user username="john" password="<password>" roles="semarchyConnect,MDMDev"/>

  4. Save the file.

  5. Restart the Apache Tomcat server using <tomcat>/bin/startup.bat (Windows) or <tomcat>/bin/startup.sh (UNIX/Linux).

A new role MDMDev is created. The user john is also created with the semarchyConnect built-in role and the MDMDev role.
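
Before restarting the server, it can be useful to check that the edited file is still well-formed and that the user carries both roles. The following Python sketch is one possible check, not a tool shipped with Stambia MDM or Apache Tomcat: the missing_roles function and the sample content (including the changeit password) are assumptions made for this example. In practice, you would read the content of <tomcat>/conf/tomcat-users.xml instead of the SAMPLE string.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def missing_roles(xml_text, username, required_roles):
    """Return the required roles that are not granted to the user.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    and LookupError if the user is not declared.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "user" and element.get("username") == username:
            roles = element.get("roles", "")
            granted = {r.strip() for r in roles.split(",") if r.strip()}
            return sorted(set(required_roles) - granted)
    raise LookupError(f"user {username!r} not found")


# Sample content mirroring the lines added in the procedure above.
SAMPLE = """<tomcat-users>
  <role rolename="MDMDev"/>
  <user username="john" password="changeit" roles="semarchyConnect,MDMDev"/>
</tomcat-users>"""

print(missing_roles(SAMPLE, "john", ["semarchyConnect", "MDMDev"]))
```

An empty list means that the user has every required role. A parse error usually points to an element that was not closed properly.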

Declaring the Roles in Stambia MDM

To create a new role:

  1. Open the Administration Console perspective.

  2. In the Administration View, double-click the Roles node.

  3. In the Roles editor, right-click the Roles table and select New Role. The Install Role wizard opens.

  4. Enter the following information:

    • Name: Role name. This role name must exactly match the role name in the application server security configuration. For example: MDMDev.

    • Label: User-friendly label for the role. Note that as the Auto Fill box is checked, the Label is automatically filled in. Modifying this label is optional.

    • Email(s): Enter a comma-separated list of email addresses of recipients for notifications sent to this role.

  5. Click Next.

  6. Select the privileges to grant to this role. For example: Model Design: Read/Write, Job and Job Log Administration: Read.

  7. Click Finish.
    The role is created. You can connect a user with this role to test the set of privileges.

Make sure to use a role name that matches exactly (with the same case) a role name defined in the application server configuration.

Sample Roles

You can use the following role examples in a typical Stambia MDM configuration:

Platform Privilege                Dev           Operator      Deployer
Data Location                     Read          Read          Read/Write
Model Design                      Read/Write    None          Read
Execution Engine                  Read          Read/Write    Read
Job and Job Log Administration    Read          Read/Write    Read
Logging Configuration             None          Read/Write    None
Plug-ins Administration           None          Read          Read/Write

These roles are given as examples and should be adapted to your environment’s requirements.

Administering using Web Services

SOA-enabled applications use Stambia MDM Web Services to interact with the platform and the data in the MDM Hub.

The Platform Services provide platform-level information and operations which can be used for monitoring and managing the platform.

The Platform Services include:

  • The Platform Web Service that provides access to the platform status.

  • The Administration Service that exposes administrative features such as purges.

  • The Metadata Web Service that provides read access to the model metadata. This web service is used by the Client API.

  • The Integration Load Web Service that allows loading data into the hub. This web service is explained in the Stambia MDM Integration Guide.

Platform Web Service

The Platform Web Service provides generic information about the platform. Integrators may use this service to check the platform status and list the data editions available via this platform instance.

This web service provides the following bindings:

Binding Name Description

getAPIVersion

Returns the version of the API supported by the platform.

getDataEditionInfo

Returns information about a given data edition (identified by the data location name, data branch ID and data edition ID).

getDataEditionInfos

Returns information about one or more data editions (optionally filtered by data location name, data branch ID or data edition ID).

getPlatformBuildID

Returns the platform build number.

getPlatformStatus

Returns the platform status. The normal status returned is PLATFORM_READY.

getUserName

Returns the name of the connected user.

getUserPlatformRoles

Returns the list of roles of the authenticated user.

isUserInRole

Returns true if a given user is in a given role.

Ping

Simple ping binding.
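
An integrator checking the platform status typically polls the getPlatformStatus binding until it returns PLATFORM_READY. The following Python sketch shows only the polling logic. It deliberately does not show the web service call itself, since the endpoint and client depend on your installation: get_status is a hypothetical callable that you would implement with your own SOAP client, and the wait_until_ready function, its parameters and the NOT_READY_PLACEHOLDER value are assumptions made for this example.

```python
import time


def wait_until_ready(get_status, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Poll get_status() until it returns PLATFORM_READY.

    get_status is a caller-supplied function wrapping the getPlatformStatus
    binding (hypothetical; not provided here). Returns True as soon as the
    platform is ready, or False after the given number of attempts.
    """
    for attempt in range(attempts):
        if get_status() == "PLATFORM_READY":
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)  # wait before the next attempt
    return False


# Stand-in for a real client: a made-up status first, then PLATFORM_READY.
fake_statuses = iter(["NOT_READY_PLACEHOLDER", "PLATFORM_READY"])
ready = wait_until_ready(lambda: next(fake_statuses), attempts=3, delay_seconds=0.0)
print(ready)
```

Passing the status function as a parameter keeps the retry logic independent of the web service client used in your environment.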

Administration Service

The Administration Service provides administrative operations such as data edition closing, log and data purges. Integrators may use this service to automate purges.

This web service provides the following bindings:

Binding Name Description

closeAndCreateNewDataEdition

Closes the currently open data edition in a data location and a branch, and opens a new one with the description provided.

purgeDataBranch

Purges a data branch from outdated data according to the model’s data retention policies. The data branch is identified by a data location name and a data branch ID. Optionally purges the job logs with the data.

purgeJobLogByFilter

Purges job logs filtered by job name, status, creator name and timeframe.

purgeJobLogByIds

Purges job logs identified by their unique IDs.

Metadata Web Service

The Metadata Web Service provides a binary serialization of a given data model.
This web service is not for public consumption. It is used exclusively by the Stambia MDM Client API.