Configure Data Quality Policy

To create a data quality policy, do the following:

  1. Click Discover from the side menu bar. The Discover page is displayed.

  2. Search for an asset using the search bar.

  3. On finding the asset, open its drop-down list and click Add Data Quality. The Create Data Quality Policy page is displayed.

  4. Specify a name for the data quality policy.

  5. Specify a description for the data quality policy.

  6. Click Show Sample Data to view the columns that belong to the table. Click Select Column to mark all the columns that require a rule definition.

    note

    Only the selected columns appear in the Select Column drop-down list when you add a rule definition. If you leave all the columns in the asset unselected, all columns are displayed in the Select Column drop-down list.

  7. Click the Rule definition drop-down. Select a rule and specify values for it. The following table describes the types of rule definitions you can add to the asset.

    Rule Definition  | Description
    Null Values      | Checks for null values.
    Schema Match     | Checks if the column value matches the data type selected.
    Pattern Match    | Checks if the column value matches the pattern you provide in the input box.
    Enumerations     | Checks if the selected column values are present in the list provided.
    Tags Match       | Checks if the selected column values are present in the Tag provided.
    Range Match      | Checks if a value falls within the selected range.
    Duplicate Check  | Checks for distinct values.
    Row Check        | Checks for number of rows.
    Business Rules   | Matches a set of rules that are configured.
    Custom           | Defines a custom condition involving one or more columns, for example, C1 + C2 > C3.
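
As an illustration of what some of these rule definitions check, the sketch below applies them to a few in-memory rows in Python. It is not the product's implementation: the function names, the sample rows, and the email pattern are all hypothetical, and Schema Match, Tags Match, Row Check, and Business Rules are left out.

```python
import re

# Hypothetical sample rows; column names are for illustration only.
rows = [
    {"id": 1, "email": "a@example.com", "status": "open", "C1": 5, "C2": 4, "C3": 7},
    {"id": 2, "email": None, "status": "closed", "C1": 1, "C2": 1, "C3": 9},
    {"id": 2, "email": "not-an-email", "status": "pending", "C1": 3, "C2": 3, "C3": 2},
]


def null_values(rows, column):
    """Null Values: return the rows whose value in `column` is null."""
    return [r for r in rows if r[column] is None]


def pattern_match(rows, column, pattern):
    """Pattern Match: return the rows whose non-null value fully matches `pattern`."""
    regex = re.compile(pattern)
    return [r for r in rows if r[column] is not None and regex.fullmatch(r[column])]


def enumerations(rows, column, allowed):
    """Enumerations: return the rows whose value is present in the `allowed` list."""
    return [r for r in rows if r[column] in allowed]


def range_match(rows, column, low, high):
    """Range Match: return the rows whose value lies in the inclusive range [low, high]."""
    return [r for r in rows if r[column] is not None and low <= r[column] <= high]


def duplicate_check(rows, column):
    """Duplicate Check: return the set of values that occur more than once."""
    seen, duplicates = set(), set()
    for r in rows:
        value = r[column]
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return duplicates


def custom(rows, predicate):
    """Custom: return the rows that satisfy an arbitrary condition over columns."""
    return [r for r in rows if predicate(r)]


missing_email = null_values(rows, "email")
valid_email = pattern_match(rows, "email", r"[^@\s]+@[^@\s]+\.[^@\s]+")
known_status = enumerations(rows, "status", {"open", "closed"})
c1_in_range = range_match(rows, "C1", 2, 5)
repeated_ids = duplicate_check(rows, "id")
custom_pass = custom(rows, lambda r: r["C1"] + r["C2"] > r["C3"])
```

Each helper returns the rows (or values) that match the check, so a policy-style result such as "percentage of rows passing" can be derived by comparing the length of the result with `len(rows)`.
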
  8. To check the conditions incrementally, enable the Check Incrementally toggle button, select one of the following incremental strategies, and specify the required values.

    • Auto Increment ID based

      Each row added to the database is assigned an auto-incrementing numeric ID. For instance, if you add 1000 rows of data to the database, the rows are given IDs from 1 to 1000, and the first execution of a policy on the database takes those 1000 rows into consideration. Suppose you then add another 1000 rows. The auto increment ID based strategy assigns them values that continue from the last incremented value of the preceding set of rows, that is, 1001 to 2000. When the policy is executed again, it runs only on the new set of rows.

    • Partition based

      The incremental profile uses a date-based partition column to determine the bounds for selecting data from the data source. This strategy is useful only if the data source supports partitioning.

    • Incremental date based

      The incremental profile uses a monotonically increasing date column to determine the bounds for selecting data from the data source. To execute a policy on a database with the incremental date based strategy, provide values for the following properties:

    Field name      | Description
    Date Column     | Select the name of the column that is used to save dates and time-stamps.
    Date Format     | Provide the date format in which the date and time-stamp are saved. Example: YYYY-MM-DD
    Advance Fields  | Timezone: If you are in a different timezone, select a timezone from the drop-down list. Minute Offset: If the selected timezone is offset by a few hours or minutes, enter the number of minutes in the field provided.
    Round End Date  | When Round End Date is checked, the last executed date value is rounded up by the frequency selected from the Frequency drop-down list for the next execution of the policy. For instance, suppose the last data row was executed at 12:20, and you checked Round End Date and selected the Hourly frequency. The next time the policy is executed, it runs only on the data created at 13:20 and thereafter.
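
The incremental strategies above share one idea: remember the highest value processed so far and, on the next run, select only rows beyond it. The Python sketch below illustrates that idea for the auto increment ID and incremental date based strategies. It is a simplified model under stated assumptions, not the product's implementation: `select_incremental`, the sample tables, and the mapping of YYYY-MM-DD to the `%Y-%m-%d` format code are hypothetical, and timezone, minute offset, and Round End Date handling are omitted.

```python
from datetime import datetime


def select_incremental(rows, key, last_value=None):
    """Return the rows whose key is beyond `last_value`, plus the new high-water mark.

    `key` extracts the incremental column (an ID or a parsed date) from a row.
    With `last_value=None` (first execution) every row is selected.
    """
    new_rows = [r for r in rows if last_value is None or key(r) > last_value]
    watermark = max((key(r) for r in new_rows), default=last_value)
    return new_rows, watermark


# Auto increment ID based: 1000 rows, then another 1000 rows.
table = [{"id": i} for i in range(1, 1001)]
first_batch, first_mark = select_incremental(table, lambda r: r["id"])

table += [{"id": i} for i in range(1001, 2001)]
second_batch, second_mark = select_incremental(table, lambda r: r["id"], first_mark)

# Incremental date based: the date column is parsed with the configured format.
events = [
    {"created": "2024-01-01"},
    {"created": "2024-01-02"},
    {"created": "2024-01-03"},
]


def by_date(row):
    """Parse the (hypothetical) `created` column using a YYYY-MM-DD style format."""
    return datetime.strptime(row["created"], "%Y-%m-%d")


new_events, date_mark = select_incremental(events, by_date, datetime(2024, 1, 1))
```

In the ID example, the second call returns only the rows with IDs 1001 to 2000, which mirrors the re-execution behaviour described for the auto increment ID based strategy.
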
  9. Enable the Schedule Execution toggle button to set the time at which the data quality policy runs. Based on the time selected, fill in the time properties. Then enable the Start Schedule Runs toggle.

  10. Enable the Alert Configuration toggle button and select one or more of the following channels to receive alerts when the data quality policy succeeds or when an error occurs:

    note

    Click the Alert On drop-down button and select the success, failure, or all option to choose which outcomes trigger notifications.

    Email: Email notifications are sent to your default email address. You can add more recipients to receive alerts.

    Slack: Slack notifications are sent to your default Slack channel. You can add more channels to receive alerts.

    Webhook: Webhook notifications are sent every time a rule execution fails.
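
One way to read the Alert On setting is as a filter on execution outcomes that is applied before any channel is notified. The Python sketch below models that reading only. The function names, the message text, and the policy name are hypothetical, and nothing here reflects how the product actually delivers email, Slack, or webhook notifications.

```python
VALID_ALERT_ON = {"success", "failure", "all"}
VALID_OUTCOMES = {"success", "failure"}


def should_alert(alert_on, outcome):
    """Return True if an execution `outcome` passes the Alert On selection."""
    if alert_on not in VALID_ALERT_ON:
        raise ValueError(f"unknown Alert On option: {alert_on!r}")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    return alert_on == "all" or alert_on == outcome


def notifications(channels, alert_on, outcome, policy_name):
    """Build one (channel, message) pair per selected channel, or none if filtered out."""
    if not should_alert(alert_on, outcome):
        return []
    message = f"Data quality policy '{policy_name}': {outcome}"
    return [(channel, message) for channel in channels]


# A failed run with Alert On set to "failure" notifies both selected channels.
sent = notifications(["email", "slack"], "failure", "failure", "orders_dq")

# A successful run with the same setting notifies nobody.
skipped = notifications(["email", "slack"], "failure", "success", "orders_dq")
```
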

  11. Click the Enable Policy toggle button to activate the policy.

  12. Click the Save Policy button to save your configurations.