Quality Rules

Quality Rules help ensure that your collected data is accurate, consistent, and reliable. By applying quality rules, you can automatically detect errors, maintain high-quality data, and make better decisions based on validated information.

You have two options for creating and applying rules:

  • Create custom rules

  • Generate AI suggested rules

When you generate AI suggested rules, the system sends a sample of 100 runs to the AI. The AI analyzes the selected run, generates rules, and applies them to subsequent runs, so that similar patterns and issues are monitored and validated automatically in future data collection.

Types of Rules

Custom Rule

  • Provides complete control over your data validation logic.
  • Allows you to manually create and configure rules to meet your specific needs.
  • Ideal when you have known criteria or business rules that must be enforced consistently.

Generate Suggested Rule

  • Leverages AI to analyze data patterns and suggest rules automatically.
  • Helps identify potential issues that are hard to spot by manual review.
  • Lets you apply a specific rule or select all suggested rules for comprehensive coverage.

Quality Rule Steps

  • Step 1: Navigate to the Datasets section.
  • Step 2: Select the run for which you want to set a rule.
  • Step 3: Click on Set QA Rule.
  • Step 4: You will be directed to the Generate Rule section, where you can select the page to which you want to apply the rule.
  • Step 5: You can either choose to create custom rules or ask AI to generate rules.
  • Step 6: After the rule is applied, view the quality status for the run, for example Skipped, Failed, or Success.

Managing Quality Rules

All applied rules can be accessed and managed in the Quality Rules section of the platform. Here, you can:

  • View a list of all active and inactive rules.

  • View the columns and pages that you have selected for each rule.

  • Activate or deactivate rules depending on project needs.

  • Edit existing rules to fine-tune their logic.

  • Delete rules that are no longer relevant.

By actively managing quality rules, you can ensure that your data remains accurate, consistent and aligned with the business objectives. This feature provides flexibility, whether you prefer full manual control or want to leverage AI for smarter rule suggestions.

Types of Quality Status

  • Not Started: Displayed when the run has not yet started processing.
  • Processing: Displayed while the run's data is being validated.
  • Skipped: Displayed when no rule is set for the run, when the run is merged or imported, or when it contains JSON-formatted data. Also used as the default status when no other status applies.
  • Success: Displayed when all applied rules have been validated successfully.
  • Failed: Displayed when any one of the applied rules fails, even if the others have passed.
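
The way a final status follows from the rule results can be sketched in a few lines of Python. This is only an illustration of the descriptions above, not the platform's implementation: the function name quality_status, the run_kind labels, and the list of boolean rule results are all assumptions made for the example. Not Started and Processing are transient states and are left out.

```python
from typing import Sequence

# Hypothetical labels for the run kinds that are described as always skipped.
SKIPPED_KINDS = {"merged", "imported", "json"}


def quality_status(run_kind: str, rule_results: Sequence[bool]) -> str:
    """Return the final quality status of a finished run.

    rule_results holds one boolean per applied rule (True means the rule passed).
    """
    # No rules set, or a run kind that does not support rules: Skipped.
    if run_kind in SKIPPED_KINDS or not rule_results:
        return "Skipped"
    if all(rule_results):
        return "Success"
    # A single failing rule fails the whole run, even if the others passed.
    return "Failed"
```

Under these assumptions, quality_status("standard", [True, False]) returns "Failed", and quality_status("standard", []) returns "Skipped".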

Limitations

  • Quality Rules can be applied only to standard runs. Runs that are imported, merged, or use JSON-formatted data do not support rule creation.
  • Rules cannot be applied to runs with data sizes larger than 4 GB.
  • Rules can be created for only one column at a time; conditional or multi-column rules are not supported.
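
As a rough sketch, the limitations above can be expressed as a pre-check that lists why a rule cannot be created. Everything here is hypothetical: the function name, the run_kind labels, and the reading of "4 GB" as 4 * 10**9 bytes are assumptions for illustration, and the platform may define these differently.

```python
from typing import List

# Assumption: "4 GB" is read as 4 * 10**9 bytes; the platform may mean 4 GiB.
MAX_RUN_BYTES = 4 * 10**9
# Hypothetical labels for run kinds that do not support rule creation.
UNSUPPORTED_KINDS = {"imported", "merged", "json"}


def rule_creation_blockers(run_kind: str, size_bytes: int, columns: List[str]) -> List[str]:
    """Return the reasons a rule cannot be created; an empty list means it can."""
    reasons = []
    if run_kind in UNSUPPORTED_KINDS:
        reasons.append(f"{run_kind} runs do not support rules")
    if size_bytes > MAX_RUN_BYTES:
        reasons.append("run data is larger than 4 GB")
    if len(columns) != 1:
        reasons.append("a rule must target exactly one column")
    return reasons
```

An empty result, as in rule_creation_blockers("standard", 1000, ["price"]), means none of the listed limitations applies.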

Categories of Rules

Value Set Comparisons

  • expect_column_distinct_values_to_be_in_set(column, value_set): All distinct column values must exist within a given set.
  • expect_column_distinct_values_to_contain_set(column, value_set): Column must contain all values from the given set (but may have extra values).
  • expect_column_distinct_values_to_equal_set(column, value_set): Column’s distinct values must exactly match the given set.
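
The three value set rules differ only in the direction of the set comparison. The sketch below mirrors their descriptions with plain Python sets; the function names are hypothetical stand-ins rather than the platform's API, and null handling is not modeled.

```python
def distinct_values_in_set(values, value_set):
    """Every distinct column value appears in value_set (column is a subset)."""
    return set(values) <= set(value_set)


def distinct_values_contain_set(values, value_set):
    """Every member of value_set appears in the column (extra values allowed)."""
    return set(values) >= set(value_set)


def distinct_values_equal_set(values, value_set):
    """The column's distinct values are exactly value_set."""
    return set(values) == set(value_set)


currencies = ["USD", "EUR", "USD"]
print(distinct_values_in_set(currencies, {"USD", "EUR", "GBP"}))     # True
print(distinct_values_contain_set(currencies, {"USD", "GBP"}))       # False
print(distinct_values_equal_set(currencies, {"USD", "EUR"}))         # True
```

In this example the column never contains "GBP", so it fits inside the larger set but cannot contain a set that includes "GBP".
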

Value Checks

  • expect_column_values_to_be_between(column, min_value, max_value, strict_min, strict_max): Each value should lie within a specified range.
  • expect_column_values_to_be_in_set(column, value_set): Each value must be one of the specified set.
  • expect_column_values_to_be_null(column): Values must be null.
  • expect_column_values_to_be_of_type(column, type_): Values must match the specified data type.
  • expect_column_values_to_be_unique(column): Values in the column must not repeat.
  • expect_column_values_to_not_be_in_set(column, value_set): Values should not appear in the given set.
  • expect_column_values_to_not_be_null(column): Values must not be null.
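
The strict_min and strict_max parameters control whether the range bounds themselves are allowed. A minimal sketch of that behavior follows, assuming that a strict bound excludes the boundary value and that null values are ignored by the range check; the function name values_outside_range is hypothetical.

```python
def values_outside_range(values, min_value=None, max_value=None,
                         strict_min=False, strict_max=False):
    """Return the values that violate the range; an empty list means the rule passes."""
    def in_range(v):
        if min_value is not None:
            # With strict_min, a value equal to min_value is rejected.
            if v < min_value or (strict_min and v == min_value):
                return False
        if max_value is not None:
            # With strict_max, a value equal to max_value is rejected.
            if v > max_value or (strict_max and v == max_value):
                return False
        return True

    # Assumption: nulls (None) are skipped rather than counted as failures.
    return [v for v in values if v is not None and not in_range(v)]


prices = [0, 5, 10, None]
print(values_outside_range(prices, 0, 10))                    # []
print(values_outside_range(prices, 0, 10, strict_min=True))   # [0]
```

Leaving min_value or max_value as None makes the range open on that side.
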

String Operations

  • expect_column_value_lengths_to_be_between(column, min_value, max_value, strict_min, strict_max): String lengths must be within the given range.
  • expect_column_value_lengths_to_equal(column, value): String length must equal the given value.
  • expect_column_values_to_match_regex(column, regex): Values must match the specified regex pattern.
  • expect_column_values_to_match_regex_list(column, regex_list, match_on): Values must match at least one regex from a list.
  • expect_column_values_to_not_match_regex(column, regex): Values must not match the regex.
  • expect_column_values_to_not_match_regex_list(column, regex_list): Values must not match any regex in the list.
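
The regex rules can be illustrated with Python's standard re module. This sketch rests on two assumptions that the table does not confirm: a pattern counts as matching if it is found anywhere in the value (re.search), and match_on accepts "any" or "all". The function names are hypothetical.

```python
import re


def matches_regex(value: str, regex: str) -> bool:
    # Assumption: a match anywhere in the string counts; anchor the pattern
    # with ^ and $ when the whole value must match.
    return re.search(regex, value) is not None


def matches_regex_list(value: str, regex_list, match_on: str = "any") -> bool:
    results = [matches_regex(value, regex) for regex in regex_list]
    if match_on == "any":
        return any(results)   # at least one pattern matches
    if match_on == "all":
        return all(results)   # every pattern matches
    raise ValueError("match_on must be 'any' or 'all'")


sku_pattern = r"^[A-Z]{3}-\d{4}$"
print(matches_regex("ABC-1234", sku_pattern))   # True
print(matches_regex("abc-1234", sku_pattern))   # False
```

The "not match" variants are the logical negation of these checks for each value.
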

Multi-Column Expectations

  • expect_column_pair_values_a_to_be_greater_than_b(column_A, column_B, or_equal): Column A’s value should be greater than (or equal to) Column B’s value.
  • expect_column_pair_values_to_be_equal(column_A, column_B): Values in both columns must match exactly.
  • expect_column_pair_values_to_be_in_set(column_A, column_B, value_pairs_set): Column pairs must exist in a specified set of pairs.
  • expect_compound_columns_to_be_unique(column_list): Combination of values across multiple columns must be unique.
  • expect_multicolumn_sum_to_equal(column_list, sum_total): Sum of specified columns must equal a given total.
  • expect_select_column_values_to_be_unique_within_record(column_list): Values across selected columns in a row must be unique.
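
Two of these rules are easy to confuse: compound uniqueness compares rows with each other, while uniqueness within a record compares columns inside one row. The sketch below shows the difference on a list of dictionaries; the function names and the sample data are hypothetical.

```python
from collections import Counter


def duplicated_compound_keys(rows, column_list):
    """Return the value combinations that occur in more than one row."""
    keys = [tuple(row[c] for c in column_list) for row in rows]
    return [key for key, count in Counter(keys).items() if count > 1]


def rows_with_repeats_within_record(rows, column_list):
    """Return the indices of rows whose selected columns repeat a value."""
    bad = []
    for index, row in enumerate(rows):
        values = [row[c] for c in column_list]
        if len(set(values)) != len(values):
            bad.append(index)
    return bad


rows = [
    {"sku": "A1", "store": "north", "backup_store": "south"},
    {"sku": "A1", "store": "south", "backup_store": "south"},
    {"sku": "A1", "store": "north", "backup_store": "east"},
]
print(duplicated_compound_keys(rows, ["sku", "store"]))                  # [('A1', 'north')]
print(rows_with_repeats_within_record(rows, ["store", "backup_store"]))  # [1]
```

Here the first and third rows share the same (sku, store) combination, and the second row repeats "south" across its two store columns.
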

Table-Level Expectations

  • expect_column_to_exist(column): The column must exist in the table.
  • expect_table_column_count_to_be_between(min_value, max_value): Number of columns must fall within a range.
  • expect_table_column_count_to_equal(value): Table must have exactly this many columns.
  • expect_table_columns_to_match_ordered_list(column_list): Columns must match the specified list in the given order.
  • expect_table_columns_to_match_set(column_set, exact_match): Table columns must match a set (order doesn’t matter).
  • expect_table_row_count_to_be_between(min_value, max_value, strict_min, strict_max): Number of rows must be within a range.
  • expect_table_row_count_to_equal(value): Table must have exactly this many rows.
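
The difference between the ordered-list and set variants comes down to whether column order matters. A minimal sketch follows, assuming that exact_match=False tolerates extra columns as long as every expected column is present; that reading and the function names are assumptions, not confirmed platform behavior.

```python
def columns_match_ordered_list(columns, column_list):
    """Same column names in the same order."""
    return list(columns) == list(column_list)


def columns_match_set(columns, column_set, exact_match=True):
    """Same column names, in any order."""
    actual, expected = set(columns), set(column_set)
    if exact_match:
        return actual == expected
    # Assumption: without exact_match, extra columns in the table are allowed.
    return expected <= actual


columns = ["title", "price", "url"]
print(columns_match_ordered_list(columns, ["price", "title", "url"]))  # False
print(columns_match_set(columns, {"url", "title", "price"}))           # True
```

The same three columns fail the ordered check when listed in a different order but pass the set check.
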

Statistical Measures

  • expect_column_kl_divergence_to_be_less_than(column, partition_object, threshold, ...): KL divergence from reference distribution must be below threshold.
  • expect_column_max_to_be_between(column, min_value, max_value, strict_min, strict_max): Maximum value must be within a range.
  • expect_column_median_to_be_between(column, min_value, max_value, strict_min, strict_max): Median value must be within a range.
  • expect_column_min_to_be_between(column, min_value, max_value, strict_min, strict_max): Minimum value must be within a range.
  • expect_column_stdev_to_be_between(column, min_value, max_value, strict_min, strict_max): Standard deviation must be within a range.
  • expect_column_sum_to_be_between(column, min_value, max_value, strict_min, strict_max): Sum of column values must be within a range.
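
Unlike the value checks, these rules test one aggregate per column rather than each value. The sketch below computes an aggregate with Python's standard library and compares it to the range. The helper name aggregate_between is hypothetical, nulls are assumed to be ignored, and the standard deviation shown is the sample standard deviation, which may differ from what the platform computes.

```python
import statistics

# Maps a measure name to the function that computes it.
AGGREGATES = {
    "max": max,
    "min": min,
    "median": statistics.median,
    "stdev": statistics.stdev,  # assumption: sample standard deviation
    "sum": sum,
}


def aggregate_between(values, measure, min_value=None, max_value=None,
                      strict_min=False, strict_max=False):
    """Return (observed_value, passed) for one aggregate of the column."""
    # Assumption: nulls (None) are dropped before aggregating.
    observed = AGGREGATES[measure]([v for v in values if v is not None])
    above_min = (min_value is None
                 or observed > min_value
                 or (observed == min_value and not strict_min))
    below_max = (max_value is None
                 or observed < max_value
                 or (observed == max_value and not strict_max))
    return observed, above_min and below_max


prices = [10, 12, 14, 100]
print(aggregate_between(prices, "median", 10, 20))  # (13.0, True)
print(aggregate_between(prices, "max", 0, 50))      # (100, False)
```

The single large value does not move the median out of range, but it does fail the maximum check.
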

Value Distribution

  • expect_column_most_common_value_to_be_in_set(column, value_set, ties_okay): Most frequent value(s) should be in the given set.
  • expect_column_proportion_of_non_null_values_to_be_between(column, min_value, max_value, ...): Proportion of non-null values, expressed as a fraction between 0 and 1, must be within range.
  • expect_column_proportion_of_unique_values_to_be_between(column, min_value, max_value, ...): Proportion of unique values, expressed as a fraction between 0 and 1, must be within range.
  • expect_column_quantile_values_to_be_between(column, quantile_ranges, allow_relative_error): Specific quantiles must fall within given ranges.
  • expect_column_unique_value_count_to_be_between(column, min_value, max_value, strict_min, strict_max): Number of unique values must be within a range.
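
The two proportion rules compare a fraction between 0 and 1 with the given range. A minimal sketch of how such fractions can be computed follows. It assumes that the unique proportion is the number of distinct non-null values divided by the number of non-null values; the function names are hypothetical, and both functions expect a column with at least one non-null value.

```python
def proportion_non_null(values):
    """Fraction of values that are not null (None)."""
    return sum(v is not None for v in values) / len(values)


def proportion_unique(values):
    """Fraction of non-null values that are distinct."""
    # Assumption: nulls are excluded from both numerator and denominator.
    non_null = [v for v in values if v is not None]
    return len(set(non_null)) / len(non_null)


categories = ["a", "b", "b", None]
print(proportion_non_null(categories))  # 0.75
```

For the same column, proportion_unique gives 2 distinct values out of 3 non-null values, or about 0.67.
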

Statistical Outliers

  • expect_column_value_z_scores_to_be_less_than(column, threshold, double_sided): Z-scores of values must be below a threshold (detect outliers).
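
A z-score measures how many standard deviations a value lies from the column mean. The sketch below flags values whose z-score reaches the threshold. It assumes the population standard deviation and that double_sided compares the absolute z-score; both are assumptions for illustration, and the function name is hypothetical.

```python
import statistics


def z_score_outliers(values, threshold, double_sided=True):
    """Return the values whose z-score is not below the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        return []  # all values are identical, so nothing can be an outlier
    outliers = []
    for v in values:
        z = (v - mean) / std
        if double_sided:
            z = abs(z)  # treat low and high extremes alike
        if z >= threshold:
            outliers.append(v)
    return outliers


readings = [10] * 9 + [-30]
print(z_score_outliers(readings, 2.5))                      # [-30]
print(z_score_outliers(readings, 2.5, double_sided=False))  # []
```

The value -30 has a z-score of -3.0, so it is only flagged when the check is double sided.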