Data Retention
Overview
Data retention is the practice of keeping data for a specific period. Starting with release 2024.7.0, Incorta provides table-level settings that let schema managers control which data is stored on disk based on predefined criteria. Schema managers can specify criteria for keeping or deleting data on the shared storage, which promotes efficient data management, improves performance, and optimizes resource and disk space usage.
Data retention policies apply only to physical schema tables and materialized views (MVs). Defining a policy does not remove any data by itself; a purge job is required to remove data that does not meet the retention criteria.
Creating a data retention policy
Data retention settings can be configured on the Advanced Settings tab of the Table Editor for any physical table or MV. You can define data retention policies using time-window configurations or custom conditions.
Exercise caution when setting data retention criteria. Once data is purged, it cannot be retrieved. To recover from an accidental purge, fully load the affected tables and MVs.
Data retention via a time-window configuration
For time-window retention policies, you specify the time window based on a date or timestamp column in the table or MV. Records within the defined time window will be retained, while those outside will be marked for deletion during the next purge job.
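The following Python sketch illustrates how a time-window policy splits records into retained and marked-for-deletion sets. It is not Incorta code: in Incorta you configure the policy in the Table Editor, and the function `split_by_time_window`, the column name `transaction_date`, and the 365-day window are hypothetical. The sketch also assumes the window is measured back from a reference time and that the column contains no null values.

```python
from datetime import datetime, timedelta


def split_by_time_window(rows, column, window, now):
    """Hypothetical model of a time-window retention policy.

    Rows whose `column` value is on or after (now - window) are retained;
    all other rows are marked for deletion by the next purge job.
    """
    cutoff = now - window
    retained, marked = [], []
    for row in rows:
        # Assumption: every row has a non-null datetime in `column`.
        if row[column] >= cutoff:
            retained.append(row)
        else:
            marked.append(row)
    return retained, marked


# Illustrative data; "transaction_date" stands in for any date/timestamp column.
now = datetime(2024, 7, 31)
rows = [
    {"id": 1, "transaction_date": datetime(2024, 7, 30)},
    {"id": 2, "transaction_date": datetime(2023, 1, 15)},
]
retained, marked = split_by_time_window(rows, "transaction_date", timedelta(days=365), now)
print([r["id"] for r in retained])  # [1]
print([r["id"] for r in marked])    # [2]
```

Nothing is deleted at this point in the sketch; the `marked` list only represents records that a later purge job would remove.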
Data retention via a custom condition
You can also create a custom condition that defines which data to retain. Records that satisfy the custom condition are retained, while records that do not are marked for deletion during the next purge job.
Custom conditions offer more flexibility than time-window configurations. Within a custom condition, you can:
- Reference columns of different data types
- Use system variables
- Use different types of built-in functions
- Specify more complex conditions
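To show why a custom condition is more flexible than a single time window, the Python sketch below models a condition as a predicate that combines columns of different types (a string status, a string region, and a date), a value that plays the role of a system variable (`today`), a built-in function (`str.upper`), and nested AND/OR logic. This is only an analogy: in Incorta you write the condition in the Table Editor, and the names `retain_if` and `apply_custom_condition`, the columns, and the 90-day rule are hypothetical.

```python
from datetime import date, timedelta


def retain_if(row, today):
    """Hypothetical custom condition.

    Keep every open order, plus closed orders from EMEA or APAC
    placed within the last 90 days. `today` stands in for a system variable.
    """
    recent = row["order_date"] >= today - timedelta(days=90)
    in_region = row["region"].upper() in {"EMEA", "APAC"}
    return row["status"] == "OPEN" or (in_region and recent)


def apply_custom_condition(rows, condition, **context):
    """Split rows into (retained, marked_for_deletion) using `condition`."""
    retained = [r for r in rows if condition(r, **context)]
    marked = [r for r in rows if not condition(r, **context)]
    return retained, marked


rows = [
    {"id": 1, "status": "OPEN", "region": "amer", "order_date": date(2020, 1, 1)},
    {"id": 2, "status": "CLOSED", "region": "emea", "order_date": date(2024, 7, 1)},
    {"id": 3, "status": "CLOSED", "region": "amer", "order_date": date(2024, 7, 1)},
    {"id": 4, "status": "CLOSED", "region": "APAC", "order_date": date(2023, 1, 1)},
]
retained, marked = apply_custom_condition(rows, retain_if, today=date(2024, 7, 31))
print([r["id"] for r in retained])  # [1, 2]
print([r["id"] for r in marked])    # [3, 4]
```

A time-window policy could not express this rule, because record 1 is retained even though its date falls outside any recent window.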
Purging unneeded data
After you configure a data retention policy, you can remove data that does not meet the retention criteria by running a data purge job. You can run a purge job manually for individual tables or MVs, or for all tables and MVs in a schema, from the same dialogs you use to perform other load actions. Alternatively, you can schedule purge jobs in a load plan to clean up data from one or more physical schemas.
The purge job creates a new version of the table data (Parquet files). Records in the new files may appear in a different order than in previous versions.
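The following Python sketch is a simplified model of the behavior described above: a purge writes the retained records to a new version instead of editing the existing one, and the new version may order records differently. `VersionedTable` and its methods are hypothetical, and the sort by `region` is only a device to make the reordering visible; it is not how Incorta orders purged data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass
class VersionedTable:
    """Hypothetical stand-in for a table's data files on shared storage."""
    versions: List[Tuple[Row, ...]] = field(default_factory=list)

    def current(self) -> Tuple[Row, ...]:
        return self.versions[-1]

    def purge(self, keep: Callable[[Row], bool]) -> int:
        """Write retained rows to a new version; return the number of rows removed."""
        old = self.current()
        retained = [row for row in old if keep(row)]
        # Sketch assumption: the rewrite regroups rows, so the new version's
        # order need not match the previous version's order.
        retained.sort(key=lambda row: row["region"])
        self.versions.append(tuple(retained))
        return len(old) - len(retained)


table = VersionedTable([(
    {"id": 1, "region": "emea", "year": 2024},
    {"id": 2, "region": "amer", "year": 2020},
    {"id": 3, "region": "amer", "year": 2024},
)])
removed = table.purge(lambda row: row["year"] >= 2023)
print(removed)                              # 1
print([row["id"] for row in table.current()])  # [3, 1]
```

Because record order can change between versions, anything that depends on a particular order should sort the data explicitly rather than rely on the order in the files.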
For more details about data purge jobs, refer to Concepts → Data Purge.