Incorta Copilot Clustering

Introduction to clustering

Clustering is a powerful unsupervised machine learning technique that identifies natural groupings within your data. When applied effectively, clustering reveals hidden patterns and segments that can inform targeted strategies across various business functions.

What is k-means clustering?

K-means clustering (used in the Copilot) is one of the most popular unsupervised machine learning algorithms. It works by partitioning data points into K distinct groups (clusters) based on their similarity in features or attributes. The algorithm:

Identifies K centroids (center points) in the data space
Assigns each data point to the nearest centroid
Recalculates the centroids based on the mean of all points assigned to that cluster
Repeats until convergence (minimal change in centroid positions)
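
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration of generic k-means, not the Copilot's actual implementation. The function name `kmeans`, its parameters, and the sample scores are all assumptions made for the example.

```python
import random

def kmeans(points, k, max_iters=100, tol=1e-9, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of the points assigned to it.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 4: stop once the centroids have essentially stopped moving.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tol:
            break
    return labels, centroids


# Hypothetical (recency, frequency) scores for six customers.
scores = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (8.0, 8.5), (8.2, 7.9), (7.9, 8.1)]
labels, centroids = kmeans(scores, k=2)
```

With this sample data, the three low-scoring customers end up in one cluster and the three high-scoring customers in the other. Which cluster receives which numeric label depends on the random initialization.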

By default, many systems set the number of clusters to 6, which often provides a balanced level of segmentation for business applications.

Clustering command syntax

The standard syntax for clustering follows this pattern:

/cluster [dimension] based on [measure(s)]

Example:

/cluster customers based on recency and frequency scores

Key command components:

Dimension: The high-cardinality attribute you want to cluster (e.g., customers, products, locations)
Measures: The metrics or attributes used to determine similarity (e.g., scores, spending, engagement metrics)
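
If you generate prompts programmatically, the pattern above can be assembled from its two components. The helper below is a hypothetical sketch: `build_cluster_command` is not part of any Incorta API, and joining three or more measures with commas is an assumption about how the prompt might be phrased.

```python
def build_cluster_command(dimension, measures):
    """Assemble a '/cluster ...' prompt from a dimension and one or more measures.

    Hypothetical helper for illustration; it only formats a string.
    """
    if not dimension or not measures:
        raise ValueError("a dimension and at least one measure are required")
    if len(measures) == 1:
        measure_text = measures[0]
    elif len(measures) == 2:
        measure_text = " and ".join(measures)
    else:
        # Assumed phrasing for three or more measures: "a, b, and c".
        measure_text = ", ".join(measures[:-1]) + ", and " + measures[-1]
    return f"/cluster {dimension} based on {measure_text}"


# Reproduces the example command shown above.
command = build_cluster_command("customers", ["recency", "frequency scores"])
```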

Interpreting cluster analysis results


The clustering tool provides a comprehensive summary section that helps you understand the generated segments at a glance. This summary typically includes:

Overview Statement: A brief introduction explaining how many segments were identified and the primary factors that differentiate them (e.g., "Based on the clustering analysis, six distinct customer segments have emerged, characterized by their purchase power and behavior.")
Segment Breakdown: A bulleted list of each segment with detailed descriptions that combine:
- Descriptive labels (e.g., "Low-value customers")
- Key metrics (e.g., "with minimal sales orders and average sales volume")
- Business implications where relevant
Strategic Guidance: Some summaries may conclude with overall strategic recommendations based on the segmentation results.

The summary section transforms complex data patterns into business language that can be easily communicated across your organization.

Within the returned dataset, each clustered item has an assigned cluster value.
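
Once each row carries a cluster value, a quick per-cluster profile helps check the generated segment descriptions against the numbers. The sketch below assumes a hypothetical result layout of (customer, orders, sales, cluster). The column order, the sample rows, and the function name `summarize_clusters` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical rows from a clustering result: (customer, orders, sales, cluster).
rows = [
    ("C001", 2, 100.0, 0),
    ("C002", 4, 200.0, 0),
    ("C003", 20, 5000.0, 1),
    ("C004", 30, 7000.0, 1),
]

def summarize_clusters(rows):
    """Return the size and average measures of each cluster."""
    grouped = defaultdict(list)
    for _customer, orders, sales, cluster in rows:
        grouped[cluster].append((orders, sales))
    summary = {}
    for cluster, members in sorted(grouped.items()):
        n = len(members)
        summary[cluster] = {
            "size": n,
            "avg_orders": sum(m[0] for m in members) / n,
            "avg_sales": sum(m[1] for m in members) / n,
        }
    return summary


summary = summarize_clusters(rows)
```

In k-means, per-cluster averages like these correspond to the centroids when they are computed on the same (unscaled) measures used for clustering, so they give a rough way to read a label such as "Low-value customers" against the underlying values.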

Additional features

Query Breakdown: Click the Query Breakdown button (visible in the top right of all three example panels) to see the SQL and logic used to generate the clusters.
- Within the Query Breakdown, the summary will provide the centroid values.

Common business applications

Customer segmentation

Clustering provides a data-driven approach to customer segmentation. By analyzing purchase patterns, engagement metrics, and demographic information, businesses can identify distinct customer groups.

Product categorization

Clustering can automatically categorize products based on:

Purchase patterns
Usage statistics
Feature similarities
Price points
Customer preferences

Geographic analysis

For location-based businesses, clustering helps identify:

High-density customer regions
Underserved markets
Regions with similar purchasing behaviors
Areas requiring targeted marketing approaches

Best practices for effective clustering

Select appropriate features: Choose metrics that meaningfully differentiate your data points
Validate results: Ensure clusters make business sense and are actionable
Review automatic segment names: While the system generates descriptive labels, review them to ensure they align with your business terminology
Update regularly: Re-cluster periodically as your data evolves
