Incorta Copilot Clustering

Introduction to clustering

Clustering is a powerful unsupervised machine learning technique that identifies natural groupings within your data. When applied effectively, clustering reveals hidden patterns and segments that can inform targeted strategies across various business functions.

What is k-means clustering?

K-means clustering (used in the Copilot) is one of the most popular unsupervised machine learning algorithms. It partitions data points into K distinct groups (clusters) based on their similarity in features or attributes. The algorithm:

  1. Selects K initial centroids (center points) in the data space
  2. Assigns each data point to the nearest centroid
  3. Recalculates each centroid as the mean of all points assigned to its cluster
  4. Repeats steps 2 and 3 until convergence (minimal change in centroid positions)
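
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the general k-means procedure, not Copilot's actual implementation. The function name kmeans, its parameters (init, max_iters, tol, seed), the use of Euclidean distance, and the sample points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, init=None, max_iters=100, tol=1e-6, seed=0):
    """Cluster `points` (equal-length numeric tuples) into k groups.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    # Step 1: choose k initial centroids (given explicitly, or k random data points).
    centroids = list(init) if init is not None else rng.sample(points, k)
    labels = [0] * len(points)

    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]

        # Step 3: recompute each centroid as the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])

        # Step 4: stop once no centroid moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break

    return centroids, labels


# Two well-separated groups of made-up (recency, frequency) scores.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2, init=[(1.0, 1.0), (8.0, 8.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing init makes the run reproducible. With random starting centroids, k-means can settle on different groupings from one run to the next.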

By default, many systems set the number of clusters (K) to 6, which often provides a balanced level of segmentation for business applications.

Clustering command syntax

The standard syntax for clustering follows this pattern:

/cluster [dimension] based on [measure(s)]

Example:

/cluster customers based on recency and frequency scores

Key command components:

  • Dimension: The high-cardinality attribute you want to cluster (e.g., customers, products, locations)
  • Measures: The metrics or attributes used to determine similarity (e.g., scores, spending, engagement metrics)
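
To make the two components concrete, the sketch below shows one way to picture the data behind the example command: each dimension value (a customer) becomes one feature vector built from the chosen measures, and similarity is the distance between vectors. The column names, the sample rows, the helper to_feature_vectors, and the choice of Euclidean distance are all assumptions for illustration. The source does not describe how Copilot prepares the data internally.

```python
import math

# Hypothetical rows: one per customer, with the two measures from the example.
rows = [
    {"customer": "C001", "recency_score": 5, "frequency_score": 4},
    {"customer": "C002", "recency_score": 1, "frequency_score": 1},
    {"customer": "C003", "recency_score": 4, "frequency_score": 5},
]


def to_feature_vectors(rows, dimension, measures):
    """Map each dimension value to a tuple of its measure values."""
    return {
        row[dimension]: tuple(float(row[m]) for m in measures)
        for row in rows
    }


vectors = to_feature_vectors(
    rows,
    dimension="customer",
    measures=["recency_score", "frequency_score"],
)

# A smaller distance means two customers are more similar on these measures.
d_far = math.dist(vectors["C001"], vectors["C002"])
d_near = math.dist(vectors["C001"], vectors["C003"])
print(d_near < d_far)  # True: C003 is closer to C001 than C002 is
```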

Interpreting cluster analysis results

Cluster summary

The clustering tool provides a comprehensive summary section that helps you understand the generated segments at a glance. This summary typically includes:

  • Overview Statement: A brief introduction explaining how many segments were identified and the primary factors that differentiate them (e.g., "Based on the clustering analysis, six distinct customer segments have emerged, characterized by their purchase power and behavior.")

  • Segment Breakdown: A bulleted list of each segment with detailed descriptions that combine:

    • Descriptive labels (e.g., "Low-value customers")
    • Key metrics (e.g., "with minimal sales orders and average sales volume")
    • Business implications where relevant
  • Strategic Guidance: Some summaries may conclude with overall strategic recommendations based on the segmentation results.

The summary section transforms complex data patterns into business language that can be easily communicated across your organization.

Within the returned dataset, each clustered item (for example, each customer) has an assigned cluster value.
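
Once each row carries a cluster value, a quick per-cluster profile helps check that the segments make sense. The following sketch assumes the returned dataset has been exported as a list of rows. The column name "cluster", the measure names, the sample values, and the helper summarize_clusters are hypothetical.

```python
from collections import defaultdict

# Hypothetical export of the returned dataset.
rows = [
    {"customer": "C001", "recency_score": 1, "frequency_score": 2, "cluster": 0},
    {"customer": "C002", "recency_score": 3, "frequency_score": 4, "cluster": 0},
    {"customer": "C003", "recency_score": 5, "frequency_score": 5, "cluster": 1},
]


def summarize_clusters(rows, cluster_key, measures):
    """Return the member count and mean of each measure for every cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)

    summary = {}
    for cluster, members in sorted(groups.items()):
        profile = {"count": len(members)}
        for m in measures:
            profile[m] = sum(r[m] for r in members) / len(members)
        summary[cluster] = profile
    return summary


summary = summarize_clusters(rows, "cluster", ["recency_score", "frequency_score"])
for cluster, profile in summary.items():
    print(cluster, profile)
```

If the averaged measures are the same ones used for clustering and were not rescaled beforehand, these per-cluster means correspond to the cluster centroids.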

Additional features

  • Query Breakdown: Click the Query Breakdown button (in the top right of each result panel) to see the SQL and logic used to generate the clusters.
    • The summary within the Query Breakdown lists the centroid values for each cluster.
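
Centroid values can also be used to estimate which cluster a new item would fall into, since k-means assigns each point to its nearest centroid. The sketch below uses made-up centroid values and a hypothetical helper, nearest_cluster. It assumes the centroids are expressed in the same units as the measures. If the measures were rescaled before clustering, raw values would need the same rescaling first.

```python
import math

# Made-up centroids: cluster value -> (recency score, frequency score).
centroids = {
    0: (1.0, 1.0),
    1: (8.0, 8.0),
}


def nearest_cluster(point, centroids):
    """Return the cluster whose centroid is closest to `point`."""
    return min(centroids, key=lambda c: math.dist(point, centroids[c]))


print(nearest_cluster((2.0, 1.5), centroids))  # 0
print(nearest_cluster((7.0, 9.0), centroids))  # 1
```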

Common business applications

Customer segmentation

Clustering provides a data-driven approach to customer segmentation. By analyzing purchase patterns, engagement metrics, and demographic information, businesses can identify distinct customer groups.

Product categorization

Clustering can automatically categorize products based on:

  • Purchase patterns
  • Usage statistics
  • Feature similarities
  • Price points
  • Customer preferences

Geographic analysis

For location-based businesses, clustering helps identify:

  • High-density customer regions
  • Underserved markets
  • Regions with similar purchasing behaviors
  • Areas requiring targeted marketing approaches

Best practices for effective clustering

  1. Select appropriate features: Choose metrics that meaningfully differentiate your data points
  2. Validate results: Ensure clusters make business sense and are actionable
  3. Review automatic segment names: While the system generates descriptive labels, review them to ensure they align with your business terminology
  4. Update regularly: Re-cluster periodically as your data evolves