OPEN-SOURCE SCRIPT

Static K-means Clustering | InvestorUnknown

Static K-Means Clustering is a machine-learning-driven market regime classifier designed for traders who want a data-driven structure instead of subjective indicators or manually drawn zones.

This script performs offline (static) K-means training on your chosen historical window, using four engineered features:
  • RSI (Momentum)
  • CCI (Price deviation / Mean reversion)
  • CMF (Money flow / Strength)
  • MACD Histogram (Trend acceleration)


It groups past market conditions into K distinct clusters (regimes). After training, every new bar is assigned to the nearest cluster via Euclidean distance in 4-dimensional standardized feature space.

This allows you to create models like:
  • Regime-based long/short filters
  • Volatility phase detectors
  • Trend vs. chop separation
  • Mean-reversion vs. breakout classification
  • Volume-enhanced money-flow regime shifts
  • Full machine-learning trading systems based solely on regimes


Note:
  • This script is not a universal ML strategy out of the box.
  • The user must engineer the feature set to match their trading style and target market.
  • K-means is a tool, not a ready-made system; this script provides the framework.


Core Idea
K-means clustering takes raw, unlabeled market observations and attempts to discover structure by grouping similar bars together.

Pine Script®
// STEP 1: DATA POINTS ON A COORDINATE PLANE
// We start with raw, unlabeled data scattered in 2D space (x/y).
// At this point, nothing is grouped; these are just observations.
// K-means will try to discover structure by grouping nearby points.
//
// [ASCII scatter plot: unlabeled points on x/y axes, x from 2 to 14, y from 2 to 12]
//
//
// STEP 2: RANDOMLY PLACE INITIAL CENTROIDS
// The algorithm begins by placing K centroids at random positions.
// These centroids act as the temporary "representatives" of clusters.
// Their starting positions heavily influence the first assignment step.
//
// [ASCII scatter plot: the same points plus two centroids, C1 × and C2 ×]
//
//
// STEP 3: ASSIGN POINTS TO NEAREST CENTROID
// Each point is compared to all centroids.
// Using simple Euclidean distance, each point joins the cluster
// of the centroid it is closest to.
// This creates a temporary grouping of the data.
//
// (Coloring concept shown using labels)
//
// - Points closer to C1 → Cluster 1
// - Points closer to C2 → Cluster 2
//
// [ASCII scatter plot: each point labeled 1 or 2 by its nearest centroid]
//
// (1 = assigned to Cluster 1, 2 = assigned to Cluster 2)
// At this stage, clusters are formed purely by distance.


Your chosen historical window becomes the static training dataset, and after fitting, the centroids never change again.

This makes the model:
  • Predictable
  • Repeatable
  • Consistent across backtests
  • Fast for live use (no recalculation of centroids every bar)


Static Training Window

You select a period with:
  • Training Start
  • Training End


Only bars inside this range are used to fit the K-means model. This window defines:
  • the market regime examples
  • the statistical distributions (means/std) for each feature
  • how the centroids will be positioned post-training


Bar coloring on the chart:
  • Bars before training = fully transparent
  • Training bars = gray
  • Post-training bars = full colored regimes
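
A minimal sketch of how a training window and the three bar states could be wired up. The input names, default dates, and the single post-training color are assumptions for illustration, not the script's actual code:

```pine
//@version=5
indicator("Training window sketch (illustrative)", overlay = true)

// Hypothetical inputs mirroring "Training Start" / "Training End".
train_start = input.time(timestamp("01 Jan 2020 00:00 +0000"), "Training Start")
train_end   = input.time(timestamp("01 Jan 2022 00:00 +0000"), "Training End")

// Only bars inside [train_start, train_end] would be used for fitting.
in_training   = time >= train_start and time <= train_end
post_training = time > train_end

// Pre-training: fully transparent. Training: gray.
// Post-training: a single stand-in color here (the real script colors by regime).
color bar_col = in_training ? color.gray : post_training ? color.blue : color.new(color.gray, 100)
barcolor(bar_col)
```

Comparing `time` against the two timestamps keeps the window independent of the chart timeframe.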


Feature Engineering (4D Input Vector)

Every bar during training becomes a 4-dimensional point: [rsi, cci, cmf, macd_histogram]
This combination balances momentum, mean reversion, money flow, and trend acceleration, giving the algorithm a richer "market fingerprint" per bar.
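
As a rough sketch, the four raw features could be computed like this. The lengths and the `hlc3` source for CCI are assumed defaults, and CMF is built by hand from the standard money-flow formula; the script's real inputs are not shown in this description:

```pine
//@version=5
indicator("Feature vector sketch (illustrative)")

// Hypothetical lengths.
rsi_len = input.int(14, "RSI length", minval = 1)
cci_len = input.int(20, "CCI length", minval = 1)
cmf_len = input.int(20, "CMF length", minval = 1)

// RSI (momentum) and CCI (price deviation) are built-ins.
f_rsi = ta.rsi(close, rsi_len)
f_cci = ta.cci(hlc3, cci_len)

// CMF: sum of money-flow volume over sum of volume.
// The multiplier is set to 0 on bars where high == low to avoid dividing by zero.
mf_mult = high == low ? 0.0 : ((close - low) - (high - close)) / (high - low)
f_cmf = math.sum(mf_mult * volume, cmf_len) / math.sum(volume, cmf_len)

// MACD histogram (trend acceleration); only the third tuple element is used.
[macd_line, signal_line, f_hist] = ta.macd(close, 12, 26, 9)

// The 4D point for this bar is [f_rsi, f_cci, f_cmf, f_hist].
plot(f_rsi, "RSI")
plot(f_cci, "CCI")
plot(f_cmf, "CMF")
plot(f_hist, "MACD histogram")
```

The plots make the scale mismatch between the features visible at a glance.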

Standardization
To prevent any feature from dominating due to scale differences (e.g., CMF near zero vs CCI ±200), all features are standardized:

Pine Script®
standardize(value, mean, std) => (value - mean) / std
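
One way the training mean and standard deviation could be frozen is with running sums that only update inside the training window. This is a sketch for a single feature (RSI); the window inputs and variable names are assumptions:

```pine
//@version=5
indicator("Standardization sketch (illustrative)")

train_start = input.time(timestamp("01 Jan 2020 00:00 +0000"), "Training Start")
train_end   = input.time(timestamp("01 Jan 2022 00:00 +0000"), "Training End")
in_training = time >= train_start and time <= train_end

x = ta.rsi(close, 14)

// Running count, sum, and sum of squares over training bars only.
// `var` keeps the values from bar to bar; they stop changing once training ends.
var float n = 0.0
var float sum_x = 0.0
var float sum_x2 = 0.0
if in_training and not na(x)
    n += 1
    sum_x += x
    sum_x2 += x * x

mean_x = n > 0 ? sum_x / n : na
var_x  = n > 0 ? math.max(sum_x2 / n - mean_x * mean_x, 0.0) : na
std_x  = math.sqrt(var_x)

standardize(value, mean, std) => (value - mean) / std

// Post-training bars are scaled with the frozen training statistics.
z = time > train_end and std_x > 0 ? standardize(x, mean_x, std_x) : na
plot(z, "Standardized RSI")
```

The `std_x > 0` guard skips bars where the training window was empty or the feature was constant.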


Centroid Initialization

Centroids start at diverse coordinates using various curves:
  • linear
  • sinusoidal
  • sign-preserving quadratic
  • tanh compression


Pine Script®
init_centroids() =>
    // Spread centroids across [-1, 1] using different shapes per feature
    for c = 0 to k_clusters - 1
        frac = k_clusters == 1 ? 0.0 : c / (k_clusters - 1.0) // 0 → 1
        v = frac * 2 - 1 // -1 → +1
        array.set(cent_rsi, c, v) // linear
        array.set(cent_cci, c, math.sin(v)) // sinusoidal
        array.set(cent_cmf, c, v * v * (v < 0 ? -1 : 1)) // quadratic sign-preserving
        array.set(cent_mac, c, tanh(v)) // compressed


This gives the initial centroids a varied, "random"-looking spread, even though true randomness is hard to achieve in Pine Script.

K-Means Iterative Refinement

The algorithm repeats these steps:
(A) Assignment step: each bar is assigned to the nearest centroid via Euclidean distance in 4D:
  • distance = sqrt(dx² + dy² + dz² + dw²)


(B) Update step: each centroid moves to the mean of the points assigned to it. Steps (A) and (B) repeat for a configurable number of iterations.
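
The two steps can be sketched on toy data. To stay short, this example uses 2D points instead of the script's 4D vectors; the point values, starting centroids, and helper name `nearest` are all hypothetical:

```pine
//@version=5
indicator("K-means refinement sketch (illustrative)")

iterations = input.int(10, "Iterations", minval = 1)

// Toy data: six 2D points in two loose groups.
var xs = array.from(-1.0, -0.8, -1.2, 1.0, 0.9, 1.1)
var ys = array.from(-1.0, -1.1, -0.9, 1.0, 1.2, 0.8)
// Two centroids with arbitrary starting positions.
var cx = array.from(-0.5, 0.5)
var cy = array.from(0.0, 0.0)

// (A) Index of the centroid nearest to point (px, py).
nearest(float px, float py) =>
    int best = 0
    float best_d = na
    for c = 0 to array.size(cx) - 1
        d = math.sqrt(math.pow(px - array.get(cx, c), 2) + math.pow(py - array.get(cy, c), 2))
        if na(best_d) or d < best_d
            best_d := d
            best := c
    best

// Fit once, on the first bar.
if barstate.isfirst
    for it = 1 to iterations
        sum_x = array.new_float(array.size(cx), 0.0)
        sum_y = array.new_float(array.size(cx), 0.0)
        cnt   = array.new_float(array.size(cx), 0.0)
        for i = 0 to array.size(xs) - 1
            c = nearest(array.get(xs, i), array.get(ys, i))
            array.set(sum_x, c, array.get(sum_x, c) + array.get(xs, i))
            array.set(sum_y, c, array.get(sum_y, c) + array.get(ys, i))
            array.set(cnt, c, array.get(cnt, c) + 1)
        // (B) Move each non-empty centroid to the mean of its points.
        for c = 0 to array.size(cx) - 1
            if array.get(cnt, c) > 0
                array.set(cx, c, array.get(sum_x, c) / array.get(cnt, c))
                array.set(cy, c, array.get(sum_y, c) / array.get(cnt, c))

plot(array.get(cx, 0), "Centroid 0 x")
plot(array.get(cx, 1), "Centroid 1 x")
```

With these toy points the centroids settle near (-1, -1) and (1, 1). Empty clusters keep their previous position, which is one common way to handle them and an assumption here.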

LIVE REGIME CLASSIFICATION

After training, each new bar is:
  • Standardized using the training mean/std
  • Compared to all centroids
  • Assigned to the nearest cluster
  • Colored according to its cluster
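
The four steps above might look roughly like this. The centroid arrays reuse the names from `init_centroids()`, but their values, the K = 2 setup, the training statistics, and the `classify` helper are hypothetical stand-ins:

```pine
//@version=5
indicator("Live classification sketch (illustrative)", overlay = true)

// Hypothetical frozen centroids for K = 2, in standardized feature space.
var cent_rsi = array.from(-1.0, 1.0)
var cent_cci = array.from(-1.0, 1.0)
var cent_cmf = array.from(-1.0, 1.0)
var cent_mac = array.from(-1.0, 1.0)

// Index of the nearest centroid by 4D Euclidean distance, or na if a feature is na.
classify(float z_rsi, float z_cci, float z_cmf, float z_mac) =>
    int best = na
    float best_d = na
    for c = 0 to array.size(cent_rsi) - 1
        dx = z_rsi - array.get(cent_rsi, c)
        dy = z_cci - array.get(cent_cci, c)
        dz = z_cmf - array.get(cent_cmf, c)
        dw = z_mac - array.get(cent_mac, c)
        d = math.sqrt(dx * dx + dy * dy + dz * dz + dw * dw)
        if not na(d) and (na(best_d) or d < best_d)
            best_d := d
            best := c
    best

// Raw features (assumed default lengths).
f_rsi = ta.rsi(close, 14)
f_cci = ta.cci(hlc3, 20)
mf_mult = high == low ? 0.0 : ((close - low) - (high - close)) / (high - low)
f_cmf = math.sum(mf_mult * volume, 20) / math.sum(volume, 20)
[macd_line, signal_line, f_mac] = ta.macd(close, 12, 26, 9)

// Standardize with hypothetical frozen training statistics (mean, std).
z_rsi = (f_rsi - 50.0) / 15.0
z_cci = (f_cci - 0.0) / 100.0
z_cmf = (f_cmf - 0.0) / 0.2
z_mac = (f_mac - 0.0) / 1.0

cluster = classify(z_rsi, z_cci, z_cmf, z_mac)

// Color the bar by cluster; bars without a cluster stay uncolored.
color bar_col = cluster == 0 ? color.red : cluster == 1 ? color.green : na
barcolor(bar_col)
```

Because the centroids and statistics are constants here, nothing is refit on new bars.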


No re-training occurs. This ensures:
  • No lookahead bias on post-training bars
  • Clean historical testing
  • Stable regimes over time


CLUSTER BEHAVIOR & TRADING LOGIC

Clusters (0, 1, 2, 3…) hold no inherent meaning. The user defines what each cluster does.
Example of custom actions:
  • Cluster 0 → Cash
  • Cluster 1 → Long
  • Cluster 2 → Short
  • Cluster 3+ → Cash (noise regime)



This flexibility means:
  • One trader might have cluster 0 as consolidation.
  • Another might repurpose it as a breakout-loading zone.
  • A third might ignore 3 clusters entirely.


[Snapshot: example on ETHUSD]

Important Note:
  • Any change to the parameters, chart timeframe, or ticker can change the "order" of the clusters.
  • The script does NOT assume that any cluster carries an actionable bias; the user decides.


PERFORMANCE METRICS & ROC TABLE

The indicator computes average 1-bar ROC for each cluster in:
  • Training set
  • Test (live) set


This helps measure:
  • Cluster profitability consistency
  • Regime forward predictability
  • Whether a regime is noise, trend, or reversion-biased
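
A sketch of how per-cluster average ROC could be accumulated separately for training and test bars. Two assumptions are baked in: the regime series is a stand-in (`bar_index % k_clusters`), and each bar's ROC is credited to the previous bar's cluster to get a forward-looking reading:

```pine
//@version=5
indicator("Per-cluster ROC sketch (illustrative)")

k_clusters  = 3
train_start = input.time(timestamp("01 Jan 2020 00:00 +0000"), "Training Start")
train_end   = input.time(timestamp("01 Jan 2022 00:00 +0000"), "Training End")

// Stand-in regime labels (hypothetical); the real script uses the K-means cluster.
cluster = bar_index % k_clusters

// Running sums and counts of 1-bar ROC per cluster, split by data set.
var train_sum = array.new_float(k_clusters, 0.0)
var train_cnt = array.new_float(k_clusters, 0.0)
var test_sum  = array.new_float(k_clusters, 0.0)
var test_cnt  = array.new_float(k_clusters, 0.0)

roc1 = ta.roc(close, 1)
prev = cluster[1] // assumption: this bar's ROC belongs to the previous bar's cluster
if not na(roc1) and not na(prev)
    if time[1] >= train_start and time[1] <= train_end
        array.set(train_sum, prev, array.get(train_sum, prev) + roc1)
        array.set(train_cnt, prev, array.get(train_cnt, prev) + 1)
    else if time[1] > train_end
        array.set(test_sum, prev, array.get(test_sum, prev) + roc1)
        array.set(test_cnt, prev, array.get(test_cnt, prev) + 1)

// Average ROC (%) for cluster c, or na if the cluster has no samples yet.
avg_roc(array<float> sums, array<float> counts, int c) =>
    array.get(counts, c) > 0 ? array.get(sums, c) / array.get(counts, c) : na

plot(avg_roc(train_sum, train_cnt, 0), "Cluster 0 train avg ROC %")
plot(avg_roc(test_sum, test_cnt, 0), "Cluster 0 test avg ROC %")
```

A cluster whose training and test averages share sign and rough size is behaving consistently; a large gap suggests the regime is mostly noise.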



EQUITY SIMULATION & FEES

The equity simulation is designed for realistic close-to-close backtesting.
The position held on each bar is determined by the previous bar's cluster.
Fees are applied only on regime switches:
  • Staying long → no fee
  • Switching long→short → fee applied
  • Switching any→cash → fee applied


The fee input is entered as a percentage; the script converts it internally.
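
The fee rules can be illustrated with a small close-to-close simulation. The regime series, the cluster-to-position mapping, and the single flat fee per switch are assumptions; the description does not say whether a long→short flip is charged once or twice:

```pine
//@version=5
indicator("Equity simulation sketch (illustrative)")

fee_pct = input.float(0.1, "Fee (%)", minval = 0.0)
fee = fee_pct / 100.0 // percentage converted to a fraction

// Stand-in regime series (hypothetical); the real script uses the K-means cluster.
cluster = close > ta.sma(close, 50) ? 1 : 2

// Hypothetical mapping: cluster 1 → long, cluster 2 → short, anything else → cash.
target = cluster == 1 ? 1 : cluster == 2 ? -1 : 0

// Position on this bar comes from the previous bar's cluster.
pos = nz(target[1], 0)
ret = nz(close / close[1] - 1.0) // close-to-close return
switched = pos != nz(pos[1], 0)

// Compound the return, charging the fee only on bars where the position changes.
var float equity = 1.0
equity := equity * (1.0 + pos * ret) * (switched ? 1.0 - fee : 1.0)

plot(equity, "Equity")
```

Lagging the position by one bar means the simulation never trades on a cluster it could not have known at the prior close.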


Disclaimers
⚠️ This indicator uses machine-learning but does not predict the future. It classifies similarity to past regimes, nothing more.
⚠️ Backtest results are not indicative of future performance.
⚠️ Clusters have no inherent “bullish” or “bearish” meaning. You must interpret them based on your testing and your own feature engineering.
