Constructs a KMeans instance with default parameters: {k: 2, maxIterations: 20, runs: 1, initializationMode: "k-means||", initializationSteps: 5, epsilon: 1e-4, seed: random}.
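As a minimal sketch of how these defaults map onto the builder-style setters, the snippet below constructs one instance with no arguments and one with every listed parameter written out. It assumes the standard setter names of org.apache.spark.mllib.clustering.KMeans; the fixed seed value is an illustrative choice, not a library default.

```scala
import org.apache.spark.mllib.clustering.KMeans

// A no-argument instance carries the defaults listed above.
val defaults = new KMeans()

// The same configuration written out explicitly. The seed is fixed here
// (an illustrative choice) because the default seed is random.
val explicit = new KMeans()
  .setK(2)
  .setMaxIterations(20)
  .setInitializationMode("k-means||")
  .setInitializationSteps(5)
  .setEpsilon(1e-4)
  .setSeed(42L)
```

No SparkContext is needed at this point; a context only becomes necessary once training runs on an RDD.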
The distance threshold within which centers are considered to have converged.
The initialization algorithm. This can be either "random" or "k-means||".
Number of steps for the k-means|| initialization mode.
Number of clusters to create (k).
Maximum number of iterations allowed.
This function has no effect since Spark 2.0.0.
The random seed for cluster initialization.
Train a K-means model on the given set of points; data should be cached for high performance, because this is an iterative algorithm.
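The training call can be sketched as follows, assuming the method is `run(RDD[Vector])` as in org.apache.spark.mllib.clustering.KMeans. The local master, application name, seed, and sample points are all made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Local master and app name are illustrative choices, not requirements.
val conf = new SparkConf().setAppName("kmeans-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Two well-separated groups of 2-D points (made-up data for the sketch).
val points = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.2), Vectors.dense(0.2, 0.1),
  Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 8.9), Vectors.dense(8.9, 9.2)
)).cache() // cached because run() makes multiple passes over the data

val model = new KMeans().setK(2).setMaxIterations(20).setSeed(42L).run(points)

// Assign one new point near each group to a cluster index.
val a = model.predict(Vectors.dense(0.0, 0.1))
val b = model.predict(Vectors.dense(9.0, 9.1))
model.clusterCenters.foreach(println)

sc.stop()
```

Without the `cache()` call the input RDD would be recomputed from its lineage on every iteration, which is the cost the description warns about.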
Set the distance threshold within which centers are considered to have converged. If all centers move less than this Euclidean distance between iterations, iteration stops for that run.
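The stopping rule can be illustrated without Spark. The sketch below is not Spark's internal code; `euclidean` and `converged` are hypothetical helpers that restate the rule as described: stop once every center has moved less than epsilon.

```scala
// Euclidean distance between two points of equal dimension.
def euclidean(a: Array[Double], b: Array[Double]): Double =
  math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

// True when every center moved less than epsilon between two iterations.
def converged(oldCenters: Seq[Array[Double]],
              newCenters: Seq[Array[Double]],
              epsilon: Double): Boolean =
  oldCenters.zip(newCenters).forall { case (o, n) => euclidean(o, n) < epsilon }

val before = Seq(Array(0.0, 0.0), Array(9.0, 9.0))
val smallShift = Seq(Array(0.00003, 0.00004), Array(9.0, 9.0)) // first center moved 5e-5
val largeShift = Seq(Array(0.3, 0.4), Array(9.0, 9.0))         // first center moved 0.5

val stops = converged(before, smallShift, 1e-4)     // true: every move is below 1e-4
val continues = converged(before, largeShift, 1e-4) // false: one center moved 0.5
```

A single center that moves by epsilon or more is enough to keep the run iterating, so a smaller epsilon generally means more iterations before the maximum is reached.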
Set the initial starting point, bypassing the random or k-means|| initialization. The condition model.k == this.k must be met; otherwise an IllegalArgumentException is thrown.
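A short sketch of supplying starting centers, assuming the setter is `setInitialModel` and that `KMeansModel` can be built directly from an array of center vectors. The center values are hypothetical.

```scala
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

// Hand-picked starting centers (hypothetical values for illustration).
val initial = new KMeansModel(Array(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(9.0, 9.0)
))

// k is set before the initial model so that model.k == this.k holds.
val seeded = new KMeans().setK(2).setInitialModel(initial)

// A mismatch (3 clusters requested, 2 centers supplied) is rejected.
val rejected =
  try { new KMeans().setK(3).setInitialModel(initial); false }
  catch { case _: IllegalArgumentException => true }
```

Setting k first is the safer ordering in this sketch, since the condition compares the model against the value of k the instance holds.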
Set the initialization algorithm. This can be either "random" to choose random points as initial cluster centers, or "k-means||" to use a parallel variant of k-means++ (Bahmani et al., Scalable K-Means++, VLDB 2012). Default: k-means||.
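For illustration, the two modes can be selected either by the literal strings or by constants on the companion object. The sketch assumes the constants `KMeans.RANDOM` and `KMeans.K_MEANS_PARALLEL` and the getter `getInitializationMode` from org.apache.spark.mllib.clustering.KMeans.

```scala
import org.apache.spark.mllib.clustering.KMeans

// Choose random points as the initial cluster centers.
val randomInit = new KMeans().setInitializationMode(KMeans.RANDOM)

// Use the parallel k-means++ variant, which is also the default.
val parallelInit = new KMeans().setInitializationMode(KMeans.K_MEANS_PARALLEL)
```

Using the constants rather than string literals avoids typos in the mode name.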
Set the number of steps for the k-means|| initialization mode. This is an advanced setting -- the default of 5 is almost always enough. Default: 5.
Set the number of clusters to create (k). Default: 2.
Set the maximum number of iterations allowed. Default: 20.
This function has no effect since Spark 2.0.0.
Set the random seed for cluster initialization.
K-means clustering with a k-means++-like initialization mode (the k-means|| algorithm by Bahmani et al.).
This is an iterative algorithm that will make multiple passes over the data, so any RDDs given to it should be cached by the user.