Strategy

Instance Constructors

new Strategy(algo: Algo.Algo, impurity: Impurity, maxDepth: Int, numClasses: Int, maxBins: Int, categoricalFeaturesInfo: Map[Integer, Integer])

Java-friendly constructor for org.apache.spark.mllib.tree.configuration.Strategy
new Strategy(algo: Algo.Algo, impurity: Impurity, maxDepth: Int, numClasses: Int = 2, maxBins: Int = 32, quantileCalculationStrategy: QuantileStrategy.QuantileStrategy = ..., categoricalFeaturesInfo: Map[Int, Int] = ..., minInstancesPerNode: Int = 1, minInfoGain: Double = 0.0, maxMemoryInMB: Int = 256, subsamplingRate: Double = 1, useNodeIdCache: Boolean = false, checkpointDir: Option[String] = scala.None, checkpointInterval: Int = 10)

algo
Learning goal. Supported: org.apache.spark.mllib.tree.configuration.Algo.Classification, org.apache.spark.mllib.tree.configuration.Algo.Regression
impurity
Criterion used for information gain calculation. Supported for Classification: org.apache.spark.mllib.tree.impurity.Gini, org.apache.spark.mllib.tree.impurity.Entropy. Supported for Regression: org.apache.spark.mllib.tree.impurity.Variance.
maxDepth
Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
numClasses
Number of classes for classification. (Ignored for regression.) Default value is 2 (binary classification).
maxBins
Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.
quantileCalculationStrategy
Algorithm for calculating quantiles. Supported: org.apache.spark.mllib.tree.configuration.QuantileStrategy.Sort
categoricalFeaturesInfo
Map from the index of each categorical feature to the number of discrete values it takes. For example, an entry (n -> k) means that feature n is categorical with k categories 0, 1, 2, ..., k-1. Feature indices are zero-based.
minInstancesPerNode
Minimum number of instances each child must have after a split. Default value is 1. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split is not considered valid.
minInfoGain
Minimum information gain a split must achieve. Default value is 0.0. A split with less information gain than minInfoGain is not considered valid.
maxMemoryInMB
Maximum memory in MB allocated to histogram aggregation. Default value is 256 MB.
subsamplingRate
Fraction of the training data used for learning the decision tree.
useNodeIdCache
If true, the algorithm maintains a separate RDD that caches the node Id of each row instead of passing trees to executors.
checkpointDir
When the node Id cache is used, it helps to checkpoint the cache periodically. This is the directory used for those checkpoints.
checkpointInterval
How often to checkpoint when the node Id cache gets updated. For example, 10 means that the cache is checkpointed every 10 updates.
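A minimal sketch of calling the primary constructor from Scala, assuming the Spark MLlib artifact for this version is on the classpath. Named arguments let the call skip parameters whose defaults are acceptable. The object name, feature indices, and category counts are illustrative and do not come from the Spark documentation.

```scala
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.Gini

object StrategyConstructionExample {
  // Three-class classifier. Feature 0 has 4 categories and feature 3 has 2
  // (illustrative values); every other feature is treated as continuous.
  val strategy: Strategy = new Strategy(
    algo = Algo.Classification,
    impurity = Gini,
    maxDepth = 5,
    numClasses = 3,
    categoricalFeaturesInfo = Map(0 -> 4, 3 -> 2),
    minInstancesPerNode = 2)
  // Parameters that were not named above (maxBins, minInfoGain, ...) keep
  // the defaults listed in the constructor signature.
}
```

Because the second constructor takes a Java map and has no default values, a call that relies on defaults or names its arguments resolves to the primary constructor.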

Value Members

final def !=(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def !=(arg0: Any): Boolean

Definition Classes
Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def ==(arg0: Any): Boolean

Definition Classes
Any
var algo: Algo.Algo

Learning goal. Supported: org.apache.spark.mllib.tree.configuration.Algo.Classification, org.apache.spark.mllib.tree.configuration.Algo.Regression
final def asInstanceOf[T0]: T0

Definition Classes
Any
var categoricalFeaturesInfo: Map[Int, Int]

Map from the index of each categorical feature to the number of discrete values it takes. For example, an entry (n -> k) means that feature n is categorical with k categories 0, 1, 2, ..., k-1. Feature indices are zero-based.
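To make the (n -> k) convention concrete, the following sketch checks whether a feature row is consistent with a given map. It is plain Scala with no Spark dependency; the object and the helper isValidRow are hypothetical and are not part of the Strategy API.

```scala
object CategoricalInfoSketch {
  // Feature 0 is binary (values 0, 1); feature 4 has 10 categories (0 to 9).
  val categoricalFeaturesInfo: Map[Int, Int] = Map(0 -> 2, 4 -> 10)

  /** True if every categorical feature holds an integer value in [0, k). */
  def isValidRow(features: Array[Double], info: Map[Int, Int]): Boolean =
    info.forall { case (index, arity) =>
      index >= 0 && index < features.length && {
        val v = features(index)
        v == math.floor(v) && v >= 0 && v < arity
      }
    }
}
```

Features that do not appear in the map, such as indices 1 to 3 here, are unconstrained by this check.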
var checkpointDir: Option[String]

When the node Id cache is used, it helps to checkpoint the cache periodically. This is the directory used for those checkpoints.
var checkpointInterval: Int

How often to checkpoint when the node Id cache gets updated. For example, 10 means that the cache is checkpointed every 10 updates.
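The "every 10 updates" rule can be sketched as a simple modulus test. This is an illustration only: it assumes updates are counted from 1, and the object and function names are hypothetical rather than taken from Spark.

```scala
object CheckpointSchedule {
  /** True on every checkpointInterval-th update, counting updates from 1. */
  def shouldCheckpoint(updateCount: Int, checkpointInterval: Int): Boolean = {
    require(checkpointInterval > 0, "checkpointInterval must be positive")
    updateCount > 0 && updateCount % checkpointInterval == 0
  }
}
```

Under these assumptions, an interval of 10 over 35 updates would trigger checkpoints at updates 10, 20, and 30.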
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def copy: Strategy

Returns a shallow copy of this instance.
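One possible use of copy is deriving a variant of a base configuration without mutating the original. The sketch below assumes Spark MLlib is on the classpath; the object name and the depth values are illustrative.

```scala
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.Gini

object StrategyCopyExample {
  val base: Strategy = new Strategy(Algo.Classification, Gini, maxDepth = 3)

  // The copy is a separate Strategy instance, so reassigning one of its
  // fields leaves the base configuration untouched.
  val deeper: Strategy = base.copy
  deeper.maxDepth = 8
}
```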
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def getAlgo(): Algo.Algo
def getCategoricalFeaturesInfo(): Map[Int, Int]
def getCheckpointDir(): Option[String]
def getCheckpointInterval(): Int
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def getImpurity(): Impurity
def getMaxBins(): Int
def getMaxDepth(): Int
def getMaxMemoryInMB(): Int
def getMinInfoGain(): Double
def getMinInstancesPerNode(): Int
def getNumClasses(): Int
def getQuantileCalculationStrategy(): QuantileStrategy.QuantileStrategy
def getSubsamplingRate(): Double
def getUseNodeIdCache(): Boolean
def hashCode(): Int

Definition Classes
AnyRef → Any
var impurity: Impurity

Criterion used for information gain calculation. Supported for Classification: org.apache.spark.mllib.tree.impurity.Gini, org.apache.spark.mllib.tree.impurity.Entropy. Supported for Regression: org.apache.spark.mllib.tree.impurity.Variance.
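For readers unfamiliar with the three criteria, here is a sketch of their textbook definitions in plain Scala. It is not Spark's implementation, and the object and function names are hypothetical. Entropy uses a base-2 logarithm here, which is an assumption of this sketch.

```scala
object ImpuritySketch {
  /** Gini impurity: 1 minus the sum of squared class proportions. */
  def gini(counts: Seq[Double]): Double = {
    val total = counts.sum
    if (total == 0.0) 0.0
    else 1.0 - counts.map { c => val p = c / total; p * p }.sum
  }

  /** Entropy in bits: minus the sum of p * log2(p) over non-empty classes. */
  def entropy(counts: Seq[Double]): Double = {
    val total = counts.sum
    if (total == 0.0) 0.0
    else counts.filter(_ > 0.0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2.0)
    }.sum
  }

  /** Population variance of the labels, the criterion for regression. */
  def variance(labels: Seq[Double]): Double =
    if (labels.isEmpty) 0.0
    else {
      val mean = labels.sum / labels.size
      labels.map(y => (y - mean) * (y - mean)).sum / labels.size
    }
}
```

All three are zero for a pure node (a single class, or identical labels) and grow as the node becomes more mixed.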
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def isMulticlassClassification: Boolean
def isMulticlassWithCategoricalFeatures: Boolean
var maxBins: Int

Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.
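The idea of discretizing a continuous feature into at most maxBins bins can be sketched with sort-based, equal-frequency thresholds. This is an illustration under simplifying assumptions, not Spark's binning code; the object and both helper names are hypothetical.

```scala
object BinningSketch {
  /** Picks up to maxBins - 1 thresholds at evenly spaced ranks of the sorted sample. */
  def quantileSplits(sample: Seq[Double], maxBins: Int): Seq[Double] = {
    require(sample.nonEmpty, "sample must not be empty")
    require(maxBins >= 2, "need at least two bins to split")
    val sorted = sample.sorted.toIndexedSeq
    (1 until maxBins)
      .map(i => math.max(0, i * sorted.length / maxBins - 1))
      .map(sorted)   // look up the value at each chosen rank
      .distinct      // repeated values would create empty bins
  }

  /** Bin k holds values v with splits(k - 1) < v <= splits(k); the last bin is open-ended. */
  def binIndex(value: Double, splits: Seq[Double]): Int = {
    val i = splits.indexWhere(value <= _)
    if (i >= 0) i else splits.length
  }
}
```

With more bins the thresholds sit closer together, which is the higher granularity the description refers to, at the cost of more candidate splits to evaluate.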
var maxDepth: Int

Maximum depth of the tree. E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes.
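The two cases in the description generalize to upper bounds for a binary tree: at most 2^depth leaves and 2^depth - 1 internal nodes. A small sketch of that arithmetic follows; the object and function names are hypothetical.

```scala
object TreeDepthBounds {
  /** Upper bound on leaf nodes in a binary tree of the given depth. */
  def maxLeaves(depth: Int): Long = {
    require(depth >= 0 && depth <= 62, "depth must be between 0 and 62")
    1L << depth
  }

  /** Upper bound on internal (splitting) nodes. */
  def maxInternalNodes(depth: Int): Long = maxLeaves(depth) - 1

  /** Upper bound on all nodes: 2^(depth + 1) - 1. */
  def maxNodes(depth: Int): Long = maxLeaves(depth) + maxInternalNodes(depth)
}
```

A learned tree usually has fewer nodes than these bounds, because splitting stops early wherever no valid split exists.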
var maxMemoryInMB: Int

Maximum memory in MB allocated to histogram aggregation. Default value is 256 MB.
var minInfoGain: Double

Minimum information gain a split must achieve. Default value is 0.0. A split with less information gain than minInfoGain is not considered valid.
var minInstancesPerNode: Int

Minimum number of instances each child must have after a split. Default value is 1. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split is not considered valid.
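The two validity rules (minInfoGain and minInstancesPerNode) can be sketched together as one predicate. The code below is plain Scala and only illustrates the rules as documented; it is not Spark's split-selection code, the names are hypothetical, and the gain formula is the standard weighted-impurity definition.

```scala
object SplitValidity {
  final case class Candidate(leftCount: Long, rightCount: Long, infoGain: Double)

  /** Parent impurity minus the instance-weighted impurity of the two children. */
  def infoGain(parentImpurity: Double,
               leftImpurity: Double, leftCount: Long,
               rightImpurity: Double, rightCount: Long): Double = {
    val total = (leftCount + rightCount).toDouble
    require(total > 0, "a split needs at least one instance")
    parentImpurity - (leftCount / total) * leftImpurity - (rightCount / total) * rightImpurity
  }

  /** A split is valid only if both children are large enough and the gain is sufficient. */
  def isValid(c: Candidate, minInstancesPerNode: Int = 1, minInfoGain: Double = 0.0): Boolean =
    c.leftCount >= minInstancesPerNode &&
      c.rightCount >= minInstancesPerNode &&
      c.infoGain >= minInfoGain
}
```

Raising either threshold rejects more candidate splits, which tends to produce smaller trees.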
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
var numClasses: Int

Number of classes for classification. (Ignored for regression.) Default value is 2 (binary classification).
var quantileCalculationStrategy: QuantileStrategy.QuantileStrategy

Algorithm for calculating quantiles. Supported: org.apache.spark.mllib.tree.configuration.QuantileStrategy.Sort
def setAlgo(algo: String): Unit

Sets Algorithm using a String.
def setAlgo(arg0: Algo.Algo): Unit
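A sketch of switching an existing configuration to regression through the String overload. It assumes Spark MLlib is on the classpath and that the accepted strings match the enumeration names "Classification" and "Regression"; that spelling is an assumption and is not stated on this page. The object name is hypothetical.

```scala
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.{Gini, Variance}

object StrategySetAlgoExample {
  val strategy: Strategy = new Strategy(Algo.Classification, Gini, maxDepth = 4)

  // Assumed spelling: the string mirrors the Algo enumeration value name.
  strategy.setAlgo("Regression")
  // Variance is the impurity listed as supported for regression, so the
  // impurity is changed together with the learning goal.
  strategy.setImpurity(Variance)
}
```

From Scala, the typed overload setAlgo(Algo.Regression) avoids depending on string spelling.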
def setCategoricalFeaturesInfo(categoricalFeaturesInfo: Map[Integer, Integer]): Unit

Sets categoricalFeaturesInfo using a Java Map.
def setCategoricalFeaturesInfo(arg0: Map[Int, Int]): Unit
def setCheckpointDir(arg0: Option[String]): Unit
def setCheckpointInterval(arg0: Int): Unit
def setImpurity(arg0: Impurity): Unit
def setMaxBins(arg0: Int): Unit
def setMaxDepth(arg0: Int): Unit
def setMaxMemoryInMB(arg0: Int): Unit
def setMinInfoGain(arg0: Double): Unit
def setMinInstancesPerNode(arg0: Int): Unit
def setNumClasses(arg0: Int): Unit
def setQuantileCalculationStrategy(arg0: QuantileStrategy.QuantileStrategy): Unit
def setSubsamplingRate(arg0: Double): Unit
def setUseNodeIdCache(arg0: Boolean): Unit
var subsamplingRate: Double

Fraction of the training data used for learning the decision tree.
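As a rough illustration of what a subsampling fraction means, the sketch below keeps each row independently with probability subsamplingRate. This is a generic Bernoulli sample in plain Scala; how Spark actually draws its sample may differ, and the object and function names are hypothetical.

```scala
import scala.util.Random

object SubsampleSketch {
  /** Keeps each row with probability subsamplingRate, reproducibly for a given seed. */
  def subsample[T](rows: Seq[T], subsamplingRate: Double, seed: Long): Seq[T] = {
    require(subsamplingRate > 0.0 && subsamplingRate <= 1.0,
      "subsamplingRate must be in (0, 1]")
    val rng = new Random(seed)
    // nextDouble() lies in [0, 1), so a rate of 1.0 keeps every row.
    rows.filter(_ => rng.nextDouble() < subsamplingRate)
  }
}
```

With the default rate of 1 the whole training set is used; smaller rates trade accuracy for speed and add randomness between runs with different seeds.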
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
var useNodeIdCache: Boolean

If true, the algorithm maintains a separate RDD that caches the node Id of each row instead of passing trees to executors.
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

class Strategy extends Serializable

Inherited from Serializable

Inherited from AnyRef

Inherited from Any
