org.apache.spark.mllib.tree.configuration

Strategy

Related Docs: object Strategy | package configuration

class Strategy extends Serializable

Stores all the configuration options for tree construction.

Annotations
@Since( "1.0.0" )
Source
Strategy.scala
Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new Strategy(algo: Algo.Algo, impurity: Impurity, maxDepth: Int, numClasses: Int, maxBins: Int, categoricalFeaturesInfo: Map[Integer, Integer])

    Java-friendly constructor for org.apache.spark.mllib.tree.configuration.Strategy

    Annotations
    @Since( "1.1.0" )
  2. new Strategy(algo: Algo.Algo, impurity: Impurity, maxDepth: Int, numClasses: Int = 2, maxBins: Int = 32, quantileCalculationStrategy: QuantileStrategy.QuantileStrategy = Sort, categoricalFeaturesInfo: Map[Int, Int] = Map[Int, Int](), minInstancesPerNode: Int = 1, minInfoGain: Double = 0.0, maxMemoryInMB: Int = 256, subsamplingRate: Double = 1, useNodeIdCache: Boolean = false, checkpointInterval: Int = 10)

    algo

    Learning goal. Supported: org.apache.spark.mllib.tree.configuration.Algo.Classification, org.apache.spark.mllib.tree.configuration.Algo.Regression

    impurity

    Criterion used for information gain calculation. Supported for Classification: org.apache.spark.mllib.tree.impurity.Gini, org.apache.spark.mllib.tree.impurity.Entropy. Supported for Regression: org.apache.spark.mllib.tree.impurity.Variance.

    maxDepth

    Maximum depth of the tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes).

    numClasses

    Number of classes for classification. (Ignored for regression.) Default value is 2 (binary classification).

    maxBins

    Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.

    quantileCalculationStrategy

    Algorithm for calculating quantiles. Supported: org.apache.spark.mllib.tree.configuration.QuantileStrategy.Sort

    categoricalFeaturesInfo

    A map storing information about the categorical variables and the number of discrete values they take. An entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.

    minInstancesPerNode

    Minimum number of instances each child must have after a split. Default value is 1. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split is not considered valid.

    minInfoGain

    Minimum information gain a split must get. Default value is 0.0. If a split has less information gain than minInfoGain, this split will not be considered as a valid split.

    maxMemoryInMB

    Maximum memory in MB allocated to histogram aggregation. Default value is 256 MB. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size.

    subsamplingRate

    Fraction of the training data used for learning the decision tree.

    useNodeIdCache

    If this is true, instead of passing trees to executors, the algorithm maintains a separate RDD that caches the node Id for each row.

    checkpointInterval

    How often to checkpoint when the node Id cache gets updated. E.g. 10 means that the cache will get checkpointed every 10 updates. If the checkpoint directory is not set in org.apache.spark.SparkContext, this setting is ignored.

    Annotations
    @Since( "1.3.0" )
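    The parameter list above maps directly onto named arguments. The following is a minimal sketch, assuming spark-mllib is on the classpath; the object name StrategyConstructionSketch and the chosen values (3 classes, feature 0 categorical with 4 categories) are hypothetical and only illustrate the signature.

```scala
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.Gini

object StrategyConstructionSketch {
  // Hypothetical settings: a 3-class problem in which feature 0 is
  // categorical with 4 categories, indexed 0 to 3.
  val strategy: Strategy = new Strategy(
    algo = Algo.Classification,
    impurity = Gini,
    maxDepth = 5,
    numClasses = 3,
    categoricalFeaturesInfo = Map(0 -> 4),
    minInstancesPerNode = 2
  )

  def main(args: Array[String]): Unit = {
    // Parameters that were not passed keep the defaults from the signature.
    println(s"maxBins = ${strategy.maxBins}")
    println(s"maxMemoryInMB = ${strategy.maxMemoryInMB}")
    println(s"minInfoGain = ${strategy.minInfoGain}")
  }
}
```

    Building a Strategy does not need a SparkContext; the object only holds configuration until it is handed to a tree trainer.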

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. var algo: Algo.Algo

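    Learning goal. Supported: org.apache.spark.mllib.tree.configuration.Algo.Classification, org.apache.spark.mllib.tree.configuration.Algo.Regression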

  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. var categoricalFeaturesInfo: Map[Int, Int]

    A map storing information about the categorical variables and the number of discrete values they take. An entry (n -> k) indicates that feature n is categorical with k categories indexed from 0: {0, 1, ..., k-1}.

    Annotations
    @Since( "1.0.0" )
  7. var checkpointInterval: Int

    How often to checkpoint when the node Id cache gets updated. E.g. 10 means that the cache will get checkpointed every 10 updates. If the checkpoint directory is not set in org.apache.spark.SparkContext, this setting is ignored.

    Annotations
    @Since( "1.2.0" )
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def copy: Strategy

    Returns a shallow copy of this instance.

    Annotations
    @Since( "1.2.0" )
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. def getAlgo(): Algo.Algo

    Annotations
    @Since( "1.0.0" )
  14. def getCategoricalFeaturesInfo(): Map[Int, Int]

    Annotations
    @Since( "1.0.0" )
  15. def getCheckpointInterval(): Int

    Annotations
    @Since( "1.2.0" )
  16. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  17. def getImpurity(): Impurity

    Annotations
    @Since( "1.0.0" )
  18. def getMaxBins(): Int

    Annotations
    @Since( "1.0.0" )
  19. def getMaxDepth(): Int

    Annotations
    @Since( "1.0.0" )
  20. def getMaxMemoryInMB(): Int

    Annotations
    @Since( "1.0.0" )
  21. def getMinInfoGain(): Double

    Annotations
    @Since( "1.2.0" )
  22. def getMinInstancesPerNode(): Int

    Annotations
    @Since( "1.2.0" )
  23. def getNumClasses(): Int

    Annotations
    @Since( "1.2.0" )
  24. def getQuantileCalculationStrategy(): QuantileStrategy.QuantileStrategy

    Annotations
    @Since( "1.0.0" )
  25. def getSubsamplingRate(): Double

    Annotations
    @Since( "1.2.0" )
  26. def getUseNodeIdCache(): Boolean

    Annotations
    @Since( "1.2.0" )
  27. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  28. var impurity: Impurity

    Criterion used for information gain calculation. Supported for Classification: org.apache.spark.mllib.tree.impurity.Gini, org.apache.spark.mllib.tree.impurity.Entropy. Supported for Regression: org.apache.spark.mllib.tree.impurity.Variance.

    Annotations
    @Since( "1.0.0" )
  29. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  30. def isMulticlassClassification: Boolean

    Annotations
    @Since( "1.2.0" )
  31. def isMulticlassWithCategoricalFeatures: Boolean

    Annotations
    @Since( "1.2.0" )
  32. var maxBins: Int

    Maximum number of bins used for discretizing continuous features and for choosing how to split on features at each node. More bins give higher granularity.

    Annotations
    @Since( "1.0.0" )
  33. var maxDepth: Int

    Maximum depth of the tree (e.g. depth 0 means 1 leaf node, depth 1 means 1 internal node + 2 leaf nodes).

    Annotations
    @Since( "1.0.0" )
  34. var maxMemoryInMB: Int

    Maximum memory in MB allocated to histogram aggregation. Default value is 256 MB. If too small, then 1 node will be split per iteration, and its aggregates may exceed this size.

    Annotations
    @Since( "1.0.0" )
  35. var minInfoGain: Double

    Minimum information gain a split must get. Default value is 0.0. If a split has less information gain than minInfoGain, this split will not be considered as a valid split.

    Annotations
    @Since( "1.2.0" )
  36. var minInstancesPerNode: Int

    Minimum number of instances each child must have after a split. Default value is 1. If a split causes the left or right child to have fewer than minInstancesPerNode instances, the split is not considered valid.

    Annotations
    @Since( "1.2.0" )
  37. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  38. final def notify(): Unit

    Definition Classes
    AnyRef
  39. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  40. var numClasses: Int

    Number of classes for classification. (Ignored for regression.) Default value is 2 (binary classification).

    Annotations
    @Since( "1.2.0" )
  41. var quantileCalculationStrategy: QuantileStrategy.QuantileStrategy

    Algorithm for calculating quantiles. Supported: org.apache.spark.mllib.tree.configuration.QuantileStrategy.Sort

    Annotations
    @Since( "1.0.0" )
  42. def setAlgo(algo: String): Unit

    Sets Algorithm using a String.

    Annotations
    @Since( "1.2.0" )
  43. def setAlgo(arg0: Algo.Algo): Unit

  44. def setCategoricalFeaturesInfo(categoricalFeaturesInfo: Map[Integer, Integer]): Unit

    Sets categoricalFeaturesInfo using a Java Map.

    Annotations
    @Since( "1.2.0" )
  45. def setCategoricalFeaturesInfo(arg0: Map[Int, Int]): Unit

  46. def setCheckpointInterval(arg0: Int): Unit

    Annotations
    @Since( "1.2.0" )
  47. def setImpurity(arg0: Impurity): Unit

    Annotations
    @Since( "1.0.0" )
  48. def setMaxBins(arg0: Int): Unit

    Annotations
    @Since( "1.0.0" )
  49. def setMaxDepth(arg0: Int): Unit

    Annotations
    @Since( "1.0.0" )
  50. def setMaxMemoryInMB(arg0: Int): Unit

    Annotations
    @Since( "1.0.0" )
  51. def setMinInfoGain(arg0: Double): Unit

    Annotations
    @Since( "1.2.0" )
  52. def setMinInstancesPerNode(arg0: Int): Unit

    Annotations
    @Since( "1.2.0" )
  53. def setNumClasses(arg0: Int): Unit

    Annotations
    @Since( "1.2.0" )
  54. def setQuantileCalculationStrategy(arg0: QuantileStrategy.QuantileStrategy): Unit

    Annotations
    @Since( "1.0.0" )
  55. def setSubsamplingRate(arg0: Double): Unit

    Annotations
    @Since( "1.2.0" )
  56. def setUseNodeIdCache(arg0: Boolean): Unit

    Annotations
    @Since( "1.2.0" )
  57. var subsamplingRate: Double

    Fraction of the training data used for learning the decision tree.

    Annotations
    @Since( "1.2.0" )
  58. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  59. def toString(): String

    Definition Classes
    AnyRef → Any
  60. var useNodeIdCache: Boolean

    If this is true, instead of passing trees to executors, the algorithm maintains a separate RDD that caches the node Id for each row.

    Annotations
    @Since( "1.2.0" )
  61. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  62. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  63. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
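
Because the configuration fields listed above are vars with matching getters and setters, a Strategy can be adjusted after construction, and copy gives a separate instance to adjust. The following is a minimal sketch, assuming spark-mllib is on the classpath; the object name StrategyCopySketch and the values used are hypothetical.

```scala
import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}
import org.apache.spark.mllib.tree.impurity.Variance

object StrategyCopySketch {
  // Regression setup; every parameter not listed keeps its default.
  val base: Strategy = new Strategy(Algo.Regression, Variance, maxDepth = 4)

  // copy returns a new Strategy instance, so reassigning fields on the
  // copy does not change the values held by base.
  val deeper: Strategy = base.copy
  deeper.setMaxDepth(8)      // setter form
  deeper.minInfoGain = 0.01  // direct assignment to the var

  def main(args: Array[String]): Unit = {
    println(s"base:   maxDepth=${base.getMaxDepth}, minInfoGain=${base.getMinInfoGain}")
    println(s"deeper: maxDepth=${deeper.getMaxDepth}, minInfoGain=${deeper.getMinInfoGain}")
  }
}
```

The copy is shallow, as the copy entry states, so this separation is only guaranteed for fields that are reassigned rather than mutated in place.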
