org.apache.spark.mllib.stat

Statistics

object Statistics

:: Experimental :: API for statistical functions in MLlib.

Annotations
@Experimental()
Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def chiSqTest(data: RDD[LabeledPoint]): Array[ChiSqTestResult]

    Conduct Pearson's independence test for every feature against the label across the input RDD. For each feature, the (feature, label) pairs are converted into a contingency matrix for which the chi-squared statistic is computed. All label and feature values must be categorical.

    data

    An RDD[LabeledPoint] containing the labeled dataset with categorical features. Each distinct value of a real-valued feature is treated as its own category.

    returns

    An array containing the ChiSqTestResult for every feature against the label. The elements of the returned array are in the same order as the input features.
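
    The per-feature test described above could be exercised roughly as follows. This is a minimal sketch, not an official example: the local master, the application name, the object name, and the toy data are all assumptions made for illustration.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.stat.Statistics

    object FeatureIndependenceExample {
      def main(args: Array[String]): Unit = {
        // Local master and app name are placeholders for illustration.
        val sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("chiSqTest-features"))
        try {
          // Each point has a categorical label and two categorical features.
          val data = sc.parallelize(Seq(
            LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
            LabeledPoint(0.0, Vectors.dense(0.0, 2.0)),
            LabeledPoint(1.0, Vectors.dense(1.0, 1.0)),
            LabeledPoint(1.0, Vectors.dense(1.0, 2.0))
          ))
          // One ChiSqTestResult per feature, in feature order.
          val results = Statistics.chiSqTest(data)
          results.zipWithIndex.foreach { case (r, i) =>
            println(s"feature $i: statistic=${r.statistic}, " +
              s"df=${r.degreesOfFreedom}, pValue=${r.pValue}")
          }
        } finally {
          sc.stop()
        }
      }
    }
    ```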

  8. def chiSqTest(observed: Matrix): ChiSqTestResult

    Conduct Pearson's independence test on the input contingency matrix. The matrix must not contain negative entries, nor any row or column that sums to 0.

    observed

    The contingency matrix (containing either counts or relative frequencies).

    returns

    ChiSqTestResult object containing the test statistic, degrees of freedom, p-value, the method used, and the null hypothesis.
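
    As a hedged illustration of the matrix overload, the sketch below builds a small contingency matrix locally; no SparkContext is involved because the input is a local Matrix. The counts and the object name are made up for the example.

    ```scala
    import org.apache.spark.mllib.linalg.Matrices
    import org.apache.spark.mllib.stat.Statistics

    object ContingencyMatrixExample {
      def main(args: Array[String]): Unit = {
        // Matrices.dense takes values in column-major order, so this is the
        // 3 x 2 matrix with rows (1, 2), (3, 4), (5, 6).
        val observed = Matrices.dense(3, 2, Array(1.0, 3.0, 5.0, 2.0, 4.0, 6.0))
        val result = Statistics.chiSqTest(observed)
        // For an r x c matrix the degrees of freedom are (r - 1) * (c - 1),
        // which is 2 for this 3 x 2 input.
        println(s"df = ${result.degreesOfFreedom}")
        println(s"statistic = ${result.statistic}, pValue = ${result.pValue}")
        println(s"null hypothesis: ${result.nullHypothesis}")
      }
    }
    ```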

  9. def chiSqTest(observed: Vector): ChiSqTestResult

    Conduct Pearson's chi-squared goodness of fit test of the observed data against the uniform distribution, with each category having an expected frequency of 1 / observed.size.

    Note: observed cannot contain negative values.

    observed

    Vector containing the observed categorical counts/relative frequencies.

    returns

    ChiSqTestResult object containing the test statistic, degrees of freedom, p-value, the method used, and the null hypothesis.
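
    A minimal sketch of the uniform goodness of fit test, assuming three categories with made-up counts. The object name is hypothetical, and the input is a local Vector, so no SparkContext is needed.

    ```scala
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.Statistics

    object UniformGoodnessOfFitExample {
      def main(args: Array[String]): Unit = {
        // Observed counts for three categories (illustrative values).
        val observed = Vectors.dense(10.0, 20.0, 30.0)
        // With no expected vector, each category is tested against an
        // expected relative frequency of 1 / observed.size.
        val result = Statistics.chiSqTest(observed)
        // Degrees of freedom are (number of categories - 1), so 2 here.
        println(s"df = ${result.degreesOfFreedom}")
        println(s"statistic = ${result.statistic}, pValue = ${result.pValue}")
      }
    }
    ```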

  10. def chiSqTest(observed: Vector, expected: Vector): ChiSqTestResult

    Conduct Pearson's chi-squared goodness of fit test of the observed data against the expected distribution.

    Note: the two input Vectors need to have the same size. observed cannot contain negative values. expected cannot contain nonpositive values.

    observed

    Vector containing the observed categorical counts/relative frequencies.

    expected

    Vector containing the expected categorical counts/relative frequencies. expected is rescaled if the expected sum differs from the observed sum.

    returns

    ChiSqTestResult object containing the test statistic, degrees of freedom, p-value, the method used, and the null hypothesis.
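
    The rescaling note above means observed counts can be compared against expected relative frequencies directly. The following sketch assumes made-up counts and proportions; the object name is hypothetical.

    ```scala
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.Statistics

    object ExpectedGoodnessOfFitExample {
      def main(args: Array[String]): Unit = {
        // Observed counts sum to 100; expected proportions sum to 1.
        // Both vectors have the same size, observed has no negative values,
        // and expected has only positive values.
        val observed = Vectors.dense(18.0, 22.0, 60.0)
        val expected = Vectors.dense(0.2, 0.2, 0.6)
        // The sums differ, so expected is rescaled to the observed total
        // before the statistic is computed.
        val result = Statistics.chiSqTest(observed, expected)
        println(s"method = ${result.method}")
        println(s"statistic = ${result.statistic}, " +
          s"df = ${result.degreesOfFreedom}, pValue = ${result.pValue}")
      }
    }
    ```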

  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. def colStats(X: RDD[Vector]): MultivariateStatisticalSummary

    Computes column-wise summary statistics for the input RDD[Vector].

    X

    an RDD[Vector] for which column-wise summary statistics are to be computed.

    returns

    MultivariateStatisticalSummary object containing column-wise summary statistics.
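
    A short sketch of reading the column-wise summary, assuming a local SparkContext and a toy three-row dataset. The master, app name, and object name are placeholders.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.Statistics

    object ColStatsExample {
      def main(args: Array[String]): Unit = {
        // Local master and app name are placeholders for illustration.
        val sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("colStats-example"))
        try {
          // Three rows, two columns.
          val rows = sc.parallelize(Seq(
            Vectors.dense(1.0, 10.0),
            Vectors.dense(2.0, 20.0),
            Vectors.dense(3.0, 30.0)
          ))
          val summary = Statistics.colStats(rows)
          // Every statistic except count is a Vector with one entry per column.
          println(s"count    = ${summary.count}")
          println(s"mean     = ${summary.mean}")
          println(s"variance = ${summary.variance}")
          println(s"min      = ${summary.min}")
          println(s"max      = ${summary.max}")
          println(s"nonzeros = ${summary.numNonzeros}")
        } finally {
          sc.stop()
        }
      }
    }
    ```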

  13. def corr(x: RDD[Double], y: RDD[Double], method: String): Double

    Compute the correlation for the input RDDs using the specified method. Methods currently supported: pearson (default), spearman.

    Note: the two input RDDs need to have the same number of partitions and the same number of elements in each partition.

    x

    RDD[Double] of the same cardinality as y.

    y

    RDD[Double] of the same cardinality as x.

    method

    String specifying the method to use for computing correlation. Supported: pearson (default), spearman

    returns

    A Double containing the correlation between the two input RDD[Double]s using the specified method.
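
    The sketch below shows one way to satisfy the partitioning requirement: both RDDs are built from sequences of equal length with the same number of slices. The data, master, app name, and object name are assumptions for illustration.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.stat.Statistics

    object PairwiseCorrelationExample {
      def main(args: Array[String]): Unit = {
        // Local master and app name are placeholders for illustration.
        val sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("corr-pairwise"))
        try {
          // Same length and same slice count, so both RDDs have the same
          // number of partitions and elements per partition.
          val x = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0, 5.0), 2)
          val y = sc.parallelize(Seq(2.0, 4.0, 6.0, 8.0, 100.0), 2)
          val pearson = Statistics.corr(x, y, "pearson")
          // y increases with x, so the rank correlation should be close to 1
          // even though the last value is an outlier.
          val spearman = Statistics.corr(x, y, "spearman")
          println(s"pearson = $pearson, spearman = $spearman")
        } finally {
          sc.stop()
        }
      }
    }
    ```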

  14. def corr(x: RDD[Double], y: RDD[Double]): Double

    Compute the Pearson correlation for the input RDDs. Returns NaN if either vector has 0 variance.

    Note: the two input RDDs need to have the same number of partitions and the same number of elements in each partition.

    x

    RDD[Double] of the same cardinality as y.

    y

    RDD[Double] of the same cardinality as x.

    returns

    A Double containing the Pearson correlation between the two input RDD[Double]s.

  15. def corr(X: RDD[Vector], method: String): Matrix

    Compute the correlation matrix for the input RDD of Vectors using the specified method. Methods currently supported: pearson (default), spearman.

    Note: Spearman is a rank correlation, so each column must be extracted into its own RDD[Double] and sorted to obtain the ranks, after which the columns are joined back into an RDD[Vector]. This is fairly costly. Cache the input RDD before calling corr with method = "spearman" to avoid recomputing the common lineage.

    X

    an RDD[Vector] for which the correlation matrix is to be computed.

    method

    String specifying the method to use for computing correlation. Supported: pearson (default), spearman

    returns

    Correlation matrix comparing columns in X.
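
    The caching advice above might be applied as in this sketch, which assumes a local SparkContext and a toy four-row dataset. The master, app name, and object name are placeholders.

    ```scala
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.stat.Statistics

    object SpearmanMatrixExample {
      def main(args: Array[String]): Unit = {
        // Local master and app name are placeholders for illustration.
        val sc = new SparkContext(
          new SparkConf().setMaster("local[2]").setAppName("corr-spearman"))
        try {
          // Cache the input so the per-column sorts reuse it instead of
          // recomputing its lineage.
          val rows = sc.parallelize(Seq(
            Vectors.dense(1.0, 10.0, 100.0),
            Vectors.dense(2.0, 20.0, 150.0),
            Vectors.dense(3.0, 25.0, 120.0),
            Vectors.dense(4.0, 40.0, 90.0)
          )).cache()
          val corrMatrix = Statistics.corr(rows, "spearman")
          // The result has one row and one column per input column.
          println(s"${corrMatrix.numRows} x ${corrMatrix.numCols}")
          println(corrMatrix)
        } finally {
          sc.stop()
        }
      }
    }
    ```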

  16. def corr(X: RDD[Vector]): Matrix

    Compute the Pearson correlation matrix for the input RDD of Vectors. Columns with zero variance produce NaN entries in the correlation matrix.

    X

    an RDD[Vector] for which the correlation matrix is to be computed.

    returns

    Pearson correlation matrix comparing columns in X.

  17. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  19. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  21. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  22. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  23. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  24. final def notify(): Unit

    Definition Classes
    AnyRef
  25. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  26. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  27. def toString(): String

    Definition Classes
    AnyRef → Any
  28. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  30. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
