public class MultivariateOnlineSummarizer extends java.lang.Object implements MultivariateStatisticalSummary, scala.Serializable
MultivariateOnlineSummarizer implements MultivariateStatisticalSummary to compute the mean,
variance, minimum, maximum, counts, and nonzero counts for instances in sparse or dense vector
format in an online fashion.
Two MultivariateOnlineSummarizer instances can be merged to produce a statistical summary of the corresponding joint dataset.
A numerically stable algorithm is implemented to compute the mean and variance of instances.
Reference: variance-wiki
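To make the online update and the merge step concrete, here is a minimal single-column sketch in plain Java. It assumes a Welford-style running update and a pairwise combination of two partial summaries; the class name RunningStats and its methods are hypothetical and are not part of this API, and the summarizer's actual internals may differ.

```java
/** Hypothetical helper: running mean and unbiased variance for one column. */
final class RunningStats {
    private long n;        // number of values seen so far
    private double mean;   // running mean
    private double m2;     // running sum of squared deviations from the mean

    /** Welford-style update: folds one value into the summary. */
    RunningStats add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);   // uses the already updated mean
        return this;
    }

    /** Folds another summary into this one in place. */
    RunningStats merge(RunningStats other) {
        if (other.n == 0) {
            return this;
        }
        long total = n + other.n;
        double delta = other.mean - mean;
        mean += delta * other.n / total;
        m2 += other.m2 + delta * delta * n * other.n / total;
        n = total;
        return this;
    }

    long count() { return n; }

    double mean() { return mean; }

    /** Unbiased sample variance; defined here as 0.0 for fewer than two values. */
    double variance() { return n > 1 ? m2 / (n - 1) : 0.0; }

    public static void main(String[] args) {
        RunningStats a = new RunningStats().add(1.0).add(2.0);
        RunningStats b = new RunningStats().add(3.0).add(4.0);
        a.merge(b);   // a now summarizes the joint dataset {1, 2, 3, 4}
        System.out.println(a.count() + " " + a.mean() + " " + a.variance());
    }
}
```

Because the update never forms a large sum of squares and then subtracts a squared mean, it avoids the cancellation that the naive one-pass formula can suffer from.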
Zero elements (including explicit zero values) are skipped when calling add(),
to have time complexity O(nnz) instead of O(n) for each column.
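One way to see why skipping zeros is safe: a column's statistics can be accumulated over its nonzero entries only, with the zeros accounted for later from the total sample count. The sketch below illustrates that idea under simplifying assumptions. SparseColumnStats is a hypothetical class, a sample is passed as parallel index and value arrays rather than a Vector, and a plain sum is used for the mean instead of the numerically stable update.

```java
/** Hypothetical sketch: per-column statistics that only touch stored nonzero entries. */
final class SparseColumnStats {
    private final double[] sum;   // per-column sum of the nonzero values
    private final double[] min;   // per-column minimum over the nonzero values seen
    private final long[] nnz;     // per-column number of nonzero values
    private long count;           // number of samples, including all-zero ones

    SparseColumnStats(int numCols) {
        sum = new double[numCols];
        min = new double[numCols];
        nnz = new long[numCols];
        java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
    }

    /** Adds one sample; the cost is proportional to the number of stored entries. */
    SparseColumnStats add(int[] indices, double[] values) {
        for (int k = 0; k < indices.length; k++) {
            double v = values[k];
            if (v == 0.0) {
                continue;   // explicit zeros are skipped as well
            }
            int j = indices[k];
            sum[j] += v;
            min[j] = Math.min(min[j], v);
            nnz[j]++;
        }
        count++;
        return this;
    }

    /** Zeros contribute nothing to the sum, so dividing by the sample count is enough. */
    double mean(int j) { return count == 0 ? 0.0 : sum[j] / count; }

    /** A column with fewer nonzeros than samples must have contained at least one zero. */
    double min(int j) { return nnz[j] < count ? Math.min(min[j], 0.0) : min[j]; }

    long numNonzeros(int j) { return nnz[j]; }

    public static void main(String[] args) {
        SparseColumnStats s = new SparseColumnStats(3);
        s.add(new int[] {0, 2}, new double[] {2.0, 4.0});
        s.add(new int[] {0, 1}, new double[] {4.0, 0.0});   // column 1 holds an explicit zero
        System.out.println(s.mean(0) + " " + s.min(2) + " " + s.numNonzeros(1));
    }
}
```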
For weighted instances, the unbiased estimation of variance is defined by the reliability
weights: https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Reliability_weights.
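As a rough illustration of the reliability-weights estimator described on the linked page, the sketch below computes the weighted mean and then divides the weighted sum of squared deviations by V1 - V2 / V1, where V1 is the sum of the weights and V2 is the sum of the squared weights. It is a two-pass version over plain arrays for readability, not the online form; the class and method names are hypothetical.

```java
/** Hypothetical sketch: unbiased weighted variance using reliability weights. */
final class ReliabilityWeightedVariance {

    /** Returns the weighted variance of x under non-negative weights w of the same length. */
    static double variance(double[] x, double[] w) {
        double v1 = 0.0;           // sum of weights
        double v2 = 0.0;           // sum of squared weights
        double weightedSum = 0.0;  // sum of w[i] * x[i]
        for (int i = 0; i < x.length; i++) {
            v1 += w[i];
            v2 += w[i] * w[i];
            weightedSum += w[i] * x[i];
        }
        double mean = weightedSum / v1;

        double squares = 0.0;      // weighted sum of squared deviations from the mean
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - mean;
            squares += w[i] * d * d;
        }
        return squares / (v1 - v2 / v1);
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0, 4.0};
        double[] unit = {1.0, 1.0, 1.0, 1.0};
        // With unit weights the denominator reduces to n - 1, the usual sample variance.
        System.out.println(variance(x, unit));
    }
}
```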
| Constructor and Description |
|---|
| MultivariateOnlineSummarizer() |
| Modifier and Type | Method and Description |
|---|---|
| MultivariateOnlineSummarizer | add(Vector sample): Add a new sample to this summarizer, and update the statistical summary. |
| long | count(): Sample size. |
| Vector | max(): Maximum value of each dimension. |
| Vector | mean(): Sample mean of each dimension. |
| MultivariateOnlineSummarizer | merge(MultivariateOnlineSummarizer other): Merge another MultivariateOnlineSummarizer, and update the statistical summary. |
| Vector | min(): Minimum value of each dimension. |
| Vector | normL1(): L1 norm of each dimension. |
| Vector | normL2(): L2 (Euclidean) norm of each dimension. |
| Vector | numNonzeros(): Number of nonzero elements in each dimension. |
| Vector | variance(): Unbiased estimate of sample variance of each dimension. |
public MultivariateOnlineSummarizer add(Vector sample)
sample - The sample in dense/sparse vector format to be added into this summarizer.

public MultivariateOnlineSummarizer merge(MultivariateOnlineSummarizer other)
(This object will be modified.)
other - The other MultivariateOnlineSummarizer to be merged.

public Vector mean()
mean in interface MultivariateStatisticalSummary

public Vector variance()
variance in interface MultivariateStatisticalSummary

public long count()
count in interface MultivariateStatisticalSummary

public Vector numNonzeros()
numNonzeros in interface MultivariateStatisticalSummary

public Vector max()
max in interface MultivariateStatisticalSummary

public Vector min()
min in interface MultivariateStatisticalSummary

public Vector normL2()
normL2 in interface MultivariateStatisticalSummary

public Vector normL1()
normL1 in interface MultivariateStatisticalSummary