SparseVector
class pyspark.mllib.linalg.SparseVector(size: int, *args: Union[bytes, Tuple[int, float], Iterable[float], Iterable[Tuple[int, float]], Dict[int, float]])
A simple sparse vector class for passing data to MLlib. Users may alternatively pass SciPy’s scipy.sparse data types.

Methods

- asML(): Convert this vector to the new mllib-local representation.
- dot(other): Dot product with a SparseVector or 1- or 2-dimensional NumPy array.
- norm(p): Calculates the norm of a SparseVector.
- numNonzeros(): Number of nonzero elements.
- parse(s): Parse string representation back into the SparseVector.
- squared_distance(other): Squared distance from a SparseVector or 1-dimensional NumPy array.
- toArray(): Returns a copy of this SparseVector as a 1-dimensional NumPy array.

Methods Documentation
asML() → pyspark.ml.linalg.SparseVector
Convert this vector to the new mllib-local representation. This does NOT copy the data; it copies references.

New in version 2.0.0.

Returns: pyspark.ml.linalg.SparseVector
 - 
dot(other: Iterable[float]) → numpy.float64
Dot product with a SparseVector or 1- or 2-dimensional NumPy array.

Examples

>>> import array
>>> import numpy as np
>>> from pyspark.mllib.linalg import DenseVector, SparseVector
>>> a = SparseVector(4, [1, 3], [3.0, 4.0])
>>> a.dot(a)
25.0
>>> a.dot(array.array('d', [1., 2., 3., 4.]))
22.0
>>> b = SparseVector(4, [2], [1.0])
>>> a.dot(b)
0.0
>>> a.dot(np.array([[1, 1], [2, 2], [3, 3], [4, 4]]))
array([ 22., 22.])
>>> a.dot([1., 2., 3.])
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(np.array([1., 2.]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(DenseVector([1., 2.]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> a.dot(np.zeros((3, 2)))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
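To make the sparse case concrete, here is a minimal pure-Python sketch of a dot product that touches only the active entries. It is not PySpark's implementation: `sparse_dot_dense` and `sparse_dot_sparse` are hypothetical helpers, and the sketch assumes a vector is described by its size plus parallel lists of ascending indices and their values.

```python
from typing import Sequence


def sparse_dot_dense(size: int, indices: Sequence[int], values: Sequence[float],
                     dense: Sequence[float]) -> float:
    """Dot product of a sparse vector with a dense sequence (illustrative only)."""
    # Mirror the documented behaviour: a length mismatch is an error.
    assert len(dense) == size, "dimension mismatch"
    # Only active entries contribute; every other term is multiplied by zero.
    return sum((v * dense[i] for i, v in zip(indices, values)), 0.0)


def sparse_dot_sparse(size_a: int, idx_a: Sequence[int], val_a: Sequence[float],
                      size_b: int, idx_b: Sequence[int], val_b: Sequence[float]) -> float:
    """Dot product of two sparse vectors whose index lists are sorted ascending."""
    assert size_a == size_b, "dimension mismatch"
    i = j = 0
    total = 0.0
    # Walk both sorted index lists in step; only shared indices contribute.
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            total += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return total
```

With the vectors from the examples above, the dense helper gives 22.0 for `a` against `[1., 2., 3., 4.]`, and the sparse helper gives 25.0 for `a` with itself and 0.0 for `a` with `b`, because `a` and `b` share no active index.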
 - 
norm(p: NormType) → numpy.float64
Calculates the norm of a SparseVector.

Examples

>>> from pyspark.mllib.linalg import SparseVector
>>> a = SparseVector(4, [0, 1], [3., -4.])
>>> a.norm(1)
7.0
>>> a.norm(2)
5.0
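Because inactive entries are zero, a p-norm can be computed from the stored values alone. The sketch below illustrates that idea in pure Python. `sparse_norm` is a hypothetical helper, not the library's code, and it assumes `p` is either a number of at least 1 or `math.inf`; the `NormType` accepted by the real method is not defined on this page and may allow other values.

```python
import math
from typing import Sequence


def sparse_norm(values: Sequence[float], p: float) -> float:
    """p-norm computed from the active values only (illustrative only)."""
    if p == math.inf:
        # Infinity norm: the largest absolute value (0.0 for an all-zero vector).
        return max((abs(v) for v in values), default=0.0)
    if p < 1:
        raise ValueError("this sketch only handles p >= 1 or p == math.inf")
    # Sum of |v|^p over active entries, then the p-th root.
    return sum(abs(v) ** p for v in values) ** (1.0 / p)
```

For the values `[3., -4.]` from the example above, `p=1` adds the absolute values (3 + 4) and `p=2` takes the square root of 9 + 16, which agrees with the documented results of 7.0 and 5.0.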
 - 
numNonzeros() → int
Number of nonzero elements. This scans all active values and counts the nonzeros.
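The scan matters because an active (explicitly stored) value can itself be zero, so the number of stored entries is only an upper bound on the number of nonzeros. A small sketch of the distinction, using hypothetical helper names and a plain list of stored values:

```python
from typing import Sequence


def num_active(values: Sequence[float]) -> int:
    """Number of explicitly stored entries, zero or not."""
    return len(values)


def num_nonzeros(values: Sequence[float]) -> int:
    """Number of stored entries whose value is not zero."""
    return sum(1 for v in values if v != 0)


# A vector that stores an explicit zero in its second active slot.
stored_values = [3.0, 0.0, 4.0]
active = num_active(stored_values)      # counts all three stored entries
nonzeros = num_nonzeros(stored_values)  # skips the explicit zero
```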
 - 
static parse(s: str) → pyspark.mllib.linalg.SparseVector
Parse string representation back into the SparseVector.

Examples

>>> from pyspark.mllib.linalg import SparseVector
>>> SparseVector.parse(' (4, [0,1 ],[ 4.0,5.0] )')
SparseVector(4, {0: 4.0, 1: 5.0})
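The example above suggests the string form is `(size, [indices], [values])` with whitespace tolerated around the tokens. The following is a rough sketch of a parser for that shape, written only from the example: `parse_sparse` is a hypothetical function, its error handling is an assumption, and the real `parse` may accept or reject inputs differently.

```python
import re
from typing import List, Tuple

# "(size, [i, j, ...], [x, y, ...])" with optional whitespace around each token.
_SPARSE_PATTERN = re.compile(
    r"^\s*\(\s*(\d+)\s*,\s*\[([^\]]*)\]\s*,\s*\[([^\]]*)\]\s*\)\s*$"
)


def parse_sparse(s: str) -> Tuple[int, List[int], List[float]]:
    """Parse '(size, [indices], [values])' into its three parts (illustrative only)."""
    match = _SPARSE_PATTERN.match(s)
    if match is None:
        raise ValueError(f"cannot parse sparse vector from {s!r}")
    size = int(match.group(1))
    # int() and float() ignore surrounding whitespace; empty tokens are skipped.
    indices = [int(tok) for tok in match.group(2).split(",") if tok.strip()]
    values = [float(tok) for tok in match.group(3).split(",") if tok.strip()]
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return size, indices, values
```

Applied to the string from the example, the sketch yields size 4, indices `[0, 1]` and values `[4.0, 5.0]`, the same content the documented result shows as `{0: 4.0, 1: 5.0}`.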
 - 
squared_distance(other: Iterable[float]) → numpy.float64
Squared distance from a SparseVector or 1-dimensional NumPy array.

Examples

>>> import array
>>> import numpy as np
>>> from pyspark.mllib.linalg import SparseVector
>>> a = SparseVector(4, [1, 3], [3.0, 4.0])
>>> a.squared_distance(a)
0.0
>>> a.squared_distance(array.array('d', [1., 2., 3., 4.]))
11.0
>>> a.squared_distance(np.array([1., 2., 3., 4.]))
11.0
>>> b = SparseVector(4, [2], [1.0])
>>> a.squared_distance(b)
26.0
>>> b.squared_distance(a)
26.0
>>> b.squared_distance([1., 2.])
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
>>> b.squared_distance(SparseVector(3, [1,], [1.0,]))
Traceback (most recent call last):
    ...
AssertionError: dimension mismatch
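As a cross-check on the numbers in the examples above, here is a minimal pure-Python sketch of the squared Euclidean distance between two sparse vectors. It is an illustration under stated assumptions, not PySpark's implementation: `sparse_squared_distance` and `dense_to_entries` are hypothetical helpers, and each vector is represented as a size plus an index-to-value dict.

```python
from typing import Dict, Sequence


def dense_to_entries(dense: Sequence[float]) -> Dict[int, float]:
    """Turn a dense sequence into an index -> value dict."""
    return {i: v for i, v in enumerate(dense)}


def sparse_squared_distance(size_a: int, entries_a: Dict[int, float],
                            size_b: int, entries_b: Dict[int, float]) -> float:
    """Sum of squared differences over every index active in either vector."""
    assert size_a == size_b, "dimension mismatch"
    total = 0.0
    # Indices active in neither vector differ by zero and can be skipped.
    for i in set(entries_a) | set(entries_b):
        diff = entries_a.get(i, 0.0) - entries_b.get(i, 0.0)
        total += diff * diff
    return total
```

For `a = {1: 3.0, 3: 4.0}` and `b = {2: 1.0}`, the union of active indices is {1, 2, 3}, giving 9 + 1 + 16 = 26. Against the dense sequence `[1., 2., 3., 4.]` the differences are 1, -1, 3 and 0, giving 11.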
 