public class StandardScaler extends Estimator<StandardScalerModel> implements StandardScalerParams, DefaultParamsWritable
The "unit std" is computed using the corrected sample standard deviation, which is computed as the square root of the unbiased sample variance.
| Constructor and Description |
|---|
| StandardScaler() |
| StandardScaler(String uid) |
| Modifier and Type | Method and Description |
|---|---|
| StandardScaler | copy(ParamMap extra)<br>Creates a copy of this instance with the same UID and some extra params. |
| StandardScalerModel | fit(Dataset<?> dataset)<br>Fits a model to the input data. |
| Param<String> | inputCol()<br>Param for input column name. |
| static StandardScaler | load(String path) |
| Param<String> | outputCol()<br>Param for output column name. |
| static MLReader<T> | read() |
| StandardScaler | setInputCol(String value) |
| StandardScaler | setOutputCol(String value) |
| StandardScaler | setWithMean(boolean value) |
| StandardScaler | setWithStd(boolean value) |
| StructType | transformSchema(StructType schema)<br>Check transform validity and derive the output schema from the input schema. |
| String | uid()<br>An immutable unique ID for the object and its derivatives. |
| BooleanParam | withMean()<br>Whether to center the data with mean before scaling. |
| BooleanParam | withStd()<br>Whether to scale the data to unit standard deviation. |
Methods inherited from class java.lang.Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface StandardScalerParams: getWithMean, getWithStd, validateAndTransformSchema

Methods inherited from interface HasInputCol: getInputCol

Methods inherited from interface HasOutputCol: getOutputCol

Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn

Methods inherited from interface Identifiable: toString

Methods inherited from interface DefaultParamsWritable: write

Methods inherited from interface MLWritable: save

Methods inherited from interface Logging: $init$, initializeForcefully, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, initLock, isTraceEnabled, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, uninitialize

public StandardScaler(String uid)
public StandardScaler()
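The no-arg constructor generates a random uid; the String overload pins it, which can make saved pipelines easier to inspect. A small illustration, with an arbitrary name:

```java
// Auto-generated uid, e.g. something like "stdScal_4d9f..." (exact prefix may vary by version).
StandardScaler a = new StandardScaler();

// Explicit, stable uid.
StandardScaler b = new StandardScaler("myStdScaler");
```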
public static StandardScaler load(String path)
public static MLReader<T> read()
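A sketch of persisting and reloading the estimator; write() and save() come from the inherited DefaultParamsWritable/MLWritable methods listed above, and the path is illustrative:

```java
import java.io.IOException;
import org.apache.spark.ml.feature.StandardScaler;

StandardScaler scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaledFeatures");

// save() declares IOException, so handle or propagate it.
try {
  scaler.write().overwrite().save("/tmp/std-scaler");  // illustrative path
} catch (IOException e) {
  throw new RuntimeException(e);
}

// load() restores an estimator with the same uid and param values.
StandardScaler restored = StandardScaler.load("/tmp/std-scaler");
```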
public BooleanParam withMean()
Description copied from interface: StandardScalerParams
Whether to center the data with mean before scaling.
Specified by: withMean in interface StandardScalerParams

public BooleanParam withStd()
Description copied from interface: StandardScalerParams
Whether to scale the data to unit standard deviation.
Specified by: withStd in interface StandardScalerParams

public final Param<String> outputCol()
Description copied from interface: HasOutputCol
Param for output column name.
Specified by: outputCol in interface HasOutputCol

public final Param<String> inputCol()
Description copied from interface: HasInputCol
Param for input column name.
Specified by: inputCol in interface HasInputCol

public String uid()
Description copied from interface: Identifiable
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable

public StandardScaler setInputCol(String value)
public StandardScaler setOutputCol(String value)
public StandardScaler setWithMean(boolean value)
public StandardScaler setWithStd(boolean value)
public StandardScalerModel fit(Dataset<?> dataset)
Description copied from class: Estimator
Fits a model to the input data.
Specified by: fit in class Estimator<StandardScalerModel>
Parameters: dataset - (undocumented)
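Once fitted, the returned StandardScalerModel carries the statistics it will apply; a sketch, reusing the scaler and df assumed in the earlier example:

```java
import org.apache.spark.ml.feature.StandardScalerModel;
import org.apache.spark.ml.linalg.Vector;

StandardScalerModel model = scaler.fit(df);

// Per-feature mean and corrected sample std estimated from the training data.
Vector mean = model.mean();
Vector std = model.std();
```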
public StructType transformSchema(StructType schema)
Description copied from class: PipelineStage
Check transform validity and derive the output schema from the input schema.

We check validity for interactions between parameters during transformSchema and raise an exception if any parameter value is invalid. Parameter value checks which do not depend on other parameters are handled by Param.validate().

Typical implementations should first verify the schema change and parameter validity, including complex parameter interaction checks.
Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
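transformSchema lets you validate a stage without running it on any data; a sketch, assuming the input column holds ml Vectors and is named "features":

```java
import org.apache.spark.ml.linalg.VectorUDT;
import org.apache.spark.sql.types.StructType;

StructType inputSchema = new StructType().add("features", new VectorUDT());

// Returns the input schema plus the output column; throws
// IllegalArgumentException if "features" is missing or has the wrong type.
StructType outputSchema = scaler.transformSchema(inputSchema);
```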
public StandardScaler copy(ParamMap extra)
Description copied from interface: Params
Creates a copy of this instance with the same UID and some extra params. Subclasses should implement this method and set the return type properly. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<StandardScalerModel>
Parameters: extra - (undocumented)
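A sketch of copy with a param override via ParamMap, roughly what tools like CrossValidator do when exploring param grids:

```java
import org.apache.spark.ml.param.ParamMap;

ParamMap extra = new ParamMap();
extra.put(scaler.withMean(), true);

// Same uid and params as `scaler`, except withMean is overridden.
StandardScaler centered = scaler.copy(extra);
```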