public class OneHotEncoderEstimator extends Estimator<OneHotEncoderModel> implements OneHotEncoderBase, DefaultParamsWritable
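A one-hot encoder that maps a column of category indices to a column of binary vectors, with at most a single one-value per row that indicates the input category index. For example, with 5 categories, an input value of 2.0 would map to an output vector of
[0.0, 0.0, 1.0, 0.0].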
The last category is not included by default (configurable via dropLast),
because including it would make the vector entries sum to one, and hence linearly dependent.
So, with dropLast enabled, an input value of 4.0 maps to [0.0, 0.0, 0.0, 0.0].
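The rule above can be illustrated with a minimal plain-Java sketch. It is not Spark code: the class `OneHotSketch` and its `encode` method are hypothetical names introduced here, and the sketch returns dense arrays purely for readability.

```java
import java.util.Arrays;

// Plain-Java illustration of the encoding rule described above; not Spark code.
final class OneHotSketch {
    // Encodes a category index as a dense one-hot vector.
    // With dropLast, the vector has one entry fewer than the number of
    // categories, so the last category becomes the all-zeros vector.
    static double[] encode(int index, int numCategories, boolean dropLast) {
        if (index < 0 || index >= numCategories) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        int size = dropLast ? numCategories - 1 : numCategories;
        double[] vector = new double[size];
        if (index < size) {
            vector[index] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(2, 5, true)));  // [0.0, 0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(4, 5, false))); // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

With dropLast disabled, every category keeps its own position, so the entries of each vector sum to one.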
See also: StringIndexer for converting categorical values into category indices.
Note: When handleInvalid is configured to 'keep', an extra "category" indicating invalid values is
added as the last category. So when dropLast is true, invalid values are encoded as an all-zeros
vector.
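As a hedged illustration of this note, the plain-Java sketch below models the 'keep' behaviour. It is not Spark code: `KeepInvalidSketch` is a hypothetical name, and treating any index outside `[0, numCategories)` as invalid is an assumption made for the example.

```java
import java.util.Arrays;

// Plain-Java illustration of handleInvalid = 'keep'; not Spark code.
final class KeepInvalidSketch {
    // Assumption: an index outside [0, numCategories) counts as invalid.
    static double[] encode(int index, int numCategories, boolean dropLast) {
        boolean valid = index >= 0 && index < numCategories;
        // Invalid values are assigned to an extra category placed last.
        int effectiveIndex = valid ? index : numCategories;
        int totalCategories = numCategories + 1;
        // dropLast removes the final position, which is the "invalid" category.
        int size = dropLast ? totalCategories - 1 : totalCategories;
        double[] vector = new double[size];
        if (effectiveIndex < size) {
            vector[effectiveIndex] = 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(7, 5, true)));  // [0.0, 0.0, 0.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encode(7, 5, false))); // [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
        System.out.println(Arrays.toString(encode(4, 5, true)));  // [0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

In this model, the category dropped by dropLast is the extra "invalid" one, so every valid category keeps a one-valued position.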
Note: When encoding multiple columns using the inputCols and outputCols params, input and output columns
come in pairs, specified by the order in the arrays, and each pair is treated independently.
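The pairing can be sketched in plain Java as follows. This is not Spark code: `ColumnPairSketch`, the `categorySizes` argument, and the column names in `main` are hypothetical, and the sketch assumes dropLast is enabled for every column.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of pairwise column handling; not Spark code.
final class ColumnPairSketch {
    // Encodes the value of inputCols[i] into a vector stored under outputCols[i].
    // categorySizes[i] is the number of categories of the i-th input column.
    static Map<String, double[]> encodeRow(String[] inputCols, String[] outputCols,
                                           int[] categorySizes, Map<String, Integer> row) {
        if (inputCols.length != outputCols.length || inputCols.length != categorySizes.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        Map<String, double[]> encoded = new LinkedHashMap<>();
        for (int i = 0; i < inputCols.length; i++) {
            Integer index = row.get(inputCols[i]);
            if (index == null || index < 0 || index >= categorySizes[i]) {
                throw new IllegalArgumentException("bad value for column " + inputCols[i]);
            }
            // Each pair uses only its own category count (dropLast assumed true).
            double[] vector = new double[categorySizes[i] - 1];
            if (index < vector.length) {
                vector[index] = 1.0;
            }
            encoded.put(outputCols[i], vector);
        }
        return encoded;
    }

    public static void main(String[] args) {
        Map<String, Integer> row = new HashMap<>();
        row.put("colorIndex", 1);
        row.put("sizeIndex", 2);
        Map<String, double[]> out = encodeRow(
                new String[]{"colorIndex", "sizeIndex"},
                new String[]{"colorVec", "sizeVec"},
                new int[]{4, 3}, row);
        System.out.println(Arrays.toString(out.get("colorVec"))); // [0.0, 1.0, 0.0]
        System.out.println(Arrays.toString(out.get("sizeVec")));  // [0.0, 0.0]
    }
}
```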
| Constructor and Description |
|---|
| OneHotEncoderEstimator() |
| OneHotEncoderEstimator(String uid) |
| Modifier and Type | Method and Description |
|---|---|
| OneHotEncoderEstimator | copy(ParamMap extra) Creates a copy of this instance with the same UID and some extra params. |
| OneHotEncoderModel | fit(Dataset<?> dataset) Fits a model to the input data. |
| static OneHotEncoderEstimator | load(String path) |
| static MLReader<T> | read() |
| OneHotEncoderEstimator | setDropLast(boolean value) |
| OneHotEncoderEstimator | setHandleInvalid(String value) |
| OneHotEncoderEstimator | setInputCols(String[] values) |
| OneHotEncoderEstimator | setOutputCols(String[] values) |
| StructType | transformSchema(StructType schema) :: DeveloperApi :: |
| String | uid() An immutable unique ID for the object and its derivatives. |
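Methods inherited from class Object: equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface OneHotEncoderBase: dropLast, getDropLast, handleInvalid, validateAndTransformSchema
Methods inherited from interface HasHandleInvalid: getHandleInvalid
Methods inherited from interface HasInputCols: getInputCols, inputCols
Methods inherited from interface HasOutputCols: getOutputCols, outputCols
Methods inherited from interface Params: clear, copyValues, defaultCopy, defaultParamMap, explainParam, explainParams, extractParamMap, extractParamMap, get, getDefault, getOrDefault, getParam, hasDefault, hasParam, isDefined, isSet, paramMap, params, set, set, set, setDefault, setDefault, shouldOwn
Methods inherited from interface Identifiable: toString
Methods inherited from interface DefaultParamsWritable: write
Methods inherited from interface MLWritable: save
Methods inherited from interface Logging: initializeLogging, initializeLogIfNecessary, initializeLogIfNecessary, isTraceEnabled, log_, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning
public OneHotEncoderEstimator(String uid)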
public OneHotEncoderEstimator()
public static OneHotEncoderEstimator load(String path)
public static MLReader<T> read()
public String uid()
An immutable unique ID for the object and its derivatives.
Specified by: uid in interface Identifiable
public OneHotEncoderEstimator setInputCols(String[] values)
public OneHotEncoderEstimator setOutputCols(String[] values)
public OneHotEncoderEstimator setDropLast(boolean value)
public OneHotEncoderEstimator setHandleInvalid(String value)
public StructType transformSchema(StructType schema)
Check transform validity and derive the output schema from the input schema.
We check validity for interactions between parameters during transformSchema and
raise an exception if any parameter value is invalid. Parameter value checks which
do not depend on other parameters are handled by Param.validate().
A typical implementation should first verify the schema change and parameter validity, including complex parameter interaction checks.
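The kind of check described above can be sketched in plain Java. This is not Spark's implementation: `SchemaCheckSketch` is a hypothetical name, a schema is modelled as a list of column names rather than a StructType, and the specific checks (matching array lengths, existing input columns, unused output column names) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of a schema validity check; not Spark code.
final class SchemaCheckSketch {
    // Validates the parameter interaction, then derives the output column list.
    static List<String> transformSchema(List<String> schemaColumns,
                                        String[] inputCols, String[] outputCols) {
        // Interaction between two parameters: they must pair up one-to-one.
        if (inputCols.length != outputCols.length) {
            throw new IllegalArgumentException("inputCols and outputCols differ in length");
        }
        List<String> result = new ArrayList<>(schemaColumns);
        for (int i = 0; i < inputCols.length; i++) {
            if (!schemaColumns.contains(inputCols[i])) {
                throw new IllegalArgumentException("missing input column: " + inputCols[i]);
            }
            if (result.contains(outputCols[i])) {
                throw new IllegalArgumentException("output column exists: " + outputCols[i]);
            }
            result.add(outputCols[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("id", "colorIndex");
        System.out.println(transformSchema(schema,
                new String[]{"colorIndex"}, new String[]{"colorVec"})); // [id, colorIndex, colorVec]
    }
}
```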
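Specified by: transformSchema in class PipelineStage
Parameters: schema - (undocumented)
public OneHotEncoderModel fit(Dataset<?> dataset)
Fits a model to the input data.
Specified by: fit in class Estimator<OneHotEncoderModel>
Parameters: dataset - (undocumented)
public OneHotEncoderEstimator copy(ParamMap extra)
Creates a copy of this instance with the same UID and some extra params. See defaultCopy().
Specified by: copy in interface Params
Specified by: copy in class Estimator<OneHotEncoderModel>
Parameters: extra - (undocumented)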