org.apache.spark.mllib.optimization

LogisticGradient

class LogisticGradient extends Gradient

:: DeveloperApi :: Compute gradient and loss for a multinomial logistic loss function, as used in multi-class classification (it is also used in binary logistic regression).

In The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (available for download at http://statweb.stanford.edu/~tibs/ElemStatLearn/), Eq. (4.17) on page 119 gives the formula for the multinomial logistic regression model. A simple calculation shows that

P(y=0|x, w) = 1 / (1 + \sum_{i=0}^{K-2} \exp(x w_i))
P(y=1|x, w) = \exp(x w_0) / (1 + \sum_{i=0}^{K-2} \exp(x w_i))
  ...
P(y=K-1|x, w) = \exp(x w_{K-2}) / (1 + \sum_{i=0}^{K-2} \exp(x w_i))

for a K-class classification problem, with class 0 as the reference class.

Here w_i is the weight vector of class i+1, so the model weights w = (w_0, w_1, ..., w_{K-2})^T form a matrix of dimension (K-1) * (N+1) when intercepts are added (N being the number of features), or (K-1) * N without intercepts.
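A minimal Scala sketch of the probabilities above, assuming a hypothetical flattened layout for the weight matrix: row-major, row r holding w_r (the weights of class r + 1), with the intercept modeled by appending a constant 1.0 feature to x. The helper names (numWeights, withIntercept, margins, probabilities) are illustrative only and are not part of the Spark API; this version applies no overflow protection.

```scala
// Sketch only: P(y = 0 | x, w), ..., P(y = K-1 | x, w) from a flattened weight vector.
// Assumed (hypothetical) layout: row-major, row r holds w_r (class r + 1);
// an intercept is modeled by appending a constant 1.0 feature to x.
object MlorProbabilities {
  /** Number of weights: (K-1) * (N+1) with intercept, (K-1) * N without. */
  def numWeights(numClasses: Int, numFeatures: Int, addIntercept: Boolean): Int =
    (numClasses - 1) * (if (addIntercept) numFeatures + 1 else numFeatures)

  /** Append the constant intercept feature. */
  def withIntercept(x: Array[Double]): Array[Double] = x :+ 1.0

  /** margins(r) = x w_r, reading row r of the flattened (K-1) x x.length matrix. */
  def margins(flatWeights: Array[Double], x: Array[Double], numClasses: Int): Array[Double] = {
    val n = x.length
    require(flatWeights.length == (numClasses - 1) * n, "weight vector has the wrong size")
    Array.tabulate(numClasses - 1)(r => (0 until n).map(j => flatWeights(r * n + j) * x(j)).sum)
  }

  /** Naive probabilities; the reference class 0 has numerator exp(0) = 1. */
  def probabilities(flatWeights: Array[Double], x: Array[Double], numClasses: Int): Array[Double] = {
    val expMargins = margins(flatWeights, x, numClasses).map(math.exp)
    val denom = 1.0 + expMargins.sum
    (1.0 +: expMargins).map(_ / denom)
  }
}
```

For example, with K = 3 and two features plus an intercept, the flattened weight vector has numWeights(3, 2, addIntercept = true) = 6 entries.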

As a result, the loss of the objective function for a single data instance can be written as

l(w, x) = -\log P(y|x, w) = -\alpha(y) \log P(y=0|x, w) - (1-\alpha(y)) \log P(y|x, w)
= \log(1 + \sum_{i=0}^{K-2} \exp(x w_i)) - (1-\alpha(y)) x w_{y-1}
= \log(1 + \sum_{i=0}^{K-2} \exp(margins_i)) - (1-\alpha(y)) margins_{y-1}

where \alpha(i) = 1 if i == 0 and \alpha(i) = 0 if i != 0, and margins_i = x w_i for i = 0, ..., K-2. When y = 0 the factor (1-\alpha(y)) vanishes, so the out-of-range index y-1 never contributes.
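The loss can be checked numerically with a short sketch. It takes the margins directly (margins(i) = x w_i, the margin of class i + 1) and uses \alpha(y) = 1 only for the reference class y = 0. The object and method names are hypothetical, and this naive formula can overflow for large margins.

```scala
// Sketch: l(w, x) = log(1 + sum_i exp(margins_i)) - (1 - alpha(y)) * margins_{y-1}.
// margins(i) = x w_i is the margin of class i + 1; naive form, may overflow.
object MlorLoss {
  /** alpha(y) = 1 for the reference class y = 0, and 0 otherwise. */
  def alpha(y: Int): Double = if (y == 0) 1.0 else 0.0

  def loss(margins: Array[Double], y: Int): Double = {
    val logNormalizer = math.log(1.0 + margins.map(math.exp).sum)
    // margins(y - 1) is only read when y > 0, i.e. when (1 - alpha(y)) == 1
    val marginY = if (y == 0) 0.0 else margins(y - 1)
    logNormalizer - (1.0 - alpha(y)) * marginY
  }
}
```

For every label y this equals -log P(y|x, w) computed from the probabilities directly.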

For optimization, we need the first derivative of the loss function, and a simple calculation shows that

\frac{\partial l(w, x)}{\partial w_{ij}}
= (\exp(x w_i) / (1 + \sum_{k=0}^{K-2} \exp(x w_k)) - (1-\alpha(y)) \delta_{y, i+1}) * x_j
= multiplier_i * x_j

where w_{ij} is the j-th component of w_i, x_j is the j-th feature of x, \delta_{i, j} = 1 if i == j and \delta_{i, j} = 0 if i != j, and multiplier_i = \exp(margins_i) / (1 + \sum_{k=0}^{K-2} \exp(margins_k)) - (1-\alpha(y)) \delta_{y, i+1}.
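To sanity-check the derivative, the following sketch compares the analytic gradient multiplier_i * x_j with central finite differences of the loss. It assumes weights stored as K-1 rows, row i holding w_i (class i + 1); the names are hypothetical and the naive, unshifted formulas are used.

```scala
// Sketch: analytic gradient dl/dw_{ij} = multiplier_i * x_j (naive, unshifted form).
// Row w(i) holds w_i, the weight vector of class i + 1.
object MlorGradient {
  def margins(w: Array[Array[Double]], x: Array[Double]): Array[Double] =
    w.map(wi => wi.indices.map(j => wi(j) * x(j)).sum)

  def loss(w: Array[Array[Double]], x: Array[Double], y: Int): Double = {
    val m = margins(w, x)
    math.log(1.0 + m.map(math.exp).sum) - (if (y == 0) 0.0 else m(y - 1))
  }

  def gradient(w: Array[Array[Double]], x: Array[Double], y: Int): Array[Array[Double]] = {
    val m = margins(w, x)
    val denom = 1.0 + m.map(math.exp).sum
    Array.tabulate(w.length) { i =>
      // (1 - alpha(y)) * delta_{y, i+1} is 1 only when y > 0 and y == i + 1
      val indicator = if (y == i + 1) 1.0 else 0.0
      val multiplier = math.exp(m(i)) / denom - indicator
      x.map(_ * multiplier) // row i of the gradient: multiplier_i * x_j for each j
    }
  }
}
```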

If any margin exceeds about 709.78 (the natural logarithm of the largest double), computing the multiplier and the loss directly overflows, because \exp(margins_i) is no longer a finite double. This happens when the data contain outliers far away from the hyperplane, and training fails as soon as infinity / infinity (NaN) is introduced. Note that this is only a concern when max(margins) > 0.
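The overflow threshold is easy to reproduce. In this sketch (hypothetical names), math.exp(710.0) is already infinite, so a naive binary multiplier for a positive label, exp(m) / (1 + exp(m)) - 1, becomes infinity / infinity, i.e. NaN, while dividing numerator and denominator by exp(m) first keeps it finite.

```scala
// Sketch: reproduce the exp overflow that breaks the naive multiplier.
object OverflowDemo {
  // ln(Double.MaxValue) ~ 709.78: beyond roughly this argument, math.exp returns Infinity.
  val limit: Double = math.log(Double.MaxValue)

  // Naive multiplier for K = 2 and label y = 1: exp(m) / (1 + exp(m)) - 1.
  def naiveMultiplier(margin: Double): Double =
    math.exp(margin) / (1.0 + math.exp(margin)) - 1.0

  // Same quantity with numerator and denominator divided by exp(m): 1 / (exp(-m) + 1) - 1.
  def shiftedMultiplier(margin: Double): Double =
    1.0 / (math.exp(-margin) + 1.0) - 1.0
}
```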

Fortunately, when max(margins) = maxMargin > 0, the loss function and the multiplier can easily be rewritten into the following equivalent, numerically stable form.

l(w, x) = \log(1 + \sum_{i=0}^{K-2} \exp(margins_i)) - (1-\alpha(y)) margins_{y-1}
= \log(\exp(-maxMargin) + \sum_{i=0}^{K-2} \exp(margins_i - maxMargin)) + maxMargin
  - (1-\alpha(y)) margins_{y-1}
= \log(1 + sum) + maxMargin - (1-\alpha(y)) margins_{y-1}

where sum = \exp(-maxMargin) + \sum_{i=0}^{K-2} \exp(margins_i - maxMargin) - 1.

Note that every exponent in this formula, -maxMargin as well as each (margins_i - maxMargin), is at most zero, so every exponential lies in (0, 1] and overflow cannot happen.
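A minimal sketch of the stable loss next to the naive one, assuming margins(i) = x w_i and hypothetical object and method names. It uses math.log1p(sum) for log(1 + sum), the same quantity computed more accurately when sum is small; when maxMargin <= 0 it keeps the unshifted formula, since no overflow is possible there.

```scala
// Sketch: naive vs. numerically stable loss; margins(i) = x w_i (class i + 1).
object StableLoss {
  def naiveLoss(margins: Array[Double], y: Int): Double =
    math.log(1.0 + margins.map(math.exp).sum) - (if (y == 0) 0.0 else margins(y - 1))

  def stableLoss(margins: Array[Double], y: Int): Double = {
    val maxMargin = margins.max
    val marginY = if (y == 0) 0.0 else margins(y - 1)
    if (maxMargin > 0) {
      // sum = exp(-maxMargin) + sum_i exp(margins_i - maxMargin) - 1; every exponent is <= 0
      val sum = math.exp(-maxMargin) + margins.map(m => math.exp(m - maxMargin)).sum - 1.0
      math.log1p(sum) + maxMargin - marginY // log1p(sum) = log(1 + sum)
    } else {
      math.log1p(margins.map(math.exp).sum) - marginY
    }
  }
}
```

With margins (800, 10), the naive loss is Infinity while the stable one returns 0 for y = 1 and 800 for y = 0.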

The same trick applies to the multiplier:

multiplier_i = \exp(margins_i) / (1 + \sum_{k=0}^{K-2} \exp(margins_k)) - (1-\alpha(y)) \delta_{y, i+1}
= \exp(margins_i - maxMargin) / (1 + sum) - (1-\alpha(y)) \delta_{y, i+1}

where every exponent is again at most zero, so overflow is not a concern.
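The same idea for the multiplier, as a sketch with hypothetical names: shift every exponent by maxMargin when it is positive (and by 0 otherwise). The denominator exp(-shift) + \sum_k exp(margins_k - shift) then equals 1 + sum in the shifted case and 1 + \sum_k exp(margins_k) in the unshifted one.

```scala
// Sketch: stable multiplier_i for i = 0 until K-1 (index i is class i + 1).
object StableMultiplier {
  def multipliers(margins: Array[Double], y: Int): Array[Double] = {
    val maxMargin = margins.max
    val shift = if (maxMargin > 0) maxMargin else 0.0
    // shift = maxMargin: denominator is 1 + sum (all exponents <= 0)
    // shift = 0:         denominator is the unshifted 1 + sum_k exp(margins_k)
    val denom = math.exp(-shift) + margins.map(m => math.exp(m - shift)).sum
    Array.tabulate(margins.length) { i =>
      val indicator = if (y == i + 1) 1.0 else 0.0 // (1 - alpha(y)) * delta_{y, i+1}
      math.exp(margins(i) - shift) / denom - indicator
    }
  }
}
```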

For the detailed mathematical derivation, see the reference at http://www.slideshare.net/dbtsai/2014-0620-mlor-36132297

Annotations
@DeveloperApi()
Source
Gradient.scala
Linear Supertypes
Gradient, Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new LogisticGradient()

  2. new LogisticGradient(numClasses: Int)

    numClasses

    the number of possible outcomes for a k-class classification problem in multinomial logistic regression. By default (new LogisticGradient()), it performs binary logistic regression, so numClasses is set to 2.
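With numClasses = 2 there is a single margin m = x w_0, and the multiplier reduces to the familiar binary form sigmoid(m) - y. The sketch below (hypothetical names, naive formulas) checks that the general K-class multiplier agrees with that special case.

```scala
// Sketch: the K-class multiplier with K = 2 equals the binary logistic form sigmoid(m) - y.
object BinaryCase {
  /** General naive multiplier; index i is class i + 1. */
  def multipliers(margins: Array[Double], y: Int): Array[Double] = {
    val denom = 1.0 + margins.map(math.exp).sum
    Array.tabulate(margins.length)(i => math.exp(margins(i)) / denom - (if (y == i + 1) 1.0 else 0.0))
  }

  /** Binary logistic regression: exp(m) / (1 + exp(m)) = 1 / (1 + exp(-m)). */
  def binaryMultiplier(margin: Double, y: Int): Double = 1.0 / (1.0 + math.exp(-margin)) - y
}
```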

Value Members

  1. final def !=(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  2. final def !=(arg0: Any): Boolean

    Definition Classes
    Any
  3. final def ##(): Int

    Definition Classes
    AnyRef → Any
  4. final def ==(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  5. final def ==(arg0: Any): Boolean

    Definition Classes
    Any
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. def compute(data: Vector, label: Double, weights: Vector, cumGradient: Vector): Double

    Compute the gradient and loss given the features of a single data point, add the gradient to a provided vector to avoid creating new objects, and return loss.

    data

    features for one data point

    label

    label for this data point

    weights

    weights/coefficients corresponding to features

    cumGradient

    the computed gradient will be added to this vector

    returns

    loss

    Definition Classes
    LogisticGradient → Gradient
  9. def compute(data: Vector, label: Double, weights: Vector): (Vector, Double)

    Compute the gradient and loss given the features of a single data point.

    data

    features for one data point

    label

    label for this data point

    weights

    weights/coefficients corresponding to features

    returns

    (gradient: Vector, loss: Double)

    Definition Classes
    LogisticGradient → Gradient
  10. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  11. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  12. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  13. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. def toString(): String

    Definition Classes
    AnyRef → Any
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
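The two compute overloads listed above follow a common pattern: the in-place version accumulates into a caller-supplied gradient so that summing over many data points allocates nothing, and the tuple version can be expressed by allocating a zero gradient and delegating. The sketch below is a hypothetical, array-based analogue (plain Array[Double] instead of Spark's Vector, with numClasses passed explicitly); it assumes a row-major flattened weight layout with row r holding w_r (class r + 1) and applies the maxMargin shift from the stable formulas.

```scala
// Hypothetical array-based analogue of the two compute overloads (not Spark's Vector API).
// Weights: row-major, (numClasses - 1) rows of data.length entries; row r holds w_r (class r + 1).
object ComputeSketch {
  /** Adds this point's gradient into cumGradient (no new allocation) and returns the loss. */
  def compute(data: Array[Double], label: Double, weights: Array[Double],
              cumGradient: Array[Double], numClasses: Int): Double = {
    val n = data.length
    val margins = Array.tabulate(numClasses - 1)(r => (0 until n).map(j => weights(r * n + j) * data(j)).sum)
    // Shift by maxMargin only when it is positive, as in the numerically stable form.
    val shift = math.max(margins.max, 0.0)
    // denom = exp(-shift) * (1 + sum_k exp(margins_k)); every exponent is <= 0
    val denom = math.exp(-shift) + margins.map(m => math.exp(m - shift)).sum
    val y = label.toInt
    for (r <- margins.indices) {
      val multiplier = math.exp(margins(r) - shift) / denom - (if (y == r + 1) 1.0 else 0.0)
      for (j <- 0 until n) cumGradient(r * n + j) += multiplier * data(j)
    }
    val marginY = if (y == 0) 0.0 else margins(y - 1)
    math.log(denom) + shift - marginY // = log(1 + sum_k exp(margins_k)) - margin of the true class
  }

  /** Tuple-returning variant: allocate a zero gradient, then delegate to the in-place version. */
  def compute(data: Array[Double], label: Double, weights: Array[Double],
              numClasses: Int): (Array[Double], Double) = {
    val gradient = new Array[Double](weights.length)
    val loss = compute(data, label, weights, gradient, numClasses)
    (gradient, loss)
  }
}
```

Calling the in-place version twice on the same point leaves twice the single-point gradient in the accumulator, which is exactly what an optimizer summing over a mini-batch relies on.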
