FURP-Driven Insights: `Focal Loss`
Ch'i YU

Topic: Focal Loss for Multi-Class Classification

Focal Loss: designed to address class imbalance by down-weighting easy examples so that, even when they are numerous, they do not dominate the loss.

https://doi.org/10.48550/arXiv.1708.02002

Definition

Basically, Focal Loss is an advanced version of alpha-balanced Cross Entropy Loss:

FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t)

Where:

  • p_t refers to the predicted probability of the corresponding category (class).

    For Binary Classification, p_t is usually calculated with sigmoid;

    For Multi-Class Classification, p_t is usually calculated with softmax.

  • α_t eases the class imbalance problem. α = 0.25 by default.

  • γ controls how sharply easy and hard samples are separated, and in particular decreases the loss of easy samples. γ = 2 by default.
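To make the modulating factor (1 - p_t)^γ concrete, here is a minimal NumPy sketch comparing plain cross entropy with focal loss at the defaults above (the three probabilities are purely illustrative):

import numpy as np

alpha, gamma = 0.25, 2.0  # the paper's default values

for p_t in (0.95, 0.6, 0.1):  # an easy, a moderate, and a hard example
    ce = -np.log(p_t)                               # plain cross entropy
    fl = -alpha * (1 - p_t) ** gamma * np.log(p_t)  # focal loss
    print(f"p_t = {p_t:.2f}  CE = {ce:.4f}  FL = {fl:.6f}")

# p_t = 0.95 (easy): CE ≈ 0.0513, FL ≈ 0.000032 -- almost eliminated
# p_t = 0.10 (hard): CE ≈ 2.3026, FL ≈ 0.466353 -- largely preserved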

Background

In real-life scenarios, datasets often contain an imbalance between easy samples and hard samples, and the resulting imbalance between their losses can significantly impact model performance: models tend to prefer learning the easy samples and usually neglect the hard ones.

A simple Keras Implementation

Binary Classification

import keras.backend as K
import tensorflow as tf

def binary_focal_loss(gamma = 2, alpha = 0.25):
    """
    Definition:
        FL(p_t) = -alpha * (1 - p_t) ** gamma * log(p_t)

    Reference:
        https://doi.org/10.48550/arXiv.1708.02002

    Sample Usage:
        model.compile(loss = [binary_focal_loss(alpha = 0.25, gamma = 2)],
                      metrics = ["accuracy"],
                      optimizer = "adam")
    """

    alpha = tf.constant(alpha, dtype = tf.float32)
    gamma = tf.constant(gamma, dtype = tf.float32)

    def binary_focal_loss_fixed(y_true, y_pred):
        """
        Arguments:
            y_true.shape should be (None, 1);
            y_pred should be computed with `sigmoid`;
        """

        y_true = tf.cast(y_true, tf.float32)

        # alpha for positive samples, (1 - alpha) for negative samples
        alpha_t = y_true * alpha + (K.ones_like(y_true) - y_true) * (1 - alpha)

        # p_t: predicted probability of the true class; K.epsilon() guards against log(0)
        p_t = y_true * y_pred + (K.ones_like(y_true) - y_true) * (K.ones_like(y_true) - y_pred) + K.epsilon()

        # the modulating factor (1 - p_t) ** gamma down-weights easy samples
        focal_loss = - alpha_t * K.pow((K.ones_like(y_true) - p_t), gamma) * K.log(p_t)

        return K.mean(focal_loss)

    return binary_focal_loss_fixed
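A minimal usage sketch (the toy model and the random placeholder data below are hypothetical, just to show the wiring):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# a hypothetical toy binary classifier with a sigmoid output,
# matching the assumption in binary_focal_loss_fixed's docstring
model = Sequential([Dense(16, activation = "relu", input_shape = (8,)),
                    Dense(1, activation = "sigmoid")])

model.compile(loss = binary_focal_loss(gamma = 2, alpha = 0.25),
              optimizer = "adam",
              metrics = ["accuracy"])

# random placeholder data, purely for a smoke test
x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 2, size = (32, 1)).astype("float32")
model.fit(x, y, epochs = 1, verbose = 0)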

Multi-Class Classification

import keras.backend as K
import tensorflow as tf

def multi_focal_loss(alpha, gamma = 2.0):
    """
    Definition:
        FL(p_t) = -alpha * (1 - p_t) ** gamma * log(p_t)

    Reference:
        https://doi.org/10.48550/arXiv.1708.02002

    Sample Usage:
        model.compile(loss = [multi_focal_loss(alpha = [0.25, 0.5, 0.125], gamma = 2)],
                      metrics = ["accuracy"],
                      optimizer = "adam")

    Note:
        array alpha's size should be the same as the number of categories / classes.
    """

    epsilon = 1e-7
    alpha = tf.constant(alpha, dtype = tf.float32)
    gamma = tf.constant(gamma, dtype = tf.float32)

    def multi_focal_loss_fixed(y_true, y_pred):
        """
        Arguments:
            y_true.shape should be (None, num_classes), one-hot encoded;
            y_pred should be computed with `softmax`;
        """
        y_true = tf.cast(y_true, tf.float32)
        # clip predictions away from 0 and 1 to keep log() finite
        y_pred = tf.clip_by_value(y_pred, epsilon, 1 - epsilon)

        # y_t: per-entry probability assigned to the correct decision
        y_t = tf.multiply(y_true, y_pred) + tf.multiply(1 - y_true, 1 - y_pred)
        # plain cross entropy term
        ce = -tf.math.log(y_t)
        # the modulating factor (1 - y_t) ** gamma down-weights easy samples
        weight = tf.pow(tf.subtract(1., y_t), gamma)

        # per-class alpha weighting, summed over classes, averaged over the batch
        focal_losses = tf.reduce_sum(alpha * tf.multiply(weight, ce), axis = -1)
        focal_loss = tf.reduce_mean(focal_losses)

        return focal_loss

    return multi_focal_loss_fixed
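And a matching usage sketch for the multi-class loss (again, the 3-class toy model, the particular alpha values, and the random placeholder data are hypothetical):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

num_classes = 3
# a hypothetical toy classifier with a softmax output over 3 classes
model = Sequential([Dense(16, activation = "relu", input_shape = (8,)),
                    Dense(num_classes, activation = "softmax")])

# one alpha weight per class; these particular values are illustrative only
model.compile(loss = multi_focal_loss(alpha = [0.25, 0.5, 0.25], gamma = 2.0),
              optimizer = "adam",
              metrics = ["accuracy"])

# random placeholder data, purely for a smoke test
x = np.random.rand(32, 8).astype("float32")
y = to_categorical(np.random.randint(0, num_classes, size = (32,)), num_classes)
model.fit(x, y, epochs = 1, verbose = 0)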

Comments:

Sadly, we're planning to deprecate the application of focal loss within our project... We're more focused on ensemble learning methods in the future, including Random Forest, Gradient Boosting Decision Trees, and XGBoost classifiers, and we're not expecting a perfect outcome from Multi-Layer Perceptrons.

However, what I've learned about Keras backends and loss implementations is still a very fruitful experience for me ;)