Which metric is most appropriate for evaluating a classification model when the dataset has a severe class imbalance?
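
To make the question concrete, here is a minimal sketch of how accuracy can mislead under severe imbalance while precision, recall, and F1 expose the difference. The data are hypothetical (990 negatives, 10 positives), and the helper names `confusion_counts` and `summarize` are illustrative assumptions, not part of any library. Which metric is "most appropriate" still depends on the relative cost of false positives and false negatives.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for the positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced dataset: 990 negatives followed by 10 positives.
y_true = [0] * 990 + [1] * 10

# Trivial baseline that always predicts the majority (negative) class.
always_negative = [0] * 1000

# Hypothetical model: 20 false alarms on negatives, finds 8 of 10 positives.
imperfect = [0] * 970 + [1] * 20 + [1] * 8 + [0] * 2

baseline = summarize(y_true, always_negative)
model = summarize(y_true, imperfect)

print("always-negative:", baseline)
print("imperfect model:", model)
```

In this sketch the always-negative baseline reaches 0.99 accuracy with zero recall, while the imperfect model has slightly lower accuracy (0.978) but recovers 8 of the 10 positives. Accuracy alone ranks the useless baseline higher; recall and F1 do not.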