In machine learning, multi-label classification and the strongly related problem of multi-output classification are variants of the classification problem where multiple target labels must be assigned to each instance. Multi-label classification should not be confused with multiclass classification, which is the problem of categorizing instances into one of more than two classes. Formally, multi-label learning can be phrased as the problem of finding a model that maps inputs x to binary vectors y, rather than scalar outputs as in the ordinary classification problem.
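As a small illustration of the "binary vector" formulation, the sketch below encodes label sets as 0/1 indicator vectors. The helper name `to_indicator` and the example labels are hypothetical and chosen only for illustration.

```python
def to_indicator(label_sets, all_labels):
    """Encode each label set as a 0/1 vector ordered like all_labels."""
    return [
        [1 if label in labels else 0 for label in all_labels]
        for labels in label_sets
    ]


# Hypothetical label vocabulary and three instances with their label sets.
all_labels = ["news", "politics", "sports"]
label_sets = [{"news", "politics"}, {"sports"}, set()]

# Each row is the target vector y for one instance; an instance may carry
# several labels at once, or none at all.
Y = to_indicator(label_sets, all_labels)
```

In an ordinary multiclass problem each row would contain exactly one 1; in the multi-label setting any number of entries may be set.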
There are two main methods for tackling the multi-label classification problem:^{[1]} problem transformation methods and algorithm adaptation methods. Problem transformation methods transform the multi-label problem into one or more single-label classification problems (binary or multiclass), which can then be handled using standard single-label classifiers. Algorithm adaptation methods adapt existing learning algorithms to perform multi-label classification directly. In other words, rather than converting the problem to a simpler one, they address it in its full form.
Several problem transformation methods exist for multi-label classification; the baseline approach, called the binary relevance method,^{[2]}^{[1]} amounts to independently training one binary classifier for each label. Given an unseen sample, the combined model then predicts all labels for this sample for which the respective classifiers predict a positive result. This method of dividing the task into multiple binary tasks has something in common with the one-vs.-all (OvA, or one-vs.-rest, OvR) method for multiclass classification. Note though that it is not the same method: in binary relevance we train one classifier for each label, not one classifier for each possible value for the label.
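The binary relevance method described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the toy 1-nearest-neighbour base learner and the class names `NearestNeighbourBinary` and `BinaryRelevance` are assumptions made for the example, and any binary classifier could take the base learner's place.

```python
class NearestNeighbourBinary:
    """Toy 1-nearest-neighbour binary classifier (stand-in base learner)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        def sq_dist(a):
            return sum((ai - xi) ** 2 for ai, xi in zip(a, x))

        nearest = min(range(len(self.X)), key=lambda i: sq_dist(self.X[i]))
        return self.y[nearest]


class BinaryRelevance:
    """Train one independent binary classifier per label."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, X, Y):
        n_labels = len(Y[0])
        # Column j of Y is the binary target for the j-th classifier.
        self.models = [
            self.make_classifier().fit(X, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict_one(self, x):
        # The predicted label vector is the per-label predictions side by side.
        return [model.predict_one(x) for model in self.models]


# Hypothetical training data: two features, two labels.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

br = BinaryRelevance(NearestNeighbourBinary).fit(X, Y)
prediction = br.predict_one([0.9, 0.1])
```

Because each classifier is trained in isolation, this approach ignores any correlation between labels.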
Various other transformations exist. Of these, the label powerset (LP) transformation creates one binary classifier for every label combination attested in the training set.^{[1]} The random k-labelsets (RAKEL) algorithm uses multiple LP classifiers, each trained on a random subset of the actual labels; prediction using this ensemble method proceeds by a voting scheme.^{[3]}
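One way to read the label powerset transformation is that each distinct label combination seen in training becomes a class of its own, after which a single-label learner (for example one binary classifier per class) is trained on the class ids. The sketch below shows only the encoding step; the function name `label_powerset_encode` is hypothetical.

```python
def label_powerset_encode(Y):
    """Map each distinct label vector to an integer class id.

    Returns the class id of every row and a table for decoding a predicted
    class id back into a label vector.
    """
    class_of = {}
    classes = []
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)  # next unused id
        classes.append(class_of[key])
    decode = {cid: list(key) for key, cid in class_of.items()}
    return classes, decode


# Hypothetical label matrix with three labels and four training examples.
Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]]
classes, decode = label_powerset_encode(Y)
```

Only combinations that occur in the training set receive a class, so a model trained this way cannot predict a label combination it has never seen.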
Classifier chains are an alternative ensembling method^{[2]} that has been applied, for instance, in HIV drug resistance prediction.^{[4]}
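In a classifier chain, as the method is usually described, the classifier for each label receives the original features together with the values of the labels earlier in the chain. The following is a rough sketch under that assumption; the toy nearest-neighbour base learner and the class names are hypothetical, and the label order is simply the column order of the label matrix.

```python
class NearestNeighbourBinary:
    """Toy 1-nearest-neighbour binary classifier (stand-in base learner)."""

    def fit(self, X, y):
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        def sq_dist(a):
            return sum((ai - xi) ** 2 for ai, xi in zip(a, x))

        nearest = min(range(len(self.X)), key=lambda i: sq_dist(self.X[i]))
        return self.y[nearest]


class ClassifierChain:
    """One binary classifier per label, each fed the earlier labels as features."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, X, Y):
        self.models = []
        for j in range(len(Y[0])):
            # Training features for label j: inputs plus true labels 0..j-1.
            X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
            target = [y[j] for y in Y]
            self.models.append(self.make_classifier().fit(X_aug, target))
        return self

    def predict_one(self, x):
        predicted = []
        for model in self.models:
            # At prediction time the earlier labels are the chain's own outputs.
            predicted.append(model.predict_one(list(x) + predicted))
        return predicted


# Hypothetical data in which the second label depends on the first.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0], [0, 1], [1, 1], [1, 0]]

chain = ClassifierChain(NearestNeighbourBinary).fit(X, Y)
prediction = chain.predict_one([0.9, 0.1])
```

Unlike binary relevance, later classifiers in the chain can exploit dependencies on earlier labels, at the cost of making the result sensitive to the chosen label order.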
Some classification algorithms/models have been adapted to the multi-label task, without requiring problem transformations. Examples of these include:
The extent to which a dataset is multi-label can be captured in two statistics:^{[1]}
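Assuming the two statistics meant here are label cardinality (the average number of labels per example) and label density (cardinality divided by the total number of labels), which is how they are commonly defined, a minimal sketch of computing them from a 0/1 label matrix is:

```python
def label_cardinality(Y):
    """Average number of labels assigned per example."""
    return sum(sum(row) for row in Y) / len(Y)


def label_density(Y):
    """Label cardinality divided by the total number of labels."""
    return label_cardinality(Y) / len(Y[0])


# Hypothetical label matrix: four examples, four possible labels.
Y = [
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]

cardinality = label_cardinality(Y)  # (2 + 1 + 3 + 0) / 4
density = label_density(Y)          # cardinality / 4
```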
Evaluation metrics for multi-label classification differ from those used in multiclass (or binary) classification, because a prediction is a set of labels and can therefore be partially correct. If T denotes the true set of labels for a given sample, and P the predicted set of labels, then the following metrics can be defined on that sample:
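A few set-based metrics that are commonly defined on T and P are sketched below: the Jaccard index, precision, recall, F1, Hamming loss and exact match. The function names are hypothetical, and the values returned when a set is empty are conventions chosen for this sketch rather than part of any standard definition.

```python
def jaccard(T, P):
    """|T ∩ P| / |T ∪ P|; taken as 1.0 when both sets are empty (assumption)."""
    union = T | P
    return len(T & P) / len(union) if union else 1.0


def precision(T, P):
    """Fraction of predicted labels that are correct; 0.0 if P is empty (assumption)."""
    return len(T & P) / len(P) if P else 0.0


def recall(T, P):
    """Fraction of true labels that were predicted; 0.0 if T is empty (assumption)."""
    return len(T & P) / len(T) if T else 0.0


def f1(T, P):
    """Harmonic mean of precision and recall, written directly in set sizes."""
    total = len(T) + len(P)
    return 2 * len(T & P) / total if total else 0.0


def hamming_loss(T, P, n_labels):
    """Fraction of the n_labels possible labels on which T and P disagree."""
    return len(T ^ P) / n_labels


def exact_match(T, P):
    """True only if the predicted set equals the true set."""
    return T == P


# Hypothetical sample drawn from a vocabulary of five labels.
T = {"a", "b", "c"}
P = {"b", "c", "d"}

scores = {
    "jaccard": jaccard(T, P),
    "precision": precision(T, P),
    "recall": recall(T, P),
    "f1": f1(T, P),
    "hamming_loss": hamming_loss(T, P, n_labels=5),
    "exact_match": exact_match(T, P),
}
```

Exact match gives no credit for a partially correct prediction, whereas the other metrics do.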
Cross-validation in multi-label settings is complicated by the fact that the ordinary (binary/multiclass) way of stratified sampling will not work; alternative ways of approximate stratified sampling have been suggested.^{[9]}
Java implementations of multi-label algorithms are available in the Mulan and Meka software packages, both based on Weka.
The scikit-learn Python package implements some multi-label algorithms and metrics.
A list of commonly used multi-label datasets is available at the Mulan website.