Using Class Based Document Frequency to Select Features

Document Frequency (DF) is reported to be a simple yet quite effective measure for feature selection in text classification, which is a key step in processing big textual data collections. The calculation is based on how many documents in a collection contain a feature, which can be a word, a phrase, an n-gram, or a specially derived attribute. It is an unsupervised, class-independent metric. Features with the same DF value may be distributed quite differently across categories, and thus differ in their discriminative power. For example, in a binary classification problem, suppose feature A appears in only one category, while feature B, which has the same DF value as feature A, is evenly distributed across both categories. Feature A is then clearly more effective than feature B for classification, yet DF cannot tell the two apart. To overcome this weakness of the original document frequency feature selection metric, we propose a class based document frequency strategy that refines the original DF. Extensive experiments on three text classification datasets demonstrate the effectiveness of the proposed measures.
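The feature A versus feature B example can be made concrete with a minimal sketch. It computes plain DF and, for comparison, a per-class breakdown of the same counts. The function names, the toy corpus, and the labels `pos` and `neg` are assumptions made for illustration. The per-class count shown here is only the raw ingredient that a class based measure could build on; it is not the specific measure proposed in this work, which the abstract does not define.

```python
from collections import Counter, defaultdict


def document_frequency(docs):
    """Count, for each feature, how many documents contain it at least once."""
    df = Counter()
    for tokens in docs:
        # set() ensures repeated occurrences inside one document count once
        df.update(set(tokens))
    return df


def class_document_frequency(docs, labels):
    """Count, for each feature, how many documents of each class contain it."""
    per_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for feature in set(tokens):
            per_class[feature][label] += 1
    return per_class


# Hypothetical toy corpus: four tokenized documents in two categories.
docs = [
    ["A", "x"],
    ["A", "B"],
    ["B", "y"],
    ["z"],
]
labels = ["pos", "pos", "neg", "neg"]

df = document_frequency(docs)
cdf = class_document_frequency(docs, labels)

# A and B share the same DF, so plain DF ranks them equally.
print(df["A"], df["B"])
# The per-class counts differ: A occurs only in "pos", B is split evenly.
print(dict(cdf["A"]), dict(cdf["B"]))
```

In this toy corpus both features occur in two documents, so a DF threshold would keep or discard them together, even though only A separates the two categories.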