Article-11_12-2018

APPLICATION OF STATISTICALLY SIMILAR METHODS TO IMPROVE THE RELIABILITY OF OBJECT CLASSIFICATION

Veniamin N. Tarasov, Povolzhskiy State University of Telecommunications and Informatics, Samara, Russia, tarasov-vn@psuti.ru
Ekaterina M. Mezentseva, Povolzhskiy State University of Telecommunications and Informatics, Samara, Russia, katya-mem@mail.ru
Sergey V. Malakhov, Povolzhskiy State University of Telecommunications and Informatics, Samara, Russia, malakhov-sv@psuti.ru

Abstract
This article presents an algorithm for partitioning a set of objects into a finite set of classes (categories). The task is to determine whether an object belongs to one of the pre-selected classes by analyzing all of the features that characterize the object. To solve the problem, we consider three categories (classes) to which objects are assigned during classification. The third is the category of "undefined objects", which the classifier did not recognize. The article proposes the simultaneous use of two statistically similar data analysis methods from parametric statistics: the Bayes method and the Fisher method. A mathematical description is given of the Bayesian classifier, which is based on so-called joint probabilities, and of the Fisher method. The prior and posterior probabilities, the prior odds, and the combined probabilities that objects belong to the given classes are calculated. For the software implementation of the Fisher method, a Gauss quadrature formula with 15 nodes was applied. Based on the results of testing the developed filter, the optimal decision thresholds for these classification methods are set. The initial training of the classifier is described, and it is argued that training should continue throughout the classifier's entire life cycle. Optimality criteria are presented for classifying messages with statistical methods, taking into account errors of the first and second kinds. In this article, the optimality criterion for assessing the quality of classifier training is the maximum value of the measure of proximity of two sets S_B and S_F, that is, the absolute measure N(S_B ∩ S_F), the number of objects common to both sets. An algorithm for the software implementation of the intersection of two sets is given.
Experimental results are described that evaluate the performance of message filtering algorithms based on the Bayes and Fisher methods, applied separately and in combination, as well as the performance of the combined filter. The described method of organizing a combined classifier can be used in many areas, including information technology, telecommunications, medicine, and biology.
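
The following Python sketch illustrates three steps named in the abstract: combining probabilities with the Fisher method, a three-way decision with two thresholds, and the count of objects common to two sets. It is an illustration under stated assumptions, not the authors' implementation. The function names, the class labels, and the threshold values 0.3 and 0.7 are hypothetical, and the sets s_b and s_f are assumed to hold the objects that the Bayes and Fisher classifiers assign to the same class. The abstract reports a 15-node Gauss quadrature for the Fisher method; this sketch instead uses the closed-form chi-square tail that exists when the number of degrees of freedom is even.

```python
import math


def chi2_survival_even(x, dof):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    m = x / 2.0
    term = math.exp(-m)  # i = 0 term of exp(-m) * sum(m**i / i!)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_combine(probs):
    """Fisher's method: combine n probabilities, each in (0, 1], into one value."""
    statistic = -2.0 * sum(math.log(p) for p in probs)
    return chi2_survival_even(statistic, 2 * len(probs))


def classify(score, low=0.3, high=0.7):
    """Three-way decision; `low` and `high` are hypothetical thresholds."""
    if score >= high:
        return "class_1"
    if score <= low:
        return "class_2"
    return "undefined"  # the third category: objects the classifier did not recognize


def common_objects(s_b, s_f):
    """Absolute proximity measure: number of objects present in both sets."""
    return len(set(s_b) & set(s_f))


# Hypothetical example data: identifiers of objects assigned to one class.
s_b = {"msg1", "msg2", "msg4"}
s_f = {"msg2", "msg3", "msg4"}
print(common_objects(s_b, s_f))  # 2
```

Under this reading, a larger value of common_objects means that the two classifiers agree on more objects.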

Keywords: object categorization, probability theory, classification algorithms, Bayesian classifier, Fisher method, prior probability, posterior probability, decision thresholds, subsets of the intersection of sets, combined classifier.

Information about authors:
Veniamin N. Tarasov, Povolzhskiy State University of Telecommunications and Informatics, Professor, Software and Management in Technical Systems Department, Samara, Russia
Ekaterina M. Mezentseva, Povolzhskiy State University of Telecommunications and Informatics, Assistant Professor, Software and Management in Technical Systems Department, Samara, Russia
Sergey V. Malakhov, Povolzhskiy State University of Telecommunications and Informatics, Assistant Professor, Software and Management in Technical Systems Department, Samara, Russia