article-T-Comm-7-10-2019

THE ONLINE CLASSIFICATION OF THE MOBILE APPLICATIONS TRAFFIC USING DATA MINING TECHNIQUES

Oleg I. Sheluhin, Moscow Technical University of Communications and Informatics, Moscow, Russia, sheluhin@mail.ru
Viacheslav V. Barkov, Moscow Technical University of Communications and Informatics, Moscow, Russia, viacheslav.barkov@gmail.com
Sergey A. Sekretarev, Moscow Technical University of Communications and Informatics, Moscow, Russia, svoboda.vezde@gmail.com

 

Abstract
The article describes real-time classification of mobile application traffic using the Adaptive Random Forest (ARF), Hoeffding Adaptive Tree, K nearest neighbors, and Oza Bagging algorithms. Two operating modes are compared: with "limited" and with "unlimited" memory. The study analyzes experimentally collected traffic of six popular mobile applications, with about 5,000 TCP connections per application and varying distributions of traffic intensity. Two cases are considered: an even, continuous traffic flow, and a flow in which the analyzed connections arrive unevenly. The Adaptive Random Forest algorithm showed the best quality metrics for both even and uneven arrival of the classified applications. ARF also considerably surpasses the Hoeffding Adaptive Tree, K nearest neighbors, and Oza Bagging algorithms in speed. For uneven traffic flow, the best quality metrics are obtained in the accumulation mode with a fixed window size.

Keywords: machine learning, applications, accumulation mode, Adaptive Random Forest, Hoeffding Adaptive Tree, K nearest neighbors, Oza Bagging, classification, online, data streams.
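
The abstract does not state how the two memory modes were implemented. As a minimal sketch, assuming that "limited" memory means a fixed-size sliding window of recent flows and "unlimited" memory means keeping every flow seen so far, the plain-Python K nearest neighbors classifier below is evaluated in test-then-train order (each flow is classified first, then used for learning). The class `SlidingWindowKNN`, the helper functions, and the synthetic two-feature data are hypothetical illustrations, not the authors' implementation or data set.

```python
import math
import random
from collections import Counter, deque


class SlidingWindowKNN:
    """k-nearest-neighbours classifier over a buffer of recent samples."""

    def __init__(self, k=3, window_size=200):
        self.k = k
        # Assumption: window_size=None models "unlimited" memory
        # (the buffer never evicts); an integer models "limited" memory.
        self.window = deque(maxlen=window_size)

    def predict(self, x):
        if not self.window:
            return None  # nothing learned yet
        # Sort stored samples by Euclidean distance and keep the k closest.
        nearest = sorted(self.window, key=lambda s: math.dist(s[0], x))[: self.k]
        # Majority vote among the k closest labels.
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    def learn(self, x, y):
        # With a bounded deque, the oldest sample is dropped automatically.
        self.window.append((x, y))


def prequential_accuracy(model, stream):
    """Test-then-train: predict each sample first, then learn from it."""
    correct = total = 0
    for x, y in stream:
        if model.predict(x) == y:
            correct += 1
        total += 1
        model.learn(x, y)
    return correct / total if total else 0.0


def synthetic_flows(n, seed=0):
    """Hypothetical two-feature flows for two made-up application classes."""
    rng = random.Random(seed)
    centres = {"app_a": (0.0, 0.0), "app_b": (5.0, 5.0)}
    for _ in range(n):
        label = rng.choice(sorted(centres))
        cx, cy = centres[label]
        yield (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label


limited = SlidingWindowKNN(k=3, window_size=100)
unlimited = SlidingWindowKNN(k=3, window_size=None)
acc_limited = prequential_accuracy(limited, synthetic_flows(500))
acc_unlimited = prequential_accuracy(unlimited, synthetic_flows(500))
print(f"limited: {acc_limited:.3f}, unlimited: {acc_unlimited:.3f}")
```

In this sketch the limited-memory model never stores more than 100 flows, so its prediction cost stays bounded, while the unlimited-memory model grows with the stream. The synthetic classes are well separated, so the example says nothing about the accuracy figures reported in the article.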

Information about authors:
Oleg I. Sheluhin, Professor, Doctor of Technical Sciences, Head of the Information Security Department, Moscow Technical University of Communications and Informatics, Moscow, Russia
Viacheslav V. Barkov, Senior Lecturer, Information Security Department, Moscow Technical University of Communications and Informatics, Moscow, Russia
Sergey A. Sekretarev, Master's student, Information Security Department, Moscow Technical University of Communications and Informatics, Moscow, Russia