T-Comm_Article 4_6_2021

CONSTRUCTION OF EXPERT-STATISTICAL MODELS FROM INCOMPLETE DATA

Sergey I. Noskov, Irkutsk State Transport University, Irkutsk, Russia, sergey.noskov.57@mail.ru

Abstract
The article addresses the problem of constructing a linear regression model from incomplete data (data containing gaps) using both statistical and expert information. Gaps can arise, in particular, from a temporary malfunction or failure of measuring equipment while technical characteristics are being recorded, or from negligence by statistical services when reporting indicators are recorded. Gaps are also common in sociological questionnaire data, when a respondent declines to answer a specific question (while answering others) or gives an inadmissible, for example evasive, answer. The proposed approach fills each gap with an interval whose boundaries are set by experts, who rely both on their experience and knowledge of the object under study and on well-known methods of point gap filling. Depending on the nature of the initial uncertainty in the data, estimating the model parameters then reduces to solving a linear programming problem or a partially Boolean linear programming problem. The case is also considered in which the interval system of linear algebraic equations that formalizes the uncertainty in the initial data has a non-unique solution. As an application, a linear regression equation is constructed that describes how the volume of exports of large-tonnage containers and the freight turnover of PRC railway transport affect the volume of imports of large-tonnage containers at the Zabaikalsk-Manchuria railway checkpoint.

Keywords: regression model, data uncertainty, gaps, interval system of linear algebraic equations, quasi-solutions, parameter estimates.
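
The gap-filling step described in the abstract can be illustrated with a minimal sketch. It is not the author's procedure: the function name `gap_interval`, the choice of three point-fill methods (column mean, column median, average of the nearest observed neighbours), and the `widen` margin are all assumptions made for illustration. The sketch only proposes a starting interval as the envelope of several point fills; in the approach described above, the final boundaries are set by experts.

```python
from statistics import mean, median


def gap_interval(column, index, widen=0.0):
    """Propose a candidate interval for the gap at column[index].

    Gaps are marked with None. The interval is the envelope (min, max) of
    three simple point fills, optionally widened on both sides by `widen`.
    This rule is an illustrative assumption, not the article's method.
    """
    if column[index] is not None:
        raise ValueError("column[index] is not a gap")
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")

    # Nearest observed value to the left and to the right of the gap.
    left = next((column[i] for i in range(index - 1, -1, -1)
                 if column[i] is not None), None)
    right = next((column[i] for i in range(index + 1, len(column))
                  if column[i] is not None), None)
    neighbours = [v for v in (left, right) if v is not None]

    # Three point fills: column mean, column median, neighbour average.
    fills = [mean(observed), median(observed), mean(neighbours)]
    return min(fills) - widen, max(fills) + widen


# Hypothetical data: one gap in the third position.
column = [10.0, 12.0, None, 13.0, 25.0]
print(gap_interval(column, 2))             # (12.5, 15.0)
print(gap_interval(column, 2, widen=0.5))  # (12.0, 15.5)
```

An interval produced this way would then replace the missing value in the data table, so that the corresponding row contributes interval constraints, not a single point, to the parameter estimation problem.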

References

1. S.I. Noskov (1996). A technology for modeling objects with unstable functioning and uncertainty in data. Irkutsk: Oblinformpechat. 320 p.
2. S.I. Noskov, V.D. Toropov (2005). Formation of initial information and identification of parameters of an expert model of statistical type. Modern technologies. System analysis. Modeling. No. 4. P. 103-106.
3. V.B. Golovchenko, S.I. Noskov (1992). The choice of a class of regression linear in parameters based on expert statements. Cybernetics and Systems Analysis. No. 5. P. 109-115.
4. V.B. Golovchenko, S.I. Noskov (1992). Combining forecasts taking into account expert information. Automation and Telemechanics. No. 11. P. 109-117.
5. V.B. Golovchenko, S.I. Noskov (1991). Estimation of the parameters of an econometric model based on statistical and expert information. Automation and Telemechanics. No. 4. P. 123-132.
6. V.B. Golovchenko, S.I. Noskov (1993). Prediction based on a discrete dynamic model using expert information. Automation and Telemechanics. No. 10. P. 140-148.
7. A.S. Mandel (2004). Method of analogues in forecasting short time series: an expert-statistical approach. Automation and Telemechanics. No. 4. P. 143-152.
8. D.V. Lisitsyn (2013). Combined regression models for describing data presented in different scales. Collection of scientific papers of the Novosibirsk State Technical University. No. 3 (73). P. 41-48.
9. N. Draper, G. Smith (1981). Applied regression analysis. Moscow: Finance and Statistics. Vol. 1. 366 p. Vol. 2. 351 p.
10. R.J. Little, D.B. Rubin (1991). Statistical analysis of data with gaps. Moscow: Finance and Statistics. 334 p.
11. N.G. Zagoruiko, V.N. Elkina, V.S. Temirkaev (1976). Algorithm ZET — 75 for filling the gaps in empirical tables and its application. Machine methods for detecting patterns. Novosibirsk: Nauka. P. 57-63.
12. S.A. Ayvazyan, I.S. Enyukov, L.D. Meshalkin (1983). Applied statistics. Basics of modeling and primary data processing. Moscow: Finance and Statistics. 472 p.
13. A.M. Nikiforov (1987). Development and research of statistical methods for pattern recognition with self-learning and processing of incomplete data. Dis … Cand. physical and mathematical sciences. Moscow. 144 p.
14. A.S. Efimov (2010). Solution of the clustering problem by the method of competitive learning with incomplete statistical data. Bulletin of the N.I. Lobachevsky Nizhny Novgorod University. No. 1. P. 220-225.
15. K.V. Ryzhenkova (2012). Methods for restoring missing data in statistical studies. Intellect. Innovation. Investments. No. 3. P. 127-133.
16. S.P. Shary (2019). The problem of recovering dependencies from data with interval uncertainty. Zavodskaya Laboratoriya. Diagnostika Materialov. Vol. 86. No. 1. P. 62-74.
17. S.P. Shary, I.A. Sharaya (2013). Recognition of the solvability of interval equations and its applications to data analysis. Computational technologies. Vol. 18. No. 3. P. 80-109.
18. S.P. Shary (2015). Maximum consistency method for data fitting under interval uncertainty. J. of Global Optimization. Vol. 62. No. 3. 16 p.
19. S.I. Noskov (2018). Point characterization of solution sets of interval systems of linear algebraic equations. Information technologies and mathematical modeling in control of complex systems. No. 1. P. 8-13.
20. S.N. Vasiliev, A.P. Seledkin (1980). Synthesis of the efficiency function in multicriteria decision-making problems. Izvestiya AN SSSR. Technical Cybernetics. No. 3. P. 186-190.
21. S.I. Noskov, M.P. Bazilevsky (2018). Construction of regression models using the apparatus of linear-Boolean programming. Irkutsk. 176 p.

Information about author:

Sergey I. Noskov, Doctor of Technical Sciences, Professor of the Department of Information Systems and Information Security, Irkutsk State Transport University, Irkutsk, Russia