Generating reliable tourist accommodation statistics: Bootstrapping regression model for overdispersed long-tailed data
Nguyen Van Truong, University of Transport and Communication, Vietnam & Japan Transport and Tourism Research Institute, Japan; Tetsuo Shimizu, Tokyo Metropolitan University, Japan; Takeshi Kurihara, Toyo University, Japan; Sunkyung Choi, Tokyo Institute of Technology & Japan Transport and Tourism Research Institute, Japan
Published online: 30 May 2020, JTHSM, 6(2), pp.30-37.
Purpose: Few studies have applied count data analysis to tourist accommodation data. This study investigated the characteristics of such data and sought the best-fitting models for estimating population totals from them.
Methods: Based on data from 10,503 hotels, obtained from a nationwide Japanese survey, the bootstrap resampling method was applied to re-randomise the data. Training and test sets were derived by randomly splitting each bootstrap sample. Six count models were fitted to the training set and validated with the test set. Bootstrap distributions for parameters of significance were used for model evaluation.
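The resample-then-split procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the guest counts are synthetic, the function name `bootstrap_total_estimates` and its parameters are hypothetical, and a simple training-mean predictor stands in for the six count regression models, which are not reproduced here.

```python
import random
import statistics

def bootstrap_total_estimates(counts, train_fraction, n_boot=200, seed=0):
    """Return one population-total estimate per bootstrap replicate."""
    rng = random.Random(seed)
    n = len(counts)
    n_train = max(1, round(train_fraction * n))
    estimates = []
    for _ in range(n_boot):
        # Resample establishments with replacement (one bootstrap sample).
        sample = [counts[rng.randrange(n)] for _ in range(n)]
        # Randomly split the bootstrap sample into training and test sets.
        rng.shuffle(sample)
        train, test = sample[:n_train], sample[n_train:]
        # Stand-in "model" (assumption): predict the training mean
        # for every unit in the test set.
        predicted_test_total = statistics.fmean(train) * len(test)
        estimates.append(sum(train) + predicted_test_total)
    return estimates

# Synthetic, made-up guest counts with many zeros and a long tail.
data_rng = random.Random(42)
counts = [
    0 if data_rng.random() < 0.3 else int(data_rng.paretovariate(1.5) * 20)
    for _ in range(500)
]

estimates = bootstrap_total_estimates(counts, train_fraction=0.25)
boot_mean = statistics.fmean(estimates)  # centre of the bootstrap distribution
boot_se = statistics.stdev(estimates)    # bootstrap standard error of the total
```

The spread of `estimates` across replicates plays the role of the bootstrap distribution used for model evaluation; replacing the stand-in predictor with a fitted count model would leave the surrounding loop unchanged.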
Results: The outcome variable (number of guests) was found to be heterogeneous, overdispersed and long-tailed, with excessive zero counts. The hurdle negative binomial and zero-inflated negative binomial models outperformed the other models. The accuracy (standard error, se) of the estimated total number of guests improved from 3.7 to 0.4 as the training-set share increased from 5% to 85%. The estimates appear to be somewhat overestimated.
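The properties named above (overdispersion and excess zeros) can be checked with two simple diagnostics: the variance-to-mean ratio, which equals 1 under a Poisson model, and the observed share of zeros compared with the share a Poisson model with the same mean would predict. The sketch below uses made-up toy counts, and `count_diagnostics` is a hypothetical helper, not a function from the study.

```python
import math
import statistics

def count_diagnostics(counts):
    """Summarise overdispersion and zero excess relative to a Poisson model."""
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)  # sample variance
    observed_zero_share = sum(1 for c in counts if c == 0) / len(counts)
    return {
        # A ratio well above 1 indicates overdispersion.
        "dispersion_index": variance / mean,
        "observed_zero_share": observed_zero_share,
        # P(Y = 0) under Poisson(mean) is exp(-mean).
        "poisson_zero_share": math.exp(-mean),
    }

# Toy data (assumption): several zeros and one very large value.
toy_counts = [0, 0, 0, 0, 1, 2, 2, 3, 5, 40]
diag = count_diagnostics(toy_counts)
```

A dispersion index far above 1 together with an observed zero share far above the Poisson share is the kind of pattern that motivates negative binomial, hurdle and zero-inflated models.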
Implications: The findings indicate that integrating the bootstrap resampling method with count regression provides a statistical tool for generating reliable tourist accommodation statistics. Using the bootstrap would help to detect and correct bias in the estimates.
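One common way to use the bootstrap for bias detection and correction is the standard bootstrap bias estimate: the mean of the bootstrap estimates minus the original estimate, which is then subtracted from the original estimate. The sketch below assumes this textbook form; the abstract does not state which correction the authors applied, and the numbers are invented for illustration.

```python
import statistics

def bootstrap_bias_correct(original_estimate, bootstrap_estimates):
    """Return (estimated bias, bias-corrected estimate)."""
    boot_mean = statistics.fmean(bootstrap_estimates)
    # Bias is estimated as the average bootstrap estimate minus the original.
    bias = boot_mean - original_estimate
    # Subtracting the bias gives 2 * original - boot_mean.
    return bias, original_estimate - bias

# Invented values: the bootstrap replicates sit above the original estimate,
# suggesting overestimation.
bias, corrected = bootstrap_bias_correct(100.0, [104.0, 106.0, 105.0, 105.0])
```

A positive `bias` corresponds to overestimation, so the corrected value is pulled below the original estimate.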
JEL Classification: C4, L8, C24, Z3