Generating reliable tourist accommodation statistics: Bootstrapping regression model for overdispersed long-tailed data

Nguyen Van Truong, University of Transport and Communication, Vietnam & Japan Transport and Tourism Research Institute, Japan; Tetsuo Shimizu, Tokyo Metropolitan University, Japan; Takeshi Kurihara, Toyo University, Japan; Sunkyung Choi, Tokyo Institute of Technology & Japan Transport and Tourism Research Institute, Japan
Published online: 30 May 2020, JTHSM, 6(2), pp.30-37.

URN: urn:nbn:de:0168-ssoar-67905-4, DOI: 10.5281/zenodo.3837608


Cite as:

Van Truong, N., Shimizu, T., Kurihara, T., & Choi, S. (2020). Generating reliable tourist accommodation statistics: Bootstrapping regression model for overdispersed long-tailed data. Journal of Tourism, Heritage & Services Marketing, 6(2), 30–37. https://doi.org/10.5281/zenodo.3837608

Purpose: Few studies have applied count data analysis to tourist accommodation data. This study investigates the characteristics of such data and seeks the best-fitting models for estimating population totals.

Methods: Based on data for 10,503 hotels obtained from a nationwide Japanese survey, the bootstrap resampling method was applied to re-randomise the data. Training and test sets were derived by randomly splitting each bootstrap sample. Six count models were fitted to the training set and validated on the test set. Bootstrap distributions for parameters of significance were used for model evaluation.
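
The resampling loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (the hypothetical `make_synthetic_counts` helper), and an intercept-only count model, whose fitted mean is simply the training sample mean, stands in for the six count regressions used in the study. All function and parameter names are illustrative.

```python
import random
import statistics


def bootstrap_total_errors(counts, train_frac=0.5, n_boot=200, seed=0):
    """Bootstrap the relative error (%) of a predicted test-set total.

    Each replicate resamples hotels with replacement, splits the sample at
    random into training and test sets, fits an intercept-only count model
    on the training set (assumption: fitted mean = training sample mean),
    and compares the predicted test-set total with the observed one.
    """
    rng = random.Random(seed)
    n = len(counts)
    n_train = max(1, int(train_frac * n))
    errors = []
    for _ in range(n_boot):
        # Bootstrap sample: n draws with replacement from the original data.
        sample = [counts[rng.randrange(n)] for _ in range(n)]
        rng.shuffle(sample)  # random train/test split of this sample
        train, test = sample[:n_train], sample[n_train:]
        observed_total = sum(test)
        if not test or observed_total == 0:
            continue  # relative error is undefined for this replicate
        predicted_total = statistics.fmean(train) * len(test)
        errors.append(100.0 * (predicted_total - observed_total) / observed_total)
    return errors


def make_synthetic_counts(n=2000, seed=1):
    """Hypothetical long-tailed guest counts with extra zeros (not survey data)."""
    rng = random.Random(seed)
    return [
        0 if rng.random() < 0.2 else int(rng.lognormvariate(4.0, 1.2))
        for _ in range(n)
    ]


counts = make_synthetic_counts()
errors = bootstrap_total_errors(counts, train_frac=0.5, n_boot=200)
bias = statistics.fmean(errors)     # mean relative error across replicates
spread = statistics.stdev(errors)   # bootstrap standard error of that error
```

The mean of `errors` gives a bootstrap estimate of bias and its standard deviation a bootstrap standard error; rerunning with different `train_frac` values shows how the spread changes with training-set size on the synthetic data.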

Results: The outcome variable (number of guests) was found to be heterogeneous, overdispersed and long-tailed, with excessive zero counts. The hurdle negative binomial and zero-inflated negative binomial models outperformed the other models. As the training set grew from 5% to 85% of the data, the accuracy (se) of the estimated total number of guests improved from 3.7 to 0.4. The results appear to be rather overestimated.
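
As a rough illustration of what overdispersion and excess zeros mean for count data, the sketch below computes two simple diagnostics: the variance-to-mean ratio (equal to 1 under a Poisson model) and the observed share of zeros compared with the share a Poisson model with the same mean would predict. The toy counts are hypothetical and are not drawn from the survey; the function name is illustrative.

```python
import math
import statistics


def count_diagnostics(counts):
    """Simple checks for overdispersion and excess zeros in count data."""
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)  # sample variance
    return {
        "mean": mean,
        # A Poisson model implies variance == mean, i.e. an index of 1.
        "dispersion_index": variance / mean,
        "zero_share": sum(1 for c in counts if c == 0) / len(counts),
        # Share of zeros expected under a Poisson model with the same mean.
        "poisson_zero_share": math.exp(-mean),
    }


# Hypothetical toy data: many zeros and a long right tail.
toy_counts = [0, 0, 0, 0, 1, 2, 3, 5, 8, 40, 120]
diag = count_diagnostics(toy_counts)
```

A dispersion index well above 1, together with far more zeros than the Poisson benchmark, is the kind of pattern that motivates negative binomial, hurdle and zero-inflated models.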

Implications: The findings indicate that integrating the bootstrap resampling method with count regression provides a statistical tool for generating reliable tourist accommodation statistics. Using the bootstrap would help to detect and correct bias in the estimation.

Keywords: tourism statistics, bootstrap, count regression, heterogeneity, overdispersed data, zero-inflated data

JEL Classification: C4, L8, C24, Z3
