Improvement of population size disaggregation algorithm based on random forest model: A case study of Hebei Province
Population size disaggregation based on the random forest model is an effective method for revealing regional population distribution patterns and their influencing factors. Using Hebei Province as the case study area, this study extended the optimization scheme for the population density random forest model by adding three methodological components: training sample optimization, sample grouping, and test-retest reliability testing. The results are as follows: (1) Evaluating and optimizing the quality of the training samples significantly enhanced their representativeness. (2) Partitioning the training samples into experimental and control groups made test-retest reliability testing possible. Specifically, the population density predictions from the round 8 experiments showed good test-retest reliability (r = 0.994), and the population density dataset from the round 8, group 2 experiment showed good criterion validity (R² = 0.944). This result significantly outperformed GHS_POP (R² = 0.849) and improved the accuracy of the downscaled representation of population size data. (3) Zonal statistical aggregation of the population density grid dataset yielded population size datasets for multi-scale natural geographic units, including endowment zones, road buffers, river buffers, and three-level watersheds. This methodological improvement provides a foundation for better spatialization of disaggregated data. Through process reconfiguration, a standardized modeling framework was established for the population size disaggregation algorithm based on the random forest model, incorporating label fidelity, mask control, zonal modeling, stratified sampling, optimal sample selection, factor selection, optimal weighted combination, test-retest reliability testing, dasymetric mapping, and criterion validity testing.
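As an illustration only, the following minimal Python sketch shows how three of the steps named above (dasymetric mapping, zonal statistical aggregation, and test-retest reliability testing) might fit together. It is not the study's implementation: the weight layers stand in for the density predictions of two random forest runs, and all function names, unit labels, zone labels, and numbers are hypothetical toy values.

```python
from collections import defaultdict
from math import sqrt


def dasymetric_disaggregate(unit_totals, cell_units, cell_weights):
    """Split each census unit's total across its grid cells in proportion
    to a weight layer (here a stand-in for predicted population density)."""
    weight_sums = defaultdict(float)
    cell_counts = defaultdict(int)
    for unit, weight in zip(cell_units, cell_weights):
        weight_sums[unit] += weight
        cell_counts[unit] += 1
    counts = []
    for unit, weight in zip(cell_units, cell_weights):
        if weight_sums[unit] > 0:
            share = weight / weight_sums[unit]
        else:
            # Assumption: spread evenly when a unit has no positive weight.
            share = 1.0 / cell_counts[unit]
        counts.append(unit_totals[unit] * share)
    return counts


def zonal_sum(cell_values, cell_zones):
    """Aggregate gridded counts to arbitrary zones (e.g. buffers, watersheds)."""
    totals = defaultdict(float)
    for value, zone in zip(cell_values, cell_zones):
        totals[zone] += value
    return dict(totals)


def pearson_r(x, y):
    """Pearson correlation, used here as a simple test-retest statistic."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical toy data: two census units covering five grid cells.
unit_totals = {"A": 1000.0, "B": 500.0}
cell_units = ["A", "A", "A", "B", "B"]
cell_zones = ["river_buffer", "river_buffer", "upland", "upland", "river_buffer"]

# Stand-ins for density predictions from two repeated model runs.
weights_run1 = [1.0, 3.0, 6.0, 2.0, 2.0]
weights_run2 = [1.2, 2.8, 6.0, 1.9, 2.1]

grid_run1 = dasymetric_disaggregate(unit_totals, cell_units, weights_run1)
grid_run2 = dasymetric_disaggregate(unit_totals, cell_units, weights_run2)

zone_totals = zonal_sum(grid_run1, cell_zones)
retest_r = pearson_r(grid_run1, grid_run2)

print(zone_totals)
print(round(retest_r, 3))
```

Because each unit's total is redistributed rather than re-estimated, the gridded counts sum back to the census totals, which is the property that lets the grid be re-aggregated to zones that do not follow administrative boundaries.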
