Wednesday, June 5, 2019
Geographically Weighted Regression to Model Housing Prices
Introduction

In Chapter 2, HPM was employed to model the relationships between house prices and the characteristics of property and neighbourhood. However, HPM treats the whole housing market as a single homogeneous market and assumes a stationary process, i.e. the parameter estimates are assumed to apply equally everywhere in space. This presumes that the influences of various factors on house prices in one location are the same as those in another location, so that space, place and location do not matter (Foster refer).

However, as shown in Chapter 2, the residuals derived using HPM are correlated. Additionally, Chapter 3 shows that when the MLM approach is employed to account for spatial heterogeneity, the effects of those various factors in fact vary across neighbourhoods at different scales, and there are great price differentials between neighbourhoods. A global approach such as HPM masks those local deviations from the average relationship.

Disadvantages of MLM

Although the MLM approach takes spatial heterogeneity into account by specifying the spatial units as levels in the model, it has some weaknesses. Firstly, there is no consensus on the definition of neighbourhoods (Kearns and Parkinson 2001: 2103), so the specification of the macro-level units (i.e. neighbourhoods) is fairly arbitrary. In the past, census boundaries (), administrative boundaries (.), or school catchment areas (Goodman) have all been used to demarcate the whole housing market into smaller submarkets, or local neighbourhood areas. Some researchers combined a series of datasets, such as travel-to-work, immigration and house price information, and constructed so-called housing market areas (HMAs) (..). HMAs match neither the census boundaries nor the administrative boundaries; instead, they represent.. .
The existence of spatial dependence in geographical data means that the observations that are most spatially dependent, in locations that are close to each other, should constitute a neighbourhood. A predefined hierarchy of spatial units based on administrative or census boundaries may not necessarily be appropriate.

Secondly, MLM [1] treats space as discrete and assumes that a homogeneous spatial process applies within the neighbourhoods and discontinues at their boundaries (). Additionally, the highest level of spatial units (for example, MSOAs in our analysis) are assumed to be spatially independent. This assumption is unrealistic because the effect of a neighbourhood is more likely to change gradually from one neighbourhood to its adjacent ones than to stop completely at the boundary, the so-called spill-over effect. Therefore, there might be spatial dependence between MSOAs that MLM is unable to capture.

In contrast, GWR (Brunsdon et al, 1996..) relaxes the assumption that the effects of various variables are constant over space (Dark, 2004; Mitchell, 2005; Shi et al., 2006) and treats space as continuous. It locally calibrates a spatially varying coefficient regression model for each location of the study area by weighting the attributes of its neighbouring locations based on distance-decay functions (.). The attributes of the neighbours of a fitted location are all considered, so spatial dependence and heterogeneity can be taken into account in this approach (Paez 2005). This chapter therefore introduces this type of modelling technique to explore the spatial variations that may exist in the relationships between house price and its predictors.

Purpose and Structure of the Chapter

The aim of this chapter is to identify whether the relationships between house prices and a range of house characteristics and neighbourhood attributes are relatively stable, or whether they vary substantially over space.
If there are spatial variations, how do the relationships vary within and between neighbourhoods, and how does this variation differ from the results derived from the MLM approach? In addition, how good is the GWR approach in terms of its predictive capability, compared with MLM?

In the next section, a brief description of the technique is introduced. Section 3 follows with a review of previous applications of GWR. The proposed study and the empirical implementation of the technique then follow in Section 4. The final section summarises the relation between GWR and MLM and discusses the appropriateness of both techniques.

4.2 Brief Description on GWR Models

What is GWR?

The GWR technique is fully described by Fotheringham et al. (2002) [2], and just a brief description of the approach is presented here. GWR is a spatial analysis technique that takes into account spatial autocorrelation among the observations in surrounding locations by allowing for spatial nonstationarity in the linear regression coefficients for each location. In the GWR literature, the location can be a point or an aggregated area. GWR describes local geographical variations in the relationships between a response variable and its explanatory variables by a set of local estimates for all the predictors for each geographical location (Fotheringham et al. 2002).
A set of estimates and standard errors for each local coefficient is produced by focusing on each location in the study region in turn, using a weighting matrix of its nearby observations.

The basic GWR equation can be written as

$$y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \qquad (4.1)$$

where $(u_i, v_i)$ denotes the coordinates of the $i$th point in a two-dimensional study area, $y_i$ is the dependent variable at point $i$, $\beta_0(u_i, v_i)$ is the estimated intercept at point $i$, $\beta_k(u_i, v_i)$ represents the estimated coefficient for variable $k$ at point $i$, $x_{ik}$ is the independent variable of the $k$th parameter at location $i$, and $\varepsilon_i$ is the error term for the local model at point $i$.

The estimate of $\beta(u_i, v_i)$ is derived using weighted least squares (WLS) regression (Moore and Myers, 2010; Fotheringham et al., 2002) by weighting the observations near location $i$ in accordance with their distance to that fit point. It is given by

$$\hat{\beta}(u_i, v_i) = \left(X^{T} W(u_i, v_i)\, X\right)^{-1} X^{T} W(u_i, v_i)\, y$$

where $W(u_i, v_i)$ is a diagonal matrix denoting the geographical weighting of the observations around the fit point $i$.

Weighting

The weighting is based on the distance between the regression location and its close neighbours, with the extent of the neighbourhood defined by the bandwidth. Points in closer proximity to location $i$ are given more weight and therefore have more influence on the estimation of $\beta(u_i, v_i)$ than observations that are further away from location $i$. A number of weighting schemes are available, but they tend to be Gaussian or Gaussian-like functions, which reflect the type of dependency generally found in spatial processes (Fotheringham). Two commonly used distance-decay functions in GWR are the Gaussian and bi-square functions (Fotheringham et al. 2002), which are expressed as below.

Gaussian:

$$w_{ij} = \exp\left(-\frac{1}{2}\left(\frac{d_{ij}}{b}\right)^{2}\right)$$

Bi-square:

$$w_{ij} = \begin{cases} \left(1 - \left(d_{ij}/b\right)^{2}\right)^{2} & \text{if } d_{ij} < b \\ 0 & \text{otherwise} \end{cases}$$

where $w_{ij}$ is the $j$th element of the diagonal of the matrix of geographical weights $W(u_i, v_i)$, $b$ is the bandwidth, a threshold distance beyond which (for the bi-square function) observations are not used for calibrating the local model, and $d_{ij}$ represents the distance between observation $j$ and focus point $i$.
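
As a rough illustration of equation (4.1) and the two kernels, the following Python sketch computes Gaussian and bi-square weights and fits a local weighted least squares line at a single focus point. It is a minimal sketch under stated assumptions, not the GWmodel implementation: it assumes a single predictor (so the WLS solution has a simple closed form), a fixed bandwidth and planar coordinates, and the function names and toy data are hypothetical.

```python
import math

def gaussian_weight(d, b):
    """Gaussian kernel: equals 1 at d = 0 and decays smoothly, never reaching zero."""
    return math.exp(-0.5 * (d / b) ** 2)

def bisquare_weight(d, b):
    """Bi-square kernel: equals 1 at d = 0 and is exactly zero for d >= b."""
    if d >= b:
        return 0.0
    return (1.0 - (d / b) ** 2) ** 2

def local_fit(coords, x, y, focus, b, kernel=bisquare_weight):
    """Weighted least squares fit of y = b0 + b1 * x around one focus point.

    Returns (intercept, slope). Assumes at least two observations with
    non-zero weight and distinct x values.
    """
    # Geographical weight of every observation relative to the focus point.
    w = [kernel(math.dist(focus, c), b) for c in coords]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical toy data: the price-size slope is 2 in the west and 5 in the east.
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
size = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
price = [2.0, 4.0, 6.0, 5.0, 10.0, 15.0]

west = local_fit(coords, size, price, focus=(1, 0), b=5.0)
east = local_fit(coords, size, price, focus=(11, 0), b=5.0)
print("west (intercept, slope):", west)
print("east (intercept, slope):", east)
```

With this toy data the local slope comes out at about 2 for the western focus point and about 5 for the eastern one, whereas a single global regression would return one averaged slope for the whole area.
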
When $i$ and $j$ coincide, the weighting equals 1.

Source: Gollini et al. (2014), GWmodel: an R Package for Exploring Spatial Heterogeneity using Geographically Weighted Models

Both functions are continuous up to the bandwidth, but the weights of the bi-square function decrease faster than those of the Gaussian function and eventually become zero at the boundary of the bandwidth, while the weights of the Gaussian function never become zero. Both weighting functions will be tried in the planned research.

Bandwidth

Bandwidths can be specified either as fixed or adaptive (in terms of physical distance). The physical distance for an adaptive bandwidth changes according to the spatial density so as to capture a fixed number of nearest neighbours for each local model: a shorter distance for areas where observations are dense and a longer distance where data are sparse. The benefit of using an adaptive bandwidth is that it ensures sufficient local information is used in areas where observations are spatially scarce, which reduces the estimation variance of the local coefficients, while still revealing subtle local variations where observations are dense (Fotheringham et al. 2002). Therefore, an adaptive bandwidth will be used in the planned research, as the density of house price data varies geographically.

The size of the bandwidth affects the gradient of the kernel and thus the rate of decay of the weighting function. A small bandwidth has fewer observations included in the local model and a rapid decay, whereas a large bandwidth has more observations in the local model and a smoother weighting scheme. The size of the bandwidth is important: if the bandwidth is too small, the model fits the local observations better, but local noise may also be fitted, so the local estimates will have large variances.
Conversely, if the bandwidth is too large, the variances become smaller, but the estimates of the local coefficients are based on a much larger area and are biased, which masks the true local relationships, especially if the relationships vary dramatically over small areas. This is the so-called bias-variance trade-off (Fotheringham et al., 2002) [3]. The effective number, a measure of how many observations are used effectively for calibrating the local model, can be used to reflect the bias-variance trade-off in GWR.

Bias-Variance Trade-Off

To find the best bias-variance trade-off, an appropriate weighting function and an optimal bandwidth need to be selected. It has been argued that the selection of the bandwidth is far more important than that of the weighting scheme, since under all weighting functions the weights decrease as distance increases, but the size of the bandwidth decides the degree of decay (Fotheringham). The optimisation process is generally exploratory and can be very compute-intensive, as it requires all the local regressions to be fitted at each step [4]. It can be achieved either by cross-validation (CV) or by using the corrected Akaike information criterion (AICc) (Fotheringham et al. 2002).

Leave-one-out cross-validation (LOOCV) is a commonly used cross-validation method in GWR: each local model is calibrated using all the cases except one observation and is then tested on that single observation. The bandwidth which produces the smallest root mean square prediction error for the dependent variable across all the local models is deemed the optimal bandwidth. AICc is an indicator of goodness-of-fit and can be used to compare competing models while taking into account the complexity of a model. A lower AIC score indicates a better fit of a model.
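
The leave-one-out procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only, not what GWmodel or spgwr do internally: it assumes a single predictor, a fixed bi-square bandwidth and an exhaustive search over a short list of candidate bandwidths, and all function names and data are hypothetical. Minimising the sum of squared prediction errors is equivalent to minimising the root mean square prediction error.

```python
import math

def bisquare(d, b):
    """Bi-square kernel weight for distance d and fixed bandwidth b."""
    return (1.0 - (d / b) ** 2) ** 2 if d < b else 0.0

def loo_prediction(i, coords, x, y, b):
    """Predict y[i] from a local fit at location i that leaves observation i out."""
    w = [0.0 if j == i else bisquare(math.dist(coords[i], c), b)
         for j, c in enumerate(coords)]
    sw = sum(w)
    if sw == 0.0:
        return None  # bandwidth too small: no neighbour carries any weight
    x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
    y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
    sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
    if sxx < 1e-12:
        return y_bar  # slope not estimable; fall back to the weighted mean
    sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
    return y_bar + (sxy / sxx) * (x[i] - x_bar)

def cv_score(coords, x, y, b):
    """Sum of squared leave-one-out prediction errors for bandwidth b."""
    total = 0.0
    for i in range(len(y)):
        pred = loo_prediction(i, coords, x, y, b)
        if pred is None:
            return math.inf
        total += (y[i] - pred) ** 2
    return total

def select_bandwidth(coords, x, y, candidates):
    """Return the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda b: cv_score(coords, x, y, b))

# Hypothetical toy data: two distant clusters with different price-size slopes.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (20, 0), (21, 0), (22, 0), (23, 0)]
size = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
price = [2.0, 4.0, 6.0, 8.0, 5.0, 10.0, 15.0, 20.0]

candidates = [0.5, 5.0, 50.0]
best = select_bandwidth(coords, size, price, candidates)
print("selected bandwidth:", best)
```

In this example the smallest candidate leaves every local model without neighbours, while the largest pools the two clusters and predicts poorly, so the search settles on the middle candidate. Each candidate requires one local fit per observation, which is why the optimisation becomes expensive on large datasets.
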
As a rule of thumb, a decrease of 3 in the AIC score between two competing models indicates an improvement in model fit for the model with the lower AIC (Fotheringham et al. 2002; Zhang et al., 2011).

It is common, though, to get different optimal bandwidths from the two methods, as the criteria for optimality differ between AICc and CV [5] and the AIC value is not based on prediction of the dependent variable (..) [6]. In addition, the AIC score can be corrected for small sample size, while the classical CV method tends to produce under-smoothed results for small sample sizes [7]. One thing to note is that AIC should be avoided when the sample size is large, as it requires the creation of an n by n matrix [8], so the optimisation can be very slow [9]. Both methods will be tried out in the planned research.

Why Use GWR and When?

As mentioned earlier, when there is spatial dependency between variables and spatial non-stationarity, GWR can be used to disaggregate global relations to local levels to obtain a better and more detailed understanding of spatial data. As every local model is fitted to local observations, it fits the data better than a global model, and the residuals are generally lower and less spatially dependent. The outputs, the estimates of the local coefficients, are specific to each location.

In Chapter 2, Moran's I was used and indicated that there is statistically significant spatial autocorrelation within both house prices and the residuals of the HPM results. This means that the globally fitted coefficient values of HPM do not represent detailed local variations adequately, and GWR should be used in this instance to take into account the spatial dependency and examine the heterogeneity in the housing market.

A review of GWR approach in house price estimation

This section reviews the application of the GWR technique with a focus on residential real estate, as well as the comparisons of GWR with a range of other methodologies.
The section will conclude with the identification of the research gap and thus the contribution of the current chapter.

Application in Real Estate Valuation

GWR has been applied in a number of fields, including land use (Geniaux et al. 2011), environment (Harris et al. 2010a), health (Comber et al. 2011; Helbich et al. 2012b; Yang and Matthews 2012 [10]), crime studies (Leitner and Helbich 2011), economics [11], regional studies [12] and residential real estate studies (Kestens et al. 2006; Bitter et al. 2007). In terms of the application to real estate, GWR has been used to investigate the effects of location and surrounding neighbourhood characteristics, such as the effects of accessibility, for example the new bus transitway in .. (Mulley, 2013), infrastructure availability in . (Cellmer, 2012), and the effects of open space amenities (Nilsson, 2014). GWR has also been used to identify housing sub-markets (Borst and McCluskey, 2007; Crespo and Grêt-Regamey, 2013; Helbich, Brunauer, Hagenauer and Leitner, 2013).

GWR compared with other modelling techniques

GWR has also been compared with a few valuation tools in real estate, such as multiple regression analysis (MRA), the simultaneous autoregressive model (SAR), artificial neural networks (ANN), the spatial expansion method (SEM) and the spatial lag model (e.g., Brunsdon et al., 1999 [13]; LeSage 1999 [14]; Bitter, Mulligan and Dall'erba, 2006; Helbich, Brunauer, Vaz and Nijkamp, 2013; McCluskey, McCord, Davis, Haran and McIlhatton, 2013; Yu, Wei and Wu, 2007).

More specifically, Bitter, Mulligan and Dall'erba (2006) demonstrated in their study that GWR was superior to the spatial expansion method ( define briefly .) in terms of predictive accuracy and explanatory power when applied to examine the marginal price of key housing attributes in the Tucson, Arizona housing market.
McCluskey, McCord, Davis, Haran and McIlhatton (2013) also showed that GWR outperforms MRA, ANN and SAR in terms of predictive accuracy, transparency and cost-effectiveness when applied to 2,694 residential properties for real estate price estimation. In a case study of spatial heterogeneity in Austria, Helbich, Brunauer, Vaz, et al. (2013) extended GWR to a mixed GWR (MGWR), which allows some coefficients to be stationary while others are non-stationary. This approach is more flexible and parsimonious than standard GWR (Wei and Qi, 2012). Both MGWR and GWR have smaller prediction errors in comparison with global approaches, such as OLS, SAR and spatial two-stage least squares (S2SLS) [15].

There are other extensions of GWR. To deal with cross-sectional time series data, GTWR (Huang, Wu and Barry, 2010) was developed to integrate both temporal and spatial information in the weighting matrices to capture spatial and temporal dependency and heterogeneity [16]. GTWR is able to model spatial and temporal nonstationarity simultaneously and therefore offers a better goodness-of-fit. LeSage (2003) incorporated a Bayesian treatment into GWR in order to improve the estimates of GWR parameters. Contextualized Geographically Weighted Regression (CGWR) was developed by adding contextual variables into standard GWR. The research applied this approach to model spatial heterogeneity in the land parcel prices of Beijing, China, and demonstrated that the incorporation of contextual information improved the model fit.

However, multicollinearity between explanatory variables may lead to unstable results in GWR models and causes more problems for GWR than for a global regression model (Lloyd 2007). Therefore, great caution should be exercised when analysing the spatial patterns of local coefficients derived from GWR (Wheeler and Tiefelsdorf, 2005).
A range of diagnostic tools has been proposed, and the use of PCA to identify the most influential predictors, or the integration of ridge regression into the GWR framework (D. C. Wheeler, 2007), can help stabilise GWR regression coefficients.

There is only limited comparison of GWR with MLM, or the random coefficient model (RCM). These two approaches are very different in terms of their underlying assumptions about the spatial process and yielded completely different results in a study of long-term illness in the UK (Brunsdon, Aitkin, Fotheringham and Charlton, 1999).

There has been no published research that compares GWR with MLM in terms of their capability to model the spatial heterogeneity of house price data and their predictive accuracy. In addition, although GWR can be applied at any geographic scale, in practice many applications and previous studies applied it at a coarsely aggregated scale, owing to the availability of data or the need to keep information anonymised. Unlike previous studies, we have geocoded the location of each house based on its unit postcode, which typically contains only around 15 residential addresses [17]. We hope to offer further insight into the geographical variation of the relationships at this detailed level, which might have been disguised in previous research carried out at a much coarser scale.

Planned Research

Standard GWR is applied to the same dataset as in Chapters 2 and 3, the house price data of the Greater Bristol area. Two extended versions of GWR, GTWR and CGWR, will be explored, the former to capture temporal dependency and heterogeneity and the latter to incorporate contextual information into the model. In GWR and CGWR, the whole dataset will be split into yearly data to avoid potential temporal autocorrelation within the data.
There is no need to do so in GTWR, as the time of sale is taken into account in the model.

Individual house characteristics are all categorical variables, as described in Chapter 2, and will be modelled first; neighbourhood variables will then be added in the succeeding models.

The planned procedures and a few methodological issues are addressed as follows. Firstly, before carrying out the actual GWR modelling, it will be tested whether there is significant spatial autocorrelation within the data, which can be between the response variable and its lagged values or between the explanatory variables and their lagged values. The two most commonly used weighting functions, the Gaussian and bi-square functions, will be used, although it has been shown that the selection of the weighting function does not have as much effect on the results as the selection of the bandwidth (Fotheringham, Brunsdon and Charlton 1998). If that is the case, just one weighting function will be used in the subsequent yearly models and the focus will be on the optimisation of the bandwidth. An adaptive bandwidth is proposed, as there is a good mixture of rural and urban housing stock in Greater Bristol and the density of house sales varies dramatically over space. Both CV and AIC will be used to obtain the optimal bandwidth and measure model fit, as it has been shown that the two methods can result in different optimal bandwidths and regression coefficients [18].

Once a weighting function and bandwidth have been selected, the weighting matrix can be defined and used to estimate the coefficients for every location based on equation (4.1), calibrating the local GWR models. The standardised residuals, the parameters and their estimated standard errors will be mapped to investigate whether they vary spatially [19]. These maps will also be compared with the maps of the shrinkage estimates for the neighbourhoods (OAs, LSOAs and MSOAs) derived by MLM in previous chapters.
It is expected that the mapped patterns of the MLM coefficients will exhibit more noise than those of GWR, since GWR is essentially a spatially smoothing calibration. All of the model calibration will be conducted in R, using the GWmodel package, as this software is free and the process can be easily replicated.

Lastly, the predictive accuracy of GWR will be measured and compared with that of MLM. R-squared is used for the goodness of fit of the model and measures the proportion of variation in the data that is explained by the model. Adjusted R-squared takes into account the complexity of the model in terms of the number of variables that are specified in the model. It is expected that the extended versions of GWR, GTWR and CGWR, may provide better model fit and more accurate predictions, based on their previous applications.

In the past, there has been criticism that GWR cannot produce confidence intervals (..) and that the significance of the parameter estimates cannot be tested. However, Monte Carlo significance tests have been used to test whether there is significant variability (..), so this test is also planned to test whether the spatial variation of the coefficients is statistically significant. The wild bootstrap approach, as suggested by Hardle (1990) and McMillen (2004), can also be used to produce a weighted average of the variance of the separate parameter estimates.

Conclusion

GWR generally gives much better fits to the data, and the residuals are less autocorrelated. Its advantages over MLM are that it no longer treats space as discrete, which more closely resembles the spatial process in reality, and that it models both spatial dependency and heterogeneity. In addition, it is essentially a non-parametric approach that does not require any assumptions with respect to the predictors, which can be categorical or have highly skewed underlying distributions.
There is no need to specify a functional form to produce the estimates of spatially varying parameters (Brunsdon et al. 1998). The underlying concept of letting the data speak for themselves makes it a good exploratory tool [20] for spatial analysis. This concept is very similar to another modelling technique, ANN, except that in ANN there is no implication that nearer locations have more influence on the estimates of local coefficients than locations that are further away, as there is in GWR. This, although unlikely in reality, might happen. How GWR compares with ANN will be discussed in the next chapter.

Contrasting GWR and ANN: a set of estimates of spatially varying parameters WITHOUT specifying a functional form; let the data speak for themselves (Chris et al 1998)

[1] The parameter estimates are assumed to be randomly distributed with either a finite (Wedel and Kamakura 2000) or a continuous mixture distribution (Aitkin 1996).
[2] And Legendre, 1993
[3] Check bias-variance trade-off: MLM (Goldstein 1987) and ridge regression (Hoerl and Kennard 1970a, 1970b)
[4] Check reference: Schabenberger and Gotway (2005: 316-317), Statistical Methods for Spatial Data Analysis; Waller and Gotway (2004, p. 434), Applied Spatial Statistics; and Lloyd (2007, pp. 79-86), Local Models for Spatial Analysis
[5] http://webhelp.esri.com/arcgisdesktop/9.3/body.cfm?tocVisable=1ID=-1TopicName=Interpreting GWR results
[6] Housing Sub-markets and Hedonic Price Analysis: A Bayesian Approach, by David C. Wheeler, Antonio Páez, Lance A. Waller and Jamie Spinney, Chapter 4
[7] Encyclopedia of Geographic Information Science, edited by Karen Kemp (p. 183)
[8] (gwr.sel, spgwr)
[9] NOTE: AIC can be applied in non-Gaussian GWR (Local Models for Spatial Analysis, Second Edition, by Christopher D. Lloyd)
[10] Modelling spatially varying impacts of socioeconomic predictors on mortality outcomes, J Geograph Syst (2003) 5:161-184, DOI 10.1007/s10109-003-0099-7. Proposed for modelling spatially varying predictor effects on a disease or mortality count outcome. The methodology is illustrated by suicide mortality in 32 London Boroughs over the period 1979-1993, in terms of area deprivation and a measure of social fragmentation; disease mapping methods.
[11] Spatial Heterogeneity and the Wage Curve Revisited, Simonetta Longhi, ISER, Peter Nijkamp
[12] The Geographic Diversity of U.S. Nonmetropolitan Growth Dynamics: A Geographically Weighted Regression Approach, Mark D. Partridge, Dan S. Rickman, Kamar Ali and M. Rose Olfert. Tests for geographic heterogeneity in the growth parameters and compares them to global regression estimates. The results indicate significant heterogeneity in the regression coefficients across the country, most notably for amenities and college graduate shares. Using GWR also exposes significant local variations that are masked by global estimates.
[13] A Comparison of Random-Coefficient Modelling and Geographically Weighted Regression for Spatial Non-Stationary Regression Problems, Geographical and Environmental Modelling, 3 (1), 47-62