Need a perfect paper? Place your first order and save 5% with this code:   SAVE5NOW

Comparison of Regression Models on Housing and Sales Price Prediction

Abstract:

This report compares regression models applied to two distinct datasets: Boston Housing and Sales of Advertising. The purpose is to gauge the efficacy of different regression techniques in forecasting housing prices and sales dependent on advertising budget. Some models considered for evaluation are Linear Regression, Decision Tree Regression ( fully grown and pruned), and K-Nearest Neighbors( KNN). We also perform the analysis and model evaluation to determine the advantages and disadvantages of different regression techniques in light of these diverse datasets. Through the analysis of the performance characteristics of each model, the fully grown tree emerged to be the best model for performing the prediction with lower RMSE and MAE.

Introduction

Regression analysis constitutes a base statistical method applied in different industries to predict a continuous response variable from one or more explanatory variables. It applies at every level of all the state’s economies, financial institutions, social sciences sectors, and even to entities beyond the reach of humankind. It provides explanations and understanding of the variables; thus, the users can make the best decisions if they implement the variables. To this end, the report undertakes a comparative appraisal of regression models utilized in connection with two separate datasets – the aim is to assess the models as price-forecasting and sales-prediction tools.

Under investigation are two datasets that emanate from different domains, each posing unique challenges and prospects for predictive modeling. The first dataset is about housing economics and has been obtained from the U.S. Census Service; its features include housing characteristics in Boston, Massachusetts. This dataset’s dependent variable of focus is median home values (MEDV), a critical factor in real estate assessment and market review.

The second dataset is centered on marketing and advertising, covering the sales figures of a particular product among the various media such as TV, radio, and newspaper. Analyzing the effect of advertising expenses on sales performance is fundamental for companies aiming to improve their marketing strategies and spend their efforts appropriately.

Data Description

Boston Housing Dataset:

In this dataset, the U.S Census Bureau provides information about Boston housing in Massachusetts. The set consists of 13 features like crime rate, proportion of non-retail business acres per town, average number of rooms per dwelling, and median value of owner-occupied homes (MEDV) (Hoxha, 2024). The indicator of the target is the median value of owner-occupied houses (MEDV).

Linear Regression Model: Linear regression model was built using both stats models.api and the learn package. stats models.API results include Beta coefficients and p-values for each predictor were established. P-values assist in determining the significance of every predictor. P-values reveal that some predictors are significant (p < 0.05), and others are not. As an illustration, RM, PTRATI, O, and LSAT have low p-values, which signify they are significant predictors. The actual value, the predicted value, and the residual values for the housing prices are shown below:

The actual value, the predicted value, and the residual values for the housing prices

Advertising Sales Dataset:

The dataset represents the sales figures of a particular product based on advertising spending on three different media: TV, radio, and newspaper. This dataset contains three features: The advertising expenditure of TV, radio, newspapers, and sales. The target variable is the sales. The linear regression model was built using the sklearn package. The model was trained and evaluated using RMSE and MAE. The actual values, predicted values, and residual values are shown below.

The actual values, predicted values, and residual values

Methodology

Boston Housing Dataset:

We implemented Linear Regression using both stats models. API and clear, and Decision Tree Regression with two variations: trees looked full-grown and pruned.

The evaluation metrics of models include Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE)

Advertising Sales Dataset:

The dataset represents the sales figures of a particular product based on advertising spending on three different media: TV, radio, and newspaper sales. The dataset is composed of 3 features: Advertising expenditure on TV, advertising expenditure on radio, advertising expenditure on newspaper, and sales. The dependent variable is the sales amount.

Also, we used K-Nearest Neighbors (KNN) Regression with feature scaling using StandardScaler (Kohli et al., 2020). Model evaluation metrics involve RMSE and MAE.

Results and Discussion

4.1 Boston Housing Dataset:

Linear Regression – RMSE: 4.772599769922687 MAE: 3.1113773882381848

Fully Grown Tree – RMSE: 2.996632096465735 MAE: 2.217647058823529

Pruned Tree – RMSE: 3.673046832398878 MAE: 3.0071469760281424

The best model is: Fully Grown Tree

Fig 4: Shows the regression summary analysis results for Boston Housing

: Shows the regression summary analysis results for Boston Housing

4.2 Advertising Sales Dataset:

Linear Regression – RMSE: 1.7052146229349223 MAE: 1.2748262109549338

Fully Grown Tree – RMSE: 1.8444511378727275 MAE: 1.24

Pruned Tree – RMSE: 2.279765895641708 MAE: 1.7668252387726064

KNN model – RMSE 1.726276339408033 MAE 1.4074999999999993

The best model is Linear Regression

While the Fully Grown Tree model for the Boston Housing dataset and Linear regression analysis prove to be the best performer in this analysis, it is essential to consider other factors such as model complexity, interpretability, and computational efficiency for real-world deployment. Future research could explore ensemble methods or advanced regression techniques to further enhance predictive accuracy (Yuan et al., 2024). Continuous monitoring and validation of the chosen model’s performance are recommended to ensure its effectiveness.

Conclusion

As per the evaluation metrics, the Fully Grown Tree is the best model for the Boston Housing dataset; however, Linear Regression performs the best for the Advertising Sales dataset. This indicates that a regression model may differ across datasets and predictive tasks. This report gives a detailed comparison of regression models employed in predicting housing prices and advertising sales, analyzing the strengths and weaknesses of each method.

Recommendations

Research on feature engineering and selection methods needs to be done further to help improve the models’ predictive performance. Ensemble methods, e.g., Random Forest or Gradient Boosting, may improve the prediction accuracy. Regular monitoring and updating of the predictive models are advisable to maintain relevance and effectiveness.

References:

Hoxha, V. (2024). Comparative analysis of machine learning models in predicting housing prices: a case study of Prishtina’s real estate market. International Journal of Housing Markets and Analysis.

Kohli, S., Godwin, G. T., & Urolagin, S. (2020). Sales prediction using linear and KNN regression. In Advances in Machine Learning and Computational Intelligence: Proceedings of ICMLCI 2019 (pp. 321–329). Singapore: Springer Singapore.

Yuan, S., Kroon, B., & Kramer, A. (2024). Building prediction models with grouped data: A case study on the prediction of turnover intention. Human Resource Management Journal, 34(1), 20-38.

Appendix:

The link below provides the code used for the data preprocessing, model implementation, and evaluations of both datasets. https://colab.research.google.com/drive/1O0wO52jUsDVdk9KOV_46TG6yjOZs_XqY?usp=sharing

 

Don't have time to write this essay on your own?
Use our essay writing service and save your time. We guarantee high quality, on-time delivery and 100% confidentiality. All our papers are written from scratch according to your instructions and are plagiarism free.
Place an order

Cite This Work

To export a reference to this article please select a referencing style below:

APA
MLA
Harvard
Vancouver
Chicago
ASA
IEEE
AMA
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Copy to clipboard
Need a plagiarism free essay written by an educator?
Order it today

Popular Essay Topics