APPLICATION OF DATA ANALYSIS AND PREPROCESSING IN PYTHON TO THE HOUSING PRICE PREDICTION PROBLEM
Keywords:
Housing price prediction, data analysis, data preprocessing, Support Vector Regressor, Random Forest Regressor, Python
Abstract
Predictive modeling is one of the most important and widely applicable problems in machine learning. It underpins many applications, ranging from familiar tasks such as weather forecasting and price prediction to more complex ones such as disease diagnosis, fraud detection, and autonomous driving. Its goal is to predict a future outcome or variable from historical data by automatically learning a prediction model from that data. This paper focuses on building a model to predict housing prices in Ho Chi Minh City. Using Python data analysis and preprocessing libraries, the data are cleaned: missing values, duplicates, and outliers are handled, categorical variables are encoded, and features are normalized, followed by feature selection and dimensionality reduction. Machine learning models are then trained to predict housing prices using the Support Vector Regressor (SVR) and the Random Forest Regressor (RFR). Experimental results show that RFR captures complex, nonlinear relationships, is less affected by outliers and noise, and achieves better predictive performance than SVR.
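The workflow summarized above can be illustrated with a minimal sketch, assuming pandas and scikit-learn as the Python libraries. The data are synthetic, and the column names (`area_m2`, `bedrooms`, `district`, `price`), the 1.5 × IQR outlier rule, and all hyperparameters are illustrative assumptions, not the dataset or settings used in this paper. Feature selection and dimensionality reduction are omitted for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "area_m2": rng.uniform(30, 200, n),
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "district": rng.choice(["A", "B", "C"], n),
})
df["price"] = (
    0.05 * df["area_m2"]
    + 0.3 * df["bedrooms"]
    + df["district"].map({"A": 3.0, "B": 1.5, "C": 0.5})
    + rng.normal(0, 0.3, n)
)
df.loc[rng.choice(n, 20, replace=False), "area_m2"] = np.nan  # inject missing values
df = pd.concat([df, df.iloc[:10]], ignore_index=True)         # inject duplicate rows

# Cleaning: drop exact duplicates, then drop price outliers (assumed 1.5 * IQR rule).
df = df.drop_duplicates()
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Preprocessing: impute and scale numeric columns, one-hot encode categorical ones.
numeric = ["area_m2", "bedrooms"]
categorical = ["district"]
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

X = df[numeric + categorical]
y = df["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train both regressors on identical preprocessing and compare held-out R^2.
models = {
    "SVR": SVR(kernel="rbf", C=10.0),
    "RFR": RandomForestRegressor(n_estimators=200, random_state=0),
}
scores = {}
for name, model in models.items():
    pipe = Pipeline([("prep", clone(preprocess)), ("model", model)])
    pipe.fit(X_train, y_train)
    scores[name] = r2_score(y_test, pipe.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Wrapping the preprocessing and the regressor in one `Pipeline` means imputation and scaling statistics are fitted on the training split only, which avoids leaking test information into the model. The scores printed on this synthetic data say nothing about the paper's experimental comparison.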