1.1. Background
In the face of escalating climate change impacts, the urgency of implementing policies that mitigate greenhouse gas (GHG) emissions has been underscored [1]. Rapid urbanization has positioned cities at the epicenter of this global issue. By 2050, cities are projected to accommodate 67% of the world’s population [2]. Cities are currently the primary consumers of global energy, accounting for 70% of the total, and are the source of over 70% of global carbon dioxide emissions [3]. Buildings are among the most significant entities shaping a city’s performance and are therefore expected to play a crucial role in the sustainable transformation of cities [4,5]. In Canada, the commercial and residential sectors are responsible for approximately a quarter of the country’s energy consumption. Buildings alone contribute 13.06% of Canada’s carbon dioxide emissions [6].
In Quebec, buildings consume 35% of the total energy produced, while in Montreal, their share peaks at 88% of the total electric energy consumption. Commercial buildings utilize 47% of carbon-emitting fuel sources, representing the highest category among buildings in terms of GHG emissions in Quebec [7]. In addition, most of Montreal’s building stock dates back to before 1970 [8], leaving these buildings outdated in terms of building code compliance and energy efficiency. Therefore, policymakers have developed numerous initiatives and incentive programs aimed at building owners, targeting preliminary and feasibility analyses that would accelerate building retrofits [9,10].
Building energy models (BEMs) play a crucial role in assessing decarbonization approaches and are deemed essential for examining building retrofit strategies, which may allow decision-makers to implement effective actions. Generally, building energy modelling within the building retrofit domain can be classified into top-down and bottom-up approaches. The latter is often identified as more suitable for engineering applications because it represents individual buildings at a granular level rather than relying on aggregated statistical analyses [11].
Among the bottom-up models are white-box models, which are detailed, physics-based models that aim to represent the different building components physically, leading to precise simulations. EnergyPlus and TRNSYS are well-known white-box modelling tools widely used in academia for their accuracy [12]. Nevertheless, this accuracy comes with the drawbacks of complex inputs and the high computational expense associated with high-fidelity simulations. Moreover, setting up a building energy model with these tools can be time-consuming and cumbersome, requiring significant person-hours. Comprehensive building performance analyses involve many variables and therefore require substantial computational time and resources, which the computational inefficiency of physics-based modelling techniques may render impractical [13].
Zhang et al. classified building retrofit measure identification methods into three categories: parametric-based, optimization-based, and machine learning-based methods. Parametric-based approaches rely on pre-defined alternatives selected using expert domain knowledge, which limits the search space [14]. Saad presented a parametric-based analysis of possible passive design retrofit interventions for existing Egyptian residential buildings, reporting a stepped selection approach in which the best-performing configurations were selected at each stage [15]. Feng et al. [16] presented a parametric-based approach composed of six building retrofit scenarios aimed at helping homeowners choose the most suitable retrofit plan considering their building conditions and personal preferences. A more comprehensive approach aimed at understanding the effect of different variables on an office building was performed by Charles et al. [17]. The study examined the effect of each building component, including infiltration, energy system upgrades, and the building envelope, on operational carbon and energy consumption, but investigated only one combined scenario. This approach can become infeasible when many building retrofit measures (BRMs) are within scope, because the number of combinations to simulate, and hence the computational effort, grows exponentially.
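The exponential growth described above can be made concrete with a short count. The following is a minimal sketch under stated assumptions: the measure names, the number of options per measure, and the 5 min per simulation run are all hypothetical values chosen for illustration and are not taken from any of the cited studies.

```python
from itertools import product
from math import prod

# Hypothetical retrofit measures mapped to the number of discrete options
# considered for each (illustrative values, not taken from the cited studies).
BRM_OPTIONS = {
    "wall_insulation": 4,
    "roof_insulation": 4,
    "window_type": 3,
    "infiltration_level": 3,
    "hvac_system": 5,
    "lighting_upgrade": 2,
}


def scenario_count(options):
    """Full-factorial scenario count: the product of the option counts."""
    return prod(options.values())


def simulation_hours(n_scenarios, minutes_per_run):
    """Total sequential simulation time in hours."""
    return n_scenarios * minutes_per_run / 60.0


total = scenario_count(BRM_OPTIONS)

# Enumerating every combination explicitly confirms the closed-form count.
enumerated = sum(1 for _ in product(*(range(n) for n in BRM_OPTIONS.values())))

# Adding one more hypothetical measure with 4 options multiplies the total by 4.
extended = scenario_count({**BRM_OPTIONS, "shading_device": 4})

if __name__ == "__main__":
    print("scenarios:", total, "after one extra measure:", extended)
    # Assumed 5 minutes per white-box run.
    print("hours:", simulation_hours(total, minutes_per_run=5.0))
```

With these assumed values, six measures already require 1440 runs, and a single additional four-option measure raises this to 5760, which illustrates why exhaustive parametric studies are typically restricted to a handful of pre-selected scenarios.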
Optimization approaches tackle this problem by coupling simulation software with an optimization algorithm. Metaheuristic algorithms, such as genetic algorithms, can identify near-optimal solutions that balance the objectives. This modelling typology must undergo many simulation runs to reach a solution and follows preset stopping criteria [18]. Several studies have identified this large number of runs as a significant drawback of optimization approaches when combined with a white-box model [19].
To tackle the shortcomings of the former approaches, machine learning models (MLMs) are deployed to emulate the physics-based methods and provide a quick approximation of their outcomes. The MLM is trained on sampled databases, which are the outputs of the white-box modelling process, to provide greater efficiency by minimizing the number of required simulation runs. This is referred to as surrogate modelling, since the MLM acts as a surrogate for the white-box model [20]. This setup is mainly used to overcome the lack of pre- and post-retrofit building data [11]. Surrogate models use various learning algorithms, such as artificial neural networks (ANN), support vector machines (SVM), random forest (RF), multivariate linear regression (MVLR), and multivariate adaptive regression splines (MARS) [21]. In summary, surrogates can adequately balance computational cost and accuracy.
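The surrogate workflow outlined above (sample the inputs, run the white-box model, fit a fast regression model to the results) can be sketched as follows. This is an illustrative example only: `toy_simulator` is a hypothetical stand-in for an expensive physics-based run, its formula and input ranges are invented for the sketch, and the surrogate is a plain ordinary-least-squares MVLR fitted with the standard library rather than any specific tool used in the cited studies.

```python
import random


def toy_simulator(wall_r, roof_r, infiltration):
    """Hypothetical stand-in for an expensive white-box run (invented formula)."""
    return (250.0 - 12.0 * wall_r - 8.0 * roof_r + 60.0 * infiltration
            + 0.8 * wall_r * infiltration)


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mvlr(features, targets):
    """Ordinary least squares via the normal equations: [intercept, coefs...]."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, feature_row):
    """Evaluate the fitted linear surrogate for one input row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], feature_row))


rng = random.Random(0)


def sample_inputs(n):
    """Uniform random sampling over assumed input ranges."""
    return [(rng.uniform(1, 6), rng.uniform(2, 8), rng.uniform(0.1, 1.0))
            for _ in range(n)]


# 1) Sample inputs, 2) run the "simulator", 3) fit the surrogate.
train_x = sample_inputs(200)
train_y = [toy_simulator(*x) for x in train_x]
coefs = fit_mvlr(train_x, train_y)

# 4) Check the surrogate against fresh simulator runs.
test_x = sample_inputs(50)
max_abs_error = max(abs(predict(coefs, x) - toy_simulator(*x)) for x in test_x)
```

Once fitted, each call to `predict` costs a few multiplications instead of a full simulation, and the coefficients can be read directly as the marginal effect of each input, which is the interpretability argument usually made for MVLR.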
The choice of surrogate model type is predominantly driven by the pursuit of the highest possible accuracy. Nevertheless, in some instances a compromise is sought between maximizing accuracy and favoring a model structure that offers maximum interpretability [21]. As shown in Table 1, multiple studies have used MARS and MVLR algorithms due to their simplicity and interpretability in addressing their research problems [22,23]. The table summarizes the current status of the literature for studies that have used MVLR or MARS algorithms.
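The interpretability attributed to MARS stems from its building blocks: a MARS model is a weighted sum of hinge functions of the form max(0, x - t) and max(0, t - x), so each term is piecewise linear in a single input. The sketch below only evaluates a hand-written MARS-style model; it does not perform the forward and backward passes that fit such a model, and the intercept, knots, coefficients, and feature names are hypothetical.

```python
def hinge(x, knot, direction=+1):
    """MARS basis function: max(0, x - knot) for +1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


# Hypothetical fitted model: intercept plus (coefficient, feature, knot, direction).
INTERCEPT = 120.0
TERMS = [
    (-6.0, "wall_r", 3.0, +1),       # savings once insulation exceeds the knot
    (9.0, "wall_r", 3.0, -1),        # penalty below the knot
    (45.0, "infiltration", 0.4, +1),  # penalty above an infiltration threshold
]


def mars_predict(sample):
    """Evaluate the additive hinge model for one sample (dict of inputs)."""
    return INTERCEPT + sum(c * hinge(sample[f], k, d) for c, f, k, d in TERMS)


def term_contributions(sample):
    """Per-term contributions, which is what makes the model easy to inspect."""
    return [(f, k, d, c * hinge(sample[f], k, d)) for c, f, k, d in TERMS]


if __name__ == "__main__":
    example = {"wall_r": 1.0, "infiltration": 0.2}
    print(mars_predict(example))
    for row in term_contributions(example):
        print(row)
```

Because every term depends on one input and one knot, the prediction for a given building can be decomposed term by term, in contrast to black-box learners such as ANN.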
The number of samples varied drastically, from 90 to 10,000 samples. All of the studies with a low number of samples studied a single objective. Xu et al. [24] developed a surrogate model to determine building upgrades and operating costs, while Wei et al. [25] developed a surrogate model to identify the effect of different building forms on the overall energy consumption of buildings in cold climates. Only a few studies used feature selection methods, such as forward-stepwise feature selection, with the developed surrogate models. Hygh et al. [23] trained MVLR surrogate models using 16,000 samples from a medium-sized DOE archetype [26] to predict the annual total heating and cooling loads. The study used stepwise feature selection and demonstrated its importance in improving the model fit to the datasets.
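Forward-stepwise feature selection, as mentioned above, starts from an empty model and repeatedly adds the single feature that most improves the fit until no candidate helps enough. The sketch below is a generic illustration, not the procedure of Hygh et al.: it assumes validation RMSE as the selection criterion (other implementations use p-values or information criteria), a hypothetical tolerance `tol`, and a synthetic dataset in which only features 0 and 2 influence the target.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y, cols):
    """Least-squares fit on the chosen columns: [intercept, coefs...]."""
    design = [[1.0] + [r[c] for c in cols] for r in rows]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def rmse(rows, y, cols, coefs):
    """Root-mean-square error of the fitted model on the given rows."""
    sq = 0.0
    for r, t in zip(rows, y):
        pred = coefs[0] + sum(c * r[j] for c, j in zip(coefs[1:], cols))
        sq += (t - pred) ** 2
    return math.sqrt(sq / len(y))


def forward_stepwise(train_x, train_y, val_x, val_y, tol=1e-3):
    """Greedily add the feature that lowers validation RMSE by at least tol."""
    selected, remaining = [], list(range(len(train_x[0])))
    best = rmse(val_x, val_y, [], fit_ols(train_x, train_y, []))
    while remaining:
        trials = []
        for c in remaining:
            cols = selected + [c]
            coefs = fit_ols(train_x, train_y, cols)
            trials.append((rmse(val_x, val_y, cols, coefs), c))
        score, col = min(trials)
        if best - score < tol:
            break  # no candidate improves the fit enough
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


rng = random.Random(42)


def make_data(n):
    """Synthetic data: five inputs, only x0 and x2 affect the target."""
    xs = [[rng.random() for _ in range(5)] for _ in range(n)]
    ys = [3.0 * x[0] - 2.0 * x[2] + rng.gauss(0.0, 0.01) for x in xs]
    return xs, ys


train_x, train_y = make_data(120)
val_x, val_y = make_data(60)
selected, final_rmse = forward_stepwise(train_x, train_y, val_x, val_y)
```

In this synthetic case the procedure stops after picking the two informative inputs, yielding a model with fewer features and essentially the same error as one using all five.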
Using variable training sample sizes, Chen et al. [22] used an MVLR surrogate model to determine indoor comfort levels and daylighting availability in a high-rise residential building. Prada et al. [27] used a MARS algorithm to develop a surrogate optimization model for several residential typologies, reporting the model’s efficiency and effectiveness. Sekhar Roy et al. [28] reported the fractional computation associated with the flexibility of the MARS algorithm in developing a regression model for heating and cooling load forecasting. The literature shows that no studies have developed surrogate models for early-phase analysis of building retrofit measures (BRMs) in terms of energy, carbon emissions, and cost. In addition, few studies have used feature selection methods, and a comparison among different feature selection methods has not been conducted.
Finally, building energy retrofits encompass a broad array of BRMs, which increases the complexity and widens the solution space, leading to a sophisticated task usually performed by consultants [29]. These experts rely on experience-based recommendations or a limited number of investigations due to limited time and the exponential growth in the number of scenarios that occurs when a new BRM is added. In addition, the building retrofit process is prolonged and can extend over many years, adding complexity to non-technical users’ initial decision-making process. Therefore, developing a computationally affordable modelling method that can provide a more comprehensive analysis covering many BRMs could advance the body of knowledge and provide valuable support for decision-making, which could significantly advance efforts towards building decarbonization.
Table 1.
Related studies and identified objectives and features.
Column groups: Study Objective (E, M, Ce, Co, D); Case Study (Building Type, Simulation Tool); Surrogate Model Parameters (Number of Inputs through Number of Samples).

| Ref. | E | M | Ce | Co | D | Building Type | Simulation Tool | Number of Inputs | Number of Outputs | Algorithm | Feature Selection | Validation | Metric | Sampling Method | Number of Samples |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| [30] | X | X | | | | O | EP | 7 | 2 | MVLR | Best subsets | - | - | - | - |
| [27] | X | | | | | R | TRNSYS | 6 | 2 | MARS, MVLR | - | - | - | Multiple | 2710 |
| [31] | | X | | | X | O | Daysim | 15 | 2 | MVLR | - | | | Monte Carlo | 1900 |
| [22] | X | | | | | R | EP | 9 | 3 | MVLR, MARS, SVM | - | - | - | - | 5610 |
| [32] | X | X | | | | O | EP | 26 | 4 | MVLR | - | - | - | LHS & Monte Carlo | 467 |
| [33] | | X | | X | | C | EP | 5 | 2 | MARS | - | | | NA | 7776 |
| [24] | | | | X | | O | EP | 6 | 2 | MVLR | - | K-Fold CV | RMSE | - | 90 |
| [34] | X | | | | | C | EP | 16 | 2 | MARS, SRC | - | CV | R2 | LHS | 2000 |
| [25] | X | | | | | O | - | 8 | 3 | MARS, RF | - | - | RMSE & R2 | LHS | 100 |
| [35] | X | X | | | X | O & I | BSim | 14 | 5 | MARS, MVLR | - | CV | R2 | Monte Carlo | 10,000 |
| [36] | X | | | | | R | EP | 34 | 1 | MVLR | Forward Stepwise | K-Fold CV | RMSE & R2 | LHS | 6000 |
| [23] | X | | | | | O | EP | 27 | 1 | MVLR | Forward Stepwise | CV | RMSE & R2 | Monte Carlo | 20,000 |
| [37] | X | X | | | | R | EP | 25 | 4 | MVLR | BE & RFE | - | RMSE & R2 | - | 8760 |
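The validation schemes and metrics listed in Table 1 (K-fold cross-validation, RMSE, and R2) follow standard definitions, which can be sketched as below. The example is illustrative only: the single-input linear model, the fold assignment by shuffled striding, and the synthetic data are assumptions made for the sketch and do not reproduce the setup of any listed study.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def cross_validate(x, y, k=5):
    """Return one (RMSE, R2) pair per held-out fold."""
    scores = []
    for fold in k_fold_indices(len(x), k):
        held = set(fold)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        pred = [intercept + slope * x[i] for i in fold]
        true = [y[i] for i in fold]
        scores.append((rmse(true, pred), r2(true, pred)))
    return scores


# Synthetic, noise-free data so that the fit is exact on every fold.
xs = [i / 10 for i in range(50)]
ys = [2.0 * v + 1.0 for v in xs]
fold_scores = cross_validate(xs, ys, k=5)
```

Reporting the per-fold scores rather than a single training-set value indicates how well a surrogate generalizes to simulations it has not seen, which is why cross-validation appears alongside RMSE and R2 in several of the listed studies.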
1.2. Research Objectives and Contributions
Building retrofits are comprehensive in scope and require extensive analyses of the effect of BRMs. The resultant combinatorial problem can be computationally costly for decision-makers’ initial assessment, particularly for a comprehensive evaluation that includes energy consumption, carbon emissions, and the associated cost. As a result, the scope of such investigations is limited by the available computational capabilities. In addition, surrogate models, which represent a possible solution to this problem, have been limited in their analysis, with no focus on the combination of energy, carbon emissions, and cost, which represent critical facets of decision-making. The survey of existing literature also revealed a notable lack of research examining the impact of feature engineering and selection methods within the realm of building performance. This gap leaves unexplored opportunities for optimally configuring models with fewer input features.
As shown in Table 1, the development of surrogate and conventional building performance models has been explored in previous studies. However, research on the effect of model parameters on surrogate model accuracy is limited in the building research domain. To the authors’ knowledge, no studies have conducted a comparative analysis of feature selection methods to identify the best method per target output for multiple facets of building performance. In addition, few or no studies have modelled energy, cost, and carbon emissions together.
Therefore, this study aims to investigate the applicability of a surrogate model that emulates the conventional building performance model and extends its predictions to include cost and carbon emissions. The study explores the performance of interpretable surrogate models such as the MVLR and MARS algorithms and, in that process, provides a comprehensive investigation of the effect of different feature engineering and selection methods to develop a clear understanding that can be reproduced and extended in future studies.
In this study, a bottom-up conventional building performance model is developed for a reference building in Montreal, Canada. A robust high-fidelity simulation model is designed to report building performance when examining a wide range of BRMs. The data generated by the physics-based model are then used to develop a surrogate building performance model that considers multiple objectives, namely energy consumption, carbon emissions, and cost.
The paper’s scientific contributions are as follows:
The study incorporates many investigated BRMs and HVAC systems to study their effect on building performance, laying the path for a comprehensive surrogate model. The combinatorial effect of the investigated complex problem is determined, which helps to illustrate the evident deficiency of physics-based simulation models regarding computational requirements.
The study proposes a methodology for developing interpretable surrogate models for predicting multiple facets of the building performance, while investigating optimum configurations through feature engineering and selection coupling.
The study presents a comprehensive analysis of the effects of various surrogate model parameters, such as feature selection, on the models’ accuracy and evolution.
The study proposes novel adjusted feature selection methods that are modified to optimize the performance of the models, stabilize their prediction capabilities, and address their deficiencies.
Considering the addressed research problems and contributions, this paper adds value to the existing body of knowledge.