Socio-Economic, Physical, Housing, Eviction, and Risk dataset (SEPHER) 2.0

Overview

The Socio-Economic, Physical, Housing, Eviction, and Risk dataset, version 2 (SEPHER 2.0) is a census-tract-level data set developed to enable research into environmental justice.

Assessing the impact of climate change on vulnerable populations, and the implications of such impacts, is a critical step toward climate and environmental justice. Indices and metrics that aim to study linkages between climate-related environmental impacts and vulnerable populations generally lack housing information. Financially relevant real estate data (e.g., mortgages, evictions), combined with other socio-economic and physical risk information, can provide a crucial lens for assessing climate justice. In addition, standard socio-economic and demographic variables aggregated at census units lack the granularity required to capture inequalities, especially in heterogeneous communities. There is therefore a need for publicly available, ready-to-use, digitized and distributed datasets that contain relevant inequality metrics based on real estate and financial information. Moreover, studies of the damages and financial impacts of climate change often rely on commercial datasets that cost hundreds of thousands of dollars to acquire, which puts such information out of reach for advocacy groups, journalists and other interested people.

With this in mind, the Socio-Economic, Physical, Housing, Eviction, and Risk dataset (SEPHER) was created by merging multiple publicly available datasets that include socio-economic variables, climate risk scores, evictions and housing variables at the census tract level across the United States. The purpose of the SEPHER data set is to allow for testing, assessing and generating new analyses and metrics that can address inequalities and climate injustice.

SEPHER was created to address the lack of comprehensive publicly available datasets needed to support research on climate impacts to vulnerable populations. Existing public datasets and social vulnerability indices (such as CDC Social Vulnerability Index and FEMA National Risk Index) focus on socioeconomic and physical risks to vulnerable populations without including financial or real estate data. There are commercial datasets that include both, but they are expensive and thus not readily available to everyone.

SEPHER was driven by the idea that creating and disseminating a comprehensive dataset, and documenting the process for reproducibility, will contribute to a fairer society that is better informed about who has benefited from, and who has been burdened by, climate change.

In November 2021, SEPHER 2.0 was created and rebased. SEPHER 2.0 improves on the version of SEPHER published in Tedesco et al. (2021) through better standardization and homogenization of variables and an improved data dictionary. Hereafter, SEPHER 2.0 is referred to as SEPHER. Tedesco et al. (2021) provides an example of how the SEPHER database can be applied to the Miami metropolitan area.

Download

Download the SEPHER 2.0 data set in CSV (0.62 GB) and US Census Tracts Shapefile (45 MB zip).

Data Dictionary (60 KB XLSX file)

Citation: Tedesco, M., C. Hultquist, S. E. Char, C. Constantinides, T. Galjanic, A. D. Sinha. 2021. Socio-Economic, Physical, Housing, Eviction, and Risk dataset, version 2 (SEPHER 2.0), Preliminary Release. https://doi.org/10.7927/r6yw-xw73. Accessed DAY MONTH YEAR.

Methods

SEPHER draws upon four major source datasets: the CDC Social Vulnerability Index, the FEMA National Risk Index, the Home Mortgage Disclosure Act dataset, and the Evictions dataset. They are described in more detail in the Source Data and Variables section below. The data from these source datasets have been merged, cleaned, and standardized, and all of the variables are documented in the data dictionary.

SEPHER is distributed as a CSV file. A key characteristic of SEPHER is that it is in “wide” format, i.e., structured to support geospatial analysis and visualization: each row represents a single US census tract, uniquely identified by its GEOID/FIPS code, and each column represents a variable describing the tract (e.g., socioeconomic information, a risk score for exposure to natural disasters, the number of mortgages originated or denied, or the number of eviction judgments issued in a particular year). Because observations are annual and the dataset is organized in wide format, there is a separate column for each combination of variable and year, with the year appended to the variable name. For example, since the eviction data cover the period from 2000 to 2016, there are 17 variables (columns) in total for the number of eviction filings per tract, with variable names ranging from EVICTION.FILINGS_2000 to EVICTION.FILINGS_2016.
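The naming pattern described above makes it straightforward to reshape the per-year columns into one record per tract and year. The following is a minimal sketch using only the Python standard library. The column name GEOID for the tract identifier, the helper name wide_to_long, and the sample values are assumptions made for illustration and are not taken from the data dictionary; only the EVICTION.FILINGS_<year> pattern comes from the description above.

```python
import csv
import io

# The 17 per-year column names described in the text (2000 through 2016).
FILING_COLUMNS = [f"EVICTION.FILINGS_{year}" for year in range(2000, 2017)]


def wide_to_long(rows, id_column="GEOID", prefix="EVICTION.FILINGS_"):
    """Yield (tract id, year, value) for each column named <prefix><year>.

    id_column is an assumed name for the tract identifier column.
    """
    for row in rows:
        for name, value in row.items():
            suffix = name[len(prefix):]
            if name.startswith(prefix) and suffix.isdigit():
                # Empty cells are kept as missing (None), not converted to zero.
                number = float(value) if value != "" else None
                yield row[id_column], int(suffix), number


# Illustrative two-tract excerpt with made-up values, not real SEPHER data.
sample = io.StringIO(
    "GEOID,EVICTION.FILINGS_2000,EVICTION.FILINGS_2001\n"
    "12086000100,12,\n"
    "12086000200,3,5\n"
)
long_rows = list(wide_to_long(csv.DictReader(sample)))
for record in long_rows:
    print(record)
```

To run this against the downloaded file, the in-memory sample would be replaced by an open file handle on the CSV. csv.DictReader reads every value as text, which keeps any leading zeros in the tract identifier intact.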

As SEPHER is intended to be a comprehensive dataset, the majority of the variables from the original source datasets are included in it. The only variables left out are those that duplicate each other (whether or not they share a variable name across datasets), those for which most of the data are missing, and those whose information is already contained in another variable (e.g., state and county FIPS codes are contained in the 11-digit GEOID/FIPS code).
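Because the state and county codes are not stored separately, they can be recovered from the tract identifier when needed. The sketch below assumes the standard Census layout of an 11-digit tract GEOID (2-digit state, 3-digit county, 6-digit tract). The function name split_tract_geoid and the returned key names are hypothetical, chosen for illustration only.

```python
def split_tract_geoid(geoid):
    """Split an 11-digit census tract GEOID into state, county and tract codes."""
    # Restore leading zeros that are lost if the identifier was read as a number.
    text = str(geoid).strip().zfill(11)
    if len(text) != 11 or not text.isdigit():
        raise ValueError(f"not an 11-digit tract GEOID: {geoid!r}")
    return {
        "state_fips": text[:2],    # 2-digit state code
        "county_fips": text[2:5],  # 3-digit county code within the state
        "tract_code": text[5:],    # 6-digit tract code within the county
    }


print(split_tract_geoid("12086000100"))
print(split_tract_geoid(1001020100))  # integer input regains its leading zero
```

Reading the identifier as text rather than as a number avoids the padding step altogether, since states with FIPS codes below 10 otherwise lose their leading zero.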

The table below provides reference information about the original datasets that make up SEPHER while the subsequent paragraphs provide more details on each dataset.

Dataset	Documentation	Original Data Source
CDC Social Vulnerability Index	https://www.atsdr.cdc.gov/placeandhealth/svi/documentation/SVI_documentation_2018.html	https://www.atsdr.cdc.gov/placeandhealth/svi/data_documentation_download.html
FEMA National Risk Index	https://hazards.fema.gov/nri/data-resources	https://hazards.fema.gov/nri/Content/StaticDocuments/DataDownload//NRI_Table_CensusTracts/NRI_Table_CensusTracts.zip
Home Mortgage Disclosure Act	https://files.consumerfinance.gov/hmda-historic-data-dictionaries/lar_record_codes.pdf	https://www.consumerfinance.gov/data-research/hmda/historic-data/?geo=fl&records=all-records&field_descriptions=labels
Evictions	https://evictionlab.org/docs/Eviction%20Lab%20Methodology%20Report.pdf	https://evictionlab.org/get-the-data/

CDC Social Vulnerability Index

The CDC Social Vulnerability Index (SVI) dataset was prepared for the Centers for Disease Control and Prevention to assess the degree of social vulnerability of American communities to natural hazards and anthropogenic events. It contains data on 15 social factors taken or derived from Census reports, as well as rankings of each tract based on these individual factors, on groups of factors corresponding to four related themes (Socioeconomic, Household Composition & Disability, Minority Status & Language, and Housing Type & Transportation), and overall. The data are available for the years 2000, 2010, 2014, 2016, and 2018.

FEMA National Risk Index

The National Risk Index (NRI) dataset compiled by the Federal Emergency Management Agency (FEMA) consists of historic natural disaster data from across the United States at the tract level. The dataset includes information about 18 types of natural disaster, including earthquakes, tsunamis, wildfires, volcanic activity and many others. Each type is described in terms of its frequency, historic impact, potential exposure, expected annual loss and associated risk. The dataset also includes summary variables for each tract, including the total expected loss in terms of building loss, human loss and agricultural loss, the population of the tract, and the area covered by the tract. Finally, it includes a few further variables that characterize the population, such as the social vulnerability rating and community resilience.

Home Mortgage Disclosure Act

The Home Mortgage Disclosure Act (HMDA) dataset contains loan-level data for home mortgages, including information on applications, denials, approvals, and institution purchases. It is managed and expanded annually by the Consumer Financial Protection Bureau based on data collected from financial institutions. Public officials use the dataset to make decisions and policies, to uncover lending patterns and discrimination among mortgage applicants, and to investigate whether lenders are serving the housing needs of their communities. It covers the period from 2007 to 2017.

Evictions

The Evictions dataset is compiled and managed by the Eviction Lab at Princeton University and consists of court records related to eviction cases in the United States between 2000 and 2016. Its purpose is to estimate the prevalence of court-ordered evictions and compare eviction rates among states, counties, cities, and neighborhoods. Besides information on eviction filings and judgments, the dataset includes socioeconomic and real estate data for each tract including race/ethnic origin, household income, poverty rate, property value, median gross rent, rent burden, and others.

References

Tedesco, Marco, Carolynne Hultquist, and Alex de Sherbinin. 2021. A new dataset integrating public socioeconomic, physical risk, and housing data for climate justice metrics: A test-case study in Miami. Environmental Justice. Ahead of Print. https://doi.org/10.1089/env.2021.0059 (open access).

Disclaimer

This is a preliminary open data release, pending peer review of the data and associated journal articles. Following the peer review process, data curation will be completed by the NASA Socioeconomic Data and Applications Center (SEDAC) and the data will be disseminated through the SEDAC catalog.