1 Introduction

This study presents a new model for predicting food inspection failures in Chicago with 61% accuracy and 70% specificity. The model helps to direct limited health inspectors towards food establishments with a higher likelihood of failure, optimizing manpower and resource allocation.

With fewer than 40 health inspectors in charge of over 15,000 food establishments across Chicago, the Department of Public Health is seeking a better way to optimize the allocation of manpower and resources. To direct limited manpower towards food establishments most in need of inspections, this study presents a new model that predicts inspection results.

The model is a logistic regression model that predicts whether food establishments would fail the inspection based on data collected from previous inspections, and open data from the Chicago Data Portal. The results inform a cost-benefit analysis that considers the costs of deploying inspectors and the gains from avoiding foodborne illness outbreaks.

Using 2022 inspection data to predict inspection results for 2023, the model has an accuracy of 61% and specificity of 70%. The high specificity is ideal, as the model focuses on predicting inspection failures (which we define as the negative result in the logistic model). The model is generalizable across neighborhoods, but predicts differently in terms of types of food establishments.

The model is used to build the app ChicaGo! made by the data analytics team of the Chicago Public Health Department, to make full use of the limited manpower while ensuring quality inspections. The app is designed for two groups: (1) for the management team to optimise allocations, and (2) for the inspectors to complete inspections quickly. To learn more about the application, please refer to this Youtube video.

2 Data collection and wrangling

2.1 Spatial structure

2022 census data on total population, median household income, total white population, and total graduates were obtained from the U.S. Census Bureau using API, while data on Chicago’s neighborhood tracts were taken from Github.

# Boundary polygon
chicagoBoundary <- 
  st_read(file.path(root.dir,"/Chapter5/chicagoBoundary.geojson")) %>%
  st_transform(CRS_SYSTEM) 

# Neighborhoods
neighborhoods <-
  st_read("https://raw.githubusercontent.com/blackmad/neighborhoods/master/chicago.geojson") %>%
  st_transform(CRS_SYSTEM) 

# Define ACS Variables to get from census API
ACS_VARS <- c("B01001_001", # Total Population
              "B01001A_001", # Total White
              "B19013_001E", # Median HH Income ($)
              "B06009_006E") # Total graduate or professional

# Census data
tracts22 <- 
  get_acs(geography = "tract", 
          variables = ACS_VARS, 
          year=2022, 
          state = "IL", 
          county = "Cook",
          geometry=TRUE, 
          output="wide") %>%
  st_transform(CRS_SYSTEM) %>%
  rename(Population = B01001_001E, # Total population
         White = B01001A_001E, # Total white
         Income = B19013_001E, # Median household income
         Grad = B06009_006E) %>% # Total graduate or professional
  dplyr::select(-starts_with("B"))

2.2 Food inspection

Open data on Chicago’s food inspection data was obtained from Chicago Data Portal. Inspection cases that indicate “no entry”, “out of business”, “not ready”, and “business not located” were removed, while inspections that state “pass with conditions” were categorized with “fail”.

Considering that there was a change in food inspection procedures starting from 7 January 2018, we use data from more recent years (2020 to 2023). From this data, we derive information on the number of days since the last inspection and the number of previous failures for each business inspected. Finally, 2022 data is extracted to build the model, while 2023 data is extracted to test the model.

# Inspection data from https://data.cityofchicago.org/
inspection <- 
  read.socrata("https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5") 

# Data from 2020 to 2023
inspection.recent <- inspection %>%
  mutate(inspection_date=ymd(inspection_date)) %>%
  # ymd: year, month, day
  filter(inspection_date >= "2020-01-01" & inspection_date < "2024-01-01") %>%
  # Remove entries with no entry, out of business, not ready and business not located
  # Recategorise pass w/ conditions as fail
  filter(results == c("Fail","Pass","Pass w/ Conditions"))
inspection.recent$results[inspection.recent$results == "Pass w/ Conditions"] = "Fail"

# Add numeric column on results
inspection.recent <- inspection.recent %>%
  mutate(results_numeric = ifelse(results == "Pass",1,0))
# Clean data
inspection.clean <- inspection.recent %>%
    mutate(x = gsub("[()]", "", location)) %>% 
    separate(x,into= c("Y","X"), sep=",") %>%
    mutate(X = as.numeric(X),Y = as.numeric(Y)) %>% 
    na.omit() %>%
    st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
    st_transform(CRS_SYSTEM) %>% 
    distinct()

# Exclude points that fall beyond buffer of chicagoBoundary (in the airport)
inspection.clean <- st_intersection(inspection.clean,st_buffer(chicagoBoundary,660))
inspection.clean <- inspection.clean %>%
  arrange(license_, inspection_date) %>% # sort by license and date
  group_by(license_) %>% # group by license
  mutate(
    # Days since last inspection
    days_since_last = as.numeric(inspection_date - lag(inspection_date, default = NA)),
    # Number of previous fails
    fails = cumsum(lag(results == "Fail", default = FALSE))) %>%
  ungroup()

# Filter out 2022 to 2023
inspection.clean <- inspection.clean %>%
  filter(inspection_date >= "2022-01-01")

# If license is 0, define as 0
inspection.clean$days_since_last[inspection.clean$license_ == 0] <- 0
inspection.clean$fails[inspection.clean$license_ == 0] <- 0

# Change NA to 0
inspection.clean$days_since_last[is.na(inspection.clean$days_since_last)] <- 0

2.3 Features

Open data on Chicago’s liquor establishments, vacant buildings, 311 reports of sanitation code violations, and 311 reports of three or more street lights going out were sourced from Chicago Data Portal using API. The average distances of each food inspection record to the nearest features (neighbors) are calculated, where the number to account for depends on the nature and the number of points each feature has. For example, the nearest 2 liquor establishment are considered, while the nearest 30 cases of the many sanitation violation reports are taken into account.

liquor <- 
  read.socrata("https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses-Current-Liquor-and-Public-Places/nrmj-3kcf") %>%
    dplyr::select(Y = LATITUDE, X = LONGITUDE, LICENSE.ID, LICENSE.CODE, LICENSE.DESCRIPTION, BUSINESS.ACTIVITY.ID, BUSINESS.ACTIVITY, LICENSE.NUMBER) %>%
    drop_na(X) %>%
    st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
    st_transform(CRS_SYSTEM) %>%
    mutate(Legend = "Liquor")

sanitation <- 
  read.socrata("https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Sanitation-Code-Complaints-No/rccf-5427") %>%
    dplyr::select(Y = Latitude, X = Longitude) %>%
    drop_na(X) %>%
    st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
    st_transform(CRS_SYSTEM) %>%
    mutate(Legend = "Sanitation")
  
streetLights <- 
   read.socrata("https://data.cityofchicago.org/Service-Requests/311-Service-Requests-Street-Lights-All-Out-No-Dupl/756k-itxx") %>%
    dplyr::select(Y = Latitude, X = Longitude) %>%
    drop_na(Y) %>%
    st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
    st_transform(CRS_SYSTEM) %>%
    mutate(Legend = "Street_lights_out")

vacantBuildings <- 
  read.socrata("https://data.cityofchicago.org/Buildings/Vacant-and-Abandoned-Buildings-Violations/kc9i-wq85") %>%
    dplyr::select(Y = latitude, X = longitude) %>%
    na.omit() %>%
    st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant")%>%
    st_transform(CRS_SYSTEM) %>%
    mutate(Legend = "Vacant_buildings")
# By distance (nearest neighbor)
inspection.clean <-inspection.clean %>%
    mutate(
      liquor_nn2 = nn_function(st_coordinates(inspection.clean), 
                              st_coordinates(liquor), k = 2)) %>%
    mutate(
      sanitation_nn30 = nn_function(st_coordinates(inspection.clean), 
                              st_coordinates(sanitation), k = 30)) %>%
    mutate(
      streetLights_nn10 = nn_function(st_coordinates(inspection.clean), 
                              st_coordinates(streetLights), k = 10)) %>%
    mutate(
      vacantBuildings_nn2 = nn_function(st_coordinates(inspection.clean), 
                              st_coordinates(vacantBuildings), k = 2))

Weather data at Chicago O’Hare International Airport (ORD) in 2022 and 2023 was obtained from the Iowa Environmental Mesonet (IEM). The two-day average high temperature is calculated for each inspection date.

weather <- read.csv("Data/ORD.csv") %>%
  drop_na() # remove NA

# Change to datetime and only keep the date
weather <- weather %>%
  mutate(date = date(ymd_hm(valid)))

# Highest temp per day
maxTemp <- weather %>% 
    group_by(date) %>% 
    summarize(max = max(tmpc))

# Convert to xts object
maxTemp_xts <- xts(maxTemp$max, order.by = maxTemp$date)

# Average highest temp of 2 days (rolling avg)
temp2_xts <- rollmean(maxTemp_xts, k = 2, align = "right", fill = NA)

# Convert back to df
temp2 <- data.frame(date=index(temp2_xts), temp=coredata(temp2_xts))

# Join to 2022 and 2023 inspection data
inspection.clean <- left_join(inspection.clean, temp2, by=c("inspection_date"="date"))

3 Data analysis

3.1 Food inspection

Food establishments that fail the inspection tend to cluster a bit more towards the southwest and the city center in the northeast. Nevertheless, the distribution of food inspections over space are quite similar no matter the result, suggesting that beyond spatial features, temporal features and internal food establishments characteristics should also be considered.

# Data from 2022
inspection.2022 <- inspection.clean %>%
  filter(inspection_date >= "2022-01-01" & inspection_date < "2023-01-01")

# Points map
ggplot() + 
  geom_sf(data = chicagoBoundary, fill = "grey30") +
  geom_sf(data = inspection.2022, aes(color=results), size=0.1, show.legend = "point") +
  scale_color_manual(values = paletteRG) +
  facet_wrap(~results) +
  labs(title= "Food inspection results in Chicago, 2022",
       caption = "Figure 1. Source: City of Chicago.") +
  theme_void()+
  theme(plot.title = element_text(size = 12, face = "bold"),
        legend.position = "none",
        strip.text = element_text(hjust=0, size = 10))

3.2 Continuous features

Visualizing nearby features (nearest neighbor) over space, they exhibit a similar pattern as food inspections — mostly distributed across Chicago with some clustering in the northeast, where the city center is.

inspection.2022 %>%
  dplyr::select(geometry,liquor_nn2,sanitation_nn30,streetLights_nn10,vacantBuildings_nn2) %>%
  gather(Variable, value,-geometry) %>%
  ggplot() + 
  geom_sf(data = chicagoBoundary, fill = "grey30") +
  geom_sf(aes(color=Variable), size=0.1, show.legend = "point") +
  scale_color_manual(values = palette4) +
  facet_wrap(~Variable,nrow=1) +
  labs(title= "Nearest neighbor features, 2022",
       caption = "Figure 2") +
  theme_void()+
  theme(plot.title = element_text(size = 12, face = "bold"),
        legend.position = "none",
        strip.text = element_text(hjust=0, size = 10,margin=margin(10,0,5,0)))

Including all other continuous features, such as the number of previous failures, these features are grouped by inspection result (pass and fail) and are visualized by mean (Figure 3). Useful features are those that exhibit significant differences across failing and passing the inspection.

In general, food establishments that have not been inspected in a while and have more previous failures are more likely to fail the inspection. When temperatures and number of nearby vacant buildings are lower, failure rates are higher. Slightly fewer 311 reports and more nearby liquor establishments are slightly more associated with failures.

# Numerical/ continuous data
continuous <- inspection.2022 %>%
  dplyr::select(results,liquor_nn2,sanitation_nn30,streetLights_nn10,vacantBuildings_nn2,temp,days_since_last,fails) %>%
  st_drop_geometry()

# Continuous features bar graphs (by mean)
continuous %>%
  gather(Variable, value, -results) %>%
  ggplot(aes(results, value, fill=results)) +
  geom_bar(position = "dodge", stat = "summary", fun = "mean",width = 0.5) + 
  facet_wrap(~Variable, scales = "free", nrow = 2) +
  scale_fill_manual(values = paletteRG) +
  labs(x="Food inspection", y="Mean", 
       title = "Feature associations with the likelihood of passing the food inspection, 2022",
       subtitle = "Bar graphs of continous features",
       caption = "Figure 3") +
  theme_minimal()+
  theme(legend.position = "none",
        panel.grid.major.x = element_blank(),
        plot.title = element_text(face="bold"),
        plot.caption = element_text(size=10),
        strip.text = element_text(size = 10, hjust=0, face = "bold"),
        axis.title.x = element_text(margin=margin(10,0,0,0)),
        axis.title.y = element_text(margin=margin(0,10,0,0)),
        aspect.ratio = 2/3)

3.3 Categorical features

Focusing on categorical features, such as the facility and inspection types, these features are grouped by inspection result (pass and fail) and are plotted by count (Figure 4). There are almost 150 facility types and 15 inspection types; these two features have both been recategorized into 7 types for clearer visualization and to avoid class imbalances when building the model.

Looking at the inspection types, there is a clear trend with higher failure rates for first inspections and fewer failures for re-inspections. For the risk category feature, risk is defined based on internal food establishment characteristics. For instance, all day care centers looking after children and elderly (who have weaker immune systems) are deemed risk 1. Risk 1 establishments are inspected twice a year, Risk 2 establishments once per year, and Risk 3 establishments once every two years.

# Recategorisation
# Cleaning the many facility types
inspection.clean$facility_cleaned = inspection.clean$facility_type

inspection.clean$facility_cleaned[grepl("roof", inspection.clean$facility_cleaned, ignore.case = T)] <- "Restaurant"
inspection.clean$facility_cleaned[grepl("banquet", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("diner", inspection.clean$facility_cleaned, ignore.case = T)] <- "Restaurant"
inspection.clean$facility_cleaned[grepl("sushi", inspection.clean$facility_cleaned, ignore.case = T)] <- "Restaurant"

inspection.clean$facility_cleaned[grepl("poultry", inspection.clean$facility_cleaned, ignore.case = T)] <- "Poultry"
inspection.clean$facility_cleaned[grepl("slaughter", inspection.clean$facility_cleaned, ignore.case = T)] <- "Poultry"
inspection.clean$facility_cleaned[grepl("butcher", inspection.clean$facility_cleaned, ignore.case = T)] <- "Poultry"

inspection.clean$facility_cleaned[grepl("coffee", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("liquor", inspection.clean$facility_cleaned, ignore.case = T)] <- "Liquor"
inspection.clean$facility_cleaned[grepl("brewery", inspection.clean$facility_cleaned, ignore.case = T)] <- "Liquor"
inspection.clean$facility_cleaned[grepl("tavern", inspection.clean$facility_cleaned, ignore.case = T)] <- "Liquor"
inspection.clean$facility_cleaned[grepl("club", inspection.clean$facility_cleaned, ignore.case = T)] <- "Liquor"

inspection.clean$facility_cleaned[grepl("grocery", inspection.clean$facility_cleaned, ignore.case = T)] <- "Grocery"
inspection.clean$facility_cleaned[grepl("mobil", inspection.clean$facility_cleaned, ignore.case = T)] <- "Mobile"
inspection.clean$facility_cleaned[grepl("kiosk", inspection.clean$facility_cleaned, ignore.case = T)] <- "Mobile"
inspection.clean$facility_cleaned[grepl("pop", inspection.clean$facility_cleaned, ignore.case = T)] <- "Mobile"
inspection.clean$facility_cleaned[grepl("push", inspection.clean$facility_cleaned, ignore.case = T)] <- "Mobile"
inspection.clean$facility_cleaned[grepl("booth", inspection.clean$facility_cleaned, ignore.case = T)] <- "Mobile"

inspection.clean$facility_cleaned[grepl("care", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("day", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("assisted", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("services", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("years", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("support", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("after", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("youth", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("shelter", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"

inspection.clean$facility_cleaned[grepl("shake", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("milk", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("herbal", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"

inspection.clean$facility_cleaned[grepl("ice", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("donut", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("bake", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("cafe", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("paleteria", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"

inspection.clean$facility_cleaned[grepl("gas", inspection.clean$facility_cleaned, ignore.case = T)] <- "Gas"

inspection.clean$facility_cleaned[grepl("culinary", inspection.clean$facility_cleaned, ignore.case = T)] <- "Culinary"
inspection.clean$facility_cleaned[grepl("cooking", inspection.clean$facility_cleaned, ignore.case = T)] <- "Culinary"
inspection.clean$facility_cleaned[grepl("pastry", inspection.clean$facility_cleaned, ignore.case = T)] <- "Culinary"
inspection.clean$facility_cleaned[grepl("share", inspection.clean$facility_cleaned, ignore.case = T)] <- "Culinary"
inspection.clean$facility_cleaned[grepl("chef", inspection.clean$facility_cleaned, ignore.case = T)] <- "Culinary"

inspection.clean$facility_cleaned[grepl("stadium", inspection.clean$facility_cleaned, ignore.case = T)] <- "Event"
inspection.clean$facility_cleaned[grepl("event", inspection.clean$facility_cleaned, ignore.case = T)] <- "Event"
inspection.clean$facility_cleaned[grepl("hall", inspection.clean$facility_cleaned, ignore.case = T)] <- "Event"
inspection.clean$facility_cleaned[grepl("theat", inspection.clean$facility_cleaned, ignore.case = T)] <- "Event"
inspection.clean$facility_cleaned[grepl("venue", inspection.clean$facility_cleaned, ignore.case = T)] <- "Event"

inspection.clean$facility_cleaned[grepl("school", inspection.clean$facility_cleaned, ignore.case = T)] <- "School"

inspection.clean$facility_cleaned[grepl("exercise", inspection.clean$facility_cleaned, ignore.case = T)] <- "Exercise"
inspection.clean$facility_cleaned[grepl("rehab", inspection.clean$facility_cleaned, ignore.case = T)] <- "Exercise"
inspection.clean$facility_cleaned[grepl("gym", inspection.clean$facility_cleaned, ignore.case = T)] <- "Exercise"
inspection.clean$facility_cleaned[grepl("fitness", inspection.clean$facility_cleaned, ignore.case = T)] <- "Exercise"

inspection.clean$facility_cleaned[grepl("store", inspection.clean$facility_cleaned, ignore.case = T)] <- "Store"
inspection.clean$facility_cleaned[grepl("dollar", inspection.clean$facility_cleaned, ignore.case = T)] <- "Store"
inspection.clean$facility_cleaned[grepl("candy", inspection.clean$facility_cleaned, ignore.case = T)] <- "Store"

inspection.clean$facility_cleaned[grepl("cater", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("comm", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("kitchen", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("pantry", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("church", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("charter", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("ARCHDIOCESE", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("hostel", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("hotel", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"

inspection.clean$facility_cleaned[grepl("pack", inspection.clean$facility_cleaned, ignore.case = T)] <- "Pack"
inspection.clean$facility_cleaned[grepl("whole", inspection.clean$facility_cleaned, ignore.case = T)] <- "Pack"
inspection.clean$facility_cleaned[grepl("distri", inspection.clean$facility_cleaned, ignore.case = T)] <- "Pack"

inspection.clean$facility_cleaned[grepl("bar", inspection.clean$facility_cleaned, ignore.case = T)] <- "Liquor"

inspection.clean$facility_cleaned[grepl("center", inspection.clean$facility_cleaned, ignore.case = T)] <- "Event"

inspection.clean$facility_cleaned[grepl("helicopter", inspection.clean$facility_cleaned, ignore.case = T)] <- "Other"
inspection.clean$facility_cleaned[grepl("hair", inspection.clean$facility_cleaned, ignore.case = T)] <- "Other"
inspection.clean$facility_cleaned[grepl("riverwalk", inspection.clean$facility_cleaned, ignore.case = T)] <- "Other"

inspection.clean$facility_cleaned[grepl("culinary", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("event", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("pack", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"
inspection.clean$facility_cleaned[grepl("hospital", inspection.clean$facility_cleaned, ignore.case = T)] <- "Cater"

inspection.clean$facility_cleaned[grepl("gas", inspection.clean$facility_cleaned, ignore.case = T)] <- "Store"
inspection.clean$facility_cleaned[grepl("store", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("exercise", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"

inspection.clean$facility_cleaned[inspection.clean$facility_cleaned == ""] <- "Other"
inspection.clean$facility_cleaned[grepl("poultry", inspection.clean$facility_cleaned, ignore.case = T)] <- "Other"

inspection.clean$facility_cleaned[grepl("retail", inspection.clean$facility_cleaned, ignore.case = T)] <- "Snack"
inspection.clean$facility_cleaned[grepl("nursing", inspection.clean$facility_cleaned, ignore.case = T)] <- "Care"
inspection.clean$facility_cleaned[grepl("news", inspection.clean$facility_cleaned, ignore.case = T)] <- "Grocery"
inspection.clean$facility_cleaned[grepl("liquor", inspection.clean$facility_cleaned, ignore.case = T)] <- "Other"
inspection.clean$facility_cleaned[grepl("snack", inspection.clean$facility_cleaned, ignore.case = T)] <- "Other"
inspection.clean$facility_cleaned[grepl("regulated", inspection.clean$facility_cleaned, ignore.case = T)] <- "Other"
# Recategorisation
# Cleaning the many inspection types
inspection.clean$inspection_cleaned = inspection.clean$inspection_type

inspection.clean$inspection_cleaned[grepl("entry", inspection.clean$inspection_cleaned, ignore.case = T)] <- "Other"
inspection.clean$inspection_cleaned[grepl("non", inspection.clean$inspection_cleaned, ignore.case = T)] <- "Other"
inspection.clean$inspection_cleaned[inspection.clean$inspection_cleaned == "Suspected Food Poisoning"] <- "Complaint"
inspection.clean$inspection_cleaned[inspection.clean$inspection_cleaned == "Suspected Food Poisoning Re-inspection"] <- "Complaint Re-Inspection"
inspection.clean$inspection_cleaned[inspection.clean$inspection_cleaned == "ASSESSMENT"] <- "Other"
inspection.clean$inspection_cleaned[inspection.clean$inspection_cleaned == "Consultation"] <- "Other"
inspection.clean$inspection_cleaned[inspection.clean$inspection_cleaned == "Recent Inspection"] <- "Other"
inspection.clean$inspection_cleaned[inspection.clean$inspection_cleaned == "Short Form Complaint"] <- "Other"
# Categorical data
categorical <- inspection.clean %>%
  filter(inspection_date >= "2022-01-01" & inspection_date < "2023-01-01") %>%
  dplyr::select(facility_cleaned,risk,inspection_cleaned,results)%>%
  st_drop_geometry()

# Convert to tidy format
categorical.long <- categorical %>%
  gather(Variable, value, -results) %>%
  count(Variable, value, results) 

# Plot graph
ggplot(categorical.long, aes(value, n, fill = results)) +   
  geom_bar(position = "dodge", stat="identity",width = 0.5) +
  facet_wrap(~Variable, scales="free", nrow = 2) +
  scale_fill_manual(values = paletteRG, 
                    name = "Food inspection")+
  guides(fill = guide_legend(label.position = "bottom",
                             direction="horizontal"))+
  scale_x_discrete(labels = wrap_format(10))+ # wrap long labels
  labs(x=NULL, y="Count",
       title = "Feature associations with the likelihood of passing the food inspection, 2022",
       subtitle = "Bar graphs of categorical features",
       caption = "Figure 4") +
  theme_minimal()+
  theme(legend.position = "inside",
        legend.position.inside = c(0.7,0.1),
        panel.grid.major.x = element_blank(),
        plot.title = element_text(face="bold"),
        plot.caption = element_text(size=10),
        strip.text = element_text(size = 10, hjust=0, face = "bold"),
        axis.title.x = element_text(margin=margin(10,0,0,0)),
        axis.title.y = element_text(margin=margin(0,10,0,0)),
        axis.text = element_text(size=7))

3.4 Summary of features

We summarize our features in Table 1, sorting them into (1) Internal characteristics of the food establishment, (2) Temporal features of the inspections, and (3) Spatial patterns nearby. For continuous features, we presented the average value (Mean). For categorical features, we presented the most common category (Mode).

tbl <- data.frame(
  Variable = c("Risk", "Facility Type", "Inspection type",
               
               "Previous failures", "Temperature (˚C)", "Days since last inspection",
               
               "Liquor licenses", "Vacant buildings", "Sanitation", "Streetlights"),
  
  Type = c("Categorical","Categorical","Categorical",
           
           "Continuous","Continuous","Continuous",
           
           "Continuous","Continuous","Continuous","Continuous"),
  
  Values = c("1 to 3", "7 types", "7 types",
            
            "0 to 8","-11.1 to 37.2", "0 to 1406",
            
            "0 to 2102*", "8 to 4278*", "11 to 1094*", "6 to 623*"), # distance
  
  Stat = c("1","Restaurant", "Canvass", # mode 
           
           "0.5", "17.0", "243", # mean
           
           "169", "1154", "129", "67") # mean
            
)

kable(tbl, caption = "<span style='color:#000000;'><b>Table 1. Summary of features</b></span>",
      col.names = c("Feature", "Type", "Values", "Mean / Mode")) %>%
   kable_styling()  %>%
    group_rows("Internal characteristics", 1, 3) %>%
    group_rows("Temporal features", 4, 6) %>%
    group_rows("Spatial patterns", 7, 10) %>%
    footnote(general_title = "\n",
           general = "*Distance in ft to the nearest points")
Table 1. Summary of features
Feature Type Values Mean / Mode
Internal characteristics
Risk Categorical 1 to 3 1
Facility Type Categorical 7 types Restaurant
Inspection type Categorical 7 types Canvass
Temporal features
Previous failures Continuous 0 to 8 0.5
Temperature (˚C) Continuous -11.1 to 37.2 17.0
Days since last inspection Continuous 0 to 1406 243
Spatial patterns
Liquor licenses Continuous 0 to 2102* 169
Vacant buildings Continuous 8 to 4278* 1154
Sanitation Continuous 11 to 1094* 129
Streetlights Continuous 6 to 623* 67

*Distance in ft to the nearest points

4 Logistic Regression

4.1 Feature Engineering

Before building the logistic regression model, 3 feature engineering methods are used to optimize the features for modelling. The first two methods have already been applied during data wrangling and analysis.

1. Nearest neighbor: As mentioned, to relate the open data on Chicago’s liquor establishments, vacant buildings, 311 reports of sanitation code violations, and 311 reports of three or more street lights going out to the food inspections, the average distances of each inspection record to the nearest features are calculated.

2. Recategorization: The facility types and the inspection types have many categories, including certain categories with almost no categories. They have been recategorized into 7 categories to prevent class imbalance. For example, all children and elderly day cares are grouped together, while all shared kitchens and food distribution centers are grouped together.

3. Feature interaction: A new interaction feature is created by associating the 311 sanitation reports with street light reports, because these two features appear to be related and their collaborative effects could be considered.

# Interaction
inspection.clean <- inspection.clean %>%
  mutate(interaction = sanitation_nn30*streetLights_nn10)

4.2 Train/Test Split

The model is trained on 2022 data and tested on 2023 data.

# Data from 2022
train <- inspection.clean %>%
  filter(inspection_date >= "2022-01-01" & inspection_date < "2023-01-01") %>% st_drop_geometry()

# Data from 2023
test <- inspection.clean %>%
  filter(inspection_date >= "2023-01-01" & inspection_date < "2024-01-01") %>% st_drop_geometry()

4.3 Regression Models

Two regression models were built. The original model only contains data from the food inspection dataset (internal characteristics), namely the facility types (recategorized), inspection types (recategorized) and risk categories (Table 2).

# Original model
model <- glm(results_numeric ~ .,
                  data=train %>%
                    dplyr::select(facility_cleaned,risk,inspection_cleaned,
                                  -results,results_numeric),
                  family="binomial" (link="logit"))

kable(broom::tidy(model), digits = 2,
      caption = paste("<span style='color:#000000;'><b>Table 2. Regression summary for original model.</b><br>McFadden R-Squared value: 0.05"),
      col.names = c("Term", "Estimate", "Standard error", "Statistic", "p-value")) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "500px") # as output is quite long
Table 2. Regression summary for original model.
McFadden R-Squared value: 0.05
Term Estimate Standard error Statistic p-value
(Intercept) 10.87 196.97 0.06 0.96
facility_cleanedCater 0.14 0.26 0.52 0.60
facility_cleanedGrocery -0.43 0.16 -2.64 0.01
facility_cleanedMobile -0.04 0.36 -0.10 0.92
facility_cleanedOther -0.46 0.24 -1.93 0.05
facility_cleanedRestaurant -0.44 0.12 -3.55 0.00
facility_cleanedSchool -0.39 0.16 -2.35 0.02
riskRisk 1 (High) -10.41 196.97 -0.05 0.96
riskRisk 2 (Medium) -10.41 196.97 -0.05 0.96
riskRisk 3 (Low) -10.29 196.97 -0.05 0.96
inspection_cleanedCanvass Re-Inspection 1.31 0.10 12.89 0.00
inspection_cleanedComplaint -0.53 0.10 -5.45 0.00
inspection_cleanedComplaint Re-Inspection 0.94 0.14 6.60 0.00
inspection_cleanedLicense 0.24 0.10 2.44 0.01
inspection_cleanedLicense Re-Inspection 0.94 0.18 5.35 0.00
inspection_cleanedOther 0.56 0.19 2.88 0.00
#summary(model)
# pR2(model)

The new model includes open data from the Chicago City Portal (temporal features and spatial patterns), summarised in Table 3. After repeated modelling efforts, we ultimately chose to exclude the continuous features of two-day average high temperature and number of days since the last inspection. With more features included, the McFadden’s R-squared increases slightly by 0.01.

newModel <- glm(results_numeric ~ .,
                  data=train %>%
                    dplyr::select(facility_cleaned,risk,inspection_cleaned,
                                  vacantBuildings_nn2,
                                  liquor_nn2,interaction,fails,
                                  -results,results_numeric),
                  family="binomial" (link="logit"))

kable(broom::tidy(newModel), digits = 2,
      caption = paste("<span style='color:#000000;'><b>Table 3. Regression summary for new model.</b><br>McFadden R-Squared value: 0.06"),
      col.names = c("Term", "Estimate", "Standard error", "Statistic", "p-value")) %>%
  kable_styling() %>%
  scroll_box(width = "100%", height = "500px") # as output is quite long
Table 3. Regression summary for new model.
McFadden R-Squared value: 0.06
Term Estimate Standard error Statistic p-value
(Intercept) 10.78 196.97 0.05 0.96
facility_cleanedCater 0.14 0.26 0.53 0.60
facility_cleanedGrocery -0.43 0.16 -2.61 0.01
facility_cleanedMobile 0.11 0.37 0.31 0.76
facility_cleanedOther -0.50 0.24 -2.07 0.04
facility_cleanedRestaurant -0.48 0.13 -3.79 0.00
facility_cleanedSchool -0.33 0.17 -1.98 0.05
riskRisk 1 (High) -10.36 196.97 -0.05 0.96
riskRisk 2 (Medium) -10.36 196.97 -0.05 0.96
riskRisk 3 (Low) -10.27 196.97 -0.05 0.96
inspection_cleanedCanvass Re-Inspection 1.43 0.11 13.62 0.00
inspection_cleanedComplaint -0.47 0.10 -4.75 0.00
inspection_cleanedComplaint Re-Inspection 1.13 0.15 7.68 0.00
inspection_cleanedLicense 0.18 0.10 1.85 0.06
inspection_cleanedLicense Re-Inspection 0.99 0.18 5.64 0.00
inspection_cleanedOther 0.65 0.20 3.29 0.00
vacantBuildings_nn2 0.00 0.00 3.71 0.00
liquor_nn2 0.00 0.00 -0.44 0.66
interaction 0.00 0.00 -1.38 0.17
fails -0.22 0.05 -4.68 0.00
#summary(newModel)
#pR2(newModel)
testProbs <- data.frame(Outcome = as.factor(test$results_numeric),
                        Probs = predict(model, test, type= "response"))

newtestProbs <- data.frame(Outcome = as.factor(test$results_numeric),
                        Probs = predict(newModel, test, type= "response"))

A potentially more useful approach to goodness of fit than the McFadden’s R-squared is to visualize the predicted probabilities of a food establishment failing an inspection (Figure 5). The new model helps in shifting the distribution towards the two ends — towards 0 for inspection failures and 1 for passes.

lbl <- c(
  "1" = "1 (pass)",
  "0" = "0 (fail)"
)

h1 <- ggplot(testProbs, aes(x = Probs, fill = fct_relevel(Outcome, "1"))) +
  geom_density() +
  geom_vline(aes(xintercept=0.56),linetype="dashed")+
  facet_wrap(~factor(Outcome, c("1","0")),
             labeller = as_labeller(lbl),
             nrow=2,
             strip.position="right") +
  scale_fill_manual(values = paletteRG[c(2,1)]) +
  labs(x = "Probability", y = "Density of probabilities",
       title = "Original Model") +
  theme_minimal()+
  theme(plot.title = element_text(size=12,face="bold"),
        strip.text = element_blank(), #element_text(size = 12), #
        legend.position = "none",
        panel.grid.minor = element_blank(),
        axis.title.x = element_text(margin=margin(10,0,0,0)))

h2 <- ggplot(newtestProbs, aes(x = Probs, fill = fct_relevel(Outcome, "1"))) +
  geom_density() +
  geom_vline(aes(xintercept=0.56),linetype="dashed")+
  facet_wrap(~factor(Outcome, c("1","0")),
             labeller = as_labeller(lbl),
             nrow=2,
             strip.position="right") +
  scale_fill_manual(values = paletteRG[c(2,1)]) +
  labs(x = "Probability", y = NULL,
       title = "New Model") +
  theme_minimal()+
  theme(plot.title = element_text(size=12,face="bold"),
        strip.text = element_text(size = 12),
        legend.position = "none",
        panel.grid.minor = element_blank(),
        axis.title.x = element_text(margin=margin(10,0,0,0)))

h1+h2+
  plot_layout(axis_titles = "collect")+
  plot_annotation(title = "Distribution of predicted probabilities by observed outcome", 
                  subtitle = "Vertical dashed line denotes threshold of 0.56",
                  caption = "Figure 5",
                  theme = theme(plot.title = element_text(size=14,face="bold",hjust=0.17),
                                plot.caption = element_text(size=10),
                                plot.subtitle = element_text(size=12,hjust=0.08)))

4.4 Cross Validation

Performing a 100-fold cross validation on both models using 2022 data, their ROC (in this case area under the curve, AUC), sensitivity and specificity are compared using the default probability threshold of 0.5.

Visualizing the means and the distributions of the goodness of fit metrics across the folds, AUC and specificity appear similar for both models (Figure 6 and 7). The AUC is distributed around 0.64 or 0.66, which indicates a relatively generalizable model, as an AUC of 1 would reflect an overfit. The distribution of specificity is tighter for both models, which suggests that both models generalize better with respect to specificity (true negatives, in this case, predicting fails correctly).

The original model has a higher mean specificity of 0.9 but very low sensitivity of 0.19, while the new model has a lower specificty of 0.78 but relatively higher sensitivity of 0.39, suggesting that the new model strikes a better balance.

set.seed(123)

train$results_factor = factor(train$results)

ctrl <- trainControl(method = "cv", number = 100, classProbs=TRUE, summaryFunction=twoClassSummary)

cvFit <- train(results_factor ~ .,
                  data=train %>%
                 st_drop_geometry() %>%
                    dplyr::select(facility_cleaned,risk,zip,inspection_cleaned,
                                  -results,-results_numeric,results_factor),
                method="glm", family="binomial",
                metric="ROC", trControl = ctrl)
new_cvFit <- train(results_factor ~ .,
                  data=train  %>%
                 st_drop_geometry() %>%
                    dplyr::select(facility_cleaned,risk,inspection_cleaned,
                                  sanitation_nn30,vacantBuildings_nn2,
                                  liquor_nn2,streetLights_nn10,fails,
                                  -results,-results_numeric,results_factor),
                method="glm", family="binomial",
                metric="ROC", trControl = ctrl)
annotation <- data.frame(
  label = c(round(cvFit$results[["ROC"]],2),
            round(cvFit$results[["Sens"]],2),
            round(cvFit$results[["Spec"]],2)),
  metric = c("ROC","Sens","Spec"),
  x     = c(cvFit$results[["ROC"]]-0.1,
            cvFit$results[["Sens"]]-0.1,
            cvFit$results[["Spec"]]-0.1),
  y     = c(48,48,48)
)

dplyr::select(cvFit$resample, -Resample) %>%
  gather(metric, value) %>%
  left_join(gather(cvFit$results[2:4], metric, mean)) %>%
  ggplot(aes(value)) +
    geom_histogram(bins=35, fill = paletteRG[2]) +
    facet_wrap(~metric) +
    geom_vline(aes(xintercept = mean), colour = "grey30", linetype = 3, size = 1.5) +
    scale_x_continuous(limits = c(0, 1)) +
    labs(x="Goodness of fit", y="Count", title="CV goodness of fit metrics (original model with threshold of 0.5)",
         subtitle = "Across-fold mean labeled represented as dotted lines",
         caption = "Figure 6") +
  theme_minimal()+
  theme(plot.title = element_text(size=13, face="bold"),
        plot.caption = element_text(size=10),
        axis.title.x = element_text(margin=margin(10,0,0,0)),
        panel.grid.minor = element_blank(),
        strip.text = element_text(face="bold",hjust=0,size=12))+
  geom_text(data = annotation,
            mapping = aes(x = x, y = y, label = label))

new_annotation <- data.frame(
  label = c(round(new_cvFit$results[["ROC"]],2), 
            round(new_cvFit$results[["Sens"]],2),
            round(new_cvFit$results[["Spec"]],2)),
  metric = c("ROC","Sens","Spec"),
  x     = c(new_cvFit$results[["ROC"]]-0.12, 
            new_cvFit$results[["Sens"]]-0.12,
            new_cvFit$results[["Spec"]]-0.12),
  y     = c(45,45,45)
)

dplyr::select(new_cvFit$resample, -Resample) %>%
  gather(metric, value) %>%
  left_join(gather(new_cvFit$results[2:4], metric, mean)) %>%
  ggplot(aes(value)) + 
    geom_histogram(bins=35, fill = paletteRG[2]) +
    facet_wrap(~metric) +
    geom_vline(aes(xintercept = mean), colour = "grey30", linetype = 2, size=0.7) +
    scale_x_continuous(limits = c(0, 1)) +
    labs(x="Goodness of fit", y="Count", title="CV goodness of fit metrics (new model with threshold of 0.5)",
         subtitle = "Across-fold mean labeled and represented as dotted lines",
         caption = "Figure 7") + 
  theme_minimal()+
  theme(plot.title = element_text(size=13, face="bold"),
        plot.caption = element_text(size=10),
        axis.title.x = element_text(margin=margin(10,0,0,0)),
        panel.grid.minor = element_blank(),
        strip.text = element_text(face="bold",hjust=0,size=12))+ 
  geom_text(data = new_annotation,
            mapping = aes(x = x, y = y, label = label))

5 ROC Curve

Focusing on just the new model, its ROC curve is above the y=x line, suggesting that the model is better than a random guess (Figure 8). The curve is not too square, indicating that the model is not overfitted. The area under the curve (AUC) is 0.66, which is between 0.50 and 1, implying a fairly reasonable classification of inspection results.

The ROC curve highlights the trade-offs when it comes to deciding on a threshold. According to the ROC Curve, a threshold that predicts inspection passes correctly around 75% of the time, will predict inspection failures incorrectly around 50% of the time. The appropriateness of such trade-offs are evaluated using a cost-benefit analysis.

ggplot(newtestProbs, aes(d = as.numeric(Outcome), m = Probs)) +
  geom_roc(n.cuts = 50, labels = FALSE, colour = paletteRG[1]) +
  style_roc(theme = theme_minimal) +
  geom_abline(slope = 1, intercept = 0, size = 1.5, color = 'grey') +
  labs(title = "ROC curve of new model",
       subtitle = "Area under the curve: 0.66",
       caption = "Figure 8") + 
  coord_equal()+
  theme(plot.title = element_text(size=12,face = "bold"), #hjust=0.12
        plot.subtitle = element_text(size=11),
        plot.caption = element_text(size=10),
        axis.title.x = element_text(margin=margin(10,0,0,0)))

#auc(testProbs$Outcome, testProbs$Probs)
#auc(newtestProbs$Outcome, newtestProbs$Probs)

6 Cost-Benefit Analysis

The assumptions for the analysis are:

  1. Assigning an inspector costs $200.

  2. F&Bs that fail the inspection have a 5% chance of causing a health incident, which incurs a loss of $10,000 on average, in terms of business closures, loss of economic productivity, and welfare costs. We define preventing an incident as a gain.

  3. True positive: $0

  4. True negative: -$200 + $10,000 x 0.05 = $300

  5. False positive: $0

  6. False negative: -$200

To identify the optimal threshold that returns the greatest cost/ benefit, the confusion metric outcomes are plotted across thresholds (Figure 9).

  1. True positive: No change because cost/ benefit is defined as $0.
  2. True negative: Revenue gained increases as threshold increases.
  3. False positive: No change because cost/ benefit is defined as $0.
  4. False negative: Losses increase as threshold increases.
iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
  
  this_prediction <-
      newtestProbs %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1])) %>%
     gather(Variable, Count) %>%
     mutate(Revenue =
               case_when(Variable == "True_Positive"  ~ Count * 0,
                         Variable == "True_Negative"  ~ Count * (300),
                         Variable == "False_Positive" ~ Count * (0),
                         Variable == "False_Negative" ~ Count * (-200)),
            Threshold = x)
  
  all_prediction <- rbind(all_prediction, this_prediction)
  x <- x + .01
  }
return(all_prediction)
}
whichThreshold <- iterateThresholds(newtestProbs)

whichThreshold %>%
  ggplot(.,aes(Threshold, Revenue, colour = Variable)) +
  geom_point() +
  scale_colour_manual(values = palette4) +   
  scale_y_continuous(labels = scales::comma)+
  labs(title = "Revenue by confusion matrix type and threshold",
       y = "Revenue",
       caption = "Figure 9") +
  guides(colour=guide_legend(title = "Confusion matrix")) +
  theme_minimal()+
  theme(plot.title = element_text(size=14,face="bold"),
        plot.caption = element_text(size=10),
        axis.title.x = element_text(margin=margin(10,0,0,0)),
        panel.grid.minor = element_blank())

Calculating the total revenue across the confusion metrics for each threshold, the optimal threshold of 0.56 yields the highest total revenue (Figure 10). After 0.56, total revenue falls as the losses from wasted outreach efforts increase.

whichThreshold_revenue <- 
whichThreshold %>% 
    group_by(Threshold) %>% 
    summarize(Revenue = sum(Revenue)) 

optimal_threshold <- pull(arrange(whichThreshold_revenue, -Revenue)[1,1])

whichThreshold_revenue.long <- whichThreshold_revenue %>%
  gather(Variable,value,-Threshold)

lbl2 <- c(
  "Revenue" = "Total revenue"
)

ggplot(whichThreshold_revenue.long)+
  geom_line(aes(x = Threshold, y = value), color = paletteRG[2],lwd=1)+
  geom_vline(xintercept =  optimal_threshold, color = "grey50",lwd=1, linetype = 2, size=0.7)+
  scale_y_continuous(labels = scales::comma)+ #, limits = c(-1629950,1000000)
  labs(title = "Total revenue as a function of threshold",
       subtitle = "Vertical line denotes optimal threshold of 0.56",
       caption = "Figure 10", y = NULL) + 
  theme_minimal()+
  theme(legend.position = "none",
        plot.title = element_text(size = 14, face = "bold"),
        plot.caption = element_text(size=10),
        panel.grid.minor = element_blank(),
        strip.text = element_text(size=12,face="bold",hjust=0),
        axis.title.x = element_text(margin=margin(10,0,0,0)))

Based on the optimal threshold of 56%, we compare the confusion matrices of the original and the new model by testing them on 2023 data (Table 4 and 5). The accuracy of both models are around 0.6, but the new model improves specificty from 0.68 to 0.7 by increasing the number of true negatives (correct predictions of inspection failures) from 1380 to 1435.

testProbs <-
  testProbs %>%
  mutate(predOutcome  = as.factor(ifelse(testProbs$Probs > 0.56, 1, 0)))

newtestProbs <-
  newtestProbs %>%
  mutate(predOutcome  = as.factor(ifelse(newtestProbs$Probs > 0.56, 1, 0)))
results <- caret::confusionMatrix(testProbs$predOutcome, testProbs$Outcome,
                       positive = "1")

tbl1 <- data.frame(
  col1 = c("Predicted false",
           "Predicted true"),

  Observed_false = c(results$table[1],
                     results$table[2]),

  Observed_true = c(results$table[3],
                    results$table[4])
)

kable(tbl1,
      caption = paste("<span style='color:#000000;'><b>Table 4. Confusion matrix of original model with threshold of 0.56.</b><br>Accuracy: ",
                      round(results$overall[["Accuracy"]],2),
                      "Sensitivity:",
                      round(results$byClass[["Sensitivity"]],2),
                      "Specificity:",
                      round(results$byClass[["Specificity"]],2),
                      "</span>",
                      sep=" "),
      col.names = c("", "Observed false", "Observed true")) %>%
  kable_styling(full_width = F) %>%
  kableExtra::column_spec(1, bold = TRUE)
Table 4. Confusion matrix of original model with threshold of 0.56.
Accuracy: 0.6 Sensitivity: 0.56 Specificity: 0.68
Observed false Observed true
Predicted false 1380 1325
Predicted true 663 1664
new_results <- caret::confusionMatrix(newtestProbs$predOutcome, newtestProbs$Outcome, 
                       positive = "1")

tbl2 <- data.frame(
  col1 = c("Predicted false",
           "Predicted true"),
  
  Observed_false = c(new_results$table[1],
                     new_results$table[2]),
  
  Observed_true = c(new_results$table[3],
                    new_results$table[4])
)

kable(tbl2,
      caption = paste("<span style='color:#000000;'><b>Table 5. Confusion matrix of new model with threshold of 0.56.</b><br>Accuracy: ",
                      round(new_results$overall[["Accuracy"]],2),
                      "Sensitivity:",
                      round(new_results$byClass[["Sensitivity"]],2),
                      "Specificity:",
                      round(new_results$byClass[["Specificity"]],2),
                      "</span>",
                      sep=" "),
      col.names = c("", "Observed false", "Observed true")) %>%
  kable_styling(full_width = F) %>%
  kableExtra::column_spec(1, bold = TRUE)
Table 5. Confusion matrix of new model with threshold of 0.56.
Accuracy: 0.61 Sensitivity: 0.55 Specificity: 0.7
Observed false Observed true
Predicted false 1435 1341
Predicted true 608 1648

These equations are multiplied by the confusion matrix to calculate the total costs and benefits (Table 6).

cost_benefit_table <-
   newtestProbs %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1])) %>%
       gather(Variable, Count) %>%
       mutate(Revenue =
               case_when(Variable == "True_Positive"  ~ Count * 0,
                         Variable == "True_Negative"  ~ Count * (300),
                         Variable == "False_Positive" ~ Count * (0),
                         Variable == "False_Negative" ~ Count * (-200))) %>%
    bind_cols(data.frame(Description = c(
              "Correctly predicted F&B would pass",
              "Correctly predicted F&B would fail",
              "Predicted F&B would pass but it failed",
              "Predicted F&B would fail but it passsed")))

kable(cost_benefit_table,
       caption = "<span style='color:#000000;'><b>Table 6. Cost/benefit table of new model with threshold of 0.56.</b></span>") %>% 
  kable_styling()
Table 6. Cost/benefit table of new model with threshold of 0.56.
Variable Count Revenue Description
True_Positive 1648 0 Correctly predicted F&B would pass
True_Negative 1435 430500 Correctly predicted F&B would fail
False_Positive 608 0 Predicted F&B would pass but it failed
False_Negative 1341 -268200 Predicted F&B would fail but it passsed

Comparing the revenue from the two models, the new model has a higher total revenue at the optimal threshold of 56% (Table 7 and 8). The new model increases revenue at the optimal threshold by $12,000, an improvement of almost 10% relative to the old model.

# New table
tbl3 <- whichThreshold_revenue %>%
  filter(Threshold == optimal_threshold | Threshold > 0.5 & Threshold < 0.51) %>% # rounding issue
  mutate(Threshold = c("50% threshold", "Optimal Threshold"))
# Original table
iterateThresholds <- function(data) {
  x = .01
  all_prediction <- data.frame()
  while (x <= 1) {
  
  this_prediction <-
      testProbs %>%
      mutate(predOutcome = ifelse(Probs > x, 1, 0)) %>%
      count(predOutcome, Outcome) %>%
      summarize(True_Positive = sum(n[predOutcome==1 & Outcome==1]),
                True_Negative = sum(n[predOutcome==0 & Outcome==0]),
                False_Positive = sum(n[predOutcome==1 & Outcome==0]),
                False_Negative = sum(n[predOutcome==0 & Outcome==1])) %>%
     gather(Variable, Count) %>%
     mutate(Revenue =
               case_when(Variable == "True_Positive"  ~ Count * 0,
                         Variable == "True_Negative"  ~ Count * (300),
                         Variable == "False_Positive" ~ Count * (0),
                         Variable == "False_Negative" ~ Count * (-200)),
            Threshold = x)
  
  all_prediction <- rbind(all_prediction, this_prediction)
  x <- x + .01
  }
return(all_prediction)
}

whichThreshold <- iterateThresholds(testProbs)

whichThreshold_revenue <- 
whichThreshold %>% 
    group_by(Threshold) %>% 
    summarize(Revenue = sum(Revenue)) 

optimal_threshold <- pull(arrange(whichThreshold_revenue, -Revenue)[1,1])

tbl4 <- whichThreshold_revenue %>%
  filter(Threshold == optimal_threshold | Threshold > 0.5 & Threshold < 0.51) %>% # rounding issue
  mutate(Threshold = c("50% threshold", "Optimal Threshold"))
# Present original model first
kable(tbl4,
      format.args = list(big.mark = ","), # add commas
       caption = "<span style='color:#000000;'><b>Table 7. Total revenue by threshold of original model</b></span>",
      col.names = c("Threshold", "Total revenue")) %>% 
  kable_styling(full_width = F)
Table 7. Total revenue by threshold of original model
Threshold Total revenue
50% threshold 67,200
Optimal Threshold 150,300
kable(tbl3,
      format.args = list(big.mark = ","), # add commas
       caption = "<span style='color:#000000;'><b>Table 8. Total revenue by threshold of new model</b></span>",
      col.names = c("Threshold", "Total revenue")) %>% 
  kable_styling(full_width = F)
Table 8. Total revenue by threshold of new model
Threshold Total revenue
50% threshold 129,400
Optimal Threshold 162,300

7 Evaluation over space

In public-sector predictive modeling, generalizability is a significant consideration. Recall its two definitions: first, the ability to predict accurately on new data - which was the focus of cross-validation; and second, the ability to predict with comparable accuracy across different group contexts. Here, the focus will be on the latter. Specifically, we evaluate specificity across different types of food establishments and neighborhoods, which aligns with our use case of successfully predicting failures.

When examined across the recategorized facility types, specificity reveals uneven prediction performance (Table 9). The model performs well in predicting failures for school cafeterias, restaurants, grocery stores, and similar establishments. However, its performance is weaker for caterers, food facilities in healthcare buildings, and mobile food trucks. These disparities highlight significant differences between categories, indicating a need for further refinement of the categories and the model.

testmap <- inspection.clean %>%
  filter(inspection_date >= "2023-01-01" & inspection_date < "2024-01-01")
testwgeom <- data.frame(Y = as.numeric(testmap$latitude),
                        X = as.numeric(testmap$longitude),
                        Outcome = as.factor(testmap$results_numeric),
                        Probs = predict(newModel, testmap, type= "response"),
                        facility_type = as.factor(testmap$facility_cleaned))
testwgeom <- testwgeom %>%
  mutate(predOutcome  = as.factor(ifelse(testwgeom$Probs > 0.56, 1, 0))) %>%
  st_as_sf(coords = c("X", "Y"), crs = 4326, agr = "constant") %>%
  st_transform(CRS_SYSTEM)
# testwgeom_matrix <- testwgeom %>%
#   mutate(matrix = case_when(
#     predOutcome==1 & Outcome==1 ~ "True_Positive",
#     predOutcome==0 & Outcome==0 ~ "True_Negative",
#     predOutcome==1 & Outcome==0 ~ "False_Positive",
#     predOutcome==0 & Outcome==1 ~ "False_Negative")
#   )
results_type <- testwgeom %>%
  group_by(facility_type) %>%
  summarise(
    conf_matrix = list(caret::confusionMatrix(
      factor(predOutcome), 
      factor(Outcome),
      positive = "1"
    )),
    .groups = "drop"
  ) %>%
  mutate(
    spec = map_dbl(conf_matrix, ~ round(.x$byClass[["Specificity"]], 2))
  )
tbl5 <- results_type %>%
  st_drop_geometry() %>%
  dplyr::select(facility_type, spec)
kable(tbl5,
       caption = "<span style='color:#000000;'><b>Table 9. Specificity by facility type</b></span>",
      col.names = c("Facility Type", "Specificity")) %>% 
  kable_styling(full_width = F)
Table 9. Specificity by facility type
Facility Type Specificity
Care 0.13
Cater 0.08
Grocery 0.66
Mobile 0.00
Other 0.74
Restaurant 0.79
School 0.66

Across neighborhoods, the specificity of failure predictions is fairly consistent, suggesting that our model is generalizable (Figure 11). Notably, neighborhoods in the middle southern part of Chicago tend to exhibit higher specificity, while the northern areas show relatively lower values. Overall, the model demonstrates a goodness of fit across spatial contexts.

# Clean geometries of both datasets
neighborhoods_clean <- st_make_valid(neighborhoods)
neigh_cleaned <- st_intersection(neighborhoods_clean, chicagoBoundary)
testwgeom_neigh <- testwgeom %>%
  st_join(.,neigh_cleaned)
results_neighborhoods <- testwgeom_neigh %>%
  group_by(name) %>%
  summarise(
    TN = sum(predOutcome == 0 & Outcome == 0),  # True Negatives
    FP = sum(predOutcome == 1 & Outcome == 0),  # False Positives
    specificity = round(TN / (TN + FP), 2),     # Specificity formula
    .groups = "drop"
  )
results_neighmap <- results_neighborhoods %>%
  st_drop_geometry() %>%
  dplyr::select(name, specificity) %>%
  merge(.,neighborhoods, by = "name") %>%
  st_as_sf()
ggplot() + 
  geom_sf(data = neigh_cleaned, fill = "transparent", color = "grey") +
  geom_sf(data = results_neighmap, aes(fill = specificity), color = "grey") +
  scale_fill_gradient(low = "#e07a7a", high = "#8191a3", name = "Specificity") +
  labs(title= "Specificity by neighborhood, 2023",
       caption = "Figure 11. Source: City of Chicago.") +
  theme_void()+
  theme(plot.title = element_text(size = 12, face = "bold"),
        # legend.position = "none",
        strip.text = element_text(hjust=0, size = 10))

8 Conclusion

In conclusion, the new model presented in this study is largely accurate (61% accuracy) and classifies inspection results uptakes reasonably well (0.66 AUC). Compared to the original model that has insufficient data, the new model has higher specificity (70%), suggesting that it is better tailored towards identifying inspection failures.

The model could direct health inspectors towards food establishments with the highest likelihood of failure, to optimize allocation of manpower and resources. Food establishments with the highest likelihood of violating health standards will be prioritized for inspection, enabling earlier detection of critical violations and reducing potential public health risks.

Our model demonstrates a goodness of fit across neighborhoods. Nevertheless, it reveals uneven prediction performance in different types of food establishments. New features, more advanced feature engineering (including a better recategorisation of facility types), and a more powerful predictive algorithm could improve the model and further tailor it towards identifying food inspection failures.