Working in a hospital has given me an appreciation of the complexity of healthcare outcome measures and the diversity of interested parties. For instance, unplanned readmission rate is one important outcome measure because they are very constly and often preventable. For this reason, this metric is often used by entities such as Center for Medicaid Services (CMS), to evaluate a facility.
On the other hand, healthcare is a service that can also be evaluated by patient reported outcome measures (PRAMs). Different parties prioritize some outcome measures over others, and in some cases, an improvement in one measure may be associated with a decline in another.
Dataset¶
CMS reports extensively on hospital-level readmission rates and patient reported outcome measures (PRAMs). One PRAM is the Consumer Assessment of Healthcare Providers and Systems (CAHPS).
A number of contextual factors also play a role in hospital outcomes. For example, tasks that decrease readmission rate, such as follow-up calls and medication reconciliation, demand resources. Therefore, facilities in medically underserved areas (MUAs) may struggle to achieve high levels of patient satisfaction and decrease readmission rates.
CMS reports on a number of measures, such as physicians per 1000 population, and infant mortality rates, which are used to identify MUAs.
Project goals¶
In this project, I aimed to explore the relationships between readmission rates and CAHPS outcomes in MUAs and non-MUAs. I hoped to answer these questions:
- Does readmission rate and patient satisfaction have a positive correlation, or must we make tradeoffs in improvements?
- Are measures used to determine MUA status related to a facility's outcome measures?
- Should investments aimed at improving outcomes target MUA measures?
Summary of findings¶
- My exploration showed that while minor, there is a negative association between measures of patient satisfaction and readmission rates (as I expected). This suggests that hospital administrators shouldn't necessarily be concerned that measures to decrease readmission rates would decrease patient satisfaction.
- Additionally, the relationship between MUA status and measures is minimal. It appears that the underlying causes of these factors may be different. An important note is that these measures only cover a subset of the ways in which patients are medically underserved. Some areas with higher Index of Medical Underservice (IMU) scores may be underserving certain specific populations (such as those with low-income, or non-English speakers).
Full project¶
import pandas as pd
from pprint import pprint
import requests
# from private.py
API_TOKEN=''
API_SECRET=''
auth = (API_TOKEN, API_SECRET)
# the two API queries with an initialized empty dataframe for each
queries = [
["https://data.medicare.gov/resource/qk5r-kh7u.json?$limit=5000", pd.DataFrame()],
["https://data.medicare.gov/resource/r32h-32z5.json?$limit=5000&$where=measure_name= 'Rate of readmission after discharge from hospital (hospital-wide)' AND score != 'Not Available'", pd.DataFrame()]]
# use each query and save results in the dataframe
for query in queries:
if len(API_TOKEN)>0:
r=requests.get(query[0], auth=auth).json()
query[1] = pd.DataFrame.from_records(r)
else:
r=requests.get(query[0]).json()
query[1] = pd.DataFrame.from_records(r)
# separate CAHPS and readmission datasets
cahps_df = queries[0][1]
readm_df = queries[1][1]
# import MUA dataset
mua_df= pd.read_csv("https://data.hrsa.gov//DataDownload/DD_Files/MUA_DET.csv")
print("The CAHPS DataFrame has " + str(cahps_df.shape[0]) + " rows and " + str(cahps_df.shape[1]) + " columns. \n\nThe Readmission DataFrame has " + str(readm_df.shape[0]) + " rows and " + str(readm_df.shape[1]) + " columns. \n\nThe Medically Underserved Area DataFrame has " + str(mua_df.shape[0]) + " rows and " + str(mua_df.shape[1]) + " columns.\n\nTotal number of columns:", cahps_df.shape[0] + readm_df.shape[0] + mua_df.shape[0])
Exploring the CAHPS and readmission datasets¶
I wrote some functions to compile column names and counts of variables in each column from a dataframe into callable lists, for example:
import cleaning.explore as explore
print("Facility names (first 15):\n")
pprint(explore.get_counts(cahps_df)[1][0:15])
print("\nColumns in the readmissions data frame:\n")
for col in explore.get_cols(readm_df):
print(col)
Notes from exploring datasets:
- CAHPS
- state is abbreviation
- city, county, address, and facility name are in all caps
- there are several hospital names that appear more than once
- Readmission
- there's a column called location_state and one called state.
- Zip is called zip_code
- state is abbreviation
- city, county, address, and facility name are in all caps
- there are several hospital names that appear more than once
Cleaning up these datasets¶
Starting with the CAHPS and Readmission datasets¶
First, I removed and renamed uninteresting columns from these two DataFrames
cahps_df_clean = cahps_df.drop(
columns = [
'cms_certification_number',
'measure_start_date',
'measure_end_date',
'footnote', 'telephone'
]).rename(
columns = {
"county":"county_name"})
readm_df_clean = readm_df.drop(
columns = [
'footnote',
'measure_id',
'measure_name',
'phone_number',
'location_address',
'location_city',
'location_state',
'location_zip'
]).rename(
columns = {
'hospital_name':'facility_name',
'compared_to_national':'rate_of_readmission_national_comparison',
'score':'rate_of_readmission',
'denominator':'rate_denominator',
'higher_estimate':'rate_of_readmission_higher_estimate',
'lower_estimate':'rate_of_readmission_lower_estimate',
'zip_code':'zip'})
Now for the Medically Underserved Area (MUA) dataset¶
This is a more involved process...
- MUA status is determined by an IMU (Index of Medical Underservice) score of 62 or less. Accoring to HRSA, the IMU involves four variables
- ratio of primary medical care physicians per 1,000 population,
- infant mortality rate,
- percentage of the population with incomes below the poverty level, and
- percentage of the population age 65 or over
There are a LOT of variables in this dataset:
for col in explore.get_cols(mua_df):
print(col)
1. I started by removing the majority of variables not related to these measure or MUA status because I was interested in simply getting a record of which areas are considered underserved.
mua_df_clean = mua_df.drop(
columns = ['MUA/P ID', 'Common County Name', 'Complete County Name', 'MUA/P Service Area Name', 'County Description','State Name', 'U.S. - Mexico Border 100 Kilometer Indicator', 'U.S. - Mexico Border County Indicator', 'Medically Underserved Area/Population (MUA/P) Metropolitan Indicator.1', 'Medically Underserved Area/Population (MUA/P) Metropolitan Description', 'Medically Underserved Area/Population (MUA/P) Metropolitan Indicator', 'Rural Status Code', 'Designation Type Code',
'Minor Civil Division FIPS Code', 'Minor Civil Division Census Code', 'Minor Civil Division Name', 'MUA/P Status Code', 'Census Tract', 'Designation Date', 'Designation Date.1', 'State and County Federal Information Processing Standard Code', 'County or County Equivalent Federal Information Processing Standard Code', 'HHS Region Code','HHS Region Name', 'State FIPS Code', 'Data Warehouse Record Create Date', 'Data Warehouse Record Create Date Text',
'Primary State FIPS Code','HPSA Primary State Abbreviation', 'Primary State Name', 'Primary HHS Region Code', 'Primary HHS Region Name','Common Region Code','Common Region Name', 'Common State Name','Common State Abbreviation','Common State FIPS Code','Common State County FIPS Code', 'Break in Designation', 'Medically Underserved Area/Population (MUA/P) Component Designation Date','Ratio of Providers per 1000 Population','Medically Underserved Area/Population (MUA/P) Component Designation Date.1',
'Medically Underserved Area/Population (MUA/P) Component Geographic Name','Medically Underserved Area/Population (MUA/P) Component Geographic Type Code','Medically Underserved Area/Population (MUA/P) Component Geographic Type Description', 'Medically Underserved Area Geography Type Surrogate Key', 'Medically Underserved Area/Population (MUA/P) Component Last Update Date','Medically Underserved Area/Population (MUA/P) Component Status Code','Medically Underserved Area/Population (MUA/P) Component Status Description',
'Medically Underserved Area/Population (MUA/P) Component Update Date','Designation Population in a Medically Underserved Area/Population (MUA/P)', 'Medically Underserved Area/Population (MUA/P) Population Type ID','Medically Underserved Area/Population (MUA/P) Population Type Code','Medically Underserved Area/Population (MUA/P) Total Resident Civilian Population','Medically Underserved Area/Population (MUA/P) Withdrawal Date','Medically Underserved Area/Population (MUA/P) Withdrawal Date in Text Format', 'Unnamed: 74'
]).rename(columns={ 'Designation Type':'is-MUA?', 'MUA/P Status Description':'MUA-status-desc', 'County Equivalent Name' : 'county_name', 'State Abbreviation' : 'state','Medically Underserved Area/Population (MUA/P) Population Type': 'MUA-population-type'})
2. Next, I made the "is-MUA?" column a yes/no category
def binarize_categorical_variable(df, column, yescat):
""" Turns a categorical variable in a DataFrame into a "yes" or "no" response
arguments:
df = pandas DataFrame,
column = quoted column name
yes = quoted *and bracketed* list of category names that you wish to turn to "yes"
returns: the original pandas df with new binarized variable"""
df[column]=df[column].astype('category')
cat_list=[]
for cat in df[column].cat.categories:
cat_list.append(cat)
repl_dict = {}
for cat in cat_list:
for yes in yescat:
if cat == yes:
repl_dict[cat]= "Yes"
if repl_dict.get(cat) == None:
repl_dict[cat]="No"
df[column] = df[column].replace(repl_dict)
return df
mua_df_clean=binarize_categorical_variable(mua_df_clean, 'is-MUA?', ['Medically Underserved Area'])
# Change the county name to be all uppercase to match the other DataFrames
mua_df_clean['county_name'] = mua_df_clean.county_name.str.upper()
mua_df_clean.head()
Merging the datasets¶
At this point, I combined the CAHPS, readmission, and MUA datasets
def double_merge(df1, df2, df3, merge1, merge2):
""" A function to merge 3 datasets using a left merge
Arguments:
df1 = pandas dataframe of the limiting dataset / want to keep all the columns
df2, df3 = the other two pandas dataframes
merge1 = a list of columns to merge on for the first merge
merge2 = a list of columns to merge on for the 2nd merge
Returns: a pandas dataframe of mergerd data
"""
first_merge = df1.merge(df2, how = 'left', on = merge1)
second_merge = first_merge.merge(df3, how = 'left', on = merge2)
return second_merge
merge1 = ['facility_name','address','city','county_name','state','zip'] #the columns on which to merge cahps_df_clean and readm_df_clean
merge2 = ['state', 'county_name'] #the column on which to merge mua_df_clean with the other two
merged_dff = double_merge(cahps_df_clean, readm_df_clean, mua_df_clean, merge1, merge2).drop_duplicates(subset="address")
Completing the cleaning process¶
There are a couple more things to do to this merged dataset:
- filter out NAs in the variables of interest (readmission rate and patient ratings)
- fill the rest of the NAs or missing values with something meaningful
- Some NAs mean the hospital is not listed in the MUA dataset (is not in an MUA)
- Some mean that 0 people provided a given response for a patient satisfaction question
- make sure each variable is the correct datatype
def cleanup_merged_df(df):
""" Function to clean up the merged data
Argument: the merged data as pandas DataFrame
Output: Cleaned up merged data as pandas DataFrame"""
## This filters out rows where the mean facility rating and rate of readmission are null
df = df[df.patients_rating_of_the_facility_linear_mean_score.notnull()]
df = df[df.rate_of_readmission.notnull()]
## Mark rows that existed in the outcome measure (OC) DataFrame but not in the MUA DataFrame as not being MUAs
df['is-MUA?'] = df['is-MUA?'].fillna(value="No")
df['MUA-status-desc'] = df['MUA-status-desc'].fillna(value="Not Designated")
df['MUA-population-type'] = df['MUA-population-type'].fillna(value="None")
df['Rural Status Description'] = df['Rural Status Description'].fillna(value="No Info")
# Fill missing count values with 0 (0 respondants)
fill_list=[
'number_of_completed_surveys',
'number_of_sampled_patients',
'patients_rating_of_the_facility_linear_mean_score',
'patients_who_gave_the_facility_a_rating_of_0_to_6_on_a_scale_from_0_lowest_to_10_highest',
'patients_who_gave_the_facility_a_rating_of_7_or_8_on_a_scale_from_0_lowest_to_10_highest',
'patients_who_gave_the_facility_a_rating_of_9_or_10_on_a_scale_from_0_lowest_to_10_highest',
'patients_who_reported_no_they_would_not_recommend_the_facility_to_family_or_friends',
'patients_who_reported_probably_yes_they_would_recommend_the_facility_to_family_or_friends',
'patients_who_reported_that_staff_definitely_communicated_about_what_to_expect_during_and_after_the_procedure',
'patients_who_reported_that_staff_definitely_gave_care_in_a_professional_way_and_the_facility_was_clean',
'patients_who_reported_that_staff_did_not_communicate_about_what_to_expect_during_and_after_the_procedure',
'patients_who_reported_that_staff_did_not_give_care_in_a_professional_way_or_the_facility_was_not_clean',
'patients_who_reported_that_staff_somewhat_communicated_about_what_to_expect_during_and_after_the_procedure',
'patients_who_reported_that_staff_somewhat_gave_care_in_a_professional_way_or_the_facility_was_somewhat_clean',
'patients_who_reported_yes_they_would_definitely_recommend_the_facility_to_family_or_friends']
for col in fill_list:
df.loc[:,col] = df.loc[:,col].fillna(value=0)
# ## List columns to be integers, floats, and categories
# int_list = ['patients_rating_of_the_facility_linear_mean_score', 'communication_about_your_procedure_linear_mean_score','facilities_and_staff_linear_mean_score', 'number_of_completed_surveys','number_of_sampled_patients', 'patients_recommending_the_facility_linear_mean_score','patients_who_gave_the_facility_a_rating_of_0_to_6_on_a_scale_from_0_lowest_to_10_highest', 'patients_who_gave_the_facility_a_rating_of_7_or_8_on_a_scale_from_0_lowest_to_10_highest', 'patients_who_gave_the_facility_a_rating_of_9_or_10_on_a_scale_from_0_lowest_to_10_highest','patients_who_reported_no_they_would_not_recommend_the_facility_to_family_or_friends','patients_who_reported_probably_yes_they_would_recommend_the_facility_to_family_or_friends','patients_who_reported_that_staff_definitely_communicated_about_what_to_expect_during_and_after_the_procedure', 'patients_who_reported_that_staff_definitely_gave_care_in_a_professional_way_and_the_facility_was_clean', 'patients_who_reported_that_staff_did_not_communicate_about_what_to_expect_during_and_after_the_procedure', 'patients_who_reported_that_staff_did_not_give_care_in_a_professional_way_or_the_facility_was_not_clean','patients_who_reported_that_staff_somewhat_communicated_about_what_to_expect_during_and_after_the_procedure','patients_who_reported_that_staff_somewhat_gave_care_in_a_professional_way_or_the_facility_was_somewhat_clean','patients_who_reported_yes_they_would_definitely_recommend_the_facility_to_family_or_friends', 'survey_response_rate_percent', 'rate_denominator', 'provider_id']
# float_list = ['rate_of_readmission_higher_estimate', 'rate_of_readmission_lower_estimate', 'rate_of_readmission']
# category_list = ['county_name','state', 'is-MUA?', 'MUA-status-desc', 'MUA-population-type', 'Rural Status Description']
# ## Change data types as above
# for col in int_list:
# df[col] = df[col].astype('int')
# for col in float_list:
# df[col] = df[col].astype('float')
# for col in category_list:
# df[col] = df[col].astype('category')
return df
merged_df=cleanup_merged_df(merged_dff).reset_index().drop(columns='index')
Variables in the complete dataset¶
print("The merged data set has", merged_df.shape[0], "rows and", merged_df.shape[1], "columns.\n\nColumn names: \n")
for col in explore.get_cols(merged_df):
print(col)
Finally time to plot!¶
1. Plotting the distribution of the responses to the CAHPS survey questions¶
I was interested in looking at the reponses to 4 questions:
- did the staff communicate about what to expect during and after the procedure?
- would you recommend the facility to family or friends?
- did staff give care in a professional way and was the facility clean?
- what was your overall rating of the favility from 0 being the lowest to 10 being the highest?
The survey respondants provided Likert-type responses to each question. This dataset gives counts of how many respondant provided each type of response.
I turned those counts into proportions of total respondants in order to look at the distribution of responses for the four questions above:
import plotly.express as px
import plotly.offline as pyo
pyo.init_notebook_mode()
def plotly_resp_by_state(df, columns, names, graphTitle):
""" A function to plot the proportion of patients who gave each response to a survey question
Arguments:
columns = the name of the variable columns as a list of strings
names = the to-be names of the variable columns as a list of strings
graphTitle = title of the graph as a string
Ouput: plotly plot"""
#group by state
grp_df=df.groupby(by='state').sum().reset_index()
# create dict of current and to-be column names
name_dict={}
for i, col in enumerate(columns):
name_dict[col]=names[i]
# add the following two columns to column list
col_list=['state','number_of_completed_surveys']
col_list.extend(columns)
# create new df from grp_df and rename columns
df=grp_df[col_list].rename(columns=name_dict)
# change measure from count to proportion of total surveyed patients
l=len(col_list)
df.iloc[:,2:l]=df.values[:,2:l]/df.values[:,1,None]
# restructure DataFrame
df=df.melt(id_vars='state', value_vars=names, var_name='rating', value_name='proportion of patients')
# create plot
fig=px.strip(df,
x='rating',
y='proportion of patients',
orientation = "v",
hover_data= ['state'])
fig.update_layout(title_text=graphTitle,
title_font_size=12,
template="plotly_white",
autosize=False,
width=400,
height=400)
fig.update_yaxes(title_text="proportion of patients", title_font_size=12)
fig.show()
# create a list of column lists, a list of name lists, and a list of graph ttiles for 4 measures
columns=[
['patients_who_reported_that_staff_did_not_communicate_about_what_to_expect_during_and_after_the_procedure','patients_who_reported_that_staff_somewhat_communicated_about_what_to_expect_during_and_after_the_procedure', 'patients_who_reported_that_staff_definitely_communicated_about_what_to_expect_during_and_after_the_procedure'],
['patients_who_reported_no_they_would_not_recommend_the_facility_to_family_or_friends','patients_who_reported_probably_yes_they_would_recommend_the_facility_to_family_or_friends','patients_who_reported_yes_they_would_definitely_recommend_the_facility_to_family_or_friends'],
['patients_who_reported_that_staff_did_not_give_care_in_a_professional_way_or_the_facility_was_not_clean','patients_who_reported_that_staff_somewhat_gave_care_in_a_professional_way_or_the_facility_was_somewhat_clean','patients_who_reported_that_staff_definitely_gave_care_in_a_professional_way_and_the_facility_was_clean'],
['patients_who_gave_the_facility_a_rating_of_0_to_6_on_a_scale_from_0_lowest_to_10_highest','patients_who_gave_the_facility_a_rating_of_7_or_8_on_a_scale_from_0_lowest_to_10_highest','patients_who_gave_the_facility_a_rating_of_9_or_10_on_a_scale_from_0_lowest_to_10_highest']]
names_list=[
['not at all', 'somewhat', 'definitely'],
['no','probably', 'yes definitely'],
['not at all', 'somewhat', 'definitely'],
['0-6','7-8','9-10']]
graphTitle_list=["Patient Ratings of Staff Communication about the Procedure", "Patient Ratings of If They Would Recommend the Facility", "Patient Ratings of Professional Care and Facility Cleanliness","Overall Rating of the Facility (scale from 0 lowest to 10 highest)"]
# plot each of 4 measures
for i, o in enumerate(columns):
columns=o
names=names_list[i]
graphTitle=graphTitle_list[i]
plotly_resp_by_state(merged_df, columns, names, graphTitle)
Hover over each point to see the state.
I was surprised to see that overall, the distribution of patient ratings is skewed towards positive reponses when asked about specific aspects of their care. This surprised me because a lot of the narrative around healthcare in the U.S. is about the inconvenience and cost. However, it appears that consistently, less than 10% of respondant gave negative ratings.
However, healthcare professionals and administrators are always trying to improve patient experience. What are some potential strategies for doing this?
Possible candidates include the practices that have been shown to decrease readmission rates, such as:
- medication reconciliation
- follow-up phone calls
- handoff to PCPs promptly after surgery
However, sometimes, practices that have a positive effect on one outcome measure may not improve another. For instance, swift handoff can make patients feel like they aren't being given attention and thereby decrease patient satisfaction. For this reason, I wanted to explore the question of whether increases in patient satisfaction are associated with decreased readmission rates.
2. Plotting the relationship between readmission rate and patient satisfaction¶
The CAHPS dataset provides linear means for patient ratings of communication about the procedures, the overall rating of the facility, and liklihood to recommend the facility.
I started by checking out the distribution of these variables using a seaborn pairplot:
import matplotlib.pyplot as plt
import seaborn as sns
#plot
d = sns.pairplot(merged_df, height = 4, vars=[
'patients_rating_of_the_facility_linear_mean_score',
'patients_recommending_the_facility_linear_mean_score',
'facilities_and_staff_linear_mean_score',
'communication_about_your_procedure_linear_mean_score'], )
#change axis labels
replacements = {
'communication_about_your_procedure_linear_mean_score': 'communication',
'patients_recommending_the_facility_linear_mean_score': 'recommendation',
'facilities_and_staff_linear_mean_score' : 'facilities+staff',
'patients_rating_of_the_facility_linear_mean_score': 'overall rating'}
for i in range(3):
for j in range(3):
xlabel = d.axes[i][j].get_xlabel()
ylabel = d.axes[i][j].get_ylabel()
if xlabel in replacements.keys():
d.axes[i][j].set_xlabel(replacements[xlabel])
if ylabel in replacements.keys():
d.axes[i][j].set_ylabel(replacements[ylabel])
All of the linear means of the responses are mostly normally distributed and positively correlated with each other
Next, I checked the correlations between these four patient responses and readmission rate.
As the code below shows, all of the four patient reported measures above had weak negative correlations with readmission rates:
corr = merged_df.corr()['rate_of_readmission']
print("Correlation with readmission rate:")
corr.loc[(
'patients_recommending_the_facility_linear_mean_score',
'patients_rating_of_the_facility_linear_mean_score',
'facilities_and_staff_linear_mean_score',
'communication_about_your_procedure_linear_mean_score'),]
I anticipated a higher association between readmission rates and patient reported outcome measures. Perhaps, the two types of measures are not affected my common underlying conditions, which may indicate that the strategies to improve patient satisfaction are distinct from those to decrease readmission rates.
Next, I used plotly to make scatterplots of readmission by each of the three patient reported outcome measures, colored by MUA status.
import plotly.graph_objects as go
from plotly.subplots import make_subplots
fig=make_subplots(rows=4,cols=1)
datayes=merged_df[merged_df['is-MUA?']=='Yes']
datano=merged_df[merged_df['is-MUA?']=='No']
dim_list=[['patients_rating_of_the_facility_linear_mean_score',1,1, True],
['patients_recommending_the_facility_linear_mean_score',2,1, False],
['communication_about_your_procedure_linear_mean_score', 3,1, False],
['facilities_and_staff_linear_mean_score', 4,1, False]]
for dim in dim_list:
fig.append_trace(go.Scatter(
x=datayes['rate_of_readmission'],
y = datayes[dim[0]],
mode='markers',
name='MUA',
marker_color='#cea2fd',
hoverinfo="text",
hovertext=datayes.facility_name + ' <br> ' + datayes.city + ' ,' + datayes.state,
showlegend=dim[3]),
row=dim[1], col=dim[2])
fig.append_trace(go.Scatter(
x=datano['rate_of_readmission'],
y = datano[dim[0]],
mode='markers',
name='not MUA',
marker_color='#137e6d',
hoverinfo="text",
hovertext=datano.facility_name + ' <br> ' + datano.city + ' , ' + datano.state,
showlegend=dim[3]),
row=dim[1], col=dim[2])
fig.update_layout(template="plotly_white", autosize=False,width=700,height=1000)
fig.update_xaxes(title_text="Readmission Rate", row=4, col=1)
fig.update_yaxes(title_text="Overall Rating", row=1, col=1)
fig.update_yaxes(title_text="Recommendation", row=2, col=1)
fig.update_yaxes(title_text="Communication", row=3, col=1)
fig.update_yaxes(title_text="Facilities and Staff", row=4, col=1)
fig.show()
3. Visualizing the difference between MUA and non-MUA hospitals¶
Above, I colored the datapoints by MUA status, hoping to get some clues about conditions that could be contributing to outcomes.
The measures that contribute to MUA status are candidates. However, there doesn't seem to be a difference between MUA hospitals and non-MUA hospitals in the association between readmission rate and any of the 4 patient reported outcome measures.
To take a closer look at this, I used boxplots to visualize the difference between MUA and non-MUA hospitals in these four outcome measures.
fig=make_subplots(rows=2,cols=2)
dim_list=[['patients_rating_of_the_facility_linear_mean_score','#a2bffe',1,1],
['patients_recommending_the_facility_linear_mean_score','#0d75f8',1,2],
['communication_about_your_procedure_linear_mean_score', '#0e87cc', 2,1],
['rate_of_readmission','#6a79f7', 2,2]]
for dim in dim_list:
fig.append_trace(go.Box(
x = merged_df['is-MUA?'],
y = merged_df[dim[0]],
boxpoints='all',
jitter=0.3,
width=.2,
marker=dict(color = dim[1], line=dict(width=.5,color='DarkSlateGrey')),
hoverinfo="text",
hovertext = merged_df.facility_name + ' <br> ' + merged_df.city + ' , ' + merged_df.state,
showlegend=False),
row=dim[2], col=dim[3])
fig.update_layout(title_text="Outcomes Measures in MUA and non-MUA Facilities",template="plotly_white", autosize=False,width=700,height=700)
fig.update_xaxes(title_text="is MUA?")
fig.update_yaxes(title_text="Overall Rating", row=1, col=1)
fig.update_yaxes(title_text="Recommendation", row=1, col=2)
fig.update_yaxes(title_text="Communication", row=2, col=1)
fig.update_yaxes(title_text="Readmission Rate", row=2, col=2)
fig.show()
These boxplots confirm that the distributions of each of these four outcome measures don't differ greatly between MUA and non-MUA hospitals.
Except perhaps that in MUA hospitals, patients rated that they would be less likely to recommend the facility to family and friends.
This visualization doesn't indicate that the measures that contribute to MUA status would be good targets for investment for those trying to improve outcome measures.
However, MUA status is a category that is composed of multiple measures, as mentioned previously. I was curious about looking at each of these measures separately.
4. Disentangling MUA status¶
I made a scatterplot for each of the measures that contribute to MUA status and readmission rate, responses on recommendation, and responses on overall rating.
fig=make_subplots(rows=4,cols=3)
dim_list=[['Ratio of Providers per 1000 Population IMU Score',1],
['Infant Mortality Rate',2],
['Percent of Population with Incomes at or Below 100 Percent of the U.S. Federal Poverty Level', 3],
['Percentage of Population Age 65 and Over', 4]]
for dim in dim_list:
fig.append_trace(go.Scatter(
x =merged_df['rate_of_readmission'],
y = merged_df[dim[0]],
mode='markers',
marker_color='#40a368',
marker=dict(color = dim[1], line=dict(width=.5,color='DarkSlateGrey')),
hoverinfo="text",
hovertext = merged_df.facility_name + ' <br> ' + merged_df.city + ' , ' + merged_df.state,
showlegend=False),
row=dim[1], col=1)
fig.append_trace(go.Scatter(
x =merged_df['patients_recommending_the_facility_linear_mean_score'],
y = merged_df[dim[0]],
mode='markers',
marker_color='#40a368',
marker=dict(color = dim[1], line=dict(width=.5,color='DarkSlateGrey')),
hoverinfo="text",
hovertext = merged_df.facility_name + ' <br> ' + merged_df.city + ' , ' + merged_df.state,
showlegend=False),
row=dim[1], col=2)
fig.append_trace(go.Scatter(
x =merged_df['patients_rating_of_the_facility_linear_mean_score'],
y = merged_df[dim[0]],
mode='markers',
marker_color='#40a368',
marker=dict(color = dim[1], line=dict(width=.5,color='DarkSlateGrey')),
hoverinfo="text",
hovertext = merged_df.facility_name + ' <br> ' + merged_df.city + ' , ' + merged_df.state,
showlegend=False),
row=dim[1], col=3)
fig.update_layout(template="plotly_white", autosize=False,width=700,height=900)
fig.update_xaxes(title_text="Readmission Rate", row=4, col=1)
fig.update_xaxes(title_text="Recommendation", row=4, col=2)
fig.update_xaxes(title_text="Overall Rating", row=4, col=3)
fig.update_yaxes(title_text="Ratio of Providers per 1000 Pop", row=1, col=1, title_font = dict(size=10))
fig.update_yaxes(title_text='Infant Mortality Rate', row=2, col=1, title_font = dict(size=10))
fig.update_yaxes(title_text="Pop Below US Fed Poverty Level (%)", row=3, col=1, title_font = dict(size=10))
fig.update_yaxes(title_text='Pop => 65 years (%)', row=4, col=1, title_font = dict(size=10))
fig.show()
There don't seem to be any strong correlations here. The correlation coefficients back this up:
corr = {
"readmission rate" : merged_df.corr()['rate_of_readmission'],
"recommendation" : merged_df.corr()['patients_recommending_the_facility_linear_mean_score'],
"overall rating" : merged_df.corr()['patients_rating_of_the_facility_linear_mean_score']}
corr = pd.DataFrame(corr)
corr.loc[(
'Ratio of Providers per 1000 Population IMU Score',
'Infant Mortality Rate',
'Percent of Population with Incomes at or Below 100 Percent of the U.S. Federal Poverty Level',
'Percentage of Population Age 65 and Over'),:]
5. Ploting results on a map¶
For fun, I made a plotly map of all the hospitals in this dataset, since it included coordinates. Scroll to zoom in!
location_df=merged_df[merged_df['location'].notnull()].reset_index()
for i, row in location_df.iterrows():
location_df.loc[i,'lat']=row['location']['coordinates'][1]
location_df.loc[i,'lon']=row['location']['coordinates'][0]
location_df['state']=location_df['state'].astype('object')
fig=go.Figure()
datayes=location_df[location_df['is-MUA?']=='Yes']
datano=location_df[location_df['is-MUA?']=='No']
fig.add_trace(go.Scattergeo(
locationmode = 'USA-states',
lon = datayes.lon,
lat = datayes.lat,
name='MUA',
hoverinfo="text",
text = datayes.facility_name + ' <br> ' + datayes.city + ' , ' + datayes.state,
mode = 'markers',
marker = dict(
size = 8,
opacity = 0.8,
reversescale = True,
autocolorscale = False,
symbol = 'circle',
line = dict(
width=1,
color='rgba(102, 102, 102)'
),
colorscale = 'Portland',
cmin = 0,
color = location_df['rate_of_readmission'],
colorbar_title="Rate of Readmission")))
fig.add_trace(go.Scattergeo(
locationmode = 'USA-states',
lon = datano.lon,
lat = datano.lat,
name='not MUA',
hoverinfo="text",
text = datano.facility_name + ' <br> ' + datano.city + ' , ' + datano.state,
mode = 'markers',
marker = dict(
size = 8,
opacity = 0.8,
reversescale = True,
autocolorscale = False,
symbol = 'square',
line = dict(
width=1,
color='rgba(102, 102, 102)'
),
colorscale = 'Portland',
cmin = 0,
color = location_df['rate_of_readmission'],
colorbar_title="Rate of Readmission")))
fig.update_layout(legend=dict(x=-.1, y=1.2),
geo = dict(
scope='usa',
projection_type='albers usa',
showland = True,
landcolor = "rgb(250, 250, 250)",
subunitcolor = "rgb(217, 217, 217)",
countrycolor = "rgb(217, 217, 217)",
countrywidth = 0.5,
subunitwidth = 0.5
),
)
fig.show()
Summary¶
As mentioned, healthcare outcome measures can sometimes come into conflict with each other, with some interventions having positive effects on some measures but not others. For instance, patient satisfaction is sometimes used as a balance measure for interventions aiming to affect another outcome measure. My exploration showed that while minor, there is a negative association between measures of patient satisfaction and readmission rates (as I expected). This suggests that hospital administrators shouldn't necessarily be concerned that measures to decrease readmission rates would decrease patient satisfaction.
Additionally, I wanted to take a look at the relationship between these factors and MUA status and measures to explore whether investment in any of these areas (say, decreasing infant mortality) could also have positive impact on readmission rates and patient reported outcome measures. While some have minimal association, it appears that the underlying causes of these factors may be different. An important note is that these measures only cover a subset of the ways in which patients are medically underserved. Some areas with higher IMU scores may be underserving certain specific populations (such as those with low-income, or non-English speakers).
In the future, I want to take a closer look at what aspects of healthcare drive patient satisfaction. Gaining insight into how disparate factors influence patient satisfaction and other healthcare statistics would be helpful for administrators decided how to allocate resources