Community Health Status Indicators (CHSI) to Combat Obesity, Heart Disease, and Cancer is a major component of the Community Health Data Initiative. This dataset provides key health indicators for local communities and encourages dialogue about actions that can be taken to improve community health in areas such as obesity, heart disease, and cancer.
The CHSI report and dataset were designed not only for public health professionals but also for community members who are interested in the health of their community. The CHSI report contains over 200 measures for each of the 3,141 United States counties. Although CHSI presents indicators such as deaths due to heart disease and cancer, it is important to understand that behavioral factors such as obesity, tobacco use, diet, physical activity, alcohol and drug use, sexual behavior, and others contribute substantially to these deaths.
Data and Resources
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
X = pd.read_csv('/Users/annettechiu/Desktop/Health_indicators/RISKFACTORSANDACCESSTOCARE.csv')
X.head()  # displays 5 rows × 31 columns
# Keep only rows with valid values; large negative numbers act as missing-data codes
X = X[X['No_Exercise'] > -100]
X = X[X['Diabetes'] > -100]
X.hist('No_Exercise');
X.hist('Diabetes');
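The `> -100` filters above drop rows whose values look like missing-data codes. An alternative, sketched here on a made-up table rather than the CHSI file, is to convert such codes to `NaN` first and drop incomplete rows afterwards. The `toy` frame, its values, and the `cutoff` name are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Hypothetical miniature table; the numbers are invented for illustration.
toy = pd.DataFrame({
    'No_Exercise': [25.1, -1111.1, 30.4, 22.0],
    'Diabetes': [7.8, 9.1, -2222.2, 6.5],
})

# Values at or below the cutoff are treated as missing-data codes.
cutoff = -100
cleaned = toy.where(toy > cutoff)  # entries failing the test become NaN

# Keep only rows where both variables are present.
complete = cleaned.dropna(subset=['No_Exercise', 'Diabetes'])
print(complete)
```

Converting to `NaN` before dropping keeps the missing entries visible (for example via `cleaned.isna().sum()`), which can help when deciding which columns are usable.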
# remove any rows with a missing No_Exercise value
filtered_data = X[~np.isnan(X["No_Exercise"])]
filtered_data.head(3)
filtered_data.columns
filtered_data[['No_Exercise','Disabled_Medicare']].corr()
filtered_data[['No_Exercise','High_Blood_Pres']].corr()
filtered_data[['No_Exercise','Elderly_Medicare']].corr()
filtered_data[['No_Exercise','Obesity']].corr()
filtered_data[['No_Exercise','Diabetes']].corr()
filtered_data[['No_Exercise','Prim_Care_Phys_Rate']].corr()
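The pairwise calls above can also be collapsed into a single ranked list. The sketch below assumes a small made-up frame (`toy`, with invented values) that reuses the column names from the notebook; it is not run on the CHSI data.

```python
import pandas as pd

# Hypothetical values, invented for illustration only.
toy = pd.DataFrame({
    'No_Exercise': [20.0, 25.0, 30.0, 35.0, 40.0],
    'Obesity': [18.0, 22.0, 25.0, 30.0, 33.0],
    'Diabetes': [5.0, 6.5, 7.0, 9.0, 10.5],
    'Prim_Care_Phys_Rate': [90.0, 80.0, 85.0, 60.0, 55.0],
})

# Pearson correlation of every other column with No_Exercise, strongest first.
corr_with_no_exercise = (
    toy.corr()['No_Exercise']
       .drop('No_Exercise')
       .sort_values(ascending=False)
)
print(corr_with_no_exercise)
```

On the real data, the same pattern would be applied to `filtered_data` restricted to its numeric columns of interest.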
# Select columns by name: np.matrix(filtered_data)[:, 0] and [:, 1] would take
# the first two columns of the file, whatever they contain
pair = filtered_data[['No_Exercise', 'Prim_Care_Phys_Rate']]
# Assumption: this column uses the same large negative missing-data codes as above
pair = pair[pair['Prim_Care_Phys_Rate'] > -100]
No_Exercise, Prim_Care_Phys_Rate = pair[['No_Exercise']], pair['Prim_Care_Phys_Rate']
mdl = LinearRegression().fit(No_Exercise, Prim_Care_Phys_Rate)
m = mdl.coef_[0]
b = mdl.intercept_
print("formula: y = {0}x + {1}".format(m, b)) # following slope intercept form
plt.scatter(pair['No_Exercise'], Prim_Care_Phys_Rate, color='blue')
plt.plot([0,100],[b,m*100+b],'r')
plt.title('Linear Regression', fontsize = 20)
plt.xlabel('No_Exercise', fontsize = 15)
plt.ylabel('Prim_Care_Phys_Rate', fontsize = 15)
# Same fit with Diabetes as the response, again selecting columns by name
No_Exercise, Diabetes = filtered_data[['No_Exercise']], filtered_data['Diabetes']
mdl = LinearRegression().fit(No_Exercise, Diabetes)
m = mdl.coef_[0]
b = mdl.intercept_
print("formula: y = {0}x + {1}".format(m, b)) # following slope intercept form
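A slope and intercept alone do not say how well the line fits. One common check is the coefficient of determination, which `LinearRegression.score` returns. The sketch below uses a made-up frame (`toy`) whose response is an exact line, so the fit should recover the slope and intercept; the values and the names `features`, `target`, and `r_squared` are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: Diabetes is constructed as an exact line in No_Exercise.
toy = pd.DataFrame({'No_Exercise': [10.0, 20.0, 30.0, 40.0]})
toy['Diabetes'] = 0.25 * toy['No_Exercise'] + 2.0

features = toy[['No_Exercise']]  # 2-D: one row per observation, one column
target = toy['Diabetes']         # 1-D response

mdl = LinearRegression().fit(features, target)
slope = mdl.coef_[0]
intercept = mdl.intercept_
r_squared = mdl.score(features, target)  # coefficient of determination
print("y = {0:.2f}x + {1:.2f}, R^2 = {2:.3f}".format(slope, intercept, r_squared))
```

On the county data the R² would be well below 1; a low value would suggest that `No_Exercise` alone explains little of the variation in the response.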