I was given two datasets: One about 15 different schools, and one about all of the students in them, including their math and reading scores. The two csv datasets were sized at 1kb and 2,074kb respectively. There was no missing data. I was asked to identify the most effective schools, and provide other insights.

Here, I used Pandas in a Jupyter Notebook. The requirements of this task required me to learn about formatting intermediate datatypes, especially the int64.

As a whole, schools with higher budgets, did not yield better test results. Schools with higher spending per student (645-675 dollars) actually underperformed compared to schools with smaller budgets (585 dollars per student).

Smaller and medium sized schools dramatically out-performed large sized schools on passing math performances (89-91% passing vs 67%).

Charter schools out-performed the public district schools across all metrics.

In [1]:

```
# Dependencies and Setup
import pandas as pd
import numpy as np
# File to Load
school_data_to_load = "Resources/schools_complete.csv"
student_data_to_load = "Resources/students_complete.csv"
# Read School and Student Data File and store into Pandas Data Frames
school_data = pd.read_csv(school_data_to_load)
student_data = pd.read_csv(student_data_to_load)
# Combine the data into a single dataset
school_data_complete = pd.merge(student_data, school_data, how="left", on=["school_name", "school_name"])
```

In [6]:

```
school_data.head()
```

Out[6]:

In [8]:

```
school_data_complete.head()
```

Out[8]:

Calculate the total number of schools

Calculate the total number of students

Calculate the total budget

Calculate the average math score

Calculate the average reading score

Calculate the overall passing rate (overall average score), i.e. (avg. math score + avg. reading score)/2

Calculate the percentage of students with a passing math score (70 or greater)

Calculate the percentage of students with a passing reading score (70 or greater)

Create a dataframe to hold the above results

Optional: give the displayed data cleaner formatting

In [3]:

```
Total_Schools_int = len(school_data_complete['school_name'].unique())
Total_Students_int = len(school_data_complete['student_name'])
Total_Budget_int = sum(school_data_complete['budget'].unique())
Average_Math_int = school_data_complete['math_score'].mean()
Average_Reading_int= school_data_complete['reading_score'].mean()
Passing_Math_count_float = school_data_complete[(school_data_complete["math_score"] >= 70)].count()["student_name"]
Passing_Math_percentage_float = (Passing_Math_count_float / float(Total_Students_int)) * 100
Passing_Reading_count_float = school_data_complete[(school_data_complete["reading_score"] >= 70)].count()["student_name"]
Passing_Reading_percentage_float = (Passing_Reading_count_float / float(Total_Students_int)) * 100
Overall_Passing_percentage_float = (Passing_Math_percentage_float + Passing_Reading_percentage_float)/2
df1 = pd.DataFrame({
"Total Schools": [Total_Schools_int],
"Total Students": [Total_Students_int],
"Total Budget": [Total_Budget_int],
"Average Math Score": [Average_Math_int],
"Average Reading Score": [Average_Reading_int],
"% Passing Math": [Passing_Math_percentage_float],
"% Passing Reading": [Passing_Reading_percentage_float],
"% Overall Passing Rate": [Overall_Passing_percentage_float]
})
df1['Total Students'] = df1['Total Students'].map("{:,}".format)
df1['Total Budget'] = df1['Total Budget'].map("${:,.2f}".format)
df1
```

Out[3]:

Create an overview table that summarizes key metrics about each school, including:

- School Name
- School Type
- Total Students
- Total School Budget
- Per Student Budget
- Average Math Score
- Average Reading Score
- % Passing Math
- % Passing Reading
- Overall Passing Rate (Average of the above two)

Create a dataframe to hold the above results

In [4]:

```
school_types = school_data.set_index(["school_name"])["type"]
per_school_counts = school_data_complete["school_name"].value_counts()
per_school_budget = school_data_complete.groupby(["school_name"]).mean()["budget"]
per_school_capita = per_school_budget / per_school_counts
per_school_math = school_data_complete.groupby(["school_name"]).mean()["math_score"]
per_school_reading = school_data_complete.groupby(["school_name"]).mean()["reading_score"]
school_passing_math = school_data_complete[(school_data_complete["math_score"] >= 70)]
school_passing_reading = school_data_complete[(school_data_complete["reading_score"] >= 70)]
per_school_passing_math = school_passing_math.groupby(["school_name"]).count()["student_name"] / per_school_counts * 100
per_school_passing_reading = school_passing_reading.groupby(["school_name"]).count()["student_name"] / per_school_counts * 100
overall_passing_rate = (per_school_passing_math + per_school_passing_reading) / 2
per_school_summary = pd.DataFrame({"School Type": school_types,
"Total Students": per_school_counts,
"Total School Budget": per_school_budget,
"Per Student Budget": per_school_capita,
"Average Math Score": per_school_math,
"Average Reading Score": per_school_reading,
"% Passing Math": per_school_passing_math,
"% Passing Reading": per_school_passing_reading,
"% Overall Passing Rate": overall_passing_rate})
per_school_summary = per_school_summary[["School Type", "Total Students", "Total School Budget", "Per Student Budget",
"Average Math Score", "Average Reading Score",
"% Passing Math", "% Passing Reading",
"% Overall Passing Rate"]]
per_school_summary["Total School Budget"] = per_school_summary["Total School Budget"].map("${:,.2f}".format)
per_school_summary["Per Student Budget"] = per_school_summary["Per Student Budget"].map("${:,.2f}".format)
per_school_summary
```

Out[4]:

- Sort and display the five best-performing schools

In [5]:

```
top_schools = per_school_summary.sort_values(["% Overall Passing Rate"], ascending=False)
top_schools.head(5)
```

Out[5]:

- Sort and display the five worst-performing schools

In [6]:

```
bottom_schools = per_school_summary.sort_values(["% Overall Passing Rate"], ascending=True)
bottom_schools.head(5)
```

Out[6]:

Create a table that lists the average Reading Score for students of each grade level (9th, 10th, 11th, 12th) at each school.

Create a pandas series for each grade. Hint: use a conditional statement.

Group each series by school

Combine the series into a dataframe

Optional: give the displayed data cleaner formatting

In [7]:

```
# Create data series of scores by grade levels using conditionals
ninth_graders = school_data_complete[(school_data_complete["grade"] == "9th")]
tenth_graders = school_data_complete[(school_data_complete["grade"] == "10th")]
eleventh_graders = school_data_complete[(school_data_complete["grade"] == "11th")]
twelfth_graders = school_data_complete[(school_data_complete["grade"] == "12th")]
# Group each by school name
ninth_graders_scores = ninth_graders.groupby(["school_name"]).mean()["math_score"]
tenth_graders_scores = tenth_graders.groupby(["school_name"]).mean()["math_score"]
eleventh_graders_scores = eleventh_graders.groupby(["school_name"]).mean()["math_score"]
twelfth_graders_scores = twelfth_graders.groupby(["school_name"]).mean()["math_score"]
# Combine series into single data frame
scores_by_grade = pd.DataFrame({"9th": ninth_graders_scores, "10th": tenth_graders_scores,
"11th": eleventh_graders_scores, "12th": twelfth_graders_scores})
# Minor data munging
scores_by_grade = scores_by_grade[["9th", "10th", "11th", "12th"]]
scores_by_grade.index.name = None
# Display the data frame
scores_by_grade
```

Out[7]:

- Perform the same operations as above for reading scores

In [8]:

```
# Create data series of scores by grade levels using conditionals
ninth_graders = school_data_complete[(school_data_complete["grade"] == "9th")]
tenth_graders = school_data_complete[(school_data_complete["grade"] == "10th")]
eleventh_graders = school_data_complete[(school_data_complete["grade"] == "11th")]
twelfth_graders = school_data_complete[(school_data_complete["grade"] == "12th")]
# Group each by school name
ninth_graders_scores = ninth_graders.groupby(["school_name"]).mean()["reading_score"]
tenth_graders_scores = tenth_graders.groupby(["school_name"]).mean()["reading_score"]
eleventh_graders_scores = eleventh_graders.groupby(["school_name"]).mean()["reading_score"]
twelfth_graders_scores = twelfth_graders.groupby(["school_name"]).mean()["reading_score"]
# Combine series into single data frame
scores_by_grade = pd.DataFrame({"9th": ninth_graders_scores, "10th": tenth_graders_scores,
"11th": eleventh_graders_scores, "12th": twelfth_graders_scores})
# Minor data munging
scores_by_grade = scores_by_grade[["9th", "10th", "11th", "12th"]]
scores_by_grade.index.name = None
# Display the data frame
scores_by_grade
```

Out[8]:

- Create a table that breaks down school performances based on average Spending Ranges (Per Student). Use 4 reasonable bins to group school spending. Include in the table each of the following:
- Average Math Score
- Average Reading Score
- % Passing Math
- % Passing Reading
- Overall Passing Rate (Average of the above two)

In [9]:

```
# Establish the bins
spending_bins = [0, 585, 615, 645, 675]
group_names = ["<$585", "$585-615", "$615-645", "$645-675"]
# Categorize the spending based on the bins
per_school_summary["Spending Ranges (Per Student)"] = pd.cut(per_school_capita, spending_bins, labels=group_names)
spending_math_scores = per_school_summary.groupby(["Spending Ranges (Per Student)"]).mean()["Average Math Score"]
spending_reading_scores = per_school_summary.groupby(["Spending Ranges (Per Student)"]).mean()["Average Reading Score"]
spending_passing_math = per_school_summary.groupby(["Spending Ranges (Per Student)"]).mean()["% Passing Math"]
spending_passing_reading = per_school_summary.groupby(["Spending Ranges (Per Student)"]).mean()["% Passing Reading"]
overall_passing_rate = (spending_passing_math + spending_passing_reading) / 2
# Assemble into data frame
spending_summary = pd.DataFrame({"Average Math Score" : spending_math_scores,
"Average Reading Score": spending_reading_scores,
"% Passing Math": spending_passing_math,
"% Passing Reading": spending_passing_reading,
"% Overall Passing Rate": overall_passing_rate})
# Minor data munging
spending_summary = spending_summary[["Average Math Score",
"Average Reading Score",
"% Passing Math", "% Passing Reading",
"% Overall Passing Rate"]]
# Display results
spending_summary
```

Out[9]:

- Perform the same operations as above, based on school size.

In [10]:

```
# Establish the bins
size_bins = [0, 1000, 2000, 5000]
group_names = ["Small (<1000)", "Medium (1000-2000)", "Large (2000-5000)"]
# Categorize the spending based on the bins
per_school_summary["School Size"] = pd.cut(per_school_summary["Total Students"], size_bins, labels=group_names)
# Calculate the scores based on bins
size_math_scores = per_school_summary.groupby(["School Size"]).mean()["Average Math Score"]
size_reading_scores = per_school_summary.groupby(["School Size"]).mean()["Average Reading Score"]
size_passing_math = per_school_summary.groupby(["School Size"]).mean()["% Passing Math"]
size_passing_reading = per_school_summary.groupby(["School Size"]).mean()["% Passing Reading"]
overall_passing_rate = (size_passing_math + size_passing_reading) / 2
# Assemble into data frame
size_summary = pd.DataFrame({"Average Math Score" : size_math_scores,
"Average Reading Score": size_reading_scores,
"% Passing Math": size_passing_math,
"% Passing Reading": size_passing_reading,
"% Overall Passing Rate": overall_passing_rate})
# Minor data munging
size_summary = size_summary[["Average Math Score",
"Average Reading Score",
"% Passing Math", "% Passing Reading",
"% Overall Passing Rate"]]
# Display results
size_summary
```

Out[10]:

- Perform the same operations as above, based on school type.

In [11]:

```
# Type | Average Math Score | Average Reading Score | % Passing Math | % Passing Reading | % Overall Passing Rate
type_math_scores = per_school_summary.groupby(["School Type"]).mean()["Average Math Score"]
type_reading_scores = per_school_summary.groupby(["School Type"]).mean()["Average Reading Score"]
type_passing_math = per_school_summary.groupby(["School Type"]).mean()["% Passing Math"]
type_passing_reading = per_school_summary.groupby(["School Type"]).mean()["% Passing Reading"]
overall_passing_rate = (type_passing_math + type_passing_reading) / 2
# Assemble into data frame
type_summary = pd.DataFrame({"Average Math Score" : type_math_scores,
"Average Reading Score": type_reading_scores,
"% Passing Math": type_passing_math,
"% Passing Reading": type_passing_reading,
"% Overall Passing Rate": overall_passing_rate})
# Minor data munging
type_summary = type_summary[["Average Math Score",
"Average Reading Score",
"% Passing Math",
"% Passing Reading",
"% Overall Passing Rate"]]
# Display results
type_summary
```

Out[11]:

In [ ]:

```
```