Pymaceuticals Inc.

Analysis

Overall, Capomulin clearly outperforms the other treatments compared in the plots below (Infubinol, Ketapril, and Placebo).
Capomulin was the only one of these four treatments to reduce tumor volume. Mean tumor volume fell by about 19% over the course of the trial, whereas it rose by roughly 46-57% under Infubinol, Ketapril, and Placebo. Across the full screen, Ramicane showed a comparable reduction (about 22%).
Capomulin also limited the spread of the tumor compared to the other treatments. By study end, the average mouse on Capomulin had only about 1 new metastatic site, as opposed to the average of 2-3 found in mice on the other treatments.
Lastly, mice on Capomulin had the highest survival rate of the treatments compared. Over 90% of mice treated with Capomulin survived the full duration of the trial, compared to only 35-45% of mice on the other treatments.

# Dependencies and Setup
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Hide warning messages in notebook
import warnings
warnings.filterwarnings('ignore')

# File to Load (Remember to Change These)
mouse_drug_data = pd.read_csv("data/mouse_drug_data.csv")
clinical_trial_data = pd.read_csv("data/clinicaltrial_data.csv")
# Left-join the drug assignment onto every clinical measurement
df = pd.merge(clinical_trial_data, mouse_drug_data, how="left", on="Mouse ID")
df.head()

Tumor Response to Treatment

We are tasked with creating a time series line plot that tracks tumor volume mean with error bars. To do this, we must obtain means and standard errors for each drug at each timepoint.
First, we must use groupby() on drug type and timepoint to compute the mean and standard error of tumor volume for each group.
Next, we must reshape the data so that each column represents a drug and each row represents a timepoint ("wide format").
Finally, we must generate the plot for the drugs that have been pre-specified as important (namely, Capomulin, Infubinol, Ketapril, and placebo).
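The groupby and reshape steps above can be sketched on a small made-up table. The values here are invented for illustration and are not trial data; only the column names match the notebook, and unstack() is used as one possible way to get the wide layout.

```python
import pandas as pd

# Toy data (invented values): two drugs, two timepoints, two mice each
toy = pd.DataFrame({
    "Drug": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Timepoint": [0, 0, 5, 5, 0, 0, 5, 5],
    "Tumor Volume (mm3)": [45.0, 45.0, 44.0, 46.0, 45.0, 45.0, 47.0, 49.0],
})

# Step 1: mean and standard error per (Drug, Timepoint) group
grouped = toy.groupby(["Drug", "Timepoint"])["Tumor Volume (mm3)"]
toy_mean = grouped.mean()
toy_sem = grouped.sem()

# Step 2: move the "Drug" index level into columns (wide format),
# leaving one row per timepoint
toy_mean_wide = toy_mean.unstack("Drug")
toy_sem_wide = toy_sem.unstack("Drug")

print(toy_mean_wide)
```

Each column of the wide tables can then be passed straight to plt.errorbar, with the index supplying the x values.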

# Store the Mean Tumor Volume Data Grouped by Drug and Timepoint 
# Select the column before aggregating so non-numeric columns (e.g. "Mouse ID") are never averaged
tumor_vols_mean = df.groupby(["Drug", "Timepoint"])["Tumor Volume (mm3)"].mean()
# Convert to DataFrame
tumor_vols_mean_df = pd.DataFrame(tumor_vols_mean)
tumor_vols_mean_df = tumor_vols_mean_df.reset_index()
# Preview DataFrame
tumor_vols_mean_df.head()

# Store the Standard Error of Tumor Volumes Grouped by Drug and Timepoint
tumor_vols_se = df.groupby(["Drug", "Timepoint"])["Tumor Volume (mm3)"].sem()
# Convert to DataFrame
tumor_vols_se_df = pd.DataFrame(tumor_vols_se)
tumor_vols_se_df = tumor_vols_se_df.reset_index()
# Preview DataFrame
tumor_vols_se_df.head()

# Convert data from long to wide format
tumor_vols_mean_df_wide = tumor_vols_mean_df.pivot(index="Timepoint", columns="Drug")["Tumor Volume (mm3)"]
tumor_vols_se_df_wide = tumor_vols_se_df.pivot(index="Timepoint", columns="Drug")["Tumor Volume (mm3)"]
# Preview that Reformatting worked
tumor_vols_mean_df_wide.head()

# Generate the Plot (with Error Bars)
# Since we set the index to timepoint, we can use that as our x value.
# Each series needs a label; otherwise plt.legend() has nothing to show.
plt.errorbar(tumor_vols_mean_df_wide.index, tumor_vols_mean_df_wide["Capomulin"], yerr=tumor_vols_se_df_wide["Capomulin"], label="Capomulin", color="r", marker="o", markersize=5, linestyle="dashed", linewidth=0.50)
plt.errorbar(tumor_vols_mean_df_wide.index, tumor_vols_mean_df_wide["Infubinol"], yerr=tumor_vols_se_df_wide["Infubinol"], label="Infubinol", color="b", marker="^", markersize=5, linestyle="dashed", linewidth=0.50)
plt.errorbar(tumor_vols_mean_df_wide.index, tumor_vols_mean_df_wide["Ketapril"], yerr=tumor_vols_se_df_wide["Ketapril"], label="Ketapril", color="g", marker="s", markersize=5, linestyle="dashed", linewidth=0.50)
plt.errorbar(tumor_vols_mean_df_wide.index, tumor_vols_mean_df_wide["Placebo"], yerr=tumor_vols_se_df_wide["Placebo"], label="Placebo", color="k", marker="d", markersize=5, linestyle="dashed", linewidth=0.50)

plt.title("Tumor Response to Treatment")
plt.ylabel("Tumor Volume (mm3)")
plt.xlabel("Time (Days)")
plt.grid(True)
plt.legend(loc="best", fontsize="small", fancybox=True)
# Save the Figure
plt.savefig("analysis/Fig1.png")

# Show the Figure
plt.show()

Metastatic Response to Treatment

This task is identical to the previous one, except that the variable of interest is the number of metastatic sites rather than tumor volume. It is summarized the same way, by taking the mean and the standard error for each drug at each timepoint.
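Because only the column name changes between the two tasks, the shared steps could be wrapped in a small helper. The function name summarize_by_drug below is hypothetical (it does not appear in the notebook), and the toy values are invented; this is a sketch of the idea rather than the notebook's own method.

```python
import pandas as pd

def summarize_by_drug(data, column):
    """Return (mean, sem) tables of `column`: one row per Timepoint, one column per Drug."""
    grouped = data.groupby(["Drug", "Timepoint"])[column]
    return grouped.mean().unstack("Drug"), grouped.sem().unstack("Drug")

# Toy data (invented values): two drugs, one timepoint, two mice each
toy = pd.DataFrame({
    "Drug": ["A", "A", "B", "B"],
    "Timepoint": [5, 5, 5, 5],
    "Metastatic Sites": [0, 1, 1, 3],
})

met_mean, met_sem = summarize_by_drug(toy, "Metastatic Sites")
print(met_mean)
print(met_sem)
```

The same call with "Tumor Volume (mm3)" as the column would give the tables used for the tumor response plot.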

# Store the Mean Met. Site Data Grouped by Drug and Timepoint 
# Select the column before aggregating so non-numeric columns (e.g. "Mouse ID") are never averaged
metastatic_response_mean = df.groupby(["Drug", "Timepoint"])["Metastatic Sites"].mean()
# Convert to DataFrame
metastatic_response_mean_df = pd.DataFrame(metastatic_response_mean)
# Preview DataFrame
metastatic_response_mean_df.head()

# Store the Standard Error associated with Met. Sites Grouped by Drug and Timepoint 
metastatic_response_se = df.groupby(["Drug", "Timepoint"])["Metastatic Sites"].sem()

# Convert to DataFrame
metastatic_response_se_df = pd.DataFrame(metastatic_response_se)
# Preview DataFrame
metastatic_response_se_df.head()

# Minor Data Munging to Re-Format the Data Frames
metastatic_response_mean_df2 = metastatic_response_mean_df.reset_index()
metastatic_response_mean_df_wide = metastatic_response_mean_df2.pivot(index="Timepoint", columns="Drug")["Metastatic Sites"]

metastatic_response_se_df2 = metastatic_response_se_df.reset_index()
metastatic_response_se_df_wide = metastatic_response_se_df2.pivot(index="Timepoint", columns="Drug")["Metastatic Sites"]

# Preview that Reformatting worked
metastatic_response_mean_df_wide.head()

# Generate the Plot (with Error Bars); labels are required for the legend
plt.errorbar(metastatic_response_mean_df_wide.index, metastatic_response_mean_df_wide["Capomulin"], yerr=metastatic_response_se_df_wide["Capomulin"], label="Capomulin", color="r", marker="o", markersize=5, linestyle="dashed", linewidth=0.50)
plt.errorbar(metastatic_response_mean_df_wide.index, metastatic_response_mean_df_wide["Infubinol"], yerr=metastatic_response_se_df_wide["Infubinol"], label="Infubinol", color="b", marker="^", markersize=5, linestyle="dashed", linewidth=0.50)
plt.errorbar(metastatic_response_mean_df_wide.index, metastatic_response_mean_df_wide["Ketapril"], yerr=metastatic_response_se_df_wide["Ketapril"], label="Ketapril", color="g", marker="s", markersize=5, linestyle="dashed", linewidth=0.50)
plt.errorbar(metastatic_response_mean_df_wide.index, metastatic_response_mean_df_wide["Placebo"], yerr=metastatic_response_se_df_wide["Placebo"], label="Placebo", color="k", marker="d", markersize=5, linestyle="dashed", linewidth=0.50)

plt.title("Metastatic Spread During Treatment")
plt.ylabel("Met. Sites")
plt.xlabel("Time (Days)")
plt.grid(True)
plt.legend(loc="best", fontsize="small", fancybox=True)
# Save the Figure
plt.savefig("analysis/Fig2.png")

# Show the Figure
plt.show()

Survival Rates

This task is similar to the previous two, with a couple of differences.
We need to count the measurements at each timepoint (fewer measurements means fewer surviving mice).
We need to turn each count into a proportion by dividing it by the number of mice that started on that drug.
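The plotting cell below divides by a fixed 25 mice per drug. A minimal alternative sketch, assuming the first row of the wide count table holds each drug's starting count, divides by that row instead, so groups of different sizes still start at 100%. The counts here are invented for illustration.

```python
import pandas as pd

# Toy counts of surviving mice (invented values), one row per timepoint
toy_counts = pd.DataFrame(
    {"A": [25, 24, 20], "B": [26, 13, 13]},
    index=pd.Index([0, 5, 10], name="Timepoint"),
)

# Dividing by the first row aligns on column names,
# so each drug is scaled by its own starting count
toy_survival_pct = 100 * toy_counts / toy_counts.iloc[0]

print(toy_survival_pct)
```

Drug "B" starts with 26 mice in this toy table, which a fixed divisor of 25 would report as 104% at timepoint 0.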

# Store the Count of Mice Grouped by Drug and Timepoint (we can count any column that has no missing values)
mice_still_alive = df.groupby(["Drug", "Timepoint"]).count()["Tumor Volume (mm3)"]
# Convert to DataFrame
mice_still_alive_df = pd.DataFrame(mice_still_alive)


# Note: Resetting the index turns "Drug" and "Timepoint" back into ordinary columns, repeating the drug name on every row.
# Otherwise, the frame keeps the (Drug, Timepoint) MultiIndex produced by the groupby.
mice_still_alive_df.head().reset_index()

# Minor Data Munging to Re-Format the Data Frames
mice_still_alive_df2 = mice_still_alive_df.reset_index()
mice_still_alive_df_wide = mice_still_alive_df2.pivot(index="Timepoint", columns="Drug")["Tumor Volume (mm3)"]
# Preview the Data Frame
mice_still_alive_df_wide.head()

# Generate the Plot (Accounting for percentages)
# Dividing by 25 assumes each of these four drugs started with 25 mice; labels are required for the legend
plt.plot(100 * mice_still_alive_df_wide["Capomulin"] / 25, "ro", label="Capomulin", linestyle="dashed", markersize=5, linewidth=0.50)
plt.plot(100 * mice_still_alive_df_wide["Infubinol"] / 25, "b^", label="Infubinol", linestyle="dashed", markersize=5, linewidth=0.50)
plt.plot(100 * mice_still_alive_df_wide["Ketapril"] / 25, "gs", label="Ketapril", linestyle="dashed", markersize=5, linewidth=0.50)
plt.plot(100 * mice_still_alive_df_wide["Placebo"] / 25, "kd", label="Placebo", linestyle="dashed", markersize=5, linewidth=0.50)
plt.title("Mice Survival Rates During Treatment")
plt.ylabel("Survival Rate (%)")
plt.xlabel("Time (Days)")
plt.grid(True)
plt.legend(loc="best", fontsize="small", fancybox=True)

# Save the Figure
plt.savefig("analysis/Fig3.png")

# Show the Figure
plt.show()

Summary Bar Graph

This task requires calculating, for each drug, the change in mean tumor volume between the first and last timepoints as a percentage of the first value.
Then, we collect the values for the four drugs of interest into a tuple, which is passed to the bar chart and labeled using two small user-defined functions.
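The percent-change arithmetic can be checked on a toy wide table. The numbers below are invented so the result is easy to verify by hand: drug "A" falls from 45 to 36 (a 20% decrease) and drug "B" rises from 45 to 67.5 (a 50% increase).

```python
import pandas as pd

# Toy mean tumor volumes (invented values), one row per timepoint
toy_wide = pd.DataFrame(
    {"A": [45.0, 40.5, 36.0], "B": [45.0, 54.0, 67.5]},
    index=pd.Index([0, 5, 10], name="Timepoint"),
)

# Percent change from the first row to the last row, per drug
first, last = toy_wide.iloc[0], toy_wide.iloc[-1]
toy_pct_change = 100 * (last - first) / first

# Collect the drugs of interest into a tuple, in plotting order
selected = tuple(toy_pct_change[drug] for drug in ("A", "B"))

print(selected)
```

Negative entries in the tuple correspond to drugs under which tumors shrank, which is how the bar chart separates the passing bar from the failing ones.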

# Calculate the percent changes for each drug
tumor_pct_change =  100 * (tumor_vols_mean_df_wide.iloc[-1] - tumor_vols_mean_df_wide.iloc[0]) / tumor_vols_mean_df_wide.iloc[0]
# Display the data to confirm
tumor_pct_change

Drug
Capomulin   -19.475303
Ceftamin     42.516492
Infubinol    46.123472
Ketapril     57.028795
Naftisol     53.923347
Placebo      51.297960
Propriva     47.241175
Ramicane    -22.320900
Stelasyn     52.085134
Zoniferol    46.579751
dtype: float64

# Store all Relevant Percent Changes into a Tuple
pct_changes = (tumor_pct_change["Capomulin"], 
               tumor_pct_change["Infubinol"], 
               tumor_pct_change["Ketapril"], 
               tumor_pct_change["Placebo"])

# Split the data between passing and failing drugs
fig, ax = plt.subplots()
ind = np.arange(len(pct_changes))  
width = 1
rectsPass = ax.bar(ind[0], pct_changes[0], width, color='green')
rectsFail = ax.bar(ind[1:], pct_changes[1:], width, color='red')

# Orient widths. Add labels, tick marks, etc. 
ax.set_ylabel('% Tumor Volume Change')
ax.set_title('Tumor Change Over 45 Day Treatment')
# Bars are centered on ind by default (matplotlib 2.0+), so the ticks go at ind
ax.set_xticks(ind)
ax.set_xticklabels(('Capomulin', 'Infubinol', 'Ketapril', 'Placebo'))
ax.set_autoscaley_on(False)
ax.set_ylim([-30,70])
ax.grid(True)

# Use functions to label the percentages of changes
def autolabelFail(rects):
    for rect in rects:
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width()/2., 3,
                '%d%%' % int(height),
                ha='center', va='bottom', color="white")
        
def autolabelPass(rects):
    for rect in rects:
        # abs() gives a single minus sign whether or not get_height() is signed
        height = abs(rect.get_height())
        ax.text(rect.get_x() + rect.get_width()/2., -8,
                '-%d%%' % int(height),
                ha='center', va='bottom', color="white")

# Call the labeling functions on the passing and failing bars
autolabelPass(rectsPass)
autolabelFail(rectsFail)

# Save the Figure
fig.savefig("analysis/Fig4.png")

# Show the Figure
plt.show()

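Output of tumor_vols_mean_df_wide.head(): mean tumor volume (mm3) by timepoint (rows) and drug (columns)
Timepoint	Capomulin	Ceftamin	Infubinol	Ketapril	Naftisol	Placebo	Propriva	Ramicane	Stelasyn	Zoniferol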
0	45.000000	45.000000	45.000000	45.000000	45.000000	45.000000	45.000000	45.000000	45.000000	45.000000
5	44.266086	46.503051	47.062001	47.389175	46.796098	47.125589	47.248967	43.944859	47.527452	46.851818
10	43.084291	48.285125	49.403909	49.582269	48.694210	49.423329	49.101541	42.531957	49.463844	48.689881
15	42.064317	50.094055	51.296397	52.399974	50.933018	51.359742	51.067318	41.495061	51.529409	50.779059
20	40.716325	52.157049	53.197691	54.920935	53.644087	54.364417	53.346737	40.238325	54.067395	53.170334

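Output of metastatic_response_mean_df_wide.head(): mean metastatic sites by timepoint (rows) and drug (columns)
Timepoint	Capomulin	Ceftamin	Infubinol	Ketapril	Naftisol	Placebo	Propriva	Ramicane	Stelasyn	Zoniferol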
0	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000	0.000000
5	0.160000	0.380952	0.280000	0.304348	0.260870	0.375000	0.320000	0.120000	0.240000	0.166667
10	0.320000	0.600000	0.666667	0.590909	0.523810	0.833333	0.565217	0.250000	0.478261	0.500000
15	0.375000	0.789474	0.904762	0.842105	0.857143	1.250000	0.764706	0.333333	0.782609	0.809524
20	0.652174	1.111111	1.050000	1.210526	1.150000	1.526316	1.000000	0.347826	0.952381	1.294118

	Mouse ID	Tumor Volume (mm3)	Drug
0	b128	45.0	Capomulin
1	f932	45.0	Ketapril
2	g107	45.0	Ketapril
3	a457	45.0	Ketapril
4	c819	45.0	Ketapril
