Workspace
Gino Sartor/

Mammary Cancer Survival Analysis

0
Beta
Spinner

Survival analysis

In medicine, especially but not only in oncology, it is very important to analyse overall survival, recurrence-free survival and perform other time-to-event analyses. In this example, using simulated data, we will analyse overall mammary cancer survival. We will try to see if survival is associated with a few predictive variables.

The dataset

For every woman, we have data about

  • date of cancer diagnosis
  • date of last follow up visit
  • survival at last follow up visit
  • possible predictors of survival: BMI, cancer stage, neo_adiuvant_therapy

suppressWarnings(suppressMessages(library(dplyr, quiet = T)))

set.seed(111)
# Upload data (I created the data, it's not real!)
db = read.csv("mammary_cancer_data.csv") %>% select(-X) %>% 
filter(as.Date(date_diagnosis,format="%d/%m/%Y")>as.Date("01/01/1900",format="%d/%m/%Y"))

db = db %>% 
# simulate stage data and neo_adiuvant_therapy data accordingly
mutate(stage = sample(1:4, nrow(db), replace = T),
       neo_adiuvant_therapy = ifelse(stage %in% c(3,4), 1, neo_adiuvant_therapy),
# calculate survival time after diagnosis
	   survival_time = abs(as.numeric(as.Date(last_visit,format="%d/%m/%Y") - as.Date(date_diagnosis,format="%d/%m/%Y")-sample(1:5000, nrow(db)))), 
       survival_factor = as.factor(survival),
       neo_adiuvant_therapy = as.factor(neo_adiuvant_therapy), 
       stage = as.factor(stage), 
       BMI_cat = ifelse(BMI > 30, "Obese", "Not Obese"))
head(db)

Data summary

BMI, stage and neoadiuvant therapy are not significantly associated with survival status at bivariate analysis, but time has not yet been considered.

# Summary of data
install.packages("arsenal", quiet = T)
library(arsenal, quiet = T)
as.data.frame(summary(tableby(survival_factor ~ BMI + neo_adiuvant_therapy + stage, data = db), text = T))

Kaplan-Meier Analysis

Overall Kaplan-Meier survival curve shows a median survival of more than 4000 days, which is more than 10 years.

library(survival, quiet = T)
library(ggplot2, quiet = T)
suppressWarnings(suppressMessages(install.packages("survminer", quiet = T)))
suppressWarnings(suppressMessages(library(survminer, quiet = T)))
# overall survival
sop <- Surv(db$survival_time, db$survival)
fitkm <- survfit(sop ~ 1, type="kaplan-meier")

ggsurvplot(
  fitkm,
  data = db,
  conf.int = TRUE,
  xlab = "Days", ylab = "",
  ggtheme = theme_light(),
  surv.median.line = "hv",  legend.labs = c("Overall survival"),
  legend.title = "",risk.table = F,
  palette = c("#8C3F4D"), font.y = 15, font.x = 15, font.tickslab = 10, font.legend = 12)

Bivariate and Multivariate Analysis

BMI is associated with survival at bivariate analysis while neoadiuvant therapy and stage seem not (remember this is fake data!!)

listavariabili<-list("BMI_cat", "stage", "neo_adiuvant_therapy")
listatitoli<- c("BMI", "Cancer stage", "Neo-adiuvant therapy")
listaetichette <- list(c("Not Obese", "Obese"), c("1", "2", "3", "4"), c("No", "Yes"))
formulae <- lapply(listavariabili, function(x) as.formula(paste0("sop ~ ", x)))
fits <- surv_fit(formulae, data = db)

p1_st <- ggsurvplot(fits, data = db,
  pval = TRUE,
  #pval.coord = c(200, 0.10),
  xlab = "Days", ylab = "",
  ggtheme = theme_light(),
  #surv.median.line = "hv",
  title = "",
  legend.title=listatitoli, tables.height = 0.28, tables.theme = theme_cleantable(),  conf.int = F, tables.col = "strata", tables.y.text = F, censor = T,
  palette = c("#8C3F4D", "#3E606F", "#6476a1", "red"), font.y = 7, font.x = 7, font.tickslab = 10, font.legend = 12, legend.labs = listaetichette,
  risk.table = T, risk.table.title = "", risk.table.fontsize = 3
  )
options(repr.plot.width=10, repr.plot.height=20)
arrange_ggsurvplots(p1_st, ncol = 1, nrow = 3, print = TRUE, title = "Kaplan-Meier plots for survival in mammary cancer stratified by Body Mass Index, neoadiuvant therapy and stage.\nLog rank test p values are shown for each group.")
suppressWarnings(suppressMessages(install.packages("Publish", quiet = T)))
suppressWarnings(suppressMessages(library(Publish, quiet = T)))
fit<-coxph(sop~stage + neo_adiuvant_therapy + BMI, data = db)
a <- publish(fit, print = F, factor.reference = "extraline", units = NULL, probindex = FALSE,pvalue.digits=3,pvalue.stars=TRUE)
a_df <- as.data.frame(a[["regressionTable"]])
a_df[, 1:5]
  • AI Chat
  • Code