Week 1: Introduction to Survival Analysis

Definitions

Survival Time (T): Time until an event occurs; also called time-to-event, death time, or reliability time

Censored Data: Observations whose exact event time is not fully known, for example because:

  • The event did not occur before the end of the study
  • The subject was lost to follow-up (could no longer be observed) before the end of the study

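Truncated Data: Data in which subjects are included only if their event time falls within an observation window, e.g. left truncation (subjects enter the study after time 0 and are observed only if they survive until entry)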

Concept

Survival analysis studies time-to-event data. Key components:

  1. Origin: Start of observation time
  2. Scale: Time measurement unit (days, months, hours)
  3. Event: The endpoint being measured (death, failure, relapse)

Week 2: Basic Quantities of Survival Distribution

Definitions

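Survival Function S(t): probability of surviving beyond time t, $S(t) = P(T > t) = 1 - F(t)$

Here T is the non-negative random variable for survival time defined in Week 1, with CDF $F(t)$ and density $f(t)$.

Hazard Function h(t): Instantaneous failure rate at time t given survival until t, $h(t) = \lim_{\Delta t \to 0} \dfrac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$

See the light-bulb example under “Hazard function” below.

Cumulative Hazard H(t): total hazard accumulated up to time t, $H(t) = \int_0^t h(u)\,du$

Mean Residual Life mrl(t): expected remaining time after surviving to t, $mrl(t) = E[T - t \mid T > t]$

Median Life: the time $t_{0.5}$ by which 50% of subjects have experienced the event, i.e. $S(t_{0.5}) = 0.5$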

Concepts

S(t) properties:

  • Monotonically non-increasing
  • $S(0) = 1$, $\lim_{t \to \infty} S(t) = 0$ (starts at 1, approaches 0)

h(t) properties: Can be increasing, decreasing, constant, or bathtub-shaped

Relationship: $h(t) = \dfrac{f(t)}{S(t)} = -\dfrac{d}{dt}\ln S(t)$ and $S(t) = \exp\left(-H(t)\right)$
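These relationships can be checked numerically. The sketch below is a minimal illustration, assuming a Weibull model written as $S(t) = \exp(-\lambda t^\alpha)$ (the form used in Week 3) with arbitrary illustrative values $\alpha = 1.5$, $\lambda = 0.2$. It differentiates $S$ numerically to get $f$, compares $f/S$ with the closed-form hazard, and integrates $h$ to confirm $S(t) = \exp(-H(t))$. All function names are hypothetical.

```python
import math

# Illustrative Weibull model (assumed): S(t) = exp(-LAM * t**ALPHA)
ALPHA, LAM = 1.5, 0.2


def survival(t: float) -> float:
    return math.exp(-LAM * t ** ALPHA)


def hazard_closed_form(t: float) -> float:
    return ALPHA * LAM * t ** (ALPHA - 1)


def density_numeric(t: float, eps: float = 1e-6) -> float:
    # f(t) = -dS/dt, approximated with a central difference
    return -(survival(t + eps) - survival(t - eps)) / (2 * eps)


def cumulative_hazard_numeric(t: float, steps: int = 100_000) -> float:
    # H(t) = integral of h(u) from 0 to t, approximated with the trapezoidal rule
    width = t / steps
    total = 0.5 * (hazard_closed_form(0.0) + hazard_closed_form(t))
    total += sum(hazard_closed_form(k * width) for k in range(1, steps))
    return total * width


t = 2.0
h_ratio = density_numeric(t) / survival(t)
print("f/S:", h_ratio, "closed-form h:", hazard_closed_form(t))
print("exp(-H):", math.exp(-cumulative_hazard_numeric(t)), "S:", survival(t))
```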

Hazard function

Rate of the event occurring per unit time (t), among those who haven’t experienced the event yet.

  • A high h(t) at some time t means: among those still alive at t, failures are happening rapidly.
  • A low h(t) means failures are rare at that moment.

The “instantaneous” part means you shrink the window $[t, t + \Delta t)$ to a single point in time instead of measuring the rate over a finite interval.

Example: 100 light bulbs are all still running at t = 1000 hours, and 5 fail between 1000 and 1001 hours.

Then $h(1000) \approx \dfrac{5}{100 \times 1} = 0.05$: at t = 1000, survivors are failing at a rate of about 0.05 per hour.
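The light-bulb arithmetic is the finite-interval approximation $h(t) \approx \dfrac{\text{failures in } [t, t + \Delta t)}{(\text{number at risk at } t) \times \Delta t}$, which becomes the instantaneous hazard as $\Delta t \to 0$. A minimal sketch, with a hypothetical helper name:

```python
def interval_hazard(failures: int, at_risk: int, width: float) -> float:
    """Approximate h(t) as failures in [t, t + width) per subject at risk per unit time."""
    if at_risk <= 0 or width <= 0:
        raise ValueError("need a positive number at risk and a positive width")
    return failures / (at_risk * width)


# Light-bulb example: 100 bulbs running at t = 1000 h, 5 fail within the next hour
rate = interval_hazard(failures=5, at_risk=100, width=1.0)
print(f"h(1000) is approximately {rate:.2f} per hour")
```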

Survival vs hazard function

  • Survival describes the probability that the subject has not yet experienced the event by time t
  • Conversely, hazard describes the subject’s instantaneous risk that the event occurs at t, given that it has not occurred before t

Cumulative hazard: Accumulated risk over time.

Driving a car: every hour you drive, you face some hazard of an accident. H(t) is the total risk you’ve accumulated after t hours of driving. Even if the hourly risk is small, H(t) keeps growing the longer you drive.

Mean residual life

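Try interpreting $mrl(t) = E[T - t \mid T > t]$ (given $T > t$):

  • $T - t$: How much time remains from now ($t$) until the event ($T$).
  • $E[\,\cdot \mid T > t]$: Mean of that remaining time, among subjects still event-free at $t$

In short, mrl(t) is the expected remaining time before the event occurs, for a subject who has survived to t.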

Formulas

Total Time on Test (TTT): Cumulative operating time of all subjects, i.e. the sum of all observed times (both event and censored)

$$TTT = \sum_{i=1}^n t_i$$

where $t_i$ is the failure or censoring time for subject $i$.

Interpretation

TTT represents the total “machine-hours” or “person-years” that subjects contributed to the study. It’s used in:

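  • TTT Plot: A diagnostic plot of the hazard shape (close to the diagonal for a constant, exponential hazard; concave for an increasing hazard; convex for a decreasing one), which helps decide whether a model such as Weibull is appropriate
  • Exponential MLE: with $d$ observed events, $\hat\lambda = d / TTT$ (see the likelihood example in Week 5)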
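A minimal sketch of both quantities, with hypothetical function names. The scaled TTT plot uses the ordered times $t_{(1)} \le \dots \le t_{(n)}$ and $T_i = \sum_{j \le i} t_{(j)} + (n - i)\,t_{(i)}$, plotted as $(i/n,\ T_i / T_n)$; this assumes a complete (uncensored) sample. The numbers reuse the times 2, 3, 5, 6, 8 from the Week 5 example, treated here as failures purely for illustration.

```python
def total_time_on_test(times: list[float]) -> float:
    """TTT = sum of all observed times (event and censored alike)."""
    return sum(times)


def scaled_ttt_points(times: list[float]) -> list[tuple[float, float]]:
    """Points (i/n, T_i / T_n) of the scaled TTT plot for a complete sample.

    T_i = t_(1) + ... + t_(i) + (n - i) * t_(i): time on test accumulated by
    all n units up to the i-th ordered failure.
    """
    ordered = sorted(times)
    n = len(ordered)
    total = sum(ordered)  # equals T_n
    points = []
    running = 0.0
    for i, t in enumerate(ordered, start=1):
        running += t
        t_i = running + (n - i) * t
        points.append((i / n, t_i / total))
    return points


times = [2.0, 3.0, 5.0, 6.0, 8.0]
print(total_time_on_test(times))  # 24.0
for x, y in scaled_ttt_points(times):
    print(f"{x:.1f}  {y:.3f}")
```

The last point is always (1, 1); comparing the curve with the diagonal is what indicates the hazard shape.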

Hazard function relationship proof

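Start from the definition and use conditional probability, $P(A \mid B) = P(A \cap B)/P(B)$, with $P(T \ge t) = S(t)$ for continuous $T$:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t\, S(t)} = \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \frac{f(t)}{S(t)}$$

The last limit is, by definition, the derivative $F'(t) = f(t)$. Because $f(t) = -S'(t)$, this also gives $h(t) = -\frac{d}{dt}\ln S(t)$; integrating from 0 to $t$ with $S(0) = 1$ yields $H(t) = -\ln S(t)$, i.e. $S(t) = \exp(-H(t))$.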

MRL proof

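By definition of conditional probability, for a continuous random variable the conditional density of $T$ given $T > t$ is:

$$f(u \mid T > t) = \frac{f(u)}{S(t)}, \quad u > t$$

Let $R = T - t$ (the residual life), so that $mrl(t) = E[R \mid T > t]$. So, by definition of expectation for a continuous random variable,

$$mrl(t) = \int_t^\infty (u - t)\,\frac{f(u)}{S(t)}\,du = \frac{1}{S(t)} \int_t^\infty (u - t) f(u)\,du$$

Integration by parts with $w = u - t$, $dv = f(u)\,du$ (so $dw = du$, $v = -S(u)$):

$$\int_t^\infty (u - t) f(u)\,du = \Big[-(u - t) S(u)\Big]_t^\infty + \int_t^\infty S(u)\,du$$

The boundary term: at $u = t$, $(u - t)S(u) = 0$. As $u \to \infty$, $(u - t)S(u) \to 0$ (assuming a finite mean). So the boundary term vanishes, leaving:

$$\int_t^\infty (u - t) f(u)\,du = \int_t^\infty S(u)\,du$$

Therefore:

$$mrl(t) = \frac{\int_t^\infty S(u)\,du}{S(t)}$$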
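A quick numerical check of the result, assuming an exponential model with an illustrative rate $\lambda = 0.5$: the formula should return $1/\lambda = 2$ at every $t$ (the lack-of-memory property from Week 3). The integral to infinity is truncated at a cutoff chosen so that $S$ is negligible beyond it; the helper name is hypothetical.

```python
import math
from typing import Callable


def mean_residual_life(
    survival: Callable[[float], float], t: float, upper: float, steps: int = 100_000
) -> float:
    """mrl(t) = (integral of S(u) from t to infinity) / S(t).

    The integral is truncated at `upper` (assumed large enough that S is
    negligible beyond it) and evaluated with the trapezoidal rule.
    """
    width = (upper - t) / steps
    area = 0.5 * (survival(t) + survival(upper))
    area += sum(survival(t + k * width) for k in range(1, steps))
    return area * width / survival(t)


LAM = 0.5  # illustrative exponential rate


def exp_survival(u: float) -> float:
    return math.exp(-LAM * u)


# Lack of memory: for the exponential, mrl(t) = 1 / LAM = 2 at every t
for t in (0.0, 1.0, 5.0):
    print(t, mean_residual_life(exp_survival, t, upper=t + 60.0))
```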


Week 3: Parametric Models

Exponential Distribution

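Concept: Simplest model; assumes a constant hazard rate. Has the “lack of memory” property: $P(T > t + s \mid T > t) = P(T > s)$.

Formulas (for $t \ge 0$, $\lambda > 0$):

$$f(t) = \lambda e^{-\lambda t}, \quad S(t) = e^{-\lambda t}, \quad h(t) = \lambda, \quad H(t) = \lambda t$$

$$E[T] = mrl(t) = \frac{1}{\lambda}, \quad \text{median} = \frac{\ln 2}{\lambda}$$

Notice that $h(t) = \lambda$ does not depend on $t$. We can interpret $\lambda$ through the definition of the hazard function: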

Rate of the event occurring per unit time (t), among those who haven’t experienced the event yet.
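A small check of both exponential properties, with an illustrative rate $\lambda = 0.3$ (an assumption, not from the notes): the ratio $f(t)/S(t)$ is the same at every $t$, and $S(t + s)/S(t) = S(s)$.

```python
import math

LAM = 0.3  # illustrative rate


def survival(t: float) -> float:
    return math.exp(-LAM * t)


def density(t: float) -> float:
    return LAM * math.exp(-LAM * t)


# Constant hazard: h(t) = f(t) / S(t) equals LAM at every t
hazards = [density(t) / survival(t) for t in (0.5, 2.0, 10.0)]

# Lack of memory: P(T > t + s | T > t) = S(t + s) / S(t) equals S(s)
t, s = 4.0, 1.5
conditional = survival(t + s) / survival(t)
print(hazards)
print(conditional, survival(s))
```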

Weibull Distribution

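Concept: Generalization of the exponential with shape parameter $\alpha$. Can model increasing, decreasing, or constant hazard.

Formulas (for $t \ge 0$, shape $\alpha > 0$, scale $\lambda > 0$):

$$S(t) = \exp(-\lambda t^\alpha), \quad h(t) = \alpha \lambda t^{\alpha - 1}, \quad H(t) = \lambda t^\alpha, \quad f(t) = \alpha \lambda t^{\alpha - 1} \exp(-\lambda t^\alpha)$$

  • $\alpha < 1$: Decreasing hazard
  • $\alpha = 1$: Constant hazard (reduces to exponential)
  • $\alpha > 1$: Increasing hazard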

Gamma Distribution

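Concept: Another generalization of the exponential, with shape $\beta > 0$ and scale $\lambda > 0$ ($\beta = 1$ gives the exponential).

Formulas (for $t \ge 0$):

$$f(t) = \frac{\lambda^\beta t^{\beta - 1} e^{-\lambda t}}{\Gamma(\beta)}, \quad S(t) = 1 - I(\lambda t, \beta), \quad E[T] = \frac{\beta}{\lambda}$$

where $I(x, \beta) = \frac{1}{\Gamma(\beta)}\int_0^x u^{\beta - 1} e^{-u}\,du$ is the (regularized) incomplete gamma function. The hazard $h(t) = f(t)/S(t)$ has no closed form; it is increasing for $\beta > 1$, decreasing for $\beta < 1$, constant for $\beta = 1$, and tends to $\lambda$ as $t \to \infty$.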
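Because $S(t)$ involves the incomplete gamma function, a dependency-free sketch is easiest for the special case of an integer shape $\beta$ (the Erlang distribution), where $S(t) = \sum_{k=0}^{\beta - 1} e^{-\lambda t} (\lambda t)^k / k!$. The restriction to integer $\beta$, the parameter values, and the helper name are assumptions for illustration only. It shows the constant hazard for $\beta = 1$ and an increasing hazard for $\beta = 3$.

```python
import math


def gamma_integer_shape(t: float, beta: int, lam: float) -> tuple[float, float, float]:
    """Return (f(t), S(t), h(t)) for a gamma model with integer shape beta >= 1.

    Assumption: integer beta only (the Erlang case), where
    S(t) = sum_{k=0}^{beta-1} exp(-lam*t) * (lam*t)**k / k!.
    Non-integer beta needs the incomplete gamma function from a numerical library.
    """
    x = lam * t
    f = lam ** beta * t ** (beta - 1) * math.exp(-x) / math.factorial(beta - 1)
    s = sum(math.exp(-x) * x ** k / math.factorial(k) for k in range(beta))
    return f, s, f / s


# beta = 1 is the exponential (constant hazard = lam);
# beta = 3 gives an increasing hazard that approaches lam from below.
times = (0.5, 2.0, 8.0, 30.0)
h_beta1 = [gamma_integer_shape(t, beta=1, lam=0.5)[2] for t in times]
h_beta3 = [gamma_integer_shape(t, beta=3, lam=0.5)[2] for t in times]
print(h_beta1)
print(h_beta3)
```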

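Cheatsheet

| Model | S(t) | h(t) | Mean |
|---|---|---|---|
| Exp | $e^{-\lambda t}$ | $\lambda$ | $1/\lambda$ |
| Weibull | $\exp(-\lambda t^\alpha)$ | $\alpha \lambda t^{\alpha - 1}$ | $\Gamma(1 + 1/\alpha) / \lambda^{1/\alpha}$ |
| Gamma | $1 - I(\lambda t, \beta)$ | $f(t)/S(t)$ (no closed form) | $\beta/\lambda$ |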


Example

Case: Risk of getting sick during flu season. Hazard increases over time as the season peaks.

Use Weibull with $\alpha = 2$, $\lambda = 0.001$ (increasing hazard), so $h(t) = 0.002\,t$ and $H(t) = 0.001\,t^2$.

| t (days) | h(t) |
|---|---|
| 10 | 0.02 |
| 30 | 0.06 |
| 60 | 0.12 |

Interpretation of h(t): On day 10, survivors (still-healthy people) are getting sick at a rate of 0.02 per day. By day 60, that rate has jumped to 0.12 — the flu season is peaking, risk is much higher.

| t (days) | H(t) |
|---|---|
| 10 | 0.1 |
| 30 | 0.9 |
| 60 | 3.6 |

Interpretation of H(t): By day 30 you’ve accumulated 0.9 units of total sickness risk, nearly 1 full unit. By day 60 it’s 3.6, and the survival probability is $S(60) = e^{-3.6} \approx 0.027$, meaning only 2.7% of people have avoided getting sick by day 60.
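The tables can be reproduced from the Weibull formulas, assuming the parametrization $S(t) = \exp(-\lambda t^\alpha)$ with $\alpha = 2$ and $\lambda = 0.001$ (the values that match both tables). A minimal sketch:

```python
import math

# Flu example, assuming S(t) = exp(-LAM * t**ALPHA) with the values matching the tables
ALPHA, LAM = 2.0, 0.001


def hazard(t: float) -> float:
    return ALPHA * LAM * t ** (ALPHA - 1)


def cumulative_hazard(t: float) -> float:
    return LAM * t ** ALPHA


for day in (10, 30, 60):
    h, big_h = hazard(day), cumulative_hazard(day)
    print(f"day {day:2d}: h = {h:.2f}, H = {big_h:.1f}, S = {math.exp(-big_h):.3f}")
```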


Week 4: Censoring and Truncation

Definitions

Censoring (Penyensoran): When the exact survival time is not fully observed — only partial information is available. The subject has not experienced the event by the end of study, or is lost to follow-up before the event occurs.

In short: the subject is in the data, but information about its exact event time is incomplete.

Truncation (Pemancungan): When subjects are only observed if their event time falls within a certain window. Those whose event occurs outside the observation window are not included in the study at all.

Concepts

Types of Censoring:

  1. Right Censoring: Event time is beyond a certain point (study ends, subject drops out)

    • Type I: Study ends at fixed time
    • Type II: Study ends when a pre-specified number $r$ of events have occurred
    • Random/Progressive: Subject lost to follow-up
  2. Left Censoring: Event occurred before study started but exact time unknown

  3. Interval Censoring: Event known to occur within an interval

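| Type | What we know | What we don’t know |
|---|---|---|
| Right censored | Survived until time $C_r$ ($T > C_r$) | If/when the event occurred after $C_r$ |
| Left censored | Event occurred before the study ($T \le C_l$) | Exactly when |
| Interval censored | Event occurred in $(L, R]$ | Exact time |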

Types of Truncation:

  1. Left Truncation: Subject enters study after time 0 (at entry time $Y_L$); only observed if $T \ge Y_L$
  2. Right Truncation: Only subjects whose event has already occurred by time $Y_R$ are included; only observed if $T \le Y_R$

Formulas

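Likelihood for right-censored data:

$$L = \prod_{i=1}^n f(t_i)^{\delta_i}\, S(t_i)^{1 - \delta_i}$$

where $\delta_i$ is the event indicator:

  • $\delta_i = 1$ if the event is observed at time $t_i$
  • $\delta_i = 0$ if censored at time $t_i$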

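Nelson-Aalen & Kaplan-Meier estimators:

$$\tilde H(t) = \sum_{t_i \le t} \frac{d_i}{n_i}, \qquad \hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$

where:

  • $d_i$ = number of events at time $t_i$
  • $n_i$ = number at risk just before $t_i$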

Examples

  1. Clinical Trial: 30 patients treated for heart disease, observed for 6 years. Only 10 had strokes during the study. The other 20 are right-censored (type I) — we know they survived at least 6 years but don’t know when (or if) they will have a stroke.

  2. Carcinogen Study: 40 mice injected with carcinogen, observed until 25 show disease symptoms. The remaining 15 mice are right-censored (type II) — they may develop disease later but we stopped before observing it.

  3. Survey: Children asked when they started using gadgets. Some cannot remember exact time (left-censored), some started during the study (observed), some haven’t started yet (right-censored (random/progressive)).


Key point: Censoring and truncation affect the likelihood function and require special statistical methods (non-parametric like Kaplan-Meier, or semi-parametric like Cox proportional hazards) to be analyzed properly.


Week 5: Advanced Censoring & Truncation

Detailed Censoring (Penyensoran)

Censoring occurs when we have only partial information about the exact survival time.

Types of Right Censoring (Penyensoran Kanan):

  1. Type I (Time Censoring): Study ends at a pre-determined time set by the researcher.

    • Fixed: All subjects stop at the same time (e.g., mice observed for exactly 14 days).
    • Progressive: Different fixed censoring times are assigned at the start.
    • Generalized: Subjects enter the study at different times and are censored if the event hasn’t occurred when the study ends or they leave.
  2. Type II (Failure Censoring): Study ends when a pre-specified number of events occur among subjects.

    • Simple: Stops exactly at the $r$-th failure. The total duration of the study is a random variable.
    • Progressive: Some survivors are intentionally removed from the study at various intermediate event times.
  3. Competing Risk: Occurs when multiple types of events are possible. The occurrence of one event prevents the observation of others (e.g., in a study on leukemia, death from other causes is a competing risk for disease relapse).

Left Censoring (Penyensoran Kiri): The event occurred before a certain time $C_l$, but the exact time is unknown (e.g., a child already knows how to use a gadget before the survey starts).

Interval Censoring (Penyensoran Interval): The event is known to have occurred within a specific interval (e.g., a tumor is detected during a follow-up at 24 months, but was not present at the 18-month check-up).

Double Censoring (Penyensoran Ganda): A dataset that contains both left-censored and right-censored observations (e.g., a gadget usage survey where some children already use them, some start during the study, and some haven’t started by the end).

Truncation (Pemancungan)

Truncation is a selection mechanism by design. Only subjects who satisfy certain conditions regarding their survival time are included in the sample.

  1. Right Truncation (Pemancungan Kanan): Only subjects who have already experienced the event before a time $Y_R$ are included.

    • Condition: $T \le Y_R$
    • Example: Using historical death records to study lifespan. People who are still alive are not in the records and are excluded from the analysis.
  2. Left Truncation (Pemancungan Kiri / Delayed Entry): Only subjects who have not yet experienced the event at time $Y_L$ are included.

    • Condition: $T \ge Y_L$
    • Subjects must survive until $Y_L$ to enter the study (Delayed Entry).
    • Example:
      • Nursing Home: Studying age of death among residents. Subjects must survive long enough to enter the home. Those who die before entering are never observed.
      • Life Insurance: Policyholders must be alive at the time they sign up for the policy.

Comparison Summary

| Feature | Censoring | Truncation |
|---|---|---|
| Nature | Missing information about exact time | Selection bias by study design |
| Awareness | Researcher knows the subject exists but not the exact $T$ | Researcher may not even know the excluded subjects exist |
| Likelihood | Uses $f(t_i)$ for events, $S(t_i)$ for censored | Uses conditional probabilities given the selection criterion, e.g. $f(t_i)/S(Y_L)$ for left truncation |

Likelihood Construction Examples

Right-Censored Data

Scenario: 5 patients in a clinical trial. 3 die at times 2, 5, 8; 2 are censored at times 3, 6.

Step-by-step construction:

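  1. For event times ($t_i = 2, 5, 8$): contribute $f(t_i)$ to the likelihood
  2. For censored times ($t_i = 3, 6$): contribute $S(t_i)$ to the likelihood

For the exponential model with constant hazard $\lambda$:

$$L(\lambda) = \prod_{t_i \in \{2, 5, 8\}} \lambda e^{-\lambda t_i} \prod_{t_i \in \{3, 6\}} e^{-\lambda t_i} = \lambda^3 e^{-\lambda (2 + 5 + 8 + 3 + 6)} = \lambda^3 e^{-24 \lambda}$$

To find the MLE: $\dfrac{d}{d\lambda} \ln L(\lambda) = \dfrac{3}{\lambda} - 24 = 0$, then $\hat\lambda = \dfrac{3}{24} = 0.125$ (number of events divided by total time on test).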
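The closed-form MLE can be cross-checked numerically. This sketch writes out the log-likelihood $\ln L(\lambda) = \sum_{\text{events}} \ln f(t_i) + \sum_{\text{censored}} \ln S(t_i)$ for the five patients and compares $d / TTT$ with a crude grid search; the grid resolution is an arbitrary choice.

```python
import math

event_times = [2.0, 5.0, 8.0]  # deaths, delta_i = 1
censored_times = [3.0, 6.0]    # censored, delta_i = 0


def log_likelihood(lam: float) -> float:
    """ln L(lam): ln f(t) = ln(lam) - lam*t for events, ln S(t) = -lam*t for censored."""
    from_events = sum(math.log(lam) - lam * t for t in event_times)
    from_censored = sum(-lam * t for t in censored_times)
    return from_events + from_censored


d = len(event_times)                          # number of observed events (3)
ttt = sum(event_times) + sum(censored_times)  # total time on test (24)
lam_hat = d / ttt                             # closed-form MLE

# Cross-check with a crude grid search over lambda in (0, 1)
grid = [k / 1000 for k in range(1, 1000)]
lam_grid = max(grid, key=log_likelihood)
print(lam_hat, lam_grid)
```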

Left-Truncated Data

Scenario: Nursing home study. Subject enters at age 70, dies at age 85.

For left-truncated data with entry time $Y_L$:

  • Must survive until $Y_L$ to be observed
  • Contribution: $\dfrac{f(t_i)}{S(Y_L)}$ (conditional on surviving until entry); for this subject, $\dfrac{f(85)}{S(70)}$

Combined Left-Truncated & Right-Censored

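For subject $i$ who enters at $Y_{L,i}$ and has event/censoring at $t_i$ (event indicator $\delta_i$):

$$L_i = \frac{f(t_i)^{\delta_i}\, S(t_i)^{1 - \delta_i}}{S(Y_{L,i})}$$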
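A minimal sketch of this likelihood, with hypothetical helper names and made-up nursing-home style data (ages at entry and at death or censoring). For the exponential model, the factor $1/S(Y_{L,i})$ cancels the time before entry, so the MLE becomes $\sum \delta_i / \sum (t_i - Y_{L,i})$; the sketch compares it with a naive estimate that ignores the truncation.

```python
import math
from typing import Callable


def ltrc_log_likelihood(
    entry: list[float],
    exit_: list[float],
    delta: list[int],
    log_f: Callable[[float], float],
    log_s: Callable[[float], float],
) -> float:
    """Sum over subjects of delta*ln f(t) + (1 - delta)*ln S(t) - ln S(entry)."""
    total = 0.0
    for y, t, d in zip(entry, exit_, delta):
        total += (log_f(t) if d else log_s(t)) - log_s(y)
    return total


# Hypothetical data (ages): entry, exit (death or censoring), event indicator
entry = [70.0, 68.0, 75.0, 72.0]
exit_ = [85.0, 80.0, 78.0, 90.0]
delta = [1, 0, 1, 1]


def exp_loglik(lam: float) -> float:
    # Exponential model: ln f(t) = ln(lam) - lam*t, ln S(t) = -lam*t
    return ltrc_log_likelihood(
        entry, exit_, delta,
        log_f=lambda t: math.log(lam) - lam * t,
        log_s=lambda t: -lam * t,
    )


# Exponential MLE: events / total time observed after entry
lam_hat = sum(delta) / sum(t - y for y, t in zip(entry, exit_))
# Ignoring truncation (as if everyone were followed from age 0) gives a far smaller rate
lam_naive = sum(delta) / sum(exit_)
print(lam_hat, lam_naive, exp_loglik(lam_hat))
```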


Week 6: Non-Parametric Estimation (Penaksiran Non-Parametrik)

Estimating the Survival Function (Kaplan-Meier Approach)

The Kaplan-Meier estimator builds the survival curve step-by-step, calculating the conditional probability of surviving past each observed event time.

Timeline and Definitions:

  • $n$: Total number of subjects at the start (time $t_0 = 0$).
  • $n_i$: Number of individuals at risk just before time $t_i$.
  • $d_i$: Number of events (failures/deaths) that occur at time $t_i$.
  • $c_i$: Number of censored individuals between $t_i$ and $t_{i+1}$.

Step-by-step Intuition:

  1. At time $t_0 = 0$: Everyone is alive, so $\hat S(0) = 1$.
  2. At time $t_1$: There are $n_1$ people at risk. $d_1$ people experience the event. The probability of surviving past $t_1$ given survival up to $t_1$ is $1 - \frac{d_1}{n_1}$.
  3. At time $t_2$: There are $n_2$ people at risk. Notice that $n_2$ drops not just because of the deaths $d_1$, but also because of any censored observations $c_1$ that occurred between $t_1$ and $t_2$. So, $n_2 = n_1 - d_1 - c_1$. $d_2$ people experience the event. The conditional survival probability is $1 - \frac{d_2}{n_2}$.

General Kaplan-Meier Formula:

$$\hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$
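A minimal Python sketch of this product-limit computation, assuming event indicators coded 1 (event) / 0 (censored) and the usual convention that subjects censored at $t_i$ are still at risk at $t_i$. `kaplan_meier` is a hypothetical helper, not a library function; it is applied to the Week 5 data (deaths at 2, 5, 8; censored at 3, 6).

```python
from collections import Counter


def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return [(t_i, S_hat(t_i))] at each distinct event time t_i.

    events[j] is 1 for an observed event and 0 for a censored time. Subjects
    censored at t_i are still counted as at risk at t_i.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)  # d_i per event time
    leaving = Counter(times)  # everyone whose observation ends at t (events + censored)
    at_risk = len(times)      # nobody has left before the first time
    s_hat = 1.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            s_hat *= 1 - deaths[t] / at_risk  # conditional survival past t_i
            curve.append((t, s_hat))
        at_risk -= leaving[t]  # n_{i+1} = n_i - d_i - c_i
    return curve


# Week 5 data: deaths at 2, 5, 8; censored at 3, 6
print(kaplan_meier([2, 3, 5, 6, 8], [1, 0, 1, 0, 1]))
```

For real analyses, established packages (e.g. R's `survival` or Python's `lifelines`) provide tested implementations.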

Estimating the Cumulative Hazard (Nelson-Aalen Approach)

The Nelson-Aalen estimator calculates the cumulative hazard by adding up the estimated hazard $d_i/n_i$ at each event time.

Step-by-step Intuition:

  1. At time $t_1$: Out of $n_1$ people at risk, $d_1$ fail. The hazard rate is $\frac{d_1}{n_1}$.
  2. At time $t_2$: Out of $n_2$ people at risk, $d_2$ fail. The hazard rate is $\frac{d_2}{n_2}$.

Cumulative Hazard up to $t_2$: Simply sum the individual hazards: $\tilde H(t_2) = \frac{d_1}{n_1} + \frac{d_2}{n_2}$

General Nelson-Aalen Formula:

$$\tilde H(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$
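The same bookkeeping as Kaplan-Meier gives the Nelson-Aalen estimate; only the update changes from a product of $(1 - d_i/n_i)$ to a sum of $d_i/n_i$. A minimal sketch with a hypothetical helper, on the Week 5 data (deaths at 2, 5, 8; censored at 3, 6):

```python
from collections import Counter


def nelson_aalen(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Return [(t_i, H_tilde(t_i))], where H_tilde(t) = sum over t_i <= t of d_i / n_i."""
    deaths = Counter(t for t, e in zip(times, events) if e)  # d_i per event time
    leaving = Counter(times)  # events + censored ending at each time
    at_risk = len(times)
    h_tilde = 0.0
    curve = []
    for t in sorted(leaving):
        if deaths[t]:
            h_tilde += deaths[t] / at_risk  # hazard increment d_i / n_i
            curve.append((t, h_tilde))
        at_risk -= leaving[t]  # censored at t were still at risk at t
    return curve


print(nelson_aalen([2, 3, 5, 6, 8], [1, 0, 1, 0, 1]))
```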

Variance Estimation: Greenwood’s Formula

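To estimate uncertainty in $\hat S(t)$, use Greenwood’s formula:

$$\widehat{Var}\left[\hat S(t)\right] = \hat S(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$

Intuition: Kaplan-Meier is a product of conditional probabilities, so $\ln \hat S(t)$ is a sum of terms $\ln(1 - d_i/n_i)$. Treating these terms as independent, the delta method gives the variance of $\ln \hat S(t)$ as the sum of their variances, $\sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$; a second application of the delta method converts this back to the scale of $\hat S(t)$.

Simplified form when $d_i = 1$ (no ties):

$$\widehat{Var}\left[\hat S(t)\right] = \hat S(t)^2 \sum_{t_i \le t} \frac{1}{n_i (n_i - 1)}$$

For Nelson-Aalen:

$$\widehat{Var}\left[\tilde H(t)\right] = \sum_{t_i \le t} \frac{d_i}{n_i^2}$$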
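A sketch that accumulates both variance formulas alongside the estimates. The data and the helper name are hypothetical (tied times at $t = 4$, one event and one censored); if $\hat S$ drops to 0 because $d_i = n_i$, the Greenwood term is undefined and the sketch reports NaN.

```python
from collections import Counter


def km_na_with_variances(times: list[float], events: list[int]) -> list[tuple]:
    """Return rows (t_i, S_hat, Var[S_hat], H_tilde, Var[H_tilde]) at each event time.

    Greenwood: Var[S_hat(t)] = S_hat(t)^2 * sum d_i / (n_i (n_i - d_i)).
    Nelson-Aalen: Var[H_tilde(t)] = sum d_i / n_i^2.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)
    at_risk = len(times)
    s_hat, greenwood_sum, h_tilde, var_h = 1.0, 0.0, 0.0, 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            s_hat *= 1 - d / at_risk
            h_tilde += d / at_risk
            var_h += d / at_risk ** 2
            if d < at_risk:
                greenwood_sum += d / (at_risk * (at_risk - d))
                var_s = s_hat ** 2 * greenwood_sum
            else:
                var_s = float("nan")  # S_hat hits 0: Greenwood term undefined
            rows.append((t, s_hat, var_s, h_tilde, var_h))
        at_risk -= leaving[t]
    return rows


# Hypothetical data: 1 = event, 0 = censored
times = [3, 4, 4, 6, 7, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0]
for row in km_na_with_variances(times, events):
    print(row)
```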


Confidence Intervals

Pointwise CI for

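Linear scale (can go below 0 or above 1, not recommended):

$$\hat S(t) \pm z_{1-\alpha/2} \sqrt{\widehat{Var}\left[\hat S(t)\right]}$$

Log-log transform (recommended, keeps the interval within $[0, 1]$):

$$\left[\hat S(t)^{1/\theta},\ \hat S(t)^{\theta}\right]$$

where

$$\theta = \exp\left\{\frac{z_{1-\alpha/2}\, \sigma_S(t)}{\ln \hat S(t)}\right\}, \qquad \sigma_S^2(t) = \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$

For a 95% CI: $z_{0.975} = 1.96$.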
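A minimal sketch of the log-log interval, built on the Greenwood sum $\sigma_S^2(t)$ and hypothetical data; the helper name is an assumption. It stops if $\hat S$ reaches 0, where $\ln \hat S(t)$ is undefined. Since $0 < \theta < 1$ whenever $0 < \hat S(t) < 1$, every interval it returns stays inside $(0, 1)$.

```python
import math
from collections import Counter


def km_loglog_ci(times: list[float], events: list[int], z: float = 1.96) -> list[tuple]:
    """Return rows (t_i, S_hat, lower, upper) with the log-log interval
    [S_hat^(1/theta), S_hat^theta], theta = exp(z * sigma_S / ln S_hat),
    sigma_S^2 = sum over t_i <= t of d_i / (n_i (n_i - d_i)).
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)
    at_risk = len(times)
    s_hat, sigma2 = 1.0, 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            if d == at_risk:  # S_hat would drop to 0: interval undefined, stop
                break
            s_hat *= 1 - d / at_risk
            sigma2 += d / (at_risk * (at_risk - d))
            theta = math.exp(z * math.sqrt(sigma2) / math.log(s_hat))
            rows.append((t, s_hat, s_hat ** (1 / theta), s_hat ** theta))
        at_risk -= leaving[t]
    return rows


# Hypothetical data: 1 = event, 0 = censored
times = [3, 4, 4, 6, 7, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0]
for t, s, lo, hi in km_loglog_ci(times, events):
    print(f"t = {t}: S = {s:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```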


Summary: Non-Parametric Estimators

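| Estimator | Formula | Variance |
|---|---|---|
| Kaplan-Meier | $\hat S(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$ | Greenwood: $\hat S(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$ |
| Nelson-Aalen | $\tilde H(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$ | $\sum_{t_i \le t} \frac{d_i}{n_i^2}$ |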

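Key relationship: $\hat S(t) \approx \exp\left(-\tilde H(t)\right)$, the empirical counterpart of $S(t) = \exp(-H(t))$; the two agree closely when each $d_i/n_i$ is small.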
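Both estimators can be computed in one pass to compare $\hat S(t)$ with $\exp(-\tilde H(t))$. Since $e^{-x} \ge 1 - x$, each factor $e^{-d_i/n_i}$ is at least $1 - d_i/n_i$, so the Nelson-Aalen based estimate is never below Kaplan-Meier; the gap grows when $d_i/n_i$ is large (small risk sets late in follow-up). A minimal sketch with hypothetical data and a hypothetical helper name:

```python
import math
from collections import Counter


def km_and_na(times: list[float], events: list[int]) -> list[tuple[float, float, float]]:
    """Return rows (t_i, S_km, exp(-H_na)) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)
    at_risk = len(times)
    s_km, h_na = 1.0, 0.0
    rows = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            s_km *= 1 - d / at_risk  # Kaplan-Meier factor
            h_na += d / at_risk      # Nelson-Aalen increment
            rows.append((t, s_km, math.exp(-h_na)))
        at_risk -= leaving[t]
    return rows


# Hypothetical data: 1 = event, 0 = censored
times = [3, 4, 4, 6, 7, 9, 10]
events = [1, 1, 0, 1, 0, 1, 0]
for t, km, na in km_and_na(times, events):
    print(f"t = {t}: KM = {km:.3f}, exp(-NA) = {na:.3f}")
```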