Week 1: Introduction to Survival Analysis
Definitions
Survival Time (T): Time until an event occurs; also called time-to-event, death time, or reliability time
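Censored Data: Incompletely observed data where the exact event time is unknown, for example because:
- The event had not occurred by the end of the study
- The subject could no longer be followed (lost to follow-up) before the end of the study
Truncated Data: Data in which subjects are included only if their event time falls within a certain window, e.g. left truncation (subjects enter the study after time 0)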
Concept
Survival analysis studies time-to-event data. Key components:
- Origin: Start of observation time
- Scale: Time measurement unit (days, months, hours)
- Event: The endpoint being measured (death, failure, relapse)
Week 2: Basic Quantities of Survival Distribution
Definitions
Survival Function S(t): $S(t) = P(T > t)$, the probability of surviving beyond time t
Recall the definition of random variable T here
Hazard Function h(t): Instantaneous failure rate at time t given survival until t
Cumulative Hazard H(t): $H(t) = \int_0^t h(u)\,du$, the total hazard accumulated up to time t
Mean Residual Life mrl(t): $mrl(t) = E[T - t \mid T > t]$, the expected remaining time after surviving to t
Median Life: Time where 50% of subjects have experienced the event
Concepts
S(t) properties:
- Monotonically non-increasing
- $S(0) = 1$, $\lim_{t \to \infty} S(t) = 0$ (starts at 1, approaches 0)
h(t) properties: Can be increasing, decreasing, constant, or bathtub-shaped
Relationship: $h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt} \ln S(t)$ and $S(t) = \exp(-H(t))$
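The two standard relationships, $h(t) = f(t)/S(t)$ and $S(t) = \exp(-H(t))$, can be checked numerically. The following is a minimal sketch, assuming a Weibull-type model with $S(t) = \exp(-\lambda t^\alpha)$; the parameter values and function names are illustrative choices, not part of these notes.

```python
import math

# Illustrative parameters (assumed for this sketch)
LAM, ALPHA = 0.001, 2.0

def survival(t):
    """S(t) = exp(-lam * t^alpha)."""
    return math.exp(-LAM * t ** ALPHA)

def density(t):
    """f(t) = -dS/dt = lam * alpha * t^(alpha - 1) * S(t)."""
    return LAM * ALPHA * t ** (ALPHA - 1) * survival(t)

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survival(t)

def cumulative_hazard(t, steps=10_000):
    """H(t) = integral of h(u) du over [0, t], by the trapezoidal rule."""
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(k * width) for k in range(1, steps))
    return total * width

t = 30.0
print(hazard(t))                         # hazard from f(t) / S(t)
print(math.exp(-cumulative_hazard(t)))   # should agree with S(t)
print(survival(t))
```

The last two printed values should agree up to numerical integration error, which is what $S(t) = \exp(-H(t))$ states.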
Hazard function
Rate of the event occurring per unit time (t), among those who haven’t experienced the event yet.
- A high h(t) at some time t means: among those still alive at t, failures are happening rapidly.
- A low h(t) means failures are rare at that moment.
The “instantaneous” part means you’re shrinking the window to a single point in time rather than measuring over a finite interval (like survival function).
100 light bulbs. All running at time t=1000 hours. 5 fail between 1000 and 1001 hours.
At t=1000, survivors are failing at a rate of 0.05 per hour.
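The light-bulb figure comes from approximating the instantaneous rate with a short window: failures divided by (number at risk × window width). A small sketch of that arithmetic, with a hypothetical helper name:

```python
def interval_hazard(at_risk, failures, width):
    """Approximate hazard: failures per unit at risk per unit of time."""
    return failures / (at_risk * width)

# 100 bulbs running at t = 1000 hours, 5 fail within the next hour
rate = interval_hazard(at_risk=100, failures=5, width=1.0)
print(rate)  # 0.05 per hour
```

Shrinking `width` toward zero (with correspondingly finer failure counts) is what turns this interval rate into the instantaneous hazard.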
Survival vs hazard function
- Survival describes the subject’s survivability for an event (event does not occur)
- Conversely, hazard describes the subject’s risk of NOT surviving (event occurs)
Cumulative hazard: Accumulated risk over time.
Driving a car: every hour you drive, you face some hazard of an accident. H(t) is the total risk you’ve accumulated after t hours of driving. Even if the hourly risk is small, H(t) keeps growing the longer you drive.
Mean residual life
Try interpreting $E[T - t \mid T > t]$ (given $T > t$):
- $T - t$: How much time remains from now ($t$) until the event ($T$).
- $E[T - t]$: Mean of time left (survival time)
It’s basically the mean of remaining time left before event occurs
Formulas
Total Time on Test (TTT): Cumulative operating time until the last failure, i.e. the sum of all observed times (both event and censored)
$$TTT = \sum_{i=1}^n t_i$$
where $t_i$ is the failure or censoring time for subject $i$.
Interpretation
TTT represents the total “machine-hours” or “person-years” that subjects contributed to the study. It’s used in:
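- TTT Plot: A diagnostic tool for the shape of the hazard. The scaled TTT plot lies near the diagonal under a constant hazard (exponential), while a concave or convex curve suggests an increasing or decreasing hazard, which helps judge whether an exponential or Weibull model is plausible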
- Exponential MLE: with $d$ observed events, the maximum likelihood estimate of the rate is $\hat{\lambda} = d / TTT$
Hazard function relationship proof
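Start from the definition of the hazard and apply the definition of conditional probability, with $F(t) = 1 - S(t)$:
$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t}$$
The limit is by definition $f(t)$:
$$h(t) = \frac{f(t)}{S(t)} = -\frac{S'(t)}{S(t)} = -\frac{d}{dt} \ln S(t)$$
Integrating both sides from 0 to $t$ with $S(0) = 1$ gives $H(t) = -\ln S(t)$, i.e. $S(t) = \exp(-H(t))$.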
MRL proof
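By definition of Conditional Probability. For a Continuous Random Variable, the conditional density of $T$ given $T > t$ is:
$$f(x \mid T > t) = \frac{f(x)}{S(t)}, \quad x > t$$
Let $x$ be a value of $T$ with $x > t$, then the remaining time is $x - t$. So, by definition of Expectation for continuous random variable,
$$mrl(t) = E[T - t \mid T > t] = \int_t^\infty (x - t)\,\frac{f(x)}{S(t)}\,dx = \frac{1}{S(t)} \int_t^\infty (x - t)\,f(x)\,dx$$
Integration by parts with $u = x - t$, $dv = f(x)\,dx$ (so $du = dx$, $v = -S(x)$):
$$\int_t^\infty (x - t)\,f(x)\,dx = \Big[-(x - t)\,S(x)\Big]_t^\infty + \int_t^\infty S(x)\,dx$$
The boundary term: at $x = t$, $(x - t)\,S(x) = 0$. At $x \to \infty$, $(x - t)\,S(x) \to 0$ (assuming finite mean). So the boundary term vanishes, leaving:
$$\int_t^\infty (x - t)\,f(x)\,dx = \int_t^\infty S(x)\,dx$$
Therefore:
$$mrl(t) = \frac{\int_t^\infty S(x)\,dx}{S(t)}$$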
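The result $mrl(t) = \int_t^\infty S(x)\,dx \,/\, S(t)$ can be sanity-checked numerically. This sketch assumes an exponential survival function with an illustrative rate, for which the mean residual life should equal $1/\lambda$ at every $t$; the helper names and the finite upper limit standing in for infinity are assumptions of the sketch.

```python
import math

def mean_residual_life(survival, t, upper, steps=100_000):
    """mrl(t) = (integral of S(x) dx over [t, upper]) / S(t), trapezoidal rule.

    `upper` stands in for infinity and must be large enough that S(upper) is ~0.
    """
    width = (upper - t) / steps
    area = 0.5 * (survival(t) + survival(upper))
    area += sum(survival(t + k * width) for k in range(1, steps))
    return area * width / survival(t)

LAM = 0.125  # illustrative rate (assumed)

def exp_survival(x):
    """S(x) = exp(-lam * x) for the exponential model."""
    return math.exp(-LAM * x)

print(mean_residual_life(exp_survival, t=0.0, upper=400.0))    # close to 1 / LAM = 8
print(mean_residual_life(exp_survival, t=10.0, upper=410.0))   # also close to 8
```

Both values being close to 8 reflects the fact that, under a constant hazard, the expected remaining time does not depend on how long the subject has already survived.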
Week 3: Parametric Models
Exponential Distribution
Concept: Simplest model; assumes constant hazard rate. Has “lack of memory” property.
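Formulas (for $\lambda > 0$, $t \ge 0$):
$$f(t) = \lambda e^{-\lambda t}, \quad S(t) = e^{-\lambda t}, \quad h(t) = \lambda, \quad H(t) = \lambda t$$
Notice that $h(t) = \lambda$. We can interpret $\lambda$ using the definition of the hazard function: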
Rate of the event occurring per unit time (t), among those who haven’t experienced the event yet.
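The "lack of memory" property says that $P(T > s + t \mid T > s) = P(T > t)$: having already survived $s$ units does not change the chance of surviving $t$ more. A short numerical check, with an illustrative rate and time points chosen only for this sketch:

```python
import math

def exp_survival(t, lam):
    """S(t) = exp(-lam * t) for the exponential distribution."""
    return math.exp(-lam * t)

lam = 0.125          # illustrative rate (assumed)
s, t = 4.0, 6.0      # already survived s, asking about t more

conditional = exp_survival(s + t, lam) / exp_survival(s, lam)   # P(T > s + t | T > s)
unconditional = exp_survival(t, lam)                            # P(T > t)

print(conditional, unconditional)  # the two values agree
```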
Weibull Distribution
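Concept: Generalization of exponential with shape parameter $\alpha$. Can model increasing, decreasing, or constant hazard.
Formulas (for $\alpha > 0$, $\lambda > 0$, $t \ge 0$):
$$S(t) = \exp(-\lambda t^\alpha), \quad h(t) = \lambda \alpha t^{\alpha - 1}, \quad H(t) = \lambda t^\alpha, \quad f(t) = \lambda \alpha t^{\alpha - 1} \exp(-\lambda t^\alpha)$$
- $\alpha < 1$: Decreasing hazard
- $\alpha = 1$: Constant hazard (reduces to exponential)
- $\alpha > 1$: Increasing hazard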
Gamma Distribution
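Concept: Another generalization of exponential, with shape parameter $\beta$; it reduces to the exponential when $\beta = 1$.
Formulas (for $\beta > 0$, $\lambda > 0$, $t \ge 0$):
$$f(t) = \frac{\lambda^\beta t^{\beta - 1} e^{-\lambda t}}{\Gamma(\beta)}, \quad S(t) = 1 - I(\lambda t, \beta)$$
where $I(x, \beta) = \frac{1}{\Gamma(\beta)} \int_0^x u^{\beta - 1} e^{-u}\,du$ is the regularized incomplete gamma function. $S(t)$ and $h(t) = f(t)/S(t)$ have no closed form in general.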
Cheatsheet
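| Distribution | $f(t)$ | $S(t)$ | $h(t)$ | $H(t)$ | Mean |
|---|---|---|---|---|---|
| Exp | $\lambda e^{-\lambda t}$ | $e^{-\lambda t}$ | $\lambda$ | $\lambda t$ | $1/\lambda$ |
| Weibull | $\lambda \alpha t^{\alpha - 1} e^{-\lambda t^\alpha}$ | $e^{-\lambda t^\alpha}$ | $\lambda \alpha t^{\alpha - 1}$ | $\lambda t^\alpha$ | $\Gamma(1 + 1/\alpha) / \lambda^{1/\alpha}$ |
| Gamma | $\lambda^\beta t^{\beta - 1} e^{-\lambda t} / \Gamma(\beta)$ | $1 - I(\lambda t, \beta)$ | $f(t)/S(t)$ (no closed form) | $-\ln S(t)$ (no closed form) | $\beta / \lambda$ |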
Example
Case: Risk of getting sick during flu season. Hazard increases over time as the season peaks.
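Use Weibull with $\alpha = 2$, $\lambda = 0.001$ (increasing hazard), so $h(t) = \lambda \alpha t^{\alpha - 1} = 0.002\,t$ and $H(t) = \lambda t^\alpha = 0.001\,t^2$.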
| t (days) | h(t) |
|---|---|
| 10 | 0.02 |
| 30 | 0.06 |
| 60 | 0.12 |
Interpretation of h(t): On day 10, survivors (still-healthy people) are getting sick at a rate of 0.02 per day. By day 60, that rate has jumped to 0.12 — the flu season is peaking, risk is much higher.
| t (days) | H(t) |
|---|---|
| 10 | 0.1 |
| 30 | 0.9 |
| 60 | 3.6 |
Interpretation of H(t): By day 30 you’ve accumulated 0.9 units of total sickness risk, nearly 1 full unit. By day 60 it’s 3.6, and survival probability is $S(60) = e^{-3.6} \approx 0.027$, meaning only 2.7% of people have avoided getting sick by day 60.
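The two tables can be reproduced with a few lines of code. This sketch assumes the parameterization $h(t) = \lambda \alpha t^{\alpha - 1}$, $H(t) = \lambda t^\alpha$ with $\alpha = 2$ and $\lambda = 0.001$, values inferred from the tabulated numbers rather than stated explicitly; the function names are illustrative.

```python
import math

ALPHA, LAM = 2.0, 0.001  # shape and scale inferred from the tables (assumption)

def weibull_hazard(t):
    """h(t) = lam * alpha * t^(alpha - 1)."""
    return LAM * ALPHA * t ** (ALPHA - 1)

def weibull_cum_hazard(t):
    """H(t) = lam * t^alpha."""
    return LAM * t ** ALPHA

def weibull_survival(t):
    """S(t) = exp(-H(t))."""
    return math.exp(-weibull_cum_hazard(t))

for day in (10, 30, 60):
    print(day,
          round(weibull_hazard(day), 2),
          round(weibull_cum_hazard(day), 1),
          round(weibull_survival(day), 3))
```

The printed hazard and cumulative hazard columns match the tables, and the survival column shows how quickly the still-healthy fraction shrinks as $H(t)$ grows.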
Week 4: Censoring and Truncation
Definitions
Censoring (Penyensoran): When the exact survival time is not fully observed — only partial information is available. The subject has not experienced the event by the end of study, or is lost to follow-up before the event occurs.
Mostly about not having enough information.
Truncation (Pemancungan): When subjects are only observed if their event time falls within a certain window. Those whose event occurs outside the observation window are not included in the study at all.
Concepts
Types of Censoring:
- Right Censoring: Event time is beyond a certain point (study ends, subject drops out)
  - Type I: Study ends at fixed time
  - Type II: Study ends when a pre-specified number $r$ of events occur
  - Random/Progressive: Subject lost to follow-up
- Left Censoring: Event occurred before study started but exact time unknown
- Interval Censoring: Event known to occur within an interval
| Type | What we know | What we don’t know |
|---|---|---|
| Right censored | Survived until time $C_r$ | If/when event occurred after $C_r$ |
| Left censored | Event occurred before study | Exactly when |
| Interval censored | Event occurred in $(L, R]$ | Exact time |
Types of Truncation:
- Left Truncation: Subject enters study after time 0; only observed if $T > Y_L$, where $Y_L$ is the entry time
- Right Truncation: Subject is included only if the event has already occurred by time $Y_R$; only observed if $T < Y_R$
Formulas
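Likelihood for right-censored data:
$$L = \prod_{i=1}^n f(t_i)^{\delta_i}\,S(t_i)^{1 - \delta_i}$$
where $\delta_i$ is the event indicator:
- $\delta_i = 1$ if event observed at time $t_i$
- $\delta_i = 0$ if censored at time $t_i$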
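Nelson-Aalen & Kaplan-Meier estimators:
$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}, \quad \hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$
where:
- $d_i$ = number of events at time $t_i$
- $n_i$ = number at risk just before $t_i$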
Examples
- Clinical Trial: 30 patients treated for heart disease, observed for 6 years. Only 10 had strokes during the study. The other 20 are right-censored (type I): we know they remained stroke-free for at least 6 years but don’t know when (or if) they will have a stroke.
- Carcinogen Study: 40 mice injected with carcinogen, observed until 25 show disease symptoms. The remaining 15 mice are right-censored (type II): they may develop disease later, but we stopped before observing it.
- Survey: Children asked when they started using gadgets. Some cannot remember exact time (left-censored), some started during the study (observed), some haven’t started yet (right-censored, random/progressive).
Key point: Censoring and truncation affect the likelihood function and require special statistical methods (non-parametric like Kaplan-Meier, or semi-parametric like Cox proportional hazards) to properly analyze.
Week 5: Advanced Censoring & Truncation
Detailed Censoring (Penyensoran)
Censoring occurs when we have only partial information about the exact survival time.
Types of Right Censoring (Penyensoran Kanan):
- Type I (Time Censoring): Study ends at a pre-determined time set by the researcher.
  - Fixed: All subjects stop at the same time (e.g., mice observed for exactly 14 days).
  - Progressive: Different fixed censoring times are assigned at the start.
  - Generalized: Subjects enter the study at different times and are censored if the event hasn’t occurred when the study ends or they leave.
- Type II (Failure Censoring): Study ends when a pre-specified number of events occur among subjects.
  - Simple: Stops exactly at the $r$-th failure. The total duration of the study is a random variable.
  - Progressive: Some survivors are intentionally removed from the study at various intermediate event times.
- Competing Risk: Occurs when multiple types of events are possible. The occurrence of one event prevents the observation of others (e.g., in a study on leukemia, death from other causes is a competing risk for disease relapse).
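The difference between fixed Type I and simple Type II right censoring can be made concrete by applying each rule to the same set of underlying lifetimes. This is a minimal sketch with hypothetical data and helper names; in a real study the latent lifetimes of censored subjects are never seen.

```python
def type_i_censor(lifetimes, end_time):
    """Type I: follow-up stops at a fixed time; later events are censored."""
    return [(min(t, end_time), int(t <= end_time)) for t in lifetimes]

def type_ii_censor(lifetimes, r):
    """Type II: follow-up stops at the r-th smallest lifetime (assumes no ties)."""
    stop = sorted(lifetimes)[r - 1]
    return [(min(t, stop), int(t <= stop)) for t in lifetimes]

# Hypothetical latent lifetimes (assumed for illustration)
latent = [3.0, 9.5, 1.2, 7.8, 14.0, 5.1]

print(type_i_censor(latent, end_time=6.0))   # study length fixed, event count random
print(type_ii_censor(latent, r=4))           # event count fixed, study length random
```

Each pair is (observed time, event indicator). Under Type I the number of observed events depends on the data; under Type II it is fixed at `r` and the stopping time depends on the data.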
Left Censoring (Penyensoran Kiri): The event occurred before a certain time $C_l$, but the exact time is unknown (e.g., a child already knows how to use a gadget before the survey starts).
Interval Censoring (Penyensoran Interval): The event is known to have occurred within a specific interval $(L, R]$ (e.g., a tumor is detected during a follow-up at 24 months, but was not present at the 18-month check-up).
Double Censoring (Penyensoran Ganda): A dataset that contains both left-censored and right-censored observations (e.g., a gadget usage survey where some children already use them, some start during the study, and some haven’t started by the end).
Truncation (Pemancungan)
Truncation is a selection mechanism by design. Only subjects who satisfy certain conditions regarding their survival time are included in the sample.
- Right Truncation (Pemancungan Kanan): Only subjects who have already experienced the event before a time $Y_R$ are included.
  - Condition: $T < Y_R$
  - Example: Using historical death records to study lifespan. People who are still alive are not in the records and are excluded from the analysis.
- Left Truncation (Pemancungan Kiri / Delayed Entry): Only subjects who have not yet experienced the event at time $Y_L$ are included.
  - Condition: $T > Y_L$
  - Subjects must survive until $Y_L$ to enter the study (Delayed Entry).
  - Example:
    - Nursing Home: Studying age of death among residents. Subjects must survive long enough to enter the home. Those who die before entering are never observed.
    - Life Insurance: Policyholders must be alive at the time they sign up for the policy.
Comparison Summary
| Feature | Censoring | Truncation |
|---|---|---|
| Nature | Missing information about exact time | Selection bias by study design |
| Awareness | Researcher knows the subject exists but not the exact $T$ | Researcher may not even know the excluded subjects exist |
| Likelihood | Uses $f(t)$ for events, $S(t)$ for censored | Uses conditional probabilities given selection criteria |
Likelihood Construction Examples
Right-Censored Data
Scenario: 5 patients in a clinical trial. 3 die at times 2, 5, 8; 2 are censored at times 3, 6.
Step-by-step construction:
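- For event times ($\delta_i = 1$): contribute $f(t_i)$ to likelihood
- For censored times ($\delta_i = 0$): contribute $S(t_i)$ to likelihood
$$L = f(2)\,f(5)\,f(8)\,S(3)\,S(6)$$
For exponential model with constant hazard $\lambda$:
$$L(\lambda) = \lambda^3 e^{-\lambda(2 + 5 + 8)}\,e^{-\lambda(3 + 6)} = \lambda^3 e^{-24\lambda}$$
To find MLE: $\frac{d}{d\lambda} \ln L(\lambda) = \frac{3}{\lambda} - 24 = 0$, then $\hat{\lambda} = \frac{3}{24} = 0.125$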
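For the exponential model the maximizer has a closed form: the number of events divided by the total observed time, here $\hat{\lambda} = 3/24 = 0.125$. The sketch below computes it for the five-patient scenario and compares the log-likelihood at nearby values; the function names are illustrative.

```python
import math

# Five-patient scenario: (time, event) with event = 1 for death, 0 for censored
data = [(2, 1), (5, 1), (8, 1), (3, 0), (6, 0)]

def exp_log_likelihood(lam, data):
    """Sum of delta * log f(t) + (1 - delta) * log S(t) for an exponential model.

    With f(t) = lam * exp(-lam * t) and S(t) = exp(-lam * t), each subject
    contributes delta * log(lam) - lam * t.
    """
    return sum(d * math.log(lam) - lam * t for t, d in data)

def exp_mle(data):
    """Closed form: number of events divided by total observed time."""
    events = sum(d for _, d in data)
    total_time = sum(t for t, _ in data)
    return events / total_time

lam_hat = exp_mle(data)
print(lam_hat)  # 0.125
print(exp_log_likelihood(lam_hat, data) > exp_log_likelihood(0.10, data))  # True
print(exp_log_likelihood(lam_hat, data) > exp_log_likelihood(0.15, data))  # True
```

Note that censored subjects still add their follow-up time to the denominator even though they add nothing to the event count.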
Left-Truncated Data
Scenario: Nursing home study. Subject enters at age 70, dies at age 85.
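For left-truncated data with entry time $Y_L$:
- Must survive until $Y_L$ to be observed
- Contribution: $\frac{f(t)}{S(Y_L)}$ (conditional on surviving until entry)
For the subject in this scenario, the contribution is $\frac{f(85)}{S(70)}$.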
Combined Left-Truncated & Right-Censored
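For subject $i$ who enters at $Y_{L,i}$, has event/censoring at $t_i$:
$$L = \prod_{i=1}^n \frac{f(t_i)^{\delta_i}\,S(t_i)^{1 - \delta_i}}{S(Y_{L,i})}$$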
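With delayed entry, each subject's usual contribution ($f$ for an event, $S$ for censoring) is divided by the survival probability at the entry time. The sketch below writes this out for an exponential model, where the estimate becomes events divided by time actually under observation. The records are hypothetical, and an exponential model for age at death is used only to keep the arithmetic transparent.

```python
import math

# Hypothetical records: (entry_age, exit_age, event), event = 1 death, 0 censored
records = [(70, 85, 1), (65, 80, 0), (75, 78, 1), (68, 90, 1)]

def truncated_exp_log_likelihood(lam, records):
    """Log-likelihood with left truncation and right censoring, exponential model."""
    total = 0.0
    for entry, exit_age, event in records:
        log_numerator = event * math.log(lam) - lam * exit_age   # log f(t) or log S(t)
        log_s_entry = -lam * entry                                # log S(entry)
        total += log_numerator - log_s_entry
    return total

def truncated_exp_mle(records):
    """Events divided by total time under observation (exit minus entry)."""
    events = sum(e for _, _, e in records)
    exposure = sum(exit_age - entry for entry, exit_age, _ in records)
    return events / exposure

lam_hat = truncated_exp_mle(records)
print(lam_hat)  # 3 events over 55 years of observed exposure
```

Ignoring the entry ages would count the years before entry as exposure and bias the rate downward, which is why the conditioning on survival to entry matters.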
Week 6: Non-Parametric Estimation (Penaksiran Non-Parametrik)
Estimating the Survival Function (Kaplan-Meier Approach)
The Kaplan-Meier estimator builds the survival curve step-by-step, calculating the conditional probability of surviving past each observed event time.
Timeline and Definitions:
- $n$: Total number of subjects at the start (time $t_0 = 0$).
- $n_i$: Number of individuals at risk just before time $t_i$.
- $d_i$: Number of events (failures/deaths) that occur at time $t_i$.
- $c_i$: Number of censored individuals between $t_i$ and $t_{i+1}$.
Step-by-step Intuition:
- At time $t_0 = 0$: Everyone is alive. $\hat{S}(0) = 1$.
- At time $t_1$: There are $n_1$ people at risk. $d_1$ people experience the event. The probability of surviving past $t_1$ given survival up to $t_1$ is $1 - \frac{d_1}{n_1}$.
- At time $t_2$: There are $n_2$ people at risk. Notice that $n_2$ drops not just because of the deaths $d_1$, but also because of any censored observations $c_1$ that occurred between $t_1$ and $t_2$. So, $n_2 = n_1 - d_1 - c_1$. $d_2$ people experience the event. The conditional survival probability is $1 - \frac{d_2}{n_2}$.
General Kaplan-Meier Formula:
$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$
Estimating the Cumulative Hazard (Nelson-Aalen Approach)
The Nelson-Aalen estimator calculates the cumulative hazard by adding up the instantaneous hazard rates at each event time.
Step-by-step Intuition:
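- At time $t_1$: Out of $n_1$ people at risk, $d_1$ fail. The hazard rate is $\frac{d_1}{n_1}$.
- At time $t_2$: Out of $n_2$ people at risk, $d_2$ fail. The hazard rate is $\frac{d_2}{n_2}$.
Cumulative Hazard up to $t_2$: Simply sum the individual hazards: $\hat{H}(t_2) = \frac{d_1}{n_1} + \frac{d_2}{n_2}$
General Nelson-Aalen Formula:
$$\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$$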
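Both estimators, $\hat{S}(t) = \prod_{t_i \le t} (1 - d_i / n_i)$ and $\hat{H}(t) = \sum_{t_i \le t} d_i / n_i$, can be computed in one pass over the distinct event times. The sketch below applies them to the five-patient scenario from the right-censored likelihood example (deaths at 2, 5, 8; censoring at 3, 6). The function name and the tuple layout are illustrative choices.

```python
from collections import Counter

def km_and_na(data):
    """Return [(t, n_at_risk, d, S_hat, H_hat)] at each distinct event time.

    data: iterable of (time, event) pairs, event = 1 for an event, 0 for censored.
    """
    data = list(data)
    deaths = Counter(t for t, e in data if e == 1)
    rows, s_hat, h_hat = [], 1.0, 0.0
    for t in sorted(deaths):
        # At risk just before t: everyone whose observed time is at least t
        n_at_risk = sum(1 for u, _ in data if u >= t)
        d = deaths[t]
        s_hat *= 1 - d / n_at_risk   # Kaplan-Meier: multiply conditional survival
        h_hat += d / n_at_risk       # Nelson-Aalen: add the hazard increment
        rows.append((t, n_at_risk, d, s_hat, h_hat))
    return rows

data = [(2, 1), (3, 0), (5, 1), (6, 0), (8, 1)]
for row in km_and_na(data):
    print(row)
```

The at-risk counts come out as 5, 3 and 1: the risk set shrinks between event times because of the censored subjects at 3 and 6, not only because of deaths.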
Variance Estimation: Greenwood’s Formula
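To estimate uncertainty in $\hat{S}(t)$, use Greenwood’s formula:
$$\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$$
Intuition: Kaplan-Meier is a product of conditional probabilities. Using the delta method, the variance of $\ln \hat{S}(t)$ is approximately the sum of variances from each term $\ln(1 - d_i / n_i)$, which is then transformed back to the scale of $\hat{S}(t)$.
Simplified form when $d_i = 1$ (no ties):
$$\widehat{\mathrm{Var}}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{1}{n_i (n_i - 1)}$$
For Nelson-Aalen:
$$\widehat{\mathrm{Var}}[\hat{H}(t)] = \sum_{t_i \le t} \frac{d_i}{n_i^2}$$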
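As a worked illustration, Greenwood's formula $\hat{S}(t)^2 \sum d_i / (n_i (n_i - d_i))$ and the Nelson-Aalen variance $\sum d_i / n_i^2$ can be evaluated from the at-risk and event counts alone. The sketch below uses the counts at $t = 2$ and $t = 5$ from the five-patient scenario (5 at risk with 1 death, then 3 at risk with 1 death); the function names are hypothetical.

```python
import math

def greenwood_variance(rows):
    """rows: (n_at_risk, d) at each event time up to t, in time order.

    Returns (S_hat, Var[S_hat]) with Var = S_hat^2 * sum d / (n * (n - d)).
    Requires n > d at every time, otherwise a term is undefined.
    """
    s_hat, acc = 1.0, 0.0
    for n, d in rows:
        s_hat *= 1 - d / n
        acc += d / (n * (n - d))
    return s_hat, s_hat ** 2 * acc

def nelson_aalen_variance(rows):
    """Var[H_hat] = sum d / n^2 over event times up to t."""
    return sum(d / n ** 2 for n, d in rows)

rows = [(5, 1), (3, 1)]   # counts at t = 2 and t = 5
s_hat, var_s = greenwood_variance(rows)
print(s_hat, var_s, math.sqrt(var_s))   # estimate, variance, standard error
print(nelson_aalen_variance(rows))
```

The last event time in that scenario (1 at risk, 1 death) is left out because $n_i - d_i = 0$ there, which makes the Greenwood term undefined once $\hat{S}(t)$ reaches zero.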
Confidence Intervals
Pointwise CI for
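Linear scale (can go below 0 or above 1, so not recommended):
$$\hat{S}(t) \pm z_{1 - \alpha/2}\,\sqrt{\widehat{\mathrm{Var}}[\hat{S}(t)]}$$
Log-log transform (recommended, since it keeps the interval within $[0, 1]$):
$$\left[\hat{S}(t)^{1/\theta},\ \hat{S}(t)^{\theta}\right]$$
where $\theta = \exp\left\{\frac{z_{1 - \alpha/2}\,\sigma_S(t)}{\ln \hat{S}(t)}\right\}$ and $\sigma_S^2(t) = \widehat{\mathrm{Var}}[\hat{S}(t)] / \hat{S}(t)^2$ (here $\alpha$ is the significance level, not the Weibull shape)
For 95% CI: $z_{0.975} = 1.96$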
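The contrast between the two interval types shows up clearly with a small sample. This sketch assumes the log-log interval $[\hat{S}^{1/\theta}, \hat{S}^{\theta}]$ with $\theta = \exp\{z\,\sigma_S / \ln \hat{S}\}$ and $\sigma_S = \sqrt{\widehat{\mathrm{Var}}[\hat{S}]} / \hat{S}$, and uses $z = 1.96$ for a 95% interval. The input values are the Kaplan-Meier estimate and Greenwood variance at $t = 5$ in the five-patient scenario, computed by hand; the function names are illustrative.

```python
import math

def linear_ci(s_hat, var_s, z=1.96):
    """Plain interval S_hat +/- z * SE; may leave [0, 1]."""
    half_width = z * math.sqrt(var_s)
    return s_hat - half_width, s_hat + half_width

def loglog_ci(s_hat, var_s, z=1.96):
    """Interval [S^(1/theta), S^theta] with theta = exp(z * sigma / ln S)."""
    if not 0.0 < s_hat < 1.0:
        raise ValueError("log-log interval needs 0 < S_hat < 1")
    sigma = math.sqrt(var_s) / s_hat
    theta = math.exp(z * sigma / math.log(s_hat))   # ln S < 0, so theta < 1
    return s_hat ** (1 / theta), s_hat ** theta

s_hat = 8 / 15                                # KM estimate after events at t = 2 and t = 5
var_s = s_hat ** 2 * (1 / 20 + 1 / 6)         # Greenwood variance at the same time

print(linear_ci(s_hat, var_s))   # upper limit exceeds 1
print(loglog_ci(s_hat, var_s))   # both limits stay inside (0, 1)
```

With only five subjects the standard error is large, so the linear interval spills past 1 while the log-log interval remains a valid probability range.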
Summary: Non-Parametric Estimators
| Estimator | Formula | Variance |
|---|---|---|
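| Kaplan-Meier | $\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$ | Greenwood: $\hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{n_i (n_i - d_i)}$ |
| Nelson-Aalen | $\hat{H}(t) = \sum_{t_i \le t} \frac{d_i}{n_i}$ | $\sum_{t_i \le t} \frac{d_i}{n_i^2}$ |
Key relationship: $\hat{S}(t) \approx \exp(-\hat{H}(t))$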