8  Adverse events summary

Objective

Create adverse event summary tables that give a high-level safety overview across treatment groups. Learn to calculate AE counts and percentages using Polars and to build a complete safety summary table with rtflite.

8.1 Overview

Adverse events (AE) summary tables are critical safety assessments required in clinical study reports. Following ICH E3 guidance, these tables summarize the overall safety profile by showing the number and percentage of participants experiencing various categories of adverse events across treatment groups.

Key categories typically include:

  • Any adverse event: Total participants with at least one AE
  • Drug-related events: Events potentially related to study treatment
  • Serious adverse events: Events meeting regulatory criteria for seriousness
  • Deaths: Fatal outcomes
  • Discontinuations: Participants who stopped treatment due to AEs

This tutorial shows you how to create an AE summary table in Python, using Polars for the calculations and the rtflite package for the RTF output.

import polars as pl
import rtflite as rtf

8.2 Step 1: Load data

We need two datasets for AE analysis: the subject-level dataset (ADSL) and the adverse events dataset (ADAE).

# Load datasets
adsl = pl.read_parquet("data/adsl.parquet")
adae = pl.read_parquet("data/adae.parquet")

# Display key variables from ADSL
adsl.select(["USUBJID", "TRT01A", "SAFFL"])
shape: (254, 3)
USUBJID        TRT01A                  SAFFL
str            str                     str
"01-701-1015"  "Placebo"               "Y"
"01-701-1023"  "Placebo"               "Y"
"01-701-1028"  "Xanomeline High Dose"  "Y"
…              …                       …
"01-718-1371"  "Xanomeline High Dose"  "Y"
"01-718-1427"  "Xanomeline High Dose"  "Y"

# Display key variables from ADAE
adae.select(["USUBJID", "AEREL", "AESER", "AEOUT", "AEACN"])
shape: (1_191, 5)
USUBJID        AEREL       AESER  AEOUT                         AEACN
str            str         str    str                           str
"01-701-1015"  "PROBABLE"  "N"    "NOT RECOVERED/NOT RESOLVED"  ""
"01-701-1015"  "PROBABLE"  "N"    "NOT RECOVERED/NOT RESOLVED"  ""
"01-701-1015"  "REMOTE"    "N"    "RECOVERED/RESOLVED"          ""
…              …           …      …                             …
"01-718-1427"  "POSSIBLE"  "N"    "RECOVERED/RESOLVED"          ""
"01-718-1427"  "POSSIBLE"  "N"    "RECOVERED/RESOLVED"          ""

Key ADAE variables used in this analysis:

  • USUBJID: Unique subject identifier to link with ADSL
  • AEREL: Relationship of adverse event to study drug (e.g., “RELATED”, “POSSIBLE”, “PROBABLE”, “DEFINITE”, “NOT RELATED”)
  • AESER: Serious adverse event flag (“Y” = serious, “N” = not serious)
  • AEOUT: Outcome of adverse event (e.g., “RECOVERED/RESOLVED”, “NOT RECOVERED/NOT RESOLVED”, “FATAL”)
  • AEACN: Action taken with study treatment (e.g., “DOSE NOT CHANGED”, “DRUG WITHDRAWN”, “DOSE REDUCED”)

8.3 Step 2: Filter safety population

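For safety analyses, we restrict to the safety population (SAFFL == "Y"), typically defined as participants who received at least one dose of study treatment. The per-treatment counts of this population serve as the denominators for all percentages.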

# Filter to safety population
adsl_safety = adsl.filter(pl.col("SAFFL") == "Y").select(["USUBJID", "TRT01A"])

# Get treatment counts for denominators
pop_counts = adsl_safety.group_by("TRT01A").agg(
    N = pl.len()
).sort("TRT01A")

# Preserve the treatment level order for downstream joins
treatment_levels = pop_counts.select(["TRT01A"])

# Safety population by treatment
pop_counts
shape: (3, 2)
TRT01A                  N
str                     u32
"Placebo"               86
"Xanomeline High Dose"  84
"Xanomeline Low Dose"   84

# Join treatment information to AE data
adae_safety = adae.join(adsl_safety, on="USUBJID")

# Total AE records in safety population
adae_safety.height
1191

8.4 Step 3: Define AE categories

We’ll calculate participant counts for standard AE categories used in regulatory submissions.

def count_participants(df, condition=None):
    """
    Count unique participants meeting a condition

    Args:
        df: DataFrame with adverse events
        condition: polars expression for filtering (None = count all)

    Returns:
        DataFrame with counts by treatment
    """
    if condition is not None:
        df = df.filter(condition)

    counts = df.group_by("TRT01A").agg(
        n = pl.col("USUBJID").n_unique()
    )

    return treatment_levels.join(counts, on="TRT01A", how="left").with_columns(
        pl.col("n").fill_null(0)
    )

# Calculate each category
categories = []

# 1. Participants in population (no filtering)
pop_row = pop_counts.with_columns(
    category = pl.lit("Participants in population")
).rename({"N": "n"})
categories.append(pop_row)

# 2. With any adverse event
any_ae = count_participants(adae_safety).with_columns(
    category = pl.lit("With any adverse event")
)
categories.append(any_ae)

# 3. With drug-related adverse event
drug_related = count_participants(
    adae_safety,
    pl.col("AEREL").is_in(["POSSIBLE", "PROBABLE", "DEFINITE", "RELATED"])
).with_columns(
    category = pl.lit("With drug-related adverse event")
)
categories.append(drug_related)

# 4. With serious adverse event
serious = count_participants(
    adae_safety,
    pl.col("AESER") == "Y"
).with_columns(
    category = pl.lit("With serious adverse event")
)
categories.append(serious)

# 5. With serious drug-related adverse event
serious_drug_related = count_participants(
    adae_safety,
    (pl.col("AESER") == "Y") &
    pl.col("AEREL").is_in(["POSSIBLE", "PROBABLE", "DEFINITE", "RELATED"])
).with_columns(
    category = pl.lit("With serious drug-related adverse event")
)
categories.append(serious_drug_related)

# 6. Who died
deaths = count_participants(
    adae_safety,
    pl.col("AEOUT") == "FATAL"
).with_columns(
    category = pl.lit("Who died")
)
categories.append(deaths)

# 7. Discontinued due to adverse event
discontinued = count_participants(
    adae_safety,
    pl.col("AEACN") == "DRUG WITHDRAWN"
).with_columns(
    category = pl.lit("Discontinued due to adverse event")
)
categories.append(discontinued)

8.5 Step 4: Combine and calculate percentages

Now we combine all categories and calculate percentages based on the safety population.

# Combine all categories
ae_summary = pl.concat(categories, how="diagonal")

# Add population totals and calculate percentages
ae_summary = ae_summary.join(
    pop_counts.select(["TRT01A", "N"]),
    on="TRT01A",
    how="left"
).with_columns([
    # Fill missing counts with 0
    pl.col("n").fill_null(0),
    # Calculate percentage
    pl.when(pl.col("category") == "Participants in population")
        .then(None)  # No percentage for population row
        .otherwise((100.0 * pl.col("n") / pl.col("N")).round(1))
        .alias("pct")
])

ae_summary.sort(["category", "TRT01A"])
shape: (21, 5)
TRT01A                  n    category                          N    pct
str                     u32  str                               u32  f64
"Placebo"               0    "Discontinued due to adverse ev…  86   0.0
"Xanomeline High Dose"  0    "Discontinued due to adverse ev…  84   0.0
"Xanomeline Low Dose"   0    "Discontinued due to adverse ev…  84   0.0
…                       …    …                                 …    …
"Xanomeline High Dose"  1    "With serious drug-related adve…  84   1.2
"Xanomeline Low Dose"   1    "With serious drug-related adve…  84   1.2
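Each percentage is 100 × n / N, where N is the safety population of the same treatment group, rounded to one decimal place. A short plain-Python check of that arithmetic against a few counts that appear in this chapter's final table (the `pct` helper is a hypothetical name used only for this sketch):

```python
def pct(n, N):
    """Percentage of N participants, rounded to one decimal place."""
    return round(100.0 * n / N, 1)

# (n, N) pairs taken from the chapter's output tables
checks = {
    "Any AE, Placebo": (69, 86),
    "Any AE, Xanomeline Low Dose": (77, 84),
    "Died, Placebo": (2, 86),
    "Serious drug-related AE, Xanomeline Low Dose": (1, 84),
}

for label, (n, N) in checks.items():
    print(f"{label}: {n}/{N} = {pct(n, N)}%")
```

These reproduce the displayed values 80.2, 91.7, 2.3, and 1.2.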

8.6 Step 5: Format for display

We’ll format the counts and percentages for the final table display.

# Format display values
ae_formatted = ae_summary.with_columns([
    # Show counts as strings, including zeros
    pl.col("n").cast(str).alias("n_display"),
    # Format percentages with parentheses; blank out population row
    pl.when(pl.col("category") == "Participants in population")
      .then(pl.lit(""))
      .otherwise(
          pl.format("({})", pl.col("pct").fill_null(0).round(1).cast(str))
      )
      .alias("pct_display")
])

ae_formatted.select(["category", "TRT01A", "n_display", "pct_display"])
shape: (21, 4)
category                          TRT01A                  n_display  pct_display
str                               str                     str        str
"Participants in population"      "Placebo"               "86"       ""
"Participants in population"      "Xanomeline High Dose"  "84"       ""
"Participants in population"      "Xanomeline Low Dose"   "84"       ""
…                                 …                       …          …
"Discontinued due to adverse ev…  "Xanomeline High Dose"  "0"        "(0.0)"
"Discontinued due to adverse ev…  "Xanomeline Low Dose"   "0"        "(0.0)"

8.7 Step 6: Create final table structure

We reshape the data to create the final table with treatments as columns.

# Define category order for consistent display
category_order = [
    "Participants in population",
    "With any adverse event",
    "With drug-related adverse event",
    "With serious adverse event",
    "With serious drug-related adverse event",
    "Who died",
    "Discontinued due to adverse event"
]

# Pivot to wide format
ae_wide = ae_formatted.pivot(
    values=["n_display", "pct_display"],
    index="category",
    on="TRT01A",
    maintain_order=True
)

# Reorder columns for each treatment group
treatments = ["Placebo", "Xanomeline Low Dose", "Xanomeline High Dose"]
column_order = ["category"]
for trt in treatments:
    column_order.extend([f"n_display_{trt}", f"pct_display_{trt}"])

# Create final table with proper column order
final_table = ae_wide.select(column_order).sort(
    pl.col("category").cast(pl.Enum(category_order))
)

final_table
shape: (7, 7)
category                          n_display_Placebo  pct_display_Placebo  n_display_Xanomeline Low Dose  pct_display_Xanomeline Low Dose  n_display_Xanomeline High Dose  pct_display_Xanomeline High Dose
str                               str                str                  str                            str                              str                             str
"Participants in population"      "86"               ""                   "84"                           ""                               "84"                            ""
"With any adverse event"          "69"               "(80.2)"             "77"                           "(91.7)"                         "79"                            "(94.0)"
"With drug-related adverse even…  "44"               "(51.2)"             "73"                           "(86.9)"                         "70"                            "(83.3)"
…                                 …                  …                    …                              …                                …                               …
"Who died"                        "2"                "(2.3)"              "1"                            "(1.2)"                          "0"                             "(0.0)"
"Discontinued due to adverse ev…  "0"                "(0.0)"              "0"                            "(0.0)"                          "0"                             "(0.0)"

8.8 Step 7: Generate publication-ready output

Finally, we format the AE summary table for regulatory submission using the rtflite package.

# Get population sizes for column headers
n_placebo = pop_counts.filter(pl.col("TRT01A") == "Placebo")["N"][0]
n_low = pop_counts.filter(pl.col("TRT01A") == "Xanomeline Low Dose")["N"][0]
n_high = pop_counts.filter(pl.col("TRT01A") == "Xanomeline High Dose")["N"][0]

doc_ae_summary = rtf.RTFDocument(
    df=final_table.rename({"category": ""}),
    rtf_title=rtf.RTFTitle(
        text=[
            "Analysis of Adverse Event Summary",
            "(Safety Analysis Population)"
        ]
    ),
    rtf_column_header=[
        rtf.RTFColumnHeader(
            text = [
                "",
                "Placebo",
                "Xanomeline Low Dose",
                "Xanomeline High Dose"
            ],
            col_rel_width=[4, 2, 2, 2],
            text_justification=["l", "c", "c", "c"],
        ),
        rtf.RTFColumnHeader(
            text=[
                "",          # Empty for first column
                "n", "(%)",  # Placebo columns
                "n", "(%)",  # Low Dose columns
                "n", "(%)"   # High Dose columns
            ],
            col_rel_width=[4] + [1] * 6,
            text_justification=["l"] + ["c"] * 6,
            border_left = ["single"] + ["single", ""] * 3,
            border_top = [""] + ["single"] * 6
        )
    ],
    rtf_body=rtf.RTFBody(
        col_rel_width=[4] + [1] * 6,
        text_justification=["l"] + ["c"] * 6,
        border_left = ["single"] + ["single", ""] * 3
    ),
    rtf_footnote=rtf.RTFFootnote(
        text=[
            "Every subject is counted a single time for each applicable row and column."
        ]
    ),
    rtf_source=rtf.RTFSource(
        text=["Source: ADSL and ADAE datasets"]
    )
)

doc_ae_summary.write_rtf("rtf/tlf_ae_summary.rtf")