8 Adverse events summary

Objective

Create adverse event (AE) summary tables that provide a high-level safety overview across treatment groups. Learn to calculate AE rates and percentages using Polars and to build a comprehensive safety summary table with rtflite.

8.1 Overview

Adverse event (AE) summary tables are critical safety displays required in clinical study reports. Following ICH E3 guidance, these tables summarize the overall safety profile by showing the number and percentage of participants experiencing various categories of adverse events in each treatment group.

Key categories typically include:

Any adverse event: Total participants with at least one AE
Drug-related events: Events potentially related to study treatment
Serious adverse events: Events meeting regulatory criteria for seriousness
Deaths: Fatal outcomes
Discontinuations: Participants who stopped treatment due to AEs

This tutorial shows you how to create an AE summary table, using Polars for the calculations and Python’s rtflite package for the RTF output.

import polars as pl
import rtflite as rtf


8.2 Step 1: Load data

We need two datasets for AE analysis: the subject-level dataset (ADSL) and the adverse events dataset (ADAE).

# Load datasets
adsl = pl.read_parquet("data/adsl.parquet")
adae = pl.read_parquet("data/adae.parquet")

# Display key variables from ADSL
adsl.select(["USUBJID", "TRT01A", "SAFFL"])

shape: (254, 3)

USUBJID	TRT01A	SAFFL
str	str	str
"01-701-1015"	"Placebo"	"Y"
"01-701-1023"	"Placebo"	"Y"
"01-701-1028"	"Xanomeline High Dose"	"Y"
…	…	…
"01-718-1371"	"Xanomeline High Dose"	"Y"
"01-718-1427"	"Xanomeline High Dose"	"Y"

# Display key variables from ADAE
adae.select(["USUBJID", "AEREL", "AESER", "AEOUT", "AEACN"])

shape: (1_191, 5)

USUBJID	AEREL	AESER	AEOUT	AEACN
str	str	str	str	str
"01-701-1015"	"PROBABLE"	"N"	"NOT RECOVERED/NOT RESOLVED"	""
"01-701-1015"	"PROBABLE"	"N"	"NOT RECOVERED/NOT RESOLVED"	""
"01-701-1015"	"REMOTE"	"N"	"RECOVERED/RESOLVED"	""
…	…	…	…	…
"01-718-1427"	"POSSIBLE"	"N"	"RECOVERED/RESOLVED"	""
"01-718-1427"	"POSSIBLE"	"N"	"RECOVERED/RESOLVED"	""

Key ADAE variables used in this analysis:

USUBJID: Unique subject identifier to link with ADSL
AEREL: Relationship of adverse event to study drug (e.g., “RELATED”, “POSSIBLE”, “PROBABLE”, “REMOTE”, “DEFINITE”, “NOT RELATED”; the exact codes vary by study)
AESER: Serious adverse event flag (“Y” = serious, “N” = not serious)
AEOUT: Outcome of adverse event (e.g., “RECOVERED/RESOLVED”, “RECOVERING/RESOLVING”, “NOT RECOVERED/NOT RESOLVED”, “FATAL”)
AEACN: Action taken with study treatment (e.g., “DOSE NOT CHANGED”, “DRUG WITHDRAWN”, “DOSE REDUCED”)
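The filters in Step 3 match exact strings such as “FATAL” or “DRUG WITHDRAWN”, so a code that never occurs in the data silently yields a count of zero. Before hard-coding filter values, it can help to list the distinct codes each variable actually takes. A minimal sketch in plain Python, using a few made-up records shaped like the ADAE rows above (the helpers `value_counts` and `missing_codes` are hypothetical; with Polars the equivalent tally is `adae["AEREL"].value_counts()`):

```python
from collections import Counter

# Hypothetical toy records mimicking ADAE rows (not the real dataset)
adae_rows = [
    {"USUBJID": "S1", "AEREL": "PROBABLE", "AESER": "N",
     "AEOUT": "NOT RECOVERED/NOT RESOLVED", "AEACN": ""},
    {"USUBJID": "S1", "AEREL": "REMOTE", "AESER": "N",
     "AEOUT": "RECOVERED/RESOLVED", "AEACN": ""},
    {"USUBJID": "S2", "AEREL": "POSSIBLE", "AESER": "Y",
     "AEOUT": "FATAL", "AEACN": "DRUG WITHDRAWN"},
]

def value_counts(rows, column):
    """Tally how often each distinct code appears in one column."""
    return Counter(row[column] for row in rows)

def missing_codes(rows, column, expected):
    """Return the expected codes that never occur in the column."""
    present = set(value_counts(rows, column))
    return sorted(set(expected) - present)

for col in ["AEREL", "AEOUT", "AEACN"]:
    print(col, dict(value_counts(adae_rows, col)))

# Codes the drug-related filter in Step 3 looks for
related_codes = ["POSSIBLE", "PROBABLE", "DEFINITE", "RELATED"]
print("never observed:", missing_codes(adae_rows, "AEREL", related_codes))
# never observed: ['DEFINITE', 'RELATED']
```

A code that is listed in a filter but never observed is not necessarily an error, but blank values (as in AEACN above) or unexpected spellings are worth checking against the study's controlled terminology.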

8.3 Step 2: Filter safety population

For safety analyses, we focus on participants who received at least one dose of study treatment (safety population flag SAFFL = "Y").

# Filter to safety population
adsl_safety = adsl.filter(pl.col("SAFFL") == "Y").select(["USUBJID", "TRT01A"])

# Get treatment counts for denominators
pop_counts = adsl_safety.group_by("TRT01A").agg(
    N = pl.len()
).sort("TRT01A")

# Preserve the treatment level order for downstream joins
treatment_levels = pop_counts.select(["TRT01A"])

# Safety population by treatment
pop_counts

shape: (3, 2)

TRT01A	N
str	u32
"Placebo"	86
"Xanomeline High Dose"	84
"Xanomeline Low Dose"	84

# Join treatment information to AE data
adae_safety = adae.join(adsl_safety, on="USUBJID")

# Total AE records in safety population
adae_safety.height
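Two details of Step 2 matter for the percentages later: the denominators come from ADSL, so safety participants with no AEs still count, and the inner join drops AE records from participants outside the safety population. A plain-Python sketch of both steps, using made-up toy records (not the real datasets):

```python
from collections import Counter

# Hypothetical toy ADSL and ADAE records (not the real data)
adsl_rows = [
    {"USUBJID": "S1", "TRT01A": "Placebo", "SAFFL": "Y"},
    {"USUBJID": "S2", "TRT01A": "Placebo", "SAFFL": "Y"},  # no AEs at all
    {"USUBJID": "S3", "TRT01A": "Xanomeline Low Dose", "SAFFL": "Y"},
    {"USUBJID": "S4", "TRT01A": "Xanomeline Low Dose", "SAFFL": "N"},
]
adae_rows = [
    {"USUBJID": "S1", "AESER": "N"},
    {"USUBJID": "S1", "AESER": "Y"},
    {"USUBJID": "S4", "AESER": "N"},  # subject outside the safety population
]

# Filter to the safety population, keeping each subject's treatment arm
arm_of = {r["USUBJID"]: r["TRT01A"] for r in adsl_rows if r["SAFFL"] == "Y"}

# Denominators: safety-population subjects per arm (taken from ADSL)
pop_counts = Counter(arm_of.values())

# Inner join: keep only AE rows whose subject is in the safety population
# and attach that subject's treatment arm
adae_safety = [
    {**ae, "TRT01A": arm_of[ae["USUBJID"]]}
    for ae in adae_rows
    if ae["USUBJID"] in arm_of
]

print(dict(pop_counts))  # {'Placebo': 2, 'Xanomeline Low Dose': 1}
print(len(adae_safety))  # 2
```

Taking N from ADAE instead would drop S2 and inflate every percentage.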

8.4 Step 3: Define AE categories

We’ll calculate participant counts for standard AE categories used in regulatory submissions.

def count_participants(df, condition=None):
    """
    Count unique participants meeting a condition

    Args:
        df: DataFrame with adverse events
        condition: polars expression for filtering (None = count all)

    Returns:
        DataFrame with counts by treatment
    """
    if condition is not None:
        df = df.filter(condition)

    counts = df.group_by("TRT01A").agg(
        n = pl.col("USUBJID").n_unique()
    )

    return treatment_levels.join(counts, on="TRT01A", how="left").with_columns(
        pl.col("n").fill_null(0)
    )

# Calculate each category
categories = []

# 1. Participants in population (no filtering)
pop_row = pop_counts.with_columns(
    category = pl.lit("Participants in population")
).rename({"N": "n"})
categories.append(pop_row)

# 2. With any adverse event
any_ae = count_participants(adae_safety).with_columns(
    category = pl.lit("With any adverse event")
)
categories.append(any_ae)

# 3. With drug-related adverse event
drug_related = count_participants(
    adae_safety,
    pl.col("AEREL").is_in(["POSSIBLE", "PROBABLE", "DEFINITE", "RELATED"])
).with_columns(
    category = pl.lit("With drug-related adverse event")
)
categories.append(drug_related)

# 4. With serious adverse event
serious = count_participants(
    adae_safety,
    pl.col("AESER") == "Y"
).with_columns(
    category = pl.lit("With serious adverse event")
)
categories.append(serious)

# 5. With serious drug-related adverse event
serious_drug_related = count_participants(
    adae_safety,
    (pl.col("AESER") == "Y") &
    pl.col("AEREL").is_in(["POSSIBLE", "PROBABLE", "DEFINITE", "RELATED"])
).with_columns(
    category = pl.lit("With serious drug-related adverse event")
)
categories.append(serious_drug_related)

# 6. Who died
deaths = count_participants(
    adae_safety,
    pl.col("AEOUT") == "FATAL"
).with_columns(
    category = pl.lit("Who died")
)
categories.append(deaths)

# 7. Discontinued due to adverse event
discontinued = count_participants(
    adae_safety,
    pl.col("AEACN") == "DRUG WITHDRAWN"
).with_columns(
    category = pl.lit("Discontinued due to adverse event")
)
categories.append(discontinued)
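The essential behavior of `count_participants` is that it counts unique participants rather than AE records, and that it zero-fills arms with no qualifying events. A stdlib stand-in for the Polars function, with conditions written as Python callables and made-up toy records:

```python
# Hypothetical toy AE rows after the safety-population join (not real data)
adae_safety = [
    {"USUBJID": "S1", "TRT01A": "Placebo", "AESER": "Y", "AEREL": "POSSIBLE"},
    {"USUBJID": "S1", "TRT01A": "Placebo", "AESER": "Y", "AEREL": "REMOTE"},
    {"USUBJID": "S2", "TRT01A": "Placebo", "AESER": "N", "AEREL": "PROBABLE"},
    {"USUBJID": "S3", "TRT01A": "Xanomeline Low Dose", "AESER": "N", "AEREL": "REMOTE"},
    {"USUBJID": "S4", "TRT01A": "Xanomeline Low Dose", "AESER": "Y", "AEREL": "REMOTE"},
    {"USUBJID": "S4", "TRT01A": "Xanomeline Low Dose", "AESER": "N", "AEREL": "PROBABLE"},
]
treatment_levels = ["Placebo", "Xanomeline Low Dose", "Xanomeline High Dose"]
RELATED = {"POSSIBLE", "PROBABLE", "DEFINITE", "RELATED"}

def count_participants(rows, condition=None):
    """Count unique subjects per arm with at least one row meeting `condition`."""
    subjects = {trt: set() for trt in treatment_levels}  # every arm starts at 0
    for row in rows:
        if condition is None or condition(row):
            subjects[row["TRT01A"]].add(row["USUBJID"])  # sets ignore repeats
    return {trt: len(ids) for trt, ids in subjects.items()}

any_ae = count_participants(adae_safety)
serious = count_participants(adae_safety, lambda r: r["AESER"] == "Y")
serious_related = count_participants(
    adae_safety, lambda r: r["AESER"] == "Y" and r["AEREL"] in RELATED
)
print(any_ae)           # {'Placebo': 2, 'Xanomeline Low Dose': 2, 'Xanomeline High Dose': 0}
print(serious)          # {'Placebo': 1, 'Xanomeline Low Dose': 1, 'Xanomeline High Dose': 0}
print(serious_related)  # {'Placebo': 1, 'Xanomeline Low Dose': 0, 'Xanomeline High Dose': 0}
```

Because the condition is evaluated on each AE record, “serious drug-related” counts participants with at least one event that is both serious and related. Participant S4 has a serious event and a related event, but not the same event, so S4 is not counted in that row.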

8.5 Step 4: Combine and calculate percentages

Now we combine all categories and calculate percentages based on the safety population.

# Combine all categories
ae_summary = pl.concat(categories, how="diagonal")

# Add population totals and calculate percentages
ae_summary = ae_summary.join(
    pop_counts.select(["TRT01A", "N"]),
    on="TRT01A",
    how="left"
).with_columns([
    # Fill missing counts with 0
    pl.col("n").fill_null(0),
    # Calculate percentage
    pl.when(pl.col("category") == "Participants in population")
        .then(None)  # No percentage for population row
        .otherwise((100.0 * pl.col("n") / pl.col("N")).round(1))
        .alias("pct")
])

ae_summary.sort(["category", "TRT01A"])

shape: (21, 5)

TRT01A	n	category	N	pct
str	u32	str	u32	f64
"Placebo"	0	"Discontinued due to adverse ev…	86	0.0
"Xanomeline High Dose"	0	"Discontinued due to adverse ev…	84	0.0
"Xanomeline Low Dose"	0	"Discontinued due to adverse ev…	84	0.0
…	…	…	…	…
"Xanomeline High Dose"	1	"With serious drug-related adve…	84	1.2
"Xanomeline Low Dose"	1	"With serious drug-related adve…	84	1.2
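Each percentage is 100 × n / N rounded to one decimal; for example, 69 of 86 Placebo participants with any AE gives 80.2. Ties are where tools can disagree: Python's built-in `round()` rounds an exactly representable tie to the even digit, whereas many reporting standards expect half-up rounding. Check which convention your Polars version applies. A minimal sketch of half-up rounding with the standard `decimal` module, assuming half-up is the convention you need (`pct` is a hypothetical helper):

```python
from decimal import Decimal, ROUND_HALF_UP

def pct(n, N):
    """Percentage of n out of N, rounded half-up to one decimal place."""
    value = Decimal(100) * Decimal(n) / Decimal(N)
    return float(value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))

# Counts and denominators from the Placebo column of the output above
print(pct(69, 86))  # 80.2
print(pct(2, 86))   # 2.3

# A tie where conventions differ: 1 of 16 participants is exactly 6.25%
print(round(100 * 1 / 16, 1))  # 6.2 (built-in round: ties to even)
print(pct(1, 16))              # 6.3 (half-up)
```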

8.6 Step 5: Format for display

We’ll format the counts and percentages for the final table display.

# Format display values
ae_formatted = ae_summary.with_columns([
    # Show counts as strings, including zeros
    pl.col("n").cast(str).alias("n_display"),
    # Format percentages with parentheses; blank out population row
    pl.when(pl.col("category") == "Participants in population")
      .then(pl.lit(""))
      .otherwise(
          pl.format("({})", pl.col("pct").fill_null(0).round(1).cast(str))
      )
      .alias("pct_display")
])

ae_formatted.select(["category", "TRT01A", "n_display", "pct_display"])

shape: (21, 4)

category	TRT01A	n_display	pct_display
str	str	str	str
"Participants in population"	"Placebo"	"86"	""
"Participants in population"	"Xanomeline High Dose"	"84"	""
"Participants in population"	"Xanomeline Low Dose"	"84"	""
…	…	…	…
"Discontinued due to adverse ev…	"Xanomeline High Dose"	"0"	"(0.0)"
"Discontinued due to adverse ev…	"Xanomeline Low Dose"	"0"	"(0.0)"
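The display rule in Step 5 is simple: counts are always shown, percentages are wrapped in parentheses, and the population row has no percentage. In plain Python, a format specification such as `:.1f` pins exactly one decimal place regardless of how the float would otherwise print. A sketch with a hypothetical helper `format_cell`, using values from the output above:

```python
def format_cell(n, pct, is_population=False):
    """Return (n, pct) display strings; the population row has no percentage."""
    n_display = str(n)
    pct_display = "" if is_population else f"({pct:.1f})"  # always one decimal
    return n_display, pct_display

print(format_cell(86, None, is_population=True))  # ('86', '')
print(format_cell(69, 80.2))                       # ('69', '(80.2)')
print(format_cell(0, 0.0))                         # ('0', '(0.0)')
```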

8.7 Step 6: Create final table structure

We reshape the data to create the final table with treatments as columns.

# Define category order for consistent display
category_order = [
    "Participants in population",
    "With any adverse event",
    "With drug-related adverse event",
    "With serious adverse event",
    "With serious drug-related adverse event",
    "Who died",
    "Discontinued due to adverse event"
]

# Pivot to wide format
ae_wide = ae_formatted.pivot(
    values=["n_display", "pct_display"],
    index="category",
    on="TRT01A",
    maintain_order=True
)

# Reorder columns for each treatment group
treatments = ["Placebo", "Xanomeline Low Dose", "Xanomeline High Dose"]
column_order = ["category"]
for trt in treatments:
    column_order.extend([f"n_display_{trt}", f"pct_display_{trt}"])

# Create final table with proper column order
final_table = ae_wide.select(column_order).sort(
    pl.col("category").cast(pl.Enum(category_order))
)

final_table

shape: (7, 7)

category	n_display_Placebo	pct_display_Placebo	n_display_Xanomeline Low Dose	pct_display_Xanomeline Low Dose	n_display_Xanomeline High Dose	pct_display_Xanomeline High Dose
str	str	str	str	str	str	str
"Participants in population"	"86"	""	"84"	""	"84"	""
"With any adverse event"	"69"	"(80.2)"	"77"	"(91.7)"	"79"	"(94.0)"
"With drug-related adverse even…	"44"	"(51.2)"	"73"	"(86.9)"	"70"	"(83.3)"
…	…	…	…	…	…	…
"Who died"	"2"	"(2.3)"	"1"	"(1.2)"	"0"	"(0.0)"
"Discontinued due to adverse ev…	"0"	"(0.0)"	"0"	"(0.0)"	"0"	"(0.0)"
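Step 6 relies on two explicit orderings: `treatments` fixes the column order and `category_order` fixes the row order, since a plain sort would arrange categories alphabetically (as in the Step 4 output). The reshape can be sketched in plain Python using a few rows copied from the output above; `pivot_wide` is a hypothetical helper, not a Polars function, and it reuses the `n_display_{treatment}` column naming shown in the pivoted table.

```python
# A few long-format rows from the output above: (category, arm, n, pct)
long_rows = [
    ("With any adverse event", "Xanomeline High Dose", "79", "(94.0)"),
    ("Participants in population", "Placebo", "86", ""),
    ("With any adverse event", "Placebo", "69", "(80.2)"),
    ("Participants in population", "Xanomeline High Dose", "84", ""),
]
category_order = ["Participants in population", "With any adverse event"]
treatments = ["Placebo", "Xanomeline High Dose"]

def pivot_wide(rows, category_order, treatments):
    """Reshape long rows into one row per category with n/pct columns per arm."""
    cells = {(cat, trt): (n, p) for cat, trt, n, p in rows}
    header = ["category"]
    for trt in treatments:
        header += [f"n_display_{trt}", f"pct_display_{trt}"]
    table = []
    for cat in category_order:      # fixed row order, not alphabetical
        row = [cat]
        for trt in treatments:      # fixed column order per arm
            # A KeyError here means a category/arm pair was not zero-filled upstream
            row.extend(cells[(cat, trt)])
        table.append(row)
    return header, table

header, table = pivot_wide(long_rows, category_order, treatments)
print(header)
for row in table:
    print(row)
```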

8.8 Step 7: Generate publication-ready output

Finally, we format the AE summary table for regulatory submission using the rtflite package.

# Get population sizes for column headers
n_placebo = pop_counts.filter(pl.col("TRT01A") == "Placebo")["N"][0]
n_low = pop_counts.filter(pl.col("TRT01A") == "Xanomeline Low Dose")["N"][0]
n_high = pop_counts.filter(pl.col("TRT01A") == "Xanomeline High Dose")["N"][0]

doc_ae_summary = rtf.RTFDocument(
    df=final_table.rename({"category": ""}),
    rtf_title=rtf.RTFTitle(
        text=[
            "Analysis of Adverse Event Summary",
            "(Safety Analysis Population)"
        ]
    ),
    rtf_column_header=[
        rtf.RTFColumnHeader(
            text = [
                "",
                "Placebo",
                "Xanomeline Low Dose",
                "Xanomeline High Dose"
            ],
            col_rel_width=[4, 2, 2, 2],
            text_justification=["l", "c", "c", "c"],
        ),
        rtf.RTFColumnHeader(
            text=[
                "",          # Empty for first column
                "n", "(%)",  # Placebo columns
                "n", "(%)",  # Low Dose columns
                "n", "(%)"   # High Dose columns
            ],
            col_rel_width=[4] + [1] * 6,
            text_justification=["l"] + ["c"] * 6,
            border_left = ["single"] + ["single", ""] * 3,
            border_top = [""] + ["single"] * 6
        )
    ],
    rtf_body=rtf.RTFBody(
        col_rel_width=[4] + [1] * 6,
        text_justification=["l"] + ["c"] * 6,
        border_left = ["single"] + ["single", ""] * 3
    ),
    rtf_footnote=rtf.RTFFootnote(
        text=[
            "Every subject is counted a single time for each applicable row and column."
        ]
    ),
    rtf_source=rtf.RTFSource(
        text=["Source: ADSL and ADAE datasets"]
    )
)

doc_ae_summary.write_rtf("rtf/tlf_ae_summary.rtf")

rtf/tlf_ae_summary.rtf

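In the rtflite call, the top header row has four cells with `col_rel_width=[4, 2, 2, 2]`, while the second header row and the body have seven columns with `[4] + [1] * 6`. Each top-level cell's relative width equals the sum of the columns beneath it, which keeps the spanning treatment labels aligned. This assumes, as the parameter name suggests, that each row's relative widths are scaled to the same overall table width. A quick check with a hypothetical helper `spans_align` (not part of rtflite):

```python
def spans_align(header_widths, body_widths, spans):
    """Check that each spanning header cell's width equals the body columns it covers."""
    if len(header_widths) != len(spans) or sum(spans) != len(body_widths):
        return False
    start = 0
    for width, span in zip(header_widths, spans):
        if sum(body_widths[start:start + span]) != width:
            return False
        start += span
    return True

body_widths = [4] + [1] * 6       # category column + (n, %) for three arms
top_header_widths = [4, 2, 2, 2]  # blank cell + one cell per arm
spans = [1, 2, 2, 2]              # body columns under each top-level cell

print(spans_align(top_header_widths, body_widths, spans))  # True
print(spans_align([4, 3, 2, 2], body_widths, spans))       # False
```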