3  Study Population Table

3.1 Overview

Clinical trials define multiple analysis populations based on different inclusion criteria. Following ICH E3 guidance, regulatory submissions must clearly document the number of participants in each analysis population to support the validity of statistical analyses.

The key analysis populations typically include:

  • All Randomized: All participants who were randomized into the study
  • Intent-to-Treat (ITT): Participants analyzed according to their planned (randomized) treatment, typically the population for the primary efficacy analysis
  • Efficacy Population: Participants who meet specific criteria for efficacy evaluation
  • Safety Population: Participants who received at least one dose of study treatment

This tutorial shows you how to create a population summary table in Python, using Polars for data manipulation and the rtflite package for RTF output.

import polars as pl # Data manipulation
import rtflite as rtf # RTF reporting

3.2 Step 1: Load Data

We start by loading the Subject-level Analysis Dataset (ADSL), which contains population flags for each participant.

adsl = pl.read_parquet("data/adsl.parquet")

Let’s examine the key population flag variables we’ll use:

  • USUBJID: Unique participant identifier
  • TRT01P: Planned treatment group
  • ITTFL: Intent-to-treat population flag (Y/N)
  • EFFFL: Efficacy population flag (Y/N)
  • SAFFL: Safety population flag (Y/N)

adsl.select(["USUBJID", "TRT01P", "ITTFL", "EFFFL", "SAFFL"])
shape: (254, 5)
USUBJID TRT01P ITTFL EFFFL SAFFL
str str str str str
"01-701-1015" "Placebo" "Y" "Y" "Y"
"01-701-1023" "Placebo" "Y" "Y" "Y"
"01-701-1028" "Xanomeline High Dose" "Y" "Y" "Y"
"01-701-1033" "Xanomeline Low Dose" "Y" "Y" "Y"
"01-701-1034" "Xanomeline High Dose" "Y" "Y" "Y"
… … … … …
"01-718-1254" "Xanomeline Low Dose" "Y" "Y" "Y"
"01-718-1328" "Xanomeline High Dose" "Y" "Y" "Y"
"01-718-1355" "Placebo" "Y" "Y" "Y"
"01-718-1371" "Xanomeline High Dose" "Y" "Y" "Y"
"01-718-1427" "Xanomeline High Dose" "Y" "Y" "Y"

3.3 Step 2: Calculate Treatment Group Totals

First, we calculate the total number of randomized participants in each treatment group, which will serve as the denominator for percentage calculations.

totals = adsl.group_by("TRT01P").agg(
    total = pl.len()
)

totals
shape: (3, 2)
TRT01P total
str u32
"Placebo" 86
"Xanomeline Low Dose" 84
"Xanomeline High Dose" 84

3.4 Step 3: Define Helper Function

We create a reusable function to count participants by treatment group for any population subset.

def count_by_treatment(data, population_name):
    """Count participants by treatment group and add population label"""
    return data.group_by("TRT01P").agg(
        n = pl.len()
    ).with_columns(
        population = pl.lit(population_name)
    )

3.5 Step 4: Count Each Population

Now we calculate participant counts for each analysis population.

3.5.1 All Randomized Participants

pop_all = count_by_treatment(
    data=adsl,
    population_name="Participants in population"
)

pop_all
shape: (3, 3)
TRT01P n population
str u32 str
"Xanomeline Low Dose" 84 "Participants in population"
"Placebo" 86 "Participants in population"
"Xanomeline High Dose" 84 "Participants in population"

3.5.2 Intent-to-Treat Population

adsl_itt = adsl.filter(pl.col("ITTFL") == "Y")
pop_itt = count_by_treatment(
    data=adsl_itt,
    population_name="Participants included in ITT population"
)

pop_itt
shape: (3, 3)
TRT01P n population
str u32 str
"Xanomeline Low Dose" 84 "Participants included in ITT p…
"Xanomeline High Dose" 84 "Participants included in ITT p…
"Placebo" 86 "Participants included in ITT p…

3.5.3 Efficacy Population

adsl_eff = adsl.filter(pl.col("EFFFL") == "Y")
pop_eff = count_by_treatment(
    data=adsl_eff,
    population_name="Participants included in efficacy population"
)

pop_eff
shape: (3, 3)
TRT01P n population
str u32 str
"Xanomeline Low Dose" 81 "Participants included in effic…
"Placebo" 79 "Participants included in effic…
"Xanomeline High Dose" 74 "Participants included in effic…

3.5.4 Safety Population

adsl_saf = adsl.filter(pl.col("SAFFL") == "Y")
pop_saf = count_by_treatment(
    data=adsl_saf,
    population_name="Participants included in safety population"
)

pop_saf
shape: (3, 3)
TRT01P n population
str u32 str
"Placebo" 86 "Participants included in safet…
"Xanomeline High Dose" 84 "Participants included in safet…
"Xanomeline Low Dose" 84 "Participants included in safet…

3.6 Step 5: Combine All Populations

We stack all population counts together into a single dataset.

all_populations = pl.concat([
    pop_all,
    pop_itt,
    pop_eff,
    pop_saf
])

all_populations
shape: (12, 3)
TRT01P n population
str u32 str
"Xanomeline Low Dose" 84 "Participants in population"
"Placebo" 86 "Participants in population"
"Xanomeline High Dose" 84 "Participants in population"
"Xanomeline Low Dose" 84 "Participants included in ITT p…
"Xanomeline High Dose" 84 "Participants included in ITT p…
… … …
"Placebo" 79 "Participants included in effic…
"Xanomeline High Dose" 74 "Participants included in effic…
"Placebo" 86 "Participants included in safet…
"Xanomeline High Dose" 84 "Participants included in safet…
"Xanomeline Low Dose" 84 "Participants included in safet…

3.7 Step 6: Calculate Percentages

We join each population count with its treatment group total and calculate the percentage of randomized participants in that treatment group who belong to each population.

stats_with_pct = all_populations.join(
    totals,
    on="TRT01P"
).with_columns(
    pct = (100.0 * pl.col("n") / pl.col("total")).round(1)
)

stats_with_pct
shape: (12, 5)
TRT01P n population total pct
str u32 str u32 f64
"Xanomeline Low Dose" 84 "Participants in population" 84 100.0
"Placebo" 86 "Participants in population" 86 100.0
"Xanomeline High Dose" 84 "Participants in population" 84 100.0
"Xanomeline Low Dose" 84 "Participants included in ITT p… 84 100.0
"Xanomeline High Dose" 84 "Participants included in ITT p… 84 100.0
… … … … …
"Placebo" 79 "Participants included in effic… 86 91.9
"Xanomeline High Dose" 74 "Participants included in effic… 84 88.1
"Placebo" 86 "Participants included in safet… 86 100.0
"Xanomeline High Dose" 84 "Participants included in safet… 84 100.0
"Xanomeline Low Dose" 84 "Participants included in safet… 84 100.0

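As a quick cross-check of the arithmetic, the efficacy percentages can be reproduced by hand from the counts and totals shown in the outputs. This is a plain-Python sketch; pct_of_total is an illustrative helper, not part of the tutorial code.

```python
def pct_of_total(n, total):
    """Percentage of a treatment group's randomized total, rounded to one decimal."""
    return round(100.0 * n / total, 1)

# Efficacy population counts and randomized totals taken from the outputs above.
placebo_pct = pct_of_total(79, 86)    # 91.9
low_dose_pct = pct_of_total(81, 84)   # 96.4
high_dose_pct = pct_of_total(74, 84)  # 88.1
```
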
3.8 Step 7: Format Display Values

For the final table, we format the display text. The all-randomized row shows only the count (“n”), while each subset population shows the count followed by its percentage (“n (%)”).

formatted_stats = stats_with_pct.with_columns(
    # The all-randomized row shows the count only; subsets show "n (pct)"
    display = pl.when(pl.col("population") == "Participants in population")
        .then(pl.col("n").cast(pl.String))
        .otherwise(
            pl.concat_str([
                pl.col("n").cast(pl.String),
                pl.lit(" ("),
                pl.col("pct").cast(pl.String),  # pct was already rounded in Step 6
                pl.lit(")")
            ])
        )
)

formatted_stats
shape: (12, 6)
TRT01P n population total pct display
str u32 str u32 f64 str
"Xanomeline Low Dose" 84 "Participants in population" 84 100.0 "84"
"Placebo" 86 "Participants in population" 86 100.0 "86"
"Xanomeline High Dose" 84 "Participants in population" 84 100.0 "84"
"Xanomeline Low Dose" 84 "Participants included in ITT p… 84 100.0 "84 (100.0)"
"Xanomeline High Dose" 84 "Participants included in ITT p… 84 100.0 "84 (100.0)"
… … … … … …
"Placebo" 79 "Participants included in effic… 86 91.9 "79 (91.9)"
"Xanomeline High Dose" 74 "Participants included in effic… 84 88.1 "74 (88.1)"
"Placebo" 86 "Participants included in safet… 86 100.0 "86 (100.0)"
"Xanomeline High Dose" 84 "Participants included in safet… 84 100.0 "84 (100.0)"
"Xanomeline Low Dose" 84 "Participants included in safet… 84 100.0 "84 (100.0)"

3.9 Step 8: Create Final Table

We reshape the data from long format (one row per treatment-population combination) to wide format (one column per treatment group). The final select puts the treatment columns in the order they should appear in the table.

df_overview = formatted_stats.pivot(
    values="display",
    index="population",
    on="TRT01P",
    maintain_order=True
).select(
    ["population", "Placebo", "Xanomeline Low Dose", "Xanomeline High Dose"]
)

df_overview
shape: (4, 4)
population Placebo Xanomeline Low Dose Xanomeline High Dose
str str str str
"Participants in population" "86" "84" "84"
"Participants included in ITT p… "86 (100.0)" "84 (100.0)" "84 (100.0)"
"Participants included in effic… "79 (91.9)" "81 (96.4)" "74 (88.1)"
"Participants included in safet… "86 (100.0)" "84 (100.0)" "84 (100.0)"

3.10 Step 9: Generate Publication-Ready Output

Finally, we format the population table for regulatory submission using the rtflite package.

doc_overview = rtf.RTFDocument(
    df=df_overview,
    rtf_title=rtf.RTFTitle(
        text=["Analysis Population", "All Participants Randomized"]
    ),
    rtf_column_header=rtf.RTFColumnHeader(
        text=["", "Placebo\nn (%)", "Xanomeline Low Dose\nn (%)", "Xanomeline High Dose\nn (%)"],
        col_rel_width=[4, 2, 2, 2],
        text_justification=["l", "c", "c", "c"],
    ),
    rtf_body=rtf.RTFBody(
        col_rel_width=[4, 2, 2, 2],
        text_justification=["l", "c", "c", "c"],
    ),
    rtf_source=rtf.RTFSource(text=["Source: ADSL dataset"])
)

doc_overview.write_rtf("rtf/tlf_population.rtf")
rtf/tlf_population.rtf
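
The call above writes into an rtf/ subdirectory. Whether write_rtf creates that directory when it is missing is not stated here, so a defensive sketch is to create it first. ensure_parent_dir is a hypothetical helper built only on the standard library, demonstrated in a temporary directory so that nothing is left behind.

```python
import tempfile
from pathlib import Path

def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed and return `path` as a Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target

# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    out_path = ensure_parent_dir(Path(tmp) / "rtf" / "tlf_population.rtf")
    parent_created = out_path.parent.is_dir()
```

In the tutorial's setting, one would call ensure_parent_dir("rtf/tlf_population.rtf") before doc_overview.write_rtf.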