3 Study Population Table
3.1 Overview
Clinical trials define multiple analysis populations, each with its own inclusion criteria. Following ICH E3 guidance, regulatory submissions must clearly document the number of participants in each analysis population to support the validity of the statistical analyses.
The key analysis populations typically include:
- All Randomized: All participants who were randomized into the study
- Intent-to-Treat (ITT): Participants included in the primary efficacy analysis
- Efficacy Population: Participants who meet specific criteria for efficacy evaluation
- Safety Population: Participants who received at least one dose of study treatment
This tutorial shows you how to create a population summary table using Python’s rtflite package.
3.2 Step 1: Load Data
We start by loading the Subject-level Analysis Dataset (ADSL), which contains population flags for each participant.
import polars as pl
import rtflite as rtf

adsl = pl.read_parquet("data/adsl.parquet")
Let’s examine the key variables we’ll use:
- USUBJID: Unique participant identifier
- TRT01P: Planned treatment group
- ITTFL: Intent-to-treat population flag (Y/N)
- EFFFL: Efficacy population flag (Y/N)
- SAFFL: Safety population flag (Y/N)
adsl.select(["USUBJID", "TRT01P", "ITTFL", "EFFFL", "SAFFL"])
USUBJID | TRT01P | ITTFL | EFFFL | SAFFL |
---|---|---|---|---|
str | str | str | str | str |
"01-701-1015" | "Placebo" | "Y" | "Y" | "Y" |
"01-701-1023" | "Placebo" | "Y" | "Y" | "Y" |
"01-701-1028" | "Xanomeline High Dose" | "Y" | "Y" | "Y" |
"01-701-1033" | "Xanomeline Low Dose" | "Y" | "Y" | "Y" |
"01-701-1034" | "Xanomeline High Dose" | "Y" | "Y" | "Y" |
… | … | … | … | … |
"01-718-1254" | "Xanomeline Low Dose" | "Y" | "Y" | "Y" |
"01-718-1328" | "Xanomeline High Dose" | "Y" | "Y" | "Y" |
"01-718-1355" | "Placebo" | "Y" | "Y" | "Y" |
"01-718-1371" | "Xanomeline High Dose" | "Y" | "Y" | "Y" |
"01-718-1427" | "Xanomeline High Dose" | "Y" | "Y" | "Y" |
3.3 Step 2: Calculate Treatment Group Totals
First, we calculate the total number of randomized participants in each treatment group, which will serve as the denominator for percentage calculations.
totals = adsl.group_by("TRT01P").agg(
    total=pl.len()
)
totals
TRT01P | total |
---|---|
str | u32 |
"Placebo" | 86 |
"Xanomeline Low Dose" | 84 |
"Xanomeline High Dose" | 84 |
3.4 Step 3: Define Helper Function
We create a reusable function to count participants by treatment group for any population subset.
def count_by_treatment(data, population_name):
    """Count participants by treatment group and add population label"""
    return data.group_by("TRT01P").agg(
        n=pl.len()
    ).with_columns(
        population=pl.lit(population_name)
    )
3.5 Step 4: Count Each Population
Now we calculate participant counts for each analysis population.
3.5.1 All Randomized Participants
pop_all = count_by_treatment(
    data=adsl,
    population_name="Participants in population"
)
pop_all
TRT01P | n | population |
---|---|---|
str | u32 | str |
"Xanomeline Low Dose" | 84 | "Participants in population" |
"Placebo" | 86 | "Participants in population" |
"Xanomeline High Dose" | 84 | "Participants in population" |
3.5.2 Intent-to-Treat Population
adsl_itt = adsl.filter(pl.col("ITTFL") == "Y")
pop_itt = count_by_treatment(
    data=adsl_itt,
    population_name="Participants included in ITT population"
)
pop_itt
TRT01P | n | population |
---|---|---|
str | u32 | str |
"Xanomeline Low Dose" | 84 | "Participants included in ITT p… |
"Xanomeline High Dose" | 84 | "Participants included in ITT p… |
"Placebo" | 86 | "Participants included in ITT p… |
3.5.3 Efficacy Population
adsl_eff = adsl.filter(pl.col("EFFFL") == "Y")
pop_eff = count_by_treatment(
    data=adsl_eff,
    population_name="Participants included in efficacy population"
)
pop_eff
TRT01P | n | population |
---|---|---|
str | u32 | str |
"Xanomeline Low Dose" | 81 | "Participants included in effic… |
"Placebo" | 79 | "Participants included in effic… |
"Xanomeline High Dose" | 74 | "Participants included in effic… |
3.5.4 Safety Population
adsl_saf = adsl.filter(pl.col("SAFFL") == "Y")
pop_saf = count_by_treatment(
    data=adsl_saf,
    population_name="Participants included in safety population"
)
pop_saf
TRT01P | n | population |
---|---|---|
str | u32 | str |
"Placebo" | 86 | "Participants included in safet… |
"Xanomeline High Dose" | 84 | "Participants included in safet… |
"Xanomeline Low Dose" | 84 | "Participants included in safet… |
3.6 Step 5: Combine All Populations
We stack all population counts together into a single dataset.
all_populations = pl.concat([
    pop_all,
    pop_itt,
    pop_eff,
    pop_saf
])
all_populations
TRT01P | n | population |
---|---|---|
str | u32 | str |
"Xanomeline Low Dose" | 84 | "Participants in population" |
"Placebo" | 86 | "Participants in population" |
"Xanomeline High Dose" | 84 | "Participants in population" |
"Xanomeline Low Dose" | 84 | "Participants included in ITT p… |
"Xanomeline High Dose" | 84 | "Participants included in ITT p… |
… | … | … |
"Placebo" | 79 | "Participants included in effic… |
"Xanomeline High Dose" | 74 | "Participants included in effic… |
"Placebo" | 86 | "Participants included in safet… |
"Xanomeline High Dose" | 84 | "Participants included in safet… |
"Xanomeline Low Dose" | 84 | "Participants included in safet… |
3.7 Step 6: Calculate Percentages
We join the population counts with the treatment group totals and calculate each population's share of the randomized participants in the same treatment group.
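As a quick arithmetic check of the formula, pct = 100 × n / total rounded to one decimal, the plain-Python sketch below uses the efficacy-population counts and treatment group totals reported in this chapter. The function name `pct_of_total` is hypothetical.

```python
def pct_of_total(n, total):
    """Percentage of the treatment-group total, rounded to one decimal."""
    if total == 0:
        raise ValueError("total must be non-zero")
    return round(100.0 * n / total, 1)


# (efficacy count, randomized total) per treatment group, from this chapter's tables.
efficacy = {
    "Placebo": (79, 86),
    "Xanomeline Low Dose": (81, 84),
    "Xanomeline High Dose": (74, 84),
}

for arm, (n, total) in efficacy.items():
    print(arm, pct_of_total(n, total))
```

The denominator is always the randomized total for the same treatment group, never the overall study total.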
stats_with_pct = all_populations.join(
    totals,
    on="TRT01P"
).with_columns(
    pct=(100.0 * pl.col("n") / pl.col("total")).round(1)
)
stats_with_pct
TRT01P | n | population | total | pct |
---|---|---|---|---|
str | u32 | str | u32 | f64 |
"Xanomeline Low Dose" | 84 | "Participants in population" | 84 | 100.0 |
"Placebo" | 86 | "Participants in population" | 86 | 100.0 |
"Xanomeline High Dose" | 84 | "Participants in population" | 84 | 100.0 |
"Xanomeline Low Dose" | 84 | "Participants included in ITT p… | 84 | 100.0 |
"Xanomeline High Dose" | 84 | "Participants included in ITT p… | 84 | 100.0 |
… | … | … | … | … |
"Placebo" | 79 | "Participants included in effic… | 86 | 91.9 |
"Xanomeline High Dose" | 74 | "Participants included in effic… | 84 | 88.1 |
"Placebo" | 86 | "Participants included in safet… | 86 | 100.0 |
"Xanomeline High Dose" | 84 | "Participants included in safet… | 84 | 100.0 |
"Xanomeline Low Dose" | 84 | "Participants included in safet… | 84 | 100.0 |
3.8 Step 7: Format Display Values
For the final table, we format the display text. The all-randomized row shows just the count, “n”, while the subset populations show “n (%)”.
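The formatting rule itself is small enough to state in plain Python. This sketch uses a hypothetical helper, `format_cell`, that returns the bare count when no percentage is supplied and “n (%)” with one decimal place otherwise.

```python
def format_cell(n, pct=None):
    """Format one table cell: 'n' for the total row, 'n (pct)' for subsets."""
    if pct is None:
        return str(n)
    # :.1f always prints exactly one decimal place, e.g. 100.0 rather than 100.
    return f"{n} ({pct:.1f})"


print(format_cell(86))        # total randomized row
print(format_cell(79, 91.9))  # subset population row
```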
formatted_stats = stats_with_pct.with_columns(
    display=pl.when(pl.col("population") == "Participants in population")
    .then(pl.col("n").cast(str))
    .otherwise(
        pl.concat_str([
            pl.col("n").cast(str),
            pl.lit(" ("),
            pl.col("pct").round(1).cast(str),
            pl.lit(")")
        ])
    )
)
formatted_stats
TRT01P | n | population | total | pct | display |
---|---|---|---|---|---|
str | u32 | str | u32 | f64 | str |
"Xanomeline Low Dose" | 84 | "Participants in population" | 84 | 100.0 | "84" |
"Placebo" | 86 | "Participants in population" | 86 | 100.0 | "86" |
"Xanomeline High Dose" | 84 | "Participants in population" | 84 | 100.0 | "84" |
"Xanomeline Low Dose" | 84 | "Participants included in ITT p… | 84 | 100.0 | "84 (100.0)" |
"Xanomeline High Dose" | 84 | "Participants included in ITT p… | 84 | 100.0 | "84 (100.0)" |
… | … | … | … | … | … |
"Placebo" | 79 | "Participants included in effic… | 86 | 91.9 | "79 (91.9)" |
"Xanomeline High Dose" | 74 | "Participants included in effic… | 84 | 88.1 | "74 (88.1)" |
"Placebo" | 86 | "Participants included in safet… | 86 | 100.0 | "86 (100.0)" |
"Xanomeline High Dose" | 84 | "Participants included in safet… | 84 | 100.0 | "84 (100.0)" |
"Xanomeline Low Dose" | 84 | "Participants included in safet… | 84 | 100.0 | "84 (100.0)" |
3.9 Step 8: Create Final Table
We reshape the data from long format (rows for each treatment-population combination) to wide format (columns for each treatment group).
df_overview = formatted_stats.pivot(
    values="display",
    index="population",
    on="TRT01P",
    maintain_order=True
).select(
    ["population", "Placebo", "Xanomeline Low Dose", "Xanomeline High Dose"]
)
df_overview
population | Placebo | Xanomeline Low Dose | Xanomeline High Dose |
---|---|---|---|
str | str | str | str |
"Participants in population" | "86" | "84" | "84" |
"Participants included in ITT p… | "86 (100.0)" | "84 (100.0)" | "84 (100.0)" |
"Participants included in effic… | "79 (91.9)" | "81 (96.4)" | "74 (88.1)" |
"Participants included in safet… | "86 (100.0)" | "84 (100.0)" | "84 (100.0)" |
3.10 Step 9: Generate Publication-Ready Output
Finally, we format the population table for regulatory submission using the rtflite package.
doc_overview = rtf.RTFDocument(
    df=df_overview,
    rtf_title=rtf.RTFTitle(
        text=["Analysis Population", "All Participants Randomized"]
    ),
    rtf_column_header=rtf.RTFColumnHeader(
        text=["", "Placebo\nn (%)", "Xanomeline Low Dose\nn (%)", "Xanomeline High Dose\nn (%)"],
        col_rel_width=[4, 2, 2, 2],
        text_justification=["l", "c", "c", "c"],
    ),
    rtf_body=rtf.RTFBody(
        col_rel_width=[4, 2, 2, 2],
        text_justification=["l", "c", "c", "c"],
    ),
    rtf_source=rtf.RTFSource(text=["Source: ADSL dataset"])
)
doc_overview.write_rtf("rtf/tlf_population.rtf")
rtf/tlf_population.rtf
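After writing the file, a lightweight check can confirm that something RTF-like landed on disk. The sketch below relies only on the fact that RTF documents begin with the characters `{\rtf`; the helper name `looks_like_rtf` is hypothetical, and the demonstration writes a stand-in file to a temporary directory rather than reading the real output.

```python
import tempfile
from pathlib import Path


def looks_like_rtf(path):
    """Return True if the file exists, is non-empty, and starts with the RTF header."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size == 0:
        return False
    with p.open("rb") as fh:
        return fh.read(5) == b"{\\rtf"


# Demonstration on a stand-in file, not the output of rtflite.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.rtf"
    demo.write_bytes(b"{\\rtf1\\ansi Hello}")
    demo_ok = looks_like_rtf(demo)
    missing_ok = looks_like_rtf(Path(tmp) / "missing.rtf")

print(demo_ok, missing_ok)
```

In this chapter's workflow the call would be `looks_like_rtf("rtf/tlf_population.rtf")`. It does not validate the table contents, only that the file exists and carries an RTF header.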