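import polars as pl    # DataFrame library used throughout
import rtflite as rtf  # RTF table generation

adsl = pl.read_parquet("data/adsl.parquet")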
2 Disposition of Participants Table
2.1 Overview
Clinical trials need to track how participants flow through a study from enrollment to completion. Following ICH E3 guidance, regulatory submissions require a disposition table in Section 10.1 that summarizes:
- Enrolled: Total participants who entered the study
- Completed: Participants who finished the study protocol
- Discontinued: Participants who left early and their reasons
This tutorial shows you how to create a regulatory-compliant disposition table using Python’s rtflite package.
2.2 Step 1: Load Data
We start by loading the Subject-level Analysis Dataset (ADSL), which contains all participant information needed for our disposition table.
The ADSL dataset stores participant-level information, including treatment assignments and study completion status. The data is stored in Parquet format.
Let’s examine the key variables we’ll use to build our disposition table:
- USUBJID: Unique identifier for each participant
- TRT01P: Treatment name (text)
- TRT01PN: Treatment group (numeric code)
- DISCONFL: Flag indicating if participant discontinued ("Y" when discontinued; the rows previewed below show an empty string otherwise)
- DCREASCD: Specific reason for discontinuation
adsl.select(["USUBJID", "TRT01P", "TRT01PN", "DISCONFL", "DCREASCD"])
USUBJID | TRT01P | TRT01PN | DISCONFL | DCREASCD |
---|---|---|---|---|
str | str | i64 | str | str |
"01-701-1015" | "Placebo" | 0 | "" | "Completed" |
"01-701-1023" | "Placebo" | 0 | "Y" | "Adverse Event" |
"01-701-1028" | "Xanomeline High Dose" | 81 | "" | "Completed" |
"01-701-1033" | "Xanomeline Low Dose" | 54 | "Y" | "Sponsor Decision" |
"01-701-1034" | "Xanomeline High Dose" | 81 | "" | "Completed" |
… | … | … | … | … |
"01-718-1254" | "Xanomeline Low Dose" | 54 | "" | "Completed" |
"01-718-1328" | "Xanomeline High Dose" | 81 | "Y" | "Withdrew Consent" |
"01-718-1355" | "Placebo" | 0 | "" | "Completed" |
"01-718-1371" | "Xanomeline High Dose" | 81 | "Y" | "Adverse Event" |
"01-718-1427" | "Xanomeline High Dose" | 81 | "Y" | "Lack of Efficacy" |
2.3 Step 2: Count Total Participants
First, we count how many participants were enrolled in each treatment group.
We group participants by treatment arm and count them using .group_by() and .agg(). The .pivot() operation reshapes our data from long format (rows for each treatment) to wide format (columns for each treatment), which matches the standard disposition table layout.
n_rand = (
    adsl
    .group_by("TRT01PN")
    .agg(n = pl.len())
    .with_columns([
        pl.lit("Participants in population").alias("row"),
        pl.lit(None, dtype=pl.Float64).alias("pct")  # Placeholder for percentage (not applicable for totals)
    ])
    .pivot(
        index="row",
        on="TRT01PN",
        values=["n", "pct"],
        sort_columns=True
    )
)
n_rand
row | n_0 | n_54 | n_81 | pct_0 | pct_54 | pct_81 |
---|---|---|---|---|---|---|
str | u32 | u32 | u32 | f64 | f64 | f64 |
"Participants in population" | 86 | 84 | 84 | null | null | null |
2.4 Step 3: Count Completed Participants
Next, we identify participants who successfully completed the study and calculate what percentage they represent of each treatment group.
We filter for participants where DCREASCD == "Completed", then calculate both counts and percentages. The .join() operation brings in the total count for each treatment group so we can compute percentages.
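The percentage itself is plain arithmetic: the arm's completed count divided by the arm's total, times 100, rounded to one decimal. Here is a small worked check in plain Python, using the counts this tutorial reports for the population and completed rows. The dictionary names are illustrative.

```python
# Counts as reported in this tutorial's outputs: treatment code -> participants.
population = {0: 86, 54: 84, 81: 84}
completed = {0: 58, 54: 25, 81: 27}

# Percentage of each arm's population, rounded to one decimal place.
pct_completed = {
    arm: round(100.0 * completed[arm] / population[arm], 1)
    for arm in population
}

print(pct_completed)  # {0: 67.4, 54: 29.8, 81: 32.1}
```

An independent recomputation like this is a cheap way to confirm that the denominator is the full arm size and not the filtered subset.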
n_complete = (
    adsl
    .filter(pl.col("DCREASCD") == "Completed")
    .group_by("TRT01PN")
    .agg(n = pl.len())
    .join(
        adsl.group_by("TRT01PN").agg(total = pl.len()),
        on="TRT01PN"
    )
    .with_columns([
        pl.lit("Completed").alias("row"),
        (100.0 * pl.col("n") / pl.col("total")).round(1).alias("pct")
    ])
    .pivot(
        index="row",
        on="TRT01PN",
        values=["n", "pct"],
        sort_columns=True
    )
)
n_complete
row | n_0 | n_54 | n_81 | pct_0 | pct_54 | pct_81 |
---|---|---|---|---|---|---|
str | u32 | u32 | u32 | f64 | f64 | f64 |
"Completed" | 58 | 25 | 27 | 67.4 | 29.8 | 32.1 |
2.5 Step 4: Count Discontinued Participants
Now we count participants who left the study early, regardless of their specific reason.
We filter for participants where the discontinuation flag DISCONFL == "Y", then follow the same pattern of counting and calculating percentages within each treatment group.
n_disc = (
    adsl
    .filter(pl.col("DISCONFL") == "Y")
    .group_by("TRT01PN")
    .agg(n = pl.len())
    .join(
        adsl.group_by("TRT01PN").agg(total = pl.len()),
        on="TRT01PN"
    )
    .with_columns([
        pl.lit("Discontinued").alias("row"),
        (100.0 * pl.col("n") / pl.col("total")).round(1).alias("pct")
    ])
    .pivot(
        index="row",
        on="TRT01PN",
        values=["n", "pct"],
        sort_columns=True
    )
)
n_disc
row | n_0 | n_54 | n_81 | pct_0 | pct_54 | pct_81 |
---|---|---|---|---|---|---|
str | u32 | u32 | u32 | f64 | f64 | f64 |
"Discontinued" | 28 | 59 | 57 | 32.6 | 70.2 | 67.9 |
2.6 Step 5: Break Down Discontinuation Reasons
For regulatory reporting, we need to show the specific reasons why participants discontinued.
We filter out completed participants, then group by both treatment and discontinuation reason. The indentation (four spaces) in the row labels helps show these are subcategories under “Discontinued”. We also use .fill_null(0) to handle cases where certain discontinuation reasons don’t occur in all treatment groups.
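Because each participant carries a single DCREASCD value, the per-reason counts in each arm should add up to that arm's Discontinued count. The check below is plain Python and uses the counts this tutorial reports. The variable names are illustrative.

```python
# Counts as reported in this tutorial's outputs, in arm order (0, 54, 81).
discontinued = (28, 59, 57)
reasons = {
    "Adverse Event": (8, 44, 40),
    "Death": (2, 1, 0),
    "I/E Not Met": (1, 0, 2),
    "Lack of Efficacy": (3, 0, 1),
    "Lost to Follow-up": (1, 1, 0),
    "Physician Decision": (1, 0, 2),
    "Protocol Violation": (1, 1, 1),
    "Sponsor Decision": (2, 2, 3),
    "Withdrew Consent": (9, 10, 8),
}

# Sum each arm's column across all reasons.
reason_totals = tuple(sum(col) for col in zip(*reasons.values()))

print(reason_totals == discontinued)  # True
```

The zeros in this reconciliation are the cells that .fill_null(0) supplies. Without that step, a reason absent from an arm would appear as null instead of 0.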
n_reason = (
    adsl
    .filter(pl.col("DCREASCD") != "Completed")
    .group_by(["TRT01PN", "DCREASCD"])
    .agg(n = pl.len())
    .join(
        adsl.group_by("TRT01PN").agg(total = pl.len()),
        on="TRT01PN"
    )
    .with_columns([
        pl.concat_str([pl.lit("    "), pl.col("DCREASCD")]).alias("row"),
        (100.0 * pl.col("n") / pl.col("total")).round(1).alias("pct")
    ])
    .pivot(
        index="row",
        on="TRT01PN",
        values=["n", "pct"],
        sort_columns=True
    )
    .with_columns([
        pl.col(["n_0", "n_54", "n_81"]).fill_null(0),
        pl.col(["pct_0", "pct_54", "pct_81"]).fill_null(0.0)
    ])
    .sort("row")
)
n_reason
row | n_0 | n_54 | n_81 | pct_0 | pct_54 | pct_81 |
---|---|---|---|---|---|---|
str | u32 | u32 | u32 | f64 | f64 | f64 |
" Adverse Event" | 8 | 44 | 40 | 9.3 | 52.4 | 47.6 |
" Death" | 2 | 1 | 0 | 2.3 | 1.2 | 0.0 |
" I/E Not Met" | 1 | 0 | 2 | 1.2 | 0.0 | 2.4 |
" Lack of Efficacy" | 3 | 0 | 1 | 3.5 | 0.0 | 1.2 |
" Lost to Follow-up" | 1 | 1 | 0 | 1.2 | 1.2 | 0.0 |
" Physician Decision" | 1 | 0 | 2 | 1.2 | 0.0 | 2.4 |
" Protocol Violation" | 1 | 1 | 1 | 1.2 | 1.2 | 1.2 |
" Sponsor Decision" | 2 | 2 | 3 | 2.3 | 2.4 | 3.6 |
" Withdrew Consent" | 9 | 10 | 8 | 10.5 | 11.9 | 9.5 |
2.7 Step 6: Combine All Results
Now we stack all our individual summaries together to create the complete disposition table.
Using pl.concat(), we combine the enrollment counts, completion counts, discontinuation counts, and detailed discontinuation reasons into a single table that flows logically from top to bottom.
tbl_disp = pl.concat([
    n_rand,
    n_complete,
    n_disc,
    n_reason
])
tbl_disp
row | n_0 | n_54 | n_81 | pct_0 | pct_54 | pct_81 |
---|---|---|---|---|---|---|
str | u32 | u32 | u32 | f64 | f64 | f64 |
"Participants in population" | 86 | 84 | 84 | null | null | null |
"Completed" | 58 | 25 | 27 | 67.4 | 29.8 | 32.1 |
"Discontinued" | 28 | 59 | 57 | 32.6 | 70.2 | 67.9 |
" Adverse Event" | 8 | 44 | 40 | 9.3 | 52.4 | 47.6 |
" Death" | 2 | 1 | 0 | 2.3 | 1.2 | 0.0 |
… | … | … | … | … | … | … |
" Lost to Follow-up" | 1 | 1 | 0 | 1.2 | 1.2 | 0.0 |
" Physician Decision" | 1 | 0 | 2 | 1.2 | 0.0 | 2.4 |
" Protocol Violation" | 1 | 1 | 1 | 1.2 | 1.2 | 1.2 |
" Sponsor Decision" | 2 | 2 | 3 | 2.3 | 2.4 | 3.6 |
" Withdrew Consent" | 9 | 10 | 8 | 10.5 | 11.9 | 9.5 |
2.8 Step 7: Generate Publication-Ready Output
Finally, we render our table as RTF using the rtflite package.
The RTFDocument class handles the complex formatting required for clinical reports, including proper column headers, borders, and spacing. The resulting RTF file can be directly included in regulatory submissions or converted to PDF for review.
doc_disp = rtf.RTFDocument(
    df=tbl_disp.select("row", "n_0", "pct_0", "n_54", "pct_54", "n_81", "pct_81"),
    rtf_title=rtf.RTFTitle(text=["Disposition of Participants"]),
    rtf_column_header=[
        rtf.RTFColumnHeader(
            text=["", "Placebo", "Xanomeline Low Dose", "Xanomeline High Dose"],
            col_rel_width=[3] + [2] * 3,
            text_justification=["l"] + ["c"] * 3,
        ),
        rtf.RTFColumnHeader(
            text=["", "n", "(%)", "n", "(%)", "n", "(%)"],
            col_rel_width=[3] + [1] * 6,
            text_justification=["l"] + ["c"] * 6,
            border_top=[""] + ["single"] * 6,
            border_left=["single"] + ["single", ""] * 3
        )
    ],
    rtf_body=rtf.RTFBody(
        col_rel_width=[3] + [1] * 6,
        text_justification=["l"] + ["c"] * 6,
        border_left=["single"] + ["single", ""] * 3
    ),
    rtf_source=rtf.RTFSource(text=["Source: ADSL dataset"])  # Required source attribution
)
doc_disp.write_rtf("rtf/tlf_disposition.rtf")  # Save as RTF for submission
rtf/tlf_disposition.rtf
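The two header rows and the body use different col_rel_width lists, yet the treatment headers still line up with their n and (%) columns. This plain-Python sketch shows why, under the assumption that relative widths are scaled proportionally to the table width. That behavior is inferred from the parameter name, not confirmed here, and `column_edges` is a hypothetical helper, not part of rtflite.

```python
from fractions import Fraction
from itertools import accumulate

# Relative widths copied from the RTFDocument call above.
group_header = [3] + [2] * 3  # label column + three treatment headers
body = [3] + [1] * 6          # label column + (n, %) pair per treatment


def column_edges(rel_widths):
    """Return each column's right edge as a fraction of the total width."""
    total = sum(rel_widths)
    return [Fraction(edge, total) for edge in accumulate(rel_widths)]


header_edges = column_edges(group_header)
body_edges = column_edges(body)

# Every treatment header should end exactly where a body column ends.
aligned = set(header_edges) <= set(body_edges)
print(aligned)  # True
```

Both lists sum to 9, so each treatment header of width 2 spans exactly two body columns of width 1. Changing one list without the other would break that alignment.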