4 TLF overview
Understand the regulatory context and importance of TLFs in clinical study reports. Learn basic concepts of Polars and rtflite for TLF generation.
4.1 Overview
Tables, listings, and figures (TLFs) are essential components of clinical study reports (CSRs) and regulatory submissions. Following ICH E3 guidance, TLFs provide standardized summaries of clinical trial data that support regulatory decision-making.
This chapter provides an overview of creating TLFs using Python, focusing on the tools and workflows demonstrated throughout this book.
4.2 Background
Submitting clinical trial results to regulatory agencies is a crucial aspect of clinical development. The Electronic Common Technical Document (eCTD) has emerged as the global standard format for regulatory submissions. For instance, the United States Food and Drug Administration (US FDA) mandates the use of eCTD for new drug applications and biologics license applications.
A CSR provides comprehensive information about the methods and results of an individual clinical study. To support the statistical analysis, numerous tables, listings, and figures are included within the main text and appendices. The creation of a CSR is a collaborative effort that involves various professionals such as clinicians, medical writers, statisticians, and statistical programmers.
Within an organization, these professionals typically collaborate to define, develop, validate, and deliver the necessary TLFs for a CSR. These TLFs serve to summarize the efficacy and/or safety of the pharmaceutical product under study. In the pharmaceutical industry, Microsoft Word is widely used for CSR preparation. As a result, the deliverables from statisticians and statistical programmers are commonly provided in formats such as .rtf, .doc, and .docx to align with industry standards and requirements.
Each organization may define specific TLF format requirements that differ from the examples in this book. It is advisable to consult and adhere to the guidelines and specifications set by your respective organization when preparing TLFs for submission.
Following ICH E3 guidance, most TLFs in a CSR are located in:
- Section 10: Study participants
- Section 11: Efficacy evaluation
- Section 12: Safety evaluation
- Section 14: Tables, listings, and figures referenced but not included in the text
- Section 16: Appendices
4.3 Datasets
The dataset structure follows the CDISC Analysis Data Model (ADaM).
In this project, we use publicly available CDISC pilot study data, which is accessible through the CDISC GitHub repository.
We have converted these datasets from the .xpt format to the .parquet format for ease of use and compatibility with Python tools.
4.4 Tools
To exemplify the generation of TLFs in RTF format, we rely on the functionality provided by two Python packages:
- Polars: Preparation of datasets in a format suitable for reporting purposes. Polars offers a comprehensive suite of tools and functions for data manipulation and transformation, ensuring that the data is structured appropriately.
- rtflite: Creation of RTF files. The rtflite package offers functions specifically designed for generating RTF files, allowing us to produce TLFs in the desired format.
4.5 Polars
Polars is an open-source library for data manipulation implemented in Rust with Python bindings. It offers exceptional performance while maintaining a user-friendly interface for interactive data analysis.
Key advantages of Polars include:
- Performance: Often substantially faster than pandas, thanks to its Rust implementation, multithreaded execution, and query optimization
- Memory efficiency: Lazy evaluation and columnar storage reduce memory usage
- Familiar syntax: Similar to tidyverse-style pipelines, making it accessible to R users
- Type safety: Strong typing system that catches errors early in development
The creators of Polars have provided exceptional documentation and tutorials that serve as valuable resources for learning and mastering the functionalities of the library.
Furthermore, several books are available that serve as introductions to Polars:
In this book, we assume that the reader has some experience with data manipulation concepts. This prior knowledge enables a more efficient and focused exploration of the clinical reporting concepts presented throughout the book.
To illustrate the basic usage of Polars, let’s work with a sample ADSL dataset. This dataset contains subject-level information from a clinical trial, which will serve as a practical example for generating summaries using Polars.
import polars as pl
# Read clinical data
adsl = pl.read_parquet("data/adsl.parquet")
# Select columns
adsl = adsl.select(["USUBJID", "TRT01A", "AGE", "SEX"])
# Basic data exploration
adsl
| USUBJID | TRT01A | AGE | SEX |
|---|---|---|---|
| str | str | f64 | str |
| "01-701-1015" | "Placebo" | 63.0 | "Female" |
| "01-701-1023" | "Placebo" | 64.0 | "Male" |
| "01-701-1028" | "Xanomeline High Dose" | 71.0 | "Male" |
| … | … | … | … |
| "01-718-1371" | "Xanomeline High Dose" | 69.0 | "Female" |
| "01-718-1427" | "Xanomeline High Dose" | 74.0 | "Female" |
Key Polars operations for clinical reporting are introduced in the following subsections.
4.5.1 I/O
Polars supports multiple data formats for input and output (see the I/O guide). For clinical development, we recommend the .parquet format because tools in Python, R, and Julia can read and write it without conversion. The example below loads subject-level ADSL data with Polars.
import polars as pl
adsl = pl.read_parquet("data/adsl.parquet")
adsl = adsl.select("STUDYID", "USUBJID", "TRT01A", "AGE", "SEX")  # select columns
adsl
| STUDYID | USUBJID | TRT01A | AGE | SEX |
|---|---|---|---|---|
| str | str | str | f64 | str |
| "CDISCPILOT01" | "01-701-1015" | "Placebo" | 63.0 | "Female" |
| "CDISCPILOT01" | "01-701-1023" | "Placebo" | 64.0 | "Male" |
| "CDISCPILOT01" | "01-701-1028" | "Xanomeline High Dose" | 71.0 | "Male" |
| … | … | … | … | … |
| "CDISCPILOT01" | "01-718-1371" | "Xanomeline High Dose" | 69.0 | "Female" |
| "CDISCPILOT01" | "01-718-1427" | "Xanomeline High Dose" | 74.0 | "Female" |
4.5.2 Filtering
Filtering in Polars uses the .filter() method with column expressions. Below are examples applied to the ADSL data.
# Filter female subjects
adsl.filter(pl.col("SEX") == "Female")
| STUDYID | USUBJID | TRT01A | AGE | SEX |
|---|---|---|---|---|
| str | str | str | f64 | str |
| "CDISCPILOT01" | "01-701-1015" | "Placebo" | 63.0 | "Female" |
| "CDISCPILOT01" | "01-701-1034" | "Xanomeline High Dose" | 77.0 | "Female" |
| "CDISCPILOT01" | "01-701-1047" | "Placebo" | 85.0 | "Female" |
| … | … | … | … | … |
| "CDISCPILOT01" | "01-718-1371" | "Xanomeline High Dose" | 69.0 | "Female" |
| "CDISCPILOT01" | "01-718-1427" | "Xanomeline High Dose" | 74.0 | "Female" |
# Filter subjects with Age >= 65
adsl.filter(pl.col("AGE") >= 65)
| STUDYID | USUBJID | TRT01A | AGE | SEX |
|---|---|---|---|---|
| str | str | str | f64 | str |
| "CDISCPILOT01" | "01-701-1028" | "Xanomeline High Dose" | 71.0 | "Male" |
| "CDISCPILOT01" | "01-701-1033" | "Xanomeline Low Dose" | 74.0 | "Male" |
| "CDISCPILOT01" | "01-701-1034" | "Xanomeline High Dose" | 77.0 | "Female" |
| … | … | … | … | … |
| "CDISCPILOT01" | "01-718-1371" | "Xanomeline High Dose" | 69.0 | "Female" |
| "CDISCPILOT01" | "01-718-1427" | "Xanomeline High Dose" | 74.0 | "Female" |
4.5.3 Deriving
Deriving new variables is common in clinical data analysis for creating age groups, BMI categories, or treatment flags. Polars uses .with_columns() to add new columns while keeping existing ones.
# Create age groups
adsl.with_columns(
    pl.when(pl.col("AGE") < 65)
    .then(pl.lit("<65"))
    .otherwise(pl.lit(">=65"))
    .alias("AGECAT")
)
| STUDYID | USUBJID | TRT01A | AGE | SEX | AGECAT |
|---|---|---|---|---|---|
| str | str | str | f64 | str | str |
| "CDISCPILOT01" | "01-701-1015" | "Placebo" | 63.0 | "Female" | "<65" |
| "CDISCPILOT01" | "01-701-1023" | "Placebo" | 64.0 | "Male" | "<65" |
| "CDISCPILOT01" | "01-701-1028" | "Xanomeline High Dose" | 71.0 | "Male" | ">=65" |
| … | … | … | … | … | … |
| "CDISCPILOT01" | "01-718-1371" | "Xanomeline High Dose" | 69.0 | "Female" | ">=65" |
| "CDISCPILOT01" | "01-718-1427" | "Xanomeline High Dose" | 74.0 | "Female" | ">=65" |
4.5.4 Grouping
Grouping operations are fundamental for creating summary statistics in clinical reports. Polars uses group_by() followed by aggregation functions to compute counts, means, and other statistics by categorical variables like treatment groups.
The .len() method provides a quick way to count rows by group. Because ADSL has one record per subject, these row counts are subject counts.
# Count by treatment group
adsl.group_by("TRT01A").len().sort("TRT01A")
| TRT01A | len |
|---|---|
| str | u32 |
| "Placebo" | 86 |
| "Xanomeline High Dose" | 84 |
| "Xanomeline Low Dose" | 84 |
You can also use .agg() with multiple aggregation functions:
# Age statistics by treatment group
adsl.group_by("TRT01A").agg([
    pl.col("AGE").mean().round(1).alias("mean_age"),
    pl.col("AGE").std().round(2).alias("sd_age"),
]).sort("TRT01A")
| TRT01A | mean_age | sd_age |
|---|---|---|
| str | f64 | f64 |
| "Placebo" | 75.2 | 8.59 |
| "Xanomeline High Dose" | 74.4 | 7.89 |
| "Xanomeline Low Dose" | 75.7 | 8.29 |
4.5.5 Joining
Joining datasets is essential for combining subject-level data (ADSL) with event-level data (e.g., ADAE, ADLB). Polars supports various join types, including inner, left, full, semi, and anti joins.
Here is a toy example that splits ADSL and joins it back by USUBJID.
# Create a simple demographics subset
demo = adsl.select("USUBJID", "AGE", "SEX").head(3)
# Create treatment info subset
trt = adsl.select("USUBJID", "TRT01A").head(3)
# Left join to combine datasets
demo.join(trt, on="USUBJID", how="left")
| USUBJID | AGE | SEX | TRT01A |
|---|---|---|---|
| str | f64 | str | str |
| "01-701-1015" | 63.0 | "Female" | "Placebo" |
| "01-701-1023" | 64.0 | "Male" | "Placebo" |
| "01-701-1028" | 71.0 | "Male" | "Xanomeline High Dose" |
4.5.6 Pivoting
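Pivoting transforms data from long to wide format, which is commonly needed when creating tables. Use .pivot() to reshape grouped data into columns. The pivoted columns appear in the order their values are first encountered, and group_by() does not guarantee a row order by default, so the column order below may differ between runs unless the data are sorted before pivoting.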
# Create summary by treatment and sex
(
    adsl
    .group_by(["TRT01A", "SEX"])
    .agg(pl.len().alias("n"))
    .pivot(
        values="n",
        index="SEX",
        on="TRT01A"
    )
)
| SEX | Xanomeline Low Dose | Placebo | Xanomeline High Dose |
|---|---|---|---|
| str | u32 | u32 | u32 |
| "Female" | 50 | 53 | 40 |
| "Male" | 34 | 33 | 44 |
Having covered the essential Polars operations for data manipulation, we now turn to the second component of our clinical reporting workflow: formatting and presenting the processed data in regulatory-compliant RTF format.
4.6 rtflite
import rtflite as rtf
rtflite is a Python package for creating production-ready tables and figures in RTF format. While Polars handles the data processing and statistical calculations, rtflite focuses exclusively on the presentation layer. The package is designed to:
- Provide simple Python classes that map to table elements (title, headers, body, footnotes) for intuitive table construction.
- Offer a canonical Python API with a clear, composable interface.
- Focus exclusively on table formatting and layout, leaving data manipulation to dataframe libraries like polars or pandas.
- Minimize external dependencies for maximum portability and reliability.
Creating an RTF table involves three steps:
- Design the desired table layout and structure.
- Configure the appropriate rtflite components.
- Generate and save the RTF document.
This section introduces rtflite's core components and demonstrates how to turn dataframes into Tables, Listings, and Figures (TLFs) for clinical reporting.
4.6.1 Data: adverse events
To explore the RTF generation capabilities in rtflite, we will use the dataset data/adae.parquet. This dataset contains adverse event (AE) information from a clinical trial.
Below are the meanings of relevant variables:
- USUBJID: Unique Subject Identifier
- TRTA: Actual Treatment
- AEDECOD: Dictionary-Derived Term
# Load adverse events data
df = pl.read_parquet("data/adae.parquet")
df.select(["USUBJID", "TRTA", "AEDECOD"])
| USUBJID | TRTA | AEDECOD |
|---|---|---|
| str | str | str |
| "01-701-1015" | "Placebo" | "APPLICATION SITE ERYTHEMA" |
| "01-701-1015" | "Placebo" | "APPLICATION SITE PRURITUS" |
| "01-701-1015" | "Placebo" | "DIARRHOEA" |
| … | … | … |
| "01-718-1427" | "Xanomeline High Dose" | "DECREASED APPETITE" |
| "01-718-1427" | "Xanomeline High Dose" | "NAUSEA" |
4.6.2 Table-ready data
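In this AE example, we count the AE records for each dictionary-derived term (AEDECOD) by treatment group. Because pl.len() counts rows, a subject who reports the same term more than once contributes more than once; counting unique subjects instead requires pl.col("USUBJID").n_unique().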
tbl = (
    df.group_by(["TRTA", "AEDECOD"])
    .agg(pl.len().alias("n"))
    .sort("TRTA")
    .pivot(values="n", index="AEDECOD", on="TRTA")
    .fill_null(0)
    .sort("AEDECOD")  # Sort by adverse event name to match R output
)
tbl
| AEDECOD | Placebo | Xanomeline High Dose | Xanomeline Low Dose |
|---|---|---|---|
| str | u32 | u32 | u32 |
| "ABDOMINAL DISCOMFORT" | 0 | 1 | 0 |
| "ABDOMINAL PAIN" | 1 | 2 | 3 |
| "ACROCHORDON EXCISION" | 0 | 1 | 0 |
| … | … | … | … |
| "WOUND" | 0 | 0 | 2 |
| "WOUND HAEMORRHAGE" | 0 | 1 | 0 |
4.6.3 Table component classes
rtflite provides dedicated classes for each table component. Commonly used classes include:
- RTFPage: RTF page information (orientation, margins, pagination).
- RTFPageHeader: Page headers with page numbering (compatible with r2rtf).
- RTFPageFooter: Page footers for attribution and notices.
- RTFTitle: RTF title information.
- RTFColumnHeader: RTF column header information.
- RTFBody: RTF table body information.
- RTFFootnote: RTF footnote information.
- RTFSource: RTF data source information.
These component classes work together to build complete RTF documents. A full list of all classes and their parameters can be found in the API reference.
4.6.4 Simple example
A minimal example below illustrates how to combine components to create an RTF table.
- RTFBody() defines the table body layout.
- RTFDocument() transfers table layout information into RTF syntax.
- write_rtf() saves the encoded RTF into a .rtf file.
Output file: rtf/tlf_overview1.rtf
4.6.5 Column width
To give the first column more space, update col_rel_width in RTFBody.
The input of col_rel_width is a list with one entry per column. It defines the relative width of each column within a predefined total table width.
In this example, the relative widths are 3:2:2:2. Because only the ratios matter, col_rel_width = [6, 4, 4, 4] and col_rel_width = [1.5, 1, 1, 1] are equivalent.
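Because only the ratios in col_rel_width matter, column i receives the fraction w_i / sum(w) of the total table width. The plain-Python sketch below illustrates this arithmetic only: it does not call rtflite, the helper names column_fractions and column_widths are hypothetical, and the total width of 9 used in the example is an arbitrary assumption rather than rtflite's default.

```python
from fractions import Fraction


def column_fractions(col_rel_width):
    """Hypothetical helper: share of the total width given to each column."""
    # str() avoids binary floating-point artifacts, e.g. Fraction("1.5") == 3/2
    weights = [Fraction(str(w)) for w in col_rel_width]
    total = sum(weights)
    return [w / total for w in weights]


def column_widths(col_rel_width, total_width):
    """Hypothetical helper: absolute widths for an assumed total table width."""
    total = Fraction(str(total_width))
    return [float(f * total) for f in column_fractions(col_rel_width)]


# All three specifications give the same shares: 1/3, 2/9, 2/9, 2/9
for spec in ([3, 2, 2, 2], [6, 4, 4, 4], [1.5, 1, 1, 1]):
    print(spec, [str(f) for f in column_fractions(spec)])

# With an assumed total width of 9 units
print(column_widths([3, 2, 2, 2], 9))
```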
Output file: rtf/tlf_overview2.rtf
4.6.6 Column headers
In RTFColumnHeader, the text argument provides the column header content as a list of strings.
Output file: rtf/tlf_overview3.rtf
4.6.7 Titles, footnotes, and data source
RTF documents can include additional components to provide context and documentation:
- RTFTitle: Add document titles and subtitles
- RTFFootnote: Add explanatory footnotes
- RTFSource: Add data source attribution
Output file: rtf/tlf_overview5.rtf
Note the use of \\line in column headers to create line breaks within cells.
4.6.8 Text formatting and alignment
rtflite supports various text formatting options:
- Text formatting: Bold (b), italic (i), underline (u), strikethrough (s)
- Text alignment: Left (l), center (c), right (r), justify (j)
- Font properties: Font size, font family
Output file: rtf/tlf_overview6.rtf
4.6.9 Border customization
Table borders can be customized extensively:
- Border styles: single, double, thick, dotted, dashed
- Border sides: border_top, border_bottom, border_left, border_right
- Page borders: border_first, border_last for first/last rows across pages
Output file: rtf/tlf_overview7.rtf
4.7 Next steps
Having covered the fundamental concepts and tools for creating clinical TLFs with Python, readers can explore specific implementations based on their requirements:
Each chapter provides step-by-step tutorials with reproducible code examples that can be adapted for specific clinical reporting requirements.