4 TLF overview

Objective

Understand the regulatory context and importance of TLFs in clinical study reports. Learn basic concepts of Polars and rtflite for TLF generation.

4.1 Overview

Tables, listings, and figures (TLFs) are essential components of clinical study reports (CSRs) and regulatory submissions. Following ICH E3 guidance, TLFs provide standardized summaries of clinical trial data that support regulatory decision-making.

This chapter provides an overview of creating TLFs using Python, focusing on the tools and workflows demonstrated throughout this book.

4.2 Background

Submitting clinical trial results to regulatory agencies is a crucial aspect of clinical development. The Electronic Common Technical Document (eCTD) has emerged as the global standard format for regulatory submissions. For instance, the United States Food and Drug Administration (US FDA) mandates the use of eCTD for new drug applications and biologics license applications.

A CSR provides comprehensive information about the methods and results of an individual clinical study. To support the statistical analysis, numerous tables, listings, and figures are included within the main text and appendices. The creation of a CSR is a collaborative effort that involves various professionals such as clinicians, medical writers, statisticians, and statistical programmers.

Within an organization, these professionals typically collaborate to define, develop, validate, and deliver the necessary TLFs for a CSR. These TLFs summarize the efficacy and/or safety of the pharmaceutical product under study. In the pharmaceutical industry, Microsoft Word is widely used for CSR preparation. As a result, statisticians and statistical programmers commonly deliver TLFs in formats such as .rtf, .doc, and .docx.

Note

Each organization may define specific TLF format requirements that differ from the examples in this book. It is advisable to consult and adhere to the guidelines and specifications set by your respective organization when preparing TLFs for submission.

Following the ICH E3 guidance, most TLFs in a CSR are located in:

Section 10: Study participants
Section 11: Efficacy evaluation
Section 12: Safety evaluation
Section 14: Tables, listings, and figures referenced but not included in the text
Section 16: Appendices

4.3 Datasets

The dataset structure follows CDISC Analysis Data Model (ADaM).

In this project, we use publicly available CDISC pilot study data, which is accessible through the CDISC GitHub repository.

We have converted these datasets from the .xpt format to the .parquet format for ease of use and compatibility with Python tools.

4.4 Tools

To exemplify the generation of TLFs in RTF format, we rely on the functionality provided by two Python packages:

Polars: Preparation of datasets in a format suitable for reporting purposes. Polars offers a comprehensive suite of tools and functions for data manipulation and transformation, ensuring that the data is structured appropriately.
rtflite: Creation of RTF files. The rtflite package offers functions specifically designed for generating RTF files, allowing us to produce TLFs in the desired format.

4.5 Polars

Polars is an open-source library for data manipulation implemented in Rust with Python bindings. It offers exceptional performance while maintaining a user-friendly interface for interactive data analysis.

Key advantages of Polars include:

Performance: Often much faster than pandas for common operations, thanks to its multithreaded Rust implementation; the actual speedup depends on the workload
Memory efficiency: Lazy evaluation and columnar storage reduce memory usage
Familiar syntax: Similar to tidyverse-style pipelines, making it accessible to R users
Type safety: Strong typing system that catches errors early in development

The creators of Polars have provided exceptional documentation and tutorials that serve as valuable resources for learning and mastering the functionalities of the library.

Furthermore, the following book serves as an introduction to Polars:

Python Polars: The Definitive Guide

Note

In this book, we assume that the reader has some experience with data manipulation concepts. This prior knowledge enables a more efficient and focused exploration of the clinical reporting concepts presented throughout the book.

To illustrate the basic usage of Polars, let’s work with a sample ADSL dataset. This dataset contains subject-level information from a clinical trial, which will serve as a practical example for generating summaries using Polars.

import polars as pl

# Read clinical data
adsl = pl.read_parquet("data/adsl.parquet")

# Select columns
adsl = adsl.select(["USUBJID", "TRT01A", "AGE", "SEX"])

# Basic data exploration
adsl

shape: (254, 4)

USUBJID	TRT01A	AGE	SEX
str	str	f64	str
"01-701-1015"	"Placebo"	63.0	"Female"
"01-701-1023"	"Placebo"	64.0	"Male"
"01-701-1028"	"Xanomeline High Dose"	71.0	"Male"
…	…	…	…
"01-718-1371"	"Xanomeline High Dose"	69.0	"Female"
"01-718-1427"	"Xanomeline High Dose"	74.0	"Female"

Key Polars operations for clinical reporting include:

4.5.1 I/O

Polars supports multiple data formats for input and output (see the I/O guide). For clinical development, we recommend the .parquet format because tools in Python, R, and Julia can read and write it without conversion. The example below loads subject-level ADSL data with Polars.

import polars as pl

adsl = pl.read_parquet("data/adsl.parquet")
adsl = adsl.select("STUDYID", "USUBJID", "TRT01A", "AGE", "SEX") # select columns
adsl

shape: (254, 5)

STUDYID	USUBJID	TRT01A	AGE	SEX
str	str	str	f64	str
"CDISCPILOT01"	"01-701-1015"	"Placebo"	63.0	"Female"
"CDISCPILOT01"	"01-701-1023"	"Placebo"	64.0	"Male"
"CDISCPILOT01"	"01-701-1028"	"Xanomeline High Dose"	71.0	"Male"
…	…	…	…	…
"CDISCPILOT01"	"01-718-1371"	"Xanomeline High Dose"	69.0	"Female"
"CDISCPILOT01"	"01-718-1427"	"Xanomeline High Dose"	74.0	"Female"

4.5.2 Filtering

Filtering in Polars uses the .filter() method with column expressions. Below are examples applied to the ADSL data.

# Filter female subjects
adsl.filter(pl.col("SEX") == "Female")

shape: (143, 5)

STUDYID	USUBJID	TRT01A	AGE	SEX
str	str	str	f64	str
"CDISCPILOT01"	"01-701-1015"	"Placebo"	63.0	"Female"
"CDISCPILOT01"	"01-701-1034"	"Xanomeline High Dose"	77.0	"Female"
"CDISCPILOT01"	"01-701-1047"	"Placebo"	85.0	"Female"
…	…	…	…	…
"CDISCPILOT01"	"01-718-1371"	"Xanomeline High Dose"	69.0	"Female"
"CDISCPILOT01"	"01-718-1427"	"Xanomeline High Dose"	74.0	"Female"

# Filter subjects with Age >= 65
adsl.filter(pl.col("AGE") >= 65)

shape: (221, 5)

STUDYID	USUBJID	TRT01A	AGE	SEX
str	str	str	f64	str
"CDISCPILOT01"	"01-701-1028"	"Xanomeline High Dose"	71.0	"Male"
"CDISCPILOT01"	"01-701-1033"	"Xanomeline Low Dose"	74.0	"Male"
"CDISCPILOT01"	"01-701-1034"	"Xanomeline High Dose"	77.0	"Female"
…	…	…	…	…
"CDISCPILOT01"	"01-718-1371"	"Xanomeline High Dose"	69.0	"Female"
"CDISCPILOT01"	"01-718-1427"	"Xanomeline High Dose"	74.0	"Female"

4.5.3 Deriving

Deriving new variables is common in clinical data analysis for creating age groups, BMI categories, or treatment flags. Polars uses .with_columns() to add new columns while keeping existing ones.

# Create age groups
adsl.with_columns([
    pl.when(pl.col("AGE") < 65)
      .then(pl.lit("<65"))
      .otherwise(pl.lit(">=65"))
      .alias("AGECAT")
])

shape: (254, 6)

STUDYID	USUBJID	TRT01A	AGE	SEX	AGECAT
str	str	str	f64	str	str
"CDISCPILOT01"	"01-701-1015"	"Placebo"	63.0	"Female"	"<65"
"CDISCPILOT01"	"01-701-1023"	"Placebo"	64.0	"Male"	"<65"
"CDISCPILOT01"	"01-701-1028"	"Xanomeline High Dose"	71.0	"Male"	">=65"
…	…	…	…	…	…
"CDISCPILOT01"	"01-718-1371"	"Xanomeline High Dose"	69.0	"Female"	">=65"
"CDISCPILOT01"	"01-718-1427"	"Xanomeline High Dose"	74.0	"Female"	">=65"

4.5.4 Grouping

Grouping operations are fundamental for creating summary statistics in clinical reports. Polars uses .group_by() followed by aggregation functions to compute counts, means, and other statistics by categorical variables like treatment groups.

The .len() method provides a quick way to get record counts by group. Because ADSL has one record per subject, these are also subject counts.

# Count by treatment group
adsl.group_by("TRT01A").len().sort("TRT01A")

shape: (3, 2)

TRT01A	len
str	u32
"Placebo"	86
"Xanomeline High Dose"	84
"Xanomeline Low Dose"	84

You can also use .agg() with multiple aggregation functions:

# Age statistics by treatment group
adsl.group_by("TRT01A").agg([
    pl.col("AGE").mean().round(1).alias("mean_age"),
    pl.col("AGE").std().round(2).alias("sd_age")
]).sort("TRT01A")

shape: (3, 3)

TRT01A	mean_age	sd_age
str	f64	f64
"Placebo"	75.2	8.59
"Xanomeline High Dose"	74.4	7.89
"Xanomeline Low Dose"	75.7	8.29

4.5.5 Joining

Joining datasets is essential for combining subject-level data (ADSL) with event-level data (e.g. ADAE, ADLB). Polars supports various join types including inner, left, and full joins.

Here is a toy example that splits ADSL and joins it back by USUBJID.

# Create a simple demographics subset
demo = adsl.select("USUBJID", "AGE", "SEX").head(3)

# Create treatment info subset
trt = adsl.select("USUBJID", "TRT01A").head(3)

# Left join to combine datasets
demo.join(trt, on="USUBJID", how="left")

shape: (3, 4)

USUBJID	AGE	SEX	TRT01A
str	f64	str	str
"01-701-1015"	63.0	"Female"	"Placebo"
"01-701-1023"	64.0	"Male"	"Placebo"
"01-701-1028"	71.0	"Male"	"Xanomeline High Dose"

4.5.6 Pivoting

Pivoting transforms data from long to wide format, commonly needed for creating tables. Use .pivot() to reshape grouped data into columns.

# Create summary by treatment and sex
(
    adsl
        .group_by(["TRT01A", "SEX"])
        .agg(pl.len().alias("n"))
        .pivot(
            values="n",
            index="SEX",
            on="TRT01A"
        )
)

shape: (2, 4)

SEX	Xanomeline Low Dose	Placebo	Xanomeline High Dose
str	u32	u32	u32
"Female"	50	53	40
"Male"	34	33	44

Having covered the essential Polars operations for data manipulation, we now turn to the second component of our clinical reporting workflow: formatting and presenting the processed data in regulatory-compliant RTF format.

4.6 rtflite

import rtflite as rtf

rtflite is a Python package for creating production-ready tables and figures in RTF format. While Polars handles the data processing and statistical calculations, rtflite focuses exclusively on the presentation layer. The package is designed to:

Provide simple Python classes that map to table elements (title, headers, body, footnotes) for intuitive table construction.
Offer a canonical Python API with a clear, composable interface.
Focus exclusively on table formatting and layout, leaving data manipulation to dataframe libraries like Polars or pandas.
Minimize external dependencies for maximum portability and reliability.

Creating an RTF table involves three steps:

Design the desired table layout and structure.
Configure the appropriate rtflite components.
Generate and save the RTF document.

This section introduces rtflite’s core components and demonstrates how to turn dataframes into TLFs for clinical reporting.

4.6.1 Data: adverse events

To explore the RTF generation capabilities in rtflite, we will use the dataset data/adae.parquet. This dataset contains adverse event (AE) information from a clinical trial.

Below are the meanings of relevant variables:

USUBJID: Unique Subject Identifier
TRTA: Actual Treatment
AEDECOD: Dictionary-Derived Term

# Load adverse events data
df = pl.read_parquet("data/adae.parquet")

df.select(["USUBJID", "TRTA", "AEDECOD"])

shape: (1_191, 3)

USUBJID	TRTA	AEDECOD
str	str	str
"01-701-1015"	"Placebo"	"APPLICATION SITE ERYTHEMA"
"01-701-1015"	"Placebo"	"APPLICATION SITE PRURITUS"
"01-701-1015"	"Placebo"	"DIARRHOEA"
…	…	…
"01-718-1427"	"Xanomeline High Dose"	"DECREASED APPETITE"
"01-718-1427"	"Xanomeline High Dose"	"NAUSEA"

4.6.2 Table-ready data

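In this AE example, we count the AE records for each AE term (AEDECOD) by treatment group. Note that pl.len() counts records rather than unique subjects, so a subject who reports the same event twice contributes two to the count.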

tbl = (
    df.group_by(["TRTA", "AEDECOD"])
    .agg(pl.len().alias("n"))
    .sort("TRTA")
    .pivot(values="n", index="AEDECOD", on="TRTA")
    .fill_null(0)
    .sort("AEDECOD")  # Sort by adverse event name to match R output
)

tbl

shape: (242, 4)

AEDECOD	Placebo	Xanomeline High Dose	Xanomeline Low Dose
str	u32	u32	u32
"ABDOMINAL DISCOMFORT"	0	1	0
"ABDOMINAL PAIN"	1	2	3
"ACROCHORDON EXCISION"	0	1	0
…	…	…	…
"WOUND"	0	0	2
"WOUND HAEMORRHAGE"	0	1	0

4.6.3 Table component classes

rtflite provides dedicated classes for each table component. Commonly used classes include:

RTFPage: RTF page information (orientation, margins, pagination).
RTFPageHeader: Page headers with page numbering (compatible with r2rtf).
RTFPageFooter: Page footers for attribution and notices.
RTFTitle: RTF title information.
RTFColumnHeader: RTF column header information.
RTFBody: RTF table body information.
RTFFootnote: RTF footnote information.
RTFSource: RTF data source information.

These component classes work together to build complete RTF documents. A full list of all classes and their parameters can be found in the API reference.

4.6.4 Simple example

A minimal example below illustrates how to combine components to create an RTF table.

RTFBody() defines table body layout.
RTFDocument() converts the table layout information into RTF syntax.
write_rtf() saves encoded RTF into a .rtf file.

rtf/tlf_overview1.rtf

4.6.5 Column width

If we want to adjust the width of each column to provide more space to the first column, this can be achieved by updating col_rel_width in RTFBody.

The input of col_rel_width is a list with the same length as the number of columns. This argument defines the relative width of each column within a pre-defined total column width.

In this example, the defined relative width is 3:2:2:2. Only the ratio of col_rel_width is used, so col_rel_width = [6, 4, 4, 4] or col_rel_width = [1.5, 1, 1, 1] is equivalent.
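The arithmetic behind "only the ratio is used" can be checked with a few lines of plain Python. This is an illustrative model, not rtflite's internal code: the helper name column_widths and the total width of 9 inches are hypothetical.

```python
import math


def column_widths(rel_width, total_width):
    """Split total_width across columns in proportion to rel_width."""
    total = sum(rel_width)
    return [total_width * w / total for w in rel_width]


# Hypothetical total table width in inches
TOTAL_WIDTH = 9.0

widths_a = column_widths([3, 2, 2, 2], TOTAL_WIDTH)
widths_b = column_widths([6, 4, 4, 4], TOTAL_WIDTH)
widths_c = column_widths([1.5, 1, 1, 1], TOTAL_WIDTH)

# All three specifications describe the same 3:2:2:2 split
print(widths_a)
print(widths_b)
print(widths_c)
```

Under this model, each specification gives the first column one third of the total width and each remaining column two ninths.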

rtf/tlf_overview2.rtf

4.6.6 Column headers

In RTFColumnHeader, the text argument provides the column header content as a list of strings.

rtf/tlf_overview3.rtf

Column headers can also span multiple lines. If a column needs an empty header, insert an empty string, for example ["name 1", "", "name 3"].

In RTFColumnHeader, the col_rel_width can be used to align column headers with different numbers of columns.

By using RTFColumnHeader with col_rel_width, one can customize complex column headers. If there are multiple pages, the column header will repeat on each page by default.

rtf/tlf_overview4.rtf

4.6.7 Titles, footnotes, and data source

RTF documents can include additional components to provide context and documentation:

RTFTitle: Add document titles and subtitles
RTFFootnote: Add explanatory footnotes
RTFSource: Add data source attribution

rtf/tlf_overview5.rtf

Note the use of \\line in column headers to create line breaks within cells.

4.6.8 Text formatting and alignment

rtflite supports various text formatting options:

Text formatting: Bold (b), italic (i), underline (u), strikethrough (s)
Text alignment: Left (l), center (c), right (r), justify (j)
Font properties: Font size, font family

rtf/tlf_overview6.rtf

4.6.9 Border customization

Table borders can be customized extensively:

Border styles: single, double, thick, dotted, dashed
Border sides: border_top, border_bottom, border_left, border_right
Page borders: border_first, border_last for first/last rows across pages

rtf/tlf_overview7.rtf

4.7 Next steps

Having covered the fundamental concepts and tools for creating clinical TLFs with Python, readers can explore specific implementations in the chapters that follow. Each chapter provides step-by-step tutorials with reproducible code examples that can be adapted for specific clinical reporting requirements.