R/Pharma 2025 Workshop
2025-11-07
Four parts of this workshop:
Python environment setup (Nan)
Use uv to create and manage reproducible Python projects. Develop and collaborate in GitHub Codespaces, Visual Studio Code, or Positron.
Python packages for clinical reporting (Yilong)
A guided tour of essential packages such as polars and rtflite, with demonstrations of creating TLFs commonly used in clinical trials.
Manage clinical trial A&R projects (Yilong)
Practical project structure, conventions, and execution from data to deliverables.
Prepare eCTD submission packages (Nan)
An example workflow for assembling submission-ready source code and outputs using py-pkglite, aligned with eCTD requirements.
The views and opinions expressed in this presentation are those of the individual presenters and do not represent those of their affiliated organizations or institutions.
With Python, learn how to:
Note
The toolchain, process, and formats may differ across organizations. We present one common way to address them.
Note
Interested in R? Check https://r4csr.org/
R/Pharma organizers
Team members from Meta Platforms and Merck & Co., Inc., Rahway, NJ, USA
Contributors of pycsr and r4csr training materials
In this workshop, we assume you have basic Python programming experience and clinical development knowledge.
Pre-configured Codespaces for pycsr book
Examples:
adsl, adae, etc.
Training material: https://pycsr.org/
During the workshop, we will use the pycsr and demo-py-esub projects.
We share the automation philosophy of the R community, as described in Section 1.1 of the R Packages book, which we quote here.
Three recommended options:
GitHub Codespaces
Positron
VS Code
uv is a modern Python package and project manager written in Rust.
Replaces a scattered toolchain:
pip + venv + pyenv + pip-tools + setuptools
Benefits:
pyproject.toml as the single source of truth
Skip if using GitHub Codespaces: uv is pre-installed there.
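macOS/Linux: curl -LsSf https://astral.sh/uv/install.sh | sh
Windows: powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Verify: uv --version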
Ruff - Code formatting and linting
mypy - Type checking
pytest - Testing framework
All configured in pyproject.toml.
Virtual environments are mandatory in Python
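uv manages a project-local .venv, which is created by uv sync or uv run. You can check which environment a script is running in by comparing sys.prefix with sys.base_prefix, because the two differ only inside a virtual environment. This small helper is a sketch and is not part of uv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment.

    A venv sets sys.prefix to the environment directory, while
    sys.base_prefix keeps pointing at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"virtual environment active: {in_virtualenv()}")
    print(f"interpreter: {sys.executable}")
```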
Dependency locking
uv.lock pins exact versions (like renv.lock in R)
.python-version file pins the Python version (e.g., 3.14.0)
The ICH E3: Structure and Content of Clinical Study Reports guideline provides guidance to assist sponsors in the development of a CSR.
In a CSR, most TLFs are located in:
Publicly available CDISC pilot study data located at the CDISC GitHub repository.
The dataset structure follows the CDISC Analysis Data Model (ADaM).
Source data: https://github.com/elong0527/r4csr/tree/main/data-adam
Converted parquet data: https://github.com/nanxstats/pycsr/tree/main/data
polars: a Python package for data manipulation, similar to the dplyr/tidyr R packages
rtflite: a Python package for creating production-ready tables and figures in RTF format, similar to the R package r2rtf
https://pycsr.org/tlf-overview.html#polars
Modern Python dataframe library designed for performance and expressiveness.
Key advantages:
Core operations:
Counting participants:
Calculating percentages:
Pivoting to wide format:
In the pharmaceutical industry, RTF/Microsoft Word play a central role in preparing clinical study reports.
Different organizations can have different table standards.
The rtflite package provides the flexibility to customize table appearance for:
Note
The rtflite package can also convert figures into RTF format.
rtflite provides Python classes, such as RTFDocument, that map to table elements. The goal is to help you translate data frames into tables in RTF files.
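rtflite hides the RTF syntax behind its classes, but it helps to see what a table ultimately becomes. The hand-rolled sketch below uses only the standard library and is not rtflite's API or implementation. Each row becomes \trowd ... \row, with one \cellx right boundary per cell measured in twips (1440 twips per inch). The sketch escapes only the RTF special characters \, { and }, so it assumes ASCII cell text.

```python
def rtf_escape(text: str) -> str:
    """Escape the characters that have special meaning in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def rows_to_rtf(rows: list[list[str]], col_width_twips: int = 2000) -> str:
    """Render rows of cell strings as a bare-bones RTF document with one table."""
    lines = [r"{\rtf1\ansi\deff0{\fonttbl{\f0 Times New Roman;}}"]
    for row in rows:
        lines.append(r"\trowd\trgaph108")  # start a row definition
        # The right boundary of cell i is the cumulative width of cells 0..i.
        lines.append(
            "".join(rf"\cellx{col_width_twips * (i + 1)}" for i in range(len(row)))
        )
        # Each cell is an in-table paragraph terminated by \cell.
        cells = "".join(rf"\pard\intbl {rtf_escape(cell)}\cell" for cell in row)
        lines.append(cells + r"\row")  # \row closes the table row
    lines.append("}")
    return "\n".join(lines)


# Hypothetical counts for illustration only.
example = rows_to_rtf([["Treatment", "n"], ["Placebo", "2"], ["Active", "3"]])
print(example)
```

In practice, writing `example` to a .rtf file gives a document that Word can open. rtflite adds the titles, headers, footnotes, borders, and pagination that this sketch omits.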
rtflite
Key concepts:
.pivot() to reshape data to wide format
.fill_null(0) to replace missing counts with 0
Key concepts:
Key concepts:
Key concepts:
Key concepts:
.n_unique()
Key concepts:
.str.to_titlecase()
A Python package designed specifically to organize analysis scripts and code for a clinical trial project.
Purpose:
Our primary focus is creating a standard folder structure to organize the project, with 4 goals in mind:
Combines:
demo-py-esub/
├── pyproject.toml # Project metadata
├── .python-version # Python version
├── uv.lock # Locked dependencies
├── src/demo001/ # Study-specific code
│ ├── __init__.py
│ └── utils.py
├── analysis/ # Quarto analysis docs
│ └── tlf-*.qmd
├── data/ # ADaM datasets
├── output/ # Generated TLFs
└── tests/ # Validation tests
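A standard layout like this is easy to check automatically, for example in CI. This is a minimal sketch: the path list mirrors the tree above, and src/demo001 would change from study to study.

```python
from pathlib import Path

# Expected entries from the demo-py-esub tree; "src/demo001" is study-specific.
REQUIRED: tuple[str, ...] = (
    "pyproject.toml",
    ".python-version",
    "uv.lock",
    "src/demo001/__init__.py",
    "analysis",
    "data",
    "output",
    "tests",
)


def missing_paths(root: Path, required: tuple[str, ...] = REQUIRED) -> list[str]:
    """Return the required entries that do not exist under the project root."""
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    gaps = missing_paths(Path.cwd())
    print("layout OK" if not gaps else f"missing: {', '.join(gaps)}")
```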
Consistency
Reproducibility
uv.lock pins dependencies
.python-version specifies the Python version
Automation
uv sync restores the environment
quarto render generates outputs
pytest validates code
Compliance
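Here is a sketch of how pytest validates study code. It shows a hypothetical formatting helper of the kind that could live in src/demo001/utils.py, together with tests that could live in tests/test_utils.py. Neither function is taken from demo-py-esub. pytest collects any function named test_* and reports a failing assert as a test failure.

```python
# Hypothetical helper (e.g. src/demo001/utils.py).
def fmt_n_pct(n: int, big_n: int, digits: int = 1) -> str:
    """Format a participant count and its percentage of big_n as 'n (pct%)'."""
    if big_n <= 0:
        raise ValueError("big_n must be positive")
    pct = 100 * n / big_n
    return f"{n} ({pct:.{digits}f}%)"


# Hypothetical tests (e.g. tests/test_utils.py), collected by pytest.
def test_fmt_n_pct() -> None:
    assert fmt_n_pct(1, 3) == "1 (33.3%)"
    assert fmt_n_pct(2, 2) == "2 (100.0%)"


def test_fmt_n_pct_rejects_zero_denominator() -> None:
    try:
        fmt_n_pct(1, 0)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a zero denominator")
```

With pytest installed, pytest.raises(ValueError) is the more idiomatic way to write the second test.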
Core principle: All project assets in version control.
Plain text workflow:
.qmd files for analysis (not .ipynb for final deliverables)
.md files for documentation
.toml files for configuration
.xlsx files for tracking
Project tracking:
Planning:
Development:
analysis/ and src/
Validation:
tests/
Quality checks (ruff, mypy, pytest)
Delivery:
quarto render
FDA Study Data Technical Conformance Guide Section 4.1.2.10:
Submit programs for primary and secondary efficacy analyses. Specify software in ADRG. Use ASCII text format. No executable extensions.
Goal: Enable reviewers to understand and confirm analysis algorithms.
Analysis package: https://github.com/elong0527/demo-py-esub
Submission package: https://github.com/elong0527/demo-py-ectd
Clone and explore to see complete examples.
m5/datasets/<study-id>/analysis/adam/
├── datasets/
│ ├── *.xpt # ADaM datasets
│ ├── define.xml
│ ├── adrg.pdf # Instructions
│ └── analysis-results-metadata.pdf
└── programs/
├── py0pkgs.txt # Packed Python package
├── tlf-01-*.txt # Analysis programs
└── tlf-02-*.txt
Key: All files in programs/ must be ASCII text.
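You can check the ASCII and extension rules before packaging. This minimal sketch assumes .txt is the only allowed extension in programs/, as in the tree above. It flags any other extension and any non-ASCII byte, and it demonstrates itself on two throwaway files.

```python
import tempfile
from pathlib import Path


def check_programs_dir(programs: Path) -> list[str]:
    """List rule violations for the files directly inside an eCTD programs/ folder."""
    problems: list[str] = []
    for path in sorted(p for p in programs.iterdir() if p.is_file()):
        # Assumption: .txt is the only allowed extension, as in the tree above.
        if path.suffix != ".txt":
            problems.append(f"{path.name}: extension {path.suffix or '(none)'} is not .txt")
        # Every byte must be 7-bit ASCII.
        if not path.read_bytes().isascii():
            problems.append(f"{path.name}: contains non-ASCII bytes")
    return problems


# Self-check on two throwaway files: one compliant, one breaking both rules.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "tlf-01-demo.txt").write_text("print('ok')\n", encoding="ascii")
    (demo / "tlf-02-demo.py").write_text("label = 'caf\u00e9'\n", encoding="utf-8")
    demo_problems = check_programs_dir(demo)

print(demo_problems)
```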
Packs Python projects into portable text files.
Why needed:
pkglite capabilities:
.txt file
Documentation: https://pharmaverse.github.io/py-pkglite/
1. Create .pkgliteignore
2. Pack the package
3. Convert Quarto to Python scripts
Render .qmd -> verify it works
Convert .qmd -> .ipynb -> .py
Format with ruff
Rename to .txt (no .py extension)
Human-readable Debian Control File (DCF) format:
# Generated by py-pkglite
# Use `pkglite unpack` to restore
Package: demo-py-esub
File: pyproject.toml
Format: text
Content:
[project]
name = "demo001"
version = "0.1.0"
...
Reviewers can read without special tools.
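To make the DCF layout concrete, here is a toy round trip in the same spirit. Each file becomes one record of Key: value fields. File lines are indented by two spaces, because DCF continuation lines must start with whitespace. This is an illustration under those assumptions, not py-pkglite's implementation; use pkglite pack and pkglite unpack for actual submissions.

```python
HEADER = "# Generated by a DCF packing sketch (not py-pkglite)\n"


def pack(package: str, files: dict[str, str]) -> str:
    """Serialize text files into DCF-style records, one record per file."""
    records = []
    for name, text in files.items():
        # Indent every content line so it reads as a DCF continuation line.
        content = "\n".join("  " + line for line in text.splitlines())
        records.append(
            f"Package: {package}\nFile: {name}\nFormat: text\nContent:\n{content}"
        )
    return HEADER + "\n" + "\n\n".join(records) + "\n"


def unpack(packed: str) -> dict[str, str]:
    """Invert pack(); assumes non-empty files that end with a newline."""
    # Drop comment lines; indented content lines never start with "#".
    body = "\n".join(
        line for line in packed.splitlines() if not line.startswith("#")
    ).strip("\n")
    files: dict[str, str] = {}
    for record in body.split("\n\n"):
        lines = record.split("\n")
        # The first three lines are "Key: value" fields; the fourth is "Content:".
        fields = dict(line.split(": ", 1) for line in lines[:3])
        files[fields["File"]] = "\n".join(line[2:] for line in lines[4:]) + "\n"
    return files


files = {
    "pyproject.toml": '[project]\nname = "demo001"\nversion = "0.1.0"\n',
    "src/demo001/__init__.py": '"""Study functions."""\n',
}
packed = pack("demo-py-esub", files)
restored = unpack(packed)
print(packed)
```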
Document the Python environment:
Python environment:
| Software | Version | Description |
|---|---|---|
| Python | 3.14.0 | Programming language |
| uv | 0.9.9 | Package manager |
Packages:
| Package | Version | Description |
|---|---|---|
| polars | 1.35.1 | Data manipulation |
| rtflite | 1.1.0 | RTF generation |
| demo001 | 0.1.0 | Study functions |
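You can fill the Version columns from the environment instead of typing them by hand. This sketch uses the standard library's importlib.metadata and should run inside the project environment, for example with uv run. The script name adrg_versions.py is hypothetical. The Description column is left to the author, and uv's own version comes from uv --version rather than from Python.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def adrg_version_rows(packages: list[str]) -> list[str]:
    """Return Markdown table rows with installed versions for an ADRG draft."""
    rows = ["| Software | Version |", "|---|---|"]
    rows.append(f"| Python | {platform.python_version()} |")
    for name in packages:
        try:
            installed = version(name)  # looked up by distribution name
        except PackageNotFoundError:
            installed = "not installed"
        rows.append(f"| {name} | {installed} |")
    return rows


if __name__ == "__main__":
    # e.g. `uv run python adrg_versions.py` from the project root (hypothetical file).
    print("\n".join(adrg_version_rows(["polars", "rtflite", "demo001"])))
```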
Appendix: Step-by-step reproduction instructions.
Essential: Simulate reviewer experience before submission.
Workflow:
uvx pkglite unpack programs/py0pkgs.txt -o .
cd demo-py-esub && uv sync
uv run python ../programs/tlf-01-*.txt (repeat for each tlf-*.txt program)
Catches: missing dependencies, path errors, platform issues.
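The final step, running the programs, can be scripted so that every program is executed and any failure is reported. This sketch assumes it is launched with uv run from the unpacked project, so that sys.executable is the project's .venv interpreter. Each program runs in its own process, because a single python call given several files runs only the first and passes the rest as arguments.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_programs(programs_dir: Path, pattern: str = "tlf-*.txt") -> dict[str, int]:
    """Run each matching program in a separate interpreter; return exit codes by name."""
    results: dict[str, int] = {}
    for script in sorted(programs_dir.glob(pattern)):
        # Python executes a source file regardless of its extension; programs run
        # from the current working directory, as in the manual workflow.
        completed = subprocess.run([sys.executable, str(script)], check=False)
        results[script.name] = completed.returncode
    return results


# Self-check with two throwaway programs: one succeeds, one exits with code 3.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "tlf-01-ok.txt").write_text("print('ok')\n")
    (demo / "tlf-02-fail.txt").write_text("raise SystemExit(3)\n")
    demo_results = run_programs(demo)

failed = [name for name, code in demo_results.items() if code != 0]
print(demo_results, "failed:", failed)
```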
Book:
Regulatory:
Technical: