11 Packaging overview
Understand the concept of analysis packages for clinical study reports. Learn how Python packages provide structure, reproducibility, and compliance for regulatory submissions.
11.1 What is an analysis package
An analysis package is a Python package designed specifically to organize analysis scripts and code for a clinical trial project.
Unlike general-purpose Python packages distributed on PyPI, analysis packages serve as:
- Project containers for clinical trial deliverables
- Reproducible environments for analyses
- Submission-ready structures for regulatory review
Think of it as combining:
- Python package structure (for code organization)
- Quarto project (for report generation)
- Regulatory requirements (for eCTD submission)
11.2 Why use an analysis package
Clinical trial projects have unique needs that standard Python projects may not address:
Regulatory compliance:
- FDA requires submission of analysis programs in ASCII text format
- Reviewers must be able to reproduce your results
- Documentation must explain the analysis process
Team collaboration:
- Multiple statisticians and programmers work on hundreds of tables
- Consistent structure reduces communication overhead
- Shared functions avoid code duplication
Long-term maintenance:
- Analysis must be reproducible years later
- Environment must be reconstructable
- Code and data provenance must be clear
The Python package structure addresses these needs systematically.
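As a small illustration of the ASCII text requirement listed above, the following sketch scans a project directory for source files that contain non-ASCII bytes. The function name `find_non_ascii_files` and the list of file suffixes are assumptions made for this example; they are not part of any tool described in this book.

```python
from pathlib import Path
import tempfile


def find_non_ascii_files(root, suffixes=(".py", ".qmd", ".toml", ".md")):
    """Return files under root (with the given suffixes) that are not pure ASCII."""
    offenders = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # bytes.isascii() is True only if every byte is below 0x80
            if not path.read_bytes().isascii():
                offenders.append(path)
    return offenders


# Demo on a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "ok.py").write_text("x = 1\n", encoding="utf-8")
    # "\u00b5" is the micro sign, which encodes to non-ASCII bytes in UTF-8
    (root / "bad.py").write_text("label = '\u00b5g'\n", encoding="utf-8")
    offenders = [p.name for p in find_non_ascii_files(root)]

print(offenders)
```

A check like this could run before packing, so that non-ASCII characters are found early rather than during submission preparation.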
11.3 Analysis package vs standard package
Python packages serve different purposes depending on context.
Standard Python package (for PyPI):
- Purpose: Share reusable functionality
- Audience: General Python community
- Scope: Generic, broadly applicable functions
- Examples: polars, plotnine, rtflite
Analysis package (for submissions):
- Purpose: Organize trial-specific analyses
- Audience: Study team and regulators
- Scope: Study-specific tables, listings, figures
- Example: demo001 (DEMO-001 study analysis)
In R terms, an analysis package is like a project-specific R package (e.g., esubdemo), as opposed to a general-purpose CRAN package (e.g., dplyr).
11.4 Key components
A typical analysis package contains:
Python package structure:
- pyproject.toml: Project metadata and dependencies
- src/studyname/: Study-specific Python functions
- tests/: Validation and testing code
- uv.lock: Exact dependency versions
Analysis content:
- analysis/: Quarto documents for TLFs
- data/: ADaM datasets (input)
- output/: Generated tables, listings, figures (output)
Documentation:
- README.md: Project overview
- _quarto.yml: Quarto book configuration
This structure supports the complete lifecycle: development, validation, and submission.
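The layout above can be checked programmatically. The sketch below lists which of the expected top-level files and directories are missing from a project root. The helper `missing_components` and the `EXPECTED` list are hypothetical names for this example, and the study-specific folder under `src/` is not checked because its name varies by project.

```python
from pathlib import Path
import tempfile

# Top-level entries named in this section; adjust for your own project
EXPECTED = [
    "pyproject.toml",
    "uv.lock",
    "README.md",
    "_quarto.yml",
    "src",
    "tests",
    "analysis",
    "data",
    "output",
]


def missing_components(root, expected=EXPECTED):
    """Return the expected entries that do not exist under root, in order."""
    return [name for name in expected if not (Path(root) / name).exists()]


# Demo on a throwaway directory that holds only part of the structure
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pyproject.toml").write_text("", encoding="utf-8")
    (root / "src").mkdir()
    (root / "tests").mkdir()
    missing = missing_components(root)

print(missing)
```

An empty result means every listed component is present; it says nothing about whether the contents are valid.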
11.5 Demo project
This book uses demo-py-esub as the demonstration project.
The project shows how to:
- Organize analysis code as a Python package
- Generate clinical study reports with Quarto
- Prepare deliverables for eCTD submission
Clone the project to follow along:
git clone https://github.com/elong0527/demo-py-esub.git
cd demo-py-esub
The project generates six TLFs:
- Disposition of patients
- Study population
- Baseline characteristics
- Efficacy analysis (ANCOVA)
- Adverse events summary
- Adverse events (specific)
These cover the most common clinical reporting scenarios.
11.6 Workflow overview
The typical workflow for an analysis package:
1. Project setup:
- Initialize Python package with uv
- Configure Quarto for report generation
- Set up version control with Git
2. Development:
- Write analysis functions in src/
- Create Quarto documents in analysis/
- Generate TLFs in output/
3. Validation:
- Write tests in tests/
- Perform independent review
- Verify outputs match specifications
4. Submission:
- Pack package into text files with pkglite
- Place files in eCTD Module 5 structure
- Update ADRG with reproduction instructions
The following chapters detail each stage.
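To make the development and validation stages concrete, here is a minimal sketch of a shared analysis function and a matching test. The function `count_by_arm` and the sample records are invented for illustration and are not taken from the demo project; `USUBJID` and `TRT01P` are standard ADaM variable names used here only as example keys. In a real package the function would live under src/ and the test under tests/, run by a test runner such as pytest.

```python
from collections import Counter


def count_by_arm(records, arm_key="TRT01P"):
    """Count records per treatment arm.

    records is a list of dicts, one per subject.
    """
    return dict(Counter(record[arm_key] for record in records))


def test_count_by_arm():
    """Check the counts against a small, hand-verified input."""
    records = [
        {"USUBJID": "001", "TRT01P": "Placebo"},
        {"USUBJID": "002", "TRT01P": "Drug"},
        {"USUBJID": "003", "TRT01P": "Drug"},
    ]
    assert count_by_arm(records) == {"Placebo": 1, "Drug": 2}


# Run the test directly so the sketch is self-checking
test_count_by_arm()
```

Keeping the logic in a function rather than inside a Quarto document lets several TLFs reuse it and lets tests exercise it in isolation.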
11.7 Benefits of this approach
Using Python packages for clinical analysis provides:
Consistency:
- Standard structure across all projects
- Team members know where to find code and outputs
- Reduces onboarding time for new projects
Automation:
- uv manages dependencies automatically
- Quarto renders all reports in batch
- Testing frameworks verify correctness
Reproducibility:
- uv.lock ensures exact dependency versions
- .python-version specifies Python version
- Repository snapshots freeze package ecosystem
Compliance:
- Built-in documentation with docstrings
- Testing infrastructure for validation
- Standard structure simplifies review
For regulatory submissions, reproducibility is not optional. FDA reviewers must be able to reconstruct your environment and verify your results.
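One piece of this can be verified in code: whether the running interpreter matches the version pinned for the project. The sketch below assumes the pinned value is a plain dotted version string such as "3.12" or "3.12.4"; other formats that a `.python-version` file may contain are not handled. The helper name `matches_pinned_version` is hypothetical.

```python
import sys


def matches_pinned_version(pinned, running=sys.version_info[:3]):
    """Return True if the running version starts with the pinned components.

    A pin of "3.12" matches any 3.12.x interpreter, while a pin of
    "3.12.4" matches only that exact release.
    """
    parts = tuple(int(part) for part in pinned.strip().split("."))
    return tuple(running[: len(parts)]) == parts


# Demo: the current interpreter always matches its own major.minor version
current = ".".join(str(number) for number in sys.version_info[:2])
print(matches_pinned_version(current))
```

In a project, the pinned string would be read from the `.python-version` file at the repository root before running the analyses.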
11.8 What’s next
The next chapters cover:
- Package structure: Organizing code and content (Chapter 12)
- Project management: Git-centric workflows for collaboration (Chapter 13)
- Submission overview: eCTD requirements and pkglite (Chapter 14)
- Submission package: Packing for eCTD Module 5 (Chapter 15)
- Submission dryrun: Verifying reproducibility (Chapter 16)
With this foundation, you’re ready to learn how to manage analysis packages effectively.