12 Package structure
Learn the recommended directory structure for clinical analysis packages. Understand how to organize Python code, Quarto documents, data, and outputs in a reproducible, submission-ready layout.
12.1 Core principle
Organize clinical analysis projects as valid Python packages that leverage the entire Python packaging toolchain (especially uv).
Additionally, integrate essential components for clinical reporting:
- Quarto documents for reproducible analysis
- ADaM datasets as inputs
- Generated TLFs as outputs
This hybrid structure combines Python package best practices with clinical trial deliverable requirements.
12.2 Complete example structure
A clinical analysis package requires these essential components:
demo-py-esub/
├── pyproject.toml
├── .python-version
├── uv.lock
├── README.md
├── .gitignore
├── _quarto.yml
├── index.qmd
├── src/
│   └── demo001/
│       ├── __init__.py
│       ├── utils.py
│       ├── baseline.py
│       ├── efficacy.py
│       ├── population.py
│       └── safety.py
├── analysis/
│   ├── tlf-01-disposition.qmd
│   ├── tlf-02-population.qmd
│   ├── tlf-03-baseline.qmd
│   ├── tlf-04-efficacy-ancova.qmd
│   ├── tlf-05-ae-summary.qmd
│   └── tlf-06-specific.qmd
├── data/
│   ├── adsl.parquet
│   ├── adae.parquet
│   ├── adlbc.parquet
│   ├── advs.parquet
│   └── adtte.parquet
├── output/
│   ├── tlf-disposition.rtf
│   ├── tlf-population.rtf
│   ├── tlf-baseline.rtf
│   ├── tlf-efficacy-ancova.rtf
│   ├── tlf-ae-summary.rtf
│   └── tlf-ae-specific.rtf
└── tests/
    ├── __init__.py
    ├── test_utils.py
    ├── test_baseline.py
    └── data/
        └── adsl_subset.parquet
This structure satisfies both Python packaging standards and regulatory submission requirements.
In R terms, this combines:
- R package structure (DESCRIPTION, R/, tests/)
- Analysis project layout (vignettes/, data/, output/)
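The layout can also be checked mechanically before work starts. The following is a minimal sketch rather than part of the demo project: `REQUIRED_PATHS` and `missing_components` are hypothetical names, and the list of required paths is an assumption drawn from the example tree (output/ is left out because it is generated).

```python
from pathlib import Path

# Assumed required components, taken from the example tree (hypothetical list).
REQUIRED_PATHS = [
    "pyproject.toml",
    ".python-version",
    "uv.lock",
    "README.md",
    "_quarto.yml",
    "index.qmd",
    "src",
    "analysis",
    "data",
    "tests",
]


def missing_components(root: Path) -> list[str]:
    """Return the required paths that do not exist under root."""
    return [p for p in REQUIRED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_components(Path("."))
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run from the project root, the script prints the components that are still absent, which can be useful when bootstrapping a new study repository.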
12.3 Python package components
The Python package portion follows the Python Packaging User Guide.
12.3.1 pyproject.toml
The single source of truth for project configuration:
[project]
name = "demo001"
version = "0.1.0"
description = "Analysis package for DEMO-001 study"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
"plotnine>=0.15.1",
"polars>=1.35.2",
"rtflite>=1.1.0",
]
[dependency-groups]
dev = [
"mypy>=1.18.2",
"pytest>=9.0.1",
"ruff>=0.14.5",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
Key sections:
- [project]: Package metadata
- [project].dependencies: Runtime dependencies for analysis
- [dependency-groups].dev: Development tools (testing, linting)
- [build-system]: How to build the package
12.3.2 .python-version
Specifies the exact Python version:
3.14.0
Created by uv python pin 3.14.0.
Use the full MAJOR.MINOR.PATCH version (for example, 3.14.0), not just 3.14. This prevents drift as new patch versions are released.
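A small check can enforce this convention, for example in a test or a pre-commit hook. This is a sketch under the assumption that the file holds a bare version number and nothing else; `is_fully_pinned` is a hypothetical helper name.

```python
import re

# MAJOR.MINOR.PATCH with numeric parts only, for example "3.14.0".
FULL_VERSION = re.compile(r"\d+\.\d+\.\d+")


def is_fully_pinned(text: str) -> bool:
    """Return True if text is a full MAJOR.MINOR.PATCH version."""
    return FULL_VERSION.fullmatch(text.strip()) is not None


print(is_fully_pinned("3.14.0"))
print(is_fully_pinned("3.14"))
```

In a project, the input would typically come from `Path(".python-version").read_text()`.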
12.3.3 uv.lock
Lock file with exact dependency versions:
version = 1
requires-python = ">=3.13"
[[package]]
name = "polars"
version = "1.35.2"
source = { registry = "https://pypi.org/simple" }
...
This file is auto-generated by uv sync and uv lock.
Never edit manually. Commit to version control.
12.3.4 src/demo001/
Study-specific Python functions go here.
Following the src/ layout (recommended):
src/
└── demo001/
    ├── __init__.py     # Package initialization
    ├── utils.py        # Utility functions
    ├── baseline.py     # Baseline characteristics
    ├── efficacy.py     # Efficacy analysis
    ├── population.py   # Population analysis
    └── safety.py       # Safety analysis
Why src/ layout?
- Prevents accidental imports from development directory
- Forces proper package installation
- Industry best practice
Each module contains related functions. For example, utils.py:
"""Utility functions for formatting and calculations."""
def fmt_num(x: float, digits: int = 1, width: int = 5) -> str:
    """Format a number with specified digits and width.

    Parameters
    ----------
    x : float
        Number to format
    digits : int
        Number of decimal places
    width : int
        Total width of formatted string

    Returns
    -------
    str
        Formatted number string

    Examples
    --------
    >>> fmt_num(12.345, digits=2, width=6)
    ' 12.35'
    """
    return f"{x:>{width}.{digits}f}"

Document all functions with docstrings following the NumPy docstring standard.
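Small helpers like `fmt_num` are usually composed into cell-level formatters. The sketch below shows one possible composition; `fmt_est` is a hypothetical helper that is not part of the demo package, and showing the SD with one more decimal place than the mean is an assumed convention.

```python
def fmt_num(x: float, digits: int = 1, width: int = 5) -> str:
    """Format a number with specified digits and width (same as in utils.py)."""
    return f"{x:>{width}.{digits}f}"


def fmt_est(mean: float, sd: float, digits: int = 1, width: int = 5) -> str:
    """Format a 'mean (SD)' table cell; hypothetical helper for illustration."""
    # Assumed convention: SD is shown with one more decimal place than the mean.
    return f"{fmt_num(mean, digits, width)} ({fmt_num(sd, digits + 1, width)})"


print(repr(fmt_est(75.5, 8.25)))
```

Keeping such formatters in the package, rather than in each .qmd file, means every table formats its statistics the same way and the logic can be unit tested once.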
12.3.5 tests/
Validation tests using pytest:
tests/
├── __init__.py
├── test_utils.py       # Test utility functions
├── test_baseline.py    # Test baseline functions
└── data/               # Test data fixtures
    └── adsl_subset.parquet
Example test:
# tests/test_utils.py
from demo001.utils import fmt_num


def test_fmt_num_basic():
    """Test basic number formatting."""
    assert fmt_num(12.345, digits=2, width=6) == " 12.35"


def test_fmt_num_padding():
    """Test width padding."""
    # "1.2" is three characters, so width=5 adds two leading spaces
    assert fmt_num(1.2, digits=1, width=5) == "  1.2"

Run tests:
uv run pytest
For clinical submissions, high test coverage demonstrates code quality. Aim for >80% coverage for critical functions.
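One way to raise coverage cheaply is a table of input and expected-output cases. The sketch below is an assumption about how such a test might look, not the project's actual test file: it carries a local copy of `fmt_num` so that it runs standalone, whereas a real test would import it from `demo001.utils` and could use `pytest.mark.parametrize` instead of the loop.

```python
def fmt_num(x: float, digits: int = 1, width: int = 5) -> str:
    """Local copy of demo001.utils.fmt_num so the sketch runs standalone."""
    return f"{x:>{width}.{digits}f}"


# (value, digits, width, expected) cases; values are exactly representable in binary.
CASES = [
    (1.5, 1, 5, "  1.5"),
    (0.25, 2, 6, "  0.25"),
    (100.0, 0, 4, " 100"),
    (-2.5, 1, 3, "-2.5"),  # width is a minimum, so longer results are not truncated
]


def test_fmt_num_cases():
    """Check each case and report the failing inputs in the assertion message."""
    for x, digits, width, expected in CASES:
        result = fmt_num(x, digits, width)
        assert result == expected, f"fmt_num({x}, {digits}, {width}) gave {result!r}"
```

The last case documents an edge worth knowing for fixed-width tables: a value wider than `width` is returned in full rather than cut off.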
12.4 Quarto project components
The Quarto portion enables reproducible report generation.
12.4.1 _quarto.yml
Quarto project configuration:
project:
  type: book
book:
  title: "DEMO-001 Analysis Results"
  chapters:
    - index.qmd
    - analysis/tlf-01-disposition.qmd
    - analysis/tlf-02-population.qmd
    - analysis/tlf-03-baseline.qmd
    - analysis/tlf-04-efficacy-ancova.qmd
    - analysis/tlf-05-ae-summary.qmd
    - analysis/tlf-06-specific.qmd
format:
  html:
    theme: cosmo
This configures Quarto to render all analysis documents as a book.
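Since the chapter list is maintained by hand, it can drift from the files actually present in analysis/. The following sketch flags .qmd files that are not mentioned in _quarto.yml. Both function names are hypothetical, and the plain substring check is a deliberate simplification that avoids a YAML parser dependency.

```python
from pathlib import Path


def chapter_entries(project_root: Path) -> list[str]:
    """List index.qmd followed by analysis/*.qmd in sorted (numbered) order."""
    analysis = sorted((project_root / "analysis").glob("*.qmd"))
    return ["index.qmd"] + [p.relative_to(project_root).as_posix() for p in analysis]


def missing_from_config(project_root: Path) -> list[str]:
    """Return chapter paths that do not appear anywhere in _quarto.yml."""
    config_text = (project_root / "_quarto.yml").read_text(encoding="utf-8")
    return [c for c in chapter_entries(project_root) if c not in config_text]
```

Calling `missing_from_config(Path("."))` from the project root returns an empty list when every analysis document is registered as a chapter.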
12.4.2 index.qmd
Landing page for the Quarto book:
---
title: "DEMO-001 Clinical Study Report"
---
## Overview
This analysis package contains Tables, Listings, and Figures (TLFs)
for the DEMO-001 clinical trial.
## Study Information
- Protocol: DEMO-001
- Phase: III
- Indication: [Disease]
- Primary Endpoint: [Endpoint]
## Analysis Programs
The following TLFs are included:
- **Disposition**: Patient disposition table
- **Population**: Analysis population summary
- **Baseline**: Baseline characteristics
- **Efficacy**: Primary efficacy analysis (ANCOVA)
- **AE Summary**: Adverse events summary
- **AE Specific**: Specific adverse events
12.4.3 analysis/
Analysis scripts as Quarto documents:
analysis/
├── tlf-01-disposition.qmd
├── tlf-02-population.qmd
├── tlf-03-baseline.qmd
├── tlf-04-efficacy-ancova.qmd
├── tlf-05-ae-summary.qmd
└── tlf-06-specific.qmd
Each .qmd file:
- Loads required data
- Performs analysis
- Generates formatted output
- Exports to RTF for submission
Example structure:
---
title: "Table 14.1.1 - Disposition of Patients"
---
## Load Data
```{python}
import polars as pl
from demo001.utils import fmt_num
adsl = pl.read_parquet("data/adsl.parquet")
```
## Analysis
```{python}
# Calculate disposition counts
disposition = adsl.group_by("TRTA").agg([
    pl.len().alias("N"),
    pl.col("EOSSTT").filter(pl.col("EOSSTT") == "COMPLETED").count().alias("Completed"),
])
```
## Output
```{python}
from rtflite import Table
# Generate RTF table...
```
For final submissions, convert .qmd files to .py scripts. This conversion is covered in Chapter 15.
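The aggregation in the Analysis chunk counts subjects per treatment arm and, within each arm, those whose EOSSTT is COMPLETED. The sketch below reproduces that counting logic on plain Python records so the expected numbers can be verified by hand. It uses only the standard library instead of polars, and the records are made up for illustration.

```python
from collections import Counter

# Made-up subject records for illustration; not real study data.
records = [
    {"USUBJID": "01", "TRTA": "Placebo", "EOSSTT": "COMPLETED"},
    {"USUBJID": "02", "TRTA": "Placebo", "EOSSTT": "DISCONTINUED"},
    {"USUBJID": "03", "TRTA": "Active", "EOSSTT": "COMPLETED"},
    {"USUBJID": "04", "TRTA": "Active", "EOSSTT": "COMPLETED"},
]

# Total subjects per arm, and completers per arm.
n_by_arm = Counter(r["TRTA"] for r in records)
completed_by_arm = Counter(r["TRTA"] for r in records if r["EOSSTT"] == "COMPLETED")

disposition = {
    arm: {"N": n, "Completed": completed_by_arm[arm]} for arm, n in n_by_arm.items()
}
print(disposition)
```

A hand-checkable case like this also makes a natural fixture for testing the polars implementation.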
12.5 Data and output directories
12.5.1 data/
Input datasets in Parquet format:
data/
├── adsl.parquet # Subject-level analysis dataset
├── adae.parquet # Adverse events
├── adlbc.parquet # Lab chemistry
├── advs.parquet # Vital signs
└── adtte.parquet # Time-to-event
Why Parquet?
- Faster than CSV for large datasets
- Preserves data types
- Smaller file size
- Widely supported across the Python data ecosystem (for example, Polars and pandas)
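The type-preservation point is easiest to see from the CSV side. The sketch below writes one made-up row to CSV with the standard library and reads it back: every value returns as a string, so integers, floats, and dates must be re-parsed on each load. Parquet stores the schema alongside the data, which avoids that step.

```python
import csv
import io
from datetime import date

# One made-up row with an integer, a float, and a date.
row = {"AGE": 64, "BMI": 27.5, "TRTSDT": date(2024, 1, 15)}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)

buffer.seek(0)
restored = next(csv.DictReader(buffer))

# Every value comes back as a string; the original types are lost.
print({name: type(value).__name__ for name, value in restored.items()})
```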
For submission, ADaM datasets are converted to SAS .xpt format per FDA requirements.
12.5.2 output/
Generated TLF outputs:
output/
├── tlf-disposition.rtf
├── tlf-population.rtf
├── tlf-baseline.rtf
├── tlf-efficacy-ancova.rtf
├── tlf-ae-summary.rtf
└── tlf-ae-specific.rtf
RTF files are submission-ready and can be converted to PDF for review.
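After rendering, a quick sanity check can confirm that every expected file exists and at least looks like RTF. This is a minimal sketch: `EXPECTED_OUTPUTS` and `check_outputs` are hypothetical names, the file list is copied from the listing for this section, and the check only inspects the `{\rtf` prefix that RTF documents start with, not the table content.

```python
from pathlib import Path

# Expected output names, copied from the output/ listing.
EXPECTED_OUTPUTS = [
    "tlf-disposition.rtf",
    "tlf-population.rtf",
    "tlf-baseline.rtf",
    "tlf-efficacy-ancova.rtf",
    "tlf-ae-summary.rtf",
    "tlf-ae-specific.rtf",
]


def check_outputs(output_dir: Path) -> dict[str, str]:
    """Return a problem description for each expected file that fails a check."""
    problems: dict[str, str] = {}
    for name in EXPECTED_OUTPUTS:
        path = output_dir / name
        if not path.is_file():
            problems[name] = "missing"
        elif not path.read_bytes().startswith(b"{\\rtf"):
            problems[name] = "not an RTF file"
    return problems
```

An empty dictionary from `check_outputs(Path("output"))` means all six files are present and start with the RTF header.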
The output/ directory typically goes in .gitignore since outputs are generated from source code. Commit source, not generated artifacts.
12.6 Additional files
12.6.1 .gitignore
Exclude generated files from version control:
# Python
__pycache__/
*.py[cod]
.venv/
# Quarto
_book/
*.html
# Output (generated)
output/
# OS
.DS_Store
Thumbs.db
12.6.2 README.md
Project documentation:
# demo-py-esub
Analysis package for DEMO-001 clinical trial.
## Installation
```bash
git clone https://github.com/org/demo-py-esub.git
cd demo-py-esub
uv sync
```
## Usage
Generate all TLFs:
```bash
quarto render
```
Run tests:
```bash
uv run pytest tests/
```
12.7 Benefits of this structure
Consistency:
- Every project follows the same layout
- Team members instantly know where files belong
- Reduces cognitive load
Reproducibility:
- uv.lock pins all dependencies
- .python-version specifies Python version
- Quarto renders from source every time
Automation:
- uv sync restores environment
- quarto render regenerates all outputs
- pytest validates all functions
Compliance:
- Standard Python package can be built and distributed
- Tests provide validation evidence
- Documentation is built-in
The structure scales well. A project with 300 TLFs uses the same layout, just more files in analysis/ and src/.
12.8 Mixed language projects
If your organization uses both R and Python:
Separate projects:
clinical-trial-001/
├── r-package/          # R-based analyses
│   ├── DESCRIPTION
│   ├── R/
│   └── vignettes/
├── python-package/     # Python-based analyses
│   ├── pyproject.toml
│   ├── src/
│   └── analysis/
├── data/               # Shared ADaM datasets
└── output/             # Shared outputs
Why separate?
- Different build systems (devtools vs uv)
- Different dependency management (renv vs uv)
- Different testing frameworks (testthat vs pytest)
- Simpler to maintain
Share data and outputs, not source code.
John Carmack noted: “It’s almost always a mistake to mix languages in a single project.”
Mixing languages in a single project increases the complexity of dependency management, testing, and build processes. It can lead to confusion about which tools to use for specific tasks and complicate collaboration among team members who may specialize in different languages.
Keep Python projects pure Python and R projects pure R.
12.9 What’s next
You’ve learned the recommended structure for analysis packages.
The next part covers eCTD submission:
- Regulatory requirements for program submission
- Using pkglite to pack Python packages
- Creating submission packages
- Verifying reproducibility with dry runs
With this structure in place, you’re ready to prepare submission-ready deliverables.