12 Package structure

Objective

Learn the recommended directory structure for clinical analysis packages. Understand how to organize Python code, Quarto documents, data, and outputs in a reproducible, submission-ready layout.

12.1 Core principle

Organize clinical analysis projects as valid Python packages that leverage the entire Python packaging toolchain (especially uv).

Additionally, integrate essential components for clinical reporting:

Quarto documents for reproducible analysis
ADaM datasets as inputs
Generated TLFs as outputs

This hybrid structure combines Python package best practices with clinical trial deliverable requirements.

12.2 Complete example structure

A clinical analysis package requires these essential components:

demo-py-esub/
├── pyproject.toml
├── .python-version
├── uv.lock
├── README.md
├── .gitignore
├── _quarto.yml
├── index.qmd
├── src/
│   └── demo001/
│       ├── __init__.py
│       ├── utils.py
│       ├── baseline.py
│       ├── efficacy.py
│       ├── population.py
│       └── safety.py
├── analysis/
│   ├── tlf-01-disposition.qmd
│   ├── tlf-02-population.qmd
│   ├── tlf-03-baseline.qmd
│   ├── tlf-04-efficacy-ancova.qmd
│   ├── tlf-05-ae-summary.qmd
│   └── tlf-06-specific.qmd
├── data/
│   ├── adsl.parquet
│   ├── adae.parquet
│   ├── adlbc.parquet
│   ├── advs.parquet
│   └── adtte.parquet
├── output/
│   ├── tlf-disposition.rtf
│   ├── tlf-population.rtf
│   ├── tlf-baseline.rtf
│   ├── tlf-efficacy-ancova.rtf
│   ├── tlf-ae-summary.rtf
│   └── tlf-ae-specific.rtf
└── tests/
    ├── __init__.py
    ├── test_utils.py
    ├── test_baseline.py
    └── data/
        └── adsl_subset.parquet

This structure satisfies both Python packaging standards and regulatory submission requirements.

In R terms, this combines:

R package structure (DESCRIPTION, R/, tests/)
Analysis project layout (vignettes/, data/, output/)
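
A quick way to confirm that a checkout follows this layout is to test for the expected paths. The sketch below is illustrative only: `REQUIRED_PATHS` and `missing_paths` are hypothetical names, and the list simply mirrors the example tree above. `output/` is left out on the assumption that it is generated rather than committed.

```python
from pathlib import Path

# Paths taken from the example tree; adjust the tuple for your own study.
REQUIRED_PATHS: tuple[str, ...] = (
    "pyproject.toml",
    ".python-version",
    "uv.lock",
    "_quarto.yml",
    "index.qmd",
    "src",
    "analysis",
    "data",
    "tests",
)


def missing_paths(
    project_root: Path, required: tuple[str, ...] = REQUIRED_PATHS
) -> list[str]:
    """Return the required paths that do not exist under project_root."""
    return [rel for rel in required if not (project_root / rel).exists()]


if __name__ == "__main__":
    # Report anything missing from the current working directory.
    for rel in missing_paths(Path.cwd()):
        print(f"missing: {rel}")
```

Run from the project root, the script prints nothing when every expected file and directory is present.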

12.3 Python package components

The Python package portion follows the Python Packaging User Guide.

12.3.1 `pyproject.toml`

The single source of truth for project configuration:

[project]
name = "demo001"
version = "0.1.0"
description = "Analysis package for DEMO-001 study"
readme = "README.md"
requires-python = ">=3.13"
dependencies = [
    "plotnine>=0.15.1",
    "polars>=1.35.2",
    "rtflite>=1.1.0",
]

[dependency-groups]
dev = [
    "mypy>=1.18.2",
    "pytest>=9.0.1",
    "ruff>=0.14.5",
]

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Key sections:

[project]: Package metadata
[project].dependencies: Runtime dependencies for analysis
[dependency-groups.dev]: Development tools (testing, linting)
[build-system]: How to build the package

12.3.2 `.python-version`

Specifies the exact Python version:

3.14.0

Created by uv python pin 3.14.0.

Important

Use the full MAJOR.MINOR.PATCH version (for example, 3.14.0), not just 3.14. This prevents drift as new patch versions are released.
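
To see why the patch component matters, consider a small check that compares a pin against an interpreter version. This is a sketch under the assumption that `.python-version` holds a plain numeric version; `matches_pin` is a hypothetical helper.

```python
import sys


def matches_pin(
    pin_text: str, version_info: tuple[int, ...] = tuple(sys.version_info[:3])
) -> bool:
    """Return True if version_info agrees with every component of the pin."""
    pinned = tuple(int(part) for part in pin_text.strip().split("."))
    # Only the components present in the pin are compared.
    return tuple(version_info[: len(pinned)]) == pinned


# A full pin rejects a different patch release...
print(matches_pin("3.14.0", (3, 14, 1)))
# ...while a MAJOR.MINOR pin silently accepts it.
print(matches_pin("3.14", (3, 14, 1)))
```

The second call shows the drift the note above warns about: a `3.14` pin is satisfied by any 3.14.x interpreter.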

12.3.3 `uv.lock`

Lock file with exact dependency versions:

version = 1
requires-python = ">=3.13"

[[package]]
name = "polars"
version = "1.35.2"
source = { registry = "https://pypi.org/simple" }
...

This file is auto-generated by uv sync and uv lock.

Never edit manually. Commit to version control.

12.3.4 `src/demo001/`

Study-specific Python functions go here.

Following the src/ layout (recommended):

src/
└── demo001/
    ├── __init__.py         # Package initialization
    ├── utils.py            # Utility functions
    ├── baseline.py         # Baseline characteristics
    ├── efficacy.py         # Efficacy analysis
    ├── population.py       # Population analysis
    └── safety.py           # Safety analysis

Why src/ layout?

Prevents accidental imports from development directory
Forces proper package installation
Industry best practice

Each module contains related functions. For example, utils.py:

"""Utility functions for formatting and calculations."""

def fmt_num(x: float, digits: int = 1, width: int = 5) -> str:
    """Format a number with specified digits and width.

    Parameters
    ----------
    x : float
        Number to format
    digits : int
        Number of decimal places
    width : int
        Total width of formatted string

    Returns
    -------
    str
        Formatted number string

    Examples
    --------
    >>> fmt_num(12.345, digits=2, width=6)
    ' 12.35'
    """
    return f"{x:>{width}.{digits}f}"

Document all functions with docstrings following the NumPy docstring standard.
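
The Examples section of a docstring can also be executed as a test with the standard library's `doctest` module. The sketch below repeats `fmt_num` with a shortened docstring so that it runs on its own; `run_docstring_examples` is a hypothetical wrapper around `doctest.DocTestParser` and `doctest.DocTestRunner`.

```python
import doctest


def fmt_num(x: float, digits: int = 1, width: int = 5) -> str:
    """Format a number with specified digits and width.

    Examples
    --------
    >>> fmt_num(12.345, digits=2, width=6)
    ' 12.35'
    """
    return f"{x:>{width}.{digits}f}"


def run_docstring_examples(func) -> doctest.TestResults:
    """Run the >>> examples found in func's docstring."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__ or "",
        globs={func.__name__: func},  # make the function visible to its examples
        name=func.__name__,
        filename=None,
        lineno=0,
    )
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test)


results = run_docstring_examples(fmt_num)
print(results.failed, "failed of", results.attempted)
```

If you prefer to keep everything under pytest, its `--doctest-modules` option collects docstring examples from modules in a similar way.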

12.3.5 `tests/`

Validation tests using pytest:

tests/
├── __init__.py
├── test_utils.py           # Test utility functions
├── test_baseline.py        # Test baseline functions
└── data/                   # Test data fixtures
    └── adsl_subset.parquet

Example test:

# tests/test_utils.py
from demo001.utils import fmt_num

def test_fmt_num_basic():
    """Test basic number formatting."""
    assert fmt_num(12.345, digits=2, width=6) == " 12.35"

def test_fmt_num_padding():
    """Test width padding."""
    assert fmt_num(1.2, digits=1, width=5) == "  1.2"

Run tests:

uv run pytest

Note

For clinical submissions, high test coverage serves as evidence that the analysis code behaves as specified. Aim for more than 80% coverage of critical functions.
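
Coverage grows more easily when edge cases are listed as data rather than as separate functions. The sketch below is one possible table-driven test for `fmt_num`; the cases are illustrative, and `fmt_num` is repeated so the block runs on its own. With pytest, the same table could be passed to `pytest.mark.parametrize`.

```python
def fmt_num(x: float, digits: int = 1, width: int = 5) -> str:
    """Format a number with specified digits and width."""
    return f"{x:>{width}.{digits}f}"


# Each case: (x, digits, width) and the expected string.
CASES = [
    ((0.0, 1, 5), "  0.0"),            # zero is padded like any other value
    ((-3.14159, 2, 7), "  -3.14"),     # the sign counts toward the width
    ((123456.789, 1, 5), "123456.8"),  # wide values are not truncated
]


def test_fmt_num_cases() -> None:
    """Check every case in the table."""
    for (x, digits, width), expected in CASES:
        assert fmt_num(x, digits=digits, width=width) == expected


test_fmt_num_cases()
```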

12.4 Quarto project components

The Quarto portion enables reproducible report generation.

12.4.1 `_quarto.yml`

Quarto project configuration:

project:
  type: book
  # Run code from the project root so paths like data/adsl.parquet resolve
  execute-dir: project

book:
  title: "DEMO-001 Analysis Results"
  chapters:
    - index.qmd
    - analysis/tlf-01-disposition.qmd
    - analysis/tlf-02-population.qmd
    - analysis/tlf-03-baseline.qmd
    - analysis/tlf-04-efficacy-ancova.qmd
    - analysis/tlf-05-ae-summary.qmd
    - analysis/tlf-06-specific.qmd

format:
  html:
    theme: cosmo

This configures Quarto to render all analysis documents as a book.
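
As the number of TLFs grows, the `chapters` list can fall out of sync with the files in `analysis/`. The sketch below is one way to detect that. It assumes the chapter list has already been read from `_quarto.yml` (parsing YAML needs a third-party library, so the list is passed in directly); `chapter_entries` and `missing_chapters` are hypothetical helpers.

```python
from pathlib import Path


def chapter_entries(project_root: Path, analysis_dir: str = "analysis") -> list[str]:
    """List index.qmd followed by every .qmd in analysis/, sorted by name."""
    qmd_files = sorted((project_root / analysis_dir).glob("*.qmd"))
    return ["index.qmd"] + [
        path.relative_to(project_root).as_posix() for path in qmd_files
    ]


def missing_chapters(project_root: Path, listed: list[str]) -> list[str]:
    """Return documents on disk that the chapter list does not mention."""
    return [entry for entry in chapter_entries(project_root) if entry not in listed]


if __name__ == "__main__":
    listed = ["index.qmd", "analysis/tlf-01-disposition.qmd"]
    for entry in missing_chapters(Path.cwd(), listed):
        print(f"not listed in _quarto.yml: {entry}")
```

The numeric prefixes in the file names (`tlf-01-`, `tlf-02-`, ...) are what make a plain alphabetical sort match the intended chapter order.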

12.4.2 `index.qmd`

Landing page for the Quarto book:

---
title: "DEMO-001 Clinical Study Report"
---

## Overview

This analysis package contains Tables, Listings, and Figures (TLFs)
for the DEMO-001 clinical trial.

## Study Information

- Protocol: DEMO-001
- Phase: III
- Indication: [Disease]
- Primary Endpoint: [Endpoint]

## Analysis Programs

The following TLFs are included:

- **Disposition**: Patient disposition table
- **Population**: Analysis population summary
- **Baseline**: Baseline characteristics
- **Efficacy**: Primary efficacy analysis (ANCOVA)
- **AE Summary**: Adverse events summary
- **AE Specific**: Specific adverse events

12.4.3 `analysis/`

Analysis scripts as Quarto documents:

analysis/
├── tlf-01-disposition.qmd
├── tlf-02-population.qmd
├── tlf-03-baseline.qmd
├── tlf-04-efficacy-ancova.qmd
├── tlf-05-ae-summary.qmd
└── tlf-06-specific.qmd

Each .qmd file:

Loads required data
Performs analysis
Generates formatted output
Exports to RTF for submission

Example structure:

---
title: "Table 14.1.1 - Disposition of Patients"
---

## Load Data

```{python}
import polars as pl
from demo001.utils import fmt_num

adsl = pl.read_parquet("data/adsl.parquet")
```

## Analysis

```{python}
# Calculate disposition counts
disposition = adsl.group_by("TRT01A").agg([
    pl.len().alias("N"),
    pl.col("EOSSTT").filter(pl.col("EOSSTT") == "COMPLETED").count().alias("Completed")
])
```

## Output

```{python}
import rtflite as rtf
# Generate RTF table...
```

Warning

For final submissions, convert .qmd files to .py scripts. This is covered in Chapter 15.

12.5 Data and output directories

12.5.1 `data/`

Input datasets in Parquet format:

data/
├── adsl.parquet            # Subject-level analysis dataset
├── adae.parquet            # Adverse events
├── adlbc.parquet           # Lab chemistry
├── advs.parquet            # Vital signs
└── adtte.parquet           # Time-to-event

Why Parquet?

Faster than CSV for large datasets
Preserves data types
Smaller file size
Widely supported across the Python data ecosystem (for example, Polars, pandas, and PyArrow)

For submission, ADaM datasets are converted to the SAS XPORT (.xpt) transport format per FDA requirements.

12.5.2 `output/`

Generated TLF outputs:

output/
├── tlf-disposition.rtf
├── tlf-population.rtf
├── tlf-baseline.rtf
├── tlf-efficacy-ancova.rtf
├── tlf-ae-summary.rtf
└── tlf-ae-specific.rtf

RTF files are submission-ready and can be converted to PDF for review.

The output/ directory typically goes in .gitignore since outputs are generated from source code. Commit source, not generated artifacts.

12.6 Additional files

12.6.1 `.gitignore`

Exclude generated files from version control:

# Python
__pycache__/
*.py[cod]
.venv/

# Quarto
_book/
*.html

# Output (generated)
output/

# OS
.DS_Store
Thumbs.db

12.6.2 `README.md`

Project documentation:

# demo-py-esub

Analysis package for DEMO-001 clinical trial.

## Installation

```bash
git clone https://github.com/org/demo-py-esub.git
cd demo-py-esub
uv sync
```

## Usage

Generate all TLFs:

```bash
quarto render
```

Run tests:

```bash
uv run pytest tests/
```

12.7 Benefits of this structure

Consistency:

Every project follows the same layout
Team members instantly know where files belong
Reduces cognitive load

Reproducibility:

uv.lock pins all dependencies
.python-version specifies Python version
Quarto renders from source every time

Automation:

uv sync restores environment
quarto render regenerates all outputs
pytest validates all functions
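
These three commands can be chained in a small driver script. The sketch below is one possible arrangement, not a prescribed workflow: `PIPELINE` and `run_pipeline` are hypothetical names, the commands are the ones used in this chapter, and tests are run before rendering so that a failing test stops the build early.

```python
import subprocess
from collections.abc import Callable, Sequence

# Commands as used in this chapter, in the order they are executed.
PIPELINE: tuple[tuple[str, ...], ...] = (
    ("uv", "sync"),
    ("uv", "run", "pytest"),
    ("quarto", "render"),
)


def run_pipeline(
    steps: Sequence[Sequence[str]] = PIPELINE,
    runner: Callable[..., object] = subprocess.run,
) -> None:
    """Run each step in order; with check=True a failing step raises."""
    for command in steps:
        print("$", " ".join(command))
        runner(list(command), check=True)


if __name__ == "__main__":
    run_pipeline()
```

The `runner` parameter exists only so the function can be exercised without invoking the real tools, for example by passing a stub that records the commands.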

Compliance:

Standard Python package can be built and distributed
Tests provide validation evidence
Documentation is built-in

Note

The structure scales well. A project with 300 TLFs uses the same layout, just more files in analysis/ and src/.

12.8 Mixed language projects

If your organization uses both R and Python:

Separate projects:

clinical-trial-001/
├── r-package/              # R-based analyses
│   ├── DESCRIPTION
│   ├── R/
│   └── vignettes/
├── python-package/         # Python-based analyses
│   ├── pyproject.toml
│   ├── src/
│   └── analysis/
├── data/                   # Shared ADaM datasets
└── output/                 # Shared outputs

Why separate?

Different build systems (devtools vs uv)
Different dependency management (renv vs uv)
Different testing frameworks (testthat vs pytest)
Simpler to maintain

Share data and outputs, not source code.

Mixing programming languages in a single project is often a mistake

John Carmack noted: “It’s almost always a mistake to mix languages in a single project.”

Mixing languages in a single project increases the complexity of dependency management, testing, and build processes. It can lead to confusion about which tools to use for specific tasks and complicate collaboration among team members who may specialize in different languages.

Keep Python projects pure Python. Keep R projects pure R. Share data and outputs, not codebases.

12.9 What’s next

You’ve learned the recommended structure for analysis packages.

The next part covers eCTD submission:

Regulatory requirements for program submission
Using pkglite to pack Python packages
Creating submission packages
Verifying reproducibility with dry runs

With this structure in place, you’re ready to prepare submission-ready deliverables.
