15 Submission package
Learn how to pack Python analysis packages into eCTD-ready text files, convert Quarto documents to Python scripts, and organize files in the eCTD Module 5 structure.
15.1 Prerequisites
This chapter uses two demo repositories:
- Analysis project: demo-py-esub

  git clone https://github.com/elong0527/demo-py-esub.git

- Submission package: demo-py-ectd

  git clone https://github.com/elong0527/demo-py-ectd.git

We assume the paths demo-py-esub/ and demo-py-ectd/ below.
Install pkglite:
pkglite is not needed as a project dependency. Use uvx to run without installation:
uvx pkglite --help

Or install globally with pipx:

pipx install pkglite

uvx and pipx both run command-line tools in isolated environments. Use uvx for one-off commands or pipx for frequently used tools.
15.2 The whole game
The eCTD Module 5 structure for Python submissions:
demo-py-ectd/m5/datasets/ectddemo/analysis/adam/
├── datasets/
│ ├── adsl.xpt
│ ├── adae.xpt
│ ├── ...
│ ├── adrg.pdf
│ ├── analysis-results-metadata.pdf
│ ├── define.xml
│ └── define2-0-0.xsl
└── programs/
├── py0pkgs.txt
├── tlf-01-disposition.txt
├── tlf-02-population.txt
├── tlf-03-baseline.txt
├── tlf-04-efficacy.txt
├── tlf-05-ae-summary.txt
└── tlf-06-ae-specific.txt
datasets/ folder:
- ADaM datasets in .xpt format (SAS transport)
- define.xml metadata (created by Pinnacle 21 or similar tools)
- ADRG and ARM documentation
programs/ folder:
- py0pkgs.txt: packed Python package(s)
- tlf-*.txt: individual analysis programs
All files in programs/ must be ASCII text files.
File naming conventions:
- Use lowercase letters only
- No underscores or special characters
- Use hyphens for separators
- No executable extensions (.py, .exe)
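
As an illustration, filenames can be checked automatically. The sketch below is a hypothetical helper (not part of any submission tool) that flags names in the programs/ folder that break these conventions:

```python
import re
from pathlib import Path

# Hypothetical naming check for the conventions listed above:
# lowercase letters, digits, hyphens, and a non-executable extension.
ALLOWED = re.compile(r"^[a-z0-9-]+\.(txt|xpt|xml|xsl|pdf)$")


def naming_violations(folder: str) -> list[str]:
    """Return filenames that do not match the allowed pattern."""
    return [
        path.name
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and not ALLOWED.match(path.name)
    ]


if __name__ == "__main__":
    programs = "demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs"
    for name in naming_violations(programs):
        print(f"Naming violation: {name}")
```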
15.3 Packing Python packages with py-pkglite
15.3.1 Create .pkgliteignore
First, create a .pkgliteignore file to exclude unnecessary files:
uvx pkglite use demo-py-esub/

✓ Created .pkgliteignore in /home/user/demo-py-esub
This generates demo-py-esub/.pkgliteignore with default exclusions:
# Git
.git/
# OS
.DS_Store
Thumbs.db
# Python
__pycache__/
*.py[cod]
.venv/
# R
.Rproj.user/
.Rhistory
.RData
.Ruserdata
# Quarto
.quarto/
The .pkgliteignore file follows gitignore syntax.
What to exclude:
- Generated files (__pycache__, .venv)
- Data and outputs (submitted separately)
- Version control metadata
- Quarto book content (if converted to scripts separately)
What to include:
- Source code (src/)
- Configuration (pyproject.toml, uv.lock, .python-version)
- Tests (tests/)
- Essential documentation for package installation
pkglite uses content-based file classification. Text files are packed directly. Binary files trigger warnings.
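
The classification logic itself lives inside pkglite; the snippet below is only a rough illustration (not pkglite's actual implementation) of what a content-based check, as opposed to an extension-based one, can look like:

```python
from pathlib import Path


def looks_like_text(path: str, sample_size: int = 8192) -> bool:
    """Rough text/binary heuristic, for illustration only:
    sample the file's bytes and see whether they decode as UTF-8."""
    sample = Path(path).read_bytes()[:sample_size]
    if b"\x00" in sample:  # NUL bytes are a strong hint of binary content
        return False
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```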
15.3.2 Pack the package
Pack the analysis package into a text file:
uvx pkglite pack demo-py-esub/ \
  -o demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs/py0pkgs.txt

Packing demo-py-esub
Reading _quarto.yml
Reading uv.lock
Reading .pkgliteignore
Reading pyproject.toml
Reading index.qmd
Reading README.md
Reading .gitignore
Reading .python-version
Reading analysis/tlf-05-ae-summary.qmd
Reading analysis/tlf-02-population.qmd
Reading analysis/tlf-03-baseline.qmd
Reading analysis/.gitignore
Reading analysis/tlf-06-specific.qmd
Reading analysis/tlf-01-disposition.qmd
Reading analysis/tlf-04-efficacy-ancova.qmd
Reading tests/test_utils.py
Reading tests/__init__.py
Reading output/tlf_ae_specific.rtf
Reading output/tlf_baseline.rtf
Reading output/tlf_population.rtf
Reading output/tlf_disposition.rtf
Reading output/tlf_ae_summary.rtf
Reading output/tlf_efficacy_ancova.rtf
Reading .github/.gitignore
Reading .github/workflows/quarto-publish.yml
Reading data/adae.parquet
Reading data/adlbhy.parquet
Reading data/adsl.parquet
Reading data/adtte.parquet
Reading data/adlbc.parquet
Reading data/adlbh.parquet
Reading data/advs.parquet
Reading src/demo001/baseline.py
Reading src/demo001/population.py
Reading src/demo001/__init__.py
Reading src/demo001/utils.py
Reading src/demo001/safety.py
Reading src/demo001/efficacy.py
✓ Packed 1 packages into /home/user/demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs/py0pkgs.txt
This creates a single text file containing:
- pyproject.toml
- .python-version
- uv.lock
- All files in src/
- All files in tests/
15.3.3 Inspect the packed file
View the first few lines:
head -n 20 demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs/py0pkgs.txt

# Generated by py-pkglite: do not edit by hand
# Use `pkglite unpack` to restore the packages
Package: demo-py-esub
File: _quarto.yml
Format: text
Content:
project:
type: book
book:
title: "DEMO-001 Analysis Results"
chapters:
- index.qmd
- analysis/tlf-01-disposition.qmd
- analysis/tlf-02-population.qmd
- analysis/tlf-03-baseline.qmd
- analysis/tlf-04-efficacy-ancova.qmd
- analysis/tlf-05-ae-summary.qmd
- analysis/tlf-06-specific.qmd
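
Because the packed file is plain text with a `Package:` and `File:` field for every packed file (as shown above), you can list its contents without unpacking. A minimal sketch, assuming the field layout in the excerpt (a quick inventory, not a strict parser of the format):

```python
from pathlib import Path


def list_packed_files(packed: str) -> list[tuple[str, str]]:
    """Return (package, file) pairs from a pkglite text file,
    assuming the `Package:` / `File:` layout shown in the excerpt."""
    entries: list[tuple[str, str]] = []
    package = "unknown"
    for line in Path(packed).read_text().splitlines():
        if line.startswith("Package: "):
            package = line.removeprefix("Package: ")
        elif line.startswith("File: "):
            entries.append((package, line.removeprefix("File: ")))
    return entries


packed = "demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs/py0pkgs.txt"
for pkg, file in list_packed_files(packed):
    print(f"{pkg}: {file}")
```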
15.3.4 Packing multiple packages
If you have dependencies in private repositories:
uvx pkglite pack internal-utils/ demo-py-esub/ \
  -o demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs/py0pkgs.txt

Packages are packed in the order specified. Always pack dependencies first (low-level before high-level).
Unpacking will restore packages in the same order. Depending on how you reinstall them, the order may matter.
15.4 Converting Quarto to Python scripts
Analysis programs must be plain Python scripts, not Quarto documents.
15.4.1 The conversion workflow
For each .qmd file in analysis/:
- Render to verify it works
- Convert .qmd to .ipynb (Jupyter notebook)
- Convert .ipynb to .py (Python script)
- Clean up comments and formatting
- Save as a .txt file
15.4.2 Automated conversion script
This shell script automates the conversion:
#!/bin/bash
cd demo-py-esub/
uv sync
source .venv/bin/activate
convert_analysis() {
    local analysis_name=$1
    local analysis_path="analysis/$analysis_name.qmd"
    local output_path="$HOME/demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs"

    # Render .qmd to verify it works
    quarto render "$analysis_path" --quiet

    # Convert .qmd to .ipynb
    quarto convert "$analysis_path"

    # Convert .ipynb to .py using nbconvert
    uvx --from nbconvert jupyter-nbconvert \
        --to python "analysis/$analysis_name.ipynb" \
        --output "$output_path/$analysis_name.py"

    # Remove all comments (lines starting with #)
    awk '!/^#/' "$output_path/$analysis_name.py" > temp && \
        mv temp "$output_path/$analysis_name.py"

    # Consolidate consecutive blank lines
    awk 'NF {p = 0} !NF {p++} p < 2' "$output_path/$analysis_name.py" > temp && \
        mv temp "$output_path/$analysis_name.py"

    # Clean up intermediate files
    rm "analysis/$analysis_name.ipynb"

    # Format with ruff
    uvx ruff format "$output_path/$analysis_name.py"

    # Rename .py to .txt (no executable extension)
    mv "$output_path/$analysis_name.py" "$output_path/$analysis_name.txt"
}

# Convert all analysis files
for qmd_file in analysis/*.qmd; do
    analysis=$(basename "$qmd_file" .qmd)
    convert_analysis "$analysis"
done

Save as convert_analyses.sh and run:
chmod +x convert_analyses.sh
./convert_analyses.sh

15.4.3 Add reviewer instructions
Optionally, add a header to each .txt file:
# Note to Reviewer
#
# To rerun this analysis program, please refer to the ADRG appendix.

This helps reviewers understand how to execute the code.
Automated insertion:
header='# Note to Reviewer
#
# To rerun this analysis program, please refer to the ADRG appendix.
#
'
add_header() {
    local file=$1
    echo "$header" | cat - "$file" > temp && mv temp "$file"
}

for txt_file in demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs/tlf-*.txt; do
    add_header "$txt_file"
done

15.5 Verifying ASCII compliance
Non-ASCII characters (curly quotes, em dashes, Unicode symbols) will cause submission issues. Always verify before submission.
Currently, there is no built-in py-pkglite utility to check for ASCII compliance, so contributions are welcome! An example verification script is available in rtflite: verify_ascii.py.
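
Until then, a small standalone check works. A minimal sketch (simpler than the rtflite script) that reports the first non-ASCII character in each submitted .txt file:

```python
from pathlib import Path


def find_non_ascii(folder: str) -> list[str]:
    """Report the first non-ASCII character found in each .txt file."""
    problems = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            bad = [ch for ch in line if ord(ch) > 127]
            if bad:
                problems.append(f"{path.name}:{lineno}: non-ASCII {bad[0]!r}")
                break
    return problems


if __name__ == "__main__":
    programs = "demo-py-ectd/m5/datasets/ectddemo/analysis/adam/programs"
    for problem in find_non_ascii(programs):
        print(problem)
```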
15.6 Compliance checklist
Before finalizing the submission package:
File naming:
- [ ] All filenames are lowercase
- [ ] No underscores or special characters
- [ ] No `.py` extensions (use `.txt`)
Content:
- [ ] All `.txt` files are ASCII compliant
- [ ] Python package unpacks correctly
- [ ] Analysis programs run without errors
Structure:
- [ ] Files in correct eCTD Module 5 directories
- [ ] `py0pkgs.txt` in `programs/`
- [ ] Analysis programs in `programs/`
- [ ] ADaM datasets in `datasets/`
Documentation:
- [ ] ADRG includes Python version
- [ ] ADRG includes package versions
- [ ] ADRG includes reproduction instructions
- [ ] ARM links programs to outputs

15.7 Updating ADRG
The ADRG must document the Python environment and provide reproduction instructions.
15.7.1 Section: Macro Programs
Example content:
7.X Macro Programs
Submitted Python programs follow the naming pattern `tlf-##-*.txt`.
All study-specific Python functions are saved in the `py0pkgs.txt` file.
The recommended steps to unpack and use these functions are described in the Appendix.
The table below contains the software version and program metadata:
**Analysis programs table:**
| Program Name | Output Table | Title |
|--------------|--------------|-------|
| tlf-01-disposition.txt | Table 14.1.1 | Disposition of Patients |
| tlf-02-population.txt | Table 14.1.2 | Analysis Population |
| tlf-03-baseline.txt | Table 14.1.3 | Baseline Characteristics |
| tlf-04-efficacy.txt | Table 14.2.1 | Efficacy Analysis (ANCOVA) |
| tlf-05-ae-summary.txt | Table 14.3.1 | Adverse Events Summary |
| tlf-06-ae-specific.txt | Table 14.3.2 | Specific Adverse Events |
**Python environment table:**
| Software | Version | Description |
|----------|---------|-------------|
| Python | 3.14.0 | Programming language |
| uv | 0.9.9 | Package manager |
**Python packages table:**
| Package | Version | Description |
|---------|---------|-------------|
| polars | 1.35.1 | Data manipulation |
| plotnine | 0.15.1 | Data visualization |
| rtflite | 1.1.0 | RTF table generation |
| statsmodels | 0.14.0 | Statistical models |
**Proprietary packages table:**
| Package | Version | Description |
|---------|---------|-------------|
| demo001 | 0.1.0 | DEMO-001 study analysis functions |

Package versions can be extracted from uv.lock or by running uv pip list.
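
For example, uv.lock is a TOML file with a [[package]] entry per locked package, so the name and version columns for these tables can be pulled out with the standard library. A minimal sketch, assuming Python 3.11+ (for tomllib) and the [[package]] name/version fields:

```python
import tomllib  # standard library in Python 3.11+

# Sketch: extract name/version pairs from uv.lock to fill the ADRG package tables.
with open("demo-py-esub/uv.lock", "rb") as f:
    lock = tomllib.load(f)

for pkg in lock.get("package", []):
    print(f"| {pkg['name']} | {pkg['version']} | |")
```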
15.7.2 Appendix: Reproduction instructions
Provide step-by-step instructions for reviewers:
Appendix: Instructions to Execute Analysis Programs
1. Install uv
Follow instructions at https://docs.astral.sh/uv/getting-started/installation/
2. Create working directory
Create a temporary directory (e.g., `C:/tempwork/` on Windows).
Copy all files from `m5/datasets/ectddemo/analysis/adam/` to this directory.
3. Unpack and install Python packages
Navigate to the working directory and run:
```bash
uvx pkglite unpack programs/py0pkgs.txt -o .
```
This restores the package structure in the current directory.
Install the package:
```bash
cd demo-py-esub
uv sync
```
This installs all dependencies and the demo001 package.
4. Copy data to the correct location
Ensure the `datasets/` folder with ADaM datasets is in the working directory.
5. Execute analysis programs
Run each program in order:
```bash
cd demo-py-esub
source .venv/bin/activate # macOS/Linux
# or .venv\Scripts\activate # Windows
python ../programs/tlf-01-disposition.txt
python ../programs/tlf-02-population.txt
python ../programs/tlf-03-baseline.txt
python ../programs/tlf-04-efficacy.txt
python ../programs/tlf-05-ae-summary.txt
python ../programs/tlf-06-ae-specific.txt
```
Each program generates RTF output in the specified output directory.

Tailor the instructions to your organization’s environment. Include any special configuration (e.g., proxy settings, internal package indexes).
15.8 Updating ARM
The Analysis Results Metadata (ARM) documents the relationship between programs and outputs.
15.8.1 Section 2: Analysis Results Metadata Summary
Example table:
| Table Reference | Table Title | Programming Language | Program Name | Input Files |
|---|---|---|---|---|
| Table 14.1.1 | Disposition of Patients | Python | tlf-01-disposition.txt | adsl.xpt |
| Table 14.1.2 | Analysis Population | Python | tlf-02-population.txt | adsl.xpt |
| Table 14.1.3 | Baseline Characteristics | Python | tlf-03-baseline.txt | adsl.xpt |
15.8.2 Section 3: Analysis Results Metadata Details
For each table, provide:
Table Reference: Table 14.1.1
Analysis Result: Disposition counts by treatment group
Analysis Reason: Describe study population
Analysis Purpose: Primary
Programming Statements: (Python version 3.14.0), [programs/tlf-01-disposition.txt]

15.9 Testing the submission package
Before finalizing, test the complete workflow:
- Unpack py0pkgs.txt in a clean directory
- Install packages per ADRG instructions
- Run each analysis program
- Verify outputs match original results
This is covered in detail in Chapter 16.
15.10 What’s next
You’ve learned how to prepare Python submission packages.
The next chapter covers dry run testing:
- Simulating the reviewer workflow
- Unpacking and installing from text files
- Reproducing analysis results
- Verifying compliance
Dry run testing ensures your submission package works correctly.