Model Management and Resource Control
This guide explains how to manage deep learning models and control GPU/CPU resources in HiTMicTools.
Overview
HiTMicTools uses multiple neural network models for different analysis tasks:
Focus Restoration: NAFNet models for brightfield and fluorescence channels
Segmentation: MonaiUnet or RT-DETR for cell/object detection
Classification: ResNet-based classifiers for cell type and quality
PI Classification: Models for propidium iodide staining detection
Out-of-Focus Detection: Quality control models
1. Model Collections (Recommended Approach)
Model collections are ZIP bundles that contain all required models and configurations for a complete analysis pipeline.
Advantages of Model Collections
Simplified Deployment: Single file contains all models
Version Consistency: All models are compatible and tested together
Tracking Support: Includes btrack configuration files
Easy Distribution: Share one file instead of multiple directories
Reproducibility: Ensures everyone uses the same model versions
Available Model Collections
| Collection | Pipeline | Tracking | Description |
|---|---|---|---|
|  | ASCT_focusrestore | Yes | Full pipeline with tracking |
|  | ASCT_scsegm | Optional | RT-DETR instance segmentation |
|  | Various | No | With out-of-focus detection |
Using Model Collections
Basic Configuration
models:
  model_collection: "./models/model_collection_tracking_20250529.zip"
That’s it! The pipeline automatically extracts and loads all required models.
What’s Inside a Model Collection?
model_collection_tracking_20250529.zip
├── bf_focus_restorer/
│ ├── model.pth # Brightfield focus model weights
│ └── model_metadata.json # Model architecture config
├── fl_focus_restorer/
│ ├── model.pth # Fluorescence focus model weights
│ └── model_metadata.json
├── segmentation/
│ ├── model.pth # Segmentation model weights
│ └── model_metadata.json
├── cell_classifier/
│ ├── model.onnx # Cell classifier (ONNX format)
│ └── model_metadata.json
├── pi_classifier/
│ ├── model.joblib # PI classifier (scikit-learn)
│ └── model_metadata.json
└── tracking/
    └── tracking_config.json # btrack configuration
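To check what a given collection actually contains before pointing a config at it, the ZIP can be inspected with Python's standard zipfile module. The helper below is a hypothetical sketch, not part of the HiTMicTools API; it assumes the one-folder-per-model layout shown above and groups any files at the archive root under "(root)".

```python
import zipfile
from collections import defaultdict


def summarize_bundle(zip_path):
    """Group the files in a model collection ZIP by their top-level folder.

    Hypothetical inspection helper; assumes the layout shown above
    (one folder per model, plus tracking/).
    """
    contents = defaultdict(list)
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, _, filename = name.partition("/")
            if filename:
                contents[folder].append(filename)
            else:
                contents["(root)"].append(folder)  # file at archive root
    return {folder: sorted(files) for folder, files in contents.items()}
```

For example, summarize_bundle("./models/model_collection_tracking_20250529.zip") should return a dict keyed by bf_focus_restorer, segmentation, tracking, and so on, each mapped to its file names.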
Creating Custom Model Collections
Using the CLI (Recommended)
Create model bundles directly from the command line:
# 1. Create a configuration file describing your models
cat > models_info.yml << EOF
bf_focus:
  model_path: "./models/bf_focus/model.pth"
  model_metadata: "./models/bf_focus/config.json"
  inferer_args:
    scale_method: "range01"
    patch_size: 256
fl_focus:
  model_path: "./models/fl_focus/model.pth"
  model_metadata: "./models/fl_focus/config.json"
segmentation:
  model_path: "./models/segmentation/model.pth"
  model_metadata: "./models/segmentation/config.json"
cell_classifier:
  model_path: "./models/classifier/model.onnx"
  model_metadata: "./models/classifier/config.json"
pi_classification:
  model_path: "./models/pi_classifier/model.joblib"
tracker:
  config_path: "./tracking/config.json" # Optional
EOF
# 2. Create the bundle (date will be auto-inserted)
hitmictools bundle -i models_info.yml -o my_custom_collection.zip
# Creates: my_custom_collection_20251218.zip
# Disable auto-dating if you want exact filename
hitmictools bundle -i models_info.yml -o my_bundle.zip --no-auto-date
The bundle will include:
All model files with standardized naming
Metadata JSON files for each model
Internal config.yml with creation timestamp
Optional tracker configuration
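The auto-dated names in the CLI examples above (my_custom_collection.zip becoming my_custom_collection_20251218.zip) follow a simple pattern: a _YYYYMMDD stamp is inserted before the extension. If a downstream script needs to predict the output name, a hypothetical helper such as auto_dated_name below reproduces that pattern. It is inferred from the examples, not taken from the CLI's source, so confirm it against your own output.

```python
from datetime import date
from pathlib import Path


def auto_dated_name(output, today=None):
    """Insert a _YYYYMMDD stamp before the extension: x.zip -> x_20251218.zip.

    Hypothetical re-implementation of the naming pattern seen in the CLI
    examples; the real `hitmictools bundle` command may differ.
    """
    today = today or date.today()
    path = Path(output)
    return str(path.with_name(f"{path.stem}_{today:%Y%m%d}{path.suffix}"))
```

Because only the final suffix is treated as the extension, a name such as model_collection_v1.0.zip keeps its version string and becomes model_collection_v1.0_20251218.zip.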
Using the Standalone Script (Legacy)
For backward compatibility, the script interface is still available:
# Explicit output path (no auto-dating)
python scripts/create_model_bundle.py create \
-i models_info.yml \
-o my_custom_collection.zip
# Auto-dated output
python scripts/create_model_bundle.py create-mbundle \
-i models_info.yml \
-d ./output_directory/
# Creates: ./output_directory/model_collection_20251218.zip
2. Individual Model Specification (Advanced)
For development, testing, or custom pipelines, you can specify each model individually.
Complete Individual Model Configuration
# Do NOT include model_collection when using individual models
bf_focus:
  model_path: "/path/to/models/bf_focus/model.pth"
  model_metadata: "/path/to/models/bf_focus/config.json"
  inferer_args:
    scale_method: "range01" # Scaling: "range01", "fixed_range", "none"
    patch_size: 256 # Must be power of 2
    overlap_ratio: 0.25 # 0.0-0.5, higher = smoother but slower
    half_precision: true # Use FP16 for faster inference
    scaler_args:
      pmin: 1.0 # Percentile min for range01
      pmax: 99.8 # Percentile max for range01
fl_focus:
  model_path: "/path/to/models/fl_focus/model.pth"
  model_metadata: "/path/to/models/fl_focus/config.json"
  inferer_args:
    scale_method: "fixed_range"
    patch_size: 256
    overlap_ratio: 0.25
    half_precision: true
    scaler_args:
      bit_depth: 12 # For fixed_range scaling
segmentation:
  model_path: "/path/to/models/segmentation/model.pth"
  model_metadata: "/path/to/models/segmentation/config.json"
  inferer_args:
    scale_method: "none" # Already normalized
    patch_size: 512
    overlap_ratio: 0.25
    half_precision: true
cell_classifier:
  model_path: "/path/to/models/cell_classifier/model.onnx"
  model_metadata: "/path/to/models/cell_classifier/config.json"
  model_args:
    batch_size: 512 # Adjust based on GPU memory
    min_size: 128 # Minimum object size to classify
    classes:
      0: "single-cell"
      1: "clump"
      2: "noise"
      3: "off-focus"
      4: "joint-cell"
pi_classification:
  pi_classifier_path: "/path/to/models/pi_classifier/model.joblib"
  # scikit-learn model, no additional config needed
# Optional: Out-of-focus detector
oof_detector:
  model_path: "/path/to/models/oof_detector/model.pth"
  model_metadata: "/path/to/models/oof_detector/config.json"
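The three scale_method values differ in how raw intensities are mapped before inference. The sketch below shows one plausible reading of them, and these semantics are assumptions rather than HiTMicTools' documented implementation: range01 maps the pmin/pmax percentiles to 0 and 1 and clips outliers, fixed_range divides by the largest value representable at bit_depth (4095 for 12-bit data), and none passes the image through unchanged.

```python
import numpy as np


def scale_image(img, method="range01", pmin=1.0, pmax=99.8, bit_depth=12):
    """Sketch of the three scale_method options, under assumed semantics.

    range01     : map the pmin/pmax percentiles to 0 and 1, clipping outliers
    fixed_range : divide by the maximum value representable at bit_depth
    none        : return the image unchanged (input already normalized)
    """
    img = np.asarray(img, dtype=np.float32)
    if method == "range01":
        lo, hi = np.percentile(img, [pmin, pmax])
        if hi <= lo:  # flat image: avoid division by zero
            return np.zeros_like(img)
        return np.clip((img - lo) / (hi - lo), 0.0, 1.0)
    if method == "fixed_range":
        return img / float(2**bit_depth - 1)
    if method == "none":
        return img
    raise ValueError(f"unknown scale_method: {method!r}")
```

Under this reading, lowering pmin or raising pmax (as suggested later for focus restoration artifacts) widens the window, so fewer bright or dark pixels are clipped.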
Model Loading Details
The pipeline loads models using load_model_fromdict() which supports:
Valid model keys:
bf_focus - Brightfield focus restoration
fl_focus - Fluorescence focus restoration
segmentation - Cell segmentation
cell_classifier - Cell type/quality classification
pi_classification - PI staining classification
oof_detector - Out-of-focus detection
sc_segmenter - Single-cell instance segmentation (RT-DETR)
Supported model formats:
PyTorch (.pth, .pt) - Most models
ONNX (.onnx) - Cross-platform inference
scikit-learn (.joblib) - Traditional ML models
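Typos in model keys or file extensions often surface only once a long run starts. A pre-flight check over the mapping of model entries can catch them early. The function below is a hypothetical sketch built from the key and format lists above; load_model_fromdict() performs its own validation, and only the model_path and pi_classifier_path fields are checked here.

```python
from pathlib import Path

# Keys and extensions as listed above.
VALID_KEYS = {
    "bf_focus", "fl_focus", "segmentation", "cell_classifier",
    "pi_classification", "oof_detector", "sc_segmenter",
}
VALID_EXTENSIONS = {".pth", ".pt", ".onnx", ".joblib"}


def check_models_config(cfg):
    """Return a list of problems found in a dict of model entries.

    Hypothetical pre-flight check, not part of HiTMicTools.
    """
    problems = []
    if "model_collection" in cfg and len(cfg) > 1:
        problems.append("model_collection must not be mixed with individual models")
    for key, spec in cfg.items():
        if key == "model_collection":
            continue
        if key not in VALID_KEYS:
            problems.append(f"unknown model key: {key}")
            continue
        spec = spec or {}
        path = spec.get("model_path") or spec.get("pi_classifier_path")
        if path is None:
            problems.append(f"{key}: no model path given")
        elif Path(path).suffix not in VALID_EXTENSIONS:
            problems.append(f"{key}: unsupported format {Path(path).suffix!r}")
    return problems
```

Loading the YAML with a parser such as PyYAML and passing the resulting dict to check_models_config returns an empty list when every key and extension is recognized.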
3. Model Architectures
Focus Restoration Models
NAFNet (Nonlinear Activation Free Network)
Architecture: U-Net style with NAF blocks
Input: Single-channel grayscale (brightfield or fluorescence)
Output: Focus-restored image
Typical size: ~10-50 MB
Inference: ~0.5-2 seconds per frame (GPU)
MonaiUnet
Architecture: MONAI U-Net
Alternative to NAFNet
Similar performance, different training approach
Segmentation Models
MonaiUnet for Segmentation
Architecture: MONAI U-Net with instance segmentation head
Input: Single-channel brightfield
Output: Instance segmentation masks
Typical size: ~50-100 MB
RT-DETR (Real-Time Detection Transformer)
Architecture: Transformer-based object detection
Used in ASCT_scsegm pipeline
Input: Single-channel brightfield
Output: Bounding boxes + masks
Typical size: ~100-200 MB
Better for crowded/overlapping cells
Classification Models
FlexResNet
Architecture: Custom ResNet variant
Input: Cropped cell images (typically 128x128)
Output: Class probabilities
Classes: single-cell, clump, noise, off-focus, joint-cell
Typical size: ~20-50 MB
PI Classifier
Architecture: Random Forest or Logistic Regression
Input: Intensity features from FL channel
Output: PI positive/negative
Typical size: <1 MB
4. Resource Management
HiTMicTools includes GPU/CPU memory management so that several processes can share the same hardware safely.
The ReserveResource System
ReserveResource is a context manager that prevents GPU memory over-subscription:
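from HiTMicTools.resource_management.memlogger import MemoryLogger
from HiTMicTools.resource_management.reserveresource import ReserveResource
import torch

logger = MemoryLogger(log_dir="./logs", prefix="analysis")

# Reserve 8 GB of GPU memory on cuda:0
with ReserveResource(torch.device("cuda:0"), required_gb=8.0, logger=logger):
    # Run your analysis here; other processes queue
    # until enough booked memory is free
    run_pipeline()  # placeholder for your own analysis entry point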
How It Works
Booking system: creates a JSON file that tracks memory usage
File location: TMPDIR/memory_bookings_cuda0.json
Tracks total reserved memory per device
Queueing: when memory is unavailable, processes wait in a queue
Fair allocation (first-come, first-served)
Periodic checks for available memory
Automatic cleanup on exit
Cross-Platform: Works on macOS (MPS), Linux (CUDA), Windows (CUDA/CPU)
Configuration for Multi-Process
When running multiple processes:
pipeline_setup:
  parallel_processing: true
  num_workers: 3 # Number of concurrent processes
# Internally, each process reserves memory:
# - Focus restoration: ~4-6 GB
# - Segmentation: ~6-8 GB
# - Classification: ~2-4 GB
# Total per process: ~12-18 GB peak
Important: Set num_workers based on available VRAM:
16 GB GPU: num_workers: 1-2
24 GB GPU: num_workers: 2-3
40 GB GPU: num_workers: 3-4
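A conservative starting point is the number of peak-sized processes that fit in VRAM. The hypothetical helper below divides usable VRAM by the lower bound of the ~12-18 GB per-process peak quoted above; with default arguments it gives 1, 2 and 3 workers for 16, 24 and 40 GB, the lower end of each range. The optional headroom_gb parameter is an assumption for memory used by the display or other jobs.

```python
import math


def suggest_num_workers(vram_gb, peak_per_process_gb=12.0, headroom_gb=0.0):
    """Conservative worker count: how many peak-sized processes fit in VRAM.

    Never returns 0, since the pipeline needs at least one worker; on small
    GPUs that single worker may still need FP16 or smaller patches.
    """
    usable = max(vram_gb - headroom_gb, 0.0)
    return max(1, math.floor(usable / peak_per_process_gb))
```

Using the upper bound instead (peak_per_process_gb=18.0) gives a safer count for large images, for example 2 workers on a 40 GB card.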
Memory Logging
Track memory usage during processing:
from HiTMicTools.resource_management.memlogger import MemoryLogger
logger = MemoryLogger(log_dir="./logs", prefix="analysis")
# Log memory at specific points
logger.info("Starting segmentation", show_memory=True, cuda=True)
# Output: [INFO] Starting segmentation | RAM: 12.3 GB | VRAM: 8.5 GB
Cleanup and Cache Management
All models inherit from BaseModel, which provides a cleanup() method:
# Manual cleanup (usually automatic)
model.cleanup()
# This calls:
# - torch.cuda.empty_cache()
# - del self.model
# - gc.collect()
The pipeline automatically calls cleanup between stages.
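When writing your own model wrappers, the order of these steps matters: torch.cuda.empty_cache() only returns cached blocks that no live tensor occupies, so the reference to the weights should be dropped first. The class below is a hypothetical illustration of that order, not BaseModel's actual implementation; the torch step is skipped when PyTorch or CUDA is unavailable.

```python
import gc


class ModelHolder:
    """Hypothetical model wrapper showing an effective release order."""

    def __init__(self, model):
        self.model = model

    def cleanup(self):
        if getattr(self, "model", None) is None:
            return  # already cleaned up; safe to call twice
        self.model = None  # 1. drop the reference to the weights
        gc.collect()       # 2. collect any reference cycles still holding them
        try:               # 3. hand freed cached CUDA blocks back to the driver
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass
```

Calling empty_cache() while self.model still exists would leave the weights' memory allocated, which is why the reference is cleared first.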
5. Model Performance and Optimization
Inference Speed Optimization
Use Half Precision (FP16)
inferer_args:
  half_precision: true # Up to ~2x faster and ~50% less memory on FP16-capable GPUs
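The memory part of the FP16 claim is plain arithmetic: a float16 element takes 2 bytes versus 4 for float32, so any tensor held in half precision needs half the space. The sketch below computes this for a hypothetical batch of 16 single-channel 256x256 patches; real savings are usually somewhat smaller, because some operations and non-tensor overheads may stay in FP32.

```python
import numpy as np


def tensor_mib(shape, dtype):
    """Size in MiB of a dense array with the given shape and dtype."""
    return int(np.prod(shape)) * np.dtype(dtype).itemsize / 2**20


# Hypothetical batch: 16 patches, 1 channel, 256x256 pixels.
shape = (16, 1, 256, 256)
fp32 = tensor_mib(shape, np.float32)  # 4.0 MiB
fp16 = tensor_mib(shape, np.float16)  # 2.0 MiB
```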
Adjust Patch Size
inferer_args:
  patch_size: 256 # Smaller = slower but less memory
  # Options: 128, 256, 512, 1024
Reduce Overlap
inferer_args:
  overlap_ratio: 0.125 # Less overlap = faster but more artifacts
  # Range: 0.0 - 0.5
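Why patch size and overlap affect speed becomes clear when counting how many patches a frame is split into. The sketch below uses a generic sliding-window scheme, assumed here for illustration (HiTMicTools' inferer may place tiles differently): the stride is patch_size * (1 - overlap_ratio), and a final patch is added flush with the border if the stride does not land on it. For a 2048x2048 frame with 256-pixel patches this gives 121 patches at overlap 0.25, 81 at 0.125, and 64 with no overlap.

```python
def tile_starts(length, patch, overlap_ratio):
    """Start offsets of sliding-window patches along one axis."""
    if patch >= length:
        return [0]
    stride = max(1, int(round(patch * (1 - overlap_ratio))))
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)  # last patch flush with the border
    return starts


def n_tiles(height, width, patch, overlap_ratio):
    """Number of model calls needed to cover a height x width image."""
    return (len(tile_starts(height, patch, overlap_ratio))
            * len(tile_starts(width, patch, overlap_ratio)))
```

Under this scheme, reducing overlap from 0.25 to 0.125 cuts the number of model calls by about a third, while shrinking patches to 128 pixels at overlap 0.25 raises it to 441, which is why smaller patches save memory but run slower.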
Batch Size for Classifiers
model_args:
  batch_size: 1024 # Larger = faster but more memory
Typical Performance Metrics
For a 2048x2048 image, single frame:
| Task | GPU time (RTX 4090) | CPU time | VRAM |
|---|---|---|---|
| Focus Restoration (BF) | 0.8 s | 15 s | 4 GB |
| Focus Restoration (FL) | 0.8 s | 15 s | 4 GB |
| Segmentation | 1.2 s | 25 s | 6 GB |
| Classification (500 cells) | 0.3 s | 2 s | 2 GB |
| Total per frame | ~3 s | ~60 s | ~12 GB peak |
Multi-frame movie (100 frames):
GPU: ~5-8 minutes
CPU: ~1.5-2 hours
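The movie estimates follow from the per-frame table: 3.1 s x 100 frames is about 5.2 minutes on the GPU, and 57 s x 100 frames is 95 minutes (about 1.6 hours) on the CPU. The upper ends of the quoted ranges presumably include work not covered by the per-task rows, such as I/O and tracking. The hypothetical helper below makes this arithmetic reusable; substitute timings measured on your own hardware.

```python
# Per-frame timings (seconds) copied from the table above
# (2048x2048 frame, RTX 4090 for GPU). Replace with your own measurements.
STAGE_SECONDS = {
    "focus_bf":       {"gpu": 0.8, "cpu": 15.0},
    "focus_fl":       {"gpu": 0.8, "cpu": 15.0},
    "segmentation":   {"gpu": 1.2, "cpu": 25.0},
    "classification": {"gpu": 0.3, "cpu": 2.0},
}


def movie_minutes(n_frames, device="gpu", overhead_s_per_frame=0.0):
    """Estimated model time for a movie, assuming each stage runs once per frame.

    overhead_s_per_frame is a hypothetical allowance for I/O, tracking, etc.
    """
    per_frame = sum(stage[device] for stage in STAGE_SECONDS.values())
    return n_frames * (per_frame + overhead_s_per_frame) / 60.0
```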
Model Versioning and Reproducibility
Track model versions:
# In your config, add comments
models:
  model_collection: "./models/model_collection_tracking_20250529.zip"
  # Version: 2025-05-29
  # Training date: 2025-05-15
  # Training dataset: ASCT_batch_042
  # Validation accuracy: 98.5%
Save model metadata:
# Model metadata JSON includes:
{
  "model_type": "NAFNet",
  "architecture": "unet_style",
  "input_channels": 1,
  "output_channels": 1,
  "training_date": "2025-05-15",
  "training_dataset": "ASCT_batch_042",
  "validation_metrics": {
    "psnr": 32.5,
    "ssim": 0.95
  }
}
6. Troubleshooting Models
Model Loading Errors
“Model file not found”
# Check paths
import os
print(os.path.exists("./models/model_collection.zip"))
# Use absolute paths
models:
  model_collection: "/full/path/to/model_collection.zip"
“Invalid model metadata”
# Metadata must match model architecture
# Check model_metadata.json:
{
"model_type": "NAFNet", # Must match actual model
"input_channels": 1, # Must be correct
...
}
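A quick way to rule out a metadata mismatch is to load the JSON and compare the fields you expect. Note that a real model_metadata.json must be plain JSON: the # annotations and the "..." in the example above are for illustration only and would make the file unparseable. The helper below is a hypothetical sketch; the expected values are something you supply, and the field names follow the example above.

```python
import json


def check_metadata(path, expected):
    """Compare fields in a model_metadata.json against expected values.

    expected: e.g. {"model_type": "NAFNet", "input_channels": 1}.
    Returns human-readable mismatches (empty list if everything matches).
    """
    with open(path) as f:
        meta = json.load(f)  # raises json.JSONDecodeError on invalid JSON
    problems = []
    for field, want in expected.items():
        if field not in meta:
            problems.append(f"missing field {field!r}")
        elif meta[field] != want:
            problems.append(f"{field}: expected {want!r}, found {meta[field]!r}")
    return problems
```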
“CUDA out of memory”
# Solutions:
# 1. Enable half precision
inferer_args:
  half_precision: true
# 2. Reduce patch size
inferer_args:
  patch_size: 128
# 3. Reduce batch size
model_args:
  batch_size: 256
# 4. Use fewer workers
pipeline_setup:
  num_workers: 1
Model Performance Issues
Focus restoration artifacts
# Increase overlap
inferer_args:
  overlap_ratio: 0.35 # Default 0.25
  # Adjust scaling
  scaler_args:
    pmin: 0.5 # Less aggressive clipping
    pmax: 99.9
Poor segmentation
# Check preprocessing
pipeline_setup:
  focus_correction: true # Ensure enabled
  method: "basicpy_fl" # Try different methods
# Verify correct channel
pipeline_setup:
  reference_channel: 0 # Should be brightfield
Misclassification
# Adjust minimum object size
model_args:
  min_size: 100 # Filter out smaller objects
  # Check class definitions match training
  classes:
    0: "single-cell" # Must match model training
    1: "clump"
    ...
7. Model Storage Best Practices
File Organization
Recommended structure:
models/
├── collections/
│ ├── model_collection_tracking_20250529.zip
│ ├── model_collection_scsegm_20251106.zip
│ └── README.md # Document model versions
├── individual/
│ ├── bf_focus/
│ │ ├── v1.0/
│ │ │ ├── model.pth
│ │ │ └── config.json
│ │ └── v2.0/
│ │ ├── model.pth
│ │ └── config.json
│ └── ...
└── experimental/
    └── ... # Models under development
Version Control
DO:
Store model collections on shared filesystem or cloud storage
Document model versions in README
Tag configs with model version information
Keep checksums of model files for verification
DO NOT:
Commit large model files to git (use .gitignore)
Overwrite production models without versioning
Mix model versions in the same collection
Share models without documentation
Model Distribution
For sharing models:
# 1. Create bundle with CLI
hitmictools bundle -i models_info.yml -o model_collection_v1.0.zip
# Creates: model_collection_v1.0_20251218.zip (with auto-dating)
# 2. Calculate checksum
sha256sum model_collection_v1.0_20251218.zip > model_collection_v1.0_20251218.zip.sha256
# 3. Document
echo "Model Collection v1.0" > README.txt
echo "Creation date: 2025-12-18" >> README.txt
echo "Training date: 2025-05-29" >> README.txt
echo "Dataset: ASCT_training_set_042" >> README.txt
cat model_collection_v1.0_20251218.zip.sha256 >> README.txt
# 4. Share via cloud/network storage
# DO NOT email large files
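Recipients can verify a downloaded bundle with sha256sum -c model_collection_v1.0_20251218.zip.sha256 on systems with GNU coreutils. Where that tool is unavailable, the same check can be done with Python's hashlib. The sketch below reads the "<hex>  <filename>" lines that sha256sum writes and resolves file names relative to the checksum file; the function names are illustrative, not part of HiTMicTools.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large bundles are not loaded into RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256_file(checksum_file):
    """Check every '<hex>  <filename>' line written by sha256sum.

    Returns {filename: True/False}; file names are resolved relative to
    the directory containing the checksum file.
    """
    base = Path(checksum_file).parent
    results = {}
    for line in Path(checksum_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        results[name] = sha256_of(base / name) == expected.lower()
    return results
```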
Summary
Model management in HiTMicTools:
Use model collections for production (simplest approach)
Individual models for development and testing
ReserveResource for GPU memory management
MemoryLogger for monitoring resource usage
Half precision and patch size tuning for performance
Version control for reproducibility
For model training and development, see the experimental notebooks in experiments/.