# Model Management and Resource Control

This guide explains how to manage deep learning models and control GPU/CPU resources in HiTMicTools.

## Overview

HiTMicTools uses multiple neural network models for different analysis tasks:
- **Focus Restoration**: NAFNet models for brightfield and fluorescence channels
- **Segmentation**: MonaiUnet or RT-DETR for cell/object detection
- **Classification**: ResNet-based classifiers for cell type and quality
- **PI Classification**: Models for propidium iodide staining detection
- **Out-of-Focus Detection**: Quality control models

## 1. Model Collections (Recommended Approach)

Model collections are ZIP bundles that contain all required models and configurations for a complete analysis pipeline.

### Advantages of Model Collections

- **Simplified Deployment**: Single file contains all models
- **Version Consistency**: All models are compatible and tested together
- **Tracking Support**: Includes btrack configuration files
- **Easy Distribution**: Share one file instead of multiple directories
- **Reproducibility**: Ensures everyone uses the same model versions

### Available Model Collections

| Collection | Pipeline | Tracking | Description |
|------------|----------|----------|-------------|
| `model_collection_tracking_20250529.zip` | ASCT_focusrestore | Yes | Full pipeline with tracking |
| `model_collection_scsegm_20251106.zip` | ASCT_scsegm | Optional | RT-DETR instance segmentation |
| `model_collection_oof_20251014.zip` | Various | No | With out-of-focus detection |

### Using Model Collections

#### Basic Configuration

```yaml
models:
  model_collection: "./models/model_collection_tracking_20250529.zip"
```

That's it! The pipeline automatically extracts and loads all required models.

#### What's Inside a Model Collection?

```
model_collection_tracking_20250529.zip
├── bf_focus_restorer/
│   ├── model.pth                    # Brightfield focus model weights
│   └── model_metadata.json          # Model architecture config
├── fl_focus_restorer/
│   ├── model.pth                    # Fluorescence focus model weights
│   └── model_metadata.json
├── segmentation/
│   ├── model.pth                    # Segmentation model weights
│   └── model_metadata.json
├── cell_classifier/
│   ├── model.onnx                   # Cell classifier (ONNX format)
│   └── model_metadata.json
├── pi_classifier/
│   ├── model.joblib                 # PI classifier (scikit-learn)
│   └── model_metadata.json
└── tracking/
    └── tracking_config.json         # btrack configuration
```
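
To check what a bundle actually contains before pointing a config at it, the archive can be listed with Python's standard `zipfile` module. The sketch below groups entries by top-level folder. The helper name `list_collection` is hypothetical and is not part of HiTMicTools.

```python
import zipfile
from collections import defaultdict


def list_collection(zip_path):
    """Group the files in a model collection ZIP by top-level folder."""
    contents = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip explicit directory entries
                continue
            folder, _, filename = name.partition("/")
            if filename:
                contents[folder].append(filename)
            else:
                # Files stored at the archive root have no folder component
                contents["."].append(folder)
    return {folder: sorted(files) for folder, files in contents.items()}


# Example usage (path is illustrative):
# for folder, files in list_collection("./models/model_collection_tracking_20250529.zip").items():
#     print(f"{folder}/: {', '.join(files)}")
```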

### Creating Custom Model Collections

#### Using the CLI (Recommended)

Create model bundles directly from the command line:

```bash
# 1. Create a configuration file describing your models
cat > models_info.yml << 'EOF'
bf_focus:
  model_path: "./models/bf_focus/model.pth"
  model_metadata: "./models/bf_focus/config.json"
  inferer_args:
    scale_method: "range01"
    patch_size: 256

fl_focus:
  model_path: "./models/fl_focus/model.pth"
  model_metadata: "./models/fl_focus/config.json"

segmentation:
  model_path: "./models/segmentation/model.pth"
  model_metadata: "./models/segmentation/config.json"

cell_classifier:
  model_path: "./models/classifier/model.onnx"
  model_metadata: "./models/classifier/config.json"

pi_classification:
  model_path: "./models/pi_classifier/model.joblib"

tracker:
  config_path: "./tracking/config.json"  # Optional
EOF

# 2. Create the bundle (date will be auto-inserted)
hitmictools bundle -i models_info.yml -o my_custom_collection.zip
# Creates: my_custom_collection_20251218.zip

# Disable auto-dating if you want exact filename
hitmictools bundle -i models_info.yml -o my_bundle.zip --no-auto-date
```

The bundle will include:
- All model files with standardized naming
- Metadata JSON files for each model
- Internal `config.yml` with creation timestamp
- Optional tracker configuration
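
Before bundling, it can help to confirm that every file referenced in `models_info.yml` exists. The following is a minimal sketch that works on the already-parsed configuration (a plain dictionary). The helper `find_missing_files` and the `PATH_KEYS` tuple are illustrative names, not part of the HiTMicTools CLI.

```python
from pathlib import Path

# Keys whose values are file paths, taken from the example configuration above
PATH_KEYS = ("model_path", "model_metadata", "config_path")


def find_missing_files(models_info, base_dir="."):
    """Return (section, key, path) tuples for referenced files that do not exist."""
    missing = []
    for section, entry in models_info.items():
        for key in PATH_KEYS:
            if key in entry and not (Path(base_dir) / entry[key]).is_file():
                missing.append((section, key, entry[key]))
    return missing
```

Assuming PyYAML is installed, the dictionary can be obtained with `yaml.safe_load(open("models_info.yml"))`. An empty result means all referenced files were found.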

#### Using the Standalone Script (Legacy)

For backward compatibility, the script interface is still available:

```bash
# Explicit output path (no auto-dating)
python scripts/create_model_bundle.py create \
    -i models_info.yml \
    -o my_custom_collection.zip

# Auto-dated output
python scripts/create_model_bundle.py create-mbundle \
    -i models_info.yml \
    -d ./output_directory/
# Creates: ./output_directory/model_collection_20251218.zip
```

## 2. Individual Model Specification (Advanced)

For development, testing, or custom pipelines, you can specify each model individually.

### Complete Individual Model Configuration

```yaml
# Do NOT include model_collection when using individual models

bf_focus:
  model_path: "/path/to/models/bf_focus/model.pth"
  model_metadata: "/path/to/models/bf_focus/config.json"
  inferer_args:
    scale_method: "range01"          # Scaling: "range01", "fixed_range", "none"
    patch_size: 256                  # Must be power of 2
    overlap_ratio: 0.25              # 0.0-0.5, higher = smoother but slower
    half_precision: true             # Use FP16 for faster inference
  scaler_args:
    pmin: 1.0                        # Percentile min for range01
    pmax: 99.8                       # Percentile max for range01

fl_focus:
  model_path: "/path/to/models/fl_focus/model.pth"
  model_metadata: "/path/to/models/fl_focus/config.json"
  inferer_args:
    scale_method: "fixed_range"
    patch_size: 256
    overlap_ratio: 0.25
    half_precision: true
  scaler_args:
    bit_depth: 12                    # For fixed_range scaling

segmentation:
  model_path: "/path/to/models/segmentation/model.pth"
  model_metadata: "/path/to/models/segmentation/config.json"
  inferer_args:
    scale_method: "none"             # Already normalized
    patch_size: 512
    overlap_ratio: 0.25
    half_precision: true

cell_classifier:
  model_path: "/path/to/models/cell_classifier/model.onnx"
  model_metadata: "/path/to/models/cell_classifier/config.json"
  model_args:
    batch_size: 512                  # Adjust based on GPU memory
    min_size: 128                    # Minimum object size to classify
  classes:
    0: "single-cell"
    1: "clump"
    2: "noise"
    3: "off-focus"
    4: "joint-cell"

pi_classification:
  pi_classifier_path: "/path/to/models/pi_classifier/model.joblib"
  # scikit-learn model, no additional config needed

# Optional: Out-of-focus detector
oof_detector:
  model_path: "/path/to/models/oof_detector/model.pth"
  model_metadata: "/path/to/models/oof_detector/config.json"
```

### Model Loading Details

The pipeline loads models using `load_model_fromdict()`, which supports:

**Valid model keys:**
- `bf_focus` - Brightfield focus restoration
- `fl_focus` - Fluorescence focus restoration
- `segmentation` - Cell segmentation
- `cell_classifier` - Cell type/quality classification
- `pi_classification` - PI staining classification
- `oof_detector` - Out-of-focus detection
- `sc_segmenter` - Single-cell instance segmentation (RT-DETR)

**Supported model formats:**
- PyTorch (`.pth`, `.pt`) - Most models
- ONNX (`.onnx`) - Cross-platform inference
- scikit-learn (`.joblib`) - Traditional ML models
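
As an illustration of the format list above, a file extension can be mapped to its format before a model is loaded. This is only a sketch of the idea: `detect_model_format` is a hypothetical helper, and the source does not say whether `load_model_fromdict()` dispatches this way.

```python
from pathlib import Path

# Extension-to-format table taken from the list above
FORMATS = {
    ".pth": "pytorch",
    ".pt": "pytorch",
    ".onnx": "onnx",
    ".joblib": "scikit-learn",
}


def detect_model_format(model_path):
    """Return the format name for a model file, or raise for unknown extensions."""
    suffix = Path(model_path).suffix.lower()
    try:
        return FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported model format: {suffix!r}") from None
```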

## 3. Model Architectures

### Focus Restoration Models

**NAFNet (Nonlinear Activation Free Network)**
- Architecture: U-Net style with NAF blocks
- Input: Single-channel grayscale (brightfield or fluorescence)
- Output: Focus-restored image
- Typical size: ~10-50 MB
- Inference: ~0.5-2 seconds per frame (GPU)

**MonaiUnet**
- Architecture: MONAI U-Net
- Alternative to NAFNet
- Similar performance, different training approach

### Segmentation Models

**MonaiUnet for Segmentation**
- Architecture: MONAI U-Net with instance segmentation head
- Input: Single-channel brightfield
- Output: Instance segmentation masks
- Typical size: ~50-100 MB

**RT-DETR (Real-Time Detection Transformer)**
- Architecture: Transformer-based object detection
- Used in ASCT_scsegm pipeline
- Input: Single-channel brightfield
- Output: Bounding boxes + masks
- Typical size: ~100-200 MB
- Better for crowded/overlapping cells

### Classification Models

**FlexResNet**
- Architecture: Custom ResNet variant
- Input: Cropped cell images (typically 128x128)
- Output: Class probabilities
- Classes: single-cell, clump, noise, off-focus, joint-cell
- Typical size: ~20-50 MB

**PI Classifier**
- Architecture: Random Forest or Logistic Regression
- Input: Intensity features from FL channel
- Output: PI positive/negative
- Typical size: <1 MB

## 4. Resource Management

HiTMicTools includes GPU/CPU memory management designed for multi-process environments.

### The ReserveResource System

`ReserveResource` is a context manager that prevents GPU memory over-subscription:

```python
import torch
from HiTMicTools.resource_management.reserveresource import ReserveResource

# `logger` and `run_pipeline` are placeholders defined elsewhere in your script
# Reserve 8 GB of GPU memory for the duration of the block
with ReserveResource(torch.device("cuda:0"), required_gb=8.0, logger=logger):
    # Other processes queue here if there is not enough free memory
    run_pipeline()
```

### How It Works

1. **Booking System**: Creates a JSON file that tracks memory usage
   - File location: `TMPDIR/memory_bookings_cuda0.json`
   - Tracks total reserved memory per device

2. **Queueing**: When memory is unavailable, processes wait in a queue
   - Fair allocation (first-come, first-served)
   - Periodic checks for available memory
   - Automatic cleanup on exit

3. **Cross-Platform**: Works on macOS (MPS), Linux (CUDA), and Windows (CUDA/CPU)
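
The booking idea can be illustrated with a simplified sketch. This is not the `ReserveResource` implementation: the JSON layout (owner name mapped to reserved GB) and all function names are assumptions, and the sketch omits the file locking and first-come, first-served ordering that a real multi-process version needs.

```python
import json
import tempfile
from pathlib import Path


def bookings_file(device_tag, tmp_dir=None):
    """Path of the bookings file, e.g. <tmp>/memory_bookings_cuda0.json."""
    return Path(tmp_dir or tempfile.gettempdir()) / f"memory_bookings_{device_tag}.json"


def _load(path):
    # A missing file means nothing has been reserved yet
    return json.loads(path.read_text()) if path.exists() else {}


def try_reserve(path, owner, required_gb, total_gb):
    """Record a booking if it fits; return False so the caller can wait and retry."""
    bookings = _load(path)
    if sum(bookings.values()) + required_gb > total_gb:
        return False
    bookings[owner] = required_gb
    path.write_text(json.dumps(bookings))
    return True


def release(path, owner):
    """Remove a booking so that waiting processes can proceed."""
    bookings = _load(path)
    bookings.pop(owner, None)
    path.write_text(json.dumps(bookings))
```

A caller would poll `try_reserve` at intervals until it returns `True`, run its workload, and call `release` in a `finally` block.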

### Configuration for Multi-Process

When running multiple processes:

```yaml
pipeline_setup:
  parallel_processing: true
  num_workers: 3              # Number of concurrent processes

# Internally, each process reserves memory:
# - Focus restoration: ~4-6 GB
# - Segmentation: ~6-8 GB
# - Classification: ~2-4 GB
# Total per process: ~12-18 GB peak
```

**Important**: Set `num_workers` based on available VRAM:
- 16 GB GPU: `num_workers: 1-2`
- 24 GB GPU: `num_workers: 2-3`
- 40 GB GPU: `num_workers: 3-4`
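
A conservative starting value can be derived by dividing available VRAM by the per-process peak. The sketch below assumes the lower estimate of about 12 GB per process from the comment above; `suggest_num_workers` is a hypothetical helper, and its results correspond to the low end of each range listed.

```python
def suggest_num_workers(vram_gb, peak_gb_per_worker=12.0):
    """Conservative worker count: how many peak footprints fit in VRAM (at least 1)."""
    if peak_gb_per_worker <= 0:
        raise ValueError("peak_gb_per_worker must be positive")
    return max(1, int(vram_gb // peak_gb_per_worker))


for vram in (16, 24, 40):
    print(f"{vram} GB -> num_workers: {suggest_num_workers(vram)}")
# 16 GB -> num_workers: 1
# 24 GB -> num_workers: 2
# 40 GB -> num_workers: 3
```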

### Memory Logging

Track memory usage during processing:

```python
from HiTMicTools.resource_management.memlogger import MemoryLogger

logger = MemoryLogger(log_dir="./logs", prefix="analysis")

# Log memory at specific points
logger.info("Starting segmentation", show_memory=True, cuda=True)
# Output: [INFO] Starting segmentation | RAM: 12.3 GB | VRAM: 8.5 GB
```

### Cleanup and Cache Management

All models inherit from `BaseModel` which provides cleanup:

```python
# Manual cleanup (usually automatic)
model.cleanup()

# This calls:
# - torch.cuda.empty_cache()
# - del self.model
# - gc.collect()
```

The pipeline automatically calls cleanup between stages.

## 5. Model Performance and Optimization

### Inference Speed Optimization

**Use Half Precision (FP16)**
```yaml
inferer_args:
  half_precision: true    # Often up to ~2x faster with roughly half the memory; gains depend on the GPU
```

**Adjust Patch Size**
```yaml
inferer_args:
  patch_size: 256         # Smaller = slower but less memory
  # Options: 128, 256, 512, 1024
```

**Reduce Overlap**
```yaml
inferer_args:
  overlap_ratio: 0.125    # Less overlap = faster but more artifacts
  # Range: 0.0 - 0.5
```
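
To see why patch size and overlap affect speed, it helps to count the patches processed per image. The sketch below assumes plain sliding-window tiling with a stride of `patch_size * (1 - overlap_ratio)`; the source does not confirm that the inferer tiles exactly this way, and `count_patches` is an illustrative name.

```python
def count_patches(image_size, patch_size, overlap_ratio):
    """Number of patches needed to cover a square image with sliding-window tiling."""
    if image_size <= patch_size:
        return 1
    stride = max(1, round(patch_size * (1 - overlap_ratio)))
    # Ceiling division: steps needed to reach the far edge, plus the first patch
    per_axis = -(-(image_size - patch_size) // stride) + 1
    return per_axis ** 2


print(count_patches(2048, 256, 0.25))   # 121
print(count_patches(2048, 256, 0.125))  # 81
```

Under this assumption, halving the overlap on a 2048x2048 image cuts the patch count by about a third.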

**Batch Size for Classifiers**
```yaml
model_args:
  batch_size: 1024        # Larger = faster but more memory
```

### Typical Performance Metrics

For a 2048x2048 image, single frame:

| Task | GPU (RTX 4090) | CPU | VRAM |
|------|----------------|-----|------|
| Focus Restoration (BF) | 0.8s | 15s | 4 GB |
| Focus Restoration (FL) | 0.8s | 15s | 4 GB |
| Segmentation | 1.2s | 25s | 6 GB |
| Classification (500 cells) | 0.3s | 2s | 2 GB |
| Total per frame | ~3s | ~60s | ~12 GB peak |

Multi-frame movie (100 frames):
- GPU: ~5-8 minutes
- CPU: ~1.5-2 hours
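
The movie-level figures follow from the per-frame table. As a rough check, the sketch below sums the tabulated GPU stage times and scales them by the frame count. It covers only the stages in the table, so it lands at the low end of the quoted range. The dictionary keys are illustrative labels.

```python
# Per-frame GPU timings in seconds, copied from the table above
gpu_seconds = {
    "focus_bf": 0.8,
    "focus_fl": 0.8,
    "segmentation": 1.2,
    "classification": 0.3,
}


def estimate_minutes(per_stage_seconds, n_frames):
    """Total inference time in minutes for n_frames, summing all stages."""
    return sum(per_stage_seconds.values()) * n_frames / 60


print(f"{estimate_minutes(gpu_seconds, 100):.1f} min")  # 5.2 min
```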

### Model Versioning and Reproducibility

**Track model versions:**
```yaml
# In your config, add comments
models:
  model_collection: "./models/model_collection_tracking_20250529.zip"
  # Version: 2025-05-29
  # Training date: 2025-05-15
  # Training dataset: ASCT_batch_042
  # Validation accuracy: 98.5%
```

**Save model metadata:**

The metadata JSON for each model includes:
```json
{
  "model_type": "NAFNet",
  "architecture": "unet_style",
  "input_channels": 1,
  "output_channels": 1,
  "training_date": "2025-05-15",
  "training_dataset": "ASCT_batch_042",
  "validation_metrics": {
    "psnr": 32.5,
    "ssim": 0.95
  }
}
```

## 6. Troubleshooting Models

### Model Loading Errors

**"Model file not found"**
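```python
# Check that the path resolves from the current working directory
import os
print(os.path.exists("./models/model_collection.zip"))

# If this prints False, use an absolute path in the YAML config:
# models:
#   model_collection: "/full/path/to/model_collection.zip"
```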

**"Invalid model metadata"**
```jsonc
// Metadata must match model architecture
// Check model_metadata.json:
{
  "model_type": "NAFNet",  // Must match actual model
  "input_channels": 1,      // Must be correct
  ...
}
```
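
A quick sanity check of a metadata file can catch mismatches before the pipeline starts. This is a minimal sketch: `check_metadata` is a hypothetical helper, and the set of required fields is an assumption based on the two fields highlighted above.

```python
import json

# Fields highlighted in the example above; the full required set is an assumption
EXPECTED_FIELDS = ("model_type", "input_channels")


def check_metadata(metadata_path, expected_type=None):
    """Return a list of human-readable problems found in a metadata JSON file."""
    with open(metadata_path, encoding="utf-8") as fh:
        metadata = json.load(fh)
    problems = [f"missing field: {name}" for name in EXPECTED_FIELDS if name not in metadata]
    found_type = metadata.get("model_type")
    if expected_type and found_type is not None and found_type != expected_type:
        problems.append(f"model_type is {found_type!r}, expected {expected_type!r}")
    return problems
```

An empty list means no problems were detected by these checks. It does not guarantee that the weights match the declared architecture.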

**"CUDA out of memory"**
```yaml
# Solutions (try them in order until the error disappears):
inferer_args:
  half_precision: true    # 1. Enable half precision
  patch_size: 128         # 2. Reduce patch size

model_args:
  batch_size: 256         # 3. Reduce batch size

pipeline_setup:
  num_workers: 1          # 4. Use fewer workers
```

### Model Performance Issues

**Focus restoration artifacts**
```yaml
# Increase overlap
inferer_args:
  overlap_ratio: 0.35     # Default 0.25

# Adjust scaling
scaler_args:
  pmin: 0.5               # Less aggressive clipping
  pmax: 99.9
```

**Poor segmentation**
```yaml
pipeline_setup:
  # Check preprocessing
  focus_correction: true   # Ensure enabled
  method: "basicpy_fl"     # Try different methods

  # Verify correct channel
  reference_channel: 0     # Should be brightfield
```

**Misclassification**
```yaml
# Adjust minimum object size
model_args:
  min_size: 100            # Filter out smaller objects

# Check class definitions match training
classes:
  0: "single-cell"         # Must match model training
  1: "clump"
  ...
```

## 7. Model Storage Best Practices

### File Organization

Recommended structure:
```
models/
├── collections/
│   ├── model_collection_tracking_20250529.zip
│   ├── model_collection_scsegm_20251106.zip
│   └── README.md                    # Document model versions
├── individual/
│   ├── bf_focus/
│   │   ├── v1.0/
│   │   │   ├── model.pth
│   │   │   └── config.json
│   │   └── v2.0/
│   │       ├── model.pth
│   │       └── config.json
│   └── ...
└── experimental/
    └── ...                          # Models under development
```

### Version Control

**DO:**
- Store model collections on shared filesystem or cloud storage
- Document model versions in README
- Tag configs with model version information
- Keep checksums of model files for verification

**DO NOT:**
- Commit large model files to git (use `.gitignore`)
- Overwrite production models without versioning
- Mix model versions in the same collection
- Share models without documentation

### Model Distribution

For sharing models:

```bash
# 1. Create bundle with CLI
hitmictools bundle -i models_info.yml -o model_collection_v1.0.zip
# Creates: model_collection_v1.0_20251218.zip (with auto-dating)

# 2. Calculate checksum (on macOS, use `shasum -a 256` instead of `sha256sum`)
sha256sum model_collection_v1.0_20251218.zip > model_collection_v1.0_20251218.zip.sha256

# 3. Document
echo "Model Collection v1.0" > README.txt
echo "Creation date: 2025-12-18" >> README.txt
echo "Training date: 2025-05-29" >> README.txt
echo "Dataset: ASCT_training_set_042" >> README.txt
cat model_collection_v1.0_20251218.zip.sha256 >> README.txt

# 4. Share via cloud/network storage
# DO NOT email large files
```
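
Recipients can verify a downloaded bundle against the published checksum. The sketch below uses Python's standard `hashlib` and reads a checksum file in the format that `sha256sum` writes (hex digest followed by the filename). The function names are illustrative.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, sha_file):
    """Compare a file against the first field of a sha256sum-style checksum file."""
    with open(sha_file, encoding="utf-8") as fh:
        expected = fh.read().split()[0].lower()
    return sha256_of(path) == expected


# Example usage (filenames are illustrative):
# ok = verify_checksum("model_collection_v1.0_20251218.zip",
#                      "model_collection_v1.0_20251218.zip.sha256")
```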

## Summary

Model management in HiTMicTools:
- **Use model collections** for production (simplest approach)
- **Individual models** for development and testing
- **ReserveResource** for GPU memory management
- **MemoryLogger** for monitoring resource usage
- **Half precision** and patch size tuning for performance
- **Version control** for reproducibility

For model training and development, see the experimental notebooks in `experiments/`.