Status: Paper 1 pending journal submission (awaiting NASA approval). Paper 2 pending NASA Technical Reports Server (NTRS).
Overview
Machine learning framework for cloud base height retrieval from NASA ER-2 airborne observations, developed during my NASA Goddard Space Flight Center OSTEM internship (Summer 2025). Two first-author papers pending NASA approval, covering complementary analyses.
Paper 1: CBH Retrieval (1,426 samples, 5 flights)
Systematic comparison of atmospheric feature-based versus image-based ML for CBH retrieval.
| Metric | Value |
|---|---|
| GBDT R² (per-flight shuffled CV) | 0.744 |
| Best CNN R² (ResNet-18) | 0.617 |
| GBDT MAE | 117.4 m |
| CNN MAE | 150.9 m |
| Labeled Samples | 1,426 |
Domain shift: Leave-one-flight-out (LOFO) CV yields R² = -15.4, a catastrophic failure. Few-shot learning (50 samples) recovers R² = 0.57–0.85. Conformal prediction achieves 27% coverage (target: 90%); per-flight calibration recovers 86%.
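
The coverage collapse described above can be illustrated with a minimal split-conformal sketch. The source does not say which conformal variant was used, so absolute-residual scores with the standard finite-sample quantile are an assumption here, and all numbers are synthetic toy values, not the paper's data.

```python
import math

def conformal_quantile(abs_residuals, alpha=0.1):
    """Finite-sample-corrected (1 - alpha) quantile of calibration residuals."""
    n = len(abs_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(abs_residuals)[rank - 1]

def coverage(y_true, y_pred, q):
    """Fraction of targets that fall inside [pred - q, pred + q]."""
    hits = sum(abs(t - p) <= q for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Synthetic toy setup: a placeholder model that always predicts 1000 m.
PRED = 1000
SHIFT = 250  # hypothetical offset of an unseen flight, in metres

# Calibration set from the "source" flight: residuals of 10, 20, ..., 190 m.
cal_true = [1000 + 10 * i for i in range(1, 20)]
q_source = conformal_quantile([abs(t - PRED) for t in cal_true])

# Test points from the same regime, then the same points shifted by SHIFT.
test_true = [1000 + 20 * i for i in range(-5, 5)]
shifted_true = [t + SHIFT for t in test_true]

cov_within = coverage(test_true, [PRED] * len(test_true), q_source)
cov_shifted = coverage(shifted_true, [PRED] * len(shifted_true), q_source)

# Per-flight recalibration: recompute the quantile on the shifted flight.
shifted_cal = [t + SHIFT for t in cal_true]
q_flight = conformal_quantile([abs(t - PRED) for t in shifted_cal])
cov_recal = coverage(shifted_true, [PRED] * len(shifted_true), q_flight)

print(cov_within, cov_shifted, cov_recal)
```

In this toy case the interval calibrated on the source flight covers every in-regime point but only a small fraction of the shifted flight, and recalibrating on the shifted flight restores coverage at the cost of much wider intervals.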
Paper 2: Domain Shift Analysis (5,500 samples, 6 flights)
Expanded dataset focused on physics-informed feature engineering and domain adaptation.
| Metric | Value |
|---|---|
| LOFO R² (6-flight mean) | -5.36 |
| Worst single flight R² | -19.4 |
| Few-shot recovery (50 samples) | R² = +0.35 |
| Conformal coverage under shift | 34% (target: 90%) |
| Within-flight calibration | 90% coverage |
| Physics-derived features | 29 (from 5 base ERA5 variables) |
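
The source does not list the 29 features or the 5 base ERA5 variables. As a purely illustrative example of what a physics-derived feature can look like, the sketch below estimates the lifting condensation level from 2 m temperature and 2 m dew point using Espy's rule of thumb (about 125 m per kelvin of dew-point depression). The choice of inputs (`t2m`, `d2m`) and the function name are assumptions, not a description of the paper's feature set.

```python
def lcl_height_m(t2m_k, d2m_k):
    """Approximate lifting condensation level in metres above the surface.

    Uses Espy's approximation: ~125 m per kelvin of dew-point depression.
    Inputs are 2 m temperature and 2 m dew point in kelvin (assumed inputs).
    """
    depression_k = max(t2m_k - d2m_k, 0.0)  # clamp: dew point cannot exceed T
    return 125.0 * depression_k

# Hypothetical values: 290 K air temperature, 286 K dew point.
example_lcl = lcl_height_m(290.0, 286.0)
print(example_lcl)
```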
Of the five adaptation methods evaluated, only few-shot learning works. Instance weighting, TrAdaBoost, MMD alignment, and feature selection all fail or degrade performance.
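
MMD alignment builds on the maximum mean discrepancy, a kernel-based measure of how far apart two feature distributions are. The sketch below shows only that measurement step (the biased squared-MMD estimate with an RBF kernel), not the alignment procedure itself. The kernel choice, `gamma` value, and toy feature vectors are assumptions for illustration.

```python
import math

def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mmd_squared(source, target, gamma=1.0):
    """Biased estimate of squared MMD between two samples."""
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))
    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Synthetic 2-D feature vectors standing in for one flight's features.
source = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1)]
near = [(a + 0.5, b + 0.5) for a, b in source]  # mildly shifted flight
far = [(a + 2.0, b + 2.0) for a, b in source]   # strongly shifted flight

mmd_same = mmd_squared(source, source)
mmd_near = mmd_squared(source, near)
mmd_far = mmd_squared(source, far)
print(mmd_same, mmd_near, mmd_far)
```

The estimate is zero for identical samples and grows as the target sample moves away from the source.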
Why the Numbers Differ
The two papers analyze overlapping but different datasets. Paper 1 uses 1,426 samples across 5 flights (LOFO R² = -15.4). Paper 2 expands to 5,500 ocean-only boundary-layer observations across 6 flights (LOFO R² = -5.36). The mean LOFO R² is less negative in the expanded dataset because averaging over additional flights reduces its magnitude, but the shift remains catastrophic in both cases.
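
Both LOFO figures come from holding out one flight at a time and scoring on it. A minimal sketch of that loop, with a placeholder model that predicts the training mean, shows how R² can become strongly negative when the held-out flight sits in a different height regime. The data are synthetic and the helper names are hypothetical; scikit-learn's `LeaveOneGroupOut` produces the same kind of split.

```python
from statistics import fmean

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def leave_one_flight_out(flight_ids):
    """Yield (held_out_flight, train_idx, test_idx) for each flight."""
    for held_out in sorted(set(flight_ids)):
        train = [i for i, f in enumerate(flight_ids) if f != held_out]
        test = [i for i, f in enumerate(flight_ids) if f == held_out]
        yield held_out, train, test

# Synthetic toy data: two flights with different cloud-base regimes.
flight_ids = ["A"] * 4 + ["B"] * 4
cbh_m = [500.0, 520.0, 480.0, 500.0, 1500.0, 1520.0, 1480.0, 1500.0]

scores = {}
for held_out, train, test in leave_one_flight_out(flight_ids):
    # Placeholder "model": predict the mean CBH of the training flights.
    pred = fmean([cbh_m[i] for i in train])
    y_test = [cbh_m[i] for i in test]
    scores[held_out] = r2_score(y_test, [pred] * len(y_test))

mean_lofo_r2 = fmean(list(scores.values()))
print(scores, mean_lofo_r2)
```

Because R² is normalized by the variance within the held-out flight, a constant bias that is large relative to that variance drives the score far below zero, and a single such flight can dominate the mean.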
Technical Approach
Data Pipeline
- HDF5 preprocessing pipeline for NASA ER-2 observations
- Temporal interpolation and radiometric correction
- Integration with ERA5 reanalysis atmospheric data
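
One plausible reading of the temporal interpolation and ERA5 integration steps above is that hourly reanalysis values are interpolated linearly to each observation timestamp. The sketch below assumes that reading; the function name, variable names, and values are hypothetical, and `numpy.interp` would do the same job on arrays.

```python
from bisect import bisect_right

def interp_in_time(t_query, t_grid, values):
    """Linearly interpolate values on a sorted time grid to t_query.

    t_query and t_grid must share units (seconds here); t_grid needs
    at least two entries.
    """
    if not t_grid[0] <= t_query <= t_grid[-1]:
        raise ValueError("query time outside the reanalysis time range")
    hi = min(bisect_right(t_grid, t_query), len(t_grid) - 1)
    lo = hi - 1
    weight = (t_query - t_grid[lo]) / (t_grid[hi] - t_grid[lo])
    return (1.0 - weight) * values[lo] + weight * values[hi]

# Hypothetical hourly reanalysis temperatures (K) at 0 h, 1 h, and 2 h.
era5_times_s = [0, 3600, 7200]
era5_t2m_k = [288.0, 290.0, 291.0]

# Hypothetical observation taken 30 minutes into the first hour.
t2m_at_obs = interp_in_time(1800, era5_times_s, era5_t2m_k)
print(t2m_at_obs)
```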
Model Comparison
- Feature-based: XGBoost gradient boosting with atmospheric variables
- Image-based: CNNs (ResNet-18, EfficientNet-B0) on raw thermal IR imagery
- Result: Atmospheric features outperform raw images (R² 0.744 vs. 0.617; MAE 117.4 m vs. 150.9 m)
Key Insight
Temporal autocorrelation (lag-1 ρ = 0.94) inflates pooled K-fold R² from 0.744 to 0.924. Per-flight shuffled CV is the honest within-regime metric.
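
The lag-1 statistic quoted above can be computed along the following lines. This is a sketch of the standard sample estimator, normalized by the full-series variance; the source does not state which estimator was used, and the series here are synthetic.

```python
from statistics import fmean

def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a 1-D series."""
    mean = fmean(series)
    num = sum((a - mean) * (b - mean)
              for a, b in zip(series[:-1], series[1:]))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

# Synthetic examples: a smooth ramp versus a sign-flipping series.
smooth = [float(i) for i in range(100)]
alternating = [1.0, -1.0] * 50

rho_smooth = lag1_autocorrelation(smooth)
rho_alternating = lag1_autocorrelation(alternating)
print(rho_smooth, rho_alternating)
```

Values near 1 mean consecutive samples are far from independent, which is the condition under which the pooled K-fold score above is reported to be inflated.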
Technology Stack
- ML Frameworks: PyTorch, TensorFlow, scikit-learn
- Gradient Boosting: XGBoost, LightGBM
- Data Processing: HDF5, NetCDF, Pandas, NumPy
- Atmospheric Data: ERA5 reanalysis
Links
- GitHub Repository
- Paper 1: Pending journal submission (awaiting NASA approval)
- Paper 2: Pending NASA Technical Reports Server (NTRS)
Affiliation
NASA Goddard Space Flight Center
OSTEM Intern – Atmospheric Remote Sensing
May – August 2025