NASA CloudML: Machine Learning for Atmospheric Remote Sensing

Status: Paper 1 pending journal submission (awaiting NASA approval). Paper 2 pending posting to the NASA Technical Reports Server (NTRS).

Overview

Machine learning framework for cloud base height (CBH) retrieval from NASA ER-2 airborne observations, developed during my NASA Goddard Space Flight Center OSTEM internship (Summer 2025). Two first-author papers, both pending NASA approval, cover complementary analyses.

Paper 1: CBH Retrieval (1,426 samples, 5 flights)

Systematic comparison of atmospheric feature-based versus image-based ML for CBH retrieval.

Metric	Value
GBDT R² (per-flight shuffled CV)	0.744
Best CNN R² (ResNet-18)	0.617
GBDT MAE	117.4 m
CNN MAE	150.9 m
Labeled Samples	1,426

Domain shift: Leave-one-flight-out (LOFO) CV yields R² = -15.4 (catastrophic). Few-shot learning (50 samples) recovers R² = 0.57–0.85. Conformal prediction reaches only 27% coverage against a 90% target; per-flight calibration recovers 86%.
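
A strongly negative R² under leave-one-flight-out evaluation can look surprising, so here is a minimal sketch of how it arises. It is not the papers' pipeline: the flight labels, CBH values, and the mean-predicting stand-in "model" are all hypothetical, chosen only to show that when the offset between flights is large relative to the spread within the held-out flight, R² falls far below zero. In practice scikit-learn's LeaveOneGroupOut produces the same kind of split.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_flight_out(flight_ids):
    """Yield (held_out_flight, train_idx, test_idx) for each distinct flight."""
    for flight in sorted(set(flight_ids)):
        train = [i for i, f in enumerate(flight_ids) if f != flight]
        test = [i for i, f in enumerate(flight_ids) if f == flight]
        yield flight, train, test


# Hypothetical toy data: two flights whose CBH ranges do not overlap.
flight_ids = ["A", "A", "A", "B", "B", "B"]
cbh_m = [400.0, 450.0, 500.0, 1400.0, 1450.0, 1500.0]

scores = {}
for flight, train, test in leave_one_flight_out(flight_ids):
    # Stand-in "model": predict the training-set mean CBH for every test sample.
    prediction = mean(cbh_m[i] for i in train)
    y_true = [cbh_m[i] for i in test]
    scores[flight] = r2_score(y_true, [prediction] * len(test))

print(scores)
```

In this toy case each held-out flight scores R² = -600: the 1,000 m offset between flights dwarfs the 50 m spread within a flight, so the squared error is hundreds of times larger than the held-out flight's own variance.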

Paper 2: Domain Shift Analysis (5,500 samples, 6 flights)

Expanded dataset focused on physics-informed feature engineering and domain adaptation.

Metric	Value
LOFO R² (6-flight mean)	-5.36
Worst single flight R²	-19.4
Few-shot recovery (50 samples)	R² = +0.35
Conformal coverage under shift	34% (target: 90%)
Within-flight calibration	90% coverage
Physics-derived features	29 (from 5 base ERA5 variables)
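
The 29 physics-derived features are not listed here, so the following is only an illustration of the kind of quantity that can be derived from base ERA5 variables, not a description of the actual feature set. It assumes 2 m temperature and 2 m dewpoint are among the inputs and uses Espy's approximation, which puts the lifting condensation level (a rough physical proxy for cloud base) at about 125 m per kelvin of dewpoint depression. The function and argument names are hypothetical.

```python
def dewpoint_depression_k(t2m_k, d2m_k):
    """Dewpoint depression in kelvin, clipped at zero for saturated air."""
    return max(t2m_k - d2m_k, 0.0)


def lcl_height_m(t2m_k, d2m_k):
    """Approximate lifting condensation level above the surface, in metres.

    Espy's approximation: roughly 125 m per kelvin of dewpoint depression.
    """
    return 125.0 * dewpoint_depression_k(t2m_k, d2m_k)


# Hypothetical surface conditions: 290 K air temperature, 286 K dewpoint.
example_lcl = lcl_height_m(290.0, 286.0)
print(example_lcl)  # 500.0
```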

Of the five adaptation methods evaluated, only few-shot learning works. Instance weighting, TrAdaBoost, MMD alignment, and feature selection all fail or degrade performance.
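
The coverage collapse under shift, and its recovery with within-flight calibration, can be illustrated with a minimal split conformal sketch. Everything below is an assumption made for illustration: the residuals are synthetic Gaussians, the 400 m bias on the shifted flight is invented, and absolute residuals serve as the conformity score. It does not reproduce the coverage figures reported above.

```python
import math
import random


def conformal_halfwidth(cal_residuals, alpha=0.10):
    """Split-conformal half-width: the ceil((n + 1)(1 - alpha))-th smallest
    absolute residual on the calibration set."""
    scores = sorted(abs(r) for r in cal_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[rank - 1]


def coverage(residuals, halfwidth):
    """Fraction of samples whose true value lies inside prediction +/- halfwidth."""
    return sum(abs(r) <= halfwidth for r in residuals) / len(residuals)


rng = random.Random(0)
n = 2000

# Hypothetical residuals (truth minus prediction, in metres).
source_cal = [rng.gauss(0.0, 100.0) for _ in range(n)]
source_test = [rng.gauss(0.0, 100.0) for _ in range(n)]
# Shifted flight: the same model is now biased by an assumed 400 m.
target_cal = [rng.gauss(400.0, 100.0) for _ in range(n)]
target_test = [rng.gauss(400.0, 100.0) for _ in range(n)]

q_source = conformal_halfwidth(source_cal)
q_target = conformal_halfwidth(target_cal)

cov_in_dist = coverage(source_test, q_source)  # calibrated and tested in-distribution
cov_shifted = coverage(target_test, q_source)  # source calibration, shifted flight
cov_recal = coverage(target_test, q_target)    # recalibrated on the shifted flight

print(cov_in_dist, cov_shifted, cov_recal)
```

Recalibrating on the shifted flight restores coverage only by widening the intervals (q_target is much larger than q_source in this setup); it does not make the underlying predictions any more accurate.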

Why the Numbers Differ

The two papers analyze overlapping but different datasets. Paper 1 uses 1,426 samples across 5 flights (LOFO R² = -15.4). Paper 2 expands to 5,500 ocean-only boundary-layer observations across 6 flights (LOFO R² = -5.36). The mean LOFO R² is less negative in the expanded dataset because the additional flights pull the average toward zero (the worst single flight still reaches R² = -19.4), but the shift remains catastrophic in both cases.

Technical Approach

Data Pipeline

HDF5 preprocessing pipeline for NASA ER-2 observations
Temporal interpolation and radiometric correction
Integration with ERA5 reanalysis atmospheric data
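
As a rough sketch of the temporal-interpolation step, the snippet below linearly interpolates an hourly reanalysis series onto observation timestamps. The function name, the example values, and the choice to clamp at the ends of the grid are assumptions made for illustration; the actual pipeline's interpolation scheme and HDF5 layout are not described here.

```python
from bisect import bisect_right


def interp_to_times(obs_times, grid_times, grid_values):
    """Linearly interpolate grid_values (sampled at grid_times) onto obs_times.

    Times are seconds since an arbitrary epoch and grid_times must be strictly
    increasing. Observation times outside the grid are clamped to the nearest
    endpoint value.
    """
    out = []
    for t in obs_times:
        if t <= grid_times[0]:
            out.append(grid_values[0])
        elif t >= grid_times[-1]:
            out.append(grid_values[-1])
        else:
            hi = bisect_right(grid_times, t)  # first grid point strictly after t
            lo = hi - 1
            w = (t - grid_times[lo]) / (grid_times[hi] - grid_times[lo])
            out.append((1.0 - w) * grid_values[lo] + w * grid_values[hi])
    return out


# Hypothetical hourly temperature series (K) and three observation times (s).
era5_times = [0, 3600, 7200]
era5_t2m = [280.0, 282.0, 281.0]
obs_times = [900, 3600, 5400]

t2m_at_obs = interp_to_times(obs_times, era5_times, era5_t2m)
print(t2m_at_obs)  # [280.5, 282.0, 281.5]
```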

Model Comparison

Feature-based: XGBoost gradient boosting with atmospheric variables
Image-based: CNNs (ResNet-18, EfficientNet-B0) on raw thermal IR imagery
Result: Atmospheric features outperform raw images (R² 0.744 vs. 0.617; MAE 117.4 m vs. 150.9 m)

Key Insight

Temporal autocorrelation (lag-1 ρ = 0.94) inflates R² from 0.744 under per-flight shuffled CV to 0.924 under pooled K-fold CV. Per-flight shuffled CV is the honest within-regime metric.
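
To make the autocorrelation point concrete, here is a small sketch on synthetic data. It computes the lag-1 sample autocorrelation of a series and the R² obtained by simply copying the temporally adjacent sample, which is roughly what a flexible model can exploit when neighbouring samples fall on opposite sides of a shuffled split. The AR(1) series and its coefficient of 0.94 are assumptions chosen to mirror the reported ρ; this is not the papers' data or analysis.

```python
import random


def lag1_autocorr(x):
    """Lag-1 sample autocorrelation of a sequence."""
    x_bar = sum(x) / len(x)
    dev = [v - x_bar for v in x]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(d * d for d in dev)
    return num / den


def persistence_r2(x):
    """R² of predicting each sample with the value of the previous sample."""
    y_true, y_pred = x[1:], x[:-1]
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Synthetic AR(1) series: x[t] = phi * x[t-1] + noise, with an assumed phi of 0.94.
rng = random.Random(0)
phi = 0.94
series = [0.0]
for _ in range(20000):
    series.append(phi * series[-1] + rng.gauss(0.0, 1.0))

rho = lag1_autocorr(series)
leak_r2 = persistence_r2(series)
print(rho, leak_r2)
```

For a stationary series with lag-1 correlation ρ, copying the neighbouring sample gives an R² of roughly 2ρ - 1, or about 0.88 at ρ = 0.94, with no atmospheric information involved. That is why a score from a split that separates temporal neighbours says little about skill on new conditions.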

Technology Stack

ML Frameworks: PyTorch, TensorFlow, scikit-learn
Gradient Boosting: XGBoost, LightGBM
Data Processing: HDF5, NetCDF, Pandas, NumPy
Atmospheric Data: ERA5 reanalysis

Affiliation

NASA Goddard Space Flight Center
OSTEM Intern – Atmospheric Remote Sensing
May – August 2025