How training AI on birds is revealing underwater mysteries

February 9, 2026


Evaluation

We evaluated Perch 2.0 using a few-shot linear probe on marine tasks, such as distinguishing between baleen whale species or between killer whale subpopulations. We compared its performance across the pre-trained models supported by Perch Hoplite, our repository for agile modeling and transfer learning: Perch 2.0, Perch 1.0, SurfPerch, and our multispecies whale model.

For the underwater evaluation, we used three datasets: NOAA PIPAN, ReefSet, and DCLDE.

NOAA PIPAN: An annotated subset of the NOAA NCEI Passive Acoustic Data Collection, drawn from NOAA Pacific Islands Fisheries Science Center recordings. It includes labels used in our prior whale models, as well as new annotations for baleen species such as the common minke whale, humpback whale, sei whale, blue whale, fin whale, and Bryde’s whale.
ReefSet: Developed for SurfPerch model training, this dataset leverages data annotations from the Google Arts and Culture project "Calling in Our Corals." It includes a mix of biological reef noises (croaks, crackles, growls), specific species/genera classes (for example, damselfishes, dolphins, and groupers), and anthropogenic noise and wave classes.
DCLDE: This dataset is evaluated using three different label sets:
- Species: to distinguish between killer whale, humpback, abiotic sounds, and unidentified underwater sounds (with some uncertainty in the killer whale and humpback labels).
- Known species bio: restricted to the killer whale and humpback labels that are known with certainty.
- Ecotype: to distinguish between killer whale subpopulations (ecotypes), including Transient/Bigg’s, Northern Resident, Southern Resident, Southeast Alaska, and Offshore killer whales.

In this protocol, for a given target dataset with labeled data, we compute embeddings from each candidate model. We then select a fixed number of examples per class (4, 8, 16, or 32) and train a simple multi-class logistic regression model on top of the embeddings. We evaluate the resulting classifier by computing the area under the receiver operating characteristic curve (AUC-ROC), where values closer to 1 indicate a stronger ability to distinguish between classes. In effect, this process uses a given pre-trained embedding model to build a custom classifier from only a small number of labeled examples.
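The probe protocol can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Perch Hoplite implementation: synthetic 2-D points stand in for real model embeddings, the helper names (`sample_few_shot`, `train_logreg`, `binary_auc`, `macro_auc`) are hypothetical, the logistic regression is fitted with plain gradient descent rather than whatever solver the real pipeline uses, and multi-class AUC-ROC is assumed to be a macro average of one-vs-rest scores.

```python
import math
import random
from collections import defaultdict

# Hypothetical helper: pick up to k example indices per class from a labeled pool.
def sample_few_shot(labels, k, seed=0):
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    chosen = []
    for label in sorted(by_class):
        idx = by_class[label]
        chosen.extend(rng.sample(idx, min(k, len(idx))))
    return chosen

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def logits(W, b, x):
    return [sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_c for w, b_c in zip(W, b)]

# Multinomial logistic regression trained by full-batch gradient descent on the
# cross-entropy loss (an assumed stand-in for the real pipeline's solver).
def train_logreg(X, y, n_classes, lr=0.1, epochs=200):
    d = len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(X)
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, t in zip(X, y):
            p = softmax(logits(W, b, x))
            for c in range(n_classes):
                err = p[c] - (1.0 if c == t else 0.0)  # d(loss)/d(logit_c)
                gb[c] += err
                for j in range(d):
                    gW[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(d):
                W[c][j] -= lr * gW[c][j] / n
    return W, b

# AUC-ROC via its rank interpretation: the probability that a random positive
# scores higher than a random negative (ties count as half).
def binary_auc(scores, is_positive):
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        return float("nan")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else (0.5 if sp == sn else 0.0)
    return wins / (len(pos) * len(neg))

# Assumed aggregation: one-vs-rest AUC per class, macro-averaged.
def macro_auc(probs, labels, n_classes):
    aucs = [binary_auc([p[c] for p in probs], [y == c for y in labels])
            for c in range(n_classes)]
    valid = [a for a in aucs if not math.isnan(a)]
    return sum(valid) / len(valid)

# Toy usage: three synthetic clusters play the role of frozen embeddings.
rng = random.Random(42)
centers = [(0.0, 0.0), (3.0, 3.0), (0.0, 3.0)]
X, y = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(50):
        X.append([cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)])
        y.append(label)

results = {}
for k in (4, 8, 16, 32):
    train_idx = sample_few_shot(y, k, seed=k)
    train_set = set(train_idx)
    held_out = [i for i in range(len(y)) if i not in train_set]
    W, b = train_logreg([X[i] for i in train_idx], [y[i] for i in train_idx], len(centers))
    probs = [softmax(logits(W, b, X[i])) for i in held_out]
    results[k] = macro_auc(probs, [y[i] for i in held_out], len(centers))
    print(f"{k} examples/class: macro AUC-ROC = {results[k]:.3f}")
```

Because the classifier on top is linear and the embeddings stay frozen, the resulting AUC-ROC mostly reflects how well each embedding model already separates the classes, which is what makes this a fair comparison between models.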

Our results show that more examples per class improve performance for all models, except on the ReefSet data, where performance is high even with only four examples per class for every model except the multispecies whale model. Notably, Perch 2.0 is consistently either the top or second-best performing model for every dataset and sample size.