AI Training Progress

Dataset Training Roadmap

Track our progress on training the OceanLens AI with new marine life datasets. We're building this gradually to ensure the highest identification accuracy.

Behind the Identification Engine

How Our AI Thinks

Identifying blackwater larvae and midnight plankton isn't just a database lookup problem — it requires judgment. Here's how we built that judgment through three iterations.

V1 · The Foundation

The Strict Database

Our first approach was simple: the AI searched its database for the closest visual match and returned the answer. Fast and consistent — until the creature didn't exist in the database. Faced with the unknown, the system would confidently map it to the nearest wrong answer. In the vast, underdocumented world of blackwater diving, this was a recurring problem.

Hallucination by omission — confidently wrong when the species was absent from training data.
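The V1 failure mode described above can be reproduced with a toy nearest-match lookup. This is a minimal sketch, not the OceanLens implementation: the two-dimensional feature vectors, the `REFERENCE` table, and the `nearest_match` function are all hypothetical stand-ins chosen for illustration.

```python
import math

# Hypothetical 2-D "visual feature" vectors standing in for a reference database.
REFERENCE = {
    "Scyphozoa": (0.9, 0.1),
    "Pteropoda": (0.2, 0.8),
    "Polychaeta": (0.5, 0.5),
}


def nearest_match(features):
    """Return (label, distance) for the closest reference entry.

    There is no rejection step, so some label is always returned,
    however far away the query is from everything in the database.
    """
    return min(
        ((label, math.dist(features, ref)) for label, ref in REFERENCE.items()),
        key=lambda pair: pair[1],
    )


# A query close to a known entry gets a sensible answer.
known = nearest_match((0.88, 0.12))

# A query unlike anything in the database still gets a label.
unknown = nearest_match((5.0, 5.0))

print(known)
print(unknown)
```

The second query sits far from every reference entry, yet the lookup still names a taxon. Nothing in the return value tells the caller the match is meaningless unless the distance is inspected separately.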
V2 · The Overcorrection

The Vision-First AI

We gave the AI permission to override the database whenever its visual analysis disagreed. This helped with unknown larvae — but created a new failure mode we called 'academic aggression.' When the database correctly identified something as Mollusca based on multiple verified references, the AI would spot a translucent wing-like structure and argue it was a ribbon worm instead. Opinionated when it should have been humble.

Overconfident visual reasoning that overruled solid database evidence.
V3 · Current System

The Logic Tree

The current system doesn't choose between the database and visual reasoning — it defines exactly when each should win. Three rules govern every identification decision:

🔒
Exact Match Rule: When the database's confidence at the Phylum or Class level exceeds the set threshold, the AI cannot override it. Core morphology is settled by data.
🐛
Larval Exception: The AI may add life-stage context the database lacks. For example, if the database says Mollusca, the AI may clarify that the specimen is at the Veliger larval stage.
🛑
Veto Power: When the database is uncertain and visual anatomy clearly contradicts it, the AI must answer 'Unknown.' An honest unknown is more valuable than a confident wrong answer.
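The three rules above can be sketched as a single decision function. This is an illustrative sketch under stated assumptions, not the production logic: the `CONFIDENCE_THRESHOLD` value, the `DatabaseMatch` and `VisualAnalysis` types, and the `identify` function are all hypothetical. The source also does not say what happens when the database is uncertain but the visual analysis does not contradict it; here that case is assumed to return the database taxon marked as tentative.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration only; the real cutoff is not stated.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class DatabaseMatch:
    taxon: str         # Phylum- or Class-level label, e.g. "Mollusca"
    confidence: float  # 0.0 to 1.0


@dataclass
class VisualAnalysis:
    contradicts_database: bool        # anatomy clearly disagrees with the taxon
    life_stage: Optional[str] = None  # e.g. "Veliger larval stage"


def identify(db: DatabaseMatch, visual: VisualAnalysis) -> str:
    # Exact Match Rule: above the threshold, the database taxon is locked.
    locked = db.confidence > CONFIDENCE_THRESHOLD

    # Veto Power: uncertain database plus a clear visual contradiction.
    if not locked and visual.contradicts_database:
        return "Unknown"

    # Larval Exception: life-stage context is added, never substituted.
    label = db.taxon
    if visual.life_stage:
        label = f"{label}, {visual.life_stage}"

    # Assumption: an uncertain but uncontradicted match is flagged as tentative.
    if not locked:
        label += " (tentative)"
    return label


print(identify(DatabaseMatch("Mollusca", 0.97), VisualAnalysis(True)))
print(identify(DatabaseMatch("Mollusca", 0.97), VisualAnalysis(False, "Veliger larval stage")))
print(identify(DatabaseMatch("Nemertea", 0.40), VisualAnalysis(True)))
```

The first call shows the V2 problem being blocked: the visual analysis disagrees, but a confident database match stands. The third call shows the veto: neither source is trusted, so the answer is 'Unknown.'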

Ready to test the logic tree?

Upload a photo and watch V3 make its call — rules and all.

Try the AI Now

AI Database Training Status

🐠

Reef & Pelagic

Refining Data

General identification is live, but we are actively curating base data to reduce noise and achieve expert-level accuracy.

Data Quality: Fair / Needs Curation
High Volume · Medium Precision
👽

Blackwater & Plankton

Alpha Training

The AI is still learning. We need more photo contributions to improve accuracy.

Seed Data: 531 · Goal: 2,000
01
Phase 1 · Active

Foundational Taxonomy & Blackwater Focus

We are curating a foundational dataset of 100,000+ images. By learning from all clear underwater environments (reef, muck, pelagic), the AI builds a strong baseline. Our ultimate goal and specialized focus, however, remain mastering Blackwater and Pelagic identification, where traditional data is scarce.

Cnidaria · Ctenophora
  • Scyphozoa (True Jellyfish): Imported
  • Hydrozoa (Hydromedusae / Siphonophores): Imported
  • Cubozoa (Box Jellies): Imported
  • Ctenophora (Comb Jellies): Imported
Mollusca
  • Nudibranchia (Pelagic / Sea Slugs): Imported
  • Cephalopoda (Octopus / Squid): Imported
  • Pteropoda (Sea Butterflies): Imported
Arthropoda · Annelida
  • Amphipoda: Imported
  • Stomatopoda (Mantis Shrimp Larvae): Imported
  • Polychaeta (Bristle Worms): Imported
Chordata
  • Thaliacea (Salps / Doliolids): Imported
Micro-organisms
  • Radiolaria: Imported
  • Chaetognatha (Arrow Worms): Not Found
  • Foraminifera: Imported
02
Phase 2 · Planned

Large-Scale Taxa

These groups are large enough to require automated pre-screening pipelines before manual QC. Combined, they represent 380,000+ records, too many for the Phase 1 approach.

  • Larval Actinopterygii: Queued
  • Pelagic Decapoda: Queued
  • Isopoda: Queued
03
Phase 3 · Future

Dual Mode: Reef & Daylight

Expanding the database to support reef fish and benthic marine life for daytime photography.

  • Reef Fishes: Queued
  • Benthic Creatures: Queued
  • UI Switch (Blackwater / Reef Mode): Queued

What's Next

Beyond the Database

Two capabilities that will define how the world's diving and research communities interact with ocean intelligence.

Coming Soon
For Researchers & Developers

Marine Vision API

The identification engine that powers OceanLens, soon open to the world. Batch-process thousands of field specimens, integrate deep-sea intelligence into your research platform, or build the next tool for ocean science.

  • Batch-process entire collections of lab or field photos in a single API call
  • Built for marine biologists, research NGOs, and developers shipping biodiversity tools
Coming Soon
For the Diving Community

Rare Species Radar

Every confirmed sighting becomes a live data point on a global map. Watch rare larvae and midnight plankton appear in real time — and see where the ocean's most elusive life chooses to surface.

  • Anonymous heatmap of rare larval stages and pelagic sightings from divers worldwide
  • Turns every dive into a contribution to global biodiversity science
OceanLens · Marine Intelligence Platform