Dataset Training Roadmap
Track our progress on training the OceanLens AI with new marine life datasets. We're building this gradually to ensure the highest identification accuracy.
Behind the Identification Engine
How Our AI Thinks
Identifying blackwater larvae and midnight plankton isn't just a database lookup problem — it requires judgment. Here's how we built that judgment through three iterations.
The Strict Database
Our first approach was simple: the AI searched its database for the closest visual match and returned the answer. Fast and consistent — until the creature didn't exist in the database. Faced with the unknown, the system would confidently map it to the nearest wrong answer. In the vast, underdocumented world of blackwater diving, this was a recurring problem.
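The failure mode is easier to see in a small example. The minimal Python sketch below assumes images have already been converted into embedding vectors. The three reference labels and their 3-dimensional vectors are hypothetical stand-ins, not OceanLens data. The lookup only asks "which reference is closest?", so an unfamiliar creature still gets a label.

```python
import math

# Hypothetical reference embeddings. A real system would derive these from an
# image model; these hand-made 3-d vectors are for illustration only.
REFERENCE = {
    "Pteropod (Mollusca)": [0.9, 0.1, 0.0],
    "Phyllosoma larva (Crustacea)": [0.1, 0.9, 0.1],
    "Leptocephalus larva (Actinopterygii)": [0.0, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def strict_lookup(query):
    """Strict-database behavior: return the closest reference label and its
    similarity. There is no 'unknown' outcome, so every query gets a label."""
    best = max(REFERENCE, key=lambda label: cosine(query, REFERENCE[label]))
    return best, cosine(query, REFERENCE[best])


# A close match to a known reference is identified correctly...
print(strict_lookup([0.88, 0.12, 0.02]))
# ...but an embedding unlike anything in the database is still forced onto
# the nearest entry.
print(strict_lookup([0.5, 0.5, 0.5]))
```

The unfamiliar query matches its best reference at a similarity of only about 0.70. Even so, `strict_lookup` returns a single confident label, because nothing in this design can answer "not in the database."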
The Vision-First AI
We gave the AI permission to override the database whenever its visual analysis disagreed. This helped with unknown larvae — but created a new failure mode we called 'academic aggression.' When the database correctly identified something as Mollusca based on multiple verified references, the AI would spot a translucent wing-like structure and argue it was a ribbon worm instead. Opinionated when it should have been humble.
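The trade-off between the first two versions can be modeled as two fixed-priority policies. The sketch below scores both against two hypothetical cases based on the failure modes described above. The `Case` fields, labels, and reference counts are illustrative assumptions, not OceanLens internals.

```python
from typing import NamedTuple


class Case(NamedTuple):
    name: str
    db_label: str          # label from the database's nearest match
    db_verified_refs: int  # verified references backing the database label
    vision_label: str      # label from visual analysis
    truth: str             # correct answer, known only in hindsight


# Hypothetical cases modeled on the two failure modes described above.
CASES = [
    Case("undocumented larva", "Phyllosoma larva (Crustacea)", 0,
         "Unidentified larva", "Unidentified larva"),
    Case("pteropod", "Mollusca", 4, "Nemertea (ribbon worm)", "Mollusca"),
]


def strict_database(c: Case) -> str:
    # Strict Database: the database answer always wins.
    return c.db_label


def vision_first(c: Case) -> str:
    # Vision-First: vision overrides the database whenever they disagree.
    # db_verified_refs is never consulted, so in effect vision always wins.
    return c.vision_label if c.vision_label != c.db_label else c.db_label


def score(policy) -> dict:
    """Map each case name to whether the policy got it right."""
    return {c.name: policy(c) == c.truth for c in CASES}


for policy in (strict_database, vision_first):
    print(policy.__name__, score(policy))
```

Each policy gets exactly one case right. "Override whenever vision disagrees" also reduces to "vision always wins." The database's verified-reference count never enters the decision, which is the humility the Vision-First AI lacked.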
The Logic Tree
The current system doesn't choose between the database and visual reasoning — it defines exactly when each should win. Three rules govern every identification decision:
AI Database Training Status
Reef & Pelagic
General identification is live, but we are actively curating base data to reduce noise and achieve expert-level accuracy.
Blackwater & Plankton
The AI is learning! We need more photo contributions to improve its accuracy.
Foundational Taxonomy & Blackwater Focus
We are curating a foundational dataset of more than 100,000 images. Learning from a broad range of underwater environments (reef, muck, pelagic) gives the AI a strong baseline. Our ultimate goal and specialized focus, however, remains mastering blackwater and pelagic identification, where traditional reference data is scarce.
Large-Scale Taxa
These massive groups require automated pre-screening pipelines before manual quality control (QC). Combined, they represent more than 380,000 records, which is too many for the Phase 1 approach.
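A pre-screen like this lets reviewers spend their time only on records that could plausibly pass QC. Below is a minimal Python sketch of such a triage step. The `Record` fields and the three rejection rules (missing image, shortest side under 512 px, duplicate content hash) are assumptions chosen for illustration, not the actual OceanLens pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Record:
    record_id: str
    image_url: Optional[str]  # None when the record has no photo attached
    width: int                # image width in pixels
    height: int               # image height in pixels
    sha1: str                 # content hash of the image file


def prescreen(records: List[Record], min_side: int = 512
              ) -> Tuple[List[Record], List[Tuple[str, str]]]:
    """Hypothetical automated pre-screen. Returns the records to queue for
    manual QC and (record_id, reason) pairs for those rejected automatically."""
    seen_hashes = set()
    manual_qc, rejected = [], []
    for r in records:
        if r.image_url is None:
            rejected.append((r.record_id, "no image"))
        elif min(r.width, r.height) < min_side:
            rejected.append((r.record_id, "too small"))
        elif r.sha1 in seen_hashes:
            rejected.append((r.record_id, "duplicate"))
        else:
            seen_hashes.add(r.sha1)
            manual_qc.append(r)
    return manual_qc, rejected


records = [
    Record("a", "https://example.org/a.jpg", 1024, 768, "h1"),
    Record("b", None, 0, 0, ""),
    Record("c", "https://example.org/c.jpg", 320, 240, "h2"),
    Record("d", "https://example.org/d.jpg", 1024, 768, "h1"),
]
queue, rejected = prescreen(records)
print([r.record_id for r in queue], rejected)
```

Only record `a` reaches the manual queue. The other three are rejected automatically, each with a stated reason, so the automated decisions remain auditable.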
Dual Mode: Reef & Daylight
Expanding the database to support reef fish and benthic marine life for daytime photography.
What's Next
Beyond the Database
Two capabilities that will define how the world's diving and research communities interact with ocean intelligence.
Marine Vision API
The identification engine that powers OceanLens — now open to the world. Batch-process thousands of field specimens, integrate deep-sea intelligence into your research platform, or build the next tool for ocean science.
- Batch-process entire collections of lab or field photos in a single API call
- Built for marine biologists, research NGOs, and developers shipping biodiversity tools
Rare Species Radar
Every confirmed sighting becomes a live data point on a global map. Watch rare larvae and midnight plankton appear in real time — and see where the ocean's most elusive life chooses to surface.
- Anonymous heatmap of rare larval stages and pelagic sightings from divers worldwide
- Turns every dive into a contribution to global biodiversity science