The world of Big Data in the Earth Sciences has so far existed primarily in disciplines that generate massive volumes of observational or computed data using large-scale, shared instrumentation such as global sensor networks, satellites, or high-performance computing facilities. These data are typically highly standardized and are managed and curated by well-supported community data facilities. In many other geoscience domains, especially those where data are acquired primarily by individual investigators or small teams (so-called ‘long-tail’ science communities), data are poorly shared and integrated. These communities lack a community-based data infrastructure that ensures persistent access, quality control, standardization, and integration of data. They also lack appropriate tools to fully explore and mine the data within the context of broader Earth Science datasets. In this presentation I will offer some insights from my long-term work with data systems in geochemistry, describing technical and cultural achievements, challenges, and opportunities to advance data science in a long-tail geoscience domain.