Antelop

A user-friendly software package for systems neuroscience data management

Rory Bedford

MRC LMB

Big data in neuroscience

  • New technologies such as Neuropixels allow us to record from increasingly large numbers of units
  • Continual shift towards more automated behavioural assays means longer and more numerous recordings
  • Scientists have to spend more time on data processing workflows and data management
  • Increasingly advanced skills (HPC, data engineering, etc.) are needed to undertake this work

Data engineering challenges

  • Custom file formats/project structures are hard to parse
  • Custom preprocessing/analysis scripts are very difficult to reproduce
  • Lab data storage not centralised
  • High entry barrier to existing tools like DataJoint and NWB, which makes their adoption difficult for many labs

Our solution: Antelop

  • Software package designed to make it easy to adopt best practices for data processing and storage
  • Simple pip install and straightforward graphical configuration
  • Extensive graphical user interface for all aspects of your data management and processing
  • MySQL database backend for centralised storage (illustrated in the sketch after this list)
  • Supports electrophysiology, calcium imaging, and behavioural data processing with HPC integration
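
As a rough illustration of the centralised-storage idea, below is a minimal sketch of what connecting to a shared MySQL server looks like when done by hand with DataJoint. Antelop handles this through its graphical configuration, and the hostname and credentials here are placeholders.

  # Minimal sketch: connecting to a centralised MySQL backend via DataJoint.
  # Hostname and credentials are placeholders, not Antelop defaults.
  import datajoint as dj

  dj.config['database.host'] = 'mysql.mylab.example.org'
  dj.config['database.user'] = 'experimenter'
  dj.config['database.password'] = '********'

  conn = dj.conn()          # open the connection
  print(conn.is_connected)  # True if the centralised database is reachable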

Our solution: Antelop

  • Integrates with existing tools, such as popular spikesorters, CaImAn, and DeepLabCut
    • Leverages the fact that preprocessing requirements for common acquisition methods are fairly uniform
  • Implements a range of data visualisation tools and metrics out of the box, including an analysis standard library
  • Supports the writing of custom analysis scripts, with direct integration with your lab’s GitHub and data-immutability checks for reproducibility
  • Has import/export functions for NWB and a range of acquisition systems (see the sketch after this list)
  • Has a strictly structured but accommodating database schema for analysis routines to utilise
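
For a sense of the NWB target format, here is a minimal file written with pynwb; the contents are illustrative and this is not Antelop's actual export code.

  # Minimal NWB file containing one sorted unit (illustrative values only).
  from datetime import datetime, timezone
  from pynwb import NWBFile, NWBHDF5IO

  nwbfile = NWBFile(
      session_description='example ephys session',
      identifier='session-001',
      session_start_time=datetime(2024, 1, 1, tzinfo=timezone.utc),
  )
  nwbfile.add_unit(spike_times=[0.01, 0.35, 0.82])

  with NWBHDF5IO('session-001.nwb', 'w') as io:
      io.write(nwbfile)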

Infrastructure

Data analysis potential

  • Antelop can be used to curate a large database of paired neural activity and environmental events for a domain of interest (e.g. the hippocampus and spatial navigation tasks)
  • This facilitates large-scale, multi-recording analyses
  • For example: neuroscience foundation models
    • Neural nets trained via self-supervised learning to model the relationship between activity and the environmental factor of interest
    • Can be fine-tuned for downstream tasks such as decoding a quantity of interest (toy sketch below)
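
A toy sketch of this idea in PyTorch, assuming binned spike counts, a masked-reconstruction pretraining objective, and a position-decoding head; the architecture, shapes, and data here are all illustrative rather than any specific published model.

  import torch
  import torch.nn as nn

  n_units, n_bins = 128, 100
  encoder = nn.Sequential(nn.Linear(n_units, 256), nn.ReLU(), nn.Linear(256, 64))
  decoder = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, n_units))

  # Self-supervised pretraining: reconstruct masked-out spike counts
  opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
  for _ in range(100):
      counts = torch.poisson(torch.rand(32, n_bins, n_units) * 5)  # placeholder data
      mask = (torch.rand_like(counts) > 0.2).float()               # hide 20% of entries
      recon = decoder(encoder(counts * mask))
      loss = ((recon - counts) ** 2 * (1 - mask)).mean()           # score hidden entries
      opt.zero_grad(); loss.backward(); opt.step()

  # Fine-tuning: decode, e.g., animal position from the frozen embedding
  head = nn.Linear(64, 2)
  opt = torch.optim.Adam(head.parameters(), lr=1e-3)
  for _ in range(100):
      counts = torch.poisson(torch.rand(32, n_bins, n_units) * 5)  # placeholder data
      position = torch.rand(32, n_bins, 2)                         # placeholder behaviour
      loss = ((head(encoder(counts).detach()) - position) ** 2).mean()
      opt.zero_grad(); loss.backward(); opt.step()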

How to represent a complex environment?

  • Environmental arrays are split into four different types
  • Each contains NumPy arrays with a strict structure, metadata, and a clock shared with the neural activity (see the sketch after this list)
  • All data belongs to the environment (which belongs to a recording session, experimenter, etc.)
  • Environments can have one or more subjects
  • Optionally, data can also belong to a subject
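
As an illustrative sketch, one such environmental array could be represented in Python as below; the field names are hypothetical and Antelop's actual schema may differ.

  import numpy as np

  environment_array = {
      'type': 'analog',                          # one of the four array types (label assumed)
      'timestamps': np.arange(0.0, 60.0, 0.01),  # seconds, on the clock shared with neural data
      'data': np.random.rand(6000, 2),           # e.g. (x, y) position of the subject
      'metadata': {'units': 'metres', 'sampling_rate_hz': 100},
      'subject': 'mouse-01',                     # optional link to a subject
  }

  # Because spike times live on the same clock, aligning neural activity to
  # environmental events is a simple array operation:
  spike_times = np.sort(np.random.uniform(0.0, 60.0, size=500))
  idx = np.clip(np.searchsorted(environment_array['timestamps'], spike_times), 0, 5999)
  position_at_spikes = environment_array['data'][idx]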

Publishing

  • Working on a preprint at present
  • We aim to publish by May this year
  • The Python package has been released but is still undergoing extensive testing
  • Documentation is available

Thank you