3. Design Decisions#
This page documents key architectural decisions in roiextractors, including the rationale behind them. It is intended for future developers so they know why things are the way they are.
3.1. Timestamps#
Roiextractors is tightly integrated with NeuroConv, which writes data to NWB. When NeuroConv decides whether to store timestamps or just a sampling rate in NWB, the timestamp API must satisfy three constraints:
Respect user intentions. If the user (or upstream code) has explicitly set timestamps via
set_times(), those must take priority. The user may have computed corrected timing from TTL pulses or applied temporal alignment shifts, and that decision should be preserved.Be faithful to the source format. If the data format contains hardware timestamps recorded by the acquisition system and the user has not used
set_times(), those timestamps should be written to NWB.Avoid unnecessary allocation. If neither user-set nor native timestamps exist, timing can only be reconstructed from the sampling rate. The API should let NeuroConv detect this case without allocating a full
np.arange(n) / sampling_frequencyarray.
These three constraints create a tension. A single method like has_time_vector() that only
checks whether _times has been cached cannot distinguish between “no timestamps were set”
and “native timestamps exist but have not been loaded yet.” Eagerly loading native timestamps at
initialization would resolve this ambiguity, but it would increase the memory footprint of every
extractor regardless of whether those timestamps are ever needed. To address this, the current
solution separates the question of provenance (does the source format have timestamps?) from the act
of materializing them.
3.1.1. Separation of get_native_timestamps and get_timestamps#
To satisfy these constraints, the timestamp API separates concerns into distinct methods:
get_timestamps()provides a consistent API that always returns timestamps regardless of the state of the extractor. Its contract is simple: call it, get anumpy.ndarrayof times in seconds. Users never need to worry about whether the format has native timestamps, whether someone calledset_times(), or whether timing must be reconstructed from the sampling rate.get_native_timestamps()is the method that format readers override. It answers a provenance question: “Does this format have hardware timestamps?” If yes, it returns them; if not, the default implementation returnsNone. ReturningNonerather than raising an exception lets NeuroConv distinguish “no native timestamps” from “timestamps exist but failed to load”. This addresses constraint 2.
With this separation, downstream code can resolve all three constraints through a simple priority chain:
What timing information is available?
|
|-- has_time_vector()?
| |
| yes --> user-set timestamps (constraint 1: respect user intentions)
|
|-- get_native_timestamps() is not None?
| |
| yes --> hardware timestamps from the source format (constraint 2: faithful to source)
|
'-- sampling rate only (constraint 3: no unnecessary allocation)
has_time_vector() is a cheap check (it tests whether _times has been cached).
get_native_timestamps() is only called when the first check fails, and it returns None
immediately for formats without hardware timestamps, so no array is allocated in that case.
When native timestamps do exist, loading them is unavoidable: any consumer that needs to decide
between writing a rate or a full timestamp array must inspect the actual values.
Note
The name get_native_timestamps was chosen over get_original_timestamps because the
latter already exists in NeuroConv with different semantics: there it returns timestamps as
they were at initialization, before any user modifications or temporal alignment shifts,
rather than indicating provenance. Using a distinct name avoids this conflict.
The original discussion is in issue #448 and PR #465.
3.1.2. Why get_native_timestamps is abstract#
get_native_timestamps() is marked @abstractmethod on both ImagingExtractor and
SegmentationExtractor, even though the method body provides a default return None. This
is intentional: it forces every concrete extractor to explicitly implement the method and decide
whether its format has hardware timestamps.
Without the @abstractmethod decorator, a developer writing a new extractor could easily
overlook get_native_timestamps entirely. Their extractor would silently inherit the default
return None, and if the format did contain hardware timestamps, no one would notice until
a user found incorrect timing in their NWB file. Since timestamp provenance directly determines
what NeuroConv writes to NWB (a sampling rate vs. a full timestamps array), a silent omission
here is a data-correctness bug.
The cost of the decorator is small: extractors without native timestamps write a two-line
override that returns None. The benefit is that every extractor author is required to
consciously opt out rather than accidentally inheriting a default they may not know exists.
3.1.3. Copy vs. view semantics for sliced timestamps#
Beyond deciding which timestamps to store, slicing introduces a question about how timestamps are shared in memory between parent and child extractors.
When an extractor is temporally sliced (e.g., via slice_samples()), there are two constraints
in tension:
We do not want to copy the full timestamps array into the child extractor. That would duplicate memory unnecessarily.
But the child must be independent from the parent. Calling
set_times()on the child should not affect the parent, and vice versa.
This is an instance of the general aliasing problem: shared mutable state risks unintended
side effects. In NumPy, this manifests as the view vs. copy distinction. A view
(parent._times[start:end]) is memory-efficient but shares the underlying data: in-place
mutations on one side propagate to the other. A copy is safe but duplicates memory.
The current implementation uses a view, so it is memory-efficient but also has the benefits of a copy:
self._times = self._parent_imaging._times[start_sample:end_sample]
This works because of an internal contract that roiextractors must uphold: _times is only
ever replaced wholesale (via set_times()), never modified element-by-element. Consider both
directions:
Parent calls ``set_times()``: the parent rebinds its
_timesto a new array. The child’s view still references the original array, so it is unaffected.Child calls ``set_times()``: the child rebinds its
_timesto a new array, abandoning the view. The parent’s array is unaffected.
In both cases, the replacement (rather than mutation) of _times means the view is either
preserved or abandoned, never partially modified. The result is the memory efficiency of views
with the isolation of copies.
See PR #498 for the original discussion.