Design Decisions ================ This page documents key architectural decisions in roiextractors, including the rationale behind them. It is intended for future developers so they know why things are the way they are. Timestamps ---------- Roiextractors is tightly integrated with `NeuroConv `_, which writes data to NWB. When NeuroConv decides whether to store timestamps or just a sampling rate in NWB, the timestamp API must satisfy three constraints: 1. **Respect user intentions.** If the user (or upstream code) has explicitly set timestamps via ``set_times()``, those must take priority. The user may have computed corrected timing from TTL pulses or applied temporal alignment shifts, and that decision should be preserved. 2. **Be faithful to the source format.** If the data format contains hardware timestamps recorded by the acquisition system and the user has not used ``set_times()``, those timestamps should be written to NWB. 3. **Avoid unnecessary allocation.** If neither user-set nor native timestamps exist, timing can only be reconstructed from the sampling rate. The API should let NeuroConv detect this case without allocating a full ``np.arange(n) / sampling_frequency`` array. These three constraints create a tension. A single method like ``has_time_vector()`` that only checks whether ``_times`` has been cached cannot distinguish between "no timestamps were set" and "native timestamps exist but have not been loaded yet." Eagerly loading native timestamps at initialization would resolve this ambiguity, but it would increase the memory footprint of every extractor regardless of whether those timestamps are ever needed. To address this, the current solution separates the question of *provenance* (does the source format have timestamps?) from the act of *materializing* them. Separation of ``get_native_timestamps`` and ``get_timestamps`` ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ To satisfy these constraints, the timestamp API separates concerns into distinct methods: - ``get_timestamps()`` provides a consistent API that always returns timestamps regardless of the state of the extractor. Its contract is simple: call it, get a ``numpy.ndarray`` of times in seconds. Users never need to worry about whether the format has native timestamps, whether someone called ``set_times()``, or whether timing must be reconstructed from the sampling rate. - ``get_native_timestamps()`` is the method that format readers override. It answers a **provenance** question: *"Does this format have hardware timestamps?"* If yes, it returns them; if not, the default implementation returns ``None``. Returning ``None`` rather than raising an exception lets NeuroConv distinguish "no native timestamps" from "timestamps exist but failed to load". This addresses constraint 2. With this separation, downstream code can resolve all three constraints through a simple priority chain: .. code-block:: text What timing information is available? | |-- has_time_vector()? | | | yes --> user-set timestamps (constraint 1: respect user intentions) | |-- get_native_timestamps() is not None? | | | yes --> hardware timestamps from the source format (constraint 2: faithful to source) | '-- sampling rate only (constraint 3: no unnecessary allocation) ``has_time_vector()`` is a cheap check (it tests whether ``_times`` has been cached). ``get_native_timestamps()`` is only called when the first check fails, and it returns ``None`` immediately for formats without hardware timestamps, so no array is allocated in that case. When native timestamps do exist, loading them is unavoidable: any consumer that needs to decide between writing a rate or a full timestamp array must inspect the actual values. .. note:: The name ``get_native_timestamps`` was chosen over ``get_original_timestamps`` because the latter already exists in NeuroConv with different semantics: there it returns timestamps as they were at initialization, before any user modifications or temporal alignment shifts, rather than indicating provenance. Using a distinct name avoids this conflict. The original discussion is in `issue #448 `_ and `PR #465 `_. Why ``get_native_timestamps`` is abstract ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``get_native_timestamps()`` is marked ``@abstractmethod`` on both ``ImagingExtractor`` and ``SegmentationExtractor``, even though the method body provides a default ``return None``. This is intentional: it forces every concrete extractor to explicitly implement the method and decide whether its format has hardware timestamps. Without the ``@abstractmethod`` decorator, a developer writing a new extractor could easily overlook ``get_native_timestamps`` entirely. Their extractor would silently inherit the default ``return None``, and if the format *did* contain hardware timestamps, no one would notice until a user found incorrect timing in their NWB file. Since timestamp provenance directly determines what NeuroConv writes to NWB (a sampling rate vs. a full timestamps array), a silent omission here is a data-correctness bug. The cost of the decorator is small: extractors without native timestamps write a two-line override that returns ``None``. The benefit is that every extractor author is required to consciously opt out rather than accidentally inheriting a default they may not know exists. Copy vs. view semantics for sliced timestamps ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Beyond deciding *which* timestamps to store, slicing introduces a question about *how* timestamps are shared in memory between parent and child extractors. When an extractor is temporally sliced (e.g., via ``slice_samples()``), there are two constraints in tension: - We do not want to **copy** the full timestamps array into the child extractor. That would duplicate memory unnecessarily. - But the child must be **independent** from the parent. Calling ``set_times()`` on the child should not affect the parent, and vice versa. This is an instance of the general `aliasing problem `_: shared mutable state risks unintended side effects. In NumPy, this manifests as the view vs. copy distinction. A **view** (``parent._times[start:end]``) is memory-efficient but shares the underlying data: in-place mutations on one side propagate to the other. A **copy** is safe but duplicates memory. The current implementation uses a view, so it is memory-efficient but also has the benefits of a copy: .. code-block:: python self._times = self._parent_imaging._times[start_sample:end_sample] This works because of an internal contract that roiextractors must uphold: ``_times`` is only ever replaced wholesale (via ``set_times()``), never modified element-by-element. Consider both directions: - **Parent calls ``set_times()``**: the parent rebinds its ``_times`` to a new array. The child's view still references the original array, so it is unaffected. - **Child calls ``set_times()``**: the child rebinds its ``_times`` to a new array, abandoning the view. The parent's array is unaffected. In both cases, the replacement (rather than mutation) of ``_times`` means the view is either preserved or abandoned, never partially modified. The result is the memory efficiency of views with the isolation of copies. See `PR #498 `_ for the original discussion.