Turbulence analysis (TURBAN) toolbox: Infrastructure
This manual assumes that you have read your data into python. For converting common file formats to python data, see the documentation of the individual instruments and platforms (such as MSS or MicroRider).
Exporting and importing data
TURBAN can export to and import from xarray datasets using its .to_xarray() and .from_xarray() methods available on Processing as well as Level1/2/3/4 objects.
TURBAN further defines convenience methods for importing ATOMIX files, which have been tested on a few of the available benchmark files. However, these tend to not follow a 100% strict format and so these methods may fail in untested cases.
The following would do a roundtrip in TURBAN:
from turban import ShearLevel3
l3 = ShearLevel3.from_atomix_netcdf("data/process/shear/MSS_Baltic.nc") # import from benchmark file
ds = l3.to_xarray() # TURBAN-compliant xarray dataset
l3_reimport = ShearLevel3.from_xarray(ds) # equal to l3
Shear processing
Molecular viscosity
Molecular viscosity can be set in two ways: Either by using a constant fallback value in the ShearConfig:
# Option 1
from turban import ShearConfig
cfg = ShearConfig(
molvisc_fallback=1.6e-6,
...
)
or by explicitly setting a molvisc auxiliary variable on the ShearLevel3 object from which Level 4 is derived. This can for instance be achieved by setting an auxiliary variable on Level 1 with the appropriate aggregation instructions:
# Option 2
import numpy as np
from turban import ShearLevel1
level1 = ShearLevel1.from_atomix_netcdf("data/process/shear/MSS_Baltic.nc")
molvisc_arr = np.linspace(1e-6, 2e-6, len(level1.time))
# specify aggregated name `molvisc`
level1.add_aux_data(molvisc_arr, "molvisc", "mean", "molvisc")
# level 3 will now contain a chunk-averaged variable called `molvisv`,
# which level 4 will then use
Sections, segments, and chunks
Nomenclature
Large parts of TURBAN operate on data in the form of timeseries $x(t)$ which are sampled at regular time steps $\Delta t=1/f_s$, yielding timeseries numbers $x_i$ where $i=1, 2,\dots N\in\mathbb{N}$. Any such timeseries must be split up at different levels to accommodate e.g. spectral analysis:
- Sections consist of (not necessarily, but usually) contiguous regions of $x_i$ to be analysed together, such as consecutive dives in the same file, or parts of the same cast that was interrupted by a snagged cable. They may be as long or short as necessary (but if they are shorter than the chunk length, they are effectively discarded from analysis). Sections are specified as integer array with a unique identifier for each section.
- Segments consist of a fixed-length piece of data to be analysed by some method, e.g., FFT, and are thus usually a power of $2$. These are specified by
segment_length. - Chunks consist of multiple consecutive segments in order to e.g. enhance the statistical reliability of a dissipation estimate. Inside each chunk
While sections in practice are mutually exclusive, both segments and chunks often overlap with the previous and next. These are specified in the code as follows:
| Context | Type | Variable name(s) in code | Meaning |
|---|---|---|---|
| Segment | int |
segment_length |
Number of samples per segment |
| Chunk | int |
segment_overlap |
Overlap (number of samples) of consecutive segments inside chunk |
| Chunk | int |
chunk_length |
Number of samples per individual chunk |
| Section | int |
chunk_overlap |
Overlap (number of samples) of consecutive chunks |
| Section | int array |
section_number |
Unique identifier per section, 0 means discard |
As example, let us consider a timeseries x of 13 samples. (We choose few samples in order to be able to write out arrays.) Of these, the first sample and the last two samples are to be discarded. This is achieved by
import numpy as np
x = np.random.rand(13)
section_number = np.zeros_like(x, dtype=int)
section_number[1:12] = 1
segment_length = 2
segment_overlap = 1
chunk_length = 4 # We can fit 3 half-overlapping segments in here
chunk_overlap = 1
Implementation
In order to use numpy's vectorized routines for time series analysis, we use the function get_chunking_index. It takes in the parameters section_number_or_data_len and arbitrarily many tuples of the form (length, overlap), and return an array of indices, in the following called idx.
When called with exactly the two tuples (chunk_length, chunk_overlap) and(segment_length, segment_overlap), then index idx (see sketch), when used to index an axis of length len(section_number_or_data_len) or section_number_or_data_len of any array, will trigger expansion of the time axis into three axes, that are, in turn:
1. The slow/reduced/aggregated time axis, i.e. counting chunks.
2. Inside each chunk, counting the number of segments.
3. Inside each segment, counting the samples attributed to each segment.
For instance, given a an array x whose last axis has length samples_len, we can calculate the FFT over all segments without any loop:
from turban.utils.util import get_chunking_index
idx = get_chunking_index(
len(x),
(chunk_length, chunk_overlap),
(segment_length, segment_overlap),
)
xr = x[..., idx] # reshape time axis. Now the last dimension contains the FFT segments
xr -= xr.mean(axis=-1)[..., np.newaxis] # subtract mean
xr *= np.hanning(segment_length)[np.newaxis, np.newaxis, :] # hanning window
Fx = np.fft.rfft(xr) # FFT(x). Now the last dimension contains the frequency axis
Similarly, one can easily calculate segment- or chunkwise variance, trends, or other. The same logic is also used for aggregating variables from "fast" to "slow" time.
Customising TURBAN
Logging
TURBAN operates with module-level loggers, one for each .py file with the respective name. Their loglevel can be set all at once or only pertaining to specific modules. The following sets the log level to debug for all code under turban/instruments/mss/:
from turban import set_turban_loglevel
set_turban_loglevel("WARNING") # default
set_turban_loglevel("DEBUG", "turban.instruments.mss")
Configuration objects
TURBAN operates with several kinds of high-level objects, each of which have their own settings:
1. Instruments, child classes of turban.instruments.generic.api.Instrument, are configured through instances of (child classes of) turban.instruments.generic.config.InstrumentConfig
2. Processing pipelines, defined in turban/process, e.g. ShearProcessing or one of its levels (e.g., ShearLevel1 through ShearLevel4), are configured using turban.process.generic.config.SegmentConfig
Both allow/require the user to make various parameter choices.
Changing algorithms
When a user needs to change an alorithm beyond a simple parameter, there are two options. 1. Of course, the source code may be modified. 2. Python allows "monkey patching" individual functions (i.e. replacing them with our own implementations), as in:
from turban.process.shear import level3
from numpy import ndarray, newaxis
from jaxtyping import Float
def my_apply_compensation_highpass(
x: Float[ndarray, "n_shear time_slow f"],
freq: Float[ndarray, "f"],
freq_highpass: float,
) -> Float[ndarray, "f"]:
correction_factor = np.ones_like(freq) # No highpass frequency compensation
x *= correction_factor[newaxis, :]
return correction_factor
# with the following, TURBAN would now use the user-defined function instead
# level3.apply_compensation_highpass = my_apply_compensation_highpass