I have multiple sets of scientific data (European Data File files for EEG data) that I am reading in via MNE Python. Each EDF can be read in with a Numpy array that contains the time basis, say 0.00, 0.01, 0.02, 0.03... , and number of channels, each a Numpy array that represent measurements along that time base [-3.00, 4.40, 2.20, 3.40] for channel "FZ-CZ", [1.24, 1.34, 1.00, 3.11] for channel "CZ-PZ". These channels have text labels that are unique within a record, but can vary from record to record.
The non-ideal thought that I had would be to put these in a table in a vanilla Postgres DB, but the channel labels may not be consistent and this would be a lot of rows for a file on the order of minutes with dozens of channels. Perhaps having a "Channel" table where I can keep a running list of all of the channels for all of the files and avoid duplication, but again, this goes against my better judgement.
|PatientId|Channel|Time|Voltage| -------------------------------- | 1| FZ-CZ |0.00|-3.00 | | 1| FZ-CZ |0.01| 4.40 | | ... | ... |... | ... | | 1| CZ-PZ |0.00| 1.24 | | 1| CZ-PZ |0.01| 1.34 |
What I shudder at even more is the idea of having different tables for different records with a variable number of columns consisting of the voltage values of the channels.
I found an entire thesis (link directly downloads a PDF) on the topic of European Data Files, which seemed to encourage the use of storing these traces as binary data in the table. I don't mind having this as a secondary option, but I'd like to be able to query the data directly.
In searching the site, I found the following 3 questions helpful, and was more swayed to pursue a time series database solution. Design options for time series scientific data
From that bit of research, I decided to pursue something like QuestDB or Timescale or even Amazon Timestream, but it seems like all of these solutions are geared toward a.) Having exact timestamps for each of the datapoints (so like "01/01/2022 00:00"), and b.) incrementally adding data points to a table as measurements are being taken. Otherwise, these solutions seem ideal for storing this type of data.
Aside from turning the relative time points of my data into absolute time stamps, is there a way to get this to work with a time series database? If not, am I "stuck" storing these data as bytes/blobs or even something like JSON in a table or even just keeping the files in an S3 bucket and reading/writing them as needed?
As per timescale documentation here Timescale also supports integer values:
Hypertables are intended for time-series data, so your table needs a column that holds time values. This can be a timestamp, date, or integer.
External links referenced by this document:
Local articles referenced by this article: