Logo


Welcome to TSSEARCH documentation!

Time Series Subsequence Search Python package (TSSEARCH for short) is a Python package that assists researchers in exploratory analysis for query search and time series segmentation without requiring significant programming effort. It contains curated routines for query and subsequence search. TSSEARCH installation is straightforward and goes along with startup code examples. Our goal is to provide the tools to get faster insights for your time series.

Highlights

  • Search: we provide methods for time series query search and segmentation
  • Weights: the relative contribution of each point of the query to the overall distance can be expressed using a user-defined weight vector
  • Visualization: we provide visualizations to present the results of the segmentation and query search
  • Unit tested: we provide unit tests for each distance
  • Easily extended: adding new distances is easy, and we encourage you to contribute with your custom distances or search methods

Contents

In development

Lockstep Distances

Distance measures that compare the i-th point of one time series to the i-th point of another are named as lock-step measures (e.g., Euclidean distance and the other Lp norms). A linear interpolation is computed to the longer series to ensure it has the same length as the shorter series.

Elastic Distances

Distance measures that perform a non-linear mapping to align the time series and allow flexible comparison of one-to-many or one-to-none points (e.g., Dynamic Time Warping, Longest Common Subsequence). These measures produce elastic adjustment to compensate for potential localized misalignment.

Dynamic Time Warping (DTW)

The DTW algorithm computes the stretch of the time axis which optimally maps between two time series. It measures the remaining cumulative distance after the alignment and the pairwise correspondence between each sample.

import numpy as np
import matplotlib.pyplot as plt
from tssearch.search.query_search import time_series_search
from tssearch.utils.visualisation import plot_alignment

# generates signals
freq = 2
amp = 2
time = np.linspace(0, 2, 100)
ts1 = np.concatenate([amp * np.sin(np.pi * time), np.zeros(100), amp * np.sin(np.pi * time), np.zeros(10)])
ts2 = np.concatenate([np.zeros(10), amp * np.sin(np.pi * time), np.zeros(150), amp * np.sin(np.pi * time), np.zeros(5)])

dict_distances = {
    "elastic": {"Dynamic Time Warping": {
        "multivariate": "yes",
        "description": "",
        "function": "dtw",
        "parameters": {"dtw_type": "dtw", "alpha": 1},
        "use": "yes"}
    }
}

result = time_series_search(dict_distances, ts1, ts2, output=("number", 1))

plt.figure()
plt.title("Dynamic Time Warping")
plot_alignment(ts1, ts2, result["Dynamic Time Warping"]["path"][0])
plt.legend(fontsize=17, loc="lower right")
An example of DTW.

Longest Common Subsequence (LCSS)

The Longest Common Subsequence (LCSS) measures the similarity between two time series whose lengths might be different. Since it is formulated based on edit distances, gaps or unmatched regions are permitted and they are penalized with a value proportional to their length. It can be useful to identify similarities between time series whose lengths differ greatly or have noise [1].

In the example below, we compute the LCSS alignment between two time series, one of them with added noise.

import numpy as np
import matplotlib.pyplot as plt
from tssearch.search.query_search import time_series_search
from tssearch.utils.visualisation import plot_alignment

ts1 = np.sin(np.arange(0, 4*np.pi, 0.1))
noise = np.random.normal(0, 0.1, ts1.shape)
ts2 = 1 + np.sin(np.arange(0, 4*np.pi, 0.1) + 2) + noise

ts1 = ts1.reshape(-1, 1)
ts2 = ts2.reshape(-1, 1)

dict_distances = {
    "elastic": {"Longest Common Subsequence": {
        "multivariate": "yes",
        "description": "",
        "function": "lcss",
        "parameters": {"eps": 1, "report": "distance"},
        "use": "yes"}
    }
}

result = time_series_search(dict_distances, ts1, ts2, output=("number", 1))

plt.figure()
plt.title("Longest Common Subsequence")
plot_alignment(ts1, ts2, result["Longest Common Subsequence"]["path"][0])
An example of LCSS.

Time Warp Edit Distance (TWED)

Time warp edit distance (TWED) uses sequences’ samples indexes/timestamps difference to linearly penalize the matching of samples for which indexes/timestamps values are too far and to favor the matching samples for which indexes/timestamps values are closed. Contrarily to other elastic measures, TWED entails a time shift tolerance controlled by the stiffness parameter of the measure. Moreover, it involves a second parameter defining a constant penalty for insert or delete operations. If stiffness > 0, TWED is a distance (i.e., verifies the triangle inequality) in both space and time [2].

TWED has been used in time series classification assessing classification performance while varying TWED input parameters [2], [3]. In the example, we calculate TWED between two time series varying its parameters.

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from tssearch.distances.compute_distance import time_series_distance

# generates signals
freq = 2
amp = 2
time = np.linspace(0, 2, 1000)
ts1 = amp * np.sin(2 * np.pi * freq * time)
ts2 = amp * np.sin(6 * np.pi * freq * time)[::50]

# visualize original and downsampled sequence
plt.figure()
plt.plot(time, ts1, color=sns.color_palette("Greens")[2], label="Time series 1", lw=3.)
plt.plot(time[::50], ts2, color=sns.color_palette("Greens")[5], label="Time series 2",  lw=3.)
plt.ylabel('Space')
plt.xlabel('Time')
plt.legend(fontsize=17, loc="lower right")

stiffness = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1]
penalty = [0, .25, .5, .75, 1.0]

distance = list()
for s in stiffness:
    for p in penalty:
        # calculate distances
        dict_distances = {
                "elastic": {"Time Warp Edit Distance": {"multivariate": "no",
                                                        "description": "",
                                                        "function": "twed",
                                                        "parameters": {"nu": s, "lmbda": p, "p": 2, "time": "true"},
                                                        "use": "yes"}}}

        distance.append({'stiffness': s,
                         'penalty': p,
                         'distance': time_series_distance(dict_distances,
                                                          ts1, ts2,
                                                          time, time[::50]).values[0][0]})

df = pd.DataFrame(distance)
df_pivot = df.pivot("stiffness", "penalty", "distance")

plt.figure()
sns.heatmap(df_pivot, annot=True, cbar_kws={'label': "TWED"}, cmap="viridis")
Two example series Resulting TWED distances
[1]
  1. Vlachos, G. Kollios and D. Gunopulos, “Discovering similar multidimensional trajectories,” Proceedings 18th International Conference on Data Engineering, 2002, pp. 673-684, doi: 10.1109/ICDE.2002.994784.
[2](1, 2)
  1. Marteau, “Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 306-318, Feb. 2009, doi: 10.1109/TPAMI.2008.76.
[3]Joan Serrà, Josep Ll. Arcos, An empirical evaluation of similarity measures for time series classification, Knowledge-Based Systems, Volume 67, 2014, Pages 305-314, ISSN 0950-7051, https://doi.org/10.1016/j.knosys.2014.04.035.

Module Reference

Search

segmentation

tssearch.search.segmentation.time_series_segmentation(dict_distances, query, sequence, tq=None, ts=None, weight=None)[source]

Time series segmentation locates the time instants between consecutive query repetitions on a more extended and repetitive sequence.

Parameters:
  • dict_distances (dict) – Configuration file with distances
  • query (nd-array) – Query time series.
  • sequence (nd-array) – Sequence time series.
  • tq (nd-array) – Time stamp time series query.
  • ts (nd-array) – Time stamp time series sequence.
  • weight (nd-array (Default: None)) – query weight values
Returns:

segment_results – Segmented time instants for each given distances

Return type:

dict

utils

Query search for elastic measures

Parameters:
  • dict_distances (dict) – Configuration file with distances
  • query (nd-array) – Query time series.
  • sequence (nd-array) – Sequence time series.
  • tq (nd-array) – Time stamp time series query.
  • ts (nd-array) – Time stamp time series sequence.
  • weight (nd-array (Default: None)) – query weight values
Returns:

  • distance (nd-array) – distance value between query and sequence
  • ac (nd-array) – accumulated cost matrix

Query search for lockstep measures

Parameters:
  • dict_distances (dict) – Configuration file with distances
  • query (nd-array) – Query time series.
  • sequence (nd-array) – Sequence time series.
  • weight (nd-array (Default: None)) – query weight values
Returns:

res – distance value between query and sequence

Return type:

nd-array

tssearch.search.search_utils.start_sequences_index(distance, output=('number', 1), overlap=1.0)[source]

Method to retrieve the k-best occurrences from a given vector distance

Parameters:
  • distance (nd-array) – distance values
  • output (tuple) – number of occurrences
  • overlap (float) – minimum distance between occurrences
Returns:

id_s – indexes of k-best occurrences

Return type:

nd-array

Distances

Lockstep Distances

tssearch.distances.lockstep_distances.braycurtis_distance(x, y, weight=None)[source]

Computes the Braycurtis distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Braycurtis distance value.

Return type:

float

tssearch.distances.lockstep_distances.canberra_distance(x, y, weight=None)[source]

Computes the Canberra distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Canberra distance value.

Return type:

float

tssearch.distances.lockstep_distances.chebyshev_distance(x, y, weight=None)[source]

Computes the Chebyshev distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Chebyshev distance value.

Return type:

float

tssearch.distances.lockstep_distances.correlation_distance(x, y, weight=None)[source]

Computes the correlation distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Correlation distance value.

Return type:

float

tssearch.distances.lockstep_distances.cosine_distance(x, y, weight=None)[source]

Computes the cosine distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Cosine distance value.

Return type:

float

tssearch.distances.lockstep_distances.euclidean_distance(x, y, weight=None)[source]

Computes the Euclidean distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Euclidean distance value.

Return type:

float

tssearch.distances.lockstep_distances.hamming_distance(x, y, weight=None)[source]

Computes the Hamming distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Hamming distance value.

Return type:

float

tssearch.distances.lockstep_distances.mahalanobis_distance(x, y, weight=None)[source]

Computes the Mahalanobis distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Mahalanobis distance value.

Return type:

float

tssearch.distances.lockstep_distances.manhattan_distance(x, y, weight=None)[source]

Computes the Manhattan distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Manhattan distance value.

Return type:

float

tssearch.distances.lockstep_distances.minkowski_distance(x, y, weight=None, p=3)[source]

Computes the Minkowski distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
  • p (int) – Lp norm distance degree.
Returns:

Minkowski distance value.

Return type:

float

tssearch.distances.lockstep_distances.pearson_correlation(x, y, beta=None)[source]

Computes the Pearson correlation between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • beta (float) – Beta coefficient.
Returns:

Pearson correlation value.

Return type:

float

tssearch.distances.lockstep_distances.short_time_series_distance(x, y, tx=None, ty=None)[source]

Computes the short time series distance (STS) between two time series.

Reference: Möller-Levet, C. S., Klawonn, F., Cho, K., and Wolkenhauer, O. (2003).

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • tx (nd-array) – Sampling index of time series x.
  • ty (nd-array) – Sampling index of time series y.
Returns:

Short time series distance value.

Return type:

float

tssearch.distances.lockstep_distances.sqeuclidean_distance(x, y, weight=None)[source]

Computes the squared Euclidean distance between two time series.

If the time series do not have the same length, an interpolation is performed.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
  • weight (nd-array (Default: None)) – query weight values.
Returns:

Squared Euclidean distance value.

Return type:

float

Elastic Distances

tssearch.distances.elastic_distances.dlp(x, y, p=2)[source]

Computes Lp norm distance between two time series.

Parameters:
  • x (nd-array) – Time series x (query).
  • y (nd-array) – Time series y.
  • p (int) – Lp norm distance degree for local cost computation.
Returns:

Return type:

The Lp distance.

tssearch.distances.elastic_distances.dtw(x, y, weight=None, **kwargs)[source]

Computes Dynamic Time Warping (DTW) of two time series.

Parameters:
  • x (nd-array) – Time series x (query).
  • y (nd-array) – Time series y.
  • dist (function) – The distance used as a local cost measure. None defaults to the squared euclidean distance.
  • **kwargs
  • below (See) –
  • do_sign_norm (bool) -- (*) – If True the signals will be normalized before computing the DTW, (default: False)
  • do_dist_norm (bool) -- (*) – If True the DTW distance will be normalized by dividing the summation of the path dimension. (default: True)
  • window (String) -- (*) – Selects the global window constrains. Available options are None and sakoe-chiba. (default: None)
  • factor (Float) -- (*) – Selects the global constrain factor. (default: min(xl, yl) * .50)
Returns:

  • d (float) – The DTW distance.
  • ac (nd-array) – The accumulated cost matrix.
  • path (nd-array) – The optimal warping path between the two sequences.

tssearch.distances.elastic_distances.lcss(x, y, eps=1, **kwargs)[source]

Computes the Longest Common Subsequence (LCSS) distance between two numeric time series.

Parameters:
  • x (nd-array) – Time series x (query).
  • y (nd-array) – Time series y.
  • eps (float) – Amplitude matching threshold.
  • **kwargs
  • below (See) –
  • window (String) -- (*) – Selects the global window constrains. Available options are None and sakoe-chiba. (default: None)
Returns:

  • d (float) – The LCSS distance.
  • ac (nd-array) – The similarity matrix.
  • path (nd-array) – The optimal path between the two sequences.

tssearch.distances.elastic_distances.twed(x, y, tx, ty, nu=0.001, lmbda=1.0, p=2, report='distance')[source]

Computes Time Warp Edit Distance (TWED) of two time series.

Reference :
Marteau, P.; F. (2009). “Time Warp Edit Distance with Stiffness Adjustment for Time Series Matching”. IEEE Transactions on Pattern Analysis and Machine Intelligence. 31 (2): 306–318. arXiv:cs/0703033 http://people.irisa.fr/Pierre-Francois.Marteau/
Parameters:
  • x (nd-array) – Time series x (query).
  • y (nd-array) – Time series y.
  • tx (nd-array) – Time stamp time series x.
  • ty (nd-array) – Time stamp time series y.
  • nu (int) –
    Stiffness parameter (nu >= 0)
    nu = 0, TWED distance measure on amplitude. nu > 0, TWED distance measure on amplitude x time.
  • lmbda (int) – Penalty for deletion operation (lmbda >= 0).
  • p (int) – Lp norm distance degree for local cost computation.
  • report (str) – distance, cost matrix, path.
Returns:

  • d (float) – The TWED distance.
  • ac (nd-array) – The accumulated cost matrix.
  • path (nd-array) – The optimal warping path between the two sequences.

Time Distances

tssearch.distances.time_distances.tam(x, y)[source]

Calculates the Time Alignment Measurement (TAM) based on an optimal warping path between two time series.

Reference: Folgado et. al, Time Alignment Measurement for Time Series, 2016.

Parameters:
  • x (nd-array) – Time series x.
  • y (nd-array) – Time series y.
Returns:

  • In case report=instants the number of indexes in advance, delay and phase
  • will be returned.
  • For report=ratios, the ratio of advance, delay and phase.
  • will be returned. In case report=distance, only the TAM will be returned.

Authors

This package is being developed and maintained by Fraunhofer AICOS.

FhP-AICOS

TSSEARCH was written in collaboration with Cognitive Systems Lab of University of Bremen.

Lead Development Team

Contributors

  • Hui Liu
  • Tanja Schultz
  • Yale Hartmann

Changelog

Version 0.1.3

  • Fixed a bug on the setup.py to correctly build the PyPI package
  • Removed novainstrumentation from dependencies
  • Fixed a bug on TWED distance (#7)

Version 0.1.0

  • Release of TSSEARCH with documentation

License

BSD 3-Clause License

Copyright (c) 2022, Associação Fraunhofer Portugal Research
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Installation

This packages is available on PyPI:

$ pip install tssearch

Get started

The code below segments a 10 s electrocardiography record:

import tssearch

# Load the query, (optional) weight vector and sequence
data = tssearch.load_ecg_example()

# Selects the Dynamic Time Warping (DTW) as the distance for the segmentation
cfg = tssearch.get_distance_dict(["Dynamic Time Warping"])

# Performs the segmentation
out = tssearch.time_series_segmentation(cfg, data['query'], data['sequence'], data['weight'])

Indices and tables