Dynamical Components Analysis Classes

class dca.dca.DynamicalComponentsAnalysis(d=None, T=None, init='random_ortho', n_init=1, stride=1, chunk_cov_estimate=None, tol=1e-06, ortho_lambda=10.0, verbose=False, block_toeplitz=None, method='toeplitzify', device='cpu', dtype=torch.float64, rng_or_seed=None)[source]

Dynamical Components Analysis.

Runs DCA on multidimensional timeseries data X to find a projection from the N-dimensional data space onto a d-dimensional subspace that maximizes dynamical complexity, measured as the Gaussian Predictive Information (PI) between past and future windows of length T of the projected d-dimensional dynamics.
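For a stationary Gaussian process, the PI between a past window and a future window of length T has a standard closed form in terms of the covariance of consecutive samples: PI = log det Σ_T − ½ log det Σ_2T (in nats), where Σ_T is the covariance of T consecutive samples. The NumPy sketch below computes that expression, assuming cross_covs[k] holds the lag-k cross covariance E[x_t x_{t+k}^T] for k = 0, …, 2T − 1. The helper names block_toeplitz_cov and gaussian_pi are hypothetical and not part of the dca API.

```python
import numpy as np


def block_toeplitz_cov(cross_covs, num_lags):
    """Assemble the (num_lags*N, num_lags*N) covariance of num_lags consecutive samples.

    Assumes cross_covs[k] = E[x_t x_{t+k}^T] for a zero-mean stationary process.
    """
    N = cross_covs[0].shape[0]
    cov = np.zeros((num_lags * N, num_lags * N))
    for i in range(num_lags):
        for j in range(num_lags):
            # Block (i, j) is E[x_i x_j^T]: C[j - i] on/above the diagonal, its transpose below.
            block = cross_covs[j - i] if j >= i else cross_covs[i - j].T
            cov[i * N:(i + 1) * N, j * N:(j + 1) * N] = block
    return cov


def gaussian_pi(cross_covs, T):
    """Gaussian predictive information (nats) between past and future windows of length T."""
    # slogdet is numerically safer than log(det(...)) for larger matrices.
    _, logdet_T = np.linalg.slogdet(block_toeplitz_cov(cross_covs, T))
    _, logdet_2T = np.linalg.slogdet(block_toeplitz_cov(cross_covs, 2 * T))
    return logdet_T - 0.5 * logdet_2T


# Scalar AR(1) process x_{t+1} = a x_t + noise (unit noise variance):
# its lag-k autocovariance is a**k / (1 - a**2).
a, T = 0.5, 3
cross_covs = np.array([[[a**k / (1 - a**2)]] for k in range(2 * T)])
print(gaussian_pi(cross_covs, T))  # equals -0.5 * log(1 - a**2) for this Markov process
```

Because an AR(1) process is Markov, the past and future windows share information only through the adjacent samples, so the result does not depend on T; white noise (identity at lag 0, zeros elsewhere) gives a PI of zero.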

Parameters:
  • d (int) – Number of basis vectors onto which the data X are projected.

  • T (int) – Size of time windows across which to compute mutual information. Total window length will be 2 * T. When fitting a model, the length of the shortest timeseries must be greater than 2 * T and for good performance should be much greater than 2 * T.

  • init (str) – Method for initializing the projection matrix. Options: “random_ortho”, “random”, or “PCA”.

  • n_init (int) – Number of random restarts. Default is 1.

  • stride (int) – Number of samples to skip when estimating cross covariance matrices. Setting stride > 1 speeds up covariance estimation but may reduce the quality of the covariance estimate for small datasets.

  • chunk_cov_estimate (None or int) – If None, the covariance is estimated from the entire timeseries. If an int, the timeseries is split into chunks and the covariance is estimated by averaging the covariances of the chunks, which can use less memory and be faster for long timeseries. Requires that the shortest timeseries in the batch be longer than 2 * T * chunk_cov_estimate.

  • tol (float) – Tolerance for stopping optimization. Default is 1e-6.

  • ortho_lambda (float) – Coefficient on the regularization term that keeps the projection matrix V close to orthonormal.

  • verbose (bool) – Verbosity during optimization.

  • use_scipy (bool) – Whether to use the SciPy (True) or PyTorch (False) L-BFGS-B optimizer. Default is True. The PyTorch path is not well tested.

  • block_toeplitz (bool) – If True, uses the block-Toeplitz logdet algorithm, which is typically faster and less memory intensive on CPU for T of roughly 10 or more and d of roughly 40 or more.

  • method (str) – ‘toeplitzify’ estimates the covariance by naive averaging, which is faster but less accurate for very small datasets. ‘ML’ uses maximum-likelihood block-Toeplitz covariance estimation, which can be very slow for large datasets.

  • device (str) – Device on which to run the PyTorch computation.

  • dtype (torch.dtype) – Data type used for computation.

  • rng_or_seed (None, int, or NumPy RandomState) – Random number generator or seed.
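The T, stride, and method='toeplitzify' parameters all concern how the cross covariance matrices are estimated. The sketch below is one plausible reading of "naive averaging", not the library's own code: it stacks windows of 2T consecutive samples (one window starting every `stride` samples), computes their covariance, then averages every N-by-N block on the same block diagonal to obtain one matrix per lag. The function name estimate_cross_covs is hypothetical.

```python
import numpy as np


def estimate_cross_covs(X, T, stride=1):
    """Hypothetical sketch: estimate lag 0..2T-1 cross covariances from X of shape (n_samples, N)."""
    X = np.asarray(X, dtype=float)
    n, N = X.shape
    num_lags = 2 * T
    if n <= num_lags:
        raise ValueError("timeseries must be longer than 2 * T")
    X = X - X.mean(axis=0)
    # Each row is a flattened window of num_lags consecutive samples; stride skips start times.
    starts = np.arange(0, n - num_lags + 1, stride)
    windows = np.stack([X[s:s + num_lags].ravel() for s in starts])
    full_cov = windows.T @ windows / len(starts)
    # "Toeplitzify": average all (N, N) blocks (i, i + k) that share lag k.
    cross_covs = np.zeros((num_lags, N, N))
    for k in range(num_lags):
        blocks = [full_cov[i * N:(i + 1) * N, (i + k) * N:(i + k + 1) * N]
                  for i in range(num_lags - k)]
        cross_covs[k] = np.mean(blocks, axis=0)
    return cross_covs


rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 3))  # white noise: lag 0 near identity, other lags near zero
covs = estimate_cross_covs(X, T=2, stride=2)
print(covs.shape)  # (4, 3, 3)
```

Averaging along block diagonals is cheap because it reuses one covariance computation; a maximum-likelihood estimate of a block-Toeplitz covariance (the ‘ML’ option) instead requires an iterative fit, which is why it can be slow on large datasets.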

T

Default T used for PI.

Type:

int

T_fit

T used for last cross covariance estimation.

Type:

int

d

Default d used for fitting the projection.

Type:

int

d_fit

d used for last projection fit.

Type:

int

cross_covs

Cross covariance matrices from the last covariance estimation.

Type:

torch tensor

coef_

Projection matrix from fit.

Type:

ndarray (N, d)
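Since coef_ has shape (N, d), projecting data of shape (n_samples, N) onto the learned subspace is a single matrix product yielding shape (n_samples, d). In the sketch below a random matrix with orthonormal columns stands in for a fitted coef_; building it from the QR decomposition of a Gaussian matrix is a common technique, and treating it as what “random_ortho” initialization does is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, N, d = 1000, 10, 3
X = rng.standard_normal((n_samples, N))

# Stand-in for a fitted coef_: the reduced QR of a Gaussian (N, d) matrix has orthonormal columns.
coef_, _ = np.linalg.qr(rng.standard_normal((N, d)))

Y = X @ coef_  # projected data, shape (n_samples, d)
print(Y.shape)  # (1000, 3)
print(np.allclose(coef_.T @ coef_, np.eye(d)))  # True: columns are orthonormal
```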

estimate_data_statistics(X, T=None, regularization=None, reg_ops=None)[source]

Estimate the cross covariance matrix from data.

Parameters:
  • X (ndarray or list of ndarrays) – Data to estimate the cross covariance matrix.

  • T (int) – T for PI calculation (optional).

  • regularization (None or str) – Regularization method for the cross covariance estimate. None (the default) disables regularization.

  • reg_ops (dict) – Options for cross covariance regularization.

score(X=None)[source]

Calculate the PI of data for the DCA projection.

Parameters:

X (ndarray or list) – Optional. If X is None, calculate the PI of the training data. If X is given, calculate the PI of X under the learned projection.

class dca.dca.DynamicalComponentsAnalysisFFT(d=None, T=None, init='random_ortho', n_init=1, tol=1e-06, ortho_lambda=10.0, verbose=False, device='cpu', dtype=torch.float64, rng_or_seed=None)[source]

Dynamical Components Analysis using FFT for PI calculation.

Currently only well-defined for d=1.

Runs DCA on multidimensional timeseries data X to discover a projection onto a d-dimensional subspace which maximizes the dynamical complexity.

Parameters:
  • d (int) – Number of basis vectors onto which the data X are projected.

  • T (int) – Size of time windows across which to compute mutual information.

  • init (str) – Method for initializing the projection matrix. Options: “random” or “PCA”.

score(X)[source]

Calculate the PI of data for the DCA projection.

Parameters:

X (ndarray or list) – Data for which to calculate the PI under the learned projection.

dca.dca.build_loss(cross_cov_mats, d, ortho_lambda=1.0, block_toeplitz=False)[source]

Constructs a loss function that returns the (negative) predictive information of the multidimensional timeseries projected onto a d-dimensional basis, where predictive information is computed using a stationary Gaussian process approximation.

Parameters:
  • cross_cov_mats (np.ndarray) – Cross covariance matrices of the N-dimensional timeseries, one N-by-N matrix per time lag, from which the mutual information is computed.

  • d (int) – Number of basis vectors onto which the data are projected.

  • ortho_lambda (float) – Coefficient on the regularization term that keeps the basis close to orthonormal.

  • block_toeplitz (bool) – If True, uses the block-Toeplitz logdet algorithm.

Returns:

loss – Loss function which accepts a (flattened) N-by-d matrix, whose columns are basis vectors, and outputs the negative predictive information corresponding to that projection (plus regularization term).

Return type:

function
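Because the PI depends on the data only through the cross covariance matrices, the PI of a projection V can be computed by projecting each lag, C_k → Vᵀ C_k V, without returning to the raw data. The NumPy sketch below mirrors the interface described above (a flattened N-by-d input, negative PI plus a regularization term) but only evaluates the loss; it does not compute gradients. The name build_loss_sketch is hypothetical, and the squared Frobenius penalty ortho_lambda * ||VᵀV − I||² is an assumed form of the orthonormality term.

```python
import numpy as np


def block_toeplitz_cov(cross_covs, num_lags):
    """Covariance of num_lags consecutive samples, assuming cross_covs[k] = E[x_t x_{t+k}^T]."""
    N = cross_covs[0].shape[0]
    cov = np.zeros((num_lags * N, num_lags * N))
    for i in range(num_lags):
        for j in range(num_lags):
            block = cross_covs[j - i] if j >= i else cross_covs[i - j].T
            cov[i * N:(i + 1) * N, j * N:(j + 1) * N] = block
    return cov


def build_loss_sketch(cross_covs, d, ortho_lambda=1.0):
    """Return loss(v_flat) = -PI(projection) + ortho_lambda * ||V^T V - I||_F^2 (assumed form)."""
    cross_covs = np.asarray(cross_covs, dtype=float)
    num_lags, N, _ = cross_covs.shape
    T = num_lags // 2

    def loss(v_flat):
        V = np.asarray(v_flat, dtype=float).reshape(N, d)
        # V^T C_k V is the lag-k cross covariance of the projected data.
        proj = np.einsum("ni,knm,mj->kij", V, cross_covs, V)
        _, logdet_T = np.linalg.slogdet(block_toeplitz_cov(proj, T))
        _, logdet_2T = np.linalg.slogdet(block_toeplitz_cov(proj, 2 * T))
        pi = logdet_T - 0.5 * logdet_2T
        ortho_penalty = np.sum((V.T @ V - np.eye(d)) ** 2)
        return -pi + ortho_lambda * ortho_penalty

    return loss


# Two independent channels: channel 0 is AR(1) with coefficient a, channel 1 is white noise.
a, T = 0.9, 2
cross_covs = np.array([np.diag([a**k / (1 - a**2), 1.0 if k == 0 else 0.0])
                       for k in range(2 * T)])
loss = build_loss_sketch(cross_covs, d=1)
print(loss(np.array([1.0, 0.0])))  # AR(1) direction: negative loss (positive PI)
print(loss(np.array([0.0, 1.0])))  # white-noise direction: loss of about 0
```

The PI term is invariant to rescaling V, so without the penalty a vector such as [0, 2] would score the same as [0, 1]; the regularizer is what pins the basis to unit length and orthogonal columns.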