Covariance Utilities

dca.cov_util.block_toeplitz_covariance(X, S_X, T, max_iter=100, tol=1e-06)

Estimate the maximum-likelihood (ML) block-Toeplitz covariance matrix using the method of:

Fuhrmann, Daniel R., and T. A. Barton. “Estimation of block-Toeplitz covariance matrices.” 1990 Conference Record Twenty-Fourth Asilomar Conference on Signals, Systems and Computers.

Parameters:

X (ndarray (time, features) or (batches, time, features)) – Data used to calculate the predictive information (PI). Can be None if S_X is given.
S_X (ndarray) – Sample covariance matrix. Can be None if X is given.
T (int) – Joint window length, which should be 2 * T_pi. It is not the length of the past or future window alone.
max_iter (int) – Maximum number of EM iterations.
tol (float) – Log-likelihood (LL) tolerance for stopping EM iterations.

dca.cov_util.calc_block_toeplitz_logdets(cross_cov_mats, proj=None)

Calculates logdets which can be used to calculate predictive information or entropy for a spatiotemporal Gaussian process with T N-by-N cross-covariance matrices using the block-Toeplitz algorithm.

Based on: Sowell, Fallaw. “A decomposition of block Toeplitz matrices with applications to vector time series.” Unpublished manuscript (1989).

Parameters:

cross_cov_mats (np.ndarray, shape (T, N, N)) – Cross-covariance matrices: cross_cov_mats[dt] is the cross-covariance between X(t) and X(t+dt), where each of X(t) and X(t+dt) is a N-dimensional vector.
proj (np.ndarray, shape (N, d), optional) – If provided, the N-dimensional data are projected onto a d-dimensional basis given by the columns of proj. Then, the mutual information is computed for this d-dimensional timeseries.

Returns:

logdets – T logdets.

Return type:

list

dca.cov_util.calc_chunked_cov(X, T, stride, chunks, cov_est=None, rng=None, stride_tricks=True)

Calculate an unnormalized (by sample count) lagged covariance matrix in chunks to save memory.

Parameters:

X (np.ndarray, shape (# time-steps, N)) – The N-dimensional time series data from which the cross-covariance matrices are computed.
T (int) – The number of time lags.
stride (int) – The number of time-points to skip between samples.
chunks (int) – Number of chunks to break the data into when calculating the lagged cross-covariance. More chunks means less memory is used.
cov_est (ndarray) – Current estimate of unnormalized cov_est to be added to.

Returns:

cov_est (ndarray) – Current covariance estimate.
n_samples – How many samples were used.
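
The chunked accumulation described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation: `chunked_unnormalized_cov` is a hypothetical helper, it assumes the lagged data matrix has already been formed and mean-centered, and it ignores `stride` and `rng`.

```python
import numpy as np


def chunked_unnormalized_cov(X_lagged, chunks):
    """Hypothetical helper: accumulate X_lagged.T @ X_lagged chunk by chunk.

    Assumes X_lagged (n_samples, d) is already mean-centered.
    """
    d = X_lagged.shape[1]
    cov_est = np.zeros((d, d))
    n_samples = 0
    # Process groups of rows so only one chunk is multiplied at a time.
    for chunk in np.array_split(X_lagged, chunks, axis=0):
        cov_est += chunk.T @ chunk  # unnormalized: no division by sample count
        n_samples += chunk.shape[0]
    return cov_est, n_samples


rng = np.random.default_rng(0)
X_lagged = rng.standard_normal((100, 6))
cov_est, n_samples = chunked_unnormalized_cov(X_lagged, chunks=7)
```

Dividing `cov_est` by `n_samples` afterwards would give a normalized estimate, which is why both values are returned together.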

dca.cov_util.calc_cov_from_cross_cov_mats(cross_cov_mats)

Calculates the N*T-by-N*T spatiotemporal covariance matrix based on T N-by-N cross-covariance matrices.

Parameters:

cross_cov_mats (np.ndarray, shape (T, N, N)) – Cross-covariance matrices: cross_cov_mats[dt] is the cross-covariance between X(t) and X(t+dt), where each of X(t) and X(t+dt) is a N-dimensional vector.

Returns:

cov – Big covariance matrix, stationary in time by construction.

Return type:

np.ndarray, shape (N*T, N*T)
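
As a rough illustration of how T cross-covariance matrices tile into one block-Toeplitz matrix, here is a small sketch. The function name `cov_from_cross_cov_mats` is hypothetical, and the choice of which triangle receives the transposed block is an assumption that follows the definition of cross_cov_mats[dt] given above; the library may use the opposite convention.

```python
import numpy as np


def cov_from_cross_cov_mats(cross_cov_mats):
    """Hypothetical sketch: tile (T, N, N) lagged blocks into an (N*T, N*T) matrix."""
    T, N, _ = cross_cov_mats.shape
    cov = np.zeros((N * T, N * T))
    for i in range(T):
        for j in range(T):
            # Assumed convention: block (i, j) with j >= i holds cross_cov_mats[j - i];
            # the mirrored block holds its transpose, so the result is symmetric.
            if j >= i:
                block = cross_cov_mats[j - i]
            else:
                block = cross_cov_mats[i - j].T
            cov[i * N:(i + 1) * N, j * N:(j + 1) * N] = block
    return cov


rng = np.random.default_rng(0)
C0 = np.eye(2)
C1 = 0.1 * rng.standard_normal((2, 2))
cov = cov_from_cross_cov_mats(np.stack([C0, C1]))
```

Because every block depends only on the difference j - i, the result is stationary in time by construction.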

dca.cov_util.calc_cross_cov_mats_from_cov(cov, T, N)

Calculates T N-by-N cross-covariance matrices given a N*T-by-N*T spatiotemporal covariance matrix by averaging over off-diagonal cross-covariance blocks with constant |t1-t2|.

Parameters:

cov (np.ndarray, shape (N*T, N*T)) – Spatiotemporal covariance matrix.
T (int) – Number of time lags.
N (int) – Number of spatial dimensions.

Returns:

cross_cov_mats – Cross-covariance matrices.

Return type:

np.ndarray, shape (T, N, N)
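
The block averaging can be sketched as follows. This is an illustrative reading of the description, not the library's code: `cross_cov_mats_from_cov` is a hypothetical name, and it assumes the block at row-block t1 and column-block t2 = t1 + dt corresponds to cross_cov_mats[dt], with the mirrored block contributing its transpose.

```python
import numpy as np


def cross_cov_mats_from_cov(cov, T, N):
    """Hypothetical sketch: average all blocks with the same |t1 - t2|."""
    cross_cov_mats = np.zeros((T, N, N))
    for dt in range(T):
        blocks = []
        for i in range(T - dt):
            j = i + dt
            upper = cov[i * N:(i + 1) * N, j * N:(j + 1) * N]
            lower = cov[j * N:(j + 1) * N, i * N:(i + 1) * N]
            blocks.append(upper)
            blocks.append(lower.T)  # mirrored block, transposed to match `upper`
        cross_cov_mats[dt] = np.mean(blocks, axis=0)
    return cross_cov_mats


A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[0.3, 0.1], [0.0, 0.2]])
cov = np.block([[A, B], [B.T, A]])
mats = cross_cov_mats_from_cov(cov, T=2, N=2)
```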

dca.cov_util.calc_cross_cov_mats_from_data(X, T, mean=None, chunks=None, stride=1, rng=None, regularization=None, reg_ops=None, stride_tricks=True, logger=None, method='toeplitzify')

Compute the N-by-N cross-covariance matrix, where N is the data dimensionality, for each time lag up to T-1.

Parameters:

X (np.ndarray, shape (# time-steps, N)) – The N-dimensional time series data from which the cross-covariance matrices are computed.
T (int) – The number of time lags.
chunks (int) – Number of chunks to break the data into when calculating the lagged cross-covariance. More chunks means less memory is used.
stride (int or float) – If stride is an int, it defines the stride between lagged samples used to estimate the cross-covariance matrix. Setting stride > 1 can speed up the calculation, but may lead to a loss in accuracy. Setting stride to a float greater than 0 and less than 1 randomly subselects samples.
rng (NumPy random state) – Only used if stride is a float.
regularization (string) – Regularization method for computing the spatiotemporal covariance matrix.
reg_ops (dict) – Parameters for regularization.
stride_tricks (bool) – Whether to use numpy stride tricks in form_lag_matrix. True will use less memory for large T.
logger (logger) – Logger.
method (str) – ‘ML’ for EM-based maximum likelihood block toeplitz estimation, ‘toeplitzify’ for naive block averaging.

Returns:

cross_cov_mats – Cross-covariance matrices. cross_cov_mats[dt] is the cross-covariance between X(t) and X(t+dt), where X(t) is an N-dimensional vector.

Return type:

np.ndarray, shape (T, N, N), float
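
To make the returned quantity concrete, the sketch below estimates each lagged cross-covariance directly from mean-centered data. It is deliberately simpler than the function documented here: `naive_cross_cov_mats` is a hypothetical helper that ignores chunks, stride, regularization, and the `method` option, and it normalizes each lag by its own number of sample pairs, which is an assumption.

```python
import numpy as np


def naive_cross_cov_mats(X, T):
    """Hypothetical sketch: direct estimate of the lagged cross-covariances."""
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    n_time, N = X.shape
    cross_cov_mats = np.zeros((T, N, N))
    for dt in range(T):
        earlier = X[:n_time - dt]  # X(t)
        later = X[dt:]             # X(t + dt)
        cross_cov_mats[dt] = earlier.T @ later / (n_time - dt)
    return cross_cov_mats


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
ccms = naive_cross_cov_mats(X, T=4)
```

Here `ccms[0]` is the ordinary (biased) sample covariance of the features, and `ccms[dt]` pairs each time point with the one dt steps later.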

dca.cov_util.calc_pi_from_cov(cov_2_T_pi)

Calculates the Gaussian Predictive Information between variables {1,…,T_pi} and {T_pi+1,…,2*T_pi} with covariance matrix cov_2_T_pi.

Parameters:

cov_2_T_pi (np.ndarray, shape (2*T_pi, 2*T_pi)) – Covariance matrix.

Returns:

PI – Mutual information in nats.

Return type:

float
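
For jointly Gaussian variables, the mutual information between the first and second halves is half of (logdet of the past covariance + logdet of the future covariance - logdet of the joint covariance). A minimal sketch of that formula follows; `gaussian_pi_from_cov` is a hypothetical name and this is not necessarily how the library evaluates it.

```python
import numpy as np


def gaussian_pi_from_cov(cov_2_T_pi):
    """Hypothetical sketch: Gaussian mutual information (nats) between two halves."""
    half = cov_2_T_pi.shape[0] // 2
    past = cov_2_T_pi[:half, :half]
    future = cov_2_T_pi[half:, half:]
    # slogdet returns (sign, log|det|); only the log-determinant is needed.
    logdet_past = np.linalg.slogdet(past)[1]
    logdet_future = np.linalg.slogdet(future)[1]
    logdet_joint = np.linalg.slogdet(cov_2_T_pi)[1]
    return 0.5 * (logdet_past + logdet_future - logdet_joint)


rho = 0.6
cov = np.array([[1.0, rho], [rho, 1.0]])
pi = gaussian_pi_from_cov(cov)
```

For two unit-variance variables with correlation rho this reduces to -0.5 * log(1 - rho**2), and for an identity covariance it is zero.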

dca.cov_util.calc_pi_from_cross_cov_mats(cross_cov_mats, proj=None)

Calculates predictive information for a spatiotemporal Gaussian process with T-1 N-by-N cross-covariance matrices.

Parameters:

cross_cov_mats (np.ndarray, shape (T, N, N)) – Cross-covariance matrices: cross_cov_mats[dt] is the cross-covariance between X(t) and X(t+dt), where each of X(t) and X(t+dt) is a N-dimensional vector.
proj (np.ndarray, shape (N, d), optional) – If provided, the N-dimensional data are projected onto a d-dimensional basis given by the columns of proj. Then, the mutual information is computed for this d-dimensional timeseries.

Returns:

PI – Mutual information in nats.

Return type:

float

dca.cov_util.calc_pi_from_cross_cov_mats_block_toeplitz(cross_cov_mats, proj=None)

Calculates predictive information for a spatiotemporal Gaussian process with T-1 N-by-N cross-covariance matrices using the block-Toeplitz algorithm.

Based on: Sowell, Fallaw. “A decomposition of block Toeplitz matrices with applications to vector time series.” Unpublished manuscript (1989).

Parameters:

cross_cov_mats (np.ndarray, shape (T, N, N)) – Cross-covariance matrices: cross_cov_mats[dt] is the cross-covariance between X(t) and X(t+dt), where each of X(t) and X(t+dt) is a N-dimensional vector.
proj (np.ndarray, shape (N, d), optional) – If provided, the N-dimensional data are projected onto a d-dimensional basis given by the columns of proj. Then, the mutual information is computed for this d-dimensional timeseries.

Returns:

PI – Mutual information in nats.

Return type:

float

dca.cov_util.calc_pi_from_data(X, T, proj=None, stride=1, rng=None)

Calculates the Gaussian Predictive Information between variables {1,…,T_pi} and {T_pi+1,…,2*T_pi}.

Parameters:

X (ndarray or torch tensor (time, features) or (batches, time, features)) – Data used to calculate the PI.
T (int) – Joint window length, which should be 2 * T_pi. It is not the length of the past or future window alone.
proj (ndarray or torch tensor) – Projection matrix for data (optional). If proj is not given, the PI of the dataset is given.
stride (int or float) – If stride is an int, it defines the stride between lagged samples used to estimate the cross-covariance matrix. Setting stride > 1 can speed up the calculation, but may lead to a loss in accuracy. Setting stride to a float greater than 0 and less than 1 randomly subselects samples.
rng (NumPy random state) – Only used if stride is a float.

Returns:

PI – Mutual information in nats.

Return type:

float

dca.cov_util.extract_diag_blocks(cov, Q)

Extract the diagonal blocks from a matrix.

Parameters:

cov (ndarray (Q*P, Q*P)) – Matrix to extract diagonal blocks from.
Q (int) – The number of blocks.
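
A small sketch of what extracting the Q diagonal blocks of a (Q*P, Q*P) matrix could look like. The return format is not documented above, so stacking the blocks into a (Q, P, P) array is an assumption, and `diag_blocks` is a hypothetical name.

```python
import numpy as np


def diag_blocks(cov, Q):
    """Hypothetical sketch: return the Q diagonal P-by-P blocks as a (Q, P, P) array."""
    P = cov.shape[0] // Q
    return np.stack([cov[q * P:(q + 1) * P, q * P:(q + 1) * P] for q in range(Q)])


cov = np.arange(36.0).reshape(6, 6)
blocks = diag_blocks(cov, Q=3)
```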

dca.cov_util.form_lag_matrix(X, T, stride=1, stride_tricks=True, rng=None, writeable=False)

Form the data matrix with T lags.

Parameters:

X (ndarray (n_time, N)) – Timeseries with no lags.
T (int) – Number of lags.
stride (int or float) – If stride is an int, it defines the stride between lagged samples used to estimate the cross-covariance matrix. Setting stride > 1 can speed up the calculation, but may lead to a loss in accuracy. Setting stride to a float greater than 0 and less than 1 randomly subselects samples.
rng (NumPy random state) – Only used if stride is a float.
stride_tricks (bool) – Whether to use numpy stride tricks to form the lagged matrix or create a new array. Using numpy stride tricks can lower memory usage, especially for large T. If False, a new array is created.
writeable (bool) – For testing. You should not need to set this to True. This function uses stride tricks to form the lag matrix which means writing to the array will have confusing behavior. If stride_tricks is False, this flag does nothing.

Returns:

X_with_lags – Timeseries with lags.

Return type:

ndarray (n_lagged_time, N * T)
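
The shape of the lagged matrix is easiest to see on a tiny example. The sketch below copies the data instead of using stride tricks and only covers stride=1; `lag_matrix` is a hypothetical helper, and the ordering of lags within each row (earliest time step first) is an assumption.

```python
import numpy as np


def lag_matrix(X, T):
    """Hypothetical sketch: each row concatenates T consecutive time steps of X."""
    n_time, N = X.shape
    n_lagged = n_time - T + 1
    # Stack T shifted views of X along a new axis: shape (n_lagged, T, N).
    windows = np.stack([X[k:k + n_lagged] for k in range(T)], axis=1)
    return windows.reshape(n_lagged, T * N)


X = np.arange(10.0).reshape(5, 2)  # 5 time steps, 2 features
X_with_lags = lag_matrix(X, T=3)
```

With 5 time steps and T=3 there are 3 complete windows, so `X_with_lags` has shape (3, 6) and its first row holds time steps 0, 1, and 2 side by side.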

class dca.cov_util.memoized(func)

Decorator for memoization. From: https://wiki.python.org/moin/PythonDecoratorLibrary.

Caches a function’s return value each time it is called. If called later with the same arguments, the cached value is returned (not reevaluated).
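
The caching behavior described above can be sketched with a small class-based decorator. This is an illustration in the spirit of the recipe, not a copy of the library's class; `memoized_sketch` and `square` are hypothetical names, and only positional, hashable arguments are cached.

```python
import functools


class memoized_sketch:
    """Hypothetical sketch: cache return values keyed by positional arguments."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        functools.update_wrapper(self, func)  # keep the wrapped name and docstring

    def __call__(self, *args):
        try:
            return self.cache[args]
        except KeyError:
            # First call with these arguments: compute and remember the value.
            value = self.func(*args)
            self.cache[args] = value
            return value
        except TypeError:
            # Unhashable arguments (e.g. lists) cannot be dictionary keys; skip caching.
            return self.func(*args)


calls = []


@memoized_sketch
def square(x):
    calls.append(x)  # record each real evaluation
    return x * x


first = square(4)
second = square(4)
```

The second call returns the cached value, so `square` is only evaluated once for the argument 4.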

dca.cov_util.one_step(S_X, R_X, R_Y, A, A_R_Y, Q)

Perform one EM step for ML block toeplitz covariance estimation.

Parameters:

S_X (ndarray) – Sample covariance matrix.
R_X (ndarray) – Observed covariance matrix.
R_Y (ndarray) – Latent covariance matrix.
A (ndarray) – Projection from latent to observed variables.
A_R_Y (ndarray) – Precomputed A @ R_Y.

dca.cov_util.project_cross_cov_mats(cross_cov_mats, proj)

Projects the cross covariance matrices.

Parameters:

cross_cov_mats (np.ndarray, shape (T, N, N)) – Cross-covariance matrices: cross_cov_mats[dt] is the cross-covariance between X(t) and X(t+dt), where each of X(t) and X(t+dt) is a N-dimensional vector.
proj (np.ndarray, shape (N, d), optional) – If provided, the N-dimensional data are projected onto a d-dimensional basis given by the columns of proj. Then, the mutual information is computed for this d-dimensional timeseries.

Returns:

cross_cov_mats_proj – Projected cross-covariance matrices.

Return type:

ndarray, shape (T, d, d)
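
Projecting each lag onto the columns of proj amounts to computing proj.T @ cross_cov_mats[dt] @ proj for every dt. A minimal sketch using `np.einsum` follows; the helper name `project` is hypothetical.

```python
import numpy as np


def project(cross_cov_mats, proj):
    """Hypothetical sketch: apply proj.T @ C[dt] @ proj to every lag at once."""
    # Indices: t = lag, n/m = original dimensions, d/e = projected dimensions.
    return np.einsum('nd,tnm,me->tde', proj, cross_cov_mats, proj)


rng = np.random.default_rng(0)
cross_cov_mats = rng.standard_normal((3, 4, 4))  # T=3 lags, N=4
proj = rng.standard_normal((4, 2))               # N=4 -> d=2
cross_cov_mats_proj = project(cross_cov_mats, proj)
```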

dca.cov_util.rectify_spectrum(cov, epsilon=1e-06, logger=None)

Rectify the spectrum of a covariance matrix.

Parameters:

cov (ndarray) – Covariance matrix
epsilon (float) – Minimum eigenvalue for the rectified spectrum.
logger (logger) – Logger used to report when the spectrum needs to be rectified.
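
One common way to rectify a spectrum is to shift the whole matrix by a multiple of the identity so that its smallest eigenvalue becomes epsilon. The sketch below shows that approach under the assumption that this is roughly what is meant here; the documentation above does not specify the exact method, and `rectify_spectrum_sketch` is a hypothetical name.

```python
import numpy as np


def rectify_spectrum_sketch(cov, epsilon=1e-6):
    """Hypothetical sketch: shift eigenvalues so the smallest one equals epsilon."""
    min_eig = np.linalg.eigvalsh(cov).min()
    if min_eig <= 0:
        # Adding c * I raises every eigenvalue by c without changing eigenvectors.
        cov = cov + (epsilon - min_eig) * np.eye(cov.shape[0])
    return cov


cov = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues are -1 and 3
fixed = rectify_spectrum_sketch(cov)
```

A matrix that is already positive definite passes through unchanged.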

dca.cov_util.toeplitzify(cov, T, N, symmetrize=True)

Make a matrix block-Toeplitz by averaging along the block diagonal.

Parameters:

cov (ndarray (T*N, T*N)) – Covariance matrix to make block toeplitz.
T (int) – Number of blocks.
N (int) – Number of features per block.
symmetrize (bool) – Whether to ensure that the whole matrix is symmetric. Optional (default=True).

Returns:

cov_toep – Toeplitzified matrix.

Return type:

ndarray (T*N, T*N)
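
Averaging along the block diagonals can be sketched as below. This is an illustrative version only: `toeplitzify_sketch` is a hypothetical name, and it omits the `symmetrize` option, so the output is symmetric only when the input is.

```python
import numpy as np


def toeplitzify_sketch(cov, T, N):
    """Hypothetical sketch: replace each block diagonal by the mean of its blocks."""
    cov_toep = np.zeros_like(cov, dtype=float)
    for dt in range(T):
        n_blocks = T - dt
        # Mean of the blocks dt steps above (upper) and below (lower) the diagonal.
        upper = np.mean([cov[i * N:(i + 1) * N, (i + dt) * N:(i + dt + 1) * N]
                         for i in range(n_blocks)], axis=0)
        lower = np.mean([cov[(i + dt) * N:(i + dt + 1) * N, i * N:(i + 1) * N]
                         for i in range(n_blocks)], axis=0)
        for i in range(n_blocks):
            cov_toep[i * N:(i + 1) * N, (i + dt) * N:(i + dt + 1) * N] = upper
            cov_toep[(i + dt) * N:(i + dt + 1) * N, i * N:(i + 1) * N] = lower
    return cov_toep


rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
cov = M @ M.T  # symmetric test matrix with T=3 blocks of N=2 features
cov_toep = toeplitzify_sketch(cov, T=3, N=2)
```

Applying the function a second time leaves the matrix unchanged, since every block diagonal is already constant.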