pensa.statesinfo
Methods for state-specific information.
The methods here are based on the following paper:
Neil J. Thomson, Owen N. Vickery, Callum M. Ives, Ulrich Zachariae: Ion-water coupling controls class A GPCR signal transduction pathways.
pensa.statesinfo.discrete_states
- calculate_entropy(state_limits, distribution_list)
Calculate the Shannon entropy of a distribution as the summation of all -p*log(p) where p refers to the probability of a conformational state.
- Parameters:
state_limits (list of lists) – A list of values that represent the limits of each state for each distribution.
distribution_list (list of lists) – A list containing multivariate distributions (lists) for a particular residue or water
- Returns:
entropy – The Shannon entropy value
- Return type:
float
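The entropy formula above can be illustrated with a minimal, standard-library sketch for a single one-dimensional distribution. This is not PENSA's implementation: the helper name `shannon_entropy_1d`, the use of log base 2, and the handling of values that fall exactly on a limit are assumptions made for illustration only.

```python
import math
from bisect import bisect_right


def shannon_entropy_1d(limits, samples):
    """Entropy (in bits) of samples binned into states.

    `limits` is a sorted list of state boundaries, so there are
    len(limits) - 1 states. Hypothetical helper, not part of PENSA.
    """
    n_states = len(limits) - 1
    counts = [0] * n_states
    for x in samples:
        # Index of the state whose [lower, upper) interval contains x.
        idx = bisect_right(limits, x) - 1
        # Clamp so values on or beyond the outer limits land in an end state.
        idx = min(max(idx, 0), n_states - 1)
        counts[idx] += 1
    total = len(samples)
    # Sum of -p*log2(p) over occupied states (empty states contribute 0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Two equally populated states give an entropy of 1 bit.
limits = [-math.pi, 0.0, math.pi]
samples = [-2.0, -1.0, 1.0, 2.0]
entropy = shannon_entropy_1d(limits, samples)
```

For multivariate input, as accepted by calculate_entropy, the probabilities would be taken over combinations of states across the distributions rather than over a single set of limits.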
- calculate_entropy_multthread(state_limits, distribution_list, max_thread_no)
Calculate the Shannon entropy of a distribution as the summation of all -p*log(p) where p refers to the probability of a conformational state. This variant distributes the calculation over multiple threads.
- Parameters:
state_limits (list of lists) – A list of values that represent the limits of each state for each distribution.
distribution_list (list of lists) – A list containing multivariate distributions (lists) for a particular residue or water
max_thread_no (int) – Maximum number of threads to use in the multi-threading.
- Returns:
entropy – The Shannon entropy value
- Return type:
float
- determine_state_limits(distr, traj1_len, gauss_bins=180, gauss_smooth=None, write_plots=None, write_name=None)
Cluster a distribution into discrete states with well-defined limits. The function handles both residue angle distributions and water distributions. For waters, the assignment of an additional non-angular state is performed if changes in pocket occupancy occur. The clustering requires that the distribution can be decomposed to a mixture of Gaussians.
- Parameters:
distr (list) – Distribution for specific feature.
gauss_bins (int, optional) – Number of histogram bins to assign for the clustering algorithm. The default is 180.
gauss_smooth (int, optional) – Number of bins to perform smoothing over. The default is ~10% of gauss_bins.
write_plots (bool, optional) – If true, visualise the states over the raw distribution. The default is None.
write_name (str, optional) – Filename for write_plots. The default is None.
- Returns:
State intersects for each cluster in numerical order.
- Return type:
list
- get_discrete_states(all_data_a, all_data_b, discretize='gaussian', pbc=True, h2o=False, write_plots=False)
Obtain list of state limits for each feature.
- Parameters:
all_data_a (float array) – Trajectory data from the first ensemble.
all_data_b (float array) – Trajectory data from the second ensemble.
discretize (str, optional) – Method for state discretization. Options are ‘gaussian’, which defines state limits by gaussian intersects, and ‘partition_values’, which defines state limits by partitioning all values in the data. The default is ‘gaussian’.
pbc (bool, optional) – If true, apply periodic boundary corrections to angular distribution inputs. The input for periodic correction must be in radians. The default is True.
h2o (bool, optional) – If true, apply periodic boundary corrections for spherical angles with different periodicities. The default is False.
write_plots (bool, optional) – If true, visualise the states over the raw distribution. The default is False.
- Returns:
ssi_states – List of state limits for each feature.
- Return type:
list of lists
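To show how a nested list of state limits of this shape might be consumed, the sketch below maps raw feature values to integer state indices. The limits and values are made-up illustration data, and `assign_states` is a hypothetical helper rather than a PENSA function.

```python
from bisect import bisect_right


def assign_states(values, limits):
    """Map each value to the index of the state it falls into.

    `limits` is a sorted list of boundaries for one feature.
    Hypothetical helper, not part of PENSA.
    """
    n_states = len(limits) - 1
    states = []
    for v in values:
        idx = bisect_right(limits, v) - 1
        # Clamp values on or beyond the outer limits into the end states.
        states.append(min(max(idx, 0), n_states - 1))
    return states


# One list of limits per feature (invented numbers, angles in radians).
ssi_states = [
    [-3.14, 0.0, 3.14],        # feature 0: two states
    [-3.14, -1.0, 1.0, 3.14],  # feature 1: three states
]
feature_values = [
    [-2.0, 0.5, 3.0],
    [-2.0, 0.0, 2.0],
]
discretized = [
    assign_states(vals, lims)
    for vals, lims in zip(feature_values, ssi_states)
]
```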
- get_intersects(gaussians, distribution, Gauss_xvals, write_plots=None, write_name=None)
Obtain the intersects of a mixture of Gaussians which have been obtained from decomposing a distribution into Gaussians. Additional state limits are added at the beginning and end of the distribution.
- Parameters:
gaussians (list of lists) – A list of X gaussians.
distribution (list) – The distribution that Gaussians have been obtained from.
Gauss_xvals (list) – The x-axis linspace that the distribution spans.
write_plots (bool, optional) – If true, visualise the states over the raw distribution. The default is None.
write_name (str, optional) – Filename for write_plots. The default is None.
- Returns:
all_intersects – All the Gaussian intersects.
- Return type:
list
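The idea of using Gaussian intersects as state limits can be sketched analytically for two Gaussians. The code below is an illustration under stated assumptions, not the library's method: each Gaussian is represented as a hypothetical `(amplitude, mean, sigma)` tuple, the crossing points are found by equating the logarithms of both curves and solving the resulting quadratic, and only crossings between the two means are kept as limits.

```python
import math


def gaussian(x, a, mu, s):
    """Unnormalised Gaussian with amplitude a, mean mu, width s."""
    return a * math.exp(-((x - mu) ** 2) / (2 * s**2))


def gaussian_intersections(a1, mu1, s1, a2, mu2, s2):
    """Return the x values where two Gaussians cross (0, 1 or 2 values)."""
    # Equating the logs of both Gaussians gives A*x^2 + B*x + C = 0.
    A = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    B = mu1 / s1**2 - mu2 / s2**2
    C = mu2**2 / (2 * s2**2) - mu1**2 / (2 * s1**2) + math.log(a1 / a2)
    if abs(A) < 1e-12:
        # Equal widths: the equation is linear.
        if abs(B) < 1e-12:
            return []
        return [-C / B]
    disc = B**2 - 4 * A * C
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-B - root) / (2 * A), (-B + root) / (2 * A)])


def state_limits_from_pair(distribution, g1, g2):
    """Limits: start of the data, crossings between the means, end of the data."""
    lo, hi = sorted([g1[1], g2[1]])
    inner = [x for x in gaussian_intersections(*g1, *g2) if lo < x < hi]
    return [min(distribution)] + inner + [max(distribution)]


distribution = [-2.5, -1.1, -0.9, 0.8, 1.2, 2.5]
limits = state_limits_from_pair(
    distribution, (1.0, -1.0, 1.0), (1.0, 1.0, 1.0)
)
```

Two identical Gaussians centred at -1 and 1 cross at the midpoint, so the example yields one inner limit at 0 framed by the minimum and maximum of the data.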
- smart_gauss_fit(distr, traj1_len, gauss_bins=180, gauss_smooth=None, write_name=None)
Obtain the Gaussians that fit the distribution as a Gaussian mixture. The number of bins is adjusted automatically if the Gaussian fit encounters errors.
- Parameters:
distr (list) – Distribution of interest for the fitting.
gauss_bins (int, optional) – Bin the distribution into gauss_bins bins. The default is 180.
gauss_smooth (int, optional) – Smooth the distribution according to a Hanning window length of gauss_smooth. The default is ~10% of gauss_bins.
write_name (str, optional) – Used in warning to notify which feature has had binning altered during clustering. The default is None.
- Returns:
gaussians (list) – y-axis values for the Gaussian distribution.
xline (list) – x-axis values for the Gaussian distribution.
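The binning and Hanning-window smoothing steps described by gauss_bins and gauss_smooth can be sketched with the standard library alone. This covers only the preprocessing, not the Gaussian fit itself, and it is not taken from PENSA: the helper names, the zero-padded edge handling, and the window normalisation are assumptions for illustration.

```python
import math


def histogram(samples, n_bins, lo, hi):
    """Count samples in n_bins equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        idx = int((x - lo) / width)
        # Clamp so that x == hi (or slight overshoot) lands in an end bin.
        counts[min(max(idx, 0), n_bins - 1)] += 1
    return counts


def hanning_window(length):
    """Symmetric Hann window: w[n] = 0.5 - 0.5*cos(2*pi*n/(length-1))."""
    if length == 1:
        return [1.0]
    return [
        0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def smooth(counts, window_len):
    """Smooth a histogram with a normalised Hann window (zero-padded edges)."""
    window = hanning_window(window_len)
    norm = sum(window)
    half = window_len // 2
    smoothed = []
    for i in range(len(counts)):
        acc = 0.0
        for k, w in enumerate(window):
            j = i + k - half
            if 0 <= j < len(counts):
                acc += w * counts[j]
        smoothed.append(acc / norm)
    return smoothed


# Angles in radians, binned coarsely and smoothed over a 5-bin window.
angles = [-2.1, -2.0, -1.9, -2.0, 1.0, 1.1, 0.9, 1.0, 1.05]
counts = histogram(angles, n_bins=20, lo=-math.pi, hi=math.pi)
smoothed = smooth(counts, window_len=5)
```

With the documented defaults, the corresponding values would be 180 bins and a smoothing window of roughly 18 bins.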