Tracing Information Flow

Here we trace infromation flow throughout the receptor using SSI analysis tools. The SSI resolved in the ensemble comparison concludes which features couple their torsion angles to the ensemble conditions, thereby receiving information of the ensemble switch. However, to trace information flow we need to ensure that features couple to each other as well as to the ensemble condition.

We can trace information flow by calculating the feature-feature SSI using the function ssi_feature_analysis().

The Co-SSI statistic, which tells how much the communication between two features depends on the ensemble condition, can be calculated using cossi_featens_analysis().

Features and States

import os
import numpy as np
from pensa.features import *
from pensa.statesinfo import *
from pensa.comparison import *

First, load the structural features as described in the previous tutorial:

sim_a_rec = read_structure_features(
    "traj/condition-a_receptor.gro",
    "traj/condition-a_receptor.xtc"
)
sim_b_rec = read_structure_features(
    "traj/condition-b_receptor.gro",
    "traj/condition-b_receptor.xtc"
)
sim_a_rec_feat, sim_a_rec_data = sim_a_rec
sim_b_rec_feat, sim_b_rec_data = sim_b_rec

Then combine all backbone torsions from one residue to one multivariate feature. You can also analyze sidechain torsions, just replace bb by sc. Analysis of sidechain torsions takes much more time to run though so for this tutorial, we look only at backbone torsions.

bbtors_res_feat_a, bbtors_res_data_a = get_multivar_res(
    sim_a_rec_feat['bb-torsions'], sim_a_rec_data['bb-torsions']
)
bbtors_res_feat_b, bbtors_res_data_b = get_multivar_res(
    sim_b_rec_feat['bb-torsions'], sim_b_rec_data['bb-torsions']
)

From those residue-based multivariate features, we now determine the state boundaries.

bbtors_states = get_discrete_states(
    bbtors_res_data_a, bbtors_res_data_b
)

Let’s also featurize the water cavities in the system, using the grid we generated in the preprocessing tutorial.

grid = "ab_grid_OH2_density.dx"
water_feat_a, water_data_a = read_water_features(
    "traj/condition-a_water.gro", "traj/condition-a_water_aligned.xtc",
    top_waters = 5, atomgroup = "OH2", grid_input = grid
)
water_feat_b, water_data_b = read_water_features(
    "traj/condition-b_water.gro", "traj/condition-b_water.xtc",
    top_waters = 5, atomgroup = "OH2", grid_input = grid
)

Just like we did for torsions, we can now determine the discrete states for the orientation of the water molecules in the pockets.

water_states = get_discrete_states(
    water_data_a['WaterPocket_Distr'],
    water_data_b['WaterPocket_Distr'],
    discretize='gaussian', pbc=True
)

Water occupancy (is water in the pocket or not?) is described as a binary feature with the values 0 or 1. We can thus define the state boundaries manually.

water_occup_states = [[[-0.1, 0.5, 1.1]]] * len(water_states)

Water Pockets

We start by comparing the occupancy between the two conditions, similar to what we did in the comparison tutorial:

water_names, water_ssi = ssi_ensemble_analysis(
    water_feat_a['WaterPocket_OccupDistr'], water_feat_b['WaterPocket_OccupDistr'],
    water_data_a['WaterPocket_OccupDistr'], water_data_b['WaterPocket_OccupDistr'],
    water_occup_states, verbose=True, h2o=False, pbc=False
)

Except for water site O2, there are not many differences in occupancy, so let’s have a closer look and compare the orientation of the water molecules between the two conditions:

water_names, water_ssi = ssi_ensemble_analysis(
    water_feat_a['WaterPocket_Distr'], water_feat_b['WaterPocket_Distr'],
    water_data_a['WaterPocket_Distr'], water_data_b['WaterPocket_Distr'],
    water_states, verbose=True, h2o=True
)

Beyond comparing distributions, we can also quantify the amount of feature-feature communication between the water sites. The corresponding function looks similar to the ensemble comarison above, however it calculates the amount of shared information between each pair of residues.

water_pairs_names, water_pairs_ssi = ssi_feature_analysis(
    water_feat_a['WaterPocket_Distr'], water_feat_b['WaterPocket_Distr'],
    water_data_a['WaterPocket_Distr'], water_data_b['WaterPocket_Distr'],
    water_states, verbose=True, h2o=True
)

This function produces an array for the SSI between all features. This array is two-dimensional, so we visualize its values using a heat map:

from pensa.comparison import resnum_heatmap, pair_features_heatmap

pair_features_heatmap(
    water_pairs_names, water_pairs_ssi,
    "plots/water-pairs_ssi.pdf",
    vmin = 0.0, vmax = 1.,
    cbar_label='SSI',
    separator=' & '
)

Water and Torsions Combined

Now we want to investigate information flow between more than one type of features. To do so, we combine the water sites and the torsions:

all_feat_a = water_feat_a['WaterPocket_Distr'] + bbtors_res_feat_a
all_feat_b = water_feat_b['WaterPocket_Distr'] + bbtors_res_feat_b
all_data_a = np.array(list(water_data_a['WaterPocket_Distr']) + list(bbtors_res_data_a), dtype=object)
all_data_b = np.array(list(water_data_b['WaterPocket_Distr']) + list(bbtors_res_data_b), dtype=object)
all_states = water_states + bbtors_states

Note that we only use backbone torsions here to minimize computational effort. Analysis of sidechain torsions (or of both combined) can often deliver more scientific insigts.

As we did for the water sites above, we now calculate the SSI for all combined feature-feature pairs.

all_pairs_names, all_pairs_ssi = ssi_feature_analysis(
    all_feat_a, all_feat_b,
    all_data_a, all_data_b,
    all_states, verbose=True
)

The number of pairs for an entire protein is enormous so we determine those with the highest SSI. To alleviate the computational effort for the sort function, we first filter the pairs by a threshold SSI of 0.5:

relevant = np.abs(all_pairs_ssi) > 0.5
not_self = np.array([name.split(' & ')[0] != name.split(' & ')[1] for name in all_pairs_names])
relevant *= not_self
argrelev = np.argwhere(relevant).flatten()
all_relevant_pairs_names = [all_pairs_names[i] for i in argrelev]
all_relevant_pairs_ssi = all_pairs_ssi[relevant]

Then we run the actual sorting by SSI.

sort_features(all_relevant_pairs_names, all_relevant_pairs_ssi)

The Co-SSI feature-feature-ensemble analysis is done in the same manner.

all_new_pairs_names, all_new_pairs_ssi, all_new_pairs_cossi = cossi_featens_analysis(
    all_feat_a, all_feat_b, all_feat_a, all_feat_b,
    all_data_a, all_data_b, all_data_a, all_data_b,
    all_states, all_states, verbose=True
)

The output of cossi_featens_analysis() produces an array for the SSI and the Co-SSI between all features. These results can again be visualized in a 2D representation.