fennomix_mhc.plotting_utils module
Functions:
| adjust_axes | Adjusts y-axis limits of logo plots to a common maximum. |
| count_motif_bits | Counts amino acid frequencies at each position and computes information content. |
| fit_hla_umap_reducer | Fits a UMAP reducer on HLA embeddings. |
| plot_motif | Plots a sequence logo for a specific allele and k-mer length. |
| plot_motif_multi_mer | Plots sequence logos for multiple k-mers (lengths) of a given allele. |
| plot_umap_df | Creates an interactive UMAP scatter plot using Plotly. |
| select_optimal_a_cover | Greedy selection of 'a' elements to maximize coverage of 'b' elements. |
| transform_embeds_to_tSNE_df | Applies t-SNE transformation on high-dimensional embeddings. |
| transform_embeds_to_umap_df | Transforms embeddings using a pre-fitted UMAP reducer into a 2D DataFrame. |
| transform_matrix_to_mds_df | Applies MDS transformation on a precomputed distance matrix. |
- fennomix_mhc.plotting_utils.adjust_axes(logo_plots, max_y)
Adjusts y-axis limits of logo plots to a common maximum.
- Parameters:
logo_plots – List of Logomaker Logo objects (or nested lists).
max_y – Target maximum y-value for all plots.
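As a minimal sketch of what such a helper might do (not the package's actual implementation), the snippet below walks a possibly nested list and applies one shared y-range to every plot. It assumes each logo object exposes its Matplotlib axes as `.ax`, as Logomaker's `Logo` does, and that the lower limit is 0. The names `set_common_ylim` and `_FakeAxes` are hypothetical and exist only so the example runs without drawing anything.

```python
from types import SimpleNamespace


def set_common_ylim(logo_plots, max_y, min_y=0.0):
    """Recursively apply the same y-limits to every logo in a nested list."""
    for item in logo_plots:
        if isinstance(item, (list, tuple)):
            # Nested list: descend into it.
            set_common_ylim(item, max_y, min_y)
        else:
            # Assumption: the object carries a Matplotlib Axes in `.ax`.
            item.ax.set_ylim(min_y, max_y)


class _FakeAxes:
    """Stand-in for a Matplotlib Axes so the sketch runs without plotting."""

    def __init__(self):
        self.ylim = None

    def set_ylim(self, bottom, top):
        self.ylim = (bottom, top)


# Two stand-in logos, one of them inside a nested list.
logos = [SimpleNamespace(ax=_FakeAxes()), [SimpleNamespace(ax=_FakeAxes())]]
set_common_ylim(logos, max_y=4.0)
```

With real plots, the list would hold the objects returned by plot_motif or plot_motif_multi_mer instead of the stand-ins.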
- fennomix_mhc.plotting_utils.count_motif_bits(df, allele_col, allele, kmer, logo_scale=20)
Counts amino acid frequencies at each position and computes information content.
- Parameters:
df – Input DataFrame with sequences.
allele_col – Column name for allele.
allele – Allele to filter.
kmer – Peptide length to consider.
logo_scale – Scaling factor for log-odds (controls logo height).
- Returns:
A DataFrame of information content per position (rows) and amino acid (columns).
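To make "information content per position" concrete, here is a sketch of the classic sequence-logo calculation, in which each amino acid's height is its frequency times (log2(20) minus the Shannon entropy of the position). This is an assumption about the general technique: the package's own formula, including how `logo_scale` enters, may differ. The function `information_content` and the example peptides are hypothetical, and plain dicts stand in for the DataFrame.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_content(peptides):
    """Return one dict per position mapping amino acid -> height in bits."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    max_bits = math.log2(len(AMINO_ACIDS))  # upper bound for 20 letters
    rows = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        total = sum(counts.values())
        freqs = {aa: n / total for aa, n in counts.items()}
        # Shannon entropy of the observed residues at this position.
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy
        # Letter height = frequency * information content of the position.
        rows.append({aa: freqs.get(aa, 0.0) * info for aa in AMINO_ACIDS})
    return rows


# Hypothetical 9-mers sharing a conserved first position.
peptides = ["ALDKAYVLL", "ALNKAYVLV", "ALDEAYSLL", "ALDKMYVLL"]
bits = information_content(peptides)
```

A fully conserved position reaches the maximum of log2(20) bits, while positions with mixed residues score lower, which is what makes anchor residues stand out in a logo.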
- fennomix_mhc.plotting_utils.fit_hla_umap_reducer(hla_embeds, random_state=1337)
Fits a UMAP reducer on HLA embeddings.
- Parameters:
hla_embeds – A 2D numpy array of shape (n_samples, n_features) containing HLA embeddings.
random_state – Random seed for reproducibility. Default is 1337.
- Returns:
A fitted UMAP reducer object.
- fennomix_mhc.plotting_utils.plot_motif(df, allele_col, allele, kmer, ax=None, logo_scale=20)
Plots a sequence logo for a specific allele and k-mer length.
- Parameters:
df – Input DataFrame with sequences and allele info.
allele_col – Column name for allele.
allele – Allele name to filter.
kmer – Length of peptide to analyze.
ax – Matplotlib axis to plot on. If None, uses current axis.
logo_scale – Scaling factor for motif information content.
- Returns:
A Logomaker Logo object.
- fennomix_mhc.plotting_utils.plot_motif_multi_mer(df, allele_col, allele, kmers, axes=None, logo_scale=20, fig_width_per_kmer=4, fig_height=3)
Plots sequence logos for multiple k-mers (lengths) of a given allele.
- Parameters:
df – Input DataFrame with ‘sequence’ and allele columns.
allele_col – Name of the column containing allele identifiers.
allele – Specific allele to plot.
kmers – List of peptide lengths (k-mer sizes) to visualize.
axes – Optional matplotlib axes to plot on. If None, creates new subplots.
logo_scale – Scaling factor for information content in logos.
fig_width_per_kmer – Width per subplot in inches.
fig_height – Height of the entire figure in inches.
- Returns:
A list of Logomaker Logo objects.
- fennomix_mhc.plotting_utils.plot_umap_df(df, color_col, hover_col, size=1, jump_color=True, image_width=700, image_height=600, save_as='')
Creates an interactive UMAP scatter plot using Plotly.
- Parameters:
df (DataFrame) – Input DataFrame containing UMAP1, UMAP2, and metadata columns.
color_col – Column name to use for coloring points.
hover_col – Column name to display in hover tooltip.
size – Marker size.
jump_color – Whether to use spaced-out colors from Turbo256 palette.
image_width – Width of the output image in pixels.
image_height – Height of the output image in pixels.
save_as – Optional file path to save the image (e.g., “plot.png”).
- Returns:
A Plotly Figure object.
- fennomix_mhc.plotting_utils.select_optimal_a_cover(a_to_b_map, uncovered_b, coverage_threshold, max_a_elements)
Greedy selection of ‘a’ elements to maximize coverage of ‘b’ elements.
Implements a greedy set cover algorithm: at each iteration, selects the ‘a’ element whose mapped ‘b’ set covers the most still-uncovered elements.
- Parameters:
a_to_b_map – Mapping from elements of set A to sets of elements in B.
uncovered_b – Initial set of uncovered elements in B.
coverage_threshold – Target fraction of B to cover (e.g., 0.9 for 90%).
max_a_elements – Maximum number of ‘a’ elements to select.
- Returns:
A set of selected ‘a’ elements.
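The greedy strategy described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the package's source: it assumes selection stops once the covered fraction of the initial `uncovered_b` reaches `coverage_threshold`, once `max_a_elements` have been chosen, or once no remaining candidate adds coverage. The function name `greedy_cover` and the example mapping are hypothetical.

```python
def greedy_cover(a_to_b_map, uncovered_b, coverage_threshold, max_a_elements):
    """Greedily pick keys of a_to_b_map until enough of uncovered_b is covered."""
    remaining = set(uncovered_b)
    total = len(remaining)
    selected = set()
    while remaining and len(selected) < max_a_elements:
        covered_fraction = 1 - len(remaining) / total
        if covered_fraction >= coverage_threshold:
            break
        # Pick the unselected 'a' covering the most still-uncovered 'b' elements.
        best = max(
            (a for a in a_to_b_map if a not in selected),
            key=lambda a: len(remaining & set(a_to_b_map[a])),
            default=None,
        )
        if best is None or not remaining & set(a_to_b_map[best]):
            break  # no candidate adds coverage
        selected.add(best)
        remaining -= set(a_to_b_map[best])
    return selected


# Hypothetical mapping from alleles ('a') to peptide ids ('b').
a_to_b = {
    "A*02:01": {1, 2, 3, 4},
    "B*07:02": {4, 5, 6},
    "C*07:01": {7},
    "A*01:01": {1},
}
chosen = greedy_cover(a_to_b, set(range(1, 8)), coverage_threshold=0.8, max_a_elements=3)
```

Greedy selection does not guarantee the smallest possible cover, but it is simple and gives a well-known logarithmic approximation for set cover, which is usually adequate for choosing a representative subset to plot.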
- fennomix_mhc.plotting_utils.transform_embeds_to_tSNE_df(embeds, labels, seed)
Applies t-SNE transformation on high-dimensional embeddings.
- Parameters:
embeds – A 2D numpy array of shape (n_samples, n_features).
labels – A sequence of labels for each sample.
seed – Random state for reproducibility.
- Returns:
A pandas DataFrame with columns ['t-SNE 1', 't-SNE 2', 'label'].
- fennomix_mhc.plotting_utils.transform_embeds_to_umap_df(hla_reducer, embeds, alleles)
Transforms embeddings using a pre-fitted UMAP reducer into a 2D DataFrame.
- Parameters:
hla_reducer – A fitted UMAP reducer object.
embeds – A 2D numpy array of shape (n_samples, n_features) to be transformed.
alleles – A sequence of allele names corresponding to each embedding.
- Returns:
A pandas DataFrame with columns ['UMAP1', 'UMAP2', 'allele'].
- fennomix_mhc.plotting_utils.transform_matrix_to_mds_df(matrix, labels, seed)
Applies MDS transformation on a precomputed distance matrix.
- Parameters:
matrix – A 2D square distance matrix of shape (n_samples, n_samples).
labels – A sequence of labels for each sample.
seed – Random state for reproducibility.
- Returns:
A pandas DataFrame with columns ['MDS1', 'MDS2', 'label'].