fennomix_mhc.plotting_utils module

Functions:

`adjust_axes`(logo_plots, max_y)	Adjusts y-axis limits of logo plots to a common maximum.
`count_motif_bits`(df, allele_col, allele, kmer)	Counts amino acid frequencies at each position and computes information content.
`fit_hla_umap_reducer`(hla_embeds[, random_state])	Fits a UMAP reducer on HLA embeddings.
`plot_motif`(df, allele_col, allele, kmer[, ...])	Plots a sequence logo for a specific allele and k-mer length.
`plot_motif_multi_mer`(df, allele_col, allele, ...)	Plots sequence logos for multiple k-mers (lengths) of a given allele.
`plot_umap_df`(df, color_col, hover_col[, ...])	Creates an interactive UMAP scatter plot using Plotly.
`select_optimal_a_cover`(a_to_b_map, ...)	Greedy selection of 'a' elements to maximize coverage of 'b' elements.
`transform_embeds_to_tSNE_df`(embeds, labels, seed)	Applies t-SNE transformation on high-dimensional embeddings.
`transform_embeds_to_umap_df`(hla_reducer, ...)	Transforms embeddings using a pre-fitted UMAP reducer into a 2D DataFrame.
`transform_matrix_to_mds_df`(matrix, labels, seed)	Applies MDS transformation on a precomputed distance matrix.

fennomix_mhc.plotting_utils.adjust_axes(logo_plots, max_y)

Adjusts y-axis limits of logo plots to a common maximum.

Parameters:

logo_plots – List of Logomaker Logo objects (or nested lists).
max_y – Target maximum y-value for all plots.

fennomix_mhc.plotting_utils.count_motif_bits(df, allele_col, allele, kmer, logo_scale=20)

Counts amino acid frequencies at each position and computes information content.

Parameters:

df – Input DataFrame with sequences.
allele_col – Column name for allele.
allele – Allele to filter.
kmer – Peptide length to consider.
logo_scale – Scaling factor for log-odds (controls logo height).

Returns:

A DataFrame of information content per position (rows) and amino acid (columns).
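
To make "information content per position" concrete, the sketch below uses the conventional Shannon definition for sequence logos: the information at a position is log2(20) minus the entropy of its amino acid frequencies, and each letter's height is its frequency times that information. This is an assumption about the general technique, not a description of how `count_motif_bits` computes its values; in particular it omits `logo_scale`, and the name `position_information` is hypothetical. It returns plain dictionaries rather than a DataFrame.

```python
import math
from collections import Counter

# The 20 standard amino acids; any other character is ignored in this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_information(peptides, kmer):
    """Return one dict per position mapping amino acid -> letter height in bits."""
    # Keep only peptides of the requested length.
    seqs = [p for p in peptides if len(p) == kmer]
    max_bits = math.log2(len(AMINO_ACIDS))
    rows = []
    for pos in range(kmer):
        counts = Counter(s[pos] for s in seqs if s[pos] in AMINO_ACIDS)
        total = sum(counts.values())
        if total == 0:
            # No usable residues at this position: all heights are zero.
            rows.append({aa: 0.0 for aa in AMINO_ACIDS})
            continue
        freqs = {aa: counts[aa] / total for aa in AMINO_ACIDS}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = max_bits - entropy  # total information at this position
        rows.append({aa: f * info for aa, f in freqs.items()})
    return rows


rows = position_information(["ACD", "ACE", "AAAA"], kmer=3)
print(round(rows[0]["A"], 3))  # fully conserved position: log2(20) bits
print(round(rows[2]["D"], 3))  # D and E split the position evenly
```

A fully conserved position reaches the maximum of log2(20) bits, while a position where all residues are equally likely contributes nothing.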

fennomix_mhc.plotting_utils.fit_hla_umap_reducer(hla_embeds, random_state=1337)

Fits a UMAP reducer on HLA embeddings.

Parameters:

hla_embeds – A 2D numpy array of shape (n_samples, n_features) containing HLA embeddings.
random_state – Random seed for reproducibility. Default is 1337.

Returns:

A fitted UMAP reducer object.

fennomix_mhc.plotting_utils.plot_motif(df, allele_col, allele, kmer, ax=None, logo_scale=20)

Plots a sequence logo for a specific allele and k-mer length.

Parameters:

df – Input DataFrame with sequences and allele info.
allele_col – Column name for allele.
allele – Allele name to filter.
kmer – Length of peptide to analyze.
ax – Matplotlib axis to plot on. If None, uses current axis.
logo_scale – Scaling factor for motif information content.

Returns:

A Logomaker Logo object.

fennomix_mhc.plotting_utils.plot_motif_multi_mer(df, allele_col, allele, kmers, axes=None, logo_scale=20, fig_width_per_kmer=4, fig_height=3)

Plots sequence logos for multiple k-mers (lengths) of a given allele.

Parameters:

df – Input DataFrame with 'sequence' and allele columns.
allele_col – Name of the column containing allele identifiers.
allele – Specific allele to plot.
kmers – List of peptide lengths (k-mer sizes) to visualize.
axes – Optional matplotlib axes to plot on. If None, creates new subplots.
logo_scale – Scaling factor for information content in logos.
fig_width_per_kmer – Width per subplot in inches.
fig_height – Height of the entire figure in inches.

Returns:

A list of Logomaker Logo objects.

fennomix_mhc.plotting_utils.plot_umap_df(df, color_col, hover_col, size=1, jump_color=True, image_width=700, image_height=600, save_as='')

Creates an interactive UMAP scatter plot using Plotly.

Parameters:

df (DataFrame) – Input DataFrame containing UMAP1, UMAP2, and metadata columns.
color_col – Column name to use for coloring points.
hover_col – Column name to display in hover tooltip.
size – Marker size.
jump_color – Whether to use spaced-out colors from Turbo256 palette.
image_width – Width of the output image in pixels.
image_height – Height of the output image in pixels.
save_as – Optional file path to save the image (e.g., “plot.png”).

Returns:

A Plotly Figure object.
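
One plausible reading of "spaced-out colors" for `jump_color` is that the categories are assigned palette entries spread evenly across the 256 colors, rather than the first few neighboring ones. The sketch below only computes such indices. The function name `spaced_palette_indices` is hypothetical, and the exact selection rule used by `plot_umap_df` is not stated in this reference, so treat this as an assumption.

```python
def spaced_palette_indices(n_colors, palette_size=256):
    """Pick n_colors indices spread evenly from 0 to palette_size - 1."""
    if n_colors <= 0:
        return []
    if n_colors == 1:
        return [0]
    # Integer arithmetic keeps the result deterministic and always
    # includes both ends of the palette.
    return [(i * (palette_size - 1)) // (n_colors - 1) for i in range(n_colors)]


print(spaced_palette_indices(4))  # [0, 85, 170, 255]
```

Indexing a 256-entry palette with these positions gives neighboring categories clearly different hues, which helps when many alleles share one plot.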

fennomix_mhc.plotting_utils.select_optimal_a_cover(a_to_b_map, uncovered_b, coverage_threshold, max_a_elements)

Greedy selection of ‘a’ elements to maximize coverage of ‘b’ elements.

Implements a greedy set cover algorithm: at each iteration, it selects the ‘a’ element whose mapped ‘b’ set covers the most still-uncovered ‘b’ elements.

Parameters:

a_to_b_map – Mapping from elements of set A to sets of elements in B.
uncovered_b – Initial set of uncovered elements in B.
coverage_threshold – Target fraction of B to cover (e.g., 0.9 for 90%).
max_a_elements – Maximum number of ‘a’ elements to select.

Returns:

A set of selected ‘a’ elements.
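
The greedy strategy described above can be sketched in a few lines. This is a generic greedy set cover under stated assumptions, not the module's implementation: the name `greedy_cover` is hypothetical, the threshold is interpreted as a fraction of the initial `uncovered_b`, and ties are broken by dictionary order.

```python
def greedy_cover(a_to_b_map, uncovered_b, coverage_threshold, max_a_elements):
    """Greedily pick keys of a_to_b_map until enough of uncovered_b is covered."""
    remaining = set(uncovered_b)
    total = len(remaining)
    selected = set()
    if total == 0:
        return selected
    while len(selected) < max_a_elements:
        # Stop once the covered fraction reaches the target.
        if 1 - len(remaining) / total >= coverage_threshold:
            break
        best_a, best_gain = None, 0
        for a, b_set in a_to_b_map.items():
            if a in selected:
                continue
            gain = len(remaining & set(b_set))  # newly covered elements
            if gain > best_gain:
                best_a, best_gain = a, gain
        if best_a is None:
            # No candidate covers anything new; further picks would be useless.
            break
        selected.add(best_a)
        remaining -= set(a_to_b_map[best_a])
    return selected


a_to_b = {"a1": {1, 2, 3}, "a2": {3, 4}, "a3": {4, 5, 6, 7}}
print(sorted(greedy_cover(a_to_b, {1, 2, 3, 4, 5, 6, 7}, 1.0, 3)))  # ['a1', 'a3']
```

Greedy selection does not guarantee the smallest possible cover, but it is simple and usually close, which is why the cap `max_a_elements` and the early stop at `coverage_threshold` are useful in practice.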

fennomix_mhc.plotting_utils.transform_embeds_to_tSNE_df(embeds, labels, seed)

Applies t-SNE transformation on high-dimensional embeddings.

Parameters:

embeds – A 2D numpy array of shape (n_samples, n_features).
labels – A sequence of labels for each sample.
seed – Random state for reproducibility.

Returns:

A pandas DataFrame with columns ['t-SNE 1', 't-SNE 2', 'label'].

fennomix_mhc.plotting_utils.transform_embeds_to_umap_df(hla_reducer, embeds, alleles)

Transforms embeddings using a pre-fitted UMAP reducer into a 2D DataFrame.

Parameters:

hla_reducer – A fitted UMAP reducer object.
embeds – A 2D numpy array of shape (n_samples, n_features) to be transformed.
alleles – A sequence of allele names corresponding to each embedding.

Returns:

A pandas DataFrame with columns ['UMAP1', 'UMAP2', 'allele'].

fennomix_mhc.plotting_utils.transform_matrix_to_mds_df(matrix, labels, seed)

Applies MDS transformation on a precomputed distance matrix.

Parameters:

matrix – A 2D square distance matrix of shape (n_samples, n_samples).
labels – A sequence of labels for each sample.
seed – Random state for reproducibility.

Returns:

A pandas DataFrame with columns ['MDS1', 'MDS2', 'label'].