fennomix_mhc.plotting_utils module

Functions:

adjust_axes(logo_plots, max_y)

Adjusts y-axis limits of logo plots to a common maximum.

count_motif_bits(df, allele_col, allele, kmer)

Counts amino acid frequencies at each position and computes information content.

fit_hla_umap_reducer(hla_embeds[, random_state])

Fits a UMAP reducer on HLA embeddings.

plot_motif(df, allele_col, allele, kmer[, ...])

Plots a sequence logo for a specific allele and k-mer length.

plot_motif_multi_mer(df, allele_col, allele, ...)

Plots sequence logos for multiple k-mers (lengths) of a given allele.

plot_umap_df(df, color_col, hover_col[, ...])

Creates an interactive UMAP scatter plot using Plotly.

select_optimal_a_cover(a_to_b_map, ...)

Greedy selection of 'a' elements to maximize coverage of 'b' elements.

transform_embeds_to_tSNE_df(embeds, labels, seed)

Applies t-SNE transformation on high-dimensional embeddings.

transform_embeds_to_umap_df(hla_reducer, ...)

Transforms embeddings using a pre-fitted UMAP reducer into a 2D DataFrame.

transform_matrix_to_mds_df(matrix, labels, seed)

Applies MDS transformation on a precomputed distance matrix.

fennomix_mhc.plotting_utils.adjust_axes(logo_plots, max_y)

Adjusts y-axis limits of logo plots to a common maximum.

Parameters:
  • logo_plots – List of Logomaker Logo objects (or nested lists).

  • max_y – Target maximum y-value for all plots.
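The note that logo_plots may contain nested lists suggests a recursive walk over the input. The following is a minimal sketch of that idea, not the library's implementation. It assumes each Logo exposes its Matplotlib axis as `.ax` (as Logomaker Logo objects do) and that the lower limit is 0; the function name `set_common_ylim_sketch` and the two stand-in classes are hypothetical and exist only so the example runs without a plotting backend.

```python
class _RecordingAxis:
    """Stand-in for a Matplotlib Axes; records the limits it is given."""

    def __init__(self):
        self.ylim = None

    def set_ylim(self, bottom, top):
        self.ylim = (bottom, top)


class _FakeLogo:
    """Stand-in for a Logomaker Logo; only the `.ax` attribute is modelled."""

    def __init__(self):
        self.ax = _RecordingAxis()


def set_common_ylim_sketch(logo_plots, max_y):
    """Apply the same y-range to every logo, descending into nested lists."""
    for item in logo_plots:
        if isinstance(item, (list, tuple)):
            set_common_ylim_sketch(item, max_y)
        else:
            # Assumption: the shared range starts at 0.
            item.ax.set_ylim(0, max_y)


# One logo at the top level plus a nested list of two logos.
nested = [_FakeLogo(), [_FakeLogo(), _FakeLogo()]]
set_common_ylim_sketch(nested, 4.0)
```

Sharing one y-range makes letter heights comparable across logos, which matters when logos for several peptide lengths are shown side by side.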

fennomix_mhc.plotting_utils.count_motif_bits(df, allele_col, allele, kmer, logo_scale=20)

Counts amino acid frequencies at each position and computes information content.

Parameters:
  • df – Input DataFrame with sequences.

  • allele_col – Column name for allele.

  • allele – Allele to filter.

  • kmer – Peptide length to consider.

  • logo_scale – Scaling factor for log-odds (controls logo height).

Returns:

A DataFrame of information content per position (rows) and amino acid (columns).
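To illustrate what "information content per position" usually means for a sequence logo, here is a minimal pure-Python sketch. It is an assumption-laden illustration, not this function's actual computation: it uses the standard Shannon formulation with a uniform background over the 20 standard amino acids, returns a list of dicts rather than a DataFrame, and leaves out logo_scale because its exact role is not specified here. The name `motif_bits_sketch` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def motif_bits_sketch(peptides, kmer):
    """Return one {amino_acid: height_in_bits} dict per peptide position."""
    # Keep only peptides of the requested length.
    seqs = [p for p in peptides if len(p) == kmer]
    if not seqs:
        return []
    max_bits = math.log2(len(AMINO_ACIDS))  # about 4.32 bits for 20 letters
    rows = []
    for pos in range(kmer):
        counts = Counter(s[pos] for s in seqs)
        freqs = {aa: counts.get(aa, 0) / len(seqs) for aa in AMINO_ACIDS}
        # Shannon entropy of the observed residue distribution at this position.
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        info = max_bits - entropy
        # Each letter's height is its frequency times the position's information.
        rows.append({aa: p * info for aa, p in freqs.items()})
    return rows


# Three 9-mers plus one 8-mer that the length filter drops.
peptides = ["ALDKFLHKV", "SLDKFLHKV", "GLYKFLHEV", "SIINFEKL"]
rows = motif_bits_sketch(peptides, 9)
```

In this toy input every 9-mer has L at the second position, so that position carries the full log2(20) bits, while the first position (A, S, G once each) carries less.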

fennomix_mhc.plotting_utils.fit_hla_umap_reducer(hla_embeds, random_state=1337)

Fits a UMAP reducer on HLA embeddings.

Parameters:
  • hla_embeds – A 2D numpy array of shape (n_samples, n_features) containing HLA embeddings.

  • random_state – Random seed for reproducibility. Default is 1337.

Returns:

A fitted UMAP reducer object.

fennomix_mhc.plotting_utils.plot_motif(df, allele_col, allele, kmer, ax=None, logo_scale=20)

Plots a sequence logo for a specific allele and k-mer length.

Parameters:
  • df – Input DataFrame with sequences and allele info.

  • allele_col – Column name for allele.

  • allele – Allele name to filter.

  • kmer – Length of peptide to analyze.

  • ax – Matplotlib axis to plot on. If None, uses current axis.

  • logo_scale – Scaling factor for motif information content.

Returns:

A Logomaker Logo object.

fennomix_mhc.plotting_utils.plot_motif_multi_mer(df, allele_col, allele, kmers, axes=None, logo_scale=20, fig_width_per_kmer=4, fig_height=3)

Plots sequence logos for multiple k-mers (lengths) of a given allele.

Parameters:
  • df – Input DataFrame with ‘sequence’ and allele columns.

  • allele_col – Name of the column containing allele identifiers.

  • allele – Specific allele to plot.

  • kmers – List of peptide lengths (k-mer sizes) to visualize.

  • axes – Optional matplotlib axes to plot on. If None, creates new subplots.

  • logo_scale – Scaling factor for information content in logos.

  • fig_width_per_kmer – Width per subplot in inches.

  • fig_height – Height of the entire figure in inches.

Returns:

A list of Logomaker Logo objects.

fennomix_mhc.plotting_utils.plot_umap_df(df, color_col, hover_col, size=1, jump_color=True, image_width=700, image_height=600, save_as='')

Creates an interactive UMAP scatter plot using Plotly.

Parameters:
  • df (DataFrame) – Input DataFrame containing UMAP1, UMAP2, and metadata columns.

  • color_col – Column name to use for coloring points.

  • hover_col – Column name to display in hover tooltip.

  • size – Marker size.

  • jump_color – Whether to use spaced-out colors from Turbo256 palette.

  • image_width – Width of the output image in pixels.

  • image_height – Height of the output image in pixels.

  • save_as – Optional file path to save the image (e.g., "plot.png").

Returns:

A Plotly Figure object.
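One plausible reading of "spaced-out colors" for jump_color is that category colors are taken at evenly spaced positions in the 256-entry palette rather than from adjacent entries. The sketch below shows only that index arithmetic; it is an assumption about the intent, not the function's actual logic, and the name `spaced_palette_indices` is hypothetical.

```python
def spaced_palette_indices(n_colors, palette_size=256):
    """Pick n_colors indices spread evenly across a palette of palette_size entries."""
    if n_colors <= 0:
        return []
    if n_colors == 1:
        return [0]
    last = palette_size - 1
    # Integer arithmetic keeps the first index at 0 and the last at palette_size - 1.
    return [(i * last) // (n_colors - 1) for i in range(n_colors)]


indices = spaced_palette_indices(5)  # [0, 63, 127, 191, 255]
```

With few categories, neighbouring entries of a continuous palette such as Turbo256 are nearly indistinguishable, so spreading the indices keeps the groups visually separable.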

fennomix_mhc.plotting_utils.select_optimal_a_cover(a_to_b_map, uncovered_b, coverage_threshold, max_a_elements)

Greedy selection of ‘a’ elements to maximize coverage of ‘b’ elements.

Implements a greedy set cover algorithm: at each iteration, selects the ‘a’ element whose mapped ‘b’ set covers the most still-uncovered elements.

Parameters:
  • a_to_b_map – Mapping from elements of set A to sets of elements in B.

  • uncovered_b – Initial set of uncovered elements in B.

  • coverage_threshold – Target fraction of B to cover (e.g., 0.9 for 90%).

  • max_a_elements – Maximum number of ‘a’ elements to select.

Returns:

A set of selected ‘a’ elements.
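The greedy procedure described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's implementation: it assumes coverage_threshold is measured as a fraction of the initial uncovered_b set and that selection stops once the threshold is reached, max_a_elements have been chosen, or no remaining element adds coverage. The name `greedy_cover_sketch` and the example data are hypothetical.

```python
def greedy_cover_sketch(a_to_b_map, uncovered_b, coverage_threshold, max_a_elements):
    """Greedily pick keys of a_to_b_map until enough of uncovered_b is covered."""
    remaining = set(uncovered_b)
    total = len(remaining)
    selected = set()
    if total == 0:
        return selected
    while len(selected) < max_a_elements:
        # Assumption: coverage is relative to the initial uncovered set.
        covered_fraction = 1 - len(remaining) / total
        if covered_fraction >= coverage_threshold:
            break
        # Find the unselected 'a' whose mapped set covers the most remaining 'b'.
        best_a, best_gain = None, 0
        for a, b_set in a_to_b_map.items():
            if a in selected:
                continue
            gain = len(remaining & set(b_set))
            if gain > best_gain:
                best_a, best_gain = a, gain
        if best_a is None:
            break  # nothing left adds coverage
        selected.add(best_a)
        remaining -= set(a_to_b_map[best_a])
    return selected


example_map = {
    "a1": {1, 2, 3},
    "a2": {3, 4},
    "a3": {4, 5, 6, 7},
    "a4": {1, 7},
}
chosen = greedy_cover_sketch(example_map, set(range(1, 8)), 0.9, 3)
```

Here "a3" is picked first because it covers four of the seven elements, then "a1" covers the remaining three, so the 90% target is met with two selections. Greedy selection does not guarantee the smallest possible cover, but it is simple and usually close in practice.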

fennomix_mhc.plotting_utils.transform_embeds_to_tSNE_df(embeds, labels, seed)

Applies t-SNE transformation on high-dimensional embeddings.

Parameters:
  • embeds – A 2D numpy array of shape (n_samples, n_features).

  • labels – A sequence of labels for each sample.

  • seed – Random state for reproducibility.

Returns:

A pandas DataFrame with columns ['t-SNE 1', 't-SNE 2', 'label'].

fennomix_mhc.plotting_utils.transform_embeds_to_umap_df(hla_reducer, embeds, alleles)

Transforms embeddings using a pre-fitted UMAP reducer into a 2D DataFrame.

Parameters:
  • hla_reducer – A fitted UMAP reducer object.

  • embeds – A 2D numpy array of shape (n_samples, n_features) to be transformed.

  • alleles – A sequence of allele names corresponding to each embedding.

Returns:

A pandas DataFrame with columns ['UMAP1', 'UMAP2', 'allele'].

fennomix_mhc.plotting_utils.transform_matrix_to_mds_df(matrix, labels, seed)

Applies MDS transformation on a precomputed distance matrix.

Parameters:
  • matrix – A 2D square distance matrix of shape (n_samples, n_samples).

  • labels – A sequence of labels for each sample.

  • seed – Random state for reproducibility.

Returns:

A pandas DataFrame with columns ['MDS1', 'MDS2', 'label'].