fennomix_mhc.tda_fmm module

Classes:

DecoyModel(gaussian_outlier_sigma, *args, ...)

A simplified model for fitting decoy score distributions.

TDA_fmm(n_components[, external_model])

Finite Mixture Model (FMM) for Target-Decoy Analysis (TDA).

Functions:

gamma_pdf(X, u, sigma)

Calculate Gamma probability density function values using mean and std.

gauss_pdf(X, u, sigma)

Calculate Gaussian probability density function values for input array X.

select_best_fmm(target_scores, decoy_fmm[, ...])

Selects the best TDA_fmm model by BIC criterion.

class fennomix_mhc.tda_fmm.DecoyModel(gaussian_outlier_sigma, *args, **kwargs)

Bases: TDA_fmm

A simplified model for fitting decoy score distributions.

Uses a single Gaussian, optionally filtering left-tail outliers with a sigma threshold.

Methods:

__init__(gaussian_outlier_sigma, *args, **kwargs)

Initializes the decoy model.

fit(X)

Fits a single Gaussian to the decoy scores.

pdf(X)

Computes Gaussian PDF for given scores.

__init__(gaussian_outlier_sigma, *args, **kwargs)

Initializes the decoy model.

Parameters:
  • gaussian_outlier_sigma (float | None) – If provided, scores below (mu - sigma * gaussian_outlier_sigma) are filtered before fitting.

  • *args (Any) – Ignored, for compatibility.

  • **kwargs – Ignored, for compatibility.

fit(X)

Fits a single Gaussian to the decoy scores.

Optionally filters left-tail outliers before fitting.

Parameters:

X (ndarray | list[float]) – Decoy scores.

Return type:

None
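A minimal sketch of the outlier-filtered fit described above, assuming that mu and sigma in the threshold come from a first fit on all scores, that the filter is applied once, and that the population standard deviation (NumPy's default ddof=0) is used. fit_decoy_gaussian is a hypothetical helper, not part of fennomix_mhc.

```python
import numpy as np


def fit_decoy_gaussian(X, gaussian_outlier_sigma=None):
    """Hypothetical helper mirroring DecoyModel.fit; returns (mu, sigma)."""
    X = np.asarray(X, dtype=float)
    if gaussian_outlier_sigma is not None:
        # First pass (assumed): estimate mu and sigma on all decoy scores.
        mu, sigma = X.mean(), X.std()
        # Drop only left-tail outliers; the right tail is kept untouched.
        X = X[X >= mu - sigma * gaussian_outlier_sigma]
    # Final single-Gaussian fit on the (possibly filtered) scores.
    return X.mean(), X.std()
```

With scores [1, 2, 3, 4, 5, -100] and gaussian_outlier_sigma=1.0, the score -100 falls below the threshold and the refit gives mu = 3.0 and sigma = sqrt(2).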

pdf(X)

Computes Gaussian PDF for given scores.

Parameters:

X (ndarray | list[float]) – Input scores.

Return type:

ndarray

Returns:

PDF values.

class fennomix_mhc.tda_fmm.TDA_fmm(n_components, external_model=None)

Bases: object

Finite Mixture Model (FMM) for Target-Decoy Analysis (TDA).

This class estimates score distributions using a mixture of Gaussians. It supports modeling both target and decoy distributions, where the decoy model can be incorporated as an external component in the target model.

Attributes:

n_components

Number of Gaussian components in the mixture.

external_model

Optional fitted decoy model (for target modeling).

max_iter

Maximum number of EM iterations.

main_pdf

PDF function used for the first component.

helper_pdf

PDF function used for other components.

weights

Learned mixture weights (pi_k).

mu

Learned means for each component.

sigma

Learned standard deviations for each component.

Methods:

__init__(n_components[, external_model])

Initializes the TDA_fmm model.

fit(X)

Fits the FMM model using Expectation-Maximization (EM) algorithm.

get_pi0()

Returns the estimated proportion of decoy (null) components in the mixture.

loglik_BIC(X)

Computes log-likelihood and Bayesian Information Criterion (BIC).

pdf(X)

Computes the PDF of the main mixture components (excluding external model).

pdf_mix(X[, external_pdf])

Computes the full mixture PDF, including external model if present.

pep(X[, external_pdf])

Estimates Posterior Error Probabilities (PEP).

plot(title, plot_scores[, false_scores])

Plots the fitted mixture model against histogram of scores.

__init__(n_components, external_model=None)

Initializes the TDA_fmm model.

Parameters:
  • n_components (int) – Number of Gaussian components in the mixture.

  • external_model (Optional[TDA_fmm]) – Pre-fitted decoy model. If None, the instance models the decoy distribution; if provided, it models the target distribution, with the decoy model as one mixture component.

fit(X)

Fits the FMM model using Expectation-Maximization (EM) algorithm.

Parameters:

X (ndarray | list[float]) – Input scores to fit the model on.

Return type:

None

get_pi0()

Returns the estimated proportion of decoy (null) components in the mixture.

Return type:

float

Returns:

pi0 value (between 0 and 1). Returns 0 if the model is not fitted or has no external model.

loglik_BIC(X)

Computes log-likelihood and Bayesian Information Criterion (BIC).

BIC = -2 * loglik + num_params * log(n)

Parameters:

X (ndarray | list[float]) – Input scores.

Return type:

tuple[float, float]

Returns:

A tuple of (log-likelihood, BIC). Returns (0, 0) if model not fitted.
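The BIC formula above can be evaluated directly from the mixture density at each score. In this sketch the caller supplies num_params, because how TDA_fmm counts its free parameters (means, standard deviations, weights) is not stated here; loglik_bic is a hypothetical name.

```python
import numpy as np


def loglik_bic(mixture_pdf_values, num_params):
    """Return (loglik, BIC) with BIC = -2 * loglik + num_params * log(n)."""
    p = np.asarray(mixture_pdf_values, dtype=float)
    # Log-likelihood is the sum of log mixture densities over the n scores.
    loglik = float(np.sum(np.log(p)))
    bic = -2.0 * loglik + num_params * np.log(p.size)
    return loglik, float(bic)
```

Lower BIC is better: the num_params * log(n) term penalises extra components, so a larger mixture must raise the log-likelihood enough to pay for its added parameters.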

pdf(X)

Computes the PDF of the main mixture components (excluding external model).

Parameters:

X (ndarray | list[float]) – Input scores of shape (n,).

Return type:

ndarray

Returns:

PDF values of shape (n,). Returns zeros if model not fitted.

pdf_mix(X, external_pdf=None)

Computes the full mixture PDF, including external model if present.

f_mixture(x) = pi0 * f_decoy(x) + (1-pi0) * f_target(x)

Parameters:
  • X (ndarray | list[float]) – Input scores.

  • external_pdf (ndarray | None) – Optional precomputed PDF values from external model.

Return type:

ndarray

Returns:

Mixture PDF values. Returns zeros if model not fitted.
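A sketch of the mixture formula above, assuming f_target is itself a normalised weighted sum of Gaussian components and that the decoy density values are precomputed (as with the external_pdf argument). mixture_pdf and normal_pdf are hypothetical names, and whether TDA_fmm.pdf returns f_target normalised or already scaled by (1 - pi0) is not stated in the source.

```python
import numpy as np


def normal_pdf(x, mu, sd):
    # Normal density evaluated elementwise.
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def mixture_pdf(X, pi0, external_pdf, weights, mus, sds):
    """f_mixture(x) = pi0 * f_decoy(x) + (1 - pi0) * f_target(x)."""
    X = np.asarray(X, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Renormalise target component weights so that f_target integrates to 1.
    weights = weights / weights.sum()
    f_target = sum(w * normal_pdf(X, m, s) for w, m, s in zip(weights, mus, sds))
    return pi0 * np.asarray(external_pdf, dtype=float) + (1.0 - pi0) * f_target
```

Because both f_decoy and f_target integrate to 1, any pi0 in [0, 1] yields a valid density.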

pep(X, external_pdf=None)

Estimates Posterior Error Probabilities (PEP).

PEP = pi0 * f_decoy(x) / f_mixture(x)

Parameters:
  • X (ndarray | list[float]) – Input scores.

  • external_pdf (ndarray | None) – Optional precomputed PDF values from external model. If None and external_model exists, it will be computed.

Return type:

ndarray

Returns:

Array of PEP values for each score in X. Returns zeros if model not fitted.
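Expanding f_mixture in the PEP formula gives PEP = pi0 * f_decoy(x) / (pi0 * f_decoy(x) + (1 - pi0) * f_target(x)). The sketch below evaluates that expression from precomputed densities; pep_from_densities is a hypothetical name, and clipping to [0, 1] is an assumed guard against rounding, not documented behaviour. Scores where both densities are 0 yield NaN here.

```python
import numpy as np


def pep_from_densities(pi0, decoy_pdf, target_pdf):
    """PEP = pi0 * f_decoy / (pi0 * f_decoy + (1 - pi0) * f_target)."""
    null_part = pi0 * np.asarray(decoy_pdf, dtype=float)
    alt_part = (1.0 - pi0) * np.asarray(target_pdf, dtype=float)
    # Posterior probability that a score was generated by the decoy (null) component.
    return np.clip(null_part / (null_part + alt_part), 0.0, 1.0)
```

For example, with pi0 = 0.25, f_decoy = 0.4 and f_target = 0.8, the PEP is 0.1 / 0.7, about 0.143.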

plot(title, plot_scores, false_scores=None)

Plots the fitted mixture model against histogram of scores.

If an external model exists and false_scores are provided, plots:
  • Decoy model (external)

  • Target histogram + mixture fit

  • Separated true and false components

Otherwise, plots only the decoy model fit.

Parameters:
  • title (str) – Title prefix for plots.

  • plot_scores (ndarray | list[float]) – Scores to plot (e.g., target scores).

  • false_scores (ndarray | list[float] | None) – Optional decoy scores for comparison.

Return type:

None

fennomix_mhc.tda_fmm.gamma_pdf(X, u, sigma)

Calculate Gamma probability density function values using mean and std.

The shape and scale parameters are derived from mean (u) and std (sigma).

Parameters:
  • X (ndarray | list[float] | float) – Input array of shape (n,) representing scores.

  • u (float) – Mean of the distribution.

  • sigma (float) – Standard deviation of the distribution.

Return type:

ndarray

Returns:

Array of Gamma PDF values with same shape as X.
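A Gamma distribution with shape k and scale theta has mean k * theta and variance k * theta^2, so matching moments gives k = u^2 / sigma^2 and theta = sigma^2 / u. The sketch below assumes gamma_pdf uses exactly this parameterisation, requires u > 0 and sigma > 0, and returns 0 outside the support x > 0; how the library itself treats non-positive scores is not documented here.

```python
import math

import numpy as np


def gamma_pdf(X, u, sigma):
    """Gamma PDF parameterised by mean u and standard deviation sigma (sketch)."""
    k = (u / sigma) ** 2          # shape, from mean = k * theta
    theta = sigma ** 2 / u        # scale, from var = k * theta**2
    X = np.asarray(X, dtype=float)
    # Substitute 1.0 outside the support so log() stays finite there.
    safe = np.where(X > 0, X, 1.0)
    log_pdf = (k - 1.0) * np.log(safe) - safe / theta - math.lgamma(k) - k * math.log(theta)
    return np.where(X > 0, np.exp(log_pdf), 0.0)
```

Working in log space with math.lgamma avoids overflow in the Gamma function for large shape values.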

fennomix_mhc.tda_fmm.gauss_pdf(X, u, sigma)

Calculate Gaussian probability density function values for input array X.

Parameters:
  • X (ndarray | list[float] | float) – Input array of shape (n,) representing scores.

  • u (float) – Mean (mu) of the Gaussian distribution.

  • sigma (float) – Standard deviation (sigma) of the Gaussian distribution.

Return type:

ndarray

Returns:

Array of PDF values with same shape as X.
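A minimal NumPy sketch of the density gauss_pdf is documented to compute, assuming it evaluates the standard normal formula elementwise; the library's actual implementation (for example, whether it delegates to scipy.stats.norm) is not shown here.

```python
import numpy as np


def gauss_pdf(X, u, sigma):
    """exp(-(x - u)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi)), elementwise."""
    X = np.asarray(X, dtype=float)
    z = (X - u) / sigma
    # Output keeps the shape of X because every operation is elementwise.
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))
```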

fennomix_mhc.tda_fmm.select_best_fmm(target_scores, decoy_fmm, _max_component_=3, verbose=True)

Selects the best TDA_fmm model by BIC criterion.

Fits models with 1 to _max_component_ components and selects the one with the lowest BIC.

Parameters:
  • target_scores (ndarray | list[float]) – Scores to fit the target model on.

  • decoy_fmm (DecoyModel) – Pre-fitted decoy model.

  • _max_component_ – Maximum number of components to try.

  • verbose (bool) – Whether to print progress.

Return type:

TDA_fmm

Returns:

Best-fitted TDA_fmm model (target model).
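The selection loop can be sketched generically: fit one candidate per component count, score each by BIC, and keep the minimum. Here fit_with_k is a hypothetical callback returning (model, bic), standing in for constructing and fitting a TDA_fmm with the decoy model attached; keeping the smaller model on a BIC tie (strict <) is an assumed design choice.

```python
import math
from typing import Any, Callable, Tuple


def select_by_bic(fit_with_k: Callable[[int], Tuple[Any, float]],
                  max_component: int = 3, verbose: bool = True) -> Any:
    """Fit 1..max_component components and return the lowest-BIC model."""
    best_model, best_bic = None, math.inf
    for k in range(1, max_component + 1):
        model, bic = fit_with_k(k)
        if verbose:
            print(f"n_components={k}: BIC={bic:.3f}")
        # Strict comparison keeps the simpler model when BIC values tie.
        if bic < best_bic:
            best_model, best_bic = model, bic
    return best_model
```

For instance, with BIC values 10.0, 7.5 and 8.0 for one, two and three components, the two-component model is returned.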