API Reference

The xlmhg Python API includes two alternative functions to conduct an XL-mHG test:

  • The simple test function, xlmhg_test(), accepts a ranked list in the form of a vector, and (optionally) the X and L parameters, and returns a 3-tuple containing the test statistic, cutoff, and p-value.
  • The advanced test function, get_xlmhg_test_result(), accepts a more compact representation of a list (consisting of its length N and a vector specifying the indices of the 1’s in the ranked list), as well as several additional arguments that can improve the performance of the test. Instead of a simple tuple, this function returns the test result as an mHGResult object, which includes additional information such as the test parameters, and provides properties for derived quantities such as E-scores.

Additionally, the API includes a function, get_result_figure(), for visualizing a test result in a Plotly figure. See Examples for concrete examples of how to use these functions.

Simple test function - xlmhg_test()

xlmhg.xlmhg_test(v, X=None, L=None, table=None)

Perform an XL-mHG test (simplified interface).

This function accepts a vector containing zeros and ones, and returns a 3-tuple with the XL-mHG test statistic, cutoff, and p-value.

Parameters:
  • v (1-dim numpy.ndarray of integers) – The ranked list. All non-zero elements are considered “1”s. (Let N denote the length of the list.)
  • X (int, optional) – The X parameter. [1]
  • L (int, optional) – The L parameter. [N]
  • table (numpy.ndarray with ndim=2 and dtype=numpy.longdouble, optional) – The dynamic programming table. Size has to be at least (K+1) x (W+1), where K is the number of “1”s in the list and W = N-K. Providing this array avoids memory reallocation when conducting multiple tests. [None]
Returns:

  • stat (float) – The XL-mHG test statistic.
  • cutoff (int) – The (first) cutoff at which stat was attained. (0 if no cutoff was tested.)
  • pval (float) – The XL-mHG p-value (either exact or an upper bound).
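
To make the meaning of stat and cutoff concrete, the following is a minimal brute-force sketch. It assumes the usual definition of the XL-mHG statistic: the smallest hypergeometric tail probability over all cutoffs n <= L at which at least X “1”s have been seen, or 1.0 (with cutoff 0) if no cutoff qualifies. The names hypergeom_tail and xlmhg_stat_bruteforce are hypothetical helpers, not part of the xlmhg API. The sketch does not compute the p-value and is not how the package computes the statistic.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(at least k ones among the first n of N elements, K ones in total)."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def xlmhg_stat_bruteforce(v, X=1, L=None):
    """Return (stat, cutoff) by scanning every cutoff n = 1..L (illustrative only)."""
    N = len(v)
    K = sum(1 for x in v if x != 0)  # all non-zero elements count as "1"s
    if L is None:
        L = N
    stat, cutoff = 1.0, 0  # values reported when no cutoff is tested
    k = 0  # number of 1's among the first n elements
    for n in range(1, L + 1):
        if v[n - 1] != 0:
            k += 1
        if k < X:
            continue  # cutoffs with fewer than X ones are not tested
        p = hypergeom_tail(N, K, n, k)
        if p < stat:  # strict comparison keeps the first cutoff attaining the minimum
            stat, cutoff = p, n
    return stat, cutoff
```

For the list [1, 1, 0, 0, 0, 0], this sketch selects cutoff 2, where both “1”s have been seen and the tail probability is 1/15.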

Advanced test function - get_xlmhg_test_result()

xlmhg.get_xlmhg_test_result(N, indices, X=None, L=None, exact_pval='always', pval_thresh=None, escore_pval_thresh=None, table=None, use_alg1=False, tol=1e-12)

Perform an XL-mHG test.

This function accepts a list in the form of a numpy array containing the (sorted) indices of the non-zero elements, along with the length N of the list. It returns an mHGResult object.

Parameters:
  • N (int) – The length of the list.
  • indices (1-dim numpy.ndarray with dtype = numpy.uint16) – Sorted list of indices corresponding to the “1”s in the ranked list.
  • X (int, optional) – The X parameter. Should be between 0 and K (inclusive), where K is the length of indices. [0]
  • L (int, optional) – The L parameter. Should be between 0 and N (inclusive). If None, this parameter is set to N. [None]
  • exact_pval (str, enumerated) –

    Valid values are: ‘always’, ‘if_significant’, and ‘if_necessary’. Determines in which cases exact p-values should be calculated. This option helps users avoid the time-consuming calculation of an exact p-value in cases where they do not require it, which can lead to significant performance gains. [‘always’]

    Specifically, this setting (in conjunction with pval_thresh) determines in which cases the PVAL-THRESH algorithm is invoked to efficiently determine whether the test is significant. This algorithm first tries to make this determination by calculating O(1) and O(N) bounds on the XL-mHG p-value. Only if these bounds are inconclusive is an O(N^2) algorithm used to calculate the exact p-value.

    Note that whenever ‘if_necessary’ or ‘if_significant’ is specified, a significance level (p-value threshold; argument pval_thresh) must be specified as well.

  • pval_thresh (float, optional) – The significance threshold, i.e., the p-value below which the test should be considered statistically significant. Note that this argument must be given whenever the escore_pval_thresh argument is given. [None]
  • escore_pval_thresh (float, optional) – The significance threshold to be used in the calculation of an E-score. The E-score is a measure of the strength of enrichment that is similar to “fold enrichment”. [None]
  • table (numpy.ndarray with ndim=2 and dtype=numpy.longdouble, optional) – The dynamic programming table. Size has to be at least (K+1) x (W+1), with W = N-K. Providing this array avoids memory reallocation when conducting multiple tests. [None]
  • use_alg1 (bool, optional) – Whether to use PVAL1 (instead of PVAL2) for calculating the p-value. [False]
  • tol (float, optional) – The tolerance used for comparing floats. [1e-12]
Returns:

The test result.

Return type:

mHGResult
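
The compact input described above can be derived from a 0/1 vector with NumPy. The sketch below builds N and indices, and preallocates a table of the documented minimum size. It assumes that indices are 0-based positions, as returned by numpy.nonzero; the example vector is made up. Note that dtype numpy.uint16 can only hold index values up to 65535. The call to get_xlmhg_test_result is guarded so that the conversion still runs when the xlmhg package is not installed.

```python
import numpy as np

# Hypothetical ranked list: non-zero entries mark the elements of interest.
v = np.array([1, 0, 1, 1, 0, 0, 0, 1, 0, 0], dtype=np.uint8)

N = v.size
# Positions of the non-zero entries; np.nonzero returns them in ascending order.
indices = np.nonzero(v)[0].astype(np.uint16)
K = indices.size
W = N - K

# Reusable dynamic programming table of the documented minimum size.
table = np.empty((K + 1, W + 1), dtype=np.longdouble)

try:
    import xlmhg
except ImportError:
    xlmhg = None  # package not installed; the conversion above still works

if xlmhg is not None:
    result = xlmhg.get_xlmhg_test_result(N, indices, X=2, L=N, table=table)
    print(result.stat, result.cutoff, result.pval)
```

When many lists of similar size are tested, the same table can be passed to every call, provided it is at least (K+1) x (W+1) for the largest case.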

Test result objects - mHGResult

class xlmhg.mHGResult(N, indices, X, L, stat, cutoff, pval, pval_thresh=None, escore_pval_thresh=None, escore_tol=None)

The result of an XL-mHG test.

This class is used by the get_xlmhg_test_result function to represent the result of an XL-mHG test.

Parameters:
N

int – The length of the ranked list (i.e., the number of elements in it).

indices

numpy.ndarray with ndim=1 and dtype=np.uint16 – A sorted (!) list of indices of all the 1’s in the ranked list.

X

int – The XL-mHG X parameter.

L

int – The XL-mHG L parameter.

stat

float – The XL-mHG test statistic.

cutoff

int – The XL-mHG cutoff.

pval

float – The XL-mHG p-value.

pval_thresh

float or None – The user-specified significance (p-value) threshold for this test.

escore_pval_thresh

float or None – The user-specified p-value threshold used in the E-score calculation.

escore_tol

float or None – The floating point tolerance used in the E-score calculation.

K

(property) Returns the number of 1’s in the list.

escore

(property) Returns the E-score associated with the result.

fold_enrichment

(property) Returns the fold enrichment at the XL-mHG cutoff.

hash

(property) Returns a unique hash value for the result.

k

(property) Returns the number of 1’s above the XL-mHG cutoff.

v

(property) Returns the list as a numpy.ndarray (with dtype np.uint8).
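
As a rough illustration of how k and fold_enrichment relate to the other attributes, the sketch below assumes that fold enrichment is the conventional ratio of the observed number of 1’s above the cutoff to the number expected by chance (K * cutoff / N), and that indices are 0-based while cutoff counts elements from the top of the list. Both functions are hypothetical helpers; the package may compute these values differently.

```python
def ones_above_cutoff(indices, cutoff):
    """Count the 1's among the first `cutoff` elements (0-based indices assumed)."""
    return sum(1 for i in indices if i < cutoff)


def fold_enrichment(N, K, cutoff, k):
    """Observed over expected number of 1's among the first `cutoff` elements."""
    expected = K * cutoff / N  # expected count if the 1's were placed at random
    return k / expected


# Hypothetical result: N = 10, 1's at positions 0, 2, 3, 7, cutoff = 4.
indices = [0, 2, 3, 7]
k = ones_above_cutoff(indices, 4)
fe = fold_enrichment(10, len(indices), 4, k)
```

In this made-up case, 3 of the 4 “1”s lie above the cutoff, against 1.6 expected by chance.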

Visualizing test results - get_result_figure()

xlmhg.get_result_figure(result, show_title=False, title=None, show_inset=True, plot_fold_enrichment=False, width=800, height=350, font_size=24, margin=None, font_family='Computer Modern Roman, serif', score_color='rgb(0, 109, 219)', enrichment_color='rgb(219, 109, 0)', cutoff_color='rgba(255, 52, 52, 0.7)', line_width=2.0, ymax=None, mHG_label=False)

Visualize an XL-mHG test result.

Parameters:
  • result (mHGResult) – The test result.
  • show_title (bool, optional) – Whether to include a title in the figure. If title is not None, this parameter is ignored. [False]
  • title (str or None, optional) – Figure title. If not None, show_title is ignored. [None]
  • show_inset (bool, optional) – Whether to show test parameters and p-value as an inset. [True]
  • plot_fold_enrichment (bool, optional) – Whether to plot the fold enrichment on a second axis. [False]
  • width (int, optional) – The width of the figure (in pixels). [800]
  • height (int, optional) – The height of the figure (in pixels). [350]
  • font_size (int, optional) – The font size to use. [24]
  • margin (dict, optional) – A dictionary specifying the figure margins (in pixels). Valid keys are “l” (left), “r” (right), “t” (top), and “b” (bottom). Missing keys are replaced by Plotly default values. If None, will be set to a dictionary specifying a left margin of 100 px, and a top margin of 40 px. [None]
  • font_family (str, optional) – The font family (name) to use. [“Computer Modern Roman, serif”]
  • score_color (str, optional) – The color used for plotting the enrichment scores. [“rgb(0, 109, 219)”]
  • enrichment_color (str, optional) – The color used for plotting the fold enrichment values (if enabled). [“rgb(219, 109, 0)”]
  • cutoff_color (str, optional) – The color used for indicating the XL-mHG test cutoff. [“rgba(255, 52, 52, 0.7)”]
  • line_width (int or float, optional) – The line width used for plotting. [2.0]
  • ymax (int or float or None, optional) – The y-axis limit. If None, determined automatically. [None]
  • mHG_label (bool, optional) – If True, label the p-value with “mHG” instead of “XL-mHG”. [False]
Returns:

The Plotly figure.

Return type:

plotly.graph_objs.Figure