API Reference
The xlmhg Python API includes two alternative functions to conduct an XL-mHG test:
- The simple test function, xlmhg_test(), accepts a ranked list in the form of a vector and, optionally, the X and L parameters. It returns a 3-tuple containing the test statistic, cutoff, and p-value.
- The advanced test function, get_xlmhg_test_result(), accepts a more compact representation of a list (its length N and a vector specifying the indices of the 1’s in the ranked list), as well as several additional arguments that can improve the performance of the test. Instead of a simple tuple, this function returns the test result as an mHGResult object, which includes additional information such as the test parameters, and methods to calculate additional quantities like E-scores.
Additionally, the API includes a function, get_result_figure(), for visualizing a test result in a Plotly figure. See Examples for concrete examples of how to use these functions.
Simple test function - xlmhg_test()
xlmhg.xlmhg_test(v, X=None, L=None, table=None)
Perform an XL-mHG test (simplified interface).
This function accepts a vector containing zeros and ones, and returns a 3-tuple with the XL-mHG test statistic, cutoff, and p-value.
Parameters:
- v (1-dim numpy.ndarray of integers) – The ranked list. All non-zero elements are considered “1”s. (Let N denote the length of the list.)
- X (int, optional) – The X parameter. [1]
- L (int, optional) – The L parameter. [N]
- table (numpy.ndarray with ndim=2 and dtype=numpy.longdouble, optional) – The dynamic programming table. Its size must be at least (K+1) x (W+1), where K is the number of 1’s in the list and W = N-K. Providing this array avoids memory reallocation when conducting multiple tests. [None]
Returns:
- stat (float) – The XL-mHG test statistic.
- cutoff (int) – The (first) cutoff at which stat was attained. (0 if no cutoff was tested.)
- pval (float) – The XL-mHG p-value (either exact or an upper bound).
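As a rough sketch of how the simple interface might be called, the snippet below builds a ranked list, preallocates a dynamic programming table of the documented minimum size, and passes both to xlmhg_test(). The example vector and the helper name run_simple_test are illustrative assumptions, not part of the library. The import of xlmhg is deferred into the helper so that the setup code runs even where the package is not installed.

```python
import numpy as np

# Hypothetical ranked list: 1 = element of interest, 0 = everything else.
v = np.array([1, 0, 1, 1, 0, 1] + [0] * 12 + [1, 0], dtype=np.uint8)

N = v.size                    # length of the ranked list
K = int(np.count_nonzero(v))  # number of 1's in the list
W = N - K                     # number of 0's in the list

# Preallocate the table with the documented minimum size, (K+1) x (W+1).
table = np.empty((K + 1, W + 1), dtype=np.longdouble)


def run_simple_test(v, X=None, L=None, table=None):
    """Call xlmhg_test() and return the (stat, cutoff, pval) tuple."""
    import xlmhg  # assumes the xlmhg package is installed
    stat, cutoff, pval = xlmhg.xlmhg_test(v, X=X, L=L, table=table)
    return stat, cutoff, pval
```

A call such as run_simple_test(v, X=2, L=10, table=table) would then return the statistic, cutoff, and p-value for this list (the values of X and L are arbitrary here). The same table can be passed to later calls as long as it is large enough for the list being tested.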
Advanced test function - get_xlmhg_test_result()
xlmhg.get_xlmhg_test_result(N, indices, X=None, L=None, exact_pval='always', pval_thresh=None, escore_pval_thresh=None, table=None, use_alg1=False, tol=1e-12)
Perform an XL-mHG test.
This function accepts a list in the form of a numpy indices array containing the (sorted) indices of the non-zero elements, along with the length N of the list. It returns an mHGResult object.
Parameters:
- N (int) – The length of the list.
- indices (1-dim numpy.ndarray with dtype = numpy.uint16) – Sorted list of indices corresponding to the “1”s in the ranked list.
- X (int, optional) – The X parameter. Should be between 0 and K (inclusive), where K is the length of indices. [0]
- L (int, optional) – The L parameter. Should be between 0 and N (inclusive). If None, this parameter will be set to N. [None]
- exact_pval (str, enumerated) – Valid values are: ‘always’, ‘if_significant’, and ‘if_necessary’. Determines in which cases exact p-values should be calculated. This option lets users skip the time-consuming calculation of an exact p-value in cases where they do not require it, which can lead to significant performance gains. [‘always’]
  Specifically, this setting (in conjunction with pval_thresh) determines in which cases the PVAL-THRESH algorithm is invoked to efficiently determine whether the test is significant. This algorithm first tries to make this determination by calculating O(1) and O(N) bounds of the XL-mHG p-value. Only if these bounds fail to give a conclusive answer is an O(N^2) algorithm used to calculate the exact p-value.
  Note that whenever ‘if_necessary’ or ‘if_significant’ is specified, a significance level (p-value threshold; argument pval_thresh) must be specified as well.
- pval_thresh (float, optional) – The significance threshold, i.e., the p-value below which the test should be considered statistically significant. Note that this argument must be given whenever the escore_pval_thresh argument is given. [None]
- escore_pval_thresh (float, optional) – The significance threshold to be used in the calculation of an E-score. The E-score is a measure of the strength of enrichment that is similar to “fold enrichment”. [None]
- table (numpy.ndarray with ndim=2 and dtype=numpy.longdouble, optional) – The dynamic programming table. Its size must be at least (K+1) x (W+1), with W = N-K. Providing this array avoids memory reallocation when conducting multiple tests. [None]
- use_alg1 (bool, optional) – Whether to use PVAL1 (instead of PVAL2) for calculating the p-value. [False]
- tol (float, optional) – The tolerance used for comparing floats. [1e-12]
Returns: The test result.
Return type: mHGResult
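The compact (N, indices) representation can be derived from a 0/1 vector with plain NumPy. The following is a minimal sketch, assuming a list short enough for its indices to fit into uint16 (at most 65536 elements). The helper names to_compact and run_advanced_test are hypothetical, and the threshold of 0.05 is only an example value.

```python
import numpy as np


def to_compact(v):
    """Convert a 0/1 ranked list into the (N, indices) representation."""
    v = np.asarray(v)
    N = int(v.size)
    # np.flatnonzero returns the positions of the non-zero entries in
    # ascending order, so the resulting index array is already sorted.
    indices = np.flatnonzero(v).astype(np.uint16)
    return N, indices


def run_advanced_test(v, pval_thresh=0.05):
    """Run the advanced test, requesting an exact p-value only if significant."""
    import xlmhg  # assumes the xlmhg package is installed
    N, indices = to_compact(v)
    return xlmhg.get_xlmhg_test_result(
        N, indices,
        exact_pval='if_significant',
        pval_thresh=pval_thresh,  # required for 'if_significant'
    )


N, indices = to_compact([1, 0, 1, 1, 0, 0, 0, 1])
```

Because exact_pval is set to ‘if_significant’ here, pval_thresh is passed as well, as the parameter description requires.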
Test result objects - mHGResult
class xlmhg.mHGResult(N, indices, X, L, stat, cutoff, pval, pval_thresh=None, escore_pval_thresh=None, escore_tol=None)
The result of an XL-mHG test.
This class is used by the get_xlmhg_test_result function to represent the result of an XL-mHG test.
Parameters:
- N (int) – See N attribute.
- indices – See indices attribute.
- X (int) – See X attribute.
- L (int) – See L attribute.
- stat (float) – See stat attribute.
- cutoff (int) – See cutoff attribute.
- pval (float) – See pval attribute.
- pval_thresh (float, optional) – See pval_thresh attribute.
- escore_pval_thresh (float, optional) – See escore_pval_thresh attribute.
- escore_tol (float, optional) – See escore_tol attribute.
- N
  int – The length of the ranked list (i.e., the number of elements in it).
- indices
  numpy.ndarray with ndim=1 and dtype=np.uint16 – A sorted (!) list of indices of all the 1’s in the ranked list.
- X
  int – The XL-mHG X parameter.
- L
  int – The XL-mHG L parameter.
- stat
  float – The XL-mHG test statistic.
- cutoff
  int – The XL-mHG cutoff.
- pval
  float – The XL-mHG p-value.
- pval_thresh
  float or None – The user-specified significance (p-value) threshold for this test.
- escore_pval_thresh
  float or None – The user-specified p-value threshold used in the E-score calculation.
- escore_tol
  float or None – The floating point tolerance used in the E-score calculation.
- K
  (property) Returns the number of 1’s in the list.
- escore
  (property) Returns the E-score associated with the result.
- fold_enrichment
  (property) Returns the fold enrichment at the XL-mHG cutoff.
- hash
  (property) Returns a unique hash value for the result.
- k
  (property) Returns the number of 1’s above the XL-mHG cutoff.
- v
  (property) Returns the list as a numpy.ndarray (with dtype np.uint8).
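To illustrate how the derived quantities K, k, v, and the fold enrichment relate to N, indices, and cutoff, here is a small standalone sketch that does not use the library. It rests on three assumptions that this page does not spell out: indices are 0-based, a cutoff of n refers to the first n elements of the list, and fold enrichment is the observed fraction of 1’s above the cutoff divided by the fraction expected by chance, (k/n) / (K/N). The function name describe_cutoff is hypothetical.

```python
import numpy as np


def describe_cutoff(N, indices, cutoff):
    """Recompute K, k, v and fold enrichment from the basic attributes."""
    indices = np.asarray(indices)
    K = int(indices.size)                        # total number of 1's
    k = int(np.count_nonzero(indices < cutoff))  # 1's among the first `cutoff` elements
    # Rebuild the full 0/1 vector from the compact representation.
    v = np.zeros(N, dtype=np.uint8)
    v[indices] = 1
    # Assumed definition: observed fraction divided by expected fraction.
    if cutoff > 0 and K > 0:
        fold = (k / cutoff) / (K / N)
    else:
        fold = float('nan')
    return K, k, v, fold


K, k, v, fold = describe_cutoff(
    20, np.array([0, 2, 3, 5, 18], dtype=np.uint16), cutoff=6)
```

For this input, four of the five 1’s fall within the first six positions, so the assumed formula gives (4/6) / (5/20), or roughly 2.67.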
Visualizing test results - get_result_figure()
xlmhg.get_result_figure(result, show_title=False, title=None, show_inset=True, plot_fold_enrichment=False, width=800, height=350, font_size=24, margin=None, font_family='Computer Modern Roman, serif', score_color='rgb(0, 109, 219)', enrichment_color='rgb(219, 109, 0)', cutoff_color='rgba(255, 52, 52, 0.7)', line_width=2.0, ymax=None, mHG_label=False)
Visualize an XL-mHG test result.
Parameters:
- result (mHGResult) – The test result.
- show_title (bool, optional) – Whether to include a title in the figure. If title is not None, this parameter is ignored. [False]
- title (str or None, optional) – Figure title. If not None, show_title is ignored. [None]
- show_inset (bool, optional) – Whether to show test parameters and p-value as an inset. [True]
- plot_fold_enrichment (bool, optional) – Whether to plot the fold enrichment on a second axis. [False]
- width (int, optional) – The width of the figure (in pixels). [800]
- height (int, optional) – The height of the figure (in pixels). [350]
- font_size (int, optional) – The font size to use. [24]
- margin (dict, optional) – A dictionary specifying the figure margins (in pixels). Valid keys are “l” (left), “r” (right), “t” (top), and “b” (bottom). Missing keys are replaced by Plotly default values. If None, will be set to a dictionary specifying a left margin of 100 px, and a top margin of 40 px. [None]
- font_family (str, optional) – The font family (name) to use. [“Computer Modern Roman, serif”]
- score_color (str, optional) – The color used for plotting the enrichment scores. [“rgb(0, 109, 219)”]
- enrichment_color (str, optional) – The color used for plotting the fold enrichment values (if enabled). [“rgb(219, 109, 0)”]
- cutoff_color (str, optional) – The color used for indicating the XL-mHG test cutoff. [“rgba(255, 52, 52, 0.7)”]
- line_width (int or float, optional) – The line width used for plotting. [2.0]
- ymax (int or float or None, optional) – The y-axis limit. If None, determined automatically. [None]
- mHG_label (bool, optional) – If True, label the p-value with “mHG” instead of “XL-mHG”. [False]
Returns: The Plotly figure.
Return type: plotly.graph_objs.Figure
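A possible way to collect figure options before calling get_result_figure() is sketched below. The option values and the wrapper name make_figure are illustrative assumptions. Only keyword arguments listed in the signature above are used, and the xlmhg import is deferred so that the option setup can be run on its own.

```python
# Margins in pixels. The "l" and "t" values mirror the defaults described
# for margin=None; the "r" value is an arbitrary example.
margin = {'l': 100, 't': 40, 'r': 20}

# Keyword arguments taken from the documented signature.
figure_options = dict(
    show_title=True,
    plot_fold_enrichment=True,  # draw fold enrichment on a second axis
    width=800,
    height=350,
    margin=margin,
)


def make_figure(result, **options):
    """Create a Plotly figure for an mHGResult object."""
    import xlmhg  # assumes the xlmhg package is installed
    return xlmhg.get_result_figure(result, **options)
```

Given an mHGResult object named result, make_figure(result, **figure_options) would return the Plotly figure, which can then be displayed or saved with Plotly’s own tools.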