Welcome to JUMP Shiny

JUMP Shiny is a comprehensive web platform designed for mass spectrometry-based proteomics analysis. It offers a wide range of analytical tools, including experimental design, exploratory analysis, batch normalization, differential expression, and pathway enrichment analysis. The platform provides intuitive visualizations and functionalities that facilitate in-depth exploration of proteomics data. The tutorial tab will guide you through the user interface, the functionalities, and the detailed steps to perform the analysis.

Contact Information

Aijun Zhang: azhang16@uthsc.edu
Xusheng Wang: xwang39@uthsc.edu

Wang Lab@2025: Lab Website

Acknowledgement

Part of the code was adapted from project TCC-GUI under MIT license. We truly appreciate and respect their contributions.

Experiment Design

Experiment Design offers tools for organizing sample processing sequences in proteomics experiments. By using block randomization, it assigns samples to different batches, minimizing batch effects and reducing the risk of introducing confounders that could bias data interpretation.

The Experiment Design algorithm comprises three key procedures:

- Generating a batch design matrix based on the distribution of the first explanatory variable, while considering the specified number and size of batches.
- Allocating samples to each batch, factoring in the distribution of the second explanatory variable.
- Optimizing the batch design scenario for the third and subsequent variables.
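
As a rough illustration of the first procedure only, the sketch below spreads the levels of one explanatory variable evenly across batches using a shuffled round-robin. It is not JUMP Shiny's implementation; the function name `assign_batches`, the `Group` factor, and the example samples are hypothetical.

```python
import random
from collections import defaultdict

def assign_batches(samples, factor, n_batches, seed=0):
    """Stratified round-robin: spread each level of `factor` evenly over batches.

    Illustrative assumption, not the JUMP Shiny algorithm.
    """
    rng = random.Random(seed)
    # Group sample IDs by the level of the chosen factor.
    strata = defaultdict(list)
    for sample in samples:
        strata[sample[factor]].append(sample["SampleID"])
    batches = [[] for _ in range(n_batches)]
    slot = 0
    for level in sorted(strata):
        ids = strata[level]
        rng.shuffle(ids)  # randomize order within each level
        for sample_id in ids:
            batches[slot % n_batches].append(sample_id)
            slot += 1
    return batches

# Hypothetical example: 12 samples, 6 cases and 6 controls, 3 batches.
samples = [
    {"SampleID": f"S{i}", "Group": "case" if i % 2 else "control"}
    for i in range(1, 13)
]
batches = assign_batches(samples, "Group", n_batches=3)
for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

With these inputs every batch receives four samples, two per group. Balancing a second or third factor needs additional logic that this sketch does not attempt.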

Steps for Experiment Design

Navigate to Experiment Design Tab

Click on the Experiment Design tab in the left sidebar of this page.

Upload Sample Information

At the top left, click [Browse...] to upload your sample information table, which should be in .csv format.

Ensure your file follows the correct format. The first column should be SampleID, with the factors to be considered starting from the second column. You can also download the example file by clicking [Download example sample information].
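
Before uploading, the format rules above can be checked with a few lines of Python. This is a minimal sketch: the function name `check_sample_table` is hypothetical, and the duplicate-ID check is an extra assumption rather than a documented JUMP Shiny requirement.

```python
import csv
import io

def check_sample_table(text):
    """Return a list of problems found in a sample information CSV."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or not rows[0]:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if header[0] != "SampleID":
        problems.append("first column should be SampleID")
    if len(header) < 2:
        problems.append("at least one factor column is expected")
    # Assumption: sample IDs should be unique.
    ids = [row[0] for row in rows[1:] if row]
    if len(ids) != len(set(ids)):
        problems.append("duplicate sample IDs")
    return problems

good = "SampleID,Sex,AgeGroup\nS1,F,60-69\nS2,M,70-79\n"
bad = "Name,Sex\nS1,F\nS1,M\n"
print(check_sample_table(good))
print(check_sample_table(bad))
```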

After a successful upload, the [Sample Information Table] and the [Factor Distributions] plots, which show the distribution of each factor (maximum: 3 plots), will be displayed. Note that factors should be categorical variables; continuous variables should be grouped before uploading (for example, age grouped into an AgeGroup column).

Experiment Settings

Choose your experiment type as either [Label-free] or [TMT-labeling].

Input the number of samples in a batch. For a [TMT-labeling] experiment, this is typically 10, 11, 16, or 18. A warning will be shown if the number is greater than 18. For [Label-free], there is no limit on the batch size.

Input the number of IR (Internal Reference) samples used in a batch. This typically would be 0, 1 or 2.
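
The two settings above determine how many batches an experiment needs. The arithmetic can be sketched as follows, assuming that the IR samples occupy slots counted in the batch size; that reading, and the function name `batches_needed`, are assumptions rather than documented behavior.

```python
import math

def batches_needed(n_samples, batch_size, n_ir):
    """Number of batches when each batch reserves `n_ir` slots for IR samples."""
    slots = batch_size - n_ir  # slots left for study samples in each batch
    if slots <= 0:
        raise ValueError("batch size must exceed the number of IR samples")
    return math.ceil(n_samples / slots)

# 60 study samples, 16-plex TMT, one IR channel per batch: 15 usable slots.
print(batches_needed(60, 16, 1))
print(batches_needed(61, 16, 1))
```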

The [Optimization level] is the number of times that the block randomization program will be run to find the best result. You can leave it as the default value.
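
The idea behind repeated runs can be sketched as "randomize several times and keep the most balanced result". The scoring function below (sum of max minus min counts per factor level) is an illustrative assumption, as are the names `imbalance` and `best_of`; JUMP Shiny's actual criterion is not described here.

```python
import random

def imbalance(batches, group_of):
    """Sum over factor levels of (max - min) count across batches; 0 is perfectly balanced."""
    total = 0
    for level in set(group_of.values()):
        counts = [sum(1 for s in batch if group_of[s] == level) for batch in batches]
        total += max(counts) - min(counts)
    return total

def best_of(sample_ids, group_of, n_batches, n_runs, seed=0):
    """Run a simple random assignment `n_runs` times and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best, best_score = None, None
    for _ in range(n_runs):
        ids = list(sample_ids)
        rng.shuffle(ids)
        batches = [ids[i::n_batches] for i in range(n_batches)]
        score = imbalance(batches, group_of)
        if best_score is None or score < best_score:
            best, best_score = batches, score
    return best, best_score

# Hypothetical data: 12 samples in two groups, 3 batches.
group_of = {f"S{i}": ("case" if i % 2 else "control") for i in range(1, 13)}
ids = sorted(group_of)
design, score = best_of(ids, group_of, n_batches=3, n_runs=50)
print(score, design)
```

A higher number of runs can only improve or match the score found by fewer runs with the same seed, at the cost of a longer running time.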

Note that when there are more than two factors, an equal distribution of the third and subsequent factors across batches cannot be guaranteed. In such cases, prioritizing factors becomes essential: place the two most important factors in the first two factor columns.

Run Experiment Design

Click the [Run experiment design] button. After the program finishes running, it will generate a [Batch and channel assignation] and the [Batch design matrix] for each factor.

The [Batch and channel assignation] contains the information provided by the user alongside the batch information generated by the program. For TMT-labeling experiments, the table contains an additional column specifying the assigned channel for each sample.

Click [Download all results] to download all results as an .xlsx file.

Exploratory Analysis

Exploratory Analysis provides a user-friendly interface for uploading and visualizing datasets. It summarizes key dataset characteristics, helping users understand data distribution and identify underlying patterns. These summaries serve as a quality-control check on your data.

Steps for Exploratory Analysis

Navigate to Exploratory Analysis Tab

Click on the Exploratory Analysis tab in the left sidebar of this page.

Upload Data

Upload a proteomic dataset as a tab-delimited text file or a .csv file, and make sure the file follows the expected format. For a large dataset (over 30 MB), please use JUMP Shiny locally.

Click [Download example input expression table] to get an example file.

The first three columns of the input data should be Protein Accession Number (e.g., a UniProt accession), Gene Name, and Protein Description, followed by one column per sample. JUMP Shiny supports raw abundance values as well as log2-transformed values.
Note: if your data is already log2 transformed, please choose the log2 option.
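
For reference, the relationship between raw and log2 values can be sketched as below. Treating zero or negative intensities as missing is an assumption made for this example, and `to_log2` is a hypothetical helper, not part of JUMP Shiny.

```python
import math

def to_log2(value):
    """log2-transform a raw abundance; missing or non-positive values become None."""
    if value is None or value <= 0:
        return None  # assumption: zero or negative intensities are treated as missing
    return math.log2(value)

raw_row = [1024.0, 2048.0, None, 0.0]
log2_row = [to_log2(v) for v in raw_row]
print(log2_row)
```

A quick sanity check: raw proteomics intensities are often in the thousands to millions, whereas log2 values usually fall roughly between 5 and 35, so the range of your data often indicates which option applies.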

An example of input data is shown below:

If successfully uploaded, the input data will be displayed in the Protein expression table panel (see below).

JUMP Shiny Format:

If your data contains Accession Number, Gene Name, Description, and sample columns, choose the jumpshiny file type to upload. The first column (Accession Number) is required.

JUMPq Result Format:

If the sample columns in your data start at the 24th column, choose the jumpq file type to upload. Please remove the header rows of the JUMPq data.

JUMPq Batch Result Format:

If your data contains batch information, choose the jump_batch file type to upload.

Group Assignment

After loading the dataset, input your grouping in the [Meta information] panel.

Group Information File
The Group Information File is required for the data analysis. It should adhere to the following structure:
- Sample Name Column: The first column should contain your sample names. These names must exactly match the corresponding column names in your input expression table. Only the columns specified in this file will be included in the analysis, so ensure that they are correct, complete, and matched.
- Grouping Name Column: After the sample name column, you can include one or more grouping columns. Each grouping column can represent different categories or factors relevant to your analysis (e.g., “control” vs. “treatment,” “male” vs. “female,” etc.). You can add as many grouping columns as necessary to capture all relevant grouping factors.
Note: Headers are required in this file to clearly identify each column. The header of the first column must be named Sample or sample.
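
Because sample names must match the expression table exactly, it can help to check the two files against each other before uploading. The sketch below assumes the jumpshiny layout (three annotation columns followed by samples); `match_samples` and the example names are hypothetical.

```python
def match_samples(expression_header, group_rows):
    """Split the sample names of a group file into matched and missing ones."""
    # Assumption: the first three columns are Accession, Gene Name, Description.
    sample_columns = set(expression_header[3:])
    names = [row.get("Sample", row.get("sample")) for row in group_rows]
    matched = [name for name in names if name in sample_columns]
    missing = [name for name in names if name not in sample_columns]
    return matched, missing

expression_header = ["Accession", "Gene", "Description", "S1", "S2", "S3"]
group_rows = [
    {"Sample": "S1", "Group": "control"},
    {"Sample": "S2", "Group": "treatment"},
    {"Sample": "S4", "Group": "treatment"},  # typo: no such column
]
matched, missing = match_samples(expression_header, group_rows)
print("matched:", matched)
print("missing:", missing)
```

Here S3 is present in the expression table but absent from the group file, so it would simply be left out of the analysis, while S4 points to a naming mismatch that should be fixed.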

Below is an example of a Group Information File:

Confirm and Analyze

Click the [Assign group information] button and wait for the [Summary] section to display additional information about your dataset. You can download and save the plots in .svg format for further analysis or publication. All plots can be zoomed in and out for a closer examination of the data.

Intensity Distribution Plot

By clicking the [Intensity Distribution] tab, you can view box plots for all uploaded samples, with each group highlighted in different colors. You can filter out proteins with low intensity and customize the title, X-axis, and Y-axis labels as needed.

PCA Plot

The PCA Plot visualizes the distribution of selected groups based on Principal Component Analysis (PCA). This plot helps in identifying patterns and trends in your dataset by reducing the dimensionality and highlighting the differences and similarities between groups. Both 2D and 3D PCA plots are available. Each point in the plot represents a sample, and the position of the points indicates their relative similarity or difference based on the principal components. The axes represent the first two or three principal components, which capture the most variance in the data. This visualization can be useful for identifying outliers, clusters, and potential relationships between groups.

You can define the number of top variable proteins included in the PCA. The results may vary depending on the number of proteins selected. Additionally, you can toggle the buttons to display or hide labels on the plot.
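
Selecting the top variable proteins can be pictured as ranking rows by their variance across samples. The sketch below uses sample variance as the ranking measure, which is an assumption; the function `top_variable` and the values are made up for illustration.

```python
from statistics import variance

def top_variable(table, n):
    """Return the accessions of the n proteins with the highest variance across samples."""
    ranked = sorted(table, key=lambda accession: variance(table[accession]), reverse=True)
    return ranked[:n]

# Hypothetical log2 intensities for three proteins in four samples.
table = {
    "P1": [10.0, 10.1, 9.9, 10.0],   # nearly constant
    "P2": [8.0, 12.0, 7.5, 12.5],    # highly variable
    "P3": [9.0, 9.5, 10.5, 11.0],    # moderately variable
}
print(top_variable(table, 2))
```

Proteins with missing values would need to be handled before this step; the sketch assumes complete rows.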

Sample Correlation

The sample correlation heatmap visualizes the correlation within and between groups. This method organizes samples and features into a hierarchical tree, known as a dendrogram, based on their similarity or dissimilarity. The heatmap uses color gradients to represent the intensity of the correlation, with closely related samples or groups appearing closer together on the dendrogram. This visualization can help identify clusters of similar samples, reveal patterns in the data, and highlight differences between groups. It’s a valuable tool for understanding the relationships and structure within your dataset.

Similarly, you can select the number of proteins to include in the cluster analysis. The percentage indicates the ratio of the selected top variable proteins to the total number of proteins in the dataset. You can also choose from various agglomeration and distance methods to customize the clustering process. These options allow you to refine the analysis and tailor the clustering approach based on the characteristics of your data and your specific research needs.
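
The values behind such a heatmap are pairwise correlations between samples. A minimal sketch using Pearson correlation is given below; the correlation type JUMP Shiny uses is not stated here, so Pearson is an assumption, and the sample names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)

def correlation_matrix(samples):
    """Nested dict of pairwise correlations between samples."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}

samples = {
    "ctrl_1": [1.0, 2.0, 3.0, 4.0],
    "ctrl_2": [2.0, 4.0, 6.0, 8.0],
    "treat_1": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(samples)
print(round(matrix["ctrl_1"]["ctrl_2"], 3), round(matrix["ctrl_1"]["treat_1"], 3))
```

For clustering, a correlation r is commonly converted to a distance such as 1 - r, so that highly correlated samples end up close together in the dendrogram.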

Group Selection

Navigate to the [Group selection] panel to explore different ways of grouping your data for visualization. You can select variables or categories to group your data, such as experimental conditions, genes, or sexes. This flexibility allows you to customize the visualization and highlight specific aspects of your data for more detailed analysis. Use the options in the panel to easily switch between different groupings and gain insights from various perspectives.

Batch Normalization

Batch Normalization aims to correct unwanted technical variation in protein expression data arising from experimental batch effects. By applying normalization techniques, you can ensure that the observed differences in protein expression are due to biological variation rather than technical artifacts. This process offers several key benefits:

Improving the Accuracy of Differential Expression: Normalization helps in accurately identifying true biological differences by eliminating technical noise.
Reducing the Impact of Batch Effects: It minimizes the influence of variations introduced during different experimental batches, leading to more consistent and reliable data.
Enhancing the Reproducibility of Results: By standardizing the data, normalization ensures that the results are reproducible across different experiments and studies.

Steps for Batch Normalization

Navigate to Batch Normalization Tab

Click on the Batch Normalization tab in the left sidebar of this page.

Select Normalization Method

Choose the appropriate normalization method based on your data:

Internal: If your data has an internal reference, such as TMT data, you can normalize the data based on an internal sample.

Linear: If your data doesn’t have an internal sample, select linear normalization. Linear normalization adjusts your dataset based on overall trends, bringing all samples to a common scale and correcting for systematic technical variations.

Internal+Linear: If your data has an internal reference, you can apply internal normalization first and then linear normalization for better results.

Selecting Batch Group Information

To perform batch normalization, specify the batch and internal reference columns from the sample information file. Use the dropdown menus to choose the batch identifier column and the column that marks the internal reference samples.
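
To show how the batch and internal reference columns work together, here is a sketch of one common form of internal reference normalization: in log2 space, each sample is expressed relative to the mean of the internal reference samples in its own batch. The exact formula used by JUMP Shiny is not given here, so this is an assumption, and all names are hypothetical.

```python
def normalize_to_internal(values, batch_of, is_internal):
    """Express one protein's log2 intensities relative to the batch's internal reference.

    values:      {sample: log2 intensity}
    batch_of:    {sample: batch label}
    is_internal: {sample: True if the sample is an internal reference}
    """
    reference = {}
    for sample, value in values.items():
        if is_internal[sample]:
            reference.setdefault(batch_of[sample], []).append(value)
    # Average the reference samples when a batch contains more than one.
    reference_mean = {batch: sum(v) / len(v) for batch, v in reference.items()}
    return {s: v - reference_mean[batch_of[s]] for s, v in values.items()}

values = {"b1_ref": 10.0, "b1_s1": 11.0, "b2_ref": 12.0, "b2_s1": 13.0}
batch_of = {"b1_ref": "B1", "b1_s1": "B1", "b2_ref": "B2", "b2_s1": "B2"}
is_internal = {"b1_ref": True, "b1_s1": False, "b2_ref": True, "b2_s1": False}
print(normalize_to_internal(values, batch_of, is_internal))
```

In this example the two batches differ by 2 log2 units in absolute intensity, yet both study samples end up at +1 relative to their reference, which removes the batch offset.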

Once you have selected the appropriate batch and internal reference column, click the [Run Batch Normalization] button to initiate the normalization process.

Select your batch group column in the dropdown menu.
- Internal method: the sample information file needs one additional column that identifies the internal reference samples (marked as 'internal').
- Linear method: no internal reference column is needed.

Normalization Results

After normalization, the Data Table will appear on the right side of the page. As in Exploratory Analysis, the Intensity Distribution, PCA, and Sample Correlation plots are generated to assess the effectiveness of the normalization.

Proceed to Differential Expression

Now, you can directly proceed to Differential Expression analysis.

Differential Expression

Differential Expression is a method used to identify proteins that show significant differences in expression levels between groups or conditions. By comparing the expression levels across different groups, such as diseased versus healthy samples, researchers can pinpoint specific molecules that are upregulated or downregulated, providing insights into the biological processes involved.

Steps for Differential Expression

Navigate to Differential Expression Tab

Click on the Differential Expression tab in the left sidebar of this page.

Adjust Differential Expression Parameters

Under the [Differential Expression Methods] panel, set the parameters for the comparison:
- Select the statistical method and the grouping variables to compare.
- Define the type of comparison: pairwise or multiple-group.
- Choose whether to perform data imputation before conducting the differential analysis.
JUMP Shiny offers three options: no imputation, imputation using the minProb method, or excluding all proteins with missing values in any sample (i.e., Data without NAs).

For imputation, JUMP Shiny handles missing values according to the following four cases:
- If there are no missing values in any group, no imputation is performed.
- If one group has >1 values while the other groups have ≥1 values, no imputation is performed.
- If one group is completely missing while the other groups have more than one value, missing values are imputed sample-wise using the top n minimum values. The number of values imputed equals the maximum number of values in the other groups that have at least two values.
- If all groups have only one value or are completely missing, the protein row is discarded from the differential expression analysis.
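
One possible reading of the four cases, written as a decision function, is sketched below. The wording above leaves some combinations open (for example, three groups with mixed counts), so the order of the checks and the name `imputation_case` are assumptions, not JUMP Shiny's actual code.

```python
def imputation_case(observed, group_sizes):
    """Classify one protein by the number of observed (non-missing) values per group.

    observed:    {group: number of non-missing values}
    group_sizes: {group: number of samples in the group}
    """
    if all(count <= 1 for count in observed.values()):
        return "discard"          # every group has one value or none
    if all(observed[g] == group_sizes[g] for g in group_sizes):
        return "complete"         # nothing is missing, nothing to impute
    if any(count == 0 for count in observed.values()):
        return "impute"           # a whole group is missing, another has >1 values
    return "no_imputation"        # partially missing, but every group has a value

sizes = {"A": 3, "B": 3}
print(imputation_case({"A": 3, "B": 3}, sizes))
print(imputation_case({"A": 3, "B": 1}, sizes))
print(imputation_case({"A": 3, "B": 0}, sizes))
print(imputation_case({"A": 1, "B": 0}, sizes))
```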
Imputing missing values can help minimize potential biases and make the dataset more complete, thereby improving the reliability of the results. Depending on your specific needs and the characteristics of your data, you can select the most appropriate option for your analysis.

Run Differential Expression

After defining all parameters, click the [Run Differential Expression Test] button. The program will analyze your data and generate a result table with detailed information from the differential expression analysis. Depending on the type of comparison, the result table will include the p-value, FDR, and log2 fold change in the last three columns, along with the original expression levels (log2-transformed values) of each protein.
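
To make the result columns concrete, the sketch below computes a log2 fold change from log2 intensities and adjusts a list of p-values with the Benjamini-Hochberg procedure. The FDR method used by JUMP Shiny is not named here, so Benjamini-Hochberg is an assumption, and the p-values are made up.

```python
def log2_fold_change(group_a, group_b):
    """Difference of mean log2 intensities (group_b minus group_a)."""
    return sum(group_b) / len(group_b) - sum(group_a) / len(group_a)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

control = [10.0, 10.5, 9.5]
treated = [12.0, 12.5, 11.5]
print(log2_fold_change(control, treated))

p_values = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p_values))
```

Because the inputs are already log2 transformed, the fold change is a difference of means rather than a ratio; a value of 2 corresponds to a fourfold increase on the raw scale.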

You can click the header of each column to sort the proteins based on p-value, FDR, or log2 fold change. This functionality allows you to easily identify the most significant proteins according to your criteria. Additionally, by selecting a particular protein, you can view its expression levels across all samples.

Visualization of the Results of Differential Expression
- Volcano Plot
  Displays the distribution of result data, highlighting significant proteins based on p-value and log2 fold change. This plot helps in identifying proteins with substantial changes in expression and their statistical significance, making it easier to identify key proteins in your analysis.
  You can customize the volcano plot by defining the graphic title, significance levels (p-value or FDR), and the cutoff values for up-regulated and down-regulated log2 fold changes. The determination of up-regulation or down-regulation is based on the parameters specified in the [Differential Expression Methods] section, such as the order of the groups being compared.
  To label proteins with gene symbols in the volcano plot, click the + in the advanced parameters section and enter the protein accession numbers you would like to highlight. Additionally, if you click on a point in the volcano plot, a bar plot will be displayed beneath it, showing the expression levels of the selected protein across all samples.
- Heatmap Plot
  Explores the relationships and similarities among samples and proteins within your dataset. You can define the list of proteins to include in the heatmap using one of three options: inputting a specific list of proteins, selecting proteins based on an FDR cutoff, or choosing the top n proteins based on FDR. Additionally, you can further refine the selected protein list by applying a log2 fold change filter.
  To generate a customized heatmap plot, you can select different distance and agglomeration methods for the cluster analysis. Additionally, you have the option to generate dendrograms for samples, proteins, or both, depending on your analytical needs. You can also choose the color scheme for the heatmap, allowing you to tailor the visualization to your preferences.
- Standard Deviation (SD) Within the Group Plot
  Provides a distribution plot of the SD between samples within a group/condition. This plot helps visualize the variability within each group, offering information for setting the log2 fold change cutoff.
- Moving SD Plot
  Provides a scatter plot that displays the SDs across a range of intensities from low to high. This plot allows you to observe how the variability of your data changes with intensity. By visualizing the SDs, you can apply different log2 fold change cutoffs for proteins with varying intensity levels.
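
The cutoffs used in the volcano plot amount to a simple classification rule for each protein. The sketch below shows that rule with placeholder default cutoffs (0.05 and a log2 fold change of 1 in either direction); these values and the function name `classify` are hypothetical, not JUMP Shiny defaults.

```python
def classify(log2fc, significance, sig_cutoff=0.05, up_cutoff=1.0, down_cutoff=-1.0):
    """Label a protein as up, down, or not significant.

    `significance` is either the p-value or the FDR, whichever is chosen.
    """
    if significance < sig_cutoff and log2fc >= up_cutoff:
        return "up"
    if significance < sig_cutoff and log2fc <= down_cutoff:
        return "down"
    return "not significant"

print(classify(2.3, 0.001))
print(classify(-1.8, 0.02))
print(classify(2.3, 0.20))
print(classify(0.4, 0.001))
```

The within-group SD and moving SD plots can inform the choice of the fold change cutoffs, for instance by choosing a cutoff that exceeds the typical within-group variability.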

Enrichment Analysis

Enrichment Analysis is used to identify biological pathways, processes, or molecular functions that are significantly overrepresented in your set of differentially expressed proteins. By mapping these differentially expressed elements to known databases and ontologies, such as Gene Ontology (GO), KEGG pathways, or Reactome pathways, enrichment analysis helps uncover the underlying biological mechanisms driving the observed changes.
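
Overrepresentation is usually assessed with a hypergeometric (one-sided Fisher) test: given a background of proteins, how likely is it to see at least the observed overlap between the selected proteins and a pathway by chance? The sketch below illustrates that test; it is an assumption that JUMP Shiny uses this statistic, and the numbers are invented.

```python
from math import comb

def enrichment_p_value(n_background, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) under the hypergeometric distribution.

    n_background: proteins in the background
    n_in_set:     background proteins annotated to the pathway
    n_selected:   differentially expressed proteins
    n_overlap:    selected proteins annotated to the pathway
    """
    upper = min(n_in_set, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Invented example: 40 of 5000 background proteins belong to a pathway,
# and 8 of 100 selected proteins fall into it.
print(enrichment_p_value(5000, 40, 100, 8))
```

With an expected overlap of less than one protein (100 × 40 / 5000 = 0.8), observing eight gives a very small p-value, which is what marks the pathway as enriched.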

Steps for Enrichment Analysis

Select Proteins from Differential Expression Results

To perform enrichment analysis, start by selecting proteins from the differential expression results. You have several options for defining this list: you can provide a specific list of proteins, or filter proteins based on their FDR or p-value. Additionally, you can further refine this list by applying a log2 fold change cutoff. These filtering options ensure that you focus on the most relevant proteins, enhancing the accuracy and biological relevance of the enrichment analysis. Once the selection is made, you can click [Run Significance Filter]. The filtered proteins will be shown in the [Filtered Result Table]. The number of proteins selected will be shown below the table.

Set Up Enrichment Analysis Parameters

You can customize the parameters for the enrichment analysis to suit your specific needs. JUMP Shiny offers three organism databases (human, mouse, and rat) and two methods (GO and KEGG) for performing enrichment analysis. By selecting the appropriate database and method, you can tailor the analysis to your organism of interest and the specific biological questions you are investigating.

You can set various parameters including the p-value cutoff, q-value cutoff, minimal gene set size, and maximal gene set size. Once you have configured these settings, click the [Run Enrichment Analysis] button to start the analysis and generate the results.

Enrichment Analysis Results

The results of the enrichment analysis will be displayed in a result table. This table provides detailed information about the enriched pathways, processes, or molecular functions, including relevant statistics such as p-values and q-values. By examining this table, you can identify the most significant biological themes and mechanisms associated with your differentially expressed proteins or genes. You can download the enrichment result table in .csv format.

Visualization of Enrichment Analysis Results

JUMP Shiny provides two plots for visualizing the results of the enrichment analysis:
- Bubble Plot: This plot displays the most significantly enriched pathways, processes, or functions in a clear, easy-to-interpret dot graph. Each bubble represents an enriched category; its size corresponds to the number of proteins in the enriched pathway, and its color represents the significance level of the enriched pathway (i.e., adjusted p-value).
- Network Map: This plot presents a network map that illustrates the relationships and interactions between the enriched pathways or functions. The network map helps in understanding the interconnectedness of various biological processes and identifying key regulatory nodes.
