Welcome to JUMP Shiny
JUMP Shiny is a comprehensive web platform designed for mass spectrometry-based proteomics analysis. It offers a wide range of analytical tools, including experimental design, exploratory analysis, batch normalization, differential expression, and enrichment pathway analysis. The platform provides intuitive visualizations and functionalities that facilitate in-depth exploration of proteomics data. The tutorial tab will guide you through the user interface, the available functionalities, and the detailed steps to perform the analysis.
Contact Information
Aijun Zhang: azhang16@uthsc.edu
Xusheng Wang: xwang39@uthsc.edu
Wang Lab@2025: Lab Website
Acknowledgement
Part of the code was adapted from project TCC-GUI under MIT license. We truly appreciate and respect their contributions.
- Experiment Design
- Exploratory Analysis
- Batch Normalization
- Differential Expression
- Enrichment pathway analysis
Experiment Design
Experiment Design offers tools for organizing sample processing sequences in proteomics experiments. By using block randomization, it assigns samples to different batches, minimizing batch effects and reducing the risk of introducing confounders that could bias data interpretation.
The Experiment Design algorithm comprises three key procedures:
- Generating a batch design matrix based on the distribution of the first explanatory variable, while considering the specified number and size of batches.
- Allocating samples to each batch, factoring in the distribution of the second explanatory variable.
- Optimizing the batch design scenario for the third and subsequent variables.
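The three procedures above can be illustrated with a minimal sketch of block randomization. This is not JUMP Shiny's implementation: the function names (`assign_batches`, `imbalance`, `best_design`), the round-robin dealing strategy, and the imbalance score are all assumptions chosen for illustration. As a simplification, the sketch balances the first factor by construction and handles every other factor by repeating the randomization and keeping the best run.

```python
import random
from collections import Counter, defaultdict


def assign_batches(samples, factor, n_batches, seed=0):
    """Deal samples into batches so each level of `factor` is spread evenly.

    samples: list of dicts, e.g. {"SampleID": "S1", "Group": "AD"}.
    Returns a dict mapping SampleID -> batch index (0-based).
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for s in samples:
        by_level[s[factor]].append(s)

    assignment = {}
    slot = 0  # running position, so each level continues where the last stopped
    for level in sorted(by_level):
        members = by_level[level]
        rng.shuffle(members)  # randomize the order within each level
        for s in members:
            assignment[s["SampleID"]] = slot % n_batches
            slot += 1
    return assignment


def imbalance(samples, assignment, factor, n_batches):
    """Largest difference in per-batch counts over all levels of `factor`."""
    counts = defaultdict(Counter)
    for s in samples:
        counts[s[factor]][assignment[s["SampleID"]]] += 1
    return max(
        max(c[b] for b in range(n_batches)) - min(c[b] for b in range(n_batches))
        for c in counts.values()
    )


def best_design(samples, factors, n_batches, n_runs=100):
    """Repeat the randomization and keep the run with the lowest total imbalance."""
    best, best_score = None, None
    for run in range(n_runs):
        candidate = assign_batches(samples, factors[0], n_batches, seed=run)
        score = sum(imbalance(samples, candidate, f, n_batches) for f in factors)
        if best_score is None or score < best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical sample sheet: 16 samples, two categorical factors.
samples = [
    {"SampleID": f"S{i}", "Group": "AD" if i % 2 else "Ctl", "Sex": "F" if i % 4 < 2 else "M"}
    for i in range(16)
]
design, score = best_design(samples, ["Group", "Sex"], n_batches=2)
```

Here `n_runs` plays a role similar to an optimization level: more runs give the search more chances to find an assignment that also balances the later factors.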
Steps for Experiment Design
- Navigate to Experiment Design Tab
  Click on the Experiment Design tab in the left sidebar of this page.
- Upload Sample Information
  At the top left, click [Browse...] to upload your sample information table, which should be in .csv format. Ensure your file follows the correct format shown below. The first column should be SampleID, with the factors to be considered starting from the second column. You can also download the example file by clicking [Download example sample information].
  After a successful upload, the [Sample Information Table] and the [Factor Distributions] plots, which show the distribution of each factor (maximum: 3 plots), will be displayed. Note that factors should be categorical variables; continuous variables should be grouped before uploading (such as AgeGroup in the table below).
- Experiment Settings
  Choose your experiment type as either [Label-free] or [TMT-labeling].
  Input the number of samples in a batch. For a [TMT-labeling] experiment, this could be 10, 11, 16, or 18. A WARNING will be shown if the number is greater than 18. For [Label-free], there is no limit on the batch size.
  Input the number of IR (Internal Reference) samples used in a batch. This is typically 0, 1, or 2.
  The [Optimization level] is the number of times the block randomization program is run to find the best result. You can leave it at the default value.
  Note that when the number of factors is greater than two, an equal distribution of the third and subsequent factors across batches cannot be guaranteed. In such cases, prioritizing factors becomes essential, and the most important factors should be placed as the first two factors to be considered.
- Run Experiment Design
  Click the [Run experiment design] button. After the program finishes running, it will generate a [Batch and channel assignation] table and a [Batch design matrix] for each factor.
  The [Batch and channel assignation] table contains the information provided by the user alongside the batch information generated by the program. For TMT-labeling experiments, the table contains an additional column specifying the assigned channel for each sample.
  Click [Download all results] to download all results as an .xlsx file.
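As a rough worked example of how the batch size and the IR count interact, the sketch below computes how many batches a study would need. It assumes that IR samples occupy channels within the stated batch size, which the settings description does not specify. `batches_needed` is a hypothetical helper, not part of JUMP Shiny.

```python
import math


def batches_needed(n_samples, batch_size, n_ir=0):
    """Number of batches when each batch reserves `n_ir` slots for IR samples."""
    slots = batch_size - n_ir  # slots left for study samples (assumption)
    if slots <= 0:
        raise ValueError("batch_size must exceed the number of IR samples")
    return math.ceil(n_samples / slots)


# 45 samples on a 16-plex with 2 IR channels: 14 sample slots per batch.
n = batches_needed(45, batch_size=16, n_ir=2)
```

Under that assumption, 45 samples with 14 usable slots per batch need 4 batches, and the last batch is only partly filled.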
Exploratory Analysis
Exploratory Analysis provides a user-friendly interface for uploading and visualizing datasets. It summarizes key dataset characteristics, helping users understand data distribution and identify underlying patterns. This section provides effective quality control of your data.
Steps for Exploratory Analysis
- Navigate to Exploratory Analysis Tab
  Click on the Exploratory Analysis tab in the left sidebar of this page.
- Upload Data
  Upload a proteomic dataset in tab-delimited text or csv file format. Please ensure your file follows the correct format shown below. For a large dataset (over 30 MB), please use JUMP Shiny locally.
  You can also click Download example input expression table for an example.
  The input data should begin with three columns: Protein Accession Number (e.g., UniProt), Gene Name, and Protein Description, followed by as many sample columns as needed. JUMP Shiny supports raw abundance values as well as log2-transformed values.
  Note: if your data is already log2 transformed, please choose the log2 option.
  An example of input data is shown below:
  If successfully uploaded, the input data will be displayed in the Protein expression table panel (see below).
  JUMP Shiny Format: If your data contains Accession number, Gene Name, Description, and samples, use jumpshiny to upload. The first column is required.
  JUMPq Result Format: If your sample columns start from the 24th column, use jumpq to upload. Please remove the header rows of the jumpq data.
  JUMPq Batch Result Format: If your data contains batch info, use jump_batch to upload.
- Group Assignment
  After loading the dataset, input your grouping in the [Meta information] panel.
  Group Information File
  The Group Information File is required for the data analysis. It should adhere to the following structure:
  - Sample Name Column: The first column should contain your sample names. These names must exactly match the corresponding column names in your input expression table. Only the columns specified in this file will be included in the analysis, so ensure that they are correct, complete, and matched.
  - Grouping Name Column: After the sample name column, you can include one or more grouping columns. Each grouping column can represent different categories or factors relevant to your analysis (e.g., “control” vs. “treatment,” “male” vs. “female,” etc.). You can add as many grouping columns as necessary to capture all relevant grouping factors.
  Note: Headers are required in this file to clearly identify each column. The header of the first column must be named Sample or sample.
  Below is an example of a Group Information file:
- Confirm and Analyze
  Click the [Assign group information] button and wait for the [Summary] section to display additional information about your dataset. You can download and save the plots in .svg format for further analysis or publication. All plots can be zoomed in and out for a closer examination of the data.
Intensity Distribution Plot
By clicking the [Intensity Distribution] tab, you can view box plots for all uploaded samples, with each group highlighted in a different color. You can filter out proteins with low intensity and customize the title, X-axis, and Y-axis labels as needed.
PCA Plot
The PCA Plot visualizes the distribution of selected groups based on Principal Component Analysis (PCA). This plot helps in identifying patterns and trends in your dataset by reducing the dimensionality and highlighting the differences and similarities between groups. We include 2D and 3D PCA plots. Each point in the plot represents a sample, and the position of the points indicates their relative similarity or difference based on the principal components. The axes represent the first two/three principal components, which capture the most variance in the data. This visualization can be useful for identifying outliers, clusters, and potential relationships between groups.
You can define the number of top variable proteins included in the PCA. The results may vary depending on the number of proteins selected. Additionally, you can toggle the buttons to display or hide labels on the plot.
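To make the "top variable proteins" setting concrete, here is a minimal sketch of ranking proteins by variance after a log2 transformation, which is one common way to pick features before PCA. Whether JUMP Shiny ranks by variance, and on which scale, is an assumption. `log2_transform`, `top_variable`, and the toy data are hypothetical.

```python
import math
from statistics import variance


def log2_transform(table):
    """table: dict protein -> list of raw abundances (all values > 0)."""
    return {p: [math.log2(v) for v in vals] for p, vals in table.items()}


def top_variable(table, n):
    """Return the n protein IDs with the highest sample variance."""
    ranked = sorted(table, key=lambda p: variance(table[p]), reverse=True)
    return ranked[:n]


# Toy raw abundances for three proteins across four samples.
raw = {
    "P1": [100, 100, 100, 100],  # constant: carries no information for PCA
    "P2": [100, 200, 400, 800],  # doubles each sample: highest log2 variance
    "P3": [100, 110, 120, 130],  # mild trend
}
selected = top_variable(log2_transform(raw), n=2)
```

Changing `n` changes which proteins enter the analysis, which is why the PCA result can shift with the number of proteins selected.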
Sample Correlation
The sample correlation heatmap visualizes the correlation within and between groups. This method organizes samples and features into a hierarchical tree, known as a dendrogram, based on their similarity or dissimilarity. The heatmap uses color gradients to represent the intensity of the correlation, with closely related samples or groups appearing closer together on the dendrogram. This visualization can help identify clusters of similar samples, reveal patterns in the data, and highlight differences between groups. It’s a valuable tool for understanding the relationships and structure within your dataset.
Similarly, you can select the number of proteins to include in the cluster analysis. The percentage indicates the ratio of the selected top variable proteins to the total number of proteins in the dataset. You can also choose from various agglomeration and distance methods to customize the clustering process. These options allow you to refine the analysis and tailor the clustering approach based on the characteristics of your data and your specific research needs.
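The correlation values behind such a heatmap can be sketched as follows. This is an illustration only: it uses Pearson correlation and converts it to a distance as 1 - r, which is one common choice for clustering. The distance and agglomeration methods actually offered by the app are not specified here, and the sample names and values are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(samples):
    """samples: dict sample name -> list of log2 intensities (same protein order)."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}


samples = {
    "ctl_1": [10.0, 12.0, 14.0, 16.0],
    "ctl_2": [10.5, 12.1, 13.8, 16.2],
    "trt_1": [16.0, 14.0, 12.0, 10.0],
}
corr = correlation_matrix(samples)
# Correlation-based distance: similar samples end up close to 0.
dist = {a: {b: 1.0 - r for b, r in row.items()} for a, row in corr.items()}
```

A hierarchical clustering routine would then merge the pair with the smallest distance first, so the two control samples in this toy example would join before either joins the treated sample.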
Group Selection
Navigate to the [Group selection] panel to explore different ways of grouping your data for visualization. You can select variables or categories to group your data, such as experimental conditions, genes, or sexes. This flexibility allows you to customize the visualization and highlight specific aspects of your data for more detailed analysis. Use the options in the panel to easily switch between different groupings and gain insights from various perspectives.
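Because the group information file must match the expression table exactly, it can help to check the two against each other before uploading. The sketch below is a hypothetical pre-upload check, not part of JUMP Shiny. The helper names, the expression table column names, and the example contents are assumptions.

```python
import csv
import io


def parse_group_info(text):
    """Parse a group information CSV into {sample name: {grouping column: value}}."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if not header or header[0] not in ("Sample", "sample"):
        raise ValueError("first column must be named 'Sample' or 'sample'")
    return {row[header[0]]: {k: row[k] for k in header[1:]} for row in reader}


def match_samples(expression_columns, groups):
    """Split group-file samples into those present in the expression table and those missing."""
    cols = set(expression_columns)
    matched = [s for s in groups if s in cols]
    missing = [s for s in groups if s not in cols]
    return matched, missing


# Hypothetical file contents and expression table header.
group_text = "Sample,Condition,Sex\nS1,control,male\nS2,treatment,female\nS3,treatment,male\n"
expression_columns = ["Accession", "GeneName", "Description", "S1", "S2", "S4"]

groups = parse_group_info(group_text)
matched, missing = match_samples(expression_columns, groups)
```

In this example `S3` is listed in the group file but absent from the expression table, so it ends up in `missing`. `S4` is in the table but not in the group file, so it would be left out of the analysis.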
Batch Normalization
Batch Normalization aims to correct unwanted technical variation in protein expression data arising from experimental batch effects. By applying normalization techniques, you can ensure that the observed differences in protein expression are due to biological variation rather than technical artifacts. This process offers several key benefits:
- Improving the Accuracy of Differential Expression: Normalization helps in accurately identifying true biological differences by eliminating technical noise.
- Reducing the Impact of Batch Effects: It minimizes the influence of variations introduced during different experimental batches, leading to more consistent and reliable data.
- Enhancing the Reproducibility of Results: By standardizing the data, normalization ensures that the results are reproducible across different experiments and studies.
Steps for Batch Normalization
- Navigate to Batch Normalization Tab
  Click on the Batch Normalization tab in the left sidebar of this page.
- Select Normalization Method
  Choose the appropriate normalization method based on your data:
  - Internal: If your data has an internal reference, such as TMT data, you can normalize the data based on an internal sample.
  - Linear: If your data doesn’t have an internal sample, select linear normalization. Linear normalization adjusts your dataset based on overall trends, bringing all samples to a common scale and correcting for systematic technical variations.
  - Internal+Linear: If your data has an internal reference, you can apply internal normalization first and then linear normalization for better results.
- Selecting Batch Group Information
  To perform batch normalization, specify the batch and internal reference columns as provided in the sample information file. Use the dropdown menus to choose the batch identifier column and the column that contains the internal reference sample information.
  Once you have selected the appropriate batch and internal reference columns, click the [Run Batch Normalization] button to initiate the normalization process.
  Select your batch group column in the dropdown menu.
  - Internal Method: Format: Include one more Info column. Make sure to specify the internal samples.
  - Linear Method: Format: No internal reference column is needed.
- Normalization Results
  After normalization, the Data Table will appear on the right side of the page. As in Exploratory Analysis, Intensity Distribution, PCA, and Sample Correlation plots are generated to assess the effectiveness of the normalization.
- Proceed to Differential Expression
  Now, you can directly proceed to Differential Expression analysis.
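The idea behind internal reference normalization can be sketched as follows: within each batch, every sample is expressed relative to that batch's internal reference, protein by protein, on the log2 scale. This is a common approach for TMT data and is shown only as an illustration. It is an assumption that JUMP Shiny's Internal method works this way, and the function name, argument layout, and toy values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def internal_reference_normalize(log2_values, batch_of, is_reference):
    """Subtract each batch's per-protein reference level from its samples.

    log2_values: dict sample -> list of log2 intensities (same protein order).
    batch_of: dict sample -> batch label.
    is_reference: dict sample -> True for internal reference samples.
    """
    refs = defaultdict(list)
    for sample, batch in batch_of.items():
        if is_reference[sample]:
            refs[batch].append(log2_values[sample])

    normalized = {}
    for sample, values in log2_values.items():
        batch_refs = refs[batch_of[sample]]
        if not batch_refs:
            raise ValueError(f"batch {batch_of[sample]!r} has no internal reference")
        # Per-protein reference level: mean over this batch's reference samples.
        ref_level = [mean(col) for col in zip(*batch_refs)]
        normalized[sample] = [v - r for v, r in zip(values, ref_level)]
    return normalized


# Toy data: batch B2 reads 1 log2 unit higher than B1 for every protein.
log2_values = {
    "b1_ir": [10.0, 12.0], "b1_s1": [11.0, 12.5],
    "b2_ir": [11.0, 13.0], "b2_s1": [12.0, 13.5],
}
batch_of = {"b1_ir": "B1", "b1_s1": "B1", "b2_ir": "B2", "b2_s1": "B2"}
is_reference = {"b1_ir": True, "b1_s1": False, "b2_ir": True, "b2_s1": False}

norm = internal_reference_normalize(log2_values, batch_of, is_reference)
```

After normalization the two study samples have identical values, because the constant offset between the batches is carried by the reference samples and cancels out.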
Differential Expression
Differential Expression is a method used to identify proteins that show significant differences in expression levels between groups or conditions. By comparing the expression levels across different groups, such as diseased versus healthy samples, researchers can pinpoint specific molecules that are upregulated or downregulated, providing insights into the biological processes involved.
Steps for Differential Expression
- Navigate to Differential Expression Tab
  Click on the Differential Expression tab in the left sidebar of this page.