Welcome to JUMP Shiny
JUMP Shiny is a comprehensive web platform designed for mass spectrometry-based proteomics analysis. It offers a wide range of analytical tools, including experimental design, exploratory analysis, batch normalization, differential expression, and enrichment pathway analysis. The platform provides intuitive visualizations and functionalities that facilitate in-depth exploration of proteomics data. The tutorial tab will guide you through the user interface, functionalities, and detailed steps to perform the analysis.
Citation
If you use JUMPshiny in your work, please cite:
Zhang, A., Fu, Y., Yuan, Z.-F., Wu, L., Kong, D., Li, L., Wu, Z., Prins, P., Peng, J., & Wang, X. (n.d.). JUMPshiny: A User-Friendly Platform for Comprehensive Analysis and Visualization of Quantitative Proteomics Data. PROTEOMICS, n/a(n/a), e70061. https://doi.org/10.1002/pmic.70061
Contact Information
Aijun Zhang: azhang16@uthsc.edu
Xusheng Wang: xwang39@uthsc.edu
Wang Lab@2025: Lab Website
Acknowledgement
Part of the code was adapted from project TCC-GUI under MIT license. We truly appreciate and respect their contributions.
- Experiment Design
- Exploratory Analysis
- Batch Normalization
- Differential Expression
- Enrichment pathway analysis
Experiment Design
Experiment Design offers tools for organizing sample processing sequences in proteomics experiments. By using block randomization, it assigns samples to different batches, minimizing batch effects and reducing the risk of introducing confounders that could bias data interpretation.
The Experiment Design algorithm comprises three key procedures:
- Generating a batch design matrix based on the distribution of the first explanatory variable, while considering the specified number and size of batches.
- Allocating samples to each batch, factoring in the distribution of the second explanatory variable.
- Optimizing the batch design scenario for the third and subsequent variables.
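The three procedures above can be illustrated with a simplified sketch. This is not JUMP Shiny's actual implementation: the function names (`assign_batches`, `imbalance`, `design`), the round-robin dealing, and the imbalance score are assumptions chosen for illustration. The sketch spreads the levels of the first factor evenly across batches, then repeats the randomization several times and keeps the run in which the remaining factors are most balanced.

```python
import random
from collections import Counter, defaultdict

def assign_batches(samples, factor, n_batches, rng):
    """Deal samples into batches so each level of `factor` is spread evenly."""
    by_level = defaultdict(list)
    for s in samples:
        by_level[s[factor]].append(s)
    assignment = {}
    position = 0  # running counter keeps batch sizes within one of each other
    for level in sorted(by_level):
        group = by_level[level]
        rng.shuffle(group)  # random order within each level
        for s in group:
            assignment[s["SampleID"]] = position % n_batches
            position += 1
    return assignment

def imbalance(samples, assignment, factor, n_batches):
    """Sum over levels of (max - min) per-batch counts; 0 means perfectly even."""
    counts = defaultdict(Counter)
    for s in samples:
        counts[s[factor]][assignment[s["SampleID"]]] += 1
    return sum(
        max(c[b] for b in range(n_batches)) - min(c[b] for b in range(n_batches))
        for c in counts.values()
    )

def design(samples, factors, n_batches, n_runs=100, seed=0):
    """Repeat the randomization n_runs times; keep the most balanced run."""
    rng = random.Random(seed)
    best, best_score = None, None
    for _ in range(n_runs):
        candidate = assign_batches(samples, factors[0], n_batches, rng)
        # Score only the second and later factors; the first is balanced by construction.
        score = sum(imbalance(samples, candidate, f, n_batches) for f in factors[1:])
        if best_score is None or score < best_score:
            best, best_score = candidate, score
    return best
```

Here `n_runs` plays the role of the optimization level: more runs give the search more chances to find a balanced layout for the later factors, at the cost of run time.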
Steps for Experiment Design
-
Navigate to Experiment Design Tab
Click on the Experiment Design tab in the left sidebar of this page.
-
Upload Sample Information
At the top left, click [Browse...] to upload your sample information table, which should be in .csv format.
Ensure your file follows the correct format shown below. The first column should be SampleID, with the factors to be considered starting from the second column. You can also download the example file by clicking [Download example sample information].
The [Sample Information Table] and the [Factor Distributions] plot, which shows the distribution of each factor (maximum: 3 plots), will be displayed after a successful upload. Note that factors should be categorical variables, and continuous variables should be grouped before uploading (such as AgeGroup in the table below).
-
Experiment Settings
Choose your experiment type as either [Label-free] or [TMT-labeling].
Input the number of samples in a batch. For a [TMT-labeling] experiment, this could be 10, 11, 16, or 18. A WARNING will be shown if the number is greater than 18. For [Label-free], there is no limit on the batch size.
Input the number of IR (Internal Reference) samples used in a batch. This is typically 0, 1, or 2.
The [Optimization level] is the number of times the block randomization program is run to find the best result. You can leave it at the default value.
Note that when the number of factors is greater than two, an equal distribution of the third and subsequent factors across batches cannot be guaranteed. In such cases, prioritizing factors becomes essential, and the most important factors should be placed as the first two factors to be considered.
-
Run Experiment Design
Click the [Run experiment design] button. After the program finishes running, it will generate a [Batch and channel assignation] table and a [Batch design matrix] for each factor.
The [Batch and channel assignation] table contains the information provided by the user alongside the batch information generated by the program. For TMT-labeling experiments, the table contains an additional column specifying the assigned channel for each sample.
Click [Download all results] to download all results as an .xlsx file.
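As a rough aid for choosing the batch settings described above, the arithmetic below estimates how many batches a design needs. It assumes, hypothetically, that the batch size counts every sample in a batch including the IR samples; the tutorial does not state this, so check it against the tool's behavior. The function name `batches_needed` is illustrative.

```python
import math

def batches_needed(n_samples, batch_size, n_ir=0):
    """Number of batches needed to hold n_samples experimental samples.

    Assumption (not from the tutorial): batch_size includes the n_ir
    internal reference samples, so each batch holds batch_size - n_ir
    experimental samples.
    """
    slots = batch_size - n_ir
    if slots <= 0:
        raise ValueError("batch_size must be larger than the number of IR samples")
    return math.ceil(n_samples / slots)

# 45 samples in batches of 16 with 1 IR sample each: 15 slots per batch
print(batches_needed(45, 16, n_ir=1))  # prints 3
```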
Exploratory Analysis
Exploratory Analysis provides a user-friendly interface for uploading and visualizing datasets. It summarizes key dataset characteristics, helping users understand data distribution and identify underlying patterns. This section provides effective quality control of your data.
Steps for Exploratory Analysis
-
Navigate to Exploratory Analysis Tab
Click on the Exploratory Analysis tab in the left sidebar of this page.
-
Upload Data
Users can upload a proteomic dataset, which should be in tab-delimited text or csv file format. Please ensure your file follows the correct format shown below. For a large dataset (over 30 MB), please use JUMP Shiny locally.
In addition, please click Download example input expression table for an example.
The input data should begin with three annotation columns: Protein Accession Number (e.g., UniProt), Gene Name, and Protein Description, followed by as many sample columns as needed. JUMP Shiny supports raw abundance values as well as log2-transformed values.
Note: if your data is already log2 transformed, please choose the log2 option.
An example of input data is shown below:
If successfully uploaded, the input data will be displayed in the Protein expression table panel (see below).
JUMP Shiny Format:
If your data contains Accession number, Gene Name, Description, and samples, use jumpshiny to upload. The first column is required.
JUMPq Result Format:
If the sample columns in your data start at the 24th column, use jumpq to upload. Please remove the header rows of jumpq data.
JUMPq Batch Result Format:
If your data contains batch info, use jump_batch to upload.
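To make the JUMP Shiny input layout concrete, here is a minimal sketch that reads a table with three annotation columns followed by sample columns and log2-transforms raw abundances. It is an illustration, not the platform's loader: the function name `read_expression_table`, the treatment of empty or `NA` cells as missing, and the accession `P00001` in the usage example are all assumptions made up for this sketch.

```python
import csv
import io
import math

def read_expression_table(text, delimiter="\t", already_log2=False, n_annotation=3):
    """Parse an expression table; return (sample_names, rows).

    Each row is (annotation_fields, values), where values are log2
    intensities and None marks a missing or non-positive measurement.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    sample_names = header[n_annotation:]
    rows = []
    for record in reader:
        values = []
        for cell in record[n_annotation:]:
            if cell.strip() in ("", "NA"):
                values.append(None)  # assumed encoding of missing values
                continue
            x = float(cell)
            if already_log2:
                values.append(x)  # keep values that are already log2
            else:
                values.append(math.log2(x) if x > 0 else None)
        rows.append((record[:n_annotation], values))
    return sample_names, rows

# Made-up one-protein table for illustration only
table = "Accession\tGene\tDescription\tS1\tS2\nP00001\tGENE1\tmade-up protein\t1024\t\n"
names, rows = read_expression_table(table)
print(names, rows[0][1])
```

Passing `already_log2=True` corresponds to choosing the log2 option for data that has already been transformed, which avoids transforming it twice.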
-
Group Assignment
After loading the dataset, input your grouping in the [Meta information] panel.
Group Information File
The Group Information File is required for the data analysis. It should adhere to the following structure:
- Sample Name Column: The first column should contain your sample names. These names must exactly match the corresponding column names in your input expression table. Only the columns specified in this file will be included in the analysis, so ensure that they are correct, complete, and matched.
- Grouping Name Column: After the sample name column, you can include one or more grouping columns. Each grouping column can represent different categories or factors relevant to your analysis (e.g., “control” vs. “treatment,” “male” vs. “female,” etc.). You can add as many grouping columns as necessary to capture all relevant grouping factors.
Note: Headers are required in this file to clearly identify each column. The header of the first column must be named Sample or sample.
Below is an example of a Group Information file:
-
Confirm and Analyze
Click the [Assign group information] button and wait for the [Summary] section to display additional information about your dataset. You can download and save the plots in .svg format for further analysis or publication. All plots can be zoomed in and out for closer examination of the data.
Intensity Distribution Plot
By clicking the [Intensity Distribution] tab, you can view box plots for all uploaded samples, with each group highlighted in a different color. You can filter out proteins with low intensity and customize the title, X-axis, and Y-axis labels as needed.
PCA Plot
The PCA Plot visualizes the distribution of selected groups based on Principal Component Analysis (PCA). This plot helps in identifying patterns and trends in your dataset by reducing the dimensionality and highlighting the differences and similarities between groups. We include 2D and 3D PCA plots. Each point in the plot represents a sample, and the position of the points indicates their relative similarity or difference based on the principal components. The axes represent the first two/three principal components, which capture the most variance in the data. This visualization can be useful for identifying outliers, clusters, and potential relationships between groups.
You can define the number of top variable proteins included in the PCA. The results may vary depending on the number of proteins selected. Additionally, you can toggle the buttons to display or hide labels on the plot.
Sample Correlation
The sample correlation heatmap visualizes the correlation within and between groups. This method organizes samples and features into a hierarchical tree, known as a dendrogram, based on their similarity or dissimilarity. The heatmap uses color gradients to represent the intensity of the correlation, with closely related samples or groups appearing closer together on the dendrogram. This visualization can help identify clusters of similar samples, reveal patterns in the data, and highlight differences between groups. It’s a valuable tool for understanding the relationships and structure within your dataset.
Similarly, you can select the number of proteins to include in the cluster analysis. The percentage indicates the ratio of the selected top variable proteins to the total number of proteins in the dataset. You can also choose from various agglomeration and distance methods to customize the clustering process. These options allow you to refine the analysis and tailor the clustering approach based on the characteristics of your data and your specific research needs.
Group Selection
Navigate to the [Group selection] panel to explore different ways of grouping your data for visualization. You can select variables or categories to group your data, such as experimental conditions, genes, or sexes. This flexibility allows you to customize the visualization and highlight specific aspects of your data for more detailed analysis. Use the options in the panel to easily switch between different groupings and gain insights from various perspectives.
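The idea of restricting PCA and sample correlation to the top variable proteins can be sketched as follows. This is a generic illustration under stated assumptions: ranking by sample variance and using Pearson correlation are common choices, but the tutorial does not say which measures JUMP Shiny uses, and the names `top_variable`, `pearson`, and `sample_correlation` are hypothetical.

```python
import math
import statistics

def top_variable(proteins, n):
    """Return the n proteins with the largest variance across samples."""
    ranked = sorted(proteins.items(),
                    key=lambda item: statistics.variance(item[1]),
                    reverse=True)
    return dict(ranked[:n])

def pearson(x, y):
    """Pearson correlation of two equal-length lists (neither may be constant)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def sample_correlation(proteins, n_samples):
    """Sample-by-sample correlation matrix computed over the given proteins."""
    columns = [[values[j] for values in proteins.values()] for j in range(n_samples)]
    return [[pearson(a, b) for b in columns] for a in columns]

# Made-up log2 intensities: three proteins measured in four samples
proteins = {
    "A": [1.0, 1.0, 1.0, 1.1],
    "B": [1.0, 5.0, 1.0, 5.0],
    "C": [2.0, 4.0, 2.0, 4.0],
}
selected = top_variable(proteins, 2)
percentage = 100 * len(selected) / len(proteins)  # share of proteins kept
matrix = sample_correlation(selected, 4)
```

Because the correlation matrix depends on which proteins are kept, changing the number of top variable proteins can change the clustering, which is why results may vary with this setting.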
Batch Normalization
Batch Normalization aims to correct unwanted technical variation in protein expression data arising from experimental batch effects. By applying normalization techniques, you can ensure that the observed differences in protein expression are due to biological variation rather than technical artifacts. This process offers several key benefits:
- Improving the Accuracy of Differential Expression: Normalization helps in accurately identifying true biological differences by eliminating technical noise.
- Reducing the Impact of Batch Effects: It minimizes the influence of variations introduced during different experimental batches, leading to more consistent and reliable data.
- Enhancing the Reproducibility of Results: By standardizing the data, normalization ensures that the results are reproducible across different experiments and studies.
Steps for Batch Normalization
-
Navigate to Batch Normalization Tab
Click on the Batch Normalization tab in the left sidebar of this page.
-
Select Normalization Method
Choose the appropriate normalization method based on your data:
- Internal: If your data has an internal reference, such as TMT data, you can normalize the data based on an internal sample.
- Linear: If your data doesn’t have an internal sample, select linear normalization. Linear normalization adjusts your dataset based on overall trends, bringing all samples to a common scale and correcting for systematic technical variations.
- Internal+Linear: If your data has an internal reference, you can apply internal normalization first, followed by linear normalization, for better results.
-
Selecting Batch Group Information
To perform batch normalization, specify the batch and internal reference columns as provided in the sample information file. Use the dropdown menus to choose the batch identifier column and the column that contains the internal reference sample information.
Once you have selected the appropriate batch and internal reference columns, click the [Run Batch Normalization] button to initiate the normalization process.
Select your batch group column in the dropdown menu.
- Internal Method: Format: Include one more Info column. Make sure to specify the internal samples.
- Linear Method: Format: No internal reference column is needed.
-
Normalization Results
After normalization, the Data Table will appear on the right side of the page. As in Exploratory Analysis, the Intensity Distribution, PCA plot, and Sample Correlation are generated to assess the effectiveness of the normalization.
-
Proceed to Differential Expression
Now, you can directly proceed to Differential Expression analysis.
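For readers who want a feel for what internal reference normalization does, the sketch below shows one generic approach on log2 data: within each batch, every value is shifted by the batch's internal reference level, then re-centered on the average reference level across batches. This is an assumption about the general technique, not a description of JUMP Shiny's code; the function name `internal_normalize` and its arguments are hypothetical, and missing values are not handled.

```python
from statistics import fmean

def internal_normalize(values, batch, is_ir):
    """Normalize one protein's log2 intensities to the internal reference.

    values: {sample: log2 intensity}
    batch:  {sample: batch identifier}
    is_ir:  {sample: True if the sample is an internal reference}
    """
    ir_by_batch = {}
    for sample, value in values.items():
        if is_ir[sample]:
            ir_by_batch.setdefault(batch[sample], []).append(value)
    # Average the reference samples within each batch (there may be 1 or 2).
    ir_mean = {b: fmean(v) for b, v in ir_by_batch.items()}
    # Re-center on the mean reference level so values stay on the original scale.
    grand_mean = fmean(list(ir_mean.values()))
    return {s: v - ir_mean[batch[s]] + grand_mean for s, v in values.items()}

# Made-up example: batch 2 reads 2 log2 units higher than batch 1
values = {"IR1": 10.0, "A1": 11.0, "IR2": 12.0, "B1": 13.0}
batch = {"IR1": 1, "A1": 1, "IR2": 2, "B1": 2}
is_ir = {"IR1": True, "A1": False, "IR2": True, "B1": False}
print(internal_normalize(values, batch, is_ir))
```

After the shift, the internal reference samples have the same value in every batch, so the remaining differences between samples no longer include the batch offset.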
Differential Expression
Differential Expression is a method used to identify proteins that show significant differences in expression levels between groups or conditions. By comparing the expression levels across different groups, such as diseased versus healthy samples, researchers can pinpoint specific molecules that are upregulated or downregulated, providing insights into the biological processes involved.
Steps for Differential Expression
-
Navigate to Differential Expression Tab
Click on the Differential Expression tab in the left sidebar of this page.
-
Adjust Differential Expression Parameters
Under the [Differential Expression Methods] panel, set the parameters for the comparison:
- Select the statistical method, and adjust the grouping variables for comparison.
- Define the type of comparison: pairwise or multiple-group comparison.
- Choose whether to perform data imputation before conducting the differential analysis.
JUMP Shiny offers three options: no imputation, imputation using the minProb method, or excluding all proteins with missing values in any sample (i.e., Data without NAs).
For imputation, JUMP Shiny handles missing values based on the following four cases:
- If there are no missing values in any group, no imputation is performed.
- If one group has more than one value while the other groups each have at least one value, no imputation is performed.
- If one group is completely missing while the other groups have more than one value, the missing values are imputed sample-wise with the top n minimum values in the group. The number of imputed values equals the maximum number of values among the other groups that have at least two values.
- If all groups have only one value or are completely missing, the protein row is discarded from the differential expression analysis.
Imputing missing values can help minimize potential biases and make the dataset more complete, thereby improving the reliability of the results. Depending on your specific needs and the characteristics of your data, you can select the most appropriate option for your analysis.
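The four cases listed above can be expressed as a small decision function. The sketch below is restricted to a pairwise comparison (two groups), where the cases are unambiguous; how they extend to three or more groups is not spelled out in the tutorial, so that is left out. The function name `imputation_case` and the use of `None` for a missing value are assumptions for illustration.

```python
def imputation_case(group_a, group_b):
    """Decide how a protein row is handled in a two-group comparison.

    Each group is a list of values with None marking a missing value.
    Returns 'no imputation', 'impute', or 'discard'.
    """
    n_a = sum(v is not None for v in group_a)
    n_b = sum(v is not None for v in group_b)
    if max(n_a, n_b) <= 1:
        # Both groups have one value or none: the row is dropped.
        return "discard"
    if min(n_a, n_b) == 0:
        # One group is entirely missing, the other has more than one value.
        return "impute"
    # Complete data, or one group with >1 values and the other with >=1.
    return "no imputation"

print(imputation_case([20.1, 19.8, 20.3], [None, None, None]))
```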
-
Run Differential Expression
After defining all parameters, click the [Run Differential Expression Test] button. The program will analyze your data and generate a result table with detailed information from the differential expression analysis. Depending on the type of comparison, the result table will include the p-value, FDR, and log2 fold change in the last three columns, along with the original expression levels (log2-transformed values) of each protein.
You can click the header of each column to sort the proteins by p-value, FDR, or log2 fold change. This allows you to easily identify the most significant proteins according to your criteria. Additionally, by selecting a particular protein, you can view its expression levels across all samples.
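To clarify how an FDR column relates to the p-value column, the sketch below implements the Benjamini-Hochberg adjustment, a widely used FDR procedure. The tutorial does not state which FDR method JUMP Shiny applies, so treat the choice of Benjamini-Hochberg as an assumption; the function name `bh_fdr` is illustrative.

```python
def bh_fdr(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(bh_fdr([0.01, 0.04, 0.03, 0.5]))
```

Each adjusted value is at least as large as the raw p-value, so a protein that passes an FDR cutoff also passes the same cutoff on its p-value, but not the other way around.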
-
Visualization of the Results of Differential Expression
-
Volcano Plot
Displays the distribution of result data, highlighting significant proteins based on p-value and log2 fold change. This plot helps in identifying proteins with substantial changes in expression and their statistical significance, making it easier to identify key proteins in your analysis.
You can customize the volcano plot by defining the graphic title, significance levels (p-value or FDR), and the cutoff values for up-regulated and down-regulated log2 fold changes. The determination of up-regulation or down-regulation is based on the parameters specified in the [Differential Expression Methods] section, such as the order of the groups being compared.
To label proteins with gene symbols in the volcano plot, click the + in the advanced parameters section and enter the protein accession numbers you would like to highlight. Additionally, if you click on a point in the volcano plot, a bar plot will be displayed beneath it. This bar plot provides a detailed view of the expression levels of the selected protein across all samples.
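The way significance and fold-change cutoffs partition a volcano plot can be sketched as below. The default cutoffs (0.05 and a log2 fold change of 1 in either direction) and the names `classify` and `volcano_point` are assumptions for illustration, not JUMP Shiny's defaults.

```python
import math

def classify(log2_fc, p_value, p_cutoff=0.05, up_cutoff=1.0, down_cutoff=-1.0):
    """Label a protein 'up', 'down', or 'ns' (not significant)."""
    if p_value < p_cutoff:
        if log2_fc >= up_cutoff:
            return "up"
        if log2_fc <= down_cutoff:
            return "down"
    return "ns"

def volcano_point(log2_fc, p_value):
    """Coordinates on a conventional volcano plot: x = log2 FC, y = -log10(p)."""
    return log2_fc, -math.log10(p_value)

print(classify(1.5, 0.01), classify(-2.0, 0.001), classify(3.0, 0.2))
```

The same function works with FDR values in place of p-values when FDR is chosen as the significance level. Note that the sign of the log2 fold change, and therefore which proteins count as up-regulated, depends on the order of the groups being compared.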
-