Table of contents

  1. Introduction
  2. Data normalization
  3. Differential expression
  4. Pathway analysis

Introduction

This tutorial describes the analysis of the RNA-seq dataset GSE53053 using the limma pipeline. The dataset measures gene expression in bone-marrow-derived macrophages that were left untreated (Ctrl) or treated with either LPS+IFNg or IL4 (Jha et al., Immunity, 2015).

Let’s start by opening the dataset in Phantasus via the link https://alserglab.wustl.edu/phantasus/?geo=GSE53053

Data normalization

We will use the voom method from the limma package to transform the gene counts into log-expression values. This essentially produces log2 counts-per-million (CPM) values, but it also internally calculates precision weights that can then be used in the differential expression analysis.
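To make the transformation concrete, here is a minimal Python sketch of the log2-CPM step as described in the voom publication: each count is offset by 0.5 and each library size by 1 before taking the logarithm. It assumes library sizes are plain column sums (no normalization factors) and it does not compute the precision weights. The function name and the toy counts are hypothetical and are not part of Phantasus or limma.

```python
import math


def voom_log_cpm(counts):
    """Return log2-CPM values for a dict of gene -> list of raw counts per sample.

    Assumption: library size is the plain column sum of the counts.
    """
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample = total counts in that column.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [
            math.log2((c + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j, c in enumerate(row)
        ]
        for gene, row in counts.items()
    }


# Hypothetical toy data: three genes, two samples.
toy_counts = {
    "GeneA": [10, 20],
    "GeneB": [0, 5],
    "GeneC": [989, 1974],
}
log_cpm = voom_log_cpm(toy_counts)
```

The 0.5 offset keeps zero counts finite after the logarithm, which is why GeneB still gets a value in the first sample.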

To do this, open the Tools/Advanced normalization/Voom menu. Add condition as a factor to the design matrix. Also, check the Automatically filter out lowly expressed genes checkbox to remove non-expressed genes.
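As an illustration of what adding condition as a factor means, the sketch below builds an indicator (one-hot) design matrix with one column per level of the factor. Whether Phantasus uses this exact parameterization internally is an assumption. The function name and the number of samples per group are hypothetical; only the level names come from this dataset.

```python
def design_matrix(conditions):
    """Build a one-column-per-level indicator matrix from a list of labels."""
    levels = sorted(set(conditions))
    # Each row has a single 1 in the column of that sample's condition.
    rows = [[1 if cond == level else 0 for level in levels] for cond in conditions]
    return levels, rows


# Hypothetical sample annotation: two samples per condition.
sample_conditions = [
    "Ctrl", "Ctrl",
    "MandIL4", "MandIL4",
    "MandLPS_andIFNg", "MandLPS_andIFNg",
]
levels, design = design_matrix(sample_conditions)

for cond, row in zip(sample_conditions, design):
    print(cond, row)
```

With such a matrix, each column coefficient estimates the mean log-expression of one group, and a comparison between two groups becomes a difference of two coefficients.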

After clicking OK, a new tab will appear with logCPM values for 15,172 genes.

Now, using Tools/Plots/PCA Plot, we can observe general concordance within the groups, although the MandIL4 samples seem to be noisier.

Differential expression

We will use Tools/Differential expression/Limma for differential expression analysis. To properly account for the weights calculated by voom in the previous step, the Advanced design tab should be used. As before, add the condition factor to the design. Select MandLPS_andIFNg as the target level and Ctrl as the reference. Click OK to run the analysis.

The differential expression results will appear as new columns. By clicking on the t column twice, we can make the most up-regulated genes appear at the top of the heatmap.
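To give a feel for what the t column expresses, the following sketch computes an ordinary pooled-variance two-sample t-statistic per gene (target versus reference) and sorts the genes by decreasing t. This is a deliberately simplified stand-in: it is not limma's moderated t-statistic, and it ignores the voom precision weights. The gene names and values are hypothetical toy data.

```python
import math
from statistics import mean, variance


def two_sample_t(target, reference):
    """Ordinary pooled-variance t-statistic (not limma's moderated t).

    Assumes at least two values per group and non-zero pooled variance.
    """
    n1, n2 = len(target), len(reference)
    pooled_var = (
        (n1 - 1) * variance(target) + (n2 - 1) * variance(reference)
    ) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean(target) - mean(reference)) / std_err


# Hypothetical log-expression values for three genes.
toy_expr = {
    "GeneUp": {"target": [8.0, 8.2, 7.9], "reference": [5.0, 5.1, 4.8]},
    "GeneFlat": {"target": [6.0, 6.3, 5.8], "reference": [6.1, 5.9, 6.2]},
    "GeneDown": {"target": [3.0, 3.2, 2.9], "reference": [6.0, 6.1, 5.7]},
}
t_stats = {g: two_sample_t(v["target"], v["reference"]) for g, v in toy_expr.items()}

# Sorting by decreasing t puts the most up-regulated genes first.
ranked = sorted(t_stats, key=t_stats.get, reverse=True)
```

A large positive t means higher expression in the target level than in the reference, which is why sorting the t column in descending order brings up-regulated genes to the top.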

Pathway analysis

For pathway analysis with the fgsea tool, we need Entrez gene identifiers. To get them, let’s use the Tools/Annotate/Annotate rows/From database menu. Select org.Mm.eg.sqlite - Mus musculus in the Specimen DB field. For Source column, use ENSEMBLID; the Source column type is then ENSEMBL. Choose ENTREZID as the Result column type.

A new ENTREZID column will appear with the corresponding gene identifiers. However, there is not always a one-to-one correspondence between Ensembl and Entrez IDs, which results in duplicated or missing Entrez IDs. We need to deal with this before running fgsea, as it requires unique Entrez entries.

Let’s use the Tools/Collapse menu. Select Maximum Mean Probe as the Collapse method and ENTREZID as the collapsing field. Also check the Omit unannotated checkbox.
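The collapsing step can be sketched as follows, assuming that Maximum Mean Probe keeps, for each Entrez ID, the single row with the highest mean expression, and that Omit unannotated drops rows with no Entrez ID. The function name and the row and gene identifiers are hypothetical.

```python
from statistics import mean


def collapse_max_mean(rows):
    """Keep one row per Entrez ID: the row with the highest mean expression.

    rows: list of (row_id, entrez_id or None, list of expression values).
    Rows without an Entrez ID are dropped (the "Omit unannotated" behavior).
    """
    best = {}
    for row_id, entrez_id, values in rows:
        if not entrez_id:
            continue  # unannotated row, skip it
        row_mean = mean(values)
        if entrez_id not in best or row_mean > best[entrez_id][0]:
            best[entrez_id] = (row_mean, row_id, values)
    return {entrez: (row_id, values) for entrez, (_, row_id, values) in best.items()}


# Hypothetical rows: two Ensembl IDs map to Entrez "111", one row is unannotated.
toy_rows = [
    ("ENSMUSG_A1", "111", [5.0, 5.5]),
    ("ENSMUSG_A2", "111", [7.0, 7.5]),
    ("ENSMUSG_B", "222", [3.0, 3.2]),
    ("ENSMUSG_C", None, [9.0, 9.1]),
]
collapsed = collapse_max_mean(toy_rows)
```

After collapsing, every remaining row has a distinct Entrez ID, which is the uniqueness that fgsea requires.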

After running the collapse tool, a new tab will appear with around 13,000 unique Entrez genes.

Now we can run Tools/Pathway analysis/Perform FGSEA. We will use the Gene Ontology biological processes pathway database, limma’s moderated t-statistic for gene ranking, and the ENTREZID column for gene IDs (in accordance with the pathway database).

A table with the FGSEA results will appear in a new tab.

By clicking, for example, on Cellular response to interferon-beta, we can see the pathway details. Copy the pathway gene IDs and paste them into the search field of the heatmap to highlight genes from the pathway. Sort the rows by decreasing t and use View/Fit To Window for a bird’s-eye view of how the pathway genes are distributed within the dataset.

Finally, an individual GSEA plot for this gene set can be made with Tools/Plots/GSEA plot. There, select t for ranking and condition for annotation.
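The curve drawn in a GSEA plot is a running-sum statistic over the ranked gene list. The sketch below follows the standard weighted formulation from the original GSEA method: walking down the list sorted by decreasing t, the sum increases at genes in the set (in proportion to |t|) and decreases at genes outside it, and the enrichment score is the point of maximum deviation from zero. It is a minimal illustration with hypothetical names and toy values, not the fgsea implementation, and it computes no p-values.

```python
def running_enrichment(ranked_genes, stats, gene_set, weight=1.0):
    """Return the running-sum curve and the enrichment score (ES).

    Assumes the gene set contains at least one ranked gene with non-zero
    statistic and that at least one ranked gene is outside the set.
    """
    in_set = [g in gene_set for g in ranked_genes]
    hit_total = sum(
        abs(stats[g]) ** weight for g, hit in zip(ranked_genes, in_set) if hit
    )
    n_miss = len(ranked_genes) - sum(in_set)
    running, curve = 0.0, []
    for g, hit in zip(ranked_genes, in_set):
        if hit:
            running += abs(stats[g]) ** weight / hit_total  # step up at a hit
        else:
            running -= 1.0 / n_miss  # step down at a miss
        curve.append(running)
    es = max(curve, key=abs)  # maximum deviation from zero
    return curve, es


# Hypothetical t-statistics and a two-gene set concentrated at the top.
toy_stats = {"g1": 4.0, "g2": 3.0, "g3": 1.0, "g4": -0.5, "g5": -2.0, "g6": -3.0}
ranked_genes = sorted(toy_stats, key=toy_stats.get, reverse=True)
curve, es = running_enrichment(ranked_genes, toy_stats, {"g1", "g2"})
```

A gene set whose members cluster at the top of the ranking yields a curve that peaks early with a positive score, while a set clustered at the bottom yields a negative score.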