Hello! I'm trying to make an UpSetR plot showing overlapping gene annotations between 17 transcriptomes.
What am I doing wrong?
Is there a bug in the UpSetR library?
I posted on the UpSetR library's community GitHub forum, but did not really get an answer to my questions. My R language expert colleagues at UC Davis could not seem to figure it out either.
Here is what I know:
There ARE overlapping identities between all 17 of my sets (see code and output below).
However, my UpSetR plots do not reflect this.
Below is some R code with a downloadable set of data.
library(UpSetR)
# https://cran.r-project.org/web/packages/UpSetR/
# https://cran.r-project.org/web/packages/UpSetR/vignettes/basic.usage.html
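# Download the presence/absence table (only once) and import it as a data.frame
if (!file.exists("presence_absence.csv")) {
  download.file("https://raw.githubusercontent.com/johnsolk/RNAseq_15killifish/master/evaluation/presence_absence.csv", "presence_absence.csv")
}
pa <- read.csv("presence_absence.csv")
pa <- pa[, -1]               # drop the first column
rownames(pa) <- pa$Ensembl   # use the Ensembl IDs as row names
pa <- pa[, -1]               # drop the next column, leaving the per-species presence/absence columns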
This plot displays 4 sets. THIS WORKS. This is what we should see for all 17 sets.
upset(pa, sets = c("A_xenica", "F_catanatus", "F_chrysotus", "F_diaphanus"), mb.ratio = c(0.55, 0.45), keep.order = FALSE)
But as soon as I move to 6 sets, I am confused: there is no longer a bar crossing all 6 sets.
upset(pa, sets = c("A_xenica", "F_catanatus", "F_notatus", "F_chrysotus", "F_diaphanus", "F_nottii"), mb.ratio = c(0.55, 0.45), nsets = 6, keep.order = FALSE)
The plot for the full 17 sets does not make any sense:
upset(pa,
      sets = c("A_xenica", "F_catanatus", "F_chrysotus", "F_diaphanus",
               "F_grandis", "F_heteroclitusMDPL", "F_heteroclitusMDPP", "F_notatus",
               "F_nottii", "F_olivaceous", "F_parvapinis", "F_rathbuni",
               "F_sciadicus", "F_similis", "F_zebrinus", "L_goodei", "L_parva"),
      mb.ratio = c(0.55, 0.45), keep.order = FALSE)
This confirms that there are genes shared by all 17 sets (and genes present in exactly 16, 15, 14, etc.):
test<- rowSums(pa)
test<- as.data.frame(test)
colnames(test) <- c("sum")
sum(test$sum == 17)
sum(test$sum == 16)
sum(test$sum == 15)
sum(test$sum == 14)
Output:
> sum(test$sum == 17)
[1] 10897
> sum(test$sum == 16)
[1] 2979
> sum(test$sum == 15)
[1] 1585
> sum(test$sum == 14)
[1] 1247
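The row-sum check above can be taken one step further by counting the rows that share each exact membership pattern, which is the kind of quantity an UpSet bar represents. The following is a minimal base-R sketch on a made-up 3-set table (the `toy` data frame and the `max_combos` helper are hypothetical, not part of the original analysis). It also shows how quickly the number of possible set combinations grows with the number of sets.

```r
# Hypothetical toy presence/absence table: 3 sets instead of 17
toy <- data.frame(
  A = c(1, 1, 1, 0, 1, 0),
  B = c(1, 1, 0, 0, 1, 1),
  C = c(1, 0, 0, 1, 1, 1)
)

# Collapse each row into a membership string such as "110"
pattern <- apply(toy, 1, paste, collapse = "")

# One count per exact membership pattern, largest first
pattern_counts <- sort(table(pattern), decreasing = TRUE)
print(pattern_counts)

# Rows present in every set (same idea as sum(test$sum == 17) above)
n_all <- sum(rowSums(toy) == ncol(toy))

# Upper bound on the number of non-empty combinations of n sets
max_combos <- function(n) 2^n - 1
print(sapply(c(4, 6, 17), max_combos))
```

Running the same tabulation on `pa` would list every membership pattern with its count, independently of any plotting library, so the numbers can be compared against the bars that a plot actually draws.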
I'm wondering if there is a bug in the UpSetR library that prevents displaying overlaps from such a large number of sets. I remembered Titus made a similar set of data looking at annotations from a few species with the Python library pyupset. When I made a similar notebook using pyupset with my data, there was a performance issue: the notebook ran overnight and didn't produce any output. This makes me think that it might not be possible to produce a plot with 17 sets?
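One thing that may be worth ruling out before concluding there is a bug: `upset()` has an `nintersects` argument that defaults to 40, and its documentation says that `NA` plots all intersections. With 6 or more sets there can be more than 40 non-empty combinations, so some bars are necessarily left out at the default. Whether that explains the missing all-sets bar here is an assumption that has not been verified against this data. The sketch below uses made-up random data (the `toy` data frame is hypothetical) and only runs the plot if UpSetR is installed.

```r
# Hypothetical toy data: 6 sets, 200 rows, random 0/1 membership
set.seed(1)
toy <- as.data.frame(matrix(rbinom(200 * 6, 1, 0.7), ncol = 6))
colnames(toy) <- paste0("set", 1:6)

if (requireNamespace("UpSetR", quietly = TRUE)) {
  # nintersects = NA asks for every non-empty intersection to be drawn;
  # order.by = "freq" sorts the bars by size
  print(UpSetR::upset(toy,
                      sets = colnames(toy),
                      nintersects = NA,
                      order.by = "freq",
                      mb.ratio = c(0.55, 0.45)))
}
```

The same two arguments could be added to the `upset()` calls on `pa`. For all 17 sets, `nintersects = NA` may produce a very wide and slow plot, so a larger finite value may be a more practical first test.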
Any insight anyone has would be greatly appreciated!