
CoMS 2023 Viromics Webinar Dr. Erin Harvey

0:00 Introduction
2:15 Dr. Harvey Formal Intro
Questions:
34:59 Question 1
41:34 Question 2
47:12 Question 3
48:45 Question 4
51:05 Question 5
54:11 Question 6
55:01 Question 7

Ohio State’s Center of Microbiome Science

1 month ago

Hello, thank you everyone for coming, and welcome to the fifth ECR viromics seminar of this year. Let me introduce our speaker for today. Her name is Erin Harvey. She's currently a fourth-year postdoc at the University of Sydney, Australia, in the lab of Eddie Holmes. The Holmes lab studies the emergence and evolution of RNA viruses, and Erin works on the viromes of Australian mammals and parasitic invertebrates, with a particular interest in the effects of changing land use and anthropogenic activities
on the viromes of endangered native species. Erin did her undergraduate studies at the University of New South Wales in Sydney, majoring in immunology and medical microbiology, and did her honours thesis at the Kinghorn Cancer Centre, within the genome informatics lab, under the supervision of Marcel Dinger. From there she joined Eddie Holmes' lab at the University of Sydney and began her PhD studies in 2016. Erin began doing virus discovery using metatranscriptomics in her PhD and did virome studies on Australian ticks and fleas, koalas, and Antarctic penguins and their ticks. She also worked on a project using drones to capture whale blow, from which total RNA was extracted and novel viruses were identified. Erin completed her PhD studies in 2019 and has been exploring her interest in mammalian viromes since then. She's currently spending two months collaborating with Sebastian Lequime's lab at the University of Groningen to learn more about endogenous viral elements. The title of her talk today is "The utility of phylogeny in virus discovery, host association, taxonomy and disease potential". So Erin, whenever you're ready, you can share your presentation.

Yeah, okay. Is that all good? Yeah, cool. So, my name's Erin. I'm from the University of Sydney, but I am currently in the Netherlands, hence you can see that there's still daylight outside. If I was in Sydney it would be midnight right now, so thank you to the people from Australia who have also joined today. This talk will be quite similar to the one I gave at the RdRp Summit, but I will go more in depth on things like methods, and give some more examples of where phylogenetic trees are really useful in virus discovery.

For most of us, we are doing our virus discovery entirely bioinformatically these days, which means that we're not really left with a lot of options for analysis techniques to put our virus sequences into context. We are pretty much restricted to comparative analysis, and in the words of the boss man, phylogeny is the obvious method. G introduced me pretty well. My background is that I've always wanted to study virus evolution, and that's why I've been with Eddie Holmes for so long in Sydney. For those who are not familiar with this figure, I think this figure from Mang's 2016 paper, "Redefining the invertebrate RNA virosphere", really captures the impact of metatranscriptomic virus discovery on our understanding of virus diversity.
For those not familiar, the red lines indicate the viruses that were identified in this study, and the gray lines are the viruses that were in the reference database at the time. As you can see, groups like the Bunyavirales were dramatically affected by this one study, let alone all the other studies that have come along behind it. And for groups like the Mono-Chu group, which was very new at the time, the taxonomy has now completely changed, entirely because of
metatranscriptomic virus discovery. But on the flip side of that coin, the databases that we rely upon are still very much manually curated. When they were set up, they were set up to handle maybe a few hundred entries per year, and many of those would have been resequencing of known viruses, whereas now they're receiving thousands of entries per year of completely novel viruses. Considering that they still rely on manual curation, that leads to a lot of incomplete and incorrect entries in these databases, and these errors can compound when people rely upon BLAST alone to assign things like taxonomy and potential host associations.

This is one such example. I hope Sabrina doesn't mind me using this example again, but I came across this virus while I was looking for an outgroup for a tree that I'll talk about in more detail later. This virus was assigned as a hanta-like virus by the author, but NCBI has assigned the taxonomy as being within the Mononegavirales. For those who don't know, hantaviruses are actually within the Bunyavirales, which is a different group of negative-stranded RNA viruses. Given how divergent this virus is, it's hard to tell if it really should be within this group or not. I have also put it in a tree with other bunyaviruses, and it looks equally divergent in that tree as well. So when you're working with highly divergent viruses, it can be quite difficult to assign taxonomy, and just using BLAST hits can be problematic.

So I'm going to go into some examples of where there are errors in the databases and how to deal with those, and also how to use phylogenetics to analyze your novel virus sequences. First I'll talk about an SRA mining project that I've been working on, and secondly I'll talk about a project that I did in collaboration with a group of ecologists who were sampling fecal samples from small mammals in Tasmania to look at the effects of land use on the viromes of these animals.

I'll go through the workflow now, just to get it out of the way, because it is pretty similar for both projects. Our lab does total RNA extractions, and the kit we use really depends on the type of sample, because our lab works with a huge range of sample types, but generally we go with the Qiagen RNeasy Plus kit. That kit is pretty good; it can handle most things, and I used it for the fecal samples I'll talk about later. I also used the QIAshredder columns with those samples, just to try and get absolutely everything I could out of the sample, because we weren't left with a lot to work with. We then do a ribosomal RNA depletion, and by "we" I mean we send our samples off to a company called AGRF in Melbourne and they do our library preps for us, and then we sequence on a NovaSeq. Once we get our sequencing back, we do standard QC, trim our reads and do a de novo assembly. At the moment I'm using MEGAHIT, but there are quite a few different de novo assemblers out there, and I think there are a couple of papers comparing them.

I'll go into my virus discovery pipeline now, but I just want to point out that even within our group alone, people use different methods depending on what kind of viruses they're looking for. For me, working mainly on mammals, this works. I use DIAMOND blastx with a custom database of virus proteins that was put together by Justine Charon when she was in our lab, and that collects anything that's got any kind of amino acid sequence similarity to a virus protein. But this data set will be filled with a lot of false positives, so I then remove false positives with a blastn against the NCBI nt database, and I remove everything that has sequence similarity to something that's not a virus. So I'm left with things that have no nucleotide sequence similarity to anything, and things that have sequence similarity to virus sequences. With that data set I then do another DIAMOND blastx against NCBI's nr database, and I pull out all of the hits that are still hitting virus proteins. At that stage I will look at things like the e-value, the length of my contig, and the length of the sequence similarity as well, and then I will manually
have a look at those contigs in Geneious. This gives me the opportunity to check for complete ORFs and to check that I'm seeing the genome structure that I would expect. Also, if you're getting multiple fragments of a virus that don't assemble into a nice full genome, you can try a reassembly in Geneious or find other missing contigs, and with segmented viruses, try to match up and get all of the segments that you need. I will then do a web BLAST with those contigs, just as another check to make sure that I'm not seeing anything like host proteins, or things associated with bacteria or [unclear], or any other false positives that could be in my sample. This also gives me an idea of what the taxonomy of this virus could be, because sometimes just using the top five BLAST hits doesn't give you a lot of information, and I'll talk about that a little bit later.

Once I've decided what taxonomy I think the virus will be, I use NCBI Virus to collect all of the RdRp sequences. That can be quite problematic, as anyone who uses NCBI Virus would know, because the RdRp can be labeled as many, many different things, and you even have to account for the fact that there could be spelling mistakes. Once I have an input file that I'm happy with, I produce my alignment, and generally MAFFT is the tool that I use. You can play around with the parameters, but sometimes, particularly if you're working with more divergent viruses, you might need to play around with the tools themselves and explore something like Clustal or even the aligner in Geneious. I then trim my alignment, and I always, always, always manually assess my alignments to see if the alignment tool I've used looks like the correct one, if it looks like I've taxonomically assigned my virus correctly, and also to see if I need to change my
parameters. Once I'm happy with my alignment, I put it into IQ-TREE and use ModelFinder, but I always check what model it has picked for me and make sure that it makes sense in terms of what I would expect it to choose. I got a question at the RdRp Summit about how you decide if the model is right, and all I can say is that Google is your friend, and to read other papers where people have built trees on the same sort of family. So now we get into the interesting
stuff. The first project I'll talk about is this SRA mining project, where we screened all of the dasyurid RNA-seq libraries that were on the NCBI SRA, and that was 446 libraries. The dasyurids are a group of Australian marsupial carnivores, and marsupials are mammals that have a pouch that they grow their young in; they're very common in Australia. We passed these through a more liberal pipeline put together by my student Jon Mifsud, and then I put them through my pipeline, which is a little bit stricter, just as a matter of preference, and we removed all of the EVEs. We know quite a lot about the EVEs in marsupials thanks to Emma Harding's paper.

We identified 15 novel viruses through this screening, and surprisingly most of them are actually DNA viruses. I think this could be because these libraries were not generated for the purpose of virus discovery, so there could have been steps in the library prep, like poly(A) selection, that would remove a lot of viruses. But we did identify two deltaviruses and three RNA viruses, and these were across numbats, antechinus, fat-tailed dunnarts and Tassie devils. You can see there's an over-representation of the antechinus and the devils, and that's indicative of how many SRA libraries are available for those species; they are more studied because of their biology. I'll start off by talking about these deltaviruses. So,
up until quite recently, deltaviruses were thought to only infect humans and to have emerged in humans, and that was proven to be incorrect through metatranscriptomics. There was a deltavirus identified in birds by my colleague Michelle Wille, and one in a snake, I think. Since then, SRA mining, particularly using the Serratus web tool, has identified a few other mammalian deltaviruses. We identified these two, and they form a little monophyletic clade, which is quite interesting. I just want to point out that I used Laura Bergner's paper as a reference to build this tree. Even if you have a lot of experience in phylogenetics, sometimes it is best to look at what somebody else has done to help guide the way you build a tree, because building this tree can be quite difficult. It'll be interesting to see if we can sequence more marsupials and see if this monophyletic clade holds up.

We also identified two novel hepaciviruses, one in an antechinus and one in a numbat. The antechinus virus clusters with a bandicoot hepacivirus that I actually identified from the blood meal of a tick a few years ago in my PhD. Of note, the marsupial hepaciviruses don't all cluster together: there's also this cluster down here of a koala and a possum. My theory is that this difference could be because bandicoots are omnivores and they also scratch around on the bush floor, and
so do antechinus, whereas koalas and possums are tree-dwelling and they're herbivores. So this difference in hepaciviruses could mean that there have been multiple spillovers into marsupials, and that could be due to their differences in environmental niche. But you might have noticed the other novel virus is sitting all the way down here, and that's our numbat hepacivirus. Numbats have quite interesting biology in that they exclusively eat termites, they are the closest relative to the Tasmanian tiger, and they now only exist in two national parks in the southwest of Australia. But that doesn't really explain why this virus would be clustering down at the bottom of the tree with a group of viruses that is associated with birds and reptiles. I think that could just be because this virus is so divergent that there's not a lot of bootstrap support for these nodes down here, and so until we fill in that evolutionary space between it and the other mammalian viruses, we don't really know where it actually sits in the tree.

As you can see, the hepaciviruses do generally cluster by host, but you may have noticed this blue line down here that bucks that trend. I noticed this virus while building this tree. It is listed in NCBI as having a mosquito as its host, and for those who know anything about hepaciviruses, you'll know that they're not vector-borne and
they're not associated with invertebrates, so this virus looks quite suspicious. Given that it has come from sequencing of a mosquito, I would be concerned that it actually came from the blood meal of that mosquito rather than replicating within the mosquito itself. I think that's where we need to be careful when we assign a host to a virus, because should someone come along with another novel virus, they might assume that it could be vector-borne or invertebrate-infecting when in fact this is just a mistake.

But on the flip side of that, sometimes you also identify highly divergent viruses that buck that trend. This virus was identified in a library of devil facial tumor disease, which is a cancer that affects Tasmanian devils, and I 100% thought this was contamination to begin with. This is where the benefit of our whole-RNA sequencing approach comes in, because we can look at the composition of the whole library to see if there are any potential sources of contamination that could explain this virus. This was cell line sequencing, so it's kind of unlikely that we would have any sort of invertebrate here, but you never know; maybe a fly in the lab fell into the culture. Anything could happen. But we didn't see any source of contamination. We use a program called CCMetagen to look at the composition of the libraries, and it spits out a nice little Krona graph, so you can have a look around and see if there's anything that could be potential contamination. This virus was also quite abundant; it was the most abundant virus in this study. This might not seem like a huge abundance, but for viruses it is pretty impressive. It's also extremely divergent, with only 24.3% amino acid sequence similarity at the RdRp, and that was to an invertebrate virus. It's also looking like this virus is exogenous: we have a nice long contig with a really nice ORF.

Chuviruses were discovered in invertebrates, and thus far they have largely only been identified in invertebrates. But recently there has been the description of a new genus called Piscichuvirus, which includes some fish-associated viruses and two viruses from snakes, and there's also a paper, which I'm not sure has been published yet but is out as a preprint, where they identified chu-like viruses in turtles. A lot of these have
not only a disease association but an association with neurological disease. There are EVEs of chu-like nucleocapsid proteins in Tasmanian devils, but that doesn't look like it has anything to do with our sequence. It's quite divergent from our sequence and it's full of stop codons, whereas we have a nice big ORF. I was still on the fence about this, thinking that it was most likely contamination of some sort, until my PhD student Jon Mifsud
reviewed the manuscript and came across a paper that suggested that DFTD is of Schwann cell origin. Schwann cells are part of the peripheral nervous system, so this supports the idea that vertebrate chuviruses may be associated with the nervous system, and it definitely makes me feel a little bit better about suggesting that this could be the first mammalian chuvirus.

So now on to the other project. Our ecologist collaborators from Tasmania came to us with the question of whether things
like agricultural land use and forestry could have an impact on the viromes of native species, and so they went out and did a whole lot of trapping. These fecal samples were all taken from trapped, caged animals, so we can be certain that each fecal sample is actually coming from our target host. They looked at feral animals like cats and black rats as well as native species, and that included carnivores like devils and quolls as well as herbivores like brushtail possums and swamp rats. We extracted all of the RNA from these individually and then pooled them by collection location, land use type and species. You might notice on our map down here that we've excluded this southwestern area of Tasmania, and that's because it's essentially one giant national park. It's actually where Alone Australia is filmed, so if you want to see what the environment and some of the wildlife look like, I suggest checking that out.

We identified a whole lot of virus contigs in our libraries. Because we're looking at fecal samples, that's expected, particularly when you're looking at carnivores, because they're really scavengers, so they get into everything. So I decided that it would be more useful to look just at mammalian-associated viruses in this case, and I'll talk a little bit about how I decided what to include and what to exclude. For the sake of time I'm just going to talk about the Picornavirales, because we identified quite a few interesting species in there, but also because I think it's an order whose taxonomy people have quite a lot of problems understanding, and there are quite a few incorrectly identified Picornaviridae that are actually just Picornavirales. It's important to note that there are only two families within the Picornavirales that are mammalian-associated, and those are the Picornaviridae and the Caliciviridae.
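
As a rough illustration of applying that two-family rule when screening BLAST hits, the Python sketch below sorts a contig into one of three bins from the families of its top hits. The function name, the bin labels and the idea of passing in a plain list of family names are assumptions made for the example; only the two family names come from the talk.

```python
# Families named in the talk as the mammalian-associated Picornavirales.
MAMMAL_ASSOCIATED_FAMILIES = {"Picornaviridae", "Caliciviridae"}


def triage_by_hit_families(hit_families):
    """Bin a contig using the family of each top BLAST hit.

    hit_families: list of family names, with None or "" for hits that
    carry no family assignment (for example unclassified entries).
    Returns "candidate", "exclude" or "needs_tree" (hypothetical labels).
    """
    hits = list(hit_families)
    # No hits, or any unclassified hit: the hit list alone is not informative.
    if not hits or any(not family for family in hits):
        return "needs_tree"
    families = set(hits)
    if families <= MAMMAL_ASSOCIATED_FAMILIES:
        return "candidate"
    if families.isdisjoint(MAMMAL_ASSOCIATED_FAMILIES):
        return "exclude"
    # A mix of mammalian-associated and other families is ambiguous.
    return "needs_tree"


if __name__ == "__main__":
    print(triage_by_hit_families([None, "Picornaviridae", "Narnaviridae"]))
```

Even a "candidate" here is only a shortlist entry; the bins are meant to decide which contigs deserve a closer look, not to assign taxonomy.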
So it can be quite difficult to determine if your novel virus belongs within one of these two families. These are the top BLAST hits of one of my novel viruses, and as you can see, the top hits don't really tell us a lot. The top hit is my pet hate, "Riboviria sp.", but we also have some hits that look like they could be real Picornaviridae (a hepatovirus and two others); these are all Picornaviridae. But we also have a really weird result here, seeing some "Narnaviridae sp.", which belong to a completely different family of viruses that infect fungi. So I put this BLAST result in a tree, and as you can see, we have our narnaviruses clustering up here at the top, the Picornaviridae clustering here, and then our more divergent, more invertebrate-associated viruses down the bottom here. So it does look like, despite the second hit being a hepatovirus, our virus is probably not vertebrate-associated, and so this is one that I would have excluded. And just on a larger scale, this is how
I identified them. First I would use a BLAST screen to exclude the things that are obviously not picornaviruses, but then with those things that had confusing top hits, I put them in a tree and used that to exclude things. This tree is obviously very difficult to look at at this resolution, but you can see there are a few red lines around there that are in the wrong spot, so they would be excluded.

Again, for the sake of time, I'm just going to talk about the caliciviruses that we identified in this study. We identified seven caliciviruses, two of them having been previously identified: canine vesivirus and RHDV2. These were in a range of samples pretty much covering all of the species that we sampled, including feral and native species as well as herbivores and carnivores. Because a lot of caliciviruses can be associated with gastrointestinal issues, we asked our collaborators if they
saw any signs of disease, because they had actually trapped the animals and done a general health check, and because they were sampling feces they could tell if there were any signs that something was wrong. They said there were no signs that any of the animals were sick. So it's unlikely that things like this canine vesivirus are actually causing disease in Tassie devils (TD is Tassie devil); it's very likely that this is instead coming from
a terrible habit that Tassie devils have, and that is eating the feces of other animals. We also identified a lot of human-associated viruses, like human rotavirus, in the Tassie devil feces, and we think that's because they go to campsites and get into the sewage systems, which is pretty disgusting, so don't be kissing Tassie devils. Across the other species it's quite difficult for us to say whether these viruses are truly infecting our sampled host or if they're coming from the diet. In the case of the brushtail possum, considering that they're a herbivore, we might be more likely to suggest that this could be coming from the brushy itself, but you never know, because they'll also get into anything they can get their hands on.

The most concerning thing that we found here was RHDV2, rabbit hemorrhagic disease virus 2, which is used as a biocontrol agent in Australia to control the rabbit population. There has been quite a lot of concern that this virus could jump into native species, and if it had the same effect on native species as it does on rabbits, this would be quite concerning, because a lot of native species are already at risk of extinction and a virus like this could completely wipe them out. That's when it's important to look at the biology of your virus. RHDV2 affects the liver of infected animals, and you would not expect to see it in feces. But we're lucky enough to have an expert
on RHDV2 in our lab, Jackie Mahar, and she built this beautiful tree for me and showed that our RHDV2 was indeed clustering with other RHDV2 isolates taken from rabbits in Tasmania. It's not looking like this virus has changed significantly in order to jump into a new host. The fact that it's only been found in feces also suggests that it's not infecting the hosts that we've sampled, but rather is there because the animals that we've found it in are
scavengers, and so they're finding dead rabbits that have a high titre of virus and eating them. This was concerning because we actually had 25% of total reads being RHDV2 in one of our devil libraries, which is an absolutely insane amount.

So I'm just going to summarize what we're doing now and in the future. The marsupial carnivore SRA paper should hopefully be out soon, once Eddie gives it the okay. For the second project I've talked about, we're actually
doing ecological analysis and using vOTUs for that, and that's for the purposes of our collaborators. But I will also do a virus characterization paper where I put all of these viruses into trees, follow the ICTV rules, make all of those sequences available online and try to provide some context. We also have blood samples collected during the fecal sampling, and it will be quite interesting to see if there is any overlap between the viruses we see in the feces and the viruses we
see in the blood. It's not a perfect overlap, and we do have some other species as well, like bandicoots, which will be interesting to look at. I also want to use the Serratus palmscan tool to try and see if I can identify any more chu-like viruses similar to my devil chuvirus in any other mammalian libraries.

Since I have a little bit of time, I just want to advertise my PhD students and the lovely work that they've done. This work was done by Jon Mifsud; he's the guy on the left. He builds these beautiful tanglegrams to look for cross-species transmission events in the viruses he identifies. He extended the host range of the Flaviviridae using SRA mining: he found pestiviruses in amphibians, reptiles and ray-finned fish, and he also identified a potentially novel cross-species transmission event between bats and rodents, but he wants me to say that it's very potential. And Vince, who is Mr Fish in our lab, put out this beautiful paper doing sequencing of fish collected from an area of 100 square meters in the Great Barrier Reef. He was looking for signs of cross-species transmission in the viruses that he identified, and he saw little evidence of it. He saw that there was more diversity in the smaller fish (I'm not going to try to name them), which is interesting because they have very short lifespans and grow very rapidly, so perhaps there's a bit of a trade-off between rapid growth and being able to fight off viruses. He also identified a couple of viruses in this Piscichuvirus group, and he'll have a paper coming out, hopefully soon, that he's done in collaboration with a group in Switzerland, where they've sequenced a whole lot of cichlid fish from a single African lake and he's looked at the viruses in those fish and for patterns of cross-species transmission in those as well.

So, just to summarize: we're pretty limited in the type of analysis that we can do when we're just working with virus sequences, and I think phylogenetics is a great way to get a lot of information with quite minimal input, using tools that are very well described and easy to access. I want everyone to keep in mind that you can't just trust everything that's on NCBI, and that building trees is a great way to check things like host association and taxonomy. If you're very new to phylogenetics, don't be afraid. Read lots of papers. A lot of people are now publishing their alignments when they publish their papers, so I would suggest looking at those alignments and trying to recreate them yourself. But don't be afraid to ask for help, and there are a lot of courses out there to keep an eye out for. I just want to thank Mena and [unclear], who provided me with the fecal samples, and everyone in the Holmes lab.

Thank you so much, Erin. Incredibly interesting talk. The chat got busy really soon, so I'm just going to start scrolling up to see who asked first. Jean-Claude Mangara, you have several questions. Would you mind going over them?

Yep, that's fine. So the first is...

Oh, I meant I was asking the person who asked the question, if he's here. Claude, are you there?

I think yes. Yes, I'm here, thank you. Thank you very much for the presentation and for this topic, which is really interesting. I'm also working in
the same topic, but I'm working on a metagenomics protocol to discover novel viruses, and I'm using Nanopore. I'm also planning to develop a bioinformatic pipeline that can be easily deployed in a limited-resource region, so that's why I have many questions about how you proceed. Maybe the first question is about your protocol: how do you proceed? Have you customized one pipeline, or do you just go from one tool to another, and then when you have your outputs go on to phylogenetic analysis?

Yep. So we're lucky enough to have access to the University of Sydney server, so we have quite a lot of computational power at our hands and also a lot of storage space, which helps with a lot of the SRA mining projects, because obviously you're downloading all of these libraries and assembling them. But I do think the most efficient way is that, when you're downloading, once you go to
the next step, have a well-thought-out pipeline that deletes the file from the previous step when you no longer need it, and that way you cut down on your storage requirements. We have automated pipelines that go from the step of trimming and assembly all the way through to a final data set, which will be in the form of something like a CSV file that we then manually look through. Unfortunately, depending on what types of viruses you're looking at, a lot of the reference databases are becoming more and more muddy, and it's harder to automatically pull things out based on sequence similarity. Take that chuvirus: there's no way I would have pulled that out if I had a fully automated pipeline, because I would have thought that chuviruses were invertebrate-associated. You could also be getting a lot of false positives if you're using a reference database of, say, Picornaviridae, because there are a lot of non-mammalian sequences in there.
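
The storage-saving idea above, deleting each intermediate file once the following step has consumed it, can be sketched in Python as below. Everything here is illustrative: `run_steps`, `make_toy_step` and the file suffixes are hypothetical stand-ins, and a real pipeline would call the actual trimming, assembly and BLAST tools instead of copying text.

```python
import tempfile
from pathlib import Path


def run_steps(first_input, steps, keep_first=True):
    """Run steps in order, deleting each intermediate after the next step ends.

    Each step is a function that takes an input Path and returns an output Path.
    Set keep_first=False to also delete the original download.
    """
    current = Path(first_input)
    is_first = True
    for step in steps:
        output = Path(step(current))
        removable = not (is_first and keep_first)
        # The previous file is no longer needed once this step has finished.
        if removable and output != current:
            current.unlink(missing_ok=True)
        current = output
        is_first = False
    return current


def make_toy_step(suffix):
    """Return a stand-in step that copies its input to a new file."""
    def step(path):
        out = path.with_name(path.name + suffix)
        out.write_text(path.read_text())
        return out
    return step


def demo():
    """Run three toy steps and report which files are left on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "library.fastq"
        raw.write_text("@read1\nACGT\n+\nIIII\n")
        steps = [make_toy_step(".trimmed"),
                 make_toy_step(".contigs"),
                 make_toy_step(".hits.csv")]
        final = run_steps(raw, steps)
        return sorted(p.name for p in Path(tmp).iterdir()), final.name


if __name__ == "__main__":
    print(demo())
```

With `keep_first=True` only the original input and the final table survive; for re-downloadable SRA libraries, passing `keep_first=False` would drop the raw file as well.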
So unfortunately I don't think it's possible to have one lovely pipeline that just spits out a tree at the end, but that first section of our workflow is in a pipeline. For the phylogenetics, again, I think it's quite manual at the moment. It would be great to try and develop some sort of method, maybe utilizing something like AI, to build your input data sets, because unless you're always looking at the same families it's quite difficult to build those input data sets: so many viruses are unclassified that the viruses most similar to yours might be missed if you're just automatically pulling sequences down by assigned taxonomy. Does that answer your question?

Yes, I think so, thank you. The other question was about the method you use for rRNA depletion. And do you only look for RNA viruses, or also DNA viruses? One more question, so I can finish and let other people go on: is your protocol for the whole genome, or do you just look for reads that are related to viruses and stop there? After having those reads, do you also design primers to amplify the whole genome?

Yeah. So for the rRNA depletion, that's done using the Illumina kit, but we don't do any of that in-house. Because it's quite difficult to get a good setup for building libraries, we find that it's easier to
outsource that. We sequence everything: we just do a ribosomal RNA depletion and then short-read sequencing, so we're not getting full genomes directly. We have to do that de novo assembly step, and you're not always going to get nice full genomes; sometimes you will just get fragments. Depending on how interested we are in the virus we've found, we might go back, design primers and PCR. The viruses in Vince's paper have all been PCR'd, I think, except for one where he didn't have any sample left. But yeah, we do incidentally identify DNA viruses, just because you sometimes see those transcribed genes, and I actually found more DNA viruses than RNA viruses in that SRA screening project, which was interesting. But you never get the full genome, obviously, so sometimes it's better to go back and do some targeted DNA sequencing if you want to try and get the full genome. Obviously, when you're working with the SRA you have no options.

Thank you very much.
Maybe for the other questions I can s...

Yeah, perfect. Thank you, Jean-Claude. The next question was formulated by Luis Tat Lavanda. Would you mind reading your question, Luis?

Hello, hello. I am from Peru. Sorry for my English; I prefer to read my question. Dr. Erin, your presentation is very impressive and very informative, especially in my case, because I am very interested in this kind of study. I am trying to start working in metagenomics to find new viruses, not only in people but also in animals. At this moment we have Nanopore technologies, and the quality has improved; before it was not so good. But my question is more about how you determine that a sequence is from a specific new virus, or whether it could be an artifact, because I see that the different programs or software have set parameters and could generate errors in the de novo assemblies. What is your perspective for determining whether it could be real or not real? What is the...

Yep. So again, it's quite a manual process of looking at the contigs that we get. Are we seeing a full genome? How big are the fragments we're getting? I tend to eliminate anything that's too short, so anything shorter than, say, 600 nucleotides is definitely out of the question. We also always look for a complete RdRp if possible. I know this is the most annoying thing to say, but you get a feel for it once you've been doing it for a while, and that's unfortunately just through following false leads. I think once you put things into trees you get a better idea as well. But generally, to begin with, you want to be getting nice full genomes, and you want to be using your BLAST pipeline to look for sequence similarity to other viruses. Also, depending on your host, there might be information out there about endogenous viruses, and I'm actually over in the Netherlands to work with Sebastian Lequime on
trying to find a way that we could potentially do some sort of scoring system to see if a virus that we find through metatranscriptomics is likely exogenous or endogenous. But it is very difficult; unfortunately it does come down to a bit of a manual approach. Does that answer your question?

Yes. And the other question, number four, is: nowadays, can we find new variants of viruses in the SRA database, and is it valid to put those findings in a new paper?

Yeah, I think so. Our lab has done a couple of these types of papers, and Artem's tool Serratus is very useful as well. That's a web-based tool, so if you have a particular virus family that you're interested in, you can use Serratus to screen the SRA for particular RdRp signals, and then you can reduce the number of SRA libraries that you have to screen. So instead of looking at absolutely everything, and assembling and analyzing thousands and thousands
of libraries, you can target your approach more, to just the libraries that have some sort of signal. And I think it's very valid to reuse somebody else's data. Most of the data that I used in my mining project was generated for looking at the facial tumor in devils and at the biology of antechinuses, and the numbat one was because they were doing a genome and wanted a transcriptome to match it. There were only three libraries and we still found that hepacivirus. In the same turn, when we do any sequencing we put our libraries up, so that if somebody else is interested in maybe the plant viruses or the invertebrate viruses, or something to do with the transcriptome of that animal, they can use our data as well. So yeah, I think it's very valid to use the SRA, and even if you're doing a sequencing project yourself, I think it's a useful tool to go and look at what's also available, because you might
find other things there as well.

Okay, thanks, very informative.

Thank you. The next question is from Ella Tali.

Hi, nice to see you again. I wanted to ask about your reciprocal BLAST approach. I work in soil, and the diversity of viruses in soil is just ginormous. DIAMOND is fast, but running it at that scale is just not very sustainable. So I was wondering what your opinion is about doing the same thing but with HMMs instead: you could get HMMs of viruses from VOG or any of the other tools, and then to eliminate false positives you can use Pfam or KEGG. What's your take on this?

Yeah, I definitely think that's probably the way forward for big studies like that, especially when you're looking at those really, really divergent viruses that have never been studied before. I think Justine Charon and Sabrina Sadiq are probably better people to talk to about that; they're also from my lab. But within the sphere of looking at mammals, you're really looking at a very narrow window compared to the true diversity of viruses.

Thank you, it's a great talk.

Thanks. I think this question is already answered for cl... Luis Janssen has a question, I think.

Hello. Hi, Dr. Harvey, great talk. Please forgive me if I didn't understand some part of this, but you found some viruses associated with Tasmanian devils, and you briefly mentioned the transmissible cancers. I work with viral discovery too, with viromics, but I also have a little foot in viral epidemiology, so I was wondering whether the chuvirus, or any other virus you found, could have any sort of epidemiological link with the cancers. Given that these cancers are a major threat to the conservation of the species, are there any plans to explore whether any of these viruses could be used as a marker for contact with diseased animals, and so on?

Yeah, so there were quite a few libraries in that study, and we identified the chuvirus in only one library. Off the top of my head I couldn't tell you how many were sequenced in that one project, and I also don't really know the structure of their project. But the reason that I started this SRA screen was actually to look at devils and to see if there were any potential viruses that could be contributing factors to the pathology that they see. And that chuvirus, considering that it's associated with neurological disease, and then we've got it in this cancer that's associated with PNS cells, maybe... I would love to do more work on it, if anybody listening has some ideas or some way of getting samples and doing some screening. We also identified papillomaviruses, a polyomavirus and a herpesvirus in devil samples, but they were from lip tissue, and they were from both individuals that had DFTD and individuals that did not have the cancer, and there didn't seem to be any kind of association with or without the disease. But obviously that's just scraping the surface, and it would be very interesting to do more work and maybe design some primers so people could check their samples.

Okay, thank you.

Thank you. The next question is from Alejandro Belard.

Hello Erin, nice presentation. I have two questions. The main thing is that you are using MEGAHIT, which was designed to deal with metagenomes, not necessarily metatranscriptomes. I wonder if you have compared that with a tool that was designed to deal with metatranscriptomes, such as rnaSPAdes. The second question is on the phylogenetics side: you said you use trimAl, and I had never heard of it, so I would like to try it. Normally I have to trim alignments manually, because some tools are too strict with some virus proteins, which are just too divergent, and from an alignment of, let's say, 200 amino acids you end up having only 20 or 30, something like that. So those are my two questions.

Yeah. The people in our lab have all compared different assemblers. I used to use Trinity, which obviously has way too much memory requirement and is not at all designed for metatranscriptomic data. I've had good results using MEGAHIT. I haven't personally used rnaSPAdes, but I think other people in our lab have. Maybe I should compare those as well and see if I get better results.
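One rough way to make such an assembler comparison concrete is to summarise the contigs each tool produces for the same library. The sketch below is an assumption-laden illustration, not a method described in the talk: `read_fasta`, `n50` and `assembly_summary` are hypothetical helper names, the default `min_len` of 600 simply reuses the 600-nucleotide cut-off mentioned earlier in the Q&A, and N50 is only a crude yardstick for metatranscriptomic assemblies.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold half the total bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def assembly_summary(lines: Iterable[str], min_len: int = 600) -> Dict[str, int]:
    """Summarise one assembly, keeping only contigs of at least min_len."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    kept = [n for n in lengths if n >= min_len]
    return {
        "contigs": len(lengths),
        "kept": len(kept),
        "longest": max(kept, default=0),
        "n50": n50(kept),
    }
```

Calling `assembly_summary(open(path))` on the contig file from each assembler gives two small dictionaries to put side by side. Whether longer contigs actually mean better viral genomes still has to be checked by hand, for example by looking for a complete RdRp.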
And in terms of trimAl, it also depends on your alignment. I would say an alignment of only 200 amino acids is already very short. I normally like things to be over 300, and would really love it to be over 600 amino acids if possible. When you're working with things that are really divergent, I don't really know what to say, because I've struggled with this problem myself, trying to produce a nice tree. But I think you need to play around with your alignment algorithm as well. Maybe try some different aligners, like Clustal; I think Mary Petrone from our lab was saying that it handled some of her divergent viruses a little bit better. And with trimAl you can set parameters and tell it how much you want it to get rid of, so you can say I want it to keep 60% of my alignment, or 5% of my alignment, and you can also set whether you want a gappy or no-gap output. So I think playing around with parameters is the first port of call. But maybe also check that you have all the unclassified viruses in your tree as well, because they might be more closely related to your novel sequence than things that are in ICTV or classified in NCBI.

That's great, thank you so much.

Thank you. Frank Onu, do you want to read your comment? It was a comment more than a question.

Mine was more of a comment than a question. I was very impressed by the first project, that a project can come out of mining RNA-seq data and actually discover viruses from that. It seems like something that we should be able to replicate in places where we don't have the facilities to do those kinds of studies. So thank you, very good presentation.

Yeah, for sure. I think it's a great resource to be able to mine the SRA, and I promise I'm not biased because I'm friends with Artem, but Serratus is also a really great tool as well.

Okay, thank you. The next question is from Kane.

Hello, can you hear me?

Yes.
Okay, thank you. I have a couple of questions. The first one: during previous work I had a bad sample, and there is a group of viruses, maybe not really well known in the field of virology but recently gaining more and more interest, which is the CRESS DNA viruses. I can name one or two, especially the circovirus group, which is well known because of the recurrent epidemics in pig farms, but there are a lot of unclassified viruses there. So recently when I was
working I got some sequences from a MEGAHIT assembly, and I did a BLAST search with DIAMOND and then went back to do the BLAST manually. I found that some of these sequences had been labeled in a misleading way. For example, one of the segments that really tricked me: the sequence was clearly a circovirus according to NCBI, and the percentage identity was around 95 to 100, but when I constructed the phylogeny I found that it is closer to another family entirely, not circoviruses. So how does this kind of issue happen, and how might it impact virus classification? It's okay for the highly similar sequences you find, but what about the others? I still have many of them. Even when I took the whole order of CRESS DNA viruses, went to the ICTV metadata, downloaded all the representative species there, and made the whole phylogenetic tree based on the Rep protein, I saw that most of these mislabeled sequences were completely unclassified. So how do we solve this issue, which is still pending and might create more and more difficulty for phylogenetic relationships? And can I finish, or...

Just really, really briefly; we have only two minutes.

Okay. So the second question is how the length of a new virus sequence impacts the reliability of phylogenetic relationships, because sometimes we use a short sequence, sometimes a long sequence, sometimes a partial RdRp. How can this issue be addressed? Thank you so much.

Yeah, that group of viruses, the circoviruses, are quite often contaminants as well. We do no-template controls, where we do an extraction with the kit that essentially has no input, and we sequence that on its own, as its own library, and we identify circoviruses in there. Ashleigh Porter wrote a paper about it. So circoviruses can quite often be contaminants, and they might also be misassociated with a host when really they're coming from your sequencing library. I think that whole group is a complete mess, to be honest; I have the ability to avoid them, thankfully. I really don't have the answer for that. At RdRp we talked about these kinds of issues at length: the fact that so many viruses are misclassified, and how do we rectify that? I think you can always write to NCBI, so you can email them and say this virus is actually misclassified, and then they will fix that. But I'm sorry, I don't have the answer; I think that's a really complex question, and it's only going to get worse as more and more people do metatranscriptomic virus discovery.

And your second question: you definitely want as long a sequence as possible. As I said, lower than 300 amino acids is, I think, not going to give a very robust tree, and you'll see that in the tree that you get. I don't know that there's really an answer to that question in terms of how to deal with it, other than to maybe resequence or PCR, try to get more of a genome, and at least try to get a lot of that RdRp segment. I don't know that there's necessarily a way of doing phylogeny with anything shorter than that, maybe if you're getting that key signal of the RdRp, which could be useful. But yeah, I'm not really sure.

Okay. One quick question: what are the real benefits of metatranscriptomics compared to metagenomics for new viral discovery?

I think that's really interesting. It's a lot easier to get rid of all of the host-associated stuff: you're getting rid of all the host genome, the host DNA, and you're just looking at the RNA, and when you do a ribosomal RNA depletion you dramatically cut down the amount of non-virus RNA that's in your sample. You can also use a lot of other methods, like filtration; there are loads of different methods. But we
just... our lab does total RNA sequencing and we find a lot of viruses. Whereas DNA, I haven't done it myself. Maybe if you know the host, say you've got a well-described host, you might be able to remove the host genome from your sample, but most of the things I work with don't have a genome available.

Thank you so much.

Thank you, Kane. Thank you, Erin Harvey, so interesting. I think we could be here the whole evening, but we need to finish now. Thank you again, and I will see you very soon in the next seminar.

Thank you.

Thank you.
