Lecture 13: Case Hx: Complex Traits

MIT HST.512 Genomic Medicine, Spring 2004
Instructor: Dr. Scott Weiss
View the complete course: https://ocw.mit.edu/courses/hst-512-genomic-medicine-spring-2004/
YouTube Playlist: https://www.youtube.com/watch?v=_-gQchCLmXk&list=PLUl4u3cNGP613PJMNmRjAIdBr76goU1V5
We're going to begin by getting at this question of why complex trait human genetics is so difficult, and then go through each of the steps that you would do if you were actually doing this work.
License: Creative Commons BY-NC-SA
More information at https://ocw.mit.edu/terms
More courses at https://ocw.mit.edu

MIT OpenCourseWare

So this is an outline of what I'm going to talk about. We're going to begin by getting at this question of why complex trait human genetics is so difficult, and then go through each of the steps that you would do if you were actually doing this work. The first question you would get asked on an NIH grant application is: is your phenotype heritable? So some evidence of heritability, or doing a study to determine heritability, is often the first step in a genetic study. So we're going to talk a little bit about heritability and how you define it, and then about identifying disease phenotypes. That's a critical issue: the difference between a sub-phenotype and an intermediate phenotype, and how you want to look at those things. Then developing your study design. The paper that I was just telling you about, the one that's coming out in Science: these people looked at two relative genetic isolates, the population of Finland and the Saguenay-Lac-Saint-Jean population in northeast
Quebec, French Canadians. These are populations that had a limited number of founders, so the thought is that they're more genetically homogeneous and it might be easier to find genes in these populations. But what's the big concern? If you found a gene in the Finns, what would be your biggest concern about that? [inaudible] Exactly: that gene might not replicate, or be a significant gene, in an outbred population, in a country such as the United States where there's a lot of ethnic variation and diversity. So it may be easier to find genes in genetic isolates, but it may be more difficult in terms of their generalizability.
So give me some other examples of relative genetic isolates around the world. Iceland? Yes, absolutely, Iceland. That's where deCODE is working. deCODE is our number one competitor in COPD research; they're the only company that's actually doing COPD work. What else? Where else could you go? What might be the disadvantage of a tribal population, or of a small island population like Tristan da Cunha, which is where they first went to do genetic isolate work in asthma? Yes, a small number of people. You've got a limited number of meioses in a population like that, you're not going to get too many recombinations, and so you just sort of run out of gas because you don't have a big enough sample size.
[inaudible] The Finns, the Swiss, the Ashkenazi Jews, Costa Rica. Why Costa Rica? Because Costa Rica is surrounded by volcanoes. It was the one place in Central America where there was not a huge Spanish influx, because there was no gold there. So there was a limited number of Spanish founders in the 15th century; they pushed the Indians to the periphery and settled the Central Valley of Costa Rica. So you've got these 200 founders in the 14th century, very little intermarriage, perfect church records, and very large pedigrees. Next to the Saguenay-Lac-Saint-Jean population in Quebec, it's probably the closest thing to a genetic isolate in the Western Hemisphere. And we're actually doing a big study there: we have six big pedigrees with over 120 people in each pedigree. We've just finished the collection of these pedigrees, and we're about to do a genome scan that Tom is actually doing for us.
Most of the populations that you've mentioned are isolated by geography, with the exception of the Ashkenazis, right? So can we broaden the definition of genetic isolate to include things like economic or social factors, like royal families back in the old days, who only married because of social status? And now there are settings where people of a certain economic status [inaudible], and you see a certain set of people marrying each other. Yes. It turns out I
think that geography is actually a much better guide. If families or people have a chance to procreate by physical proximity, they will. Historically, geography is a much more reliable guide to a genetic isolate than any sort of social convention, I think, exactly for the reasons that Zach [inaudible]. In some groups there's such strong social pressure [inaudible]. Yes, they may be
the exception to the rule. But I think populations like Iceland, Costa Rica, Tristan da Cunha, Finland, and the Amish would fall into that group; the Mormons maybe a little bit less so. The big advantage of the Mormons is not so much that they're a relative genetic isolate but that they have very large families and very good church records. Those are other characteristics that are helpful. But the idea behind the genetic isolate is that you've got a relatively homogeneous set of alleles circulating in the population. Mormons do intermarry, so I don't know that you'd really consider them a genetic isolate, whereas the Amish, the Hutterites, and the Ashkenazis are groups where there is some set of social conventions, so those people are much more likely to marry within the group.
The reason I ask is that the trend, especially nowadays with modern transportation, seems to be that geographically isolated populations are going to decline. Right, and this isn't the only way to do this. I think it is important to make the distinction between linkage and fine mapping; those two things may be somewhat different. The advantage of an outbred population is that the extent of linkage disequilibrium will be relatively narrow. In some of these genetically isolated populations the extent of linkage disequilibrium can be very large. That means you can do the linkage part pretty effectively, but the association part is more difficult, because you've got these big LD blocks that you've got to work with, and you may not be able to get to the gene.
So anyway, these are all of the steps that you would do in a typical study, and we'll just go through each of them. I'm going to use asthma as my example, because this is the disease I know the best. The point here is that asthma prevalence in Western developed countries has gone up a lot. So do you think that this is a genetic thing, or do you think it's something else? Over the 20-year period from roughly 1980 to 2000 you've had more than a doubling in the number of cases. Genetic or environment? Why? OK, so what are the three population genetic mechanisms by which something like this could occur, for a genetic explanation? I'll give you at least half credit, but you're definitely not 100 percent right. The first genetic mechanism would be spontaneous mutation: you had some spontaneous mutation and it caused this epidemic of disease. Is that possible? You already said no, it's not possible, and you're right, it's not. But you have to know what the spontaneous mutation rate is, which is about 1 × 10^-8 per base pair
per generation, so it's pretty low. We're spontaneously mutating all of the time, but not fast enough to double the number of cases in a 20-year period. So what's the second possible genetic mechanism? Natural selection, right. Is natural selection going to do that? Not with a disease like asthma, where there's no selection pressure, no reproductive advantage or disadvantage. You all know plenty of people with asthma, and they're able
to reproduce just like everybody else. So it couldn't be natural selection. And what's the third population genetic mechanism? Genetic drift. So some mutant asthmatic came into the American population, and over 20 years they intermarried with all these other people and doubled the asthma rate. Plausible or implausible? No, can't happen. OK, so you're right, it can't be genetic. I'd give you half credit. But what I mean by this
is that most geneticists don't think like this, but they should. The reality is that all of these genes operate in a developmental and an environmental context; all of your genes do. So the true underlying model for disease causation is gene-by-environment interaction. It could very well be that there was some dramatic change in the environment, and now that's interacting with some genes that it wasn't interacting with before, and now you've got this marked explosion
in the number of cases. That almost certainly is the most comprehensive explanation. But it would have to derive from some sort of environmental change rather than from some primary change in the genes. It could easily be that there is interaction between whatever the environmental exposure is and some of the underlying polymorphisms that may be disease related, and that this interaction is different now than it was back when the disease rate was a lot lower. OK, so it's a big health problem. I don't
want to dwell on this, because that's not the purpose of this course, but all this means is that people will give you money to study this, and they weren't so keen on doing that 20 or 30 years ago. The other important point is that this is a disease of children. I usually tell people 90 percent; this is data looking at the age of onset in a closed population in Olmsted County. What's the famous medical center in Olmsted County, Minnesota?
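Returning briefly to the gene-by-environment argument above, a toy calculation can show how prevalence might double while allele frequencies stay fixed. The sketch below is illustrative only: the carrier frequency, exposure frequencies, and penetrance values are hypothetical numbers chosen for the example, not data from the lecture, and genotype and exposure are assumed to be independent.

```python
def prevalence(carrier_freq, exposure_freq, penetrance):
    """Population prevalence under a two-by-two genotype/exposure model.

    penetrance maps (is_carrier, is_exposed) -> probability of disease.
    Genotype and exposure are assumed independent (a simplifying assumption).
    """
    total = 0.0
    for carrier in (False, True):
        p_genotype = carrier_freq if carrier else 1.0 - carrier_freq
        for exposed in (False, True):
            p_exposure = exposure_freq if exposed else 1.0 - exposure_freq
            total += p_genotype * p_exposure * penetrance[(carrier, exposed)]
    return total


# Hypothetical penetrances: risk is elevated only in exposed carriers.
PENETRANCE = {
    (False, False): 0.02,
    (False, True): 0.02,
    (True, False): 0.02,
    (True, True): 0.22,
}
CARRIER_FREQ = 0.30  # hypothetical, and held constant across both time points

before = prevalence(CARRIER_FREQ, exposure_freq=0.10, penetrance=PENETRANCE)
after = prevalence(CARRIER_FREQ, exposure_freq=0.60, penetrance=PENETRANCE)
print(f"before: {before:.3f}  after: {after:.3f}  ratio: {after / before:.2f}")
```

With these made-up inputs, only carriers who are also exposed have elevated risk, so shifting the exposure frequency alone moves the overall prevalence; nothing about the gene pool has to change.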
You weren't supposed to answer that. The answer is the Mayo Clinic. Everybody in Olmsted County goes to the Mayo Clinic. Now, Saudi princes and sheikhs and famous people from all over the world go there too. Who was the king of Jordan? King Hussein, he went there. And Zach's mother, another famous person, went to the Mayo Clinic. So they get a lot of people from outside, but this data is based on the people who live in Olmsted County. Now, if you live in Olmsted County you don't go anywhere; you just sit right there and you stay there. So this was a fairly stable population, and they were able to capture all of the incident asthma cases and document them, because they were all going to the Mayo medical center and they had their chart records. So 90 percent of all of the people who were diagnosed as asthmatic in Olmsted County were diagnosed before the age of six. This is a very, very important point. So this is
the opposite of Alzheimer's disease, right? If you think about genetics, this is great for geneticists, because I only have to wait six years from the time the kid is born and I'm going to know whether they've got the disease phenotype or not. If I were waiting for Alzheimer's cases it would be like waiting for Godot; I'd be waiting a long time before I'd get my cases. Now, there are ways around that for the old people, right? What did geneticists do? How did they find the BRCA1 gene? What
did they do to enhance the probability that you would find something? If you're looking at older people, how do you enrich for a genetic cause of a disease? What do you do? [inaudible] Education doesn't have anything to do with it, I'm afraid. No. What characteristic of the cases would make you think it's more likely to be genetic? Family history, yes, but what specifically in the family history? Age of onset, right. If you're looking for genetic causes of heart attacks, you're going to take the people who have heart attacks at age 50. Ed Silverman, who's the world leader in COPD genetics, in my laboratory, is looking for early-onset COPD cases, so he gets cases where the age of onset is younger than 52. That's young. If you're looking at Alzheimer's cases, you'd say, well, we want all of the cases of Alzheimer's in people before the age of 60. This is how Mary-Claire King found BRCA1: she looked at all of the early-onset breast cancer cases, people who got breast cancer in their 20s, their 30s, their 40s, instead of looking at older postmenopausal women, which is almost certainly another disease. So if you're looking at old people, one of the clever ways that geneticists enrich for genetic susceptibility is by looking at early age of onset.
So you're selecting, potentially, a special case of a particular disease? Absolutely right. In some perverse kind of way it's a little bit like the genetic isolate: you might find a gene that is specific for that particular type of early-onset disease. So you find a gene for early-onset Alzheimer's but not for garden-variety old-age Alzheimer's, which occurs in virtually everybody by the time they're 90. So you're right. But I think what geneticists would say is that we're still in the early stages of this, and because we're still in the early stages most of us would be happy if we found any gene. You're going to be in Science if you find that early-onset gene, and nobody's going to be criticizing you because it's not the gene for all breast cancer or all Alzheimer's. Low-hanging fruit, exactly.
So, a little bit more about the disease. Most of the kids are allergic; allergy is probably the big reason why the asthma epidemic occurred. That means they have this particular type of inflammatory process, where antigen presented to dendritic cells in the
airways activates these CD4-positive T lymphocytes, which then elaborate a series of inflammatory cytokines, which act on inflammatory cells that then infiltrate the airways and set up an inflammatory reaction, with coughing, wheezing, airways responsiveness, et cetera. This is all well known, but it does suggest a whole host of other potential phenotypes that you could look at. It also gets at this concept of the ontogeny of the immune system, where T-null cells at some point differentiate into these Th1 and Th2 cells, whose phenotype is determined by which cytokines they actually elaborate. I've got a question mark here, but actually this particular step, which is the crosstalk and interaction between these two types of cells, is controlled by two specific genes that elaborate cytokines, IL-10 and TGF-beta. We've genotyped both of those genes in asthma and COPD, and they're important in both diseases.
Now, it's important for you to understand that I skewed things a little bit, because I told you that asthma is a Th2 disease, and Th2 diseases have increased: there's this increase in allergic rhinitis, food allergy, asthma, et cetera. This novel gene that I was telling you about just a few minutes ago, the one that's going to come out in Science on Friday, is expressed in the skin, in gut epithelium, and in airway epithelium, suggesting that it may be important in all these different
types of allergic diseases, which again has heightened people's interest in the gene and its potential importance. But it's also important to recognize that Th1 diseases have also increased. So give me some examples of Th1 diseases. The reason I'm bringing up the epidemiology is that most of the immunology community is focused on this. This is why, if you're going to be a good geneticist, you've got to really know your disease. You can't just wave your hand at it. I think that the age of the generalist geneticist, the geneticist who says, "Oh, I'm going to study this disease and then I'm going to study that disease": not with complex traits. That's not going to work. You're going to have to really know your disease, because you have to know the environment, you have to know the natural history, you have to know the intermediate phenotypes, and you have to really understand the biology as well. The point here is
these Th1 diseases. Give me an example of a Th1 disease. Well, I'm thinking about that. Can you tell me, the autoimmune diseases like inflammatory bowel disease, what are they? They are Th1. So Crohn's disease is a Th1 disease, juvenile rheumatoid arthritis is a Th1 disease, psoriasis is a Th1 disease, juvenile diabetes is a Th1 disease. So if the prevalence of these has gone up and the prevalence of these has gone up, people are thinking that there's something going on further up here that has to
do with T-reg cells, cells that regulate T-null cells in terms of their differentiation, because the immunologic defect can't be just at this level. So it raises the possibility that there are genes, FOXP3, GATA3, T-bet, a whole bunch of other genes, that are proximal to the Th1/Th2 CD4 lymphocyte and that may be important in all of these immune diseases. People are just now starting to look at that, and at the environmental and genetic factors that influence the differentiation of the immune system. How do people actually tolerize to foreign antigen? That's the kind of simple-sounding but complicated question where, if you could figure out the answer, you'd win a Nobel Prize. So that's what my laboratory is starting to work on.
This is just to show you again what I've already told you: there are a bunch of factors, mostly bacteria, viruses, and parasites, that influence this Th1/Th2 differentiation, and environmental factors that influence those things are presumed to be important. One would want to know both the genes and the environmental factors that are involved in this particular disease. There happen to be a whole host of environmental factors that are correlates of those sorts of changes, and I've listed a bunch of them here. We got very interested in this when we went to China in 1996 to do an asthma genetics study and I noticed how different the environment was there. This left-hand category summarizes what you would see, in terms of environmental exposures, if you were standing in rural China. There are very, very low asthma rates in rural China, about one percent, and there is a progressive increase in disease prevalence as you march toward Beijing or Shanghai, where the rates are much, much higher.
Let me ask a question: is it not actually the case that in Chinese populations, I thought, the family sizes were limited? No, see, again, a little bit
of knowledge is a bad thing. If you get into rural China, where farming is what everybody does, then although the central government would say there's a two-child policy or a one-child policy, in rural China they have as many children as they want. They may not register them with social security, but if they need three or four kids to run the farm, they have as many kids as they want. So we found a lot of families with four, five, six, eight kids.
So why has it been presumed to be difficult to do this kind of work? I think these are some of the reasons, and some of them relate to the issues of study design that we were talking about. One is this whole idea of genetic heterogeneity. Particularly if the underlying model here is gene-by-environment interaction, and these phenotypes are determined by multiple genes, you can get the same phenotype, either high IgE or airways responsiveness, in population A with one constellation of genes and environmental exposures, and you can get the same thing in population B with different genes. So genetic heterogeneity is a reason for focusing on a genetic isolate, but then you have to worry about the generalizability question. In fact, in asthma there are four positionally cloned genes, counting the paper that's going to come out on Friday in Science, and of those four the first is the only one that people have really attempted to replicate. It's gotten mixed results: some people have replicated it and some people haven't. So it's one of those genes that probably falls into this category: it's not a major gene, it's a minor gene. It's one of the 200 genes that determine asthma, but it's not one of the top 10 in every population.
What's your guess about the new gene? My guess about this new gene is that it's a major player. But having said that, the point that I made to the science writer who was covering this is that that's what science is all about: replicating this, seeing how important it really is, and seeing what actually happens. I think you can get a clue as to whether you've got hold of an area where there's a potential major locus by looking at the replicability of the linkage peaks in a particular region for a complex trait. In other words, if you've got a region where there's a linkage peak, and there are 10 different studies in 10 different populations and there's always a peak in the same region, then the chances are that there's a major gene in there that's probably going to apply to a bunch of different populations. Going back to the Science article, this is a region where a number of people
have found a peak there. The other problem here is that, unlike single-gene disorders where there's a known mode of inheritance, you get everything but the kitchen sink here. Some of these genes are autosomal recessive, some are autosomal dominant, and so on, so you're getting a whole bunch of things jumbled up in one phenotype, which makes it very difficult. And then there's this problem of phenocopies. So what's a phenocopy? Give me
an example from your own clinical experience of somebody who's a phenocopy. Well, just as you can get these diseases from genes, you can get them from exposures in the environment. So what if you've got some guy who smokes four packs a day, he's 50 years old, and he has a whopping big heart attack? Maybe he has no family history, but he smokes four packs a day. Well, you can get a heart attack from smoking four packs a day; you don't need to have any genes at all for heart attack. So that's a phenocopy: he's going to look like somebody who's genetically susceptible, because he had a heart attack at 50 years old, but it's all due to environmental exposure.
Incomplete penetrance: this is a problem even in single-gene disorders, because there are clearly examples, hemochromatosis, cystic fibrosis, with a very different spectrum of disease. We know that the CFTR gene causes cystic fibrosis, yet you get some people who have completely normal lung function, no lung disease at all, and all they've got is mild pancreatic insufficiency, and you've got other people who are totally debilitated by it. Part of that can be penetrance and part of it can be environmental exposure, but incomplete penetrance is important.
Then you've got this problem with multiple genes. The lay public has a very sort of delusional view: they think genes are immutable, that if you've got those genes, that's it, it can't be changed. And they also think they're monolithic, that they're really big. Whereas the reality is that you've got 33,000 genes in the genome. Take a disease like asthma, which isn't very complicated: maybe there are 200 or 250 genes, I don't know, a lot, that are probably important, and maybe 10 that play a role in almost every population, plus a lot of environmental things going on. That makes it very complicated, and that's why guys like this guy are going to make big bucks, because they're going to be able to model all of the different pathways and the different genes together in some more realistic model, systems biology or some actual way of looking at this.
But the point here basically is: look, it's complicated to do this stuff. Again, going back to what I said earlier, it's getting a lot easier. In 2002, one positionally cloned gene for asthma; in 2003, two positionally cloned genes for asthma; in 2004 the first paper is already out, and there are probably going to be four or five this year. The year after that there are probably going to be 10, and all of a sudden you've got 20 genes identified for the disease by positional cloning. That is the history of complex trait genetics, and it's happening right now, this very moment, all across the world. Labs like mine are right in the middle of the fray doing this stuff. This is, simply put, the single most exciting time to be
doing human genetics, and it's going to go on for a while, but who knows for how long.
Then there's this other problem of pleiotropy: you can have one gene, and it can do a lot of different things. The CFTR gene gives you lung disease, pancreatic insufficiency, infertility; it all has to do with mucosa and epithelia in the different organ systems where this particular gene is expressed. So, one of the genes we're looking at: did Jeff talk to you about CRHR1? Did he show you the data about CRHR1 last week? Is that gene expressed in the lung, yes or no? No, it's not expressed in the lung. It's the receptor for CRF, or CRH, and it's expressed in the brain. So what other disease might that gene potentially be important in? He's an endocrinologist; he's forbidden from answering. [inaudible] It comes from the hypothalamus, actually. Hypertension? There are endocrine causes of hypertension. Have you had your physiology course yet? No? Anybody had physiology? The HPA axis, OK. No, come on, think common diseases. Common, common disease. I'll tell you: it's depression. It's been studied a huge amount. They've sectioned brains of people who committed suicide, and all kinds of things show that CRF and CRHR1, which are the ligand and the receptor, are important in affective disorders. Is there any linkage? Well, there's an association between our haplotype and depression in Julio Licinio's Mexican-American cohort. Really cool.
So that's pleiotropy. And then obviously you've got this problem with penetrance, which is whether individuals with a genotype will actually express the trait. IgE genes can be important in hay fever and they can be important in asthma, and there are some people who don't have high IgE at all even though they've got the gene. And these are some other examples. The basic point I'm trying to make here is that these are the reasons that have been given for why doing this stuff is hard. But I'll tell you something: the really hard part has been developing the bioinformatics infrastructure, the bioinformatics tools, and cheap, reliable genotyping. Those have really been the things that have been important. Just in the little bit of time that I've been doing this, my genotyping costs have gone from $1.20
a SNP genotype down to, next year, about 15 to 20 cents a SNP genotype. There are three million SNPs minimum, three to five million, in the human genome. Now, I'm not going to type three to five million, but in any one experiment I've got to be able to type a thousand over a 10 to 20 megabase region, one of these linkage peaks. So I've got to do a lot of genotyping in a lot of people, and it's expensive. The very first positionally cloned gene for asthma took six years and 15.6 million dollars. We could do that experiment today for two million dollars and a regular NIH grant, and that has totally changed the field. That's the kind of thing that's really making this possible.
OK, so I already said that if you're going to think like a geneticist, you have to know a little bit of population genetics. You have to understand the concepts of linkage disequilibrium, drift, natural selection, et cetera. I'm reminded of the fact that somebody asked the President of the United States whether he believed in evolution, and his answer was, "The jury's still out." Really? Yeah, that's what he said. So this is your former classmate, right? Right, I went to high school with the president.
So this is the first question that you're going to get asked if you're writing a grant. The first thing that you have to address is: is the disease or the phenotype that you're interested in heritable? So there are
lots of different ways to measure this. You can calculate heritability estimates, you can do twin studies, you can use this concept of risk to relatives, where you take the risk in the relatives of the probands divided by the risk in the population at large, or you can look at familial aggregation. But the point is you've got to gather the evidence, and if you don't know that your phenotype is heritable, you're going to have to demonstrate that it's heritable before anybody's going to give you a grant to study it, because that's the question geneticists want to know the answer to. Asthma doesn't necessarily have a high heritability, but it clearly is a heritable disease. This is data from a twin study from the Danish twin registry that looked at the concordance of asthma in identical and fraternal twins. Identical twins share 100 percent of their genotype; fraternal twins share, on average, 50 percent of their alleles. Everybody knows that twins also share the environment, so that's another factor at issue here, but the reality is that there clearly is evidence of heritability of the disease. The problem here is that heritability estimates are always dependent on environmental exposures and disease prevalence as well, because the true underlying model for all of these diseases is clearly still going to be gene-by-environment interaction.
So after you've decided that the phenotypes you're interested in are heritable, you've got to go out and say, OK, I've got these phenotypes and I'm going to genotype them in a population. You can look at disease phenotypes. The advantage of this is that people want to look at asthma; they want asthma genes; they want to find, quote, "the gene for asthma," unquote, which we already know is probably a false concept.
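The risk-to-relatives ratio and twin concordance mentioned above can be written down in a few lines. This is a minimal sketch under stated assumptions: every count and risk below is hypothetical (none of it comes from the Danish twin data), and the function names are made up for illustration. Probandwise concordance uses the standard 2C / (2C + D) form, which assumes complete ascertainment of affected twins.

```python
def recurrence_risk_ratio(risk_in_relatives, population_risk):
    """Risk in relatives of affected individuals divided by population risk."""
    if population_risk <= 0:
        raise ValueError("population_risk must be positive")
    return risk_in_relatives / population_risk


def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Among pairs with at least one affected twin, the fraction with both affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Chance that the co-twin of an affected twin is also affected: 2C / (2C + D)."""
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


# Hypothetical inputs, for illustration only.
risk_ratio = recurrence_risk_ratio(risk_in_relatives=0.20, population_risk=0.05)
mz = probandwise_concordance(concordant_pairs=30, discordant_pairs=70)
dz = probandwise_concordance(concordant_pairs=12, discordant_pairs=88)

print(f"risk to relatives / population risk: {risk_ratio:.1f}")
print(f"probandwise concordance, identical twins: {mz:.2f}")
print(f"probandwise concordance, fraternal twins: {dz:.2f}")
```

A ratio well above 1, together with higher concordance in identical than in fraternal twins, is the kind of evidence of heritability being described, with the caveat already raised that twins also share their environment.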
The problem with a lot of disease phenotypes is that even though they may be binary clinically, there may be real problems in making that diagnosis in a way that would be useful for a research study. The problem with asthma is it's a syndrome, right? There is no one way of diagnosing asthma, so that you can say, you take this test and I can guarantee you that everybody who tests positive is going to have the disease and everybody who has a negative test doesn't have the disease.

[Audience comment, inaudible.]

Yeah, but the FEV1 doesn't tell you whether somebody's got asthma or not. I can show you people who have reduced FEV1 and have cystic fibrosis, or interstitial lung disease, or COPD. They can have a lot of different things, right? So the FEV1 lacks sensitivity and specificity, and that's true for every single test. Elevated IgE: well, you could have elevated IgE from parasitic disease, or from eosinophilic pneumonia, or from 20 other different
things. So there is no single test, and the same may be true for most complex traits. There may be some phenotypes that are a little easier to measure. Say, well, I want to study obesity. Well, how fat is fat? Are people who are fat like this different from people who are fat like this? There's all sorts of different ways of being fat. So any one of these phenotypes has complications, and I can tell you this from when I first
got into this: all I knew was phenotype. I was a world-class phenotyper. I knew all of the nuances of phenotype and everything there is to know about phenotype, and that tends to be what happens when you talk to clinicians, because they understand that more. So this stuff is really, really important, but it's not going to get you very far if you don't know all the other stuff. You've got to know all the other things. But I think the point is you do have to know this, and again it's the
current problem. [Audience comment, partly inaudible: "there's a lot of numbers, actually."] Exactly. Well, it goes back to the point that I was making earlier, which is that genetics is moving from a field where geneticists were generalists to a field where geneticists are specialists. You get people who specialize in respiratory genetics, cardiovascular genetics, obesity genetics, diabetes genetics. The days of the person that can roam around and do all of these things, no, I don't think that's going to happen. In five or six years you're going to have to be able to go in there and focus on a specific disease, because it's going to be too complicated for you to do otherwise.

Then you've got this other type of phenotype, where you can say, well, okay, we want to look at asthma, but what about looking at intermediate phenotypes? So give me some examples of an intermediate phenotype related to my disease of interest. What would be an intermediate phenotype? Absolutely, FEV1. What else? I had it up on a number of slides. IgE level, right, the measure of allergy. Skin test reactivity, airways responsiveness, symptom score, sputum production, exhaled NO. The list goes on and on; you can create hundreds. So for obesity, you could be looking at body mass index as the primary phenotype to define obesity, but then you could look at absolute fat mass, or percent body fat, or waist-hip ratio, or insulin resistance, or do CT scans of somebody's abdominal fat deposition. There's a million different ways of potentially going at this.

The advantage here is that sometimes these are more objective than a subjective "it's asthma, it's not asthma." It may be, quote, closer to the gene, in the sense that with somebody's IgE level you have some idea of the genes that determine that. And it can be quantitative: you can take a different approach statistically to quantitative traits than you can if you're looking at binary traits.

[Audience question, partly inaudible:] If someone comes in and you look at them and say, okay, this person has some form of asthma, do intermediate phenotypes help with narrowing the diagnosis, from "you don't really have it" or "you have this" to "you have this type of asthma"?

Well, the way I prefer to think about it, and I think it's probably a better way for you to think about it, is you've got to get away from that. This is where thinking like a doctor and a clinician is bad.
In the world of clinical medicine, it's just like religion: you either have the disease or you don't. There's no such thing as being a little bit pregnant; you're pregnant or you're not pregnant. You have to have bypass surgery or you don't. Clinicians live in a binary world. Real scientists live in the world of continuous distributions. So when are you fat? Are you fat with a body mass index of 23, 24, 25, 26? When do you have high blood pressure? When it's 130 over 80, or 140 over 90?

The other thing is that the way to think about these is kind of like overlapping Venn diagrams. The clinical phenotype is actually a composite of these overlapping Venn diagrams, which all have separate genetic and environmental determinants that contribute to them. It's sort of like peeling an onion, where you've got all these different layers. I think in many ways being a clinician can help you as a research scientist, but in some ways it can also hurt, because you start to think in these absolute terms. So I think the better way to think about it is that these intermediate phenotypes, overlapped, create clinical phenotypes. And yes, what you're trying to do is stratify or classify people in some way, so you're creating homogeneity so that you can actually identify the genetic determinants of a disease or an intermediate phenotype. So you want to go in that direction, but most of these things lack sufficient sensitivity and specificity to really be terribly helpful.

This is just a list of some of the phenotypes that people have looked at in asthma. I've starred some of the ones that people have focused on in terms of linkage peaks that have actually been identified. But this is interesting, because there's clearly a bias in the literature: there's a whole bunch of these other phenotypes, and I could create a list of 30 more of these, where people haven't looked. So this just gets to the point that there's plenty of work here for anybody who wants to do this stuff. I've got a junior person in my lab; he's got a bunch of phenotypes that he's really interested in, and he's going to go out and determine their heritability, and then he's going to write another grant and map the genes for them, and so on and so forth, because he wants to have his own little area to work on.

So now we're at the point where I've got to move a little faster, or we're not going to make our way through this. You've got to have a study design, and there's a bunch of different ways of doing this. You can do linkage; you can do association. Amongst the linkage studies, you can do allele sharing methods, which are distribution
free, or you can do methods for continuous distributions and focus on that. There are two types of genetic association studies: the family-based and the case-control. The important point here is that they're very different. Here [family-based] you have to genotype three people; here [case-control] you have to genotype only two people. Different hypotheses: here [case-control] you're looking at the alleles or the genotypes in the cases relative to the controls, so it's the allele frequency or the genotype frequency in the cases versus the controls. Here [family-based] you're looking at transmitted alleles from a heterozygous parent to an affected offspring. So very different hypotheses, different study designs.

An important thing to recognize is that in any association study, the association between a variant and a phenotype can be due to a causal relationship, it can be due to linkage disequilibrium, or it can be due to population admixture. That means that, usually in the context of a case-control study, not a family-based study, you've got
different allele frequencies segregating in the cases and the controls, because you've got different population histories, evolutionary histories, that have determined those allele frequencies. The most extreme example would be: I had a thousand Italian cases of asthma and I'm comparing them to a thousand Swiss controls who don't have asthma. Even though these two groups are predominantly Caucasian, their evolutionary history may be different, and the allele frequencies may be different as a result of that. So even within an ethnic group you can get these different allele frequencies, and this is because ethnicity, or self-designated ethnicity, is only a weak predictor of evolutionary history.

[Audience comment, partly inaudible:] Here's an example. Say you've done an association study, for Costa Rica, between Germans and Italians, with two populations. Sure enough, we find an association between pasta eating and some peaks in the genome, because in fact what we'd be looking
for is linkage to the fact that you're Italian. Just the fact that the Italians have a distinct distribution of polymorphisms from the Germans is going to create this pasta association, when in fact you're looking at different populations.

Some of the guys in my lab wrote an article demonstrating all of the potential problems in the case-control type of genetic association study. One of the things that's really impressive about this paper in Science is this: we all use
genetic association as part of the fine mapping process to map a linkage peak. But this is very important, because even if you can get rid of the population admixture problem, linkage disequilibrium is always an issue, and so you're never going to know for sure if you're at the gene or you're just close by to it. So you're going to have to have something else to show that you've actually found the gene. You're not getting into Science just with genetic association. And so the people in the
paper that's coming out this week have expressed the gene in bronchial tissue, they've done immunohistochemistry to show that the gene is expressed in epithelium, they replicated their results in a different population, et cetera.

The thing about the case-control studies, and even family-based association, is that these studies are really easy to do, and so there's lots of them in the literature. So, going back to this slide, it's really important for you to know these potential problems, because you want to be able to read this literature and say, yeah, these guys really found something, or maybe they didn't.

The advantage of this candidate gene approach is that it's cheap and easy. Compare that with the genome screen: remember, I said there are now four positionally cloned genes that have used this type of genome screen approach, four that have been identified since the human genome was mapped in 1996.
Well, that's seven years; that's not even one gene a year. That's pretty meek, or weak. That's because this is very expensive and technologically intensive. But the thing that's great about this is you come up with a novel gene at the end of the time, so it's not dependent on what anybody knows about pathobiology.

Or you could go this way, the candidate gene way. You could say, look, I know that IgE is important in asthma, so I know that we ought to be screening IL13, IL4, IL4 receptor, CTLA4, all of those genes in the pathway that determines IgE. Makes sense, right? Screen those genes, because we've already said that people with asthma have high IgE. Well, you check those genes, and yeah, in fact most of those genes are asthma and allergy genes. It's not real exciting, though. It's not like everybody's going to jump up and say, oh my god, IL13 is an asthma gene. The molecular biologist says, yeah, well, we knew that 10 years ago. What's new? What's great about that?
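To make the two association designs described earlier concrete (allele frequencies in cases versus controls, and transmitted alleles from heterozygous parents to affected offspring), here is a minimal Python sketch. The counts are hypothetical and the function names are illustrative; the statistics used are the ordinary Pearson chi-square on a 2x2 allele table and the McNemar-style transmission disequilibrium test, which are standard choices assumed here, not methods named in the lecture.

```python
import math


def chi2_p_1df(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Case-control test: Pearson chi-square on allele counts.

    case_a/case_b are counts of allele A / allele B in cases,
    ctrl_a/ctrl_b the same in controls (two alleles per person).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    numerator = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    denominator = ((case_a + case_b) * (ctrl_a + ctrl_b)
                   * (case_a + ctrl_a) * (case_b + ctrl_b))
    return numerator / denominator


def tdt_chi2(transmitted, untransmitted):
    """Family-based test: among heterozygous parents of affected offspring,
    compare how often the allele of interest was transmitted vs. not."""
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    cc = allelic_chi2(60, 40, 40, 60)
    tdt = tdt_chi2(60, 40)
    print("case-control chi2:", cc, "p =", chi2_p_1df(cc))
    print("TDT chi2:", tdt, "p =", chi2_p_1df(tdt))
```

The case-control statistic is sensitive to the admixture problem discussed above, because cases and controls can differ in ancestry; the family-based statistic compares alleles within the same parents, which is why it is less exposed to that confounder. Neither test distinguishes a causal variant from one that is merely in linkage disequilibrium with it.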
There are interesting things about it, because it's going to change molecular biology too. You're actually going to get to the level where you say, well, it's these three variants in the promoter, it's this variant in exon one, and it's this particular haplotype that's determining the effect on IgE level. So molecular biology is going to change, because people aren't going to get away with just knocking out a gene or looking at a whole-gene effect. They're going to actually have to go in there and determine the particular variants that are important in terms of the molecular mechanisms. So I don't want to denigrate this; all of us do a lot of this stuff to keep ourselves busy while we're trying to do these really big experiments that are very expensive and take a long time. Skip that.

So let's talk a little bit about linkage. Linkage is this idea that you take these microsatellite markers all the way
across the genome. It's a property of families; it's not a property of individuals. You're looking to see if a particular region of the genome contains a gene that's related to the phenotype of interest that's segregating in these families, using identity by descent. So you have some extended pedigree like this. What you could do is segregation analysis, to develop a model to see how the disease is actually segregating in this population, but that's pretty difficult for complex traits. You could also use the allele sharing approach, which assumes no mode of inheritance. It just says: we collected a whole bunch of sib pairs who are affected, and we're going to test whether these affected relatives have inherited a region of the genome identical by descent more often than expected under random Mendelian segregation. The nice thing about this is that it's easy, but it's not very powerful.
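The allele sharing idea above can be sketched as a simple "mean test" on affected sib pairs. Under random Mendelian segregation a sib pair shares 0, 1, or 2 alleles identical by descent (IBD) with probabilities 1/4, 1/2, 1/4, so the expected proportion shared is 0.5 with variance 0.125 per pair. The sketch below is a simplified illustration with hypothetical counts: it assumes IBD status at the marker is known without ambiguity, which real data rarely allow, and the function name is illustrative.

```python
import math


def mean_ibd_test(n_share0, n_share1, n_share2):
    """One-sided test for excess IBD sharing in affected sib pairs.

    Inputs are the numbers of pairs sharing 0, 1, and 2 alleles IBD.
    Returns (mean proportion shared, z statistic, one-sided p-value).
    """
    n = n_share0 + n_share1 + n_share2
    mean_prop = (0.5 * n_share1 + 1.0 * n_share2) / n
    # Null: mean 0.5, variance 0.125 per pair (from the 1/4, 1/2, 1/4 split).
    z = (mean_prop - 0.5) / math.sqrt(0.125 / n)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean_prop, z, p_one_sided


if __name__ == "__main__":
    # Hypothetical: 300 affected sib pairs with modest excess sharing.
    print(mean_ibd_test(60, 150, 90))
    # Hypothetical: sharing exactly as expected by chance.
    print(mean_ibd_test(25, 50, 25))
```

Because the variance term shrinks only with the number of pairs, a small shift in mean sharing needs many pairs to detect, which is one way to see why the approach has limited power for a trait with a low risk ratio.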
Problem is, you need a lot of sib pairs, and even with over 300 sib pairs you don't get such great power using this approach. Power goes up if the disease is more heritable, and you can do with fewer sib pairs, but the reality is that even with a huge number of sib pairs you may not have a lot of power if the lambda is down here, which it probably is for asthma. So I think this is why people have focused on extended pedigrees in these relative genetic isolates, and that's why we're so excited about Costa Rica, the Finns are clearly excited about Finland, and deCODE is doing what it's doing in Iceland. Whether we're going to be successful or not, I don't know.

But the basic approach, whether you're using an outbred population or a genetic isolate, and whether you're using sib pairs or pedigrees, is that you've got these usually di- and trinucleotide repeat (STR) microsatellite markers. Most of the genome screens use about 400 of these markers, equally spaced across the genome. What you do is basically a form of logistic regression, where you do a LOD score, log of the odds, calculation relating phenotype in the family to these markers. What you get is a linkage peak, that is, the LOD score for that relationship between the markers and the phenotype. What that says is, okay, there's a gene or multiple genes in this particular region on a chromosome that's associated with a particular phenotype. Then you have to go down and put in more markers, first more STR markers and then SNPs, and gradually map that region until you've got it down from a relatively large region to a very small region of a thousand base pairs or whatever, where you can say it's a gene, or one or two genes. That takes a lot of genotyping and a lot of work. So in our experiments now, over the next year, we have all these linkage peaks in
asthma and COPD. Each experiment is going to be about 200,000, and is going to be 1,500 to 1,600 SNPs in each of these regions. We're going to fine map three or four regions over the course of the next year, and hopefully we will be in Science.

[Audience question, partly inaudible, about SNP density, e.g. SNPs per 100 base pairs.] On the order of one per thousand bases; that's about what we're shooting for.

So this is just a summary of all of the genome screens that have been done
in asthma, just to show you that most of them have used sib pair studies, and most of them have been relatively small, but we do get a substantial amount of replication. These are regions across the genome. This one right here, that's the gene that was just mapped. Several populations, including the Finns, showed a peak in this region, and they got this gene, and then they went to the Canadians and said, can we replicate it in your population? The interesting thing is it was asthma in the Finns, but it's high IgE in the Canadians. So it shows you that this problem of phenotypic heterogeneity and genetic heterogeneity is a big issue here. It isn't a perfect replication at the phenotype level between these two populations, but they've got all this other stuff, the expression and everything else, that proves that they've really got the gene.

The one we're working on is actually not on here. I didn't leave it off intentionally. It's 12q, and it's one of the ones that's the most replicable. Here it is, in this slide right here. So this is a very good region [a few words inaudible], and it's also got a very low p-value, so that's one of the better ones.

Now, you can already see from this that this region has five or six different genes in it: the cytokine cluster is here, the beta-2 adrenergic receptor is here, IL13 is here, CD14 is here. So there's a whole bunch of small genes, and nobody knows whether there's a big gene or not. Maybe that linkage peak is just being given by the fact that there's a whole bunch of small genes in that region. The one we're working in is 30 megabases; it's a huge region. You can see from just looking at this, one, two, three, four, five, six, seven, eight, nine, and there's another; these are 20 regions, each of them about 20 to 40 megabases, and there could be five or six genes in each one of these regions. For at least two of the positionally cloned genes, there were two genes in the region, and you couldn't tell from the articles. In fact, in this Finnish article that's about to come out, there's a second gene they identified; they don't have the molecular biology on that in the paper, and they're not sure what that gene is doing.

[Audience question, partly inaudible.] Probably, yeah.

So these are some of the issues in doing the type of linkage studies that I talked about: multiple markers, multiple phenotypes, multiple comparisons; phenotypes are correlated; markers are not independent. So there's a lot of statistical issues. This work is really exciting, I think, because it combines genetics, clinical medicine, molecular genetics, statistics, evolutionary biology; all this stuff is all mixed together. So, a lot of important statistical issues in doing these genome screens.

So then you've got to genotype people. You've already
said that SNPs are the primary genetic variation in the human genome, but we've found indels, we've found repeats, we've found SNPs and indels together; there's all kinds of stuff. In general, SNPs occur about once every one thousand to two thousand base pairs. There are approximately three, maybe three to five, million in the human genome. It's using these as the primary source of genetic variation that we're actually going at trying to map these
genes. There's a whole host of questions about how you pick SNPs. Zach and I wrote a paper together with some of our colleagues about haplotype-tagging SNPs, and there are other approaches to using linkage disequilibrium to define the SNPs that you want to genotype. So there are lots of issues there, where bioinformatics is interfacing with human genetics. This is probably not 30 million, it's probably three, but no one really knows how many of these SNPs are actually coding. I think everybody does know that more than coding SNPs are important. Promoter SNPs are important, coding SNPs are important, and SNPs in the 3' UTR are important, because they're going to change transcription factor binding and potentially change message level. And any one SNP in and of itself probably isn't going to change function in a gene all that dramatically, so people are going towards this idea of
analyzing data at the molecular level by looking at relevant functional haplotypes. If you've got a couple of SNPs in the promoter, another that's a non-synonymous cSNP in an exon, another that's at a splice site, and another in the 3' UTR that's determining message level or stability, we combine all of those SNPs to try to get an effect across that whole gene, in terms of looking at that gene and its impact on phenotype.

So this is just a little bit about data analysis. You can look at either continuous quantitative traits or qualitative traits, and there are parametric and non-parametric approaches to this. Then you use all this stuff and you actually find the gene. The initial work was done with YAC and BAC clones, but now we're past the idea of doing that, because there are enough markers across the genome with the HapMap project that we can go into almost any region now in the genome
and we can come up with validated SNPs across that region, so that we can actually pick SNPs and genotype them and go directly. This is what's accelerating the pace of positional cloning at the moment.

So these are some of the things that I haven't really talked about; this lecture is introductory. But if you really get into this: how do you do haplotype analysis, ancestral haplotype analysis or linkage disequilibrium mapping, molecular methods, or tissue expression?
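Linkage disequilibrium comes up repeatedly above (as an alternative explanation for an association, as the basis for haplotype-tagging SNPs, and in linkage disequilibrium mapping). As a minimal illustration, the Python sketch below computes the usual pairwise measures D, |D'| and r² from counts of the four haplotypes at two biallelic SNPs. The counts are hypothetical, the function name is illustrative, and the sketch assumes phased haplotypes are already known, which in practice usually has to be inferred.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD between two biallelic SNPs from phased haplotype counts.

    SNP 1 has alleles A/a and SNP 2 has alleles B/b; n_AB is the number of
    haplotypes carrying A and B, and so on. Returns (D, |D'|, r^2).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    if p_A in (0.0, 1.0) or p_B in (0.0, 1.0):
        raise ValueError("both SNPs must be polymorphic")
    # D: observed AB haplotype frequency minus the frequency expected
    # if the two SNPs were independent.
    d = n_AB / n - p_A * p_B
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, d_prime, r2


if __name__ == "__main__":
    # Hypothetical haplotype counts.
    print(ld_stats(40, 10, 10, 40))   # partial LD
    print(ld_stats(50, 0, 0, 50))     # complete LD
    print(ld_stats(25, 25, 25, 25))   # no LD
```

When r² between two SNPs is close to 1, genotyping one of them captures most of the information in the other, which is the intuition behind picking tagging SNPs; it is also why an associated SNP may only be close to the functional variant rather than being the variant itself.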
All of these things can potentially be helpful in the fine mapping process. We've been very interested, and have a project with Zach, where we wanted to use mouse expression and mouse QTL analysis to help us with human positional cloning. We're not sure if our project's going to be funded, so we don't know if we're actually going to get a chance to do that.

At the end of the day, you want to be able to look at the impact of polymorphic variation in the gene that you found and see how much of the phenotypic variance is explained by that polymorphism. That gets back to this question: you found a gene by positional cloning; how do you know it's really a significant gene? Does it replicate across different populations, in different conditions? Is it important in different kinds of asthma? Does it seem to be explaining a significant amount of the variation?

So this is one example. It's a poor example, because it's not a really good one. This is a gene, CD14, that we genotyped in the Program in Genomic Applications. This gene binds LPS, or lipopolysaccharide, to the membrane of the monocyte and then transduces that signal to the T cell to produce a Th1 cytokine. We found a polymorphism in this gene as part of the Program in Genomic Applications. It's a C-to-T polymorphism, so here's the T variant, here's the heterozygote, and here's the C. You can see that if you look at a dominant model, where the C's are together, anybody that has a C genotype is likely to have more positive skin tests than those who are TT. That genetic variation is associated with variation in soluble CD14 levels in peripheral blood, so there's a relationship between genotype and intermediate phenotype, and ultimately a relationship to allergy.

[Audience question:] I'm sorry, is that supposed to show the difference between the two?

It's small, but it was significant. Okay, well, I think the point here is
that this is one SNP. It's not even a haplotype in this gene, and still there is a difference. These are modest numbers, they're not huge, but there was clearly a difference, probably the level of difference you'd expect if it was just a single SNP. None of these effects are going to be very large at the level of an individual variant. At the level of a gene, with a haplotype, with a really significant gene,
maybe so, but certainly not with one SNP.

So these are some of the skills you'd want if you guys want to do this work. If you were going to come to my laboratory, I would want you to know something about this: how to genotype, and then the disease, study design, statistical methodology, phenotyping, environmental exposures. I probably ought to add bioinformatics to this list, because without good bioinformatics skills you're
going to be lost. It's hard to know exactly where on the spectrum people want to be. You could do this and never have anything to do with the phenotyping, and just focus on the functional variation from genes that these guys are actually finding, or you might situate yourself somewhere in the middle. I've got people in my lab that are doing just this, and very few people that are doing just this, but I have some that are sitting in the middle.

So where is this going in the future? I think what's driving the field is high-throughput sequencing and high-throughput genotyping, combined with bioinformatics, in the presence of having lots of populations to do this kind of work. That's what's really necessary: you've got to have well-phenotyped populations. In my lab, these are all the different populations that we have for asthma. We've got these extended pedigrees, we've got affected sib pairs, we've got trios, and we've got individual cases and controls, so that we can test the genes in multiple different populations and under different conditions. So why don't I stop there, and I'd be glad to answer any questions that people have about any of the things that I said.

[Audience question, partly inaudible, about the structure of proteins.]

Yeah, well, I think that once you've got a relationship with a gene, what you have to do is really get down and figure out what the variants in that gene are and what they are doing. Human genetics can contribute to that at the level of genetic association. For example, Laurie Glimcher, who's an immunologist at the School of Public Health, identified a gene that controls T-cell differentiation. TBX21, or T-bet, is the name of the gene. She created a knockout mouse, and when you knock this gene out in the mouse you get tremendous airways responsiveness and allergic inflammation. It looks like an asthma gene in the mouse. So we sequenced that gene, and we found a variant in the gene that's in the coding region. It's a non-synonymous cSNP. It's very rare; it only occurs in about three percent of people. But it turns out that that coding region variant determines which patients who get inhaled steroids get better. The people that have that variant and get inhaled steroids have their airways responsiveness completely returned to normal. We're about to submit it to The Lancet; we're actually working with Laurie. It's exciting because it's an example of how you don't even have to go to the animal model. She has created her mouse model and started to do some experiments with steroids, and steroids are probably important in controlling T-bet expression, and she didn't know that. So that's an example of structure-function relationships, where you're trying to
figure out what a gene actually does. And it is important to recognize there are some genes that have been around for a while where we know there's a relationship to a disease phenotype, but we don't know how the hell they work. So figuring out that structure-function stuff can potentially take a long time, and doing the genetic association and the fine mapping may actually now proceed at a faster pace and not take as much time. But I think you can actually do a
lot of structure-function stuff. Usually what we do when we get an association is we'll sequence that gene, we'll type every damn variant we can find in that gene in the population, and look at everything that could be related to an interesting phenotype, because we're searching for clues to help our molecular biology colleagues figure out what the gene is actually doing.

[Audience question, inaudible.]

Well, you could do that. We're trying to work with this guy; he's got people in his lab who have ideas about how to get clues. So, like the stuff that Drazen showed you last week: that gene, CRHR1, we know there's a relationship to steroid treatment response, but we don't know what the variant is in the gene. So we sequenced the gene completely, and now we've got two indels in that gene that sit right at intron-exon junctions. What we're thinking is that those insertion-deletion polymorphisms may be changing alternative splice sites, so we're going to have to try to prove that. That's one of the hypotheses that we're going to investigate in the renewal of the grant.

So you have to let the gene tell you where its variation is and how it might be contributing to phenotype. The first thing usually is to sequence the gene completely. The second thing would be to do a very careful analysis of the new variants and the re-sequenced variants that you found in relationship to the phenotype or phenotypes of interest, and see if you can find haplotypes, individual SNPs, insertion-deletion polymorphisms, or transcription factor binding sites that could potentially explain the genetic association. Then, usually, you have to go into an animal model and test those in a more rigorous way.

[Audience question:] So Scott, how far off do you think is the day when a clinician at the Brigham will be able to come to you and say, I have
a disease, I have 500 cases and 500 controls, I think [partly inaudible] that this is on the long arm of the chromosome, and I want to do this?

So I think the question you're really asking is, how far away is whole-genome association, where it's within the reach of significant but not impossible clinical studies? Max three years. Max. I mean, George Church... it's all about the genotyping costs,
Zach. He thinks he's close to the thousand dollar genome. So if he's really close to the thousand dollar genome, and SNP genotyping costs continue to drop as dramatically as they've dropped over the last three years, I would see whole-genome association being within the range of a reasonable budget in a two or three year period of time. All right, on that note, thanks very much.
