Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/119979
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Adelson, David | -
dc.contributor.author | Coburn, Catisha Leigh | -
dc.date.issued | 2019 | -
dc.identifier.uri | http://hdl.handle.net/2440/119979 | -
dc.description.abstract | As research into epigenetics grows, it is clear that modifications to DNA through histones and other proteins can change behaviour within the cell, and are an important aspect of cellular function. One of the methods to observe these modifications is chromatin immunoprecipitation sequencing (ChIP-seq), which specifically targets protein-bound DNA to determine its location along the genome. The output of this technique is a set of DNA sequences, which indicate regions of DNA that may be bound by the protein. A drawback of this technique is that noise within the data can hide the true location of these proteins, and thus ChIP-seq peak calling software is needed to identify putative binding sites, which can then be associated with genes. There are a number of these programs available, but they tend to have a low level of agreement. This is because they use a wide variety of peak identification models that rely on different assumptions about the data. Ideally, the results from a number of tools could be combined to identify a robust set of associated genes. One candidate technique is Latent Class Analysis (LCA). The aim of this thesis is to apply LCA to ChIP-seq data, and use it to identify a reliable set of bound genes. Three different LCA models were considered: a simple model, as well as models with additional random effects. These random effects had either constant loading among the programs, or non-constant loading. In Chapter 1, I applied these models to ChIP-seq data to observe the initial results. Next, in Chapter 2, I performed a series of simulations with varying parameters, and analysed them with the three models, to clarify and extend the results from Chapter 1. In this case, the underlying truth was known, so I could measure the performance of each model. These measurements included the correlation with the results of a Multivariate Gaussian Mixture Model (MGMM), which was fitted to the underlying data, and the root mean squared error relative to the MGMM results. An additional measurement was the Bayesian information criterion (BIC). Aside from comparing the models for accuracy, I also used the simulations to assess the BIC both for determining the correct number of classes and for choosing the best model. Finally, in Chapter 4, I developed, and tested using simulations, a new method of using the LCA models to acquire a more accurate set of putative binding genes. This was analysed using the MGMM, as well as by comparing the proportion of binding genes with the known expected number. I then applied this new method to the original data in Chapter 5. Based on initial results in Chapter 1, the LCA model without random effects generated a reasonable set of binding genes. This was further confirmed using the results of the simulations in Chapter 2, which indicated that the posterior probabilities are more accurate using this model. In addition, the BIC was not found to accurately determine the best number of classes. When assessing the use of the BIC to choose a model, it was found that it did not necessarily find the best performing model, and, based on the simulations, selecting the LCA is better. Finally, assessments of the new method indicated that it performed well compared to using a single model. In conclusion, the approach that incorporates changing thresholds with the LCA was shown to be the most effective at producing a combined robust set of genes. | en
dc.language.iso | en | en
dc.subject | bioinformatics | en
dc.subject | ChIP-seq | en
dc.subject | statistics | en
dc.subject | latent class analysis | en
dc.subject | genetics | en
dc.title | Measuring genome wide changes in chromatin state using ChIP-seq | en
dc.type | Thesis | en
dc.contributor.school | School of Biological Sciences | en
dc.provenance | This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals | en
dc.description.dissertation | Thesis (MPhil) -- University of Adelaide, School of Biological Sciences, 2019 | en
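
Illustrative note: the abstract describes combining the calls of several peak-calling programs with a Latent Class Analysis model and comparing models with the BIC. The sketch below is not taken from the thesis. It is a minimal two-class LCA without random effects, fitted by expectation-maximisation, under the assumptions that each gene has one binary call (1 = bound, 0 = not bound) per program and that calls are conditionally independent given the class. All names (`fit_lca`, `e_step`, `toy_calls`) and the toy data are hypothetical.

```python
import math

EPS = 1e-6  # keeps probabilities away from exactly 0 or 1


def clip(p):
    return min(max(p, EPS), 1.0 - EPS)


def e_step(calls, prior, p1, p0):
    """Posterior P(bound | calls) for each gene, plus the log-likelihood."""
    post, ll = [], 0.0
    for row in calls:
        l1, l0 = prior, 1.0 - prior
        for j, y in enumerate(row):
            l1 *= p1[j] if y else 1.0 - p1[j]
            l0 *= p0[j] if y else 1.0 - p0[j]
        post.append(l1 / (l1 + l0))
        ll += math.log(l1 + l0)
    return post, ll


def fit_lca(calls, n_iter=500, tol=1e-9):
    """Two-class LCA by EM. Returns (prior, p1, p0, posteriors, loglik, bic)."""
    n, k = len(calls), len(calls[0])
    prior = 0.5      # P(gene belongs to the "bound" class)
    p1 = [0.8] * k   # P(call = 1 | bound), one entry per program
    p0 = [0.2] * k   # P(call = 1 | not bound), one entry per program
    prev_ll = -math.inf
    for _ in range(n_iter):
        post, ll = e_step(calls, prior, p1, p0)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: re-estimate parameters from the posterior weights.
        w1 = max(sum(post), EPS)
        w0 = max(n - sum(post), EPS)
        prior = clip(w1 / n)
        for j in range(k):
            s1 = sum(r * row[j] for r, row in zip(post, calls))
            s0 = sum((1.0 - r) * row[j] for r, row in zip(post, calls))
            p1[j] = clip(s1 / w1)
            p0[j] = clip(s0 / w0)
    # Final E-step so posteriors and log-likelihood match the returned parameters.
    post, ll = e_step(calls, prior, p1, p0)
    n_params = 2 * k + 1  # k call rates per class plus one class prior
    bic = -2.0 * ll + n_params * math.log(n)
    return prior, p1, p0, post, ll, bic


# Hypothetical toy data: 100 genes, 4 programs, one 0/1 call per program.
toy_calls = (
    [[1, 1, 1, 1]] * 30
    + [[1, 1, 1, 0]] * 10
    + [[0, 0, 0, 0]] * 40
    + [[0, 0, 1, 0]] * 10
    + [[1, 0, 0, 0]] * 10
)
prior, p1, p0, post, ll, bic = fit_lca(toy_calls)
bound_genes = [i for i, r in enumerate(post) if r > 0.5]
```

Here `post[i]` is the posterior probability that gene `i` is bound, and thresholding it (0.5 is an arbitrary choice in this sketch) gives a combined gene set. The random-effects variants and the threshold-changing method mentioned in the abstract are not represented.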
Appears in Collections: Research Theses

Files in This Item:
File | Description | Size | Format
Coburn2019_Ma.pdf | | 8.82 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.