NGS changed the face of omics research with its debut in 2005. In 2007, ChIP-seq, a method that pairs chromatin immunoprecipitation (ChIP) with massively parallel sequencing, was described. ChIP-seq improved on the real-time quantitative PCR (qPCR) and array-based techniques previously available to omics researchers. ChIP-seq can work with short reads aligned to the genome, but it needs millions of them to generate meaningful data. Today, newer NGS systems can produce longer reads at much greater depth. For example, an Illumina HiSeq flow cell can produce up to 1.5B reads per run. However, many ChIP-seq experiments still use single-end 35 bp reads.
How has ChIP-sequencing advanced over the years?
The advancement of NGS systems between 2007 and 2019 enabled the development, progress, and completion of several projects, including the Encyclopedia of DNA Elements (ENCODE). ENCODE and independent research groups have worked for years to standardize the steps involved in ChIP-seq experiments and analysis.
Although ChIP-seq evolved from standardized microarray techniques, it demands a new set of tools and software to handle the massive volume of data it generates. ChIP-seq experiments also fall into different classes (for example, sharp transcription factor peaks versus broad histone-mark domains), which further complicates the standardization of sequencing and analysis procedures. Most ChIP-seq analysis algorithms were developed to detect well-defined peaks, which suits the investigation of transcription factor binding and motif analysis.
Why do researchers find ChIP-seq analysis challenging?
The primary challenge for novice researchers and small- to medium-sized labs is deciding on data formats and appropriate experimental parameters before beginning the experiment. Here are the first issues most of them face:
- Deciding which NGS platform is suitable for ChIP-seq.
- Determining which platform will yield the maximum number of reads within a limited budget.
- Finding a platform that helps them understand the variation within a given sample.
- Locating an NGS analysis platform that supports differential binding analysis for the samples at hand.
As a rule of thumb, researchers consider around 5-10M reads the minimum and 20-40M reads the standard depth for ChIP-seq analysis.
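The budget question above mostly comes down to arithmetic: how many samples can share one run at a given depth. The sketch below uses the read-depth ranges quoted in the text and the 1.5B-reads-per-run figure from the introduction. The function names and the `usable_fraction` parameter are hypothetical, added only for illustration; real yields depend on the instrument, library quality, and demultiplexing losses.

```python
import math

# Read-depth rules of thumb quoted in the text (reads per ChIP-seq sample).
MIN_READS = 5_000_000        # lower end of the 5-10M "minimum" range
STANDARD_READS = 20_000_000  # lower end of the 20-40M "standard" range


def depth_category(reads: int) -> str:
    """Classify a sample's read count against the rule-of-thumb ranges."""
    if reads < MIN_READS:
        return "below minimum"
    if reads < STANDARD_READS:
        return "minimum"
    return "standard"


def samples_per_run(run_reads: int, reads_per_sample: int,
                    usable_fraction: float = 1.0) -> int:
    """Number of samples that fit on one run at a target depth.

    usable_fraction is a hypothetical knob for reads lost to demultiplexing,
    spike-ins, or QC filtering; 1.0 assumes every read is usable.
    """
    return math.floor(run_reads * usable_fraction / reads_per_sample)


if __name__ == "__main__":
    run = 1_500_000_000  # up to 1.5B reads per run, as cited above
    for depth in (20_000_000, 40_000_000):
        print(depth, depth_category(depth), samples_per_run(run, depth))
```

At 40M reads per sample, a 1.5B-read run fits 37 samples; at 20M it fits 75. Lowering `usable_fraction` shows how quickly real-world losses eat into that margin.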
What are the main challenges of ChIP-seq analysis?
ChIP-seq generates sequences from regions bound by the protein that the antibody targets. However, it also generates sequences from regions pulled down by the antibody non-specifically. The latter gives rise to "noise." Even the best-quality ChIP-seq library can consist of 80% to 90% noise.
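One common way to put a number on signal versus noise is the Fraction of Reads in Peaks (FRiP), a quality metric used in ENCODE's ChIP-seq guidelines. If reads outside peaks are loosely treated as noise, a library that is 80-90% noise would have a FRiP of roughly 0.1-0.2. The sketch below is simplified: it assumes a single chromosome, one position per read (for example, a fragment center), and non-overlapping half-open peak intervals. Production tools compute this from BAM and BED files instead.

```python
from bisect import bisect_right


def frip(read_positions, peaks):
    """Fraction of Reads in Peaks for one chromosome.

    read_positions: iterable of 0-based read positions (e.g., fragment centers).
    peaks: list of (start, end) half-open intervals, assumed non-overlapping.
    """
    peaks = sorted(peaks)
    starts = [start for start, _ in peaks]
    total = 0
    in_peaks = 0
    for pos in read_positions:
        total += 1
        # Index of the last peak that starts at or before this position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos < peaks[i][1]:
            in_peaks += 1
    return in_peaks / total if total else 0.0


if __name__ == "__main__":
    reads = [5, 15, 25, 105, 150, 300, 410, 500, 600, 700]
    peaks = [(100, 200), (400, 450)]
    print(frip(reads, peaks))  # 3 of 10 reads fall in peaks
```

Binary search over sorted peak starts keeps each lookup logarithmic, which matters once millions of reads are involved.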
Finding the Peak
The first problem that noise creates is finding the true peaks within ChIP-seq data. A peak is a site where many reads have been mapped and piled up. With single-end reads, each fragment is sequenced only from its 5′ end, so reads from the plus strand pile up on one side of the binding site and reads from the minus strand pile up on the other. The result is two distinct peaks, one on each strand, with the binding site between them. The distance from the binding site to the middle of each strand's peak is called the "shift."
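Because the two strand peaks sit roughly one fragment length apart, the distance between them can be estimated from the data. The sketch below is a simplified, unnormalized version of strand cross-correlation: it tries each candidate distance `d` and scores how well plus-strand 5′ ends line up with minus-strand 5′ ends `d` bp downstream. The shift is then `d / 2`. It assumes integer 5′-end coordinates on one chromosome (for a minus-strand read, the 5′ end is its rightmost base); `max_d` is an illustrative cap, and established tools normalize the score and handle noise far more carefully.

```python
from collections import Counter


def estimate_fragment_length(plus_5p, minus_5p, max_d=500):
    """Return the distance d (bp) that best aligns plus- and minus-strand 5' ends."""
    plus = Counter(plus_5p)    # read count at each plus-strand 5' position
    minus = Counter(minus_5p)  # read count at each minus-strand 5' position
    best_d, best_score = 0, -1
    for d in range(1, max_d + 1):
        # Unnormalized cross-correlation: pair each plus-strand pile-up
        # with the minus-strand pile-up d bp downstream of it.
        score = sum(n * minus.get(pos + d, 0) for pos, n in plus.items())
        if score > best_score:
            best_d, best_score = d, score
    return best_d


def binding_site(plus_peak, fragment_length):
    """Binding site estimate: plus-strand peak moved downstream by the shift (d/2)."""
    return plus_peak + fragment_length / 2


if __name__ == "__main__":
    plus = [1000] * 10 + [50, 2500, 4000]   # plus-strand pile-up plus scattered noise
    minus = [1200] * 10 + [700, 3100]       # minus-strand pile-up plus scattered noise
    d = estimate_fragment_length(plus, minus)
    print(d, d / 2, binding_site(1000, d))  # fragment length, shift, site
```

In this toy data the two pile-ups are 200 bp apart, so the shift is 100 bp and the binding site lands midway between the strand peaks.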
Quality Control
Knowing the fragment size in a ChIP experiment makes it possible to locate the binding site at nucleotide resolution. Gel-based methods can estimate fragment size by separating the fragments by length.
In contrast, paired-end data allow the fragment size to be calculated directly from the read coordinates. Research shows that combining single-end and paired-end reads can improve the quality of the data available for ChIP-seq analysis.
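With paired-end data, aligners record the fragment (template) length directly in the SAM format's TLEN column, so no strand modeling is needed. The sketch below reads plain SAM text lines, keeps properly paired reads (FLAG bit 0x2), counts each pair once by using only read 1 (bit 0x40), and reports the median absolute TLEN. Using the median and this particular filter are illustrative choices; a real pipeline would typically read BAM files through a library such as pysam and also apply mapping-quality and duplicate filters.

```python
import statistics

PROPER_PAIR = 0x2     # SAM FLAG: each segment properly aligned
FIRST_IN_PAIR = 0x40  # SAM FLAG: first segment in the template


def median_fragment_size(sam_lines):
    """Median fragment (template) length from paired-end SAM records."""
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2: FLAG
        tlen = int(fields[8])  # column 9: TLEN (signed template length)
        # Count each pair once (read 1 only), and only properly paired reads.
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            sizes.append(abs(tlen))
    return statistics.median(sizes) if sizes else None


if __name__ == "__main__":
    sam = [
        "@HD\tVN:1.6",
        "r1\t99\tchr1\t100\t60\t36M\t=\t300\t236\tACGT\tIIII",
        "r1\t147\tchr1\t300\t60\t36M\t=\t100\t-236\tACGT\tIIII",
        "r2\t83\tchr1\t500\t60\t36M\t=\t356\t-180\tACGT\tIIII",
        "r3\t97\tchr1\t900\t60\t36M\t=\t1400\t500\tACGT\tIIII",
        "r4\t99\tchr1\t2000\t60\t36M\t=\t2174\t210\tACGT\tIIII",
    ]
    print(median_fragment_size(sam))
```

Here r3 is skipped because it is not properly paired, and the mate record of r1 is skipped so the pair is not counted twice.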
Finding the Best ChIP-seq Analysis Algorithm
Many peak callers are available today, but identifying the "best" one is close to impossible. The parameters your team chooses strongly shape the outcome of any ChIP-seq analysis. It is wise to stick to time-tested methods and read the documentation carefully when choosing sequencing and analysis algorithms.
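To see how much a single parameter matters, consider a toy window-based caller that flags a genomic window when its read count is unlikely under a Poisson background. This is loosely inspired by the Poisson background modeling used by callers such as MACS, but it is not any tool's actual algorithm. The window counts, the background rate `background_lambda`, and the cutoff `p_threshold` are all hypothetical inputs chosen for illustration.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def call_peaks(window_counts, background_lambda, p_threshold=1e-5):
    """Indices of windows whose counts are significant under the Poisson background."""
    return [
        i for i, count in enumerate(window_counts)
        if poisson_sf(count, background_lambda) < p_threshold
    ]


if __name__ == "__main__":
    counts = [3, 4, 12, 25, 5, 10]  # reads per window (hypothetical)
    print(call_peaks(counts, 4.0, p_threshold=1e-2))  # lenient cutoff
    print(call_peaks(counts, 4.0, p_threshold=1e-5))  # strict cutoff
```

With the same data and background, the lenient cutoff reports three windows while the strict one reports only the strongest, which is exactly why parameter choices deserve as much scrutiny as the choice of tool.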
What are the leading uses of ChIP-seq analysis?
Motif Analysis – One of the leading applications of ChIP-seq is identifying the nucleotide sequences where proteins bind in the genome. More specific goals include mapping polymerase binding sites involved in replication and repair and identifying the binding motifs of transcription regulators and other protein factors. ChIP-seq can also help uncover aberrant protein binding that disrupts gene regulation in disease.
Often, multiple protein-binding motifs can be found within a single dataset. A reliable NGS data analysis platform can help researchers find the true peaks and run the motif analysis.
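Dedicated motif tools such as MEME or HOMER discover motifs de novo from peak sequences. A much simpler check, sketched below, scans peak sequences for a known consensus written in IUPAC codes and compares how often it appears in peaks versus background sequences. The consensus `CANNTG` and all sequences are hypothetical examples; the sketch scans both strands, counts overlapping matches, and counts a palindromic site once per strand.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def motif_regex(consensus):
    """Compile an IUPAC consensus; the lookahead lets overlapping hits all match."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def count_hits(seq, consensus):
    """Count motif matches on both strands of seq."""
    pattern = motif_regex(consensus)
    seq = seq.upper()
    revcomp = seq.translate(COMPLEMENT)[::-1]
    return len(pattern.findall(seq)) + len(pattern.findall(revcomp))


def fraction_with_motif(seqs, consensus):
    """Fraction of sequences containing at least one match on either strand."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    return sum(1 for s in seqs if count_hits(s, consensus) > 0) / len(seqs)


if __name__ == "__main__":
    peak_seqs = ["ttCACGTGaa", "GGGCACGTGC", "ATATATAT"]
    background = ["ATATATAT", "GCGCGCGC", "TTTTAAAA"]
    print(fraction_with_motif(peak_seqs, "CANNTG"))
    print(fraction_with_motif(background, "CANNTG"))
```

A motif that is clearly more frequent in peaks than in matched background is a candidate binding motif; real analyses add statistical tests and carefully matched background sets.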
Chromatin State – Another use of ChIP-seq is characterizing histone-chromatin interactions during the different stages of the cell cycle. Nucleic acid modifications can also alter how DNA is packaged around histones in chromatin.
Several recent publications point to the usefulness of integrating ChIP-seq analysis with functional genomics data to explore gene regulatory mechanisms. Selective histone modifications enable the identification of distinct chromatin states.
Integrating ChIP-seq with other genomic assays can help researchers investigate how gene expression is regulated at different stages of the cell cycle.
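Tools such as ChromHMM learn chromatin states statistically from combinations of histone marks. The sketch below is a far cruder, hand-written rule table built on commonly cited associations (for example, H3K4me3 at active promoters, H3K27me3 with Polycomb repression). It assumes each region's marks have already been called present or absent upstream, and the rule order is an illustrative choice: the first matching rule wins.

```python
def chromatin_state(marks):
    """Assign a coarse chromatin state from the set of histone marks in a region.

    marks: set of mark names present in the region (presence calls assumed
    to come from upstream peak calling). Rules are checked in order.
    """
    if "H3K4me3" in marks and "H3K27me3" in marks:
        return "bivalent promoter"
    if "H3K4me3" in marks:
        return "active promoter"
    if "H3K4me1" in marks and "H3K27ac" in marks:
        return "active enhancer"
    if "H3K4me1" in marks:
        return "poised/primed enhancer"
    if "H3K36me3" in marks:
        return "transcribed"
    if "H3K27me3" in marks:
        return "Polycomb-repressed"
    if "H3K9me3" in marks:
        return "heterochromatin"
    return "unmarked / low signal"


if __name__ == "__main__":
    regions = {
        "region_1": {"H3K4me3", "H3K27ac"},
        "region_2": {"H3K4me1", "H3K27ac"},
        "region_3": {"H3K27me3"},
        "region_4": set(),
    }
    for name, marks in regions.items():
        print(name, chromatin_state(marks))
```

Comparing the state of the same regions across cell-cycle stages is one simple way to connect these labels to expression data from other assays.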
Differential Binding Analysis – Differential binding analysis is a relatively new technique that borrows heavily from established methods for differential gene expression. It is a powerful way to detect changes in protein-DNA binding between samples or conditions.
Standard differential binding methods let researchers compare peak heights (read counts within peaks) across samples rather than simply recording whether a peak is present. It is a quantitative analysis built on normalizing read counts, similar to the normalization used in microarray expression studies.
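The core normalization idea can be sketched in a few lines: scale each sample's per-peak read counts by its library size (counts per million, CPM), then compare conditions with a log2 fold change. Established tools such as DiffBind rely on more robust normalization and statistical testing across replicates (for example, through DESeq2 or edgeR); this sketch has neither. The `pseudocount` value and the example counts are assumptions for illustration.

```python
import math


def cpm(counts, library_size):
    """Counts per million: scale raw per-peak counts by total library size."""
    return [c * 1e6 / library_size for c in counts]


def log2_fold_changes(counts_a, lib_a, counts_b, lib_b, pseudocount=1.0):
    """Per-peak log2(B / A) after CPM normalization.

    The pseudocount (an assumed value) avoids division by zero and damps
    fold changes for peaks with very few reads.
    """
    norm_a = cpm(counts_a, lib_a)
    norm_b = cpm(counts_b, lib_b)
    return [
        math.log2((b + pseudocount) / (a + pseudocount))
        for a, b in zip(norm_a, norm_b)
    ]


if __name__ == "__main__":
    # Reads in three shared peaks; condition B was sequenced twice as deeply.
    condition_a = [100, 50, 10]
    condition_b = [420, 25, 4]
    print(log2_fold_changes(condition_a, 10_000_000, condition_b, 20_000_000))
```

Without normalization, the first peak would look about four times stronger in condition B simply because B was sequenced more deeply; after CPM scaling, the log2 fold change is 1, a two-fold increase.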
Conclusion
ChIP-seq has largely replaced earlier methods for studying nucleic acid-protein interactions. Commercial and customizable NGS platforms that can analyze the high volume of data ChIP-seq generates have opened a new dimension for genomics studies.
It is a powerful technique with the potential to offer new biological insights into DNA-protein interactions. Customizing ChIP-seq analysis is now much easier with cloud-based NGS analysis platforms.
Fast ChIP-seq analysis can yield publication-ready reports in hours rather than days or weeks. Web-based platforms have made it possible to scale ChIP-seq experiments from one sample to thousands. Massively parallel processing is now a reality without writing code.