Genome Center


Ultra High Throughput Sequencing

Using a massively parallel sequencing approach, the Illumina Genome Analyzer (GA) can generate more than one billion bases of data in a single run. The system uses Solexa sequencing technology and novel reversible terminator chemistry, optimized to achieve unprecedented levels of cost effectiveness and throughput. More information describing this instrument and its applications is available at two Illumina web sites.

The Genome Analyzer at home in the Genome Center

The DNA Technologies and Expression Analysis Cores of the Genome Center operate two of these instruments, which provide new opportunities in analysis and experimentation for the UCD research community. Articles have appeared on the use of this technology for gene expression, SNP discovery, resequencing, and chromatin immunoprecipitation experiments. The Illumina sites are the best resource for the most up-to-date citations, but we're always happy to discuss with you what's new and what's available at our core facilities.

Steps involved in a GA experiment can be broken down into a series of experimental manipulations, instrument runs, and data analyses. These steps include creation of a sequencing library, seeding and preparation of the flow cell on the cluster station, sequencing by synthesis, and bioinformatics.

Screen shot during data collection

The Sequencing Library

A high quality sequencing library is critical for obtaining good data, and the preparation method used will depend on the kinds of samples and the experimental design specific to your project. Applications that use DNA library preparation include ChIP-seq, resequencing, and SNP analysis, whereas RNA library preparation would be used, for example, for transcriptional profiling and microRNA studies.

We offer certain DNA library preparation services. The input material for these libraries can be genomic DNA, chromatin immunoprecipitated material (ChIP samples), or double-stranded cDNA libraries. The protocols for library preparation are in constant development and refinement, so exact standard procedures (for example, how much DNA is required) are still being defined. As an approximation, the following guidelines should be observed. For chromosomal, BAC, or related genomic libraries, provide 5 ug of high quality DNA (concentration > 100 ng/ul, OD 260/280 close to 1.8). For ChIP libraries, provide 100-200 ng of PicoGreen-quantified material; the ChIP material must be assayed by PCR before delivery, and facility personnel will evaluate your results and decide on suitability for subsequent manipulations. Smaller amounts of input material can probably also generate good sequencing libraries, but with less assurance that library construction will succeed. Please inquire for more details.
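The approximate input guidelines above can be turned into a quick pre-submission check. The following is only a sketch: the `Sample` class, the `check_sample` function, and the 0.1 tolerance used to interpret "close to 1.8" are assumptions made for illustration, not facility rules, and facility personnel make the actual decision on suitability.

```python
from dataclasses import dataclass

# Thresholds taken from the approximate guidelines in the text.
GENOMIC_MIN_NG = 5000.0          # 5 ug of genomic DNA
GENOMIC_MIN_CONC_NG_UL = 100.0   # concentration must exceed 100 ng/ul
CHIP_MIN_NG = 100.0              # lower end of the 100-200 ng ChIP range
TARGET_RATIO = 1.8               # OD 260/280 "close to 1.8"
RATIO_TOLERANCE = 0.1            # ASSUMPTION: "close to" read as +/- 0.1


@dataclass
class Sample:
    kind: str                    # "genomic" or "chip"
    amount_ng: float
    conc_ng_per_ul: float = 0.0
    od_260_280: float = 0.0
    pcr_validated: bool = False  # ChIP material assayed by PCR before delivery


def check_sample(sample: Sample) -> list:
    """Return a list of guideline problems; an empty list means none found."""
    problems = []
    if sample.kind == "genomic":
        if sample.amount_ng < GENOMIC_MIN_NG:
            problems.append("less than 5 ug of DNA")
        if sample.conc_ng_per_ul <= GENOMIC_MIN_CONC_NG_UL:
            problems.append("concentration not above 100 ng/ul")
        if abs(sample.od_260_280 - TARGET_RATIO) > RATIO_TOLERANCE:
            problems.append("OD 260/280 not close to 1.8")
    elif sample.kind == "chip":
        if sample.amount_ng < CHIP_MIN_NG:
            problems.append("less than 100 ng of ChIP material")
        if not sample.pcr_validated:
            problems.append("ChIP material not yet assayed by PCR")
    else:
        raise ValueError(f"unknown sample kind: {sample.kind!r}")
    return problems


if __name__ == "__main__":
    genomic = Sample("genomic", amount_ng=6000, conc_ng_per_ul=150, od_260_280=1.82)
    chip = Sample("chip", amount_ng=50)
    print(check_sample(genomic))
    print(check_sample(chip))
```

A sample that falls short here is not necessarily unusable, since smaller inputs can sometimes work; the check only flags where a conversation with the facility is worthwhile.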

Users interested in making their own libraries to expedite their studies should download and read the current protocols for the different kits offered by Illumina (see below). Ordering information for kits can be provided on request. DNA library production uses relatively straightforward molecular biology techniques. Making libraries from RNA, whether normalized to characterize the gene space or look for allele-specific expression, un-normalized for expression tag counting, or optimized for small RNA, is trickier, and users should be confident in their molecular biology skills before creating and delivering such libraries to the Genome Center for sequencing. In addition to the kits, home-made library preparation methods that combine certain required Illumina reagents with more generally available materials are being developed by numerous groups. We will post these protocols and workflows as we get permission; keep checking the "Homemade Protocols" section below to see the latest.

Full fees for the cluster generation and sequencing processes will be assessed on a failed run if the problem is library specific (control libraries will routinely be run in parallel). Several methods to validate the library prior to sequencing are suggested by Illumina, as outlined in the manuals. We, and others in the GA sequencing community, are also developing other metrics that predict subsequent data quality, but this has not been a straightforward undertaking. As the field develops we will of course keep everyone informed.

Illumina Kit Protocols

Genomic DNA library. The kit described in this protocol is used for creation of sequencing libraries from double-stranded DNA template. This template could be genomic DNA, chromatin immunoprecipitation samples (additional special manipulations may be required for this starting material; please inquire), BACs, or even double-stranded cDNA collections.

RNA library-NlaIII digested.  Used for creation of libraries from total RNA in gene expression (transcriptional profiling) experiments. 

RNA library-DpnII digested. Used for creation of libraries in gene expression (transcriptional profiling) experiments, as above; it differs from the previous protocol primarily in one of the digesting enzymes used. The two kits are reportedly very similar and simply reflect historical use of both enzymes. Most users now go with this version.

Small RNA library.  Used to build libraries to analyze low molecular weight RNA expression.

Homemade Protocols


DNA library: Protocol describing genomic DNA library construction using commercially available enzymes coupled with the Illumina oligonucleotides from the library kit.  Protocol provided by Dr. Ghia Euskirchen.

Normalized RNA library: Not a detailed protocol, but an overview of a successful workflow for generating a normalized library.

mRNA Expression Analysis Library:  Not a detailed protocol, but general workflow steps used by some investigators.  NOT YET AVAILABLE

Other Resources

ChIP-Seq Data Technical Note and ChIP-Seq DataSheet. Two informational documents from Illumina describing the uses and applications of sequencing chromatin immunoprecipitation products.  Good overviews with useful references.

Cluster Generation and Sequencing by Synthesis

Flow cell preparation on the cluster station is carried out by Core facility personnel. The amount of library DNA used at this step is the most important determinant for generating the maximal number of reads (initially expressed as the number of clusters). Typically there is not a “one size fits all” concentration value for perfect cluster generation, although we have a reasonable idea based on past experience. If you provide us with library DNA, the concentration should be over 10 ng/ul, as measured by a NanoDrop spectrophotometer. We have achieved good clusters using more dilute libraries, but we have also seen such libraries fail altogether. Ideally the library should appear on an agarose gel as a predominant band of a particular size, roughly 200-300 bp. Other bands may be visible but they should be relatively faint. Quality metrics for pre-determining the suitability of a library are under development.

We recommend trying out just one lane with a new library. This will establish the amount required for optimal cluster generation in subsequent runs, and should give you a good feel for the quality of the library content. Once a good library is created, it provides enough material for many, many lanes, so it's not a problem for us to archive that sample and keep running it until enough data is produced to satisfy you.

Because of the length of a run, instrument capacity, analytical constraints, and data transfer issues, turnaround time for any data set is difficult to predict exactly. Two things are certain: (a) the sooner you drop off a library, the sooner you will get data back, and (b) we will stay in contact and let you know the status of your project.
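The drop-off guideline above is stated as a mass concentration (over 10 ng/ul) together with a fragment size (roughly 200-300 bp). When comparing libraries of different fragment sizes it can help to convert these two numbers into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the function names are hypothetical, and the facility's own loading calculations may differ.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate mass of one double-stranded base pair


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA concentration in ng/ul to nanomolar.

    ng/ul equals 1e-3 g/L; dividing by the fragment's molar mass
    (660 g/mol x length) gives mol/L, and multiplying by 1e9 gives nM.
    """
    if conc_ng_per_ul <= 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration and fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def meets_dropoff_guideline(conc_ng_per_ul: float, main_band_bp: float) -> bool:
    """Check the text's guideline: over 10 ng/ul, main band roughly 200-300 bp."""
    return conc_ng_per_ul > 10.0 and 200 <= main_band_bp <= 300


if __name__ == "__main__":
    print(round(library_molarity_nm(10.0, 250), 1))  # about 60.6 nM
    print(meets_dropoff_guideline(12.0, 250))
```

At the same ng/ul, a library with shorter fragments contains more molecules, which is one reason a single concentration value does not suit every library.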

Each type of library has a particular sequencing primer that must be used with it. In the core labs we maintain stocks of the genomic DNA library sequencing primer, so those don't need to be provided with your samples. However, for other types of libraries, e.g., the small RNA or DpnII libraries, the specific sequencing primers provided with the kits will need to be dropped off along with your samples. Additional small charges may apply if multiple primer types need to be used in a single flow cell; we can talk about this before running your samples.


Deliverables, Prices, and Bioinformatics

Each lane on a flow cell generates approximately 2.7 GB of raw image data for each synthesis cycle. These images are processed into base calls and alignment files through the so-called Solexa Pipeline, eventually amounting to about 25-30 GB of data per 36-cycle sequencing run. Users can choose to have access to any portion or all of this data, following discussions with facility personnel about the specific requirements of the project. While storage policies are still under development, users should be prepared to collect and store their own sequence data and, if they wish, raw TIFF image data, within two to three months after the sample is run (although storage services are available through the Bioinformatics Core for a fee). You should expect 2-5 million good sequence reads per lane, once cluster generation for your library is optimized.
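For planning storage and coverage, the figures above can be combined into rough per-lane estimates. This is a back-of-the-envelope sketch that assumes one base per read per cycle (so read length equals the number of cycles) and takes the 2.7 GB and 2-5 million read figures at face value; the function names are illustrative only.

```python
RAW_IMAGE_GB_PER_LANE_PER_CYCLE = 2.7      # figure quoted in the text
GOOD_READS_PER_LANE = (2_000_000, 5_000_000)  # expected range quoted in the text


def raw_image_gb(cycles: int, lanes: int = 1) -> float:
    """Estimate raw image volume in GB for a number of cycles and lanes."""
    return RAW_IMAGE_GB_PER_LANE_PER_CYCLE * cycles * lanes


def bases_per_lane(cycles: int) -> tuple:
    """Estimate (low, high) bases per lane, assuming read length == cycles."""
    low_reads, high_reads = GOOD_READS_PER_LANE
    return low_reads * cycles, high_reads * cycles


if __name__ == "__main__":
    cycles = 36
    print(f"raw images: ~{raw_image_gb(cycles):.1f} GB per lane")
    low, high = bases_per_lane(cycles)
    print(f"yield: {low / 1e6:.0f}-{high / 1e6:.0f} Mb per lane")
```

For a 36-cycle run this works out to roughly 97 GB of raw images and 72-180 Mb of sequence per lane, which is why it is worth deciding early whether you need to keep the image data at all.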

Clearly the question of what defines a "good" sequence is critical. Library preparation and running costs on these sequencers are high (although the per-base costs are actually pretty cheap!). A large amount of data is generated, and often significant effort is required to make sense of the output. Every sequencing run includes a control lane of PhiX174 DNA. Based on the behavior of this commercially available library DNA, we determine whether the run met our quality specifications. This is evidenced by certain metrics that we can discuss with you. The summary report of the run will also be made available so you can see these metrics for yourself. If the run does work, you will be charged full price regardless of how your data comes out. Depending on your experimental system, alignment quality metrics may or may not be available. We can discuss how to implement and interpret alignment and other quality measures.

Prices for sequencing vary depending on the number of cycles requested (i.e., how long a sequence you want). These prices are still under development, but currently range from about 800 dollars per lane for 18 cycles to 1,200 dollars per lane for 60 cycles, and are subject to change without notice. The price includes all the labor and reagents for cluster generation and cycle sequencing, initial base calling analysis using the standard Solexa pipeline, and some transfer of data files to a server of your choice if you have one available. As stated above, failed lanes on an otherwise good run (as evidenced by the behavior of the control lane) will be charged full price. Failed lanes due to instrument issues are charged a small fee for labor. Even from such runs, some information such as library integrity is obtained; also, users will maintain their place in line on the instrument if re-runs are necessary.
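To put the per-lane prices in perspective, they can be converted to an approximate cost per megabase using the expected 2-5 million good reads per lane. The sketch below covers only the two price points quoted above and makes no attempt to guess prices for other cycle counts; the function name and the assumption that read length equals the number of cycles are illustrative.

```python
# Approximate prices quoted in the text (subject to change without notice).
PRICE_PER_LANE_USD = {18: 800.0, 60: 1200.0}


def cost_per_megabase(cycles: int, good_reads: int) -> float:
    """Approximate cost in dollars per megabase for one lane.

    Assumes each good read contributes `cycles` bases.
    """
    if cycles not in PRICE_PER_LANE_USD:
        raise ValueError(f"no quoted price for {cycles} cycles")
    if good_reads <= 0:
        raise ValueError("good_reads must be positive")
    megabases = good_reads * cycles / 1e6
    return PRICE_PER_LANE_USD[cycles] / megabases


if __name__ == "__main__":
    for cycles in (18, 60):
        worst = cost_per_megabase(cycles, 2_000_000)
        best = cost_per_megabase(cycles, 5_000_000)
        print(f"{cycles} cycles: ${best:.2f}-${worst:.2f} per Mb")
```

Under these assumptions a 60-cycle lane comes to roughly 4-10 dollars per megabase and an 18-cycle lane to roughly 9-22 dollars per megabase, so longer reads are cheaper per base even though the lane costs more.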

While initial base calling is currently part of the service fee, other types of bioinformatic analyses will be carried out by the Bioinformatics Core.  They are developing a menu of interesting services relating to the access, manipulation and analysis of sequence data from these instruments.  As these services become defined we will place links from this site, but in the meantime please contact them directly.

The ability to utilize this instrument as an extension of individual research programs will lead to unprecedented amounts of data and enable novel areas of investigation. However, because of the cost and complexity of the process, as well as instrument availability issues, careful thought should be given to experimental design. We would be happy to discuss this with you.

Please watch this site for continued updates!