Brief Description
Since 1998 the Centre National de Génotypage (CNG) has been involved in hundreds of international genetics projects dealing with diverse environmental, biomarker and genetic data, directed for the most part at the study of genes and human diseases. These projects have led to the development of the Operon system, a bioinformatics platform that centralizes scientific software and biomedical data together with internal results. This website provides some of Operon’s functionality in order to give access to genotyping data produced at the CNG. The system is under development, and a more formal release will be announced later. For more information or comments, please send email to operon@cng.fr.
Scope
This website is an interface to the Operon databases and software tools needed to search and export genotype data with their associated annotations. Through sequence alignment and algorithmic database analyses, the CNG genotype data are integrated with fundamental public genomic and genetic data sets. Most notably these resources are: NCBI Human Genome Assembly build 36, Reference Sequences (RefSeq), Entrez Gene, dbSNP, UniSTS, Genethon, deCODE, Marshfield, Illumina SNP panels, Affymetrix SNP panels, and the HapMap Project databases. The NCBI Human Genome Assembly build 36 is used to normalize all genomic data to the same physical map reference. The RefSeq data sets provide non-redundant, annotated genomic DNA, transcript (RNA), and protein sequence records. These sequences are cross-referenced with the Entrez Gene and UniSTS genomic loci. Some of these loci correlate with the Marshfield, deCODE, and Genethon genetic maps. All these data sets are interconnected and controlled by the Operon system so that researchers can investigate the approximately 1 × 10^7 polymorphisms and 4 × 10^9 genotypes currently available through this website.
Data Access
There are five interfaces from which to start a search. First, a text search field at the top right corner of every page can be used to query gene or SNP loci. This search requires either a gene symbol or an accession identifier. Some examples are: ORMDL3 (gene symbol), HLA (locus name), NM_020376 (mRNA reference sequence accession), rs17115373 (dbSNP ID), D17S932 (microsatellite accession), P35408 (protein GenBank accession), and ADH (gene family name).
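For readers who prepare lists of search terms in a script, the following minimal Python sketch shows one way to guess the kind of identifier before submitting it. It is not the site's own matching logic, which is not documented here: the patterns are assumptions modelled only on the shape of the example identifiers above, and the function name classify_query is hypothetical.

```python
import re

# Illustrative patterns only (assumptions), shaped after the examples in the text.
# They do not describe how the Operon search field itself interprets a query.
PATTERNS = [
    ("dbSNP ID", re.compile(r"^rs\d+$")),                       # e.g. rs17115373
    ("mRNA reference sequence accession", re.compile(r"^NM_\d+(\.\d+)?$")),  # e.g. NM_020376
    ("microsatellite accession", re.compile(r"^D(\d{1,2}|X|Y)S\d+$")),       # e.g. D17S932
    ("protein accession", re.compile(r"^[OPQ]\d[A-Z0-9]{3}\d$")),            # e.g. P35408
]


def classify_query(term):
    """Return a best-guess label for a search term (hypothetical helper)."""
    term = term.strip()
    for label, pattern in PATTERNS:
        if pattern.match(term):
            return label
    # Anything else is treated as a gene symbol, locus name, or family name.
    return "gene symbol or name"


if __name__ == "__main__":
    for example in ["ORMDL3", "HLA", "NM_020376", "rs17115373", "D17S932", "P35408", "ADH"]:
        print(example, "->", classify_query(example))
```

Terms that match none of the patterns fall through to the generic gene symbol label, which mirrors the fact that the search field accepts either a symbol or an accession.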
A second, more direct starting point is to click on a chromosome number at the top of this page. The result is a sequence map of all contigs used to build the consensus chromosome sequence. This map shows sequencing gaps, the number of “N” bases in the sequence, and the location, orientation and sequence length of each contig. The contig accession links to another table that lists all genes and pseudogenes mapped on the contig. To better visualize where this hyperlink trail may lead, please see the Search Path page.
The Search Path page illustrates how to reach a desired page from another page in the diagram. The pages in the graph are represented as boxes. Green boxes are points of entry where a search can be initiated, and blue ones are result pages.
A fourth query interface is the SNP Map Search Form. This form requires the coordinates of the chromosome region in which to look for SNP markers. The result is a table that includes chromosome location, contig position, allele frequencies, and gene annotations. Please note that the “text” output option produces a tab-delimited table that includes genotype counts not shown in the HTML output, and that the browser will hand it to the default spreadsheet application.
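The tab-delimited text output can also be read by a script instead of a spreadsheet. The sketch below is a minimal Python example under stated assumptions: the column names (marker, chromosome, position, count_AA, count_AB, count_BB) and the sample row are invented for illustration and are not the actual headers or data of the export. It shows how genotype counts relate to an allele frequency, using the standard relation freq(A) = (2 × AA + AB) / (2 × N).

```python
import csv
import io


def read_snp_table(text):
    """Parse tab-delimited text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def allele_a_frequency(row):
    """Frequency of allele A from genotype counts; None if no genotypes were called.

    The count_* column names are hypothetical, not the real export headers.
    """
    aa = int(row["count_AA"])
    ab = int(row["count_AB"])
    bb = int(row["count_BB"])
    total = aa + ab + bb
    if total == 0:
        return None
    # Each AA individual carries two A alleles, each AB individual carries one.
    return (2 * aa + ab) / (2 * total)


# Invented sample content, for illustration only.
SAMPLE = (
    "marker\tchromosome\tposition\tcount_AA\tcount_AB\tcount_BB\n"
    "snp_example_1\t17\t1000\t10\t20\t20\n"
)

if __name__ == "__main__":
    for row in read_snp_table(SAMPLE):
        print(row["marker"], allele_a_frequency(row))
```

With a real export, the header names in allele_a_frequency would need to be replaced by those actually present in the file.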
The last interface is the Genotype Search Form, which gives access to genotype data. Access to genotype data is restricted to registered users, so you will be asked to log in. Like the SNP Map Search Form, it requires the chromosome region in which to look for SNP markers. In addition, at least one DNA panel must be selected from the pull-down menu. The output can also be exported as a tab-delimited text table. There are two formats: DNA sample and genotype by row, or by column.
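The difference between the two export formats can be pictured as a long table versus a wide table. The Python sketch below rests on an assumed reading of the two formats: "by row" is taken to mean one line per marker and DNA sample, and "by column" one line per marker with one column per DNA sample. The exact layout of the real export may differ, and the function name rows_to_columns and the sample identifiers are hypothetical.

```python
def rows_to_columns(records):
    """Convert (marker, sample, genotype) records into a wide table.

    Returns a list of rows: a header ["marker", sample1, sample2, ...]
    followed by one row per marker. Missing genotypes become "".
    """
    samples = []   # sample names in order of first appearance
    table = {}     # marker -> {sample: genotype}
    for marker, sample, genotype in records:
        if sample not in samples:
            samples.append(sample)
        table.setdefault(marker, {})[sample] = genotype

    header = ["marker"] + samples
    body = [
        [marker] + [calls.get(sample, "") for sample in samples]
        for marker, calls in table.items()
    ]
    return [header] + body


if __name__ == "__main__":
    # Invented records, for illustration only.
    long_format = [
        ("marker_1", "sample_1", "AA"),
        ("marker_1", "sample_2", "AG"),
        ("marker_2", "sample_1", "GG"),
    ]
    for line in rows_to_columns(long_format):
        print("\t".join(line))
```

The row-based form is usually easier to filter and append to, while the column-based form is closer to what spreadsheet users expect.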