Authors Ballut L, Violot S, Shivakumaraswamy S, Thota LP, Sathya M, Kunala J, Dijkstra BW, Terreux R, Haser R, Balaram H, Aghajari N
Journal Nat Commun
Times cited 0
Authors Payen L, Honorat M, Bouard C, Jacob G, Terreux R, Delalu H, Labarthe E, Guitton J
Journal Anal Bioanal Chem
Times cited 0
Authors Bouard C, Terreux R, Hope J, Chemelle JA, Puisieux A, Ansieau S, Payen L
Journal J Biomol Struct Dyn
Abstract The basic helix-loop-helix (bHLH) transcription factor TWIST1 is essential to embryonic development, and hijacking of its function contributes to the development of numerous cancer types. It forms either a homodimer or a heterodimeric complex with an E2A or HAND partner. These functionally distinct complexes sometimes display antagonistic functions during development, so that alterations in the balance between them lead to pronounced morphological alterations, as observed in mice and in Saethre-Chotzen syndrome patients. Here, we describe the structures of TWIST1 bHLH-DNA complexes produced in silico through molecular dynamics simulations. We highlight the determinant role of the interhelical loops in maintaining the TWIST1-DNA complex structures and provide a structural explanation for the loss of function associated with several TWIST1 mutations/insertions observed in Saethre-Chotzen syndrome patients. An animated interactive 3D complement (I3DC) is available in Proteopedia at http://proteopedia.org/w/Journal:JBSD:27
Times cited 1
Authors Berger C, Romero-Brey I, Radujkovic D, Terreux R, Zayas M, Paul D, Harak C, Hoppe S, Gao M, Penin F, Lohmann V, Bartenschlager R
Times cited 25
Authors Friedrich A, Garnier N, Gagniere N, Nguyen H, Albou LP, Biancalana V, Bettler E, Deleage G, Lecompte O, Muller J, Moras D, Mandel JL, Toursel T, Moulinier L, Poch O
Journal Hum Mutat
Abstract Understanding how genetic alterations affect gene products at the molecular level represents a first step in the elucidation of the complex relationships between genotypic and phenotypic variations, and is thus a major challenge in the postgenomic era. Here, we present SM2PH-db (http://decrypthon.igbmc.fr/sm2ph), a new database designed to investigate structural and functional impacts of missense mutations and their phenotypic effects in the context of human genetic diseases. A wealth of up-to-date interconnected information is provided for each of the 2,249 disease-related entry proteins (August 2009), including data retrieved from biological databases and data generated from a Sequence-Structure-Evolution Inference in Systems-based approach, such as multiple alignments, three-dimensional structural models, and multidimensional (physicochemical, functional, structural, and evolutionary) characterizations of mutations. SM2PH-db provides a robust infrastructure associated with interactive analysis tools supporting in-depth study and interpretation of the molecular consequences of mutations, with the more long-term goal of elucidating the chain of events leading from a molecular defect to its pathology. The entire content of SM2PH-db is regularly and automatically updated thanks to the computational grid data federation facilities provided in the context of the Decrypthon program.
Times cited 7
Authors Hansen SF, Bettler E, Rinnan A, Engelsen SB, Breton C
Journal Mol Biosyst
Abstract Glycosyltransferases are one of the largest and most diverse enzyme groups in Nature. They catalyse the synthesis of glycosidic linkages by the transfer of a sugar residue from a donor to an acceptor substrate. These enzymes have been classified into families on the basis of amino acid sequence similarity that are kept updated in the Carbohydrate Active enZyme database (CAZy, ). The repertoire of glycosyltransferases in genomes is believed to determine the diversity of cellular glycan structures, and current estimates suggest that for most genomes about 1% of the coding regions are glycosyltransferases. However, plants tend to have far more glycosyltransferase genes than any other organism sequenced to date, and this can be explained by the highly complex polysaccharide network that forms the cell wall and also by the numerous glycosylated secondary metabolites. In recent years, various bioinformatics strategies have been used to search bacterial and plant genomes for new glycosyltransferase genes. These are based on the use of remote homology detection methods that act at the 1D, 2D, and 3D level. The combined use of methods such as profile Hidden Markov Model (HMM) and fold recognition appears to be appropriate for this class of enzyme. Chemometric tools are also particularly well suited for obtaining an overview of multivariate data and revealing hidden latent information when dealing with large and highly complex datasets.
Times cited 18
Authors Pellet J, Tafforeau L, Lucas-Hourani M, Navratil V, Meyniel L, Achaz G, Guironnet-Paquet A, Aublin-Gex A, Caignard G, Cassonnet P, Chaboud A, Chantier T, Deloire A, Demeret C, Le Breton M, Neveu G, Jacotot L, Vaglio P, Delmotte S, Gautier C, Combet C, Deleage G, Favre M, Tangy F, Jacob Y, Andre P, Lotteau V, Rabourdin-Combe C, Vidalain PO
Journal Nucleic Acids Res
Abstract Large collections of protein-encoding open reading frames (ORFs) established in a versatile recombination-based cloning system have been instrumental to study protein functions in high-throughput assays. Such 'ORFeome' resources have been developed for several organisms but in virology, plasmid collections covering a significant fraction of the virosphere are still needed. In this perspective, we present ViralORFeome 1.0 (http://www.viralorfeome.com), an open-access database and management system that provides an integrated set of bioinformatic tools to clone viral ORFs in the Gateway(R) system. ViralORFeome provides a convenient interface to navigate through virus genome sequences, to design ORF-specific cloning primers, to validate the sequence of generated constructs and to browse established collections of virus ORFs. Most importantly, ViralORFeome has been designed to manage all possible variants or mutants of a given ORF so that the cloning procedure can be applied to any emerging virus strain. A subset of plasmid constructs generated with ViralORFeome platform has been tested with success for heterologous protein expression in different expression systems at proteome scale. ViralORFeome should provide our community with a framework to establish a large collection of virus ORF clones, an instrumental resource to determine functions, activities and binding partners of viral proteins.
Times cited 29
Authors Joosten RP, Salzemann J, Bloch V, Stockinger H, Berglund AC, Blanchet C, Bongcam-Rudloff E, Combet C, Da Costa AL, Deleage G, Diarena M, Fabbretti R, Fettahi G, Flegel V, Gisel A, Kasam V, Kervinen T, Korpelainen E, Mattila K, Pagni M, Reichstadt M, Breton V, Tickle IJ, Vriend G
Journal J Appl Crystallogr
Abstract Structural biology, homology modelling and rational drug design require accurate three-dimensional macromolecular coordinates. However, the coordinates in the Protein Data Bank (PDB) have not all been obtained using the latest experimental and computational methods. In this study a method is presented for automated re-refinement of existing structure models in the PDB. A large-scale benchmark with 16 807 PDB entries showed that they can be improved in terms of fit to the deposited experimental X-ray data as well as in terms of geometric quality. The re-refinement protocol uses TLS models to describe concerted atom movement. The resulting structure models are made available through the PDB_REDO databank (http://www.cmbi.ru.nl/pdb_redo/). Grid computing techniques were used to overcome the computational requirements of this endeavour.
Times cited 57
Authors Heymann M, Paramelle D, Subra G, Forest E, Martinez J, Geourjon C, Deleage G
Abstract Chemical cross-linking followed by mass spectrometry has proven to provide valuable information about protein structure and about interactions between protein subunits. It is an effective and efficient way to experimentally investigate some aspects of a protein structure when NMR and X-ray crystallography data are lacking.
We introduce MSX-3D, a tool specifically geared to validate protein models using mass spectrometry. In addition to classical peptides identifications, it allows an interactive 3D visualization of the distance constraints derived from a cross-linking experiment.
Freely available at http://proteomics-pbil.ibcp.fr
Times cited 12
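The distance-constraint idea in the MSX-3D abstract above can be illustrated with a minimal Python sketch. It is not the MSX-3D implementation: the function name `check_crosslinks`, the toy C-alpha coordinates, and the 24 angstrom cutoff are all assumptions made for illustration. The sketch only checks whether each cross-linked residue pair lies within a chosen maximum distance in a candidate model.

```python
import math

def check_crosslinks(ca_coords, crosslinks, max_distance):
    """Return (pair, distance, satisfied) for each cross-linked residue pair.

    ca_coords    -- dict mapping residue number to an (x, y, z) tuple
    crosslinks   -- list of (residue_a, residue_b) pairs from the experiment
    max_distance -- largest distance the cross-linker is assumed to span
    """
    report = []
    for res_a, res_b in crosslinks:
        distance = math.dist(ca_coords[res_a], ca_coords[res_b])
        report.append(((res_a, res_b), distance, distance <= max_distance))
    return report

# Hypothetical coordinates (angstroms) and cross-links, invented for the example.
ca_coords = {10: (0.0, 0.0, 0.0), 25: (3.0, 4.0, 0.0), 40: (30.0, 0.0, 0.0)}
crosslinks = [(10, 25), (10, 40)]

for pair, distance, ok in check_crosslinks(ca_coords, crosslinks, max_distance=24.0):
    print(pair, round(distance, 1), "satisfied" if ok else "violated")
```

A model in which many constraints are violated would be a candidate for rejection or refinement; the appropriate cutoff depends on the cross-linker actually used.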
Authors Friedrich A, Ripp R, Garnier N, Bettler E, Deleage G, Poch O, Moulinier L
Journal BMC Bioinformatics
Abstract The post-genomic era is characterised by a torrent of biological information flooding the public databases. As a direct consequence, similarity searches starting with a single query sequence frequently lead to the identification of hundreds, or even thousands of potential homologues. The huge volume of data renders the subsequent structural, functional and evolutionary analyses very difficult. It is therefore essential to develop new strategies for efficient sampling of this large sequence space, in order to reduce the number of sequences to be processed. At the same time, it is important to retain the most pertinent sequences for structural and functional studies.
An exhaustive analysis on a large scale test set (284 protein families) was performed to compare the efficiency of four different sampling methods aimed at selecting the most pertinent sequences. These four methods sample the proteins detected by BlastP searches and can be divided into two categories: two customisable methods where the user defines either the maximal number or the percentage of sequences to be selected; two automatic methods in which the number of sequences selected is determined by the program. We focused our analysis on the potential information content of the sampled sets of sequences using multiple alignment of complete sequences as the main validation tool. The study considered two criteria: the total number of sequences in BlastP and their associated E-values. The subsequent analyses investigated the influence of the sampling methods on the E-value distributions, the sequence coverage, the final multiple alignment quality and the active site characterisation at various residue conservation thresholds as a function of these criteria.
The comparative analysis of the four sampling methods allows us to propose a suitable sampling strategy that significantly reduces the number of homologous sequences required for alignment, while at the same time maintaining the relevant information concerning the active site residues.
Times cited 5
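The two "customisable" sampling modes mentioned in the abstract above (a maximal number of sequences, or a percentage of sequences) can be sketched as follows. This is only one simple interpretation, assuming that hits are ranked by E-value and the best ones are kept; the abstract does not describe the actual selection rules, and the function names and toy hit list are hypothetical.

```python
import math

def sample_by_count(hits, max_n):
    """Keep at most max_n hits, preferring the lowest E-values.

    hits -- list of (sequence_id, e_value) tuples, e.g. parsed from BlastP output
    """
    return sorted(hits, key=lambda hit: hit[1])[:max_n]

def sample_by_percentage(hits, percent):
    """Keep the given percentage of hits (at least one), lowest E-values first."""
    n = max(1, math.ceil(len(hits) * percent / 100))
    return sample_by_count(hits, n)

# Invented BlastP-like hits for illustration.
hits = [("seqA", 1e-50), ("seqB", 1e-3), ("seqC", 1e-20), ("seqD", 0.5)]
print(sample_by_count(hits, 2))
print(sample_by_percentage(hits, 25))
```

Ranking purely by E-value tends to favour close homologues; a sampling strategy aimed at preserving diversity would need an additional criterion, which is outside this sketch.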
Authors Garnier N, Friedrich A, Bolze R, Bettler E, Moulinier L, Geourjon C, Thompson JD, Deleage G, Poch O
Abstract MAGOS is a web server allowing automated protein modelling coupled to the creation of a hierarchical and annotated multiple alignment of complete sequences. MAGOS is designed for an interactive approach of structural information within the framework of the evolutionary relevance of mined and predicted sequence information.
Times cited 5
Authors Sapay N, Guermeur Y, Deleage G
Journal BMC Bioinformatics
Abstract Membrane proteins are estimated to represent about 25% of open reading frames in fully sequenced genomes. However, the experimental study of proteins remains difficult. Considerable efforts have thus been made to develop prediction methods. Most of these were conceived to detect transmembrane helices in polytopic proteins. Alternatively, a membrane protein can be monotopic and anchored via an amphipathic helix inserted in a parallel way to the membrane interface, so-called in-plane membrane (IPM) anchors. This type of membrane anchor is still poorly understood and no suitable prediction method is currently available.
We report here the "AmphipaSeeK" method developed to predict IPM anchors. It uses a set of 21 reported examples of IPM anchored proteins. The method is based on a pattern recognition Support Vector Machine with a dedicated kernel.
AmphipaSeeK was shown to be highly specific, in contrast with classically used methods (e.g. hydrophobic moment). Additionally, it has been able to retrieve IPM anchors in naively tested sets of transmembrane proteins (e.g. PagP). AmphipaSeeK and the list of the 21 IPM anchored proteins are available on NPS@, our protein sequence analysis server.
Times cited 54
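For context, the hydrophobic moment that the AmphipaSeeK abstract cites as a classically used method can be computed with a short Python sketch. This illustrates the classical measure only, not AmphipaSeeK itself, which the abstract describes as an SVM-based method. The choice of the Kyte-Doolittle scale, the 100 degree angle per residue (the usual value for an alpha-helix), and the unnormalised sum are assumptions of this example.

```python
import math

# Kyte-Doolittle hydropathy values; the scale is an illustrative choice.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(sequence, angle_deg=100.0):
    """Unnormalised hydrophobic moment of a peptide segment.

    Each residue contributes a vector of length H_i pointing at angle
    i * angle_deg around the helix axis; the moment is the length of the sum.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta)
                  for i, aa in enumerate(sequence))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta)
                  for i, aa in enumerate(sequence))
    return math.hypot(sin_sum, cos_sum)

# A uniform segment has almost no moment; an alternating polar/apolar one does.
print(round(hydrophobic_moment("L" * 18), 6))
print(round(hydrophobic_moment("LKKLLKKLLKKLLKKLLK"), 2))
```

In practice the moment is usually computed over a sliding window along the sequence, and high values flag candidate amphipathic helices.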
Authors Jambon M, Andrieu O, Combet C, Deleage G, Delfaud F, Geourjon C
Abstract We provide the scientific community with a web server which gives access to SuMo, a bioinformatic system for finding similarities in arbitrary 3D structures or substructures of proteins. SuMo is based on a unique representation of macromolecules using selected triplets of chemical groups having their own geometry and symmetry, regardless of the restrictive notions of main chain and lateral chains of amino acids. The heuristic for extracting similar sites was used to drive two major large-scale approaches. First, searching for ligand binding sites onto a query structure has been made possible by comparing the structure against each of the ligand binding sites found in the Protein Data Bank (PDB). Second, the reciprocal process, i.e. searching for a given 3D site of interest among the structures of the PDB is also possible and helps detect cross-reacting targets in drug design projects.
Times cited 54
Authors Simmonds P, Bukh J, Combet C, Deleage G, Enomoto N, Feinstone S, Halfon P, Inchauspe G, Kuiken C, Maertens G, Mizokami M, Murphy DG, Okamoto H, Pawlotsky JM, Penin F, Sablon E, Shin-I T, Stuyver LJ, Thiel HJ, Viazov S, Weiner AJ, Widell A
Abstract International standardization and coordination of the nomenclature of variants of hepatitis C virus (HCV) is increasingly needed as more is discovered about the scale of HCV-related liver disease and important biological and antigenic differences that exist between variants. A group of scientists expert in the field of HCV genetic variability, and those involved in development of HCV sequence databases, the Hepatitis Virus Database (Japan), euHCVdb (France), and Los Alamos (United States), met to re-examine the status of HCV genotype nomenclature, resolve conflicting genotype or subtype names among described variants of HCV, and draw up revised criteria for the assignment of new genotypes as they are discovered in the future. A comprehensive listing of all currently classified variants of HCV incorporates a number of agreed genotype and subtype name re-assignments to create consistency in nomenclature. The paper also contains consensus proposals for the classification of new variants into genotypes and subtypes, which recognizes and incorporates new knowledge of HCV genetic diversity and epidemiology. A proposal was made that HCV variants be classified into 6 genotypes (representing the 6 genetic groups defined by phylogenetic analysis). Subtype name assignment will be either confirmed or provisional, depending on the availability of complete or partial nucleotide sequence data, or remain unassigned where fewer than 3 examples of a new subtype have been described. In conclusion, these proposals provide the framework by which the HCV databases store and provide access to data on HCV, which will internationally coordinate the assignment of new genotypes and subtypes in the future.
Times cited 962
Authors Combet C, Penin F, Geourjon C, Deleage G
Journal Appl Bioinformatics
Abstract To date, more than 30 000 hepatitis C virus (HCV) sequences have been deposited in the generalist databases DNA Data Bank of Japan (DDBJ), EMBL Nucleotide Sequence Database (EMBL) and GenBank. The main difficulties with HCV sequences in these databases are their retrieval, annotation and analyses. To help HCV researchers face the increasing needs of HCV sequence analyses, we developed a specialised database of computer-annotated HCV sequences, called HCVDB. HCVDB is re-built every month from an up-to-date EMBL database by an automated process. HCVDB provides key data about the HCV sequences (e.g. genotype, genomic region, protein names and functions, known 3-dimensional structures) and ensures consistency of the annotations, which enables reliable keyword queries. The database is highly integrated with sequence and structure analysis tools and the SRS (LION bioscience) keywords query system. Thus, any user can extract subsets of sequences matching particular criteria or enter their own sequences and analyse them with various bioinformatics programs available on the same server.
HCVDB is available from http://hepatitis.ibcp.fr.
Times cited 28
Authors Horn F, Bettler E, Oliveira L, Campagne F, Cohen FE, Vriend G
Journal Nucleic Acids Res
Abstract The GPCRDB is a molecular class-specific information system that collects, combines, validates and disseminates heterogeneous data on G protein-coupled receptors (GPCRs). The database stores data on sequences, ligand binding constants and mutations. The system also provides computationally derived data such as sequence alignments, homology models, and a series of query and visualization tools. The GPCRDB is updated automatically once every 4-5 months and is freely accessible at http://www.gpcr.org/7tm/.
Times cited 236
Authors Van Durme JJ, Bettler E, Folkertsma S, Horn F, Vriend G
Journal Nucleic Acids Res
Abstract The NRMD is a database for nuclear receptor mutation information. It includes mutation information from SWISS-PROT/TrEMBL, several web-based mutation data resources, and data extracted from the literature in a fully automatic manner. Because it is also possible to add mutations manually, a hundred mutations were added for completeness. At present, the NRMD contains information about 893 mutations in 54 nuclear receptors. A common numbering scheme for all nuclear receptors eases the use of the information for many kinds of studies. The NRMD is freely available to academia and industry as a stand-alone version at: www.receptors.org/NR/.
Times cited 15
Authors Perriere G, Combet C, Penel S, Blanchet C, Thioulouse J, Geourjon C, Grassot J, Charavay C, Gouy M, Duret L, Deleage G
Journal Nucleic Acids Res
Abstract The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools of nucleic acid and protein sequence analyses. This server allows users to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An originality of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.
Times cited 28
Authors Jambon M, Imberty A, Deleage G, Geourjon C
Abstract An innovative bioinformatic method has been designed and implemented to detect similar three-dimensional (3D) sites in proteins. This approach allows the comparison of protein structures or substructures and detects local spatial similarities: this method is completely independent from the amino acid sequence and from the backbone structure. In contrast to already existing tools, the basis for this method is a representation of the protein structure by a set of stereochemical groups that are defined independently from the notion of amino acid. An efficient heuristic for finding similarities that uses graphs of triangles of chemical groups to represent the protein structures has been developed. The implementation of this heuristic constitutes a software named SuMo (Surfing the Molecules), which allows the dynamic definition of chemical groups, the selection of sites in the proteins, and the management and screening of databases. To show the relevance of this approach, we focused on two extreme examples illustrating convergent and divergent evolution. In two unrelated serine proteases, SuMo detects one common site, which corresponds to the catalytic triad. In the legume lectins family composed of >100 structures that share similar sequences and folds but may have lost their ability to bind a carbohydrate molecule, SuMo discriminates between functional and non-functional lectins with a selectivity of 96%. The time needed for searching a given site in a protein structure is typically 0.1 s on a PIII 800MHz/Linux computer; thus, in further studies, SuMo will be used to screen the PDB.
Times cited 117
Authors Errami M, Geourjon C, Deleage G
Abstract Multiple sequence alignments are essential tools for establishing the homology relations between proteins. Essential amino acids for the function and/or the structure are generally conserved, thus providing key arguments to help in protein characterization. However, for distant proteins, it is more difficult to establish, in a reliable way, the homology relations that may exist between them. In this article, we show that secondary structure prediction is a valuable way to validate protein families at low identity rate.
We show that the analysis of secondary structure compatibility is a reliable way to discard non-related proteins in low-identity multiple alignments.
This validation is possible through our NPS@ server (http://npsa-pbil.ibcp.fr)
Times cited 14
Authors Campagne F, Bettler E, Vriend G, Weinstein H
Abstract Residue-based diagrams of proteins are graphical representations that can be used in protein information systems. These diagrams make it possible to visually integrate different types of biological information. The approach has been used successfully for membrane proteins. We developed the Residue-based diagram generator to (i) make it possible to generate residue-based diagrams of proteins in a batch mode that is compatible with the needs of information system curators, (ii) allow the generation of residue-based diagrams for families of soluble proteins or domains.
Licensed. Royalty free licenses are granted to non-profit institutions for educational and research purposes. http://icb.mssm.edu/crt/RbDg/index.xml
Times cited 6
Authors Bettler E, Krause R, Horn F, Vriend G
Journal Nucleic Acids Res
Abstract We present a coherent series of servers that can perform a large number of structure analyses on nuclear hormone receptors. These servers are part of the NucleaRDB project, which provides a powerful information system for nuclear hormone receptors. The computations performed by the servers include homology modelling, structure validation, calculating contacts, accessibility values, hydrogen bonding patterns, predicting mutations and a host of two- and three-dimensional visualisations. The Nuclear Receptor Structure Analysis Servers (NRSAS) are freely accessible at http://www.cmbi.kun.nl/NR/servers/html/ and in-house copies can be obtained upon request.
Times cited 2
Authors Combet C, Jambon M, Deleage G, Geourjon C
Abstract Geno3D (http://geno3d-pbil.ibcp.fr) is an automatic web server for protein molecular modelling. Starting with a query protein sequence, the server performs the homology modelling in six successive steps: (i) identify homologous proteins with known 3D structures by using PSI-BLAST; (ii) provide the user all potential templates through a very convenient user interface for target selection; (iii) perform the alignment of both query and subject sequences; (iv) extract geometrical restraints (dihedral angles and distances) for corresponding atoms between the query and the template; (v) perform the 3D construction of the protein by using a distance geometry approach and (vi) finally send the results by e-mail to the user.
Times cited 299
Authors Deleage G, Combet C, Blanchet C, Geourjon C
Journal Comput Biol Med
Abstract Programs devoted to the analysis of protein sequences exist either as stand-alone programs or as Web servers. However, stand-alone programs can hardly accommodate for the analysis that involves comparisons on databanks, which require regular updates. Moreover, Web servers cannot be as efficient as stand-alone programs when dealing with real-time graphic display. We describe here a stand-alone software program called ANTHEPROT, which is intended to perform protein sequence analysis with a high integration level and clients/server capabilities. It is an interactive program with a graphical user interface that allows handling of protein sequence and data in a very interactive and convenient manner. It provides many methods and tools, which are integrated into a graphical user interface. ANTHEPROT is available for Windows-based systems. It is able to connect to a Web server in order to perform large-scale sequence comparison on up-to-date databanks. ANTHEPROT is freely available to academic users and may be downloaded at http://pbil.ibcp.fr/ANTHEPROT.
Times cited 111
Authors Geourjon C, Combet C, Blanchet C, Deleage G
Journal Protein Sci
Abstract Molecular modeling of proteins is confronted with the problem of finding homologous proteins, especially when few identities remain after the process of molecular evolution. Using even the most recent methods based on sequence identity detection, structural relationships are still difficult to establish with high reliability. As protein structures are more conserved than sequences, we investigated the possibility of using protein secondary structure comparison (observed or predicted structures) to discriminate between related and unrelated protein sequences in the range of 10%-30% sequence identity. Pairwise comparison of secondary structures has been measured using the structural overlap (Sov) parameter. In this article, we show that if the secondary structure likeness is >50%, most of the pairs are structurally related. Taking into account the secondary structures of proteins that have been detected by BLAST, FASTA, or SSEARCH in the noisy region (with high E-value), we show that distantly related protein sequences (even with <20% identity) can still be identified. This strategy can be used to identify three-dimensional templates in homology modeling by finding unexpected related proteins and to select proteins for experimental investigation in a structural genomic approach, as well as for genome annotation.
Times cited 39
Authors Blanchet C, Combet C, Geourjon C, Deleage G
Abstract MPSA is a stand-alone software package intended for protein sequence analysis with a high integration level and Web clients/server capabilities. It provides many methods and tools, which are integrated into an interactive graphical user interface. It is available for most Unix/Linux and non-Unix systems. MPSA is able to connect to a Web server (e.g. http://pbil.ibcp.fr/NPSA) in order to perform large-scale sequence comparison on up-to-date databanks.
Times cited 20
Authors Guermeur Y, Geourjon C, Gallinari P, Deleage G
Abstract In many fields of pattern recognition, combination has proved efficient to increase the generalization performance of individual prediction methods. Numerous systems have been developed for protein secondary structure prediction, based on different principles. Finding better ensemble methods for this task may thus become crucial. Furthermore, efforts need to be made to help the biologist in the post-processing of the outputs.
An ensemble method has been designed to post-process the outputs of discriminant models, in order to obtain an improvement in prediction accuracy while generating class posterior probability estimates. Experimental results establish that it can increase the recognition rate of protein secondary structure prediction methods that provide inhomogeneous scores, even though their individual prediction successes are largely different. This combination thus constitutes a help for the biologist, who can use it confidently on top of any set of prediction methods. Moreover, the resulting estimates can be used in various ways, for instance to determine which areas in the sequence are predicted with a given level of reliability.
The prediction is freely available over the Internet on the Network Protein Sequence Analysis (NPS@) WWW server at http://pbil.ibcp.fr/NPSA/npsa_server.html. The source code of the combiner can be obtained on request for academic use.
Times cited 129
Authors Deleage G, Blanchet C, Geourjon C
Abstract Recent improvements in the prediction of protein secondary structure are described, particularly those methods using the information contained in multiple alignments. In this respect, the prediction accuracy has been checked and methods that take into account multiple alignments are 70% correct for a three-state description of secondary structure. This quality is obtained by a 'leave-one out' procedure on a reference database of proteins sharing less than 25% identity. Biological applications such as 'protein domain design' and structural phylogeny are given. The biologist's point of view is also considered and joint predictions are encouraged in order to derive an amino acid based accuracy. All the tools described in this paper are available for biologists on the Web (http://www.ibcp.fr/predict.html).
Times cited 86
Authors Geourjon C, Deleage G
Journal Comput Appl Biosci
Abstract Recently a new method called the self-optimized prediction method (SOPM) has been described to improve the success rate in the prediction of the secondary structure of proteins. In this paper we report improvements brought about by predicting all the sequences of a set of aligned proteins belonging to the same family. This improved SOPM method (SOPMA) correctly predicts 69.5% of amino acids for a three-state description of the secondary structure (alpha-helix, beta-sheet and coil) in a whole database containing 126 chains of non-homologous (less than 25% identity) proteins. Joint prediction with SOPMA and a neural networks method (PHD) correctly predicts 82.2% of residues for 74% of co-predicted amino acids.
Times cited 692
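The two figures quoted in the SOPMA abstract above (a three-state accuracy, and a joint accuracy restricted to co-predicted residues) can be made concrete with a small Python sketch. It assumes that "co-predicted" means residues on which both methods assign the same state; the function names and the toy strings are hypothetical and do not come from the paper.

```python
def q3(predicted, observed):
    """Percentage of residues whose predicted state (H, E or C) is correct."""
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)

def joint_prediction(pred_a, pred_b, observed):
    """Return (coverage, accuracy) for residues on which two methods agree.

    coverage -- percentage of residues given the same state by both methods
    accuracy -- percentage of those agreed residues that match the observation
    """
    agreed = [(a, o) for a, b, o in zip(pred_a, pred_b, observed) if a == b]
    coverage = 100.0 * len(agreed) / len(observed)
    if not agreed:
        return coverage, 0.0
    accuracy = 100.0 * sum(a == o for a, o in agreed) / len(agreed)
    return coverage, accuracy

# Toy ten-residue example, invented for illustration.
observed = "HHHHEEEECC"
method_a = "HHHCEEECCC"
method_b = "HHHHEEHCCC"

print(q3(method_a, observed))                          # 80.0
print(joint_prediction(method_a, method_b, observed))  # (80.0, 87.5)
```

In the toy case each method alone scores 80%, while the residues on which they agree are 87.5% correct, which mirrors the trade-off between coverage and reliability reported in the abstract.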
Authors Geourjon C, Deleage G
Journal Protein Eng
Abstract A new method called the self-optimized prediction method (SOPM) has been developed to improve the success rate in the prediction of the secondary structure of proteins. This new method has been checked against an updated release of the Kabsch and Sander database, 'DATABASE.DSSP', comprising 239 protein chains. The first step of the SOPM is to build sub-databases of protein sequences and their known secondary structures drawn from 'DATABASE.DSSP' by (i) making binary comparisons of all protein sequences and (ii) taking into account the prediction of structural classes of proteins. The second step is to submit each protein of the sub-database to a secondary structure prediction using a predictive algorithm based on sequence similarity. The third step is to iteratively determine the predictive parameters that optimize the prediction quality on the whole sub-database. The last step is to apply the final parameters to the query sequence. This new method correctly predicts 69% of amino acids for a three-state description of the secondary structure (alpha helix, beta sheet and coil) in the whole database (46,011 amino acids). The correlation coefficients are C alpha = 0.54, C beta = 0.50 and Cc = 0.48. Root mean square deviations of 10% in the secondary structure content are obtained. Implications for the users are drawn so as to derive an accuracy at the amino acid level and provide the user with a guide for secondary structure prediction. The SOPM method is available by anonymous ftp to ibcp.fr.
Times cited 270
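The abstract above reports a three-state success rate and per-state correlation coefficients. As a hedged sketch, the code below computes a percentage of correctly predicted residues and a Matthews-type correlation for one state. It assumes the reported coefficients are of the Matthews type, which the abstract does not state, and the function names and toy strings are hypothetical.

```python
import math


def q3(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        return 0.0
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def matthews(predicted, observed, state):
    """Matthews-type correlation coefficient for a single state, e.g. 'H'."""
    tp = tn = fp = fn = 0
    for p, o in zip(predicted, observed):
        if p == state and o == state:
            tp += 1  # predicted and observed in this state
        elif p == state:
            fp += 1  # predicted in this state, observed elsewhere
        elif o == state:
            fn += 1  # observed in this state, predicted elsewhere
        else:
            tn += 1  # neither predicted nor observed in this state
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data, invented for illustration only.
observed = "HHHHEEEECC"
predicted = "HHHCEEEECC"
print("Q3:", q3(predicted, observed))
print("C(helix):", round(matthews(predicted, observed, "H"), 3))
```

A per-state correlation complements the overall percentage because it penalizes both over-prediction and under-prediction of a state.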
Authors Geourjon C, Deleage G
Journal Comput Appl Biosci
Abstract A computer module that includes multiple alignments, secondary structure prediction, and site and pattern search has been developed and integrated into our ANTHEPROT software for protein sequence analysis. All the programs can be invoked from within any routine, thus yielding multiple pathways to obtain final results. All the results are graphically displayed. The main feature of this module is that all methods are connected in an interactive graphic manner. This module has been designed to make it easy to display potential sites with conserved predicted structures.
Times cited 47
Authors Deleage G, Geourjon C
Journal Comput Appl Biosci
Abstract A graphic program has been developed to calculate the secondary structure content of proteins from their circular dichroism spectrum. All the information concerning analysis and results is given on a single screen. The actual and the theoretical spectra are plotted to allow visual inspection of the fit quality. The percentages of secondary structure and statistical parameters (r.m.s., residuals) are provided. The program is fully interactive for spectra analysis. Moreover, cursors driven by a mouse or arrow keys can be moved along the spectra to show all the information concerning a given wavelength, such as the wavelength itself, the theoretical and experimental ellipticities, and the values of the reference model for alpha-helix, beta-sheet and beta-turn. Interfaces are provided for the CONTIN program of Provencher and Glöckner.
Times cited 188
Authors Geourjon C, Deleage G, Roux B
Journal J Mol Graph
Abstract ANTHEPROT is a fully interactive program devoted to the analysis of protein structures using a graphics workstation. It presents four options. The first option can predict secondary structures using five methods, and hydrophobicity, solvent accessibility, flexibility and antigenicity profiles using eighteen scales. The user may introduce his own scales. The results displayed on the screen can be easily analyzed. The second option displays the results obtained with one method for up to eight proteins. To compare these proteins, it is possible to align the profiles or the predicted secondary structure according to various motifs. The secondary structure deduced from crystallographic data may also be introduced. The third option is designed to compare the primary structure of two proteins and to visualize on the screen regions that exhibit similarity. Six different comparison matrices may be used, but the user can also introduce his own matrices. The last option is for studying the proteolytic peptides resulting from a chemical or enzymatic digestion of a given protein. It is possible to analyze the protein cleavage using eleven chemical reagents or enzymes. The results are displayed on the screen as an RP-HPLC chromatogram.
Times cited 37
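The last ANTHEPROT option described above models chemical or enzymatic digestion of a protein. The abstract does not give the cleavage rules of its eleven reagents, so the sketch below only illustrates the general idea with the commonly cited trypsin-like rule (cleave after K or R unless the next residue is P). The function name, its parameters and the example sequence are hypothetical.

```python
def digest(sequence, cleave_after="KR", blocked_by="P"):
    """Split a protein sequence into peptides with a simple cleavage rule.

    A bond is cut on the C-terminal side of any residue listed in
    `cleave_after`, unless the following residue is listed in `blocked_by`.
    The defaults mimic the usual textbook rule for trypsin.
    """
    peptides = []
    start = 0
    last_index = len(sequence) - 1
    for i, residue in enumerate(sequence):
        if i == last_index:
            break  # no bond to cut after the final residue
        if residue in cleave_after and sequence[i + 1] not in blocked_by:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # remaining C-terminal peptide
    return peptides


# Invented example sequence, for illustration only.
peptides = digest("MKRPAKLLRGS")
print(peptides)
```

Other reagents could be approximated by changing `cleave_after` and `blocked_by`; predicting an RP-HPLC chromatogram from the resulting peptides would need retention coefficients that the abstract does not provide.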
Authors Deleage G, Clerc FF, Roux B, Gautheron DC
Journal Comput Appl Biosci
Abstract A simple microcomputer package for the theoretical analysis of protein sequences is described. Several methods designed to compare two sequences, to model proteolytic reactions and to predict the secondary structure, the hydrophobic/hydrophilic regions and the potential antigenic sites of proteins have been included in a software package for the Apple II microcomputer. The package comprises 21 programs as well as the secondary structure database of Kabsch and Sander (1983).
Times cited 67
Authors Deleage G, Tinland B, Roux B
Journal Anal Biochem
Abstract A novel computer program has been developed for predicting the secondary structure of proteins from their amino acid sequences. The scheme of the Chou and Fasman method (1978, Adv. Enzymol. Relat. Subj. Biochem. 47, 45-148) is closely followed. Some of their qualitative rules have been converted to numeric scales to obtain unambiguous predictions. This program has been tested on 21 proteins with known three-dimensional structures, constituting a database of 4457 amino acids. The percentage of correctly predicted amino acids is between 41 and 66% for a three-state (helix, sheet, and coil) description of protein secondary structure.
Times cited 16
Authors Deleage G, Roux B
Journal Protein Eng
Abstract An algorithm has been developed to improve the success rate in the prediction of the secondary structure of proteins by taking into account the predicted class of the proteins. This method has been called the 'double prediction method' and consists of a first prediction of the secondary structure from a new algorithm which uses parameters of the type described by Chou and Fasman, and the prediction of the class of the proteins from their amino acid composition. These two independent predictions allow one to optimize the parameters calculated over the secondary structure database to provide the final prediction of secondary structure. This method has been tested on 59 proteins in the database (i.e. 10,322 residues) and yields 72% success in class prediction, 61.3% of residues correctly predicted for three states (helix, sheet and coil) and a good agreement between observed and predicted contents in secondary structure.
Times cited 278