Molecular Descriptors For Cheminformatics Pdf Download

16.09.2019 · Posted by admin

Cheminformatics (also known as chemoinformatics, chemioinformatics and chemical informatics) is the use of computer and informational techniques applied to a range of problems in the field of chemistry. These in silico techniques are used, for example, in pharmaceutical companies and academic settings in the process of drug discovery. These methods can also be used in chemical and allied industries in various other forms.^[1]


History


The term chemoinformatics was defined by F.K. Brown^[2]^[3] in 1998:

'Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization.'

Since then, both spellings, cheminformatics and chemoinformatics, have been used, and usage has shifted toward cheminformatics as the more popular term.^[4]^[5] While European academia settled on chemoinformatics in 2006,^[6] the establishment of the Journal of Cheminformatics in 2009 gave a strong push toward the shorter variant.

Several software tools support this kind of work. ChemoPy (chemoinformatics in Python) is a freely available, open-source Python package. Its authors note that molecular representations of small molecules are routinely used in QSAR/SAR, virtual screening, database search, ranking, drug ADME/T prediction and other drug discovery processes, and they developed the package to facilitate extensive studies of drug molecules.

Dragon is a widely used application for calculating molecular descriptors. Its new version, Dragon 7.0, provides an improved user interface, new descriptors and additional features such as the calculation of fingerprints and support for disconnected structures.

The major focus of CLiDE is converting scanned images of 2D structures into MOL or ChemDraw file formats; the latest version can also convert 2D structures from PDF documents into various chemical file formats. CLiDE version 2.1 contains many new features and enhancements, including handling of PDF input files.

Basics

Cheminformatics combines the scientific working fields of chemistry, computer science and information science, for example in the areas of topology, chemical graph theory, information retrieval and data mining in chemical space.^[7]^[8]^[9]^[10] It can also be applied to data analysis in various industries, such as paper and pulp, dyes and allied industries.

Applications

Storage and retrieval

The primary application of cheminformatics is the storage, indexing and search of information relating to compounds. Searching such stored information efficiently draws on topics from computer science, such as data mining, information retrieval, information extraction and machine learning. Related research topics include:

Unstructured data
Structured data mining (the mining of structured data)

File formats

The in silico representation of chemical structures uses specialized formats such as the XML-based Chemical Markup Language or SMILES. These representations are often used for storage in large chemical databases. While some formats are suited for visual representations in 2 or 3 dimensions, others are more suited for studying physical interactions, modeling and docking studies.
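As a rough illustration of how a line notation such as SMILES encodes a structure as text, the sketch below splits a SMILES string into tokens and counts the heavy (non-hydrogen) atoms. The helper names `tokenize_smiles` and `heavy_atom_counts` are hypothetical, and the pattern covers only a simplified subset of SMILES (organic-subset atoms, bracket atoms, bonds, branches and ring-closure digits). It is not a full parser and does no chemical validation.

```python
import re
from collections import Counter

# Simplified token pattern (an assumption, not the full SMILES grammar):
# bracket atoms, two-letter halogens, organic-subset atoms (aliphatic and
# aromatic), bond symbols, branch parentheses and ring-closure labels.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[=#\-/\\.:]|[()]|%\d{2}|\d)"
)

# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens; reject characters we do not cover."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element symbol."""
    counts = Counter()
    for token in tokenize_smiles(smiles):
        if token.startswith("["):
            match = BRACKET_ELEMENT.match(token)
            if match is None:  # e.g. a wildcard atom such as [*]
                continue
            symbol = match.group(1).capitalize()
            if symbol != "H":
                counts[symbol] += 1
        elif token[0].isalpha():
            # Aromatic atoms are lower case in SMILES; normalise to the element.
            counts[token.capitalize()] += 1
    return counts


if __name__ == "__main__":
    # Aspirin written as SMILES.
    print(heavy_atom_counts("CC(=O)Oc1ccccc1C(=O)O"))
```

In practice a cheminformatics toolkit would handle parsing, aromaticity and implicit hydrogens; the point here is only that the whole structure travels as one short string.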

Virtual libraries

Chemical data can pertain to real or virtual molecules. Virtual libraries of compounds may be generated in various ways to explore chemical space and hypothesize novel compounds with desired properties.

Virtual libraries of classes of compounds (drugs, natural products, diversity-oriented synthetic products) were recently generated using the FOG (fragment optimized growth) algorithm.^[11] This was done by using cheminformatic tools to train transition probabilities of a Markov chain on authentic classes of compounds, and then using the Markov chain to generate novel compounds that were similar to the training database.
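The Markov-chain idea described above can be sketched in a few lines. This is not the FOG algorithm itself: it is a toy first-order chain over made-up fragment labels, and the function names and training sequences are assumptions chosen for illustration. It shows the two steps the text names: estimating transition probabilities from a training set, then sampling new sequences from them.

```python
import random
from collections import defaultdict

START, END = "<start>", "<end>"


def train_transitions(sequences):
    """Estimate first-order transition probabilities from fragment sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        path = [START, *seq, END]
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    # Normalise the counts of each state into probabilities.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def generate(transitions, rng, max_len=20):
    """Sample one fragment sequence by walking the chain from START to END."""
    state, fragments = START, []
    while len(fragments) < max_len:
        followers = transitions[state]
        state = rng.choices(list(followers), weights=list(followers.values()))[0]
        if state == END:
            break
        fragments.append(state)
    return fragments


if __name__ == "__main__":
    # Hypothetical training set: each molecule reduced to a list of fragment labels.
    training = [
        ["benzene", "amide", "methyl"],
        ["benzene", "ester", "methyl"],
        ["pyridine", "amide", "ethyl"],
    ]
    model = train_transitions(training)
    rng = random.Random(0)  # seeded so that runs are repeatable
    for _ in range(3):
        print(generate(model, rng))
```

Because every step is drawn from transitions seen in training, the generated sequences resemble the training set while still allowing combinations that never appeared together, such as a pyridine followed by an amide and a methyl group.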

Virtual screening

In contrast to high-throughput screening, virtual screening involves computationally screening in silico libraries of compounds, by means of various methods such as docking, to identify members likely to possess desired properties such as biological activity against a given target. In some cases, combinatorial chemistry is used in the development of the library to increase the efficiency in mining the chemical space. More commonly, a diverse library of small molecules or natural products is screened.

Quantitative structure-activity relationship (QSAR)

This is the calculation of quantitative structure-activity relationship and quantitative structure-property relationship values, used to predict the activity of compounds from their structures. In this context there is also a strong relationship to chemometrics. Chemical expert systems are also relevant, since they represent parts of chemical knowledge as an in silico representation. A relatively new concept is matched molecular pair analysis (MMPA), or prediction-driven MMPA, which is coupled with a QSAR model in order to identify activity cliffs.^[12]
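To make the idea of predicting activity from structure concrete, here is a minimal sketch of the simplest possible QSAR model: a least-squares line relating one descriptor to a measured activity. The descriptor and activity values are invented for illustration, and the function names are hypothetical. Real QSAR models typically use many descriptors, more flexible learners and careful validation.

```python
def fit_linear_qsar(descriptor, activity):
    """Fit activity = slope * descriptor + intercept by ordinary least squares."""
    if len(descriptor) != len(activity) or len(descriptor) < 2:
        raise ValueError("need at least two paired observations")
    n = len(descriptor)
    mean_x = sum(descriptor) / n
    mean_y = sum(activity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values are all identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_activity(model, descriptor_value):
    """Apply a fitted (slope, intercept) pair to a new descriptor value."""
    slope, intercept = model
    return slope * descriptor_value + intercept


if __name__ == "__main__":
    # Made-up training data: one descriptor value and one activity per compound.
    log_p = [0.5, 1.2, 1.9, 2.4, 3.1]
    activity = [4.1, 4.9, 5.6, 6.2, 6.8]
    model = fit_linear_qsar(log_p, activity)
    print("slope, intercept:", model)
    print("predicted activity at descriptor 2.0:", predict_activity(model, 2.0))
```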

References

[1] Thomas Engel (2006). 'Basic Overview of Chemoinformatics'. J. Chem. Inf. Model.: 2267–2277. doi:10.1021/ci600234z.
[2] F.K. Brown (1998). 'Chapter 35. Chemoinformatics: What is it and How does it Impact Drug Discovery'. Annual Reports in Medicinal Chemistry. 33. pp. 375–384. doi:10.1016/S0065-7743(08)61100-8. ISBN 978-0-12-040533-6.
[3] Brown, Frank (2005). 'Editorial Opinion: Chemoinformatics – a ten year update'. Current Opinion in Drug Discovery & Development. 8 (3): 296–302.
[4] Cheminformatics or Chemoinformatics?
[5] www.genomicglossaries.com: Tips & FAQs for the Biopharmaceutical glossaries #3
[6] Obernai Declaration
[7] Gasteiger J. (Editor), Engel T. (Editor): Chemoinformatics: A Textbook. John Wiley & Sons, 2004, ISBN 3-527-30681-1
[8] A.R. Leach, V.J. Gillet: An Introduction to Chemoinformatics. Springer, 2003, ISBN 1-4020-1347-7
[9] Alexandre Varnek and Igor Baskin (2011). 'Chemoinformatics as a Theoretical Chemistry Discipline'. Molecular Informatics. 30 (1): 20–32. doi:10.1002/minf.201000100. PMID 27467875.
[10] Barry A. Bunin (Author), Brian Siesel (Author), Guillermo Morales (Author), Jürgen Bajorath (Author): Chemoinformatics: Theory, Practice, & Products. Springer, 2006, ISBN 978-1402050008
[11] Kutchukian, Peter; Lou, David; Shakhnovich, Eugene (2009). 'FOG: Fragment Optimized Growth Algorithm for the de Novo Generation of Molecules occupying Druglike Chemical'. Journal of Chemical Information and Modeling. 49 (7): 1630–1642. doi:10.1021/ci9000458. PMID 19527020.
[12] Sushko, Yurii; Novotarskyi, Sergii; Körner, Robert; Vogt, Joachim; Abdelaziz, Ahmed; Tetko, Igor V. (2014). 'Prediction-driven matched molecular pairs to interpret QSARs and aid the molecular optimization process'. Journal of Cheminformatics. 6 (1): 48. doi:10.1186/s13321-014-0048-0. PMC 4272757. PMID 25544551.



Overview

Rcpi offers a molecular informatics toolkit with a comprehensive integration of bioinformatics and cheminformatics tools for drug discovery. For more information, please see our paper (DOI: 10.1093/bioinformatics/btu624).

Paper Citation

Formatted citation:

Dong-Sheng Cao, Nan Xiao, Qing-Song Xu, and Alex F. Chen. (2015). Rcpi: R/Bioconductor package to generate various descriptors of proteins, compounds and their interactions. Bioinformatics 31 (2), 279-281.


Installation

To install the Rcpi package from Bioconductor, run BiocManager::install("Rcpi") in R.

To make the package fully functional (especially the Open Babel related functions), we recommend also installing the packages listed under Enhances.

Several dependencies of the Rcpi package may require system-level libraries; check the corresponding manuals of these packages for detailed installation guides.

Browse the package vignettes for a quick start.

Features

Rcpi implements and integrates state-of-the-art protein sequence descriptors and molecular descriptors/fingerprints in R. For protein sequences, the Rcpi package can:

Calculate six protein descriptor groups composed of fourteen types of commonly used structural and physicochemical descriptors, comprising 9920 descriptors in total.
Calculate six types of generalized scales-based descriptors derived by various dimensionality reduction methods for proteochemometric (PCM) modeling.
Compute parallelized pairwise similarities, derived from protein sequence alignment and Gene Ontology (GO) semantic similarity measures, within a list of proteins.
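For readers unfamiliar with sequence-derived descriptors, the sketch below computes amino acid composition, probably the simplest descriptor of this kind: the fraction of each of the 20 standard residues in a sequence. It is written in Python for illustration and is not Rcpi's R API; the function name and the example sequence are assumptions, and the list above does not itself say which descriptor types Rcpi includes.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the fraction of each standard amino acid in a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    # One value per residue type, so every sequence maps to a 20-element vector.
    return {aa: seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


if __name__ == "__main__":
    # Arbitrary example sequence, not taken from any database entry.
    composition = amino_acid_composition("MKTAYIAKQRQISFVKSHFSRQ")
    for residue, fraction in composition.items():
        if fraction > 0:
            print(residue, round(fraction, 3))
```

The fixed-length output is what makes such descriptors useful for modeling: proteins of any length become vectors of the same size.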

For small molecules, the Rcpi package can:

Calculate 307 molecular descriptors (2D/3D), including constitutional, topological, geometrical, and electronic descriptors, etc.
Calculate more than ten types of molecular fingerprints, including FP4 keys, E-state fingerprints, MACCS keys, etc., and perform parallelized chemical similarity search.
Compute parallelized pairwise similarities, derived from fingerprints and maximum common substructure search, within a list of small molecules.
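Fingerprint-based pairwise similarity can be sketched as follows. The example uses the Tanimoto coefficient, a common choice for binary fingerprints; the text above does not say which measure Rcpi applies, so treat that as an assumption. Fingerprints are represented as Python sets of "on" bit positions, and the compound names and bit values are invented for illustration.

```python
from itertools import combinations


def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits in either."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def pairwise_similarity(fingerprints: dict) -> dict:
    """Similarity for every unordered pair of named fingerprints."""
    return {
        (a, b): tanimoto(fingerprints[a], fingerprints[b])
        for a, b in combinations(sorted(fingerprints), 2)
    }


if __name__ == "__main__":
    # Hypothetical fingerprints: each set holds the indices of bits that are set.
    library = {
        "compound_a": {1, 4, 7, 9},
        "compound_b": {1, 4, 8, 9},
        "compound_c": {2, 3, 5},
    }
    for pair, score in pairwise_similarity(library).items():
        print(pair, round(score, 3))
```

Each pair is independent of the others, which is why this kind of computation parallelizes well across a list of molecules.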

By combining various types of descriptors for drugs and proteins in different ways, interaction descriptors representing protein-protein or compound-protein interactions can be conveniently generated with Rcpi, including:

Two types of compound-protein interaction (CPI) descriptors
Three types of protein-protein interaction (PPI) descriptors

Several useful auxiliary utilities are also shipped with Rcpi:

Parallelized molecule and protein sequence retrieval from several online databases, like PubChem, ChEMBL, KEGG, DrugBank, UniProt, RCSB PDB, etc.
Loading molecules stored in SMILES/SDF files and loading protein sequences from FASTA/PDB files
Molecular file format conversion

The computed protein sequence descriptors, molecular descriptors/fingerprints, interaction descriptors and pairwise similarities are widely used in various research fields relevant to drug discovery, primarily bioinformatics, cheminformatics, proteochemometrics, and chemogenomics.