InChI’ng forward: Community Engagement in IUPAC’s Digital Chemical Identifier
Chemistry International, January-March 2018
by Leah McEwen

Given two chemical structures, how do you determine whether they are the same? How can chemical data from multiple sources be merged accurately? How can published data be consistently indexed and cross-linked for maximum discovery? Managing these processes manually, whether for external or internal purposes, is untenable given the current scale of chemical information and the diversity of its sources. The ability to process chemical structure data by machine is crucial across the chemical enterprise.

InChI technology has become the industry standard for matching and cross-indexing in the major chemical databases. InChI is based on a canonical algorithm that encodes chemical structure information in a layered format, the InChI string, with the formula and connectivity at the core. Because this standard form is derived from a normalized structure, it enables interoperability between databases. InChI strings can become quite long, however, especially for larger molecules, so a 27-character hashed version called the InChIKey can be used for faster searching and matching. The InChIKey hashes the connectivity in one portion (the first 14-character block) and additional information, such as stereochemistry and isotopes, in another.

This notation facilitates the automation of two key functions when working with large numbers of chemical structures: identification and verification. InChI can function as a bridge from a chemical record in one data source to the corresponding record in another. By matching InChIKeys across these data sources, we can see how much overlap exists in their chemical space, and also how much unique coverage each provides. Databases that collect data from multiple sources often apply this verification routine to sort records into those that can be connected directly, those that are likely unique, and those that may need further investigation as partial matches.
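The verification routine described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method any particular database uses: each source is assumed to be a dict mapping a standard InChIKey to a local record ID, the function names `placeholder_key`, `skeleton`, and `cross_match` are hypothetical, and the keys are made-up strings with the shape of an InChIKey rather than keys of real compounds. A record whose full key matches is treated as a direct connection, one that shares only the first (connectivity) block is flagged as a partial match, and the rest count as likely unique.

```python
# Hypothetical data model (an assumption, not from the article):
# each source maps a standard InChIKey to a local record ID.


def placeholder_key(c1: str, c2: str) -> str:
    """Build a made-up 27-character string with InChIKey shape (illustration only)."""
    return f"{c1 * 14}-{c2 * 8}SA-N"


def skeleton(key: str) -> str:
    """Return the first 14-character block, which hashes the connectivity."""
    return key.split("-")[0]


def cross_match(source_a: dict[str, str], source_b: dict[str, str]) -> dict[str, list]:
    """Sort records from source A by how they match records in source B."""
    # Index source B by connectivity block once, so each lookup is a dict access.
    by_skeleton: dict[str, list[str]] = {}
    for key, record_b in source_b.items():
        by_skeleton.setdefault(skeleton(key), []).append(record_b)

    result: dict[str, list] = {"exact": [], "partial": [], "unique": []}
    for key, record_a in source_a.items():
        if key in source_b:
            # Full-key match: the two records can be connected directly.
            result["exact"].append((record_a, source_b[key]))
        elif skeleton(key) in by_skeleton:
            # Same connectivity, different remaining layers (e.g. stereo or
            # isotopes): a partial match that needs further investigation.
            result["partial"].append((record_a, by_skeleton[skeleton(key)]))
        else:
            # No shared connectivity: likely unique coverage in source A.
            result["unique"].append(record_a)
    return result


if __name__ == "__main__":
    source_a = {
        placeholder_key("A", "B"): "A1",
        placeholder_key("C", "D"): "A2",
        placeholder_key("E", "F"): "A3",
    }
    source_b = {
        placeholder_key("A", "B"): "B1",
        placeholder_key("C", "G"): "B2",
    }
    print(cross_match(source_a, source_b))
    # → {'exact': [('A1', 'B1')], 'partial': [('A2', ['B2'])], 'unique': ['A3']}
```

The length of the "exact" list measures the overlap between the two sources, and the "unique" list measures the coverage that only source A provides. Indexing source B by its first block once keeps the comparison linear in the number of records instead of comparing every pair.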
Comparing InChIKeys can also reveal cases where the connectivity is the same but some other variable differs, such as stereochemistry or isotopic composition (see Figure 1). More information about InChI can be found on the InChI Trust website, www.inchi-trust.org; for a recent overview, see IUPAC100 Essential Tools, January 2018 (more information about IUPAC100 Essential Tools is on page 32).
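The kind of comparison shown in Figure 1 can be illustrated by checking which InChIKey blocks two keys share. This sketch assumes the standard InChIKey layout: a 14-character connectivity block, an 8-character block hashing the remaining layers followed by a standard/non-standard flag and a version letter, and a final protonation character. The function name `compare_inchikeys` and the placeholder keys are hypothetical, and the code only inspects key strings; it does not compute keys from structures.

```python
import re

# Standard InChIKey layout (27 characters), as assumed here:
#   group 1: 14-letter block hashing the connectivity (molecular skeleton)
#   group 2: 8-letter block hashing the remaining layers (e.g. stereo, isotopes)
#   group 3: 'S' (standard) or 'N' (non-standard) flag
#   group 4: InChI version letter
#   group 5: protonation indicator
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def compare_inchikeys(key1: str, key2: str) -> str:
    """Classify how two InChIKeys relate, block by block."""
    m1, m2 = INCHIKEY_RE.match(key1), INCHIKEY_RE.match(key2)
    if m1 is None or m2 is None:
        raise ValueError("not a well-formed standard-layout InChIKey")
    if key1 == key2:
        return "identical"
    if m1.group(1) != m2.group(1):
        return "different connectivity"
    if m1.group(2) != m2.group(2):
        # Same skeleton; a layer such as stereochemistry or isotopes differs.
        return "same connectivity, other layers differ"
    # First two blocks agree; only the flag, version, or protonation differs.
    return "same connectivity and layers, other parts differ"


if __name__ == "__main__":
    # Hypothetical placeholder keys, not real compounds.
    base = "A" * 14 + "-" + "B" * 8 + "SA-N"
    variant = "A" * 14 + "-" + "C" * 8 + "SA-N"
    print(compare_inchikeys(base, variant))
```

In practice the keys would be generated by InChI software, for example the InChI Trust's reference implementation or RDKit's `Chem.MolToInchiKey`, and a "same connectivity, other layers differ" result is the cue to inspect the full InChI strings for the layer that differs.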