The New Nomenclature
How it works: Like the 1998 nomenclature that preceded it, the current system names proteins according to the degree of shared amino acid identity. Each new sequence is compared to the existing holotype proteins (those with a name ending in 1) to find the best match. The percent identity to that best match determines the rank at which the new protein diverges, and the next available name at that rank is assigned. This naming system gives results consistent with the 1998 nomenclature using only slightly modified cutoffs of 45%, 76% and 95% sequence identity.
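The naming step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the database's actual pipeline: the function names (parse_name, next_letter, assign_name) are hypothetical, names are assumed to follow the prefix-number-letter-letter-number pattern (e.g. Cry1Aa1), a sequence exactly at a cutoff is assumed to share the rank, a sequence below 45% is assumed to stay in the best match's name class, and ranks are limited to single letters. The search for the best match and its percent identity is taken as given.

```python
import re
import string

# Cutoffs quoted in the text for the current system (percent identity).
PRIMARY, SECONDARY, TERTIARY = 45.0, 76.0, 95.0

# Assumed name layout: three-letter prefix, number, upper-case letter,
# lower-case letter, number (e.g. Cry1Aa1).
NAME_RE = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z])([a-z])(\d+)$")


def parse_name(name):
    """Split a name into (prefix, primary, secondary, tertiary, quaternary)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised name: {name}")
    prefix, primary, secondary, tertiary, quaternary = m.groups()
    return prefix, int(primary), secondary, tertiary, int(quaternary)


def next_letter(used, alphabet):
    """Return the first letter of `alphabet` that is not in `used`."""
    for letter in alphabet:
        if letter not in used:
            return letter
    raise ValueError("single-letter ranks exhausted (not handled in this sketch)")


def assign_name(best_match, identity, existing):
    """Propose the next available name given the best match and % identity."""
    prefix, pri, sec, ter, _ = parse_name(best_match)
    names = set(existing) | {best_match}
    same_prefix = [p for p in map(parse_name, names) if p[0] == prefix]

    if identity >= TERTIARY:      # same tertiary rank, new quaternary number
        used = [p[4] for p in same_prefix if p[1:4] == (pri, sec, ter)]
        return f"{prefix}{pri}{sec}{ter}{max(used) + 1}"
    if identity >= SECONDARY:     # same secondary rank, new tertiary letter
        used = {p[3] for p in same_prefix if p[1:3] == (pri, sec)}
        return f"{prefix}{pri}{sec}{next_letter(used, string.ascii_lowercase)}1"
    if identity >= PRIMARY:       # same primary rank, new secondary letter
        used = {p[2] for p in same_prefix if p[1] == pri}
        return f"{prefix}{pri}{next_letter(used, string.ascii_uppercase)}a1"
    # Below the primary cutoff: new primary rank (assumed to keep the prefix).
    used = [p[1] for p in same_prefix]
    return f"{prefix}{max(used) + 1}Aa1"
```

With a toy set of existing names {Cry1Aa1, Cry1Aa2, Cry1Ab1, Cry1Ba1, Cry2Aa1} and Cry1Aa1 as the best match, a sequence at 97% identity would be proposed as Cry1Aa3, one at 80% as Cry1Ac1, and one at 50% as Cry1Ca1.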
Development: Discussions with academic and industrial partners helped us to establish some core principles of the revised nomenclature. One of the strongest messages that we received was that while users understood the need for change, they liked and understood the 1998 system. As a result, we chose to maximize consistency with that system to minimize name changes and potential confusion with existing literature. Other priorities were: to base the revised nomenclature on structural groups so that proteins with homologous structures (and, by extension, likely to have similar overall mechanisms of action) share similar names; and to incorporate invertebrate-active proteins from bacteria other than Bt into a unified and consistent naming structure.
A History of Naming
Original 1989 system: The original attempt to rationalize Bt toxin naming based names on insecticidal specificity (Höfte and Whiteley, 1989, PMC: 372730). Bt insecticidal proteins were divided into Cry (crystal) toxins and Cyt proteins (which have a distinct mechanism of action and show more general cytocidal activity, including haemolysis). The Cry proteins were then further divided according to their insect specificity by the use of Roman numerals (e.g. CryI proteins were lepidopteran-active while CryIII proteins were coleopteran-active). This system gave consistent names but, over time, limitations were identified: names did not necessarily reflect protein relatedness, and bioassay data were required before any new protein could be named.
The 1998 system: To address these issues, the system for assigning toxin names was revised in a landmark paper that has been the standard for over 20 years (Crickmore et al., 1998, PMC: 98935). In this system, proteins were named solely on the basis of their amino acid sequence identity. The names beginning “Cry” or “Cyt” were subdivided at four levels, with those sharing less than 45% identity assigned different primary ranks (e.g. Cry1 and Cry2), those sharing less than 78% identity given different secondary ranks (e.g. Cry1A and Cry1B), those sharing less than 95% identity given different tertiary ranks (e.g. Cry1Aa and Cry1Ab) and, finally, all individual entries to the database within the same tertiary level assigned a quaternary rank (e.g. Cry1Aa1 and Cry1Aa2). When vegetatively expressed toxins from Bt were discovered, a parallel naming system was followed for these Vip proteins. In addition, some crystal proteins from other organisms (e.g. Brevibacillus laterosporus) were incorporated into the Cry nomenclature. However, proteins from other insecticidal bacteria, e.g. Lysinibacillus sphaericus, remained largely outside the Bt nomenclature system, even when they were clearly homologs of Bt toxins (e.g. the Bin toxins of L. sphaericus are related to Cry35 of Bt), and no proteins of Gram-negative origin were included.
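To make the four-level scheme concrete, the short Python sketch below maps a pairwise identity onto the deepest rank two proteins would share under the 1998 cutoffs, and splits a name into its four ranks. The function names (shared_rank, rank_names) are hypothetical, and treating a value exactly at a cutoff as sharing the rank is an assumption based on the "less than" wording above.

```python
import re

# Cutoffs of the 1998 system as quoted in the text (percent identity),
# ordered from the deepest shared rank to the shallowest.
CUTOFFS_1998 = [(95.0, "tertiary"), (78.0, "secondary"), (45.0, "primary")]

# Cry, Cyt and the parallel Vip names: prefix, number, upper-case letter,
# lower-case letter, number (e.g. Cry1Aa2).
NAME_RE = re.compile(r"^((((?:Cry|Cyt|Vip)\d+)[A-Z])[a-z])\d+$")


def shared_rank(identity):
    """Deepest rank shared by two proteins with this pairwise identity."""
    for cutoff, rank in CUTOFFS_1998:
        if identity >= cutoff:
            return rank
    return None  # below 45%: different primary ranks


def rank_names(name):
    """Return the (primary, secondary, tertiary, quaternary) forms of a name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised name: {name}")
    tertiary, secondary, primary = m.groups()
    return primary, secondary, tertiary, name
```

For example, rank_names("Cry1Aa2") gives ("Cry1", "Cry1A", "Cry1Aa", "Cry1Aa2"), and two proteins sharing 80% identity would share a secondary rank, so they would differ only from the tertiary letter onwards.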
Recent changes: The huge expansion of sequencing capacity has rendered the manual naming of new invertebrate-active toxins unsustainable and highlighted the need to automate part of the process. Discussions were held concurrently on revising the nomenclature itself. The 1998 nomenclature created potential confusion because the Cry designation is applied to proteins that form protein crystals during sporulation (or are closely related to proteins that do so), even though the proteins themselves belong to several quite distinct structural groups. This confusion often led to statements such as “Cry proteins are composed of three domains”, a statement that, in fact, applies to only some of the Cry proteins. A similar problem exists for Vip proteins. Additionally, proteins from other organisms that were clear homologs of Cry proteins might have very different names, obscuring their commonalities. The concept of a revised nomenclature was developed and discussed with the wider community at a series of SIP, IOBC, ESA, and other meetings, and published as Crickmore et al., 2020; doi: 10.1016/j.jip.2020.107438.