Biological diagnostics, an applied branch of systematics dealing with the theory and practice of constructing diagnostic keys, became a separate scientific discipline in the early 1970s. Biologists were constructing keys to plants and animals more than two centuries ago, and methods of constructing them were discussed even earlier, but only in the early 1970s did biological diagnostics become a separate branch of systematics, a development tied to attempts to make identification by keys a computer-aided process. The precomputer history of keys has been considered in detail in a number of publications (Lobanov, 1972; Morse, 1975; Sviridov, 1976, 1994; Pankhurst, 1978; Payne, Preece, 1980).
Biological keys, despite their outward diversity, fall into a relatively small number of fundamentally different types. A classification of the forms of diagnostic keys that remains complete to this day was published in "Entomological Review" at the dawn of the computer era (Lobanov, 1972).
Instead of an epigraph:
Keys are compiled by those who do not need them for those who cannot use them.
To discuss the main questions of diagnostics, it is first necessary to refine the terminology used.
Because keys (particularly computer keys) are fairly complex systems with many different properties, they can be classified in many different ways. Two characteristics are of particular importance, however, and should be singled out first: the number of entries (monoentry versus multientry keys) and the number of alternatives offered at each step (dichotomous versus polytomous keys).
These two characteristics of a key are independent of each other: it is easy to build a monoentry polytomous key or a multientry dichotomous key. However, because most traditional text (precomputer) monoentry keys were strictly dichotomous, while the comparatively recent multientry keys (tabular, punched-card and computer keys) are normally polytomous, the two notions became confused. To this day, in spite of all the efforts of theoreticians of biological diagnostics, multientry keys are called simply polytomous, which is entirely erroneous.
Recommendations on compiling traditional monoentry text keys have been published repeatedly, and there is no need to consider this aspect here. We should only note the difference in form between serial text keys (the most widespread in the Russian entomological literature, see example) and bracket text keys (nearly always used abroad, and increasingly frequent in Russia, see example). In preparing new editions we should, of course, give up obsolete traditions and publish bracket keys, which are more convenient for the user because the thesis and the antithesis are placed together.
It is more important here to touch upon the more topical issues of compiling computer keys. An analysis of the latest achievements of computer diagnostics leads to the conclusion that modern interactive keys are in no way inferior to the best traditional paper editions, and that their advantages are so numerous that in efficiency, accessibility to non-specialists in systematics and reliability of identification they are superior to paper keys by approximately an order of magnitude. They are finding ever wider application even among conservative biologists. Their main drawback affects only compilers, not users: constructing a good key is a labour-intensive process. As the number of taxa in a key increases, the cost of constructing a monoentry text key grows in arithmetic progression, whereas the cost of constructing a multientry computer key grows in geometric progression.
The notion of "computer identification" has come to be used so widely that it is necessary to define clearly the domain considered here and on other pages of our site. First, we consider only those taxonomic diagnostic systems that are intended for placing animal specimens in previously established taxa. Secondly, we do not consider the formulation of the problem in which the characters of a taxon are not preset explicitly but are formed by the program. Such systems are known in biology (e.g. Katsimis, Poularikas, 1986; Draper, Keefe, 1989), but they are developed within a specific domain of cybernetics, artificial perception theory. Finally, there are systems in which the taxonomic characters are defined by an expert, but the characters are read from the specimen not by an expert but by an optical device and a controlling system. Systems of this kind are developed by specialists in computer-aided image processing (Fdez-Valdivia J. et al., 1992); they are also used at our institute (Galtsova, Kulangieva, 1995; Galtsova, Starobogatov et al., 1995) for the identification of nematodes and molluscs, but they are outside the scope of our review.
The first publications on the use of computers for identifying biological objects appeared in the mid- to late 1960s (Kiskin et al., 1965; Ladley, 1965; Goodall, 1968; Morse, 1968; and others). They aroused interest in the history and theory of compiling keys and soon gave rise to a separate scientific discipline, biological diagnostics, i.e. applied systematics dealing with the theory and practice of compiling diagnostic keys. The early 1970s saw the first peak of activity in developing methods of computer identification. It was in that period, in 1973, that the historic symposium in Cambridge took place; its proceedings, "Biological Identification with Computers" (Pankhurst, 1975), later became a bible for scientists working in this border area between biology and informatics. But it was not until personal computers became widespread that, alongside theoretical papers, adequate software for dialogue identification and computer-based biological keys were developed. An important milestone in biological diagnostics was the recent conference in Canterbury (Great Britain, December 1996), "Computer-Based Species Identification", which dealt with computer aspects of identification and marked the 21st anniversary of the Cambridge symposium. Observing the thirty-year evolution of computer keys "from the inside" (as a participant in this process) gives grounds for assessing its results and further prospects. In generalized form, two stages of this evolution can be distinguished: a stage of increasing diversity of computer keys and a stage of their subsequent convergence.
The first stage arose from the very different hardware available to biologists in different countries and institutions. Experiments in diagnostics were usually run on large computers with different peripherals and essentially different ways of giving the user access to the computer's resources. This naturally led to very different diagnostic programs, from the simplest search through a deck of punched cards to quite acceptable interactive keys. Computers improved in parallel with the development of optimum methods of constructing keys.
For more than 10 years, development has been oriented towards personal computers, which exceed in performance the giant computers of the 1960s and 1970s. Programs now differ greatly in their interfaces and in the ways they use graphic images. The internal structure of computer keys, however, has undergone notable convergence, and nearly all diagnostic programs suitable for wide use by biologists have arrived at practically the same optimum variant, which can be formulated as a "multientry polytomous dialogue stepwise computer key, with wide use of images of taxa and their characters, with computer assessment and ranking of characters at each step of identification and with a set of methods for increasing the reliability of identification". Of particular interest are keys that use the possibilities of the Internet.
Nearly all modern computer keys are stepwise: they use one or several characters at each step and cyclically repeat the same operations at every step. All the programs considered are, of course, dialogue (interactive) programs, i.e. they imply alternating actions of the computer and the user. Earlier, before the appearance of personal computers, the alternative was the batch mode of computer use, in which the user handed the operator a task and received the reply several minutes or hours later, so that dialogue was excluded.
A generalized scenario of a diagnostic program's operation can be presented in the form of the following algorithm:
1. Assessment of all characters available for the current set of taxa and presentation of them to the user.
2. Selection of the most convenient character and input of the state of that character in the specimen being identified.
3. Scan of all taxa in the current set and reduction of this set by excluding taxa that do not have the entered state.
4. If the identification is not completed, return to item 1.
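The four items above can be sketched as a short loop. This is only an illustration under stated assumptions, not the code of any of the programs discussed: the data matrix layout, the names `usable_characters`, `reduce_taxa`, `identify` and `scripted`, and the toy taxa are all hypothetical, and every taxon is assumed to be scored for every character.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical data matrix: taxon -> character -> set of states the taxon may show.
# Assumption: every taxon is scored for every character.
Matrix = Dict[str, Dict[str, Set[str]]]
Chooser = Callable[[List[str], Set[str]], Tuple[str, str]]


def usable_characters(matrix: Matrix, taxa: Set[str], used: Set[str]) -> List[str]:
    """Item 1: unused characters whose states differ among the remaining taxa."""
    candidates = sorted({c for t in taxa for c in matrix[t]} - used)
    return [
        c for c in candidates
        if len({frozenset(matrix[t].get(c, set())) for t in taxa}) > 1
    ]


def reduce_taxa(matrix: Matrix, taxa: Set[str], character: str, state: str) -> Set[str]:
    """Item 3: keep only the taxa that can show the entered state."""
    return {t for t in taxa if state in matrix[t].get(character, set())}


def identify(matrix: Matrix, choose: Chooser) -> Set[str]:
    """Repeat items 1-3 until at most one taxon is left (item 4)."""
    taxa = set(matrix)
    used: Set[str] = set()
    while len(taxa) > 1:
        characters = usable_characters(matrix, taxa, used)
        if not characters:
            break  # the remaining taxa cannot be separated any further
        character, state = choose(characters, taxa)  # item 2: the user's answer
        used.add(character)
        taxa = reduce_taxa(matrix, taxa, character, state)
    return taxa


# Toy matrix with invented taxa, for illustration only.
DEMO: Matrix = {
    "Taxon A": {"wings": {"present"}, "colour": {"red"}},
    "Taxon B": {"wings": {"present"}, "colour": {"black", "red"}},
    "Taxon C": {"wings": {"absent"}, "colour": {"black"}},
}


def scripted(answers: Dict[str, str]) -> Chooser:
    """Stand-in for the dialogue: always answers the first character offered."""
    def choose(characters: List[str], taxa: Set[str]) -> Tuple[str, str]:
        character = characters[0]
        return character, answers[character]
    return choose


print(identify(DEMO, scripted({"wings": "present", "colour": "black"})))
```

In a real program the `choose` callback would be the dialogue with the user; here it is replaced by a scripted set of answers so that the loop can be run without interaction.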
The most modern and promising programs implementing interactive multientry computer keys include:
Considering and generalizing the positive features of all known programs allows us to synthesize a description of an "ideal" computer key. It should undoubtedly be a multientry polytomous key. At each step the program should provide the most favourable conditions for choosing a character. It is best to list the characters not in a standard order but in descending order of their diagnostic value, a mathematical assessment of a character's potential to split the set of taxa into the smallest subsets. Ideally this value should be an integrated assessment that optimizes both the length of the identification path and the reliability of the diagnosis, or optimizes these parameters for the most common taxa. For a correct understanding of the characters it is preferable to rely not on their textual descriptions but on illustrative figures. When the number of characters is large, it is worthwhile to split them into groups and to let the user work with the characters of one group. When a state is being selected within a character, illustrations are even more necessary. Useful additional facilities at this point are multiple selection (indicating several states at once when the user is not confident in selecting a single state) and special marking of those states that do not occur in the taxa of the current set. Selecting such a state is usually a mistake, and it is better to warn the user.
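As a rough illustration of ranking characters by diagnostic value, the sketch below scores each character by the expected number of taxa that would remain after it is answered; the smaller the score, the better the character splits the set. This particular measure is an assumption chosen for simplicity, not the formula used by any specific program, and the names `expected_remaining` and `rank_characters` and the toy matrix are hypothetical. Every taxon is assumed to be scored for every character.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data matrix: taxon -> character -> set of states the taxon may show.
Matrix = Dict[str, Dict[str, Set[str]]]


def expected_remaining(matrix: Matrix, taxa: Set[str], character: str) -> float:
    """Expected size of the reduced set if this character is answered.

    Each state is weighted by the share of taxon-state records it accounts for,
    which is an assumed stand-in for the probability of observing that state.
    """
    counts: Dict[str, int] = {}
    for taxon in taxa:
        for state in matrix[taxon].get(character, set()):
            counts[state] = counts.get(state, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return float(len(taxa))  # character not scored: it separates nothing
    return sum(n * n for n in counts.values()) / total


def rank_characters(matrix: Matrix, taxa: Set[str]) -> List[Tuple[str, float]]:
    """Characters in descending order of diagnostic value (ascending score)."""
    characters = sorted({c for t in taxa for c in matrix[t]})
    scored = [(c, expected_remaining(matrix, taxa, c)) for c in characters]
    return sorted(scored, key=lambda pair: pair[1])


# Toy matrix with invented taxa and characters, for illustration only.
DEMO: Matrix = {
    "Taxon A": {"antennae": {"a"}, "legs": {"x"}, "size": {"small"}},
    "Taxon B": {"antennae": {"b"}, "legs": {"x"}, "size": {"small"}},
    "Taxon C": {"antennae": {"c"}, "legs": {"y"}, "size": {"small"}},
    "Taxon D": {"antennae": {"d"}, "legs": {"y"}, "size": {"small"}},
}

for name, score in rank_characters(DEMO, set(DEMO)):
    print(name, score)
```

A character that gives every taxon its own state is listed first, and a character with the same state in all taxa is listed last. An integrated assessment of the kind described above would also have to weigh how reliably each character can be observed, which this sketch ignores.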
At the final stage of a step the program usually reduces the current set of taxa, leaving only those taxa that may have the entered state of the character. If two or more taxa remain, the program should re-evaluate the available characters and pass to the next step. More promising is another approach, in which the initial set of taxa is not reduced; instead, at each step the program recalculates the probability that the "image" of the specimen, accumulated over the previous steps, belongs to each of the taxa. This approach makes a correct identification possible even when the user has made a mistake in some characters. After a step is completed the user should learn its result without additional effort, i.e. see the number of taxa remaining in the reduced set or (in the second approach) the list of taxa with the highest probability of conforming to the entered characters. In the latter case, when working with large keys (for dozens or hundreds of taxa), the user should be able to see not all taxa but only those whose probability is not below a certain threshold. It is desirable to let the user view images of the taxa in the current set; sometimes this significantly facilitates completion of the diagnosis. When the user discovers a mistake in characters already entered, the program should allow "going back" by one or several steps to correct the error. After the diagnosis is completed, it is desirable to give the user as complete a set of information on the taxon as possible, necessarily including additional differentiating characters for verification and figures (both of the whole organism and of major structural details).
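The second approach, in which no taxon is ever discarded, can be sketched with a simple multiplicative score. This is a naive illustration, not the method of any particular program: the constant `error_rate` (an assumed chance that an entered state is wrong), the function names and the toy matrix are all hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical data matrix: taxon -> character -> set of states the taxon may show.
Matrix = Dict[str, Dict[str, Set[str]]]


def start_scores(matrix: Matrix) -> Dict[str, float]:
    """Before any character is entered, all taxa are treated as equally probable."""
    return {taxon: 1.0 / len(matrix) for taxon in matrix}


def update_scores(scores: Dict[str, float], matrix: Matrix, character: str,
                  state: str, error_rate: float = 0.1) -> Dict[str, float]:
    """Recalculate probabilities after one answer without discarding any taxon.

    A taxon that disagrees with the answer is penalised, not eliminated, so a
    single wrong answer can be outweighed by later correct ones.
    """
    updated = {}
    for taxon, p in scores.items():
        agrees = state in matrix[taxon].get(character, set())
        updated[taxon] = p * ((1.0 - error_rate) if agrees else error_rate)
    total = sum(updated.values())
    return {taxon: p / total for taxon, p in updated.items()}


def likely_taxa(scores: Dict[str, float], threshold: float) -> List[str]:
    """Taxa whose probability is not below the threshold, most probable first."""
    kept = [taxon for taxon, p in scores.items() if p >= threshold]
    return sorted(kept, key=lambda taxon: scores[taxon], reverse=True)


# Toy matrix with invented taxa and characters, for illustration only.
DEMO: Matrix = {
    "Taxon A": {"c1": {"x"}, "c2": {"x"}, "c3": {"x"}, "c4": {"x"}},
    "Taxon B": {"c1": {"y"}, "c2": {"x"}, "c3": {"y"}, "c4": {"y"}},
    "Taxon C": {"c1": {"y"}, "c2": {"y"}, "c3": {"y"}, "c4": {"y"}},
}
# The specimen is Taxon A, but the first answer is entered wrongly ("y", not "x").
ANSWERS = [("c1", "y"), ("c2", "x"), ("c3", "x"), ("c4", "x")]

scores = start_scores(DEMO)
for character, state in ANSWERS:
    scores = update_scores(scores, DEMO, character, state)
print(likely_taxa(scores, threshold=0.05))
```

With strict elimination the wrong first answer would have removed Taxon A for good; here it merely lowers its score, and the three later answers bring it back to the top of the list.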
This hypothetical optimum variant can be formulated as a "multientry polytomous dialogue stepwise computer key that makes wide use of high-quality colour images of taxa and their characters, with computer ranking of characters at each step of the identification and a set of modes for improving the reliability of the identification".
Of particular interest are keys that have appeared in recent years and use the opportunities provided by the protocols of the Internet. Monoentry keys, which are easily implemented by means of hypertext, have the simplest design. Even with a minimum set of HTML tags one can build a sufficiently convenient and efficient key. The use of JavaScript and applets allows such a key to be supplemented with very useful options. Unfortunately, few attempts have been made to construct a true multientry key for use on the Internet that works with a standard database, e.g. through the CGI or ODBC interface. But the rapid development of tools for this global network gives hope that interesting new implementations of multientry keys will appear in the near future.
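To show how little HTML a hypertext monoentry key needs, the sketch below generates a one-page bracket key in which each couplet is a paragraph with an anchor and each alternative either links to another couplet or names a taxon. The couplet data, the function name `key_to_html` and the page layout are assumptions made for illustration only.

```python
from html import escape
from typing import Dict, List, Tuple, Union

# Hypothetical bracket key: couplet number -> list of (statement, target).
# A target is either the number of the next couplet or the name of a taxon.
Couplets = Dict[int, List[Tuple[str, Union[int, str]]]]

COUPLETS: Couplets = {
    1: [("Wings present", 2), ("Wings absent", "Taxon C")],
    2: [("Body red", "Taxon A"), ("Body black", "Taxon B")],
}


def key_to_html(couplets: Couplets, title: str = "Key") -> str:
    """Render the key as one HTML page using only a handful of basic tags."""
    parts = [
        f"<html><head><title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for number in sorted(couplets):
        # The id attribute is the anchor that other couplets link to.
        parts.append(f'<p id="c{number}"><b>{number}.</b><br>')
        for statement, target in couplets[number]:
            if isinstance(target, int):
                link = f'<a href="#c{target}">go to {target}</a>'
            else:
                link = f"<i>{escape(target)}</i>"
            parts.append(f"{escape(statement)} ... {link}<br>")
        parts.append("</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


page = key_to_html(COUPLETS, title="Key to three invented taxa")
print(page)
```

Saving the returned string to a file and opening it in a browser gives a key that is followed by clicking the links. A multientry key cannot be built this way, because the set of remaining taxa has to be recomputed after each answer.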
An analysis of the latest achievements of computer diagnostics leads to the conclusion that modern interactive computer keys are in no way inferior to the best traditional paper editions, and that they have so many advantages that in efficiency, accessibility to non-specialists in systematics and reliability of identification they are superior to printed keys by an order of magnitude. Undoubtedly, the accumulation of diagnostic information in standard databases or in DELTA format for subsequent use in interactive multientry keys will in the near future become a standard procedure for systematists in botany and zoology.
Бутаков Е.А., Лелеков С.Г. Диалоговая система определения объектов на основе графического интерфейса // Взаимодействие человека с компьютером. Доклады 1-го Московского Международного семинара, Москва, 5-9 августа 1991. - М.: ICSTI, 1991. С. 353-357.
Бутаков Е.А., Лелеков С.Г. и др. Компьютерное определение личинок рыб Черного моря // Вопросы ихтиологии. 1995. Т. 35, N 1. С. 43-47.
Dallwitz M.J., Paine T.A. User's Guide to DELTA System. A general system for coding taxonomic description // Division of Entomology. Report N 13 (Third Edition). 1986. Canberra: CSIRO. 80 p.
Дианов М.Б., Лобанов А.Л. PICKEY - program for identification of organisms by interactive use of images // Базы данных и компьютерная графика в зоологических исследованиях (Труды Зоологического института, т. 269). 1997. С. 35-39.
Draper S.R., Keefe P.D. Machine vision for the characterization and identification of cultivars // Plant Varieties and Seeds. 1989. Vol. 2, N 1. P. 52-62.
Estep K.W., Sluys R., Syvertsen E.E. "Linnaeus" and beyond: workshop report on multimedia tools for the identification and database storage of biodiversity // Hydrobiologia. 1993. Vol. 269/270. P. 519-525.
Fdez-Valdivia J. et al. Line detection and texture analysis for automatic nematode identification // J. Nematology. 1992. Vol. 24, N 4. P. 571-577.
Galtsova V.V., Kulangieva L.V. Expert system for identification of freeliving nematodes // 9-th International Meiofauna Conference. Perpignan, France. 1995. P. 62.
Galtsova V.V., Starobogatov Ya.I. et al. The conceptual scheme of expert system for taxonomy of invertebrates (with special reference to nematodes and mollusks) // Towards a regional ETI Branch in St. Petersburg. 1995. P. 21-22.
Katsimis C., Poularikas A.D. Pattern recognition of zooplancton images using circular sampling technique // Proc. SPIE, 1986, vol. 596. P. 207-211.
Кискин П.Х., Печерская И.H., Печерский Ю.H. Автоматизация диагностического поиска сортов винограда на ЭВМ "Минск-1" // Виноделие и виноградарство СССР. 1965. N 1. С. 21-22.
Лобанов А.Л. Логический анализ и классификация существующих форм диагностических ключей // Энтомол. обозр. 1972. Т. 51, N 3. С. 668-681.
Лобанов А.Л. Оценка диагностической ценности рядов признаков в многовходовых определителях, рассчитанных на использование ЭВМ // Тезисы докладов VI Коми республиканской молодежной научной конф. Сыктывкар. 1974. С. 125-126.
Лобанов А.Л. Результаты экспериментов с биологическими диагностическими системами на базе ЭВМ "Hаири-С" // Биологические исследования на Северо-Востоке Европейской части СССР (Ежегодник Института биологии Коми филиала АH СССР). Сыктывкар. 1975а. С. 162-168.
Лобанов А.Л. Математический аппарат для расчета, оценки и сравнения конструктивных параметров диагностических ключей // Зоол. ж. 1975б. Т. 54, вып. 4. С. 485-497.
Лобанов А.Л. Принципы построения определителей насекомых с использованием электронных вычислительных машин. - Автореферат диссертации на соискание ученой степени канд. биол. наук. Л.: ЗИH АH СССР, 1983. 19 c.
Лобанов А.Л., Дианов М.Б. Диалоговая компьютерная диагностическая система BIKEY и возможности ее использования в энтомологии // Энтомол. обозр. 1994. Т. 73, вып. 2. С. 465-478.
Лобанов А.Л. Dialogue PC-based biological identification systems BIKEY5 and BIKEY6 // Базы данных и компьютерная графика в зоологических исследованиях (Труды Зоологического института, т. 269). 1997a. С. 61-65.
Лобанов А.Л. Biological identification with computers: 30 years of evolution // Компьютерные базы данных в ботанических исследованиях. Сборник научных трудов. 1997b. С. 51-55.
Лобанов А.Л., Степаньянц С.Д. Оптимизация определителя семейств медуз подотряда Filifera (Hydrozoa, Athecata) с помощью компьютерной диагностической системы. // Морской планктон. Систематика, экология, распределение. 2 (Исследования фауны морей. Вып. 45 (53)). 1993. С. 38-50.
Лобанов А.Л., Степаньянц С.Д., Дианов М.Б. BIKEY - диалоговая компьютерная программа для определения биологических объектов и ее использование в диагностике книдарий // Книдарии. Современное состояние и перспективы исследований. 2 (Труды Зоологического института. Т. 261). 1995. С. 20-70.
Lobanov A.L., Stepanjants S.D., Dianov M.B. Dialogue computer system BIKEY as applied to diagnostics of Cnidaria (illustrated an example of hydroids of the genus Symplectoscyphus) // Scientia Marina (Special volume: Advances in Hydrozoan Biology; S.Piraino, J.Bouillon et al. (eds.)). 1996. Vol. 60, N 1. P. 211-220.
Lobanov A.L., Schilow W.F., Nikritin L.M. Zur Anwendung von Computern für die Determination in der Entomologie // Deutsch. Entomol. Z., 1981, N.F. 28, Heft 1-3. S. 29-43.
Miller M.G., Day E.R. Interactive taxonomy: name that bug in three touches or less // Amer. Entomol. 1990. Vol. 36, N 3. P. 219-224.
Morse L.E. Construction of identification keys by computer // Amer. J. of Botan. 1968. Vol. 55, N 6. P. 737.
Morse L.E. Computer programs for specimen identification, key construction and description printing using taxonomic data matrices // Publ. Mus. Michig. State Univ., Biol. Ser. 1974. Vol. 5, N 1. P. 1-128.
Morse L.E. Recent advances in the theory and practice of biological specimen identification // Biological Identification with Computers. London. 1975. P. 11-52.
Pankhurst R.J. A computer program for generating diagnostic keys // Computer J. 1970. Vol. 13, N 2. P. 145-151.
Pankhurst R.J. (ed.). Biological Identification with Computers. - London: Academic Press, 1975. 333 p.
Pankhurst R.J. Biological Identification. The principles and practice of identification methods in biology. - London: Edward Arnold, 1978. 104 p.
Pankhurst R.J. Practical taxonomic computing. - Cambridge: Cambridge University Press, 1991. 202 p.
Payne R.W., Preece D.A. Identification keys and diagnostic tables: a review // Journ. of the Royal Statist. Soc., Series A. 1980. Vol. 143, N 3. P. 253-292.
Свиридов А.В. Проблема соотношения биологической диагностики и систематики // Журн. общ. биол. 1973. Т. 34, N 6. С. 900-906.
Свиридов А.В. Материалы по истории методов диагностики биологических объектов // Hаучн. докл. высш. школы. Биол.науки. 1976. N 8. С. 7-22.
Свиридов А.В. О некоторых актуальных вопросах теории идентификации биологических объектов с помощью ключей // Научн. докл. высшей школы. Биол. науки. 1978. N 10. С. 15-28.
Свиридов А.В. Ключи в биологической систематике: теория и практика. - М.: Изд-во МГУ. 1994. 224 с.
Свиридов А.В. Типы биодиагностических ключей и их применение. - М. : Зоомузей МГУ, 1994. 1-112 с.
Sviridov A.V., Leuschner D. Optimization of taxonomic keys by means of probabilistic modelling // Biometr. Journal. 1986. Vol. 28, N 5. P. 609-616.
A.L. Lobanov, January 2003