What does ... mean?



Asynchronous JavaScript And XML is a programming concept that exchanges data between browser and server without reloading the entire web page.

Apache license

The Apache License is a free software license from the Apache Software Foundation that does not contain a copyleft clause.


An application programming interface (API) is the part of a software system that is made available so that other systems can communicate with it.


Beta code ancient Greek

Greek Beta Code is a 7-bit-safe encoding of ancient Greek that uses only the US-ASCII character set. Each diacritic is represented by a separate character that follows its letter (exception: with uppercase letters, the diacritics precede the letter). Beta Code does not rely on ASCII letter case; Greek uppercase letters are marked by a preceding asterisk * (Greek ἀστερίσκος). Some projects write Beta Code only in upper case (e.g. TLG), others only in lower case (e.g. the Perseus Project). See also: Betacode transcription table ancient Greek. ἀστερίσκος in lower-case Beta Code: a)steri/skos
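
As an illustration of these rules, the following Python sketch converts Beta Code into Unicode Greek. The function name beta_to_unicode and the restriction to the basic letters and the most common diacritics are assumptions made for this example; it is not a complete implementation of Beta Code.

```python
import re
import unicodedata

# Basic Beta Code letters (lower-case convention) and their Greek counterparts.
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ", "h": "η",
    "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ", "n": "ν", "c": "ξ",
    "o": "ο", "p": "π", "r": "ρ", "s": "σ", "t": "τ", "u": "υ", "f": "φ",
    "x": "χ", "y": "ψ", "w": "ω",
}

# Diacritic signs mapped to Unicode combining characters.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(beta):
    """Convert a Beta Code string to precomposed Unicode Greek (simplified)."""
    out = []
    pending = []      # diacritics that precede an uppercase letter
    upper = False     # True after an asterisk, until the next letter
    for ch in beta.lower():
        if ch == "*":
            upper = True
        elif ch in DIACRITICS:
            (pending if upper else out).append(DIACRITICS[ch])
        elif ch in LETTERS:
            letter = LETTERS[ch].upper() if upper else LETTERS[ch]
            out.append(letter)
            out.extend(pending)
            pending = []
            upper = False
        else:
            out.append(ch)
    text = unicodedata.normalize("NFC", "".join(out))
    # A sigma that is not followed by another word character is word-final.
    return re.sub(r"σ(?!\w)", "ς", text)
```

With this sketch, beta_to_unicode("a)steri/skos") yields ἀστερίσκος, and the upper-case spelling A)STERI/SKOS gives the same result because ASCII case is ignored.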


Big Data

Analysis of large amounts of data from various sources with the aim of generating economic value from it.


Benedictus Gotthelf Teubner founded the publishing house, B.G. Teubner, in Leipzig in 1811, where the Bibliotheca scriptorum Graecorum et Romanorum Teubneriana (Bibliotheca Teubneriana), a nearly complete series of scholarly editions of Greek and Latin literature from antiquity to modern times, was published from 1849.


The Bibliotheca Teubneriana Latina Online provides electronic access to all editions of Latin texts published in the Bibliotheca Teubneriana (without preface or critical apparatus).



Creative Commons (CC) is a collection of licenses with which authors can grant rights of use for their work. By combining the rights modules

  • by Attribution
  • nc Non-Commercial
  • nd No Derivatives
  • sa Share Alike

the release can be graded according to the author's wishes.


Copyleft is a clause in usage licenses that specifies that all modifications to a work are permitted only if they are distributed under substantially the same license terms.


The text-based file format CSV (comma-separated values) is a form of DSV (delimiter-separated values). The data is stored in tabular, i.e. two-dimensional, form. Each row is a record; fields are separated by commas or semicolons. Parallel passages from TATIANUS (TLG) in CSV format:

Original Sentence; Reference; Original Author; Original Publication; Original DC; Author; Publication; DC; Similarity; Dating; Author Name; Author Epiteths; Author ID; AuthorID-WorkID
"Τυρρηνοὶ σάλπιγγα, χαλκεύειν Κύκλωπες, καὶ ἐπιστολὰς συντάσσειν ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"εὗρεν) ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"TATIANUS  Apol.  [1766]";"Oratio ad Graecos, ed. E.J. Goodspeed, Die ältesten  Apologeten. Göttingen: Vandenhoeck & Ruprecht, 1915: 268-305.  (Cod: 10,694: Apol., Orat.)  ";"1T/2/2 to 1T/2/4 (Schema:Chapter/section/line )";"HELLANICUS  Hist.  [0539]";"Fragmenta, FGrH #4, #323a, #601a, #608a, #645a, #687a:  1A:107-152, *6-*8 addenda; 3B:41-50, 732-733; 3C:1-2, 190,  412-414.  fr. 124b (PSI 1173): vol. 1A, p. *6 addenda.  fr. 189 (P. Oxy. 10.1241): vol. 1A, p. 150.  fr. 201 bis (P. Giss. 307v): vol. 1A, p. *7 addenda.  (Pap: 18,331: Hist., Myth.)  ";"1a,4,F/179a/3 to 1a,4,F/179a/4 (Schema:Volume-Jacoby#-F//fragment/line )";67;-450.5;"HELLANICUS ";"Hist. ";"0539";"0539-002"
"Τυρρηνοὶ σάλπιγγα, χαλκεύειν Κύκλωπες, καὶ ἐπιστολὰς συντάσσειν ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"εὗρεν) ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"TATIANUS  Apol.  [1766]";"Oratio ad Graecos, ed. E.J. Goodspeed, Die ältesten  Apologeten. Göttingen: Vandenhoeck & Ruprecht, 1915: 268-305.  (Cod: 10,694: Apol., Orat.)  ";"1T/2/2 to 1T/2/4 (Schema:Chapter/section/line )";"HELLANICUS  Hist.  [0539]";"Fragmenta, FGrH #4, #323a, #601a, #608a, #645a, #687a:  1A:107-152, *6-*8 addenda; 3B:41-50, 732-733; 3C:1-2, 190,  412-414.  fr. 124b (PSI 1173): vol. 1A, p. *6 addenda.  fr. 189 (P. Oxy. 10.1241): vol. 1A, p. 150.  fr. 201 bis (P. Giss. 307v): vol. 1A, p. *7 addenda.  (Pap: 18,331: Hist., Myth.)  ";"3c,687a,F/8a/3 to 3c,687a,F/8a/3 (Schema:Volume-Jacoby#-F//fragment/line )";67;-450.5;"HELLANICUS ";"Hist. ";"0539";"0539-002"

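A file of this kind can be read with Python's standard csv module. The following is a minimal sketch: the function name read_dsv is an assumption, and the sample is a shortened excerpt of four of the columns shown above, not the full record.

```python
import csv
import io

# Shortened excerpt of the example above: four of the columns, one record.
SAMPLE = (
    "Similarity; Dating; Author Name; Author ID\n"
    '67;-450.5;"HELLANICUS ";"0539"\n'
)


def read_dsv(text, delimiter=";"):
    """Parse delimiter-separated text into a list of dicts keyed by header."""
    reader = csv.DictReader(
        io.StringIO(text),
        delimiter=delimiter,
        skipinitialspace=True,  # the header uses "; " between field names
    )
    # Strip the padding that some quoted fields carry (e.g. "HELLANICUS ").
    return [{key.strip(): value.strip() for key, value in row.items()}
            for row in reader]


rows = read_dsv(SAMPLE)
```

Passing delimiter="," or delimiter="\t" covers comma-separated and tab-separated files in the same way.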

The CTS (Canonical Text Services) notation system, as part of the CITE architecture, provides a network-based service for identifying classical texts based on URNs. CTS URNs are divided into five parts separated by colons: urn:cts:ctsNameSpace:WorkIdentifier:PassageIdentifier.
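
The five-part structure can be taken apart with a few lines of Python. This is a sketch under assumptions: it expects the prefix urn:cts:, the names CtsUrn and parse_cts_urn are made up for the example, and the Iliad URN in the usage note is only an illustration.

```python
from typing import NamedTuple


class CtsUrn(NamedTuple):
    namespace: str   # e.g. a text collection such as greekLit
    work: str        # work identifier
    passage: str     # passage identifier, empty if the URN names a whole work


def parse_cts_urn(urn):
    """Split a CTS URN into namespace, work and passage components."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 else ""
    return CtsUrn(namespace=parts[2], work=parts[3], passage=passage)
```

For example, parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1") returns the namespace greekLit, the work tlg0012.tlg001 and the passage 1.1.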



Digital Classics Books is an open access monograph series that publishes work in ancient studies and related fields in conjunction with the application or development of methods from the Digital Humanities.


Digital Classics Online is an open access journal that publishes papers from ancient studies and related fields in connection with the application or development of methods from the Digital Humanities.


Der Neue Pauly. Enzyklopädie der Antike.

See also: RE


Digital Object Identifiers (DOIs) have been coordinated by the International DOI Foundation (IDF) since 1998. A DOI permanently and uniquely identifies and locates physical, digital and abstract objects. The identifier itself always starts with 10 and is prefixed with doi for identification: doi:10.ORGANISATION/ID.

An example:

Ch. Schubert (ed.): Working Papers Contested Order (NO. 10): Das Portal eAQUA – Neue Methoden in der geisteswissenschaftlichen Forschung V


Editing distance


Entropy in information theory indicates how many bits are needed on average to encode a value of a random variable as an event (as part of a message). The more bits needed, the higher the entropy and the more difficult it is to predict an event.
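
For a random variable whose values occur with probabilities p_i, the entropy is H = -Σ p_i · log2(p_i). The following Python sketch estimates it for a message; treating the relative symbol frequencies as probabilities, and the function name shannon_entropy, are assumptions of this example.

```python
import math
from collections import Counter


def shannon_entropy(message):
    """Average number of bits per symbol, estimated from symbol frequencies."""
    counts = Counter(message)
    total = len(message)
    # H = -sum(p * log2(p)) over all distinct symbols of the message
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

A message that repeats a single symbol has entropy 0 (fully predictable), while four equally frequent symbols need 2 bits per symbol.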



Research data management (German: Forschungsdaten-Management)



All authority data of the German National Library is made available through its Linked Data Service as the Gemeinsame Normdatei (GND, Integrated Authority File). The GND incorporates the former authority file for persons, which in turn contains all records of the Personennamen der Antike (PAN).


The GNU General Public License (also GPL or GNU GPL) is a license that allows software to be used, distributed, studied and modified free of charge. All programs derived from the software must also be licensed under the terms of the GPL (copyleft).



Hypertext Markup Language is a text-based markup language for the structured representation of content in electronic documents.



Various methods of image compression presented by the Joint Photographic Experts Group in 1992 in the form of a standard are summarized under the term JPEG.


JavaScript Object Notation is a compact data format designed for transferring data between client and server. Excerpt of TLG metadata in JSON:

  {"corpora_work_id":"002","work":"Fragmenta in epistulam ad Romanos (in catenis)"}
  {"corpora_work_id":"001","work":"Fragmenta "},
  {"corpora_work_id":"002","work":"Fragmentum (P. Oxy. 32.2637)"}
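
Each line of the excerpt is a JSON object in its own right and can be decoded with Python's standard json module. The sketch below treats the three lines as separate records and removes the trailing comma of the second one; this handling is an assumption, since the excerpt does not show the surrounding array.

```python
import json

# The three lines of the excerpt above, copied unchanged.
LINES = [
    '{"corpora_work_id":"002","work":"Fragmenta in epistulam ad Romanos (in catenis)"}',
    '{"corpora_work_id":"001","work":"Fragmenta "},',
    '{"corpora_work_id":"002","work":"Fragmentum (P. Oxy. 32.2637)"}',
]

# Decode each line on its own; a trailing comma is not part of the object.
records = [json.loads(line.rstrip(",")) for line in LINES]
work_titles = [record["work"] for record in records]
```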



Der Kleine Pauly.

See also: RE


In general linguistics, co-occurrence denotes the joint occurrence of two lexical units, e.g. words, within a higher-level segment, e.g. a sentence.
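
Sentence-level co-occurrences can be counted as in the following Python sketch. Splitting on whitespace, lower-casing and the function name sentence_cooccurrences are simplifying assumptions of the example.

```python
from collections import Counter
from itertools import combinations


def sentence_cooccurrences(sentences):
    """Count how often two different words appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Each word counts once per sentence; sorting makes pairs order-free.
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    return counts
```

The keys of the result are alphabetically ordered word pairs, so the pair ("sleeps", "the") is counted once for every sentence that contains both words.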



Lemmatization is the reduction of a word to its base form, i.e. the form under which the term can be found in a reference work.

Levenshtein Distance

Minimum number of insert, delete and replace operations needed to transform one string into another.

See also: Editing distance for parallel passage search.
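
The distance can be computed with dynamic programming. The following Python sketch keeps only two rows of the usual distance table; the function name levenshtein is chosen for the example.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and replacements from a to b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b), # replace if different
            ))
        previous = current
    return previous[-1]
```

For example, levenshtein("kitten", "sitting") is 3: two replacements and one insertion.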



Metadata, or metainformation, is generally data that contains information about features that are not part of the data itself. In a corpus analysis, for example, all bibliographic information is treated as metadata.

MIT License

The MIT License (also X License or X11 License) is a license for software use originating from the Massachusetts Institute of Technology that permits the software to be used, copied, modified, merged, published, distributed, sublicensed, and/or sold, provided that a copyright notice and the permission notice accompany the copies.


A markup language describes the content of a document or the procedure required to process the data. HTML, XML and LaTeX are markup languages.

See also: Migne Latinus



Notation 3 is a formal language that can be used, for example, as syntax for RDF data:

<#Tim Berners-Lee> <#entwickelte> <#N3> .


Decomposition of a text into fragments of N consecutive units (N-grams). The units can be letters, phonemes or words. Computational linguistics often uses bigrams or trigrams of characters (letters and/or punctuation marks).
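
A minimal Python sketch of this decomposition; the function name ngrams is an assumption. It works on strings (character N-grams) as well as on lists of words.

```python
def ngrams(units, n):
    """Return all runs of n consecutive units, in order of appearance."""
    # A sequence shorter than n yields no N-gram at all.
    return [units[i:i + n] for i in range(len(units) - n + 1)]
```

For example, ngrams("text", 2) gives the character bigrams te, ex and xt.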


Named Entity Recognition – Proper name recognition. Terms of a text are assigned to certain classes, e.g. places or persons.


In the context of written language, the term normalization is used unspecifically for a bundle of measures that all aim to bring about a uniform, formal and syntactic representation.



Personennamen der Antike is a standardized list of personal names from Greek- and Latin-speaking antiquity, originally published in book form and now available electronically.



A parser is a program that analyzes an input and converts it into a format that can be used for further processing.

Persistent Identifier

An artificially assigned characteristic for the unique, permanent identification of a subject/object is called a persistent identifier (persistent ID or PID).


The Latin Library Texts of the Packard Humanities Institute (PHI), version 5.3, is a CD-ROM with Latin full texts and Bible versions up to the second century A.D. The texts can now also be viewed online: Classical Latin Texts.


One of the Packard Humanities Institute's (PHI) longest-running projects is a comprehensive database of all ancient Greek inscriptions, published as a licensed CD-ROM under the title: PHI CD ROM #7: Greek Inscriptions. Together with Cornell University and Ohio State University, it makes the corpus available online: Searchable Greek Inscriptions.


The Patrologia Latina (also ML for Migne Latinus) is the short title of the print series edited by Jacques-Paul Migne, which contains the Latin writings of ecclesiastical writers from the beginnings to the time of Innocent III (1161–1216).


Portable Network Graphics is a graphics format that can compress without loss. It was developed as a free replacement for Graphics Interchange Format (GIF) and supports transparency via alpha channel.


Part-of-Speech Tagging assigns the words of a text to parts of speech.

Probability distribution

The probability distribution is the theoretical counterpart to the empirically determinable frequency distribution. It describes the probabilities with which a random variable assumes its possible values.


A Persistent Uniform Resource Locator does not refer directly to a resource in the form of a URL, but to a resolver that provides the current Internet URL. DOIs and URNs are alternatives.



RE is the abbreviation of Paulys Realencyclopädie der classischen Altertumswissenschaft (also called Pauly-Wissowa). The encyclopedia of antiquity was published from 1893 to 1987 and was conceived as a completely new edition of the so-called “Ur-Pauly”, the Real-Encyclopädie der classischen Alterthumswissenschaft (1837–1864) founded by August Friedrich Pauly. The RE consists of 68 half volumes, 15 supplement volumes and an index of the addenda and supplement volumes.

A compact edition that was also affordable for private individuals, the five-volume Der Kleine Pauly (KlP), appeared between 1964 and 1975.

Der Neue Pauly. Enzyklopädie der Antike (DNP, occasionally also NP) has been published by J. B. Metzler Verlag since 1996. In addition to classical antiquity as its main focus, Der Neue Pauly also includes volumes on the history of reception and scholarship.


In computer science, a resolver generally refers to software for name resolution. A link resolver resolves metadata, e.g. in the form of a URN, into local inventory data and provides the matching hyperlink.


Resource Description and Access designates a new standard for cataloguing resources in libraries, archives and museums as the successor to the Anglo-American Cataloguing Rules (AACR2).


The Resource Description Framework was developed by the World Wide Web Consortium (W3C) to describe metadata. It is now considered an essential component of the so-called semantic Web. Statements in the RDF model are formed as triples of subject, predicate, and object, mostly in the form of XML or N3.



In statistics, significance is a measure of the probability of a systematic relationship between variables.


An algorithm that calculates the similarity of two texts based on characters and using the editing distance [OLIVER 93]: $\mathrm{sim} = \frac{2 \cdot n_{ab}}{n_a + n_b}$.
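
The formula can be tried out with Python's standard difflib module. The sketch below reads n_a and n_b as the lengths of the two texts and n_ab as the number of characters found in common; both readings are assumptions. The matching strategy is that of difflib.SequenceMatcher, used here as a stand-in, and is not necessarily identical to the algorithm described in [OLIVER 93].

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity in [0, 1] computed as 2 * n_ab / (n_a + n_b)."""
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # n_ab: total length of the matching blocks found in both strings
    n_ab = sum(block.size for block in matcher.get_matching_blocks())
    return 2 * n_ab / (len(a) + len(b))
```

For the strings abcd and bcde the common block bcd has three characters, which gives 2 · 3 / (4 + 4) = 0.75.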


Database language for relational databases. SQL (commonly expanded as Structured Query Language) distinguishes three categories of commands:

  • Data Manipulation Language (DML)
  • Data Definition Language (DDL)
  • Data Control Language (DCL).
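
The first two categories can be illustrated with Python's built-in sqlite3 module. The table and column names are hypothetical, and the row reuses the author ID from the CSV example on this page. SQLite has no user management, so DCL statements such as GRANT and REVOKE are not shown.

```python
import sqlite3


def sql_demo():
    """Run one DDL statement and two DML statements on an in-memory database."""
    connection = sqlite3.connect(":memory:")
    # DDL: define the structure of a table.
    connection.execute("CREATE TABLE author (id TEXT PRIMARY KEY, name TEXT)")
    # DML: insert a record, then query it.
    connection.execute("INSERT INTO author VALUES (?, ?)", ("0539", "HELLANICUS"))
    rows = connection.execute("SELECT id, name FROM author").fetchall()
    connection.close()
    return rows
```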

Stop words

A list of words that should not be included when processing a text.
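
Applied to a tokenized text, a stop word list works as a simple filter. The following Python sketch compares case-insensitively, which is an assumption; the function name remove_stop_words is chosen for the example.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens that are not on the stop word list."""
    stop = {word.lower() for word in stop_words}
    return [token for token in tokens if token.lower() not in stop]
```

For example, filtering the tokens The, law, of, Zipf with the stop words the and of leaves law and Zipf.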


Scalable Vector Graphics is based on XML and describes two-dimensional vector graphics.




The document format of the same name, developed by the Text Encoding Initiative, is based on XML in the current version P5 and has become the de facto standard for encoding printed works in the humanities.



Tagged Image File Format is an image file format used especially for high-resolution images in printable, lossless quality.


The Thesaurus Linguae Graecae® (TLG®) is a research program established in 1972 at the University of California, Irvine. It has collected and digitized most of the literary texts in Greek from Homer (from about the 8th century B.C.) to the fall of Byzantium (A.D. 1453). Initially, the texts were distributed on CD-ROM. They are now accessible online: TLG - Home.


In computational linguistics, tokenization refers to the decomposition of a text into segments at the word level.
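
A very simple word-level segmentation can be sketched in Python with a regular expression. Treating every run of letters and digits as one token is an assumption; real tokenizers handle abbreviations, hyphenation and clitics with more care.

```python
import re


def tokenize(text):
    """Split a text into word tokens, dropping punctuation and whitespace."""
    # \w+ matches runs of Unicode letters, digits and underscores.
    return re.findall(r"\w+", text)
```

For example, the title Der Neue Pauly. Enzyklopädie der Antike. is split into six tokens.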


The text-based file format TSV (tab-separated values) is a form of DSV (delimiter-separated values). The data is stored in tabular, i.e. two-dimensional, form. Each row is a record; fields are separated by tab characters.




According to RFC 1630 by T. Berners-Lee from 1994, URI originally stood for Universal Resource Identifier; today it is read as Uniform Resource Identifier. A URI identifies an abstract or physical resource and can consist of five parts, of which only scheme and path are mandatory: scheme://[authority]/path?[query]#[fragment] .
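
The five parts can be inspected with urlsplit from Python's standard library. The host example.org in the usage note is a placeholder, and the function name split_uri is an assumption of this sketch.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return the five URI components as a dictionary."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }
```

For https://example.org/glossary/uri?lang=en#definition all five parts are filled. For urn:nbn:de:101-2012121200 only the scheme (urn) and the path (nbn:de:101-2012121200) are present.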


A Uniform Resource Locator (URL) identifies a resource by the access method to be used. For example, a website such as that of eAQUA is accessed using the scheme http, and an email address is identified using the scheme mailto.


Publications can be cited permanently and reliably on the Web by using unique, location-independent identifiers, URNs (Uniform Resource Names), instead of URLs. URNs are URIs with the scheme urn:namespace:namespace-specific-part, e.g. urn:nbn:de:101-2012121200 for the work “Policy für die Vergabe von URNs im Namensraum urn:nbn:de (Version 1.0, Stand: 29. November 2012)” of the German National Library.


Unicode Transformation Format. Characters are mapped to a sequence of bytes for electronic processing. Common encodings are

  • UTF-8 – Between 1 and 4 bytes per character. The code points 0 to 127, which correspond to the ASCII character set, are encoded in a single byte using seven bits. A set eighth bit marks a multi-byte sequence, in which the first byte is followed by 1-3 further bytes. UTF-8 stores Latin characters most efficiently.
  • UTF-16 – One or two 16-bit units (2 or 4 bytes) are used to encode a character.
  • UTF-32 – Always encodes 32 bits (4 bytes). Easiest to handle due to the fixed length, but requires more memory.
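
The different byte lengths can be checked directly in Python. The function name encoded_lengths is an assumption; the little-endian codec variants are used so that no byte order mark is added to the count.

```python
def encoded_lengths(char):
    """Number of bytes a character occupies in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32": len(char.encode("utf-32-le")),
    }
```

The Latin letter a takes 1, 2 and 4 bytes respectively, whereas the Greek ἀ (U+1F00) takes 3, 2 and 4 bytes.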


Word stem reduction

Also called stemming, stem form reduction, or normal form reduction. Different morphological variants of a word are traced back to their common stem.



Binary file format of Microsoft Excel, which was exclusively in use until 2007.


Extensible Markup Language is a markup language for representing structured data in text form. It is mainly used as an exchange format between different computer systems.

Beginning of a TEI-XML document from the Perseus Digital Library:

<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" "" [
<!ENTITY % PersProse PUBLIC "-//Perseus P4//DTD Perseus Prose//EN" "" >
]>
<TEI.2>
   <teiHeader type="text" status="new">
      <fileDesc>
         <titleStmt>
            <title>De liberis educandis</title>
            <title type="sub">Machine readable text</title>
            <author n="Plut.">Plutarch</author>
            <editor role="editor" n="Teubner">Gregorius N. Bernardakis</editor>&responsibility;&fund.NEH;</titleStmt>



The World Wide Web Consortium standardizes the technologies of the World Wide Web. It was founded in 1994 at MIT.


Zipf's law

The law states that if the types of a text are ordered by their frequency f and each is assigned a rank r, the product of f and r is approximately the same constant k for every type.
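
The products of rank and frequency can be tabulated with a short Python sketch. The function name rank_frequency and the alphabetical ordering of types with equal frequency are assumptions of the example.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, type, frequency, rank * frequency) for every type."""
    counts = Counter(tokens)
    # Most frequent type first; ties are ordered alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]
```

On a real corpus, the last column should stay roughly constant across ranks if the text follows Zipf's law.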

[OLIVER 93] Oliver, Ian: Programming Classics: Implementing the World's Best Algorithms. Prentice Hall PTR, New York, 1993.