
What does ... mean?

A

AJAX

Asynchronous JavaScript And XML is a programming concept that exchanges data between browser and server without reloading the entire web page.

Apache license

The Apache License is a free software license from the Apache Software Foundation that contains no copyleft clause.

API

An application programming interface (API) is the part of a software system that is made available so that other systems can communicate with it.

B

Beta Code (Ancient Greek)

Greek Beta Code is a 7-bit-safe encoding of Greek that uses only the US-ASCII character set. Each diacritic is represented by a separate character that follows the letter (exception: with uppercase letters, the diacritics precede the letter). Beta Code has no letter case of its own; uppercase letters are marked by a preceding asterisk * (Greek ἀστερίσκος). Some projects write Beta Code in uppercase letters only (e.g. TLG), others in lowercase letters only (e.g. the Perseus Project). See also: Betacode transcription table ancient Greek. ἀστερίσκος in Beta Code:

a)steri/skos
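
The transcription rules of this entry can be sketched in a few lines of Python. This is a minimal sketch and not a complete Beta Code converter: the function name beta_to_unicode and both mapping tables are assumptions made for illustration, and diacritics that precede an uppercase letter are not handled.

```python
import re
import unicodedata

# Beta Code letters and their Greek counterparts (illustrative subset).
LETTERS = {
    "a": "α", "b": "β", "g": "γ", "d": "δ", "e": "ε", "z": "ζ",
    "h": "η", "q": "θ", "i": "ι", "k": "κ", "l": "λ", "m": "μ",
    "n": "ν", "c": "ξ", "o": "ο", "p": "π", "r": "ρ", "s": "σ",
    "t": "τ", "u": "υ", "f": "φ", "x": "χ", "y": "ψ", "w": "ω",
}

# Beta Code diacritics mapped to Unicode combining marks.
DIACRITICS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex
    "|": "\u0345",   # iota subscript
    "+": "\u0308",   # diaeresis
}


def beta_to_unicode(text):
    """Convert a Beta Code string to precomposed Unicode Greek."""
    out = []
    upper_next = False
    for ch in text.lower():
        if ch == "*":
            # The asterisk marks the next letter as uppercase.
            upper_next = True
        elif ch in LETTERS:
            letter = LETTERS[ch]
            out.append(letter.upper() if upper_next else letter)
            upper_next = False
        elif ch in DIACRITICS:
            out.append(DIACRITICS[ch])
        else:
            out.append(ch)
    # Compose letter + combining marks into single code points.
    composed = unicodedata.normalize("NFC", "".join(out))
    # A sigma that is not followed by another word character is final.
    return re.sub(r"σ(?!\w)", "ς", composed)


print(beta_to_unicode("a)steri/skos"))
```

Lowercasing the input first lets the same function accept the uppercase-only and the lowercase-only convention mentioned in this entry.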

Big Data

Analysis of large amounts of data from various sources with the aim of generating economic value from it.

BT

Benedictus Gotthelf Teubner founded the publishing house, B.G. Teubner, in Leipzig in 1811, where the Bibliotheca scriptorum Graecorum et Romanorum Teubneriana (Bibliotheca Teubneriana), a nearly complete series of scholarly editions of Greek and Latin literature from antiquity to modern times, was published from 1849.

BTL

The Bibliotheca Teubneriana Latina Online provides electronic access to all editions of Latin texts published in the Bibliotheca Teubneriana (without preface or critical apparatus).

C

CC

Creative Commons (CC) is a collection of licenses with which authors can grant rights of use for their works. By combining the rights modules

  • by Attribution
  • nc Non-Commercial
  • nd No Derivatives
  • sa Share Alike

authors can grade the release of a work according to their wishes.

Copyleft

Copyleft is a clause in usage licenses that specifies that all modifications to a work are permitted only if they are distributed under substantially the same license terms.

CSV

The text-based file format CSV (comma-separated values) is a form of DSV (delimiter-separated values). The data is stored in tabular form, i.e. in two dimensions. Each row is a record. Fields are separated by a comma or a semicolon. Parallel passages from TATIANUS (TLG) in CSV format:

example.csv
Original Sentence; Reference; Original Author; Original Publication; Original DC; Author; Publication; DC; Similarity; Dating; Author Name; Author Epiteths; Author ID; AuthorID-WorkID
"Τυρρηνοὶ σάλπιγγα, χαλκεύειν Κύκλωπες, καὶ ἐπιστολὰς συντάσσειν ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"εὗρεν) ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"TATIANUS  Apol.  [1766]";"Oratio ad Graecos, ed. E.J. Goodspeed, Die ältesten  Apologeten. Göttingen: Vandenhoeck & Ruprecht, 1915: 268-305.  (Cod: 10,694: Apol., Orat.)  ";"1T/2/2 to 1T/2/4 (Schema:Chapter/section/line )";"HELLANICUS  Hist.  [0539]";"Fragmenta, FGrH #4, #323a, #601a, #608a, #645a, #687a:  1A:107-152, *6-*8 addenda; 3B:41-50, 732-733; 3C:1-2, 190,  412-414.  fr. 124b (PSI 1173): vol. 1A, p. *6 addenda.  fr. 189 (P. Oxy. 10.1241): vol. 1A, p. 150.  fr. 201 bis (P. Giss. 307v): vol. 1A, p. *7 addenda.  (Pap: 18,331: Hist., Myth.)  ";"1a,4,F/179a/3 to 1a,4,F/179a/4 (Schema:Volume-Jacoby#-F//fragment/line )";67;-450.5;"HELLANICUS ";"Hist. ";"0539";"0539-002"
"Τυρρηνοὶ σάλπιγγα, χαλκεύειν Κύκλωπες, καὶ ἐπιστολὰς συντάσσειν ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"εὗρεν) ἡ Περσῶν ποτε ἡγησαμένη γυνή, καθά ϕησιν Ἑλλάνικος:";"TATIANUS  Apol.  [1766]";"Oratio ad Graecos, ed. E.J. Goodspeed, Die ältesten  Apologeten. Göttingen: Vandenhoeck & Ruprecht, 1915: 268-305.  (Cod: 10,694: Apol., Orat.)  ";"1T/2/2 to 1T/2/4 (Schema:Chapter/section/line )";"HELLANICUS  Hist.  [0539]";"Fragmenta, FGrH #4, #323a, #601a, #608a, #645a, #687a:  1A:107-152, *6-*8 addenda; 3B:41-50, 732-733; 3C:1-2, 190,  412-414.  fr. 124b (PSI 1173): vol. 1A, p. *6 addenda.  fr. 189 (P. Oxy. 10.1241): vol. 1A, p. 150.  fr. 201 bis (P. Giss. 307v): vol. 1A, p. *7 addenda.  (Pap: 18,331: Hist., Myth.)  ";"3c,687a,F/8a/3 to 3c,687a,F/8a/3 (Schema:Volume-Jacoby#-F//fragment/line )";67;-450.5;"HELLANICUS ";"Hist. ";"0539";"0539-002"

CTS

CTS (Canonical Text Services), part of the CITE architecture, is a network-based service for identifying classical texts by means of URNs. CTS URNs consist of five parts separated by colons: urn:cts:ctsNamespace:workIdentifier:passageIdentifier.
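
Splitting such a URN into its five parts might look as follows. This is a minimal sketch: the names CtsUrn and parse_cts_urn are hypothetical, the example URN is an assumption chosen for illustration, and the function only counts the parts without validating their content.

```python
from typing import NamedTuple


class CtsUrn(NamedTuple):
    """The five colon-separated parts of a CTS URN."""
    scheme: str
    protocol: str
    namespace: str
    work: str
    passage: str


def parse_cts_urn(urn):
    """Split a URN at the colons and name the five resulting parts."""
    parts = urn.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated parts, got {len(parts)}")
    return CtsUrn(*parts)


# Illustrative URN (an assumption, not taken from this glossary).
example = parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1")
print(example.namespace, example.work, example.passage)
```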

D

DCB

Digital Classics Books is an open access monograph series that publishes work in ancient studies and related fields in conjunction with the application or development of methods from the Digital Humanities.

DCO

Digital Classics Online is an open access journal that publishes papers from ancient studies and related fields in connection with the application or development of methods from the Digital Humanities.

DNP

Der Neue Pauly. Enzyklopädie der Antike.

See also: RE

DOI

Digital Object Identifiers (DOIs) have been coordinated by the International DOI Foundation (IDF) since 1998. A DOI permanently and uniquely identifies and locates a physical, digital or abstract object. The identifier always starts with 10 and is preceded by doi for identification: doi:10.ORGANISATION/ID.

An example:

Ch. Schubert (ed.): Working Papers Contested Order (NO. 10): Das Portal eAQUA – Neue Methoden in der geisteswissenschaftlichen Forschung V
DOI: http://dx.doi.org/10.11588/ea.2013.2	
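
The structure of a DOI can be checked with a short helper. This is a sketch under stated assumptions: the name split_doi and the list of accepted leading strings are hypothetical, and the check covers only the "starts with 10" rule and the slash between organisation and ID.

```python
# Leading strings that are stripped before the DOI itself (assumed list).
RESOLVER_PREFIXES = ("doi:", "http://dx.doi.org/", "https://doi.org/")


def split_doi(text):
    """Return (organisation prefix, ID suffix) of a DOI string."""
    doi = text.strip()
    for lead in RESOLVER_PREFIXES:
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, slash, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not slash or not suffix:
        raise ValueError(f"not a DOI: {text!r}")
    return prefix, suffix


print(split_doi("http://dx.doi.org/10.11588/ea.2013.2"))
```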

E

Editing distance

Entropy

Entropy in information theory indicates how many bits are needed on average to encode a value of a random variable as an event (as part of a message). The more bits needed, the higher the entropy and the more difficult it is to predict an event.
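
As an illustration, the average number of bits per symbol can be estimated from the relative frequencies in a sample. The sketch below assumes Shannon's definition H = -Σ p · log2(p); the function name shannon_entropy is chosen for this example.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Entropy in bits per symbol, estimated from relative frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(shannon_entropy("aaaa"))  # one symbol only: fully predictable
print(shannon_entropy("aabb"))  # two equally likely symbols
print(shannon_entropy("abcd"))  # four equally likely symbols
```

The more evenly the symbols are distributed, the higher the value and the harder the next symbol is to predict.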

F

FDM

Forschungsdaten-Management (Engl.: Research Data Management)

G

GND

The German National Library makes all of its authority data available through its Linked Data Service as the Gemeinsame Normdatei (GND, Integrated Authority File). The GND incorporates the authority file of persons, which in turn contains all records of the Personennamen der Antike (PAN).

GPL

The GNU General Public License (also GPL or GNU GPL) is a license that allows anyone to use, distribute, study and modify software freely. All programs derived from the software must also be licensed under the terms of the GPL (copyleft).

H

HTML

Hypertext Markup Language is a text-based markup language for the structured representation of content in electronic documents.

J

JPEG

JPEG is the collective term for various image compression methods that the Joint Photographic Experts Group presented as a standard in 1992.

JSON

JavaScript Object Notation is a compact data format designed for transferring data between client and server. Excerpt from the TLG metadata in JSON:

example.json
[
{
"corpora_author_id":2064,
"author":"ACACIUS",
"works":
  [
  {"corpora_work_id":"002","work":"Fragmenta in epistulam ad Romanos (in catenis)"}
  ]
},
{
"corpora_author_id":1832,
"author":"ACESANDER",
"works":
  [
  {"corpora_work_id":"001","work":"Fragmenta "},
  {"corpora_work_id":"002","work":"Fragmentum (P. Oxy. 32.2637)"}
  ]
}
]
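
Reading such metadata could be sketched as follows. The records are copied from the excerpt; enclosing them in square brackets so that they form one valid JSON document, and the function name works_by_author, are assumptions of this example.

```python
import json

# Records from the excerpt, enclosed in [...] to form a single JSON array.
TEXT = """
[
  {"corpora_author_id": 2064, "author": "ACACIUS",
   "works": [
     {"corpora_work_id": "002", "work": "Fragmenta in epistulam ad Romanos (in catenis)"}
   ]},
  {"corpora_author_id": 1832, "author": "ACESANDER",
   "works": [
     {"corpora_work_id": "001", "work": "Fragmenta "},
     {"corpora_work_id": "002", "work": "Fragmentum (P. Oxy. 32.2637)"}
   ]}
]
"""


def works_by_author(text):
    """Map each author name to the list of that author's work titles."""
    return {
        entry["author"]: [work["work"] for work in entry["works"]]
        for entry in json.loads(text)
    }


for author, works in works_by_author(TEXT).items():
    print(author, len(works))
```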

K

KLP

Der Kleine Pauly.

See also: RE

Kookkurrenz (co-occurrence)

In general linguistics, co-occurrence denotes the joint occurrence of two lexical units, e.g. words, within a higher-level segment, e.g. a sentence.
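
Counting such pairs at sentence level might look like this. It is a minimal sketch: the function name cooccurrences is hypothetical, words are split at whitespace only, and each pair is counted at most once per sentence.

```python
from collections import Counter
from itertools import combinations


def cooccurrences(sentences):
    """Count how many sentences contain each unordered pair of words."""
    counts = Counter()
    for sentence in sentences:
        # Sorting makes ("a", "b") and ("b", "a") the same pair.
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))
    return counts


pairs = cooccurrences(["arma virumque cano", "arma cano"])
print(pairs[("arma", "cano")])
```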

L

Lemmatization

Reduction to the basic form of a word, i.e. the form under which the term can be found in a reference work.

Levenshtein Distance

Minimum number of insert, delete and replace operations needed to transform one string into another.

See also: Editing distance for parallel passage search.
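
The distance can be computed with dynamic programming. The following is a common textbook formulation, given here as a sketch rather than as the implementation used by any particular tool; the function name levenshtein is chosen for this example.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and replacements."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string only.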

M

Metadata

Metadata, or metainformation, is data that describes features of other data without being part of that data itself. In a corpus analysis, for example, all bibliographic information is treated as metadata.

MIT License

The MIT License (also X License or X11 License) is a license for software use originating from the Massachusetts Institute of Technology that permits the software to be used, copied, modified, merged, published, distributed, sublicensed, and/or sold, provided that a copyright notice and the permission notice accompany the copies.

ML

A markup language (ML) describes the content of a document or the procedure required to process the data. HTML, XML and LaTeX are markup languages.

See also: Migne Latinus

N

N3

Notation 3 is a formal language that can be used, for example, as syntax for RDF data:

<#Tim Berners-Lee> <#entwickelte> <#N3> .

N-gram

Decomposition of a text into fragments of length N. The fragments can consist of letters, phonemes or words. In computational linguistics, bigrams or trigrams of characters (letters and/or punctuation marks) are common.
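
A sliding window is enough to produce such fragments. In this sketch the function name ngrams is chosen for illustration; it works on any sequence, so a string yields character n-grams and a list of words yields word n-grams.

```python
def ngrams(items, n):
    """Return all contiguous fragments of length n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [items[i:i + n] for i in range(len(items) - n + 1)]


print(ngrams("text", 2))                # character bigrams
print(ngrams("arma virumque cano".split(), 2))  # word bigrams
```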

NER

Named Entity Recognition – Proper name recognition. Terms of a text are assigned to certain classes, e.g. places or persons.

Normalization

In the context of written language, the term normalization is used unspecifically for a bundle of measures that all aim to bring about a uniform, formal and syntactic representation.

P

PAN

Personennamen der Antike is a standardized list of personal names from Greek- and Latin-speaking antiquity, originally published in book form and now available electronically.

See GND

Parser

A parser is a program that analyzes an input and converts it into a format that can be used for further processing.

Persistent Identifier

An artificially assigned characteristic for the unique, permanent identification of a subject/object is called a persistent identifier (persistent ID or PID).

PHI 5

The Latin Library Texts of the Packard Humanities Institute (PHI), version 5.3, is a CD-ROM with Latin full texts and Bible versions up to the second century A.D. The texts can now be viewed online: Classical Latin Texts.

PHI 7

One of the Packard Humanities Institute's (PHI) longest-running projects is a comprehensive database of all ancient Greek inscriptions, published as a licensed CD-ROM under the title: PHI CD ROM #7: Greek Inscriptions. Together with Cornell University and Ohio State University, it makes the corpus available online: Searchable Greek Inscriptions.

PL

PL is the abbreviation for the Patrologia Latina (also ML, for Migne Latinus), the print series edited by Jacques-Paul Migne containing the Latin writings of ecclesiastical writers from the beginnings to the time of Innocent III (1161–1216).

PNG

Portable Network Graphics is a graphics format with lossless compression. It was developed as a free replacement for the Graphics Interchange Format (GIF) and supports transparency via an alpha channel.

PoS

Part-of-Speech Tagging assigns the words of a text to parts of speech.

Probability distribution

The probability distribution is the theoretical counterpart to the empirically determinable frequency distribution. It describes the probabilities with which a random variable assumes its possible values.

PURL

A Persistent Uniform Resource Locator does not refer directly to a resource in the form of a URL, but to a resolver that provides the current Internet URL. DOIs and URNs are alternatives.

R

RE

RE is the abbreviation of Paulys Realencyclopädie der classischen Altertumswissenschaft (also called Pauly-Wissowa). This encyclopedia of antiquity was published from 1893 to 1987 and was conceived as a complete new edition of the so-called “Ur-Pauly”, the Real-Encyclopädie der classischen Alterthumswissenschaft (1837–1864) founded by August Friedrich Pauly. The RE consists of 68 half volumes, 15 supplement volumes and an index of the addenda and supplements.

A compact edition, affordable for private individuals as well, appeared between 1964 and 1975 as the five-volume Der Kleine Pauly (KlP).

Der Neue Pauly. Enzyklopädie der Antike (DNP, occasionally also NP) has been published by J. B. Metzler Verlag since 1996. In addition to classical antiquity as the main focus, the New Pauly has also published volumes on the history of reception and science.

Resolver

In computer science, a resolver is generally software that performs name resolution. A link resolver resolves metadata, e.g. in the form of a URN, into local inventory data and provides the matching hyperlink.

RDA

Resource Description and Access designates a new standard for cataloguing resources in libraries, archives and museums as the successor to the Anglo-American Cataloguing Rules (AACR2).

RDF

The Resource Description Framework was developed by the World Wide Web Consortium (W3C) to describe metadata. It is now considered an essential component of the so-called semantic Web. Statements in the RDF model are formed as triples of subject, predicate, and object, mostly in the form of XML or N3.

S

Significance

In statistics, significance is a measure of the probability of a systematic relationship between variables.

Similar-Text

An algorithm that calculates the similarity of two texts based on characters and using the editing distance: $sim = \frac{2 \cdot n_{ab}}{n_a + n_b}$ 1).
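
The formula can be tried out with the sketch below. It rests on assumptions: n_ab is taken to be the number of matching characters, found by repeatedly locating the longest common substring and recursing on the parts to its left and right, and n_a and n_b are taken to be the lengths of the two texts. The function names are hypothetical.

```python
def common_chars(a, b):
    """Count matching characters via longest common substrings."""
    best_len = best_i = best_j = 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best_len:
                best_len, best_i, best_j = k, i, j
    if best_len == 0:
        return 0
    # Recurse on the parts before and after the common substring.
    left = common_chars(a[:best_i], b[:best_j])
    right = common_chars(a[best_i + best_len:], b[best_j + best_len:])
    return best_len + left + right


def similarity(a, b):
    """sim = 2 * n_ab / (n_a + n_b), defined as 0.0 for two empty texts."""
    if not a and not b:
        return 0.0
    return 2 * common_chars(a, b) / (len(a) + len(b))


print(common_chars("World", "Word"), similarity("World", "Word"))
```

The result lies between 0 (no common characters) and 1 (identical texts).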

SQL

Database language for relational databases. SQL (commonly expanded as Structured Query Language) distinguishes three categories of commands:

  • Data Manipulation Language (DML)
  • Data Definition Language (DDL)
  • Data Control Language (DCL).
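
The first two categories can be tried out with the sqlite3 module from the Python standard library. This is a sketch: the table author and its rows are invented for the example, and SQLite does not support DCL statements such as GRANT or REVOKE, so that category appears only in a comment.

```python
import sqlite3


def demo():
    """Run one DDL statement and a few DML statements in memory."""
    connection = sqlite3.connect(":memory:")
    try:
        # DDL: define the structure of the data.
        connection.execute(
            "CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )
        # DML: insert and query records.
        connection.execute(
            "INSERT INTO author (id, name) VALUES (?, ?)", (539, "HELLANICUS")
        )
        connection.execute(
            "INSERT INTO author (id, name) VALUES (?, ?)", (1766, "TATIANUS")
        )
        rows = connection.execute(
            "SELECT id, name FROM author ORDER BY id"
        ).fetchall()
        # DCL (e.g. GRANT, REVOKE) manages access rights on database
        # servers; SQLite has no such statements.
        return rows
    finally:
        connection.close()


print(demo())
```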

Stop words

A list of words that should not be included when processing a text.
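
Filtering with such a list is a simple membership test. In this sketch the list STOP_WORDS is a tiny hypothetical sample, not a recommended stop word list.

```python
# Hypothetical sample; real lists depend on language and task.
STOP_WORDS = {"the", "a", "of", "and", "in"}


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only tokens that are not in the stop word list."""
    return [token for token in tokens if token.lower() not in stop_words]


print(remove_stop_words("The fall of Byzantium".split()))
```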

SVG

Scalable Vector Graphics is based on XML and describes two-dimensional vector graphics.

See XML

T

TEI

The document format of the same name, developed by the Text Encoding Initiative, is based on XML in the current version P5 and has become the de facto standard for encoding printed works in the humanities.

See XML

TIFF

Tagged Image File Format is an image file format used especially for high-resolution images in printable, lossless quality.

TLG

The Thesaurus Linguae Graecae® (TLG®) is a research program established in 1972 at the University of California, Irvine. It collected and digitized most of the literary texts in Greek from Homer (from about the 8th century B.C.) to the fall of Byzantium (A.D. 1453). Initially, the texts were distributed on CD-ROM. They are now available online: TLG - Home.

Tokenization

In computational linguistics, tokenization refers to the decomposition of a text into segments at the word level.
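
A very simple tokenizer can be built on a regular expression. This is a sketch with a deliberately naive rule: every run of word characters counts as one token, so punctuation is dropped and words are split at apostrophes and hyphens. The function name tokenize is chosen for this example.

```python
import re


def tokenize(text):
    """Split a text into runs of word characters."""
    return re.findall(r"\w+", text)


print(tokenize("Arma virumque cano, Troiae qui primus ab oris."))
```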

TSV

The text-based file format TSV (tab-separated values) is a form of DSV (delimiter-separated values). The data is stored in tabular form, i.e. in two dimensions. Each row is a record. Fields are separated by tab characters.

See CSV
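
Both DSV variants can be read with the csv module from the Python standard library by changing only the delimiter. This is a sketch: the function name read_dsv and the two sample strings are assumptions made for this example.

```python
import csv
import io

# Hypothetical samples with identical content and different delimiters.
TSV_SAMPLE = "Author\tSimilarity\tDating\nHELLANICUS\t67\t-450.5\n"
CSV_SAMPLE = 'Author;Similarity;Dating\n"HELLANICUS";67;-450.5\n'


def read_dsv(text, delimiter):
    """Parse delimiter-separated text into one dict per record."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


print(read_dsv(TSV_SAMPLE, "\t"))
print(read_dsv(CSV_SAMPLE, ";"))
```

All values are returned as strings; converting fields such as Dating to numbers is left to the caller.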

U

URI

According to RFC 1630 by T. Berners-Lee (1994), URI originally stood for Universal Resource Identifier; today it is read as Uniform Resource Identifier. A URI identifies an abstract or physical resource and can consist of five parts, of which only the scheme and the path are mandatory: scheme:[//authority]path[?query][#fragment].
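
The five parts can be inspected with urlsplit from the Python standard library. In this sketch the function name uri_components and the example.org address are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def uri_components(uri):
    """Return the five URI parts as a dictionary."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


print(uri_components("https://example.org/path/page?query=1#section"))
print(uri_components("mailto:max.mustermann@example.org"))
```

The mailto example has no authority, query or fragment; only the scheme and the path are filled.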

URL

A Uniform Resource Locator identifies a resource by the access method to be used. For example, the eAQUA website is reached via http://www.eaqua.net, and an email address is addressed with the scheme mailto, as in mailto:max.mustermann@example.org.

URN

Publications can be cited permanently and reliably on the Web by using unique, location-independent identifiers, URNs (Uniform Resource Names), instead of URLs. URNs are URIs with the scheme urn:namespace:namespace-specific-part, e.g. urn:nbn:de:101-2012121200 for the work “Policy für die Vergabe von URNs im Namensraum urn:nbn:de (Version 1.0, Stand: 29. November 2012)” of the German National Library.

UTF

Unicode Transformation Format. Characters are mapped to a sequence of bytes for the purpose of electronic processing. Common encoding methods are

  • UTF-8 – Between 1 and 4 bytes per character. The code points 0 to 127, which correspond to the ASCII character set, need only seven bits and are stored in a single byte whose eighth bit is 0. A set eighth bit marks a multi-byte sequence, in which the first byte is followed by 1 to 3 further bytes. UTF-8 stores Latin characters most efficiently.
  • UTF-16 – One or two 16-bit units (2 or 4 bytes) are used to encode a character.
  • UTF-32 – Always encodes 32 bits (4 bytes). Easiest to handle due to the fixed length, but requires more memory.
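
The different lengths can be observed directly in Python. This sketch uses the little-endian codec variants so that no byte order mark is added to the count; the function name encoded_lengths and the three sample characters are chosen for illustration.

```python
def encoded_lengths(char):
    """Number of bytes a character occupies in each encoding."""
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),
        "utf-32": len(char.encode("utf-32-le")),
    }


print(encoded_lengths("a"))           # ASCII letter
print(encoded_lengths("\u1f00"))      # Greek alpha with smooth breathing
print(encoded_lengths("\U00010140"))  # code point beyond U+FFFF
```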

W

Word stem reduction

Also called stemming, stem form reduction, or normal form reduction. Different morphological variants of a word are traced back to their common stem.

X

XLS

Binary file format of Microsoft Excel, which was exclusively in use until 2007.

XML

Extensible Markup Language is a markup language for representing structured data in text form. It is mainly used as an exchange format between different computer systems.

Beginning of a TEI-XML document from the Perseus Digital Library:

<?xml version="1.0"?>
<!DOCTYPE TEI.2
  PUBLIC "-//TEI P4//DTD Main DTD Driver File//EN" "http://www.tei-c.org/Guidelines/DTD/tei2.dtd" [
<!ENTITY % TEI.XML "INCLUDE">
<!ENTITY % PersProse PUBLIC "-//Perseus P4//DTD Perseus Prose//EN" "http://www.perseus.tufts.edu/DTD/1.0/PersProse.dtd" >
%PersProse;
]>
<TEI.2>
   <teiHeader type="text" status="new">
      <fileDesc>
         <titleStmt>
            <title>De liberis educandis</title>
            <title type="sub">Machine readable text</title>
            <author n="Plut.">Plutarch</author>
            <editor role="editor" n="Teubner">Gregorius N. Bernardakis</editor>&responsibility;&fund.NEH;</titleStmt>

W

W3C

The World Wide Web Consortium standardizes technologies for the World Wide Web. It was founded in 1994 at MIT.

Z

Zipf's law

The law states that if the types of a text are ordered by their frequency f and each is assigned a rank r, the product of f and r is approximately the same constant k for every type: f · r ≈ k.
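
The product of rank and frequency can be tabulated for any token list. This is a sketch: the function name rank_frequency is hypothetical, ties in frequency are broken alphabetically, and on short texts the products are only roughly constant.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, type, frequency, rank * frequency) for each type."""
    counts = Counter(tokens)
    # Most frequent first; equal frequencies in alphabetical order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ranked, start=1)
    ]


for row in rank_frequency("a a a a b b c".split()):
    print(row)
```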

1)
[OLIVER 93] Oliver, Ian. Programming Classics: Implementing the World's Best Algorithms. Prentice Hall PTR, New York, 1993.