stylo-ah-online
Use the tool stylo-ah-online to analyse your corpus of texts. It is lightweight, in-browser software for text analysis: tokens, token statistics, distances, and clustering. Usage:
- Set the browser's download location and enable multiple file downloads.
- USE CHROME (currently the development environment; other browsers will be supported in a later version).
- First set the configuration, then open the files (multiple selection); opening the files starts the analysis.
- Open the web console to see messages about the progress or errors of the analysis.
- Manual and further information.
- Email to the admin.
- Browse the code on GitHub.


Version: Beta 05.2024-09-10

Naming (see manual)

Type (Select/give a type, if this is not given by file ending. see manual)
Subject (Name the subject of the file. see manual)
State (Select/give a state, if this is not given by file ending. see manual)
ID (Give the ID of the file. see manual)
Date (The actual date, erase to reset. see manual)
Version (Choose a version of the file. see manual)
Author name (Fill in the name of the author. see manual)
File ending (Provide a file ending. see manual)


(Log entry of naming section.)

Configuration (see manual)

(Please check GitHub to find some template config files.)

Config for text analysis

(This will save the configuration of the whole tool to file.)
(Choose an existing stylo-online configuration file to set the configuration for stylo-online. see manual)
(Choose MULTIPLE config files to perform multiple analyses on one corpus. see manual)

GEN config for SERIAL text analysis (see manual)

(This will generate config files for each token version (1-3 gram), but leaves the other configuration unchanged.)
(This will generate config files for each counting method, but leaves the other configuration unchanged.)
(This will generate config files for each measure, but leaves the other configuration unchanged.)

Config for stylo-ah-online display

Reduce size of intermediate result for display (Checked: show only a sample of the results (1000 tokens/signs). Not checked: intermediate results are shown in full length.)

Delete (see manual)

(This will delete the configuration.)
(This will delete the stored files and the results of the analysis.)
(This will reset stylo-ah-online to start a new analysis.)
(Load the new version of the software, bypassing the browser cache.)

Input (see manual)

(Just choose the CORPUS FILES, then the selection below will be applied (start analysis). Data in the database will be overwritten.)
(This will RUN the analysis AGAIN, if you made changes to the settings below. Data is taken from the database.)
(Select some data from the database to rerun on.)

Note


(Log entry for the input section.)

NORMALIZATION (see manual)

(Please check http://ecomparatio.net/~khk/NORM-DECOMP-DIST/textnorm.html to see some examples of how the selection would work.)

Word masking / stop words (see manual)

None
Use Word masking (Give back the string without stop words.)
Use positive stop word list (Give back the string of only stop words.)
(Check this to apply stop word removal.)
(Choose an existing stop word file (CSV format, divider: ;;).)
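
The two stop word modes can be illustrated with a minimal Python sketch. It is not the tool's own code: the function names are hypothetical, and it assumes the stop word file is a flat list of words separated by ";;" (optionally spread over several lines) and that matching is case-insensitive.

```python
def parse_stop_words(csv_text, divider=";;"):
    """Read a stop word list; assumes a flat list separated by the divider."""
    flat = csv_text.replace("\n", divider)
    return {w.strip().lower() for w in flat.split(divider) if w.strip()}

def mask_stop_words(text, stop_words):
    """Word masking: give back the string without stop words."""
    return " ".join(w for w in text.split() if w.lower() not in stop_words)

def keep_stop_words(text, stop_words):
    """Positive stop word list: give back the string of only stop words."""
    return " ".join(w for w in text.split() if w.lower() in stop_words)

stop_words = parse_stop_words("the;;and;;of")
masked = mask_stop_words("the cat and the dog", stop_words)
only_stop_words = keep_stop_words("the cat and the dog", stop_words)
```

Here `masked` keeps the content words and `only_stop_words` keeps the function words of the same input.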

Sign equalization (see manual)

Disambiguate diacritics
Disambiguate dashes
Text output latin u-v (replace all v with u)
Text output latin j-i (replace all j with i)
Iota sub to ad (takes a Greek UTF-8 string and replaces iota subscriptum with iota adscriptum)
Text output tailing sigma uniform (equalize final sigma)
Text output without diacritics (replace diacritics)
Without modern diacritics (replace diacritics of modern languages; performance issues!)
Text output without some signs (delete some signs unknown to the programmer: †, *, ⋖, #, §, ⁑)
Text output without ligature (takes a string, returns the string with ligatures turned into single letters)
Text output equal case (input a string and get it back with all lowercase letters)
Text output no brackets (input string and get it back with no brackets)
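
Several of the sign equalization steps above can be sketched with Python's standard unicodedata module. This is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, uppercase V and J are assumed to be replaced as well, and diacritics are assumed to be Unicode combining marks (category Mn) after canonical decomposition.

```python
import unicodedata

def latin_u_v(text):
    """Replace all v with u (uppercase handled too; an assumption)."""
    return text.translate(str.maketrans("vV", "uU"))

def latin_j_i(text):
    """Replace all j with i (uppercase handled too; an assumption)."""
    return text.translate(str.maketrans("jJ", "iI"))

def iota_sub_to_ad(text):
    """Turn iota subscriptum (U+0345 after decomposition) into a full iota."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0345", "\u03b9"))

def uniform_sigma(text):
    """Replace final sigma with medial sigma."""
    return text.replace("\u03c2", "\u03c3")

def without_diacritics(text):
    """Drop combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)
```

Note that in this sketch `without_diacritics` also removes an iota subscriptum, so `iota_sub_to_ad` would have to run first if both are wanted.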

Markup / Format (see manual)

Without markup (input a string and get it back with markup (html / xml) removed)
Delete punctuation (takes string and returns the string without punctuation)
Without newline (input string and get it back with linebreaks removed)
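
As a rough illustration of the three format steps, the following Python sketch uses only the standard library. The function names are hypothetical. It assumes that markup means anything between angle brackets, that punctuation means Unicode category P, and that a line break is replaced by a single space; the tool itself may decide these points differently.

```python
import re
import unicodedata

def strip_markup(text):
    """Remove anything that looks like an HTML/XML tag (naive, regex-based)."""
    return re.sub(r"<[^>]+>", "", text)

def delete_punctuation(text):
    """Remove every character whose Unicode category starts with P."""
    return "".join(ch for ch in text
                   if not unicodedata.category(ch).startswith("P"))

def remove_newlines(text):
    """Replace line breaks (and surrounding whitespace) with one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", text).strip()
```

A regex is enough for simple tags but does not handle comments, CDATA sections, or angle brackets inside attribute values.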

Word level conversions (see manual)

Elision expansion (elisions will be expanded)
Alpha privativum / copulativum (takes UTF-8 Greek and splits the alpha privativum and copulativum from word forms)
Text output without numbering (takes a string, returns the string without the edition numbering, e.g. [2])
Text output no hyphenation (removes hyphenation)

Combinations (see manual)

(Select one of the combined normalization functions (none of the single steps is used).)

Transliteration (see manual)

(Select one of the transliterations.)

Note


(Log entry for the normalization section.)

FEATURES / DECOMPOSITION / TOKEN (see manual)

(The word level decomposition and the gram decomposition will be combined. Check http://ecomparatio.net/~khk/NORM-DECOMP-DIST/zerl.html for some examples to see how decomposition will work.)

Word level decomposition (see manual)

None
Without consonants (string without consonants)
Without vowels (string without vowels)
Small words (string with just small words (stopwords))
Big words (string with just big words (not stopwords))

General N-Gram decomposition (see manual)

Gram-level
N (gram size; setting it to one gives, for example, plain word statistics)
M (gap-size, for gap n-gram)
Padding (used for sign level of words)
(Check this to set the increment of gram building to the gram size.)
Size of vocabulary (used for the (not byte) pair encoding tokenizer)
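
The parameters N, M, padding, and increment can be pictured with a short Python sketch. It shows one plausible reading only, and all names are hypothetical: the gap M is assumed to be the number of units skipped between the members of a gram, padding is assumed to add N-1 pad signs on each side of a word, and the increment is the step between the starting positions of consecutive grams.

```python
def ngrams(units, n, step=1):
    """Contiguous n-grams over a sequence of units (words or signs)."""
    return [tuple(units[i:i + n]) for i in range(0, len(units) - n + 1, step)]

def gap_ngrams(units, n, gap):
    """n-grams whose members are separated by `gap` skipped units."""
    span = (n - 1) * (gap + 1) + 1
    return [tuple(units[i:i + span:gap + 1])
            for i in range(len(units) - span + 1)]

def sign_ngrams(word, n, pad="_"):
    """Sign-level n-grams of one word, padded on both sides."""
    padded = pad * (n - 1) + word + pad * (n - 1)
    return ["".join(g) for g in ngrams(padded, n)]

words = "arma virumque cano troiae".split()
overlapping = ngrams(words, 2)            # increment of one
non_overlapping = ngrams(words, 2, step=2)  # increment equal to gram size
```

With N set to one, `ngrams(words, 1)` reduces to the plain word list, which matches the remark on word statistics above.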

Note


(Log entry for the token section.)

SELECTION / COUNTING (see manual)

(Select the counting method.)

Most frequent token / words (see manual)

MFW per text (get the most frequent words from each input text)
Min value (position in frequency ordered list)
Max value (position in frequency ordered list)
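
A minimal Python sketch of selecting tokens by their position in the frequency-ordered list follows. The function name is hypothetical, positions are assumed to be 1-based and inclusive, and ties are broken alphabetically here only to make the result deterministic; the tool may order ties differently.

```python
from collections import Counter

def most_frequent_tokens(tokens, min_pos, max_pos):
    """Tokens at positions min_pos..max_pos of the frequency-ordered list."""
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return ranked[min_pos - 1:max_pos]

tokens = "a b a c b a d".split()
top_two = most_frequent_tokens(tokens, 1, 2)
second_and_third = most_frequent_tokens(tokens, 2, 3)
```

Raising the min value skips the very top of the list, which is one way to leave out the most common function words.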

Culling (per corpus) (see manual)

Min value (percentage of texts in the corpus in which a token is present)
Max value (per cent)
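
Culling can be sketched as a filter on document frequency. The following Python example is an assumption-based illustration with a hypothetical function name: a token is kept if the percentage of texts containing it lies between the min and max value, both inclusive.

```python
from collections import Counter

def cull(texts_tokens, min_pct, max_pct=100):
    """Keep tokens present in min_pct..max_pct per cent of the texts."""
    n_texts = len(texts_tokens)
    doc_freq = Counter(tok for toks in texts_tokens for tok in set(toks))
    return {tok for tok, df in doc_freq.items()
            if min_pct <= 100 * df / n_texts <= max_pct}

corpus = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
common = cull(corpus, 50)     # in at least half of the texts
rare = cull(corpus, 0, 50)    # in at most half of the texts
```

A min value of 100 keeps only tokens that occur in every text of the corpus.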

Text length normalization (see manual)

Compare fractions of texts (the smallest text sets the length; no other method is applied; only on the first run on a corpus).
Fraction length (if a number is provided, then every text will be split into fractions of that size on sign level).
Count of fractions (if a number is provided, then only that number of text fractions or fewer will be used).
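
The fraction options can be pictured with the following Python sketch. It is not the tool's code: the function names are hypothetical, and it assumes that a shorter final fraction is kept and that the fraction count simply truncates the list.

```python
def split_into_fractions(text, length, max_count=None):
    """Split a text into sign-level fractions of the given length."""
    parts = [text[i:i + length] for i in range(0, len(text), length)]
    return parts if max_count is None else parts[:max_count]

def smallest_text_length(texts):
    """Fraction length when the smallest text gives the length."""
    return min(len(t) for t in texts)

texts = ["abcdefgh", "abcd", "abcdef"]
length = smallest_text_length(texts)
fractions = [split_into_fractions(t, length) for t in texts]
```

Comparing fractions of equal length keeps long texts from dominating the frequency counts.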

Note


(Log entry for the counting section.)

MEASURE SELECTION (see manual)

(Please check http://ecomparatio.net/~khk/measuredisplay to see a description and comparison of the available measures. See http://ecomparatio.net/~khk/NORM-DECOMP-DIST/index.php for some examples.)

Measure selection
Measure order (the order of the measure; an additional parameter for minkowski, burrows delta, argamon linear delta, eders delta, argamons quadratic delta, wasserst 1d, gower)
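
Two of the listed measures can be sketched in plain Python. This follows the textbook definitions, not necessarily the tool's code. The function names are hypothetical, and the measure order is assumed here to be the exponent of the Minkowski distance; how the order enters the delta variants is not stated above. Burrows' Delta is taken as the mean absolute difference of z-scores, with the sample standard deviation computed across the corpus (an assumption).

```python
from statistics import mean, stdev

def minkowski(x, y, order=2):
    """Minkowski distance; order 1 is Manhattan, order 2 is Euclidean."""
    return sum(abs(a - b) ** order for a, b in zip(x, y)) ** (1 / order)

def z_scores(profiles):
    """Standardize each feature (column) across all texts (rows)."""
    columns = list(zip(*profiles))
    mus = [mean(c) for c in columns]
    sigmas = [stdev(c) for c in columns]
    return [[(v - m) / s if s else 0.0
             for v, m, s in zip(row, mus, sigmas)]
            for row in profiles]

def burrows_delta(zx, zy):
    """Mean absolute difference of two z-score vectors."""
    return sum(abs(a - b) for a, b in zip(zx, zy)) / len(zx)

# Rows are texts, columns are relative token frequencies (made-up values).
profiles = [[0.10, 0.20], [0.30, 0.20], [0.20, 0.50]]
z = z_scores(profiles)
delta_01 = burrows_delta(z[0], z[1])
```

Features with zero variance across the corpus get a z-score of 0.0 in this sketch, so they do not contribute to the distance.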

Note


(Log entry for the comparing section.)

CLUSTERING (see manual)

Cluster method
Hierarchical cluster linkage method
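
How the linkage method changes a hierarchical clustering can be shown with a small, unoptimized Python sketch. It is a generic agglomerative procedure over a distance matrix, not the tool's implementation; the function name and the three linkage names (single, complete, average) are assumptions.

```python
def agglomerate(dist, labels, linkage="average"):
    """Return the merge steps as (cluster_a, cluster_b, distance) tuples."""
    index = {lab: i for i, lab in enumerate(labels)}
    combine = {"single": min,
               "complete": max,
               "average": lambda ds: sum(ds) / len(ds)}[linkage]

    def cluster_distance(a, b):
        return combine([dist[index[x]][index[y]] for x in a for y in b])

    clusters = [(lab,) for lab in labels]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        d, i, j = min((cluster_distance(a, b), i, j)
                      for i, a in enumerate(clusters)
                      for j, b in enumerate(clusters) if i < j)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

labels = ["A", "B", "C"]
dist = [[0, 1, 4],
        [1, 0, 3],
        [4, 3, 0]]
steps = agglomerate(dist, labels, "single")
```

In this example A and B merge first under every linkage; only the height at which C joins them differs between the methods.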

Display options

Offset pixels (set the pixel distance for the labels in the visualization; used in distance heatmap and cluster visualization)
Width of diagram (set the width in pixels of the diagram; the space for the labels is not included)
Height of diagram (set the height in pixels of the visualization, space for labels not included)

Note


(Log entry for the cluster section.)

EXPORT (see manual)

Export Configuration / Presets

Export config as text file
Export stop words as CSV file

Multi file export

Export raw text input (as text file, renamed)
Export normed string (as text file)
Export decomposition (as text file)
Export frequency of token (Table of frequencies, as CSV file)

Single file result export

Export distance matrix (as text file; usable as gephi import)
Export cluster analysis (as nodes and edges file; for example as gephi import)
Export cluster visualization (as SVG / PNG)
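
For orientation, a distance matrix can be turned into an edge list that Gephi's spreadsheet import accepts. The Python sketch below is an assumption about a suitable layout, not a description of the files this tool writes: it uses the column names Source, Target, and Weight, and the function name is hypothetical.

```python
import csv
import io

def edges_csv(labels, dist):
    """Write the upper triangle of a distance matrix as an edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            writer.writerow([labels[i], labels[j], dist[i][j]])
    return buffer.getvalue()

edge_list = edges_csv(["A", "B", "C"],
                      [[0, 0.5, 0.9],
                       [0.5, 0, 0.4],
                       [0.9, 0.4, 0]])
```

Graph tools usually read a weight as strength of connection, so a distance may need to be converted to a similarity before layout.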

Raster image export


Use PNG (Use PNG image format instead of SVG.)

Note


(Log entry for the export section.)
University Trier / Ancient History Trier / eAQUA digital resources / Imprint /