stylo-ah-online
Use the tool stylo-ah-online to analyse your corpus of texts. It is lightweight, in-browser software for text analysis: tokens, token statistics, distances, and clustering. Usage:
- Set the browser's download location and enable multiple file downloads.
- Use Chrome (at the moment this is the dev environment; other browsers will be supported in a later version).
- First set the configuration, then open the files (multiple selection); opening the files runs the analysis.
- Open the web console to see messages about the progress or errors of the analysis.
- Manual and further information.
- Email to the admin.
- Browse the code on GitHub.


Version: Beta 05.2024-04-25

Naming

Type (Select/give a type, if this is not given by file ending.)
Subject (Name the subject of the file.)
State (Select/give a state, if this is not given by file ending.)
ID (Give the ID of the file.)
Date (The actual date, erase to reset.)
Version (Choose a version of the file.)
Author name (Fill in the name of the author.)
File ending (Provide a file ending.)


(Log entry of naming section.)

Configuration

Config for text analysis

(This will run the analysis again, if you made changes to the settings below.)
(Choose an existing stylo-online configuration file to set the configuration for stylo-online.)

GEN config for SERIAL text analysis

(This will generate config files for each token version (1-3 gram), but leaves the other configuration unchanged.)
(This will generate config files for each counting method, but leaves the other configuration unchanged.)
(This will generate config files for each measure, but leaves the other configuration unchanged.)

Config for stylo-ah-online display

Display size of results (Checked: Just show a sample of the results (1000 token/signs). Not checked: Results will be shown in full length.)

Delete

(This will delete the configuration.)
(This will delete the stored files and the results of the analysis.)
(This will reset stylo-ah-online to start a new analysis.)

Input / Replication

(Just choose the CORPUS FILES, then the selection below will be applied (this starts the analysis). Data in the database will be overwritten.)
(This will RUN the analysis AGAIN, if you made changes to the settings below. Data is taken from the database.)
(Choose MULTIPLE config files to perform multiple analyses on one corpus.)
(Select some data from the database to rerun the analysis on.)

Note


(Log entry for the input section.)

NORMALIZATION

(Please check http://ecomparatio.net/~khk/NORM-DECOMP-DIST/textnorm.html to see some examples of how the selection would work.)

Word masking / stop words

None
Use word masking (Give back the string without stop words.)
Use positive stop word list (Give back the string of only stop words.)
(Check this to apply stop word removal.)
(Choose an existing stop word file (CSV format, divider: ;;).)
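
The two masking modes can be illustrated with a short Python sketch. It is not the tool's own code: the function names `parse_stop_words` and `mask_words` are hypothetical, tokens are assumed to be split on whitespace, and matching is assumed to be case-insensitive. Only the `;;` divider is taken from the description above.

```python
def parse_stop_words(csv_text: str) -> set[str]:
    """Read stop words from text that uses ';;' as divider (line breaks also separate)."""
    entries = csv_text.replace("\n", ";;").split(";;")
    return {entry.strip().lower() for entry in entries if entry.strip()}


def mask_words(text: str, stop_words: set[str], keep_stop_words: bool = False) -> str:
    """Word masking: drop stop words. Positive list: keep only stop words."""
    tokens = text.split()  # assumption: whitespace tokenization
    kept = [t for t in tokens if (t.lower() in stop_words) == keep_stop_words]
    return " ".join(kept)


stops = parse_stop_words("the;;of;;and")
sample = "The history of Rome and the fall of Carthage"
without_stops = mask_words(sample, stops)
only_stops = mask_words(sample, stops, keep_stop_words=True)
print(without_stops)
print(only_stops)
```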

Sign equalization

Disambiguate diacritica
Disambiguate dashes
Text output latin u-v (replaces all u with v)
Text output latin j-i (replaces all j with i)
Iota sub to ad (takes a Greek UTF-8 string and replaces iota subscriptum with iota adscriptum)
Text output trailing sigma uniform (equalizes the trailing sigma)
Text output without diacritics (removes diacritics)
Text output without some signs (deletes some signs unknown to the programmer: †, *, ⋖, #, §, ⁑)
Text output without ligature (takes a string, returns the string with ligatures turned into single letters)
Text output equal case (input a string and get it back in all lowercase letters)
Text output no brackets (input a string and get it back with no brackets)
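
A few of these sign equalizations can be sketched in Python with the standard `unicodedata` module. This is an illustration under assumptions, not the tool's implementation: the function names are hypothetical, the set of dash characters is a guess, and mapping the final sigma to the medial form (rather than the reverse) is assumed.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose (NFD), drop combining marks, recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", bare)


def iota_sub_to_ad(text: str) -> str:
    """Replace the combining iota subscript (U+0345) with a full iota (U+03B9)."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0345", "\u03b9"))


def uniform_sigma(text: str) -> str:
    """Assumption: final sigma (U+03C2) is mapped to medial sigma (U+03C3)."""
    return text.replace("\u03c2", "\u03c3")


def latin_u_to_v(text: str) -> str:
    return text.replace("u", "v").replace("U", "V")


def latin_j_to_i(text: str) -> str:
    return text.replace("j", "i").replace("J", "I")


def uniform_dashes(text: str) -> str:
    """Assumption: these dash-like signs are all mapped to a plain hyphen-minus."""
    for dash in "\u2010\u2011\u2012\u2013\u2014\u2212":
        text = text.replace(dash, "-")
    return text


print(strip_diacritics("λόγος"))
print(uniform_sigma(strip_diacritics("λόγος")))
print(latin_j_to_i(latin_u_to_v("Julius")))
```

The order of the steps matters when several are combined: here diacritics are removed before the sigma is equalized, which is one plausible order among several.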

Markup / Format

Without markup (input a string and get it back with markup (html / xml) removed)
Delete punctuation (takes string and returns the string without punctuation)
Without newline (input string and get it back with linebreaks removed)
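
The three format steps can be sketched in Python as follows. The function names are hypothetical, the tag pattern is a simple regular expression rather than a full HTML/XML parser, and punctuation is assumed to mean every character in a Unicode "P" category; the tool itself may decide these points differently.

```python
import re
import unicodedata


def without_markup(text: str) -> str:
    """Remove anything that looks like an HTML/XML tag (simple pattern, no parser)."""
    return re.sub(r"<[^>]+>", "", text)


def delete_punctuation(text: str) -> str:
    """Drop all characters whose Unicode category starts with 'P' (punctuation)."""
    return "".join(c for c in text if not unicodedata.category(c).startswith("P"))


def without_newline(text: str) -> str:
    """Replace each line break (with surrounding whitespace) by a single space."""
    return re.sub(r"\s*\n\s*", " ", text).strip()


raw = "<p>Arma virumque cano,\nTroiae qui primus ab oris.</p>"
cleaned = without_newline(delete_punctuation(without_markup(raw)))
print(cleaned)
```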

Word level conversions

Elision expansion (elisions will be expanded)
Alpha privativum / copulativum (takes UTF-8 Greek and splits the alpha privativum and copulativum from word forms)
Text output without numbering (takes a string, returns the string without the edition numbering, e.g. [2])
Text output no hyphenation (removes hyphenation)
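
Two of the word level conversions lend themselves to a small Python sketch. It rests on assumptions: edition numbering is taken to be digits in square brackets (as in the `[2]` example), hyphenation is taken to be a hyphen directly before a line break, and the function names are hypothetical.

```python
import re


def without_hyphenation(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line."""
    return re.sub(r"-\s*\n\s*", "", text)


def without_numbering(text: str) -> str:
    """Remove bracketed edition numbers such as [2], plus the space before them."""
    return re.sub(r"\s*\[\d+\]", "", text)


raw_line = "ar-\nma [2] virumque"
joined = without_hyphenation(raw_line)
plain = without_numbering(joined)
print(plain)
```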

Combinations

(Select one of the combined normalization functions (none of the single steps is used).)

Transliteration

(Select one of the transliterations.)

Note


(Log entry for the normalization section.)

FEATURES / DECOMPOSITION / TOKEN

(The word level decomposition and the gram decomposition will be combined. Check http://ecomparatio.net/~khk/NORM-DECOMP-DIST/zerl.html for some examples to see how decomposition will work.)

Word level decomposition

None
Without consonants (string without consonants)
Without vowels (string without vowels)
Small words (string with just small words (stopwords))
Big words (string with just big words (not stopwords))

General N-Gram decomposition

Gram-level
N (gram size; setting it to one gives, for example, word statistics)
M (gap-size, for gap n-gram)
Padding (used for sign level of words)
(check this to set the increment of the gram building to gram size)
Size of vocabulary (used for the byte pair encoding tokenizer)
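
How N, M, the increment option, and padding might interact can be sketched in Python. This is one reading of the options, not the tool's definition: a gap of M is assumed to skip M items between neighbouring gram members, the increment is either 1 or the gram size, and padding is assumed to add N-1 pad signs on each side of a word. The names `ngrams` and `char_ngrams` are hypothetical.

```python
def ngrams(items, n, gap=0, step=1):
    """Build n-grams; 'gap' items are skipped between members, 'step' is the increment."""
    span = n + (n - 1) * gap      # how many input items one gram stretches over
    stride = gap + 1              # distance between two members of a gram
    return [
        tuple(items[i:i + span:stride])
        for i in range(0, len(items) - span + 1, step)
    ]


def char_ngrams(word, n, pad="_"):
    """Sign-level grams of one word, padded with n-1 pad signs on both sides."""
    padded = pad * (n - 1) + word + pad * (n - 1)
    return ["".join(g) for g in ngrams(list(padded), n)]


words = "a b c d e".split()
print(ngrams(words, 2))            # overlapping word bigrams
print(ngrams(words, 2, gap=1))     # gap bigrams, one word skipped
print(ngrams(words, 2, step=2))    # increment equal to gram size
print(char_ngrams("ab", 2))        # padded sign bigrams
```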

Note


(Log entry for the token section.)

SELECTION / COUNTING

(Select the counting method.)

Most frequent token / words

MFW per text (get the most frequent words from each input text)
Min value (position in frequency ordered list)
Max value (position in frequency ordered list)
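
Selecting tokens by their position in the frequency-ordered list can be sketched in Python with `collections.Counter`. The positions are assumed to be 1-based and inclusive, ties are broken alphabetically to keep the result deterministic, and `most_frequent` is a hypothetical name; the tool may rank ties differently.

```python
from collections import Counter


def most_frequent(tokens, min_pos, max_pos):
    """Return (token, count) pairs ranked min_pos..max_pos by descending frequency."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[min_pos - 1:max_pos]


tokens = "et in et ad et in ab".split()
top_two = most_frequent(tokens, 1, 2)
second_and_third = most_frequent(tokens, 2, 3)
print(top_two)
print(second_and_third)
```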

Culling (per corpus)

Min value (percentage of texts in which a token must be present)
Max value (per cent)
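
Culling can be pictured as a filter on the share of texts that contain a token. The Python sketch below assumes both bounds are inclusive and that presence is counted once per text; `cull` is a hypothetical name.

```python
def cull(texts_tokens, min_percent, max_percent):
    """Keep tokens present in at least min_percent and at most max_percent of the texts."""
    token_sets = [set(tokens) for tokens in texts_tokens]
    vocabulary = set().union(*token_sets)
    kept = set()
    for token in vocabulary:
        share = 100 * sum(token in s for s in token_sets) / len(token_sets)
        if min_percent <= share <= max_percent:
            kept.add(token)
    return kept


corpus = [["et", "in"], ["et", "ad"], ["et", "in"], ["et"]]
common = cull(corpus, 50, 100)   # tokens found in at least half of the texts
rare = cull(corpus, 0, 50)       # tokens found in at most half of the texts
print(sorted(common), sorted(rare))
```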

Text length normalization

Compare fractions of texts (the smallest text gives the length; no other method is applied; only on the first run on a corpus).
Fraction length (if a number is provided, then every text will be split into fractions of that size on sign level).
Count of fractions (if a number is provided, then only that number of text fractions or fewer will be used).
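
Splitting on sign level with an optional cap on the number of fractions can be sketched in Python. Whether a shorter remainder at the end is kept or dropped is not stated above; this sketch keeps it, and `split_into_fractions` is a hypothetical name.

```python
def split_into_fractions(text, fraction_length, max_fractions=None):
    """Cut a text into pieces of fraction_length signs; optionally keep only the first few."""
    parts = [
        text[i:i + fraction_length]
        for i in range(0, len(text), fraction_length)
    ]
    if max_fractions is not None:
        parts = parts[:max_fractions]
    return parts


all_parts = split_into_fractions("abcdefgh", 3)
first_two = split_into_fractions("abcdefgh", 3, max_fractions=2)
print(all_parts, first_two)
```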

Note


(Log entry for the counting section.)

MEASURE SELECTION

(Please check http://ecomparatio.net/~khk/measuredisplay to see a description and comparison of the available measures. See http://ecomparatio.net/~khk/NORM-DECOMP-DIST/index.php for some examples.)

Measure selection
Measure order (the order of the measure; an additional parameter for Minkowski, Burrows' Delta, Argamon's linear Delta, Eder's Delta, Argamon's quadratic Delta, Wasserstein 1D, Gower)
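
Two of the named measures can be sketched in Python to show where an order parameter enters. The Minkowski distance takes the order directly (1 gives Manhattan, 2 gives Euclidean). Burrows' Delta is written here in its common textbook form, the mean absolute difference of z-scores, without an order parameter; using the sample standard deviation is an assumption, and the tool's own variants may differ. All function names are hypothetical.

```python
import statistics


def minkowski(a, b, order):
    """Minkowski distance of the given order between two frequency vectors."""
    return sum(abs(x - y) ** order for x, y in zip(a, b)) ** (1 / order)


def z_scores(rows):
    """Standardize each feature (column) across all texts (rows).

    Every feature must vary across the texts, otherwise the deviation is zero.
    """
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    devs = [statistics.stdev(c) for c in columns]  # assumption: sample deviation
    return [[(v - m) / s for v, m, s in zip(row, means, devs)] for row in rows]


def burrows_delta(za, zb):
    """Mean absolute difference of the z-scores of two texts."""
    return sum(abs(x - y) for x, y in zip(za, zb)) / len(za)


# Relative frequencies of two tokens in three texts (made-up illustration values).
frequencies = [[0.10, 0.02], [0.12, 0.01], [0.08, 0.03]]
z = z_scores(frequencies)
print(minkowski([0, 0], [3, 4], 1), minkowski([0, 0], [3, 4], 2))
print(burrows_delta(z[0], z[1]), burrows_delta(z[1], z[2]))
```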

Note


(Log entry for the comparing section.)

CLUSTERING

Cluster method
Hierarchical cluster linkage method
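
The effect of the linkage method can be seen in a naive agglomerative clustering sketch in Python. It works directly on a distance matrix, supports single, complete, and average linkage, and is meant as an illustration only; which linkage methods the tool offers is not listed above, and `agglomerate` is a hypothetical name.

```python
def agglomerate(dist, linkage="average"):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [[i] for i in range(len(dist))]
    merges = []

    def between(c1, c2):
        pairwise = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        return sum(pairwise) / len(pairwise)  # average linkage

    while len(clusters) > 1:
        candidates = [
            (between(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        ]
        distance, a, b = min(candidates)
        merges.append((sorted(clusters[a]), sorted(clusters[b]), distance))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Symmetric distance matrix for three texts (made-up illustration values).
distances = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
for method in ("single", "complete", "average"):
    print(method, agglomerate(distances, method))
```

Texts 0 and 1 are merged first under every method; only the height at which text 2 joins them depends on the linkage.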

Display options

Offset pixels (set the pixel distance for the labels in the visualization; used in distance heatmap and cluster visualization)
Width of diagram (set the width in pixels of the diagram; the space for the labels is not included)
Height of diagram (set the height in pixels of the visualization, space for labels not included)

Note


(Log entry for the cluster section.)

EXPORT

Export Configuration / Presets

Export config as text file
Export stop words as CSV file

Multi file export

Export raw text input (as text file, renamed)
Export normed string (as text file)
Export decomposition (as text file)
Export frequency of token (as CSV file)

Single file result export

Export distance matrix (as text file; usable as gephi import)
Export cluster analysis (as nodes and edges file; for example as gephi import)
Export cluster visualization (as SVG)
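
For orientation, a distance matrix can be turned into an edge table that Gephi's spreadsheet import accepts (columns `Source`, `Target`, `Weight`). The Python sketch below shows one such layout; it is not necessarily the layout of the files this tool exports, and `edges_csv` is a hypothetical name.

```python
import csv
import io


def edges_csv(labels, dist):
    """Write one edge per unordered pair of texts, weighted by their distance."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            writer.writerow([labels[i], labels[j], dist[i][j]])
    return buffer.getvalue()


labels = ["A", "B", "C"]
matrix = [[0, 1, 4], [1, 0, 3], [4, 3, 0]]
table = edges_csv(labels, matrix)
print(table)
```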

Note


(Log entry for the export section.)
University Trier / Ancient History Trier / eAQUA digital resources /