Classical Languages
Some tools to support classics-related subject matter, such as retrieving classical texts and searching through them, analysing grammar, and inflecting words (declensions, conjugations).
The intention, as with the other notebooks in this collection, is to explore ways in which we might create educational resources that are “reproducible with modification” by making available the means of production of various analyses, diagrams, etc., along with the produced resource.
A secondary benefit is that by automating the generation of particular assets or examples, it becomes easier for authors to make use of them, which may open up new teaching lines. A tertiary benefit is that learners may use the same production methods to allow them to explore the topics themselves.
cltk
cltk, the *Classical Language Toolkit*, is a natural language processing (NLP) package designed for use with the languages of Ancient, Classical, and Medieval Eurasia (esp. Greek and Latin). I assume it is based on nltk.
A selection of tutorial notebooks can be found at cltk/tutorials.
cltk provides access to classical texts in a variety of languages, and so offers learners a way to access those texts themselves, provided we can find a reliable index to them or search the metadata provided with them.
The natural language processing tools in the package make it easy to search texts, as well as analyse them in some languages.
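As a rough illustration of what searching a text can look like, the sketch below uses only the Python standard library rather than cltk itself. The `keyword_in_context` helper and the `sample` string are hypothetical names introduced for this example; the Latin lines are taken from Book 1 of the Aeneid.

```python
import re


def keyword_in_context(text, pattern, width=20):
    """Return (left, match, right) tuples for each regex match in text."""
    # Collapse runs of whitespace so that line breaks do not split the context
    flat = " ".join(text.split())
    hits = []
    for m in re.finditer(pattern, flat, flags=re.IGNORECASE):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


sample = (
    "hinc populum late regem belloque superbum\n"
    "venturum excidio Libyae: sic volvere Parcas.\n"
    "Id metuens, veterisque memor Saturnia belli,"
)

# Find word forms that start with "bell" (here: belloque, belli)
for left, word, right in keyword_in_context(sample, r"\bbell\w*"):
    print(f"{left:>20} [{word}] {right}")
```

A plain regular expression like this only matches surface forms; it knows nothing about lemmas or inflection, which is where the language-aware tools in cltk would come in.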
There are also language specific tools, such as a declension generator in Latin, that might be useful for helping check declensions and conjugations, or display particular person/tense combinations for a particular word.
OpenLearn units to explore:
%%capture
# Install cltk (and matplotlib) only if cltk is not already available
try:
    import cltk
except ImportError:
    %pip install matplotlib
    %pip install cltk
Note
Corpora in a wide range of classical languages are available. For a list, see here.
We can obtain a list of available Ancient Greek corpora:
from cltk.data.fetch import FetchCorpus
FetchCorpus('grc').list_corpora # Latin: lat, Greek: grc
['grc_software_tlgu',
'grc_text_perseus',
'phi7',
'tlg',
'greek_proper_names_cltk',
'grc_models_cltk',
'greek_treebank_perseus',
'greek_treebank_gorman',
'greek_lexica_perseus',
'greek_training_set_sentence_cltk',
'greek_word2vec_cltk',
'greek_text_lacus_curtius',
'grc_text_first1kgreek',
'grc_text_tesserae']
Or Latin corpora:
corpus_downloader = FetchCorpus('lat')
corpus_downloader.list_corpora
['lat_text_perseus',
'lat_treebank_perseus',
'lat_text_latin_library',
'phi5',
'phi7',
'latin_proper_names_cltk',
'lat_models_cltk',
'latin_pos_lemmata_cltk',
'latin_treebank_index_thomisticus',
'latin_lexica_perseus',
'latin_training_set_sentence_cltk',
'latin_word2vec_cltk',
'latin_text_antique_digiliblt',
'latin_text_corpus_grammaticorum_latinorum',
'latin_text_poeti_ditalia',
'lat_text_tesserae',
'cltk_lat_lewis_elementary_lexicon']
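The corpus names follow a loose naming pattern, so a list like the one above can be filtered with ordinary string matching. The sketch below hard-codes a few names copied from that output rather than calling cltk, and `filter_corpora` is a hypothetical helper, not part of the cltk API.

```python
def filter_corpora(names, keyword):
    """Return the corpus names that contain keyword as an underscore-separated part."""
    return [name for name in names if keyword in name.split("_")]


# A handful of names copied from the Latin corpus listing
latin_corpora = [
    "lat_text_perseus",
    "lat_treebank_perseus",
    "lat_text_latin_library",
    "latin_lexica_perseus",
    "latin_text_poeti_ditalia",
    "lat_text_tesserae",
]

print(filter_corpora(latin_corpora, "text"))
print(filter_corpora(latin_corpora, "treebank"))
```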
We can download a corpus from the list of available corpora associated with the selected language:
corpus_downloader.import_corpus('lat_text_latin_library')
Note
By default, the data is downloaded to ~/cltk_data.
If we download the Latin Library corpus, we can find the corpus files in:
~/cltk_data/lat/text/lat_text_latin_library/
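One way to build a simple index of a downloaded corpus is to walk its directory. This is a minimal sketch using `pathlib`, assuming the corpus is stored as plain `.txt` files under the path shown above; `list_corpus_files` is a hypothetical helper name, and it returns an empty list if the directory does not exist.

```python
from pathlib import Path


def list_corpus_files(corpus_dir, suffix=".txt"):
    """Return sorted paths, relative to corpus_dir, of files with the given suffix."""
    root = Path(corpus_dir).expanduser()
    if not root.is_dir():
        # Nothing downloaded yet (or the path is wrong)
        return []
    return sorted(
        str(p.relative_to(root)) for p in root.rglob(f"*{suffix}") if p.is_file()
    )


files = list_corpus_files("~/cltk_data/lat/text/lat_text_latin_library")
print(len(files), "files found")
print(files[:5])
```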
from cltk import NLP
# Load the default Pipeline for Latin
cltk_nlp = NLP(language="lat")
𐤀 CLTK version '1.0.14'.
Pipeline for language 'Latin' (ISO: 'lat'): `LatinNormalizeProcess`, `LatinStanzaProcess`, `LatinEmbeddingsProcess`, `StopsProcess`, `LatinNERProcess`, `LatinLexiconProcess`.
cltk_nlp.pipeline.processes
[cltk.alphabet.processes.LatinNormalizeProcess,
cltk.dependency.processes.LatinStanzaProcess,
cltk.embeddings.processes.LatinEmbeddingsProcess,
cltk.stops.processes.StopsProcess,
cltk.ner.processes.LatinNERProcess,
cltk.lexicon.processes.LatinLexiconProcess]
import os

# Path to Book 1 of the Aeneid in the downloaded Latin Library corpus
path = os.path.expanduser("~/cltk_data/lat/text/lat_text_latin_library/vergil/aen1.txt")
with open(path) as f:
    aeneid_1 = f.read()
aeneid_1[1000:1200]
'Tyrias olim quae verteret arces; 20 \nhinc populum late regem belloque superbum \nventurum excidio Libyae: sic volvere Parcas. \nId metuens, veterisque memor Saturnia belli, \nprima quod ad Troiam pro '
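The excerpt above still contains an editorial line number (the `20`) and trailing spaces. A minimal sketch of one way to strip these before further processing is shown below; `strip_line_numbers` is a hypothetical helper, and the assumption that line numbers only ever appear as digits at the end of a line may not hold for every file in the corpus.

```python
import re


def strip_line_numbers(text):
    """Remove trailing whitespace and trailing line numbers from each line."""
    cleaned = []
    for line in text.splitlines():
        line = line.rstrip()
        # Drop a number at the end of the line, with the whitespace before it
        line = re.sub(r"\s+\d+$", "", line)
        cleaned.append(line)
    return "\n".join(cleaned)


# Lines copied from the excerpt printed above
excerpt = (
    "Tyrias olim quae verteret arces; 20 \n"
    "hinc populum late regem belloque superbum \n"
    "venturum excidio Libyae: sic volvere Parcas. \n"
)

print(strip_line_numbers(excerpt))
```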