Masterfile

1. Scanned book

A scan of the whole book can be found at archive.org :

Agheyisi, Rebecca N. - An Ẹdo-English dictionary (1986)

2. Digitized dictionary: masterfile

The masterfile contains the dictionary entries in a raw format from which other formats (e.g. HTML, EPUB, database) can be generated.

The masterfile is a UTF-8 text file sandwiched between <pre> and </pre> HTML tags and completed with the HTML code needed to make it readable by an HTML reader. Masterfile for the dictionary:

There are 3894 entries in the dictionary. Each entry is one line in the masterfile. The longest line in the masterfile is 1852 bytes (entry for rhie).
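The longest-line figure can be re-checked with a few lines of Python. This is only a sketch: the function name longest_line is hypothetical, it assumes the masterfile text has already been read into a string, and it measures each line in UTF-8 bytes without the trailing line feed (this document does not say whether the line feed is counted).

```python
def longest_line(text: str) -> tuple[int, str]:
    """Return (length in UTF-8 bytes, line) for the longest line of text."""
    lines = text.split("\n")
    # Length is measured in bytes, not characters: accented Edo letters
    # take more than one byte in UTF-8.
    line = max(lines, key=lambda candidate: len(candidate.encode("utf-8")))
    return len(line.encode("utf-8")), line
```

If the assumptions hold, running this on the full masterfile should point at the entry for rhie.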

This masterfile cannot be read from inside the epub-reader. Unzip the epub file to access it. It is a Linux text file.

3. Masterfile format

In the following 'masterfile' refers to the text of the masterfile without the HTML additions.
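As an illustration of how the HTML additions might be stripped, here is a minimal sketch. It assumes the file contains a single <pre> block and that the text inside it is not HTML-escaped; neither point is confirmed here, and the function name extract_masterfile_text is hypothetical.

```python
import re

# Matches the first <pre ...> ... </pre> block, across line breaks.
PRE_BLOCK = re.compile(r"<pre[^>]*>(.*?)</pre>", re.DOTALL | re.IGNORECASE)


def extract_masterfile_text(html_source: str) -> str:
    """Return the text between the first <pre> and its closing </pre>."""
    match = PRE_BLOCK.search(html_source)
    if match is None:
        raise ValueError("no <pre>...</pre> block found")
    return match.group(1)
```

If the file turned out to use HTML entities inside the <pre> block, html.unescape from the standard library would have to be applied to the result.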

The masterfile is a text file in Linux format: each line ends with a single 'Line Feed' (ASCII 0x0A) rather than the 'Carriage Return' (ASCII 0x0D) + 'Line Feed' (ASCII 0x0A) pair used in DOS format.
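The line-ending convention can be checked on the raw bytes of the file. The sketch below uses two hypothetical helper names; it treats any carriage return byte as a sign of DOS format, which is an assumption rather than something stated above.

```python
def has_only_unix_line_endings(data: bytes) -> bool:
    """True if the data contains no carriage return (0x0D) byte."""
    return b"\r" not in data


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert DOS line endings (CR + LF) to Linux line endings (LF)."""
    return data.replace(b"\r\n", b"\n")
```

The file should be opened in binary mode for this check, since text mode may translate line endings silently.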

The masterfile contains characters that do not appear in the dictionary itself. These are the characters: '$', '£', '#', '\', '%', '|', '§', '_', '^'. These characters carry formatting information.

Each page of the dictionary (pages 1-169 of the book) is identified by a page-marker: '$Page-nnn$' where 'nnn' is '001',...,'169'. These page-markers appear in the masterfile before the entries in the page. The first page-marker appears as the first line in the masterfile. The other page-markers are appended to the last entry of the previous page.
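Page-markers of this form are easy to locate with a regular expression. The following sketch lists the page numbers found in a text and reports any that are missing; the names PAGE_MARKER, page_numbers and missing_pages are hypothetical, and the default of 169 pages is taken from the paragraph above.

```python
import re

# '$Page-nnn$' with exactly three digits.
PAGE_MARKER = re.compile(r"\$Page-(\d{3})\$")


def page_numbers(text: str) -> list[int]:
    """Return the page numbers of all page-markers, in order of appearance."""
    return [int(m.group(1)) for m in PAGE_MARKER.finditer(text)]


def missing_pages(text: str, last_page: int = 169) -> list[int]:
    """Return the pages in 1..last_page that have no page-marker in text."""
    found = set(page_numbers(text))
    return [n for n in range(1, last_page + 1) if n not in found]
```

Because the search is done on the whole text, it finds both the marker on the first line and the markers appended to the end of an entry.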

The end of a letter in the alphabet is indicated by markers '£BB£', '£DD£', etc. These markers are appended to the last entry of the previous letter.

Each entry in the masterfile starts with '# '. Sometimes a column in the dictionary starts in the middle of an entry e.g. '$Page-001$'. This is marked with '#-' and occurs 194 times.
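The two markers described above can be counted as a sanity check. This is a sketch under assumptions: it counts an entry as a line that starts with '# ', and counts '#-' wherever it occurs in the text, since the exact position of '#-' within a line is not specified here. The function name count_entry_markers is hypothetical.

```python
def count_entry_markers(text: str) -> dict[str, int]:
    """Count entry starts ('# ') and mid-entry column starts ('#-')."""
    lines = text.split("\n")
    return {
        # Lines that begin a dictionary entry.
        "entries": sum(1 for line in lines if line.startswith("# ")),
        # Places where a new column begins in the middle of an entry.
        "mid_entry_column_starts": text.count("#-"),
    }
```

On the full masterfile the two counts would be expected to match the figures given in this document (3894 and 194) if these assumptions hold.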

The end of a column-line is marked with '\'. Sometimes, a column-line ends with a hyphen. These are marked with '-\'. Most of the time the hyphen indicates word-splitting at the end of a line. But in many cases the hyphen is part of the word itself, both in English and in Edo. For example, 'abọ-ukpọn' in the entry for abọ. This occurs 83 times out of 751. These cases are marked with '--\'. The decision whether the hyphen is part of the word or not has been made by examining each case individually.
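When generating running text, the three column-line markers have to be resolved in a fixed order. The sketch below shows one way to do it; replacing a plain '\' by a space is an assumption (the masterfile may already contain a space next to the marker), and the function name join_column_lines is hypothetical.

```python
def join_column_lines(entry: str) -> str:
    """Resolve the column-line markers '--\\', '-\\' and '\\' in one entry."""
    # Longest marker first, otherwise '--\' would be mistaken for '-\'.
    entry = entry.replace("--\\", "-")  # hyphen is part of the word: keep it
    entry = entry.replace("-\\", "")    # hyphen only splits the word: drop it
    entry = entry.replace("\\", " ")    # plain line end: assumed to be a space
    return entry
```

For example, 'abọ--\ukpọn' becomes 'abọ-ukpọn', while a word split as 'diction-\ary' becomes 'dictionary'.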

The end of the left column is marked with '%%'. The end of the right column is marked with the page-marker.

The dictionary contains many examples consisting of an Edo phrase and its English translation, separated by spaces and/or the '―' character. These examples are enclosed by '|'. The end of the Edo part and the beginning of the English part are each marked with '§'. For example: |Aan, vbua kha hẹẹ?§ ―§\“What did you say?”| (entry for aan).
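The example markup can be split into its Edo and English parts with a regular expression. This is a minimal sketch: it assumes every example contains exactly two '§' characters, it only strips a column-line marker at the very start of the English part, and the names EXAMPLE and extract_examples are hypothetical.

```python
import re

# |<Edo part>§<separator>§<English part>|
EXAMPLE = re.compile(r"\|([^|§]*)§([^|§]*)§([^|]*)\|")


def extract_examples(entry: str) -> list[tuple[str, str]]:
    """Return (Edo phrase, English translation) pairs found in an entry."""
    pairs = []
    for match in EXAMPLE.finditer(entry):
        edo = match.group(1).strip()
        # Drop a leading column-line marker '\' and surrounding spaces.
        english = match.group(3).lstrip("\\ ").strip()
        pairs.append((edo, english))
    return pairs
```

Applied to the example from the entry for aan, this yields the Edo question and its quoted English translation as one pair. Column-line markers inside either part are left untouched.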

Words in italics are enclosed by '_'.
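When generating HTML, the italic markers can be turned into tags. The sketch below assumes that '_' always occurs in pairs within one entry and that <i> is the desired target tag; both are assumptions, and the name italics_to_html is hypothetical.

```python
import re

# Text between two underscores, with no underscore inside.
ITALIC = re.compile(r"_([^_]+)_")


def italics_to_html(text: str) -> str:
    """Replace _word_ by <i>word</i>."""
    return ITALIC.sub(r"<i>\1</i>", text)
```

An unpaired underscore is left as it is, which makes such cases easy to spot in the output.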

The text of the dictionary has been digitized as it appears in the book, including possible errors. In some cases, however, errors have been corrected. To convert the masterfile into another format, it has to be parsed. Matching opening and closing parentheses, brackets and quotes was needed to simplify parsing, so missing opening or closing parentheses, brackets or quotes have been added. In a few cases, missing numbers for the different meanings of an entry have also been added. These modifications are marked with '^' and were made 145 times.

On each line, after the grammatical category, there is an expression such as {SS::}, {SS:la:act:}, {SM::} or {SM:fo:be:}, except for four entries. For these four entries the expression {MM::} appears just before the grammatical categories.

All words in the dictionary except four have a single grammatical category. The four words kẹkan, koko1, kherhekherhe and vbuyẹvbuyẹ have two categories.

3291 words have a single meaning. 603 words have several meanings. These meanings are normally numbered 1., 2., 3., ... In some cases missing numbers have been added because these numbers are used for the layout of the entries.

SS means 'Single' grammatical category, 'Single' meaning. SM means 'Single' grammatical category, 'Multiple' meanings. MM means 'Multiple' grammatical categories, 'Multiple' meanings. These tags are used for the layout of the entries.

The letters between colons, when present, denote semantic classes.

SS and SM appear after the grammatical category. MM appears before the grammatical categories, after the phonetics.
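The tags described above have a regular shape: a two-letter layout code, then zero or more semantic class codes, all separated by colons. The following sketch parses the first tag on a line; the names TAG, KIND_MEANING and parse_layout_tag are hypothetical, and the pattern is inferred only from the examples shown in this document.

```python
import re

# '{SS::}', '{SS:la:act:}', '{SM:fo:be:}', '{MM::}', ...
TAG = re.compile(r"\{(SS|SM|MM):([a-z:]*):\}")

KIND_MEANING = {
    "SS": "single grammatical category, single meaning",
    "SM": "single grammatical category, multiple meanings",
    "MM": "multiple grammatical categories, multiple meanings",
}


def parse_layout_tag(line: str):
    """Return (layout code, list of semantic class codes), or None."""
    match = TAG.search(line)
    if match is None:
        return None
    kind = match.group(1)
    # An empty class part, as in '{SS::}', means no semantic class.
    classes = match.group(2).split(":") if match.group(2) else []
    return kind, classes
```

For the tag {SS:la:act:} this gives the layout code SS with the classes la and act; for {SM::} it gives SM with an empty class list.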

2037 entries of the 3894, mostly nouns, have been assigned a semantic class.

:act: activity
:an: animal
:be: body-external
:bf: body-fluid
:bi: body-internal
:bp: body-property
:bs: body-sickness
:bu: building
:bird: bird
:cl: clothing
:com: communication
:ec: economy
:em: emotion
:fa: family
:fo: food
:gr: greeting
:hh: household
:ho: house
:in: insect
:int: interjection
:la: language
:loc: location
:mu: music
:nu: numerical
:ob: object
:pe: person
:pl: plant
:pn: proper-name
:pr: profession
:re: religion
:st: state
:time: time
:to: tool
:trans: transport
:unit: unit
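For programs that process the masterfile, the list above can be held as a lookup table. The sketch below copies the codes and names exactly as listed; the names SEMANTIC_CLASSES and class_names are hypothetical.

```python
# Semantic class codes and their names, as listed in this document.
SEMANTIC_CLASSES = {
    "act": "activity", "an": "animal", "be": "body-external",
    "bf": "body-fluid", "bi": "body-internal", "bp": "body-property",
    "bs": "body-sickness", "bu": "building", "bird": "bird",
    "cl": "clothing", "com": "communication", "ec": "economy",
    "em": "emotion", "fa": "family", "fo": "food", "gr": "greeting",
    "hh": "household", "ho": "house", "in": "insect",
    "int": "interjection", "la": "language", "loc": "location",
    "mu": "music", "nu": "numerical", "ob": "object", "pe": "person",
    "pl": "plant", "pn": "proper-name", "pr": "profession",
    "re": "religion", "st": "state", "time": "time", "to": "tool",
    "trans": "transport", "unit": "unit",
}


def class_names(codes: list[str]) -> list[str]:
    """Translate class codes into names; unknown codes are kept as-is."""
    return [SEMANTIC_CLASSES.get(code, code) for code in codes]
```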

Some entries have two semantic tags. For example, ọta (speech; conversation) has been assigned the classes 'language' and 'activity' ({SS:la:act:}) and ewẹn (1. breast; 2. milk) has been assigned the classes 'food' and 'body-external' ({SM:fo:be:}).

The entries can then be organized by semantic class.
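One possible way to organize entries by semantic class is sketched below. It assumes each entry is available as one line of text containing its tag; the names TAG and group_by_semantic_class are hypothetical. An entry with two classes is listed under both.

```python
import re
from collections import defaultdict

# Captures the class part of tags such as '{SS:la:act:}'.
TAG = re.compile(r"\{(?:SS|SM|MM):([a-z:]*):\}")


def group_by_semantic_class(lines: list[str]) -> dict[str, list[str]]:
    """Map each semantic class code to the entries tagged with it."""
    groups = defaultdict(list)
    for line in lines:
        match = TAG.search(line)
        if match is None or not match.group(1):
            continue  # no tag, or a tag without semantic class
        for code in match.group(1).split(":"):
            groups[code].append(line)
    return dict(groups)
```

Entries without a semantic class are skipped here; a real conversion script might prefer to collect them under a separate key.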

The entries can also be organized by grammatical category.

4. UNICODE

The digitized dictionary uses Unicode characters encoded in UTF-8.

Relevant entries from the UTF-8 encoding table

Unicode characters in the masterfile
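A list of the non-ASCII characters in a text, with their code points and UTF-8 bytes, can be produced with the standard library. This is only a sketch with a hypothetical function name; it says nothing about whether the masterfile uses precomposed or combining characters, since it simply reports whatever code points it finds.

```python
import unicodedata


def non_ascii_inventory(text: str) -> list[tuple[str, str, str, str]]:
    """Return (character, code point, UTF-8 bytes in hex, Unicode name)."""
    rows = []
    for ch in sorted({c for c in text if ord(c) > 127}):
        rows.append((
            ch,
            f"U+{ord(ch):04X}",
            ch.encode("utf-8").hex(" "),  # e.g. 'e1 ba b9'
            unicodedata.name(ch, "UNKNOWN"),
        ))
    return rows
```

The rows are sorted by code point, so the output is stable from one run to the next.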


Last update: 23-06-2025