--  DOCUMENT IN DEVELOPMENT  --

PROCESSES TO
    DO INFLECTIONS
    PREPARE DICTIONARY ADDITIONS
    UPGRADE LATIN DICTLINE
    CHECK LATIN DICTLINE
    MAINTAIN LATIN DICTLINE
    CHECK DICTLINE FOR ENGLISH SPELLING
    GENERATE WORDS SYSTEM
        PREPARE LATIN DICTIONARY PHASE
        PREPARE ENGLISH DICTIONARY PHASE
    
OTHER FORMS OF DICTIONARY
    DICTPAGE
        Like a paper dictionary
    LISTALL
        All words that DICTLINE and INFLECTS can generate
          For spellcheckers
          Will not catch ADDONS and TRICKS words

TOOLS

CHECK.ADB
DUPS.ADB

DICTORD.ADB
FIXORD.ADB
LINEDICT.ADB
LISTORD.ADB

DICTPAGE.ADB

DICTFLAG.ADB

INVERT.ADB
INVSTEMS.ADB

ONERS.ADB

CCC.ADB

SLASH.ADB
PATCH.ADB

SORTER.ADB
    
-------------------  DO INFLECTIONS  ----------------------

INFLECTS.LAT contains the inflections in human-readable form
with comments, and in a useful order.
This is the input for MAKEINFL, which produces INFLECTS.SEC.


(LINE_INF uses INFLECTS.LAT input to produce INFLECTS.LIN,
clean and ordered, but still readable.

Run            

        LINE_INF


which produces
    INFLECTS.LIN
and INFLECTS.SEC)


----------------------------------------------------------
------------PREPARE  DICTIONARY  ADDITIONS----------------
----------------------------------------------------------

This process is to prepare a submission of new dictionary entries
for inclusion in DICTLINE.  The normal starting point is a text file
in DICTLINE (LIN) form, the full entry on one line, spaced appropriately.


The other likely form is an edit file (ED) in which the entry is broken
into three lines

STEMS
PART and TRAN
MEAN

For this form, spacing is not important, as long as there are spaces
separating individual elements.  

This is transformed into LIN form by the program LINEDICT
LINEDICT.IN (ED form) -> LINEDICT.OUT (LIN form)


The inverse of this, LIN to ED, is useful to produce a more easily
editable file (3 lines per entry so it is all on one screen)
LISTDICT.IN (LIN DICTLINE form) -> LISTDICT.OUT (ED form)
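
The LIN to ED direction can be sketched in a few lines.  This is only an
illustration, not the actual tool.  The column positions are taken from the
SORTER parameters used elsewhere in these notes (STEMS 1/75, PART 77/24,
TRAN 101/10, MEAN 111/80); the function names are hypothetical.

```python
# Column layout (1-based start, width), as given by the SORTER parameters
# in this document.  Function names are made up for the illustration.
FIELDS = {"STEMS": (1, 75), "PART": (77, 24), "TRAN": (101, 10), "MEAN": (111, 80)}


def split_lin(line):
    """Cut one LIN-form line into its four fields, trailing blanks removed."""
    text = line.rstrip("\n")
    return {name: text[start - 1:start - 1 + width].rstrip()
            for name, (start, width) in FIELDS.items()}


def lin_to_ed(line):
    """Return the three ED-form lines: STEMS, PART and TRAN, MEAN."""
    f = split_lin(line)
    stems = " ".join(f["STEMS"].split())                    # spacing is not important in ED
    part_tran = " ".join((f["PART"] + " " + f["TRAN"]).split())
    return [stems, part_tran, f["MEAN"]]
```

Going the other way (ED to LIN) needs the exact stem spacing inside
columns 1..75, which is not spelled out here, so it is left to LINEDICT.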

Having a LIN form, one can create a DICTLINE.SPE and do checking on that.

Besides running CHECK to validate syntax, one can run DICTORD and create
a file in which leading words are in dictionary entry form.  One can then
run this against the existing WORDS and DICTLINE to check for overlap. 

DICTORD makes # file in long format 
DICTORD.IN -> DICTORD.OUT  
Takes DICTLINE form, puts # and dictionary form at beginning.

This file can be sorted to produce word order of paper dictionary.

SORTER on (1 300) (with or without U for I/J U/V conversion)

One can then run WORDS against this file using DEV (!) parameters
DO_ONLY_INITIAL_WORD and FOR_WORD_LIST_CHECK, 
and (#) parameters 
HAVE_OUTPUT_FILE, WRITE_OUTPUT_TO_FILE, WRITE_UNKNOWNS_TO_FILE
The output provides a check on whether the new submissions
are duplicated in the existing dictionary and, even if the forms match,
whether the meanings are the same.

After editorial review in light of the WORDS run, the new submission
is ready for inclusion by the usual process with CHECK and SPELLCHECK.



----------------------------------------------------------
----------------UPGRADE  DICTIONARY ----------------------
----------------------------------------------------------

This is a variation of the additions process.

This process is to prepare a section of DICTLINE for upgrade.
A section (about 100 entries) is extracted and ordered alphabetically.
It is then put in a form for convenient editing and compared to
the OLD and L+S.  Entries are checked and additions are made.
The edit form is returned to DICTLINE form and inserted in
place of the extracted section.

Much the same process is involved in preparing an independent submission
of new entries.



DICTORD makes # file in long format
DICTORD.IN -> DICTORD.OUT  
Takes DICTLINE form, puts # and dictionary form at beginning,
a file that can be sorted to produce word order of paper dictionary

SORTER on (1 300) 

LISTORD    Takes # (DICTORD) long format to ED file
(3 lines per entry so it is all on one screen)
LISTORD.IN -> LISTORD.OUT

Edit 


FIXORD produces clean ED file

LINEDICT makes long format (LINE_DIC/IN/OUT)

----------------------------------------------------------
-------ADDING A BLOCK OF NEW ENTRIES TO DICTIONARY -------
----------------------------------------------------------

This may be in association with the upgrade process or from
a block of new entries submitted by a developer or user.

The format may be strange.  It is usually easiest to reduce/edit
it down to the 3 line ED form, because that has no column restrictions.

From there one does the usual, making LINEDICT format and preparing the addition.

One quirk is that there may be entries duplicate of the current DICTLINE.
This is so even if the supplier was working from and checking his current DICTLINE,
because there may have been later additions to the master.  

While DUPS will catch these, that is a big effort for a full DICTLINE.
One would rather check just the new input.

Take the input and DICTORD.  This gives a format with the dictionary entry
word first.  Run the current WORDS against that with NO FIXES/TRICKS and 
FIRST_WORD and FOR_WORDLIST parameters.  Any entry not reported as UNKNOWN
in the output should be examined.

Then run CHECK and spellcheck the English.

 
----------------------------------------------------------
------------PREPARE  DICTIONARY (DICTLINE) WITH ADDITIONS-----------
----------------------------------------------------------
Save present copies of DICTLINE.GEN, DICTLINE.SPE, DICT.LOC,
and whatever else, in case you foul up and have to redo.

Add DICT.LOC to DICTLINE.GEN

        Copy DICT.LOC   LINEDICT.IN
        Run LINEDICT
      
        Copy LINEDICT.OUT+DICTLINE.GEN   DICTLINE.NEW

Or if there is a SPE that you want to integrate

         COPY DICTLINE.GEN+DICTLINE.SPE  DICTLINE.NEW

Or any other combination.


Sort DICTLINE.NEW in the normal fashion (to check for duplicates)

      SORTER
        DICTLINE.NEW   --  Or whatever you call it
            1 75         --  STEMS  
           77 24   P     --  PART  
          111 80         --  MEAN  --  To order |'s
          101 10         --  TRAN
         DICTLINE.SOR    --  Where to put result
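
For readers without SORTER, the multi-field sort above can be approximated
as follows.  This is a sketch only: it compares the column fields as plain
text in the order listed and ignores the P option on the PART field, whose
exact ordering is not described here.  The names are hypothetical.

```python
# (start, width) pairs, 1-based, in the priority order of the SORTER run:
# STEMS, PART, MEAN, TRAN.
DICTLINE_FIELDS = [(1, 75), (77, 24), (111, 80), (101, 10)]


def column_key(line, fields):
    """Build a sort key from fixed column fields, padding short lines."""
    text = line.rstrip("\n")
    return tuple(text[s - 1:s - 1 + w].ljust(w) for s, w in fields)


def sort_lines(lines, fields=DICTLINE_FIELDS):
    """Return the lines ordered by the given column fields (plain text order)."""
    return sorted(lines, key=lambda ln: column_key(ln, fields))
```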

Check the sort for oddities and any blank lines.
(Look for long/run-on lines.)
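
The check for blank and run-on lines can be done by eye, but a small helper
makes it quicker.  A sketch, assuming a full line is at most 190 columns
(MEAN starts at column 111 and is 80 wide); the function name is made up.

```python
MAX_COLUMNS = 190  # 110 columns before MEAN plus 80 columns of MEAN


def find_oddities(lines):
    """Return (line number, description) for blank or over-long lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            problems.append((number, "blank line"))
        elif len(text.rstrip()) > MAX_COLUMNS:
            problems.append((number, "long or run-on line"))
    return problems
```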

Then run CHECK and examine CHECK.OUT

Run

        CHECK

to produce 
   CHECK.OUT

Examine CHECK.OUT and make any corrections required
(The easiest way is to edit CHECK.IN and rerun as necessary.
Then copy the final CHECK.IN to DICTLINE.)
Errors are cited by line number in CHECK.IN.
Edit examining CHECK.OUT from the bottom, so that changes do not
affect the numbering of the rest of CHECK.IN
CHECK is very fussy.  The hits are primarily warnings to look for
the possibility of error.  Most will not be wrong.  In fact, over 
one percent of correct lines will trigger some warning, more false
positives than real errors.
This makes a full run and edit of DICTLINE a considerable burden.
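
The reason for editing from the bottom is easy to show.  In this sketch
(hypothetical helper, not part of the tool set) each correction is keyed by
the line number cited in CHECK.OUT, and corrections are applied from the
highest number down, so an insertion or deletion never shifts a line that
is still waiting to be fixed.

```python
def apply_edits(lines, edits):
    """Apply edits keyed by 1-based line number, highest number first.

    edits maps a line number to the list of lines that replaces it;
    an empty list deletes the line.
    """
    result = list(lines)
    for number in sorted(edits, reverse=True):
        result[number - 1:number] = edits[number]
    return result
```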


Sort the fixed CHECK.IN again if there have been any changes in order.

Check for duplicates in columns 1..100
(DUPS checks for '|' in column 111 so that it does not give
hits on lines known to be continuations, provided the sort is in order.)

   COPY CHECK.IN DUPS.IN
   Run DUPS
          1 100

Examine DUPS.OUT and fix DUPS.IN (again from the bottom).
Re-sort if necessary.
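
A rough picture of the duplicate check, for those who want to try it on a
small file.  This is an assumption about the behavior, not the DUPS source:
it compares each line with the previous non-continuation line on columns
1..100 and skips lines with '|' in column 111.  It relies on the file
being sorted.

```python
def find_dups(lines, first=1, last=100):
    """Return pairs of line numbers that agree in columns first..last."""
    hits = []
    previous = None  # (line number, key) of the last non-continuation line
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if len(text) >= 111 and text[110] == "|":
            continue  # known continuation line, not a duplicate
        key = text[first - 1:last].ljust(last - first + 1)
        if previous is not None and previous[1] == key:
            hits.append((previous[0], number))
        previous = (number, key)
    return hits
```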

Copy the final product to DICTLINE.GEN
    
This only checks DICTLINE for syntax.

----------------------------------------------------------
----------CHECK DICTLINE FOR ENGLISH SPELLING-------------
----------------------------------------------------------
To check DICTLINE further, one can check the spelling of MEAN.

The fixed format of DICTLINE facilitates this process.
Just running DICTLINE through a spellchecker is impossible,
since all lines contain Latin stems, which will fail not only
an English spellchecker, but a Latin spellchecker as well 
(since they are just stems, not proper words).

The process is to extract the MEAN portion, spellcheck this,
and reassemble, making sure to preserve the exact line order.
I use two personal tools, SLASH and PATCH.

Run SLASH on DICTLINE
SLASH takes a file and cuts it into two, lines or columns.
In this case we want to separate the first 110 columns from the rest.

   SLASH
      c          --  Rows or columns
      110        --  How many in first 
      LEFT.      --  Name of left file
      RIGHT.     --  Name of right file
                 --  Or whatever you want to call them

Save LEFT for later and work on RIGHT, which is only MEANs.
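
SLASH and PATCH are personal tools, so here is a stand-in sketch of the
column split and the later reassembly.  The function names are made up, and
the real tools may differ in detail.  The point is that both lists keep the
original line order, so line N of LEFT always goes back with line N of RIGHT.

```python
def slash_columns(lines, count):
    """Split each line into its first `count` columns and the remainder."""
    left, right = [], []
    for line in lines:
        text = line.rstrip("\n")
        left.append(text[:count])
        right.append(text[count:])
    return left, right


def patch(left, right, count):
    """Rejoin the two halves, padding the left half back to `count` columns."""
    if len(left) != len(right):
        raise ValueError("line counts differ; line order was not preserved")
    return [(l.ljust(count) + r).rstrip() for l, r in zip(left, right)]
```

With count 110, RIGHT holds only the MEAN text and can go to the
spellchecker, provided no lines are added or removed while editing it.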

There is one additional complication.  
Some MEANs have a translation example element [... => ...]
This will contain some Latin (the left half) as well as English.

The rest I do with editors, but I suppose I should make tools.

Introduce 80 blanks in front of any [
SLASH out the first 80 columns, giving the MEAN omitting the []
Spellcheck that
In the [] file, left justify and add 80 blanks before the =
SLASH out the first 80 columns and spellcheck 
Reassemble the three parts of MEAN 
Eliminate blanks, leaving a simple MEAN/RIGHT.
PATCH LEFT. and RIGHT together to give DICTLINE. 
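
The bracket handling could be turned into a tool along these lines.  This is
only a sketch of the extraction step, assuming every example has the form
[Latin => English] with no nested brackets; the names are hypothetical and
the sample text in the usage note is invented.  It does not put the pieces
back together.

```python
import re

# One translation example: text before "=>" is Latin, text after is English.
EXAMPLE = re.compile(r"\[([^\]]*?)=>([^\]]*)\]")


def split_mean(mean):
    """Return (MEAN with the [] examples removed, Latin halves, English halves)."""
    latin, english = [], []

    def keep(match):
        latin.append(match.group(1).strip())
        english.append(match.group(2).strip())
        return ""

    plain = EXAMPLE.sub(keep, mean)
    return plain, latin, english
```

For an invented MEAN such as "water; [aqua vitae => water of life];" the
plain text and the English half would go to the spellchecker, and the Latin
half would be left alone.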





___________________________________________

 To Prepare English Dictionary
__________________________________________

The first part of the following procedure is only for those 
starting from scratch.  If porting with a full package,
EWDSLIST.GEN will be provided and you can skip down.

---------------------------------------------------------

Preparing the dictionary for the English mode also 
involves checks on the syntax of MEAN.

Run MAKEEWDS against DICTLINE.GEN
(There may be some errors cited.  Correct as appropriate.)

This extracts the English words from DICTLINE MEAN (G or S)
Makes EWDSLIST.GEN (or .SPE)

Make sure that if running from DICTLINE.GEN that the extra ESSE line
is added.  If we start from DICTFILE.GEN, it is already in.

 type EWDS_RECORD is 
        record
          W    : EWORD;                       1
          AUX  : AUXWORD;                    40
          N    : INTEGER;                    50
          POFS : PART_OF_SPEECH_TYPE := X;   62
        end record;

Ah                                                         1 INTERJ
Aulus                                                      2 N     
Roman                                                      2 N     
praenomen                                                  2 N     
abbreviated                                                2 N     



__________________________________________________


Sort EWDSLIST making a revised version (same name)

1    24   A
1    24   C
51    6   R
75    2   N  D




(Run ONERS on ONERS.IN if you want to see FREQ)
(Sort ONERS.OUT  1 11 D; 13 99)

_____________________________________________________

If you are supplied with EWDSLIST.GEN as part of a port package,
the above process is not done.

_____________________________________________________


Run MAKE_EWDSFILE against EWDSLIST.GEN
(This also removes some duplicates, entries in which the 
key word appears more than once.)

producing EWDSFILE.GEN

(At present these will act to produce a EWDSFILE.SPE, but
WORDS is not yet set up to use that - only English on GEN for now.)

----------------------------------------------------------
------------PREPARE  WORDS SYSTEM-------------------------
----------------------------------------------------------

If using GNAT, run the following; otherwise compile with your favorite compiler.

gnatmake -O3 words
gnatmake -O3 makedict
gnatmake -O3 makestem
gnatmake -O3 makeewds
gnatmake -O3 makeefil
gnatmake -O3 makeinfl


This produces executables (.EXE files) for 
WORDS
MAKEDICT
MAKESTEM
MAKEEWDS
MAKEEFIL
MAKEINFL 

(You may also need my SORTER to prepare the data if you are modifying data.
gnatmake -O3 sorter)

(If you have modified DICTLINE, run SORTER with the sort fields
            1 75         --  STEMS  
           77 24   P     --  PART
          111 80         --  MEAN
          101 10         --  TRAN
Actually the order of DICTLINE is not important for the programs; 
it is only a convenience for the human user.)


Run MAKEDICT against the DICTLINE.GEN  -  When it asks for dictionary, reply G for GENERAL
This produces DICTFILE.GEN
("against" means that the data file and the program are in the same folder/subdirectory.)

(This assumes that you are using the presorted STEMFILE.GEN 
which comes with distribution and matches that DICTLINE.GEN.
Otherwise make and run WAKEDICT (Identical to MAKEDICT with
PORTING parameter set in source).  This produces DICTFILE.GEN 
and a STEMLIST.GEN, which has to be sorted by SORTER.
MAKE ABSOLUTELY SURE YOU ARE USING THE RIGHT MAKEDICT/WAKEDICT!

Invoke SORTER to sort the stems with I/J and U/V equivalence
and replace initial STEMLIST with the sorted one.

       SORTER
         STEMLIST.GEN    --  Input  
           1    18   U
           20   24   P
           1    18   C
           1    56   A
           58    1   D      
         STEMLIST.GEN    --  Output  

The output file is also STEMLIST.GEN - Enter/CR for the name works.)
(All SORTER parameters are based on the layout of WORDS 1.97E.
Later versions may have further/expanded fields.)
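
The I/J and U/V equivalence can be pictured as a sort key in which J is read
as I and V as U.  This sketch is an assumption about what the U option means,
not the SORTER source, and it covers only that one field; the other fields
in the run above are left out.

```python
# Fold J to I and V to U in both cases before comparing.
_EQUIV = str.maketrans("jJvV", "iIuU")


def stem_key(stem):
    """Return the stem with I/J and U/V treated as the same letters."""
    return stem.translate(_EQUIV)


def sort_stems(stems):
    """Order stems so that, for example, 'jam' sorts as if spelled 'iam'."""
    return sorted(stems, key=stem_key)
```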

Run MAKESTEM against STEMLIST.GEN (with dictionary "G") to produce STEMFILE.GEN and INDXFILE.GEN

The same procedures can generate DICTFILE.SPE and STEMFILE.SPE (input S) 
if there is a SPECIAL dictionary, DICTLINE.SPE


For the English part, if you use the presorted EWDSLIST.GEN 
run MAKEEFIL against it.

(This assumes that you are using the presorted EWDSLIST.GEN 
which comes with distribution and matches that DICTLINE.GEN.
Otherwise make and run MAKEEWDS against DICTLINE.GEN 
This produces EWDSLIST.GEN which has to be sorted by SORTER.
Check the beginning of EWDSLIST with an editor.  
If there are any strange lines, remove them.
Invoke SORTER.  The input file is EWDSLIST.GEN.  
The sort fields are

SORTER
    EWDSLIST.GEN
       1   24   A         --  Main word
       1   24   C         --  Main word for CAPS
      51    6   R         --  Part of Speech  
      72    5   N    D    --  RANK
      58    1   D         --  FREQ
    EWDSLIST.GEN     --  Store 

The output file is also EWDSLIST.GEN - Enter/CR for the name works.)
(For this distribution, there is no facility for English from a SPECIAL dictionary -
there is no D_K field yet)

Run MAKEEFIL against the sorted EWDSLIST.GEN producing EWDSFILE.GEN


Run MAKEINFL against INFLECTS.LAT producing INFLECTS.SEC

Along with ADDONS.LAT and UNIQUES.LAT, 
this is the entire set of data for WORDS.

WORDS.EXE
INFLECTS.SEC
ADDONS.LAT
UNIQUES.LAT
DICTFILE.GEN
STEMFILE.GEN
INDXFILE.GEN
EWDSFILE.GEN
--  And whatever .SPE as appropriate
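
Before the first run it may be worth confirming that the whole set is in
the folder.  A small sketch; the function name is made up, and comparing
names in upper case is an assumption suited to case-insensitive file systems.

```python
REQUIRED = [
    "WORDS.EXE", "INFLECTS.SEC", "ADDONS.LAT", "UNIQUES.LAT",
    "DICTFILE.GEN", "STEMFILE.GEN", "INDXFILE.GEN", "EWDSFILE.GEN",
]


def missing_files(present, required=REQUIRED):
    """Return the required names that do not appear in the given listing."""
    have = {name.upper() for name in present}
    return [name for name in required if name not in have]


# Typical use: import os, then missing_files(os.listdir("."))
```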



(If you go through the process and have a working WORDS but it 
gives the wrong output, the most likely source of error is 
a missing or improper sort.)


--------------------------------------------------------------
Viewing WORD.STA


A view to see what ADDONS and TRICKS were used


Sort WORD.STA on
1    12      --  The STAT name
55   25      --  STAT details
32   20      --  Word in question
16   10      --  Line number


------------------------------------------------------------------
------------------PREPARING DICTPAGE------------------------------
------------------------------------------------------------------

Preparing DICTPAGE, a listing in the style of a paper dictionary.

IMPORTANT NOTE

During the process, you may find it useful to edit some entries.  Feel free to do so.
But remember that you have to keep the separate files (.TXT) and reassemble at the end
into a new DICTLINE.


For a release, ideally DICTPAGE is done before the final DICTLINE,
because in the process there may be some editing of entries.
To first order, this is accomplished by running DICTPAGE 
against DICTLINE, producing a listing of DICTLINE with each
entry preceded by # and the DICTIONARY_FORM.  
DICTPAGE is a simple modification of DICTORD to produce a
more readable output.

Some polishing of this process gives a better product.
Extracting a few groups of entries for special handling
will simplify the process.


1) Use the regular DICTLINE sort.
Those entries with first stem zzz may give an output
which sorts to #-.  But it is likely the second term which 
you want to represent this entry.  For this and other reasons
these entries will require some hand editing, so extract them
from their place at the end of the regular DICTLINE, run DICTPAGE 
on them, sort output on full line, and process separately.  
(About 30 entries, but half handled completely by DICTPAGE)
It is likely that this set has not changed much since the last run,
so check to see if you have to do it over.

2)Sort remaining DICTLINE on (77, 8), (110, 80), (1, 75).  Extract ADJ 2 X.
Many Greek adjectives are handled in DICTLINE in two or three parts
(ADJ 2 X) by gender.  The full declension is the 
sum of these partials.  (The Greek adjective form 3 6 is handled in the
regular process and does not have to be extracted.) Extract these ADJ declensions 
from a sort of DICTLINE by PART.  Sort this output on stem and meaning to group
the constituent parts, run DICTPAGE and polish by hand edit to make 
a single paper entry from the parts.  (About 150 entries, half that 
after editing, not too hard, but a program could do the modification.)  
It is very likely that this has not changed.

3)The qu-/aliqu- PRONOUN/PACKON (PRON/PACK 1) are yet more complicated 
than the Greek adjectives, and are handled in the same manner.  
Extract them, sort on meaning, DICTPAGE, and polish output by hand.  
Also PRON 5 (only 8 of these).  Both of these are sufficiently
unchanging that one could archive the final edit and reuse on a later run.

4)The rest are automatically done by DICTPAGE.

5)UNIQUES are a special case, handled by UNIQPAGE.  This processes UNIQUES.LAT
(as UNIQPAGE.IN) into a raw form (UNIQPAGE.OUT, which is copied into UNIQPAGE.pg)
that is compatible with the regular PAGE material, and is added to and sorted with it.


The various phases are assembled into a whole and sorted on the lead,
producing DICTPAGE.RAW

DICTPAGE.RAW is ZIPped to provide a source for others to process for their purposes.

DICTPAGE.RAW is processed herein by PAGE2HTM (with the addition of PREAMBLE.txt
and an end BODY) to give the presentation form DICTPAGE.HTM




The process:

First do a SORT of DICTLINE on STEM to find zzz stems

      SORTER
        DICTLINE.GEN   --  Or whatever
           1 75         --  STEMS  
          77 24   P     --  PART  
         111 80         --  MEAN  --  To order |'s
        DICTLINE.TXT    --  Where to put result

Extract the zzz stems from the end of the file into ZZZ.TXT leaving DICTLINE.NOZ
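
The extraction can be done in an editor, since the zzz stems sort to the end.
For a scripted version, a sketch (hypothetical function name) that simply
tests whether the line begins with zzz:

```python
def split_zzz(lines):
    """Separate lines whose first stem starts with zzz from the rest."""
    zzz, rest = [], []
    for line in lines:
        if line.startswith("zzz"):
            zzz.append(line)
        else:
            rest.append(line)
    return zzz, rest
```

The first list would be written to ZZZ.TXT and the second to DICTLINE.NOZ.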

Sort these 

     SORTER
        ZZZ.TXT
           77 24   P     --  PART  
            1 75         --  STEMS  
          111 80         --  MEAN  --  To order |'s
          101 10         --  TRAN
        ZZZ.TXT             --  Where to put result

Extract the PRON 5 to a PRON5.TXT  --  More to come



Now sort the rest

      SORTER
        DICTLINE.NOZ       
           77 24   P     --  PART  
            1 75         --  STEMS  
          111 80         --  MEAN  --  To order |'s
          101 10         --  TRAN
        DICTLINE.NOZ    --  Where to put result


Now extract from DICTLINE.NOZ the remaining PRON 5, the Greek adjectives, 
and the qu-/aliqu- PRON/PACK 1, giving

ZZZ.TXT
GKADJ.TXT
PRON1.TXT
PRON5.TXT

After those are removed, the remaining is REST.TXT.


Run DICTPAGE on each of these 5 files 
(Copy them to DICTPAGE.IN, run DICTPAGE, copy DICTPAGE.OUT to the appropriate file .PG)


----------------ZZZ

Process the remaining (less PRON 5) ZZZ.TXT with DICTPAGE
(Copy ZZZ.TXT to DICTPAGE.IN, run DICTPAGE, copy DICTPAGE.OUT to ZZZ.PG)
Most of them will be handled.  Hand edit the rest.

Some should be expanded (archaic forms in one stem need to be filled out).
Some should be modified (e.g., the plurals).
Some should be trimmed (adjectives with no positive).
There are some kludges (artificial entries which generate irregular forms)
here.  Some may just be excluded from the .PG .

----------------GKADJ

Sort GKADJ to get the various parts together for a multiple entry


      SORTER
        GKADJ.TXT       
            1 75         --  STEMS  
          111 80         --  MEAN  --  To order |'s
          101 10         --  TRAN
           77 24   P     --  PART  
        GKADJ.TXT            --  Where to put result

Run DICTPAGE and edit.  This edit is straightforward but tedious.
I should prepare a procedure to do this automatically, but have not yet.
It is likely that there are few or no changes
from the previous run and those results can be used/modified.


The product is GKADJ.PG

----------------PRON1

This must be hand edited.  However it may not change much between versions.

----------------PRON5

Very small.

----------------UNIQUES

UNIQUES are treated by UNIQPAGE.EXE, giving UNIQPAGE.PG

----------------

----------------

The resulting files (with extensions appropriate to the phase of the operation,
ending in .PG) are 

GKADJ
PRON1
PRON5
REST
UNIQPAGE
ZZZ

----------------FINISH

Assemble the 6 .PG files to DICTPAGE.PG and sort to produce DICTPAGE.RAW


  SORTER
        DICTPAGE.PG   
           1 300  C      --  Everything  
           1 300  A      --  For Caps  
        DICTPAGE.RAW    --  Where to put result


Then process with PAGE2HTM and add PREAMBLE.TXT at beginning and end BODY at end 
to get DICTPAGE.HTM

---------------------------------------------------------------------


 

------------------------------------------------------------------
----------------------THE SHORT FORM------------------------------
------------------------------------------------------------------

------  SORT DICTLINE

      SORTER
        DICTLINE.GEN
            1 75         --  STEMS  
           77 24   P     --  PART  
          111 80         --  MEAN  --  To order |'s
          101 10         --  TRAN
         DICTLINE.GEN    --  Where to put result


WAKEDICT/MAKEDICT

------  SORT STEMLIST IF NOT PROVIDED

       SORTER
         STEMLIST.GEN    --  Input  
           1    18   U
           20   24   P
           1    18   A
           1    56   C
         STEMLIST.GEN    --  Output  

MAKESTEM

MAKEEWDS

------  SORT EWDSLIST

       SORTER
         EWDSLIST.GEN   
           1   24   A         --  Main word
           1   24   C         --  Main word for CAPS
          51    6   R         --  Part of Speech  
          72    5   N    D    --  RANK
          58    1   D         --  FREQ
         EWDSLIST.GEN        --  Output 

MAKEEFIL
