GEDCOM import already usable, almost done
Post by Uncle Buddy on Nov 24, 2023 1:30:42 GMT -8
The GEDCOM import program for Treebard and UNIGEDS (Treebard's database structure) is very close to being finished, and is already usable.
Except for a dialog that upgrades GEDCOM places to UNIGEDS places, which is finished, I'm abandoning the notion of the Interactive Exceptions Report for these reasons:
1) Except for the bad dates correction feature, it's not really interactive.
2) The user is given instructions and values which would be easier to copy from a text file.
3) The user is expected to copy and paste what could be going into a text file automatically.
4) If the user doesn't copy, paste and save the bad values, the data is gone; it should be auto-saved in a text file.
5) The only truly interactive exception--bad dates--requires complicating what would otherwise be simple code, and it introduces bugs by interfering with other code. It's not worth doing over.
6) A text file would preserve the instructions, which the user loses when he closes the exceptions dialog. The instructions in the dialogs can't be copied unless the dialogs are made using more complicated widgets.
7) Early on in the Interactive Exceptions Report writing process, I had stopped short of reporting all the exceptions that exist because I didn't want to write GUIs for each exception type no matter how ridiculous the exception. With a text file, I could turn it over to the user and let them work on the problems at their leisure.
8) The exceptions aren't that critical because most of the important data is already going into the database. There's not that much for the user to do.
I came within crawling distance of finishing the exceptions that I'd taken on, and then decided the code was becoming bloated by what should have been sent to an exceptions log in a text file. This will be easy to fix. It will reduce the length of the code by hundreds of lines and give the user less to do, with fewer instructions to read and follow, during the GEDCOM import.
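The exceptions log described above could be as simple as the following sketch. The function name, the tuple layout, the file name, and the sample bad date are all hypothetical, made up for illustration; this is not Treebard's actual code.

```python
import os
import tempfile


def write_exceptions_log(exceptions, log_path):
    """Write every collected import exception to a plain-text file.

    `exceptions` is a list of (line_number, gedcom_line, instructions)
    tuples gathered during the import. Returns the number of entries written.
    """
    with open(log_path, "w", encoding="utf-8") as log:
        log.write("GEDCOM import exceptions\n\n")
        for line_number, gedcom_line, instructions in exceptions:
            # Keep the bad value and the instructions together so the
            # user can deal with them later, after the import is done.
            log.write(f"Line {line_number}: {gedcom_line}\n")
            log.write(f"  To do: {instructions}\n\n")
    return len(exceptions)


# Hypothetical usage: one bad date collected during an import.
collected = [
    (57, "2 DATE 31 FEB 1902", "Enter the correct date after the import."),
]
log_file = os.path.join(tempfile.mkdtemp(), "gedcom_exceptions.txt")
write_exceptions_log(collected, log_file)
```

Nothing is lost if the user ignores the file, and nothing interrupts the import.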
I decided that it would be too klunky to expect the user to get out his files during the import process and input data on demand. It wasn't a very good idea. It goes against the notion of separating processes from each other. Either import or input; don't mix them. Places are another story: GEDCOM's places are not usable in a genieware that takes places seriously.
Not finishing this superfluous GUI feature will speed things up. I'm almost finished as it is, and nearly ready to write the GEDCOM export program. I expect to whip out the export program in a few weeks.
When I do the export program, I might start with the hardest part. If it's too painful to dumb down UNIGEDS' data to GEDCOM's expectations, I might instead export only in the gedMOM format. I could write that code in a few days. For example, here is one cited name in each format:
GEDCOM:
0 @I3@ INDI
1 NAME Howard A. /Teal/
2 SOUR @SRC8@
3 PAGE 431-30-8217
3 DATA
4 TEXT father Howard A. Teal
...
0 @SRC8@ SOUR
1 TITL Application for Social Security Number Form SS5
gedMOM:
PRSN 3
* *
NAME 12
PRSN_FK 3
NAME_STRG Howard A. Teal
NAME_SORT Teal, Howard A.
* *
ASRTN 99
CTTN_FK 34
NAME_FK 12
ASRTN_NAME father Howard A. Teal
* *
CTTN 34
CTTN_STRG 431-30-8217
SORC_FK 8
* *
SORC 8
SORC_TITL Application for Social Security Number Form SS5
Anyone who's worked with SQL for a week will know what to do with gedMOM, no "specifications" required. On the other hand, the GEDCOM specs prescribe some nonsensical rules based on GEDCOM's creators apparently guessing at the relationships among data or pretending that the subordinate numbers would convey what they're not suited to convey.
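To show how little is needed to read gedMOM, here is a rough sketch of a reader. It assumes, going only by the example above, that a `* *` line separates records, that the first line of a record is a table name plus a row ID, and that every following line is a column name plus a value. The function names are hypothetical and this is not an official gedMOM parser.

```python
SAMPLE = """\
PRSN 3
* *
NAME 12
PRSN_FK 3
NAME_STRG Howard A. Teal
NAME_SORT Teal, Howard A.
* *
ASRTN 99
CTTN_FK 34
NAME_FK 12
ASRTN_NAME father Howard A. Teal
* *
CTTN 34
CTTN_STRG 431-30-8217
SORC_FK 8
* *
SORC 8
SORC_TITL Application for Social Security Number Form SS5
"""


def parse_gedmom(text):
    """Turn gedMOM text into a list of {"table", "id", "columns"} dicts."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "* *":  # assumed record separator
            current = None
            continue
        key, _, value = line.partition(" ")
        if current is None:
            # First line of a record: table name and primary key.
            current = {"table": key, "id": int(value), "columns": {}}
            records.append(current)
        else:
            current["columns"][key] = value
    return records


def lookup(records, table, row_id):
    """Find one record by table name and ID, like following a foreign key."""
    for record in records:
        if record["table"] == table and record["id"] == row_id:
            return record
    return None


records = parse_gedmom(SAMPLE)
assertion = lookup(records, "ASRTN", 99)
citation = lookup(records, "CTTN", int(assertion["columns"]["CTTN_FK"]))
source = lookup(records, "SORC", int(citation["columns"]["SORC_FK"]))
print(assertion["columns"]["ASRTN_NAME"], "|",
      citation["columns"]["CTTN_STRG"], "|",
      source["columns"]["SORC_TITL"])
```

The assertion reaches its citation and source by following the `_FK` columns, the same way a foreign key works in SQL.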
Example of GEDCOM nonsense: in order to correctly link the TEXT tag above to the PAGE tag above, the DATA tag simply needs to be ignored, to get this: PAGE.TEXT (i.e. citation linked to assertion). But "simply" isn't that simple, because if the exported GEDCOM doesn't position the DATA tag under the PAGE tag, this won't work. Positioning the siblings PAGE and DATA in the right order might be an unwritten rule of GEDCOM. A specification with unwritten rules is not a specification.
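For comparison, here is a sketch of what an importer has to do to recover that PAGE.TEXT link from the GEDCOM lines. It builds a tree from the level numbers and then pairs each TEXT under a DATA tag with the PAGE tag that shares the same parent. That pairing rule is my own assumption for this sketch; the function names are hypothetical and this is not Treebard's import code.

```python
SAMPLE = """\
0 @I3@ INDI
1 NAME Howard A. /Teal/
2 SOUR @SRC8@
3 PAGE 431-30-8217
3 DATA
4 TEXT father Howard A. Teal
0 @SRC8@ SOUR
1 TITL Application for Social Security Number Form SS5
"""


def parse_gedcom(lines):
    """Build a tree of nodes from GEDCOM lines using their level numbers."""
    root = {"level": -1, "xref": "", "tag": "ROOT", "value": "", "children": []}
    stack = [root]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        level_text, _, rest = line.partition(" ")
        level = int(level_text)
        xref = ""
        if rest.startswith("@"):
            # Record lines put the ID before the tag: 0 @I3@ INDI
            xref, _, rest = rest.partition(" ")
        tag, _, value = rest.partition(" ")
        node = {"level": level, "xref": xref, "tag": tag,
                "value": value, "children": []}
        # Climb back up until the top of the stack is this line's parent.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def citation_links(node, found=None):
    """Collect (PAGE value, TEXT value) pairs, skipping over the DATA tag."""
    if found is None:
        found = []
    children = node["children"]
    pages = [c for c in children if c["tag"] == "PAGE"]
    for data in (c for c in children if c["tag"] == "DATA"):
        for text in data["children"]:
            if text["tag"] == "TEXT":
                page_value = pages[0]["value"] if pages else None
                found.append((page_value, text["value"]))
    for child in children:
        citation_links(child, found)
    return found


tree = parse_gedcom(SAMPLE.splitlines())
print(citation_links(tree))
```

In gedMOM the same link is one explicit CTTN_FK value, with no tree to rebuild.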
Anyway, I haven't decided yet whether to export gedMOM, GEDCOM, or both. GEDCOM has the advantage of being importable to nearly any genieware. gedMOM has the advantage of teaching SQL concepts without the learner having to look at any SQL. Well, that's one of its advantages, one of many.
I used to think that being a text file was what made GEDCOM so slow. I'm not so sure anymore. I think the slow part of importing GEDCOM is inputting data to the database. Of course I'm using Python, but everything I do is in the public domain, so it can be translated into the programming language of your choice.
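One way to check that hunch is to time the two steps separately. The sketch below is only an illustration under assumptions: the table, its columns, and the function name are made up, and it uses an in-memory SQLite database, so a real on-disk database would likely show a bigger gap.

```python
import sqlite3
import time


def time_import(lines):
    """Return (parse_seconds, insert_seconds, rows_stored) for GEDCOM lines."""
    # Step 1: read the text, splitting each line into level, tag and value.
    start = time.perf_counter()
    parsed = []
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        value = parts[2] if len(parts) > 2 else ""
        parsed.append((int(parts[0]), parts[1], value))
    parse_seconds = time.perf_counter() - start

    # Step 2: write the rows to a database (hypothetical one-table schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE gedcom_line (level INTEGER, tag TEXT, value TEXT)")
    start = time.perf_counter()
    for row in parsed:
        conn.execute("INSERT INTO gedcom_line VALUES (?, ?, ?)", row)
    conn.commit()
    insert_seconds = time.perf_counter() - start

    rows_stored = conn.execute("SELECT COUNT(*) FROM gedcom_line").fetchone()[0]
    conn.close()
    return parse_seconds, insert_seconds, rows_stored


# Hypothetical usage with a large, repetitive fake file.
fake_lines = ["1 NAME Howard A. /Teal/"] * 50000
parse_s, insert_s, rows = time_import(fake_lines)
print(f"parse: {parse_s:.3f}s  insert: {insert_s:.3f}s  rows: {rows}")
```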