Post by Uncle Buddy on May 10, 2022 18:52:21 GMT -8
GEDCOM and Psychology
(or:) Why genieware developers don't bother writing GEDCOM really well.
We could spiral immediately downward into the most negative available interpretation of the situation, bordering on conspiracy theory: "Genieware vendors write bad GEDCOM on purpose to lock customers into their product. Without a complete and accurate export, the customer might decide against moving to a different genealogy software."
The problem with a downward spiral is that even in the miraculous event that everyone manages to agree about what's really maybe being kicked about, over cappuccinos down at Conspiracy Cafe, a downward spiral goes nowhere but down and down and 'round and 'round. So I'd rather give the vendors their alibis and try to be realistic.
It doesn't matter whether genealogy software developers are purposely pumping out bad GEDCOM or not. What matters is that it's so easy and natural to do just that. Especially when the GEDCOM specs say it's OK to make up your own GEDCOM tags, thus leaving it for someone else to do all the work of importing your data. How convenient for the vendors, to whom time is money; why shouldn't they take advantage of the GEDCOM's unstandard self-image? All they need to be able to do is say that they have a GEDCOM export feature. It doesn't have to do anything that useful.
Under these circumstances, let's do be radical, let's do radically improve the way genealogy treats its data, and let's also try to be nice about it. We still need each other, even when one or both of us is wrong, uninformed, or has bad judgement.
In the process of writing a GEDCOM importing module, I spent the whole day dealing with GEDCOM's `FAMC` tag, which still appears to be totally redundant. It's supposed to be used subordinately to the INDI tag, as a pointer to the FAM record in which that same person is listed under a CHIL tag. So theoretically, I should be able to ignore the FAMC tag, since it shouldn't exist and doesn't need to. But since the same thing is done two ways, some vendors will not use this linking "bidirectionally" (that's a technical-sounding buzzword for "redundantly"), so I have to detect when the FAMC tag is used alone and let the user know about the exception so the child can be entered manually. Since I decided to detect pointers for children from the other end of the double-pointed arrow, where the CHIL tag is used subordinately to the FAM tag, I have to test the GEDCOM to determine that no child is left behind. This means the import process is spending its precious crawling time double-checking whether each child has already been imported, because of bad (i.e. non-redundant) GEDCOM that someone might write for good (DRY: "don't repeat yourself") reasons.
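For anyone who wants to see the "no child left behind" test in concrete terms, here is a minimal sketch in Python. It is not Treebard's actual import code; the function name `find_one_way_child_links` and the sample data are made up for illustration, and it assumes a simplified GEDCOM where records start at level 0 with an `@ID@` and the `FAMC` and `CHIL` pointers sit at level 1.

```python
# Hypothetical sample file: Bob has FAMC but no matching CHIL,
# and Cy has CHIL but no matching FAMC.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Ann /Doe/
1 FAMC @F1@
0 @I2@ INDI
1 NAME Bob /Doe/
1 FAMC @F1@
0 @I3@ INDI
1 NAME Cy /Doe/
0 @F1@ FAM
1 CHIL @I1@
1 CHIL @I3@
0 TRLR
"""


def find_one_way_child_links(gedcom_text):
    """Return (famc_only, chil_only), each a set of (child_id, family_id)."""
    famc_links = set()  # collected from INDI records: 1 FAMC @F1@
    chil_links = set()  # collected from FAM records:  1 CHIL @I1@
    record_id = None
    record_tag = None
    for raw in gedcom_text.splitlines():
        parts = raw.strip().split(maxsplit=2)
        if len(parts) < 2:
            continue
        level = parts[0]
        if level == "0":
            # Records with an id look like "0 @I1@ INDI"; HEAD and TRLR have none.
            if parts[1].startswith("@") and len(parts) == 3:
                record_id, record_tag = parts[1], parts[2].split()[0]
            else:
                record_id = record_tag = None
        elif level == "1" and record_id and len(parts) == 3:
            tag, value = parts[1], parts[2].strip()
            if record_tag == "INDI" and tag == "FAMC":
                famc_links.add((record_id, value))
            elif record_tag == "FAM" and tag == "CHIL":
                chil_links.add((value, record_id))
    # Links that appear from only one end of the double-pointed arrow.
    return famc_links - chil_links, chil_links - famc_links


famc_only, chil_only = find_one_way_child_links(SAMPLE)
for child, family in sorted(famc_only):
    print(f"{child} points to {family} with FAMC, but {family} has no CHIL for it")
```

An importer that reads children from the CHIL end would report everything in `famc_only` to the user; one that reads from the FAMC end would report `chil_only` instead.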
Let's just pretend this is nobody's fault.
But what if this redundancy wasn't "properly" practiced by the vendor who wrote the GEDCOM file? Then is it theoretically the fault of the lazy [sic] vendor, or else the fault of GEDCOM for expecting bidirectional (redundant) links to pointers when such a thing goes against everything that programmers have been taught? Well I don't know about you, but instead of blaming my friends who try to write genieware, I'd rather blame the GEDCOM. Everyone knows GEDCOM is inadequate, so optimistically speaking, no one's feelings are gonna get hurt if I point the finger at an inanimate text file.
And then we can all pat each other on the back and go on our merry way, knowing that the culprit is inanimate and can't be expected to fix itself. But then... what has been accomplished? What's gonna change for anybody?
On the other hand, what if we stop trying to fix GEDCOM when forced to use it? That's what I'm gonna do. If an imported .ged file is written wrong, I ain't gonna fix it. I'm gonna tell the Treebard user that the GEDCOM unstandard is inadequate and therefore the vendor could have written a program to directly export his data to my Treebard database instead of using an obsolete tool. The structure of Treebard's database is in the public domain, while the structure of the exporter's data storage facility might be a company secret. So I can't possibly write an import program from his structure to mine. I either have to use GEDCOM or use GEDCOM. I won't waste my energy wondering whose fault that is. Let's try to move forward, let the past be the past, and stop trying to be backwardsly compatible with mistaken half-measures which are largely incompatible with anything useful.
If it ain't broke, don't fix it.
If it is broke, try to fix it.
If fixing it is perennially not worth the trouble, start over, using gained experience and the wisdom it could offer. Emulating a broken model doesn't work because it shouldn't work. In my limited experience, computer code that is too complicated should be fixable by making it less complicated. If that's impossible, it should not be fixed, but replaced.
If everyone opened up their data structure, fewer vendors would be motivated to use GEDCOM any more. In such a world, as direct import/export facilities improved naturally, a commonly-useful, single-minded, universal structure would become relatively obvious. Because, if you can see something, and compare it to other things, then we all learn something, and everyone's behavior improves at the same time.
Assuming that some of this makes sense, then there's still a problem, but it has an obvious solution. The problem is that, if everyone magically happened to open up their data structure to the public eyeball, the above-mentioned process of their organically growing into a common, universal data structure would take another twenty years. The obvious solution: write the common data structure proactively now. Make it available and then the vendors don't have to be talked into revealing their trade secrets. They can leave the past in the past and unapologetically adopt something better. With everyone sharing a common data structure, every single vendor will share a common advantage: no more writing a separate direct import program for every other vendor's product. Not that much of that direct-import writing actually gets done; it's an unreasonable thing to expect.
This is how GEDCOM gets replaced: a single standard that's actually a standard. Otherwise we'll be using this zombie GEDCOM tool till our descendants are trying to find our middle name on ancestry.com and still cursing GEDCOM's pretense at being a tool.
As a noob programming wanna-be with limited mental resources and nowhere to go but down because of my age, let me offer an opinion anyway: to use any medium other than SQLite for GEDCOM's replacement would be utterly ridiculous. It's either serverless SQL or it's Lemming Leap. I am eagerly awaiting rebuttals from those who are way ahead of me in expertise and experience.
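To make the serverless SQL idea a little more concrete, here is a rough sketch using Python's built-in `sqlite3` module. The table and column names (`person`, `family`, `child_link`) are invented for this example and are not Treebard's real schema. The point is only that a relational table can store the child-to-family link once and answer the question from either direction, so there is nothing like a FAMC/CHIL pair to fall out of sync.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical three-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE family (family_id INTEGER PRIMARY KEY);
        -- One row per child-in-family relationship, stored exactly once.
        CREATE TABLE child_link (
            person_id INTEGER NOT NULL REFERENCES person(person_id),
            family_id INTEGER NOT NULL REFERENCES family(family_id),
            PRIMARY KEY (person_id, family_id)
        );
    """)
    conn.executemany("INSERT INTO person VALUES (?, ?)",
                     [(1, "Ann Doe"), (2, "Bob Doe")])
    conn.execute("INSERT INTO family VALUES (1)")
    conn.executemany("INSERT INTO child_link VALUES (?, ?)", [(1, 1), (2, 1)])
    conn.commit()
    return conn


def children_of(conn, family_id):
    """Family -> children (the direction GEDCOM's CHIL tag covers)."""
    rows = conn.execute(
        "SELECT p.name FROM child_link c "
        "JOIN person p ON p.person_id = c.person_id "
        "WHERE c.family_id = ? ORDER BY p.person_id",
        (family_id,),
    )
    return [row[0] for row in rows]


def families_of(conn, person_id):
    """Child -> families (the direction GEDCOM's FAMC tag covers)."""
    rows = conn.execute(
        "SELECT family_id FROM child_link WHERE person_id = ? ORDER BY family_id",
        (person_id,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(children_of(conn, 1))
print(families_of(conn, 2))
```

Both lookups read the same `child_link` rows, so the "bidirectional" part is a matter of which query you run, not of which vendor remembered to write both tags.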
Otherwise, we can just keep blaming a poor, inanimate text file that wanted to be a database, pointing our fingers in the dark, expecting someone else to solve our problems.
No sir, boys and girls, it's your fault and mine that GEDCOM is still hobbling us and still has no replacement, because we keep trying to cover its limping tracks by fixing unimportable GEDCOM when we should be participating in a program of giving this unnecessary evil that refuses to die a proper burial.
My next assignment, should I choose to accept it, is to compile a list of which genieware vendors reveal their data structure so GEDCOM can be completely bypassed, and which vendors hide their data structure. It's that latter reprehensible act, among other things, which has kept GEDCOM howling unrestful around its unhappy grave.
This tape will self-destruct in five seconds.