Who's gonna tell the emperor that he wears no clothes?
Jun 28, 2023 5:42:37 GMT -8
Post by Uncle Buddy on Jun 28, 2023 5:42:37 GMT -8
Once upon a time, in the mists of time not long after the Big Bang, the long-anticipated year 1984 rolled around, and the world had recently become littered with a newish invention called "the personal computer". Everywhere you went, purchasing a simple candy bar or a tank top or a hamburger had become a complicated affair. The simplistic old cash register which had been invented long before, for the purpose of speeding things up at the customer counter, had been replaced by a machine that came with an instruction manual, and you could usually see two or more cashiers frantically thumbing through the manual together, sweating profusely, kinda propping up each other's morale while looking for some hint of why the customer couldn't just leave with his candy bar. This search through the manual took place under the watchful eye of a long line of customers who just wanted to go home and try on their new jeans but instead were made to wonder what had happened to the good old cash register, which had done a perfectly fine job of speeding things up, just last year.
During this heyday of mass confusion when the world was transitioning from mechanical analog devices to programmable gadgets that few managers or employees had any experience with, genealogy got in a big hurry to transition from the typewriter to the computer. Hard drives, if they existed, were measured in megabytes or less. Floppy disks could hold one or two images, if you were lucky. The relational database language SQL was in its infancy, and into this world of fledgling toy wonders was born... GEDCOM!
Since databases were barely within reach of the hobbyist in 1984, GEDCOM is a text file pretending to be a database. A placeholder for a database. Or, as the old song says, "I'll be your database till your database comes." (paraphrased) The computer can sort of read GEDCOM, but in a linear way, as if it were stored on a cassette tape instead of a digital device. It does little good to store the text file on digital media; the computer might as well be reading the letters off of a piece of paper. To find one person in a text file, even the computers of today have to read and interpret the file from the top, while a database file such as SQLite's can use an index to go straight to the row it wants. The parts of the text file can't see each other; the file does not communicate with itself. The left hand does not know what the right hand is doing.
But heck, who's in a hurry? Genealogy is just a hobby, right? Well, there are those who take it considerably more seriously than that, but the slowness of GEDCOM is not its worst defect. While its being a text file makes it kinda attractive in a way--it looks like a fun puzzle to work on--the fact that its organization and self-referencing are locked up into linearly-read text means that every single one of its birth defects is one more mind-crunching pestilence of difficulty to slog through. Do you remember the software of those days? If it wasn't made by Microsoft--and sometimes even if it was--it was probably not worth the time it took you to learn it, considering what would be available a few years down the road. In the 1990s I paid good money for a drafting program that barely worked, a paint program that barely worked, etc. GEDCOM is of that world, it was born of that world, a world where, when I bought my first computer, its hard drive was only 750 MB.
OK, I confess, having had to stand in line for a long time to purchase many a simple item during the 1980s because software was so mysterious to us, I didn't even consider becoming the owner of one of those ghastly things till my girlfriend talked me into it around 1993 or so. I wouldn't touch DOS with a ten-foot pole. I still remember when my geeky co-worker said, "I love Windows!" and invited me to her family home to play with her new Windows machine. I just typed for three hours, as fast as I could, till her husband threw me out of the house with my precious half-typed document. I would soon cave in and order a machine from Gateway, in the black-and-white spotted cowhide box, complete with a simple scanner that cost $1000.
And now I try to write my own program many hours each day, usually seven days a week. We don't learn very fast, do we? Well, at least I don't.
I have spent many hours working through GEDCOM's challenges. It's easy to read with the eye; that's part of the problem. Unfortunately the computer needs to be told what all those little symbols, words and numbers mean, and then it can only read one line at a time. Nothing in the file relates the lines to each other except level numbers and cross-reference IDs that the importing program has to interpret for itself, because GEDCOM is not a modern database but a stop-gap measure invented by genealogy enthusiasts in the early days of the personal computer who wanted to ensure that computer genealogists didn't all have to use the same application.
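To make the "one line at a time" point concrete, here is a minimal sketch of a GEDCOM reader in Python. It is not Treebard's code; the sample data, the regular expression and the function name `parse_gedcom` are assumptions made up for illustration. The thing to notice is that a line never says which line it belongs to. The reader has to remember the lines it has already passed and work out the parent from the level number.

```python
import re

# One GEDCOM line: level, optional @XREF@ identifier, tag, optional value.
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$")

# Hypothetical sample data, not taken from any real file.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 12 MAR 1850
2 PLAC Ohio
1 _NICK Jack
0 @F1@ FAM
1 HUSB @I1@
"""


def parse_gedcom(text):
    """Return the top-level records as nested dicts.

    The text must be read top to bottom: a line's parent is whichever
    line with a smaller level number came most recently before it.
    """
    records = []
    stack = []  # nodes from level 0 down to the current depth
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # this sketch silently skips blank or malformed lines
        level = int(match.group(1))
        node = {
            "xref": match.group(2),
            "tag": match.group(3),
            "value": match.group(4) or "",
            "children": [],
        }
        del stack[level:]  # forget anything at this depth or deeper
        if level == 0:
            records.append(node)
        elif stack:
            stack[-1]["children"].append(node)
        stack.append(node)
    return records


records = parse_gedcom(SAMPLE)
```

Even after all that, `1 HUSB @I1@` is still just a string. Nothing checks that a record called `@I1@` exists; the importing program has to look it up and decide what to do if it isn't there.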
The inbuilt flaws of GEDCOM can all be traced to the basic underlying fact that GEDCOM is not a database. By so tracing these flaws, the only logical conclusion is that GEDCOM should have been a database, and thus GEDCOM's replacement should be a database.
Those early pioneers of computer genealogy, who well-meaningly wanted us to be able to share our family trees among various applications, no doubt had little knowledge of database software, since SQL was still young at the time and had not yet been standardized. They had no way of knowing that SQL would become the standard for sharing data among multitudinous diverse applications, and remain the standard, one of the most stable and useful tools in the software world.
I have nothing against vendors busting their butts to write GEDCOM import and export programs so their customers can get data in and out of their applications. But Treebard is a one-man project and it does not exist for the purpose of badly importing and exporting data. It exists for the purpose of demonstrating the best user interface features that a novice like me can laboriously create. Treebard's partner project UNIGEDS exists for the purpose of demonstrating the best database structure, the structure that genealogy, in its intricacy, demands. The fact is that for five or ten features I could build for Treebard and UNIGEDS, I could only hope to build one partial feature for GEDCOM.
Why partial? Because there are, and always will be, many genieware vendors with different ways of doing things. No matter what else happens, there is that perpetually and unalterably vicious fact of life, which is that GEDCOM can't enforce One Dang Thing. SQL comes with a regimen of enforcement tools to keep data on the right track, but GEDCOM is a text file, not a relational database. If a vendor exports a GEDCOM file full of so-called "custom tags", then it becomes the importing vendor's problem to rewrite what could have been a perfectly effective labor of love, a standard import procedure. GEDCOM's creators knew they couldn't keep people from inventing their own tags so they declared, "This is how you write a custom tag, but we don't think you should use them." That's the best they could do, short of making GEDCOM perfect from day one, and this problem of vendors using custom tags will never go away. A data-sharing format that allows everybody to do everything differently will never be a standard, no matter how many people mistake it for a standard and call it a standard. This emperor wears no clothes, no matter how many people refuse to see him as buck naked.
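Here is a rough sketch of what that "regimen of enforcement tools" looks like, using Python's built-in `sqlite3` module. The two-table schema is hypothetical and far simpler than anything UNIGEDS would need; the table names, column names and the helper `try_insert` are assumptions for illustration only. It shows three ordinary constraints (NOT NULL, a foreign key and a CHECK) turning away bad data at the door, which is the thing a text file cannot do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default

# Hypothetical schema, for illustration only.
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        event_type TEXT NOT NULL
            CHECK (event_type IN ('birth', 'death', 'marriage')),
        date TEXT
    );
""")

conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'John Smith')")
conn.execute(
    "INSERT INTO event (person_id, event_type, date) "
    "VALUES (1, 'birth', '1850-03-12')"
)


def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# An event pointing at a person who does not exist.
orphan = try_insert(
    "INSERT INTO event (person_id, event_type) VALUES (99, 'death')"
)
# A made-up event type, the rough equivalent of a custom tag.
custom = try_insert(
    "INSERT INTO event (person_id, event_type) VALUES (1, '_nickname')"
)
print(orphan)
print(custom)
```

Both bad rows are refused and only the one valid event is stored. A real design would more likely keep event types in their own table rather than in a CHECK list, but the principle is the same: the rules live in the file itself, not in each vendor's good intentions.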
You genieware vendors sell your products based not on what's buried deep inside them, but on the programs' outer features, the things your customers interact with. This is the so-called front-end or graphical user interface or GUI. This is what the user cares about. Genealogists don't really care how their data is stored. They will pay for the program whose interactive features they like. Once they've learned how to use the program, if they still like it, they will recommend it to others. Tastes differ. For example, there is not a genieware product out there that I can recommend. Not even one. But there are plenty of places online to find genieware recommendations.
Genealogy is one of the most popular hobbies of our time, so all across the world many wanna-be genieware vendors are busting their butts to re-invent the wheel, and I'm not talking about the GUI where the user fills in names and dates and pushes buttons and stuff. That's not only the part that sells the program, it's the part that's very much up to the creator as to how to make it look and act. Vendors somehow imagine that they need to own their own back-ends, the innards, the database or other storage device that keeps the data safe till next time the user turns on their computer. No, that part does not need to be owned by anyone. That is where the standards need to be set. There will never be a consensus on what a proper genieware GUI should be. It's a matter of taste.
The back-end of the genealogy database software is the database itself. Since we can't use a box full of index cards for a database, like we used to do, can we ever agree to notice that without a self-enforcing standard for data storage, we will forever be at the mercy of GEDCOM or something just as imperfectible? A SQL database enforces data integrity: break one of its rules and it refuses the data. GEDCOM doesn't have that problem, since it enforces nothing. Anyone can break the rules and still call their import/export program 'GEDCOM'. And since every vendor uses it differently, it will never be the vendors' problem to use GEDCOM correctly, no matter how possible it supposedly is.
I've been working full-time on Treebard GPS and UNIGEDS for five years. Treebard is the GUI that I would prefer to use, as a matter of personal taste, but UNIGEDS is the database structure DICTATED by the structure of the elements of genealogy and how they relate to each other. Because SQL is built with constraints and enforcement, because a SQL database such as SQLite is stored in a binary file that a computer can read quickly, because SQL is stable and has been the standard for data-sharing in the back-end for a long time, the structure in which genealogy data should be stored is not pliable. It's not a matter of taste; it's dictated by rules which can be learned and must be followed. Sure, it has some optional features. There are plenty of secondary features in genealogy for which no standard storage facility is needed; that's up to the vendor. But all it takes to use SQL is the basics. Nothing fancy has to be done, and SQLite can do it all. Genealogy data is complex, but it ain't rocket science. There's no calculus or trigonometry. Writing most genealogy software features is not easy, but it's not that hard either.
Until you run up against that GEDCOM wall. Standing at that wall I can see many ways of going forward, and I've worked hard on several of these. I can definitely see the light at the end of the tunnel, and that light is one teeny-tiny little speck. I know how to do this, and I know I can. But should I?
If I were to spend the next year creating GEDCOM import and export programs, I would be spending my 68th year on this fine planet participating in a farce that must end. What am I trying to prove? That I'm not lazy? Who cares whether or not I'm too lazy to face the GEDCOM challenge? I even want to face the GEDCOM challenge. In spite of the difficulty, it would be fun and as fulfilling as solving any other intricate puzzle. But I know that GEDCOM can and will be replaced soon, either by a SQL database or by some other database product that every vendor can share as the back end of their diverse interfaces. Every day I waste trying to prove that I can wrestle alligators brings me one day closer to the real question: who needs GEDCOM?
Do we hate doing genealogy so much that we will refuse to input our data over again? That's what GEDCOM makes us do anyway, since it barely works, and worse: sometimes GEDCOM botches up our data while pretending to import and export it, and we have to not only input data all over again, we have to search for data that was input wrong by the non-standard that GEDCOM is.
My real personal reason for creating Treebard and UNIGEDS is that I love inputting genealogy data. I love to read those old census images and see the tree taking shape. For those who don't love doing this, well, maybe they should try Treebard and see if it isn't so tedious after all.
I might be too lazy to work through another year of GEDCOM's surmountable quirks, but I really don't think it's laziness. For one thing, if AI can do GEDCOM better, then let AI do it. But I will never touch AI. Genealogy is my hobby, not my computer's hobby, not some website's hobby, not some robot's hobby. When I get on ancestry.com and they show me everything I'd ever want to know about some old-time family in one convenient place, I kinda resent them for it. I long for the old days when ancestry.com made you search for stuff. It's a treasure hunt. No, I'm not lazy. I'm impatient to put my hands on a genealogy program that's so much fun to use that I can't stop. That is why I started doing this, that's why I continue to do this.
I want to make it clear that if somebody doesn't create a better standard for genealogy file transfer than what UNIGEDS is becoming, then UNIGEDS will be taken up as the standard and developed, because it's available, free, in the public domain and it works well, even though it's not complete or perfect in every way. So if you don't like UNIGEDS, you'd better get busy and create something better than it. Genealogy will either standardize its back end or it will fail to be worthwhile, and that is a fact. We computer genealogy enthusiasts are up against even more addictive pursuits such as antisocial media which is taking over minds and entire poor countries. In a world where people are losing interest in anything that isn't delivered by a screen, we could at least care about the quality of what our screens deliver. Without better software, this hobby could wither up for people who have standards. You know those riddled-with-errors one-world trees the big corporations are always frothing about? The ones populated by every scrap of data, real or fake, right or wrong, true or false, that the robotware can scrape off the internet? If you care about genealogy telling a worthwhile--and true--story, then it's your problem to do something about it, as an individual, while it's still legal and sort-of socially acceptable for people to carry out individual pursuits for personal reasons.
I have decided that GEDCOM is not worth my time, not worth your time, and not worth the space it takes up on our computers. I'm going back to my real to-do list, once again convinced that I must discipline myself to forgo the extreme pleasure of climbing GEDCOM Mountain just to be able to say I did it. As far as I know, and mostly because of the surmountable but ubiquitous and extremely time-consuming Seven Deadly Sins of GEDCOM, GEDCOM is not worth the effort of forcing it into the Treebard/UNIGEDS arsenal of features.
What are the 7 deadly sins of GEDCOM? I haven't taken the time to list them all, but might add them if it seems important, and no doubt there are more than seven. Here are the first two time-killers when trying to write an import program for GEDCOM.
1) Custom tags, which should not exist at all and would not exist if GEDCOM were an actual standard.
2) The lack of primary keys in GEDCOM. GEDCOM has primary keys (its cross-reference IDs, such as @I1@) only for individuals (INDI), couples (FAM), contacts (SUBM), media (OBJE), repositories (REPO), notes (NOTE), and sources (SOUR). Many missing primary elements of genealogy are lumped into the primary records as subordinate to them when in fact they demand their own primary keys. Extracting subordinate lines from GEDCOM's tangled mess of numbered lines in order to make primary keys for events, names, places, assertions, attributes, event types, name types, etc.... that's where I drew the line. I don't mess with Rubik's Cubes either, or things of that nature, because I don't see the point. When I finish working hard on a huge project, I want to see excellent results, but with GEDCOM that will never happen. If I were to spend the next year handling snakes, I mean GEDCOM, I would have accomplished nothing except to line myself up with GEDCOM's oversimplified, compromised, out-of-order and wrong way of describing the world.
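Both time-killers can be seen in a small sketch. The Python below is an illustration under stated assumptions, not Treebard's import code: the sample lines, the tiny `EVENT_TAGS` set, the function `extract_events` and the `event` table are all made up for the example. It walks one INDI record, sets aside anything starting with an underscore (the prefix GEDCOM prescribes for custom tags), and lifts the subordinate BIRT and DEAT lines out into rows that finally get primary keys of their own.

```python
import sqlite3

# (level, xref, tag, value) tuples, as a line-by-line reader might yield them.
# Hypothetical sample data.
LINES = [
    (0, "@I1@", "INDI", ""),
    (1, None, "NAME", "John /Smith/"),
    (1, None, "BIRT", ""),
    (2, None, "DATE", "12 MAR 1850"),
    (2, None, "PLAC", "Ohio"),
    (1, None, "_NICK", "Jack"),
    (1, None, "DEAT", ""),
    (2, None, "DATE", "3 JUN 1910"),
]

EVENT_TAGS = {"BIRT", "DEAT", "MARR"}  # a small subset, for illustration only


def extract_events(lines):
    """Split the lines into event rows and unrecognized custom tags."""
    events, custom = [], []
    person = None   # xref of the INDI record being read
    current = None  # the event whose level-2 lines are being collected
    for level, xref, tag, value in lines:
        if level == 0:
            person = xref if tag == "INDI" else None
            current = None
        elif level == 1:
            current = None
            if tag.startswith("_"):
                custom.append((person, tag, value))
            elif tag in EVENT_TAGS and person is not None:
                current = {"person": person, "type": tag,
                           "date": None, "place": None}
                events.append(current)
        elif level == 2 and current is not None:
            if tag == "DATE":
                current["date"] = value
            elif tag == "PLAC":
                current["place"] = value
    return events, custom


events, custom = extract_events(LINES)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,  -- the key GEDCOM never gave the event
        person_xref TEXT NOT NULL,
        event_type TEXT NOT NULL,
        date TEXT,
        place TEXT
    )
""")
conn.executemany(
    "INSERT INTO event (person_xref, event_type, date, place) "
    "VALUES (:person, :type, :date, :place)",
    events,
)
rows = conn.execute(
    "SELECT event_id, event_type, date FROM event ORDER BY event_id"
).fetchall()
```

This handles two event types and two sub-tags from one well-behaved record. A real importer has to do the same for names, places, attributes, sources attached at any level, and every vendor's idea of what `_NICK` and its cousins mean, which is where the year goes.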