Post by Uncle Buddy on May 3, 2022 13:32:42 GMT -8
I'm not your plumber,
I'm not your plumber's son,
but I can be your plumber
until your plumber comes.
Tell me how long
do I have to wait?
Can I get you now
or must I hesitate?
--Hesitation Blues
I already mentioned that GEDCOM is not a standard, but a utility: a tool for accomplishing something. That is, a tool for communicating data between different geniewares.
Now I have to come right out and say it with a little less tact: GEDCOM is also not a tool for accomplishing something. It's a tool for pretending to accomplish something.
GEDCOM is a placeholder.
Saying that GEDCOM is a tool for communicating data between programs is like saying President Nixon was a tool for ending the Vietnam War. GEDCOM was in the right place at the right time... once upon a time. Since it lives in the world of software, that perfectly understandable tyrant, backwards compatibility, still has us kneeling down to GEDCOM long after it has worn out its welcome. But let's save ourselves some pain and face the task at hand. The emperor wears no clothes.
We have to look at the bigger picture. While we shudder to think what pain we'd be putting ourselves through to toss the baby out with the bathwater, and while 1984 is, from a certain perspective (that of short-sightedness), a long time ago, there is a bigger picture.
That picture is the almighty reign of genealogy as one of the most interesting treasure hunts that ever came out of the computer revolution. Do we want to see genealogy continue to sit at the top of the top five of the most popular hobbies in the world? We're missing the larger context of the problem in order to preserve its swaddling clothes. And that context is this: the phenomenon of people sitting at their computers actually did not begin a long time ago, because that phase of our existence as a species has probably only just begun. It's still early on in the age of computing. Genealogy went on its honeymoon with the wrong person. Do genealogy and GEDCOM have to stay married, just so we don't have to admit that we've made a mistake?
I am very much looking forward to re-doing all of my data entry when Treebard is ready to receive it. I know, many genealogists have thousands of times more data than I have, even though I have thousands of sources sitting in a pile of files waiting to be input. But knowing how much I enjoyed using a genieware from the pioneer days of computer genealogy, I can't imagine how much more I will enjoy using a GUI I designed myself around my own interests.
But this is not about Treebard. It's about genealogy. Even Treebard is not about Treebard. Treebard is not a self-serving project intended to trap users in itself. It's a light in a dark forest of proprietary forces which have been dragging their feet about becoming sharable. One of Treebard's slogans has long been, "Forget GEDCOM and share the whole program." That's fine for people who can change horses midstream without bogging themselves down in do-over for years. The rest of us are still flailing about for a practical solution.
Since I've recently decided to meet GEDCOM halfway by creating Treebard's (GEDCOM) import/export feature right now, I suddenly have a sort of insight into the related problems that I didn't have a month ago. Part of me actually knows what I'm talking about, while part of me knows that my rants will go in one ear and out the other for a large sector of the committee-minded, whose approach plants the hurdle that standards creators have to agree about something before anything can get done.
I'm in no position to make threats and I'm not threatening to go around the committee. Standards creators and committees in general are nothing if not perfectionists trying to do the right thing, and it takes a long time. But there's a new kid in town, and it ain't Treebard. It's the open source mob, or what you might call "the Force". GEDCOM is going to be replaced, lock, stock and barrel, because it is stifling the 2nd most popular online sport (the first most popular being social media and porn, which I place in the same category). In my research the last few days I think I see a wave of folks who were calling out for GEDCOM to be banished about ten to fifteen years ago. In my optimism, I conjecture that this means the next wave is building. The next wave might accidentally wash the committee approach out to sea, good intentions and all. The Force might get this job done if the committee can't agree on something.
What Treebard really is: a two-pronged approach to the letters after its name, "GPS". Never mind what GPS stands for; its real meaning is something like "show the way". But there are two arms to this project. Back-end and front-end. Database and GUI. Since I became somewhat knowledgeable about GEDCOM--even though it was only about a week or two ago--I've realized that there is no reason to push for the general adoption of the GUI features that I've grown so fond of. Who doesn't care deeply about their own creation? The most important contribution of Treebard GPS is a sample database structure that could be adopted as a working model by The Committee and extended into the sophisticated sharing tool that GEDCOM can never be, no matter how long anyone fiddles with it.
Treebard is not a phenomenon and I am not a somebody. Treebard is just a gift, but unlike the gifts of profit-oriented players, it's not a trojan horse. Someone, whether it's I or someone else, has to create a real database--you know, those devices that were created for the express purpose of expressing relationships of varying complexity?--which really represents the real world of individuals, events, attributes, places, and how they actually relate to each other in the real world. I don't have to list GEDCOM's well-known shortcomings. So here's a list of some things that Treebard's back-end does right, and these are things that will be required in a tree-sharing tool in the distant future whether we imps of computer genealogy's honeymoon days are ready to admit it or not:
--Places have multiple parents. Dallas was once in a country called "The Republic of Texas". Did Dallas get up and move to a new country? No, it's the same Dallas it always was, in the sense that I, at age 66, am the same person whose diapers had to be changed twenty times a day, not so very long ago. Dallas has to have one place ID and multiple enclosing places. A field for time span is needed so the user doesn't have to look up the history of Podunk County every time he finds someone living there. The user (not the software provider) is the one to decide what these parent places are. The database has to handle the complexity of the situation while allowing more superficial researchers to do it their simplified way, without compromising or altering the data input by either user. Doing this with GEDCOM or any text-file approach is utterly psychotic when the RDBMS already has ALL the gear to do it without breaking a sweat.
--Places are not a feature of Google Maps. They are not a function of zip codes. Their jurisdictional level does not matter. Jurisdiction has to be optional. Even the GEDCOM specification agrees on the latter point, stating that jurisdictional distinction is included only because some genieware creators have gone overboard on categorizing places.
--Storing media in the database is bad database practice. Not everything that can be done should be done. Media are stored as links.
--Many people have multiple parents, such as adoptive and foster parents, and guardians. The user has to be able to show the people who actually raised the child, either in adjunct roles or right there in the family where they belong, and the user has to have a choice to do it either way. Biological parents are linked to the birth event, guardians are linked to the guardianship event, etc. How they're displayed is up to the GUI designer, but a complete and flexible, reality-based representation of the person's actual life circumstances has to be built into the database structure.
--Speaking of accessory roles, they are as important as biological family to some family stories. They cannot be created as an afterthought. These folks need to have equal status in the database as any relative. Many of them (such as people listed as boarders and neighbors in the census) turn out to be relatives, and many others are tightly woven into the actual tree-related person's life.
--Everything that a person got called during his life has to have its rightful place in the database along with a name type. The "preferred name" concept is weird if not downright silly. The birth name takes precedence over other names. If there's a question as to what the birth name is, the name type system solves most of the problems and the conclusion/assertion system solves the rest.
--Speaking of types, the user has to be free to create his own, without necessitating something like the disaster of GEDCOM's custom tags which serve only to expose GEDCOM as the untool which it is.
--Real RDBMS (SQL) databases such as SQLite--which is easy to use because it's serverless--provide one-to-one, one-to-many, and many-to-many relationships among stored elements because that's what SQL was made to do. A good example of a many-to-many relationship that exists--as a many-to-many relationship--in nature is a Note element. It's up to the user how to use notes, not the Philosophers on the Board of Standards. Any note has to be linkable to any other element. That way, there's no copy/pasting of a note from element to element within the tree. If that's too simple for someone to understand, then he can keep GEDCOM.
--The storage of assertions (what the source claims to be true) separately from the user's conclusions is optional to the user but not to the reality-based file-sharing data structure. This is not an abstract thing to try and accomplish. It's just another ordinary RDBMS task. The database has to have parallel tables of separate-but-linkable events/attributes/facts: one table for what the source asserts and one for what the user concludes. Their columns would largely be the same. Except that conclusions (i.e. events and attributes) are linked to assertions, while assertions are linked to sources. This has to be designed in a way to make assertions and sources optional, because many people who are only slightly or temporarily interested in genealogy are never going to record their sources no matter how very wrong this omission is. No matter how easy we make it for them to do. Which isn't something anyone has done yet. The making it easy part. It's hard to make sourcing easy, so no one has bothered to do it yet.
--Sources, citations, assertions and conclusions are forever separate-but-linkable. They do not morph into each other. Events and attributes, on the other hand, are a single fuzzy category. They have to be treated the same, optionally dated.
--The family is not an element of family history, any more than water (H2O) is an element of hydrogen peroxide (H2O2). Both compounds have the same two elements. The family is a compound. It is composed solely of individuals. Tracking families by ID is an artificial contrivance which reduces GEDCOM to adjectives I don't want to use in public. If a software provider wants to track families as if they were elements, give them an ID, etc., then that has to be optional. Building a data structure around redundancy is going to have predictable results: confusion, mistakes, and extra work for everybody. Recording the same data twice is one of the biggest no-nos in computer programming for good reason.
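Two items from the list above, places with multiple time-bounded parents and notes linkable to more than one element, can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration under stated assumptions, not Treebard's actual schema: every table name, column name, and function name here (place, place_parent, note, place_note, build_demo, enclosing_places, notes_for_place) is hypothetical, the years are illustrative, and only places are wired up to notes to keep it short.

```python
import sqlite3

# Hypothetical schema for illustration only; not Treebard's actual tables.
SCHEMA = """
CREATE TABLE place (place_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per enclosing place per time span, so a place keeps a single ID
-- while its parents change. NULL years mean unknown or open-ended.
CREATE TABLE place_parent (
    child_id   INTEGER NOT NULL REFERENCES place(place_id),
    parent_id  INTEGER NOT NULL REFERENCES place(place_id),
    start_year INTEGER,
    end_year   INTEGER
);

CREATE TABLE note (note_id INTEGER PRIMARY KEY, body TEXT NOT NULL);

-- Junction table: one note can attach to many places, and one place can
-- carry many notes, without copying the note text.
CREATE TABLE place_note (
    place_id INTEGER NOT NULL REFERENCES place(place_id),
    note_id  INTEGER NOT NULL REFERENCES note(note_id),
    PRIMARY KEY (place_id, note_id)
);
"""


def build_demo():
    """Create an in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO place (place_id, name) VALUES (?, ?)",
        [(1, "Dallas"), (2, "Republic of Texas"), (3, "Texas")],
    )
    # Years are illustrative; the user decides which parents apply and when.
    conn.executemany(
        "INSERT INTO place_parent VALUES (?, ?, ?, ?)",
        [(1, 2, 1841, 1845), (1, 3, 1846, None)],
    )
    conn.execute(
        "INSERT INTO note (note_id, body) VALUES (?, ?)",
        (1, "Check land records."),
    )
    # The same note is linked to two places; its text is stored once.
    conn.executemany("INSERT INTO place_note VALUES (?, ?)", [(1, 1), (2, 1)])
    conn.commit()
    return conn


def enclosing_places(conn, place_id, year):
    """Return names of the places recorded as enclosing place_id in a year."""
    rows = conn.execute(
        """
        SELECT parent.name
        FROM place_parent AS pp
        JOIN place AS parent ON parent.place_id = pp.parent_id
        WHERE pp.child_id = ?
          AND (pp.start_year IS NULL OR pp.start_year <= ?)
          AND (pp.end_year IS NULL OR pp.end_year >= ?)
        ORDER BY parent.name
        """,
        (place_id, year, year),
    ).fetchall()
    return [name for (name,) in rows]


def notes_for_place(conn, place_id):
    """Return the text of every note linked to place_id."""
    rows = conn.execute(
        """
        SELECT note.body
        FROM place_note
        JOIN note ON note.note_id = place_note.note_id
        WHERE place_note.place_id = ?
        ORDER BY note.note_id
        """,
        (place_id,),
    ).fetchall()
    return [body for (body,) in rows]


if __name__ == "__main__":
    conn = build_demo()
    print(enclosing_places(conn, 1, 1843))  # ['Republic of Texas']
    print(enclosing_places(conn, 1, 1900))  # ['Texas']
    print(notes_for_place(conn, 2))         # ['Check land records.']
```

Dallas keeps place ID 1 throughout; only the rows in place_parent say what enclosed it and when. The same junction-table pattern could be repeated for persons, events, or sources so that any note links to any element, and a user who ignores time spans can simply leave both year columns empty.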
The simple fact is that a file-sharing tool that allows software providers to create custom tags--encourages them to create custom tags, forces them to create custom tags due to its own shortcomings--is not a file-sharing tool at all. It's just a happy-go-lucky substitute for a file-sharing tool till the real one gets here. I predict that if something like Treebard's database doesn't replace GEDCOM, it will be only because the Force will come up with something even better and just as accurate. I predict that it will not be done in cooperation with the existing genieware providers, because the existing genieware providers are not motivated to help people export their data to a different product. I predict that these companies will all be left behind due to their unwillingness to adopt a real reality-centric common database structure as a true file-sharing utility. To put it bluntly, all genieware providers who actually intend to have their data be shareable will adopt the same SQL database structure as their backend, and build their product identity on the strength of their user interface. The companies that won't play will continue to pretend that GEDCOM is a real file-sharing tool.
But the Force will have its way, because it's way bigger than any committee, and one day we'll look around us and say, "Whatever happened to all those companies that used to sell genealogy database software?" The answer to that question: GEDCOM killed them.