UNIGEDS models the real world
Dec 4, 2023 17:10:18 GMT -8
Post by Uncle Buddy on Dec 4, 2023 17:10:18 GMT -8
When I was googling around yesterday to see if anyone had anything to say about how GEDCOM expresses many-to-many relationships, I found nothing on that topic, but some other things resonated with me. Or against me, as the case may be.
I saw cases where wanna-be genieware creators--or wanna-be GEDCOM reformers--were discussing events as if they were very complicated affairs. One person was talking about "who all participates in the event" and how to include them. Another was positing an abstract element called PERSON-EVENT.
Here's what UNIGEDS does with events: it models the real world.
Let's say George and Susan both serve on a jury. When the testimony is over and deliberations begin, it turns out that these two people, who sat through the same testimony day after day, have come away with two completely different experiences.
This is absolutely normal. Two humans experience the same event in different ways. So do we second-guess the two and find some complicated way to represent the single event as being linked to two people? Something like this may have occurred to me in the beginning, but it's not how UNIGEDS does things now.
I don't want to veer off into academic abstractions, so let's keep this simple. Group events make the database hard to think about, which leads to mistakes and wrong design decisions. They're supposed to save work, but they introduce inaccuracies and probably force the designer to denormalize the database. I'm only guessing at why I didn't go this way, because I don't remember; it was four or five years ago, and the UNIGEDS event table has represented events as single-person elements ever since.
In UNIGEDS, which could serve as a replacement for GEDCOM once it is perfected, there are actually two kinds of events: generic events, which are linked to a single person, and couple events, which are linked to a couple. Marital events are a subclass of couple events. In a generic event such as occupation or residence, a person_id is stored as a foreign key in a row of the event table, since one person has many events; from the UNIGEDS point of view, every event that is not explicitly a couple event is experienced by one person. I feel that this is the real-world schema for the data, as opposed to shenanigans like group events.
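Here's a minimal sketch of the one-person-per-event idea using Python's built-in sqlite3 module. Only the event table and its person_id column come from the post; the other column names (event_type, date, details), the events_for() helper and the sample rows are hypothetical stand-ins, not the actual UNIGEDS schema.

```python
import sqlite3

# Minimal sketch. Only the event table and its person_id column come from the
# post; every other table/column name here is a hypothetical stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE event (
        event_id   INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        date       TEXT,
        details    TEXT,
        -- One person has many events: the foreign key lives on the event row.
        person_id  INTEGER REFERENCES person (person_id)
    );
""")

conn.executemany("INSERT INTO person (person_id, name) VALUES (?, ?)",
                 [(1, "George"), (2, "Susan")])

# Made-up rows: George and Susan sat on the same jury, yet each gets a row of
# their own, so each person's experience can carry its own details.
conn.executemany(
    "INSERT INTO event (event_type, date, details, person_id) VALUES (?, ?, ?, ?)",
    [("jury service", "1902", "found the witness credible", 1),
     ("jury service", "1902", "found the witness evasive", 2),
     ("occupation", "1900", "farmer", 1)],
)

def events_for(person_id):
    """Return (event_type, details) for every event linked to one person."""
    return conn.execute(
        "SELECT event_type, details FROM event WHERE person_id = ? ORDER BY event_id",
        (person_id,),
    ).fetchall()

print(events_for(1))  # [('jury service', 'found the witness credible'), ('occupation', 'farmer')]
print(events_for(2))  # [('jury service', 'found the witness evasive')]
```

Because the shared jury service is two rows, nothing forces George's version and Susan's version to agree.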
Birth is a special kind of generic (one-person) event because it has two more participants: the parents.
Or does it?
In terms of the data, it has one other participant: a couple. The event table has a couple_id column holding foreign keys that reference the couple table. The couple table is a many-to-many table with a person_id1 column and a person_id2 column, each a foreign key referencing the person table. So here is a case, in spite of what I said in my previous post, where the primary key of a many-to-many table is used as a foreign key. The result is that in a birth event row, the person_id column holds the person who was born and the couple_id column holds the parents. This isn't the first arrangement I've tried, but I vaguely recall, a few years ago, forcing myself to buckle down and study the cardinality of the relationships, and once the arrangement was set up, it's been smooth sailing ever since.
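To make the birth-row arrangement concrete, here's a sketch in SQLite. The person_id, couple_id, person_id1 and person_id2 columns are the ones named in the post; event_type, name, the parents_of() helper and the child "Mary" are assumptions added for illustration.

```python
import sqlite3

# Sketch of the birth-event arrangement. person_id, couple_id, person_id1 and
# person_id2 come from the post; event_type, name and the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- Many-to-many table: each row pairs two people.
    CREATE TABLE couple (
        couple_id  INTEGER PRIMARY KEY,
        person_id1 INTEGER NOT NULL REFERENCES person (person_id),
        person_id2 INTEGER NOT NULL REFERENCES person (person_id)
    );
    CREATE TABLE event (
        event_id   INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        person_id  INTEGER REFERENCES person (person_id),
        couple_id  INTEGER REFERENCES couple (couple_id)
    );
""")
conn.executemany("INSERT INTO person (person_id, name) VALUES (?, ?)",
                 [(1, "Robert"), (2, "Sylvia"), (3, "Mary")])
conn.execute("INSERT INTO couple (couple_id, person_id1, person_id2) VALUES (1, 1, 2)")
# Birth row: person_id is the child, couple_id points at the parents.
conn.execute("INSERT INTO event (event_type, person_id, couple_id) VALUES ('birth', 3, 1)")

def parents_of(child_id):
    """Return the two parents' names via the birth event, or None if unrecorded."""
    return conn.execute("""
        SELECT p1.name, p2.name
        FROM event AS e
        JOIN couple AS c  ON c.couple_id  = e.couple_id
        JOIN person AS p1 ON p1.person_id = c.person_id1
        JOIN person AS p2 ON p2.person_id = c.person_id2
        WHERE e.event_type = 'birth' AND e.person_id = ?
    """, (child_id,)).fetchone()

print(parents_of(3))  # ('Robert', 'Sylvia')
print(parents_of(1))  # None
```

The parents are reached in one hop through couple_id, so the birth row never needs two separate parent columns.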
Only the birth event uses both person_id and couple_id columns. For couple events (some of which are also marital events), the person_id column is null and there is a foreign key value in the couple_id column.
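The column rules in the paragraph above can be written down as a small validation function. This is a sketch under assumptions: the event-type names in the two sets are hypothetical examples, and the rule that a generic event leaves couple_id null is inferred from "only the birth event uses both" rather than stated outright.

```python
from typing import Optional

# Hypothetical category lists; the event-type names are examples only.
COUPLE_EVENT_TYPES = {"marriage", "divorce", "engagement", "first kiss"}
MARITAL_EVENT_TYPES = {"marriage", "divorce", "engagement"}  # subset of couple events

def check_event_row(event_type: str,
                    person_id: Optional[int],
                    couple_id: Optional[int]) -> None:
    """Raise ValueError if a row breaks the column rules for its kind of event."""
    if event_type == "birth":
        # Only birth uses both columns: the child and the parents.
        if person_id is None or couple_id is None:
            raise ValueError("birth needs person_id (child) and couple_id (parents)")
    elif event_type in COUPLE_EVENT_TYPES:
        # Couple events (marital or not): no person_id, couple_id required.
        if person_id is not None or couple_id is None:
            raise ValueError(f"{event_type}: person_id must be null and couple_id set")
    else:
        # Every other event is a generic one-person event.
        if person_id is None or couple_id is not None:
            raise ValueError(f"{event_type}: person_id must be set and couple_id null")

check_event_row("birth", 3, 1)           # child 3, parents = couple 1
check_event_row("marriage", None, 1)     # couple event
check_event_row("residence", 5, None)    # generic one-person event
```

A check like this could run in the application layer before an insert; SQLite CHECK constraints can't look up which category an event type belongs to without extra columns.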
What we want to avoid doing is going cutesy-wootsy with the notion of group events, as GEDCOM does with such events as residence and immigration. This leads to all kinds of extra work when importing from GEDCOM to UNIGEDS because UNIGEDS models the real world while GEDCOM models somebody's pet ideas which apparently were not tested in real databases. Some genieware vendors don't use real databases and a few even try to use GEDCOM as a database, which guarantees that their application will be hard pressed to raise the bar over GEDCOM's low expectations.
Residence can refer to a whole family in a casual way, but not in an accurate way that reflects the real world. Robert and Sylvia had fourteen children. By 1820, six of them have been born, and all six live with their parents. Ten years later, we find eight children in the household, but we don't know which eight, since the census doesn't name them. Ten years after that, there are eleven children, but some of them are grandchildren. Seldom will we find all fourteen children living in the same house at the same time, and not till 1850 does the US census tell us which children they actually were. So what does a family "residence" event do? It spoils the fun if what you enjoy is accuracy and a real picture of a real family.
So the moral of that story is that there is no correct way around entering each individual as an individual in a real residence event. Immigration and emigration are just specialized residence events, so the same goes for them. Very often, the father or both parents will emigrate first to find work and establish a home, leaving the oldest siblings in the old country to care for the young and bring home the bacon. A few years later you'll find all the kids (or some of them) on a ship manifest together, travelling to join their parents. The only solution to the tedium of entering every individual as an individual is to make the user interface as simple and easy to use as possible. Creating group events for groups whose membership is more fluid than the software can be? That's cheating, and that sort of dishonesty ruins genealogy.
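One way a user interface could ease the tedium without creating a group event is a data-entry shortcut that writes one ordinary row per person. This is a hypothetical sketch: add_shared_event(), the table layout and the ship-manifest rows are made up for illustration.

```python
import sqlite3

# Sketch only: the table layout and add_shared_event() are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        event_id   INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        date       TEXT,
        place      TEXT,
        person_id  INTEGER
    );
""")

def add_shared_event(event_type, date, place, person_ids):
    """Data-entry shortcut: write one independent row per person.

    Nothing ties the rows together afterwards, so each can be corrected,
    re-dated or deleted on its own when the evidence differs by person.
    """
    with conn:  # commit all rows together, or none if an insert fails
        conn.executemany(
            "INSERT INTO event (event_type, date, place, person_id) VALUES (?, ?, ?, ?)",
            [(event_type, date, place, pid) for pid in person_ids],
        )

# Made-up example: three children found together on one ship manifest.
add_shared_event("immigration", "1887", "New York", [4, 5, 6])
# Later evidence shows one child actually came two years later; only that row changes.
conn.execute("UPDATE event SET date = '1889' WHERE person_id = 6 AND event_type = 'immigration'")
print(conn.execute("SELECT person_id, date FROM event ORDER BY person_id").fetchall())
# [(4, '1887'), (5, '1887'), (6, '1889')]
```

The convenience lives only in the entry step; the stored data stays one event per individual.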
Having taken this opportunity to review some of my design decisions regarding events, my goal for today is to try a thought experiment in which there are NO group events, i.e. no couple events. So each partner in the couple would have his own marriage event, and the two events would be linked somehow. I just thought of this, and it's possible that I thought of it sometime in the past and rejected the idea, but I want to think it through and ask why any couple event is needed at all, because it does complicate the code somewhat, and I'd like to be able to say why it's necessary.
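For the thought experiment, here's one hypothetical way the two per-person marriage events could be "linked somehow": a self-referencing linked_event_id column. The column name and both helper functions are assumptions for illustration, not anything UNIGEDS actually does.

```python
import sqlite3

# Thought-experiment sketch: no couple events, so each partner gets a
# marriage row. linked_event_id is one hypothetical way to link the pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        event_id        INTEGER PRIMARY KEY,
        event_type      TEXT NOT NULL,
        date            TEXT,
        person_id       INTEGER NOT NULL,
        linked_event_id INTEGER REFERENCES event (event_id)
    );
""")

def add_marriage_pair(person_a, person_b, date):
    """Insert two one-person marriage rows that point at each other."""
    with conn:
        a_id = conn.execute(
            "INSERT INTO event (event_type, date, person_id) VALUES ('marriage', ?, ?)",
            (date, person_a)).lastrowid
        b_id = conn.execute(
            "INSERT INTO event (event_type, date, person_id, linked_event_id) "
            "VALUES ('marriage', ?, ?, ?)",
            (date, person_b, a_id)).lastrowid
        # The first row can only point back once the second row exists.
        conn.execute("UPDATE event SET linked_event_id = ? WHERE event_id = ?",
                     (b_id, a_id))
    return a_id, b_id

def spouses_of(person_id):
    """Follow each of the person's marriage rows to the partner's row."""
    rows = conn.execute("""
        SELECT other.person_id
        FROM event AS mine
        JOIN event AS other ON other.event_id = mine.linked_event_id
        WHERE mine.person_id = ? AND mine.event_type = 'marriage'
        ORDER BY mine.event_id
    """, (person_id,)).fetchall()
    return [r[0] for r in rows]

add_marriage_pair(1, 2, "1812")
print(spouses_of(1), spouses_of(2))  # [2] [1]
```

Even in this small sketch, one marriage now lives in two rows, the insert needs a follow-up UPDATE, and a corrected date would have to be written twice. That bookkeeping is the kind of cost the thought experiment would weigh against a single couple event.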
I remember why I created a marital events category. When I finally got around to creating a family table on the person tab, so the user could see who the current person's parents, spouse and children were, I had to detect spouses by looking for marital events, since not every couple event implies a marriage. For example, a "first kiss" event might create a couple, but it doesn't make a marriage. It was probably during this phase of development that I was forced to create the couple table and rewrite a bunch of code, because up till then I'd been using a "kin type" category, which was annoying and troublesome. Since I created the couple table, I've also had to do something about eventless couples. The user knows two people are a couple and wants to say so; he doesn't want to wait till he finds the marriage event, so there has to be a way to link two people besides detecting one or more marital events.
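A sketch of how a family view might list partners while still covering eventless couples: read the couple table directly and flag whether any marital event exists for each couple. The MARITAL_EVENT_TYPES tuple, the partners_of() helper and the sample rows are hypothetical; only the couple table's person_id1/person_id2 columns and the event table's couple_id column come from the post.

```python
import sqlite3

# Hypothetical split of couple events: marital ones vs. others such as "first kiss".
MARITAL_EVENT_TYPES = ("marriage", "divorce")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE couple (
        couple_id  INTEGER PRIMARY KEY,
        person_id1 INTEGER NOT NULL,
        person_id2 INTEGER NOT NULL
    );
    CREATE TABLE event (
        event_id   INTEGER PRIMARY KEY,
        event_type TEXT NOT NULL,
        person_id  INTEGER,
        couple_id  INTEGER REFERENCES couple (couple_id)
    );
""")
# Made-up data: couple 1 has a marriage, couple 2 only a first kiss,
# couple 3 was entered by the user with no events at all.
conn.executemany("INSERT INTO couple VALUES (?, ?, ?)",
                 [(1, 1, 2), (2, 3, 4), (3, 4, 5)])
conn.executemany("INSERT INTO event (event_type, couple_id) VALUES (?, ?)",
                 [("marriage", 1), ("first kiss", 2)])

def partners_of(person_id):
    """List (partner_id, has_marital_event) for every couple the person is in.

    Reading the couple table, rather than only marital events, is what lets
    eventless couples show up in a family view.
    """
    marks = ", ".join("?" for _ in MARITAL_EVENT_TYPES)
    sql = f"""
        SELECT CASE WHEN c.person_id1 = ? THEN c.person_id2 ELSE c.person_id1 END,
               EXISTS (SELECT 1 FROM event AS e
                       WHERE e.couple_id = c.couple_id
                         AND e.event_type IN ({marks}))
        FROM couple AS c
        WHERE ? IN (c.person_id1, c.person_id2)
        ORDER BY c.couple_id
    """
    # Parameter order follows the ? positions: CASE, IN (...), WHERE.
    params = (person_id, *MARITAL_EVENT_TYPES, person_id)
    return [(partner, bool(flag)) for partner, flag in conn.execute(sql, params)]

print(partners_of(1))  # [(2, True)]
print(partners_of(4))  # [(3, False), (5, False)]
```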
All of this (or most of it) is very foreign to how GEDCOM does it. The FAM element in GEDCOM is really a misnamed couple element, and unfortunately, family events (so-called) are linked to family elements even though some of them, like residence, are really generic events experienced by some--but not all--individuals in the family group abstraction.