Post by Uncle Buddy on Feb 21, 2023 3:48:37 GMT -8
After working with UNIGEDS' newest data structure for a while and making slow progress with the queries, a better way to do it just occurred to me. By "better way", I mean, "not another wrong way, hopefully."
Yes, there's something odd about the way it works right now. The strangest thing is that the family table in the database has only one column, a primary key column. The problem has been dealing with the couple, since there are times when you want to update something about one person and not the other, so writing a query that can easily tell the difference between the two has been harder than it sounds. There's the problem of using gender to decide which is which. Not because it's socially unacceptable or politically incorrect to do so, but for a more practical reason: we don't always know which person in a couple is male and which is female, because we don't always know what gender a given individual happens to be. A girl named Frank, a boy named Sue, I could go on. It happens, in family research, completely aside from gender "issues" prominent in today's political stew of new vs. old ideals. Just today, my wife hired a neighborhood girl named Dodong to give her a massage. Dodong is such a popular male name in the Philippines where we live that you can call any male younger than you "Dodong". This is similar to, but different from, the past German custom that we genealogists run into, wherein more than one male sibling in the same family is named Johann, more than one female sibling is named Anna, and they all go by their middle names. Well I'm rambling, that's not about gender.
So here's my idea for a better schema that will be easier to query, so that I don't have to squeeze couples out of the families_persons junction table. One problem with what I've been trying to do is that it's based on two people having the same family_id and then figuring out which is which based on other column values. Since the same couple can appear multiple times under the same family_id in the same junction table, there's a problem filtering out unwanted results. For example, the same two people can appear in the junction table once because they were married, once because they had a wedding, once because they got divorced, and once for each time they parented a child. So which of these instances do you query? You have to join to the event table and get the event_id that has the right family_id. But there can be more than one of those too. The result is queries with a lot of filters: WHERE...AND...AND..AND..AND...
In the new schema, couples will be defined in the family table:
Now the family table isn't so weird, it has more than one column. Why is this possible? It's not a stroke of genius, it's how I should have done it from the start, but for some reason I failed to take into account the underlying cardinality of the relationships. The couple is what defines the family. So I could literally do this instead, with the same effect:
Each combination of two people could serve as a primary key. But I prefer to use an automatically incrementing integer for the primary key in every table. It seems like better practice to define all PKs the same way.
Anyway, cardinality is what I missed when I first started on this new scheme. The couple (not the person but the couple) has a one-to-one relationship with the family_id, so the couple should go on the same row of the same table as the family_id. This would not prevent two people from getting married to each other twice; it could be the same family or could be different families I suppose if someone wanted to do it that way. If Mary's husband John died and she moved in with Marvin and they had some recordable events and/or children, and then she threw the bum out and married Fred, then Mary would have three family_ids before she ever got to the families_persons junction table.
This will simplify things, hopefully. After I rewrite all the queries again. Too bad I didn't stop and listen when the single-column family table called out to me from the very beginning, "Hey professor... it's lonely in here..."
What I really want to do is rewrite the whole families.py module. But it has to work right before I try to rewrite it, because it worked right before I broke it by adding a family element to the database. The feature is too complex to go at it willy-nilly. It has to have a solid foundation. It has to be working code at every step of the way or I could be barking up the wrong tree and wouldn't know it.
But that's not all there is to it. Once the two people that comprise a couple go back into the same row together like they once were when this was all being done in the event table, that will eliminate at least two columns from the families_persons table, probably a lot more than that, and it will change the name of the table too. I'll try to get it right this time:
The column `which_partner` tells the GUI which partner to display on the left and which to display on the right in the parents area. Since we can't always query gender to decide this, we don't ever want to query gender to decide it. For any given couple this will be defined in one place, so it's one-to-one and it goes in the family table too. ON SECOND THOUGHT, the perceived need for this column was due to the two people being in separate rows, which was not at all helpful. I'm going back to the old way: person_id1 is on the left and person_id2 is on the right.
There's the question of what to do with kin_type. It might be redundant. Does it say the same thing as event_type? No, not really. The user and/or the code can define the same couple as spouses in one event, parents in another event, fiance/fiancee in another event, etc. I just know it doesn't go in the family table or the event table. For any couple event, so far the only thing I've been able to get into a single event row correctly is family_id, date, particulars. Age of the couple members, that's how it used to work and it could again.
Children need a junction table but unlike couples they don't take up two rows each in a families_persons table so that might work. Possibly kin_type_id1 and kin_type_id2 should be added to the event table and left null for non-couple events. And there could be a kin_type column for child but that's redundant. Well maybe not, but kin_type "child" corresponds to a birth event, "ward" corresponds to guardianship event, "foster child" corresponds to a fosterage event, and "adopted child" corresponds to an adoption event. It seems these things could be built into the code with a dictionary of correspondences but the user still needs to be able to specify labels for partners so there's no way to get rid of the type. It's useful for couples, but I've never used it for children.
Which reminds me, there are only four unique kinds of family relationships: parent, child, sibling and partner. Your cousin is your parent's sibling's child. Your aunt is your parent's sibling or your parent's sibling's partner, etc. I think I can make a relationship calculator with the four categories but I haven't tried it yet. It might be possible to do it with only two basic relationships since a child is the flip side of a parent/child relationship and your sibling is a child of your parents and your half-sibling is a child of one of your parents, etc.
Well that was a refreshing interlude, now back to the matter at hand.
It would be more accurate to call the family table the `couple` table. I'll think about it. The table name `family` corresponds to the FAM tag in GEDCOM, but GEDCOM's OBJE tag corresponds to UNIGEDS' media table and GEDCOM's SUBM tag corresponds to UNIGEDS' contact table. So UNIGEDS calls it the way we see it and the family table is going to be the couple table instead.
The families_persons table is nearly empty now but if its name is changed to `couple_children`, it will find its place in the schema of things.
The age of the partners and kin type of partners will go back into the event table where they used to work fine. The kin_type of child has never been used but it might simplify the code for alt birth events ie. adoption, guardianship, and fosterage. By "alt birth" I mean "get alt parent" but that's too many words. On the other hand, birth/adoption/guardianship/fosterage can be gleaned from the event type so the kin_type for child might be left out of the table for now to see if it ever begs to be added. Kin types for partners have to be customizable by the user, especially in these days of gender upheaval, so that category of data is not optional.
To reiterate the purpose behind all this restructuring of UNIGEDS, a couple table with a couple_id primary key is needed so that the user can say "John and Mary are a couple" without creating any marital events or adding any offspring to the pairing. This will also provide the family element that GEDCOM expects since GEDCOM defines each unique FAM as a unique couple. I am grateful to the correspondent who pointed out that having a family element would give UNIGEDS more flexibility, not less. Before, when all this information was in the event table including person_id1 and person_id2 for the couple, it was either impossible or unnatural to create a baseless couple that exists just because the user says so. A field will be added to the families table so a partner can be added to the current person without adding a couple event or child first.
Yes, there's something odd about the way it works right now. The strangest thing is that the family table in the database has only one column, a primary key column. The problem has been dealing with the couple, since there are times when you want to update something about one person and not the other, so writing a query that can easily tell the difference between the two has been harder than it sounds. There's the problem of using gender to decide which is which. Not because it's socially unacceptable or politically incorrect to do so, but for a more practical reason: we don't always know which person in a couple is male and which is female, because we don't always know what gender a given individual happens to be. A girl named Frank, a boy named Sue, I could go on. It happens, in family research, completely aside from gender "issues" prominent in today's political stew of new vs. old ideals. Just today, my wife hired a neighborhood girl named Dodong to give her a massage. Dodong is such a popular male name in the Philippines where we live that you can call any male younger than you "Dodong". This is similar to, but different from, the past German custom that we genealogists run into, wherein more than one male sibling in the same family is named Johann, more than one female sibling is named Anna, and they all go by their middle names. Well I'm rambling, that's not about gender.
So here's my idea for a better schema that will be easier to query, so that I don't have to squeeze couples out of the families_persons junction table. One problem with what I've been trying to do is that it's based on two people having the same family_id and then figuring out which is which based on other column values. Since the same couple can appear multiple times under the same family_id in the same junction table, there's a problem filtering out unwanted results. For example, the same two people can appear in the junction table once because they were married, once because they had a wedding, once because they got divorced, and once for each time they parented a child. So which of these instances do you query? You have to join to the event table and get the event_id that has the right family_id. But there can be more than one of those too. The result is queries with a lot of filters: WHERE...AND...AND..AND..AND...
In the new schema, couples will be defined in the family table:
family table
family_id INTEGER PRIMARY KEY
person_id1 INTEGER REFERENCES person (person_id)
person_id2 INTEGER REFERENCES person (person_id)
Now the family table isn't so weird, it has more than one column. Why is this possible? It's not a stroke of genius, it's how I should have done it from the start, but for some reason I failed to take into account the underlying cardinality of the relationships. The couple is what defines the family. So I could literally do this instead, with the same effect:
family table
person_id1 INTEGER REFERENCES person (person_id)
person_id2 INTEGER REFERENCES person (person_id)
PRIMARY KEY (person_id1, person_id2)
Each combination of two people could serve as a primary key. But I prefer to use an automatically incrementing integer for the primary key in every table. It seems like better practice to define all PKs the same way.
Anyway, cardinality is what I missed when I first started on this new scheme. The couple (not the person but the couple) has a one-to-one relationship with the family_id, so the couple should go on the same row of the same table as the family_id. This would not prevent two people from getting married to each other twice; it could be the same family or could be different families I suppose if someone wanted to do it that way. If Mary's husband John died and she moved in with Marvin and they had some recordable events and/or children, and then she threw the bum out and married Fred, then Mary would have three family_ids before she ever got to the families_persons junction table.
This will simplify things, hopefully. After I rewrite all the queries again. Too bad I didn't stop and listen when the single-column family table called out to me from the very beginning, "Hey professor... it's lonely in here..."
What I really want to do is rewrite the whole families.py module. But it has to work right before I try to rewrite it, because it worked right before I broke it by adding a family element to the database. The feature is too complex to go at it willy-nilly. It has to have a solid foundation. It has to be working code at every step of the way or I could be barking up the wrong tree and wouldn't know it.
But that's not all there is to it. Once the two people that comprise a couple go back into the same row together like they once were when this was all being done in the event table, that will eliminate at least two columns from the families_persons table, probably a lot more than that, and it will change the name of the table too. I'll try to get it right this time:
family table
family_id INTEGER PRIMARY KEY
person_id1 INTEGER REFERENCES person (person_id)
person_id2 INTEGER REFERENCES person (person_id)
which_partner BOOLEAN # no!
The column `which_partner` tells the GUI which partner to display on the left and which to display on the right in the parents area. Since we can't always query gender to decide this, we don't ever want to query gender to decide it. For any given couple this will be defined in one place, so it's one-to-one and it goes in the family table too. ON SECOND THOUGHT, the perceived need for this column was due to the two people being in separate rows, which was not at all helpful. I'm going back to the old way: person_id1 is on the left and person_id2 is on the right.
There's the question of what to do with kin_type. It might be redundant. Does it say the same thing as event_type? No, not really. The user and/or the code can define the same couple as spouses in one event, parents in another event, fiance/fiancee in another event, etc. I just know it doesn't go in the family table or the event table. For any couple event, so far the only thing I've been able to get into a single event row correctly is family_id, date, particulars. Age of the couple members, that's how it used to work and it could again.
Children need a junction table but unlike couples they don't take up two rows each in a families_persons table so that might work. Possibly kin_type_id1 and kin_type_id2 should be added to the event table and left null for non-couple events. And there could be a kin_type column for child but that's redundant. Well maybe not, but kin_type "child" corresponds to a birth event, "ward" corresponds to guardianship event, "foster child" corresponds to a fosterage event, and "adopted child" corresponds to an adoption event. It seems these things could be built into the code with a dictionary of correspondences but the user still needs to be able to specify labels for partners so there's no way to get rid of the type. It's useful for couples, but I've never used it for children.
Which reminds me, there are only four unique kinds of family relationships: parent, child, sibling and partner. Your cousin is your parent's sibling's child. Your aunt is your parent's sibling or your parent's sibling's partner, etc. I think I can make a relationship calculator with the four categories but I haven't tried it yet. It might be possible to do it with only two basic relationships since a child is the flip side of a parent/child relationship and your sibling is a child of your parents and your half-sibling is a child of one of your parents, etc.
Well that was a refreshing interlude, now back to the matter at hand.
It would be more accurate to call the family table the `couple` table. I'll think about it. The table name `family` corresponds to the FAM tag in GEDCOM, but GEDCOM's OBJE tag corresponds to UNIGEDS' media table and GEDCOM's SUBM tag corresponds to UNIGEDS' contact table. So UNIGEDS calls it the way we see it and the family table is going to be the couple table instead.
The families_persons table is nearly empty now but if its name is changed to `couple_children`, it will find its place in the schema of things.
couple_children table
couple_children_id INTEGER PRIMARY KEY
couple_id INTEGER REFERENCES couple (couple_id)
child_id INTEGER REFERENCES person (person_id)
couple table
couple_id INTEGER PRIMARY KEY
person_id1 INTEGER REFERENCES person (person_id)
person_id2 INTEGER REFERENCES person (person_id)
event table
event_id INTEGER PRIMARY KEY
date TEXT NOT NULL DEFAULT '-0000-00-00-------'
particulars TEXT NOT NULL DEFAULT ''
age TEXT NOT NULL DEFAULT ''
person_id INTEGER REFERENCES person (person_id) # left null for couple events
event_type_id INTEGER # can join to a type table in a different db but no FK constraints
date_sorter TEXT NOT NULL DEFAULT '0,0,0'
nested_place_id INTEGER NOT NULL DEFAULT 1 REFERENCES nested_place (nested_place_id)
couple_id INTEGER REFERENCES couple (couple_id)
kin_type_id INTEGER # for child/adopted child/foster child/ward; probably not needed
kin_type_id1 INTEGER # joins to a type table in a different db but no FK constraints
kin_type_id2 INTEGER # joins to a type table in a different db but no FK constraints
age1 TEXT DEFAULT NULL
age2 TEXT DEFAULT NULL
person table
person_id INTEGER PRIMARY KEY
gender TEXT NOT NULL DEFAULT 'unknown'
The age of the partners and kin type of partners will go back into the event table where they used to work fine. The kin_type of child has never been used but it might simplify the code for alt birth events ie. adoption, guardianship, and fosterage. By "alt birth" I mean "get alt parent" but that's too many words. On the other hand, birth/adoption/guardianship/fosterage can be gleaned from the event type so the kin_type for child might be left out of the table for now to see if it ever begs to be added. Kin types for partners have to be customizable by the user, especially in these days of gender upheaval, so that category of data is not optional.
To reiterate the purpose behind all this restructuring of UNIGEDS, a couple table with a couple_id primary key is needed so that the user can say "John and Mary are a couple" without creating any marital events or adding any offspring to the pairing. This will also provide the family element that GEDCOM expects since GEDCOM defines each unique FAM as a unique couple. I am grateful to the correspondent who pointed out that having a family element would give UNIGEDS more flexibility, not less. Before, when all this information was in the event table including person_id1 and person_id2 for the couple, it was either impossible or unnatural to create a baseless couple that exists just because the user says so. A field will be added to the families table so a partner can be added to the current person without adding a couple event or child first.