Post by Uncle Buddy on Feb 9, 2023 4:50:39 GMT -8
In this thread I'll work through some difficulties in understanding GEDCOM's unnecessary FAM tag, the subsequent redundancy in referring to its elements, and what UNIGEDS needs to do about this. I will have some misconceptions at first, which I will try to recover from.
Keep in mind that I'm not saying the traditional family unit is fictional or that it serves no purpose to society or the individuals in the family, or that it should not be represented in the genieware interface. I'm only saying that the family is a fluid, ever-changing structure without static boundaries, and therefore not accurately represented as a UNIT (a database element with an ID).
MARITAL EVENT TYPES
2    marriage
11   wedding
15   divorce
17   annulment
18   separation
59   marriage license
85   filing for divorce
90   partnership
93   anniversary celebration
98   marriage contract
102  cohabitation
103  living together
104  wedding anniversary
FAM LINES
0 @f2@ FAM
...
1 MARR
...
1 HUSB @i1@
1 WIFE @i3@
1 CHIL @i8@
1 CHIL @i6@
1 CHIL @i7@
1 CHIL @i92@
INDI LINES
0 @i1@ INDI
1 NAME Anthony Edward /Munro/
1 SEX M
1 FAMC @f3@
1 FAMS @f1@
1 FAMS @f2@
0 @i2@ INDI
1 NAME Julia Amanda /Fish/
1 SEX F
1 FAMC @f37@
1 FAMS @f1@
GEDCOM's family element is designated by the FAM tag, which creates an artificial collection of individuals. The collection is impossible to track accurately, since its members can change often but GEDCOM can only represent it as a static collection. UNIGEDS replaces this rigid, inaccurate system by tracking the individuals and their exact relationships to each other, and leaves the representation of a family element up to the GUI designer. A family element is actually a compound of individuals and relationships, and it is always in flux (unlike an individual, for example, which is a real irreducible element with a definite boundary around it). So UNIGEDS has three jobs: glean data from GEDCOM's family records and represent it correctly; give the individuals and their relationships to the GUI to do whatever it wants with them; and, when it's time to export GEDCOM, get the individuals and their relationships back into the redundant and unnecessary format used by GEDCOM, the FAM tag and its subordinates.
UNIGEDS could have a family table in the database but it would mean having to maintain the table while the data is all being kept, updated, and used in other tables. The family table therefore is not only unnecessary, but also undesirable, since it would increase the amount of work that has to be done. Creating FAM tags and their subordinates when it's time to export UNIGEDS' succinct data is more efficient than continuously updating a table that UNIGEDS doesn't need for its own purposes.
There's a mismatch between how UNIGEDS works and how GEDCOM works that has to be compensated for. GUIs based on the assumptions made by GEDCOM generally make it possible for the user to create families that have no marriage-like events and no children. The user just types a name into the GUI on the left and another name on the right, and there's an instant family, which most GUIs automatically call "spouses" even without marital events. UNIGEDS on the other hand enables the GUI developer to display all the current person's partners and children in the same table, as demonstrated by Treebard GPS. In Treebard, the user adds a child or a marital event such as wedding, cohabitation, divorce, etc. in order to add inputs to the family table for a partner name. Because of this mismatch, when UNIGEDS imports GEDCOM, it has to check for FAM records with no events or children. For each baseless family element, UNIGEDS creates a generic partnership event so that the partnership will show up in the GUI and so it will be found by an export procedure later. The user can change the kin type from "partner" to "spouse" or whatever he wants, including his own custom kin types.
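The import check described above might be sketched like this. It is only an illustration, not UNIGEDS code: the function names are made up, the second family record (`@f5@`) is invented sample data, and the tag set is a subset of GEDCOM 5.5.1 family event tags chosen for the example.

```python
# Hypothetical sketch: find FAM records that have no couple event and no child.
# COUPLE_EVENT_TAGS is an assumed subset of GEDCOM family event tags.
COUPLE_EVENT_TAGS = {"MARR", "DIV", "ANUL", "ENGA", "MARB", "MARC", "MARL", "MARS", "DIVF"}


def parse_fam_records(lines):
    """Collect the level-1 tags found under each level-0 FAM record."""
    families = {}
    current = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "0":
            # A family record starts with a line such as: 0 @f1@ FAM
            is_fam = len(parts) >= 3 and parts[2] == "FAM"
            current = parts[1] if is_fam else None
            if is_fam:
                families[current] = []
        elif parts[0] == "1" and current is not None:
            families[current].append(parts[1])
    return families


def baseless_families(families):
    """Return the IDs of families with neither a couple event nor a child."""
    return [
        fam_id
        for fam_id, tags in families.items()
        if "CHIL" not in tags and not COUPLE_EVENT_TAGS.intersection(tags)
    ]


gedcom_lines = """\
0 @f1@ FAM
1 MARR
1 DIV
1 HUSB @i1@
1 WIFE @i2@
0 @f5@ FAM
1 HUSB @i10@
1 WIFE @i11@
""".splitlines()

families = parse_fam_records(gedcom_lines)
needs_generic_partnership = baseless_families(families)
print(needs_generic_partnership)
```

Each ID in `needs_generic_partnership` is a family for which an importer would then create the generic partnership event.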
GEDCOM not only creates an unnecessary and unperfectable family element, but then tries to track its members redundantly. The FAM tag provides an (unneeded) primary key which is referenced by FAMC (child) and FAMS (spouse) foreign keys (pointers) in the INDI records. These pointers are useless annoyances which should be ignored. The same data and more is provided by then inserting CHIL, HUSB, and WIFE tags (correctly) in the FAM records. Here's how we know which of the two redundant insertions is correct. Usable, maintainable, extendable database structure relies on adherence to the rules of cardinality. Redundant input of data into a SQL database is not strictly forbidden, but it's called "denormalization" for a reason: it makes the programmer work extra hard and keep track of things that the database, being denormalized, can't keep track of itself. SQL is designed by real programmers, so it intends to never do anything twice.
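To illustrate the redundancy, here is a minimal sketch showing that the FAMS and FAMC pointers can be rebuilt from the FAM records alone. The function name and the dictionary layout are assumptions made for this example; the sample data comes from the FAM records quoted in this thread.

```python
from collections import defaultdict


def derive_indi_pointers(fam_records):
    """Rebuild each person's FAMS/FAMC pointers using only the FAM records."""
    pointers = defaultdict(lambda: {"FAMS": [], "FAMC": []})
    for fam_id, members in fam_records.items():
        for tag, person_id in members:
            if tag in ("HUSB", "WIFE"):
                pointers[person_id]["FAMS"].append(fam_id)
            elif tag == "CHIL":
                pointers[person_id]["FAMC"].append(fam_id)
    return dict(pointers)


# HUSB/WIFE/CHIL lines from the two FAM records quoted in this thread.
fam_records = {
    "@f1@": [("HUSB", "@i1@"), ("WIFE", "@i2@")],
    "@f2@": [
        ("HUSB", "@i1@"), ("WIFE", "@i3@"),
        ("CHIL", "@i8@"), ("CHIL", "@i6@"), ("CHIL", "@i7@"), ("CHIL", "@i92@"),
    ],
}

pointers = derive_indi_pointers(fam_records)
print(pointers["@i1@"]["FAMS"])
print(pointers["@i8@"]["FAMC"])
```

Since the pointers can be computed this way, an exporter could generate them at the last moment instead of storing them.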
Cardinality refers to the designation of relationships between two kinds of data as one-to-one, one-to-many, or many-to-many. John has a person ID and his fingers each have a finger ID. If you look at it from the finger's perspective, you might get it wrong, because for each finger there's only the one John. To get cardinality right, you have to look at the relationship from both directions. From John's point-of-view, that's one John and many fingers. So it's a one-to-many relationship, and all one-to-many relationships follow this pattern: from the perspective of the one, there are many related elements. But from the perspective of each of the many elements, there's only the one related element.
The one-to-one relationship is like John and his head. Assuming that John's a regular guy, for each John there is one head. Turn that around: for each head there is one John. That pattern indicates one-to-one cardinality.
The many-to-many relationship is like a student table where each student has a student ID, and a class table where each class has a class ID. Each student needs a link to many classes and each class needs a link to many students. So the one-to-one and many-to-many relationships are symmetrical from either point-of-view, and the one-to-many relationship is not. Getting this wrong will require a redesign down the road; getting it right keeps the database easy to understand and use.
The creators of GEDCOM would have been using a SQL database except that in 1984 when they created GEDCOM, SQL was only a few years old; it was not a universal tool at the time. So the creators of GEDCOM did what anyone would do: they made a tool they could read with their own eyes. Unfortunately, considering the future fact that SQL would become ubiquitously the preferred format for storing intricately related data, the creators got cardinality wrong and never fixed it. They required pointless and destructive redundancy, thus booby-trapping their own work with denormalization as the base assumption.
Assuming that the family element isn't going away, in spite of its rigidity and inborn inaccuracy, here's how it works in terms of cardinality.
For each family, there can be more than one husband, more than one wife, and more than one child. Each of these folks has a person ID. From the perspective of each person, there is one family. Never mind that this "unit" falls apart as a data structure if you bother to look at it. We have to use it because everyone else does. And if we're gonna use it, we have to use it as accurately and efficiently as possible. We've just decided that the family:person relationship is one:many. We've defined cardinality's three types. But once we've done all that, what are we supposed to do about it?
In a SQL database, a one-to-one relationship is best represented by putting the two elements on the same line of the same table. For example, prior to this century, every person came with one gender. So gender is about the only thing you can put on the same row of the same table as the person ID. If the gender people get their way, we'll have to change that, but for the sake of this discussion let's keep it simple.
A one-to-one relationship can use a foreign key if a UNIQUE constraint is placed on the foreign key column in the child table. So you have a person table with one column for person IDs. You could have a name column if you plan to track only one name, but in genealogy we need a name table so a one-to-many relationship can be represented in regards to person:name. One person can have many names, but each of these names refers to only one person. In a simple database which plans only to store one name per person, you could put the person ID in the name table as a foreign key, but if so there has to be UNIQUE constraint on the person ID foreign key column so that each person ID can only be used once in the name table. UNIGEDS does this for change dates in order to keep from having to add a change date column to every table.
But in UNIGEDS' more realistic name model, the cardinality of person:name has to be represented correctly. This is always done by putting the foreign key in the "many" table. So there's a person ID foreign key column in the name table and the person table never mentions names at all. We'll get back to this point in just a second.
Also in UNIGEDS we make it possible for any note to be linked to any number of elements, and obviously any element can have any number of notes linked to it. No more copy-pasting notes from person to person. So this is a many-to-many relationship, which is always represented by a foreign key column for both sides of the relationship. You need a place to put the pair of foreign keys, so they go in a junction table with a person ID foreign key column and a note_id foreign key column. The person table never mentions notes at all and the note table never mentions persons at all. This is non-redundancy in action, keeping our tasks... possible.
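The three patterns just described can be sketched with Python's sqlite3 module. This is a minimal illustration under assumed table and column names, not the actual UNIGEDS schema; the alias "Tony Munro", the dates, and the note text are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender TEXT                      -- one-to-one: same row as the person ID
);
-- one-to-many: the foreign key goes in the "many" table
CREATE TABLE name (
    name_id INTEGER PRIMARY KEY,
    names TEXT NOT NULL,
    person_id INTEGER NOT NULL REFERENCES person (person_id)
);
-- one-to-one through a foreign key: UNIQUE allows each person ID only once
CREATE TABLE change_date (
    change_date_id INTEGER PRIMARY KEY,
    change_date TEXT,
    person_id INTEGER UNIQUE REFERENCES person (person_id)
);
CREATE TABLE note (note_id INTEGER PRIMARY KEY, notes TEXT);
-- many-to-many: a junction table holds the pair of foreign keys
CREATE TABLE persons_notes (
    persons_notes_id INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person (person_id),
    note_id INTEGER NOT NULL REFERENCES note (note_id),
    UNIQUE (person_id, note_id)
);
""")

conn.executemany("INSERT INTO person (person_id, gender) VALUES (?, ?)",
                 [(1, "M"), (2, "F")])
conn.executemany("INSERT INTO name (names, person_id) VALUES (?, ?)",
                 [("Anthony Edward Munro", 1), ("Tony Munro", 1)])
conn.execute("INSERT INTO change_date (change_date, person_id) VALUES ('2023-02-09', 1)")

# A second change date for the same person violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO change_date (change_date, person_id) VALUES ('2023-02-10', 1)")
    second_change_date_rejected = False
except sqlite3.IntegrityError:
    second_change_date_rejected = True

conn.execute("INSERT INTO note (note_id, notes) VALUES (1, 'Shared note')")
conn.executemany("INSERT INTO persons_notes (person_id, note_id) VALUES (?, ?)",
                 [(1, 1), (2, 1)])

names_for_person_1 = [row[0] for row in conn.execute(
    "SELECT names FROM name WHERE person_id = 1 ORDER BY name_id")]
persons_sharing_note_1 = [row[0] for row in conn.execute(
    "SELECT person_id FROM persons_notes WHERE note_id = 1 ORDER BY person_id")]
print(names_for_person_1, persons_sharing_note_1, second_change_date_rejected)
```

Note that the person table mentions neither names nor notes; every link lives in the "many" table or in the junction table.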
Since a family-to-person relationship is one-to-many, the only place a foreign key is needed or wanted is in the person table...
Wait. Hold everything. I didn't analyze cardinality right, and it has to be right. I'll backtrack a bit.
Each child has one family, right? Nope. Each child has one biological family but potentially many foster/guardianship/adoptive families. And each family can have many children. That is a many-to-many relationship.
Each family head can have any number of partners and start any number of families. And each family can have any number of mothers or fathers. So spouse and child tags need to be provided with a junction table if we use a family table. The person ID for spouse or child can be linked to any number of family IDs and the family ID can be linked to any number of person IDs.
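If a family table were used, the junction table could be queried from either direction. The sketch below assumes the persons_families layout and the sample rows shown later in this thread (kin type 9 is "spouse"); the helper function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE family (family_id INTEGER PRIMARY KEY);
CREATE TABLE persons_families (
    pfid INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES family (family_id),
    person_id INTEGER NOT NULL REFERENCES person (person_id),
    kin_type_id INTEGER
);
""")
conn.executemany("INSERT INTO person VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO family VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO persons_families VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 9), (2, 1, 2, 9), (3, 2, 1, 9), (4, 2, 3, 9)],
)


def families_of(person_id):
    """All family IDs that a person is linked to."""
    rows = conn.execute(
        "SELECT family_id FROM persons_families WHERE person_id = ? ORDER BY family_id",
        (person_id,),
    )
    return [row[0] for row in rows]


def members_of(family_id):
    """All person IDs linked to a family."""
    rows = conn.execute(
        "SELECT person_id FROM persons_families WHERE family_id = ? ORDER BY person_id",
        (family_id,),
    )
    return [row[0] for row in rows]


print(families_of(1))   # person 1 appears in two families
print(members_of(2))    # family 2 has two linked persons
```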
Even though the family table is redundant, GEDCOM uses it so we have to get the cardinality of its elements right, or we won't know what to do with its data and subordinate data.
I don't know why the creators of GEDCOM required users to put foreign keys for family elements in INDI records and foreign keys for person elements in FAM records. Could it be they were trying to simulate a junction table? Unfortunately they were not trying to simulate reality, but rather the Christian/Mormon assumption that the nuclear family is an absolute, like a source or person, while ignoring the fact that all actual irreducible elements, such as names, places, citations, etc., need their own primary key. Fortunately, I wrote UNIGEDS before I studied GEDCOM, so there's already a non-redundant way to represent family relationships in UNIGEDS, where these relationships are recorded one relationship at a time instead of being lumped together into artificial family "units". UNIGEDS has no family table because it doesn't need one. But we still have to analyze cardinality correctly so that we know how to use the data GEDCOM provides.
When I started writing this post, I had not realized that family-to-person is a many-to-many relationship, but it certainly is. In the next post I will consider whether or not this means that the double-entry or redundancy in GEDCOM is correct, because it's trying to simulate a junction table. I'll consider whether to augment the current vision of UNIGEDS with the family structure as well.
Post by Uncle Buddy on Feb 9, 2023 4:51:27 GMT -8
UNIGEDS EVENT_TYPES WHERE MARITAL = 1
event_type_id  event_types
-------------  ------------
2              marriage
11             wedding
15             divorce
17             annulment
18             separation
59             marriage license
85             filing for divorce
90             partnership
93             anniversary celebration
98             marriage contract
102            cohabitation
103            living together
104            wedding anniversary
SAMPLE UNIGEDS EVENTS WHERE COUPLE = 0 AND MARITAL = 0
event_type_id  event_types
-------------  ------------
83             adoption
1              birth
4              death
95             fosterage
48             guardianship
6              occupation
13             residence
UNIGEDS KIN_TYPE TABLE
kin_type_id  kin_types
1            father
2            mother
3            parent
4            son
5            daughter
6            child
7            husband
8            wife
9            spouse
10           mate
11           brother
12           sister
13           sibling
14           partner
15           common-law spouse
16           fiance
17           fiancee
18           boyfriend
19           girlfriend
20           lady friend
21           lover
22           mistress
23           betrothed
24           groom
25           bride
26           biological parent
27           biological father
28           biological mother
110          adoptive parent
111          adoptive father
112          adoptive mother
120          foster parent
121          foster father
122          foster mother
128          generic_partner1
129          generic_partner2
130          guardian
131          legal guardian
GEDCOM FAM RECORD
0 @f1@ FAM
1 MARR
1 DIV
1 HUSB @i1@
1 WIFE @i2@
0 @f2@ FAM
1 MARR
1 HUSB @i1@
1 WIFE @i3@
1 CHIL @i8@
1 CHIL @i6@
1 CHIL @i7@
1 CHIL @i92@
GEDCOM INDI RECORD
0 @i1@ INDI
1 NAME Anthony Edward /Munro/
1 SEX M
1 FAMC @f3@
1 FAMS @f1@
1 FAMS @f2@
0 @i2@ INDI
1 NAME Julia Amanda /Fish/
1 SEX F
1 FAMC @f37@
1 FAMS @f1@
0 @i3@ INDI
1 NAME Susan Isabel /Dowling/
1 FAMC @f24@
1 FAMS @f2@
1 FAMS @f21@
There would be some advantages to actually using a family table along with the required junction table which would look like this based on the GEDCOM lines shown above:
PERSON
person_id  gender
1          M
2          F
3          F
8          F
6          M
7          F
92         M
FAMILY
family_id
1
2
3
PERSONS_FAMILIES
pfid  family_id  person_id  kin_type_id  age
1     1          1          9            25
2     1          2          9            28
3     2          1          9            30
4     2          3          9            29
FINDING
finding_id  person_id  family_id  event_type_id  age  date  particulars      nested_place_id
45          null       1          2
48          null       1          15
59          null       2          2
167         8          2          1              0
54          6          2          1              0
89          7          2          1              0
498         92         2          1              0
87          1          3          1
354         1          null       6                         4 bdrm bungalow  17
34          1          null       13                  1945  gardener         17
12          1          null       13                  1967  landscaper       19
I try to be honest with myself even when it hurts. The fact is, I've sweat blood over the couple and child relationships. It works OK now except that baseless families can't be made. There isn't even an input for a partner in the GUI families table until the user inputs a marriage-like event or a child. The families table was hard to get right. I'd assumed that this was because of my lofty goal of showing every partner and child of the current person in the same table. I still agree with this goal, but would it have been easier if I'd used a family element? If I did use a family element, is it really a rigid, inaccurate structure or is it just a better idea that I didn't think of myself?
If I decide it would be better to use a family table, a lot of restructuring will have to be done in the database, which is not hard. But a lot of code will also have to be rewritten, code that I sweat blood over.
Did I mention that I have sweat blood over this feature? And it already works?
If UNIGEDS is gonna replace GEDCOM, it has to do what GEDCOM can do, and do it better. If people want to say, "Ron was the dad and Mary was the mom," and nothing else, and then they want to see a family with no marital events and no children, it would be nice to be able to do that without having to auto-create a generic marriage-like event without the user's permission.
I have to consider whether all genieware has a family unit just because it's the right way to represent the data. I have to wonder whether I've been missing something, and making my job harder by oversimplifying something.
All because I got the cardinality wrong, there may be hell to pay. But if it makes UNIGEDS better, it has to be done. What I can't do is dumb down UNIGEDS to match GEDCOM. That would just be more GEDCOM. However, a structure that fits the expectations of vendors--if they are correct--is essential to its ever being adopted.
The cardinality for marital events also has to be revisited. This has been another case of try again, and again, and again. It's working fine now but if you look at GEDCOM, the marital event is linked to the family table, instead of being stored in the event table. Is this a good thing or a bad thing? One family can have many events: marriage, divorce, remarriage; but one event belongs to one family. Family-to-event has a one-to-many relationship so family ID is a foreign key in the event table where UNIGEDS stores events. This is currently being accomplished by storing a person_id, person_id1, and person_id2 in the event table, along with two age columns and two kin type columns. The new arrangement will remove six columns from the event table and replace them with four columns in the persons_families table. It doesn't change the cardinality of family-to-event, which I already had right, but it's a simpler way to say the same thing, and I think it will solve the problem of fixing the unlikability of the current solution even though the current solution works. The current solution: person_id is used for the person born, while person_id1 and person_id2 refer to parents and partners. A somewhat unlikable, slightly messy solution.
It occurred to me that if the family is an inferior genealogy element due to its being hard to define clearly, then I have to find a way to define it clearly. Here's a stab at it: a family is 1) a collection of individuals based on marriage-like events, children in common, or an arbitrary decision, and 2) entirely independent of time. The second item means, for example, if little Chester died in infancy in 1884, he is still part of the family in 1886. If this is accepted as a definition, then the family does have a definition: two people and their biological children. (What about adoptive families? What if the father adopts the child and the mother is a biological mother or a foster mother? That's where kin type comes in.) A family based on an arbitrary decision is a baseless family that's given a family ID only because two people are entered together in the GUI as a couple, even though they have no children or marital events entered. Users expect this, and since the tables where these events are displayed are conclusion tables, there's no harm in displaying baseless couples.
I still think it's wrong to give a residence event to a family, and UNIGEDS won't allow it. This doesn't work with my definition of family ("entirely independent of time"). In terms of trackability and accuracy, a family doesn't live somewhere; individuals live somewhere together. When my birth family left Colorado, we left a sister behind. When we left New Mexico, we left another sister behind. When my dad and I left Kansas, we left my mom and brother behind. We're still the same set of related people--related by biological parenthood, not by place or time--but trying to say where the family lived as a unit could not be done. GEDCOM wrongly allows a family to reside somewhere as a unit, which isn't necessary since GEDCOM also allows individuals to reside somewhere. UNIGEDS has never even considered making "residence" a multi-person event. How could GEDCOM keep doing this in version after version?
The new system might make it easier to assign offspring events to parents, which Treebard does automatically based on birth events it finds in UNIGEDS. Anyway, it won't be harder.
Post by Uncle Buddy on Feb 9, 2023 4:51:57 GMT -8
My aunt had a slightly unusual nuclear family scenario which I'll try to record in the new family system. She had a child with a first husband, divorced him, and had a child with a second husband. The second husband legally adopted the first child.
event_type_id  event_types
-------------  ------------
83             adoption
1              birth
4              death
95             fosterage
48             guardianship
6              occupation
13             residence
PERSONS_FAMILIES
pfid  family_id  person_id  kin_type_id  age
1     1          1          9            25
2     1          2          9            28
3     2          1          9            30
4     2          3          9            29
5     99         390        7            25
6     99         332        8            18
7     102        332        8            21
8     102        396        111          23
Offspring events are auto-constructed, not recorded. We can't record the birth event again from the parents' point-of-view. The GUI creates offspring events for the two parents when it finds a birth event in the event table.
EVENT
event_id  person_id  family_id  event_type_id  age   date  particulars      nested_place_id
45        null       1          2              null
48        null       1          15             null
59        null       2          2              null
167       8          2          1              0
54        6          2          1              0
89        7          2          1              0
498       92         2          1              0
87        1          3          1              0
354       1          null       6                          4 bdrm bungalow  17
34        1          null       13                   1945  gardener         17
12        1          null       13                   1967  landscaper       19
25        250        99         1              0     1955                   49
28        259        99         1              0     1958                   49
66        null       99         2              null  1955
75        null       99         15             null  1956
78        null       102        2              null  1957
92        250        102        83             1     1956
The act of becoming an adoptive parent for person number 396 (my aunt's second husband) is constructed by the GUI when it comes upon an adoption event, but unlike a birth event, it doesn't give the adoption to both partners. In the case of my aunt, the older child was hers biologically so adopted only by the second husband.
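The auto-construction described above might look roughly like this. It is a sketch under assumptions, not Treebard's actual code: the function name, the tuple layouts, and the labels "offspring" and "adoption of child" are invented for illustration, while the IDs come from the sample tables in this post.

```python
# Event type and kin type IDs as listed in the UNIGEDS tables in this thread.
BIRTH, ADOPTION = 1, 83
ADOPTIVE_KIN_TYPES = {110, 111, 112}   # adoptive parent / father / mother

# (family_id, person_id, kin_type_id) for my aunt's two families
persons_families = [
    (99, 390, 7), (99, 332, 8),
    (102, 332, 8), (102, 396, 111),
]

# (event_id, person_id of the child, family_id, event_type_id)
events = [
    (25, 250, 99, BIRTH),
    (28, 259, 99, BIRTH),
    (92, 250, 102, ADOPTION),
]


def offspring_events(events, persons_families):
    """Derive parent-side events without storing them in the database."""
    derived = []
    for event_id, child_id, family_id, event_type_id in events:
        for fam_id, parent_id, kin_type_id in persons_families:
            if fam_id != family_id:
                continue
            if event_type_id == BIRTH:
                # A birth is given to both partners of the family.
                derived.append((parent_id, "offspring", child_id, event_id))
            elif event_type_id == ADOPTION and kin_type_id in ADOPTIVE_KIN_TYPES:
                # An adoption is given only to the partner with an adoptive kin type.
                derived.append((parent_id, "adoption of child", child_id, event_id))
    return derived


derived = offspring_events(events, persons_families)
for row in derived:
    print(row)
```

With this data, the adoption (event 92) lands only on person 396, because person 332 is linked to family 102 as a wife, not as an adoptive mother.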
There's a problem that I've just noticed. Age at marriage can't go into the persons_families table. Families have to be entirely independent of time, or they have a changing definition. Because they're independent of time, they can't mention events or age. I was hoping to put age in a column on the junction table since there are separate lines for the two partners. This won't work.
The ages of the partners can't be kept in one table cell such as "26/27" at the time of marriage. That's another way to denormalize a database. I suspect that the right solution here might be to separate couple event types and generic event types into two separate tables, and separate couple events and generic events into two separate tables. This would be a good time to get rid of the term "finding" and use the word "event" instead. It's a throwback to when I was trying to re-invent genealogy from scratch, which is not my goal anymore. I do still like showing events and attributes on the same conclusions table, but instead of calling a collection of conclusions on a row of the conclusions table a finding, it can be changed to the normal word "event" which will signify both events and attributes.
But would that even solve the problem? Here's a hypothetical schema for a couple-event table:
couple_event_id family_id event_type_id person_id1 person_id2 age1 age2 date particulars nested_place_id
I don't have to add data to this to see the problems. It's obviously very close to the event table I'm talking about breaking up. But the real problem is the double-entry of person_id1 and person_id2 to indicate a family when that data has already been put into the family table. This is denormalization.
Before changing anything, it needs to be reiterated that the way I'm doing it already works and is not denormalized.
I was all set to go to all this trouble of restructuring the database and rewriting a bunch of queries and functions when this problem was spotted: how to record age separately for the two heads of the family. Well maybe GEDCOM knows. GEDCOM's been doing this a lot longer than I have.
GEDCOM puts the couple events in the FAM record.
I have to suspect that this could be wrong, I mean the cardinality has to be inspected before even considering it. Not that I want to consider it, because events get linked to dates also. That's age and date in the family table, but we already decided that a family only has a fixed boundary around it if it's defined independently of time. But I want to isolate my thinking to one problem at a time, and right now the problem is cardinality.
It was decided above that family:event is a one-to-many relationship. A couple can experience many events, but each couple event relates to one couple only. So a foreign key for family has to go into the event table. This might help the schema of the hypothetical table; I see that the column family_id says the same thing as the columns person_id1 and person_id2. So that leaves:
couple_event_id family_id event_type_id age1 age2 date particulars nested_place_id
That's not very satisfying, because we don't know which person gets age1 and which gets age2. Much worse than that, we see that by putting an event into the family table, GEDCOM has the cardinality backwards. The family is supposed to be referenced in the event table, and the event is not supposed to be mentioned in the family table. Not that GEDCOM is a SQL database. But that's the whole problem, macroscopically speaking. That's why UNIGEDS is being created so carefully: so that when GEDCOM is replaced, its replacement is not another freak of nature like GEDCOM. Well, not really a freak of nature. More like a six-headed monster from another galaxy. Well, that's too harsh. More like a slave trying to carve granite into pyramid blocks with copper chisels and wooden hammers. Yeah, that's the ticket. Status-quo's typical "good enough for government work" alibi for everything. The emperor with no clothes perpetrated as "the genealogy standard". Stuff like that.
Philosophically speaking, I noticed a long time ago that time considerations such as age and date are not elements of genealogy. They can't get IDs. You can't assign a unique ID to every moment or even every day. A date has to serve as its own ID, there's too many of them. Age, like date is just a way to measure time. In any case, these things have a one-to-one relationship with the event they're describing. Where the event is identified, that's where date and age go, in the same row of the same table as the event_type_id.
It occurs to me that a junction table is needed, because a couple event is linked to more than one person and each of these persons can be linked to more than one couple event. But date and age don't both go in a junction table. Date is relative to the couple event, so it's the same for both persons. Age is measured from birth, which is different for each person.
Another correction needs to be made, since it was determined that a family:child is many:many. A couple has several children and the children each belong to only the one family. So it seems like a one:many relationship. But the mother dies and the father drops his children off at the orphanage on his way to greener pastures and each child is adopted by a different family (or families plural if you count temporary foster homes). Now we see that each child can also have several families. So the birth, fosterage, guardianship and adoption events have to be moved to the couple event table which therefore has to be renamed the family_event table. This completely eliminates the family_id column in the single event table. A row in the age table is not needed for birth events since the vast majority of folks are born at the age of zero, but for adoption etc., a row in the age table can be used.
As it turns out, since a child can have multiple birth/adoption/fosterage/guardianship events but each of these events refers to only one child, this is a one-to-many relationship and the person_id goes in the family_event table as a foreign key. So birth events are recorded in the family_event table, unlike couple events, and the persons_families table should be called the persons_couples table since its person_id column can't be used for the children in the family. Possibly the word "family" should be replaced with "couple" in other places too since it's the pairing of two persons that seems to define a "family" whether or not there are any events or children.
AGE
age_id  family_event_id  person_id  age
123     45               1          25
156     45               2          28
199     48               1          26
255     48               2          29
414     59               1          30
426     59               3          29
155     66               390        25
156     66               332        18
244     75               390        26
266     75               332        19
854     78               332        21
952     78               396        23
584     92               250        1
EVENT
event_id  person_id  event_type_id  age  date  particulars      nested_place_id
354       1          6                         4 bdrm bungalow  17
34        1          13                  1945  gardener         17
12        1          13                  1967  landscaper       19
FAMILY_EVENT
family_event_id  family_id  child_id  event_type_id  date  particulars  nested_place_id
45               1          null      2
48               1          null      15
59               2          null      2
66               99         null      2              1955
75               99         null      15             1956
78               102        null      2              1957
167              2          8         1
54               2          6         1
89               2          7         1
498              2          92        1
87               3          1         1
25               99         250       1              1955               49
28               99         259       1              1958               49
92               102        250       83             1956
PERSONS_FAMILIES
pfid  family_id  person_id  kin_type_id
1     1          1          9
2     1          2          9
3     2          1          9
4     2          3          9
5     99         390        7
6     99         332        8
7     102        332        8
8     102        396        111
9     2          8          6
10    2          6          6
11    2          7          6
12    2          92         6
13    3          1          6
14    99         250        6
15    99         259        6
16    102        250        6
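As a rough check that the separate age table answers the "which partner gets which age" question, here is a minimal sqlite3 sketch. It loads only a few of the sample rows above (my aunt's first marriage and divorce), omits the particulars and nested_place_id columns, and uses a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family_event (
    family_event_id INTEGER PRIMARY KEY,
    family_id INTEGER,
    child_id INTEGER,
    event_type_id INTEGER,
    date TEXT
);
CREATE TABLE age (
    age_id INTEGER PRIMARY KEY,
    family_event_id INTEGER REFERENCES family_event (family_event_id),
    person_id INTEGER,
    age INTEGER
);
""")
conn.executemany(
    "INSERT INTO family_event VALUES (?, ?, ?, ?, ?)",
    [(66, 99, None, 2, "1955"), (75, 99, None, 15, "1956")],
)
conn.executemany(
    "INSERT INTO age VALUES (?, ?, ?, ?)",
    [(155, 66, 390, 25), (156, 66, 332, 18), (244, 75, 390, 26), (266, 75, 332, 19)],
)


def ages_at_event(family_event_id):
    """Return (date, person_id, age) for each person linked to a family event."""
    return conn.execute(
        """SELECT fe.date, a.person_id, a.age
           FROM family_event AS fe
           JOIN age AS a ON a.family_event_id = fe.family_event_id
           WHERE fe.family_event_id = ?
           ORDER BY a.age_id""",
        (family_event_id,),
    ).fetchall()


print(ages_at_event(66))   # the 1955 marriage
print(ages_at_event(75))   # the 1956 divorce
```

The date is stored once on the event row, while each person's age gets its own row, so nothing has to be called age1 or age2.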
Now all I have to do is replace the word "finding" with the word "event" in all my tables, databases, queries, variables, classes, functions, etc. Then rewrite the SQL queries and Python code to reflect the new database structure with an added and active family table. No biggie, it should only take a week or two. That usually translates to a month or two, but what is an old man without something to keep him busy? Same as a young man without something to keep him busy, but older.
First I have to finish my GEDCOM import module, the places tab, the sources tab, and whatever else I've gotten started.
Post by Uncle Buddy on Feb 12, 2023 1:23:30 GMT -8
There's something I didn't take into account when I replaced the database tables that were being used to store data about family relationships such as parent, child, foster parent, etc.
I used to have a person_id1 and a person_id2 which both referenced person IDs (primary keys from the person table). The male/father/person on the left of the GUI design was person_id1 and the female/mother/person on the right of the GUI design was person_id2. This is not good database design, as far as I know, and I was happy to get rid of it. You aren't supposed to have column order in a database table mean anything.
The same information is now stored in a many-to-many table called family_couple which references a family ID and a person ID in one row. The rows with the same family ID constitute a couple. Of course it has nothing to do with gender, don't worry about that.
When I started looking at rewriting queries, only then did I realize that I now have no way to code which is the parent on the left and which is the parent on the right. It won't do to look at which is male/female as there isn't a guarantee that there will be one of each, so I guess I'll need a boolean column or something so that the schema will look something like this:
family_couple
family_id INTEGER REFERENCES family (family_id)
person_id INTEGER REFERENCES person (person_id)
kin_type_id INTEGER REFERENCES kin_type (kin_type_id)
which_partner BOOLEAN
So for example, 0 in the boolean column is for what used to be person_id1, and 1 in the boolean column is for what used to be person_id2.
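Here is a minimal sketch of how the boolean column could be used to put the partners back in left/right order. SQLite has no separate boolean storage class, so the column holds 0 or 1 as integers. The primary key column, the CHECK and UNIQUE constraints, and the `partners` helper are assumptions added for the example, and the REFERENCES clauses are left out so the snippet runs without the other tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE family_couple (
    family_couple_id INTEGER PRIMARY KEY,
    family_id INTEGER,
    person_id INTEGER,
    kin_type_id INTEGER,
    which_partner BOOLEAN CHECK (which_partner IN (0, 1)),
    UNIQUE (family_id, which_partner)   -- assumed: one left and one right partner per family
);
""")
conn.executemany(
    "INSERT INTO family_couple (family_id, person_id, kin_type_id, which_partner) "
    "VALUES (?, ?, ?, ?)",
    [(1, 2, 9, 1), (1, 1, 9, 0), (2, 3, 9, 1), (2, 1, 9, 0)],
)


def partners(family_id):
    """Return (left_partner_id, right_partner_id) regardless of insertion order."""
    rows = conn.execute(
        "SELECT person_id FROM family_couple WHERE family_id = ? ORDER BY which_partner",
        (family_id,),
    ).fetchall()
    return tuple(row[0] for row in rows)


print(partners(1))
print(partners(2))
```

Ordering by which_partner gives the old person_id1/person_id2 order back without relying on column order or gender.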
Post by Uncle Buddy on Feb 12, 2023 22:24:25 GMT -8
sqlite> select * from event;
event_id  date                particulars  age  person_id  event_type_id  date_sorter  nested_place_id
--------  ------------------  -----------  ---  ---------  -------------  -----------  ---------------
1         -1945---------                        1          4              1945,0,0     1
2         -1922---------      musician     43   1          6              1922,0,0     2
3         -0000-00-00-------                    1          13             0,0,0        1
4         -0000-00-00-------               0    3          1              0,0,0        1
5         -0000-00-00-------               0    4          1              0,0,0        1
6         -0000-00-00-------               0    5          1              0,0,0        1
7         -0000-00-00-------               0    6          1              0,0,0        1
sqlite> select * from family_event;
family_event_id  date                particulars  person_id  event_type_id  date_sorter  nested_place_id  family_id
---------------  ------------------  -----------  ---------  -------------  -----------  ---------------  ---------
1                -0000-00-00-------                          2              0,0,0        1                1
The two tables `event` and `family_event` are very similar. Splitting them into two tables has introduced a new problem that will force me to rewrite a lot more code than should be necessary. This is because there are now two separate primary keys, which I quickly learned the hard way is going to mess up my algorithm for displaying events in the conclusion table. That's because each event gets a row based on a dictionary key which is the same number as the primary key in the database table. But as you can see in the above output, the dict key `1` that's gotten from the `event` table primary key will be overwritten by the primary key from the `family_event` table since both PKs are 1.
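The collision can be reproduced in a few lines. The dictionary layout here is a simplified stand-in for the real one, and the tuple-keyed variant is only one hypothetical workaround, shown for comparison; the IDs and event types come from the sqlite output above.

```python
# (primary key, event_type_id) pairs taken from the sqlite output above
event_rows = [(1, 4), (2, 6)]       # from the event table
family_event_rows = [(1, 2)]        # from the family_event table

# Keyed by primary key alone: the second table overwrites key 1.
findings = {}
for event_id, event_type_id in event_rows:
    findings[event_id] = {"event_type_id": event_type_id}
for family_event_id, event_type_id in family_event_rows:
    findings[family_event_id] = {"event_type_id": event_type_id}

# Hypothetical alternative: include the table name in the key.
findings_by_source = {}
for event_id, event_type_id in event_rows:
    findings_by_source[("event", event_id)] = {"event_type_id": event_type_id}
for family_event_id, event_type_id in family_event_rows:
    findings_by_source[("family_event", family_event_id)] = {"event_type_id": event_type_id}

print(len(findings), findings[1])
print(len(findings_by_source))
```

Three rows went in, but the first dictionary ends up with only two entries, and key 1 now holds the marriage (event type 2) instead of the death (event type 4).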
To relieve the burden of redesigning a whole new dictionary (probably by adding another layer of nesting to the dict), I hope to recombine the two tables once again to match the way things worked before. The family, family_couple, and age tables will still be used for couples and parents. The new event table will be about the same as the `event` table schema shown above with only a `family_id` column added.
There are couple events such as "marriage" or "first kiss". There are generic events such as "occupation" and "residence" which apply to individuals. And there is the "birth" event which is a generic event for the person born as well as a couple event for the parents. So for birth events, all the columns will be used. For couple events, the person_id and age columns will be null. For generic events, the family_id column will be null.
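A minimal sketch of that null pattern, assuming a simplified `event` table (the `event_type` text column and the sample rows are stand-ins; the real table uses `event_type_id` and more columns) and a hypothetical `classify` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event (
        event_id INTEGER PRIMARY KEY,
        event_type TEXT,    -- simplified stand-in for event_type_id
        person_id INTEGER,  -- NULL for couple events
        age TEXT,           -- NULL for couple events
        family_id INTEGER   -- NULL for generic events
    );
    -- Hypothetical sample rows.
    INSERT INTO event VALUES
        (1, 'birth', 7, '0', 1),
        (2, 'marriage', NULL, NULL, 1),
        (3, 'occupation', 1, '43', NULL);
""")


def classify(person_id, family_id):
    """Decide what kind of event a row is from which columns are filled."""
    if person_id is not None and family_id is not None:
        return "birth"
    if family_id is not None:
        return "couple"
    return "generic"


kinds = {
    event_id: classify(person_id, family_id)
    for event_id, person_id, family_id in conn.execute(
        "SELECT event_id, person_id, family_id FROM event"
    )
}
print(kinds)
```

Because there is a single primary key again, each event keeps a unique dictionary key.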
That should make it work a lot more like the old way (or I could say, "like the old algorithm", but certain forces in modern online society have ruined the word "algorithm").
Post by Uncle Buddy on Feb 13, 2023 0:56:32 GMT -8
That worked. An interesting repercussion of this change is that you can delete an event now, such as marriage, without having to worry about deleting the relationship. The event is just an event, it no longer defines the relationship. The relationship can now be "baseless", i.e. based on the decision of the user to create a family based on two people forming a couple.
Post by Uncle Buddy on Feb 13, 2023 19:37:44 GMT -8
Before, kin_type_id had a one-to-one relationship with each event. For example, the respective kin_type_ids for the two partners in an event row might express "spouse" or something for a marriage event, but the same two people could be labelled parents for a birth event. This is not being expressed properly by the new code so far, with a family_id link in the event row. Now a family element is a couple and their corresponding children. So one thing that's wrong is that the family_id link in the event table would be the same for a couple's wedding, marriage, divorce, children and first kiss. To make the code work without major changes, what's needed instead as a 1:1 link in an event row might be a link to a row in the family_couple table. Then the same two people could appear in the family_couple table as more than one pair of rows, one pair for each way of describing their relationship. I can't link a birth event to a whole family as I've been trying to do, because siblings don't give birth to each other. If that doesn't work, then still another table might be needed for links to kin_type_id. So the cardinality of that relationship between an event and a pair of kin types has to be looked at in order to decide what to improve and how.
In the past, when working with these same issues, I had several unnecessary junction tables and finally realized that the need to use the primary keys of junction tables as foreign keys in other tables was a tip-off: the elements had a one-to-one relationship with each other. So I was able to drastically simplify the code and the queries by combining three or four extraneous tables, so all that one-to-one data could be in the same row of the same table. This was the right thing to do under the circumstances, but the circumstances have changed. I have a new family element that didn't exist before in the data structure. There are advantages to ripping a bunch of columns out of a wide table, but the queries become more complex because joins are more common, so there's more to think about and try to understand before it will be right. So the question is: What is the cardinality of event-to-kin_type?
In the previous approach (I can't say the word "algorithm"), it was one-to-one, which is obviously correct in regards to a couple's corresponding kin types. If the event is a birth, the kin types are mother and father. If the event is a fosterage, the kin types are foster mother and foster father. If the event is a marriage, it's some version of husband and wife, spouse and spouse, partner and partner. We don't dictate these things, the user decides what kin types to use except for biological parents. But there's only one pair of parents. Right?
I don't think so. There are two things wrong with this. First, it's conceivable that there could be more than one way to describe the kin relationship between a couple as regards the birth of their child. But maybe not likely enough to worry about. The same is true in regards to a couple event. There could be more than one correct way to describe the relationship between the couple. However, I don't think it's unreasonable to ask the user to choose one or create one of his own. Allowing pointless splitting of hairs would be counterproductive.
Secondly, and this is a big deal, what is a couple anyway? It's just our current "civilized" understanding of how to head up a family. We can't build a data structure that won't allow three or more people to be married to each other at the same time. This could happen either legally or illegally, depending on the customs of the times. The religion of Mormonism itself, which has genealogy built into its dogma, once allowed and encouraged polygamy. But did they build this into GEDCOM or into the Personal Ancestral File app? More likely, they built dishonesty into their computer genealogy because in the computer age, the polygamous past is considered wrong and/or embarrassing so people want to pretend it didn't happen. UNIGEDS can't take this attitude. It has to be possible to record polygamous relationships.
So the whole idea of person_id1 & person_id2 for recording a relationship was wrong to begin with. Fortunately the new approach doesn't have this limitation. You can have three lines in family_couple that are linked to the same family. But there's a problem with that too. There's only one biological mother and one biological father. Not that every woman knows who fathered their child, but that's almost a different issue. The problem is that we'd decided to define a family element as a biological father and mother and their exact children.
What's coming home to me right now is how impossible it is to truly define a family unit. If not impossible, then very slippery. But if genealogy cares about history, then polygamy has to be recordable without ugly workarounds. Just as we've gone to the trouble of giving foster parents, adoptive parents, and guardians a place in UNIGEDS, simultaneous plural marriage partners have to be allowed too. So the person_id1 & person_id2 approach would have had to be abandoned anyway, whether a family element was going to be added to the data structure or not.
One approach might be to allow this sort of thing:
family_id = 42 (this table has only an ID column so far)
family_couple table (which might need to be renamed again so that this doesn't feel like a workaround):
family_couple_id  family_id  person_id  kin_type_id  which_partner
27                42         5          1            0
92                42         10         2            1
203               42         25         129          1
The `which_partner` column tells Treebard whether to display the person on the left or the right in a GUI which puts spouses next to each other depending on whether the person is the husband or wife. The GUI can do what it wants with this information. The data above would express a man with two wives. Here's a situation where a man has families with both a legal wife and a mistress. Remember that a family element doesn't have anything to do with time, so in order to show that the man had two families concurrently, it would have to be done in the GUI with dated events.
family_couple_id  family_id  person_id  kin_type_id  which_partner
27                42         5          1            0
92                42         10         2            1
203               54         25         2            1
209               54         5          1            0
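To check that the two-family data above really can be read back per person, here is a small sketch using the same four rows in an in-memory SQLite database. The helper names `families_of` and `members_of` are hypothetical, and `which_partner` is stored as a plain integer for simplicity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE family_couple (
        family_couple_id INTEGER PRIMARY KEY,
        family_id INTEGER,
        person_id INTEGER,
        kin_type_id INTEGER,
        which_partner INTEGER
    );
    INSERT INTO family_couple VALUES
        (27, 42, 5, 1, 0),
        (92, 42, 10, 2, 1),
        (203, 54, 25, 2, 1),
        (209, 54, 5, 1, 0);
""")


def families_of(conn, person_id):
    """Every family_id in which the person appears."""
    rows = conn.execute(
        "SELECT DISTINCT family_id FROM family_couple "
        "WHERE person_id = ? ORDER BY family_id",
        (person_id,),
    )
    return [family_id for (family_id,) in rows]


def members_of(conn, family_id):
    """(person_id, kin_type_id, which_partner) for each member of a family."""
    return conn.execute(
        "SELECT person_id, kin_type_id, which_partner FROM family_couple "
        "WHERE family_id = ? ORDER BY which_partner",
        (family_id,),
    ).fetchall()


print(families_of(conn, 5))
print(members_of(conn, 54))
```

Person 5 shows up in both families, and nothing in the table says when; the dates would still have to come from events.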
Two more questions come to mind. Why can't children go into this table too, and why not put all of these folks right into the family table? And is there a better way to record kin type? OK, that's three questions.
"Child" is a kin type that I've never had to use. The child is the person_id in a birth event; what else is there to say? But then we have this new element to deal with, the family element, and GEDCOM wants a husband, a wife, and some children. No way can I start designing based on GEDCOM's structure, I'm just trying to include GEDCOM so it won't be left out. Is there a reason to not add a couple of children like this:
family_couple_id  family_id  person_id  kin_type_id  which_partner
27                42         5          1            0
92                42         10         2            1
99                42         17         6            null
238               42         29         6            null
Something was pulling at me a minute ago, something like, "Is the whole kin type being recorded all wrong?" I think what I'm getting at is a kin type table instead of a kin type column. I can't call it that because there's already a kin_type table that looks something like this:
kin_type_id  kin_types  built_in  hidden
1            father     1         0
2            mother     1         0
6            child      1         0
128          partner_1  1         0
129          partner_2  1         0
Do I need to add a relationship table? What would such a thing look like? Would it replace the current family_couple table, or would it be a new added table?
relationship_id family_id person_id kin_type_id which_partner
Well that's nothing new. Come to think about it, this approach already allows a person to have more than one kin type in a family. I don't know what it would be used for, but if it's important, the GUI could display it somehow. What about answering why all this information isn't just put into the family table?
That's easy. It would have in-built limitations that won't allow the real world to be represented. In this anti-schema, each column except the first would reference a person_id:
family_id  father_id  mother_id  child1_id  child2_id  partner_id
42         5          10         17         29         25
It seems to me that the kin type system takes care of the obvious limitations of the above anti-schema. A fixed number of columns doesn't work for children, and in the case of polygamy, you'd need a number of partner columns too.
Using the existing `family_couple` table as-is and just renaming it `families_persons` as it should have been named anyway, here's a family with a man, two children, and two wives. I suddenly don't see a problem with this:
family_couple_id  family_id  person_id  kin_type_id  which_partner
27                42         5          1            0
92                42         10         2            1
99                42         17         6            null
238               42         29         6            null
56                42         25         129          1
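A sketch of how a GUI might read that family back, grouping members by kin type. It assumes the table has been renamed `families_persons` with a `families_persons_id` key, uses the kin_type rows listed earlier in this post, and introduces a hypothetical `family_roles` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE kin_type (kin_type_id INTEGER PRIMARY KEY, kin_types TEXT);
    INSERT INTO kin_type VALUES
        (1, 'father'), (2, 'mother'), (6, 'child'),
        (128, 'partner_1'), (129, 'partner_2');
    CREATE TABLE families_persons (
        families_persons_id INTEGER PRIMARY KEY,
        family_id INTEGER,
        person_id INTEGER,
        kin_type_id INTEGER REFERENCES kin_type (kin_type_id),
        which_partner INTEGER
    );
    INSERT INTO families_persons VALUES
        (27, 42, 5, 1, 0), (92, 42, 10, 2, 1),
        (99, 42, 17, 6, NULL), (238, 42, 29, 6, NULL),
        (56, 42, 25, 129, 1);
""")


def family_roles(conn, family_id):
    """Group the person_ids in one family by kin type name."""
    roles = {}
    rows = conn.execute(
        """
        SELECT kin_type.kin_types, families_persons.person_id
        FROM families_persons
        JOIN kin_type ON kin_type.kin_type_id = families_persons.kin_type_id
        WHERE families_persons.family_id = ?
        ORDER BY families_persons.person_id
        """,
        (family_id,),
    )
    for kin, person_id in rows:
        roles.setdefault(kin, []).append(person_id)
    return roles


roles = family_roles(conn, 42)
print(roles)
```

Any number of children or partners fits, since each one is just another row.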
I don't know if the `which partner` field will be needed in Treebard since displaying the current person as a spouse next to another spouse on a current person tab is redundant, a waste of space, and Treebard's families table doesn't redundantly display the current person twice. But other GUIs might need it.
As for putting the children into a families_persons table, I wasn't going to do it because it doesn't seem strictly necessary. But I'm still trying to sort out what is and is not redundant, now that a family element is going to be considered a necessary part of the data structure. It's always been true that all the needed information about who's in the family can be gleaned from the events. I've been doing that for a long time. However, it might be a lot easier to say that "this approach is not redundant" and if so, then showing a families table would be much easier than it currently is.
Say you have John, Mary and their child Sylvia. The event table already has a person_id reference, so if the event_type is "birth", then the person referenced is being born. But if you were to query a RELATIONSHIP table to simply select relationships, instead of querying an EVENT table to GLEAN relationships, is that redundancy or is it just being practical? I don't want to denormalize UNIGEDS by inputting the same data in two places. But I think I've been missing something.
It seems like the real question all along has been, what to do for a foreign key in the event table row instead of referencing family_id? What we're doing here is trying to replace the six columns I've just removed from the event table, which gave person_id, kin_type_id, and age for both parents at the time of the birth, or both partners at the time of a wedding. But... "both"? What if we're about to be invaded and conquered by Martians, and Martians have twelve genders so they marry twelve people to each other in one ceremony, and they're going to shove this system down our throats (or make us try to simulate it in a conquered culture)? In that case, 100 years from now, we're gonna need genieware that doesn't say "both" partners and stop there. So we might as well get started preparing for a time when the current cultural expectations aren't considered some kind of absolutes.
The answer is that the family table and the families_persons table REPLACE the formerly required act of gleaning family offspring RELATIONSHIPS from the EVENT table. The new system is not redundant at all; what's redundant is the notion of adding a foreign key to the event table to say whose kid is being born. To display the families table, you just go straight to the families_persons table and query it. The data structure is not denormalized; you only add the relationship data once. If you don't have to glean parents from a birth event, then the kin_type category has finally come into its own, and you no longer have to figure out how to reference kin type in the event table at all. This is a revolution in thought. I'll have to try it to see if I'm right. If so, then events displayed in the conclusions table might have less code dealing with whether an event is a generic event or a couple event. We'll see about that. If so, then this will simplify the queries and the code, not make it more complicated.
As for current progress, I don't mind making major changes, but first the code should work as-is with the database structural changes I've just made, with as few changes to the logic code as possible, so there will be a good foundation for making any larger changes that are needed or desirable. The families table is turned off for now, I'm still dealing with query changes in the conclusions table where events are displayed as a row of conclusions (date, place, particulars, age, and role; conclusions about names are in the names tab).
Post by Uncle Buddy on Feb 13, 2023 20:50:08 GMT -8
In short, the theory is that up till today, the only reason I thought that a family element was redundant is that I was trying to mix event data with relationship data, and then tease them apart again by gleaning relationship data from events. Instead of keeping events and relationships isolated from each other and simply querying relationship tables for relationship data, I've been laboriously squeezing relationship data out of events where the data was misplaced to begin with.
Post by Uncle Buddy on Feb 13, 2023 21:30:11 GMT -8
select * from families_persons;
families_persons_id  family_id  person_id  kin_type_id  which_partner
-------------------  ---------  ---------  -----------  -------------
1                    1          1          9            0
2                    1          6          9            1
3                    2          2          1            0
4                    2          3          2            1
5                    3          5          1            0
6                    3          4          2            1
7                    1          7          6
8                    1          8          6
9                    2          1          6
10                   3          6          6
11                   1          6          2            1
12                   1          1          1            0
Now, if you wanna know something, you just ask. Children?
SELECT names
FROM person
JOIN name ON person.person_id = name.person_id
JOIN families_persons ON person.person_id = families_persons.person_id
WHERE family_id = 1 and kin_type_id = 6;
names
-----------------
Patricia Grimaldo
Katrina Grimaldo
Parents?
SELECT names
FROM person
JOIN name ON person.person_id = name.person_id
JOIN families_persons ON person.person_id = families_persons.person_id
WHERE family_id = 1 and kin_type_id in (1, 2) and name_type_id = 1;
names
--------------------------
Jeremiah Laurence Grimaldo
Ronnie Webb
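The same idea can be sketched for every family at once. This loads the twelve `families_persons` rows shown above into an in-memory database and groups them by person ID only, since the person and name tables aren't reproduced here. The `summarize_families` helper is hypothetical, and kin type 9, whose meaning isn't given above, is simply skipped:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE families_persons (
        families_persons_id INTEGER PRIMARY KEY,
        family_id INTEGER,
        person_id INTEGER,
        kin_type_id INTEGER,
        which_partner INTEGER)
""")
conn.executemany(
    "INSERT INTO families_persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 9, 0), (2, 1, 6, 9, 1), (3, 2, 2, 1, 0), (4, 2, 3, 2, 1),
        (5, 3, 5, 1, 0), (6, 3, 4, 2, 1), (7, 1, 7, 6, None), (8, 1, 8, 6, None),
        (9, 2, 1, 6, None), (10, 3, 6, 6, None), (11, 1, 6, 2, 1), (12, 1, 1, 1, 0),
    ],
)

PARENT_KIN_TYPES = (1, 2)  # father, mother
CHILD_KIN_TYPE = 6


def summarize_families(conn):
    """Return {family_id: {"parents": [...], "children": [...]}}."""
    families = {}
    rows = conn.execute(
        "SELECT family_id, person_id, kin_type_id FROM families_persons "
        "ORDER BY family_id, person_id"
    )
    for family_id, person_id, kin_type_id in rows:
        entry = families.setdefault(family_id, {"parents": [], "children": []})
        if kin_type_id in PARENT_KIN_TYPES:
            entry["parents"].append(person_id)
        elif kin_type_id == CHILD_KIN_TYPE:
            entry["children"].append(person_id)
        # Other kin types (such as 9 in the data above) are skipped here.
    return families


summary = summarize_families(conn)
print(summary)
```

No event rows are consulted at any point, which is the whole argument for keeping relationships in their own table.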
Post by Uncle Buddy on Feb 19, 2023 22:51:37 GMT -8
The nuclear families table is working with the new data structure (added family element). Most of the queries are not simpler, they're more complex. More joins are needed, since the old way had all the data in one row of the event table. Aliases are needed since the two partners are in two rows of the families_persons table, so the table often has to get queried twice in one query. Joining to the event table is usually necessary. I can't say that it's easier, but with GEDCOM and other factors detailed above sort of expecting a family element, it was necessary and probably desirable.
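For illustration, the aliasing mentioned here might look something like the following self-join, which pulls both partners of each family into one result row. This is only a sketch over a few rows in the shape of the `families_persons` data above, not one of the actual Treebard queries; the aliases `a` and `b` are arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE families_persons (
        families_persons_id INTEGER PRIMARY KEY,
        family_id INTEGER,
        person_id INTEGER,
        kin_type_id INTEGER,
        which_partner INTEGER
    );
    INSERT INTO families_persons VALUES
        (3, 2, 2, 1, 0), (4, 2, 3, 2, 1),
        (5, 3, 5, 1, 0), (6, 3, 4, 2, 1),
        (9, 2, 1, 6, NULL), (10, 3, 6, 6, NULL);
""")

# The table is joined to itself: alias a is the partner stored with
# which_partner = 0, alias b the partner stored with which_partner = 1.
couples = conn.execute("""
    SELECT a.family_id, a.person_id, b.person_id
    FROM families_persons AS a
    JOIN families_persons AS b ON b.family_id = a.family_id
    WHERE a.which_partner = 0 AND b.which_partner = 1
    ORDER BY a.family_id
""").fetchall()
print(couples)
```

Children drop out because their `which_partner` is null. A family with more than one row flagged 1 would come back as several rows, one per pairing.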
I'm looking forward to rewriting the families.py module, but not right away. Most of it seems to be working, but next time I will try to break it up into three smaller classes instead of one big one. GUI, partners, and parents. These three things work separately, especially partners and parents which don't affect each other at all, so stuffing them into one class and then teasing them apart again is a waste of effort and results in more complicated code and a large, opaque single class full of unrelated stuff. It would be easier to work on if the brain didn't hurt when looking at the code.
When the families.py code is functioning back the way it was before the family table was added to the database, then it will be time to get back into the GEDCOM import program.
Post by Uncle Buddy on Feb 21, 2023 3:47:05 GMT -8
This discussion will continue in the Thinking Out Loud category.