What is cardinality and why should we care
Dec 3, 2023 16:14:16 GMT -8
Post by Uncle Buddy on Dec 3, 2023 16:14:16 GMT -8
You've heard terminology like "one-to-one relationship", "one-to-many relationship", and "many-to-many relationship". I saw these things mentioned when I first started trying to use MS Access to create a dictionary for my study of the Visayan language. (I am the author of Advanced Bisaya for Beginners). The dictionary project came to a halt when I could not find any simple instructions in MS Access to help me decide how to decide among the choices of relationship types listed above.
Switching to SQLite was the best decision I ever made. Now instead of some smart software creating tables for me based on relationships I didn't understand, I had to learn what a table actually was, what column constraints do, how primary keys and foreign keys work together. It would take me a lifetime to master this topic but the thing about SQL is that the basics are simple and you can get started right away without getting a PhD in the topic. By the way, no book or course on this topic will substitute for building your own database from scratch.
That's where Python comes in. It comes with SQLite, if you're on Windows. All you have to do is install SQLite's terminal tool and a code editor like Notepad++ and you're ready to dive in. Do not, under any circumstances, mess around with a GUI program for SQLite that promises to write your tables and queries for you. I absolutely forbid it. You have to be able to understand the mechanics of the situation or you're shooting in the dark.
For a long time with SQLite, I was still groping around and guessing when it came to cardinality. What difference does it make if the relationship between two data types is one-to-one (1:1), one-to-many (1:m) or many-to-many (m:m)? I found myself still guessing on this issue, and it cost me a lot of time wasted doing things over in my database design. A big change in attitude came about the day I decided there had to be a name for this whole topic "1:1, 1:m, or m:m". The name isn't used much, but the name does exist: cardinality. Now I knew what topic was being discussed.
I didn't find the answer in a book on database design, but in a Stack Overflow post. Someone wanted to know how to decide what the relationship is, and most importantly, what difference does it make, that is, what should I do about it in the table design (schema) when I did figure out what the relationship was. This is now my go-to topic in designing and redesigning database schemas, and I no longer have to guess. In my experience, you'll find out when you've got cardinality right because your queries will suddenly start working right and everything will become easy and obvious.
So here is cardinality in a nutshell, and I want to keep this simple but if I've made a mistake or left out something important, join the forum and let me know.
ONE-TO-ONE
As I see it, we're talking about pairs of data here, which makes a database a flexible, growable, shrinkable, multidimensional web of links. A link is a pair of data that are associated somehow. A 1:1 relationship is like a US citizen and his social security number. Each person has one SSN and each SSN refers to only one person. Generally to express a 1:1 relationship, I would place the two columns in the same table so that one record (one row in the table) contains both items of data. If the two items need to be in different tables, with one of them referenced by a foreign key, then a column constraint should be used so that any given foreign key can only be used once in the column. So the two items are still in the same record, with one of them represented by a foreign key.
ONE-TO-MANY
Here we have a person and his name. The trick with identifying 1:m that changed everything for me was this: analyze the relationship from both directions. We're only talking about two items, a pair of data. So let's say Nicolas Kim Coppola is one person, and that he is. He wants a career as an actor but he doesn't want to ride on the coattails of his famous uncle the filmmaker Coppola, so he uses a stage name, Nicolas Cage. Now he has two names but he's still one person. One person in this case has two names. Now to double check and make sure this is really a 1:m relationship, reverse the perspective. Start with the name. Each name refers to exactly one person. See? It looks like a one-to-one if you don't check it both ways. "Nicolas Cage" refers to only one person. One-to-one, right? Wrong! Look both ways, and you'll not only get it right, but you'll know which side is one and which side is many.
This enables you to do the right thing in the schema, and this is what has saved me oh-so-much grief. Once you know the cardinality, so what? What do you do about it? Very important! The foreign key is a column in the MANY side of the relationship. I used to put foreign keys just anywhere and then wonder why my queries didn't work. So in the case of a person table and a name table, the person ID is used as a foreign key in the name table, never the other way around. A person has many names but each name refers to only one person.
MANY-TO-MANY
I have a way of leaving many words in my trail, so my family trees have some large notes, and I noticed when using Genbox that I had to copy-paste large notes all over the place. This is tedious and uncomfortable, because when you're doing it, you know you're bloating your file, and it becomes an intuitive knowledge when you've used computers for a while that repetition in computing tends to be unnecessary. So I decided that Treebard would let you re-use a note anywhere you wanted, and you'd never have to copy-and-paste them, just select existing notes from a list. This feature hasn't been easy to design, since the genieware user won't remember IDs for notes. Just as important as how to make this work for the user in the GUI, how do you do it in the database?
Let's say there's a note linked to a name which also gives details about a place. We use notes for things that don't break down into simple, one-line data, so there will be more than one topic in a larger note. I want to link the same note to both a person's name and a place. Analyze the relationship, again from both directions: for each name, there could be many notes. And we want to use each note for many things, whether it's multiple names or even many different types of things, like a name and a place, linked to the same note. So this is a many-to-many relationship.
A m:m relationship is expressed in a database by two foreign keys in the same table row. The table should be a junction table, not the note table or the name table but in my case a notes_links table. The note table itself just has a primary ID column and a column for the note text. The note_id from the note table is used as a foreign key in the notes_links junction table, and in the same row is a foreign key from the name table. In another row, the same note_id will be used with a place_id. The two foreign keys are in the same row of the same table, that's what constitutes a link. So the junction table notes_links will have several columns, one for each element that you might want to link a note to: person_id, name_id, place_id, source_id, etc. Most of these columns will have null values, since many-to-many relationships are often between two data items only.
The notes feature in Treebard is kinda complex, because I've decided the user has to add a title to each note. This is so he can select notes to re-use from a list. Instead of a note ID, which he wouldn't remember, he selects the desired note from a dialog that lists the note title and the first few words in the note. This system isn't perfect but I'm giving it a try to see if it's practical. So far it works fine.
The end result so far is a notes_links table with a few extra columns in it besides the several foreign key columns. The note topic only has to be unique within a given linked element in order to be used as an identifier, so it would be possible for a note to have more than one title, depending on what it was linked to. Maybe not a good idea but possible to do if necessary, and made possible by putting the note_topic column in the junction table. So there's another m:m relationship, and the note_topic column had to be moved out of the note table into the notes_links table to make this work.
The notes dialog in the GUI has a list of topics, and you click the topic to display the note. The topics can be re-ordered, but this order is specific to each linked element, so note_topic_order is another m:m relationship, also in the junction table. The note_topic and note_topic_order columns are not foreign keys, but for the most part, most columns in m:m tables are for foreign keys. The primary key in a junction table is only used occasionally, and I've never used one as a foreign key, just in queries. I don't know if it's possible or useful to use a primary key from a m:m table as a foreign key somewhere.
That's not everything I know about databases, but it's the part that was once the most mysterious and has been the most important to get squeaky clean in order to get UNIGEDS right. More improvements are being made as we speak, due to the writing of GEDCOM import and export programs forcing me to think about things that hadn't come up till GEDCOM made me think about them.
Switching to SQLite was the best decision I ever made. Now instead of some smart software creating tables for me based on relationships I didn't understand, I had to learn what a table actually was, what column constraints do, how primary keys and foreign keys work together. It would take me a lifetime to master this topic but the thing about SQL is that the basics are simple and you can get started right away without getting a PhD in the topic. By the way, no book or course on this topic will substitute for building your own database from scratch.
That's where Python comes in. It comes with SQLite, if you're on Windows. All you have to do is install SQLite's terminal tool and a code editor like Notepad++ and you're ready to dive in. Do not, under any circumstances, mess around with a GUI program for SQLite that promises to write your tables and queries for you. I absolutely forbid it. You have to be able to understand the mechanics of the situation or you're shooting in the dark.
For a long time with SQLite, I was still groping around and guessing when it came to cardinality. What difference does it make if the relationship between two data types is one-to-one (1:1), one-to-many (1:m) or many-to-many (m:m)? I found myself still guessing on this issue, and it cost me a lot of time wasted doing things over in my database design. A big change in attitude came about the day I decided there had to be a name for this whole topic "1:1, 1:m, or m:m". The name isn't used much, but the name does exist: cardinality. Now I knew what topic was being discussed.
I didn't find the answer in a book on database design, but in a Stack Overflow post. Someone wanted to know how to decide what the relationship is, and most importantly, what difference does it make, that is, what should I do about it in the table design (schema) when I did figure out what the relationship was. This is now my go-to topic in designing and redesigning database schemas, and I no longer have to guess. In my experience, you'll find out when you've got cardinality right because your queries will suddenly start working right and everything will become easy and obvious.
So here is cardinality in a nutshell, and I want to keep this simple but if I've made a mistake or left out something important, join the forum and let me know.
ONE-TO-ONE
As I see it, we're talking about pairs of data here, which makes a database a flexible, growable, shrinkable, multidimensional web of links. A link is a pair of data that are associated somehow. A 1:1 relationship is like a US citizen and his social security number. Each person has one SSN and each SSN refers to only one person. Generally to express a 1:1 relationship, I would place the two columns in the same table so that one record (one row in the table) contains both items of data. If the two items need to be in different tables, with one of them referenced by a foreign key, then a column constraint should be used so that any given foreign key can only be used once in the column. So the two items are still in the same record, with one of them represented by a foreign key.
ONE-TO-MANY
Here we have a person and his name. The trick with identifying 1:m that changed everything for me was this: analyze the relationship from both directions. We're only talking about two items, a pair of data. So let's say Nicolas Kim Coppola is one person, and that he is. He wants a career as an actor but he doesn't want to ride on the coattails of his famous uncle the filmmaker Coppola, so he uses a stage name, Nicolas Cage. Now he has two names but he's still one person. One person in this case has two names. Now to double check and make sure this is really a 1:m relationship, reverse the perspective. Start with the name. Each name refers to exactly one person. See? It looks like a one-to-one if you don't check it both ways. "Nicolas Cage" refers to only one person. One-to-one, right? Wrong! Look both ways, and you'll not only get it right, but you'll know which side is one and which side is many.
This enables you to do the right thing in the schema, and this is what has saved me oh-so-much grief. Once you know the cardinality, so what? What do you do about it? Very important! The foreign key is a column in the MANY side of the relationship. I used to put foreign keys just anywhere and then wonder why my queries didn't work. So in the case of a person table and a name table, the person ID is used as a foreign key in the name table, never the other way around. A person has many names but each name refers to only one person.
MANY-TO-MANY
I have a way of leaving many words in my trail, so my family trees have some large notes, and I noticed when using Genbox that I had to copy-paste large notes all over the place. This is tedious and uncomfortable, because when you're doing it, you know you're bloating your file, and it becomes an intuitive knowledge when you've used computers for a while that repetition in computing tends to be unnecessary. So I decided that Treebard would let you re-use a note anywhere you wanted, and you'd never have to copy-and-paste them, just select existing notes from a list. This feature hasn't been easy to design, since the genieware user won't remember IDs for notes. Just as important as how to make this work for the user in the GUI, how do you do it in the database?
Let's say there's a note linked to a name which also gives details about a place. We use notes for things that don't break down into simple, one-line data, so there will be more than one topic in a larger note. I want to link the same note to both a person's name and a place. Analyze the relationship, again from both directions: for each name, there could be many notes. And we want to use each note for many things, whether it's multiple names or even many different types of things, like a name and a place, linked to the same note. So this is a many-to-many relationship.
A m:m relationship is expressed in a database by two foreign keys in the same table row. The table should be a junction table, not the note table or the name table but in my case a notes_links table. The note table itself just has a primary ID column and a column for the note text. The note_id from the note table is used as a foreign key in the notes_links junction table, and in the same row is a foreign key from the name table. In another row, the same note_id will be used with a place_id. The two foreign keys are in the same row of the same table, that's what constitutes a link. So the junction table notes_links will have several columns, one for each element that you might want to link a note to: person_id, name_id, place_id, source_id, etc. Most of these columns will have null values, since many-to-many relationships are often between two data items only.
The notes feature in Treebard is kinda complex, because I've decided the user has to add a title to each note. This is so he can select notes to re-use from a list. Instead of a note ID, which he wouldn't remember, he selects the desired note from a dialog that lists the note title and the first few words in the note. This system isn't perfect but I'm giving it a try to see if it's practical. So far it works fine.
The end result so far is a notes_links table with a few extra columns in it besides the several foreign key columns. The note topic only has to be unique within a given linked element in order to be used as an identifier, so it would be possible for a note to have more than one title, depending on what it was linked to. Maybe not a good idea but possible to do if necessary, and made possible by putting the note_topic column in the junction table. So there's another m:m relationship, and the note_topic column had to be moved out of the note table into the notes_links table to make this work.
The notes dialog in the GUI has a list of topics, and you click the topic to display the note. The topics can be re-ordered, but this order is specific to each linked element, so note_topic_order is another m:m relationship, also in the junction table. The note_topic and note_topic_order columns are not foreign keys, but for the most part, most columns in m:m tables are for foreign keys. The primary key in a junction table is only used occasionally, and I've never used one as a foreign key, just in queries. I don't know if it's possible or useful to use a primary key from a m:m table as a foreign key somewhere.
That's not everything I know about databases, but it's the part that was once the most mysterious and has been the most important to get squeaky clean in order to get UNIGEDS right. More improvements are being made as we speak, due to the writing of GEDCOM import and export programs forcing me to think about things that hadn't come up till GEDCOM made me think about them.