Post by Uncle Buddy on Apr 30, 2022 3:00:11 GMT -8
Here's a quote from an interesting article about GEDCOM. Among other things, the article addresses some genieware providers' questionable practice of encrypting family data. The part I found most interesting is this:
I see one last really good reason for a company to open up its database structure. If they’ve got a really good structure, then maybe others will copy it. If others copy it, then maybe it will become the standard. If it becomes the standard then they are the leaders. Just as FamilySearch was with GEDCOM.
Or even if it doesn’t become the standard, if the database is open, developers can write programs to directly transfer from one database to another without the data loss usually incurred through GEDCOM. This seamless sharing of data with other programs and online trees is something all genealogists want to see.
Followup: Arb pointed out to me on Twitter that MacFamilyTree also uses SQLite and does not encrypt it. Here’s an example of a wonderful way the database was accessed for a Geographical mapping project.
MobileFamilyTree employs exactly the same SQLite database structure as MacFamilyTree, meaning people can use either program with the same database. Now isn’t that a wonderful concept?
There is a common thread here: SQLite.
SQLite is a most incredible phenomenon of the open source software community. Its creator gifted the program to the world with a statement which is a classic in the field of unlicensed, public domain software. His program is one of the most popular and useful programs in existence. The "lite" in its name is unfortunate because it leads some to assume that it's light on functionality. No, actually, it's light on the extra effort needed to use it. Unlike other excellent SQL software such as PostgreSQL, you don't need to set up a local server to use it. And when it comes to Python programming, the `sqlite3` module is part of Python's standard library (the Windows installer bundles the SQLite library itself), so you don't need to install the SQLite library separately. Type `import sqlite3` at the top of any Python module and you're ready to play with databases, right now. The SQLite command-line tool, the console you type SQL commands into, is a handy extra but isn't required. Compared to MySQL, there's little boilerplate. It's very easy to use, and its SQL is largely standard. If you already know SQL, you already know most of SQLite.
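To show how little setup I mean, here is a minimal sketch using only Python's built-in `sqlite3` module. The `person` table and the sample names are made up for illustration; they don't come from any real genieware.

```python
import sqlite3

# ":memory:" keeps the whole database in RAM. A filename such as
# "tree.db" (hypothetical) would create an ordinary file on disk instead.
conn = sqlite3.connect(":memory:")

# No server to start, no user accounts to configure: just create a table.
conn.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [("Ada Example",), ("Ben Example",)],
)
conn.commit()

# Read the rows back in insertion order.
names = [row[0] for row in conn.execute("SELECT name FROM person ORDER BY person_id")]
print(names)
```

That's the whole program. Swap `":memory:"` for a filename and the data lands in a single file you can copy around like any other document.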
I said that so I could say this: when GEDCOM was created, a simple text format was no doubt chosen only because, as my brainstorm goes, SQLite didn't exist yet (its first release came in 2000). SQL itself had only become commercially viable a few years earlier.
So while we're playing around with ideas, instead of just eating whatever's served to us, let's pretend we can roll back the clock to 1984, when GEDCOM was created as a sort of intermediary between RDBMSs (relational database management systems) and other means of data storage. What if the need for GEDCOM had occurred in a world where SQLite already existed? Do you think we would have seen GEDCOM created as a huge, clunky text file that can take hours to process and then leaves you to fix thousands of broken data items by hand?
Here is my hypothesis:
Sooner or later, GEDCOM is going to be replaced by a proper relational database, probably built on SQLite.
Since this means that genieware providers will all have to use the same data storage structure, does that mean there will no longer be a need for independent genieware providers? No. In this hypothetical new world of geniewares that can literally communicate with each other, your GUI and my GUI will and should be very different. And that's all the user cares about. Most genealogists care what flavor of GUI they're using, but few genealogists care what sort of software is used to store their data, as long as it's stored well and not lost. GEDCOM is a big letdown in this area, as everyone knows. It is the very independence of the genieware developers that makes GEDCOM an inadequate solution. GEDCOM is, in fact, a mock relational database. It uses primary keys called "identifiers" and foreign keys called "pointers". It was an attempt to replace a SQLite that did not yet exist.
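To make the "mock relational database" point concrete, here is a rough sketch that loads a made-up GEDCOM fragment into SQLite. The table names (`person`, `family`, `family_member`) are my own assumptions, not part of any specification, and the toy parser only understands the five tags shown; a real file with a header, trailer, and nested levels needs far more care.

```python
import sqlite3

# A made-up GEDCOM fragment: @I1@, @I2@ and @F1@ are identifiers,
# and the HUSB/CHIL lines are pointers to those identifiers.
GEDCOM = """\
0 @I1@ INDI
1 NAME John /Example/
0 @I2@ INDI
1 NAME Sam /Example/
0 @F1@ FAM
1 HUSB @I1@
1 CHIL @I2@
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
conn.executescript("""
CREATE TABLE person (xref TEXT PRIMARY KEY, name TEXT);
CREATE TABLE family (xref TEXT PRIMARY KEY);
CREATE TABLE family_member (
    family_xref TEXT NOT NULL REFERENCES family(xref),
    person_xref TEXT NOT NULL REFERENCES person(xref),
    role        TEXT NOT NULL
);
""")

current = None  # identifier of the level-0 record being read
pointers = []   # (family, role, person) rows, inserted once all records exist
for line in GEDCOM.splitlines():
    parts = line.split(" ", 2)
    if parts[0] == "0":
        # Identifier line: becomes a primary key in the matching table.
        current, tag = parts[1], parts[2]
        table = {"INDI": "person", "FAM": "family"}[tag]
        conn.execute(f"INSERT INTO {table} (xref) VALUES (?)", (current,))
    elif parts[1] == "NAME":
        conn.execute("UPDATE person SET name = ? WHERE xref = ?", (parts[2], current))
    elif parts[1] in ("HUSB", "WIFE", "CHIL"):
        # Pointer line: becomes a foreign key row.
        pointers.append((current, parts[1], parts[2]))

conn.executemany(
    "INSERT INTO family_member (family_xref, role, person_xref) VALUES (?, ?, ?)",
    pointers,
)

# Following a pointer is now just a JOIN.
rows = conn.execute("""
    SELECT fm.role, p.name
    FROM family_member AS fm
    JOIN person AS p ON p.xref = fm.person_xref
    WHERE fm.family_xref = '@F1@'
    ORDER BY fm.role
""").fetchall()
print(rows)
```

The identifiers slot straight into primary key columns and the pointers into foreign key columns, which is about all I mean by calling GEDCOM a relational database in disguise.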
What if a genieware provider doesn't presently use a SQL database at all? For example, Family Historian apparently stores your data directly in a GEDCOM file as lines of text. My answer to this is that there comes a time in the evolution of an industry when anyone who wants to keep up had better put forth the effort to catch up. If SQLite had been around in 1984 when GEDCOM was created, I'm sure that GEDCOM would not have been created as a text file, and Family Historian would not have been built to store your data in a text file when it came along later.
I watched a video recently where a member of FamilySearch's GEDCOM team said that GEDCOM should not change too quickly. He was talking about some improvements that have been made in GEDCOM 7. Well, what if he's wrong? What if we should fix broken things as quickly as possible, even if it means that a lot of genealogists will have to do a lot of work over? Isn't that what we have to do anyway, every time we use GEDCOM? Because every genieware provider stores data differently, a lot of custom tags have to be used to write a GEDCOM file at all. (Actually I suspect that custom tags are overused in lieu of finding creative ways to use the tags that come with the GEDCOM specification.) But it cannot be denied that, for all the good it does, using GEDCOM at all forces the genieware user to do a lot of work over again. Isn't GEDCOM already forcing us to push round pegs into square holes, to say the least? Any file sharing utility is going to force software providers to toe some line. The problem with GEDCOM is that it was invented when the alternatives to it didn't quite exist yet. So are we gonna keep trying to carve granite blocks with wooden mallets and copper chisels, or are we gonna see the light, now that the right tools actually exist, and start using the right tool for the right job?
So what are we waiting for? GEDCOM can and should be replaced by a team-created SQLite data structure that records real people, places, events, and the other elements of genealogy accurately and realistically, so that GEDCOM's pretense at flexibility--custom tags--can cease to exist. And anyone who doesn't want to participate, well... let them use GEDCOM!
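Just to get the ball rolling, here is a very rough sketch of what the core of such a shared structure might look like. Every table and column name below is hypothetical, my own guess rather than anyone's agreed standard, and a real design would need far more (sources, names with variants, relationships, and so on). The point is only that the database itself can refuse a dangling reference at write time, instead of leaving it for someone to find and fix by hand after an import.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE place (
    place_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE event (
    event_id   INTEGER PRIMARY KEY,
    person_id  INTEGER NOT NULL REFERENCES person(person_id),
    place_id   INTEGER REFERENCES place(place_id),
    event_type TEXT NOT NULL,
    date_text  TEXT
);
""")

conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Ada Example')")
conn.execute("INSERT INTO place (place_id, name) VALUES (1, 'Exampleville')")
conn.execute(
    "INSERT INTO event (person_id, place_id, event_type, date_text) "
    "VALUES (1, 1, 'birth', '1900')"
)

# An event pointing at a person who doesn't exist is rejected outright.
try:
    conn.execute("INSERT INTO event (person_id, event_type) VALUES (99, 'death')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

event_count = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(rejected, event_count)
```

Any GUI could sit on top of tables like these and look completely different from the next one, while the data underneath stays readable by all of them.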