Post by Uncle Buddy on Dec 10, 2023 22:19:57 GMT -8
I'm back on track after the recent confusion which coincided with a sore throat and other cold symptoms like being too stupid to just take a break.
When I'm working on the import program I have a rant repository called Luther's 99 Theses which helps me retain my sanity and gives me something to do between spurts of productivity. I've been using this forum as a place to post rants while writing the export program, so here are today's notes, before I get back to work.
The INDI and FAM records are finished, in the export program. As I expected, it's going many times faster than the writing of the import program.
Here is the promised rant from gedcom_export.py:
Post by Uncle Buddy on Dec 11, 2023 0:20:36 GMT -8
The devil's advocate has to ask, "Yeah but what if you're wrong?"
I'm almost finished with the OBJE record export. It was simple till I got to GEDCOM's "source_citation", which would be one of the things mentioned in the post above which I feel might not really exist. I know there's not currently any such thing in Treebard, but that's not the criterion. Treebard GPS is not, and never will be, meant to represent finished software.
But I don't like being wrong, so quick, before someone else tells me what's wrong with my opinion, what can I myself say might be wrong with it?
Here's the question: in what way is a citation linked to a multimedia object? For example, in what way is a scan of a particular census page linked to the 1880 census? By "linked", I mean, why would the two elements be found together on the same row of a database table?
Into my mind pops this: the URL where the image was downloaded from. But that is a locator, which is a location within a repository such as archive.org or familysearch.org. A citation is a location within a source.
I have to try and imagine that my predefined truth, or dogma, is either wrong or sometimes wrong or sometimes irrelevant. Or most likely, I've forgotten something I knew a couple weeks ago. The dogma is that you can't link a source to something without an assertion (what the source says). Linking a source to an image file seems to have nothing at all to do with what the source says. It does seem to relate to the citation, for example the sheet number and enumeration district of the census page depicted in the image. That does seem relevant.
I'm seeing a possible loophole in my law, so I'll keep going.
Here's a what-if:
What if event parts (date, place, particulars, age, role) and names--i.e. the factoids of history--do need both assertions and citations to properly link to sources, but on the other hand, media files only need a citation to form such a link?
Hmmmmmmmmmmm....
Oops, wait a minute. Treebard does too have a place to make such a link, and speaking of repositories, the link actually is in the repositories_links junction table. There doesn't have to be a locator but by putting a foreign key for the media and one for the citation in the repositories_links table, they can be linked together without an assertion, and a locator and repository can be added too but are not required. Except that currently the repositories_links table schema probably requires a repository, but that could be a mistake. Don't assume the table is correctly named, especially since it's a brand new and little-used table. Hang on, let me run and get the schema.
Here it is:
sqlite> .schema repositories_links
CREATE TABLE repositories_links (
    repositories_links_id INTEGER PRIMARY KEY,
    repository_type_id INTEGER DEFAULT null,
    source_id INTEGER DEFAULT null,
    citation_id INTEGER DEFAULT null,
    repository_id INTEGER DEFAULT null,
    locator_id INTEGER DEFAULT null,
    media_id INTEGER DEFAULT null,
    contact_id INTEGER DEFAULT null,
    FOREIGN KEY (repository_type_id) REFERENCES repository_type (repository_type_id),
    FOREIGN KEY (source_id) REFERENCES source (source_id),
    FOREIGN KEY (citation_id) REFERENCES citation (citation_id),
    FOREIGN KEY (repository_id) REFERENCES repository (repository_id),
    FOREIGN KEY (locator_id) REFERENCES locator (locator_id),
    FOREIGN KEY (media_id) REFERENCES media (media_id),
    FOREIGN KEY (contact_id) REFERENCES contact (contact_id));
The schema states that of all the fields that can be given a value in this table, none of them are required; they're all null by default. This now sounds right, and I'd have to say that the reason this table got called a repositories_links table is that I was thinking about repositories a lot when I created it. In fact, it was first called sources_links but I changed it recently.
In fact, I now recall adding media_id to this table for the simple reason that, obviously, we need a place to link sources and citations to media. In the case of a census page image, the link would be to a citation, a location within a source. If you were referring to a .pdf of a whole census reel, maybe that would link to the source. In the case of a gravestone photo, the gravestone is the source and I guess there'd be no citation unless you want to count the lines on the gravestone. So for a gravestone, you could link the whole source to the photo and put down the cemetery as the repository and the plot number as the locator.
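A minimal sketch of the census-page case, assuming a trimmed-down copy of the schema above. The citation and media tables here are stand-ins, and their text columns (citations, file_name) and the sample values are made up for illustration. It shows a media file linked to a citation through repositories_links with no repository or locator filled in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Stand-in tables: only the columns needed for this example.
# The text column names (citations, file_name) are hypothetical.
conn.executescript("""
    CREATE TABLE citation (citation_id INTEGER PRIMARY KEY, citations TEXT);
    CREATE TABLE media (media_id INTEGER PRIMARY KEY, file_name TEXT);
    CREATE TABLE repositories_links (
        repositories_links_id INTEGER PRIMARY KEY,
        citation_id INTEGER DEFAULT null,
        repository_id INTEGER DEFAULT null,
        locator_id INTEGER DEFAULT null,
        media_id INTEGER DEFAULT null,
        FOREIGN KEY (citation_id) REFERENCES citation (citation_id),
        FOREIGN KEY (media_id) REFERENCES media (media_id));
""")

conn.execute("INSERT INTO citation VALUES (1, 'ED 12, sheet 4')")
conn.execute("INSERT INTO media VALUES (1, 'census_1880_sheet4.jpg')")

# Link the image to the citation; repository and locator stay null.
conn.execute(
    "INSERT INTO repositories_links (citation_id, media_id) VALUES (1, 1)")

row = conn.execute("""
    SELECT media.file_name, citation.citations, repositories_links.repository_id
    FROM repositories_links
    JOIN media ON repositories_links.media_id = media.media_id
    JOIN citation ON repositories_links.citation_id = citation.citation_id
""").fetchone()
print(row)
```

For the gravestone case the same row would carry a source_id instead of a citation_id, with the cemetery and plot number going into repository_id and locator_id.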
Post by Uncle Buddy on Dec 11, 2023 0:31:09 GMT -8
Just to be clear, the repositories_links junction table is not the place to link a source to a citation. You'd use one or the other here. You could use both if you had to, for example if a locator (call number, URL, etc.) referred to both the source and the citation. But in general, the right way to link a source to a citation is by putting a foreign key for the source_id into a row of the citation table. This is because source-to-citation is a one-to-many relationship, and this relationship should be expressed by putting the foreign key in the many side of the relationship. You'd want to use a given source_id multiple times in links to different citations, but each citation is linked to only one source.
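A minimal sketch of that one-to-many link, assuming simplified source and citation tables (the real tables have more columns, and the text column names and sample values here are made up). The foreign key sits on the many side, in the citation table, so one source_id can be reused by any number of citations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified stand-ins; the column names sources/citations are hypothetical.
conn.executescript("""
    CREATE TABLE source (source_id INTEGER PRIMARY KEY, sources TEXT);
    CREATE TABLE citation (
        citation_id INTEGER PRIMARY KEY,
        citations TEXT,
        source_id INTEGER NOT NULL,
        FOREIGN KEY (source_id) REFERENCES source (source_id));
""")

conn.execute("INSERT INTO source VALUES (1, '1880 U.S. census')")

# Two citations, both pointing at the same source.
conn.executemany(
    "INSERT INTO citation (citations, source_id) VALUES (?, ?)",
    [("ED 12, sheet 4", 1), ("ED 40, sheet 9", 1)])

citation_count = conn.execute(
    "SELECT COUNT(*) FROM citation WHERE source_id = 1").fetchone()[0]

# A citation that points at a source which doesn't exist is rejected.
try:
    conn.execute(
        "INSERT INTO citation (citations, source_id) VALUES ('p. 3', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(citation_count, orphan_rejected)
```

Note that SQLite only enforces the foreign key when PRAGMA foreign_keys is switched on for the connection.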
Post by Uncle Buddy on Dec 11, 2023 0:59:45 GMT -8
However, the GEDCOM specs as usual are not helpful, or I should say they raise more questions than they answer. They give many rules but few examples. Here we find that the source_citation construct is to be used in the OBJE record, but when you look up source_citation, you find that it contains a multimedia_link. OBJE and multimedia are the same thing, so this seems like a mistake. I'll look in the 5.5.5 specs by Tamura Jones to see if he had anything to say about this.
No he didn't.
I'll have to come up with an example to try to make sense of this. Why would an OBJE tag contain, anywhere within it, an OBJE pointer?
The OBJE record refers to an image of Grannie Smith's 1850 census. The source_citation in the OBJE record has a pointer to the 1850 census and a PAGE tag subordinate to the SOUR pointer provides a citation. Now why on God's Green Earth would you add a multimedia link as a sibling to the citation?
What I need right now is a cold shower.
Post by lkessler on Mar 5, 2024 15:27:57 GMT -8
Hi Scott,
I really enjoyed reading through all your material. See my thoughts about it in the blog post I put up yesterday: A New(?) Genealogy Program – Treebard GPS
Your GEDCOM page is terrific. I think if we combined all your suggestions, my suggestions and Tamura Jones' suggestions together, we could probably turn GEDCOM 5.5.1 into a really solid standard.
I'm currently working on my GEDCOM export for Behold. Behold is not a genealogy editor but just a GEDCOM reader, so unlike you, writing an export routine is not simpler for me than writing an import routine because I need to re-export all the junk that Behold imports from other programs. I'll have two possible exports: (1) same format as input which is easy, since the assumption is that the program that wrote the GEDCOM will read the GEDCOM and should understand its own oddities (almost finished), and (2) to valid GEDCOM 5.5.1 following 5.5.5 guidelines and maximizing the amount of data that all programs will read. (Not so easy, but I've got my plan for it).
Louis
Post by Uncle Buddy on Mar 5, 2024 17:05:34 GMT -8
Hi Louis,
Welcome to the forum and thanks so much for posting. And especially thanks for your kind words. I am a fan of your work, by the way, and wish I had more time to read on your blog and on Tamura Jones' website. This project keeps me super busy and I am not an efficient worker nor a professional anything. I am more of a plodding floater with a one-track mind.
I believe my notion of UNIGEDS (a standardized SQL database as a replacement for GEDCOM) was seeded by a comment you made somewhere, something to the effect of, "Wouldn't it be nice if we were all using the same data structure," that's a loose paraphrase from my bad memory. Of course I was already trying to write the best data structure I could manage, but your comment tripped a switch and the light bulb came on: what if all vendors had the same database schema as their back end.
Well, I'm not holding my breath for that to happen, but I am trying to imagine such a standardized data schema into existence, and when I find something wrong with it, or when I find it hard to work with, I fix it, sometimes over and over, till it works. There's more than one way to do things but simpler is better as long as the results are right. To my way of thinking, the cardinality rules of data relations are the key to everything.
I will now go to your blog post and read it carefully. I am very happy to find Treebard noticed on your blog, and happy to welcome you to the forum.
P.S. I removed "GPS" from the name after Tamura reminded me that there's already a GPS in the field of genealogy. When I googled GPS genealogy I got the other fellas, not me, so I thought now would be the best time to change the name, instead of later. I still have to make the change on my website. I've been busy making videos the past few days so I can put out an .exe for anyone who wants to try Treebard without having to install Python.
Post by Uncle Buddy on Mar 5, 2024 18:46:13 GMT -8
Here are some comments about your comments about Treebard on your blog. Before I start, I want to say thanks again for noticing this project. This is a big deal to me. I'm trying not to say something too flowery but I'm probably going to send a link to my mama so she can read your review.
Treebard might become a full-fledged app for daily use someday, but that's not overtly the goal. I'm creating what is meant to be the core, seed, or inspiration for developers to use as an example, a working model, or a showcase of functionalities. Personally I don't use secondary features such as hints or website scraping as I get so wrapped up in the research, the treasure hunt, that sometimes I look away when others have already found something, because I'd rather find it myself. But that's just me. Anyway, as a working model and not a complete program, and as a team of one, I am offering mostly primary features. Treebard does have a few graphics editing functionalities, which are sort of secondary to genealogy and available in any graphics program. I had to install a graphics library called Pillow to improve Tkinter's capabilities, and once I had it I figured I might as well use it.
I would say that Treebard is conclusion based when you want it to be, and evidence based when you need it to be. In order to conform to the cardinality of the data (relationship between sources, citations, events), assertions have been left out of genieware. Assertions are not needed when something is obvious or non-controversial or not very interesting. But when your key research subject was born in three different countries or something, you kinda get this urge to find the actual facts, so you start collecting sources, none of which perfectly agree with each other. This to me is when genealogy gets interesting.
On Treebard's events/attributes table (a.k.a. conclusions table), you can enter what you think happened, make a guess, leave it blank, you do what you want. In each row there is a sources button, so if you want to know how you came to your conclusion, you can click SOURCES to open an assertions dialog which shows sources, citations, and repositories. In order to properly link all this to your conclusion, the assertion is necessary. Anyone who wants to do sourcing but doesn't want to record assertions (what the source says) can type a space, an x, a hyphen, anything they want, in the assertion field and carry on. You get your sourcing done, in a way that is set up properly according to the real-world relationships among the data. The main point is that a table of the assertions you saw in sources will tell you what you were thinking when you concluded that John Smith was born in 1842, so you don't have to search throughout your tree or look at the source documents again. Also citations should not have to be copied & pasted, so the assertions dialog gives you a way to select the citations you've already entered, and they don't have to be pasted or re-typed.
Many thanks for your kind and wonderful words.
The YouTube channel has been around for a few years, but at first I was too busy to edit the videos. With time I became embarrassed by the fact that the videos were just raw, I mean they were really bad. I stopped writing code for a month or so when I finally found a video editing software that was fun and easy to use, edited all the videos, and deleted all the raw footage. So the descriptions say when they were first created as well as when they were edited, so people can see which ones are more current.
This is actually the import structure of my Python files. Circular imports cause trouble in Python, so the chart is helpful to see what can import what. The UNIGEDS schema is SQL; for example, here are the tables it contains:
C:\Users\Lutherman>sqlite3 d:/treebard/data/sample_tree/sample_tree.tbd
SQLite version 3.45.1 2024-01-30 16:01:20 (UTF-16 console I/O)
Enter ".help" for usage hints.
sqlite> .tables
assertion    event_type       nested_place  report_type
change_date  font_preference  note          repositories_links
chart        handle           notes_links   repository
chart_type   kin_type         person        repository_type
citation     locator          place         role_type
colors_type  locator_type     place_name    roles_links
contact      media            place_type    source
couple       media_links      places_types  source_type
current      media_type       preferences   to_do
date_format  name             project       transcription
event        name_type        report        transcription_type
Here is a table schema:
sqlite> .schema event
CREATE TABLE IF NOT EXISTS "event" (
    event_id INTEGER PRIMARY KEY,
    date TEXT NOT NULL DEFAULT '-0000-00-00-------',
    particulars TEXT NOT NULL DEFAULT '',
    age TEXT NOT NULL DEFAULT '',
    person_id INTEGER DEFAULT null,
    event_type_id INTEGER NOT NULL,
    date_sorter TEXT NOT NULL DEFAULT '0,0,0',
    nested_place_id INTEGER NOT NULL DEFAULT 1,
    couple_id INTEGER DEFAULT NULL REFERENCES couple (couple_id),
    age1 TEXT NOT NULL DEFAULT '',
    age2 TEXT NOT NULL DEFAULT '',
    FOREIGN KEY (person_id) REFERENCES person (person_id),
    FOREIGN KEY (nested_place_id) REFERENCES nested_place (nested_place_id),
    FOREIGN KEY (event_type_id) REFERENCES event_type (event_type_id));
sqlite>
Yes. Thank you.
That was the culmination of more than 2 years. My first video series on GEDCOM import was deleted when I finally created what I call a "finishable" GEDCOM import program.
I don't advocate ignoring all custom tags. I do it in my project because proving that GEDCOM is perfectly usable is not my goal; because there's only one of me; and because creating Treebard and UNIGEDS is where my time needs to be invested. Treebard is a working model for others to borrow/steal/finish, so my import program has some samples of sending things to an exceptions report and sends all custom tags and their subordinate tags to the exceptions report. I think we should stop wasting time on GEDCOM and get serious about replacing it with something that can be corrected and perfected. GEDCOM programmers will continue to use and abuse custom tags to keep from having to spend the rest of their lives trying to figure out what to do with the dad-blasted specs-prescribed way of doing things.
The 5.5.1 specs do actually refer to TEXT values as "assertions". Then they prescribe doing nonsensical things with TEXT lines. My way of handling the DATA.TEXT tag is to literally delete the DATA tag since this typically would put the TEXT tag subordinate to the PAGE tag which is usable. This is a serious compromise and I do it with fingers crossed. Tamura suggested making an _ASRTN tag for the export program, but I'm off GEDCOM right now and will have time to think about whether to use any custom tags in my export program. Currently there are zero custom tags in my export.
I can't thank you enough.
Tamura requested an .exe so he can try out the interface without installing Python. In preparation for Treebard's first .exe, I'm taking care of a few things that needed to be done. The result will not be a finished program, but a few things were bothering me and there's plenty for me to work on after I do put out the .exe, so when it appears, it is not meant to be final. It will be replaced with a better .exe soon after, if I get the time to work on it.
Treebard is a working model. My goal is not to finish it. My goal is to perfect it. That could take a long time. In the meantime, anyone can use the code without permission to seed their own project. It can be translated to any programming language and any GUI widget toolkit. It can use PostgreSQL instead of SQLite (though I see no reason for that).
So I'm editing three videos this week on recent changes and will make the .exe when that is done.
Post by lkessler on Mar 6, 2024 13:07:59 GMT -8
To become a full app was my goal for Behold 25 years ago. But in 2017 I found that MyHeritage was "good enough" and its hints were too irresistible to me to not use. I've got new specialized purposes in mind for Behold now.
Your ambition to provide something for other developers is admirable, but the market for that can likely be counted on one hand. They'd need to see things your way, program things your way, and see a real need for a new program in the genealogical space like you do. Personally, I think you'll get more traction by doing the program for yourself and for what you need. Others will see inspiration in that and take some of your unique ideas into their fold.
Some programs like RootsMagic have added hints supplied by the online data providers with agreements between them. But I wouldn't get into webscraping. It's frowned upon and in some cases may be considered illegal.
The most valuable resources you are providing to me as a developer are your ideas and thoughts that are included in your extensive writeup and recordings of your development work and your trials and tribulations.
Yes, well... SQL is a wonderfully powerful and complex language. I dealt extensively with it when custom building my websites using MySQL and the worst programming language ever developed: PHP. I was getting ready to add a SQLite database to store Behold data in 2017 before I decided to use MyHeritage instead.
You can leave that up to me. That's something that I've spent a lot of time doing. I've always argued that GEDCOM does not need a rewrite. It only needs tweaks, such as the many you highlight in your GEDCOM book.
Here I respectfully disagree. I understand your thinking that an agreed-upon database spec like UNIGEDS, adopted by all genealogy software, would be utopia. Data transfer would be so simple and guaranteed correct. But the problem is that instead of just arguing with other developers over a tag, now you're arguing over fields and record definitions. Developers are as different as chalk and cheese. One needs the Place and Assertion and Repository records, and the next not only doesn't need them but doesn't want them.
You were actually beaten to the idea by FamilySearch who created GEDCOM X as their database representation. They were hoping like you that everyone would adopt it and the world would be saved and the GEDCOM standard would no longer be needed. In the end, GEDCOM X became used in the API that allowed programs to connect to FamilySearch's Family Tree, but not much else.
There are hundreds of full featured genealogy programs out there, and even more utility programs, and 99% of them use GEDCOM 5.5.1 or something that attempts to be it to exchange their data with other programs. They are all more than happy that version 5.5.1 has been around for 20 years unchanged. None of them will want to spend time on updating their GEDCOM to a new standard or a new way of transferring data. And you said it yourself somewhere, most only care about importing others' data and few care about exporting their own data because they don't want to make it easy for someone to switch to another program.
On my soapbox now. DON'T USE CUSTOM TAGS! You will guarantee that few if any programs will import your custom data, just like Treebard won't. Try to fit everything in on standard tags that 99% of programs do import.
Each DATA tag is associated with just one PAGE tag. The TEXT tag can occur multiple times under the DATA tag. Therefore you can list all the assertions you want for each PAGE tag. So I don't really see what the problem is.
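A rough Python sketch of that reading, not taken from Behold or Treebard. It assumes event-level citations at the levels used in typical GEDCOM 5.5.1 files (2 SOUR, 3 PAGE, 3 DATA, 4 TEXT); the function name and sample lines are made up. Each SOUR block yields one PAGE value paired with every TEXT found under its DATA tag, whichever of PAGE and DATA comes first.

```python
def pair_citations(lines):
    """Return one (page, [texts]) pair per level-2 SOUR block."""
    pairs = []
    current = None    # the SOUR block being read, if any
    in_data = False   # True while inside that block's DATA substructure
    for line in lines:
        parts = line.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level <= 2:
            # Any line at level 2 or above closes the previous SOUR block.
            current = None
            in_data = False
            if level == 2 and tag == "SOUR":
                current = {"page": None, "texts": []}
                pairs.append(current)
        elif current is None:
            continue
        elif level == 3:
            in_data = tag == "DATA"
            if tag == "PAGE":
                current["page"] = value
        elif level == 4 and tag == "TEXT" and in_data:
            current["texts"].append(value)
    return [(c["page"], c["texts"]) for c in pairs]


sample = [
    "1 BIRT",
    "2 SOUR @S1@",
    "3 PAGE sheet 4",
    "3 DATA",
    "4 TEXT born in Acorn County",
    "4 TEXT age 38",
    "2 SOUR @S1@",
    "3 DATA",
    "4 TEXT born in Oak County",
    "3 PAGE sheet 9",
]
pairs = pair_citations(sample)
for page, texts in pairs:
    print(page, texts)
```

Continuation lines (CONC/CONT) and citations at other levels are ignored here to keep the sketch short.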
You won't find anyone better to technically and critically review your program.
Second statement is not possible. You might as well work on finishing it, at least for your own needs.
Louis
Post by Uncle Buddy on Mar 6, 2024 17:27:15 GMT -8
Louis--thanks for your response, and thanks for not agreeing with everything I said. Without discourse, it's like, Where's the meat?
I'm not a debater and this is not a debate. I feel that there are jillions of people in the world for a reason; if we all agreed on everything, it would be kind of a weird sci-fi movie. But I will reply because it's nice and fun to have a conversation with someone besides myself for a change.
Plan A is to save the world with genealogy. Plan B is, as you state, to create for myself the genieware I'm willing and happy to use. I usually know the difference. Plan A gets me out of bed in the morning and Plan B makes it possible to sleep at night.
I'm an odd bird in these times where everyone wants modern and online and AI and etc. I want an old-fashioned stand-alone not-webby app that does basics and lets me do my own research and make my own decisions. I don't like smart software. I don't like that vendors are creating apps that will write a book or a website, but they don't care about doing genealogy data better than GEDCOM does it. I don't like the fact that programs are for sale before they're worth buying, and then they get to version 14 and still don't do basic, primary things right but brag about all these unnecessary secondary features. I am aware that my orientation is basically negative and somewhat anti-social. I'm not a team player, but a lone voice in the wilderness that exists to be ignored, because the status quo is self-perpetuating, due to the simple laws of physics. I predict that a few of my ideas will catch on, especially if my websites live longer than I do. That's up to my heirs.
I'm even aware that it's not about me, although it doesn't sound like it. Spearheading new ideas is a lonely business. But the failure of any number of GEDCOM replacements in the past doesn't apply to my project, logically or otherwise. Things do change. Usually too late for the originator of a new idea to get more than a pat on the head(stone) for his trouble.
I already replaced the gasoline automobile, and nobody noticed that either. Except... actually they did. That could grow into something, someday, but I don't expect to see it happen. I'll be up on my cloud playing my harp, look down, and... hey, look! They noticed me!
And go back to playing my harp.
That's good to know.
I've presented two alternatives to GEDCOM. One is a SQL schema and one is a gedcomoid (text file similar in some ways to GEDCOM) which is based on the SQL schema. The two can be used together, they are compatible with each other. The advantage of SQL is that it enforces its own rules. The disadvantage of SQL is that people are afraid of it. I blame this on MySQL which is ugly like its partner PHP. (That's the extent of my knowledge of MySQL and PHP. I chose Python and SQLite because they are beautiful.)
I woke up this morning all fired up to write a gedMOM specification. I had some of the jokes figured out before I got out of bed. The advantage of gedMOM is that it's a text file like GEDCOM so transitioning to it would be more natural for non-SQL users who prefer a tool they can read by eye. But gedMOM's disadvantage is that it can't enforce its own rules, it's not a program but a text file. Inert and helpless to keep people from using it wrongly. Of course SQL can be used all kinds of wrongly too but it does have some ways of protecting the data and itself from human error.
Many tweaks is a rewrite.
In SQL, an unwanted field is left empty. It's not a problem to have a standard that is partially used, if it's not a standard for operating nuclear power plants.
But you're probably correct that people would be arguing over something, no matter how high its potential to impose utopia on the unwilling. That's why the committee approach produces compromise, not change. Too many chefs spoil the brew. What's needed is a one-man project that is irresistible. That is my silly little fantasy.
I enthusiastically agree that GEDCOM should not be updated. As Tamura mentioned, the 20 years it was ignored is more time than they actually spent working on it.
But tweaking the Apollo spaceship to go to Mars? I don't think so.
Genealogy software is in its infancy. The personal computer is literally a few years old, like the personal automobile which will also be replaced. As historians we forget about this amazing, infinite thing called "the future". That is a place where GEDCOM does not belong and therefore will no longer exist. How long we have to wait for the future to get here depends on how soon we become willing to endure the pain of making a transition to something way better. And yes, it will hurt. But it will happen, hopefully before the fall of civilization.
Thank you. I almost swooned with happiness to hear you say this. The problem is your term "try". Before I wrote my finishable GEDCOM import program I got stuck in an earlier attempt I was calling GEDKANDU. Just a GEDCOM import program, but with the attitude that everything could be done. It's possible that much of what I did in that version was worth preserving, but in order to finish a finishable import program I had to back off the idealistic wish to make GEDCOM really really work. It was burning me out and I was not gonna get to a usable end result pretending that GEDCOM was capable of providing us with a "can do" situation. Finishable is good enough for now, I might get back to it someday and try again.
There are two real problems. One is that if the sibling tags PAGE and DATA aren't written in a certain order, my gambit will not work. The other is that since GEDCOM's actual intent was to link TEXT to SOUR, the apps are not providing one assertion (TEXT) linked to one citation (PAGE). The specs way is so weirdly unjustifiable that most vendors ignore TEXT. Others provide a place for source text but--just as prescribed by the specs--a specific block of text is not at all linked to a specific citation within the source. The result is a bunch of assertions glommed together which is useless so most vendors don't bother.
The purpose of an assertion is so you can figure out easily why you decided that Mary Brown was born in Acorn County, because you've just run into a source that asserts otherwise. Without an assertions feature, you have to go find all your other sources on her birth place and review them all to see what they say. A program that provides a place to link source text to source is not better than fishing through all your files to re-look at all the sources on Mary's birth. An assertion is not a bunch of assertions. Those are two different things.
For me it's always cardinality that straightens these things out. As it turns out on careful analysis, source and text have no direct link to each other. Assertion links to citation, where the assertion is made. Citation links to source, where it exists. So the link between source and text is indirect, in the real world. To record that Mary was born in Acorn County according to the 1880 census is not useful. To say which page of the census, that is useful, that's the link. But these are still just general remarks. The real design maker is the real nature of the relationship between source, citation and assertion. Is a relationship one-to-one, one-to-many, or many-to-many? Until this becomes the basis for schema design, whether GEDCOM or SQL, we're just playing in the mud.
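A minimal sketch of that chain in SQLite, assuming three simplified tables. The real Treebard tables have more columns, and the text column names and sample data here (including the second source) are made up for illustration. Each assertion carries a citation_id and each citation carries a source_id, so source and text are only linked indirectly, through the citation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, cut-down versions of the three tables.
conn.executescript("""
    CREATE TABLE source (source_id INTEGER PRIMARY KEY, sources TEXT);
    CREATE TABLE citation (
        citation_id INTEGER PRIMARY KEY,
        citations TEXT,
        source_id INTEGER REFERENCES source (source_id));
    CREATE TABLE assertion (
        assertion_id INTEGER PRIMARY KEY,
        assertions TEXT,
        citation_id INTEGER REFERENCES citation (citation_id));

    INSERT INTO source VALUES (1, '1880 census'), (2, 'Family Bible');
    INSERT INTO citation VALUES
        (1, 'ED 12, sheet 4', 1),
        (2, 'births page', 2);
    INSERT INTO assertion VALUES
        (1, 'born in Acorn County', 1),
        (2, 'born in Oak County', 2);
""")

# Walk from each assertion to its citation, and from there to the source.
rows = conn.execute("""
    SELECT assertion.assertions, citation.citations, source.sources
    FROM assertion
    JOIN citation ON assertion.citation_id = citation.citation_id
    JOIN source ON citation.source_id = source.source_id
    ORDER BY assertion.assertion_id
""").fetchall()
for row in rows:
    print(row)
```

One query then lists what each source says about Mary's birth place and exactly where it says it, which is the review that otherwise means reopening every document.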
Sorry for the preaching, this is getting too long and probably too intense. The first problem I mentioned is that the order PAGE then DATA.TEXT is not prescribed in the specs. Since PAGE and DATA are siblings, DATA could legally come first. Then my gambit of deleting the DATA tag in order to make the TEXT tag subordinate to PAGE... would result in chaos. I looked at every .ged I have to make sure that no one was putting DATA before PAGE, and decided it was sort of safe to jump in and get the job done the dirty way, but I still call it cheating. More detail is on page 139 of my book.
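A hedged sketch of the gambit and its weak spot. This is not the code from the import program, and the function name and sample lines are made up. It drops a DATA line only when the line right before it is a PAGE line at the same level, so the TEXT lines end up directly under PAGE. Any other DATA line is left alone and reported.

```python
def drop_data_tags(lines):
    """Delete DATA lines that directly follow a sibling PAGE line.

    Returns (kept, skipped): the surviving lines, and the DATA lines
    that were left in place because deleting them would orphan their
    TEXT lines.
    """
    kept, skipped = [], []
    for line in lines:
        parts = line.split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        if tag == "DATA" and kept:
            prev = kept[-1].split(" ", 2)
            if prev[1] == "PAGE" and int(prev[0]) == level:
                continue          # safe: TEXT will now follow PAGE
            skipped.append(line)  # unsafe: report it instead of deleting
        kept.append(line)
    return kept, skipped


page_first = ["2 SOUR @S1@", "3 PAGE sheet 4", "3 DATA",
              "4 TEXT born in Acorn County"]
data_first = ["2 SOUR @S1@", "3 DATA", "4 TEXT born in Oak County",
              "3 PAGE sheet 9"]

kept_ok, skipped_ok = drop_data_tags(page_first)
kept_bad, skipped_bad = drop_data_tags(data_first)
print(kept_ok)
print(skipped_bad)
```

In the DATA-first case, blindly deleting the DATA line would leave a level 4 TEXT straight after the level 2 SOUR, which is the chaos described above; the sketch sends that line to a report list instead.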
When I said that my goal is perfection it was a flowery exaggeration. Here's what I should have said. I'm kinda trying to contrast Treebard, a showcase of functionalities which is in the public domain for anybody to borrow from, with a commercial application which is rushed to market. In the commercial app, the basic improvements it needs never get done, because the flashy secondary features that want to be listed on the features page of the vendor's marketing site become the creator's goal, when the primary features of the app were never done correctly to begin with. Backward compatibility and commercial interest are working against the commercial app in a way that does not constrain Treebard.
I stand by the statement that finishing Treebard is not my goal. As a 68-year-old lone hobbyist who is slowing down, and going blind, I should be in the garden getting sunshine and exercise, not staring at a screen all day. With new projects starting all the time and old projects not even started or not close to being finished, the Treebard app is not finishable by me, never was. However your remark is correct in spirit, because after 5-1/2 years of full-time work, I can say that the parts of Treebard that I have begun are finishable, and when I do that, I will in fact be able to use the app for my own needs. And maybe add features occasionally one at a time. My problem with finishing Treebard is that when something isn't good enough, I write it over, which has consequences. When thing A is written over, things B & C need to be written over to match it. So this is very unlike a commercial app in some ways because I'm totally unconstrained by backward compatibility.
Your contribution by noticing my project is a big deal because it's very motivating to be noticed. We don't have to agree about everything, or anything, for your input to be valuable. I suspect we probably agree about a lot of things.
My main disagreement with what I know of your work is that you might think that people can be taught to use GEDCOM right, that v. 5.5.1 approaches being able to transfer data correctly. Vendors will not use GEDCOM right, ever, because they don't have to, so they won't.
But that means that if GEDCOM is replaced, the replacement will also not be used right.
Welcome to the real world.
Speaking of reality, there's this thing called "cardinality". I haven't been able to get a reaction to this, my main point. It is commonly expressed as the terms "one-to-one", "one-to-many", and "many-to-many". The relationships among data fall into one of these categories, and an accurate model of the real world will carefully analyze each pair of related data (such as person and name) and handle it differently depending on what its cardinality really is. GEDCOM 5.5.1 doesn't even know what this is, as far as I can tell.
|
|
|
Post by lkessler on Mar 7, 2024 8:50:24 GMT -8
You're not quite getting what I'm saying. My point is that under a SOUR link, there is at most one PAGE tag and at most one DATA tag. Therefore the PAGE and DATA tags are associated with each other. The DATA tag is not associated with any other PAGE tag and the PAGE tag is not associated with any other DATA tag. This means that all your assertions as TEXT tags under the DATA tag are associated only with the one citation in the PAGE tag. e.g.:
1 BIRT
2 SOUR @s99@
3 PAGE what: citation info for citation 1
3 DATA
4 TEXT assertion 1 for citation 1
4 TEXT assertion 2 for citation 1
2 SOUR @s99@
3 DATA
4 TEXT assertion 1 for citation 2
4 TEXT assertion 2 for citation 2
3 PAGE what: citation info for citation 2
There's no need at all for you to delete the DATA tag, and the order of the PAGE and DATA tags doesn't matter. Forget that in GEDCOM the TEXT tag is subordinate to SOUR rather than PAGE and that other vendors likely haven't implemented TEXT under citations rather than sources. Because of the one-to-one relationships between a citation (PAGE tag) and a DATA tag, they must retain a many-to-one relationship from assertions (TEXT tag) to citations. And therefore you can take their GEDCOM output (no matter how they implement it in their program) and input it into Treebard as you've designed by linking the assertions to the citation to which they belong.
GEDCOM was designed a long time ago to be a simple way to transfer basic genealogy data between programs. It did a very good job at that and still does. But once vendors designed squares that didn't fit the round holes of GEDCOM, people complained because the squares weren't being transferred. GEDCOM gave loopholes (e.g. custom tags) which only made the problem worse.
The replacement to GEDCOM is slowly happening. The replacement is "syncing", i.e. the direct transfer between programs:
- RootsMagic <--> Ancestry, FS FamilyTree
- FamilyTreeMaker <--> Ancestry
- AncestralQuest <--> FS Family Tree
- FamilyTreeBuilder <--> MyHeritage
- FS Family Tree --> MyHeritage (for members of the Church)
Only data that both systems support will be able to be transferred. And that is also true of GEDCOM, and neither GEDCOM 7 nor any other future standard can overcome that.
Of course, your utopian idea of having everyone use exactly the same database would work. That will only happen once one megacompany owns all of genealogy.
Louis
|
|
|
Post by Uncle Buddy on Mar 7, 2024 16:50:46 GMT -8
According to this quote from page 20 of the 5.5.1 specs...
...you are correct. I had developed a blind spot for the column in the specs with curious characters such as these:
For this glaring omission my only excuse would be that the specs don't spell out the purpose of these devices in the same place as the above quote (or not at all?) but still it's a bad excuse. I suspected that the info in the curly braces had something to do with my favorite topic, cardinality, yet I had not stopped to analyze the situation, and just got into the habit of ignoring the info when I should have thought carefully about the ramifications of how many of something could occur. Possibly I am not smart enough to extrapolate cardinality from the specs' {0:1} without a more experienced GEDCOM warrior pointing it out, but we learn from our mistakes, and from each other, hopefully.
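To make the curly-brace notation concrete, here is a rough sketch of how an import program might read a {min:max} marker and count tags under one parent. The function names and the rules dictionary are hypothetical; only the {0:1} and {1:M} notation itself comes from the specs, and the PAGE/DATA limits are the ones Louis described for a SOUR link.

```python
import re
from collections import Counter

# Hypothetical helper names; the {min:max} notation is the GEDCOM 5.5.1 one.
def parse_cardinality(marker):
    """Turn '{0:1}' into (0, 1); 'M' (many) becomes None, meaning no upper limit."""
    match = re.fullmatch(r"\{(\d+):(\d+|M)\}", marker)
    if match is None:
        raise ValueError(f"not a cardinality marker: {marker!r}")
    low = int(match.group(1))
    high = None if match.group(2) == "M" else int(match.group(2))
    return low, high

def check_children(child_tags, rules):
    """Return the tags whose count under one parent breaks the rules."""
    counts = Counter(child_tags)
    problems = []
    for tag, marker in rules.items():
        low, high = parse_cardinality(marker)
        n = counts[tag]
        if n < low or (high is not None and n > high):
            problems.append(tag)
    return problems

# Assumed rules for the children of one SOUR link: at most one PAGE, at most one DATA.
citation_rules = {"PAGE": "{0:1}", "DATA": "{0:1}"}
print(check_children(["PAGE", "DATA"], citation_rules))
print(check_children(["PAGE", "PAGE", "DATA"], citation_rules))
```

A check like this would flag a SOUR link carrying two PAGE tags, which is exactly the case where TEXT could no longer be tied to a single citation.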
It looks like I will have to make some revisions in my import program since deleting the DATA tags in order to make TEXT subordinate to PAGE actually changes the structure of the GEDCOM. If the PAGE comes after the DATA in the .ged, as in your second example, the TEXT might end up linked to the wrong thing.
Even if deleting the DATA tag is safe (and I suspect now that it's not), the code should be redone for this part of the import program based on a better understanding of how PAGE and DATA can be used, which would simplify the code.
Your second example:
I think you're saying that it is the very fact that DATA and PAGE are both one-only siblings under the SOUR pointer, which defines a one-to-many relationship for PAGE-to-TEXT. This is an abstraction to me, seems correct and I'm sure you've thought it out carefully. I'm not good at this sort of covert logic but after using it to redesign that part of my import program, it will probably come naturally.
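As a sketch of how that logic could look in an importer, the Python below gathers PAGE and TEXT under each SOUR pointer without deleting DATA and without caring which of PAGE or DATA comes first. The function name and the dictionary layout are hypothetical, not code from Treebard's import program, and it assumes the citation sits at level 2 under an event, as in Louis's example.

```python
# A minimal sketch; names and layout are hypothetical, not Treebard's code.
# Assumes SOUR pointers at level 2, with PAGE/DATA at level 3 and TEXT at level 4.
def read_citations(lines):
    """Collect one {'source', 'page', 'texts'} dict per level-2 SOUR pointer."""
    citations = []
    current = None
    in_data = False
    for line in lines:
        if not line.strip():
            continue
        parts = line.strip().split(" ", 2)
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 2 and tag == "SOUR":
            # A new SOUR pointer starts a new citation.
            current = {"source": value, "page": None, "texts": []}
            citations.append(current)
            in_data = False
        elif level <= 2:
            # Any other line at level 2 or above ends the current citation.
            current = None
            in_data = False
        elif current is not None and level == 3:
            in_data = tag == "DATA"
            if tag == "PAGE":
                current["page"] = value
        elif current is not None and level == 4 and in_data and tag == "TEXT":
            current["texts"].append(value)
    return citations

sample = """1 BIRT
2 SOUR @s99@
3 PAGE what: citation info for citation 1
3 DATA
4 TEXT assertion 1 for citation 1
4 TEXT assertion 2 for citation 1
2 SOUR @s99@
3 DATA
4 TEXT assertion 1 for citation 2
4 TEXT assertion 2 for citation 2
3 PAGE what: citation info for citation 2""".splitlines()

citations = read_citations(sample)
for citation in citations:
    print(citation["page"], citation["texts"])
```

Because PAGE and TEXT are both collected into the same per-SOUR dictionary, the second citation links its assertions to the right PAGE even though PAGE arrives after DATA.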
A brief history of me and cardinality: long ago I gave up trying to use MS Access for a complex dictionary project because I couldn't figure out how to think about relationships among data or what to do about it. Using SQL instead forced me to deal with the nuts and bolts instead of the abstractions, and getting cardinality right has solved many a quandary for me.
So I've offered gedMOM as an alternative to UNIGEDS because it has a solid, consistent, singular basis: the way SQL works. GEDCOM has any basis it wants and switches strategies all over the place. In regards to a replacement for GEDCOM, let's say you're right and nothing short of all-out Big Brother control of genealogy could ever cause all vendors to adopt the same SQL database as a back end. Because there would be only one vendor.
Assuming that McGenealogy.com never does manage to take over everything, then what's really needed is a GEDCOM replacement that's palatable to the many varieties of genieware.
In that case, then each pair of vendor products would need its own transfer solution custom designed for that pair of products. A data transfer standard to serve that cause would be a guideline with one or more well-fleshed-out examples. For example, to import to Treebard, I would offer a gedMOM file which vendors can model to import their data structure to my data structure. Fat chance, I suppose, of anyone getting on board with that. Software companies are not going to make it a priority to find a better way to send their customers that-a-way.
In my ignorance, I know nothing about syncing. Are you saying that syncing is what I just described? A unique solution for each pair of genealogy data products? I wouldn't know, I'd still be using Word 6 and Windows 7 if it were possible. I avoid new stuff till it's old, like my old smartphone that's ignored somewhere under a pile of papers, which I'm only now starting to learn how to use.
I only learned less than a week ago how to find a cable that will allow data transfer from an Android phone to a Windows computer. In the Philippines, that means "try it before you buy it", even if it says "Data Cable" on the package.
|
|
|
Post by lkessler on Mar 7, 2024 19:09:56 GMT -8
Scott,
If you didn't realize that {0:1} meant 0 or 1 occurrence, then methinks you should browse back through GEDCOM just to see all the tags that are only allowed once. There's quite a few of them. And then there are some that are required {1:1} or {1:M} and a few odd ones allowed a few times {0:3}. Some of those considerations may affect your GEDCOM input or output, but hopefully not your data structure.
With regards to syncing, it's sort of like an automatic transfer between two programs without an intermediate file. So you don't have to export from program 1 to a file and then go into program 2 and import the file. Syncs always transfer the data invisibly to the user, often with SOAP or a REST API. stackify.com/soap-vs-rest/
To sync, you go into program 1 or program 2, and press some button to start the sync. It will need to somehow connect to the other program (may require login or whatever). You do an initial sync with a new data file in one of the programs which will give you two identical sets of data (or as close as possible given the different data structures of the two programs). Then the next time you sync, the program checks to see the differences between the data in the two programs. Often you can select what you want it to do, e.g. make program 1 data the same as program 2, make program 2 data the same as program 1, or show the differences a person at a time and let the user select which fields from which program they want in both.
There's a lot of work involved for the programmer to set up syncing, and they must get the cooperation of the other side to allow the communication between the two. It is fantastic when it is set up. Check this video by Devon Noel Lee: Sync Your Family Tree Between Ancestry & Family Search with RootsMagic - notice no GEDCOMs involved in any of the uploads or downloads in Devon's video.
The nice thing about GEDCOM is that it doesn't require cooperation. But then because of that, you're at the mercy of how compatibly the two programs did their GEDCOM export and import.
Louis
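The compare-then-choose step of a sync, as described in this post, can be sketched in a few lines of Python. Everything here is hypothetical: the function names, the policy names, and the idea of each program's data being a plain dictionary keyed by record id. A real sync would talk to the other program through its API rather than hold both data sets in memory.

```python
# Hypothetical names throughout; real syncs go through each vendor's API.
def diff_records(data_1, data_2):
    """Return {record_id: (value_in_1, value_in_2)} for records that differ."""
    differences = {}
    for record_id in sorted(set(data_1) | set(data_2)):
        value_1 = data_1.get(record_id)
        value_2 = data_2.get(record_id)
        if value_1 != value_2:
            differences[record_id] = (value_1, value_2)
    return differences

def sync(data_1, data_2, policy, choose=None):
    """Make both data sets identical according to the chosen policy."""
    for record_id, (value_1, value_2) in diff_records(data_1, data_2).items():
        if policy == "prefer_1":
            winner = value_1
        elif policy == "prefer_2":
            winner = value_2
        elif policy == "ask":
            # choose() stands in for showing the user one difference at a time.
            winner = choose(record_id, value_1, value_2)
        else:
            raise ValueError(f"unknown policy: {policy}")
        for data in (data_1, data_2):
            if winner is None:
                data.pop(record_id, None)
            else:
                data[record_id] = winner

program_1 = {"I1": "Mary Smith", "I2": "John Jones"}
program_2 = {"I1": "Mary Smyth", "I3": "Ann Lee"}
changes = diff_records(program_1, program_2)
sync(program_1, program_2, "prefer_1")
print(changes)
print(program_1 == program_2)
```

With the "prefer_1" policy, program 2 ends up as a copy of program 1, which means records that exist only in program 2 are dropped. That is one reason real sync tools offer the person-at-a-time review instead.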
|
|
|
Post by Uncle Buddy on Mar 8, 2024 14:59:48 GMT -8
Thanks Louis, I will look at the links you provided.
I didn't get enough sleep last night, my gedMOM kept waking me up.
I looked at Gramps' data structure a little bit this morning, thinking their SQLite database would work something like mine, but it turns out they're not storing literal genealogy data but serializing it with a Python tool called pickle and then storing the pickled data. Sounds like an extra step to me, I can't imagine why the complication.
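To show what that extra step looks like, here is a small sketch with Python's pickle and sqlite3 modules. The table and column names are invented for the comparison and are not Gramps' actual schema. One table stores the pickled object as a blob, the other stores the same values as literal columns.

```python
import pickle
import sqlite3

# Hypothetical table and column names; this is not Gramps' actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_blob (handle TEXT PRIMARY KEY, blob_data BLOB)")
conn.execute(
    "CREATE TABLE person_plain (handle TEXT PRIMARY KEY, surname TEXT, birth_year INTEGER)"
)

person = {"surname": "Jones", "birth_year": 1880}

# Pickled: the whole object becomes one opaque blob.
conn.execute("INSERT INTO person_blob VALUES (?, ?)", ("h1", pickle.dumps(person)))
# Literal: each value gets its own column that SQL can see.
conn.execute(
    "INSERT INTO person_plain VALUES (?, ?, ?)",
    ("h1", person["surname"], person["birth_year"]),
)

# SQL can filter the literal table directly...
plain_hits = conn.execute(
    "SELECT handle FROM person_plain WHERE birth_year = 1880"
).fetchall()
# ...but the blob has to be fetched and unpickled in Python before it can be read.
blob = conn.execute(
    "SELECT blob_data FROM person_blob WHERE handle = 'h1'"
).fetchone()[0]
restored = pickle.loads(blob)
print(plain_hits)
print(restored)
```

The pickled row round-trips the Python object intact, but the database itself cannot query inside it, so anyone reading such a file from outside the program has to unpickle first.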
It occurred to me that if gedMOM is a potential GEDCOM replacement, I haven't yet thought about how to translate a vendor's dissimilar data structure into a gedMOM format. On second thought, I guess it's their problem to write their own export program, but I'd still need to access somebody's data structure to prove it can be translated into gedMOM. Gramps is open source but not simple; nothing they do is simple.
Dropbox has a syncing feature, I should try it.
|
|
|
Post by lkessler on Mar 8, 2024 16:00:35 GMT -8
I don't think you'll find any program's data structure is simple. Not only that, they are all VERY different from each other.
|
|
|
Post by Uncle Buddy on Mar 9, 2024 6:30:02 GMT -8
That's starting to sink in.
|
|