GEDCOM places import has to be interactive
Post by Uncle Buddy on Oct 27, 2023 16:45:07 GMT -8
Treebard/UNIGEDS is the only genieware I know of which attempts to treat places at the right level of detail and accuracy.
Other genieware comes pre-loaded with many places that the user won't need. Here at Treebard University, our attitude is that the user wants to do his own research. He doesn't want to compromise by picking from a list of pre-loaded places. If he creates a new place, he wants it created as it exists in the real world.
Other genieware asks the user to classify single places according to place type (jurisdiction level such as county, city, nation, township, etc.) but can't provide a complete list of possible place types nor provide any useful functionality related to this extra step of over-categorization.
Another problem with pre-loaded places is that so much data is stored in the app that it loads slowly and, much worse, a reasonable amount of manipulation cannot be done with that much data. I'll describe some things that Treebard and UNIGEDS do which would be overly difficult, bloating, and/or too costly in computing resources on a database with thousands of places. Treebard is created for normal genealogists doing normal genealogy. What's normal genealogy? Compiling a family tree. Your tree or mine can only go back as far as records will allow before the thread is lost. Treebard does not exist to serve the one-world-tree fantasy of McGenealogy.com. We care about Adam and Eve's family tree, but writing software to handle that much data is not our goal. Because of this, we can offer these unprecedented features:
Every single place (child place) can have more than one enclosing place (parent place). For example, "Dallas, Texas, United States of America" consists of three single places, one autofill string, and three place names, requiring seven unique ID numbers in order to be accurately represented in a database. Because each part of this compound or nested place is uniquely identified, "Dallas, Republic of Texas" can use the same ID for the "Dallas" part (same as Dallas, Texas, USA) but a different ID for the Republic of Texas, which was its own country. And currently, the city of Dallas, Texas extends into five different counties, so a resident of Dallas, Collin County, Texas might live across the street from a resident of Dallas, Denton County, Texas. In Treebard, this is treated as the same Dallas but a different county. And when the user autofills a long, complex place name by typing three letters, Treebard gets it right. This feature becomes important when going back in time, since state, province, and county boundaries and place names change so much as countries are forming, and some are still changing. Ever tried to research historical German boundaries? Or U.S. counties and townships in the 1800s? Treebard can handle the details, but other genieware products brush the details under the rug, severely compromising on real-world accuracy in the way that they store places.
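To make the ID arithmetic concrete, here is a minimal sketch of one way the seven IDs could be counted. It is not Treebard's actual schema: the dictionaries stand in for database tables, and every name in it (`parents`, `place_names`, `nested_places`, `add_place`, `add_nested_place`, `autofill_string`) is hypothetical.

```python
from itertools import count

# Hypothetical in-memory stand-ins for database tables; not Treebard's real schema.
place_ids, place_name_ids, nested_place_ids = count(1), count(1), count(1)
parents = {}        # place_id -> set of parent place_ids
place_names = {}    # place_name_id -> (place_id, name)
nested_places = {}  # nested_place_id -> tuple of place_ids, smallest place first


def add_place(name):
    """Create one single place with one name; uses two new IDs."""
    place_id = next(place_ids)
    parents[place_id] = set()
    place_names[next(place_name_ids)] = (place_id, name)
    return place_id


def add_nested_place(*chain):
    """Record each place's parent and store the whole chain; uses one new ID."""
    for child, parent in zip(chain, chain[1:]):
        parents[child].add(parent)
    nested_id = next(nested_place_ids)
    nested_places[nested_id] = chain
    return nested_id


def autofill_string(nested_id):
    """Join the first recorded name of each place in the chain."""
    def first_name(place_id):
        return next(name for pid, name in place_names.values() if pid == place_id)
    return ", ".join(first_name(pid) for pid in nested_places[nested_id])


dallas = add_place("Dallas")
texas = add_place("Texas")
usa = add_place("United States of America")
modern = add_nested_place(dallas, texas, usa)
ids_used = len(parents) + len(place_names) + len(nested_places)  # 3 + 3 + 1

republic = add_place("Republic of Texas")
historical = add_nested_place(dallas, republic)  # reuses the Dallas ID

print(ids_used)                              # 7
print(autofill_string(historical))           # Dallas, Republic of Texas
print(parents[dallas] == {texas, republic})  # True
```

The point of the sketch is only that "Dallas" is stored once and simply gains a second parent, rather than being duplicated for each compound place it appears in.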
So that we can ask less of the user later, when he is entering his own data and wants to type a few characters and watch a whole compound place name fill in automatically, we have to ask a little more of him during the GEDCOM import process. Because the GEDCOM place sub-element is so inadequate at representing how real places in the world actually work as PRIMARY elements, we are creating an interactive place import interface. Shortly after the GEDCOM import begins, the user will be asked to look at a chart of all the nested places in the import file. This step will be carefully designed to be self-teaching and easy to accomplish, but it won't be completely effortless for the user. Creating some sort of smart software procedure to do it all in the background was considered, and has eaten up plenty of our time, but in the end our anti-smart-software ethic won out. In Treebard, it is the user who makes the decisions.
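As a rough illustration of the raw material for such a chart, the sketch below splits a few nested place strings (taken from the sample list at the end of this post) into single place names and groups them by exact spelling. The function name `group_by_spelling` is made up for this example, and the real import interface may be organized quite differently.

```python
from collections import defaultdict

# A few nested places copied from the sample list at the end of this post.
sample = (
    "Paris, France",
    "Paris, Ile-de-France",
    "Paris, ile-de-France",
    "Paris, Texas, United States of America",
    "Paris, Lamar County, Texas",
    "Bomarton, Texas, USA",
    "Trinidad, Colorado, usa",
)


def group_by_spelling(nested_places):
    """Map each single place name, spelled exactly as found, to the nested places using it."""
    groups = defaultdict(list)
    for nested in nested_places:
        for single in (part.strip() for part in nested.split(",")):
            groups[single].append(nested)
    return dict(groups)


groups = group_by_spelling(sample)
print(len(groups["Paris"]))               # 5
print(len(groups["Texas"]))               # 3
print("USA" in groups, "usa" in groups)   # True True
```

Note that the sketch deliberately does no guessing: "USA" and "usa", or "Ile-de-France" and "ile-de-France", stay in separate groups until the user says they are the same place.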
Below is an example of some of the variation that a user might be asked to sort out by clicking a single place name such as "Paris". All the single place names spelled the same way will be highlighted and checked. The user will uncheck those that are different from the rest. In this case, all the places named "Paris" that are in Texas could remain checked, and the user would just uncheck the few that are in France. He will click a NEXT button and the checked places will gray out. Next he can click Paris again, the remaining Parises will be auto-checked, and then he just clicks NEXT, since there are only two places named Paris in the tree he's importing.
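The click, uncheck, NEXT cycle just described can be sketched without any GUI. This is only a simulation under my own assumptions: the helper names `occurrences` and `click_next` are hypothetical, positions are (row, column) pairs in the chart, and the user's unchecking is hard-coded.

```python
from itertools import count


def occurrences(rows, name, done):
    """Positions of every not-yet-resolved single place spelled `name` (the auto-checked ones)."""
    return [
        (row, col)
        for row, nested in enumerate(rows)
        for col, single in enumerate(nested.split(", "))
        if single == name and (row, col) not in done
    ]


def click_next(checked, done, place_id):
    """Give every checked occurrence the same place ID; being in `done` means grayed out."""
    for position in checked:
        done[position] = place_id


rows = [
    "Paris, France",
    "Paris, Texas, United States of America",
    "Paris, Lamar County, Texas",
    "Paris, Ile-de-France",
]
new_place_ids = count(1)
done = {}  # (row, col) -> place_id assigned by the user's choices

# First click on "Paris": all four are checked, and the user unchecks the French ones.
checked = occurrences(rows, "Paris", done)
unchecked_by_user = {(0, 0), (3, 0)}
click_next([p for p in checked if p not in unchecked_by_user], done, next(new_place_ids))

# Second click on "Paris": only the French ones remain, so the user just clicks NEXT.
click_next(occurrences(rows, "Paris", done), done, next(new_place_ids))

print(done[(1, 0)] == done[(2, 0)])      # True: same Paris, Texas
print(done[(0, 0)] != done[(1, 0)])      # True: Paris, France is a different place
print(occurrences(rows, "Paris", done))  # []
```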
This process could take a few minutes for a large GEDCOM file, but it could save the user hours of manually splitting and merging places done wrong by "smart" software. It will also automatically create aliases such as U.K., UK, and United Kingdom, if all three exist in the GEDCOM file; all three will have the same place ID number (but different place name IDs) if the user has indicated during the interactive import process that they are the same place.
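Here is a small sketch of what "same place ID, different place name IDs" could look like once the user has confirmed the aliases. The names `record_aliases` and `place_ids_for` are invented for illustration, the place ID of 1 is arbitrary, and the dictionary stands in for whatever table actually stores place names.

```python
from itertools import count

place_name_ids = count(1)
place_names = {}  # place_name_id -> (place_id, spelling); hypothetical table stand-in


def record_aliases(place_id, spellings):
    """Store each spelling under its own place name ID, all pointing at one place ID."""
    for spelling in spellings:
        place_names[next(place_name_ids)] = (place_id, spelling)


def place_ids_for(spelling):
    """Every place ID that has a name spelled exactly this way."""
    return {pid for pid, name in place_names.values() if name == spelling}


# The user indicated during import that these three are the same place.
record_aliases(1, ["U.K.", "UK", "United Kingdom"])

print(len(place_names))                 # 3
print(place_ids_for("UK"))              # {1}
print(place_ids_for("United Kingdom"))  # {1}
```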
In practice, this will be easier to do than it is to read about.
Now to write the software I just promised... It will be fun, unlike the smarty-pants software I was trying to write that was driving me crazy. Really, folks, smart software is not only the bane of our existence, since it makes wrong choices 3/4 of the time; it is also very hard to create. In my book, "very hard to create" + "usually wrong" = don't do it.
It has not taken me years to get Treebard's place systems to work so complicatedly. It has taken me years to get them to work so simply.
The list below is a sampling of the sort of confusing GEDCOM places that our interactive place import + user input will sort through correctly in a few minutes, whereas writing imperfect "smart" software to make the same decisions would bloat the code and make the person writing the code waver between 1) needing a large bowl of chocolate ice cream followed by a long nap and 2) wanting to throw furniture.
GEDCOM_places = ("Dallas",
"Dallas, Texas",
"Dallas, United States of America",
"Dallas, Republic of Texas",
"Dallas, Denton County, Texas",
"Dallas, Denton County, Texas, United States of America",
"Victoria, Mississippi, United States of America",
"Precinct 4, Trevor County, Texas, United States of America",
"Bomarton, Texas, USA",
"Paris, France",
"Île-de-France",
"Île-de-France, France",
"Paris, Ile-de-France",
"Paris, ile-de-France",
"Little Spring, Caleb County, Arkansas, United States of America",
"Mountain Valley Route, Green Valley Springs, Arkansas, United States of America",
"Paris, Texas, United States of America",
"Paris, Lamar County, Texas",
"Bush Family Cemetery, Maddox, Garland County, Arkansas, United States of America",
"Arkansas, United States of America",
"Byhalia, Mississippi, U.S.A.",
"Muskogee, Muskogee County, Oklahoma, United States of America",
"8-E. 1-N. 1/2 W., Raoul Valley, Carver County, Oklahoma, United States of America",
"Raoul Valley, Oklahoma, United States of America",
"Raoul Valley, United States of America",
"Hubert Cemetery, Raoul Valley, Carver County, Oklahoma, United States of America",
"Mississippi, United States of America",
"Snowflake, Arizona, United States of America",
"Paris, United States of America",
"Texas, United States of America",
"Highway 5C, Precinct 2, Trevor County, Texas, United States of America",
"Green Valley Springs, Arkansas, United States of America",
"Trinidad, Colorado, usa",
"Southern Mindanao, Philippines",
"Mindanao, The Philippines",
"Panay Island, the Philippines",
"Little Panay, Davao del Norte, Mindanao, P.I.",
"東京",
"144 East Main Street, Little Spring, Caleb County, Arkansas, United States of America")