Doing Nested Places Right

Uncle Buddy
Administrator

Posts: 618

Doing Nested Places Right Apr 22, 2021 2:19:22 GMT -8

Post by Uncle Buddy on Apr 22, 2021 2:19:22 GMT -8

Nested places are places like "Huron, Erie County, Ohio, USA". Each place separated by commas is an independent entity. Each place is itself a nest which is in turn nested in a larger place, so each succeeding nest or parent is larger than the nest before it. I thought I had this all figured out a few years ago. Finally it occurred to me that a place has to be able to have more than one parent in order to represent how real places work. Boundary lines change all the time. Places change their names. A city might be nested in a variety of townships, precincts or counties all at the same time and/or at different times. Two babies born in the same town on the same day might be born in different counties. Take for example the counties of West Virginia. Before West Virginia existed, some or all of these same counties were nested in Virginia. So we need to be able to input a given place with two or more parents.

This is where it wouldn't work for Treebard to make decisions for the user. If we made it impossible for the user to put Milford, Oregon in West Virginia, then we might inadvertently also be making it impossible to put Milford, West Virginia in Virginia. Since we can't make rules to automatically exclude circumstances that we can't foresee, we have to make the GUI give the user all the information he needs in a crystal-clear way.

I used to use genieware that had autofill places and that was nice, but it was very easy to do something like put Paris, Texas in France. This was considered the user's problem; when I found out what I'd been doing, it took hours to make things right by splitting and merging places. Autofill places are great but the program has to know what's going on and it has to tell the user what's going on. Mistakes should be prevented, not by making things impossible, but by telling the user what he's about to do. Not with annoying, superfluous messages: "R U Sure that Milford, Oregon is in Virginia???" but in a way that doesn't make the user have to click-click-click several times to do one thing. It might seem crazy that Treebard doesn't just accept user input but we really have to do our best to keep Paris, Texas out of France. The French and the Texans might be equally miffed if this were to happen.

But because any place can have an unlimited number of parents, it's not Treebard's job to stop mistakes by force, by making things impossible. Instead, we have to make mistakes easily preventable by detecting duplicate place names and giving the user a chance to make his own informed decision.

What it boils down to is that when a database has a single town named Milford in both Virginia and West Virginia, that town should be the same town with the same ID number in the database. If not for giving nested places a many-to-many relationship, it would not be possible to do it right. The single place "Milford" would have to be treated as two different places with two different IDs if a place could only have one parent. Software like this would be useless for representing real historical places. Treebard will also give the user the option of specifying during what period of time Milford was in West Virginia, and what period of time it was in Virginia.
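Here is a minimal sketch of that idea, not Treebard's actual code: one place, one ID, several parent links, each with an optional span of years. The class name, function name and ID numbers are made up for illustration, and the 1863 split is assumed from the year West Virginia became a state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ParentLink:
    """One child-parent pair, with an optional span of years (hypothetical)."""
    child_id: int
    parent_id: int
    start_year: Optional[int] = None  # None means unknown or open-ended
    end_year: Optional[int] = None


# Hypothetical IDs: 1 = Milford, 2 = Virginia, 3 = West Virginia.
PLACE_NAMES = {1: "Milford", 2: "Virginia", 3: "West Virginia"}

# One Milford with one ID and two parents. The 1863 boundary is an
# assumption based on the year West Virginia became a state.
LINKS = [
    ParentLink(1, 2, None, 1863),
    ParentLink(1, 3, 1863, None),
]


def parents_in_year(child_id, year, links=LINKS):
    """Return the name of every parent whose span covers the given year."""
    found = []
    for link in links:
        if link.child_id != child_id:
            continue
        started = link.start_year is None or link.start_year <= year
        not_ended = link.end_year is None or year <= link.end_year
        if started and not_ended:
            found.append(PLACE_NAMES[link.parent_id])
    return found
```

Calling parents_in_year(1, 1850) gives Virginia only, and a later year gives West Virginia only. In the changeover year both parents come back, which is the kind of overlap a single-parent design can't store.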

This all has to be made easy for the user, and that's the hard part. It would be easy to just dump all the information in the user's lap and make him sort it all out. The right way is to do the hard part, prefill the most obvious choices, and usually the user can just click OK and keep going. But he has to know what he's OKing and this info has to be obvious without being too wordy. If the user won't read it because it's ugly, it won't be used. And then Milford, Oregon will end up in West Virginia anyway.

Last Edit: Apr 28, 2021 4:36:25 GMT -8 by Uncle Buddy

Scott Robertson (Professor U. d'Guru)
"If you don't build it, it won't work."

Uncle Buddy
Administrator

Posts: 618

Doing Nested Places Right May 2, 2021 6:28:17 GMT -8

Post by Uncle Buddy on May 2, 2021 6:28:17 GMT -8

When I realized a few years ago that nested places would have to have a many-to-many relationship, I spent some days trying to figure out how to write a recursive query on a many-to-many database table. That was a lost cause at the time so I shelved that project and worked on other parts of the events table first. Nested autofill places was to be that thing I'd do when I got done with the easy columns on the table, a few of which actually ended up being easy.

Almost two months ago, during a binge of refactoring, I started tinkering with an idea, got too close and fell in: what if the recursion were done in Python and the SQL were kept super simple and basic? Oddly enough, this worked out fine and somehow I managed to write a Python procedure for dealing with multiple parents of a single place. For example, Excelsior Township was in Batwing County from 1848 to 1892 and since then it's been in Hardrock County. The historical facts need to be inserted into the database, not some "if life were simple" truncated version of them. Nested places can have more than one parent.
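The general shape of "recursion in Python, simple SQL" can be sketched like this. It is not the procedure from places.py; the table names, column names, ID numbers and the "Some State" parent are assumptions made up for the demo, and the sketch assumes the data contains no loops (a place that is its own ancestor).

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database using hypothetical table and column names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE place (place_id INTEGER PRIMARY KEY, places TEXT);
        CREATE TABLE places_places (child_id INTEGER, parent_id INTEGER);
        INSERT INTO place VALUES
            (1, 'Excelsior Township'), (2, 'Batwing County'),
            (3, 'Hardrock County'), (4, 'Some State'), (5, 'USA');
        INSERT INTO places_places VALUES
            (1, 2), (1, 3), (2, 4), (3, 4), (4, 5);
    """)
    return conn


def nestings(conn, place_id):
    """Return every nesting (list of names, smallest place first) for place_id.

    Each SQL query is a plain single-table SELECT; all of the recursion
    happens here in Python. Assumes the parent links contain no cycles.
    """
    name = conn.execute(
        "SELECT places FROM place WHERE place_id = ?", (place_id,)
    ).fetchone()[0]
    # Fetch all parents into a list before recursing on the same connection.
    parent_ids = [row[0] for row in conn.execute(
        "SELECT parent_id FROM places_places WHERE child_id = ? "
        "ORDER BY parent_id", (place_id,))]
    if not parent_ids:
        return [[name]]  # a largest place: nothing above it
    result = []
    for parent_id in parent_ids:
        for tail in nestings(conn, parent_id):
            result.append([name] + tail)
    return result
```

With the demo data, nestings(make_demo_db(), 1) returns two nestings for the one township, one through each county.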

Not that you can't record the digested version if you want. The level of detail you report is up to you. The problem is that there are people who need to input a level of detail that mimics reality. If these folks can't use Treebard, there's no point in Treebard existing.

I've started over several times on something I call "the new place dialog", and it's finally starting to fall into place. The point is, when do you open a dialog for the user to clarify and confirm details about which place he's talking about? He types text into an autofill, and the program has to figure out which nest of places he's talking about. There are a lot of places named Portland, a lot of places named Paris, and several places named Ohio.

Right now it seems like it should be possible to figure out which place the user is talking about nearly all the time. There are some exceptions, when it comes to places with the same name. If not for duplicate place names, this would be easy. Or I could just make the user look up each and every place ID number when entering places but that wouldn't be very nice. There are ways to figure things out based on what else is in the nest of places.

The new place dialog might have to be opened up when a place being entered has only one known place in its nest. But realistically speaking, in such a case the one known place is going to be very large, like the USA, and putting a new place called "Ohio" in an existing place called "USA" probably shouldn't require checking. Another option would be to populate the database with countries and states but this could get hairy. Possibly I'd stop with currently-existing countries and the US states, and let people populate the database with historical places and provincial jurisdictions of places they're familiar with.

I don't think there's any absolute reason why genieware should come with a bunch of places pre-installed. But in the case of detecting which nested place Colorado County belongs to, it would make things easier and keep the new place dialog from opening all the time, if a place called "Texas" in a place called "USA" were already in the database.

Last Edit: May 2, 2021 6:54:18 GMT -8 by Uncle Buddy

Uncle Buddy
Administrator

Posts: 618

Doing Nested Places Right May 2, 2021 6:54:33 GMT -8

Post by Uncle Buddy on May 2, 2021 6:54:33 GMT -8

Speaking of jurisdiction, can someone tell me why genieware authors try to encode jurisdiction categories into their programs? I don't see the point. I wouldn't mind an extra column to say what level of place the place happens to be--it's a "county" or a "province" or a "baranggay" or a "purok" or a "parish"--but I don't see why the program has to track this category. I definitely don't see why any algorithm should be based on it. It's sort of like gender. If you try to use the category to decide something about the data, someone is going to add a new category and the code gets more and more complicated.

So up to this point, nested places code in Treebard just ignores jurisdiction levels. If I'm wrong about this, let me know.

About abbreviating place names. I can see how it might be nice at times to have short versions of place names to use where space is limited. But why should space be limited? I've tried to use a popular program that is clogged with too much data in one place, so all the columns are resizable. Manually, by hand; the hand that aches from using a mouse too much. When I am shown data, I want to be able to read it with my hands in my pockets.

It's really bad form in a way to abbreviate names. People names should never be abbreviated. A genieware designer who changes John Warwick Shellshire to J. W. Shellshire because he hasn't bothered to design a GUI that fits the data... should be ashamed of himself. Usually the same goes for country names. I have typed the word "county" thousands of times because in the U.S. where I'm from, we say "County" after "Garfield" for "Garfield County". In England they just say "Leicester". But in Ireland it's "County Down". I like to use the whole thing including the country name.

But when it comes to ostentatious place names like "The United States of America" or "Union of Soviet Socialist Republics", commonplace abbreviations like "USA" and "USSR" practically need to be used so that the grandiosity of the politicians who named the country so loftily is not the only thing I see when I look at a page with a lot of places on it. Often it's the smaller places that we're more interested in. A census page is not so much about the country but the small place--the township, village, neighborhood, even the enumeration district is where it's at in genealogy. The real stories, for most people, are the local ones.

Last Edit: May 2, 2021 6:58:40 GMT -8 by Uncle Buddy

Uncle Buddy
Administrator

Posts: 618

Doing Nested Places Right May 13, 2021 23:14:26 GMT -8

Post by Uncle Buddy on May 13, 2021 23:14:26 GMT -8

Here's a good discussion on the topic of entering place names to a genealogy database: www.tamurajones.net/PlaceNameStandardisationBasics.xhtml

I agree with Mr. Jones that the country is very important and should not be left out. I used a genieware once that could be instructed to not display the name of the country. But let's say you're inputting cities in Georgia, USA and then you have to input some cities in the country Georgia. Confusion is just around the corner. Countries should never be left out. In fact I was re-reading Tamura Jones' blog post on place names because I was trying to decide whether Treebard should disallow entry of names without their largest places. I'm thinking this should be one of the few things that Treebard should in fact disallow.

But I do disagree with the convention of typing two commas in a row in order to note a missing nest in the nesting of places. For example, you have the city and state/province but not the county. The convention of adding an extra comma is from the days of typewriters. Mr. Jones points out that the double comma might be auto-changed by word-processing software. (I tend to turn such features off, since "smart" software is more often than not just making trouble; I call it "smarty-pants software".)

Mr. Jones suggests using a question mark as a placeholder for the missing county, so you can tell that the double comma is not a typo.

But I'm pushing a new attitude about place names, which is that it's none of our business what kind of category each nest in the nested place belongs to. County, State, Province, etc... it's not that I don't care. I'm too detail-oriented like a lot of genealogists but sometimes I think we're too detail-oriented for our own good. I'm not opposed to tagging places with their jurisdictional category, and that might be a good idea because some folks might find that really interesting or important. But to have any of the program logic depend on these categories is a big can of worms. My reasoning is spelled out earlier in this thread.

I believe that if you don't know the county name and don't want to take a minute to research it, you should just enter City, State, Country and add the county later. I'm writing Treebard so that inserting another nest in the nested place string will be easy. To me, the most important feature in places is the autofill. Getting rid of tedium is where it's at. Not by sweeping everything under the rug, and I don't intend to sweep jurisdictional categories under the rug, but they need to be taken down a notch in importance.

So no question mark for missing places. In fact, Mr. Jones' suggestion to use a question mark seems contradictory to his belief that no nicknames or placeholders should be used in lieu of real names. But his whole blog needs to be read and studied from top to bottom. It is one of the best sources for inspiration on "what should I do and why should I do it" for anyone thinking about writing his own genieware.

Except for the part where he suggests not writing your own genieware. I can't go along with that.

Last Edit: May 13, 2021 23:17:14 GMT -8 by Uncle Buddy

Uncle Buddy
Administrator

Posts: 618

Doing Nested Places Right May 15, 2021 22:27:41 GMT -8

Post by Uncle Buddy on May 15, 2021 22:27:41 GMT -8

Regarding the case of two countries by the same name. A duplicate country name would be rare but possible. For example, historically there has been more than one country named "Israel", assuming that we don't want to input the modern one as "The Republic of Israel", "State of Israel", "Medinat Yisra'el" etc. Shorter names are better for countries in genieware, as long as we aren't creating fictional nicknames. It's a visual thing and a space problem on the screen.

(And local places tend to be more interesting to the research. Finding two people named Mary Smith in the same country is irrelevant. Finding two people named Mary Smith in the same township is interesting. Finding two people named Mary Smith in the same house could be the key to everything.)

The edge cases have to be dealt with. It's the cost of trying to do something amazing. In the case of Jerusalem, Israel, it might be considered more correct to input Jerusalem as a child of one or the other of the countries named Israel, instead of lumping two countries together when they existed in different epochs and had different boundaries and reasons to exist, or at least different types of governments and/or comings-into-being as nations.

Then there's the problem of many names for the same place. Jerusalem has had over 70 different names in the Hebrew language alone. So we'd better not start making rules and shutting out possibilities. It's better to deal with edge cases up front when they can be anticipated. Forbidding anything in genieware is going to lead to problems for somebody. Ideally, if the genieware designer accepts the challenge of solving even the more rare problems in advance, then the user might not know there's a problem at all.

Last Edit: May 15, 2021 22:29:28 GMT -8 by Uncle Buddy

Uncle Buddy
Administrator

Posts: 618

Doing Nested Places Right May 15, 2021 23:28:35 GMT -8

Post by Uncle Buddy on May 15, 2021 23:28:35 GMT -8

But in the case of one place named Jerusalem with two different parents by the same name, Treebard has no way of knowing which place the user intends without opening a dialog to get clarification. It's not just because there are only two nests in the nested place, although that's part of it. Let's say the database has a nice long nesting in it like Room 17, BPOE Hall, Mugwump Flats, Giffard Township, Silver Tomahawk County, Kansas, USA. Now let's say BPOE Hall burned down and was replaced by another BPOE Hall. The user doesn't want to give them unique names to differentiate them because they have identical names in reality. Conceivably there could end up being another Room 17, BPOE Hall, Mugwump Flats, Giffard Township, Silver Tomahawk County, Kansas, USA in the database. A silly example but edge cases have to be invented beforehand in order to prepare code that will handle the real ones if they ever come up.

Here are some examples showing how unique names can help Treebard figure out what the user intends when there are also duplicate names.

Rm 17, BPOE, Mugwump, KS, USA
Rm 17, BPOE, Mugwump, KS, USA... Dialog will open with hints since there are 2 different BPOEs and 2 different Room 17s.

Rm 13, BPOE, Mugwump, KS, USA
Rm 19, BPOE, Mugwump, KS, USA... No dialog is needed even if 2 different BPOEs.

Rm 13, BPOE, Stumpwart, KS, USA
Rm 13, BPOE, Mugwump, KS, USA... No dialog is needed.

In the first example a dialog will be needed for clarification because there's nothing unique except for the ID, and the user shouldn't have to look up IDs. And even if he did, unless he'd memorized which ID was which or made notes on it, looking up IDs wouldn't help. So Treebard places can be given a hint upon creation which will display on the duplicate places dialog, so the user will know which place he intends when he tries to use one of the duplicates.

If two different nested places have identical names, then the only solution is to open a dialog for clarification. But if there is even one difference between their respective nestings, Treebard can figure out which place is intended. The problem is that the user is not expected to look up the IDs for all the nests in his nested place, and Treebard can't differentiate them without IDs or some reliable way of guessing the IDs correctly.
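The Mugwump examples above can be sketched as a lookup: compare what the user typed against the nestings already stored, and only guess when exactly one stored nesting has those names. This is a rough illustration, not the logic in places.py; the function name, the data layout and all the ID numbers are hypothetical.

```python
def resolve(entered, stored):
    """Return the place IDs for the entered nesting, or None if unclear.

    entered: a string such as "Rm 13, BPOE, Mugwump, KS, USA"
    stored:  a list of nestings, each a list of (place_id, name) pairs
    None means zero or several stored nestings match, so a dialog is needed.
    """
    names = [part.strip() for part in entered.split(",")]
    matches = [nesting for nesting in stored
               if [name for _place_id, name in nesting] == names]
    if len(matches) == 1:
        return [place_id for place_id, _name in matches[0]]
    return None


# Hypothetical data: two BPOE halls (IDs 11 and 12) in the same town,
# each with its own Room 17, plus one Room 13 and one Room 19.
TAIL = [(3, "Mugwump"), (2, "KS"), (1, "USA")]
STORED = [
    [(21, "Rm 17"), (11, "BPOE")] + TAIL,
    [(22, "Rm 17"), (12, "BPOE")] + TAIL,
    [(23, "Rm 13"), (11, "BPOE")] + TAIL,
    [(24, "Rm 19"), (12, "BPOE")] + TAIL,
]
```

Here resolve("Rm 17, BPOE, Mugwump, KS, USA", STORED) returns None because two nestings are identical by name, while the Room 13 and Room 19 strings each resolve to a single list of IDs, including the right BPOE hall.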

The reason for this discussion is that allowing free entry of place names is ideal, and we don't want to open a dialog for the user to clarify what he's already entered once, unless it's really necessary. But it is necessary at some point if we're going to draw the line on free entry of ambiguous place names, so that mistakes aren't made by things being too free.

Last Edit: May 16, 2021 0:46:57 GMT -8 by Uncle Buddy

Uncle Buddy
Administrator

Posts: 618

Doing Nested Places Right Aug 20, 2021 22:42:50 GMT -8

Post by Uncle Buddy on Aug 20, 2021 22:42:50 GMT -8

A lot of water has gone under the bridge since my last post in this thread. I've taken some breaks from Treebard GPS coding to start other projects, to deal with health issues, etc.

I started working on the places.py module over five months ago with higher expectations than what I've ended up accepting. I thought Treebard should be able to figure out what place the user was talking about by extracting hints, and no doubt more could be done in this regard to keep the new & duplicate places dialog from opening for every new and duplicate place. But after starting over several times due to being unable to slog through the spaghetti to a satisfactory conclusion, the only solution to prevent total burnout was to be satisfied with simpler solutions. So the dialog will in fact open for every single new and duplicate place. Fortunately there aren't that many duplicate places in most trees. And I think the dialog is easy to use and self-explanatory without lacking needed detail. We'll see how it goes, but for now I hope to move on to the next phase of the refactoring. I have not found any defects from the limited testing I've done so far on places.py.

I'll post below the notes I'm pulling off the module, and I'll post the code itself in the code section. I'll also commit to the public repo at github, but be aware if you're cloning it that the project is still being refactored and will not all work together as a unit till I finish the refactoring, which is a basic restructuring of everything. When I started the refactoring at least six months ago, I didn't know it would mean taking on the nested places, but it did mean that and what a journey that has been.

Notes from places.py

Q: Wherever a pair is used it has to be changed i.e. in the nesting
so this has to be taken into account, put off or ignored.
A: You can't change all nestings just because one similar nesting changes.
The user has to do this on purpose, either one link at a time or Treebard
can provide a way to do it en masse. You can keep both old and new.
If you keep both, you'll keep autofill functionality for both.
So what is the repercussion of doing that? You also have to leave pairs intact
and create new ones. Editing pairs and similar nests--like deleting
places--should only be done in the place tab, where en masse editings of all
similar places can be done, instead of trying to do it on the fly in the
events table.

PLACE STRING TERMINOLOGY

nest: "Paris" or "France" in "Paris, France"
new nest: a nest that's not in the database yet
nesting, nested places, nested place string: "Paris, France"
child: "Paris" in "Paris, France"
parent: "France" in "Paris, France"
pair, nested pair: (Paris, France) but represented by their respective IDs
inner duplicate: "Maine" in "Maine, Maine, USA"
outer duplicate, duplicate: "Paris" in "Paris, France" and "Paris, Texas"
known: this string exists only once in the database e.g. "Timbuktu", so if user
enters "Timbuktu", depending on the context in the nesting, Treebard might
guess with reasonable certainty that this is Timbuktu, Mali since there's
only one nest called "Timbuktu" in the database: Timbuktu, Mali. But if the
user has also entered another place by the same name (e.g. the crater
on Mars named Timbuktu), then "Timbuktu" will no longer be treated as a
known, and Treebard will have to try to figure out which Timbuktu is
intended each time the user enters the name. A dialog will open for every
new and duplicate place entry.
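The terms above can be turned into a small classifier. This is only a sketch of the definitions, not code from places.py: the function names are made up, and existing_names is assumed to be the list of name strings in the place table, one entry per place ID.

```python
from collections import Counter


def classify_nests(nesting, existing_names):
    """Label each nest in a nesting as 'new', 'known' or 'duplicate'.

    new:       the string is not in the database yet
    known:     the string exists exactly once in the database
    duplicate: the string exists more than once in the database
    """
    counts = Counter(existing_names)
    labels = []
    for nest in (part.strip() for part in nesting.split(",")):
        if counts[nest] == 0:
            label = "new"
        elif counts[nest] == 1:
            label = "known"
        else:
            label = "duplicate"
        labels.append((nest, label))
    return labels


def needs_dialog(labels):
    """A dialog opens for every new and duplicate place entry."""
    return any(label != "known" for _nest, label in labels)
```

With only "Timbuktu" and "Mali" stored, "Timbuktu, Mali" is all knowns and no dialog is needed. Once the Mars crater is stored as a second "Timbuktu", the same input contains a duplicate and the dialog opens.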

EXPECTATIONS

The goal is to open a duplicate place name dialog for user clarification but
to do so as seldom as possible within reason. Here's an example of "within
reason". User inputs "Paris, Precinct 5, Lamar County, Texas, USA". Paris is
an outer duplicate, Precinct 5 is a new place, and the last three nests are
known. It seems pretty obvious to the user which Paris is meant: the one he's
already entered in Lamar County, Texas. And we could write the code to guess
that's what he meant, but the more complicated the code gets, the greater will
be our self-loathing at some point in the future when an even more complicated
bit of code has to be added for an even more convoluted imagined necessity. We
don't like to open R U Sure or anything like it, but entering new places is
done all the time because Treebard should not come pre-loaded with everything
that Google Maps has ever heard of. Portability is important and we have only
a wee snort of contempt for the kind of genealogy that is supposed to be all
done by machine logic without the user having to do any research. The internet
is filling up with bad data invented by smarty-pants software. And we don't
like bloated, unmaintainable code. Part of the reason for Treebard to exist is
that the code should be usable by amateur programmers. So for these reasons and
others, it is my decision at this point (after literally months spent writing
and re-writing the code for the new and duplicate places dialog) that the right
thing to do is to draw the line for opening a dialog somewhat early rather than
somewhat late. Treebard will ask for user clarification slightly more often
than some users would like, in cases where new places are inserted into existing
nestings or nestings which contain duplicate place names.

But I failed to mention the most important reason for not trying to guess which
place out of a set of duplicates is intended: due to circumstances that we can't
always predict, Treebard might guess wrong. Whereas if we give the user a chance
to input the new place correctly, in the long run the user will be happier about
it. We shouldn't try to predict every eventuality in a misguided attempt to make
the user's decisions for him. But it would also be wrong to allow free entry of
just any old misbegotten place name such as mis-spellings and true duplicates.
Somewhere between the two extremes of "do what you want" and "do what I say", a
sane middle-ground is being groped-in-the-dark for as fastidiously as possible.
For example, when entering a new place nested among existing places, the user
will be shown a dialog in which the new place has been assigned an ID and
barring other complex input, all he has to do is press OK. This dialog seems
superfluous depending on your point of view, but from Treebard's stance, the
user should have a chance to review and cancel all new and duplicate place input.
The only time Treebard doesn't open a dialog for new place input is when within
the nesting that's being input, all of the nests are already in the database.

There are three tables in the database re: places:
1) place table stores place_id and places i.e. the nest (the place string)
2) places_places table stores child-parent pairs. A nest can have more than one
parent, and this is a unique feature of Treebard which makes its places data
storage and manipulation historically accurate but relatively complex.
3) finding_places stores nestings. Its reason to exist is to provide autofill
strings for place inputs.
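A rough sketch of the three tables in SQLite follows. The table names come from the notes above, but the column layout (especially for finding_places) is a guess made for illustration and may not match the real schema. The two Parises get different IDs because they are different places.

```python
import sqlite3

# Column names other than place_id and places are assumptions.
SCHEMA = """
CREATE TABLE place (
    place_id INTEGER PRIMARY KEY,
    places TEXT NOT NULL
);
CREATE TABLE places_places (
    child_id INTEGER REFERENCES place (place_id),
    parent_id INTEGER REFERENCES place (place_id),
    UNIQUE (child_id, parent_id)
);
CREATE TABLE finding_places (
    nesting_id INTEGER,
    position INTEGER,
    place_id INTEGER REFERENCES place (place_id),
    PRIMARY KEY (nesting_id, position)
);
"""


def build_demo():
    """Create an in-memory database holding 'Paris, France' and 'Paris, Texas, USA'."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO place VALUES (?, ?)",
        [(1, "Paris"), (2, "France"), (3, "Paris"), (4, "Texas"), (5, "USA")])
    conn.executemany(
        "INSERT INTO places_places VALUES (?, ?)",
        [(1, 2), (3, 4), (4, 5)])
    conn.executemany(
        "INSERT INTO finding_places VALUES (?, ?, ?)",
        [(1, 0, 1), (1, 1, 2), (2, 0, 3), (2, 1, 4), (2, 2, 5)])
    return conn


def autofill_strings(conn):
    """Join each stored nesting back into a display string for autofill."""
    rows = conn.execute("""
        SELECT f.nesting_id, p.places
        FROM finding_places AS f JOIN place AS p USING (place_id)
        ORDER BY f.nesting_id, f.position
    """).fetchall()
    nestings = {}
    for nesting_id, name in rows:
        nestings.setdefault(nesting_id, []).append(name)
    return [", ".join(names) for names in nestings.values()]
```

autofill_strings(build_demo()) returns "Paris, France" and "Paris, Texas, USA" as separate strings.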

Autofill places are key to our design. Allowing only good data up front is
important so user will not have to split and merge places when he finds later
that bad data has been allowed into the database. Ideally, the validation
processes should be invisible to the user almost always, unless he's doing
something really unusual in which case he can expect to see dialogs. Places
are an exception in Treebard because of my experience using other genieware in
which it was super easy to accidentally locate the Eiffel Tower in Paris,
Texas. In very unusual cases, the user will want to cancel the new & duplicate
places dialog and instead input the place nesting manually in the places tab.

Unfortunately, Treebard can't accept place names that contain a comma. Few of
these exist, and if a user wants to enter one, he'll have to change the comma
to a hyphen or something. We more or less have to use commas to delimit nests
within nestings; I don't think there's any way around it, so place names
containing commas won't work.
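Splitting on commas could be as simple as the sketch below. The function names are hypothetical, the example place name with a comma in it is invented, and rejecting empty nests (two commas in a row) is an assumption rather than a documented rule of places.py.

```python
def split_nesting(text):
    """Split a nested place string into its nests, smallest place first."""
    nests = [part.strip() for part in text.split(",")]
    if any(not nest for nest in nests):
        # Assumption: an empty nest (e.g. a double comma) is not accepted.
        raise ValueError(f"empty nest in {text!r}")
    return nests


def replace_commas(place_name):
    """Swap commas inside a single place name for hyphens."""
    return " - ".join(part.strip() for part in place_name.split(","))
```

For example, split_nesting("Paris, Lamar County, Texas, USA") gives four nests, and replace_commas("Smith, Jones Mill") gives "Smith - Jones Mill", which can then be used as one nest.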

We should pre-populate the database only with currently existing countries,
continents and major oceans, and let the user provide the places he actually
needs. The places he enters can be used in all his trees. I once thought there
should be a global places list for places he wants to use in every tree and a
per-tree places list, but I no longer want to deal with this sort of thing. It
shouldn't be a problem because the autofill place list is manipulated to first
show recently used places. So if the user has input Zuneida Resort and uses it
a lot, it will show up before Albuquerque. If a place called "Aababaca" had to
be input to one tree and used only once, it can remain harmlessly in a global
places table because Treebard will show Albuquerque first if Albuquerque has
been used more recently. [EDIT: I don't understand that sentence either 2022-06-26.]
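One way to read "recently used places first" is sketched below. It is a guess at the behavior, not the real autofill code: the function name is made up, and last_used is assumed to map each autofill string to a number that grows each time the place is used.

```python
def autofill_matches(typed, last_used):
    """Return autofill strings starting with what was typed, newest use first.

    last_used: dict of autofill string -> number that is higher for more
    recently used places (a hypothetical use counter or timestamp).
    Ties are broken alphabetically.
    """
    prefix = typed.casefold()
    hits = [text for text in last_used if text.casefold().startswith(prefix)]
    return sorted(hits, key=lambda text: (-last_used[text], text))
```

So if Albuquerque has been used more recently than Aababaca, typing "a" lists Albuquerque first even though Aababaca sorts first alphabetically, and a once-used place just sinks to the bottom of the list.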

The only way to leave out the largest place (e.g. "USA" if user doesn't want to
see it on every nested place string) is for the user to not enter it. Treebard
disapproves so we shouldn't encourage it. Short versions of country names will
be input for USA and USSR only, with their full versions input as a.k.a. All
other place names should be spelled out. If some kinda silly officialistic
country names like "The Redundant Republic of Featherduster" are input as
"Featherduster", it's OK because even if the capital of Featherduster is called
"Featherduster" too, Treebard can tell them apart. So for the most part, the
user has a lot of flexibility.

Last Edit: Jun 26, 2022 0:46:45 GMT -8 by Uncle Buddy

Treebard Genealogy Forum is for suggesting changes in family tree conclusions and software design.