Time to Rewrite the Places Feature


Uncle Buddy
Administrator

Posts: 661

Post by Uncle Buddy on Jun 25, 2022 21:39:35 GMT -8

Adding a place to an existing finding no longer works. The problem might be...[DELETED]

It looks like this is a good time to bring the complex places functionality up to speed with my current understanding of how a database should be structured. I recently combined findings_persons and persons_persons (two many-to-many tables) into the finding table. When I set out to do this, it also seemed like the odd table finding_places also had a one-to-one relationship with finding, so I added that too, and probably the reason that new places can no longer be added is that I didn't finish what I was doing and didn't do enough testing. That was a few months ago and I've spent enough time looking for the reason for the problem. Basically, it seems I rewrote the places functionality around that time, improving some questionable things about it, but not really finishing it.

It's true that the way finding_places was constructed did constitute a one-to-one relationship with a finding. The problem is that it should not have been constructed that way. The way it worked (which I never liked) was that the schema consisted of a primary key, a foreign key for finding_id, and foreign keys for up to nine single places, which would constitute a nested place linked to a finding. What I didn't like was that the values of the multiple nested places could be the same in many rows. As regards cardinality, this bad design did need to be moved to the finding table since a nested place would be repeated. I vaguely recall solving whatever problem this arrangement caused by putting the finding_id in the finding_places table instead of putting the finding_places_id in the finding table. This would have been correct only if the cardinality was like this: each finding can be linked to many nested places and each nested place can be linked to only one finding. Since this was wrong, the nest0 thru nest8 FKs got moved to the finding table (where they are a nuisance by the way).

As for why the places feature is broken (new places don't go into the database anymore--everything else works), I don't think that's worth worrying about. The code needs to be rewritten but first the data structure needs to be fixed. I don't care exactly what's wrong with the code if it needs to be rewritten anyway.

The easiest way to confront this is to first decide, without reference to any code or SQL, as if this were the first time I'd ever thought about it, what the cardinality of each relationship REALLY is: one-to-one, one-to-many, or many-to-many. Not what it has been thought to be in Treebard at any point in Treebard's history, but what it SHOULD be. And then rewrite the places functionality as needed to align with that reality. I don't want to oversimplify the world to fit it into Treebard. I want to figure out the world and make Treebard match that.

OK, relationship with what? Assuming that a nested_places table is needed (not finding_places like before), then it would have a primary key nested_place_id which could be used as a foreign key in another table or ignored. What would you want to link these rows with? Assertions and conclusions, obviously. Since conclusions join to form a finding (a row in the GUI's conclusions table), what first occurs to me is that nested_place_id would be used as a foreign key in the finding table to refer to a place. The assertion table would have a place column too, and the column in both tables should be named nested_place_id.

However, where this link should take place depends not on what first occurs to me, but on the actual cardinality of the relationships. A nested place is no longer a string composed by getting values for an ordered collection of single place ids. It's a single foreign key, it's one thing. So "Denver, Arapahoe County, Colorado" and "Denver, Denver County, Colorado" would be two separate elements even though Denver and Colorado haven't changed much; if Arapahoe County and Denver County have very different boundaries, then they have unique identities. (If it was just a matter of renaming the same county, then it would be one place with two names, but that's a different problem requiring place aliases to be worked into the data structure, which I haven't started yet, fortunately.)

So the linked pairs whose cardinality has to be determined are finding/nested_place and assertion/nested_place.

Each finding can be linked to zero or one nested_place. But each nested_place can be linked to many findings.

Each assertion can be linked to zero or one nested_place. But each nested_place can be linked to many assertions.

So it seems one of the mistakes that has made this feature so confusing is that the many side of the relationship is the nested_place, and due to the long time it took me to learn what cardinality was and how easy it is to figure it out for a relationship, I haven't done it right yet for places. I've done it every which way but right. Probably the way I'm doing it is denormalized or just plain wrong. The finding_id or the assertion_id has to go in the nested_place table. So the schema for that table will include the usual autoincrement primary key; nest0 thru nest8 which are foreign keys from the place table; a finding_id FK column and an assertion_id column. Both the latter can be null, which allows new places to be created--for example in the places tab--without linking them to anything.

While I'm restructuring the places functionality, this is the perfect time to do something else that had to be done anyway, which I was putting off since it would be an interruption of everything else. I have to move the database's place table(s) to a global database so that all the user's places become available to all the user's trees. So the user doesn't have to input the same places anew for each tree he makes.

Also I have to evaluate the usefulness of the database table places_places and whether or not it really needs to exist. It seems like double the effort (or more) to keep track of which two places can have a child-in-parent relationship when the order of the place_ids that will be stored in the new nested_place table records the same information. Possibly the very existence of the places_places table is just a throwback to the days when I thought nested places could be correctly modeled by a simple recursive place table. I would love to do away with the places_places table and the recursive Python code that I invented so that I wouldn't have to figure out how to do a many-to-many recursive query with SQL.

I suspect and/or hope that properly complex nested places (in which a place can be nested inside more than one immediate parent) can be simply modeled in a database with no recursion at all, just as data read straightforwardly out of some simple tables.

I'd already started on the assertions dialog, which was to be the climax to the first chapter of Treebard development, and it still will be, but with the places functionality in this much question, it's too early to work on it. Looks like chapter one of Treebard development will take another year. Good thing I'm not in a hurry. There are already plenty of genealogy applications available that were written in a hurry, without regard for the real relationships that the data have in the world. We don't need another app whose data structure was dumbed down so the app could be written in a hurry. Especially an app like Treebard GPS that purports to show the way for other developers of genealogy software.

Scott Robertson (Professor U. d'Guru)
"If you don't build it, it won't work."

Post by Uncle Buddy on Jun 25, 2022 22:23:47 GMT -8

That's not right either.

The nested_place table has nothing in it except a primary key and the nine place_id foreign key columns nest0 thru nest8.

One finding can be linked to one nested_place. One assertion can be linked to one nested place.

One nested place can be linked to many findings. One nested place can be linked to many assertions. The finding or assertion is the many side. The foreign key in a one-to-many relationship goes in the many side of the relationship. So nested_place_id has to be a foreign key in finding table and assertion table.
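
The corrected arrangement can be sketched with Python's sqlite3 module. This is only a sketch under assumptions: the finding and assertion tables are hypothetical stand-ins stripped down to their keys, nested_place is cut to three nest columns for brevity, and everything sits in one in-memory database so the foreign keys can be enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE place (place_id INTEGER PRIMARY KEY AUTOINCREMENT);
    -- Nothing here but a primary key and the place_id foreign keys.
    CREATE TABLE nested_place (
        nested_place_id INTEGER PRIMARY KEY AUTOINCREMENT,
        nest0 INTEGER NOT NULL REFERENCES place (place_id),
        nest1 INTEGER REFERENCES place (place_id),
        nest2 INTEGER REFERENCES place (place_id));
    -- The many side carries the foreign key; NULL means no place is linked.
    CREATE TABLE finding (
        finding_id INTEGER PRIMARY KEY AUTOINCREMENT,
        nested_place_id INTEGER REFERENCES nested_place (nested_place_id));
    CREATE TABLE assertion (
        assertion_id INTEGER PRIMARY KEY AUTOINCREMENT,
        nested_place_id INTEGER REFERENCES nested_place (nested_place_id));
""")

conn.executemany("INSERT INTO place (place_id) VALUES (?)", [(1,), (2,), (3,)])
conn.execute("INSERT INTO nested_place (nest0, nest1, nest2) VALUES (1, 2, 3)")
# Two findings share one nested place; a third finding has no place at all.
conn.executemany(
    "INSERT INTO finding (nested_place_id) VALUES (?)", [(1,), (1,), (None,)])

count = conn.execute(
    "SELECT COUNT(*) FROM finding WHERE nested_place_id = 1").fetchone()[0]
print(count)
```

A new nesting can be created without touching finding or assertion at all, which covers the case of making places in the places tab before linking them to anything.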

I still hope to stop using the table places_places.

First step is the one I've been putting off the longest: moving places to a global database so the user doesn't have to repeatedly input the same places if he has more than one tree. I haven't done much of this sort of thing. I used to do more of it, but simplified. I can see one problem already. Not really a problem, but deleting a place will not be allowed if any tree is using the place.

One way around this might be to keep the places local to the tree where they're created, but also copy the place to a global database. The place would be gone from the local table as desired, but the user would be informed that the place still exists in the global table since it's being used by a different tree.

Post by Uncle Buddy on Jun 30, 2022 22:50:36 GMT -8

The place table and nested_place table have now been created in the global treebard.db and dropped from the specific family tree database so that the user of Treebard will be able to use the same places over and over from tree to tree. It looks like the simple stuff is gonna work fine.
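
One way a tree could read the global place tables is SQLite's ATTACH DATABASE. The sketch below is an assumption about how that might look, not a description of Treebard's code: two in-memory databases stand in for treebard.db and a single tree file, the tables are cut down to a couple of columns, and `nesting_is_used` is a hypothetical helper. SQLite does not enforce foreign keys across attached databases, so a check like this one would have to be done in code before a global nesting is deleted.

```python
import sqlite3

# The main connection stands in for the global treebard.db.
conn = sqlite3.connect(":memory:")
# A second in-memory database stands in for one family tree file.
conn.execute("ATTACH DATABASE ':memory:' AS tree")
conn.executescript("""
    CREATE TABLE main.nested_place (
        nested_place_id INTEGER PRIMARY KEY, nest0 INTEGER NOT NULL);
    CREATE TABLE tree.finding (
        finding_id INTEGER PRIMARY KEY, nested_place_id INTEGER);
    INSERT INTO main.nested_place VALUES (1, 10), (2, 20);
    INSERT INTO tree.finding VALUES (1, 1), (2, 1);
""")

def nesting_is_used(conn, nested_place_id):
    """Return True if any finding in the attached tree uses the nesting."""
    row = conn.execute(
        "SELECT 1 FROM tree.finding WHERE nested_place_id = ? LIMIT 1",
        (nested_place_id,)).fetchone()
    return row is not None

print(nesting_is_used(conn, 1), nesting_is_used(conn, 2))
```

With more than one tree, the same check would have to be repeated for each tree file before a global delete is allowed.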

I'm now looking at re-defining my goals for Treebard places. I've gotten up to the point where I have to remember what the extra/extraneous table places_places used to do for me. I want to get rid of it, and I'm ready to talk myself into setting my sights lower to make places_places a thing of the past.

Not that the code wouldn't work--at one time it was finished--but the problem is that I want Treebard to do reasonably complex things with places, especially in regards to letting a single nest exist within two or more larger nests. For example, Dallas must be able to nest within either the state of Texas, USA or the Republic of Texas, without being two different Dallases.

However, what happened when I wrote this code is that I also decided the user was not allowed to input any mistaken data, ever.

That attitude worked with dates, where there are only twelve months, 31 days, and some fairly predictable names for years.

But with places, I learned (for example) that I would have to detect more than one kind of possible duplicate, in order to keep the user from inputting duplicate places. I called them "inner dupes" for places like "Maine, Maine, USA" and "outer dupes" for places like "Paris, Texas" and "Paris, France". The resulting barrage of hard thinking and hard work went on for months, long past the time when I still found it interesting to continue with my high-falutin' goal. The only reason I didn't roll back my goals back then (2 years ago?) was that it finally came together and started working just before I reached the point where I no longer cared enough to keep starting over.
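
The two kinds of duplicate can be illustrated with a short sketch. It reflects my reading of the terms only: an inner dupe is a name repeated within one nesting, and an outer dupe is one spelling shared by places with different IDs. The function names and the sample IDs are made up for the example.

```python
from collections import Counter, defaultdict

def inner_dupes(nesting):
    """Return names that occur more than once within a single nesting."""
    names = [name.strip() for name in nesting.split(",")]
    return [name for name, n in Counter(names).items() if n > 1]

def outer_dupes(named_places):
    """Map each spelling shared by different place IDs to those IDs."""
    ids_by_name = defaultdict(set)
    for name, place_id in named_places:
        ids_by_name[name].add(place_id)
    return {name: sorted(ids)
            for name, ids in ids_by_name.items() if len(ids) > 1}

print(inner_dupes("Maine, Maine, USA"))
# Hypothetical (name, place_id) pairs: two different places spelled "Paris".
print(outer_dupes([("Paris", 1), ("Texas", 2), ("Paris", 3), ("France", 4)]))
```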

Now, two years later and in possession of code already broken by apparently unfinished rewrites that took place sometime over the past couple years, I no longer believe this level of validation is worth the trouble. It's not just that I'm burned out and want to get a core of Treebard functioning coherently and cohesively so I can claim that Treebard "works". It's not just that my health declined rapidly while sitting in this chair for the past four years; partly the chair's fault, partly the fault of my age, and partly my fault for sitting in the chair and getting old.

There's also this: the purpose of Treebard is to provide evidence to genealogists that they can write their own apps without getting a PhD in programming. The New and Duplicate Places dialog that I wrote back when... well it was perfect. So naturally if you looked at it cross-eyed, it would break. Any change in anything that touched that code would require a rewrite of a complex data manipulation system bordering on too clever and definitely brittle, waiting for the next stiff breeze to blow it down. Clever code lying in pieces on the ground doesn't look too clever, especially if the person who has to fix it is not the person who wanted it bad enough to devote months of full-time work to creating it.

I can always go back to the places_places table and its ManyManyRecursiveQuery or whatever it was that created all possible nested place strings out of child/parent pairs of nested places and made the user test each one for plausibility. For now I owe it to myself and to Treebard developers of the future to try and turn down the volume on this feature, retaining the ability for each individual place to have more than one enclosing place or parent, while trying to eliminate for the most part the perceived need for the New and Duplicate Places Dialog. In the spirit of Let's Get This Show on the Road, I'm gonna cut loose and try to make something that works easy, even if the user might occasionally be allowed to enter erroneous data. That's what merging and splitting elements are for, right?

I still want it to be easier to input places right than to do it wrong, but the code has to be readable, extendable and maintainable, or it won't get used by anyone, maybe including me.

Last Edit: Jun 30, 2022 22:55:01 GMT -8 by Uncle Buddy

Post by Uncle Buddy on Jul 5, 2022 0:16:30 GMT -8

Starting the place feature over almost from scratch wasn't very hard and the data seemed to be fairly cooperative at first. But due to the usual lack of up-front planning, the devil is in the details. In spite of leaving some of the details behind on this rewrite and focusing more on pragmatically getting the place feature simply-made and simply finished, details get tangled up as soon as the developer starts rushing for the finish line with blinders on.

In order for the autofill to function without losing important features, it needs to keep working from a simple list of values which would include, for example, `[...'Paris, Ile-de-France, France', 'Paris, Lamar County, Texas, USA'...]`. But the simple list that supplies potential hits to the autofill is not enough. It either has to have a parallel list of dicts or it needs to itself be a part of a list of dicts so that the list doesn't exist in isolation from the other information about places. The code shouldn't have to stop what it's doing to go fetch needed information from the database in order to complete its thought. Code like that is impossible to finish, because it wasn't started right. Overly simplistic code is easy to start and nearly impossible to finish, because it suffers from the "just one more thing" flaw, piling on tanglesome appendages as needed instead of starting from a complete design with all its needs thought out in advance and laid out ready to be used.

There should not be a bunch of stray lists floating around. Everything should be put into a master place collection on load and again on making/deleting/editing a place or nesting. In anticipation of the data's needing to actually be used, the master collection is created before the user touches the interface. Elements of a master place collection might include a place name and its ID, a list of nestings that include each individual place, and the position of each place within its associated nestings.
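
One possible shape for such a master collection is sketched below. The dict layout, the `build_master` name, and the sample IDs are all assumptions for illustration; in practice the two input dicts would be filled from the place tables on load.

```python
# Hypothetical data as it might be read from the database on load.
names = {1: "Paris", 2: "Lamar County", 3: "Texas", 4: "USA",
         5: "Paris", 6: "Ile-de-France", 7: "France"}
# nested_place_id -> ordered place IDs, smallest place first.
nestings = {10: [1, 2, 3, 4], 11: [5, 6, 7]}

def build_master(names, nestings):
    """Collect each place's name, its nestings, and its position in each."""
    master = {place_id: {"name": name, "nestings": []}
              for place_id, name in names.items()}
    for nested_place_id, place_ids in nestings.items():
        for position, place_id in enumerate(place_ids):
            master[place_id]["nestings"].append(
                {"nested_place_id": nested_place_id, "position": position})
    return master

master = build_master(names, nestings)
print(master[3])
```

Rebuilding the whole collection after every make/delete/edit keeps the code simple, at the cost of redoing work that an incremental update would avoid.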

A table needs to be made in the database for place name aliases. It's common for every branch of government to have its own way of dividing up land and referencing the divisions. This is probably essentially unnecessary, but controllish types do love to remake the world in their own image. Treebard can't hope to accommodate all the excesses of officialdom in this regard, but this is the right time to stop entirely relegating the place-name-aliases topic to the to-do list. To deal with this needed feature, a place_name table has to be made in treebard.db. The names column in the place table will be removed. One place can be called by many names but one of these names can refer to only one place, which is a one-to-many relationship with names being the many side, so there will be a place_id FK col in a table called place_name which will have only 3 cols: place_name_id, place_id, & place_names. Some queries will have to be upgraded. I hope to include nothing about place name types in the database. The detail mongers can add more info in notes if desired.

Sideline news is that I have to consider re-thinking how places with multiple parents (enclosing places) should be represented. If this were implemented, it would be a lot of work, so this idea should be rejected unless it really makes the data structure for places easier and better. The general idea (and this plows over everything previously said about places) is that each place has one parent in the database. Nothing changes except the data structure. Which is everything, but what I mean to say is that the user will not see the difference. This should either serve Treebard and its developers or it should not be done, because it might be more complicated than the current scheme. Everything that touches places would have to be rewritten, so I'm hoping to see no great advantage to doing this, but the existence of this alternate route should be mentioned anyway.

The hypothetical scheme is this. There's a place table with two columns: place_id1 and place_id2. The names could be done as outlined above, in a place_name table. This would be a self-referencing table. By detecting a null parent (place_id2), a nesting could be built by recursion till the null stops the recursion. The advantage is that the nesting isn't stored. The disadvantage is that the autofill has to run a query since the nestings aren't stored. I've done it this way, long ago, but never tried it with the autofills I use now. It worked, but I gave it up when I realized that each place needs to potentially have multiple parents, and that this multiple-parents ability is not some optional featurette, it's basic to the nature of the real world.
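
For comparison, the single-parent scheme can be walked with a recursive common table expression in SQLite instead of recursive Python code. This is a swapped-in technique, not what I used back then, and the sample rows are invented. The column names place_id1 and place_id2 follow the description above, with the names held in a cut-down place_name table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- place_id2 is the single parent; NULL marks the top of a nesting.
    CREATE TABLE place (
        place_id1 INTEGER PRIMARY KEY,
        place_id2 INTEGER REFERENCES place (place_id1));
    CREATE TABLE place_name (place_id INTEGER, place_names TEXT);
    INSERT INTO place VALUES (4, NULL), (3, 4), (2, 3), (1, 2);
    INSERT INTO place_name VALUES
        (1, 'Paris'), (2, 'Lamar County'), (3, 'Texas'), (4, 'USA');
""")

def build_nesting(conn, place_id):
    """Follow parent links upward until a NULL parent ends the chain."""
    rows = conn.execute("""
        WITH RECURSIVE chain (place_id, parent_id, depth) AS (
            SELECT place_id1, place_id2, 0 FROM place WHERE place_id1 = ?
            UNION ALL
            SELECT p.place_id1, p.place_id2, chain.depth + 1
            FROM place AS p JOIN chain ON p.place_id1 = chain.parent_id
        )
        SELECT place_name.place_names
        FROM chain JOIN place_name ON place_name.place_id = chain.place_id
        ORDER BY chain.depth
    """, (place_id,)).fetchall()
    return ", ".join(name for (name,) in rows)

print(build_nesting(conn, 1))  # Paris, Lamar County, Texas, USA
```

The query has to run for every nesting the autofill wants to offer, which is the disadvantage mentioned above.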

The seed of this new idea is that another way could be used to give multiple parents to places. A place would have a different place_id in order to have an additional parent, and then the two different place_ids would be linked in a junction table. Each place could have many same_places in some sort of foreign key table, to link them together as if they were the same place, which they are.

I don't like this idea but I just wanted to mention it. I like the autofill having a dedicated list of values to choose from instead of having to run a query each time you press a key.

I've been saying for a long time that the Dallas in the Republic of Texas and the Dallas in the state of Texas, USA are the same place so should have the same ID. While it's doubtful that Dallas got up and went someplace new when it became part of the USA, you can see in this map of the Republic of Texas that Dallas' two unique enclosing parents barely resembled each other.

Last Edit: Jul 5, 2022 0:24:40 GMT -8 by Uncle Buddy

Post by Uncle Buddy on Jul 5, 2022 1:57:29 GMT -8

The place_name table in treebard.db ended up with a boolean column `main_place_name` so the GUI will be able to tell which of a place's aliases to display by default. I didn't want to store any place name types in the database, but this was necessary.

As for jurisdiction levels such as "nation", "county", etc., someday a column can be added for free-form text in case any users think that's something they want to try and keep track of for some reason.

sqlite> .schema place
CREATE TABLE IF NOT EXISTS "place" (place_id INTEGER PRIMARY KEY AUTOINCREMENT, latitude TEXT DEFAULT '', longitude TEXT DEFAULT '', cartesian_coordinates TEXT DEFAULT '', township TEXT DEFAULT '', range TEXT DEFAULT '', section TEXT DEFAULT '', legal_subdivision TEXT DEFAULT '', hint TEXT UNIQUE DEFAULT null);

sqlite> .schema place_name
CREATE TABLE place_name (place_name_id INTEGER PRIMARY KEY AUTOINCREMENT, place_names TEXT NOT NULL, place_id INTEGER REFERENCES place (place_id), main_place_name BOOLEAN NOT NULL DEFAULT 0);

sqlite> .schema nested_place
CREATE TABLE nested_place (nested_place_id INTEGER PRIMARY KEY AUTOINCREMENT, nest0 INTEGER NOT NULL DEFAULT 1, nest1 INTEGER DEFAULT NULL, nest2 INTEGER DEFAULT NULL, nest3 INTEGER DEFAULT NULL, nest4 INTEGER DEFAULT NULL, nest5 INTEGER DEFAULT NULL, nest6 INTEGER DEFAULT NULL, nest7 INTEGER DEFAULT NULL, nest8 INTEGER DEFAULT NULL, FOREIGN KEY (nest0) REFERENCES place (place_id), FOREIGN KEY (nest1) REFERENCES place (place_id), FOREIGN KEY (nest2) REFERENCES place (place_id), FOREIGN KEY (nest3) REFERENCES place (place_id), FOREIGN KEY (nest4) REFERENCES place (place_id), FOREIGN KEY (nest5) REFERENCES place (place_id), FOREIGN KEY (nest6) REFERENCES place (place_id), FOREIGN KEY (nest7) REFERENCES place (place_id), FOREIGN KEY (nest8) REFERENCES place (place_id));
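
A minimal sketch of how these three tables might be read together to produce a display string for one nesting. It makes several assumptions: the place table is cut down to its key, the defaults on the nest columns are omitted, the nest columns are taken to be filled left to right with NULLs only at the end, and `nesting_string` is a hypothetical helper.

```python
import sqlite3

NEST_COLS = [f"nest{i}" for i in range(9)]
nest_defs = ", ".join(
    col + " INTEGER REFERENCES place (place_id)" for col in NEST_COLS)

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE place (place_id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE place_name (
        place_name_id INTEGER PRIMARY KEY AUTOINCREMENT,
        place_names TEXT NOT NULL,
        place_id INTEGER REFERENCES place (place_id),
        main_place_name BOOLEAN NOT NULL DEFAULT 0);
    CREATE TABLE nested_place (
        nested_place_id INTEGER PRIMARY KEY AUTOINCREMENT, {nest_defs});
    INSERT INTO place (place_id) VALUES (1), (2), (3);
    INSERT INTO place_name (place_names, place_id, main_place_name) VALUES
        ('Dallas', 1, 1), ('Texas', 2, 1),
        ('USA', 3, 1), ('United States of America', 3, 0);
    INSERT INTO nested_place (nest0, nest1, nest2) VALUES (1, 2, 3);
""")

def nesting_string(conn, nested_place_id):
    """Join the main name of each nest, stopping at the first NULL nest."""
    row = conn.execute(
        f"SELECT {', '.join(NEST_COLS)} FROM nested_place "
        "WHERE nested_place_id = ?", (nested_place_id,)).fetchone()
    names = []
    for place_id in row:
        if place_id is None:
            break
        name = conn.execute(
            "SELECT place_names FROM place_name "
            "WHERE place_id = ? AND main_place_name = 1",
            (place_id,)).fetchone()[0]
        names.append(name)
    return ", ".join(names)

print(nesting_string(conn, 1))  # Dallas, Texas, USA
```

The alias 'United States of America' is stored but ignored here because its main_place_name flag is 0.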

Post by Uncle Buddy on Jul 6, 2022 4:53:20 GMT -8

places master list notes:
place name and its ID, a list of nestings that include each individual place, the nested_place_id for each nesting, and the position of each place within its associated nestings

list of dicts sorted alphabetically (when a nesting is used, it's popped to the front of the list)
each key is a nesting

>>> place_data = [
... {'Eton, Valley, Bay, Stump': {'nested_place_id': 255, 'dupes': [['Eton', 12], ['Stump', 96]], }},
... {'Still, Still, Mud, Bill': {'nested_place_id': 143, 'dupes': [], }},
... {'Chet, Eton, Neat, Rudd, Stump': {'nested_place_id': 635, 'dupes': [['Eton', 12], ['Stump', 96]], }}
... ]

>>> autofill_place_values = [key for dkt in place_data for key in dkt]

>>> autofill_place_values
['Eton, Valley, Bay, Stump', 'Still, Still, Mud, Bill', 'Chet, Eton, Neat, Rudd, Stump']
>>> for idx, item in enumerate(autofill_place_values):
...     if item == 'Chet, Eton, Neat, Rudd, Stump':
...         nest_id = place_data[idx][item]['nested_place_id']
...         dupes = place_data[idx][item]['dupes']
...
>>> nest_id
635
>>> dupes
[['Eton', 12], ['Stump', 96]]

In the code above, which is copied from the Python console:

`place_data` is a master list of all places. Each place is a one-key dict whose key is a string representing a unique nested place name. Since the string is expected to be unique (and has to be), 1) it could be used as a primary key, though I prefer to use an autoincrement integer generated by SQLite; 2) it can be used as a dict key; 3) in the freak edge case wherein a user finds two different places named 'Biddlewell Swamps Amusement Park, Stagman City, Precinct 16, Gator County, Florida, USA', the user would have to tweak the spelling on one of them or the two places' details would overwrite each other, acting like one place. I apologize in advance for the inconvenience.

`autofill_place_values` is a simple list of nested place strings whose only purpose is to provide values for the autofill place input fields to choose from. The user can also input new place strings, or place strings comprised of partly new and partly existing places, or place strings that include nests which are spelled the same as existing places but are different places. (There are dozens of places in the world named "Paris".) The autofill values list is separate from the master list in the sense that the autofill process doesn't have to read the other, irrelevant parts of the master list. But it works like a parallel list in that its contents are pulled directly from the master list, so everything is in the same order, and the index where the right nest is found by the autofill can be used by the rest of the place procedure to access the corresponding contents of the master list.

If anything is added/deleted/edited in the database place tables, the master list and the autofill list will be recreated immediately. If the user inputs a place using an autofill, the autofill list will also be recreated immediately, because the master list gets the most recently used place popped out and moved to the front if it's not already there, so the autofill selects the most recently used strings before moving through the rest of the strings alphabetically. This speeds up input on documents like census forms where the same place string will be input over and over. The end result of this feature is that a long place string will often fill in completely after the user types one character.

Each place or `dkt` is a nested dict (that's a different "nested"; no relation to nested places a.k.a "nestings"), i.e. the key's value is an inner dict comprised of needed information such as the nested_place_id and the nests e.g. Eton and Stump which correspond to same-spelled but different places (single nests) in the database. These places will need special treatment in the duplicate places dialog, which will open only in case of newly input places, not if an autofill actually fills in and is accepted. If the database only has "Paris, Indiana, USA" and "Paris, Kentucky, USA" but the user tried to enter "Paris, Oregon, USA", the duplicate place dialog would open for more user input.
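
The decision about which nests need the user's attention could be sketched as follows. This is a guess at the logic, not the finished code: the function name is hypothetical, the comparison is by spelling only, and it answers only the narrow question of which typed nests match the spelling of a place already on file.

```python
def names_needing_review(typed, existing_nestings):
    """Return typed nests spelled like places already on file.

    An exact match with an existing nesting means the autofill value was
    accepted as-is, so nothing needs review.
    """
    if typed in existing_nestings:
        return []
    known = {name.strip()
             for nesting in existing_nestings
             for name in nesting.split(",")}
    return [name.strip() for name in typed.split(",")
            if name.strip() in known]

existing = ["Paris, Indiana, USA", "Paris, Kentucky, USA"]
print(names_needing_review("Paris, Oregon, USA", existing))
print(names_needing_review("Paris, Indiana, USA", existing))
```

For "Paris, Oregon, USA" both "Paris" and "USA" come back, since spelling alone can't tell whether the user means an existing place or a new one with the same name. That question is what the dialog is for.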

This planning process is either very fun or very sloggish. In this case, I've already designed and re-designed the places feature enough times that the process is sloggish, but I have to slog through it, otherwise there are more wasted days writing half-imagined features that stall, revealing themselves to be half-formed only when the finish line is about to come into sight.

Last Edit: Jul 7, 2022 18:55:45 GMT -8 by Uncle Buddy

Post by Uncle Buddy on Jul 10, 2022 23:10:02 GMT -8

The places rewrite is coming along nicely. The master list of dicts turned out to be simpler than outlined above. I've also rewritten the means by which the autofill values are prepended with the most recently-used nestings. For example, on opening my copy of Treebard, the places starting with a "c" are in this order:

California...
Canada...
Cheyenne...
Colorado...

But if an autofill value "Cheyenne" is accepted, the list changes to...

Cheyenne...
California...
Canada...
Colorado...

...which means the user will only have to type a "c" and Cheyenne will fill in.

Now let's say that an autofill value "Canada" is accepted, and after that "Colorado" is accepted in a place field. Next time the user types a "c" in a place field, Colorado will fill in. But if he types "ca", Canada will fill in, instead of California.
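
The behavior described above can be reproduced with a move-to-front list and a first-match prefix search. This is a simplified sketch under assumptions, not Treebard's actual autofill code: `accept` and `autofill` are hypothetical names, and the place strings are shortened to one word each.

```python
def accept(values, value):
    """Return a new list with the accepted value moved to the front."""
    return [value] + [v for v in values if v != value]

def autofill(values, typed):
    """Return the first value that starts with the typed text, or None."""
    typed = typed.lower()
    for value in values:
        if value.lower().startswith(typed):
            return value
    return None

values = ["California", "Canada", "Cheyenne", "Colorado"]
values = accept(values, "Cheyenne")
print(autofill(values, "c"))   # Cheyenne

values = accept(values, "Canada")
values = accept(values, "Colorado")
print(autofill(values, "c"))   # Colorado
print(autofill(values, "ca"))  # Canada
```

Because the search stops at the first hit, the order of the list alone decides which string fills in, so no separate record of usage times is needed.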

Here's an example of Treebard's place limitation, which is that each nesting such as "Paris, Kenosha County, Wisconsin, USA" can only exist once.

Paris, Wisconsin
From Wikipedia, the free encyclopedia

Paris is the name of some places in the U.S. state of Wisconsin:
Paris, Grant County, Wisconsin, a town
Paris, Kenosha County, Wisconsin, a town
Paris (community), Wisconsin, an unincorporated community in Kenosha County

In this case, since there are two places called "Paris, Kenosha County, Wisconsin, USA", the spelling on one of them would have to be tweaked somehow. For example, "Paris, Kenosha County, Wisconsin, USA" and "Paris (village), Kenosha County, Wisconsin, USA".

Also in this example, it would be up to the genealogist whether to even differentiate between the two places since the village is within the boundaries of the town by the same name. In fact, now that I think of it, the two could be accurately represented like this...

"Paris, Kenosha County, Wisconsin, USA" and "Paris Village, Paris, Kenosha County, Wisconsin, USA"

...or even like this:

"Paris, Kenosha County, Wisconsin, USA" and "Paris, Paris, Kenosha County, Wisconsin, USA".

The latter representation of the nesting is 100% accurate and includes no compromising to satisfy Treebard's limitation. So this has been an example of how, if the user really thinks about it and/or does his research for alternative namings, the limitation isn't bad enough to warrant a lot of scary coding.

Treebard Genealogy Software

Treebard Genealogy Software Blog & Forum: setting the record straight since 2020

How genealogy software should work

Treebard Genealogy Forum is for suggesting changes in family tree conclusions and software design.