gedcom_import.py
Post by Uncle Buddy on Dec 25, 2023 23:47:18 GMT -8
<drive>:\treebard\app\python\gedcom_import.py Last Changed 2024-03-21
# gedcom_import.py
import tkinter as tk
import sqlite3
from datetime import datetime
from files import app_path, current_drive, appwide_db_path
from opening import close_tree
import gedcom_constants as gcc
from dates import format_stored_date, get_date_formats
from widgets import Scrollbar, make_formats_dict, configall, ScrolledDialog
import dev_tools as dt
from dev_tools import look, seeline
""" Treebard Genealogy Software and its associated modules are demonstrations that a genealogist
who is a novice, self-taught programming hobbyist can write most or all of a
genealogy app if he's not in too much of a hurry. My projects are meant to be
a showcase of functionalities and a source for ideas and inspiration. I plan
to use this genieware to do my daily genealogy work, but that does not mean
it pretends to be a complete application, or backwards compatible, etc. This
is primarily an educational project for wanna-be genieware authors and the
curious. All my genealogy projects are in the public domain.
We support GEDCOM v. 5.5.1 in a whimsical way to match the whimsy level of the
GEDCOM design. This module is a base for a demo model of a GEDCOM
import program. It tries to do most of the important stuff but could be more
complete, and could be rewritten in another program or with more efficient
Python code at least, if very large files are to be imported.
In my intractable state of chronic whimsy, I also hope, pray, and semi-intend
to pseudo-support GEDCOM v. 5.5 with this self-same code. I do not encourage
other creators of GEDCOM import programs to do things this way, but all of
my trees were made in Genbox 3.7.1 or Family Historian 6, neither of which
used the standard version of their time which was GEDCOM 5.5.1. Since Treebard
is a showcase of functionalities and not an app for doing the daily work of
genealogy, I can afford to have a little fun.
The `Elucidom` class should be usable (and you have my permission to use it,
or adapt it, for anything you want), as a first step in any GEDCOM import
program. This was the golden elixir for me. It made everything that followed
it substantially easier to do, less crazy-making and more likely to succeed.
To improve this GEDCOM import program:
--Do something with the HEAD record. I've never needed it much--or never
needed it at all, since I can visually look at it and see what version
of GEDCOM it uses--therefore I just remove it so it won't get in the way.
See the 5.5.5 specs for a focus on the HEAD record, especially character
sets. Having failed to research character sets adequately due to never
needing the information, I can only guess that line-ending character
conventions--which differ among Windows, MacOS, and Linux--need to be
handled, for one thing. Treebard is a Windows program.
--Do something like `id_shift` for types so, for example, if the .ged includes
an unknown event type, the event type ID can be made for it without
overwriting existing event types. This is a priority, since tree sharing
depends on it, as well as Treebard's policy of allowing users to create
their own event types, role types, name types, place types, etc.
--Handle some of the yuck tags, edge cases, and a few custom tags which I've
ignored, avoided, or overlooked. See functions referenced in
`self.do_function` which just `pass` instead of running code. Some of the
docstrings might still explain why the tag is being passed over if I
forgot to censor them. See also `gedcom_constants.IGNORE` and
`gedcom_constants.EXCEPTION_MENU_ITEMS`.
"""
submitter = {}
coordinates = {}
persons = {}
names = {}
couples = {}
places = {}
nested_places = {}
events = {}
sources = {}
media = {}
repositories = {}
transcriptions = {}
roles = {}
notes = {}
notes_links = {}
media_links = {}
roles_links = {}
repositories_links = {}
exceptions = {}
opened_dialogs = {}
date_prefs = get_date_formats()
def get_existing_ids_from_db(cur):
""" Enable the imported data to be added to existing data already in
UNIGEDS. This basic version is needed since a few UNIGEDS tables have
default values in a new or "blank" database, including person #1,
        place #1, place_name #1, and nested_place #1. The model,
        `default_new_tree.db`, is truly blank; the default values and all the
        types are added during the `make_tree()` process.
Something like this has to be done with types also, since users are
able to create new types (except that marital and couple events can't
be added by GEDCOM unless a user interface is also added so the user
can say which events are generic, which are couple, and which are
marital). That will be part of the tree-sharing feature but it will be
more complex since the built-in types that come with UNIGEDS can be
hidden but can't be overwritten or deleted.
        Re: `max_id + 1`: We can't use 0 because SQL primary keys start at 1,
        and we can't use 1 here because GEDCOM allows an identifier to be 0.
        Adding 1 to the logical minimum therefore yields a starting ID of at
        least 2, ensuring no clash with our default person #1 and default
        place IDs of 1.
"""
    existing_max_ids = {}
    for (id_col, table) in (
            ("person_id", "person"), ("note_id", "note"), ("name_id", "name"),
            ("repository_id", "repository"), ("source_id", "source"),
            ("media_id", "media"), ("couple_id", "couple"),
            ("place_id", "place"), ("place_name_id", "place_name"),
            ("nested_place_id", "nested_place"), ("citation_id", "citation"),
            ("assertion_id", "assertion"), ("event_id", "event"),
            ("transcription_id", "transcription"), ("contact_id", "contact")):
        cur.execute(f"SELECT MAX({id_col}) FROM {table}")
        # MAX() returns None on an empty table; normalize to 0 so callers
        # can safely compute max_id + 1.
        result = cur.fetchone()[0]
        existing_max_ids[table] = result if result is not None else 0
    return existing_max_ids
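To illustrate the `max_id + 1` convention from the docstring in isolation (hypothetical helper, not part of the module):

```python
def next_available_id(max_id):
    # SELECT MAX(id) returns None on an empty table; treat that as the
    # logical minimum of 1 so the first imported ID is at least 2,
    # clear of the default person #1 and place #1 rows.
    return (max_id or 1) + 1
```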
def get_all_relationship_types():
conn = sqlite3.connect(appwide_db_path)
cur = conn.cursor()
cur.execute('''
SELECT relationship_type_id, abbrev_rel_types
FROM kin_type
WHERE hidden = 0''')
all_relationship_types = cur.fetchall()
cur.close()
conn.close()
return all_relationship_types
def get_all_role_types():
conn = sqlite3.connect(appwide_db_path)
cur = conn.cursor()
cur.execute('''
SELECT role_type_id, role_types
FROM role_type
WHERE hidden = 0''')
all_role_types = cur.fetchall()
cur.close()
conn.close()
return all_role_types
def get_couple_events(target_unigeds_db):
conn = sqlite3.connect(target_unigeds_db)
cur = conn.cursor()
cur.execute("SELECT event_types FROM event_type WHERE couple = 1")
COUPLE_EVENTS = [i[0] for i in cur.fetchall()]
cur.close()
conn.close()
return COUPLE_EVENTS
def get_max_ids():
    data = {"persons": persons, "sources": sources, "notes": notes,
            "media": media, "couples": couples,
            "repositories": repositories}
    # max() of the dict keys is the highest used ID; an empty dict means 0.
    return {k: (max(v) if v else 0) for k, v in data.items()}
class GEDCOMImport:
def __init__(
self, treebard, elucidom_output_path, target_unigeds_db,
source_gedcom_file, exceptions_log):
self.treebard = treebard
self.elucidom_output_path = elucidom_output_path
self.target_unigeds_db = target_unigeds_db
self.source_gedcom_file = source_gedcom_file
self.exceptions_log = exceptions_log
self.conn = sqlite3.connect(appwide_db_path)
self.conn.execute("PRAGMA foreign_keys = 1")
self.cur = self.conn.cursor()
self.cur.execute("ATTACH ? as tree", (self.target_unigeds_db,))
        self.existing_max_ids = get_existing_ids_from_db(self.cur)
self.event_type = None
self.current_smallest_place_id = None
self.current_media_id = None
self.name_id_shifted = False
self.types = {
"name_types": {}, "event_types": {}, "role_types": {},
"source_types": {}, "locator_types": {}, "media_types": {},
"relationship_types": {}, "repository_types": {},
"place_types": {}, "transcription_types": {}}
self.gedcom_places = set()
self.concatenating = False
self.base = []
self.anchor = 0
self.concatenations = {}
self.ready_for_db = 0
self.FAMC_to_INDI_links = []
self.do_function = {
"SOUR_pointer": self.do_source_pointer,
"NOTE_pointer": self.do_note_pointer,
"FAM_pointer_child_event": self.do_offspring_event,
"FAM_pointer_child_eventless": self.do_offspring_no_event,
"FAM_pointer_spouse": self.do_fams,
"INDI_pointer_husband": self.do_partner,
"INDI_pointer_wife": self.do_partner,
"INDI_pointer_child": self.do_chil,
"OBJE_pointer": self.do_media_pointer,
"OBJE_minor": self.do_media_minor,
"REPO_pointer": self.do_repository_pointer,
"PLAC": self.do_place,
"LATI": self.do_coordinates,
"LONG": self.do_coordinates,
"PAGE": self.do_citation,
"TEXT_citation": self.do_assertion,
"CONC": self.do_conc_note,
"NOTE_minor": self.do_note_minor,
"DATE_event": self.do_event_date,
"EVEN": self.do_event_unknown,
"SEX": self.do_gender,
"AGE_event": self.do_age,
"AGE_event_husband": self.do_partner_age,
"AGE_event_wife": self.do_partner_age,
"TYPE_event": self.do_event_type,
"TYPE_event_detail": self.do_event_type_detail,
"NAME_person": self.do_person_name,
"TYPE_name": self.do_name_type,
"IDNO": self.do_person_name,
"SSN": self.do_person_name,
"TITL_name_type": self.do_person_name,
"NICK": self.do_person_name,
"FORM_media": self.do_media_form,
"FILE": self.do_media_file,
"TITL_media": self.do_media_title,
"TYPE_media": self.do_media_type,
"MEDI": self.do_media_type,
"PUBL": self.do_source_publication,
"AUTH": self.do_source_author,
"TITL_source": self.do_source_title,
"NAME_reposit": self.do_repository_name,
"ROMN": self.do_transcription,
"FONE": self.do_transcription,
"TYPE_transcription": self.do_transcription_type,
"LANG": self.do_submitter_language,
"NAME_submitter": self.do_submitter_name,
"PEDI": self.do_pedi,
"ADOP_by_whom": self.do_adopter,
"CALN": self.do_caln,
"SOUR_minor": self.do_source_minor}
self.do_primary = {
"SUBM_major": self.do_submitter,
"INDI_major": self.do_person,
"FAM_major": self.do_couple,
"NOTE_major": self.do_note,
"OBJE_major": self.do_media,
"SOUR_major": self.do_source,
"REPO_major": self.do_repository}
self.supers = dict(gcc.superior)
self.process()
self.cur.execute("DETACH tree")
self.cur.close()
self.conn.close()
def process(self):
Elucidom(self.source_gedcom_file, self.elucidom_output_path)
with open(self.elucidom_output_path, mode="r", encoding="utf-8-sig") as elucidom:
for ln in elucidom:
if " PLAC " in ln:
line = ln.replace("\n", "")
nesting = line.split(" PLAC ")[1]
self.save_gedcom_place(nesting)
UpgradeGEDCOMPlaces(self.treebard, self.source_gedcom_file, self)
def process2(self):
self.do_concat()
self.save_concat()
self.do_subrecords()
self.read3()
# Auto-create a birth event for every person in the tree.
self.make_missing_birth_events()
ExceptionsReport(self.treebard, self.source_gedcom_file, self)
def make_missing_birth_events(self):
""" Create birth events for anyone who doesn't have one. Link couples
to birth events where the only source for the couple ID was an
INDI.FAMC tag.
"""
        def inner_loop(person_id):
            # Return the couple ID linked to this person via INDI.FAMC,
            # or None if no link was recorded.
            for p_id, c_id in self.FAMC_to_INDI_links:
                if p_id == person_id:
                    return c_id
            return None
birth_event_type_id = self.types["event_types"]["birth"]
existing_births = []
for event_id, dkt in events.items():
if dkt["EVNT_TYPE_FK"] == birth_event_type_id:
if dkt.get("CUPL_FK") is None:
dkt["CUPL_FK"] = None
existing_births.append((dkt["PRSN_FK"], dkt["CUPL_FK"]))
        # Person IDs that already have a birth event, paired with the couple
        # IDs (possibly None) attached to those births:
        existing_events = [i[0] for i in existing_births]
        existing_parents = [i[1] for i in existing_births]
for person_id in persons:
if person_id not in existing_events:
self.max_event_id += 1
self.current_event_id = self.max_event_id
events[self.current_event_id] = {}
events[self.current_event_id]["PRSN_FK"] = person_id
events[self.current_event_id]["EVNT_TYPE_FK"] = birth_event_type_id
events[self.current_event_id]["EVNT_AGE"] = "0"
events[self.current_event_id]["CUPL_FK"] = None
else:
for idx, pers_id in enumerate(existing_events):
if pers_id == person_id:
for event_id, dkt in events.items():
if (dkt["PRSN_FK"] == person_id and
dkt["EVNT_TYPE_FK"] == birth_event_type_id):
birth_event_id = event_id
break
if events[birth_event_id]["CUPL_FK"] is None:
couple_id = inner_loop(person_id)
dkt["CUPL_FK"] = couple_id
break
def insert_to_db(self):
""" Run this from the OK button on `UpdateGEDCOMPlaces`.
If person #1 is not used in the GEDCOM, than name #1 can't be used
either since Treebard's default person #1 is already using it, so
name IDs will be shifted + 1.
"""
self.add_coords_to_places()
self.treebard.new_family_tree.withdraw()
self.print_stuff()
if min(persons) != 1:
self.name_id_shifted = True
conn = sqlite3.connect(appwide_db_path)
conn.execute("PRAGMA foreign_keys = 1")
cur = conn.cursor()
cur.execute("ATTACH ? as tree", (self.target_unigeds_db,))
self.insert_persons(conn, cur)
self.insert_couples(conn, cur)
self.insert_places(conn, cur)
self.insert_nested_places(conn, cur)
self.insert_events(conn, cur)
self.insert_sources(conn, cur)
self.insert_media(conn, cur)
self.insert_repositories(conn, cur)
self.insert_transcriptions(conn, cur)
self.insert_submitter(conn, cur)
self.insert_notes(conn, cur)
self.insert_notes_links(conn, cur)
self.insert_media_links(conn, cur)
self.insert_repositories_links(conn, cur)
self.insert_roles_links(conn, cur)
cur.execute("DETACH tree")
cur.close()
conn.close()
# The user has to re-open the tree to redraw the GUI.
close_tree(self.treebard.new_family_tree, self.treebard.new_family_tree_id)
self.treebard.deiconify()
def insert_persons(self, conn, cur):
for key, values in persons.items():
            # A person record may exist without a SEX tag; default the
            # gender rather than risk a KeyError on a partial record.
            gender = values.get("PRSN_GNDR", "unknown")
person_id = key
if person_id == 1:
cur.execute(gcc.update_person_1, (gender,))
else:
cur.execute(gcc.insert_new_person, (person_id, gender))
conn.commit()
for name_id, dkt in names.items():
if dkt["PRSN_FK"] == key:
self.insert_name(conn, cur, name_id, dkt, person_id)
def insert_name(self, conn, cur, name_id, dkt, person_id):
if dkt.get("NAME_TYPE_FK") is None:
name_type_id = 1
else:
name_type_id = dkt["NAME_TYPE_FK"]
if person_id == 1 and name_id == 1:
cur.execute(
gcc.update_name_1_person_1,
(dkt["NAME_STRG"], name_type_id, dkt["NAME_SORT"]))
elif self.name_id_shifted is False:
cur.execute(
gcc.insert_new_name,
(name_id, person_id, dkt["NAME_STRG"],
dkt["NAME_SORT"], name_type_id))
elif self.name_id_shifted:
name_id += 1
cur.execute(
gcc.insert_new_name,
(name_id, person_id, dkt["NAME_STRG"],
dkt["NAME_SORT"], name_type_id))
conn.commit()
def insert_couples(self, conn, cur):
for key, dkt in couples.items():
couple_id = key
person_id1 = None
person_id2 = None
if len(dkt) != 0:
if dkt.get("PRSN1_FK"):
person_id1 = dkt["PRSN1_FK"]
if dkt.get("PRSN2_FK"):
person_id2 = dkt["PRSN2_FK"]
cur.execute(gcc.insert_new_couple, (couple_id, person_id1, person_id2))
conn.commit()
def insert_places(self, conn, cur):
for idnum, dkt in places.items():
latitude = ""
longitude = ""
place_id = idnum
if dkt.get("LATI"):
latitude = dkt["LATI"]
if dkt.get("LONG"):
longitude = dkt["LONG"]
cur.execute(gcc.insert_new_place, (place_id, latitude, longitude))
conn.commit()
for tup in dkt["aliases"]:
self.insert_place_name(conn, cur, tup, place_id)
def insert_place_name(self, conn, cur, tup, place_id):
idnum, place_name = tup
place_name_id = idnum
cur.execute(gcc.insert_new_place_name, (place_name_id, place_name, place_id))
conn.commit()
def insert_nested_places(self, conn, cur):
for idnum, tup in nested_places.items():
nested_place_id = idnum
            # tup[1] holds the chain of place IDs for this nesting.
            values = (nested_place_id, *tup[1])
cur.execute(gcc.insert_new_nested_place, values)
conn.commit()
def insert_events(self, conn, cur):
for idnum, dkt in events.items():
person_id = None
couple_id = None
date = "-0000-00-00-------"
date_sorter = "0,0,0"
nested_place_id = 1
particulars = ""
age = ""
age1 = ""
age2 = ""
event_id = idnum
event_type_id = dkt["EVNT_TYPE_FK"]
if event_id != 1:
cur.execute(gcc.insert_new_event, (event_id, event_type_id))
conn.commit()
if dkt.get("PRSN_FK"):
person_id = dkt["PRSN_FK"]
if dkt.get("CUPL_FK"):
couple_id = dkt["CUPL_FK"]
if dkt.get("EVNT_DATE"):
date = dkt["EVNT_DATE"]
date_sorter = dkt["EVNT_DATE_SORT"]
if dkt.get("PLACE_NEST_FK"):
nested_place_id = dkt["PLACE_NEST_FK"]
if dkt.get("EVNT_DETL"):
particulars = dkt["EVNT_DETL"]
if dkt.get("EVNT_AGE"):
age = dkt["EVNT_AGE"]
if dkt.get("EVNT_AGE1"):
age1 = dkt["EVNT_AGE1"]
if dkt.get("EVNT_AGE2"):
age2 = dkt["EVNT_AGE2"]
cur.execute(
gcc.update_event,
(person_id, couple_id, date, date_sorter, nested_place_id,
particulars, age, age1, age2, event_id))
conn.commit()
def insert_sources(self, conn, cur):
self.fix_duplicate_source_titles()
for idnum_src, sorc_dkt in sources.items():
author = ""
description = ""
source_id = idnum_src
source_title = sorc_dkt["SORC_TITL"]
if sorc_dkt.get("SORC_ATHR"):
author = sorc_dkt["SORC_ATHR"]
if sorc_dkt.get("SORC_PUBN"):
description = sorc_dkt["SORC_PUBN"]
cur.execute(gcc.insert_new_source, (source_id, source_title, author, description))
conn.commit()
self.insert_citation(conn, cur, source_id, sorc_dkt)
def fix_duplicate_source_titles(self):
""" Treebard uses autofill widgets for sources, so disallows
duplicate source titles. Eliminate duplicate source titles by
counting duplicates and appending an index.
"""
dupes = {}
for idnum, dkt in sources.items():
if dkt.get("SORC_TITL"):
source_title = dkt["SORC_TITL"]
if dupes.get(source_title) is None:
dupes[source_title] = [idnum]
else:
dupes[source_title].append(idnum)
for dikt, lst in dupes.items():
if len(lst) > 1:
for idx, num in enumerate(lst):
original = sources[num]["SORC_TITL"]
sources[num]["SORC_TITL"] = f"{original} ({str(idx+1)})"
def insert_citation(self, conn, cur, source_id, sorc_dkt):
if sorc_dkt.get("citations") is None:
return
for idnum_cttn, cttn_dkt in sorc_dkt["citations"].items():
citation = ""
citation_id = idnum_cttn
citation = cttn_dkt["CTTN_STRG"]
cur.execute(gcc.insert_new_citation, (citation_id, source_id, citation))
conn.commit()
self.insert_assertion(conn, cur, citation_id, cttn_dkt)
def insert_assertion(self, conn, cur, citation_id, cttn_dkt):
if len(cttn_dkt["assertions"]) == 0:
return
        for idnum_asrtn, asrtn_dkt in cttn_dkt["assertions"].items():
            event_id = None
            name_id = None
            particulars = ""
            dates = ""
            # Named to avoid shadowing the module-level `names` dict.
            names_text = ""
            assertion_id = idnum_asrtn
            if asrtn_dkt.get("EVNT_FK"):
                event_id = asrtn_dkt["EVNT_FK"]
            assertion_type = asrtn_dkt["ASRTN_TYPE"]
            if assertion_type == "particulars":
                particulars = asrtn_dkt["ASRTN_STRG"]
            elif assertion_type == "dates":
                dates = asrtn_dkt["ASRTN_STRG"]
            elif asrtn_dkt.get("NAME_FK"):
                name_id = asrtn_dkt["NAME_FK"]
                if self.name_id_shifted:
                    name_id += 1
                names_text = asrtn_dkt["ASRTN_STRG"]
            cur.execute(
                gcc.insert_new_assertion,
                (assertion_id, citation_id, event_id, name_id, dates,
                    particulars, names_text))
            conn.commit()
def insert_media(self, conn, cur):
for idnum, dkt in media.items():
file = ""
title = ""
extension = ""
media_type_id = None
if dkt.get("MDIA_TYPE_FK"):
media_type_id = dkt["MDIA_TYPE_FK"]
if dkt.get("MDIA_FILE"):
file = dkt["MDIA_FILE"]
if dkt.get("MDIA_EXTN"):
extension = f"{dkt['MDIA_EXTN'].lower()}"
if len(file) != 0 and file.lower().endswith(extension) is False:
file = f"{file}.{extension}".replace("..", ".")
if dkt.get("MDIA_TITL"):
title = dkt["MDIA_TITL"]
media_id = idnum
cur.execute(
gcc.insert_new_media, (media_id, media_type_id, file, title))
conn.commit()
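The filename/extension stitching in `insert_media` reduces to this standalone sketch (hypothetical helper name):

```python
def ensure_extension(file, extension):
    # Append the extension when the filename lacks it; the ".." replace
    # covers filenames that already end with a dot.
    if file and extension and not file.lower().endswith(extension):
        file = f"{file}.{extension}".replace("..", ".")
    return file
```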
def insert_repositories(self, conn, cur):
for idnum, dkt in repositories.items():
repository = dkt["RPST_NAME_STRG"]
repository_id = idnum
cur.execute(gcc.insert_new_repository, (repository_id, repository))
conn.commit()
def insert_transcriptions(self, conn, cur):
for idnum, dkt in transcriptions.items():
transcription_type_id = None
name_id = None
place_name_id = None
if dkt.get("TRNSCRPN_TYPE_FK"):
transcription_type_id = dkt["TRNSCRPN_TYPE_FK"]
if dkt.get("NAME_FK"):
# name_id = dkt["NAME_FK"] + 1
name_id = dkt["NAME_FK"]
if self.name_id_shifted:
name_id += 1
elif dkt.get("PLACE_NAME_FK"):
place_name_id = dkt["PLACE_NAME_FK"]
transcription_id = idnum
transcription = dkt["TRNSCRPN_STRG"]
cur.execute(
gcc.insert_new_transcription,
(transcription_id, transcription, transcription_type_id,
name_id, place_name_id))
conn.commit()
def insert_submitter(self, conn, cur):
for idnum, dkt in submitter.items():
contact = ""
detail = "GEDCOM submitter"
language = ""
submitted = self.source_gedcom_file
contact_id = idnum
if dkt.get("name"):
contact = dkt["name"]
if dkt.get("language"):
language = dkt["language"]
cur.execute(
gcc.insert_gedcom_submitter,
(contact_id, contact, detail, language, submitted))
conn.commit()
def insert_notes(self, conn, cur):
for idnum, dkt in notes.items():
note = dkt["NOTE_STRG"]
note_id = idnum
cur.execute(gcc.insert_new_note, (note_id, note))
conn.commit()
def insert_notes_links(self, conn, cur):
for dkt in notes_links.values():
note_id = dkt["NOTE_FK"]
for tag in gcc.NOTE_LINKS:
if dkt.get(tag):
fk_col = gcc.NOTE_LINKS[tag]
element_id = dkt[tag]
note_topic = dkt["NOTE_TOPC"]
query = (f"INSERT INTO notes_links (note_id, {fk_col}, "
f"note_topic, note_topic_order) VALUES (?, ?, ?, ?)")
cur.execute(query, (note_id, element_id, note_topic, 1))
conn.commit()
break
def insert_repositories_links(self, conn, cur):
for dkt in repositories_links.values():
repository_id = dkt["RPST_FK"]
repository_type_id = None
source_id = None
citation_id = None
locator_id = None
media_id = None
if dkt.get("RPST_TYPE_FK"):
repository_type_id = dkt["RPST_TYPE_FK"]
if dkt.get("SORC_FK"):
source_id = dkt["SORC_FK"]
if dkt.get("CTTN_FK"):
citation_id = dkt["CTTN_FK"]
if dkt.get("LCTR_FK"):
locator_id = dkt["LCTR_FK"]
if dkt.get("MDIA_FK"):
media_id = dkt["MDIA_FK"]
query = ("INSERT INTO repositories_links (repository_id, "
"repository_type_id, source_id, citation_id, locator_id, "
"media_id) VALUES (?, ?, ?, ?, ?, ?)")
cur.execute(
query,
(repository_id, repository_type_id, source_id, citation_id,
locator_id, media_id))
conn.commit()
def insert_media_links(self, conn, cur):
for dkt in media_links.values():
media_id = dkt["MDIA_FK"]
for tag in gcc.MEDIA_LINKS:
if dkt.get(tag):
fk_col = gcc.MEDIA_LINKS[tag]
element_id = dkt[tag]
query = f"INSERT INTO media_links (media_id, {fk_col}) VALUES (?, ?)"
cur.execute(query, (media_id, element_id,))
conn.commit()
break
def insert_roles_links(self, conn, cur):
for dkt in roles_links.values():
role_type_id = dkt["ROLE_TYPE_FK"]
person_id = dkt["PRSN_FK"]
event_id = dkt["EVNT_FK"]
query = ("INSERT INTO roles_links (role_type_id, "
"person_id, event_id) VALUES (?, ?, ?)")
cur.execute(query, (role_type_id, person_id, event_id,))
conn.commit()
def add_coords_to_places(self):
if len(coordinates) == 0:
return
for tup in coordinates["latitude_longitude"]:
tag, value, nesting = tup
for k, tupp in nested_places.items():
nested_place_string, ids = tupp
if nested_place_string == nesting:
place_id = ids[0]
places[place_id][tag] = {}
places[place_id][tag] = value
break
def do_subrecords(self):
self.get_types()
self.unknown_event_type_id = self.types["event_types"]["unknown event type"]
max_ids = get_max_ids()
self.max_person_id = max_ids["persons"]
self.max_source_id = max_ids["sources"]
self.max_note_id = max_ids["notes"]
self.max_media_id = max_ids["media"]
self.max_couple_id = max_ids["couples"]
self.max_repository_id = max_ids["repositories"]
self.citations_filter = {}
self.assertion_type = None
self.current_source_id = None
self.current_repository_id = None
self.current_event_id = None
self.current_name_id = None
self.current_citation_id = None
self.current_assertion_id = None
self.current_transcription_id = None
self.current_notes_links_id = 0
self.current_media_links_id = 0
self.current_roles_links_id = 0
self.current_repositories_links_id = 0
self.max_name_id = 0
self.max_citation_id = 0
self.max_assertion_id = 0
self.max_event_id = 0
self.max_transcription_id = 0
self.max_notes_links_id = 0
self.max_media_links_id = 0
self.max_roles_links_id = 0
self.max_repositories_links_id = 0
self.COUPLE_EVENTS = get_couple_events(self.target_unigeds_db)
def save_gedcom_place(self, nesting):
self.gedcom_places.add(nesting)
def do_concat(self):
with open(self.elucidom_output_path, mode="r", encoding="utf-8-sig") as elucidom:
for idx,ln in enumerate(elucidom):
line = ln.replace("\n", "")
self.read1(idx, line)
def save_concat(self):
with open(self.elucidom_output_path, mode="r", encoding="utf-8-sig") as elucidom:
for idx,ln in enumerate(elucidom):
line = ln.replace("\n", "")
self.read2(idx, line)
def read1(self, idx, line):
""" Every line is treated as if it could be followed by a CONC line.
(Because of the unique treatment of NOTE per specs, even the
zero line could be followed by a CONC line.) If CONC doesn't follow
a line, the `self.base` content is replaced by the next line's
content. A zero line signals the end of a concatenation in case the
last line of the previous record was a CONC.
A blank line at the end of the imported `.lux` file would prevent
the last concatenation from processing correctly if the last
line-to-be-saved of the `.lux` file is a CONC. Therefore, the
`.lux` file ends with a line we're already prepared to ignore,
`1 DATA null`. Unlike GEDCOM's `0 TRLR`, DATA is not a one-time
tag so it doesn't require its own separate conditional check
(`if tag == "TRLR"...`).
        According to the GEDCOM specs, PAGE and other lines whose values
        might exceed 255 characters are not eligible for the concatenation
        feature. In practice the spec doesn't get the final say: some vendors
        use CONC/CONT wherever they need it, so we have to be ready for that.
"""
lst = line.split(" ", 2)
num, (tag, value) = int(lst[0]), lst[1:]
num = int(num)
lst[0] = num
if num == 0:
self.read_primary_key_line(idx, lst)
# If final line of previous record was CONC:
            if self.concatenating:
self.concatenations[self.anchor] = "".join(self.base)
# Signal the next line that there's no concatenating taking place:
self.concatenating = False
# Prepare in case the next line is the first CONC:
self.base = []
# There's no text allowed as a fourth item in a primary `.lux` NOTE
# line, as there is in a primary `.ged` NOTE line:
self.anchor = idx + 1
elif tag not in ("CONC", "CONT"):
# End the concatenation, if any, and save its result:
if self.concatenating and len(self.base) != 0:
self.concatenations[self.anchor] = "".join(self.base)
# Start a new potential concatenation with the current value as base:
self.base = [value]
# Save the line number where the concatenation started:
self.anchor = idx
# Assume no concatenation will actually take place:
self.concatenating = False
elif tag in ("CONC", "CONT"):
self.concatenating = True
tag, text = lst[1:3]
if tag == "CONT":
text = f"\n{text}"
self.base.append(text)
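`read1`'s CONC/CONT handling comes down to the spec's joining rule: CONC glues text onto the previous line with no separator, CONT starts a new line. A standalone sketch (names hypothetical):

```python
def join_continuations(pieces):
    # pieces: (tag, text) tuples; the first is the anchor line, the rest
    # are CONC (joined directly) or CONT (joined with a newline).
    out = []
    for tag, text in pieces:
        if tag == "CONT":
            out.append("\n" + text)
        else:
            out.append(text)
    return "".join(out)
```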
def read2(self, idx, line):
lst = line.split(" ", 2)
num, (tag, value) = int(lst[0]), lst[1:]
if tag == "TRLR":
# Ignore blank lines after trailer, if any:
return
        if idx in self.concatenations:
            # Concatenated values land on the right subrecord here, but only
            # because the same file is read again so the idx values match.
            value = self.concatenations[idx]
def read_primary_key_line(self, idx, lst):
        num, tag, value = lst
pk = int("".join(c for c in value if c.isdigit()))
self.do_primary[tag](idx, num, tag, pk)
def do_person(self, idx, num, tag, pk):
persons[pk] = {}
def do_couple(self, idx, num, tag, pk):
couples[pk] = {}
def do_note(self, idx, num, tag, pk):
notes[pk] = {}
def do_media(self, idx, num, tag, pk):
media[pk] = {}
def do_source(self, idx, num, tag, pk):
sources[pk] = {"SORC_TITL": None, "citations": {}}
def do_repository(self, idx, num, tag, pk):
repositories[pk] = {}
def do_submitter(self, idx, num, tag, pk):
submitter[pk] = {}
def read3(self):
with open(self.elucidom_output_path, mode="r", encoding="utf-8-sig") as elucidom:
for idx,ln in enumerate(elucidom):
line = ln.replace("\n", "")
num, tag, value = line.split(" ", 2)
num = int(num)
if "_pointer" in tag or tag.endswith("_major"):
if value != "null":
value = int(value)
if tag.startswith("EVEN_"):
self.do_event(idx, num, tag, value)
elif tag in self.do_function:
self.do_function[tag](idx, num, tag, value)
self.supers[num] = (tag, value)
def do_person_name(self, idx, num, tag, value):
def save_name_type(tag):
tag = gcc.NAME_TYPE_TAGS[tag]
name_type_id = self.types["name_types"][tag]
names[self.current_name_id]["NAME_TYPE_FK"] = name_type_id
if len(value) == 0:
return
self.max_name_id += 1
self.current_name_id = self.max_name_id
name_type_id = None
zeroval = self.supers[0][1]
sorter = name = value
if len(value.split(" /", 1)) > 1:
forename, lastname = value.split(" /", 1)
surname = lastname.replace("/", "")
sorter = ", ".join([surname, forename])
name = " ".join([forename, surname])
names[self.current_name_id] = {
"PRSN_FK": zeroval, "NAME_STRG": name, "NAME_SORT": sorter}
if tag in ("NICK", "IDNO", "SSN", "TITL_name_type"):
save_name_type(tag)
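The surname-marker parsing in `do_person_name` boils down to the sketch below. (Simplified and hypothetical: real GEDCOM names can carry a suffix after the closing slash, which this ignores, just as the method above does.)

```python
def parse_gedcom_name(value):
    # "John /Smith/" -> display "John Smith", sort key "Smith, John".
    parts = value.split(" /", 1)
    if len(parts) == 1:
        return value, value  # no surname markers: sort as-is
    forename, rest = parts
    surname = rest.replace("/", "")
    return f"{forename} {surname}", f"{surname}, {forename}"
```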
def do_name_type(self, idx, num, tag, value):
value = value.lower()
if value in gcc.GEDCOM_NAME_TYPES:
value = gcc.GEDCOM_NAME_TYPES[value]
self.cur.execute("SELECT name_type_id, name_types FROM name_type")
results = self.cur.fetchall()
all_name_types = [i[1] for i in results]
all_name_type_ids = [i[0] for i in results]
if value in all_name_types:
indx = all_name_types.index(value)
name_type_id = all_name_type_ids[indx]
else:
self.cur.execute(
"INSERT INTO name_type (name_types) VALUES (?)", (value,))
self.conn.commit()
name_type_id = self.cur.lastrowid
self.get_types()
names[self.current_name_id]["NAME_TYPE_FK"] = name_type_id
def do_source_pointer(self, idx, num, tag, value):
supertag = self.supers[num-1][0]
if supertag == "NAME_person":
self.assertion_type = "name"
elif supertag.startswith("EVEN_"):
self.assertion_type = "event"
self.current_source_id = value
def do_source_title(self, idx, num, tag, value):
if idx in self.concatenations:
value = self.concatenations[idx]
sources[self.supers[0][1]]["SORC_TITL"] = value
def do_source_publication(self, idx, num, tag, value):
if idx in self.concatenations:
value = self.concatenations[idx]
sources[self.supers[0][1]]["SORC_PUBN"] = value
def do_source_author(self, idx, num, tag, value):
if idx in self.concatenations:
value = self.concatenations[idx]
sources[self.supers[0][1]]["SORC_ATHR"] = value
def do_citation(self, idx, num, tag, value):
if idx in self.concatenations:
value = self.concatenations[idx]
        citations_seen = self.citations_filter.setdefault(
            self.current_source_id, set())
        before = len(citations_seen)
        citations_seen.add(value)
        after = len(citations_seen)
self.save_citation(value, after, before)
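The before/after length comparison in `do_citation` is a compact way to ask a set "was that new?". In isolation (hypothetical helper):

```python
def add_if_new(seen, value):
    # True when value was not already present, detected by whether the
    # set grew, exactly as do_citation's before/after lengths do.
    before = len(seen)
    seen.add(value)
    return len(seen) > before
```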
def save_citation(self, value, after, before):
""" UNIGEDS requires an assertion for each reference to a citation,
since an assertion record is needed as the correct place to put a
citation foreign key. Increment the current assertion ID for each
citation instance and create an assertion record with placeholder
text for the assertion itself. Later, if a TEXT tag is found in the
same citation subrecord, don't increment assertion ID again, just
update the assertion text.
"""
if after > before:
self.max_citation_id += 1
self.current_citation_id = self.max_citation_id
sources[self.current_source_id]["citations"][
self.current_citation_id] = {"CTTN_STRG": value, "assertions": {}}
else:
for k,v in sources[self.current_source_id]["citations"].items():
if v["CTTN_STRG"] == value:
self.current_citation_id = k
break
self.plant_assertion()
def plant_assertion(self):
self.max_assertion_id += 1
self.current_assertion_id = self.max_assertion_id
if self.assertion_type == "event":
sources[self.current_source_id]["citations"][self.current_citation_id][
"assertions"][self.current_assertion_id] = {
"EVNT_FK": self.current_event_id,
"ASRTN_STRG": "blank assertion",
"ASRTN_TYPE": "particulars"}
elif self.assertion_type == "name":
sources[self.current_source_id]["citations"][self.current_citation_id][
"assertions"][self.current_assertion_id] = {
"NAME_FK": self.current_name_id,
"ASRTN_STRG": "blank assertion",
"ASRTN_TYPE": "names"}
def do_assertion(self, idx, num, tag, value):
""" See docstring under `self.save_citation()`. """
if idx in self.concatenations:
value = self.concatenations[idx]
if self.assertion_type == "event":
event_assertion_subtype = "particulars"
for stg in gcc.DATE_WORDS:
if stg in value.lower():
event_assertion_subtype = "dates"
break
sources[self.current_source_id]["citations"][self.current_citation_id][
"assertions"][self.current_assertion_id] = {
"EVNT_FK": self.current_event_id,
"ASRTN_STRG": value,
"ASRTN_TYPE": event_assertion_subtype}
elif self.assertion_type == "name":
sources[self.current_source_id]["citations"][self.current_citation_id][
"assertions"][self.current_assertion_id] = {
"NAME_FK": self.current_name_id,
"ASRTN_STRG": value,
"ASRTN_TYPE": "name"}
def do_source_minor(self, idx, num, tag, value):
if value == "null":
return
self.max_source_id += 1
self.current_source_id = self.max_source_id
sources[self.current_source_id] = {"SORC_TITL": value}
def do_conc_note(self, idx, num, tag, value):
""" This is irregular because the GEDCOM specs allow the note to start
as a fourth item on the zero line, while the developer tends to
avoid that by starting the note on the first CONC line. These are
GEDCOM irregularities, dealt with by this method.
"""
supertag, superval = self.supers[0]
if supertag != "NOTE_major":
return
if idx in self.concatenations:
value = self.concatenations[idx]
notes[superval]["NOTE_STRG"] = value
def do_repository_pointer(self, idx, num, tag, value):
self.max_repositories_links_id += 1
self.current_repositories_links_id = self.max_repositories_links_id
repository_id = value
source_id = self.supers[0][1]
repository_type_id = None
citation_id = None
locator_id = None
media_id = None
repositories_links[self.current_repositories_links_id] = {
"RPST_FK": repository_id, "SORC_FK": source_id,
"RPST_TYPE_FK": repository_type_id, "CTTN_FK": citation_id,
"LCTR_FK": locator_id, "MDIA_FK": media_id}
def do_note_pointer(self, idx, num, tag, value):
if idx in self.concatenations:
value = self.concatenations[idx]
zerotag, zeroval = self.supers[0]
supertag, superval = self.supers[num-1]
self.max_notes_links_id += 1
self.current_notes_links_id = self.max_notes_links_id
if supertag.startswith("EVEN"):
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"EVNT_FK": self.current_event_id}
elif supertag == "NAME_person":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"NAME_FK": self.current_name_id}
elif supertag == "INDI_major":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value, "PRSN_FK": zeroval}
elif supertag == "PLAC":
place_id = self.get_place_id(superval)
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"PLACE_FK": place_id}
elif supertag == "FAM_major":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value, "CUPL_FK": zeroval}
elif supertag.startswith("SOUR"):
if supertag == "SOUR":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"SORC_FK": zeroval}
else:
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"SORC_FK": self.current_source_id}
elif supertag.startswith("OBJE"):
if supertag in ("OBJE_pointer", "OBJE_major"):
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"MDIA_FK": superval}
else:
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"MDIA_FK": self.current_media_id}
elif supertag.startswith("REPO"):
if supertag == "REPO":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"RPST_FK": zeroval}
else:
notes_links[self.current_notes_links_id] = {
"NOTE_FK": value,
"RPST_FK": self.current_repository_id}
else:
print(
"in `do_note_pointer()`; note not handled:",
"idx, num, tag, value, supertag, superval, zerotag, zeroval:",
idx, num, tag, value, supertag, superval, zerotag, zeroval)
notes_links.setdefault(self.current_notes_links_id, {})
notes_links[self.current_notes_links_id]["NOTE_TOPC"] = f"Note ID #{value}"
notes_links[self.current_notes_links_id]["NOTE_ORDR"] = 1
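The supertag-to-foreign-key branching above could also be expressed as a lookup table. A partial, illustrative sketch (the dict and helper names are mine, not the importer's; the real method also handles SOUR/OBJE/REPO variants that need extra conditions):

```python
# Table-driven sketch of a few of the supertag -> FK-column pairs handled
# by do_note_pointer(); purely illustrative.
FK_COLUMN = {
    "NAME_person": "NAME_FK",
    "INDI_major": "PRSN_FK",
    "PLAC": "PLACE_FK",
    "FAM_major": "CUPL_FK",
}

def link_note(note_id, supertag, target_id):
    column = FK_COLUMN.get(supertag)
    if column is None:
        raise KeyError(f"note not handled for supertag {supertag!r}")
    return {"NOTE_FK": note_id, column: target_id}

assert link_note(7, "INDI_major", "@I1@") == {"NOTE_FK": 7, "PRSN_FK": "@I1@"}
```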
def do_note_minor(self, idx, num, tag, value):
""" Unique note topics and note topic orders are used in UNIGEDS but
since GEDCOM doesn't provide them, the goal here is to prevent
import errors. The user can adjust these values later in Treebard or
another UNIGEDS-employing front end.
"""
self.max_note_id += 1
self.current_note_id = self.max_note_id
if idx in self.concatenations:
value = self.concatenations[idx]
notes[self.current_note_id] = {"NOTE_STRG": value}
zerotag, zeroval = self.supers[0]
supertag, superval = self.supers[num-1]
self.max_notes_links_id += 1
self.current_notes_links_id = self.max_notes_links_id
if supertag.startswith("EVEN"):
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"EVNT_FK": self.current_event_id}
elif supertag == "NAME_person":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"NAME_FK": self.current_name_id}
elif supertag == "INDI_major":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"PRSN_FK": zeroval}
elif supertag == "PLAC":
place_id = self.get_place_id(superval)
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"PLACE_FK": place_id}
elif supertag == "FAM_major":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"CUPL_FK": zeroval}
elif supertag == "OBJE_minor":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"MDIA_FK": self.current_media_id}
elif supertag == "OBJE_pointer":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"MDIA_FK": superval}
elif supertag == "OBJE_major":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"MDIA_FK": zeroval}
elif supertag == "REPO_pointer":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"RPST_FK": superval}
elif supertag == "REPO_major":
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"RPST_FK": zeroval}
elif supertag.startswith("SOUR"):
notes_links[self.current_notes_links_id] = {
"NOTE_FK": self.current_note_id,
"SORC_FK": self.current_source_id}
else:
print("note not handled; supertag, superval:", supertag, superval)
notes_links.setdefault(self.current_notes_links_id, {})
notes_links[self.current_notes_links_id]["NOTE_TOPC"] = (f"{value[0:10]}"
f"... (note ID #{self.current_note_id})")
notes_links[self.current_notes_links_id]["NOTE_ORDR"] = 1
def get_place_id(self, nested_place_string):
for tup in nested_places.values():
if nested_place_string == tup[0]:
return tup[1][0]
def do_event(self, idx, num, tag, value):
""" Not all GEDCOM's family events are UNIGEDS couple events. Send
bogus family events such as residence and emigration to the
exceptions report.
"""
event_type = tag.split("EVEN_")[1]
event_type_id = self.get_event_type_id(event_type)
zerotag, zeroval = self.supers[0]
if zerotag == "INDI_major":
self.make_event(event_type_id, value, idx)
events[self.current_event_id]["PRSN_FK"] = zeroval
elif zerotag == "FAM_major":
if event_type in self.COUPLE_EVENTS:
self.make_event(event_type_id, value, idx)
events[self.current_event_id]["CUPL_FK"] = zeroval
else:
print("line", look(seeline()).lineno, "event_type", event_type)
def make_event(self, event_type_id, value, idx):
self.max_event_id += 1
self.current_event_id = self.max_event_id
events[self.current_event_id] = {"EVNT_TYPE_FK": event_type_id}
if value != "null":
particulars = value
if idx in self.concatenations:
particulars = self.concatenations[idx]
events[self.current_event_id]["EVNT_DETL"] = particulars
def do_age(self, idx, num, tag, value):
events[self.current_event_id]["EVNT_AGE"] = value
def do_partner_age(self, idx, num, tag, value):
zerotag, zeroval = self.supers[0]
event_type_id = events[self.current_event_id]["EVNT_TYPE_FK"]
event_type = None # stays None if the loop below finds no match
for evt_type, val in self.types["event_types"].items():
if val == event_type_id:
event_type = evt_type
break
if (event_type_id == self.unknown_event_type_id or
event_type in self.COUPLE_EVENTS):
if tag == "AGE_event_husband":
key = "EVNT_AGE1"
elif tag == "AGE_event_wife":
key = "EVNT_AGE2"
events[self.current_event_id][key] = value
def do_event_unknown(self, idx, num, tag, value):
zerotag, zeroval = self.supers[0]
self.make_event(self.unknown_event_type_id, value, idx)
if zerotag == "INDI_major":
events[self.current_event_id]["PRSN_FK"] = zeroval
elif zerotag == "FAM_major":
events[self.current_event_id]["CUPL_FK"] = zeroval
def do_event_type(self, idx, num, tag, value):
supertag, superval = self.supers[num-1]
zerotag, zeroval = self.supers[0]
event_type = value.lower()
event_type_id = self.get_event_type_id(event_type)
events[self.current_event_id]["EVNT_TYPE_FK"] = event_type_id
def do_event_type_detail(self, idx, num, tag, value):
events[self.current_event_id]["EVNT_DETL"] = value
def get_event_type_id(self, event_type):
self.cur.execute(
"SELECT event_type_id FROM event_type WHERE event_types = ?",
(event_type,))
result = self.cur.fetchone()
if result:
return result[0]
else:
self.cur.execute(
"INSERT INTO event_type (event_types) VALUES (?)",
(event_type,))
self.conn.commit()
self.get_types()
return self.cur.lastrowid
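`get_event_type_id()` is a classic get-or-create. A self-contained demo against an in-memory database, with the schema cut down to the two columns queried above:

```python
# Minimal get-or-create demo mirroring get_event_type_id(), run against an
# in-memory SQLite database; the schema is reduced for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE event_type "
    "(event_type_id INTEGER PRIMARY KEY, event_types TEXT)")

def get_or_create(cur, event_type):
    cur.execute(
        "SELECT event_type_id FROM event_type WHERE event_types = ?",
        (event_type,))
    row = cur.fetchone()
    if row:
        return row[0]    # found: reuse the existing ID
    cur.execute(
        "INSERT INTO event_type (event_types) VALUES (?)", (event_type,))
    return cur.lastrowid  # created: SQLite hands back the new ID

first = get_or_create(cur, "birth")
assert first == get_or_create(cur, "birth")  # second call finds, not inserts
```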
def do_place(self, idx, num, tag, value):
for nested_place_id, tup in nested_places.items():
nesting = tup[0]
self.current_smallest_place_id = tup[1][0]
if nesting == value:
events[self.current_event_id]["PLACE_NEST_FK"] = nested_place_id
break
def do_coordinates(self, idx, num, tag, value):
""" Overwrite latitude & longitude if repeated in subsequent
`PLAC.MAP.LATI/LONG`. Keep the last coordinates input for any given
place.
"""
places[self.current_smallest_place_id][tag] = value
def do_media_pointer(self, idx, num, tag, value):
supertag, superval = self.supers[num-1]
if supertag in gcc.EVENT_TYPES:
self.max_media_links_id += 1
self.current_media_links_id = self.max_media_links_id
media_links[self.current_media_links_id] = {
"MDIA_FK": value, "EVNT_FK": self.current_event_id}
elif supertag == "SOUR_minor":
self.max_media_links_id += 1
self.current_media_links_id = self.max_media_links_id
media_links[self.current_media_links_id] = {
"MDIA_FK": value, "SORC_FK": self.current_source_id}
elif supertag == "INDI_major":
self.max_media_links_id += 1
self.current_media_links_id = self.max_media_links_id
media_links[self.current_media_links_id] = {
"MDIA_FK": value, "PRSN_FK": superval}
elif supertag == "FAM_major":
self.max_media_links_id += 1
self.current_media_links_id = self.max_media_links_id
media_links[self.current_media_links_id] = {
"MDIA_FK": value, "CUPL_FK": superval}
elif supertag in ("SOUR_pointer", "SOUR_major"):
self.max_media_links_id += 1
self.current_media_links_id = self.max_media_links_id
media_links[self.current_media_links_id] = {
"MDIA_FK": value, "SORC_FK": superval}
elif supertag == "SUBM_major":
self.max_media_links_id += 1
self.current_media_links_id = self.max_media_links_id
media_links[self.current_media_links_id] = {
"MDIA_FK": value, "SUBM_FK": superval}
def do_media_minor(self, idx, num, tag, value):
""" Do nothing here. It's unknown here whether the subsequent FILE line
has a unique file name or not. If unique, an ID has to be created;
otherwise, the correct ID has to be found in the `media` dict.
The code has to run in `self.do_media_file()`.
"""
pass
def do_media_file(self, idx, num, tag, value):
zerotag, zeroval = self.supers[0]
supertag = self.supers[num-1][0]
if zerotag == "OBJE_major":
media[zeroval]["MDIA_FILE"] = value
elif supertag == "OBJE_minor":
if len(media) == 0:
self.max_media_id += 1
self.current_media_id = self.max_media_id
media[self.current_media_id] = {}
else:
for media_id, dkt in dict(media).items():
if dkt.get("MDIA_FILE") is None:
self.max_media_id += 1
self.current_media_id = self.max_media_id
media[self.current_media_id] = {"MDIA_FILE": None}
elif dkt["MDIA_FILE"] == value:
self.current_media_id = media_id
break
media[self.current_media_id]["MDIA_FILE"] = value
def do_media_form(self, idx, num, tag, value):
zerotag, zeroval = self.supers[0]
supertag = self.supers[num-1][0]
if zerotag == "OBJE_major":
media[zeroval]["MDIA_EXTN"] = value
elif supertag == "OBJE_minor":
media[self.current_media_id]["MDIA_EXTN"] = value
def do_media_title(self, idx, num, tag, value):
zerotag, zeroval = self.supers[0]
supertag = self.supers[num-1][0]
if zerotag == "OBJE_major":
media[zeroval]["MDIA_TITL"] = value
elif supertag == "OBJE_minor":
media[self.current_media_id]["MDIA_TITL"] = value
def do_media_type(self, idx, num, tag, value):
zerotag, zeroval = self.supers[0]
supertag, superval = self.supers[num-1]
media_type = value.lower()
if media_type in self.types["media_types"]:
media_type_id = self.types["media_types"][media_type]
else:
media_type_id = self.make_media_type(media_type)
if tag == "TYPE_media":
media[zeroval]["MDIA_TYPE_FK"] = media_type_id
elif tag == "MEDI":
if zerotag == "OBJE_major":
media[superval]["MDIA_TYPE_FK"] = media_type_id
elif supertag == "FORM_media":
media[self.current_media_id]["MDIA_TYPE_FK"] = media_type_id
elif zerotag == "SOUR_major":
# SOUR.MEDI is anti-specs; there is no direct link between a
# media type and a source.
pass
def make_media_type(self, media_type):
self.cur.execute(
"INSERT INTO media_type (media_types) VALUES (?)",
(media_type,))
self.conn.commit()
media_type_id = self.cur.lastrowid
self.get_types()
return media_type_id
def do_repository_name(self, idx, num, tag, value):
repository_id = self.supers[0][1]
repositories[repository_id]["RPST_NAME_STRG"] = value
if exceptions.get("locators_linked_to_sources_via_repositories"):
for dkt in exceptions["locators_linked_to_sources_via_repositories"]:
if dkt["repository_id"] == repository_id:
dkt["repository_name"] = value
def do_caln(self, idx, num, tag, value):
""" UNIGEDS' repository feature gives central status to locators such
as call numbers and URLs, but GEDCOM's locator tag CALN is only
linked to a repository which is only linked to a source. Locators
should be linkable to repository types, citations and multimedia
files also. Give the user a simple list of what can be gleaned
from the GEDCOM about locators and the sources and repositories
they're linked to. UNIGEDS stores the data in a many-to-many
junction table called `repositories_links`. GEDCOM has no facility
devoted to handling many-to-many relationships.
"""
zeroval = self.supers[0][1]
superval = self.supers[num-1][1]
if idx in self.concatenations:
value = self.concatenations[idx]
locator = value
repository_id = superval
repository_name = None
source_id = zeroval
source_title = sources[source_id]["SORC_TITL"]
if exceptions.get("locators_linked_to_sources_via_repositories") is None:
exceptions["locators_linked_to_sources_via_repositories"] = []
dkt = repositories.get(repository_id)
if dkt:
repository_name = dkt.get("RPST_NAME_STRG")
exceptions["locators_linked_to_sources_via_repositories"].append({
"locator": locator, "repository_id": repository_id,
"repository_name": repository_name,
"source_id": source_id,
"source_title": source_title})
def do_fams(self, idx, num, tag, value):
""" `FAMS` is ignored since its cardinality is wrong, it's redundant,
and it conveys less needed information than `HUSB` and `WIFE` which
are in the `FAM` record where their cardinality is correct. The
person ID foreign keys for partners in a couple (`HUSB` or `WIFE`
pointers to `INDI` records) belong in the couple table since
person-to-couple is a many-to-many relationship: each person can be
in multiple couples and each couple can contain more than one person.
UNIGEDS recognizes the GEDCOM `FAM` record as a couple record. The
foreign key for the couple ID is used 1) in a birth record of the
event table to indicate parentage or alt-parentage of the person
born, or 2) in a couple event record of the event table, in which
case the person ID foreign key in that event record will be null.
"""
pass
def do_offspring_event(self, idx, num, tag, value):
""" `INDI.BIRT.FAMC` or `INDI.ADOP.FAMC` """
events[self.current_event_id]["CUPL_FK"] = value
if tag == "EVEN_birth":
events[self.current_event_id]["AGE"] = "0"
def do_offspring_no_event(self, idx, num, tag, value):
""" `1 FAMC...`
This is problematic because the line isn't linked to a specific
birth or adoption event, so it was first disabled wherever
`do_offspring_event()` had already provided the couple ID, to keep
it from overwriting good couple_id values.
EDIT: This is turned off completely till we discover a reason for
the `1 FAMC...` line to exist. Treebard requires a birth event for
each person in the tree and auto-creates them. The code below was
supposed to do something similar but since there's no guarantee a
`1 FAMC...` line exists in every INDI record, a separate method
has been written instead to just create a birth for every INDI
record.
"""
person_id = self.supers[0][1]
couple_id = value
self.FAMC_to_INDI_links.append((person_id, couple_id))
def do_adopter(self, idx, num, tag, value):
""" INDI.ADOP.FAMC.ADOP values are 'BOTH', 'HUSB', or 'WIFE'. Append
this data to the event's existing particulars field value. Names
and IDs of parents are unavailable at this point since FAM
sub-records haven't been read yet.
"""
particulars = ""
adopter = "unknown"
event_type = "adoption"
dooperval = self.supers[num-2][1]
if dooperval:
if dooperval.startswith("guardian"):
event_type = "guardianship"
elif dooperval.startswith("foster"):
event_type = "fosterage"
event_type_id = self.types["event_types"][event_type]
dkt = events[self.current_event_id]
dkt["EVNT_TYPE_FK"] = event_type_id
if dkt.get("EVNT_DETL"):
particulars = f"{dkt['EVNT_DETL']};"
couple_id = dkt["CUPL_FK"]
if value == "BOTH":
adopter = f"both parents, couple ID #{couple_id}"
elif value == "HUSB":
adopter = f"parent on the left, couple ID #{couple_id}"
elif value == "WIFE":
adopter = f"parent on the right, couple ID #{couple_id}"
if event_type == "adoption":
who = "adopted by"
elif event_type == "fosterage":
who = "fostered by"
elif event_type == "guardianship":
who = "ward of"
new = f"{particulars} {who} {adopter}"
events[self.current_event_id]["EVNT_DETL"] = new
def do_pedi(self, idx, num, tag, value):
""" `PEDI` can be used by vendors to convey foster, adoption, LDS
sealing, and even birth links to parents in a superior `FAMC`
pointer. It is a cumbersome alternative way of doing something that
could have been done with normal event tags. UNIGEDS allows eventless
couples but doesn't have child/parent relationships without events.
For example to indicate that a child is adopted, the user creates an
adoption event and Treebard displays the relationship in the family
table where the user can fill in the parents' names.
Instead of auto-creating events in a GEDCOM import, which would tie
the import process to fairly complex Treebard procedures which are
likely to change, we prefer to instruct the user to manually input
the data by listing the information that can be gleaned from the
`FAMC.PEDI` tag in the exceptions report. If this tag refers to a
birth family, it will be ignored, and otherwise it will generate an
exception with instructions for the user to add the adoption,
fosterage or sealing event manually in Treebard or other UNIGEDS-
employing front-end.
Since PEDI is subordinate to the FAMC tag, the FAMC line doesn't
know what to do unless it has become omniscient by having already
jumped through requisite hoops, such as pre-reading the file for
FAM pointers. This could be done, but doing it for tags that are
seldom used or that usually convey redundant information is not
worth the complicating effect it would have on the code. In light
of our goal of keeping this import code simple, it's better to use
the exceptions report to inform the user that some data was not
imported.
Example: Gramps' sample .ged file has 1374 PEDI tags which state that
the person of record was born, a fact that is already known.
By ignoring this tag when its value is "birth", we speed up the
import of Gramps trees.
Example from a GEDCOM file:
0 @I559@ INDI
1 BIRT
2 FAMC @F49@
1 FAMC @F22@
2 PEDI foster
1 FAMC @F49@
Oddly enough, the specs require the apparently useless `INDI.FAMC`
tag which is not linked to a birth event, while specifying that the
`INDI.BIRT.FAMC` tag is optional. The latter is the more useful
tag since it's linked to a birth event, or can be linked to an ADOP
event instead. Needing to do both--and then to add a CHIL tag to
indicate the same relationship in the FAM record--seems like
programming heresy. I await an explanation from someone more
experienced, as to why this redundancy is required.
If GEDCOM is going to be used, it should be used with as much
consistency as possible, so for example instead of using `PEDI` to
convey half the story in two lines, how about doing something with
the available null value following a normal `ADOP` event tag:
0 @I559@ INDI
1 BIRT twin
2 FAMC @F49@
1 ADOP foster
2 FAMC @F22@
"""
if value == "birth":
return
zeroval = self.supers[0][1]
superval = self.supers[num-1][1]
left_person_id = None
right_person_id = None
left_person_name = None
right_person_name = None
current_person_id = zeroval
current_person_name = self.get_name(current_person_id)
couple_id = superval
event_type_string = gcc.PEDI_TAGS[value]
event_type_id = self.types["event_types"][event_type_string]
dkt = {
"couple_id": couple_id,
"current_person": (current_person_id, current_person_name),
"left_person": (left_person_id, left_person_name),
"right_person": (right_person_id, right_person_name),
"event": (event_type_id, event_type_string)}
if exceptions.get("PEDI_tag_is_useless") is None:
exceptions["PEDI_tag_is_useless"] = []
exceptions["PEDI_tag_is_useless"].append(dkt)
def do_partner(self, idx, num, tag, value):
""" `HUSB` or `WIFE` when not indicating age. """
couple_id = self.supers[0][1]
if tag == "INDI_pointer_husband":
key = "PRSN1_FK"
elif tag == "INDI_pointer_wife":
key = "PRSN2_FK"
couples[couple_id][key] = value
if exceptions.get("PEDI_tag_is_useless") is None:
return
for dkt in exceptions["PEDI_tag_is_useless"]:
if dkt["couple_id"] == couple_id:
person_id = value
person_name = self.get_name(person_id)
if key == "PRSN1_FK":
dkt["left_person"] = (person_id, person_name)
elif key == "PRSN2_FK":
dkt["right_person"] = (person_id, person_name)
def get_name(self, person_id):
for val in names.values():
if val["PRSN_FK"] == person_id:
return val["NAME_STRG"]
return None
def do_chil(self, idx, num, tag, value):
""" The FAM record represents a couple and their children. Since each
couple can have multiple children and each child can belong to
multiple couples by way of fosterage, adoption, and guardianship,
UNIGEDS records couples in a many-to-many table. A reference to the
child's person ID belongs in the event record for the birth/adoption
event. That's handled by FAMC. A reference to the child's parents
also belongs in the birth event record, but this doesn't exist in
GEDCOM. I don't know why `CHIL` exists without a link to the birth
event that caused the child to exist; it is the birth record that
should reference the child. Due to cardinality, the links belong
in specific records. Reversing the order seems wrong, and recording
the same data twice in a database is denormalization.
"""
pass
def do_gender(self, idx, num, tag, value):
persons[self.supers[0][1]]["PRSN_GNDR"] = gcc.SPELL_IT[value]
def do_event_date(
self, idx=None, num=None, tag=None, value=None, corrected=(False, None)):
""" Handle these, and some combinations of them, case-insensitively:
1 OCCU farmer
2 DATE 1917
1 DEAT
2 DATE JUL 1970
1 OCCU shoe shop manager
2 DATE 7 APR 1930
1 RESI
2 DATE ABT 1917
1 DEAT apoplexy
2 DATE EST 12 DEC 1927
1 EVEN married
2 TYPE Marital Status
2 DATE CAL 1927
1 EVEN
2 TYPE Invention
2 DATE BEF 1934
1 RESI
2 DATE AFT 1900
2 PLAC Precinct 4, Fannin County, Texas, United States of America
1 RESI
2 DATE BET 1925 AND 1927
1 OCCU farmer
2 DATE FROM 9 MAR 1875 TO 14 OCT 1927
"""
if corrected[0]:
event_id = corrected[1]
else:
event_id = self.current_event_id
storable_date = "-0000-00-00-------"
sorter = "0,0,0"
compound_date, link = gcc.split_compound_dates(value)
for indx, date_input in enumerate(compound_date):
if date_input is None:
# No date to parse; return the default placeholders.
return storable_date, sorter
if indx == 0:
year, month, day, slots = gcc.get_date_parts(date_input, indx)
elif indx == 1:
year, month, day, slots = gcc.get_date_parts(
date_input, indx, slots=slots)
slots[5] = link.strip()
date_is_bad, month = gcc.validate_date(year, month, day)
if indx == 0:
sorter = gcc.make_date_sorter(year, month, day)
POS = {0: [1, 2, 3], 1: [7, 8, 9]}
if date_is_bad:
if exceptions.get("bad_dates") is None:
exceptions["bad_dates"] = []
exceptions["bad_dates"].append((self.current_event_id, value))
return "-0000-00-00-------", "0,0,0"
else:
slots[POS[indx][0]] = year
if len(month) != 0:
slots[POS[indx][1]] = gcc.MONTHS[month]
else:
slots[POS[indx][1]] = month
slots[POS[indx][2]] = day
storable_date = "-".join(slots)
events[event_id]["EVNT_DATE"] = storable_date
events[event_id]["EVNT_DATE_SORT"] = sorter
events[event_id]["EVNT_DATE"] = storable_date
events[event_id]["EVNT_DATE_SORT"] = sorter
return storable_date, sorter
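The compound-date handling above leans on `gcc.split_compound_dates()`. As a rough illustration of what such a splitter has to do for the `BET...AND` and `FROM...TO` forms listed in the docstring (this helper is a guess at the behavior, not the module's actual code):

```python
# Illustrative splitter for GEDCOM compound dates such as
# "BET 1925 AND 1927" or "FROM 9 MAR 1875 TO 14 OCT 1927".
def split_compound(date_value):
    for link in (" AND ", " TO "):
        if link in date_value:
            left, right = date_value.split(link, 1)
            return [left, right], link.strip()
    # Simple dates produce a single part and no linking word.
    return [date_value, None], ""

assert split_compound("BET 1925 AND 1927") == (["BET 1925", "1927"], "AND")
assert split_compound("7 APR 1930") == (["7 APR 1930", None], "")
```

The `None` second part is what the `date_input is None` guard in `do_event_date()` then skips over.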
def do_transcription_type(self, idx, num, tag, value):
def get_transcription_type_id(tag, value, supertag):
if value in gcc.GEDCOM_TRANSCRIPTION_TYPES:
transcription_type = gcc.GEDCOM_TRANSCRIPTION_TYPES[value]
else:
transcription_type = value
if transcription_type in self.types["transcription_types"]:
transcription_type_id = self.types["transcription_types"][
transcription_type]
else:
transcription_type_id = make_transcription_type(
transcription_type, supertag)
return transcription_type_id
def make_transcription_type(transcription_type, supertag):
if supertag == "ROMN":
column = "romanized"
elif supertag == "FONE":
column = "phonetic"
self.cur.execute(
(f"INSERT INTO transcription_type (transcription_types, {column}) "
f"VALUES (?, 1)"),
(transcription_type,))
self.conn.commit()
transcription_type_id = self.cur.lastrowid
self.get_types()
return transcription_type_id
supertag = self.supers[num-1][0]
transcription_type_id = get_transcription_type_id(tag, value, supertag)
transcriptions[self.current_transcription_id][
"TRNSCRPN_TYPE_FK"] = transcription_type_id
def do_transcription(self, idx, num, tag, value):
self.max_transcription_id += 1
self.current_transcription_id = self.max_transcription_id
nested_place_id = None
name_id = None
supertag, superval = self.supers[num-1]
if supertag == "NAME_person":
for k, v in names.items():
if v["NAME_STRG"] == superval:
name_id = k
break
elif supertag == "PLAC":
for k,v in nested_places.items():
if v[0] == superval:
nested_place_id = k
break
transcriptions[self.current_transcription_id] = {
"TRNSCRPN_STRG": value, "NAME_FK": name_id,
"PLACE_NEST_FK": nested_place_id}
def do_submitter_language(self, idx, num, tag, value):
superval = self.supers[0][1]
submitter[superval]["language"] = value
def do_submitter_name(self, idx, num, tag, value):
superval = self.supers[0][1]
submitter[superval]["name"] = value
def get_types(self):
for table in gcc.MAX_TYPE_IDS:
self.cur.execute(f"SELECT {table}_id, {table}s FROM {table}")
types = self.cur.fetchall()
self.types[f"{table}s"] = {tup[1]: tup[0] for tup in types}
self.cur.execute(f"SELECT MAX({table}_id) from {table}")
max_id = self.cur.fetchone()[0]
def print_stuff(self):
""" Don't delete this. Just comment the print commands not wanted. """
# print("line", look(seeline()).lineno, "coordinates", coordinates)
# print("line", look(seeline()).lineno, "places:", places)
# print("line", look(seeline()).lineno, "events:", events)
# print("line", look(seeline()).lineno, "nested_places:", nested_places)
# print("line", look(seeline()).lineno, "self.types:", self.types)
# print("line", look(seeline()).lineno, "self.concatenations:", self.concatenations)
# print("line", look(seeline()).lineno, "exceptions:", exceptions)
# print("line", look(seeline()).lineno, "persons:", persons)
# print("line", look(seeline()).lineno, "names:", names)
# print("line", look(seeline()).lineno, "couples:", couples)
# print("line", look(seeline()).lineno, "sources:", sources)
# print("line", look(seeline()).lineno, "media:", media)
# print("line", look(seeline()).lineno, "roles", roles)
# print("line", look(seeline()).lineno, "repositories:", repositories)
# print("line", look(seeline()).lineno, "transcriptions:", transcriptions)
# print("line", look(seeline()).lineno, "submitter:", submitter)
# print("line", look(seeline()).lineno, "notes:", notes)
# print("line", look(seeline()).lineno, "notes_links", notes_links)
# print("line", look(seeline()).lineno, "repositories_links", repositories_links)
# print("line", look(seeline()).lineno, "roles_links", roles_links)
# print("line", look(seeline()).lineno, "media_links", media_links)
pass
class Elucidom:
""" Stay within the basic GEDCOM-like structure. Translate ambiguous tags
into tags with one meaning and one usage. Create an Elucidom or `.lux`
file which is generic with respect to its target application; it can be
used as a first step in the process by anyone hoping to write a GEDCOM
import program. Perform here the many conditional tests that parsing
GEDCOM requires while creating a consistent, symmetrical, predictable,
simple yet GEDCOM-like translation of a `.ged` which can be parsed with
relatively little conditional testing.
"""
def __init__(self, source_gedcom_file, elucidom_output_path):
self.source_gedcom_file = source_gedcom_file
self.elucidom_output_path = path_string = elucidom_output_path
pathlist = path_string.split("/")
new_filename = f"pre_{pathlist[-1]}"
pathlist[-1] = new_filename
self.pre_elucidom_output_path = "/".join(pathlist)
self.fam_elucidom_output_path = self.pre_elucidom_output_path.replace(
"/pre_", "/fam_")
self.process_gedcom()
def process_gedcom(self):
self.split_file()
self.evict_non_gedcom()
self.fix_non_couple_fam_events()
self.translate()
def split_file(self):
filename = self.source_gedcom_file.split("/")[-1].replace(".ged", "")
self.head_file = f"d:/treebard/app/python/{filename}_HEAD.ged"
self.gedcom = f"d:/treebard/app/python/{filename}_DATA.ged"
with open(
self.source_gedcom_file, mode="r",
encoding="utf-8-sig") as intake, open(
self.head_file, mode="w", encoding="utf-8-sig") as head:
here = 0 # fallback in case no "0 @..." record line is found
for idx, line in enumerate(intake):
if line.startswith("0 @") is False:
head.write(line)
else:
here = idx
break
with open(
self.source_gedcom_file, mode="r",
encoding="utf-8-sig") as intake, open(
self.gedcom, mode="w", encoding="utf-8-sig") as exhaust:
for idx, line in enumerate(intake):
# Everything from the first zero-line record onward is data.
if idx >= here:
exhaust.write(line)
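`split_file()` does the header/data split with two passes over the file. The same idea in memory, for illustration (the sample text and helper name are mine): everything before the first `0 @...` line is the header.

```python
# In-memory sketch of the header/data split performed by split_file().
SAMPLE = "0 HEAD\n1 SOUR demo\n0 @I1@ INDI\n1 NAME John /Doe/\n0 TRLR\n"

def split_head(text):
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith("0 @"):
            # First record line found; split here.
            return "".join(lines[:i]), "".join(lines[i:])
    return text, ""  # no record lines at all

head, data = split_head(SAMPLE)
assert head == "0 HEAD\n1 SOUR demo\n"
assert data.startswith("0 @I1@ INDI")
```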
def evict_non_gedcom(self):
""" Re-order zero-lines to read `num, tag, value`. Send all custom tags
to the exceptions report along with their subordinate lines.
"""
self.sooperz = dict(gcc.superior)
with open(self.gedcom, mode="r", encoding="utf-8-sig") as gedcop:
with open(
self.pre_elucidom_output_path,
mode="w", encoding="utf-8-sig") as eviction:
cuz = CustomLinesPruner(self.sooperz)
text = None
for ln in gedcop:
four_items = False
if "TRLR" in ln:
eviction.write("0 TRLR")
return
line = ln.replace("\n", "")
nm, tag, vl = f"{line} null".split(" ", 2)
num = int(nm)
value = vl.removesuffix(" null")
if num == 0:
if value.startswith("NOTE") and value != "NOTE":
four_items = True
new_value = tag
tag, text = value.split(" ", 1)
value = new_value
elif tag.startswith("_"):
# Handle Ancestral Quest's `0 _EVENT_DEFN Marital Status`.
# The order is already `num, tag, value`.
pass
else:
# Change `num, value, tag` of zero-line to match the
# order `num, tag, value` as used in all other lines.
tag, value = value, tag
self.sooperz[num] = (tag, value)
new_line = cuz.extract_subrecord(num, tag, value)
if new_line:
if "TRLR" in new_line:
eviction.write(f"{new_line}\n")
elif new_line.startswith("0"):
    # Every zero-line opens a potential custom-record bucket; only
    # non-custom zero-lines are also written through to the output.
    current_zero_line = new_line
    exceptions["custom_records"].append([new_line])
    if not tag.startswith("_"):
        eviction.write(f"{new_line}\n")
elif four_items:
self.add_extra_line(text, value, eviction)
else:
eviction.write(f"{new_line}\n")
else:
for lst in exceptions["custom_records"]:
if lst[0] == current_zero_line:
lst.append(" ".join([str(num), tag, value]))
def add_extra_line(self, text, value, eviction):
""" Write the primary tag without the irregular 4th item. Write an
extra CONC line beneath it with the value as the split-out 4th
item from the original NOTE record line.
"""
def write_line(num, tag, value):
added_line = " ".join([str(num), tag, str(value)])
eviction.write(f"{added_line}\n")
self.sooperz[num] = [tag, value]
write_line(0, "NOTE", value)
write_line(1, "CONC", text)
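The `f"{line} null"` padding trick used in every pass above can be isolated into one small function. A minimal sketch (hypothetical helper, not in the module) showing the unpacking and the zero-line reorder:

```python
def parse_line(line):
    """Split a GEDCOM line into (num, tag, value) the way the passes above do."""
    nm, tag, vl = f"{line} null".split(" ", 2)  # pad so 2-item lines still unpack
    num = int(nm)
    value = vl.replace(" null", "")
    # Zero-lines arrive as `num, pointer, tag`; swap so every line reads
    # `num, tag, value` (custom `0 _XXXX value` lines are already in order).
    if num == 0 and not tag.startswith("_"):
        tag, value = value, tag
    return num, tag, value

assert parse_line("0 @I1@ INDI") == (0, "INDI", "@I1@")
assert parse_line("1 SEX M") == (1, "SEX", "M")
assert parse_line("0 _EVENT_DEFN Marital Status") == (0, "_EVENT_DEFN", "Marital Status")
assert parse_line("2 CONT") == (2, "CONT", "null")  # bare tags keep the padding
```

Note the last case: a two-item line comes back with the literal value `null`, which is the same sentinel the module itself relies on (e.g. the `1 DATA null` line written in `translate()`).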
def fix_non_couple_fam_events(self):
""" Send non-couple events from GEDCOM's FAM record to Treebard's
exception report.
"""
self.superiors = dict(gcc.superior)
with open(
self.pre_elucidom_output_path, mode="r",
encoding="utf-8-sig") as culled:
with open(
self.fam_elucidom_output_path, mode="w",
encoding="utf-8-sig") as event_fix:
exceptions["gedcom_fam_events"] = {}
bad_fam_event = False
bad_fam_event_starts_at = 0
bad_fam_event_lines = []
supertag = None
zerotag = None
couple_id = None
for ln in culled:
if "TRLR" in ln:
event_fix.write("0 TRLR")
return
line = ln.replace("\n", "")
nm, tag, val = f"{line} null".split(" ", 2)
num = int(nm)
value = val.replace(" null", "")
if value:
self.superiors[num] = [tag, value]
else:
self.superiors[num] = [tag, ""]
if num > 0:
supertag = self.superiors[num-1][0]
else:
zerotag = tag
if zerotag == "FAM":
if (tag in gcc.EVENT_TAGS and
tag not in gcc.COUPLE_EVENT_TAGS):
pointer = self.superiors[0][1]
couple_id = int("".join(c for c in pointer if c.isdigit()))
if exceptions["gedcom_fam_events"].get(couple_id) is None:
exceptions["gedcom_fam_events"][couple_id] = []
if bad_fam_event:
exceptions["gedcom_fam_events"][couple_id].append(
bad_fam_event_lines)
bad_fam_event_lines = []
tag = None
bad_fam_event = True
bad_fam_event_starts_at = num
bad_fam_event_lines.append(line)
self.superiors[num] = [tag, val]
continue
elif bad_fam_event and num > bad_fam_event_starts_at:
tag = None
bad_fam_event_lines.append(line)
self.superiors[num] = [tag, val]
continue
elif num <= bad_fam_event_starts_at:
if bad_fam_event:
exceptions["gedcom_fam_events"][couple_id].append(
bad_fam_event_lines)
bad_fam_event_lines = []
bad_fam_event = False
if tag:
    # The num != 0 and num == 0 branches were identical, so one write covers both.
    new_line = " ".join([str(num), tag, value])
    event_fix.write(f"{new_line}\n")
def translate(self):
""" Replace ambiguous tags. Replace ambiguous tags. Repla[truncated].
Make all lines three items long. Replace event and attribute tags
with one standard pattern. Move illegal 4th items out of `0 NOTE`
line into an extra CONC line.
Resist the temptation to handle other extra tasks while running this
translator. This is the core, the base, the mother of the other
operations. Don't muddy it up. Also, keep this procedure separate from
any ties to the Treebard interface or even UNIGEDS. This code should be
adaptable by anyone trying to write a GEDCOM import to any data structure.
"""
pointer = False
self.supers = dict(gcc.superior)
with open(self.fam_elucidom_output_path, mode="r", encoding="utf-8-sig") as lean:
with open(self.elucidom_output_path, mode="w", encoding="utf-8-sig") as translation:
zerotag = None
for ln in lean:
if "TRLR" in ln:
translation.write("1 DATA null")
return
line = ln.replace("\n", "")
nm, tg, vl = f"{line} null".split(" ", 2)
num = int(nm)
val = vl.replace(" null", "")
if num == 0:
self.supers[0] = [tg, val]
if val.startswith("@") and val.endswith("@"):
pointer = True
# Handle `0 @SUBM@ SUBM`: a pointer with no digits gets a `1`
# appended so an integer ID can be extracted below.
if not any(char.isdigit() for char in val):
    val = f"{val}1"
value = int("".join(c for c in val if c.isdigit()))
else:
pointer = False
value = val
supertag = None
if num > 0:
supertag = self.supers[num-1][0]
else:
zerotag = tg
if tg in ("INDI", "SOUR", "NOTE", "FAM", "OBJE", "REPO",
"FAMC", "FAMS", "WIFE", "HUSB", "CHIL", "SUBM",):
tag = self.rename_primary_tag(num, tg, pointer, supertag)
elif tg == "FACT":
tag = "EVEN"
elif tg in (
"PAGE", "SEX", "EVEN", "SSN", "IDNO", "CAUS",
"AUTH", "PEDI", "LANG", "LATI", "LONG",
"PUBL", "ROMN", "FONE", "FILE", "CALN"):
tag = tg
elif tg in ("CONC", "CONT") and supertag == "ADDR":
tag = None
elif tg in ("CONC", "CONT"):
tag = tg
elif tg in gcc.EVENT_TAGS:
if tg == "ADOP" and supertag != "INDI":
tag = "ADOP_by_whom"
else:
tag = f"EVEN_{gcc.EVENT_TAGGER[tg]}"
elif tg == "PLAC":
zerotag = self.supers[0][0]
if zerotag in ("INDI", "FAM"):
tag = tg
else:
tag = None
elif tg in gcc.IGNORE:
tag = None
elif val.startswith("_") and tg != "NAME":
# So a value can start with '_', e.g.: '_____ Smith'.
tag = None
else:
zerotag = self.supers[0][0]
tag = self.rename_tag(num, tg, zerotag, supertag)
if tag:
new_line = " ".join([str(num), tag, str(value)])
translation.write(f"{new_line}\n")
if num > 0:
self.supers[num] = [tg, val]
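The `self.supers` dict above works as a level stack: each line stores its tag at its own level number, so any line's superior is whatever tag was last stored at `level - 1`. A minimal sketch of the idea (hypothetical function; zero-lines are assumed already reordered to `num, tag, value` by the earlier passes):

```python
def track_superiors(lines):
    """Return (num, tag, supertag) per line, where lines read `num tag value`."""
    supers = {}
    out = []
    for line in lines:
        num_s, tag = f"{line} null".split(" ", 2)[:2]
        num = int(num_s)
        supertag = supers.get(num - 1)  # tag last seen one level up
        out.append((num, tag, supertag))
        supers[num] = tag
    return out

rows = track_superiors([
    "0 INDI @I1@",
    "1 BIRT",
    "2 DATE 1 JAN 1900",
    "1 NAME John /Smith/",
])
assert rows[2] == (2, "DATE", "BIRT")   # DATE's superior is BIRT
assert rows[3] == (1, "NAME", "INDI")   # NAME's superior is the INDI record
```

Because entries at deeper levels are simply left stale rather than popped, a dict is enough; a line at level n only ever consults level n - 1, which is always current.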
def rename_primary_tag(self, num, tg, pointer, supertag):
tag = tg
if tg == "INDI" and num == 0:
tag = "INDI_major"
elif tg == "INDI" and pointer:
tag = "INDI_pointer"
elif tg == "CHIL":
tag = "INDI_pointer_child"
elif tg == "WIFE" and pointer:
tag = "INDI_pointer_wife"
elif tg == "HUSB" and pointer:
tag = "INDI_pointer_husband"
elif tg in ("HUSB", "WIFE"):
tag = tg
elif tg == "FAM" and num == 0:
tag = "FAM_major"
elif tg == "FAMC" and supertag == "INDI":
tag = "FAM_pointer_child_eventless"
elif tg == "FAMC" and supertag in ("BIRT", "ADOP"):
tag = "FAM_pointer_child_event"
elif tg == "FAMC" and supertag == "SLGC":
tag = None
elif tg == "FAMC":
tag = tg
elif tg == "FAMS":
tag = "FAM_pointer_spouse"
elif tg == "NOTE" and num == 0:
tag = "NOTE_major"
elif tg == "NOTE" and pointer:
tag = "NOTE_pointer"
elif tg == "NOTE":
tag = "NOTE_minor"
elif tg == "SOUR" and num == 0:
tag = "SOUR_major"
elif tg == "SOUR" and pointer:
tag = "SOUR_pointer"
elif tg == "SOUR":
tag = "SOUR_minor"
elif tg == "OBJE" and num == 0:
tag = "OBJE_major"
elif tg == "OBJE" and pointer:
tag = "OBJE_pointer"
elif tg == "OBJE":
tag = "OBJE_minor"
elif tg == "REPO" and num == 0:
tag = "REPO_major"
elif tg == "REPO" and pointer:
tag = "REPO_pointer"
elif tg == "SUBM" and num == 0:
tag = "SUBM_major"
elif tg == "SUBM" and pointer:
tag = "SUBM_pointer"
else:
# If a tag prints here, handle it above or add it to
# `gedcom_constants.IGNORE`.
print("in rename_primary_tag(); tg:", tg)
return tag
def rename_tag(self, num, tg, zerotag, supertag):
tag = tg
if tg == "NAME":
if zerotag == "INDI":
tag = "NAME_person"
elif zerotag == "REPO":
tag = "NAME_reposit"
elif zerotag == "SUBM":
tag = "NAME_submitter"
elif supertag in ("_PLAC", "_PLACE", "_LOC"):
tag = None
elif tg == "AGE":
if supertag in gcc.EVENT_TAGS:
tag = "AGE_event"
elif supertag == "HUSB":
tag = "AGE_event_husband"
elif supertag == "WIFE":
tag = "AGE_event_wife"
elif tg == "TYPE":
if supertag in ("NAME", "IDNO"):
tag = "TYPE_name"
elif supertag in ("EVEN", "FACT"):
tag = "TYPE_event"
elif supertag in gcc.EVENT_TAGS:
tag = "TYPE_event_detail"
elif supertag in ("ROMN", "FONE"):
tag = "TYPE_transcription"
elif supertag == "FORM":
tag = "TYPE_media"
elif tg == "TEXT":
if zerotag == "SOUR":
tag = "TEXT_source"
else:
tag = "TEXT_citation"
elif tg == "DATE":
if supertag == "DATA":
tag = None
elif supertag != "CHAN":
tag = "DATE_event"
else:
tag = None
elif tg == "FORM":
if supertag in ("OBJE", "FILE"):
tag = "FORM_media"
elif supertag in ("PLAC"):
tag = "FORM_place"
elif tg == "TITL":
if supertag == "OBJE":
tag = "TITL_media"
elif supertag == "FILE":
tag = "TITL_media"
elif zerotag == "INDI":
tag = "TITL_name_type"
elif zerotag == "SOUR":
tag = "TITL_source"
elif tg == "NICK":
tag = tg
elif tg == "MEDI":
if supertag == "CALN":
tag = None
else:
# If a tag prints here, handle it above or add it to
# `gedcom_constants.IGNORE`.
print("in rename_tag(); num, tg, zerotag, supertag:", num, tg, zerotag, supertag)
return tag
class UpgradeGEDCOMPlaces(ScrolledDialog):
""" Run a GUI for user input at the beginning of the import process to
convert GEDCOM's simplistic strings of comma-separated nested single
places into place IDs, place name IDs, and nested place IDs. Make it
easy for the user to indicate which single place names are the same
place and which are different places. For example: match up common
elements within "Paris, Texas", "Paris, France", and "Paris, Lamar
County, TX, USA".
"""
def __init__(
self, master, source_gedcom_file, gedport, *args, **kwargs):
ScrolledDialog.__init__(self, master, *args, **kwargs)
self.treebard = master
self.source_gedcom_file = source_gedcom_file
self.gedport = gedport
self.geometry("+100+50")
self.protocol("WM_DELETE_WINDOW", self.exit_exceptions_report)
conn = sqlite3.connect(appwide_db_path)
conn.execute("PRAGMA foreign_keys = 1")
cur = conn.cursor()
self.title("User Input: Extract Single Places from Nested Places")
self.places_tracker = {}
self.current_place_name = None
self.max_place_name_id = 1
self.max_place_id = 1
self.make_widgets()
cur.close()
conn.close()
ScrolledDialog.bind_canvas_to_mousewheel(self.canvas)
configall(self, self.formats)
self.resize_scrolled_content(self, self.canvas, add_x=16, add_y=24)
self.bind("<Visibility>", self.focus)
self.wait_window(self)
self.gedport.process2()
def focus(self, evt):
self.focus_force()
self.but.focus_set()
def make_widgets(self):
self.columnconfigure(1, weight=1)
self.window.columnconfigure(2, weight=1)
self.window.rowconfigure(1, minsize=60)
self.window.columnconfigure(0, weight=1)
self.window.rowconfigure(0, weight=1)
headlab = tk.Label(
self.window,
text=(f"Before importing data from \n`{self.source_gedcom_file}`,\n"
    "Treebard needs your help matching up the single place names "
    "inside GEDCOM's comma-separated nested place strings, so each "
    "unique place can get its own ID."),
justify="left", wraplength=600, bd=1, relief="raised")
self.main_menu = tk.Frame(self.window)
self.make_menu()
exit_import = tk.Button(
self.window, text="EXIT IMPORT PROCESS",
command=self.exit_exceptions_report)
headlab.grid(column=0, row=0, pady=12, ipadx=6, ipady=6, padx=12)
self.main_menu.grid(column=0, row=1)
exit_import.grid(column=0, row=2, sticky="e", pady=12)
widgets = self.main_menu.winfo_children()
def make_menu(self):
self.but = tk.Button(
self.main_menu, text="Upgrade GEDCOM Places", cursor="hand2",
command=self.open_places_dialog)
self.but.grid()
def open_places_dialog(self):
def close_dialog():
dlg.destroy()
dlg = ScrolledDialog(self)
dlg.protocol("WM_DELETE_WINDOW", close_dialog)
dlg.title("GEDCOM Places")
self.make_places_widgets(dlg)
ScrolledDialog.bind_canvas_to_mousewheel(dlg.canvas)
configall(dlg, self.formats)
self.update_idletasks()
right_pos = int(
self.winfo_screenwidth()/2 - dlg.window.winfo_reqwidth()/2)
dlg.resize_scrolled_content(dlg, dlg.canvas, add_x=16, add_y=24)
dlg.geometry(f"+{right_pos}+50")
def make_places_widgets(self, dlg):
def cancel_places():
dlg.destroy()
header_text = gcc.PLACES_HEADER
header = tk.Label(
dlg.window, text=header_text, justify="left",
wraplength=600, bd=1, relief="raised")
self.inputs = tk.Frame(dlg.window)
self.make_places_inputs()
buttons = tk.Frame(dlg.window)
all_unique = tk.Button(
buttons, text="UNHIGHLIGHTED PLACES ARE UNIQUE",
command=self.run_next_for_black)
all_unique.bind("<Leave>", self.check_if_places_done)
nexxt = tk.Button(
buttons, text="NEXT", width=7,
command=self.save_place)
self.done = tk.Button(
buttons, text="DONE", width=7, state="disabled",
command=self.ok_places)
nexxt.focus_set()
cancel = tk.Button(buttons, text="EXIT IMPORT PROCESS", command=cancel_places)
header.grid(column=0, row=0, pady=12, ipadx=6, ipady=6, padx=12)
self.inputs.grid(column=0, row=1, sticky="news", padx=12, pady=12)
buttons.grid(column=0,row=2, sticky="e", pady=12, padx=(0,12))
all_unique.grid(column=0, row=0)
nexxt.grid(column=1, row=0, padx=(6,0))
self.done.grid(column=2, row=0, padx=(6,0))
cancel.grid(column=3, row=0, padx=(6,0))
dlg.resize_scrolled_content(dlg, dlg.canvas, add_x=16, add_y=24)
def make_places_inputs(self):
r = 2
for nesting in self.gedport.gedcom_places:
self.places_tracker[nesting] = {
"row": []}
nested_places[r] = (nesting, [1] * 9)
nests = nesting.split(", ")
c = 0
for name in nests:
lab = tk.Label(
self.inputs, text=name, takefocus=1, bg="black", fg="white",
relief="raised", bd=3)
lab.grid(column=c, row=r, sticky="ew", padx=3, pady=3)
lab.bind("<Button-1>", self.toggle_colors)
lab.bind("<Control-Button-1>", self.highlight_alias)
lab.bind("<Leave>", self.check_if_places_done)
self.places_tracker[nesting]["row"].append(
{"widget": lab, "name": name})
c += 1
r += 1
def highlight_alias(self, evt):
evt.widget["bg"] = "red"
def check_if_places_done(self, evt):
places_done = True
for child in self.inputs.winfo_children():
if child["bg"] != "gray":
places_done = False
if places_done:
self.done["state"] = "normal"
def toggle_colors(self, evt):
def revert_id(widget):
    grid_info = widget.grid_info()
    column, row = grid_info["column"], grid_info["row"]
    place_id = nested_places[row][1][column]
    # The keys of nested_places are grid rows, so iterate items();
    # an enumerate() index starts at 0 and misses the actual keys.
    for key, tup in nested_places.items():
        for indx, num in enumerate(tup[1]):
            if num == place_id:
                nested_places[key][1][indx] = 1
widg = evt.widget
self.current_place_name = text = evt.widget["text"]
if widg["bg"] == "black":
widg["bg"] = "red"
for nesting, dkt in self.places_tracker.items():
for dk in dkt["row"]:
if dk["name"] == text and dk["widget"]["bg"] != "gray":
dk["widget"]["bg"] = "red"
elif dk["widget"]["bg"] == "gray":
pass
else:
dk["widget"]["bg"] = "black"
elif widg["bg"] == "gray":
widg["bg"] = "black"
for nesting, dkt in self.places_tracker.items():
for dk in dkt["row"]:
if dk["name"] == text:
dk["widget"]["bg"] = "black"
revert_id(dk["widget"])
else:
widg["bg"] = "black"
def save_place(self):
place_id = self.make_place()
names = set()
for child in self.inputs.winfo_children():
if child["bg"] == "red":
names.add(child["text"])
nesting_ids = nested_places[child.grid_info()["row"]][1]
nesting_ids[child.grid_info()["column"]] = place_id
child["bg"] = "gray"
for name in names:
self.make_place_name(place_id, name)
def run_next_for_black(self):
for child in self.inputs.winfo_children():
if child["bg"] == "black":
place_id = self.make_place()
nesting_ids = nested_places[child.grid_info()["row"]][1]
nesting_ids[child.grid_info()["column"]] = place_id
child["bg"] = "gray"
self.make_place_name(place_id, child["text"])
def make_place(self):
self.max_place_id += 1
places[self.max_place_id] = {"aliases": [], "LATI": None, "LONG": None}
return self.max_place_id
def make_place_name(self, place_id, place_name):
self.max_place_name_id += 1
places[place_id]["aliases"].append((self.max_place_name_id, place_name))
def ok_places(self):
self.gedport.ready_for_db += 1
if self.gedport.ready_for_db >= 2:
self.gedport.insert_to_db()
self.destroy()
def exit_exceptions_report(self):
self.destroy()
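The dialog above asks the user which single names inside different nestings refer to the same place. The simplest automatic policy (identical strings are the same place) can be sketched as below; the whole point of the GUI is to let the user override this string-equality assumption, so this hypothetical helper is illustration only:

```python
def auto_assign_place_ids(nestings):
    """Give every distinct single name one place ID across all nestings."""
    ids = {}
    next_id = 2  # low IDs are reserved elsewhere in the module
    result = {}
    for nesting in nestings:
        row_ids = []
        for name in nesting.split(", "):
            if name not in ids:
                ids[name] = next_id
                next_id += 1
            row_ids.append(ids[name])
        result[nesting] = row_ids
    return result

mapping = auto_assign_place_ids(["Paris, Texas", "Paris, France"])
assert mapping["Paris, Texas"][0] == mapping["Paris, France"][0]  # shared "Paris"
assert mapping["Paris, Texas"][1] != mapping["Paris, France"][1]
```

A file containing both "Paris, Texas" and "Paris, France" is exactly where the user would click to split the shared "Paris" ID into two places by hand.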
class ExceptionsReport(ScrolledDialog):
def __init__(self, master, source_gedcom_file, gedport, *args, **kwargs):
ScrolledDialog.__init__(self, master, *args, **kwargs)
self.treebard = master
self.source_gedcom_file = source_gedcom_file
self.gedport = gedport
self.geometry("+100+50")
self.protocol("WM_DELETE_WINDOW", self.exit_exceptions_report)
self.conn = sqlite3.connect(appwide_db_path)
self.conn.execute("PRAGMA foreign_keys = 1")
self.cur = self.conn.cursor()
self.title("GEDCOM Import Exceptions Log")
self.make_widgets()
ScrolledDialog.bind_canvas_to_mousewheel(self.canvas)
configall(self, self.formats)
self.resize_scrolled_content(self, self.canvas, add_x=16, add_y=24)
self.cur.close()
self.conn.close()
def make_widgets(self):
def ok():
self.gedport.ready_for_db += 1
if self.gedport.ready_for_db >= 2:
self.gedport.insert_to_db()
self.destroy()
self.columnconfigure(1, weight=1)
self.window.columnconfigure(2, weight=1)
self.window.rowconfigure(1, minsize=60)
self.window.columnconfigure(0, weight=1)
self.window.rowconfigure(0, weight=1)
headlab = tk.Label(
self.window,
text= f"Some of the information from the source GEDCOM file "
f"\n `{self.source_gedcom_file}` \ncould not be imported. \n\nThe "
f"exceptions log detailing these unimported data will be stored in the "
f"`etc` folder of your Treebard app.",
justify="left", wraplength=600, bd=1, relief="raised")
self.main_menu = tk.Frame(self.window)
self.write_exceptions_report()
buttons = tk.Frame(self.window)
ok_button = tk.Button(
buttons, text="ADD GEDCOM DATA TO NEW TREE", command=ok)
cancel_button = tk.Button(
buttons, text="EXIT IMPORT PROCESS",
command=self.exit_exceptions_report)
headlab.grid(column=0, row=0, pady=12, ipadx=6, ipady=6, padx=12)
self.main_menu.grid(column=0, row=1)
buttons.grid(column=0, row=2, sticky="e", pady=12)
ok_button.grid(column=0, row=0, sticky="e")
cancel_button.grid(column=1, row=0, sticky="e", padx = (6,0))
widgets = self.main_menu.winfo_children()
if len(widgets) != 0:
widgets[0].focus_set()
def write_exceptions_report(self):
mn = 70  # wrap instruction text at roughly this many characters per line
time_stamp = datetime.now().strftime("%d %B %Y %H:%M")
with open(self.gedport.exceptions_log, mode="w", encoding="utf-8-sig") as self.buts:
    self.buts.write((f"GEDCOM Import Exceptions Log of {time_stamp}\n "
        f"from {self.source_gedcom_file}\n"))
    self.buts.write(("Imported by Treebard Genealogy Software to UNIGEDS, the Universal "
        "Genealogy Database Structure\n and the replacement for GEDCOM\n\n"))
    if len(exceptions) == 0:
        self.buts.write(f"\nThere are no exceptions in {self.source_gedcom_file}.")
    for key in exceptions:
        title, instrux = gcc.EXCEPTION_MENU_ITEMS[key]
        self.buts.write(f"\n\n{title}\n\n")
        c = 0
        for word in instrux.split():
            self.buts.write(f"{word} ")
            c = c + len(word) + 1
            if c >= mn:  # the old `(i + 1) % lg >= 0` test was always true
                self.buts.write("\n")
                c = 0
self.write_buts(key)
def xyz(self, key):
""" See superceded `gedkandu.py` for procedures already developed for
handling more custom tags, weird tags, exceptions and edge cases.
"""
print("line", look(seeline()).lineno, "exceptions[key]", exceptions[key])
def write_buts(self, key):
except_funx = {
'bad_dates': self.write_bad_dates,
'gedcom_fam_events': self.write_family_events,
'custom_records': self.write_custom_records,
'PEDI_tag_is_useless': self.write_pedi,
'single_adopters': self.xyz,
'event_address_parts': self.xyz,
'researcher_interests': self.xyz,
'unknown_Y_values': self.xyz,
'per_specs_ALIA': self.xyz,
'missing_citations': self.xyz,
'undefined_assertions': self.xyz,
'eventless_roles': self.xyz,
'locators_linked_to_sources_via_repositories': self.write_locators,
'assertion_made_date_not_linked_to_citation': self.xyz}
# Dispatch directly instead of scanning the whole dict for a key match.
func = except_funx.get(key)
if func:
    func(key)
def write_custom_records(self, key):
for record in exceptions[key]:
primary = record[0]
lines = record[1:]
if len(lines) == 0:
continue
self.buts.write(f"\n\n{primary}\n")
for line in lines:
self.buts.write(f" {line}\n")
def write_bad_dates(self, key):
self.buts.write("\n\nDAY MONTH YEAR\n\n")
for row, item in enumerate(exceptions[key]):
event_id, bad_date = item
text = f"{bad_date} (event ID #{event_id})"
event = events[event_id]
for k,v in self.gedport.types["event_types"].items():
if v == event["EVNT_TYPE_FK"]:
event_type = k
break
current_person_name = None
couple_id = None
left_person_name = None
right_person_name = None
nested_place = None
particulars = None
age = None
age1 = None
age2 = None
if event.get("PRSN_FK"):
current_person_name = self.gedport.get_name(event["PRSN_FK"])
if event.get("CUPL_FK"):
couple_id = event["CUPL_FK"]
left_person_name = self.gedport.get_name(couples[couple_id]["PRSN1_FK"])
right_person_name = self.gedport.get_name(couples[couple_id]["PRSN2_FK"])
if event.get("PLACE_NEST_FK"):
nested_place = nested_places[event["PLACE_NEST_FK"]][0]
if event.get("EVNT_DETL"):
particulars = event["EVNT_DETL"]
if event.get("EVNT_AGE"):
age = event["EVNT_AGE"]
if event.get("EVNT_AGE1"):
age1 = event["EVNT_AGE1"]
if event.get("EVNT_AGE2"):
age2 = event["EVNT_AGE2"]
if event_type == "birth":
one_event = (f"GEDCOM mentions the birth of the person "
f"{current_person_name}, whose parents were\n "
f"{left_person_name}, age {age1}, and {right_person_name}, "
f"age {age2} at the time of the event.\n GEDCOM describes "
f"the event thus:\n Event place: {nested_place}.\n "
f"Event particulars: {particulars}.\n")
elif couple_id:
one_event = (f"GEDCOM mentions {event_type} event for the couple "
f"comprising\n {left_person_name}, age {age1}, and "
f"{right_person_name}, age {age2}.\n GEDCOM describes the "
f"event thus:\n Event place: {nested_place}.\n Event "
f"particulars: {particulars}.\n")
else:
one_event = (f"GEDCOM mentions {event_type} event for the person "
f"{current_person_name}, age {age}.\n GEDCOM describes "
f"the event thus:\n Event place: {nested_place}.\n "
f"Event particulars: {particulars}.\n")
self.buts.write(f"{text}:\n {one_event}\n")
def write_locators(self, key):
self.buts.write("\n\nREPOSITORY SOURCE LOCATOR\n\n")
lst = exceptions[key]
for dkt in lst:
repository_name = dkt["repository_name"]
source_title = dkt["source_title"]
locator = dkt["locator"]
self.buts.write(f"{repository_name} {source_title}\n")
self.buts.write(f" {locator}\n\n")
def write_pedi(self, key):
self.buts.write("\n\nEVENT TYPE PARENTS CURRENT PERSON\n\n")
lst = exceptions[key]
for dkt in lst:
event_type_string = dkt["event"][1]
left_person_id, left_person = dkt["left_person"]
right_person_id, right_person = dkt["right_person"]
current_person_id, current_person = dkt["current_person"]
self.buts.write(f"{event_type_string} \n")
self.buts.write((f" {left_person} ID #{left_person_id} & {right_person} "
f"ID #{right_person_id} \n"))
self.buts.write(f" {current_person} ID #{current_person_id}\n\n")
def write_family_events(self, key):
for couple_id, lst in exceptions["gedcom_fam_events"].items():
if len(lst) == 0:
continue
self.buts.write(f"\n\nIn FAM record for family #{couple_id}:")
for subrecord in lst:
self.buts.write("\n ...")
self.buts.write("\n ")
block = "\n ".join(subrecord)
self.buts.write(f"{block}")
def exit_exceptions_report(self):
self.destroy()
class CustomLinesPruner():
def __init__(self, superiors):
self.superiors = superiors
if exceptions.get("custom_records") is None:
exceptions["custom_records"] = []
def extract_subrecord(self, num, tag, value):
""" Extract custom tag lines and all lines subordinate to them.
For each line, identify its chain of superior tags, stopping when:
1) a custom tag is found in the chain of superiors, in which
case the tag is in a custom subrecord and is discarded, or
2) a zero-line is reached, in which case the tag is not in a
custom subrecord and is kept.
"""
line = None
a = num
num = str(num)
if a == 0:
line = " ".join([num, tag, value])
for i in range(a):
supertag = self.superiors[a-1][0]
if tag.startswith("_"):
break
elif tag.startswith("_") is False:
if supertag.startswith("_"):
break
elif a == 1:
line = " ".join([num, tag, value])
a -= 1
elif a > 1:
a -= 1
else:
print("case not handled")
return line
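The pruning in `extract_subrecord` amounts to a simple rule: once a custom (underscore) tag appears at level n, drop every following line whose level is greater than n. A standalone sketch of that rule (hypothetical helper; the real method also routes the dropped records to the exceptions report rather than silently discarding them):

```python
def prune_custom_lines(lines):
    kept = []
    custom_level = None  # level of the most recent custom tag, if still open
    for line in lines:
        num_s, tag = line.split(" ", 2)[:2]
        num = int(num_s)
        if custom_level is not None and num > custom_level:
            continue  # subordinate to a custom tag: discard
        custom_level = None  # back at or above the custom tag's level
        if tag.startswith("_"):
            custom_level = num  # open a custom subrecord
            continue
        kept.append(line)
    return kept

sample = ["0 @I1@ INDI", "1 _MILT WW2", "2 DATE 1944", "1 SEX M"]
assert prune_custom_lines(sample) == ["0 @I1@ INDI", "1 SEX M"]
```

The `_MILT` line and its subordinate `DATE` are both removed, while the sibling `SEX` line at the same level as `_MILT` survives.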
# DO LIST
# Import pandas and read the file in chunks, in RAM, and concat back together to get the resulting file, but keep it in RAM. Don't keep saving it over and over by different names, just do that once at the end. This is Tamura Jones' prescription to me: "Programs that manipulate GEDCOM in memory are the programs that import GEDCOM quickly. Writing to disc is what slows everything down." https://www.youtube.com/watch?v=97t9zmXeyD0
# ADD forum category gedcom_sample to repo.html
# add 2 new gedcom videos to youtube and video.html (watch it first and make .txt file), post on forum, post today's commit to forum
# whatever happened in ahnenblatt OBJE (re: non-primary records being made), fix it for ??? too and add lines to RA TEAL.ged to test it. Anything that only optionally uses a primary record. SOUR and ???
# make import & export work from all relevant buttons ie on the Treebard dialog and on the app text menu at top
# make another gedcom movie see do list fix stuff above here first
# redo tour 9, the other was edited but the sound and picture don't match anymore so delete it and make another based on the .txt description
# birth event age 0 not being added in db when creating birth events
# go thru gedcom_constants.py for items that can be deleted since they're no longer being used such as genbox name types
# event_type_id = self.types["event_types"][event_type] LINE 1065 MAKE A NEW event type if type doesn't exist, built_in = 0, hidden = 0, don't add to new_tree.py I REFUSE TO CALL census an event but user can if he wants, re-run get_types(), reTest fh7 and legacy
# after the export file is made, it should be displayed on a label; then if making another export file, the label should be blanked out on press of the button and replaced with the new display when the export is finished
# get rid of the export exceptions class or method, it's not needed
# make all the dialogs modal
# RE: multiple birth/death, ADD TO .ged so it can be tested: make them all assertions and don't conclude anything, just link the assertions to the birth event and create a death event with sources but no conclusions
# change EXIT IMPORT PROCESS TO CANCEL IMPORT PROCESS and use the reset() method to fix the db
# instruct user to copy gedcom files to etc directory or else navigate to it with the first button on the import tab
# OK and CANCEL buttons should work--cancel buttons already say EXIT IMPORT PROCESS but that is what they should actually do. Also, if you press CANCEL on the bad dates dialog, then try to open the dialog again, there is an error, maybe I forgot to delete the dlg from a dict?
# add exceptions for unhandled/ignored esp. CALN since it's needed for REPO to work in Treebard
# test all functionalities and look at all schemas to make sure link_links and misc_links are completely gone, see EXAMPLE 33 in sqlite guide for how to create a new blank default_new_tree.db by making changes only once in sample_tree.tbd
# why is the alias "U.S." being used for all
# keep track of file names created (temp files like .lux, head.ged, data.ged) and delete them at the end, also add the deletion of these files to the reset() method
# test with real gedcom files 551 and real genbox files
# Look for things that are not being concatenated. What about places eg "United Sta"?
# if idx in self.concatenations:
# value = self.concatenations[idx]
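The first DO-list item (keep the passes in RAM instead of re-writing temp files) can be sketched with `io.StringIO`: each pass reads one in-memory buffer and writes the next, and only the final text would ever touch disk. This is a hypothetical structure, not the importer's current design:

```python
import io

def run_passes_in_memory(gedcom_text, passes):
    """Chain file-style passes through in-memory buffers; return final text."""
    buffer = io.StringIO(gedcom_text)
    for one_pass in passes:
        out = io.StringIO()
        one_pass(buffer, out)  # every pass shares the same (src, dst) signature
        out.seek(0)
        buffer = out           # output of this pass feeds the next one
    return buffer.getvalue()

def strip_blank_lines(src, dst):
    """A trivial example pass: drop empty lines."""
    for line in src:
        if line.strip():
            dst.write(line)

result = run_passes_in_memory("0 HEAD\n\n0 TRLR\n", [strip_blank_lines])
assert result == "0 HEAD\n0 TRLR\n"
```

Because `StringIO` supports the same line-iteration and `write()` interface as an open file, the existing `split_file`/`evict_non_gedcom`/`fix_non_couple_fam_events`/`translate` passes could in principle be adapted to this shape with minimal changes to their loop bodies.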