GEDCOM import code is alive (first draft)
Post by Uncle Buddy on Oct 15, 2023 1:08:17 GMT -8
It ain't perfect, boys and girls, but it's a GEDCOM import program. It works, and I wrote it all on my lonesome.
Three files are needed, which I'll copy in this thread: `gedkandu.py` is the import program, `gedcom.py` holds mostly constants needed by the import program, and `r_a_teal_gedcom.ged` is the GEDCOM file I used to test the import program. I'll also attach the database that the types are stored in, since the code needs that to run. Another file is optional; it's in the code repository on this forum. If you get the paths right for your computer, it should work. There are five paths for you to edit in `gedkandu.py`: CTRL-F "d:/" to find them.
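If editing five hard-coded paths gets tedious, one option is to build them all from a single drive root. This is only a sketch and not how `gedkandu.py` currently works; `build_paths` and its dictionary keys are hypothetical names, and the folder layout is the one assumed from the paths in the posted code.

```python
from pathlib import Path


def build_paths(drive_root, ged_name="r_a_teal_gedcom.ged"):
    """Hypothetical helper: derive the Treebard paths from one drive root."""
    root = Path(drive_root)
    app_dir = root / "treebard_gps" / "app" / "python"
    return {
        # The GEDCOM file to import.
        "import_file": app_dir / ged_name,
        # The SQLite database holding the types tables.
        "global_db_path": root / "treebard_gps" / "data" / "settings" / "unigeds.db",
        # Folder where the HEAD, DATA and gedMOM text files would be written.
        "output_dir": app_dir,
    }


paths = build_paths("d:/")
print(paths["import_file"].as_posix())
```

With something like this, changing the drive would mean changing one string instead of five.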
To see the results, uncomment the parts of `self.print_results()` that you're interested in. For these print lines to work, you'll need a file called `dev_tools.py`, which is in the code repository (see the main page of the forum). If you don't care to see the line numbers in the print results, leave the commented lines commented and add your own print lines. If you don't want to import dev_tools, also comment out the lines that import it. The dev_tools.py module is here: treebard.proboards.com/thread/191/dev-tools-py
This program imports almost every GEDCOM 5.5.1 tag and a few other things that I thought were important.
Three text files will be written to your computer every time you run the code. One is the head, extracted from the GEDCOM file. Another is the body of the GEDCOM without the head. The third is a `.mom` text file, which is a translation of the GEDCOM's body in which the GEDCOM tags have been replaced with unambiguous tags. These three text files can just be deleted. They build up pretty fast, so I could add some code to delete them automatically, but I haven't gotten around to it yet. Here's what they look like in Windows Explorer:
r_a_teal_gedcom_202310151849_DATA.ged
r_a_teal_gedcom_202310151849_gedMOM.mom
r_a_teal_gedcom_202310151849_HEAD.ged
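For anyone who wants the automatic cleanup mentioned above, here is a minimal sketch of how it could be done. It is not part of `gedkandu.py`; `output_names` and `delete_old_outputs` are hypothetical helpers, and the sketch assumes the output files keep the `<name>_<timestamp>_HEAD.ged` / `_DATA.ged` / `_gedMOM.mom` naming shown in the listing.

```python
from datetime import datetime
from pathlib import Path

# Suffixes of the three text files written on each run.
SUFFIXES = ("_HEAD.ged", "_DATA.ged", "_gedMOM.mom")


def output_names(import_file, now=None):
    """Return the three output file names for one run of the importer."""
    stem = Path(import_file).stem
    stamp = (now or datetime.now()).strftime("%Y%m%d%H%M")
    return [f"{stem}_{stamp}{suffix}" for suffix in SUFFIXES]


def delete_old_outputs(folder, import_file):
    """Delete timestamped output files, leaving the original .ged alone."""
    stem = Path(import_file).stem
    removed = []
    for suffix in SUFFIXES:
        # The pattern requires a digit right after the stem, so the
        # untimestamped import file itself never matches.
        for path in list(Path(folder).glob(f"{stem}_[0-9]*{suffix}")):
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Calling `delete_old_outputs` on the output folder before or after a run would keep the folder from filling up.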
More information is in the file itself. The results are Python dictionaries that match UNIGEDS' data structure. This code is independent of my Treebard software except for the types tables, which it reads from the database I'm attaching here. The code looks for it at: <drive>:\treebard_gps\data\settings\unigeds.db
Download this file and change the extension to `.db`.
unigeds.txt (140 KB)
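Once the download is renamed, a quick check that it really is a SQLite database can save some head-scratching. The sketch below is only a suggestion; `as_db_path` and `list_tables` are hypothetical helper names, and no particular table names are assumed.

```python
import sqlite3
from pathlib import Path


def as_db_path(downloaded):
    """Return the downloaded file's path with the extension changed to .db."""
    return Path(downloaded).with_suffix(".db")


def list_tables(db_path):
    """Return the names of the tables found in a SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

For example, `Path("unigeds.txt").rename(as_db_path("unigeds.txt"))` does the renaming, and `list_tables("unigeds.db")` should then return a non-empty list if the file arrived intact.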
# gedkandu.py
"""
October 15, 2023
From a design standpoint, this code is not perfect. There are some minor
redundancies and inconsistencies of function and design, and therefore some
points of potential confusion and places where the code just needs to be
cleaned up (unused parameters, for example). But it is a working first draft
and it took almost 1-1/2 years of off-and-on stabs in the dark, and finally
more than two months of full-time work, to create this file, my first working
GEDCOM import program.
The result of running this code: some Python dictionaries that are matched to
the expectations of my SQLite database structure, UNIGEDS. I already know this
nested dictionary structure will slide right into UNIGEDS, as that was the
first thing I tested before starting the current project two months ago. I also
plan to write an interactive exception report which the user can run in a GUI
before the data is used to create the SQL database. The interactive exception
report will give the user a chance to add loose ends to the Python dictionaries.
By "loose ends" I mean data that GEDCOM was not able to communicate well enough
to satisfy the data structure of UNIGEDS. The entire process can be summarized
in this way:
    GEDCOM .ged text file is exported by a genealogy software application.
    GEDKANDU translates GEDCOM to Python dictionaries.
    InterProCept is an interactive, pro-active user interface for adding loose
        ends to the collection of data that UNIGEDS will receive.
    UNIGEDS is a SQLite database designed to match the real-world relationships
        of real-world genealogy data to each other.
    Treebard GPS is the first genealogy software to use UNIGEDS as its back-end.
Any and all genieware products can be adapted to use UNIGEDS as their
back end, and in this way, GEDCOM could be replaced. Treebard is not a
finished product meant for daily use, but a working model written in Python
for convenience, as well as a showcase of functionalities. Treebard can be
translated from Python to any other programming language.
I am not a programmer. Therefore, my work is accessible to other non-programmers.
While I naturally would make the mistakes that novice programmers make, I
literally can't make the mistake that expert programmers make, which is to
write code that only experts can use. The purpose of my project is to prove
that genealogists can write their own software.
Anyone can use any of the products listed above for any purpose, for free
without my permission, without paying me. All my work in the field of
genealogy software creation is in the public domain, with no exception.
If you want to donate, contact me and we'll find a way. This is a one-man
project and I'm planning to keep it that way, as I am old and need to get
away from the computer for my health. A team project is probably not my
cup of tea. Many thanks to all who have accidentally encouraged me...
'cause as far as I know, nobody has encouraged me on purpose!
"""
import sqlite3
from datetime import datetime
# from files import global_db_path
import gedcom as ged
import dev_tools as dt
from dev_tools import looky, seeline
# TO TEST THIS CODE:
# Replace `d:/` in 5 places with your current drive.
# Keep the folder structure as-is, this is how Treebard works.
# Or change it if you prefer, then change the paths (5 of them).
import_file = "d:/treebard_gps/app/python/r_a_teal_gedcom.ged"
global_db_path = "d:/treebard_gps/data/settings/unigeds.db"
def make_filenames():
    filename = import_file.split("/")[-1].replace(".ged", "")
    stamp = datetime.now().strftime("%Y%m%d%H%M")
    translation = f"d:/treebard_gps/app/python/{filename}_{stamp}_gedMOM.mom"
    return translation
translation = make_filenames()
superior = {level: None for level in range(100)}
class GedKanDu():
    def __init__(self, import_file):
        self.do_function = {
            "PRSN": self.do_person,
            "CUPL": self.do_couple,
            "RMRK": self.do_remark,
            "MDIA": self.do_media,
            "CNTCT": self.do_contact,
            "SORC": self.do_source,
            "RPST": self.do_repository,
            "RPST_NAME": self.do_repository_name,
            "RPST_ADRS": self.do_address,
            "CNTCT_NAME": self.do_contact_details,
            "CNTCT_LANGUAGE": self.do_contact_details,
            "CNTCT_ADDRESS": self.do_address,
            "EVNT_ADRS": self.do_address,
            "NAME_STRG": self.do_person_name,
            "RMRK_STRG": self.do_remark,
            "PLACE_NEST_STRG": self.do_place,
            "PLACE_TYPE_STRG": self.do_place,
            "_LOC": self.do_place,
            "_PLAC": self.do_place,
            "_PLACE": self.do_place,
            "PLACE_LTTD": self.do_place,
            "PLACE_LNGTD": self.do_place,
            "TRNSCRPN_STRG_R": self.do_transcription,
            "TRNSCRPN_STRG_P": self.do_transcription,
            "TRNSCRPN_TYPE_STRG": self.do_transcription_type,
            "ALTR": self.do_change_date,
            "ALTR_TIME": self.do_change_time,
            "ALTR_DATE": self.do_change_date,
            "EVNT_DATE": self.do_event_date,
            "ASRTN_MADE_DATE": self.do_assertion_made_date,
            "PRSN_GNDR": self.do_gender,
            "EVNT_AGE": self.do_age,
            "EVNT_AGE1": self.do_age,
            "EVNT_AGE2": self.do_age,
            "EVNT_TYPE_STRG": self.do_event_type,
            "NAME_TYPE_STRG": self.do_name_type,
            "MDIA_EXTN": self.do_media,
            "MDIA_FILE": self.do_media,
            "MDIA_TITLE": self.do_media,
            "MDIA_TYPE_STRG": self.do_media,
            "SORC_PBLN": self.do_publication,
            "SORC_ATHR": self.do_author,
            "SORC_TITL": self.do_source_title,
            "LCTR_STRG": self.do_locator,
            "ASRTN_SRTY": self.do_surety,
            "MEDI": self.do_media,
            "ANCI": self.do_project,
            "DESI": self.do_project,
            "LANG": self.do_contact,
            "AGNC": self.do_source_creator,
            "ABBR": self.do_source_abbrev,
            "REFN": self.do_reference_number,
            "AFN": self.do_reference_number,
            "RFN": self.do_reference_number,
            "RIN": self.do_reference_number}
        self.FKS = { # foreign keys (pointers)
            "PRSN_FK": self.do_person_fk, "RMRK_FK": self.do_remark_fk,
            "CNTCT_FK": self.do_contact_fk, "RPST_FK": self.do_repository_fk,
            "DESI": self.do_contact_fk, "ANCI": self.do_contact_fk,
            "MDIA_FK": self.do_media_fk, "SORC_FK": self.do_source_fk}

        self.conn = sqlite3.connect(global_db_path)
        self.conn.execute("PRAGMA foreign_keys = 1")
        self.cur = self.conn.cursor()

        self.event_type = None
        self.types = {
            "name_types": {}, "event_types": {}, "role_types": {},
            "source_types": {}, "locator_types": {}, "media_types": {},
            "relationship_types": {}, "repository_types": {},
            "place_types": {}, "transcription_types": {}}
        # Change each ID to the corresponding max_id.
        self.get_types()

        self.head = []
        self.vendor = ""
        self.submitter = ""
        self.lines = {}
        self.custom_tags = {}
        self.exceptions = {}
        self.primary_tag = None
        self.adoptees = {}
        self.address_block = []
        self.current_address = ""
        self.nested_place_counter = set()
        self.single_place_counter = set()
        self.nested_place_string = ""
        self.citations_filter = {}
        self.asso = False
        self.relationship_labels = {}
        self.get_relationship_types()
        self.witn = False
        self.fh_shared_event = False
        self.role_person_id = None
        self.concatenating = False
        self.base = []
        self.anchor = None
        self.concatenations = {}
        self.remarks_text_to_file = []
        self.good_chan = False
        self.in_chandatetime = False

        # current record PKs with GEDCOM identifiers:
        self.person_id = None
        self.couple_id = None
        self.remark_id = None
        self.source_id = None
        self.contact_id = None
        self.media_id = None
        self.repository_id = None

        # current record PKs with no GEDCOM identifiers
        self.name_id = 0
        self.citation_id = 0
        self.assertion_id = 0
        self.place_id = 1
        self.place_name_id = 100
        self.nested_place_id = 1000
        self.event_id = 0
        self.locator_id = 0
        self.handle_id = 0
        self.chart_id = 0
        self.report_id = 0
        self.project_id = 0
        self.to_do_id = 0
        self.transcription_id = 0
        self.links_links_id = 0
        self.places_types_id = 0
        self.change_date_id = 0

        # `self.person_id` changes with FK also; `self.current_person_id` is PK only:
        self.current_person_id = 0
        self.current_media_id = 0

        self.persons = {}
        self.couples = {}
        self.remarks = {}
        self.sources = {}
        self.contacts = {}
        self.media = {}
        self.repositories = {}
        self.names = {}
        self.citations = {}
        self.assertions = {}
        self.places = {}
        self.place_names = {}
        self.nested_places = {}
        self.events = {}
        self.locators = {}
        self.handles = {}
        self.charts = {}
        self.reports = {}
        self.projects = {}
        self.to_dos = {}
        self.transcriptions = {}
        self.places_types = {}
        self.links_links = {}
        self.change_dates = {
            "PRSN": {}, "RMRK": {}, "RPST": {}, "CNTCT": {}, "MDIA": {},
            "CUPL": {}, "SORC": {}}

        self.process_gedcom()
        self.cur.close()
        self.conn.close()
        self.print_results()
    def process_gedcom(self):
        self.split_file(import_file)
        self.save_head()
        self.read1()
        self.save_lines()
        self.save_citations_assertions()
        self.adjust_for_adoptions()
        self.link_remarks_texts()

    def link_remarks_texts(self):
        for tup in self.remarks_text_to_file:
            remark_id, idx = tup
            self.remarks[remark_id]["RMRK_STRG"] = self.concatenations[idx]
    def split_file(self, import_file):
        filename = import_file.split("/")[-1].replace(".ged", "")
        stamp = datetime.now().strftime("%Y%m%d%H%M")
        self.head_file = f"d:/treebard_gps/app/python/{filename}_{stamp}_HEAD.ged"
        self.data_file = f"d:/treebard_gps/app/python/{filename}_{stamp}_DATA.ged"
        with open(import_file, mode="r", encoding="utf-8-sig") as intake, open(
                self.head_file, mode="w", encoding="utf-8-sig") as head:
            for idx, line in enumerate(intake):
                if line.startswith("0 @") is False:
                    head.write(line)
                else:
                    here = idx
                    break
        with open(import_file, mode="r", encoding="utf-8-sig") as intake, open(
                self.data_file, mode="w", encoding="utf-8-sig") as exhaust:
            for idx, line in enumerate(intake):
                if idx >= here:
                    exhaust.write(line)
    def save_head(self):

        def get_vendor(line):
            self.vendor = line.split()[2]

        def get_submitter(line):
            fk = line.split()[2]
            self.submitter = int(''.join(c for c in fk if c.isdigit()))

        # `with` closes the head file when the loop is done.
        with open(self.head_file, "r", encoding="utf-8-sig") as h:
            for line in h:
                line = line.replace("\n", "")
                self.head.append(line)
        HEAD_DICT = {"SOUR": get_vendor, "SUBM": get_submitter}
        for line in self.head:
            for k, v in HEAD_DICT.items():
                if k in line:
                    v(line)
                    break
    def read1(self):
        """ Translate GEDCOM tags to unambiguous tags which have one usage each.
            Get rid of tags which convey no information. This is supposed to be
            more efficient without `readlines()`. If I'm correct, `readlines()`
            stores the whole file in memory as a collection of lines, so maybe
            better be avoided for large files.

            Re-order zero-lines so the item order is "num, tag, value" like all
            the other lines.

            Every zero-line should have 4 items since the 0 NOTE line can have
            4 items and some 0 _PLAC lines have 4 items. Every non-zero line
            should have 3 items, e.g. `1 EVEN` expands to `1 EVEN null`. With
            symmetrical lines, line length never has to be tested.

            Remove the HEAD record so its tags don't have to be side-stepped
            throughout the entire process.
        """
        supers = dict(superior)
        with open(self.data_file, mode="r", encoding="utf-8-sig") as gedcom:
            with open(translation, mode="w", encoding="utf-8-sig") as gedmom:
                for ln in gedcom:
                    # Strip the newline first, or a `0 TRLR` line that ends
                    # with "\n" is not recognized as the trailer.
                    if ln.rstrip().endswith("TRLR"):
                        return
                    elif ln.startswith("0"):
                        line = ln.replace("\n", " fourth")
                        lx = line.split(" ", 3)
                        lst = lx[0:4]
                        x3 = lst[3]
                        if x3 != "fourth" and x3.endswith("fourth"):
                            fix = x3.replace(" fourth", "", -1)
                            lst[3] = fix
                    else:
                        line = ln.replace("\n", " null")
                        lx = line.split(" ", 2)
                        lst = lx[0:3]
                        x2 = lst[2]
                        if x2 != "null" and x2.endswith("null"):
                            fix = x2.replace(" null", "", -1)
                            lst[2] = fix

                    if lst[0] == "0" and lst[1] != "TRLR":
                        num, value, tag, more = lst
                        if more == "fourth":
                            more = ""
                        lst = [num, tag, value, more]
                        self.primary_tag = tag
                    else:
                        num, tag, value = lst
                    supers[int(num)] = (tag, value)

                    new_tag = tag
                    if (tag in ged.IDENTIFIERS and num != "0" and
                            value.startswith("@") and value.endswith("@")):
                        new_tag = f"{ged.IDENTIFIERS[tag]}_FK"
                        # The new tag will be needed right away for ambig. `DATA`:
                        supers[int(num)] = (new_tag, value)
                    elif tag in ged.VAGUE_TAGS:
                        supertag = supers[int(num)-1][0]
                        self.primary_tag = supers[0][0]
                        new_tag = ged.VAGUE_TAGS[tag][supertag]
                    elif tag.startswith("_"):
                        new_tag = tag
                    elif tag in ("ASSO", "RELA", "CONC", "CONT"):
                        new_tag = tag
                    elif tag in (
                            "HUSB", "WIFE", "FAMS", "FAMC", "CHIL", "BIRT",
                            "ADOP", "PEDI"):
                        new_tag = tag
                    elif tag in ged.ADDRESS_TAGS:
                        new_tag = tag
                    else:
                        new_tag = ged.TAGS[tag]
                    lst[1] = new_tag
                    new_line = " ".join(lst)
                    gedmom.write(f"{new_line}\n")
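    def get_types(self):
        for table in ged.MAX_TYPE_IDS:
            self.cur.execute(f"SELECT {table}_id, {table}s FROM {table}")
            types = self.cur.fetchall()
            self.types[f"{table}s"] = {tup[1]: tup[0] for tup in types}
            self.cur.execute(f"SELECT MAX({table}_id) from {table}")
            max_id = self.cur.fetchone()[0]
            # Assumed intent, going by the "Change each ID to the corresponding
            # max_id" comment in `__init__`: store the highest ID in use so
            # new types are numbered after it.
            if max_id is not None:
                ged.MAX_TYPE_IDS[table] = max_id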
    def save_lines(self):
        with open(translation, mode="r", encoding="utf-8-sig") as gedmom:
            for idx, ln in enumerate(gedmom):
                line = ln.replace("\n", "")
                self.read2(idx, line)

        with open(translation, mode="r", encoding="utf-8-sig") as gedmom:
            for idx, ln in enumerate(gedmom):
                line = ln.replace("\n", "")
                self.read3(idx, line)
    def read2(self, idx, line):
        if line.startswith("0"):
            lst = line.split(" ", 3)
            num, (tag, value, more) = int(lst[0]), lst[1:]
            superior[num] = [tag, value, more]
        else:
            lst = line.split(" ", 2)
            num, (tag, value) = int(lst[0]), lst[1:]
            superior[num] = [tag, value]
        num = int(num)
        lst[0] = num
        if num == 0:
            self.read_primary_key_line(idx, lst)
            if self.concatenating is True:
                self.concatenations[self.anchor] = "".join(self.base)
                self.concatenating = False
            if len(more) != 0:
                self.base = [more]
            else:
                self.base = []
            self.anchor = idx
        elif tag not in ("CONC", "CONT"):
            if self.concatenating and len(self.base) != 0:
                self.concatenations[self.anchor] = "".join(self.base)
            self.base = [value]
            self.anchor = idx
            self.concatenating = False
        elif tag in ("CONC", "CONT"):
            self.concatenating = True
            tag, text = lst[1:3]
            if tag == "CONT":
                text = f"\n{text}"
            self.base.append(text)
    def do_remark(self, idx, num, tag, value, more=""):
        self.links_links_id += 1
        linked_tag = None
        linked_id = None
        remark_fk = None
        idnum = None
        key = None
        if num == 0:
            remark_fk = self.remark_id = value
            self.remarks[self.remark_id] = {"RMRK_STRG": ""}
            self.remarks_text_to_file.append((self.remark_id, idx))
        else:
            linked_tag, linked_id = superior[num-1][0:2]
            remark_string = value
            if linked_id.startswith("@"):
                idnum = int("".join(c for c in linked_id if c.isdigit()))
                if linked_tag.endswith("_FK"):
                    key = linked_tag
                else:
                    key = f"{linked_tag}_FK"
            if linked_tag == "PRSN":
                key = "PRSN_FK"
            elif linked_tag == "SORC":
                key = "SORC_FK"
            elif linked_tag == "SUBM":
                key = "CNTCT_FK"
            elif linked_tag == "OBJE":
                key = "MDIA_FK"
            elif linked_tag == "REPO":
                key = "RPST_FK"
            elif linked_tag == "FAM":
                key = "CUPL_FK"
            elif linked_tag == "NAME_STRG":
                idnum = self.name_id
                key = "NAME_FK"
            elif linked_tag == "BIRT" or linked_tag in (ged.EVENT_TYPES.values()):
                idnum = self.event_id
                key = "EVNT_FK"
            elif linked_tag == "PLACE_NEST_STRG":
                idnum = self.nested_place_id
                key = "PLACE_NEST_FK"
            elif linked_tag == "CTTN_STRG":
                idnum = self.citation_id
                key = "CTTN_FK"
            elif linked_tag == "ASRTN_STRG":
                idnum = self.assertion_id
                key = "ASRTN_FK"
            elif linked_tag == "LCTR_STRG":
                idnum = self.locator_id
                key = "LCTR_FK"
            remark_fk = self.remark_id = max(list(self.remarks.keys())) + 1
            self.links_links[self.links_links_id] = {"RMRK_FK": remark_fk, key: idnum}
            self.remarks[self.remark_id] = {"RMRK_STRG": remark_string}
        self.lines[idx] = [num, tag, value, more]
    def read_primary_key_line(self, idx, lst):
        num, tag, value, more = lst[0:]
        if tag.startswith("_"):
            if (tag.startswith("_PLAC") or tag.startswith("_PLACE") or
                    tag == "_LOC"):
                pass
            else:
                self.file_custom_tag(idx, lst)
        pk = int("".join(c for c in value if c.isdigit()))
        if tag not in ("_PLAC", "_PLACE", "_LOC"):
            self.do_function[tag](idx, num, tag, pk)
        else:
            pass
    def read3(self, idx, line):
        if line.startswith("0"):
            lst = line.split(" ", 3)
            num, (tag, value, more) = int(lst[0]), lst[1:]
        else:
            lst = line.split(" ", 2)
            num, (tag, value) = int(lst[0]), lst[1:]
        if idx in self.concatenations:
            value = self.concatenations[idx]
        if num == 0:
            superior[num] = [tag, value, more]
        else:
            superior[num] = [tag, value]

        if num == 0:
            # Re-order num, tag, value for zero-lines only:
            self.primary_tag = tag
            if tag == "PRSN":
                self.current_person_id = int(
                    "".join(c for c in value if c.isdigit()))
        elif tag == "ROLE_TYPE_STRG" or tag in (ged.ROLE_TAGS):
            self.do_role(idx, num, tag, value)
        elif tag in ("_AKA", "_AKAN", "_USED") or tag in ged.NAME_TYPES.values() or (
                tag == "_TYPE" and value in ged.GENBOX_NAME_TYPES) or (
                tag == "TYPE" and superior[num-1][0] == "ID_number") or (
                tag == "TYPE" and superior[num-1][0] == "NAME_STRG"):
            self.do_name_type(idx, num, tag, value)
        elif tag.startswith("_"):
            self.file_custom_tag(idx, lst)
        elif tag in ("ALTR", "ALTR_TIME"):
            if (tag == "ALTR" and num != 1) or (tag == "ALTR_TIME" and num != 3):
                self.good_chan = False
                if tag == "ALTR":
                    self.in_chandatetime = True
                elif tag == "ALTR_TIME":
                    self.in_chandatetime = False
            elif (tag == "ALTR" and num == 1):
                self.in_chandatetime = True
                self.good_chan = True
                self.do_change_date(idx, num, tag, value)
            elif (tag == "ALTR_TIME" and num == 3):
                self.good_chan = False
                self.in_chandatetime = False
                self.do_change_date(idx, num, tag, value)
        elif tag == "ALTR_DATE":
            if self.good_chan:
                self.do_change_date(idx, num, tag, value)
            elif self.good_chan is False and self.in_chandatetime:
                pass
        elif tag in ged.EVENT_TYPES.values():
            value = None
            self.event_type = tag
            if lst[2] != "null":
                value = lst[2]
            self.do_event(idx, num, tag, value, more=self.event_type)
        elif tag in ("CTTN_STRG", "ASRTN", "ASRTN_STRG", "ASRTN_SRTY"):
            if tag == "ASRTN_STRG" and num == 1:
                new_tag = "RMRK_STRG"
            else:
                new_tag = tag
            self.lines[idx] = [int(num), new_tag, value]
        elif tag in ("BIRT", "FAMC", "CHIL", "ADOP", "PEDI"):
            self.do_child_fk(idx, num, tag, value)
        elif tag in ("FAMS", "HUSB", "WIFE"):
            self.do_partner_fk(idx, num, tag, value)
        elif tag in ged.ADDRESS_TAGS:
            self.do_address(idx, num, tag, value)
        elif value.startswith("@") and value.endswith("@"):
            self.read_foreign_key_line(idx, num, tag, value)
        else:
            if tag in ("CONC", "CONT"):
                return
            elif value == "null":
                return
            else:
                pass
            self.do_function[tag](idx, num, tag, value)
    def read_foreign_key_line(self, idx, num, tag, value):
        fk = int("".join(c for c in value if c.isdigit()))
        self.FKS[tag](idx, num, tag, fk)
        self.lines[idx] = [num, tag, fk]

    def file_custom_tag(self, idx, line):
        self.custom_tags[idx] = line

    def do_person(self, idx, num, tag, value, more=""):
        self.person_id = value
        self.persons[value] = {"PRSN_GNDR": "unknown"}
        self.lines[idx] = [num, tag, value, more]
    def do_couple(self, idx, num, tag, value, more=""):
        # Default to the incoming tag so `new_tag` is always defined,
        # including when this is not a zero-line.
        new_tag = tag
        if num == 0:
            new_tag = "CUPL"
            self.couple_id = value
            self.couples[value] = {"PRSN1_FK": None, "PRSN2_FK": None}
        elif value.startswith("@"):
            print("line", looky(seeline()).lineno, "tag, value:", tag, value)
        self.lines[idx] = [num, new_tag, value, more]
    def do_media(self, idx, num, tag, value, more=""):
        """ OBJE, FILE, FORM, MEDI:

            1) Legacy use of SOUR.MEDI:
                0 @36@ SOUR
                1 MEDI Email
            2) source_repository_citation
                Why is a source type being linked to a locator (call number/CALN)?
                0 @6@ SOUR
                1 REPO @7@
                2 CALN 13B-1234.01
                3 MEDI Microfilm
            3) multimedia_record
                0 @O1@ OBJE
                1 FILE C:/throat singing compilation by simon maxwell-stewart.mp3
                2 FORM MP3
                3 TYPE audio
                2 TITL Throat Singing
            4) multimedia_link 5.5.1
                0 @I44@ INDI
                1 EVEN
                2 OBJE
                3 FILE D:/INVENTORS/Floyd Teal interview from CD/SideApt1.mp3
                4 FORM MP3
                5 MEDI audio
                3 TITL Floyd Teal Interview
            5) multimedia_link 5.5
                0 @I34@ INDI
                1 BIRT
                2 OBJE
                3 FORM pdf
                4 MEDI book
                3 TITL Asa C Brown Family Bible
                3 FILE C:/Users/Lutherman/Documents/Crosby family bible.pdf

            To differentiate media type from source type (since there is some
            valid overlap): the source pre-exists the media, which is any
            representation of the source that conveys or tries to convey the
            source's existence, its appearance, its content, etc.
        """
        if num != 0:
            supertag, superval = superior[num-1][0:2]
        if tag == "MDIA": # primary record zero line
            if self.media.get(value) is None:
                self.media_id = value
                self.media[value] = {
                    "MDIA_FILE": "", "MDIA_TYPE_FK": None, "MDIA_TITLE": "",
                    "MDIA_EXTN": ""}
        elif tag == "MDIA_FILE":
            if superval != "null":
                self.media_id = int("".join(c for c in superval if c.isdigit()))
                self.media[self.media_id]["MDIA_FILE"] = value
            else:
                self.media_id += 1
                self.media[self.media_id] = {
                    "MDIA_FILE": value, "MDIA_TYPE_FK": None, "MDIA_TITLE": "",
                    "MDIA_EXTN": ""}
        # Handle vendors who use SOUR.MEDI in a source record for source type:
        elif tag == "MDIA_TYPE_STRG" and num == 1:
            source_id = int("".join(c for c in superval if c.isdigit()))
            if value in self.types["source_types"]:
                source_type_id = self.types["source_types"][value]
            else:
                source_type_id = max(self.types["source_types"].values()) + 1
                self.types["source_types"][value] = source_type_id
            self.sources[source_id]["SORC_TYPE_FK"] = source_type_id
        elif tag == "MDIA_TYPE_STRG" and supertag != "LCTR_STRG":
            if value in self.types["media_types"]:
                media_type_id = self.types["media_types"][value]
            else:
                media_type_id = max(self.types["media_types"].values()) + 1
                self.types["media_types"][value] = media_type_id
            self.media[self.media_id]["MDIA_TYPE_FK"] = media_type_id
        elif tag == "MDIA_TYPE_STRG" and supertag == "LCTR_STRG":
            # SOUR.REPO.CALN.MEDI
            self.exceptions["missing_citations"] = (idx, num, tag, value)
        elif tag == "MDIA_EXTN":
            self.media[self.media_id]["MDIA_EXTN"] = value
        elif tag == "MDIA_TITLE":
            self.media[self.media_id]["MDIA_TITLE"] = value
        self.lines[idx] = [num, tag, value, more]
    def do_contact(self, idx, num, tag, value):
        """ 0 @pk@ SUBM
                NAME value
                ADDR value
                CONT value
                ADR1 value
                ADR2 value
                ADR3 value
                CITY value
                STAE value
                POST value
                CTRY value
                PHON value
                EMAIL value
                FAX value
                WWW value # PAF used URL
                LANG value
                OBJE @fk@ (or:)
                OBJE
                FILE value
                FORM value
                MEDI value
                TITL value
        """
        self.contacts[value] = dict(ged.BLANK_CONTACTS)
        self.lines[idx] = [num, tag, value]
    def do_address(self, idx, num, tag, value):
        """ GEDCOM imports site addresses which are linked to genealogy events
            with ADDR instead of PLAC, so event addresses could be stored in
            UNIGEDS' contact table which was not designed for that purpose.
            Instead, the ADRS tag when subordinate to an event will be sent to
            the interactive exception report where the user will be asked to
            link the street address to the genealogical place, and if there is
            other information besides just an address line, the user will be
            able to 1) discard the other info such as irrelevant zip codes etc.,
            2) store it in a note, or 3) store it in Treebard's contacts feature.
        """
        contact_id = None
        supertag, superval = superior[num-1][0:2]
        if tag == "EVNT_ADRS":
            if self.exceptions.get("event_address_parts") is None:
                self.exceptions["event_address_parts"] = {}
            self.exceptions["event_address_parts"][value] = {}
            self.current_address = value
        elif tag in ged.ADDRESS_TAGS and self.primary_tag in ("CUPL", "PRSN"):
            self.exceptions["event_address_parts"][self.current_address][tag] = value
        elif tag == "RPST_ADRS":
            contact_id = max(list(self.contacts.keys())) + 1
            repository_id = int("".join(c for c in superior[0][1] if c.isdigit()))
            self.address_block = value.split("\n")
            if self.contacts.get(contact_id) is None:
                self.contacts[contact_id] = dict(ged.BLANK_CONTACTS)
            self.contacts[contact_id]["CNTCT_ADDRESS"] = value
            self.links_links_id += 1
            self.links_links[self.links_links_id] = {}
            self.links_links[self.links_links_id]["CNTCT_FK"] = contact_id
            self.links_links[self.links_links_id]["RPST_FK"] = repository_id
        elif tag == "CNTCT_ADDRESS":
            contact_id = int("".join(c for c in superior[0][1] if c.isdigit()))
            self.address_block = value.split("\n")
            self.contacts[contact_id]["CNTCT_ADDRESS"] = value
        else:
            contact_id = int("".join(c for c in superior[0][1] if c.isdigit()))
            key = ged.ADDRESS_TAGS[tag]
            if value in self.address_block:
                indx = self.address_block.index(value)
                new_text = f"{key}: {value}"
                self.address_block[indx] = new_text
                self.contacts[contact_id]["CNTCT_ADDRESS"] = "\n".join(
                    self.address_block)
            elif tag in ("PHON", "FAX", "EMAIL", "WWW", "URL"):
                self.contacts[contact_id][f"CNTCT_{key.upper()}"] = value
        self.lines[idx] = [num, tag, value]
    def do_contact_details(self, idx, num, tag, value):
        superval = superior[num-1][1]
        contact_id = int("".join(c for c in superval if c.isdigit()))
        if tag in ("CNTCT_NAME", "CNTCT_LANGUAGE"):
            self.contacts[contact_id][tag] = value
        self.lines[idx] = [num, tag, value]
    def do_contact_fk(self, idx, num, tag, value):
        """ The GEDCOM specs don't explain why a SUBM pointer would link to an
            individual or a family. INDI.SUBM and FAM.SUBM are considered to be
            deprecated even by some of GEDCOM's greatest fans.

            UNIGEDS recognizes the difference between a repository which
            provides a source without an intermediary contact person, and a
            submitter or contact person who provides a source and takes the
            place of a repository. The following construct is used (imported
            and exported) by UNIGEDS, although it is not found in the specs:

                0 @I1@ INDI
                1 NAME Bill /Teal/
                1 BURI
                2 SOUR @S12@
                3 PAGE Photo of Bill Teal's Grave by William A. Stratton
                ...
                0 @S2@ SUBM
                1 NAME William A. Stratton
                ...
                0 @S12@ SOUR
                1 SUBM @S2@
                1 TITL Grave photos by William A. Stratton
        """
        if tag in ("ANCI", "DESI"):
            if self.exceptions.get("researcher_interests") is None:
                self.exceptions["researcher_interests"] = []
            self.exceptions["researcher_interests"].append((idx, num, tag, value))
            return
        source_fk = int("".join(c for c in superior[num-1][1] if c.isdigit()))
        self.sources[source_fk]["CNTCT_FK"] = value
        self.lines[idx] = [num, "CNTCT_FK", value]
    def do_repository(self, idx, num, tag, value, more=""):
        self.repositories[value] = {"RPST_NAME": ""}
        self.lines[idx] = [num, tag, value, more]

    def do_repository_name(self, idx, num, tag, value, more=""):
        repository_id = int("".join(c for c in superior[0][1] if c.isdigit()))
        self.repositories[repository_id]["RPST_NAME"] = value
        self.lines[idx] = [num, tag, value, more]
    def do_event(self, idx, num, tag, value, more=""):
        """ If the value is "Y" and there's no date or place, it indicates the
            event is known to have taken place but there is no further
            information. The Y value should not be used when there's a DATE or
            PLAC subordinate line. If a line with a Y value has subordinates,
            it's being used wrong per specs.

            Usually there is no 3rd item (value) on an EVEN (event) line, and
            the subordinate line TYPE names the event's type, e.g. "won the
            Nobel Peace Prize for replacing GEDCOM with something useful". But
            just to keep things interesting, the specs allow a value here. If
            this is used correctly, then this value should fit into UNIGEDS as
            a "particulars" (details) value. Here's the example from the specs:

                1 EVEN Appointed Zoning Committee Chairperson
                2 TYPE Civic Appointments
                2 DATE FROM JAN 1952 TO JAN 1956
                2 PLAC Cove, Cache, Utah

            These lines are used oppositely when a pre-defined event or attribute
            tag occurs. Since the type is denoted by the superior tag, the TYPE
            tag contains the details (event particulars or EVNT_DETL in UNIGEDS):

                1 GRAD
                2 TYPE College

            A Y value can also be used on a pre-defined event tag:

                1 DEAT Y

            Why would someone want to say that Joe is no longer living when this
            assertion can't be associated with a date? Joe is no longer living...
            when??? But the Y value is supposed to be used only where there is
            no other data about the event. Just because I can't think of a reason,
            that doesn't mean that no such reason exists.
        """
        self.event_id += 1
        particulars = ""
        event_type_ids = []
        event_types = []
        for tp, idnum in self.types["event_types"].items():
            event_type_ids.append(idnum)
            event_types.append(tp)
        event_type, particulars = more, value
        if particulars is None:
            particulars = ""
        elif particulars == "Y":
            if ged.Y_VALUES.get(tag):
                particulars = ged.Y_VALUES[tag]
            else:
                if self.exceptions.get("unknown_Y_values") is None:
                    self.exceptions["unknown_Y_values"] = []
                self.exceptions["unknown_Y_values"].append(
                    (self.event_id, particulars, event_type))
                return
        if event_type in event_types:
            indx = event_types.index(event_type)
            event_type_id = event_type_ids[indx]
        elif event_type == "unknown":
            event_type_id = None
        else:
            ged.MAX_TYPE_IDS["event_type"] += 1
            event_type_id = ged.MAX_TYPE_IDS["event_type"]
        self.events[self.event_id] = {
            "EVNT_TYPE_FK": event_type_id,
            "EVNT_AGE": "", "EVNT_AGE1": "", "EVNT_AGE2": "",
            "PLACE_NEST_FK": None, "EVNT_DETL": particulars,
            "EVNT_DATE": "-0000-00-00-------", "EVNT_DATE_SORT": "0,0,0"}
        if self.primary_tag == "PRSN":
            person_id = int("".join(c for c in superior[0][1] if c.isdigit()))
            self.events[self.event_id]["PRSN_FK"] = person_id
        elif self.primary_tag == "CUPL":
            couple_id = int("".join(c for c in superior[0][1] if c.isdigit()))
            self.events[self.event_id]["CUPL_FK"] = couple_id
        tag = "EVNT"
        self.lines[idx] = [num, tag, value, more]
def do_child_fk(self, idx, num, tag, value):
""" From the 5.5.1 specs:
0 @3@ INDI
1 BIRT
2 DATE 11 JUN 1861
2 FAMC @4@ --- allowed in 5.5
1 FAMC @4@
1 FAMC @9@
2 PEDI Adopted
1 ADOP
2 FAMC @9@
3 ADOP HUSB/WIFE/BOTH
2 DATE 16 MAR 1864
...
0 @4@ FAM
1 CHIL @3@
...
0 @9@ FAM
1 CHIL @3@
"""
def save_data():
self.events[self.event_id] = {
"EVNT_TYPE_FK": event_type_id, "PRSN_FK": person_id,
"CUPL_FK": parents_id, "EVNT_AGE": "", "EVNT_AGE1": "", "EVNT_AGE2": "",
"PLACE_NEST_FK": None, "EVNT_DETL": particulars,
"EVNT_DATE": "-0000-00-00-------", "EVNT_DATE_SORT": "0,0,0"}
# Default event type is birth, but PEDI line could change it later.
parents_id = None
person_id = int("".join([c for c in superior[0][1] if c.isdigit()]))
event_type_id = self.types["event_types"]["birth"]
supertag, superval = superior[num-1][0:2]
particulars = ""
who_adopted = "BOTH"
if value != "null" and value.startswith("@") is False and tag != "PEDI":
particulars = value
event_type = ged.ALT_BIRTH_EVENT_TYPES[tag]
if tag == "PEDI":
event_type = ged.PEDI_TAG_EVENT_TYPES[value]
event_type_id = self.types["event_types"][event_type]
parents_id = int("".join(c for c in superval if c.isdigit()))
save_data()
elif tag == "CHIL":
couple_id = int("".join(c for c in superior[0][1] if c.isdigit()))
person_id = int("".join(c for c in value if c.isdigit()))
elif tag == "FAMC" and supertag in ("BIRT", "ADOP"):
# INDI.BIRT.FAMC, INDI.ADOP.FAMC
parents_id = int("".join(c for c in value if c.isdigit()))
self.events[self.event_id]["CUPL_FK"] = parents_id
if supertag == "ADOP":
person_id = int("".join(c for c in superior[0][1] if c.isdigit()))
couple_id = int("".join(c for c in value if c.isdigit()))
self.adoptees[couple_id] = person_id
elif tag == "FAMC" and supertag == "PRSN":
self.event_id += 1
event_type_id = self.types["event_types"]["birth"]
parents_id = int("".join(c for c in value if c.isdigit()))
save_data()
elif tag == "BIRT" and value == "null":
self.event_id += 1
event_type = ged.ALT_BIRTH_EVENT_TYPES[tag]
event_type_id = self.types["event_types"][event_type]
self.events[self.event_id] = {}
save_data()
elif tag == "ADOP" and supertag == "FAMC":
# Handle `INDI.ADOP.FAMC.ADOP` separately later.
pass
elif tag == "ADOP" and value == "null":
self.event_id += 1
event_type = ged.ALT_BIRTH_EVENT_TYPES[tag]
event_type_id = self.types["event_types"][event_type]
self.events[self.event_id] = {}
save_data()
self.lines[idx] = [num, tag, value]
def do_partner_fk(self, idx, num, tag, value):
""" 0 @F3@ FAM
1 HUSB @I1@
1 WIFE @I12@
1 CHIL @I13@
1 CHIL @I14@
1 CHIL @I15@
1 MARR
2 DATE ABT 1921
2 HUSB
3 AGE 30
"""
couple_id = int("".join(c for c in superior[0][1] if c.isdigit()))
if tag in ("HUSB", "WIFE") and num > 1 and value == "null":
# Handle `HUSB.AGE` and `WIFE.AGE` in `self.do_age()`.
pass
elif tag in ("HUSB", "WIFE"):
partner_id = int("".join(c for c in value if c.isdigit()))
# Record person IDs -- unknown when FAMS created the couple_id:
if tag == "HUSB":
self.couples[couple_id]["PRSN1_FK"] = partner_id
elif tag == "WIFE":
self.couples[couple_id]["PRSN2_FK"] = partner_id
self.lines[idx] = [num, tag, value]
def adjust_for_adoptions(self):
""" 0 @I560@ INDI
1 NAME Trevor /Tewksbury/
1 BIRT
2 FAMC @F88@
1 ADOP
2 FAMC @F321@
3 ADOP WIFE
In the example above, the person was adopted by the wife only. Make
a new couple ID for the wife with a null partner. Don't change the
existing couple. Change the couple ID linked to the adoption event
to the new couple ID.
"""
for idx,lst in self.lines.items():
num, tag, value = lst[0:3]
if tag != "ADOP":
continue
elif tag == "ADOP" and value == "null":
continue
elif value == "BOTH":
continue
elif value in ("HUSB", "WIFE"):
old_couple_id = int(
"".join(c for c in self.lines[idx-1][2] if c.isdigit()))
new_couple_id = max(list(self.couples.keys())) + 1
if value == "HUSB":
parent_id = self.couples[old_couple_id]["PRSN1_FK"]
self.couples[new_couple_id] = {
"PRSN1_FK": parent_id, "PRSN2_FK": None}
elif value == "WIFE":
parent_id = self.couples[old_couple_id]["PRSN2_FK"]
self.couples[new_couple_id] = {
"PRSN1_FK": None, "PRSN2_FK": parent_id}
event_type_id = self.types["event_types"]["adoption"]
adoptee_id = self.adoptees[old_couple_id]
for event_id, vals in self.events.items():
if vals.get("PRSN_FK") is None:
continue
elif (vals["PRSN_FK"] == adoptee_id and
vals["EVNT_TYPE_FK"] == event_type_id):
adoption_event_id = event_id
break
self.events[adoption_event_id]["CUPL_FK"] = new_couple_id
# are these methods doing anything? are the lines needed in self.lines?
def do_person_fk(self, idx, num, tag, value):
self.lines[idx] = [num, "PRSN_FK", value]
def do_remark_fk(self, idx, num, tag, value):
if len(self.links_links) != 0:
self.links_links_id = max(list(self.links_links.keys())) + 1
else:
self.links_links_id = 1
linked_tag, linked_id = superior[num-1][0:2]
if linked_id.startswith("@"):
self.links_links[self.links_links_id] = {
"RMRK_FK": value,
f"{linked_tag}_FK": int("".join(c for c in linked_id if c.isdigit()))}
self.lines[idx] = [num, "RMRK_FK", value]
def do_media_fk(self, idx, num, tag, value):
self.lines[idx] = [num, "MDIA_FK", value]
def do_source_fk(self, idx, num, tag, value):
self.lines[idx] = [num, "SORC_FK", value]
def do_repository_fk(self, idx, num, tag, value):
self.lines[idx] = [num, "RPST_FK", value]
def strip_name_slashes(self, value):
lst = value.rstrip("/").split(" /")
name_string = " ".join(lst)
lst.insert(0, f"{lst.pop()}, ")
return name_string, lst
def make_default_name_sorter(self, value):
sorter0 = value.split()
surname = sorter0.pop()
sorter0.insert(0, f"{surname},")
sorter = " ".join(sorter0)
return sorter
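# Worked example (a sketch, not part of the original class): standalone copies
# of the two helpers above, so their output can be checked without building
# the importer. The names `strip_slashes` and `default_sorter` are
# hypothetical; the logic is the same as in the methods above.
def strip_slashes(value):
    parts = value.rstrip("/").split(" /")
    name_string = " ".join(parts)
    parts.insert(0, f"{parts.pop()}, ")
    return name_string, parts

def default_sorter(name_string):
    words = name_string.split()
    surname = words.pop()
    words.insert(0, f"{surname},")
    return " ".join(words)

example_name, example_parts = strip_slashes("Thomas Jacob /Black/")
# example_name is "Thomas Jacob Black"
# example_parts is ["Black, ", "Thomas Jacob"]
example_sorter = default_sorter(example_name)
# example_sorter is "Black, Thomas Jacob"
# Note that the default sorter treats only the last word as the surname, so
# "Vincent van Gogh" sorts as "Gogh, Vincent van".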
def do_person_name(self, idx, num, tag, value):
""" Default to the "birth name" name type. """
self.name_id += 1
name_type_id = self.types["name_types"]["birth name"]
name_string, lst = self.strip_name_slashes(value)
sorter = self.make_default_name_sorter(name_string)
self.names[self.name_id] = {
"NAME_STRG": name_string, "PRSN_FK": self.current_person_id,
"NAME_TYPE_FK": name_type_id, "NAME_SORT": sorter}
self.lines[idx] = [num, tag, value]
def do_name_type(self, idx, num, tag, value):
""" Treat all person labels, such as ID numbers and names, the same way.
There is no reason to not lump them all together in one category.
Handle Genbox `_TYPE` tag when subordinate to NAME.
1 NAME Bob /Neal/
2 _TYPE _ALIA
FH makes `_USED` subordinate to NAME and other programs make it
subordinate to `0 @I1@ INDI`. In either case, the name string and
the name type are in the same line, and the zero-line is the person_id.
`IDNO` per specs requires a `TYPE` subordinate line but use the value
anyway if the subordinate line is missing. Give a tentative value
to name_string here. If a TYPE is found, concatenate the type
value to the ID number.
1 IDNO 43-456-1899
2 TYPE Canadian Health Registration
`SSN` should not exist as it's USA-centric, but handle it anyway.
1 SSN 555-55-5555
`ALIA` per specs can be used like this, but this usage is rare &
unnecessary, thus not supported here:
n @XREF:INDI@ INDI
+1 ALIA @<XREF:INDI>@
Anti-spec `ALIA` & equivalent examples from: Louis Kessler,
https://www.beholdgenealogy.com/blog/?p=1074
0 @I1@ INDI
1 NAME Thomas Jacob /Black/
1 ALIA Jack /Black/
Personal Ancestry File (PAF):
1 NAME Thomas Jacob /Black/
2 _AKA Jack /Black/
Brother’s Keeper:
1 NAME Thomas Jacob /Black/
2 _AKAN Jack /Black/
"""
def save_name(sorter=None):
self.name_id += 1
if sorter is None:
sorter = self.make_default_name_sorter(name_string)
self.names[self.name_id] = {
"NAME_STRG": name_string, "PRSN_FK": person_fk,
"NAME_TYPE_FK": name_type_id, "NAME_SORT": sorter}
def check_if_name_type_exists(name_type):
if name_type in self.types["name_types"]:
name_type_id = self.types["name_types"][name_type]
else:
name_type_id = max(self.types["name_types"].values()) + 1
self.types["name_types"][name_type] = name_type_id
return name_type_id
superline = superior[num-1]
supertag = superline[0]
zero_line = superior[0]
person_fk = int("".join(c for c in zero_line[1] if c.isdigit()))
if tag == "NAME_TYPE_STRG":
if value in ged.GEDCOM_NAME_TYPES:
name_type = ged.GEDCOM_NAME_TYPES[value]
else:
name_type = value
name_type_id = check_if_name_type_exists(name_type)
self.names[self.name_id]["NAME_TYPE_FK"] = name_type_id
elif (value.startswith("@") is False and tag == "also_known_as" and
supertag == "PRSN"):
name_string = value
name_type = tag
person_fk = int("".join(c for c in superior[0][1] if c.isdigit()))
name_type_id = check_if_name_type_exists(name_type)
save_name()
elif tag == "_USED":
name_string = value
name_type_id = self.types["name_types"]["usual name"]
save_name()
elif tag == "ID_number":
name_string = f"ID number: {value}"
name_type_id = self.types["name_types"]["ID number"]
save_name(sorter=name_string)
elif tag == "title":
name_string = value
name_type_id = self.types["name_types"]["title"]
save_name(sorter=name_string)
elif tag == "social_security_number":
name_string = f"USA Social Security number: {value}"
name_type_id = self.types["name_types"]["ID number"]
save_name(sorter=name_string)
elif value.startswith("@") and value.endswith("@"):
if self.exceptions.get("per_specs_ALIA") is None:
self.exceptions["per_specs_ALIA"] = []
self.exceptions["per_specs_ALIA"].append((num, tag, value, superline))
elif tag in ("_AKA", "_AKAN"):
name_string = self.strip_name_slashes(value)[0]
name_type_id = self.types["name_types"]["also_known_as"]
# save_name() derives the sorter from the slash-free name_string.
save_name()
elif tag == "_TYPE":
name_type = ged.GENBOX_NAME_TYPES[value]
name_type_id = check_if_name_type_exists(name_type)
self.names[self.name_id]["NAME_TYPE_FK"] = name_type_id
self.lines[idx] = [num, tag, value]
def make_role_person(self, value):
self.person_id += 1
self.name_id += 1
sorter = self.make_default_name_sorter(value)
self.persons[self.person_id] = {"PRSN_GNDR": "unknown"}
self.names[self.name_id] = {
"NAME_STRG": value, "PRSN_FK": self.person_id,
"NAME_TYPE_FK": 1, "NAME_SORT": sorter}
return self.person_id
def do_transcription(self, idx, num, tag, value):
self.transcription_id += 1
name_id = None
nested_place_id = None
supertag = superior[num-1][0]
if supertag == "NAME_STRG":
name_id = self.name_id
fk = "NAME_FK"
fk_value = name_id
elif supertag == "PLACE_NEST_STRG":
nested_place_id = self.nested_place_id
fk = "PLACE_NEST_FK"
fk_value = nested_place_id
if tag.endswith("_R"):
self.transcriptions[self.transcription_id] = {
"romanized": 1, "TRNSCRPN_STRG": value, fk: fk_value,
"TRNSCRPN_TYPE_FK": None}
elif tag.endswith("_P"):
self.transcriptions[self.transcription_id] = {
"phonetic": 1, "TRNSCRPN_STRG": value, fk: fk_value,
"TRNSCRPN_TYPE_FK": None}
self.lines[idx] = [num, tag, value]
def do_transcription_type(self, idx, num, tag, value):
def check_if_transcription_type_exists(transcription_type):
if transcription_type in self.types["transcription_types"]:
transcription_type_id = self.types["transcription_types"][transcription_type]
else:
transcription_type_id = max(self.types["transcription_types"].values()) + 1
self.types["transcription_types"][transcription_type] = transcription_type_id
return transcription_type_id
if value in ged.GEDCOM_TRANSCRIPTION_TYPES:
transcription_type = ged.GEDCOM_TRANSCRIPTION_TYPES[value]
else:
transcription_type = value
transcription_type_id = check_if_transcription_type_exists(transcription_type)
self.transcriptions[self.transcription_id]["TRNSCRPN_TYPE_FK"] = transcription_type_id
self.lines[idx] = [num, tag, value]
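# A minimal sketch, not part of the original code: the nested
# check_if_..._exists() functions for name types, transcription types and
# place types all follow one get-or-create pattern. The helper name
# `get_or_create_type_id` is hypothetical. Unlike the originals, it also
# copes with an empty table by starting the IDs at 1 (an assumption).
def get_or_create_type_id(type_table, type_name):
    """Return the ID for type_name, adding it with the next free ID if new."""
    if type_name not in type_table:
        type_table[type_name] = max(type_table.values(), default=0) + 1
    return type_table[type_name]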
def do_event_type(self, idx, num, tag, value):
particulars = ""
supertag, superval = superior[num-1][0:2]
if supertag == "unknown":
self.event_type = value.lower()
particulars = superval
elif supertag in ged.EVENT_TYPES:
particulars = value
elif supertag == "ROMN":
new_tag = "TRNSCRPN_TYPE"
elif supertag == "FONE":
new_tag = "TRNSCRPN_TYPE"
if supertag in ("ROMN", "FONE"):
pass
elif self.primary_tag in ("INDI", "FAM"):
new_tag = ged.VAGUE_TAGS["TYPE"][self.primary_tag]
else:
new_tag = tag
if self.event_type in self.types["event_types"]:
event_type_id = self.types["event_types"][self.event_type]
else:
event_type_id = max(self.types["event_types"].values()) + 1
self.types["event_types"][self.event_type] = event_type_id
self.events[self.event_id]["EVNT_TYPE_FK"] = event_type_id
self.events[self.event_id]["EVNT_DETL"] = particulars
self.lines[idx] = [num, new_tag, value]
def do_event_date(self, idx, num, tag, value):
""" Handle these, and some combinations of them, case-insensitively:
1 OCCU farmer
2 DATE 1917
1 DEAT
2 DATE JUL 1970
1 OCCU shoe shop manager
2 DATE 7 APR 1930
1 RESI
2 DATE ABT 1917
1 DEAT apoplexy
2 DATE EST 12 DEC 1927
1 EVEN married
2 TYPE Marital Status
2 DATE CAL 1927
1 EVEN
2 TYPE Invention
2 DATE BEF 1934
1 RESI
2 DATE AFT 1900
2 PLAC Precinct 4, Fannin County, Texas, United States of America
1 RESI
2 DATE BET 1925 AND 1927
1 OCCU farmer
2 DATE FROM 9 MAR 1875 TO 14 OCT 1927
"""
compound_date, link = ged.split_compound_dates(value)
# The loop variable is named `part` so it doesn't shadow the line index `idx`.
for part, date_input in enumerate(compound_date):
if date_input is None:
return
if part == 0:
year, month, day, slots = ged.get_date_parts(date_input, part)
elif part == 1:
year, month, day, slots = ged.get_date_parts(date_input, part, slots=slots)
slots[5] = link.strip()
date_is_bad, month = ged.validate_date(year, month, day)
if part == 0:
sorter = ged.make_date_sorter(year, month, day)
pos = {0: [1, 2, 3], 1: [7, 8, 9]}
if date_is_bad:
if self.exceptions.get("bad_dates") is None:
self.exceptions["bad_dates"] = []
self.exceptions["bad_dates"].append((idx, num, tag, value))
else:
slots[pos[part][0]] = year
slots[pos[part][1]] = month
slots[pos[part][2]] = day
default_storable_date = "-".join(slots)
self.events[self.event_id]["EVNT_DATE"] = default_storable_date
self.events[self.event_id]["EVNT_DATE_SORT"] = sorter
self.lines[idx] = [num, tag, value]
def do_assertion_made_date(self, idx, num, tag, value):
""" INDI.NAME.SOUR.DATA.DATE
INDI.EVEN.SOUR.DATA.DATE
INDI.BURI.SOUR.DATA.DATE, etc.
This tag is subordinate to SOUR instead of PAGE, so it is useless
since it refers to the date an assertion was made, as shown by the
cited portion of the source. The user might get a chance to tell us
which citation or assertion is being referred to, but the tag is so
seldom used by vendors that it doesn't seem worthwhile to bother
with it.
"""
if self.exceptions.get("assertion_made_date_not_linked_to_citation") is None:
self.exceptions["assertion_made_date_not_linked_to_citation"] = []
self.exceptions["assertion_made_date_not_linked_to_citation"].append((
idx, num, tag, value))
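# A minimal sketch, not part of the original code: the three-line pattern
# used above and elsewhere (check for None, create the list, append) can be
# written with dict.setdefault(). The helper name `log_exception` is
# hypothetical, and `exceptions` stands in for self.exceptions.
def log_exception(exceptions, category, record):
    """Append record to the list stored under category, creating it if needed."""
    exceptions.setdefault(category, []).append(record)
    return exceptions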
def do_change_time(self, idx, num, tag, value):
self.lines[idx] = [num, tag, value]
def do_gender(self, idx, num, tag, value):
spelled = {"f": "female", "m": "male", "u": "unknown"}
gender = spelled[value.lower()]
person_id = int("".join(c for c in superior[0][1] if c.isdigit()))
self.persons[person_id]["PRSN_GNDR"] = gender
self.lines[idx] = [num, tag, gender]
def do_age(self, idx, num, tag, value):
supertag = superior[num-1][0]
if supertag in ged.EVENT_TYPES.values():
self.events[self.event_id]["EVNT_AGE"] = value
elif supertag == "PRSN1_FK":
self.events[self.event_id]["EVNT_AGE1"] = value
elif supertag == "PRSN2_FK":
self.events[self.event_id]["EVNT_AGE2"] = value
self.lines[idx] = [num, tag, value]
def do_reference_number(self, idx, num, tag, value):
""" RE: REFN, RIN, AFN and the like.
Why do we need to preserve numbering systems from prior data
structures? I'll get back to this after doing more research.
"""
pass
def do_surety(self, idx, num, tag, value):
self.lines[idx] = [num, tag, value]
def do_author(self, idx, num, tag, value):
source_fk = int("".join(c for c in superior[num-1][1] if c.isdigit()))
self.sources[source_fk]["SORC_ATHR"] = value
self.lines[idx] = [num, "SORC_ATHR", value]
def do_publication(self, idx, num, tag, value):
source_fk = int("".join(c for c in superior[num-1][1] if c.isdigit()))
self.sources[source_fk]["SORC_PBLN"] = value
self.lines[idx] = [num, "SORC_PBLN", value]
def do_locator(self, idx, num, tag, value):
""" SEE Tamura Jones 5.5.1 annotated specs, he thinks AFN is
superfluous/obsolete.
"""
self.lines[idx] = [num, tag, value]
def do_change_date(self, idx, num, tag, value):
table, fk = superior[0][0:2]
element_id = int("".join(c for c in fk if c.isdigit()))
if self.change_dates[table].get(element_id) is None:
self.change_dates[table][element_id] = {"ALTR_DATE": "", "ALTR_TIME": ""}
if tag == "ALTR_DATE":
self.change_dates[table][element_id]["ALTR_DATE"] = value
elif tag == "ALTR_TIME":
self.change_dates[table][element_id]["ALTR_TIME"] = value
self.lines[idx] = [num, tag, value]
def do_project(self, idx, num, tag, value):
""" SUBM.ANCI/DESI Interest in ancestors or descendants of the contact
person referenced by the SUBM pointer/FK? Deprecated for now.
"""
self.lines[idx] = [num, tag, value]
# ******************************** ROLES ************************************#
def do_role(self, idx, num, tag, value):
""" ASSO.RELA is the only 5.5.1 specs-prescribed way to do roles,
but not the only way that roles are being done. This is because
ASSO.RELA doesn't work for roles, since it isn't linked
to an event but to an individual. So you could say Jane had a
biographer Sally but there's no way to link to Sally's
event "wrote a biography" or Jane's event "someone wrote her
biography." Therefore, ASSO.RELA should not be used for roles
such as "flower girl". Flower girl... when? where? _WITN._ROLE
should be used for roles.
"""
if tag in ("WITN", "_WITN"):
self.do_witn_role(idx, num, tag, value)
elif tag in ("_ROLE", "ROLE") and self.witn:
self.do_witn_role(idx, num, tag, value)
elif tag in ("_SHAR", "_SHAN"):
self.do_shar_shan_role(idx, num, tag, value)
elif tag == "ROLE" and self.fh_shared_event:
# elif tag == "ROLE" and self.witn:
self.do_shar_shan_role(idx, num, tag, value)
elif tag == "ROLE" and self.witn is False:
self.do_source_type_role(idx, num, tag, value)
elif tag in ("ASSO", "RELA"):
self.do_asso_role(idx, num, tag, value)
else:
self.do_shar_shan_role(idx, num, tag, value)
def do_witn_role(self, idx, num, tag, value):
""" GEDCOM specs removed `WITN` with v. 5.5.1 but it's still used.
Handle `BIRT./DEAT./etc._WITN._ROLE`, a Genbox custom tag.
1 BURI
2 _WITN @I6@
3 _ROLE pallbearer
"""
if tag == "_WITN":
self.witn = True
self.role_person_id = int("".join(c for c in value if c.isdigit()))
elif tag in ("ROLE", "_ROLE") and self.witn is False:
return
elif tag in ("ROLE", "_ROLE"):
role_type_id = self.get_role_type(value)[1]
self.witn = False
self.links_links_id += 1
self.links_links[self.links_links_id] = {}
self.links_links[self.links_links_id]["PRSN_FK"] = self.role_person_id
self.links_links[self.links_links_id]["ROLE_TYPE_FK"] = role_type_id
self.links_links[self.links_links_id]["EVNT_FK"] = self.event_id
self.lines[idx] = [num, tag, value]
def do_shar_shan_role(self, idx, num, tag, value):
""" `_SHAN.ROLE` has a string value (a person's name).
`_SHAR.ROLE` has a foreign key value (a person's ID).
Handle `MARR./DIV./etc._SHAR/_SHAN.ROLE`, a Family Historian custom
tag for couple events.
Ignore `RESI./EMIG./etc._SHAR/_SHAN.ROLE` if the role person is
redundant.
Valid Examples:
1 WILL
2 _SHAN William James Smith
3 ROLE Executor
1 PROB
2 _SHAR @I9@
3 ROLE Beneficiary
...
1 EVEN
2 TYPE Invention
2 _SHAN Frank O. Parker
3 ROLE Patent Attorney
Handle these roles (based on FH's including residence,
emigration, etc. as "shared events".)
0 @I104@ INDI
1 EMIG
2 _SHAR @I105@
3 ROLE Emigrant
1 RESI
2 _SHAR @I105@
3 ROLE Resident
But if an individual listed in a shared event is the current
individual whose INDI record these lines are in, ignore the
redundant value. For example, handle the FKs except for @I1@
here:
0 @I1@ INDI
1 NAME Anthony Edward /Munro/
1 RESI
2 PLAC Cheltenham, Gloucestershire, England
2 _SHAR @I1@
3 ROLE Resident
1 RESI
2 PLAC Lewisham, London, England
2 _SHAR @I1@
3 ROLE Resident
2 _SHAR @I3@
3 ROLE Resident
1 RESI
2 DATE FROM MAY 1957 TO 1965
2 PLAC Clifton Wood, Bristol, England
2 _SHAR @I1@
3 ROLE Resident
2 _SHAR @I3@
3 ROLE Resident
2 _SHAR @I8@
3 ROLE Resident
2 _SHAR @I6@
3 ROLE Resident
But ("resident", "emigrant", "immigrant") are not valid roles
because the individuals referenced by the foreign key pointers
are primary participants in the events "residence",
"emigration", and "immigration". So the validity of the role
has to be tested against ("resident", "emigrant", "immigrant"),
and if it's already one of these event types, the individual has
to have an event created for him, instead of a role. See
`self.get_role_type()` and `self.make_event()`.
"""
if tag == "_SHAN":
self.fh_shared_event = True
self.role_person_id = self.make_role_person(value)
elif tag == "_SHAR":
self.fh_shared_event = True
self.role_person_id = int("".join(c for c in value if c.isdigit()))
elif tag == "ROLE_TYPE_STRG":
if self.current_person_id == self.role_person_id:
self.fh_shared_event = False
return
make_role, role_type_id = self.get_role_type(value)
self.fh_shared_event = False
if make_role is False:
return
self.links_links_id += 1
self.links_links[self.links_links_id] = {}
self.links_links[self.links_links_id]["PRSN_FK"] = self.role_person_id
self.links_links[self.links_links_id]["ROLE_TYPE_FK"] = role_type_id
self.links_links[self.links_links_id]["EVNT_FK"] = self.event_id
self.lines[idx] = [num, tag, value]
def do_asso_role(self, idx, num, tag, value):
""" `ASSO.RELA` refers to a submitter's relationship to the person in the
INDI record if the ASSO line pointer is a submitter FK:
0 @I1@ INDI
1 NAME Joe /Jacob/
1 ASSO @<XREF:SUBM>@
2 RELA great grandson
`INDI.ASSO.RELA` refers to any person in the tree who has a primary
ID and is referenced by a pointer in the ASSO line and the role type
in the RELA line:
0 @I1@ INDI
1 NAME Fred /Jones/
1 ASSO @<XREF:INDI>@
2 RELA Godfather
Participation in an event seems to be implied in the next examples
from Kessler but the ASSO and the BIRT are siblings so it's just
implied. Ignore the implication and link it to the person as per
specs?... i.e. ALL THREE OF THESE USAGES are the same, and the FK can
either be a person_id (INDI) or a contact_id (SUBM).
1 BIRT
2 DATE 3 AUG 1780
1 ASSO @I4@
2 RELA Saw birth
1 ASSO @I3@
2 RELA Godmother
ASSO.RELA is not useful for linking events to roles, it is only
useful for relationships, which are linked person to person. Roles
need to be linked to events. The implied link to events in the last
example above can't be relied on, since the ASSO.RELA's following
its sibling BIRT event could be a coincidence.
Using a single tag for multiple purposes is bad
programming. It doesn't help that GEDCOM is a text file, not
a computer program. There is no way to say for sure that @S9@
points to a SUBM record. It could be a SOUR record. (For that matter,
@I44@ doesn't have to be an INDI record.) Once again, we are made to
guess and figure things out because of the GEDCOM specs' completely
unnecessary, bull-headedly stubborn, and wrong-headed insistence
on continuing to give two or more meanings to many single tags. Are
they still doing this? I'm afraid to look into that. A glance at the
generally unaccepted GEDCOM 7 specs will be all it takes to assure
any sane programmer who is not a certified apologist for GEDCOM
that GEDCOM's creators are not solving problems by simplifying
anything, but rather by making things more complicated. I have no
doubt that they are doing their best. But doing their best to
preserve GEDCOM's role, undeserved as it is, as the so-called
standard of file transfer in genealogy. The time has come to
adopt a universal relational database structure for genealogy
programs, and put an end to this text-file-transfer charade.
Vendors who want to export adjunct roles in GEDCOM (such as the
officiant at a wedding) should use the WITN tag that was part of
the 5.5 specs, but dropped in 5.5.1. No one likes custom tags, but
everyone hates ASSO.RELA.
Based on not being able to find any GEDCOM exporters actually using
ASSO.RELA, and on other problems mentioned above--especially the
obvious fact that an adjunct role in an event has to be linked to
that event--UNIGEDS import can realistically only support the ASSO
tag by sending any occurrence of this tag to the exceptions report.
There, the user will be asked for specific input about what event,
what role, what person, etc. is supposed to be associated with the
current individual.
"Witnesses must be associated with the event they witnessed. This
_ASSO record approach fails to associate witnesses with an event
and because of that, cannot even tell you which marriage someone
witnessed. It is best to use some vendor-defined _WITN record on
the event itself." --Tamura Jones, The FamilySearch GEDCOM 5.5.1
Specification Annotated Edition, page 64. That quote refers to the
custom tag _ASSO. On p. 121-122 of the same reference, Mr. Jones
tries to defend the existence of the real ASSO tag, but he does it
by detailing all the ways the specs' description of this tag is
wrong. The only reasonable way to deal with a tag fiasco as deeply
in a rut as this one (since it's a tag that no one uses), is to not
use it.
"""
if self.exceptions.get("eventless_roles") is None:
self.exceptions["eventless_roles"] = []
self.exceptions["eventless_roles"].append((idx, num, tag, value))
def do_source_type_role(self, idx, num, tag, value):
""" Role values can be:
[ CHIL | HUSB | WIFE | MOTH | FATH | SPOU | "other--user defined" ]
1 BIRT
2 SOUR @6@
3 PAGE Sec. 2, p. 45
3 EVEN BIRT
4 ROLE CHIL
The above example from the specs doesn't reveal the purpose of this
tag, so I added this one which does:
1 BIRT
2 SOUR @6@
3 PAGE Sec. 2, p. 45
3 EVEN BIRT
4 ROLE FATH
The purpose of this tag is to denote EVENT_TYPE_CITED_FROM. For
example, the source in the examples could be a birth certificate,
and that's what is meant by the line `3 EVEN BIRT`. Then
`4 ROLE FATH` reveals that the data found in the source is the
identity of the father, even though the certificate is not a
fatherhood certificate but a birth certificate. I can't think of a
good reason to handle this tag. No one uses it since it was made
in support of a concept that no genealogy software addresses. This
topic would not be so abstract if anyone brought it to the front
and designed a user interface to deal with this topic directly and
up front. The missing term here is "assertion", i.e. "what the
source says". If the user could enter an assertion about the
father's name and link it to a particular citation, that's the sort
of thing this tag usage seems to be addressing, but what this tag
can't do is force anyone to make a series of mental leaps to figure
out why it exists. UNIGEDS makes assertions a central part of its
design, so if anyone actually used this tag, we'd want to handle it.
Let me know if you ever find some real-world examples of
SOUR.EVEN.ROLE in a GEDCOM file, and which genealogy software
generated that GEDCOM file, and I'll consider doing something
with SOUR.EVEN.ROLE.
"""
pass
# ******************************* PLACES *************************************#
def do_coordinates(self, idx, num, tag, value):
latitude = None
longitude = None
if tag == "PLACE_LTTD":
latitude = value
elif tag == "PLACE_LNGTD":
longitude = value
supertag, superval = superior[num-1][0:2]
if num-2 >= 0:
megaline = superior[num-2]
if len(megaline) == 2:
self.nested_place_string = megaline[1]
elif len(megaline) == 3: # _PLAC with a 4th item on the line (FH)
self.nested_place_string = megaline[2]
for k,v in self.nested_places.items():
if v["nested_place_string"] == self.nested_place_string:
nested_place_id = k
break
smallest_place = self.nested_place_string.split(", ")[0]
smallest_place_id = self.nested_places[nested_place_id]["nest0"]
if latitude:
self.places[smallest_place_id]["PLACE_LTTD"] = latitude
if longitude:
self.places[smallest_place_id]["PLACE_LNGTD"] = longitude
def do_place(self, idx, num, tag, value, more=""):
""" Identify the single place, nested place, and place name elements,
each with its own primary key. Assign `LATI/LONG` coordinates to the
smallest nest in nested place strings, e.g. the coordinates for Santa
Cruz, California, USA are linked only to the single place Santa Cruz.
`_LOC` can be handled by ignoring it since it also uses PLAC.
The `MAP` a.k.a. `PLACE_COORDS` line conveys no information.
With vendors of genealogy software recognizing the need for a
primary place element, this is being accomplished in a variety of
un-database-like ways as shown below. Since the identifiers provided
by the custom tag zero-lines are not referenced in order to provide
any non-redundant information, ignore the identifiers and make all
place IDs in the same way.
Do not import _places that are given an identifier that is not
pointed at in any of the GEDCOM lines. This will prevent all of
Google Maps from being stored in UNIGEDS.
Ahnenblatt and possibly other German vendors:
0 @L7@ _LOC
1 NAME Bremen
1 MAP
2 LATI N53.079278
2 LONG E8.801667
...
1 MARR
2 PLAC Bremen
3 _LOC @L7@
3 MAP
4 LATI N53.079278
4 LONG E8.801667
Family Historian:
0 @P50@ _PLAC Farmborough, Somerset, England
1 MAP
2 LATI N51.342362
2 LONG W2.4875262
...
1 RESI
2 PLAC Farmborough, Somerset, England
Genbox:
0 @P435@ _PLACE
1 NAME Byhalia, Mississippi, United States
1 LAT 34 52 10 N
1 LONG 089 41 17 W
...
1 BIRT
2 PLAC Byhalia, Mississippi, United States
GEDCOM 5.5.1 specs:
0 @I3@ INDI
1 BIRT
2 PLAC Byhalia, Mississippi, United States
3 MAP
4 LATI 34 52 10 N
4 LONG 089 41 17 W
0 @F4@ FAM
1 RESI
2 PLAC Arkadelphia, Clark County, Arkansas, United States of America
3 MAP
4 LATI 34.126476772597925
4 LONG -93.07273371608865
GEDCOM 5.5 specs:
1 BURI
2 PLAC Spring Hill Cem., Stamford, CT
"""
        dupes = []
        if len(more) != 0:
            print("line", looky(seeline()).lineno, "more:", more)
        if num == 0:
            self.primary_tag = tag
        elif tag == "PLACE_NEST_STRG":
            self.do_nested_place(idx, num, tag, value)
        elif self.primary_tag == "_LOC":
            pass
        elif tag in ("PLACE_LTTD", "PLACE_LNGTD"):
            self.do_coordinates(idx, num, tag, value)
        elif tag == "PLACE_TYPE_STRG":
            self.do_place_type(idx, num, tag, value)
        self.exceptions["duplicate_places"] = dupes
        self.lines[idx] = [num, tag, value, more]
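# Side note, a rough sketch only (not code from gedkandu.py): the vendor
# samples in the docstring above write coordinates three ways: a hemisphere
# prefix (N53.079278), degrees/minutes/seconds with a hemisphere suffix
# (34 52 10 N), and plain signed decimals (-93.07273371608865). The helper
# name `to_decimal_degrees` and its parsing rules are assumptions made up for
# illustration; they show one way such values could be normalized to signed
# decimal degrees.

```python
def to_decimal_degrees(text):
    """Convert one coordinate string to signed decimal degrees (hypothetical helper)."""
    text = text.strip()
    sign = 1
    # The hemisphere letter may lead ("N53.079278") or trail ("34 52 10 N").
    if text[0].upper() in "NSEW":
        hemisphere, text = text[0].upper(), text[1:]
    elif text[-1].upper() in "NSEW":
        hemisphere, text = text[-1].upper(), text[:-1]
    else:
        hemisphere = ""
    if hemisphere in ("S", "W"):
        sign = -1
    parts = [float(part) for part in text.split()]
    # Pad to (degrees, minutes, seconds) so single decimal values work too.
    degrees, minutes, seconds = (parts + [0.0, 0.0])[:3]
    if degrees < 0:
        # A plain signed decimal such as "-93.07" carries its own sign.
        sign, degrees = -sign, -degrees
    return sign * (degrees + minutes / 60 + seconds / 3600)


for sample in ("N53.079278", "W2.4875262", "34 52 10 N", "-93.07273371608865"):
    print(sample, "->", to_decimal_degrees(sample))
```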
    def do_place_type(self, idx, num, tag, value):
        """ Place type or "jurisdiction" cannot be defined or categorized well
            in genealogy software without severely limiting the geographical
            domain that the software will accurately represent. The right
            strategy is not "draw the line somewhere and then give up halfway,"
            but rather, "just don't start". The user can easily input any place
            types he wants to use. This is a type and is treated like other types
            for users who want the feature, but it isn't a primary feature or
            element of genealogy. Place type is covered by GEDCOM's `PLAC.FORM`
            tag. Examples of place type: "city", "county", "parish", "Indian
            country", "nation", "colony", "settlement", "village", "empire", etc.
        """
        def check_if_place_type_exists(place_type):
            if place_type in self.types["place_types"]:
                place_type_id = self.types["place_types"][place_type]
            else:
                place_type_id = max(self.types["place_types"].values()) + 1
                self.types["place_types"][place_type] = place_type_id
            return place_type_id
        superline = superior[num-1]
        place_type = value
        place_type_id = check_if_place_type_exists(place_type)
        nested_place_string = superline[1]
        place_id = None
        for val in self.nested_places.values():
            if val["nested_place_string"] == nested_place_string:
                place_id = val["nest0"]
                break
        # `self.places_types` maps place_id to place_type_id, so a plain
        # assignment covers both the new and the already-stored case. Skip
        # the assignment if no matching nested place was found.
        if place_id is not None:
            self.places_types[place_id] = place_type_id
        self.lines[idx] = [num, tag, value]
    def do_nested_place(self, idx, num, tag, value):
        """ Store all unique nested place strings in self.nested_places. Then
            during the interactive exceptions report, show the user all the
            substring matches, e.g.:
                Paris, Texas, USA
                Paris, France
                Dallas, Texas, USA
            The user will have to compare the "Paris" in the first two examples
            above, and the "Texas, USA" in the bottom two examples, to let the
            program know which are matches and which are unique places. In any
            matching substring, the largest possible match is the only one that
            has to be handled, for example "Garfield County, Colorado, USA" in
                Rifle, Garfield County, Colorado, USA
                Eagle, Garfield County, Colorado, USA
            If the user says that "Garfield County, Colorado, USA" is the same
            place in both matches, all the single places "Garfield County",
            "Colorado", and "USA" in the two larger strings will be considered
            to be the same.
        """
        if superior[num-1][0] == "_PLACE":
            self.nested_place_string = value
        length1 = len(self.nested_place_counter)
        self.nested_place_counter.add(value)
        length2 = len(self.nested_place_counter)
        if length2 > length1:
            self.nested_place_id += 1
            current_nested_place_id = self.nested_place_id
            # The full nesting string will be used here, even though it won't
            # be imported to UNIGEDS, since merging/splitting/dupes will be
            # handled in the interactive exception report.
            self.nested_places[current_nested_place_id] = dict(ged.NESTING)
            self.nested_places[current_nested_place_id]["nested_place_string"] = value
            nesting = value.split(", ")
            ids = []
            for nest in nesting:
                self.do_place_name(nest, ids)
            keys = list(ged.NESTING.keys())
            for indx, place_id in enumerate(ids):
                self.nested_places[self.nested_place_id][keys[indx]] = place_id
        else:
            for nested_place_id, innerdkt in self.nested_places.items():
                if value == innerdkt["nested_place_string"]:
                    current_nested_place_id = nested_place_id
                    break
        self.events[self.event_id]["PLACE_NEST_FK"] = current_nested_place_id
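# Side note, a rough sketch only (the interactive exceptions report isn't
# written yet, so this is not how the importer does it): one way to find the
# "largest possible match" described in the docstring above is to compare the
# two strings nest by nest with the standard library's difflib. The function
# name `largest_shared_nesting` is made up for this example.

```python
from difflib import SequenceMatcher


def largest_shared_nesting(place_a, place_b):
    """Return the longest run of consecutive nests found in both strings."""
    nests_a = place_a.split(", ")
    nests_b = place_b.split(", ")
    matcher = SequenceMatcher(None, nests_a, nests_b, autojunk=False)
    match = matcher.find_longest_match(0, len(nests_a), 0, len(nests_b))
    # An empty string means the two places share no nest at all.
    return ", ".join(nests_a[match.a:match.a + match.size])


print(largest_shared_nesting(
    "Rifle, Garfield County, Colorado, USA",
    "Eagle, Garfield County, Colorado, USA"))
print(largest_shared_nesting("Paris, Texas, USA", "Paris, France"))
print(largest_shared_nesting("Paris, Texas, USA", "Dallas, Texas, USA"))
```

# The match is only a candidate: the user still has to say whether the two
# "Paris" entries are the same place.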
    def do_place_name(self, nest, ids):
        """ Treat each single place name as if there were only one place by
            that name in the world. Splitting and merging will be done during
            the interactive exception report.
        """
        length1 = len(self.single_place_counter)
        self.single_place_counter.add(nest)
        length2 = len(self.single_place_counter)
        if length2 > length1:
            self.place_name_id += 1
            place_id = self.do_single_place(nest)
        else:
            for k,v in self.place_names.items():
                if v["PLACE_NAME_STRG"] == nest:
                    place_id = v["PLACE_FK"]
                    break
        ids.append(place_id)
    def do_single_place(self, nest):
        self.place_id += 1
        self.place_names[self.place_name_id] = {
            "PLACE_NAME_STRG": nest, "PLACE_FK": self.place_id}
        self.places[self.place_id] = {}
        return self.place_id
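# Side note, a rough sketch only: the methods above detect a new place name by
# checking whether a set grew, then scan `self.place_names` to find the ID of
# a name seen before. The sketch below swaps in a different technique, a dict
# keyed by name, which gives the same "one ID per distinct name" result with a
# direct lookup. `PlaceNameRegistry` and `get_id` are hypothetical names, not
# part of gedkandu.py.

```python
class PlaceNameRegistry:
    """Hand out one ID per distinct place name (hypothetical helper)."""

    def __init__(self):
        self.ids_by_name = {}

    def get_id(self, name):
        # A name seen for the first time gets the next ID; repeats reuse theirs.
        if name not in self.ids_by_name:
            self.ids_by_name[name] = len(self.ids_by_name) + 1
        return self.ids_by_name[name]


registry = PlaceNameRegistry()
ids = [registry.get_id(nest) for nest in "Paris, Texas, USA".split(", ")]
ids += [registry.get_id(nest) for nest in "Dallas, Texas, USA".split(", ")]
print(ids)
```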
# ************************************** SOURCES ******************************
    def save_citations_assertions(self):
        """ The relationships between sources, citations, and assertions are
            too complex and too poorly represented in GEDCOM to be handled
            amidst other tasks. This needs to be done in isolation. For each
            source there can be many citations, but each citation refers to
            only one source. For each citation, there can be many assertions,
            but each assertion refers to only one citation. So what's needed
            is a two-stage nested dictionary with each stage representing a
            one-to-many relationship: first source:citation and then the
            citation:assertion inside of that.
            The resulting `self.sources` dict is copied into `self.exceptions`
            since the user will have to be asked to define each ASRTN_STRG as
            either a name, date, place, details, age, or role assertion. Any
            text not assigned to one of these categories can be assigned to an
            assertion note, deleted completely, edited, or left in the exception
            report to be handled later.
        """
        self.source_fk = None
        for idx,lst in self.lines.items():
            num, tag, value = lst[0:3]
            if tag not in (
                    "SORC_FK", "CTTN_STRG", "DUD", "ASRTN_STRG", "ASRTN_SRTY"):
                continue
            elif tag == "SORC_FK":
                self.source_fk = value
            elif tag == "CTTN_STRG":
                if self.citations_filter.get(self.source_fk) is None:
                    self.citations_filter[self.source_fk] = set()
                before = len(self.citations_filter[self.source_fk])
                self.citations_filter[self.source_fk].add(value)
                after = len(self.citations_filter[self.source_fk])
                self.current_citation_id = self.save_citation(
                    value, after, before)
            elif tag == "ASRTN_STRG":
                self.save_assertion(value)
            elif tag == "ASRTN_SRTY":
                self.update_assertion_surety(value)
        self.exceptions["undefined_assertions"] = self.sources
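# Side note, a rough sketch only: the two-stage nested dictionary described in
# the docstring above (source -> citations -> assertions) can be pictured with
# a stand-alone function. `nest_sources` and the sample lines are invented for
# illustration; the tag names are the ones the importer uses.

```python
def nest_sources(lines):
    """Group (tag, value) pairs into source -> citations -> assertions."""
    sources = {}
    source_fk = citation_id = assertion_id = None
    last_citation_id = last_assertion_id = 0
    for tag, value in lines:
        if tag == "SORC_FK":
            source_fk = value
            sources.setdefault(source_fk, {"citations": {}})
        elif tag == "CTTN_STRG":
            citations = sources[source_fk]["citations"]
            # Reuse a citation with the same text under the same source.
            for known_id, citation in citations.items():
                if citation["CTTN_STRG"] == value:
                    citation_id = known_id
                    break
            else:
                last_citation_id += 1
                citation_id = last_citation_id
                citations[citation_id] = {"CTTN_STRG": value, "assertions": {}}
        elif tag == "ASRTN_STRG":
            last_assertion_id += 1
            assertion_id = last_assertion_id
            assertions = sources[source_fk]["citations"][citation_id]["assertions"]
            assertions[assertion_id] = {"ASRTN_STRG": value}
        elif tag == "ASRTN_SRTY":
            assertions = sources[source_fk]["citations"][citation_id]["assertions"]
            assertions[assertion_id]["ASRTN_SRTY"] = value
    return sources


# Invented sample data: one source cited twice with the same citation text.
sample_lines = [
    ("SORC_FK", 1), ("CTTN_STRG", "page 12"),
    ("ASRTN_STRG", "born in Bremen"), ("ASRTN_SRTY", "3"),
    ("SORC_FK", 1), ("CTTN_STRG", "page 12"),
    ("ASRTN_STRG", "married in Bremen"),
]
sources = nest_sources(sample_lines)
print(sources)
```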
    def save_citation(self, value, after, before):
        if after > before:
            self.citation_id += 1
            self.sources[self.source_fk]["citations"][self.citation_id] = {}
            citation_id = self.citation_id
        else:
            for k,v in self.sources[self.source_fk]["citations"].items():
                if v["CTTN_STRG"] == value:
                    citation_id = k
                    break
        self.sources[self.source_fk]["citations"][citation_id]["CTTN_STRG"] = value
        if self.sources[self.source_fk]["citations"][citation_id].get(
                "assertions") is None:
            self.sources[self.source_fk]["citations"][citation_id]["assertions"] = {}
        return citation_id
    def save_assertion(self, value):
        self.assertion_id += 1
        assertions = self.sources[self.source_fk]["citations"][
            self.current_citation_id]["assertions"]
        # Create the assertion's dict if it doesn't exist yet, then fill it.
        assertions.setdefault(self.assertion_id, {})["ASRTN_STRG"] = value
    def update_assertion_surety(self, value):
        self.sources[self.source_fk]["citations"][self.current_citation_id][
            "assertions"][self.assertion_id]["ASRTN_SRTY"] = value
    def do_source_title(self, idx, num, tag, value):
        source_fk = int("".join(c for c in superior[num-1][1] if c.isdigit()))
        self.sources[source_fk]["SORC_TITL"] = value
    def update_contact_source(self, idx, num, tag, value):
        """ A source can be an object (e.g. grave marker), document (e.g. census),
            or a person (e.g. someone who sends you photos). Use CNTCT_FK in
            a SORC record to indicate sources that are human. Titles of individual
            photos in a photo collection can each be a citation.
        """
        source_fk = int("".join(c for c in superior[num-1][1] if c.isdigit()))
        self.sources[source_fk]["CNTCT_FK"] = value
        self.lines[idx] = [num, "CNTCT_FK", value]
    def do_source_creator(self, idx, num, tag, value):
        """ The tag AGNC represents a citation part. Citations should not be
            split into parts by genealogy software. No one uses this tag?
        """
        pass
    def do_source_abbrev(self, idx, num, tag, value):
        """ UNIGEDS doesn't use abbreviations or short versions of names as
            such, but source titles longer than what the user wants to see can
            be input as notes.
        """
        pass
    def do_source(self, idx, num, tag, value, more=""):
        self.source_id = value
        if self.sources.get(self.source_id) is None:
            self.sources[self.source_id] = {"SORC_TITL": "", "citations": {}}
        new_tag = "SORC"
        self.lines[idx] = [num, new_tag, value, more]
# *********************************** ROLES ******************************************
    def get_role_type(self, value):
        role_type = value.lower()
        conn = sqlite3.connect(global_db_path)
        cur = conn.cursor()
        cur.execute(
            "SELECT role_type_id FROM role_type WHERE role_types = ?",
            (role_type,))
        result = cur.fetchone()
        # Close the database before branching so that the early return below
        # can't leave the connection open.
        cur.close()
        conn.close()
        if result:
            role_type_id = result[0]
            ged.MAX_TYPE_IDS["role_type"] = role_type_id
        elif role_type in ("resident", "emigrant", "immigrant"):
            self.make_event(value)
            return False, None
        else:
            role_type_id = ged.MAX_TYPE_IDS["role_type"]
            role_type_id += 1
            self.types["role_types"][role_type] = role_type_id
        return True, role_type_id
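# Side note, a rough sketch only: the type lookup above can be tried without
# unigeds.db by using an in-memory SQLite database. The table and column names
# come from the query in get_role_type(); the helper name `find_role_type_id`
# and the two sample rows are invented for this example. `contextlib.closing`
# closes the cursor and connection even if a query fails.

```python
import sqlite3
from contextlib import closing


def find_role_type_id(conn, role_type):
    """Return the stored ID for `role_type`, or None if it isn't in the table."""
    with closing(conn.cursor()) as cur:
        cur.execute(
            "SELECT role_type_id FROM role_type WHERE role_types = ?",
            (role_type.lower(),))
        row = cur.fetchone()
    return row[0] if row else None


with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute(
        "CREATE TABLE role_type "
        "(role_type_id INTEGER PRIMARY KEY, role_types TEXT)")
    conn.execute(
        "INSERT INTO role_type (role_types) VALUES ('witness'), ('godparent')")
    witness_id = find_role_type_id(conn, "Witness")
    missing_id = find_role_type_id(conn, "pallbearer")

print(witness_id, missing_id)
```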
    def make_event(self, value):
        for role, event_type in ged.ROLE_TO_EVENT.items():
            if value == role:
                # `event_type` keeps the matching value after the break.
                break
        conn = sqlite3.connect(global_db_path)
        cur = conn.cursor()
        cur.execute(
            "SELECT event_type_id FROM event_type WHERE event_types = ?",
            (event_type,))
        event_type_id = cur.fetchone()[0]
        cur.close()
        conn.close()
        self.event_id += 1
        self.events[self.event_id] = {
            "EVNT_TYPE_FK": event_type_id, "PRSN_FK": self.role_person_id,
            "EVNT_AGE": "", "EVNT_AGE1": "", "EVNT_AGE2": "", #"CUPL_FK": None,
            "PLACE_NEST_FK": None, "EVNT_DETL": "",
            "EVNT_DATE": "", "EVNT_DATE_SORT": "0,0,0"}
# ******************************** RELATIONSHIPS **********************
    def get_relationship_types(self):
        self.cur.execute(
            "SELECT relationship_types, abbrev_rel_types FROM relationship_type")
        relationship_types = self.cur.fetchall()
        generic_family_role_types = set(i[1] for i in relationship_types)
        for role in generic_family_role_types:
            self.relationship_labels[role] = []
        for tup in relationship_types:
            full, abbrev = tup
            self.relationship_labels[abbrev].append(full)
    def print_results(self):
        # print("line", looky(seeline()).lineno, "self.custom_tags:", self.custom_tags)
        # print("line", looky(seeline()).lineno, "self.exceptions:", self.exceptions)
        # print("line", looky(seeline()).lineno, "self.lines:", self.lines)
        # print("line", looky(seeline()).lineno, "self.persons:", self.persons)
        # print("line", looky(seeline()).lineno, "self.names:", self.names)
        # print("line", looky(seeline()).lineno, "self.couples:", self.couples)
        # print("line", looky(seeline()).lineno, "self.events:", self.events)
        # print("line", looky(seeline()).lineno, "self.concatenations:", self.concatenations)
        # print("line", looky(seeline()).lineno, "self.places:", self.places)
        # print("line", looky(seeline()).lineno, "self.place_names:", self.place_names)
        # print("line", looky(seeline()).lineno, "self.nested_places:", self.nested_places)
        # print("line", looky(seeline()).lineno, "self.sources:", self.sources)
        # print("line", looky(seeline()).lineno, "self.repositories:", self.repositories)
        # print("line", looky(seeline()).lineno, "self.contacts:", self.contacts)
        # print("line", looky(seeline()).lineno, "self.media:", self.media)
        # print("line", looky(seeline()).lineno, "self.types:", self.types)
        # print("line", looky(seeline()).lineno, "self.remarks:", self.remarks)
        # print("line", looky(seeline()).lineno, "self.links_links:", self.links_links)
        # print("line", looky(seeline()).lineno, "self.change_dates:", self.change_dates)
        # print("line", looky(seeline()).lineno, "self.places_types:", self.places_types)
        # print("line", looky(seeline()).lineno, "self.transcriptions:", self.transcriptions)
        pass
if __name__ == "__main__":
    GedKanDu(import_file)