What are all these "approximate matches" in my search results?
Given the thousand years of history we reenact, and the uneven levels of literacy and calligraphy among those who record information for this database, there's always a certain amount of variation in spelling of people's names. For example, in the original version of this database, Her Excellency Viscountess Catherine Digby of Sherbourne was variously listed as Catherine Digbie of Sherborne, Catherine Digbie of Sherbourne and Catherine Digby of Sherbourne. (If I've got it right at last, it's because of a website in Ealdormere that told me so!) So we need a way to find people by name without any guarantee that their names will be correct.
Geeks among you will assume I've used some sort of regular expression or Soundex technique to achieve this, but that underestimates the problem's complexity (or overestimates my abilities). What I did instead was create a list of every name in a "platonic" form. This involved de-accenting accented letters, regularising vowels, and merging some similar letters together. The method is largely arbitrary, but it seems to work.
If we take the three spellings of the good viscountess, we can see how this works.
Recorded Name | Platonic Form |
---|---|
Catherine Digbie of Sherbourne | K.D.N D.GB F S.B.N |
Catherine Digbie of Sherborne | K.D.N D.GB F S.B.N |
Catherine Digby of Sherbourne | K.D.N D.GB F S.B.N |
(C and Q become K; T becomes D; all vowels become "."; H, Y and W are put in the Too Hard Basket and deleted, and as the table shows, R is dropped too; double letters are collapsed into a single letter; and leading and trailing vowels are stripped out.)
Note how all three spellings give the same result. A search for "Diggbee" (D.GB) or "Katharine" (K.D.N), for example, will turn up these names in the list of approximate matches, which is what we want.
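For the curious, the rules above can be sketched in a few lines of Python. This is only an illustration, not the database's actual code. The function names (`platonic`, `platonic_word`, `is_approximate_match`) are made up for the example, and the order in which the rules are applied is a guess. Dropping R is an assumption inferred from the table. The word-by-word comparison is likewise an assumption about how a search term is matched against a full name.

```python
import re
import unicodedata

VOWELS = set("AEIOU")
# H, Y and W are deleted per the rules above.
# R is an ASSUMPTION: the table only comes out right if R is dropped too.
DROPPED = set("HYWR")
# Similar letters merged together: C and Q become K, T becomes D.
MERGED = {"C": "K", "Q": "K", "T": "D"}


def platonic_word(word: str) -> str:
    """Reduce a single word to its 'platonic' form (hypothetical helper)."""
    # De-accent: decompose accented letters, then keep only the base letters.
    # Combining accent marks are not alphabetic, so isalpha() filters them out.
    decomposed = unicodedata.normalize("NFKD", word).upper()
    letters = [ch for ch in decomposed if ch.isalpha()]

    out = []
    for ch in letters:
        if ch in DROPPED:
            continue
        ch = MERGED.get(ch, ch)
        out.append("." if ch in VOWELS else ch)

    # Collapse any run of the same character (including ".") into one.
    collapsed = re.sub(r"(.)\1+", r"\1", "".join(out))
    # Strip leading and trailing vowels.
    return collapsed.strip(".")


def platonic(name: str) -> str:
    """Reduce a full name, word by word, skipping words that vanish entirely."""
    words = (platonic_word(part) for part in name.split())
    return " ".join(w for w in words if w)


def is_approximate_match(query: str, recorded_name: str) -> bool:
    """ASSUMPTION: a query matches if each of its platonic words
    appears among the platonic words of the recorded name."""
    recorded_words = set(platonic(recorded_name).split())
    query_words = platonic(query).split()
    return bool(query_words) and all(w in recorded_words for w in query_words)


if __name__ == "__main__":
    for name in (
        "Catherine Digbie of Sherbourne",
        "Catherine Digbie of Sherborne",
        "Catherine Digby of Sherbourne",
    ):
        print(f"{name} -> {platonic(name)}")
    print(is_approximate_match("Diggbee", "Catherine Digby of Sherbourne"))
```

Under these assumptions, all three recorded spellings reduce to the form shown in the table, and a search for "Diggbee" or "Katharine" matches each of them.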
For more information about the approximate matching algorithm, or to make suggestions on how to improve it, please contact Mortar Herald directly.