A simplified step-by-step process for a forensic genealogy investigation.
Step 1
Obtain and Assess DNA Data
Receive the Single Nucleotide Polymorphism (SNP) profile from the DNA sample. The SNP profile is typically generated from a bone or tooth sample using whole-genome sequencing or a SNP array. The profile typically contains 600,000–700,000 SNPs, compatible with consumer genealogy databases like GEDmatch or FamilyTreeDNA. SNPs provide detailed genetic markers suited to distant kinship matching. Unlike CODIS, which uses STRs (short tandem repeats) for direct matches, the SNP profile allows comparison with public genealogy databases.
Technical Notes: Ensure the DNA file is in a format compatible with target databases (GEDmatch requires specific SNP sets). If the sample is degraded, additional bioinformatics processing (imputation) may be needed to enhance match-ability.
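A quick sanity check before upload can be sketched as below. This is a minimal illustration that assumes a 23andMe-style raw data layout (tab-separated rsid, chromosome, position and genotype columns, with comment lines starting with "#" and "--" for no-calls). The function name and the sample rows are invented for the example; other labs and databases may use different layouts.

```python
import io


def summarise_raw_snp_file(handle):
    """Count SNP rows and no-calls in a tab-separated raw genotype file."""
    total = 0
    no_calls = 0
    for line in handle:
        line = line.strip()
        # Skip blank lines and comment/header lines beginning with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # malformed row; ignored in this sketch
        genotype = fields[3]
        total += 1
        # Assumption: a '-' in the genotype marks a position with no call.
        if "-" in genotype:
            no_calls += 1
    call_rate = (total - no_calls) / total if total else 0.0
    return {"total": total, "no_calls": no_calls, "call_rate": call_rate}


# Invented sample data, for illustration only.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAA\n"
    "rs0000002\t1\t2000\tAG\n"
    "rs0000003\t2\t3000\t--\n"
    "rs0000004\t2\t4000\tCC\n"
)
summary = summarise_raw_snp_file(sample)
print(summary)
```

A low call rate on a real file would be one signal that the sample is degraded and that imputation may be worth considering.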
Step 2
Upload DNA to Genealogy Databases
Upload the SNP profile to your platform of choice.
- GEDmatch: A free platform where users upload consumer DNA test results (e.g., from AncestryDNA, 23andMe). It offers tools like “One-to-Many” matching to identify relatives.
- FamilyTreeDNA: A commercial database with opt-in law enforcement matching. It provides similar SNP-based matching.
- Settings: Ensure the profile is flagged as a “research” or “law enforcement” profile to comply with database policies.
These databases contain millions of user profiles, increasing the likelihood of finding genetic relatives. Other platforms like AncestryDNA and 23andMe prohibit law enforcement uploads.
Technical Notes: GEDmatch uses a proprietary algorithm to compare SNPs and estimate shared centiMorgans (cM), a unit of genetic linkage used here to quantify how much DNA two people share. For example, a first cousin shares ~850 cM, while a third cousin shares ~25–100 cM. Uploads require the data to be normalised to match the database's SNP panels.
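The cM figures above follow roughly from the average fraction of autosomal DNA that two relatives share. The sketch below is a back-of-the-envelope calculation, not GEDmatch's algorithm. It assumes a total of about 6,800 cM across both copies of the autosomes (an approximation; tools differ in the total they use), and the names TOTAL_CM and AVERAGE_FRACTION are invented for the example.

```python
# Assumption: approximate total autosomal length counted over both copies.
TOTAL_CM = 6800.0

# Average fraction of autosomal DNA shared, by relationship.
AVERAGE_FRACTION = {
    "first cousin": 1 / 8,
    "second cousin": 1 / 32,
    "third cousin": 1 / 128,
}


def expected_shared_cm(fraction_shared, total_cm=TOTAL_CM):
    """Return the average shared cM for a given shared fraction."""
    return total_cm * fraction_shared


for relationship, fraction in AVERAGE_FRACTION.items():
    print(f"{relationship}: ~{expected_shared_cm(fraction):.0f} cM on average")
```

Under these assumptions the averages come out at 850 cM for a first cousin, 212.5 cM for a second cousin and about 53 cM for a third cousin. Real matches vary widely around these averages, which is why ranges rather than single values are used in practice.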
Step 3
Analyse DNA Matches
Review the list of genetic matches to identify close relatives. GEDmatch lists matches with shared cM and estimated relationship (e.g., “2nd–3rd cousin”). FamilyTreeDNA uses a similar interface, with additional chromosome browser tools to confirm shared segments. Prioritise matches sharing >100 cM (likely 2nd cousins or closer) for efficiency.
GEDmatch segment triangulation: Identifies shared DNA segments among multiple matches, indicating a common ancestor.
Leeds Method: A manual clustering technique that groups matches by shared ancestry using a spreadsheet, e.g. Excel, Google Sheets or Proton Docs.
High-cM matches indicate closer relatives, narrowing the search. Triangulation confirms which matches share a common ancestor, reducing false positives.
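Technical Notes: A match sharing 200 cM might be a 2nd cousin. Second cousins share about 1/32 (roughly 3%) of their autosomal DNA on average, which corresponds to a kinship coefficient of 1/64. Use statistical models (Shared cM Project) to estimate relationships, and check for endogamous (inbred) populations, where shared cM values are inflated.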
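The Leeds Method is normally done by hand with coloured spreadsheet columns, but the grouping logic can be sketched in a few lines. This is a simplified illustration: the function name, the match labels and the shared-match lists are all hypothetical, and a real workflow would take the shared-match lists from the database's own tools.

```python
def leeds_clusters(matches, shared_with):
    """Assign cluster numbers ("colours") in the style of the Leeds Method.

    matches: match names ordered from highest to lowest shared cM.
    shared_with: dict mapping a match to the set of other matches
                 who also match that person.
    Returns a dict of match name -> list of cluster numbers.
    """
    colours = {match: [] for match in matches}
    next_colour = 0
    for match in matches:
        if colours[match]:
            continue  # already coloured via an earlier match
        # Start a new colour for the first uncoloured match...
        next_colour += 1
        colours[match].append(next_colour)
        # ...and give the same colour to everyone who shares DNA with them.
        for other in shared_with.get(match, set()):
            if other != match and other in colours:
                colours[other].append(next_colour)
    return colours


# Hypothetical matches, for illustration only.
matches = ["A", "B", "C", "D", "E"]
shared_with = {"A": {"B"}, "B": {"A"}, "C": {"D"}, "D": {"C"}, "E": set()}
clusters = leeds_clusters(matches, shared_with)
print(clusters)
```

Each colour ideally corresponds to one ancestral line. A match that ends up with more than one colour may descend from more than one of those lines, or may point to endogamy.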
Step 4
Build Ancestral Trees for Matches
Construct family trees for the top 5–10 matches, those sharing 100–400 cM. The following data sources can be used for family trees:
Ancestry.com: Access public and private family trees, census records, birth/death certificates, and marriage records.
MyHeritage: Similar to Ancestry, with strong European records.
FamilySearch.org: Free, with extensive historical records.
Newspapers: Obituaries and articles for recent ancestors.
Public Records: Use sites like BeenVerified or PeopleFinders for living relatives.
Start with the match’s known ancestors from public trees or database profiles. Trace backward to great-grandparents, then forward to identify all descendants. Ancestry ThruLines suggests common ancestors between the match and the unknown individual. MyHeritage Theory of Family Relativity offers similar automated ancestor matching. Keep careful notes to track lineages and cross-reference matches.
Technical Notes: Building trees identifies the most recent common ancestor (MRCA) among matches, pinpointing the family line of the unknown individual (John Doe). Use standardised genealogy formats (GEDCOM) to organise trees. Cross-check records for accuracy, e.g. verify a 1940 census record against a birth certificate. Account for misattributed parentage, such as non-paternity events and adoptions, which can make documented relationships differ from the DNA evidence.
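GEDCOM files are plain text, with each line made up of a level number, an optional cross-reference ID, a tag and an optional value. The sketch below pulls individuals' names out of GEDCOM-style text. It is a minimal illustration, not a full parser: the function name and the sample records are invented, and real files carry fuller headers and many more tags.

```python
def parse_gedcom_individuals(text):
    """Return {xref_id: name} for each INDI record in GEDCOM-style text."""
    individuals = {}
    current = None
    for raw in text.splitlines():
        parts = raw.strip().split(" ", 2)
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # not a recognisable GEDCOM line
        level = int(parts[0])
        if level == 0:
            # Level-0 records look like '0 @I1@ INDI' or '0 HEAD'.
            if len(parts) == 3 and parts[2] == "INDI":
                current = parts[1]
                individuals[current] = None
            else:
                current = None
        elif level == 1 and current and parts[1] == "NAME":
            name = parts[2] if len(parts) == 3 else ""
            # Surnames are wrapped in slashes, e.g. 'John /Smith/'.
            individuals[current] = name.replace("/", "").strip()
    return individuals


# Invented sample records, for illustration only.
sample_gedcom = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 SEX M
0 @I2@ INDI
1 NAME Mary /Jones/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 TRLR"""
people = parse_gedcom_individuals(sample_gedcom)
print(people)
```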
Step 5
Identify Convergence Points
Find where the family trees of multiple matches intersect and identify MRCAs (e.g. a great-grandparent couple shared by two 2nd cousins). Trace the descendants of the MRCA to build out a reverse genealogy tree, including all living or recently deceased individuals who could be John Doe.
The DNA Painter Shared cM tool confirms relationship probabilities based on cM values. What Are The Odds? (WATO) is a Bayesian tool on DNA Painter to model possible placements of John Doe in the family tree based on cM matches.
Convergence points narrow the candidate pool. For example, if three matches share a great-grandparent couple, John Doe is likely a descendant of that couple.
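Finding a convergence point amounts to intersecting the ancestor sets of several matches and then walking back down from the shared ancestors. The sketch below illustrates this on a toy tree. All names and the child-to-parents mapping are hypothetical. It returns every shared ancestor, so picking the most recent one would additionally need generation depth.

```python
def ancestors(person, parents):
    """Return all known ancestors of a person from a child -> parents map."""
    seen = set()
    stack = [person]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def shared_ancestors(matches, parents):
    """Return the ancestors common to every match."""
    sets = [ancestors(match, parents) for match in matches]
    return set.intersection(*sets) if sets else set()


def descendants(ancestor, parents):
    """Return all known descendants by inverting the child -> parents map."""
    children = {}
    for child, child_parents in parents.items():
        for parent in child_parents:
            children.setdefault(parent, []).append(child)
    found = set()
    stack = [ancestor]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


# Hypothetical toy tree: three lines descending from one couple.
parents = {
    "match_a": ("a_parent",),
    "a_parent": ("a_grandparent",),
    "a_grandparent": ("gg_father", "gg_mother"),
    "match_b": ("b_parent",),
    "b_parent": ("b_grandparent",),
    "b_grandparent": ("gg_father", "gg_mother"),
    "candidate": ("c_parent",),
    "c_parent": ("c_grandparent",),
    "c_grandparent": ("gg_father", "gg_mother"),
}
common = shared_ancestors(["match_a", "match_b"], parents)
pool = descendants("gg_father", parents)
print(sorted(common))
print(sorted(pool))
```

Here the two matches converge on the great-grandparent couple, and the descendant pool includes a third line ("candidate") that neither match's own tree would have revealed.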
Technical Notes: WATO uses probabilistic modelling to calculate likelihoods, e.g. John Doe is 3 times more likely to be a grandchild than a great-grandchild. Account for missing records or incomplete trees by hypothesising “placeholder” individuals.
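The idea behind comparing hypotheses can be sketched as follows. This is an assumption-laden illustration of the general approach (multiply the per-match probabilities for each hypothesis, then express each result relative to the least likely hypothesis), not WATO's actual implementation. The probabilities are made-up placeholders. In practice they would come from a relationship probability table for each match's shared cM.

```python
import math


def combined_score(per_match_probabilities):
    """Multiply the probabilities of each match under one hypothesis."""
    return math.prod(per_match_probabilities)


def odds_ratios(hypotheses):
    """Express each hypothesis score relative to the least likely one."""
    scores = {name: combined_score(p) for name, p in hypotheses.items()}
    positive = [score for score in scores.values() if score > 0]
    if not positive:
        return {name: 0.0 for name in scores}
    baseline = min(positive)
    return {name: score / baseline for name, score in scores.items()}


# Placeholder probabilities for two matches under each hypothesis.
hypotheses = {
    "grandchild": [0.6, 0.5],
    "great-grandchild": [0.4, 0.25],
}
ratios = odds_ratios(hypotheses)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

With these placeholder values the grandchild hypothesis comes out about three times as likely as the great-grandchild hypothesis. A hypothesis with a zero probability for any match scores zero and is effectively ruled out.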
Step 6
Narrow Down Candidates
Generate a shortlist of potential identities for John Doe. Filter descendants by age, sex, and location. For example, if John Doe’s remains suggest a male aged 30–50, exclude females and those outside this age range. Cross-reference with missing-person databases like National Missing Persons Coordination Centre, Australian Missing Persons Register and Crime Stoppers Australia.
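The filtering described above can be sketched as a simple pass over the descendant list. Everything here is hypothetical: the Candidate fields, the names and the reference year are invented for the example, and age is approximated as the reference year minus the birth year.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sex: str  # "M" or "F"
    birth_year: int
    last_known_region: str


def shortlist(candidates, sex, min_age, max_age, reference_year, region=None):
    """Keep candidates matching the forensic estimates for the remains."""
    kept = []
    for candidate in candidates:
        # Approximate age at the reference year (e.g. estimated year of death).
        age = reference_year - candidate.birth_year
        if candidate.sex != sex:
            continue
        if not (min_age <= age <= max_age):
            continue
        if region is not None and candidate.last_known_region != region:
            continue
        kept.append(candidate)
    return kept


# Invented descendants, for illustration only.
people = [
    Candidate("Person A", "M", 1975, "South Australia"),
    Candidate("Person B", "F", 1974, "South Australia"),
    Candidate("Person C", "M", 1950, "South Australia"),
    Candidate("Person D", "M", 1970, "Victoria"),
]
result = shortlist(people, sex="M", min_age=30, max_age=50, reference_year=2010)
print([candidate.name for candidate in result])
```

Treating location as an optional filter is a deliberate choice in this sketch, since people move and a last known region is weaker evidence than sex or estimated age.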
Use social media like Facebook genealogy groups to find living relatives who might provide context: “Did your uncle go missing in 2010?”. This step refines the candidate list to a manageable number (2–5 individuals) for further investigation.
Technical Notes: Use Boolean search operators on social media and search engines to refine results. Verify candidate details against forensic data, e.g. estimated height and dental records.
Step 7
Confirm Identity with Targeted Testing
Contact living relatives of shortlisted candidates, discreetly, and request a targeted DNA test for comparison with John Doe’s profile. GEDmatch's One-to-One tool confirms close relationships, e.g. >1500 cM for siblings. A direct DNA match confirms identity with high certainty, e.g. 99.99% probability for a parent-child relationship.
Technical Notes: Targeted testing uses STRs or SNPs, depending on the lab. Ensure chain of custody for legal admissibility.
Step 8
Report Findings
Compile a report for law enforcement. Include the family tree, DNA match analysis, and candidate shortlist. Detail the confirmation process and provide a narrative explaining how John Doe was identified e.g. “John Doe is likely John Smith, born 1975, based on a sibling DNA match and corroborating records”.
Use Proton Docs, Word or Google Docs for report writing, and a tool such as Lucidchart for visualising family trees. A clear, evidence-based report ensures law enforcement can act on the findings.
Technical Notes: Cite all sources, including census records and DNA match IDs, and include statistical confidence levels (e.g. WATO probabilities).
Case Considerations
Degraded DNA: If the SNP profile is incomplete, use advanced bioinformatics (e.g., imputation from low-coverage sequencing data) to reconstruct usable data. In populations with high intermarriage, cM values may be inflated, requiring adjusted thresholds.
Privacy/Ethics: Adhere to database policies and avoid contacting matches directly without law enforcement approval.
False Positives: Cross-check matches with multiple tools to reduce errors.
Edge Cases: If no close matches (>100 cM) are found, expand to distant matches (20–50 cM), which require larger, more complex trees. If adoption or non-paternity is suspected, use Y-DNA or mitochondrial DNA via FamilyTreeDNA to trace paternal/maternal lines.
Maintain a chain of evidence for DNA data to ensure legal defensibility.
Example Outcome
After uploading John Doe’s SNP profile to GEDmatch, I find three matches:
A 2nd cousin (220 cM), a 2nd–3rd cousin (150 cM), and a 3rd cousin (80 cM).
Using Ancestry.com, I build trees for each match, identifying a common great-grandparent couple in South Australia, 1920s. Tracing descendants, I shortlist three males aged 30–50, one of whom, John Smith, went missing in 2010.
A sibling DNA test confirms John Smith as John Doe (1800 cM match). The report is submitted to law enforcement, who notify the family.