Yaniv Erlich, a geneticist at Columbia University, was far from surprised at last week’s news that police may have found a serial murderer and rapist, California’s long-sought Golden State Killer, by tapping a public DNA database to match crime scene DNA: Erlich had cautioned in a June 2014 article about genetic privacy, published in Nature Reviews Genetics, that GEDmatch, the website that was reportedly used, could allow for such “genealogical triangulation.” On GEDmatch, people voluntarily supply their own DNA sequences that they obtain through consumer sequencing companies—such as MyHeritage, where Erlich serves as chief science officer—and provide email addresses, which allows presumed relatives to contact each other. In this case, the investigators fished the database with a DNA sequence obtained from a frozen, 37-year-old rape kit used in a murder case attributed to the Golden State Killer.
Police have not yet revealed precise details about how GEDmatch or other such sites were used, but Erlich, who was not involved with cracking this decades-old case, spoke with Science about how the suspect’s DNA sequence likely led to his arrest and about related privacy issues.
This interview has been edited for brevity and clarity.
Q: How do you think police narrowed down the many matches they found on GEDmatch?
A: I would be surprised if it was more distant than a second cousin—[it was] probably a first cousin because with a second you have too many people. Then they had three choices: no cooperation, just figure out the family tree; contact the relative and make up a story like, “I’m an adoptee and saw you on GEDmatch”; or explain, “We’re the police and you’re not a suspect but you can help us because of your DNA.” Probably the safest thing is to come up with a story and say, “Oh, thank god I found you, let’s meet.” When they meet, police come as a team and say we’re investigating this type of thing, please walk us through your family tree. It’s not very nice to say no. Then if you have 20 people on the tree, it’s quite trivial to go for the one person you’re looking for who is quite old, male, lives in California, and who, some of the victims said, had light colored eyes.
Q: GEDmatch has information from the sex chromosomes (X and Y) and the autosomes (the 22 other chromosomes). You think only the autosomal data are needed?
A: Yes. It’s been reported that one search tried to use the Y chromosome and had a poor match. [Editor’s note: An earlier search of a Y chromosome database, Associated Press reported, mistakenly led police to target another man.] X wouldn’t be very helpful because it goes in different directions: If the closest match was female, the X can go from her mother’s side or her father’s side. You don’t know which line to go from. And it’s only one chromosome. It helps, but it’s much more complicated.
Q: There’s a lot of concern about privacy being compromised here, but people voluntarily put their data into GEDmatch.
A: It’s not like people fully understand the consequences of putting their DNA into a public database. They think, “So many people use the website, so it’s OK.” Or: “Oh, it’s a website for genealogy.” What if it was called Police Genealogy? People wouldn’t do it. We don’t think about everything. We think about the most likely thing.
Q: You’ve shared your own genetic data. Have you learned anything?
A: I found someone who was my fifth cousin, a man in Poland who didn’t know he had Jewish roots. This match was very important to him. My entire family for the first time was going to Poland—me, my father, my grandfather, an aunt, and a cousin—and I told him, “Why don’t you join us?” He came and spent 2, 3 days with us. He later made Aliyah [moved to Israel] and converted to Judaism. Why do people do genealogy and why do we have these databases? Every day we connect to people we want to find, and once in a blue moon, we find there’s a murderer or something.
Q: We’ve both put our DNA into GEDmatch, and we both have Jewish ancestry. Can we compare and see how closely we match?
A: Sure. [Cohen: We each put the other’s “kit” number into GEDmatch with our own kit number and select a graphic display of each chromosome that shows regions that match.] We share something at chromosome 16 since we’re both half Ashkenazi: Every pair of Ashkenazi Jews behaves like fourth cousins. But we don’t share meaningful stuff. Your father and my mother are underrepresented populations in the database. We don’t get many relatives there. [Editor’s note: Cohen’s father was Yemenite, Erlich’s mother Bukhari.]
Q: Do you think this arrest is going to change anything with GEDmatch and other public DNA databases?
A: It’s too soon to say. Let’s wait to see how they did it. But it’s good to have these types of discussions and to respect people’s autonomy. Privacy means different things to different people.