Darvin L. Martin

We are all relieved to know that a fugitive serial rapist and killer has finally been apprehended, and perhaps a bit amazed that this was accomplished using technology not available when the crimes were committed. The resolution and subsequent sentencing of the Golden State killer case is a sign of our times: the easily accessible and readily transmitted information of individuals can be used to curb crime and apprehend those involved. Yet this leaves us a bit unsettled, because the methods used raise the question of whether data privacy has any meaning, or whether any attempt at privacy is simply a wasted effort. Whether we consider the tracking of our cell phones, the monitoring of our credit card purchases or the dissemination of our genetic data, our privacy appears to be an illusion. Welcome to 2018.

Two pieces of technology have converged to give us this troubling scenario, a circumstance impossible in 1990 or 2003, or even 2015. Genetic tests have become cheap and commonplace, affordable to any hobbyist. In addition, one can download, copy and transmit millions of bits of data at virtually no cost. Not only are our genomes readily available if we choose to test, we can copy and send our data files to hundreds of our distant relatives, or to anyone we do not know, in an instant and with no apparent effort. And they, in turn, can copy and share these same records with anyone they want, with no apparent consequences.

For those of us who seek to find more information about our ancestors, or attempt to catalogue the intricate details of family connections, or wish to uncover a complex composite ethnic background, comparative DNA studies set us in the direction of our goals. We weigh the potential of sharing our genetic and genealogical data against the reality that our data could be used in ways we never intended, by people who should not necessarily have access. And we decide to go for it anyway, because the value of this data in connecting families outweighs the concern that it will be used by others at odds with our intentions. However, it is not so easy to convince our uninterested relatives that our intentions are benign, and that our data and theirs are appropriately controlled to limit how they are used.

Data privacy has been a pertinent issue, thoroughly addressed by the major DNA testing companies since the beginning of this field about 15 years ago. Every one of them has detailed procedures to enable and maintain privacy. Raw genomic files are not made public. Instead, particular SNPs (and in the case of yDNA, particular STR values) are compared to determine matches based on the algorithms established by the testing company. You choose your level of consent, which determines what other people can see and do with your data. You choose your level of anonymity, that is, how closely your data is associated with your name, email, or further contact information. Safeguards are in place because, for the big players in the market, a breach of privacy has always been a major liability. Privacy concerns have the potential to kill the industry.

And that’s exactly why law enforcement in the Golden State killer case didn’t pursue the big genealogy-based genetic testing companies when trying to identify the DNA extracted from a crime scene. These companies simply do not readily allow external evaluation of raw data files, and have policies against allowing samples such as those from a crime scene to be submitted into their databases. It is simply not their purpose to use this information to prove criminal activity. Not so with the smaller companies founded on the principle of voluntary exchange of personal information. GEDMatch, while certainly valuable to family history researchers, is one such open-access service, built on volunteers allowing public access to their unfiltered genetic data. Here, you can willingly upload your raw data files to a public database and scroll through the data of other volunteers who have similarly uploaded their genetic files in search of matches. In addition, you can link these raw data files to your own genealogy, so when similarities are found, an avid researcher can scroll among four or five or a dozen or a hundred matches and potentially find an ancestor common to some of them.

This is what law enforcement did in the case above. They extracted DNA from a crime scene, formatted the raw genetic data so that it could be uploaded to GEDMatch, and then searched for matches. Once matches were found, the investigators searched the available genealogies of those matches to find a common ancestor. By doing so, they narrowed the field to a finite number of possible identities for the criminal, and began a process of elimination that ultimately solved the case.
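The tree-comparison step described above can be illustrated with a small sketch. This is not the investigators' actual method, nor anything provided by GEDMatch; it simply assumes each match has published a family tree that can be reduced to a set of ancestor names, and counts which ancestors appear in more than one tree. The function name `shared_ancestors` and the sample data are hypothetical.

```python
from collections import Counter

def shared_ancestors(trees: dict[str, set[str]], min_matches: int = 2) -> list[tuple[str, int]]:
    """Return ancestors named in at least `min_matches` trees, most frequent first.

    `trees` maps a match label to the set of ancestor names in that match's
    published genealogy (an assumed, simplified representation).
    """
    counts = Counter(name for ancestors in trees.values() for name in ancestors)
    candidates = [(name, n) for name, n in counts.items() if n >= min_matches]
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))

# Hypothetical data: three DNA matches and the ancestors listed in their trees.
example_trees = {
    "match_A": {"Ancestor 1", "Ancestor 2", "Ancestor 3"},
    "match_B": {"Ancestor 2", "Ancestor 4"},
    "match_C": {"Ancestor 2", "Ancestor 3", "Ancestor 5"},
}

if __name__ == "__main__":
    for name, n in shared_ancestors(example_trees):
        print(f"{name} appears in {n} trees")
```

In practice names are misspelled, duplicated or simply wrong, so a real search involves far more manual genealogy than a set intersection. The sketch shows only why a handful of public trees can point to one ancestral couple.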

Concerned? In this particular case a crime was solved and a criminal apprehended, for our safety and the betterment of society. Yet perhaps we all should be concerned. Our cultural norms dictate that you’re allowed to do with your data what you want, but you had better not mess with mine. However, this intensely individualistic notion does not match up with reality. Nature itself hasn’t followed our norms, but instead showcases a continuum of connectedness between individuals that we can readily assess by comparing genomes. It does not matter that you personally decide not to upload your individual genome to the public domain, because your cousin has uploaded his. About 12.5% of what is individually unique in your genome is shared with a first cousin, 25% with an aunt or uncle, and 50% with a parent or sibling. This is not an individual choice, because in essence a close relative will make the choice to upload their genome whether you like it or not. Even the 3.125% of unique genetic material you share with every one of those scores of second cousins is specific to the small tribe made up of you and the other descendants of a common great-grandparent. Somebody in the family will be less concerned about privacy than you, and will provide their genetic information and genealogy for public view long before you know, and long before you can convince him or her otherwise.
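The percentages quoted above follow from a simple rule of thumb: each generational step (each meiosis) between two relatives halves the expected shared DNA, and relatives who descend from a shared couple are connected through two such paths. The sketch below applies that rule. The function and table names are illustrative, and the results are expected averages; the amount actually shared between any two relatives, other than parent and child, varies around them.

```python
def expected_shared_fraction(meioses: int, paths: int = 2) -> float:
    """Expected fraction of autosomal DNA shared by two relatives.

    `meioses` is the number of generational steps linking them through a
    common ancestor; `paths` is the number of such connecting paths
    (2 when they descend from a shared couple, 1 for a direct parent).
    """
    return paths * 0.5 ** meioses

# (meioses, paths) for the relationships mentioned in the text.
RELATIONSHIPS = {
    "parent": (1, 1),
    "full sibling": (2, 2),
    "aunt or uncle": (3, 2),
    "first cousin": (4, 2),
    "second cousin": (6, 2),
}

def shared_percentages() -> dict[str, float]:
    """Map each relationship to its expected shared percentage."""
    return {
        name: 100 * expected_shared_fraction(meioses, paths)
        for name, (meioses, paths) in RELATIONSHIPS.items()
    }

if __name__ == "__main__":
    for name, pct in shared_percentages().items():
        print(f"{name}: {pct}%")
```

For a first cousin this gives 2 × (1/2)^4 = 12.5%, and for a second cousin 2 × (1/2)^6 = 3.125%, matching the figures in the paragraph.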

And perhaps we need to accept this new reality. But there are limitations. None of these databases which use DNA for genealogical purposes is secured or controlled in the way drug screening is. None requires proof of identification to establish that the tester is actually the individual listed on the test. And what about those who upload their data under fake names for the sake of privacy? What about those who upload genealogical data full of mistakes, guesswork or wild speculation? How will law enforcement know the difference?

And where does this unfettered public screening of data end up? If police officers can solve crimes using our publicly available genomes, will insurance companies also get hold of this data and tailor coverage accordingly? What will you do when a prospective employer insists on your raw genomic data file along with your job application? Already our doctors are storing this information to customize medical treatment according to our individual genomes.

Needless to say, the issue of privacy is not going away, and it will become more crucial as artificial intelligence develops more effective means of sorting through our haystacks of information to find the meaningful bits, for those whose intentions are good or ill. There are some basic safeguards for those of us who are interested in using DNA to explore our ancestries and develop new connections with distant relatives, yet also want some measure of control and privacy with regard to our data. Here are four suggested guidelines for the researcher concerned with privacy.

  1. Refrain from uploading your raw genetic data to third-party sites. Each of the big companies (FTDNA, 23andMe and Ancestry.com) has its own internal database from which it determines matches. If you test with one of these companies, keep your data only within the database of the company with which you’ve tested. Your data is more secure kept within the testing company than moved to a third-party site.
  2. Do not publicly display your genetic data with the genealogical data of close living family members, or even of close relatives who have died. Instead, set up an ahnentafel as a Word document and save it as a PDF. Begin with your great-grandparents and work back to your immigrant ancestors. Have this available to share with matches upon request, after you begin a conversation. Know the person you are talking to, and don’t make your data available to complete strangers. Start a conversation with your matches first, so you can build trust.
  3. When joining DNA projects which make your DNA searchable in a database, set up at least some measure of anonymity associated with your DNA. Rather than your name and email, share instead the name of your immigrant ancestor. Perhaps list only your surname and county of origin. Perhaps use an email address you have set up only for genealogical purposes, one that cannot be tied to your home address or otherwise used to identify you.
  4. And while you are cautious and selective about whom you share your genetic information with, also have fun! DNA testing is a journey, and with it you will discover new truths to share with your friends and family. On that journey, finding the best balance between privacy and learning about your ancestors from newly found distant relatives is all part of the ride.

Darvin L. Martin is administrator of the Mennonite and Amish Immigrants DNA Project through FamilyTreeDNA.