UC Santa Cruz has announced a grant of up to $1 million from the Simons Foundation to the school's Genomics Institute to develop a comprehensive map of human genetic variation.
The current model for analyzing human genome data relies on a single reference sequence for the human genome, onto which all new sequencing data is mapped to identify variants, a process that leads to biases and mapping ambiguities. In an earlier Simons Foundation-funded project, researchers David Reich and Nick Patterson amassed more than three hundred complete human genome sequences representing a range of ethnicities. In the UC Santa Cruz project, a team led by Genomics Institute director David Haussler and co-investigator Benedict Paten will use the Reich-Patterson set of genome sequences, which they say is deeper and more completely organized than any other data set, to build a new graph-based human reference genome structure, the Human Genome Variation Map. The map is intended to replace isolated, incompatible databases around the world with a single fundamental representation formalized as a very large mathematical graph.
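As a rough illustration of what "formalized as a mathematical graph" can mean, the toy sketch below stores short stretches of DNA as nodes and represents each individual genome as a path through them. It is not the project's data model: the class name `SequenceGraph`, its methods, and the example sequences are all hypothetical and chosen only to show the idea.

```python
from collections import defaultdict


class SequenceGraph:
    """Toy sequence graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}                 # node id -> DNA fragment
        self.edges = defaultdict(set)   # node id -> ids of allowed successors

    def add_node(self, node_id, sequence):
        self.nodes[node_id] = sequence

    def add_edge(self, src, dst):
        self.edges[src].add(dst)

    def is_path(self, path):
        # A valid path visits known nodes and follows existing edges only.
        known = all(node_id in self.nodes for node_id in path)
        linked = all(b in self.edges[a] for a, b in zip(path, path[1:]))
        return known and linked

    def spell(self, path):
        # Concatenate the fragments along a path to recover the sequence.
        if not self.is_path(path):
            raise ValueError("not a valid path in this graph")
        return "".join(self.nodes[node_id] for node_id in path)


# Made-up example: two genomes that differ at a single base (A versus G).
graph = SequenceGraph()
graph.add_node(1, "ACGT")
graph.add_node(2, "A")      # one variant
graph.add_node(3, "G")      # the alternative variant
graph.add_node(4, "TTC")
for src, dst in [(1, 2), (1, 3), (2, 4), (3, 4)]:
    graph.add_edge(src, dst)

hap_a = graph.spell([1, 2, 4])
hap_b = graph.spell([1, 3, 4])
print(hap_a, hap_b)
```

In this sketch neither genome is privileged as "the" reference: both are simply paths through the same structure, which is the property a graph-based reference is meant to provide.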
To that end, the UC Santa Cruz team will collaborate with leading genomics researchers at other institutions to develop algorithms and formulate the best mathematical approach for constructing the new map. Initial work on developing a standard data model for the map is already under way in a group co-chaired by Paten.
"One exemplary human genome cannot represent humanity as a whole, and the scientific community has not been able to agree on a single precise method to refer to and represent human genome variants," said Haussler. "There is a great deal we still don't know about human genetic variation because of these problems."