Scientists have finally filled the gaps in the sequence of the human Y chromosome, 20 years after the first draft of its DNA code was published.
The work reveals a complete catalog of the genes in the male sex chromosome — the last human chromosome to be fully sequenced — and how they're organized.
The findings, published Wednesday (Aug. 23) in the journal Nature, may make it easier to detect gene variants on the Y chromosome and link them to specific traits, as well as to develop personalized therapies for sex-specific genetic diseases. A companion paper, also published in Nature, used new software built on this advance to sequence the Y chromosomes of more than 40 additional men worldwide. The companion results show that the Y chromosome evolved to have extensive variation in its size and structure among individuals.
"This is an amazing piece of work and a landmark study in genetics," Dianne Newbury, a professor of medical genetics and genomics at Oxford Brookes University, who was not involved in the research, told Live Science in an email. "The sequencing of this unit [the Y chromosome] provides a complete picture of the whole human genome and will kickstart many studies into the sex chromosome that had not previously been possible."
The Y chromosome is one of two chromosomes, along with the X chromosome, that determine biological sex in humans. Females typically have two X chromosomes, whereas males usually have one of each type.
The Y chromosome is important for the development of male sex characteristics; it carries the sex-determining region Y (SRY) gene, which codes for a protein that promotes the development of the testicles and blocks the development of female organs such as the uterus and fallopian tubes. The chromosome was partially sequenced in 2003 as part of the Human Genome Project, which produced the first draft human genome. However, the Y chromosome has a complex structure containing many long, repetitive sequences of DNA, and until now, less than 50% of the sequence was known.
"When we're putting a genome back together, we can't sequence a whole chromosome in one piece; we can only sequence little bits and pieces, and then we have to stitch everything back together again," Adam Phillippy, lead author of one of the new studies and a senior investigator at the National Human Genome Research Institute, told Live Science. "And just like reconstructing a puzzle, it's the repetitive parts that look the same that are always the hardest, and you save those for the end."
Using a combination of cutting-edge DNA sequencing and machine-learning technologies, as well as insight acquired from sequencing the other 23 human chromosomes, Phillippy's team was able to completely assemble the Y chromosome.
They discovered it contains 62,460,029 base pairs of DNA — the pairs of "letters" that make up the rungs of DNA's ladder. That's 30 million more than in the most current reference genome, GRCh38, which has been continually improved over the past 20 years. The team corrected errors in GRCh38 and uncovered the complete structures of several gene families, such as DAZ and RBMY, which are involved in sperm production. They also identified 41 new protein-coding genes and provided insight into noncoding regions known as satellite DNA — which don't carry blueprints for proteins — as well as sequences that have often been mistaken for bacterial DNA in the past.
Using lessons they learned in building this new genome, called T2T-Y, the team then developed software called Verkko that can be used to efficiently assemble the full sequences of additional chromosomes. This is important because T2T-Y was constructed from only one donor's Y chromosome, so it doesn't give a complete picture of how the chromosome may vary among people.
In a separate study, scientists used Verkko to fully assemble the Y chromosomes of 43 men from populations across Africa, the Americas, Asia and Europe who donated DNA samples to the 1000 Genomes Project — an international project to create a catalog of human genetic variation. These sequences reflected the extent of modern diversity in the Y chromosome, with the 43 chromosomes and T2T-Y sharing a common ancestor around 183,000 years ago. The sequences ranged from 45.2 million to 84.9 million base pairs in length, and taken together, they show that the Y chromosome has substantially diversified in its size and structure over this time.
In particular, the team found "surprising" variation in the repetitive regions of the chromosome, Charles Lee, senior author of the companion study and scientific director and professor at the Jackson Laboratory for Genomic Medicine, told Live Science in an email. He explained that the Y chromosome contains the largest block of condensed, noncoding DNA — known as heterochromatin — in the whole human genome. This region is made up mainly of two repetitive DNA sequences, which were organized differently among participants in the study but always appeared in the same 1-to-1 ratio, in terms of the number of times they were found in the chromosome.
"To me, this strongly suggests some as-of-yet unknown functional significance for this chromosome region," Lee said. Indeed, scientists no longer consider heterochromatin to be "junk" DNA, with recent research suggesting it can play important roles in cells, and may have even contributed to the evolution of the human brain.
In their paper, Phillippy and his team wrote that existing reference genomes will need to be updated and that appropriate labels will have to be given to any newly identified regions of the Y chromosome. Lee's team hopes to learn more about the functional impact that structural variations in the Y chromosome may have on health and disease, and their possible link to human evolution, he said.
Kerstin Howe, head of production genomics at the Wellcome Sanger Institute, who was not involved in the research, reiterated the importance of this advancement. "Assembling complete sequences of 43 Y chromosomes across space and time not only helps us to investigate sex chromosome evolution but also human evolution more generally," she told Live Science in an email.
As for developing new therapies to treat diseases that may be linked to the Y chromosome, Phillippy said: "It's a long road from having the genome to developing, approving and rolling out some kind of therapy." Nevertheless, he is optimistic.
"This is like the blueprint that we're looking at, and if there's complete holes in it, you might not even know where to begin," he said. "But by having them filled in, we have the complete picture."