#Genetics #AncientDNA #Chinese #PaternalInheritance

Wang, M., Huang, Y., Liu, K. et al. (2024) ‘Multiple human population movements and cultural dispersal events shaped the landscape of Chinese paternal heritage’, Molecular Biology and Evolution, 41(7), msae122.

Multiple Human Population Movements and Cultural Dispersal Events Shaped the Landscape of Chinese Paternal Heritage

다수의 인구 이동과 문화 확산이 중국 부계 유전 지형을 형성한 과정

Mengge Wang,^1,2,3,†,* Yuguo Huang,^1,† Kaijun Liu,^4,5 Zhiyong Wang,^1,6 Menghan Zhang,^7,8 Haibing Yuan,² Shuhan Duan,^1,9 Lanhai Wei,¹⁰ Hongbing Yao,¹¹ Qiuxia Sun,^1,12 Jie Zhong,¹ Renkuan Tang,¹² Jing Chen,^1,13 Yuntao Sun,^1,14 Xiangping Li,^1,6 Haoran Su,^1,15 Qingxin Yang,^1,6 Liping Hu,⁶ Libing Yun,¹⁴ Junbao Yang,¹⁶ Shengjie Nie,⁶ Yan Cai,¹⁵ Jiangwei Yan,¹³ Kun Zhou,⁵ Chuanchao Wang,¹⁷ 10K_CPGDP Consortium,^‡ Bofeng Zhu,^18,19,* Chao Liu,^18,20,* and Guanglin He^1,2,†,*

왕몽가(王夢鴿)^1,2,3,†,*, 황옥국(黃玉國)^1,†, 유개군(劉凱軍)^4,5, 왕지용(王志勇)^1,6, 장몽함(張夢涵)^7,8, 원해빙(袁海冰)², 단숙함(段淑涵)^1,9, 위란해(韋蘭海)¹⁰, 요홍병(姚宏兵)¹¹, 손추하(孫秋霞)^1,12, 종걸(鍾杰)¹, 당인관(唐仁寬)¹², 진정(陳靜)^1,13, 손운도(孫雲濤)^1,14, 이향평(李向平)^1,6, 소호연(蘇浩然)^1,15, 양경흠(楊慶鑫)^1,6, 호입평(胡立平)⁶, 운입빙(雲立冰)¹⁴, 양준보(楊俊寶)¹⁶, 섭성걸(聶聖傑)⁶, 채연(蔡燕)¹⁵, 염강위(閆江偉)¹³, 주곤(周昆)⁵, 왕전초(王傳超)¹⁷, 10K_CPGDP 컨소시엄^‡, 주보봉(朱博峰)^18,19,*, 유초(劉超)^18,20,*, 하광림(何廣林)^1,2,†,*

¹ Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu 610000, China
사천대학(四川大學) 화서의원(華西醫院) 희귀질환연구소, 중국, 610000, 성도(成都)

² Center for Archaeological Science, Sichuan University, Chengdu 610000, China
사천대학(四川大學) 고고학과학센터, 중국, 610000, 성도(成都)

³ Faculty of Forensic Medicine, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou 510275, China
중산대학(中山大學) 중산의학원 법의학과, 중국, 510275, 광주(廣州)

⁴ School of International Tourism and Culture, Guizhou Normal University, Guiyang 550025, China
귀주사범대학(貴州師範大學) 국제관광문화학부, 중국, 550025, 귀양(貴陽)

⁵ MoFang Human Genome Research Institute, Tianfu Software Park, Chengdu, Sichuan 610042, China
모방인간게놈연구소, 중국, 610042, 사천(四川) 성도(成都), 천부소프트웨어파크

⁶ School of Forensic Medicine, Kunming Medical University, Kunming 650500, China
곤명의과대학(昆明醫科大學) 법의학부, 중국, 650500, 곤명(昆明)

⁷ Institute of Modern Languages and Linguistics, Fudan University, Shanghai 200433, China
복단대학(復旦大學) 현대언어·언어학연구소, 중국, 200433, 상해(上海)

⁸ Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
복단대학(復旦大學) 지능형복합시스템연구소, 중국, 200433, 상해(上海)

⁹ School of Basic Medical Sciences, North Sichuan Medical College, Nanchong 637100, China
천북의학원(川北醫學院) 기초의학부, 중국, 637100, 남충(南充)

¹⁰ School of Ethnology and Anthropology, Institute of Humanities and Human Sciences, Inner Mongolia Normal University, Hohhot 010022, China
내몽골사범대학(內蒙古師範大學) 민족·인류학학부 인문과학연구소, 중국, 010022, 호화호특(呼和浩特)

¹¹ Belt and Road Research Center for Forensic Molecular Anthropology, Gansu University of Political Science and Law, Lanzhou 730000, China
감숙정법대학(甘肅政法大學) 일대일로 법의분자인류학 연구센터, 중국, 730000, 난주(蘭州)

¹² Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing 400331, China
중경의과대학(重慶醫科大學) 기초의학부 법의학과, 중국, 400331, 중경(重慶)

¹³ School of Forensic Medicine, Shanxi Medical University, Jinzhong 030001, China
산서의과대학(山西醫科大學) 법의학부, 중국, 030001, 진중(晉中)

¹⁴ Institute of Forensic Medicine, West China School of Basic Medical Sciences & Forensic Medicine, Sichuan University, Chengdu 610041, China
사천대학(四川大學) 화서(華西)기초의학·법의학부 법의학연구소, 중국, 610041, 성도(成都)

¹⁵ School of Laboratory Medicine and Center for Genetics and Prenatal Diagnosis, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan 637007, China
천북의학원(川北醫學院) 부속병원 검사의학부 유전·산전진단센터, 중국, 637007, 사천(四川) 남충(南充)

¹⁶ Institute of Basic Medicine and Forensic Medicine, North Sichuan Medical College and Center for Genetics and Prenatal Diagnosis, Affiliated Hospital of North Sichuan Medical College, Nanchong, Sichuan 637007, China
천북의학원(川北醫學院) 기초의학·법의학연구소 및 부속병원 유전·산전진단센터, 중국, 637007, 사천(四川) 남충(南充)

¹⁷ State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen 361005, China
하문대학(廈門大學) 생명과학부 세포스트레스생물학 국가핵심연구실, 중국, 361005, 하문(廈門)

¹⁸ Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou 510515, China
남방의과대학(南方醫科大學) 법의학부 정밀식별을 위한 광주(廣州) 법의학 다중오믹스 핵심연구실, 중국, 510515, 광주(廣州)

¹⁹ Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong 510515, China
남방의과대학(南方醫科大學) 주강(珠江)병원 검사의학과 마이크로바이옴 의학센터, 중국, 510515, 광동(廣東) 광주(廣州)

²⁰ Anti-Drug Technology Center of Guangdong Province, Guangzhou 510230, China
광동성(廣東省) 마약퇴치기술센터, 중국, 510230, 광주(廣州)

[Review] Sinocentric-chauvinism slant rating: 5/10

18. Wang, M., Huang, Y., Liu, K. et al. (2024) ‘Multiple human population movements and cultural dispersal events shaped the landscape of Chinese paternal heritage’, Molecular Biology and Evolution, 41(7), msae122.

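(1) Study overview and the authors' claims

This study reconstructs the landscape of Chinese paternal heritage through an integrated analysis of Y-chromosome data from 15,563 modern and ancient Eurasians. It argues that four major ancient population movements shaped China's paternal heritage: ① the expansion of Yellow River Basin millet farmers (O2/D lineages), ② the expansion of Yangtze River Valley rice farmers (O1 lineages), ③ the influence of Neolithic Siberian groups (Q/C lineages), and ④ the influx of Bronze Age western Eurasian pastoralists (J/G/R lineages).

(2) Bias analysis (Sinocentric-chauvinism slant: 4/10)

The "multiple events" model weakens a unilinear narrative, but the exclusive focus on paternal lineages limits what can be inferred.

Narrative framing (low bias): The "multiple events" model weakens a unilinear expansion narrative.
Model choice and treatment of counterexamples (medium bias): Because the study looks only at paternal lineages, it struggles to account for sex-asymmetric movements, in particular counterexamples produced by maternally centered coastal networks such as the 타기도 case.
Geographic and environmental constraints (medium bias): Maritime corridors need to be visualized in connection with the geographic distribution of paternal lineages.
Gene-culture coupling assumption (low bias): The study leaves open the possibility that genes and culture are decoupled.

(3) Reframing the conclusion

To strengthen the conclusion, sex-asymmetric movements should be quantified by drawing a "three-layer map" of paternal (Y), maternal (mt), and genome-wide (autosomal) ancestry for the same set of samples. For example, comparing how "northern paternal lineages" and "coastal maternal lineages (B5b2)" are cross-distributed in the Hongshan and Shandong regions could help reconstruct the concrete shape of marriage networks.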

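[Paper summary]

The origins of modern Chinese people revealed by DNA: the story of four ancestral groups

Summary

This paper is a large-scale genetic study that traces Y-chromosome DNA to show how the paternal lineages of modern Chinese people took shape. Its results indicate that these lineages do not descend from a single ancestor. Instead they form a vast "mosaic" built up as four broad ancient populations moved and mixed over thousands of years.

The four groups are ① pastoralists from the west, ② hunter-gatherers from Siberia in the north, ③ millet farmers of the Yellow River Basin, and ④ rice farmers of the Yangtze River Valley. The study identifies the characteristic genetic markers (haplogroups) left by each group and uses a range of data and maps to show how they relate to the population characteristics of each region of China today.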

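1. Introduction: finding ancestors through genes

China is a vast land where many ethnic groups live side by side, yet the historical process that produced their present distribution long remained an unsolved puzzle. To address it, this study uses the Y chromosome as a kind of "genetic genealogy." The Y chromosome passes almost unchanged from father to son only, so, like a "genetic surname (姓氏)," it allows a population's male lines to be traced.

The team took the haplogroup, a large genetic "family" of men who share the same Y-chromosome markers, as its unit of analysis. Combining 919 newly reported individuals with existing data, they assembled Y-chromosome information from 15,563 individuals in total and built a large "family tree" (phylogeny) of Chinese paternal lineages.

[Figure 1. Study design and the big picture]

This figure gives an overview of the study.

Figure 1(a), map: a "field-trip map" showing where across China DNA samples from ethnic minorities were collected. The larger the circle, the more samples were obtained at that location.
Figure 1(b), phylogeny: a "master family tree of paternal lineages" built from the Y chromosomes of the newly sequenced participants. The main branches (e.g., O, N, C, R) represent the major haplogroups (genetic families), and the points where branches split mark the times of genetic divergence. The tree shows at a glance how large each "family" is and how the families are connected.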

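2. Results: the four ancestral groups that shaped modern Chinese people

[Figure 2. Genetic relationships between ancient and modern individuals]

This figure presents the core result, a combined "ancient and modern genetic relationship chart." The circular tree places not only modern Chinese individuals (solid lines) but also DNA extracted from ancient human remains thousands of years old (dashed lines). This makes it possible to see directly which ancient groups share paternal lineages with which people today. The large colored regions on the outside mark the ranges of the four ancestral groups identified by the study.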

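Group 1. Visitors from the west: ancient pastoralists and barley farmers (haplogroups J, G, R)

Identity: Barley farmers and pastoralists originating from Central Asia and the western Eurasian steppe.
Genetic signature: Their Y-chromosome markers, haplogroups J, G, and R, are concentrated today in northwestern China (Xinjiang, Gansu, and neighboring areas).
Evidence from the data:
- Figure 3(a, b), distribution maps: The "hotspots" of haplogroups J, G, and R are concentrated along China's northwestern border.
- Figure 4(b), ancestry proportion map: The share of "Afanasievo (western pastoralist)-related ancestry" increases toward the northwest.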

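Group 2. Pioneers of the north: Siberian hunter-gatherers (haplogroups C, N, Q)

Identity: Neolithic hunter-gatherers who lived on the Mongolian Plateau and in the Amur River Basin of Siberia.
Genetic signature: Their markers, haplogroups C, N, and Q, are widespread in northern China and are closely associated with Mongolic and Tungusic peoples in particular.
Evidence from the data:
- Figure 3(c), distribution map: Haplogroups C and Q reach high frequencies centered on Inner Mongolia and the three northeastern provinces.
- Figure 4(c), ancestry proportion map: The genetic influence of "northern Mongolia-related ancestry" is strongest in the north of China.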

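Group 3. Roots of Chinese civilization: millet farmers of the Yellow River Basin (haplogroup O2)

Identity: The group that began millet farming in the Yellow River Basin and led early Chinese civilization. They are core ancestors of the Han (漢族) and of Sino-Tibetan speakers.
Genetic signature: Their haplogroup O2 expanded explosively and is today the most dominant paternal lineage across China.
Evidence from the data:
- Figure 3(d), distribution map: O2 haplogroups show high frequencies across almost all of mainland China, reflecting the scale of their expansion.
- Figure 4(e), ancestry proportion map: "하고가 site (Yellow River farmer)-related ancestry" forms the base that accounts for the great majority of China's population today.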

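Group 4. Source of southern culture: rice farmers of the Yangtze River Valley (haplogroup O1)

Identity: The group that began rice farming in the Yangtze River Valley, the earliest in the world.
Genetic signature: Their haplogroup O1 spread through southern China and along the southeastern coast, and they became major ancestors of today's southern Han Chinese and of many Southeast Asian peoples.
Evidence from the data:
- Figure 3(e), distribution map: O1 haplogroups show high frequencies south of the Yangtze River and along the coastline.
- Figure 4(g), ancestry proportion map: The genetic influence of "Taiwan Hanben site (southern farmer)-related ancestry" is strongest in southeastern China.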

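[Figure 5. The final evidence: linking haplogroups and ancestry]

This figure addresses the question "Do these claims hold up statistically?" Each panel plots the proportion of a given ancestral component (horizontal axis) against the frequency of a given haplogroup (vertical axis). The points do not scatter at random but cluster near an upward-sloping line, which means the two quantities are positively correlated rather than associated by chance. For example, Figure 5(b) shows that the more "northern Mongolia" ancestry a population carries, the higher its frequency of northern haplogroups (C, N, Q).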
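The kind of relationship described for Figure 5 can be checked with a plain Pearson correlation between per-population ancestry proportions and haplogroup frequencies. The sketch below is only an illustration: the numbers are hypothetical and are not taken from the paper, and the paper's own statistical procedure may differ. A high coefficient indicates association across populations, not a causal link.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, one entry per population (NOT from the paper):
# ancestry[i] = proportion of a given ancestral component in population i
# hg_freq[i]  = frequency of the associated haplogroups in population i
ancestry = [0.05, 0.10, 0.20, 0.35, 0.50]
hg_freq = [0.02, 0.06, 0.11, 0.20, 0.27]

r = pearson_r(ancestry, hg_freq)
print(f"Pearson r = {r:.3f}")
```

With real data, each pair of values would come from one population, so the number of points equals the number of populations compared.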

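3. Conclusion: Chinese people are a vast genetic mosaic

Using extensive data and visualizations, this study demonstrates that the paternal lineages of modern Chinese people do not have a single origin. They form a complex mosaic created as four large ancient populations, living in different environments and by different means of subsistence, moved, mixed, and expanded over thousands of years. By weaving together evidence from archaeology, linguistics, and history through current DNA analysis techniques, the study offers a detailed and comprehensive picture of how the peoples of China and East Asia came to be.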

[논문번역]

요약 ABSTRACT

Large-scale genomic projects and ancient DNA innovations have ushered in a new paradigm for exploring human evolutionary history. However, the genetic legacy of spatiotemporally diverse ancient Eurasians within Chinese paternal lineages remains unresolved. Here, we report an integrated Y-chromosome genomic database encompassing 15,563 individuals from both modern and ancient Eurasians, including 919 newly reported individuals, to investigate the Chinese paternal genomic diversity. The high-resolution, time-stamped phylogeny reveals multiple diversification events and extensive expansions in the early and middle Neolithic. We identify four major ancient population movements, each associated with technological innovations that have shaped the Chinese paternal landscape. First, the expansion of early East Asians and millet farmers from the Yellow River Basin predominantly carrying O2/D subclades significantly influenced the formation of the Sino-Tibetan people and facilitated the permanent settlement of the Tibetan Plateau. Second, the dispersal of rice farmers from the Yangtze River Valley carrying O1 and certain O2 sublineages reshapes the genetic makeup of southern Han Chinese, as well as the Tai-Kadai, Austronesian, Hmong-Mien, and Austroasiatic people. Third, the Neolithic Siberian Q/C paternal lineages originated and proliferated among hunter-gatherers on the Mongolian Plateau and the Amur River Basin, leaving a significant imprint on the gene pools of northern China. Fourth, the J/G/R paternal lineages derived from western Eurasia, which were initially spread by Yamnaya-related steppe pastoralists, maintain their presence primarily in northwestern China. Overall, our research provides comprehensive genetic evidence elucidating the significant impact of interactions with culturally distinct ancient Eurasians on the patterns of paternal diversity in modern Chinese populations.

대규모 유전체 프로젝트와 고대 DNA 혁신은 인류 진화사를 탐구하는 새로운 패러다임을 열었다. 그러나 시공간적으로 다양한 고대 유라시아인이 중국 부계 혈통 내에 남긴 유전적 유산은 아직 해결되지 않은 문제이다. 본 연구에서는 현대 및 고대 유라시아인 15,563명을 포함하는 통합 Y 염색체 유전체 데이터베이스를 보고한다. 여기에는 새로 보고되는 919명의 데이터도 포함된다. 이를 통해 중국의 부계 유전체 다양성을 조사했다. 고해상도의 시간 기록 계통 분석 결과, 신석기 초기와 중기에 여러 차례의 다양화 사건과 광범위한 팽창이 있었음을 밝혀냈다. 우리는 중국 부계 지형을 형성한 기술 혁신과 관련된 네 가지 주요 고대 인구 이동을 확인했다. 첫째, 주로 O2/D 하위 분기군을 가진 황하 유역의 초기 동아시아인과 조(millet) 농민의 팽창은 중국-티베트어족(Sino-Tibetan people) 형성에 큰 영향을 미쳤고, 티베트 고원의 영구 정착을 촉진했다. 둘째, O1과 특정 O2 하위 혈통을 가진 양쯔강 유역의 쌀 농민의 확산은 남부 한족(Han Chinese)뿐만 아니라 타이-카다이어족(Tai-Kadai), 오스트로네시아어족(Austronesian), 몽-미엔어족(Hmong-Mien), 오스트로아시아어족(Austroasiatic) 사람들의 유전적 구성을 재편했다. 셋째, 신석기 시대 시베리아의 Q/C 부계 혈통은 몽골 고원과 아무르강 유역의 수렵-채집인 사이에서 기원하고 확산하여 중국 북부 유전자 풀에 상당한 흔적을 남겼다. 넷째, 서부 유라시아에서 유래한 J/G/R 부계 혈통은 초기에 얌나야(Yamnaya) 관련 초원 목축민에 의해 퍼졌으며, 주로 중국 북서부에 그 존재를 유지하고 있다. 종합적으로, 우리 연구는 문화적으로 다른 고대 유라시아인과의 상호작용이 현대 중국 인구의 부계 다양성 패턴에 미친 중대한 영향을 명확히 하는 포괄적인 유전적 증거를 제공한다.

Key words: YanHuang cohort, Y-chromosome phylogeny, evolutionary history, founding lineage.
키워드: 염황 코호트(YanHuang cohort), Y염색체 계통, 진화사, 창시 혈통.

서론 (Introduction)

Population genomics and human pangenome projects aim to comprehensively document the genetic landscapes of globally diverse populations, elucidate their demographic histories, and uncover the genetic underpinnings of complex traits and diseases (Bergstrom et al. 2020; Byrska-Bishop et al. 2022). East Asia serves as one of the earliest cradles of civilization and the crossroads of the peopling of Oceania, Siberia, and America, whose genetic landscape is poorly characterized in the era of population genomics. China harbors extensive genetic, physical, cultural, and ethnolinguistic diversities, positioning it uniquely for studying the intricate demographic histories of diverse populations, including human divergence, migration, and admixture, and the interplay between genetics and culture (Wang et al. 2021b; Kumar et al. 2022). Numerous studies have sought to bridge the knowledge gap regarding the genetic diversity of Chinese populations by examining their evolutionary histories and the genetics of complex traits and diseases. Recent research utilized genome-wide SNP microarrays to analyze the genomic diversity and population history of various Sino-Tibetan, Mongolic, Tungusic, Turkic, Tai-Kadai, and Hmong-Mien groups (Feng et al. 2017; He et al. 2022; Wang et al. 2022; He et al. 2023b; Sun et al. 2023; Li et al. 2024). Additionally, the rise of whole-genome sequencing studies has expanded, featuring projects, such as the Westlake BioBank for Chinese, the NyuWa genome resource, the China Metabolic Analytics Project, and the 10K Chinese People Genomic Diversity Project (10K_CPGDP; Cao et al. 2020; Zhang et al. 2021a; Cong et al. 2022; Cheng et al. 2023; He et al. 2023c). 
These efforts enhance our understanding of the genetic diversity, demographic history, and genetic architecture of complex traits and diseases in ethnolinguistically distinct Chinese populations from an autosomal perspective, suggesting a further exploration of their fine-scale genetic structure from both uniparental and population-scale project perspectives.

인구 유전체학 및 인간 범유전체 프로젝트는 전 세계 다양한 인구 집단의 유전적 특징을 종합적으로 기록하고, 그들의 인구학적 역사를 밝히며, 복잡한 특성과 질병의 유전적 기반을 밝히는 것을 목표로 한다(Bergstrom et al. 2020; Byrska-Bishop et al. 2022). 동아시아는 가장 오래된 문명의 발상지 중 하나이자 오세아니아, 시베리아, 아메리카로 인류가 퍼져나간 교차로였지만, 인구 유전체학 시대에 그 유전적 특징은 제대로 규명되지 않았다. 중국은 광범위한 유전적, 신체적, 문화적, 민족언어학적 다양성을 보유하고 있어, 인간의 분화, 이주, 혼혈, 그리고 유전과 문화 간의 상호작용을 포함한 다양한 인구 집단의 복잡한 인구학적 역사를 연구하는 데 독보적인 위치에 있다(Wang et al. 2021b; Kumar et al. 2022). 많은 연구가 중국 인구의 진화 역사와 복잡한 특성 및 질병의 유전학을 조사함으로써 중국 인구의 유전적 다양성에 대한 지식 격차를 해소하고자 했다. 최근 연구에서는 게놈 전체 SNP 마이크로어레이를 사용하여 다양한 중국-티베트어족, 몽골어족, 퉁구스어족, 튀르크어족, 타이-카다이어족, 몽-미엔어족 집단의 게놈 다양성과 인구 역사를 분석했다(Feng et al. 2017; He et al. 2022; Wang et al. 2022; He et al. 2023b; Sun et al. 2023; Li et al. 2024). 또한, 웨스트레이크 중국인 바이오뱅크(Westlake BioBank for Chinese), 여와(NyuWa) 게놈 자원, 중국 대사 분석 프로젝트(China Metabolic Analytics Project), 1만 중국인 게놈 다양성 프로젝트(10K_CPGDP)와 같은 프로젝트를 특징으로 하는 전체 게놈 시퀀싱 연구가 확대되었다(Cao et al. 2020; Zhang et al. 2021a; Cong et al. 2022; Cheng et al. 2023; He et al. 2023c). 이러한 노력은 상염색체 관점에서 민족언어학적으로 구별되는 중국 인구의 유전적 다양성, 인구학적 역사, 복잡한 특성 및 질병의 유전적 구조에 대한 우리의 이해를 향상시키며, 단일부모 유전 및 인구 규모 프로젝트 관점에서 그들의 미세한 유전 구조에 대한 추가 탐구를 제안한다.

The nonrecombining portion of the Y-chromosome has become pivotal in studying human evolutionary history across various time scales (Poznik et al. 2016). Recent advancements in sequencing technologies and computational methods for genome assembly, read mapping, variant calling, and benchmarking have significantly improved the generation of complete Y-chromosome sequences, enriching our understanding of Y-chromosome variations (Olson et al. 2023). These developments have facilitated the construction of a robust phylogenetic tree, with branch lengths indicating mutation counts (Poznik et al. 2016; Zhabagin et al. 2022). Over the past two decades, studies on targeted Y-SNPs have traced ancestral lines through paternal lineages, providing crucial phylogenetic data for research on human origins, migrations, and admixture (Su et al. 1999; Zerjal et al. 2003). Resequencing the entire Y-chromosome region using advanced next-generation sequencing and computational techniques has transformed research paradigms. For instance, Wei et al. identified 6,662 high-confidence variants across 36 diverse Y-chromosome sequences, refining existing Y-chromosome phylogenies (Wei et al. 2013). Similarly, Poznik et al. (2016) analyzed 1,244 complete Y-chromosome genomes from the 1000 Genomes Project (1KGP), uncovering over 65,000 variants and identifying recent expansions within specific paternal lineages. Studies on single populations or specific lineages have also been conducted. The O1a-M119 lineage, which is shared among the Sinitic, Tai-Kadai, and Austronesian groups, and key paternal lineages like C2a-F5484 and Q1a1a-M120 have been examined to trace their origins, diffusion, and contributions to the gene pools of Chinese ethnolinguistically diverse groups (Sun et al. 2019; Wu et al. 2020; Sun et al. 2021). 
However, the availability of large-scale Y-chromosome genomic databases for China remains limited, underscoring the need for more comprehensive databases to explore the paternal genetic landscape and its historical influences on diverse populations.

Y 염색체의 비재조합 부분은 다양한 시간 규모에 걸쳐 인간의 진화 역사를 연구하는 데 중추적인 역할을 해왔다(Poznik et al. 2016). 시퀀싱 기술의 최근 발전과 게놈 조립, 리드 매핑, 변이 호출 및 벤치마킹을 위한 계산 방법은 완전한 Y 염색체 서열 생성을 크게 향상시켜 Y 염색체 변이에 대한 우리의 이해를 풍부하게 했다(Olson et al. 2023). 이러한 발전은 돌연변이 수를 나타내는 가지 길이로 견고한 계통수를 구축하는 것을 용이하게 했다(Poznik et al. 2016; Zhabagin et al. 2022). 지난 20년 동안, 표적 Y-SNP에 대한 연구는 부계 혈통을 통해 조상 계통을 추적하여 인간의 기원, 이주 및 혼혈 연구에 중요한 계통학적 데이터를 제공했다(Su et al. 1999; Zerjal et al. 2003). 차세대 시퀀싱 및 계산 기술을 사용하여 전체 Y 염색체 영역을 재시퀀싱하는 것은 연구 패러다임을 변화시켰다. 예를 들어, 웨이(Wei) 등은 36개의 다양한 Y 염색체 서열에서 6,662개의 신뢰도 높은 변이를 식별하여 기존의 Y 염색체 계통을 개선했다(Wei et al. 2013). 마찬가지로, 포즈닉(Poznik) 등(2016)은 1000 게놈 프로젝트(1KGP)의 1,244개 전체 Y 염색체 게놈을 분석하여 65,000개 이상의 변이를 발견하고 특정 부계 혈통 내에서 최근의 팽창을 확인했다. 단일 인구 또는 특정 혈통에 대한 연구도 수행되었다. 중국어파, 타이-카다이어족, 오스트로네시아어족 그룹 간에 공유되는 O1a-M119 혈통과 C2a-F5484 및 Q1a1a-M120과 같은 주요 부계 혈통이 그들의 기원, 확산 및 중국의 민족언어학적으로 다양한 그룹의 유전자 풀에 대한 기여를 추적하기 위해 조사되었다(Sun et al. 2019; Wu et al. 2020; Sun et al. 2021). 그러나 중국에 대한 대규모 Y 염색체 게놈 데이터베이스의 가용성은 여전히 제한적이어서, 부계 유전적 지형과 다양한 인구에 대한 역사적 영향을 탐구하기 위해 보다 포괄적인 데이터베이스가 필요함을 강조한다.

Recent increases in genomic resources from Chinese populations have highlighted the gap in our understanding of the paternal genetic diversity among ethnic minorities, which lags significantly behind that of Han Chinese and other global populations (Karmin et al. 2022). To address this issue, we launched the 10K_CPGDP by employing anthropologically informed sampling strategies (He et al. 2023c). Additionally, we introduce the YanHuang cohort (YHC) genomic resource that includes new Y-chromosome sequences from ethnolinguistically diverse ethnic minorities and integrates data from the 10K_CPGDP. The YHC aims to provide a high-quality population-specific Y-chromosome database, delineate the fine-scale paternal demographic history of underrepresented groups, construct a high-resolution, time-stamped phylogenetic tree, and develop novel East Asian-specific next-generation sequencing panels covering SNPs, STRs, InDels, and other variants for medical and forensic use. We also developed the “YHSeqY3000”, the highest-resolution Y-specific targeted resequencing panel designed from whole-genome and genome-wide SNP data of Y-chromosomes within the YHC. We genotyped 2,999 panel-related Y-SNPs in 919 males from 57 diverse ethnic minorities who were also genotyped by whole Y-chromosome sequencing. Our efforts culminated in a comprehensive Y-chromosome database encompassing 15,563 individuals from modern and ancient Eurasian backgrounds, allowing us to construct the first fully resolved phylogeny incorporating ancient DNA sequences. This phylogeny helps estimate the coalescence dates of dominant lineages, trace the origins of Chinese paternal lineages, and elucidate the impacts of historical migrations, admixture, and shifts in subsistence strategies on the genetic architecture of these diverse groups.

최근 중국 인구로부터의 게놈 자원 증가는 한족(Han Chinese) 및 기타 전 세계 인구에 비해 현저히 뒤처져 있는 소수 민족의 부계 유전적 다양성에 대한 우리의 이해 격차를 부각시켰다(Karmin et al. 2022). 이 문제를 해결하기 위해, 우리는 인류학적 지식에 기반한 샘플링 전략을 사용하여 10K_CPGDP를 시작했다(He et al. 2023c). 또한, 우리는 민족언어학적으로 다양한 소수 민족의 새로운 Y-염색체 서열을 포함하고 10K_CPGDP의 데이터를 통합한 염황 코호트(YanHuang cohort, YHC) 게놈 자원을 소개한다. YHC는 고품질의 인구 특이적 Y-염색체 데이터베이스를 제공하고, 그동안 연구가 부족했던 집단의 미세한 부계 인구학적 역사를 규명하며, 고해상도의 시간 기록 계통수를 구축하고, 의료 및 법의학용으로 SNP, STR, InDel 및 기타 변이를 포괄하는 새로운 동아시아 특이적 차세대 시퀀싱 패널을 개발하는 것을 목표로 한다. 우리는 또한 YHC 내 Y-염색체의 전체 게놈 및 게놈 전반 SNP 데이터로부터 설계된 최고 해상도의 Y-특이적 표적 재시퀀싱 패널인 “YHSeqY3000”을 개발했다. 우리는 전체 Y-염색체 시퀀싱으로도 유전자형이 분석된 57개 다양한 소수 민족 출신 남성 919명의 2,999개 패널 관련 Y-SNP의 유전자형을 분석했다. 우리의 노력은 현대 및 고대 유라시아 배경을 가진 15,563명의 개인을 포함하는 포괄적인 Y-염색체 데이터베이스로 결실을 맺었으며, 이를 통해 고대 DNA 서열을 통합한 최초의 완전히 분해된(fully resolved) 계통수를 구축할 수 있었다. 이 계통수는 우세한 혈통의 공통 조상 시점을 추정하고, 중국 부계 혈통의 기원을 추적하며, 역사적 이주, 혼합 및 생계 전략의 변화가 이 다양한 집단들의 유전적 구조에 미친 영향을 밝히는 데 도움을 준다.

결과 및 토의 (Results and Discussion)

Genetic Diversity of YHC Paternal Lineages Inferred from Y-Chromosome Sequences and the YHSeqY3000 Panel

Y 염색체 서열과 YHSeqY3000 패널로 본 염황 코호트(YHC) 부계 혈통의 유전적 다양성

We performed whole Y-chromosome sequencing on 919 participants from 57 populations of 39 ethnic minorities (Fig. 1a; supplementary table S1, Supplementary Material online), integrated the genetic data of nearly 15,000 modern and ancient Eurasian people (supplementary tables S2 and S3, Supplementary Material online), and developed a high-resolution YHSeqY3000 panel, including Y-SNPs not present in existing phylogenetic databases (ISOGG, Yfull). The predominant paternal lineages identified, namely, C-M130, N-M231, O-M175, and R-M207, demonstrated haplogroup frequencies greater than 5% (supplementary fig. S1 and table S4, Supplementary Material online). Additional sublineages, such as D1-M174 and E1-P147, were also noted among these minorities (supplementary fig. S1 and table S4, Supplementary Material online). For the haplogroup classification, three methods were used, namely, in-house scripts, Y-Lineage Tracker, and HaploGrouper, to simultaneously infer haplogroups from the YHSeqY3000 panel data. The discrepancies in the classification results highlighted the need for improved accuracy in the haplogroup determination, especially the 40 significant discrepancies involving major subclades like C-M130 and J-M304 based on the Y-Lineage Tracker classification (supplementary table S4, Supplementary Material online). In contrast, the haplogroup differences obtained based on HaploGrouper were minimal (supplementary table S4, Supplementary Material online). The analysis revealed 564 distinct paternal lineages, with 384 subhaplogroups observed only once (supplementary fig. S2 and table S4, Supplementary Material online). This underpins the necessity for a continuous refinement of Y-chromosome phylogenetic trees to accommodate newly identified Y-SNPs and update the haplogroup classification tool (Chen et al. 2021; Jagadeesan et al. 2021). The upcoming version of the YHC phylogenetic topology aims to address these gaps. 
Overall, the resolution and coverage of the YHSeqY3000 panel confirmed by the minimal differences in the haplogroup classification compared to ~10 Mb Y-chromosome sequences establish it as the most refined system to date for high-resolution paternal lineage analysis in Chinese populations (supplementary table S4, Supplementary Material online). This system exceeds the capabilities of previous methods, ensuring a more precise haplogroup classification at a finer scale (Wang et al. 2019; He et al. 2023a).

우리는 39개 소수 민족의 57개 인구 집단에 속하는 919명을 대상으로 전체 Y 염색체 시퀀싱을 수행했다(그림 1a; 보충 표 S1). 그리고 약 15,000명에 달하는 현대 및 고대 유라시아인들의 유전 데이터를 통합했으며(보충 표 S2, S3), 기존 계통 데이터베이스(ISOGG, Yfull)에는 없는 Y-SNP를 포함하는 고해상도 YHSeqY3000 패널을 개발했다. 주요 부계 혈통으로 확인된 C-M130, N-M231, O-M175, R-M207은 모두 5% 이상의 하플로그룹 빈도를 보였다(보충 그림 S1, 표 S4). 이 소수 민족들 사이에서는 D1-M174나 E1-P147과 같은 추가적인 하위 혈통도 발견되었다(보충 그림 S1, 표 S4). 하플로그룹을 분류하기 위해 자체 개발 스크립트, Y-Lineage Tracker, HaploGrouper라는 세 가지 방법을 사용하여 YHSeqY3000 패널 데이터에서 하플로그룹을 동시에 추정했다. 분류 결과에 나타난 불일치는 하플로그룹 판정의 정확성을 높일 필요가 있음을 보여주었다. 특히 Y-Lineage Tracker 분류법에서는 C-M130, J-M304와 같은 주요 하위 그룹에서 40개의 심각한 불일치가 발견되었다(보충 표 S4). 반면, HaploGrouper를 기반으로 한 분류에서는 하플로그룹 차이가 거의 없었다(보충 표 S4). 분석 결과, 총 564개의 뚜렷한 부계 혈통이 밝혀졌으며, 이 중 384개의 하위 하플로그룹은 단 한 번만 관찰되었다(보충 그림 S2, 표 S4). 이는 새롭게 발견되는 Y-SNP를 반영하고 하플로그룹 분류 도구를 업데이트하기 위해 Y 염색체 계통수를 계속해서 개선해야 할 필요성을 뒷받침한다(Chen et al. 2021; Jagadeesan et al. 2021). 앞으로 나올 YHC 계통수 버전에서는 이러한 문제점을 해결하고자 한다. 전반적으로, YHSeqY3000 패널의 해상도와 포괄 범위는 매우 뛰어난 것으로 확인되었다. 약 10Mb 길이의 Y 염색체 서열과 비교했을 때 하플로그룹 분류에서 차이가 거의 없었으며, 이는 이 패널이 현재까지 중국 인구의 고해상도 부계 혈통 분석을 위한 가장 정교한 시스템임을 입증한다(보충 표 S4). 이 시스템은 기존 방법들의 능력을 뛰어넘어, 더 세밀한 수준에서 더 정확한 하플로그룹 분류를 보장한다(Wang et al. 2019; He et al. 2023a).
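The comparison of classifier outputs and the count of lineages observed only once can be pictured with a small sketch. It assumes each tool's output has already been reduced to a mapping from sample ID to haplogroup label. The sample IDs, the assignments, and the function names are hypothetical, and this is not the interface of Y-Lineage Tracker or HaploGrouper.

```python
from collections import Counter

def concordance(calls_a, calls_b):
    """Fraction of shared samples on which two classifiers assign the same label."""
    shared = calls_a.keys() & calls_b.keys()
    if not shared:
        raise ValueError("the two call sets share no samples")
    agree = sum(1 for sample in shared if calls_a[sample] == calls_b[sample])
    return agree / len(shared)

def singleton_lineages(calls):
    """Haplogroup labels carried by exactly one sample, in sorted order."""
    counts = Counter(calls.values())
    return sorted(label for label, n in counts.items() if n == 1)

# Hypothetical calls from two tools for the same four samples.
tool_a = {"S1": "O-M175", "S2": "C2a-F5484", "S3": "O1a-M119", "S4": "O-M175"}
tool_b = {"S1": "O-M175", "S2": "C-M130", "S3": "O1a-M119", "S4": "O-M175"}

print(concordance(tool_a, tool_b))
print(singleton_lineages(tool_a))
```

Exact string matching is the simplest possible criterion. In this toy input, sample S2 counts as a disagreement even though one label is merely a less resolved call than the other, so a real comparison would likely need to walk the haplogroup tree to separate resolution differences from true conflicts.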

그림 1. 새로 시퀀싱된 중국 소수 민족 919명의 지리적 위치 및 계통학적 특성. a) 동아시아 지도에 57개 중국 소수 민족 그룹에 속하는 919명의 주요 데이터를 표시했다. 지도 위의 원 크기는 각 인구 집단의 샘플 크기를 나타내며, 색칠된 성(省)은 샘플링 위치를 나타내고, 색상은 해당 지역의 총 샘플 크기를 의미한다. 또한 서부 유라시아, 몽골 고원, 그리고 중국 농업의 발원지인 황하(黃河) 및 양자강(揚子江) 유역의 고대 생계 전략(목축, 수렵-채집, 농업)을 묘사했다. b) Y 염색체 계통은 품질 관리를 통과한 914명을 포함하며, 널리 퍼져 있는 다양한 부계 혈통의 가장 최근 공통 조상(TMRCA)을 보여준다. 시몬스 게놈 다양성 프로젝트(Simons Genome Diversity Project)의 B-혈통 관련 대표 하플로타입을 외부 그룹(outgroup)으로 사용했다. 가지의 길이는 추정된 공통 조상 시점(TMRCA)과 비례한다. 주요 혈통은 색깔 있는 삼각형으로 표시했으며, 각 삼각형의 밑변 너비는 샘플 크기에 비례한다. 상세하고 시간 정보가 포함된 계통수는 보충 자료 그림 S1에 제시되어 있으며, 분기 시간의 척도는 다양한 배경색으로 구분했다.
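For intuition about how mutation counts along branches relate to dates such as TMRCA, a back-of-the-envelope calculation divides the average number of mutations separating a node from its descendant tips by the expected number of mutations per year. This is a textbook approximation, not the dating method used in the paper, which this section does not describe. The mutation rate and the callable region length below are assumed values chosen only for illustration.

```python
# Assumed values for illustration only (not taken from the paper):
ASSUMED_RATE_PER_BP_PER_YEAR = 7.6e-10   # a commonly cited Y-chromosome SNP rate
ASSUMED_CALLABLE_BP = 10_000_000         # roughly 10 Mb of callable sequence

def tmrca_years(mean_mutations, callable_bp, rate_per_bp_per_year):
    """Rough TMRCA estimate in years.

    mean_mutations: average number of mutations from the ancestral node
    to each sampled tip, counted within the callable region.
    """
    if callable_bp <= 0 or rate_per_bp_per_year <= 0:
        raise ValueError("region length and mutation rate must be positive")
    if mean_mutations < 0:
        raise ValueError("mutation count cannot be negative")
    mutations_per_year = callable_bp * rate_per_bp_per_year
    return mean_mutations / mutations_per_year

# Hypothetical clade whose tips carry 76 mutations on average since the node.
estimate = tmrca_years(76, ASSUMED_CALLABLE_BP, ASSUMED_RATE_PER_BP_PER_YEAR)
print(f"approximate TMRCA: {estimate:.0f} years")
```

Under these assumptions one mutation accumulates roughly every 130 years, which is why dense sequencing is needed to resolve recent splits. Published estimates normally come from model-based methods that also report uncertainty intervals.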

Genetic Connections and Population Stratification among Modern and Ancient Eurasians

현대 및 고대 유라시아인 간의 유전적 관계와 인구 계층화

We explored the population differentiation among spatiotemporally diverse Eurasian populations based on the clustering patterns identified via the principal component analysis (PCA), multidimensional scaling analysis (MDS), and other population genetic analyses. The PCA distinctly separated ancient western Eurasians from East Asians, with each group exhibiting unique patterns of dominant paternal lineages and clustering branches on the phylogenetic tree (supplementary fig. S3a to j, Supplementary Material online). Modern population clustering aligns with their geographic and linguistic attributes, showing a clear separation among most Austronesian and Tibeto-Burman groups, while other populations demonstrate a considerable overlap in their clustering positions (supplementary fig. S4, Supplementary Material online). Iron Age (IA) Hanben individuals show close genetic ties with Austronesian groups, and northern Chinese individuals are closely aligned with Sino-Tibetan groups. Notably, there is a marked stratification between northern and southern East Asians, with further substructures among linguistically similar, but geographically distinct groups (supplementary figs. S5 to S7, Supplementary Material online). For instance, IA Hanben populations align closely with modern Han populations from Guangxi and Taiwan, whereas Yellow River Basin farmers form distinct clusters from other Han groups (supplementary fig. S7a, Supplementary Material online). Diverse Tibeto-Burman groups exhibited genetic distinctions between their northern and southern divisions (supplementary fig. S7b, Supplementary Material online). A significant differentiation was also evident among the Transeurasian-speaking groups, with the Koreanic and Japonic groups forming separate clades, the Mongolic and some Tungusic groups clustering together, and the Turkic groups sharing close affinity with certain Tungusic populations (supplementary fig. S5, Supplementary Material online). 
In South China and Southeast Asia (SEA), fine-scale clustering among the Austroasiatic, Austronesian, Hmong-Mien, and Tai-Kadai groups suggests an extensive gene flow, as evidenced by their overlapping genetic patterns (supplementary fig. S6, Supplementary Material online). Phylogenetic relationships and haplogroup frequency spectra highlighted genetic disparities between northern and southern Han groups and between northern and southern Tibeto-Burman speakers, while the gene flow was apparent between geographically proximate groups, such as between Austronesian and southern Han populations and between Transeurasian and northern Han populations (supplementary fig. S8, Supplementary Material online). This comprehensive analysis elucidates the complex genetic landscape and interactions among Eurasian populations.

우리는 주성분 분석(PCA), 다차원 척도법(MDS) 및 기타 집단 유전학 분석을 통해 나타난 군집 패턴을 기반으로 시공간적으로 다양한 유라시아 인구 집단 간의 분화를 탐구했다. 주성분 분석(PCA) 결과, 고대 서부 유라시아인과 동아시아인은 뚜렷하게 구분되었다. 각 그룹은 우세한 부계 혈통의 독특한 패턴을 보였고, 계통수에서도 서로 다른 가지로 묶였다(보충 그림 S3a-j). 현대 인구 집단의 군집은 지리적, 언어적 특성과 일치하는 경향을 보였다. 특히 오스트로네시아어족과 티베트-버마어족 그룹 대부분은 명확히 분리된 반면, 다른 인구 집단들은 군집 위치가 상당히 겹치는 모습을 보였다(보충 그림 S4). 철기 시대(IA)의 한본(Hanben) 유적 인골들은 오스트로네시아어족 그룹과 유전적으로 가까운 관계를 보였고, 중국 북부인들은 중국-티베트어족 그룹과 가깝게 나타났다. 주목할 점은, 북부와 남부 동아시아인 사이에 뚜렷한 유전적 계층이 존재하며, 언어는 비슷하지만 지리적으로 떨어진 그룹들 사이에는 더 세부적인 구조가 관찰된다는 것이다(보충 그림 S5-S7). 예를 들어, 철기 시대 한본(Hanben) 인구는 오늘날 광서(廣西)와 대만(臺灣)의 한족(漢族)과 매우 가깝게 나타나는 반면, 황하(黃河) 유역의 고대 농부들은 다른 한족 그룹과는 구별되는 뚜렷한 군집을 형성했다(보충 그림 S7a). 다양한 티베트-버마어족 그룹들은 북부와 남부 집단 간에 뚜렷한 유전적 차이를 보였다(보충 그림 S7b). 트랜스유라시아어족(알타이어족) 그룹들 사이에서도 상당한 분화가 뚜렷하게 나타났다. 한국어족과 일본어족 그룹은 각기 별개의 분기군을 형성했고, 몽골어족과 일부 퉁구스어족 그룹은 함께 묶였으며, 튀르크어족 그룹은 특정 퉁구스어족 인구 집단과 가까운 유연관계를 보였다(보충 그림 S5). 중국 남부와 동남아시아(SEA)에서는 오스트로아시아어족, 오스트로네시아어족, 몽-미엔어족, 타이-카다이어족 그룹들 간의 세밀한 군집 분석 결과, 유전적 패턴이 서로 겹치는 것으로 나타나 이들 사이에 광범위한 유전자 흐름이 있었음을 시사했다(보충 그림 S6). 계통 관계와 하플로그룹 빈도 분포를 통해 북부와 남부 한족(漢族) 간, 그리고 북부와 남부 티베트-버마어족 사용자 간의 유전적 차이가 드러났다. 동시에 오스트로네시아어족과 남부 한족, 트랜스유라시아어족과 북부 한족처럼 지리적으로 가까운 그룹들 사이에서는 뚜렷한 유전자 흐름이 관찰되었다(보충 그림 S8). 이 종합적인 분석은 유라시아 인구 집단들 간의 복잡한 유전적 지형과 상호작용을 명확히 보여준다.
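Clustering methods such as MDS or neighbor-joining start from a matrix of pairwise distances between populations. The sketch below builds such a matrix from haplogroup frequency vectors using plain Euclidean distance. The population names and frequencies are hypothetical, and Euclidean distance is a stand-in chosen for simplicity. The distance measure actually used in the paper is not stated in this section.

```python
import math

# Order of haplogroups in every frequency vector (illustrative subset).
HAPLOGROUPS = ["C", "O1", "O2", "R"]

def euclidean(p, q):
    """Euclidean distance between two frequency vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("frequency vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def distance_matrix(freqs):
    """Return (sorted population names, symmetric pairwise distance matrix)."""
    names = sorted(freqs)
    matrix = [[euclidean(freqs[a], freqs[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical haplogroup frequencies per population (each row sums to 1).
freqs = {
    "pop_north": [0.30, 0.10, 0.50, 0.10],
    "pop_south": [0.10, 0.40, 0.45, 0.05],
    "pop_northwest": [0.20, 0.05, 0.35, 0.40],
}

names, matrix = distance_matrix(freqs)
print("haplogroups:", ", ".join(HAPLOGROUPS))
for name, row in zip(names, matrix):
    print(name, [round(d, 3) for d in row])
```

The resulting matrix could then be passed to an MDS or tree-building routine. Populations with similar haplogroup spectra end up with small distances and therefore plot or cluster close together.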

We grouped populations by linguistic and ethnic traits to investigate genetic affinities within language- or ethnicity-based metapopulations (supplementary fig. S7c to h, Supplementary Material online). Geographically close populations, including the Austronesian-speaking Saisiyat, Thao, Taroko, Atayal, and Tsou from Taiwan Province, clustered distinctly, separating early from other reference groups (supplementary fig. S7c, Supplementary Material online). Distinct branches primarily comprised Tai-Kadai, nearby Austronesian groups like Ede and Giarai, and southern Tibeto-Burman speakers, such as Sila and Lolo. The genetic closeness between the Austronesian-related and Tai-Kadai-dominant clusters supports the hypothesis of a shared origin for Austronesian and Tai-Kadai speakers, as demonstrated by phylogenetic analyses based on neighbor-joining methods and clustering inferred from the haplogroup frequency spectra, PCA, and MDS (supplementary fig. S7d to f, Supplementary Material online). These analyses also revealed fine-scale genetic differences between Han Chinese and Tibeto-Burman populations and among linguistically diverse groups, underscoring frequent massive population movements and gene flow events in historical contexts. To determine whether paternal lineages corroborate current language family classifications and further explore genetic relationships within linguistically defined metapopulations, we merged all groups based on linguistic affinities for a comprehensive population genetic analysis (supplementary fig. S7g and h, Supplementary Material online). Notably, a close genetic clustering between the Tai-Kadai and Austroasiatic groups and between the Mongolic/Tungusic groups and the Amur River Basin ancient populations was observed. 
The neighbor-joining tree also indicated close genetic relationships between the Turkic and ancient Xinjiang populations, between the Koreanic and Japonic populations, and between the Austronesian and ancient Hanben populations (supplementary fig. S7h, Supplementary Material online). This study provides robust paternal genetic evidence supporting complex admixture and interactions among modern Chinese populations and ancient Eurasians. However, caution is advised regarding potential biases from low-coverage sampling and the simplistic grouping of linguistically similar, yet geographically disparate populations.

언어나 민족을 기반으로 한 메타집단 내의 유전적 관계를 조사하기 위해, 인구 집단을 언어 및 민족적 특성에 따라 그룹화했다(보충 그림 S7c-h). 대만(臺灣)성의 오스트로네시아어족에 속하는 사이시얏(Saisiyat), 사오(Thao), 타로코(Taroko), 아타얄(Atayal), 초우(Tsou)족과 같이 지리적으로 가까운 인구들은 뚜렷하게 하나의 군집을 형성하며 다른 참조 그룹들과 일찍 분리되었다(보충 그림 S7c). 별개의 가지들은 주로 타이-카다이어족, 에데(Ede)나 기아라이(Giarai) 같은 인근 오스트로네시아어족 그룹, 그리고 실라(Sila)와 롤로(Lolo) 같은 남부 티베트-버마어족 사용자들로 구성되었다. 오스트로네시아어족 관련 군집과 타이-카다이어족이 우세한 군집 간의 유전적 근접성은 이 두 언어의 사용자들이 공통된 기원을 가졌다는 가설을 뒷받침한다. 이는 근린 결합법에 기초한 계통 분석과 하플로그룹 빈도, PCA, MDS로부터 추론된 군집 분석을 통해 입증되었다(보충 그림 S7d-f). 이 분석들은 또한 한족(漢族)과 티베트-버마어족 인구 간, 그리고 언어적으로 다양한 그룹들 간의 미세한 유전적 차이를 드러냈으며, 이는 역사적으로 빈번하고 거대한 인구 이동과 유전자 흐름 사건이 있었음을 강조한다. 부계 혈통이 현재의 어족 분류와 일치하는지 확인하고 언어적으로 정의된 메타집단 내 유전 관계를 더 탐구하기 위해, 우리는 모든 그룹을 언어적 유사성에 따라 통합하여 종합적인 집단 유전학 분석을 수행했다(보충 그림 S7g, h). 그 결과, 타이-카다이어족과 오스트로아시아어족 그룹 간, 그리고 몽골어족/퉁구스어족 그룹과 아무르강(黑龍江) 유역 고대 인구 간에 가까운 유전적 군집이 관찰되었다. 근린 결합법으로 만든 계통수는 또한 튀르크어족과 고대 신강(新疆) 인구 간, 한국어족과 일본어족 인구 간, 그리고 오스트로네시아어족과 고대 한본(Hanben) 인구 간에 가까운 유전 관계가 있음을 보여주었다(보충 그림 S7h). 이 연구는 현대 중국 인구와 고대 유라시아인들 사이에 복잡한 혼혈과 상호작용이 있었음을 뒷받침하는 강력한 부계 유전학적 증거를 제공한다. 그러나 낮은 분석 깊이의 샘플링이나, 언어는 비슷하지만 지리적으로 멀리 떨어진 인구 집단을 단순히 하나로 묶는 과정에서 발생할 수 있는 잠재적 편향에 대해서는 주의가 필요하다.

Complex Population Migration and Admixture Events Inferred from the Y-Chromosome Diversity Landscape

Y 염색체 다양성으로 추론한 복잡한 인구 이동과 혼합 사건

The observed paternal genetic structure indicated that multiple complex ancient migration and admixture events significantly shaped the gene pool of Chinese populations. A time-stamped phylogenetic tree revealed multiple lineage diversifications after the last glacial maximum (20 kya), with these lineages dispersing at varying times (Fig. 1b; supplementary fig. S1, Supplementary Material online). Analysis using a maximum likelihood (ML) tree incorporating ancient DNA sequences revealed diverse founding populations contributing to the Chinese paternal gene pool that likely originated from ancient migrations of descendants from indigenous rice or millet farmers, Siberian hunter-gatherers, or western Eurasian steppe pastoralists (Fig. 2; supplementary figs. S9 to S12, Supplementary Material online).

관찰된 부계 유전 구조는 여러 차례의 복잡한 고대 이주와 혼합 사건이 중국 인구의 유전자 풀을 형성하는 데 중대한 영향을 미쳤음을 보여준다. 시간 정보가 포함된 계통수는 마지막 최대 빙하기(약 2만 년 전) 이후 여러 혈통이 다양하게 분화했으며, 이 혈통들이 각기 다른 시기에 퍼져나갔음을 보여준다(그림 1b; 보충 그림 S1). 고대 DNA 서열을 통합한 최대 가능도(ML) 계통수 분석 결과, 중국인 부계 유전자 풀에 기여한 창시 집단이 매우 다양했음이 드러났다. 이들은 토착 쌀 또는 기장 농부의 후손, 시베리아 수렵-채집인, 또는 서부 유라시아 초원 목축민들의 고대 이주에서 비롯되었을 가능성이 높다(그림 2; 보충 그림 S9-S12).

그림 2. 현대 및 고대 유라시아 인구 집단 간의 최대 가능도 계통수. 이 계통수는 새로 유전자형이 분석된 중국 소수 민족과 고대 유라시아 참조 인구 집단을 포함한다. 지도상의 색칠된 지역은 고대 및 현대 유라시아 인구의 샘플링 위치를 의미하며, 검은색 삼각형은 고대 유라시아 샘플이 수집된 유적지를 표시한다. 가지 길이는 돌연변이 수에 비례하고, 가지 색상은 샘플링 위치를 나타낸다. 실선은 현대인, 점선은 고대인을 나타낸다. 원 안의 다양한 모양과 색상은 현대인의 어족을 나타낸다: 장미색은 중국어족, 어두운 황금색은 티베트-버마어족, 청자색은 몽골어족, 갈색은 퉁구스어족, 회색은 튀르크어족, 연보라색은 한국어족, 녹색은 타이-카다이어족, 중간 바다색은 몽-미엔어족이다. 샘플 크기가 작아(5명 미만) 오스트로네시아어족, 오스트로아시아어족, 인도-유럽어족은 표시하지 않았다. 바깥쪽 원은 샘플 정보와 해당 하플로그룹 결과를 보여주며, 보라색은 현대인, 하늘색은 고대인을 나타낸다. 우세한 혈통과 그 대표적인 고대 게놈의 확대도는 지도 네 모서리에 강조 표시했다.

The extent to which ancestral sources affected the paternal genetic makeup of Chinese ethnic minorities was systematically investigated, along with the geographical spread of identified lineages and their associations with expansions related to ancient farmers, hunter-gatherers, and pastoralists. Additionally, to determine the origins and distribution patterns of dominant paternal lineages in China, the participants were grouped into geographically defined metapopulations, and general geographical distribution patterns were estimated (Figs. 3 to 5; supplementary figs. S17 and S18, Supplementary Material online). Finally, we systematically assessed how ancient technological innovations and human migration events have influenced the paternal genetic landscape of Chinese populations, revealing a complex interplay of genetic inputs from various ancient populations.

이러한 조상 집단이 중국 소수 민족의 부계 유전 구조에 어느 정도 영향을 미쳤는지를 체계적으로 조사했다. 또한, 확인된 혈통들의 지리적 확산과 이들이 고대 농부, 수렵-채집인, 목축민의 팽창과 어떤 연관이 있는지도 함께 살펴보았다. 더 나아가, 중국의 주요 부계 혈통의 기원과 분포 패턴을 파악하기 위해, 연구 참여자들을 지리적으로 정의된 메타집단으로 그룹화하여 전반적인 지리적 분포 패턴을 추정했다(그림 3-5; 보충 그림 S17, S18). 마지막으로, 고대의 기술 혁신과 인류 이주 사건이 중국 인구의 부계 유전 지형에 어떻게 영향을 미쳤는지 체계적으로 평가했다. 그 결과 다양한 고대 인구로부터 유입된 유전적 요소들이 복잡하게 얽혀 있음이 밝혀졌다.

그림 3. 고대 유라시아인 및 현대 동아시아·동남아시아인에서 나타나는 중국 주요 부계 혈통의 빈도 분포. a) 12개의 Y 염색체 혈통을 지닌 고대인 1,284명의 지리적 분포를 묘사했다. 다양한 하플로그룹은 각기 다른 색상의 원으로 표시되며, 각 원의 크기는 해당 하플로그룹의 빈도에 비례한다. b)와 c) 동부 유라시아 인구 집단 중 서쪽 기원 및 시베리아 수렵-채집인 관련 부계 혈통의 빈도를 보여준다. 최적화된 핫스팟 분석을 사용하여 이들 주요 혈통의 지리적 기원을 추정했다. d)와 e) 초기 동아시아인 관련 D, 고대 북부 동아시아(ANEA) 기장 농부 관련 O2, 고대 남부 동아시아 쌀 농부 관련 O1과 연관된 하위 혈통의 빈도를 표시했다. 연구된 혈통의 빈도가 높거나 계통지리학적으로 중요한 지역은 붉은색으로 강조 표시했다.

Gene Flow from Ancient Pastoralists and Barley Farmers in West Eurasia and Central/South Asia to East Asia

서부 유라시아 및 중앙/남아시아의 고대 목축민과 보리 농부로부터 동아시아로의 유전자 흐름

Prehistoric and historical cultural exchanges along the southern Bactria-Margiana Archaeological Complex (BMAC) oasis farming route, the Inner Asian Mountain Corridor, and the northern Yamnaya/Afanasievo steppe pastoralist migration routes have significantly shaped the autosomal gene pool of ancient populations in the Altai Mountains and surrounding areas of northwestern and northern East Asia (Zhang et al. 2021b). Haplogroups J/G/R and their major sublineages, which are prevalent among ancient western Eurasians, exhibit the highest frequencies in Northwest China (Figs. 2 and 3a and b; supplementary fig. S9a, Supplementary Material online). Specifically, most J haplogroup carriers in China belong to the J2-M172 sublineage, particularly J2a-M410. The origins of J2a in ancient populations can likely be traced back to the northern Fertile Crescent, and its current distribution primarily reflects expansions and admixture events related to ancient barley farmers (Figs. 2 and 3a). Similarly, individuals carrying G-M201 in Northwest China were predominantly classified under sublineage G2a (Fig. 3b). An optimized hot spot analysis revealed diffusion centers for J2a and G2a in the Xinjiang and Gansu-Qinghai regions, suggesting a correlation with these areas (supplementary fig. S17a, Supplementary Material online). Generally, the introduction of J/G-derived lineages into China is attributed to the eastward migration of barley farmer-related ancestral populations likely facilitated by gene flow events along the ancient Silk Road (Zhabagin et al. 2022; He et al. 2023c).

선사 및 역사 시대에 남쪽의 박트리아-마르기아나 고고학 복합체(BMAC) 오아시스 농경로, 내륙 아시아 산악 회랑, 북쪽의 얌나야/아파나시에보 초원 목축민 이주 경로를 따라 일어난 문화 교류는 알타이 산맥과 그 주변 북서 및 북부 동아시아 지역 고대 인구의 상염색체 유전자 풀을 형성하는 데 큰 영향을 미쳤다(Zhang et al. 2021b). 고대 서부 유라시아인들 사이에서 널리 퍼져 있던 하플로그룹 J, G, R과 그 주요 하위 혈통들은 중국 북서부에서 가장 높은 빈도로 나타난다(그림 2, 3a, 3b; 보충 그림 S9a). 구체적으로, 중국의 J 하플로그룹 보유자 대부분은 J2-M172 하위 혈통, 특히 J2a-M410에 속한다. 고대 인구에서 J2a의 기원은 북부 ‘비옥한 초승달 지대’로 거슬러 올라갈 수 있으며, 현재의 분포는 주로 고대 보리 농부와 관련된 팽창 및 혼합 사건을 반영한다(그림 2, 3a). 마찬가지로, 중국 북서부에서 G-M201을 보유한 개인들은 주로 G2a 하위 혈통으로 분류되었다(그림 3b). 최적화된 핫스팟 분석 결과, J2a와 G2a의 확산 중심지가 신강(新疆)과 감숙(甘肅)-청해(靑海) 지역으로 나타나, 이 지역들과의 연관성을 시사했다(보충 그림 S17a). 일반적으로, J/G 계열 혈통이 중국으로 유입된 것은 고대 실크로드를 따라 일어난 유전자 흐름을 통해 보리 농부와 관련된 조상 인구가 동쪽으로 이주한 결과로 여겨진다(Zhabagin et al. 2022; He et al. 2023c).

R-M207 is predominantly found among ancient western Eurasians and modern populations in North China, particularly in Northwest China (Figs. 2 and 3a and b). The basal haplogroup R was identified in a ~24,000-year-old individual from the Mal’ta site near Lake Baikal in Siberia (Raghavan et al. 2014). In China, approximately 90% of R carriers are categorized as R1-M173, which bifurcates into R1a-L146 and R1b-M343 approximately 23 kya. The frequency of R1a-L146 notably exceeded that of R1b-M343 (Figs. 1b and 3b; supplementary fig. S1, Supplementary Material online). Furthermore, all individuals within R1a were classified into R1a1a sublineages, with R1a1a1b diverging approximately 5 kya and being the most prevalent (Fig. 1b; supplementary fig. S1 and table S5, Supplementary Material online). The spatiotemporal distribution of R1 subclades is closely linked to the movements of ancient steppe pastoralists, underscoring a significant genetic flow into China (Figs. 2 and 3a). Conversely, R2-M479 appears in East Asia at low frequencies (supplementary fig. S17a, Supplementary Material online) and is primarily concentrated in Central/South Asia, having recently extended from South Asia to North China via the northern route. Analysis combining ancient and modern population phylogenies revealed that samples from Mongolia with substantial West Eurasian ancestry, such as Mongolia_EIA_Sagly_4 and Mongolia_LBA_Mongun Taiga_3, fall within the R1a1a sublineage. Nearly half of the ancient Xinjiang individuals are categorized within sublineages R1a or R1b, reflecting the historical impact of the Yamnaya/Afanasievo-related pastoralists on the genetic makeup of the northwestern Chinese populations (Figs. 2 and 3a). Additionally, the sporadic presence of other rare haplogroups like H-L901 and I-M170 in China suggests a broad and recent gene flow from Central/South Asian and West Eurasian ancestors into the region.

R-M207은 주로 고대 서부 유라시아인과 중국 북부, 특히 북서부의 현대 인구에서 발견된다(그림 2, 3a, 3b). R 하플로그룹의 뿌리는 시베리아 바이칼 호수 근처 말타(Mal’ta) 유적에서 발견된 약 24,000년 전 개인에게서 확인되었다(Raghavan et al. 2014). 중국에서 R 하플로그룹 보유자의 약 90%는 R1-M173으로 분류되며, 이는 약 23,000년 전에 R1a-L146과 R1b-M343으로 나뉘었다. R1a-L146의 빈도는 R1b-M343보다 현저히 높았다(그림 1b, 3b; 보충 그림 S1). 또한, R1a에 속하는 모든 개인은 R1a1a 하위 혈통으로 분류되었으며, 그중 R1a1a1b는 약 5,000년 전에 분기하여 가장 널리 퍼져 있다(그림 1b; 보충 그림 S1, 표 S5). R1 하위 그룹들의 시공간적 분포는 고대 초원 목축민의 이동과 밀접하게 연관되어 있으며, 이는 중국으로 상당한 유전적 흐름이 있었음을 강조한다(그림 2, 3a). 반대로, R2-M479는 동아시아에서 낮은 빈도로 나타나며(보충 그림 S17a), 주로 중앙/남아시아에 집중되어 있다. 이 혈통은 최근 남아시아에서 북쪽 경로를 통해 중국 북부로 확산되었다. 고대 및 현대 인구의 계통을 결합한 분석 결과, 상당한 서부 유라시아 혈통을 가진 몽골 지역 샘플들(예: Mongolia_EIA_Sagly_4, Mongolia_LBA_Mongun Taiga_3)이 R1a1a 하위 혈통에 속하는 것으로 밝혀졌다. 고대 신강(新疆) 지역 개인의 거의 절반이 R1a 또는 R1b 하위 혈통으로 분류되는데, 이는 얌나야/아파나시에보 관련 목축민들이 중국 북서부 인구의 유전 구성에 미친 역사적 영향을 반영한다(그림 2, 3a). 또한, 중국에서 H-L901이나 I-M170과 같은 다른 희귀 하플로그룹이 드물게 발견되는 것은 중앙/남아시아와 서부 유라시아 조상으로부터 이 지역으로 광범위하고 최근의 유전자 흐름이 있었음을 시사한다.

To confirm that migrations related to pastoralist populations have reshaped the distribution of western Eurasian-related lineages in Chinese populations, we estimated the correlation between haplogroup frequencies and both geographical (longitude and latitude) and genetic features (PC1-2, haplogroup frequency, Fst matrix, and autosomal-based admixture proportions). The frequencies of R-related sublineages correlate with latitude and exhibit high frequency in modern northwestern Chinese populations (supplementary fig. S14a and b, Supplementary Material online). Furthermore, the distribution patterns of R and its sublineages were significantly correlated (supplementary fig. S14c, Supplementary Material online). To elucidate the direct genetic contributions from ancient sources to modern Chinese populations, we constructed a six-source admixture model, revealing a gradual decrease in ancestral proportions from their archeologically confirmed origins or earliest emergence areas in China (Fig. 4b). If ancient migration events directly influence the lineage frequency patterns in Chinese populations, a strong positive correlation would be expected between the proportion of autosomal-based admixture from presumed ancestral sources and the frequency of founding lineages. Intriguingly, a significant correlation was observed between the Afanasievo-related ancestral proportions and the haplogroup frequencies of multiple H, J, and R sublineages (Fig. 5a and g). These findings, derived from the haplogroup frequency spectra of modern and ancient Eurasians, phylogeographic origin inferences, and multiple factor correlations, suggest that migrations of western Eurasian barley farmer- and pastoralist-related populations likely facilitated the development of these Chinese founding lineages.

목축 인구와 관련된 이주가 중국 인구 내 서부 유라시아 관련 혈통의 분포를 어떻게 바꾸었는지 확인하기 위해, 우리는 하플로그룹 빈도와 지리적(경도, 위도) 및 유전적 특징(주성분 1-2, 하플로그룹 빈도, Fst 행렬, 상염색체 기반 혼합 비율) 간의 상관관계를 추정했다. R 관련 하위 혈통의 빈도는 위도와 상관관계가 있으며, 현대 중국 북서부 인구에서 높은 빈도를 보인다(보충 그림 S14a, b). 나아가, R과 그 하위 혈통들의 분포 패턴은 통계적으로 유의미한 상관관계를 보였다(보충 그림 S14c). 고대 집단이 현대 중국 인구에 미친 직접적인 유전적 기여를 밝히기 위해, 우리는 6개의 조상 집단을 가정한 혼합 모델을 만들었다. 그 결과, 조상 집단의 비율이 고고학적으로 확인된 기원지나 중국 내 최초 출현 지역으로부터 멀어질수록 점차 감소하는 패턴을 보였다(그림 4b). 만약 고대의 이주 사건이 중국 인구의 혈통 빈도 패턴에 직접적인 영향을 미쳤다면, 추정된 조상 집단으로부터 온 상염색체 기반 혼합 비율과 창시 혈통의 빈도 사이에는 확고한 양의 상관관계가 기대될 것이다. 흥미롭게도, 아파나시에보(Afanasievo) 관련 조상 비율과 여러 H, J, R 하위 혈통의 하플로그룹 빈도 사이에 유의미한 상관관계가 관찰되었다(그림 5a, g). 현대 및 고대 유라시아인의 하플로그룹 빈도 분포, 계통지리학적 기원 추론, 다중 요인 상관관계 분석에서 도출된 이러한 결과들은 서부 유라시아의 보리 농부 및 목축 관련 인구의 이주가 이들 중국 창시 혈통의 형성을 촉진했을 가능성이 높음을 시사한다.
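
The expectation stated above, that autosomal ancestry proportions and founding-lineage frequencies should rise together across populations, can be sketched as a simple correlation test. This is a minimal illustration only: the source does not say which correlation coefficient or significance test was used, so the Pearson coefficient and the permutation test here are assumptions, and the per-population values are invented toy numbers, not data from the study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """One-sided p-value for a positive correlation, by shuffling ys."""
    rng = random.Random(seed)
    observed = pearson_r(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(xs, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-population values (toy numbers, NOT from the paper):
# autosomal ancestry proportion from one source vs. frequency of one lineage.
ancestry = [0.02, 0.05, 0.10, 0.18, 0.25, 0.31]
hap_freq = [0.01, 0.03, 0.04, 0.09, 0.12, 0.15]

r = pearson_r(ancestry, hap_freq)
p = permutation_p_value(ancestry, hap_freq)
print(f"r = {r:.3f}, one-sided permutation p = {p:.4f}")
```

A permutation test is used here because it makes no distributional assumption about frequencies bounded between 0 and 1; with real data, each point would be one population.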

그림 4. 혼합(Admixture) 분석 결과 및 조상 집단의 지리적 분포. a) 2개에서 15개까지의 조상 집단을 미리 가정하여 현대 및 고대 동아시아 인구 집단에 대해 모델 기반 ADMIXTURE 분석을 수행했다. 교차 검증 오류가 가장 낮은 6개 조상 집단 혼합 모델이 최적의 모델로 선택되었다. b)부터 g)까지 다양한 중국 인구 집단에 걸친 혼합 비율의 분포를 묘사했으며, 붉은색은 특정 조상 요소의 비율이 가장 높음을 나타낸다.
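
The model selection described in the Figure 4 caption (choosing the number of ancestral sources with the lowest cross-validation error) can be sketched as follows. This is a hedged illustration: it assumes log lines of the form `CV error (K=6): 0.5288`, and the numeric values in `toy_log` are invented placeholders, not results from the study.

```python
import re

# Assumed log line format; adjust the pattern if the real logs differ.
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9.]+)")


def best_k(log_text):
    """Return (K with the lowest CV error, {K: error}) parsed from log text."""
    errors = {int(k): float(err) for k, err in CV_LINE.findall(log_text)}
    if not errors:
        raise ValueError("no CV error lines found")
    k = min(errors, key=errors.get)
    return k, errors


# Toy log excerpt with made-up error values (NOT from the paper).
toy_log = """\
CV error (K=5): 0.5312
CV error (K=6): 0.5288
CV error (K=7): 0.5301
"""

k, errors = best_k(toy_log)
print(f"lowest cross-validation error at K = {k}")
```

In practice the logs from all runs (K = 2 to 15 in the caption) would be concatenated and passed to `best_k`.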

그림 5. 상염색체 조상 비율과 Y 염색체 혈통 빈도 간의 상관관계. a)부터 f)까지 산점도는 상염색체 기반 조상 비율과 인구 특이적 창시 부계 혈통 간의 통계적으로 유의미한 양의 상관관계를 보여준다. g) 상염색체 기반 혼합 분석으로 추정된 조상 비율과 부계 혈통 빈도 간의 상관관계를 보여준다. 서로 다른 조상 집단 비율 간의 상관관계는 제외했으며, 이 상관관계들의 초기 군집 위치는 화살표로 표시했다. 이 시각적 분석은 유전적 유산과 혈통의 역학을 추적하는 데 있어 상염색체 데이터와 Y 염색체 데이터 간의 복잡한 상호작용을 명확히 보여준다.

Siberian Hunter-Gatherer-Dominant Paternal Lineages Are Widely Distributed in China

시베리아 수렵-채집인에게서 유래한 부계 혈통이 중국에 널리 분포함

Ancient DNA studies have identified an ancestral component, termed Ancient Northeast Asian (ANA) ancestry, related to Neolithic hunter-gatherers from the Russian Far East, Mongolian Plateau, and Baikal region (Jeong et al. 2020; Fig. 4a and c). This ANA-related ancestry has contributed variably to distinct ancient populations in these regions, which are characterized by high proportions of C/N/Q/R sublineages (Figs. 2 and 3a). The frequencies of the C2/N1/R1 sublineages were significantly positively correlated with the ANA-related ancestry (P<0.05, Fig. 5b and g). The haplogroup Q-M242 appears in China at very low frequencies (<3%, supplementary table S5, Supplementary Material online) and displays varied distribution patterns between North and South China (Fig. 3c; supplementary table S5, Supplementary Material online). This lineage, which might have originated in Central Asia and southern Siberia approximately 31 kya (Fig. 1b), includes the Q1a1a-M120 subclade. This subclade, unique to East Asians, is relatively prevalent among Han Chinese individuals (~81% of all Q lineages, supplementary table S5, Supplementary Material online) and likely underwent a local expansion in Northwest China between 5 and 3 kya (Sun et al. 2019). Furthermore, the Q1a1a1-F1626 subclade, a derivative of Q1a1a-M120, diversified approximately 4.3 kya (Fig. 1b). The ML phylogenetic topology indicated that ancient Mongolian individuals with minimal West Eurasian-related ancestry (<20%) belonged to Q1a1a or its sublineages (Figs. 2 and 3a). Venn diagrams illustrating shared ancestry-correlated lineages also show that the Q and R lineages are common among the Yamnaya and ANA-associated lineages (supplementary fig. S15, Supplementary Material online). 
Moreover, ancient individuals from the middle Neolithic (MN) Yangshao culture and approximately 3,000-year-old Hengbei residents from Shanxi, who carried the Q1a1a-M120 lineage, indicate that this haplogroup has influenced the Han Chinese gene pool since at least 6 kya. Q1b-M346, although rare in China, is concentrated at the intersection of Siberia and North China (supplementary fig. S17b and table S5, Supplementary Material online), with some Bronze Age (BA) and Iron Age (IA) individuals from the Mongolian Plateau and Xinjiang regions genotyped for Q1b or its subclades (Figs. 2 and 3a).

고대 DNA 연구들은 러시아 극동, 몽골 고원, 바이칼 지역의 신석기 시대 수렵-채집인과 관련된 ‘고대 동북아시아인(ANA)’이라는 조상 요소를 확인했다(Jeong et al. 2020; 그림 4a, c). 이 ANA 관련 혈통은 해당 지역들의 뚜렷한 고대 인구 집단에 다양한 정도로 기여했으며, 이들 집단은 C/N/Q/R 하위 혈통의 비율이 높은 특징을 보인다(그림 2, 3a). C2/N1/R1 하위 혈통의 빈도는 ANA 관련 혈통과 통계적으로 유의미한 양의 상관관계를 보였다(P<0.05, 그림 5b, g). 하플로그룹 Q-M242는 중국에서 매우 낮은 빈도(<3%)로 나타나며(보충 표 S5), 중국 남부와 북부 간에 다양한 분포 패턴을 보인다(그림 3c; 보충 표 S5). 약 31,000년 전 중앙아시아와 남부 시베리아에서 기원했을 가능성이 있는 이 혈통은 Q1a1a-M120 하위 그룹을 포함한다(그림 1b). 동아시아인에게 특유한 이 하위 그룹은 한족(漢族) 사이에서 비교적 널리 퍼져 있으며(전체 Q 혈통의 약 81%, 보충 표 S5), 약 5,000년에서 3,000년 전 사이에 중국 북서부에서 지역적 팽창을 겪었을 가능성이 높다(Sun et al. 2019). 더 나아가, Q1a1a-M120에서 파생된 Q1a1a1-F1626 하위 그룹은 약 4,300년 전에 다양화되었다(그림 1b). 최대 가능도(ML) 계통수 구조는 서부 유라시아 관련 혈통이 거의 없는(<20%) 고대 몽골인들이 Q1a1a 또는 그 하위 혈통에 속했음을 보여준다(그림 2, 3a). 공유 조상과 관련된 혈통을 보여주는 벤 다이어그램 또한 Q와 R 혈통이 얌나야(Yamnaya) 및 ANA 관련 혈통 모두에서 공통적으로 나타남을 보여준다(보충 그림 S15). 또한, Q1a1a-M120 혈통을 지녔던 신석기 중기(MN) 앙소(仰韶) 문화의 고대인들과 산서성(山西省)의 약 3,000년 전 횡북(橫北) 유적 거주민들의 존재는 이 하플로그룹이 적어도 6,000년 전부터 한족(漢族)의 유전자 풀에 영향을 미쳤음을 나타낸다. Q1b-M346은 중국에서는 드물지만 시베리아와 중국 북부의 교차점에 집중되어 있으며(보충 그림 S17b, 표 S5), 몽골 고원과 신강(新疆) 지역의 일부 청동기 시대(BA) 및 철기 시대(IA) 개인들이 Q1b 또는 그 하위 그룹으로 유전자형이 분석되었다(그림 2, 3a).

Haplogroup N-M231, particularly its subclade N1-CTS3750, is prevalent among Chinese populations, diverging into N1a-F1206 and N1b-F2930 around 19 kya (Fig. 1b; supplementary fig. S1, Supplementary Material online). Y-chromosome analyses of ancient individuals from the West Liao River Basin, dating from 6,500 to 2,700 BP, indicated that N-M231 was the dominant paternal lineage in Northeast China during the Neolithic, with its frequency gradually declining over time (Cui et al. 2013). The frequencies of the N1a-F1206 and N1b-F2930 subclades were high in North China and low-altitude Southwest China, respectively (supplementary fig. S17b, Supplementary Material online). These findings suggest a north-south differentiation of N1 subclades in North China, with N1a-F1206 migrating northward beyond East Asia and N1b-F2930 moving southward to become a major paternal lineage among Tibeto-Burman groups, notably the Yi people (supplementary table S5, Supplementary Material online). N1a1-M46/Tat, a dominant subclade, likely originated in Northeast Asia. An individual from the Houtaomuga site, dated 7 kya, carried N1a1a1a1a-M2117, which was genetically linked to early Neolithic (EN) Amur River Basin individuals (Ning et al. 2020a). Further analysis revealed that the IA Xinjiang individuals, BA West Liao River individuals, and several southern Siberian ancient individuals belonged to N1a or its sublineages (Fig. 2), which correlated significantly with ANA-related ancestral components (Fig. 5b). N1a2-F1008/L666 comprised approximately 67% of the N1a sublineages and bifurcated into N1a2a and N1a2b approximately 9.5 kya; N1a2a1, which made up the largest proportion of N1a2a (~82%), diversified approximately 4.4 kya, and N1a2b diverged approximately 4.0 kya (Fig. 1b; supplementary fig. S1, Supplementary Material online). Ancient DNA data revealed that EN Shamanka individuals from Cis-Baikal and several southern Siberian ancients belonged to N1a2a (Fig. 2). 
Early diffusion centers for N1a2 were identified in North China and the southeastern part of Northeast China (supplementary fig. S17b, Supplementary Material online). N1b-F2930 is primarily found in Tibeto-Burman-speaking populations in low-altitude Southwest China (~24%) and less frequently in other Chinese populations (supplementary table S5, Supplementary Material online). Notably, ancient East Asians belonging to sublineage N1b, specifically N1b2 or its derivatives, are mainly distributed on the Tibetan Plateau (Fig. 2). To better understand the phylogeographic origins of N-M231 and its N1a/N1b subclades and the factors influencing their distribution patterns, further collection and whole Y-chromosome sequencing of spatiotemporally distinct ancient and modern Eurasians belonging to N sublineages are essential.

하플로그룹 N-M231, 특히 그 하위 그룹인 N1-CTS3750은 중국 인구에 널리 퍼져 있으며, 약 19,000년 전에 N1a-F1206과 N1b-F2930으로 분기했다(그림 1b; 보충 그림 S1). 6,500년에서 2,700년 전(BP) 사이의 서요하(西遼河) 유역 고대인들에 대한 Y 염색체 분석 결과, N-M231은 신석기 시대 중국 동북부의 지배적인 부계 혈통이었으며, 그 빈도는 시간이 지남에 따라 점차 감소했다(Cui et al. 2013). N1a-F1206과 N1b-F2930 하위 그룹의 빈도는 각각 중국 북부와 저지대인 중국 남서부에서 높게 나타났다(보충 그림 S17b). 이 결과는 중국 북부에서 N1 하위 그룹이 남북으로 분화했음을 시사한다. 즉, N1a-F1206은 북쪽으로 동아시아를 넘어 이주했고, N1b-F2930은 남쪽으로 이동하여 티베트-버마어족 그룹, 특히 이족(彝族)의 주요 부계 혈통이 되었다(보충 표 S5). 주요 하위 그룹인 N1a1-M46/Tat은 동북아시아에서 기원했을 가능성이 높다. 7,000년 전 후투오무가(後套木嘎) 유적의 한 개인은 N1a1a1a1a-M2117을 지녔는데, 이는 신석기 초기(EN) 아무르강(黑龍江) 유역 개인들과 유전적으로 연결된다(Ning et al. 2020a). 추가 분석 결과, 철기 시대(IA) 신강(新疆) 개인, 청동기 시대(BA) 서요하(西遼河) 개인, 그리고 여러 남부 시베리아 고대인들이 N1a 또는 그 하위 혈통에 속했으며(그림 2), 이는 ANA 관련 조상 요소와 유의미한 상관관계가 있었다(그림 5b). N1a2-F1008/L666은 N1a 하위 혈통의 약 67%를 차지했으며, 약 9,500년 전에 N1a2a와 N1a2b로 나뉘었다. N1a2a의 가장 큰 부분(약 82%)을 차지하는 N1a2a1은 약 4,400년 전에 다양화되었고, N1a2b는 약 4,000년 전에 분기했다(그림 1b; 보충 그림 S1). 고대 DNA 데이터는 바이칼 연안(Cis-Baikal)의 신석기 초기(EN) 샤만카(Shamanka) 유적 개인들과 여러 남부 시베리아 고대인들이 N1a2a에 속했음을 보여준다(그림 2). N1a2의 초기 확산 중심지는 중국 북부와 중국 동북부의 남동쪽 지역으로 확인되었다(보충 그림 S17b). N1b-F2930은 주로 저지대 중국 남서부의 티베트-버마어족 사용 인구(약 24%)에서 발견되며, 다른 중국 인구에서는 빈도가 낮다(보충 표 S5). 주목할 점은, N1b 하위 혈통, 특히 N1b2나 그 파생 혈통에 속하는 고대 동아시아인들은 주로 티베트 고원에 분포한다는 것이다(그림 2). N-M231 및 그 하위 그룹인 N1a/N1b의 계통지리학적 기원과 분포 패턴에 영향을 미치는 요인들을 더 잘 이해하기 위해서는, N 하위 혈통에 속하는 시공간적으로 다양한 고대 및 현대 유라시아인들을 추가로 수집하여 전체 Y 염색체 시퀀싱을 수행하는 것이 필수적이다.
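
Nested shares such as "N1a2 comprised approximately 67% of the N1a sublineages" follow from the way long-form haplogroup names encode ancestry: each name extends its parent by one alternating digit or letter. The sketch below shows one way to compute such a share from per-haplogroup carrier counts. It assumes long-form names with an optional `-marker` suffix (as written in the text, e.g. `Q1a1a-M120`); the function names and the counts are hypothetical toy values, not figures from supplementary table S5.

```python
import re


def tokens(haplogroup):
    """Split a long-form name such as 'N1a2a1-F1008' into ['N','1','a','2','a','1']."""
    name = haplogroup.split("-", 1)[0]  # drop the defining-marker suffix if present
    return re.findall(r"[A-Z]+|\d+|[a-z]+", name)


def is_sublineage(haplogroup, ancestor):
    """True if `haplogroup` equals `ancestor` or descends from it."""
    t, a = tokens(haplogroup), tokens(ancestor)
    return t[:len(a)] == a


def share(counts, child, parent):
    """Fraction of carriers under `parent` that also fall under `child`."""
    parent_total = sum(n for h, n in counts.items() if is_sublineage(h, parent))
    if parent_total == 0:
        raise ValueError(f"no carriers found under {parent}")
    child_total = sum(n for h, n in counts.items() if is_sublineage(h, child))
    return child_total / parent_total


# Hypothetical carrier counts (toy numbers, NOT from the paper).
toy_counts = {
    "N1a1a": 3,
    "N1a2a1": 4,
    "N1a2a2": 1,
    "N1a2b": 2,
    "N1a10": 1,  # token-wise comparison keeps this out of N1a1
    "N1b2": 5,
}

print(f"N1a2 share of N1a: {share(toy_counts, 'N1a2', 'N1a'):.2f}")
```

Comparing token lists rather than raw string prefixes avoids treating a name like `N1a10` as a descendant of `N1a1`.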

Haplogroup C-M130, one of the primary paternal lineages in East Asia and likely carried by early settlers, diverged approximately 50 kya (Fig. 1b). Its subclade C2-M217, which is particularly widespread in North China, exhibited a notable frequency across multiple regions (Figs. 2 and 3a and c; supplementary fig. S17b, Supplementary Material online). The earliest known individual carrying C2-M217, designated as AR19K, dates from 19,587 to 19,175 cal BP in the Amur River Basin (Fig. 2). Distinct patterns are observed for the C2a-L1373 and C2b-F1067 subclades. C2a-L1373, sometimes referred to as the “northern branch,” shows the highest frequencies in Inner Mongolia, whereas C2b-F1067, the “southern branch,” is most prevalent in Central, North, and Northeast China (Fig. 3c). The C2a subclade, particularly C2a1a, is predominant among Transeurasian-speaking populations, with C2a1a1b1-F1756, C2a1a2a-M86, and C2a1a3a-F3796 identified as major subclades within China (Fig. 2; supplementary table S5, Supplementary Material online). BEAST-based phylogenetic analysis revealed that C2a1a1b1 diversified into C2a1a1b1a and C2a1a1b1b approximately 5.4 kya (Fig. 1b; supplementary fig. S1, Supplementary Material online), which are widely found in the northern Han, Mongolic and Tungusic people (supplementary table S5, Supplementary Material online). Historical dispersal of these subclades is evidenced by their presence in BA West Liao River, IA Amur River Basin, and several BA to Historical Era (HE) individuals from the Mongolian Plateau (Fig. 2), suggesting links to early expansions of Mongolic/Tungusic ancestors. Furthermore, C2a1a2a sublineages are common in Transeurasian groups across East Asia and North Asia (Fig. 2; supplementary table S5, Supplementary Material online). A Mesolithic Amur River Basin individual (AR11K), an ANA-representative Boisman_MN, and two HE Mongolian Plateau individuals carried C2a1a2 or C2a1a2a (Fig. 2), indicating that migration from the Amur River Basin to the Mongolian Plateau contributed to the genetic makeup of the current Transeurasians, particularly Mongolic/Turkic speakers (supplementary table S5, Supplementary Material online). C2a1a3a-F3796, also known as the C2*-Star Cluster, diverged from C2a1a3 approximately 3.7 kya, predating previous estimates (Wei et al. 2018). This may be due to sampling bias and differences in TMRCA estimation methods based on Y-STRs and Y-chromosome sequences. This sublineage is foundational among Mongolic-speaking populations. One Neolithic Amur River Basin individual, one MN Boisman individual, several HE Mongolian Plateau ancients, and IA Xinjiang samples are classified under the C2a1a3 sublineages (Fig. 2). Additionally, C2a sublineages are also identified in central and southern Chinese populations (supplementary table S5, Supplementary Material online), suggesting their southward migration from North Asia, likely driven by the expansion of the Mongol Empire.

하플로그룹 C-M130은 동아시아의 주요 부계 혈통 중 하나로, 초기 정착민들이 지녔을 가능성이 있으며 약 50,000년 전에 분기했다(그림 1b). 그 하위 그룹인 C2-M217은 특히 중국 북부에 널리 퍼져 있으며, 여러 지역에 걸쳐 높은 빈도를 보였다(그림 2, 3a, 3c; 보충 그림 S17b). C2-M217을 지닌 가장 오래된 개인으로 알려진 AR19K는 아무르강(黑龍江) 유역에서 발견되었으며, 그 연대는 보정연대로 19,587년에서 19,175년 전이다(그림 2). C2a-L1373과 C2b-F1067 하위 그룹에서는 뚜렷이 다른 분포 패턴이 관찰된다. 때때로 ‘북방계’로 불리는 C2a-L1373은 내몽골(內蒙古)에서 가장 높은 빈도를 보이는 반면, ‘남방계’인 C2b-F1067은 중국 중부, 북부, 동북부에서 가장 널리 퍼져 있다(그림 3c). C2a 하위 그룹, 특히 C2a1a는 트랜스유라시아어족(알타이어족) 사용 인구에서 우세하며, 중국 내에서는 C2a1a1b1-F1756, C2a1a2a-M86, C2a1a3a-F3796이 주요 하위 그룹으로 확인되었다(그림 2; 보충 표 S5). BEAST 프로그램을 이용한 계통 분석 결과, C2a1a1b1은 약 5,400년 전에 C2a1a1b1a와 C2a1a1b1b로 다양화되었으며(그림 1b; 보충 그림 S1), 이들은 북부 한족(漢族), 몽골어족, 퉁구스어족 사람들에게서 널리 발견된다(보충 표 S5). 이 하위 그룹들의 역사적 확산은 청동기 시대(BA) 서요하(西遼河), 철기 시대(IA) 아무르강(黑龍江) 유역, 그리고 청동기 시대부터 역사 시대(HE)에 이르는 몽골 고원의 여러 개인들에게서 이들이 발견됨으로써 입증된다(그림 2). 이는 몽골어족/퉁구스어족 조상의 초기 팽창과 관련이 있음을 시사한다. 나아가, C2a1a2a 하위 혈통은 동아시아와 북아시아에 걸쳐 트랜스유라시아어족(알타이어족) 그룹에서 흔하게 나타난다(그림 2; 보충 표 S5). 중석기 시대 아무르강(黑龍江) 유역의 개인(AR11K), 고대 동북아시아인(ANA)을 대표하는 보이스만(Boisman) 신석기 중기인, 그리고 역사 시대(HE) 몽골 고원의 개인 두 명이 C2a1a2 또는 C2a1a2a를 지니고 있었다(그림 2). 이는 아무르강 유역에서 몽골 고원으로의 이주가 현재 트랜스유라시아어족, 특히 몽골어족/튀르크어족 사용자의 유전적 구성에 기여했음을 나타낸다(보충 표 S5). ‘C2*-스타 클러스터’로도 알려진 C2a1a3a-F3796은 약 3,700년 전에 C2a1a3에서 분기했는데, 이는 이전 연구들의 추정치보다 이른 시점이다(Wei et al. 2018). 이는 샘플링 편향과, Y-STR 및 Y 염색체 서열 기반의 공통 조상 시점(TMRCA) 추정 방법의 차이 때문일 수 있다. 이 하위 혈통은 몽골어족 사용 인구의 기초를 이룬다. 신석기 시대 아무르강(黑龍江) 유역의 개인 한 명, 신석기 중기(MN) 보이스만(Boisman) 개인 한 명, 여러 역사 시대(HE) 몽골 고원 고대인, 그리고 철기 시대(IA) 신강(新疆) 샘플들이 C2a1a3 하위 혈통으로 분류된다(그림 2). 또한, C2a 하위 혈통은 중국 중부 및 남부 인구에서도 확인되는데(보충 표 S5), 이는 아마도 몽골 제국의 팽창에 의해 촉발된 북아시아로부터의 남방 이주를 시사한다.

The phylogenetic analysis of C2b-F1067 indicated that ancient populations carrying its sublineages significantly enriched the gene pool of modern eastern Eurasians. Observations suggest that Inner Mongolia and Northeast China were likely initial dispersal centers for C2b, exhibiting distinct geographical distribution patterns (supplementary fig. S17b, Supplementary Material online). For instance, C2b1a1-CTS2657 is found at high frequencies in North and Northeast China, while C2b1a2-F3880 predominates in Northeast and North China, as well as in East China, notably in Shandong, Jiangsu, and Shanghai. Conversely, C2b1b-F845 had the highest frequencies in Central China, Southwest China (mainly Guizhou), and SEA (supplementary fig. S17b, Supplementary Material online). The distribution patterns identified for the C2b sublineages, which partly diverge from previous studies (Wu et al. 2020), may result from sampling biases and differences in reference populations. Our analysis confirmed the southern origin of C2b1b-F845 (supplementary fig. S17b, Supplementary Material online) and identified two ancient individuals from Shigatse on the Tibetan Plateau with C2b1 mutations, one late Neolithic (LN) Shimao individual belonging to C2b1a2b1, and only two Neolithic Yellow River Basin farmers and one HE Tibetan Plateau individual associated with C2b1b sublineages (Fig. 2). To comprehensively explore the phylogeographic origin and dispersal of C2b1 sublineages, further analysis of spatiotemporally diverse ancient southern East Asians (ASEA), particularly from low-altitude regions, is needed. Statistically significant negative correlations between pairwise Fst genetic distances and the frequency of western Eurasian/Siberian-related lineages underscore their contribution to the genetic differentiation between northern and southern East Asians (supplementary fig. S14b, Supplementary Material online). 
Overall, genetic analyses incorporating the haplogroup frequency spectra of modern and ancient East Asians revealed a robust genetic connection between the descendants of Neolithic southern Siberian hunter-gatherers and modern East Asians. The geographical distribution patterns and TMRCA estimates of C2a1a/C2b1a/N1a1/Q1a1a-derived sublineages support the hypothesis that ancient migrations of West Liao River millet farmers have shaped the current distribution patterns of these lineages in Chinese populations, particularly among Transeurasian speakers. These results align with earlier findings triangulated from linguistic, archaeological, and genetic evidence (Robbeets et al. 2021).

C2b-F1067의 계통 분석은 이 하위 혈통들을 지닌 고대 인구가 현대 동부 유라시아인의 유전자 풀을 상당히 풍부하게 했음을 나타낸다. 관찰 결과, 내몽골(內蒙古)과 중국 동북부는 C2b의 초기 확산 중심지였을 가능성이 높으며, 이들은 뚜렷한 지리적 분포 패턴을 보인다(보충 그림 S17b). 예를 들어, C2b1a1-CTS2657은 중국 북부와 동북부에서 높은 빈도로 발견되는 반면, C2b1a2-F3880은 동북부와 북부, 그리고 동부 중국, 특히 산동(山東), 강소(江蘇), 상해(上海)에서 우세하다. 반대로, C2b1b-F845는 중국 중부, 남서부(주로 귀주(貴州)), 그리고 동남아시아(SEA)에서 가장 높은 빈도를 보였다(보충 그림 S17b). C2b 하위 혈통에 대해 확인된 분포 패턴은 이전 연구(Wu et al. 2020)와 부분적으로 다른데, 이는 샘플링 편향과 참조 인구 집단의 차이에서 비롯된 것일 수 있다. 우리의 분석은 C2b1b-F845가 남쪽에서 기원했음을 확인시켜 주었으며(보충 그림 S17b), 티베트 고원의 시가체(Shigatse)에서 C2b1 돌연변이를 가진 고대인 두 명, C2b1a2b1에 속하는 신석기 후기(LN) 석묘(石峁) 유적 개인 한 명, 그리고 C2b1b 하위 혈통과 관련된 신석기 시대 황하 유역 농부 단 두 명과 역사 시대(HE) 티베트 고원 개인 한 명을 확인했다(그림 2). C2b1 하위 혈통의 계통지리학적 기원과 확산을 종합적으로 탐구하기 위해서는, 시공간적으로 다양한 고대 남부 동아시아인(ASEA), 특히 저지대 지역 출신에 대한 추가 분석이 필요하다. 쌍별 Fst 유전 거리와 서부 유라시아/시베리아 관련 혈통의 빈도 사이에 나타나는 통계적으로 유의미한 음의 상관관계는, 이들 혈통이 북부와 남부 동아시아인 간의 유전적 분화에 기여했음을 강조한다(보충 그림 S14b). 전반적으로, 현대 및 고대 동아시아인의 하플로그룹 빈도 분포를 통합한 유전 분석은 신석기 시대 남부 시베리아 수렵-채집인의 후손과 현대 동아시아인 사이에 강력한 유전적 연결이 있음을 밝혔다. C2a1a/C2b1a/N1a1/Q1a1a에서 파생된 하위 혈통들의 지리적 분포 패턴과 공통 조상 시점(TMRCA) 추정치는 서요하(西遼河) 유역 기장 농부들의 고대 이주가 현재 중국 인구, 특히 트랜스유라시아어족 사용자들의 분포 패턴을 형성했다는 가설을 뒷받침한다. 이러한 발견은 언어학, 고고학, 유전학 증거를 종합하여 얻은 이전의 연구 결과와 일치한다(Robbeets et al. 2021).

Traces of the Early Asian and Ancient Northern East Asian Millet Farmer-Related Lineages in China

초기 아시아인 및 고대 북부 동아시아 기장 농부 관련 혈통의 흔적

The ancient genetic connections among Andamanese, Jomon-related indigenous Japanese, and highland Tibetans are evidenced by shared Paleolithic ancestral components and the uniparental D lineage. Analysis of the phylogeographic origins of D subclades revealed that D1-M174, a major paternal haplogroup in East Asians, is prevalent in our YHC (supplementary fig. S16, Supplementary Material online). Haplogroup D1a, which is particularly frequent in the Tibetan Plateau, is predominantly subdivided into D1a1a-M15 and D1a1b-P99, with these divisions occurring approximately 46 kya (Fig. 1b; supplementary fig. S1, Supplementary Material online). D1a1a sublineages are commonly found (>54%) among Tibeto-Burman-speaking populations in Southwest China and are less frequent in other Chinese populations, while D1a1b sublineages are most prevalent on the Tibetan Plateau (>36%, supplementary table S5, Supplementary Material online). D1a1a sublineages are frequently found in the Mongolian and Tibetan plateaus and Yellow River Basin ancients, and D1a1b sublineages are mainly found in the Tibetan Plateau ancients (Fig. 2). The distribution patterns of these sublineages in both modern and ancient East Asians provide direct evidence of their migration paths: D1a1a-M15 likely migrated northward through western Sichuan to the Gansu-Qinghai region and possibly into the Himalayan area along the Tibetan-Yi corridor; D1a1b-P99, particularly its subclade D1a1b1-P47, originated on the Tibetan Plateau. These D1a sublineages are predominantly found in Tibetan populations, supported by genetic contributions from northern Chinese millet farmers via a revised Y-chromosome phylogeny and correlations with O2 sublineages and Lubrak-related Tibetan Plateau ancestry (Fig. 4d; supplementary fig. S14c, Supplementary Material online). Gene flow events and the presence of Lubrak-related D sublineages significantly influenced the genetic diversity patterns. 
Notably, the frequencies of four lineages (O2a2b1, O2a2b1a, O2a2b1a1, and O2a2b1a1a) strongly correlate with the Lubrak-related ancestry, confirming that Neolithic expansions from the Yellow River Basin contributed to the peopling of the Tibetan Plateau (Fig. 5c). Ancient DNA evidence from autosomal variations and maternal lineages further underscores the substantial impact of Neolithic millet farmers on the permanent settlement of the Tibetan Plateau (Wang et al. 2023).

안다만인, 조몬(繩文) 관련 일본 원주민, 고지대 티베트인 사이의 고대 유전적 연결은 이들이 공유하는 구석기 시대 조상 요소와 단일부모 유전 계통인 D 혈통에 의해 입증된다. D 하위 그룹의 계통지리학적 기원을 분석한 결과, 동아시아인의 주요 부계 하플로그룹인 D1-M174가 우리의 염황 코호트(YHC)에서 널리 퍼져 있음이 밝혀졌다(보충 그림 S16). 티베트 고원에서 특히 빈도가 높은 하플로그룹 D1a는 주로 D1a1a-M15와 D1a1b-P99로 나뉘며, 이 분화는 약 46,000년 전에 일어났다(그림 1b; 보충 그림 S1). D1a1a 하위 혈통은 중국 남서부의 티베트-버마어족 사용 인구에서 흔하게 발견되며(>54%), 다른 중국 인구에서는 빈도가 낮은 반면, D1a1b 하위 혈통은 티베트 고원에서 가장 널리 퍼져 있다(>36%, 보충 표 S5). D1a1a 하위 혈통은 몽골 고원, 티베트 고원, 황하(黃河) 유역의 고대인들에게서 자주 발견되며, D1a1b 하위 혈통은 주로 티베트 고원의 고대인들에게서 발견된다(그림 2). 현대 및 고대 동아시아인 모두에서 나타나는 이 하위 혈통들의 분포 패턴은 그들의 이동 경로에 대한 직접적인 증거를 제공한다. 즉, D1a1a-M15는 사천(四川) 서부를 거쳐 북쪽으로 감숙(甘肅)-청해(靑海) 지역으로 이주했으며, 아마도 티베트-이족(彝族) 회랑을 따라 히말라야 지역으로도 들어갔을 것이다. 반면, D1a1b-P99, 특히 그 하위 그룹인 D1a1b1-P47은 티베트 고원에서 기원했다. 이들 D1a 하위 혈통은 주로 티베트 인구에서 발견되는데, 이는 수정된 Y 염색체 계통수와 O2 하위 혈통 및 루브락(Lubrak) 관련 티베트 고원 조상 혈통과의 상관관계를 통해 볼 때, 중국 북부 기장 농부들의 유전적 기여가 있었음을 뒷받침한다(그림 4d; 보충 그림 S14c). 유전자 흐름 사건과 루브락(Lubrak) 관련 D 하위 혈통의 존재는 유전적 다양성 패턴에 큰 영향을 미쳤다. 특히, 4개 혈통(O2a2b1, O2a2b1a, O2a2b1a1, O2a2b1a1a)의 빈도는 루브락 관련 조상 혈통과 강한 상관관계를 보이며, 이는 황하(黃河) 유역으로부터의 신석기 시대 팽창이 티베트 고원의 인구 형성에 기여했음을 확인시켜 준다(그림 5c). 상염색체 변이와 모계 혈통에서 나온 고대 DNA 증거는 신석기 시대 기장 농부들이 티베트 고원의 영구 정착에 상당한 영향을 미쳤음을 더욱 강조한다(Wang et al. 2023).

Archeological evidence indicates that millet-based agriculture independently emerged in the Yellow River Basin and West Liao River at approximately 6,000 BCE, fostering the development of the foxtail millet (Setaria italica)-prevalent Yangshao and broomcorn millet (Panicum miliaceum)-prevalent Xinglongwa cultures, respectively (Miller et al. 2016; Leipe et al. 2019). Leipe et al. noted that shifts in agricultural practices from approximately 6,000 to 2,000 BCE led to a quasi-exponential population growth in North China, aligning with the major dispersal of Sino-Tibetan-speaking populations from the Yellow River Basin during the fourth millennium BCE (Leipe et al. 2019). Ancient DNA analyses of millet farmers from the Yangshao and Longshan cultures suggested that the Sino-Tibetan people originated in North China (Ning et al. 2020b). The Haojiatai-related ancestry dominant in Chinese populations correlated strongly with the O/Q/C/N lineages (Figs. 4e and 5d). O-M175, which is prevalent in East and Southeast Asians, includes the significant O1-F265 and O2-M122 subclades, whose expansions are associated with the spread of millet and rice agriculture from domestication centers in the Yellow River Basin, West Liao River, and Yangtze River Basin (Fig. 5d to f). The influence of Ancient Northern East Asians (ANEA) on modern East Asian paternal genetic diversity requires a further comprehensive assessment. O-related sublineages, with O2 lineages diversifying approximately 29 kya (Fig. 1b), are broadly distributed in North China and the Tibetan Plateau (Figs. 2 and 3a). O2-M122, particularly subclade O2a-M324, is a major paternal lineage in East and Southeast Asians, showing a strong correlation in distribution patterns (Fig. 3d; supplementary figs. S17 and S18c, Supplementary Material online). O2a-M324 is found at high frequencies along China’s coast and surrounding areas (>52%), suggesting ancestral migration routes along the coast extending into SEA (supplementary fig. S17b, Supplementary Material online). An ancient individual from the MN West Liao River Hongshan culture identified as belonging to O2a-M324 supports this lineage’s association with early cultural developments in Northeast China. Systematic evidence further corroborated that O2a-M324 originated in Northeast China, particularly in Heilongjiang Province, where it remains highly prevalent (supplementary fig. S17b, Supplementary Material online). However, the high frequencies also observed in eastern coastal provinces like Shandong, Shanghai, Fujian, and Guangdong may reflect sampling biases and historical migrations, notably during the Chuangguandong movement. Additionally, the optimized hot spot analysis results suggest that the middle and lower reaches of the Yellow River Basin were early diffusion centers for O2a-M324 (supplementary fig. S17b, Supplementary Material online).

고고학적 증거에 따르면 기장 농업은 기원전 약 6,000년경 황하(黃河) 유역과 서요하(西遼河) 유역에서 각각 독립적으로 발생했으며, 이는 조(foxtail millet)가 우세했던 양소(仰韶) 문화와 기장(broomcorn millet)이 우세했던 흥륭와(興隆窪) 문화의 발전을 촉진했다(Miller et al. 2016; Leipe et al. 2019). 라이프(Leipe) 등은 기원전 약 6000년에서 2000년 사이에 농업 방식의 변화가 중국 북부에서 준기하급수적인 인구 증가로 이어졌다고 지적했으며, 이는 기원전 4천년기 동안 황하(黃河) 유역에서 중국-티베트어족 사용 인구가 대규모로 확산된 것과 시기적으로 일치한다(Leipe et al. 2019). 양소(仰韶) 및 용산(龍山) 문화의 기장 농부들에 대한 고대 DNA 분석은 중국-티베트어족이 중국 북부에서 기원했음을 시사했다(Ning et al. 2020b). 중국 인구에서 우세한 하고가(郝家台) 유적 관련 조상 혈통은 O/Q/C/N 혈통과 강한 상관관계를 보였다(그림 4e, 5d). 동아시아인과 동남아시아인에게 널리 퍼져 있는 O-M175는 주요 하위 그룹인 O1-F265와 O2-M122를 포함하며, 이들의 팽창은 황하(黃河) 유역, 서요하(西遼河), 양자강(揚子江) 유역의 작물 재배 중심지로부터 기장과 쌀 농업이 확산된 것과 관련이 있다(그림 5d-f). 고대 북부 동아시아인(ANEA)이 현대 동아시아인의 부계 유전 다양성에 미친 영향은 추가적인 종합 평가가 필요하다. O 계열 하위 혈통들, 특히 약 29,000년 전에 다양화된 O2 혈통은 중국 북부와 티베트 고원에 널리 분포한다(그림 1b, 2, 3a). O2-M122, 특히 그 하위 그룹인 O2a-M324는 동아시아인과 동남아시아인의 주요 부계 혈통으로, 분포 패턴에서 강한 상관관계를 보인다(그림 3d; 보충 그림 S17, S18c). O2a-M324는 중국 해안과 주변 지역에서 높은 빈도(>52%)로 발견되는데, 이는 조상들이 해안을 따라 동남아시아(SEA)로 뻗어 나가는 경로로 이주했음을 시사한다(보충 그림 S17b). 신석기 중기(MN) 서요하(西遼河) 홍산(紅山) 문화의 고대인 한 명이 O2a-M324에 속하는 것으로 확인되었는데, 이는 이 혈통이 중국 동북부의 초기 문화 발전과 관련이 있음을 뒷받침한다. 체계적인 증거들은 O2a-M324가 중국 동북부, 특히 여전히 매우 높은 빈도를 보이는 흑룡강성(黑龍江省)에서 기원했음을 더욱 뒷받침한다(보충 그림 S17b). 그러나 산동(山東), 상해(上海), 복건(福建), 광동(廣東)과 같은 동부 해안 지방에서도 높은 빈도가 관찰되는데, 이는 샘플링 편향이나 역사적 이주, 특히 ‘틈관동(闖關東)’ 시기의 이주를 반영하는 것일 수 있다. 추가적으로, 최적화된 핫스팟 분석 결과는 황하(黃河) 중·하류 유역이 O2a-M324의 초기 확산 중심지였음을 시사한다(보충 그림 S17b).

Distinct distribution patterns were observed for the O2a1-L127.1 and O2a2-JST021354/P201 sublineages (supplementary fig. S17, Supplementary Material online). O2a1 is most prevalent in Southeast China, with its frequency decreasing in adjacent regions (supplementary fig. S17c, Supplementary Material online). Most of the O2a1 subclades show similar distribution patterns (supplementary fig. S17c to d, Supplementary Material online). O2a2 has the highest frequency in the Tibetan Plateau, Southeast China, and SEA (supplementary fig. S17f, Supplementary Material online). The sublineage O2a2a-M188 is notably frequent in SEA, decreasing in frequency from south to north across East Asia (supplementary fig. S17f, Supplementary Material online); O2a2b-P164 is widespread in China, with the highest occurrence on the Tibetan Plateau (supplementary fig. S17h, Supplementary Material online). The majority of O2a1 individuals (~87%) are O2a1b-JST002611, which is widespread across Chinese populations, particularly among Han populations (supplementary fig. S17d and table S5, Supplementary Material online). However, O2a1b and its sublineages appear infrequently among Tibeto-Burman groups, suggesting a minimal impact on this population. The initial diffusion centers for O2a1b sublineages are identified in the middle and lower reaches of the Yellow River Basin (supplementary fig. S17d, Supplementary Material online). Two main sublineages, O2a1b1a1a1a-F11 and O2a1b1a2a-F238, are found with differing frequencies; O2a1b1a1a1a-F11 is more common, especially in diverse Han populations (supplementary table S5, Supplementary Material online). O2a1b1a1a1a expanded approximately 8.9 kya, and O2a1b1a2a diverged approximately 9.0 kya (Fig. 1b; supplementary fig. S1, Supplementary Material online); both dates are considerably older than earlier TMRCA estimates. This discrepancy highlights differences between the Y-STR and Y-SNP-based TMRCA estimations and the influence of the Y-chromosome sequence coverage. O2a1b1a1a, the upstream lineage of O2a1b1a1a1a, appears most frequently in Southeast China and Guizhou in Southwest China, and its initial diffusion center is likely to be the middle and lower portions of the Yellow River Basin (supplementary fig. S17e, Supplementary Material online). O2a1b1a1a1a-F11 was identified in a Banpo site sample (Zhang et al. 2018), linking its emergence to Yangshao millet farmers. Furthermore, historical individuals from Shigatse on the southern Tibetan Plateau and Mongolian Plateau also carried this sublineage (Fig. 2), indicating the significant influence of Neolithic millet farmers. Linguistic evidence points to the initial divergence of Sino-Tibetan languages during the Yangshao period, with their dispersal likely occurring in the upper Yellow River Basin (Zhang et al. 2019). The estimated expansion of O2a1b1a1a1a and the divergence of Sino-Tibetan languages, in addition to paleogenomic evidence, suggest significant genetic contributions from ANEA millet farmers to modern Sino-Tibetan groups in China. Notably, the diffusion center for Sino-Tibetan-related ancestors with O2a1b1a1a1a does not align with the dispersal center of Sino-Tibetan languages, highlighting potential discrepancies due to real differences, sampling bias, or limitations in computational biology algorithms. Thus, further extensive sampling of modern and ancient East Asians is recommended to refine these findings.

O2a1-L127.1과 O2a2-JST021354/P201 하위 혈통에서는 뚜렷이 다른 분포 패턴이 관찰되었다(보충 그림 S17). O2a1은 중국 남동부에서 가장 널리 퍼져 있으며, 인접 지역으로 갈수록 빈도가 감소한다(보충 그림 S17c). O2a1 하위 그룹 대부분은 유사한 분포 패턴을 보인다(보충 그림 S17c-d). O2a2는 티베트 고원, 중국 남동부, 동남아시아(SEA)에서 가장 높은 빈도를 보인다(보충 그림 S17f). 하위 혈통 O2a2a-M188은 동남아시아에서 특히 빈번하며, 동아시아를 가로질러 남쪽에서 북쪽으로 갈수록 빈도가 감소한다(보충 그림 S17f). 반면 O2a2b-P164는 중국에 널리 퍼져 있으며, 티베트 고원에서 가장 높은 빈도로 나타난다(보충 그림 S17h). O2a1 개인의 대다수(약 87%)는 O2a1b-JST002611에 속하며, 이 혈통은 중국 인구 전반, 특히 한족(漢族)에게 널리 퍼져 있다(보충 그림 S17d, 표 S5). 그러나 O2a1b와 그 하위 혈통들은 티베트-버마어족 그룹에서는 드물게 나타나, 이 인구 집단에는 최소한의 영향만 미쳤음을 시사한다. O2a1b 하위 혈통의 초기 확산 중심지는 황하(黃河) 중·하류 유역으로 확인된다(보충 그림 S17d). 두 주요 하위 혈통인 O2a1b1a1a1a-F11과 O2a1b1a2a-F238은 서로 다른 빈도로 발견된다. O2a1b1a1a1a-F11이 더 흔하며, 특히 다양한 한족(漢族) 집단에서 그렇다(보충 표 S5). O2a1b1a1a1a는 약 8,900년 전에 팽창했고, O2a1b1a2a는 약 9,000년 전에 분기했는데(그림 1b; 보충 그림 S1), 이는 이전의 공통 조상 시점(TMRCA) 추정치보다 훨씬 이른 시기이다. 이러한 불일치는 Y-STR 기반과 Y-SNP 기반의 공통 조상 시점(TMRCA) 추정치 간의 차이와 Y 염색체 서열 커버리지의 영향을 보여준다. O2a1b1a1a1a의 상위 혈통인 O2a1b1a1a는 중국 남동부와 남서부의 귀주성(貴州省)에서 가장 빈번하게 나타나며, 그 초기 확산 중심지는 황하(黃河) 중·하류 지역일 가능성이 높다(보충 그림 S17e). O2a1b1a1a1a-F11은 반파(半坡) 유적의 샘플에서 확인되어(Zhang et al. 2018), 그 출현이 양소(仰韶) 문화의 기장 농부들과 연결된다. 나아가, 티베트 고원 남부의 시가체(Shigatse)와 몽골 고원의 역사 시대 개인들 또한 이 하위 혈통을 지니고 있었는데(그림 2), 이는 신석기 시대 기장 농부들의 상당한 영향을 나타낸다. 언어학적 증거는 중국-티베트어족의 초기 분화가 양소(仰韶) 시대에 일어났으며, 그 확산은 황하(黃河) 상류 유역에서 발생했을 가능성이 높음을 가리킨다(Zhang et al. 2019). O2a1b1a1a1a의 팽창 추정 시기와 중국-티베트어족의 분화 시기, 그리고 고대 유전체 증거를 종합해 볼 때, 고대 북부 동아시아(ANEA) 기장 농부들이 현대 중국의 중국-티베트어족 그룹에 상당한 유전적 기여를 했음을 시사한다. 주목할 점은, O2a1b1a1a1a를 지닌 중국-티베트 관련 조상의 확산 중심지가 중국-티베트어족의 확산 중심지와 일치하지 않는다는 것이다. 이는 실제 차이, 샘플링 편향, 또는 계산생물학 알고리즘의 한계로 인한 잠재적 불일치를 보여준다. 따라서 이러한 연구 결과를 더욱 정교하게 다듬기 위해서는 현대 및 고대 동아시아인에 대한 추가적이고 광범위한 샘플링이 권장된다.

High frequencies of most O2a2a subclades are observed in South China and SEA (supplementary fig. S17f and g, Supplementary Material online). Among these sublineages, the O2a2a1a2-M7 sublineages constitute the largest proportion (~43.8%, supplementary table S5, Supplementary Material online) and are primarily found in the Hmong-Mien people and southern Han Chinese (supplementary table S5, Supplementary Material online). Only one IA Hanben individual from Taiwan Island was identified within O2a2a1a2a2 (Fig. 2). A recent rapid expansion of O2a2a1a2a1a1a2a1a1a1 around 2.9 kya was noted (Fig. 1b). Moreover, the O2a2b sublineages are widely distributed across China (supplementary fig. S17h and table S5, Supplementary Material online). O2a2b1-M134, a major subclade of O2a2b, appears predominantly among Sino-Tibetan speakers (~85%), with the highest occurrence in the Tibetan Plateau (supplementary fig. S17h and table S5, Supplementary Material online). Two star-like expansions have been linked to O2a2b1a1a1-F8 (~7.3 kya) and O2a2b1a2a1a-F46 (~9 kya) (Fig. 1b; supplementary fig. S1, Supplementary Material online). The upstream lineage of O2a2b1a1a1, O2a2b1a1a, is prevalent in Southwest/Southeast China and the Circum-Bohai-Sea region (supplementary fig. S17h, Supplementary Material online). The frequencies of O2a2b1a2 and the upstream lineage of O2a2b1a2a1a are greater in Northeast, North, and East China than in other areas (supplementary fig. S17h, Supplementary Material online). The optimized hot spot analysis suggests that the early diffusion center for O2a2b1a2 is likely the Circum-Bohai-Sea region (supplementary fig. S17h, Supplementary Material online). Several LN to IA ANEA millet farmers, HE Mongolian Plateau ancients, IA to HE Xinjiang individuals, and multiple IA to HE Tibetan Plateau individuals, particularly those in the southern Tibetan Plateau, are assigned to the sublineages of O2a2b1a1. Additionally, some ancient individuals from the Yellow River Basin and Northeast/Southeast Tibetan Plateau are linked to O2a2b1a2a1a or its sublineages (Fig. 2). Star-like expansions noted in O2a1b1a1a1a-F11 (~8.9 kya), O2a2b1a1a1-F8 (~7.3 kya), and O2a2b1a2a1a-F46 (~9 kya) represent approximately 27% of the newly reported paternal lineages and 31% of the paternal lineages in China (supplementary fig. S1 and tables S4 and S5, Supplementary Material online), highlighting significant contributions from the Neolithic expansions of ANEA millet farmers to modern Chinese gene pools. Consequently, the development of millet agriculture, migration of ancient millet farmers, and admixture with diverse indigenous populations have shaped the present distribution of the O2a-M324 sublineages, particularly O2a1b-JST002611 and O2a2b1-M134.

대부분의 O2a2a 하위 그룹은 중국 남부와 동남아시아(SEA)에서 높은 빈도로 관찰된다(보충 그림 S17f, g). 이 하위 혈통들 중 O2a2a1a2-M7 계열이 가장 큰 비율(약 43.8%)을 차지하며, 주로 몽-미엔족과 남부 한족(漢族)에게서 발견된다(보충 표 S5). 대만섬(臺灣島)의 철기 시대(IA) 한본(Hanben) 유적 개인 단 한 명만이 O2a2a1a2a2 내에서 확인되었으며(그림 2), 약 2,900년 전 O2a2a1a2a1a1a2a1a1a1 계통의 최근 급격한 팽창이 관찰되었다(그림 1b). 또한, O2a2b 하위 혈통은 중국 전역에 널리 분포한다(보충 그림 S17h, 표 S5). O2a2b의 주요 하위 그룹인 O2a2b1-M134는 주로 중국-티베트어족 사용자들 사이에서 나타나며(약 85%), 티베트 고원에서 가장 높은 빈도를 보인다(보충 그림 S17h, 표 S5). O2a2b1a1a1-F8(약 7,300년 전)과 O2a2b1a2a1a-F46(약 9,000년 전)에서는 별 모양의 급격한 인구 팽창이 확인되었다(그림 1b; 보충 그림 S1). O2a2b1a1a1의 상위 혈통인 O2a2b1a1a는 중국 남서부/남동부와 환발해(環渤海) 지역에 널리 퍼져 있다(보충 그림 S17h). O2a2b1a2와 O2a2b1a2a1a의 상위 혈통의 빈도는 다른 지역보다 중국 동북부, 북부, 동부에서 더 높으며(보충 그림 S17h) , 최적화된 핫스팟 분석은 O2a2b1a2의 초기 확산 중심지가 환발해(環渤海) 지역일 가능성이 높음을 시사한다(보충 그림 S17h). 신석기 후기(LN)부터 철기 시대(IA)까지의 고대 북부 동아시아(ANEA) 기장 농부, 역사 시대(HE) 몽골 고원 고대인, 철기 시대부터 역사 시대까지의 신강(新疆) 개인, 그리고 여러 철기 시대부터 역사 시대까지의 티베트 고원 개인들, 특히 티베트 남부 고원의 개인들이 O2a2b1a1의 하위 혈통으로 분류된다. 추가적으로, 황하(黃河) 유역과 티베트 고원 동북/동남부의 일부 고대인들은 O2a2b1a2a1a 또는 그 하위 혈통과 연결된다(그림 2). O2a1b1a1a1a-F11(약 8,900년 전), O2a2b1a1a1-F8(약 7,300년 전), O2a2b1a2a1a-F46(약 9,000년 전)에서 관찰된 별 모양의 팽창은 새로 보고된 부계 혈통의 약 27%, 전체 중국 부계 혈통의 31%를 차지한다(보충 그림 S1, 표 S4, S5). 이는 신석기 시대 고대 북부 동아시아(ANEA) 기장 농부들의 팽창이 현대 중국인의 유전자 풀에 상당한 기여를 했음을 보여준다. 결론적으로, 기장 농업의 발전, 고대 기장 농부의 이주, 그리고 다양한 토착 인구와의 혼합이 현재의 O2a-M324 하위 혈통, 특히 O2a1b-JST002611과 O2a2b1-M134의 분포를 형성했다.

ASEA Rice Farmer-Related Founding Lineages from Yangtze River Basin Left a Massive Genetic Legacy in China and SEA

양자강 유역에서 유래한 고대 남부 동아시아 쌀 농부 관련 창시 혈통이 중국과 동남아시아에 막대한 유전적 유산을 남김

Southern East Asia, an origin center for rice domestication, is considered the ancestral homeland of the Hmong-Mien, Tai-Kadai, Austroasiatic, and Austronesian people. ADMIXTURE models suggested that Hmong/Hanben-related ancestral components prevalent in southern Chinese populations are associated with most O1 subclades (Figs. 4f and g and 5e). Recent studies have shown that ancient Yangtze River Basin rice farmers influenced the genetic makeup of ancient Yellow River Basin millet farmers and populations in SEA and Oceania (Yang et al. 2020; Wang et al. 2021a). The exploration of the phylogeographic features of the O1 sublineages revealed a high prevalence of O1-F265 across Southeast, South, and Southwest China, SEA, and the Japanese archipelago. The subclade O1a-M119 is common in Southeast China, while O1b-M268 predominates in Southwest China and SEA (supplementary fig. S18 and table S5, Supplementary Material online). The O1a sublineages are primarily found among Austronesian-, Tai-Kadai-, and Sinitic-speaking populations in Southeast, South, and Southwest China (supplementary table S5, Supplementary Material online), suggesting a shared patrilineal origin among these groups and a significant gene flow with the Han Chinese (Chen et al. 2022; Wang et al. 2022; Liu et al. 2023). A Neolithic expansion linked to subclade O1a1a1 (~7.6 kya, supplementary fig. S1, Supplementary Material online) is identified, with O1a1a1a being more prevalent than O1a1a1b (supplementary fig. S18b and table S5, Supplementary Material online). O1a1a1a and its sublineages, which are found predominantly in Southeast China (~51%), diversified approximately 5.7 kya, with early dispersal centers likely in the middle and lower portions of the Yangtze River Basin and the southeast coast (supplementary fig. S18b, Supplementary Material online). O1a1a1b, which has the highest frequency in Hainan among the Li people, decreases from south to north, with initial dispersal centers in South/Southwest China. This lineage, which is possibly ancestral to the Baiyue, significantly contributed to other Chinese populations (supplementary fig. S18b and table S5, Supplementary Material online). Another Neolithic expansion associated with O1a1a2 (~8.4 kya, supplementary fig. S1, Supplementary Material online) shows high frequencies along the southeastern coast, South/Southwest China, and Vietnam, with likely initial dispersal centers in South/Southwest China (supplementary fig. S18b, Supplementary Material online). The primary sublineage O1a1a2a diverged approximately 6.4 kya (supplementary fig. S1, Supplementary Material online). The geographical distribution patterns and divergence times of O1a1a1b- and O1a1a2a-related lineages align with inferred migration routes from coastal to inland Southwest China and from Southwest China to mainland SEA according to the phylogenetic reconstructions of the Tai-Kadai languages (Tao et al. 2023). Several Taiwanese Hanben individuals are found within O1a1a1a1 sublineages (Fig. 2), and evidence from the Liangzhu culture in the Yangtze River Delta indicates that rice farmers carrying O1a-M119 in the Yangtze River Basin were likely the direct ancestors of the modern Tai-Kadai and Austronesian people, profoundly influencing southern Han Chinese. This migration proceeded southward along China’s southeastern coast or inland routes to Southeast/Southwest China and mainland SEA.

쌀 재배의 기원지 중 하나인 남부 동아시아는 몽-미엔족, 타이-카다이어족, 오스트로아시아어족, 오스트로네시아어족의 조상 고향으로 여겨진다. ADMIXTURE 모델 분석 결과, 중국 남부 인구에 널리 퍼져 있는 몽족/한본(Hanben) 관련 조상 요소가 대부분의 O1 하위 그룹과 연관되어 있음이 시사되었다(그림 4f, g, 5e). 최근 연구들은 고대 양자강(揚子江) 유역의 쌀 농부들이 고대 황하(黃河) 유역의 기장 농부들뿐만 아니라 동남아시아(SEA)와 오세아니아 인구의 유전 구성에도 영향을 미쳤음을 보여주었다(Yang et al. 2020; Wang et al. 2021a). O1 하위 혈통의 계통지리학적 특징을 탐구한 결과, O1-F265는 중국 남동부, 남부, 남서부와 동남아시아(SEA), 그리고 일본 열도에 걸쳐 높은 보급률을 보였다. 하위 그룹 O1a-M119는 중국 남동부에서 흔하고, O1b-M268은 중국 남서부와 동남아시아에서 우세하다(보충 그림 S18, 표 S5). O1a 하위 혈통은 주로 중국 남동부, 남부, 남서부의 오스트로네시아어족, 타이-카다이어족, 중국어파 사용 인구에서 발견되며(보충 표 S5), 이는 이들 그룹이 공통된 부계 기원을 공유하며 한족(漢族)과 상당한 유전자 흐름이 있었음을 시사한다(Chen et al. 2022; Wang et al. 2022; Liu et al. 2023). 하위 그룹 O1a1a1과 관련된 신석기 시대 팽창(약 7,600년 전)이 확인되었으며, O1a1a1a가 O1a1a1b보다 더 널리 퍼져 있다(보충 그림 S1, S18b, 표 S5). 주로 중국 남동부에서 발견되는(약 51%) O1a1a1a와 그 하위 혈통들은 약 5,700년 전에 다양화되었으며, 초기 확산 중심지는 양자강(揚子江) 중·하류와 남동부 해안이었을 가능성이 높다(보충 그림 S18b). 해남(海南)의 리족(黎族) 사이에서 가장 높은 빈도를 보이는 O1a1a1b는 남쪽에서 북쪽으로 갈수록 빈도가 감소하며, 초기 확산 중심지는 중국 남부/남서부이다. 아마도 백월(百越)의 조상 혈통일 이 계통은 다른 중국 인구 집단에도 상당한 기여를 했다(보충 그림 S18b, 표 S5). O1a1a2와 관련된 또 다른 신석기 시대 팽창(약 8,400년 전)은 남동부 해안, 중국 남부/남서부, 베트남을 따라 높은 빈도를 보이며, 초기 확산 중심지는 중국 남부/남서부였을 가능성이 있다(보충 그림 S1, S18b). 주요 하위 혈통인 O1a1a2a는 약 6,400년 전에 분기했다(보충 그림 S1). O1a1a1b 및 O1a1a2a 관련 혈통의 지리적 분포 패턴과 분기 시점은 타이-카다이어족의 계통 재구성에 따른 추정 이주 경로, 즉 해안에서 내륙인 중국 남서부로, 그리고 중국 남서부에서 동남아시아 본토로 향하는 경로와 일치한다(Tao et al. 2023). 몇몇 대만(臺灣)의 한본(Hanben) 유적 개인들은 O1a1a1a1 하위 혈통 내에서 발견되며(그림 2), 양자강(揚子江) 삼각주 지역의 양저(良渚) 문화 증거는 양자강 유역에서 O1a-M119를 지녔던 쌀 농부들이 현대 타이-카다이어족과 오스트로네시아어족의 직계 조상이었을 가능성이 높으며, 남부 한족(漢族)에게도 깊은 영향을 미쳤음을 나타낸다. 이 이주는 중국의 남동부 해안을 따라 남쪽으로 진행되거나 내륙 경로를 통해 남동/남서 중국과 동남아시아 본토로 이어졌다.

Haplogroup O1b-M268, predominantly found in Southwest/South China, SEA, and the Japanese archipelago, is divided into three major subclades, namely, O1b1a1-PK4, O1b1a2-Page59, and O1b2-P49, each displaying distinct distribution patterns (supplementary fig. S18d to f, Supplementary Material online). O1b1a1 and its sublineages, which are mainly located in Southwest/South China and SEA, constitute key paternal lineages among the Tai-Kadai-speaking populations (supplementary table S5, Supplementary Material online). However, O1b1a1a-M95, primarily found in Austroasiatic groups, suggests an ancient gene flow between the proto-Austroasiatic and proto-Tai-Kadai populations, highlighting the impact of limited Austroasiatic sample sizes in our data set (Zhang et al. 2014; Kutanan et al. 2019; Macholdt et al. 2020). Ancient DNA analysis revealed that individuals from ~3,000 years ago at the Wucheng site in Jiangsu along the Yangtze River Basin and from the Hengbei site in Shanxi, as well as several ~1,500-year-old Guangxi individuals from southern East Asia (Li et al. 2007; Zhao et al. 2014), carried O1b1a1a-M95 or related sublineages (Fig. 2). Additionally, recent expansion events associated with O1b1a1a1a1b1a1a1 (~3 kya) and O1b1a1a1a1b2a1a1a (~2.5 kya) have been identified. O1b1a2 and its sublineages, which are relatively rare in East Asia, are primarily found in East China, the southeastern part of Northeast China, and Vietnam, especially among Han Chinese individuals (supplementary fig. S18e and table S5, Supplementary Material online). An MN individual from the Wanggou site, which is part of the Yangshao culture, was identified as belonging to O1b1a2-Page59 (Fig. 2). O1b2-P49 is most frequent in Japan, followed by Northeast China, but its detailed phylogenetic structure has yet to be fully elucidated (supplementary fig. S18f, Supplementary Material online). The genetic diversity patterns of the O1 lineages indicate a significant influence of ancient rice farmers on the gene pools of populations in South China and SEA. The complex movements and admixture events associated with these ancient agriculturists have profoundly shaped the genetic landscape of modern and ancient East Asians. To clarify the origins of crucial Chinese-dominant subclades and the demographic processes influencing modern Chinese populations, we should design a systematic sampling strategy. This approach should include comprehensive Y-chromosome sequences and the collection of spatiotemporally distinct ancient samples for more detailed analyses.

주로 중국 남서부/남부, 동남아시아(SEA), 일본 열도에서 발견되는 하플로그룹 O1b-M268은 세 개의 주요 하위 그룹, 즉 O1b1a1-PK4, O1b1a2-Page59, O1b2-P49로 나뉘며, 각각 뚜렷한 분포 패턴을 보인다(보충 그림 S18d-f). 주로 중국 남서부/남부와 동남아시아(SEA)에 위치한 O1b1a1과 그 하위 혈통들은 타이-카다이어족 사용 인구 사이에서 핵심적인 부계 혈통을 구성한다(보충 표 S5). 그러나 주로 오스트로아시아어족 그룹에서 발견되는 O1b1a1a-M95는 원시 오스트로아시아어족과 원시 타이-카다이어족 인구 사이에 고대 유전자 흐름이 있었음을 시사하며, 이는 우리 데이터 세트의 오스트로아시아어족 샘플 크기가 제한적이라는 점을 감안할 필요가 있다(Zhang et al. 2014; Kutanan et al. 2019; Macholdt et al. 2020). 고대 DNA 분석 결과, 약 3,000년 전 양자강(揚子江) 유역 강소성(江蘇省)의 오성(吳城) 유적과 산서성(山西省)의 횡북(橫北) 유적의 개인들, 그리고 남부 동아시아의 약 1,500년 전 광서(廣西) 개인들이 O1b1a1a-M95 또는 관련 하위 혈통을 지녔음이 밝혀졌다(Li et al. 2007; Zhao et al. 2014; 그림 2). 추가적으로, O1b1a1a1a1b1a1a1(약 3,000년 전) 및 O1b1a1a1a1b2a1a1a(약 2,500년 전)와 관련된 최근의 팽창 사건들이 확인되었다. 동아시아에서 비교적 드문 O1b1a2와 그 하위 혈통들은 주로 중국 동부, 중국 동북부의 남동쪽 지역, 그리고 베트남에서 발견되며, 특히 한족(漢族) 개인들 사이에서 나타난다(보충 그림 S18e, 표 S5). 양소(仰韶) 문화의 일부인 왕구(王溝) 유적의 신석기 중기(MN) 개인 한 명이 O1b1a2-Page59에 속하는 것으로 확인되었다(그림 2). O1b2-P49는 일본에서 가장 빈번하고 그 다음으로 중국 동북부에서 많이 나타나지만, 그 상세한 계통 구조는 아직 완전히 밝혀지지 않았다(보충 그림 S18f). O1 혈통의 유전적 다양성 패턴은 고대 쌀 농부들이 중국 남부와 동남아시아(SEA) 인구의 유전자 풀에 상당한 영향을 미쳤음을 나타낸다. 이들 고대 농업인들과 관련된 복잡한 이동 및 혼합 사건은 현대 및 고대 동아시아인의 유전적 지형을 깊이 있게 형성했다. 중국에서 우세한 주요 하위 그룹들의 기원과 현대 중국 인구에 영향을 미친 인구학적 과정을 명확히 하기 위해, 우리는 체계적인 샘플링 전략을 설계해야 한다. 이 접근법은 포괄적인 Y 염색체 서열 분석과 함께, 더 상세한 분석을 위해 시공간적으로 구별되는 고대 샘플을 수집하는 것을 포함해야 한다.

결론 (Conclusion)

Genetic evidence from autosomal DNA studies has profoundly transformed our understanding of the genetic histories of diverse human populations. However, research into the ancient genetic legacy reflected in modern Chinese populations via Y-chromosome analysis remains sparse. To address this gap, we used the YHC to analyze the Y-chromosome diversity in ethnolinguistically diverse Chinese populations through whole Y-chromosome sequencing and our newly developed high-resolution YHSeqY3000 panel. This project reconstructs demographic events, such as isolation, expansion, and admixture, using various computational models. The new data were integrated with a Y-chromosome genomic database of 14,644 individuals, creating a comprehensive database that includes 1,786 ancient Eurasians and 115 modern Chinese populations from 47 ethnic groups. This integration facilitates an in-depth exploration of the paternal genetic diversity of Chinese populations. Our findings indicate that multiple founding lineages associated with millet/rice farmers from the Yellow River Basin and the Yangtze River Basin, Siberian hunter-gatherers, and ancient western Eurasian pastoralists and farmers significantly influence the geographical patterns of paternal genetic stratification in Chinese populations. There is a strong correlation between the frequency of subsistence model-related founding lineages and the proportion of autosomal-based admixture from presumed ancestral sources, as well as between the latitude and a differentiated north-to-south genetic matrix. These correlations suggest that ancient migrations and extensive admixtures with indigenous populations primarily shaped the paternal genetic landscape of Chinese populations. To further elucidate the paternal evolutionary history of East Asians, we emphasize the importance of combining high-depth whole-genome sequencing data from both modern and spatiotemporally diverse ancient populations. 
This comprehensive approach will enhance our understanding of the dynamic interplay between migration, admixture, and cultural development in this region.

상염색체 DNA 연구에서 나온 유전적 증거는 다양한 인류 집단의 유전적 역사에 대한 우리의 이해를 근본적으로 바꾸어 놓았다. 그러나 Y 염색체 분석을 통해 현대 중국 인구에 반영된 고대 유전적 유산을 탐구하는 연구는 여전히 부족하다. 이러한 격차를 해소하기 위해, 우리는 염황 코호트(YHC)를 이용하여 전체 Y 염색체 시퀀싱과 새로 개발한 고해상도 YHSeqY3000 패널을 통해 민족언어학적으로 다양한 중국 인구의 Y 염색체 다양성을 분석했다. 이 프로젝트는 다양한 계산 모델을 사용하여 고립, 팽창, 혼합과 같은 인구학적 사건들을 재구성한다. 새로운 데이터는 14,644명의 개인으로 구성된 Y 염색체 게놈 데이터베이스와 통합되었다. 그 결과 1,786명의 고대 유라시아인과 47개 민족 그룹에 속하는 115개의 현대 중국 인구 집단을 포함하는 포괄적인 데이터베이스가 만들어졌다. 이러한 통합은 중국 인구의 부계 유전적 다양성을 심도 있게 탐구하는 것을 가능하게 한다. 우리의 연구 결과는 황하(黃河) 유역과 양자강(揚子江) 유역의 기장/쌀 농부, 시베리아 수렵-채집인, 그리고 고대 서부 유라시아의 목축민 및 농부들과 관련된 여러 창시 혈통이 중국 인구의 부계 유전적 계층화의 지리적 패턴에 중대한 영향을 미친다는 것을 보여준다. 생계 방식과 관련된 창시 혈통의 빈도는 추정된 조상 집단으로부터 온 상염색체 기반 혼합 비율과 강한 상관관계를 보이며, 위도 및 남북으로 분화된 유전적 매트릭스와도 강한 상관관계를 보인다. 이러한 상관관계는 고대의 이주와 토착 인구와의 광범위한 혼합이 주로 중국 인구의 부계 유전적 지형을 형성했음을 시사한다. 동아시아인의 부계 진화 역사를 더욱 명확히 밝히기 위해, 우리는 현대인과 시공간적으로 다양한 고대 인구 양쪽으로부터 얻은 고심도 전체 게놈 시퀀싱 데이터를 결합하는 것의 중요성을 강조한다. 이러한 포괄적인 접근법은 이 지역의 이주, 혼합, 문화 발전 사이의 역동적인 상호작용에 대한 우리의 이해를 향상시킬 것이다.
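The database totals quoted in the Conclusion can be cross-checked against the component counts given under Data Set Composition in the Methods. The short sketch below does only that arithmetic; the variable names are illustrative, and the grouping of the counts into "published" versus "newly sampled" is our reading of the text rather than something the authors state as a formula.

```python
# Component counts as quoted in the Data Set Composition paragraph.
published_modern_east_asians = 11_979
published_modern_sea = 879
ancient_east_asians = 252
ancient_western_eurasians = 1_534
newly_sampled = 919  # participants recruited for this study

# Modern individuals broken down by language group, as listed in the text.
modern_by_language = {
    "Austroasiatic": 135, "Austronesian": 693, "Hmong-Mien": 285,
    "Japonic": 75, "Koreanic": 35, "Mongolic": 994, "Tai-Kadai": 863,
    "Tibeto-Burman": 1_338, "Tungusic": 260, "Indo-European": 1,
    "Turkic": 291, "Sinitic-speaking Hui": 805,
    "northern Han": 3_248, "southern Han": 4_754,
}

ancient_total = ancient_east_asians + ancient_western_eurasians
published_total = (published_modern_east_asians + published_modern_sea
                   + ancient_total)
modern_total = (published_modern_east_asians + published_modern_sea
                + newly_sampled)

print("ancient individuals:", ancient_total)
print("previously published database:", published_total)
print("modern individuals:", modern_total)
print("modern individuals by language:", sum(modern_by_language.values()))
```

Under this reading, the 14,644-individual database is the previously published material (modern plus ancient), and the 13,777 modern individuals are the published modern samples plus the 919 new participants.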

연구 재료 및 방법 (Materials and Methods)

Sampling, Sequencing, Genotyping, and Phylogenetic Construction

샘플링, 시퀀싱, 유전자형 분석 및 계통수 구축

Study Participants

연구 참여자

To comprehensively characterize the paternal diversity across China, saliva samples were collected from 919 participants representing 39 ethnolinguistic groups (supplementary table S1, Supplementary Material online). The participants were all descendants of self-identified ethnic group members, with their grandparents having resided in their respective sampling districts for at least three generations. The study received approval from the Medical Ethics Committee of West China Hospital of Sichuan University (2023-306) and was conducted following the Helsinki Declaration of 2013 (World Medical Association 2013). Informed consent was obtained from each participant before sample collection.

중국 전역의 부계 다양성을 종합적으로 규명하기 위해, 39개 민족언어학적 그룹을 대표하는 919명의 참여자로부터 타액 샘플을 수집했다(보충 표 S1). 모든 참여자는 스스로 밝힌 민족 그룹 구성원의 후손이며, 조부모까지 최소 3세대 이상 각 샘플링 지역에 거주한 이들이다. 본 연구는 사천대학(四川大學) 화서병원(華西醫院) 의학윤리위원회의 승인을 받았으며(2023-306), 2013년 헬싱키 선언(World Medical Association 2013)에 따라 수행되었다. 샘플 수집 전 각 참여자로부터 사전 동의를 받았다.

DNA Extraction, Whole-Genome Sequencing, and Genotyping

DNA 추출, 전체 게놈 시퀀싱 및 유전자형 분석

Genomic DNA was extracted using the QIAamp DNA Mini Kit (QIAGEN, Germany). DNA concentrations were quantified with the Qubit dsDNA HS Assay Kit, following the standard protocol on a Qubit 3.0 fluorometer (Thermo Fisher Scientific). Sequencing was conducted on the Illumina platform (Illumina, San Diego, CA, USA), achieving 80X genome-wide coverage. The raw sequencing reads were mapped to the human reference genome GRCh37 using BWA v0.7.13 (Li and Durbin 2009). Duplicate reads were removed with Picard v3.0.0, followed by a base quality score recalibration via GATK v4.2.6.1. Joint variant calling was executed using GATK HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs modules (McKenna et al. 2010). High-quality variant calls within a 10 Mb region were obtained through a sequence mask (Poznik et al. 2013). Variants exhibiting missing call rates greater than 5%, base quality below 20, and heterogeneity rate above 15% were filtered out using BCFtools v1.8 (Li 2011). Samples with missing call rates exceeding 5% were removed via vcftools v0.1.16 (Danecek et al. 2011). Ultimately, 914 samples meeting quality standards were selected for the downstream analysis, including the reconstruction of a time-scaled phylogenetic tree. Additionally, Y-specific target sequences with 100x coverage were generated using the custom-designed YHSeqY3000 panel on the MGI sequencing platform to validate the sequencing performance.

게놈 DNA는 QIAamp DNA Mini Kit(QIAGEN, 독일)를 사용하여 추출했다. DNA 농도는 Qubit 3.0 형광계(Thermo Fisher Scientific)에서 표준 프로토콜에 따라 Qubit dsDNA HS Assay Kit로 정량화했다. 시퀀싱은 Illumina 플랫폼(Illumina, 샌디에이고, CA, 미국)에서 수행하여 게놈 전체 80배수(80X)의 커버리지를 달성했다. 원본 시퀀싱 데이터는 BWA v0.7.13을 사용하여 인간 표준 유전체 GRCh37에 매핑했다. Picard v3.0.0으로 중복 리드를 제거한 후, GATK v4.2.6.1을 통해 염기 품질 점수를 재보정했다. GATK HaplotypeCaller, CombineGVCFs, GenotypeGVCFs 모듈을 사용하여 공동 변이 호출을 실행했다. 시퀀스 마스크를 통해 10Mb 영역 내에서 고품질 변이 호출 데이터를 얻었다. 누락 호출률이 5%를 초과하거나, 염기 품질이 20 미만이거나, 이질성 비율이 15%를 초과하는 변이는 BCFtools v1.8을 사용하여 걸러냈다. 누락 호출률이 5%를 초과하는 샘플은 vcftools v0.1.16을 통해 제거했다. 최종적으로 품질 기준을 충족하는 914개 샘플이 시간 척도를 적용한 계통수 재구성을 포함한 후속 분석을 위해 선택되었다. 추가적으로, 시퀀싱 성능을 검증하기 위해 MGI 시퀀싱 플랫폼에서 맞춤 설계된 YHSeqY3000 패널을 사용하여 100배수(100x) 커버리지의 Y-특이적 표적 서열을 생성했다.
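The variant- and sample-level thresholds described above can be restated as a small filter. This is a sketch of the logic only, not the authors' BCFtools/vcftools invocation. The `VariantStats` record and its field names are hypothetical, and we assume a variant is removed when it violates any one of the three thresholds, which the text does not spell out.

```python
from dataclasses import dataclass

# Thresholds quoted in the text above.
MAX_MISSING_RATE = 0.05   # missing call rate greater than 5% fails
MIN_BASE_QUALITY = 20     # base quality below 20 fails
MAX_HET_RATE = 0.15       # heterogeneity rate above 15% fails


@dataclass
class VariantStats:
    """Hypothetical per-variant summary; field names are illustrative."""
    site: str
    missing_rate: float
    base_quality: float
    het_rate: float


def passes_variant_qc(v: VariantStats) -> bool:
    """Keep a variant only if it is within all three thresholds (assumption)."""
    return (
        v.missing_rate <= MAX_MISSING_RATE
        and v.base_quality >= MIN_BASE_QUALITY
        and v.het_rate <= MAX_HET_RATE
    )


def passes_sample_qc(sample_missing_rate: float) -> bool:
    """Samples with a missing call rate exceeding 5% are removed."""
    return sample_missing_rate <= MAX_MISSING_RATE


# Made-up records, one per failure mode, purely for illustration.
variants = [
    VariantStats("site_A", missing_rate=0.01, base_quality=35.0, het_rate=0.00),
    VariantStats("site_B", missing_rate=0.08, base_quality=35.0, het_rate=0.00),
    VariantStats("site_C", missing_rate=0.01, base_quality=12.0, het_rate=0.00),
    VariantStats("site_D", missing_rate=0.01, base_quality=35.0, het_rate=0.30),
]
kept_sites = [v.site for v in variants if passes_variant_qc(v)]
print(kept_sites)
```

Applying the sample-level filter to the 919 participants is what leaves the 914 samples used downstream.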

Haplogroup Classification and Phylogenetic Relationship Construction

하플로그룹 분류 및 계통 관계 구축

The initial classification of the Y-chromosome haplogroups was performed using in-house scripts based on a newly reconstructed phylogenetic tree supplemented by classifications from HaploGrouper (Jagadeesan et al. 2021) and Y-Lineage Tracker (Chen et al. 2021), referencing the Y-DNA Haplogroup Tree 2019-2020 (https://isogg.org/tree/index.html). BEAST v1.10.4 (Suchard et al. 2018) facilitated the construction of a phylogenetic tree and the estimation of the TMRCA for various nodes using approximately 10 Mb of Y-chromosome sequences. B-related haplotypes served as an outgroup (Mallick et al. 2016). The optimal substitution model was selected via jModelTest v2.1.10 (Darriba et al. 2012). Markov chain Monte Carlo sampling was executed over 100 million iterations, with samples logged every 1,000 iterations and the initial 10 million iterations discarded as a burn-in. An exponential growth coalescent tree prior was used alongside the GTR (general time reversible) substitution model and a strict molecular clock. The substitution rate was set at 7.6×10⁻¹⁰ mutations per base pair per year (95% confidence interval: 6.7×10⁻¹⁰ to 8.6×10⁻¹⁰), as estimated by Fu et al. (2014). Three independent runs were amalgamated using LogCombiner, with the quality of the combined output manually verified using Tracer v1.7.1 (Rambaut et al. 2018). The maximum clade credibility tree was then generated with TreeAnnotator v1.10 and visualized using FigTree. To further investigate the ancient influences on the paternal landscape of the recently genotyped Chinese ethnic minorities, an ML phylogenetic tree was constructed using RAxML (Stamatakis 2014) from the 914 Y-chromosome sequences of ~10 Mb each. Ancient genomes were integrated into this modern ML phylogeny using pathPhynder (Martiniano et al. 2022), and the tree was refined with iTOL (Letunic and Bork 2021). For the complete data set of Y-chromosome target sequences from 919 samples, a network-based analysis of shared haplotypes was conducted using PopART (Leigh and Bryant 2015), providing a comprehensive view of haplogroup relationships.

Y 염색체 하플로그룹의 초기 분류는 새로 재구성된 계통수를 기반으로 한 자체 스크립트를 사용하여 수행했다. 이때 HaploGrouper(Jagadeesan et al. 2021)와 Y-Lineage Tracker(Chen et al. 2021)의 분류 결과를 보조적으로 활용했으며, Y-DNA 하플로그룹 트리 2019-2020을 참조했다. BEAST v1.10.4를 이용하여 약 10Mb의 Y 염색체 서열로부터 계통수를 구축하고 다양한 분기점의 공통 조상 시점(TMRCA)을 추정했다. B-관련 하플로타입을 외부 그룹(outgroup)으로 사용했다. 최적의 치환 모델은 jModelTest v2.1.10을 통해 선택했다. 마르코프 연쇄 몬테카를로 샘플링은 1억 회 반복 수행했으며, 1,000회마다 샘플을 기록하고 초기 1,000만 회는 번인(burn-in)으로 폐기했다. GTR(일반 시간 가역) 치환 모델과 엄격한 분자 시계와 함께 지수 성장 통합 계통 트리 사전 확률(prior)을 사용했다. 치환율은 Fu 등(2014)이 추정한 연간 염기쌍당 7.6×10⁻¹⁰ 돌연변이(95% 신뢰 구간: 6.7×10⁻¹⁰ ~ 8.6×10⁻¹⁰)로 설정했다. 세 번의 독립적인 실행 결과를 LogCombiner를 사용하여 통합했고, 통합된 결과의 품질은 Tracer v1.7.1을 사용하여 수동으로 검증했다. 그 후 TreeAnnotator v1.10으로 최대 분기군 신뢰도 트리(maximum clade credibility tree)를 생성하고 FigTree를 사용하여 시각화했다. 최근 유전자형이 분석된 중국 소수 민족의 부계 유전 지형에 미친 고대의 영향을 더 조사하기 위해, 914개의 약 10Mb Y 염색체 서열을 사용하여 RAxML로 최대 가능도(ML) 계통수를 구축했다. 고대 게놈은 pathPhynder를 사용하여 이 현대 ML 계통수에 통합했으며, iTOL을 사용하여 트리를 다듬었다. 919개 샘플의 Y 염색체 표적 서열 전체 데이터 세트에 대해서는 PopART를 사용하여 공유 하플로타입의 네트워크 기반 분석을 수행하여 하플로그룹 관계에 대한 포괄적인 시각을 제공했다.
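The sampling settings and the clock rate quoted above imply a few quantities that are easy to check by hand. The Python sketch below works through that arithmetic. It assumes burn-in is removed from each run before the runs are combined, treats the callable region as exactly 10 Mb (the text says "approximately"), and ignores the initial state that BEAST also logs; the function names are illustrative, not part of any tool.

```python
# Values quoted in the text.
CHAIN_LENGTH = 100_000_000    # MCMC iterations per run
LOG_EVERY = 1_000             # one sample logged every 1,000 iterations
BURN_IN = 10_000_000          # iterations discarded as burn-in
N_RUNS = 3                    # independent runs combined with LogCombiner
RATE_PER_BP_PER_YEAR = 7.6e-10
CALLABLE_BP = 10_000_000      # assumption: exactly 10 Mb


def retained_samples(chain_length: int, log_every: int, burn_in: int) -> int:
    """Posterior samples kept per run after discarding burn-in."""
    return (chain_length - burn_in) // log_every


def years_per_mutation(rate: float, n_bp: int) -> float:
    """Expected years between mutations on one lineage under a strict clock."""
    return 1.0 / (rate * n_bp)


def lineage_age_years(n_mutations: float, rate: float, n_bp: int) -> float:
    """Rough age implied by a mutation count along a single lineage."""
    return n_mutations * years_per_mutation(rate, n_bp)


per_run = retained_samples(CHAIN_LENGTH, LOG_EVERY, BURN_IN)
print(per_run, N_RUNS * per_run)
print(round(years_per_mutation(RATE_PER_BP_PER_YEAR, CALLABLE_BP), 1))
```

Under these assumptions each run contributes 90,000 retained samples, and the rate corresponds to roughly one mutation per 132 years across 10 Mb. This is only a back-of-envelope reading of the strict clock; the reported TMRCA values come from the full Bayesian analysis.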

Haplogroup Frequency Spectra Estimation and Clustering Analysis

하플로그룹 빈도 분포 추정 및 군집 분석

데이터 세트 구성 Data Set Composition

We integrated previously published haplogroup data from 11,979 East Asian individuals across 79 populations drawn from key studies, the 1KGP, and the Human Genome Diversity Project (Poznik et al. 2016; Bergstrom et al. 2020). Additionally, data from 879 individuals across 27 SEA populations; 252 ancient East Asians from regions including the Tibetan Plateau, Xinjiang, Amur River Basin, Yellow River Basin, West Liao River, and South China; and 1,534 ancient western Eurasians from the Allen Ancient DNA Resource were included (supplementary tables S2 and S3, Supplementary Material online; Mallick et al. 2024). A total of 13,777 modern individuals from 12 linguistically distinct groups were sampled, spanning 22 provinces, five autonomous regions, and four municipalities in China, as well as Thailand and Vietnam. These included 135 Austroasiatic-, 693 Austronesian-, 285 Hmong-Mien-, 75 Japonic-, 35 Koreanic-, 994 Mongolic-, 863 Tai-Kadai-, 1,338 Tibeto-Burman-, 260 Tungusic-, 1 Indo-European-, and 291 Turkic-speaking individuals, together with 805 Sinitic-speaking Hui, 3,248 northern Han Chinese, and 4,754 southern Han individuals (supplementary tables S1 and S2, Supplementary Material online). The haplogroups were manually revised according to variant information and the Y-DNA Haplogroup Tree 2019–2020. To facilitate the estimation of the spatial distributions of the paternal lineages, we aggregated haplogroup data to create metapopulations based on geographical region, ethnicity, and language family. The haplogroup frequencies were estimated at various levels of terminal haplogroups. Population genetic analyses were conducted on individual populations with sample sizes exceeding 10 and metapopulations exceeding 30.

우리는 주요 연구, 1000 게놈 프로젝트(1KGP), 인간 게놈 다양성 프로젝트에서 가져온 79개 인구 집단에 속하는 11,979명의 동아시아인에 대한 기존 발표된 하플로그룹 데이터를 통합했다. 또한 27개 동남아시아(SEA) 인구 집단의 879명, 티베트 고원·신강·아무르강 유역·황하 유역·서요하·남중국 등을 포함한 지역의 고대 동아시아인 252명, 그리고 Allen Ancient DNA Resource의 고대 서부 유라시아인 1,534명의 데이터를 포함시켰다. 중국의 22개 성, 5개 자치구, 4개 직할시와 태국, 베트남에 걸쳐 12개의 언어적으로 구별되는 그룹에 속하는 총 13,777명의 현대인이 샘플링되었다. 여기에는 오스트로아시아어족 135명, 오스트로네시아어족 693명, 몽-미엔어족 285명, 일본어족 75명, 한국어족 35명, 몽골어족 994명, 타이-카다이어족 863명, 티베트-버마어족 1,338명, 퉁구스어족 260명, 인도-유럽어족 1명, 튀르크어족 291명, 그리고 중국어를 사용하는 후이족 805명, 북부 한족 3,248명, 남부 한족 4,754명이 포함되었다. 하플로그룹은 변이 정보와 Y-DNA 하플로그룹 트리 2019-2020에 따라 수동으로 수정했다. 부계 혈통의 공간 분포 추정을 용이하게 하기 위해, 하플로그룹 데이터를 지리적 지역, 민족, 어족을 기반으로 통합하여 메타집단을 만들었다. 하플로그룹 빈도는 다양한 수준의 최종 하플로그룹에서 추정했다. 집단 유전학 분석은 샘플 크기가 10명을 초과하는 개별 인구 집단과 30명을 초과하는 메타집단에 대해 수행했다.
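The bookkeeping in this paragraph can be sketched in a few lines of Python. The block first checks that the per-group counts quoted above add up to the stated 13,777 individuals, then shows one plausible way to compute haplogroup frequencies and apply the sample-size cutoffs. The function names and the haplogroup labels in the usage line are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

# Per-group sample sizes as quoted in the text.
GROUP_COUNTS = {
    "Austroasiatic": 135, "Austronesian": 693, "Hmong-Mien": 285,
    "Japonic": 75, "Koreanic": 35, "Mongolic": 994, "Tai-Kadai": 863,
    "Tibeto-Burman": 1338, "Tungusic": 260, "Indo-European": 1,
    "Turkic": 291, "Sinitic (Hui)": 805, "Sinitic (northern Han)": 3248,
    "Sinitic (southern Han)": 4754,
}

MIN_POPULATION_SIZE = 10      # individual populations must exceed 10
MIN_METAPOPULATION_SIZE = 30  # metapopulations must exceed 30


def haplogroup_frequencies(haplogroups):
    """Relative frequency of each haplogroup label in one population."""
    total = len(haplogroups)
    return {hg: n / total for hg, n in Counter(haplogroups).items()}


def eligible(groups, min_size):
    """Keep only groups whose sample size strictly exceeds min_size."""
    return {name: hgs for name, hgs in groups.items() if len(hgs) > min_size}


print(sum(GROUP_COUNTS.values()))
# Illustrative labels only, not data from the study.
print(haplogroup_frequencies(["O2a", "O2a", "C2b", "N1a"]))
```

The total matches the figure given in the text, which is a useful sanity check when a count such as "1338" appears without a thousands separator.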

Population Structure Inference

인구 구조 추론

Pairwise Fst genetic distances were calculated from the haplogroup frequency spectra using Y-Lineage Tracker. MDS analyses were conducted on these genetic distances using the “cmdscale” function in R. Additionally, PCA was performed on the haplogroup frequency spectra using Y-Lineage Tracker.

쌍별 Fst 유전 거리는 Y-Lineage Tracker를 사용하여 하플로그룹 빈도 분포로부터 계산했다. 이 유전 거리를 바탕으로 R의 “cmdscale” 함수를 사용하여 다차원 척도법(MDS) 분석을 수행했다. 추가적으로, Y-Lineage Tracker를 사용하여 하플로그룹 빈도 분포에 대한 주성분 분석(PCA)을 수행했다.
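To make the idea of a frequency-based pairwise distance concrete, the sketch below computes a simple Nei-style fixation index, (H_T − H_S) / H_T, for two populations from their haplogroup frequencies. This is an assumption chosen for illustration: the text does not state which Fst estimator Y-Lineage Tracker implements, and the tool may weight populations or correct for sample size differently. The function names and labels are hypothetical.

```python
def gene_diversity(freqs):
    """Haploid gene diversity 1 - sum(p^2), with no sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(freqs_a, freqs_b):
    """Nei-style (H_T - H_S) / H_T for two equally weighted populations."""
    labels = set(freqs_a) | set(freqs_b)
    pooled = {hg: (freqs_a.get(hg, 0.0) + freqs_b.get(hg, 0.0)) / 2.0
              for hg in labels}
    h_s = (gene_diversity(freqs_a) + gene_diversity(freqs_b)) / 2.0
    h_t = gene_diversity(pooled)
    if h_t == 0.0:
        return 0.0  # both populations fixed for the same haplogroup
    return (h_t - h_s) / h_t


# Illustrative frequencies only.
north = {"C2b": 0.6, "O2a": 0.4}
south = {"O1b": 0.7, "O2a": 0.3}
print(pairwise_fst(north, south))
```

A matrix of such values over all population pairs is the kind of input that classical multidimensional scaling then projects into two dimensions.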

Spatial Statistics Correlated with the Phylogeographic Origin of Founding Lineages

창시 혈통의 계통지리학적 기원과 상관관계가 있는 공간 통계

The frequency of specific haplogroups within a province-defined population at various terminal haplogroup levels was computed using Y-Lineage Tracker, with level parameters adjusted from 0 to 6. The Chinese populations were grouped according to provincial administrative boundaries, while populations from the island and mainland SEA were aggregated by country. The spatial distribution patterns of the dominant haplogroups in China were examined using ArcMap. This included the application of the Getis-Ord General G method for optimized hot spot analysis and spatial autocorrelation analysis using Moran’s I. The clusters identified through optimized hot spot analysis, referred to as hot and cold spots, approximated the potential geographical origins or diffusion centers of specific haplogroups, and the mirroring regions illustrated the general distribution trends of these haplogroups.

성(省) 단위로 정의된 인구 내 특정 하플로그룹의 빈도를 Y-Lineage Tracker를 사용하여 다양한 최종 하플로그룹 수준(레벨 매개변수 0~6)에서 계산했다. 중국 인구는 성급 행정 구역에 따라 그룹화했고, 섬과 본토 동남아시아(SEA) 인구는 국가별로 통합했다. 중국 내 우세한 하플로그룹의 공간 분포 패턴은 ArcMap을 사용하여 조사했다. 여기에는 최적화된 핫스팟 분석을 위한 Getis-Ord General G 방법과 Moran’s I를 사용한 공간 자기상관 분석이 포함되었다. 핫스팟과 콜드스팟으로 불리는, 최적화된 핫스팟 분석을 통해 식별된 군집들은 특정 하플로그룹의 잠재적인 지리적 기원 또는 확산 중심지를 근사했으며, 미러링 지역은 이들 하플로그룹의 일반적인 분포 경향을 보여주었다.
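Moran's I, the spatial autocorrelation statistic named above, has a compact closed form: I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The Python sketch below implements that global statistic for a small weights matrix. The binary neighbour weights and the toy frequencies are assumptions for illustration; the study ran the analysis in ArcMap, whose weighting and standardization options are not described in the text.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if w_sum == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / w_sum) * (num / denom)


# Four regions in a row; 1 marks adjacent regions (hypothetical layout).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
gradient = [0.1, 0.2, 0.3, 0.4]     # frequency rising steadily across regions
alternating = [1.0, 0.0, 1.0, 0.0]  # neighbours always differ
print(morans_i(gradient, chain), morans_i(alternating, chain))
```

Positive values indicate that neighbouring regions carry similar haplogroup frequencies, which is the clustered pattern a hot spot analysis then localizes; negative values indicate a checkerboard-like pattern.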

Autosomal-Based ADMIXTURE Estimation

상염색체 기반 ADMIXTURE 추정

A data set was constructed from 445 ancient individuals across 88 Eurasian populations and 1,325 modern individuals from 62 geographically diverse populations, sourced from our integrated 10K_CPGDP database. Admixture proportions of Chinese populations were estimated using model-based ADMIXTURE. The autosomal data set was pruned using PLINK (Chang et al. 2015) with the parameters “--indep-pairwise 200 25 0.4” and “--allow-no-sex”. Subsequently, ADMIXTURE was run with the predefined number of ancestral sources (K) ranging from 2 to 15 (Alexander et al. 2009). The optimal admixture model was determined based on the lowest cross-validation error values, and correlations between the haplogroup frequencies and autosomal-based admixture proportions of modern Chinese populations were estimated.

우리의 통합 10K_CPGDP 데이터베이스에서 가져온 88개 유라시아 인구 집단의 고대인 445명과 62개 지리적으로 다양한 인구 집단의 현대인 1,325명으로 데이터 세트를 구성했다. 중국 인구의 혼합 비율은 모델 기반의 ADMIXTURE를 사용하여 추정했다. 상염색체 데이터 세트는 PLINK를 사용하여 “--indep-pairwise 200 25 0.4”와 “--allow-no-sex” 매개변수로 가지치기(pruning)했다. 그 후, 미리 정의된 조상 집단의 수를 2에서 15개로 설정하여 ADMIXTURE를 실행했다. 최적의 혼합 모델은 가장 낮은 교차 검증 오류 값을 기준으로 결정했으며, 현대 중국 인구의 하플로그룹 빈도와 상염색체 기반 혼합 비율 간의 상관관계를 추정했다.
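Choosing the model with the lowest cross-validation error amounts to collecting one error value per K and taking the minimum. The Python sketch below assumes the ADMIXTURE logs have been concatenated into one string and that each run reports its error on a line of the form `CV error (K=3): 0.49584`. The error values in the example are invented for illustration and are not results from the study.

```python
import re

# Matches lines such as "CV error (K=3): 0.49584".
CV_LINE = re.compile(r"CV error \(K=(\d+)\):\s*([0-9]+\.[0-9]+)")


def parse_cv_errors(log_text):
    """Map each K to its reported cross-validation error."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors):
    """Return the K with the lowest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


# Invented values, for illustration only.
example_log = """\
CV error (K=2): 0.52110
CV error (K=3): 0.51874
CV error (K=4): 0.51990
"""
errors = parse_cv_errors(example_log)
print(best_k(errors))
```

In practice the error curve is often flat near its minimum, so the lowest value is a guide to a reasonable K rather than proof of a single correct number of ancestral sources.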

Correlation between Haplogroup Frequency and ADMIXTURE-Based Ancestral Proportion

하플로그룹 빈도와 ADMIXTURE 기반 조상 비율 간의 상관관계

The haplogroup frequencies of geographically defined metapopulations were initially calculated. The Chinese populations distinguished by geographic and ethnolinguistic characteristics were grouped by provincial administrative region. All examined lineages were truncated at the ninth level, identifying 139 common lineages with a frequency exceeding 0.05 in at least one population, 177 low-frequency lineages, and 165 rare lineages. Pearson’s correlation coefficients between haplogroup frequencies and geographic coordinates (longitude and latitude), along with their intercorrelations and statistical significance, were estimated using the “corrplot” R package. Subsequently, all Chinese populations were consolidated into a single subpopulation, defining common lineages with frequencies above 0.01 or 0.05. The “corrplot” R package was also utilized to assess the correlation between admixture proportions and haplogroup frequencies.

지리적으로 정의된 메타집단의 하플로그룹 빈도를 먼저 계산했다. 지리적, 민족언어학적 특성으로 구별되는 중국 인구는 성급 행정 구역별로 그룹화했다. 조사된 모든 혈통은 9번째 수준에서 잘라냈으며, 그 결과 적어도 한 인구 집단에서 빈도가 0.05를 초과하는 139개의 흔한 혈통, 177개의 저빈도 혈통, 165개의 희귀 혈통을 식별했다. 하플로그룹 빈도와 지리적 좌표(경도, 위도) 간의 피어슨 상관 계수와 그 상호상관 및 통계적 유의성은 “corrplot” R 패키지를 사용하여 추정했다. 그 후, 모든 중국 인구를 단일 하위 집단으로 통합하여 빈도가 0.01 또는 0.05 이상인 공통 혈통을 정의했다. “corrplot” R 패키지는 혼합 비율과 하플로그룹 빈도 간의 상관관계를 평가하는 데에도 활용되었다.
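The coefficient behind these correlation analyses is the ordinary Pearson r between two equal-length vectors, for example one haplogroup's frequency per province and the latitude of each province. The Python sketch below computes it from first principles. The study used the "corrplot" R package; the function name and the toy numbers here are illustrative assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented values: latitude of four regions and one lineage's frequency in each.
latitude = [22.0, 30.0, 38.0, 46.0]
frequency = [0.02, 0.05, 0.11, 0.19]
print(round(pearson_r(latitude, frequency), 3))
```

A value near +1 here would suggest a lineage that becomes more common toward the north, and a value near −1 the reverse; significance testing is a separate step not shown in this sketch.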

선언 Declarations

Ethics Approval and Consent to Participate

윤리 승인 및 참여 동의

This study received approval from the Medical Ethics Committee of West China Hospital of Sichuan University and was conducted following the principles outlined in the Helsinki Declaration.

이 연구는 사천대학(四川大學) 화서병원(華西醫院) 의학윤리위원회의 승인을 받았으며 헬싱키 선언에 명시된 원칙에 따라 수행되었다.

출판 동의 Consent for Publication

Not applicable.

해당 없음.

보충 자료 Supplementary Material

Supplementary material is available at Molecular Biology and Evolution online.

보충 자료는 Molecular Biology and Evolution 온라인에서 확인할 수 있다.

감사의 글 Acknowledgments

We thank all the volunteers who participated in this project.

이 프로젝트에 참여해주신 모든 자원봉사자분들께 감사드린다.

저자 기여 Author Contributions

G.H., M.W., B.Z., and C.L. conceived and supervised the project. G.H. and M.W. collected the samples. K.L., K.Z., Y.H., G.H., and M.W. extracted the genomic DNA and performed the genome sequencing. G.H., M.W., and K.L. performed variant calling. M.Z. provided first-hand language documents from their previous linguistic fieldwork. Ha.Y., L.W., and C.W. collected the archaeological and ancient DNA data. M.W., Y.H., K.L., Z.W., S.D., Ho.Y., Q.S., J.Z., R.T., J.C., Y.S., X.L., H.S., Q.Y., L.H., L.Y., Ju.Y., S.N., Y.C., Ji.Y., K.Z., B.Z., C.L., and G.H. performed population genetic analysis. G.H. and M.W. drafted the manuscript. G.H., M.W., B.Z., and C.L. revised the manuscript.

G.H., M.W., B.Z., C.L.가 프로젝트를 구상하고 감독했다. G.H.와 M.W.가 샘플을 수집했다. K.L., K.Z., Y.H., G.H., M.W.가 게놈 DNA를 추출하고 게놈 시퀀싱을 수행했다. G.H., M.W., K.L.이 변이 호출을 수행했다. M.Z.는 이전 언어학 현장 조사에서 얻은 1차 언어 자료를 제공했다. Ha.Y., L.W., C.W.가 고고학 및 고대 DNA 데이터를 수집했다. M.W., Y.H., K.L., Z.W., S.D., Ho.Y., Q.S., J.Z., R.T., J.C., Y.S., X.L., H.S., Q.Y., L.H., L.Y., Ju.Y., S.N., Y.C., Ji.Y., K.Z., B.Z., C.L., G.H.가 집단 유전학 분석을 수행했다. G.H.와 M.W.가 원고 초안을 작성했다. G.H., M.W., B.Z., C.L.이 원고를 수정했다.

연구비 지원 Funding

This work was supported by grants from the National Natural Science Foundation of China (82202078) and the Major Project of the National Social Science Foundation of China (23&ZD203), the Open Project of the Key Laboratory of Forensic Genetics of the Ministry of Public Security (2022FGKFKT05), the Center for Archaeological Science of Sichuan University (23SASA01), the 1-3-5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (ZYJC20002), and the Sichuan Science and Technology Program (2024NSFSC1518).

이 연구는 중국 국가자연과학기금(82202078)과 중국 국가사회과학기금 중대 프로젝트(23&ZD203), 공안부 법의유전학 핵심연구실 개방 프로젝트(2022FGKFKT05), 사천대학 고고학과학센터(23SASA01), 사천대학 화서병원 우수학문 1-3-5 프로젝트(ZYJC20002), 그리고 사천성 과학기술 프로그램(2024NSFSC1518)의 지원을 받았다.

이해관계 충돌 Conflict of Interest

The authors declare that they have no competing interests.

저자들은 이해관계의 충돌이 없음을 선언한다.

데이터 이용 가능성 Data Availability

All haplogroup information is provided in the Supplementary Material. We followed the regulations of the Ministry of Science and Technology of the People’s Republic of China. The raw genotype data required controlled access. Further requests for access to the raw data can be sent to Guanglin He (Guanglinhescu@163.com) and Mengge Wang (Menggewang2021@163.com).

모든 하플로그룹 정보는 보충 자료에 제공되어 있다. 우리는 중화인민공화국 과학기술부의 규정을 준수했다. 원본 유전자형 데이터는 접근 통제가 필요하다. 원본 데이터에 대한 추가 접근 요청은 Guanglin He(Guanglinhescu@163.com)와 Mengge Wang(Menggewang2021@163.com)에게 보낼 수 있다.

부록 Appendix

Full Author Lists of the 10K_CPGDP Consortium

10K_CPGDP 컨소시엄 전체 저자 목록

Guanglin He¹, Chao Liu², Mengge Wang², Renkuan Tang³, Libing Yun⁴, Junbao Yang⁵, Chuan-Chao Wang⁶, Jiangwei Yan⁷, Bofeng Zhu⁸, Liping Hu⁹, Shengjie Nie⁹, Hongbing Yao¹⁰

하광림(何廣林)¹, 유초(劉超)², 왕몽가(王夢鴿)², 당인관(唐仁寬)³, 운입빙(雲立冰)⁴, 양준보(楊俊寶)⁵, 왕전초(王傳超)⁶, 염강위(閆江偉)⁷, 주보봉(朱博峰)⁸, 호입평(胡立平)⁹, 섭성걸(聶聖傑)⁹, 요홍병(姚宏兵)¹⁰

¹Institute of Rare Diseases, West China Hospital of Sichuan University, Sichuan University, Chengdu, 610000, China

²Anti-Drug Technology Center of Guangdong Province, Guangzhou, 510220, China

³Department of Forensic Medicine, College of Basic Medicine, Chongqing Medical University, Chongqing, 400331, China

⁴West China School of Basic Science & Forensic Medicine, Sichuan University, Chengdu, 610041, China

⁵School of Basic Medicine and Forensic Medicine, North Sichuan Medical College, Nanchong, Sichuan, 637007, China

⁶State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Xiamen University, Xiamen, 361005, China

⁷School of Forensic Medicine, Shanxi Medical University, Jinzhong, 030001, China

⁸Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, 510220, China

⁹School of Forensic Medicine, Kunming Medical University, Kunming, 650500, China

¹⁰Belt and Road Research Center for Forensic Molecular Anthropology, Gansu University of Political Science and Law, Lanzhou, 730000, China

¹중국, 610000, 성도(成都), 사천대학(四川大學), 사천대학 화서병원(華西醫院) 희귀질환연구소

²중국, 510220, 광주(廣州), 광동성(廣東省) 마약퇴치기술센터

³중국, 400331, 중경(重慶), 중경의과대학 기초의학부 법의학과

⁴중국, 610041, 성도(成都), 사천대학(四川大學) 화서(華西)기초과학·법의학부

⁵중국, 637007, 사천(四川) 남충(南充), 천북의학원(川北醫學院) 기초의학·법의학부

⁶중국, 361005, 하문(廈門), 하문대학(廈門大學) 생명과학부 세포스트레스생물학 국가핵심연구실

⁷중국, 030001, 진중(晉中), 산서의과대학(山西醫科大學) 법의학부

⁸중국, 510220, 광주(廣州), 남방의과대학(南方醫科大學) 법의학부 정밀식별을 위한 광주 법의학 다중오믹스 핵심연구실

⁹중국, 650500, 곤명(昆明), 곤명의과대학(昆明醫科大學) 법의학부

¹⁰중국, 730000, 난주(蘭州), 감숙정법대학(甘肅政法大學) 일대일로 법의분자 인류학 연구센터

참고 문헌 References

Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009:19(9):1655-1664. https://doi.org/10.1101/gr.094052.109.

Bergstrom A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P, Chen Y, Felkel S, Hallast P, Kamm J, et al. Insights into human genetic variation and population history from 929 diverse genomes. Science. 2020:367(6484):eaay5012. https://doi.org/10.1126/science.aay5012.

Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K, et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022:185(18):3426-3440 e3419. https://doi.org/10.1016/j.cell.2022.08.004.

Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, Xu Y, Du P, Wang T, Hu R, et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res. 2020:30(9):717-731. https://doi.org/10.1038/s41422-020-0322-9.

Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015:4(1):7. https://doi.org/10.1186/s13742-015-0047-8.

Chen H, Lin R, Lu Y, Zhang R, Gao Y, He Y, Xu S. Tracing Bai-Yue ancestry in aboriginal Li people on Hainan Island. Mol Biol Evol. 2022:39(10):msac210. https://doi.org/10.1093/molbev/msac210.

Chen H, Lu Y, Lu D, Xu S. Y-Lineage Tracker: a high-throughput analysis framework for Y-chromosomal next-generation sequencing data. BMC Bioinformatics. 2021:22(1):114. https://doi.org/10.1186/s12859-021-04057-2.

Cheng S, Xu Z, Bian S, Chen X, Shi Y, Li Y, Duan Y, Liu Y, Lin J, Jiang Y, et al. The STROMICS genome study: deep whole-genome sequencing and analysis of 10K Chinese patients with ischemic stroke reveal complex genetic and phenotypic interplay. Cell Discov. 2023:9(1):75. https://doi.org/10.1038/s41421-023-00582-8.

Cong P-K, Bai W-Y, Li J-C, Yang M-Y, Khederzadeh S, Gai S-R, Li N, Liu Y-H, Yu S-H, Zhao W-W, et al. Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project. Nat Commun. 2022:13(1):2939. https://doi.org/10.1038/s41467-022-30526-x.

Cui Y, Li H, Ning C, Zhang Y, Chen L, Zhao X, Hagelberg E, Zhou H. Y chromosome analysis of prehistoric human populations in the West Liao River Valley, Northeast China. BMC Evol Biol. 2013:13(1):216. https://doi.org/10.1186/1471-2148-13-216.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. The variant call format and VCFtools. Bioinformatics. 2011:27(15):2156-2158. https://doi.org/10.1093/bioinformatics/btr330.

Darriba D, Taboada GL, Doallo R, Posada D. jModelTest 2: more models, new heuristics and parallel computing. Nat Methods. 2012:9(8):772. https://doi.org/10.1038/nmeth.2109.

Feng Q, Lu Y, Ni X, Yuan K, Yang Y, Yang X, Liu C, Lou H, Ning Z, Wang Y, et al. Genetic history of Xinjiang’s Uyghurs suggests Bronze Age multiple-way contacts in Eurasia. Mol Biol Evol. 2017:34(10):2572-2582. https://doi.org/10.1093/molbev/msx177.

Fu Q, Li H, Moorjani P, Jay F, Slepchenko SM, Bondarev AA, Johnson PLF, Aximu-Petri A, Prüfer K, de Filippo C, et al. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature. 2014:514(7523):445-449. https://doi.org/10.1038/nature13810.

He G, Fan Z, Zou X, Deng X, Yeh H, Wang Z, Liu J, Xu Q, Chen L, Deng X, et al. Demographic model and biological adaptation inferred from the genome-wide single nucleotide polymorphism data reveal tripartite origins of southernmost Chinese Huis. Am J Biol Anthropol. 2022:180(3):488-505. https://doi.org/10.1002/ajpa.24672.

He G, Wang M, Miao L, Chen J, Zhao J, Sun Q, Duan S, Wang Z, Xu X, Sun Y, et al. Multiple founding paternal lineages inferred from the newly-developed 639-plex Y-SNP panel suggested the complex admixture and migration history of Chinese people. Hum Genomics. 2023a:17(1):29. https://doi.org/10.1186/s40246-023-00476-6.

He G, Wang J, Yang L, Duan S, Sun Q, Li Y, Wu J, Wu W, Wang Z, Liu Y, et al. Genome-wide allele and haplotype-sharing patterns suggested one unique Hmong-Mein-related lineage and biological adaptation history in Southwest China. Hum Genomics. 2023b:17(1):3. https://doi.org/10.1186/s40246-023-00452-0.

He G, Yao H, Sun Q, Duan S, Tang R, Chen J, Wang Z, Sun Y, Li X, Wang S, et al. Whole-genome sequencing of ethnolinguistic diverse northwestern Chinese Hexi Corridor people from the 10K_CPGDP project suggested the differentiated East-West genetic admixture along the Silk Road and their biological adaptations. bioRxiv. 2023c:2023.02.26.530053.

Jagadeesan A, Ebenesersdóttir SS, Guðmundsdóttir VB, Thordardottir EL, Moore KHS, Helgason A. HaploGrouper: a generalized approach to haplogroup classification. Bioinformatics. 2021:37(4):570-572. https://doi.org/10.1093/bioinformatics/btaa729.

Jeong C, Wang K, Wilkin S, Taylor WTT, Miller BK, Bemmann JH, Stahl R, Chiovelli C, Knolle F, Ulziibayar S, et al. A dynamic 6,000-year genetic history of Eurasia’s eastern steppe. Cell. 2020:183(4):890-904 e829. https://doi.org/10.1016/j.cell.2020.10.015.

Karmin M, Flores R, Saag L, Hudjashov G, Brucato N, Crenna-Darusallam C, Larena M, Endicott PL, Jakobsson M, Lansing JS, et al. Episodes of diversification and isolation in Island Southeast Asian and Near Oceanian male lineages. Mol Biol Evol. 2022:39(3):msac045. https://doi.org/10.1093/molbev/msac045.

Kumar V, Wang W, Zhang J, Wang Y, Ruan Q, Yu J, Wu X, Hu X, Wu X, Guo W, et al. Bronze and Iron Age population movements underlie Xinjiang population history. Science. 2022:376(6588):62-69. https://doi.org/10.1126/science.abk1534.

Kutanan W, Kampuansai J, Srikummool M, Brunelli A, Ghirotto S, Arias L, Macholdt E, Hübner A, Schröder R, Stoneking M. Contrasting paternal and maternal genetic histories of Thai and Lao populations. Mol Biol Evol. 2019:36(7):1490-1506. https://doi.org/10.1093/molbev/msz083.

Leigh JW, Bryant D. Popart: full-feature software for haplotype network construction. Methods Ecol Evol. 2015:6(9):1110-1116. https://doi.org/10.1111/2041-210X.12

Leipe C, Long T, Sergusheva EA, Wagner M, Tarasov PE. Discontinuous spread of millet agriculture in Eastern Asia and prehistoric population dynamics. Sci Adv. 2019:5(9):eaax6225. https://doi.org/10.1126/sciadv.aax6225.

Letunic I, Bork P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 2021:49(W1):W293-W296. https://doi.org/10.1093/nar/gkab301.

Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011:27(21):2987-2993. https://doi.org/10.1093/bioinformatics/btr509.

Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009:25(14):1754-1760. https://doi.org/10.1093/bioinformatics/btp324.

Li H, Huang Y, Mustavich LF, Zhang F, Tan J-Z, Wang L-E, Qian J, Gao M-H, Jin L. Y chromosomes of prehistoric people along the Yangtze River. Hum Genet. 2007:122(3-4):383-388. https://doi.org/10.1007/s00439-007-0407-2.

Li X, Wang M, Su H, Duan S, Sun Y, Chen H, Wang Z, Sun Q, Yang Q, Chen J, et al. Evolutionary history and biological adaptation of Han Chinese people on the Mongolian Plateau. hLife. 2024:2(6):296-313. https://doi.org/10.1016/j.hlife.2024.04.005.

Liu D, Ko AMS, Stoneking M. The genomic diversity of Taiwanese Austronesian groups: implications for the “into- and out-of-Taiwan” models. PNAS Nexus. 2023:2:pgad122.

Macholdt E, Arias L, Duong NT, Ton ND, Van Phong N, Schröder R, Pakendorf B, Van Hai N, Stoneking M. The paternal and maternal genetic history of Vietnamese populations. Eur J Hum Genet. 2020:28:636-645. https://doi.org/10.1038/s41431-019-0557-4.

Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, Zhao M, Chennagiri N, Nordenfelt S, Tandon A, et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature. 2016:538(7624):201-206. https://doi.org/10.1038/nature18964.

Mallick S, Micco A, Mah M, Ringbauer H, Lazaridis I, Olalde I, Patterson N, Reich D. The Allen Ancient DNA Resource (AADR) a curated compendium of ancient human genomes. Sci Data. 2024:11(1):182. https://doi.org/10.1038/s41597-024-03031-7.

Martiniano R, De Sanctis B, Hallast P, Durbin R. Placing ancient DNA sequences into reference phylogenies. Mol Biol Evol. 2022:39(2):msac017. https://doi.org/10.1093/molbev/msac017.

McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010:20(9):1297-1303. https://doi.org/10.1101/gr.107524.110.

Miao B, Liu Y, Gu W, Wei Q, Wu Q, Wang W, Zhang M, Ding M, Wang T, Liu J, et al. Maternal genetic structure of a neolithic population of the Yangshao culture. J Genet Genomics. 2021:48(8):746-750. https://doi.org/10.1016/j.jgg.2021.04.005.

Miller NF, Spengler RN, Frachetti M. Millet cultivation across Eurasia: origins, spread, and the influence of seasonal climate. The Holocene. 2016:26(10):1566-1575. https://doi.org/10.1177/0959683616641742.

Ning C, Fernandes D, Changmai P, Flegontova O, Yüncü E, Maier R, Altınışık NE, Kassian AS, Krause J, Lalueza-Fox C, et al. The genomic formation of First American ancestors in East and Northeast Asia. bioRxiv. 2020a:2020.10.12.336628.

Ning C, Li T, Wang K, Zhang F, Li T, Wu X, Gao S, Zhang Q, Zhang H, Hudson MJ, et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat Commun. 2020b:11(1):2700. https://doi.org/10.1038/s41467-020-16557-2.

Olson ND, Wagner J, Dwarshuis N, Miga KH, Sedlazeck FJ, Salit M, Zook JM. Variant calling and benchmarking in an era of complete human genome sequences. Nat Rev Genet. 2023:24(7):464-483. https://doi.org/10.1038/s41576-023-00590-0.

Poznik GD, Henn BM, Yee M-C, Sliwerska E, Euskirchen GM, Lin AA, Snyder M, Quintana-Murci L, Kidd JM, Underhill PA, et al. Sequencing Y chromosomes resolves discrepancy in time to common ancestor of males versus females. Science. 2013:341(6145):562-565. https://doi.org/10.1126/science.1237619.

Poznik GD, Xue Y, Mendez FL, Willems TF, Massaia A, Wilson Sayres MA, Ayub Q, McCarthy SA, Narechania A, Kashin S, et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat Genet. 2016:48(6):593-599. https://doi.org/10.1038/ng.3559.

Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, Moltke I, Rasmussen S, Stafford TW, Orlando L, Metspalu E, et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature. 2014:505(7481):87-91. https://doi.org/10.1038/nature12736.

Rambaut A, Drummond AJ, Xie D, Baele G, Suchard MA. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst Biol. 2018:67(5):901-904. https://doi.org/10.1093/sysbio/syy032.

Robbeets M, Bouckaert R, Conte M, Savelyev A, Li T, An D-I, Shinoda K-I, Cui Y, Kawashima T, Kim G, et al. Triangulation supports agricultural spread of the Transeurasian languages. Nature. 2021:599(7886):616-621. https://doi.org/10.1038/s41586-021-04108-8.

Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014:30(9):1312-1313. https://doi.org/10.1093/bioinformatics/btu033.

Su B, Xiao J, Underhill P, Deka R, Zhang W, Akey J, Huang W, Shen D, Lu D, Luo J, et al. Y-Chromosome evidence for a northward migration of modern humans into Eastern Asia during the last Ice Age. Am J Hum Genet. 1999:65(6):1718-1724. https://doi.org/10.1086/302680.

Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ, Rambaut A. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 2018:4(1):vey016. https://doi.org/10.1093/ve/vey016.

Sun J, Li Y, Ma P, Yan S, Cheng H, Fan Z, Deng X, Ru K, Wang C, Chen G, et al. Shared paternal ancestry of Han, Tai-Kadai-speaking, and Austronesian-speaking populations as revealed by the high resolution phylogeny of O1a-M119 and distribution of its sub-lineages within China. Am J Phys Anthropol. 2021:174(4):686-700. https://doi.org/10.1002/ajpa.24240.

Sun N, Ma P-C, Yan S, Wen S-Q, Sun C, Du P-X, Cheng H-Z, Deng X-H, Wang C-C, Wei L-H. Phylogeography of Y-chromosome haplogroup Q1a1a-M120, a paternal lineage connecting populations in Siberia and East Asia. Ann Hum Biol. 2019:46(3):261-266. https://doi.org/10.1080/03014460.2019.1632930.

Sun Y, Wang M, Sun Q, Liu Y, Duan S, Wang Z, Zhou Y, Zhong J, Huang Y, Huang X, et al. Distinguished biological adaptation architecture aggravated population differentiation of Tibeto-Burman-speaking people. J Genet Genomics. 2023:51(5):517-530. https://doi.org/10.1016/j.jgg.2023.10.002.

Tao Y, Wei Y, Ge J, Pan Y, Wang W, Bi Q, Sheng P, Fu C, Pan W, Jin L, et al. Phylogenetic evidence reveals early Kra-Dai divergence and dispersal in the late Holocene. Nat Commun. 2023:14(1):6924. https://doi.org/10.1038/s41467-023-42761-x.

Wang M, He G, Zou X, Chen P, Wang Z, Tang R, Yang X, Chen J, Yang M, Li Y, et al. Reconstructing the genetic admixture history of Tai-Kadai and Sinitic people: insights from genome-wide SNP data from South China. J Syst Evol. 2022:61(1):157-178. https://doi.org/10.1111/jse.12825.

Wang M, Wang Z, He G, Liu J, Wang S, Qian X, Lang M, Li J, Xie M, Li C, et al. Developmental validation of a custom panel including 165 Y-SNPs for Chinese Y-chromosomal haplogroups dissection using the ion S5 XL system. Forensic Sci Int Genet. 2019:38:70-76. https://doi.org/10.1016/j.fsigen.2018.10.009.

Wang T, Wang W, Xie G, Li Z, Fan X, Yang Q, Wu X, Cao P, Liu Y, Yang R, et al. Human population history at the crossroads of East and Southeast Asia since 11,000 years ago. Cell. 2021a:184(14):3829-3841 e3821. https://doi.org/10.1016/j.cell.2021.05.018.

Wang H, Yang MA, Wangdue S, Lu H, Chen H, Li L, Dong G, Tsring T, Yuan H, He W, et al. Human genetic history on the Tibetan Plateau in the past 5100 years. Sci Adv. 2023:9(11):eadd5582. https://doi.org/10.1126/sciadv.add5582.

Wang C-C, Yeh H-Y, Popov AN, Zhang H-Q, Matsumura H, Sirak K, Cheronet O, Kovalev A, Rohland N, Kim AM, et al. Genomic insights into the formation of human populations in East Asia. Nature. 2021b:591(7850):413-419. https://doi.org/10.1038/s41586-021-03336-2.

Wei W, Ayub Q, Chen Y, McCarthy S, Hou Y, Carbone I, Xue Y, Tyler-Smith C. A calibrated human Y-chromosomal phylogeny based on resequencing. Genome Res. 2013:23(2):388-395. https://doi.org/10.1101/gr.143198.112.

Wei L-H, Yan S, Lu Y, Wen S-Q, Huang Y-Z, Wang L-X, Li S-L, Yang Y-J, Wang X-F, Zhang C, et al. Whole-sequence analysis indicates that the Y chromosome C2-Star Cluster traces back to ordinary Mongols, rather than Genghis Khan. Eur J Hum Genet. 2018:26(2):230-237. https://doi.org/10.1038/s41431-017-0012-3.

World Medical Association (WMA). World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA. 2013:310:2191-2194. https://doi.org/10.1001/jama.2013.281053.

Wu Q, Cheng H-Z, Sun N, Ma P-C, Sun J, Yao H-B, Xie Y-M, Li Y-L, Meng S-L, Zhabagin M, et al. Phylogenetic analysis of the Y-chromosome haplogroup C2b-F1067, a dominant paternal lineage in Eastern Eurasia. J Hum Genet. 2020:65(10):823-829. https://doi.org/10.1038/s10038-020-0775-1.

Yang MA, Fan X, Sun B, Chen C, Lang J, Ko Y-C, Tsang C-h, Chiu H, Wang T, Bao Q, et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science. 2020:369(6501):282-288. https://doi.org/10.1126/science.aba0909.

Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, Qamar R, Ayub Q, Mohyuddin A, Fu S, et al. The genetic legacy of the Mongols. Am J Hum Genet. 2003:72(3):717-721. https://doi.org/10.1086/367774.

Zhabagin M, Wei L-H, Sabitov Z, Ma P-C, Sun J, Dyussenova Z, Balanovska E, Li H, Ramankulov Y. Ancient components and recent expansion in the Eurasian heartland: insights into the revised phylogeny of Y-chromosomes from Central Asia. Genes (Basel). 2022:13(10):1776. https://doi.org/10.3390/genes13101776.

Zhang X, Kampuansai J, Qi X, Yan S, Yang Z, Serey B, Sovannary T, Bunnath L, Aun HS, Samnom H, et al. An updated phylogeny of the human Y-chromosome lineage O2a-M95 with novel SNPs. PLoS One. 2014:9(6):e101020. https://doi.org/10.1371/journal.pone.0101020.

Zhang Y, Lei X, Chen H, Zhou H, Huang S. Ancient DNAs and the Neolithic Chinese super-grandfather Y haplotypes. bioRxiv. 2018:487918. https://doi.org/10.1101/487918.

Zhang P, Luo H, Li Y, Wang Y, Wang J, Zheng Y, Niu Y, Shi Y, Zhou H, Song T, et al. NyuWa Genome resource: a deep whole-genome sequencing-based variation profile and reference panel for the Chinese population. Cell Rep. 2021a:37(7):110017. https://doi.org/10.1016/j.celrep.2021.110017.

Zhang F, Ning C, Scott A, Fu Q, Bjørn R, Li W, Wei D, Wang W, Fan L, Abuduresule I, et al. The genomic origins of the Bronze Age Tarim Basin mummies. Nature. 2021b:599(7884):256-261. https://doi.org/10.1038/s41586-021-04052-7.

Zhang M, Yan S, Pan W, Jin L. Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic. Nature. 2019:569(7754):112-115. https://doi.org/10.1038/s41586-019-1153-z.

Zhao Y, Zhang Y, Li H, Cui Y, Zhu H, Zhou H. Ancient DNA evidence reveals that the Y chromosome haplogroup Q1a1 admixed into the Han Chinese 3,000 years ago. Am J Hum Biol. 2014:26(6):813-821. https://doi.org/10.1002/ajhb.22604.