Source:

Kim, Jungeun, et al. “The origin and composition of Korean ethnicity analyzed by ancient and present-day genome sequences.” Genome Biology and Evolution 12.5 (2020): 553-565.

The Origin and Composition of Korean Ethnicity Analyzed by Ancient and Present-Day Genome Sequences

Jungeun Kim^{1,†}, Sungwon Jeon^{2,3,†}, Jae-Pil Choi^{1}, Asta Blazyte^{2}, Yeonsu Jeon^{2,3}, Jong-Il Kim^{4}, Jun Ohashi^{5}, Katsushi Tokunaga^{6}, Sumio Sugano^{7}, Suthat Fucharoen^{8}, Fahd Al-Mulla^{9}, Jong Bhak^{1,2,3,10,*}

¹ Personal Genomics Institute (PGI), Genome Research Foundation, Osong, Republic of Korea
² Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
³ Department of Biomedical Engineering, School of Life Sciences, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea
⁴ Department of Archaeology and Art History, Seoul National University, Republic of Korea
⁵ Department of Biological Sciences, Graduate School of Medicine, The University of Tokyo, Japan
⁶ Department of Human Genetics, Graduate School of Medicine, The University of Tokyo, Japan
⁷ Department of Medical Genome Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Japan
⁸ Thalassemia Research Center, Institute of Molecular Biosciences, Mahidol University, Nakorn Pathom, Thailand
⁹ Center of Genomic Medicine, Kuwait University, Kuwait
¹⁰ Clinomics Inc, Ulsan, Republic of Korea
† These authors contributed equally to this work.
* Corresponding author: E-mail: jongbhak@genomics.org.
Accepted: March 23, 2020
Data deposition: This project has been deposited at GenBank under the accession provided in supplementary table S1, Supplementary Material online.

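[Paper Summary]

1. Conclusion

The study concludes that the genetic makeup of present-day Koreans was formed by a relatively recent and rapid admixture, around the Iron Age, between a “northern” ancestry dating to the Neolithic (about 8,000 years ago) and a “southern” ancestry dating to the Iron Age (about 2,000 to 3,000 years ago).

Northern ancestry: The genomes of Neolithic individuals from about 8,000 years ago, found in the Devil’s Gate cave in present-day Russia, best represent the northern component of present-day Koreans.
Southern ancestry: The genomes of Iron Age individuals from Southeast Asia, in particular from the Vat Komnou site in Cambodia, best represent the southern component of present-day Koreans.
Admixture process: The large-scale admixture of these two groups does not appear to have taken place independently within the Korean Peninsula. Rather, a population that had already admixed outside the peninsula, probably in southern China, seems to have expanded rapidly across East Asia, carrying agriculture and other technologies, and spread into Korea, Japan, and elsewhere.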

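2. What was the question?

Koreans have traditionally been described as a people of mixed “northern” and “southern” origin. However, there was no solid genome-level evidence as to who exactly these groups were, or when, where, and how they admixed.

In particular, the acidic soils of Korea preserve ancient human bones poorly, so ancient DNA from within the peninsula is very hard to obtain.

To address this, the study carried out a large-scale comparison of the available present-day Korean genomes with ancient genomes of various periods found around Korea (China, Russia, and Southeast Asia).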

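3. What was found?

1) Result 1: Genetic structure of present-day populations (where present-day Koreans sit genetically)

The researchers first compared the genomes of 88 present-day Koreans (KOR) with other present-day populations (91 populations in total).

Terminology (PCA/ADMIXTURE): Two tools are commonly used in genome analysis, PCA (principal component analysis) and ADMIXTURE.
- PCA (Figure 1-B): a “genetic map” that plots complex genetic information as points in two dimensions to show the genetic distance between populations. The closer two populations are plotted, the closer they are genetically.
- ADMIXTURE (Figure 1-C): shows the “ancestry proportions” of a population, that is, how its genomes decompose into a number of hypothetical ancestral populations (the colored bars in the figure).

Results (Figure 1):

- On the PCA map (Figure 1-B), Koreans (KOR, dark blue) cluster very closely with the Chinese and Japanese (the EA_b group).
- In the ADMIXTURE analysis (Figure 1-C, at K=10), the Korean genome is a mixture of roughly 38% East Siberian (E_si, red) and 62% East Asian (EA_a/b, blue/teal) components.
- This confirms once more that present-day Koreans are a genetic mixture of northern (Siberian) and southern (East Asian) ancestry.

(Figure 1) Genetic clustering of present-day populations

(A) shows the geographic locations of the present-day populations used in the analysis, (B) is the PCA “genetic map” showing genetic distances between populations, and (C) is the ADMIXTURE result showing the “ancestry proportions” of each population. Both (B) and (C) show that Koreans (KOR) are very close to the Chinese, Japanese, and other EA_b populations.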

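2) Result 2: Genetic affinity of ancient individuals (who are our ancestors?)

To identify who exactly the “northern” and “southern” ancestors were, the researchers analyzed the genomes of 115 ancient individuals.

Key ancient individuals:
- Tianyuan (40,000 years ago; Beijing, China): a very old “basal” ancestor of East Asians, in effect an “ancestor of the ancestors”.
- Devil’s Gate (8,000 years ago; Russia): Neolithic individuals found in a region close to northern Korea. They represent the “northern” ancestry.
- Man Bac (4,000 years ago; Vietnam): an “early southern” ancestor carrying a large share of Tianyuan-related ancestry.
- Vat Komnou (about 2,000 years ago; Cambodia): a “late southern” ancestor that appears later than Man Bac.

Terminology (f3/D statistics): The analyses in Figures 2 and 3 use the f3 statistic and the D statistic. These test statistically whether a given population (e.g., Koreans) is genetically closer to ancestor A or to ancestor B, or whether it was formed by a mixture of A and B.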
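
As a rough illustration of what these two statistics measure, the sketch below computes a naive admixture f3 and a D statistic from per-SNP allele frequencies. It is not the ADMIXTOOLS implementation used in the paper: the function names and the toy frequencies are assumptions made for this example, and no bias correction or standard error (block jackknife) is included.

```python
from typing import Sequence


def admixture_f3(target: Sequence[float],
                 source1: Sequence[float],
                 source2: Sequence[float]) -> float:
    """Naive f3(target; source1, source2) from per-SNP allele frequencies.

    Averages (c - a) * (c - b) over SNPs. A clearly negative value is
    evidence that the target is a mixture of populations related to the
    two sources. No finite-sample bias correction is applied.
    """
    terms = [(c - a) * (c - b) for c, a, b in zip(target, source1, source2)]
    return sum(terms) / len(terms)


def d_statistic(w: Sequence[float], x: Sequence[float],
                y: Sequence[float], z: Sequence[float]) -> float:
    """D(W, X; Y, Z) from per-SNP allele frequencies.

    With this sign convention, D > 0 means W shares more alleles with Y
    (or X with Z) than a simple tree predicts; D near 0 means no excess.
    """
    rows = list(zip(w, x, y, z))
    numerator = sum((a - b) * (c - d) for a, b, c, d in rows)
    denominator = sum((a + b - 2 * a * b) * (c + d - 2 * c * d)
                      for a, b, c, d in rows)
    return numerator / denominator


# Toy frequencies invented for illustration (not data from the paper).
north = [0.1, 0.9, 0.2, 0.8]
south = [0.9, 0.1, 0.8, 0.2]
mixed = [(n + s) / 2 for n, s in zip(north, south)]  # 50:50 mixture

f3_value = admixture_f3(mixed, north, south)        # negative for a mixture
d_value = d_statistic(north, south, north, south)   # positive: W equals Y
```

In the toy data the mixed population gives a negative f3, which is the pattern Table 1 looks for when ranking pairs of candidate source populations.
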
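Results (Figure 2):

- The analysis of genetic affinity (Figure 2-A) showed that Tianyuan, the oldest ancestor, shares ancestry with all later East Asian and Siberian populations.
- Surprisingly, the 40,000-year-old Tianyuan individual was genetically closer to ancient Southeast Asians (ancSEAs, e.g., Man Bac) than to present-day people (Figure 2-B). This suggests that Tianyuan is among the direct ancestors of Southeast Asian populations.
- In contrast, the 8,000-year-old Devil’s Gate individuals gave a D statistic close to zero (D≈0) with respect to Tianyuan. This suggests that Devil’s Gate also descends from the Tianyuan lineage but was probably admixed with another genetic component besides Tianyuan.

(Figure 2) Genetic relationships between ancient and present-day individuals

(A) is a heat map showing how close the ancient individuals (horizontal axis) and present-day populations (vertical axis) are genetically (the redder, the closer). (B) compares how close other ancient individuals (x-axis) and present-day populations (y-axis) are to Tianyuan. (C) shows the genetic affinity among the ancient individuals.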

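3) Result 3: Gene flow that formed the Koreans (when and how did the admixture happen?)

The most important finding of the study is when and how the northern and southern ancestries admixed to form present-day Koreans.

Results (Table 1, Figure 3):
- The researchers set up the model “Koreans (KOR) = ancestor A + ancestor B” and used the admixture f3 statistic to test which pair of ancient ancestors best explains present-day Koreans.
- The combination of Devil’s Gate (northern) and Vat Komnou (late southern) explained present-day Koreans best (Table 1). Vat Komnou, the “late southern” group, was linked to Koreans more strongly than Man Bac, the “early southern” group.
- In the D-statistic analysis (Figure 3-A), present-day Koreans (EA_b) also showed a strong genetic affinity with Devil’s Gate.
- Taken together: two branches derived from Tianyuan (northern Devil’s Gate and early southern Man Bac) admixed gradually across East Asia until the Neolithic. Then, in the Bronze/Iron Age, a “new southern” population represented by Vat Komnou expanded rapidly and admixed with the northern ancestry, forming the main genetic foundation of present-day Koreans.
- Figure 3-C shows the closest genetic neighbors of present-day Koreans: as expected, the Japanese are closest (red dots), followed by the southern Chinese.

(Table 1) Admixture f3 statistics

Shows the f3 value under the assumption “Koreans (KOR) = source 1 + source 2”. The lower (more negative) the value, the better that combination explains Koreans. The combination of Vat Komnou and Devil’s gate 2 gives the lowest value.

(Figure 3) Gene flow that formed the Koreans

(A) shows genetic relationships with other populations relative to Devil’s Gate, and (B) relative to Man Bac. (C) is a map of genetic affinity between present-day Koreans and neighboring populations; the redder, the closer.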

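4) Result 4: Evidence from paternal (Y-DNA) and maternal (mtDNA) lineages

The researchers traced the paternal (Y-chromosome) and maternal (mitochondrial DNA) lineages separately to test this hypothesis.

Paternal (Y-DNA) (Figure 4-A): The Y chromosomes of Korean men were very simple. Haplogroup O (O2b and O3, mainly of Southeast Asian origin) accounted for 71% and haplogroup C (mainly of Siberian origin) for 18%. This clearly shows that male ancestors came largely from two groups, “southern” and “northern”.
Maternal (mtDNA) (Figures 4-B, 4-C): The maternal lineages were far more complex.
- The oldest migration wave (about 40,000 years ago; N/Y/A, D, B/R) accounts for 62% of present-day Koreans and is linked to both Devil’s Gate and Tianyuan.
- A later wave (about 20,000 years ago; G/C/Z, M, F) accounts for 38% and is associated with the “southern” expansion (e.g., Man Bac, Ban Chiang).
- In other words, maternal lineages entered the Korean Peninsula many more times than paternal ones, continuously and by multiple routes (from both north and south).

(Figure 4) Haplogroup distribution in Koreans

(A) shows the paternal (Y-chromosome) haplogroup proportions (two simple groups). (B) shows the maternal (mtDNA) haplogroup proportions (highly diverse). (C) is the phylogenetic tree of maternal mtDNA, in which ancient individuals from Siberia and Southeast Asia (names in red) and present-day individuals (names in blue/black) are intricately intermixed.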

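5) Result 5: Admixture dating (when did the admixture occur?)

Terminology (ALDER): a statistical method that analyzes the traces of admixture left in the genome (linkage disequilibrium) to calculate how many generations ago two populations admixed.

Results (Table 2):
- ALDER estimated the admixture time of present-day Koreans at about 5,400 years ago with the Yakut (northern) as reference, about 3,500 years ago with the Han (southern), and about 2,800 years ago with the Japanese.
- This suggests that the key genetic admixture of Koreans took place relatively recently, around the transition from the Bronze Age to the Iron Age.

(Table 2) Admixture time estimates for Koreans

Shows how many years (generations) ago Koreans admixed with each reference population (Yakut, Han, Japanese). The estimates are concentrated at roughly 3,000 to 5,000 years ago (about 100 to 200 generations).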
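
ALDER reports admixture dates in generations, so expressing them in years requires an assumed generation time. The summary does not state the value used, so the short sketch below takes it as an explicit parameter (the function names are hypothetical) and only works out what the quoted ranges imply.

```python
def generations_to_years(generations: float, years_per_generation: float) -> float:
    """Convert an admixture date in generations to calendar years."""
    return generations * years_per_generation


def implied_generation_time(years: float, generations: float) -> float:
    """Generation length (in years) implied by a years/generations pair."""
    return years / generations


# The range quoted above: roughly 3,000-5,000 years, about 100-200 generations.
low_end = implied_generation_time(3000, 100)    # 30.0 years per generation
high_end = implied_generation_time(5000, 200)   # 25.0 years per generation
```

The two ends of the quoted range therefore correspond to generation times of about 25 to 30 years, which is why the year and generation figures should be read as approximate.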

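4. Final model of Korean population formation

Putting all the evidence together, the researchers proposed a model of the genetic formation of Koreans (Figure 5).

(Figure 5) Summary:
- Everything starts about 40,000 years ago with Tianyuan (①).
- The Tianyuan lineage moves north and becomes Devil’s Gate (②) (Neolithic northern ancestry), and moves south and becomes Man Bac (③) (Neolithic southern ancestry).
- These two groups (②, ③) admix gradually across East Asia throughout the Neolithic.
- Then, in the Bronze/Iron Age, a new southern population, Vat Komnou (④), appears in the south and expands rapidly.
- Present-day Koreans and Japanese were formed when the Devil’s Gate lineage (northern) and the Vat Komnou lineage (late southern) admixed rapidly, starting about 3,900 years ago (3.9 kya).
- Over 70% of present-day Korean genetic diversity appears to derive from this expansion of the “late southern” (Vat Komnou) ancestry. This means that the formation of Koreans was not an event unique to the Korean Peninsula but part of a population expansion and admixture process across East Asia as a whole.

(Figure 5) Admixture tree model depicting the historical genetic makeup of Koreans

This is the concluding diagram of the paper. It shows ancestry that starts from Tianyuan 40,000 years ago splitting into Devil’s gate (north) and Man Bac/Vat Komnou (south), which then meet again (dotted lines) to form present-day Koreans, Japanese, Han, and others.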

[Paper Text]

Abstract

Koreans are thought to be an ethnic group of admixed northern and southern subgroups. However, the exact genetic origins of these two remain unclear. In addition, the past admixture is presumed to have taken place on the Korean peninsula, but there is no genomic scale analysis exploring the origin, composition, admixture, or the past migration of Koreans. Here, 88 Korean genomes compared with 91 other present-day populations showed two major genetic components of East Siberia and Southeast Asia. Additional paleogenomic analysis with 115 ancient genomes from Pleistocene hunter-gatherers to Iron Age farmers showed a gradual admixture of Tianyuan (40 ka) and Devil’s gate (8 ka) ancestries throughout East Asia and East Siberia up until the Neolithic era. Afterward, the current genetic foundation of Koreans may have been established through a rapid admixture with ancient Southern Chinese populations associated with Iron Age Cambodians. We speculate that this admixing trend initially occurred mostly outside the Korean peninsula followed by continuous spread and localization in Korea, corresponding to the general admixture trend of East Asia. Over 70% of extant Korean genetic diversity is explained to be derived from such a recent population expansion and admixture from the South.

Key words: Korean origin, Korean migration, population study, paleogenomics, variome, KoVariome.

Data Set
Whole-Genome Sequencing and Genotyping
Haplotype Analysis
Genomic Clustering
Admixture Time Estimation
The Genetic Affinity between the Ancient and Present-Day Populations
Admixture Model Construction

Results and Discussion

Korean Genetic Structure
The Gene Flow Neolithic Age Devil’s Gate Ancestry to Korean People
The Ancient Gene Flow Making Up the Korean Ethnic Group
Korean Haplotype Analysis Reveals Multiwaves of Genetic Components
Admixture Time Estimation for Koreans

Conclusion

Supplementary Material

Acknowledgments

Author Contributions

References

Introduction

The 1000 Genomes Project (1KGP) showed that East Asians displayed a common genetic bottleneck with non-African humans around the last glacial maximum (1000 Genomes Project Consortium et al. 2015). However, the 1KGP includes only five East Asian (EA) populations and therefore fails to fully represent EA genome structures. In 2009, the HUGO Pan-Asian SNP Consortium (PASNP) confirmed a general concordance between linguistic and genetic affiliations (HUGO Pan-Asian SNP Consortium et al. 2009). Most recently, the Asian diversity project showed a correlation between geographical coordinates and genetic structure in Asia (Liu et al. 2017). Although Koreans are similar to the Chinese, the PASNP, 1KGP, and Asian diversity projects cannot fully explain the detailed makeup and peopling of the Korean Peninsula.

Koreans belong to the Altaic language group and are known to be homogeneous in Northeast Asia along with the Chinese and the Japanese. There are ~85 million Koreans in total (51 million South Koreans, 25 million North Koreans, and 7 million outside of the Korean Peninsula), unified by shared ethnic and linguistic traits. There are currently several hypotheses on the origins of Koreans. The Korean Y-chromosome haplogroup (O2b-SRY465) suggests that the ancestors of the proto-Koreans are related to the people who inhabited northeastern China during the Neolithic (9,900-10,000 years BP) and Bronze (3,450-2,350 years BP) Ages (Kim et al. 2011). On the other hand, mitochondrial DNA (mtDNA) shows that Koreans display a very typical East Asian profile (Jin et al. 2009). Previous population studies have revealed that Koreans have not undergone any severe genetic bottlenecks and primarily consist of two genetic components (Takeuchi et al. 2017). One is strongly associated with China, but the other is less clear. Thus, the exact genetic makeup of Koreans has not yet been investigated at a whole-genome scale using both present-day and ancient genomes.

Paleogenomics is a powerful tool to reveal the exact genetic lineages and affinities that cannot be resolved with present-day populations alone, because frequent and complex genetic exchanges occur with or without cultural and linguistic exchanges. Archeological data unearthed in Korea provide the proto-Korean chronology and prehistory of the Korean Peninsula. The oldest archaic relics found in South Korea, such as the Acheulean axes, date back hundreds of thousands of years; however, human bones are poorly preserved in the acidic soils, so no ancient genetic data can be obtained from them (Norton 2000). The earliest hominid evidence in the Peninsula dates to between 400,000 and 600,000 years ago (YA) (Park 1992). In spite of the claims about human bones in North Korea (Norton 2000; Bae and Bae 2012), such paleoanthropological materials are rare in Korea. Therefore, the exact Korean ethnic origins can only be inferred through ancient genomes found in nearby regions, such as Devil’s Gate in the Russian Far East (8,000 years BP) (Siska et al. 2017) and Tianyuan cave, Beijing (40,000 years old) (Yang et al. 2017). Fortunately, Neolithic to Iron Age ancient genomes from Southeast Asia (SEA) have become available recently (Lipson et al. 2018). Such ancient genomes, taken from a wide geographic and temporal distribution, should allow us to answer when and how the genomes of Southeast Asia contributed to the genetic makeup of Koreans.

Materials and Methods

Data Set

A total of 88 Korean samples available from the KoVariome database (Kim et al. 2018) were used (supplementary table S1, Supplementary Material online), and 208 worldwide present-day individual samples were collected: 13 African, 4 American, 26 European, 7 Oceanian, 5 Central Asian, 43 East Asian, 31 North Asian, 36 South Asian, 22 West Asian, and 21 Southeast Asian (supplementary table S2, Supplementary Material online). We collected and added six EA and nine SEA individuals (supplementary table S2, Supplementary Material online). We merged the whole-genome sequence (WGS) data with the human origin SNP panel data set (Lazaridis et al. 2014), including the genotype information of six Korean samples generated from this panel. A total of 115 ancient genomes were collected (supplementary table S3, Supplementary Material online). Our sample data were chosen to abundantly reflect our target Asian populations and to resolve the genetic relationships between Koreans and other populations. All 88 Korean samples were collected and sequenced according to the guidelines set by the Institutional Review Board (IRB) of the Genome Research Foundation (GRF) (supplementary table S1, Supplementary Material online). Informed consent for study participation was acquired from all participants in accordance with the Korean Life Ethics bill, and all experimental protocols were approved by the GRF IRB. We uploaded the data to the web site Asian Genome Data for Korean Origin (https://Variome.net/Asian_Genome_Data_for_Korean_Origin, last accessed April 17, 2020).

Whole-Genome Sequencing and Genotyping

Samples were subjected to WGS and genotyping (supplementary table S2, Supplementary Material online). Genomic DNA was extracted using a QIAamp DNA Blood Mini Kit (Qiagen, CA) and 69 WGS libraries were constructed using TruSeq DNA sample preparation kits (Illumina, CA). Sequencing was performed using Illumina HiSeq sequencers following the manufacturer’s instructions. Low-quality reads were removed by NGSQC-toolkit (ver 2.3.3) with the “-l 70” and “-s 20” options (Patel and Jain 2012). Filtered reads were aligned to the human reference genome (hg19) using BWA-MEM (ver. 0.7.8) (Li and Durbin 2009). We further removed PCR duplicates using MarkDuplicates in Picard (ver. 1.9.2, http://broadinstitute.github.io/picard/, last accessed April 17, 2020) and conducted IndelRealigner and BaseRecalibration using GATK (ver. 2.3.9) (McKenna et al. 2010). We predicted individual single-nucleotide variants using GATK Unified Genotyper (McKenna et al. 2010) with the “-heterozygosity 0.0010 -dcov 200 -stand_call_conf 30.0 -stand_emit_conf 30.0” options. To check for artifacts that can arise when merging variants from various resources, caused by different sequencing platforms, alignment algorithms, and genotype callers, WGS-based variants were merged with the six Koreans’ genotypes generated from the human SNP panel data (Lazaridis et al. 2014). Finally, we pruned the panel with linkage disequilibrium information using plink with the “--indep-pairwise 200 25 0.4” option (Purcell et al. 2007).

Haplotype Analysis

Korean haplotypes were analyzed with YFitter (Jostins et al. 2014) for the Y chromosome and haplogrep (Kloss-Brandstatter et al. 2011) for mtDNA (supplementary table S1, Supplementary Material online). To analyze the mtDNA haplotypes of the ancient genomes, we downloaded mitochondrial BAM files of ancient genomes from the European Nucleotide Archive under accession IDs PRJEB14817, PRJEB24939, and PRJEB9021, and the Tianyuan mitochondrion from GenBank under accession ID KC417443.1. Consensus sequences of ancient and modern mitochondrial genomes were generated by SAMtools with a minimal depth of 5. Then, multiple sequence alignment of the consensus sequences was performed by MUSCLE. The phylogenetic tree was constructed by MEGA7 with a Gamma distribution model and pairwise deletion for gap treatment. Divergence time between nodes was calibrated by MEGA7 with the four previously suggested calibration points for A (41,504-51,765), B (35,360-44,929), C (29,615-42,453), and D (41,610-52,388) (Bonatto and Salzano 1997).
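
To make the “minimal depth 5” rule concrete, the sketch below calls a majority-base consensus and masks any position covered by fewer than five reads as N. It only illustrates the idea of a depth threshold; it is not how SAMtools builds a consensus internally, and the function name and the pileup representation (one string of observed bases per position) are assumptions made for this example.

```python
from collections import Counter
from typing import List


def consensus(pileup_columns: List[str], min_depth: int = 5) -> str:
    """Majority-base consensus with a minimum-depth mask.

    Each element of pileup_columns holds the bases observed at one
    position. Positions with fewer than min_depth valid bases become 'N'.
    """
    called = []
    for column in pileup_columns:
        bases = [b for b in column.upper() if b in "ACGT"]
        if len(bases) < min_depth:
            called.append("N")          # too little evidence: mask the site
        else:
            called.append(Counter(bases).most_common(1)[0][0])
    return "".join(called)


# Depths 5, 4, 6 and 0: only the first and third positions are called.
example = consensus(["AAAAA", "CCCG", "GGGGGT", ""])
```

Low-coverage sites are common in ancient DNA, so masking them rather than guessing keeps poorly supported bases out of the alignment and the tree.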

Genomic Clustering

We used CHROMOPAINTER to infer “chromosome chunks” for each individual for fineSTRUCTURE (Lawson et al. 2012) analysis and clustered 88 Koreans (supplementary table S1, Supplementary Material online) and 208 present-day individuals (supplementary table S2, Supplementary Material online) into 64 genetic groups (supplementary figure 1, Supplementary Material online). fineSTRUCTURE produced a homogeneous group of the 88 Korean individuals (supplementary figure 2, Supplementary Material online). In total, we reclustered 185 present-day genomes and 6 Korean genomes using CHROMOPAINTER and fineSTRUCTURE (Lawson et al. 2012). Using these individuals, we ran ADMIXTURE (ver. 1.23) (Alexander et al. 2009) with K=2-14 (supplementary figure 3, Supplementary Material online). We generated a dendrogram from each ADMIXTURE result (K=2-14) using the hcluster function in R. We evaluated the consistency of the ADMIXTURE and fineSTRUCTURE results by calculating their correlation using the “cor.dendlist” function with the “cophenetic” method in the “dendextend” package in R (supplementary figure 4, Supplementary Material online). The correlation was highest at K=10 (corr. 0.78). We therefore used the ADMIXTURE result at K=10, which best represents the genetic clusters identified by fineSTRUCTURE. We performed principal component analysis (PCA) with smartpca in EIGENSOFT (ver. 6.0.1) (Patterson et al. 2006).

Admixture Time Estimation

We ran the ALDER program (Loh et al. 2013) to estimate the admixture time of Koreans, using the Koreans themselves as one reference population. We used filtering criteria of a genotype rate >99%, MAF >0.01, and Hardy-Weinberg equilibrium P value >0.000001.
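
The three filtering criteria can be sketched for a single SNP as follows. The paper does not say which tool applied the filters, so this is only an illustration: the function names are hypothetical, genotypes are coded as 0, 1, or 2 copies of the alternate allele (None for missing), and the Hardy-Weinberg test is a simple chi-square approximation rather than the exact test that tools such as PLINK provide.

```python
import math
from typing import List, Optional

Genotype = Optional[int]  # 0, 1 or 2 alternate alleles; None = missing call


def hwe_chi2_p(called: List[int]) -> float:
    """Approximate HWE p-value (chi-square test, 1 degree of freedom)."""
    n = len(called)
    p = sum(called) / (2 * n)                      # alternate-allele frequency
    expected = [n * (1 - p) ** 2, 2 * n * p * (1 - p), n * p ** 2]
    if min(expected) == 0:                         # monomorphic: nothing to test
        return 1.0
    observed = [called.count(g) for g in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2))          # chi-square survival, 1 df


def snp_passes_qc(genotypes: List[Genotype],
                  min_call_rate: float = 0.99,
                  min_maf: float = 0.01,
                  min_hwe_p: float = 1e-6) -> bool:
    """Apply the genotype-rate, MAF and HWE thresholds quoted in the text."""
    called = [g for g in genotypes if g is not None]
    if not called or len(called) / len(genotypes) <= min_call_rate:
        return False
    p = sum(called) / (2 * len(called))
    if min(p, 1 - p) <= min_maf:
        return False
    return hwe_chi2_p(called) > min_hwe_p


balanced = [0] * 25 + [1] * 50 + [2] * 25   # exactly at HWE proportions
all_het = [1] * 100                         # extreme excess of heterozygotes
```

Here `snp_passes_qc(balanced)` keeps the SNP, while `snp_passes_qc(all_het)` rejects it on the Hardy-Weinberg criterion.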

The Genetic Affinity between the Ancient and Present-Day Populations

To investigate the genetic relationship between populations of interest, we used the D and outgroup f3 statistic framework implemented in ADMIXTOOLS (Patterson et al. 2012). The genetic affinity between the ancient and present-day populations was measured with the outgroup f3 statistic using the following notation: f3(X, Y; Yoruba), where X and Y are ancient and present-day populations, respectively. To better represent the genetic association of the present-day populations with a focal ancient genome, we applied a scaled f3 statistic, f3_scaled = (f3 - m)/(M - m), where m and M are the minimum and maximum values of the f3 statistic (fig. 2A and supplementary figure 5, Supplementary Material online). To cluster the ancient genomes in this study, we analyzed a pairwise outgroup f3 statistic of the form f3(X, Y; Yoruba). In this analysis, both X and Y were ancient genomes.
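
The scaling step is ordinary min-max normalization, as the short sketch below shows. The function name, the population labels, and the f3 values are hypothetical and serve only to illustrate the formula.

```python
from typing import Dict


def scale_f3(f3_by_population: Dict[str, float]) -> Dict[str, float]:
    """Rescale outgroup f3 values to [0, 1]: (f3 - m) / (M - m)."""
    m = min(f3_by_population.values())
    M = max(f3_by_population.values())
    if M == m:
        raise ValueError("all f3 values are identical; scaling is undefined")
    return {pop: (value - m) / (M - m) for pop, value in f3_by_population.items()}


# Hypothetical outgroup f3 values of one ancient genome against three populations.
scaled = scale_f3({"PopA": 0.20, "PopB": 0.25, "PopC": 0.30})
```

After scaling, the least similar population maps to 0 and the most similar to 1, which makes heat maps for different focal ancient genomes comparable even when their raw f3 ranges differ.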

Admixture Model Construction

To construct an admixture model depicting the historical genetic makeup of Koreans and other Asians, we fitted the SNP panel to the admixture models with the qpgraph program (Patterson et al. 2012), based on the results of the D statistics and f3 statistics in our study. We first set the skeleton of the admixture model as Tianyuan, Onge, and Ami by adapting a previous study (McColl et al. 2018) (worst-fitting Z=0.044). Then, we added Kinh, which has a high admixture f3 score with Devil’s Gate for Koreans (worst-fitting Z=-3.887), and then Devil’s Gate, Ulchi, Koryak, Mixe, and MA1 (worst-fitting Z=3.317). Finally, Koreans, Han, and Japanese were added to model the suggested admixture of East Siberians (E_si) and East Asians b (EA_b) (worst-fitting Z=3.686). We manually calibrated the final model with a time point estimated from the ALDER results.

Results and Discussion

Korean Genetic Structure

To infer the genetic association between the 88 Koreans (supplementary table S1, Supplementary Material online) and our selected neighboring populations, we collected WGS data from 185 contemporary individuals belonging to 91 populations (fig. 1A and supplementary table S2, Supplementary Material online). We included people from 21 Southeast Asian and 31 North Asian ethnic groups, from which Koreans could have originated. We predicted an average of 1.5 million homozygous and 2.6 million heterozygous single-nucleotide variants per individual (supplementary table S2, Supplementary Material online). We merged the WGS-based SNPs with the human origin SNP panel data set and finally produced 199,629 autosomal SNPs for genetic comparison.

To infer the genetic structure of the Korean ethnic group, we clustered 94 Koreans, including 6 published Koreans genotyped with an SNP chip, by applying the CHROMOPAINTER and fineSTRUCTURE (Lawson et al. 2012) programs. These algorithms clustered 279 individuals into 64 homogeneous groups according to the haplotype patterns shared by the individuals (supplementary figure 1, Supplementary Material online). This analysis showed eight global haplotype patterns: Africans (AFR), West Asians (WA), Europeans (EUR), South Asians (SA), West Siberians (W_si), East Siberians (E_si), and two groups of East Asians (EA_a and EA_b) (supplementary figure 2, Supplementary Material online), which reflect both geographic and genetic relationships (fig. 1A). The EA_b group consists mainly of Koreans, Chinese, and Japanese as well as Austroasiatic speakers in Southeast Asia, and EA_a contains several ethnic minorities of Southeast Asia. We first confirmed that Koreans are a genetically homogeneous ethnic group by showing that they form a single clade in the fineSTRUCTURE tree (supplementary figure 2, Supplementary Material online). This homogeneity is also consistent across chip-based and WGS-based data, suggesting that there is no technical bias in the sequencing platform or the SNP prediction algorithm. In the PCA, both the Koreans and EA_b fell between the EA_a and E_si populations (fig. 1B), consistent with previous studies (Kim and Jin 2013; Wang et al. 2018). We reanalyzed fineSTRUCTURE and ADMIXTURE (Alexander et al. 2009) with 6 randomly sampled Koreans and 185 global populations, to compare the Korean genetic components without sampling bias (fig. 1C). Consistent with the PCA result, the fineSTRUCTURE tree showed that Koreans formed a homogeneous clade with most of the EA populations represented by EA_b, and that their sister groups were composed of E_si and EA_a (fig. 1C top).

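Figure 1. Genetic clustering of present-day populations

(A) Geographic distribution of the 91 populations analyzed in this study. Each circle represents a genetic cluster from (B).

(B) Principal component analysis (PCA) using 199,629 linkage disequilibrium-pruned SNPs from 185 individuals belonging to 109 present-day populations.

(C) Genetic clustering of present-day populations analyzed with fineSTRUCTURE (Lawson et al. 2012) (top) and ADMIXTURE (Alexander et al. 2009) (bottom). The names of the genetic clusters are shown below the admixture group names.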

We also analyzed genetic ancestry assuming K=2 to K=14 ancestral groups in the ADMIXTURE analysis (Alexander et al. 2009) (supplementary figure 3, Supplementary Material online). From K=5 onward, two genetic components, red and blue, which dominate in the E_si and EA_a/b populations, respectively, appeared admixed in Koreans, although their ratios differed slightly depending on the number of ancestral groups (K). The dendrogram correlation analysis showed the greatest consensus between the fineSTRUCTURE clades and ADMIXTURE results at K=10 (supplementary figure 4, Supplementary Material online). At K=10, we observed 38% and 62% of the E_si and EA_a/b genetic components in the Koreans, respectively (fig. 1C). Comparing admixture rates among the EA_b populations, the Korean and Japanese populations showed very similar genetic admixture rates, consistent with their position as sister groups in the fineSTRUCTURE tree (fig. 1C). Takeuchi et al. (2017) reported a high degree of genetic similarity between Koreans and mainland Japanese, and the estimated admixture date of the EA-wide genetic component into Japan fell in the Yayoi period (3,000-1,700 years BP). The Chinese also have genetic compositions similar to those of the Koreans and Japanese; however, their admixture rates differed depending on geographic region. Overall, we conclude that genetic admixture events occurred first between the Southeast Asians and Chinese outside Korea and Japan and then spread, rather than occurring separately and locally in Korea or Japan. It is also possible that such recent genetic admixture was a broad phenomenon, happening concurrently all across EA and driven by a population expansion caused by the agricultural, economic, and technological advances of the last 4,000 years (Lipson et al. 2018).

우리는 또한 ADMIXTURE 분석(Alexander et al. 2009)에서 K=2에서 K=14까지의 조상 집단을 가정하여 유전적 조상을 분석했다(온라인 보충 자료, 보충 그림 3). K=5부터, 한국인에게는 각각 Eₛᵢ 집단과 EA_a/b 집단에서 우세하게 나타나는 두 가지 유전적 구성 요소(빨간색과 파란색)가 혼합되어 있음이 나타났다. 다만 이 비율은 조상 집단의 수(K)에 따라 약간 달랐다. 덴드로그램 상관관계 분석은 K=10일 때 fineSTRUCTURE 분기군과 ADMIXTURE 결과 간에 가장 큰 일치성을 보였다(온라인 보충 자료, 보충 그림 4). K=10에서, 우리는 한국인에게서 각각 38%의 Eₛᵢ 유전 요소와 62%의 EA_a/b 유전 요소를 관찰했다(그림 1C). EA_b 집단 간의 혼합 비율을 비교했을 때, 한국인(Korean)과 일본인(Japanese) 집단은 매우 유사한 수준의 유전적 혼합 비율을 보였으며, 이는 fineSTRUCTURE 트리에서의 자매 집단 관계와 일치한다(그림 1C). 타케우치(Takeuchi) 등(2017)은 한국인(Korean)과 일본 본토인 간의 높은 유전적 유사성을 보고했으며, 동아시아(EA) 전역의 유전 요소가 일본으로 유입된 혼합 시기는 야요이(Yayoi) 시대(3,000-1,700년 전)로 추정되었다. 중국인(Chinese) 역시 한국인(Korean) 및 일본인(Japanese)과 유사한 유전적 구성을 가지지만, 그들의 혼합 비율은 지리적 지역에 따라 다르게 나타났다. 전반적으로, 우리는 유전적 혼합 사건이 한국이나 일본 현지에서 개별적으로 발생했다기보다는, 한국과 일본 외부에서 동남아시아인(Southeast Asians)과 중국인(Chinese) 사이에 먼저 발생한 후 확산되었다고 결론 내린다. 또한 이러한 최근의 유전적 혼합은 지난 4,000년간의 농업, 경제, 기술적 진보로 인한 인구 팽창에 의해 동아시아(EA) 전역에서 동시에 일어난 광범위한 현상이었을 가능성도 있다(Lipson et al. 2018).

신석기 시대 데블스 게이트 조상에서 한국인으로의 유전자 흐름 The Gene Flow from Neolithic Age Devil’s Gate Ancestry to Korean People

To reveal past genetic exchanges contributing to the current Koreans and their neighboring populations, we collected 115 ancient genomes from across the world (supplementary table S3, Supplementary Material online), consisting of 4 Pleistocene hunter-gatherers, 13 Holocene hunter-gatherers, 20 Early Neolithic, 10 Mid Neolithic, 10 Late Copper Age, 9 Late Neolithic, 20 Early Bronze Age, 4 Mid Bronze Age, 2 Late Bronze Age, and 12 Iron Age ancient genomes distributed across European and Russian regions (supplementary table S3, Supplementary Material online). The time scale of these ancient genomes was categorized by referring to previous research (Haak et al. 2015). In addition, we included the Tianyuan genome from northern China (Yang et al. 2017), two ancient genomes unearthed from the Devil’s Gate cave near North Korea (Siska et al. 2017), and eight ancient genomes from Southeast Asia dating from the Neolithic to the Iron Age (Lipson et al. 2018), making a total of 115 genomes.

현재의 한국인과 이웃 집단에 기여한 과거의 유전적 교류를 밝히기 위해, 우리는 전 세계에서 115개의 고대 유전체를 수집했다(온라인 보충 자료, 보충 표 S3). 이는 플라이스토세(Pleistocene) 수렵-채집인 4명, 홀로세(Holocene) 수렵-채집인 13명, 초기 신석기(Early Neolithic) 20명, 중기 신석기(Mid Neolithic) 10명, 후기 동기(Late Copper Age) 10명, 후기 신석기(Late Neolithic) 9명, 초기 청동기(Early Bronze Age) 20명, 중기 청동기(Mid Bronze Age) 4명, 후기 청동기(Late Bronze Age) 2명, 철기 시대(Iron Age) 12명의 고대 유전체로 구성되며, 유럽(European) 및 러시아(Russian) 지역에 분포한다(온라인 보충 자료, 보충 표 S3). 이 고대 유전체들의 시간 척도는 이전 연구(Haak et al. 2015)를 참조하여 분류했다. 또한, 중국 북부의 전원(田園)인(Tianyuan) 유전체(Yang et al. 2017), 북한(North Korea) 근처 데블스 게이트(Devil’s Gate) 동굴에서 발굴된 2개의 고대 유전체(Siska et al. 2017), 그리고 신석기(Neolithic) 시대부터 철기(Iron Age) 시대에 이르는 동남아시아(Southeast Asia)의 고대 유전체 8개(Lipson et al. 2018)를 포함하여 총 115개의 유전체를 분석했다.

We measured levels of pairwise genetic affinity among the ancient and present-day genomes using outgroup f3-statistics of the form f3(ancient, present-day; Yoruba) (Patterson et al. 2012). This analysis gives the global landscape of the genetic associations between ancient and present-day genomes (supplementary figure 5 and table S4, Supplementary Material online). The f3_scaled-statistics showed that the ancient Tianyuan individual (40,000 years BP from China) shares more alleles with present-day Siberian (E_si and W_si) and East Asian (EA_b) populations than with other present-day populations such as Europeans, West Asians, and South Asians (supplementary figure 5, Supplementary Material online). This suggests that Tianyuan is the basal genetic component of the East Eurasian and East Asian lineage. We also observed that present-day E_si and EA_b populations had significant genetic affinities with ancient Southeast Asians (ancSEA), Devil’s Gate, and the Bronze and Iron Age ancients who lived in central steppe regions (ancCS) (fig. 2A and supplementary table S4 and figure 5, Supplementary Material online).

우리는 f3(고대인, 현대인; 요루바(Yoruba)) 형태의 외집단(outgroup) f3-통계량(Patterson et al. 2012)을 사용하여 고대 및 현대 유전체 간의 쌍별 유전적 친연성 수준을 측정했다. 이 분석은 고대 유전체와 현대 유전체 간의 유전적 연관성에 대한 전반적인 지형도를 계산한다(온라인 보충 자료, 보충 그림 5 및 표 S4). f3_scaled-통계량은 고대 전원(田園)인(Tianyuan) (40,000년 전, 중국) 개체가 유럽인(European), 서아시아인(West-), 남아시아인(South Asians)과 같은 다른 현대 집단보다 현대 시베리아인(Siberians) (Eₛᵢ 및 Wₛᵢ) 및 동아시아(East Asian) (EA_b) 집단과 더 많은 대립유전자(allele)를 공유함을 보여주었다(온라인 보충 자료, 보충 그림 5). 이는 전원(田園)인(Tianyuan)이 동유라시아(East Eurasian) 및 동아시아(East Asian) 계통의 기저 유전 요소임을 시사한다. 우리는 또한 현대 Eₛᵢ 및 EA_b 집단이 고대 동남아시아인(ancient Southeast Asians, ancSEA), 데블스 게이트(Devil’s Gate)인, 그리고 중앙 스텝(central steppe) 지역에 살았던 청동기(Bronze) 및 철기(Iron) 시대 고대인(ancCS)과 유의미한 유전적 친연성을 갖는다는 것을 관찰했다(그림 2A 및 온라인 보충 자료, 보충 표 S4 및 그림 5).
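To make the outgroup f3 form above concrete, here is a minimal sketch in Python. It assumes the simplest textbook estimator, the per-SNP average of (o - a)(o - b) over allele frequencies, and omits the finite-sample corrections that production tools apply. The function name `f3` and all frequency values are hypothetical and are not taken from the study.

```python
from statistics import mean


def f3(freq_a, freq_b, freq_outgroup):
    """Plain-average f3(A, B; O): mean over SNPs of (o - a) * (o - b).

    Larger values mean that A and B share more drift since they split
    from the outgroup O. No finite-sample correction is applied here.
    """
    if not (len(freq_a) == len(freq_b) == len(freq_outgroup)):
        raise ValueError("all populations must cover the same SNPs")
    return mean(
        (o - a) * (o - b) for a, b, o in zip(freq_a, freq_b, freq_outgroup)
    )


# Hypothetical allele frequencies at five SNPs (illustrative values only).
yoruba = [0.10, 0.80, 0.50, 0.30, 0.90]   # outgroup
ancient = [0.60, 0.20, 0.90, 0.70, 0.40]  # one ancient genome or group
near = [0.55, 0.25, 0.85, 0.75, 0.45]     # tracks the ancient sample
far = [0.20, 0.70, 0.55, 0.35, 0.80]      # stays close to the outgroup

f3_near = f3(ancient, near, yoruba)
f3_far = f3(ancient, far, yoruba)
print(f"f3(ancient, near; Yoruba) = {f3_near:.3f}")
print(f"f3(ancient, far;  Yoruba) = {f3_far:.3f}")
```

In this toy input the population that tracks the ancient sample gets the higher value, which is the sense in which a larger outgroup f3 is read as higher genetic affinity.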

Based on these genetic affinities, we deduced the genetic founders of the Koreans by comparing the Tianyuan-derived alleles shared with these ancients and present-day populations. We applied D-statistics of the form D(Yoruba, Tianyuan; X, Y), where X and Y were ancient and present-day populations, respectively (fig. 2B and supplementary figure 6, Supplementary Material online). Tianyuan shares more derived alleles with ancSEAs than with any present-day population (fig. 2B), suggesting that ancSEAs descend directly from the Tianyuan lineage. Neolithic Devil’s Gate and present-day populations (E_si and EA_a/b) showed a similar amount of Tianyuan’s genetic ancestry, with D(Yoruba, Tianyuan; Devil’s Gate, E_si or EA_a/b) ≈ 0. This suggests that Neolithic Devil’s Gate (northern part of Korea) may have been admixed with another genetic component. In addition, Tianyuan’s genetic ancestry had a significantly higher level of genetic affinity with W_si, E_si, and EA_b populations than with ancCS (fig. 2B). This suggests that ancCS possibly arose from other genetic components. The genetic clustering of ancient genomes also confirmed the highest genetic affinity of Tianyuan with Man Bac and a slight reduction of this affinity in other ancSEAs over time (fig. 2C and supplementary figure 7, Supplementary Material online). This evidence suggests that ancSEA received an additional genetic component over time, consistent with Man Bac having the highest affinity toward Tianyuan.

이러한 유전적 친연성을 바탕으로, 우리는 이들 고대인 및 현대 집단과 공유하는 전원(田園)인(Tianyuan) 유래 대립유전자(allele)를 비교함으로써 한국인의 유전적 창시자를 추론했다. 우리는 D(요루바(Yoruba), 전원(田園)인(Tianyuan); X, Y) 형태의 D-통계량을 적용했으며, 여기서 X와 Y는 각각 고대 및 현대 집단이었다 (그림 2B 및 온라인 보충 자료, 보충 그림 6). 전원(田園)인(Tianyuan)은 어떤 현대 집단보다도 고대 동남아시아인(ancSEAs)과 더 많은 파생 대립유전자(derived allele)를 공유하며(그림 2B), 이는 고대 동남아시아인(ancSEAs)이 전원(田園)인(Tianyuan) 계통에서 직접 유래했음을 시사한다. 신석기(Neolithic) 데블스 게이트(Devil’s gate)인과 현대 집단(Eₛᵢ 및 EA_a/b)은 D(요루바(Yoruba), 전원(田園)인(Tianyuan); 데블스 게이트(Devil’s Gate), Eₛᵢ or EA_a/b) ≈ 0 값을 보임으로써, 전원(田園)인(Tianyuan)의 유전적 조상을 비슷한 양으로 공유함을 나타냈다. 이는 (한국 북부의) 신석기(Neolithic) 데블스 게이트(Devil’s gate)인이 다른 유전 요소와 혼합되었을 가능성을 시사한다. 또한, 전원(田園)인(Tianyuan)의 유전적 조상은 고대 중앙 스텝인(ancCS)보다 Wₛᵢ, Eₛᵢ, EA_b 집단과 유의미하게 더 높은 수준의 유전적 친연성을 보였다(그림 2B). 이는 고대 중앙 스텝인(ancCS)이 아마도 다른 유전적 혼합물로부터 생성되었음을 시사한다. 고대 유전체의 유전적 클러스터링 역시 만박(Man Bac) 유적 고대인에서 전원(田園)인(Tianyuan)과의 유전적 친연성이 가장 높게 나타나며, 시간이 지남에 따라 다른 고대 동남아시아인(ancSEAs)에게서는 이 친연성이 약간 감소함을 확인시켜 주었다(그림 2C 및 온라인 보충 자료, 보충 그림 7). 이 증거는 고대 동남아시아인(ancSEA)이 시간이 지남에 따라 추가적인 유전 요소를 받아들였음을 시사하며, 만박(Man Bac)인이 전원(田園)인(Tianyuan)에 대해 가장 높은 친연성을 갖는다는 사실과 일치한다.
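The D-statistic of the form D(Yoruba, Tianyuan; X, Y) can be illustrated with a short sketch. It assumes the allele-frequency form commonly attributed to Patterson et al. (2012), a ratio of sums over SNPs, and it leaves out the significance test. The function name `d_statistic` and every frequency below are hypothetical. The sign convention is an assumption chosen so that a positive value means the second population (Tianyuan) shares more alleles with Y than with X.

```python
def d_statistic(w, x, y, z):
    """Frequency-based D(W, X; Y, Z) as a ratio of sums over SNPs.

    numerator   = sum (w - x) * (y - z)
    denominator = sum (w + x - 2wx) * (y + z - 2yz)
    """
    if not (len(w) == len(x) == len(y) == len(z)):
        raise ValueError("all populations must cover the same SNPs")
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum(
        (wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
        for wi, xi, yi, zi in zip(w, x, y, z)
    )
    if den == 0:
        raise ValueError("no informative SNPs")
    return num / den


# Hypothetical frequencies of the derived allele at six SNPs.
yoruba = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # outgroup carries the ancestral allele
tianyuan = [1.0, 1.0, 1.0, 0.0, 1.0, 0.0]  # single ancient genome
pop_x = [0.2, 0.1, 0.3, 0.5, 0.2, 0.4]     # ancient comparison group X
pop_y = [0.6, 0.5, 0.7, 0.5, 0.6, 0.4]     # present-day group Y

d_xy = d_statistic(yoruba, tianyuan, pop_x, pop_y)
d_yx = d_statistic(yoruba, tianyuan, pop_y, pop_x)
d_same = d_statistic(yoruba, tianyuan, pop_y, pop_y)
print(f"D(Yoruba, Tianyuan; X, Y) = {d_xy:.3f}")
```

Swapping X and Y flips the sign, and two identical populations give exactly 0, which is the pattern the text describes as D ≈ 0 when both groups carry a similar amount of Tianyuan-related ancestry.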

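Figure 2. Genetic associations between ancient and present-day populations

(A) Outgroup f3 statistics of the form f3(X, Y; Yoruba), where X and Y are ancient and present-day populations, respectively. The f3 statistics were scaled to lie between 0 and 1 (f3_scaled). In the heat map, black indicates f3_scaled values close to 0 and red indicates values close to 1. For an ancient genome X (by row), the scaled f3 statistic of a given cell is calculated as f3_scaled = (f3 - m)/(M - m), where m and M are the minimum and maximum f3 statistics, respectively. Therefore, the smallest f3 value in each column becomes f3_scaled = 0 (black) and the largest becomes f3_scaled = 1 (red). The ancient genomes on the x-axis are arranged in chronological order. We also separated the genomes of the Central Steppe (CS) ancestral lineage (black arrow) (de Barros Damgaard et al. 2018) from those of the Chinese and Southeast Asian ancestral lineage (blue arrow) (Lipson et al. 2018). In the bottom bar, P denotes Pleistocene hunter-gatherers, and N, B, and I denote Neolithic hunter-gatherers, the Bronze Age, and the Iron Age, respectively. The full data for this statistic are in supplementary figure S5 and table S4, Supplementary Material online.

(B) D(Yoruba, Tianyuan; X, Y) statistics, where X and Y are ancient and present-day populations, respectively. Only cases with an absolute Z-score > 3 are shown. Dot colors indicate the genetic clusters of the individuals in figure 1C. The x-axis shows the ancient genomes that have genetic affinity with the East Asian (EA) and East Siberian (E) populations shown in figure 1C. The full data for all 115 ancient genomes for this D-statistic are in supplementary figure S6, Supplementary Material online.

(C) Outgroup f3 statistics among ancient genomes of the form f3(X, Y; Yoruba), where both X and Y are ancient genomes. The clustering of all ancient genomes is shown in supplementary figure S7, Supplementary Material online.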
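The scaling rule in panel (A), (f3 - m)/(M - m), is ordinary min-max normalization. The sketch below applies it to one set of f3 values that share the same ancient genome. The population labels and numbers are hypothetical placeholders, not values from supplementary table S4.

```python
def scale_f3(values):
    """Min-max scale a sequence of f3 values to the range [0, 1]."""
    m, big_m = min(values), max(values)
    if big_m == m:
        raise ValueError("all f3 values are identical; scaling is undefined")
    return [(v - m) / (big_m - m) for v in values]


# Hypothetical outgroup f3 values of one ancient genome against four
# present-day populations (placeholder labels, illustrative numbers).
labels = ["pop_A", "pop_B", "pop_C", "pop_D"]
raw_f3 = [0.262, 0.255, 0.214, 0.198]

scaled = scale_f3(raw_f3)
for label, value in zip(labels, scaled):
    print(f"{label}: {value:.2f}")
```

Because the scaling is done separately for each ancient genome, the colors in the heat map compare present-day populations relative to one ancient genome and do not compare absolute f3 values across ancient genomes.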

We examined Tianyuan’s genetic affinities for Eₛᵢ and EA_a/b using D-statistics of the form D(Yoruba, Tianyuan; Eₛᵢ, EA_a/b) (supplementary figure 8, Supplementary Material online). In these statistics, the Tianyuan genome showed a higher level of genetic affinity with present-day Eₛᵢ than with Southeast Asians. However, several EA_b (Korean, Japanese, and south Chinese) populations showed levels of affinity with Tianyuan-derived alleles similar to those of the Eₛᵢ populations and were equally distant from the Tianyuan lineage. This suggests that the Devil’s Gate ancients, present-day Eₛᵢ, and several EA_b populations were subject to similar genetic influences over time and are expected to form a single clade, since they all originally separated from the Tianyuan lineage. These lines of analysis reveal that the basal ancestry of the Tianyuan genome separated in the Neolithic or pre-Neolithic era and independently affected current Koreans.

우리는 D(요루바(Yoruba), 전원(田園)인(Tianyuan); Eₛᵢ, EA_a/b) 형태의 D-통계량을 사용하여 전원(田園)인(Tianyuan)이 Eₛᵢ 및 EA_a/b 집단에 대해 갖는 유전적 친연성을 조사했다 (온라인 보충 자료, 보충 그림 8). 이 통계에서, 전원(田園)인(Tianyuan) 유전체는 동남아시아인(Southeast Asians)보다 현대 Eₛᵢ 집단과 더 높은 수준의 유전적 친연성을 보였다. 그러나 몇몇 EA_b 집단(한국인(Korean), 일본인(Japanese), 남부 중국인(south Chinese))은 Eₛᵢ 집단과 유사한 수준으로 전원(田園)인(Tianyuan) 유래 대립유전자(allele)에 대한 친연성을 보였으며, 전원(田園)인(Tianyuan) 계통과 등거리(equally distant)에 있었다. 이는 데블스 게이트(Devil’s Gate) 고대인과 현대 Eₛᵢ 및 일부 EA_b 집단이 시간이 지남에 따라 유사한 유전적 영향을 받았으며, 모두 원래 전원(田園)인(Tianyuan) 계통에서 분리되었기 때문에 단일 분기군(clade)을 이룰 것으로 예상된다는 것을 시사한다. 이러한 분석 라인들은 전원(田園)인(Tianyuan) 유전체의 기저 고대인이 신석기(Neolithic) 또는 신석기 이전(pre-Neolithic) 시대에 분리되었으며, 현재의 한국인에게 독립적으로 영향을 미쳤음을 보여준다.

한국 민족 집단을 형성한 고대의 유전자 흐름 The Ancient Gene Flow Making Up the Korean Ethnic Group

We focused on the gene flow from the Neolithic ancients into the Korean and EA populations. Based on Tianyuan’s gene flow into Neolithic ancients and present-day populations, we hypothesized that either the Neolithic ancient genomes contributed to the genetic ancestry of Korean or EA populations independently, or a second gene flow occurred (fig. 2B). First, we investigated gene flow from two Neolithic ancients to Koreans and EA populations using D-statistics of the form D(Yoruba, Devil’s Gate/Man Bac; ancient, present-day population). Devil’s Gate genomes shared more derived alleles with most of the present-day Eₛᵢ and EA_b populations than with Neolithic Man Bac in Vietnam (fig. 3A and supplementary table S5, Supplementary Material online). From the Devil’s Gate genome near North Korea, we observed that these present-day populations have a genetic relationship equivalent to that of the Ban Chiang and Vat Komnou ancients, who are ancestors of Austroasiatic speakers (Lipson et al. 2018). In addition, we observed local genetic transitions from Oakaie (Late Neolithic and Bronze Age in Myanmar) and Nui Nap (Bronze Age in Vietnam) to EA populations (supplementary table S5, Supplementary Material online). Several Eₛᵢ and EA_b populations, such as the Korean, Japanese, several Chinese (Hezen and She), and Russian (Ulchi) ethnic groups, still had dominant genetic contributions from Devil’s Gate compared with the Oakaie and Nui Nap ancients. This suggests that the local genetic differences observed in present-day EA_a/b populations (fig. 1C) were influenced by a new genetic influx from the Bronze Age to the Iron Age in Southeast Asia.

우리는 신석기(Neolithic) 고대인으로부터 한국인(Korean) 및 동아시아(EA) 집단으로의 유전자 흐름에 초점을 맞추었다. 전원(田園)인(Tianyuan)의 유전자가 신석기(Neolithic) 고대인과 현대 집단으로 흘러 들어간 것에 기초하여, 우리는 신석기(Neolithic) 고대 유전체가 한국인(Korean) 또는 동아시아(EA) 집단의 유전적 조상에 독립적으로 기여했거나, 또는 두 번째 유전자 흐름이 발생했을 수 있다고 가정했다(그림 2B). 먼저, D(요루바(Yoruba), 데블스 게이트(Devil’s Gate)/만박(Man Bac), ancient, present-day population) 형태를 사용하여 두 신석기(Neolithic) 고대인으로부터 한국인(Korean) 및 동아시아(EA) 집단으로의 유전자 흐름을 조사했다. 그 결과 데블스 게이트(Devil’s Gate) 유전체는 베트남(Vietnam)의 신석기(Neolithic) 만박(Man Bac)인보다 대부분의 현대 Eₛᵢ 및 EA_b 집단과 더 많은 파생 대립유전자(derived allele)를 공유하는 것으로 나타났다(그림 3A 및 온라인 보충 자료, 보충 표 S5). 북한(North Korea) 근처의 데블스 게이트(Devil’s Gate) 유전체로부터, 우리는 이 현대 집단들이 오스트로아시아(Austroasiatic)어족 화자의 조상인 반치앙(Ban Chiang) 및 밧콤노우(Vat Komnou) 고대인과의 유전적 관계와 동등함을 관찰했다(Lipson et al. 2018). 또한, 우리는 오아카이에(Oakaie) (미얀마(Myanmar)의 후기 신석기(Late Neolithic) 및 청동기(Bronze Age)) 및 누이납(Nui Nap) (베트남(Vietnam)의 청동기(Bronze Age))에서 동아시아(EA) 집단으로의 지역적 유전적 전이(transition)를 관찰했다(온라인 보충 자료, 보충 표 S5). 한국인(Korean), 일본인(Japanese) 및 일부 중국인(Chinese) (허저(Hezen), 서(She))과 러시아인(Russian) (울치(Ulchi)) 민족 집단과 같은 여러 Eₛᵢ 및 EA_b 집단은, 오아카이에(Oakaie) 및 누이납(Nui Nap) 고대인과 비교할 때 여전히 데블스 게이트(Devil’s Gate)인으로부터 우세한 유전적 기여를 받았다. 이는 현대 EA_a/b 집단에서 관찰되는 지역적 유전적 차이(그림 1C)가 동남아시아(Southeast Asia)의 청동기(Bronze Age) 시대에서 철기(Iron Age) 시대로부터의 새로운 유전적 유입에 의해 영향을 받았음을 시사한다.

We also observed D(Yoruba, Devil’s Gate; baOku, present-day Eₛᵢ or EA_b) ≈ 0 (fig. 3A) and D(Yoruba, baOku; Eₛᵢ, EA_b) ≈ 0 (supplementary table S6, Supplementary Material online). According to these statistics, the baOku genomes are equally closely related to present-day Eₛᵢ and EA_b populations, which differs from the dominant ancestry of the Eₛᵢ populations in bakarasuk (Iron Age in Russia) and irAltai (Iron Age in Russia). Unlike the Devil’s Gate ancestry, the Neolithic Man Bac shares more derived alleles with most of the present-day Eₛᵢ and EA_b populations than with either the Bronze Age ancSEAs (Oakaie, Nui Nap, Ban Chiang) or the ancCSs (baOku, bakarasuk, irAltai) (fig. 3B and supplementary table S7, Supplementary Material online). This suggests that the Neolithic Man Bac is the basal ancestry for the present-day Eₛᵢ and EA_b populations. No genetic drift was observed from Neolithic Man Bac to the Devil’s Gate ancients and present-day populations (fig. 3B). We also analyzed the genetic associations of ancCS with other ancients and present-day populations using the form D(Yoruba, ancCS; ancient, present-day populations) (supplementary figure 9, Supplementary Material online). This indicated that present-day Eₛᵢ and EA populations and ancSEA are equally related to ancCS, sharing similar levels of ancCS-derived alleles. This agrees with the genetic admixture patterns of Asian ancestry in CS ancients (Allentoft et al. 2015; Damgaard et al. 2018). It supports genetic admixture between ancCS and present-day EA populations; however, it cannot explain how, or in how many events, the ancCS influence on EA occurred. We also observed the first evidence of the genetic divergence of Vat Komnou and several EA_b (Southeast Asian and Southern China) populations from Man Bac (fig. 3B and supplementary table S7, Supplementary Material online). This supports the idea that these ancients are new genetic resources that genetically influenced EA (fig. 2A).

우리는 또한 D(요루바(Yoruba), 데블스 게이트(Devil’s gate), baOku, present-day Eₛᵢ or EA_b) ≈ 0 (그림 3A) 및 D(요루바(Yoruba), baoku, Eₛᵢ, EA_b) ≈ 0 (온라인 보충 자료, 보충 표 S6)임을 관찰했다. 이 통계에 따르면, baOku 유전체는 현대 Eₛᵢ 집단과 EA_b 집단에 동등하게 밀접하게 관련되어 있으며, 이는 바카라숙(bakarasuk) (러시아(Russia)의 철기(Iron Age)) 및 irAltai (러시아(Russia)의 철기(Iron Age))에서 Eₛᵢ 집단의 우세한 조상 계통을 보이는 것과는 다르다. 데블스 게이트(Devil’s Gate)인의 조상 계통과는 달리, 신석기(Neolithic) 만박(Man Bac)인은 청동기(Bronze Age) 고대 동남아시아인(ancSEAs) (오아카이에(Oakaie), 누이납(Nui Nap), 반치앙(Ban Chiang))이나 고대 중앙 스텝인(ancCSs) (baOku, 바카라숙(bakarasuk), irAltai)보다 대부분의 현대 Eₛᵢ 및 EA_b 집단과 더 많은 파생 대립유전자(derived allele)를 공유한다(그림 3B 및 온라인 보충 자료, 보충 표 S7). 이는 신석기(Neolithic) 만박(Man Bac)인이 현대 Eₛᵢ 및 EA_b 집단의 기저 조상임을 시사한다. 신석기(Neolithic) 만박(Man Bac)인에서 데블스 게이트(Devil’s Gate) 고대인 및 현대 집단으로의 유전적 부동(genetic drift)은 관찰되지 않았다(그림 3B). 우리는 또한 D(요루바(Yoruba), ancCS; ancient, present-day populations) 형태를 사용하여 고대 중앙 스텝인(ancCS)이 다른 고대인 및 현대 집단과 갖는 유전적 연관성을 분석했다(온라인 보충 자료, 보충 그림 9). 이는 현대 Eₛᵢ 및 동아시아(EA) 집단과 고대 동남아시아인(ancSEA)이 고대 중앙 스텝인(ancCS) 유래 대립유전자를 비슷한 수준으로 공유함으로써 고대 중앙 스텝인(ancCS)과 동등하게 관련되어 있음을 추론하게 했다. 이는 중앙 스텝(CS) 고대인에게서 나타나는 아시아(Asian) 조상 계통의 유전적 혼합 패턴과 일치한다(Allentoft et al. 2015; Damgaard et al. 2018). 이는 고대 중앙 스텝인(ancCS)과 현대 동아시아(EA) 집단 간의 유전적 혼합을 뒷받침하지만, 고대 중앙 스텝인(ancCS)이 동아시아(EA)에 어떻게, 그리고 몇 번이나 영향을 미쳤는지는 설명할 수 없다. 우리는 또한 밧콤노우(Vat Komnou)와 여러 EA_b (동남아시아(Southeast Asian) 및 남부 중국(Southern China)) 집단이 만박(Man Bac)인으로부터 유전적으로 분기했다는 첫 번째 증거를 관찰했다 (그림 3B 및 온라인 보충 자료, 보충 표 S7). 이는 이 고대인들이 동아시아(EA)에 유전적으로 영향을 미친 새로운 유전적 자원이라는 견해를 뒷받침한다 (그림 2A).

We observed several possible ancient founders by D-statistics; however, these could not clearly resolve the current genetic makeup of Koreans. To resolve it, we additionally analyzed the admixture pattern of the ancient/present-day Southeast Asians and Devil’s Gate ancients in Koreans with admixture f₃ statistics (table 1). Notably, the combinations of the Devil’s Gate genome and ancSEAs represent the current Koreans better than those of Devil’s Gate and modern Southeast Asians. Specifically, we observed the lowest admixture f₃-statistics when source 1 was Vat Komnou (Iron Age in Cambodia), followed by Nui Nap (Bronze Age in Vietnam). In a previous study, Nui Nap was a new genetic component close to present-day Vietnamese and Dai but not to the ancestors of Austroasiatic speakers (Lipson et al. 2018). Meanwhile, the ancSEAs with the next lowest admixture f₃-statistics were Ban Chiang and Man Bac, who are also ancients of Austroasiatic speakers. To investigate whether the ancSEA genetic components migrated into Korea, we analyzed the Koreans’ genetic affinity with present-day populations by outgroup f₃-statistics of the form f₃(Korean, present-day populations; Yoruba) (fig. 3C and supplementary table S8, Supplementary Material online). The group with the highest genetic affinity with the Koreans was the Japanese. The southern Chinese (Han and She) had a higher genetic affinity with Koreans than the present-day Lau or Vietnamese, which is consistent with the admixture results (fig. 1C). This suggests that the genetic components of South Chinese were transferred into Korea after admixing with Vat Komnou and Nui Nap ancestries (fig. 3C). These lines of evidence support the conclusion that populations who carried Devil’s Gate and Man Bac genomes admixed throughout the EA_b and Eₛᵢ regions until the Neolithic period, probably accompanied by climate changes and barriers. After the Bronze Age, the admixed genetic ancestry of the Vat Komnou and Nui Nap migrated to Korea due to rapid cultural and technological advances.

D-통계량을 통해 여러 가능한 고대 창시 집단을 관찰했지만, 이는 현재 한국인의 유전적 구성을 명확하게 해결하지 못했다. 한국인의 유전적 구성에 대한 유전적 관계를 규명하기 위해, 고대/현대 동남아시아인(Southeast Asians) 및 데블스 게이트(Devil’s Gate) 고대인이 한국인에게 미친 혼합 패턴을 혼합(admixture) f₃-통계량으로 추가 분석했다(표 1). 주목할 점은, 데블스 게이트(Devil’s Gate)인과 현대 동남아시아인(Southeast Asians)의 조합보다 데블스 게이트(Devil’s Gate) 유전체와 고대 동남아시아인(ancSEAs)의 조합이 현재의 한국인을 더 잘 나타낸다는 것이다. 구체적으로, 출처 1이 밧콤노우(Vat Komnou) (캄보디아(Cambodia)의 철기(Iron Age))일 때 가장 낮은 혼합 f₃-통계량을 관찰했고, 그 다음이 누이납(Nui Nap) (베트남(Vietnam)의 청동기(Bronze Age))이었다. 이전 연구에서, 누이납(Nui Nap)인은 현대 베트남인(Vietnamese) 및 다이(Dai)족에 가깝지만 오스트로아시아(Austroasiatic)어족 화자의 조상은 아닌 새로운 유전 요소로 밝혀졌다(Lipson et al. 2018). 한편, 그 다음으로 낮은 혼합 f₃-통계량을 보인 고대 동남아시아인(ancSEAs)은 반치앙(Ban Chiang)과 만박(Man Bac)인이었으며, 이들은 오스트로아시아(Austroasiatic)어족 화자의 고대 조상이기도 하다. 고대 동남아시아인(ancSEA)의 유전 요소가 한국으로 이주했는지 조사하기 위해, f₃(Korean, present-day populations; Yoruba) 형태의 외집단(outgroup) f₃-통계량을 사용하여 한국인과 현대 집단 간의 유전적 친연성을 분석했다(그림 3C 및 온라인 보충 자료, 보충 표 S8). 그 결과 한국인과 가장 높은 유전적 친연성을 가진 집단은 일본인(Japanese)이었다. 남부 중국인(Chinese) (한(Han)족, 서(She)족)은 현대 라오(Lau)족이나 베트남인(Vietnamese)보다 한국인과 더 높은 유전적 친연성을 보였으며, 이는 혼합(admixture) 분석 결과와 일치한다(그림 1C). 이는 남부 중국인(South Chinese)의 유전 요소가 밧콤노우(Vat Komnou) 및 누이납(Nui Nap) 조상 계통과 혼합된 후 한국으로 전달되었음을 시사한다(그림 3C). 이러한 증거들은 데블스 게이트(Devil’s Gate)인과 만박(Man Bac)인의 유전체를 지닌 집단이 아마도 기후 변화 및 장벽과 함께 신석기(Neolithic) 시대까지 EA_b 및 Eₛᵢ 지역 전역에서 혼합되었다는 결론을 뒷받침한다. 청동기(Bronze Age) 시대 이후, 밧콤노우(Vat Komnou)인과 누이납(Nui Nap)인의 혼합된 유전적 조상이 급격한 문화적, 기술적 진보로 인해 한국으로 이주했다.

Table 1. Admixture f3 statistics

Notation of the admixture f3 statistics: f3(Source1, Source2; KOR). Only cases with an absolute Z-score |Z| > 3 are shown.
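The |Z| > 3 filter on the admixture f3 statistic can be sketched as follows. This is an illustration under stated assumptions and is not the authors' pipeline: the statistic is the plain per-SNP mean of (t - s1)(t - s2), the Z-score comes from a delete-one-block jackknife over equal-sized consecutive blocks, and the helper names `f3_terms` and `jackknife_z` and all frequencies are hypothetical. Real analyses usually define blocks by genomic position and weight them by size.

```python
from math import sqrt


def f3_terms(source1, source2, target):
    """Per-SNP terms (t - s1) * (t - s2) of f3(Source1, Source2; Target)."""
    return [(t - a) * (t - b) for a, b, t in zip(source1, source2, target)]


def jackknife_z(terms, n_blocks):
    """Return (estimate, Z) using a delete-one-block jackknife."""
    size = len(terms) // n_blocks
    if n_blocks < 2 or size == 0:
        raise ValueError("need at least two non-empty blocks")
    used = terms[: size * n_blocks]  # drop any remainder SNPs
    total = sum(used)
    estimate = total / len(used)
    # Recompute the mean with each block left out in turn.
    leave_one_out = [
        (total - sum(used[i * size:(i + 1) * size])) / (len(used) - size)
        for i in range(n_blocks)
    ]
    centre = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (v - centre) ** 2 for v in leave_one_out
    )
    if variance == 0:
        raise ValueError("zero jackknife variance; Z is undefined")
    return estimate, estimate / sqrt(variance)


# Hypothetical allele frequencies: the target is an even mix of two sources.
source1 = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.9, 0.2]
source2 = [0.1, 0.9, 0.2, 0.7, 0.2, 0.9, 0.3, 0.8]
target = [(a + b) / 2 for a, b in zip(source1, source2)]

f3_value, z_score = jackknife_z(f3_terms(source1, source2, target), n_blocks=4)
print(f"f3 = {f3_value:.4f}, Z = {z_score:.2f}")
```

A target that lies between its two sources gives a negative f3, and the more negative the value, the better that source pair explains the target as a mixture. This is why the text ranks source combinations by the lowest admixture f3.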

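Figure 3. Bronze and Iron Age gene flow that shaped the Koreans

(A) Ancestry analysis from Neolithic ancients to present-day populations in the form D(Yoruba, Devil’s Gate; ancient, present-day population) and (B) in the form D(Yoruba, Man Bac; ancient, present-day population). For each D-statistic, only cases with a Z-score > 3 are shown. Positive (+) values indicate genetic ancestry toward the present-day population, and negative (-) values indicate genetic ancestry toward the ancient shown at the bottom. The raw data for these analyses are in supplementary tables S5 and S7, Supplementary Material online. CS denotes ancient genomes generated from the central steppe region (de Barros Damgaard et al. 2018).

(C) Genetic affinity between Koreans and neighboring ethnic groups shown by outgroup f3 statistics of the form f3(Korean, Y; Yoruba). Dot colors indicate the genetic affinity given by the f3 statistics. The full ancient clustering is in supplementary table S8, Supplementary Material online. The predicted historical Korean territory is marked in ocher with reference to the “About Korea” website.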

한국인 하플로타입 분석은 여러 차례의 유전 요소 유입을 보여준다 Korean Haplotype Analysis Reveals Multiwaves of Genetic Components

We analyzed haplotype distributions using WGS data of 88 unrelated Koreans generated from the KoVariome database (Kim et al. 2018) (supplementary table S1, Supplementary Material online). Nonrecombining Y-chromosome analysis showed a significant proportion of the “O” haplogroup in 55 male Koreans, 29% “O2b” and 42% “O3” (fig. 4A). The next most frequent Y-chromosome haplogroup was “C” (18%). The Y-chromosome haplogroup distribution agreed with the well-established expansion and colonization of Y-chromosome haplogroup “O” within the Korean Peninsula (Kim et al. 2011). A comparison with the global Y-chromosome haplogroup distribution suggested that haplogroup “C” is widespread in Siberia, whereas “O” haplogroups are distributed in Southeast Asia (Chiaroni et al. 2009; Karmin et al. 2015). This strongly suggests a dual origin for Korean males.

우리는 코배리옴(KoVariome) 데이터베이스(Kim et al. 2018)에서 생성된, 혈연관계가 없는 88명 한국인의 WGS 데이터를 사용하여 하플로타입 분포를 분석했다(온라인 보충 자료, 보충 표 S1). 비(非)재조합 Y-염색체 분석 결과, 55명의 한국 남성에서 “O” 하플로그룹이 29% “O2b”, 42% “O3”로 상당한 비율을 차지했다(그림 4A). 그 다음으로 빈번한 Y-염색체 하플로그룹은 “C” (18%)였다. 이러한 Y-염색체 하플로그룹 분포는 한반도(Korean Peninsula) 내에서 Y-염색체 하플로그룹 “O”가 확장되고 정착했다는 기존의 잘 확립된 사실과 일치했다(Kim et al. 2011). 전 세계 Y-염색체 하플로그룹 분포와 비교한 결과, 하플로타입 “C”는 시베리아(Siberia)에 널리 퍼져 있는 반면, “O” 하플로그룹은 동남아시아(Southeast Asia)에 공간적 분포를 보인다(Chiaroni et al. 2009; Karmin et al. 2015). 이는 한국 남성의 이중 기원(dual origin)을 강력하게 시사한다.

In contrast to the Y-chromosome distribution, mtDNA haplotypes reflect a more complex genetic history (fig. 4B). The most frequent mtDNA haplotype was “D” (34%), and ten additional mtDNA haplogroups (“M,” “B,” “N,” “G,” “F,” “R,” “A,” “C,” “Y,” and “Z”) were identified with frequencies ranging from 23% to 2%. We constructed an mtDNA tree combining 11 ancients and 99 present-day EA_a/b and Siberian (Eₛᵢ and Wₛᵢ) mtDNAs (fig. 4C). The 11 ancients included in this tree had relatively high sequencing depth (supplementary table S9, Supplementary Material online). Similar to the global human mtDNA phylogeny, our mtDNA tree shows two major clades, M’ and R’, dominantly distributed in EA populations (Soares et al. 2009). It also shows two mtDNA dispersions ~40 and 20 ka, which account for 62% and 38% of the present-day Koreans, respectively. The earlier dispersed mtDNAs included “N/Y/A,” “D,” and “B/R,” which were found in 16%, 34%, and 12% of Koreans, respectively. The “N/Y/A” and “D” mtDNA haplotypes were clades coclustered with present-day Siberians as well as the Devil’s Gate ancients, representing Eurasian ancestry. The “A” haplogroup was also frequently observed in the early and middle Bronze Age Okunevo peoples (Lipson et al. 2018), who were culturally associated with bakarasuk (Lipson et al. 2018). We also identified ancient mtDNA “R” diverging into “B/R,” accounting for 12% of Koreans, that also expanded ~40 ka. The root of this clade was Tianyuan, and it also coclustered with Vat Komnou ancients and present-day Chinese, representing EA ancestry. This could explain the genetic influence of the Tianyuan on Korean genomes via ancSEA. These old mtDNA waves correspond to human migration in the late Pleistocene, when the Yellow Sea of Korea was land and, therefore, the west coast of Korea was connected to the mainland of China.

Y-염색체 분포와는 대조적으로, mtDNA 하플로타입은 더 복잡한 유전적 역사를 반영한다(그림 4B). 가장 빈번한 mtDNA 하플로타입은 “D” (34%)였으며, 10개의 추가 mtDNA 하플로그룹(“M”, “B”, “N”, “G”, “F”, “R”, “A”, “C”, “Y”, “Z”)이 23%에서 2%에 이르는 빈도로 확인되었다. 우리는 11명의 고대인과 99명의 현대 EA_a/b 및 시베리아인(Eₛᵢ 및 Wₛᵢ)의 mtDNA를 결합하여 mtDNA 트리를 구축했다(그림 4C). 이 트리에는 비교적 높은 시퀀싱 깊이(sequencing depth)를 가진 11명의 고대인을 포함시켰다(온라인 보충 자료, 보충 표 S9). 전 세계 인간-mtDNA 계통과 유사하게, 우리의 mtDNA 트리도 동아시아(EA) 집단에 우세하게 분포하는 두 개의 주요 분기군인 M’과 R’을 보여준다(Soares et al. 2009). 또한 이는 약 4만 년 전(ka)과 2만 년 전에 두 차례의 mtDNA 확산이 있었음을 보여주며, 이는 각각 현존 한국인의 62%와 38%를 차지한다. 더 일찍 확산된 mtDNA에는 “N/Y/A”, “D”, “B/R”이 포함되며, 이들은 각각 한국인의 16%, 34%, 12%에 분포했다. “N/Y/A”와 “D”의 mtDNA 하플로타입은 현대 시베리아인(Siberians) 및 데블스 게이트(Devil’s Gate) 고대인과 함께 클러스터를 이루는 분기군으로, 유라시아(Eurasian) 조상 계통을 나타낸다. “A” 하플로그룹은 초기 및 중기 청동기(Bronze Age) 시대 오쿠네보(Okunevo)인들에게서도 자주 관찰되었는데(Lipson et al. 2018), 이들은 문화적으로 바카라숙(bakarasuk)과 연관되어 있었다(Lipson et al. 2018). 우리는 또한 약 4만 년 전에 확장되었으며 한국인의 12%를 차지하는, “B/R”로 분기된 고대 mtDNA “R”을 확인했다. 이 분기군의 뿌리는 전원(田園)인(Tianyuan)이었으며, 밧콤노우(Vat Komnou) 고대인 및 현대 중국인(Chinese)과도 함께 클러스터를 이루어 동아시아(EA) 조상 계통을 나타냈다. 이는 고대 동남아시아인(ancSEA)을 통해 전원(田園)인(Tianyuan)이 한국인(Korean) 유전체에 미친 유전적 영향을 설명할 수 있다. 이러한 오래된 mtDNA 물결은 한국의 황해(黃海)가 육지였던 플라이스토세(Pleistocene) 후기, 즉 한국의 서해안(西海岸)이 중국 본토와 연결되었을 때의 인류 이주를 설명한다.

The later dispersed mtDNA haplogroups consisted of “G/C/Z,” “M,” and “F” which account for 19%, 12%, and 7% of Koreans, respectively. The “G/C/Z” clades coclustered with Siberians and Bronze Age Nui Nap in Vietnam. However, the genetic origin of the Nui Nap is still unknown. On the other hand, the mtDNA haplogroup “C” is frequently observed from the early and middle Bronze Age Okunevo peoples who lived in central steppe regions (Lipson et al. 2018). The mtDNA topology and haplotype frequency in Okunevo imply a genetic association between Nui Nap and central steppe ancients. Both of the “M” and “F” clades showed subsequent diversification from ancient mtDNA haplogroups of ancM (M’) ~20 ka and ancR (R’) divergent in 60 ka, respectively. These clades explain southern waves of human migration by coclustering with EA_b populations. In particular, two ancients of Austroasiatic speakers, Man Bac and Ban Chiang, coclustered in the mtDNA “M” lineage (fig. 3C). It suggests that a subsequent expansion of this clade can be associated with the expansion of the Austroasiatic speaking population (Lipson et al. 2018). Haplotype analysis and the phylogenetic tree of the mtDNA support a continuous genetic influence from the north and south into Korea.

더 늦게 확산된 mtDNA 하플로그룹은 “G/C/Z”, “M”, “F”로 구성되며, 각각 한국인의 19%, 12%, 7%를 차지한다. “G/C/Z” 분기군은 시베리아인(Siberians) 및 베트남(Vietnam)의 청동기(Bronze Age) 시대 누이납(Nui Nap)인과 함께 클러스터를 이루었다. 그러나 누이납(Nui Nap)인의 유전적 기원은 아직 알려지지 않았다. 반면에, mtDNA 하플로그룹 “C”는 중앙 스텝(central steppe) 지역에 살았던 초기 및 중기 청동기(Bronze Age) 시대 오쿠네보(Okunevo)인들에게서 자주 관찰된다(Lipson et al. 2018). 오쿠네보(Okunevo)인의 mtDNA 계통(topology) 및 하플로타입 빈도는 누이납(Nui Nap)인과 중앙 스텝 고대인 간의 유전적 연관성을 암시한다. “M”과 “F” 분기군은 모두 각각 약 2만 년 전(ka)의 고대 mtDNA 하플로그룹 ancM (M’)과 6만 년 전에 분기된 ancR (R’)로부터 후속적으로 다양화되었음을 보여주었다. 이 분기군들은 EA_b 집단과 함께 클러스터를 이룸으로써 남쪽으로부터의 인류 이주 물결을 설명한다. 특히, 오스트로아시아(Austroasiatic)어족 화자인 두 고대인, 즉 만박(Man Bac)인과 반치앙(Ban Chiang)인은 mtDNA “M” 계통 내에서 함께 클러스터를 이루었다(그림 3C). 이는 이 분기군의 후속 확산이 오스트로아시아(Austroasiatic)어족 사용 인구의 확장과 연관될 수 있음을 시사한다(Lipson et al. 2018). 하플로타입 분석과 mtDNA의 계통수(phylogenetic tree)는 북쪽과 남쪽으로부터 한국으로 지속적인 유전적 영향이 있었음을 뒷받침한다.
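As a quick consistency check on the figures quoted above, the clade frequencies of the two mtDNA dispersal waves can be summed and compared with the stated 62% and 38% shares. The percentages are the ones given in the text, and the dictionary layout is only an illustrative choice.

```python
# Clade frequencies (percent of Koreans) as quoted in the text.
early_wave = {"N/Y/A": 16, "D": 34, "B/R": 12}  # dispersal ~40 ka
late_wave = {"G/C/Z": 19, "M": 12, "F": 7}      # dispersal ~20 ka

early_total = sum(early_wave.values())
late_total = sum(late_wave.values())

print(f"early wave: {early_total}%")
print(f"late wave:  {late_total}%")
print(f"combined:   {early_total + late_total}%")
```

The clade-level percentages add up to the wave-level shares reported in the text, so the two sets of numbers are mutually consistent.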

Figure 4. Haplotype distribution of the Korean population

(A) Y-chromosome haplotypes of 55 Korean males.

(B) mtDNA haplotypes of 88 Koreans.

(C) Phylogenetic tree of mtDNA haplotypes constructed with the neighbor-joining method with bootstrap i = 1,000. The dominant mtDNA haplogroup clusters are shown to the right of the tree. Ancient haplogroups are denoted M’ and R’. P, Pleistocene; H, Holocene.

한국인의 혼합 시기 추정 Admixture Time Estimation for Koreans

We estimated the admixture time of Koreans using 286,222 SNPs and obtained significant prediction results with only three reference populations: Yakut, Han, and Japanese (table 2). The estimated admixture times were 5,482, 3,583, and 2,827 YA when we used the Koreans themselves as one reference and the Yakut, Han, and Japanese, respectively, as the other reference population. Our estimated admixture time with the Japanese (97 generations ago) is slightly earlier than the admixture date of the mainland Japanese (52 generations) estimated by Takeuchi et al. (2017).

우리는 286,222개의 SNP를 사용하여 한국인의 혼합 시기를 추정했으며, 야쿠트(Yakut)인, 한(Han)족, 일본인(Japanese)의 세 집단을 참조로 했을 때만 유의미한 예측 결과를 얻었다(표 2). 한국인 자체를 하나의 참조 집단으로, 그리고 야쿠트(Yakut)인, 한(Han)족, 일본인(Japanese)을 각각 다른 비교 참조 집단으로 사용했을 때, 추정된 혼합 시기는 각각 5,482년 전(YA), 3,583년 전, 2,827년 전이었다. 일본인(Japanese)과의 혼합 시기로 추정된 값(일본인(Japanese)으로부터 97세대 전)은 타케우치(Takeuchi) 등(2017)이 추정한 일본 본토인의 혼합 시기(52세대 전)보다 약간 이르다.
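The relation between the generation counts and the calendar dates quoted above is simple arithmetic. The Python sketch below derives the years per generation implied by the Japanese-reference estimate (2,827 YA for 97 generations) and, purely as an illustrative assumption, applies the same factor to the 52 generations reported by Takeuchi et al. (2017). The function names are hypothetical, and the generation time is inferred from the two quoted figures rather than stated in the text.

```python
def implied_generation_time(years_ago: float, generations: float) -> float:
    """Years per generation implied by a date expressed in both units."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    return years_ago / generations


def generations_to_years(generations: float, years_per_generation: float) -> float:
    """Convert a generation count to years using a given generation time."""
    return generations * years_per_generation


# Figures quoted in the text for the Japanese reference population.
gen_time = implied_generation_time(2827, 97)

# Assumption: reuse the same generation time for the Takeuchi et al. estimate.
takeuchi_years = generations_to_years(52, gen_time)

print(f"implied generation time: {gen_time:.1f} years")
print(f"52 generations: about {takeuchi_years:.0f} years ago")
```

Because the generation count in the text is rounded, the implied generation time is only approximate; it serves to put the two estimates on a common scale.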

Table 2. Admixture time estimates for Koreans.

Admixture times are given in generations before the present. Numbers in parentheses are the 95% confidence intervals in generations and years.
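Admixture dates in generations, such as those summarized for table 2, are obtained by ALDER (Loh et al. 2013) from the decay of admixture-induced linkage disequilibrium, which falls off approximately as $A e^{-nd}$ with genetic distance $d$ (in Morgans) after $n$ generations. The sketch below is a simplified illustration of that idea and not the ALDER procedure itself: it assumes noise-free synthetic data, omits the affine term and the jackknife that ALDER uses, and recovers $n$ by an ordinary least-squares fit on the log scale. All names and values are hypothetical.

```python
import math


def fit_decay_rate(distances, ld_values):
    """Least-squares slope of log(LD) versus distance; returns n = -slope."""
    logs = [math.log(v) for v in ld_values]
    n_pts = len(distances)
    mean_d = sum(distances) / n_pts
    mean_l = sum(logs) / n_pts
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    return -cov / var


# Synthetic curve: amplitude 0.01, admixture 97 generations ago (assumed values).
true_generations = 97
distances = [0.005 * k for k in range(1, 41)]  # 0.5 cM to 20 cM, in Morgans
ld_curve = [0.01 * math.exp(-true_generations * d) for d in distances]

estimated = fit_decay_rate(distances, ld_curve)
print(f"estimated admixture time: {estimated:.1f} generations")
```

On real data the curve is noisy, so a nonlinear fit with standard errors (as ALDER reports) is needed rather than this log-linear shortcut.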

Figure 5 summarizes our model of the genetic influence on Koreans, from the pre-Neolithic Tianyuan to the Iron Age Vat Komnou. The model supports the gene flows described above and suggests that Koreans carry prehistoric genetic components derived from the Devil’s Gate and Man Bac groups, both of which diverged from the Tianyuan ancestry. The Neolithic Man Bac genome predominantly inherited the genetic components of Tianyuan, and these components are widely distributed in EA. However, the Bronze and Iron Age ancients, such as Oakaie, Nui Nap, and Vat Komnou, seem to have markedly altered genetic components of EA_b genomes (70%). This is consistent with the EA_b ancestry frequency in contemporary Koreans. Overall, the model describes well the gene flow among the three Northeast Asian populations: Korean, Chinese, and Japanese.

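Figure 5. Admixture tree model describing the historical genetic makeup of Koreans.

We applied qpgraph (Patterson et al. 2012) to an admixture model describing the historical genetic makeup of Koreans and other Asians. To build the model that best explains the gene flows that formed Koreans, we fitted the admixture tree model with ancient genomes associated with EA populations; as a result, the admixture model information for the Eₛᵢ ancestry was simplified. Based on the D-statistics, the f3-statistics, and previous reports (Lipson et al. 2018), we set up a basic skeleton tree (supplementary fig. 10A, Supplementary Material online) and extended the model by adding ancient and present-day individuals (supplementary fig. 10, Supplementary Material online). The average admixture times of Koreans estimated by ALDER (table 2) are shown next to the red circles. Black circles denote ghost genomes within the ancestral genetic lineages for which there is no time-calibration evidence; new groups can be added as more ancient populations are discovered and sequenced. Black lines denote gene flows, and dotted lines denote admixture events, labeled with the proportions estimated by the qpgraph analysis.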

Conclusion

We analyzed the haplotype distributions of 88 Koreans in comparison with ancient and modern whole genomes and propose two major haplotype expansion events. A comprehensive genome comparison confirmed that Koreans possess dual ancestral genetic components originating broadly from East Siberia (Eₛᵢ) and East Asia (EA_b). Ancient genome comparisons revealed that the genetic makeup of Koreans is best described as an admixture of the Neolithic Devil’s Gate genome in Russia and the Iron Age Vat Komnou genome in Southeast Asia.

Our analyses of ancient and present-day populations suggest a long and gradual admixture model of two Neolithic founders: the Devil’s Gate founder in Russia and the founder from Tianyuan Cave in China. These two major components were admixing throughout East Siberia and East Asia for an extended time, up until the Neolithic period.

The subpopulations of present-day East Asians, including modern Koreans, were probably established by a later regional genetic transition during the Bronze Age. The peopling of Korea is most likely part of the large population expansion and subsequent admixture events that occurred in East Asia, rather than a unique, isolated event or migration.

We think that this kind of recent, rapid expansion and admixture could serve as a general model for other East Asian and Southeast Asian populations, in which Bronze and Iron Age populations expanded and admixed with populations of the peripheral regions.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.

Acknowledgments

The authors acknowledge the research grant provided by the Hanmaeum Peace Foundation and Mr Nam, Seungwoo. This work was supported by the Technology Innovation Program (20003641, Development and Dissemination on National Standard Reference Data) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea). It was also supported by the U-K BRAND Research Fund (1.190007.01) of UNIST (Ulsan National Institute of Science & Technology) and by the Research Project Funded by Ulsan City Research Fund (1.190033.01 and 1.200047.01) of UNIST. We thank Prof. Dawn Field and Jaesu Bhak for editing the article. We also thank Prof. Andrea Manica for advice on the admixture time analysis. J.B. is the CEO of Clinomics Inc. and has an equity interest in the company.

Author Contributions

J.K., S.J., and J.B. designed the study. S.J., J.B., J.O., K.T., S.S., S.F., and F.A. collected genomic data. J.K., S.J., J.-P.C., A.B., Y.J., and J.-J.K. performed the bioinformatics analysis. J.K., S.J., and J.B. interpreted the data and drafted the article. All authors edited and approved the final version of the article.

References

Alexander DH, Novembre J, Lange K. 2009. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19(9):1655–1664.

Allentoft ME, et al. 2015. Population genomics of Bronze Age Eurasia. Nature 522(7555):167–172.

Bae CJ, Bae K. 2012. The nature of the early to late Paleolithic transition in Korea: current perspectives. Quat Int. 281:26–35.

Bonatto SL, Salzano FM. 1997. Diversity and age of the four major mtDNA haplogroups, and their implications for the peopling of the New World. Am J Hum Genet. 61(6):1413–1423.

Chiaroni J, Underhill PA, Cavalli-Sforza LL. 2009. Y chromosome diversity, human expansion, drift, and cultural evolution. Proc Natl Acad Sci U S A. 106(48):20174–20179.

HUGO Pan-Asian SNP Consortium, et al. 2009. Mapping human genetic diversity in Asia. Science 326:1541–1545.

Damgaard PB, et al. 2018. 137 Ancient human genomes from across the Eurasian steppes. Nature 557(7705):369–374.

de Barros Damgaard P, et al. 2018. The first horse herders and the impact of early Bronze Age steppe expansions into Asia. Science 360(6396):eaar7711.

Haak W, et al. 2015. Massive migration from the steppe was a source for Indo-European languages in Europe. Nature 522(7555):207–211.

Jin H-J, Tyler-Smith C, Kim W. 2009. The peopling of Korea revealed by analyses of mitochondrial DNA and Y-chromosomal markers. PLoS One 4(1):e4210.

Jostins L, et al. 2014. YFitter: maximum likelihood assignment of Y chromosome haplogroups from low-coverage sequence data. arXiv:1407.7988.

Karmin M, et al. 2015. A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Res. 25(4):459–466.

Kim J, et al. 2018. KoVariome: Korean National Standard Reference Variome database of whole genomes with comprehensive SNV, indel, CNV, and SV analyses. Sci Rep. 8(1):5677.

Kim S-H, et al. 2011. High frequencies of Y-chromosome haplogroup O2b-SRY465 lineages in Korea: a genetic perspective on the peopling of Korea. Invest Genet. 2(1):10.

Kim YJ, Jin HJ. 2013. Dissecting the genetic structure of Korean population using genome-wide SNP arrays. Genes Genomics. 35(3):355–363.

Kloss-Brandstatter A, et al. 2011. HaploGrep: a fast and reliable algorithm for automatic classification of mitochondrial DNA haplogroups. Hum Mutat. 32:25–32.

Lawson DJ, Hellenthal G, Myers S, Falush D. 2012. Inference of population structure using dense haplotype data. PLoS Genet. 8(1):e1002453.

Lazaridis I, et al. 2014. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513(7518):409–413.

Li H, Durbin R. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760.

Lipson M, et al. 2018. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science 361(6397):92–95.

Liu X, et al. 2017. Characterising private and shared signatures of positive selection in 37 Asian populations. Eur J Hum Genet. 25(4):499–508.

Loh PR, et al. 2013. Inferring admixture histories of human populations using linkage disequilibrium. Genetics 193(4):1233–1254.

McColl H, et al. 2018. The prehistoric peopling of Southeast Asia. Science 361(6397):88–92.

McKenna A, et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20(9):1297–1303.

Norton CJ. 2000. The current state of Korean paleoanthropology. J Hum Evol. 38(6):803–825.

Park YC. 1992. Chronology of palaeolithic sites and its cultural transition in Korea. J Korean Archaeol Soc. 28:5–130.

Patel RK, Jain M. 2012. NGS QC Toolkit: a toolkit for quality control of next generation sequencing data. PLoS One 7(2):e30619.

Patterson N, et al. 2012. Ancient admixture in human history. Genetics 192(3):1065–1093.

Patterson N, Price AL, Reich D. 2006. Population structure and eigenanalysis. PLoS Genet. 2(12):e190.

Purcell S, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575.

Siska V, et al. 2017. Genome-wide data from two early Neolithic East Asian individuals dating to 7700 years ago. Sci Adv. 3(2):e1601877.

Soares P, et al. 2009. Correcting for purifying selection: an improved human mitochondrial molecular clock. Am J Hum Genet. 84(6):740–759.

Takeuchi F, et al. 2017. The fine-scale genetic structure and evolution of the Japanese population. PLoS One 12(11):e0185487.

1000 Genomes Project Consortium, et al. 2015. A global reference for human genetic variation. Nature 526:68.

Wang Y, Lu D, Chung YJ, Xu S. 2018. Genetic structure, divergence and admixture of Han Chinese, Japanese and Korean populations. Hereditas 155(1):19.

Yang MA, et al. 2017. 40,000-year-old individual from Asia provides insight into early population structure in Eurasia. Curr Biol. 27(20):3202–3208.e9.

Associate editor: Naruya Saitou
