VARIATION OF Y-CHROMOSOMAL STRS IN YEZIDI AND CHALDEAN POPULATIONS IN IRAQI KURDISTAN

Background: Many ethnic groups live in the northern part of Iraq, which constitutes the Iraqi part of Kurdistan. Short tandem repeats (STRs) are widely used in population genetics and forensic science. Objective: This research aims to analyze the Y-chromosomal STR markers of two ethnic groups living in Iraqi Kurdistan, the Yezidi and the Chaldean. Methodology: Peripheral blood samples were taken from a total of 44 unrelated males (22 from each ethnic group). DNA was extracted using a DNA extraction kit and analyzed for eight Y-chromosomal short tandem repeats (Y-GATA-H4, Y-GATA-C4, DYS458, DYS456, DYS448, DYS437, DYS392, and DYS19). The PCR products were then run on a 10% polyacrylamide gel and stained with silver nitrate. The results were analyzed with PowerMarker V3.25, and the dendrogram was created with MEGA X software. Results: In the Yezidi group, the highest diversity was observed at Y-GATA-C4 (GD: 0.81) and the lowest at DYS456 (GD: 0.64). In the Chaldean group, DYS458 (GD: 0.88) was the most diverse marker, while the least diverse was Y-GATA-H4 (GD: 0.66). Y-GATA-C4 was the most informative marker when both groups were considered together, with a PIC value of 0.8605. Conclusions: The study confirmed the high discrimination ability of Y-chromosomal STR analysis and provided a dataset on these two ethnic groups of Iraqi Kurdistan. The dendrogram of the Yezidi and Chaldean datasets reveals that the Yezidi individuals are more closely related to each other than the Chaldean individuals are, because intermarriage is more common among Yezidi people than among Chaldean individuals.


INTRODUCTION
Kurdistan is a geographic region in Western Asia that includes parts of Iran, Turkey, Syria, and Iraq (Rodziewicz, 2018). The religion of the Yezidi people, known as Yazidism, is monotheistic and has roots in the pre-Zoroastrian religion of Iran (Foltz, 2017). The Chaldeans are an Aramaic-speaking, Eastern Rite Catholic ethnic group. In Mesopotamia, the cradle of civilization, they have a history dating back more than 5,500 years.

A. Ethical Committee Approval
Ethical approval was granted by the ethical committee at the Duhok province Ministry of Health (Reference number: 21082022-6-9). Informed consent was obtained from each volunteer, genealogical information was documented, and each volunteer confirmed that his father, grandfather, and great-grandfather belonged to the Yezidi or Chaldean ethnic group.

B. Methods
A total of 44 blood samples were collected from unrelated males of two ethnic groups living in Iraqi Kurdistan, the Yezidi and the Chaldean. Genealogical information about the donors was also collected. DNA was extracted from the whole blood samples using a DNA extraction kit according to the supplier's instructions (Dongsheng Biotech Company, China, CAT No. NH 1121). Primers for eight Y-chromosomal STRs were used, namely: Y-GATA-H4, DYS437, DYS392, DYS458, DYS448, DYS456, Y-GATA-C4, and DYS19.
The PCR program was as follows: initial denaturation at 94°C for 5 min (one cycle); 34 cycles of denaturation at 94°C for 60 sec, annealing at the primer-specific temperature (Table 1) for 35 sec, and extension at 72°C for 1 min; and a final extension at 72°C for 6 min (one cycle). The amplified products, along with a 20 bp DNA ladder, were run on a 10% polyacrylamide gel for band sizing, and the bands were stained with silver nitrate for visualization.
Statistical data analysis: The data were analyzed using PowerMarker V3.25 software, and MEGA X was used to construct the phylogenetic tree. The genetic relationship parameters were calculated according to the statistics of Reynolds et al. (1983). The similarity matrix was used to construct the dendrogram with the unweighted pair group method with arithmetic averages (UPGMA) procedure (Sokal, 1958).
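The UPGMA procedure mentioned above can be illustrated with a short Python sketch. This is not the MEGA X implementation; it is a minimal version that repeatedly merges the two closest clusters and averages distances weighted by cluster size. It works on distances rather than similarities, and the four sample labels and all pairwise distances are hypothetical values chosen for illustration only.

```python
from itertools import combinations


def upgma(labels, dist):
    """Cluster items with UPGMA.

    labels: list of item names.
    dist:   dict mapping frozenset({a, b}) -> pairwise distance.
    Returns (tree, merges): a nested-tuple tree and a list of
    (cluster_name, height) in the order the merges happened.
    """
    # Each active cluster name maps to (subtree, number of members).
    clusters = {name: (name, 1) for name in labels}
    d = dict(dist)
    merges = []
    while len(clusters) > 1:
        # Find the closest pair among the active clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({a},{b})"
        # Size-weighted average equals the mean over all original pairs.
        for c in clusters:
            d[frozenset((new, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[new] = ((tree_a, tree_b), n_a + n_b)
        merges.append((new, height))
    tree, _ = next(iter(clusters.values()))
    return tree, merges


# Hypothetical distances between two Yezidi (Y) and two Chaldean (C) samples.
pairs = {
    ("Y1", "Y2"): 0.2,
    ("C1", "C2"): 0.6,
    ("Y1", "C1"): 0.8,
    ("Y2", "C1"): 0.8,
    ("Y1", "C2"): 1.0,
    ("Y2", "C2"): 1.0,
}
dist = {frozenset(k): v for k, v in pairs.items()}
tree, merges = upgma(["Y1", "Y2", "C1", "C2"], dist)
print(tree)
for name, height in merges:
    print(name, round(height, 3))
```

With these made-up distances the two Y samples join first, the two C samples join next, and the two pairs join last, which is the kind of grouping a dendrogram displays.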

RESULTS
Molecular parameters such as mean allele number, gene diversity, allele frequency, genetic distance, and polymorphic information content were analyzed using PowerMarker V3.25 software. A total of 88 alleles were identified in the two populations, with allele sizes ranging from 111 to 315 bp (Table 2 and fig. 1). In the Yezidi population, the number of alleles per locus ranged from 3 at DYS456 to 6 at DYS448, DYS458, Y-GATA-C4, and Y-GATA-H4, with an average of 4.8750 alleles per locus. Allele frequency ranged from 0.2727 in Y-GATA-C4 to 0.4545 in DYS456, with a mean of 0.3597. Gene diversity ranged from 0.6405 in DYS456 to 0.8140 in Y-GATA-C4, with a mean of 0.7384, indicating a high level of diversity (Table 3).
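Gene diversity at a locus is commonly estimated as one minus the sum of squared allele frequencies. The Python sketch below shows that calculation. The exact estimator applied by PowerMarker is not stated in the text, so the uncorrected form used here is an assumption. The allele labels and counts (10, 7, and 5 out of 22 samples) are also hypothetical: they were chosen only to be consistent with the figures reported for DYS456 in the Yezidi group (3 alleles, allele frequency 0.4545, assumed to refer to the most common allele) and are not the study's data.

```python
from collections import Counter


def gene_diversity(alleles):
    """Return 1 - sum(p_i^2) over the alleles observed at one locus."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical allele calls for 22 males at a three-allele locus.
calls = ["A"] * 10 + ["B"] * 7 + ["C"] * 5
gd = gene_diversity(calls)
print(round(gd, 4))  # 0.6405
```

Under these assumed counts the result rounds to 0.6405, the gene diversity reported for DYS456 in the Yezidi group.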
In the Chaldean group, the number of alleles per locus ranged from 4 at DYS437 and Y-GATA-H4 to 11 at DYS458, with a mean of 6.1250. Allele frequency ranged from 0.1818 in DYS458 to 0.4545 in Y-GATA-H4, with a mean of 0.3068. Gene diversity ranged from 0.6653 in Y-GATA-H4 to 0.8884 in DYS458, with a mean of 0.7748, which is higher than in the Yezidi population (Table 4). The availability value (alleles observed per sampled individual) was calculated for accurate data analysis. It was higher in the Chaldean group, with an average of 1.00, than in the Yezidi population, where it averaged 0.8807 because of null alleles at some loci in some Yezidi samples (Tables 3 and 4).
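The availability value described above can be read as the share of sampled individuals with a scored allele at a locus. A minimal Python sketch under that reading follows. The genotype calls are hypothetical: `None` marks a sample with no amplified product (a null allele), and the counts do not come from the study.

```python
def availability(calls):
    """Fraction of sampled individuals with a non-missing call at a locus."""
    typed = sum(1 for call in calls if call is not None)
    return typed / len(calls)


# Hypothetical locus: 22 males sampled, 3 of them without a scored allele.
locus_calls = ["allele"] * 19 + [None] * 3
value = availability(locus_calls)
print(round(value, 4))  # 0.8636
```

A locus scored in every sample gives 1.0, which is the average reported for the Chaldean group.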
The polymorphic information content (PIC) was also calculated for the eight markers in both populations (Tables 3 and 4). The values ranged from 0.5669 for the least informative marker, DYS456 in the Yezidi population, to 0.8781 for the most informative marker, DYS458 in the Chaldean population.
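PIC is usually computed with the formula of Botstein et al. (1980): one minus the sum of squared allele frequencies, minus the sum over all allele pairs of twice the product of their squared frequencies. The Python sketch below applies it. The frequencies (10/22, 7/22, 5/22) are hypothetical, chosen only to be consistent with the three alleles and the 0.4545 allele frequency reported for DYS456 in the Yezidi group; they are not the study's data.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphic information content for one locus (Botstein et al., 1980)."""
    squares = [p * p for p in freqs]
    # Sum over unordered allele pairs of 2 * p_i^2 * p_j^2.
    cross = sum(2 * a * b for a, b in combinations(squares, 2))
    return 1.0 - sum(squares) - cross


# Hypothetical allele frequencies at a three-allele locus (22 samples).
freqs = [10 / 22, 7 / 22, 5 / 22]
value = pic(freqs)
print(round(value, 4))  # 0.5669
```

With these assumed frequencies the result rounds to 0.5669, the PIC reported for DYS456 in the Yezidi population.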
According to Table 5, in both populations together, the number of alleles ranged from 4 at DYS437 to 11 at DYS458, with a mean of 7.2500 alleles per locus. Allele frequency ranged from 0.1818 in Y-GATA-C4 to 0.3659 in DYS448, with a mean of 0.2726. Gene diversity ranged from 0.7139 in DYS437 to 0.8740 in Y-GATA-C4, with a mean of 0.8035. Availability ranged from 0.7727 in DYS392 to 1.000 in DYS456, Y-GATA-C4, and Y-GATA-H4. PIC values ranged from 0.7165 in Y-GATA-H4 to 0.8605 in Y-GATA-C4.
Phylogenetic analysis produced a dendrogram that separated the populations into three main clusters, with the Yezidi in one cluster and the Chaldean in two other clusters, except for a few individuals from both populations who were admixed into another cluster or sub-cluster (fig. 2).

DISCUSSION
In this study, eight human Y-STR loci were used to determine the genetic variation and allele frequencies of the Yezidi and Chaldean populations in Duhok province. A total of 88 alleles were identified, with sizes ranging from 111 to 315 bp (Table 2). These results are in agreement with those reported previously for Iraqi Arab families living in the middle Euphrates, in which the PCR product size ranged from 93 to 125 bp at the DYS392 locus and from 176 to 212 bp at DYS19 (Naji and Al Saadi, 2020). The mean number of alleles per locus scored in this study (Yezidi 4.8750, Chaldean 6.1250) was lower than the average of 9 alleles per locus published in the NIST fact sheet, USA (NIST, 2017).
The high number of alleles within each population indicates a high level of genetic diversity. Fattah et al. (2019) reported that the average number of alleles in the Kurd population was 5.125.
The allele frequencies in the two groups, Yezidi and Chaldean, were not similar to each other. A study by Ohied and Al Badran of the Basrah population, which used many of the same primers, showed high allele frequencies at all studied loci (Ohied and Al Badran, 2022). Imad et al. (2013) used all eight primers of this study in populations from the middle and south of Iraq. In that study, allele frequencies at all loci were higher than those found here.
The data in Tables 3 and 4 indicate that mean gene diversity was higher in the Chaldean population (0.7748) than in the Yezidi population (0.7384). Both Imad et al. (2013) and Naji and Al Saadi (2020) reported much lower gene diversity than that found in this study. In Northern Greece, a genetic diversity value of 0.9992 was reported for 17 Y-STR loci, five of which were also used in this study (Leda et al., 2008). These differences in genetic diversity among populations may be attributed to gene flow and migration at different times in history.
An important factor determining whether a genetic marker is informative is its polymorphism information content (PIC) value. Markers with PIC values greater than 0.5 are considered highly informative (Botstein et al., 1980).
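The PIC threshold can be applied mechanically, as in the Python sketch below. Only the 0.5 cut-off is stated in this text; the two lower categories follow the scheme commonly attributed to Botstein et al. (1980) and are included here as an assumption. The PIC values are the ones reported in the Results, and the function name is hypothetical.

```python
def classify_pic(value):
    """Label a marker by its PIC value."""
    if value > 0.5:
        return "highly informative"
    if value > 0.25:
        # Category not stated in the text; assumed from the common scheme.
        return "reasonably informative"
    return "slightly informative"


# PIC values reported in the Results section.
reported = {
    "DYS456 (Yezidi)": 0.5669,
    "DYS458 (Chaldean)": 0.8781,
    "Y-GATA-H4 (combined)": 0.7165,
    "Y-GATA-C4 (combined)": 0.8605,
}
labels = {marker: classify_pic(value) for marker, value in reported.items()}
for marker, label in labels.items():
    print(marker, "->", label)
```

All four reported values fall above 0.5 and are therefore labeled highly informative.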
In this study, the PIC values ranged from 0.5669 at the DYS456 locus, with 3 alleles, in the Yezidi Kurd population to 0.8781 at the DYS458 locus, with 11 alleles, in the Chaldean population. All the primers used in this study can therefore be considered informative. These results are in agreement with those of Fattah et al. (2019), who reported high PIC values. Naji and Al Saadi (2020) found that the DYS19 and DYS392 primers were the most polymorphic compared to other primers. Y-GATA-C4 was the most informative marker for both populations collectively, with a PIC value of 0.8605.
To evaluate the genetic differentiation and the distance between the populations, a phylogenetic tree was constructed. The phylogenetic tree (fig. 2) separated the populations into two major clusters. The first cluster was subdivided into two subclusters, one Yezidi and one Chaldean, while the other main cluster contained most of the Chaldean individuals. A few individuals from each population clustered in a clade of the other population. The Yezidi population showed smaller genetic distances than the Chaldean population because of intermarriage among Yezidi individuals.
The admixture of a few individuals from one population into the other can be attributed to their long shared history of living together for thousands of years. Wars, genocides, immigration, and gene flow have also played a role in admixing some individuals between clusters. Another explanation is that an unknown number of males share the same Y-STR profile (de Knijff, 2022). Tömöry et al. (2007) found little genetic separation among Hungarian-speaking communities in the Carpathian Basin, where the Hungarian gene pool was affected by gene flow from neighboring populations and by migration. Gene flow may therefore be one of the reasons for the admixture between these two populations.

CONCLUSION
All investigated loci showed high discriminating power, indicating that a DNA-based database can be created using these loci. The highest gene diversity was seen at Y-GATA-C4 (GD: 0.81) while the lowest diversity was observed at DYS456 (GD: 0.64) in the Yezidi group. In the Chaldean group, DYS458 (GD: 0.88) was the most diverse marker, while the least diverse marker was