Article Content
Introduction
Flowering in angiosperms is referred to as the transition from vegetative to reproductive growth, which ensures the success of propagation and crop yield1. This process is strictly regulated by various endogenous and environmental cues, among which day length (photoperiod) change is one of the critical factors to induce flowering at a favorable time. Based on the response to photoperiod, plants can be classified into three types: long-day, short-day, and day-neutral2. The general process of photoperiod-induced flowering is conserved between long-day and short-day plants. First, the photoreceptor in leaves tracks the photoperiod and integrates the information into the circadian clock to generate a flowering signal called florigen, which then transports to the shoot apex and activates a set of floral identity genes, thus triggering flowering3,4,5,6. Aside from those, there are multiple additional layers of post-transcriptional modifications that regulate flowering time in a photoperiod-dependent manner6,7,8,9.
Soybean (Glycine max) is one of the most important crops in the world. It contains high protein and oil contents, as well as a potent nitrogen fixation ability. As a typical short-day crop, soybean flowering is strictly regulated by photoperiod, which has a great impact on the adaptation of elite cultivars to a broad range of latitudes10. Current evidence supports the idea that cultivated soybean has been domesticated and cultivated for more than 5000 years from the temperate regions in China11, whereas artificial selection in combination with natural variation of critical quantitative trait loci (QTLs) controlling flowering and maturity led to the adaption of soybeans to the regions ranging from 54˚ N to 35˚S around the world12. Among those major QTLs, several of them that contribute to the flowering and maturity have been identified, which include E1 to E11, J, LUXs (LUX ARRHYTHMO), and Tofs (Time of flowering)3. E1, E3, E4, Tof4, Tof5, Tof8, Tof11, Tof12, and Tof13 contribute to the adaptation of soybeans to high-latitudes13,14,15,16,17,18,19; while J, LUXs, Tof16, and Tof18 facilitate their adaptation to low-latitudes20,21,22,23,24,25. In particular, E1 is a leguminous-specific transcription factor and was thought to be a repressor of photoperiodic responses in soybeans26. The long-juvenile gene J is the ortholog of EARLY FLOWERING 3 (ELF3), which forms the circadian clock evening complex (EC) together with ELF4 and LUX to suppress the transcription of E120,22,24,25. FLOWERING LOCUS T 2a (FT2a) and FT5a are two key FT genes whose expressions are repressed by E1 family members27. E3 and E4 are the homologs of photoreceptor phytochrome A (phyA), which can physically interact with EC or E1 to modulate their protein stability20. In a word, the phyA-EC-E1-FT pathway is primarily responsible for the photoperiodic flowering of soybeans and shows a profound impact on their adaptation ability. On the other hand, epigenetic regulators for flowering in soybeans remain largely unknown when compared to the well-studied epigenetic regulation mechanisms in the model plant Arabidopsis thaliana.
Epigenetic regulations refer to the regulation of gene transcription without altering the DNA sequence, which includes DNA methylation, histone modification, histone variants, and noncoding RNAs28. Histone modification refers to the histone methylation and demethylation at various lysine residues of the histone tail. Histone methylation is regulated by a set of protein methyltransferases; while histone demethylation is catalyzed by histone demethylases29. Human lysine-specific demethylase 1 (HsLSD1) was the first histone demethylase to be identified and demethylates H3K4 mono-methylation (H3K4me1) and H3K4 di-methylation (H3K4me2)30,31,32. In the model plant Arabidopsis, there are four HsLSD1 homologs, including LSD-LIKE1 (LDL1), LDL2, LDL3, and FLOWERING LOCUS D (FLD). All of them have the activity to demethylate H3K4 and participate in regulating the transition to flowering33,34,35,36. In the fld mutant, the increased H3K4me1 at the FLC locus (FLOWERING LOCUS C) results in the upregulation of FLC, which in turn suppresses the flowering33. Similarly, H3K4 methylation of the FLC locus is also regulated by LDL1 and LDL2, which are functionally redundant with FLD. Moreover, LDL1 and LDL2 also regulate the H3K4 methylation at FLOWERING WAGENINGEN (FWA, a floral repressor)34. Unlike the delayed flowering of fld and ldl1/ldl2 mutants, the flowering time of the ldl3 mutant is expedited35, suggesting that they have divergent roles in regulating the flowering. Heading data 18 (Hd18), which is a rice homolog of FLD, induces rice flowering under both short- and long-day conditions37. It was demonstrated in a previous study that overexpression of GmFLD complemented the late-flowering phenotype of the Arabidopsis fld mutant38. GmLDL2 was identified as a homolog of Arabidopsis histone demethylase LDL238,39. However, the function and characteristics of GmLDL2 in soybeans is still largely unknown.
In this study, we perform a forward genetic screen and characterize an early-flowering soybean mutant ef1 (early flowering 1). We further demonstrate that GmLDL2 is the target gene of ef1. Enzymatic activity assays demonstrate that GmLDL2 displays demethylase activity to H3K4me1 and H3K4me2. Furthermore, RNA-seq and chromatin immunoprecipitation (ChIP)-seq analyses reveal that GmLDL2 directly binds to the ortholog of Arabidopsis FERONIA (GmFER) to regulate its transcription by altering the levels of H3K4me1 and H3K4me2 at 3’UTR and downstream regions. Consistently, genetic analyses indicate that GmLDL2 negatively regulates soybean flowering via repressing the expression of GmFER. Finally, haplotype analyses suggest that different alleles of GmLDL2 undergo artificial selection in adaptation to low- or high-latitudes. Together, the study offers a potential target for improving regional adaptability by manipulating molecular design breeding of soybeans.
Results
Screening an early-flowering soybean mutant ef1 and identification of the causal gene
To identify other genes involved in the flowering time control of soybeans, we performed a forward genetic screening of progeny from soybean cultivar Huaxia3 (HX3) mutagenized with 60Co γ-Ray. An early-flowering mutant, designated as ef1 was isolated. The flowering time of ef1 mutant is significantly earlier than HX3 under both short-day and long-day conditions (Fig. 1a, b). In the field experiment, compared with HX3, the maturity time of ef1 mutant is also earlier (Fig. 1c). To verify the earlier flowering phenotype of the ef1 mutant under natural conditions, HX3 and ef1 were planted from January to December in the field and it was found that the flowering time of ef1 is earlier than HX3 for all sowing dates (Fig. 1d). These data demonstrated that the early flowering phenotype of ef1 is very stable. We also measured several yield-related traits of the ef1 mutant, including plant height, node number, valid branch number, pod number, one hundred seed weight, and grain weight per plant. Compared with the wild type, the mutant exhibits a slight reduction in pod number and grain weight per plant (Supplementary Table 1).

a Phenotype of whole plants and flowers of HX3 and ef1 under short-day conditions. ef1 showed early flowering compared with HX3. White arrow, magnified view. Scale bars, 10 cm. b The flowering time of HX3 and ef1 under SD (11 h/13 h) and LD (15 h/9 h) conditions. c The maturity time of HX3 and ef1 in the field conditions of Guangzhou (Summer). d Flowering time of HX3 and ef1 relative to sowing dates throughout the year in field conditions. e Frequency distributions of flowering time in the F2 progeny of a cross between HX3 and ef1 under field conditions. f SNP-index distribution along chromosomes. Red regression lines were obtained by averaging SNP indices from a moving window of five consecutive SNPs and shifting the window 200 kb at a time. The Y-axis value of each average SNP index was set at a midpoint between the first and fifth SNP. g Fine mapping of ef1 using an F2 population from LNHM (a landrace form Guangdong) × ef1 candidate gene. Numbers of recombination refer to the number of heterozygous plants at a physical location. The candidate region was located in an 860 kb region spanned by SNP2 and SNP4, in which only Glyma.06G250100 harbors a deletion according to resequencing. The red square indicates ef1 has a single-nucleotide deletion at this site. ATG is the start codon; TAA is the stop codon; UTR untranslated region, CDS coding sequence. h Comparison of AtLDL2 and GmLDL2. Pink and gray squares represent the SWRIM and amine oxidase domains, respectively. Data represent means ± SD (n = 10 plants) in (b–d). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
To identify the causative gene responsible for the early flowering in ef1, we crossed ef1 with HX3 to generate an F2 population with 119 individual plants. Among these plants, 87 showed the flowering time comparable to wildtype, while 32 exhibited early flowering phenotype (Fig. 1e). The segregation ratio of early and regular flowering time was about 1:3 (χ2 = 0.226, P = 0.63) in the F2 population, indicating that the ef1 phenotype is controlled by a single recessive allele. To map the locus, we re-sequenced the genome of the mutant and wildtype pooled from the F2 segregation population and calculated a SNP index using the bulk-segregant analysis (BSA) method40. Linkage analysis with SNP index distribution indicated that the 0.5 Mb region between 23,068,949 and 42,300,283 on chromosome 6 co-segregated with the early flowering phenotype of ef1 (Fig. 1f and Supplementary Table 2). To further narrow down the candidate interval in chromosome 6, we generated another F2 population with 95 plants by crossing ef1 with another soybean accession, LNHM (Liannanhuimao), which is a later-flowering landrace variety from Guangzhou, China. The flowering time of LNHM is significantly later than ef1 under the long-day conditions (Supplementary Fig. 1a, b), and the F2 progeny derived from LNHM × ef1 cross shows distinct comparable flowering time (Supplementary Fig. 1c). Through linkage analysis, we successfully narrowed down the candidate gene to an 850 kb region in chromosome 6 between maker SNP2 and SNP4 using the variations (SNPs or InDels) between LNHM and ef1 based on the resequencing (Fig. 1g). By comparing resequencing data between HX3 and ef1, we identified one gene Glyma.06G250100, with a 1 bp deletion at the first exon in ef1 (Fig. 1g). Sequence alignment showed that Glyma.06G250100 was previously named GmLDL2 encoding a homolog of Arabidopsis histone demethylase LDL2 (Supplementary Fig. 2)38,39. We therefore designated ef1 as GmLDL2 (named ef1 as Gmldl2-1) in soybean, and the 1 bp deletion resulted in a frameshift that led to a premature stop codon to produce a putative truncated protein lacking an amine oxidase-like domain (Fig. 1h). Taken together, these results provide evidence that the GmLDL2 mutation could be the causal lesion responsible for the early flowering phenotype observed in ef1.
To investigate the expression pattern of GmLDL2 in different tissues, we conducted reverse transcription quantitative PCR (RT-qPCR) to examine the transcription levels of GmLDL2 and found that the expression level of GmLDL2 in leaves is much higher than in other tissues tested (Supplementary Fig. 3a). Furthermore, we monitored the periodicity of GmLDL2 expression at a 4-h interval over a period of 48 h. No diurnal rhythm was observed for GmLDL2 expression either under short-day or long-day conditions (Supplementary Fig. 3b, c). In addition, the expression of GmLDL2 fused with GFP driven by the CaMV 35S promoter showed that the protein is localized in the nucleus (Supplementary Fig. 3d), which is consistent with the previous finding38.
Functional validation of GmLDL2 in controlling soybean flowering
To further verify that the early flowering of ef1 was indeed due to the loss-of-function of GmLDL2, we introduced the coding sequence (CDS) of GmLDL2 driven by the CaMV 35S promoter in the ef1 mutant background. Two independent representative transgenic lines were selected based on qRT-PCR and western blotting assays. Both the GmLDL2 transcript and its protein levels were significantly higher in the two transgenic lines compared to the ef1 mutant (Supplementary Fig. 4a, b). Consistently, the GmLDL2 complementary transgenic lines exhibited later flowering time compared to the ef1 mutant under either short-day or long-day conditions (Fig. 2a, b). Furthermore, we conducted CRISPR/Cas9 to generate two Gmldl2 alleles in the HX3 background. Two small guided RNA target regions (target 1 and 2) in the first exon of GmLDL2 were designed (Fig. 2c), which resulted in two independent GmLDL2 mutation lines (designated as Gmldl2-2 and Gmldl2-3), in which Gmldl2-2 contains a 5 bp deletion at target 1, and Gmldl2-3 contains a 2 bp deletion at target 2 (Fig. 2c). The Gmldl2-2 encodes a predicted protein with 23 amino acids because the 5 bp deletion led to a frameshift and premature stop codon. Gmldl2-3 harbors a truncated protein with 207 amino acids (Fig. 2d). Subsequently, HX3, ef1, Gmldl2-2, and Gmldl2-3 seeds were planted in the growth chamber under short-day and long-day conditions, and their flowering time were compared. Both Gmldl2-2 and Gmldl2-3 plants displayed early flowering phenotypes compared to HX3, similar to ef1 (Fig. 2e, f). We also examined the expression levels of soybean flowering marker genes GmFT2a and GmFT5a27, and found that the expression of both GmFT2a and GmFT5a were upregulated in Gmldl2 mutant compared to HX3 (Fig. 2g, h). Collectively, these data confirm that GmLDL2 is the causal gene for the ef1 mutant.

a The phenotype of whole plants and flowers of HX3, ef1, and two GmLDL2 overexpressed transgenic lines under short-day conditions. The two GmLDL2 complementary lines showed later flowering compared to ef1. White arrow, magnified view. Scale bars, 10 cm. b The flowering time of HX3, ef1, and two GmLDL2 overexpressed transgenic lines under SD (11 h/13 h) and LD (15 h/9 h) conditions. c The design of sgRNAs for CRISPR/Cas9 to knock out GmLDL2. Two sgRNAs targeting the CDS of GmLDL2 are labeled in red. Results of Sanger sequencing of target sites in T1 generations in HX3 and Gmldl2 mutants. sgRNA target and PAM sequences were shown in red and blue, respectively. UTR untranslated region. d Predicted amino acid sequences of GmLDL2 in HX3 and Gmldl2 mutants. Missense amino acids resulting from the frameshift caused by CRISPR/Cas9-induced mutations are shown in red. The asterisk indicates the termination of translation resulting from a stop codon. e The phenotypes of whole plants and flowers of HX3, ef1, Gmldl2-2, and Gmldl2-3 under short-day conditions. ef1, Gmldl2-2, and Gmldl2-3 showed early flowering compared to HX3. White arrow, magnified view. Scale bars, 10 cm. f The flowering time of all varieties examined under SD (11 h/13 h) and LD (15 h/9 h) conditions. g, h The expression levels of GmFT2a and GmFT5a in HX3, ef1, Gmldl2-2, and Gmldl2-3 at ZT4 in SD (11 h/ 13 h) conditions. Data represent means ± SD (n = 10 plants) in (b and f), and data represent means ± SD (n = 3 biological replicates) in (g, h). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
The histone demethylase activity assay of GmLDL2
GmLDL2 is phylogenetically and structurally similar to the human HsLSD1 and Arabidopsis AtLDL2 (Fig. 1h and Supplementary Fig. 2), both of which were demonstrated to exhibit demethylase activity at H3K4me1 and H3K4me232,34. To test whether GmLDL2 also has histone demethylase activity, we measured the enzymatic activity of GmLDL2 in vitro. Similar to human HsLSD1, affinity-purified recombinant protein His-GmLDL2 exhibited demethylation activity for H3K4me1 and H3K4me2, but not for H3K4me3, H3K9me2, H3K27me1, or H3K36me3 (Fig. 3a, b). Furthermore, we transiently expressed 35Spro:GmLDL2-GFP, 35Spro:HsLSD1-GFP, and 35Spro:GFP in tobacco leaves and examined their levels of histone modification. It was found that both human LSD1 and soybean LDL2 displayed significant reduction in H3K4me1 and H3K4me2 signals as compared with that in the empty vector control, whereas di-, mono- and trimethylation levels at H3K9, H3K27, and H3K36 were unaffected compared to 35Spro:GFP expressing cells (Fig. 3c, d, Supplementary Fig. 5). We also examined the histone demethylase activity of GmLDL2 in vivo in HX3 and Gmldl2 mutant plants. Unexpectedly, western blot data showed no significant difference in H3K4me1/2/3, H3K9me2, H3K27me1, and H3K36me3 among HX3 and Gmldl2 mutants (Supplementary Fig. 6), indicating GmLDL2 does not affect the change of global level of H3K4me1/2 in vivo. Therefore, we speculate that mutation of GmLDL2 may only affect the H3K4me1/2 methylation levels at certain loci rather than global modification across the chromosome. Together, these results demonstrated that GmLDL2 acts as a histone demethylase for H3K4me1/2 in soybean.

a The affinity-purified recombinant His-GmLDL2 from E.coli was incubated with calf thymus histone for H3K4me1/2 demethylase activity assay, and the histone methylation status was determined using methylation-specific antibodies. His-HsLSD1 was used as the positive control, and anti-H3 was used as a loading control. The experiment was repeated three times independently with similar results. b Quantification analysis of (a) by ImageJ. Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. c GmLDL2-GFP exhibits H3K4me1/2 demethylase activity in N.benthamiana leaves. GmLDL2-GFP expression in N.benthamiana nuclei (arrows) was visualized by monitoring GFP fluorescence (green). The histone methylation status was analyzed by immunostaining with specific histone methylation antibodies (red). The location of nuclei was visualized using 4’,6-diamidine-2-phenylindole staining (blue). Scale bars, 50 μm. d Statistical analysis of the levels of fluorescent signals in (c) and Supplementary Fig. 5. Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. At least 200 nuclei expressing GFP, HsLSD1-GFP, or GmLDL2-GFP were analyzed in each biological replicate. Source data are provided as a Source Data file.
Transcriptomic analyses of GmLDL2-mediated gene expression
To further investigate gene transcription regulated by GmLDL2, two biological replicates of transcriptome sequencing were performed using leaves from HX3 and ef1. Briefly, clean reads were mapped to the Williams 82 reference genome using HISAT2 software41. There were 90.83% and 88.25% mapping ratios in two replicates of HX3, and 88.53% and 91.91% in two replicates of ef1 (Supplementary Table 3). RNA-seq data show a good correlation between replicates of both HX3 and ef1 (Correlation Coefficient: R2 = 0.9072 in HX3 and R2 = 0.9392 in ef1) (Supplementary Fig. 7). Analyses of differentially expressed genes using DESeq2 software42 showed that, compared with HX3, 440 and 372 genes were upregulated and downregulated in ef1 respectively (Fig. 4a and Supplementary Data 1). Gene Ontology (GO) enrichment analysis showed that upregulated genes were mainly involved in kinase activity, transferase activity, DNA-binding transcription factor activity, and transcription regulator activity (Fig. 4b). While the downregulated genes were mainly enriched in molecular function, binding, extracellular region, and cell wall (Fig. 4c). Meanwhile, the KEGG analysis showed that upregulated genes tend to associate with fatty acids, linoleic acid metabolism, monoterpene biosynthesis, zeatin biosynthesis, alkaloids biosynthesis, isoflavone biosynthesis, and photosynthesis antenna protein pathways. Downregulated genes were predominantly enriched in isoquinoline alkaloid biosynthesis, limonene and pinene degradation, and isoflavonoid biosynthesis pathways (Supplementary Fig. 8). Notably, there is no significant difference in expression levels between HX3 and ef1 for the principal flowering time-associated genes, such as E1, E3, E4, PRRs, and LUXs (Fig. 4d), whose expression was further validated by RT-qPCR analysis (Supplementary Fig. 9). These results suggest that GmLDL2 may regulate flowering time in the classical EC-E1 independent manner.

a Volcano plot showing differentially expressed genes in HX3 and ef1. Upregulated and downregulated genes in ef1 compared with HX3 are highlighted in red and blue, respectively. Differential expression analysis was performed using DESeq2 with its default Wald test (two-sided). P values were adjusted for multiple comparisons using the Benjamini–Hochberg method to control the false discovery rate (FDR). Genes with FDR < 0.05 were considered significantly differentially expressed. b and c Enriched Gene Ontology terms in the 440 genes upregulated or 372 genes downregulated in ef1, respectively. GO enrichment analysis was conducted using GOATOOLS with default parameters. Fisher’s exact test (two-sided) was used to assess enrichment significance, and P values were adjusted for multiple testing using the Benjamini–Hochberg correction. The outer circle indicates the classification of GO terms, and the middle circle represents gene number of each GO in the soybean genome background, and the inner circle represents the number of differentially expressed genes. P-values are presented in purple. d Heat map of expression profiles of known flowering control genes in HX3 and ef1. Source data are provided as a Source Data file.
ChIP-seq analyses of the GmLDL2-binding sites
To investigate the direct genome-wide binding sites of GmLDL2 in vivo, we performed chromatin immunoprecipitation followed by sequencing (ChIP-seq) with anti-GmLDL2 antibody using leaves of HX3 and ef1 (as a negative control) at the same stage as those in RNA-seq. The GmLDL2 antibody was verified by western blot, which can detect the GmLDL2 protein in HX3 rather than Gmldl2 mutants (Supplementary Fig. 10). Libraries from two biological replicates were sequenced, with highly reproducible confirmed by Spearman correlation (ρ > 0.90) and principal component analysis (PCA) (Supplementary Fig. 11a, b and Supplementary Fig. 12). In total, we identified 27,840 reliable peaks (q value < 0.01) entire the whole genome using merged anti-GmLDL2 ChIP-seq data from two biological replicates (Supplementary Table 4).
Given that GmLDL2 is a histone demethylase and can associate with chromatin, we performed ChIP-seq experiments using the anti-H3K4me1/2/3 antibodies to determine genome-wide H3K4 methylation levels in HX3 and ef1, with two replicates for each condition (the same stage sampled for RNA-seq). Similar to the anti-GmLDL2 ChIP-seq data, we observed a strong correlation between biological replicates (Supplementary Fig. 11c–h and Supplementary Fig. 12) and identified thousands of reliable peaks across the whole genome (Supplementary Table 4). Genomic annotations and statistical analysis of these peaks revealed that the mono-, di-, and tri-H3K4 methylation are highly enriched in the genic regions, but their distribution patterns are different. The tri-H3K4 methylation are enriched in the transcription start site (TSS) and gene body regions close to TSS, whereas the H3K4me1 and H3K4me2 are present at relatively low levels at the TSS and more broadly distributed across the gene body compared to H3K4me3, in both HX3 and ef1 (Fig. 5a and Supplementary Fig. 13), which is consistent with previous studies43,44. Especially, H3K4me2 was observed slightly higher levels around transcription end site (TES), in contrast to H3K4me1 and H3K4me3 (Fig. 5a). This is a possible reason for the lack of significant change in the global distribution of the three histone methylations on the coding genes in ef1, further supporting the hypothesis that GmLDL2 may only affect H3K4 methylation at specific loci. This hypothesis was also supported by the observation of a small proportion of genes with hyper H3K4me1/2 levels exhibiting exceptionally high enrichment of anti-GmLDL2 signals (Supplementary Fig. 14). We also analyzed both hyper- and hypo-methylated regions of H3K4me1 and H3K4me2 in the ef1 mutant, and identified the associated genes that overlapped with GmLDL2-bound genes. By comparing these differentially methylated gene sets (with increased or decreased H3K4me1/2 levels) with GmLDL2 ChIP-seq data, we found that the proportion of GmLDL2-bound genes was comparable between the hyper- and hypo-methylated groups (Supplementary Fig. 15).

a Normalized read density of H3K4me1, H3K4me2, and H3K4me3 signals at gene body regions, as well as upstream 1.5 kb and downstream 1.5 kb, lines show the average value of all genes in HX3 and ef1. b H3K4me1, H3K4me2, and H3K4me3 reads density among three groups of genes with different expression levels in HX3 and ef1. Gene body regions were divided into equal bins, upstream 1.5 kb of TSS and downstream of TES were calculated per 10 bp sliding window by read density. Metaplots represent the average of the normalized read density of genes in each group. Heatmaps show the distribution of read density of all genes. Genes in each group were sorted from high to low by gene expression levels (FPKM). TSS transcription start site, TES transcription end site. c Venn diagram showing the overlap among genes that are transcriptionally upregulated in ef1, H3K4me1 and H3K4me2 hypermethylated genes in ef1, and GmLDL2 binding genes. d Genome tracks of RNA-seq, anti-GmLDL2, anti-H3K4me1, and anti-H3K4me2 ChIP-seq data for GmFER locus in HX3 and ef1. Y-axis value means normalized read counts. Structure of GmFER. P1 to P4, specific regions used for ChIP-qPCR analysis. UTR untranslated region, CDS coding sequence. Gray block showing the higher H3K4me1 and H3K4me2 modification levels in ef1 than HX3. e qRT-PCR analysis of GmFER expression level. GmActin was used as an internal control. f, g ChIP-qPCR analysis of H3K4me1 and H3K4me2 methylation status at the GmFER locus. h ChIP-qPCR analysis of GmLDL2 associated with GmFER locus. Data represent means ± SD (n = 3 biological replicates) in (e–h). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
Identification of GmLDL2-specific target genes
To identify the specific target genes of GmLDL2, we investigated the relationship between H3K4 methylation levels in the transcribed region and gene expression levels. A total of 52,872 genes annotated in the soybean genome were sorted from high to low by gene expression levels and genes with expression levels greater than 1 were then divided into three groups, composed of highly expressed genes (FPKM > 40), median expressed genes (10 < FPKM ≤ 40), low expressed genes (1 < FPKM ≤ 10). The results showed that the reads density of H3K4me1, H3K4me2, and H3K4me3 modifications all ranked from high to low among three groups of genes accompanied by gene expression values, indicating that H3K4me1/2/3 positively correlated with the levels of gene expression (Fig. 5b). To identify the target genes of GmLDL2, we screened the whole genome per 200 bp regions and identified regions that exhibited H3K4me1 and H3K4me2 changes that are more 1.5 folds higher in ef1 compared to HX3. In the ef1 mutant, H3K4me1 and H3K4me2 levels increased at least 0.5-fold in 3053 and 7946 regions overlapping with genes, respectively (Supplementary Data 2, 3). HyperH3K4me1 and hyperH3K4me2 peaks were both significantly overrepresented in the promoter and distal intergenic regions (Supplementary Fig. 16).
Genes overlapping with these hyperH3K4me1 and hyperH3K4me2 regions, as well as peaks from anti-GmLDL2 ChIP-seq data, were considered as candidate target genes of GmLDL2. Given that the H3K4me1 and H3K4me2 are strongly correlated with transcription levels (Fig. 5b), we compared these candidate target genes of GmLDL2 with upregulated genes in ef1 and found that only one gene (Glyma.18G270900) exhibited increased levels of H3K4me1 and H3K4me2, and also overlapped with GmLDL2 binding site (Fig. 5c, Supplementary Table 5, and Supplementary Data 4, 5). Glyma.18G270900 (GmFER) is a flowering-associated gene encoding a protein similar to Arabidopsis FERONIA45. Specifically, GmFER expression in ef1 exhibited a fivefold increase in the RNA-seq data, also showing a higher level of H3K4me1/2 at the 3’UTR and gene downstream regions (Fig. 5d). RT-qPCR and ChIP-qPCR validated the upregulated expression and increased H3K4me1/2 levels of GmFER in ef1 compared with the HX3 (Fig. 5e–g). Moreover, ChIP-qPCR assay using antibodies against GmLDL2 confirmed that GmLDL2 was directly associated with the chromatin region of GmFER (Fig. 5h). Together, these results indicate that GmLDL2 negatively regulates GmFER transcription by directly binding to its chromatin to change H3K4me1 and H3K4me2 levels.
GmLDL2 negatively regulates soybean flowering in a GmFER-dependent manner
To test the in vivo function of GmFER in soybeans, we first transformed the construction of GmFER fused to 3×Flag tags driven by the CaMV 35S promoter (35Spro:GmFER-Flag) into wild-type (Col) Arabidopsis plants. Three independent transgenic lines were selected (Supplementary Fig. 17a). Western blotting using an anti-Flag antibody demonstrated that the recombinant GmFER protein was expressed in all three lines (Supplementary Fig. 17b). In comparison to Col, the transgenic plants displayed significantly earlier flowering time under long-day conditions (Supplementary Fig. 17c, d). Such a phenomenon is consistent with the FER function in Arabidopsis flowering45, indicating that FER orthologs are relatively conserved in controlling the flowering time. Furthermore, we transformed the 35Spro:GmFER-Flag construct into soybean and obtained two independent transgenic lines (GmFER-OE) (Fig. 6a). These plants showed obviously increased levels of transcripts and protein of GmFER compared to HX3 (Fig. 6b, c). Subsequently, HX3 and transgenic plants were grown in climate chambers under short-day or long-day conditions. Both GmFER-OE transgenic lines showed significantly earlier flowering time relative to HX3 (Fig. 6d). Moreover, we searched the EMS-induced soybean mutant library46 and found three independent mutant lines of GmFER: NJAU0800, NJAU1719, and NJAU1817. NJAU0800 accommodates a G to A substitution in the coding region, resulting in an early stop at its 661 amino acid residue and the loss of a part of its kinase domain (Supplementary Fig. 18a, b). NJAU1719 harbors a single base substitution (G to A), and this missense SNP is located in the kinase domain, replacing glycine (Gly) with an arginine (Arg) (Supplementary Fig. 18a, b). NJAU1817 has the missense SNP (T to A) that is located in the malectin domain, substituting leucine (Leu) with a methionine (Met) (Supplementary Fig. 18a, b). To test the phenotypes of these soybean FER mutant alleles, we planted them in a growth chamber to measure their flowering time under long-day conditions, all three Gmfer mutant lines displayed delayed flowering time when compared to the control plants (Supplementary Fig. 18c, d). Taken together, these results indicate that GmFER promotes flowering in soybeans.

a The phenotypes of whole plants and flowers of HX3 and two GmFER-OE transgenic lines under short-day conditions. The two independent GmFER-OE transgenic plants showed early flowering compared to HX3. White arrow, magnified view. Scale bars, 10 cm. b The expression level of GmFER in HX3 and GmFER-OE transgenic plants. GmActin was used as an internal control. c Validation of GmFER expression in GmFER-OE transgenic plants by western blot with anti-Flag antibody. Actin serves as the loading control. d The flowering time of HX3 and GmFER-OE plants under SD (11 h/13 h) and LD (15 h/9 h) conditions. e The design of sgRNAs for CRISPR/Cas9 to knock out GmFER in the Gmldl2 mutant background. sgRNAs targeting CDS are labeled in red. Results of Sanger sequencing of target sites in T1 generations in HX3 and Gmldl2/fer double mutants. sgRNA target and PAM sequences were shown in red and blue, respectively. UTR untranslated region. f Predicted amino acid sequences of GmLDL2 and GmFER in HX3 and Gmldl2/fer double mutants. Missense amino acids resulting from the frameshift caused by CRISPR/Cas9-induced mutations are shown in red. The asterisk indicates the termination of translation resulting from a stop codon. g The phenotypes of whole plants and flowers of HX3, ef1, Gmldl2-2/fer, and Gmldl2-3/fer under long-day conditions. All Gmldl2/fer double mutants showed later flowering compared to ef1. White arrow, magnified view. Scale bars, 10 cm. h The flowering time of all varieties examined under SD (11 h/13 h) and LD (15 h/9 h) conditions. Data represent means ± SD (n = 3 biological replicates) in (b), and data represent means ± SD (n = 10 plants) in (d, h). The P value was generated from a two-sided Student’s t-test. Source data are provided as a Source Data file.
To determine whether the early flowering phenotype of the ef1 mutant is associated with increased GmFER expression, we generated Gmldl2/fer double mutant via CRISPR/Cas9 technology in the Gmldl2 mutant background. Two targets were designed: target 3 and 4 for editing GmFER (Fig. 6e). After stable transformation, two independent Gmldl2/fer double mutants were obtained (designated as Gmldl2-2/fer and Gmldl2-3/fer). Both Gmldl2-2/fer and Gmldl2-3/fer contain a 6 bp deletion at target 3, resulting in the loss of two amino acids at positions 146 and 147 in the malectin domain of GmFER (Fig. 6e, f). The alanine residue at position 146 is highly conserved among FER homologs (Supplementary Fig. 19), implying its functional importance. Then we planted HX3, ef1 mutant, and Gmldl2/fer double mutant plants under short-day or long-day conditions to measure their flowering. The Gmldl2-2/fer and Gmldl2-3/fer double mutants displayed a significantly delayed flowering time compared to the Gmldl2 single mutant under both SD and LD conditions (Fig. 6g, h), indicating that the Gmfer mutation suppressed the early flowering phenotype of the Gmldl2 mutant. In summary, these data strongly support the conclusion that GmLDL2 acts as a negative regulator of soybean flowering by directly modulating the transcription of GmFER.
Distinct GmLDL2 alleles drive soybean regional adaptation
Flowering time is a critical trait during soybean domestication and has enabled soybeans to expand from their origin to other regions. Several genes contributing to this trait have been demonstrated to undergo natural or artificial selection13,15,16,21,23,24, which led us to investigate whether GmLDL2 could be a domestication gene. First, we estimated the nucleotide diversity across the 2 Mb genomic region by sliding window per 5 kb spanning GmLDL2 using a public resequencing dataset consisting of 2898 soybean varieties from around the world, including 103 wild soybeans, 1048 landraces, and 1747 improved cultivars47. Specifically, we found that GmLDL2 is located in a region with a high signature of fixation index (FST = 0.26) between wild and landrace soybean populations (Fig. 7a), indicating that this gene might contribute to the early population differentiation. Meanwhile, we found that the wild soybean population exhibited a high nucleotide diversity (π) of 3.30e-3 in this region, whereas the landrace and improved cultivar populations showed markedly lower diversity, with π values of 2.32e-3 and 0.99e-3, respectively (Supplementary Fig. 20a), indicating that the region had low heterozygosity and the abundance of common alleles. However, positive Tajima’s D values of 2.24, 3.36, and 0.37 were also observed in the region containing GmLDL2 in wild soybeans, landraces, and cultivars (Supplementary Fig. 20b). The high Tajima’s D values suggested that the landrace population may shrink as a result of balancing selection. These results suggest that GmLDL2 was selected during soybean domestication, leading to different alleles rapid accumulation in landraces and cultivars.

a The nucleotide variation indicated by Fst in wild soybeans, landraces, and improved cultivars across the 2 Mb genomic regions surrounding GmLDL2. b Proportions of GmLDL2 haplotypes within each of the three germplasm groups. c The geographical distributions of cultivar accessions harboring GmLDL2H1 and GmLDL2H2 across China. NE, Northeastern region of China; NR, Northern region of China; HR, Huanghuai region of China; SR, Southern region of China. d The demethylase activity assay of GmLDL2H1 and GmLDL2H2 in N.benthamiana leaves. GmLDL2H1-GFP and GmLDL2H2-GFP expression in N.benthamiana nuclei (arrows) was visualized by monitoring GFP fluorescence (green). The histone methylation status was analyzed by immunostaining with specific histone methylation antibodies (red). The location of nuclei was visualized using 4’,6-diamidine-2-phenylindole staining (blue). Scale bars, 50 μm. e Statistical analysis of the levels of fluorescent signals in (d). Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. At least 200 nuclei expressed GmLDL2H1-GFP or GmLDL2H2-GFP were analyzed in each biological replicate. f The affinity-purified recombinant His-GmLDL2H1 and His-GmLDL2H2 from E.coli were incubated with calf thymus histone for H3K4me1/2 demethylase activity assay, and the histone methylation status was determined using methylation-specific antibodies. His was used as the negative control, and anti-H3 was used as a loading control. The experiment was repeated three times independently with similar results. g Quantification analysis of (f) by ImageJ. Data represent means ± SD (n = 3 biological replicates). The P value was generated from a two-sided Student’s t-test. h–j Statistics of flowering time, plant height, and grain weight of soybean accessions carrying GmLDL2H1 (n = 18 accessions) and GmLDL2H2 (n = 46 accessions) under natural short-day conditions in Guangzhou and Sanya. The P value was generated from a two-sided Student’s t-test. In all the box plots, the center line indicates the median, the edges of the box represent the first and third quartiles, and the whiskers extend to the smallest and largest data points within 1.5 interquartile ranges from the edges. Source data are provided as a Source Data file.
To explore the evolutionary origin of the different alleles in GmLDL2, we examined variations of the GmLDL2 coding region and classified six unique high-confidence haplotypes based on four SNPs in the exon: one nonsynonymous SNP resulting in an arginine-to-leucine substitution at position 699 (R699L) and three synonymous SNPs (Supplementary Fig. 21a). To further investigate the evolutionary history of GmLDL2, we analyzed the percentages of different alleles in wild soybeans, landraces, and improved cultivars. The frequency of GmLDL2H1 allele was present in 57.4% of wild soybeans, whereas GmLDL2H2 was only 7.9%, indicating that GmLDL2H1 may represent the ancestral haplotype (Fig. 7b). Furthermore, the minimum spanning tree (MST) analysis indicates that GmLDL2H1 is centrally located and connected to other haplotypes with the fewest mutational steps (Supplementary Fig. 21b), suggesting it probably was the original haplotype in soybean ancestors. Moreover, both rooted phylogenetic tree analyses and conserved sequence alignment patterns across closely related species consistently support GmLDL2H1 as the ancestral haplotype of the GmLDL2 locus (Supplementary Fig. 21c, d). Interestingly, the frequencies of GmLDL2H1 decreased to 17.1% and 7.4% in the landraces and cultivars respectively, while the GmLDL2H2 frequencies increased to 66.8% and 87.0% in the landraces and cultivars, respectively (Fig. 7b). These results suggested that GmLDL2 is likely under strong artificial selection during domestication.
Next, we assessed the geographical distribution of the two major alleles (GmLDL2H1 and GmLDL2H2) within each subset of wild soybeans, landraces, and improved cultivars, as described above. The results indicate that the GmLDL2H1 and GmLDL2H2 exhibit contrasting patterns between wild and landrace accessions. In the wild soybean accessions, the frequency of GmLDL2H1 increased from 40% in the Huanghuai region (central China) to 100% in the Northeast and Southern regions. Conversely, in landrace varieties, it decreased from 37.3% to 26.7% across these regions (Supplementary Fig. 22). In the cultivar subgroup, 27.7% of accessions from the Huanghuai regions carried the GmLDL2H1 allele, and 72.3% of accessions carried the GmLDL2H2 allele. Interestingly, the frequency of the GmLDL2H1 allele increased to 40.5% in southern China, whereas the frequency of the GmLDL2H2 allele rose to 97.2% in high-latitude regions (Fig. 7c), indicating that those two alleles of GmLDL2 might contribute to the adaptation of cultivated soybeans to low- and high-latitudes, respectively. To examine the functional significance of the two major alleles of GmLDL2H1 and GmLDL2H2, we first predicted the three-dimensional structure of GmLDL2H1 and GmLDL2H2 and found them to be similar to each other (Supplementary Fig. 23), suggesting that both alleles do not alter the protein structure. We then assessed the enzymatic activity of GmLDL2H1 and GmLDL2H2 by luminescence assays with transient expression in tobacco leaves and western blot with histone methylation antibodies. It was found that GmLDL2H1 has higher histone demethylase activity than GmLDL2H2, especially for H3K4me2 (Fig. 7d–g and Supplementary Fig. 24).
GmLDL2 is a specific H3K4 demethylase and negatively regulates soybean flowering, whereas the demethylase activity towards H3K4me2 of GmLDL2H2 is significantly lower than that of GmLDL2H1. Therefore, we wondered whether the flowering time of the soybean accessions carrying the GmLDL2H1 allele differs from those harboring the GmLDL2H2 allele. To further evaluate the phenotypic effects of these two major haplotypes, we minimized the effects of natural variations in known soybean flowering-time genes, including E1-E4, J, LUXs, and Tof16. First, we selected the accessions with no variations in the CDS of E3, E4, LUXs, Tof18, GmFT2a, and GmFT5a. Then, we excluded the soybean accessions carrying known functional alleles, including the e1-as allele (E1 transaction variant causing early flowering)26, the HT1m mutation (J splice variant conferring later flowering)25, Tof11-1 to Tof11-10 (loss-of-function alleles of Tof11), and Tof12-1 to Tof12-4 (loss-of-function alleles of Tof12)16. Next, we characterized coding sequence polymorphisms in E2 and Tof16. Two exonic nonsynonymous SNPs defined three E2 haplotypes, while three Tof16 haplotypes were classified by one exonic nonsynonymous SNP and an in-frame deletion. To evaluate the phenotypic effects of the three haplotypes of E2 and Tof16, we compared flowering time differences among different haplotypes. The results showed that haplotype 2 of E2 flowered significantly later than the other two haplotypes, while both haplotype 2 and haplotype 3 of Tof16 flowered later than haplotype 1 (Supplementary Fig. 25). Consequently, we also excluded accessions with E2H2, Tof16H2, or Tof16H3. Finally, a total of 64 soybean accessions with nearly identical genetic backgrounds were selected to assess the functions of GmLDL2H1 and GmLDL2H2. Eighteen soybean accessions harboring GmLDL2H1 allele and 46 varieties carrying GmLDL2H2 were planted in Guangzhou (23°16′N, 113°23′E) and Sanya (18°15′N, 109°30′E), China. Six individual plants were measured for each variety, and as expected, varieties carrying GmLDL2H1 showed significantly later flowering time than those carrying GmLDL2H2 (Fig. 7h), along with increased plant height and grain weight per plant (Fig. 7i, j). Considering that GmLDL2-mediated regulation of soybean flowering time depends on GmFER, we wonder whether the expression levels and H3K4me1/2 modifications of GmFER contribute to the differences in flowering time between soybean carrying GmLDL2H1 and those carrying GmLDL2H2. To test this hypothesis, we measured the expression and H3K4me1/2 levels of GmFER in both GmLDL2H1 and GmLDL2H2 varieties. The results showed that the expression levels and H3K4me1/2 modifications of GmFER in GmLDL2H2 varieties are significantly higher than those carrying GmLDL2H1 (Supplementary Fig. 26). Taken together, we speculated that the GmLDL2H1 allele could be selected and may contribute to the adaptation of soybeans to low latitudes.
Discussion
Flowering at the appropriate time has a strong impact on the geographical adaptation and grain yield of crops, including soybeans. Previous studies have identified several critical regulators for flowering time control in soybean3, but environmental factors such as epigenetic regulators are rarely reported. In this study, we identified a GmLDL2 locus that acts as a flowering repressor through forward genetic screening. Cloning and functional characterization revealed that GmLDL2 encodes a histone demethylase, and inhibits flowering and maturity through suppressing the expression of GmFER by triggering the H3K4me1 and H3K4me2 demethylation at this locus, which is independent of the canonical EC-E1 pathway. Moreover, genomic and population genetic analyses revealed that distinct GmLDL2 alleles may promote flowering and improve latitude adaptation of soybeans.
GmLDL2 encodes a member of the amine-oxide domain-containing histone demethylase. We previously demonstrated that its orthologs GmFLD, GmLDL1a, and GmLDL1b all exhibit H3K4 demethylase activity39. Here, it was shown that GmLDL2 can specifically demethylate H3K4me1 and H3K4me2 (Fig. 3), similar to its orthologs. However, the global H3K4 methylation levels were not accumulated in the Gmldl2 mutant (Fig. 5a and Supplementary Fig. 6). It is plausible that GmLDL2 is not a master regulator for global histone H3K4me1 and H3K4me2 modification, or due to functional redundancy with other histone demethylase family members. Histone demethylase contains a large number of members, including JmjC-containing proteins and other LSD-like proteins. Instead, GmLDL2 may only modulate a small fraction of the genomic loci. In Arabidopsis, AtLDL2 acts redundantly with AtLDL1 in repressing FLC and FWA expression. Hence, the ldl2 single mutant plants display no significant phenotypic change34. By contrast, the flowering time of Gmldl2 mutant is significantly earlier compared to HX3 (Fig. 2), suggesting that LDL2 is divergent between soybean and Arabidopsis. It was reported that Arabidopsis LDL2 controls flowering time by repressing FLC and FWA expression via demethylating H3K4me1/234, while the expression of FLC-like and FWA-like genes were unaltered and no change was observed in H3K4 modification at those locus in soybeans (Supplementary Data 1–3), suggesting that LDL2 may have different targets across species.
Unlike H3K4me3 which is widely considered as an active marker, the relationship between gene transcription levels and H3K4me1/2 remains largely unclear in plants36,44,48,49,50. Here, our data showed that H3K4me1/2 levels were positively correlated with gene expression in soybeans (Fig. 5b), in combination with the GmLDL2 binding sites identified by ChIP-seq, we identified only one GmLDL2-targeted candidate gene of GmFER, which is consistent with the findings that upregulated genes in ef1 are mainly overrepresented in kinase activity and post-embryonic development GO terms (Fig. 4b). This was also reflected by the fact that the Gmldl2 mutant plants does not show obvious developmental defects besides the change in flowering and maturity time. Specially, it was found that the GmFER gene had higher H3K4me1/2 levels around the TES region (Fig. 5d), which is in agreement with previous studies found that HsLSD2 primarily associates with gene bodies and downstream of genes instead of promoters, and in conjunction with the H3K9-specific G9a histone methyltransferase and elongation factors51,52,53. Additionally, H3K4me2 is usually complicated by lncRNA enrichment54, and HDA6-LDL1/2 is crucial for repressing the lncRNAs, and are non-expressed or show low expression in Arabidopsis55. Therefore, it is also worthwhile to investigate the lncRNAs related to the soybean FER gene in future studies. FER proteins are members of the CrRLK1L family and have been extensively studied. They are involved in a variety of signaling pathways, and play crucial roles in fertility development, cell growth and mechanical integrity, morphogenesis, and resistance to abiotic and biotic stresses56. In Arabidopsis, it has been demonstrated that RLF1-FER is capable of controlling vegetative reproduction and growth transition. AtFER is involved in the circadian clock pathway, as well as influencing flowering by modulating the expression and alternative splicing of FLC45. However, the homologs of GmFLC-like or the circadian clock-related genes, such as J, LUXs, and PRRs, were not significantly altered in ef1 compared to HX3 (Fig. 4d and Supplementary Fig. 9), suggesting that FERONIA is divergent in regulating flowering between Arabidopsis and soybean. In particular, our data identified a mechanism of histone modification to regulate FERONIA expression.
Crop progenitors exhibit high allelic diversity, which has largely been lost during domestication. However, natural variations from wild ancestors can be reintroduced into modern crops through breeding to enhance plant adaptation and yield. The UPA2 (Upright Plant Architecture 2), which regulates plant architecture by reducing the leaf angle in maize, originated from the wild progenitor of maize (teosinte). This allele was lost during domestication, but reintroducing it into modern maize hybrids has improved performance under high-density planting and increased yield57. In tomatoes, the HAK20 (HAK/KUP/KT transporter 20) is responsible for salt tolerance, but its functional allele has been lost during domestication. Integrating the salt-tolerant allele from wild varieties into cultivated tomatoes significantly improves their salt tolerance58. Soybean was domesticated from the temperate regions of China to expand to a wide range of latitudes around the world. Understanding the genetic basis of flowering time is of major significance for improving soybean regional adaptation. Here, our results demonstrate that soybean accessions carrying the GmLDL2H1 allele (the ancestral haplotype) flower significantly later than those carrying the GmLDL2H2 allele under natural short-day conditions, leading to enhanced yields (Fig. 7h–j). Thus, introgressing the GmLDL2H1 allele from wild soybean into cultivars can simultaneously improve soybean adaptation and yield-related traits. Since the interaction between genetics and environment determines phenotypes, the combination of GmLDL2 with other genetic factors could provide avenues to improve crop breeding.
Based on our data, we propose a model (Fig. 8) of how GmLDL2 regulates flowering time and improves low-latitude adaptation in soybeans. According to this model, GmLDL2 could undergo natural and artificial selection during soybean domestication. In the low-latitude regions, natural selection of GmLDL2H1 might have higher demethylase activity for the removal of methylated H3K4me1/2, thus resulting in the lower expression of target genes GmFER, which leads to late flowering. In contrast, the artificial selection of GmLDL2H2 in the high-latitude regions results in lower enzymatic activity to demethylate H3K4me1/2, thereby causing the higher expression of GmFER, which leads to early flowering. In summary, our study not only identified a flowering regulator GmLDL2 in soybeans, but also revealed unreported mechanistic insights for understanding the flowering of plants, hence providing important resources for improving genetic manipulation of soybean breeding.