1. Introduction
Solar flares, among the most energetic explosions in the solar system, are sudden and intense releases of energy originating from magnetically complex active regions on the Sun. These events are driven by magnetic reconnection, a fundamental plasma process where the reconfiguration of magnetic field lines liberates vast amounts of stored energy, resulting in the impulsive heating and acceleration of plasma and the emission of radiation across the entire electromagnetic spectrum, from radio waves to gamma rays [1, 2]. This energy release is intimately tied to the interplay between the Sun’s convective motions and its magnetic field, with convective flows twisting and shearing magnetic field lines, building up stress that is ultimately and explosively released through reconnection [3, 4]. The resulting intense radiation, particularly in the X-ray and extreme ultraviolet (EUV) wavelengths, can significantly impact the Earth’s ionosphere, causing disruptions to radio communication and navigation systems and posing a hazard to astronauts and satellites [5]. Understanding the physical processes that govern solar flares is therefore crucial for both advancing our knowledge of fundamental astrophysical processes and mitigating the potential societal and economic impacts of space weather.
The Solar Flare Index (SFI) is a key metric in solar-terrestrial physics that quantifies the daily level of solar flare activity based on the integrated intensity of observed flares [6]. Numerous studies have confirmed that solar flares have a more significant impact on Earth than the activity measured by other solar indices [7, 8]. The SFI time series, encompassing over five decades of historical observations, provides a valuable long-term record of solar flare activity. Because it exhibits nonlinear and multiperiodic behavior, the SFI is often characterized as a chaotic time series [9]. This inherent complexity makes the SFI a valuable tool for investigating the multifaceted nature of solar activity, encompassing both long-term trends and short-term fluctuations [10]. In recent years, the SFI has gained recognition as a particularly effective metric for characterizing the long-term evolution of the heliospheric environment.
Ataç and Özgüç conducted extensive analyses of SFI data spanning from Solar Cycle (SC)-20 in 1987 to SC-24 in 2022 [11–14]. Their work included a detailed characterization of SFI as a solar activity measure [11], investigations of north-south asymmetry in SFI data across SCs [12], explorations of the relationships between SFI and other solar activity indicators [13], and the identification of periodic behaviors in SFI during various SCs [14]. Furthermore, statistical analyses of previous SCs reveal that the largest sunspots and flares typically occur several years before or after the peak of the sunspot cycle [15]. Deng, Mei, and Wang applied ensemble empirical mode decomposition to investigate the periodic variations and phase relationships between sunspot activity and grouped solar flares from January 1965 to March 2009. Their analysis revealed an average lag of 7.8 months for the 11-year Schwabe cycle of solar flares relative to sunspots, suggesting that this systematic phase delay is driven by inherent signals within the SC itself [16].
This study aims to forecast the timing of peak solar flare activity, as quantified by the SFI, in SC-25, contributing to improved space weather prediction capabilities. To achieve this objective, we employ a previously developed and validated long short-term memory (LSTM+) neural network model, which has been shown to be effective in predicting solar activity parameters. The LSTM+ architecture is well suited to capturing the complex temporal dynamics inherent in SFI data. The model’s predictive performance is first assessed by applying it to historical SFI data from previous SCs. The validated model is then used to predict the timing of peak SFI in the ongoing SC-25. Furthermore, the relationship between SFI and sunspot number (SSN) is investigated to corroborate and refine the predicted timing of the SFI peak. Accurate prediction of SFI, coupled with analysis of its relationship with SSN, could enhance our understanding of solar flare behavior and contribute to more informed decision making regarding space weather mitigation strategies.
2. Data Source
In 1952, Kleczek introduced the metric “Q = i × t” to quantify daily flare activity by incorporating both the intensity and duration of flares observed over a 24-hour period. He proposed that this relationship could provide an approximate estimate of the total energy emitted by flares. In this formulation, “i” represents the importance scale of the flare’s intensity, while “t” denotes the duration of the flare in minutes [17]. The daily and monthly SFI values used in our study come from the new SFI database, which is a composite record. It includes data from the Astronomical Institute of the Ondřejov Observatory, Czech Academy of Sciences (1937–1976), and the records of the Kandilli Observatory in Istanbul, Turkey (1977–2020) [18]. The SSN data (Version 2) are from the World Data Center SILSO, Royal Observatory of Belgium, Brussels (https://www.sidc.be/SILSO/datafiles).
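As a rough illustration of the Q = i × t metric, the sketch below sums the product of importance coefficient and duration over the flares of one day. The function name, the summation over a day's flares, and the numeric importance coefficients are assumptions made for illustration; they are not taken from the SFI database.

```python
def daily_flare_index(flares):
    """Return the sum of i * t over all flares observed in one day.

    flares: iterable of (importance_coefficient, duration_minutes) pairs.
    The coefficient values used below are placeholders, not an official scale.
    """
    return sum(i * t for i, t in flares)


# Hypothetical day with three flares: (importance i, duration t in minutes)
example_day = [(1.0, 12.0), (0.5, 30.0), (2.0, 8.0)]
q_total = daily_flare_index(example_day)
print(q_total)
```

A monthly value could then be obtained by averaging such daily totals, which is how the monthly series is described in the text.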
Given the inherent noise in the monthly averaged SFI data, Gaussian filters were used to smooth the time series, allowing the minima, maxima, and their respective dates to be determined accurately. Gaussian filters are well suited to this task because of their shape in the frequency domain, which effectively attenuates high-frequency variations while preserving long-term trends [19]. The Gaussian-shaped filter takes the following form [15]:
$$W(z) = e^{-z^{2}/2a^{2}} - e^{-2}\left(3 - \frac{z^{2}}{2a^{2}}\right), \qquad -2a + 1 \le z \le +2a - 1,$$
where z is the time in months and 2a (24 months) is the full width at half maximum (FWHM) of the filter.
Figure 1 shows the unsmoothed and Gaussian-smoothed monthly average values of SFI and SSN. The smoothed curves highlight the overall trend and cyclical nature of both SFI and SSN while minimizing the influence of short-term fluctuations.

Figure 1
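The smoothing step can be sketched as follows. This minimal example assumes the tapered-Gaussian weight W(z) = exp(−z²/2a²) − exp(−2)(3 − z²/2a²) on −2a + 1 ≤ z ≤ 2a − 1, with a = 12 months. The function names and the renormalization of weights near the ends of the series are assumptions of this sketch, not details confirmed by the text.

```python
import math


def gaussian_weights(a=12):
    """Normalized tapered-Gaussian weights for z = -2a+1 .. 2a-1."""
    zs = range(-2 * a + 1, 2 * a)
    raw = [
        math.exp(-z * z / (2 * a * a)) - math.exp(-2) * (3 - z * z / (2 * a * a))
        for z in zs
    ]
    total = sum(raw)
    return [w / total for w in raw]


def smooth(series, a=12):
    """Apply the filter to a monthly series.

    Near the edges only the available samples are used and the weights
    are renormalized (an assumption of this sketch).
    """
    weights = gaussian_weights(a)
    half = 2 * a - 1
    smoothed = []
    for k in range(len(series)):
        num = 0.0
        den = 0.0
        for j, w in enumerate(weights):
            idx = k + j - half
            if 0 <= idx < len(series):
                num += w * series[idx]
                den += w
        smoothed.append(num / den)
    return smoothed


# A constant series should be returned unchanged by any normalized filter.
flat = smooth([5.0] * 60)
```

The weights fall smoothly to zero at z = ±2a, so the filter has no sharp cutoff at the ends of its window.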
3. Prediction Model
The LSTM+ model, an optimized variant of the conventional LSTM architecture, was employed for prediction. The impact of parameter adjustments on prediction results, along with detailed model specifics, is discussed in our prior work [20, 21], where its strong performance in predicting solar parameters was demonstrated. Like the conventional LSTM, the LSTM+ model includes a forget gate, an input gate, an output gate, and a single memory-state unit [22]. In addition, LSTM+ incorporates two optimization processes: the reforecast (RF) procedure and the fine-adjustment of parameter (FAP) process, which involves parameters such as the filter, the number of neurons (N), batch size (B), and epochs (E).
3.1. The RF Procedure
The reforecast (RF) procedure is a critical optimization within the LSTM+ framework. Unlike traditional LSTM models, which rely on historical values for subsequent predictions, RF uses the latest forecast output as input. This approach is intended to capture genuine time series trends rather than overfit to past data. While historical data confirm a model’s ability to reproduce past behavior, they do not reflect predictive accuracy for future data. By iteratively feeding forecast outputs back into the model, the RF process improves trend detection and enhances predictive performance, offering a robust evaluation of the LSTM+ model’s forecasting capability. Figure 2 depicts the workflow of the RF procedure, where i denotes the number of forecasts, n is the length of the input, m is the length of the output, x_{t+i} represents the input values, and h_{t+i} represents the output values.

Figure 2
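The RF idea of feeding forecasts back in as inputs can be illustrated with a short sketch. Here `model` is a hypothetical stand-in for the trained LSTM+ network: any callable that maps a window of n values to at least m predicted values. The function names and the toy model are assumptions for illustration only.

```python
def reforecast(model, history, n, m, steps):
    """Iteratively forecast, appending each output to the input window.

    model:   callable taking a list of n values, returning >= m values
    history: observed series used to seed the first window
    n, m:    input and output lengths
    steps:   number of forecast iterations (i in the text)
    """
    window = list(history[-n:])
    forecast = []
    for _ in range(steps):
        out = list(model(window))[:m]
        forecast.extend(out)
        # Slide the window forward so it ends with the newest forecasts.
        window = (window + out)[-n:]
    return forecast


def toy_model(window):
    """Placeholder model: extends the last observed step linearly by 2 points."""
    step = window[-1] - window[-2]
    return [window[-1] + step * (k + 1) for k in range(2)]


pred = reforecast(toy_model, [0.0, 1.0, 2.0, 3.0], n=3, m=2, steps=3)
print(pred)
```

After the first iteration the window no longer contains only observations, which is what distinguishes this loop from evaluating a model against held-out historical inputs.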
3.2. The FAP Process
The parameter N, representing the number of units in the LSTM model’s hidden layer, exerts a significant influence on its performance. An excessively large N can lead to increased computational cost and potential overfitting, manifested as prolonged training times and a higher risk of converging to local minima during optimization. Conversely, an insufficiently small N may limit the model’s capacity to capture the intricate temporal dependencies inherent in solar activity data, ultimately hindering its predictive capabilities. Similarly, the batch size (B), which dictates the number of data instances processed during each training iteration, plays a crucial role in the model’s learning dynamics. A small B can impede the model’s convergence and slow down the training process by introducing excessive noise into the gradient updates.
Conversely, larger values of B can enhance computational efficiency by improving memory utilization and enabling greater parallelization, potentially accelerating the training process. However, this benefit comes at the cost of potentially requiring more training epochs to achieve comparable levels of accuracy. Therefore, a careful balance must be struck when selecting B, taking into account the specific characteristics of the solar activity dataset and the available computational resources. Similarly, the choice of N involves a trade-off between model complexity and computational burden. Achieving optimal convergence speed while maintaining high predictive accuracy necessitates meticulous experimentation and parameter tuning [23].
3.3. Evaluation Indices
This study aims to predict the SFI trend during SC-25, focusing on the timing of peak solar activity and its magnitude relative to the previous cycle. To assess the performance of the LSTM+ model, four evaluation metrics were selected. The Nash–Sutcliffe model efficiency coefficient (NS), which ranges from negative infinity to 1, quantifies the predictive power of the model; a perfect forecast yields NS = 1. To evaluate the accuracy of the predicted peak SFI and its timing, the absolute percentage error of the peak value (EP) and the error in the month of peak occurrence (EMP, in months) were employed. In addition, the Pearson correlation coefficient (r) was calculated to compare the trend variations between the predicted and actual values. The evaluation indices are defined as follows:
$$NS = 1 - \frac{\sum_{i}\left(V_{Ai} - V_{Pi}\right)^{2}}{\sum_{i}\left(V_{Ai} - \bar{V}_{A}\right)^{2}},$$
$$EP = \frac{\left|V_{PP} - V_{AP}\right|}{V_{AP}} \times 100\%,$$
$$EMP = M_{PP} - M_{AP},$$
$$r = \frac{\sum_{i}\left(V_{Ai} - \bar{V}_{A}\right)\left(V_{Pi} - \bar{V}_{P}\right)}{\sqrt{\sum_{i}\left(V_{Ai} - \bar{V}_{A}\right)^{2}\,\sum_{i}\left(V_{Pi} - \bar{V}_{P}\right)^{2}}},$$
where $V_{AP}$ is the peak of the actual values, $V_{PP}$ is the peak of the predicted values, $V_{Ai}$ denotes the i-th actual value, $V_{Pi}$ denotes the i-th predicted value, $\bar{V}_{A}$ is the average of the actual values, $\bar{V}_{P}$ is the average of the predicted values, and $M_{AP}$ and $M_{PP}$ represent the actual and predicted months of the peak value, respectively.
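The four indices can be computed with a few lines of plain Python. This is a minimal sketch using the standard definitions of the Nash–Sutcliffe coefficient and the Pearson correlation; the function names are illustrative, and the sign convention for the month error (predicted minus actual) is an assumption chosen because Table 1 reports signed values in months.

```python
import math


def nash_sutcliffe(actual, predicted):
    """NS = 1 - sum((A - P)^2) / sum((A - mean(A))^2)."""
    mean_a = sum(actual) / len(actual)
    num = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    den = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - num / den


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def peak_errors(actual, predicted):
    """Return (EP in percent, EMP in time steps).

    EMP is predicted peak index minus actual peak index (assumed convention).
    """
    ia = max(range(len(actual)), key=actual.__getitem__)
    ip = max(range(len(predicted)), key=predicted.__getitem__)
    ep = abs(predicted[ip] - actual[ia]) / actual[ia] * 100.0
    return ep, ip - ia


# Toy monthly series (made-up numbers, not SFI data)
actual = [0.0, 0.5, 1.0, 0.5, 0.0]
predicted = [0.0, 0.6, 0.8, 0.9, 0.2]
ns = nash_sutcliffe(actual, predicted)
r = pearson_r(actual, predicted)
ep, emp = peak_errors(actual, predicted)
```

Because NS penalizes squared differences at every time step, a forecast can match the peak value and timing closely and still score well below 1 if the rising or declining phases are offset.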
4. Results and Discussion
The LSTM + model employed in this study was trained and validated using a multistage approach. Initially, the model was trained with the SFI data from SC-18 to SC-22 to predict SC-23, and subsequently with data from SC-19 to SC-23 for predicting SC-24. For the prediction of SC-25, the model utilized the parameter combination that yielded the most favorable results across all evaluation indices for both the SC-23 and SC-24 predictions. This selection ensured the utilization of a parameter set demonstrated to produce accurate and reliable predictions based on past performance.
Drawing on our previous experience in applying LSTM+ models to the prediction of solar and solar wind parameters [24, 25], we employed a comparable parameter configuration in the present study. N was explored within the range of 20–100. The model was trained for either 100 or 500 epochs (E), with subsequent fine-tuning within a ±5 epoch window around these initial values. A fixed B of 100 was used across all experiments. Input sequence lengths, ranging from 20 to 120 time steps, were adapted to the specific dimensions of the dataset, while output sequence lengths corresponded to prediction horizons of 4, 6, and 12 time steps.
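One way to picture the search space described above is to enumerate it explicitly. The sketch below is an illustration only: the step sizes for N and for the input length are assumptions (the text gives ranges, not steps), and treating the ±5 epoch fine-tuning as a full enumeration of 11 values around each base is likewise an interpretation.

```python
from itertools import product

# N: 20..100 (a step of 20 is an assumption)
neurons = range(20, 101, 20)
# E: 100 or 500, each with a +/-5 epoch fine-tuning window
epochs = [base + delta for base in (100, 500) for delta in range(-5, 6)]
# B: fixed at 100 in all experiments
batch_size = 100
# Input lengths: 20..120 time steps (a step of 20 is an assumption)
input_lengths = range(20, 121, 20)
# Output lengths: prediction horizons of 4, 6, and 12 time steps
output_lengths = (4, 6, 12)

grid = [
    {"N": n, "E": e, "B": batch_size, "n_in": n_in, "n_out": n_out}
    for n, e, n_in, n_out in product(neurons, epochs, input_lengths, output_lengths)
]
print(len(grid))
```

Each configuration in such a grid would be scored with the evaluation indices on SC-23 and SC-24 before one combination is carried forward to SC-25.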
Prior to model training, a crucial preprocessing step of SFI normalization was performed. This process, conducted independently of the LSTM + model development, involved scaling the SFI data within each SC to a range of 0–1 based on the observed minimum and maximum values within that cycle. This normalization aimed to mitigate the inherent differences in cycle intensities, a factor that can confound long-term solar forecasting efforts. By reducing the influence of absolute SFI magnitudes, the model could focus on discerning the underlying temporal patterns of solar activity, such as the timing and shape of solar peaks, without being unduly influenced by variations in peak activity levels across different cycles. The decision to implement this independent normalization stemmed from the research objective, which prioritizes characterizing the temporal evolution of SFI within SC-25, rather than making direct comparisons of absolute intensities across cycles. Therefore, normalization was performed outside the LSTM + model framework to ensure that the model’s predictions remained focused on capturing the time-dependent trends and cyclical structure of solar activity, as reflected in the SFI data.
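The per-cycle scaling described above can be sketched as a min-max normalization applied separately within each solar cycle. The function name and the toy values are assumptions for illustration; a cycle whose values are all identical would need special handling, which this sketch omits.

```python
def normalize_by_cycle(values, cycle_ids):
    """Scale values to 0..1 using the min and max of their own cycle.

    values:    monthly index values
    cycle_ids: cycle label for each value (same length as values)
    """
    bounds = {}
    for v, c in zip(values, cycle_ids):
        lo, hi = bounds.get(c, (v, v))
        bounds[c] = (min(lo, v), max(hi, v))
    return [
        (v - bounds[c][0]) / (bounds[c][1] - bounds[c][0])
        for v, c in zip(values, cycle_ids)
    ]


# Toy example with two cycles of very different amplitude (made-up numbers)
values = [2.0, 4.0, 6.0, 10.0, 30.0, 20.0]
cycles = [23, 23, 23, 24, 24, 24]
scaled = normalize_by_cycle(values, cycles)
print(scaled)
```

After scaling, both cycles peak at 1 regardless of their absolute amplitude, so a model trained on the result sees the shape and timing of each cycle rather than its strength.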
Figure 3 depicts the normalized SFI values from SC-18 to SC-24, including actual values (black line) and predicted values for SC-23 (brownish-yellow line) and SC-24 (blue line). Table 1 summarizes the results of the four evaluation indices. The absolute percentage error of the predicted peak SFI for SC-23 and SC-24 was 0.61% and 2.69%, respectively. Furthermore, the predicted timing of these maxima deviated by only one month from the observed dates. Pearson correlation coefficients between predicted and actual SFI values were highly significant (p < 0.01), reaching 0.993 and 0.951 for SC-23 and SC-24, respectively. The NS for SC-23 was nearly 1, indicating a near-perfect prediction. While the NS for SC-24 was slightly lower at 0.64, it remained within an acceptable range, suggesting a good overall prediction performance.

Figure 3
| | NS | EP (%) | EMP (months) | r |
|---|---|---|---|---|
| PV-SC-23 | 0.98 | 0.61 | −1 | 0.993∗∗ |
| PV-SC-24 | 0.64 | 2.69 | 1 | 0.951∗∗ |
- ∗∗Correlation is significant at the 0.01 level (2-tailed).
The LSTM + model predicts a distinct cyclical pattern for SC-25, characterized by a rapid rise in the SFI to a peak, followed by a gradual decline, as shown in Figure 4. The model’s predictions exhibit a strong positive correlation with both the unsmoothed and Gaussian-smoothed monthly average SFI, with Pearson correlation coefficients of 0.629 and 0.892, respectively (p < 0.05). This indicates a statistically significant relationship between the predicted and observed SFI values. The model successfully captured the underlying trend and timing of the SFI maximum, projected to occur in January 2025.

Figure 4
Analysis of unsmoothed monthly mean SSN and SFI data reveals a predominantly lagging relationship between the timing of their respective peaks across SC-18 to SC-24. This temporal offset ranges from −1 month (SFI leading SSN in SC-19) to 14 months (SFI lagging SSN in SC-21). While the SFI and SSN peaks were nearly concurrent in SC-18, SC-22, and SC-23, pronounced lags were evident in SC-20, SC-21, and SC-24 (12, 14, and 8 months, respectively). This generally lagging behavior of the SFI peak and its notable intercycle variability warrant further investigation to elucidate the underlying physical mechanisms. At the time of writing, the observed maximum in the unsmoothed monthly average SSN for SC-25 occurred in August 2024, which may represent the ultimate peak for this cycle. Based on these historical observations, the SFI peak could lag behind that of SSN by 0 to 12 months. Therefore, if the observed SSN maximum in August 2024 is indeed the cycle peak, the corresponding SFI peak is expected to occur within this timeframe, that is, between August 2024 and August 2025. Our prediction of an SFI peak in January 2025 falls within this range, 5 months after the observed SSN peak. Further observations and analysis throughout the progression of SC-25 will be necessary to assess the accuracy of the model’s prediction and to refine our understanding of the relationship between SFI and SSN maxima.
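The consistency check in this paragraph amounts to simple month arithmetic, sketched below. The helper names are illustrative; the dates and the 0 to 12 month lag range are those quoted in the text.

```python
def month_index(year, month):
    """Convert a (year, month) pair to a running month count."""
    return year * 12 + (month - 1)


def add_months(year, month, delta):
    """Shift a (year, month) pair by delta months."""
    y, m0 = divmod(month_index(year, month) + delta, 12)
    return (y, m0 + 1)


ssn_peak = (2024, 8)       # observed unsmoothed SSN maximum
sfi_predicted = (2025, 1)  # model-predicted SFI peak

# Lag of the predicted SFI peak behind the SSN peak, in months
lag = month_index(*sfi_predicted) - month_index(*ssn_peak)

# Expected window for the SFI peak given a 0..12 month lag
window = (add_months(*ssn_peak, 0), add_months(*ssn_peak, 12))
within_window = 0 <= lag <= 12
print(lag, window, within_window)
```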
5. Conclusion
This study investigated the long-term prediction of the SFI. The validity of the LSTM+ model was established by training it on earlier cycles and predicting SC-23 and SC-24. The model demonstrated good overall prediction performance, evidenced by low absolute percentage errors for the predicted peak values, small errors in the predicted peak times, and high Pearson correlation coefficients and Nash–Sutcliffe efficiency (NS) values.
Our analysis confirms a positive correlation between SFI and SSN, consistent with the understanding that both phenomena are intrinsically linked to solar magnetic activity. Sunspots, as manifestations of intense magnetic fields, serve as precursors to solar flares, which are abrupt releases of magnetic energy. Higher SSNs generally correspond to increased solar activity, thus implying a greater probability of flare occurrence and potentially higher flare intensities. However, the relationship between sunspot emergence and flare occurrence is not strictly linear, as flare occurrence is modulated by a multitude of factors beyond SSN, including magnetic field topology and energy accumulation rates. Some sunspot regions may not produce flares, and conversely, some flares may occur during periods of relatively low SSNs.
In December 2023, the unsmoothed monthly averaged SFI reached its highest value so far in SC-25, 8.495. This peak lagged behind the unsmoothed monthly average SSN peak of 160.50 (June 2023) by approximately half a year. Considering the subsequent decrease and then increase of SSN to 215.50 (August 2024), we predict a potential second, higher SFI peak after January 2024. Based on our prediction model, the peak SFI value for SC-25 is projected to occur in January 2025. Taking into account the 24-month Gaussian smoothing window and the LSTM+ model’s accuracy in predicting SC-23 and SC-24 (within ±1 month), the margin of error is estimated to be 13 months. Therefore, the maximum SFI value for SC-25 is anticipated to occur between December 2023 and February 2026.
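The dates quoted above can be cross-checked with a few lines of month arithmetic. The helper names are illustrative, and the reading of the 13-month margin as a symmetric window around January 2025 is inferred from the stated range rather than spelled out in the text.

```python
def shift_month(year, month, delta):
    """Shift a (year, month) pair by delta months."""
    y, m0 = divmod(year * 12 + (month - 1) + delta, 12)
    return (y, m0 + 1)


def months_apart(start, end):
    """Number of months from start to end, both given as (year, month)."""
    return (end[0] * 12 + end[1]) - (start[0] * 12 + start[1])


predicted_peak = (2025, 1)
margin = 13  # months, as stated in the text

earliest = shift_month(*predicted_peak, -margin)
latest = shift_month(*predicted_peak, margin)

# Lag of the December 2023 SFI maximum behind the June 2023 SSN maximum
first_peak_lag = months_apart((2023, 6), (2023, 12))
print(earliest, latest, first_peak_lag)
```

The lower bound of the window coincides with the December 2023 SFI maximum already observed, so the stated range covers both that peak and the projected second peak.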
The application of the LSTM + model to the prediction of solar parameters did not reveal significant limitations. The data normalization method employed in this study effectively harmonizes the scale of data in each SC. While the true values of the predicted outcomes can be inferred, this approach to data processing proves particularly suitable for forecasting critical moments in the behavior of the research object. Predicting the trends and peak timing of SFI in the upcoming SC holds substantial scientific and practical significance. Scientifically, it contributes to a deeper understanding of solar activity mechanisms, enabling the validation and refinement of existing solar activity models. Practically, accurate predictions of solar flare activity and its impact on Earth’s space environment can provide a scientific basis for implementing protective measures for spacecraft operations, communication and navigation systems, and power grids. These proactive measures can mitigate the potential hazards of solar activity and enhance the resilience of human society to space weather events [26].