Multivariate Statistical Analysis of Coal in Dahor Formation, Borneo Island, Indonesia: A Comparative Study Utilizing Principal Component Analysis (PCA)



Introduction
Coal has been a vital energy source for centuries, and its use as a heat source and the diverse by-products derived from it are well understood. However, the comprehensive examination of coal's mineral composition, apart from sulfur and iron, has only recently received attention. The continued and increasingly large-scale use of coal in the United States and numerous other industrialized and developing nations has heightened concerns about known hazards to environmental quality and human health, and has prompted speculation about potential additional hazards. Consequently, much remains to be discovered about both the harmful and beneficial attributes of coal, with the aim of minimizing harm to humans and the environment while maximizing its utility for society.
Presently, the Indonesian coal industry heavily relies on domestic consumption, with nearly 70% of the country's coal production being utilized by the state-owned Electricity Company as fuel for electricity generation. Approximately 10% of the coal is used in cement manufacturing, while the remainder is employed as industrial or process fuels in metallurgy. In line with national energy policies, the Indonesian government has announced plans to increase the usage of coal for domestic purposes while reducing coal exports. By 2025, Indonesian coal is projected to constitute approximately 33% of the country's total energy [1].

Figure 1. Location of the investigation area
The Barito Basin, comprising the Warukin Formation and the Dahor Formation, is the geological setting of interest in this study. While both formations are coal-bearing, this report focuses specifically on the Dahor Formation. Previous investigations conducted by [2] encompassed petrographic and geochemical analyses of the Dahor Formation.
The primary purpose of coal classification systems is to differentiate coal based on its physical characteristics, enabling the evaluation of its quality and economic value for various utilization purposes. Coal classification also provides information regarding specific coal properties that can serve as cutoff values for estimating coal resources and reserves. These properties include ash yield, calorific value, and total sulfur content. Coal can be classified based on its scientific properties, such as elemental composition, as well as on commercial properties that determine its market value for specific uses, such as burning or carbonization. These commercial properties encompass coking or caking properties, calorific value, durability, grindability, water content, and others. Proximate analysis is a straightforward method employed to assess coal properties. In the literature, the terms "ash" and "inorganic mineral substances" are often used interchangeably. Ash refers to the residue left behind after coal combustion, and its composition varies with the amount and chemical characteristics of the mineral substances present in the coal. These mineral substances undergo thermal transformations during heating [3]. Hence, the analysis and testing of coal samples are necessary to gain insights into the quality of coal in the study area.
In this study, multivariate statistical analysis, specifically the Principal Component Analysis (PCA) method, is employed for the statistical analysis of coal samples. Multivariate statistical analysis is well-suited for datasets with multiple correlated variables. The PCA method allows for the identification of the key variables contributing to sample quality. The parameters examined are total moisture, volatile matter, fixed carbon, and ash content. The application of PCA in statistical analysis has been demonstrated in various fields, including hydrogeology [4].
Given the aforementioned background, the objective of this study is to discern the differences in coal samples using multivariate statistical analysis. Each variable contributes to the overall quality of the coal samples, and their relative influences facilitate the classification and interpretation of samples based on their geological history and chemical properties. The outcomes of this study will contribute to a better understanding of coal characteristics, aiding efforts to reduce its harmful effects on human health and the environment while maximizing its beneficial applications.

Geological background
The study area is situated in Borneo, also known as Kalimantan, which is part of the micro-Sunda continental plate. In the past, it was a component of the Eurasian plate [5]. The basement rock of Kalimantan consists of Mesozoic granite. During the Cenozoic era, the island underwent subduction processes, resulting in the formation of Cenozoic magmatic products [6]. Since the Paleocene period, weathering processes have occurred on the island, leading to the generation of various sedimentary products. The abundant sedimentation has caused the slopes to flatten. From the Eocene epoch onwards, tidal environments have developed in certain parts of the island. Particularly from the Miocene period, the island experienced a regression process that continued until the Pleistocene [7].
The datasets used in this study were sourced from the publication by the Ministry of Energy and Mineral Resources, Indonesia, focusing on the petrography and geochemistry of coal samples from the Barito Basin on Borneo Island [2] (refer to Figure 2). Based on the geochemical analysis conducted by Wibisono et al. (2019) [2], it was determined that the majority of the samples belonged to the Lignite-Sub-bituminous class in terms of maturity (see Figure 3). The coal outcrops in the Barito Basin are associated with the Dahor Formation, which was deposited during the middle Miocene to early Pleistocene period [8]. This formation, along with the Warukin Formation, is known for its abundant coal deposits, which were formed during a regression sedimentation phase of the Barito Basin [9]. The regression sedimentation phase provided an ideal environment for peat formation and subsequent coalification.

Methodology and the datasets
To determine differences among the coal samples, total moisture, volatile matter, fixed carbon, and ash content were used as input data. Proximate analysis was employed to characterize the coal. Kalimantan Island, a prominent coal producer in Indonesia, is known to exhibit distinct coal characteristics across different regions. Given these characteristics, qualitative and statistical differences among samples can be identified and visualized using statistical analysis, specifically the PCA method.
Coal originates from the formation of peat in mires that provide favorable conditions for peat preservation. These waterlogged environments, often filled with standing or slowly moving water, promote prolific plant growth. When plants fall into the mire, they become submerged, slowing down or preventing rapid decomposition. This results in the accumulation of plant matter and the formation of peat. The organic matter composing peat can either accumulate in place or, less commonly, be transported by flowing water and accumulate nearby. Most coal beds are formed from plant material that accumulates in situ. The process of coal formation involves two primary processes: peatification and coalification (refer to Figure 3).
Coal rank plays a significant role in determining coal quality. Rank refers to the stage reached in the gradual, natural process of coalification, during which buried plant matter becomes denser, drier, richer in carbon, and harder. The major coal ranks, from lowest to highest, are lignite (also known as "brown coal" in some regions), sub-bituminous coal, bituminous coal, and anthracite. An example of coal ranking, as provided by ASTM [10], [11], can be found in Table 1.
Coal geochemistry analysis involves two main types: ultimate analysis and proximate analysis. Ultimate analysis determines the chemical elements present in coal, including carbon, hydrogen, oxygen, nitrogen, and sulphur. Proximate analysis, on the other hand, is conducted to determine the coal rank based on total moisture, fixed carbon, volatile matter, and ash content. According to the Encyclopedia of Coal (2009) [12], the proximate analysis consists of the following components. Total moisture is the combined weight of free moisture and inherent moisture. Free moisture is water that adheres to the coal's surface, filling cracks and capillary holes. Its quantity can be determined by comparing the weight of the coal before and after drying it at room temperature, or under sunlight at 30-40°C for 1-2 hours, until the weight stabilizes. Inherent moisture is water trapped within the internal pores of the coal during its formation; lower-rank coals generally have higher inherent moisture content. Volatile matter is the portion of coal that vaporizes when heated, including gases such as CH4 (methane). Ash content is the inorganic residue that remains after the complete combustion of coal, representing minerals and other non-combustible substances. Lastly, fixed carbon is calculated by subtracting the percentages of moisture, volatile matter, and ash from 100%. Together, these parameters characterize the composition and properties of coal, assisting in its classification and utilization in various industries.
The results of the proximate analysis were obtained from 42 core drill samples taken from 10 coal layers within the study area, as shown in Table 3. On an air-dried basis (adb), volatile matter (%VM) ranges from 29.19% to 47.72%, moisture (%M) from 7.37% to 13.74%, fixed carbon (%FC) from 24.85% to 50.39%, and ash (%Ash) from 3.41% to 38.6%. Additionally, calorific value (CV), Hardgrove Grindability Index (HGI), and density were determined through combustion and physical property analyses. On the same basis, calorific values range from 3,539 cal/g to 5,803 cal/g, density from 1.4% to 1.75%, and HGI from 38 to 64.
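The fixed-carbon-by-difference relation described above can be illustrated with a minimal sketch. The function name `fixed_carbon` and the sample values are hypothetical and chosen only for illustration; they are not taken from Table 3. All inputs are assumed to be weight percentages reported on the same basis (for example adb).

```python
def fixed_carbon(moisture, volatile_matter, ash):
    """Fixed carbon by difference; all inputs in wt% on the same basis."""
    total = moisture + volatile_matter + ash
    if not 0.0 <= total <= 100.0:
        raise ValueError("moisture + volatile matter + ash must lie between 0 and 100 wt%")
    return 100.0 - total


# Hypothetical sample (illustrative values, not from the study's data set)
fc = fixed_carbon(moisture=10.0, volatile_matter=40.0, ash=8.0)
print(f"FC = {fc:.2f} wt%")  # FC = 42.00 wt%
```

Because fixed carbon is obtained by difference, any measurement error in moisture, volatile matter, or ash propagates directly into the fixed carbon value.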
Multivariate analysis [13], specifically Principal Component Analysis (PCA), is a powerful statistical method used to analyze complex data sets with multiple variables. In the context of coal analysis, multivariate statistical analysis helps uncover relationships and patterns among parameters such as total moisture, volatile matter, fixed carbon, and ash content. PCA, as a specific technique within the multivariate analysis, reduces the dimensionality of the data by transforming it into a new set of uncorrelated variables called principal components. These components capture the most significant sources of variation in the data [14]. By applying PCA to coal analysis, it becomes possible to identify the key factors influencing coal quality and to visualize the data in a simplified manner. This methodology enables a deeper understanding of the interrelationships between different coal properties, aiding in quality evaluation, resource estimation, and decision-making related to coal utilization.
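As a rough sketch of the dimensionality reduction described above, the following shows one common way to carry out PCA on the correlation matrix with NumPy. It is not the authors' implementation: the function name `pca_correlation` is hypothetical, and the data are synthetic values drawn loosely within the proximate ranges reported in this study, not the actual 42 samples.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix of X (rows = samples, columns = variables)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable to zero mean and unit (sample) variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix of the original variables
    R = np.corrcoef(X, rowvar=False)
    # eigh suits symmetric matrices; it returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]          # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()        # share of variance per component
    scores = Z @ eigvecs                       # samples in principal component space
    return eigvals, eigvecs, explained, scores


# Synthetic data for illustration only (assumed ranges, not the study's samples)
rng = np.random.default_rng(0)
n = 42
tm = rng.uniform(7.0, 14.0, n)     # total moisture, wt%
vm = rng.uniform(29.0, 48.0, n)    # volatile matter, wt%
ash = rng.uniform(3.0, 30.0, n)    # ash, wt%
fc = 100.0 - tm - vm - ash         # fixed carbon by difference, wt%
X = np.column_stack([tm, vm, fc, ash])

eigvals, eigvecs, explained, scores = pca_correlation(X)
print("explained variance ratio:", np.round(explained, 3))
```

Since fixed carbon is computed by difference from the other three variables, the four variables are linearly dependent and the last eigenvalue is effectively zero; the first two or three components therefore carry essentially all of the variation.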

Results and discussion
Multivariate analysis encompasses statistical methods that simultaneously analyze multiple measurements (variables) for each object [15]. These methods can be grouped into interdependence techniques, in which no variable is singled out as a response, and dependence techniques, in which one or more variables are treated as dependent on the others. Principal Component Analysis (PCA) is employed to reduce the dimensions of a data set with numerous interrelated variables. It accomplishes this by transforming the data into a smaller set of uncorrelated variables that retain most of the variation of the original data [14].
Applying PCA to the coal data in this study rests on the assumption of linearity, which simplifies the problem in two ways: it restricts the set of potential bases, and it formalizes the implicit assumption that the data set is continuous [14]. The primary goal of PCA is to find a linear transformation matrix P that re-expresses the m x n data matrix X in a new basis, giving a new representation Y. The rows of P are the principal components, that is, the directions that capture the most significant variation in X [14].
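One common way to write this transformation is sketched below. It assumes a convention that the text does not state explicitly: X is mean-centered, its m rows are the variables, and its n columns are the samples.

```latex
\begin{aligned}
Y   &= P X, \\
C_X &= \frac{1}{n-1}\, X X^{\mathsf{T}}, \\
C_Y &= \frac{1}{n-1}\, Y Y^{\mathsf{T}} = P \, C_X \, P^{\mathsf{T}}.
\end{aligned}
```

Under this convention, choosing the rows of P to be the orthonormal eigenvectors of C_X makes C_Y diagonal, so the transformed variables are uncorrelated and their variances equal the eigenvalues of C_X, ordered from largest to smallest.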
During the analysis, a correlation matrix is generated to show the correlation coefficients between the variables. This matrix serves multiple purposes, including summarizing the data, providing input for more advanced analyses, and serving as a diagnostic tool [13]. It allows researchers to gain insights into the relationships between different variables and to identify potential patterns or associations. The PCA results reveal four distinct quadrants in the processed coal data. Quadrant I contains coal that is primarily influenced by Volatile Matter and Total Moisture. Quadrants II and III contain coal that is primarily affected by Ash content. Quadrant IV, on the other hand, contains coal samples that are predominantly influenced by Fixed Carbon [4]. These findings clarify the interrelationships between the parameters and help in understanding the factors that contribute to variations in coal quality.
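The quadrant grouping described above can be sketched as follows, assuming the quadrants are defined by the signs of each sample's first two principal component scores with the usual counterclockwise numbering. That definition, the function name `assign_quadrants`, and the demo scores are assumptions for illustration; the text does not specify how the quadrants were drawn.

```python
import numpy as np


def assign_quadrants(scores):
    """Label each sample by the signs of its PC1 and PC2 scores."""
    scores = np.asarray(scores, dtype=float)
    pc1, pc2 = scores[:, 0], scores[:, 1]
    labels = np.empty(len(scores), dtype=object)
    labels[(pc1 >= 0) & (pc2 >= 0)] = "I"     # upper right
    labels[(pc1 < 0) & (pc2 >= 0)] = "II"     # upper left
    labels[(pc1 < 0) & (pc2 < 0)] = "III"     # lower left
    labels[(pc1 >= 0) & (pc2 < 0)] = "IV"     # lower right
    return labels


# Hypothetical PC1/PC2 scores for four samples
demo = np.array([[1.2, 0.4], [-0.8, 0.9], [-1.1, -0.3], [0.5, -0.7]])
print(assign_quadrants(demo).tolist())  # ['I', 'II', 'III', 'IV']
```

The sign of each principal component is arbitrary, so which variables fall in which quadrant depends on the sign convention of the software used; only the relative arrangement of samples and variable loadings is meaningful.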
In the proximate analysis of the coal samples, the observed differences in total moisture, volatile matter, and fixed carbon were relatively insignificant; the exception was ash content [16]. The variation in ash content indicates impurity minerals accumulated within the coal seam, which influence the overall quality of the coal. Furthermore, because fixed carbon is determined by difference, higher concentrations of volatile matter, ash, and moisture correspond to lower fixed carbon content. Volatile matter, which includes flammable gases such as carbon monoxide and methane, plays a pivotal role in the combustion process. Coal with a higher volatile matter content tends to burn more rapidly, and high volatile matter is characteristic of lower-rank coal [16][17]. These findings offer insight into the interrelationships between the parameters in coal analysis and their implications for coal quality and utilization. Multivariate analysis, specifically Principal Component Analysis (PCA), gives researchers a deeper understanding of the dynamics among coal properties and supports well-informed decisions regarding coal quality evaluation, resource estimation, and the formulation of effective strategies for coal utilization [18][19][20][21][22]. It also aids in developing approaches for using coal resources in a sustainable and efficient manner. Continued research and analysis in this field will contribute to advancements in coal science and the improvement of coal-related industries.

Conclusion
The utilization of Principal Component Analysis (PCA) on the coal dataset has yielded intriguing results, unveiling the existence of three distinctive clusters. Cluster 1 predominantly encompasses data characterized by elevated values of Volatile Matter (VM) and Total Moisture (TM). Cluster 2, on the other hand, is predominantly composed of data exhibiting high Ash content. Lastly, Cluster 3 showcases data points exhibiting notable values of Fixed Carbon (FC) [4].
The analysis further indicated that the majority of data points were concentrated within Clusters 1 and 2, signifying the prevalence of high VM, TM, and Ash values. This combination of parameters contributes to a diminished Fixed Carbon content and consequently leads to lower calorific values. Consequently, the coal ranking within the study area is relatively low, specifically classified as lignite to sub-bituminous.
These findings underscore the significance of incorporating VM, TM, Ash, and FC parameters in coal analysis due to their substantial impact on coal quality and calorific value. Understanding the distinct characteristics and ranking of coal holds paramount importance for facilitating effective utilization and informed decision-making processes within the coal industry.
Further investigations and comprehensive analyses can be conducted to explore additional factors that may influence coal quality and ranking, thereby enhancing our understanding of the coal deposits present in the study area. Such insights have the potential to contribute significantly to improved resource estimation, the development of enhanced utilization strategies, and the establishment of more efficient and sustainable coal-based processes [23], [24], [25]. By expanding our knowledge in these areas, we can pave the way for advancements in coal science and technology, promoting more responsible and optimized utilization of this vital energy resource.