Why is geology sometimes said not to be a science?

11 minute read

Published: November 03, 2023

I recently came down with a severe flu. One night, lying in bed with a fever and drifting in and out of sleep, I dreamt that a journal reviewer was fiercely criticizing the discussion section of one of my manuscripts, writing that it “used too many untenable assumptions”.

After struggling through the night, I got up the next morning and reflected on it—and realized the criticism was quite valid. Such critique, after all, is precisely part of a reviewer’s responsibility.

Based on my current understanding of the peer-review process, as long as the manuscript’s language is acceptable, the citations are appropriate, and the structure conforms to the standard “formula” of academic papers, the editor’s primary responsibilities are: (1) ensuring the manuscript fits the aims and scope of the journal; (2) checking whether the scientific problem addressed is of interest to the journal’s audience; (3) conducting a preliminary assessment of whether the conclusions advance understanding of the scientific question. The reviewer’s main duties, on the other hand, are: (1) evaluating whether the data presented are reliable; (2) examining whether the hypotheses proposed are reasonable; (3) verifying whether the logic of the argument is sound.

Considering the limitations of funding and experimental conditions, geologists often find it difficult to employ cutting-edge or high-risk technical methods to produce extremely precise or advanced data. In many cases, the analytical techniques used in their manuscripts are well-established and widely replicable across research groups—such as energy-dispersive spectroscopy (EDS), electron probe microanalysis (EPMA), and bulk or mineral major and trace element analyses.

Under such circumstances, the data presented in a manuscript are generally considered reliable. Therefore, the assumptions made to interpret those data and address a scientific question become crucial—often determining the fate of the manuscript.

For instance, a short paper I submitted at the end of last year, which focused on the cooling rates of Chang’e-5 basalts, was rejected after the first round of peer review. The reason was that I had used a mineral growth rate parameter to calculate cooling rates, but the parameter had not been calibrated under the specific crystallization conditions of the Chang’e-5 basalts, and thus was deemed inappropriate. It wasn’t until February of this year that a foreign research group recalculated the growth rate under those exact conditions. After incorporating their result into the manuscript, the paper was eventually accepted and published.

This example clearly illustrates that the core issue of contention was the reasonableness of the assumption: whether the growth rate taken from previous literature was appropriate for application in this particular study.

In geology, because natural samples are both historical and complex, researchers can rarely arrive at a specific explanation for a scientific question from direct observation alone, with no assumptions or with first principles only. This methodological limitation sets geology apart from many other scientific disciplines.

For example, in space physics, the subjects of research—such as Earth’s ionosphere or solar wind activity—are typically present-day phenomena and ongoing processes. As such, key scientific questions, like “Are we entering a new Maunder Minimum?”, can often be addressed through real-time observational data and calculations based on fundamental physical principles.

In contrast, geological research is fundamentally retrospective: it aims to infer the past by examining natural samples currently available. The core challenge is to strip away, as much as possible, the overprinting effects of various post-formational processes that the sample may have undergone, in order to reconstruct the conditions, states, and environments at or even before its time of formation.

Because the past cannot be directly observed, this inferential process inevitably involves a multitude of assumptions. This is also one of the key reasons why, compared to other natural sciences, geological studies are generally more difficult to publish in top-tier journals like Nature or Science.

However, this heavy reliance on assumptions is one of the key reasons why geology is sometimes viewed as being “less scientific.” Geophysics has a fundamental concept known as sensitivity testing, or robustness testing. For a multi-parameter numerical model, each parameter is varied across its plausible range, using error propagation or Monte Carlo simulation, to find out which parameters the results are most sensitive to. This helps assess whether the model is stable and reliable.

Theoretically, a model should only be considered scientifically valid for use in a study if it has passed such tests. However, if a model is built upon too many non-fundamental assumptions, it often cannot pass a sensitivity test—meaning it lacks scientific robustness.
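As a rough illustration of the sensitivity test described above, here is a minimal Python sketch. The model `toy_model`, its three parameters, and their ranges are all hypothetical placeholders, not taken from any real study. The sketch varies one parameter at a time over its plausible range and ranks the parameters by how much the output scatters.

```python
import random
import statistics


def toy_model(params):
    """Hypothetical three-parameter model standing in for a real one."""
    return params["a"] * params["b"] ** 2 + 0.1 * params["c"]


def one_at_a_time_sensitivity(model, nominal, ranges, n_draws=5000, seed=0):
    """Vary one parameter at a time; return the output scatter per parameter."""
    rng = random.Random(seed)
    spread = {}
    for name, (low, high) in ranges.items():
        outputs = []
        for _ in range(n_draws):
            trial = dict(nominal)                 # keep the others at nominal
            trial[name] = rng.uniform(low, high)  # sample this one in its range
            outputs.append(model(trial))
        spread[name] = statistics.pstdev(outputs)
    return spread


# Assumed nominal values and +/-10% plausible ranges (illustrative only).
nominal = {"a": 1.0, "b": 2.0, "c": 3.0}
ranges = {"a": (0.9, 1.1), "b": (1.8, 2.2), "c": (2.7, 3.3)}

spread = one_at_a_time_sensitivity(toy_model, nominal, ranges)
ranking = sorted(spread, key=spread.get, reverse=True)
print("most to least sensitive:", ranking)
```

Varying one parameter at a time is the simplest possible design. It ignores interactions between parameters, so a real test would more likely sample all parameters jointly.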

Unfortunately, the field of geology is rife with such models that have not undergone sensitivity testing. And yet, in many cases, both authors and reviewers are tacitly aware of this “unscientific” aspect, but choose to accept it nonetheless. Why is that the case?

Let us consider a classic research approach in planetary materials science: I have a cumulate rock from Mars, and I want to understand its origin and the nature of its magmatic source region. To this end, I conduct SEM imaging, use EPMA and LA-ICP-MS to constrain the major and trace element compositions of its minerals, and apply SIMS to measure volatile contents and isotopic variations along mineral profiles.

After gathering the data, I proceed with the following steps:

(1) Use empirical geobarometers and geothermometers to estimate the pressure and temperature conditions under which the sample formed;
(2) Apply mineral–melt partition coefficients to back-calculate the trace element composition of the parent magma;
(3) Use volatile solubility models and isotopic fractionation equations to estimate the volatile contents of the magma;
(4) Finally, utilize thermodynamic modeling tools to explore the degrees of partial melting and subsequent fractional crystallization that the Martian mantle source must have undergone to produce the observed petrological, mineralogical, and geochemical features of this cumulate rock.

In doing so, I aim to construct a comprehensive magmatic evolution model of Mars—from source region to cumulate formation.

Now, let’s consider how many non-fundamental assumptions are embedded in this process:

(1) In the P-T estimation step, I assume that the sample’s composition and formation conditions fall within the calibration range of the geothermobarometers, and that the combined error from EPMA measurements and the empirical equations does not render the results meaningless.
(2) In the magma composition reconstruction step, I assume that the mineral–melt partition coefficients and volatile solubility models derived from previous experimental studies are applicable to my sample, and that the only geological processes the sample records are partial melting of the source region and fractional crystallization, with no other modification before it was analyzed.
(3) In the thermodynamic modeling step, I assume that the activity models of minerals used in the software are valid for this system, that the initial Martian mantle composition is correctly estimated, and that all the assumptions and derived results from the previous two steps are accurate.

This example highlights how geological interpretations often rest on layers of interdependent assumptions, most of which go far beyond first principles.

However, are all of these non-fundamental assumptions truly valid? In most cases, the reality is that none of the assumptions listed above actually hold. For example:

The empirical geothermobarometers are not calibrated specifically for this magmatic composition.
The trace element partitioning models (such as the lattice strain model) carry large uncertainties even at the stage of experimental calibration.
The sample may well have experienced additional geological processes such as assimilation, magma mixing, or secondary alteration, which are not accounted for.
Different thermodynamic modeling tools (e.g., Thermocalc, Petrolog, MELTS, SPICEs, etc.) often yield divergent results even when provided with similar inputs.

In light of these uncertainties, many geological studies essentially “work backward from the answer”—that is, the researcher begins with a desired or expected conclusion, and then tweaks parameters and assumptions throughout the modeling process to derive results that are consistent with this conclusion. As for discrepancies that the model fails to explain, they are conveniently attributed to “other geological processes not considered”.

What facilitates this practice most effectively is what I call the “stacking of assumptions”: you derive one result from an assumption, and then use that result, together with another assumption, to derive a second result. For instance, you might first estimate the parent magma composition (and with it the degree of fractional crystallization) from assumed partition coefficients, and then combine that composition with an assumed mantle source composition to infer the degree of partial melting.

How reliable is this final estimate of partial melting? Well, any discerning reader can probably draw their own conclusions.
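To put a rough number on the stacking effect, here is a minimal Monte Carlo sketch in Python. Everything in it is an assumption made for illustration: the concentrations, the partition coefficients, the relative uncertainties, and the choice of the standard batch melting equation, C_melt / C_source = 1 / (D_bulk + F (1 - D_bulk)), as the melting model. The text above does not specify any of these.

```python
import math
import random
import statistics


def melt_from_mineral(c_mineral, d_mineral):
    """Step 1: back-calculate melt concentration from a mineral analysis."""
    return c_mineral / d_mineral


def melt_fraction_batch(c_melt, c_source, d_bulk):
    """Step 2: invert the batch melting equation for the melt fraction F."""
    return (c_source / c_melt - d_bulk) / (1.0 - d_bulk)


def perturb(rng, nominal, rel_sigma):
    """Draw a positive value scattered log-normally around the nominal one."""
    return nominal * math.exp(rng.gauss(0.0, rel_sigma))


def stacked_uncertainty(n_draws=20000, seed=1):
    """Propagate assumed input scatter through both steps."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_draws):
        c_mineral = perturb(rng, 2.0, 0.05)   # measured, ~5% (assumed)
        d_mineral = perturb(rng, 0.10, 0.30)  # partition coeff., ~30% (assumed)
        c_source = perturb(rng, 2.0, 0.20)    # mantle source, ~20% (assumed)
        c_melt = melt_from_mineral(c_mineral, d_mineral)
        fractions.append(melt_fraction_batch(c_melt, c_source, d_bulk=0.01))
    return statistics.mean(fractions), statistics.pstdev(fractions)


mean_f, sd_f = stacked_uncertainty()
rel_f = sd_f / mean_f
print(f"F = {mean_f:.3f} +/- {sd_f:.3f} (relative spread {rel_f:.0%})")
```

With these made-up inputs, the relative spread of the inferred melt fraction comes out larger than that of any single input. This is before counting the uncertainty in the assumed bulk partition coefficient, or the chance that batch melting is the wrong model altogether.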

Why, then, does geological research lean so heavily on assumptions?

First and foremost, it must be acknowledged that natural samples are incredibly complex. Even when considering just major elements, one must deal with Si, Ti, Al, Cr, Fe, Mn, Mg, Ca, Na, and K. Each additional element introduces another degree of freedom to the system, amplifying the “many-body problem” and the inherent chaotic nature of complex systems. Trying to determine just one or two unknowns (e.g., pressure–temperature conditions based on whole-rock composition, or trace element partition coefficients based on mineral–melt data) from such a tangled web of independently varying parameters is extremely difficult—and introduces substantial systematic uncertainties.

By comparison, think of physics or materials science: research often focuses on simple compounds made up of just two or three elements (e.g., C, N, B), enabling precise understanding of thermodynamic and kinetic behavior. If those same fields were forced to tackle the complex mineralogical systems found in nature, they might also be left “groping in the dark.”

Second, statistics remains a weak point for many geoscientists. For example, when fitting complex thermodynamic models or empirical geothermobarometers in geological studies, researchers almost never report the covariance matrix of the fitting parameters—only their error ranges. However, since these parameters are often not independent, knowing just their individual uncertainties is insufficient to conduct error propagation or perform a sensitivity analysis. Consequently, many model verification techniques that work in other scientific disciplines simply fail in geoscience. Both authors and reviewers are left without the tools to quantitatively evaluate the robustness of a model, and as a result, countless geological models degenerate into non-falsifiable “pseudo-science.”
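The covariance point can be made concrete with a small Python sketch. It assumes a hypothetical linear calibration y = a + b·x, for example an empirical thermometer. The parameter uncertainties and the correlation coefficient are invented for illustration. To first order, the variance of a prediction is σ_a² + x²σ_b² + 2x·cov(a, b), and the last term is exactly what is lost when only individual error ranges are reported.

```python
import math


def predicted_sigma(x, sigma_a, sigma_b, cov_ab=0.0):
    """First-order uncertainty of y = a + b * x, with the a-b covariance term."""
    variance = sigma_a ** 2 + (x * sigma_b) ** 2 + 2.0 * x * cov_ab
    return math.sqrt(variance)


# Hypothetical fit results: intercept and slope uncertainties, and a strong
# negative correlation between them, as is common for regression parameters.
sigma_a, sigma_b = 40.0, 4.0
rho = -0.95
cov_ab = rho * sigma_a * sigma_b

x = 10.0
independent = predicted_sigma(x, sigma_a, sigma_b)         # covariance ignored
correlated = predicted_sigma(x, sigma_a, sigma_b, cov_ab)  # covariance included

print(f"ignoring covariance:  +/- {independent:.1f}")
print(f"including covariance: +/- {correlated:.1f}")
```

In this made-up case, ignoring the covariance overstates the prediction uncertainty by more than a factor of four. With a positive correlation it would be understated instead. Without the published covariance matrix, a reader cannot tell which.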

People often say that the 21st century is the century of life sciences, a field in which profound breakthroughs have been made thanks to our molecular-scale understanding of biological systems, especially genetics. In the same spirit, one could imagine that geoscience may only emerge from its current “pre-scientific” state when we can understand and predict the thermodynamic and kinetic behavior of multi-element complex systems at the atomic scale.

So rather than saying “geology is not a science,” perhaps it is more accurate to say:

“Geology is a science of the future.”

Only when mathematics, physics, chemistry, and materials science reach far more advanced levels will humanity truly be able to comprehend the profound complexity of natural crystalline systems—and the mysterious, ever-changing processes of our planet and beyond.

Zilong Wang
