Can data science be used to improve the chemical industry? Yes! In this blog post, I show how I used the least squares statistical algorithm to build a model that estimates the mass of NaCl produced during the acid-base reaction of NaHCO3 and excess HCl.


I began by performing an acid-base reaction using excess acid to ensure that the reaction went to completion. My acid of choice was hydrochloric acid (HCl) and my base was sodium bicarbonate (NaHCO3). I ran the reaction 25 times in separate containers and then weighed the salt produced in each run, which in my experiment was sodium chloride (NaCl).
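Because the acid is in excess, NaHCO3 is the limiting reagent, and the balanced equation NaHCO3 + HCl -> NaCl + H2O + CO2 gives one mole of NaCl per mole of NaHCO3. The sketch below shows how a theoretical yield could be computed as a reference point for the measured masses. It is not code from my notebook: the function name is made up for illustration, and the molar masses are standard values rounded to two decimals.

```python
# Standard molar masses in g/mol, rounded to two decimals.
M_NAHCO3 = 84.01
M_NACL = 58.44


def theoretical_nacl_mass(nahco3_mass_g: float) -> float:
    """Theoretical NaCl yield in grams, assuming NaHCO3 is the limiting reagent.

    NaHCO3 + HCl -> NaCl + H2O + CO2 has a 1:1 mole ratio between
    NaHCO3 consumed and NaCl produced.
    """
    moles_nahco3 = nahco3_mass_g / M_NAHCO3
    return moles_nahco3 * M_NACL


# Hypothetical example input: 2.0 g of NaHCO3.
print(f"{theoretical_nacl_mass(2.0):.2f} g of NaCl expected")
```

A measured mass well below this value would suggest an incomplete reaction or loss of product, while a mass above it would suggest the salt was not fully dried.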

Visualizing The Data

I then used Python in a Kaggle notebook to compute some key metrics of my data and to draw two scatter plots, one for each independent variable (volume of HCl and mass of NaHCO3) against the dependent variable (mass of NaCl).
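A minimal sketch of this step might look like the following. The numbers are hypothetical placeholders, not my 25 measurements, and the helper names `summarize` and `plot_scatter` are invented for illustration. The plotting helper assumes matplotlib is installed, as it is in a Kaggle notebook.

```python
import statistics

# Hypothetical placeholder values, NOT the measurements from this post.
nahco3_mass_g = [1.0, 2.0, 3.0, 4.0, 5.0]
hcl_volume_ml = [20.0, 25.0, 30.0, 35.0, 40.0]
nacl_mass_g = [0.68, 1.37, 2.05, 2.76, 3.41]


def summarize(values):
    """Return basic descriptive statistics for one column of data."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def plot_scatter(x, y, x_label, y_label):
    """Draw one scatter plot and return the figure.

    matplotlib is imported here so the summary code runs without it.
    """
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel(x_label)
    ax.set_ylabel(y_label)
    return fig


columns = {
    "Mass of NaHCO3 (g)": nahco3_mass_g,
    "Volume of HCl (mL)": hcl_volume_ml,
    "Mass of NaCl (g)": nacl_mass_g,
}
for name, values in columns.items():
    print(name, summarize(values))
```

Calling `plot_scatter(nahco3_mass_g, nacl_mass_g, "Mass of NaHCO3 (g)", "Mass of NaCl (g)")` and then the same with `hcl_volume_ml` would give the two plots described above.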

Using The Least Squares Algorithm

The last step was to use least-squares linear regression to find the best-fit line through the data, which can then serve as a model for estimating the mass of product formed from new values of NaHCO3 and excess HCl. The image below shows the results of this operation as well as some other important information, such as the R-squared value, which measures how much of the variation in the mass of NaCl is explained by the fitted line.
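For readers who want to see what the fit involves, here is a small sketch of ordinary least squares for a single predictor, written in plain Python. It is an illustration under stated assumptions rather than the code I ran: `fit_line` is a hypothetical helper, the data are made-up placeholder values, and only the mass of NaHCO3 is used as the predictor.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot

    return slope, intercept, r_squared


# Hypothetical placeholder values, NOT the measurements from this post.
nahco3_mass_g = [1.0, 2.0, 3.0, 4.0, 5.0]
nacl_mass_g = [0.68, 1.37, 2.05, 2.76, 3.41]

slope, intercept, r_squared = fit_line(nahco3_mass_g, nacl_mass_g)
print(f"slope={slope:.3f} g/g, intercept={intercept:.3f} g, R^2={r_squared:.4f}")

# Estimate the NaCl mass for a new, hypothetical 2.5 g sample of NaHCO3.
print(f"estimated NaCl mass: {intercept + slope * 2.5:.2f} g")
```

An R-squared close to 1 means the line accounts for nearly all of the variation in the measured NaCl mass, and a slope near 0.70 g of NaCl per gram of NaHCO3 would match the 1:1 mole ratio of the reaction.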


As you can see above, it is possible to build a model that estimates, with some level of accuracy, the mass of NaCl formed by an acid-base reaction of NaHCO3 and excess HCl. I hope that this blog post inspires you to use machine learning.

It is worth noting that more investigation is needed to ensure the best results.

* Since technology is continually developing, by the time you read this blog the products used may have changed.
