In this post we will see how to extract numbers from chemical formula in excel.
Suppose we have a list of chemical formulas as shown below and we would like to pull the number after each of these letters e.g. C,H,N,O etc. the number for atoms can be either single, double or (in the case of H only) triple digit.
Thus the data looks like this:
- C20H37N1O5
- C10H12O3
- C20H19N3O4
- C23H40O3
- C9H13N1O3
- C14H26O4
- C58H100N2O9
We would like to have each element number for the list in separate columns. So in the first example it would be 20 37 1 5 corresponding to C20H37N1O5
We will write a function in VBA that will extract numbers from chemical formula of a compound and list down them below their atomic symbol.
Step 1
Open VB editor from Developer tab in excel as shown in the figure.
Step 2
Insert a module in which we will write a function that will extract numbers from chemical formula.
Step 3
Now double click the newly created module to open that and paste the below code in the module.
Option Explicit
Public Function ChemRegex(ByVal ChemFormula As String, ByVal Element As String) As Long
Dim strPattern As String
strPattern = “([CNHO])([0-9]*)”
‘this pattern is limited to the elements C, N, H and O only.
Dim regEx As New RegExp
Dim Matches As MatchCollection, m As Match
If strPattern <> “” Then
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
Set Matches = regEx.Execute(ChemFormula)
For Each m In Matches
If m.SubMatches(0) = Element Then
ChemRegex = IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)
‘this IIF ensures that in CH4O the C and O are count as 1
Exit For
End If
Next m
End If
End Function
Step 4
Add reference to regex under Tools then References and select Microsoft VBScript Regular Expression 5.5
Step 5
Now, create a row with all the atomic elements that you may have in your chemical formulas from which the numbers will be extracted as shown below in the picture e.g. C,H,S,O,N etc
Insert the formula =ChemRegex($A2,B$1) in the cell B2 that will extract the number of element in the chemical formula for the atomic element that we have in cell B1.
Now copy and paste the formula through entire range and we will get the numbers extracted from chemical compound in excel sheet.
We can use this function to extract the numbers from any chemical formula including the formula that has parentheses in it and you can customize the values in Row 1 to accommodate all the atomic elements that you may have in your list.
Hope this helped.