# Extract Numbers From Chemical Formula In Excel

In this post we will see how to extract numbers from chemical formula in excel.

Suppose we have a list of chemical formulas as shown below and we would like to pull the number after each of these letters e.g. C,H,N,O etc. the number for atoms can be either single, double or (in the case of H only) triple digit.

Thus the data looks like this:

• C20H37N1O5
• C10H12O3
• C20H19N3O4
• C23H40O3
• C9H13N1O3
• C14H26O4
• C58H100N2O9

We would like to have each element number for the list in separate columns. So in the first example it would be 20 37 1 5 corresponding to C20H37N1O5

We will write a function in VBA that will extract numbers from chemical formula of a compound and list down them below their atomic symbol.

Step 1

Open VB editor from Developer tab in excel as shown in the figure.

Step 2

Insert a module in which we will write a function that will extract numbers from chemical formula.

Step 3

Now double click the newly created module to open that and paste the below code in the module.

Option Explicit

Public Function ChemRegex(ByVal ChemFormula As String, ByVal Element As String) As Long

Dim strPattern As String

strPattern = “([CNHO])([0-9]*)”

‘this pattern is limited to the elements C, N, H and O only.

Dim regEx As New RegExp

Dim Matches As MatchCollection, m As Match

If strPattern <> “” Then

With regEx

.Global = True

.MultiLine = True

.IgnoreCase = False

.Pattern = strPattern

End With

Set Matches = regEx.Execute(ChemFormula)

For Each m In Matches

If m.SubMatches(0) = Element Then

ChemRegex = IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)

‘this IIF ensures that in CH4O the C and O are count as 1

Exit For

End If

Next m

End If

End Function

Step 4

Add reference to regex under Tools then References and select Microsoft VBScript Regular Expression 5.5

Step 5

Now, create a row with all the atomic elements that you may have in your chemical formulas from which the numbers will be extracted as shown below in the picture e.g. C,H,S,O,N etc

Insert the formula =ChemRegex(\$A2,B\$1) in the cell B2 that will extract the number of element in the chemical formula for the atomic element that we have in cell B1.

Now copy and paste the formula through entire range and we will get the numbers extracted from chemical compound in excel sheet.

We can use this function to extract the numbers from any chemical formula including the formula that has parentheses in it and you can customize the values in Row 1 to accommodate all the atomic elements that you may have in your list.

Hope this helped.

Share The Knowledge