Extract Numbers From Chemical Formula In Excel

In this post we will see how to extract numbers from chemical formula in excel.

Suppose we have a list of chemical formulas as shown below and we would like to pull the number after each of these letters e.g. C,H,N,O etc. the number for atoms can be either single, double or (in the case of H only) triple digit.

Thus the data looks like this:

  • C20H37N1O5
  • C10H12O3
  • C20H19N3O4
  • C23H40O3
  • C9H13N1O3
  • C14H26O4
  • C58H100N2O9




We would like to have each element number for the list in separate columns. So in the first example it would be 20 37 1 5 corresponding to C20H37N1O5

We will write a function in VBA that will extract numbers from chemical formula of a compound and list down them below their atomic symbol.

Step 1

Open VB editor from Developer tab in excel as shown in the figure.

Open developer tab to write VBA to extract numbers from chemical formula

Step 2

Insert a module in which we will write a function that will extract numbers from chemical formula.

Insert a module in the VB editor

Step 3

Now double click the newly created module to open that and paste the below code in the module.

Option Explicit

 Public Function ChemRegex(ByVal ChemFormula As String, ByVal Element As String) As Long

     Dim strPattern As String

     strPattern = “([CNHO])([0-9]*)”

                  ‘this pattern is limited to the elements C, N, H and O only.

     Dim regEx As New RegExp

 

     Dim Matches As MatchCollection, m As Match

 

     If strPattern <> “” Then

         With regEx

             .Global = True

             .MultiLine = True

             .IgnoreCase = False

             .Pattern = strPattern

         End With

 

         Set Matches = regEx.Execute(ChemFormula)

         For Each m In Matches

             If m.SubMatches(0) = Element Then

                 ChemRegex = IIf(Not m.SubMatches(1) = vbNullString, m.SubMatches(1), 1)

                             ‘this IIF ensures that in CH4O the C and O are count as 1

                 Exit For

             End If

         Next m

     End If

 End Function




VBA code to get chemical numbers extracted from chemical formula

Step 4

Add reference to regex under Tools then References and select Microsoft VBScript Regular Expression 5.5

Excel formula to pull chemical number from chemical formula

Step 5

Now, create a row with all the atomic elements that you may have in your chemical formulas from which the numbers will be extracted as shown below in the picture e.g. C,H,S,O,N etc

Excel formula to pull number from chemical formula

Insert the formula =ChemRegex($A2,B$1) in the cell B2 that will extract the number of element in the chemical formula for the atomic element that we have in cell B1.

Extract number from chemical formula

Now copy and paste the formula through entire range and we will get the numbers extracted from chemical compound in excel sheet.

get number extracted from chemical formula in excel




We can use this function to extract the numbers from any chemical formula including the formula that has parentheses in it and you can customize the values in Row 1 to accommodate all the atomic elements that you may have in your list.

Hope this helped.

Share The Knowledge

Random Posts