Augmenting RSDF output from SAP BW

This is a bit of a niche post, forgive me. But it does show how a knowledge of open source techniques (particularly Python) can get you out of many a sticky situation.

I recently helped implement SAP BW/4HANA at a local subsidiary of a Tier 1 global retailer. The project was supported by a third-party testing team. They were not an SAP shop, and they had high expectations for the documentation they would receive as an input to their testing activities.

We decided to use the self-generated documentation in SAP BW/4HANA, via transaction code RSDF, to document our data flows and object definitions. This is a tremendous way to extract detailed, system-generated documentation with just a few clicks.

However, we discovered that while RSDF does a good job of documenting most of the SAP BW system, it falls short on the BW Transformation rules, especially Formula Routines. The Transformation rules for 1:1 mappings, constants and ABAP routines are documented fine, but the Formula details are not populated.

Not having a 100% complete record meant that we could not rely on the RSDF-generated documentation. This in turn meant we would have to create the system documentation manually, at a tremendous cost in effort and time. The small omission of Formula details in the RSDF documentation had put our go-live date in jeopardy.

That’s when I spent a little time brainstorming a solution. Necessity being the mother of invention, we noticed that the RSDF output was an HTML document, and it struck me that we could fix this little challenge with Python screen-scraping techniques. If we could take the HTML document as an input and discover which detail was missing, we could potentially collect the missing detail through an SQL call to the BW database and then inject it into the document to fill the gap.

It worked!

Here is a description of the script we used (rough & ready, and ripe for refinement). Note that the full script is available at the end of the article:

1. Import Libraries:

The xxx_dbconfig reference points to my connection credentials, which are stored in a separate file with the following content:
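For illustration only, a credentials module of this kind might look like the sketch below. Every name and value in it is a hypothetical placeholder. The use of SAP's `hdbcli` Python driver is also an assumption, not something the article states; that driver's `dbapi.connect()` accepts `address`, `port`, `user` and `password`. Reading the secrets from environment variables keeps them out of the script itself.

```python
import os

# Hypothetical stand-in for the separate credentials file. All names and
# default values are placeholders; real secrets should come from the
# environment (or a vault), never from source control.
HANA_HOST = os.environ.get("HANA_HOST", "hana.example.internal")
HANA_PORT = int(os.environ.get("HANA_PORT", "30015"))  # placeholder SQL port
HANA_USER = os.environ.get("HANA_USER", "")
HANA_PASSWORD = os.environ.get("HANA_PASSWORD", "")


def connection_kwargs():
    """Return keyword arguments shaped for hdbcli.dbapi.connect() (assumed driver)."""
    return {
        "address": HANA_HOST,
        "port": HANA_PORT,
        "user": HANA_USER,
        "password": HANA_PASSWORD,
    }
```

With `hdbcli` installed, the main script could then open a connection with `dbapi.connect(**connection_kwargs())` after `from hdbcli import dbapi`.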

2. Function to build list of HTML files

Here we look in the subdirectory ‘/input_html/’ for HTML files and return a list of files for further processing.
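A minimal sketch of this step with `pathlib` might look like the following. The function name `list_html_files` is hypothetical. It resolves the folder relative to the working directory and accepts both `.html` and `.htm` extensions, which is an assumption about what the RSDF export produces.

```python
from pathlib import Path


def list_html_files(input_dir="input_html"):
    """Return the HTML files directly inside input_dir, sorted by name.

    Assumes RSDF exports use a .html or .htm extension (case-insensitive).
    Raises FileNotFoundError if the folder does not exist.
    """
    folder = Path(input_dir)
    return sorted(
        path
        for path in folder.iterdir()
        # Skip sub-folders and anything that is not an HTML export.
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}
    )
```

Sorting the result keeps the processing order stable from run to run, which makes it easier to compare output between runs.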

3. Function to extract the soup

Here we take each file and parse its contents into a ‘soup’ of HTML (a BeautifulSoup object) so that we can search it for the places where the detail needs to be added.

4. Function to find Transformations in the soup

Here we search our soup for the Transformation blocks, which are returned as a list.
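The sketch below shows one way to locate Transformation blocks. To stay dependency-free, it swaps in Python's built-in `html.parser` for BeautifulSoup. It also assumes, hypothetically, that each Transformation block starts with a heading whose text begins with ‘Transformation’. The real RSDF markup should be inspected first, and the tag set and prefix adjusted to match.

```python
from html.parser import HTMLParser


class TransformationFinder(HTMLParser):
    """Collect the text of headings that introduce a Transformation.

    Hypothetical assumption: each Transformation block in the RSDF HTML
    starts with a heading tag whose text begins with the given prefix.
    """

    HEADING_TAGS = {"h1", "h2", "h3", "h4"}

    def __init__(self, prefix="Transformation"):
        super().__init__()
        self.prefix = prefix
        self.matches = []
        self._in_heading = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <H3> is matched as well.
        if tag in self.HEADING_TAGS:
            self._in_heading = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._in_heading:
            self._in_heading = False
            # Normalise whitespace before testing the prefix.
            text = " ".join("".join(self._buffer).split())
            if text.startswith(self.prefix):
                self.matches.append(text)

    def handle_data(self, data):
        if self._in_heading:
            self._buffer.append(data)


def find_transformations(html_text):
    """Return the heading text of every Transformation block, in document order."""
    finder = TransformationFinder()
    finder.feed(html_text)
    finder.close()
    return finder.matches
```

With BeautifulSoup, the same filter can be written as a comprehension over `soup.find_all(["h1", "h2", "h3", "h4"])` that checks each tag's `get_text(" ", strip=True)`.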

5. Function to clean up the soup

Here we take an extract of the soup and clean up known issues with the way RSDF formats the HTML.
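The article does not list the RSDF formatting issues it cleaned up. The rules in this sketch are therefore hypothetical examples of the kind of text-level fixes such a step might apply: replacing non-breaking spaces, dropping empty paragraphs and collapsing runs of spaces. Plain regular-expression substitution is adequate for simple string fixes like these. Structural changes are safer on a parsed tree.

```python
import re

# Hypothetical clean-up rules, applied in order. Replace them with the
# actual quirks found in your RSDF output.
CLEANUP_RULES = [
    (re.compile(r"\u00a0|&nbsp;"), " "),  # non-breaking spaces -> plain spaces
    (re.compile(r"<p>\s*</p>"), ""),        # drop empty paragraphs
    (re.compile(r"[ \t]{2,}"), " "),        # collapse runs of spaces and tabs
]


def clean_html(html_text):
    """Apply each clean-up rule to the HTML text and return the result."""
    for pattern, replacement in CLEANUP_RULES:
        html_text = pattern.sub(replacement, html_text)
    return html_text
```

The last rule would also squash whitespace inside `<pre>` blocks. It therefore belongs before the Formula HTML is injected, which matches the order of the main block.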

6. Function to retrieve the missing detail from SAP BW

Here we go to the SAP BW database and look up the missing detail for the Transformations that contain Formulas.
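Here is a sketch of the look-up. It is written against the generic Python DB-API, so it can be exercised with `sqlite3`. The table `FORMULA_DETAIL` and its columns are hypothetical stand-ins, because the article does not say which BW tables hold the Formula definitions. The query uses `?` placeholders rather than string formatting, so the Transformation ID is passed as a bound parameter. Check that your HANA driver accepts this parameter style.

```python
import sqlite3  # used only to demonstrate the function without a HANA system

# Hypothetical table and column names; substitute the BW tables that
# actually hold the Formula definitions on your system.
FORMULA_SQL = (
    "SELECT RULE_ID, FORMULA_TEXT FROM FORMULA_DETAIL "
    "WHERE TRAN_ID = ? ORDER BY RULE_ID"
)


def fetch_formulas(connection, tran_id):
    """Return {rule_id: formula_text} for one Transformation via any DB-API connection."""
    cursor = connection.cursor()
    try:
        cursor.execute(FORMULA_SQL, (tran_id,))  # bound parameter, not string formatting
        return {rule_id: text for rule_id, text in cursor.fetchall()}
    finally:
        cursor.close()
```

An empty dictionary means the Transformation has no Formula rules, so the later steps can skip it.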

7. Function to inject the Formula detail into the HTML

This function loops through the list of relevant Transformations (those with missing Formula definitions) and adds in the detail that was collected from the SAP BW database.
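Here is one way to build and inject the Formula HTML, again using only the standard library instead of BeautifulSoup. `build_formula_html` renders a hypothetical `{rule_id: formula_text}` mapping as a small table. It escapes the formula text with `html.escape` so that operators such as `<` cannot break the markup. `inject_after` inserts the fragment straight after an anchor string, for example the heading of the Transformation being patched. Both function names and the table layout are assumptions.

```python
from html import escape


def build_formula_html(formulas):
    """Render {rule_id: formula_text} as an HTML table (hypothetical layout)."""
    rows = "".join(
        # Escape both cells so formula operators like < and & stay literal text.
        f"<tr><td>{escape(str(rule_id))}</td><td><pre>{escape(text)}</pre></td></tr>"
        for rule_id, text in sorted(formulas.items())
    )
    header = "<tr><th>Rule</th><th>Formula</th></tr>"
    return f'<table class="formula-detail">{header}{rows}</table>'


def inject_after(html_text, anchor, fragment):
    """Insert fragment immediately after the first occurrence of anchor."""
    index = html_text.find(anchor)
    if index == -1:
        raise ValueError(f"anchor not found in HTML: {anchor!r}")
    end = index + len(anchor)
    return html_text[:end] + fragment + html_text[end:]
```

Raising an error when the anchor is missing makes a mismatch between the assumed markup and the real RSDF output fail loudly instead of silently producing an unpatched document.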

8. Function to write the corrected HTML to file

Here we take the corrected HTML and write it back out to file.
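Here is a sketch of the write step. The `output_html` folder name is a hypothetical choice. The function keeps the original file name, so each corrected document can be matched to its RSDF source. It writes UTF-8, so if the RSDF HTML declares a different charset in a `<meta>` tag, that tag should be updated to match.

```python
from pathlib import Path


def write_html(html_text, source_path, output_dir="output_html"):
    """Write corrected HTML to output_dir under the source file's name; return the new path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    target = out_dir / Path(source_path).name
    target.write_text(html_text, encoding="utf-8")
    return target
```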

9. Function to convert HTML files to PDFs
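The article does not say which converter it used. This sketch assumes the `wkhtmltopdf` command-line tool, which takes an input HTML path followed by an output PDF path. `pdf_command`, `html_to_pdf` and the `output_pdf` folder are hypothetical names. The function returns `None` when the tool is not on the `PATH`, so the PDF step can stay optional, as it is in the main block.

```python
import shutil
import subprocess
from pathlib import Path


def pdf_command(html_path, pdf_dir="output_pdf", tool="wkhtmltopdf"):
    """Build the converter command line and the target PDF path (assumed CLI: tool IN OUT)."""
    html_path = Path(html_path)
    pdf_path = Path(pdf_dir) / html_path.with_suffix(".pdf").name
    return [tool, str(html_path), str(pdf_path)], pdf_path


def html_to_pdf(html_path, pdf_dir="output_pdf", tool="wkhtmltopdf"):
    """Convert one HTML file to PDF; return the PDF path, or None if the tool is missing."""
    if shutil.which(tool) is None:
        return None
    command, pdf_path = pdf_command(html_path, pdf_dir, tool)
    pdf_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return pdf_path
```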

10. Main script block

The main block processes the data in the following order:

  1. Get the list of HTML files
  2. For each file do the following:
  • Convert HTML to a BeautifulSoup object
  • Extract transformations from the soup
  • Clean up the soup, removing detail as required
  • Query SAP HANA to return Formula details
  • Construct and then inject ‘formula’ HTML and write to output HTML
  • Finally: if required, output to PDF
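The order above can be sketched as a loop that takes each step as a function, so it does not depend on how parsing, querying or writing are implemented. `run_pipeline` and its parameter names are hypothetical. If the Formula step returns an empty mapping, the Transformation has no Formula rules and is left untouched.

```python
def run_pipeline(html_files, read, find_transformations, clean,
                 fetch_formulas, inject, write, to_pdf=None):
    """Run the per-file steps in the order of the main block; return the written paths."""
    outputs = []
    for path in html_files:
        soup = read(path)                               # HTML -> parsed document
        transformations = find_transformations(soup)    # locate Transformation blocks
        soup = clean(soup)                              # fix known RSDF formatting issues
        for transformation in transformations:
            formulas = fetch_formulas(transformation)   # query SAP HANA
            if formulas:                                # only patch blocks with Formulas
                soup = inject(soup, transformation, formulas)
        target = write(soup, path)
        if to_pdf is not None:                          # PDF output is optional
            to_pdf(target)
        outputs.append(target)
    return outputs
```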


And there you go. I hope this is useful to someone out there. Please let me know if you do use this on your own implementation.

Full code: