Datacenter consolidation creates performance problems: slow Oracle 11g Client queries.

An application’s server infrastructure was moved from an international location into a central datacenter in the USA. The application's main data sources are on an MS-SQL database, but once a week it also requires a data extraction from an Oracle database.

In the original international location, on native physical servers, the weekly Oracle data extraction took 45 minutes to complete. After the move to the virtualized USA datacenter, the same extraction ran for over 18 hours and crashed before completion.

This part of the application, the weekly data extraction, was apparently not tested before the application cutover. Now that the cutover has occurred, the situation has become a high-severity issue affecting the business.

The organization has a highly effective cross-organizational project management team that gets involved whenever there is a high-severity business impact. The project managers bring all the responsible parties onto a conference bridge and manage across technical and management silos.

The global services group has central analysis tools, in this case NetScout’s Sniffer InfiniStream. The servers in the former international environment do not have packet capture capability embedded in the infrastructure, so we have been using MS-NetMon on the server platform itself. It is included by Microsoft with the server, which makes it easier to get approved from a logistics and security perspective.

 

The packet capture traces show that the slowness is not on the Oracle server the data is being extracted from, but in the requesting Oracle Client, which runs on a 2008 Server with MS-SQL 2008 R2 (or R1). The Oracle Client (running on the new virtual MS 2008 SQL Server in the datacenter) sends a request to the Oracle Linux server, which is also in the datacenter. The reply comes back very quickly and carries about 10,000 bytes. After fully receiving those 10,000 bytes, the Client sits idle for roughly 150 milliseconds before it sends the next SQLnet “get next row” request. That delay before each “get next row” is what causes the slowness, because it is repeated for many thousands of identical requests.
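The idle gap described above can be pulled out of a trace export and scaled up to see how it dominates the run time. The following is a minimal sketch, not the tooling actually used in this analysis. It assumes the capture has already been reduced to a list of (timestamp in seconds, source IP) pairs for payload-carrying packets only, with bare TCP ACKs filtered out. The function names, the IP addresses and the fetch count of 100,000 are all hypothetical and chosen for illustration.

```python
def client_think_times(packets, client_ip):
    """Return the idle gaps (in seconds) between the last packet of a
    server reply and the next request sent by the client.

    packets: list of (timestamp_seconds, source_ip) tuples in capture order.
    Assumption: only payload-carrying packets are present (no bare ACKs).
    """
    gaps = []
    last_server_ts = None
    for ts, src_ip in packets:
        if src_ip == client_ip:
            # A client request following a server reply closes one gap.
            if last_server_ts is not None:
                gaps.append(ts - last_server_ts)
                last_server_ts = None
        else:
            # Keep moving the marker to the latest server packet seen.
            last_server_ts = ts
    return gaps


def total_idle_seconds(gap_seconds, fetch_count):
    """Aggregate idle time if every fetch waits gap_seconds before it is sent."""
    return gap_seconds * fetch_count


if __name__ == "__main__":
    # Hypothetical trace: client 10.0.0.1, Oracle server 10.0.0.2.
    trace = [
        (0.000, "10.0.0.1"),  # request
        (0.002, "10.0.0.2"),  # reply, first packet
        (0.003, "10.0.0.2"),  # reply, last packet
        (0.153, "10.0.0.1"),  # next "get next row" after the idle gap
        (0.155, "10.0.0.2"),
        (0.305, "10.0.0.1"),
    ]
    gaps = client_think_times(trace, "10.0.0.1")
    print("think times (s):", [round(g, 3) for g in gaps])

    # Illustrative only: the real number of fetches was not recorded here.
    idle = total_idle_seconds(0.150, 100_000)
    print("idle hours for 100,000 fetches:", round(idle / 3600, 2))
```

With a 150 millisecond gap, every 100,000 fetches add a little over four hours of pure client idle time, regardless of how fast the server replies.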

The problem was found and analyzed in the new virtualized datacenter production environment, then replicated and confirmed in the new virtualized datacenter development environment.

In Oracle and third-party support forums, many others have reported the same performance gap between the Oracle Client and tools such as SQL*Plus or Microsoft SQL Server Management Studio.

We learned that the part of the Oracle Client 11g being used is the rather old OLE DB provider, which is still included with the client, and that Oracle does not support its client in any virtual environment other than its own! Therefore the only option is to use native physical machines…