@CloudExpo: Blog Post

Cousins of Cobol in Big Data Analytics

How DFSORT and REXX Support Big Data Analytics

In this article I would like to look at a few tools that are often overlooked when it comes to Big Data analytics. Organizations that already have a heavy investment in the Mainframe and would like to keep using it can consider these tools to extend their Big Data analytics reach.

DFSORT - Sorting & Merging Large Data Sets:

  • Long before RDBMSs took their place, Cobol programs relied on two major file manipulation operations:
  • The SORT operation accepts unsequenced input and produces output in a specified sequence.
  • The MERGE operation compares records from two or more files, each already in sequence, and combines them in order.
  • DFSORT adds the ability to sort, merge, copy, report on and analyze your business information faster and more easily, along with versatile data handling at the record level, at the level of fixed-position/length or variable-position/length fields, and at the bit level.
  • DFSORT is designed to optimize the speed and efficiency of these operations by working in synergy with processor, device and system features.
  • A Cobol program typically acts as an intermediary, handling the input files and passing their records to DFSORT.
  • After all the input records have been passed to DFSORT, the sort is executed, arranging the entire set of records in the sequence specified by the keys.
  • Much like SORT, the MERGE statement is issued from a Cobol program.
  • Executing the MERGE statement starts merge processing: the keys of the records in the input files are compared, and the sequenced records are passed on to create a merged output file.
  • According to the vendor documentation, there is no maximum number of keys, which helps DFSORT meet the needs of Big Data analytics processing.
  • Some advanced DFSORT options also facilitate parallel sort processing, which fits well with the needs of Big Data analytics.
  • Because Big Data analytics workloads can span multiple physical and virtual servers, including the Mainframe, it is good to see that DFSORT can sort records in EBCDIC, ASCII or another collating sequence. This keeps massively parallel sorting jobs consistent when they run on heterogeneous systems.
  • Job Control Language (JCL), which provides Hadoop-like management of large file-processing jobs on the Mainframe, has good features for specifying multiple input and output files for SORT and MERGE jobs.
  • This article is not meant as a DFSORT tutorial; the various performance features are described in the Mainframe manuals, or you can ask the Mainframe gurus in your organization.
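
The SORT and MERGE behaviour described above can be illustrated with a minimal Python sketch. It is not DFSORT and does not use any DFSORT interface. It assumes fixed-length character records and describes each key much as a DFSORT control statement such as `SORT FIELDS=(1,8,CH,A,9,5,CH,D)` does, with a 1-based start position, a length and an ascending or descending order. The helpers `field`, `sort_records` and `merge_records` and the sample records are hypothetical.

```python
import heapq
from typing import Iterable, Iterator

# Hypothetical key description: (1-based start position, length, ascending?).
SortKey = tuple[int, int, bool]


def field(record: str, start: int, length: int) -> str:
    """Extract a fixed-position character field (1-based, like DFSORT positions)."""
    return record[start - 1:start - 1 + length]


def sort_records(records: Iterable[str], keys: list[SortKey]) -> list[str]:
    """SORT: put unsequenced records into the order given by the keys."""
    result = list(records)
    # Python's sort is stable, so sorting on the least significant key first and
    # the most significant key last yields the combined multi-key order.
    for start, length, ascending in reversed(keys):
        result.sort(key=lambda r, s=start, n=length: field(r, s, n),
                    reverse=not ascending)
    return result


def merge_records(*inputs: Iterable[str], keys: list[tuple[int, int]]) -> Iterator[str]:
    """MERGE: combine inputs that are each already in ascending key order."""
    return heapq.merge(*inputs, key=lambda r: tuple(field(r, s, n) for s, n in keys))


if __name__ == "__main__":
    # Hypothetical records: region in 1-8, quantity in 9-13, item from 14.
    unsorted = ["WEST    00050BOLT", "EAST    00120NUT",
                "EAST    00300BOLT", "WEST    00200NUT"]
    # Region ascending, then quantity descending.
    for rec in sort_records(unsorted, [(1, 8, True), (9, 5, False)]):
        print(rec)

    file_a = ["EAST    00120NUT", "WEST    00050BOLT"]
    file_b = ["EAST    00300BOLT", "WEST    00200NUT"]
    for rec in merge_records(file_a, file_b, keys=[(1, 8), (9, 5)]):
        print(rec)
```

Comparing the quantity field as characters only works because the sample values are unsigned and zero-padded; DFSORT provides numeric formats such as ZD (zoned decimal) and PD (packed decimal) for real numeric keys, which this sketch does not model.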

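The collating-sequence point is easy to see with Python's built-in `cp037` codec (the EBCDIC US/Canada code page). This sketch only compares byte orders and says nothing about how DFSORT itself selects a sequence. The helpers `collate` and `ascii_order` and the sample values are hypothetical. The same three strings come out in different orders depending on whether their bytes are compared as ASCII or as EBCDIC, which is why a job spread across mainframe and non-mainframe nodes needs to agree on one sequence.

```python
def collate(values: list[str], encoding: str) -> list[str]:
    """Sort strings by their byte values in the given code page."""
    return sorted(values, key=lambda s: s.encode(encoding))


def ascii_order(ebcdic_records: list[bytes]) -> list[bytes]:
    """Order EBCDIC (cp037) records as an ASCII host would, leaving the bytes unchanged.

    Assumes every record decodes to characters that exist in ASCII.
    """
    return sorted(ebcdic_records, key=lambda b: b.decode("cp037").encode("ascii"))


if __name__ == "__main__":
    sample = ["abc", "ABC", "123"]
    # ASCII: digits (0x30-0x39) < upper case (0x41-0x5A) < lower case (0x61-0x7A).
    print(collate(sample, "ascii"))   # ['123', 'ABC', 'abc']
    # EBCDIC cp037: lower case (from 0x81) < upper case (from 0xC1) < digits (0xF0-0xF9).
    print(collate(sample, "cp037"))   # ['abc', 'ABC', '123']
```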

  • REXX (Restructured eXtended eXecutor) is another programming language used in the same ecosystem as Cobol and DFSORT, and it can contribute considerably to the Big Data analytics needs of enterprises.
  • REXX has strengths in string manipulation, dynamic data typing and storage management, and is generally considered very reliable and robust.
  • One of the REXX strengths most relevant to Big Data analytics is its character-string handling.
  • Useful string functions such as COPIES(), WORDS(), STRIP() and TRANSLATE() can go a long way toward meeting the MapReduce-style needs of typical Big Data analytics jobs.
  • The PARSE instruction is also used frequently in REXX programs. It can take strings from a number of sources and break them into constituent parts using a fairly natural notation.
  • PARSE is probably one of the most useful features in positioning REXX as a Big Data analytics tool.
  • PARSE divides a source string into constituent parts and assigns them to symbols (variables) as directed by the governing parsing template.
  • REXX, DFSORT and Cobol programs can interoperate: a REXX program can be called from Cobol, and all of them can be tied together with JCL.
  • Again, this note is not meant as a REXX tutorial; plenty of good documentation is available on using the string manipulation features of REXX.
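
For readers without a REXX environment at hand, the behaviour of these built-ins can be approximated in Python. This is a simplified sketch, not REXX itself. `copies`, `words`, `strip`, `translate`, `Lit` and `parse_template` are hypothetical helpers. `parse_template` only covers word parsing and literal-string patterns, and it trims the blanks around each value, whereas real PARSE can keep blanks in the last variable of a segment. A template written in REXX as `PARSE VAR line date item qty ',' region` becomes `["date", "item", "qty", Lit(","), "region"]`.

```python
from dataclasses import dataclass
from typing import Optional


def copies(s: str, n: int) -> str:
    """COPIES(string, n): n concatenated copies of string."""
    return s * n


def words(s: str) -> int:
    """WORDS(string): number of blank-delimited words."""
    return len(s.split())


def strip(s: str, option: str = "B", char: str = " ") -> str:
    """STRIP(string, option, char): remove char from Leading, Trailing or Both ends."""
    opt = option[:1].upper()
    if opt == "L":
        return s.lstrip(char)
    if opt == "T":
        return s.rstrip(char)
    return s.strip(char)


def translate(s: str, tableo: Optional[str] = None, tablei: Optional[str] = None,
              pad: str = " ") -> str:
    """TRANSLATE(string, tableo, tablei, pad); with no tables it upper-cases."""
    if tableo is None and tablei is None:
        return s.upper()
    tableo = tableo or ""
    if tablei is None:
        # REXX defaults the input table to all 256 character codes.
        tablei = "".join(chr(i) for i in range(256))
    mapping: dict[str, str] = {}
    for i, ch in enumerate(tablei):
        # The first occurrence in tablei wins; a short tableo is padded with pad.
        mapping.setdefault(ch, tableo[i] if i < len(tableo) else pad)
    return "".join(mapping.get(ch, ch) for ch in s)


@dataclass(frozen=True)
class Lit:
    """A literal-string pattern in a parsing template, such as ',' in REXX."""
    text: str


def _word_parse(text: str, names: list[str]) -> dict[str, str]:
    # Each name takes one blank-delimited word; the last name takes the rest.
    result: dict[str, str] = {}
    rest = text
    for i, name in enumerate(names):
        if i == len(names) - 1:
            result[name] = rest.strip()
        else:
            parts = rest.split(None, 1)
            result[name] = parts[0] if parts else ""
            rest = parts[1] if len(parts) > 1 else ""
    return result


def parse_template(source: str, template: list) -> dict[str, str]:
    """Simplified PARSE: template items are variable names (str) or Lit patterns."""
    result: dict[str, str] = {}
    pos = 0
    pending: list[str] = []
    for item in template:
        if isinstance(item, Lit):
            idx = source.find(item.text, pos)
            # An unmatched pattern is treated as matching at the end of the string.
            end = idx if idx != -1 else len(source)
            result.update(_word_parse(source[pos:end], pending))
            pos = end + len(item.text) if idx != -1 else len(source)
            pending = []
        else:
            pending.append(item)
    result.update(_word_parse(source[pos:], pending))
    return result
```

In a MapReduce-style job, a map step could use something like `parse_template` to split each record into a key and a value, and `translate` to normalise the key before the records are sorted.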

Summary: There is a strong need for enterprises to adopt Big Data analytics and start mining the huge sets of unstructured data that have been ignored so far, in order to arrive at meaningful business decisions. Newer frameworks like Hadoop and the new breed of analytical databases are going to help satisfy this need; however, enterprises should not spend their time picking up new tools and languages just for Big Data analytics.

If an organization has a significant investment in, and a strategic direction toward, legacy platforms like Cobol, JCL, REXX and DFSORT, it is only prudent to make the best use of their capabilities when evaluating options for Big Data analytics.

Big Data analytics today relies heavily on MapReduce algorithms, which are aimed at crunching large data sets: the input files are read and turned into key/value pairs, and map functions take these key/value pairs and generate intermediate key/value pairs. The reduce function then works on the sorted key/value pairs, iterating over them to reduce the output further.
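
The map, sort and reduce flow described above can be sketched in a few lines of Python using word counting, which is not taken from this article but is the usual illustration of MapReduce. The function names are hypothetical. The upper-casing in the map step plays the role a REXX TRANSLATE() call might play, and the sort between the phases is the step DFSORT would perform on the Mainframe.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: turn each input line into (word, 1) key/value pairs."""
    for line in lines:
        for word in line.split():
            yield word.upper(), 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> list[tuple[str, int]]:
    """Sort the intermediate pairs by key so that equal keys become adjacent."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    """Reduce: iterate over each run of equal keys and sum their values."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def word_count(lines: Iterable[str]) -> list[tuple[str, int]]:
    return list(reduce_phase(shuffle_phase(map_phase(lines))))


if __name__ == "__main__":
    print(word_count(["sort merge sort", "parse Sort"]))
    # [('MERGE', 1), ('PARSE', 1), ('SORT', 3)]
```

`groupby` only groups adjacent equal keys, which is exactly why the reduce step depends on sorted input.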

Looking at how this logic works, there is a heavy need for sorting, merging, string manipulation and parsing throughout. Hence the tools mentioned above, DFSORT and REXX along with Cobol, are likely to satisfy the Big Data needs of large enterprises that have already invested in Mainframe compute capacity.


More Stories By Srinivasan Sundara Rajan

Srinivasan is passionate about ownership and driving things on his own; with his breadth and depth in enterprise technology, he could run any aspect of the IT industry and make it a success.

He is a seasoned enterprise IT expert, mainly in solutions, integration and architecture across structured and unstructured data sources, especially in the manufacturing domain.

He currently works as Technology Head for GAVS Technologies.