Welcome!

@CloudExpo Authors: Elizabeth White, Liz McMillan, Yeshim Deniz, Pat Romanski, Aruna Ravichandran

Related Topics: @CloudExpo

@CloudExpo: Blog Post

Cousins of Cobol in Big Data Analytics

How DFSORT, REXX Support Big Data Analytics

In this  article  I would  like to look at a few tools which are overlooked when it comes to Big Data analytics. Organizations that  have  already  heavy investment  on Mainframe  and  would like to continue  with the utilization of Mainframe can consider these  tools for further  expanding their Big Data Analytics reach.

DFSORT-  Sorting & Merging Large Data Sets :

  • Much before RDBMS have taken their place, Cobol programs have 2 major file manipulation operations namely:
  • SORT operation accepts un-sequenced input and produces output in specified sequence
  • The Merge operation compares records from two or more files and combines them in order
  • DFSORT adds the ability to do faster and easier sorting, merging, copying, reporting and analysis of your business information, as well as versatile data handling at the record, fixed position/length or variable position/length field, and bit level.
  • DFSORT is designed to optimize the efficiency and speed with which operations are completed through synergy with processor, device, and system features
  • A Cobol program will typically act as a intermediary in handling the FILE inputs and passing them to DFSORT
  • After all the input records have been passed to DFSORT, the sorting operation is executed. This operation arranges the entire set of records in the sequence specified by keys.
  • Much like a SORT , MERGE statement is also called from a COBOL job
  • The MERGE statement execution begins the MERGE processing. This operation compares keys with the records of the input files, and passes the sequenced records to create a MERGED output file
  • As per the documentation from the vendor , there is no maximum number of keys which can support the needs for Big Data Analytics processing
  • Some of the advanced options of DFSORT also facilitates parallel sort processing which goes well with needs of Big Data Analytics
  • With the work loads of Big Data Analytical jobs can span multiple physical and virtual servers including mainframe, it is good to see that DFSORT has the option to sort records either in EBCDIC or ASCII or another collating sequence. This can result in uniformity of massively parallel sorting jobs if they run on heterogeneous systems
  • The Job Control Language (JCL), which gives Hadoop like management of large file processing jobs in Mainframe have good features to specify multiple input and output file options for SORT and MERGE jobs
  • As evident this article does not aim as a tutorial for DFSORT and various performance features can be looked from Mainframe manuals or can ask Mainframe Gurus in your organization.

REXX :

  • REXX (Restructured eXtended eXecutor) is another programming language that is used in the same eco system of Cobol and DFSORT and can considerably contribute to the Big Data Analytical needs of the enterprises
  • REXX has advantages in string manipulation, Dynamic data typing, Storage Management and is generally considered to be very reliable and robust
  • One of the most important strengths of REXX that is of relevance to Bigdata Analytics is its ‘'character string" handling ability.
  • There are some useful string manipulation functions like COPIES (), WORDS(), STRIP(), TRANSLATE(), which can go a long way in the Map Reduce functionality needs of typical big data analytical jobs
  • PARSE instruction is also used frequently in REXX programs. It is able to take strings from a number of sources and break them apart into constituent parts using a fairly natural notation
  • Probably PARSE could be one of the highly useful feature of REXX in its positioning as a Big Data Analytical tool
  • The REXX parse statement divides a source string into constituent parts and assigns these to symbols as directed by the governing parsing template
  • REXX, DFSORT and Cobol programs can be inter operable such that we could call a REXX program from Cobol , and all these can be tied together with JCL
  • Again this note is meant as a tutorial for REXX and lot of good documentation is available on utilizing the String manipulation features of REXX.

Summary : There is  a strong  need for enterprises  to  adopt Big Data  Analytics  and start mining the  huge sets  of  unstructured data which has been ignored so far to arrive at meaningful business decisions.  While  newer  frameworks like Hadoop  or  the new breed of  analytical databases are going to satisfy  this need,  however   enterprises  should not be spending their time on picking up the tools and languages when it comes to Big Data Analytics.

If there is a significant  investment  and organization direction is to use the legacy  platforms like Cobol, JCL, REXX, DFSORT  it is only prudent  to utilize best  of their capabilities  in arriving  at options for Big Data Analytics.

We are seeing   that  Big Data Analytics  is mainly dependent on Map / Reduce algorithms,  these  functions are aimed  at  crunching  large data sets, like reading the input files  and  create key/value pair   and map functions take these  key/value pairs  and generates  another  key/value pair.  Further Reducer function  also depends on  sorted  key/value pairs  and iterate them and reduce the output further.

If we look at the way this logic works,  there is a  heavy need for sorting, merging, string  manipulation and parsing all the way. Hence  the tools mentioned  above like DFSORT,  REXX  along with Cobol  will likely to satisfy  the Big Data needs  of large enterprises  if  they  have already invested  on Mainframe compute capacity.

 

More Stories By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


@CloudExpo Stories
Organizations do not need a Big Data strategy; they need a business strategy that incorporates Big Data. Most organizations lack a road map for using Big Data to optimize key business processes, deliver a differentiated customer experience, or uncover new business opportunities. They do not understand what’s possible with respect to integrating Big Data into the business model.
Enterprises have taken advantage of IoT to achieve important revenue and cost advantages. What is less apparent is how incumbent enterprises operating at scale have, following success with IoT, built analytic, operations management and software development capabilities – ranging from autonomous vehicles to manageable robotics installations. They have embraced these capabilities as if they were Silicon Valley startups. As a result, many firms employ new business models that place enormous impor...
Amazon is pursuing new markets and disrupting industries at an incredible pace. Almost every industry seems to be in its crosshairs. Companies and industries that once thought they were safe are now worried about being “Amazoned.”. The new watch word should be “Be afraid. Be very afraid.” In his session 21st Cloud Expo, Chris Kocher, a co-founder of Grey Heron, will address questions such as: What new areas is Amazon disrupting? How are they doing this? Where are they likely to go? What are th...
SYS-CON Events announced today that MIRAI Inc. will exhibit at the Japan External Trade Organization (JETRO) Pavilion at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MIRAI Inc. are IT consultants from the public sector whose mission is to solve social issues by technology and innovation and to create a meaningful future for people.
SYS-CON Events announced today that Dasher Technologies will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Dasher Technologies, Inc. ® is a premier IT solution provider that delivers expert technical resources along with trusted account executives to architect and deliver complete IT solutions and services to help our clients execute their goals, plans and objectives. Since 1999, we'v...
Though cloud is the future of enterprise computing, a smooth transition of legacy applications and systems is critical for seamless business operations. IT professionals are eager to start leveraging the cost, scale and other benefits of cloud, but with massive investments already in place in existing infrastructure and a number of compliance and resource hurdles, it can be challenging to move to a cloud-based infrastructure.
SYS-CON Events announced today that NetApp has been named “Bronze Sponsor” of SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 – Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. NetApp is the data authority for hybrid cloud. NetApp provides a full range of hybrid cloud data services that simplify management of applications and data across cloud and on-premises environments to accelerate digital transformation. Together with their partners, NetApp emp...
In his session at 21st Cloud Expo, Raju Shreewastava, founder of Big Data Trunk, will provide a fun and simple way to introduce Machine Leaning to anyone and everyone. Together we will solve a machine learning problem and find an easy way to be able to do machine learning without even coding. Raju Shreewastava is the founder of Big Data Trunk (www.BigDataTrunk.com), a Big Data Training and consulting firm with offices in the United States. He previously led the data warehouse/business intellige...
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
SYS-CON Events announced today that TidalScale, a leading provider of systems and services, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. TidalScale has been involved in shaping the computing landscape. They've designed, developed and deployed some of the most important and successful systems and services in the history of the computing industry - internet, Ethernet, operating s...
Infoblox delivers Actionable Network Intelligence to enterprise, government, and service provider customers around the world. They are the industry leader in DNS, DHCP, and IP address management, the category known as DDI. We empower thousands of organizations to control and secure their networks from the core-enabling them to increase efficiency and visibility, improve customer service, and meet compliance requirements.
In his session at 21st Cloud Expo, Michael Burley, a Senior Business Development Executive in IT Services at NetApp, will describe how NetApp designed a three-year program of work to migrate 25PB of a major telco's enterprise data to a new STaaS platform, and then secured a long-term contract to manage and operate the platform. This significant program blended the best of NetApp’s solutions and services capabilities to enable this telco’s successful adoption of private cloud storage and launchi...
Data scientists must access high-performance computing resources across a wide-area network. To achieve cloud-based HPC visualization, researchers must transfer datasets and visualization results efficiently. HPC clusters now compute GPU-accelerated visualization in the cloud cluster. To efficiently display results remotely, a high-performance, low-latency protocol transfers the display from the cluster to a remote desktop. Further, tools to easily mount remote datasets and efficiently transfer...
SYS-CON Events announced today that IBM has been named “Diamond Sponsor” of SYS-CON's 21st Cloud Expo, which will take place on October 31 through November 2nd 2017 at the Santa Clara Convention Center in Santa Clara, California.
Join IBM November 1 at 21st Cloud Expo at the Santa Clara Convention Center in Santa Clara, CA, and learn how IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Cognitive analysis impacts today’s systems with unparalleled ability that were previously available only to manned, back-end operations. Thanks to cloud processing, IBM Watson can bring cognitive services and AI to intelligent, unmanned systems. Imagine a robot vacuum that becomes your personal assistant tha...
In his Opening Keynote at 21st Cloud Expo, John Considine, General Manager of IBM Cloud Infrastructure, will lead you through the exciting evolution of the cloud. He'll look at this major disruption from the perspective of technology, business models, and what this means for enterprises of all sizes. John Considine is General Manager of Cloud Infrastructure Services at IBM. In that role he is responsible for leading IBM’s public cloud infrastructure including strategy, development, and offering ...
In a recent survey, Sumo Logic surveyed 1,500 customers who employ cloud services such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). According to the survey, a quarter of the respondents have already deployed Docker containers and nearly as many (23 percent) are employing the AWS Lambda serverless computing framework. It’s clear: serverless is here to stay. The adoption does come with some needed changes, within both application development and operations. Tha...
SYS-CON Events announced today that Avere Systems, a leading provider of enterprise storage for the hybrid cloud, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on Oct 31 - Nov 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Avere delivers a more modern architectural approach to storage that doesn't require the overprovisioning of storage capacity to achieve performance, overspending on expensive storage media for inactive data or the overbui...
In his general session at 21st Cloud Expo, Greg Dumas, Calligo’s Vice President and G.M. of US operations, will go over the new Global Data Protection Regulation and how Calligo can help business stay compliant in digitally globalized world. Greg Dumas is Calligo's Vice President and G.M. of US operations. Calligo is an established service provider that provides an innovative platform for trusted cloud solutions. Calligo’s customers are typically most concerned about GDPR compliance, applicatio...
Widespread fragmentation is stalling the growth of the IIoT and making it difficult for partners to work together. The number of software platforms, apps, hardware and connectivity standards is creating paralysis among businesses that are afraid of being locked into a solution. EdgeX Foundry is unifying the community around a common IoT edge framework and an ecosystem of interoperable components.