Welcome!

Cloud Expo Authors: Liz McMillan, Roger Strukhoff, Pat Romanski, Elizabeth White, Jason Thompson

Related Topics: Cloud Expo

Cloud Expo: Blog Post

Cousins of Cobol in Big Data Analytics

How DFSORT, REXX Support Big Data Analytics

In this  article  I would  like to look at a few tools which are overlooked when it comes to Big Data analytics. Organizations that  have  already  heavy investment  on Mainframe  and  would like to continue  with the utilization of Mainframe can consider these  tools for further  expanding their Big Data Analytics reach.

DFSORT-  Sorting & Merging Large Data Sets :

  • Much before RDBMS have taken their place, Cobol programs have 2 major file manipulation operations namely:
  • SORT operation accepts un-sequenced input and produces output in specified sequence
  • The Merge operation compares records from two or more files and combines them in order
  • DFSORT adds the ability to do faster and easier sorting, merging, copying, reporting and analysis of your business information, as well as versatile data handling at the record, fixed position/length or variable position/length field, and bit level.
  • DFSORT is designed to optimize the efficiency and speed with which operations are completed through synergy with processor, device, and system features
  • A Cobol program will typically act as a intermediary in handling the FILE inputs and passing them to DFSORT
  • After all the input records have been passed to DFSORT, the sorting operation is executed. This operation arranges the entire set of records in the sequence specified by keys.
  • Much like a SORT , MERGE statement is also called from a COBOL job
  • The MERGE statement execution begins the MERGE processing. This operation compares keys with the records of the input files, and passes the sequenced records to create a MERGED output file
  • As per the documentation from the vendor , there is no maximum number of keys which can support the needs for Big Data Analytics processing
  • Some of the advanced options of DFSORT also facilitates parallel sort processing which goes well with needs of Big Data Analytics
  • With the work loads of Big Data Analytical jobs can span multiple physical and virtual servers including mainframe, it is good to see that DFSORT has the option to sort records either in EBCDIC or ASCII or another collating sequence. This can result in uniformity of massively parallel sorting jobs if they run on heterogeneous systems
  • The Job Control Language (JCL), which gives Hadoop like management of large file processing jobs in Mainframe have good features to specify multiple input and output file options for SORT and MERGE jobs
  • As evident this article does not aim as a tutorial for DFSORT and various performance features can be looked from Mainframe manuals or can ask Mainframe Gurus in your organization.

REXX :

  • REXX (Restructured eXtended eXecutor) is another programming language that is used in the same eco system of Cobol and DFSORT and can considerably contribute to the Big Data Analytical needs of the enterprises
  • REXX has advantages in string manipulation, Dynamic data typing, Storage Management and is generally considered to be very reliable and robust
  • One of the most important strengths of REXX that is of relevance to Bigdata Analytics is its ‘'character string" handling ability.
  • There are some useful string manipulation functions like COPIES (), WORDS(), STRIP(), TRANSLATE(), which can go a long way in the Map Reduce functionality needs of typical big data analytical jobs
  • PARSE instruction is also used frequently in REXX programs. It is able to take strings from a number of sources and break them apart into constituent parts using a fairly natural notation
  • Probably PARSE could be one of the highly useful feature of REXX in its positioning as a Big Data Analytical tool
  • The REXX parse statement divides a source string into constituent parts and assigns these to symbols as directed by the governing parsing template
  • REXX, DFSORT and Cobol programs can be inter operable such that we could call a REXX program from Cobol , and all these can be tied together with JCL
  • Again this note is meant as a tutorial for REXX and lot of good documentation is available on utilizing the String manipulation features of REXX.

Summary : There is  a strong  need for enterprises  to  adopt Big Data  Analytics  and start mining the  huge sets  of  unstructured data which has been ignored so far to arrive at meaningful business decisions.  While  newer  frameworks like Hadoop  or  the new breed of  analytical databases are going to satisfy  this need,  however   enterprises  should not be spending their time on picking up the tools and languages when it comes to Big Data Analytics.

If there is a significant  investment  and organization direction is to use the legacy  platforms like Cobol, JCL, REXX, DFSORT  it is only prudent  to utilize best  of their capabilities  in arriving  at options for Big Data Analytics.

We are seeing   that  Big Data Analytics  is mainly dependent on Map / Reduce algorithms,  these  functions are aimed  at  crunching  large data sets, like reading the input files  and  create key/value pair   and map functions take these  key/value pairs  and generates  another  key/value pair.  Further Reducer function  also depends on  sorted  key/value pairs  and iterate them and reduce the output further.

If we look at the way this logic works,  there is a  heavy need for sorting, merging, string  manipulation and parsing all the way. Hence  the tools mentioned  above like DFSORT,  REXX  along with Cobol  will likely to satisfy  the Big Data needs  of large enterprises  if  they  have already invested  on Mainframe compute capacity.

 

More Stories By Srinivasan Sundara Rajan

I am passionate about ownership and driving things on my own, with my breadth and depth on Enterprise Technology I could run any aspect of IT Industry and make it a success. I am a Seasoned Enterprise IT Expert, mainly in the areas of Solution,Integration and Architecture, across Structured, Unstructured data sources, especially in manufacturing domain. My recent work is on Natural Language Processing, Semantic Enrichment of Unstructured Data, Data Mining and Predictive Analytics. However I have a strong footing across all tiers of Enterprise IT spectrum. I am geared to handle the massive flow of data by Internet Of Things with appropriate platform, tools and processes.

Comments (0)

Share your thoughts on this story.

Add your comment
You must be signed in to add a comment. Sign-in | Register

In accordance with our Comment Policy, we encourage comments that are on topic, relevant and to-the-point. We will remove comments that include profanity, personal attacks, racial slurs, threats of violence, or other inappropriate material that violates our Terms and Conditions, and will block users who make repeated violations. We ask all readers to expect diversity of opinion and to treat one another with dignity and respect.


Cloud Expo Breaking News
The Internet of Things is not new. Historically, smart businesses have used its basic concept of leveraging data to drive better decision making and have capitalized on those insights to realize additional revenue opportunities. So, what has changed to make the Internet of Things one of the hottest topics in tech? In his session at Internet of @ThingsExpo, Chris Gray, Director, Embedded and Internet of Things, will discuss the underlying factors that are driving the economics of intelligent systems. Discover how hardware commoditization, the ubiquitous nature of connectivity, and the emergence of Big Data and analysis are providing the pull to meet customer expectations of a widely connected, multi-dimensional universe of people, things, and information.
SYS-CON Events announced today that Esri has been named “Bronze Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Esri inspires and enables people to positively impact the future through a deeper, geographic understanding of the changing world around them. For more information, visit http://www.esri.com.
There will be 50 billion Internet connected devices by 2020. Today, every manufacturer has a propriety protocol and an app. How do we securely integrate these "things" into our lives and businesses in a way that we can easily control and manage? Even better, how do we integrate these "things" so that they control and manage each other so our lives become more convenient or our businesses become more profitable and/or safe? We have heard that the best interface is no interface. In his session at Internet of @ThingsExpo, Chris Matthieu, Co-Founder & CTO at Octoblu, Inc., will discuss how these devices generate enough data to learn our behaviors and simplify/improve our lives. What if we could connect everything to everything? I'm not only talking about connecting things to things but also systems, cloud services, and people. Add in a little machine learning and artificial intelligence and now we have something interesting...
After a couple of false starts, cloud-based desktop solutions are picking up steam, driven by trends such as BYOD and pervasive high-speed connectivity. In his session at 15th Cloud Expo, Seth Bostock, CEO of IndependenceIT, cuts through the hype and the acronyms, and discusses the emergence of full-featured cloud workspaces that do for the desktop what cloud infrastructure did for the server. He’ll discuss VDI vs DaaS, implementation strategies and evaluation criteria.
Cloud computing started a technology revolution; now DevOps is driving that revolution forward. By enabling new approaches to service delivery, cloud and DevOps together are delivering even greater speed, agility, and efficiency. No wonder leading innovators are adopting DevOps and cloud together! In his session at DevOps Summit, Andi Mann, Vice President of Strategic Solutions at CA Technologies, will explore the synergies in these two approaches, with practical tips, techniques, research data, war stories, case studies, and recommendations.
SYS-CON Events announced today that Cloudian, Inc., the leading provider of hybrid cloud storage solutions, has been named “Bronze Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Cloudian is a Foster City, Calif.-based software company specializing in cloud storage. Cloudian HyperStore® is an S3-compatible cloud object storage platform that enables service providers and enterprises to build reliable, affordable and scalable hybrid cloud storage solutions. Cloudian actively partners with leading cloud computing environments including Amazon Web Services, Citrix Cloud Platform, Apache CloudStack, OpenStack and the vast ecosystem of S3 compatible tools and applications. Cloudian's customers include Vodafone, Nextel, NTT, Nifty, and LunaCloud. The company has additional offices in China and Japan.
Cloud Computing is evolving into a Big Three of Amazon Web Services, Google Cloud, and Microsoft Azure. Cloud 360: Multi-Cloud Bootcamp, being held Nov 4–5, 2014, in conjunction with 15th Cloud Expo in Santa Clara, CA, delivers a real-world demonstration of how to deploy and configure a scalable and available web application on all three platforms. The Cloud 360 Bootcamp, led by Janakiram MSV, an analyst with Gigaom Research, is the first bootcamp that introduces the core concepts of Infrastructure as a Service (IaaS) based on the workings of the Big Three platforms – Amazon EC2, Google Compute Engine, and Azure VMs. Bootcamp attendees will get to see the big picture and also receive the knowledge needed to make the best cloud decisions for their business applications and entire enterprise IT organization.
“Distrix fits into the overall cloud and IoT model around software-defined networking. There’s a broad category around software-defined networking that’s focused on data center, and we focus on the WAN,” explained Jay Friedman, President of Distrix, in this SYS-CON.tv interview at the Internet of @ThingsExpo, held June 10-12, 2014, at the Javits Center in New York City. Internet of @ThingsExpo 2014 Silicon Valley, November 4–6, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading IoT industry players in the world.
The Internet of Things promises to transform businesses (and lives), but navigating the business and technical path to success can be difficult to understand. In his session at 15th Internet of @ThingsExpo, Chad Jones, Vice President, Product Strategy of LogMeIn's Xively IoT Platform, will show you how to approach creating broadly successful connected customer solutions using real world business transformation studies including New England BioLabs and more.
Scott Jenson leads a project called The Physical Web within the Chrome team at Google. Project members are working to take the scalability and openness of the web and use it to talk to the exponentially exploding range of smart devices. Nearly every company today working on the IoT comes up with the same basic solution: use my server and you'll be fine. But if we really believe there will be trillions of these devices, that just can't scale. We need a system that is open a scalable and by using the URL as a basic building block, we open this up and get the same resilience that the web enjoys.
“The Internet of Things is a wave that has arrived and it’s growing really fast. The concern at Aria Systems is making sure that people understand the ramifications of their attempts to monetize whatever it is they build on the Internet of Things," explained C Brendan O’Brien, Co-founder and Chief Architect at Aria Systems, in this SYS-CON.tv interview at the Internet of @ThingsExpo, held June 10-12, 2014, at the Javits Center in New York City. Internet of @ThingsExpo 2014 Silicon Valley, November 4–6, at the Santa Clara Convention Center in Santa Clara, CA, will feature technical sessions from a rock star conference faculty and the leading IoT industry players in the world.
The Internet of Things is a natural complement to the cloud and related technologies such as Big Data, analytics, and mobility. In his session at Internet of @ThingsExpo, Joe Weinman will lay out four generic strategies – digital disciplines – to exploit emerging digital technologies for strategic advantage. Joe Weinman has held executive leadership positions at Bell Labs, AT&T, Hewlett-Packard, and Telx, in areas such as corporate strategy, business development, product management, operations, and R&D.
SYS-CON Events announced today that DevOps.com has been named “Media Sponsor” of SYS-CON's “DevOps Summit at Cloud Expo,” which will take place on June 10–12, 2014, at the Javits Center in New York City, New York. DevOps.com is where the world meets DevOps. It is the largest collection of original content relating to DevOps on the web today Featuring up-to-the-minute news, feature stories, blogs, bylined articles and more, DevOps.com is where the thought leaders of the DevOps movement make their ideas known.
There are 182 billion emails sent every day, generating a lot of data about how recipients and ISPs respond. Many marketers take a more-is-better approach to stats, preferring to have the ability to slice and dice their email lists based numerous arbitrary stats. However, fundamentally what really matters is whether or not sending an email to a particular recipient will generate value. Data Scientists can design high-level insights such as engagement prediction models and content clusters that allow marketers to cut through the noise and design their campaigns around strong, predictive signals, rather than arbitrary statistics. SendGrid sends up to half a billion emails a day for customers such as Pinterest and GitHub. All this email adds up to more text than produced in the entire twitterverse. We track events like clicks, opens and deliveries to help improve deliverability for our customers – adding up to over 50 billion useful events every month. While SendGrid data covers only abo...
SYS-CON Events announced today that the Web Host Industry Review has been named “Media Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Since 2000, The Web Host Industry Review has made a name for itself as the foremost authority of the Web hosting industry providing reliable, insightful and comprehensive news, reviews and resources to the hosting community. TheWHIR Blogs provides a community of expert industry perspectives. The Web Host Industry Review Magazine also offers a business-minded, issue-driven perspective of interest to executives and decision-makers. WHIR TV offers on demand web hosting video interviews and web hosting video features of the key persons and events of the web hosting industry. WHIR Events brings together like-minded hosting industry professionals and decision-makers in local communities. TheWHIR is an iNET Interactive property.