More Use Cases for Big Data Analytics

Measuring Development Productivity with Hadoop

After its start in research work and on social networking sites, Hadoop is now becoming a big part of the enterprise IT landscape. Microsoft recently announced that it is embracing Hadoop as part of its Windows Azure High Performance Computing initiative, and Oracle announced new options such as Oracle Loader support for Hadoop-processed data.

Initial Use Cases for Hadoop
The following are typical use cases that can be realized with the power of Hadoop:

  • Analyzing customer web usage to predict what will interest the customer and to target advertisements accordingly
  • Detecting fraud in online systems based on various behavioral patterns
  • Market and customer segmentation
  • Recommendation engines - increasing average order size by recommending complementary products, based on predictive analysis for cross-selling

To learn more about Hadoop use cases, you can visit the site of Cloudera, which distributes Hadoop along with various enterprise support options: http://www.cloudera.com/why-hadoop/

You can also refer to my earlier article, Traditional vs Big Data Analytics, for various enterprise-class use cases that can be realized using big data analytics tools like Hadoop.

Providing Real-Time Dashboards for Development Productivity
While most of the above use cases are about runtime benefits to the enterprise, Hadoop, if used properly, can also give development teams much-needed insight. It can feed dashboards that show program managers and directors the team's productivity, where the team stands on code quality and code coverage, and whether the code can meet the required deadlines in the development life cycle. Let's analyze how this can be enabled with proper usage of Hadoop.

Large development efforts are common, especially when your organization is building products or other large custom applications. As a program manager, you want a real-time dashboard of how your development teams are progressing. The following live information can give you a lot of insight for tracking the projects:

  • Lines of code (a rough proxy for function points that also gives an idea of the functional coverage of the system)
  • Code coverage %, i.e., the percentage of code that is covered by unit test cases
  • Types of exceptions generated during unit testing, and whether they are application related or system related. For example, a lot of application-related exceptions during development may indicate that the development team does not fully understand the functionality.
  • Code quality analysis - whether the code has any audit- or metric-related issues, such as depth of inheritance, cyclomatic complexity, etc.
  • Traceability of application modules to requirements
  • Whether the build process is failing to integrate the code and, if so, where the dependencies lie
  • Whether the development team is following the code conventions and development standards
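
As one illustration of the exception metric in the list above, the following is a minimal Python sketch of how exception class names pulled from logs might be split into application-related and system-related groups. The prefix rule, the function names and the sample class names are all hypothetical and chosen for illustration; a real project would need its own rule.

```python
from collections import Counter

# Hypothetical rule: exceptions from these packages count as "system";
# everything else is treated as "application". Adjust for your codebase.
SYSTEM_PREFIXES = ("java.", "javax.", "org.apache.", "oracle.")


def classify_exception(class_name: str) -> str:
    """Label a fully qualified exception class as 'system' or 'application'."""
    return "system" if class_name.startswith(SYSTEM_PREFIXES) else "application"


def exception_mix(class_names):
    """Return the share of each category among the given exception names."""
    counts = Counter(classify_exception(name) for name in class_names)
    total = sum(counts.values())
    return {kind: counts[kind] / total for kind in counts} if total else {}


if __name__ == "__main__":
    # Made-up sample data, not taken from any real project.
    seen = [
        "java.net.SocketTimeoutException",
        "com.example.orders.InvalidOrderException",
        "com.example.orders.PricingRuleException",
        "java.sql.SQLException",
    ]
    print(exception_mix(seen))
```

A rising share of the "application" group over a sprint is the kind of signal the dashboard could surface for a program manager.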

Currently, most program managers depend on weekly meetings with the developers to derive this information, which is then subject to interpretation by individual developers. The main problem is that the metrics mentioned above are scattered across multiple log files, and with a large development team this can add up to a huge volume of unstructured text. The following files will be of interest in this case:

  • Source code stored in various repositories
  • Eclipse or Visual Studio log files generated during development and unit testing
  • Log files generated by the test tools like JUnit
  • Logging information generated by the application servers and web servers during development, as the developers will likely turn on Log4j or an equivalent logging mechanism
  • Debugging information generated by the built-in tools of Eclipse or Visual Studio
  • Logs generated by the code quality analysis tools
  • Logs generated by code vulnerability scanning tools
  • Logs generated by build environments like Ant, CruiseControl, or the equivalent
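
To make "unstructured text" concrete, here is a small Python sketch that turns one test summary line into a structured record. It assumes a line shaped like "Tests run: 10, Failures: 1, Errors: 1, ...", which is a common shape for JUnit runs under build tools; the exact format depends on the tool and version, and the function names are hypothetical.

```python
import re

# Assumed line shape; real JUnit runner output may differ by tool and version.
SUMMARY_RE = re.compile(
    r"Tests run:\s*(?P<run>\d+),\s*"
    r"Failures:\s*(?P<failures>\d+),\s*"
    r"Errors:\s*(?P<errors>\d+)"
)


def parse_junit_summary(line: str):
    """Return a dict of counts from one summary line, or None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def pass_rate(summary: dict) -> float:
    """Fraction of tests that neither failed nor errored (0.0 when none ran)."""
    if summary["run"] == 0:
        return 0.0
    passed = summary["run"] - summary["failures"] - summary["errors"]
    return passed / summary["run"]


if __name__ == "__main__":
    # Made-up sample line for illustration.
    record = parse_junit_summary("Tests run: 10, Failures: 1, Errors: 1, Time elapsed: 0.42 sec")
    print(record, pass_rate(record))
```

Each of the other log types would need its own small parser of this kind before the data can be aggregated.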

Hadoop can be used to analyze these large volumes of unstructured log files, and the output can be used to create real-time dashboards for the program managers.
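
The shape of such an analysis can be sketched in plain Python. The example below is not a Hadoop job; it is a local stand-in that mimics the map, shuffle and reduce phases so the logic is easy to follow. It assumes a hypothetical log layout in which each line starts with a developer name followed by free text, and it only recognizes exception class names that include a package prefix.

```python
import re
from itertools import groupby

# Assumed log shape: "<developer> <free text containing an exception class name>".
# Matches package-qualified names ending in "Exception" or "Error".
EXCEPTION_RE = re.compile(r"\b((?:\w+\.)+\w*(?:Exception|Error))\b")


def map_line(line: str):
    """Map step: emit ((developer, exception_class), 1) for each exception seen."""
    developer, _, rest = line.partition(" ")
    for exc in EXCEPTION_RE.findall(rest):
        yield (developer, exc), 1


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def run_job(lines):
    """Local stand-in for the shuffle: sort mapped pairs by key, then reduce each group."""
    mapped = sorted(pair for line in lines for pair in map_line(line))
    return [
        reduce_counts(key, (count for _, count in group))
        for key, group in groupby(mapped, key=lambda pair: pair[0])
    ]


if __name__ == "__main__":
    # Made-up sample lines for illustration.
    sample = [
        "alice ERROR java.sql.SQLException: timeout",
        "alice ERROR java.sql.SQLException: timeout",
        "bob WARN com.example.InvalidOrderException thrown",
        "bob INFO build ok",
    ]
    for key, total in run_job(sample):
        print(key, total)
```

On a real cluster the sort and grouping would be done by the framework between the map and reduce phases; only the two small functions would carry over, and the per-developer counts they produce are what a dashboard would chart.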

Summary
The success of this use of Hadoop depends on the technical implementation of the map and reduce functions that will act on the huge set of log files listed above, collected from each developer's machine. However, since similar algorithms have already been implemented for various web-based log analytics, this implementation should not be too difficult. If implemented properly, it can provide a real-time dashboard for program managers to monitor the performance of the development team and take corrective actions.

By Srinivasan Sundara Rajan

Highly passionate about utilizing Digital Technologies to enable next generation enterprise. Believes in enterprise transformation through the Natives (Cloud Native & Mobile Native).
