Welcome!

Cloud Expo Authors: Cynthia Dunlop, Elizabeth White, Trevor Parsons, Greg Ness, Carmen Gonzalez

Related Topics: Big Data Journal, SOA & WOA, Virtualization, AJAX & REA, Cloud Expo, SDN Journal

Big Data Journal: Article

From Metrics to Success

Ford scours for more big data to bolster quality, improve manufacturing, streamline processes

Ford has exploited the strengths of big data analytics by directing them internally to improve business results. In doing so, they scour the metrics from the company’s best processes across myriad manufacturing efforts and through detailed outputs from in-use automobiles -- all to improve and help transform their business.

So explains Michael Cavaretta, PhD, Technical Leader of Predictive Analytics for Ford Research and Advanced Engineering in Dearborn, Michigan. Cavaretta is one of a group of experts assembled this week for The Open Group Conference in Newport Beach, California.

Cavaretta has led multiple data-analytic projects at Ford to break down silos inside the company to best define Ford’s most fruitful data sets. Ford has successfully aggregated customer feedback, and extracted all the internal data to predict how best new features in technologies will improve their cars.

As a contributor to the The Open Group conference and its focus on "Big Data -- The Transformation We Need to Embrace Today," Cavaretta explains how big data is fostering business transformation by allowing deeper insights into more types of data efficiently, and thereby improving processes, quality control, and customer satisfaction.

The interview was moderated by Dana Gardner, Principal Analyst at Interarbor Solutions. [Disclosure: The Open Group is a sponsor of BriefingsDirect podcasts.]

Here are some excerpts:

Gardner: What's different now in being able to get at this data and do this type of analysis from five years ago?

Cavaretta: The biggest difference has to do with the cheap availability of storage and processing power, where a few years ago people were very much concentrated on filtering down the datasets that were being stored for long-term analysis. There has been a big sea change with the idea that we should just store as much as we can and take advantage of that storage to improve business processes.

Gardner: How did we get here? What's the process behind the benefits?

Sea change in attitude

Cavaretta: The process behind the benefits has to do with a sea change in the attitude of organizations, particularly IT within large enterprises. There's this idea that you don't need to spend so much time figuring out what data you want to store and worry about the cost associated with it, and more about data as an asset. There is value in being able to store it, and being able to go back and extract different insights from it. This really comes from this really cheap storage, access to parallel processing machines, and great software.

Cavaretta

I like to talk to people about the possibility that big data provides and I always tell them that I have yet to have a circumstance where somebody is giving me too much data. You can pull in all this information and then answer a variety of questions, because you don't have to worry that something has been thrown out. You have everything.

You may have 100 questions, and each one of the questions uses a very small portion of the data. Those questions may use different portions of the data, a very small piece, but they're all different. If you go in thinking, "We’re going to answer the top 20 questions and we’re just going to hold data for that," that leaves so much on the table, and you don't get any value out of it.

The process behind the benefits has to do with a sea change in the attitude of organizations, particularly IT within large enterprises.

We're a big believer in mash-ups and we really believe that there is a lot of value in being able to take even datasets that are not specifically big-data sizes yet, and then not go deep, not get more detailed information, but expand the breadth. So it's being able to augment it with other internal datasets, bridging across different business areas as well as augmenting it with external datasets.

A lot of times you can take something that is maybe a few hundred thousand records or a few million records, and then by the time you’re joining it, and appending different pieces of information onto it, you can get the big dataset sizes.

Gardner: You’re really looking primarily at internal data, while also availing yourself of what external data might be appropriate. Maybe you could describe a little bit about your organization, what you do, and why this internal focus is so important for you.

Internal consultants
Cavaretta: I'm part of a larger department that is housed over in the research and advanced-engineering area at Ford Motor Company, and we’re about 30 people. We work as internal consultants, kind of like Capgemini or Ernst & Young, but only within Ford Motor Company. We’re responsible for going out and looking for different opportunities from the business perspective to bring advanced technologies. So, we’ve been focused on the area of statistical modeling and machine learning for I’d say about 15 years or so.

And in this time, we’ve had a number of engagements where we’ve talked with different business customers, and people have said, "We'd really like to do this." Then, we'd look at the datasets that they have, and say, "Wouldn’t it be great if we would have had this. So now we have to wait six months or a year."

These new technologies are really changing the game from that perspective. We can turn on the complete fire-hose, and then say that we don't have to worry about that anymore. Everything is coming in. We can record it all. We don't have to worry about if the data doesn’t support this analysis, because it's all there. That's really a big benefit of big-data technologies.

The real value proposition definitely is changing as things are being pushed down in the company to lower-level analysts who are really interested in looking at things from a data-driven perspective. From when I first came in to now, the biggest change has been when Alan Mulally came into the company, and really pushed the idea of data-driven decisions.

The real value proposition definitely is changing as things are being pushed down in the company to lower-level analysts.

Before, we were getting a lot of interest from people who are really very focused on the data that they had internally. After that, they had a lot of questions from their management and from upper level directors and vice-president saying, "We’ve got all these data assets. We should be getting more out of them." This strategic perspective has really changed a lot of what we’ve done in the last few years.

Gardner: Are we getting to the point where this sort of Holy Grail notion of a total feedback loop across the lifecycle of a major product like an automobile is really within our grasp? Are we getting there, or is this still kind of theoretical. Can we pull it altogether and make it a science?

Cavaretta: The theory is there. The question has more to do with the actual implementation and the practicality of it. We still are talking a lot of data where even with new advanced technologies and techniques that’s a lot of data to store, it’s a lot of data to analyze, there’s a lot of data to make sure that we can mash-up appropriately.

And, while I think the potential is there and I think the theory is there. There is also a work in being able to get the data from multiple sources. So everything which you can get back from the vehicle, fantastic. Now if you marry that up with internal data, is it survey data, is it manufacturing data, is it quality data? What are the things do you want to go after first? We can’t do everything all at the same time.

Highest value

Our perspective has been let’s make sure that we identify the highest value, the greatest ROI areas, and then begin to take some of the major datasets that we have and then push them and get more detail. Mash them up appropriately and really prove up the value for the technologists.

Gardner: Clearly, there's a lot more to come in terms of where we can take this, but I suppose it's useful to have a historic perspective and context as well. I was thinking about some of the early quality gurus like Deming and some of the movement towards quality like Six Sigma. Does this fall within that same lineage? Are we talking about a continuum here over that last 50 or 60 years, or is this something different?

Cavaretta: That’s a really interesting question. From the perspective of analyzing data, using data appropriately, I think there is a really good long history, and Ford has been a big follower of Deming and Six Sigma for a number of years now.

The difference though, is this idea that you don't have to worry so much upfront about getting the data. If you're doing this right, you have the data right there, and this has some great advantages. You’ll have to wait until you get enough history to look for somebody’s patterns. Then again, it also has some disadvantage, which is you’ve got so much data that it’s easy to find things that could be spurious correlations or models that don’t make any sense.

The piece that is required is good domain knowledge, in particular when you are talking about making changes in the manufacturing plant. It's very appropriate to look at things and be able to talk with people who have 20 years of experience to say, "This is what we found in the data. Does this match what your intuition is?" Then, take that extra step.

Gardner: How has the notion of the Internet of things being brought to bear on your gathering of big data and applying it to the analytics in your organization?

Cavaretta: It is a huge area, and not only from the internal process perspective -- RFID tags within the manufacturing plans, as well as out on the plant floor, and then all of the information that’s being generated by the vehicle itself.

The Ford Energi generates about 25 gigabytes of data per hour. So you can imagine selling couple of million vehicles in the near future with that amount of data being generated. There are huge opportunities within that, and there are also some interesting opportunities having to do with opening up some of these systems for third-party developers. OpenXC is an initiative that we have going on to add at Research and Advanced Engineering.

Huge number of sensors

We have a lot of data coming from the vehicle. There’s huge number of sensors and processors that are being added to the vehicles. There's data being generated there, as well as communication between the vehicle and your cell phone and communication between vehicles.

There's a group over at Ann Arbor Michigan, the University of Michigan Transportation Research Institute (UMTRI), that’s investigating that, as well as communication between the vehicle and let’s say a home system. It lets the home know that you're on your way and it’s time to increase the temperature, if it’s winter outside, or cool it at the summer time.

The amount of data that’s been generated there is invaluable information and could be used for a lot of benefits, both from the corporate perspective, as well as just the very nature of the environment.

Gardner: Just to put a stake in the ground on this, how much data do cars typically generate? Do you have a sense of what now is the case, an average?

Cavaretta: The Energi, according to the latest information that I have, generates about 25 gigabytes per hour. Different vehicles are going to generate different amounts, depending on the number of sensors and processors on the vehicle. But the biggest key has to do with not necessarily where we are right now but where we will be in the near future.

With the amount of information that's being generated from the vehicles, a lot of it is just internal stuff. The question is how much information should be sent back for analysis and to find different patterns? That becomes really interesting as you look at external sensors, temperature, humidity. You can know when the windshield wipers go on, and then to be able to take that information, and mash that up with other external data sources too. It's a very interesting domain.

With the amount of information that's being generated from the vehicles, a lot of it is just internal stuff.

Gardner: What skills do you target for your group, and what ways do you think that you can improve on that?

Cavaretta: The skills that we have in our department, in particular on our team, are in the area of computer science, statistics, and some good old-fashioned engineering domain knowledge. We’ve really gone about this from a training perspective. Aside from a few key hires, it's really been an internally developed group.

Targeted training

The biggest advantage that we have is that we can go out and be very targeted with the amount of training that we have. There are such big tools out there, especially in the open-source realm, that we can spin things up with relatively low cost and low risk, and do a number of experiments in the area. That's really the way that we push the technologies forward.

Talking with The Open Group really gives me an opportunity to be able to bring people on board with the idea that you should be looking at a difference in mindset. It's not "Here’s a way that data is being generated, look, try and conceive of some questions that we can use, and we’ll store that too." Let's just take everything, we’ll worry about it later, and then we’ll find the value.

It's important to be thinking about data as an asset, rather than as a cost. You even have to spend some money, and it may be a little bit unsafe without really solid ROI at the beginning. Then, move towards pulling that information in, and being able to store it in a way that allows not just the high-level data scientist to get access to and provide value, but people who are interested in the data overall. Those are very important pieces.

The last one is how do you take a big-data project, how do you take something where you’re not storing in the traditional business intelligence (BI) framework that an enterprise can develop, and then connect that to the BI systems and look at providing value to those mash-ups. Those are really important areas that still need some work.

There are many companies, especially large enterprises, that are looking at their data assets and wondering what can they do to monetize this, not only to just pay for the efficiency improvement but as a new revenue stream.

Gardner: For those organizations that want to get started on this, how do you get started?

Understand that it maybe going to be a little bit more costly and the ROI isn't going to be there at the beginning.

Cavaretta: We're definitely a huge believer in pilot projects and proof of concept, and we like to develop roadmaps by doing. So get out there. Understand that it's going to be messy. Understand that it maybe going to be a little bit more costly and the ROI isn't going to be there at the beginning.

But get your feet wet. Start doing some experiments, and then, as those experiments turn from just experimentation into really providing real business value, that’s the time to start looking at a more formal aspect and more formal IT processes. But you've just got to get going at this point.

You may also be interested in:

More Stories By Dana Gardner

At Interarbor Solutions, we create the analysis and in-depth podcasts on enterprise software and cloud trends that help fuel the social media revolution. As a veteran IT analyst, Dana Gardner moderates discussions and interviews get to the meat of the hottest technology topics. We define and forecast the business productivity effects of enterprise infrastructure, SOA and cloud advances. Our social media vehicles become conversational platforms, powerfully distributed via the BriefingsDirect Network of online media partners like ZDNet and IT-Director.com. As founder and principal analyst at Interarbor Solutions, Dana Gardner created BriefingsDirect to give online readers and listeners in-depth and direct access to the brightest thought leaders on IT. Our twice-monthly BriefingsDirect Analyst Insights Edition podcasts examine the latest IT news with a panel of analysts and guests. Our sponsored discussions provide a unique, deep-dive focus on specific industry problems and the latest solutions. This podcast equivalent of an analyst briefing session -- made available as a podcast/transcript/blog to any interested viewer and search engine seeker -- breaks the mold on closed knowledge. These informational podcasts jump-start conversational evangelism, drive traffic to lead generation campaigns, and produce strong SEO returns. Interarbor Solutions provides fresh and creative thinking on IT, SOA, cloud and social media strategies based on the power of thoughtful content, made freely and easily available to proactive seekers of insights and information. As a result, marketers and branding professionals can communicate inexpensively with self-qualifiying readers/listeners in discreet market segments. BriefingsDirect podcasts hosted by Dana Gardner: Full turnkey planning, moderatiing, producing, hosting, and distribution via blogs and IT media partners of essential IT knowledge and understanding.

@CloudExpo Stories
SYS-CON Events announced today that Parasoft will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. For 27 years, Parasoft has researched and developed software solutions that help organizations deliver defect-free software efficiently. By integrating Development Testing, API/cloud/SOA/composite app testing, and service virtualization, we reduce the time, effort, and cost of delivering secur...
Dyn solutions are at the core of Internet Performance. Through traffic management, message management and performance assurance, Dyn is connecting people through the Internet and ensuring information gets where it needs to go, faster and more reliably than ever before. Founded in 2001 at WPI, Dyn’s global presence services more than four million enterprise, small business and personal customers.
SimpleECM is the only platform to offer a powerful combination of enterprise content management (ECM) services, capture solutions, and third-party business services providing simplified integrations and workflow development for solution providers. SimpleECM is opening the market to businesses of all sizes by reinventing the delivery of ECM services. Our APIs make the development of ECM services simple with the use of familiar technologies for a frictionless integration directly into web applicat...
Until recently, many organizations required specialized departments to perform mapping and geospatial analysis, and they used Esri on-premise solutions for that work. In his session at 15th Cloud Expo, Dave Peters, author of the Esri Press book Building a GIS, System Architecture Design Strategies for Managers, will discuss how Esri has successfully included the cloud as a fully integrated SaaS expansion of the ArcGIS mapping platform. Organizations that have incorporated Esri cloud-based appl...
European data center operator DEAC is the largest in the Baltics. The activities are orientated to provide data center services and IT outsourcing on Eurasia and America scale in order to create the primary or backup or additional data center for customer in the EU, to protect its business and, most importantly, reduce costs up to 40% within 3-5 years. DEAC is an IT outsourcing services and solutions company whose highly experienced and qualified employees offer various groups of services and...
The Internet of Things will greatly expand the opportunities for data collection and new business models driven off of that data. In her session at Internet of @ThingsExpo, Esmeralda Swartz, CMO of MetraTech, will discuss how for this to be effective you not only need to have infrastructure and operational models capable of utilizing this new phenomenon, but increasingly service providers will need to convince a skeptical public to participate. Get ready to show them the money! Speaker Bio: ...
Samsung VP Jacopo Lenzi, who headed the company's recent SmartThings acquisition under the auspices of Samsung's Open Innovaction Center (OIC), answered a few questions we had about the deal. This interview was in conjunction with our interview with SmartThings CEO Alex Hawkinson. IoT Journal: SmartThings was developed in an open, standards-agnostic platform, and will now be part of Samsung's Open Innovation Center. Can you elaborate on your commitment to keep the platform open? Jacopo Lenzi: S...
The major cloud platforms defy a simple, side-by-side analysis. Each of the major IaaS public-cloud platforms offers their own unique strengths and functionality. Options for on-site private cloud are diverse as well, and must be designed and deployed while taking existing legacy architecture and infrastructure into account. Then the reality is that most enterprises are embarking on a hybrid cloud strategy and programs. In this Power Panel at 15th Cloud Expo, moderated by Ashar Baig, Research ...
Things are being built upon cloud foundations to transform organizations. This CEO Power Panel at 15th Cloud Expo, moderated by Roger Strukhoff, Cloud Expo and @ThingsExpo conference chair, will address the big issues involving these technologies and, more important, the results they will achieve. How important are public, private, and hybrid cloud to the enterprise? How does one define Big Data? And how is the IoT tying all this together?
When an enterprise builds a hybrid IaaS cloud connecting its data center to one or more public clouds, security is often a major topic along with the other challenges involved. Security is closely intertwined with the networking choices made for the hybrid cloud. Traditional networking approaches for building a hybrid cloud try to kludge together the enterprise infrastructure with the public cloud. Consequently this approach requires risky, deep "surgery" including changes to firewalls, subnets...
Ixia develops amazing products so its customers can connect the world. Ixia helps its customers provide an always-on user experience through fast, secure delivery of dynamic connected technologies and services. Through actionable insights that accelerate and secure application and service delivery, Ixia's customers benefit from faster time to market, optimized application performance and higher-quality deployments.
SYS-CON Events announced today that Stratogent will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Stratogent is a custom managed services organization based in San Mateo, California. We design, implement, and support mission critical infrastructure 24x7 on premises, in datacenters and in the Cloud. Since 2005, we have acted as an extension of internal IT teams, achieving a customer reten...
SYS-CON Events announces a new pavilion on the Cloud Expo floor where WebRTC converges with the Internet of Things. Pavilion will showcase WebRTC and the Internet of Things. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devices--computers, smartphones, tablets, and sensors – connected to the Internet by 2020. This number will con...
The only place to be June 9-11 is Cloud Expo & @ThingsExpo 2015 East at the Javits Center in New York City. Join us there as delegates from all over the world come to listen to and engage with speakers & sponsors from the leading Cloud Computing, IoT & Big Data companies. Cloud Expo & @ThingsExpo are the leading events covering the booming market of Cloud Computing, IoT & Big Data for the enterprise. Speakers from all over the world will be hand-picked for their ability to explore the economic...
SYS-CON Events announced today that Cloudian, Inc., the leading provider of hybrid cloud storage solutions, has been named “Bronze Sponsor” of SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Cloudian is a Foster City, Calif.-based software company specializing in cloud storage. Cloudian HyperStore® is an S3-compatible cloud object storage platform that enables service providers and enterprises to bui...
SYS-CON Events announced today that Gridstore™, the leader in software-defined storage (SDS) purpose-built for Windows Servers and Hyper-V, will exhibit at SYS-CON's 15th International Cloud Expo®, which will take place on November 4–6, 2014, at the Santa Clara Convention Center in Santa Clara, CA. Gridstore™ is the leader in software-defined storage purpose built for virtualization that is designed to accelerate applications in virtualized environments. Using its patented Server-Side Virtual C...
As the Internet of Things unfolds, mobile and wearable devices are blurring the line between physical and digital, integrating ever more closely with our interests, our routines, our daily lives. Contextual computing and smart, sensor-equipped spaces bring the potential to walk through a world that recognizes us and responds accordingly. We become continuous transmitters and receivers of data. In his session at Internet of @ThingsExpo, Andrew Bolwell, Director of Innovation for HP’s Printing a...
SAP is delivering break-through innovation combined with fantastic user experience powered by the market-leading in-memory technology, SAP HANA. In his General Session at 15th Cloud Expo, Thorsten Leiduck, VP ISVs & Digital Commerce, SAP, will discuss how SAP and partners provide cloud and hybrid cloud solutions as well as real-time Big Data offerings that help companies of all sizes and industries run better. SAP launched an application challenge to award the most innovative SAP HANA and SAP ...
The Internet of Things (IoT) promises to evolve the way the world does business; however, understanding how to apply it to your company can be a mystery. Most people struggle with understanding the potential business uses or tend to get caught up in the technology, resulting in solutions that fail to meet even minimum business goals. In his session at Internet of @ThingsExpo, Jesse Shiah, CEO / President / Co-Founder of AgilePoint Inc., will show what is needed to leverage the IoT to transform...
SYS-CON Events announced today that AIC, a leading provider of OEM/ODM server and storage solutions, will exhibit at SYS-CON's 16th International Cloud Expo®, which will take place on June 9-11, 2015, at the Javits Center in New York City, NY. AIC is a leading provider of both standard OTS, off-the-shelf, and OEM/ODM server and storage solutions. With expert in-house design capabilities, validation, manufacturing and production, AIC's broad selection of products are highly flexible and are conf...