By Ken Blackwell | September 14, 2000 12:00 AM EDT | Reads: 16,833
In my last article (XML-J, Vol. 1, issue 3) I made the case for using custom classes derived from XML Schemas to represent XML documents in C++ applications. That article focused primarily on the problems of generating XML documents from program objects, and explained how custom classes have significant advantages over standards like DOM and SAX in terms of performance, object orientation and maintainability of source code.
Here I'll describe a unique methodology for parsing XML data into C++ classes that provides all the object-oriented benefits detailed in the first article, with increased performance (compared to traditional generic XML parsers).
The Problem with Conventional Parsers
C++ programmers have been dealing with parsing technologies for years. Most of you remember writing simple language parsers in school, probably building the basic syntax parser with tools like Lex and Yacc. So, for C++ developers, the idea of a syntax parser isn't especially intimidating.
The basic grammar for XML is pretty simple compared to that of a programming language like C++ or Java, but there's one problem unique to XML parsing that is daunting: unlike conventional programming languages, XML doesn't have a fixed set of tags (i.e., keywords). Imagine trying to develop a general-purpose grammar for a programming language with a user-defined set of keywords!
To solve the general problem of XML parsing, it's necessary to build a parser that can be dynamically fed a list of tags and rules for the specific dialect of XML to be parsed. In the terminology of XML standards, that means specifying an XML Schema file to a DOM parser so that it knows how to parse and validate the specific dialect of the input XML file.
If an application reads and writes a variety of dialects of XML documents, the DOM model is appropriate because it doesn't require source code changes for incremental support for a new dialect of XML. This is typically the case for integration broker applications, as described in my last article, in which the broker is reading, transforming and forwarding all kinds of XML documents within and between organizations.
However, as I also described, there's a large class of applications in which only a few types of XML are spoken and these don't often change. For these, the overhead of DOM and the lack of application-specific object orientation are major drawbacks.
Static Parsers Derived from XML Schemas
Just as it's beneficial in some environments to derive C++ classes from XML Schemas for writing XML documents, it can also be beneficial to derive classes to read XML documents from schemas.
The typical process for creating a language parser in C++ is to hand-code the Lex rules and Yacc grammar, then generate the lexer and parser from these XML dialect-specific input files (see Figure 1).
This process is tedious, however, and must be redone for each dialect of XML that your application needs to parse. It's doable, but the same logic that you'd hand-code in the rules and grammar is already encapsulated in the XML Schema file. A more efficient approach is to develop a translation program that can convert the XML Schema file into the equivalent Lex rules and Yacc grammar for the XML dialect (see Figure 2).
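As a rough illustration of such a translation program, the sketch below emits Lex rules and Yacc productions from a simplified list of element declarations. It is not the generator used for the example project: the ElementDecl structure, the TokenName, GenerateLexRules and GenerateYaccRules functions, the _START/_END token naming and the TEXT token are all assumptions made for this example, and attributes, optional or repeated children, and whitespace handling are ignored.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified model of one element declaration taken from a
// schema or DTD: an element name plus its ordered, required child elements.
// An empty child list stands for text-only content.
struct ElementDecl {
    std::string name;
    std::vector<std::string> children;
};

// Build a token name for a tag, e.g. "cpu" -> "CPU_START" or "CPU_END".
std::string TokenName(const std::string& element, bool isStart) {
    assert(!element.empty());
    std::string token;
    for (unsigned char c : element) {
        token += static_cast<char>(std::toupper(c));
    }
    return token + (isStart ? "_START" : "_END");
}

// Emit one Lex rule for each start tag and each end tag.
std::string GenerateLexRules(const std::vector<ElementDecl>& decls) {
    std::ostringstream out;
    for (const ElementDecl& d : decls) {
        out << "\"<" << d.name << ">\"  { return "
            << TokenName(d.name, true) << "; }\n";
        out << "\"</" << d.name << ">\" { return "
            << TokenName(d.name, false) << "; }\n";
    }
    return out.str();
}

// Emit one Yacc production per element: start tag, content, end tag.
std::string GenerateYaccRules(const std::vector<ElementDecl>& decls) {
    std::ostringstream out;
    for (const ElementDecl& d : decls) {
        out << d.name << " : " << TokenName(d.name, true);
        if (d.children.empty()) {
            out << " TEXT";  // assumed token for character data
        }
        for (const std::string& child : d.children) {
            out << ' ' << child;
        }
        out << ' ' << TokenName(d.name, false) << " ;\n";
    }
    return out.str();
}
```

Given hypothetical declarations for a pc element that contains a single cpu element, GenerateYaccRules would produce the productions "pc : PC_START cpu PC_END ;" and "cpu : CPU_START TEXT CPU_END ;". A real translator would also have to carry over the content-model operators and attribute lists from the schema.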
The example project in Listing 1 shows a generated grammar for a sample XML DTD file called acmepc.dtd. You'll see the generated Yacc input in acmepcxml_parser.y and the Lex input in acmepcxml_lexer.l. All the classes and parser for this project are contained in the C++ namespace acmepcxml.
Using the generated custom parser is simple. Just create an instance of the acmepcxml::XMLImporter class, initialize it with its Initialize() member and import the XML data into the schema-derived classes with the ImportFromFile() member. The importer exposes a base class root node of the class tree via the GetXObject() member. This base class is then dynamically cast back to the acmepc class that contains the context of the specific XML dialect defined by the acmepc.dtd schema (see Listing 1).
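The calling sequence described above might look like the following sketch. The generated sources are not reproduced here, so the XMLImporter, XObject and acmepc classes below are hypothetical stand-ins that only mimic the shape of the generated code. Only the member names Initialize(), ImportFromFile() and GetXObject() come from the text; their signatures and return types, the XObject base class name, the sourceName field and the LoadAcmePc helper are assumptions, and the stub does not actually parse anything.

```cpp
#include <cassert>
#include <memory>
#include <string>

namespace acmepcxml {

// Hypothetical base class for all schema-derived classes.
class XObject {
public:
    virtual ~XObject() = default;
};

// Hypothetical stand-in for the root class derived from acmepc.dtd.
class acmepc : public XObject {
public:
    std::string sourceName;  // placeholder field, not part of the real class
};

// Stub importer: a real generated importer would run the Lex/Yacc parser.
class XMLImporter {
public:
    bool Initialize() {
        initialized_ = true;
        return true;
    }

    bool ImportFromFile(const std::string& path) {
        if (!initialized_) {
            return false;
        }
        auto doc = std::make_unique<acmepc>();
        doc->sourceName = path;  // the stub only records the file name
        root_ = std::move(doc);
        return true;
    }

    // Exposes the root of the class tree through its base class.
    XObject* GetXObject() const { return root_.get(); }

private:
    bool initialized_ = false;
    std::unique_ptr<XObject> root_;
};

}  // namespace acmepcxml

// The calling pattern: initialize, import, then cast the base class root
// back to the dialect-specific class. The importer keeps ownership.
const acmepcxml::acmepc* LoadAcmePc(acmepcxml::XMLImporter& importer,
                                    const std::string& path) {
    assert(!path.empty());
    if (!importer.Initialize() || !importer.ImportFromFile(path)) {
        return nullptr;
    }
    return dynamic_cast<const acmepcxml::acmepc*>(importer.GetXObject());
}
```

Because dynamic_cast returns a null pointer when the root is not of the expected type, the caller can check the result before touching any dialect-specific members.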
Advantages of Custom Parser Approach
There are four primary advantages to creating a custom parser rather than using a generic parser like DOM.
- First and foremost, it's fast. I've run benchmarks that show the custom parser to be up to three times faster than the fastest DOM parser I can find while also having a smaller in-memory footprint. The primary reason it's so much faster than DOM seems to be that it doesn't have to do dynamic validation of the XML input. Instead, validation is enforced by the automata generated by Yacc from the input files, which are derived from the XML Schema.
- The generated parser can integrate tightly with the derived classes described in my previous article. There is no two-step process of parsing into the DOM hierarchy, then populating classes from the DOM data structures. The custom parser creates the schema-derived classes directly, without the need for the intermediate step. The generated parser can also integrate tightly with framework technologies you might be using, such as STL and MFC class libraries.
- You get all the source code to the components that link into your application. Because the parser is generated with the open source Flex and Bison tools, the output source code will run on virtually every operating system imaginable. I've been very successful, for example, in running Flex and Bison on Windows NT and using the output C/C++ code on a variety of platforms without any source code changes.
- The final advantage, and the coolest of all, is that using Lex and Yacc enables you to handle those pesky XML entities more easily. I use this feature to automatically expand entities on input so my program doesn't have to worry about them. XML entities can be preprocessed just as a macro is preprocessed by a compiler when parsing a C input file. The class instances created by the custom parser contain data with entity references fully expanded. I can't stress enough how many headaches this little feature can save you when dealing with documents with lots of entities.
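The entity expansion mentioned in the last point can be pictured with a small sketch. This is not the generated lexer code: it is a standalone helper that shows the macro-like substitution idea on a string. The function names ExpandEntities and PredefinedEntities and the "vendor" entity in the usage note are assumptions made for this example. It assumes well-formed input, does not rescan replacement text for nested references, and leaves unknown references untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// The five entities predefined by XML 1.0.
std::map<std::string, std::string> PredefinedEntities() {
    return {{"lt", "<"}, {"gt", ">"}, {"amp", "&"},
            {"apos", "'"}, {"quot", "\""}};
}

// Replace each &name; reference with its definition from the table.
std::string ExpandEntities(const std::string& text,
                           const std::map<std::string, std::string>& entities) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t amp = text.find('&', pos);
        if (amp == std::string::npos) {
            out.append(text, pos, std::string::npos);  // no more references
            break;
        }
        out.append(text, pos, amp - pos);  // copy text before the reference
        const std::size_t semi = text.find(';', amp + 1);
        if (semi == std::string::npos) {
            out.append(text, amp, std::string::npos);  // unterminated: copy as-is
            break;
        }
        assert(semi > amp);
        const std::string name = text.substr(amp + 1, semi - amp - 1);
        const auto it = entities.find(name);
        if (it != entities.end()) {
            out += it->second;                       // known entity: expand
        } else {
            out.append(text, amp, semi - amp + 1);   // unknown: keep reference
        }
        pos = semi + 1;
    }
    return out;
}
```

With the predefined table plus a hypothetical "vendor" entity defined as "Acme", the input "&vendor; PC &lt;fast&gt;" expands to "Acme PC <fast>". In a Lex-based parser the same lookup would typically live in the rule that matches an entity reference, so the application classes only ever see expanded text.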
While XML processing may be new to the C++ community, the skills and technologies that have matured over the last decade in this community can still be very useful in handling XML data formats. In my last article I described the benefits of deriving C++ class definitions from XML Schemas. Here, I've gone a bit further to show how to derive parser grammars for XML dialects from the XML Schema.
As the XML Schema standard nears acceptance, there will be many other opportunities to reuse the work of schema designers to automatically derive programming source code, relational database schemas and other artifacts that otherwise would have to be coded by hand. C++ developers should look for these opportunities as ways to reduce the amount of repetitive work required to add or update support for specific XML dialects.
Copyright © 2000 SYS-CON Media, Inc. — All Rights Reserved.
Syndicated stories and blog feeds, all rights reserved by the author.
More Stories By Ken Blackwell
Ken Blackwell is the chief technical officer of Bristol Technology, Inc., where he oversees product architecture and research in XML, middleware and transaction analysis technologies.