The Transformer component of this SDK is an engine that transforms any markup into an Instant Article structure in the DSL. The engine runs a set of rules against the markup; these rules specify how elements output by the CMS are selected and transformed into their Instant Articles counterparts. The Transformer ships with a base set of rules for common elements (such as a basic paragraph or an image) that developers using the SDK can extend and customize.
The Transformer interprets any markup in order to fill in the InstantArticle object structure. The transformation process follows a set of pre-defined selector rules that map the input markup to known InstantArticle Elements. This user-defined configuration makes the Transformer versatile and powerful.
The power of the Transformer lies in its configurable rules, which map elements from the input markup to Instant Article markup. These rules are specified in JSON format and applied bottom-up, so a rule defined lower in the list takes precedence over a matching rule defined earlier.
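The bottom-up precedence can be illustrated with a small stand-alone sketch. It does not use the SDK: the rules are plain arrays, and `pickRule` and the `tag` key are hypothetical names introduced only for this example. The list is scanned from the last entry to the first, and the first match wins.

```php
<?php
// Hypothetical, simplified rules: each one matches a tag name.
// This mirrors the idea of "later rules win", not the SDK's real classes.
$rules = [
    ['class' => 'ParagraphRule', 'tag' => 'p'],
    ['class' => 'BoldRule', 'tag' => 'b'],
    ['class' => 'MyCustomParagraphRule', 'tag' => 'p'], // defined last
];

/**
 * Returns the class name of the last rule whose tag matches, or null.
 */
function pickRule(array $rules, string $tag): ?string
{
    // Walk the list bottom-up so that later definitions take precedence.
    foreach (array_reverse($rules) as $rule) {
        if ($rule['tag'] === $tag) {
            return $rule['class'];
        }
    }
    return null;
}

echo pickRule($rules, 'p'), PHP_EOL; // MyCustomParagraphRule
echo pickRule($rules, 'b'), PHP_EOL; // BoldRule
```

In practice this means a project-specific rule can be appended to the end of the base rule set to override the default handling of the same element.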
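A rules configuration has the following shape:
- It contains a rules array.
- Each entry in the array has a class attribute set.
- Each class named in an entry is a Rule class.

The transformer pseudo-algorithm is: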
    $document = loadHTML($input_file);
    foreach ($document->childNodes as $node) {
        foreach ($rules as $rule) {
            if ($rule->matches($context, $node)) {
                // Apply rule...
            }
        }
    }
The transformer runs through all elements and, for each element, checks all rules. For a rule to be applied, it must match two conditions: the context and the selector.
The context is the container element currently being processed in the pipeline. A rule declares the context it handles through this method:
    public function getContextClass()
    {
        return InstantArticle::getClassName();
    }
If the Rule needs to handle more than one context, return an array of classes:
    public function getContextClass()
    {
        return array(InstantArticle::getClassName(), Header::getClassName());
    }
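As a rough illustration of how a single class and an array of classes can be checked the same way, here is a stand-alone sketch. The empty `InstantArticle` and `Header` classes are stand-ins defined only for this example, and `contextMatches` is a hypothetical helper, not part of the SDK.

```php
<?php
// Hypothetical stand-ins for the SDK's element classes.
class InstantArticle {}
class Header {}

/**
 * Checks whether $context is an instance of the class (or any of the
 * classes) that a rule declares as its context.
 *
 * @param string|string[] $contextClass value a getContextClass() method might return
 */
function contextMatches($contextClass, object $context): bool
{
    // Normalize the single-class case to an array so both forms share one path.
    foreach ((array) $contextClass as $class) {
        if ($context instanceof $class) {
            return true;
        }
    }
    return false;
}

var_dump(contextMatches(InstantArticle::class, new InstantArticle()));          // bool(true)
var_dump(contextMatches([InstantArticle::class, Header::class], new Header())); // bool(true)
var_dump(contextMatches(InstantArticle::class, new Header()));                  // bool(false)
```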
The selector field is used only by rules that extend ConfigurationSelectorRule. It is interpreted as a CSS selector, or as an XPath selector if it begins with /.
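The CSS-or-XPath distinction can be sketched with PHP's built-in DOM extension. This is only an illustration under simplifying assumptions: `selectorToXPath` is a hypothetical helper that understands just `tag` and `tag.class`, whereas a real CSS-to-XPath converter covers far more of the CSS syntax.

```php
<?php
/**
 * Converts a selector to XPath. Selectors starting with "/" are treated as
 * XPath already; anything else is treated as a very small CSS subset
 * ("tag" or "tag.class"). Assumption: simplified for illustration only.
 */
function selectorToXPath(string $selector): string
{
    if (strpos($selector, '/') === 0) {
        return $selector; // already XPath
    }
    $parts = explode('.', $selector, 2);
    $tag = $parts[0] !== '' ? $parts[0] : '*';
    if (!isset($parts[1])) {
        return '//' . $tag;
    }
    // Match the class as a whole word inside the class attribute.
    return sprintf(
        "//%s[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $tag,
        $parts[1]
    );
}

$document = new DOMDocument();
$document->loadHTML(
    '<html><body><div class="image big"><img src="a.png"/></div><p>Text</p></body></html>'
);
$xpath = new DOMXPath($document);

echo $xpath->query(selectorToXPath('div.image'))->length, PHP_EOL; // 1
echo $xpath->query(selectorToXPath('//p'))->length, PHP_EOL;       // 1
echo $xpath->query(selectorToXPath('img'))->length, PHP_EOL;       // 1
```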
The following markup is a sample of what could be used as input to the Transformer:
    <html>
      <head>
        <script type="text/javascript" src="http://domain.com/javascript.js"></script>
      </head>
      <body>
        <div class="header">
          <div class="title">
            <h1>The article title</h1>
            <h2>Sub Title</h2>
            <span class="author">Author name</span>
          </div>
          <div class="hero-image">
            <img src="http://domain.com/image.png" />
            <div class="image-caption">
              Some amazing moment captured by Photographer
            </div>
          </div>
        </div>
        <p>Lorem <b>ipsum</b> dolor sit amet, consectetur adipiscing elit. Sed eu arcu porta, ultrices massa ut, porttitor diam. Integer id auctor augue.</p>
        <p>Vivamus mattis, sem id consequat dapibus, odio urna fermentum risus, in blandit dolor justo vel ex. Curabitur a neque bibendum, hendrerit sem in, congue lectus.</p>
        <div class="image">
          <img src="http://domain.com/image.png" />
          <div class="image-caption">
            Some amazing moment captured by Photographer
          </div>
        </div>
        <p>Curabitur vulputate odio eu justo <i>venenatis</i>, a pretium orci placerat. Nam sed neque quis eros vestibulum mattis. Donec vitae mi egestas, laoreet massa et, fringilla libero.</p>
      </body>
    </html>
The following rule configuration maps the elements of the sample markup above to Instant Article elements when transform() runs:

    {
      "rules": [
        { "class": "TextNodeRule" },
        { "class": "PassThroughRule", "selector": "html" },
        { "class": "PassThroughRule", "selector": "head" },
        { "class": "PassThroughRule", "selector": "script" },
        { "class": "PassThroughRule", "selector": "body" },
        { "class": "ItalicRule", "selector": "i" },
        { "class": "BoldRule", "selector": "b" },
        { "class": "ParagraphRule", "selector": "p" },
        { "class": "HeaderTitleRule", "selector": "div.title h1" },
        { "class": "HeaderSubTitleRule", "selector": "div.title h2" },
        { "class": "HeaderRule", "selector": "div.header" },
        {
          "class": "AuthorRule",
          "selector": "span.author",
          "properties": {
            "author.name": { "type": "string", "selector": "span" }
          }
        },
        { "class": "CaptionRule", "selector": "div.image-caption" },
        {
          "class": "ImageRule",
          "selector": "div.image",
          "properties": {
            "image.url": { "type": "string", "selector": "img", "attribute": "src" },
            "image.caption": { "type": "element", "selector": "div.image-caption" }
          }
        },
        {
          "class": "HeaderImageRule",
          "selector": "div.hero-image",
          "properties": {
            "image.url": { "type": "string", "selector": "img", "attribute": "src" },
            "image.caption": { "type": "element", "selector": "div.image-caption" }
          }
        }
      ]
    }
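Before handing a configuration like this to the Transformer, it can be useful to check that it parses and has the expected shape. The sketch below uses only `json_decode` from the PHP standard library and a shortened copy of the configuration; the checks and the `$summary` variable are illustrative assumptions, not something the SDK requires.

```php
<?php
// A shortened copy of the configuration, inlined to keep the sketch self-contained.
$json = <<<'JSON'
{
    "rules": [
        { "class": "TextNodeRule" },
        { "class": "ParagraphRule", "selector": "p" },
        {
            "class": "ImageRule",
            "selector": "div.image",
            "properties": {
                "image.url": { "type": "string", "selector": "img", "attribute": "src" }
            }
        }
    ]
}
JSON;

$config = json_decode($json, true);
if (!is_array($config) || !isset($config['rules']) || !is_array($config['rules'])) {
    throw new InvalidArgumentException('Configuration must contain a "rules" array.');
}

$summary = [];
foreach ($config['rules'] as $index => $rule) {
    if (!isset($rule['class'])) {
        throw new InvalidArgumentException("Rule #$index has no \"class\" attribute.");
    }
    // Rules such as TextNodeRule carry no selector in the sample configuration.
    $summary[] = $rule['class'] . ' => ' . ($rule['selector'] ?? '(no selector)');
}

print_r($summary);
```

A check like this surfaces a malformed file early, before any markup is processed.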
Each custom rule must comply with the full contract of the Rule abstract class.
    class MyCustomRule extends Rule
    {
        public function matchesContext($context) {}

        public function matchesNode($node) {}

        public function apply($transformer, $container, $node) {}
    }
The best option is to use ConfigurationSelectorRule as the base class for all custom rules. This way, the selector and other configuration options are inherited by default.
To transform your markup into InstantArticle markup, follow these steps:
- Create an InstantArticle instance.
- Create a Transformer and load it with rules (programmatically or from a file).
- Load the input markup into a DOMDocument and pass it, together with the InstantArticle instance, to transform().

    // Load the content of the rules file
    $rules_file_content = file_get_contents("simple-rules.json", true);

    // Instantiate the Instant Article
    $instant_article = InstantArticle::create();

    // Create the transformer and load the rules
    $transformer = new Transformer();
    $transformer->loadRules($rules_file_content);

    // This example loads the HTML from a file
    $html_file = file_get_contents("simple.html", true);

    // Ignore errors raised while parsing the HTML
    libxml_use_internal_errors(true);
    $document = new \DOMDocument();
    $document->loadHTML($html_file);
    libxml_use_internal_errors(false);

    // Invoke the transformer
    $transformer->transform($instant_article, $document);

    // Get the warnings collected by the transformer
    $warnings = $transformer->getWarnings();

    // Render the InstantArticle markup format
    $result = $instant_article->render();