Facebook Instant Articles PHP SDK - Transformer

Transformer

The Transformer component of this SDK is an engine that transforms arbitrary markup into the Instant Article structure of the SDK's DSL. It runs a set of rules over the markup; the rules specify which elements output by the CMS are selected and how each one is transformed into its Instant Articles counterpart. The Transformer ships with a base set of rules for common elements (such as a basic paragraph or an image), which developers using the SDK can extend and customize.


The Transformer interprets the input markup in order to fill in the InstantArticle object structure. It follows a set of selector rules that map elements of the input markup to known Instant Articles elements. Because this configuration is user-defined, the Transformer is both versatile and powerful.

Transformer Configuration

The power of the Transformer lies in its configurable rules, which map elements of the input markup to Instant Article markup. The rules are specified in JSON and applied bottom-up: a rule defined lower in the file takes precedence over an earlier rule that matches the same element.
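To make the bottom-up ordering concrete, here is a minimal, self-contained PHP sketch. It is not the SDK's implementation: rules are reduced to plain arrays, matching is simplified to comparing a tag name with the selector, and `winningRuleFor()` and `MyCustomParagraphRule` are hypothetical names used only for illustration.

```php
<?php
// Minimal sketch, not SDK code: shows why "bottom-up" means the LAST
// matching rule in the configuration wins. Rules are plain arrays and
// "matching" is simplified to comparing a tag name with the selector.

$json = '{
  "rules": [
    {"class": "ParagraphRule",         "selector": "p"},
    {"class": "BoldRule",              "selector": "b"},
    {"class": "MyCustomParagraphRule", "selector": "p"}
  ]
}';

$config = json_decode($json, true);

/**
 * Hypothetical helper: returns the class of the rule that would be applied
 * to an element with the given tag name, or null if no rule matches.
 */
function winningRuleFor(array $rules, string $tagName): ?string
{
    // Walk the rules from the bottom of the file to the top,
    // so later definitions are tried first.
    foreach (array_reverse($rules) as $rule) {
        if (($rule['selector'] ?? null) === $tagName) {
            return $rule['class'];
        }
    }
    return null;
}

echo winningRuleFor($config['rules'], 'p'), PHP_EOL; // MyCustomParagraphRule
echo winningRuleFor($config['rules'], 'b'), PHP_EOL; // BoldRule
```

Because the list is walked from the end, overriding an existing rule only requires appending a new entry rather than editing the earlier one.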

  • Each rule in the configuration file must be an entry in the rules array
  • Each entry must have at least the class property set
  • Every class referred to by the configuration file must extend the abstract Rule class
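The three requirements above can be checked before the configuration is loaded. The following sketch is an illustration under stated assumptions, not SDK code: `SketchRule` stands in for the SDK's abstract Rule class, and `validateRulesConfig()`, the `ParagraphRule` stub and `NotARule` are hypothetical, defined locally so the example runs on its own.

```php
<?php
// Minimal sketch, not SDK code: validates a rules configuration against the
// three requirements above. SketchRule stands in for the SDK's abstract
// Rule class; the classes below are hypothetical empty stubs.

abstract class SketchRule {}
class ParagraphRule extends SketchRule {}
class NotARule {}

/** Returns a list of human-readable problems (empty if the config is valid). */
function validateRulesConfig(string $json): array
{
    $config = json_decode($json, true);
    if (!is_array($config) || !isset($config['rules']) || !is_array($config['rules'])) {
        return ['configuration must contain a "rules" array'];
    }

    $errors = [];
    foreach ($config['rules'] as $index => $rule) {
        if (!is_array($rule) || !isset($rule['class'])) {
            $errors[] = "rule #$index has no \"class\" property";
        } elseif (!class_exists($rule['class'])
            || !is_subclass_of($rule['class'], SketchRule::class)) {
            $errors[] = "rule #$index: {$rule['class']} does not extend the Rule base class";
        }
    }
    return $errors;
}

$errors = validateRulesConfig(
    '{"rules": [{"class": "ParagraphRule", "selector": "p"},'
    . ' {"selector": "b"},'
    . ' {"class": "NotARule"}]}'
);
print_r($errors); // reports rule #1 (no class) and rule #2 (not a Rule subclass)
```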

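In simplified form, the Transformer's main loop is:

$document = new \DOMDocument();
$document->loadHTML($input_html);
foreach ($document->childNodes as $node) {
    // $rules is held in reverse order of definition (bottom-up)
    foreach ($rules as $rule) {
        // matches() checks both the context and the selector
        if ($rule->matches($context, $node)) {
            // Apply the rule; it returns the context to continue with
            $context = $rule->apply($transformer, $context, $node);
            break; // only the first matching rule is applied to a node
        }
    }
}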

The Transformer runs through the elements and, for each element, checks the rules in turn. A rule is applied only when it matches two conditions:

  • Matches context
  • Matches selector
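The following sketch shows the two conditions working together, using only PHP's DOM extension. It is a simplification, not the SDK's code: contexts are plain strings, each selector is reduced to a single tag name, and `ruleFor()` is a hypothetical helper.

```php
<?php
// Minimal sketch, not SDK code: a rule applies only when BOTH its context
// and its selector match. Rules are plain arrays; "context" is the name of
// the container being filled and the selector is reduced to a tag name.

$rules = [
    ['class' => 'ParagraphRule',   'contexts' => ['InstantArticle'], 'tag' => 'p'],
    ['class' => 'HeaderTitleRule', 'contexts' => ['Header'],         'tag' => 'h1'],
    ['class' => 'BoldRule',        'contexts' => ['Paragraph'],      'tag' => 'b'],
];

/** Hypothetical helper: the rule applied to $node inside $context, or null. */
function ruleFor(array $rules, string $context, DOMNode $node): ?string
{
    foreach (array_reverse($rules) as $rule) {          // bottom-up
        $contextMatches  = in_array($context, $rule['contexts'], true);
        $selectorMatches = $node->nodeName === $rule['tag'];
        if ($contextMatches && $selectorMatches) {
            return $rule['class'];                      // first match wins
        }
    }
    return null;
}

$doc = new DOMDocument();
$doc->loadHTML('<h1>Title</h1><p>Lorem <b>ipsum</b></p>');
$h1 = $doc->getElementsByTagName('h1')->item(0);
$p  = $doc->getElementsByTagName('p')->item(0);

var_dump(ruleFor($rules, 'Header', $h1));          // string(15) "HeaderTitleRule"
var_dump(ruleFor($rules, 'InstantArticle', $h1));  // NULL: selector matches, context does not
var_dump(ruleFor($rules, 'InstantArticle', $p));   // string(13) "ParagraphRule"
```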

Matching context

The context is the container element currently being processed in the pipeline. A rule declares the context it accepts by returning its class from getContextClass():

public function getContextClass() {
    return InstantArticle::getClassName();
}

If a rule handles more than one context, getContextClass() can return an array of classes:

public function getContextClass() {
    return array(InstantArticle::getClassName(), Header::getClassName());
}
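One way a rule could honor both forms of getContextClass() is to normalize the return value to an array and test the context with instanceof. This is a hedged sketch rather than the SDK's implementation; `ContextAwareRuleSketch`, `ArticleStub`, `HeaderStub` and `ParagraphStub` are hypothetical stand-ins for the SDK's classes.

```php
<?php
// Sketch, not SDK code: supports getContextClass() returning either a
// single class name or an array of class names.
// ArticleStub/HeaderStub/ParagraphStub stand in for the SDK's element classes.

class ArticleStub {}
class HeaderStub {}
class ParagraphStub {}

abstract class ContextAwareRuleSketch
{
    /** @return string|string[] */
    abstract public function getContextClass();

    public function matchesContext($context): bool
    {
        // Normalize a single class name to a one-element array
        $classes = (array) $this->getContextClass();
        foreach ($classes as $class) {
            if ($context instanceof $class) {
                return true;
            }
        }
        return false;
    }
}

class SingleContextRule extends ContextAwareRuleSketch
{
    public function getContextClass()
    {
        return ArticleStub::class;
    }
}

class MultiContextRule extends ContextAwareRuleSketch
{
    public function getContextClass()
    {
        return [ArticleStub::class, HeaderStub::class];
    }
}

$single = new SingleContextRule();
$multi  = new MultiContextRule();

var_dump($single->matchesContext(new HeaderStub()));  // bool(false)
var_dump($multi->matchesContext(new HeaderStub()));   // bool(true)
```

A side effect of using instanceof is that a rule declared for a class also matches subclasses of that class.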

Matching selector

The selector field is used only by rules that extend ConfigurationSelectorRule. It is interpreted as a CSS selector, or as an XPath expression if it begins with /.
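The dispatch described above (a leading / means XPath, anything else means CSS) can be sketched with PHP's DOMXPath. The CSS-to-XPath translation here is a deliberately tiny, hypothetical helper (`cssToXPath()`) that only understands tag names, tag.class, and the descendant (space) combinator; real CSS support needs a complete selector converter. `selectNodes()` is likewise a hypothetical name.

```php
<?php
// Sketch, not SDK code: treat selectors starting with "/" as XPath and
// everything else as (a very small subset of) CSS.

/** Hypothetical converter: supports "tag", "tag.class", ".class" and spaces. */
function cssToXPath(string $css): string
{
    $steps = [];
    foreach (preg_split('/\s+/', trim($css)) as $part) {
        [$tag, $class] = array_pad(explode('.', $part, 2), 2, null);
        $step = ($tag === '') ? '*' : $tag;
        if ($class !== null) {
            // Standard XPath idiom for "has this class among several"
            $step .= "[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
        }
        $steps[] = $step;
    }
    // "//" between steps expresses the CSS descendant combinator
    return '//' . implode('//', $steps);
}

function selectNodes(DOMDocument $doc, string $selector): DOMNodeList
{
    $xpath = new DOMXPath($doc);
    // A leading "/" means the selector is already an XPath expression
    $query = ($selector !== '' && $selector[0] === '/') ? $selector : cssToXPath($selector);
    return $xpath->query($query);
}

$doc = new DOMDocument();
$doc->loadHTML('<div class="header"><div class="title"><h1>The article title</h1></div></div><h1>Other</h1>');

echo selectNodes($doc, 'div.title h1')->length, PHP_EOL; // 1
echo selectNodes($doc, '//h1')->length, PHP_EOL;         // 2
```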


Example

Input HTML

The following markup is a sample of what could be used as input to the Transformer:

<html>
    <head>
        <script type="text/javascript" src="http://domain.com/javascript.js"></script>
    </head>
    <body>
        <div class="header">
            <div class="title">
                <h1>The article title</h1>
                <h2>Sub Title</h2>
                <span class="author">Author name</span>
            </div>
            <div class="hero-image">
                <img src="http://domain.com/image.png" />
                <div class="image-caption">
                  Some amazing moment captured by Photographer
                </div>
            </div>
        </div>
        <p>Lorem <b>ipsum</b> dolor sit amet, consectetur adipiscing elit. Sed eu arcu porta, ultrices massa ut, porttitor diam. Integer id auctor augue.</p>
        <p>Vivamus mattis, sem id consequat dapibus, odio urna fermentum risus, in blandit dolor justo vel ex. Curabitur a neque bibendum, hendrerit sem in, congue lectus.</p>
        <div class="image">
            <img src="http://domain.com/image.png" />
            <div class="image-caption">
              Some amazing moment captured by Photographer
            </div>
        </div>
        <p>Curabitur vulputate odio eu justo <i>venenatis</i>, a pretium orci placerat. Nam sed neque quis eros vestibulum mattis. Donec vitae mi egestas, laoreet massa et, fringilla libero.</p>
    </body>
</html>

Full rule configuration file for the HTML above

This rule configuration will:

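  • Run bottom-up (later rules take precedence)
  • Check whether the rule matches the current context (given by getContextClass() of the rule named in "class")
  • Check whether the rule matches the "selector" (CSS or XPath)
  • Apply the rule (by calling its apply() method)
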
{
    "rules" :
        [
            {
                "class": "TextNodeRule"
            },
            {
                "class": "PassThroughRule",
                "selector" : "html"
            },
            {
                "class": "PassThroughRule",
                "selector" : "head"
            },
            {
                "class": "PassThroughRule",
                "selector" : "script"
            },
            {
                "class": "PassThroughRule",
                "selector" : "body"
            },
            {
                "class": "ItalicRule",
                "selector" : "i"
            },
            {
                "class": "BoldRule",
                "selector" : "b"
            },
            {
                "class": "ParagraphRule",
                "selector" : "p"
            },
            {
                "class": "HeaderTitleRule",
                "selector" : "div.title h1"
            },
            {
                "class": "HeaderSubTitleRule",
                "selector" : "div.title h2"
            },
            {
                "class": "HeaderRule",
                "selector" : "div.header"
            },
            {
                "class": "AuthorRule",
                "selector" : "span.author",
                "properties" : {
                    "author.name" : {
                        "type" : "string",
                        "selector" : "span"
                    }
                }
            },
            {
                "class": "CaptionRule",
                "selector" : "div.image-caption"
            },
            {
                "class": "ImageRule",
                "selector" : "div.image",
                "properties" : {
                    "image.url" : {
                        "type" : "string",
                        "selector" : "img",
                        "attribute": "src"
                    },
                    "image.caption" : {
                        "type" : "element",
                        "selector" : "div.image-caption"
                    }
                }
            },
            {
                "class": "HeaderImageRule",
                "selector" : "div.hero-image",
                "properties" : {
                    "image.url" : {
                        "type" : "string",
                        "selector" : "img",
                        "attribute": "src"
                    },
                    "image.caption" : {
                        "type" : "element",
                        "selector" : "div.image-caption"
                    }
                }
            }
        ]
}
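The properties block of a rule tells it where to find the values it needs inside the element its selector matched. The sketch below resolves the ImageRule properties from the configuration above against a matching `<div class="image">`. It is an illustration under stated assumptions, not the SDK's property handling: only the "string" and "element" types are covered, property selectors are limited to tag and tag.class, and `findFirst()` and `resolveProperties()` are hypothetical helpers.

```php
<?php
// Sketch, not SDK code: resolves a rule's "properties" relative to the
// element matched by the rule's selector.

/** Hypothetical helper: first descendant matching "tag" or "tag.class". */
function findFirst(DOMElement $root, string $selector): ?DOMElement
{
    [$tag, $class] = array_pad(explode('.', $selector, 2), 2, null);
    foreach ($root->getElementsByTagName($tag) as $el) {
        $classes = preg_split('/\s+/', trim($el->getAttribute('class')));
        if ($class === null || in_array($class, $classes, true)) {
            return $el;
        }
    }
    return null;
}

/** Hypothetical helper: maps each property name to its resolved value. */
function resolveProperties(DOMElement $root, array $properties): array
{
    $values = [];
    foreach ($properties as $name => $spec) {
        $el = findFirst($root, $spec['selector']);
        if ($el === null) {
            $values[$name] = null;
        } elseif ($spec['type'] === 'string') {
            // Read the named attribute, or the text content if none is given
            $values[$name] = isset($spec['attribute'])
                ? $el->getAttribute($spec['attribute'])
                : trim($el->textContent);
        } else { // "element": hand the DOM node itself to the rule
            $values[$name] = $el;
        }
    }
    return $values;
}

$properties = json_decode('{
  "image.url":     {"type": "string",  "selector": "img", "attribute": "src"},
  "image.caption": {"type": "element", "selector": "div.image-caption"}
}', true);

$doc = new DOMDocument();
$doc->loadHTML('<div class="image"><img src="http://domain.com/image.png" />'
    . '<div class="image-caption">Some amazing moment</div></div>');
$imageDiv = $doc->getElementsByTagName('div')->item(0);

$values = resolveProperties($imageDiv, $properties);
echo $values['image.url'], PHP_EOL;                         // http://domain.com/image.png
echo trim($values['image.caption']->textContent), PHP_EOL;  // Some amazing moment
```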

Creating Custom Rules

Each custom rule must implement the full contract of the abstract Rule class:

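class MyCustomRule extends Rule
{
    // Returns true if this rule can run in the given context (container)
    public function matchesContext($context)
    {}

    // Returns true if this rule handles the given DOM node
    public function matchesNode($node)
    {}

    // Transforms $node into Instant Articles elements inside $container
    // and returns the container that processing should continue with
    public function apply($transformer, $container, $node)
    {}
}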
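For a self-contained illustration of that contract, the sketch below defines a local stand-in for the abstract class and a small custom rule. `RuleSketch`, `ParagraphStub` and `StrikeRule` are hypothetical; in practice the Rule class and element classes come from the SDK, and a real `apply()` would add Instant Articles elements rather than plain arrays.

```php
<?php
// Sketch, not SDK code: RuleSketch mirrors the three methods above so the
// example runs on its own. ParagraphStub and StrikeRule are hypothetical.

abstract class RuleSketch
{
    abstract public function matchesContext($context);
    abstract public function matchesNode($node);
    abstract public function apply($transformer, $container, $node);

    // A rule applies only if both checks pass
    public function matches($context, $node)
    {
        return $this->matchesContext($context) && $this->matchesNode($node);
    }
}

class ParagraphStub
{
    public $parts = [];
}

/** Records the text of <s> and <del> elements found inside a paragraph. */
class StrikeRule extends RuleSketch
{
    public function matchesContext($context)
    {
        return $context instanceof ParagraphStub;
    }

    public function matchesNode($node)
    {
        return $node instanceof DOMElement
            && in_array(strtolower($node->nodeName), ['s', 'del'], true);
    }

    public function apply($transformer, $container, $node)
    {
        $container->parts[] = ['type' => 'strike', 'text' => $node->textContent];
        // Return the container so processing continues in the same context
        return $container;
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p>old <del>price</del></p>');
$del = $doc->getElementsByTagName('del')->item(0);

$rule = new StrikeRule();
$paragraph = new ParagraphStub();
if ($rule->matches($paragraph, $del)) {
    $rule->apply(null, $paragraph, $del);
}
print_r($paragraph->parts);
```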

In most cases, the best option is to use ConfigurationSelectorRule as the base class for custom rules. That way, selector matching and the rest of the rule configuration are inherited by default.
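The benefit of a selector-based base class is that matchesNode() is written once and driven by the "selector" entry of the configuration. The sketch below imitates that idea; `SelectorRuleSketch`, `fromConfig()` and `CaptionRuleSketch` are hypothetical names, not the SDK's ConfigurationSelectorRule API, and selector support is limited to tag and tag.class.

```php
<?php
// Sketch, not SDK code: the base class reads "selector" from a config entry
// and implements matchesNode() once; subclasses only supply apply().

abstract class SelectorRuleSketch
{
    protected $selector;

    // Hypothetical factory: builds the rule from one configuration entry
    public static function fromConfig(array $entry)
    {
        $rule = new static();
        $rule->selector = $entry['selector'];
        return $rule;
    }

    // Supports only "tag" and "tag.class" selectors
    public function matchesNode($node)
    {
        if (!$node instanceof DOMElement) {
            return false;
        }
        [$tag, $class] = array_pad(explode('.', $this->selector, 2), 2, null);
        $classes = preg_split('/\s+/', trim($node->getAttribute('class')));
        return strtolower($node->nodeName) === $tag
            && ($class === null || in_array($class, $classes, true));
    }

    abstract public function apply($transformer, $container, $node);
}

class CaptionRuleSketch extends SelectorRuleSketch
{
    public function apply($transformer, $container, $node)
    {
        $container[] = trim($node->textContent);
        return $container;
    }
}

$rule = CaptionRuleSketch::fromConfig(['class' => 'CaptionRule', 'selector' => 'div.image-caption']);

$doc = new DOMDocument();
$doc->loadHTML('<div class="image"><div class="image-caption"> Moment </div></div>');
$outer = $doc->getElementsByTagName('div')->item(0);
$inner = $doc->getElementsByTagName('div')->item(1);

var_dump($rule->matchesNode($outer)); // bool(false)
var_dump($rule->matchesNode($inner)); // bool(true)
```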

Invoking Transformer

To transform your markup into InstantArticle markup, follow these steps:

  • Create an InstantArticle instance
  • Create a Transformer and load it with rules (programmatically or from a file)
  • Load the original HTML content and parse it into a DOMDocument
  • Run the Transformer
  • Check for errors/warnings

Example

// Assumes the SDK was installed with Composer
require_once 'vendor/autoload.php';

use Facebook\InstantArticles\Elements\InstantArticle;
use Facebook\InstantArticles\Transformer\Transformer;

// Loads the rules configuration file
$rules_file_content = file_get_contents("simple-rules.json", true);

// Instantiates the Instant Article
$instant_article = InstantArticle::create();

// Creates the transformer and loads the rules
$transformer = new Transformer();
$transformer->loadRules($rules_file_content);

// Loads the HTML to transform (from a file in this example)
$html_file = file_get_contents("simple.html", true);

// Parses the HTML, suppressing libxml warnings about malformed markup
libxml_use_internal_errors(true);
$document = new \DOMDocument();
$document->loadHTML($html_file);
libxml_clear_errors();
libxml_use_internal_errors(false);

// Runs the transformer
$transformer->transform($instant_article, $document);

// Gets the warnings raised during the transformation
$warnings = $transformer->getWarnings();

// Renders the InstantArticle markup format
$result = $instant_article->render();
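The example above discards libxml's parse errors. If you would rather log them alongside the Transformer's warnings, PHP's own libxml functions can collect them. This sketch uses only standard PHP calls (`libxml_use_internal_errors()`, `libxml_get_errors()`, `libxml_clear_errors()`); `loadHtmlCollectingErrors()` is a hypothetical helper name, and the exact error messages depend on the installed libxml version.

```php
<?php
// Sketch: parse HTML while collecting libxml errors instead of discarding them.

/** Hypothetical helper: returns the parsed document; errors go to $messages. */
function loadHtmlCollectingErrors(string $html, array &$messages): DOMDocument
{
    $previous = libxml_use_internal_errors(true);   // buffer errors instead of emitting warnings
    $document = new DOMDocument();
    $document->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {       // LibXMLError objects
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();                          // empty the buffer for the next parse
    libxml_use_internal_errors($previous);          // restore the caller's setting
    return $document;
}

$messages = [];
$doc = loadHtmlCollectingErrors('<span class="author">Author name</author>', $messages);
print_r($messages);
```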