Facebook Instant Articles PHP SDK - Parser

Parser

The Parser component of this SDK is an engine for transforming any Instant Article HTML markup into an InstantArticle Element structure. The Parser component uses the Transformer component under the hood.

The Parser component has been available in the SDK since version 1.2.0.


The Parser interprets the Instant Article HTML markup in order to fill in the InstantArticle object structure. Because it relies on the transformation process, it follows a set of pre-defined rules that map the HTML markup to InstantArticle Elements.
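
To make the idea of rule-based mapping concrete, here is a toy sketch that uses only PHP's DOM extension. The rule table, the `match_rules` function, and the labels are hypothetical and chosen for illustration; the SDK's actual Transformer rules are more detailed and are not reproduced here.

```php
<?php
// Toy illustration only: a hypothetical rule table mapping tag names to labels.
// This is NOT the SDK's Transformer; it only shows the general idea.
function match_rules(string $html, array $rules): array
{
    // Keep libxml quiet about tags it may not recognize, such as <article>.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $matched = [];
    $article = $document->getElementsByTagName('article')->item(0);
    if ($article === null) {
        return $matched; // no <article> element, nothing to map
    }
    foreach ($article->childNodes as $node) {
        // Skip text nodes and comments; only elements are matched against rules.
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue;
        }
        // Tags without a rule fall through to a placeholder label.
        $matched[] = $rules[$node->nodeName] ?? 'unmatched:' . $node->nodeName;
    }
    return $matched;
}

// Hypothetical rules and input, for illustration only.
$rules = ['p' => 'Paragraph', 'blockquote' => 'Blockquote', 'ol' => 'List'];
$html = '<html><body><article><p>Text</p><blockquote>Quote</blockquote><div>x</div></article></body></html>';
print_r(match_rules($html, $rules));
```

In this sketch, each direct child of `<article>` is looked up by tag name, which is a much simpler matching scheme than a real rule set would likely need.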

Parser Configuration

The Parser uses a fully configured Transformer, so no configuration is needed to use it.

As an example, the configuration of the Transformer rules for the Instant Article markup can be found here.

Code example

The Parser takes the Instant Article markup and returns the resulting InstantArticle object:

use Facebook\InstantArticles\Parser\Parser;

$parser = new Parser();
$instant_article = $parser->parse($document);

The $document can be either a DOMDocument or a string containing the Instant Article markup.

Example

Input HTML

The following is valid Instant Article HTML markup:

<html>
  <head>
    <link rel="canonical" href="http://foo.com/article.html"/>
    <meta charset="utf-8"/>
    <meta property="op:generator" content="facebook-instant-articles-sdk-php"/>
    <meta property="op:generator:version" content="1.0.0"/>
    <meta property="op:generator:transformer" content="facebook-instant-articles-sdk-php"/>
    <meta property="op:generator:transformer:version" content="1.0.0"/>
    <meta property="op:markup_version" content="v1.0"/>
  </head>
  <body>
    <article>
      <header>
        <figure>
          <img src="https://jpeg.org/images/jpegls-home.jpg"/>
          <figcaption><h1>Image Name</h1>Some text on text node<cite>Some caption to the image</cite></figcaption>
        </figure>
        <h1>Big Top <b>Title</b></h1>
        <h2>Smaller <b>SubTitle</b></h2>
        <time class="op-published" datetime="1984-08-14T19:30:00+00:00">August 14th, 7:30pm</time>
        <time class="op-modified" datetime="2016-02-10T10:00:00+00:00">February 10th, 10:00am</time>
        <address><a href="#" title="Title of author">Author Name</a>
          Author more detailed description
          Even more
        </address>
        <address><a href="http://facebook.com/author" rel="facebook">Author in FB</a>
          Author user in facebook
        </address>
        <address><a title="PHP Programmer">Developer</a>
        </address>
        <h3 class="op-kicker">Some kicker of this article</h3>
      </header>
      <p>Some text to be within a paragraph for testing.</p>
      <figure data-feedback="fb:likes">
        <img src="http://mydomain.com/path/to/img.jpg"/>
        <audio title="audio title" autoplay="autoplay" muted="muted">
          <source src="http://foo.com/mp3"/>
        </audio>
      </figure>
      <figure data-feedback="fb:comments">
        <img src="http://mydomain.com/path/to/img.jpg"/>
        <script type="application/json" class="op-geotag">
          {
            "type": "Feature",
            "geometry": {
              "type": "Point",
              "coordinates": [23.166667, 89.216667]
            },
            "properties": {
              "title": "Jessore, Bangladesh",
              "radius": 750000,
              "pivot": true,
              "style": "satellite"
            }
          }
        </script>
        <audio title="audio title" autoplay="autoplay" muted="muted">
          <source src="http://foo.com/mp3"/>
        </audio>
      </figure>
      <figure data-feedback="fb:likes,fb:comments">
        <img src="https://jpeg.org/images/jpegls-home.jpg"/>
        <figcaption><h1>Image Name</h1>Some text on text node<cite>Some caption to the image</cite></figcaption>
      </figure>
      <p>Other text to be within a second paragraph for testing.</p>
      <figure class="op-slideshow">
        <figure>
          <img src="https://jpeg.org/images/jpegls-home.jpg"/>
        </figure>
        <figure>
          <img src="https://jpeg.org/images/jpegls-home2.jpg"/>
        </figure>
        <figure>
          <img src="https://jpeg.org/images/jpegls-home3.jpg"/>
        </figure>
        <figcaption><h1>Image Name</h1>Some text on text node<cite>Some caption to the image</cite></figcaption>
        <audio title="audio title" autoplay="autoplay" muted="muted">
          <source src="http://foo.com/mp3"/>
        </audio>
      </figure>
      <ol>
        <li>First list item</li>
        <li>One paragraph on the list</li>
        <li>On the span</li>
        <li>Text inside div?</li>
        <li>Other <a href="#">paragraph</a> on the li</li>
        <li>Last list item</li>
      </ol>
      <p>Some text to be within a paragraph for testing.</p>
      <figure class="op-interactive">
        <iframe src="http://example.com/custom-interactive" class="column-width" height="60">
          <h1>Some custom code</h1>
          <script>alert("test & more test");</script></iframe>
        <figcaption>This graphic is awesome.</figcaption>
      </figure>
      <figure class="op-ad">
        <iframe src="http://foo.com"></iframe>
      </figure>
      <blockquote>Some blockquotes creates <b>magic</b> in an article</blockquote>
      <figure class="op-map">
        <script type="application/json" class="op-geotag">
          {
            "type": "Feature",
            "geometry":
              {
                "type": "Point",
                "coordinates": [23.166667, 89.216667]
              },
            "properties":
              {
                "title": "Jessore, Bangladesh",
                "radius": 750000,
                "pivot": true,
                "style": "satellite"
              }
           }
        </script>
        <figcaption class="op-vertical-above"><h1 class="op-vertical-above op-center">title for caption</h1><h2 class="op-vertical-below op-right">sub title for caption</h2>


        <cite class="op-vertical-center op-left">credit within caption</cite></figcaption>
        <audio title="audio title" autoplay="autoplay" muted="muted">
          <source src="http://foo.com/mp3"/>
        </audio>
      </figure>
      <aside>
        We can be more efficient about where we grow, what we grow, and how we grow.
        <cite>Fruit Store Company</cite></aside>
      <p>Other text to be within a second paragraph for testing.</p>
      <figure class="op-tracker">
        <iframe>
          <h1>Some custom code</h1>
          <script>alert("test & more test");</script></iframe>
      </figure>
      <figure class="op-tracker">
        <iframe>
          <h1>Tracker with enclosing on the script</h1>
          <div><script>alert("test & more test");</script></div>
        </iframe>
      </figure>
      <figure class="op-interactive">
        <iframe class="no-margin">
          <h1>Custom code for your social embed</h1>
          <script>alert("test & more test");</script></iframe>
      </figure>
      <figure data-mode="fullscreen" data-feedback="fb:likes,fb:comments">
        <video data-fb-disable-autoplay="data-fb-disable-autoplay" controls="controls">
          <source src="http://mydomain.com/path/to/video.mp4" type="video/mp4"/>
        </video>
        <figcaption class="op-vertical-below"><h1>Video 1 Title</h1>

          <cite>Attribution Source</cite></figcaption>
        <script type="application/json" class="op-geotag">
          {
            "type": "Feature",
            "geometry": {
              "type": "Point",
              "coordinates": [ [23.166667, 89.216667], [23.166667, 89.216667] ]
            },
            "properties": {
              "title": "Jessore, Bangladesh",
              "radius": 750000,
              "pivot": true,
              "style": "satellite"
            }
          }
        </script>
      </figure>
      <ul class="op-related-articles" title="The related ones in the middle">
        <li>
          <a href="http://example.com/article.html"></a>
        </li>
        <li data-sponsored="true">
          <a href="http://example.com/sponsored-article.html"></a>
        </li>
        <li>
          <a href="http://example.com/another-article.html"></a>
        </li>
      </ul>
      <footer>
        <aside>
          <p>Some plaintext credits to<a href="http://facebook.com/author" rel="facebook">Author</a></p>
          <p>Paragraph text as credits</p>
        </aside>
        <ul class="op-related-articles" title="The related ones in the footer">
          <li>
            <a href="http://example.com/article.html"></a>
          </li>
          <li data-sponsored="true">
            <a href="http://example.com/sponsored-article.html"></a>
          </li>
          <li>
            <a href="http://example.com/another-article.html"></a>
          </li>
        </ul>
      </footer>
    </article>
  </body>
</html>

Loading the HTML content and sending it to the Parser

The following example loads the HTML file and passes it to the Parser as a string:

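use Facebook\InstantArticles\Parser\Parser;

// Read the markup file into a string.
$html_file = file_get_contents('/instant-article-example.html');

// The Parser accepts the raw markup string directly.
$parser = new Parser();
$instant_article = $parser->parse($html_file);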
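
`file_get_contents` returns `false` when the file cannot be read, so it may be worth checking the result before handing it to the Parser. A minimal sketch, assuming a hypothetical helper named `read_markup` that is not part of the SDK:

```php
<?php
// Hypothetical helper (not part of the SDK): read a markup file and fail
// loudly instead of passing false or an empty string on to the Parser.
function read_markup(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException("Cannot read markup file: $path");
    }
    $contents = file_get_contents($path);
    if ($contents === false || trim($contents) === '') {
        throw new RuntimeException("Markup file is empty or unreadable: $path");
    }
    return $contents;
}
```

The returned string can then be passed to `$parser->parse()` in the same way as `$html_file` above.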

The following example loads the HTML file and passes it to the Parser as a DOMDocument:

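use Facebook\InstantArticles\Parser\Parser;

$html_file = file_get_contents('/instant-article-example.html');

// Collect libxml errors internally instead of emitting PHP warnings
// while the markup is loaded.
libxml_use_internal_errors(true);
$document = new \DOMDocument();
$document->loadHTML($html_file);
libxml_clear_errors();
libxml_use_internal_errors(false);

$parser = new Parser();
$instant_article = $parser->parse($document);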
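
If you would rather inspect the libxml messages than discard them, the loading step can be wrapped so that it returns them alongside the document. This is a sketch using only PHP's DOM and libxml functions; the `load_with_diagnostics` name and the shape of its return value are assumptions for illustration, not part of the SDK.

```php
<?php
// Sketch: load markup into a DOMDocument and return any libxml messages
// alongside it instead of throwing them away.
function load_with_diagnostics(string $html): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors(); // start from an empty error buffer

    $document = new DOMDocument();
    $loaded = $document->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore libxml to the state it was in before this call.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return ['document' => $loaded ? $document : null, 'messages' => $messages];
}
```

The `document` entry can be passed to `$parser->parse()` when it is not null, and `messages` can be logged to help track down malformed markup.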