
Mf2 and Video

I ran into a bit of an issue lately, and it has to do with the way X-Ray, or rather php-mf2, parses video elements in HTML (in this case, inside an h-entry).

The unfortunate frog I’ll be dissecting is the page at https://beko.famkos.net/2021/05/03/head-tracking-for-x4-foundations-on-linux/. (Sorry!)

Its source contains the following relevant bits of HTML (I’ve removed some classes, though, for brevity’s sake):

<figure class="wp-block-video">
  <video controls>
    <source src="https://beko.famkos.net/wp-content/uploads/2021/05/building_opentrack.mp4" type="video/mp4" class="u-video">
    <source src="https://beko.famkos.net/wp-content/uploads/2021/05/building_opentrack.webm" type="video/webm" class="u-video">
  </video>
</figure>

<figure class="wp-block-video u-video">
  <video controls>
    <source src="https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_h264.mp4" type="video/mp4" class="u-video">
    <source src="https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_h265.mp4" type="video/mp4" class="u-video">
    <source src="https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_p9.webm" type="video/webm" class="u-video">
  </video>
</figure>

I think there might be an issue with that last bit: perhaps u-video shouldn’t be used on that figure tag, which doesn’t have a URL attribute (think href, or src).

Regardless, the output I got from X-Ray (and I had to bypass ActivityStreams here, too) is a JSON object with the following video property:

"video": [
  "https://beko.famkos.net/wp-content/uploads/2021/05/building_opentrack.mp4",
  "https://beko.famkos.net/wp-content/uploads/2021/05/building_opentrack.webm",
  "https://beko.famkos.net/kind/article/",
  "https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_h264.mp4",
  "https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_h265.mp4",
  "https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_p9.webm"
]

Okay, so two things happen here.

One of those URLs is (fairly clearly, although video URLs don’t necessarily need a file extension in them to work) wrong. Or rather: it’s exactly the (h-)feed URL I fed X-Ray, and almost certainly a byproduct of that (or another?) “empty” u-video element.

The other thing that should be obvious: the remaining URLs point to a total of five video files, but represent only two distinct pieces of footage. A video element with multiple sources in it, after all, is a very common way to let web browsers pick the file they support (best).
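As for that first point, here is a minimal sketch of how the stray URL could be dropped before doing anything else. The function name `filter_video_urls()` and the list of extensions are assumptions made up for this example, not part of X-Ray or php-mf2. Per the caveat above, a perfectly valid video URL without a file extension would be thrown out too, so this is a heuristic at best.

```php
<?php
// Hypothetical helper: keep only URLs whose path ends in a known video extension.
// The default extension list is an assumption; adjust as needed.
function filter_video_urls(array $urls, array $extensions = ['mp4', 'webm', 'ogv', 'm4v', 'mov']): array
{
    $kept = array_filter($urls, function ($url) use ($extensions) {
        // Look at the path only, so query strings don't get in the way.
        $path = parse_url((string) $url, PHP_URL_PATH);

        if (!is_string($path)) {
            return false; // No path at all (or an unparseable URL).
        }

        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

        return in_array($ext, $extensions, true);
    });

    // Reindex, so the result is a plain list again.
    return array_values($kept);
}
```

Called as `filter_video_urls($videos)`, it leaves the order of the remaining URLs untouched, which matters for any grouping done afterwards.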

So, how do I properly display these videos in my social reader, which is a web app?

I see four possibilities:

  1. Don’t. Or rather, these videos are in the source, right? Then they’ll be displayed there. Turns out this is, in fact, not the case for “photo” (and “video”) posts. The actual fix, it seems, is for publishers to only use u-photo and u-video classes in true, Instagram-like photo or video posts.
  2. Don’t do anything, leave as is. Show all sources as if they were separate videos and let browsers display a warning for file types they can’t play.
  3. Re-examine the actual HTML, and display the video elements and files as they appear there. This is probably the more robust answer. This way I could still append a “video grid” of sorts to entry previews. Which, I think, was the reason behind video metadata in the first place. Or one of the reasons.
  4. Try to regroup the (valid) files into proper video elements. This is what I’ve played around with the last couple days. I wrote an algorithm of sorts that loops over the array and compares filenames using PHP’s similar_text(). It’s kind of ugly, but it was fun. (I somehow quite like solving challenges no one in their right mind would give a damn about.)

Kinda looks like this:

$result = [];
$count = 0;
$ref = '';

foreach ($videos as $i => $video) {
  if ($i === 0) {
    $ref = basename($video); // Our initial reference.
    $result[$count] = [$video]; // Start the first "chunk."
    continue;
  }

  // Compare the current filename with the reference.
  similar_text($ref, basename($video), $percent);

  if (intval($percent) >= 75) {
    // Same file, different format?
    $result[$count][] = $video; // Add to current "chunk."
    continue;
  }

  $count++;

  $ref = basename($video);
  $result[$count] = [$video]; // Start a new "chunk."
}

This sort of works, although the threshold of 75% is very arbitrary, and other string comparison algorithms may be better suited for this kind of thing.
The result (with the one invalid URL removed)?

[
  [
    "https://beko.famkos.net/wp-content/uploads/2021/05/building_opentrack.mp4",
    "https://beko.famkos.net/wp-content/uploads/2021/05/building_opentrack.webm"
  ],
  [
    "https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_h264.mp4",
    "https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_h265.mp4",
    "https://beko.famkos.net/wp-content/uploads/2021/05/opentrack_demo02_p9.webm"
  ]
]
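As an example of a different comparison, here is a rough sketch that swaps similar_text() for a shared-prefix check on the filenames with their extensions stripped. Both function names are made up for this example, and the 0.75 threshold is every bit as arbitrary as the 75% above. One known weakness: because the ratio is measured against the shorter name, a very short filename will match almost anything that starts the same way.

```php
<?php
// Hypothetical helper: the share of the shorter string covered by the common prefix.
function prefix_ratio(string $a, string $b): float
{
    $max = min(strlen($a), strlen($b));

    if ($max === 0) {
        return 0.0;
    }

    $i = 0;

    while ($i < $max && $a[$i] === $b[$i]) {
        $i++;
    }

    return $i / $max;
}

// Hypothetical helper: group consecutive URLs whose filenames share a long prefix.
function group_videos_by_prefix(array $videos, float $threshold = 0.75): array
{
    $groups = [];
    $ref = null;

    foreach ($videos as $video) {
        // Filename without directory or extension, e.g. "opentrack_demo02_h264".
        $name = pathinfo((string) parse_url($video, PHP_URL_PATH), PATHINFO_FILENAME);

        if ($ref !== null && prefix_ratio($ref, $name) >= $threshold) {
            $groups[count($groups) - 1][] = $video; // Add to current "chunk."
            continue;
        }

        $ref = $name; // New reference.
        $groups[] = [$video]; // Start a new "chunk."
    }

    return $groups;
}
```

Just like the similar_text() version, this compares each file against the first file of the current chunk, so it depends on the sources arriving in document order.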

The grouping only works because the order of `source` elements is kept intact. I also thought I could speed things up by comparing file extensions, like: “Two of the same file formats can never belong in the same ‘chunk,’” but that’s not true, as the H.264 and H.265 example (both “identical” files are MP4s) above shows. Also, filenames don’t have to be similar at all! Still very hacky indeed.
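For comparison, the third option (going back to the actual HTML) sidesteps the guessing entirely, since each video element already says which sources belong together. Below is a minimal sketch using PHP’s bundled DOMDocument. The function name `extract_video_groups()` is made up for this example, it assumes the entry’s HTML is already at hand as a string, and it doesn’t resolve relative URLs.

```php
<?php
// Hypothetical helper: returns one list of source URLs per <video> element.
function extract_video_groups(string $html): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();

    // libxml complains about HTML5 tags like <video>; collect the errors quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $groups = [];

    foreach ($doc->getElementsByTagName('video') as $video) {
        $sources = [];

        // A video may carry its URL directly, without <source> children.
        if ($video->hasAttribute('src')) {
            $sources[] = $video->getAttribute('src');
        }

        foreach ($video->getElementsByTagName('source') as $source) {
            $src = $source->getAttribute('src');

            if ($src !== '') {
                $sources[] = $src;
            }
        }

        if ($sources !== []) {
            $groups[] = $sources; // One "chunk" per video element.
        }
    }

    return $groups;
}
```

Each inner array can then be turned back into a single video element with one `source` per URL, leaving the choice of format to the browser.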

Replies

  1. Beko Pharm

    Heh, sorry for making you jump the hoops @jan The weird formatting is mostly because this is how Gutenberg wants to format videos. Hacking various …

    Via beko.famkos.net, in reply to Mf2 and Video.