Detecting Ligature Icons With JavaScript

– October 22, 2019 –

While working on axe-core, an accessibility testing library, we ran into an interesting problem. We needed to be able to detect when a string of text had been replaced by a ligature icon.

A ligature icon is a special type of ligature that displays an icon instead of a combined letterform. For example, a ligature icon for the word "file" would replace the word with an icon of a file. Examples of ligature icon fonts include Material Icon, Ligature Symbols, and Symbolset.

The problem with ligature icons is that there is no easy way to detect when a font uses ligatures and what words are associated with ligature icons.

It took me a while, but I think I found a reliable and accurate way to detect them. If you don't want to read through my process, you can skip to the end where I post the code.

Different Ideas for Detecting Ligature Icons

Through CSS Properties

I initially thought I could detect a ligature font based on the CSS properties used when declaring it. I noticed ligature icon fonts typically use a property called font-feature-settings and thought it might be required to display a ligature icon. However, turning the property off in devtools didn't stop the icon from displaying so it's not a requirement to show them. Even though some ligature icon fonts use it, I couldn't rely on it.
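For reference, the check I had in mind looked roughly like the sketch below. The helper name declaresLigatureFeature is hypothetical, and it only looks for the "liga" feature tag in the computed font-feature-settings value. Since icons still render without the property, a false result here proves nothing.

```javascript
// Hypothetical helper: reports whether a computed style explicitly
// enables the "liga" OpenType feature. `style` is anything with a
// getPropertyValue method, such as the result of getComputedStyle(node).
function declaresLigatureFeature(style) {
  const value = style.getPropertyValue('font-feature-settings') || '';
  // match "liga" or 'liga' unless it is switched off with 0 / off
  return /["']liga["'](?!\s*(0|off)\b)/.test(value);
}

// In a browser:
// declaresLigatureFeature(window.getComputedStyle(node));
```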

Additionally, just because a font uses ligature icons doesn't mean that it will only be used for icons. Knowing if it's a ligature icon font would not be enough to guarantee all uses of it are icons.

Parsing the Font File

My next idea was seeing if I could parse the font file for information about which words were ligatures. I discovered the website Font Drop which could parse a font file and display all the information about it, including its ligatures. The website said it uses the opentype.js library to parse the font files, which I figured I could also use.

Unfortunately, I soon found out that the library does not support WOFF2 files, which is what Material Icon uses. There's been an open issue to support it, but so far no work has been done on it. In the same issue, users mentioned a fork of the library that supposedly supports WOFF2, but it doesn't seem to be actively developed. This meant that if we ran into issues with the library, we wouldn't be able to get them fixed without making a fork ourselves, which was less than ideal.

Width of the Word

I next thought I could determine when a ligature icon was used by checking the width of the word. I figured if the word "file" was replaced by a single character icon, then there would be a substantial difference in its rendered width compared to its word width (1 character vs. 4 characters). However, I couldn't just ask what the width of the textContent was since it would return the word "file" and not the ligature icon.

Whilst thinking about this, I remembered the canvas API measureText which can measure the width of a string based on a given font size and font family. I didn't know if it could handle ligature icons, but testing it out showed that it could.

I thought I could just add a single whitespace character to the middle of the word so the word would no longer render as an icon, and then compare that width to the width of the unchanged word. This sort of worked until I ran into two problems.
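A minimal sketch of that comparison, assuming a 2D canvas context is passed in. The function name measureWithSplit and the choice to split exactly at the midpoint are my own for illustration.

```javascript
// Sketch: measure a word as written and again with a space inserted in
// the middle, which should stop it from rendering as a ligature icon.
// `context` is a 2D canvas context (or anything with a compatible
// measureText method); `font` is a CSS font shorthand string.
function measureWithSplit(context, word, font) {
  context.font = font;
  const middle = Math.floor(word.length / 2);
  const split = `${word.slice(0, middle)} ${word.slice(middle)}`;
  return {
    whole: context.measureText(word).width,
    split: context.measureText(split).width
  };
}

// In a browser:
// const context = document.createElement('canvas').getContext('2d');
// measureWithSplit(context, 'file', '36px "LigatureSymbols"');
```

A whole-word width much smaller than the split width would hint at an icon, but the comparison is only as trustworthy as the width of that inserted space.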

The first problem was that the width of a whitespace character isn't always consistent, but can differ depending on where it's located or what letters surround it. For example, in the font Big Shoulders Text a whitespace character at 36px is measured to be 7.45px wide. However, it's 8.35px wide when between the letters "r" and "e". This meant that adding a whitespace character in the middle of a word could give different widths depending on which characters the whitespace separated.

The second problem occurred when I found a ligature icon that was only two characters long. The font Ligature Symbols has a ligature icon for the word "vk." When I took the width of the word at 36px it measured 30.76px wide, while the width of a whitespace character was 12.94px wide. So together I expected the final width of the word and whitespace to be about 43.7px wide. However, the actual width was 49.99px wide, a difference of about 6px.

Since a whitespace character can differ in width, I couldn't be sure if that 6px difference was due to the word being replaced by a ligature icon or because the whitespace character was longer between the "v" and "k."
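The arithmetic for the "vk" case, using only the measurements quoted above, works out like this:

```javascript
// Measurements quoted above for "vk" in Ligature Symbols at 36px
const wordWidth = 30.76;     // "vk" on its own
const spaceWidth = 12.94;    // a lone whitespace character
const measuredWidth = 49.99; // "v k" as actually measured

// what the split word should measure if the space had a fixed width
const expectedWidth = wordWidth + spaceWidth;

// the part that neither the word nor a fixed-width space accounts for
const unexplained = measuredWidth - expectedWidth;

console.log(expectedWidth.toFixed(2), unexplained.toFixed(2)); // 43.70 6.29
```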

I also tried to measure the width of each individual character and add them together to get the final expected width, but this also proved to have too much variance due to letter spacing differences. In another experiment, I tried adding a zero-width-space character to the word but it seems the browser ignores these when determining the ligatures and so it didn't change the width from the word without a zero-width-space.

Compare Image Data

My last idea to try, and the one that ended up working, was to compare the pixel data of the first character to the pixel data of the entire word. The idea was that if I created a canvas the exact size of the first character and drew that character, drawing the entire word on the same size canvas (so the remaining letters wouldn't be visible) should produce the same image of the first character. If the images were different, I would know that the word was replaced by a ligature icon.
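Stripped of the canvas work, the comparison boils down to counting pixels that are drawn in one image but not in the other. Here is a small sketch on a made-up 2x2 "image"; the function name changedFraction is hypothetical.

```javascript
// Sketch: fraction of pixels that are "on" in one image but "off" in
// the other. Both arguments are equal-length arrays holding one value
// per pixel, where 0 means nothing was drawn there.
function changedFraction(firstChar, wholeWord) {
  let changed = 0;
  for (let i = 0; i < firstChar.length; i++) {
    const wasOn = firstChar[i] !== 0;
    const isOn = wholeWord[i] !== 0;
    if (wasOn !== isOn) changed++;
  }
  return changed / firstChar.length;
}

// made-up 2x2 images: one of the four pixels flips from on to off
const firstCharPixels = Uint32Array.of(1, 1, 0, 0);
const wholeWordPixels = Uint32Array.of(1, 0, 0, 0);
console.log(changedFraction(firstCharPixels, wholeWordPixels)); // 0.25
```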

Before I started working on this solution, I already knew I was going to have to account for normal ligatures such as ae, fi, and ffi. That meant small differences in the image should be ignored. It also meant that the canvas would need to be large enough to absorb the few pixels that change because of those ligatures, but small enough to stay fast when calling getImageData and looping through the data (twice).

As I began thinking about this, it all started to sound like a statistics problem in that I needed a large enough canvas size (sample size) to detect a (statistically) significant change. So I thought why not break out the statistics! After all, my high school teacher would be proud knowing I finally got to use it after all these years.

Note: the goal wasn't to be statistically accurate, but to use statistics to help inform our decision about numbers. That way I wasn't just choosing magic numbers out of a hat but had some reasoning behind them.

Determining Sample Size

In statistics, there are a few ways to determine sample size depending on what information you already have or can find. For my particular case, I needed to choose a sample size without knowing anything about the data (its mean, standard deviation, etc.). Luckily, there's a formula for determining sample size based only on how confident you want to be (confidence level) and how much margin of error to account for.

X = Z**2 * p * (1 - p) / MOE**2

Where:

X = sample size
Z = z-score (determined from the confidence level)
p = sample proportion (the expected results)
MOE = margin of error

I wanted my results to be super accurate, so I chose a very high confidence level of 99.9% (a z-score of 3.291) and kept the standard margin of error of 5%. For the sample proportion I chose 50%, as that gives the largest sample size needed.

Using these numbers, the sample size needed was about 1083. Taking the square root of that gave me a canvas size of about 33px x 33px.
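If you want to plug in different numbers, the formula is easy to script. This is just a sketch of the calculation; the function name sampleSize is my own.

```javascript
// Sketch of the sample-size formula: X = Z^2 * p * (1 - p) / MOE^2
function sampleSize(z, p, moe) {
  return (z ** 2 * p * (1 - p)) / moe ** 2;
}

// the values chosen in the text: z-score for 99.9% confidence,
// 50% sample proportion, 5% margin of error
const pixels = sampleSize(3.291, 0.5, 0.05);

// a square canvas needs a side of sqrt(pixels), rounded up so the
// canvas holds at least that many whole pixels
const side = Math.ceil(Math.sqrt(pixels));

console.log(Math.round(pixels), side);
```

The rounded-up side length is the minimum first-character width that the scripts in this post scale the font up to.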

Determining Percent Difference

Now that I knew what canvas size to use, I needed to know what percent change was needed to detect an icon. I wrote a script to find every icon on the Ligature Symbols webpage and calculate the percent difference from the first character to the whole word.

const canvas = document.createElement('canvas');
const context = canvas.getContext('2d');
const fontFamily = 'LigatureSymbols';
const percents = [];

document.querySelectorAll('.lsf.symbol').forEach(node => {
  const word = node.textContent;
  if (!word) return;  // there was a symbol without text

  // start every icon from the same base size so that scaling applied
  // for one icon does not carry over to the next
  let fontSize = 36;
  let font = `${fontSize}px "${fontFamily}"`;

  context.font = font;
  let width = context.measureText(word.charAt(0)).width;

  // ensure the width of the first character is the required size
  // (36 pixels does not mean the font is drawn that wide)
  if (width < 33) {
    let ratio = 33 / width;
    width *= ratio;
    fontSize *= ratio;
    font = `${fontSize}px "${fontFamily}"`;
  }

  canvas.width = width;

  // the font size typically defines the height (including 
  // ascenders and descenders) of a font. this usually means we'll 
  // end up with a taller canvas than it is wide
  canvas.height = fontSize;

  // changing the size of the canvas resets all canvas properties,
  // including the font
  context.font = font;
  context.textAlign = 'left';
  context.textBaseline = 'top';

  // draw the first character and get the image data
  context.fillText(word.charAt(0), 0, 0);
  const firstCharData = new Uint32Array(
    context.getImageData(0, 0, width, fontSize).data.buffer
  );

  // draw the whole word and get the image data
  context.clearRect(0, 0, width, fontSize);
  context.fillText(word, 0, 0);
  const wholeWordData = new Uint32Array(
    context.getImageData(0, 0, width, fontSize).data.buffer
  );

  // we only want to know when a pixel has changed so can ignore 
  // the value at the pixel and only look to see if they were both
  // turned on or off
  const difference = firstCharData.reduce((diff, pixel, i) => {
    if (pixel === 0 && wholeWordData[i] === 0) return diff;
    if (pixel !== 0 && wholeWordData[i] !== 0) return diff;
    return ++diff;
  }, 0);

  const percent = (difference / firstCharData.length).toFixed(2);
  percents.push(percent);
});

console.log(percents.sort().join(', '));

The resulting data showed that the minimum difference was 17%. Running the same code for Symbolset Geomicons Squared resulted in a minimum difference of 26%. This informed me that all ligature icons could be caught by using a difference threshold of about 15%.

All that was left to do was ensure that a normal text ligature wouldn't be flagged as an icon. I ran the same test on a few Google Fonts that had text ligatures and they showed that the fi ligature is about a 5% difference, which was perfect.
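To make the gap between those numbers concrete, here is the threshold applied to the differences reported above. The names ICON_THRESHOLD and looksLikeIcon are hypothetical.

```javascript
// Sketch: classify a measured pixel difference against the ~15%
// threshold chosen from the test data.
const ICON_THRESHOLD = 0.15;

function looksLikeIcon(fraction, threshold = ICON_THRESHOLD) {
  return fraction > threshold;
}

// differences reported above
console.log(looksLikeIcon(0.17)); // true  (Ligature Symbols minimum)
console.log(looksLikeIcon(0.26)); // true  (Symbolset minimum)
console.log(looksLikeIcon(0.05)); // false (ordinary "fi" ligature)
```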

Caveats

Even though this approach works for the 3 sets of ligature icon fonts I tested, I would need to test it against a larger sample size to ensure my results are accurate. Unfortunately I couldn't find very many fonts that use ligature icons.

This approach also does not work for single character ligature icons as there would be nothing to compare the original font to before it changed.

Lastly, in testing Material Icon I found that the font doesn't have any character data for the standard alphabet. This resulted in differences as low as 4% (a blank canvas compared to a single line for "minimize"). I had to adjust the code to check whether the first character's image data was empty and, if it was, assume the font was a ligature icon font (thus skipping the comparison).

Final Code

Putting it all together, we can determine when a string of text has been replaced with a ligature icon using the following code:

(function() {
  const canvas = document.createElement('canvas');
  const context = canvas.getContext('2d');

  window.isLigatureIcon = function isLigatureIcon(node) {
    const word = node.textContent;
    if (!word) return false;

    let fontSize = 36;
    // the computed font-family is already a serialized list of families
    // (quoted where needed), so it must not be wrapped in quotes again
    let fontFamily = window.getComputedStyle(node).getPropertyValue('font-family');
    let font = `${fontSize}px ${fontFamily}`;

    context.font = font;
    let width = context.measureText(word.charAt(0)).width;

    // ensure the width of the first character is the required size
    // (36 pixels does not mean the font is drawn that wide)
    if (width < 33) {
      let ratio = 33 / width;
      width *= ratio;
      fontSize *= ratio;
      font = `${fontSize}px ${fontFamily}`;
    }

    canvas.width = width;

    // the font size typically defines the height (including 
    // ascenders and descenders) of a font. this usually means we'll 
    // end up with a taller canvas than it is wide
    canvas.height = fontSize;

    // changing the size of the canvas resets all canvas properties,
    // including the font
    context.font = font;
    context.textAlign = 'left';
    context.textBaseline = 'top';

    // draw the first character and get the image data
    context.fillText(word.charAt(0), 0, 0);
    const firstCharData = new Uint32Array(
      context.getImageData(0, 0, width, fontSize).data.buffer
    );

    // check to make sure there was data for the first character, 
    // if not it's assumed to be a ligature icon font
    if (!firstCharData.some(pixel => !!pixel)) return true;

    // draw the whole word and get the image data
    context.clearRect(0, 0, width, fontSize);
    context.fillText(word, 0, 0);
    const wholeWordData = new Uint32Array(
      context.getImageData(0, 0, width, fontSize).data.buffer
    );

    // we only want to know when a pixel has changed so can ignore 
    // the value at the pixel and only look to see if they were both
    // turned on or off
    const difference = firstCharData.reduce((diff, pixel, i) => {
      if (pixel === 0 && wholeWordData[i] === 0) return diff;
      if (pixel !== 0 && wholeWordData[i] !== 0) return diff;
      return ++diff;
    }, 0);

    // compare the raw ratio as a number rather than a formatted string
    return difference / firstCharData.length > 0.15;
  };
})();
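As a rough usage sketch, the detector can be run over a set of elements to collect the ones that render as icons. The helper findLigatureIcons is hypothetical, and it assumes the icon font has finished loading before it is called, since an unloaded font would be measured with a fallback.

```javascript
// Hypothetical helper: keep only the elements whose text the detector
// reports as a ligature icon. `isIcon` is the detector to use, such as
// window.isLigatureIcon from the code above.
function findLigatureIcons(elements, isIcon) {
  return Array.from(elements).filter(el => {
    // skip elements with no visible text; there is nothing to replace
    const text = (el.textContent || '').trim();
    return text.length > 0 && isIcon(el);
  });
}

// In a browser, once the icon font has loaded:
// findLigatureIcons(document.querySelectorAll('i, span'), window.isLigatureIcon);
```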