Implementing Webmention: Endpoint Discovery
Tags: js protocols webmention post
I don’t think it’s an exaggeration to say that protocols are at the foundation of modern civilization. From banking and the stock market to traffic lights and radio, it’s become nearly impossible to escape technology’s reach, and underlying it all are these beautiful protocols. They may work without you knowing, but they shouldn’t be a secret. Some protocols are built by communities, others by companies or labs. What binds them all together is that they help us communicate. Although protocols run on computers, in many different ways and on many different systems, they exist to make our lives easier and better. They help shorten the distance between us, from me to you.
As I began one of my regular dives into the web, I came across a protocol that I found really interesting: Webmention. This protocol runs on top of HTTP and was built by the IndieWeb community. It lets different sites network with each other and share data like comments, likes, and so on. Published as a W3C Recommendation, Webmention felt accessible, useful, fun, and something that even I could implement. With that I was hooked, and I started reading as much as I could about it and the community behind it. Using that momentum, I started writing my own implementation, because I realized I would learn even more and build something that I could actually use on this site!
The ways people interact through webmentions are what make it special. Hopefully they will provide some inspiration to keep reading this post and to learn more. Webmention continues to evolve day to day, and it’s amazing to see how different people come up with different ways to use, display, and play with webmentions: book clubs, news aggregators, RSVPs to events, and so much more. Ingenuity and creativity are truly on display when you see what others have done and continue to do with this idea!
With that to motivate us, let’s take a deep dive into the “behind-the-scenes” of how Webmention works. What happens is as follows:
- You pick the page you want to send a webmention to.
- You figure out what its “webmention endpoint” is. This is where webmentions for that page should be sent.
- You send the webmention to that endpoint URL.
- Presto, the webmention should appear on the target site!
This seems straightforward, and in a way it is, but each step holds a lot of complexity.
First things first: in order to send a webmention, we need to know where to send it. This isn’t easy, since the protocol allows several different ways to advertise where this URL can be found. The webmention.rocks site has great tests you can use to try out your implementation, and it helped me a lot. There are 23 tests just for discovering the endpoint, and we’re going to tackle each one.
This is definitely going to be a lot easier to understand if you’ve wrapped your head around HTTP, its headers, and the basics of how it works. I highly recommend reading MDN’s overview of HTTP before you tackle the rest of this post, so it’s fresh in your mind.
The very first test requires you to find the HTTP “Link” header and use it to figure out the endpoint. When you go to any URL, your browser asks the server for data. The server sends that data back alongside different headers, which describe the data you are receiving and provide any other additional information. An HTTP response looks something like this:
HTTP/1.1 200 OK
Date: Mon, 14 Oct 2019 13:00:00 GMT
Server: Express
Last-Modified: Wed, 02 Dec 2010 12:00:00 GMT
ETag: "000000-000000-000000"
Accept-Ranges: bytes
Content-Length: 10000
Content-Type: text/html
Link: </test/1/webmention>; rel=webmention
The Link header isn’t special, and it certainly isn’t unique to the Webmention protocol. All it’s saying is “Hey there! This thing you’ve requested is related to this other URL.” To interact with these websites and read their headers I’ll be using JavaScript, which lets you run some of these first examples in a browser; the rest all run on Node.js.
Let’s kick things off with a small example of how we’d fetch the website. In JavaScript, the fetch browser API lets you do this. If you’re working in Node, then node-fetch works perfectly for our purposes too (Node.js 18 and later also ship fetch built in).
fetch("https://webmention.rocks/test/1")
.then(function(response){
console.log(response)
})
.catch(function(error){
console.log(error)
})
You can run this code in your browser and it should work! If there are no errors it’ll print the response; otherwise it’ll let you know what went wrong. To get the Link header itself, we can use the response’s headers property, a Headers object with a handy get function. Let’s use it to find this header!
fetch("https://webmention.rocks/test/1")
.then(function(response){
console.log(response.headers.get("Link"))
})
.catch(function(error){
console.log(error)
})
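Two details of headers.get matter for endpoint discovery: header names are matched case-insensitively, and when a response carries several Link headers, get returns them joined into one comma-separated string. The sketch below builds a Headers object by hand (the class is built into browsers and into Node.js 18 and later) so it runs offline; the header values are made up for illustration.

```javascript
// Build the headers by hand instead of fetching, so the example runs offline.
const headers = new Headers()
// Two separate Link headers, as a server might send them (values are illustrative)
headers.append("Link", '<https://example.com/other>; rel="other"')
headers.append("LinK", '</webmention>; rel="webmention"')

// The lookup ignores case, and repeated headers are combined with ", "
const combined = headers.get("link")
console.log(combined)
// <https://example.com/other>; rel="other", </webmention>; rel="webmention"
```

This is why splitting the header value on commas, as we do next, handles both several Link headers and a single header listing several links.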
Great, we now have the Link header, but we need to get the URL part out of it. The value returned by response.headers.get("Link") looks like this:
</test/1/webmention>; rel=webmention
We only want the information between the two angle brackets. Something like this:
const linkURLFromHeaderWithBrackets = res.headers.get('Link').split(';')[0]
// removes arrow brackets
const linkURLFromHeader = linkURLFromHeaderWithBrackets.substring(1, linkURLFromHeaderWithBrackets.length -1)
would work to get our URL out of it. Yet it’s not foolproof. What if, for example, more than one Webmention endpoint was returned? That’s fully permitted by the protocol specification (and in fact test #18 checks your ability to handle this). To deal with any number of links, you’ll need to split the header by commas, giving one piece per link in the same form we saw above. Next, we split each piece by semicolons to separate the URL from its parameters, such as the “rel=webmention” part. Now what exactly does that part of the Link header mean? It’s telling us that this URL is related to the Webmention protocol, so we want to filter out any URLs that aren’t. That all will look like this:
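// split all link headers by comma
const linkURLFromHeaderUncleaned = res.headers.get('Link').split(",")
  // then split the site from its parameters (rel=... and so on)
  .map(linkHeader => linkHeader.split(";"))
  // then keep only entries whose rel parameter contains the exact word "webmention"
  // (rel may be quoted and may hold several space-separated values)
  .filter(linkHeader => linkHeader.slice(1).some(param => {
    const [name, value = ""] = param.split("=")
    return name.trim().toLowerCase() == "rel" &&
      value.trim().replace(/"/g, "").toLowerCase().split(/\s+/).includes("webmention")
  }))
// take the first matching URL (still wrapped in <...>), if there is one
const cleanedLinkURL = linkURLFromHeaderUncleaned.length > 0 ? linkURLFromHeaderUncleaned[0][0].trim() : undefined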
In the last line of the snippet above we take the URLs related to the Webmention protocol and only interact with the first one. The protocol allows this: if more than one endpoint is advertised, you are supposed to use the first. We trim the spacing from both sides of the URL and there we have it! We would now have a URL to send Webmentions to… if the URL weren’t relative. What does that mean? Well, let’s say we’re on the website example.com and we want to refer to its about page. We could refer to the URL as https://example.com/about or just talk about the /about page, since we both know I’m talking about example.com! The Webmention standard allows the Link header to return either an absolute URL (like https://example.com/about) or a relative URL (like just the /about part).
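The split-on-commas approach is enough for the basic tests, but the rel parameter can be quoted, can hold several space-separated values (rel="webmention other"), and should match webmention as a whole word rather than as a substring (rel="not-webmention" should not count). Below is a hedged sketch of a slightly stricter parser. The names parseLinkHeader and firstWebmentionLink are hypothetical, and the sketch still assumes the URLs themselves contain no commas or semicolons.

```javascript
// Hypothetical helper: turn a Link header value into [{ url, rels }] entries.
// Assumes the URLs inside <...> contain no "," or ";" characters.
function parseLinkHeader(headerValue) {
  return headerValue.split(",").map(part => {
    const [target, ...params] = part.split(";")
    // strip the surrounding angle brackets from the URL
    const url = target.trim().replace(/^<|>$/g, "")
    // use the first rel parameter; its name is case-insensitive
    const relParam = params.find(p => /^\s*rel\s*=/i.test(p))
    const rels = relParam
      ? relParam.replace(/^\s*rel\s*=\s*/i, "").trim()
          .replace(/^"|"$/g, "") // rel values may be quoted
          .toLowerCase()
          .split(/\s+/)          // and may hold several space-separated tokens
          .filter(Boolean)
      : []
    return { url, rels }
  })
}

// URL of the first entry whose rel list contains the exact token "webmention", or undefined
function firstWebmentionLink(headerValue) {
  const match = parseLinkHeader(headerValue).find(link => link.rels.includes("webmention"))
  return match ? match.url : undefined
}

console.log(firstWebmentionLink(
  '<https://example.com/a>; rel="not-webmention", </test/10/webmention>; rel="webmention other"'
)) // /test/10/webmention
```

The result is still possibly relative, so it needs the same absolute-URL treatment described next.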
Since Webmention endpoints can be referenced as relative or absolute URLs, we need a function that recognizes absolute URLs and returns them unchanged, and turns relative ones into absolute references. Without an absolute reference, the request would be resolved against the wrong site (in the browser, relative to our own page) or fail outright. Below I’ve written the function I’ve used to “relativize” the URL, but feel free to write your own or use a library to do so.
function relativizeURL(url, host, finalURL){
  var beginsWithHttp = null
  try {
    // check if it's a correctly formatted url with the right protocol
    const urlProtocol = new URL(url).protocol
    beginsWithHttp = urlProtocol == "https:" || urlProtocol == "http:"
  } catch(err){
    // if not correctly formatted as a URL (must be relative)
    beginsWithHttp = false
  }
  // if it's an absolute URL it can be returned unchanged
  if(beginsWithHttp){
    return url
  }
  // if it starts with a slash it's relative to the host, e.g. "/test/1/webmention"
  if(url.startsWith("/")){
    return host + url
  }
  // otherwise it's relative to the current path (e.g. "webmention" or "22/webmention"):
  // drop the last segment of the final, post-redirect page URL and append the relative part
  return finalURL.substring(0, finalURL.lastIndexOf("/")) + "/" + url
}
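If you’d rather lean on a library, the WHATWG URL class that browsers and Node.js already provide can do this resolution for you: passing the page URL as the second argument to new URL resolves a relative reference against it and leaves absolute URLs untouched. A minimal sketch, assuming the base is the final (post-redirect) URL of the page; resolveEndpoint is a hypothetical name.

```javascript
// Hypothetical helper: resolve a discovered endpoint against the page's final URL.
// new URL(input, base) handles absolute URLs, host-relative paths ("/x"),
// path-relative references ("x/y"), query strings, and the empty string.
function resolveEndpoint(endpoint, pageURL) {
  return new URL(endpoint, pageURL).href
}

const page = "https://webmention.rocks/test/22"
console.log(resolveEndpoint("/test/1/webmention", page))           // https://webmention.rocks/test/1/webmention
console.log(resolveEndpoint("22/webmention", page))                // https://webmention.rocks/test/22/webmention
console.log(resolveEndpoint("https://example.com/endpoint", page)) // https://example.com/endpoint
console.log(resolveEndpoint("", page))                             // https://webmention.rocks/test/22
```

Note that the empty string resolves to the page itself, which is exactly what the spec asks for when an endpoint’s href is empty.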
Now we just need to make sure our code calls that function and we’ll be all set to pass the first test! Here’s what our program should look like up to this point:
fetch("https://webmention.rocks/test/1")
.then(function(res){
  // the URL we actually ended up at, after any redirects
  const finalURL = res.url
  // scheme + domain of that URL, e.g. "https://webmention.rocks"
  const host = new URL(finalURL).origin
  // split all link headers by comma
  const linkURLFromHeaderUncleaned = res.headers.get('Link').split(",")
    // then split the site from its parameters
    .map(linkHeader => linkHeader.split(";"))
    // then keep only entries whose rel contains the exact word "webmention"
    .filter(linkHeader => linkHeader.slice(1).some(param => {
      const [name, value = ""] = param.split("=")
      return name.trim().toLowerCase() == "rel" &&
        value.trim().replace(/"/g, "").toLowerCase().split(/\s+/).includes("webmention")
    }))
  const cleanedLinkURL = linkURLFromHeaderUncleaned[0][0].trim()
  // remove the angle brackets from the URL and give it to the relativizeURL function
  var linkURL = relativizeURL(cleanedLinkURL.substring(1, cleanedLinkURL.length - 1), host, finalURL)
  console.log(linkURL)
})
.catch(function(error){
  console.log(error)
})
// include the full relativizeURL(url, host, finalURL) function from above here
The only things that differ from what we’ve just done are the host and finalURL variables, which the “relativize” function we wrote above needs in order to work. The host is simply the scheme and domain of the site we want to send a Webmention to, without any path or trailing slash (for example https://webmention.rocks). That explains the first variable, but why use a special finalURL variable instead of the URL we’re fetching (i.e. “https://webmention.rocks/test/1”)? Because URLs aren’t necessarily static: a URL may redirect to a different part of a site or to a different site entirely! If the URL we’re requesting ends up being redirected, we need to know the new, correct URL, and res.url gives us exactly that.
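To see why the redirect matters, consider a path-relative endpoint such as webmention. The same string points to a different place depending on which URL it’s resolved against, so resolving it against the URL we originally requested instead of res.url could send the webmention to the wrong endpoint. The URLs below are invented for illustration.

```javascript
// Hypothetical URLs: the page we asked for, and where the server redirected us
const requestedURL = "https://example.com/old/post"
const finalURL = "https://example.com/2019/10/post" // what res.url would report after the redirect

// a path-relative endpoint found on the final page
const endpoint = "webmention"

// resolving against the pre-redirect URL gives a different (wrong) endpoint
const wrong = new URL(endpoint, requestedURL).href // https://example.com/old/webmention
const right = new URL(endpoint, finalURL).href     // https://example.com/2019/10/webmention
console.log({ wrong, right })
```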
With that we’ve just completed the first test! I know it seems like a lot, and it was, but now we can pass the first and second tests on webmention.rocks! Congrats! Only 21 more tests to go!
If you try to run the third test, you’ll likely receive an error that says TypeError: “res.headers.get(…) is null”. This is because we can’t just look at the Link headers for the Webmention endpoint! In some cases, there will be no Link headers! Instead we’ll need to look inside of the main data the HTTP request is sending us, the actual website!
For this next section, we can’t rely on the web browser alone. You’re going to need Node.js installed if you’d like to follow along, as well as the Cheerio dependency, which can be installed with the command npm install cheerio --save.
You see, the Webmention protocol lets a site advertise its webmention endpoint either in the HTTP Link header or as a <link> or <a> tag within an HTML document (i.e. a web page). Just like with the HTTP Link header, the URLs mentioned may be relative or absolute, and the function we wrote earlier still works for this. The challenge is finding the right <link> and <a> tags within the HTML document. The Cheerio library lets us use CSS-style selectors to find exactly the elements we’re looking for. Now that we have some background, let’s get to it!
First things first, we need to wrap our old code in an if statement. That way there are no more errors when there is no HTTP Link header. We’ll also add a .then call to our fetch request so we can deal with our HTML checking code. I’ve also moved the linkURL variable outside of the if statement, since knowing whether or not it’s undefined gives us a clue for the next part.
fetch("https://webmention.rocks/test/1")
.then(function(res){
  const finalURL = res.url
  const host = new URL(finalURL).origin
  // moved above the if statement so it's available (possibly undefined) afterwards
  var linkURL;
  // new code: only parse the Link header if there is one
  if(res.headers.get('Link')){
    // split all link headers by comma
    const linkURLFromHeaderUncleaned = res.headers.get('Link').split(",")
      // then split the site from its parameters
      .map(linkHeader => linkHeader.split(";"))
      // then keep only entries whose rel contains the exact word "webmention"
      .filter(linkHeader => linkHeader.slice(1).some(param => {
        const [name, value = ""] = param.split("=")
        return name.trim().toLowerCase() == "rel" &&
          value.trim().replace(/"/g, "").toLowerCase().split(/\s+/).includes("webmention")
      }))
    // a Link header may exist without advertising a webmention endpoint
    if(linkURLFromHeaderUncleaned.length > 0){
      const cleanedLinkURL = linkURLFromHeaderUncleaned[0][0].trim()
      // remove the angle brackets from the URL and give it to the relativizeURL function
      linkURL = relativizeURL(cleanedLinkURL.substring(1, cleanedLinkURL.length - 1), host, finalURL)
    }
  }
  // new code: pass the body, whether it's HTML, and what we've found so far to the next step
  // (the Content-Type header may be missing, so default to an empty string)
  const contentType = res.headers.get('content-type') || ''
  return Promise.all([res.text(), contentType.includes('text/html'), linkURL, finalURL])
})
.then(function([body, isHTML, linkURL, finalURL]){
  // all HTML checking code will go here
})
.catch(function(error){
  console.log(error)
})
// include the full relativizeURL(url, host, finalURL) function from above here
The only line that I haven’t explained yet is the return Promise.all([...]) statement, along with the arguments to the new .then function. Essentially, res.text() returns a Promise that resolves to the HTML document we requested. The issue is that if we pass along only the HTML document, the next step won’t know whether a Webmention endpoint has already been found, and if one has been found, we shouldn’t even bother checking the HTML document. We also need to pass the linkURL and finalURL variables to the next function. Promise.all handles this neatly: it accepts a mix of promises and plain values, waits for the promises to resolve, and resolves to an array of all the values in their original order. That array is handed to the function in the next .then() call, where “array destructuring” unpacks each value into its own named argument.
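Here is a self-contained sketch of that pattern, with Promise.resolve standing in for res.text() and made-up values for the rest. It shows that plain values pass through Promise.all unchanged and arrive in the same order the array listed them.

```javascript
// Promise.resolve stands in for res.text(); the other entries are plain values
const bodyPromise = Promise.resolve("<html><body>Hello</body></html>")
const isHTML = true
const linkURL = undefined // no endpoint found in the Link header
const finalURL = "https://webmention.rocks/test/3"

const result = Promise.all([bodyPromise, isHTML, linkURL, finalURL])
  .then(function([body, isHTML, linkURL, finalURL]){
    // array destructuring unpacks the resolved values in the same order
    return { body, isHTML, linkURL, finalURL }
  })

result.then(value => console.log(value))
```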
With that, we can now actually get started finding those pesky Webmention endpoints in the HTML document. If a link URL was already found in the HTTP headers, the specification says that you should ignore any endpoints in the HTML document. We can handle this by checking that linkURL is undefined (which, when negated with the ! operator, becomes true). We should also only check the HTML document if there is one, which is what the content-type check inside the Promise.all call is for: it evaluates to true when the response’s Content-Type header says text/html and false when it doesn’t.
if(!linkURL && isHTML){
} else {
return linkURL
}
Now we’re ready to use the Cheerio library! Make sure you’re importing it into the file (which you can do with the line const cheerio = require("cheerio")) before we continue. We need to load the HTML document into the Cheerio library, and then we can use its selectors to find our link/a tags.
if(!linkURL && isHTML){
const $ = cheerio.load(body)
const linkElement = $('link[rel~="webmention"]')
const aElement = $('a[rel~="webmention"]')
} else {
return linkURL
}
After loading the HTML document into Cheerio, we can select anything! Every Webmention endpoint tag has “webmention” as one of the values of its rel attribute, so it’ll look something like this:
<link rel="webmention" href="/test/3/webmention">
When selecting an HTML element, first we write the tag name (i.e. link or a) and then we select an attribute by putting it in square brackets. The “~=” operator matches elements whose attribute value is a space-separated list containing that exact word, so rel="webmention" and rel="webmention me" both match, while rel="not-webmention" does not. With that we’ve got our elements! Now it’s a matter of pulling out the URLs. Here is the code that allows us to do this properly.
if(!linkURL && isHTML){
  // relativizeURL needs the host, so derive it from the final URL
  const host = new URL(finalURL).origin
  const $ = cheerio.load(body)
  const linkElement = $('link[rel~="webmention"]')
  const aElement = $('a[rel~="webmention"]')
  const linkElementURL = linkElement.attr('href')
  const aLinkElementURL = aElement.attr('href')
  // an empty href means the page itself is the endpoint
  if(linkElementURL == "" || aLinkElementURL == ""){
    return finalURL
  }
  // .index() is the position among an element's siblings, so this ordering
  // check is only meaningful when both tags share a parent
  if(linkElement.index() < aElement.index() && linkElementURL){
    // link is before a
    return relativizeURL(linkElementURL, host, finalURL)
  }
  if(aElement.index() < linkElement.index() && aLinkElementURL){
    // a is before link
    return relativizeURL(aLinkElementURL, host, finalURL)
  }
  // only one of the two tags has a usable href
  if(!linkElementURL && aLinkElementURL){
    return relativizeURL(aLinkElementURL, host, finalURL)
  }
  if(!aLinkElementURL && linkElementURL){
    return relativizeURL(linkElementURL, host, finalURL)
  }
} else {
  return linkURL
}
To find the URL, we must get the “href” attribute from the element. We do this with the line const linkElementURL = linkElement.attr('href'). Next, we check whether the URL for either one is an empty string; if so, the page itself is the Webmention endpoint (as shown in test 15). The specification also says that if there are both “a” and “link” tags, we should only care about whichever comes first. The .index() method tells us an element’s position among its siblings, so comparing the two works when they share a parent. If the element that comes first also has a URL (which means it exists), its link is chosen. If there is just one “link” OR “a” tag with a URL, that one is chosen (which is what the bottom two if statements handle).
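Because .index() reports an element’s position among its own siblings, comparing a <link> in the <head> with an <a> in the <body> compares unrelated positions. A more direct way to follow the “first one in the document wins” rule is to walk every candidate in document order and take the first usable one. The sketch below keeps HTML parsing out of the picture: it assumes the candidate elements have already been collected, in document order, as plain { tag, rel, href } objects. With Cheerio, a combined selector such as $('link[rel~="webmention"], a[rel~="webmention"]') should return matches in document order, but treat that as an assumption to verify. pickHTMLEndpoint is a hypothetical helper, not part of any library.

```javascript
// Hypothetical helper. `candidates` is assumed to list every <link> and <a> element
// in document order as plain objects, e.g. { tag: "link", rel: "webmention", href: "/x" },
// with href left undefined when the element has no href attribute.
function pickHTMLEndpoint(candidates, pageURL) {
  const endpoint = candidates.find(el =>
    (el.tag === "link" || el.tag === "a") &&
    // whole-word match on rel, like the CSS [rel~="webmention"] selector
    (el.rel || "").split(/\s+/).includes("webmention") &&
    // elements without an href attribute are skipped
    el.href !== undefined
  )
  if (!endpoint) return undefined
  // resolve relative hrefs; an empty href resolves to the page itself
  return new URL(endpoint.href, pageURL).href
}

const page = "https://example.com/post"
console.log(pickHTMLEndpoint([
  { tag: "link", rel: "webmention" },                // no href: skipped
  { tag: "a", rel: "not-webmention", href: "/bad" }, // wrong rel: skipped
  { tag: "a", rel: "webmention me", href: "/webmention" }
], page)) // https://example.com/webmention
```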
Our final version should look something like this:
const cheerio = require("cheerio")
// Node.js 18+ has fetch built in; older versions can fall back to node-fetch v2
// (node-fetch v3 is ESM-only and cannot be loaded with require)
const fetch = globalThis.fetch || require('node-fetch')

fetch("https://webmention.rocks/test/1")
.then(function(res){
  // the URL we ended up at after any redirects, and its scheme + domain
  const finalURL = res.url
  const host = new URL(finalURL).origin
  var linkURL;
  if(res.headers.get('Link')){
    // split all link headers by comma
    const linkURLFromHeaderUncleaned = res.headers.get('Link').split(",")
      // then split the site from its parameters
      .map(linkHeader => linkHeader.split(";"))
      // then keep only entries whose rel contains the exact word "webmention"
      .filter(linkHeader => linkHeader.slice(1).some(param => {
        const [name, value = ""] = param.split("=")
        return name.trim().toLowerCase() == "rel" &&
          value.trim().replace(/"/g, "").toLowerCase().split(/\s+/).includes("webmention")
      }))
    // a Link header may exist without advertising a webmention endpoint
    if(linkURLFromHeaderUncleaned.length > 0){
      const cleanedLinkURL = linkURLFromHeaderUncleaned[0][0].trim()
      // remove the angle brackets from the URL and give it to the relativizeURL function
      linkURL = relativizeURL(cleanedLinkURL.substring(1, cleanedLinkURL.length - 1), host, finalURL)
    }
  }
  // the Content-Type header may be missing, so default to an empty string
  const contentType = res.headers.get('content-type') || ''
  return Promise.all([res.text(), contentType.includes('text/html'), linkURL, finalURL])
})
.then(function([body, isHTML, linkURL, finalURL]){
  // an endpoint from the Link header takes priority over anything in the HTML
  if(!linkURL && isHTML){
    const host = new URL(finalURL).origin
    const $ = cheerio.load(body)
    const linkElement = $('link[rel~="webmention"]')
    const aElement = $('a[rel~="webmention"]')
    const linkElementURL = linkElement.attr('href')
    const aLinkElementURL = aElement.attr('href')
    // an empty href means the page itself is the endpoint
    if(linkElementURL == "" || aLinkElementURL == ""){
      return finalURL
    }
    // .index() is the position among an element's siblings, so this ordering
    // check is only meaningful when both tags share a parent
    if(linkElement.index() < aElement.index() && linkElementURL){
      // link is before a
      return relativizeURL(linkElementURL, host, finalURL)
    }
    if(aElement.index() < linkElement.index() && aLinkElementURL){
      // a is before link
      return relativizeURL(aLinkElementURL, host, finalURL)
    }
    // only one of the two tags has a usable href
    if(!linkElementURL && aLinkElementURL){
      return relativizeURL(aLinkElementURL, host, finalURL)
    }
    if(!aLinkElementURL && linkElementURL){
      return relativizeURL(linkElementURL, host, finalURL)
    }
  } else {
    return linkURL
  }
})
.then(function(link){
  console.log(link)
})
.catch(function(error){
  console.log(error)
})

function relativizeURL(url, host, finalURL){
  var beginsWithHttp = null
  try {
    // check if it's a correctly formatted url with the right protocol
    const urlProtocol = new URL(url).protocol
    beginsWithHttp = urlProtocol == "https:" || urlProtocol == "http:"
  } catch(err){
    // if not correctly formatted as a URL (must be relative)
    beginsWithHttp = false
  }
  // if it's an absolute URL it can be returned unchanged
  if(beginsWithHttp){
    return url
  }
  // if it starts with a slash it's relative to the host, e.g. "/test/1/webmention"
  if(url.startsWith("/")){
    return host + url
  }
  // otherwise it's relative to the current path: drop the last segment of the
  // final, post-redirect page URL and append the relative part
  return finalURL.substring(0, finalURL.lastIndexOf("/")) + "/" + url
}
Conclusion
That final step completes our journey into the world of Webmention endpoint discovery! Congrats! The code we’ve written should pass every endpoint discovery test on webmention.rocks and be nearly fully compliant with the specification! This isn’t the end of our Webmention journey; in fact, it’s just the start! It’s great to know which Webmention endpoint to send Webmentions to, but HOW do we actually send a webmention? That question will be answered soon! I hope you’ve enjoyed yourself and feel confident enough to write your own implementation. Good luck!