Lorenzo Miniero, Creator of JANUS


Lorenzo Miniero is the chairman and co-founder of Meetecho. Lorenzo received his degree and Ph.D. from the Computer Science Department of the University of Napoli Federico II. He is also an active contributor to the Internet Engineering Task Force (IETF) standardization activities. He is best known as the author of the Janus WebRTC Server.


Hi Lorenzo. Nice to see you.

Hi Dan, nice to see you again.

I was thinking about the first time we met, it was a number of years ago, at IETF meetings. It’s when you guys were first getting started actually, with Meetecho.

Maybe you could just talk a little bit about that. What is your background? People know you as an expert now because of the work that you’ve done. But they may not realize how you got started, and that a lot of this has happened pretty quickly in the last few years.

Well, yeah, it did. The IETF actually was an excellent playground at the time, especially as I was trying to grow as an implementer, as a researcher, and things like that.

It may sound weird, but I have actually been in the IETF for quite a long time, if I think about it. It’s been more than 10 years now, and it all started with my master’s thesis, actually. When I wanted to start my thesis, at the time I didn’t even know much about VoIP and so on. I was familiar with it, but I had never actually tinkered with it much.

I had done an exam on network computing, but that was pretty much it. And then, when it came to doing my master’s thesis, I started checking what I might do, and eventually I ended up in the network computing department. This is how I met Simon Romano and another professor, Alfonso Buono, who basically showed up, magically talking about this idea of a thesis working on VoIP over Asterisk, and things that could be done. It really blew my mind. And this is how it all started. My thesis actually involved the IETF from the very beginning. The idea was to start interacting with the IETF community, especially in the XCON working group.

So the XCON working group was quite big at the time. It was on centralized conferencing, and we wanted to play around a bit with the new protocols that they were starting to work on, playing with star-based topologies, and my thesis was exactly on that: basically implementing BFCP, the Binary Floor Control Protocol, and trying to integrate it into existing frameworks and see how it worked.

And, in fact, my first IETF ever was shortly after I got my master’s degree. So as soon as I got my degree, I flew to San Diego for my first IETF ever. We did the demo there; it was all new to me, but we got a very good response.

Since then, I’ve basically attended almost all the IETF meetings, I think. Always with the same idea and spirit in mind. So, trying to participate in the IETF, trying to prototype the new protocols, in the true spirit of the IETF, which is rough consensus and running code. We were really interested in the running code part.

Right, you wanted to build.

Exactly. This is how it all started, and since then I never stopped.

I was also very active in the MEDIACTRL working group, which was another effort that I was really interested in. I think this may be one of the first times that I actually interacted with you specifically, because you were involved in VoiceXML, or something like that, at the time. So it was partly related to what MEDIACTRL was trying to do.

And that period really made me grow, as an IETF contributor, as an implementer, as a researcher, everything, because I kept working at the university after my thesis, as a junior researcher trying to do research, trying to build new things, and so on. Which, eventually, is also what led to Meetecho being born, and to what I am today.

Yeah, that’s right. Wow, I had forgotten it was that long ago. That was back for me in the MRCP days, which was part of the work with VoiceXML.

We were trying to standardize the same things, but using SIP as a protocol, along with a control channel that we could use to orchestrate whatever needed to happen on those channels.

That makes a lot of sense, and yes, WebRTC is kind of the spiritual successor to SIP, really. There’s a lot that we learned from SIP, right?

Yeah, and I did take a lot of lessons from there, actually, because the MEDIACTRL specification is something that I really loved at the time. I loved working on it, we prototyped it, and a lot of the things that I learned during that process I ended up putting in Janus itself.

It may come up during the [interview] anyway, but this is something that really stuck with me, and a lot of the things that ended up in Janus – the modular architecture, the way it lets you have different plugins using their own protocols to interact with the application, and so on – actually derive from that work.

And, really, everything that I did at the IETF helped me grow.

Cool, I’m actually glad that you mentioned Janus, because that’s one of the main things that I wanted to talk with you about today.

I’ve done a number of interviews with people, and I think this is the first time that I’m going to ask if there’s a way you could explain Janus and pretend that I’m a complete idiot, because Janus does some really cool things, and the way it’s structured is nice in certain ways. But I think it might be helpful to start off trying to explain it in as brain-dead a way as possible, before we get into the normal explanations you would give to someone who’s been working in this area for a while.

Maybe the easiest way to explain it is to start from why we actually started working on it, the idea we had at the time.

We had been providing a conferencing service for a long time – we had a web conferencing platform, mostly born from the efforts that I was describing before: the university effort, all the prototyping that we were doing within the IETF, and so on.

This all ended up contributing to a whole conferencing application that we were implementing. As soon as WebRTC arrived, we tried to take advantage of this and make sure that we could also use WebRTC to access exactly the same features as you could build via SIP, for instance.

This was not always an easy process, because of course we were using some existing open-source applications for this. Some were already WebRTC compliant, some were, let’s say, on the right track but not exactly compliant, and so on. We worked a lot to make those work and to put everything together, basically.

Eventually we ended up in a place where there were things that we wanted to do with WebRTC that we couldn’t do without heavy tinkering with those platforms. We were basically too constrained by applications that were too vertical for our own needs.

About four years ago, after the summer, I started tinkering a bit with the WebRTC specification myself, studying the code much more than I had before – studying exactly how ICE worked, how DTLS worked, those things – and trying to build a small WebRTC stack, bit by bit, brick after brick: trying to get ICE to work, then DTLS and SRTP, and then checking what happened if I started sending my own RTP packets, and what the browser did with them, and so on.

It seemed to work much better than I expected, actually, and this is when we realized that as soon as you have this kind of functionality available, you can pretty much do anything you want with it. As soon as you have a WebRTC stack created, there’s no limit to what you can do with the packets you receive, and with the packets you may want to send.

This is how we came up with the idea of Janus itself: a component that implements all the WebRTC functionality in the core and takes care of creating the Peer Connection for you. And then, what you want to do with the actual packets – whatever the browser sends you, or whatever you may want to send back to the browser – is up to customizable logic that sits in different plugins. So, you may have a module, which you might see as an application, that sits within Janus. With this module, I may decide to just send you the media back exactly as it is, and this would act as an echo test mode, an echo test plugin.

You may have another plugin that acts as a SIP gateway, another application that implements an SFU, and whatever else. And basically the idea was to have something that was as modular as possible, so that if you came up with additional and different ways of handling the media that did not exist previously, you could just implement your own stuff, and implement it in a new plugin. And this is where the MEDIACTRL specification really did help me out a lot, because this idea was the basis of MEDIACTRL itself.
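
To make that split concrete, here is a toy sketch in C of the core-plus-plugin idea: the core owns the WebRTC stack (ICE, DTLS, SRTP) and hands decrypted RTP to a plugin, and an echo-test plugin simply relays everything back. This is an illustrative sketch only; the names (`plugin`, `incoming_rtp`, `core_deliver`) are invented for the example and are not the actual Janus plugin API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of the architecture described above: the core handles the
   WebRTC stack and hands plain RTP to a pluggable application module.
   All names here are invented for illustration, not the real Janus API. */

typedef struct rtp_packet {
    const unsigned char *buf;
    size_t len;
} rtp_packet;

typedef struct plugin {
    const char *name;
    /* called by the core for every decrypted RTP packet; the plugin
       decides what (if anything) to relay back to the browser */
    void (*incoming_rtp)(struct plugin *self, const rtp_packet *pkt,
                         void (*relay)(const rtp_packet *));
} plugin;

/* stand-in for the core's "send back to the browser" path */
rtp_packet last_sent;
static void relay_to_browser(const rtp_packet *pkt) { last_sent = *pkt; }

/* an echo-test plugin: send back exactly what arrived */
static void echo_incoming_rtp(plugin *self, const rtp_packet *pkt,
                              void (*relay)(const rtp_packet *)) {
    (void)self;
    relay(pkt);
}

plugin echo_plugin = { "echotest", echo_incoming_rtp };

/* the core: after its WebRTC stack decrypts a packet, hand it over */
void core_deliver(plugin *p, const rtp_packet *pkt) {
    p->incoming_rtp(p, pkt, relay_to_browser);
}
```

A SIP-gateway or SFU plugin would implement the same `incoming_rtp` hook but route the packets somewhere else instead of echoing them, which is exactly the modularity being described.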

Just to give a very quick summary, the idea was that you might have a SIP call that would end up at a media server, and then you had an application server that would control what needed to happen with this SIP call. So you may decide to start an IVR, you may decide to play an announcement, you may attach this SIP call to a conference, and so on, all in a really dynamic fashion.

All the application server needed to do was handle incoming SIP calls, forward them to a media server, and then use the control channel to decide whatever needed to happen – so again, IVR, or mixer functionality, and things like this. It was a really flexible approach, and MEDIACTRL was itself based on the concept of control packages. Each piece of functionality was conceived as a different module, so you might have an IVR module that implemented the IVR functionality, a mixer module that implemented the bridging and mixing features, and so on. And you could combine them in order to do cool things on that SIP call.

All of those concepts, actually, I brought with me and tried to bring as much as possible into Janus, which is why we ended up with this kind of modular approach where each of the plugins can also talk its own language. So if you talk to the SFU, you can actually have a conversation from a signaling perspective with the SFU itself, with the Janus API only acting as a transport for that, which gives you a certain flexibility in terms of how you can implement new plugins and new features out of this approach.

I don’t know if this is an easy enough explanation of how Janus works, and what it is.

Well, I think it is. You described a little bit about why you went this direction, but it almost sounds like you just didn’t like the way anybody else built it. It didn’t seem to quite do what you needed, and so you built your own.

Looking today, if you were to compare, why … it’s funny, I was actually trying to figure out what to compare the Janus gateway to, and that’s kind of the problem, right? Because other gateways that are out there maybe existed before WebRTC, so they added stuff on to make them work with WebRTC. Or you have a purely JavaScript application that runs in your browser, but that’s not really a server. You can fake it and make it a server, but it isn’t. It’s an application running in a web browser.

Where do you fit? From a taxonomy perspective you’re a gateway, but you’re a gateway that can do essentially anything that WebRTC can enable.


In fact, we’ve been asking ourselves the same questions. At the time we published Janus, we actually released it as the Janus WebRTC gateway, which, funnily enough, ended up confusing people about the purpose of Janus itself. Because, as you mentioned, many people became convinced that Janus was just a gateway, when we could actually do many more things with it.

So we started engaging with the community and asked them, “What do you think we should call Janus? How do you think we should define it? Should it be a WebRTC server? An application server of some sort? Is it a WebRTC enabler? What exactly is this?” And I really cannot give you a single answer. Maybe the most generic definition is a WebRTC enabler / WebRTC server of some sort, because it can act as a gateway, since it can interact with legacy technologies like SIP, RTSP, or others, depending …

You can also write your own application to interact with something that we haven’t even thought of at this very moment. Or you may just live within the WebRTC world itself, where you can use the SFU without ever leaving the WebRTC world, except, let’s say, to forward those streams outside in order to do other stuff with them if you wanted.

But in general, the idea is that we didn’t want to put any boundary on what we might do with WebRTC itself, because WebRTC by itself is a very interesting technology that actually enables you to do very interesting things. It is in part, let’s say, the heir to all the SIP world, the SIP infrastructure.

It shouldn’t constrain you like that, because that would limit WebRTC to the context of a specific session, the way SIP confined its specification, which shouldn’t happen with WebRTC. WebRTC is just a technology that allows you to do a lot of crazy things together.

And this is what we tried to do with Janus itself. At the time, before Janus actually existed, we did manage to do cool things with WebRTC as well, and we were actually very happy with how those applications performed. Asterisk is a very interesting tool that we still use to this day, and the conferencing application that we were using at the time as an SFU, which is now called Licode, was actually a very interesting component that we also contributed to.

They were just, let’s say, too focused, and too vertical for our own needs. They did their job very well, but it was really hard to try and bend them to actually work for our own needs.

One of the use cases that we had in mind, and we still have in mind to this day, was large-scale broadcasting. So the main use case would be to just stream the Super Bowl out into the world, and to do it with WebRTC because it’s what gives you the lowest latency available, and sports people are always angry whenever you get something that is-


Ten seconds later. Yeah, exactly.

The idea is, “Is this possible?” And the quick answer, the ideal answer, would be yes, you could do it with WebRTC. And of course, the actual answer is much harder than that. But you do need some kind of WebRTC enabler to do these things – say you want to inject media with WebRTC, distribute it over a network that is not WebRTC compliant, and then turn it into something that is WebRTC again, or something like that. This is something that we had in mind four years ago, and we still have it in mind today, so we’re trying to work on cool things like this. And to be able to do all of these things, we needed to have much more control over the WebRTC stack, and over all of the processes involved in the transition from the WebRTC world to something else, and vice versa.

And if this something else actually means that you do stay within the WebRTC world, as with the SFU, then that is added value. But in principle, the idea is that we didn’t want to put any boundaries on what we might do with those packets. We wanted to be able to handle those packets, and in order to do that we needed to have complete control over the process, and a better understanding of the WebRTC specification. Which is one of the things I’m grateful to Janus for, because it forced me to study all of those documents deeply, and make sure everything was done correctly.

Personally I’m very pleased to hear of this focus. It’s kind of weird to call it a focus, because it’s actually pretty broad. When WebRTC started we knew, of course, that people would use it for telephony replacement, okay? Everybody would have their WebRTC endpoint support, and so on.

But that’s not really why it was created. It was created to try to bring communications to the masses of JavaScript developers and allow them to just innovate and come up with something new and different. And yet many of the gateways, many of the commercial systems that are out there, just reproduce the exact same use cases and verticalization, like you’re talking about, of all the prior products.

You’re really helping the promise of WebRTC here by … yes, enabling those use cases as well, for anyone who wants them, but not limiting what you provide to only that. And you’re actually not limiting the extensibility; rather, you’re making extensibility for anything beyond that easier.

Yeah, and we have to be happy that eventually there was no agreement on any signaling protocol, because otherwise we might have ended up in the closed set of gateways that you described before. We would have been stuck with one of the existing protocols at the time, and that would have limited the scope of applications you could implement.

That’s right, and that’s exactly why it was done, the lack of standardization on the signaling.

In this grand vision … but maybe it didn’t start so much as a grand vision, rather just a coding project. And then it turned into a pretty grand vision, I think.

So, in that process of doing that, you mentioned some things that went better than you expected. It was actually easier to get some of the parts working than you thought. Were there other things that surprised you with how well they went? I’m next going to ask, of course, which things didn’t go well. Let’s start with the positive first.

One of the things that went much better than we expected is not strictly code related, but is actually in the response we got from the community in general. One of the big questions that we had at the time, before actually releasing Janus, was whether it should be made open-source or whether we should keep it as a closed project of some sort. We knew we had something in our hands, but we didn’t know exactly how to bring it to the world, basically.

And we ended up deciding to do it open-source, even though we were a bit scared at the time; as a small company, you’re always scared that people may come and steal your work without contributing. But this ended up being the best choice we might ever have made, because by making it open-source, a lot of people were able to play with it by themselves and then provide feedback. We met a lot of people who started contributing to the project, and it really went much better than we initially expected. It put to rest all the fears we had about the open-source release. By making it so widely available, many more people were able to test it, play with it, and then decide whether or not to stick with it than would have been possible with a closed project. Which also helped with debugging: people were finding bugs much sooner than we might have done by ourselves. This is definitely one of the things that I was very happy about with the outcome of Janus itself.

And then, as you said, the general purpose of Janus actually expanded to a point that it went beyond our own ideas. Of course we had our own use cases in mind, which ended up in the plugins that we implemented out of the box, but then people started contacting us with even crazier ideas that eventually turned out to be possible with Janus. For instance there was a demo that was made, at the very beginning of Janus, by a couple of very well-known people in the WebRTC world that used a drone that you could control via Janus, via data channels. That was something that really blew my mind, because it showed that we had done something that could be used in crazy ways that were outside of our control, of our ideas, and so on.

But, as you said, it’s not always roses. There’s also the downsides of that.

I think the main downside might be that not everything turned out exactly the way I conceived it at the beginning. For instance, the modular nature that I had in mind for Janus at the very beginning was even more ambitious than it is right now. It is quite flexible, but I really had in mind something even more complex, something closer to a GStreamer-based approach, where you have plugins that act as filters, sinks, sources, and so on.

This never turned out to be the case. It’s probably good that it didn’t, because otherwise Janus might still be under development, and I might still be working on it. And there are things that are still missing that I really care about, like multi-stream support. For lack of time, I never found the right occasion to start actually focusing on that aspect. There are some things that I still wish I had more time to work on, but in principle I’m pretty happy with how everything came out.

My audio cut out just slightly when you mentioned multi-stream support.

It was the idea that you can actually aggregate multiple streams of the same type over the same Peer Connection.


So you can have one outgoing audio, one outgoing video, and ten incoming video streams, for instance.

This is something that, by design, we currently do not support in Janus. In a single Peer Connection, you can have at most one audio stream, one video stream, and one data channel. And this is basically all you can do with a single Peer Connection in Janus. Which means that if you want to do some kind of an SFU, you will need to involve multiple Peer Connections in the process to get the kind of bidirectional communication that would be needed.
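
As a back-of-the-envelope illustration of what that constraint costs, here is a small C sketch of how many Peer Connections an N-party SFU room needs without multi-stream. The model (one publish Peer Connection plus one subscribe Peer Connection per remote participant) is an assumption for the example, not a statement about any specific Janus plugin.

```c
/* With at most one audio/video stream per Peer Connection, an SFU
   participant needs one PC to publish and one PC per remote
   participant to subscribe. A sketch of the resulting growth. */

int pcs_per_participant(int n_participants) {
    return 1 + (n_participants - 1); /* 1 publish + (N-1) subscribe */
}

int pcs_in_room(int n_participants) {
    return n_participants * pcs_per_participant(n_participants);
}
```

With multi-stream, each participant could in principle collapse all of that onto a single Peer Connection, so under this model a 10-party room would drop from 100 Peer Connections to 10.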

Having multi-stream would help in several scenarios, but it wouldn’t help in all the scenarios because the plugins are still isolated from each other. So you can’t have, let’s say, a single Peer Connection where some bits go to a plugin, and other bits go to another plugin. You would still need to have different Peer Connections to talk to different plugins.

But the ability to aggregate the streams of the same user, with the same plugin, into a single Peer Connection would actually give some benefit. This is something that is definitely on our mind, and we want to implement it, even though there is a lack of agreement between browsers on how to do this right now. Like, the Plan B vs. Unified Plan, and so on, which would make this even harder to handle.

But, again, this doesn’t mean that we shouldn’t actually start working on this as soon as possible.

I think there actually is still agreement on how it’s supposed to work, they just haven’t implemented it yet.

Yeah, I didn’t want to say that, but yeah.

Yeah, yeah, I know.

We’re both very aware of what’s happening in the standards world, and the really good progress there, but the implementations are a little bit behind right now in the browsers. I think that’s gonna catch up, there are good commitments there for that to happen.

You talked a little bit about the reception that you’ve gotten for Janus, and that it’s actually been really good. I was wondering, do you have anything else you’d like to say about that? You mentioned a couple of interesting use cases that people had, but what have you encountered in terms of people saying, “Yeah, this is what I needed, and this is why. I couldn’t do this other thing until I found Janus.”

Most of the time, it’s people contacting us saying that they did their homework, they tried to have a look at the existing solutions, and eventually they chose Janus because it gave them the right flexibility to do their things.

Often, we are not part of that decision process, which in part is a good thing, because the availability of so many open-source solutions out there actually helps the whole WebRTC server ecosystem. It helps people do their homework, study what’s available, study how those solutions work, experiment with them, and so on.

Knowing that so many people eventually ended up using Janus for their needs is something that makes us very proud, because it means that people actually tried other solutions and – not because the other solutions are bad, but just because they may focus on one specific scenario – ended up preferring a solution like ours instead.

I’m curious, is there anything that you would recommend that people not consider Janus for? Like you just say, it’s just not the right tool? You could use it, but you shouldn’t.

That’s a good question. I really don’t have an exact scenario in mind. Some people contact us thinking that Janus is just a pure gateway; for instance, they come up with the idea that the SIP plugin is some kind of SIP server functionality by itself. I’m just giving a very specific example here to show the reasoning, but sometimes people have a different assumption about what the Janus plugins actually do. For instance, the SIP plugin itself, which is the plugin that allows you to implement SIP gateway functionality – WebRTC-to-SIP and SIP-to-WebRTC scenarios – is actually just a very simple WebRTC enabler for SIP calls. There is a SIP stack that sits within the plugin, but it only acts as an endpoint. It doesn’t act as a SIP server, or a SIP proxy, or something like that. It’s just an endpoint that you can use on behalf of users to register with the SIP infrastructure, place calls, receive calls, handle the process … just as if it were a regular SIP softphone, and so on.

So anytime you end up wanting to do crazier things with this, like creating complex SIP infrastructure within the SIP plugin itself, that’s something I discourage, because you can already do that with existing SIP infrastructure.


If access is what you need, then Janus can help you by allowing these users to communicate with the SIP infrastructure, but don’t ask Janus to do more than it actually needs to.

This can be generalized to: basically, try not to reinvent the wheel as much as possible, if there are existing solutions that work well … One thing that we’re often asked is, can Janus do transcoding? We often say, “It might. You can implement it in your plugins, but we discourage it.” So try to keep Janus as lightweight as possible, make sure that the streams go where they need to go, and if you need transcoding, make use of the several existing solutions that do that much better than we could if we had to invent it on our own.

In principle, I suggest you try not to reinvent the wheel and try to use Janus in conjunction with other applications that will actually help you do that job. Because there are many, and they are actually very good.

That makes sense.

A plugin is supposed to be a plugin, not a-

Full-fledged solution.

Exactly, right.

I’m curious, in this process that you’ve gone through of building this, what have you learned during that process? One of the reasons I’m asking is if there’s someone else who says, “Hey, you know, maybe I’ll just build my own thing, just like Lorenzo did.” Is there any advice you would have for them? You know, something like, “Yeah, don’t do this. Because that’s a bad idea.”

The first suggestion is definitely to do your homework. Especially in the WebRTC world – [you and I have] both shown that huge slide with all the WebRTC protocols and technologies that you need to be aware of in order to implement a WebRTC stack. So you really need to do your studying, and that actually was very helpful for me at the time, because I had to dig deeply into the documents in order to really understand how exactly everything worked. And sometimes even that wasn’t enough, because you start to clash with things that are supposed to work and then don’t, for other reasons. You start to become an expert at debugging issues, and things like this.


In my experience, Janus was really a great learning process by itself. I had implemented a lot of stuff before then, from the very beginning, but never something so complex. Even the modular architecture was something I was mostly familiar with, because I had worked a lot with Asterisk … I still do from time to time. I had worked a lot on the MEDIACTRL specification, which in principle is modular, as we explained before. And we also implemented a prototype of that server.

So I already had a bit of familiarity with how to do that. But making it as modular as Janus is proved to be quite a helpful learning experience, and one thing that I suggest is to try to make your application as modular as possible from the very beginning, which, admittedly, I didn’t do.

I had made the plugins themselves, the media plugins, modular, but I hadn’t done the same for other things that I could have with Janus … like transports, for instance. At the time Janus was released, we only had the simple HTTP-based API that you could use to talk to Janus. Which, from an API perspective, is exactly the same as it is today. But how it was implemented was completely different: it was hard-coded in the core.

As soon as people started to come up with ideas like, “Why don’t you use WebSockets? Why don’t you use RabbitMQ? Why don’t you use foo or bar, or whatever?” and we wanted to make this possible, it would either have meant adding a lot of stuff to the core, making it as heavy as it could be, or making this other part modular as well. The latter is what we did eventually, but it took a bit of effort, a bit of time, that we could have spared if we had thought about it at the very beginning.
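
The transport modularity described here can be sketched as a small interface: the core produces API messages, and a transport module only moves them over its particular wire (HTTP long-poll, WebSockets, RabbitMQ, and so on). The names below (`transport`, `core_notify`) are invented for illustration and are not Janus’s actual transport API.

```c
#include <assert.h>
#include <string.h>

/* Sketch of a pluggable transport layer: the core emits API messages,
   and each transport only carries them. All names are illustrative. */

typedef struct transport {
    const char *name;
    int (*send)(const char *session_id, const char *json_msg);
} transport;

/* a fake WebSocket transport that records what it "sent" */
char ws_outbox[256];
static int ws_send(const char *session_id, const char *json_msg) {
    (void)session_id;
    strncpy(ws_outbox, json_msg, sizeof(ws_outbox) - 1);
    return 0;
}

transport websocket_transport = { "websockets", ws_send };

/* the core never knows which wire is underneath */
int core_notify(transport *t, const char *session_id, const char *event) {
    return t->send(session_id, event);
}
```

Adding a RabbitMQ transport then means writing another `send` implementation, with no changes to the core, which is the kind of separation being recommended.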

So one of the things that I can definitely suggest is to try to think about all the responsibilities that you can separate, all the things that you can actually put in modules and make modular and expandable later on. Try to do that yourself as well, because it’s going to pay off in the future.

That makes a lot of sense.

Sometimes it’s hard to have the full grand vision when you start for just how modular to make something. And sometimes you’re just trying to get it to work, too, I imagine.

Yeah, I know, absolutely.

In fact, the first experiments that I made were all very hard-coded. I had a simple WebRTC stack, and I was sending my streams, my RTP packets, from within the same code; it was really ugly. Those first experiments, just to see if I could get things working – you don’t want to see those!

But as soon as you have a better understanding of how you can separate all of those things, it eventually pays off down the road, because you end up being able to do a lot of things that you weren’t thinking of when you started writing the application, which proves to be really helpful later on.

So what’s next? Is there anything that you’re excited about? Or that you think needs addressing? You talked about multi-stream obviously, but is there anything else that’s kind of like, “Ooh, man, I’d really love to do this.”

Yeah, there are a couple of things.

One is something that I’m actually working on right now, a new plugin that we’ve started implementing and have just released as a pull request: a Janus Lua plugin. So far we’ve discussed in general that Janus is a modular architecture and you can write your own plugin, and implement this and that.


But Janus is written in C, so if you want to write your own plugin, ideally you need to be competent in C and know how to write a plugin in C, which can be complex for some people. This is what led us to start writing a new plugin that acts as an intermediary between a Lua script and the C code.

So you have a C plugin that takes care of all the low level functionality that most of the plugins use – how you can actually make sure that one stream coming from a user gets to another user, this kind of media routing … or, let’s say, sending a keyframe request, or handling the life cycle of a specific user, and so on.

This low level stuff is all done in C, and then we expose all the functionality via a Lua script instead, which means that you can write your own Lua script in order to decide which kind of application you want to implement. Just for fun, we implemented an echo test written just in Lua. We implemented a simple SFU written in Lua itself. I implemented a demo that does some kind of chat-roulette with Lua.

Without patching the C code, we just play with the Lua script in order to decide what to do with the media coming from the user, and without ever touching the media either. We just invoke some methods and so on.

This is something that’s really, really exciting, and it might be the next step in making this even more modular in the future, because then you can have your own battery of Lua scripts, that just fits with every case that you have in mind.
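The split Lorenzo describes, with a core that owns the low-level media routing and a script that only decides the application logic, can be sketched generically. The sketch below is in Python standing in for Lua, and every name in it is invented for illustration; it is not the real Janus Lua plugin API, just the shape of the pattern.

```python
# Hypothetical sketch of the pattern described above: a core (the C plugin
# in Janus' case) owns sessions and media routing, and exposes a couple of
# primitives to user-supplied "scripts" that define the application logic.
# All names here are invented for illustration.

class MediaCore:
    """Stands in for the C plugin: owns routes and moves packets."""
    def __init__(self):
        self.routes = {}      # sender id -> list of receiver ids
        self.delivered = []   # (receiver, packet) pairs, for inspection

    # Primitive exposed to scripts: declare that src's media goes to dst.
    def relay(self, src, dst):
        self.routes.setdefault(src, []).append(dst)

    # Core-side media path: the script never touches the packets
    # themselves, it only configured the routes beforehand.
    def on_packet(self, src, packet):
        for dst in self.routes.get(src, []):
            self.delivered.append((dst, packet))


def echo_script(core, user):
    """'Echo test' application logic: send the user's media back to them."""
    core.relay(user, user)


def sfu_script(core, publisher, subscribers):
    """Minimal SFU logic: fan the publisher's media out to subscribers."""
    for sub in subscribers:
        core.relay(publisher, sub)


core = MediaCore()
echo_script(core, "alice")
core.on_packet("alice", "rtp-1")
sfu_script(core, "bob", ["carol", "dave"])
core.on_packet("bob", "rtp-2")
print(core.delivered)
```

Swapping `echo_script` for `sfu_script` changes the whole application without touching the core, which is the point Lorenzo makes about not patching the C code.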

Apart from this, one of the things I also really want to push forward is the scaling of Janus instances themselves, because right now the concept of scaling is mostly a loose orchestration of the Janus resources that you have. Each Janus instance is pretty much an isolated component, so two Janus servers do not talk to each other, and honestly I wouldn’t want them to talk to each other and keep some state. I want them to be as unaware of each other as possible, and to keep all the state in a controlling application of some sort.

But making these more dynamic and flexible is one of the next things we want to start working on, so that if you wanted you could implement, if you allow me the term, some kind of a Platform as a Service that is actually based on Janus as the bricks in the background. So that you can say, “I need a conference with 1000 users, how do I do that?”

The question is how you can distribute multiple Janus instances in the background, and make this transparent to the user actually consuming those resources, unaware of the fact that streams may be redirected between each of them.

This is something we would really like to start working on, because there is only so much optimization that you can do on the single instance by itself. Say you can scale a single Janus instance up to X. If I want to do X plus one million, how do I do that? This is the answer that we’re trying to find.
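The "state lives in a controlling application, instances stay unaware of each other" idea above can be sketched as a tiny orchestrator. This is a hypothetical illustration only; the instance names and capacities are invented, and a real controller would talk to actual Janus instances over their API rather than keep counters.

```python
# Hypothetical sketch of the orchestration idea described above: Janus
# instances are treated as stateless, isolated bricks, and a controlling
# application keeps all the state and decides where each new user lands.
# Names and capacity numbers are invented for illustration.

class Orchestrator:
    def __init__(self, capacities):
        # instance name -> how many PeerConnections we trust it to handle
        self.capacities = dict(capacities)
        self.load = {name: 0 for name in capacities}
        self.placement = {}   # user -> instance (all state lives HERE)

    def allocate(self, user):
        # Pick the instance with the most free capacity.
        name = max(self.capacities,
                   key=lambda n: self.capacities[n] - self.load[n])
        if self.load[name] >= self.capacities[name]:
            raise RuntimeError("all instances full: time to spawn another")
        self.load[name] += 1
        self.placement[user] = name
        return name


orch = Orchestrator({"janus-a": 2, "janus-b": 3})
print([orch.allocate(u) for u in ["u1", "u2", "u3", "u4", "u5"]])
```

The instances never learn about each other; answering "I want X plus one more" becomes a matter of adding a brick and updating the controller's map.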

Right, that’s interesting because I heard you talk about not just efficiencies, but what happens as you scale. Is there extra stuff that you can do to make life easier for the users of it, without them having to explicitly code how those different instances would interact?

It’s not just scaling up. A lot of times, scaling up just means, “Oh yeah, now you can ask for ten thousand and your machine doesn’t explode because we’re efficient in the use of resources,” but I thought I heard you talking even more intelligently than that. It’s not just about efficiency. It’s also, perhaps, about ease of use, ease of working with that large a number.

Yeah, exactly.

The problem there is also that there are only so many assumptions you can make with Janus itself, because scaling a Janus-based application may be done in a specific way, and the same approach may not work with another Janus application instead.


The different nature of the plugins means that each plugin may actually scale in a very different way. The SIP plugin just acts as a SIP endpoint by itself, so there is no interrelationship between any of those SIP calls that it does. If two users are actually talking to each other, that’s a problem of the SIP infrastructure in the background; from the plugin perspective it’s just handling a user, turning it from WebRTC to SIP or vice-versa and it’s done. So, handling one thousand calls, or ten thousand calls, or whatever … it means you can just spawn as many instances as you want, and then just load balance the usage. Say, “Okay, I’ll put this user here, this user there,” and so on.

As soon as there is some kind of interrelationship involved instead – maybe with some kind of a video conference, or an SFU, or something like this, where you have a one-to-many kind of approach, where the one is actually the shared source of information for all the many that are actually receiving this packet – this becomes more complicated, in that case. You cannot just say, “Okay, I’ll go on server one if the user is publishing on server three,” without doing anything that actually makes those packets go from server three to server one, just to make a very simple example, and it becomes even more complicated when you try to involve different plugins.

So context awareness is very important when you do scaling, and so is making this less of a problem for the user, and for the application in the background, to worry about. Finding a way to make this pretty much automatic, without you having to worry about how the context actually works or how to distribute things efficiently, is the hard part.
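The difference between the stateless SIP case and the SFU case above comes down to whether placements create cross-server dependencies. The sketch below (all names invented, purely illustrative) shows the minimum bookkeeping a context-aware orchestrator would need: when a subscriber lands on a different instance than the publisher, a relay between the two servers must be recorded and set up.

```python
# Hypothetical sketch of the "context awareness" problem described above:
# once a publisher lives on one Janus instance, a subscriber placed on a
# different instance needs the stream relayed between servers first.
# Server and stream names are invented for illustration.

class SfuPlacement:
    def __init__(self):
        self.publisher_server = {}   # stream -> server hosting its publisher
        self.relays = set()          # (stream, src_server, dst_server)

    def publish(self, stream, server):
        self.publisher_server[stream] = server

    def subscribe(self, stream, server):
        src = self.publisher_server[stream]
        if server != src:
            # Subscriber is on another instance: the stream must be
            # forwarded from src to that server, or the packets never
            # arrive -- exactly the "server one vs. server three" trap.
            self.relays.add((stream, src, server))
        return server


p = SfuPlacement()
p.publish("cam1", "server-3")
p.subscribe("cam1", "server-3")   # same server: no relay needed
p.subscribe("cam1", "server-1")   # cross-server: relay required
print(p.relays)
```

In the SIP-plugin case this table would simply stay empty, which is why round-robin load balancing is enough there and not for an SFU.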

Which also, incidentally, made our own scaling tests complicated, because each plugin needed its own stress campaign and had to be addressed in different ways. Saying that Janus can serve one thousand PeerConnections may be true for one plugin and not be true for another. The concept of a user is different too, because a user can be associated with ten PeerConnections rather than one. The bit rate might be different from use case to use case.

The flexibility is a plus, but it can be a problem to have to face later on.


Now you may end up having to actually classify plugins in certain ways, or require them to provide certain kinds of interfaces in order to benefit from what you’re talking about, basically, the intelligence that you can add in there.

I presume that you don’t intend to create an arbitrary program writing program. That has computer science concerns, right?

The idea is to implement some kind of an orchestrator that implements the context awareness, because there are some common properties that one can infer from the relationships between all of these plugins. That’s the idea in principle, not something as complex as you are suggesting.

Right, the arbitrary Turing Machine thing.

I have one follow-up question on something that you had mentioned. I’m a big fan of scripting access.

My background is speech recognition. One of the best things we did in our research lab was to … we were using Tcl at the time, which was one of the languages that was available and easy to extend. We did that for our recognizer. We broke all the pieces up and then gave script access to each of those pieces. For doing research it was just phenomenal, really unparalleled, because you weren’t constantly writing and debugging your C programs. It was much faster to do scripting.

So I’m a big fan of that.

I’m curious, why did you choose Lua? I have nothing against it, I’m just curious.

It was actually suggested to us by one of the companies that we do consulting for, as something that we might both benefit from. They were more proficient in Lua at the time, and we wanted to have something that was flexible at the same time.

But the idea is that this Lua plugin would just be a door-opener, not the only solution. For instance, as soon as I started mentioning this effort to the community, I chatted with Saúl Ibarra Corretgé, who, I don’t know if you know him, is a very nice guy who contributed a lot to Janus in the past and is now working on Jitsi. We’re very good friends, and we were discussing this new functionality, and he suggested, “Why not do the same thing with JavaScript, for instance, which is something that a lot of people use?” And he suggested a project that might be able to help with that.

And now I’m talking with you about this and this other idea came about. Ideally this is all feedback that actually might end up in Janus, sooner or later, so as soon as the Janus Lua plugin is up to a point where it can actually be used in an efficient way to implement what it needs to do, the same exact kind of binding functionality can also be abstracted and reused in other contexts as well, because all the lower-level stuff is already there, all the media routing and stuff. What we need to do is hook this up to a different language, like JavaScript, or the technology that you were mentioning, or whatever one has in mind.

Really, this is exciting for us just because it allows us to go beyond the idea that if you want to write a plugin in Janus, you need to do it in C. The idea is to say, “You really don’t. You can do it in other ways that may be good enough.”

For instance, I’m aware of another company that has done a Java binding to plugins instead. They are doing all logic in Java, and they have some kind of Java Native Access kind of bridge that actually allows them to do that, so I’m really curious to see where this all will lead.

Yeah, you could argue whether or not that’s actually faster than writing in C, but plenty of people have Java frameworks, and it’s really nice if they can just operate in that world.

The last question that I have is, is there anything that you’d like to tell us? Anything in the works? Or an announcement that you’re excited about? Or anything you want to talk about?

I think we covered pretty much all the exciting stuff. One thing that I’m really excited about is that we are going to Singapore in a few weeks. I think it’s next week.

It’s very soon, yes.

We are going to the IETF, where we are going to do the remote participation services for the IETF as usual. Some people may not be aware of the fact that we are actually the official remote participation service for the IETF meetings.

Anytime you want to access a meeting remotely or present from home instead because you couldn’t make it, you are actually using our technology, and behind the curtains, you’re actually using Janus to do that. Actually using three different plugins at the same time, but that’s a matter we can discuss some other time, and there’s a lot to say about that.

After the IETF, we’re also going on a tour later on to South Korea, and Japan as well, in order to talk a bit about Janus and other cool things WebRTC-related. That makes for exciting times, and I’m really looking forward to it.

Great. Thank you so much, and I look forward to talking with you again. Have fun in Asia.

Okay, thank you. It was nice talking to you again, as well.



Dr. Daniel Burnett has a history of almost 2 decades of experience with Web and Internet standards dealing with communications, having co-authored VoiceXML, MRCPv2, WebRTC, and many other standards. In creating AllThingsRTC, Dan aims to provide the innovators in the real-time communications space a forum for explaining the topics that really matter.