Legacy systems modernization is a topic that never gets old. We talk to Vadim Zaytsev of Raincode & Raincode Labs to learn how Raincode uses language engineering techniques to save banks millions of dollars that they spend to run their COBOL and PACBASE systems.
Listen to find out:
- What drives companies around the world to migrate away from mainframes.
- How Raincode migrates legacy systems without actually migrating any code.
- How Raincode modernizes legacy systems written in exotic 4GLs to a more modern COBOL.
- What role does language engineering play in all of this.
- What is SLEBoK and why you should care.
[For some people] it’s not sustainable to stay in the mainframe, and they need to move elsewhere. And that’s where we come in and we help them.
A company can have some problem in their codebase, and they feel that it’s a problem, but they either don’t know what to do with it, or they don’t have enough information to make the decision or they don’t know what options are available.
We basically do refactoring. In academic terms it’s a term rewriting system… Over the course of the years, we’ve developed (..) close to 200 different refactorings, all of them are very small, very local. And since they’re local we can sometimes test them exhaustively, sometimes we can even verify that they are universally true. We can prove them to be correct.
At some point we started having data description languages, architecture description languages, markup languages like XML or Markdown. And what people started noticing (…) is that a lot of technologies that are used to deal with these languages are the same, or could be the same.
Sergej: Welcome to Beyond Parsing. This podcast is dedicated to language engineering, creating custom languages and using them to develop domain-specific software tools.
Federico: We are your hosts Federico and Sergej, two language engineering consultants.
Sergej: Today we are very glad to have Vadim Zaytsev on our show, Vadim is a very active member of the language engineering community, both on the academic and the industrial side of it, in fact on the industrial side, Vadim is the chief science officer of not one, but two companies. Hello, Vadim.
Vadim: Hello, good morning.
Sergej: And with us we also have Federico today. Hi Federico.
Sergej: My first question to Vadim would be can you tell us more about the companies that you are the chief science officer of and what does a chief science officer do?
Vadim: Okay, thank you. First of all, glad to be here. Nice talking to you. So there’re two brands that I’m frequently representing, that’s Raincode and Raincode Labs. Raincode is the company and it’s a brand that’s been active for 20 years or so already. It’s a hardcore mainframe to .NET product provider in the compiler space. So if you want to migrate your code from a mainframe, if you have your code in, let’s say, COBOL, PL/I, Assembler, these kinds of nice languages, mostly owned by big companies like IBM and CA and whatnot, then you can use Raincode products to run them on a different infrastructure like .NET.
Vadim: Well, right now there is also .NET Core, so you can run it on Linux and Windows and whatever you want. That’s a product producing company. If you want to run your COBOL, or if you’re just listening to this podcast and you feel nostalgic about COBOL, then you can just go to Raincode.com and the COBOL compiler is for free. You just fill in the form, you get a download link and that’s it. You need to use Visual Studio or Visual Studio Code or something along those lines and then it just runs. If you want some support, you come back to us. If you don’t, you don’t. And for PL/I and Assembler, it’s slightly more complex to start.
Vadim: Then Raincode Labs is more flexible. Recently, since a couple of years already, we found out that there’s quite a lot of requests for things that are not exactly mainframe to .NET. For instance, a company can have some problem in their codebase, and they feel that it’s a problem, but they either don’t know what to do with it, or they don’t have enough information to make the decision or they don’t know what options are available. And in that case, we call it compiler services. We start with consulting, they tell us the story and for instance then .NET may not even enter the scene.
Sergej: COBOL and PL/I and those languages still do? It’s related to those languages?
Vadim: It could be but it’s quite often also related to so called fourth-generation languages (4GLs). These are basically domain-specific languages but developed before we knew how to design domain-specific languages well. They were mostly focused on writing as little code as possible, which is a noble goal. But if you go too far into that then you get strange syntaxes and you have leaky abstractions, which means that you end up having a language cocktail and your business logic is scattered all over the place.
Sergej: I guess we can touch more on that later in the podcast.
Federico: Can I ask a naive question, why would someone want to leave the mainframe? What is wrong with staying on the mainframe?
Vadim: Oh, that’s a very good question. And indeed, the mainframe has existed since 1952, unless I’m gravely mistaken. So you can say it’s a mature technology, right?
Vadim: As mature as it gets in computers, it’s very stable. It’s very reliable. It’s very mature. It’s proven. IBM is still producing new mainframes today and they are running fine and they seem they’ll not go out of business anytime soon. So indeed, why would people want to migrate. On the mainframe basically for you to understand the money flows, you pay per MIPS. MIPS is a million instructions per second. The good thing is when the MIPS were introduced, it was the famous IBM System/370 computer… if later you want a cool picture for your website, I can give you a cool picture taken here in the Deutsches Museum in Munich after MoDELS’19.
Federico: That would be nice.
Vadim: Yeah, they have a museum with the old IBM hardware. When that system was very widespread, and its main competitor was a VAX-11/780 system, both of them had pretty much the same performance, and that performance was roughly equivalent to 1 million instructions per second. So that’s sort of one MIPS. And if you move to bigger computers, bigger mainframes, more modern mainframes that are around today then… well, a typical system would run around 4000 MIPS, let’s say, and if you have a system of 4000 MIPS then you’re paying somewhere between 6 and 16 million dollars per year just to be able to run it. It’s a bit expensive. Yes, but if you want reliability you can pay for it in money, and then you get reliability. So that’s good. If we’re talking… let’s say 20 years ago, 10 years ago even, then let’s suppose that suddenly a bank has twice as many MIPS that they have to pay for.
Vadim: What does it mean? It means that they have twice as many customers or that their customers are doing more stuff with them. If you’re a bank, and you suddenly see that you have twice as many customers, that’s good. That’s extremely good. That means you get more money in, it means your business is growing. And you will gladly pay twice as much to IBM or whoever has the mainframe. What happens now is there’s a lot of these pesky things called mobile apps. People get… I don’t know… They get bored, they sit somewhere in a traffic jam, and they whip out their mobile phone and they start clicking on things. And many of these clicks, they transform to something that needs to be computed. And then if you have many more transactions on the mainframe, it’s still something that you need to pay for and pay dearly for. It’s something that does not translate to more customers. It may translate to more customer satisfaction, which means you cannot simply ignore it and cache the values or do something like this.
Vadim: I think there was a famous story with FedEx, if I’m not mistaken, it’s been in the news, they suddenly started having too many transactions that they needed to pay for because people were just anxious about getting their packages and basically refreshing the page all the time literally going with their finger over the screen every second. And their solution was technically sound, it was to cache the value and then not to compute it again, going through all the queries but to just cache it and renew it every 10 minutes or so. But that resulted in the decrease of the quality of the service that they’ve been providing, because then people started saying, “Oh, but my package has just arrived and my app still says that it’s not there.”
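The caching trade-off in that story can be sketched in a few lines of Python. This is only an illustration of the general technique (a time-to-live cache), not the code of any company mentioned here; the names `TtlCache`, `lookup` and `package_status` are made up for the example, and a fake clock stands in for real time.

```python
import time


class TtlCache:
    """Cache one value per key and recompute it only after ttl_seconds."""

    def __init__(self, compute, ttl_seconds, clock=time.monotonic):
        self._compute = compute
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expiry_time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]          # served from cache: no billed transaction
        value = self._compute(key)   # the expensive query on the back end
        self._entries[key] = (now + self._ttl, value)
        return value


# Demo with a fake clock and a fake back end whose answer changes over time.
fake_now = [0.0]
backend_calls = []
package_status = {"PKG-1": "in transit"}


def lookup(package_id):
    backend_calls.append(package_id)
    return package_status[package_id]


cache = TtlCache(lookup, ttl_seconds=600, clock=lambda: fake_now[0])
first = cache.get("PKG-1")
package_status["PKG-1"] = "delivered"  # the package arrives...
fake_now[0] = 60.0
stale = cache.get("PKG-1")             # ...but the app still shows the old status
fake_now[0] = 601.0
fresh = cache.get("PKG-1")             # only after the 10 minutes does it update
```

Three page refreshes cost two back-end queries instead of three, and the price is the stale answer in the middle, which is exactly the complaint quoted above.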
Vadim: So again, when people are not satisfied now, they post proofs of the bad quality of your service on Twitter. And that’s immediately bad. They don’t just call your hotline as they would do 30 years ago. The world is changing and that’s why some people stay happily on the mainframe. Some people stay happily with COBOL, that’s also fine. But some people just see the way their business works, it’s not sustainable for them to stay in the mainframe, and they need to move elsewhere. And that’s where we come in and we help them. We don’t deal with happy people, happy people are happy without us. We turn sad people into happy people.
Federico: And I guess you have to deal only with… well rich clients because if someone can afford paying some millions per year for their computations, well, they have some money to spend?
Vadim: Well, usually when the customer goes to us and the customer has money, that means that we can offer more options, basically. That’s why if you go to our website we have references like: Oh, this is the biggest migration in the world; and this is the biggest portfolio in the world, and so forth. And that’s because if this kind of customers come to us, we can offer different options and we can follow through with the chosen option. And if you’re in deep trouble with legacy software, and you’re a very small company, then probably one of the best ways is to just stop and start over in a more healthy ecosystem.
Sergej: So when you come in and let’s say you have a customer that says, “Please do the migration for me.” You come in you see COBOL, you see this PACBASE, you see PL/I, you see Assembler, are you able to translate all of it or do you just tell them like, “We can translate 80% and 20% you will have to do yourself” or how do you do this?
Vadim: Well, this is a very good question, as well. And I see you’ve prepared a lot of good questions. I wonder if there will be some tricky ones later as well. Yes, this is a good question and a tricky one, as well. But first of all, we are not translating the code that they have into something else. In fact, in many cases, this is a very hard requirement by the customer that their codebase remains untouched. It means that exactly the same code that has been running on the mainframe will be running elsewhere.
Sergej: That’s an interesting approach to migration I would say.
Vadim: Sometimes there is also a technical reason for it. In particular, if you can run the same line of code on the mainframe and on… let’s say desktop or Windows Server, then you can actually do that. And you can do that for testing purposes. And just to run the same transaction twice, once through the old system, and once through the new system and see if it results in the same state of the database. And that’s the best kind of testing. And some companies have been doing that.
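As a rough illustration of that dual-run idea, here is a minimal Python sketch. The two `*_transfer` functions are hypothetical stand-ins for the same program running on the old and the new platform, and a dictionary plays the role of the database; a real setup would compare actual database states.

```python
import copy


def legacy_transfer(db, src, dst, amount):
    """Stand-in for the program as it runs on the mainframe."""
    db["accounts"][src] -= amount
    db["accounts"][dst] += amount
    db["log"].append(("transfer", src, dst, amount))


def migrated_transfer(db, src, dst, amount):
    """Stand-in for the same program recompiled for the new platform."""
    db["accounts"][dst] += amount
    db["accounts"][src] -= amount
    db["log"].append(("transfer", src, dst, amount))


def dual_run(initial_db, args, old_impl, new_impl):
    """Run one transaction through both systems on identical copies of the data."""
    old_db = copy.deepcopy(initial_db)
    new_db = copy.deepcopy(initial_db)
    old_impl(old_db, *args)
    new_impl(new_db, *args)
    return old_db == new_db, old_db, new_db


# Amounts are kept in cents to avoid floating point.
initial = {"accounts": {"ALICE": 10_000, "BOB": 2_500}, "log": []}
same, old_state, new_state = dual_run(
    initial, ("ALICE", "BOB", 1_500), legacy_transfer, migrated_transfer
)
```

If `same` is false for any replayed transaction, the difference between `old_state` and `new_state` points at what the new platform does differently.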
Sergej: … And I guess if it does result in the same state, then for the impatient web app users, you can just run this code where you don’t have to pay for per MIPS, right?
Vadim: Right. The customers that want to really make sure that it’s the same they first run it for themselves, and then they just start running the things that they think that they can migrate only on .NET. And then they slowly reduce the number of MIPS on the mainframe. And that’s good because remember you pay per MIPS. If you start running just slightly fewer things on the mainframe, you need to pay less money for it. It means you can for instance finance further migration. So, migration can also be gradual. Migration is a very complex process. And the more you can shift to either before the migration or after the migration, the better. So, in some cases where… well in that particular project, there was actually some translation requested by a customer. But we simplified the project by moving a lot of checks and a lot of preparation before the migration since they wanted to change… That has actually been published. It’s called Quality First! A Large Scale Modernization Report at SANER 2019.
Sergej: So basically, when you do the migration, you can go to the customer and say, “Okay, now your code was running on this big mainframe. Now the same thing can run on .NET.” But it’s still in COBOL, and PL/I and this whole mix of languages? So they still have to pay their COBOL programmers who are slowly probably retiring. And so on.
Vadim: In the case of mBank, we just wrote the testing infrastructure for them and the code analysis tool, and that enabled their programmers to change a lot of things and start refactoring their code, testing their code, improving the quality of their code before the migration. And then the migration went… well, they wanted the translation so they were very interested in getting the translated code to look like it could have been written by a programmer, an actual program.
Sergej: So you actually translated their code into something more modern.
Vadim: In their case, yes. In other cases, as you say, it’s a migration where the code is not changed. But it means that after you moved from the mainframe to .NET, you get all the tools that come with .NET. You get Visual Studio, you get any plugin that works on Visual Studio, you get our plugin where you can debug the application and that’s for instance, quite often the case with Assembler code. Nobody wants to keep IBM Assembler code forever, especially if you’ll end up emulating it or running it anyhow on a non-IBM infrastructure like Intel. But then if you just are given the opportunity to not touch it and not deal with it during the migration, then you can just migrate to our assembler compiler, and then use it later. And have some nice debugging sessions that are very comfortable in Visual Studio, with breakpoints and whatever, even in your self-modifying code.
Sergej: Yeah I can imagine that this kind of allowing organizations to only change one thing at a time must be very valuable to them with these old products.
Vadim: Yeah, that’s usually highly appreciated. The fewer changes at the same time, the better. But sometimes it’s also linked with database migration and stuff like this.
Sergej: At the end they can also add more .NET code. Write some code in C# that kind of calls into the old COBOL code?
Vadim: Sure. They have then in their hands the full capabilities of .NET. And .NET is, for those that don’t know, it’s a multi language platform, so you can write it in C#, in F#, in J#, there are a lot of sharps there. Not just sharps, there’re tons of languages available there.
Sergej: Visual Basic.
Vadim: Visual Basic, .NET. Well, it’s sort of a complete redesign of Visual Basic that runs on .NET which is pretty much a different syntax for C#, but if people are more comfortable with a Visual Basic syntax then why the hell not.
Federico: I was wondering, from the point of view of someone that has to implement the software for this migration, how complex is it to build the language part, so I guess a compiler for COBOL, and how complex is it to build the system part? Because I guess that you have to replicate on .NET a lot of system functionalities that you had on the mainframe. So I guess both systems can have libraries to print something on the screen, but I guess the mainframe can have some other functionalities that are not immediately available in the .NET standard libraries.
Sergej: Even the character encoding is different on mainframes, right?
Vadim: Oh, yes, there is EBCDIC.
Vadim: Which is different from ASCII or Unicode or whatever. And EBCDIC quite often is what we keep in memory. Again, if you’re migrating very large portfolios, then you can’t be sure that somewhere in one of these 250 million lines of code that you’re dealing with, there is a comparison of a character to a hard coded constant, just to check if it’s a dot let’s say. If there is then you need to represent the dot with whatever dot is in EBCDIC and not in ASCII.
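The dot example is easy to reproduce with Python’s built-in codecs. The sketch below assumes code page `cp037`, one common EBCDIC variant; a given mainframe installation may well use a different one, and the record content is invented.

```python
EBCDIC = "cp037"  # assumption: one common EBCDIC code page

ascii_dot = ".".encode("ascii")[0]   # byte value of "." in ASCII
ebcdic_dot = ".".encode(EBCDIC)[0]   # byte value of "." in EBCDIC


def is_dot(byte_in_memory: int) -> bool:
    """Compare against the EBCDIC constant, because memory holds EBCDIC bytes."""
    return byte_in_memory == ebcdic_dot


# A record as it would sit in memory when data is kept in EBCDIC.
record = "TOTAL.DUE".encode(EBCDIC)
dot_positions = [i for i, b in enumerate(record) if is_dot(b)]

# Ordering differs too: EBCDIC puts lowercase before uppercase before digits.
chars = ["a", "A", "1"]
ascii_order = sorted(chars, key=lambda c: c.encode("ascii"))
ebcdic_order = sorted(chars, key=lambda c: c.encode(EBCDIC))

print(hex(ascii_dot), hex(ebcdic_dot))  # 0x2e 0x4b
print(ascii_order, ebcdic_order)
```

A program that compares a byte to the ASCII value `0x2e` would never find the dot in this record, and any code that relies on character ordering would sort differently, which is one reason to keep EBCDIC in memory after the move.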
Vadim: Going back to the original question of what is harder, both are hard. And so the complexity is usually in the details, the devil is in the details. The complexity of writing a compiler as such, is substantial. But it’s a very good complexity because you take a book from your shelf, you read it, you do exactly what the book says and you have a compiler. Probably you have a much better compiler than what you would have written without following what the book says exactly. Because, again, compiler construction is something that has been a thing since 1956.
Vadim: And people know what they’re doing. And well, we fancy ourselves to be compiler experts, so we think that we know what we’re doing. But of course, writing the compiler for legacy language means that you’re dealing with some strange things. Like, when you’re compiling assembler, for instance, you sometimes are allowed to compile it to something that just runs. And sometimes you need to emulate it because it can just take a random byte from a random location and say, well, let’s execute that one. And of course, you didn’t know before that so you couldn’t compile it. Then you have to fall back to an emulator. Or in COBOL you can have an ALTER statement, which sort of takes an existing GO TO which you would think is already bad, and at runtime, it redirects the existing GO TO to a different location.
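A toy model shows why ALTER forces the jump target to live in a table that is consulted while the program runs. This Python sketch is an assumption-laden illustration, not how the Raincode compiler is built: paragraphs are modelled as functions that return the name of the next paragraph, and `goto_table` is a made-up name for the run-time dispatch table.

```python
def run(paragraphs, goto_table, start):
    """Execute paragraphs until one returns None; return the visit order."""
    trace = []
    current = start
    while current is not None:
        trace.append(current)
        current = paragraphs[current](goto_table)
    return trace


def main(goto_table):
    return "SWITCH"


def switch(goto_table):
    # A GO TO whose target is only known at run time.
    return goto_table["SWITCH"]


def first_time(goto_table):
    # Equivalent of: ALTER SWITCH TO PROCEED TO EVERY-TIME
    goto_table["SWITCH"] = "EVERY-TIME"
    return "SWITCH"


def every_time(goto_table):
    return None  # end of the program


paragraphs = {
    "MAIN": main,
    "SWITCH": switch,
    "FIRST-TIME": first_time,
    "EVERY-TIME": every_time,
}
goto_table = {"SWITCH": "FIRST-TIME"}  # the target as written in the source
trace = run(paragraphs, goto_table, "MAIN")
print(trace)
```

The same `SWITCH` paragraph jumps to two different places within one run, so its target cannot be fixed when the program is compiled.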
Vadim: This is not as problematic for the programmers, because if they’re using ALTER, probably they know what they’re doing. Well, let’s hope. But it’s extremely problematic for the compiler writer because then you need to shift a part of your GO TO infrastructure, the dispatch table, to runtime, as opposed to compile time. But yes, a lot of effort is also going to replicating decimal types in a particular format, because when you’re computing something that’s sort of money then you’re usually not using floating point, because that’s a good way to lose some of the little cents when you’re dealing with big sums. And so then it’s always fixed decimals and they need to be in a particular format to be efficient. There’re dates of a particular format, and they differ from one fourth-generation language to another. It is hard but I must say, it would have been much easier if the documentation was better, or if the documentation was always available. And from time to time, we need to write an entire compiler for something for which the documentation is just not available at all.
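The point about money and floating point can be checked in a couple of lines. The sketch below uses Python’s `decimal` module for the arithmetic and then packs a value in the packed-decimal layout used on IBM mainframes (two digits per byte, sign in the last half-byte). Whether packed decimal is the particular format meant here is an assumption, and `pack_decimal` is a name invented for this example.

```python
from decimal import Decimal

# Ten payments of 0.10 each: binary floating point drifts, fixed decimals do not.
float_total = sum([0.10] * 10)
decimal_total = sum([Decimal("0.10")] * 10, Decimal("0.00"))


def pack_decimal(unscaled: int) -> bytes:
    """Pack an integer as packed decimal: one digit per nibble, sign nibble last.

    The decimal point is implied by the declaration, so 123.45 with two
    decimal places is passed in as 12345.
    """
    sign = 0xD if unscaled < 0 else 0xC   # 0xC = positive, 0xD = negative
    nibbles = [int(d) for d in str(abs(unscaled))] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)              # pad to a whole number of bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


print(float_total == 1.0)         # False
print(decimal_total)              # 1.00
print(pack_decimal(12345).hex())  # 12345c
```

A replacement runtime has to reproduce both the exact arithmetic and the byte layout, because existing files and databases already contain values stored this way.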
Federico: Okay, not the ideal scenario.
Vadim: Not the ideal scenario. No, not what I was signing up for when I took the compiler course.
Sergej: How do you manage to do that?
Vadim: Well, first, requirements elicitation: talking to people who know how things work. And the second is looking at the codebase. So the owners of the codebase are our customers, so they can just give us the code and we sign that I won’t post it online, I won’t tweet about it, it’s fine. And then I can have it. Documentation belongs to the vendors of the original compiler. So that was, for instance, the case with SDC… it’s a big company in Scandinavia and they had most of their business logic in a system called AppBuilder. And we built for them a system called TIALAA. TIALAA is short for There Is A Life After AppBuilder.
Sergej: Well you were having fun doing that work too.
Vadim: Oh, definitely. Yes. In that case, we could see the codebase. If you see in the code that you do something and then… You run a SQL query and then suddenly you start checking the return code in some magic data structure, then you can just guess that the data structure is populated by running a SQL query magically. And then again, you go back to the experts in this domain, and you ask them whether you are right. If they can answer then that’s good; if they cannot then you write tests, and you let them run the tests, and then you see what returns.
Sergej: You don’t run the tests directly yourself on AppBuilder, you kind of have to go through these experts?
Vadim: Yeah, we don’t always have the access to the original system. That’s quite common. Also, because it’s expensive. Again, these people pay per MIPS, and compilation happens on the production server basically, which is also extremely different in the modern setup. If you have a modern setup, then usually you have a developer, the developer has a machine and you have a piece of code on your machine, you work on it, you compile it and test it locally. And then you deploy it somewhere. Which means if you recompile even live or every couple of seconds well you’re making your computer warmer, but you don’t disrupt the production service. On the mainframe, for instance, they have sometimes code that has been in production for many years without ever being recompiled. Because recompilation is costly. And if they want to have a big recompilation of everything, that’s a huge event that needs to be planned beforehand, probably split into several steps. And these steps need to happen in the off-time where it would not disrupt the customers and whatnot. It’s a big deal.
Federico: Before you were talking about 250 million lines of code. So that makes me wonder if you have some stories to share about your more successful projects. I realize that a lot of what you do has to be confidential but I don’t know if you can give an idea to our listeners of some of your successes. I think it would be interesting.
Vadim: Yeah, so some of the stories are more confidential than others. In some cases… if you read our reports, especially papers that I write or co-write, then you will see that, oh, we did it for a customer, let’s call it A and they had a system, let’s call it B. And they used the language, let’s call it C, but it’s important for you to know that C, well whatever… D was dynamically typed, and it was imperative and it was this and that. 250 million lines of code was a recent project that we are allowed to talk about, and that was in one codebase, in one portfolio.
Vadim: It was one of the big banks in Spain, it was called Bankia. They had PACBASE systems. PACBASE is a fourth-generation language. It’s been developed in France in 1972. And it basically compiles to COBOL. So you write your system in PACBASE, there’re some rules and you compile them to COBOL, and then you deploy the COBOL and then you run it. It’s relatively common to see a PACBASE system in France, there are some in Belgium, Spain is sort of a more exotic place for it. And the farther you go from France then the lower the chance of you encountering a PACBASE system.
Vadim: So the owners of the system have changed over the years and the prices have only gone up. At some point, they wanted to stop the maintenance at all. Then they lowered the maintenance instead. There are no happy PACBASE customers right now. Some of them are migrating to RPP, basically a reimplementation in Java. And some of them are too afraid to move to something completely different, or, for instance, they don’t have any Java experts in-house. It’s also a valid reason not to do that.
Vadim: So what we do with PACBASE we let PACBASE do whatever PACBASE does: we generate COBOL code and it’s very dirty code: it’s not formatted, all the names are generated, it has GO TOs all over the place. Next to GO TO, it has all kinds of strange things like PERFORM THROUGH, which basically says, oh, there’s several paragraphs in COBOL (let’s say methods in more object oriented terms). These several methods happen to stand next to one another in the code, let’s run them in sequence. And if some of them are exiting, then don’t mind it, just continue executing because we want it.
Vadim: And obviously, if you have a construct like this, it’s impossible to maintain. If you have badly formatted code with generated names, it’s impossible to maintain. COBOL people are expensive anyway. But since they’re expensive, they can also afford to be picky. If you hire an expensive expert in COBOL, and you show them this kind of bad code on the first day, they just quit and wish you good luck. It’s okay to have COBOL but the COBOL needs to be up to the standards of people who can maintain it.
Vadim: We basically do refactoring. In academic terms it’s a term rewriting system: we parse COBOL with our own DSL, we have some internal representation of the abstract syntax with some extra annotations. And we have, again, our own transformation network. And over the course of the years, we’ve developed by now something like 100… I think closer to 200 different refactorings, all of them are very small, very local. And since they’re local we can sometimes test them exhaustively, sometimes we can even verify that they are universally true.
Vadim: So we can prove them to be correct. And then instead of 47 levels of nested if-then-else statements, you have one EVALUATE (switch case) with one or two levels of depth. And instead of generated names, you have short names, instead of PERFORM THROUGH, you just have normal PERFORM, which is basically like a function call. Instead of GO TOs, you have the WHILE loops and the DO loops and whatever suits the code. And then you still have COBOL code. It’s still sort of expensive, but it’s at least feasible for you to find people to maintain it.
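To give a feel for what “small, local refactorings in a term rewriting system” can look like, here is a deliberately tiny Python sketch. It is not Raincode’s DSL or transformation network: the tuple-based tree, the single rule `merge_nested_if` and the driver `rewrite` are all invented for illustration.

```python
# Terms are plain tuples: ("if", condition, then_branch, else_branch_or_None),
# ("and", left, right), ("stmt", name). Conditions are just strings here.


def merge_nested_if(node):
    """Local rule: IF a THEN (IF b THEN s), with no ELSE on either level,
    becomes IF a AND b THEN s."""
    if isinstance(node, tuple) and node[0] == "if" and node[3] is None:
        inner = node[2]
        if isinstance(inner, tuple) and inner[0] == "if" and inner[3] is None:
            return ("if", ("and", node[1], inner[1]), inner[2], None)
    return node


def rewrite(node, rules):
    """Rewrite children first, then apply rules here until none of them fires."""
    if isinstance(node, tuple):
        node = tuple(rewrite(child, rules) for child in node)
    for rule in rules:
        result = rule(node)
        if result != node:
            return rewrite(result, rules)
    return node


nested = ("if", "a", ("if", "b", ("if", "c", ("stmt", "PAY"), None), None), None)
flat = rewrite(nested, [merge_nested_if])
print(flat)
```

Each rule only looks at one node and its immediate child, which is what makes it feasible to test, or even prove, a rule in isolation before applying it across a very large portfolio.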
Vadim: And then of course, if you run it with our compiler, you don’t pay anything for the compiler at all. And if you run it in .NET then you can develop new components in cheaper languages, let’s say something like C#. You take any student of any university that has some semblance of a diploma and they can write pretty decent C# code right away… oh, maybe not very decent, but they can write some C# code, and then the IDE helps you to refactor their code into something that’s more decent. We’ve been doing this PACBASE for many years. And it’s sort of a self-harm business. So the better we do with this, the worse our future with PACBASE migration is. We’re killing our own potential customer base. Not literally.
Federico: Interesting business strategy.
Vadim: Yes. So that’s why we are not focusing only on PACBASE. But it allows our marketing people to say that we are the best in the world in PACBASE migration, because we’ve helped the most companies to get rid of PACBASE. So this Spanish bank just heard that we were bragging that we’ve dealt with more than 200 million lines of code of PACBASE, but those were different projects accumulated over the years. And they reached out to us and said, “Well, how about having 250 millions in one system?” And just to give you an estimation, it was more than 100,000… I think something like 120,000 different programs in that. So it’s not just a lot of lines of code because COBOL is verbose. Yes, it’s verbose, but not more verbose than Java, let’s say.
Sergej: But is it also because it was generated by PACBASE?
Vadim: No. If you have 120,000 programs, then you have 120,000 programs. And that’s the complexity of your system. And that’s… I believe by now it has been even delivered already. But on our website it at least says that, okay, this is the name of the company and they expect great things from us.
Sergej: So this reminds me of another question that I wanted to ask. So I hear you talking about the complex compiler and all the difficulties and you said you test it, you have some form of verification, can you maybe touch more on how you actually are able to make sure that your compiler does what you want it to do and how to prove it to customers, so to speak? Because they probably care about it. So what do you tell them and how do they react?
Vadim: Yeah, they care about it. And this is a very good question. There’s actually two questions that are very different. Well, the question is the same, the answer is very different for the two parts. First, how we do it. And the second is how we prove to the customers that we’re good at it. So there have been very few occasions where I actually needed to explain, for instance, my testing infrastructure to the customers, but usually what they care about is whether we’ve made somebody’s code run before with the same approach and whether they are happy. They are looking for references. So if the next PACBASE customer comes for instance, they are not interested in my rewriting system and what is the term rewriting, what is a refactoring and what kind of proofs do I have. They are interested to say, “Okay, you say that Bankia was your customer, can I call them? Can I call them and ask them whether Raincode are good guys?” And they literally do that. And if our Spanish friends say that we’re good, then we’re good.
Sergej: How do you yourself kind of ensure that you can sleep well and that you don’t break things?
Vadim: Oh, yes, that’s very important. Yes, sleep is very important. I agree. For verification, it’s very rare that we can prove something but for instance, in PACBASE, I mentioned that sometimes there are many levels of nested IFs, and sometimes the conditions are repeating or they’re incomplete if-then-else statements with either then or else branches missing; it was easy for them to generate code like this. But this is just data flow that’s representable with first-order logic. By simply representing it in first-order logic and applying the laws of first-order logic, you can simplify it to a universally equivalent form, and that’s it. It’s proven right just by construction. If you use first-order logic, you cannot go wrong, unless you implement it badly but you try not to do that.
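For conditions that boil down to a handful of true/false tests, equivalence of the original and the simplified form can be checked by brute force over every combination of inputs. The Python sketch below covers only that simple propositional case and is not the proof machinery described here; `generated`, `simplified` and `equivalent` are illustrative names.

```python
from itertools import product


def generated(a, b):
    """Shape of machine-generated code: repeated conditions, empty branches."""
    actions = []
    if a:
        if b:
            actions.append("PAY")
        else:
            pass                      # empty ELSE branch
    else:
        if a:                         # repeated condition, never true here
            actions.append("REJECT")
    if not a:
        pass                          # IF with nothing in it
    return actions


def simplified(a, b):
    """What a human would have written."""
    actions = []
    if a and b:
        actions.append("PAY")
    return actions


def equivalent(f, g, arity):
    """Exhaustively compare f and g on every combination of boolean inputs."""
    return all(f(*values) == g(*values)
               for values in product([False, True], repeat=arity))


print(equivalent(generated, simplified, 2))  # True
```

With two conditions there are only four cases to check, which is why a small, local transformation can be tested exhaustively where a whole program cannot.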
Sergej: So is it just you try not to do that or how do you ensure that?
Vadim: We definitely try not to do that. After that comes the testing part. We’re very heavy on testing, especially in systems where we don’t have access to documentation, for instance, or where the documentation is huge. The documentation for IBM Assembler, for instance, it’s quite well written, but just for the instructions, it’s around 2,000 pages of very small font PDF. And so it’s just impossible to read it and look at the code and say, “Aha, that’s good.” We write tests and we write more tests. If we have a parser for a special syntax, we need to have a test that it recognises all the constructs and that it can parse them well, so recognition plus checking that all the things that need to end up in a tree end up in the tree in the right places; then there’re typing tests. If there is some normalisation like desugaring, then there are tests for that. Then there are tests for code generation. Most of the time when you read papers about compiler testing, they are about testing optimisations, or optimisers.
Vadim: We’re not that heavy on optimisations simply because it’s either not required or the performance is lost in different places, not in badly generated code. We’re not usually catering to computational scientists. We’re usually going for financial software or shipping software or booking software. It’s a lot of tests and the ultimate test of course, in parsing for instance, is: can you parse the entire portfolio of the customer that you’re working with right now. And when you just start to develop your compiler, then this is the number that is communicated in pretty much every email exchange with the customer. Say, “Well, today we’re at 20%. Today we’re at 90%. We’re at 99.5%. We’re 99.7%.” And once you reach 100%, you’re not allowed to go back.
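The “you’re not allowed to go back” rule is a ratchet, and it is simple to automate. The Python sketch below assumes the parse results are already available as a mapping from file name to success; the file names and the functions `parse_rate` and `ratchet` are invented for the example.

```python
def parse_rate(results):
    """results maps file name -> True if the file parsed; returns a percentage."""
    return 100.0 * sum(1 for ok in results.values() if ok) / len(results)


def ratchet(previous_best, results):
    """Return the new rate, refusing any drop below the best rate seen so far."""
    rate = parse_rate(results)
    if rate < previous_best:
        raise AssertionError(
            f"parse rate fell from {previous_best:.1f}% to {rate:.1f}%"
        )
    return rate


monday = {"PAY001.cbl": True, "PAY002.cbl": False,
          "INV001.cbl": True, "INV002.cbl": True}
best = ratchet(0.0, monday)                 # 75.0
tuesday = {**monday, "PAY002.cbl": True}    # the last failing file now parses
best = ratchet(best, tuesday)               # 100.0
```

From this point on, any change to the parser that breaks even one file makes `ratchet` fail, so a regression shows up before the next status email goes out.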
Vadim: And the same with runability. If you compile something, can you run it? If you compile something… in .NET, for instance, there is a Microsoft tool that does byte code verification: it looks at the byte code that you’ve generated and checks that the stack is balanced in all the branches and all this kind of stuff. So then, once you can compile 100%, yes, you’re not allowed to go back but then you need to reach 100% of verified code. Then you start running and then… it grows and eventually you have integration tests where you have things like… there is a front-end and something is clicked on the front-end, the data is synchronised to some value in the memory. That value in the memory gets propagated to the right parameter for the SQL query, the SQL query goes all the way to the server, it runs on the server, the server gives the result. The result is propagated back, it ends up again, in the right place in memory. That thing gets synchronised and it gets back to the screen and that gets verified.
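What “the stack is balanced in all the branches” means can be shown on a made-up stack machine. The Python sketch below is not the Microsoft tool and does not use real .NET byte code: the six-instruction set and `verify_stack` are assumptions chosen to keep the idea visible. It walks every path through the code and insists that each instruction is always reached with the same stack depth.

```python
# (values popped, values pushed) for each instruction of the toy machine
EFFECTS = {"PUSH": (0, 1), "POP": (1, 0), "ADD": (2, 1),
           "JZ": (1, 0), "JMP": (0, 0), "RET": (1, 0)}


def verify_stack(code):
    """Return the stack depth on entry to each reachable instruction,
    or raise ValueError if some path underflows or paths disagree."""
    depth_at = {0: 0}
    worklist = [0]
    while worklist:
        pc = worklist.pop()
        op, arg = code[pc]
        pops, pushes = EFFECTS[op]
        depth = depth_at[pc]
        if depth < pops:
            raise ValueError(f"stack underflow at {pc}")
        depth = depth - pops + pushes
        if op == "RET":
            if depth != 0:
                raise ValueError(f"leftover values on the stack at {pc}")
            continue
        if op == "JMP":
            successors = [arg]
        elif op == "JZ":
            successors = [arg, pc + 1]    # branch taken or fall through
        else:
            successors = [pc + 1]
        for nxt in successors:
            if nxt >= len(code):
                raise ValueError(f"control falls off the end after {pc}")
            if nxt not in depth_at:
                depth_at[nxt] = depth
                worklist.append(nxt)
            elif depth_at[nxt] != depth:
                raise ValueError(f"unbalanced stack at {nxt}")
    return depth_at


balanced = [("PUSH", 1), ("JZ", 4), ("PUSH", 10), ("JMP", 5),
            ("PUSH", 20), ("RET", None)]
depths = verify_stack(balanced)

# Same program, but one branch forgets to push its result.
unbalanced = balanced[:4] + [("JMP", 5), ("RET", None)]
```

Calling `verify_stack(unbalanced)` raises `ValueError`, because the final `RET` would be reached with one value on the stack along one branch and none along the other.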
Sergej: Are these tests something that you would write or something that the customer would write?
Vadim: It depends. For parsing tests, obviously, it’s me who’s writing them. For this “Quality First” project with mBank, we basically gave them the tools and before we knew it, they’d written something like 4000 to 5000 tests.
Sergej: They were very eager to get it.
Vadim: Oh, yeah. But that’s a common misconception, that developers don’t care about code quality, or that developers like to write bad code. Nobody wants to write bad code. Developers just make the best of the tools that you give them. If you give them a nice code analysis thing, and integrate it into SonarQube and everything, and they see this thing working and complaining about their code, they’re more than happy to get rid of the smells. They’re more than happy to write the tests if test writing is easy enough and it’s tool-facilitated.
Sergej: So do you use any kind of DSLs to write tests?
Vadim: Yes, usually. But it’s 2020, so it’s very easy to create a language. Whenever you want to do something structural and represent some information somewhere then whether you want it or not you end up creating a language.
Sergej: Just out of curiosity, what tools do you use to create these languages?
Vadim: We are very heavy on our own infrastructure. We have our own tools for everything. We have several parser generators that we use ourselves. We occasionally use ANTLR, but usually for one-off projects. For things that used to work 20 years ago, still work, and are supposed to work 20 years into the future, we prefer to use our own tools simply because they’re easier to tweak and easier to bug fix. And if anything goes wrong, then we have only ourselves to blame.
Federico: Can you tell us about SLEBoK and talk about this initiative of yours?
Vadim: Yeah, so SLEBoK is short for Software Language Engineering Body of Knowledge. Let’s step back first to say what software language engineering is, because I don’t know if everyone is familiar with the term. Since the 1950s or so we’ve had programming languages, and then at some point we had these fourth-generation languages. Then at some point we started having data description languages, architecture description languages, markup languages like XML or Markdown or anything. There’re a lot of these “something” languages. And what people started noticing 10, maybe 15 years ago, is that a lot of the technologies used to deal with these languages are the same, or could be the same. That means that if you have some technique that only works on a modelling language, maybe it works only on a modelling language simply because somebody happened to publish a paper that applied this technique to a modelling language. It doesn’t stop you from applying it to an architecture description language or a requirement description language or even a programming language.
Sergej: Can you maybe give an example of such a technique?
Vadim: Well, I don’t know, parsing? No, but… anything… transformation! So, there is model transformation and there is program transformation. If you write a paper that transforms source code, you have a choice of venues to send it to. If you write something that transforms models, you have a different set of venues, and these two lists do not always overlap. One of the overlapping conferences is the Software Language Engineering conference, which was founded 12 years ago by now, I think.
Vadim: And that was exactly for this… to get together people that do programming languages, that do DSLs and MDE, that do ModelWare, XMLware, that do COBOL, why not? People that do ontologies and taxonomies, and to get them in one room and… then as a grammar guy, I could say, “Oh, I’m doing grammar convergence: I take two grammars and I transform them until they converge to one point.” And then somebody from the modelling world could say, “Oh, isn’t it a little bit like model weaving?” And even if I’m not familiar with the term, once I have the term I can google it up, or DBLP it up, and get the information from it, and either apply my own technique in a different domain or get the wisdom from a different domain and apply it to my own thing.
Vadim: So the Software Language Engineering conference has been going on for a while. And recently we started thinking that it’s time to concoct a single source for all the knowledge about SLE. Right now there are different courses given with this exact name, Software Language Engineering, or some variation of it, in different countries, in different universities. Obviously, since people teach students about it, they know, or at least have an opinion about, what the main terms are, what the main concepts are, and what the theories in this field are.
Vadim: And SLEBoK is an ongoing initiative to collect all these things, to start structuring them, to start thinking about what are the things that we need to change or fix in the future, and so forth. If you want to know more, it’s slebok.github.io and we have some events. One of the events that’s coming up, by the way, is the Workshop on Open and Original Problems in Software Language Engineering, OOPSLE for short (not to be confused with OOPSLA, which is a different venue. Well, slightly different then…). It will be at STAF in Bergen on the 22nd of June, I think. So if you have any problem that’s either underresearched, or you want to know more about how other people research it, or you think it’s an open problem, and nobody cares about it, but you desperately want a solution, then go there and present your story.
Sergej: Thank you for the invitation. I think we are almost out of time. I have one last question. I’ve looked at your website and your biography, and basically what you’ve been involved with, it seems like you manage to do a lot of stuff in your time. So how do you do it? How are you so productive?
Vadim: I guess one of the things that can be said is to be enthusiastic about the things that you do. People think this is a joke, but there have actually been articles published in psychology research journals that measured it quantitatively and qualitatively and say that this is true. If people are more excited about something, then they are more productive. So it’s ill-advised curiosity and a lot of enthusiasm. Ill-advised because a lot of career advice just says choose one thing, do it well and follow that thing. And I’ve been going a bit more broad than normal people should.
Sergej: You’ve been doing exactly the opposite and been successful with it.
Vadim: Not exactly the opposite. I haven’t been publishing on stamp collecting, even though when I was a kid… Well anyway…
Federico: A topic for the next episode.
Sergej: Okay, so where can people learn more about your work?
Vadim: I’m Grammarware. If you google up Grammarware you’ll find grammarware.net or .com or .org, it’s the same website, and the most valuable thing that it has is the PDFs of all my papers. It also has things that I collect for fun, like links to all the events I’ve been to. But that’s just my personal hobby, let’s say, or self-indulgence. But the PDFs of the papers are there. There is grammarware.github.io, which has links to ongoing projects and such. There’s also a Twitter account called @grammarware where I tweet about random things from time to time.
Vadim: I’m especially active on Twitter when I’m attending conferences. But beware, I can be slightly opinionated, but it’s Twitter, you’re expected to be. So if you want to know more about me, then google up Grammarware; if you want to know more about Raincode, google up Raincode. There is raincode.com and raincodelabs.com, and these are two different websites that tell different stories about everything that I’ve talked about, and much, much more. There’re a lot of… literally just stories there that you can download in a fancy PDF form. They go into much more detail about what I’ve talked about.
Sergej: Thank you very much for joining us this morning.
Vadim: Thank you for having me. That was an entertaining journey through different topics. A little bit hectic jumping, but that’s fine.
Sergej: Thank you very much.
Federico: Thank you.
Vadim: Thank you! Have fun.
Sergej: Thank you for listening to this episode of Beyond Parsing. We are your hosts Federico and Sergej, two language engineering consultants.
Federico: If you want to learn more about domain-specific languages and language engineering, visit beyondparsing.com.