How to Write an Emacs Major Mode for Syntax Coloring


This page shows you how to write an Emacs major mode that does syntax coloring for your own language.

Problem

You want to write a major mode for a new language, so that the keywords of the language will be highlighted.

Suppose your language source code looks like this:

Sin[x]^2 + Cos[y]^2 == 1
Pi^2/6 == Sum[1/x^2,{x,1,Infinity}]

You want the words “Sin”, “Cos”, “Sum”, colored as functions, and “Pi” and “Infinity” colored as constants.

Solution

Save the following in a file.

;; test. my-math-mode, my first major mode

(setq my-highlights
      '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
        ("Pi\\|Infinity" . font-lock-constant-face)))

(define-derived-mode my-math-mode fundamental-mode "math lang"
  "Major mode for coloring the keywords of my math language."
  (setq font-lock-defaults '(my-highlights)))

The string "Sin\\|Cos\\|Sum" is a regex. The font-lock-function-name-face is a predefined variable that holds the default face used for function names.

The define-derived-mode form defines your mode, named “my-math-mode”, based on fundamental-mode (which is the most basic mode).

The string "math lang" right after fundamental-mode is the name displayed on the status line (the mode line), so users know what mode they are in. The string on the next line is the mode's documentation.

The line (setq font-lock-defaults '(my-highlights)) sets up the syntax highlighting for your mode.

Now, select the above code and call eval-region to let Emacs know about it. Then, when you call “my-math-mode”, Emacs will syntax color the buffer's text. (You must have font-lock-mode on. If it is not, call font-lock-mode.) Here's what it looks like:

emacs my math major mode screen
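If you prefer to check the coloring without looking at the screen, the following is a minimal sketch that asks Emacs which face ended up on a given character. The names demo-math-highlights, demo-math-mode, and demo-math-face-at are made up for this illustration and are not part of the tutorial's code. It assumes Emacs 25 or later, where font-lock-ensure is available.

```elisp
;; Hypothetical names, for illustration only.
(defvar demo-math-highlights
  '(("Sin\\|Cos\\|Sum" . font-lock-function-name-face)
    ("Pi\\|Infinity" . font-lock-constant-face)))

(define-derived-mode demo-math-mode fundamental-mode "demo math"
  "Throwaway mode used to check keyword highlighting."
  (setq font-lock-defaults '(demo-math-highlights)))

(defun demo-math-face-at (text pos)
  "Return the face at position POS after fontifying TEXT in `demo-math-mode'."
  (with-temp-buffer
    (insert text)
    (demo-math-mode)
    ;; force fontification now, even though the buffer is never displayed
    (font-lock-ensure)
    (get-text-property pos 'face)))

;; Position 1 is the "S" of "Sin"; position 4 is the "[" after it.
(demo-math-face-at "Sin[x]^2 + Cos[y]^2 == 1" 1) ; → font-lock-function-name-face
(demo-math-face-at "Sin[x]^2 + Cos[y]^2 == 1" 4) ; → nil
```

A nil result means no highlighting rule touched that character.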

Here's another simple example: Emacs Lisp: html6-mode.

Writing a Mode for a Language that Has Hundreds of Keywords

Typically, a language has hundreds of keywords. Elisp has a function, regexp-opt, that generates a regex from a list of keywords.
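As a small sketch of the idea, here is regexp-opt applied to a short list. The variable names demo-keywords and demo-keywords-regexp and the four keywords are made up for illustration. The exact regex string returned can differ between Emacs versions, so the example only tests what it matches.

```elisp
;; A short, made-up keyword list, for illustration only.
(defvar demo-keywords '("if" "else" "while" "return"))

;; `regexp-opt' returns a single regexp that matches any string in the list.
;; The second argument 'words wraps the result in word boundaries.
(defvar demo-keywords-regexp (regexp-opt demo-keywords 'words))

(string-match-p demo-keywords-regexp "else")      ; → 0, match at the start
(string-match-p demo-keywords-regexp "elsewhere") ; → nil, "else" is only part of a word
```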

Suppose you are writing a mode for the Linden Scripting Language (LSL). LSL has about 553 keywords. First, here's a sample of LSL source code so you get some idea of how we want it colored.

// sample LSL file

// Examples of variable declaration and assignment:
integer score = 0;
string mySay = "i ♥ you";
vector v = <3,4,5>;
list myList= [2,4,7,3];

// Example of defining a function.
// built-in function's names start with “ll” (Linden Library).
integer sum(integer a, integer b)
{
    integer result = a + b;
    return result;
}

default
{
    state_entry()
    {
        llSay(0, mySay);
    }

    touch_start(integer total_number)
    {
        if (score == 1) {
            llSay(0, mySay);
        } else {
            llWhisper(0, "Ouch!");
        }
    }
}

We want each type of keyword to use a different color.

Here's the code.

;;; mylsl-mode.el --- sample major mode for editing LSL.

;; Copyright © 2015, by you

;; Author: your name ( your email )
;; Version: 2.0.13
;; Created: 26 Jun 2015
;; Keywords: languages
;; Homepage: http://ergoemacs.org/emacs/elisp_syntax_coloring.html

;; This file is not part of GNU Emacs.

;;; License:

;; You can redistribute this program and/or modify it under the terms of the GNU General Public License version 2.

;;; Commentary:

;; short description here

;; full doc on how to use here


;;; Code:

;; define several categories of keywords
(setq mylsl-keywords '("break" "default" "do" "else" "for" "if" "return" "state" "while") )
(setq mylsl-types '("float" "integer" "key" "list" "rotation" "string" "vector"))
(setq mylsl-constants '("ACTIVE" "AGENT" "ALL_SIDES" "ATTACH_BACK"))
(setq mylsl-events '("at_rot_target" "at_target" "attach"))
(setq mylsl-functions '("llAbs" "llAcos" "llAddToLandBanList" "llAddToLandPassList"))

;; generate regex string for each category of keywords
(setq mylsl-keywords-regexp (regexp-opt mylsl-keywords 'words))
(setq mylsl-type-regexp (regexp-opt mylsl-types 'words))
(setq mylsl-constant-regexp (regexp-opt mylsl-constants 'words))
(setq mylsl-event-regexp (regexp-opt mylsl-events 'words))
(setq mylsl-functions-regexp (regexp-opt mylsl-functions 'words))

;; create the list for font-lock.
;; each category of keyword is given a particular face
(setq mylsl-font-lock-keywords
      `(
        (,mylsl-type-regexp . font-lock-type-face)
        (,mylsl-constant-regexp . font-lock-constant-face)
        (,mylsl-event-regexp . font-lock-builtin-face)
        (,mylsl-functions-regexp . font-lock-function-name-face)
        (,mylsl-keywords-regexp . font-lock-keyword-face)
        ;; note: order above matters, because once colored, that part won't change.
        ;; in general, longer words first
        ))

;;;###autoload
(define-derived-mode mylsl-mode fundamental-mode
  "lsl mode"
  "Major mode for editing LSL (Linden Scripting Language)…"

  ;; code for syntax highlighting
  (setq font-lock-defaults '((mylsl-font-lock-keywords))))

;; clear memory. no longer needed
(setq mylsl-keywords nil)
(setq mylsl-types nil)
(setq mylsl-constants nil)
(setq mylsl-events nil)
(setq mylsl-functions nil)

;; clear memory. no longer needed, because the regex strings
;; were already copied into mylsl-font-lock-keywords
(setq mylsl-keywords-regexp nil)
(setq mylsl-type-regexp nil)
(setq mylsl-constant-regexp nil)
(setq mylsl-event-regexp nil)
(setq mylsl-functions-regexp nil)

;; add the mode to the `features' list
(provide 'mylsl-mode)

;; Local Variables:
;; coding: utf-8
;; End:

;;; mylsl-mode.el ends here

Note that font-lock applies the highlighting rules on a first-come, first-served basis. Once a piece of text gets its coloring, a later rule won't change it. So, the order of your list is important. In general, put longer keywords first. (This won't fix all cases where a keyword matches part of another keyword. If your language has a lot of such keywords, you need to use other forms of rules to solve this problem. See (info "(elisp) Search-based Fontification").)
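One case of a keyword matching part of another word shows up with the 'words option used in the code above: in Emacs's standard syntax table the underscore is not a word character, so the keyword “state” also matches inside “state_entry”. The sketch below compares 'words with the 'symbols option of regexp-opt (available in Emacs 24 and later). The helper name demo-keyword-match is made up for illustration, and the result assumes a syntax table where the underscore has symbol syntax, as it does in fundamental-mode.

```elisp
(defun demo-keyword-match (paren text)
  "Match the keyword \"state\" against TEXT.
PAREN is passed to `regexp-opt' as its second argument.
Hypothetical helper, for illustration only."
  ;; word and symbol boundaries depend on the syntax table in effect
  (with-syntax-table (standard-syntax-table)
    (string-match-p (regexp-opt '("state") paren) text)))

(demo-keyword-match 'words "state_entry")     ; → 0, word boundary falls before "_"
(demo-keyword-match 'symbols "state_entry")   ; → nil, "state_entry" is one symbol
(demo-keyword-match 'symbols "state default") ; → 0
```

Whether 'symbols is the better choice depends on which characters your language allows in names.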

The `( ,a ,b …) is lisp's backquote syntax. It works like a quoted list, except that elements preceded by a comma are evaluated and replaced by their values.
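Here is a small sketch of the difference between a plain quote and a backquote. The variable names and the regex value are made up for illustration.

```elisp
;; A made-up regexp value, for illustration only.
(defvar demo-type-regexp "\\<integer\\>")

;; Plain quote: nothing is evaluated. The first element is the
;; symbol demo-type-regexp itself, not the regexp string.
(defvar demo-quoted-rules
  '((demo-type-regexp . font-lock-type-face)))

;; Backquote: the element after the comma is replaced by its value.
(defvar demo-backquoted-rules
  `((,demo-type-regexp . font-lock-type-face)))

(car (car demo-quoted-rules))     ; → demo-type-regexp
(car (car demo-backquoted-rules)) ; → "\\<integer\\>"
```

Font-lock needs the regex string in that position, which is why the rule list is built with a backquote.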

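In the above, we based our mode on fundamental-mode. If your language's syntax is similar to an existing language's, you can base your mode on that language's mode instead (for example, c-mode for LSL). Basing on a similar language's mode will save you time in coding many features, such as handling comments and indentation.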

To understand the line:

(provide 'mylsl-mode)

See: What's Emacs Lisp feature?

See also: Emacs Lisp's Library System: What's require, load, load-file, autoload, feature?.

Now, to run the code, call eval-buffer. 〔➤ Emacs: How to Evaluate Emacs Lisp Code〕

Open the LSL language sample file given above, then call mylsl-mode. Here's the result:

sample mylsl-mode syntax highlighting result.

How to Name Your Mode

Emacs Lisp: How to Name Your Major Mode

Full Featured Language Mode

In this tutorial, we only covered syntax coloring of fixed strings.

Complex Syntax Coloring

For many languages, the text to be colored is not a fixed set of strings. For example, in XML, you have the <xyz>…</xyz> pattern, where the “xyz” can be anything.

emacs html-mode syntax coloring screenshot, 2013-07-31

Note the color and underline for the text inside the “h1” tag. Even though that text isn't any keyword of the HTML language, it needs to be syntax colored in a particular way.
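As a rough sketch of how a non-fixed pattern can be colored, a font-lock rule can name a regex group to highlight instead of the whole match. The example below colors only the tag name in <xyz> and </xyz>. The names demo-tag-highlights, demo-tag-mode, and demo-tag-face-at are made up for illustration, and this is far simpler than what html-mode actually does (it does not color the text between the tags). It assumes Emacs 25 or later for font-lock-ensure.

```elisp
;; Hypothetical names, for illustration only.
(defvar demo-tag-highlights
  ;; Rule form (REGEX (GROUP FACE)): color only regex group 1,
  ;; which is the tag name after "<" or "</".
  '(("</?\\([A-Za-z][A-Za-z0-9]*\\)" (1 font-lock-function-name-face))))

(define-derived-mode demo-tag-mode fundamental-mode "demo tag"
  "Throwaway mode that colors the name in <name> and </name>."
  (setq font-lock-defaults '(demo-tag-highlights)))

(defun demo-tag-face-at (text pos)
  "Return the face at position POS after fontifying TEXT in `demo-tag-mode'."
  (with-temp-buffer
    (insert text)
    (demo-tag-mode)
    (font-lock-ensure)
    (get-text-property pos 'face)))

;; In "<xyz>hi</xyz>", position 1 is "<", 2 is "x", 6 is "h".
(demo-tag-face-at "<xyz>hi</xyz>" 2) ; → font-lock-function-name-face
(demo-tag-face-at "<xyz>hi</xyz>" 1) ; → nil
```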

Features of a Major Mode

A full-featured language mode should also handle comments, indentation, keyword completion, function documentation lookup, function template insertion, graphical menus, support for Emacs's customize-group scheme, and any other features that may be useful for coding the language your mode is designed for.
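To give a taste of the first item, comment handling, here is a minimal sketch that teaches a mode about “//” line comments through its syntax table. All names starting with demo- are made up for illustration, and a real mode would need more than this (block comments, strings, and so on). It assumes Emacs 25 or later for font-lock-ensure.

```elisp
;; Hypothetical names, for illustration only.
;; A variable named <mode>-syntax-table is picked up by `define-derived-mode'.
(defvar demo-comment-mode-syntax-table
  (let ((table (make-syntax-table)))
    ;; "/" is punctuation. Flags 1 and 2 mark it as the first and
    ;; second character of the two-character comment starter "//".
    (modify-syntax-entry ?/ ". 12" table)
    ;; A newline ends the comment.
    (modify-syntax-entry ?\n ">" table)
    table))

(defvar demo-comment-highlights
  '(("\\<integer\\>" . font-lock-type-face)))

(define-derived-mode demo-comment-mode prog-mode "demo comment"
  "Throwaway mode that recognizes // comments."
  (setq-local comment-start "// ")
  (setq-local comment-end "")
  (setq font-lock-defaults '(demo-comment-highlights)))

(defun demo-comment-face-at (text pos)
  "Return the face at position POS after fontifying TEXT in `demo-comment-mode'."
  (with-temp-buffer
    (insert text)
    (demo-comment-mode)
    (font-lock-ensure)
    (get-text-property pos 'face)))

;; Position 4 is the "i" inside the comment; position 12 starts line 2.
(demo-comment-face-at "// integer\ninteger x;" 4)  ; → font-lock-comment-face
(demo-comment-face-at "// integer\ninteger x;" 12) ; → font-lock-type-face
```

Comments are colored before the keyword rules run, so the word “integer” inside the comment keeps the comment color.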

The following will help you implement other features for a major mode:

(info "(elisp) Major Mode Conventions")
