Teaching IDA Pro to understand the RISC-V P Extension

Our security analysis team has long been hunting for vulnerabilities in all sorts of systems — from biometric scanners to Mercedes-Benz head units. But every firmware researcher eventually hits that moment: encountering a brand-new (or little-known) microcontroller or a fresh CPU architecture with custom extensions.

Lately, these encounters have become increasingly frequent: over the past few years, the market has seen an influx of new chips coming out of China, each with its own extensions and core implementations. Not long ago, one such device landed on our desks for analysis: a RISC-V chip built on the RV32 instruction set with the P Extension (and not even its latest version), which adds short SIMD-style operations known as Packed-SIMD instructions.

Our experts had never seen this extension before, which is perfectly normal. Apparently, though, IDA Pro was meeting it for the first time as well. So we had to study an early draft of the P Extension (a.k.a. the Packed-SIMD Extension), implement support in IDA Pro for several of its instructions, and perform lifting, that is, translating those instructions into an intermediate representation the decompiler understands. That’s exactly the experience we’ll share in this article.

Before diving into the implementation, it’s worth understanding what we’re dealing with. Let’s start with the RISC-V architecture manual.

Instruction Formats in RISC-V

The RV32I base instruction set defines four core instruction formats: R, I, S, and U (plus the B and J variants, which differ only in how the immediate is encoded). You can find the full details in the RISC-V Instruction Set Manual, Volume I: Unprivileged ISA.

Sharp-eyed readers will notice a key feature: the opcode and the register fields (rd, rs1, rs2) sit at the same bit positions in every format that uses them. This is deliberate, since such a layout greatly simplifies decoder design. On top of that, the major opcodes are laid out as a regular table (the base opcode map), which makes table-driven decoding straightforward.
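
To make the fixed-position point concrete, here is a minimal sketch that runs outside IDA (the helper names `bits` and `fields` are ours, not from the manual). It slices opcode, rd, funct3, rs1 and rs2 out of a 32-bit word with the same shifts regardless of format; formats that lack a field simply reuse those bits for the immediate.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] (inclusive), matching the manual's notation."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def fields(word: int) -> dict:
    """Slice the fixed-position fields shared by the base formats.

    The slices are identical for every format; which ones are meaningful
    depends on the format (bits 24:20 are rs2 in R-type but part of the
    immediate in I-type).
    """
    return {
        "opcode": bits(word, 6, 0),
        "rd": bits(word, 11, 7),
        "funct3": bits(word, 14, 12),
        "rs1": bits(word, 19, 15),
        "rs2": bits(word, 24, 20),
    }


add = fields(0x00C58533)   # add  a0, a1, a2  (R-type)
addi = fields(0x00558513)  # addi a0, a1, 5   (I-type)
print(add["rd"], add["rs1"], add["rs2"])  # 10 11 12
print(addi["rd"], addi["rs1"])            # 10 11
```

Both words yield rd = 10 and rs1 = 11 from exactly the same slices, which is what lets a decoder extract registers before it even knows the format.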

A Look at the Decoder

The entry point for decoding is the opcode field, which occupies the lowest 7 bits of a 32-bit instruction (inst[6:0]). The decoder is table-driven, and its top level follows the base opcode map from the manual.

For 32-bit and wider instructions, the two least significant bits are always 11, and the remaining five bits encode an index into the table. Each table entry can point to another table, a terminal instruction decoder, or nothing (Reserved, as expected).
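
A small sketch of that indexing (the function name is hypothetical): inst[6:5] selects the row and inst[4:2] the column of the base opcode map. For example, OP (0x33) sits at row 1, column 4, while 0x7F lands in the last cell of the bottom row, which the map reserves for instructions longer than 32 bits.

```python
from typing import Tuple


def opcode_map_cell(word: int) -> Tuple[int, int]:
    """Return (row, col) of a 32-bit or wider instruction in the base opcode map.

    Row is inst[6:5] and column is inst[4:2]; inst[1:0] must be 0b11.
    """
    if word & 0b11 != 0b11:
        raise ValueError("inst[1:0] != 11: this is a 16-bit compressed instruction")
    row = (word >> 5) & 0b11
    col = (word >> 2) & 0b111
    return row, col


print(opcode_map_cell(0x33))  # OP     -> (1, 4)
print(opcode_map_cell(0x13))  # OP-IMM -> (0, 4)
print(opcode_map_cell(0x7F))  # (3, 7): the ">= 80-bit" cell
```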

P Extension

With the decoding basics covered, the next step is figuring out where P Extension instructions live and how to parse them. For this, we refer to the draft specification available in the riscv/riscv-p-spec repository on GitHub.

We needed one of the earliest public versions — 0.5. So, we stocked up on patience, ~~pu-erh tea~~, and rolled back the commit history.

Version 0.5 (and several subsequent versions) uses 0x7F (0b1111111) as the major opcode. This is the catch that gave us a fair bit of trouble: in the standard length encoding, a low 7-bit pattern of 1111111 marks an instruction that is 80 bits or longer. A decoder that follows the ISA manual does exactly what it was designed to do and nothing more: without special handling, it treats such an instruction as 80+ bits long, even though every instruction in this extension is 32 bits wide. In other words, the early draft we encountered conflicts with the instruction-length scheme of the official ISA specification.

You can read more about instruction length encoding in the relevant section of the manual, which summarizes the encoding patterns in a table.
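
The length-encoding rules can be written as a short function over the first 16-bit parcel. This sketch follows the scheme in Volume I (the function name is ours). It also shows why 0x7F is troublesome: bits 14:12, which hold funct3 in a 32-bit R-type instruction, double as the nnn field of the ≥80-bit encoding. A standard decoder would therefore read a P Extension instruction with funct3 = 001 as 96 bits long.

```python
from typing import Optional


def insn_length(parcel: int) -> Optional[int]:
    """Instruction length in bytes, given the lowest 16-bit parcel.

    Follows the RISC-V length-encoding scheme; returns None for the
    pattern reserved for instructions of 192 bits or more.
    """
    if parcel & 0b11 != 0b11:
        return 2                      # xxxxxxxxxxxxxxaa, aa != 11: 16-bit
    if parcel & 0b11100 != 0b11100:
        return 4                      # xxxxxxxxxxxbbb11, bbb != 111: 32-bit
    if parcel & 0b111111 == 0b011111:
        return 6                      # xxxxxxxxxx011111: 48-bit
    if parcel & 0b1111111 == 0b0111111:
        return 8                      # xxxxxxxxx0111111: 64-bit
    # Remaining case: the low 7 bits are 1111111 -> xnnnxxxxx1111111
    nnn = (parcel >> 12) & 0b111
    if nnn == 0b111:
        return None                   # reserved for >= 192-bit instructions
    return (80 + 16 * nnn) // 8       # (80 + 16*nnn)-bit


print(insn_length(0x0033))  # 4  (ordinary 32-bit OP)
print(insn_length(0x107F))  # 12 (opcode 0x7F with funct3 = 001, read as 96-bit)
```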

Side note: our journey into this extension began with undecodable instructions in IDA Pro listings. The 0x7F opcode just added fuel to the fire. Pinpointing the exact extension version became a reverse-engineering rabbit hole of its own. Intuition hinted at R-Type instructions — but that’s a story for another day…

Adding Instruction Support

Back to practice. To analyze the device’s firmware, we needed to add support for these instructions: umar64, maddr32, msubr32, mulsr64, and mulr64. Fortunately, all of them reside within the same decoding table, as outlined in the Instruction Encoding Table section of the P Extension draft.

Open your favorite editor, recall your Python skills, and let’s get to it.

Base and Abstractions

As mentioned earlier, a decoding table can contain either nested tables or terminal instruction decoders. Keeping this in mind, we implemented a few abstract and base classes:

from abc import ABCMeta
from typing import Callable, Optional

from ida_ua import insn_t


class ITableEntry(metaclass=ABCMeta):
    pass


class InstructionTable(ITableEntry):
    def __init__(self,
                 rows: int,
                 cols: int,
                 get_row: Callable[[insn_t], int],
                 get_col: Callable[[insn_t], int],
                 *entries):
        if rows * cols != len(entries):
            raise ValueError(f"Rows ({rows}) and cols ({cols}) don't match entries length ({len(entries)})")

        self._rows = rows
        self._cols = cols
        self._get_row = get_row
        self._get_col = get_col
        self._entries = entries

    def _index(self, row: int, col: int) -> int:
        return row * self._cols + col

    def lookup(self, insn: insn_t) -> Optional[ITableEntry]:
        row = self._get_row(insn)
        col = self._get_col(insn)
        idx = self._index(row, col)

        entry = self._entries[idx]
        return entry.lookup(insn) if isinstance(entry, InstructionTable) else entry


class ADecoder(ITableEntry):
    @staticmethod
    def _b2m(bits: int) -> int:
        return (1 << bits) - 1

    @classmethod
    def _bits(cls, value: int, hi: int, lo: int) -> int:
        return (value >> lo) & cls._b2m(hi - lo + 1)

    @classmethod
    def decode(cls, insn: insn_t) -> bool:
        ...

ITableEntry describes a decoding table entry;
InstructionTable is the table itself;
ADecoder is the base class for all decoders, including terminal instruction decoders.

InstructionTable takes rows and cols to define the table geometry; get_row and get_col extract the row and column indices from an instruction; and entries holds the table elements in row-major order, so the element for (row, col) sits at index row * cols + col.
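
Because InstructionTable only needs two index functions and a flat list, its mechanics are easy to try outside IDA. The following sketch mirrors the class on raw 32-bit integers instead of insn_t (the class `Table` and the names of the entries are ours). It rebuilds the three-level path from the Decoder Tables section, with mnemonic strings standing in for the terminal decoder classes.

```python
from typing import Callable, List, Optional, Union


def bits(value: int, hi: int, lo: int) -> int:
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


Entry = Union["Table", str, None]


class Table:
    """Row-major table; an entry is a nested Table, a mnemonic, or None."""

    def __init__(self, rows: int, cols: int,
                 get_row: Callable[[int], int], get_col: Callable[[int], int],
                 entries: List[Entry]):
        if rows * cols != len(entries):
            raise ValueError("geometry does not match the number of entries")
        self.cols = cols
        self.get_row, self.get_col = get_row, get_col
        self.entries = entries

    def lookup(self, word: int) -> Optional[str]:
        entry = self.entries[self.get_row(word) * self.cols + self.get_col(word)]
        # Recurse into nested tables; terminal entries are returned as-is
        return entry.lookup(word) if isinstance(entry, Table) else entry


# Leaf table: row = inst[31:28], col = inst[27:25] (together they form funct7)
leaf: List[Entry] = [None] * (16 * 8)
leaf[10 * 8 + 2] = "umar64"
leaf[12 * 8 + 2] = "maddr32"
leaf[12 * 8 + 3] = "msubr32"
leaf[14 * 8 + 0] = "mulsr64"
leaf[15 * 8 + 0] = "mulr64"
p_funct3_001 = Table(16, 8, lambda w: bits(w, 31, 28), lambda w: bits(w, 27, 25), leaf)

# 0x7F table: a single row indexed by funct3 = inst[14:12]
ge80b = Table(1, 8, lambda w: 0, lambda w: bits(w, 14, 12),
              [None, p_funct3_001] + [None] * 6)

# Base opcode map: row = inst[6:5], col = inst[4:2]; 0x7F is the last cell
rv = Table(4, 8, lambda w: bits(w, 6, 5), lambda w: bits(w, 4, 2),
           [None] * 31 + [ge80b])

print(rv.lookup(0xC4C5957F))  # maddr32
print(rv.lookup(0x00C58533))  # None (plain add, opcode 0x33)
```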

R-Type Base

Since our instructions are R-Type, we created a helper class for them:

from typing import ClassVar

from ida_bytes import get_bytes
from ida_ua import insn_t, o_reg


class RTypeDecoder(ADecoder, ITableEntry):
    _itype: ClassVar[int] = -1
    
    @classmethod
    def decode(cls, insn: insn_t) -> bool:
        opcode = get_bytes(insn.ea, 1)[0]

        # GE80B encoding is used for v0.5.x
        if opcode & 0x7F != 0x7F:
            print(f"Invalid opcode {opcode & 0x7F}")
            return False

        data = int.from_bytes(get_bytes(insn.ea, 4), byteorder="little")

        insn.size = 4
        insn.Op1.type = o_reg
        insn.Op1.reg = cls._bits(data, 11, 7)   # Rd
        insn.Op2.type = o_reg
        insn.Op2.reg = cls._bits(data, 19, 15)  # Rs1
        insn.Op3.type = o_reg
        insn.Op3.reg = cls._bits(data, 24, 20)  # Rs2

        insn.itype = cls._itype

        return True

The decode method is overridden with a routine common to all instructions:

Verify that the opcode matches;
Read the full instruction (32 bits / 4 bytes);
Extract the operands (three registers: Rd, Rs1, and Rs2) and assign the instruction type (itype).
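
The same three steps can be exercised without IDA. In this sketch (the names are ours), the instruction bytes come from a bytes object rather than get_bytes, and the result is a plain dict instead of an insn_t. The optional even-register rounding mimics what Mulr64 does further down for its 64-bit destination pair.

```python
from typing import Optional


def decode_rtype(buf: bytes, pair_rd: bool = False) -> Optional[dict]:
    """Decode a 32-bit R-type word stored little-endian at the start of buf."""
    if len(buf) < 4 or buf[0] & 0x7F != 0x7F:
        return None                                      # 1. opcode must be 0x7F
    word = int.from_bytes(buf[:4], byteorder="little")   # 2. read 4 bytes
    rd = (word >> 7) & 0x1F                              # 3. Rd  = inst[11:7]
    rs1 = (word >> 15) & 0x1F                            #    Rs1 = inst[19:15]
    rs2 = (word >> 20) & 0x1F                            #    Rs2 = inst[24:20]
    if pair_rd:
        rd &= ~1                                         # even register of a pair
    return {"size": 4, "rd": rd, "rs1": rs1, "rs2": rs2}


print(decode_rtype(bytes.fromhex("7f95c5c4")))  # {'size': 4, 'rd': 10, 'rs1': 11, 'rs2': 12}
```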

The Final Five

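The last step is assigning internal IDs for parsing. We create an enumeration of instruction IDs (five in total), starting at CUSTOM_INSN_ITYPE, the lowest itype value IDA allows for instructions added by plugins. We also create the terminal classes that implement decode. Only Mulr64 customizes decoding: its destination is a register pair, so it rounds Rd down to an even register number.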

from typing import ClassVar, Set

from ida_idp import CUSTOM_INSN_ITYPE


class PExtension:
    maddr32 = CUSTOM_INSN_ITYPE
    msubr32 = CUSTOM_INSN_ITYPE + 1
    mulr64  = CUSTOM_INSN_ITYPE + 2
    umar64  = CUSTOM_INSN_ITYPE + 3
    mulsr64 = CUSTOM_INSN_ITYPE + 4

    name_mapping = {
        maddr32: "maddr32",
        msubr32: "msubr32",
        mulr64:  "mulr64",
        umar64:  "umar64",
        mulsr64: "mulsr64",
    }
    
    @classmethod
    def values(cls) -> Set[int]:
        return set(cls.name_mapping)


class Maddr32(RTypeDecoder):
    _itype: ClassVar[int] = PExtension.maddr32


class Msubr32(RTypeDecoder):
    _itype: ClassVar[int] = PExtension.msubr32


class Mulr64(RTypeDecoder):
    _itype: ClassVar[int] = PExtension.mulr64
    
    @classmethod
    def decode(cls, insn: insn_t) -> bool:
        res = super().decode(insn)
        if not res:
            return False

        insn.Op1.reg = insn.Op1.reg & ~1  # Rd - pair of registers
        return True


class Umar64(RTypeDecoder):
    _itype: ClassVar[int] = PExtension.umar64


class Mulsr64(RTypeDecoder):
    _itype: ClassVar[int] = PExtension.mulsr64
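
Keeping IDs and names in two places invites drift. One possible alternative (a sketch, not the plugin’s code) is an IntEnum whose members double as itype values, with the name mapping derived from it. So that the snippet runs outside IDA, CUSTOM_INSN_ITYPE is copied in as 0x8000, which is what we believe the SDK’s idp.hpp defines. Inside IDA, you would import it from ida_idp instead.

```python
from enum import IntEnum

# Assumption: mirrors ida_idp.CUSTOM_INSN_ITYPE so the sketch runs outside IDA
CUSTOM_INSN_ITYPE = 0x8000


class PExt(IntEnum):
    maddr32 = CUSTOM_INSN_ITYPE
    msubr32 = CUSTOM_INSN_ITYPE + 1
    mulr64 = CUSTOM_INSN_ITYPE + 2
    umar64 = CUSTOM_INSN_ITYPE + 3
    mulsr64 = CUSTOM_INSN_ITYPE + 4


# Derived views, equivalent to PExtension.name_mapping and PExtension.values()
NAME_MAPPING = {member.value: member.name for member in PExt}
VALUES = frozenset(NAME_MAPPING)

print(NAME_MAPPING[PExt.mulr64])  # mulr64
print(0x8004 in VALUES)           # True
```

Because IntEnum members compare equal to plain ints, checks such as `insn.itype in VALUES` keep working unchanged.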

Decoder Tables

The final piece: three decoder tables.

from ida_bytes import get_bytes
from ida_ua import insn_t


def b2m(bits: int) -> int:
    return (1 << bits) - 1


def bits(value: int, hi: int, lo: int) -> int:
    return (value >> lo) & b2m(hi - lo + 1)


def insn_data(insn: insn_t, length: int) -> int:
    return int.from_bytes(get_bytes(insn.ea, length), byteorder="little")


p_funct3_001 = InstructionTable(
    16, 8,
    lambda insn: bits(insn_data(insn, 4), 31, 28),
    lambda insn: bits(insn_data(insn, 4), 27, 25),
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, Umar64, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, Maddr32, Msubr32, None, None, None, None,
    None, None, None, None, None, None, None, None,
    Mulsr64, None, None, None, None, None, None, None,
    Mulr64, None, None, None, None, None, None, None,
)


ge80b_table = InstructionTable(
    1, 8,
    lambda insn: 0,
    lambda insn: bits(insn_data(insn, 4), 14, 12),
    None, p_funct3_001, None, None, None, None, None, None,
)


rv_table = InstructionTable(
    4, 8,
    lambda insn: bits(insn_data(insn, 4), 6, 5),
    lambda insn: bits(insn_data(insn, 4), 4, 2),
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, None,
    None, None, None, None, None, None, None, ge80b_table,
)

rv_table — base RISC-V table;
ge80b_table — table for 0x7F opcode;
p_funct3_001 — P Extension table for funct3 = 001, leading to terminal decoders.
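
Reading the tables in reverse gives each instruction’s encoding. In p_funct3_001, the row is inst[31:28] and the column is inst[27:25], so funct7 = row << 3 | col. The helper below (hypothetical, not part of the plugin) uses this to assemble little-endian test words, for example to feed the decoder known inputs.

```python
# (row, col) slots in p_funct3_001, copied from the table above
SLOTS = {
    "umar64": (10, 2),
    "maddr32": (12, 2),
    "msubr32": (12, 3),
    "mulsr64": (14, 0),
    "mulr64": (15, 0),
}

OPCODE = 0x7F   # GE80B-style major opcode used by P v0.5
FUNCT3 = 0b001  # column of ge80b_table that leads to p_funct3_001


def encode(mnem: str, rd: int, rs1: int, rs2: int) -> bytes:
    """Assemble an R-type P instruction and return its 4 little-endian bytes."""
    row, col = SLOTS[mnem]
    funct7 = (row << 3) | col
    word = (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (FUNCT3 << 12) | (rd << 7) | OPCODE
    return word.to_bytes(4, byteorder="little")


print(encode("maddr32", 10, 11, 12).hex())  # 7f95c5c4  (maddr32 a0, a1, a2)
```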

Integrating with IDA Pro

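Finally, we wire everything into IDA Pro. We derive a class from IDP_Hooks and override the methods responsible for instruction analysis, i.e. decoding (ev_ana_insn), and for mnemonic and operand output (ev_out_mnem, ev_out_operand). As with any IDP_Hooks subclass, the overrides take effect only once an instance is installed with its hook() method.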

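from ida_bytes import get_bytes
from ida_idp import CUSTOM_INSN_ITYPE, IDP_Hooks, ph_get_regnames
from ida_lines import COLOR_INSN
from ida_ua import OOF_ADDR, insn_t, o_displ


class PExtensionIdpHook(IDP_Hooks):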
    def __init__(self):
        IDP_Hooks.__init__(self)
    
    @staticmethod
    def _b2m(bits: int) -> int:
        return (1 << bits) - 1
    
    @classmethod
    def _bits(cls, value: int, hi: int, lo: int) -> int:
        return (value >> lo) & cls._b2m(hi - lo + 1)
    
    def _decode(self, insn: insn_t) -> "bool":
        opcode = get_bytes(insn.ea, 1)[0]
        if opcode & 0x7F != 0x7F:
            return False

        entry: ADecoder = rv_table.lookup(insn)
        if entry is None:
            return False

        return entry.decode(insn)

    def ev_ana_insn(self, out: "insn_t *") -> "bool":
        return self._decode(out)
    
    def ev_out_mnem(self, outctx: "outctx_t *") -> "int":
        typ = outctx.insn.itype
        
        if typ >= CUSTOM_INSN_ITYPE and typ in PExtension.name_mapping:
            mnem = PExtension.name_mapping[typ]
            outctx.out_tagon(COLOR_INSN)
            outctx.out_line(mnem)
            outctx.out_tagoff(COLOR_INSN)
            
            width = max(1, 16 - len(mnem))
            outctx.out_line(" " * width)
            
            return 1
        
        return 0
    
    def ev_out_operand(self, outctx: "outctx_t *", op: "op_t const *") -> "bool":
        insn = outctx.insn
        
        if insn.itype in PExtension.values():
            if op.type == o_displ:
                outctx.out_value(op, OOF_ADDR)
                outctx.out_register(ph_get_regnames()[op.reg])
                return True
            
        return False

Comparison of IDA Pro listings before and after our extension:

Adding Decompiler Support

Instruction support alone isn’t enough: the decompiler output still left much to be desired. The decoded instructions showed up as inline assembler stubs, and their effects on registers weren’t taken into account.

The solution: a lifter. We encapsulated instruction behavior in a class derived from microcode_filter_t and implemented the lifting procedures using Hex-Rays microcode. You can find microcode details here and in the decompiler SDK. For plugin debugging, Lucid is highly recommended.

Lifting: The Good, the Bad, the Ugly

The core idea: describe each added instruction in terms of micro-operations and map its registers onto microcode registers. In other words, we need to tell the decompiler what kind of creature it is dealing with.

This requires consulting the P Extension draft and translating instruction semantics into a form the decompiler understands — the lifting process itself.
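
Before writing microcode, it helps to pin down the semantics as a plain reference model that lifter output can be compared against. The sketch below is our reading of the v0.5 draft for RV32, not text from it. In this model, maddr32 and msubr32 keep the low 32 bits; mulr64 and mulsr64 produce unsigned and signed 64-bit products; and umar64 accumulates into a 64-bit register pair. Pairs are assumed to be little-endian (low word in the even register, high word in the odd one), so verify these details against the draft before relying on them.

```python
from typing import List

MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def read_pair(regs: List[int], rd: int) -> int:
    """64-bit value of the pair (rd & ~1, rd | 1); low word in the even register."""
    even = rd & ~1
    return regs[even] | (regs[even + 1] << 32)


def write_pair(regs: List[int], rd: int, value: int) -> None:
    even = rd & ~1
    value &= MASK64                  # wrap to 64 bits (two's complement)
    regs[even] = value & MASK32
    regs[even + 1] = value >> 32


def execute(regs: List[int], mnem: str, rd: int, rs1: int, rs2: int) -> None:
    """Apply one instruction to regs (32 unsigned 32-bit values; x0 handling omitted)."""
    a, b = regs[rs1], regs[rs2]
    if mnem == "maddr32":      # Rd = Rd + Rs1 * Rs2 (low 32 bits)
        regs[rd] = (regs[rd] + a * b) & MASK32
    elif mnem == "msubr32":    # Rd = Rd - Rs1 * Rs2 (low 32 bits)
        regs[rd] = (regs[rd] - a * b) & MASK32
    elif mnem == "mulr64":     # pair = unsigned Rs1 * Rs2
        write_pair(regs, rd, a * b)
    elif mnem == "mulsr64":    # pair = signed Rs1 * Rs2
        write_pair(regs, rd, to_signed32(a) * to_signed32(b))
    elif mnem == "umar64":     # pair += unsigned Rs1 * Rs2
        write_pair(regs, rd, read_pair(regs, rd) + a * b)
    else:
        raise ValueError(f"unknown mnemonic: {mnem}")


regs = [0] * 32
regs[12], regs[13] = 0xFFFF_FFFF, 2
execute(regs, "mulr64", 10, 12, 13)
print(hex(regs[11]), hex(regs[10]))  # 0x1 0xfffffffe
```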

Here’s a snippet from the lifter class, showing arguably the nastiest implementation in the bunch — for maddr32 and msubr32. The full code with explanations is available on GitHub.

from ida_hexrays import (MERR_OK, install_microcode_filter, m_add, m_and,
                         m_mul, m_sub, microcode_filter_t, mop_t, reg2mreg)


class PExtensionLifter(microcode_filter_t):
    def __init__(self):
        super(PExtensionLifter, self).__init__()
        self._p_ext_handlers = {
            PExtension.maddr32: self._maddr32,
            PExtension.msubr32: self._msubr32,
            PExtension.mulr64:  self._mulr64,
            PExtension.umar64:  self._umar64,
            PExtension.mulsr64: self._mulsr64,
        }

        self._NO_MOP = mop_t()

    def install(self):
        install_microcode_filter(self, True)
        print(f"Installed P-Extension lifter... ({len(self._p_ext_handlers)} instruction(s) supported)")

    def remove(self):
        install_microcode_filter(self, False)
        print("Removed P-Extension lifter...")

    def match(self, cdg):
        return cdg.insn.itype in self._p_ext_handlers

    def apply(self, cdg):
        return self._p_ext_handlers[cdg.insn.itype](cdg, cdg.insn)

    def _mac_common(self, cdg, insn, op: int):  # op is m_add or m_sub
        rd = reg2mreg(insn.Op1.reg)
        rs1 = reg2mreg(insn.Op2.reg)
        rs2 = reg2mreg(insn.Op3.reg)

        # Temp register for multiplication result
        tmp64 = cdg.mba.alloc_kreg(8)  # 64 bits
        tmp64_mop = mop_t(tmp64, 8)

        # Temp register for masked multiplication result:
        # 8 bytes wide, but only the low 32 bits are meaningful after masking
        tmp32 = cdg.mba.alloc_kreg(8)
        tmp32_mop = mop_t(tmp32, 8)

        imm_mop = mop_t()
        imm_mop.make_number(0xFFFF_FFFF, 8)

        # Hex-Rays doesn't support operands of different sizes, so we extend everything to 8 bytes.
        # Hope upper parts of `rs1` and `rs2` are zeroed =D
        cdg.emit(m_mul, 8, rs1, rs2, tmp64, 0)
        # As before, we use 64 bits and mask the lower part of the product via 0x0000_0000_FFFF_FFFF
        cdg.emit(m_and, imm_mop, tmp64_mop, tmp32_mop)
        # For this step reduce the width to 32 bits, forced via the second arg `4`
        cdg.emit(op, 4, rd, tmp32, rd, 0)

        # Free the temp registers with the same sizes they were allocated with
        cdg.mba.free_kreg(tmp64, 8)
        cdg.mba.free_kreg(tmp32, 8)

        return MERR_OK

    def _maddr32(self, cdg, insn):
        return self._mac_common(cdg, insn, m_add)

    def _msubr32(self, cdg, insn):
        return self._mac_common(cdg, insn, m_sub)

Each micro-op is clearly labeled, and multiplication nuances in _mac_common (32-bit operands, 64-bit result) are explained in comments with respect to the lifter API.
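
A remark on the “hope upper parts of rs1 and rs2 are zeroed” comment. For maddr32 and msubr32, only the low 32 bits of the product survive, and modulo 2^32 the low 32 bits of a product depend only on the low 32 bits of its factors. The quick randomized check below (plain Python, names ours) illustrates that identity. It suggests that garbage in the upper halves cannot change the result of these two instructions, although the 64-bit-result instructions get no such guarantee.

```python
import random

MASK32 = 0xFFFF_FFFF
MASK64 = (1 << 64) - 1


def low32_unaffected(trials: int = 10_000, seed: int = 0) -> bool:
    """Check that garbage in the upper halves never changes the low 32 product bits."""
    rng = random.Random(seed)
    for _ in range(trials):
        lo_a, lo_b = rng.getrandbits(32), rng.getrandbits(32)
        # 64-bit operands whose upper halves hold arbitrary data
        a = (rng.getrandbits(32) << 32) | lo_a
        b = (rng.getrandbits(32) << 32) | lo_b
        wide = (a * b) & MASK64   # a 64-bit multiply, as with m_mul of width 8
        if wide & MASK32 != (lo_a * lo_b) & MASK32:
            return False
    return True


print(low32_unaffected())  # True
```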

Now we can enjoy the fruits of our labor: decompiled output without any assembler stubs.

Conclusion

If you can’t analyze firmware because your decompiler lacks support for certain instructions, don’t despair. You can identify the instructions, write a decoder, hook it into IDA Pro, and teach the decompiler their semantics with a lifter.

Don’t fear dense documentation or sparse API descriptions: often there’s a neat and consistent design behind them, grasped fully only through hands-on use. As for the people who designed them… perhaps all we can do is forgive them.
