Metamorphism, Hacking and IT E-Book Dump Release

[ Pobierz całość w formacie PDF ]
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
Metamorphism, Formal Grammars and
Undecidable Code Mutation
Eric Filiol
that the set
D
i
of polymorphic viruses with an inﬁnite number
of forms is a
Abstract
—This paper presents a formalisation of the different
existing code mutation techniques (polymorphism and metamor-
phism) by means of formal grammars. While very few theoretical
results are known about the detection complexity of viral mutation
techniques, we exhaustively address this critical issue by considering
the Chomsky classiﬁcation of formal grammars. This enables us to
determine which family of code mutation techniques are likely to be
detected or on the contrary are bound to remain undetected. As an
illustration we then present, on a formal basis, a proof-of-concept
metamorphic mutation engine denoted
PB MOT
, whose detection has
been proven to be undecidable.
Keywords
—Polymorphism, Metamorphism, Formal Grammars,
Formal Languages, Language Decision, Code Mutation, Word Prob-
lem.
Σ
3
-complete set. Unfortunately, no results is
known for other classes of polymorphic viruses and for the
general case of metamorphism. Many open problems still
remain.
Up to now, only very few examples of metamorphic
codes are known to exist. The most sophisticated one is the
MetaPHOR
engine whose essential feature is a certain amount
of non-determinism. Experiments in our laboratory showed
that existing antivirus software can be very easily defeated
by
MetaPHOR
-like technology. However, the analysis of this
engine [9, Chap. 4] has proved that its metamorphic techniques
still belong to trivial classes.
Our research thus focused on the formalisation of metamor-
phism by means of formal grammar and languages. We aimed
at identifying the different possible classes of possible code
mutation techniques. The ﬁrst results, which are presented in
this paper, enable to assert that detection complexity of code
mutation techniques can be far higher that NP-complete and
that for some well-chosen classes, detection is an undecidable
problem.
The links between polymorphism and formal grammars
has been introduced in [16] for the ﬁrst time. Unfortunately,
the author did touch on this issue only. Metamorphism is
not addressed at all. Some aspects are dealt with in a very
naive way and we will prove in this paper that some of its
conclusions are totally wrong.
This paper is organised as follows. Section II presents
the main theoretical tools of computability theory we use
throughout this paper. Section III then explains how code
mutation techniques can be modelled by formal grammars and
how their detection can be reduced to the problem of deciding
a language. Section IV will then presents our proof-of-concept
metamorphic engine, denoted
POC PBMOT
we have designed
in order to validate our theoretical model. In particular, we
will show that detecting this engine is an undecidable problem.
Section V ﬁnally concludes and present some future work.
I. I
NTRODUCTION
Polymorphism and metamorphism are the two techniques
dedicated to hinder sequence-based antiviral detection. The
principle is to cancel as much as possible any potential ﬁxed
element in a malware code that would represent a potential
detection pattern. Polymorphism has formerly been introduced
by Fred Cohen [5] with the concept of
Largest Viral Set
while
metamorphism has appeared in the early 2000s as a response
to the polymorphism’s inherent limitations.
From a theoretical point of view [20], [21], the core of
a polymorphic malware is its kernel which is made up of an
infection trigger condition
1
I
(
d, p
)
, a payload routine
D
(
d, p
)
,
the corresponding payload trigger condition
T
(
d, p
)
and a
selection function
S
(of target programs to infect). It is
precisely the latter function which is in charge of the code
mutation.
Metamorphic viruses differ from polymorphic viruses since
their respective selection function are different. While all
polymorphic forms of a virus share the same kernel, the
metamorphic forms of a virus have a completely different
kernel. Consequently, if detection remains tractable for some
classes of polymorphism – the kernel does not change during
the mutation process and thus can be used as a detection
pattern –, it becomes far different as far as metamorphism
is concerned.
There are very few theoretical results concerning the de-
tection complexity of code mutation techniques. This problem
has been addressed very recently only. D. Spinellis has proved
[17] that detection of bounded-length polymorphic viruses is
an NP-complete problem. Zuo and Zhou [20] have then proved
(
p
)
II. F
ORMAL
G
RAMMARS AND
R
EWRITING
S
YSTEMS
Let us ﬁrst recall basic notation and concepts we will
use throughout this paper. We consider a ﬁnite set
Σ=
{
a
1
,a
2
,...,a
n
}
as an alphabet whose elements are called
symbols
. A sequence of symbols of
Σ
is called a chain
Also Associate Senior Professor at ESIEA - Laval
ﬁliol@esiea.fr
E. Filiol
is with the Lab. of Virology and Cryptology, ESAT, Rennes
b
1
b
2
b
3
...b
m
with
b
i
∈ Σ
and
m
≥ 0
. The concatenation of
(France)
Email: eric.ﬁliol@esat.terre.defense.gouv.fr
1
The running environment
(d, p)
is made up of
data
(
d
)and
programs
(
p
).
two chains
x
and
y
is the chain
xy
b
1
b
2
...b
m
c
1
c
2
...c
n
.
Let
A
and
B
two sets of chains deﬁned over
=
Σ
. Then we can
IJCS VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
70
© 2007 WASET.ORG
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
deﬁne the following sets:
A
formal language
is ﬁnally deﬁned by the set
L
(
G
)={
x
∈
Σ
∗
|
⇒
∗
x
.
It is nothing more than the “words” (or chains) generated
with respect to this grammar. From this point of view, natural
languages and programming languages are just instances of a
wider concept.
A huge classiﬁcation work of formal grammars has been
done by Noam Chomsky [3], [4]. This author has identiﬁed
four different classes.
•
Class 0 grammars (or
free grammars
). They are all
grammars whose productions may be freely written as
x
S
}
with respect to the grammar
G
=(
N, T, S, R
)
AB
=
{
xy
|
x
∈
A, y
∈
B
}
,
A
∗
=
{
x
1
x
2
...x
n
|
n
≥ 0
,x
1
,x
2
,...,x
n
∈
A
}
,
A
+
=
{
x
1
x
2
...x
n
|
n
≥ 1
,x
1
,x
2
,...,x
n
∈
A
}
.
This notation enables us to introduce the concept of
formal
grammar
.
Deﬁnition 1:
[14] A
formal grammar
G
is the 4-tuple
G
=
(
N, T, S, R
)
where:
N
is a set of non-terminal symbols;
•
T
is an alphabet of terminal symbols with
N
∩
T
= ∅
;
•
::=
y
where
y
is an arbitrary chain of symbols taken in
S
∈
N
is the start symbol;
•
T
. These grammars all generate languages that can be
decided by Turing machines. Equivalently, the languages
they generate are recursively enumerable (see [6, Chap.
2]);
•
Class 1 grammars (or
context-sensitive grammars
). The
unique constraint on the production lies in the fact that
the size of words cannot decrease. Consequently, the only
possible productions are in the form of
x
N
∪
R
is a rewriting system, that is to say a ﬁnite set of rules
R
•
T
∗
(we cannot rewrite chains which contain only terminal
symbols).
The rewriting system (still known as
semi-Thue system
) over
Σ
)
∗
×(
)
∗
, such that
⊆ (
T
∪
N
T
∪
N
(
u, v
) ∈
R
⇒
u
∈
Σ
∗
× Σ
∗
. In other words, it is
is in fact a ﬁnite subset of
the set
R
= {(
u
1
,v
1
)
,...,
(
u
n
,v
n
)}
.Apair
(
u, v
) ∈
R
is a
::=
y
as long
rewriting rule or
production
. It is denoted
u
::=
v
for short
. This class contains all natural languages.
•
Class 2 grammars
context-free grammars
.Theyareall
grammars whose productions are in the form of
X
as
|
y
|≥|
x
|
R
).
A rewriting system
R
enables to deﬁne a rewriting relation,
denoted
(instead of
(
u, v
) ∈
y
where
X
is a unique nonterminal symbol and where
y
is an element of
::=
⇒
R
which is deﬁned as:
) ∈ Σ
∗
× Σ
∗
.
)
∗
.Theterm
X
can be
rewritten independently from its context contrary to class
1 grammars. Subsets of class 2 grammars are commonly
used to implement programming languages.
•
Class 3 grammars (or
regular grammars
). They are all
grammars whose productions are in the form of
X
(
N
∪
T
rus
⇒
rvs
if and only if
(
u, v
) ∈
R
and
(
r, s
∈ Σ
∗
directly (e.g.
This means that we can build the chain
rvs
∈ Σ
∗
.
Example 1:
[12] Let us consider
in one step) from the chain
rus
Σ={
A, a, b, c
}
and
::=
x
R
= {(
A, aAa
)
,
(
A, bAb
)
,
(
A, c
)
,
(
A, aca
)}
.
N
2
and
x
T
∗
.
or
X
::=
xY
with
(
X, Y
) ∈
∈
A
⇒
R
aAa
III. P
OLYMORPHISM
,M
ETAMORPHISM AND
F
ORMAL
G
RAMMARS
In this section, we will ﬁrst formalise code mutation in term
of rewriting techniques. We will then be able to exhaustively
address the complexity of code mutation detection according
to the class of the grammar in use.
aAa
⇒
R
aaAaa
⇒
R
aacaa
This relation allows to deﬁne a reﬂexive and transitive closure
for the relation
aaAaa
⇒
R
⇒
. We will denote it
. This relation is
Σ
∗
by:
deﬁned, for every
r, g, h
in
⇒
R
1) if
g
⇒
R
h
then
g
h
2)
g
⇒
R
g
3) if
g
⇒
R
h
.
Equivalently, two words are related with respect to this re-
lation, if and only if one can be produced from the another.
As an example, in the previous example, we can replace the
symbol
⇒
R
r
and
r
⇒
R
h
then
g
A. Code mutation and Formal Languages
Let us consider the set of x86 instructions as our working
alphabet. These instructions may be combined according to
(rewriting) rules that completely deﬁne every compiler. This
set of rules can be deﬁned as a class 2 formal grammar indeed.
The assembly language is then the language which is generated
by this grammar.
Implementing a polymorphic engine – in particular the
garbage generator which is its most important part – consists in
generating a formal language, denoted
polymorphic language
,
with its own grammar.
Let us consider a polymorphic engine generated by the
grammar (the example is derived from [16]).
⇒
R
.
A Thue system is a semi-Thue system in which the relation
⇒
R
⇒
R
by
⇔
R
is symmetric. It is consequently denoted
.Letus
consider the following example.
Example 2:
Let us consider the grammar
G
=(
N, T, S, R
)
with
T
=Σ={0
,
1}∪{
}
,
N
= {
S, X, Y
}
and
R
deﬁned
by:
S
::=
1
S
|0
S
|
X
X
::=
0
Y
Y
::=
1
Y
|0
Y
|
Z
G
= {{
A, B
}
,
{
a, b, c, d, x, y
}
,S,R
}
.
Z
::=
Instructions
a, b, c
and
d
represent garbage code while instruc-
tions
x
and
y
are the decryptor’s instructions (e.g.
x
This grammar builds every chain containing at least one zero.
=
XOR
IJCS VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
71
© 2007 WASET.ORG
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
[EDI], AL
and
y
=
INC EDI
). The rewriting system
R
result we got a far higher number of evolving capabilities
but also a far higher computing complexity. Moreover,
NFAs allow transitions on the empty strings.
We can now formalise the action of any antivirus with respect
to code mutation detection. Without loss of generality, we will
consider NFAs only (in fact it is possible to reduce a NFA to
a DFA, up to an exponential increase of the number of states
[14]).
Deﬁnition 3:
[12] We say that a chain
x
can be deﬁned as:
S
::=
aS
|
bS
|
cS
|
xA
A
::=
aA
|
bA
|
cA
|
dA
|
yB
B
::=
aB
|
bB
|
cB
|
dB
|
Consequently, our polymorphic language is made up of every
word in the form of
=
x
1
x
2
...x
n
with
x
i
∈ Σ
is decided by an automaton
2
A
=(
Q,
Σ
,τ,q
0
,F
)
if
there exist a state sequence
q
1
,q
2
,...,q
n+1
of
Q
and a symbol
sequence
x
1
,x
2
,...,x
n
of
}
∗
x
}
∗
y
}
∗
.
{
a, b, c, d
{
a, b, c, d
{
a, b, c, d
Σ ∪{
}
such that
q
i+1
∈
τ
(
q
i
,x
i
)
for every
i
)
the set of any chain detected by
A
. It is the language decided
by
A
.Inotherwords,
A
decides whether
L
∈{1
,
2
,...,n
}
with
q
0
=
q
1
. Then we note
L
(
A
Every of these words corresponds to a mutated variant of the
initial decryptor. It is thus “easy” (e.g for an antivirus software)
to determine that the word
abcddxd
is not in this language
with respect to
G
, contrary to the word
adcbxaddbydab
.
Other examples of less trivial polymorphic grammars are
presented in [9, Chap. 6].
All this being deﬁned, the essential issue for any antivirus
is to have an algorithm which is able to determine whether a
“word” (a mutated form) belongs to a polymorphic language
or not. But as soon as we consider sophisticated polymorphic
techniques, this ability is very difﬁcult to evaluate. That is
precisely the interest of modelling code mutation with formal
grammars. According to the grammar class, we can get a more
accurate insight of the detection complexity.
or not.
Consequently, the automaton
A
is a solution of the language
decision problem with respect to the grammar
G
.
This deﬁnition describes the fact that as soon as an antivirus
software embeds an automaton
A
which is able to solve
the (polymorphic) decision problem with respect to a given
polymorphic grammar, then it is able to detect any of its
“polymorphic words” (e.g mutated forms). Two critical issues
are then to be considered:
•
the relevant complexity of the automaton,
•
every time the polymorphic grammar is changing (e.g. the
polymorphic engine is a new one), the antivirus software
must be upgraded with a new automaton which decides
the new polymorphic language.
The last points underlines the essential interest of metamor-
phic techniques compared to polymorphic ones. That is the
reason why antivirus software are bound to be defeated by
metamorphism. Indeed every new metamorphic mutation aims
at producing a new grammar and a new word generated
by the latter at the same time. Consequently, we deﬁne
metamorphism as a grammar whose words are themselves a set
of productions with respect to a grammar. This may be sound
as a naive assumption about antivirus software. Unfortunately,
it is not. It has been proved in [7], [8] that these software still
heavily relies on ﬁrst- or second generation scanning tech-
niques contrary to what is claimed by the antivirus vendors.
Let us now consider our formal deﬁnition of metamorphism.
Deﬁnition 4:
(Metamorphic Virus) Let
G
1
=(
(
A
)=
L
(
G
)
B. Antiviral Detection and Language Decision
In order to set up our model, let us consider the following
deﬁnition.
Deﬁnition 2:
[12] Let
G
=(
N, T, S, R
)
be a grammar and
T
∗
a chain with respect to
G
.The
language decision
problem
with respect to
G
consists in determining whether
x
x
∈
or not. The
language completeness problem
is that
which aims at deciding whether
L
∈
L
(
G
)
T
∗
or not.
The ﬁrst problem models the detection problem of polymor-
phism (once the relevant grammar is known). The second one
models the concepts of non detection and false positive.
In order to address the detection problem, let us just recall
the existing algorithmic tools we have at our disposal’s. They
will thus enable us to give complexity results with the different
instances of this problem. A detailed description of these tools
can be found in any computability handbook (e.g [12]). The
generic tool is a ﬁnite automaton. Two different kinds of
automata are to be considered.
•
Deterministic Finite Automata
(DFA). It is the most
simple form of automaton. It may be represented by a
directed graph, whose nodes are the possible states of
the automaton and the arcs joining nodes are labelled by
symbols of the alphabet. The symbols in fact determin-
istically deﬁne the transition condition from one state to
another state.
•
Non deterministic Finite Automaton
(NFA). It is a gen-
eralised automaton. The essential difference with a DFA
comes from the fact that more than one arc with the same
label can start from a node. This means that from one
given state, a condition can result in different effects. As a
(
G
)=
N, T, S, R
)
be grammars where
T
is a set
of formal grammars,
S
is the (starting) grammar
G
1
and
R
a rewriting system with respect
N
,T
,S
,R
)
and
G
2
=(
(
N
∪
T
)
∗
. A metamorphic
virus is thus described by
G
2
and every of its mutated form
is a word in
L
(
L
(
G
2
))
.
The notation
L
(
L
(
G
2
))
which is more practical, stands for
L
where
x
is a grammar. This
deﬁnition describes the fact that from one metamorphic form
to another, the virus kernel is changing: the virus is mutating
and change the mutation rules at the same time. Section IV
will present a proof-of-concept of this formalisation. Deﬁni-
tion 4 in fact somehow relates to
two-level grammars
(or Van
(
x
)
for some
x
∈
L
(
G
2
)
2
The sets
Q
,
and
F
are the set of the automaton’s possible states, a
ﬁnite alphabet and a subset of states that can accepted by the automatom
respectively. The state
q
0
denotes the initial state whereas
τ
is the transition
function
τ : Q × Σ → Q
which maps a state and a symbol to a state.
Σ
IJCS VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
72
© 2007 WASET.ORG
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
Wijngaarden grammars; 2VW grammars for short) [19], [10]
in which grammar
G
2
can be compared to some extent to the
2VW metagrammar. It is an open problem, at the moment,
to determine whether our construction of Deﬁnition 4 is a
particular case of 2VW grammars or not.
As a ﬁrst consequence, managing metamorphism implies to
have suitable automata at our disposal’s in order to solve the
language decision problem with respect to
G
2
-like grammars.
We now can give complexity results for this problem,
according to the existing grammar classes:
Proposition 1:
The language decision problem:
•
is undecidable for class 0 grammars;
•
has NP-complexity for class 1 and class2 grammars;
•
has polynomial complexity for class 3 grammars.
Proof:
This proof has been established by summarizing
results given in [11], [14]. As far as class 0 grammars are
concerned, we show that they generate recursively enumer-
able languages (their productions simulate Turing machines).
Consequently, deciding whether
x
We will consider this in Section IV.
To end with formal grammars, it is worth giving the
following results, whose proof will be found in [12,
§
10.4].
Theorem 2:
Let
G
i
=(
N
i
,T
i
,S
i
,R
i
)
with
i
=1
,
2
two
context-free grammars. Deciding whether
L
(
G
1
) ∩
L
(
G
2
)=
∅
or not is an undecidable problem. Determining whether
T
i
L
or not is undecidable as well.
These two results, in the special context of context-free
grammars illustrates the concepts of false positive (grammar
G
1
is not a viral one while grammar indeed is a viral one)
and of non detection, as far as antiviral detection is concerned
[9, Chap. 2 and 4].
(
G
i
)=
IV. U
NDECIDABLE
M
UTATION
T
ECHNIQUES
A. The Word Problem
The
word problem
has been introduced and formalised by
E. Post in 1950 [15]. Aside the Turing’s Halting problem, it
is one of the most famous problem which is known to be
undecidable. This problem consists in deciding whether two
ﬁnite words
r
and
s
over an alphabet
∈
L
(
G
)
or not for given
G
and
x
reduces the Halting problem.
For class 2, the language decision problem can be solved
with non deterministic pushdown automata while class 1
grammars, it is solved with linear-bounded non deterministic
Turing machines. Lastly, class 3 grammars are solved by
deterministic ones. Hence the results.
Proposition 1 stresses on the fact that the choice of underlying
grammar is essential when designing a polymorphic engine.
It has a direct impact on its resistance against its potential
detection. Quite all known polymorphic engines refer to class
3 grammars. That is the reason why they can be successfully
detected. But contrary to claims in [16] about systematic
detection capabilities, intractability (classes 2 and 1) or even
worse impossibility (class 0) rule out the practical detection
when considering the cases of code mutation engines with
respect to higher classes of grammars. From a practical point
of view, no antivirus can monopolize computer resources
for minutes or even hours to solve some rather computable
instances of a NP problem. Nowadays, our saving grace is
that malware writers still seem to neglect or ignore what are
the “good” grammars (or the “worst” ones from the defense
point of view). This statement of course only holds for already
detected or detectable malware.
There exist a huge number of theoretical results in the
ﬁeld of formal languages [2] that can be used to build code
mutation techniques that are untractable or even impossible to
detect. From a general point of view, the approach consists in
considering the undecidability status for some known problem.
In this respect, we may consider the Rice’s Theorem. Let us
consider a trivial property
P
about a language; in other words,
there exists at least one recursively enumerable language (class
0)
L
for which
P
holds and at least one recursively enumerable
language
L
for which
P
does not.
Theorem 1:
(Rice’s Theorem [12]) For any non trivial prop-
erty
P
with respect of languages, the problem of determining
whether
P
holds for a language
L
∈ Σ
are equivalent or
not, up to a rewriting system
R
. In other words, it consists in
deciding whether
r
Σ
⇒
R
s
or not.
Theorem 3:
[15] The word problem with respect to a semi-
Thue system is undecidable.
The proof consists in reducing the word problem to the Halting
problem which is itself undecidable.
Example 3:
(Tzeitzin semi-Thue Systems) Let the rewriting
system
R
deﬁned over the alphabet
Σ={
a, b, c, d, e
}
.
(
ac, ca
)
,
commutation
(
ad, da
)
,
(
bc, cb
)
,
(
bd, db
)
,
(
eca, ce
)
,
(
edb, de
)
,
(
cca, ccae
)
deletion/insertion
This semi-Thue system is called
Tzeitsin system
[18]. It is the
smallest semi-Thue system which is known to be undecidable.
We will denote it
T
0
. As a consequence, any semi-Thue system
which contains
T
0
is itself undecidable. There exist many other
undecidable Thue systems. We will use in the rest of the paper
the Tzeitsin system
T
1
deﬁned by:
(
ac, ca
)
,
(
ad, da
)
,
(
bc, cb
)
,
(
bd, db
)
,
(
eca, ce
)
,
(
edb, de
)
,
(
M
)
of a Turing machine
(
cdca, cdcae
)
,
M
, in undecidable.
This theorem is essential since it clearly indicates in which
context to look at in order to systematically defeat antivirus.
(
caaa, aaa
)
,
(
daaa, aaa
)
IJCS VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
73
© 2007 WASET.ORG
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
B. Code Mutation Based on The Word Problem: the
PBMOT
engine
The central principle is to use formal grammars whose
rewriting system is a Thue system which itself contains a
Tzeitsin system or any other system which is known to be
undecidable. From a general point of view, this implies that
the code mutation engine based on such grammars will be
undecidable as well. The core approach to design such an
engine is to practically implement the concept of Deﬁnition 4.
In other words, the engine’s rewriting rules (namely the
mutation rules) will change from mutation to mutation. For
that purpose, two main constraints are to be satisﬁed:
•
The deep nature of the chosen rules as well as their more
or less sophisticated level will have a direct impact on the
detectability of the engine which embeds them.
As far as our
POC PBMOT
is concerned, the
MetaPHOR
engine code has been considered as a starting point. The design
steps are hereafter presented.
1) the different modules of the
MetaPHOR
engine have
been analysed in depth[9, Chap. 4]. The set
T
has then
been built accordingly: it corresponds, up to some minor
differences, to the different possible instructions;
2) in a second step, our analysis aimed at indentifying
the main functionalities involved in the code mutation.
A proto-rewriting (embryonic) system
R
0
has been
ﬁrst designed, in such way that it includes the
T
0
system. The system
R
0
is a framework whose essen-
tial role is to perform code mutation itself. In other
words, the rewriting takes the pairs
the rewriting system of
G
2
contains an undecidable Thue
system;
, during the
i
-
th mutation step, contains an undecidable Thue system
as well.
From an implementation point of view, the central approach
consists in coding the rewriting system of
L
i
(
every word (hence a grammar) in
L
i
(
G
2
)
•
in words
u
1
v
1
u
2
v
2
...u
n−1
v
n−1
u
n
v
n
into consideration. At this
early stage, the set
N
is not deﬁned yet;
3) the analysis was then concerned with modifying the
functions involved in the code transformation at a meta-
level (the central point in metamorphism). This part has
enabled to choose the set
N
of non terminal symbols in
a ﬁrst step. These symbols are essential since they are
directly implied in the structure of the words in the form
of
u
1
v
1
u
2
v
2
...u
n−1
v
n−1
u
n
v
n
(
u
i
,v
i
)
G
2
)
grammars
)
∗
where the sets
T
and
N
as words on the alphabet
(
N
∪
T
are those of grammars
G
1
.Inotherwordstheset
R
of rules
(with respect to grammar
G
1
):
R
= {(
u
1
,v
1
)
,
(
u
2
,v
2
)
,...,
(
u
n
,v
n
)}
is coded as the following word:
at a macro-level. In a
second step, the proto-system
R
0
has been modiﬁed and
tuned up in order to achieve the ﬁnal rewriting system
R
which includes the
T
1
system.
The critical point lies in the mutation of a word in the form
of
u
1
v
1
u
2
v
2
...u
n−1
v
n−1
u
n
v
n
,
(1)
made of terminal and non terminal symbols.
The other essential point is to design a grammar
G
2
which is
able to manipulate such “grammar-words”. The set
T
contains
words build on
)
∗
(Equation 1). As for the set
N
,it
contains symbols which are speciﬁc to the grammar
G
2
but it
can also contains symbols in
N
. The starting element is a word
in
(
N
∪
T
u
1
v
1
u
2
v
2
...u
n−1
v
n−1
u
n
v
n
into a system
)
∗
. We just have to deal with the rewriting system
R
on words of
N
∪
(
T
)
∗
with the constraint that
R
⊃
(
N
∪
T
T
0
or
R
= {(
u
1
,v
1
)
,
(
u
2
,v
2
)
,...,
(
u
n
,v
n
)}
.
R
⊃
T
1
.
If the general principle ruling the design of grammar
G
2
is
simple to grasp, on the other hand its practical construction is
technically far more complex. We will not present here due
to lack of space – it would require tens of pages – but also
not to give ready-to-use techniques that could be misused. We
will give the two core principles of this practical construction
only:
Indeed, the successive rewriting steps may induce variations of
size in subwords
u
i
and
v
i
. It is thus necessary to record and
update all these variations. In the same or equivalent way as
the
MetaPHOR
engine does, the rewriting management must
take the conditional or not jumps into account.
We now can state the following result.
Proposition 2:
The detection of
POC PBMOT
-based meta-
morphic codes is an undecidable problem.
the ﬁnal code must be envisaged in functional terms and
not in terms of code (assembly instructions). This point is
critical since it is not the form of the different instructions
but their interactions which is the most important aspect.
If the rewriting rules are not trivial ones, the code muta-
tion, in terms of opcodes, will work in a straightforward
and “natural” way. From a technical point of view, this
means that the rewriting rules have to deeply modify in
words
u
1
v
1
u
2
v
2
...u
n−1
v
n−1
u
n
v
n
, both the order of the
u
i
v
i
and the pairs
•
Sketch of Proof.
Every mutated form
ν
i
is a word in the form of
u
1
v
1
u
2
v
2
...u
i
n−1
v
i
n−1
u
i
n
v
i
n
)
L
(
L
i
(
G
2
)) =
L
(
.
Detecting such a code consists in deciding whether two words
u
1
v
1
u
2
v
2
...u
i
n−1
v
i
n−1
u
i
n
v
i
n
ν
i
=
(
u
i
,v
i
)
themselves at the same level;
the whole code must be organised in terms of procedures
(or blocks of codes) even if the coding itself is not
structured in this way.
•
and
u
1
v
1
u
2
v
2
...u
n−1
v
n−1
u
n
v
n
ν
j
=
,
IJCS VOLUME 2 NUMBER 1 2007 ISSN 1306-4428
74
© 2007 WASET.ORG
[ Pobierz całość w formacie PDF ]

zanotowane.pl

doc.pisz.pl

pdf.pisz.pl

sylwina.xlx.pl

Metamorphism

Metamorphism, Hacking and IT E-Book Dump Release

Tematy