
Field Parsing (1)

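In ClassFileParser::parseClassFile(), after the constant pool, the superclass, and the interfaces have been parsed, parse_fields() is called to parse the field information. The call looks like this: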

u2 java_fields_count = 0;
// Fields (offsets are filled in later)
FieldAllocationCount fac;
Array<u2>* fields = parse_fields(class_name,
                                     access_flags.is_interface(),
                                     &fac, &java_fields_count,
                                     CHECK_(nullHandle));

Before parse_fields() is called, a variable fac of type FieldAllocationCount is defined. The class is defined as follows:

Source: classFileParser.cpp

class FieldAllocationCount: public ResourceObj {
 public:
  u2 count[MAX_FIELD_ALLOCATION_TYPE];

  FieldAllocationCount() {
    for (int i = 0; i < MAX_FIELD_ALLOCATION_TYPE; i++) { // MAX_FIELD_ALLOCATION_TYPE is 10
      count[i] = 0;
    }
  }

  FieldAllocationType update(bool is_static, BasicType type) {
    FieldAllocationType atype = basic_type_to_atype(is_static, type);
    // Make sure there is no overflow with injected fields.
    assert(count[atype] < 0xFFFF, "More than 65535 fields");
    count[atype]++;
    return atype;
  }
};

The count array records how many fields fall into each category. The categories are the values of the FieldAllocationType enum, which is defined as follows:

enum FieldAllocationType {
  STATIC_OOP,                // 0 Oops
  STATIC_BYTE,               // 1 Boolean, Byte, char
  STATIC_SHORT,              // 2 shorts
  STATIC_WORD,               // 3 ints
  STATIC_DOUBLE,             // 4 aligned long or double

  NONSTATIC_OOP,             // 5
  NONSTATIC_BYTE,            // 6
  NONSTATIC_SHORT,           // 7
  NONSTATIC_WORD,            // 8
  NONSTATIC_DOUBLE,          // 9

  MAX_FIELD_ALLOCATION_TYPE, // 10
  BAD_ALLOCATION_TYPE = -1
};
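To make the role of the count array concrete, here is a minimal, self-contained sketch. It is not HotSpot code: `atype_for_descriptor()`, `FieldCounts`, and `raw_nonstatic_bytes()` are hypothetical names introduced for illustration, the classification follows the mapping table shown later in this article, and the byte sizes (1, 2, 4, 8, plus a caller-supplied reference size) are assumptions. The size estimate ignores alignment, padding, and the object header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified copy of the enum above, repeated so the sketch compiles on its own.
enum FieldAllocationType {
  STATIC_OOP, STATIC_BYTE, STATIC_SHORT, STATIC_WORD, STATIC_DOUBLE,
  NONSTATIC_OOP, NONSTATIC_BYTE, NONSTATIC_SHORT, NONSTATIC_WORD, NONSTATIC_DOUBLE,
  MAX_FIELD_ALLOCATION_TYPE,
  BAD_ALLOCATION_TYPE = -1
};

// Hypothetical helper: classify a field by the first character of its descriptor.
inline FieldAllocationType atype_for_descriptor(bool is_static, char first_char) {
  int base;
  switch (first_char) {
    case 'L': case '[': base = STATIC_OOP;    break;  // objects and arrays
    case 'Z': case 'B': base = STATIC_BYTE;   break;  // boolean, byte
    case 'C': case 'S': base = STATIC_SHORT;  break;  // char, short
    case 'I': case 'F': base = STATIC_WORD;   break;  // int, float
    case 'J': case 'D': base = STATIC_DOUBLE; break;  // long, double
    default: return BAD_ALLOCATION_TYPE;
  }
  // The non-static categories sit exactly NONSTATIC_OOP entries after the static ones.
  return static_cast<FieldAllocationType>(base + (is_static ? 0 : NONSTATIC_OOP));
}

// Hypothetical stand-in for FieldAllocationCount.
struct FieldCounts {
  uint16_t count[MAX_FIELD_ALLOCATION_TYPE] = {};

  FieldAllocationType update(bool is_static, char first_char) {
    FieldAllocationType atype = atype_for_descriptor(is_static, first_char);
    assert(atype != BAD_ALLOCATION_TYPE && "not a field descriptor");
    assert(count[atype] < 0xFFFF && "more than 65535 fields");
    count[atype]++;
    return atype;
  }
};

// Unpadded byte total for the instance fields. oop_size is a parameter because
// the width of a reference depends on the VM configuration (assumption).
inline std::size_t raw_nonstatic_bytes(const FieldCounts& fc, std::size_t oop_size) {
  return fc.count[NONSTATIC_OOP]    * oop_size
       + fc.count[NONSTATIC_BYTE]   * 1u
       + fc.count[NONSTATIC_SHORT]  * 2u
       + fc.count[NONSTATIC_WORD]   * 4u
       + fc.count[NONSTATIC_DOUBLE] * 8u;
}
```

Calling `update(false, 'I')` once per parsed field and then `raw_nonstatic_bytes(fc, 8)` gives a lower bound on the instance data size; the real layout code has to add alignment on top of this.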

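The array counts the static and the non-static fields in each of these five categories, so that when memory is allocated later the required size can be computed from the counts. The categories are:

  • Oop: reference types (objects and arrays)
  • Byte: one-byte types (boolean and byte)
  • Short: two-byte types (char and short)
  • Word: four-byte types (int and float)
  • Double: eight-byte types (long and double)

The update() method increments the count for the corresponding category. The BasicType enum that it takes as a parameter is defined as follows:
```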

// Source: utilities/globalDefinitions.hpp
enum BasicType {
  T_BOOLEAN     =  4,
  T_CHAR        =  5,
  T_FLOAT       =  6,
  T_DOUBLE      =  7,
  T_BYTE        =  8,
  T_SHORT       =  9,
  T_INT         = 10,
  T_LONG        = 11,
  T_OBJECT      = 12,
  T_ARRAY       = 13,
  T_VOID        = 14,
  T_ADDRESS     = 15, // the returnAddress type used by the ret instruction
  T_NARROWOOP   = 16,
  T_METADATA    = 17,
  T_NARROWKLASS = 18,
  T_CONFLICT    = 19, // for stack value type with conflicting contents
  T_ILLEGAL     = 99
};

basic_type_to_atype() converts a BasicType value into the corresponding FieldAllocationType value, as follows:

static FieldAllocationType _basic_type_to_atype[2 * (T_CONFLICT + 1)] = {
BAD_ALLOCATION_TYPE, // 0
BAD_ALLOCATION_TYPE, // 1
BAD_ALLOCATION_TYPE, // 2
BAD_ALLOCATION_TYPE, // 3
///////////////////////////////////////////////////////////
NONSTATIC_BYTE , // T_BOOLEAN = 4,
NONSTATIC_SHORT, // T_CHAR = 5,
NONSTATIC_WORD, // T_FLOAT = 6,
NONSTATIC_DOUBLE, // T_DOUBLE = 7,
NONSTATIC_BYTE, // T_BYTE = 8,
NONSTATIC_SHORT, // T_SHORT = 9,
NONSTATIC_WORD, // T_INT = 10,
NONSTATIC_DOUBLE, // T_LONG = 11,
NONSTATIC_OOP, // T_OBJECT = 12,
NONSTATIC_OOP, // T_ARRAY = 13,
///////////////////////////////////////////////////////////
BAD_ALLOCATION_TYPE, // T_VOID = 14,
BAD_ALLOCATION_TYPE, // T_ADDRESS = 15,
BAD_ALLOCATION_TYPE, // T_NARROWOOP = 16,
BAD_ALLOCATION_TYPE, // T_METADATA = 17,
BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
BAD_ALLOCATION_TYPE, // T_CONFLICT = 19,

BAD_ALLOCATION_TYPE, // 0
BAD_ALLOCATION_TYPE, // 1
BAD_ALLOCATION_TYPE, // 2
BAD_ALLOCATION_TYPE, // 3
///////////////////////////////////////////////////////////
STATIC_BYTE , // T_BOOLEAN = 4,
STATIC_SHORT, // T_CHAR = 5,
STATIC_WORD, // T_FLOAT = 6,
STATIC_DOUBLE, // T_DOUBLE = 7,
STATIC_BYTE, // T_BYTE = 8,
STATIC_SHORT, // T_SHORT = 9,
STATIC_WORD, // T_INT = 10,
STATIC_DOUBLE, // T_LONG = 11,
STATIC_OOP, // T_OBJECT = 12,
STATIC_OOP, // T_ARRAY = 13,
///////////////////////////////////////////////////////////
BAD_ALLOCATION_TYPE, // T_VOID = 14,
BAD_ALLOCATION_TYPE, // T_ADDRESS = 15,
BAD_ALLOCATION_TYPE, // T_NARROWOOP = 16,
BAD_ALLOCATION_TYPE, // T_METADATA = 17,
BAD_ALLOCATION_TYPE, // T_NARROWKLASS = 18,
BAD_ALLOCATION_TYPE, // T_CONFLICT = 19,
};

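static FieldAllocationType basic_type_to_atype(bool is_static, BasicType type) {
  assert(type >= T_BOOLEAN && type < T_VOID, "only allowable values");
  FieldAllocationType result = _basic_type_to_atype[type + (is_static ? (T_CONFLICT + 1) : 0)];
  assert(result != BAD_ALLOCATION_TYPE, "bad type");
  return result;
}


The implementation of basic_type_to_atype() is straightforward. The table has two halves of T_CONFLICT + 1 (that is, 20) entries each, the first for non-static fields and the second for static fields, so a static field simply adds T_CONFLICT + 1 to the index.  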

### 1. Allocating memory for the fields
Memory for the field data is allocated in ClassFileParser::parse_fields() with the following call:

u2* fa = NEW_RESOURCE_ARRAY_IN_THREAD(
           THREAD, u2, total_fields * (FieldInfo::field_slots + 1));

The NEW_RESOURCE_ARRAY_IN_THREAD macro is defined as follows:

#define NEW_RESOURCE_ARRAY_IN_THREAD(thread, type, size)\
  (type*) resource_allocate_bytes(thread, (size) * sizeof(type))

After macro expansion, the call is equivalent to:

u2* fa = (u2*) resource_allocate_bytes(THREAD, (total_fields * (FieldInfo::field_slots + 1)) * sizeof(u2));

Here field_slots is an enum constant defined in the FieldInfo class, with the value 6. The call reserves total_fields * (FieldInfo::field_slots + 1) slots of sizeof(u2) bytes each, because the data is stored according to the following layout:

f1: [access, name index, sig index, initial value index, low_offset, high_offset]
f2: [access, name index, sig index, initial value index, low_offset, high_offset]
...
fn: [access, name index, sig index, initial value index, low_offset, high_offset]
[generic signature index]
[generic signature index]
...

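In other words, each of the n fields occupies six u2 slots, and each field may additionally have a generic signature index. Because it is not yet known which fields have one, a buffer large enough for the worst case is allocated as temporary storage. Later an array of the exact size is allocated and the data is copied over, so no space is wasted on fields that have no generic signature index.

Fields are stored in the class file in the following format: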

field_info {
    u2             access_flags;
    u2             name_index;
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

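The access_flags, name_index, and descriptor_index items correspond to access, name index, and sig index in each fn entry. The initial value index holds the index of the constant value (if the field is a constant). low_offset and high_offset are covered in detail later and are not discussed here.

The resource_allocate_bytes() function that is called is shown below: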

extern char* resource_allocate_bytes(Thread* thread, size_t size, AllocFailType alloc_failmode) {
  return thread->resource_area()->allocate_bytes(size, alloc_failmode);
}

char* allocate_bytes(size_t size, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
  return (char*)Amalloc(size, alloc_failmode);
}

void* Amalloc(size_t x, AllocFailType alloc_failmode = AllocFailStrategy::EXIT_OOM) {
  // Check that ARENA_AMALLOC_ALIGNMENT is a power of 2
  assert(is_power_of_2(ARENA_AMALLOC_ALIGNMENT) , "should be a power of 2");
  // After macro expansion:
  // ((((size_t)(x)) + (((size_t)((2*BytesPerWord))) - 1)) & (~((size_t)(((size_t)((2*BytesPerWord))) - 1))))
  x = ARENA_ALIGN(x);

  if (!check_for_overflow(x, "Arena::Amalloc", alloc_failmode))
    return NULL;

  if (_hwm + x > _max) {
    return grow(x, alloc_failmode);
  } else {
    char *old = _hwm;
    _hwm += x;
    return old;
  }
}

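The memory is ultimately allocated in a ResourceArea. Each thread has a _resource_area attribute, and the Amalloc() function called here is very similar to the Amalloc_4() function covered earlier in the discussion of releasing Handles, so it is not described further.

The _resource_area attribute is defined as follows: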

// Thread local resource area for temporary allocation within the VM
ResourceArea* _resource_area;

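This attribute is initialized when the Thread object is created. The constructor contains the following call:

set_resource_area(new (mtThread)ResourceArea()); // initialize the _resource_area attribute

ResourceArea inherits from the Arena class. Memory allocated from a ResourceArea can be released through a ResourceMark, similar to HandleArea and HandleMark.  
### 2. Reading the fields
Next, look at how ClassFileParser::parse_fields() reads the fields: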

// The generic signature slots start after all other fields' data.
int generic_signature_slot = total_fields * FieldInfo::field_slots;
int num_generic_signature = 0;
for (int n = 0; n < length; n++) {
cfs->guarantee_more(8, CHECK_NULL); // access_flags, name_index, descriptor_index, attributes_count
// Read the field's access flags
AccessFlags access_flags;
jint flags = cfs->get_u2_fast() & JVM_RECOGNIZED_FIELD_MODIFIERS;
access_flags.set_flags(flags);
// Read the field's name index
u2 name_index = cfs->get_u2_fast();
int cp_size = _cp->length(); // number of entries in the constant pool

Symbol*  name = _cp->symbol_at(name_index);
// Read the descriptor index
u2 signature_index = cfs->get_u2_fast();
Symbol*  sig = _cp->symbol_at(signature_index);

u2     constantvalue_index = 0;
bool   is_synthetic = false;
u2     generic_signature_index = 0;
bool   is_static = access_flags.is_static();
FieldAnnotationCollector parsed_annotations(_loader_data);
// Read the field's attributes
u2 attributes_count = cfs->get_u2_fast();
if (attributes_count > 0) {
  parse_field_attributes(attributes_count, is_static, signature_index,
                         &constantvalue_index, &is_synthetic,
                         &generic_signature_index, &parsed_annotations,
                         CHECK_NULL);
  if (parsed_annotations.field_annotations() != NULL) {
    if (_fields_annotations == NULL) {
      _fields_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                         _loader_data, length, NULL,
                                         CHECK_NULL);
    }
    _fields_annotations->at_put(n, parsed_annotations.field_annotations());
    parsed_annotations.set_field_annotations(NULL);
  }
  if (parsed_annotations.field_type_annotations() != NULL) {
    if (_fields_type_annotations == NULL) {
      _fields_type_annotations = MetadataFactory::new_array<AnnotationArray*>(
                                              _loader_data, length, NULL,
                                              CHECK_NULL);
    }
    _fields_type_annotations->at_put(n, parsed_annotations.field_type_annotations());
    parsed_annotations.set_field_type_annotations(NULL);
  }

  if (is_synthetic) {
    access_flags.set_is_synthetic();
  }
  if (generic_signature_index != 0) {
    access_flags.set_field_has_generic_signature();
    fa[generic_signature_slot] = generic_signature_index;
    generic_signature_slot ++;
    num_generic_signature ++;
  }
} // finished reading the field's attributes

FieldInfo* field = FieldInfo::from_field_array(fa, n);
field->initialize(access_flags.as_short(),
                  name_index,
                  signature_index,
                  constantvalue_index);
BasicType type = _cp->basic_type_for_signature_at(signature_index);

// Remember how many oops we encountered and compute allocation type
FieldAllocationType atype = fac->update(is_static, type);
field->set_allocation_type(atype);

// After field is initialized with type, we can augment it with aux info
if (parsed_annotations.has_any_annotations())
   parsed_annotations.apply_to(field);

} // end of the for loop

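Each field's values are read according to the format and stored into fa. FieldInfo::from_field_array() is implemented as follows:

static FieldInfo* from_field_array(u2* fields, int index) {
  return ((FieldInfo*)(fields + index * field_slots));
}

It locates the six u2 slots that belong to the field at position index and casts that address to FieldInfo*, so the six attributes can be conveniently read and written through the FieldInfo class. FieldInfo is defined as follows: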

// This class represents the field information contained in the fields
// array of an InstanceKlass. Currently it's laid on top an array of
// Java shorts but in the future it could simply be used as a real
// array type. FieldInfo generally shouldn't be used directly.
// Fields should be queried either through InstanceKlass or through
// the various FieldStreams.
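class FieldInfo VALUE_OBJ_CLASS_SPEC {
  u2 _shorts[field_slots];
  ...
}

This class has no virtual functions, and the elements of _shorts are of type u2, that is, 16 bits wide, so its memory layout is identical to the field layout described earlier. The methods defined in the class can therefore operate on the _shorts array directly.

field->initialize() stores the attribute values that were read for the field. It is implemented as follows: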

void initialize(u2 access_flags,
                u2 name_index,
                u2 signature_index,
                u2 initval_index) {
  _shorts[access_flags_offset]    = access_flags;
  _shorts[name_index_offset]      = name_index;
  _shorts[signature_index_offset] = signature_index;
  _shorts[initval_index_offset]   = initval_index;

  _shorts[low_packed_offset]  = 0;
  _shorts[high_packed_offset] = 0;
}

_cp->basic_type_for_signature_at() reads the type from the field's signature. It is implemented as follows:

BasicType ConstantPool::basic_type_for_signature_at(int which) {
  return FieldType::basic_type(symbol_at(which));
}

Symbol* symbol_at(int which) {
  assert(tag_at(which).is_utf8(), "Corrupted constant pool");
  return *symbol_at_addr(which);
}

BasicType FieldType::basic_type(Symbol* signature) {
  return char2type(signature->byte_at(0));
}

// Convert a char from a classfile signature to a BasicType
inline BasicType char2type(char c) {
  switch( c ) {
  case 'B': return T_BYTE;
  case 'C': return T_CHAR;
  case 'D': return T_DOUBLE;
  case 'F': return T_FLOAT;
  case 'I': return T_INT;
  case 'J': return T_LONG;
  case 'S': return T_SHORT;
  case 'Z': return T_BOOLEAN;
  case 'V': return T_VOID;
  case 'L': return T_OBJECT;
  case '[': return T_ARRAY;
  }
  return T_ILLEGAL;
}

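The symbol_at() function defined in ConstantPool fetches the Symbol that represents the signature string at index which in the constant pool, and the first character of the signature is enough to determine the field's type. Once the type is known, fac->update() is called to update the count for the corresponding category, as described earlier in this article.

The next step copies the data held temporarily in fa into a new array: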

// Now copy the fields' data from the temporary resource array.
// Sometimes injected fields already exist in the Java source so
// the fields array could be too long. In that case the
// fields array is trimed. Also unused slots that were reserved
// for generic signature indexes are discarded.
Array<u2>* fields = MetadataFactory::new_array<u2>(
        _loader_data, index * FieldInfo::field_slots + num_generic_signature,
        CHECK_NULL);
_fields = fields; // save in case of error
{
  int i = 0;
  for (; i < index * FieldInfo::field_slots; i++) {
    fields->at_put(i, fa[i]);
  }
  for (int j = total_fields * FieldInfo::field_slots;
       j < generic_signature_slot; j++) {
    fields->at_put(i++, fa[j]);
  }
  assert(i == fields->length(), "");
}
```
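When the fields array is created, the size of the u2 array becomes index * FieldInfo::field_slots + num_generic_signature, where index is the actual number of fields (injected fields may also be present) and num_generic_signature is the number of generic signature slots that were actually used. The data is then copied from fa into fields. The logic is simple and is not described further.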
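The two-stage scheme described in this article (a worst-case temporary buffer, followed by a copy into an exactly sized array) can be sketched in isolation. This is a simplified illustration, not the HotSpot implementation: `ParsedField`, `pack_fields()`, and `kFieldSlots` are hypothetical names, `std::vector` stands in for the resource array and `Array<u2>`, and the sketch assumes no injected fields are dropped, so `index` equals `total_fields`. The real code also sets a flag in the access flags of each field that owns a generic signature slot, which is omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kFieldSlots = 6;  // mirrors FieldInfo::field_slots

// Hypothetical input record: what has been read for one field.
struct ParsedField {
  uint16_t access;
  uint16_t name_index;
  uint16_t sig_index;
  uint16_t initval_index;
  uint16_t generic_signature_index;  // 0 means the field has none
};

// Returns the compact array: six slots per field, followed by the generic
// signature indexes of only those fields that have one.
inline std::vector<uint16_t> pack_fields(const std::vector<ParsedField>& parsed) {
  const int total_fields = static_cast<int>(parsed.size());

  // Stage 1: worst-case buffer with one spare slot per field.
  std::vector<uint16_t> fa(static_cast<std::size_t>(total_fields) * (kFieldSlots + 1), 0);
  int generic_signature_slot = total_fields * kFieldSlots;  // spare slots start here
  int num_generic_signature = 0;

  for (int n = 0; n < total_fields; n++) {
    uint16_t* f = &fa[n * kFieldSlots];  // same arithmetic as from_field_array()
    f[0] = parsed[n].access;
    f[1] = parsed[n].name_index;
    f[2] = parsed[n].sig_index;
    f[3] = parsed[n].initval_index;
    f[4] = 0;  // low packed offset, filled in later
    f[5] = 0;  // high packed offset, filled in later
    if (parsed[n].generic_signature_index != 0) {
      fa[generic_signature_slot++] = parsed[n].generic_signature_index;
      num_generic_signature++;
    }
  }

  // Stage 2: copy into an array of exactly the required size.
  const int index = total_fields;  // assumption: no fields were trimmed
  std::vector<uint16_t> fields;
  fields.reserve(static_cast<std::size_t>(index * kFieldSlots + num_generic_signature));
  for (int i = 0; i < index * kFieldSlots; i++) {
    fields.push_back(fa[i]);
  }
  for (int j = total_fields * kFieldSlots; j < generic_signature_slot; j++) {
    fields.push_back(fa[j]);
  }
  assert(fields.size() ==
         static_cast<std::size_t>(index * kFieldSlots + num_generic_signature));
  return fields;
}
```

For three fields of which only the second has a generic signature, the temporary buffer holds 21 slots while the packed result holds 19: 18 fixed slots plus one generic signature index at the end.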
