排查问题技巧:
MySQL Cluster 自带了一个错误代码的查看的小程序。通过这个小东西我们可以方便的定位问题的原因。 这个程序就是 perror 在MYSQL安装目录的bin下面。
如报错:ERROR 1005 (HY000) at line 474868: Can't create table 'Table Name'(errno: 136)
你可以用perror命令查询 错误原因:
#/usr/local/mysql/bin/perror --ndb 136
MySQL error code 136: No more room in index file
通过它的解释 我们把 index的相关选项改高一些。如下: (依实际情况决定数值大小)
MaxNoOfTables: 1024
MaxNoOfOrderedIndexes: 1024
MaxNoOfUniqueHashIndexes: 1024
这样问题就能解决了。
问题1:在线添加节点后使用 ALTER ONLINE TABLE TABLENAME REORGANIZE PARTITIO N命令 将表数据在数据节点上重新分配时报错:
mysql> alter online table Billinfo reorganize partition;
ERROR 1297 (HY000): Got temporary error 410 'REDO log files overloaded (decrease TimeBetweenLocalCheckpoints or increase
NoOfFragmentLogFiles)' from NDBCLUSTER
解决方法:
调整管理节点上的config.ini配置文件里[ndbd default]下的 TimeBetweenLocalCheckpoints 值和 FragmentLogFileSize 值
TimeBetweenLocalCheckpoints 参数:
Default value = 20
Changed it to TimeBetweenLocalCheckpoints =6
Setting TimeBetweenLocalCheckpoints to 6 or less means that local checkpoints will be executed continuously without pause, independent of the cluster's workload.
FragmentLogFileSize 参数:
Default value = 16M
Changed it to FragmentLogFileSize=256M
Setting this parameter allows you to control directly the size of the redo log files. This can be useful in situations when MySQL Cluster is operating under a high load and it is unable to close fragment log files quickly enough before attempting to open new ones. Increasing the size of the fragment log files gives the cluster more time before having to open each new fragment log file.
问题2:插入数据时,报错ERROR 1297 (HY000): Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER,数据节点强制关闭。
解决方法: 加大:MaxNoOfConcurrentTransactions 和 MaxNoOfConcurrentOperations 参数
问题3:ERROR 1297 (HY000) at line 1872: Got temporary error 410 'REDO log files overloaded, consult online manual (decrease TimeBetweenLocalCheckpoints, and|or incre' from NDBCLUSTER
Solution : Modify / Add parameter TimeBetweenLocalCheckpoints in your config.ini.
Default value = 20
Changed it to TimeBetweenLocalCheckpoints =6
Setting TimeBetweenLocalCheckpoints to 6 or less means that local checkpoints will be executed continuously without pause, independent of the cluster's workload.
问题4:error 2815: 'File not found(Ndbd file system inconsistency error, please report a bug). Ndbd file system error, restart node initial'.
解决方法:增大config.ini文件中的 NoOfFragmentLogFiles 和 RedoBuffer 参数
问题5:错误日志:
Status:Temporary error,restart node
Message:System error,node killed during node restart by other node (Internal error,programming error or missing error message, please report a bug)
Error:2303
Error data:Node 12 killed this node because GCp stop was detected
Error object:NDBCNTR (Line:273)Ox000006
原因:这个问题是因为undo日志空间文件 用完了。
解决方法:增加 undo日志空间文件 ,语法如下
alter logfile group lg_1 add undofile 'undo_2.log' initial_size 1024M engine ndbcluster;
参考:http://blog.csdn.net/chenxingzhen001/article/details/7598167
NDB常见错误指南
在安装和使用mysql cluster集群的过程中,遇到了很多问题,现在把那些问题列出来,并复上解决方法,希望大家在以后的使用中能少花时间少走弯路:
1、导入数据的时候遇到如下报错
ERROR 1005 (HY000) at line 25: Can't create table 'tuge.pangolin_fnc_accountverification' (errno: 140)
Error | 1296 | Got error 738 'Record too big' from NDB
分析原因:表行数据太大
解决办法:需要更改表结构,使最大单行数据的大小小于8KB!
2、导入大量数据的时候遇到如下报错
Error 1297: Got temporary error 'REDO' log overloaded.
ERROR : Got temporary error 1204 'Temporary failure, distribution changed' from NDBCLUSTER
ERROR : Got temporary error 1234 'REDO log files overloaded (increase disk hardware)' from NDBCLUSTER
分析原因:遇到这个错误,是表示redo log用完了,需要增加
解决办法:修改config.ini文件,更改或添加如下参数:
FragmentLogFileSize=256M
NoOfFragmentLogFiles=16
NoOfFragmentLogFiles这个参数可以更改到更大,但是初始化的时候会慢一点
3、在导入大量数据的时候,出现如下报错:
ERROR 1114 (HY000) at line 54: The table 'gps_led_sendadverthistory' is full
分析原因:你分配的内存或者硬盘空间已经用完(如果你采用磁盘表的话),需要通过ndb管理节点客户端和登录mysql查看具体的原因,通过ndb_mgm登录,执行命令:
All report memory 看下分配的内存是否使用完,如果采用磁盘表的,还需要登录mysqld节点,然后执行如下查询,来确定是否是磁盘不足:
SELECT TABLESPACE_NAME, FILE_NAME, EXTENT_SIZE*TOTAL_EXTENTS/1024/1024 AS TOTAL_MB, EXTENT_SIZE*FREE_EXTENTS/1024/1024 AS FREE_MB, EXTRA FROM information_schema.FILES WHERE FILE_TYPE="DATAFILE";
解决办法:如果是数据内存不足的情况,直接更改文件config.ini中[NDBD DEFAULT]下的
DataMemory=11480M
IndexMemory=1024M
根据实际情况加大
如果是采用磁盘表,磁盘文件空间不足的情况,就需要添加磁盘文件,格式如下:
ALTER TABLESPACE ts_1
ADD DATAFILE 'data_4.dat'
INITIAL_SIZE 10240M
ENGINE NDBCLUSTER;
4、在还原的时候遇到如下报错:
Configuration error: Error : Could not alloc node id at 192.168.100.223 port 1186: Id 12 configured as ndbd(NDB), connect attempted as mysqld(API).
Unable to connect to management server.
分析原因:因为还原的时候会去连接mysqld,而在配置ndb的时候没有添加空的mysqld节点,因此出现这个报错
解决办法:在config.ini添加几个空的mysqld节点,语法如下:
[mysqld]
[mysqld]
[mysqld]
[mysqld]
5、导入大量数据的时候或者执行大量的insert或者update操作的时候,经常会出现如下报错:
ERROR 1297 (HY000) at line 1: Got temporary error 1218 'Send Buffers overloaded in NDB kernel' from NDBCLUSTER
这个报错的原因有几个,第一、是参数SendBufferMemory设置过小,第二、和mysqld的二进制日志有关系,第三、和网络环境有关系
解决办法:查看网络环境是否OK ,是否出现双工模式部队,是否都是千兆网或者百兆网,加大SendBufferMemory的设置,一般2M就够了,设置TotalSendBufferMemory = 256M,最后如果还是出现如此报错,建议关闭mysql的二进制日志,在my.cnf中注释掉相关的参数就可以了,比如:
#log-bin=mysql-bin
6、错误日志
Status: Temporary error, restart node
Message: System error, node killed during node restart by other node (Internal error, programming error or missing error message, please report a bug)
Error: 2303
Error data: Node 12 killed this node because GCP stop was detected
Error object: NDBCNTR (Line: 273) 0x00000006
分析原因:这个问题是因为undo日志文件空间用完了!
解决办法:增加undo日志空间文件,语法如下:
ALTER LOGFILE GROUP lg_1
ADD UNDOFILE 'undo_2.log'
INITIAL_SIZE 2048M
ENGINE NDBCLUSTER;
7、还原的时候出现如下报错:
Temporary error: 266: Time-out in NDB, probably caused by deadlock
分析原因:出现这个问题是锁定超过1.2s
解决办法:在config.ini中加入:TransactionDeadLockDetectionTimeOut=10000 默认是1200
8、创建日志文件组的时候,出现如下报错:
Error CODE : 1528 Failed TO CREATE LOGFILE GROUP
用SHOW WARNINGS; 查看详细报错如下:
Got error 1504 'Out of logbuffer memory' FROM NDB
Failed TO CREATE LOGFILE GROUP
分析原因:是因为SharedGlobalMemory的值比UNDO_BUFFER_SIZE小导致错误
解决办法:把SharedGlobalMemory调大,原来我的UNDO_BUFFER_SIZE为128,但是我的SharedGlobalMemory为80,将SharedGlobalMemory设置成384M解决问题。
参考:http://blog.chinaunix.net/uid-20682026-id-3257100.html
【个人实践】在SQL语言数据还原时,遇到如下问题:
问题1:导入“数据表结构”前,要做如下替换处理
(1).将引擎 ENGINE=MyISAM 和 ENGINE=InnoDB 为 ENGINE=ndbcluster;
(2).把类似 KEY `user_guid` (`user_guid`) USING HASH 中的 USING HASH 去掉,否则会出现类型“#1112 - Table 'tk_user' uses an extension that doesn't exist in this MySQL version”的错误;
问题3:ERROR 1114 (HY000) at line 209502: The table 'tk_feed' is full
ndb_mgm> Node 12: Data usage decreased to 80%(2073 32K pages of total 2560)
Node 11: Data usage decreased to 78%(2012 32K pages of total 2560)
Node 12: Data usage decreased to 78%(2006 32K pages of total 2560)
Node 11: Data usage increased to 80%(2053 32K pages of total 2560)
Node 12: Data usage increased to 80%(2064 32K pages of total 2560)
Node 12: Data usage increased to 90%(2304 32K pages of total 2560)
解决方法:调整如下参数
DataMemory=15000M
IndexMemory=2560M
参考:http://stackoverflow.com/questions/5641156/mysql-cluster-the-table-is-full-workaround
问题4:ERROR 1297 (HY000) at line 421616: Got temporary error 4010 'Node failure caused abort of transaction' from NDBCLUSTER
解决方法:调整如下参数
MaxNoOfConcurrentTransactions = 16384
MaxNoOfConcurrentOperations = 100000
MaxNoOfLocalOperations = 100000
按照上面调整了参数,但“sql语言数据还原”时,还是报上面的错,最后通过继续修改如下参数,问题才给解决(泪奔):
FragmentLogFileSize = 256M
NoOfFragmentLogFiles = 16
其他参考: