jdilt's Blog

Keyboard Shortcuts

jdilt — 2007/9/13 9:28:32

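1. Common shortcuts

F1: Show help for the current program or for Windows.
F2: Rename the selected file.

F3: On the desktop, open the "Find: All Files" dialog.
F10 or ALT: Activate the menu bar of the current program.

Windows key or CTRL+ESC: Open the Start menu.
CTRL+ALT+DELETE: In Windows 9x, open the "Close Program" dialog.

SHIFT+DELETE: Delete the selected item; a file is deleted immediately instead of being moved to the Recycle Bin.

CTRL+N: Create a new file.
CTRL+O: Open the "Open" dialog.

CTRL+P: Open the "Print" dialog.
CTRL+S: Save the current file.

CTRL+X: Cut the selected item to the clipboard.
CTRL+INSERT or CTRL+C: Copy the selected item to the clipboard.

SHIFT+INSERT or CTRL+V: Paste the clipboard contents at the current position.
ALT+BACKSPACE or CTRL+Z: Undo the last action.

ALT+SHIFT+BACKSPACE: Redo the last undone action.
Windows key+M: Minimize all open windows.

Windows key+CTRL+M: Restore windows to the size and position they had before the previous action.
Windows key+E: Open Windows Explorer.

Windows key+F: Open the "Find: All Files" dialog.
Windows key+R: Open the "Run" dialog.

Windows key+BREAK: Open the "System Properties" dialog.
Windows key+CTRL+F: Open the "Find: Computer" dialog.

SHIFT+F10 or right-click: Open the shortcut menu for the active item.

SHIFT: Hold it down while inserting a CD to skip AutoPlay. Hold it down while starting Word to skip auto-run macros.

ALT+F4: Close the current application.
ALT+SPACEBAR: Open the window's system menu (top-left corner).

ALT+TAB: Switch between programs.
ALT+ESC: Switch between programs.

ALT+ENTER: Toggle an MS-DOS window under Windows between windowed and full-screen mode.

PRINT SCREEN: Copy an image of the whole screen to the clipboard.
ALT+PRINT SCREEN: Copy an image of the active window to the clipboard.

CTRL+F4: Close the current document in the current application (for example, in Word).

CTRL+F6: Switch to the next document in the current application (add SHIFT to go to the previous one).

In Internet Explorer:

ALT+RIGHT ARROW: Go forward to the next page (same as the Forward button).
ALT+LEFT ARROW: Go back to the previous page (same as the Back button).

CTRL+TAB: Move between the frames on a page (add SHIFT to go in reverse).
F5: Refresh.
CTRL+F5: Force a full refresh.

Carry out the corresponding menu command: ALT+the underlined letter on the menu
Close the current window in a multiple-document program: CTRL+F4

Close the current window or quit the program: ALT+F4
Show the current window's system menu: ALT+SPACEBAR

Show the shortcut menu for the selected item: SHIFT+F10
Show the Start menu: CTRL+ESC

Undo: CTRL+Z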

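2. Windows Explorer shortcuts (purpose: shortcut)

If the current selection is expanded, collapse it or select the parent folder: LEFT ARROW
Collapse the selected folder: NUM LOCK+MINUS SIGN (-)

If the current selection is collapsed, expand it or select the first subfolder: RIGHT ARROW
Expand all folders below the current selection: NUM LOCK+*
Expand the selected folder: NUM LOCK+PLUS SIGN (+)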

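3. Using the Windows key

The following shortcuts work on a Microsoft Natural Keyboard, or on any other compatible keyboard that has a Windows logo key:

Cycle through the taskbar buttons: WINDOWS+TAB

Show "Find: All Files": WINDOWS+F
Show "Find: Computer": CTRL+WINDOWS+F

Show the "Run" dialog: WINDOWS+R
Show Windows Explorer: WINDOWS+E

Show the Start menu: WINDOWS
Show the "System Properties" dialog: WINDOWS+BREAK

Minimize or restore all windows: WINDOWS+D
Undo minimizing all windows: SHIFT+WINDOWS+M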

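4. "My Computer" and Windows Explorer shortcuts

Close the selected folder and all its parent folders: hold SHIFT and click the Close button (only in "My Computer")

Move back to the previous view: ALT+LEFT ARROW
Move forward to the next view: ALT+RIGHT ARROW

View the parent folder: BACKSPACE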

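5. Dialog box shortcuts

Click the current button, check or clear the current check box, or select the current option button: SPACEBAR

Click the corresponding command: ALT+the underlined letter
Click the selected button: ENTER

Move backward through options: SHIFT+TAB
Move backward through tabs: CTRL+SHIFT+TAB

Move forward through options: TAB
Move forward through tabs: CTRL+TAB

Open the parent folder when a folder is selected in a "Save As" or "Open" dialog: BACKSPACE

Open the "Save In" or "Look In" list in a "Save As" or "Open" dialog: F4

Refresh a "Save As" or "Open" dialog: F5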

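6. Desktop, "My Computer" and Windows Explorer shortcuts

Skip AutoPlay when inserting a CD: hold SHIFT while inserting the CD-ROM

Copy a file: hold CTRL while dragging the file
Create a shortcut: hold CTRL+SHIFT while dragging the file

Select all items: CTRL+A
View an item's properties: ALT+ENTER or ALT+double-click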

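7. Microsoft Magnifier shortcuts

Windows logo+PRINT SCREEN: Copy the screen to the clipboard (including the mouse cursor)

Windows logo+SCROLL LOCK: Copy the screen to the clipboard (excluding the mouse cursor)

Windows logo+PAGE UP: Toggle inverted colors
Windows logo+PAGE DOWN: Toggle following the mouse cursor

Windows logo+UP ARROW: Increase magnification
Windows logo+DOWN ARROW: Decrease magnification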

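8. Accessibility Options shortcuts

Toggle StickyKeys: press SHIFT five times
Toggle ToggleKeys: hold NUM LOCK for five seconds

Toggle FilterKeys: hold right SHIFT for eight seconds
Toggle MouseKeys: left ALT+left SHIFT+NUM LOCK

Toggle High Contrast: left ALT+left SHIFT+PRINT SCREEN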
----------------------------------------------------------------
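Uses of the SHIFT key

Here is a quick look at some of the things the SHIFT key can do:

1. Deleting files
Right-click the file you want to delete and choose "Delete" while holding SHIFT (with the keyboard, press SHIFT together with DEL), then click "Yes" to confirm. The file is deleted without being moved to the Recycle Bin.

2. Suppressing AutoPlay
If the CD-ROM drive is set to "AutoPlay", hold SHIFT while inserting a disc until the drive light goes out, and AutoPlay is skipped for that disc. If AutoPlay is turned off, holding SHIFT makes the disc play automatically.

3. Closing windows
When "My Computer" has opened several windows, you can close the selected folder and all its parent folders at once. Hold SHIFT and click the Close button (×) at the top right of the selected folder's title bar.

4. Changing the "Open With" program
To open a file with a program other than the default one for its extension, hold SHIFT, right-click the file, and then click "Open With".

5. Shortcut menus
Right-clicking an object in an application pops up a shortcut menu. If the right mouse button is broken or awkward to use, select the object and press SHIFT+F10 instead.

6. Skipping startup programs
Hold SHIFT while Windows 98 starts, until startup has finished, and the programs in "Start\Programs\StartUp" will not run. If you hold SHIFT before startup begins, the system goes into Safe Mode.

7. Selecting a range of files
In "My Computer" or Explorer, click the first file or folder of the range. Then hold SHIFT and click the last one; everything between the two is selected.

8. Creating shortcuts
In "My Computer" or Windows Explorer, hold CTRL+SHIFT and drag the file to the desktop to create a shortcut quickly.

9. Restarting the system
Choose "Restart" in the "Shut Down Windows" dialog, then hold SHIFT while clicking "Yes". The restart is fast because Windows 98 restarts only the GUI and does not rerun the BIOS self-test. This does not work when the system is in Safe Mode, and may not be appropriate if you have just installed new hardware.

10. Switching input methods
When typing in Windows 98, CTRL+SHIFT cycles through the input methods (ALT+SHIFT also works). SHIFT+SPACE toggles between full-width and half-width characters.

11. In Internet Explorer
Hold SHIFT and click a link to open the linked page in a new window. This is the same as right-clicking a text link (not an image link) and choosing "Open in New Window".

12. In Word 97
Hold SHIFT and click the "File" menu. The "Close" and "Save" commands are replaced with "Close All" and "Save All".

13. When drawing, hold SHIFT to draw regular shapes (squares, circles and so on) or vertical and horizontal lines. When resizing a drawing object, hold SHIFT to keep its original aspect ratio.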

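Using the ALT key

1. Opening menus
Press ALT to activate the window's menu bar, which makes the first menu the current one. Press ALT plus a letter to open the menu with that underlined letter; for example, ALT+F opens the current window's "File" menu. In a dialog box, pressing ALT together with an underlined letter selects that option and carries out the corresponding action.

2. Ending a process
Press CTRL+ALT+DEL to bring up the "Close Program" dialog. Select a process and click "End Task" to terminate it quickly. Pressing CTRL+ALT+DEL again restarts the computer.

3. Opening drop-down lists
In a dialog box, ALT+DOWN ARROW opens the selected drop-down list, so you don't have to find and click the list's down-arrow button with the mouse.

4. Viewing properties quickly
There are two ways to open an item's "Properties" dialog directly. Press ALT+ENTER, or hold ALT and double-click the item. This works for a folder or file in the right pane of Explorer, or for an icon on the desktop, but not for folders in the left pane. It is the same as right-clicking and choosing "Properties".

5. Switching a DOS window
ALT+ENTER switches an MS-DOS window between windowed and full-screen mode.

6. Screen capture hotkey
When something worth capturing appears in a game or other application, press ALT+PRINT SCREEN to copy the current window or dialog box to the clipboard.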

MyEclipse + Resin installation notes

jdilt — 2007/9/11 8:49:02

1. Install Java
Download j2sdk-1_4_2_11-windows-i586-p.exe from http://java.sun.com.
Install directory: D:\j2sdk1.4.2_11

Create the system variable JAVA_HOME:
JAVA_HOME=D:\j2sdk1.4.2_11
Create the system variable CLASSPATH:
CLASSPATH=D:\j2sdk1.4.2_11\bin;.;D:\j2sdk1.4.2_11\lib\tools.jar;D:\j2sdk1.4.2_11\lib\dt.jar
Append to the system Path variable:
D:\j2sdk1.4.2_11\bin

2. Install MySQL 5.0
Run setup.exe.
Setup type: Custom
Install directory: d:\MySQL\MySQL Server 5.0\

config mysql server:
instance type: developer machine
multifunctional
tablespace: installation path

default charset: gbk

3. Install MyEclipse.

4. Install mysql-connector-java-3.1.11-bin.jar
Copy mysql-connector-java-3.1.11-bin.jar into Resin's lib directory and also into %catalina_home%/common/lib.

5. Install resin-3
Unzip resin-3.zip to x:\resin-3.
Set the system variable RESIN_HOME to d:\resin-3.
Run Resin's httpd.
Test by browsing to:
http://localhost:8080/

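6. Debugging a Resin project in MyEclipse

1) In the new Java project, add to the build path all the JARs in Resin's lib directory, plus the JDK's tools.jar.

2) Set up the project's run (debug) configuration: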

run->new application->

Main class:
com.caucho.server.resin.Resin

Program arguments:
-conf "${project_loc}\resin.conf"

VM arguments:
-Djava.util.logging.manager=com.caucho.log.LogManagerImpl

3) Put resin.conf in the project's root directory, and change the paths in it to point to the current project directory.

4) Run the project.

JavaScript DOM Scripting (reading notes)

jdilt — 2007/9/10 8:48:48

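Chapter 1:

JavaScript is a scripting language; it normally runs only inside a web browser.
The DOM is a way of abstracting and conceptualizing the contents of a document.
DHTML combines:
 HTML to mark up the page as elements
 CSS to style those elements
 JavaScript to change the styles dynamically

Chapter 2:

Ways to include JavaScript:
1) Inside the HTML page

2) In an external .js file

JavaScript is an interpreted language, not a compiled one.

Syntax notes:
1) Arrays
An ordinary array:
var beatles = Array("John","Paul","George","Ringo");
An associative array:
var lennon = Array();
lennon["name"] = "John";
lennon["year"] = 1940;
lennon["living"] = false;
2) Statements
if (condition) {
  ...
} else {
  ...
}
There is no separate elseif keyword, but else if works: it is simply else followed by another if statement.
3) Objects
Native objects: built into JavaScript itself, such as Array, Math and Date.
Host objects: predefined by the web browser, such as Form, Image and Element.
The document object gives access to the elements of a web page.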

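Chapter 3:

Nodes in the DOM

<p title="...">Don't forget to ...</p>

p is an element node, title is an attribute node, and "Don't forget to ..." is a text node.

Four useful DOM methods:
1) document.getElementById("id_string")
Returns a single element object.
2) document.getElementsByTagName("tagName_String")
Returns a collection of elements (an array-like NodeList).
It accepts the wildcard "*";
for example, to count all the element nodes in a document:
document.getElementsByTagName("*").length;
3) object.getAttribute("attributeName")
4) object.setAttribute("attributeName", "value")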

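Chapter 4:

Problem: when the onclick handler runs, the link's default behavior (following its href) also happens.
Return false from the handler to cancel the default behavior:
onclick = "showPic(this);return false;"

node.nodeType values:
Element node: 1
Attribute node: 2
Text node: 3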
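You can check the three node types outside a browser too. The sketch below uses Python 3's standard xml.dom.minidom module, which is an assumption on my part: minidom is not the browser DOM, but it implements the same Node interface, nodeType numbers and method names used in these notes. The title text is a made-up placeholder.

```python
from xml.dom import minidom, Node

# The title value is a placeholder; only the node structure matters here.
doc = minidom.parseString('<p title="a reminder">Don\'t forget to ...</p>')

p = doc.documentElement               # element node
title = p.getAttributeNode("title")   # attribute node
text = p.firstChild                   # text node

# minidom uses the same nodeType numbers as the browser DOM.
for node in (p, title, text):
    print(node.nodeName, node.nodeType)
# p 1
# title 2
# #text 3

print(p.getAttribute("title"))             # a reminder
print(text.nodeValue)                      # Don't forget to ...
print(len(doc.getElementsByTagName("*")))  # 1 ("*" matches every element)
```

The element's nodeValue is None, which corresponds to null in the browser, and the text node's nodeValue holds the text itself.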

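Chapter 5:

Graceful degradation: make sure the page still works without JavaScript:
<a href="http://www.example.com" onclick="popUp(this.href); return false;">example</a>

Learn from CSS: CSS separates presentation from a document's structure, and JavaScript behavior should be kept separate from the structure in the same way.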

Chapter 6:

window.onload: see addLoadEvent in the appendix.

Chapter 7:

Changing page structure with JavaScript:
document.write():
It breaks the principle of keeping JavaScript separate from markup and should be avoided.
It does not work when a document is served with the MIME type application/xhtml+xml.
innerHTML: likewise not part of the standard (W3C) DOM.
1) Element document.createElement("tagName");
2) Text document.createTextNode("textData");
3) oElement = object.appendChild(oNode);
4) oElement = object.insertBefore(oNewNode [, oChildNode]);
e.g. node_old.parentNode.insertBefore(node_new, node_old);
5) The DOM provides no insertAfter(); see insertAfter in the appendix.
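The missing insertAfter() can be built from insertBefore() and nextSibling. Below is a minimal Python 3 sketch of that idea on xml.dom.minidom, which stands in for the browser DOM. The function name insert_after is hypothetical, and this is not the book's exact code.

```python
from xml.dom import minidom

def insert_after(new_node, target):
    """Insert new_node immediately after target.

    Like the browser DOM, minidom only has insertBefore(). Passing
    target.nextSibling as the reference also covers the last child,
    because insertBefore(node, None) appends.
    """
    target.parentNode.insertBefore(new_node, target.nextSibling)
    return new_node

doc = minidom.parseString("<div><p>first</p><p>third</p></div>")
div = doc.documentElement
first, third = div.getElementsByTagName("p")

second = doc.createElement("p")
second.appendChild(doc.createTextNode("second"))
insert_after(second, first)
print(div.toxml())   # <div><p>first</p><p>second</p><p>third</p></div>

fourth = doc.createElement("p")
fourth.appendChild(doc.createTextNode("fourth"))
insert_after(fourth, third)   # target is the last child, so this appends
```

Using nextSibling removes the need for a separate "is it the last child?" check, because insertBefore with no reference node already appends.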

Chapter 8:

A for-in loop temporarily assigns each index of an array (or each key of an associative array) to a variable.

Note: not supported in IE.

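Chapter 9:

The layers of a web page:
Structure layer: HTML, XHTML
Presentation layer: CSS
Behavior layer: JavaScript & DOM

The style property:
Every element node in a document has a style property that holds the element's style information. Querying it returns an object, not a simple string:
element.style.property
e.g. para.style.color; element.style.fontFamily;
The style property only returns style information set inline in the HTML (the style attribute). Styles from external CSS files cannot be read through it.

Rather than changing an element's presentation through its style property, it is simpler to change the value of its class attribute.

To add a class to an element without losing its existing classes, use addClass (see the appendix).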
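The addClass idea (append to the class list instead of overwriting it) can be sketched in Python 3 on xml.dom.minidom. minidom has no className property, so this version reads and writes the class attribute directly. The helper name add_class and the sample table are hypothetical.

```python
from xml.dom import minidom

def add_class(element, value):
    """Append value to element's class attribute, keeping existing classes."""
    current = element.getAttribute("class")   # "" when the attribute is missing
    if not current:
        element.setAttribute("class", value)
    else:
        element.setAttribute("class", current + " " + value)

doc = minidom.parseString(
    '<table><tr class="data"><td>1</td></tr><tr><td>2</td></tr></table>')
for row in doc.getElementsByTagName("tr"):
    add_class(row, "odd")

print([row.getAttribute("class") for row in doc.getElementsByTagName("tr")])
# ['data odd', 'odd']
```

The first row keeps its "data" class because the new name is appended with a separating space.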

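Chapter 10:

The four values of the position property:
static: the default; elements appear in the order in which they occur in the HTML document.
relative: similar to static, but the element can also be taken out of the normal document flow with the float property.
absolute: the element can be placed anywhere in its container, regardless of where it occurs in the document. To avoid conflicts, set only one of left/right and only one of top/bottom.
fixed: like absolute, but positioned relative to the browser window.

Timed animation:
setTimeout("function", interval);   // interval in milliseconds

function changePosition(){
  // ...
  movement = setTimeout("changePosition()",5000);
}
movement is a global variable, so the pending call can be cancelled from outside the function: clearTimeout(movement);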
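The same "reschedule yourself, keep the handle so it can be cancelled" pattern can be sketched in Python 3 with threading.Timer. This is an analogue, not the browser API: Timer takes seconds rather than milliseconds, start() plays the role of setTimeout, and cancel() plays the role of clearTimeout. The Mover class and its fields are hypothetical.

```python
import threading

class Mover:
    """Moves a position one step at a time, rescheduling itself the way
    changePosition() does with setTimeout. All names are hypothetical."""

    def __init__(self, target, interval=0.01):
        self.position = 0
        self.target = target
        self.interval = interval          # seconds (setTimeout uses milliseconds)
        self.timer = None                 # plays the role of the global `movement`
        self.finished = threading.Event()

    def change_position(self):
        if self.position >= self.target:
            self.finished.set()
            return
        self.position += 1
        # Like setTimeout: schedule the next call and keep the handle.
        self.timer = threading.Timer(self.interval, self.change_position)
        self.timer.start()

    def stop(self):
        # Like clearTimeout(movement): cancel the pending call, if any.
        if self.timer is not None:
            self.timer.cancel()

m = Mover(target=5)
m.change_position()
m.finished.wait(timeout=2)
print(m.position)        # 5

slow = Mover(target=100, interval=10)
slow.change_position()   # moves to 1 and schedules the next step in 10 s
slow.stop()              # cancelled before it fires
slow.timer.join()
print(slow.position)     # 1
```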

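parseInt extracts a numeric value from the start of a string:
e.g. parseInt("15px") == 15;

parseInt and toString can convert between number bases:

// decimal to hexadecimal
var n = 123;
alert(n.toString(16));

// hexadecimal to binary
var n = 0xff;
alert(n.toString(2));

// hexadecimal string to decimal
var n = "ff";
alert(parseInt(n,16));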
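For comparison, here are the same conversions in Python 3. parse_leading_int is a hypothetical, simplified stand-in for parseInt: it handles decimal only, and it returns None where JavaScript would return NaN. The real parseInt also handles radix prefixes and other edge cases.

```python
import re

def parse_leading_int(text):
    """Simplified, decimal-only counterpart of JavaScript's parseInt(text).

    Reads an optional sign and digits after leading whitespace and ignores
    the rest; returns None where JavaScript would return NaN.
    """
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else None

print(parse_leading_int("15px"))   # 15
print(parse_leading_int("px15"))   # None (JavaScript: NaN)

print(format(123, "x"))    # 7b        like (123).toString(16)
print(format(0xFF, "b"))   # 11111111  like (0xff).toString(2)
print(int("ff", 16))       # 255       like parseInt("ff", 16)
```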

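Clipping with the overflow property:
overflow specifies what happens when an element's content is larger than the element's box.
overflow has 4 values:
visible: no clipping; all content is visible
hidden: content is clipped, and the overflowing part is invisible
scroll: content is clipped, and scrollbars are shown
auto: like scroll, but scrollbars appear only when the content overflows

Note: if an element with position: absolute is placed inside an element with position: relative, the latter becomes the former's container, and the former is positioned absolutely within it.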

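Chapter 12:

Ajax:
Ajax uses an asynchronous model. A user request does not necessarily reload the page; it is processed in the background.
In effect, Ajax adds a relay station between the client and the server. The JavaScript sends a request to the relay, the relay forwards it to the server, and the response comes back to the JavaScript through the relay. That relay is the XMLHttpRequest object.

DOM methods:

If the node passed to insertBefore or appendChild is already in the document, it is first removed from its original position (as if by removeChild). In effect, these methods move nodes within the document.

DOM properties:
             Element node   Attribute node    Text node
nodeName     tag name       attribute name    "#text"
nodeType     1              2                 3
nodeValue    null           attribute value   the text

Traversing nodes with the DOM:
 childNodes
 firstChild
 lastChild
 previousSibling
 nextSibling
 parentNode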
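The traversal properties, and the point that appendChild moves a node that is already in the tree, can be checked with Python 3's xml.dom.minidom. minidom is used here only as a stand-in for the browser DOM, and the label helper is hypothetical.

```python
from xml.dom import minidom

def label(li):
    """Text inside an <li>, read through its first child (a text node)."""
    return li.firstChild.nodeValue

doc = minidom.parseString("<ul><li>a</li><li>b</li><li>c</li></ul>")
ul = doc.documentElement

first, last = ul.firstChild, ul.lastChild
print(len(ul.childNodes))            # 3
print(label(first), label(last))     # a c
print(label(first.nextSibling))      # b
print(label(last.previousSibling))   # b
print(first.parentNode.tagName)      # ul

# Appending a node that is already in the tree moves it rather than copying it.
ul.appendChild(first)
print(ul.toxml())                    # <ul><li>b</li><li>c</li><li>a</li></ul>
print(len(ul.childNodes))            # still 3
```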

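Chapter 11:

Site structure:
/images
/styles
/script
layout.css
color.css
typography.css

basic.css
@import url(layout.css);
@import url(color.css);
@import url(typography.css);

Pages then only need to link to basic.css.

Notes:
Whenever you give an element a foreground color, also give it a background color. Watch the details, or some text may end up "invisible".
a:link {
color:#445;
background-color:#e66;
}

Reset: setting every element's padding and margin to 0 stops the browser's default spacing from affecting the page.
*{
padding: 0;
margin: 0;
}

Put layout information in layout.css and font information in typography.css.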

Appendix:
addLoadEvent
insertAfter
addClass
stripeTables

function addLoadEvent(func) {
  var oldonload = window.onload;
  if (typeof window.onload != 'function') {
    window.onload = func;
  } else {
    window.onload = function() {
      oldonload();
      func();
    }
  }
}

addLoadEvent(stripeTables);
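The trick in addLoadEvent is to wrap the existing handler in a new function rather than overwrite it. Here is a Python 3 sketch of that chaining pattern. FakeWindow is a hypothetical stand-in for the browser's window object, and the handler names are placeholders.

```python
class FakeWindow:
    """Hypothetical stand-in for the browser's window object."""
    def __init__(self):
        self.onload = None

def add_load_event(window, func):
    """Queue func to run on load without discarding an existing handler."""
    old_onload = window.onload
    if not callable(old_onload):
        window.onload = func
    else:
        def chained():
            old_onload()   # run the handler that was already there first
            func()
        window.onload = chained

calls = []
win = FakeWindow()
add_load_event(win, lambda: calls.append("prepareGallery"))
add_load_event(win, lambda: calls.append("stripeTables"))
win.onload()    # what the browser would do once the page has loaded
print(calls)    # ['prepareGallery', 'stripeTables']
```

Each call captures the previous handler in a closure, so handlers run in the order they were added.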

The Easy Way to Extract Useful Text from Arbitrary HTML

jdilt — 2007/8/31 7:36:22

You've finally got your hands on the diverse collection of HTML documents you needed. But the content you're interested in is hidden amidst adverts, layout tables or formatting markup, and various other links. Even worse, there's visible text in the menus, headers and footers that you want to filter out. If you don't want to write a complex scraping program for each type of HTML file, there is a solution.

This article shows you how to write a relatively simple script to extract text paragraphs from large chunks of HTML code, without knowing its structure or the tags used. It works on news articles and blog pages with worthwhile text content, among others.

Do you want to find out how statistics and machine learning can save you time and effort mining text?

The concept is rather simple: use information about the density of text vs. HTML code to work out if a line of text is worth outputting. (This isn't a novel idea, but it works!) The basic process works as follows:

Parse the HTML code and keep track of the number of bytes processed.
Store the text output on a per-line, or per-paragraph basis.
Associate with each text line the number of bytes of HTML required to describe it.
Compute the text density of each line by calculating the ratio of text to bytes.
Then decide if the line is part of the content by using a neural network.

You can get pretty good results just by checking if the line's density is above a fixed threshold (or the average), but the system makes fewer mistakes if you use machine learning, not to mention that it's easier to implement!
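Before looking at the article's htmllib-based code, here is the whole density idea as a compact sketch. It is written in Python 3 and is not the article's implementation: instead of htmllib it uses a crude regular-expression tokenizer (an assumption). The set of tags treated as line breaks is also a hypothetical choice, and script/style contents and entities are not handled.

```python
import re

# Tags treated as line breaks; this set is an illustrative assumption.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "table", "ul", "ol",
              "h1", "h2", "h3", "h4", "h5", "h6"}
TOKEN = re.compile(r"<[^>]*>|[^<]+")      # a tag, or a run of text
TAG_NAME = re.compile(r"</?\s*([A-Za-z0-9]+)")

def split_lines(html):
    """Return (text, html_bytes) pairs, one per block of text.

    html_bytes counts every character consumed since the previous line,
    markup included, so it is always >= len(text).
    """
    lines, text, nbytes = [], "", 0
    for token in TOKEN.findall(html):
        nbytes += len(token)
        if not token.startswith("<"):
            text += token
            continue
        name = TAG_NAME.match(token)
        if name and name.group(1).lower() in BLOCK_TAGS and text.strip():
            lines.append((text.strip(), nbytes))
            text, nbytes = "", 0
    if text.strip():
        lines.append((text.strip(), nbytes))
    return lines

def filter_by_density(html, threshold=0.5):
    """Keep the lines whose text-to-HTML density is above threshold."""
    return [text for text, nbytes in split_lines(html)
            if len(text) / nbytes > threshold]

page = ('<div class="nav"><a href="/">Home</a> | <a href="/about">About</a></div>'
        '<p>This paragraph is the actual article text that we want to keep.</p>')
print(split_lines(page)[0])     # ('Home | About', 72)
print(filter_by_density(page))  # ['This paragraph is the actual article text that we want to keep.']
```

The navigation line has density 12/72 (about 0.17), while the paragraph has 63/70 (0.9), so only the paragraph passes the 0.5 threshold.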

Let's take it from the top.

Converting the HTML to Text

What you need is the core of a text-mode browser, which is already set up to read files with HTML markup and display raw text. By reusing existing code, you won't have to spend too much time handling invalid documents, which are very common, as you'll realise quickly.

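As a quick example, we'll be using Python 2 along with a few built-in modules: htmllib for the parsing and formatter for outputting formatted text. (Neither module exists in current Python: htmllib was removed in Python 3, and formatter in Python 3.10.) This is what the top-level function looks like: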

import htmllib
import formatter
import StringIO
from formatter import AbstractFormatter

def extract_text(html):
    # Derive from formatter.AbstractWriter to store paragraphs.
    writer = LineWriter()
    # Default formatter sends commands to our writer.
    formatter = AbstractFormatter(writer)
    # Derive from htmllib.HTMLParser to track parsed bytes.
    parser = TrackingParser(writer, formatter)
    # Give the parser the raw HTML data.
    parser.feed(html)
    parser.close()
    # Filter the paragraphs stored and output them.
    return writer.output()

The TrackingParser itself overrides the callback functions for parsing start and end tags, as they are given the current parse index in the buffer. You don't have access to that normally, unless you start diving into frames in the call stack, which isn't the best approach! Here's what the class looks like:

class TrackingParser(htmllib.HTMLParser):
    """Try to keep accurate pointer of parsing location."""
    def __init__(self, writer, *args):
        htmllib.HTMLParser.__init__(self, *args)
        self.writer = writer
    def parse_starttag(self, i):
        index = htmllib.HTMLParser.parse_starttag(self, i)
        self.writer.index = index
        return index
    def parse_endtag(self, i):
        self.writer.index = i
        return htmllib.HTMLParser.parse_endtag(self, i)

The LineWriter class does the bulk of the work when called by the default formatter. If you have any improvements or changes to make, most likely they'll go here. This is where we'll put our machine learning code later. But you can keep the implementation rather simple and still get good results. Here's the simplest possible code:

class Paragraph:
    def __init__(self):
        self.text = ''
        self.bytes = 0
        self.density = 0.0

class LineWriter(formatter.AbstractWriter):
    def __init__(self, *args):
        # Parse position; TrackingParser updates it at every tag.
        self.index = 0
        self.last_index = 0
        self.lines = [Paragraph()]
        formatter.AbstractWriter.__init__(self)

    def send_flowing_data(self, data):
        # Work out the length of this text chunk.
        t = len(data)
        # We've parsed more text, so increment index.
        self.index += t
        # Calculate the number of bytes since last time.
        b = self.index - self.last_index
        self.last_index = self.index
        # Accumulate this information in current line.
        l = self.lines[-1]
        l.text += data
        l.bytes += b

    def send_paragraph(self, blankline):
        """Create a new paragraph if necessary."""
        if self.lines[-1].text == '':
            return
        self.lines[-1].text += '\n' * (blankline+1)
        self.lines[-1].bytes += 2 * (blankline+1)
        self.lines.append(Paragraph())

    def send_literal_data(self, data):
        self.send_flowing_data(data)

    def send_line_break(self):
        self.send_paragraph(0)

This code doesn't do any outputting yet; it just gathers the data. We now have a bunch of paragraphs in an array, we know their length, and we know roughly how many bytes of HTML were necessary to create them. Let's see what emerges from our statistics.

Examining the Data

Luckily, there are some patterns in the data. In the raw output, you'll notice there are definite spikes in the number of HTML bytes required to encode lines of text, notably around the title, both sidebars, headers and footers.

While the number of HTML bytes spikes in places, it remains below average for quite a few lines. On these lines, the text output is rather high. Calculating the density of text to HTML bytes gives us a better understanding of this relationship.

The patterns are more obvious in this density value, so it gives us something concrete to work with.

Filtering the Lines

The simplest way we can filter lines now is by comparing the density to a fixed threshold, such as 50% or the average density. Finishing the LineWriter class:

    def compute_density(self):
        """Calculate the density for each line, and the average."""
        total = 0.0
        for l in self.lines:
            # Guard against lines with no recorded bytes (e.g. an empty last paragraph).
            l.density = len(l.text) / float(l.bytes) if l.bytes else 0.0
            total += l.density
        # Store for optional use by the neural network.
        self.average = total / float(len(self.lines))

    def output(self):
        """Return a string with the useless lines filtered out."""
        self.compute_density()
        output = StringIO.StringIO()
        for l in self.lines:
            # Check density against threshold.
            # Custom filter extensions go here.
            if l.density > 0.5:
                output.write(l.text)
        return output.getvalue()

This rough filter typically gets most of the lines right. The text in headers, footers and sidebars is usually stripped as long as it's not too long. However, long copyright notices, comments, or descriptions of other stories are output too. Also, short lines of content around inline graphics or adverts within the text are dropped.

To fix this, we need a more complex filtering heuristic. But instead of spending days working out the logic manually, we'll just grab loads of information about each line and use machine learning to find patterns for us.

Supervised Machine Learning

[Figure: an example interface for tagging lines of text as content or not.]

The idea of supervised learning is to provide examples for an algorithm to learn from. In our case, we give it a set of documents that were tagged by humans, so we know which lines must be output and which must be filtered out. For this we'll use a simple neural network known as the perceptron. It takes floating-point inputs, passes the information through weighted connections between "neurons", and outputs another floating-point number. Roughly speaking, the number of neurons and layers affects the ability to approximate functions precisely; we'll use both single-layer perceptrons (SLP) and multi-layer perceptrons (MLP) for prototyping.
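To make the perceptron concrete, here is a pure Python 3 sketch of a single-layer perceptron. It is a deliberately simpler substitute for what the article uses: it applies the classic perceptron learning rule with a step activation. FANN, by contrast, trains networks with smooth activation functions using gradient-based methods such as RPROP. The toy data, learning rate and epoch limit are made-up values.

```python
def predict(weights, bias, inputs):
    """Step activation: 1 if the weighted sum is positive, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=1000, rate=0.1):
    """Train a single-layer perceptron with the classic perceptron rule.

    samples: list of (inputs, target) with target 0 (filter out) or 1 (keep).
    Returns (weights, bias). Stops early once every sample is classified
    correctly, which happens if the data is linearly separable.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error:
                mistakes += 1
                bias += rate * error
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        if mistakes == 0:
            break
    return weights, bias

# Toy data: density of a line -> 1 for content, 0 for boilerplate.
samples = [([0.10], 0), ([0.20], 0), ([0.35], 0),
           ([0.70], 1), ([0.80], 1), ([0.95], 1)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])  # [0, 0, 0, 1, 1, 1]
```

With a single density input, the trained perceptron is just a learned threshold. Adding more inputs per line lets it weigh several signals at once.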

To get the neural network to learn, we need to gather some data. This is where the earlier LineWriter.output() function comes in handy: it gives us a central point to process all the lines at once and make a global decision about which lines to output. Starting with intuition and experimenting a bit, we discover that the following data is useful to decide how to filter a line:

Density of the current line.
Number of HTML bytes of the line.
Length of output text for this line.
These three values for the previous line,
... and the same for the next line.
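The inputs described above can be assembled into one vector per line, as in this Python 3 sketch. Two details are assumptions rather than the article's method: the order of the values in each vector, and the zero-padding for the missing neighbours of the first and last lines.

```python
def line_features(lines):
    """Build one input vector per line for the classifier.

    lines: list of (density, html_bytes, text_length) tuples, one per line.
    Each vector holds those three values for the previous, current and
    next line (9 numbers). Missing neighbours at the start and end of the
    document are padded with zeros, which is an assumption of this sketch.
    """
    pad = (0.0, 0, 0)
    vectors = []
    for i, current in enumerate(lines):
        prev = lines[i - 1] if i > 0 else pad
        nxt = lines[i + 1] if i + 1 < len(lines) else pad
        vectors.append([*prev, *current, *nxt])
    return vectors

lines = [(0.17, 72, 12), (0.90, 70, 63), (0.25, 40, 10)]
for vector in line_features(lines):
    print(vector)
# [0.0, 0, 0, 0.17, 72, 12, 0.9, 70, 63]
# [0.17, 72, 12, 0.9, 70, 63, 0.25, 40, 10]
# [0.9, 70, 63, 0.25, 40, 10, 0.0, 0, 0]
```

Raw byte and character counts are on a much larger scale than densities, so scaling them before training is probably worthwhile.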

For the implementation, we'll be using Python to interface with FANN, the Fast Artificial Neural Network Library. The essence of the learning code goes like this:

from pyfann import fann, libfann

# This creates a new single-layer perceptron with 1 output and 3 inputs.
obj = libfann.fann_create_standard_array(2, (3, 1))
ann = fann.fann_class(obj)

# Load the data we described above.
patterns = fann.read_train_from_file('training.txt')
# Train for at most 1000 epochs, reporting every epoch, until the error reaches 0.0.
ann.train_on_data(patterns, 1000, 1, 0.0)

# Then test it with different data.
# validation_data: (inputs, expected_output) pairs kept out of training (preparation not shown).
for datin, datout in validation_data:
    result = ann.run(datin)
    print 'Got:', result, ' Expected:', datout

Trying out different data and different network structures is a rather mechanical process. Don't use too many neurons, or you may fit your particular set of documents too closely (overfitting); conversely, use enough to solve the problem well. The experiments varied the number of lines used (1L-3L) and the number of attributes per line (1A-3A).

The interesting thing to note is that 0.5 is already a pretty good guess at a fixed threshold (see the first set of columns). The learning algorithm cannot find a much better solution when comparing the density alone (1 attribute, second set of columns). With 3 attributes, the next SLP does better overall, though it gets more false negatives. Using multiple lines also improves the performance of the single-layer perceptron (fourth set of columns). And finally, a more complex neural network structure works best overall, making 80% fewer errors in filtering the lines.

Note that you can tweak how the error is calculated if you want to punish false positives more than false negatives.
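One way to express "punish false positives more" is a cost-weighted error. The Python 3 sketch below is an evaluation metric only; it does not change how FANN computes its training error. It also reuses the same metric to pick a fixed density threshold. The cost values and candidate thresholds are arbitrary example numbers.

```python
def weighted_error(predictions, targets, fp_cost=2.0, fn_cost=1.0):
    """Average cost per line, charging false positives more than false negatives.

    A false positive is a junk line that was kept (prediction 1, target 0);
    a false negative is a content line that was dropped. The default costs
    are arbitrary example values.
    """
    cost = 0.0
    for p, t in zip(predictions, targets):
        if p == 1 and t == 0:
            cost += fp_cost
        elif p == 0 and t == 1:
            cost += fn_cost
    return cost / len(targets)

def best_threshold(densities, targets, candidates, **costs):
    """Pick the density threshold with the lowest weighted error."""
    def error_at(threshold):
        predictions = [1 if d > threshold else 0 for d in densities]
        return weighted_error(predictions, targets, **costs)
    return min(candidates, key=error_at)

print(weighted_error([1, 0, 1, 1], [1, 1, 0, 1]))   # 0.75
print(best_threshold([0.1, 0.4, 0.55, 0.8], [0, 0, 1, 1],
                     [0.3, 0.5, 0.6]))                # 0.5
```

Raising fp_cost pushes the chosen threshold upward, trading some dropped content lines for less junk in the output.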

Conclusion

Extracting text from arbitrary HTML files doesn't necessarily require scraping the file with custom code. You can use statistics to get pretty amazing results, and machine learning to get even better ones. By tweaking the threshold, you can avoid the worst false positives that pollute your text output. But it's not so bad in practice: where the neural network makes mistakes, even humans have trouble classifying those lines as "content" or not.

Now all you have to figure out is what to do with that clean text content!

Comparison Shopping

jdilt — 2007/8/25 17:39:54

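1. Comparison shopping
With the growth of e-commerce in China, more and more people are shopping online. Besides TaoBao and eBay, there are many small and medium-sized specialist B2C sites. With so many sites, how do you find what you want? Do you have to log in to every site and look for the product yourself? This is where comparison shopping comes in: it gives users comparative information on product prices, site services and so on, collected from many websites.

Comparison shopping sites make shopping more convenient for users. They also give merchants a chance to promote their products, so in practice they act as a form of advertising. Online shoppers get more focused and more complete information from a comparison shopping site than from a general search engine; some sites show store ratings in addition to product prices. As a result, comparison shopping sites have gradually evolved into shopping search.

2. Shopping search
On 26 March 2004, Yahoo acquired Kelkoo, Europe's largest comparison shopping site, for US$575 million. The deal brought wide attention to comparison shopping as a business model. As these sites list more stores and products, they have become tools that users consult for product information before buying. Some sites have begun to drop the "comparison shopping" label in favor of "shopping search".

The term itself is not new. BizRate.com, a comparison shopping site founded in 1996, has long promoted itself as the best shopping search on the web. Yahoo and Google have launched their own shopping search services (http://shopping.yahoo.com and http://froogle.google.com), built on their search engine technology. In China, the 8848 site relaunched in 2004 under the "shopping search" banner, and the term has since appeared frequently in the Chinese media. In practice, so-called shopping search is still mainly based on the comparison shopping model.

When a user searches for a product on a shopping search site, the records for that product from many online stores, at different prices, are listed. The user can then choose based on price, the store's reputation and personal preference, and go to the chosen store to buy the product.

Besides the product name and description, a shopping search results page usually shows a price comparison, and users can rate products and stores. These ratings influence other users' purchase decisions. For lesser-known stores, shopping search increases the chance of being found, and good ratings help win customers' trust. BizRate, for example, offers several kinds of ratings, such as product ratings and store ratings. Users can score products and review stores, and this information serves as a reference for other users.

For users, shopping search provides rich information that is valuable when deciding what to buy. For merchants, it increases the chance of being found and so helps sales. This has made it a common online marketing tool.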

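3. Comparison shopping sites

Outside China:

(1) BizRate.com
Among existing comparison shopping sites, BizRate.com, founded in 1996, is one of the earliest and most successful. By 2004 it listed more than 40,000 online stores and about 30 million products, and it has in effect become one of the leading sites of its kind.

(2) Shopping.com
Shopping.com was founded in 1999 from two parts: the price comparison service DealTime and the review platform Epinions, which merged into Shopping.com in April 2003.

Shopping.com is known for comprehensive shopping information. Its product information includes descriptions, reviews, buying guides, photos and detailed specifications. Its store information includes prices from different stores, store ratings, user reviews, coupons and other details useful to customers.

Stores are listed through paid advertising: a store pays a US$200 deposit and is then charged according to the number of clicks users make on its product listings.

(3) Kelkoo.com
Kelkoo, Europe's largest comparison shopping site and now owned by Yahoo, was founded in 1999 and is headquartered in Paris. It runs comparison shopping services in 10 European countries, and about 10% of all European Internet users use it. Its business model is similar to Shopping.com's: listing products is free for stores, which are mainly charged per click.

(4) PriceGrabber.com
PriceGrabber.com is one of today's giants among comparison shopping sites. It is a specialized vertical search engine that collects large amounts of information to provide accurate product and merchant information. It also offers merchants an efficient marketing platform where they can reach targeted users at a low promotion cost. The company has grown quickly and was among the leading sites in 2005.

(5) Shopzilla.com
Founded in 1996, it mainly serves English-speaking markets; in 2005 it ranked just behind PriceGrabber.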

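In China:

Comparison shopping in China is only just getting started. Many companies are still feeling their way and lack mature business models, yet they need heavy investment. They also do little promotion, so many users do not even know these sites exist.

(1) Gobygo.com
Recently launched; it focuses on promoting its brand on major websites. I'm not sure whether it is similar to Google Base.

(2) Qunar.com
A specialized price comparison search site for air tickets and hotels, with good coverage of flights and holiday hotels. It also provides travel information, but the site is rather slow.

(3) Other shopping-information sites

These comparison shopping sites act as intermediaries: after comparing, users follow a link directly to the e-commerce site that sells the chosen product and buy it there. As such sites become better known, more customers will turn to them for value-added services.

Clearly, comparison shopping started earlier abroad than in China, and sites there already have established profit models. I hope Chinese comparison shopping sites will become more professional, since this is, after all, a promising market.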

The Zen of CSS Design (reading notes)

jdilt — 2007/8/24 18:49:15

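Don't use HTML elements to produce page layout that has nothing to do with their meaning.

Write valid, well-structured, meaningful HTML documents first, then apply CSS styles on that foundation.

Good markup:

1. Choose a DOCTYPE: it tells the browser (user agent) which version of HTML you are using.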


2. Specify the language and character set

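Examples:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
Sets the document's XML language, in this case the ISO code for English, en.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Assigns a character set to the document, in this case UTF-8.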

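3. Specify a title

Every HTML page needs a <title> element that describes the page's content. When visitors bookmark the page, the title is saved as the bookmark's name. A good title with relevant keywords helps bring visitors to the site. Every page should have its own unique title that briefly summarizes the page's content, rather than just repeating the site name.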

Example:

<title>css Zen Garden: The Beauty in CSS Design</title>

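4. Use meaningful tags

Choose HTML elements according to the structure of the document's content, not according to how they look.

For example: use the p element to mark up paragraphs, not to get line breaks; use blockquote for quotations, not for indentation. When no element has the right structural meaning, you can fall back on the generic div and span elements.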

Example:

The Road to Enlightenment

and not:

The Road to Enlightenment

5. Use div and span sparingly

6. Use as few tags as possible

7. Use class and id appropriately

FreeBSD 6.2: installation and desktop environment

jdilt — 2007/8/21 21:28:53

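1) Read the Handbook to check whether your machine is suitable for installing FreeBSD.
2) FreeBSD-Install: boot the machine from the CD. If you are new to FreeBSD, press 1 in the boot menu to start the installer. In the Country Selection menu,
choose 45 China;
in System Console Keymap, choose USA ISO (US ISO keymap).
This takes you to the sysinstall Main Menu. The FreeBSD installer is entirely menu-driven; use the arrow keys to move between options: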
Usage
   Standard
   Express
   Custom
   Configure
   Doc
   Keymap
   Options
   Fixit
   Upgrade
   Load Config
   Index
Press SPACE or ENTER to open the corresponding menu option.
Choose Custom (custom installation) to open the Custom menu:
1 Exit            // return to the previous menu
2 Options         // View/Set various installation options
3 Partition       // Allocate disk space for FreeBSD (partition the disk)
4 Label           // Label allocated disk partitions
5 Distributions   // Select distribution(s) to extract
6 Media           // Choose the installation media type
7 Commit          // Perform any pending Partition/Label/Extract actions (commit your choices and start the installation)

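First choose 3 Partition to partition the disk; this opens FreeBSD's "FDISK Partition Editor".
This step is a bit involved. Beginners are strongly advised to press "A" to let the system create the partition automatically, then press Q to save and exit. On exit, the system asks whether to install a boot manager. The options are:
In "Install Boot Manager for drive ad0?", choose one of:
   BootMgr (Install the FreeBSD Boot Manager): use the FreeBSD boot manager.
   Standard (Install a standard MBR (no boot manager)): install a standard MBR (recommended).
   None (Leave the Master Boot Record untouched): choose this if a boot manager is already installed.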

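After partitioning, you return to the "Choose Custom Installation Options" menu. Choose 4 Label to label the partitions; this opens the "FreeBSD Disklabel Editor", where the slice you just created is divided further. Choose A (Auto Defaults) to let FreeBSD lay out the partitions automatically; the automatic sizes are fine. Then press Q to save. A possible result looks like this (the layout on my machine, for reference only):
Part     Mount   Size     Newfs
ad0s2a   /       512MB    UFS2    Y
ad0s2b   swap    166MB    SWAP
ad0s2d   /var    1107MB   UFS2+S  Y
ad0s2e   /tmp    512MB    UFS2+S  Y
ad0s2f   /usr    2698MB   UFS2+S  Y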

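Choose 5 Distributions to select the FreeBSD distribution sets to install.
The menu offers many options:
Exit
All
Reset
4 Developer
5 X-Developer
6 Kern-Developer
7 X-Kern-Developer
8 User
9 X-User
A Minimal
B Custom
Use the arrow keys to move between options, and SPACE or ENTER to select. If you want X, you must select Xorg. If you don't want to install everything, I suggest A Minimal (a minimal installation). Then return to "Choose Custom Installation Options", choose B Custom, and select ALL there. After you press ENTER, you will usually be asked whether to install the Ports collection; choose YES (you really should install it).
Then choose Exit to return to the previous menu.
Choose 6 to select the installation media; naturally, we choose CD-ROM.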

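Now choose 2 Options to check that the settings are correct. If everything looks fine, we are ready to start the actual installation; return to the previous menu.
Choose 7 Commit, then OK. The system shows a "User confirmation Requested" dialog asking you to confirm the start of the installation (this is your last chance). If something is wrong, choose No and redo the settings; otherwise choose Yes to start installing.
The installation finishes quickly. When asked whether to visit the configuration menu (Visit the general configuration menu for a chance to set any last options?), choose the default "No" to finish.
Choose X Exit (Exit this menu (returning to previous)) to return to the sysinstall Main Menu.
In the "sysinstall Main Menu", choose [X Exit Install] -> "Yes" to confirm (Are you sure you wish to exit? The system will reboot (be sure to remove any floppies/CDs/DVDs from the drives).).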

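With these steps you have installed a minimal system. When the system reboots, remember to remove the CD.
Because BootMgr was installed earlier, a boot menu appears when the system restarts. Since this machine has only one FreeBSD system, it shows:
F1 FreeBSD
Press F1 and FreeBSD boots. After a short wait, the Login prompt appears. Enter root and press ENTER, and the familiar # prompt appears (no password was set during installation).
Next, we install the graphical desktop, using GNOME as the example; KDE installation is omitted.
First run sysinstall, which brings back the now-familiar menus. ^_^
   Choose Configure to open the Configure menu: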
X Exit
   Distributions
   Packages
   Root Password
   Fdisk
   Label
   User Management
   Console
   Time Zone
   Media
   Mouse
   Networking
   Security
   Startup
   TTYs
   Options
   HTML Docs
   Load KLD

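Now let's do some basic FreeBSD configuration.
First, an empty Superuser password is not secure. Choose Root Password and set the root password; be sure to change it!
Time Zone: you are asked whether the machine's CMOS clock is set to UTC; most people should answer No. You then choose the region, 5 (Asia), then the country, 9 (China), then the zone, 1 (east China), and press ENTER. When asked to confirm the time zone, choose Yes.
Mouse: usually choose 2 Enable; the system detects your mouse automatically.
Next, open "Networking", which contains: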
X Exit: Exit this menu (returning to previous)
Interfaces: Configure additional network interfaces
AMD: This machine wants to run the auto-mounter service
AMD Flags: Set flags to AMD service (if enabled)
Anon FTP: This machine wishes to allow anonymous FTP
Gateway: This machine will route packets between interfaces
inetd: This machine wants to run the inet daemon
Mail: This machine wants to run a Mail Transfer Agent
NFS client: This machine will be an NFS client
NFS server: This machine will be an NFS server
Ntpdate: Select a clock-synchronization server
PCNFSD: Run authentication server for clients with PC-NFS
rpcbind: RPC port mapping daemon (formerly portmapper)
rpc.statd: NFS status monitoring daemon
rpc.lockd: NFS file locking daemon
Routed: Select routing daemon (default: routed)
Rwhod: This machine wants to run the rwho daemon
sshd: This machine wants to run the SSH daemon
TCP Extensions: Allow RFC1323 and RFC1644 TCP extensions?
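Interfaces (network card setup): before you start, the system lists all available network interfaces. The card is usually in the first position; the model shown depends on your hardware.
a. Choose the card (here: lnc0 Lance/PCnet (Isolan/Novell NE2100/NE32-VL) ethernet).
b. Configure IPv6? Choose No.
c. Use DHCP (automatic address configuration)? If you use ADSL, choose DHCP Client; otherwise choose according to your setup.
d. A screen then asks for Host, Domain, IPv4 Gateway, Name server, IPv4 Address, Netmask and other important information. If you use DHCP, you don't need to fill in anything.
e. When asked "Would you like to bring the lnc0 interface up right now?", choose Yes.
Finally, choose X Exit to leave the FreeBSD Configuration Menu.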

Next, install and configure Xorg (as root):
1. Installing Xorg
(1) With the FreeBSD installer (recommended)
In the sysinstall Main Menu, choose Configure -> Distributions.
   This opens the Distributions menu:
   X Exit: Exit this menu (returning to previous)
   All: All system sources, Binaries and X Window System
   Reset: Reset all of the below
   base: Binary base distribution (required)
   kernels: Binary kernel distributions (required)
   dict: Spelling checker dictionary files
   doc: Miscellaneous FreeBSD online docs
   games: Games (non-commercial)
   info: GNU info files
   man: System manual pages - recommended
   catman: Preformatted system manual pages
   proflibs: Profiled versions of the libraries
   src: Sources for everything
   ports: The FreeBSD Ports collection
   local: Local additions collection
   X.Org: The X.Org distribution
Choose X.Org to open the X.Org submenu. I selected everything under Basic, Server and Fonts, then pressed OK to confirm and exit.
   Choose ports as well; this is strongly recommended. Much FreeBSD software is installed through the Ports collection, and once you are familiar with FreeBSD and want to install lots of software, you will be glad you installed it.
Press OK to confirm. Going back from "Distributions" to "Configuration" installs the packages you just selected.

(2) Building from the Ports collection
# cd /usr/ports/x11/xorg
# make install clean

(3) Installing the binary package
# pkg_add -r xorg

��װ��ɺ�� Exit �� sysinstall -> X Exit Install��
��ˣ��ˣ�� reboot һ�¡�

��ڣ��ٴν��ϵͳ��

��ˣ��ڣ��ǿ�ʼ�� ADSL ��ã�
(1)�༭ /etc/ppp/ppp.conf
      default:
      set log Phase tun command
      adsl:
      set device PPPoE:rl0
      set authname ��ʺ�
   set authkey ��
   set dial
      set login
      add default HISADDR
      enable dns
   (2)�κ� # ppp -ddial adsl
   (3)��Զ��
   �� /etc/rc.conf,��ݣ�
   #Auto dial ADSL at startup
      ppp_enable="YES"
      ppp_mode="ddial"
      ppp_nat="YES"
      ppp_profile="adsl"
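The ppp.conf profile above can also be generated by a small script, which saves retyping it after a reinstall. This is a minimal sketch: `make_ppp_conf` is a hypothetical helper, not part of FreeBSD. It takes the network card, account and password as arguments and follows the ppp.conf rule that labels start in column 0 while the commands under them are indented.

```shell
# Hypothetical helper: print a ppp.conf with the "adsl" PPPoE profile.
# $1 = network interface (e.g. rl0), $2 = ADSL account, $3 = password.
# In ppp.conf, labels ("default:", "adsl:") start in column 0 and the
# commands belonging to them are indented.
make_ppp_conf() {
    iface=$1 user=$2 pass=$3
    printf '%s\n' \
        'default:' \
        ' set log Phase tun command' \
        'adsl:' \
        " set device PPPoE:$iface" \
        " set authname $user" \
        " set authkey $pass" \
        ' set dial' \
        ' set login' \
        ' add default HISADDR' \
        ' enable dns'
}
```

For example, as root: `make_ppp_conf rl0 myaccount mypassword > /etc/ppp/ppp.conf` (this overwrites the existing file; the account and password are placeholders).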

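Create an xorg.conf file and configure the display resolution, refresh rate, mouse and so on.
Configuring Xorg:

(1) Generate xorg.conf.new in /root
# Xorg -configure
This creates an Xorg configuration file, xorg.conf.new, in the root user's home directory.

(2) Test the file to confirm that Xorg can drive the system's graphics card
# Xorg -config xorg.conf.new
If a black-and-grey grid and an "X"-shaped mouse pointer appear, the configuration works; press Ctrl+Alt+Backspace to leave the test screen. If it fails, don't worry: it only means xorg.conf.new needs changes.
(3) Edit xorg.conf.new and test again:
Monitor refresh rates: go to the "Monitor" section of xorg.conf.new:
Section "Monitor"
Identifier "Monitor0"
VendorName "Monitor Vendor"
ModelName "Monitor Model"
HorizSync 31.5-99.0 # set to match your monitor
VertRefresh 50.0-90.0 # set to match your monitor
EndSection

Resolution and colour depth: edit the "Screen" section of xorg.conf.new.

Section "Screen"
Identifier "Screen0"
Device "Card0"
Monitor "Monitor0"
DefaultDepth 24 # set to suit your monitor; must match a Depth below
SubSection "Display"
Viewport 0 0
Depth 24 # set to suit your monitor
Modes "800x600" # set to suit your monitor
EndSubSection
EndSection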

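Mouse wheel support
Before 6.0 this option had to be set by hand; in 6.1, if xorg.conf.new was generated with Xorg -configure, wheel support needs no manual setup. Otherwise, edit the "InputDevice" section of xorg.conf.new:

Section "InputDevice"
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/sysmouse"
Option "ZAxisMapping" "4 5 6 7" # without this line the wheel does not scroll
EndSection

After editing, test again; when everything works, rename xorg.conf.new to xorg.conf and copy it into /etc/X11/:
# cp xorg.conf.new /etc/X11/xorg.conf
If something goes wrong, read /var/log/Xorg.0.log and correct the file according to its messages.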
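Rather than reading the whole log, you can pull out just the error and warning lines. A minimal sketch, assuming the Xorg log format of this period, where each message line begins with its marker: (EE) for errors and (WW) for warnings. The function name `xorg_problems` is hypothetical.

```shell
# Hypothetical helper: show only the error (EE) and warning (WW) lines
# of an Xorg log. Assumes each message line starts with its marker; the
# ^ anchor skips lines that merely mention the markers (such as the
# marker legend near the top of the log).
# $1 = log file; defaults to /var/log/Xorg.0.log.
xorg_problems() {
    log=${1:-/var/log/Xorg.0.log}
    grep -E '^\((EE|WW)\)' "$log"
}
```

Running `xorg_problems` right after a failed `Xorg -config xorg.conf.new` usually points straight at the section that needs fixing.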

Note: when installing on VMware 5.x, pay attention to compatibility. Below is the xorg.conf that worked in my tests:

Section "ServerLayout"
      Identifier    "X.org Configured"
      Screen    0  "Screen0" 0 0
      InputDevice "Mouse0" "CorePointer"
      InputDevice "Keyboard0" "CoreKeyboard"
EndSection

Section "Files"
      RgbPath    "/usr/X11R6/lib/X11/rgb"
      ModulePath "/usr/X11R6/lib/modules"
      FontPath    "/usr/X11R6/lib/X11/fonts/misc/"
      FontPath    "/usr/X11R6/lib/X11/fonts/TTF/"
      FontPath    "/usr/X11R6/lib/X11/fonts/Type1/"
      FontPath    "/usr/X11R6/lib/X11/fonts/CID/"
      FontPath    "/usr/X11R6/lib/X11/fonts/75dpi/"
      FontPath    "/usr/X11R6/lib/X11/fonts/100dpi/"
EndSection

Section "Module"
      Load  "dbe"
      Load  "dri"
      Load  "extmod"
      Load  "glx"
      Load  "record"
      Load  "xtrap"
      Load  "freetype"
      Load  "type1"
EndSection

Section "InputDevice"
      Identifier  "Keyboard0"
      Driver    "kbd"
EndSection

Section "InputDevice"
      Identifier  "Mouse0"
      Driver    "mouse"
      Option          "Protocol" "auto"
      Option          "Device" "/dev/sysmouse"
      Option          "ZAxisMapping" "4 5 6 7"
EndSection

Section "Device"
      ### Available Driver options are:-
      ### Values: <i>: integer, <f>: float, <bool>: "True"/"False",
      ### <string>: "String", <freq>: "<f> Hz/kHz/MHz"
      ### [arg]: arg optional
      #Option    "HWcursor"                # [<bool>]
      #Option    "NoAccel"                   # [<bool>]
      Identifier  "Card0"
      Driver    "vmware"
      VendorName  "VMware Inc"
      BoardName "[VMware SVGA II] PCI Display Adapter"
      BusID    "PCI:0:15:0"
EndSection

Section "Screen"
      Identifier "Screen0"
      Device    "Card0"
      Monitor "vmware"
      SubSection "Display"
            Viewport 0 0
            Depth    1
            Modes 800x600
      EndSubSection
      SubSection "Display"
            Viewport 0 0
            Depth    4
            Modes 800x600
      EndSubSection
      SubSection "Display"
            Viewport 0 0
            Depth    8
            Modes 800x600
      EndSubSection
      SubSection "Display"
            Viewport 0 0
            Depth    15
            Modes    800x600
      EndSubSection
      SubSection "Display"
            Viewport 0 0
            Depth    16
            Modes    800x600
      EndSubSection
      SubSection "Display"
            Viewport 0 0
            Depth    24
            Modes 800x600
      EndSubSection
EndSection

# VMware SVGA

Section "Monitor"
Identifier  "vmware"
VendorName "VMware, Inc"
HorizSync 1-10000
VertRefresh 1-10000

ModeLine "640x480" 100 640 700 800 900 480 500 600 700
ModeLine "800x600" 100 800 900 1000 1100 600 700 800 900
ModeLine "1024x768" 100 1024 1100 1200 1300 768 800 900 1000
ModeLine "1152x864" 100 1152 1200 1300 1400 864 900 1000 1100
ModeLine "1152x900" 100 1152 1200 1300 1400 900 1000 1100 1200
ModeLine "1280x1024" 100 1280 1300 1400 1500 1024 1100 1200 1300
ModeLine "1376x1032" 100 1376 1400 1500 1600 1032 1100 1200 1300
ModeLine "1600x1200" 100 1600 1700 1800 1900 1200 1300 1400 1500
ModeLine "2364x1773" 100 2364 2400 2500 2600 1773 1800 1900 2000
EndSection

Section "Device"
Identifier  "VMware SVGA"
Driver       "vmware"

EndSection

��뵽����ĵĹ��̡�>��滷��İ�װ��ã�Gnome2.6.12�� (rootȨ��)

1��Gnome��װ
��ٴ�� sysinstall �� sysinstall Main Menu ��װ��˵��
��sysinstall -> Configure -> Packages -> 1 CD/DVD -> gnome -> ѡ��gnome2-2.6.12��ʱ��ѡ��Զ��(��ʣ��xchat2-2.6.1_1��Ҳ��ѡ��) -> ��Tab��ѡOK�ٻس��ء�Package Selection��
Ȼ��ѡ�� linux�� linux basic ��Ҳװ�ϣ��Ժ��кܶ�� linux Ӧ��Ҫ��֧�֣��
�ڡ�Package Selection��ڰ�Tab��ѡ��Install��ʼ��װ��
�ӡ�Distributions��˻ء�Configuration��лᰲװ�ղ�ѡ�е��ʱ��װʱ��ϳ��ص��FreeBSD configuration Menu�� -> ��˳�sysinstall��

2�� gnome ��ļ��
vi .xinitrc
   ��е��һ�м��
"exec  /usr/local/bin/gnome-session"
��ע�⣬��ʹ��KDE��"exec /usr/local/bin/startkde"��

3��startx ��棬��ʱ��Gnome��Ӣ�ĵġ�

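4. Chinese localization
(1) Append the following to /etc/login.conf:
#
#Chinese Users Accounts.
#
chinese|Chinese Users Accounts:\
:charset=eucCN:\
:lang=zh_CN.eucCN:\
:tc=default:

(2) Run
# cap_mkdb /etc/login.conf

(3) Use vipw to change the login class of the user concerned; the entry looks like this:
root:$1$lOOD78Dm$oSG5u21RGrXoC.TTJ3nCs.:0:0:chinese:0:0:Charlie &:/root:/bin/csh
The key is to put "chinese" in the class field (the fifth field); "chinese" is the class defined in step (1).
Save and exit.

(4) Log in again and run startx: Gnome is now entirely in Chinese.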
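To double-check the vipw edit, the class can be read back out of a master.passwd entry. This is a minimal sketch with a hypothetical helper name. master.passwd fields are name:password:uid:gid:class:change:expire:gecos:home_dir:shell, so the class is field 5. On FreeBSD, `pw usermod root -L chinese` sets the same field without editing the file by hand.

```shell
# Hypothetical helper: print the login class (5th field) of one entry
# in master.passwd format:
# name:password:uid:gid:class:change:expire:gecos:home_dir:shell
login_class_of() {
    printf '%s\n' "$1" | awk -F: '{ print $5 }'
}
```

For example, as root: `login_class_of "$(grep '^root:' /etc/master.passwd)"` should print `chinese` after step (3).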
----------------------------------------------------------------------------------------
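Next, we will use SimSun, the font from Windows.
Create a "TrueType" directory under /usr/X11R6/lib/X11/fonts/, rename simsun.ttc from Windows to simsun.ttf, copy it into /usr/X11R6/lib/X11/fonts/TrueType, and rebuild the font cache with fc-cache -fv.
Then, in the GNOME font preferences under Details, adjust the smoothing, set hinting to Slight and the subpixel order to RGB.

Now enable graphical login; I chose GDM.
Edit /etc/rc.conf and add
gdm_enable="YES"
You should now have a good-looking system.

Is that all? Not yet: we still need an input method. I chose SCIM.

Before installing SCIM, change the download sites used by ports; otherwise fetching the source files during port installation may fail.

Add the following to /etc/make.conf: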

MASTER_SITE_OVERRIDE= \
ftp://ftp.tw.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp4.tw.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp10.tw.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp13.tw.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp.jp.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp3.jp.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp5.jp.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp.jaist.ac.jp/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp.freebsdchina.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR}

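I found the ports mirror addresses above on a forum; thanks to the people who shared them.

Now install the SCIM input method (as root).
(1) Install SCIM
Install the Pinyin input method:
# cd /usr/ports/chinese/scim-pinyin
# make install clean
Install the table-based input methods:
# cd /usr/ports/chinese/scim-tables
# make install clean

(2) SCIM environment settings
① Check the current locale settings:
locale
② Check which shell you use:
echo $0 or cat /etc/passwd
If your shell is bash or sh,
add the following to ~/.profile: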
export LANG=zh_CN.eucCN
export LC_CTYPE=zh_CN.eucCN
export XMODIFIERS='@im=scim'
export GTK_IM_MODULE=scim

If your shell is csh or tcsh,
add the following to ~/.cshrc:
setenv LANG zh_CN.eucCN
setenv LC_CTYPE zh_CN.eucCN
setenv XMODIFIERS @im=scim
setenv GTK_IM_MODULE scim

(3) In ~/.xinitrc, add this line before the exec /usr/local/bin/gnome-session line:

/usr/local/bin/scim -d &

(Note: to test this, first leave X, usually with Ctrl+Alt+Backspace.
If you use gdm so that the system boots straight into the graphical login, note that gdm does not read .xinitrc; in that case make sure ~/.profile contains:
export LANG=zh_CN.eucCN
export LC_CTYPE=zh_CN.eucCN
export XMODIFIERS='@im=scim'
export GTK_IM_MODULE=scim
)
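The two sets of settings above differ only in shell syntax, so a small function can print the right form for the shell reported by `echo $0`. This is a sketch: `scim_env` is a hypothetical name, and it treats every shell other than csh/tcsh as sh-compatible.

```shell
# Hypothetical helper: print the SCIM environment settings in the
# syntax of the given shell. $1 = shell as printed by "echo $0",
# e.g. sh, bash, -csh (login shell) or /bin/tcsh.
scim_env() {
    sh_name=${1##*/}          # drop any directory part
    sh_name=${sh_name#-}      # drop the login-shell dash
    case $sh_name in
    csh|tcsh)
        printf '%s\n' \
            'setenv LANG zh_CN.eucCN' \
            'setenv LC_CTYPE zh_CN.eucCN' \
            'setenv XMODIFIERS @im=scim' \
            'setenv GTK_IM_MODULE scim' ;;
    *)
        printf '%s\n' \
            'export LANG=zh_CN.eucCN' \
            'export LC_CTYPE=zh_CN.eucCN' \
            "export XMODIFIERS='@im=scim'" \
            'export GTK_IM_MODULE=scim' ;;
    esac
}
```

Append the output to ~/.profile for bash/sh, or to ~/.cshrc for csh/tcsh, e.g. `scim_env bash >> ~/.profile`.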

Next, let's set up the sound card.
The simplest way is to edit the "Sound modules" part of /boot/defaults/loader.conf and enable the loading line for your card. For that you need to know your sound card model. Don't know it? No problem:
First run:
kldload snd_driver
# snd_driver is a meta driver that loads all sound drivers at once, so the card can be identified.
Then: dmesg | grep pcm

pcm0: ; port 0xe000-0xe03f,0xdc00-0xdcff irq 11 at device 31.5 on pci0
pcm0: ;

The status can also be queried through /dev/sndstat:

# cat /dev/sndstat
FreeBSD Audio Driver (newpcm)
Installed devices:
pcm0: ; at io 0xdc00, 0xe000 irq 11 bufsz 16384 kld snd_ich (1p/1r/0v channels duplex default)

This shows that the driver is snd_ich.

#vi /boot/defaults/loader.conf
Change: snd_ich_load="NO" # Intel ICH  (this line)
to: snd_ich_load="YES" # Intel ICH

As another example, on my machine # cat /dev/sndstat shows snd_es137x,
so I found this section of /boot/defaults/loader.conf:
##############################################################
###  Sound modules  ##########################################
##############################################################
located the line snd_es137x_load="NO" in it and changed it to snd_es137x_load="YES".
(Note: if your model is not listed in that section, you can add a line yourself in the form
<driver>_load="YES")
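The driver name can also be read from /dev/sndstat automatically. This is a minimal sketch that assumes the sndstat format shown above, with a "kld snd_xxx" token on each device line. `sound_load_lines` is a hypothetical helper. Note that loader.conf(5) intends /boot/defaults/loader.conf to stay unmodified and local overrides to go in /boot/loader.conf, so the sketch prints lines meant for the latter.

```shell
# Hypothetical helper: print loader.conf lines for the sound driver(s)
# named in /dev/sndstat (the "kld snd_xxx" token on each device line).
# $1 = file to read; defaults to /dev/sndstat.
sound_load_lines() {
    stat=${1:-/dev/sndstat}
    # keep only the driver name after "kld", one per device, no duplicates
    sed -n 's/.*kld \(snd_[A-Za-z0-9_]*\).*/\1/p' "$stat" | sort -u |
        while read -r drv; do
            printf '%s_load="YES"\n' "$drv"
        done
}
```

After `kldload snd_driver`, running `sound_load_lines >> /boot/loader.conf` as root makes the detected driver load at the next boot.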

Is everything working now? Not yet:
to display Chinese on the text console as well,
go to /usr/ports/chinese/cce and run make install clean to install CCE.
After installation, add the following to ~/.cshrc:
alias vi 'env LC_CTYPE=en_US.ISO8859-1 vi'
(this lets vi handle Chinese text correctly)
setenv LANG zh_CN.eucCN
setenv LC_CTYPE zh_CN.eucCN

Run cce when you need Chinese characters on the console, and exit to leave it.

Next, install the instant messaging client Gaim.
Go to /usr/ports/net-im/gaim-openq and run make install clean.

That's all: Gaim now works. QQ had changed its login protocol so that Gaim could no longer log in; thanks to the FreeBSD developers who promptly provided the openq-2006 patch.

FreeBSD 6.2 installation and configuration notes

jdilt — 2007/8/13 13:56:46


# Minimal installation.
sysinstall: set the time zone and other basics; install src-sys, ports, man, Xorg, bash3, cvsup-without-gui, lynx, linux and unzip;
enable sshd, and so on.

++ sshd: allow root login
vi /etc/ssh/sshd_config and add
PermitRootLogin yes
then save, exit and restart the service: /etc/rc.d/sshd restart

++ Switch from csh to bash
chfn -s /usr/local/bin/bash username

++ Start Gnome automatically
echo "exec /usr/local/bin/gnome-session" >/root/.xinitrc
and add gdm_enable="YES" to /etc/rc.conf

###++ Update ports with cvsup (on my second install I did not update with cvsup and used only what came with the system)
###cd /usr/share/examples/cvsup
###edit ports-supfile:
###*default host=cvsup.FreeBSDChina.org
###cvsup -g -L 2 ports-supfile

++ Use wget/axel to download port distfiles
This is done by editing make.conf:
#FETCH_CMD=proz -s -k 5 --no-curses
FETCH_CMD= wget -c -t 1
DISABLE_SIZE=yes

++ Choose ports mirrors
Edit /etc/make.conf as follows:
MASTER_SITE_OVERRIDE= \
ftp://ftp.FreeBSDChina.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp.cn.freebsd.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR} \
ftp://ftp.tw.FreeBSD.org/pub/FreeBSD/ports/distfiles/${DIST_SUBDIR}
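In a make.conf continuation list, every line except the last must end with a backslash; a backslash left on the last line would join it with whatever line follows. This is a sketch of a hypothetical helper, `site_override`, that builds the assignment from a list of mirror base URLs and puts the backslashes in the right places.

```shell
# Hypothetical helper: print a make.conf MASTER_SITE_OVERRIDE assignment.
# Arguments: mirror base URLs (without the trailing /${DIST_SUBDIR}).
# Every URL line except the last ends in " \" so make joins them into one value.
site_override() {
    printf 'MASTER_SITE_OVERRIDE= \\\n'
    remaining=$#
    for base in "$@"; do
        remaining=$((remaining - 1))
        if [ "$remaining" -gt 0 ]; then
            # more URLs follow: keep the line continuation
            printf '%s/${DIST_SUBDIR} \\\n' "$base"
        else
            # last URL: no backslash, so the assignment ends here
            printf '%s/${DIST_SUBDIR}\n' "$base"
        fi
    done
}
```

For example: `site_override ftp://ftp.cn.freebsd.org/pub/FreeBSD/ports/distfiles ftp://ftp.tw.FreeBSD.org/pub/FreeBSD/ports/distfiles >> /etc/make.conf`. The `${DIST_SUBDIR}` text is printed literally, so make expands it later.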

++ Install software with pkg_add (much faster than building with make)
mkdir /usr/ports/distfiles/pkg
export PACKAGESITE=ftp://ftp.freebsdchina.org/pub/FreeBSD/ports/i386/packages-6.2-release/Latest/
export PKGDIR=/usr/ports/distfiles/pkg
#export PACKAGESITE=ftp://ftp.freebsdchina.org/pub/FreeBSD/ports/i386/packages-6-stable/Latest/
#pkg_add -K keeps the downloaded .tbz packages in $PKGDIR; use this option if you want to keep them
pkg_add -rK wget prozilla gaim gaim-openq eva vsftpd gftp xpdf rdesktop stardict zh-stardict2-dict-zh_CN \
    compupic gthumb zh-fcitx xchm zh-unrar \
    firefox2 fusefs-libs fusefs-kmod
# (with the 6-stable package set, firefox is version 3.0)

++ Multimedia
Download and install these packages:
http://ftp.br.freebsd.org/local/packages/audio/lame-3.97_1.tbz
ftp://ftp.nsysu.edu.tw/FreeBSD/ports/i386/packages-5-stable/All/win32-codecs-3.1.0.p7_2,1.tbz
pkg_add -rK zh-xmms xmms-wma mplayer mplayer-fonts zh-mplayer-fonts kmplayer xine beep-media-player bmp-extra-plugins
aumix # mixer
++ Chinese in the xmms playlist
Open xmms, go to [Preferences] -> [Fonts] and set the playlist font to: -misc-simsun-medium-r-normal-*-*-120-*-*-p-*-gb2312.1980-0",*-r-*

++ Sound card
kldload snd_driver    # load all sound drivers at once
dmesg | grep pcm
cat /dev/sndstat; the output looks like this:
FreeBSD Audio Driver (newpcm)
Installed devices:
pcm0: ; at io 0xdc00, 0xe000 irq 11 bufsz 16384 kld snd_ich (1p/1r/0v channels duplex default)
This shows that the driver is snd_ich.
#vi /boot/defaults/loader.conf
Change: snd_ich_load="NO" # Intel ICH  (this line)
to: snd_ich_load="YES" # Intel ICH
(Note: if your model is not listed in that section, you can add a line yourself in the form
<driver>_load="YES")
Or compile the support directly into the kernel:
device sound
device snd_ich

++ Build a custom kernel
/stand/sysinstall -> Configure -> Distributions -> src -> sys
installs the kernel sources under /usr/src/sys.
/boot/kernel/kernel # the kernel file
cd /usr/src/sys/i386/conf && cp GENERIC NEWKER
vi NEWKER # comment out what you don't need; my changes:
#cpu  I486_CPU
#cpu  I586_CPU
cpu  I686_CPU
ident  NEWKER  # name of the new kernel
options SC_DISABLE_REBOOT # ignore CTRL+ALT+DEL on the console
# firewall
options IPFIREWALL
options IPFIREWALL_VERBOSE
options IPFIREWALL_VERBOSE_LIMIT=5
options TCP_DROP_SYNFIN
# accept filters used by apache2
options ACCEPT_FILTER_DATA
options ACCEPT_FILTER_HTTP
# PCI Ethernet NICs. # comment out the NIC models you don't have, keep yours
device  miibus  # MII bus support, keep this
/usr/sbin/config NEWKER # generate the kernel build directory
cd ../compile/NEWKER
make depend && make && make install
or, alternatively:
cd /usr/src
make buildkernel KERNCONF=NEWKER
make installkernel KERNCONF=NEWKER

++ Firewall
/etc/rc.conf
# add:
firewall_enable="YES"
firewall_script="/etc/rc.firewall"
firewall_type="/etc/ipfw.rules" # the custom firewall rule file
firewall_quiet="NO"
firewall_logging_enable="YES"
log_in_vain="NO"
tcp_drop_synfin="NO"
tcp_restrict_rst="YES"
icmp_drop_redirect="YES"
Save and exit.
vi /etc/ipfw.rules
# Note: mind the space before -q
-q -f flush
-q add 00301 allow all from any to any via lo0
-q add 00302 check-state
-q add 00303 allow tcp from any to 10.72.255.131 53 out via vr0 setup keep-state  # 10.72.255.131 is the DNS server; change it to your own
-q add 00400 allow udp from any to 10.72.255.131 53 out via vr0 keep-state    # vr0 is the network card; change it to yours (same in every rule)
-q add 00500 allow tcp from any to any 80 in via vr0 setup keep-state
-q add 00900 allow tcp from any to any 25 out via vr0 setup keep-state
-q add 01200 allow tcp from any to any via vr0 setup keep-state uid root
-q add 01300 allow icmp from any to any in via vr0  keep-state
-q add 01400 allow tcp from any to any 21 in via vr0 setup keep-state
-q add 01500 allow tcp from any to me  21 in via vr0 setup limit src-addr 2
-q add 01600 allow tcp from any to any 22 in via vr0 setup keep-state
-q add 01800 allow tcp from any to me  22 in via vr0 setup limit src-addr 2
Save and exit.
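The inbound-service rules above all follow one pattern, so they can be generated. This is a sketch with a hypothetical helper, `ipfw_allow_in`, that prints a rule in the same "-q add" form used in /etc/ipfw.rules, zero-padding the rule number to five digits as above. Each rule admits new TCP connections to a port on this host ("to me") and keeps state for the replies.

```shell
# Hypothetical helper: print one /etc/ipfw.rules line that lets new TCP
# connections in to a local port and keeps state for the reply traffic.
# $1 = rule number, $2 = destination port, $3 = interface (e.g. vr0)
ipfw_allow_in() {
    # %05d zero-pads the rule number to five digits, as in the rules above
    printf '%s add %05d allow tcp from any to me %s in via %s setup keep-state\n' \
        '-q' "$1" "$2" "$3"
}
```

For example, `ipfw_allow_in 1600 22 vr0 >> /etc/ipfw.rules` appends an SSH rule.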

++ Fonts
Copy simsun.ttc and tahoma.ttf from Windows XP into /usr/X11R6/lib/X11/fonts/TTF/, changing the extension to ttf
fc-cache -fv

++ fcitx
cd /usr/ports/chinese/fcitx
make install clean
Add to ~/.profile (I use bash):
export LANG="zh_CN.eucCN"
export LC_CTYPE="zh_CN.eucCN"
export XMODIFIERS='@im=fcitx'

++ vim setup
cd /usr/ports/editors/vim && make install clean # install vim
cp /usr/local/share/vim/vim70/vimrc_example.vim ~/.vimrc
Edit ~/.vimrc and comment out "set nocompatible".

++ NTFS write support
Install the kernel source.
Download by hand:
http://ftp.lv.freebsd.org/pub/FreeBSD/ports/packages/Latest/fusefs-ntfs.tbz
pkg_add path/fusefs-ntfs.tbz

++ Java
1. Open http://www.sun.com/software/java2/download.html and download the SCSL Source file, jdk-1_5_0-src-scsl.zip, and the SCSL Binaries file, jdk-1_5_0-bin-scsl.zip.
2. In addition, download the patchset, bsd-jdk15-patches-3.tar.bz2, from http://www.eyesbeyond.com/freebsddom/java/jdk15.html.
3. Manually fetch the J2SE SDK self-extracting file for the Linux platform (j2sdk-1_4_2_12-linux-i586.bin) from http://javashoplm.sun.com/ECom/docs/Welcome.jsp?StoreId=22&PartDetailId=j2sdk-1.4.2_12-oth-JPR&SiteId=JSC&TransactionId=noreg
4. Place the downloaded files in /usr/ports/distfiles.
pkg_add -rK m4
pkg_add -rK zip

++ vsftpd
Edit /usr/local/etc/vsftpd.conf:
listen=YES
ftp_username=ftp
local_enable=YES
anon_upload_enable=YES
anon_mkdir_write_enable=YES
write_enable=YES

++ Linux fdisk
pkg_add -rK linuxfdisk

=== Everyday use ===
++ Mounting devices
mount_cd9660 -C gbk /dev/acd0 /mnt/cdrom # or -C eucCN, to show Chinese file names
mount_msdosfs # FAT
mount_ntfs     # NTFS

++ Searching ports by keyword
cd /usr/ports/
make search key=ldap
make search name=ldap # when you know the name

make fetch-recursive # download the distfiles of a port and all its dependencies

VI Command Summary

jdilt — 2007/8/9 16:42:46

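I. Movement commands
    01. h: move one character left
    02. l: move one character right
    03. j: move one line down
    04. k: move one line up

    05. 0: move to the beginning of the current line
    06. $: move to the end of the current line
    07. ^: move to the first non-blank character of the current line

    08. b: move to the first character of the previous word
    09. w: move to the first character of the next word
    10. e: move to the last character of the word

    11. H: move to the first line of the screen
    12. M: move to the middle line of the screen
    13. L: move to the last line of the screen
    14. Ctrl + f: page down
    15. Ctrl + b: page up
    16. Ctrl + d: half a page down
    17. Ctrl + u: half a page up

    18. n-: move up n lines
    19. n+: move down n lines
    20. nG: move to line n (with n 0 or omitted, move to the last line)
    21. fx: move forward to the next character x
    22. Fx: move backward to the previous character x
    23. tx: move forward to just before the next character x
    24. Tx: move backward to just after the previous character x
    25. ;: repeat the last f/t command
    26. ,: repeat the last f/t command in the opposite direction
    27. /string: search forward for string
    28. ?string: search backward for string
    29. n: repeat the last / or ? search
    30. N: repeat the last / or ? search in the opposite direction

    31. n(: move back n sentences (sentences end with . ? or !)
    32. n): move forward n sentences
    33. n{: move back n paragraphs (paragraphs are separated by blank lines)
    34. n}: move forward n paragraphs

II. Editing commands
    01. a: append after the cursor
    02. A: append at the end of the line
    03. i: insert before the cursor
    04. I: insert at the beginning of the line
    05. o: open a new line below the cursor line
    06. O: open a new line above the cursor line

    07. x: delete the character at the cursor
    08. r: replace the character at the cursor (rx replaces it with x)
    09. R: overwrite text from the cursor until Esc is pressed
    10. s: delete the character at the cursor and enter insert mode
    11. S: delete the current line and enter insert mode
    12. u: undo the last change
    13. U: undo all changes on the current line

    14. d (delete), y (copy) and c (change) can be combined with e, w, b, $, 0, ), (, }, { to act on the corresponding range of text
    15. p: paste
    16. D: delete from the cursor to the end of the line
    17. dd: delete the current line
    18. cc: delete the current line and enter insert mode
    19. yy: copy the current line
    20. v: select a range (Visual mode)

III. Quit commands
    01. :q: quit when nothing has been modified
    02. :q!: discard the changes and quit
    03. :w: save
    04. :wq: save and quit
    05. :x: save and quit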

VI Notes

jdilt — 2007/7/31 16:16:24
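vi's working modes

After vi starts, it is in edit mode. In this mode the user can press predefined keys to move the cursor, delete text, copy or paste text, and so on. These keys are ordinary characters: for example, l moves the cursor right, like the right-arrow key, and k moves the cursor up, like the up-arrow key. In edit mode the user can also select text with special keys and then delete or copy it.

Typing a command such as i, a or o in edit mode enters insert mode, and typing : enters command mode. In insert mode, every character typed except Esc is inserted into the edit buffer. Pressing Esc switches from insert mode back to edit mode.

In command mode, vi moves the cursor to the bottom line of the screen and shows a : (colon) in the first column; the user can then type commands. These commands can save files, read file contents, run shell commands, set vi options, and search for or replace strings using regular expressions.

Edit mode

Moving the cursor

To modify text you first have to move the cursor to the right place. The simplest way is the up, down, left and right arrow keys. Beyond this basic method, vi provides many character commands that move the cursor quickly to a given line or column. For example:
k, j, h, l: same as the up, down, left and right arrow keys
Ctrl+b: move one page up in the file (like PageUp)
Ctrl+f: move one page down in the file (like PageDown)
H: move the cursor to the top line of the screen (Highest)
nH: move the cursor to line n of the screen
2H: move the cursor to line 2 of the screen
M: move the cursor to the middle of the screen (Middle)
L: move the cursor to the bottom line of the screen (Lowest)
nL: move the cursor to the nth line from the bottom of the screen
3L: move the cursor to the third line from the bottom of the screen
w: move right to the start of the next word
e: move right to the end of the word
b: move left to the start of the previous word
0: (zero) move to the beginning of the line
$: move to the end of the line
^: move to the first non-blank character of the line

Replacing and deleting

Once the cursor is in place, you can replace the character under it with another, or delete one or more characters from the cursor position. For example:
rc: replace the character under the cursor with c
nrc: replace n characters, starting at the cursor, with c
5rc: replace 5 characters, starting at the cursor, with c
x: delete the character under the cursor
nx: delete n characters starting at the cursor
3x: delete 3 characters starting at the cursor
dw: delete the word to the right of the cursor
ndw: delete n words to the right of the cursor
3dw: delete 3 words to the right of the cursor
db: delete the word to the left of the cursor
ndb: delete n words to the left of the cursor
5db: delete 5 words to the left of the cursor
dd: delete the cursor line and close the gap
ndd: delete (cut) n lines and close the gap
3dd: delete (cut) 3 lines and close the gap
dd: delete the current line
d, n, Down arrow: delete the current line and the n lines below it
d, n, Up arrow: delete the current line and the n lines above it
R: overwrite the text character by character from the cursor until ESC is pressed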
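The same line deletions can be done outside the editor. As a sketch, vi's ndd starting on a given line corresponds to a sed range delete; `delete_lines` is a hypothetical name, and it prints the result rather than changing the file in place.

```shell
# Hypothetical helper, a non-interactive analogue of vi's "ndd":
# print file $3 with $2 lines removed, starting at line $1.
delete_lines() {
    start=$1
    end=$(( $1 + $2 - 1 ))
    sed "${start},${end}d" "$3"
}
```

For example, `delete_lines 10 3 notes.txt > notes.new` does what `3dd` does with the cursor on line 10.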
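
Copying and pasting

Text deleted from the document (characters, words or lines) is not really lost: it is cut into a memory buffer, and the user can paste it at a chosen position. The commands are:
p (lowercase p): paste the buffer contents after the cursor
P (uppercase P): paste the buffer contents before the cursor
If the buffer holds characters or words, they are pasted directly before or after the cursor; if it holds whole lines, they are pasted on the line above or below the cursor line.
Note the case of the letters: vi often uses an uppercase/lowercase pair (such as p and P) for a pair of similar functions. Usually the lowercase command acts after the cursor and the uppercase one before it.
Sometimes you need to copy text to a new place while keeping it in its original place. In that case, copy (rather than cut) it into the memory buffer first, using:
yy: copy the current line into the buffer
nyy: copy n lines into the buffer
5yy: copy 5 lines into the buffer

Searching for strings

Like many advanced editors, vi has a powerful string search. To find where a word or phrase occurs in the file, let vi search for it instead of looking by hand: type /, then the string, then press Enter. vi searches forward (towards the end of the file) and, when it finds the string, places the cursor at its beginning; type n to find its next occurrence. Using ? instead of / searches backward (towards the beginning of the file). For example:
/str1: search forward for str1
n: find the next occurrence of str1
?str2: search backward for str2
Whichever the direction, the search wraps around to the other end when it reaches the end or the beginning of the file.

Undo and repeat

While editing, you can undo the effect of a wrong command; and if you want to repeat a previous editing command at a new cursor position, use the repeat command:
u: undo the previous command
.: repeat the last command that changed the text
Alt + u: undo the previous command

6. Selecting text

vi can enter a mode called Visual mode, in which you select text by moving the cursor and then run editing commands such as delete or copy on it. v selects characters; V selects lines.

Insert mode

Entering insert mode

Once the cursor is positioned in edit mode, use these commands to switch to insert mode:
i: insert text to the left of the cursor
a: insert text to the right of the cursor
o: open a new line below the cursor line
O: open a new line above the cursor line
I: insert at the beginning of the cursor line
A: insert at the end of the cursor line
Besides these simple ways into insert mode, some commands first delete text and then enter insert mode, so that what you type replaces it:
s: replace the character under the cursor with the typed text
ns: replace n characters to the right of the cursor
cw: replace the word to the right of the cursor
ncw: replace n words to the right of the cursor
cb: replace the word to the left of the cursor
ncb: replace n words to the left of the cursor
cc: replace the cursor line
ncc: replace n lines starting at the cursor line
c$: replace everything from the cursor to the end of the line
c0: replace everything from the beginning of the line to the cursor

Leaving insert mode

To leave insert mode, press ESC or Ctrl+[.

Command mode

vi's command mode accepts more complex commands. Typing ":" in edit mode moves the cursor to the last line of the screen and shows a colon there; you are now in command mode, also called "last-line mode". Everything you type appears on the last line of the screen; press Enter and vi executes the command.

Quitting

In edit mode, ZZ quits vi, saving your changes and overwriting the original file. To quit without saving, use:
:q quit when nothing has been modified
:q! discard all changes and quit
:wq save the changes and quit

2. Line numbers and files

Every line being edited has its own line number; this command moves the cursor to a given line:
:n move the cursor to line n
In command mode you can give a range of line numbers for a command. A number is an absolute line number; "." is the number of the cursor line; "$" is the number of the last line; simple expressions are allowed, e.g. ".+5" is the fifth line below the current one. For example:
:345 move the cursor to line 345
:345w file write line 345 to file
:3,5w file write lines 3 to 5 to file
:1,.w file write line 1 through the current line to file
:.,$w file write the current line through the last line to file
:.,.+5w file write 6 lines, starting at the current line, to file
:1,$w file write everything to file (same as :w file)
In command mode you can also read text from files or write it to files, for example:
:w write the buffer to the original file (saves work in progress)
:wq write the buffer to the original file and quit (same as ZZ)
:w file write the buffer to file, leaving the original file unchanged
:a,bw file write lines a to b to file
:r file read file and insert its contents after the cursor line
:e file edit a new file, file, in place of the current one
:f file rename the current file to file
:f print the current file name and status, such as the number of lines and the cursor line number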
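The :a,bw file command has a direct sed counterpart, `sed -n 'a,bp'`. This is a sketch with a hypothetical helper name. As in vi, "$" stands for the last line, while "." (the current line) has no meaning outside the editor.

```shell
# Hypothetical helper, an analogue of vi's ":a,bw file":
# copy lines $1 through $2 of file $3 into file $4 ("$" = last line).
write_range() {
    sed -n "$1,$2p" "$3" > "$4"
}
```

For example, `write_range 3 5 notes.txt part.txt` matches `:3,5w part.txt`, and `write_range 4 '$' notes.txt tail.txt` matches `:4,$w tail.txt`.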
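
Searching for strings

You can reach a line by searching for a string it contains. For a forward search, put the string between two "/"; for a backward search, put it between two "?". For example:
:/str/ search forward: move the cursor to the next line containing str
:?str? search backward: move the cursor to the previous line containing str
:/str/w file search forward and write the first line containing str to file
:/str1/,/str2/w file search forward and write from the line containing str1 through the line containing str2 to file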
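Pattern ranges such as :/str1/,/str2/w file also work in sed. This is a sketch with a hypothetical helper name. Unlike vi, which searches from the cursor and writes a single range, sed writes every str1...str2 range in the file. The patterns are basic regular expressions and must not contain "/".

```shell
# Hypothetical helper, an analogue of vi's ":/str1/,/str2/w file":
# copy each block from a line matching $1 to the next line matching $2
# out of file $3 into file $4. $1 and $2 are basic regexes without "/".
write_between() {
    sed -n "/$1/,/$2/p" "$3" > "$4"
}
```

For example, `write_between '^Section "Screen"' '^EndSection' xorg.conf.new screen.txt` extracts the Screen section from the file generated earlier.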

MetaQuerier: Exploring and Integrating the Deep Web

jdilt — 2007/7/26 13:20:58



This research aims at enabling effective access to structured information sources on the Internet. Over the past few years, the Web has deepened dramatically: a significant and increasing amount of information is hidden on the "deep" Web, behind the query interfaces of searchable databases. There are numerous such autonomous and heterogeneous sources, each with a different schema and native query constraints. Because current crawlers cannot effectively query databases, such data is invisible to traditional search engines, and thus remains largely hidden from users.

We propose to build a metaquery system, to help users in finding and querying online databases effectively and uniformly. Our efforts aim at opening up the deep Web to users, by building a MetaQuerier; see the architecture below. On this wild frontier of the deep Web, the MetaQuerier will address the challenges of both exploration and integration. Our goal is thus twofold: First, to make the deep Web systematically accessible: the MetaExplorer will discover sources on the deep Web to build a searchable repository, in order to help users find sources useful for their information need. Second, to make the deep Web uniformly usable: the MetaIntegrator will help users interact with online databases to ask queries.

Projects

First, the MetaExplorer project focuses on the discovery, modeling, and structuring of databases on the Web, to build a searchable source repository. Essentially, this MetaExplorer project will develop a "search engine" of Web databases: It will develop crawlers for efficiently discovering databases on the Internet, design models for representing these databases, develop wrappers for automatically extracting their model parameters (e.g., schema details on their query interfaces), and structure and index a searchable repository of Web sources.

Second, the MetaIntegrator project focuses on the integration issues of online sources, i.e., bringing sources coherently together for query answering. Specifically, we will investigate source selection, query mediation, and schema integration, for building the MetaIntegrator. In studying large-scale integration, these thrusts will benefit from the source repository of the companion MetaExplorer. We will investigate the key enabling technology of dynamic ad-hoc information integration. In contrast to a traditional static system, our MetaIntegrator is dynamic (as new sources may be added any time when they are discovered) and essentially requires ad-hoc integration, which must dynamically select and bring together different sources to answer a query.
Given the pressing need for effective access to the deep Web, we believe the synergy between the exploration and integration focuses of the two sub-projects will together bring a more complete and timely solution for realizing our MetaQuerier goal.

Funding
We gratefully acknowledge our funding sources:

NSF CAREER Award 2002, IIS-0133199: for MetaExplorer

NSF ITR Award 2003, IIS-0313260: for MetaIntegrator

NSF REU/ITR Award 2004, IIS-0434721

Intel WIE Intel Scholars Grant 2004

NCSA (National Center for Supercomputing Applications) Faculty Fellows Award 2003

UIUC Faculty Startup Funds

People

Kevin Chen-Chuan Chang

Bin He

Chengkai Li

Zhen Zhang

Govind Kabra

Shui-Lung Chuang

Publications

Context-Aware Wrapping: Synchronized Data Extraction. S.-L. Chuang, K. C.-C. Chang, and C. Zhai. To appear in Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB), Vienna, Austria, September 23-28 2007. [PDF]

Accessing the Deep Web: A Survey. B. He, M. Patel, Z. Zhang, and K. C.-C. Chang. Communications of the ACM (CACM), 50(5):94-101, May 2007. [PDF]

Collaborative Wrapping: A Turbo Framework for Web Data Extraction. S.-L. Chuang, K. C.-C. Chang, and C. Zhai. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE), Istanbul, Turkey, April 2007. [PDF] (poster)

Automatic Complex Schema Matching across Web Query Interfaces: A Correlation Mining Approach. B. He and K. C.-C. Chang. ACM Transactions on Database Systems (TODS), 31(1), March 2006. [PDF]
Light-weight Domain-based Form Assistant: Querying Web Databases On the Fly. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the 31st Very Large Data Bases Conference (VLDB 2005), Trondheim, Norway, August 2005. [PDF]
Making Holistic Schema Matching Robust: An Ensemble Approach. B. He and K. C.-C. Chang. In Proceedings of the 2005 ACM SIGKDD Conference (KDD 2005) (Full Paper), Chicago, Illinois, August 2005. [PDF]

Query Routing: Finding Ways in the Maze of the Deep Web. G. Kabra, C. Li, and K. C.-C. Chang. In Proceedings of the ICDE International Workshop on Challenges in Web Information Retrieval and Integration (ICDE-WIRI 2005), Tokyo, Japan, April 2005. [PDF]

Toward Large Scale Integration: Building a MetaQuerier over Databases on the Web. K. C.-C. Chang, B. He, and Z. Zhang. In Proceedings of the Second Conference on Innovative Data Systems Research (CIDR 2005), Asilomar, California, January 2005. [PDF]

Mining Semantics for Large Scale Integration on the Web: Evidences, Insights and Challenges. K. C.-C. Chang, B. He, and Z. Zhang. SIGKDD Explorations, 6(2):67-76, December 2004. Invited paper. [PDF]

A Holistic Paradigm for Large Scale Schema Matching. B. He and K. C.-C. Chang. SIGMOD Record, 33(4):20-25, December 2004. Invited paper. [PDF]

Organizing Structured Web Sources by Query Schemas: A Clustering Approach. B. He, T. Tao, and K. C.-C. Chang. In Proceedings of the 13th Conference on Information and Knowledge Management (CIKM 2004) (Full Paper), Washington D.C., November 2004. [PDF]

Structured Databases on the Web: Observations and Implications. K. C.-C. Chang, B. He, C. Li, M. Patel, and Z. Zhang. SIGMOD Record, 33(3):61-70, September 2004. [PDF]

MetaQuerier over the Deep Web: Shallow Integration across Holistic Sources. K. C.-C. Chang, B. He, and Z. Zhang. In Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb'04), Toronto, Canada, August 2004. [PDF]

On-the-fly Constraint Mapping across Web Query Interfaces. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the VLDB Workshop on Information Integration on the Web (VLDB-IIWeb'04), Toronto, Canada, August 2004. [PDF]

Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach. B. He, K. C.-C. Chang, and J. Han. In Proceedings of the 2004 ACM SIGKDD Conference (KDD 2004) (Full Paper), Seattle, Washington, August 2004. [PDF]

Mining Complex Matchings across Web Query Interfaces. B. He, K. C.-C. Chang, and J. Han. In Proceedings of the 9th ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (SIGMOD-DMKD'04) (Full Paper), Paris, France, June 2004. [PDF]

Understanding Web Query Interfaces: Best-Effort Parsing with Hidden Syntax. Z. Zhang, B. He, and K. C.-C. Chang. In Proceedings of the 2004 ACM SIGMOD Conference (SIGMOD 2004), Paris, France, June 2004. [PDF]

Clustering Structured Web Sources: A Schema-based, Model-Differentiation Approach. B. He, T. Tao, and K. C.-C. Chang. In Proceedings of the EDBT Workshop on Clustering Information over the Web (EDBT-ClustWeb'04), Crete, Greece, March 2004. An expanded version of this paper, invited to be a part of the Current Trends in Database Technology volume, is published in the Springer-Verlag Lecture Notes in Computer Science Series Vol. 3268. [PDF]

Statistical Schema Matching across Web Query Interfaces. B. He and K. C.-C. Chang. In Proceedings of the 2003 ACM SIGMOD Conference (SIGMOD 2003), San Diego, California, June 2003. [PDF]

Approximate Query Translation Across Heterogeneous Information Sources. K. C.-C. Chang and H. Garcia-Molina. In Proceedings of the 26th VLDB Conference (VLDB 2000), pages 566-577, Cairo, Egypt, September 2000. [Extended Version]

Technical Reports

A Structure-Driven Yield-Aware Web Form Crawler: Building a Database of Online Databases. B. He, C. Li, D. Killian, M. Patel, Y. Tseng, and K. C.-C. Chang. UIUCDCS-R-2006-2752, Department of Computer Science, UIUC, July 2006. [PDF]

Tutorials

Accessing the Web: From Search to Integration. K. C.-C. Chang and J. Cho. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006. Tutorial description. [PDF] [Part II: Web Integration; Bibliography]

Demos

Online Demo: Query capability extraction for understanding Web query interfaces

MetaQuerier: Querying Structured Web Sources On-the-fly. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 2005 ACM SIGMOD Conference (SIGMOD 2005), System Demonstration, Baltimore, Maryland, June 2005. [PDF]

MetaQuerier: Querying Structured Web Sources On-the-fly. B. He, Z. Zhang, and K. C.-C. Chang. In Second Midwest Database Research Symposium, Chicago, Illinois, April 2005.

Towards Building a MetaQuerier: Extracting and Matching Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), System Demonstration, Tokyo, Japan, April 2005. [PDF]

Towards Building a MetaQuerier: Extracting and Matching Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In NSF Information and Data Management (IDM) Workshop 2004, Boston, Massachusetts, October 2004.

Knocking the Door to the Deep Web: Integrating Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In Proceedings of the 2004 ACM SIGMOD Conference (SIGMOD 2004), System Demonstration, Paris, France, June 2004. [PDF]

Toward a MetaQuerier for the Deep Web: Integrating Web Query Interfaces. B. He, Z. Zhang, and K. C.-C. Chang. In First Midwest Database Research Symposium, Chicago, Illinois, April 2004.

Knocking the Doors to the Deep Web: Understanding Web Query Interfaces. Z. Zhang, B. He, and K. C.-C. Chang. In NSF Information and Data Management (IDM) Workshop 2003, Seattle, Washington, September 2003.

Datasets

The UIUC Web Integration Repository

[Repost] Uses of the "Invisible Web" and tools for searching it

jdilt — 2007/7/26 12:38:11


��־��

"��ҳ"��The Invisible Web��ָ��̳��ǲ�Ը��ĳЩ��ݣ��Ϊ��ԭ��ͨ��棨popular search engines��޷��ݡ��Щ��ͨ��"֩��"��׽ӽ��"��"��deep Web��Ҫ��ҳ��visible Web��500�౶��Invisible Web �Ѿ��ѧ�ߺ��ߵĹ�ע��ԣ��վ��Ѱ�󹹽��ܹ��ʾInvisible Web��Ŀ¼ָ�ϣ��Ľ��ܵȶ��ֶԲߣ��;��ԣ��û�Ӧע��ԣ��Ϥ��Ŀ¼��ר��棬��Ӧ��ɡ�һ��˵��Invisible Web��Դ�Ŀ¼ָ�ϣ�directories��м��ܵ��վ��searchable sites��ݿ⣨free Web databases��Լ�ר��棨specialized search engines��ͨ��;��֣�ѡ��ʹ��Ӧ�ļ��ߡ�

һ��Ŀ¼ָ��

1��Librarians' Index to the Internet( http://lii.org/)��һ��ľ��ͼ��Աɸѡ��ά��İ��14,000��վ��Ŀ¼��ڲ�ѯ��һ��ʼ��"and databases"�Ϳ��Խ��ص�
"Invisible Web"��Դ��"biology and databases"��ݿ⣩��Ϳ��ҵ��ͨ��޷��й��﷽��ݿ��Դ��

2��FindLaw ( http://www.findlaw.com/)��ķ��վ��ڷ��Ŀ��ݿ⣬�ǲ��ҷ��Invisible Web�ĳ��ù��ߡ�

3��InfoMine ( http://infomine.ucr.edu)��ͼ��Ա��Ƶİ��120,000 ��ѧ��վ�ķ��Ŀ¼��

4��About.com ( http://www.about.com/) ��ݹ㷺��ѯ��ƣ��ڶ�"Invisible Web"��Դ��о�ѡ��ź��ۣ��"Invisible Web"��ҵ��ܶ��ҳ��ӣ��磺"Invisible Web: The Cloaked Internet"��"��ҳ"��ڸǵ��Դ�� " Visible versus Invisible Web"��ӿɼ��ҳ��"��ҳ"��ȵȣ� ��Ϊ��"Invisible Web"��ָ�ϡ�

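5. Academicinfo (http://www.academicinfo.net/): a guide to academic resources that provides a gateway suited to university users. Its "Subject Gateway" subdivides topics by field of knowledge and gathers links to databases and other resources for each discipline. It is a subject guide centered on the electronic resources of libraries and academic institutions, and it is easy to use.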

II. "Invisible Web" Sites

1. Direct Search (http://www.freepint.com/gary/direct.htm): an authoritative collection of links to "Invisible Web" sites, with a very large set of links to Invisible Web resources.

2. The Invisible Web Directory (http://www.invisible-web.net/): the companion site to the book The Invisible Web: Uncovering Information Sources Search Engines Can't See by Chris Sherman and Gary Price. It is dedicated to pointing users to Invisible Web resources, and its stated aim is "Finding Hidden Internet Resources Search Engines Can't See".

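3. Profusion (http://www.profusion.com): an intelligent meta-search engine from Intelliseek. It offers 21 resource categories, including Web, News, Jobs, MP3, Downloads, Legal, and Discussions. It can retrieve information from databases, encyclopedias, and other sources that general search engines cannot reach.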

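4. CompletePlanet (http://www.completeplanet.com/): a site run by BrightPlanet that covers more than 70,000 searchable databases and specialized search engines. Most of these databases cannot be searched by general search engines.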

III. Invisible Web Databases

1. AnimalSearch (http://animalsearch.net/): a family-friendly database of animal-related websites.

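2. Educator's Reference Desk (http://www.eduref.org/): brings together resources built up on the AskERIC site over more than a decade. These include more than 2,000 lesson plans, more than 3,000 links to education information, and more than 200 question archives. It also provides search access to the ERIC research database (the largest education resource database) and to GEM (Gateway to Educational Materials).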

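3. NatureServe Explorer (http://www.natureserve.org/explorer): an online encyclopedia providing authoritative information on more than 60,000 plants, animals, and ecosystems of the United States and Canada.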

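4. Nuclear Explosions Database (http://www.ga.gov.au/oracle/nukexp_query.html): a database from Geoscience Australia that gives the location, time, and size of nuclear explosions worldwide since 1945. The "Online Tools" link under "databases" lists other map tools and databases.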

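5. PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi): searches more than 14 million citations in MEDLINE (the database of the U.S. National Library of Medicine, NLM) and related sources. The NLM's National Center for Biotechnology Information (NCBI) also offers PubMed Central (PMC), a life-science journal archive with the full text and abstracts of more than 160 professional journals. In addition, the Bookshelf database lets users search the full text of biomedical books. PubMed also links to other resources in NCBI's Entrez search system, such as journals and molecular biology databases.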

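6. LookSmart's FindArticles (http://www.findarticles.com/): a LookSmart full-text database that offers free full-text search and printing of articles from more than 900 publications. It can also be reached through the "Articles" button on the LookSmart search page (http://search.looksmart.com/).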

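7. Directory of Open Access Journals (http://www.doaj.org/): an open-access journal directory launched by Lund University Libraries in Sweden in May 2003. It lists more than 1,300 journals, and more than 300 of them, in the natural and social sciences, can be searched in full text at the article level.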

IV. Search Engines

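1. Incywincy (http://www.incywincy.com/): an Invisible Web search engine powered by Net Research Server (NRS). It builds on the Open Directory Project (ODP) directory hosted at DMOZ (http://dmoz.org/). Its "spider" explores the sites listed there and also crawls the Invisible Web pages within the ODP.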

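2. Google Scholar (http://scholar.google.com): a subset of Google covering medicine, physics, and other disciplines. It finds influential scholarly literature such as papers, theses, books, and abstracts, and it can list the different versions of a paper and the number of times it has been cited. It also searches several specialized databases, including PubMed and the MEDLINE and PreMEDLINE databases of the National Center for Biotechnology Information (NCBI).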

3. Singingfish (http://www.singingfish.com): an audio/video search engine that supports multimedia formats such as Windows Media, Real, QuickTime, and MP3.

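4. Google News (http://news.google.com/): a news search service. Google draws on more than 4,500 news sources and updates automatically every 15 minutes. "Top Stories" shows how different sources report the same story. Yahoo! News, Topix.net, and Daypop offer similar features.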

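5. Scirus (http://www.scirus.com/): a science-specific search engine indexing about 167 million web pages. Its advanced search lets users filter results in several ways:
- by subject, e.g. Agricultural and Biological Sciences, Astronomy...
- by information source, e.g. NASA, US Patent Office...
- by file format, e.g. PDF, HTML...
- by information type, e.g. Abstracts, Articles, Books...

Its content includes scholarly literature from some 1,920 journals, along with patents, reports, news, and research web pages, so its coverage is broad.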

In addition, general search engines with strong support for non-HTML files, such as Google (http://www.google.com/), Yahoo! (http://www.yahoo.com/), and Gigablast (http://www.gigablast.com/), are also useful tools for searching the Invisible Web.

3

jdilt — 2007/7/25 16:03:54

Data Preprocessing

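1. Data cleaning: remove noise and correct inconsistencies in the data.

2. Data integration: merge data from multiple sources into a coherent data store.

3. Data transformation: for example, normalization maps values into a small range such as [0.0, 1.0]. This can improve the accuracy and efficiency of mining algorithms that involve distance measurements.

4. Data reduction: reduce the data size through aggregation, elimination of redundant features, or clustering.

5. Discretization & concept hierarchy generation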
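The transformation and discretization steps above can be sketched in a few lines. This is a minimal illustration that assumes plain Python lists of numbers. `min_max_normalize` applies the standard min-max formula v' = (v - min) / (max - min) * (new_max - new_min) + new_min. `equal_width_bins` is one simple discretization scheme (equal-width binning) and is only one of several options. Both function names are hypothetical and do not come from the notes.

```python
from typing import List


def min_max_normalize(values: List[float], new_min: float = 0.0,
                      new_max: float = 1.0) -> List[float]:
    """Linearly map values from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: map everything to new_min to avoid dividing by zero.
        return [new_min for _ in values]
    # Multiply before dividing so that v == hi lands exactly on new_max.
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]


def equal_width_bins(values: List[float], k: int) -> List[int]:
    """Discretize values into k equal-width intervals; returns a bin index 0..k-1."""
    lo, hi = min(values), max(values)
    # If all values are equal the width would be 0; fall back to 1.0 so every value gets bin 0.
    width = (hi - lo) / k or 1.0
    # min(..., k - 1) puts the maximum value into the last bin instead of a bin k.
    return [min(int((v - lo) / width), k - 1) for v in values]
```

For example, with a minimum of 12,000 and a maximum of 98,000, `min_max_normalize([12000, 73600, 98000])` maps 73,600 to roughly 0.716.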

Data Mining Study Notes (Part 1)

jdilt — 2007/7/22 13:09:50

Chapter 1

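Data warehouse: a repository of information collected from multiple sources, stored under a unified schema (usually at a single site), and used to support decision analysis.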

Data warehouse technology includes data cleaning, data integration, and OLAP.

OLAP (on-line analytical processing) is an analysis technique that provides summarization, consolidation, and aggregation.

The data mining (DM), or knowledge discovery from data (KDD), process:

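1. Data cleaning: remove noise and inconsistent data.

2. Data integration: combine multiple data sources.

3. Data selection: retrieve the data relevant to the analysis task from the database.

4. Data transformation: transform or consolidate the data into forms appropriate for mining.

5. Data mining: apply intelligent methods to extract data patterns.

6. Pattern evaluation: use interestingness measures to identify the truly interesting patterns that represent knowledge.

7. Knowledge presentation: use visualization and knowledge representation techniques to present the mined knowledge to the user.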
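The toy illustration below shows how the seven steps feed into one another. It runs them on two small in-memory sources: a list of purchase records and a customer-to-region table. Everything here is a hypothetical assumption: the data layout, the function name `kdd_pipeline`, and the use of simple item frequency as the mined "pattern". Real systems use far richer methods at each step.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def kdd_pipeline(transactions: List[Dict[str, Optional[str]]],
                 customers: Dict[str, str],
                 region: str,
                 min_support: float) -> List[Tuple[str, float]]:
    """Toy walk-through of the seven KDD steps (hypothetical data layout)."""
    # 1. Data cleaning: drop records with a missing item and exact duplicates.
    seen = set()
    cleaned = []
    for t in transactions:
        key = (t["cid"], t["item"])
        if t["item"] is None or key in seen:
            continue
        seen.add(key)
        cleaned.append(t)
    # 2. Data integration: attach each customer's region from a second source.
    integrated = [dict(t, region=customers.get(t["cid"])) for t in cleaned]
    # 3. Data selection: keep only the records relevant to this task (one region).
    selected = [t for t in integrated if t["region"] == region]
    # 4. Data transformation: bring item names into one uniform form.
    items = [t["item"].strip().lower() for t in selected]
    # 5. Data mining: count how often each item occurs.
    counts = Counter(items)
    total = len(items)
    # 6. Pattern evaluation: keep only items whose support meets the threshold.
    patterns = [(item, n / total) for item, n in counts.items()
                if n / total >= min_support]
    # 7. Knowledge presentation: most frequent patterns first.
    return sorted(patterns, key=lambda p: (-p[1], p[0]))
```

Each numbered comment corresponds to one step in the list. In practice the steps are iterative rather than a single straight pass.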

Data mining functionalities: what kinds of patterns can be mined?

Data mining tasks fall into two categories: descriptive and predictive.

Descriptive mining tasks characterize the general properties of the data in the database. Predictive mining tasks perform inference on the current data in order to make predictions.


Data mining functionalities, and the kinds of patterns they can discover, are described below.

1. Concept/class description: characterization and discrimination

These descriptions can be derived via

(1) data characterization, by summarizing the data of the class under study (often called the target class) in general terms

(2) data discrimination, by comparison of the target class with one or a set of comparative classes (often called the contrasting classes)

(3) both data characterization and discrimination
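A minimal sketch of the difference between characterization and discrimination follows. It assumes records are dicts with a numeric attribute; the `gpa` field used in the usage note and both function names are hypothetical. Characterization summarizes the target class alone. Discrimination sets the same summary against a contrasting class.

```python
from statistics import mean
from typing import Dict, List


def characterize(rows: List[dict], attr: str) -> Dict[str, float]:
    """Data characterization: summarize one numeric attribute of a class in general terms."""
    values = [r[attr] for r in rows]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}


def discriminate(target: List[dict], contrasting: List[dict],
                 attr: str) -> Dict[str, float]:
    """Data discrimination: compare the target class with a contrasting class."""
    t = characterize(target, attr)
    c = characterize(contrasting, attr)
    return {"target_mean": t["mean"], "contrasting_mean": c["mean"],
            "difference": t["mean"] - c["mean"]}
```

For instance, `discriminate(target, contrasting, "gpa")` compares the average GPA of one hypothetical student group with another.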

2. Association analysis

Support and confidence

multidimensional association rule

single-dimensional association rules
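Support and confidence for a rule A ⇒ B can be computed directly from a list of transactions. Support is the fraction of transactions that contain A ∪ B, and confidence is support(A ∪ B) / support(A). The sketch below assumes each transaction is a Python set of item names, and the data in the usage note are made up.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: List[Set[str]], antecedent: Set[str],
               consequent: Set[str]) -> float:
    """Confidence of antecedent => consequent, i.e. support(A | B) / support(A)."""
    sup_a = support(transactions, antecedent)
    # A rule whose antecedent never occurs gets confidence 0 instead of a ZeroDivisionError.
    return support(transactions, antecedent | consequent) / sup_a if sup_a else 0.0
```

Consider four transactions: {computer, software}, {computer, printer}, {computer, software, printer}, and {milk}. For these, the rule computer ⇒ software has support 0.5 and confidence about 0.67. A rule like this, with a single predicate (buys), is single-dimensional. A rule that mixes several attributes, such as age, income, and buys, is multidimensional.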

3. Classification and prediction

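Classification: analyze a set of data objects whose class labels are known (the training data) to derive a model or function. Then use that model to predict the class of objects whose labels are unknown.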

The model can be represented as IF-THEN rules, decision trees, or mathematical formulae.

Prediction: predict missing or unavailable numerical data values.

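Relevance analysis: performed before classification and prediction to identify attributes that do not contribute to them, so that those attributes can be excluded.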
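The notes do not name a specific classification algorithm. The sketch below uses one simple, swapped-in example of learning IF-THEN rules from labeled training data: the 1R ("one rule") method. For each attribute, 1R predicts the majority class for every attribute value. It then keeps the attribute whose rules make the fewest training errors. The data, attribute names, and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Rules = Dict[str, str]  # attribute value -> predicted class label


def one_r(rows: List[dict], attrs: List[str], label: str) -> Tuple[Optional[str], Rules]:
    """Learn 'IF attr = value THEN label = class' rules on the single best attribute (1R)."""
    best_attr, best_rules, best_errors = None, {}, None
    for attr in attrs:
        by_value: Dict[str, Counter] = defaultdict(Counter)
        for r in rows:
            by_value[r[attr]][r[label]] += 1
        # Majority class per value; most_common breaks ties by first-encountered order.
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        # Training errors = rows whose class differs from their value's majority class.
        errors = sum(sum(c.values()) - c[rules[v]] for v, c in by_value.items())
        if best_errors is None or errors < best_errors:
            best_attr, best_rules, best_errors = attr, rules, errors
    return best_attr, best_rules


def predict(model: Tuple[str, Rules], row: dict, default: str) -> str:
    """Apply the learned rules; fall back to default for values unseen in training."""
    attr, rules = model
    return rules.get(row[attr], default)


def as_if_then(model: Tuple[str, Rules], label: str) -> List[str]:
    """Render the model in the IF-THEN form mentioned in the notes."""
    attr, rules = model
    return [f"IF {attr} = {v} THEN {label} = {c}" for v, c in rules.items()]
```

On a small weather-style table, for example, 1R might settle on `outlook` and produce rules such as "IF outlook = sunny THEN play = no". Decision trees generalize this idea by testing more than one attribute.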

4. Clustering analysis

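Unlike classification, clustering works with training data that are not labeled, so clustering is used to group the data. The guiding principle is maximizing the intraclass similarity and minimizing the interclass similarity. Each cluster that is formed can be viewed as a class of objects.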
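The notes do not name a clustering algorithm. K-means is shown here as one widely used partitioning method. It alternates between two steps: assigning each point to its nearest centroid, and moving each centroid to the mean of its members. By shrinking within-cluster distances, it directly serves the "maximize intraclass similarity" half of the principle. This is a minimal sketch on 2-D points, with a hypothetical function name. It requires Python 3.8+ for `math.dist`.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd-style k-means: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids: List[Point] = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

K-means needs k chosen in advance and can settle in a local optimum. A common practice is therefore to run it with several seeds and keep the best result.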

A Roadmap to Text Mining and Web Mining

jdilt — 2007/7/21 15:09:26

A Roadmap to Text Mining and Web Mining

- Under Construction, Last Modified: Jan 8, 2002 -

Text Mining in General

M. Hearst, Untangling Text Data Mining, ACL99
Mining in Textual Mountains: An Interview with Marti Hearst, Mappa Mundi, 1999
Semio Co., Text Mining and the Knowledge Management Space, 1999
D. Radev, Text Data Mining: An Overview

Workshops

PAKDD-2002 Workshop on Text Mining
SDM-2002 Text Mining Workshop (Text Mining 2002)
ICDM-2001 Workshop on Text Mining (TextDM'2001)
SDM-2001 Text Mining Workshop (TextMine'01)
KDD-2000 Workshop on Text Mining
UMN-IMA Text Mining Workshop (2000)
IJCAI-99 Workshop on Text Mining
ECML-98 Workshop on Text Mining

Tutorials

D. Mladenic & M. Grobelnik, ECML/PKDD-2001 Tutorial on Text Mining

Classes

M. Hearst, Seminar on Text Data Mining (SIMS, UCBerkeley)
W. Cohen, Machine Learning for Text Mining (LTI, CS, CMU)
W. Pratt, Text Mining (ICS, UCI)

Links

Google Directory: Text Mining
Open Directory: Text Mining

Web Mining in General

R. Kosala et al., Web Mining Research: A Survey, SIGKDD Explorations, 2000
R. Cooley et al., Web Mining: Information and Pattern Discovery on the World Wide Web
D. Greening, Data Mining on the Web, WebTechniques, 2000

People

Sergey Brin
Oren Etzioni
David W. Embley
Filippo Menczer
Bamshad Mobasher

Workshops

ECML/PKDD-2001 Semantic Web Mining Workshop
SDM-2001 Workshop on Web Mining
PRICAI-2000 Workshop on Text and Web Mining
INFWET97

Task-Driven Text Mining

Text Categorization

Y. Yang, An Evaluation of Statistical Approaches to Text Categorization, Journal of IR, 1999
Andrew McCallum
Kamal Nigam
David D. Lewis
AAAI98 Workshop on Learning for Text Categorization
Text Mining, Automatic Classification & Indexing

Document Clustering

Clustering Text & Useful Scripts
Scatter/Gather
I. Dhillon, Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning, KDD01.

Rule Mining from Text

H. Ahonen-Myka et al., Applying Data Mining Techniques in Text Analysis, Technical Report, 1997
R. Feldman et al., FACT, 1996
R. Feldman et al., Knowledge Discovery in Texts (KDT), KDD95
Rayid Ghani
Heikki Mannila

Relationship Mining

Y. Park et al., Hybrid Text Mining for Finding Abbreviations and Their Definitions, EMNLP-2001
N. Sundaresan et al., Mining the Web for Relations, WWW9, 2000
L. Larkey et al., Acrophile: An Automated Acronym Extractor and Server, ACM DL-2000
J. Yi et al., Mining the Web for Acronyms Using the Duality of Patterns and Relations, ACM CIKM-99 Workshop on Web Information and Data Management

Topic Detection

Chris Clifton

Text Segmentation

UCBerkeley TextTiling

Text Summarization

Text Summarization Technology

Knowledge Understanding

Udo Hahn

Text Navigation, Visualization and User Interface

UCBerkeley Cat-a-Cone
MIT Shakespeare Project
Data Visualization

Methodology-Driven Text Mining

Neural Networks

WEBSOM - SOM-based Text Mining
D. Merkl et al., Text Data Mining, In A Handbook of Natural Language Processing: Techniques and Applications for the Processing of Language as Text, 1998
Vitali Schetinin

Evolutionary Computation

A. Freitas, A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery, In Advances in Evolutionary Computation, 2002.

Parallel Text Mining

J. Chen, Parallel Text Mining for Cross-Language Information Retrieval Using a Statistical Translation Model

Hyperlinks Analysis

Soumen Chakrabarti, Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, WWW7, 1998

Application-Driven Text Mining

Bioinformatics

Text Mining for Bioinformatics Tutorial
Text Mining for Molecular Biology

Business and Customer Relationship Management (CRM)

D. Evans, Text Mining Towards Decision Supports, 1999
C. Halliman, Business Intelligence Using Smart Techniques: Environmental Scanning Using Text Mining and Competitor Analysis Using Scenarios and Manual Simulation

Text Mining in the Noisy World

Data Mining in the Noisy World

Z. Tian et al., An N-gram-based Approach for Detecting Approximately Duplicate Database Records, International Journal on Digital Libraries, 2001
W. Cohen, Hardening Soft Information Sources, KDD00
M. Hernandez et al., Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem, Data Mining and Knowledge Discovery, 1998
Merge-Purge and Data Cleaning

Machine Learning in the Noisy World

P. Domingos, Unifying Instance-Based and Rule-Based Induction, Machine Learning, 1996
C. Janikow, FID: Fuzzy Decision Trees

Information Retrieval in the Noisy World

G. Bordogna et al., Modeling Vagueness in Information Retrieval, ESSIR, 2000

Databases in the Noisy World (a.k.a. Information Integration)

H. Lu et al., Discovering and Reconciling Semantic Conflicts: A Data Mining Perspective, DS-7, 1997

Tools for Text Mining (Or Related Fields)

Information Extraction

D. Appelt et al., Introduction to Information Extraction Technology, IJCAI-99 Tutorial
H. Cunningham, Information Extraction: A User Guide, Technical Report, 1999
I. Muslea, Extraction Patterns for Information Extraction, AAAI-99 Workshop on Machine Learning for Information Extraction
I. Muslea, Extraction Patterns: From Information Extraction to Wrapper Induction, Technical Report, 1998
C. Cardie, Empirical Methods in Information Extraction, AI Magazine, 1997
Repository

I. Muslea, RISE: Repository of Information Extraction

Workshops

AAAI-99 Workshop on Machine Learning for Information Extraction

Machine Learning Techniques for Information Extraction

M. Califf et al., Relational Learning of Pattern-Match Rules for Information Extraction, AAAI-99
Mark Craven
Tom Mitchell
Stephen Soderland

Statistical Information Extraction

Dayne Freitag
Hugo Zaragoza

Wrapper Induction

N. Kushmerick, Wrapper induction: Efficiency and Expressiveness, Artificial Intelligence, 2000

Information Extraction and Text Mining

IJCAI-2001 Workshop on Adaptive Text Extraction and Mining

Information Extraction and Information Retrieval

J. Bear et al., Using Information Extraction to Improve Document Retrieval, TREC6, 1997

Natural Language Processing

Computational Linguistics Journal
COLING-2002 | ACL-2002
C. Manning et al., Foundations of Statistical Natural Language Processing
D. Jurafsky et al., Speech and Language Processing
LIA Publication
Text Mining for Natural Language Processing

D. Lin et al., Discovery of Inference Rules for Question-Answering, Journal of Natural Language Engineering, 2001
D. Lin et al., Induction of Semantic Classes from Natural Language Text, KDD-2001

Natural Language Processing and Databases

NLDB-2002

Machine Learning

Machine Learning Journal
ICML-2002
Repository

UCI Machine Learning Repository

Machine Learning on Text

UW-Madison: Machine Learning for Text Analysis (2000)
ICML-99 Workshop on Machine Learning in Text Data Analysis

Information Retrieval

Information Retrieval Journal
ACM SIGIR-2002
IR Resources
WebIR
The Center for Intelligent Information Retrieval at UMass
R. Baeza-Yates et al., Modern Information Retrieval
Information Retrieval for Text Mining

J. Neto et al., Document Clustering and Text Summarization

Information Retrieval on the Web

Data Engineering Special Issue on Next-Generation Web Search
AAAI-2000 Workshop on AI for Web Search
V. Raghavan, Information Retrieval on the World-Wide Web, 1997

Information Retrieval and Machine Learning

R. Belew et al., Machine Learning and Information Retrieval

Information Retrieval using Natural Language Processing

A. Arampatzis, Linguistically-Motivated Information Retrieval

Data Mining

Data Mining and Knowledge Discovery Journal
ACM KDD-2002 | ICDM-2002
J. Han et. al, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001
KDNuggets Directory

Databases

VLDB Journal
ACM SIGMOD-2002
D. Sullivan, Document Warehousing and Text Mining

Web

World Wide Web Journal
WWW-2002
WWW + Databases: WebDB

Digital Libraries

D-Lib Forum & Magazine
ACM JCDL-2002

Intelligent Agents

Autonomous Agents and Multi-Agent Systems Journal
AAMAS-2002
Agent Web
BotSpot FAQ
Web Agent

C. Petrie, Agent-Based Engineering, the Web, and Intelligence, IEEE Expert, 1996
Syskill & Webert: Identifying Interesting Web Sites

Agent and Information Retrieval

Agent-Based IR

Haym Hirsh
Foster Provost
Jason Rennie
Sean Slattery
Charles Elkan
Christos Faloutsos
Geoff Webb
Osmar R. Zaiane
W. Fan
Bing Liu

Institutions

UT-Austin Machine Learning Group
CMU Text Learning Group
University of Helsinki FDK Data Mining and Machine Learning Group
Albert-Ludwigs-University Computational Linguistic Research Group
University of Waikato Text Mining Group
Text Mining at Kent Ridge Digital Lab (KRDL), Singapore
Text Mining at PMSI, France
Imperial College Data Mining Group
Text Mining at KI, Germany
XRCE MLTT
LIA TLN

Projects

IBM Clever
IBM Data Abstraction
Web->KB
WebWatcher
STARTS
FAQFinder
Combining Machine Learning and Natural Language Processing for Knowledge Discovery in Text Corpora
IBM-TRL Text Mining

Products

Brosis Xcise
iCrossReader
Intelligent Miner for Text (IBM)
Leximancer
SRA
Temis
TextAnalyst (Megaputer)
Text Mining by Filter Composition
Text Mining Tools (The Data Warehousing Information Center)
VantagePoint
VisualText (TextAnalysis)
WizSoft
WordStat (Provalis Research)
Alembic Workbench (MITRE)
INTEX (LADL)
LexGram (University of Stuttgart)
LinguistX Platform (InXight)
PAGE (DFKI)
Pinocchio (ITC-Irst)
