lmp / lmp_server
Commit 63816961, authored Apr 08, 2024 by pengxin
Adjust data cleaning
parent 44b91781
Showing 2 changed files with 12 additions and 7 deletions

DatasetCleanServiceImpl.java (...ce/webadmin/app/service/impl/DatasetCleanServiceImpl.java): +9 -5
DataCleanerUtil.java (...main/java/com/yice/webadmin/app/util/DataCleanerUtil.java): +3 -2
application-webadmin/src/main/java/com/yice/webadmin/app/service/impl/DatasetCleanServiceImpl.java
@@ -37,10 +37,7 @@ import org.springframework.transaction.annotation.Transactional;
 import java.io.File;
 import java.io.FileWriter;
 import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Date;
-import java.util.List;
+import java.util.*;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.Future;
 import java.util.stream.Collectors;
...
@@ -251,9 +248,16 @@ public class DatasetCleanServiceImpl extends BaseService<DatasetClean, Long> imp
         List<DatasetRule> rules = new ArrayList<>();
         if (null != datasetCleanConfig) {
-            String[] jsonStrings = {datasetCleanConfig.getFilterConfig(), datasetCleanConfig.getDesensitiveConfig(),
+            String[] nonEmptyJsonStrings = {datasetCleanConfig.getFilterConfig(), datasetCleanConfig.getDesensitiveConfig(),
                     datasetCleanConfig.getDesensitiveConfig(), datasetCleanConfig.getDeduplicateConfig(), datasetCleanConfig.getCleanConfig()};
+            String[] jsonStrings = Arrays.stream(nonEmptyJsonStrings)
+                    .map(Optional::ofNullable)
+                    .filter(Optional::isPresent)
+                    .map(Optional::get)
+                    .toArray(String[]::new);
             ObjectMapper objectMapper = new ObjectMapper();
             rules = Arrays.stream(jsonStrings)
                     .map(jsonString -> {
...
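Review note: the `Optional::ofNullable` / `isPresent` / `get` chain added in this hunk removes `null` entries from the config array before JSON parsing. The sketch below is a standalone illustration of that behavior, not code from lmp_server; the class name `NullFilterSketch` and the sample strings are hypothetical. It also shows `filter(Objects::nonNull)` as a shorter form that should behave the same way.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.Optional;

// Hypothetical demo class; not part of lmp_server.
class NullFilterSketch {

    // Same chain as in the commit: wrap each element, keep the present ones, unwrap.
    static String[] viaOptional(String[] configs) {
        return Arrays.stream(configs)
                .map(Optional::ofNullable)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .toArray(String[]::new);
    }

    // Shorter alternative: drop null elements directly.
    static String[] viaNonNull(String[] configs) {
        return Arrays.stream(configs)
                .filter(Objects::nonNull)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Sample input (assumed): one null and one empty string among the configs.
        String[] configs = {"{\"a\":1}", null, "", "{\"b\":2}"};
        System.out.println(Arrays.toString(viaOptional(configs)));
        System.out.println(Arrays.toString(viaNonNull(configs)));
    }
}
```

Both variants only drop `null`. An empty string still passes through, so if empty configs can occur, the downstream `ObjectMapper` call may still need to handle them.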
application-webadmin/src/main/java/com/yice/webadmin/app/util/DataCleanerUtil.java
@@ -180,7 +180,7 @@ public class DataCleanerUtil {
         // Compute the word repetition rate
         double repetitionRate = (double) repeatedWordsCount / totalWords;
-        return repetitionRate < threshold ? document : DatasetConstant.EMPTY_STR;
+        return repetitionRate > threshold ? DatasetConstant.EMPTY_STR : document;
     }
     /**
...
@@ -215,7 +215,8 @@ public class DataCleanerUtil {
         // Compute the character repetition rate
         double repetitionRate = (double) repeatedCharactersCount / totalCharacters;
-        return repetitionRate < threshold ? text : DatasetConstant.EMPTY_STR;
+        // Return the data according to the threshold check
+        return repetitionRate > threshold ? DatasetConstant.EMPTY_STR : text;
     }
     /**
...
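Review note: the two `return` changes in DataCleanerUtil.java are not a pure reordering of the ternary. The old and new conditions differ when the rate equals the threshold, and when the rate is `NaN` (which `(double) 0 / 0` produces if both counts are zero). The sketch below illustrates this under stated assumptions: `ThresholdSketch`, `before`, and `after` are hypothetical names, and `EMPTY_STR` is a stand-in for `DatasetConstant.EMPTY_STR`, whose actual value is assumed to be the empty string. Whether the real methods guard against a zero total earlier is not visible in this diff.

```java
// Hypothetical demo class; not part of lmp_server.
class ThresholdSketch {

    // Stand-in for DatasetConstant.EMPTY_STR (value assumed).
    static final String EMPTY_STR = "";

    // Condition before the commit: keep the text only if rate < threshold.
    static String before(String text, double rate, double threshold) {
        return rate < threshold ? text : EMPTY_STR;
    }

    // Condition after the commit: drop the text only if rate > threshold.
    static String after(String text, double rate, double threshold) {
        return rate > threshold ? EMPTY_STR : text;
    }

    public static void main(String[] args) {
        // Case 1: rate exactly equal to the threshold.
        System.out.println("equal, before: [" + before("doc", 0.5, 0.5) + "]");
        System.out.println("equal, after:  [" + after("doc", 0.5, 0.5) + "]");

        // Case 2: zero counts give NaN under floating-point division.
        int repeated = 0;
        int total = 0;
        double rate = (double) repeated / total;
        System.out.println("NaN, before:   [" + before("doc", rate, 0.5) + "]");
        System.out.println("NaN, after:    [" + after("doc", rate, 0.5) + "]");
    }
}
```

In both cases the old condition returns the empty string and the new one returns the input unchanged, because every ordered comparison with `NaN` is false and `0.5 > 0.5` is false.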