Qualitative and quantitative analysis of scientific publishing in Latin America based on DOAJ data (2025)

Objective

This site offers data and visualizations on scientific publishing in Latin America, mined from the Directory of Open Access Journals (DOAJ) for 2025. The data come from an R notebook that reads the DOAJ data live, so it loads more slowly. The charts in the notebook must be run manually (Run all cells).

Libraries used

These libraries are used for statistical analysis and visualization of the database that DOAJ publishes in CSV format.

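# Packages (assumed from the functions called in this notebook)
library(tidyverse)      # dplyr, tibble, stringr, ggplot2
library(sf)             # st_point_on_surface()
library(rnaturalearth)  # ne_countries()
library(countrycode)    # countrycode()
library(ggiraph)        # geom_point_interactive(), girafe()

# Official DOAJ URL
url_doaj <- "https://doaj.org/csv"
# Download the CSV
temp <- tempfile(fileext = ".csv")
download.file(url_doaj, destfile = temp, method = "libcurl")

# Read the database
journal <- read.csv(temp, stringsAsFactors = FALSE)

# Convert to a tibble
journal <- as_tibble(journal)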
head(journal)
# A tibble: 6 × 51
  Journal.title Journal.URL URL.in.DOAJ When.did.the.journal…¹ Alternative.title
  <chr>         <chr>       <chr>                        <int> <chr>            
1 Nyimak: Jour… http://jur… https://do…                   2017 ""               
2 MCBS (Molecu… https://ce… https://do…                   2017 ""               
3 Acta Univers… https://ka… https://do…                   2014 "AUC Philologica"
4 Cuadernos pa… https://re… https://do…                   2016 "CILH"           
5 Enfances, Fa… https://jo… https://do…                   2014 ""               
6 RUDN Journal… http://jou… https://do…                   2008 "Vestnik Rossijs…
# ℹ abbreviated name:
#   ¹​When.did.the.journal.start.to.publish.all.content.using.an.open.license.
# ℹ 46 more variables: Journal.ISSN..print.version. <chr>,
#   Journal.EISSN..online.version. <chr>, Keywords <chr>,
#   Languages.in.which.the.journal.accepts.manuscripts <chr>, Publisher <chr>,
#   Country.of.publisher <chr>, Other.organisation <chr>,
#   Country.of.other.organisation <chr>, Journal.license <chr>, …

Selecting columns from the database

head(journal[, c("Journal.title", "Subjects", "Keywords")], 15)
# A tibble: 15 × 3
   Journal.title                                              Subjects  Keywords
   <chr>                                                      <chr>     <chr>   
 1 Nyimak: Journal of Communication                           Language… communi…
 2 MCBS (Molecular and Cellular Biomedical Sciences)          Medicine… biomedi…
 3 Acta Universitatis Carolinae: Philologica                  Language… philolo…
 4 Cuadernos para la Investigación de la Literatura Hispánica Language… spanish…
 5 Enfances, Familles, Générations                            Geograph… gender …
 6 RUDN Journal of Political Science                          Politica… politic…
 7 Éducation et Socialisation                                 Educatio… educati…
 8 RUDN Journal of Russian History                            History … russian…
 9 Turkish Journal of Bioscience and Collections              Science:… bioscie…
10 The Rehabilitation Journal                                 Social S… speech-…
11 Transitare                                                 Social S… tourism…
12 European Journal of Biology                                Science:… biology…
13 Revue d'ethnoécologie                                      Geograph… anthrop…
14 Cultura, Educación,  Sociedad                              Educatio… human s…
15 Tarih Dergisi                                              History … history 
print(paste("Total number of journals in the DOAJ database:", nrow(journal)))
[1] "Total number of journals in the DOAJ database: 22431"

Manipulating and cleaning the database

Column names available for renaming

colnames(journal)
 [1] "Journal.title"                                                              
 [2] "Journal.URL"                                                                
 [3] "URL.in.DOAJ"                                                                
 [4] "When.did.the.journal.start.to.publish.all.content.using.an.open.license."   
 [5] "Alternative.title"                                                          
 [6] "Journal.ISSN..print.version."                                               
 [7] "Journal.EISSN..online.version."                                             
 [8] "Keywords"                                                                   
 [9] "Languages.in.which.the.journal.accepts.manuscripts"                         
[10] "Publisher"                                                                  
[11] "Country.of.publisher"                                                       
[12] "Other.organisation"                                                         
[13] "Country.of.other.organisation"                                              
[14] "Journal.license"                                                            
[15] "License.attributes"                                                         
[16] "URL.for.license.terms"                                                      
[17] "Machine.readable.CC.licensing.information.embedded.or.displayed.in.articles"
[18] "Author.holds.copyright.without.restrictions"                                
[19] "Copyright.information.URL"                                                  
[20] "Review.process"                                                             
[21] "Review.process.information.URL"                                             
[22] "Journal.plagiarism.screening.policy"                                        
[23] "URL.for.journal.s.aims...scope"                                             
[24] "URL.for.the.Editorial.Board.page"                                           
[25] "URL.for.journal.s.instructions.for.authors"                                 
[26] "Average.number.of.weeks.between.article.submission.and.publication"         
[27] "APC"                                                                        
[28] "APC.information.URL"                                                        
[29] "APC.amount"                                                                 
[30] "Journal.waiver.policy..for.developing.country.authors.etc."                 
[31] "Waiver.policy.information.URL"                                              
[32] "Has.other.fees"                                                             
[33] "Other.fees.information.URL"                                                 
[34] "Preservation.Services"                                                      
[35] "Preservation.Service..national.library"                                     
[36] "Preservation.information.URL"                                               
[37] "Deposit.policy.directory"                                                   
[38] "URL.for.deposit.policy"                                                     
[39] "Persistent.article.identifiers"                                             
[40] "Does.the.journal.comply.to.DOAJ.s.definition.of.open.access."               
[41] "Continues"                                                                  
[42] "Continued.By"                                                               
[43] "LCC.Codes"                                                                  
[44] "Subscribe.to.Open"                                                          
[45] "Mirror.Journal"                                                             
[46] "Open.Journals.Collective"                                                   
[47] "Subjects"                                                                   
[48] "Added.on.Date"                                                              
[49] "Last.updated.Date"                                                          
[50] "Number.of.Article.Records"                                                  
[51] "Most.Recent.Article.Added"                                                  

Creating a new tibble with the columns needed for the analysis

journal.select <- journal %>%
  select(Journal.title, Country.of.publisher,
         Languages.in.which.the.journal.accepts.manuscripts,
         Journal.license, Publisher, Review.process, Subjects, APC,
         Persistent.article.identifiers, Keywords, Added.on.Date)

Renaming and simplifying column names

journal.select <- journal.select %>%
  rename(
    title    = Journal.title,
    country  = Country.of.publisher,
    language = Languages.in.which.the.journal.accepts.manuscripts,
    License  = Journal.license,
    Review   = Review.process,
    Ids      = Persistent.article.identifiers,
    Added    = Added.on.Date
  )

Converting the Added column (originally Added.on.Date) to year-month-day format

journal.select$Added <- as.Date(journal.select$Added)
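
As a minimal illustration of this conversion, assuming the column holds ISO 8601 timestamps (the two values below are made up for the example): `as.Date()` reads the leading year-month-day part and ignores any trailing characters, so the time of day is dropped.

```r
# Toy timestamps in the ISO 8601 shape assumed for the Added column (illustrative values)
added_raw <- c("2017-05-15T10:52:08Z", "2008-03-01T00:00:00Z")

# Only the leading "%Y-%m-%d" part is parsed; the rest of the string is ignored
added <- as.Date(added_raw, format = "%Y-%m-%d")

# Once the column is a Date, components such as the year are easy to extract
added_year <- as.integer(format(added, "%Y"))

print(added)
print(added_year)
```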

Cleaning the “Subjects” column

The subject strings are cleaned and reduced to their first two words to make manipulation, analysis, and visualization easier. The same pipeline also shortens long official country names (for example, any name containing "Russian" becomes "Russia").

# Keep the first word of each subject and, if present, the second one
journal.select$Subjects <- str_extract(journal.select$Subjects, "\\w+(?:[^\\w]+\\w+){0,1}")
# Remove punctuation
journal.select$Subjects <- gsub("[[:punct:]]", "", journal.select$Subjects)
journal.select <- journal.select %>%
  mutate(
    country = trimws(country),
    # Shorten long official country names
    country = case_when(
      str_detect(country, "Bolivia")   ~ "Bolivia",
      str_detect(country, "Venezuela") ~ "Venezuela",
      str_detect(country, "Russian")   ~ "Russia",
      str_detect(country, "Iran")      ~ "Iran",
      str_detect(country, "Korea")     ~ "Korea",
      str_detect(country, "Moldova")   ~ "Moldova",
      str_detect(country, "Congo")     ~ "Congo",
      str_detect(country, "Tanzania")  ~ "Tanzania",
      str_detect(country, "Palestine") ~ "Palestine",
      TRUE ~ country
    )
  ) %>%
  mutate(Subjects = str_trim(Subjects)) %>%
  mutate(Subjects = sapply(Subjects, function(x) {
    # Split on runs of spaces, commas, the word "and", or single spaces
    words <- str_split(x, "\\s{2,}|,|\\s*\\band\\b\\s*|\\s+")[[1]]
    words <- unique(words[words != ""])  # drop empty strings and duplicates
    paste(words, collapse = " ")
  }, USE.NAMES = FALSE))  # USE.NAMES = FALSE keeps the column unnamed
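
To make the effect of these steps concrete, the following sketch traces them on three toy subject strings. The strings are illustrative, written in the "Top level: Sub level" shape that the Subjects column appears to use, and are not rows copied from the DOAJ export.

```r
library(stringr)

# Toy subject strings (illustrative values, not taken from the DOAJ export)
subjects <- c("Language and Literature: Philology. Linguistics",
              "Medicine: Internal medicine",
              "Science: Science (General)")

# Step 1: keep the first word and, if present, the second one
step1 <- str_extract(subjects, "\\w+(?:[^\\w]+\\w+){0,1}")

# Step 2: drop punctuation
step2 <- gsub("[[:punct:]]", "", step1)

# Step 3: split on "and" or whitespace, drop empties and duplicates, re-join
step3 <- vapply(str_trim(step2), function(x) {
  words <- str_split(x, "\\s{2,}|,|\\s*\\band\\b\\s*|\\s+")[[1]]
  paste(unique(words[words != ""]), collapse = " ")
}, character(1), USE.NAMES = FALSE)

print(step3)
# [1] "Language"          "Medicine Internal" "Science"
```

The first string loses its connective "and", the second keeps two distinct words, and the third collapses the repeated "Science" into one.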

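Cleaning the “language” and “Ids” (persistent identifiers) columns

A function sorts the languages or identifiers listed in each cell alphabetically, so that the same combination written in a different order is treated as a single value. This simplifies the visualizations and the handling of the tibble.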

# Sort the comma-separated values of a cell alphabetically
sort_columns <- function(column_list) {
  sorted_columns <- sort(unlist(strsplit(column_list, ", ")))
  return(paste(sorted_columns, collapse = ", "))
}
# Apply the function to every cell of the 'language' and 'Ids' columns
journal.select <- journal.select %>%
  mutate(across(c(language, Ids), ~ sapply(.x, sort_columns, USE.NAMES = FALSE)))
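
A small example of what the sorting achieves, using made-up language cells: two cells that list the same languages in a different order end up as the same string, so they are counted together.

```r
# Same function as in the notebook, repeated here so the example runs on its own
sort_columns <- function(column_list) {
  sorted_columns <- sort(unlist(strsplit(column_list, ", ")))
  paste(sorted_columns, collapse = ", ")
}

# Toy cells (illustrative values)
languages <- c("Spanish, English", "English, Spanish", "Portuguese")

normalized <- vapply(languages, sort_columns, character(1), USE.NAMES = FALSE)
print(normalized)
# [1] "English, Spanish" "English, Spanish" "Portuguese"

# After sorting, the first two cells fall into a single category
print(table(normalized))
```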

Caption function

# Build the citation caption shown under each chart
add_caption <- function(author = "Romina De León y Gimena del Rio", year = format(Sys.Date(), "%Y")) {
  paste0("Cite as: ", author, ", ", year,
         ". Analysis of Latin American journals in DOAJ.")
}

Total number of journals in DOAJ

Creating a data frame with the number and percentage of journals per country

porcen_journal <- journal.select %>%
  count(country) %>%
  mutate(percentage = round(n / sum(n) * 100, 2)) %>%
  # Append a "Total" row holding the sum of the rounded percentages
  bind_rows(data.frame(country = "Total", n = NA, percentage = sum(.$percentage)))
# Sort by number of journals, largest first (the "Total" row, with n = NA, goes last)
porcen_journal[order(porcen_journal$n, decreasing = TRUE), ]
# A tibble: 141 × 3
   country            n percentage
   <chr>          <int>      <dbl>
 1 Indonesia       2612      11.6 
 2 United Kingdom  2255      10.0 
 3 Brazil          1456       6.49
 4 United States   1304       5.81
 5 Iran            1043       4.65
 6 Spain           1004       4.48
 7 Poland           964       4.3 
 8 Switzerland      795       3.54
 9 Russia           649       2.89
10 Türkiye          645       2.88
# ℹ 131 more rows
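
Because each percentage is rounded to two decimals before the values are added up, the figure in the "Total" row need not be exactly 100. The toy example below, with three made-up rows, shows the effect; it is a sketch of the same calculation and not output from the DOAJ data.

```r
library(dplyr)

# Toy data: one journal in each of three countries (illustrative values)
toy <- data.frame(country = c("Brazil", "Argentina", "Chile"))

shares <- toy %>%
  count(country) %>%
  mutate(percentage = round(n / sum(n) * 100, 2))

print(shares)

# Each share is 33.33, so the rounded percentages add up to 99.99 and not 100
total_percentage <- sum(shares$percentage)
print(total_percentage)
```

If an exact 100 is wanted in the total, one option is to sum the unrounded shares and round only for display.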

Data mining and visualization

This section offers charts that compare scientific publishing worldwide with scientific publishing in Latin America. It covers the number of journals, disciplines, publishers, language of publication, and the charging of APCs.

Georeferencing journals worldwide

# Note: scale = "large" requires the rnaturalearthhires package
p1 <- suppressWarnings(
  ne_countries(scale = "large", returnclass = "sf") %>%
    left_join(
      porcen_journal %>%
        filter(!country %in% c("Total")) %>%
        mutate(
          country = trimws(country),
          # Standardize country names so they match the `name` column of the map
          country_std = case_when(
            country %in% c("United States", "USA") ~ "United States of America",
            TRUE ~ countrycode(country, origin = "country.name", destination = "country.name")
          ),
          country_std = coalesce(country_std, country),
          # Point color by number of journals
          color_point = case_when(
            n <= 20   ~ "#91A1AF",
            n <= 50   ~ "#21BCFF",
            n <= 100  ~ "#F5276C",
            n <= 200  ~ "#2C0995",
            n <= 300  ~ "#27F5B0",
            n <= 450  ~ "#009E3F",
            n <= 600  ~ "#2e86c1",
            n <= 800  ~ "#1b4f72",
            n <= 1000 ~ "#884ea0",
            n <= 1250 ~ "#a569bd",
            n <= 1500 ~ "#af7ac5",
            n <= 1750 ~ "#d98880",
            n <= 2500 ~ "#8965F6",
            n <= 3500 ~ "#729509",
            TRUE      ~ "darkblue"
          )
        ),
      by = c("name" = "country_std")
    ) %>%
    filter(!is.na(geometry)) %>%
    mutate(
      # One representative point per country, guaranteed to lie inside it
      point_geom = st_point_on_surface(geometry),
      tooltip_text = paste0("<strong>", name, "</strong><br/>Journals: ", n)
    ) %>%
    ggplot() +
    # linewidth sets the border width in ggplot2 >= 3.4 (older versions use size)
    geom_sf(fill = "gray90", color = "white", linewidth = 0.1) +
    geom_point_interactive(
      aes(
        geometry = point_geom,
        size = n,
        color = color_point,
        tooltip = tooltip_text,
        data_id = name
      ),
      stat = "sf_coordinates",
      alpha = 0.7
    ) +
    scale_size_continuous(range = c(2, 10), guide = "none") +
    scale_color_identity(
      name = "Ranges of n",
      guide = "legend",
      breaks = c("#91A1AF", "#21BCFF", "#F5276C", "#2C0995",
                 "#27F5B0", "#009E3F", "#2e86c1", "#1b4f72",
                 "#884ea0", "#a569bd", "#af7ac5", "#d98880",
                 "#8965F6", "#729509", "darkblue"),
      labels = c("≤ 20", "21–50", "51–100", "101–200",
                 "201–300", "301–450", "451–600", "601–800",
                 "801–1000", "1001–1250", "1251–1500", "1501–1750",
                 "1751–2500", "2501–3500", "> 3500")
    ) +
    labs(
      title = "Distribution of journals by country",
      x = NULL,
      y = NULL,
      caption = add_caption()
    ) +
    theme_minimal() +
    theme(
      plot.title = element_text(size = 18, face = "bold"),
      legend.title = element_text(size = 14, face = "bold"),
      legend.text = element_text(size = 12),
      legend.position = "bottom"
    )
)

girafe(
  ggobj = p1,
  options = list(
    opts_hover(css = "fill-opacity:1;stroke:black;stroke-width:1pt;"),
    opts_tooltip(css = "background-color:white;color:black;padding:5px;border-radius:5px;font-family:sans-serif;"),
    opts_toolbar(saveaspng = TRUE)
  ),
  width_svg = 10,
  height_svg = 7
)
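
The map code assigns each country to a range of journal counts with a long `case_when()` chain. As an alternative sketch, not the approach used in the notebook, base R's `cut()` can derive the same ranges from a single vector of break points. The break points and labels below are copied from the chart; the sample counts 645 and 2612 come from the table of journals per country, and 15 is a made-up value.

```r
# Upper limits of each range, as used in the map legend
breaks <- c(0, 20, 50, 100, 200, 300, 450, 600, 800, 1000,
            1250, 1500, 1750, 2500, 3500, Inf)
labels <- c("≤ 20", "21–50", "51–100", "101–200",
            "201–300", "301–450", "451–600", "601–800",
            "801–1000", "1001–1250", "1251–1500", "1501–1750",
            "1751–2500", "2501–3500", "> 3500")

# Sample journal counts: 15 is hypothetical, 645 and 2612 appear in the country table
n <- c(15, 645, 2612)

# right = TRUE makes each interval closed on the right, matching the `n <= limit` tests
bins <- cut(n, breaks = breaks, labels = labels, right = TRUE)
print(as.character(bins))
```

One difference to keep in mind: with a lower break of 0, a count of exactly 0 would become `NA`, whereas the `case_when()` chain would place it in the first range. This does not arise for countries that have at least one journal.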